Closed Bug 692388 Opened 13 years ago Closed 13 years ago

mozharness MercurialVCS with HG_SHARE_BASE_DIR set completely ignores specified revision

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozilla, Assigned: mozilla)

References

Details

(Whiteboard: [mozharness][automation][releases])

Attachments

(3 files, 4 obsolete files)

I tested android nightly builds and android staging release builds at numerous points during 0.4 development, but didn't hit or didn't notice this issue.

Catlee ran into this during 8.0b2 :(

This is the second issue due to bug 650882; I'm wondering if it's worth it currently.

Debugging on mv-moz2-linux-ix-slave21 atm (gracefully shut down) with and without HG_SHARE_BASE_DIR set.

I see 2 easy fixes:

1) stop honoring HG_SHARE_BASE_DIR in mozharness; retag + rerun the android release build
2) branch off of the pre-0.4 revision; bump the config file for 8.0b2; retag + rerun the android release build.

The third, "real" fix would be to figure out how to properly tie in the revision into the hg share code path.
(In reply to Aki Sasaki [:aki] from comment #0)
> 2) branch off of the pre-0.4 revision; bump the config file for 8.0b2; retag
> + rerun the android release build.

Uh, no maemo deb, so I think no config file bump needed. [!]
OHHHH.

So multilocale requires a built objdir for an en-US build.  We create the objdir as a child of the srcdir.

This line:

  19:41:42     INFO - No file /builds/slave/rel-m-beta-lnx-andrd-bld/build/.hg/sharedpath; removing /builds/slave/rel-m-beta-lnx-andrd-bld/build.

means that the en-US objdir was blown away along with mozilla-beta.
Hence no |make package|.


We recently started pulling build source (the --only-pull-build-source option) because we no longer check out buildbot-configs, which in turn is because we started moving mozconfigs into the tree.

We also appear to be ignoring the specified revision in the locale repos, which is also not good.
Taking option 2 to try to get a green android build by morning.

 hg tag -r 8aaecd2fd823 -f FENNEC_8_0b2_RELEASE FIREFOX_8_0b2_BUILD1 FENNEC_8_0b2_BUILD1 FIREFOX_8_0b2_RELEASE 
 hg push

Rebuilding the android 8.0b2 release build via the buildbot 'rebuild' button.
Needs testing. This will also have the side effect of turning off hg share for mozharness multilocale on all nightlies, which may or may not be what we want.

I'm also not 100% certain that MercurialVCS.ensure_repo_and_revision() does the right thing *without* HG_SHARE_BASE_DIR set; that's part of what I would be testing.  These are all side effects of porting code that I don't fully understand.


Fixing this "for real" would mean

* figure out why there was no /builds/slave/rel-m-beta-lnx-andrd-bld/build/.hg/sharedpath -- do we need to change the android release build factory to use hgtool?
** alternately, make sure buildbot-configs is checked out via non-mozharness, so we don't have to deal with re-checking-out mozilla-beta
** alternately, do the entire clobber/pull/build/package/upload/multilocale/package/upload via mozharness
* make sure MercurialVCS._ensure_shared_repo_and_revision() actually updates to the appropriate revision with --tag-override set

or tearing out hgtool from the mozharness code base and testing as well.
Ugh,

hg update -r default
 in dir /builds/slave/rel-m-beta-lnx-andrd-bld/mozharness (timeout 1200 secs)

Evidently the android release build requires some love :(  Good thing mozharness tip has been good for a long time.

... Manually updated mozharness while the build compiled:

[cltbld@mv-moz2-linux-ix-slave21 rel-m-beta-lnx-andrd-bld]$ cd mozharness/
[cltbld@mv-moz2-linux-ix-slave21 mozharness]$ hg ident
8e39746c8531 tip
[cltbld@mv-moz2-linux-ix-slave21 mozharness]$ hg up -r FENNEC_8_0b2_RELEASE
16 files updated, 0 files merged, 17 files removed, 0 files unresolved
(In reply to Aki Sasaki [:aki] from comment #5)
> Ugh,
> 
> hg update -r default
>  in dir /builds/slave/rel-m-beta-lnx-andrd-bld/mozharness (timeout 1200 secs)

This patch should fix this specific issue.
(In reply to Aki Sasaki [:aki] from comment #4)
> * figure out why there was no
> /builds/slave/rel-m-beta-lnx-andrd-bld/build/.hg/sharedpath -- do we need to
> change the android release build factory to use hgtool?
> ** alternately, make sure buildbot-configs is checked out via
> non-mozharness, so we don't have to deal with re-checking-out mozilla-beta

For the mozharness-blowing-away-the-objdir issue, the root cause is a combination of

* we check out mozilla-beta via the Mercurial buildbot step, which does not use hg share as written and configured;
* we check out buildbot-configs via mozharness' pull-build-source action, which uses MercurialVCS, which honors HG_SHARE_BASE_DIR, and also updates mozilla-beta;
* our hg share logic blows away any dest that doesn't have .hg/sharedpath, blowing away the objdir as noted earlier.
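
The third bullet can be sketched as follows. This is a minimal illustration, not the actual mozharness or hgtool code: prepare_share_dest and demo are hypothetical names, and the only behaviour taken from this bug is "remove any dest that has no .hg/sharedpath".

```python
import os
import shutil
import tempfile


def prepare_share_dest(dest):
    """Hypothetical helper: remove dest unless it is already an hg share.

    Returns True if dest was removed.
    """
    sharedpath = os.path.join(dest, ".hg", "sharedpath")
    if os.path.exists(dest) and not os.path.exists(sharedpath):
        # Everything under dest goes away, including a nested objdir.
        shutil.rmtree(dest)
        return True
    return False


def demo():
    """Build a fake non-shared checkout with an objdir inside, then prepare it."""
    root = tempfile.mkdtemp()
    dest = os.path.join(root, "build")
    objdir = os.path.join(dest, "objdir")
    os.makedirs(os.path.join(dest, ".hg"))  # plain clone: no sharedpath file
    os.makedirs(objdir)
    removed = prepare_share_dest(dest)
    objdir_survived = os.path.exists(objdir)
    shutil.rmtree(root)
    return removed, objdir_survived
```

Because the objdir is a child of the srcdir, a checkout made by the plain Mercurial buildbot step looks "not shared" to this check and takes the en-US objdir down with it.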

I think adding a buildbot step to check out buildbot-configs, or get the l10n json via http, and no longer running the mozharness pull-build-source action, would be the cleanest/fastest solution for this part of the problem.


After writing that, I need to figure out why the locale repos aren't being updated to the correct revision, and test.
This means we'll be pulling buildbot-configs twice if we run the full set of actions in a multilocale build.  However, no one runs the multilocale script except buildbot, and buildbot skips the pull-build-source action now.

The 0.3 MercurialMixin used 'tag'; hgtool and the 0.4 MercurialVCS use 'revision'.
In theory this s,'tag','revision' will solve the updating-to-the-wrong-revision issue, but I'd like to see it in action.
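
As a rough sketch of what the 'tag' to 'revision' rename amounts to on the data side: normalize_repo_dict is a hypothetical helper, and the dict shape (a "repo" URL plus a "tag" or "revision" key) is an assumption for illustration, not the real mozharness config format.

```python
def normalize_repo_dict(repo):
    """Hypothetical helper: return a copy that uses 'revision' instead of 'tag'."""
    normalized = dict(repo)
    if "tag" in normalized:
        tag = normalized.pop("tag")
        # An explicit 'revision' wins if both keys were supplied.
        normalized.setdefault("revision", tag)
    return normalized
```

A 0.3-style entry such as {"repo": "...", "tag": "FENNEC_8_0b2_RELEASE"} would come out with the same value under 'revision', which is the key the 0.4 code path reads.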
Comment on attachment 565277 [details] [diff] [review]
set mozharness_tag correctly

If we have to spin build2 soon, we should land this and tag the pre-0.4 mozharness revision to avoid hitting this bug.

If we've got some time, I'll have the rest of this bug fixed by beta3.
Attachment #565277 - Flags: review?(catlee)
Comment on attachment 565351 [details] [diff] [review]
also pull buildbot-configs in pull-locale-source; possibly fix MercurialVCS revision

Testing:

HG_SHARE_BASE_DIR=/builds/hg-shared python mozharness/scripts/multil10n.py --config-file multi_locale/release_mozilla-beta_linux-android.json --merge-locales --tag-override=FENNEC_8_0b2_RELEASE --only-pull-locale-source

gives things like

16:49:27     INFO - Running command: hg --cwd /builds/slave/rel-m-beta-lnx-andrd-bld/./mozilla-beta/pt-PT update -C -r FENNEC_8_0b2_RELEASE

which is yay, in terms of locales being updated to the correct revision with HG_SHARE_BASE_DIR set.

Going to try testing the rest of it between interrupts.
(In reply to Aki Sasaki [:aki] from comment #12)

ignore comment 12; I was using mozharness 0.3.
Local testing on linux-ix-slave05 looks good for the mozharness patch: we're updating to the tag, and the multilocale build finished successfully.

Testing the full Android release build and nightly build via buildbot once I get all the user repo stuff set up.
Attached patch: mozharness patch 2 (obsolete)
Fix the user_repo_override for buildbot-configs (staging releases only)
Attachment #565351 - Attachment is obsolete: true
Attachment #565151 - Attachment is obsolete: true
Attachment #565277 - Flags: review?(catlee) → review?(rail)
Attachment #565350 - Flags: review?(rail)
Attachment #565277 - Flags: review?(rail) → review+
Comment on attachment 565350 [details] [diff] [review]
buildbot-configs: stop pulling build source in mozharness

Hmmmm, not sure if I'm following here. You added --only-pull-build-source to make sure that we pull buildbot-configs to build/configs to have JSON l10n changeset file in place. Don't you need that file anymore or is it being pulled somehow else these days?
(In reply to Rail Aliiev [:rail] from comment #16)
> Comment on attachment 565350 [details] [diff] [review]
> buildbot-configs: stop pulling build source in mozharness
> 
> Hmmmm, not sure if I'm following here. You added --only-pull-build-source to
> make sure that we pull buildbot-configs to build/configs to have JSON l10n
> changeset file in place. Don't you need that file anymore or is it being
> pulled somehow else these days?

The "mozharness patch 2" attachment will fix that, once I verify it works on a staging release run
(which failed again because I tagged wrong :(  )
Attachment #565350 - Flags: review?(rail) → review+
Comment on attachment 565277 [details] [diff] [review]
set mozharness_tag correctly

http://hg.mozilla.org/build/buildbotcustom/rev/d1208567868c

Now if we need to, we can tag the last 0.3 revision of mozharness and have a working Android release build.
Attachment #565277 - Flags: checked-in+
I can't test on dev-master01 anymore?

On a fresh master:

Traceback (most recent call last):
  File "/home/asasaki/build-master/lib/python2.6/site-packages/buildbot-0.8.2_hg_aeaa057e9df6_production_0.8-py2.6.egg/buildbot/scripts/runner.py", line 1039, in doCheckConfig
    ConfigLoader(configFileName=configFileName)
  File "/home/asasaki/build-master/lib/python2.6/site-packages/buildbot-0.8.2_hg_aeaa057e9df6_production_0.8-py2.6.egg/buildbot/scripts/checkconfig.py", line 31, in __init__
    self.loadConfig(configFile, check_synchronously_only=True)
  File "/home/asasaki/build-master/lib/python2.6/site-packages/buildbot-0.8.2_hg_aeaa057e9df6_production_0.8-py2.6.egg/buildbot/master.py", line 816, in loadConfig
    "%s uses unknown builder %s" % (s, b)
AssertionError: <buildbotcustom.scheduler.SpecificNightly-props instance at 0x1527cd88> uses unknown builder Android Debug mozilla-central nightly
make: *** [checkconfig] Error 1
Fixed dev-master01 issues.

Hitting this:

18:50:33     INFO - hg share works.
18:50:33     INFO - Updating shared repo
18:50:33     INFO - Cloning http://hg.mozilla.org/users/asasaki_mozilla.com/buildbot-configs to /builds/hg-shared/users/asasaki_mozilla.com/buildbot-configs to revision FENNEC_8_0b3_RELEASE.
18:50:33     INFO - mkdir: /builds/hg-shared/users/asasaki_mozilla.com
18:50:33     INFO - Running command: ['hg', 'clone', '-r', 'FENNEC_8_0b3_RELEASE', 'http://hg.mozilla.org/users/asasaki_mozilla.com/buildbot-configs', '/builds/hg-shared/users/asasaki_mozilla.com/buildbot-configs']
18:51:08     INFO -  requesting all changes
18:51:08     INFO -  adding changesets
18:51:08     INFO -  adding manifests
18:51:08     INFO -  adding file changes
18:51:08     INFO -  added 4749 changesets with 11637 changes to 2341 files
18:51:08     INFO -  updating to branch default
18:51:08     INFO -  1105 files updated, 0 files merged, 0 files removed, 0 files unresolved
18:51:08     INFO - Return code: 0
18:51:08     INFO - Updating /builds/hg-shared/users/asasaki_mozilla.com/buildbot-configs revision FENNEC_8_0b3_RELEASE.
18:51:08     INFO - Running command: ['hg', 'update', '-C', '-r', 'FENNEC_8_0b3_RELEASE'] in /builds/hg-shared/users/asasaki_mozilla.com/buildbot-configs
18:51:08    ERROR -  abort: unknown revision '46454e4e45435f385f3062335f52454c45415345'!
18:51:08    ERROR - Return code: 255


When I do this on disk on linux-ix-slave05, I get the same error.
When I clone without the -r FENNEC_8_0b3_RELEASE, it works.

I'm going to try removing the -r from the clone.
Ok, I think I need to update before I update -r.
My current guess is that hgtool was written to work with hg revisions, not tags, so if the tag is in .hgtags in a later revision, updating to the revision will break without updating to tip first.
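
A side note on the abort message in the log above: the 40-character "unknown revision" string is the tag name itself, hex-encoded. The snippet below only decodes it. Reading this as hg falling back to treating the 20-character name as a raw node id after failing to resolve it as a tag is an inference, not something confirmed in this bug.

```python
# Hex string copied from the "abort: unknown revision" line in the log.
unknown = "46454e4e45435f385f3062335f52454c45415345"

decoded = bytes.fromhex(unknown).decode("ascii")
# decoded is "FENNEC_8_0b3_RELEASE", which is exactly 20 characters long
```

Either way, it is consistent with the guess that the tag was not resolvable in the freshly cloned repo.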
Attached patch: mozharness patch 3 (obsolete)
Attachment #565453 - Attachment is obsolete: true
Attachment #565705 - Attachment is obsolete: true
Comment on attachment 565706 [details] [diff] [review]
mozharness patch 4 (patch 3 + production configs)

This works in a staging release build!!!
Testing an android m-c nightly build.

However, I realized while testing this that updating to tip and then updating to the revision is great for tags (releases) but seems pretty awful for when you know the revision (esp. Try).

There are no multilocale Try builds, so that's not currently an issue.
I think long term we should pass (revision, tag, branch) instead of just (revision, branch).
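
A minimal sketch of what separating tag and revision could look like, assuming an existing clone. update_commands is a hypothetical helper, not the landed patch, and the exact hg invocations are an assumption based on the "update to tip, then update to the tag" behaviour described in this bug.

```python
def update_commands(revision=None, tag=None):
    """Hypothetical helper: hg commands to run inside an existing clone."""
    if tag is not None:
        # Tags: update to tip first, then to the tag, as described above.
        return [
            ["hg", "update", "-C"],
            ["hg", "update", "-C", "-r", tag],
        ]
    if revision is not None:
        # Known revision (e.g. Try): go straight there, no extra update.
        return [["hg", "update", "-C", "-r", revision]]
    # Nothing specified: just update to tip.
    return [["hg", "update", "-C"]]
```

With this split, only tagged release builds pay for the extra update; a build that already knows its changeset runs a single command.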
Comment on attachment 565706 [details] [diff] [review]
mozharness patch 4 (patch 3 + production configs)

Passed android nightly as well!

This fixes pulling from tags, and slows down updating to a revision.
I think we should have a followup bug to separate revision and tag behavior for if/when we use this in Try, but this should get releases working properly again.
Attachment #565706 - Flags: review?(rail)
Comment on attachment 565706 [details] [diff] [review]
mozharness patch 4 (patch 3 + production configs)

Nick: If you feel up to reviewing this on your Monday, I can land on PDT Monday and run a preproduction release so we have better coverage before 8.0b3 go-to-build on Tuesday.  Otherwise, please clear the flag and I'll wait for Rail to come back EDT Tuesday.  Thanks either way!
Attachment #565706 - Flags: review?(nrthomas)
Comment on attachment 565706 [details] [diff] [review]
mozharness patch 4 (patch 3 + production configs)

Sorry, I don't know this code well enough.
Attachment #565706 - Flags: review?(nrthomas)
Comment on attachment 565706 [details] [diff] [review]
mozharness patch 4 (patch 3 + production configs)

The code itself looks good, but I found l10n_repos name ambiguous. Maybe something like required_repos would be more generic and easier to understand?
Attachment #565706 - Flags: review?(rail) → review+
Comment on attachment 565706 [details] [diff] [review]
mozharness patch 4 (patch 3 + production configs)

http://hg.mozilla.org/build/mozharness/rev/e76fdabbca55

As mentioned in IRC: I'm fine with renaming, but it's named "l10n_repos" because those are pulled during the pull-locale-source action.  Rail didn't push too strongly for a rename so I didn't, but if this is confusing to anyone else, I'm happy to rename.
Attachment #565706 - Flags: checked-in+
Comment on attachment 565350 [details] [diff] [review]
buildbot-configs: stop pulling build source in mozharness

http://hg.mozilla.org/build/buildbot-configs/rev/21d55654c9a3

I'm going to try a preprod release now, yay!
Attachment #565350 - Flags: checked-in+
Can't due to bug 693390, boo.
I have done pretty thorough testing on my end, however, and 8.0b3 will be the deciding factor on what we do going forward.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 693392
Comment on attachment 565706 [details] [diff] [review]
mozharness patch 4 (patch 3 + production configs)

Landed on production and reconfig'd today.
Product: mozilla.org → Release Engineering
Component: Other → Mozharness