Closed Bug 685124 Opened 13 years ago Closed 13 years ago

hgtool needs to recover from being killed

Categories

(Release Engineering :: General, defect, P2)

x86_64
Linux

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: catlee)

Attachments

(1 file, 1 obsolete file)

If hgtool is killed because it's taking too long, .hg/hgrc doesn't get written, which means there is no [paths] section, which breaks buildsymbols.

hgtool needs to detect this case and recover from it.
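
For illustration only, here is a minimal sketch of what the detection could look like. It is not the attached patch: the helper name `get_default_path` is hypothetical, and it assumes the hgrc is simple enough for Python's `configparser` to read (Mercurial's own config parser is more permissive).

```python
import configparser
import os


def get_default_path(repo_dir):
    """Return the [paths] default entry from repo_dir/.hg/hgrc, or None.

    Hypothetical helper, not part of hgtool. None means the hgrc is
    missing, unreadable, or has no [paths] default, which is the state
    a killed clone is described as leaving behind.
    """
    hgrc = os.path.join(repo_dir, ".hg", "hgrc")
    parser = configparser.RawConfigParser()
    try:
        # read() returns the list of files it managed to parse;
        # a missing file yields an empty list rather than an error.
        if not parser.read(hgrc):
            return None
    except configparser.Error:
        # Treat a truncated or malformed hgrc the same as a missing one.
        return None
    if not parser.has_option("paths", "default"):
        return None
    return parser.get("paths", "default")
```

A caller would compare the returned value with the repository URL it was asked to use and treat any mismatch (including None) as a repo that needs recovery.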
Attachment #558927 - Flags: review?(bhearsum)
The manual fix for these slaves (so far linux-ix-slave22, linux-ix-slave30, linux64-ix-slave06) is to drop this

[paths]
default = http://hg.mozilla.org/integration/mozilla-inbound

at /builds/hg-shared/integration/mozilla-inbound/.hg/hgrc
   /builds/slave/m-in-lnx/build/.hg/hgrc
(adjusting lnx vs lnx64, adding -dbg if necessary)

I found that fixing the first location then doing an hg up at the copy doesn't fix up hgrc at the copy, so the patch is probably incomplete.
Comment on attachment 558927
clobber shared repo if the default path isn't correct

(In reply to Nick Thomas [:nthomas] from comment #2)
> The manual fix for these slaves (so far linux-ix-slave22, linux-ix-slave30,
> linux64-ix-slave06), is to drop this 
> 
> [paths]
> default = http://hg.mozilla.org/integration/mozilla-inbound
> 
> at /builds/hg-shared/integration/mozilla-inbound/.hg/hgrc
>    /builds/slave/m-in-lnx/build/.hg/hgrc
> (adjusting lnx vs lnx64, adding -dbg if necessary)
> 
> I found that fixing the first location then doing an hg up at the copy
> doesn't fix up hgrc at the copy, so the patch is probably incomplete.

Hmmm, this confuses me - I'm not sure how we can get into a state where both the shared version and final destination are missing an hgrc. If hgtool gets killed while cloning the hg-shared directory, it won't have an hgrc, and the m-in-lnx one won't get created at all. If hgtool gets killed while sharing to the m-in-lnx one, the shared version will be fine, and the m-in-lnx one won't have an hgrc at all, but sharedpath will be set.

Catlee's patch addresses the former. I'm not sure we need to address hgrc in the latter, since subsequent updates should work fine because of sharedpath being set. Let me know if I've missed something here.

The patch looks fine to me, but r- because it needs tests!
Attachment #558927 - Flags: review?(bhearsum) → review-
Attached patch: yay for tests!
Attachment #558927 - Attachment is obsolete: true
Attachment #559221 - Flags: review?(bhearsum)
(In reply to Phil Ringnalda (:philor) from comment #6)
> New shooter, linux-ix-slave38 in
> https://tbpl.mozilla.org/php/getParsedLog.php?id=6333123&full=1

I just patched up this slave.
Comment on attachment 559221
yay for tests!

Review of attachment 559221:
-----------------------------------------------------------------

Looks fine to me, but Nick should have a look too, to make sure my statements in comment #3 are indeed correct.
Attachment #559221 - Flags: review?(nrthomas)
Attachment #559221 - Flags: review?(bhearsum)
Attachment #559221 - Flags: review+
(In reply to Ben Hearsum [:bhearsum] from comment #3)
> Hmmm, this confuses me - I'm not sure how we can get into a state where both
> the shared version and final destination are missing an hgrc. If hgtool gets
> killed while cloning the hg-shared directory, it won't have an hgrc, and the
> m-in-lnx one won't get created at all. 

Agreed. The issue is what happens on the next build. Take linux-ix-slave22 on Wednesday, where the initial clone times out:

Updating shared repo
command: START
command: hg clone -U -r 8a9c10ebbf00795f005b3612fe472f5bd2bf3382 http://hg.mozilla.org/integration/mozilla-inbound /builds/hg-shared/integration/mozilla-inbound
command: cwd: /builds/slave/m-in-lnx
command: output:

command: output:

command timed out: 3600 seconds without output, attempting to kill
process killed by signal 9
--------

Then the next build pulls rather than clobbering & cloning:

Updating shared repo
command: START
command: hg pull -r 8a9c10ebbf00795f005b3612fe472f5bd2bf3382 http://hg.mozilla.org/integration/mozilla-inbound
command: cwd: /builds/hg-shared/integration/mozilla-inbound
command: output:
pulling from http://hg.mozilla.org/integration/mozilla-inbound
requesting all changes
adding changesets
adding manifests
adding file changes
added 76311 changesets with 360561 changes to 79556 files
(run 'hg update' to get a working copy)
command: END (325.95s elapsed)
----

Buildbot used a kill -9 for the timeout, so I suspect that abruptly leaves the share in a state with some/most of the history but no hgrc. Then the 2nd build fixes the history but not the hgrc, and we set up our copy from that busted state.
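
To make the sequence concrete, here is a rough sketch of the decision the second build would need to make instead of pulling unconditionally. The names `choose_action` and `clobber` are hypothetical and this is not the code from the attached patch. It assumes the caller has already read the [paths] default from the shared repo's hgrc (None if absent) and that an exact string comparison is good enough.

```python
import os
import shutil


def choose_action(shared_dir, default_path, expected_path):
    """Pick what to do with the shared repo before a build.

    Hypothetical helper. default_path is whatever [paths] default the
    shared repo's hgrc currently holds (None if there is no hgrc or no
    entry); expected_path is the repo URL the build was asked to use.
    """
    if not os.path.isdir(os.path.join(shared_dir, ".hg")):
        # Nothing usable on disk yet: a fresh clone is needed.
        return "clone"
    if default_path != expected_path:
        # History may be present, but hgrc was never written (or points
        # elsewhere), e.g. after the clone was killed with signal 9.
        return "clobber-and-clone"
    return "pull"


def clobber(shared_dir):
    """Remove the shared repo so the next step can clone it from scratch."""
    shutil.rmtree(shared_dir, ignore_errors=True)
```

With something like this in place, the second log above would have taken the "clobber-and-clone" branch rather than running hg pull against a share with no hgrc.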
Attachment #559221 - Flags: review?(nrthomas) → review+
linux-ix-slave25 is fixed.
Comment on attachment 559221
yay for tests!

hg.mozilla.org/build/tools/rev/6fd5d6755de5 (doesn't require a reconfig)
Attachment #559221 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 691467
Product: mozilla.org → Release Engineering