Closed Bug 774862 Opened 12 years ago Closed 8 years ago

should fix our polling code to handle the case where hg.mozilla.org resets

Categories

(Release Engineering :: General, defect, P3)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1289514

People

(Reporter: kmoir, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [hg])

Attachments

(1 file)

For example, in bug 774799, the following was logged:

/builds/buildbot/build_scheduler/master/twistd.log.5:2012-07-17 11:37:20-0700 [HTTPPageGetter,client] http://hg.mozilla.org/try has been reset

This caused duplicate try builds, which is a waste of our machine capacity.
Not a release-specific problem, moving to Automation (General).
Component: Release Engineering: Automation (Release Automation) → Release Engineering: Automation (General)
QA Contact: bhearsum → catlee
Becoming more urgent since try seems to respond with 500 errors pretty regularly these days.
Severity: normal → major
Whiteboard: [hg]
See also bug 770811
Our polling code checks specifically for "unknown revision" in the error message here, so not just any Internal Server Error (ISE) will trigger this.
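To illustrate the distinction, here is a minimal sketch of such a check. It is not the actual poller code; the function name and its arguments are hypothetical, and it only assumes that the poller has the HTTP status code and response body available.

```python
def looks_like_repo_reset(status_code, body):
    """Return True only for a 500 whose body mentions an unknown revision.

    Hypothetical helper for illustration; the real poller may be
    structured differently.
    """
    return status_code == 500 and "unknown revision" in body
```

With a check like this, a plain 500 with an unrelated body would not be treated as a reset.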
This seems to be happening on non-Try too:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=3b72e384d1bf
Adding the [buildduty] flag since buildduty is going to care a lot about the doubling of build load due to this.
Whiteboard: [hg] → [hg][buildduty]
Blocks: 774799
Let's disable the resetting code for now. Repo resets are pretty infrequent, so we can do them manually until this bug gets fixed for real.
Attachment #644011 - Flags: review?(bhearsum)
Attachment #644011 - Flags: review?(bhearsum) → review+
Attachment #644011 - Flags: checked-in+
We had another batch of double builds on try last night, e.g.:
https://tbpl.mozilla.org/?tree=Try&rev=eaf06d352463

Was this fix in place at that point? Looking at the timeframes I would have thought so, but I may be messing up the timezones or something :-(
This was checked in, but not yet deployed. I'll do that this morning.
In production
If we care to revisit this, we need to re-enable the commented-out code and make it more resilient to intermittent 500 errors from hg.mozilla.org.

We're in good shape for now. The next time a project branch or try gets reset, we'll need to reset the pollers manually by restarting the build scheduler master.
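For the "more resilient" part, one possible approach (a sketch only, not what the poller does today) would be to act on a reset only after several consecutive polls report it, so a single intermittent error does not reset the poller. The class name, method name, and the threshold of 3 are all assumptions made for illustration.

```python
class ResetDetector:
    """Act on a repo reset only after several consecutive reset signals.

    Hypothetical sketch; the default threshold of 3 is an arbitrary
    assumption, not a value taken from the poller.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive = 0

    def observe(self, reset_signal):
        """Record one poll result; return True when a reset should be acted on."""
        if reset_signal:
            self.consecutive += 1
        else:
            # Any normal poll clears the streak.
            self.consecutive = 0
        if self.consecutive >= self.threshold:
            self.consecutive = 0
            return True
        return False
```

The trade-off is that a genuine reset is picked up a few polling intervals later than it would be with an immediate reset.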
Severity: major → normal
Priority: -- → P3
Whiteboard: [hg][buildduty] → [hg]
Bug 833555 is an example of where it would be good to have this code back, since none of us remembered to restart the build scheduler for bug 816300.
Updated the note at https://wiki.mozilla.org/ReleaseEngineering/DisposableProjectBranches#Book_one_of_our_fabulous_.22disposable.22_project_branches to help when builds don't start (when the poller is still looking for changes since an old revision).
Product: mozilla.org → Release Engineering
Please reconfigure the build scheduler. I've made the first push to the Gum twig after a reset and I don't see any builds starting.
Please ignore comment 15, the second push started building just fine.
This is doable now, with bug 1114843 / bug 1065771 - we don't need to rely on the HTTP response code.
This also overlaps with bug 1104374, since the ideal solution here likely relies on switching to push IDs rather than changesets.

See also:
https://mozilla-version-control-tools.readthedocs.org/en/latest/hgmo/pushlog.html#writing-agents-that-consume-pushlog-data
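A minimal sketch of what push-ID-based reset detection could look like, assuming the poller stores the last push ID it has seen and that the parsed pushlog response exposes the server's newest push ID in a "lastpushid" field (check the pushlog documentation linked above for the exact response format). The function name and arguments are hypothetical.

```python
def repo_was_reset(last_seen_push_id, pushlog_response):
    """Return True if the server's newest push ID is lower than ours.

    pushlog_response is assumed to be the already-parsed JSON from a
    pushlog request, containing a "lastpushid" integer. Push IDs only
    grow on a live repo, so a lower value suggests the repo was
    recreated.
    """
    return pushlog_response["lastpushid"] < last_seen_push_id
```

This avoids relying on the HTTP response code: a 500 never produces a parsed response, so it cannot be mistaken for a reset.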
Depends on: 1114843, 1065771
See Also: → 1104374
I'm fixing this as part of scope bloat in bug 1289514.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Component: General Automation → General