should fix our polling code to handle the case where hg.mozilla.org resets

RESOLVED DUPLICATE of bug 1289514

Status

Product: Release Engineering
Component: General Automation
Priority: P3
Severity: normal
Status: RESOLVED DUPLICATE of bug 1289514
Opened: 6 years ago
Updated: a year ago

People

(Reporter: kmoir, Unassigned)

Tracking

(Depends on: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [hg])

Attachments

(1 attachment)


Description

6 years ago
For example, in bug 774799 the following was logged:

/builds/buildbot/build_scheduler/master/twistd.log.5:2012-07-17 11:37:20-0700 [HTTPPageGetter,client] http://hg.mozilla.org/try has been reset

This caused duplicate try builds, which is a waste of our machine capacity.
Not a release-specific problem, moving to Automation (General).
Component: Release Engineering: Automation (Release Automation) → Release Engineering: Automation (General)
QA Contact: bhearsum → catlee
Becoming more urgent since try seems to respond with 500 errors pretty regularly these days.
Severity: normal → major
Whiteboard: [hg]
See also bug 770811
Our polling code checks specifically for "unknown revision" in the error message here, so not just any ISE (internal server error) will trigger this.
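
A minimal sketch of the kind of check described here, not the actual poller code: the function name and its arguments are hypothetical, and it assumes the poller has both the HTTP status and the error body available. A reset is suspected only when a failed poll carries the "unknown revision" text.

```python
def looks_like_repo_reset(status_code, body):
    """Return True only for a server error whose body mentions "unknown revision".

    Hypothetical helper: the real poller's names and inputs may differ.
    """
    # A bare 500 (or any other failure) is not enough on its own.
    return status_code >= 500 and "unknown revision" in body
```

With a check like this, a plain 500 response is ignored, while a 500 whose body contains "unknown revision" is still treated as a possible reset, which is why an intermittent error of that form can trigger a false reset.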
This seems to be happening on non-Try too:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=3b72e384d1bf

Updated

6 years ago
Blocks: 741688

Comment 6

6 years ago
Adding the [buildduty] flag since buildduty is going to care a lot about the doubling of build load due to this.
Whiteboard: [hg] → [hg][buildduty]

Updated

6 years ago
Blocks: 774799
Created attachment 644011 [details] [diff] [review]
disable resetting

Let's disable the resetting code for now. Repo resets are pretty infrequent, so we can do them manually until this bug gets fixed for real.
Attachment #644011 - Flags: review?(bhearsum)
Attachment #644011 - Flags: review?(bhearsum) → review+

Updated

6 years ago
Attachment #644011 - Flags: checked-in+
We had another batch of double builds on try last night, e.g.:
https://tbpl.mozilla.org/?tree=Try&rev=eaf06d352463

Was this fix in place at that point? Looking at the timeframes I would have thought so, but I may be messing up the timezones or something :-(
This was checked in, but not yet deployed. I'll do that this morning.
This is now in production.
If we care to re-visit this, we need to re-enable the commented-out code and make it more resilient to intermittent 500 errors from hg.mozilla.org.

We're in good shape for now. The next time a project branch or try gets reset, we'll need to reset the pollers manually by restarting the build scheduler master.
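
One possible way to make re-enabled reset handling tolerate intermittent errors is to act only after several consecutive polls look like a reset. This is a sketch under that assumption, not the fix that was deployed; the class name, method name, and default threshold are hypothetical.

```python
class ResetDetector:
    """Hypothetical helper: act on a reset only after N consecutive signals."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # consecutive reset-looking polls required
        self.consecutive = 0

    def observe(self, looks_like_reset):
        """Record one poll result; return True when the poller should reset."""
        if looks_like_reset:
            self.consecutive += 1
        else:
            # Any normal poll clears the streak, so a one-off error is ignored.
            self.consecutive = 0
        if self.consecutive >= self.threshold:
            self.consecutive = 0
            return True
        return False
```

A single bad response from hg.mozilla.org then never resets the poller; only a sustained run of reset-looking responses does, at the cost of reacting to a real reset a few poll intervals later.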
Severity: major → normal
Priority: -- → P3
Whiteboard: [hg][buildduty] → [hg]
Duplicate of this bug: 769016
Bug 833555 is an example of where it would be good to have this code back, since none of us remembered to restart the build scheduler for bug 816300.
Updated the note at https://wiki.mozilla.org/ReleaseEngineering/DisposableProjectBranches#Book_one_of_our_fabulous_.22disposable.22_project_branches to help when builds don't start (when the poller is still looking for changes since an old revision).

Updated

4 years ago
Product: mozilla.org → Release Engineering
Please reconfigure the build scheduler. I've made the first push to the Gum twig after a reset and I don't see any builds starting.
Please ignore comment 15, the second push started building just fine.
This is doable now with bug 1114843 and bug 1065771: we no longer need to rely on the HTTP response code.
This also overlaps with bug 1104374, since the ideal solution here likely relies on switching to push IDs rather than changesets.

See also:
https://mozilla-version-control-tools.readthedocs.org/en/latest/hgmo/pushlog.html#writing-agents-that-consume-pushlog-data
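
A rough sketch of the push-ID approach, assuming the poller has already fetched and parsed a pushlog response shaped like `{"lastpushid": <int>, "pushes": {"<push id>": {...}}}` (check the pushlog documentation linked above for the exact format). The function name and return values are hypothetical. The idea is that a server-side last push ID lower than the one the poller last saw indicates a reset, independent of any HTTP error.

```python
def classify_poll(last_seen_push_id, response):
    """Classify a parsed pushlog response against the last push ID we saw.

    Returns ("reset", []) if the repository appears to have been reset,
    otherwise ("ok", [new push IDs in ascending order]).
    """
    server_last = response["lastpushid"]
    if server_last < last_seen_push_id:
        # The server knows fewer pushes than we do: the repo was recreated.
        return "reset", []
    new_ids = sorted(
        int(push_id)
        for push_id in response["pushes"]
        if int(push_id) > last_seen_push_id
    )
    return "ok", new_ids
```

For example, a poller that last saw push 10 and receives `lastpushid` 3 would treat the repository as reset and start over from push 0, while a transient 500 never reaches this function at all.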
Depends on: 1114843, 1065771
See Also: → bug 1104374
I'm fixing this as part of scope bloat in bug 1289514.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1289514