I got this red try build job yesterday: <https://tbpl.mozilla.org/php/getParsedLog.php?id=6456569&tree=Try&full=1>. It seems to have simply failed to download the mozconfig file, but instead of being restarted automatically (like this job <https://tbpl.mozilla.org/php/getParsedLog.php?id=6457539&tree=Try&full=1>, which failed during hg clone), it just went red: <https://tbpl.mozilla.org/?tree=Try&rev=bb26cf539306>. (I restarted it manually.) I believe this should be marked as an infra exception, right?
Yeah, we should go purple and retry on the 502 errors.
Assignee: nobody → bhearsum
Hmmm, this is trickier than I thought. Because the mozconfig download is run through retry.py, we _could_ have a run which has a 502 error but still completes successfully. If we catch this specific 502 error, those runs will get marked as purple and retry. Maybe we can create a magic string that Buildbot catches and turns purple for. Eg, retry.py could print 'OMG BUILDBOT TURN PURPLE AND RETRY' if it never succeeds in running a command, and Buildbot could look for that instead of the gateway error. This would be re-usable in other places too.
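A minimal sketch of that idea, not the actual retry.py patch: a wrapper that retries a command and emits the marker only when every attempt has failed, so a run that hits a 502 once and then recovers never triggers a Buildbot retry. The function name `run_with_retries`, its parameters, and the `emit` hook are hypothetical; the marker text is the example string from this comment.

```python
import subprocess
import sys
import time

# Example marker from the comment above (hypothetical; not a real Buildbot string).
RETRY_MARKER = "OMG BUILDBOT TURN PURPLE AND RETRY"


def run_with_retries(cmd, attempts=3, sleep_seconds=0.0, emit=print):
    """Run cmd up to `attempts` times.

    Returns 0 as soon as one attempt succeeds. If every attempt fails,
    emits RETRY_MARKER once and returns 1.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            # A success after earlier failures must NOT print the marker.
            return 0
        emit("attempt %d/%d failed with exit code %d"
             % (attempt, attempts, result.returncode))
        if attempt < attempts:
            time.sleep(sleep_seconds)
    # Only reached when no attempt succeeded.
    emit(RETRY_MARKER)
    return 1


if __name__ == "__main__":
    # Usage: python this_file.py <command> [args...]
    sys.exit(run_with_retries(sys.argv[1:]))
```

Keying the retry on a marker that the wrapper prints itself, rather than on the gateway error text, is what makes the signal reusable for other commands run through the same wrapper.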
Got quorum on this over IRC; retitling the bug to something more generic.
Summary: Build job which failed during downloading of mozconfigs did not restart automatically → create generic RETRY signifier, and make retry.py print it when it fails to successfully run something
Created attachment 561553 [details] [diff] [review] buildbotcustom patch
Attachment #561553 - Flags: feedback?(bear)
Created attachment 561554 [details] [diff] [review] retry.py patch
Comment on attachment 561553 [details] [diff] [review] buildbotcustom patch love the comment - really helpful
Attachment #561553 - Flags: feedback?(bear) → feedback+
Comment on attachment 561553 [details] [diff] [review] buildbotcustom patch Catlee pointed out that we already use Automation Error: in some places, so I'm going to switch to that.
Created attachment 564971 [details] [diff] [review] buildbotcustom patch
Attachment #564971 - Flags: review?(catlee)
Created attachment 564972 [details] [diff] [review] retry.py patch I did a quick test of this in staging by forcing the maybe-download-mozconfig-step to fail, and it worked as expected.
Attachment #564972 - Flags: review?(catlee)
Comment on attachment 564971 [details] [diff] [review] buildbotcustom patch Landed on the default branch. The next scheduled reconfig (possibly tomorrow, Tuesday at the latest) will pick this up. I also landed the retry.py patch, which will take effect immediately, meaning that "Automation Error:" will appear in the logs when something run through retry.py fails.
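For illustration only, the log-side check could look roughly like the sketch below. It is not the buildbotcustom patch: the status names and the `classify_step` helper are hypothetical, and the only detail taken from this bug is the "Automation Error:" prefix. It assumes the prefix appears at the start of a log line.

```python
import re

# Illustrative status names; Buildbot defines its own result constants.
SUCCESS, FAILURE, RETRY = "success", "failure", "retry"

# Assumes the marker starts a log line.
AUTOMATION_ERROR = re.compile(r"^Automation Error:", re.MULTILINE)


def classify_step(exit_code, log_text):
    """Map a finished step to a status based on its exit code and log."""
    if exit_code == 0:
        # Transient errors that were retried successfully stay green.
        return SUCCESS
    if AUTOMATION_ERROR.search(log_text):
        # Infrastructure-style failure: go purple and retry.
        return RETRY
    return FAILURE
```

Note that this sketch places no limit on how many times the same job may be classified as RETRY.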
Attachment #564971 - Flags: checked-in+
This made it to production today.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
This caused (newly filed) Bug 694809
Attachment #564972 - Flags: checked-in+ → checked-in-
Comment on attachment 564971 [details] [diff] [review] buildbotcustom patch Backed out because of infinite retry loops, like bug 694809.
Attachment #564971 - Flags: checked-in+ → checked-in-
Assignee: bhearsum → nobody
Status: RESOLVED → REOPENED
Priority: -- → P3
Resolution: FIXED → ---
Component: Release Engineering → Release Engineering: Automation
OS: Linux → All
Priority: P3 → --
QA Contact: release → catlee
Hardware: x86_64 → All
Product: mozilla.org → Release Engineering
last touched in 2011
Status: REOPENED → RESOLVED
Last Resolved: 7 years ago → 3 years ago
Resolution: --- → WONTFIX
Component: General Automation → General
Product: Release Engineering → Release Engineering