create generic RETRY signifier, and make retry.py print it when it fails to successfully run something

RESOLVED WONTFIX

Status

P3
normal
RESOLVED WONTFIX
7 years ago
5 months ago

People

(Reporter: Ehsan, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [retry])

Attachments

(2 attachments, 2 obsolete attachments)


Description

7 years ago
I got this red try build job yesterday: <https://tbpl.mozilla.org/php/getParsedLog.php?id=6456569&tree=Try&full=1>.  It seems like it just failed to download the mozconfig file, but instead of getting restarted (like this job <https://tbpl.mozilla.org/php/getParsedLog.php?id=6457539&tree=Try&full=1> which failed during hg clone), it just went red: <https://tbpl.mozilla.org/?tree=Try&rev=bb26cf539306>.  (I just restarted it manually).  I believe this should be marked as an infra exception, right?
Yeah, we should go purple and retry on the 502 errors.
Assignee: nobody → bhearsum
Hmmm, this is trickier than I thought. Because the mozconfig download is run through retry.py, we _could_ have a run which hits a 502 error on one attempt but still completes successfully on a later one. If we catch this specific 502 error, those successful runs will also get marked as purple and retried.

Maybe we can create a magic string that Buildbot catches and turns purple for. Eg, retry.py could print 'OMG BUILDBOT TURN PURPLE AND RETRY' if it never succeeds in running a command, and Buildbot could look for that instead of the gateway error. This would be re-usable in other places too.
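
A minimal sketch of that idea, not the actual retry.py: the function name `run_with_retries` and its parameters are hypothetical, and the marker text is simply the example string from this comment. The marker is printed only when every attempt has failed, so a run that hits a transient error and then succeeds never emits it.

```python
import subprocess
import sys
import time

# Example marker taken from the comment above; any unique string would do.
RETRY_MARKER = "OMG BUILDBOT TURN PURPLE AND RETRY"


def run_with_retries(cmd, attempts=3, sleep_seconds=0.0, out=None):
    """Run cmd up to `attempts` times (hypothetical helper, not retry.py's API).

    Returns 0 on the first successful attempt. If no attempt succeeds,
    prints RETRY_MARKER once and returns 1.
    """
    out = sys.stdout if out is None else out
    for attempt in range(1, attempts + 1):
        returncode = subprocess.call(cmd)
        if returncode == 0:
            return 0
        print("attempt %d of %d failed (exit code %d)"
              % (attempt, attempts, returncode), file=out)
        if attempt < attempts:
            time.sleep(sleep_seconds)
    # Only reached when the command never succeeded.
    print(RETRY_MARKER, file=out)
    return 1
```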
Got quorum on this over IRC; retitling the bug to something more generic.
Summary: Build job which failed during downloading of mozconfigs did not restart automatically → create generic RETRY signifier, and make retry.py print it when it fails to successfully run something

Updated

7 years ago
Blocks: 688217
Created attachment 561553 [details] [diff] [review]
buildbotcustom patch
Attachment #561553 - Flags: feedback?(bear)
Created attachment 561554 [details] [diff] [review]
retry.py patch
Attachment #561554 - Flags: feedback?(catlee)
Attachment #561554 - Flags: feedback?(bear)
Attachment #561553 - Flags: feedback?(catlee)

Comment 6

7 years ago
Comment on attachment 561553 [details] [diff] [review]
buildbotcustom patch

love the comment - really helpful
Attachment #561553 - Flags: feedback?(bear) → feedback+

Updated

7 years ago
Attachment #561554 - Flags: feedback?(bear) → feedback+
Comment on attachment 561553 [details] [diff] [review]
buildbotcustom patch

Catlee pointed out that we already use Automation Error: in some places, so I'm going to switch to that.
Attachment #561553 - Attachment is obsolete: true
Attachment #561553 - Flags: feedback?(catlee)
Attachment #561554 - Attachment is obsolete: true
Attachment #561554 - Flags: feedback?(catlee)
Created attachment 564971 [details] [diff] [review]
buildbotcustom patch
Attachment #564971 - Flags: review?(catlee)
Created attachment 564972 [details] [diff] [review]
retry.py patch

I did a quick test of this in staging by forcing the maybe-download-mozconfig-step to fail, and it worked as expected.
Attachment #564972 - Flags: review?(catlee)

Updated

7 years ago
Attachment #564971 - Flags: review?(catlee) → review+

Updated

7 years ago
Attachment #564972 - Flags: review?(catlee) → review+
Comment on attachment 564971 [details] [diff] [review]
buildbotcustom patch

Landed on the default branch. The next scheduled reconfig (possibly tomorrow, Tuesday at the latest) will pick this up. I also landed the retry.py patch, which will take effect immediately, meaning that "Automation Error:" will appear in the logs when something run through retry.py fails.
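
For illustration only, the log-side check could look roughly like the sketch below. This is not the buildbotcustom patch: `classify_step` and the three result labels are hypothetical stand-ins rather than Buildbot's own constants, and treating the marker as significant only when the step also failed is an assumption of this sketch.

```python
# Hypothetical result labels, not Buildbot's real status constants.
SUCCESS, FAILURE, RETRY = "success", "failure", "retry"


def classify_step(exit_code, log_text, marker="Automation Error:"):
    """Map a finished step to a result (hypothetical helper).

    A step that exits 0 is a success even if its log mentions a transient
    error such as a 502. A failed step whose log contains the marker is
    flagged for retry; any other failure stays a plain failure.
    """
    if exit_code == 0:
        return SUCCESS
    if marker in log_text:
        return RETRY
    return FAILURE
```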
Attachment #564971 - Flags: checked-in+
This made it to production today.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED

Updated

7 years ago
Blocks: 694809
This caused the newly filed bug 694809.
Attachment #564972 - Flags: checked-in+ → checked-in-
Comment on attachment 564971 [details] [diff] [review]
buildbotcustom patch

Backed out because of infinite retry loops, like bug 694809.
Attachment #564971 - Flags: checked-in+ → checked-in-
Assignee: bhearsum → nobody
Status: RESOLVED → REOPENED
Priority: -- → P3
Resolution: FIXED → ---
Component: Release Engineering → Release Engineering: Automation
OS: Linux → All
Priority: P3 → --
QA Contact: release → catlee
Hardware: x86_64 → All
Whiteboard: [retry]

Updated

7 years ago
Priority: -- → P3

Updated

5 years ago
Product: mozilla.org → Release Engineering
last touched in 2011
Status: REOPENED → RESOLVED
Last Resolved: 7 years ago → 3 years ago
Resolution: --- → WONTFIX

Updated

5 months ago
Component: General Automation → General
Product: Release Engineering → Release Engineering