Closed Bug 687832 Opened 13 years ago Closed 9 years ago

create generic RETRY signifier, and make retry.py print it when it fails to successfully run something

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ehsan.akhgari, Unassigned)

References

Details

(Whiteboard: [retry])

Attachments

(2 files, 2 obsolete files)

I got this red try build job yesterday: <https://tbpl.mozilla.org/php/getParsedLog.php?id=6456569&tree=Try&full=1>. It seems to have simply failed to download the mozconfig file, but instead of being restarted (like this job <https://tbpl.mozilla.org/php/getParsedLog.php?id=6457539&tree=Try&full=1>, which failed during hg clone), it just went red: <https://tbpl.mozilla.org/?tree=Try&rev=bb26cf539306>. (I restarted it manually.) I believe this should be marked as an infra exception, right?
Yeah, we should go purple and retry on the 502 errors.
Assignee: nobody → bhearsum
Hmmm, this is trickier than I thought. Because the mozconfig download is run through retry.py, we _could_ have a run that hits a 502 error on one attempt but still completes successfully on a later one. If we catch this specific 502 error, those successful runs would also get marked purple and retried.

Maybe we can create a magic string that Buildbot looks for and turns the job purple when it finds it. E.g., retry.py could print 'OMG BUILDBOT TURN PURPLE AND RETRY' if it never succeeds in running a command, and Buildbot could look for that instead of the gateway error. This would be reusable in other places too.
Got quorum on this over IRC; retitling the bug to something more generic.
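A minimal sketch of the retry.py side of this idea, not the attached patch: the function name `run_with_retries`, its parameters, and the `RETRY_MARKER` constant are hypothetical, and the marker text is just the placeholder proposed in the comment above. The point it illustrates is that the marker is printed only after every attempt has failed, so a run that recovers on a later attempt never emits it.

```python
import subprocess
import sys
import time

# Hypothetical constant holding the placeholder string from the comment above.
RETRY_MARKER = "OMG BUILDBOT TURN PURPLE AND RETRY"


def run_with_retries(cmd, attempts=3, sleep_seconds=1.0, out=sys.stdout):
    """Run cmd up to `attempts` times.

    Returns 0 as soon as one attempt succeeds. If every attempt fails,
    prints RETRY_MARKER to `out` and returns 1.
    """
    for attempt in range(1, attempts + 1):
        if subprocess.call(cmd) == 0:
            return 0
        if attempt < attempts:
            # Pause before the next attempt; the last failure falls through.
            time.sleep(sleep_seconds)
    # Only reached when no attempt succeeded.
    print(RETRY_MARKER, file=out)
    return 1
```

With this shape, a transient 502 followed by a successful attempt leaves no marker in the log, which is the case the direct 502 check could not distinguish.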
Summary: Build job which failed during downloading of mozconfigs did not restart automatically → create generic RETRY signifier, and make retry.py print it when it fails to successfully run something
Blocks: 688217
Attached patch buildbotcustom patch (obsolete) — Splinter Review
Attachment #561553 - Flags: feedback?(bear)
Attached patch retry.py patch (obsolete) — Splinter Review
Attachment #561554 - Flags: feedback?(catlee)
Attachment #561554 - Flags: feedback?(bear)
Attachment #561553 - Flags: feedback?(catlee)
Comment on attachment 561553 [details] [diff] [review]
buildbotcustom patch

love the comment - really helpful
Attachment #561553 - Flags: feedback?(bear) → feedback+
Attachment #561554 - Flags: feedback?(bear) → feedback+
Comment on attachment 561553 [details] [diff] [review]
buildbotcustom patch

Catlee pointed out that we already use Automation Error: in some places, so I'm going to switch to that.
Attachment #561553 - Attachment is obsolete: true
Attachment #561553 - Flags: feedback?(catlee)
Attachment #561554 - Attachment is obsolete: true
Attachment #561554 - Flags: feedback?(catlee)
Attachment #564971 - Flags: review?(catlee)
Attached patch retry.py patch — Splinter Review
I did a quick test of this in staging by forcing the maybe-download-mozconfig-step to fail, and it worked as expected.
Attachment #564972 - Flags: review?(catlee)
Attachment #564971 - Flags: review?(catlee) → review+
Attachment #564972 - Flags: review?(catlee) → review+
Comment on attachment 564971 [details] [diff] [review]
buildbotcustom patch

Landed on the default branch. The next scheduled reconfig (possibly tomorrow, Tuesday at the latest) will pick this up. I also landed the retry.py patch, which will take effect immediately, meaning that "Automation Error:" will appear in the logs when something run through retry.py fails.
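For illustration only, the log-scanning half might look roughly like the sketch below. It is not the buildbotcustom patch: `Result`, `classify_step`, and the rule of checking the exit code before the log are all assumptions, written in plain Python with no Buildbot APIs. Only the "Automation Error:" string comes from this bug.

```python
from enum import Enum

# The marker string named in this bug.
AUTOMATION_ERROR = "Automation Error:"


class Result(Enum):
    """Hypothetical step outcomes, named after the colours used in this bug."""
    SUCCESS = "green"
    FAILURE = "red"
    RETRY = "purple"


def classify_step(returncode, log_text):
    """Map a finished step to an outcome (assumed rule, not the landed one)."""
    if returncode == 0:
        # A step that eventually succeeded stays green, whatever it logged.
        return Result.SUCCESS
    if AUTOMATION_ERROR in log_text:
        # retry.py gave up: treat as an infrastructure problem and retry.
        return Result.RETRY
    return Result.FAILURE
```

Under these assumptions, a failed step whose log contains "Automation Error:" maps to `Result.RETRY`, while an ordinary failure stays `Result.FAILURE`.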
Attachment #564971 - Flags: checked-in+
Attachment #564972 - Flags: checked-in+
This made it to production today.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 694809
This caused (newly filed) Bug 694809
Attachment #564972 - Flags: checked-in+ → checked-in-
Comment on attachment 564971 [details] [diff] [review]
buildbotcustom patch

Backed out because of infinite retry loops, like bug 694809.
Attachment #564971 - Flags: checked-in+ → checked-in-
Assignee: bhearsum → nobody
Status: RESOLVED → REOPENED
Priority: -- → P3
Resolution: FIXED → ---
Component: Release Engineering → Release Engineering: Automation
OS: Linux → All
Priority: P3 → --
QA Contact: release → catlee
Hardware: x86_64 → All
Whiteboard: [retry]
Priority: -- → P3
Product: mozilla.org → Release Engineering
Last touched in 2011.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 9 years ago
Resolution: --- → WONTFIX
Component: General Automation → General