docker image jobs no longer retry on failure

RESOLVED FIXED

Status

Release Engineering
Release Automation
P1
normal
RESOLVED FIXED
2 months ago
a month ago

People

(Reporter: nthomas, Assigned: rail)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [releaseduty])

Attachments

(1 attachment)

(Reporter)

Description

2 months ago
Since we moved to the Queue API (bug 1259627) we don't retry the flaky docker image generation (bug 1367491). 

In announcing the move rail said:
There is one thing to watch out for. The old API supports `reruns`, which helped us rerun failed tasks automatically. I tried to work around the lack of this feature in the Queue API by using docker-worker's `onExitStatus`, but it may behave a bit differently.
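Here is a minimal sketch of the retry-related fields in an `onExitStatus`-based setup. It assumes docker-worker's `payload.onExitStatus.retry` list of exit codes and the Queue's per-task `retries` count. As we understand it, a run that exits with a listed code is resolved as an exception (`intermittent-task`), and the Queue then schedules a new run while retries remain. The helper name `retry_fields` and the default of 5 are illustrative and not taken from releasetasks. 255 is the code that the patch in comment 2 adds.

```python
import json

# Hypothetical helper (not from releasetasks): builds only the
# retry-related fields of a Taskcluster task definition for docker-worker.
def retry_fields(retry_exit_codes, max_retries=5):
    return {
        # Queue-level cap on how many extra runs may be created;
        # 5 is an illustrative value, not a releasetasks setting.
        "retries": max_retries,
        "payload": {
            # docker-worker: a run exiting with one of these codes is
            # resolved as an exception so that the Queue can retry it.
            "onExitStatus": {"retry": list(retry_exit_codes)},
        },
    }

print(json.dumps(retry_fields([255]), indent=2))
```

Unlike the old `reruns`, which retried on any failure, this approach retries only the exit codes that are explicitly listed. An unlisted code still fails the task.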
Priority: -- → P1
Whiteboard: [releaseduty]
(Assignee)

Comment 1

2 months ago
Example log: https://public-artifacts.taskcluster.net/QeN83956QuuFuZlzQGyq1w/0/public/logs/live_backing.log
(Assignee)

Updated

2 months ago
Assignee: nobody → rail
(Assignee)

Comment 2

2 months ago
Created attachment 8902131 [details] [review]
retry on 255
Attachment #8902131 - Flags: review?(mtabara)
(In reply to Rail Aliiev [:rail] ⌚️UTC+3 from comment #2)
> Created attachment 8902131 [details] [review]
> retry on 255

Replied in the PR with requested changes. Will change the flags here as well once we've merged, to avoid re-setting the review flag.
Attachment #8902131 - Flags: review?(mtabara) → review+
(Assignee)

Comment 4

2 months ago
Comment on attachment 8902131 [details] [review]
retry on 255

Deployed
Attachment #8902131 - Flags: checked-in+
(Assignee)

Comment 5

2 months ago
56.0b8 isn't helping us verify this: there were no failures! :)
(Reporter)

Comment 6

2 months ago
Strongly suspect https://hg.mozilla.org/mozilla-central/rev/84fd52d2832a#l4.14 is the reason for that, and fixes bug 1367491.
(Reporter)

Comment 7

2 months ago
That hasn't been uplifted to beta though, so maybe something else/coincidence.
(Reporter)

Comment 8

2 months ago
We got retries for hg errors in tasks 0, 1, and 2 in https://tools.taskcluster.net/groups/PFi2U7q2SCWNvW-ud7TkWw/tasks/csCjgMfVQxqOyBV6aKAB3w/details. Then it failed in task 3 on a clamav error, which produced an exit status of -1.
(Assignee)

Comment 9

a month ago
Added -1 to the list in https://github.com/mozilla-releng/releasetasks/pull/276 and deployed
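To show why the task in comment 8 still failed and what adding -1 changes, here is a toy model that replays that run sequence against the retry list before and after the change. It is hypothetical and is not docker-worker's code. It assumes runs 0 to 2 exited with 255, which is the code retried since the first patch. The comment only says those runs were hg errors that got retried.

```python
# Toy model (hypothetical, not docker-worker's implementation) of how an
# onExitStatus.retry list decides the outcome of each run.
def run_outcome(exit_status, retry_codes):
    if exit_status == 0:
        return "completed"
    if exit_status in retry_codes:
        # Resolved as an exception, so the Queue may schedule another run.
        return "retry"
    return "failed"

def replay(exit_statuses, retry_codes):
    """Walk runs in order, stopping at the first run that is not retried."""
    outcomes = []
    for run_id, status in enumerate(exit_statuses):
        outcome = run_outcome(status, retry_codes)
        outcomes.append((run_id, status, outcome))
        if outcome != "retry":
            break
    return outcomes

# Assumed exit statuses: three hg errors (255), then the clamav error (-1).
runs = [255, 255, 255, -1]
print(replay(runs, [255]))      # before: run 3 ends as "failed"
print(replay(runs, [255, -1]))  # after: run 3 is retried as well
```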
(Reporter)

Updated

a month ago
See Also: → bug 1398964
(Assignee)

Comment 10

a month ago
Closing this. Bug 1398964 is a nice-to-have.
Status: NEW → RESOLVED
Last Resolved: a month ago
Resolution: --- → FIXED