The decision task of mozilla-release is busted: it times out on json-automationrelevance
Categories: Developer Services :: Mercurial: hg.mozilla.org (defect, P1)
Tracking: Not tracked
People: Reporter: jlorenzo; Assigned: sheehan
References: Regression
Attachments: 2 files
Today, I merged mozilla-beta (83) to mozilla-release. As usual, the number of changesets is quite large [1]. For an unknown reason, json-automationrelevance [2] takes too long to respond for the decision task, even after 10 retries with exponential backoff:
[task 2020-11-09T13:53:34.146Z] Querying version control for metadata: https://hg.mozilla.org/releases/mozilla-release/json-automationrelevance/fee723a73e4a12fbd179e05b54f1f2e5623c90c4
[task 2020-11-09T13:53:34.146Z] attempt 1/10
[task 2020-11-09T13:53:34.146Z] retry: calling get_automationrelevance, attempt #1
[task 2020-11-09T13:54:04.414Z] retry: Caught exception:
[task 2020-11-09T13:54:04.414Z] sleeping for 10.89s (attempt 1/10)
[task 2020-11-09T13:54:15.309Z] attempt 2/10
[task 2020-11-09T13:54:15.309Z] retry: calling get_automationrelevance, attempt #2
[task 2020-11-09T13:54:45.630Z] retry: Caught exception:
[task 2020-11-09T13:54:45.630Z] sleeping for 16.43s (attempt 2/10)
[task 2020-11-09T13:55:02.078Z] attempt 3/10
[task 2020-11-09T13:55:02.078Z] retry: calling get_automationrelevance, attempt #3
[task 2020-11-09T13:55:32.321Z] retry: Caught exception:
[task 2020-11-09T13:55:32.321Z] sleeping for 22.40s (attempt 3/10)
[task 2020-11-09T13:55:54.735Z] attempt 4/10
[task 2020-11-09T13:55:54.735Z] retry: calling get_automationrelevance, attempt #4
[task 2020-11-09T13:56:25.042Z] retry: Caught exception:
[task 2020-11-09T13:56:25.042Z] sleeping for 35.24s (attempt 4/10)
[task 2020-11-09T13:57:00.286Z] attempt 5/10
[task 2020-11-09T13:57:00.286Z] retry: calling get_automationrelevance, attempt #5
[task 2020-11-09T13:57:30.597Z] retry: Caught exception:
[task 2020-11-09T13:57:30.597Z] sleeping for 51.25s (attempt 5/10)
[task 2020-11-09T13:58:21.872Z] attempt 6/10
[task 2020-11-09T13:58:21.872Z] retry: calling get_automationrelevance, attempt #6
[task 2020-11-09T13:58:52.212Z] retry: Caught exception:
[task 2020-11-09T13:58:52.212Z] sleeping for 73.28s (attempt 6/10)
[task 2020-11-09T14:00:05.501Z] attempt 7/10
[task 2020-11-09T14:00:05.502Z] retry: calling get_automationrelevance, attempt #7
[task 2020-11-09T14:00:35.796Z] retry: Caught exception:
[task 2020-11-09T14:00:35.796Z] sleeping for 106.81s (attempt 7/10)
[task 2020-11-09T14:02:22.616Z] attempt 8/10
[task 2020-11-09T14:02:22.616Z] retry: calling get_automationrelevance, attempt #8
[task 2020-11-09T14:02:52.967Z] retry: Caught exception:
[task 2020-11-09T14:02:52.968Z] sleeping for 179.35s (attempt 8/10)
[task 2020-11-09T14:05:52.411Z] attempt 9/10
[task 2020-11-09T14:05:52.411Z] retry: calling get_automationrelevance, attempt #9
[task 2020-11-09T14:06:22.757Z] retry: Caught exception:
[task 2020-11-09T14:06:22.757Z] sleeping for 263.29s (attempt 9/10)
[task 2020-11-09T14:10:46.130Z] attempt 10/10
[task 2020-11-09T14:10:46.130Z] retry: calling get_automationrelevance, attempt #10
[task 2020-11-09T14:11:16.447Z] retry: Caught exception:
[task 2020-11-09T14:11:16.447Z] retry: Giving up on get_automationrelevance
[task 2020-11-09T14:11:16.447Z] Error loading tasks for kind test:
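The retry loop visible in the log above is a standard exponential-backoff pattern: each failed attempt sleeps roughly 1.5× longer than the last, with some jitter. A minimal sketch of that pattern (hypothetical names and parameters; this is not the actual taskgraph implementation):

```python
import random
import time

def retry(func, attempts=10, sleep=10, max_sleep=300, factor=1.5, jitter=5):
    """Call func(), retrying with exponential backoff on failure.

    Illustrative sketch of the pattern in the log above, not the real
    mozilla-taskgraph code. All parameter values are assumptions.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # give up after the last attempt
            # grow the delay geometrically, add jitter, and cap it
            delay = sleep * factor ** (attempt - 1) + random.uniform(0, jitter)
            time.sleep(min(delay, max_sleep))
```

With the defaults above, the first few sleeps land near the 10s/16s/22s sequence the log shows; the key point for this bug is that even ten such retries cannot help when every individual request times out server-side.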
Rerunning the decision task a second time gave the same result.
This is blocking releases. Sheehan, Zeid, do you guys own this part of hg.m.o? If so can one of you guys have a look at the server logs? If not, could you loop in the right person?
[1] https://treeherder.mozilla.org/jobs?repo=mozilla-release&revision=fee723a73e4a12fbd179e05b54f1f2e5623c90c4&selectedTaskRun=H45WZSEiSReHJWLllbhDaw.0
[2] https://hg.mozilla.org/releases/mozilla-release/json-automationrelevance/fee723a73e4a12fbd179e05b54f1f2e5623c90c4
Assignee
Comment 1 • 4 years ago
Last week, for bug 1673985, we added backout metadata to the json-automationrelevance endpoint; that metadata is known to be expensive to calculate. For pushes to try, autoland, and other frequently pushed repos, this change probably only added a few seconds of extra CPU time. But release pushes usually contain thousands of changesets from weeks of development, so calculating the backouts for a single push is very expensive.
I think the easiest fix here is to gate the expensive check behind a query string parameter flag, so calls to json-automationrelevance that want backout data for the relevant changesets would look something like https://hg.mozilla.org/releases/mozilla-release/json-automationrelevance/fee723a73e4a12fbd179e05b54f1f2e5623c90c4?backouts=1. Marco, would this solution work for you? As jlorenzo said, this is blocking releases.
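The shape of the proposed fix is an opt-in flag read from the query string: the endpoint keeps its cheap default behavior and only runs the backout scan when the caller asks for it. A generic WSGI-style sketch of that gating (function and helper names are hypothetical; the real change lives in the hgmo webcommand, not this code):

```python
from urllib.parse import parse_qs

def compute_backouts(changesets):
    # Stand-in for the expensive backout scan over repository history.
    return {c: None for c in changesets}

def automationrelevance(environ, changesets):
    """Hypothetical sketch of gating an expensive computation behind a
    ?backouts=1 query-string flag, as proposed for json-automationrelevance.
    """
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # Only compute backout metadata when the caller explicitly opts in.
    want_backouts = query.get("backouts", ["0"])[0] == "1"
    response = {"changesets": changesets}
    if want_backouts:
        response["backedoutby"] = compute_backouts(changesets)  # expensive
    return response
```

The design choice here is that existing callers (like the decision task, which does not need backout data) keep the fast path with no change on their side, while consumers such as mozci opt in with one extra query parameter.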
Reporter
Comment 2 • 4 years ago
Thanks for the explanation, Connor! I just closed mozilla-release, in the meantime.
Assignee
Comment 3 • 4 years ago
Calls to json-automationrelevance on mozilla-release can take up to 30s since deploying changeset 78a0d7c424fc18d. This commit hides the expensive backout information behind a backouts query string parameter so only the relevant calls will perform the expensive calculation.
Updated • 4 years ago
Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/fe454eae1eb7
hgmo: backout information in json-automationrelevance behind a flag r=zeid
Assignee
Comment 5 • 4 years ago
I've pushed a fix that puts the expensive calculation in json-automationrelevance behind a flag. I'm deploying now, but we should keep this bug open while we verify the fix.
Reporter
Comment 6 • 4 years ago
Decision task is back to green! Thank you very much for this super quick fix, Connor!
For the record, I just reopened mozilla-release.
Comment 7 • 4 years ago
It WFM; I have a fix for mozci: https://github.com/mozilla/mozci/pull/353.
Assignee
Updated • 4 years ago
Comment hidden (Intermittent Failures Robot)