Intermittent Decision task failure | HTTPError: 400 Client Error: Bad Request for url (due to dependency which expires before task deadline)
Categories
(Firefox Build System :: Task Configuration, defect, P2)
Tracking
(Not tracked)
People
(Reporter: aryx, Unassigned)
Attachments
(1 obsolete file)
Many of the Gecko decision tasks on the Try repositories are failing (example) because this fetch-civet-source task expires in less than 24 hours, which is sooner than the 24-hour deadline of the tasks in recent Try pushes.
Try push artifacts expire after 4 weeks. Should a fetch-civet-source task be scheduled on mozilla-central (where artifacts expire only after a year by default)?
For now, a manually requested fetch-civet-source should resolve the issue for the next 27 days.
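As a rough check of the 27-day figure, assuming the 4-week Try expiry and the 24-hour task deadline mentioned above (the helper name `reuse_window` is made up for illustration and is not part of taskgraph or Taskcluster):

```python
from datetime import timedelta


def reuse_window(expires_after: timedelta, deadline_after: timedelta) -> timedelta:
    """How long after its creation a task can still serve as a dependency.

    A dependent task created at time t has deadline t + deadline_after, and the
    dependency must expire later than that, so t must stay below
    expires_after - deadline_after.
    """
    return max(expires_after - deadline_after, timedelta(0))


# Assumption: Try tasks expire after 4 weeks and have a 1-day deadline.
window = reuse_window(timedelta(weeks=4), timedelta(days=1))
print(window.days)  # 27
```

The expiry has to be strictly greater than the deadline, so the usable window is just under this value.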
Comment 1•3 years ago
Here's the relevant error:
[task 2021-05-21T14:02:28.072Z] http://taskcluster:80 "PUT /queue/v1/task/b9P7J3BBR2O25rzrLjr3fA HTTP/1.1" 400 6616
[task 2021-05-21T14:02:28.072Z] `task.dependencies` references tasks that expires
[task 2021-05-21T14:02:28.072Z] before `task.deadline` this is not allowed, see tasks:
[task 2021-05-21T14:02:28.072Z] * fIF6MHloTKyf9q5_zVpkwg,
[task 2021-05-21T14:02:28.072Z] All taskIds in `task.dependencies` **must** have
[task 2021-05-21T14:02:28.072Z] `task.expires` greater than the `deadline` for this task.
[task 2021-05-21T14:02:28.072Z]
[task 2021-05-21T14:02:28.072Z]
[task 2021-05-21T14:02:28.072Z] ---
[task 2021-05-21T14:02:28.072Z]
[task 2021-05-21T14:02:28.072Z] * method: createTask
[task 2021-05-21T14:02:28.072Z] * errorCode: InputError
[task 2021-05-21T14:02:28.072Z] * statusCode: 400
[task 2021-05-21T14:02:28.072Z] * time: 2021-05-21T14:02:28.079Z
My theory is that this check started happening on the taskcluster side when we recently upgraded taskcluster. The index-search optimizer never bothers to check that the expiry of the dependency is greater than the deadline of the current task.
If my theory is correct, we'll see the same issue happening with other fetch and toolchain tasks as well (it's just that fetch-civet happened to be the first one to approach expiry since the taskcluster upgrade). So I think we'll have to prioritize this.
I think there are two ways to fix this:
- Turn off the check for Gecko. This isn't ideal since the check does prevent an intermittent failure case... though not one that has seemed to be very harmful thus far, in Gecko at least.
- Fix the IndexSearch optimizer to take the expiry of the dependency and the deadline of the current task into account.
The second option is ideal, though it may take a bit of finagling to access the dependent's deadline from within the optimizer.
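A minimal sketch of the comparison the optimizer would need, under the assumption that the candidate's expiry and the dependents' deadlines are already available as datetimes. The function name `can_replace` and its parameters are hypothetical and are not the actual taskgraph API:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def can_replace(candidate_expires: datetime, dependent_deadlines: Iterable[datetime]) -> bool:
    """Return True only if the indexed task expires after every dependent's deadline.

    This mirrors the queue's rule that task.expires of each dependency must be
    greater than the deadline of the task depending on it.
    """
    return all(candidate_expires > deadline for deadline in dependent_deadlines)


# Hypothetical timestamps, loosely based on the log above.
now = datetime(2021, 5, 21, 14, 0, tzinfo=timezone.utc)
deadline = now + timedelta(days=1)

# Indexed task expiring in 12 hours: reject it and schedule a fresh task instead.
print(can_replace(now + timedelta(hours=12), [deadline]))  # False
# Indexed task expiring in 27 days: safe to use as a replacement.
print(can_replace(now + timedelta(days=27), [deadline]))  # True
```

Rejecting the candidate simply means the task is scheduled again, which is what already happens once the indexed task has expired.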
Comment 2•3 years ago
Just to clarify why there isn't any issue with fetch-civet.
Prior to the taskcluster upgrade, this would have still worked (since the task hadn't expired yet). As soon as the task expired, new decision tasks would have failed to find a replacement via the index-search optimizer, and we simply would have scheduled a new task to run.
I think the fetch-civet task likely should have a much longer expiry, but it's not the root cause of the issue, just the thing that triggers it.
Comment 3•3 years ago
> this would have still worked
Not quite -- if the dependent task runs after the fetch-civet task expires, then it would fail.
Comment 4•3 years ago
Correct, the IndexSearch optimizer in Gecko should have been comparing expiry / deadline all along; this error is pointing out a flaw that has always existed in our optimizer.
Comment 5•3 years ago
Aryx pointed out that we've hit this failure in the past (and "fixed" it by increasing the expiry). So it's not a regression from the upgrade after all, and likely just very rare (and a coincidence that it happened shortly after the upgrade).
We should increase the fetch-civet expiry either way here, but it might also be worth solving the root issue properly this time.
Comment 7•3 years ago
Comment 8•3 years ago
I got a bit nerd sniped here. The attached patch is untested and won't work yet, but should be pretty close to what we need. Posting it to phabricator now as there's a chance I'll let it slip.
Comment 10•3 years ago
Comment on attachment 9223027 [details]
WIP: Bug 1712333 - Take dependent deadlines into account when deciding whether to replace a task
Revision D115726 was moved to bug 1690947. Setting attachment 9223027 [details] to obsolete.