Closed Bug 1677142 Opened 4 years ago Closed 4 years ago

The decision task should not pick failed tasks when looking for cached tasks

Categories

(Firefox Build System :: Task Configuration, defect)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1675330

People

(Reporter: Sylvestre, Unassigned)

References

(Regression)

Details

(Keywords: regression)

I have been hacking on clang and our build system.
To evaluate performance, I scheduled raptor and talos tasks. Neither of them ever started here:
https://treeherder.mozilla.org/jobs?repo=try&revision=97566c47c5b2fe5638e98cce62afe76f2561ddad

Here are the tasks that I was trying to run: https://firefoxci.taskcluster-artifacts.net/IxSQolksScKcWTo6ou3dIA/0/public/target-tasks.json

Joel and I looked at the full taskgraph (https://firefoxci.taskcluster-artifacts.net/IxSQolksScKcWTo6ou3dIA/0/public/task-graph.json), and he identified that the fix-stack task had hit an exception (https://firefox-ci-tc.services.mozilla.com/tasks/ATKYpdJWRWGZUY8OKGh3Tw).

I still don't know why we had an exception.
We should either fail the decision task or clearly mark fix-stack as failing.

Welcome to fallout from bug 1653050.

The problem here is that the fix-stack task the raptor and talos tasks depend on comes from another push, made a week ago. That task failed with an exception, and you can't rerun it because its deadline has expired. So the decision task shouldn't optimize tasks away by replacing them with existing tasks that can never complete.
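
To make the intended behaviour concrete, here is a minimal sketch of the check I have in mind. It is not the actual taskgraph code: `is_usable_replacement`, `pick_replacement`, and the shape of the `status` dict are hypothetical names for illustration, and the state strings assume Taskcluster's usual task states (pending, running, completed, failed, exception).

```python
from datetime import datetime, timezone

# States that can never produce the artifacts dependent tasks need.
UNUSABLE_STATES = {"failed", "exception"}


def is_usable_replacement(status, now):
    """Return True if a cached task may stand in for a newly scheduled one.

    `status` is a hypothetical dict with a "state" string and a timezone-aware
    "deadline" datetime; this layout is an assumption, not a real API.
    """
    if status["state"] in UNUSABLE_STATES:
        return False
    # A task that has not completed and is past its deadline can never finish.
    if status["state"] != "completed" and status["deadline"] <= now:
        return False
    return True


def pick_replacement(candidates, now):
    """Return the id of the first usable cached task, or None to rebuild."""
    for task_id, status in candidates:
        if is_usable_replacement(status, now):
            return task_id
    return None
```

With a check like this, the week-old fix-stack task that ended in an exception would be skipped, and the decision task would schedule a fresh fix-stack instead of pointing raptor and talos at a dependency that can never resolve.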

Regressed by: 1653050
Summary: The decision task should fail when a task cannot start (fix-stack only ?) → The decision task should not pick failed tasks when looking for cached tasks
Has Regression Range: --- → yes
Keywords: regression

Folks, would you be able to help with this?
It is quite confusing.
At least, showing a proper error message would help.

Flags: needinfo?(mcastelluccio)
Flags: needinfo?(ahal)

I just r+'ed a patch to improve this :)

Note that even with that patch, this can still happen: we could replace a task with one that is pending or running, and that task could then fail after the fact. But at least the patch in that bug should stop us from replacing a task with one that has already failed.
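
The remaining gap can be sketched as a three-way classification of a cached task's state. This is only an illustration: `replacement_risk` and its return labels are hypothetical, and the state names assume Taskcluster's usual task states.

```python
def replacement_risk(state):
    """Classify how safe it is to reuse a cached task in the given state."""
    if state == "completed":
        # Artifacts already exist, so dependents can run.
        return "safe"
    if state in ("unscheduled", "pending", "running"):
        # Not rejected by an "already failed" check, but the task can
        # still fail after the decision task has picked it.
        return "may-still-fail"
    # "failed", "exception", or anything unrecognised.
    return "unusable"
```

The patch addresses the "unusable" bucket; tasks in the "may-still-fail" bucket are the ones that can still leave dependents stuck.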

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(ahal)
Resolution: --- → DUPLICATE
Flags: needinfo?(mcastelluccio)