Bug 1274176
Opened 9 years ago
Closed 9 years ago
Retrigger doesn't work on taskcluster tests
Categories: Taskcluster :: Services, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: ochameau, Assigned: dustin
Details: Whiteboard: [mozilla-taskcluster]
Take this run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3a64330e28f9
I'm unable to spawn new dt7 runs.
The Treeherder retrigger doesn't work: I get the success popup, but nothing happens.
The Taskcluster task inspector doesn't work either: I get "You do not have sufficient scopes" when trying to retrigger it.
This severely limits our ability to address intermittents for tests that now run only on Taskcluster.
Feel free to move this bug into the Taskcluster component.
Comment 1•9 years ago
<KWierso|afk> emorley: bug 1274176 should probably get moved to a taskcluster component since retriggering directly from taskcluster is failing, too
Component: Treeherder → General
Product: Tree Management → Taskcluster
Version: --- → unspecified
Comment 2•9 years ago
This is caused by changes made in the decision task process to use the new taskcluster queue dependency system rather than the scheduler. mozilla-taskcluster, which is responsible for handling retrigger events, is configured to duplicate the graph as it exists in the scheduler, and does not use this new dependency system.
Component: General → Platform and Services
Updated•9 years ago
Whiteboard: [mozilla-taskcluster]
Updated•9 years ago
Assignee: nobody → garndt
Comment 3•9 years ago
Duplicating a node in the new task dependency system requires retrieving the entire task graph (which can be many hundreds of tasks) and iterating over those tasks to determine the dependency tree, and this has to be done for each retrigger.
I have spoken to Jonas about this and it should be possible, and very useful, to add an API endpoint to taskcluster-queue that will allow a client to request the list of tasks that depend on the task we query for, including their task definitions. From there it's just a matter of updating task IDs and timestamps, and resubmitting those tasks.
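As a rough illustration of that last step (updating task IDs and timestamps before resubmitting), here is a minimal Python sketch. It is not the mozilla-taskcluster implementation. The function name `clone_tasks`, the use of `uuid4` in place of real Taskcluster task IDs, and the `created`, `deadline`, and `dependencies` field layout are assumptions made for illustration, and nothing here talks to the queue.

```python
import copy
import uuid
from datetime import datetime, timezone


def clone_tasks(tasks, now=None, new_id=None):
    """Return copies of `tasks` with fresh IDs and shifted timestamps.

    `tasks` maps an old task ID to a (hypothetical) task definition dict
    holding ISO 8601 `created` and `deadline` strings and a `dependencies`
    list of task IDs.
    """
    now = now or datetime.now(timezone.utc)
    new_id = new_id or (lambda: uuid.uuid4().hex)  # stand-in for a real task ID

    # Assign every task being retriggered a new ID up front.
    id_map = {old_id: new_id() for old_id in tasks}

    cloned = {}
    for old_id, definition in tasks.items():
        new_def = copy.deepcopy(definition)

        # Keep the original created-to-deadline interval, starting from now.
        created = datetime.fromisoformat(definition["created"])
        deadline = datetime.fromisoformat(definition["deadline"])
        new_def["created"] = now.isoformat()
        new_def["deadline"] = (now + (deadline - created)).isoformat()

        # Point dependencies at the new copies; dependencies outside the
        # retriggered set (e.g. the decision task) keep their old IDs.
        new_def["dependencies"] = [
            id_map.get(dep, dep) for dep in definition.get("dependencies", [])
        ]
        cloned[id_map[old_id]] = new_def
    return cloned
```

The resulting definitions would then be resubmitted to the queue. Remapping only the IDs inside the retriggered set is what keeps the copied subtree attached to the parts of the graph that were not retriggered.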
This is being worked on but will take time to implement, test, and deploy the queue side. From there mozilla-taskcluster will need to be updated to handle this new scenario.
In the meantime, if not having the ability to retrigger becomes a huge burden, and we only care about the task we're retriggering and not things that depend on it (such as only retriggering tests), then we can put in some ability to only retrigger that particular task and not the dependents.
Comment 4•9 years ago
Can we back out the in-tree work that caused this regression until we have a solution? This functionality is very important to developers and sheriffs. If this were a Firefox feature that landed and caused a pretty serious regression, we would back it out; I see this done just about every week.
As for a short-term hack that assumes test-only jobs, that might solve our problems and would be better than what we have now.
Comment 5•9 years ago
I did not know this was going to cause issues with retriggering, and only made the link this morning. Sorry about that!
From discussion this morning, the plan is, roughly:
1. fix single-task retriggering right away (hopefully the majority of use-cases)
2. fix retriggering entire subtrees (allowing, for example, retriggering builds) using a brute-force approach
3. modify the queue to better support reverse dependencies for #2
4. fix up mozilla-taskcluster to properly handle big-graph scheduler tasks (bug 1274716)
Comment 6•9 years ago
This should address #1: https://github.com/taskcluster/mozilla-taskcluster/pull/70
Comment 7•9 years ago
I was able to retrigger this failed test job on mozilla-inbound to make sure retriggering was working:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=3737024731e6baccf99ac1f01eb5805ac12e3944&selectedJob=28475269
Updated•9 years ago
Assignee: garndt → dustin
Comment 8•9 years ago
Addressing #2, #3:
https://github.com/taskcluster/mozilla-taskcluster/pull/73
Comment 9•9 years ago
OK, I think we finally got that working. I retriggered
https://treeherder.allizom.org/#/jobs?repo=try&revision=e664a7f36669&selectedJob=22087439
which had 118 dependent jobs, and it created 119 new jobs. It scans sequentially for dependencies, and since it has to look for dependencies for all 119 of those jobs, that takes a while -- 70 seconds in this case.
I'm hopeful that in the future, this work will be done by an action task, and based on the task-graph.json produced by the decision task rather than pulled out of the queue, but for now this will do the trick!
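To show why a sequential scan grows with the size of the subtree, here is a small Python sketch of a brute-force search for everything that depends, directly or transitively, on the retriggered task. It is only an approximation of the idea, not the code from the pull request above: `find_subtree` and the in-memory `dependencies_of` mapping are hypothetical, whereas the real lookup pulls this information out of the queue.

```python
from collections import deque


def find_subtree(root_id, dependencies_of):
    """Return `root_id` plus every task that transitively depends on it.

    `dependencies_of` maps each task ID in the graph to the list of task
    IDs it depends on (a hypothetical in-memory stand-in for queue data).
    """
    found = [root_id]
    seen = {root_id}
    pending = deque([root_id])
    while pending:
        current = pending.popleft()
        # Brute force: one full pass over the graph for every task found,
        # so a root with 118 dependents needs 119 passes.
        for task_id, deps in dependencies_of.items():
            if current in deps and task_id not in seen:
                seen.add(task_id)
                found.append(task_id)
                pending.append(task_id)
    return found


graph = {
    "decision": [],
    "build": ["decision"],
    "test-1": ["build"],
    "test-2": ["build"],
    "upload": ["test-1", "test-2"],
    "other-build": ["decision"],
}
print(find_subtree("build", graph))
# prints ['build', 'test-1', 'test-2', 'upload']
```

A reverse-dependency index, such as one derived from the task-graph.json mentioned above, would replace the inner pass with a single lookup per task.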
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•6 years ago
Component: Platform and Services → Services