Closed Bug 1317189 Opened 6 years ago Closed 6 years ago

talos --rebuild option stopped working


(Testing :: General, defect)

Version 3
Not set


(firefox55 fixed)

Tracking Status
firefox55 --- fixed


(Reporter: zbraniecki, Assigned: chmanchester)




(1 file)

For a couple of months now I've been testing the performance of my branch using the following talos run:

./mach try -b o -p linux64,macosx64,win64 -u none[x64,10.10,Windows\ 8] -t other[x64,10.10,Windows\ 8],other-e10s[x64,10.10,Windows\ 8] --rebuild 20

Historically, this always worked. I got builds like:


but over the last two days the pushes have been getting 20 rebuilds for linux, but just one for mac and windows:


or, like here, 20 builds for windows, 20 builds for mac e10s, but only 1 for non-e10s mac:


Armen, possibly related to bug 1316976?
Adding jobs to that build doesn't work either. I tried to add more talos-other jobs and it never happened.
I did respin talos-other today, and it turned out to kick 1+20 new ones, though there are still pending ones on
More examples:

 - - windows stuck, macos e10s stuck
 - - windows and linux stuck, macos done


Can we get some help with this? I'm running a lot of perf tests right now and this bug is making it really hard to work.
:bstack, would you be able to help us figure out why this wouldn't be working on linux (i.e. taskcluster) ?
Flags: needinfo?(bstack)
Sorry I've let this languish all day today. I had some other stuff I needed to look into first. Afaict, this isn't related to our recent work on triggering talos from treeherder. This would most likely be an in-tree taskgraph generation issue. I'll look into this a bit and defer to someone more wise in the ways of in-tree stuff if I can't find anything awry.
Assignee: nobody → bstack
Flags: needinfo?(bstack)
I'm at a bit of a loss. I don't think I really have the context here to figure out what's going on. wlach, is this related to the work you're doing now?
Flags: needinfo?(wlachance)
No, this isn't really related to anything I'm doing.

I don't really see why this would be taskcluster related, at least not fully, as apparently the problem goes back 3 months (long before we used buildbotbridge to schedule the linux talos jobs). If the problems were linux-specific and were more recent, :wcosta would be the person I'd ping (he was doing most of the work for linux talos and BBB).

:catlee, do you know who might be able to debug this? They would need to know about buildbot and how try syntax translates into talos jobs being scheduled.
Flags: needinfo?(wlachance) → needinfo?(catlee)
Assignee: bstack → nobody
I think --rebuild support is something that trigger-bot [1] handles.

Chris, can you help out here?

Flags: needinfo?(catlee) → needinfo?(cmanchester)
It's pretty unclear to me what the issue is here, or which jobs it impacts, so I pushed to try with `--rebuild` for Linux and OS X:

This is working as expected for buildbot jobs, which are triggered by trigger-bot, and taskcluster jobs, which are triggered by a different mechanism. People reporting this issue refer to jobs being "stuck" -- perhaps this refers to some re-triggered jobs being in pending for an apparently unreasonable amount of time?
Flags: needinfo?(cmanchester)
I think the symptom is more that when Talos is sent to multiple platforms with --rebuild in a single push, the tests might get stuck. Pushing with a single platform seems fine.
AFAICT this blocks us from evaluating stylo changes on Linux. For example, in a recent try push [1] we saw retriggers for win and mac, but not linux. This seems to imply the feature is broken on taskcluster but not buildbot.

chmanchester or wlach, can you take a closer look at this?

Flags: needinfo?(wlachance)
Flags: needinfo?(cmanchester)
I'm sorry, I don't think I can help (this is even less my area now than it was a few months ago). If Chris doesn't know what's up, I would escalate to :garndt and/or :jmaher.
Flags: needinfo?(wlachance)
I think I figured this out. It's the difference between "--rebuild" and "--rebuild-talos": the former works fine on TC, while the latter, as implemented in bug 1333167, does not seem to work. I think I see the issue.
Assignee: nobody → cmanchester
Blocks: 1333167
Flags: needinfo?(cmanchester)
Actually, based on the links in comment 0, this bug was filed about "--rebuild", where the issue still doesn't reproduce. I'll repurpose it to fix "--rebuild-talos" unless there are any objections.
Comment on attachment 8866058 [details]
Bug 1317189 - Fix --rebuild-talos for TC try jobs by checking the correct attribute.
Attachment #8866058 - Flags: review?(wcosta) → review+
Pushed by
Fix --rebuild-talos for TC try jobs by checking the correct attribute. r=wcosta
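For readers following along, here is a minimal sketch of what "checking the correct attribute" could look like when deciding which tasks to duplicate. It is not the patch itself: the `Task` class, the `apply_rebuild_talos` function, and the attribute names `unittest_suite` and `task_duplicates` are all assumptions made for illustration.

```python
# Hypothetical sketch only; names are illustrative and not taken from the patch.
from dataclasses import dataclass, field


@dataclass
class Task:
    label: str
    attributes: dict = field(default_factory=dict)


def apply_rebuild_talos(tasks, rebuild_talos):
    """Mark every talos task so that it is scheduled `rebuild_talos` times."""
    for task in tasks:
        # If this check looked at an attribute that no task carries, it would
        # silently match nothing and no talos job would be duplicated.
        is_talos = task.attributes.get("unittest_suite") == "talos"
        if rebuild_talos > 1 and is_talos:
            task.attributes["task_duplicates"] = rebuild_talos
    return tasks


tasks = apply_rebuild_talos(
    [
        Task("test-linux64/opt-talos-other", {"unittest_suite": "talos"}),
        Task("test-linux64/opt-mochitest", {"unittest_suite": "mochitest"}),
    ],
    rebuild_talos=20,
)
```

In this sketch only the task whose suite attribute equals `talos` is marked for duplication; the other task is left untouched.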
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55