Closed Bug 1317189 Opened 8 years ago Closed 7 years ago

talos --rebuild option stopped working

Categories

(Testing :: General, defect)

Version 3
defect
Not set
normal

Tracking

(firefox55 fixed)

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: zbraniecki, Assigned: chmanchester)

References

Details

Attachments

(1 file)

For a couple of months now I've been testing the performance of my branch using the following talos run:

./mach try -b o -p linux64,macosx64,win64 -u none[x64,10.10,Windows\ 8] -t other[x64,10.10,Windows\ 8],other-e10s[x64,10.10,Windows\ 8] --rebuild 20

Historically, this always worked. I got builds like:

 - https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab90334d93d8
 - https://treeherder.mozilla.org/#/jobs?repo=try&revision=79facb824200
 - https://treeherder.mozilla.org/#/jobs?repo=try&revision=248c297a129b

but over the last two days the builds get 20 rebuilds for linux, but just one for mac and windows:

 - https://treeherder.mozilla.org/#/jobs?repo=try&revision=6544a957e60e64fa97a11e293e17af02c1d1fd22
 - https://treeherder.mozilla.org/#/jobs?repo=try&revision=7abee73aa6672ef7528ed4d6345138a50239c74c

or, as here, 20 builds for windows, 20 builds for mac e10s, but only 1 for non-e10s mac:

 - https://treeherder.mozilla.org/#/jobs?repo=try&revision=d0e752b48e4c499d61f49065ce2c585ec4735d1f
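
For readers unfamiliar with try syntax, the command quoted above can be broken down roughly as in the sketch below. This is only an illustration, not the in-tree parser: the function name `parse_try_syntax` and the `dest` names are hypothetical, and only the flags that appear in the command (`-b`, `-p`, `-u`, `-t`, `--rebuild`) are modelled.

```python
import argparse
import shlex


def parse_try_syntax(message):
    """Split a try-syntax string into its options (illustrative sketch only)."""
    parser = argparse.ArgumentParser()
    # The dest names below are hypothetical labels chosen for readability.
    parser.add_argument("-b", dest="build_types")
    parser.add_argument("-p", dest="platforms")
    parser.add_argument("-u", dest="unittests")
    parser.add_argument("-t", dest="talos")
    # --rebuild N asks for each selected job to be run N times.
    parser.add_argument("--rebuild", dest="rebuild", type=int, default=1)
    # shlex honours the backslash-escaped space in "Windows\ 8".
    args, _unknown = parser.parse_known_args(shlex.split(message))
    return args


SYNTAX = (
    r"-b o -p linux64,macosx64,win64 "
    r"-u none[x64,10.10,Windows\ 8] "
    r"-t other[x64,10.10,Windows\ 8],other-e10s[x64,10.10,Windows\ 8] "
    r"--rebuild 20"
)

args = parse_try_syntax(SYNTAX)
print(args.platforms.split(","))
print(args.rebuild)
```

With the command from this report, the expectation is that every requested talos job on all three platforms is scheduled 20 times, which is what the earlier pushes show.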


Armen, possibly related to bug 1316976?
Adding jobs to that build doesn't work either. I tried to add more talos-other jobs and it never happened.
I did respin talos-other today, and it turned out to kick off 1+20 new ones, though there are still pending ones on https://treeherder.mozilla.org/#/jobs?repo=try&author=zbraniecki@mozilla.com
More examples:

 - https://treeherder.mozilla.org/#/jobs?repo=try&revision=6544a957e60e64fa97a11e293e17af02c1d1fd22 - windows stuck, macos e10s stuck
 - https://treeherder.mozilla.org/#/jobs?repo=try&revision=13cbd8a4e42c81516f7a2a3c2887865ad0b1a925 - windows and linux stuck, macos done

etc.

Can we get some help with this? I'm running a lot of perf tests right now and this bug is making it really hard to work.
:bstack, would you be able to help us figure out why this wouldn't be working on linux (i.e. taskcluster)?
Flags: needinfo?(bstack)
Sorry I've let this languish all day today. Had some other stuff I needed to look into first. Afaict, this isn't related to our recent work in triggering talos from treeherder. This would most likely be an in-tree taskgraph generation issue. I'll look into this a bit and defer to someone more wise in the ways of in-tree stuff if I can't find anything awry.
Assignee: nobody → bstack
Status: NEW → ASSIGNED
Flags: needinfo?(bstack)
I'm at a bit of a loss. I don't think I really have the context here to figure out what's going on. wlach, is this related to the work you're doing now?
Flags: needinfo?(wlachance)
No, this isn't really related to anything I'm doing.

I don't really see why this would be taskcluster related, at least not fully, as apparently the problem goes back 3 months (long before we used buildbotbridge to schedule the linux talos jobs). If the problems were linux-specific and were more recent, :wcosta would be the person I'd ping (he was doing most of the work for linux talos and BBB).

:catlee, do you know who might be able to debug this? They would need to know about buildbot and how try syntax translates into talos jobs being scheduled.
Flags: needinfo?(wlachance) → needinfo?(catlee)
Assignee: bstack → nobody
Status: ASSIGNED → NEW
I think --rebuild support is something that trigger-bot [1] handles.

Chris, can you help out here?

[1] http://chmanchester.github.io/blog/2015/07/15/automatic-triggering-on-try-server/
Flags: needinfo?(catlee) → needinfo?(cmanchester)
It's pretty unclear to me what the issue is here, or which jobs it impacts, so I pushed to try with `--rebuild` for Linux and OS X:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=55dfc7a5b6514581601e5472ea73f880a822cdc3
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a45930154c6a16225bbbe23e9dd7ec7c882f2de9

This is working as expected for buildbot jobs, which are triggered by trigger-bot, and taskcluster jobs, which are triggered by a different mechanism. People reporting this issue refer to jobs being "stuck" -- perhaps this refers to some re-triggered jobs being in pending for an apparently unreasonable amount of time?
Flags: needinfo?(cmanchester)
I think the symptom is more like this: when sending Talos jobs for multiple platforms with --rebuild in a single push, tests might get stuck. Pushing with a single platform seems fine.
AFAICT this blocks us from evaluating stylo changes on Linux. For example, in a recent try push [1] we saw retriggers for win and mac, but not linux. This seems to imply the feature is broken on taskcluster but not buildbot.

chmanchester or wlach, can you take a closer look at this?

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=22028266be5e4485a959d44b1619c7e3d3f80dfa
Flags: needinfo?(wlachance)
Flags: needinfo?(cmanchester)
I'm sorry, I don't think I can help (this is even less my area now than it was a few months ago). If Chris doesn't know what's up, I would escalate to :garndt and/or :jmaher.
Flags: needinfo?(wlachance)
I think I figured this out. It's the difference between "--rebuild" and "--rebuild-talos": the former works fine on TC, while the latter, as implemented in bug 1333167, does not seem to work, but I think I see the issue.
Assignee: nobody → cmanchester
Blocks: 1333167
Flags: needinfo?(cmanchester)
Actually, based on the links in comment 0, this bug was filed about "--rebuild", where the issue still doesn't reproduce. I'll repurpose it to fix "--rebuild-talos" unless there are any objections.
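
To illustrate the kind of failure being described, here is a minimal sketch of how a retrigger count gated on a task attribute can silently fall back to a single run when the wrong attribute is checked. It is not the in-tree code: the `Task` class, the `task_runs` helper, and the attribute names `suite` and `kind` are all hypothetical, and this bug does not state which attribute the actual patch checks.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a taskgraph task with free-form attributes."""
    label: str
    attributes: dict = field(default_factory=dict)


def task_runs(task, rebuild_talos, talos_attribute):
    """Return how many times `task` would be scheduled.

    `talos_attribute` is the (made-up) attribute name used to decide
    whether the task is a talos job.
    """
    if rebuild_talos > 1 and task.attributes.get(talos_attribute) == "talos":
        return rebuild_talos
    # Non-talos tasks, or talos tasks missed by the check, run once.
    return 1


talos_task = Task("test-linux64/opt-talos-other", {"suite": "talos"})

# Checking the attribute the task actually carries applies the rebuild count.
correct = task_runs(talos_task, 20, "suite")
# Checking an attribute the task does not carry leaves it at a single run.
wrong = task_runs(talos_task, 20, "kind")
print(correct, wrong)
```

No error is raised in the second case, which would be consistent with the symptom reported here: the push succeeds, but the extra talos runs never appear.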
Comment on attachment 8866058 [details]
Bug 1317189 - Fix --rebuild-talos for TC try jobs by checking the correct attribute.

https://reviewboard.mozilla.org/r/137654/#review141070
Attachment #8866058 - Flags: review?(wcosta) → review+
Possibly bug 1352202 is a dup?
Pushed by cmanchester@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d27e83aae737
Fix --rebuild-talos for TC try jobs by checking the correct attribute. r=wcosta
https://hg.mozilla.org/mozilla-central/rev/d27e83aae737
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55