Closed
Bug 1363409
Opened 4 years ago
Closed 4 years ago
unable to schedule e10s specific talos jobs on tc scheduled platforms
Categories
(Firefox Build System :: Task Configuration, defect)
Firefox Build System
Task Configuration
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla55
People
(Reporter: jmaher, Assigned: dustin)
Details
(Keywords: regression)
Attachments
(2 files)
in this try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4173fcce525300eadf66cc9295e7371d0484a69a&group_state=expanded&selectedJob=97665271 the syntax: try: -b o -p linux,linux64,macosx64,win32,win64 -u none -t chromez-e10s,dromaeojs-e10s,other-e10s,g1-e10s,g2-e10s,svgr-e10s,tp5o-e10s,xperf-e10s --rebuild-talos 5 doesn't trigger jobs except for win7. It should have linux64 and and osx scheduled (win8 would need custom syntax). what is interesting is both osx and linux64 are managed by taskcluster and most likely the try syntax parser isn't understanding other-e10s, g1-e10s, etc. I have seen this often in the last week and I am excited that people are being responsible with their try pushes, but asking them to push again with -t all which only adds to the burden on our infrastructure.
Comment 1•4 years ago
|
||
FYI, more than anything: Once I used -t all, the try submission email has "It looks like this submission has talos jobs. You can compare the performance of your push against a baseline revision here: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=4aeff94a7fc872e4716def28f89647fd0d0b4488" Attempts with e10s only talos didn't have the above text in the submission email. Just the usual results on Treeherder link and builds and logs link.
| Assignee | ||
Comment 2•4 years ago
|
||
There are a half-dozen things that interpret try syntax, and the one that prints the message when you push is the least accurate of them, so I wouldn't read too much into what it says! It even sounds uncertain ("it looks like..").
It looks like the original push *did* trigger jobs for all of those platforms, it just didn't run the talos jobs.
To start debugging this, I'd suggest taking a look at the talos_try_name attribute of the jobs that you'd expect to see scheduled. Maybe those aren't having `-e10s` appended?| Reporter | ||
Comment 3•4 years ago
|
||
:garndt, can someone from the TC team look into this?
Flags: needinfo?(garndt)
Comment 4•4 years ago
|
||
I do not have anyone immediately that could look at this, but there are a few other people that have had their hands in it that could help out I hope. comment 2 does seem to be a step in the right direction for someone that wants to debug it. From my cursory glance at the scheduling, what I can tell is that we allow a user to specifically schedule *only* e10s jobs by appending e10s to the test suite name for both talos and unittests. The intree scheduling respects this for unittests, but I'm not seeing anything that will match talos_try_name in this way. If I adjust the talos_try_name for a talos suite to include "e10s" so it will match the try syntax and include the job in the graph, which comment 2 suggests.
Flags: needinfo?(garndt)
Comment 5•4 years ago
|
||
Also, I'm starting to dig into the in-tree scheduling myself as I'm curious, so for those that dive into it and have questions, I can try to answer what I know so far. If I have any strokes of insight into this problem, I'll leave notes here.
Comment 6•4 years ago
|
||
I'm not entirely sure if this is the right direction, but this seemed to cause the talos e10s tests to show up in the graph. Not sure if the task definitions are correctly though.
Comment 7•4 years ago
|
||
ok, it seems that this has now landed on m-c which enables e10s for talos jobs, and it will default to running both non-e10s and e10s. https://hg.mozilla.org/mozilla-central/rev/77584adad526 It doesn't appear to address the issue where you *only* want to run e10s talos tests, which might be related to the talos-try-name change here: --- a/taskcluster/taskgraph/transforms/tests.py +++ b/taskcluster/taskgraph/transforms/tests.py @@ -533,16 +533,17 @@ def split_e10s(config, tests): test['test-name'] += '-e10s' test['e10s'] = True test['attributes']['e10s'] = True group, symbol = split_symbol(test['treeherder-symbol']) if group != '?': group += '-e10s' test['treeherder-symbol'] = join_symbol(group, symbol) if test['suite'] == 'talos': + test['talos-try-name'] += '-e10s' for i, option in enumerate(test['mozharness']['extra-options']): if option.startswith('--suite='): test['mozharness']['extra-options'][i] += '-e10s' else: test['mozharness']['extra-options'].append('--e10s') yield test Kim, do you have any knowledge about this? I see you enabled the e10s option for talos tests.
Flags: needinfo?(kmoir)
| Assignee | ||
Comment 8•4 years ago
|
||
So looking at tests.py:
521 def split_e10s(config, tests):
522 for test in tests:
..
532 if e10s:
533 test['test-name'] += '-e10s'
and later
671 def make_job_description(config, tests):
675 for test in tests:
..
682 if 'talos-try-name' in test:
683 try_name = test['talos-try-name']
684 attr_try_name = 'talos_try_name'
685 else:
686 try_name = test.get('unittest-try-name', test['test-name'])
687 attr_try_name = 'unittest_try_name'
..
698 attributes.update({
..
705 attr_try_name: try_name,
706 })
Looking in taskcluster/ci/test/tests.yml, `unittest-try-name` is only given explicitly for tasks which run without e10s. But `talos-try-name` is given for every talos job. So the fallback on line 686 to `test['test-name']` occurs for *all* e10s unittests, but for none of the e10s talos.
So the fix in comment 7 would work, but it leaves unittests and talos with markedly different implementations of the same thing, and will break when a unittest gets 'unittest-try-name' added to it in tests.yml, or when the `talos-try-name` is removed from a talos stanza in tests.yml.
I suspect the fix is to move the *-try-name setting logic earlier in `tests.py`, perhaps to `set_defaults`. Then by the time `split_e10s` runs, one of `unittest-try-name` and `talos-try-name` will definitely be set, and we can append `"-e10s"` to it.
Probably a nicer cleanup would be to combine `unittest-try-name` and `talos-try-name` into just `try-name` in the schema, and only split them out into the separate attributes (`unittest_try_name`, `talos_try_name`) in `make_job_description`. But that could be deferred to another bug.Flags: needinfo?(kmoir)
Comment 9•4 years ago
|
||
Major regressions like this (ones without any indication of failure) should be announced to mozilla.dev.platform and firefox-dev to avoid wasting developer and CPU time wondering when/if talos jobs will appear. I've now sent such an email: https://mail.mozilla.org/pipermail/firefox-dev/2017-May/005429.html
Severity: normal → critical
Keywords: regression
| Assignee | ||
Comment 10•4 years ago
|
||
@@ -415066,7 +415066,7 @@
"autoland",
"try"
],
- "talos_try_name": "chromez",
+ "talos_try_name": "chromez-e10s",
"test_chunk": "1",
"test_platform": "linux64-nightly/opt",
"unittest_flavor": "talos",
.. that's a promising diff (same for all the suites)| Comment hidden (mozreview-request) |
| Assignee | ||
Updated•4 years ago
|
Attachment #8867502 -
Flags: review?(garndt)
Comment 12•4 years ago
|
||
| mozreview-review | ||
Comment on attachment 8867502 [details] Bug 1363409: ensure e10s talos have -e10s in their try name https://reviewboard.mozilla.org/r/139042/#review142828 So I tried this out locally and it seemed to take the try syntax for specifying e10s talos tests and also obeyed "all". It also seems to obey tests that are set with "e10s: both" and scheduling non-e10s as well. To be honest, I do not know much about the in-tree scheduling so some of this a little lost on me on how it works, so I totally understand if you want a second opinion. ::: taskcluster/taskgraph/transforms/tests.py:738 (Diff revision 1) > > build_label = test['build-label'] > > - if 'talos-try-name' in test: > - try_name = test['talos-try-name'] > + try_name = test['try-name'] > + if test['suite'] == 'talos': > attr_try_name = 'talos_try_name' It looks like you got rid of this attribute from tests.yml. Wouldn't this be trying to find something that doesn't exist later?
Attachment #8867502 -
Flags: review?(garndt) → review+
| Assignee | ||
Comment 13•4 years ago
|
||
The *attributes* (keys in `task.attributes`) stick around, and are used for try filtering. I just combined the `talos-try-name` and `test-try-name` keys in test descriptions into `try-name`, since it's easy for the code to figure out which attribute the value should end up in.
Comment 14•4 years ago
|
||
Pushed by dmitchell@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/31710e5be104 ensure e10s talos have -e10s in their try name r=garndt
Comment 15•4 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/31710e5be104
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Updated•4 years ago
|
Assignee: nobody → dustin
Updated•3 years ago
|
Product: TaskCluster → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•