Closed Bug 1338871 Opened 7 years ago Closed 7 years ago

Enable Talos tests for linux64-stylo builds

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(firefox55 fixed)

RESOLVED FIXED
Tracking Status
firefox55 --- fixed

People

(Reporter: cpeterson, Assigned: kmoir)

Attachments

(8 files, 8 obsolete files)

2.32 KB, text/plain
Details
1.40 KB, patch
aobreja
: review+
kmoir
: checked-in+
Details | Diff | Splinter Review
8.09 KB, patch
aobreja
: review+
kmoir
: checked-in+
Details | Diff | Splinter Review
1.58 KB, patch
aobreja
: review+
kmoir
: checked-in+
Details | Diff | Splinter Review
7.43 KB, patch
jmaher
: review+
Details | Diff | Splinter Review
739 bytes, patch
aselagea
: review+
aobreja
: checked-in+
Details | Diff | Splinter Review
2.12 KB, patch
Details | Diff | Splinter Review
1.58 KB, patch
Callek
: review+
Details | Diff | Splinter Review
@ Kim, we'd like to enable Talos tests for the linux64-stylo builds. Note that a couple Talos tests are failing or intermittently timing out on Stylo. This is a known issue Shing is working on and, IIUC, should not block us from enabling Talos:

    - talos-chrome
    - talos-dromaeojs
    - talos-g1
    - talos-g2
    - talos-other
    - talos-svgr
    - talos-tp5o

@ Shing, the Talos try syntax you sent me in email does not include the g3 or g4 tests: try syntax `-t chromez,dromaeojs,other,g1,g2,svgr,tp5o --rebuild-talos 5`. Do you know what the g3 and g4 tests do? They're listed in taskcluster [1], but not on Trychooser.

[1] https://hg.mozilla.org/mozilla-central/file/779d10ed78f5/taskcluster/ci/test/test-sets.yml#l54

    - talos-g3
    - talos-g4
Flags: needinfo?(slyu)
Oh yes, we need g3 and g4 (I confirmed that they run correctly) 

I did part of the work in https://bugzilla.mozilla.org/show_bug.cgi?id=1328765, but when I run talos, the test result is shown under "linux64" not "linux64-stylo"[1]. Is it possible to move it to "linux64-stylo" so we can separate the stylo-vs-non-stylo data on perfherder?[2]

[1] Talos try result: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6c7d834929a76ff701671a9d0474290d188f1132
[2] Perfherder result: https://treeherder.mozilla.org/perf.html#/graphs?series=%5Btry,ff2723032e6bee08807c0d0b082c8c6af3dca6f5,1,1%5D&selected=%5Btry,ff2723032e6bee08807c0d0b082c8c6af3dca6f5,169752,76742781%5D
Flags: needinfo?(slyu) → needinfo?(kmoir)
Looking at this now
Flags: needinfo?(kmoir)
So from a technical perspective, talos tests in taskcluster run on buildbot bridge and jobs for the stylo platform are not enabled.

From a capacity standpoint, we are already over-extended in terms of jobs running on talos, which currently must run on a dedicated hardware pool (not AWS). Also, my understanding is that budget is not allocated for new quantum testing. So I'm not sure how we can enable more talos tests on stylo without taking other steps to address our daily backlog on these platforms.
discussed this in the quantum developer productivity meeting, stylo team agreed to limit these to run on m-c
Depends on: 1339185
No longer depends on: 1339185
From quantum meeting today
* reduce Talos frequency to once daily
Attached patch bug1338871.patch (obsolete) — Splinter Review
Attached file bug1338871builder.diff
Attached patch bug1338871tools.patch (obsolete) — Splinter Review
Attachment #8844212 - Attachment is obsolete: true
Attached patch bug1338871.patch (obsolete) — Splinter Review
Depends on: 1343095
Blocks: 1343095
No longer depends on: 1343095
Attachment #8844204 - Attachment is obsolete: true
Attached patch bug1338871.patch
Attachment #8844214 - Attachment is obsolete: true
Attachment #8844213 - Flags: review?(aobreja)
Attachment #8844256 - Flags: review?(aobreja)
Comment on attachment 8844256 [details] [diff] [review]
bug1338871.patch

> Ubuntu HW 12.04 Stylo x64 mozilla-central talos g2-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos g4 ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos g3 ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos g2 ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos g1 ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos svgr ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos svgr-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos dromaeojs-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos dromaeojs ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos chromez-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos tp5o ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos other ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos chromez ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos g4-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos tp5o-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos g1-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos other-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 mozilla-central talos g3-e10s ScriptFactory
6196a6215,6232
> Ubuntu HW 12.04 Stylo x64 try talos g2-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos g4 ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos g3 ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos g2 ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos g1 ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos svgr ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos svgr-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos dromaeojs-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos dromaeojs ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos chromez-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos tp5o ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos other ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos chromez ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos g4-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos tp5o-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos g1-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos other-e10s ScriptFactory
> Ubuntu HW 12.04 Stylo x64 try talos g3-e10s ScriptFactory
Attachment #8844256 - Flags: review?(aobreja) → review+
Attachment #8844213 - Flags: review?(aobreja) → review+
Attachment #8844437 - Flags: review?(aobreja)
Attachment #8844437 - Flags: review?(aobreja) → review+
Attachment #8844213 - Flags: checked-in+
Attachment #8844256 - Flags: checked-in+
Attachment #8844437 - Flags: checked-in+
In bug 1343095, the talos jobs were disabled via buildbot. I'd like to re-enable them in taskcluster; however, the quantum team has specified that they should only run once a day on m-c. This suggests the use of .cron.yml or a hook.

Looking at .cron.yml on inbound, all the jobs are builds with a target defined. Is the best approach to create a filter for the talos jobs on m-c to trigger them? I'm not sure how that would work, because we would have to trigger a build for the artifacts and then run the associated talos tests.

Dustin, what do you suggest as the best approach for running talos tests once a day for a certain branch and platform via taskcluster?
Flags: needinfo?(dustin)
I think the best example would be the valgrind cron.  That will re-run builds, as well, since we don't currently support optimizing builds, but that should only be a handful of jobs per day, right?
Flags: needinfo?(dustin)
won't the builds run all the unittests though?  right now that would be a few for stylo, maybe that is ok.
Not if you don't select them in the target task method
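
A minimal sketch of that idea, not the actual taskgraph code: a target task method is just a filter over the full task set, so a cron target can keep the stylo build and its talos tests while leaving the unit tests out. The task dictionaries, the attribute names (`kind`, `build_platform`, `suite`) and the function name are assumptions made for illustration.

```python
# Hypothetical sketch: tasks are modelled as plain dicts keyed by label,
# not as real taskgraph Task objects.

def target_tasks_stylo_talos(tasks):
    """Return the labels of the linux64-stylo build and its talos tests only."""
    selected = []
    for label, attrs in sorted(tasks.items()):
        if attrs.get("build_platform") != "linux64-stylo":
            continue
        is_build = attrs.get("kind") == "build"
        is_talos = attrs.get("kind") == "test" and attrs.get("suite") == "talos"
        if is_build or is_talos:
            selected.append(label)
    return selected


# Made-up task set used only to exercise the filter.
EXAMPLE_TASKS = {
    "build-linux64-stylo/opt": {
        "kind": "build", "build_platform": "linux64-stylo"},
    "test-linux64-stylo/opt-talos-g1": {
        "kind": "test", "build_platform": "linux64-stylo", "suite": "talos"},
    "test-linux64-stylo/opt-mochitest-1": {
        "kind": "test", "build_platform": "linux64-stylo", "suite": "mochitest"},
    "test-linux64/opt-talos-g1": {
        "kind": "test", "build_platform": "linux64", "suite": "talos"},
}
```

With this example set, only the stylo build and the stylo talos task are selected; the stylo mochitest and the non-stylo talos task are left out.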
Okay, so I will create a new target to run the linux stylo build + talos and then add this to the .cron.yml. If the patch in bug 1343095 is reverted, it will enable talos on all commits, so I'll have to change the individual talos task definitions in taskcluster/ci/test/tests.yml so they don't run.

Maybe define a new stylo build that only includes talos tests not the unit tests here

ci/test/test-platforms.yml
Attached patch bug1338871.patch cron (obsolete) — Splinter Review
This will enable the cron and the talos jobs for stylo opt e10s jobs. However, I think it also adds talos jobs on every commit where the e10s stylo jobs run. I could limit all the talos definitions in taskcluster/ci/test/tests.yml so linux stylo talos would only run on m-c. However, I think this would mean that talos tests would still run on every m-c commit for linux stylo builds. So I'm not sure how to disable them on every commit while enabling the filter for the cron.

taskcluster/ci/test/test-platforms.yml
linux64-stylo/opt:
     build-platform: linux64-stylo/opt
     test-sets:
         - stylo-tests
+        - talos
Attachment #8846106 - Flags: feedback?(dustin)
Comment on attachment 8846106 [details] [diff] [review]
bug1338871.patch cron

I think I have a solution that addresses this problem
Attachment #8846106 - Flags: feedback?(dustin)
Attached patch bug1338871-tc.patch (obsolete) — Splinter Review
This patch enables the talos tests for the cron but not on every commit
Attachment #8846106 - Attachment is obsolete: true
Attachment #8846681 - Flags: review?(dustin)
Attachment #8846681 - Flags: review?(dustin) → review+
Pushed by kmoir@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b463b127b85a
Enable Talos tests for linux64-stylo builds r=dustin
Attached patch bug1338871-tc2.patch (obsolete) — Splinter Review
Pushed by kmoir@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/252c03880423
Enable Talos tests for linux64-stylo builds r=bustage DONTBUILD
I see the cron(T) job running, but I do not see stylo talos jobs being scheduled:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=1b9293be51637f841275541d8991314ca56561a5&filter-searchStr=talos
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'll investigate
Some notes from my debugging process

nightly builds on m-c are scheduled like this

   mozilla-central: [{hour: 11, minute: 0}]

My talos jobs are scheduled like this

- {hour: 4, minute: 0}

So the talos tests are scheduled before the nightly builds are run.

However, in theory, scheduling the talos jobs should trigger the nightly builds required.

So I think I should
1) change the time for the talos jobs on m-c and watch the logs in the taskcluster tools to see what is happening.  Unfortunately, you can't see the historical logs, only the previous log.
2) Schedule the talos jobs at a time that is after the nightlies are run

Actually, now that I have written this all down, I realize that talos for stylo relies on a nightly build and there aren't any nightly builds for stylo on m-c, or anywhere for that matter. So I think it gets filtered out and the tests don't run. There aren't any stylo builds with attributes nightly = true.
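
The failure mode described here can be sketched as follows. This is an illustration under assumptions, not the real cron target code: tasks are plain dicts, the labels are made up, and the `nightly` attribute check stands in for whatever filter the cron target applied.

```python
# Hypothetical sketch of a nightly-only filter dropping the stylo talos jobs.

def filter_nightly(tasks):
    """Keep only the tasks whose attributes mark them as nightly."""
    return {label: attrs for label, attrs in tasks.items()
            if attrs.get("nightly", False)}


def talos_with_nightly_build(tasks):
    """Return talos task labels whose build survived the nightly filter."""
    nightly_builds = filter_nightly(tasks)
    return sorted(
        label for label, attrs in tasks.items()
        if attrs.get("suite") == "talos" and attrs.get("build") in nightly_builds
    )


# Made-up task set: the stylo build carries no nightly attribute.
TASKS = {
    "build-nightly/opt": {"nightly": True},
    "build-stylo/opt": {},
    "talos-nightly-g1": {"suite": "talos", "build": "build-nightly/opt"},
    "talos-stylo-g1": {"suite": "talos", "build": "build-stylo/opt"},
}
```

Because no stylo build has `nightly` set, the stylo talos task never makes it into the result, which matches the observation that the cron job ran but scheduled nothing for stylo.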
 
So I think we need to 
1) enable talos on m-c for every commit so we don't have to set up stylo nightly builds
2) change seta so it only runs every five commits or something for talos for stylo to reduce impact on talos wait times
Hmm, actually forgot that seta is not enabled on m-c.  Jmaher what do you think about enabling seta for m-c but just for stylo talos?

Otherwise we would have to enable nightly tests for stylo and we don't really need to have signed builds with updates etc for stylo.
Flags: needinfo?(jmaher)
I think we have low enough volume on m-c that it should be ok to have it working per push. Since this is done via Taskcluster, we could do m-c, although I would be worried about accidentally skipping other stuff like linux64-pgo or other tests.

maybe a set of logic inside of the decision task:
if branch=='mozilla-central' and platform=='linux64-stylo' and testtype=='talos':
    doSETA()
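
One possible reading of the pseudocode above, as a runnable sketch. `should_run_talos`, the `push_index` counter and the every-fifth-push interval are assumptions borrowed from the "every five commits" idea earlier in this bug; this is not how SETA is actually implemented.

```python
# Hypothetical sketch: throttle only stylo talos on mozilla-central.

def should_run_talos(branch, platform, test_type, push_index, interval=5):
    """Return True if the job should be scheduled for this push."""
    restricted = (
        branch == "mozilla-central"
        and platform == "linux64-stylo"
        and test_type == "talos"
    )
    if not restricted:
        # Everything else is scheduled as usual, so nothing such as
        # linux64-pgo gets skipped by accident.
        return True
    # Restricted jobs run only on every `interval`-th push.
    return push_index % interval == 0
```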

How about we start with all on m-c?
Flags: needinfo?(jmaher)
So you are suggesting to enable talos per push for linux stylo on m-c which is easy to do.  

I don't understand what the doSeta bit is. Doesn't the seta data have to be in the treeherder url

https://treeherder.mozilla.org/api/project/autoland/seta/job-priorities/?build_system_type=taskcluster&priority=5&format=json

before the jobs will be skipped.  Seta is not enabled for m-c yet.
Flags: needinfo?(jmaher)
I would like to avoid the SETA approach for now, we keep hacking more stuff onto SETA and it gets messier- I think per push on mozilla-central for stylo talos should be sufficient.
Flags: needinfo?(jmaher)
Attached patch bug1338871tc-3.patch (obsolete) — Splinter Review
enable talos on m-c for stylo for every push
Depends on: 1348948
patch to enable m-c(try if specified) talos for linux-stylo
Attachment #8846681 - Attachment is obsolete: true
Attachment #8846833 - Attachment is obsolete: true
Attachment #8849160 - Attachment is obsolete: true
Attachment #8849244 - Flags: review?(jmaher)
Comment on attachment 8849244 [details] [diff] [review]
bug1338871tc-4.patch

Review of attachment 8849244 [details] [diff] [review]:
-----------------------------------------------------------------

thanks :kmoir, this looks good.
Attachment #8849244 - Flags: review?(jmaher) → review+
Alin could you have someone from the buildduty team look at the buildbot masters in bug 1348948 and determine why the linux64-stylo talos jobs aren't appearing on the masters?  I have a lot of taskcluster migration bugs that have deadlines right now and will not be able to get to it this week.
Flags: needinfo?(aselagea)
Andrei will take a took at this.
Thanks Andrei!
Flags: needinfo?(aselagea)
Err, look*
Patch for tools. Adding platform to fx_platforms.
Attachment #8850477 - Flags: review?(aselagea)
Comment on attachment 8850477 [details] [diff] [review]
bug1338871_tools.patch

Lgtm. 

We'll probably need a graceful restart for these masters in order for the new tests to show up.
Attachment #8850477 - Flags: review?(aselagea) → review+
Also gracefully restarted:

buildbot-master103.bb.releng.scl3.mozilla.com
buildbot-master104.bb.releng.scl3.mozilla.com
buildbot-master105.bb.releng.scl3.mozilla.com

And found linux64-stylo talos jobs on the masters.
Pushed by kmoir@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c309f93f5c33
Enable Talos tests for linux64-stylo builds r=jmaher DONTBUILD
https://hg.mozilla.org/mozilla-central/rev/c309f93f5c33
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
talos jobs are still not appearing on th.  am investigating
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
So looking at a recent taskgroup id on m-c

https://tools.taskcluster.net/task-group-inspector/#/dz74gwuyT8CIPq7fLNvIrg?_k=xrdan5

If you look at the linux64 talos builds they run green (example opt talos chrome)
https://tools.taskcluster.net/task-group-inspector/#/dz74gwuyT8CIPq7fLNvIrg/PN1Im_7aQoyHmqFLF697JA?_k=qpz5b8

the corresponding linux64 stylo ones fail with a bbb payload exception
https://tools.taskcluster.net/task-group-inspector/#/dz74gwuyT8CIPq7fLNvIrg/MRH_6HQaR5ikeU3SnS7qQA?_k=rmxe75

The name in the tc payload is 
Ubuntu HW 12.04 x64 mozilla-central stylo talos chromez
The name on the buildbot master of the job is
Ubuntu HW 12.04 Stylo x64 mozilla-central talos chromez

landed a patch to fix this
https://hg.mozilla.org/build/buildbot-configs/rev/39432e03fed0
With the bb change the name is
Ubuntu HW 12.04 x64 stylo mozilla-central talos chromez

on the master which still doesn't match.  Looking at tc transforms it's
Ubuntu HW 12.04 x64 mozilla-central stylo talos chromez

This patch changes it to  
Ubuntu HW 12.04 x64 stylo mozilla-central talos chromez
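
The mismatch comes down to where the variant sits relative to the branch. A minimal sketch, assuming a hypothetical `talos_buildername` helper and a prefix string of `Ubuntu HW 12.04 x64` (the real logic lives in `mozharness_test.py` and may differ):

```python
# Hypothetical sketch of the two name orderings discussed above.

PREFIX = "Ubuntu HW 12.04 x64"  # assumed prefix for this platform


def talos_buildername(branch, test_name, variant="", variant_first=False):
    """Build a buildbot-style talos builder name.

    variant_first=False gives '<prefix> <branch> <variant> talos <test>',
    variant_first=True gives '<prefix> <variant> <branch> talos <test>'.
    """
    # The variant carries its own trailing space so that an empty
    # variant does not leave a double space in the name.
    variant_part = variant + " " if variant else ""
    if variant_first:
        return "{} {}{} talos {}".format(PREFIX, variant_part, branch, test_name)
    return "{} {} {}talos {}".format(PREFIX, branch, variant_part, test_name)


# Name on the buildbot master after the buildbot-configs change.
MASTER_NAME = "Ubuntu HW 12.04 x64 stylo mozilla-central talos chromez"

old_name = talos_buildername("mozilla-central", "chromez", "stylo")
new_name = talos_buildername("mozilla-central", "chromez", "stylo",
                             variant_first=True)
```

Here `old_name` puts the branch before the variant and so does not match the master, while `new_name` does.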
Attachment #8851651 - Flags: review?(bugspam.Callek)
Attached patch bug1338871name.patch (obsolete) — Splinter Review
wrong patch the first time
Attachment #8851651 - Attachment is obsolete: true
Attachment #8851651 - Flags: review?(bugspam.Callek)
Attachment #8851656 - Flags: review?(bugspam.Callek)
Comment on attachment 8851651 [details] [diff] [review]
bug1338871talosname.patch

I got a midair on a flag clear because of wrong patch, but here is my comment anyway.

>diff --git a/taskcluster/taskgraph/transforms/job/mozharness_test.py b/taskcluster/taskgraph/transforms/job/mozharness_test.py
>@@ -395,16 +396,22 @@ def mozharness_test_buildbot_bridge(conf
>         if m and m.group(1):
>             variant = m.group(1) + ' '
>         buildername = '{} {} {}talos {}'.format(
>             BUILDER_NAME_PREFIX[platform],
>             branch,
>             variant,
>             test_name
>         )
>+        if 'stylo' in buildername:

nit: "if variant == 'stylo'" rather than matching on buildername here.

Maybe instead do (without the above change on PREFIX):

```
  if variant == 'stylo':
    variant = 'stylo '  # trailing space needed to conform to buildbot naming
  buildername = '{} {} {}talos {}'.format(...
```
Attachment #8851651 - Attachment is obsolete: false
Comment on attachment 8851656 [details] [diff] [review]
bug1338871name.patch

Review of attachment 8851656 [details] [diff] [review]:
-----------------------------------------------------------------

Looked again at the comment where you mention the swapped place in branch vs variant.

This is good, I'd love a comment along the lines of "this variant name has branch come after the variant type in BB" (or some better worded comment) and I'd still test against variant name here rather than `foo in buildername`
Attachment #8851656 - Flags: review?(bugspam.Callek) → review+
Ubuntu HW 12.04 x64 stylo mozilla-central talos chromez

is the new name in tc with this patch
note that variant has a space at the end because of line 396
Attachment #8851656 - Attachment is obsolete: true
Attachment #8851666 - Flags: review?(bugspam.Callek)
Attachment #8851666 - Flags: review?(bugspam.Callek) → review+
Pushed by kmoir@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3a6430335a19
Enable Talos tests for linux64-stylo builds r=Callek DONTBUILD
https://hg.mozilla.org/mozilla-central/rev/3a6430335a19
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
:camd wlach suggested that you might be able to help debug why these new linux64-stylo talos jobs that are running on m-c are not appearing on treeherder. See comment 54 for details. I'm not sure why the talos jobs aren't appearing when the builds and tests for this platform are.
Flags: needinfo?(cdawson)
It's there, just on the wrong row:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=5182b2c4b963ed87d038c7d9a4021463917076cd&filter-searchStr=stylo&filter-tier=1&filter-tier=2&filter-tier=3&exclusion_profile=false&group_state=expanded&selectedJob=86795430

Buildername:  Ubuntu HW 12.04 x64 stylo mozilla-central talos g3 

This is the first buildbot stylo job, and there are no stylo regexes in Treeherder at the moment (since Taskcluster ingestion doesn't use the awful hardcoded regexes):
https://github.com/mozilla/treeherder/search?q=stylo

They'll need adding to:
https://github.com/mozilla/treeherder/blob/master/treeherder/etl/buildbot.py
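
A sketch of what such a regex-based lookup could look like. The patterns and platform labels below are hypothetical illustrations, not the contents of Treeherder's `buildbot.py`; the point is only that a more specific stylo pattern has to be tried before the generic linux64 one, otherwise the job lands on the wrong row.

```python
import re

# Hypothetical pattern table, ordered from most to least specific.
PLATFORM_PATTERNS = [
    ("linux64-stylo", re.compile(r"^Ubuntu HW 12\.04 x64 stylo ")),
    ("linux64", re.compile(r"^Ubuntu HW 12\.04 x64 ")),
]


def platform_for(buildername):
    """Return the first platform whose pattern matches the buildername."""
    for platform, pattern in PLATFORM_PATTERNS:
        if pattern.search(buildername):
            return platform
    return "unknown"
```

Without the first entry, the stylo buildername would still match the generic pattern and be reported as plain linux64.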
Flags: needinfo?(cdawson)
Thanks Ed!  Yeah, seeing those jobs threw me initially.  This explains it.

Kim: So if you (or someone related to this) would submit a PR for the change Ed described, we'll get it merged.
Flags: needinfo?(kmoir)
Depends on: 1351420
opened pr in bug 1351420
Flags: needinfo?(kmoir)
So the linux64-stylo jobs are appearing with the latest push to m-c on the staging treeherder ui

https://treeherder.allizom.org/#/jobs?repo=mozilla-central

They will appear on the regular treeherder ui once they do deployment.
No longer blocks: 1352173
Looks like there was a treeherder deploy because these are visible now.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=891981e67948aaebf7a63bba5181ef0a538ce163
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Keywords: leave-open
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard