Closed Bug 1475196 Opened 2 years ago Closed 2 years ago

we are running talos on win7/10 and win7/10-msvc builds

Categories

(Testing :: Talos, enhancement)

enhancement
Not set

Tracking

(firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: jmaher, Assigned: ahal)

Details

Attachments

(3 files, 1 obsolete file)

we recently turned on msvc builds for windows, and it appears we are duplicating our perf tests on all the integration branches on windows- this is making our windows infrastructure often backlogged.
:dmajor- would you know more about what tests are expected to run and on which branches for the windows msvc?  if we are doing this for a short time, it is ok, but longer than a month I would like to restrict what we run.
Flags: needinfo?(dmajor)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #0)
> we recently turned on msvc builds for windows, and it appears we are
> duplicating our perf tests on all the integration branches on windows- this
> is making our windows infrastructure often backlogged.

Am I understanding correctly, that the msvc builds run on m-c only, but somehow there is duplication of perf tests across _all_ integration branches? Can you point me to an example of such a job on, say, m-i?
Flags: needinfo?(dmajor) → needinfo?(jmaher)
inbound:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=msvc%20talos

autoland:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=msvc%20talos

mozilla-central:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=msvc%20talos

I thought we were going to run this and all tests on mozilla-central as this is tier-2; if you remove my treeherder filter word of talos, you can see all the unittests running as well.
Flags: needinfo?(jmaher)
The intent was certainly just to run on m-c. I removed the filter for "talos" leaving just "msvc" and I don't see the unittests on m-i though. Is there something special about talos that doesn't honor my run-on-projects attribute?
sorry, I had seen some test-verify jobs on mozilla-inbound and as I was switching branches I saw the full set on mozilla-central.  I admit I jumped the gun with connecting the dots incorrectly.  I think it is ok to have test-verify run on there, but it isn't necessary.
This is probably coming from lines such as this one https://searchfox.org/mozilla-central/source/taskcluster/ci/test/talos.yml#87

The direct fix would be to add a by-test-platform for msvc that restricts them to m-c. But perhaps it would be better to have these jobs use `built-projects`? (I don't feel super comfortable making the latter change myself though)
:ahal, could you help pick this up?  I will be on pto and back on Tuesday.
Flags: needinfo?(ahal)
(In reply to Joel Maher ( :jmaher ) (UTC+2) (PTO: Back July 17) from comment #5)
> I think it is ok to have
> test-verify run on there, but it isn't necessary.

We should probably disable test-verify too, because (I suspect) its presence is also triggering build tasks on every m-i/autoland push, which wasn't intended.
Just clarifying..

The goal here is to disable talos on Windows msvc platforms on autoland and inbound (but leave them on central and try)?
Assignee: nobody → ahal
Status: NEW → ASSIGNED
Flags: needinfo?(ahal)
(In reply to Andrew Halberstadt [:ahal] from comment #9)
> Just clarifying..
> 
> The goal here is to disable talos on Windows msvc platforms on autoland and
> inbound (but leave them on central and try)?

Correct. And the same for test-verify jobs.
This refactoring will make it easier to set 'run-on-projects' keys across all talos tasks
in one go (rather than needing to modify each task individually).

The job-defaults are recursively (and naively) merged into the task definitions, so each
task that changes 'run-on-projects' also needs to use the 'by-test-platform' key.
The windows7-32.* line is modified because it also matches the msvc tasks, and taskgraph
errors out if there is any ambiguity over which value to use.

Depends on D2136.
This should work, but I want to diff the taskgraphs before/after first to make sure there are no unintended side effects. I'll likely flag them for review on Monday.
Great, thanks. I'd feel better if someone better versed in this stuff would be the reviewer, Dustin maybe?
Jmaher is probably the best reviewer. Since he'll be back tomorrow (and I'm working through some unintended side effects in my patch anyway), I'll just flag him.
Attachment #8992041 - Attachment is obsolete: true
Attached file Taskgraph diff
This is a diff of the full task graph (before and after). You'll notice that a lot of the "test-verify" and "code-coverage" tasks have been changed from 1 chunk to 0 chunks.

Joel, this is happening from that "perfile_number_of_chunks" function you added awhile back:

Is this chunk changing normal/expected? Should I be worried about these differences? It's a bit of a mystery to me.
Flags: needinfo?(jmaher)
Attachment #8992362 - Attachment is patch: false
Comment on attachment 8992040 [details]
Bug 1475196 - [ci] Move 'run-on-projects' to the job-defaults section in taskcluster/ci/talos.yml

Joel Maher ( :jmaher ) (UTC+2) has approved the revision.

https://phabricator.services.mozilla.com/D2136
Attachment #8992040 - Flags: review+
Comment on attachment 8992042 [details]
Bug 1475196 - [ci] Stop scheduling test-verify tasks on Windows msvc builds on inbound/autoland

Joel Maher ( :jmaher ) (UTC+2) has approved the revision.

https://phabricator.services.mozilla.com/D2138
Attachment #8992042 - Flags: review+
I am not sure why you are seeing different chunks for verify and coverage jobs.  Could it be that you are comparing apples to apples with the same base revision on mozilla-central?  One thing I found while working on test-verify is that while ./mach locally, it would use the latest taskgraph/data from the tip of mozilla-central, it would work as expected one minute and running it a minute later when the tip changes on m-c I would get different results.
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher ) (UTC+2) from comment #21)
> I am not sure why you are seeing different chunks for verify and coverage
> jobs.  Could it be that you are comparing apples to apples with the same
> base revision on mozilla-central?  One thing I found while working on
> test-verify is that while ./mach locally, it would use the latest
> taskgraph/data from the tip of mozilla-central, it would work as expected
> one minute and running it a minute later when the tip changes on m-c I would
> get different results.

I'm comparing locally only (without my patch vs with my patch).

But looking again, it looks like that 'perfile_number_of_chunks' function you wrote is using the changed files of the commit as an input. So I guess having the chunks change there isn't an error, I just happen to not be changing anything that would trigger those tasks. In other words, I think that change is expected and safe to ignore.
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e6291f882109
[ci] Move 'run-on-projects' to the job-defaults section in taskcluster/ci/talos.yml r=jmaher
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1d510f299cf3
[ci] Stop scheduling test-verify tasks on Windows msvc builds on inbound/autoland r=jmaher
Hm, running `arc diff` didn't update the commit message.. that's not good.
https://hg.mozilla.org/mozilla-central/rev/e6291f882109
https://hg.mozilla.org/mozilla-central/rev/1d510f299cf3
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Just to make sure: is it expected we no longer schedule MSVC builds on inbound/autoland? I was just wondering why they disappeared and this bug seems to suggest we only wanted to change things for tests/Talos.
Yes, the intent was to run builds and tests on m-c only. The builds weren't explicitly set to run on inbound, but the tests triggered the builds as a dependency, so removing the tests got us back to the intended situation.
You need to log in before you can comment on or make changes to this bug.