run windows 32 and 64 bit builds on windows10-64 hardware for talos performance tests

RESOLVED FIXED in Firefox 60

Status

defect
RESOLVED FIXED
2 years ago
Last year

People

(Reporter: jmaher, Assigned: rwood)

Tracking

unspecified
mozilla61

Firefox Tracking Flags

(firefox60 fixed, firefox61 fixed)

Details

(Whiteboard: [PI:March])

Attachments

(2 attachments)

As we migrate to new hardware, we will not be installing Windows 7 as an option on the new hardware. This means we will not have 32-bit coverage, and there is still a need to ensure we don't have regressions.

Once we have Windows 10 running on the new hardware and not in buildbot, we can easily turn on testing of 32-bit binaries on the windows10-64 OS/hardware. This is not needed for every push, so we will only run this on autoland.

My thoughts are that we will do:
* autoland only
* only bisect/investigate a 5% regression, and only when the reported regression is not also seen on 64-bit Windows 10
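The proposed triage rule can be written as a small predicate. This is a minimal sketch, assuming a regression is described by its size in percent and by whether the same alert also fired on 64-bit Windows 10; the function name and the reading of "5%" as an inclusive threshold are assumptions, not part of talos or Perfherder.

```python
# Hypothetical helper illustrating the proposed triage rule for 32-bit talos
# results on windows10-64 hardware; not part of talos or Perfherder.
REGRESSION_THRESHOLD_PCT = 5.0  # assumption: a 5% regression counts


def should_investigate_win32(regression_pct, seen_on_win64):
    """Return True if a win32 regression should be bisected/investigated.

    regression_pct: size of the regression in percent (positive = worse).
    seen_on_win64: True if the same regression was already reported on
    64-bit Windows 10, in which case that alert already covers it.
    """
    return regression_pct >= REGRESSION_THRESHOLD_PCT and not seen_on_win64


print(should_investigate_win32(6.2, seen_on_win64=False))  # True
print(should_investigate_win32(6.2, seen_on_win64=True))   # False
print(should_investigate_win32(2.0, seen_on_win64=False))  # False
```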
Whiteboard: [PI:February]
Depends on: 1429597
Let's make this bug track the work to make this official.

Here is a patch that I have been using to test:
https://hg.mozilla.org/try/rev/a104cd781adbf49649662bdfeb73f417b221eb4c

I suspect it won't be backwards compatible; we should:
* fix windows7 xperf jobs to run properly (they run on a VM)
* ensure mozharness config changes are in the right files and complete
* consider splitting out reftest from that patch.

:rwood, could you pick up this work in the short term?
Blocks: 1429597
No longer depends on: 1429597
Flags: needinfo?(rwood)
Summary: run windows 32 bit builds on windows10-64 hardware for talos performance tests → run windows 32 and 64 bit builds on windows10-64 hardware for talos performance tests
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #1)
> lets make this bug track the work to make this official.
> 
> here is a patch that I have been using to test:
> https://hg.mozilla.org/try/rev/a104cd781adbf49649662bdfeb73f417b221eb4c
> 
> I suspect it won't be backwards compatible, we should:
> * fix windows7 xperf jobs to run properly (they run on a VM)
> * ensure mozharness config changes are in the right files and complete
> * consider splitting out reftest from that patch.
> 
> :rwood, could you pick up this work in the short term?

Sure... I don't quite understand, though: what mozharness config changes do you mean? I've never worked on xperf or reftests; are the changes for those, maybe? Thanks :)

I'll file dependent bugs:
- for the reftest part of your patch
- for porting xperf win7 tests to run on 32 bit builds on win 10
- for mozharness config for ?
Flags: needinfo?(rwood) → needinfo?(jmaher)
Oh I see, you already have mozharness configs in:

testing/mozharness/configs/talos/windows_config.py

I'll use this bug for that patch.
Flags: needinfo?(jmaher)
Yeah, you figured out the mozharness bits. Currently win7 xperf runs on a VM, and it should remain on a VM when this patch is ready for review; no need for another bug.

We have bug 1435844 for reftests; things are moving along!
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #4)
> yeah, you figured out the mozharness bits- currently win7 xperf runs on VM,
> it should remain on VM when this patch is ready for review- no need for
> another bug.
> 
> we have bug 1435844 for reftests- things are moving along!

Ok thank you sir!
Assignee: nobody → rwood
Status: NEW → ASSIGNED
In my try run (comment 7), the patch is working except for the known xperf failure. It is running on the AWS VM.
Update: The win32 tests (on Win 10 host) should run on: ['mozilla-beta', 'mozilla-central', 'mozilla-inbound', 'autoland', 'try']
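The branch restriction in this update amounts to a membership check on the project name. A minimal sketch follows; the constant and function names are hypothetical, and in taskgraph this kind of restriction is probably expressed through a per-test setting such as `run-on-projects` rather than a standalone helper (that detail is an assumption here).

```python
# Hypothetical sketch: gate the win32-on-win10-hardware talos jobs by project.
# The list mirrors the update above; names are illustrative, not taskgraph's.
WIN32_TALOS_ON_WIN10_HW_PROJECTS = frozenset([
    'mozilla-beta',
    'mozilla-central',
    'mozilla-inbound',
    'autoland',
    'try',
])


def runs_win32_talos_on_win10_hw(project):
    """Return True if the given project should schedule these jobs."""
    return project in WIN32_TALOS_ON_WIN10_HW_PROJECTS


for project in ('autoland', 'mozilla-release', 'try'):
    print(project, runs_win32_talos_on_win10_hw(project))
```

Any project outside the list, such as mozilla-release, would simply not schedule these jobs.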

Also note, talos g2 (damp) fails consistently on "complicated.netmonitor". There's an intermittent bug open for that; however, this seems consistent on the new h/w, so I'm going to disable "complicated.netmonitor" in the damp test.
(In reply to Robert Wood [:rwood] from comment #10)
> Update: The win32 tests (on Win 10 host) should run on: ['mozilla-beta',
> 'mozilla-central', 'mozilla-inbound', 'autoland', 'try']
> 
> Also note, talos g2 (damp) fails consistently on "complicated.netmonitor".
> There's an intermittent open for that however this seems consistent on the
> new h/w so I'm going to disable "complicated.netmonitor" from the damp test.

Could you do that only on Windows?
(Services.appinfo.OS == "WINNT")
Also, I fixed some races around the netmonitor DAMP test in bug 1419327.
Could you check whether it fixes this one?
(In reply to Alexandre Poirot [:ochameau] from comment #12)
> Also, I fixed some races around netmonitor DAMP test in bug 1419327.
> It would be handy to check if it fixes this one?

Thanks, Alexandre. Here's a try run on the new talos Windows hardware, with your damp test patch imported from bug 1419327:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9bd6e1fa2b3e408a36a855977c0c627e9a5f19e7
Xperf (win7-opt) runs on AWS (i.e. machine i-01579509c8d7db2cc) [1]:

15:27:43     INFO -              u'xperf-e10s': {u'pagesets_name': u'tp5n.zip',
15:27:43     INFO -                              u'talos_options': [u'--xperf_path',
15:27:43     INFO -                                                 u'"c:/Program Files/Microsoft Windows Performance Toolkit/xperf.exe"'],

15:28:39     INFO - Calling ['Z:\\task_1518015759\\build\\venv\\Scripts\\python', 'Z:\\task_1518015759\\build\\tests\\talos\\talos\\run_tests.py', '--branchName', 'try', '--suite', 'xperf-e10s', '--executablePath', 'Z:\\task_1518015759\\build\\application\\firefox\\firefox', '--symbolsPath', 'https://queue.taskcluster.net/v1/task/Sq8JNtALRYqEf5n56ewXew/artifacts/public/build/target.crashreporter-symbols.zip', '--title', 'i-01579509c8d7db2cc', '--webServer', 'localhost', '--webServer', 'localhost', '--webServer', 'localhost', '--webServer', 'localhost', '--log-tbpl-level=debug', '--log-errorsummary=Z:\\task_1518015759\\build\\blobber_upload_dir\\xperf-e10s_errorsummary.log', '--log-raw=Z:\\task_1518015759\\build\\blobber_upload_dir\\xperf-e10s_raw.log'] with output_timeout 3600
15:28:40     INFO -  ERROR: xperf.exe cannot be found at the path specified
15:28:40    ERROR - Return code: 1

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=c3b5145e074c92a234249c6791ce20c9bfbccca5&selectedJob=160860154
The xperf.exe path is the same as in the existing xperf job on win7 AWS [1]:

14:46:14     INFO -              u'xperf-e10s': {u'pagesets_name': u'tp5n.zip',
14:46:14     INFO -                              u'talos_options': [u'--xperf_path',
14:46:14     INFO -                                                 u'"c:/Program Files/Microsoft Windows Performance Toolkit/xperf.exe"'],

I have no idea why it's failing on the taskcluster job (comment 15).

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=talos&selectedJob=160852600
Ahh, 'c:/Program Files/Microsoft Windows Performance Toolkit' is missing from the PATH in the tc job; maybe that's it.
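One way to tell the two explanations apart (a wrong explicit path versus a PATH problem) is to check both directly on the worker. The helper below is hypothetical and is not how talos itself locates xperf; it assumes the --xperf_path value may still carry the literal double quotes seen in talos_options above.

```python
import os


def diagnose_xperf(explicit_path, search_path, exe_name='xperf.exe'):
    """Hypothetical diagnostic: report where xperf.exe can be found.

    Returns (explicit_ok, on_path):
      explicit_ok -- True if explicit_path (quotes stripped) is an existing file
      on_path     -- full path of exe_name in the first search_path directory
                     that contains it, or None if no directory does
    """
    explicit_ok = os.path.isfile(explicit_path.strip('"'))
    on_path = None
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, exe_name)
        if directory and os.path.isfile(candidate):
            on_path = candidate
            break
    return explicit_ok, on_path


if __name__ == '__main__':
    # On the worker, compare the configured path with what PATH provides.
    print(diagnose_xperf(
        '"c:/Program Files/Microsoft Windows Performance Toolkit/xperf.exe"',
        os.environ.get('PATH', ''),
    ))
```

A result of (False, None) would point at the toolkit simply not being installed on the image, rather than at PATH.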
(In reply to Robert Wood [:rwood] from comment #14)
> (In reply to Alexandre Poirot [:ochameau] from comment #12)
> > Also, I fixed some races around netmonitor DAMP test in bug 1419327.
> > It would be handy to check if it fixes this one?
> 
> Thanks Alexandre, here's a try run on the new talos windows hardware, with
> your damp test patch imported from bug 1419327:
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=9bd6e1fa2b3e408a36a855977c0c627e9a5f19e7

Unfortunately, it still fails on 'complicated.netmonitor', so for now I'll disable that subtest inside DAMP on Windows only, as Alexandre suggested in comment 11.
Comment on attachment 8949562 [details]
Bug 1431161 - run windows 32 and 64 bit builds on windows10-64 hardware for talos performance tests;

https://reviewboard.mozilla.org/r/218916/#review224646

Just need to sort out the win_worker_type_platform.

::: taskcluster/taskgraph/transforms/tests.py:928
(Diff revision 1)
>              test['worker-type'] = MACOSX_WORKER_TYPES['macosx64']
>          elif test_platform.startswith('win'):
> -            win_worker_type_platform = WINDOWS_WORKER_TYPES[
> -                test_platform.split('/')[0]
> -            ]
> -            if test.get('suite', '') == 'talos' and 'ccov' not in test['build-platform']:
> +            # for talos xperf we want win7vm; all else on win10 hw
> +            if test.get('suite', '') == 'talos' and "--suite=xperf-e10s" in \
> +              test['mozharness']['extra-options']:
> +                win_worker_type_platform = WINDOWS_WORKER_TYPES['windows7-32']

I am concerned that all other win7 unittests will have problems as well; this check only covers talos xperf. Maybe:
if suite == talos:
   wintype = win10
   

or something like:
if test['virtualization'] == hardware:
  wintype = win10
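To make the second suggestion concrete: keying the choice on the test's virtualization rather than its suite keeps VM unittests on their VM worker types. The sketch below is a simplified, hypothetical stand-in for the transform in taskcluster/taskgraph/transforms/tests.py; the nested dictionary layout, the placeholder worker-type strings, and the 'virtual'/'hardware' values are assumptions.

```python
# Hypothetical, simplified worker-type table; the real WINDOWS_WORKER_TYPES
# values differ. Keys: platform -> virtualization -> worker type.
WINDOWS_WORKER_TYPES = {
    'windows7-32': {'virtual': 'placeholder-win7-vm'},
    'windows10-64': {
        'virtual': 'placeholder-win10-vm',
        'hardware': 'placeholder-win10-hw',
    },
}


def choose_windows_worker_type(test):
    """Pick a worker type from the test's virtualization, not its suite."""
    if test.get('virtualization') == 'hardware':
        # Hardware jobs (32- or 64-bit builds) all go to the win10 machines.
        return WINDOWS_WORKER_TYPES['windows10-64']['hardware']
    # Everything else keeps the VM worker type of its own platform.
    platform = test['test-platform'].split('/')[0]
    return WINDOWS_WORKER_TYPES[platform]['virtual']


print(choose_windows_worker_type(
    {'test-platform': 'windows7-32/opt', 'virtualization': 'hardware'}))
print(choose_windows_worker_type(
    {'test-platform': 'windows10-64/debug', 'virtualization': 'virtual'}))
```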

::: testing/mozharness/configs/talos/windows_vm_config.py:57
(Diff revision 1)
>          "win64": "python3_x64.manifest",
>      },
>      "env": {
>          # python3 requires C runtime, found in firefox installation; see bug 1361732
> -        "PATH": "%(PATH)s;c:\\slave\\test\\build\\application\\firefox;"
> +        "PATH": "%(PATH)s;c:\\slave\\test\\build\\application\\firefox;" \
> +                "c:\\Program Files\\Microsoft Windows Performance Toolkit\\;"

Is this needed?
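For reference, the two adjacent string literals in that "PATH" entry are concatenated by Python, and the trailing backslash after the first literal is only a line continuation, which is redundant inside the dict's braces. The sketch below shows the resulting value, assuming mozharness fills %(PATH)s from the current environment with %-formatting; the sample PATH value is illustrative.

```python
# Minimal sketch of how the "env" PATH entry expands, assuming %(PATH)s is
# filled in with Python %-formatting from the worker's existing PATH.
config_path = (
    "%(PATH)s;c:\\slave\\test\\build\\application\\firefox;"
    "c:\\Program Files\\Microsoft Windows Performance Toolkit\\;"
)

expanded = config_path % {"PATH": r"C:\Windows\system32"}
print(expanded)
# C:\Windows\system32;c:\slave\test\build\application\firefox;c:\Program Files\Microsoft Windows Performance Toolkit\;
```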

::: testing/talos/talos/tests/devtools/addon/content/damp.html:15
(Diff revision 1)
> +// Bug 1400580 disable 'complicated.netmonitor' on Win
> +ChromeUtils.import("resource://gre/modules/Services.jsm");
> +var run_complicated_netmonitor = true;
> +if (Services.appinfo.OS == "WINNT") {
> +  run_complicated_netmonitor = false;
> +}

Scope creep, but I am fine with it being in here.
Attachment #8949562 - Flags: review?(jmaher) → review-
:markco, can you update us in this bug with the total number of available Windows moonshot machines?
Flags: needinfo?(mcornmesser)
Right now there are 29 that are ready to pick up tasks. If it helps, I can get additional machines stood up Friday morning. Is there a specific number currently needed?

I plan to begin deploying the balance of the Windows nodes by Monday morning at the latest.
Flags: needinfo?(mcornmesser)
Comment on attachment 8949562 [details]
Bug 1431161 - run windows 32 and 64 bit builds on windows10-64 hardware for talos performance tests;

https://reviewboard.mozilla.org/r/218916/#review224646

> I am concerned that all other win7 unittests will have problems as well- here this is for talos xperf only.  Maybe:
> if suite == talos:
>    wintype = win10
>    
> 
> or something like:
> if test['virtualization'] == hardware:
>   wintype = win10

I'll try that, thanks.

> is this needed?

I thought it was, but I'll take it out and try without it, thanks.

> scope creep, but I am fine with it in here.

Yeah, good point. I'm going to file a separate bug for that and cc the test owner.
I'm going to move the change to the DAMP test to its own bug, Bug 1437028.
Depends on: 1437028
Comment on attachment 8949562 [details]
Bug 1431161 - run windows 32 and 64 bit builds on windows10-64 hardware for talos performance tests;

https://reviewboard.mozilla.org/r/218916/#review224804

Looking much better, but there is still a concern in the transform.

::: taskcluster/taskgraph/transforms/tests.py:929
(Diff revisions 1 - 2)
> -              test['mozharness']['extra-options']:
> -                win_worker_type_platform = WINDOWS_WORKER_TYPES['windows7-32']
> -            else:
>                  win_worker_type_platform = WINDOWS_WORKER_TYPES['windows10-64']
> +            else:
> +                win_worker_type_platform = WINDOWS_WORKER_TYPES['windows7-32']

This won't work, as we are forcing win10-vm jobs to run on win7. This transform can change the machine type we choose for all jobs, in this case both unittests and talos perf tests.
Attachment #8949562 - Flags: review?(jmaher) → review-
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #36)
> Comment on attachment 8949562 [details]
> Bug 1431161 - run windows 32 and 64 bit builds on windows10-64 hardware for
> talos performance tests;
> 
> https://reviewboard.mozilla.org/r/218916/#review224804
> 
> looking much better, but still a concern in the transform.
> 
> ::: taskcluster/taskgraph/transforms/tests.py:929
> (Diff revisions 1 - 2)
> > -              test['mozharness']['extra-options']:
> > -                win_worker_type_platform = WINDOWS_WORKER_TYPES['windows7-32']
> > -            else:
> >                  win_worker_type_platform = WINDOWS_WORKER_TYPES['windows10-64']
> > +            else:
> > +                win_worker_type_platform = WINDOWS_WORKER_TYPES['windows7-32']
> 
> this won't work as we are forcing win10-vm jobs to run on win7.  This
> transform can change the machine type we choose for all jobs, in this case
> unittests and talos perf tests.

I didn't know there were any Win 10 VM jobs; I thought all Win 10 was h/w. Alright, scratch that one.
All mochitest, xpcshell, web-platform-tests, etc. run on VMs; only talos and reftest run on hardware. Reftest is on hardware only because of issues with the win10 VM; ideally only talos would run on hardware.
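The rule above can be summarized as a suite-to-virtualization mapping. This is a hypothetical helper, not taskgraph code; it reuses the `--suite=xperf-e10s` check from the earlier diff to keep talos xperf on the win7 VM, as agreed earlier in this bug, and treats the suite names as plain strings.

```python
# Hypothetical helper, not taskgraph code: which Windows test suites need
# physical machines under the rule described above.
HARDWARE_SUITES = {'talos', 'reftest'}  # reftest only due to win10 VM issues


def virtualization_for(suite, extra_options=()):
    """Return 'hardware' or 'virtual' for a Windows test job."""
    if suite == 'talos' and '--suite=xperf-e10s' in extra_options:
        # talos xperf stays on the win7 VM.
        return 'virtual'
    return 'hardware' if suite in HARDWARE_SUITES else 'virtual'


for suite, opts in [('mochitest', ()), ('talos', ()),
                    ('talos', ('--suite=xperf-e10s',)), ('reftest', ())]:
    print(suite, opts, virtualization_for(suite, opts))
```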
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #38)
> all mochitest, xpcshell, web-platform-tests, etc. run on vm- only talos and
> reftest run on hardware- reftest is due to issues with the win10 vm, ideally
> it would only be talos on hardware.

Ah, thanks. Yeah, I keep thinking this is only for talos and not *all* test jobs.
Thanks for the feedback. OK, I *think* I have it correct now :)
Comment on attachment 8949562 [details]
Bug 1431161 - run windows 32 and 64 bit builds on windows10-64 hardware for talos performance tests;

https://reviewboard.mozilla.org/r/218916/#review224884

Excellent. We need to wait until machines are available. Can you do a try run with:
./mach try -b do -p win32,win64 -u all -t all
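For readers unfamiliar with the legacy try syntax, the sketch below shows what each option in that command selects. It is a simplified, hypothetical parser built with argparse, not mach's implementation: -b picks build types (d = debug, o = opt), -p a comma-separated platform list, -u unittest suites and -t talos suites ("all" in both cases here).

```python
import argparse

# Simplified, hypothetical parser for the legacy try syntax used above.
# It is NOT mach's implementation; it only shows what each option selects.
parser = argparse.ArgumentParser(prog='try-syntax-sketch')
parser.add_argument('-b', dest='build_types')  # 'd' = debug, 'o' = opt
parser.add_argument('-p', dest='platforms')    # comma-separated platforms
parser.add_argument('-u', dest='unittests')    # unittest suites, or 'all'
parser.add_argument('-t', dest='talos')        # talos suites, or 'all'

BUILD_TYPES = {'d': 'debug', 'o': 'opt'}


def parse_try_syntax(args):
    ns = parser.parse_args(args)
    return {
        'build_types': [BUILD_TYPES[c] for c in ns.build_types],
        'platforms': ns.platforms.split(','),
        'unittests': ns.unittests,
        'talos': ns.talos,
    }


print(parse_try_syntax(['-b', 'do', '-p', 'win32,win64', '-u', 'all', '-t', 'all']))
# {'build_types': ['debug', 'opt'], 'platforms': ['win32', 'win64'], 'unittests': 'all', 'talos': 'all'}
```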
Attachment #8949562 - Flags: review?(jmaher) → review+
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #44)
> Comment on attachment 8949562 [details]
> Bug 1431161 - run windows 32 and 64 bit builds on windows10-64 hardware for
> talos performance tests;
> 
> https://reviewboard.mozilla.org/r/218916/#review224884
> 
> excellent, we need to wait until machines are available.  Can you do a try
> run with:
> ./mach try -b do -p win32,win64 -u all -t all

Thanks for the review, and right - we can't land it until the pool is ready. Good idea, thanks - landed '-u all -t all' on try (comment 45).
Rebased (and fixed conflicts) and landing on try again
Blocks: 1434056
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a1711e96c622
run windows 32 and 64 bit builds on windows10-64 hardware for talos performance tests; r=jmaher
Backed out for talos performance test failures. 

backout: https://hg.mozilla.org/integration/autoland/rev/878d64506602a00a86f0c8e2eb970909a0276949

push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=a1711e96c6227cb839be7dee8e6c0d6ec9ae1750&selectedJob=163523588

failure log: https://tools.taskcluster.net/groups/ZYqwCbsUR5ejHuvQBaHUGQ/tasks/ejHxjg34ToKgAxr9uY3N0w/runs/0/logs/public%2Flogs%2Flive_backing.log

[taskcluster 2018-02-21T19:37:47.587Z] TASK FAIL since the task payload is invalid. See errors:
[taskcluster 2018-02-21T19:37:47.587Z] - supersederUrl: Additional property supersederUrl is not allowed
[taskcluster 2018-02-21T19:37:47.588Z] Task not successful due to following exception(s):
[taskcluster 2018-02-21T19:37:47.588Z] Exception 1)
[taskcluster 2018-02-21T19:37:47.588Z] Validation of payload failed for task ejHxjg34ToKgAxr9uY3N0w
[taskcluster 2018-02-21T19:37:47.588Z]
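The error above is the kind produced by a payload schema that forbids unknown top-level keys (JSON Schema `additionalProperties: false`). A minimal sketch of that check follows; ALLOWED_PAYLOAD_KEYS is an illustrative subset, not the real worker schema. The follow-up patch in this bug temporarily turned off coalescing on the new hardware, which presumably kept supersederUrl out of these payloads; that link is an inference, not something stated in the bug.

```python
# Minimal sketch of a schema check that rejects unknown payload keys.
# ALLOWED_PAYLOAD_KEYS is an illustrative subset, not the real schema.
ALLOWED_PAYLOAD_KEYS = {'command', 'maxRunTime', 'env', 'artifacts', 'mounts'}


def validate_payload(payload, allowed=ALLOWED_PAYLOAD_KEYS):
    """Return one error string per top-level key the schema does not allow."""
    errors = []
    for key in sorted(payload):
        if key not in allowed:
            errors.append('%s: Additional property %s is not allowed' % (key, key))
    return errors


payload = {
    'command': ['run-talos'],
    'maxRunTime': 3600,
    'supersederUrl': 'https://example.invalid/superseder',  # hypothetical URL
}
print(validate_payload(payload))
# ['supersederUrl: Additional property supersederUrl is not allowed']
```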
Flags: needinfo?(rwood)
Thanks, Natalia. We have a potential fix under review.
Flags: needinfo?(rwood)
Blocks: 1439991
Comment on attachment 8952844 [details]
Bug 1431161 - Temporarily turn off coalescing on new win tc h/w;

https://reviewboard.mozilla.org/r/222070/#review228352

Beautiful!
Attachment #8952844 - Flags: review?(pmoore) → review+
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0160e724e111
run windows 32 and 64 bit builds on windows10-64 hardware for talos performance tests; r=jmaher
https://hg.mozilla.org/integration/autoland/rev/213725db126c
Temporarily turn off coalescing on new win tc h/w; r=pmoore
https://hg.mozilla.org/mozilla-central/rev/0160e724e111
https://hg.mozilla.org/mozilla-central/rev/213725db126c
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
And here is the "massive" set of wins we get:
https://treeherder.mozilla.org/perf.html#/alerts?id=11706
Backout by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5bc49a32f706
backout for win10 hardware failures. r=me
Backout by archaeopteryx@coole-files.de:
https://hg.mozilla.org/mozilla-central/rev/c9ec0c37349c
backout for win10 hardware failures. r=me a=backout CLOSED TREE
Status: RESOLVED → REOPENED
Flags: needinfo?(rwood)
Resolution: FIXED → ---
Target Milestone: mozilla60 → ---
Flags: needinfo?(rwood)
Blocks: 1439694
Looks like this backout also stopped the Windows QR talos test jobs from running at all. I had just turned those on recently in bug 1440968, and they're not showing up on Treeherder any more.
Reopened the review and rebased it, to be ready for when we try to land this again.
Depends on: 1441208
:markco, any luck figuring out the disk space issues?
Flags: needinfo?(mcornmesser)
Whiteboard: [PI:February] → [PI:March]
The generic worker upgrade (Bug 1443589) will address this.
Depends on: 1443589
Flags: needinfo?(mcornmesser)
Blocks: 1280365
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4992808ab5a3
run windows 32 and 64 bit builds on windows10-64 hardware for talos performance tests. r=rwood
https://hg.mozilla.org/mozilla-central/rev/4992808ab5a3
Status: REOPENED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Whiteboard: [PI:March] → [PI:March][checkin-needed-beta]
https://hg.mozilla.org/releases/mozilla-beta/rev/156e4b64363d
Whiteboard: [PI:March][checkin-needed-beta] → [PI:March]
See Also: → 1458638