1561324 - Determine Windows configuration options that reduce noise on reference laptop (windows10-64-ref-hw-2017)

Assignee

Description

•

5 years ago

Using the reference laptop in CI and locally we see very significant variations in performance results.
This makes getting reproducible results extremely difficult.

The purpose of this bug is to collect Windows configuration options that minimize OS-induced noise.

These are the configuration options that I've been disabling locally:
• Indexing Service (file search)
• Windows Defender (default antivirus)

Andrew Creskey [:acreskey]

Assignee

Comment 1

•

5 years ago

Denis, Mike, I know that you two have had some success in reducing noise on the 2017 reference laptop.

Can you please add any OS features that you are disabling?

Flags: needinfo?(mconley)

Flags: needinfo?(dpalmeiro)

Robert Wood [:rwood]

Updated

•

5 years ago

Priority: -- → P1

Florian Quèze [:florian]

Comment 2

•

5 years ago

In my experience, operating system updates make the system much slower because they trigger a lot of disk activity, both while they are being downloaded and for the next ~10h after they have been installed, while Windows is 'optimizing' stuff on the disk.

Not sure if your scripts already include this, but when I was trying to get numbers automatically from this hardware, I used a script that waited for CPU idle and disk idle before starting Firefox.
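For illustration, a minimal sketch of this kind of idle wait. It is not the actual script mentioned above: the function name, thresholds, and the `sample` callback are all assumptions. The sampler is passed in so the loop can be exercised without touching the real machine.

```python
import time
from typing import Callable, Tuple

# Hypothetical sampler type: returns (cpu_percent, disk_ops_since_last_sample).
Sampler = Callable[[], Tuple[float, int]]


def wait_for_idle(
    sample: Sampler,
    cpu_threshold: float = 5.0,    # assumed threshold, in percent
    disk_ops_threshold: int = 10,  # assumed limit on reads + writes per sample
    quiet_samples: int = 3,        # consecutive quiet samples required
    timeout: float = 60.0,         # seconds before giving up
    interval: float = 1.0,         # seconds between samples
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once the machine looks idle, False if the timeout expires."""
    deadline = clock() + timeout
    quiet = 0
    while clock() < deadline:
        cpu, disk_ops = sample()
        if cpu < cpu_threshold and disk_ops <= disk_ops_threshold:
            quiet += 1
            if quiet >= quiet_samples:
                return True
        else:
            quiet = 0  # any busy sample resets the streak
        sleep(interval)
    return False
```

On a real machine the sampler could be built on a system-metrics package such as psutil (third-party, an assumption here), and the caller would start Firefox only when `wait_for_idle(...)` returns True.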

Denis Palmeiro [:denispal]

Comment 3

•

5 years ago

Windows Defender was the big one for me. After I turned that off (I used 3 different ways to do this to make sure it's never on), the machine became quite usable. Other than that, I just make sure disk usage is close to 0% before I begin my tests.

Flags: needinfo?(dpalmeiro)

Mike Conley (:mconley) (:⚙️)

Comment 4

•

5 years ago

I disable the Superfetch / Prefetch stuff (now called SysMain in Services), because otherwise, I was noticing a big shift in measurement over time as Windows "learned" what I liked to run during start-up.
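As a sketch of how that could be scripted rather than done through the Services UI: the helper names below are hypothetical, and the only external tool assumed is the standard Windows `sc` utility, run from an elevated prompt. The default is a dry run that only prints the commands.

```python
import subprocess
import sys
from typing import List

SERVICE = "SysMain"  # service name as given in the comment above


def sysmain_disable_commands(service: str = SERVICE) -> List[List[str]]:
    """Build the sc invocations that stop a service and keep it from restarting."""
    return [
        ["sc", "stop", service],                          # stop the running instance
        ["sc", "config", service, "start=", "disabled"],  # do not start it at boot
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only on Windows when dry_run is False."""
    rendered = []
    for cmd in commands:
        line = " ".join(cmd)
        rendered.append(line)
        print(line)
        if not dry_run and sys.platform == "win32":
            # check=False: stopping a service that is already stopped is not fatal
            subprocess.run(cmd, check=False)
    return rendered


if __name__ == "__main__":
    run_commands(sysmain_disable_commands())  # dry run by default
```

Passing `dry_run=False` on the test machine applies the change; re-enabling the service afterwards is left to whoever owns the machine image.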

Flags: needinfo?(mconley)

Greg Mierzwinski [:sparky]

Comment 5

•

5 years ago

When I was running some tests locally, to reduce noise, I disabled Bluetooth, enabled a metered network connection, and disabled Windows updates.

Randell Jesup [:jesup] (needinfo me)

Comment 6

•

5 years ago

I should note that Windows Defender on is the "default" mode that users will use systems in, so our testing should reflect that.

Greg Mierzwinski [:sparky]

Comment 7

•

5 years ago

:jesup, the problem with leaving defaults on is that we introduce false positives/negatives into our data, which makes it more difficult to tell whether a performance issue comes from Firefox changes or from OS tasks, whose resource usage interacts poorly with Firefox, having intermittently started. However, based on this, I'm thinking it might be worthwhile to look into testing interoperability throughput performance separately from application-only throughput performance.

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

5 years ago

Assignee: nobody → acreskey

Status: NEW → ASSIGNED

Andrew Creskey [:acreskey]

Assignee

Comment 8

•

5 years ago

To get an idea of where we stand, I made a fresh baseline here, on windows10-64-ref-hw-2017 and also windows10-64:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=792c62b2ad217ba4c0a49c639d1f6696eac2578b&newProject=try&newRevision=5aec96c94b123de3b4b7d2af1292c86ac20e3e01&framework=10

Andrew Creskey [:acreskey]

Assignee

Updated

•

5 years ago

Summary: Determine Windows configuration options that reduce noise on reference laptop (-ux) → Determine Windows configuration options that reduce noise on reference laptop (windows10-64-ref-hw-2017)

Andrew Creskey [:acreskey]

Assignee

Comment 9

•

5 years ago

•

Edited

Noise is still a major problem on the reference laptop.
Comparing 10 runs against 10 runs of the same changeset I see large differences:

• Amazon warm load metrics off by ~10%
• Facebook loadtime off by 13%
• Netflix metrics off by ~10%
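For reference, a small sketch of how a same-changeset comparison like this can be summarized as a percentage. It compares the medians of two sets of replicates; this is an illustrative calculation, not Perfherder's own, and the numbers in the usage example are made up.

```python
from statistics import median
from typing import Sequence


def percent_delta(base: Sequence[float], new: Sequence[float]) -> float:
    """Percentage change of the median of `new` relative to the median of `base`."""
    base_median = median(base)
    if base_median == 0:
        raise ValueError("base median is zero; percentage change is undefined")
    return (median(new) - base_median) / base_median * 100.0


if __name__ == "__main__":
    # Made-up load times in ms for two batches of the same changeset.
    batch_a = [2000, 2100, 1900]
    batch_b = [2200, 2300, 2100]
    print(f"{percent_delta(batch_a, batch_b):+.1f}%")
```

With identical code on both sides, any non-trivial delta reported this way is noise from the environment rather than a real regression or improvement.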

Andrew Creskey [:acreskey]

Assignee

Comment 10

•

5 years ago

Attached image netflix.png

Andrew Creskey [:acreskey]

Assignee

Comment 11

•

5 years ago

Attached image netflix_replicates.png

This is a particularly interesting replicates view.
Note the batch of loadtimes that come in at ~200ms while the median is about 2000ms.
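A rough way to pull such a batch out of the replicates for inspection is to split on a fraction of the overall median. This is only a sketch: the function name and the 0.25 ratio are assumptions, chosen so that values around 200ms separate from a median around 2000ms.

```python
from statistics import median
from typing import List, Sequence, Tuple


def split_fast_outliers(
    replicates: Sequence[float], ratio: float = 0.25
) -> Tuple[List[float], List[float]]:
    """Separate replicates that fall below `ratio` times the overall median."""
    cutoff = median(replicates) * ratio
    fast = [r for r in replicates if r < cutoff]   # suspiciously quick loads
    rest = [r for r in replicates if r >= cutoff]  # everything else
    return fast, rest
```

Counting how many replicates land in the fast group per job would show whether the second mode appears in every run or only in some of them.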

Andrew Creskey [:acreskey]

Assignee

Comment 12

•

5 years ago

Ionut, can I ask for your thoughts on the noise in this comparison?
It's a changeset compared against itself on windows10-64-ref-hw-2017 and also windows10-64
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=792c62b2ad217ba4c0a49c639d1f6696eac2578b&newProject=try&newRevision=5aec96c94b123de3b4b7d2af1292c86ac20e3e01&framework=10

Perfherder is picking up two significant changes on the windows10-64-ref-hw-2017
I also see a 9% and a 10% change on raptor-tp6-imgur-firefox and raptor-tp6-outlook-firefox for windows10-64

I don't have any experience with sheriffing, but I would imagine that all of these are problematic.

If we could solve just the issues that lead to the changes marked as "Significant/Important" by perfherder, would that get us most of the value?

Flags: needinfo?(igoldan)

Ionuț Goldan [:igoldan]

Comment 13

•

5 years ago

•

Edited

(In reply to Andrew Creskey from comment #12)

Ionut, can I ask for your thoughts on the noise in this comparison?
It's a changeset compared against itself on windows10-64-ref-hw-2017 and also windows10-64
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=792c62b2ad217ba4c0a49c639d1f6696eac2578b&newProject=try&newRevision=5aec96c94b123de3b4b7d2af1292c86ac20e3e01&framework=10

Perfherder is picking up two significant changes on the windows10-64-ref-hw-2017
I also see a 9% and a 10% change on raptor-tp6-imgur-firefox and raptor-tp6-outlook-firefox for windows10-64

I don't have any experience with sheriffing, but I would imagine that all of these are problematic.

Indeed, this is a weird situation. I actually see more changes here than those you mentioned. They vary from +/- 4% to 10%.
Seems like our Windows platform's environments aren't yet quite suited for properly running perf tests.

If we could solve just the issues that lead to the changes marked as "Significant/Important" by perfherder, would that get us most of the value?

Yes, I see this as a valuable step forward.

Flags: needinfo?(igoldan)

Andrew Creskey [:acreskey]

Assignee

Comment 14

•

5 years ago

Thank you Ionut.

I did do some quick tests:

1. Disable OCSP and compare against same revision
This is a known source of noise on the reference laptop. I didn't think it would have any impact here because we connect to mitmproxy and thus don't use the actual site certificates.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1a1b8f51b847ef0affe3f17df8a399911ded8021&newProject=try&newRevision=344295d59f90f2e5a1bb2c4aa471c0e903bbe60c&framework=10

Maybe I got lucky on the runs but this comparison doesn't show any flagged perf differences.
So this could be worth looking into.
If this did reduce noise in the test environment, we could argue for disabling OCSP in the perf profile, since OCSP itself will be replaced in the not-too-distant future.

2. Defer setTimeouts() during pageload (would otherwise run on idle)

This is another known source of bimodal behaviour.

Comparing this job against itself also gives a perfherder diff with no flagged perf differences. (although still quite a bit of noise).
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=411785ebb7de902bc37af52759d4d4e2aef83532&newProject=try&newRevision=71cb7ba08f32eea6891e401174323e62209050bf&framework=10

If these results had been smoother, maybe we could consider a 'deterministic load' preference for the perf profile, but I'm not sure...

Andrew Creskey [:acreskey]

Assignee

Comment 15

•

5 years ago

Back to the bug as logged.

Dave, who could explain to me the differences in the OS setup between the reference laptops in test (i.e. windows10-64-ref-hw-2017) and the devices that, to my understanding, run virtualized on AWS, such as windows10-64 ?

Presumably the images for windows10-64 on AWS don't allow system updates and Windows Defender to be running?

Flags: needinfo?(dave.hunt)

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 16

•

5 years ago

(In reply to Andrew Creskey from comment #15)

Dave, who could explain to me the differences in the OS setup between the reference laptops in test (i.e. windows10-64-ref-hw-2017) and the devices that, to my understanding, run virtualized on AWS, such as windows10-64 ?

Kendall: Could someone from your team help to understand the differences between these platforms in automation, or point Andrew to the relevant documentation/configuration?

Flags: needinfo?(dave.hunt) → needinfo?(klibby)

Kendall Libby [:fubar] (he/him)

Comment 17

•

5 years ago

Mark knows the most about the ref laptops and is familiar with AWS; redirecting NI to him.

Flags: needinfo?(klibby) → needinfo?(mcornmesser)

Mark Cornmesser [:markco] OOO 2024/04/15

Comment 18

•

5 years ago

The two most significant differences are the quality of the hardware and the Windows 10 build: 1803 for AWS and 1703 for the reference laptops. General configuration items like Windows Update and Windows Defender are disabled for both platforms.

Do we have examples of the noisy tests from the last week? If so I can start looking through papertrail logs and see if anything obvious jumps out.

Also feel free to hit me up on Slack or send a meeting invite to discuss this in further detail.

Flags: needinfo?(mcornmesser)

Andrew Creskey [:acreskey]

Assignee

Comment 19

•

5 years ago

Thank you Mark.

I think raptor-tp6-netflix-firefox loadtime opt is as good as any for a noisy test example.
If you see anything in the papertrail, I would be curious.
My hunch is that the runtime is fighting with the OS for resources like the slow platter drive, but I don't know for sure.

I'll try some local testing with the script and suggestion from Comment 2 and Comment 3 (wait until the disk is quiet before starting tests) to see if that helps.

By the way, disabling OCSP is not helpful here; I was just lucky in Comment 14.
Here are two pushes with ~10 retries of the same revision compared: 4 tests flagged as significant changes (~8% to 13%)
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3ebb29d9815219c462e95200016d6fadd84331dc&newProject=try&newRevision=d8fdc966ff39defbe61af3491ffde17fe6983bb8&framework=10

Mark Cornmesser [:markco] OOO 2024/04/15

Comment 20

•

5 years ago

There was nothing significant in Papertrail.

Is this an issue that has progressively gotten worse, or have these tests always been noisy?

Andrew Creskey [:acreskey]

Assignee

Comment 21

•

5 years ago

Thanks for looking, Mark.
As far as I know these tests have always been very noisy.
These are some related bugs:
Bug 1525017 (8 months ago)
Bug 1536090 (7 months ago)

Comment 22

•

5 years ago

I ran a test where I made raptor wait for idle CPU (below 3%) and disk (no new activity), as in :florian's script.

However, within the time given to wait (only 15 seconds), the device never comes close to idle:

For example,

[task 2019-10-03T15:27:53.365Z] 15:27:53     INFO -  raptor-main Info: CPU use: 27.7%
[task 2019-10-03T15:27:53.365Z] 15:27:53     INFO -  raptor-main Info: AJC - disk reads: 11
[task 2019-10-03T15:27:53.365Z] 15:27:53     INFO -  raptor-main Info: AJC - disk writes: 8

and

[task 2019-10-03T15:25:27.378Z] 15:25:27     INFO -  raptor-main Info: CPU use: 5.8%
[task 2019-10-03T15:25:27.378Z] 15:25:27     INFO -  raptor-main Info: AJC - disk reads: 321
[task 2019-10-03T15:25:27.378Z] 15:25:27     INFO -  raptor-main Info: AJC - disk writes: 4

I'll try to relax the conditions and give it a bit more time to wait.
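To make the gap concrete, here is a sketch that parses one group of the log lines above and checks it against the idle condition described in this comment. The regexes match only the message text shown here, the helper names are hypothetical, and treating the disk numbers as activity since the previous sample is an assumption.

```python
import re
from typing import Dict, Iterable

LOG = """\
[task 2019-10-03T15:27:53.365Z] 15:27:53     INFO -  raptor-main Info: CPU use: 27.7%
[task 2019-10-03T15:27:53.365Z] 15:27:53     INFO -  raptor-main Info: AJC - disk reads: 11
[task 2019-10-03T15:27:53.365Z] 15:27:53     INFO -  raptor-main Info: AJC - disk writes: 8
"""

_PATTERNS = {
    "cpu": re.compile(r"CPU use: ([\d.]+)%"),
    "reads": re.compile(r"disk reads: (\d+)"),
    "writes": re.compile(r"disk writes: (\d+)"),
}


def parse_sample(lines: Iterable[str]) -> Dict[str, float]:
    """Extract the CPU and disk readings from one group of log lines."""
    sample: Dict[str, float] = {}
    for line in lines:
        for key, pattern in _PATTERNS.items():
            match = pattern.search(line)
            if match:
                sample[key] = float(match.group(1))
    return sample


def is_idle(sample: Dict[str, float], cpu_limit: float = 3.0, disk_limit: float = 0) -> bool:
    """Idle means CPU below cpu_limit and at most disk_limit disk operations."""
    disk_ops = sample["reads"] + sample["writes"]
    return sample["cpu"] < cpu_limit and disk_ops <= disk_limit


if __name__ == "__main__":
    sample = parse_sample(LOG.splitlines())
    print(sample, "idle" if is_idle(sample) else "busy")
```

Relaxing the conditions then amounts to raising `cpu_limit` and `disk_limit` and rerunning the same check.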

Andrew Creskey [:acreskey]

Assignee

Comment 23

•

5 years ago

Mark, I forgot to ask you -- can you tell me if the Windows Indexing Service is disabled on these configurations?

Flags: needinfo?(mcornmesser)

Andrew Creskey [:acreskey]

Assignee

Comment 24

•

5 years ago

I'll investigate this further, but I was able to get the reference laptop to be roughly idle before starting the pageload tests.

I reduced the raptor post_startup_delay from 30 seconds to 1 second and instead made the runner wait for <5% CPU usage and only a handful of disk reads/writes.

The wait for near idle seems to take between 25 and 45 seconds.

I'm now bumping into test timeouts, but it could be an error in how I've set this up.

Mark Cornmesser [:markco] OOO 2024/04/15

Comment 25

•

5 years ago

(In reply to Andrew Creskey from comment #23)

Mark, I forgot to ask you -- can you tell me if the Windows Indexing Service is disabled on these configurations?

It is disabled.

Flags: needinfo?(mcornmesser)

Andrew Creskey [:acreskey]

Assignee

Comment 26

•

5 years ago

I've spun off Bug 1589356 based on Florian's script from comment 2: waiting for the OS to be idle before Raptor starts a test (warm or cold load).
Early results are promising, at least on the other desktop hardware.

Comment 27

•

5 years ago

Mark, I think this is the last question -- can you tell me if Windows Superfetch / Prefetch is disabled, as described in comment 4?
This could, at least theoretically, introduce some irregularities into the page load tests.

Flags: needinfo?(mcornmesser)

Mark Cornmesser [:markco] OOO 2024/04/15

Comment 28

•

5 years ago

(In reply to Andrew Creskey from comment #27)

Mark, I think this is the last question -- can you tell me if Windows Superfetch / Prefetch is disabled, as described in comment 4?
This could, at least theoretically, introduce some irregularities into the page load tests.

Currently those services are not explicitly disabled. I have asked Bitbar to check one of the laptops to see if the services are running or not.

Flags: needinfo?(mcornmesser)

Mark Cornmesser [:markco] OOO 2024/04/15

Comment 29

•

5 years ago

Bitbar verified that Superfetch was running.

Andrew Creskey [:acreskey]

Assignee

Comment 30

•

5 years ago

Thank you Mark.
Let me ask around for input on this.
Disabling Superfetch could give us more reliable results (again reducing 'realism' in the same way that Windows Update, Windows Defender, and Windows Indexing Service are disabled).

Andrew Creskey [:acreskey]

Assignee

Comment 31

•

5 years ago

The view of the performance team was that Superfetch should not impact pageload performance.

And it turns out that :denispal had done tests that confirm this.

So I'll close this bug -- it doesn't look like there's anything to be done here.

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

Resolution: --- → WONTFIX

Mike Conley (:mconley) (:⚙️)

Comment 32

•

5 years ago

FWIW, Superfetch / Prefetch would definitely impact startup tests. Is that a consideration here, or is this bug strictly about page load?

Flags: needinfo?(acreskey)

Andrew Creskey [:acreskey]

Assignee

Comment 33

•

5 years ago

I did create this bug to try and reduce the page load noise but if it could help other tests, that would be great.
Specifically the target was the Windows configuration used in CI (AWS and Bitbar devices).

I know next to nothing about the startup tests -- are they running on AWS and or on the reference laptop in automation?

Flags: needinfo?(acreskey)

Mike Conley (:mconley) (:⚙️)

Comment 34

•

5 years ago

(In reply to Andrew Creskey from comment #33)

I know next to nothing about the startup tests -- are they running on AWS and or on the reference laptop in automation?

They will eventually be running on the reference laptop in automation.

Andrew Creskey [:acreskey]

Assignee

Comment 35

•

5 years ago

(In reply to Mike Conley (:mconley) (:⚙️) (Wayyyy behind on needinfos) from comment #34)

(In reply to Andrew Creskey from comment #33)

I know next to nothing about the startup tests -- are they running on AWS and or on the reference laptop in automation?

They will eventually be running on the reference laptop in automation.

Interesting.

Then let's flip this around: is there any reason to not disable SuperFetch/Sysmain in the automation Windows configurations?

If we're favouring reproducible results in general then I think this can't hurt.

If startup tests are coming to automation then I think this is absolutely necessary.

Mark, I'm leaning on you again for thoughts, next steps?

Status: RESOLVED → REOPENED

Flags: needinfo?(mcornmesser)

Resolution: WONTFIX → ---

Mark Cornmesser [:markco] OOO 2024/04/15

Comment 36

•

5 years ago

The start up testing is going to be a very small pool separate from other reference laptops.

I can set up a laptop with a testing workerType in automation, and have Superfetch disabled on that laptop. We will then be able to push tests to it with changes similar to https://hg.mozilla.org/try/rev/c7c581111bdf320defefe476560897c7c810d62e . It is such a small pool of nodes that we would have to stick to one or two testing nodes.

Flags: needinfo?(mcornmesser)

Andrew Creskey [:acreskey]

Assignee

Comment 37

•

5 years ago

Thanks again Mark.
Given that there is already work planned for setting up the separate startup testing pool, I don't see anything else to do here.

Status: REOPENED → RESOLVED

Closed: 5 years ago → 5 years ago

Resolution: --- → WONTFIX

netflix.png 5 years ago Andrew Creskey [:acreskey] 228.31 KB, image/png
netflix_replicates.png 5 years ago Andrew Creskey [:acreskey] 542.90 KB, image/png