Closed Bug 1712682 Opened 3 years ago Closed 2 years ago

47.3 - 2.71% google-docs LastVisualChange / google-docs PerceptualSpeedIndex + 8 more (Linux, Windows) regression on Fri May 21 2021

Categories

(Testing :: geckodriver, defect, P3)

Tracking

(firefox-esr78 unaffected, firefox88 unaffected, firefox89 unaffected, firefox90 affected)

RESOLVED WONTFIX

People

(Reporter: Bebe, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, perf-alert, regression)

Perfherder has detected a browsertime performance regression from push 94a871824a5da2db5dfda13c3a9c51b37c1da275. Since you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
47% google-docs LastVisualChange linux1804-64-shippable-qr cold webrender 8,716.67 -> 12,840.00
47% google-docs LastVisualChange linux1804-64-shippable-qr cold webrender 8,690.00 -> 12,760.00
42% google-docs LastVisualChange windows10-64-shippable cold 7,811.58 -> 11,126.67
23% twitter FirstVisualChange linux1804-64-shippable warm 160.00 -> 196.67
18% twitter FirstVisualChange linux1804-64-shippable warm 170.00 -> 200.00
6% google-docs PerceptualSpeedIndex linux1804-64-shippable-qr cold webrender 2,019.92 -> 2,134.00
6% google-docs PerceptualSpeedIndex windows10-64-shippable cold 1,819.88 -> 1,920.33
5% google-docs LastVisualChange linux1804-64-shippable warm 6,556.67 -> 6,856.67
3% google-docs LastVisualChange linux1804-64-shippable-qr warm webrender 6,643.33 -> 6,833.33
3% google-docs PerceptualSpeedIndex linux1804-64-shippable cold 1,976.79 -> 2,030.33

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
10% google-slides FirstVisualChange linux1804-64-shippable-qr warm webrender 358.10 -> 323.33
4% instagram PerceptualSpeedIndex linux1804-64-shippable-qr warm webrender 453.71 -> 434.58
3% google-slides SpeedIndex linux1804-64-shippable-qr warm webrender 616.67 -> 597.50
2% google-docs SpeedIndex linux1804-64-shippable warm 1,228.17 -> 1,202.83

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(hskupin)

Greg, do you have any explanation for that? I cannot believe that this tiny change in Marionette causes a delay of up to 4 s for the last visual change on Google Docs. All my patch does is wait for the next tick in the content process whenever a command that would modify the DOM is run via Marionette. The patch prevents a potential data inconsistency between the parent and child process.
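
For illustration only, this is a rough sketch of the "wait for the next tick" idea described above, not the actual Marionette patch; runCommand and modifiesDom are made-up names:

    // Hypothetical sketch (not the real patch): defer completion of a
    // DOM-modifying command until the next tick of the content process event
    // loop, so pending mutations are seen before the parent process is notified.
    async function runCommand(command) {
      const result = command.execute();

      if (command.modifiesDom) {
        // Wait one event-loop turn; this adds at most one tick per command.
        await new Promise(resolve => setTimeout(resolve, 0));
      }

      return result;
    }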

How does the screen recording happen? Is it based on Marionette commands? I somewhat suspect that this is what is happening here, hence the delay for Google Docs, which is indeed JS heavy. Note that the duration of a Marionette command should never affect the actual page load timings.

So what kinds of geckodriver / Marionette commands are actually utilized by Browsertime? Does it maybe run a lot of custom JS code via Execute (Async) Script? Also requesting needinfo from Peter, who works on Browsertime.

Flags: needinfo?(peter)
Flags: needinfo?(gmierz2)

Four seconds sounds like a lot. For last visual change I always try to look at the screenshots to better understand what happens.

It works like this: Browsertime uses a JavaScript snippet to decide when the page has finished loading (executed through WebDriver etc.). By default Browsertime uses page load strategy "none" (--browsertime.pageLoadStrategy) and then waits --browsertime.pageCompleteCheckStartWait before the first JavaScript check is executed (by default that is 5 seconds). If the page hasn't finished loading yet, it waits --browsertime.pageCompleteCheckPollTimeout (default 1.5 s) before running the JavaScript again. So when you test a page, the first check runs after 5 seconds, and then, depending on the phone/server/internet/page and whether the page has finished loading, every 1.5 seconds after that. You can also tune the JavaScript that decides whether the page has finished by setting your own --browsertime.pageCompleteCheck. I'm not familiar with the configuration you use, but you could try increasing the time until the first run and see if it changes your metrics.
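
A rough sketch of that polling behavior (the option names are Browsertime's real flags with the defaults quoted above; the loop itself is illustrative, not Browsertime's actual implementation):

    // Illustrative poll loop: wait pageCompleteCheckStartWait before the first
    // check, then re-run the page-complete script every
    // pageCompleteCheckPollTimeout ms until it reports the page as finished.
    const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

    async function waitForPageComplete(runPageCompleteCheck, options = {}) {
      const {
        pageCompleteCheckStartWait = 5000,   // default: 5 seconds
        pageCompleteCheckPollTimeout = 1500, // default: 1.5 seconds
      } = options;

      await sleep(pageCompleteCheckStartWait);
      while (!(await runPageCompleteCheck())) {
        await sleep(pageCompleteCheckPollTimeout);
      }
    }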

Set release status flags based on info from the regressing bug 1661591

Here's what I'm seeing in the perfherder-data.json files for the gdocs-vismet.

Before:

        {
          "name": "LastVisualChange",
          "replicates": [
            8800,
            8720,
            8880,
            8760,
            8480,
            8600,
            8080,
            8520,
            8600,
            8480,
            8600,
            7720,
            8760,
            13160,
            8600,
            8600,
            8560,
            8600,
            8920,
            8560,
            8720,
            8200,
            8640,
            12520,
            8600
          ],
          "lowerIsBetter": true,
          "unit": "ms",
          "shouldAlert": true,
          "value": 8600
        },

After:

        {
          "name": "LastVisualChange",
          "replicates": [
            12880,
            13200,
            8600,
            13080,
            12800,
            13160,
            12320,
            9000,
            13240,
            13080,
            12560,
            13120,
            1360,
            7400,
            13600,
            13400,
            12640,
            13280,
            13320,
            13000,
            12800,
            12240,
            13160,
            13080,
            12600
          ],
          "lowerIsBetter": true,
          "unit": "ms",
          "shouldAlert": true,
          "value": 13000
        },

It seems like this Marionette patch moved the data from one data mode to another.

The settings that we use for browsertime can be found here: https://searchfox.org/mozilla-central/source/testing/raptor/raptor/browsertime/base.py#235 (note that we set our pageLoadStrategy to none)

Changing these options will certainly change our results, but we selected this set since it gave the lowest noise.

It seems like we should accept this change as a WONTFIX because (as per :whimboo's comment above) the patch directly impacts JS commands that are run through Marionette, which are used a lot in Browsertime. :whimboo, you can find an overview of what's happening in Browsertime in one single iteration here: https://github.com/sitespeedio/browsertime/blob/main/lib/core/engine/iteration.js#L83-L331

Flags: needinfo?(gmierz2)

(In reply to Peter Hedenskog from comment #2)

It works like this: Browsertime uses a JavaScript snippet to decide when the page has finished loading (executed through WebDriver etc.). By default Browsertime uses page load strategy "none" (--browsertime.pageLoadStrategy) and then waits --browsertime.pageCompleteCheckStartWait

So what exactly is executed by this script? Does it directly check document.readyState? If that is the case then it's clearly broken behavior and should be avoided. Instead, a custom event listener should be set up for the page load events that, once the appropriate event has happened, stores the exact time in a custom property. Or maybe window.performance already contains some data that would help with that?

Please let us know what exactly happens here so that we can see how to make it more reliable. I'm a bit worried about just accepting this, given that compared to Chrome (chromedriver) it could result in worse numbers even though the page load isn't actually that slow.
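
A minimal sketch of the event-listener approach suggested above (illustrative only; __loadEventTime is a made-up property name):

    // Record the exact load time once via an event listener, so a later
    // Execute Script call can read it back without polling readyState.
    window.addEventListener("load", () => {
      window.__loadEventTime = performance.now();
    }, { once: true });

    // Alternatively, the Navigation Timing API already tracks this;
    // loadEventEnd stays 0 until the load event has actually fired.
    const [nav] = performance.getEntriesByType("navigation");
    console.log(nav && nav.loadEventEnd);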

I don't know how you run it since it's configurable, but if you don't configure anything, this is the default code:

https://github.com/sitespeedio/browsertime/blob/main/lib/core/pageCompleteChecks/defaultPageCompleteCheck.js

(In reply to Greg Mierzwinski [:sparky] from comment #4)

It seems like this Marionette patch moved the data from one data mode to another.

Could you please explain what that actually means in terms of Browsertime? What is this data mode?

It seems like we should accept this change as a WONTFIX because (as per :whimboo's comment above) the patch directly impacts JS commands that are run through Marionette, which are used a lot in Browsertime. :whimboo, you can find an overview of what's happening in Browsertime in one single iteration here: https://github.com/sitespeedio/browsertime/blob/main/lib/core/engine/iteration.js#L83-L331

(In reply to Peter Hedenskog from comment #6)

I don't know how you run it since it's configurable, but if you don't configure anything, this is the default code:

https://github.com/sitespeedio/browsertime/blob/main/lib/core/pageCompleteChecks/defaultPageCompleteCheck.js

Greg, so there is nothing else that gets changed on our side regarding how Browsertime actually runs the page load check script? Given that the performance numbers are directly retrieved, this is really suspicious.

Flags: needinfo?(peter)
Flags: needinfo?(hskupin)
Flags: needinfo?(gmierz2)

By data mode I'm talking about a peak in a histogram of the data values. When this patch landed, the peak of the histogram moved into what was previously the intermittent mode (see how the values shifted from ~8000 to ~13000, and note that we already had some ~13000 values in the data from before this patch).
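
To make that concrete, here is a small sketch that bins the replicates quoted above into 1000 ms buckets; the values are taken directly from the two perfherder-data.json excerpts:

    // Histogram of the LastVisualChange replicates, 1000 ms buckets.
    const before = [8800, 8720, 8880, 8760, 8480, 8600, 8080, 8520, 8600, 8480,
                    8600, 7720, 8760, 13160, 8600, 8600, 8560, 8600, 8920, 8560,
                    8720, 8200, 8640, 12520, 8600];
    const after = [12880, 13200, 8600, 13080, 12800, 13160, 12320, 9000, 13240,
                   13080, 12560, 13120, 1360, 7400, 13600, 13400, 12640, 13280,
                   13320, 13000, 12800, 12240, 13160, 13080, 12600];

    const histogram = values =>
      values.reduce((bins, v) => {
        const bucket = Math.floor(v / 1000) * 1000;
        bins[bucket] = (bins[bucket] || 0) + 1;
        return bins;
      }, {});

    console.log(histogram(before)); // 22 values in the 8000s, plus 7720, 12520, 13160
    console.log(histogram(after));  // 21 values between 12000 and 13999, only 1360, 7400, 8600, 9000 below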

There is nothing that changed on our side regarding how Browsertime tests are run or how the pageload check is done. Based on your patch, could it be that before this patch we were prematurely ending the pageload?

Flags: needinfo?(gmierz2) → needinfo?(hskupin)

Ah, I see. But what does the value mean exactly here? Is it the duration in milliseconds? If so, it would be good to know where these extra 5000 ms come from. When I had a look at the browsertime config I saw that --pageCompleteCheckPollTimeout is configured with 1 s, so it should repeat the page-load check every second. But there is also --pageCompleteCheckStartWait, which is set to 5 s:

https://searchfox.org/mozilla-central/rev/da5d08750e504f3710f7ea051327d9c311c39902/testing/raptor/raptor/browsertime/base.py#245-248

So could it be that the first page load check is delayed because some other script runs in the first ~8 s?
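
As a back-of-the-envelope illustration of those settings (5 s start wait, 1 s poll interval, per the base.py link above), the checks can only fire on a fixed grid, so anything that finishes shortly after one poll is only picked up on the next one:

    // Illustrative timeline of when page-complete checks can run with the quoted settings.
    const startWait = 5000;    // --pageCompleteCheckStartWait
    const pollInterval = 1000; // --pageCompleteCheckPollTimeout
    const checkTime = k => startWait + k * pollInterval;

    console.log([0, 1, 2, 3, 4, 5, 6, 7, 8].map(checkTime));
    // [5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000, 13000]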

Greg, how could such a browsertime job be run with trace logging enabled in geckodriver? It would be great to see one in CI to be able to check this delay.

Flags: needinfo?(hskupin) → needinfo?(gmierz2)

Yes, it's the duration in milliseconds. Another approach could be to use https://github.com/sitespeedio/browsertime/blob/main/lib/core/pageCompleteChecks/pageCompleteCheckByInactivity.js or something similar if there are pages that fire the load event early and then still do a lot of work/requests after that.
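
An illustrative version of such an inactivity-style check, not Browsertime's actual implementation, could use the Resource Timing API roughly like this:

    // Hypothetical inactivity check: treat the page as complete once the load
    // event has fired and no resource has finished loading for quietPeriodMs.
    function isPageIdle(quietPeriodMs = 2000) {
      const resources = performance.getEntriesByType("resource");
      const lastActivity = resources.reduce(
        (last, r) => Math.max(last, r.responseEnd),
        0
      );
      return document.readyState === "complete" &&
             performance.now() - lastActivity > quietPeriodMs;
    }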

That sounds reasonable in this case. I would just assume that Google Docs does a lot of XHR after pageshow. Greg, what do you think?

:whimboo, if you add --verbose to the test task you should be able to obtain those logs. Your assumption about XHR is probably right; Google Docs takes a really long time to load even without your patch.

Flags: needinfo?(gmierz2)

Setting P3.

Henrik: Do we still plan to investigate this or should it be closed as WONTFIX as suggested in comment 4?

Flags: needinfo?(hskupin)
Priority: -- → P3

Yes, let's mark it as WONTFIX based on comment 4.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(hskupin)
Resolution: --- → WONTFIX