Closed Bug 1712682 Opened 3 years ago Closed 2 years ago

47.3 - 2.71% google-docs LastVisualChange / google-docs PerceptualSpeedIndex + 8 more (Linux, Windows) regression on Fri May 21 2021

Categories

(Testing :: geckodriver, defect, P3)

Tracking

(firefox-esr78 unaffected, firefox88 unaffected, firefox89 unaffected, firefox90 affected)

RESOLVED WONTFIX

People

(Reporter: Bebe, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, perf-alert, regression)

Perfherder has detected a browsertime performance regression from push 94a871824a5da2db5dfda13c3a9c51b37c1da275. Since you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
47% google-docs LastVisualChange linux1804-64-shippable-qr cold webrender 8,716.67 -> 12,840.00
47% google-docs LastVisualChange linux1804-64-shippable-qr cold webrender 8,690.00 -> 12,760.00
42% google-docs LastVisualChange windows10-64-shippable cold 7,811.58 -> 11,126.67
23% twitter FirstVisualChange linux1804-64-shippable warm 160.00 -> 196.67
18% twitter FirstVisualChange linux1804-64-shippable warm 170.00 -> 200.00
6% google-docs PerceptualSpeedIndex linux1804-64-shippable-qr cold webrender 2,019.92 -> 2,134.00
6% google-docs PerceptualSpeedIndex windows10-64-shippable cold 1,819.88 -> 1,920.33
5% google-docs LastVisualChange linux1804-64-shippable warm 6,556.67 -> 6,856.67
3% google-docs LastVisualChange linux1804-64-shippable-qr warm webrender 6,643.33 -> 6,833.33
3% google-docs PerceptualSpeedIndex linux1804-64-shippable cold 1,976.79 -> 2,030.33

Improvements:

Ratio Suite Test Platform Options Absolute values (old vs new)
10% google-slides FirstVisualChange linux1804-64-shippable-qr warm webrender 358.10 -> 323.33
4% instagram PerceptualSpeedIndex linux1804-64-shippable-qr warm webrender 453.71 -> 434.58
3% google-slides SpeedIndex linux1804-64-shippable-qr warm webrender 616.67 -> 597.50
2% google-docs SpeedIndex linux1804-64-shippable warm 1,228.17 -> 1,202.83

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(hskupin)

Greg, do you have any explanation for that? I cannot believe that this tiny change in Marionette causes a delay of up to 4 s for the last visual change on Google Docs. All my patch does is wait for the next tick in the content process whenever a command that would modify the DOM is run via Marionette. The patch prevents a potential data inconsistency between the parent and child process.
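
For illustration only, this is a rough sketch of the "wait for the next tick" idea described above, not the actual Marionette patch; runCommand and modifiesDom are made-up names:

    // Hypothetical sketch (not the real patch): defer completion of a
    // DOM-modifying command until the next tick of the content process event
    // loop, so pending mutations are seen before the parent process is notified.
    async function runCommand(command) {
      const result = command.execute();

      if (command.modifiesDom) {
        // Wait one event-loop turn; this adds at most one tick per command.
        await new Promise(resolve => setTimeout(resolve, 0));
      }

      return result;
    }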

How does the screen recording happen? Is it based on Marionette commands? I somewhat suspect that this is what is happening here, hence the delay for Google Docs, which is indeed JS heavy. Note that the duration of a Marionette command should never affect the actual page load timings.

So what kinds of geckodriver / Marionette commands are actually utilized by Browsertime? Does it maybe run a lot of custom JS code via Execute (Async) Script? Also requesting needinfo from Peter, who works on Browsertime.

Flags: needinfo?(peter)
Flags: needinfo?(gmierz2)

Four seconds sounds like a lot. For last visual change I always try to look at the screenshots to better understand what happens.

It works like this: Browsertime uses a JavaScript snippet to decide when the page has finished loading (executed through WebDriver etc.). By default Browsertime uses page load strategy "none" (--browsertime.pageLoadStrategy) and then waits --browsertime.pageCompleteCheckStartWait before the first JavaScript check is executed (by default that is 5 seconds). If the page hasn't finished loading yet, it waits --browsertime.pageCompleteCheckPollTimeout (default 1.5 s) before running the JavaScript again. So when you test a page, the first check runs after 5 seconds, and then, depending on the phone/server/internet/page and whether the page has finished loading, every 1.5 seconds after that. You can also tune the JavaScript that decides whether the page has finished by setting your own --browsertime.pageCompleteCheck. I'm not familiar with the configuration you use, but you could try increasing the time until the first run and see if it changes your metrics.
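
A rough sketch of that polling behavior (the option names are Browsertime's real flags with the defaults quoted above; the loop itself is illustrative, not Browsertime's actual implementation):

    // Illustrative poll loop: wait pageCompleteCheckStartWait before the first
    // check, then re-run the page-complete script every
    // pageCompleteCheckPollTimeout ms until it reports the page as finished.
    const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

    async function waitForPageComplete(runPageCompleteCheck, options = {}) {
      const {
        pageCompleteCheckStartWait = 5000,   // default: 5 seconds
        pageCompleteCheckPollTimeout = 1500, // default: 1.5 seconds
      } = options;

      await sleep(pageCompleteCheckStartWait);
      while (!(await runPageCompleteCheck())) {
        await sleep(pageCompleteCheckPollTimeout);
      }
    }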

Set release status flags based on info from the regressing bug 1661591

Here's what I'm seeing in the perfherder-data.json files for the gdocs-vismet.

Before:

        {
          "name": "LastVisualChange",
          "replicates": [
            8800,
            8720,
            8880,
            8760,
            8480,
            8600,
            8080,
            8520,
            8600,
            8480,
            8600,
            7720,
            8760,
            13160,
            8600,
            8600,
            8560,
            8600,
            8920,
            8560,
            8720,
            8200,
            8640,
            12520,
            8600
          ],
          "lowerIsBetter": true,
          "unit": "ms",
          "shouldAlert": true,
          "value": 8600
        },

After:

        {
          "name": "LastVisualChange",
          "replicates": [
            12880,
            13200,
            8600,
            13080,
            12800,
            13160,
            12320,
            9000,
            13240,
            13080,
            12560,
            13120,
            1360,
            7400,
            13600,
            13400,
            12640,
            13280,
            13320,
            13000,
            12800,
            12240,
            13160,
            13080,
            12600
          ],
          "lowerIsBetter": true,
          "unit": "ms",
          "shouldAlert": true,
          "value": 13000
        },

It seems like this Marionette patch moved the data from one data mode to another.

The settings that we use for browsertime can be found here: https://searchfox.org/mozilla-central/source/testing/raptor/raptor/browsertime/base.py#235 (note that we set our pageLoadStrategy to none)

Changing these options will certainly change our results, but we selected this set since it gave the lowest noise.

It seems like we should accept this change as a WONTFIX because (as per :whimboo's comment above) the patch directly impacts JS commands that are run through Marionette, which are used a lot in Browsertime. :whimboo, you can find an overview of what's happening in Browsertime in one single iteration here: https://github.com/sitespeedio/browsertime/blob/main/lib/core/engine/iteration.js#L83-L331

Flags: needinfo?(gmierz2)

(In reply to Peter Hedenskog from comment #2)

It works like this: Browsertime uses a JavaScript snippet to decide when the page has finished loading (executed through WebDriver etc.). By default Browsertime uses page load strategy "none" (--browsertime.pageLoadStrategy) and then waits --browsertime.pageCompleteCheckStartWait

So what exactly is executed by this script? Does it directly check document.readyState? If that is the case then it's clearly broken behavior and should be avoided. Instead, a custom event listener should be set up for the page load events that, once the appropriate event has happened, stores the exact time in a custom property. Or maybe window.performance already contains some data that would help with that?

Please let us know what exactly happens here so that we can see how to make it more reliable. I'm a bit worried about just accepting this, given that compared to Chrome (chromedriver) it could result in worse numbers even though the page load isn't actually that slow.
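
A minimal sketch of the event-listener approach suggested above (illustrative only; __loadEventTime is a made-up property name):

    // Record the exact load time once via an event listener, so a later
    // Execute Script call can read it back without polling readyState.
    window.addEventListener("load", () => {
      window.__loadEventTime = performance.now();
    }, { once: true });

    // Alternatively, the Navigation Timing API already tracks this;
    // loadEventEnd stays 0 until the load event has actually fired.
    const [nav] = performance.getEntriesByType("navigation");
    console.log(nav && nav.loadEventEnd);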

I don't know how you run it since it's configurable, but if you don't configure anything, this is the default code:

https://github.com/sitespeedio/browsertime/blob/main/lib/core/pageCompleteChecks/defaultPageCompleteCheck.js

(In reply to Greg Mierzwinski [:sparky] from comment #4)

It seems like this Marionette patch moved the data from one data mode to another.

Could you please explain what that actually means in terms of Browsertime? What is this data mode?

It seems like we should accept this change as a WONTFIX because (as per :whimboo's comment above) the patch directly impacts JS commands that are run through Marionette, which are used a lot in Browsertime. :whimboo, you can find an overview of what's happening in Browsertime in one single iteration here: https://github.com/sitespeedio/browsertime/blob/main/lib/core/engine/iteration.js#L83-L331

(In reply to Peter Hedenskog from comment #6)

I don't know how you run it since it's configurable, but if you don't configure anything, this is the default code:

https://github.com/sitespeedio/browsertime/blob/main/lib/core/pageCompleteChecks/defaultPageCompleteCheck.js

Greg, so there is nothing else that gets changed on our side regarding how Browsertime actually runs the page load check script? Given that the performance numbers are directly retrieved, this is really suspicious.

Flags: needinfo?(peter)
Flags: needinfo?(hskupin)
Flags: needinfo?(gmierz2)

By data mode I'm talking about a peak in a histogram of the data values. When this patch landed, the peak of the histogram moved into what was previously the intermittent mode (see how the values shifted from ~8000 to ~13000, and note that we already had some ~13000 values in the data from before this patch).
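
To make that concrete, here is a small sketch that bins the replicates quoted above into 1000 ms buckets; the values are taken directly from the two perfherder-data.json excerpts:

    // Histogram of the LastVisualChange replicates, 1000 ms buckets.
    const before = [8800, 8720, 8880, 8760, 8480, 8600, 8080, 8520, 8600, 8480,
                    8600, 7720, 8760, 13160, 8600, 8600, 8560, 8600, 8920, 8560,
                    8720, 8200, 8640, 12520, 8600];
    const after = [12880, 13200, 8600, 13080, 12800, 13160, 12320, 9000, 13240,
                   13080, 12560, 13120, 1360, 7400, 13600, 13400, 12640, 13280,
                   13320, 13000, 12800, 12240, 13160, 13080, 12600];

    const histogram = values =>
      values.reduce((bins, v) => {
        const bucket = Math.floor(v / 1000) * 1000;
        bins[bucket] = (bins[bucket] || 0) + 1;
        return bins;
      }, {});

    console.log(histogram(before)); // 22 values in the 8000s, plus 7720, 12520, 13160
    console.log(histogram(after));  // 21 values between 12000 and 13999, only 1360, 7400, 8600, 9000 below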

There is nothing that changed on our side regarding how Browsertime tests are run or how the pageload check is done. Based on your patch, could it be that before this patch we were prematurely ending the pageload?

Flags: needinfo?(gmierz2) → needinfo?(hskupin)

Ah, I see. But what does the value mean exactly here? Is it the duration in milliseconds? If so, it would be good to know where these extra 5000 ms come from. When I had a look at the browsertime config I saw that --pageCompleteCheckPollTimeout is configured with 1 s, so it should repeat the page-load check every second. But there is also --pageCompleteCheckStartWait, which is set to 5 s:

https://searchfox.org/mozilla-central/rev/da5d08750e504f3710f7ea051327d9c311c39902/testing/raptor/raptor/browsertime/base.py#245-248

So could it be that the first page load check is delayed because some other script runs in the first ~8 s?
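
As a back-of-the-envelope illustration of those settings (5 s start wait, 1 s poll interval, per the base.py link above), the checks can only fire on a fixed grid, so anything that finishes shortly after one poll is only picked up on the next one:

    // Illustrative timeline of when page-complete checks can run with the quoted settings.
    const startWait = 5000;    // --pageCompleteCheckStartWait
    const pollInterval = 1000; // --pageCompleteCheckPollTimeout
    const checkTime = k => startWait + k * pollInterval;

    console.log([0, 1, 2, 3, 4, 5, 6, 7, 8].map(checkTime));
    // [5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000, 13000]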

Greg, how could such a browsertime job be run with trace logging enabled in geckodriver? It would be great to see one in CI to be able to check this delay.

Flags: needinfo?(hskupin) → needinfo?(gmierz2)

Yes, it's the duration in milliseconds. Another approach could be to use https://github.com/sitespeedio/browsertime/blob/main/lib/core/pageCompleteChecks/pageCompleteCheckByInactivity.js or something similar if there are pages that fire the load event early and then still do a lot of work/requests after that.
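
An illustrative version of such an inactivity-style check, not Browsertime's actual implementation, could use the Resource Timing API roughly like this:

    // Hypothetical inactivity check: treat the page as complete once the load
    // event has fired and no resource has finished loading for quietPeriodMs.
    function isPageIdle(quietPeriodMs = 2000) {
      const resources = performance.getEntriesByType("resource");
      const lastActivity = resources.reduce(
        (last, r) => Math.max(last, r.responseEnd),
        0
      );
      return document.readyState === "complete" &&
             performance.now() - lastActivity > quietPeriodMs;
    }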

That sounds reasonable in this case. I would just assume that Google Docs does a lot of XHR after pageshow. Greg, what do you think?

:whimboo, if you add --verbose to the test task you should be able to obtain those logs. Your assumption about XHR is probably right; Google Docs takes a really long time to load even without your patch.

Flags: needinfo?(gmierz2)

Setting P3.

Henrik: Do we still plan to investigate this or should it be closed as WONTFIX as suggested in comment 4?

Flags: needinfo?(hskupin)
Priority: -- → P3

Yes, let's mark it as WONTFIX based on comment 4.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(hskupin)
Resolution: --- → WONTFIX