47.3 - 2.71% google-docs LastVisualChange / google-docs PerceptualSpeedIndex + 8 more (Linux, Windows) regression on Fri May 21 2021
Categories
(Testing :: geckodriver, defect, P3)
Tracking
(firefox-esr78 unaffected, firefox88 unaffected, firefox89 unaffected, firefox90 affected)
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox88 | --- | unaffected |
firefox89 | --- | unaffected |
firefox90 | --- | affected |
People
(Reporter: Bebe, Unassigned)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: perf, perf-alert, regression)
Perfherder has detected a browsertime performance regression from push 94a871824a5da2db5dfda13c3a9c51b37c1da275. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
47% | google-docs | LastVisualChange | linux1804-64-shippable-qr | cold webrender | 8,716.67 -> 12,840.00 |
47% | google-docs | LastVisualChange | linux1804-64-shippable-qr | cold webrender | 8,690.00 -> 12,760.00 |
42% | google-docs | LastVisualChange | windows10-64-shippable | cold | 7,811.58 -> 11,126.67 |
23% | FirstVisualChange | linux1804-64-shippable | warm | 160.00 -> 196.67 | |
18% | FirstVisualChange | linux1804-64-shippable | warm | 170.00 -> 200.00 | |
6% | google-docs | PerceptualSpeedIndex | linux1804-64-shippable-qr | cold webrender | 2,019.92 -> 2,134.00 |
6% | google-docs | PerceptualSpeedIndex | windows10-64-shippable | cold | 1,819.88 -> 1,920.33 |
5% | google-docs | LastVisualChange | linux1804-64-shippable | warm | 6,556.67 -> 6,856.67 |
3% | google-docs | LastVisualChange | linux1804-64-shippable-qr | warm webrender | 6,643.33 -> 6,833.33 |
3% | google-docs | PerceptualSpeedIndex | linux1804-64-shippable | cold | 1,976.79 -> 2,030.33 |
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
10% | google-slides | FirstVisualChange | linux1804-64-shippable-qr | warm webrender | 358.10 -> 323.33 |
4% | PerceptualSpeedIndex | linux1804-64-shippable-qr | warm webrender | 453.71 -> 434.58 | |
3% | google-slides | SpeedIndex | linux1804-64-shippable-qr | warm webrender | 616.67 -> 597.50 |
2% | google-docs | SpeedIndex | linux1804-64-shippable | warm | 1,228.17 -> 1,202.83 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
Comment 1•3 years ago
|
||
Greg, do you have any explanation for that? I cannot believe that the tiny change in Marionette is causing an at maximum 4s delay for the last visual change on Google docs. The only thing that my patch does is to cause a wait for the next tick in the content process whenever a command is run via Marionette that would modify the DOM. The patch prevents a potential data inconsistency between the parent and child process.
How is the screen recording happening? Is that based on Marionette commands? I somewhat get the feeling that this is actually happening here hence the delay for Google docs, which is indeed JS heavy. Note that the duration of a Marionette command should never affect the actual page load timings.
So what kinds of geckodriver / Marionette commands are actually utilized by BrowserTime? Does it maybe run a lot of custom JS code via Execute (Async) Script
? Also requesting needinfo from Peter who works on Browsertime.
Comment 2•3 years ago
|
||
Four seconds sounds like a lot, for last visual change I always try to look at the screenshot to better understand what happens.
It works like this: Browsertime uses a JavaScript to decide when the page has finished loading (executed through webdriver etc). By default Browsertime uses page load strategy "none" (--browsertime.pageLoadStrategy
) and then wait --browsertime.pageCompleteCheckStartWait
before the first JavaScript is executed (by default that is 5 seconds. Then it will wait --browsertime.pageCompleteCheckPollTimeout
until the next time the JavaScript runs (if the page load hasn't finished yet). That default value is 1.5 s. So if you test a page, the first JavaScript runs after 5 seconds, but depending on the phone/server/internet/page if the page finished loading and then every 1.5 seconds. You can also tune the JavaScript that decides if the page has finished by setting your own --browsertime.pageCompleteCheck
. I'm not familiar with the configuration you use but you could try increasing the time to first run and see if it changes your metrics.
Comment 3•3 years ago
|
||
Set release status flags based on info from the regressing bug 1661591
Comment 4•3 years ago
|
||
Here's what I'm seeing in the perfherder-data.json files for the gdocs-vismet
.
Before:
{
"name": "LastVisualChange",
"replicates": [
8800,
8720,
8880,
8760,
8480,
8600,
8080,
8520,
8600,
8480,
8600,
7720,
8760,
13160,
8600,
8600,
8560,
8600,
8920,
8560,
8720,
8200,
8640,
12520,
8600
],
"lowerIsBetter": true,
"unit": "ms",
"shouldAlert": true,
"value": 8600
},
After:
{
"name": "LastVisualChange",
"replicates": [
12880,
13200,
8600,
13080,
12800,
13160,
12320,
9000,
13240,
13080,
12560,
13120,
1360,
7400,
13600,
13400,
12640,
13280,
13320,
13000,
12800,
12240,
13160,
13080,
12600
],
"lowerIsBetter": true,
"unit": "ms",
"shouldAlert": true,
"value": 13000
},
It seems like this marionette patch moved the data from one data mode to another.
The settings that we use for browsertime can be found here: https://searchfox.org/mozilla-central/source/testing/raptor/raptor/browsertime/base.py#235 (note that we set our pageloadstrategy to none)
Changing these options will certainly change our results but we selected this set since it gave the lowest noise.
It seems like we should accept this change as a WONTFIX because (as per :whimboo's comment above) the patch directly impacts JS commands that are run from marionette which are used a lot in browsertime. :whimboo, you can find an overview of what's happening in browsertime in one single iteration here: https://github.com/sitespeedio/browsertime/blob/main/lib/core/engine/iteration.js#L83-L331
Comment 5•3 years ago
|
||
(In reply to Peter Hedenskog from comment #2)
It works like this: Browsertime uses a JavaScript to decide when the page has finished loading (executed through webdriver etc). By default Browsertime uses page load strategy "none" (
--browsertime.pageLoadStrategy
) and then wait--browsertime.pageCompleteCheckStartWait
So what exactly is executed by this script? Does it directly check document.readyState
? If that is the case then it's clearly a broken behavior and should be avoided to do. Instead a custom event listener should be setup for the page load events that once the appropriate event happened store the exact time in a custom property. Or maybe even window.performance.???
already contains some data that will help with that?
Please let us know what exactly happens here so that we can see to make it more reliable. I'm a bit worried by just accepting it given that compared to Chrome (chromedriver) it could result in worse numbers, where the page load isn't that slow at all.
Comment 6•3 years ago
|
||
I don't know how you run it since its configurable, but if you don't configure anything this is the default code:
Comment 7•3 years ago
|
||
(In reply to Greg Mierzwinski [:sparky] from comment #4)
It seems like this marionette patch moved the data from one data mode to another.
Could you please explain what that actually means in terms of Browsertime? What is this data mode?
It seems like we should accept this change as a WONTFIX because (as per :whimboo's comment above) the patch directly impacts JS commands that are run from marionette which are used a lot in browsertime. :whimboo, you can find an overview of what's happening in browsertime in one single iteration here: https://github.com/sitespeedio/browsertime/blob/main/lib/core/engine/iteration.js#L83-L331
(In reply to Peter Hedenskog from comment #6)
I don't know how you run it since its configurable, but if you don't configure anything this is the default code:
Greg, so there is nothing else that get changed on our side how Browsertime actually runs the page load check script? Given that the performance
numbers are directly retrieved, this is really suspicious.
Comment 8•3 years ago
|
||
By data mode, I'm talking about a peak in a histogram of the data values. The peak of the histogram moved into the intermittent one when this patch landed (see how the values shifted from ~8000 to ~13000 and that we had ~13000 values in the data from before this patch).
There is nothing that change on our side regarding how browsertime tests are run or how the pageload check is done. Based on your patch, it might be that before this patch we were prematurely ending the pageload?
Comment 9•3 years ago
|
||
Ah I see. But what does value mean exactly here? Is that the duration in milliseconds? If yes, it would be good to know what are these extra 5000ms. When I had a look at the browsertime config I saw that --pageCompleteCheckPollTimeout
is configured with 1s. So it should repeat the page loaded check each second. But there is also --pageCompleteCheckStartWait
, which is set to this 5s:
So could it be that the first page load check starts delayed because some other script runs in the first ~8s?
Greg, how could such a browsertime job be run with trace logging enabled in geckodriver? Would be great to see one in CI to be able to check this delay.
Comment 10•3 years ago
|
||
Yes the duration in milliseconds. Another approach could be to use https://github.com/sitespeedio/browsertime/blob/main/lib/core/pageCompleteChecks/pageCompleteCheckByInactivity.js. or something similar if there are pages that fire the load event early and then still do a lot of work/requests after that.
Comment 11•3 years ago
|
||
It sounds reasonable in this case. I could just assume that Google docs does a lot of XHR after pageshow
. Greg what do you think?
Comment 12•3 years ago
|
||
:whimboo, if you add --verbose
to the test task you should be able to obtain those logs. Your assumption about XHR is probably right, google docs takes a really long time to load even without your patch.
Comment 13•3 years ago
|
||
Setting P3.
Henrik: Do we still plan to investigate this or should it be closed as WONTFIX as suggested in comment 4?
Comment 14•2 years ago
|
||
Yes, lets mark as WONTFIX based on comment 4.
Description
•