2.59 - 48.3% displaylist_mutate / tabswitch / tp5o / tp5o_scroll / tscrollx (linux64-shippable|-qr, windows10-64-shippable|-qr, windows7-32-shippable) regression on push 5a874f1887ff0644f19936203132cc05ff346ec7 (Wed June 3 2020)
Categories
(Core :: Panning and Zooming, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox77 | --- | unaffected |
firefox78 | --- | unaffected |
firefox79 | --- | fixed |
People
(Reporter: alexandrui, Assigned: botond, NeedInfo)
References
(Regression)
Details
(4 keywords)
Attachments
(1 file)
Perfherder has detected a talos performance regression from push 5a874f1887ff0644f19936203132cc05ff346ec7. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
48% tscrollx linux64-shippable-qr opt e10s stylo 1.01 -> 1.49
25% tscrollx windows10-64-shippable-qr opt e10s stylo 0.73 -> 0.92
25% tscrollx windows10-64-shippable-qr opt e10s stylo 0.73 -> 0.91
18% tp5o_scroll linux64-shippable-qr opt e10s stylo 2.24 -> 2.64
16% tscrollx linux64-shippable opt e10s stylo 0.66 -> 0.76
15% tscrollx windows10-64-shippable opt e10s stylo 0.59 -> 0.68
15% tp5o_scroll windows10-64-shippable-qr opt e10s stylo 1.69 -> 1.94
8% displaylist_mutate windows10-64-shippable opt e10s stylo 1,356.71 -> 1,459.12
7% tabswitch windows10-64-shippable opt e10s stylo 9.36 -> 10.03
6% displaylist_mutate linux64-shippable opt e10s stylo 1,408.10 -> 1,492.34
5% displaylist_mutate windows7-32-shippable opt e10s stylo 1,448.26 -> 1,524.52
5% tabswitch windows7-32-shippable opt e10s stylo 10.87 -> 11.42
3% displaylist_mutate linux64-shippable-qr opt e10s stylo 2,915.59 -> 3,007.15
3% tp5o linux64-shippable opt e10s stylo 200.23 -> 205.43
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
Reporter | ||
Updated•4 years ago
|
Comment 1•4 years ago
|
||
Set release status flags based on info from the regressing bug 1611660
Assignee | ||
Comment 2•4 years ago
|
||
I don't have an initial theory of why this patch would have caused talos regressions, but I suppose I can try to reproduce it locally and investigate.
One thing that's interesting is that QR platforms are seeing larger regressions than non-QR platforms, e.g. on Linux, the tscrollx regression is 45% (QR) vs. 18% (non-QR), on Windows it's 25% (QR) vs. 15% (non-QR).
Assignee | ||
Comment 3•4 years ago
|
||
Assignee | ||
Comment 4•4 years ago
|
||
Just wanted to post an update to say I've been actively investigating this. Markus identified a suspicious difference between the above profiles (the presence of "waiting for paint" markers in the "before" profile but not in the "after" profile), and I see the same thing during local talos runs. I've been debugging it locally to try to pin down the root cause but haven't gotten anywhere meaningful yet.
Assignee | ||
Comment 5•4 years ago
•
|
||
I've tracked the issue down to this line causing an extra repaint after every first-paint of a scroll frame. This repaint appears to be unnecessary, as we're just accepting the scroll frame's layout viewport for the first time.
I don't understand the exact mechanism by which this extra repaint results in the observed performance regression, but it seems to somehow make it so that going into nsRefreshDriver::GetTransactionId()
, mOutstandingTransactionId
is always the same as mCompletedTransaction
(whereas before the change, mOutstandingTransactionId
would typically be ahead of mCompletedTransaction
by 1), causing us to avoid setting mWaitingForTransaction
, which in turn causes us to avoid this early exit in Tick()
and reach the place where we stop the refresh timer much more often.
Assignee | ||
Comment 6•4 years ago
|
||
In the first-paint case, we are accepting the incoming scroll metadata
wholesale, so most of the consequences of this code block will happen
anyways -- except for the extra repaint, which is not actually needed
and in fact undesirable in the first-paint case.
Assignee | ||
Comment 7•4 years ago
|
||
Comment 9•4 years ago
|
||
Can we file something for the flakiness you observed in tscroll? At least so it's documented.
Comment 10•4 years ago
|
||
Oh yeah I was thinking of mentioning this but forgot: https://wiki.mozilla.org/TestEngineering/Performance/Talos/Tests#tscrollx would be a good place to add a "Possible regression causes" section with notes.
Assignee | ||
Comment 11•4 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #9)
Can we file something for the flakiness you observed in tscroll? At least so it's documented.
Yup, filed bug 1645275 with a summary of my investigation into said flakiness.
Comment 12•4 years ago
|
||
bugherder |
Comment 13•4 years ago
|
||
Botond, it looks like the displaylist_mutate regression was only partially fixed by the fix here. Do you have any ideas on why that might be?
Updated•3 years ago
|
Description
•