40.34 - 12.49% tsvg_static / tsvgr_opacity (OSX) regression on Tue August 2 2022
Categories
(Core :: DOM: UI Events & Focus Handling, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox-esr102 | --- | unaffected |
firefox103 | --- | unaffected |
firefox104 | --- | unaffected |
firefox105 | --- | wontfix |
firefox106 | --- | fix-optional |
People
(Reporter: aglavic, Unassigned)
References
(Regression)
Details
(4 keywords)
Perfherder has detected a talos performance regression from push 5771675980ea9f1e63bcbf905d251f735a7f7bb5. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
40% | tsvg_static | macosx1015-64-shippable-qr | e10s fission stylo webrender-sw | 43.57 -> 61.15 |
39% | tsvg_static | macosx1015-64-shippable-qr | e10s fission stylo webrender-sw | 43.45 -> 60.57 |
12% | tsvgr_opacity | macosx1015-64-shippable-qr | e10s fission stylo webrender-sw | 54.46 -> 61.26 |
For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=35045
Comment 1•2 years ago
|
||
Set release status flags based on info from the regressing bug 1782693
Comment 2•2 years ago
|
||
So it looks like bug 1782693 regressed the performance of loading svg documents on Mac... I wonder:
- If there's a specific check we should add to the code to keep ticking in SVG documents.
- If there might be something bogus in how the test detects the load is finished.
Olli, any thought on this?
Andej, is it easily possible to see a profile of these tests?
Reporter | ||
Comment 3•2 years ago
|
||
Profile job of the tests is currently queued and has not completed you can find it here: https://treeherder.mozilla.org/jobs?repo=autoland&revision=628149ef8b2c734ce965f509b11e387e26196092&group_state=expanded&selectedTaskRun=cXGvfQCySEC9iX3KZK1lhQ.0&searchStr=OS%2CX%2C10.15%2CWebRender%2CShippable%2Copt%2CTalos%2Cperformance%2Ctests%2Cwith%2Csoftware%2Cwebrender%2Cenabled%2C%28profiling%29%2Ctest-macosx1015-64-shippable-qr%2Fopt-talos-svgr-swr%2Cs-p
Comment 4•2 years ago
|
||
(In reply to Andrej (:andrej) from comment #3)
Profile job of the tests is currently queued and has not completed you can find it here: https://treeherder.mozilla.org/jobs?repo=autoland&revision=628149ef8b2c734ce965f509b11e387e26196092&group_state=expanded&selectedTaskRun=cXGvfQCySEC9iX3KZK1lhQ.0&searchStr=OS%2CX%2C10.15%2CWebRender%2CShippable%2Copt%2CTalos%2Cperformance%2Ctests%2Cwith%2Csoftware%2Cwebrender%2Cenabled%2C%28profiling%29%2Ctest-macosx1015-64-shippable-qr%2Fopt-talos-svgr-swr%2Cs-p
Thanks for the profiling jobs. Unfortunately they seem to contain profiles for the svgr subtask (profile_svgr.zip), but not for the tsvg_static tests.
I looked at before/after profiles for big-optimizable-group-opacity-2500.svg
, and I see no difference between them. In both cases there's a long composite taking 3.0s. It's also not obvious what is being measured, the test could do with more profiler makers.
When looking at https://treeherder.mozilla.org/perfherder/comparesubtest?framework=1&originalProject=autoland&originalSignature=4086806&newProject=autoland&newSignature=4086806&originalRevision=f0d33d0689087a6255e544526f6846085e6ec460&newRevision=628149ef8b2c734ce965f509b11e387e26196092&page=1 it seems there's mostly a reduction of the noise.
The more significant changes are in tsvg_static, https://treeherder.mozilla.org/perfherder/comparesubtest?framework=1&originalProject=autoland&originalSignature=4086822&newProject=autoland&newSignature=4086822&originalRevision=f0d33d0689087a6255e544526f6846085e6ec460&newRevision=628149ef8b2c734ce965f509b11e387e26196092&page=1&showOnlyConfident=1 but I won't go very far investigating there without a profile and without knowing what the test is trying to do.
Comment 5•2 years ago
|
||
Daniel, do you know what these tests are measuring, and if this regression looks like something we should worry about?
Comment 6•2 years ago
•
|
||
It's been a long time since I looked at those tests.
I think the regression does look like something we should worry about and address though I would bet it's not a user-visible issue. It's likely more of a situation where the "guilty" commit here has influenced the signal that we're waiting for, in such a way that it reduces our ability to measure regressions into the future. EDIT: it may in fact be user-visible, as indicated by profiles in my next comment.
I think (?) the test is measuring time-to-first-paint after pageload, where "paint" is defined using the MozAfterPaint event, per:
https://searchfox.org/mozilla-central/rev/c0bed29d643393af6ebe77aa31455f283f169202/testing/talos/talos/test.py#956,965,975,983
So: maybe the guilty change here is inadvertently causing a delay to when we dispatch MozAfterPaint, for some reason?
Comment 7•2 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #4)
The more significant changes are in tsvg_static [...] I won't go very far investigating there without a profile and without knowing what the test is trying to do.
Agreed. I think we do have profiles, though? I found some tsvg_static profiles there, looking at the s-p
jobs here ("bad"):
https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=talos%2Cmac&revision=628149ef8b2c734ce965f509b11e387e26196092
vs. here ("good"):
https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=talos%2Cmac&revision=f0d33d0689087a6255e544526f6846085e6ec460
On the s-p
tasks there, under "Artifacts & debugging tools", I clicked profile_svgr.zip - open in Firefox Profiler
which gives me a listing of the various tests/subtests & lets me pick which one to view.
I looked at the subtest composite-scale-rotate-opacity.svg_pagecycle_1
(since composite-scale-rotate.svg seems to be the worst tsvg_static subtest regression as shown by perfherder), and I get these "bad"/"good" profiles (from after vs. before the regression):
bad: https://share.firefox.dev/3zMVdZE
good: https://share.firefox.dev/3JLrSmX
In both cases, the marker chart shows DocumentLoad
taking about 21-22ms, but there's a much longer delay to completing FirstContentfulComposite
in the bad profile vs. the good profile (58.5ms vs 43.6ms).
Comment 8•2 years ago
|
||
One perhaps-relevant observation: In the "good" profile, it looks like FirstNonBlankPaint
immediately produces a FirstNonBlankComposite
1ms later as part of the same refresh-driver tick; whereas in the "bad" profile, they are separated by 22ms (maybe happening in separate refresh driver ticks?)
Comment 9•2 years ago
|
||
Here's a pair of profiles for composite-scale-rotate.svg_pagecycle_2
:
bad: https://share.firefox.dev/3A8Iz8B
good: https://share.firefox.dev/3Q5XI09
As above, DocumentLoad
is the same duration (about 15ms) and pageshow
fires at around the same timestamp (at around t=0.018ms), and for some reason the bad profile then has quite a long delay before FirstContentfulComposite (around t=0.065ms vs. t=0.039ms).
(And MozAfterPaint does seem to fire immediately after FirstContentfulComposite, so that seems to be the relevant [and unfortunately user-impacting] measurement here.)
Comment 10•2 years ago
•
|
||
I'm not super familiar with the intricacies of paint scheduling, so I'll stop digging at this point. I think/hope the profiles in comment 8 - comment 9 should have enough info for someone to dig further, though. CC'ing tnikkel, mstange, and emilio who might have thoughts/insights here.
tl;dr answers to comment 5:
"what's the profile measuring": first paint (via MozAfterPaint which corresponds to FirstContentfulComposite)
"do we need to worry": yes. the profiles seem to legitimately show substantially-delayed paint operations here.
Comment 11•2 years ago
|
||
FWIW, the regressing bug did change some heuristics which were added in bug 1708121, and there especially https://bugzilla.mozilla.org/show_bug.cgi?id=1708121#c11 is interesting.
Comment 12•2 years ago
|
||
In the "bad" profile, there's a PVsync Notify runnable which doesn't do anything, and then the refresh driver tick happens in the next vsync runnable. I'm not sure why the first one has no effect. In the tick, on the ViewManagerFlush marker, it says that the paint was triggered 37ms ago, which was before the no-op vsync runnable.
Florian, can you get some before/after profiles with IPC info?
Comment 13•2 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #12)
Florian, can you get some before/after profiles with IPC info?
Before: https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=XgdE7FFETOKtcpuzi2tCzA.0&tier=1%2C2%2C3&revision=347e9e381fabf42398ec04527c3072a8a238e311&searchStr=talos
After: https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=50f69ed7832d12c80493e34229df53685b323dd6&searchStr=talos
Comment 14•2 years ago
|
||
Before: https://share.firefox.dev/3BZN0UQ
After: https://share.firefox.dev/3AbFjJL
The "After" profile clearly shows a Vsync event that happened while vsync was being observed, but this event did not trigger a refresh tick. Why!?
Updated•2 years ago
|
Comment 15•2 years ago
|
||
Set release status flags based on info from the regressing bug 1782693
Updated•2 years ago
|
Comment 16•2 years ago
|
||
Markus, is that a regression that we should fix for our upcoming releases?
Comment 17•2 years ago
|
||
I think this regression falls under the "we should understand it, but it probably doesn't have much impact on our users" bucket. So there's not a big rush to fix it. It really seems to be just one skipped vsync event, and this performance test happened to create the right conditions for it to occur.
Updated•2 years ago
|
Comment 18•12 days ago
|
||
Hi :mstange, it's been 2 years with no activity here, could this performance alert be closed as WONTFIX? If some investigation could still be done, it would be better to file a more specific follow-up bug. Something to note is that the osx machines will be migrated from 10.15 to 14.70 in the next month so it may not be reproducible in CI after that point.
Comment 19•11 days ago
|
||
Yeah let's close it as WONTFIX.
I've started an talos-svgr-profiling job here just in case we've added enough markers to make more sense of this based on the profiles: https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=talos-svgr&revision=60b9a1ba2d2e318e3800cfecc5cb0191563e0b91&selectedTaskRun=VfLW4k-BQo-f1woYQfAbKQ.0
Comment 20•11 days ago
|
||
That profile doesn't have the vsync-but-no-refresh-tick issue: https://share.firefox.dev/3CGGxR7
Anyway, this is not worth spending much time on. Speeding up this test would be a matter of making it so that the refresh driver fires sooner (not sure why it doesn't use a catch-up tick here) and so that the compositor composites sooner. That's something we want to do generally, but it's not motivated by page load, and doesn't need to be motivated by this SVG test.
Description
•