Closed Bug 1722487 Opened 3 years ago Closed 3 years ago

21.71 - 9.47% wikipedia FirstVisualChange / espn fnbpaint + 10 more (Android, Linux) regression on Tue July 20 2021

Categories

(Core :: Widget: Gtk, defect)

Firefox 92
defect

Tracking

()

RESOLVED FIXED
93 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox90 --- unaffected
firefox91 --- unaffected
firefox92 --- wontfix
firefox93 --- fixed

People

(Reporter: alexandrui, Assigned: emilio)

References

(Regressed 1 open bug, Regression)

Details

(Keywords: perf, perf-alert, regression)

Crash Data

Attachments

(2 files)

Perfherder has detected a browsertime performance regression from push 9ac290ec5884fd52bb6c16e9794da5b42f211cbb. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Suite Test Platform Options Absolute values (old vs new)
22% wikipedia FirstVisualChange linux1804-64-shippable-qr cold webrender 1,636.67 -> 1,992.00
22% wikipedia fnbpaint linux1804-64-shippable-qr cold webrender 1,618.08 -> 1,967.15
21% wikipedia ContentfulSpeedIndex linux1804-64-shippable-qr cold webrender 1,649.00 -> 2,002.65
21% wikipedia SpeedIndex linux1804-64-shippable-qr cold webrender 1,645.50 -> 1,996.15
21% wikipedia PerceptualSpeedIndex linux1804-64-shippable-qr cold webrender 1,660.67 -> 2,002.85
19% wikipedia fnbpaint linux1804-64-shippable-qr cold webrender 1,646.65 -> 1,959.00
19% wikipedia FirstVisualChange linux1804-64-shippable-qr cold webrender 1,670.00 -> 1,980.00
18% wikipedia ContentfulSpeedIndex linux1804-64-shippable-qr cold webrender 1,681.42 -> 1,990.58
18% wikipedia SpeedIndex linux1804-64-shippable-qr cold webrender 1,677.12 -> 1,984.08
18% wikipedia PerceptualSpeedIndex linux1804-64-shippable-qr cold webrender 1,692.21 -> 1,991.17
10% espn fcp android-hw-g5-7-0-arm7-shippable-qr cold webrender 2,301.48 -> 2,520.08
9% espn fnbpaint android-hw-g5-7-0-arm7-shippable-qr cold webrender 2,309.79 -> 2,528.58

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(emilio)

Set release status flags based on info from the regressing bug 1718755

I'm a bit surprised about this affecting cold loads etc, specially on android, since that changed timing, but only of stuff dealing with pref changes etc.

Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?

Flags: needinfo?(emilio) → needinfo?(aionescu)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?

Joel may be in a better position to answer this question.

Flags: needinfo?(jmaher)

davehunt/sparky are much more familiar with these harnesses. Usually something like fission or webrender will come in via cli args or preferences in the profile.

Flags: needinfo?(jmaher) → needinfo?(dave.hunt)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

I'm a bit surprised about this affecting cold loads etc, specially on android, since that changed timing, but only of stuff dealing with pref changes etc.

Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?

Don't think so. I did some retriggers on android.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

Alexandru, do you know if the test harness sets some prefs or theme changes / etc that could explain this?

These profiles are merged for our Raptor performance tests:

Tests can specify their own preferences, and variants (such as fission) can also set their own preferences. As far as I know none of these preferences relate to themes.

Flags: needinfo?(dave.hunt)

The retriggers reveal this bug.

Flags: needinfo?(aionescu)

:alexandrui indicates that the retriggers point to bug 1718755 (I presume)

Flags: needinfo?(emilio)
Depends on: 1726548
Flags: needinfo?(emilio)

Remove the need to post a runnable to coalesce them (we still coalesce
theme changes because those do non-trivial work).

What can go on is that on cold load we load the font list async, and
that goes through this code via a faked font.internaluseonly.changed
pref change, which stores a reframe change hint in
mChangeHintForPrefChange.

If the page flushes after this code runs, but before the next refresh
driver tick, we will be just wasting work. So instead do the
RebuildAllStyleData call synchronously. It just schedules a flush and
posts a hint, so it doesn't do the work right there, but makes the next
flush do the work as needed, so it should be faster and also more
correct.

Also improve handling of preference sheet prefs and such to avoid more
wasted work when we go through this code, as the assumption their
handling was doing was that it gets called infrequently, but it's not
the case due to the font list updates.

Depends on D123102

Assignee: nobody → emilio
Status: NEW → ASSIGNED
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5235f051d952
Improve preference change handling code to be faster. r=jfkthame
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 93 Branch

I'm not seeing any obvious improvements in the graphs from comment 0 so far since this landed :(

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Ok, so I see what's going on. Before my patch, the actual font list change wasn't being processed in time.

I get the expected wins if I avoid the reframe that the font list update causes: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=5c41af5c0cd502c4784dc74f049732d46ab4f3d8&newProject=try&newRevision=a6d8cbfe3c9d8a3356ce3c4b8c33117189927d54&framework=13&page=1

(This is the patch for reference).

Jonathan, I think the behavior now is strictly more correct (and does less work over all). The test regresses because before my patch the work happened off a timer that didn't get measured...

The reframe comes from bug 1647573. If we don't want to just take this regression, we could perhaps avoid reframing here again? Before the patch in this bug, there's the possibility of other layouts happening before the pref update timer, and we were fine with that before... This only seems to affect Linux / Android because presumably other platforms don't trigger font fallback in a way that we need this reflow for this test...

Flags: needinfo?(jfkthame)

Jonathan, I think the behavior now is strictly more correct (and does less work over all). The test regresses because before my patch the work happened off a timer that didn't get measured...

Yeah, that makes sense. So the test doesn't fully reflect user experience, because there may be a further visual change coming after what the test measures. But OTOH, in many cases it may turn out that nothing visibly changes after the update, so showing the pre-update content to the user was actually a win even though it wasn't reliably correct.

When there is a "complete" font-list update (e.g. in response to the user installing a new font while Firefox is running), it's essential that we do a full reframe of everything, because the existing frame tree will be holding references to textruns that have references to font objects that no longer exist.

However, in the case of this reflow we don't actually need to reframe, because existing font objects haven't gone away, and any cached references will still be valid; we should just need to restyle so that textframes can reconsider their font matching.

Maybe it's time to finally get rid of the old hack of touching the font.internaluseonly.updated pref in order to force updates, and instead have a more fine-grained way of notifying such updates.

Flags: needinfo?(jfkthame)

Alright, I'll give that a shot.

Flags: needinfo?(emilio)
Flags: needinfo?(emilio)

(In reply to Pulsebot from comment #10)

Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5235f051d952
Improve preference change handling code to be faster. r=jfkthame

== Change summary for alert #31020 (as of Wed, 25 Aug 2021 11:28:26 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
7% google-search-restaurants dcf android-hw-g5-7-0-arm7-shippable-qr cold webrender 535.52 -> 496.75
7% web-de fnbpaint android-hw-g5-7-0-arm7-shippable-qr cold webrender 391.42 -> 365.42
6% web-de fcp android-hw-g5-7-0-arm7-shippable-qr cold webrender 387.73 -> 363.50
6% web-de fnbpaint android-hw-g5-7-0-arm7-shippable-qr cold webrender 391.48 -> 367.04
6% google-search-restaurants fcp android-hw-g5-7-0-arm7-shippable-qr cold webrender 511.19 -> 480.00
... ... ... ... ...
3% google-search-restaurants loadtime android-hw-g5-7-0-arm7-shippable-qr cold webrender 713.48 -> 690.71

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31020

It's good to see progress being made on this bug, but I think we're out of time to get a fix into Fx92. I'm honestly a bit unsure whether we're better off just taking this hit for a cycle vs. trying to backout the regressing patches this late in the cycle. What are your thoughts, Emilio?

Flags: needinfo?(emilio)

Let's see what Emilio says, but FWIW my vote would be to just take the hit for now. Cold-start wikipedia is something of a worst-case, and doesn't represent a perf regression that users are going to experience on an ongoing basis during a browsing session.

(The second patch here should help further, but I wouldn't want to try and uplift to 92 this late in the cycle; I think we should just aim to make these improvements in 93.)

Yeah, I agree with Jonathan here.

Flags: needinfo?(emilio)

Thanks.

Status: REOPENED → ASSIGNED
Target Milestone: 93 Branch → ---
Pushed by ealvarez@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d400d0a16ce6
Avoid some work for font list updates. r=jfkthame
Status: ASSIGNED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 93 Branch

What's the follow-on, now that this bug is marked resolved? We've had no noticeable improvement since this regression happened. Are improvements still planned for 93?

Flags: needinfo?(emilio)

I don't think it's easy to improve on this (ni? Jonathan to confirm). The current behavior is more correct than before. I could avoid or artificially delay the reflow when this happens to restore the previous metrics, but it'd be a correctness issue / potentially cause flashes.

Flags: needinfo?(emilio) → needinfo?(jfkthame)
Regressions: 1730456

Yeah, comment 25 sounds right to me. If we render sooner by deferring the reflow here, we introduce/increase the chance of rendering temporarily-incorrect content that will then flash/shift when it updates to the final rendering, which is undesirable.

Flags: needinfo?(jfkthame)
Regressions: 1727399
Crash Signature: [@ gfxFontGroup::FindFallbackFaceForChar] [@ gfxFontGroup::FindFontForChar]
Has Regression Range: --- → yes
Regressions: 1771810
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: