Closed Bug 1780193 Opened 2 years ago Closed 2 years ago

5.11% perf_reftest_singletons bidi-resolution-1.html (Windows) regression on Thu July 14 2022

Categories

(Core :: Graphics: Text, defect, P3)

defect

Tracking

()

RESOLVED FIXED
105 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- unaffected
firefox102 --- unaffected
firefox103 --- unaffected
firefox104 --- wontfix
firefox105 --- fixed

People

(Reporter: aglavic, Assigned: aosmond)

References

(Regression)

Details

(4 keywords)

Attachments

(1 file)

Perfherder has detected a talos performance regression from push a29ac3a0f194beb661c3bd6ae357a41fe32ef88f. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
9% perf_reftest_singletons bidi-resolution-1.html macosx1015-64-shippable-qr e10s fission stylo webrender 132.91 -> 144.55
6% perf_reftest_singletons bidi-resolution-1.html windows10-64-shippable-qr e10s fission stylo webrender 136.06 -> 144.89
5% perf_reftest_singletons bidi-resolution-1.html windows10-64-shippable-qr e10s fission stylo webrender 137.17 -> 144.17

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(aosmond)

This will almost certainly be a result of converting the font-matching code from returning raw gfxFont pointers to returning already_AddRefed; this undoes the optimization from bug 1364224 (in particular, see bug 1364224 comment 7) that greatly reduced refcount churn during textrun construction.

Unfortunately, I think this is required for correctness in the new world of multi-threaded font usage, so we probably have to accept this.

I believe Andrew is on PTO this week, so leaving the needinfo for when he's back, but for now please leave this as-is. If we can find other ways to mitigate the cost of using proper references, that'd be great, but just backing out the "regressing" patch here would not be a good option.

Understood, we will not back out the patch and will wait until Andrew returns or someone else takes over

Set release status flags based on info from the regressing bug 1779519

Severity: -- → S4
Priority: -- → P3

Set release status flags based on info from the regressing bug 1779519

I've been reviewing the test cases and I don't see any explicit uses of OffscreenCanvas or FontFaceSet in the code. Perhaps FontFaceSet is still somehow getting initialized by default for the worker? I've pushed a patch to try that makes the initialization of FontFaceSet asynchronous, and only blocks if we need to wait for it to complete (e.g. if load is called). Waiting for the results.

Making FontFaceSet initialize asynchronously did not solve the problem. It appears it is specifically flipping gfx.offscreencanvas.enabled to true is what caused the regression. I tried MOZ_CRASHing if an OffscreenCanvas was created (directly or through transferControlToOffscreen) and the test still finished successfully.

In this patch, we make gfxFont use normal AddRef/Release semantics, and
instead now hold a strong reference to it inside the cache. At all times
a font should be in the expiration tracker, and whenever a lookup is
performed, we mark it as used.

When the tracker attempts to expire an entry, we check to see if the
only strong reference is inside the cache itself. If so, it is deleted.
If not, we mark it as used again instead of expiring it.

This has the advantage of making gfxFont::Release cheaper when there is
only one outstanding reference left outside the cache. It used to
acquire the gfxFontCache mutex in order to reinsert the font into the
expiration tracker when the last strong reference was cleared. This is a
very common case since the typical use case is only the main thread is
interacting with the cache, one font reference at a time.

This does not completely restore us to the previous performance
observed, but it does reclaim half of the original design benefit while
remaining threadsafe.

Assignee: nobody → aosmond
Status: NEW → ASSIGNED

Comment 5 and comment 6 were meant for another bug.

Flags: needinfo?(aosmond)
Pushed by aosmond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/577c514ce632
Refactor gfxFontCache expiration behaviour to mitigate perf reductions. r=jfkthame
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch

The patch landed in nightly and beta is affected.
:aosmond, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox104 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(aosmond)

Too late for 104. More worrying, however, is that the graphs from the alert don't seem to show much change after this landed.
https://treeherder.mozilla.org/perfherder/alerts?id=34877

It did in my try runs, but yes I agree. I am hoping bug 1784940 may help undo some of these regressions as well.

Flags: needinfo?(aosmond)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: