Closed Bug 1680855 Opened 4 years ago Closed 4 years ago

It is possible to get into a state where fonts keep getting re-rasterized (glyph cache full?)

Categories

(Core :: Graphics: WebRender, defect, P2)

Tracking

()

RESOLVED FIXED
86 Branch
Performance Impact ?
Tracking Status
firefox85 --- unaffected
firefox86 + fixed

People

(Reporter: mstange, Assigned: mstange)

References

(Blocks 2 open bugs)

Details

(Keywords: perf, power)

Attachments

(1 file)

Attached video screen recording

One of my Firefox windows is currently in a state where scrolling up and down on https://www.wikipedia.org/ (with the full language selector expanded) keeps rasterizing glyphs.

Our caches should be big enough to hold all the needed glyphs for a simple page like this. So something must be going bad with the eviction policy.

Blocks: 1681339
No longer blocks: wr-perf
Severity: -- → S3

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from bug 1684482 comment #6)

I actually used Instruments now to get a native trace of what's going on, and it looks like all the CPU time is spent on 8 Rayon threads. All of these actually contain stack frames for webrender::glyph_rasterizer::GlyphRasterizer::flush_glyph_requests::process_glyph.

Not sure why this differs from the recording of the Gecko profiler. Maybe it cannot handle those threads?

Also note that the CPU load goes up to 400% in my case for each and every mouse event (move, click) that gets done.

Blocks: power-usage
Keywords: perf, power
Whiteboard: [fxperf]

Btw is that a regression? I have never seen this behavior before the Christmas break. Or maybe something else made it worse for me. It's happening multiple times a day.

I actually cannot see it in Firefox Beta, and Nightly didn't show that behavior with MacOS 10.15.7.

[Tracking Requested - why for this release]: I'm not able to reproduce with the current Firefox 85.0 beta yet, as such this might be a regression from the 86 cycle. I'll run some older Nightly builds with my work profile to figure out when this could have started. But this might take a while, given that it's not clear yet how to reproduce the problem.

See Also: → 1680864

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #4)

[Tracking Requested - why for this release]: I'm not able to reproduce with the current Firefox 85.0 beta yet, as such this might be a regression from the 86 cycle.

This doesn't make sense - the bug was filed on December 5 GMT, and the 86 branch started only a week later. The regression cannot postdate the report of the issue. Is it possible that this behaviour only happens with some setting that is off on beta but turned on for nightly?

I'm removing the [fxperf] tag as this clearly isn't a frontend issue.

Whiteboard: [fxperf]
Whiteboard: [qf]

I've had some luck reproducing this bug, by opening long wikipedia articles in many different languages in separate tabs (e.g. these pages: 1 2 3 4 5 6 7 8 9 10) and scrolling up and down aggressively for a while in each tab, and cycling through them. Maybe this can help us get a regression range.

Here are some observations, given that I hit the problem again on https://github.com/notifications?query=is%3Aunread:

  1. It's only the Github tab that is affected (it doesn't make a difference if it is pinned or not) - no other tabs of the same window show the issue.
  2. Opening a new Github tab in the same window also shows the issue.
  3. Tearing off the Github tab into a new window makes the issue go away.
  4. The Github tab in the original window is still affected.

Priority: -- → P2

I'm currently reading up on how the texture cache works and how it decides to evict entries. What I've noticed so far, just from looking at the debug overlay, is the following: In the beginning, as I scroll around on a wikipedia page, we seem to be growing the texture cache without bound. E.g. I've seen it grow from 11 512x512 textures to 78 512x512 textures. Then, at some point some switch flips and we go into aggressive eviction mode. From then on, scrolling around on the same page keeps re-rasterizing glyphs, and the texture cache size is kept to a very small size, around 3-12 512x512 textures.
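The thrashing described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not WebRender's texture cache: `ToyGlyphCache`, `scroll_passes`, and the capacities are hypothetical, the cache is a plain LRU, and "scrolling" is simplified to requesting the same glyphs in the same order on every pass. It shows that once the cache is capped below the page's working set, every pass rasterizes every glyph again.

```python
from collections import OrderedDict


class ToyGlyphCache:
    """Hypothetical LRU cache that counts how often a glyph is rasterized."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of cached glyphs
        self.entries = OrderedDict()  # glyph -> True, oldest first
        self.rasterized = 0           # total cache misses so far

    def request(self, glyph):
        if glyph in self.entries:
            # Cache hit: mark the glyph as most recently used.
            self.entries.move_to_end(glyph)
            return
        # Cache miss: "rasterize" the glyph and insert it.
        self.rasterized += 1
        self.entries[glyph] = True
        # Evict least recently used glyphs until we are within capacity.
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def scroll_passes(cache, glyphs, passes):
    """Request all glyphs once per pass; return rasterizations per pass."""
    counts = []
    for _ in range(passes):
        before = cache.rasterized
        for glyph in glyphs:
            cache.request(glyph)
        counts.append(cache.rasterized - before)
    return counts
```

With a capacity that covers the whole working set, only the first pass rasterizes anything. With a capacity of half the working set, each glyph is evicted before it is requested again, so the cache never produces a hit.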

Assignee: nobody → mstange.moz
Status: NEW → ASSIGNED
Depends on: 1685643

The main problem here is bug 1685643. Another problem is bug 1685564.

Both regressions are on 86 only. As such 85 is unaffected.

I can consistently see this problem when switching to a pinned Github tab. After more than 5 minutes not selecting this tab and doing other work in the browser, this tab always blocks for 1-2s.

Here is a profile with a Nightly build from yesterday: https://share.firefox.dev/2MZXNad

With bug 1685643 and bug 1685564 marked RESOLVED FIXED, is there more work to do here, mstange? Is what whimboo is hitting in comment 11 this same issue or a new one?

Flags: needinfo?(mstange.moz)

Note that the issue I was mentioning was first filed as bug 1684482, and then duped here. So if it's different we should reopen bug 1684482.

Let's resolve this bug. As far as I can tell, we don't enter the "terrible state" anymore.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #11)

I can consistently see this problem when switching to a pinned Github tab. After more than 5 minutes not selecting this tab and doing other work in the browser, this tab always blocks for 1-2s.

In this case the re-rasterization is expected. The fact that it's slow is not expected, and I'll probably be fixing that in bug 1681346 or bug 1683975.

Here is a profile with a Nightly build from yesterday: https://share.firefox.dev/2MZXNad

It's extra slow in this profile because you're running with DMD. See the time spent in mozilla::dmd::AllocCallback: https://share.firefox.dev/3su0h0K

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(mstange.moz)
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch

(In reply to Markus Stange [:mstange] from comment #14)

It's extra slow in this profile because you're running with DMD. See the time spent in mozilla::dmd::AllocCallback: https://share.firefox.dev/3su0h0K

Interesting. But where can I actually find this specific frame in the callstack? I tried to find it but wasn't successful. Also this delay doesn't always happen, only for those tabs I haven't had selected for a while.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #15)

But where can I actually find this specific frame in the callstack?

It's at the tip of the call stack, you have to go as deep as you can in the call tree. Or just invert the tree.

Also this delay doesn't always happen, only for those tabs I haven't had selected for a while.

Yes, this means that the texture cache is working as expected and removes cached glyphs after a while, to save memory.
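The expected behavior described here, where glyphs that have not been used for a while are dropped and must be rasterized again on the next use, can be sketched as an age-based cache. This is a minimal illustration, not the actual texture cache code: `AgeEvictingCache` and `max_age_s` are hypothetical names, and the threshold used in the example is arbitrary rather than the value Firefox uses.

```python
class AgeEvictingCache:
    """Hypothetical cache that drops entries unused for longer than max_age_s."""

    def __init__(self, max_age_s):
        self.max_age_s = max_age_s  # assumed idle threshold, in seconds
        self.last_used = {}         # glyph -> timestamp of its last use

    def evict_stale(self, now):
        """Remove entries idle for more than max_age_s; return how many."""
        stale = [g for g, t in self.last_used.items()
                 if now - t > self.max_age_s]
        for glyph in stale:
            del self.last_used[glyph]
        return len(stale)

    def request(self, glyph, now):
        """Return True if the glyph had to be rasterized (cache miss)."""
        self.evict_stale(now)
        miss = glyph not in self.last_used
        self.last_used[glyph] = now
        return miss
```

For example, with `max_age_s=60.0`, a glyph requested at `t=0.0` and again at `t=30.0` is a hit the second time, but a request at `t=400.0` is a miss because the entry has been idle for longer than the threshold. That matches the pattern of a tab being slow only after it has not been selected for a while.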

(In reply to Markus Stange [:mstange] from comment #16)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #15)

But where can I actually find this specific frame in the callstack?

It's at the tip of the call stack, you have to go as deep as you can in the call tree.

And you don't have to open nodes by hand in the call tree, clicking on the sample in the timeline will get you there immediately.
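To make the "invert the tree" advice concrete: an inverted call tree groups samples by their innermost frame, so a function that sits at the tip of many stacks shows up at the top level. The sketch below is a simplified illustration and not how the Firefox Profiler is implemented. The helper names are hypothetical, and stacks are assumed to be lists of frame names ordered from root to leaf.

```python
from collections import Counter


def self_time_by_leaf(samples):
    """Count samples by innermost frame (the inverted tree's top level)."""
    return Counter(stack[-1] for stack in samples if stack)


def running_time(samples, frame):
    """Count samples in which the frame appears anywhere on the stack."""
    return sum(1 for stack in samples if frame in stack)
```

Given made-up samples such as `[["main", "process_glyph", "malloc", "AllocCallback"], ["main", "process_glyph"]]`, `self_time_by_leaf` attributes the first sample to `AllocCallback` even though it is buried deep in the non-inverted tree, while `running_time(samples, "process_glyph")` counts both samples.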

Performance Impact: --- → ?
Whiteboard: [qf]