Closed Bug 1623003 Opened 5 years ago Closed 5 years ago

Investigate the performance of resource_cache::begin_frame

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla78

People

(Reporter: gw, Assigned: bpeers)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

evict_graph.png 5 years ago Bert Peers [:bpeers] 478.29 KB, image/png		Details
Bug 1623003 - investigate begin_frame 5 years ago Bert Peers [:bpeers] 47 bytes, text/x-phabricator-request		Details \| Review
glyph_counter.png 5 years ago Bert Peers [:bpeers] 133.61 KB, image/png		Details
tracy_plot.png 5 years ago Bert Peers [:bpeers] 121.32 KB, image/png		Details

Glenn Watson [:gw]|PTO until Sept 15

Reporter

Description

•

5 years ago

I noticed that ResourceCache::begin_frame is often taking longer to execute than I would expect.

On the old hardware I was testing, I often see ~0.2ms time spent in this method, which is ~11% of the total frame build time I was seeing of ~1.75ms.

This method does do an eviction of unused texture cache entries, which may involve iterating some large arrays. But it still seems to be more of the total frame time than I would expect.

There is some other miscellaneous bookkeeping that is handled by this method. We should investigate further where the time is spent and see if there are any easy fixes we can make to improve the performance here.

Nicolas Silva [:nical]

Updated

•

5 years ago

Blocks: wr-frame-building-perf

Jessie [:jbonisteel] pls NI

Updated

•

5 years ago

Blocks: wr-intel-later

Miko Mynttinen

Updated

•

5 years ago

Priority: -- → P3

Jeff Muizelaar [:jrmuizel]

Updated

•

5 years ago

Blocks: wr-perf

Bert Peers [:bpeers]

Assignee

Comment 1

•

5 years ago

Attached image evict_graph.png — Details

I looked into this a bit, and most of the time seems spent in clear_evicted in the glyph_cache.
I test with wiki Earth, holding the down arrow to scroll to the bottom of the page.

More than half the frames we iterate over all glyphs to see if they've been evicted; on average we check ~300 glyphs with several spikes above 1000, with a higher average at the bottom of the page (the "References" fine print -- more text on page).
In all but 6 frames, we retain 100% of the glyphs.

As we iterate we also repeatedly ask for get_allocated_size. Calculating this once and storing it on the CacheEntry helps a little bit, as does ignoring the size entirely and just using is_allocated (user_data.bytes_used could be calculated on demand).
But clearly the faster way would be to not have to iterate half the frames over hundreds of glyphs just for the few frames where they got dropped.

(Absolute timings are bloated from logging overhead).

Bert Peers [:bpeers]

Assignee

Comment 2

•

5 years ago

The above might have been immediately obvious with Tracy :) so I'll use this task as an opportunity to set that up properly, and then annotate everything that seems remotely useful.
(I've used RAD Telemetry in the past and the claim that the scoping is super cheap tends to be true -- doubly so when it's all conditioned on profiler feature anyway -- so I think we can go crazy here, better too much info than not enough).

Bert Peers [:bpeers]

Assignee

Comment 3

•

5 years ago

Glenn also suggested annotating allocations and frees to correlate frame spikes with memory activity.

Bert Peers [:bpeers]

Assignee

Comment 4

•

5 years ago

Attached file Bug 1623003 - investigate begin_frame — Details

Adding profile_scopes.

Phabricator Automation

Updated

•

5 years ago

Assignee: nobody → bpeers

Status: NEW → ASSIGNED

Bert Peers [:bpeers]

Assignee

Updated

•

5 years ago

Keywords: leave-open

Pulsebot

Comment 5

•

5 years ago

Pushed by bpeers@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b8092c1039a4 investigate begin_frame r=gw

Bert Peers [:bpeers]

Assignee

Comment 6

•

5 years ago

Attached image glyph_counter.png — Details

Some Tracy improvements landed/coming:

figured out way to get the worker pool to show up; useful for balancing glyph inlining vs. jobifying, also very handy if/when we taskify more in the future;
adding more details esp. on the scene builder side -- anything that's more than ~0.2ms shouldn't be all-blank (unless it really is indivisible, like compositor end_frame);
adding counters to tracy-rs and using them for glyphs, evictions, gpu memory.

I was first going to use emit_malloc/free instead but that seems like a full-blown memory tracker and right now we don't really need that kind of fire power.

Bert Peers [:bpeers]

Assignee

Comment 7

•

5 years ago

Attached image tracy_plot.png — Details

Example plots. Bar chart might be better, future PR? :)

Razvan Maries

Comment 8

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b8092c1039a4

Bert Peers [:bpeers]

Assignee

Comment 9

•

5 years ago

I'll probably be looking at Mac for a while, but some ideas below. Seeing as most time goes into glyphs checking if they're still in the texture cache (and seeing as the answer is 99% "yes" and probably even more so with Glenn's eviction improvements) -- in increasing order of complexity / diminishing ROI:

Move all of this to the end of the frame, after we've handed off the work to scene builder. This way the code is off the critical path and can be left mostly as is. This is with the assumption that it's fine to evict entries & scan the list while render() is running. If not, we can still do most of the work there, just defer the actual evictions until begin_frame using a "to-evict-list"; most of the time the list is empty so we're still saving. This is probably best done/reconsidered after the re-unification of frame build and render thread?
Spend less time by asking fewer questions - we're iterating each glyph, perhaps we can group them somehow? We talked about bundling all 4 subpixel variants in a single entry; we could also group glyphs together as "likely to be needed" but that gets fuzzy -- especially non-Latin languages. This is related to Glenn's work on caching entire TextRuns -- if the TextRun is cached, we don't need to check if the individual glyphs still exist -- in fact we don't need to cache them at all, so this cost would go down by itself.
Invert the relationship -- don't query each item each frame, instead have the TextureCache notify glyphs when they are evicted. Increases coupling, more bookkeeping overhead (+threading concerns) -- seems a bit heavy handed for the cost we're talking about.

Nicolas Silva [:nical]

Updated

•

5 years ago

No longer blocks: wr-perf

Markus Stange [:mstange]

Comment 10

•

5 years ago

Closing this bug because the investigation happened. I'll file a new bug on the optimization ideas.

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

Resolution: --- → FIXED

Summary: Investigate / optimize the performance of resource_cache::begin_frame → Investigate the performance of resource_cache::begin_frame

Target Milestone: --- → mozilla78

Markus Stange [:mstange]

Updated

•

5 years ago

Keywords: leave-open