Investigate the performance of resource_cache::begin_frame
Categories
(Core :: Graphics: WebRender, task, P3)
Tracking
()
People
(Reporter: gw, Assigned: bpeers)
References
(Blocks 1 open bug)
Details
Attachments
(4 files)
I noticed that ResourceCache::begin_frame
is often taking longer to execute than I would expect.
On the old hardware I was testing, I often see ~0.2ms time spent in this method, which is ~11% of the total frame build time I was seeing of ~1.75ms.
This method does do an eviction of unused texture cache entries, which may involve iterating some large arrays. But it still seems to be more of the total frame time than I would expect.
There is some other miscellaneous bookkeeping that is handled by this method. We should investigate further where the time is spent and see if there are any easy fixes we can make to improve the performance here.
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
I looked into this a bit, and most of the time seems spent in clear_evicted
in the glyph_cache
.
I test with wiki Earth, holding the down arrow to scroll to the bottom of the page.
More than half the frames we iterate over all glyphs to see if they've been evicted; on average we check ~300 glyphs with several spikes above 1000, with a higher average at the bottom of the page (the "References" fine print -- more text on page).
In all but 6 frames, we retain 100% of the glyphs.
As we iterate we also repeatedly ask for get_allocated_size
. Calculating this once and storing it on the CacheEntry
helps a little bit, as does ignoring the size entirely and just using is_allocated
(user_data.bytes_used
could be calculated on demand).
But clearly the faster way would be to not have to iterate half the frames over hundreds of glyphs just for the few frames where they got dropped.
(Absolute timings are bloated from logging overhead).
Assignee | ||
Comment 2•5 years ago
|
||
The above might have been immediately obvious with Tracy :) so I'll use this task as an opportunity to set that up properly, and then annotate everything that seems remotely useful.
(I've used RAD Telemetry in the past and the claim that the scoping is super cheap tends to be true -- doubly so when it's all conditioned on profiler
feature anyway -- so I think we can go crazy here, better too much info than not enough).
Assignee | ||
Comment 3•5 years ago
|
||
Glenn also suggested annotating allocations and frees to correlate frame spikes with memory activity.
Assignee | ||
Comment 4•5 years ago
|
||
Adding profile_scopes.
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 6•5 years ago
|
||
Some Tracy improvements landed/coming:
- figured out way to get the worker pool to show up; useful for balancing glyph inlining vs. jobifying, also very handy if/when we taskify more in the future;
- adding more details esp. on the scene builder side -- anything that's more than ~0.2ms shouldn't be all-blank (unless it really is indivisible, like compositor end_frame);
- adding counters to tracy-rs and using them for glyphs, evictions, gpu memory.
I was first going to use emit_malloc/free instead but that seems like a full-blown memory tracker and right now we don't really need that kind of fire power.
Assignee | ||
Comment 7•5 years ago
|
||
Example plots. Bar chart might be better, future PR? :)
Comment 8•5 years ago
|
||
bugherder |
Assignee | ||
Comment 9•5 years ago
|
||
I'll probably be looking at Mac for a while, but some ideas below. Seeing as most time goes into glyphs checking if they're still in the texture cache (and seeing as the answer is 99% "yes" and probably even more so with Glenn's eviction improvements) -- in increasing order of complexity / diminishing ROI:
-
Move all of this to the end of the frame, after we've handed off the work to scene builder. This way the code is off the critical path and can be left mostly as is. This is with the assumption that it's fine to evict entries & scan the list while render() is running. If not, we can still do most of the work there, just defer the actual evictions until begin_frame using a "to-evict-list"; most of the time the list is empty so we're still saving. This is probably best done/reconsidered after the re-unification of frame build and render thread?
-
Spend less time by asking fewer questions - we're iterating each glyph, perhaps we can group them somehow? We talked about bundling all 4 subpixel variants in a single entry; we could also group glyphs together as "likely to be needed" but that gets fuzzy -- especially non-Latin languages. This is related to Glenn's work on caching entire TextRuns -- if the TextRun is cached, we don't need to check if the individual glyphs still exist -- in fact we don't need to cache them at all, so this cost would go down by itself.
-
Invert the relationship -- don't query each item each frame, instead have the TextureCache notify glyphs when they are evicted. Increases coupling, more bookkeeping overhead (+threading concerns) -- seems a bit heavy handed for the cost we're talking about.
Comment 10•5 years ago
|
||
Closing this bug because the investigation happened. I'll file a new bug on the optimization ideas.
Updated•5 years ago
|
Comment 11•5 years ago
|
||
I filed bug 1685831 as a follow-up.
Description
•