Closed Bug 1641704 Opened 5 years ago Closed 11 months ago

Investigate update_visibility overhead & prepare_primitives

Categories

(Core :: Graphics: WebRender, enhancement, P3)

enhancement

Tracking

()

RESOLVED INCOMPLETE
mozilla78
Tracking Status
firefox78 --- affected

People

(Reporter: bpeers, Assigned: bpeers)

References

(Blocks 1 open bug)

Details

Attachments

(6 files)

Some measuring with Tracy shows that update_visibility is called hundreds of times each frame, regardless of caching, and that more than half this time is in update_prim_dependencies.

We seem to be spending about as much time in the loop around calling add_prim_dependency as in that call itself. Which matches this TODO that there may be some wins here.

I added some tracy_plots and, in a windowed browser scrolling Earth:

  • the majority of tile caches has 8 tiles;
  • the majority of primitives touch 1 tile;
  • the majority of primitives touch the same tile as the previous primitive;
  • scrolling to the bottom already hits Y=70 and "Earth" is not really a huge page, so a fixed 2D array won't cut it;
  • there are some outliers, some primitive thinks it's at tile Offset X = 112;

So a simple 2D array is probably too basic, unless it supports scrolling / virtual origins, and/or has a hashmap for overflowing/storing "odd" outliers.

Instead I can start with a small vec (the above suggests 8 entries) of {TileOffset, Box<Tile>} for 16 bytes each or 2 cache lines total, and do a simple linear scan to find the offset, and remembering+returning the last search.

It should be interesting to see if that helps, and not be so much work that it's a waste when the answer is "No".

Disclaimer: I'm trying to measure with RUSTC OPT LEVEL 2 and "release [optimized]" but I'm not 100% sure my local build is the fastest it could be. Still probably ballpark correct.

A fixed 2D array based on the content size won't work, but the grid of tiles we maintain is based around the tiles that intersect the current display port, so should be relatively small and constant for a given screen size (and should be independent of the content size).

The key parts are [1] where we select a tile grid around the current viewport, and discard / retain tiles, and [2] where we clamp the indices of tiles that are affected by a primitive to the current tile grid size.

[1] https://searchfox.org/mozilla-central/source/gfx/wr/webrender/src/picture.rs#2817
[2] https://searchfox.org/mozilla-central/source/gfx/wr/webrender/src/picture.rs#2459

Attached image hashmap_xeon.png

8-entry smallVec with 1 item of LRU on Xeon.

Attached image hashmap_haswell.png

.. and Haswell. Both have a savings of ~0.1 to ~0.2ms, although I should get more runs to make this fair. I'll see what Talos says instead.

I like the SmallVec as it doesn't need to be set up or track any dynamic windows, it's pretty simple and fast (no multiplications, modulo, bounds checking). The key part is that we 'almost always' only hit 1 tile and it's 'almost always' the same one as last time, so all that matters is a very fast LRU check and everything else is just backing store for the "special case". Well, at least scrolling Earth is like that :)

I need to vendor tracy-rs first before I can send this out for review & Talos.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78

Plot glyph rasterization, evictions, GPU allocs

Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---

Reduce loop overhead when registering tile dependencies;
most dependencies touch the same one tile repeatedly, so
do a last-access optimization, and also keep the tiles in
2 cache lines for the common case of <= 8 tiles per cache.

This is not update_visibility per se, but regarding PreparePrims (which comes immediately after): I noticed when scrolling Earth that ~100 TextRuns come in per frame and they all have a bunch of glyphs that need to be checked against the cache. Again simply due to volume this takes a bit of time (eg. ~0.3ms) even though nothing needs to be done (all glyphs have been seen before).

Part of that goes into preparing the GlyphKeys, I noticed that the subpixel mode was Horizontal most of the time and the transform was the identity most of the time; but special-casing that yielded small savings per frame (only about 20us per frame).

            match subpx_dir {
                SubpixelDirection::Horizontal => {
                    if transform.is_identity() {
                        self.glyph_keys_range = scratch.glyph_keys.extend(
                            glyphs.iter().map(|src| {
                                GlyphKey::new_subpixel_horizontal(src.index, src.point.x + prim_offset.x)
                            }));
                    } else {
                        self.glyph_keys_range = scratch.glyph_keys.extend(
                            glyphs.iter().map(|src| {
                                let src_point = src.point + prim_offset;
                                let device_offset = transform.transform(&src_point);
                                GlyphKey::new_subpixel_horizontal(src.index, device_offset.x)
                            }));
                    }
                },
                _ => {
                    if transform.is_identity() {
                        self.glyph_keys_range = scratch.glyph_keys.extend(
                            glyphs.iter().map(|src| {
                                let src_point = src.point + prim_offset;
                                let device_offset = DevicePoint::new(src_point.x, src_point.y);
                                GlyphKey::new(src.index, device_offset, subpx_dir)
                            }));
                    } else {
                        self.glyph_keys_range = scratch.glyph_keys.extend(
                            glyphs.iter().map(|src| {
                                let src_point = src.point + prim_offset;
                                let device_offset = transform.transform(&src_point);
                                GlyphKey::new(src.index, device_offset, subpx_dir)
                            }));
                    }
                }
            }

Oh well :)

Pushed by bpeers@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a478e84fec48 Investigate update_visibility overhead r=gw
Flags: needinfo?(bpeers)
Summary: Investigate update_visibility overhead → Investigate update_visibility overhead & prepare_primitives
Pushed by bpeers@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/27aa74e6573f vendor tracy 0.1.1 (gfx/wr, bindings) r=gw

The leave-open keyword is there and there is no activity for 6 months.
:jimm, maybe it's time to close this bug?

Flags: needinfo?(jmathies)
Severity: -- → S4
Flags: needinfo?(jmathies)
Blocks: wr-frame-building-perf
No longer blocks: wr-perf

The leave-open keyword is there and there is no activity for 6 months.
:jimm, maybe it's time to close this bug?

Flags: needinfo?(jmathies)
Assignee: bpeers → nobody
Flags: needinfo?(jmathies)

The leave-open keyword is there and there is no activity for 6 months.
:jimm, maybe it's time to close this bug?

Flags: needinfo?(jmathies)
Flags: needinfo?(jmathies)
Keywords: leave-open

Hi :gw, is there anything to salvage here (like https://phabricator.services.mozilla.com/D77573) or should this be closed?

Flags: needinfo?(gwatson)
Assignee: nobody → bpeers
Status: REOPENED → RESOLVED
Closed: 5 years ago11 months ago
Flags: needinfo?(gwatson)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: