Closed Bug 1616901 Opened 5 years ago Closed 5 years ago

Texture cache (re)allocation performance issues

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla76

Tracking Flags:

firefox76: tracking ---, status fixed

People

(Reporter: nical, Assigned: nical)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

Attachments

(9 files, 1 obsolete file)

Bug 1616901 - (Don't land) Dump some info about shared texture cache usage. 5 years ago Nicolas Silva [:nical] 47 bytes, text/x-phabricator-request
Bug 1616901 - Allocate rgb8 linear shared texture cache regions four at a time. r=jrmuizel,gw 5 years ago Nicolas Silva [:nical] 47 bytes, text/x-phabricator-request
Bug 1616901 - Put a bit more texture cache eviction pressure. r=jrmuizel,gw 5 years ago Nicolas Silva [:nical] 47 bytes, text/x-phabricator-request
Bug 1616901 - Move picture cache texture array logic into its own struct. r=gw 5 years ago Nicolas Silva [:nical] 47 bytes, text/x-phabricator-request
Bug 1616901 - Cap the maximum texture array layer count to 32 on all platforms. r=gw,jrmuizel 5 years ago Nicolas Silva [:nical] 47 bytes, text/x-phabricator-request
Bug 1616901 - Allocate 16 layers for rgba8 linear textures in the cache and allocate new arrays instead of growing it. r=gw 5 years ago Nicolas Silva [:nical] 47 bytes, text/x-phabricator-request
Bug 1616901 - Adjust reftest expectations. r=jrmuizel 5 years ago Nicolas Silva [:nical] 47 bytes, text/x-phabricator-request
Bug 1616901 - More reftest adjustments. 5 years ago Nicolas Silva [:nical] 47 bytes, text/x-phabricator-request
Bug 1616901 - Yet more reftest adjustments. r=jrmuizel 5 years ago Nicolas Silva [:nical] 47 bytes, text/x-phabricator-request
Bug 1616901 - Remove accidental print statements. r=jrmuizel 5 years ago Nicolas Silva [:nical] 47 bytes, text/x-phabricator-request

Nicolas Silva [:nical]

Assignee

Description

•

5 years ago

Jrmuizel was looking into performance issues on Windows + Intel, related to the texture array being resized. Part of the problem is that we sometimes have high peaks of texture memory usage, and part comes from having to reallocate the arrays. The badness tends to scale with the size of the texture array.

A typical test case is to go to youtube's front page and scroll to the bottom. By the time we get there, the array will have been resized multiple times. Another one is scrolling down the testcase from bug 1520401.

A nice way to look at the texture cache's allocation patterns is to enable both prefs gfx.webrender.debug.texture-cache and gfx.webrender.debug.texture-cache.clear-evicted.

Some notes/ideas discussed after the standup meeting:

Tweak the texture cache's eviction policy. See the code in texture_cache.rs around EvictionThreshold and EvictionThresholdBuilder. It could be made a lot more aggressive than it currently is, and could be tuned per class of devices.
Grow the texture arrays in larger increments, to resize less often (at the cost of worse overall memory usage).
After a texture array reaches a certain size, allocate an extra texture array instead of growing the current one. This will break batches but avoid the cost of reallocating and copying the contents of the current texture array.
In the same vein, if having more texture arrays provides desirable flexibility, have a separate texture array for opaque and non-opaque content, since that won't cause any batch breaks.
After a texture array reaches a certain size, allocate standalone entries instead of growing the array (even worse for batching but maybe simpler?).
Start with a larger texture array for main windows (and keep initially small texture arrays for popups).
Upload all cached images to long lived standalone textures and copy them on-demand in an atlas that has a very aggressive eviction policy. This may require more work than the rest but would prevent extra texture uploads which some other proposals would regress.

Note: we already eagerly evict items associated with deleted image keys.

Jessie [:jbonisteel] pls NI

Updated

•

5 years ago

Priority: -- → P3

Nicolas Silva [:nical]

Assignee

Comment 1

•

5 years ago

•

Edited

On my laptop (4k screen so hidpi scaling is probably relevant), just starting the browser on about:newtab makes us (re)allocate the 512x512 color texture array 5 times, with 2, 6, 7, 8 and finally 9 layers before the page stabilizes.
After that going to https://fr.wikipedia.org/wiki/Caf%C3%A9 causes us to reallocate the texture array with 10, 11, 15, 16 and 18 layers before the page stabilizes. After leaving the tab idle for a moment, it looks like the array is resized down to 12 layers.
Then, scrolling that page all the way to the bottom makes us reallocate the texture array a bunch, until we reach 27 layers.

I think that it is safe to assume that we can start with a larger texture array for main windows (the layer count won't go below 10 on simple web pages on my laptop), and grow it at a coarser granularity. This would be a nice occasion to pick a larger size for the array layers (maybe 1024x1024), allowing slightly larger images to participate in batching. Maybe start with 4 1024x1024 layers for the main window and keep small 512x512 arrays for popups initially.

Edit: I got the same results running the steps above with a device pixel ratio of 1 at a quarter of the screen resolution.
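To make the effect of a larger initial size and coarser growth concrete, here is a minimal sketch that replays the layer counts listed above (2, 6, 7, 8, 9, then 10, 11, 15, 16, 18) against a simple policy. The function name `count_reallocations` and the policy itself (round the capacity up to a multiple of the granularity, never shrink) are assumptions for illustration, not the texture cache's actual logic.

```rust
/// Counts how many times an array would be reallocated while serving a
/// sequence of required layer counts.
///
/// Hypothetical policy: whenever more layers are needed than are allocated,
/// the capacity is rounded up to the next multiple of `granularity`.
/// The array never shrinks in this sketch.
fn count_reallocations(required: &[u32], initial: u32, granularity: u32) -> u32 {
    assert!(granularity > 0, "granularity must be at least one layer");
    let mut capacity = initial;
    let mut reallocations = 0;
    for &needed in required {
        if needed > capacity {
            // Round up to the next multiple of the growth granularity.
            capacity = ((needed + granularity - 1) / granularity) * granularity;
            reallocations += 1;
        }
    }
    reallocations
}
```

Under these assumptions, growing one layer at a time from an empty array reallocates 10 times for that sequence, growing four layers at a time reallocates 5 times, and starting at 12 layers with a granularity of four reallocates twice.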

Nicolas Silva [:nical]

Assignee

Comment 2

•

5 years ago

•

Edited

I experimented with 1024x1024 layers, however the slab allocation strategy is not very good with larger sizes. We end up wasting a lot of space on items that are slightly bigger than 512 pixels and allocate a lot of layers as well (though some of it also comes from not placing some images in standalone textures).
Some of the layers could use a guillotine allocator instead of the slab allocator, or we could have 1024x1024 textures but still place the 512px+ items in standalone textures initially.

Nicolas Silva [:nical]

Assignee

Comment 3

•

5 years ago

Another interesting thing is that the slab allocator appears to have a fixed slab size per layer. As a result, we tend to always have at least a layer for each slab size. Setting the layer size to 1024 while keeping the max item size to 512 still leads to a lot of allocated layers, with a lot of free space.
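A rough sketch of why a fixed slab size per layer sets a floor on the layer count: if every layer holds a single slab size, the number of layers is at least the number of distinct slab sizes in use. The helpers `slab_size` and `min_layers` are hypothetical, and the sketch assumes square power-of-two slabs with a 16px minimum only; the real allocator also has non-square slabs.

```rust
use std::collections::BTreeSet;

/// Hypothetical slab size for an item: the larger dimension, rounded up to
/// the next power of two, with a 16px minimum (square slabs only).
fn slab_size(width: u32, height: u32) -> u32 {
    width.max(height).max(16).next_power_of_two()
}

/// Lower bound on the number of layers when each layer holds exactly one
/// slab size: one layer per distinct slab size in use.
fn min_layers(items: &[(u32, u32)]) -> usize {
    items
        .iter()
        .map(|&(w, h)| slab_size(w, h))
        .collect::<BTreeSet<u32>>()
        .len()
}
```

With these assumptions, five items of sizes 10x10, 30x30, 60x20, 200x100 and 500x500 already pin down five layers, however large each layer is.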

Nicolas Silva [:nical]

Assignee

Comment 4

•

5 years ago

Attached file Bug 1616901 - (Don't land) Dump some info about shared texture cache usage. (obsolete) — Details

Nicolas Silva [:nical]

Assignee

Comment 5

•

5 years ago

I added some quick and dirty logging to see what's going on in the shared cache (patch).

"total" is the number of used AND free regions multiplied by the amount of pixels per region (closest to actual GPU memory usage).
"allocated" percentages are the pixels in allocated entries vs total gpu memory allocated.
"wasted" percentages are the amount of pixels in allocated slots that are wasted due to rounding slab sizes to the next power of two.
"buckets" show a distribution of the texture regions depending on how full they are (the first bucket is the number of regions less than 12.5% full, the next one is the number of regions 12.5% to 25% full, and so on; the last one is the number of regions more than 87.5% full).
"region sizes" are the number of regions of each slab size (all non-square regions lumped in the same count).
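The definitions above can be sketched as follows. This is not the logging patch itself: the `Region` and `Stats` types and `compute_stats` are hypothetical. It assumes 512x512 regions (262144 pixels, which matches the totals in the dumps), that "allocated" counts whole slots, and that "wasted" is measured relative to slot pixels.

```rust
/// Pixels per region, assuming 512x512 regions.
const REGION_PIXELS: u64 = 512 * 512;

/// Hypothetical description of one used region.
struct Region {
    slab_width: u32,
    slab_height: u32,
    /// Requested (width, height) of each item stored in a slot.
    /// Items are assumed to fit in their slot.
    items: Vec<(u32, u32)>,
}

struct Stats {
    total_px: u64,
    allocated_pct: f64,
    wasted_pct: f64,
    buckets: [u32; 8],
}

fn compute_stats(used: &[Region], empty_regions: u64) -> Stats {
    // "total": used AND free regions times the pixels per region.
    let total_px = (used.len() as u64 + empty_regions) * REGION_PIXELS;
    let mut slot_px = 0u64;
    let mut item_px = 0u64;
    let mut buckets = [0u32; 8];
    for region in used {
        let per_slot = region.slab_width as u64 * region.slab_height as u64;
        let region_slot_px = per_slot * region.items.len() as u64;
        slot_px += region_slot_px;
        item_px += region
            .items
            .iter()
            .map(|&(w, h)| w as u64 * h as u64)
            .sum::<u64>();
        // Eight buckets of 12.5% each; a completely full region lands in the last one.
        let bucket = (region_slot_px * 8 / REGION_PIXELS).min(7) as usize;
        buckets[bucket] += 1;
    }
    Stats {
        total_px,
        // Both ratios are NaN when the denominator is zero.
        allocated_pct: 100.0 * slot_px as f64 / total_px as f64,
        wasted_pct: 100.0 * (slot_px - item_px) as f64 / slot_px as f64,
        buckets,
    }
}
```

As a sanity check on these assumptions, one used region holding a single hypothetical 30x30 item in a 32x32 slot, plus two empty regions, gives a total of 786432px, about 0.13% allocated and 12.109375% wasted, all in the first bucket.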

youtube front page
====================
- color 8 linear: regions: 27 + 2 empty, total 7602176px, allocated: 82.421875%, wasted: 46.48007%, buckets: [1, 1, 1, 0, 1, 0, 1, 22], region sizes: 16: 1, 32: 1, 64: 1, 128: 1, 256: 0, 512: 0, non-square: 23
- color 8 nearest: regions: 0 + 0 empty, total 0px, allocated: NaN%, wasted: NaN%, buckets: [0, 0, 0, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 0, 64: 0, 128: 0, 256: 0, 512: 0, non-square: 0
- alpha 8 linear: regions: 1 + 2 empty, total 786432px, allocated: 0.13020833%, wasted: 12.109375%, buckets: [1, 0, 0, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 1, 64: 0, 128: 0, 256: 0, 512: 0, non-square: 0
- alpha 16 linear: regions: 0 + 0 empty, total 0px, allocated: NaN%, wasted: NaN%, buckets: [0, 0, 0, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 0, 64: 0, 128: 0, 256: 0, 512: 0, non-square: 0


wikipedia coffee (top of the page)
====================
- color 8 linear: regions: 10 + 0 empty, total 2621440px, allocated: 54.296875%, wasted: 68.59066%, buckets: [3, 0, 1, 0, 2, 0, 0, 4], region sizes: 16: 2, 32: 1, 64: 1, 128: 1, 256: 2, 512: 2, non-square: 1
- color 8 nearest: regions: 0 + 0 empty, total 0px, allocated: NaN%, wasted: NaN%, buckets: [0, 0, 0, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 0, 64: 0, 128: 0, 256: 0, 512: 0, non-square: 0
- alpha 8 linear: regions: 0 + 2 empty, total 524288px, allocated: 0.0%, wasted: NaN%, buckets: [0, 0, 0, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 0, 64: 0, 128: 0, 256: 0, 512: 0, non-square: 0
- alpha 16 linear: regions: 0 + 0 empty, total 0px, allocated: NaN%, wasted: NaN%, buckets: [0, 0, 0, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 0, 64: 0, 128: 0, 256: 0, 512: 0, non-square: 0


about:newtab
====================
- color 8 linear: regions: 7 + 3 empty, total 2621440px, allocated: 26.65039%, wasted: 54.420403%, buckets: [3, 0, 0, 1, 1, 0, 1, 1], region sizes: 16: 2, 32: 1, 64: 1, 128: 1, 256: 2, 512: 0, non-square: 0
- color 8 nearest: regions: 0 + 0 empty, total 0px, allocated: NaN%, wasted: NaN%, buckets: [0, 0, 0, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 0, 64: 0, 128: 0, 256: 0, 512: 0, non-square: 0
- alpha 8 linear: regions: 0 + 2 empty, total 524288px, allocated: 0.0%, wasted: NaN%, buckets: [0, 0, 0, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 0, 64: 0, 128: 0, 256: 0, 512: 0, non-square: 0
- alpha 16 linear: regions: 0 + 0 empty, total 0px, allocated: NaN%, wasted: NaN%, buckets: [0, 0, 0, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 0, 64: 0, 128: 0, 256: 0, 512: 0, non-square: 0

cnn.com
====================
- color 8 linear: regions: 15 + 5 empty, total 5242880px, allocated: 47.910156%, wasted: 58.60028%, buckets: [1, 2, 2, 0, 1, 1, 2, 6], region sizes: 16: 2, 32: 1, 64: 1, 128: 1, 256: 2, 512: 2, non-square: 6
- color 8 nearest: regions: 0 + 1 empty, total 262144px, allocated: 0.0%, wasted: NaN%, buckets: [0, 0, 0, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 0, 64: 0, 128: 0, 256: 0, 512: 0, non-square: 0
- alpha 8 linear: regions: 1 + 2 empty, total 786432px, allocated: 8.333333%, wasted: 94.174194%, buckets: [0, 0, 1, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 0, 64: 0, 128: 0, 256: 1, 512: 0, non-square: 0
- alpha 16 linear: regions: 0 + 0 empty, total 0px, allocated: NaN%, wasted: NaN%, buckets: [0, 0, 0, 0, 0, 0, 0, 0], region sizes: 16: 0, 32: 0, 64: 0, 128: 0, 256: 0, 512: 0, non-square: 0

Take with a pinch of salt, I may have poorly counted the allocations and waste (don't hesitate to double-check the patch).

Some observations:

I expected the slab allocation strategy with powers-of-two slab sizes to waste memory, but not nearly as much as these quick measurements suggest.
The rectangular pages appear to pay off on some sites, like youtube with lots of large images.
Regions with small slab sizes tend to have a lot of free space. In other words we have large regions with just a few items and using smaller regions would significantly reduce the memory cost.
Regions with large slab sizes tend to have less free space, but that's a consequence of having few slots per region and they waste a lot of pixels.
It's worth adjusting the alpha and color arrays separately. The alpha one could grow one layer at a time instead of 4, for example.
We don't run into nearest sampled and alpha16 content much, but I suspect that when we do we'll allocate 4 texture layers just for that one item so it's probably worth growing them in small increments.

It would be nice to reduce the amount of waste per allocation. I suspect that a best-fit shelf allocator with power-of-two shelf sizes would do OK with fragmentation and at least remove a bunch of the waste on the x-axis.
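For illustration, a minimal sketch of what such a shelf allocator could look like. Nothing here comes from an existing patch: `ShelfAllocator` and its fields are hypothetical, there is no deallocation, and "best fit" is interpreted as picking the shelf of the matching height with the least remaining width.

```rust
struct Shelf {
    y: u32,
    height: u32,
    /// Next free x position on this shelf.
    x: u32,
}

struct ShelfAllocator {
    width: u32,
    height: u32,
    /// y position where the next new shelf would start.
    next_y: u32,
    shelves: Vec<Shelf>,
}

impl ShelfAllocator {
    fn new(width: u32, height: u32) -> Self {
        ShelfAllocator { width, height, next_y: 0, shelves: Vec::new() }
    }

    /// Returns the (x, y) origin of the allocated rectangle, or None if it
    /// does not fit.
    fn allocate(&mut self, w: u32, h: u32) -> Option<(u32, u32)> {
        if w == 0 || h == 0 || w > self.width || h > self.height {
            return None;
        }
        // Only the height is rounded up; the width is used as-is, so
        // nothing is wasted on the x-axis.
        let shelf_height = h.next_power_of_two();
        let atlas_width = self.width;
        // Best fit: the matching shelf that leaves the least free width.
        let best = self
            .shelves
            .iter_mut()
            .filter(|s| s.height == shelf_height && atlas_width - s.x >= w)
            .min_by_key(|s| atlas_width - s.x);
        if let Some(shelf) = best {
            let position = (shelf.x, shelf.y);
            shelf.x += w;
            return Some(position);
        }
        // No suitable shelf: open a new one if there is vertical room.
        if self.next_y + shelf_height > self.height {
            return None;
        }
        let y = self.next_y;
        self.shelves.push(Shelf { y, height: shelf_height, x: w });
        self.next_y += shelf_height;
        Some((0, y))
    }
}
```

In this sketch a 100x30 item and a 50x20 item share one 32px-high shelf side by side, whereas the slab approach would round each of them up in both dimensions.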

Without changing the allocator type, though, I think that we would benefit in the short term from having 256x256 regions for most slab sizes and 512x512 ones for larger items.

Nicolas Silva [:nical]

Assignee

Comment 6

•

5 years ago

•

Edited

Very relevant comment from Jamie:

one thing we need to be careful about with texture cache changes is that the X position can be important to whether uploads can be DMAd or not. If we try to squeeze more thin items in horizontally we might hit a slow path during uploads on some platforms. I think we might already get this wrong on Mac: https://bugzilla.mozilla.org/show_bug.cgi?id=1603783#c2

Means we'll have a hard time having both nicely packed atlas allocation schemes and fast uploads, unless we decouple the upload from the packing by uploading conservatively aligned data and then issuing GPU-side copies into a tightly packed shared cache.

Nicolas Silva [:nical]

Assignee

Comment 7

•

5 years ago

Attached file Bug 1616901 - Allocate rgb8 linear shared texture cache regions four at a time. r=jrmuizel,gw — Details

As an initial step to reduce the reallocation churn a bit. This texture array grows up to 12 layers for any trivial page and goes up to 40+ on many sites (like the youtube front page) so it's far from enough but it's a start. Simple popups like Help > About Nightly don't need more than 4 layers, though.

Phabricator Automation

Updated

•

5 years ago

Assignee: nobody → nical.bugzilla

Status: NEW → ASSIGNED

Nicolas Silva [:nical]

Assignee

Comment 8

•

5 years ago

Attached file Bug 1616901 - Put a bit more texture cache eviction pressure. r=jrmuizel,gw — Details

Also make sure that the pressure factor never gets to zero.

These new constants aren't definitive in any way though I think that they are a bit more reasonable.
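A purely illustrative sketch of keeping a pressure factor away from zero. The formula, the constant `MIN_PRESSURE_FACTOR` and the function `pressure_factor` are all assumptions made up for this example; they are not the constants or the code from the patch.

```rust
/// Hypothetical lower bound; the patch's actual constants are not shown here.
const MIN_PRESSURE_FACTOR: f32 = 0.05;

/// Hypothetical pressure factor: 1.0 while usage is within budget, then
/// shrinking as usage grows, but clamped so it never reaches zero.
fn pressure_factor(allocated_bytes: usize, budget_bytes: usize) -> f32 {
    if allocated_bytes <= budget_bytes {
        return 1.0;
    }
    let factor = budget_bytes as f32 / allocated_bytes as f32;
    factor.max(MIN_PRESSURE_FACTOR)
}
```

The point of the clamp is that anything scaled by the factor (an eviction age threshold, for instance) stays strictly positive even when usage is far above the budget.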

Depends on D67366

Nicolas Silva [:nical]

Assignee

Comment 9

•

5 years ago

Attached file Bug 1616901 - Move picture cache texture array logic into its own struct. r=gw — Details

Doesn't make for a stellar API but texture_cache.rs has a lot of code and I had a hard time wrapping my head around the scattered parts of picture-cache specific code.

In addition (and more importantly) texture arrays for each tile size can be allocated on demand and do not need to be created when initializing the texture cache. This avoids allocating and deallocating the mispredicted picture texture size when the initial window size is garbage.

Depends on D67367

Nicolas Silva [:nical]

Assignee

Comment 10

•

5 years ago

Attached file Bug 1616901 - Cap the maximum texture array layer count to 32 on all platforms. r=gw,jrmuizel — Details

It was previously set on mac only due to driver mischiefs, however the cost of growing texture arrays becomes high with large layer counts, which capping the layer count to 32 everywhere helps mitigate at the expense of batch breaks.
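As a sketch of how a cap interacts with growth: the 32-layer limit comes from the patch title, while the function `grown_layer_count` and its rounding policy are assumptions for illustration. When the cap is hit the caller has to place the item elsewhere, which is where the batch breaks come from.

```rust
/// Maximum layer count, as stated in the patch title.
const MAX_LAYERS: u32 = 32;

/// Hypothetical growth decision. Returns the layer count the array should
/// have to satisfy `needed`, or None if that would exceed the cap and the
/// item must go elsewhere (at the cost of a batch break).
fn grown_layer_count(current: u32, needed: u32, increment: u32) -> Option<u32> {
    assert!(increment > 0, "increment must be at least one layer");
    if needed <= current {
        // Already large enough, no reallocation.
        return Some(current);
    }
    if needed > MAX_LAYERS {
        return None;
    }
    // Grow in whole increments, without going past the cap.
    let rounded = ((needed + increment - 1) / increment) * increment;
    Some(rounded.min(MAX_LAYERS))
}
```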

Depends on D67368

Pulsebot

Comment 11

•

5 years ago

Pushed by nsilva@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a781ce6dd020 Allocate rgb8 linear shared texture cache regions four at a time. r=gw https://hg.mozilla.org/integration/autoland/rev/86167a71e706 Put a bit more texture cache eviction pressure. r=gw https://hg.mozilla.org/integration/autoland/rev/955e7790f963 Move picture cache texture array logic into its own struct. r=gw https://hg.mozilla.org/integration/autoland/rev/8f4b47079a44 Cap the maximum texture array layer count to 32 on all platforms. r=gw

Noemi Erli[:noemi_erli]

Comment 12

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a781ce6dd020
https://hg.mozilla.org/mozilla-central/rev/86167a71e706
https://hg.mozilla.org/mozilla-central/rev/955e7790f963
https://hg.mozilla.org/mozilla-central/rev/8f4b47079a44

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox76: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla76

Nicolas Silva [:nical]

Assignee

Updated

•

5 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Nicolas Silva [:nical]

Assignee

Updated

•

5 years ago

Keywords: leave-open

BugBot [:suhaib / :marco/ :calixte]

Updated

•

5 years ago

status-firefox76: fixed → affected

Nicolas Silva [:nical]

Assignee

Comment 13

•

5 years ago

Attached file Bug 1616901 - Allocate 16 layers for rgba8 linear textures in the cache and allocate new arrays instead of growing it. r=gw — Details
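The idea in this patch title (fixed 16-layer arrays, with a new array allocated when the existing ones are full) can be sketched roughly as below. `SharedArrays` and `allocate_layer` are hypothetical names and this is not the patch's code; it only tracks which layers are in use.

```rust
/// Layers per array, as stated in the patch title.
const LAYERS_PER_ARRAY: usize = 16;

/// Hypothetical bookkeeping: one "in use" flag per layer of each array.
struct SharedArrays {
    arrays: Vec<[bool; LAYERS_PER_ARRAY]>,
}

impl SharedArrays {
    fn new() -> Self {
        SharedArrays { arrays: Vec::new() }
    }

    /// Returns (array index, layer index) of a free layer. Existing arrays
    /// are never resized; a new 16-layer array is added when all are full,
    /// so no contents need to be copied.
    fn allocate_layer(&mut self) -> (usize, usize) {
        for (array_index, layers) in self.arrays.iter_mut().enumerate() {
            if let Some(layer_index) = layers.iter().position(|used| !*used) {
                layers[layer_index] = true;
                return (array_index, layer_index);
            }
        }
        self.arrays.push([false; LAYERS_PER_ARRAY]);
        let array_index = self.arrays.len() - 1;
        self.arrays[array_index][0] = true;
        (array_index, 0)
    }

    fn array_count(&self) -> usize {
        self.arrays.len()
    }
}
```

The trade-off is the one discussed in the description: items in different arrays cannot be drawn in the same batch, but filling up an array no longer triggers a reallocation and copy.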

Pulsebot

Comment 14

•

5 years ago

Pushed by nsilva@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/622591504941 Allocate 16 layers for rgba8 linear textures in the cache and allocate new arrays instead of growing it. r=gw

Nicolas Silva [:nical]

Assignee

Updated

•

5 years ago

Depends on: 1624272

Alexandru Michis [:malexandru]

Comment 15

•

5 years ago

Backout link: https://hg.mozilla.org/integration/autoland/rev/5868a89964eebb17c2a19be85879bbfc91f01c74

Backed out 3 changesets (Bug 1623777, Bug 1616901, Bug 1623647) for causing reftest failures.

Push with failures: https://treeherder.mozilla.org/?#/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Crunnable&searchStr=reftest&fromchange=d8b9fdbd8762a8841330ad2f96d924e6b65a057f&tochange=5868a89964eebb17c2a19be85879bbfc91f01c74&selectedJob=294314336

Failure logs:

Nicolas Silva [:nical]

Assignee

Updated

•

5 years ago

Blocks: wr-intel

Nicolas Silva [:nical]

Assignee

Updated

•

5 years ago

Blocks: 1616646

Nicolas Silva [:nical]

Assignee

Comment 16

•

5 years ago

Attached file Bug 1616901 - Adjust reftest expectations. r=jrmuizel — Details

I suspect the number of texture array layers impacts some arithmetic related to how the GL implementation deals with texture coordinates. These reftests started failing more consistently (though not 100% of the time) with the patch that sets the number of rgba8 linear texture array layers to 16.

Pulsebot

Comment 17

•

5 years ago

Pushed by nsilva@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6e0471fb1731 Allocate 16 layers for rgba8 linear textures in the cache and allocate new arrays instead of growing it. r=gw https://hg.mozilla.org/integration/autoland/rev/b2eed1f2b242 Adjust reftest expectations. r=jrmuizel

Nicolas Silva [:nical]

Assignee

Updated

•

5 years ago

Blocks: wr-perf-p1

Nicolas Silva [:nical]

Assignee

Comment 18

•

5 years ago

Attached file Bug 1616901 - More reftest adjustments. — Details

Nicolas Silva [:nical]

Assignee

Comment 19

•

5 years ago

Attached file Bug 1616901 - Yet more reftest adjustments. r=jrmuizel — Details

Pulsebot

Comment 20

•

5 years ago

Pushed by nsilva@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8eb82d6066cc More reftest adjustments. r=jrmuizel https://hg.mozilla.org/integration/autoland/rev/01b9bdb6e5ef Yet more reftest adjustments. r=jrmuizel

Stefan Hindli [:stefan_hindli]

Comment 21

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/6e0471fb1731
https://hg.mozilla.org/mozilla-central/rev/b2eed1f2b242
https://hg.mozilla.org/mozilla-central/rev/8eb82d6066cc
https://hg.mozilla.org/mozilla-central/rev/01b9bdb6e5ef

Nicolas Silva [:nical]

Assignee

Updated

•

5 years ago

Depends on: 1624565

Nicolas Silva [:nical]

Assignee

Updated

•

5 years ago

Depends on: 1624640

Nicolas Silva [:nical]

Assignee

Updated

•

5 years ago

Depends on: 1624644

Julien Cristau [:jcristau]

Updated

•

5 years ago

Regressions: 1624250

Nicolas Silva [:nical]

Assignee

Comment 22

•

5 years ago

The texture cache code modified in 1624644 can't affect scene building. It's much more likely that the regression comes from bug 1616412 (unfortunately).

Flags: needinfo?(mikokm)

Regressions: 1616412
No longer regressions: 1624250

Nicolas Silva [:nical]

Assignee

Comment 23

•

5 years ago

Oh snap I updated the wrong bug.

Flags: needinfo?(mikokm)

No longer regressions: 1616412

Nicolas Silva [:nical]

Assignee

Comment 24

•

5 years ago

Attached file Bug 1616901 - Remove accidental print statements. r=jrmuizel — Details

Pulsebot

Comment 25

•

5 years ago

Pushed by nsilva@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7b36a0458e50 Remove accidental print statements. r=jrmuizel

Noemi Erli[:noemi_erli]

Comment 26

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/7b36a0458e50

Nicolas Silva [:nical]

Assignee

Updated

•

5 years ago

Status: REOPENED → RESOLVED

Closed: 5 years ago → 5 years ago

Keywords: leave-open

Resolution: --- → FIXED

Jamie Nicol [:jnicol] out of office until 6th Jan

Updated

•

5 years ago

Regressions: 1632795

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

status-firefox76: affected → fixed

Phabricator Automation

Updated

•

4 years ago

Attachment #9132314 - Attachment is obsolete: true
