Improve texture cache eviction policies
Categories
(Core :: Graphics: WebRender, task)
Tracking
()
Tracking | Status | |
---|---|---|
firefox79 | --- | fixed |
People
(Reporter: gw, Assigned: gw)
References
Details
Attachments
(6 files)
Assignee | ||
Comment 1•5 years ago
|
||
This patch removes two of the existing paths for evicting textures.
One is no longer useful, and the other is over-zealous, and causes
frame stalls.
When the shared texture cache made use of variable layer counts, the
maybe_reclaim_shared_memory
method would look at the space in free
layers, and if large enough, would clear the entire cache and rely on
it being rebuilt implicitly as new requests occur. However, since we
no longer resize the number of slices, this doesn't serve a useful
purpose (the texture cache already frees any empty texture cache units
at the end of each frame).
The other method applies a conservative, periodic GC that occurs if no
other GCs have run for five seconds. However, this is a very aggressive
eviction that often evicts most of the shared cache, and was being run
whenever the size of the texture cache was > 16 MB. The effect of this
is that if you pause scrolling for a few seconds, and then scroll again,
most of the cache gets evicted. Instead, completely remove this concept,
relying on empty texture cache pages being freed and memory pressure
events, and general cache eviction during frames.
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 3•5 years ago
|
||
When document splitting was supported, there was some complexity
when texture cache was cleared, where a frame build could be
required to ensure other documents would render correctly.
Since this is no longer used, we can remove the per-document state
here, which will simplify some future changes to how the cache
eviction policy works.
Comment 4•5 years ago
|
||
bugherder |
Assignee | ||
Comment 5•5 years ago
|
||
-
Maintain a running total of bytes allocated in both standalone and
shared cache regions. This is used as a threshold to know when to
force a mid-frame eviction. Previously, as soon as the currently
allocated set of shared textures were full, we'd force an eviction.
This means that in typical use cases, we were forcing an eviction
as soon as the texture cache is > 16 MB, which is inefficient. -
Separate out picture cache eviction from the normal cache eviction
path. This will be important in the next patch which will change
the eviction algorithm for all shared / standalone entries. -
Remove Eviction::Eager as a policy option for shared and standalone
textures. As part of this, switch render task cache entries to use
Eviction::Auto. This is a better option anyway, there is no real
benefit to evicting render tasks as soon as possible - they should
be expired based on usage, just as for normal cache entries.
Comment 7•5 years ago
|
||
bugherder |
Assignee | ||
Comment 9•5 years ago
|
||
Previously, when we decided to evict elements from the texture cache,
a very simple scheme was used. All elements would be scanned, and
any element that was older than a given threshold would be freed.
This is not ideal for a number of reasons:
- When an eviction takes place, a large number of elements must be scanned.
- When we do evict elements, we often evict a large number of elements in
a single frame, causing CPU spikes. - We often evict more items than necessary, resulting in redundant
uploads of the same elements again on subsequent frames.
This patch introduces a new LRU cache collection, which is used to
manage the lifetime of most texture cache entries [1].
With this change, we can now:
- Efficiently evict items from oldest to newest, without scanning the entire list.
- Efficiently evict only a small number of items per frame, reducing the
chance of CPU frame spikes. - Evict only enough elements to reduce memory under a specific threshold,
which reduces the number of redundant evictions / re-uploads required.
[1] Picture cache tiles and manual eviction elements are stored inside
the new cache for efficiency, but the lifetime of them is managed
separately by the texture cache.
Comment 10•5 years ago
|
||
bugherder |
Comment 11•5 years ago
|
||
For causing bugs: 1643310, 1643238, 1643240.
Backout link: https://hg.mozilla.org/integration/autoland/rev/0d21bdf3fc0118b553c6543ddbff927a2c70cb96
Bug nr 1643310 Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305036539&repo=autoland&lineNumber=13784
Bug nr 1643238 https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305007589&repo=autoland&lineNumber=9310
Bug nr 1643240 https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305007593&repo=autoland&lineNumber=2077
Comment 12•5 years ago
|
||
Assignee | ||
Comment 13•5 years ago
|
||
The android failures looks like they need updated fuzziness annotations (they just have a few extra pixels of fuzziness than they already do, see [1] and [2]).
The Windows failure is a bit strange - there is an existing intermittent that was filed 17 days ago for that test [3]. I suspect the change here has just affecting timing in such a way that the pre-existing intermittent failure now behaves differently, but is probably unrelated to the changes in this patch. However, I'll do some local investigation of that today to see what I can find.
[1] https://searchfox.org/mozilla-central/source/layout/reftests/svg/svg-integration/clip-path/reftest.list#41
[2] https://searchfox.org/mozilla-central/source/gfx/tests/reftest/reftest.list#22
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1638841
Assignee | ||
Comment 14•5 years ago
|
||
I confirmed that the android failure is due to a change in location where the items are allocated in the texture cache, and requires an extra 5 pixels added to the fuzziness (~ 0.45% change).
I investigated the windows failure. It's an existing intermittent, but this patch changes how and when it fails - it's unrelated to this patch itself though. I wrote up some initial findings at https://bugzilla.mozilla.org/show_bug.cgi?id=1638841#c3, and set the fuzziness on this test on windows to allow this patch to land.
Comment 15•5 years ago
|
||
Comment 16•5 years ago
|
||
bugherder |
Comment 17•5 years ago
|
||
I'll just park this here as two attachments.
First, the basic idea behind a free list: it's a vec where each element stores either the client's data, or a "next" index.
When a cell is in use, ie. allocated, the next
field is ignored.
When a cell is free, it's linked into a linked list, and the chain that starts at free_head
traverses all unallocated cells via their next
pointer. It's a standard linked list except that all its nodes are part of this array, instead of being allocated/freed individually.
(So ideally, since it's either data or next
, the two could reuse storage via an Enum
).
Comment 18•5 years ago
•
|
||
The LRU tracker re-implements the idea of a free list so it works the same way for cells that are not in use.
Cells that are in use implement a doubly linked list. Each cell has a next
and prev
field plus a payload. The list can be traversed in both directions starting at head
or tail
; head
is the oldest, least recently used item, and tail
the most recently used. Thus to touch
an element and make it the MRU, it's unlinked from the double list and re-linked as the new tail
.
To be able to do this, you need to know where the item that you want to touch currently is, so for this reason the entries
array holds both the data that we're trying to manage in LRU fashion, and an index into the LRU tracker's items. (entries
is a FreeList as well).
In the other direction, removing LRU items means starting a traversal at head
and removing elements. They also need to be removed from the actual entries
, so there's a pointer in the other direction: the free list handle
payload in each LRU tracker element is a handle that points back to the entries
.
To be clear: to implement its FreeList behavior, the LRU tracker re-uses the next
field from the free-so-unused-next/prev pair. The "free list handle" that makes up the payload of each element, is only meant to refer back to the entries
FreeList when LRU items are being popped; it is not itself part of internal list management.
Comment 19•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Comment 20•5 years ago
|
||
bugherder |
Description
•