Closed Bug 1641751 Opened 4 years ago Closed 4 years ago

Improve texture cache eviction policies

Categories

(Core :: Graphics: WebRender, task)

task

Tracking

()

RESOLVED FIXED
mozilla79
Tracking Status
firefox79 --- fixed

People

(Reporter: gw, Assigned: gw)

References

Details

Attachments

(6 files)

No description provided.

This patch removes two of the existing paths for evicting textures.
One is no longer useful, and the other is over-zealous, and causes
frame stalls.

When the shared texture cache made use of variable layer counts, the
maybe_reclaim_shared_memory method would look at the space in free
layers, and if large enough, would clear the entire cache and rely on
it being rebuilt implicitly as new requests occur. However, since we
no longer resize the number of slices, this doesn't serve a useful
purpose (the texture cache already frees any empty texture cache units
at the end of each frame).

The other method applies a conservative, periodic GC that occurs if no
other GCs have run for five seconds. However, this is a very aggressive
eviction that often evicts most of the shared cache, and was being run
whenever the size of the texture cache was > 16 MB. The effect of this
is that if you pause scrolling for a few seconds, and then scroll again,
most of the cache gets evicted. Instead, completely remove this concept,
relying on empty texture cache pages being freed and memory pressure
events, and general cache eviction during frames.

Assignee: nobody → gwatson
Status: NEW → ASSIGNED
Blocks: 1631623
Keywords: leave-open
Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5cc77ea38625
Part 1 - Remove some texture cache eviction paths r=Bert,kvark

When document splitting was supported, there was some complexity
when texture cache was cleared, where a frame build could be
required to ensure other documents would render correctly.

Since this is no longer used, we can remove the per-document state
here, which will simplify some future changes to how the cache
eviction policy works.

  • Maintain a running total of bytes allocated in both standalone and
    shared cache regions. This is used as a threshold to know when to
    force a mid-frame eviction. Previously, as soon as the currently
    allocated set of shared textures were full, we'd force an eviction.
    This means that in typical use cases, we were forcing an eviction
    as soon as the texture cache is > 16 MB, which is inefficient.

  • Separate out picture cache eviction from the normal cache eviction
    path. This will be important in the next patch which will change
    the eviction algorithm for all shared / standalone entries.

  • Remove Eviction::Eager as a policy option for shared and standalone
    textures. As part of this, switch render task cache entries to use
    Eviction::Auto. This is a better option anyway, there is no real
    benefit to evicting render tasks as soon as possible - they should
    be expired based on usage, just as for normal cache entries.

Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/01f402e8dd9a
Part 2 - Remove per-document complexity in texture cache. r=nical,Bert
Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a900282391e1
Part 3 - Refactor some texture cache code for future eviction changes. r=kvark

Previously, when we decided to evict elements from the texture cache,
a very simple scheme was used. All elements would be scanned, and
any element that was older than a given threshold would be freed.

This is not ideal for a number of reasons:

  • When an eviction takes place, a large number of elements must be scanned.
  • When we do evict elements, we often evict a large number of elements in
    a single frame, causing CPU spikes.
  • We often evict more items than necessary, resulting in redundant
    uploads of the same elements again on subsequent frames.

This patch introduces a new LRU cache collection, which is used to
manage the lifetime of most texture cache entries [1].

With this change, we can now:

  • Efficiently evict items from oldest to newest, without scanning the entire list.
  • Efficiently evict only a small number of items per frame, reducing the
    chance of CPU frame spikes.
  • Evict only enough elements to reduce memory under a specific threshold,
    which reduces the number of redundant evictions / re-uploads required.

[1] Picture cache tiles and manual eviction elements are stored inside
the new cache for efficiency, but the lifetime of them is managed
separately by the texture cache.

Regressions: 1643238
Regressions: 1643240

The android failures looks like they need updated fuzziness annotations (they just have a few extra pixels of fuzziness than they already do, see [1] and [2]).

The Windows failure is a bit strange - there is an existing intermittent that was filed 17 days ago for that test [3]. I suspect the change here has just affecting timing in such a way that the pre-existing intermittent failure now behaves differently, but is probably unrelated to the changes in this patch. However, I'll do some local investigation of that today to see what I can find.

[1] https://searchfox.org/mozilla-central/source/layout/reftests/svg/svg-integration/clip-path/reftest.list#41
[2] https://searchfox.org/mozilla-central/source/gfx/tests/reftest/reftest.list#22
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1638841

Flags: needinfo?(gwatson)

I confirmed that the android failure is due to a change in location where the items are allocated in the texture cache, and requires an extra 5 pixels added to the fuzziness (~ 0.45% change).

I investigated the windows failure. It's an existing intermittent, but this patch changes how and when it fails - it's unrelated to this patch itself though. I wrote up some initial findings at https://bugzilla.mozilla.org/show_bug.cgi?id=1638841#c3, and set the fuzziness on this test on windows to allow this patch to land.

Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/41191094ff58
Part 3 - Refactor some texture cache code for future eviction changes. r=Bert,kvark
Attached image LRU_freelist.png

I'll just park this here as two attachments.

First, the basic idea behind a free list: it's a vec where each element stores either the client's data, or a "next" index.
When a cell is in use, ie. allocated, the next field is ignored.
When a cell is free, it's linked into a linked list, and the chain that starts at free_head traverses all unallocated cells via their next pointer. It's a standard linked list except that all its nodes are part of this array, instead of being allocated/freed individually.

(So ideally, since it's either data or next, the two could reuse storage via an Enum).

Attached image LRU_tracker.png

The LRU tracker re-implements the idea of a free list so it works the same way for cells that are not in use.

Cells that are in use implement a doubly linked list. Each cell has a next and prev field plus a payload. The list can be traversed in both directions starting at head or tail; head is the oldest, least recently used item, and tail the most recently used. Thus to touch an element and make it the MRU, it's unlinked from the double list and re-linked as the new tail.

To be able to do this, you need to know where the item that you want to touch currently is, so for this reason the entries array holds both the data that we're trying to manage in LRU fashion, and an index into the LRU tracker's items. (entries is a FreeList as well).

In the other direction, removing LRU items means starting a traversal at head and removing elements. They also need to be removed from the actual entries, so there's a pointer in the other direction: the free list handle payload in each LRU tracker element is a handle that points back to the entries.

To be clear: to implement its FreeList behavior, the LRU tracker re-uses the next field from the free-so-unused-next/prev pair. The "free list handle" that makes up the payload of each element, is only meant to refer back to the entries FreeList when LRU items are being popped; it is not itself part of internal list management.

Pushed by gwatson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/01ccbe217e03
Part 4 - Introduce an LRU eviction policy for the texture cache r=Bert,kvark
Keywords: leave-open
Regressions: 1644632
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
Regressions: 1658182
Regressions: 1665518
Regressions: 1685744
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: