Closed Bug 1641751 Opened 5 years ago Closed 5 years ago

Improve texture cache eviction policies

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla79

Tracking Flags:

Tracking

Status

firefox79

---

fixed

People

(Reporter: gw, Assigned: gw)

References

Details

Attachments

(6 files)

Bug 1641751 - Part 1 - Remove some texture cache eviction paths 5 years ago Glenn Watson [:gw] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1641751 - Part 2 - Remove per-document complexity in texture cache. 5 years ago Glenn Watson [:gw] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1641751 - Part 3 - Refactor some texture cache code for future eviction changes. 5 years ago Glenn Watson [:gw] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1641751 - Part 4 - Introduce an LRU eviction policy for the texture cache 5 years ago Glenn Watson [:gw] 47 bytes, text/x-phabricator-request		Details \| Review
LRU_freelist.png 5 years ago Bert Peers [:bpeers] 26.38 KB, image/png		Details
LRU_tracker.png 5 years ago Bert Peers [:bpeers] 50.11 KB, image/png		Details

Glenn Watson [:gw]

Assignee

Description

•

5 years ago

No description provided.

Glenn Watson [:gw]

Assignee

Comment 1

•

5 years ago

Attached file Bug 1641751 - Part 1 - Remove some texture cache eviction paths — Details

This patch removes two of the existing paths for evicting textures.
One is no longer useful, and the other is over-zealous, and causes
frame stalls.

When the shared texture cache made use of variable layer counts, the
maybe_reclaim_shared_memory method would look at the space in free
layers, and if large enough, would clear the entire cache and rely on
it being rebuilt implicitly as new requests occur. However, since we
no longer resize the number of slices, this doesn't serve a useful
purpose (the texture cache already frees any empty texture cache units
at the end of each frame).

The other method applies a conservative, periodic GC that occurs if no
other GCs have run for five seconds. However, this is a very aggressive
eviction that often evicts most of the shared cache, and was being run
whenever the size of the texture cache was > 16 MB. The effect of this
is that if you pause scrolling for a few seconds, and then scroll again,
most of the cache gets evicted. Instead, completely remove this concept,
relying on empty texture cache pages being freed and memory pressure
events, and general cache eviction during frames.

Phabricator Automation

Updated

•

5 years ago

Assignee: nobody → gwatson

Status: NEW → ASSIGNED

Nicolas Silva [:nical]

Updated

•

5 years ago

Blocks: 1631623

Glenn Watson [:gw]

Assignee

Updated

•

5 years ago

Keywords: leave-open

Pulsebot

Comment 2

•

5 years ago

Pushed by gwatson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5cc77ea38625 Part 1 - Remove some texture cache eviction paths r=Bert,kvark

Glenn Watson [:gw]

Assignee

Comment 3

•

5 years ago

Attached file Bug 1641751 - Part 2 - Remove per-document complexity in texture cache. — Details

When document splitting was supported, there was some complexity
when texture cache was cleared, where a frame build could be
required to ensure other documents would render correctly.

Since this is no longer used, we can remove the per-document state
here, which will simplify some future changes to how the cache
eviction policy works.

Razvan Maries

Comment 4

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5cc77ea38625

Glenn Watson [:gw]

Assignee

Comment 5

•

5 years ago

Attached file Bug 1641751 - Part 3 - Refactor some texture cache code for future eviction changes. — Details

Maintain a running total of bytes allocated in both standalone and
shared cache regions. This is used as a threshold to know when to
force a mid-frame eviction. Previously, as soon as the currently
allocated set of shared textures were full, we'd force an eviction.
This means that in typical use cases, we were forcing an eviction
as soon as the texture cache is > 16 MB, which is inefficient.
Separate out picture cache eviction from the normal cache eviction
path. This will be important in the next patch which will change
the eviction algorithm for all shared / standalone entries.
Remove Eviction::Eager as a policy option for shared and standalone
textures. As part of this, switch render task cache entries to use
Eviction::Auto. This is a better option anyway, there is no real
benefit to evicting render tasks as soon as possible - they should
be expired based on usage, just as for normal cache entries.

Pulsebot

Comment 6

•

5 years ago

Pushed by gwatson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/01f402e8dd9a Part 2 - Remove per-document complexity in texture cache. r=nical,Bert

Bogdan Tara[:bogdan_tara | bogdant]

Comment 7

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/01f402e8dd9a

Pulsebot

Comment 8

•

5 years ago

Pushed by gwatson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a900282391e1 Part 3 - Refactor some texture cache code for future eviction changes. r=kvark

Glenn Watson [:gw]

Assignee

Comment 9

•

5 years ago

Attached file Bug 1641751 - Part 4 - Introduce an LRU eviction policy for the texture cache — Details

Previously, when we decided to evict elements from the texture cache,
a very simple scheme was used. All elements would be scanned, and
any element that was older than a given threshold would be freed.

This is not ideal for a number of reasons:

When an eviction takes place, a large number of elements must be scanned.
When we do evict elements, we often evict a large number of elements in
a single frame, causing CPU spikes.
We often evict more items than necessary, resulting in redundant
uploads of the same elements again on subsequent frames.

This patch introduces a new LRU cache collection, which is used to
manage the lifetime of most texture cache entries [1].

With this change, we can now:

Efficiently evict items from oldest to newest, without scanning the entire list.
Efficiently evict only a small number of items per frame, reducing the
chance of CPU frame spikes.
Evict only enough elements to reduce memory under a specific threshold,
which reduces the number of redundant evictions / re-uploads required.

[1] Picture cache tiles and manual eviction elements are stored inside
the new cache for efficiency, but the lifetime of them is managed
separately by the texture cache.

Treeherder Bug Filer

Updated

•

5 years ago

Regressions: 1643238

Treeherder Bug Filer

Updated

•

5 years ago

Regressions: 1643240

Atila Butkovits

Comment 10

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a900282391e1

Atila Butkovits

Comment 11

•

5 years ago

For causing bugs: 1643310, 1643238, 1643240.

Backout link: https://hg.mozilla.org/integration/autoland/rev/0d21bdf3fc0118b553c6543ddbff927a2c70cb96

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=r2%2Cwindows%2Cwebrender&tochange=6bb6a5cdc941eaa15c0e36dcaf8e4ad8509d4e90&fromchange=c7bb28b233961da4f451aa8f27752ee5bf41d5d0&selectedTaskRun=FiKpAhDXQzSVFP_NsonDvw-0

Bug nr 1643310 Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305036539&repo=autoland&lineNumber=13784
Bug nr 1643238 https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305007589&repo=autoland&lineNumber=9310
Bug nr 1643240 https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305007593&repo=autoland&lineNumber=2077

Flags: needinfo?(gwatson)

Andreea Pavel [:apavel]

Comment 12

•

5 years ago

Backout merged: https://hg.mozilla.org/mozilla-central/rev/0d21bdf3fc0118b553c6543ddbff927a2c70cb96

Glenn Watson [:gw]

Assignee

Comment 13

•

5 years ago

The android failures looks like they need updated fuzziness annotations (they just have a few extra pixels of fuzziness than they already do, see [1] and [2]).

The Windows failure is a bit strange - there is an existing intermittent that was filed 17 days ago for that test [3]. I suspect the change here has just affecting timing in such a way that the pre-existing intermittent failure now behaves differently, but is probably unrelated to the changes in this patch. However, I'll do some local investigation of that today to see what I can find.

[1] https://searchfox.org/mozilla-central/source/layout/reftests/svg/svg-integration/clip-path/reftest.list#41
[2] https://searchfox.org/mozilla-central/source/gfx/tests/reftest/reftest.list#22
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1638841

Flags: needinfo?(gwatson)

Glenn Watson [:gw]

Assignee

Comment 14

•

5 years ago

I confirmed that the android failure is due to a change in location where the items are allocated in the texture cache, and requires an extra 5 pixels added to the fuzziness (~ 0.45% change).

I investigated the windows failure. It's an existing intermittent, but this patch changes how and when it fails - it's unrelated to this patch itself though. I wrote up some initial findings at https://bugzilla.mozilla.org/show_bug.cgi?id=1638841#c3, and set the fuzziness on this test on windows to allow this patch to land.

Pulsebot

Comment 15

•

5 years ago

Pushed by gwatson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/41191094ff58 Part 3 - Refactor some texture cache code for future eviction changes. r=Bert,kvark

Narcis Beleuzu [:NarcisB]

Comment 16

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/41191094ff58

Bert Peers [:bpeers]

Comment 17

•

5 years ago

Attached image LRU_freelist.png — Details

I'll just park this here as two attachments.

First, the basic idea behind a free list: it's a vec where each element stores either the client's data, or a "next" index.
When a cell is in use, ie. allocated, the next field is ignored.
When a cell is free, it's linked into a linked list, and the chain that starts at free_head traverses all unallocated cells via their next pointer. It's a standard linked list except that all its nodes are part of this array, instead of being allocated/freed individually.

(So ideally, since it's either data or next, the two could reuse storage via an Enum).

Bert Peers [:bpeers]

Comment 18

•

5 years ago

•

Edited

Attached image LRU_tracker.png — Details

The LRU tracker re-implements the idea of a free list so it works the same way for cells that are not in use.

Cells that are in use implement a doubly linked list. Each cell has a next and prev field plus a payload. The list can be traversed in both directions starting at head or tail; head is the oldest, least recently used item, and tail the most recently used. Thus to touch an element and make it the MRU, it's unlinked from the double list and re-linked as the new tail.

To be able to do this, you need to know where the item that you want to touch currently is, so for this reason the entries array holds both the data that we're trying to manage in LRU fashion, and an index into the LRU tracker's items. (entries is a FreeList as well).

In the other direction, removing LRU items means starting a traversal at head and removing elements. They also need to be removed from the actual entries, so there's a pointer in the other direction: the free list handle payload in each LRU tracker element is a handle that points back to the entries.

To be clear: to implement its FreeList behavior, the LRU tracker re-uses the next field from the free-so-unused-next/prev pair. The "free list handle" that makes up the payload of each element, is only meant to refer back to the entries FreeList when LRU items are being popped; it is not itself part of internal list management.

Pulsebot

Comment 19

•

5 years ago

Pushed by gwatson@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/01ccbe217e03 Part 4 - Introduce an LRU eviction policy for the texture cache r=Bert,kvark

Glenn Watson [:gw]

Assignee

Updated

•

5 years ago

Keywords: leave-open

Noemi Erli[:noemi_erli]

Updated

•

5 years ago

Regressions: 1644632

Narcis Beleuzu [:NarcisB]

Comment 20

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/01ccbe217e03

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox79: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla79

Mayank Bansal

Updated

•

5 years ago

Regressions: 1658182

Jamie Nicol [:jnicol]

Updated

•

5 years ago

Regressions: 1665518

Markus Stange [:mstange]

Updated

•

4 years ago

Regressions: 1685744

You need to log in before you can comment on or make changes to this bug.