Open Bug 2008619 Opened 4 months ago Updated 1 month ago

[meta] Text shaping word cache affects performance on several SP3 tests (particularly TipTap)

Product: Core
Component: Graphics: Text
Type: enhancement
Priority: Not set
Severity: S2
Status: ASSIGNED
Performance Impact: low
Reporter: denispal
Assigned: jlink
Blocks: 2 open bugs
Keywords: meta
Attachments: 1 file
Bug 2008619: WIP: Skip word cache for very long text runs. (4 months ago, Denis Palmeiro [:denispal], 48 bytes, text/x-phabricator-request)

Denis Palmeiro [:denispal]

Reporter

Description

•

4 months ago

The word cache splits text into individual words and shapes each one separately, which creates a lot of overhead in the Editor-TipTap test during Speedometer3, even with good cache hit rates. When I run 100 iterations of TipTap on my MacBook Pro, we spend about 2240 ms in text shaping when the word cache is enabled and 1193 ms when I skip it. Skipping the word cache leads to a 10% improvement in Editor-TipTap's performance.
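To make the per-word cost concrete, here is a minimal toy model of a word cache. It is only a sketch and not the Gecko implementation: `ToyWordCache`, `ShapedWord`, and `ShapeRun` are hypothetical names, words are split on spaces only, and "shaping" is faked by counting characters. The point is that every word pays for a hash lookup and a key copy even on a hit.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

// Stand-in for a shaped word; a real shaper would produce glyph IDs and
// positions. Hypothetical type, not part of Gecko.
struct ShapedWord {
  std::size_t glyphCount = 0;
};

// Hypothetical toy model of a per-word shaping cache.
class ToyWordCache {
 public:
  // Shapes `text` word by word (split on spaces), consulting the cache for
  // each word. Returns the total number of fake "glyphs" produced.
  std::size_t ShapeRun(std::string_view text) {
    std::size_t total = 0;
    std::size_t pos = 0;
    while (pos < text.size()) {
      std::size_t end = text.find(' ', pos);
      if (end == std::string_view::npos) end = text.size();
      if (end > pos) {
        total += Lookup(text.substr(pos, end - pos)).glyphCount;
      }
      pos = end + 1;
    }
    return total;
  }

  std::size_t lookups = 0;      // one per word, hit or miss
  std::size_t shaperCalls = 0;  // one per cache miss

 private:
  const ShapedWord& Lookup(std::string_view word) {
    ++lookups;
    std::string key(word);  // key copy: part of the per-word overhead
    auto it = mCache.find(key);
    if (it == mCache.end()) {
      ++shaperCalls;  // miss: pretend to shape, one glyph per character
      it = mCache.emplace(std::move(key), ShapedWord{word.size()}).first;
    }
    return it->second;
  }

  std::unordered_map<std::string, ShapedWord> mCache;
};
```

For the run "the cat and the hat", this model performs five lookups but only four shaper calls, because the second "the" is a hit. The hit still goes through the lookup path, which is the kind of overhead described above.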

Denis Palmeiro [:denispal]

Reporter

Comment 1

•

4 months ago

Here is a profile: https://share.firefox.dev/4bqWI42

Denis Palmeiro [:denispal]

Reporter

Comment 2

•

4 months ago

Since a lot of the overhead is in locking and realloc, maybe we can get away with a smaller fixed-size MRU cache instead of the current implementation with mozilla::HashMap.
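As an illustration of what a small fixed-size MRU cache could look like, here is a hedged sketch. It is not the prototype mentioned in this bug: `ToyMruCache` is a hypothetical name, values are plain `int`s instead of shaped words, and the `std::string` keys can still allocate for long words. The table itself never grows, so there is no realloc of the container.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical fixed-capacity cache that keeps the N most recently used
// entries, ordered from most to least recently used. Not thread-safe.
template <std::size_t N>
class ToyMruCache {
  static_assert(N > 0, "capacity must be non-zero");

 public:
  // Linear scan; on a hit the entry is moved to the front.
  std::optional<int> Lookup(std::string_view key) {
    for (std::size_t i = 0; i < mCount; ++i) {
      if (mEntries[i].key == key) {
        MoveToFront(i);
        return mEntries[0].value;
      }
    }
    return std::nullopt;
  }

  // Assumes the caller already got a miss from Lookup(). When the cache is
  // full, the last (least recently used) slot is overwritten.
  void Insert(std::string_view key, int value) {
    std::size_t slot = mCount < N ? mCount++ : N - 1;
    mEntries[slot] = Entry{std::string(key), value};
    MoveToFront(slot);
  }

 private:
  struct Entry {
    std::string key;
    int value = 0;
  };

  void MoveToFront(std::size_t i) {
    for (; i > 0; --i) std::swap(mEntries[i], mEntries[i - 1]);
  }

  std::array<Entry, N> mEntries{};
  std::size_t mCount = 0;
};
```

The linear scan only makes sense for small N. Whether such a cache helps depends on how large the working set of words really is.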

Denis Palmeiro [:denispal]

Reporter

Comment 3

•

4 months ago

(In reply to Denis Palmeiro [:denispal] from comment #2)

> Since a lot of the overhead is in locking and realloc, maybe we can get away with a smaller fixed size MRU cache instead of the current implementation with mozilla::HashMap.

I prototyped this but it doesn't seem to help much.

Denis Palmeiro [:denispal]

Reporter

Comment 4

•

4 months ago

Attached file Bug 2008619: WIP: Skip word cache for very long text runs. — Details

For text runs longer than 1024 characters, skip the word cache and shape the entire run directly. This avoids per-word overhead (e.g. locking and reallocs) that becomes inefficient for large text blocks. Improves Editor-TipTap by about 10%.
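The decision described in this comment can be sketched as a single predicate. This is an assumption-laden illustration rather than the attached patch: only the 1024-character threshold comes from the comment, while `ShapingPath`, `kWordCacheMaxRunLength`, and `ChooseShapingPath` are hypothetical names.

```cpp
#include <cstddef>

// Which path a text run takes. Hypothetical enum for illustration only.
enum class ShapingPath { WordCache, WholeRun };

// 1024 is the threshold quoted in the comment above; the name is made up.
constexpr std::size_t kWordCacheMaxRunLength = 1024;

// Runs strictly longer than the threshold bypass the word cache and are
// shaped in one go; shorter runs keep using the per-word cache.
constexpr ShapingPath ChooseShapingPath(
    std::size_t runLength,
    std::size_t threshold = kWordCacheMaxRunLength) {
  return runLength > threshold ? ShapingPath::WholeRun
                               : ShapingPath::WordCache;
}
```

Keeping the threshold as a parameter makes it easy to compare several values in a perf run instead of hard-coding one.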

Matthew Gaudet (he/him) [:mgaudet]

Updated

•

4 months ago

See Also: → 2005283

Phabricator Automation

Updated

•

4 months ago

Attachment #9535926 - Attachment description: Bug 2008619: Skip word cache for very long text runs. r=jfkthame! → Bug 2008619: WIP: Skip word cache for very long text runs.

Denis Palmeiro [:denispal]

Reporter

Updated

•

4 months ago

Assignee: dpalmeiro → jlink

Jonathan Kew [:jfkthame]

Comment 5

•

4 months ago

The performance observation here is interesting. It used to be that the word cache gave us a (smallish but significant) perf boost on most text-heavy content, but I wonder if that's still true. In general, harfbuzz shaping performance has been steadily improving thanks to lots of good upstream work by Behdad; it may be that it's reached a point where the cache isn't really gaining us much.

So I guess I'm suggesting that before we do a threshold as per the patch here (which may be fine -- I'm not opposed to the idea), I'd be interested to know what happens (for perf tests in general, not just Editor-TipTap) if we disable the word cache altogether. It should be pretty simple to do that, by hacking gfxFont::SplitAndInitTextRun to just call ShapeTextWithoutWordCache directly. Would you be up for giving that a try?

Comment 6

•

4 months ago

PERF keyword?

Denis Palmeiro [:denispal]

Reporter

Comment 7

•

4 months ago

(In reply to Jonathan Kew [:jfkthame] from comment #5)

> The performance observation here is interesting. It used to be that the word cache gave us a (smallish but significant) perf boost on most text-heavy content, but I wonder if that's still true. In general, harfbuzz shaping performance has been steadily improving thanks to lots of good upstream work by Behdad; it may be that it's reached a point where the cache isn't really gaining us much.
>
> So I guess I'm suggesting that before we do a threshold as per the patch here (which may be fine -- I'm not opposed to the idea), I'd be interested to know what happens (for perf tests in general, not just Editor-TipTap) if we disable the word cache altogether. It should be pretty simple to do that, by hacking gfxFont::SplitAndInitTextRun to just call ShapeTextWithoutWordCache directly. Would you be up for giving that a try?

Justin was interested in trying out different caching implementations so I transferred this bug to him. Justin, is this something you can also try out? Thanks!

Flags: needinfo?(jlink)

Assignee

Comment 8

•

4 months ago

Sorry, I was out sick yesterday so I haven't done anything here yet. I plan to start taking a look today and will try out Jonathan's suggestion.

Flags: needinfo?(jlink)

Frank Doty [:fdoty]

Updated

•

4 months ago

Whiteboard: [fxpe]

Jira Integration Bot

Updated

•

4 months ago

See Also: → https://mozilla-hub.atlassian.net/browse/FXPE-10

Frank Doty [:fdoty]

Updated

•

4 months ago

Status: NEW → ASSIGNED

BugBot [:suhaib / :marco/ :calixte]

Comment 9

•

3 months ago

The severity field is not set for this bug.
:lsalzman, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(lsalzman)

Lee Salzman [:lsalzman]

Updated

•

3 months ago

Severity: -- → S4

Flags: needinfo?(lsalzman)

Assignee

Comment 10

•

3 months ago


I have tried four different ways of disabling the word cache (comparisons here, here, here, and here).

Except on Windows non-ref hardware, where anything that causes us to use the word cache less is a win, simply disabling the word cache results in a clear regression. In all of the cases, there are huge regressions in Editor-TipTap. On Windows, those huge Editor-TipTap regressions are sometimes offset by wins in other tests, although those wins seem less significant (even though the effect is that the overall SP3 score gets pulled upward).

I was surprised to see such a strong regression after Denis' earlier results, so I re-implemented his change and tried a few different thresholds here, and I still see the improvements that he saw. Using a lower threshold than the one he picked (1024) might also be better.

There are a few take-aways that I see from this:

The word cache is useful and valuable but we are definitely over-using it.
Non-ref Windows hardware seems to have a distinctly different experience with the word cache. Perhaps the shaping itself is less expensive there for some reason, so there is less benefit in caching the results? Maybe some aspect of the word cache executes less efficiently on that hardware, which affects when it should be used?

I'm taking a few action items out of this:

Study how the tests that benefit from the word cache are shaping words vs the tests that don't and use that to determine the appropriate conditions for when we should try to skip it.
Look more closely into what's happening on non-ref Windows hardware. I triggered some performance profiles on these machines in CI and have started looking at the profiles. These are the profiles from the slowest of these runs. So far it's surprising because we aren't actually spending much time in the main code related to shaping and caching.
Try using a simpler data structure (without need for allocations and possibly without need for locking?) like the cache that I previously implemented for JS atoms.

Denis Palmeiro [:denispal]

Reporter

Updated

•

2 months ago

Blocks: sp3-high
No longer blocks: speedometer3

Denis Palmeiro [:denispal]

Reporter

Updated

•

2 months ago

Blocks: speedometer3

Frank Doty [:fdoty]

Updated

•

1 month ago

Whiteboard: [fxpe] → [perf-prio]

Frank Doty [:fdoty]

Updated

•

1 month ago

Whiteboard: [perf-prio]

Assignee

Updated

•

1 month ago

Depends on: 2030147

Assignee

Updated

•

1 month ago

Summary: Text shaping word cache adds a lot of overhead during Editor-TipTap → Text shaping word cache affects performance on several SP3 tests (particularly TipTap)

Assignee

Updated

•

1 month ago

Summary: Text shaping word cache affects performance on several SP3 tests (particularly TipTap) → [meta] Text shaping word cache affects performance on several SP3 tests (particularly TipTap)

Assignee

Updated

•

1 month ago

Severity: S4 → S2

Type: defect → enhancement

Assignee

Comment 11

•

1 month ago

It turns out that the word cache is still very helpful when it comes to performance. The reason that Denis' patch to restrict when we use the word cache made things faster is that it avoided doing something that sabotages the word cache. See bug 2030147 for more details.

I'm turning this into a meta-bug because there are still a few more avenues to follow up on:

Pre-allocate the word cache with a larger size to avoid some initial re-allocs. A quick test of this seemed to indicate that this was a significant win on Linux and neutral on other desktop platforms.
Consider using a different data structure for the cache. If the "working set" is actually quite large, then the current data structure is probably well-suited. If not, an MRU cache or hash-based cache (similar to a CPU cache) might be a better choice.
Take a look at the locking that is used. In my tests, it seems like we're always doing the shaping from the main thread so maybe the locking isn't really necessary but, then again, maybe it's also not really harming us if there's no contention.
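The effect of pre-allocating, the first item above, can be illustrated with the standard library. This sketch uses `std::unordered_map` as a stand-in for the real cache, which is an assumption: the actual table is `mozilla::HashMap`, whose growth policy may differ. `CountRehashes` is a hypothetical helper that counts how often the bucket array is rebuilt while distinct keys are inserted.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Inserts `count` distinct keys and returns how many times the bucket array
// changed size along the way. Pass reserveFor == 0 to skip pre-allocation.
inline std::size_t CountRehashes(std::size_t count, std::size_t reserveFor) {
  std::unordered_map<std::string, int> cache;
  if (reserveFor > 0) {
    cache.reserve(reserveFor);  // size the table up front
  }
  std::size_t rehashes = 0;
  std::size_t buckets = cache.bucket_count();
  for (std::size_t i = 0; i < count; ++i) {
    cache.emplace("word" + std::to_string(i), static_cast<int>(i));
    if (cache.bucket_count() != buckets) {
      ++rehashes;  // the table grew and every entry was re-bucketed
      buckets = cache.bucket_count();
    }
  }
  return rehashes;
}
```

With a reservation at least as large as the number of inserted keys, the standard guarantees that no rehash happens. Without one, the table grows several times, and each growth re-buckets all existing entries.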

Jonathan Kew [:jfkthame]

Comment 12

•

1 month ago

The cache size/strategy is a tricky one, because it is so dependent on the nature of the content we're dealing with. Not to say it can't be improved, but we should be wary of focusing too much on just a few examples.

Regarding locking, for HTML content we always shape on the main thread, but since we implemented text rendering for offscreen canvas, it's also possible for us to do shaping from DOM worker threads. So that's why we had to add locking.

An alternative would be to avoid ever sharing font instances between threads, but that potentially has other downsides (increased memory usage, and losing the possibility of the word cache being usefully shared across threads -- e.g. if several workers are using the same font, they don't all have to shape the same words independently).
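The sharing benefit described here can be sketched with a mutex-guarded cache. This is a simplified assumption and not Gecko's locking scheme: `ToySharedWordCache` and `GetOrShape` are hypothetical names, and "shaping" is faked as the word's length. Every lookup takes the lock, including uncontended ones, but a word shaped by one caller is then available to all the others.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical cache shared by several threads that use the same font.
class ToySharedWordCache {
 public:
  // Returns the cached result for `word`, computing it on a miss.
  // The lock is held for the whole lookup, so concurrent callers serialize.
  std::size_t GetOrShape(const std::string& word) {
    std::lock_guard<std::mutex> lock(mMutex);
    auto it = mCache.find(word);
    if (it == mCache.end()) {
      ++mShaperCalls;  // only the first caller pays for shaping
      it = mCache.emplace(word, word.size()).first;
    }
    return it->second;
  }

  std::size_t ShaperCalls() const {
    std::lock_guard<std::mutex> lock(mMutex);
    return mShaperCalls;
  }

 private:
  mutable std::mutex mMutex;
  std::unordered_map<std::string, std::size_t> mCache;
  std::size_t mShaperCalls = 0;
};
```

Giving each thread its own cache would remove the lock, at the price of each thread shaping and storing the same words independently.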

BugBot (nomail) [:suhaib / :marco/ :calixte]

Updated

•

1 month ago

Keywords: meta
