investigate removing glyph data from gfxTextRun, and instead store a list of references to gfxShapedWords

RESOLVED WONTFIX

Status

()

RESOLVED WONTFIX
7 years ago
6 years ago

People

(Reporter: jfkthame, Unassigned)

Tracking

Trunk
x86
Mac OS X
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments)

(Reporter)

Description

7 years ago
Now that shaped glyph data is cached on a per-word basis by fonts, we may be able to optimize gfxTextRun construction and memory use by eliminating the need to copy the glyph data into each run, and instead store a list of references to the relevant gfxShapedWord records.

See bug 703100 comment 59 for a possible approach.
(Reporter)

Updated

7 years ago
Depends on: 703100
Here's what I wrote:

> I had an idea about storing lists of words in textruns which I'll write down here > for posterity... Something like this:
>
> Store three arrays in each textrun:
> 1) The "word list array", an array of pairs, one for each word that occurs
> (nsRefPtr<gfxShapedWord> word, PRUint32 character offset of the start of the word
> in the textrun)
> 2) The "base word index array", an array of (#characters + 127)/128 PRUint32
> indexes into the word list array. The value at index i is the index of the first
> word that includes a character with offset >= i*128.
> 3) The "word index array", an array of #characters byte. The value at index i is
> either some magic value (e.g. 0xFF) to indicate character i is a space, or
> another value which is added to the "base word index array"[i/128] to yield the
> index into the "word list array" for the word containing character i.
>
> This is a) compact (slightly over 1 byte per character when words are very long,
> 9 bytes per character for the pessimal case of 1-character words) and b) fairly
> efficient (finding the word containing a given character, and the offset of that
> character within the word, is three array lookups and a conditional branch on
> whether it's a space).

The space analysis is a bit off. For 1-character words separated by single spaces, which is about pessimal, the overhead is 10 bytes for every two characters, so 5 bytes per character.
(Reporter)

Comment 2

7 years ago
Created attachment 606547 [details] [diff] [review]
part 1 - remove unused gfxTextRun::ClusterIterator

Posting some WIP here... first, some cleanup to reduce the amount of existing textrun code that needs updating. The gfxTextRun::ClusterIterator class is not used anywhere, so let's remove it.
(Reporter)

Comment 3

7 years ago
Created attachment 606548 [details] [diff] [review]
part 2 - textrun constification

The patches here will move us towards a cleaner API for dealing with textruns, where external code will no longer be allowed to rummage around in their internals.

Start by adding "const" to a bunch of textrun accessor methods, so that we can pass const pointers and references around in more places, and be sure the textrun isn't being tampered with unexpectedly.
(Reporter)

Comment 4

7 years ago
Created attachment 606550 [details] [diff] [review]
part 3 - introduce gfxTextRun::CharGlyphIterator to access the character/glyph records

In preparation for removing the mCharacterGlyphs array from the textrun, introduce a CharGlyphIterator class that can be used to access the glyph data, and remove the GetCharacterGlyphs() method that gives direct access to the array.

The initial implementation of the iterator is trivial, as it just accesses the existing array, but once we move all textrun clients to this accessor, then we'll be free to modify the internal representation.

This currently has a couple of FIXMEs in it, where code was using GetCharacterGlyphs() to directly access and modify cluster information - these need to be re-worked somehow.
(Reporter)

Comment 5

7 years ago
Created attachment 606553 [details] [diff] [review]
part 4 - pass CharGlyphIterators instead of integer offsets in various gfxFont apis

When we remove the mCharacterGlyphs array and go to a list-of-shaped-words representation, setting up a CharGlyphIterator for an arbitrary character offset in the run will become slightly more expensive. Therefore, once we have an iterator for a given offset, we should pass that around rather than passing the offset and re-creating a new iterator in the called function. So this patch changes the arguments of some internal methods from integer offsets to iterators.
(Reporter)

Comment 6

6 years ago
Closing this as WONTFIX, in view of the conflict between this approach and bugs 825875 and 761442, which rely on the option to bypass the shaped-word cache entirely for cases where its cost/benefit ratio is not worthwhile, or it actually prevents correct rendering (i.e. with features that span inter-word spaces).

Some of the initial "cleanup" from here might still be worthwhile in principle, but would likely need significant updating anyway since bug 825871 refactored textrun and shapedword code.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.