Closed Bug 703100 Opened 14 years ago Closed 14 years ago

remove the existing text-run word cache and replace with a simpler and more efficient scheme

Tracking

Status:

RESOLVED FIXED

Milestone:

mozilla12

People

(Reporter: jfkthame, Assigned: jfkthame)

References

Details

(Whiteboard: [Snappy])

Attachments

(12 files, 12 obsolete files)

pt 1 - eliminate gfxTextRunWordCache and gfxTextRunCache. (14 years ago, Jonathan Kew [:jfkthame], 110.76 KB, patch, roc: review+)
pt 2.1 - implement gfxShapedWord caching for gfxFont instances. (14 years ago, Jonathan Kew [:jfkthame], 135.55 KB, patch)
pt 2.2 - adapt Mac font code to work with gfxShapedWord caches. (14 years ago, Jonathan Kew [:jfkthame], 24.09 KB, patch)
pt 2.3 - adapt Windows font code to work with gfxShapedWord caches. (14 years ago, Jonathan Kew [:jfkthame], 52.49 KB, patch)
pt 2.4 - adapt Linux/Pango font code to work with gfxShapedWord caches. (14 years ago, Jonathan Kew [:jfkthame], 35.60 KB, patch)
pt 2.5 - adapt Android/FT2 font code to work with gfxShapedWord caches. (14 years ago, Jonathan Kew [:jfkthame], 7.50 KB, patch)
pt 3 - remove copy of original characters from gfxTextRun. (14 years ago, Jonathan Kew [:jfkthame], 60.61 KB, patch)
pt 4 - add timed expiration of cached gfxShapedWord records. (14 years ago, Jonathan Kew [:jfkthame], 11.26 KB, patch)
pt 5 - optimize allocation of gfxTextRun objects to avoid separate allocation for CompressedGlyph records. (14 years ago, Jonathan Kew [:jfkthame], 19.79 KB, patch)
pt 6 - remove pango-specific todo()s in test_backspace_delete, now that it passes on all platforms. (14 years ago, Jonathan Kew [:jfkthame], 10.44 KB, patch, roc: review+)
pt 7 - fix fragile reftests that depend on metrics of fallback fonts used for invisible chars. (14 years ago, Jonathan Kew [:jfkthame], 3.66 KB, patch, roc: review+)
pt 4, v2 - add timed expiration of cached gfxShapedWord records (14 years ago, Jonathan Kew [:jfkthame], 13.90 KB, patch)
pt 2.1 v2 - implement gfxShapedWord caching for gfxFont instances. (14 years ago, Jonathan Kew [:jfkthame], 150.82 KB, patch, roc: review+)
pt 2.2 v2 - adapt Mac font code to work with gfxShapedWord caches. (14 years ago, Jonathan Kew [:jfkthame], 24.08 KB, patch, roc: review+)
pt 2.3 v2 - adapt Windows font code to work with gfxShapedWord caches. (14 years ago, Jonathan Kew [:jfkthame], 52.93 KB, patch, roc: review+)
pt 2.4 v2 - adapt Linux/Pango font code to work with gfxShapedWord caches. (14 years ago, Jonathan Kew [:jfkthame], 36.18 KB, patch)
pt 2.5 v2 - adapt Android/FT2 font code to work with gfxShapedWord caches. (14 years ago, Jonathan Kew [:jfkthame], 8.24 KB, patch)
pt 3 v2 - remove copy of original characters from gfxTextRun. (14 years ago, Jonathan Kew [:jfkthame], 62.07 KB, patch)
pt 4 v3 - add timed expiration of cached gfxShapedWord records (14 years ago, Jonathan Kew [:jfkthame], 13.76 KB, patch, roc: review+)
pt 5 v2 - optimize allocation of gfxTextRun objects to avoid separate allocation for CompressedGlyph records. (14 years ago, Jonathan Kew [:jfkthame], 19.80 KB, patch, roc: review+)
pt 2.5 v3 - adapt Android/FT2 font code to work with gfxShapedWord caches. (14 years ago, Jonathan Kew [:jfkthame], 10.45 KB, patch, roc: review+)
pt 2.4 v3 - adapt Linux/Pango font code to work with gfxShapedWord caches. (14 years ago, Jonathan Kew [:jfkthame], 33.74 KB, patch, roc: review+)
pt 2.4.1 - make gfxPangoFontGroup font-matching behavior more similar to generic gfxFontGroup version. (14 years ago, Jonathan Kew [:jfkthame], 2.77 KB, patch, roc: review+)
pt 3 v3 - remove copy of original characters from gfxTextRun. (14 years ago, Jonathan Kew [:jfkthame], 63.71 KB, patch, roc: review+)

Jonathan Kew [:jfkthame]

Assignee

Description

•

14 years ago

(Not sure whether to file this under Layout:Text or Graphics; it seems to straddle the boundary...)

The existing textrun word cache implementation has a number of problems:

- Cache entries refer to a range within a textrun, which means the glyph data remains owned by a particular textrun, and the lifetime of the cache entry is tied to the lifetime of the textrun where the word was first encountered.
- Destroying a textrun requires iterating over its text to find all the words and check whether there are cache entries that refer to them; if so, those entries need to be removed. This makes textrun destruction unnecessarily expensive.
- The presence of a single character outside the 8-bit range when a textrun is constructed means that every word in that run will be handled as 16-bit text, even if most of them could be handled as 8-bit.
- The "same" word in an 8-bit and a 16-bit textrun is shaped and cached separately.
- Textruns have to keep a copy of the original text for the cache entries to refer to, even though for drawing and measurement all we need is the shaped glyph data.

I'm working on a set of patches to replace this global cache with a scheme where each gfxFont instance will own the text and glyph data of the words that it has shaped, so that shaped-word lifetime can be decoupled from the lifetime of the textrun where the word happened to be seen first. This will allow us to manage the expiration of cached words better.

This will also unify the caching of words from 8-bit and 16-bit textruns on a per-word basis, so that we aren't forced to cache lots of ASCII-only words in a 16-bit form (in addition to an 8-bit form) just because a single non-ASCII character such as a dash or curly quote happened to occur somewhere in the text.

I believe this will also make it possible to avoid copying the original text into (most) textruns, which should be a perf and memory win.

Once the ownership of shaped-word data is moved from textruns to the gfxFont instances, it may be possible to change the structure of textruns so that we don't copy glyph data into them at all, but instead hold an array of references to shaped words. This could be a further memory win except for extreme edge cases such as a textrun containing a series of single-character words, but it is not entirely clear whether it can be done without hurting performance.
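The ownership model proposed above can be illustrated with a small sketch. This is not the patch's code: `Font`, `TextRun`, `ShapedWord` and the fake "shaping" loop are hypothetical stand-ins, chosen only to show that a font-owned cache survives the text run in which a word was first seen, and that the run itself needs no copy of the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for shaped glyph data; real shaping output is richer.
struct ShapedWord {
    std::u16string text;           // the cached word owns a copy of its text
    std::vector<uint32_t> glyphs;  // placeholder: one fake glyph per code unit
};

// Hypothetical font that owns every word it has shaped.
class Font {
public:
    const ShapedWord& GetShapedWord(const std::u16string& word) {
        auto it = mWords.find(word);
        if (it != mWords.end()) {
            return *it->second;  // cache hit: no reshaping needed
        }
        ++mMisses;
        auto shaped = std::make_unique<ShapedWord>();
        shaped->text = word;
        for (char16_t ch : word) {
            shaped->glyphs.push_back(ch);  // stand-in for real shaping
        }
        const ShapedWord& result = *shaped;
        mWords.emplace(word, std::move(shaped));
        return result;
    }
    std::size_t CachedWordCount() const { return mWords.size(); }
    uint32_t Misses() const { return mMisses; }

private:
    std::unordered_map<std::u16string, std::unique_ptr<ShapedWord>> mWords;
    uint32_t mMisses = 0;
};

// Hypothetical text run: copies glyph data only and keeps no copy of the text.
class TextRun {
public:
    TextRun(Font& font, const std::vector<std::u16string>& words) {
        for (const std::u16string& word : words) {
            const ShapedWord& shaped = font.GetShapedWord(word);
            mGlyphs.insert(mGlyphs.end(), shaped.glyphs.begin(), shaped.glyphs.end());
        }
    }
    std::size_t GlyphCount() const { return mGlyphs.size(); }

private:
    std::vector<uint32_t> mGlyphs;
};
```

In this sketch, destroying a `TextRun` touches nothing in the cache, so there is no per-word cleanup on destruction; a later run over the same words is served from the font's cache.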

Jonathan Kew [:jfkthame]

Assignee

Comment 1

•

14 years ago

Attached patch pt 1 - eliminate gfxTextRunWordCache and gfxTextRunCache. — Details — Splinter Review

Jonathan Kew [:jfkthame]

Assignee

Comment 2

•

14 years ago

Attached patch pt 2.1 - implement gfxShapedWord caching for gfxFont instances. (obsolete) — Details — Splinter Review

Jonathan Kew [:jfkthame]

Assignee

Comment 3

•

14 years ago

Attached patch pt 2.2 - adapt Mac font code to work with gfxShapedWord caches. (obsolete) — Details — Splinter Review

Jonathan Kew [:jfkthame]

Assignee

Comment 4

•

14 years ago

Attached patch pt 2.3 - adapt Windows font code to work with gfxShapedWord caches. (obsolete) — Details — Splinter Review

Jonathan Kew [:jfkthame]

Assignee

Comment 5

•

14 years ago

Attached patch pt 2.4 - adapt Linux/Pango font code to work with gfxShapedWord caches. (obsolete) — Details — Splinter Review

Jonathan Kew [:jfkthame]

Assignee

Comment 6

•

14 years ago

Attached patch pt 2.5 - adapt Android/FT2 font code to work with gfxShapedWord caches. (obsolete) — Details — Splinter Review

Jonathan Kew [:jfkthame]

Assignee

Comment 7

•

14 years ago

Attached patch pt 3 - remove copy of original characters from gfxTextRun. (obsolete) — Details — Splinter Review

Jonathan Kew [:jfkthame]

Assignee

Comment 8

•

14 years ago

Attached patch pt 4 - add timed expiration of cached gfxShapedWord records. (obsolete) — Details — Splinter Review

Jonathan Kew [:jfkthame]

Assignee

Comment 9

•

14 years ago

Attached patch pt 5 - optimize allocation of gfxTextRun objects to avoid separate allocation for CompressedGlyph records. (obsolete) — Details — Splinter Review

Jonathan Kew [:jfkthame]

Assignee

Comment 10

•

14 years ago

Attached patch pt 6 - remove pango-specific todo()s in test_backspace_delete, now that it passes on all platforms. — Details — Splinter Review

Jonathan Kew [:jfkthame]

Assignee

Comment 11

•

14 years ago

Attached patch pt 7 - fix fragile reftests that depend on metrics of fallback fonts used for invisible chars. — Details — Splinter Review

Jonathan Kew [:jfkthame]

Assignee

Comment 12

•

14 years ago

The patches above implement the revised caching scheme for shaped glyph data, as outlined in comment #0.

Tryserver results suggest that on some platforms, we may see a Tp improvement of up to about 2%, while on others the effect is insignificant compared to the existing noise on the measurements. In real-world browsing, we should see an improvement in the cache hit rate (as we won't throw away cached words that are still being frequently used just because the original textrun where they were created has been discarded).

We may want to add telemetry that reports the cache statistics, to help us fine-tune the gfxShapedWord and gfxFont cache expiration parameters; I propose to file a followup on this.

Regarding the individual patches:

1 - Removes the existing caching scheme in preparation for the new implementation. Users of gfxTextRunCache and gfxTextRunWordCache are revised to call gfxFontGroup::MakeTextRun directly, as the new caches will operate below this level. This patch alone would be expected to regress performance, but it simplifies/cleans up the code in readiness for the new cache.

2.1-2.5 - Implement caching within each gfxFont instance, based on "words" (space-separated runs of characters) found when constructing text-runs. Note that all the 2.x parts _must_ land together, as the API of the platform-specific shaper classes is changed so that the tree won't even build until those backends are updated. The patch is split into platform-specific pieces to provide more manageable chunks for review. Part 2.1 (the generic gfxFont support for word-caching, and gfxHarfBuzzShaper adaptation) seems bigger than it really is because a bunch of code moves from gfxTextRun to the new gfxShapedWord class, but is essentially unchanged.

3 - The text-run no longer needs to keep a copy of its original text, as we no longer cache "words" by pointing at segments of a text-run. However, some layout code does need to know where specific characters (space, tab, newline) occurred, so this is recorded in the CompressedGlyph record using new flag bits.

4 - The word caches will automatically disappear when their owning gfxFont instances are deleted, but in the case of a long-lived gfxFont, the cache might grow quite large. So this adds timed expiration of cached words that have not been recently used. (Once we have telemetry to monitor cache behavior, we can try tuning the expiration time both here and for gfxFontCache.)

5 - Optimize text-run creation to allocate the gfxTextRun object and its array of CompressedGlyph records in a single operation; this is easier to do now that gfxTextRun has been simplified by removal of the original text.

6 - The new implementation avoids relying on Pango's clustering support, which avoids the problem discussed in bug 474068 and means we can remove the pango-version-specific todo() stuff in this test.

7 - Fix up a couple of reftests that are sensitive to the metrics of fonts chosen during fallback for zero-width invisible characters like ZWNJ.
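For readers unfamiliar with the technique in part 5, here is a minimal sketch of allocating an object and its trailing array in one block. The names `TextRun`, `Glyph`, `Create` and `Destroy` are hypothetical and the real gfxTextRun allocation may well differ; the sketch only shows the general "header plus trailing records" pattern.

```cpp
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a CompressedGlyph record.
struct Glyph {
    uint32_t value;
};

class TextRun {
public:
    // Allocate the object and its glyph array in a single block.
    static TextRun* Create(uint32_t length) {
        void* storage = std::malloc(sizeof(TextRun) + length * sizeof(Glyph));
        if (!storage) {
            throw std::bad_alloc();
        }
        return new (storage) TextRun(length);
    }

    // Counterpart to Create: run the destructor, then free the single block.
    static void Destroy(TextRun* run) {
        run->~TextRun();
        std::free(run);
    }

    // The glyph records live immediately after the object itself.
    Glyph* Glyphs() { return reinterpret_cast<Glyph*>(this + 1); }
    uint32_t Length() const { return mLength; }

private:
    explicit TextRun(uint32_t length) : mLength(length) {
        for (uint32_t i = 0; i < length; ++i) {
            new (Glyphs() + i) Glyph{0};  // construct each record in place
        }
    }
    ~TextRun() = default;

    uint32_t mLength;
};

// The trailing array starts at `this + 1`, so it must not need stricter
// alignment than the object that precedes it.
static_assert(alignof(TextRun) >= alignof(Glyph),
              "trailing Glyph array would be misaligned");
```

The gain is one allocation and one free per run instead of two, with the records contiguous with the run's header.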

Jonathan Kew [:jfkthame]

Assignee

Updated

•

14 years ago

Attachment #579285 - Flags: review?(jmuizelaar)
Attachment #579286 - Flags: review?(jmuizelaar)
Attachment #579287 - Flags: review?(jmuizelaar)
Attachment #579288 - Flags: review?(jmuizelaar)
Attachment #579289 - Flags: review?(jmuizelaar)
Attachment #579290 - Flags: review?(jmuizelaar)
Attachment #579291 - Flags: review?(jmuizelaar)
Attachment #579292 - Flags: review?(jmuizelaar)
Attachment #579293 - Flags: review?(jmuizelaar)
Attachment #579294 - Flags: review?(jmuizelaar)
Attachment #579295 - Flags: review?(jmuizelaar)

Jonathan Kew [:jfkthame]

Assignee

Comment 13

•

14 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #12)
> 6 - The new implementation avoids relying on Pango's clustering support,
> which avoids the problem discussed in bug 474068 and means we can remove the
> pango-version-specific todo() stuff in this test.

As a side benefit, this results in the correct rendering (as per attachment 495460, mentioned in bug 474068 comment 66) of :first-letter with Thai examples such as

data:text/html;charset=utf-8,<style>:first-letter{font-size:4em;}</style>เมื่อ

which currently (because of the Pango cluster issue) applies :first-letter to too many characters on Linux.

Jonathan Kew [:jfkthame]

Assignee

Comment 14

•

14 years ago

Filed bug 707959 as a followup on adding telemetry to help us tune the caching behavior.

Jonathan Kew [:jfkthame]

Assignee

Updated

•

14 years ago

Blocks: 708075

Jonathan Kew [:jfkthame]

Assignee

Updated

•

14 years ago

Blocks: 707959

Jeff Muizelaar [:jrmuizel]

Comment 15

•

14 years ago

I'm not going to have time to review all of these, so I suggest getting other reviewers for the details. jdaggett and bas are good candidates as they have touched more of the shaping code than I ever have.

I have however looked over the code to get a general idea of what's going on. I really like the direction that it's going except for cache expiry. I don't really like the idea of each gfxFont having its own timer. Further, I feel that it would be better to have cache size/expiry managed globally instead of for each gfxFont. In the recent past we used to have a timer per image to do expiry and that was bad. I know that this isn't as extreme as that, but I'd like to avoid getting close. What do you think?

Overall though, the new code seems much easier to understand! Thanks for doing it.

Jeff Muizelaar [:jrmuizel]

Updated

•

14 years ago

Attachment #579285 - Flags: review?(jmuizelaar)
Attachment #579286 - Flags: review?(jmuizelaar)
Attachment #579287 - Flags: review?(jmuizelaar)
Attachment #579289 - Flags: review?(jmuizelaar)
Attachment #579290 - Flags: review?(jmuizelaar)
Attachment #579291 - Flags: review?(jmuizelaar)
Attachment #579292 - Flags: review?(jmuizelaar)
Attachment #579293 - Flags: review?(jmuizelaar)
Attachment #579294 - Flags: review?(jmuizelaar)
Attachment #579295 - Flags: review?(jmuizelaar)
Attachment #579288 - Flags: review?(jmuizelaar)

Jonathan Kew [:jfkthame]

Assignee

Comment 16

•

14 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #15)
> don't really like the idea of each gfxFont having its own timer. Further, I
> feel that it would be better to have cache size/expiry managed globally
> instead of for each gfxFont.

I've gone back and forth on that while working on this, and tried both approaches (with no clear indication of any difference in performance one way or the other). The per-font timers seemed like the "cleanest" design to me, but a single global timer for aging the cached-word entries is marginally less memory overhead. So given your feeling, I'll post an alternative patch taking that approach; I don't really have a preference either way.

Jet Villegas (inactive)

Comment 17

•

14 years ago

Adding [Snappy] search string. This fix aims to improve actual (and hopefully, perceived) layout and rendering performance on pages with large blocks of text (e.g. Wikipedia).

Whiteboard: [Snappy]

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 18

•

14 years ago

I wrote a lot of this code and I'm happy to review some or all of these patches.

Jonathan Kew [:jfkthame]

Assignee

Comment 19

•

14 years ago

Attached patch pt 4, v2 - add timed expiration of cached gfxShapedWord records (obsolete) — Details — Splinter Review

Alternate implementation of cached-word expiration, using a single global timer instead of per-font timers to age the words.
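A rough sketch of what "a single global timer" aging could look like, under stated assumptions: `Font`, `FontCache`, `Touch`, `AgeWords`, `OnTimerTick` and the threshold `kMaxAge` are all hypothetical names for illustration, and the actual patch may age or evict entries differently. The point is only that one periodic tick can walk every font's word cache, so no font needs its own timer.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical cache entry: only the age matters for this sketch.
struct CachedWord {
    uint32_t age = 0;
};

class Font {
public:
    // Looking up (or inserting) a word marks it as recently used.
    void Touch(const std::string& word) { mWords[word].age = 0; }

    // Age every word by one tick and drop those older than maxAge.
    void AgeWords(uint32_t maxAge) {
        for (auto it = mWords.begin(); it != mWords.end();) {
            if (++it->second.age > maxAge) {
                it = mWords.erase(it);
            } else {
                ++it;
            }
        }
    }

    bool Has(const std::string& word) const { return mWords.count(word) != 0; }

private:
    std::unordered_map<std::string, CachedWord> mWords;
};

// Hypothetical global cache: one timer callback ages the words of all fonts.
class FontCache {
public:
    static constexpr uint32_t kMaxAge = 2;  // arbitrary value for the sketch

    void AddFont(Font* font) { mFonts.push_back(font); }

    // Would be invoked by the single global timer.
    void OnTimerTick() {
        for (Font* font : mFonts) {
            font->AgeWords(kMaxAge);
        }
    }

private:
    std::vector<Font*> mFonts;  // non-owning, for brevity
};
```

Compared with per-font timers, this trades a slightly less self-contained `Font` for a single timer object and one place to tune the expiration policy.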

Assignee: nobody → jfkthame

Attachment #579292 - Attachment is obsolete: true

Jonathan Kew [:jfkthame]

Assignee

Updated

•

14 years ago

Attachment #579285 - Flags: review?(roc)
Attachment #579286 - Flags: review?(roc)
Attachment #579287 - Flags: review?(roc)
Attachment #579288 - Flags: review?(roc)
Attachment #579289 - Flags: review?(roc)
Attachment #579290 - Flags: review?(roc)
Attachment #579291 - Flags: review?(roc)
Attachment #579293 - Flags: review?(roc)
Attachment #579294 - Flags: review?(roc)
Attachment #579295 - Flags: review?(roc)
Attachment #579988 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 20

•

14 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #18) > I wrote a lot of this code and I'm happy to review some or all of these > patches. You're on! It's not really quite as much as it looks - there are a bunch of semi-mechanical changes, basically moving functionality from gfxTextRun to gfxShapedWord and adapting the various gfxFontShaper subclasses to implement ShapeWord() instead of InitTextRun(). So there's not nearly as much actual new code or logic as the quantity of patch might imply. :)

John Daggett (:jtd)

Comment 21

•

14 years ago

It would be nice to instrument the code in gfxFont::GetShapedWord to record when hits/misses occur, then record telemetry data on that (miss ratio per text run?).

Jonathan Kew [:jfkthame]

Assignee

Comment 22

•

14 years ago

(In reply to John Daggett (:jtd) from comment #21)
> It would be nice to instrument the code in gfxFont::GetShapedWord to
> record when hits/misses occur, then record telemetry data on that (miss
> ratio per text run?).

I already filed bug 707959 as a followup for this.

John Daggett (:jtd)

Comment 23

•

14 years ago

Another thing to consider is skipping the word cache altogether for scripts like CJK where I doubt word caching is buying us anything, since words are not generally delineated by spaces in CJK text and the miss ratio is probably very high.

Jonathan Kew [:jfkthame]

Assignee

Comment 24

•

14 years ago

(In reply to John Daggett (:jtd) from comment #23)
> Another thing to consider is skipping the word cache altogether for scripts
> like CJK where I doubt word caching is buying us anything, since words are
> not generally delineated by spaces in CJK text and the miss ratio is
> probably very high.

That's an interesting idea - might well be a win. (Hmm, that should be "CJ", not "CJK", as Korean _does_ use word spaces. OTOH, things like Thai, Lao and Khmer would likely fall into the not-worth-caching category.)

Another possible tweak that might have a similar overall effect would be to skip the cache for "words" above a certain (to-be-determined) length threshold, as excessively long "words" are unlikely to be used repeatedly.

In both cases, though, we should check how often we end up reconstructing textruns more than once for the same document text. If this happens frequently, then the cache could be giving us some benefit even if it ends up caching each complete CJK paragraph separately.

Maybe we could get telemetry to report cache miss rates on a per-script basis, to see if the word cache is clearly not proving useful for certain scripts?
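As an illustration of the per-script measurement suggested here, a minimal sketch follows. The `Script` enum, `CacheStats` and `WordCacheTelemetry` are hypothetical and unrelated to Gecko's actual telemetry API (the real work was deferred to bug 707959); the sketch only shows the bookkeeping needed to compare miss rates between scripts.

```cpp
#include <cstdint>
#include <map>

// Hypothetical subset of scripts, just enough for the example.
enum class Script { Latin, Han, Thai };

struct CacheStats {
    uint64_t hits = 0;
    uint64_t misses = 0;

    // Fraction of lookups that missed; 0.0 when nothing was recorded.
    double MissRate() const {
        const uint64_t total = hits + misses;
        return total ? static_cast<double>(misses) / static_cast<double>(total) : 0.0;
    }
};

class WordCacheTelemetry {
public:
    // Record the outcome of one word-cache lookup for the given script.
    void Record(Script script, bool hit) {
        CacheStats& stats = mStats[script];
        if (hit) {
            ++stats.hits;
        } else {
            ++stats.misses;
        }
    }

    double MissRate(Script script) const {
        auto it = mStats.find(script);
        return it == mStats.end() ? 0.0 : it->second.MissRate();
    }

private:
    std::map<Script, CacheStats> mStats;
};
```

A script whose miss rate stays close to 1.0 in such data would be a candidate for bypassing the cache, subject to the caveat above about textruns being rebuilt for the same text.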

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 25

•

14 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #24)
> That's an interesting idea - might well be a win. (Hmm, that should be "CJ",
> not "CJK", as Korean _does_ use word spaces. OTOH, things like Thai, Lao and
> Khmer would likely fall into the not-worth-caching category.)
>
> Another possible tweak that might have a similar overall effect would be to
> skip the cache for "words" above a certain (to-be-determined) length
> threshold, as excessively long "words" are unlikely to be used repeatedly.

This would require careful measurement. Web pages often aren't prose. The same terms, even whole sentences, may be repeated in UI elements or headings for example. We also benefit from the word cache when navigating between pages that share a lot of the same text content.

> In both cases, though, we should check how often we end up reconstructing
> textruns more than once for the same document text. If this happens
> frequently, then the cache could be giving us some benefit even if it ends
> up caching each complete CJK paragraph separately.

That too.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

14 years ago

Attachment #579285 - Flags: review?(roc) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 26

•

14 years ago

Comment on attachment 579286
pt 2.1 - implement gfxShapedWord caching for gfxFont instances.

Review of attachment 579286:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.cpp
@@ +1543,5 @@
> +
> +    entry = mWordCache.PutEntry(key);
> +    if (!entry) {
> +        NS_WARNING("failed to create cache entry for gfxShapedWord - expect missing text");
> +        return nsnull;

I think these are infallible now, so no need to check results here or in our callers.

@@ +1565,5 @@
> +        const PRUint8* src = (const PRUint8*)aText;
> +        PRUnichar *dest = utf16.BeginWriting();
> +        while (dest < utf16.EndWriting()) {
> +            *dest++ = *src++;
> +        }

Use LossyConvertEncoding8to16 or AppendASCIItoUTF16.

@@ +1646,5 @@
> +
> +    for (PRUint32 i = 0; i <= aRunLength; ++i) {
> +        T ch = i < aRunLength ? text[i] : '\n';
> +        bool boundary = IsBoundarySpace(ch);
> +        bool invalid = boundary ? false : gfxFontGroup::IsInvalidChar(ch);

!boundary && gfxFontGroup::IsInvalidChar(ch)

Also I think we should declare ch, boundary and invalid and then do an "if (i < aRunLength) ... else { ch = '\n'; boundary = false; invalid = true; }"

@@ +1670,5 @@
> +            PRUint8 cat =
> +                gfxUnicodeProperties::GetGeneralCategory(ch);
> +            if (cat < HB_CATEGORY_COMBINING_MARK ||
> +                cat > HB_CATEGORY_NON_SPACING_MARK)
> +            {

{ on previous line

@@ +1696,5 @@
> +            continue;
> +        }
> +
> +        // We've decided to break here (i.e. we're at the end of a "word",
> +        // of the word is becoming excessively long): shape the word and

"or the word"

@@ +1703,5 @@
> +        gfxShapedWord *sw = nsnull;
> +        if (sizeof(T) == sizeof(PRUnichar) && wordIs8Bit) {
> +            nsAutoTArray<PRUint8,256> bytes;
> +            PRUint8 *bp = bytes.AppendElements(length);
> +            if (bp) {

This is an infallible array (default) so you don't need to check the result.

Converting to 8-bit text here seems undesirable since on a cache hit, the 16-bit version of GetShapedWord would have done just as well, and on a cache miss, the 8-bit version of GetShapedWord has to reconvert the text to 16-bit for shaping.

I guess you're doing it this way because the 8-bit flag is part of the cache key, and we want 8-bit words in 16-bit strings to hit 8-bit words from 8-bit strings. Can't we do that without converting the text? Pass the wordIs8Bit flag into GetShapedWord and then into CacheHashKey, and when it compares the strings, use an 8-to-16 comparison path if necessary?

@@ +1730,5 @@
> +
> +        if (boundary) {
> +            // word was terminated by a space: add that to the textrun
> +            if (!aTextRun->SetSpaceGlyphIfSimple(this, aContext,
> +                                                 aRunStart + i))

Better add a test font for non-simple space glyphs, since otherwise I bet the following code would never be tested.

@@ +2344,5 @@
>
>  bool
> +gfxFontGroup::IsInvalidChar(PRUint8 ch)
> +{
> +    return ((ch & 0x7f) < 0x20);

What's the reasoning behind changing IsInvalidChar? I thought it was better to be conservative, because there might be fonts that render glyphs for some of the control characters.

@@ +2764,5 @@
> +
> +    if (sizeof(T) == sizeof(PRUnichar) && aLength > 0) {
> +        gfxTextRun::CompressedGlyph *glyph = aTextRun->GetCharacterGlyphs();
> +        if (!glyph->IsSimpleGlyph()) {
> +            glyph->SetClusterStart(true);

Why is this needed? Can't we get rid of it?

::: gfx/thebes/gfxFont.h
@@ +2411,5 @@
>                            const DetailedGlyph *aGlyphs);
>      void SetMissingGlyph(PRUint32 aCharIndex, PRUint32 aUnicodeChar);
>      void SetSpaceGlyph(gfxFont *aFont, gfxContext *aContext, PRUint32 aCharIndex);
>
> +    bool SetSpaceGlyphIfSimple(gfxFont *aFont, gfxContext *aContext,

Document this!

::: gfx/thebes/gfxPlatform.h
@@ +70,5 @@
>  class gfxTextRun;
>  class nsIURI;
>  class nsIAtom;
>
> +class CompressedGlyph;

Why add this here? It doesn't seem to be used.
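The "8-to-16 comparison path" idea in the review can be sketched as follows. This is an assumption-laden illustration, not the CacheHashKey code: `HashWord` and `WordsEqual` are hypothetical helpers, and the FNV-1a-style mixing is simply a convenient choice for the example. The essential property is that hashing and comparison operate on code-unit values, so an 8-bit word and the same word stored as 16-bit text map to the same key without converting either buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hash one code unit at a time, so the storage width of the text
// (uint8_t or char16_t) does not affect the result.
template <typename CharT>
uint32_t HashWord(const CharT* text, std::size_t length) {
    uint32_t hash = 2166136261u;  // FNV-1a-style offset basis (sketch only)
    for (std::size_t i = 0; i < length; ++i) {
        hash ^= static_cast<uint32_t>(text[i]);
        hash *= 16777619u;
    }
    return hash;
}

// Compare two words of possibly different storage widths, unit by unit,
// without converting either buffer.
template <typename A, typename B>
bool WordsEqual(const A* a, std::size_t aLength, const B* b, std::size_t bLength) {
    if (aLength != bLength) {
        return false;
    }
    for (std::size_t i = 0; i < aLength; ++i) {
        if (static_cast<uint32_t>(a[i]) != static_cast<uint32_t>(b[i])) {
            return false;
        }
    }
    return true;
}
```

The templates are intended for unsigned code-unit types such as `uint8_t` and `char16_t`; a signed `char` buffer would need a cast to `unsigned char` first to avoid sign extension.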

Jonathan Kew [:jfkthame]

Assignee

Comment 27

•

14 years ago

Attached patch pt 2.1 v2 - implement gfxShapedWord caching for gfxFont instances. — Details — Splinter Review

Attachment #579286 - Attachment is obsolete: true

Attachment #579286 - Flags: review?(roc)

Attachment #585663 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 28

•

14 years ago

Attached patch pt 2.2 v2 - adapt Mac font code to work with gfxShapedWord caches. — Details — Splinter Review

Attachment #579287 - Attachment is obsolete: true

Attachment #579287 - Flags: review?(roc)

Attachment #585664 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 29

•

14 years ago

Attached patch pt 2.3 v2 - adapt Windows font code to work with gfxShapedWord caches. — Details — Splinter Review

Attachment #579288 - Attachment is obsolete: true

Attachment #579288 - Flags: review?(roc)

Attachment #585665 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 30

•

14 years ago

Attached patch pt 2.4 v2 - adapt Linux/Pango font code to work with gfxShapedWord caches. (obsolete) — Details — Splinter Review

Attachment #579289 - Attachment is obsolete: true

Attachment #579289 - Flags: review?(roc)

Attachment #585667 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 31

•

14 years ago

Attached patch pt 2.5 v2 - adapt Android/FT2 font code to work with gfxShapedWord caches. (obsolete) — Details — Splinter Review

Attachment #579290 - Attachment is obsolete: true

Attachment #579290 - Flags: review?(roc)

Attachment #585668 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 32

•

14 years ago

Attached patch pt 3 v2 - remove copy of original characters from gfxTextRun. (obsolete) — Details — Splinter Review

Attachment #579291 - Attachment is obsolete: true

Attachment #579291 - Flags: review?(roc)

Attachment #585669 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 33

•

14 years ago

Attached patch pt 4 v3 - add timed expiration of cached gfxShapedWord records — Details — Splinter Review

Attachment #579988 - Attachment is obsolete: true

Attachment #579988 - Flags: review?(roc)

Attachment #585670 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 34

•

14 years ago

Attached patch pt 5 v2 - optimize allocation of gfxTextRun objects to avoid separate allocation for CompressedGlyph records. — Details — Splinter Review

Attachment #579293 - Attachment is obsolete: true

Attachment #579293 - Flags: review?(roc)

Attachment #585671 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 35

•

14 years ago

Updated pt 2.1 as per comment #26, and further updated this and the remaining patches to adjust for the landing of Graphite and other changes on trunk.

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #26)
> Comment on attachment 579286
> pt 2.1 - implement gfxShapedWord caching for gfxFont instances.

> @@ +1730,5 @@
> > +
> > +        if (boundary) {
> > +            // word was terminated by a space: add that to the textrun
> > +            if (!aTextRun->SetSpaceGlyphIfSimple(this, aContext,
> > +                                                 aRunStart + i))
>
> Better add a test font for non-simple space glyphs, since otherwise I bet
> the following code would never be tested.

Actually, it is tested already by existing crashtests: we have some tests that use huge font sizes, and in this case the advance of the space glyph will not fit within the available number of bits of a "simple" glyph record, and so SetSpaceGlyphIfSimple bails and we use the fallback path.

> @@ +2344,5 @@
> >
> >  bool
> > +gfxFontGroup::IsInvalidChar(PRUint8 ch)
> > +{
> > +    return ((ch & 0x7f) < 0x20);
>
> What's the reasoning behind changing IsInvalidChar?

Simplicity/efficiency, reducing the number of separate comparison operations. This is a hot code path, so it seemed worth optimizing somewhat.

> I thought it was better
> to be conservative, because there might be fonts that render glyphs for some
> of the control characters.

There's no reason to render glyphs for these, given that (as per Unicode spec) they're not printable graphic characters. It's conceivable that fonts using private, non-Unicode encodings could put visible glyphs at these codepoints (and in fact that used to be done, in the old pre-Unicode world of "hacked 8-bit Indic" fonts, etc), but we simply don't support such fonts - hence some of the "bug reports" we used to get re Indian sites that required "custom" fonts - but this practice seems to be dying anyway.

If we ever wanted to implement a "show invisibles" mode that renders something visible for the various control characters, we'd need to handle that differently anyhow, as we can't rely on normal fonts having such glyphs, and we already make assumptions about glyph rendering based on standard Unicode semantics - e.g. we render &nbsp; using the font's "space" glyph, rather than its "nonbreakingspace" glyph. This is just an extension of that optimization: we know what Unicode specifies about the rendering of these characters - they're non-printing - and so we can take a shortcut.

> @@ +2764,5 @@
> > +
> > +    if (sizeof(T) == sizeof(PRUnichar) && aLength > 0) {
> > +        gfxTextRun::CompressedGlyph *glyph = aTextRun->GetCharacterGlyphs();
> > +        if (!glyph->IsSimpleGlyph()) {
> > +            glyph->SetClusterStart(true);
>
> Why is this needed? Can't we get rid of it?

I don't think so, because we expect the first character of a textrun to be marked as a cluster start; but that is not true for individual ShapedWords (it'll normally be true, of course, but there are edge cases where it isn't - usually when font fallback causes a font switch between a base letter and a diacritic).

> ::: gfx/thebes/gfxPlatform.h
> @@ +70,5 @@
> >  class gfxTextRun;
> >  class nsIURI;
> >  class nsIAtom;
> >
> > +class CompressedGlyph;
>
> Why add this here? It doesn't seem to be used.

Duh. Residue of older version.
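To make the effect of the quoted 8-bit check concrete, here is a small worked sketch. `IsInvalidChar8` reproduces the expression from the hunk above; `IsInvalidChar8Ranges` is a hypothetical spelled-out equivalent added only for comparison. Masking off the top bit folds the C1 control range (0x80-0x9F) onto the C0 range (0x00-0x1F), so one comparison covers both.

```cpp
#include <cstdint>

// The 8-bit check quoted in the review: a single mask and compare.
constexpr bool IsInvalidChar8(uint8_t ch) {
    return (ch & 0x7f) < 0x20;
}

// Hypothetical spelled-out form with separate range tests, for comparison.
constexpr bool IsInvalidChar8Ranges(uint8_t ch) {
    return ch < 0x20 || (ch >= 0x80 && ch < 0xa0);
}

// A few spot checks on the quoted expression.
static_assert(IsInvalidChar8(0x09), "tab is in the C0 range");
static_assert(IsInvalidChar8(0x85), "0x85 is in the C1 range");
static_assert(!IsInvalidChar8(0x20), "space is not flagged");
static_assert(!IsInvalidChar8(0x7f), "DEL is not flagged by this check");
static_assert(!IsInvalidChar8(0xa0), "no-break space is not flagged");
```

Note that tab and newline fall inside the flagged range, and DEL (0x7F) does not; whether callers treat those specially is outside what this sketch shows.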

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 36

•

14 years ago

Comment on attachment 585663 [details] [diff] [review]
pt 2.1 v2 - implement gfxShapedWord caching for gfxFont instances.

Review of attachment 585663 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.cpp
@@ +1548,5 @@
> + if (entry) {
> + return entry->mShapedWord;
> + }
> +
> + entry = mWordCache.PutEntry(key);

Oops, one thing I didn't notice before. Just use PutEntry instead of GetEntry/PutEntry. If mShapedWord is nonnull we know it was a hit.
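The single-lookup pattern suggested here can be sketched with standard containers. This is only an illustration under stated assumptions: std::unordered_map stands in for the Mozilla hashtable, and ShapedWord, WordCache and the hit counter are hypothetical names, not the classes in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a shaped word; the real class holds glyph data.
struct ShapedWord {
    std::string text;
};

class WordCache {
public:
    // One hash lookup per call: operator[] creates an empty (null) slot when
    // the key is absent, so a non-null slot means the lookup was a hit.
    ShapedWord* GetShapedWord(const std::string& key) {
        std::unique_ptr<ShapedWord>& slot = mWords[key];
        if (slot) {
            ++mHits;
            return slot.get();
        }
        // Miss: "shape" the word and store it in the slot just created.
        slot.reset(new ShapedWord{key});
        return slot.get();
    }

    int Hits() const { return mHits; }
    std::size_t Size() const { return mWords.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<ShapedWord>> mWords;
    int mHits = 0;
};
```

Calling GetShapedWord twice with the same key returns the same pointer and hashes the key once per call, rather than once for a get and again for a put on the miss path.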

Attachment #585663 - Flags: review?(roc) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 37

•

14 years ago

Comment on attachment 585664 [details] [diff] [review] pt 2.2 v2 - adapt Mac font code to work with gfxShapedWord caches. Review of attachment 585664 [details] [diff] [review]: ----------------------------------------------------------------- use mozilla::ArrayLength instead of NS_ARRAY_LENGTH.

Attachment #585664 - Flags: review?(roc) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 38

•

14 years ago

Comment on attachment 585665 [details] [diff] [review]
pt 2.3 v2 - adapt Windows font code to work with gfxShapedWord caches.

Review of attachment 585665 [details] [diff] [review]:
-----------------------------------------------------------------

Do we still need gfxGDIShaper for anything? Presumably Harfbuzz completely supersedes it? Why not get rid of gfxGDIShaper in a followup?

How do you know that the IDWriteTextAnalyzer is reusable this way?

::: gfx/thebes/gfxUniscribeShaper.cpp
@@ +322,5 @@
> gfxTextRun::DetailedGlyph *details = &detailedGlyphs[i];
> details->mGlyphID = mGlyphs[k + i];
> + details->mAdvance = mAdvances[k + i] * appUnitsPerDevUnit;
> + details->mXOffset = float(mOffsets[k + i].du) * appUnitsPerDevUnit *
> + (aShapedWord->IsRightToLeft() ? -1.0 : 1.0);

Probably worth adding GetDirection to gfxShapedWord and using it here.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 39

•

14 years ago

Comment on attachment 585667 [details] [diff] [review] pt 2.4 v2 - adapt Linux/Pango font code to work with gfxShapedWord caches. Review of attachment 585667 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/thebes/gfxPangoFonts.cpp @@ +2050,5 @@ > { > + // if this character is a join-control or the previous is a join-causer, > + // use the same font as the previous range if we can > + if (gfxFontUtils::IsJoinControl(aCh) || gfxFontUtils::IsJoinCauser(aPrevCh)) { > + if (aPrevMatchedFont && aPrevMatchedFont->HasCharacter(aCh)) { Seems like you're changing some behavior here. Would be better to separate that from the gfxShapedWord changes.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 40

•

14 years ago

Comment on attachment 585668 [details] [diff] [review] pt 2.5 v2 - adapt Android/FT2 font code to work with gfxShapedWord caches. Review of attachment 585668 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/thebes/gfxFT2Fonts.cpp @@ +459,5 @@ > > + if (IsSyntheticBold()) { > + float synBoldOffset = > + GetSyntheticBoldOffset();//FIXME * CalcXScale(aContext); > + aShapedWord->AdjustAdvancesForSyntheticBold(synBoldOffset); So how are we going to fix the FIXME?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 41

•

14 years ago

Comment on attachment 585669 [details] [diff] [review] pt 3 v2 - remove copy of original characters from gfxTextRun. Review of attachment 585669 [details] [diff] [review]: ----------------------------------------------------------------- Can we avoid having to create detailedGlyphs for tabs and newlines by overloading the glyph field of a SimpleGlyph to store the character data? We know there aren't going to be glyphs for those characters. ::: gfx/thebes/gfxFont.cpp @@ +1521,5 @@ > + PRUint8 category = gfxUnicodeProperties::GetGeneralCategory(aUSV); > + return ((category >= HB_CATEGORY_COMBINING_MARK && > + category <= HB_CATEGORY_NON_SPACING_MARK) || > + (aUSV >= 0x200c && aUSV <= 0x200d) || // ZWJ, ZWNJ > + (aUSV >= 0xff9e && aUSV <= 0xff9f)); Comment this range

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 42

•

14 years ago

In particular we could use a bit to represent "is this a control or space character" instead of "is this a space character", encode the character code in the glyph field, and if we want to support rendering non-blank space glyphs we can check for a space in gfxFont::Draw and use the font's cached space glyph if necessary.

Jonathan Kew [:jfkthame]

Assignee

Comment 43

•

14 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38) > Comment on attachment 585665 [details] [diff] [review] > pt 2.3 v2 - adapt Windows font code to work with gfxShapedWord caches. > Do we still need gfxGDIShaper for anything? Presumably Harfbuzz completely > supercedes it? Why not get rid of gfxGDIShaper in a followup? At present, we don't attempt to use harfbuzz for fonts that are not sfnt-based (truetype/opentype) - i.e. legacy formats such as .fon bitmaps or .pfb type1 fonts - we just send them through the old platform shaping code. We might need to do some work on the callbacks harfbuzz uses to access font data, to ensure they work properly with non-sfnt fonts. So I don't think we're in a position to do this right now, but could be worth doing as a followup. > How do you know that the IDWriteTextAnalyzer is reusable this way? Because it works? :) No, seriously, it's OK because this is just a stateless interface object, it just exists to provide access to the various methods - in effect, it's like a namespace for a bunch of global functions that we need to access. It doesn't have instance-specific state that would mean we need separate instances for each use. "State" is maintained by the client in the various structures and arrays that are passed to and from the IDWriteTextAnalyzer methods.

Jonathan Kew [:jfkthame]

Assignee

Comment 44

•

14 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #39) > Comment on attachment 585667 [details] [diff] [review] > pt 2.4 v2 - adapt Linux/Pango font code to work with gfxShapedWord caches. > ::: gfx/thebes/gfxPangoFonts.cpp > @@ +2050,5 @@ > > { > > + // if this character is a join-control or the previous is a join-causer, > > + // use the same font as the previous range if we can > > + if (gfxFontUtils::IsJoinControl(aCh) || gfxFontUtils::IsJoinCauser(aPrevCh)) { > > + if (aPrevMatchedFont && aPrevMatchedFont->HasCharacter(aCh)) { > > Seems like you're changing some behavior here. Would be better to separate > that from the gfxShapedWord changes. The main change this introduces is that format-controls which aren't supported in a particular font will now fall back to a font that does support them - which is the same thing we do elsewhere. (I.e., it's changing to bring it more into line with the code in gfxFont.cpp that we use on other platforms.) I think I probably had a specific reason for doing this, but I can't recall offhand - I could try without it, and see whether something breaks.

Jonathan Kew [:jfkthame]

Assignee

Comment 45

•

14 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #40) > Comment on attachment 585668 [details] [diff] [review] > pt 2.5 v2 - adapt Android/FT2 font code to work with gfxShapedWord caches. > ::: gfx/thebes/gfxFT2Fonts.cpp > @@ +459,5 @@ > > > > + if (IsSyntheticBold()) { > > + float synBoldOffset = > > + GetSyntheticBoldOffset();//FIXME * CalcXScale(aContext); > > + aShapedWord->AdjustAdvancesForSyntheticBold(synBoldOffset); > > So how are we going to fix the FIXME? Oops! (I'm surprised that didn't result in any test failures.... should try to create a test that covers it.) The fix will be to make CalcXScale a method on gfxFont instead of just a static function in gfxFont.cpp, so that the gfxFT2Font subclass can also use it.

Jonathan Kew [:jfkthame]

Assignee

Comment 46

•

14 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #41) > Comment on attachment 585669 [details] [diff] [review] > pt 3 v2 - remove copy of original characters from gfxTextRun. > Can we avoid having to create detailedGlyphs for tabs and newlines by > overloading the glyph field of a SimpleGlyph to store the character data? We > know there aren't going to be glyphs for those characters. I don't think this is needed, and it would probably hurt drawing performance as it would make spaces more expensive to process - we'd no longer be able to treat them just like all the other simple glyphs in a typical textrun. Note that for tabs and newlines, the CompressedGlyph record will *not* normally be a simple glyph, it will be a blank missing glyph record, and so we won't need to "convert" a simple glyph to a DetailedGlyph, we'll just set the relevant flag within the CompressedGlyph, but its glyph count will remain zero and no DetailedGlyph will be allocated. The code in gfxTextRun::SetIs{Tab,Newline} to convert a simple glyph record to a detailed glyph is there as a precaution, to avoid the risk that calling these methods could result in garbled records, but should rarely if ever be needed in practice.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 47

•

14 years ago

Attachment #585665 - Flags: review?(roc) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 48

•

14 years ago

Comment on attachment 585669 [details] [diff] [review]
pt 3 v2 - remove copy of original characters from gfxTextRun.

Review of attachment 585669 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.h
@@ +1717,5 @@
> (FLAG_NOT_LIGATURE_GROUP_START | FLAG_NOT_MISSING);
> }
>
> + bool CharIsSpace() const {
> + return (mValue & FLAG_CHAR_IS_SPACE) != 0;

Document that this is a breakable/trimmable space (0x20) only.

@@ +2543,5 @@
> // the font shaper and go through the shaped-word cache for most spaces.
> //
> + // The parameter aTrimmableSpace is set to true for "normal" space
> + // characters, false if this was a no-break space (which should not
> + // be trimmed if it falls at a run end).

Boolean parameters suck. Here you could just pass the PRUnichar character code, which would be a little clearer.

::: layout/generic/nsTextFrameThebes.cpp
@@ +849,1 @@
> bool aSuppressSink);

Let's make these bools into a flags word.

@@ +1372,5 @@
> + if (bufferSize < mMaxTextLength || bufferSize == PR_UINT32_MAX ||
> + !buffer.AppendElements(bufferSize)) {
> + return;
> + }
> + SetupLineBreakerContext(textRun, buffer.Elements());

Why is the buffer a parameter to SetupLineBreakerContext? Can't SetupLineBreakerContext set up the buffer itself?

@@ +2018,5 @@
> +// So it does the same walk over the mMappedFlows, but doesn't actually
> +// build a new textrun.
> +void
> +BuildTextRunsScanner::SetupLineBreakerContext(gfxTextRun *aTextRun,
> + void *aTextBuffer)

Hmm ... if the textrun contained a list of shaped words, couldn't we get the text from the words? Might be simpler and faster than this.

If we end up not copying glyph data into the textrun and reference it via words instead, then we'd be getting the text via those words as well.

Having an intermediate state where we copy the glyph data and keep a list of the words seems OK since it should still be an improvement --- or at least not much worse than --- keeping the glyph data and the text in most cases.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 49

•

14 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #44) > The main change this introduces is that format-controls which aren't > supported in a particular font will now fall back to a font that does > support them - which is the same thing we do elsewhere. (I.e., it's changing > to bring it more into line with the code in gfxFont.cpp that we use on other > platforms.) Even if we take this change in this bug, it would still be good to have it in a separate patch. (In reply to Jonathan Kew (:jfkthame) from comment #45) > The fix will be to make CalcXScale a method on gfxFont instead of just a > static function in gfxFont.cpp, so that the gfxFT2Font subclass can also use > it. So you're going to rework that patch to do that, right? (In reply to Jonathan Kew (:jfkthame) from comment #46) > Note that for tabs and newlines, the CompressedGlyph record will *not* > normally be a simple glyph, it will be a blank missing glyph record, and so > we won't need to "convert" a simple glyph to a DetailedGlyph, we'll just set > the relevant flag within the CompressedGlyph, but its glyph count will > remain zero and no DetailedGlyph will be allocated. I didn't think of that. Great.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 50

•

14 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #48) > Hmm ... if the textrun contained a list of shaped words, couldn't we get the > text from the words? Might be simpler and faster than this. Oh, you can't do that because you're expiring the words...

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 51

•

14 years ago

Comment on attachment 585670 [details] [diff] [review] pt 4 v3 - add timed expiration of cached gfxShapedWord records Review of attachment 585670 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/thebes/gfxFont.cpp @@ +1147,5 @@ > + if (!aEntry->mShapedWord) { > + NS_ASSERTION(aEntry->mShapedWord, "cache entry has no gfxShapedWord!"); > + return PL_DHASH_REMOVE; > + } > + if (aEntry->mShapedWord->IncrementAge() == 3) { Make this a constant somewhere instead of a magic number.
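A minimal sketch of the aging scheme with the limit pulled out into a named constant, as requested. The constant name, the CachedWord struct, and the assumption that a word's age is reset whenever it is used are all hypothetical; only the "expire when the incremented age reaches 3" rule comes from the quoted hunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical name for the value that appears as a bare 3 in the hunk.
constexpr std::uint32_t kShapedWordMaxAge = 3;

// Hypothetical cache record; the real entry owns a gfxShapedWord.
struct CachedWord {
    std::uint32_t age = 0;
};

using WordCache = std::unordered_map<std::string, CachedWord>;

// Assumption: using a word (inserting or hitting it) resets its age.
inline void MarkUsed(WordCache& cache, const std::string& word) {
    cache[word].age = 0;
}

// Called on each expiration tick: age every entry and drop those whose
// age reaches the limit. Returns the number of entries removed.
inline std::size_t AgeAndExpire(WordCache& cache) {
    std::size_t removed = 0;
    for (auto it = cache.begin(); it != cache.end();) {
        if (++it->second.age == kShapedWordMaxAge) {
            it = cache.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

With this arrangement a word that is not touched for three consecutive ticks is discarded, while a word used between ticks keeps restarting from age zero.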

Attachment #585670 - Flags: review?(roc) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 52

•

14 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #50) > (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #48) > > Hmm ... if the textrun contained a list of shaped words, couldn't we get the > > text from the words? Might be simpler and faster than this. > > Oh, you can't do that because you're expiring the words... Although refcounting the shaped words would be no problem.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

14 years ago

Attachment #585671 - Flags: review?(roc) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

14 years ago

Attachment #579294 - Flags: review?(roc) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

14 years ago

Attachment #579295 - Flags: review?(roc) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 53

•

14 years ago

Attached patch pt 2.5 v3 - adapt Android/FT2 font code to work with gfxShapedWord caches. — Details — Splinter Review

This addresses the FIXME that I'd forgotten earlier.

Attachment #585668 - Attachment is obsolete: true

Attachment #585668 - Flags: review?(roc)

Attachment #585988 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 54

•

14 years ago

Attached patch pt 2.4 v3 - adapt Linux/Pango font code to work with gfxShapedWord caches. — Details — Splinter Review

Split the gfxPangoFontGroup::FindFontForChar modification out of this patch as requested.

Attachment #585667 - Attachment is obsolete: true

Attachment #585667 - Flags: review?(roc)

Attachment #585989 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 55

•

14 years ago

Attached patch pt 2.4.1 - make gfxPangoFontGroup font-matching behavior more similar to generic gfxFontGroup version. — Details — Splinter Review

This modifies the initial part of gfxPangoFontGroup::FindFontForChar, where the code is aiming to propagate a previously-matched font to the current character in certain cases, to mimic the implementation in gfxFontGroup as used on other platforms. The significant behavior change this introduces, which is needed in this bug, is that space characters will no longer inherit the font of a preceding character that used fallback, but will revert to the font-group's primary font instead. The old code allowed fallback to extend across spaces, and had a comment to the effect that this didn't matter as the font chosen for the space would be ignored anyway, but that is no longer true in the gfxShapedWord-based version of textrun construction where the intervening word spaces are not included in the shaping process but handled separately. Specifically, without this change we get a handful of reftest failures in cases where we have <pre> or monospaced text that contains mixed English and Hebrew, and the space characters of the English (primary) font and Hebrew (fallback) fonts chosen have different widths.

Attachment #585991 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

14 years ago

Attachment #585988 - Flags: review?(roc) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

14 years ago

Attachment #585989 - Flags: review?(roc) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

14 years ago

Attachment #585991 - Flags: review?(roc) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 56

•

14 years ago

Attached patch pt 3 v3 - remove copy of original characters from gfxTextRun. — Details — Splinter Review

Attachment #585669 - Attachment is obsolete: true

Attachment #585669 - Flags: review?(roc)

Attachment #586018 - Flags: review?(roc)

Jonathan Kew [:jfkthame]

Assignee

Comment 57

•

14 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #48)
> Comment on attachment 585669 [details] [diff] [review]
> pt 3 v2 - remove copy of original characters from gfxTextRun.

> Boolean parameters suck. Here you could just pass the PRUnichar character
> code, which would be a little clearer.

OK, makes sense.

> ::: layout/generic/nsTextFrameThebes.cpp
> @@ +849,1 @@
> > bool aSuppressSink);
>
> Let's make these bools into a flags word.

Done.

> @@ +1372,5 @@
> > + if (bufferSize < mMaxTextLength || bufferSize == PR_UINT32_MAX ||
> > + !buffer.AppendElements(bufferSize)) {
> > + return;
> > + }
> > + SetupLineBreakerContext(textRun, buffer.Elements());
>
> Why is the buffer a parameter to SetupLineBreakerContext? Can't
> SetupLineBreakerContext set up the buffer itself?

Yes, that's reasonable - I was just copying the pattern used with BuildTextRunForFrames, but there's no reason for that.

Actually, I think we should do a followup to review the existing buffer allocations in this code, and switch to fallible arrays in some cases where we're allocating space for a potentially huge string of text, and the code includes checks for failure - it was clearly written assuming fallible arrays, but we've since changed the default behavior. The buffer passed to BuildTextRunForFrames, for example, should be allocated fallibly, as should the temporary buffer used when we need to "expand" 8- to 16-bit text. Basically, wherever we're doing "nsAutoTArray<T,BIG_TEXT_NODE_SIZE>", we probably want FallibleAutoTArray.

> @@ +2018,5 @@
> > +// So it does the same walk over the mMappedFlows, but doesn't actually
> > +// build a new textrun.
> > +void
> > +BuildTextRunsScanner::SetupLineBreakerContext(gfxTextRun *aTextRun,
> > + void *aTextBuffer)
>
> Hmm ... if the textrun contained a list of shaped words, couldn't we get the
> text from the words? Might be simpler and faster than this.
We could, though it probably wouldn't be significantly faster, as we'll be needing to work with a mapping between textrun offsets and the corresponding words. I'd prefer not to start on that within this bug, however. The intention is to try restructuring gfxTextRun to have a list of references to shaped words instead of its current mCharacterGlyphs array, but I want to do that in a separate bug, as I'm not confident of how performance will work out - iterating over the glyphs (e.g. for drawing) will become slightly more complex than with the existing flat array of CompressedGlyphs in the textrun itself. OTOH, that model should allow us to optimize construction better (no copying of glyph data), and may be friendlier towards off-main-thread shaping. But it seems sufficiently risky/unknown that it should be clearly separate, and I don't want to start adding a list of shaped words to the textrun at this stage, as by itself that would be a net loss (more memory, effort to maintain the offset-to-word mapping, refcount the words, etc). If that followup does work out well, we can then drop the extra character-identifying flags that are being stored in the CompressedGlyph here, as we'll have access to the actual character codes again. But in the meantime I think this is the simplest and safest way to maintain the data we need.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

14 years ago

Attachment #586018 - Flags: review?(roc) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 58

•

14 years ago

Target Milestone: --- → mozilla12

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 59

•

14 years ago

I had an idea about storing lists of words in textruns which I'll write down here for posterity... Something like this:

Store three arrays in each textrun:
1) The "word list array", an array of pairs, one for each word that occurs (nsRefPtr<gfxShapedWord> word, PRUint32 character offset of the start of the word in the textrun)
2) The "base word index array", an array of (#characters + 127)/128 PRUint32 indexes into the word list array. The value at index i is the index of the first word that includes a character with offset >= i*128.
3) The "word index array", an array of #characters bytes. The value at index i is either some magic value (e.g. 0xFF) to indicate character i is a space, or another value which is added to the "base word index array"[i/128] to yield the index into the "word list array" for the word containing character i.

This is
a) compact (slightly over 1 byte per character when words are very long, 9 bytes per character for the pessimal case of 1-character words) and
b) fairly efficient (finding the word containing a given character, and the offset of that character within the word, is three array lookups and a conditional branch on whether it's a space).
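The three-array scheme described above can be sketched as follows. This is a rough illustration, not code from any patch: std::string stands in for nsRefPtr<gfxShapedWord>, words are assumed to be delimited only by U+0020, and every type and function name here is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::uint8_t kSpaceMarker = 0xFF;  // "magic value" for a space
constexpr std::size_t kBlockSize = 128;      // characters per base entry

struct WordEntry {
    std::string word;     // stand-in for the shaped word
    std::uint32_t start;  // character offset of the word's first character
};

struct WordIndex {
    std::vector<WordEntry> words;              // 1) word list array
    std::vector<std::uint32_t> baseWordIndex;  // 2) one entry per 128 chars
    std::vector<std::uint8_t> wordIndex;       // 3) one byte per character
};

inline WordIndex BuildWordIndex(const std::string& text) {
    constexpr std::uint32_t kNoWord = 0xFFFFFFFFu;
    WordIndex idx;
    const std::size_t n = text.size();
    // Pass 1: split on spaces, recording each character's absolute word index.
    std::vector<std::uint32_t> absolute(n, kNoWord);
    for (std::size_t i = 0; i < n;) {
        if (text[i] == ' ') { ++i; continue; }
        std::size_t end = text.find(' ', i);
        if (end == std::string::npos) end = n;
        const std::uint32_t w = static_cast<std::uint32_t>(idx.words.size());
        idx.words.push_back({text.substr(i, end - i), static_cast<std::uint32_t>(i)});
        for (std::size_t j = i; j < end; ++j) absolute[j] = w;
        i = end;
    }
    // Pass 2: for each block, the first word with a character at or after
    // the block start (a simple scan; fine for a sketch).
    idx.baseWordIndex.assign((n + kBlockSize - 1) / kBlockSize,
                             static_cast<std::uint32_t>(idx.words.size()));
    for (std::size_t b = 0; b < idx.baseWordIndex.size(); ++b) {
        for (std::size_t j = b * kBlockSize; j < n; ++j) {
            if (absolute[j] != kNoWord) { idx.baseWordIndex[b] = absolute[j]; break; }
        }
    }
    // Pass 3: store each character's word index as a small delta from its base.
    idx.wordIndex.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        idx.wordIndex[i] = (absolute[i] == kNoWord)
            ? kSpaceMarker
            : static_cast<std::uint8_t>(absolute[i] - idx.baseWordIndex[i / kBlockSize]);
    }
    return idx;
}

struct WordHit {
    const WordEntry* entry;      // nullptr when the character is a space
    std::uint32_t offsetInWord;  // offset of the character within the word
};

// Three array lookups and one branch, as described in the comment.
inline WordHit FindWord(const WordIndex& idx, std::uint32_t charOffset) {
    const std::uint8_t delta = idx.wordIndex[charOffset];
    if (delta == kSpaceMarker) return {nullptr, 0};
    const WordEntry& e = idx.words[idx.baseWordIndex[charOffset / kBlockSize] + delta];
    return {&e, charOffset - e.start};
}
```

Because words within a 128-character block are separated by at least one space, a block can touch at most 64 words, so the per-character delta always fits in a byte without colliding with the 0xFF marker.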

Jonathan Kew [:jfkthame]

Assignee

Comment 60

•

14 years ago

Filed followup bug 715473 on re-working the structure of textruns, and bug 715471 on using fallible allocation where appropriate in the textframe code.

Blocks: 715473

Jonathan Kew [:jfkthame]

Assignee

Comment 61

•

14 years ago

Pushed a followup to fix an intermittent crash (see https://tbpl.mozilla.org/php/getParsedLog.php?id=8345059&tree=Mozilla-Inbound) because the text passed to gfxFont::GetShapedWord() is not (usually) null-terminated, so we need to wrap it in an nsDependentCString when passing it to AppendASCIItoUTF16, to avoid reading beyond the end of what's valid: https://hg.mozilla.org/integration/mozilla-inbound/rev/26d7324c8d37
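The underlying hazard is generic: a character buffer that carries an explicit length must not be handed to an API that scans for a terminating null. A minimal sketch of the safe pattern using standard types follows; the function names are hypothetical analogues, not the Mozilla string API named in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical analogue of an "append ASCII to UTF-16" helper that takes an
// explicit length and therefore never reads past the valid characters.
inline void AppendAsciiToUtf16(const char* text, std::size_t length,
                               std::u16string& dest) {
    for (std::size_t i = 0; i < length; ++i) {
        dest.push_back(static_cast<char16_t>(static_cast<unsigned char>(text[i])));
    }
}

// Widens exactly `length` characters of a buffer that need not be
// null-terminated.
inline std::u16string WidenWord(const char* text, std::size_t length) {
    std::u16string out;
    AppendAsciiToUtf16(text, length, out);
    return out;
}
```

Passing the pointer and length together plays the same role as wrapping the text in a length-carrying string object before widening it.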

Jeff Muizelaar [:jrmuizel]

Comment 62

•

14 years ago

This seems to have caused a large performance regression of svg opacity.

Regression :( SVG, Opacity increase 79.4% on Linux x64 Mozilla-Inbound
----------------------------------------------------------------------
    Previous: avg 36.400 stddev 1.094 of 30 runs up to revision 3c970a5c173c
    New     : avg 65.300 stddev 0.274 of 5 runs since revision c0b62edd2917
    Change  : +28.900 (79.4% / z=26.420)
    Graph   : http://mzl.la/wd9hQZ

Jonathan Kew [:jfkthame]

Assignee

Comment 63

•

14 years ago

Which makes no apparent sense, as Tsvg-opacity doesn't even involve text, AFAICS. Also strange that it should affect Linux only. I'm beginning some investigation locally to try and understand what's going on here, but will be away from my main Linux machine over the weekend so it may be a few days before we have answers to this.

Marco Bonardo [:mak]

Comment 64

•

14 years ago

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Jeff Muizelaar [:jrmuizel]

Comment 65

•

14 years ago

Clang warns about this expression:
  if (!aFinish && mNumGlyphs < GLYPH_BUFFER_SIZE || !mNumGlyphs) {
where it's not clear if it should be:
  if ((!aFinish && mNumGlyphs < GLYPH_BUFFER_SIZE) || !mNumGlyphs) {
or:
  if (!aFinish && (mNumGlyphs < GLYPH_BUFFER_SIZE || !mNumGlyphs)) {

Can you add the parentheses to make it clear which one you mean?

Alice0775 White

Updated

•

14 years ago

Depends on: 716229

Jonathan Kew [:jfkthame]

Assignee

Comment 66

•

14 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #65)
> Clang warns about this expression:
> if (!aFinish && mNumGlyphs < GLYPH_BUFFER_SIZE || !mNumGlyphs) {
> where it's not clear if it should be:
> if ((!aFinish && mNumGlyphs < GLYPH_BUFFER_SIZE) || !mNumGlyphs) {
> or:
> if (!aFinish && (mNumGlyphs < GLYPH_BUFFER_SIZE || !mNumGlyphs)) {
>
> Can you add the parentheses to make it clear which one you mean?

I think you may have added this comment to the wrong bug by mistake? I don't believe I wrote anything like that in these patches....

Jeff Muizelaar [:jrmuizel]

Comment 67

•

14 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #66) > I think you may have added this comment to the wrong bug by mistake? I don't > believe I wrote anything like that in these patches.... Quite so.

Jonathan Kew [:jfkthame]

Assignee

Comment 68

•

14 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #63) > Which makes no apparent sense, as Tsvg-opacity doesn't even involve text, > AFAICS. > > Also strange that it should affect Linux only. > > I'm beginning some investigation locally to try and understand what's going > on here, but will be away from my main Linux machine over the weekend so it > may be a few days before we have answers to this. I've been trying to investigate this locally with Linux Opt builds from before/after this bug landed, but have not been able to reproduce the apparent regression either using a simple page-load test with the Tsvg-opacity testcases or by running standalone talos.

Jonathan Kew [:jfkthame]

Assignee

Comment 69

•

14 years ago

I still don't understand this, but from the tryserver results at https://tbpl.mozilla.org/?tree=Try&rev=1a04d12654c0, it looks as though bug 716229 may help with the mysterious Tsvg-opacity regression on Linux. Let's see what talos says once that has landed.

Karl Tomlinson (:karlt)

Comment 70

•

14 years ago

Comment on attachment 585989 [details] [diff] [review] pt 2.4 v3 - adapt Linux/Pango font code to work with gfxShapedWord caches. >- pango_break(aUTF8, aUTF8Length, aAnalysis, >- buffer.Elements(), buffer.Length()); > pango_shape(text, len, aAnalysis, glyphString); >- SetupClusterBoundaries(aTextRun, text, len, utf16Offset, aAnalysis); >- SetGlyphs(aTextRun, text, len, &utf16Offset, glyphString, >- aOverrideSpaceWidth); >+ SetGlyphs(aShapedWord, text, len, &utf16Offset, glyphString, >+ aOverrideSpaceWidth, aFont); Did you decide we don't need or want to use the PangoEngineLang for cursor positions in indic scripts (bug 617203 comment 5)? >- cairo_scaled_font_t *cairoFont = CreateScaledFont(renderPattern, face); >- >- nsRefPtr<gfxFcFont> font = static_cast<gfxFcFont*> >- (cairo_scaled_font_get_user_data(cairoFont, &sGfxFontKey)); >- >+ gfxFontStyle style(*aFontStyle); >+ style.size = GetPixelSize(renderPattern); >+ style.style = gfxFontconfigUtils::GetThebesStyle(renderPattern); >+ style.weight = gfxFontconfigUtils::GetThebesWeight(renderPattern); >+ >+ nsRefPtr<gfxFont> font = gfxFontCache::GetCache()->Lookup(fe, &style); > if (!font) { >- gfxFloat size = GetPixelSize(renderPattern); >- >- // Shouldn't actually need to take too much care about the correct >- // name or style, as size is the only thing expected to be important. >- PRUint8 style = gfxFontconfigUtils::GetThebesStyle(renderPattern); >- PRUint16 weight = gfxFontconfigUtils::GetThebesWeight(renderPattern); >- >- // The LangSet in the FcPattern does not have an order so there is no >- // one particular language to choose and converting the set to a >- // string through FcNameUnparse() is more trouble than it's worth. >- nsIAtom *language = gfxAtoms::en; // TODO: get the correct language? >- // FIXME: Pass a real stretch based on renderPattern! 
>- gfxFontStyle fontStyle(style, weight, NS_FONT_STRETCH_NORMAL, >- size, language, 0.0, >- true, false, >- NS_LITERAL_STRING(""), >- NS_LITERAL_STRING("")); >- > // Note that a file/index pair (or FT_Face) and the gfxFontStyle are > // not necessarily enough to provide a key that will describe a unique > // font. cairoFont contains information from renderPattern, which is a > // fully resolved pattern from FcFontRenderPrepare. > // FcFontRenderPrepare takes the requested pattern and the face > // pattern as input and can modify elements of the resulting pattern > // that affect rendering but are not included in the gfxFontStyle. >- font = new gfxFcFont(cairoFont, fe, &fontStyle); >+ cairo_scaled_font_t *cairoFont = CreateScaledFont(renderPattern, face); >+ font = new gfxFcFont(cairoFont, fe, &style); >+ gfxFontCache::GetCache()->AddNew(font); >+ cairo_scaled_font_destroy(cairoFont); Can you explain the reason for this change, please?

Karl Tomlinson (:karlt)

Updated

•

14 years ago

Blocks: 614476

Nicholas Nethercote [inactive]

Comment 71

•

14 years ago

jkew, can you briefly summarize the memory wins in this bug?

Jonathan Kew [:jfkthame]

Assignee

Updated

•

14 years ago

Depends on: 717175

Jonathan Kew [:jfkthame]

Assignee

Comment 72

•

14 years ago

(In reply to Karl Tomlinson (:karlt) from comment #70) > Did you decide we don't need or want to use the PangoEngineLang for cursor > positions in indic scripts (bug 617203 comment 5)? Some, at least, of the "special breaking rules" that pango implements via pango_break are specifically _not_ desired; see bug 474068 and http://bugzilla.gnome.org/show_bug.cgi?id=576156. The undesired behavior in pango has still not been fixed, AFAIK; in bug 474068 we hacked our test to ignore the problem, but it would also give incorrect first-letter behavior (for example), if we were to have Thai testcases for that. If we determine that there are specific languages/scripts for which using pango would give us _improved_ behavior, we could consider re-enabling it there, but at this point I'm not aware of cases where we need that. > Can you explain the reason for this change, please? The old code was bypassing the global gfxFontCache, which left us without any easy way to keep track of the thebes fonts in order to report their memory usage or to age or flush their ShapedWord caches.

Jonathan Kew [:jfkthame]

Assignee

Comment 73

•

14 years ago

(In reply to Nicholas Nethercote [:njn] from comment #71) > jkew, can you briefly summarize the memory wins in this bug? This bug doesn't by itself substantially affect memory usage; however, it does make it possible for us to tune caching behavior more readily, reducing churn due to discarding and then re-creating data for common words, and it allows us to flush the (potentially large) shaped-word caches on memory-pressure notifications (bug 708075). It's also the first step towards the restructuring of gfxTextRun (bug 715473), which might (if successful) be a significant memory win on text-heavy pages.

Karl Tomlinson (:karlt)

Comment 74

•

14 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #72) > The old code was bypassing the global gfxFontCache, which left us without > any easy way to keep track of the thebes fonts in order to report their > memory usage or to age or flush their ShapedWord caches. The old code did use the gfxFontCache for gfxFonts with a reference count of zero, but the method that was used to track gfxFonts may not have caught that. I gather that word caching has now moved from a FontGroup to Font, and so CSS details such as the language now need to exist on the Font and we now need to have a separate gfxFont for each language (and override). I expect there would be fontconfig corner cases where gfxFontCache::GetCache()->Lookup(fe, &style) doesn't give us the correct cairo_scaled_font, but it looks like it is working with standard configurations. I guess we now no longer have word-based FindFontForChar font-selection caching now that caching is no longer FontGroup based. I wonder what impact that has.

Alice0775 White

Updated

•

14 years ago

Depends on: 717852

Simon Montagu :smontagu

Updated

•

14 years ago

Blocks: 694205

Alice0775 White

Updated

•

14 years ago

Depends on: 726539

Scoobidiver (away)

Updated

•

14 years ago

Depends on: 728133

Daniel Holbert [:dholbert]

Updated

•

14 years ago

Depends on: 728462

Scoobidiver (away)

Updated

•

14 years ago

Depends on: 737942

Alice0775 White

Updated

•

13 years ago

Depends on: 745555

Jonathan Kew [:jfkthame]

Assignee

Updated

•

13 years ago

Depends on: 745699

Jesse Ruderman

Updated

•

13 years ago

Depends on: 751129

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Updated

•

13 years ago

Depends on: 791953

Alice0775 White

Updated

•

12 years ago

Depends on: 909264

Alice0775 White

Updated

•

12 years ago

Depends on: 909344

Alice0775 White

Updated

•

12 years ago

Depends on: 970891

Florian Quèze [:florian]

Updated

•

4 years ago

Depends on: 1736868
