Closed Bug 761442 Opened 8 years ago Closed 7 years ago

use of per-word shaping breaks contextual rules used in OpenType features

Categories

(Core :: Graphics: Text, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla23

People

(Reporter: jtd, Assigned: jtd)

References

(Depends on 1 open bug)

Details

Attachments

(10 files, 8 obsolete files)

73.03 KB, image/png
Details
13.87 KB, patch
Details | Diff | Splinter Review
74.81 KB, patch
Details | Diff | Splinter Review
9.11 KB, patch
jtd
: review+
Details | Diff | Splinter Review
54.43 KB, text/plain
Details
31.88 KB, text/plain
Details
40.93 KB, text/plain
Details
44.29 KB, text/plain
Details
23.19 KB, patch
jfkthame
: review+
Details | Diff | Splinter Review
17.91 KB, patch
jfkthame
: review+
Details | Diff | Splinter Review
When laying out text currently, we split text up at word boundaries and cache it for perf reasons.  This has the downside that it breaks some OpenType contextual features that rely on having before/after context.

I think it would make sense to disable word caching on text runs that have (1) font features defined via font-feature-settings or (2) any font-variant-xxx subproperty value that is not the default value.

Thomas Phinney from Extensis pointed out that our current word segmentation causes some features to fail in Firefox.  Disabling word caching for these cases would solve this problem but leave the default case unaffected.
Comment from Thomas Phinney, via email:

Just use maxContext from the OS/2 table to know how far to let the
context continue beyond a word boundary. The only fonts that have a
large maxContext are wacky things that are not going to be used to set
vast amounts of type. Nobody who matters is going to set a
thousand-word document in Bickham Script Pro or Beowolf or Zapfino.
But people will use them for headlines and small amounts of text, and
expect them to work the same way they do on the desktop. Besides wacky
things, this stuff gets used for things people really care about: they
want their wedding invitations and memorials to dead loved ones to
look nice.

If you watch the video [1] you'll probably see three or four cases
where breaking context at word boundaries adversely affects contextual
behavior. Plus there are another three or four cases where I carefully
avoid doing the thing that would break, or don't comment on it as some
other things are working in the same example. Two main categories:

1) Fancy script fonts that deliberately use context across words.
Example: Bickham Script Pro, type a line with the word "the" occurring
several times. In InDesign, Illustrator or Photoshop every occurrence
of "the" will be different. In Firefox... all the same. This also
affects script or randomized fonts that want to rotate through
different forms of a letter. Example: my demo has a version of Myriad
that rotates weights, and Beowolf that rotates alternate letters. The
rotation sequence would be broken if I included any word spaces, which
is why I don't in my demo.

2) Also affected is any ligature or substitution effect where the
input string is more than one word. A couple of things don't work in
my demo in the end stages with Amy Papaelias' fonts and those inspired
by them, such as changing "it is" to "c'est" and changing "fuck off"
to "go away."

[1] http://www.youtube.com/watch?v=NnMTFSeyc7g
(In reply to John Daggett (:jtd) from comment #0)
> When laying out text currently, we split text up at word boundaries and
> cache it for perf reasons.  This has the downside that it breaks some
> OpenType contextual features that rely on having before/after context.

FWIW, Gecko isn't the only environment where such features will fail, because inter-word spaces aren't treated as "normal" characters in the run of text that gets shaped by an OpenType engine. The same limitation exists in XeTeX, as TeX regards inter-word spaces as "glue" of a completely different nature to the printable characters that make up words. I've seen a comment suggesting that Opera has a similar limitation, though I haven't tested it.

> I think it would make sense to disable word caching on text runs that have
> (1) font features defined via font-feature-settings or (2) any
> font-variant-xxx subproperty value that is not the default value.

Yes, I think we could do this without an excessive performance penalty - it's likely to apply to a relatively small proportion of text, and the perf gain we get from caching is not huge, though it is a measurable difference.

Note that this might complicate the proposed restructuring of textruns in bug 715473, though that is an experimental idea with uncertain benefits - we could simply decide against pursuing it.
Another testcase:

http://people.mozilla.org/~jdaggett/tests/symbolset-test.html

Steps: open the linked testcase

Result: several of the symbols do not display correctly, the 2nd and 3rd columns should display identically.

The font includes contextual ligatures for things like "thumbs up".  Since the shaping code breaks this up into "thumbs" and "up", we lose the ability to match those contextual rules.
Summary: disable word caching when font features are explicitly enabled → use of per-word shaping breaks contextual rules used in OpenType features
Duplicate of this bug: 773154
FF trunk on the left, Chrome on the right.

Chrome gets the contextual rule handling correct but their alignment is wonky (probably yet another artifact of the plain-text/complex-text bifurcation in Webkit code)
I wouldn't suggest using maxContext from OS/2 table.  That's not enough in the presence of marks that can be skipped when matching context.

What we can do is to ask HarfBuzz to check whether the font has any contextual lookups that match the space glyph, and decide whether to use the cache.
(In reply to Behdad Esfahbod from comment #6)
> What we can do is to ask HarfBuzz to check whether the font has any
> contextual lookups that match the space glyph, and decide whether to use the
> cache.

Yes, that would be really great!
... if you can make that check really really fast :-)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #8)
> ... if you can make that check really really fast :-)

Sure.  And it's a once-per-face thing.  We'll get this done this week hopefully.
(In reply to Behdad Esfahbod from comment #6)
> I wouldn't suggest using maxContext from OS/2 table.  That's not enough in
> the presence of marks that can be skipped when matching context.

I don't quite follow, it seems like maxContext still needs to be accounted for when considering how to break up text runs for shaping.  Are you saying it should be ignored or just that the presence of marks can also influence the decision?
(In reply to John Daggett (:jtd) from comment #10)
> (In reply to Behdad Esfahbod from comment #6)
> > I wouldn't suggest using maxContext from OS/2 table.  That's not enough in
> > the presence of marks that can be skipped when matching context.
> 
> I don't quite follow, it seems like maxContext still needs to be accounted
> for when considering how to break up text runs for shaping.  Are you saying
> it should be ignored or just that the presence of marks can also influence
> the decision?

maxContext cannot be used reliably for more that one reason. [*]

If we check that the font doesn't use space glyph in any context though, you will be sure that chopping the context at word boundaries CANNOT affect shaping results.

Hope that clarifies.


* Top reasons being:

1. It's not set reliably by font tools / designers (partly because I think no one uses it, but I may be wrong).

2. OpenType lookups can skip over mark (and other) glyphs.  So, even when a lookup as two glyphs in it's lookahead context, that may match, say, 5 glyphs, if there are three combining marks involved in the sequence.
Duplicate of this bug: 802936
FWIW I have implemented features needed for this fix in HarfBuzz already.  Untested.  Waiting for Jonathan to test them.
Depends on: 825871
Depends on: 826226
With the API available in latest harfbuzz (see bug 826226) plus the ability to shape text directly into the textrun, bypassing the font's word cache (bug 825871), we can now fix this.

There is a small but measurable performance cost to bypassing the word cache, which in a "naive" implementation of this feature results in a 5% Tpn regression on Win7. (Other platforms are not so badly affected.) This is primarily because the Win7 versions of both Times New Roman and Arial (and probably a bunch of other fonts, but as defaults, these are used all over the place) include a number of kerns between letters and the space glyph. As such, harfbuzz detects the existence of lookups involving space, so we use whole-run shaping, lose the benefit of word caching, and perf suffers.

To mitigate this, I propose to make this behavior conditional on the text-rendering mode, so that it only comes into effect when text-rendering:optimizeLegibility (as opposed to optimizeSpeed) is requested, or when the font size exceeds the auto_quality_min_font_size threshold (default: 20 dev px). By doing this, we can avoid regressing performance for the vast majority of content, while making the fixed behavior available when it really matters.
Jonathan, I highly suggest that you always respect the GSUB analysis and abandon cache if space participates in GSUB, but make it dependent on optimizeLegibility for GPOS.  WDYT?
Comment on attachment 698902 [details] [diff] [review]
reftest for ligatures that span inter-word space.

Symbolset is not a free font, it can't be distributed via the Mozilla source tree:

  http://symbolset.com/license/

The reftest needs to be tweaked to use a different, freely available font with spaces used in lookups.
Attachment #698902 - Flags: review?(jdaggett) → review-
Comment on attachment 698900 [details] [diff] [review]
don't use per-word shaping with fonts that use <space> in opentype lookups.

The logic behind this looks fine but I don't like the code structure
and I don't think shaping behavior should depend on the
TEXT_OPTIMIZE_SPEED flag.

> +    uint32_t flags = aTextRun->GetFlags();
> +    if (!(flags & gfxTextRunFactory::TEXT_OPTIMIZE_SPEED) &&
> +        !UseShapedWordCache()) {
> +        return ShapeTextWithoutWordCache(aContext, aString + aRunStart,
> +                                         aRunStart, aRunLength, aRunScript,
> +                                         aTextRun);
> +    }
> +

I don't think we should be using the TEXT_OPTIMIZE_SPEED flag at all
in this case, it will lead to strange, subtle differences in behavior
when compared to other browsers when features are more widely
supported.  The behavior of our shaping code should be transparent, it
shouldn't be dependent like this on hint flags with non-standardized
meaning.

> +    bool UseShapedWordCache() {
> +        if (mUseShapedWordCacheInitialized) {
> +            return mUseShapedWordCache;
> +        }
> +        mUseShapedWordCacheInitialized = true;
> +        // we record this in the font entry so that subsequent
> +        // instantiations for the same font face won't require us
> +        // to re-check the tables
> +        gfxFontEntry *fe = GetFontEntry();
> +        if (!fe->mHasSpaceFeaturesInitialized) {
> +            fe->mHasSpaceFeaturesInitialized = true;
> +            fe->mHasSpaceFeatures = CheckForFeaturesInvolvingSpace();
> +        }
> +        mUseShapedWordCache = !fe->mHasSpaceFeatures;
> +        return mUseShapedWordCache;
> +    }

Better to have the code for 'HasSpaceFeatures' in the font entry
object.  I don't really see the need for *two* member vars here,
mUseShapedWordCache in gfxFont and mHasSpaceFeatures in gfxFontEntry.
Maintain one in the font entry and reduce UseShapedWordCache() to
simply call an inlined method HasSpaceFeatures() in the font entry. 
The code will be simpler and to effectively the same logic.
Attachment #698900 - Flags: review?(jdaggett) → review-
(In reply to Behdad Esfahbod from comment #18)
> Jonathan, I highly suggest that you always respect the GSUB analysis and
> abandon cache if space participates in GSUB, but make it dependent on
> optimizeLegibility for GPOS.  WDYT?

Could we completely disable GPOS effects involving spaces when not using optimizeLegibility? That would ensure the cache does not change rendering.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #21)

> Could we completely disable GPOS effects involving spaces when not using
> optimizeLegibility? That would ensure the cache does not change rendering.

What is the big perf win here of doing this?  It seems like something that just makes
our rendering code "different" rather than dramatically faster.  Seems like we should implement
this first without a dependency on funky perf flags, then add the flag-dependent optimization
only if it's a huge win in practice (rather than theoretically).
Comment on attachment 698901 [details] [diff] [review]
work around reftests that break due to characters kerning with space in Windows default fonts.

Clearing review flag for now, until the rendering hint issue is decided.
Attachment #698901 - Flags: review?(jdaggett)
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #21)
> (In reply to Behdad Esfahbod from comment #18)
> > Jonathan, I highly suggest that you always respect the GSUB analysis and
> > abandon cache if space participates in GSUB, but make it dependent on
> > optimizeLegibility for GPOS.  WDYT?
> 
> Could we completely disable GPOS effects involving spaces when not using
> optimizeLegibility? That would ensure the cache does not change rendering.

Well, it gets too technical defining what "GPOS effects involving spaces" means.  The easiest way to disable those is simply to not provide the space character to the shaper.  Otherwise, it's not clear what it means.  Do you disable the 'kern' feature because it involves space?
(In reply to John Daggett (:jtd) from comment #22)
> What is the big perf win here of doing this?

Jonathan explained this in comment #14:
> a "naive" implementation of this feature results in a 5% Tpn regression on Win7

We aren't taking a 5% Tp regression for the sake of correct shaping around spaces, that should be clear.
(In reply to Behdad Esfahbod from comment #24)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #21)
> > (In reply to Behdad Esfahbod from comment #18)
> > > Jonathan, I highly suggest that you always respect the GSUB analysis and
> > > abandon cache if space participates in GSUB, but make it dependent on
> > > optimizeLegibility for GPOS.  WDYT?

At first glance this seems like a good compromise. One practical issue with this, however, is that the default Windows fonts - both TNR and Arial - actually include a GSUB lookup that involves space, too. So we'd still end up disabling the word cache for those fonts, which means we'd still take the Tp hit.

This "problematic" GSUB rule involving space - there's just one occurrence - is in the <ccmp> feature of the Hebrew script. So if we were to do this analysis and decide whether to cache on a per-script basis instead of globally for the font, we'd avoid taking the hit on Latin text. But keeping track of that info will be significantly more complex, as it'd no longer be just a boolean flag in the font.

> > Could we completely disable GPOS effects involving spaces when not using
> > optimizeLegibility? That would ensure the cache does not change rendering.

We could ignore GPOS for the purpose of deciding whether to use the word cache, which AIUI is what Behdad suggested; but if -GSUB- causes us to use whole-run shaping anyway, then -GPOS- effects involving space will also take effect.
(In reply to John Daggett (:jtd) from comment #20)
> Comment on attachment 698900 [details] [diff] [review]

> > +    bool UseShapedWordCache() {
> > +        if (mUseShapedWordCacheInitialized) {
> > +            return mUseShapedWordCache;
> > +        }
> > +        mUseShapedWordCacheInitialized = true;
> > +        // we record this in the font entry so that subsequent
> > +        // instantiations for the same font face won't require us
> > +        // to re-check the tables
> > +        gfxFontEntry *fe = GetFontEntry();
> > +        if (!fe->mHasSpaceFeaturesInitialized) {
> > +            fe->mHasSpaceFeaturesInitialized = true;
> > +            fe->mHasSpaceFeatures = CheckForFeaturesInvolvingSpace();
> > +        }
> > +        mUseShapedWordCache = !fe->mHasSpaceFeatures;
> > +        return mUseShapedWordCache;
> > +    }
> 
> Better to have the code for 'HasSpaceFeatures' in the font entry
> object.

That's difficult because in the Linux case, the font entry doesn't readily have the information needed to create a harfbuzz face that we can query. In particular, it doesn't support GetFontTable. The harfbuzz shaper on Linux relies on the GetFontTable implemented in gfxFT2FontBase, so it's not available until we've actually instantiated a gfxFcFont, not from the gfxFcFontEntry.

>  I don't really see the need for *two* member vars here,
> mUseShapedWordCache in gfxFont and mHasSpaceFeatures in gfxFontEntry.
> Maintain one in the font entry and reduce UseShapedWordCache() to
> simply call an inlined method HasSpaceFeatures() in the font entry. 
> The code will be simpler and to effectively the same logic.

It's marginally more expensive to test a flag in the font entry (which is accessed via a refptr) than a field that's directly stored in the font, and this check is -very- hot. I went through a number of iterations trying to squeeze out any measurable perf hit. Maybe if we don't use bitfields for these bools in the font entry, we can get away with it that way.
Oops, I assumed symbolset was OK to use because it was used in your testcase, but I see the license allows that but not more general forms of 'distribution'. OK, here's a testcase using the OFL-licensed Ligature Symbols font (http://kudakurage.com/ligature_symbols/), slightly modified to include spaces in some of the ligature rules.
Attachment #698902 - Attachment is obsolete: true
Here's a slightly simplified patch that only stores the flags in the font entry (although the actual feature check is still done by gfxFont). Given the issue of default fonts on Windows having both GSUB and GPOS lookups that involve space, I think making this dependent on the text-rendering property is our best option; it means we can avoid regressing overall text performance with common fonts, while allowing authors who really need the feature to opt-in. Note that webkit also uses the text-rendering property (among other things) to influence whether opentype features are applied. I think this is a perfectly reasonable application of that property -- it allows authors to control whether the browser will use a more full-featured but lower-performance code path, or a faster implementation that may give slightly less sophisticated rendering (but is perfectly adequate for the vast majority of text).
Attachment #699238 - Flags: review?(jdaggett)
Attachment #698900 - Attachment is obsolete: true
<review ping>

To recap the issues here, and why I'm advocating this approach:

Because standard Windows fonts (Times New Roman, Arial, perhaps others) include features that involve space, and because these are commonly used for large amounts of text, we will suffer a substantial performance hit if we -always- disable word caching when such features exist in the font being used.

I don't think doing that is justified, given that the features involved are not critical to "adequately-good" rendering. The quality benefit that would be gained for such text is extremely marginal, and does not justify the performance cost.

The proposed solution here, making the behavior dependent on the "quality" vs "speed" modes, combined with our existing auto_quality_min_font_size threshold, means that we will automatically respect features that involve space in larger font sizes. This is likely to cover most of the use cases of more "exotic" fonts where those features are important (icons, logos, fancy display types, etc), as well as handling the fact that minor adjustments like kerning with the space may be more significant at large sizes, whereas at body-text sizes they're of marginal importance.

In addition, the existence of the text-rendering:optimizeLegibility property means that authors who choose to use fonts with exotic layout features - typically via @font-face - will be able to explicitly opt in to the behavior if they need to ensure it remains operative across the full range of sizes.
Re-ping...?
Review re-ping...? As we're starting a new cycle, it seems like a good time to make a change like this, for maximal pre-release testing time.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #25)

> > What is the big perf win here of doing this?
> 
> Jonathan explained this in comment #14:
> > a "naive" implementation of this feature results in a 5% Tpn regression on Win7
> 
> We aren't taking a 5% Tp regression for the sake of correct shaping around
> spaces, that should be clear.

I think the additional tweaks added to work around the tp perf
regression is actually just a way of gaming the tp test by reducing
the frequency this codepath gets called during the tp test.

The key problem, as the description in bug 834609 details, is that
because the tp test loads a single page multiple times and averages
out the result, text shaping performance has zero impact on tp but
slight changes in the word cache hit ratio have a big impact.  Actual
user page load times are most definitely affected by text shaping
performance.

Effectively we're removing the *right* solution from the default
codepath and telling authors they may need to set additional
properties to get something they should get by default.
Comment on attachment 699238 [details] [diff] [review]
don't use per-word shaping with fonts that use <space> in opentype lookups.

+    uint32_t flags = aTextRun->GetFlags();
+    if (BypassShapedWordCache(flags)) {
+        return ShapeTextWithoutWordCache(aContext, aString + aRunStart,
+                                         aRunStart, aRunLength, aRunScript,
+                                         aTextRun);
+    }
+

Since reflows during page loading penalize performance when we don't
put shaped text into the word cache in some form, I think for now the
right approach is to simply shape the entire text run *and* cache that
in the word cache (rather than reshaping it repeatedly).  Longer term
I think the right fix would be to cache the text runs better within
layout code but that will take more thought and effort.

[See bug 834609, comment 17 for more background on word caching
performance associated with page loads]

+    bool BypassShapedWordCache(uint32_t aRunFlags) {
+        // When creating a textrun in optimizeSpeed mode, we always use the
+        // word cache (which means we do not support features that span
+        // inter-word spaces)
+        if (aRunFlags & gfxTextRunFactory::TEXT_OPTIMIZE_SPEED) {
+            return false;
+        }

I don't think we should be using the text-rendering hint for this. 
That's an author facing property and I think the "right" result should
be default, not something that only happens under certain conditions
or when the author explicitly specifies it.  The only reason for using
the hint here is to make up for a deficiency in the current structure
of our code, one that I think we can resolve more cleanly in the
future.
Attachment #699238 - Flags: review?(jdaggett) → review-
(In reply to John Daggett (:jtd) from comment #35)
> Comment on attachment 699238 [details] [diff] [review]
> don't use per-word shaping with fonts that use <space> in opentype lookups.
> 
> +    uint32_t flags = aTextRun->GetFlags();
> +    if (BypassShapedWordCache(flags)) {
> +        return ShapeTextWithoutWordCache(aContext, aString + aRunStart,
> +                                         aRunStart, aRunLength, aRunScript,
> +                                         aTextRun);
> +    }
> +
> 
> Since reflows during page loading penalize performance when we don't
> put shaped text into the word cache in some form, I think for now the
> right approach is to simply shape the entire text run *and* cache that
> in the word cache

That seems like a misuse of the word cache, and in the case of browsing text-heavy pages with common fonts such as Times New Roman or Arial, it will bloat the "word" caches with records for all the complete textruns.

> I don't think we should be using the text-rendering hint for this. 
> That's an author facing property and I think the "right" result should
> be default, not something that only happens under certain conditions
> or when the author explicitly specifies it.

Because of the size threshold we have for automatically defaulting to "quality" mode, the "right" result would happen by default for the great majority of cases where it's likely to be of any significance. IMO, kerning with inter-word space at body-text sizes simply isn't important enough to justify giving up the benefits of word caching.
(In reply to Jonathan Kew (:jfkthame) from comment #31)

> To recap the issues here, and why I'm advocating this approach:
> 
> Because standard Windows fonts (Times New Roman, Arial, perhaps others)
> include features that involve space, and because these are commonly used for
> large amounts of text, we will suffer a substantial performance hit if we
> -always- disable word caching when such features exist in the font being
> used.

The perf penalty occurs *not* because you're shaping with spaces, it
occurs because shaped text results for these fonts aren't cached in
*any* form.  Given that layout reflows for a typical page can lay out
the same text multiple times when loading a page, it's important that
we cache the results of shaping.

The logic on the current patch is:

  if (font has spaces in contextual rules && font size >= 20px) {
    shape entire text run
    // don't cache!!
  } else {
    break into words  /* contextual rules involving spaces will be broken */
    for each word
      if (word in shaped word cache) {
        use shaped word
      } else {
        shape word
        cache shaped word
      }
  }
  
I'm suggesting we take any number of possible approaches where the
results are cached.  For example, 

  if (font has spaces in contextual rules && font size >= 20px) {
    shape entire text run
    cache the entire text run  // <== 
  } else {
    -- same as the logic above
  }

This is similar to the logic that's used with existing
Chinese/Japanese pages today, since Chinese/Japanese pages don't have
spaces to act as separators and effectively use the "shape entire text
run" path already.

Another approach would be include spaces when shaping the words and
fix up the results when adding the glyphs to the text run.

> The proposed solution here, making the behavior dependent on the
> "quality" vs "speed" modes, combined with our existing
> auto_quality_min_font_size threshold, means that we will
> automatically respect features that involve space in larger font
> sizes. This is likely to cover most of the use cases of more
> "exotic" fonts where those features are important (icons, logos,
> fancy display types, etc), as well as handling the fact that minor
> adjustments like kerning with the space may be more significant at
> large sizes, whereas at body-text sizes they're of marginal
> importance.

Positioning features such as kerning may only make a small difference
at smaller sizes but differences in substitution features where a
different glyph is used will be easy to spot even at small sizes.
Maybe you consider them exotic but that's a difficult thing for
authors to know.

The simple fact is that the testcase in comment 3 will render fine in
IE10 but not in Firefox.  And not with the approach you're suggesting
if the font size is smaller than 20px.
(In reply to Jonathan Kew (:jfkthame) from comment #36)
> (In reply to John Daggett (:jtd) from comment #35)
> > Comment on attachment 699238 [details] [diff] [review]
> > don't use per-word shaping with fonts that use <space> in opentype lookups.
> > 
> > +    uint32_t flags = aTextRun->GetFlags();
> > +    if (BypassShapedWordCache(flags)) {
> > +        return ShapeTextWithoutWordCache(aContext, aString + aRunStart,
> > +                                         aRunStart, aRunLength, aRunScript,
> > +                                         aTextRun);
> > +    }
> > +
> > 
> > Since reflows during page loading penalize performance when we don't
> > put shaped text into the word cache in some form, I think for now the
> > right approach is to simply shape the entire text run *and* cache that
> > in the word cache
> 
> That seems like a misuse of the word cache, and in the case of browsing
> text-heavy pages with common fonts such as Times New Roman or Arial, it will
> bloat the "word" caches with records for all the complete textruns.

I think it's worth evaluating this at least and not dismissing it out
of hand. This is already effectively what happens for Chinese/Japanese
pages where there aren't spaces to act as separators.

I agree that this isn't the ideal approach, ideally text runs would be
preserved longer within layout code so that the code doesn't need to
rely on the word cache when reflowing. We need to get better data on
the lifetime of text shaping results and see if there are ways to
avoid reshaping the same text.

> > I don't think we should be using the text-rendering hint for this. 
> > That's an author facing property and I think the "right" result should
> > be default, not something that only happens under certain conditions
> > or when the author explicitly specifies it.
> 
> Because of the size threshold we have for automatically defaulting to
> "quality" mode, the "right" result would happen by default for the great
> majority of cases where it's likely to be of any significance. IMO, kerning
> with inter-word space at body-text sizes simply isn't important enough to
> justify giving up the benefits of word caching.

This is only true for positioning features (e.g. kerning).  It's *not*
true for substitution features.
(In reply to John Daggett (:jtd) from comment #37)
> (In reply to Jonathan Kew (:jfkthame) from comment #31)
> 
> > To recap the issues here, and why I'm advocating this approach:
> > 
> > Because standard Windows fonts (Times New Roman, Arial, perhaps others)
> > include features that involve space, and because these are commonly used for
> > large amounts of text, we will suffer a substantial performance hit if we
> > -always- disable word caching when such features exist in the font being
> > used.
> 
> The perf penalty occurs *not* because you're shaping with spaces, it
> occurs because shaped text results for these fonts aren't cached in
> *any* form.  Given that layout reflows for a typical page can lay out
> the same text multiple times when loading a page, it's important that
> we cache the results of shaping.
> 
> The logic on the current patch is:
> 
>   if (font has spaces in contextual rules && font size >= 20px) {
>     shape entire text run
>     // don't cache!!
>   } else {
>     break into words  /* contextual rules involving spaces will be broken */
>     for each word

You're missing something here:
  if (word length exceeds limit for caching) {
    shape word into target textrun
  } else {
    ...

>       if (word in shaped word cache) {
>         use shaped word
>       } else {
>         shape word
>         cache shaped word
>       }
>   }
>   
> I'm suggesting we take any number of possible approaches where the
> results are cached.  For example, 
> 
>   if (font has spaces in contextual rules && font size >= 20px) {
>     shape entire text run
>     cache the entire text run  // <== 

This is what I don't believe we should do, due to the cache bloat it will cause.

>   } else {
>     -- same as the logic above
>   }
> 
> This is similar to the logic that's used with existing
> Chinese/Japanese pages today, since Chinese/Japanese pages don't have
> spaces to act as separators and effectively use the "shape entire text
> run" path already.

I think you're overlooking the fact that when a "word" exceeds the threshold gfxShapedWord::kMaxLength (currently 32 chars), we do -not- cache it in the font's word cache (or indeed create a shaped-word object at all - we shape it directly into the target textrun). This is so that we don't bloat the font word caches with arbitrary long strings that are relatively unlikely to occur again.

> Another approach would be include spaces when shaping the words and
> fix up the results when adding the glyphs to the text run.

Won't work in general, where features involve not just the word-boundary space but potentially multiple characters beyond it.

> 
> > The proposed solution here, making the behavior dependent on the
> > "quality" vs "speed" modes, combined with our existing
> > auto_quality_min_font_size threshold, means that we will
> > automatically respect features that involve space in larger font
> > sizes. This is likely to cover most of the use cases of more
> > "exotic" fonts where those features are important (icons, logos,
> > fancy display types, etc), as well as handling the fact that minor
> > adjustments like kerning with the space may be more significant at
> > large sizes, whereas at body-text sizes they're of marginal
> > importance.
> 
> Positioning features such as kerning may only make a small difference
> at smaller sizes but differences in substitution features where a
> different glyph is used will be easy to spot even at small sizes.
> Maybe you consider them exotic but that's a difficult thing for
> authors to know.
> 
> The simple fact is that the testcase in comment 3 will render fine in
> IE10 but not in Firefox.  And not with the approach you're suggesting
> if the font size is smaller than 20px.

My contention is that examples like this are unlikely to be widely used at body-text sizes; this is the sort of thing people care about doing at display sizes, which is why the auto_quality threshold is useful.

In the event that an author -does- want features like this at small text sizes, they'd need to know about setting the text-rendering hint explicitly. That may not be ideal, but it's usable (and I'll note that for your example to work in IE, the author already needed to know about explicitly enabling the ligature feature, as it doesn't seem to be applied by default).

I'm arguing that the proposed approach here makes things strictly better than they are at present, by enabling the features to work in the contexts where they are generally wanted -without- the risk of either regressing performance in general or bloating caches.

If shaping and layout optimizations in the future allow us to entirely eliminate the need for the text-rendering hint, that would be great. But we should treat that as a followup that is dependent on further analysis of text layout behavior, etc., rather than letting it block us from shipping a solution that will address most use-cases today.
(In reply to Jonathan Kew (:jfkthame) from comment #39)

> My contention is that examples like this are unlikely to be widely
> used at body-text sizes; this is the sort of thing people care about
> doing at display sizes, which is why the auto_quality threshold is
> useful.

I think that really depends on the font and what the type designer is
using the contextual rules for.  Especially in the case of
substitution features I don't see how you can make this claim. 
Several of the default fonts that hit the "spaces in contextual rules"
problem on Windows appear to use 'ccmp' substitutions for Arabic or
Hebrew.  The 'ccmp' feature is on by default, why would you assume
that this only applies to display sizes?

The added ripple here is that the auto_quality threshold actually is
different for folks working on Retina displays, where the default
behavior will change not at font sizes of 20px but at 10px instead. So
a designer working with a font on a Retina display will see one thing
while someone viewing it on a non-Retina display will see something
different.

Using the threshold in the case of substitution features also
introduces a *new* bug, namely that animations of text that cross that
threshold will suddenly "pop", as the substitution is enabled or
disabled.  Same thing for the author who adjusts the font size of a
given element and suddenly ends up with different looking text.

> In the event that an author -does- want features like this at small
> text sizes, they'd need to know about setting the text-rendering
> hint explicitly. That may not be ideal, but it's usable...

The 'text-rendering' property currently doesn't do much but we do have
'font-feature-settings' that does.  I would suggest that for
substitution features explicit author control should be lashed to this
instead *without* the size restriction you suggest.  Same for
positional features *except* kerning.  The CSS3 Fonts 'font-kerning'
property defaults to 'auto' (browser decides how and when to apply
kerning) but also allows authors to set kerning explicitly on/off.  So
I think we should follow that, by default ignore space kerns up to a
given font size but allow authors to override that with 'font-kerning'
or by using the 'font-feature-settings' value that enables the 'kern'
feature.  This is much more "forward compatible" with CSS3 Fonts. The
'text-rendering' property shouldn't be used here.  This way examples
that just work in IE10 will also just work in Firefox.

> ... I'll note that for your example to work in IE, the author
> already needed to know about explicitly enabling the ligature
> feature, as it doesn't seem to be applied by default).

Right but those features will be documented in the font.  But only for
Firefox, authors will need to be aware that there are certain
conditions for which a special added property (i.e. 'text-rendering')
will need to be set.  That just makes us look worse than other
browsers.
For all fonts on OSX 10.8, lists of features for each font (regular face). Several of the Microsoft C-fonts are also included.  

Script tags are bracketed with <>. Features associated with lookups that include a space are marked with @.

OSX versions of Arial/Times New Roman:

family: Arial GSUB <arab> ccmp fina init isol liga locl medi rlig <cyrl> <grek> ccmp <hebr> ccmp@ dlig <latn> ccmp locl
family: Arial GPOS <arab> mark mkmk <cyrl> kern@ mark mkmk <grek> kern@ mark mkmk <hebr> mark <latn> kern@ mark mkmk

family: Times New Roman GSUB <arab> ccmp fina init isol liga locl medi rlig <cyrl> <grek> ccmp <hebr> ccmp@ dlig <latn> ccmp locl
family: Times New Roman GPOS <arab> mark mkmk <cyrl> kern@ mark mkmk <grek> kern@ mark mkmk <hebr> mark <latn> kern@ mark mkmk

Windows 8 versions of Arial/Times New Roman:

family: win8Arial GSUB <arab> calt ccmp dlig@ fina init isol liga locl medi rlig <cyrl> c2sc dnom frac lnum numr onum pnum salt smcp ss01 ss02 ss03 tnum unic <grek> c2sc ccmp dnom frac lnum numr onum pnum salt smcp ss01 ss02 ss03 tnum unic <hebr> ccmp@ dlig <latn> c2sc ccmp dnom frac lnum locl numr onum pnum salt smcp ss01 ss02 ss03 tnum unic
family: win8Arial GPOS <arab> kern mark mkmk <cyrl> cpsp kern@ mark mkmk <grek> cpsp kern@ mark mkmk <hebr> mark <latn> cpsp kern@ mark mkmk

family: win8Times New Roman GSUB <arab> calt ccmp dlig@ fina init isol liga locl medi rlig <cyrl> c2sc dlig dnom frac lnum numr onum pnum smcp tnum <grek> c2sc ccmp dlig dnom frac lnum numr onum pnum smcp tnum <hebr> ccmp@ dlig <latn> c2sc ccmp dlig dnom frac lnum locl numr onum pnum smcp tnum
family: win8Times New Roman GPOS <arab> kern mark mkmk <cyrl> cpsp kern@ mark mkmk <grek> cpsp kern@ mark mkmk <hebr> mark <latn> cpsp kern@ mark mkmk
Union of the feature dumps above, with only those fonts containing feature rules involving spaces included and omitting those fonts for which only kern and ccmp have feature rules involving spaces.

It's interesting to point out that a number of fonts use space-containing rules for the handling fractions ('frac'), so substitution features containing spaces in their lookups are not just for exotic features used only in display sized text.
Modify the logic in BypassShapedWordCache to evaluate substitution features based on the script (so as to segregate the influence of features like ccmp) and distinguish kerning vs. non-kerning positioning features.  Additionally, explicitly enabling/disabling font-feature-settings to override the default behavior.  Only kerning will be disabled below the auto-quality value.

I still need to set up some tests and do some tryserver runs but this should allow the use of the word cache even with fonts like Win7 Arial and Times New Roman, defaults fonts with spaces in contextual lookups.

Behdad, I'm wondering if there's a more efficient way of collecting per-feature lists.  The hb_ot_layout_table_get_feature_tags function returns a list of features but with *lots* of repeats so the code here has to try and fish out individual features and I have a feeling it could be done better.
Attachment #726138 - Flags: feedback?(mozilla)
Attachment #726138 - Flags: feedback?(jfkthame)
Comment on attachment 726138 [details] [diff] [review]
patch, first pass on bypassing the word cache more selectively

John,

This is how you are supposed to access feature tags:

  - Get number of scripts and script tags the way you are getting,

  - For each script tag, get number of languages and language tags: hb_ot_layout_script_get_language_tags().  Use HB_OT_LAYOUT_DEFAULT_LANGUAGE_INDEX for the default LangSys of the script.

  - For each (script_index,language_index):

    * hb_ot_layout_language_get_required_feature_index()

    * hb_ot_layout_language_get_feature_indexes()

  - Only then you can use the get_feature_tags() array to lookup tags for feature indexes used by this script,language combination.

Does that make sense?
Attachment #726138 - Flags: feedback?(mozilla) → feedback-
Comment on attachment 726138 [details] [diff] [review]
patch, first pass on bypassing the word cache more selectively

Review of attachment 726138 [details] [diff] [review]:
-----------------------------------------------------------------

In principle, the approach here should give good behavior (see also comment 26, which mentioned the possibility of per-script tracking of space features). I'm still somewhat concerned the more complex implementation (of font introspection, but more importantly of BypassShapedWordCache(), which is very much on the critical path for text layout) will lead to a performance hit that is hard to justify - but let's see what talos thinks of it.

Clearing feedback? flag for now, as I assume you'll be reworking this in the light of Behdad's comment.

::: gfx/thebes/gfxFont.h
@@ +357,5 @@
> +    bool             mHasSpaceFeaturesPosNonKerning : 1;
> +    bool             mHasSpaceFeaturesSubDefault : 1;
> +
> +    // bitvector of substitution space features per script
> +    uint32_t         mHasSpaceFeaturesSub[4];  // size == num of MOZ_ script codes

This should be calculated from MOZ_NUM_SCRIPT_CODES, not coded as a "magic number". And then you won't need the static assert in CheckForFeaturesInvolvingSpace.
Attachment #726138 - Flags: feedback?(jfkthame)
With the original patch, I'm seeing a bunch of reftest failures on Windows 7/8:

https://tbpl.mozilla.org/?tree=Try&rev=e9b74d918d32

layout/reftests/first-letter/484400-1.html
layout/reftests/font-inflation/height-constraint-percent-1.html
layout/reftests/font-inflation/height-constraint-percent-2.html
layout/reftests/font-inflation/height-constraint-percent-3.html
layout/reftests/font-inflation/height-constraint-percent-8.html
layout/reftests/font-inflation/fixed-height-body.html
layout/reftests/font-inflation/fixed-height-body-child.html
layout/reftests/text-transform/greek-small-caps-1.html
(In reply to Jonathan Kew (:jfkthame) from comment #47)
> ... I'm still somewhat concerned the more complex implementation (of
> font introspection, but more importantly of BypassShapedWordCache(), which
> is very much on the critical path for text layout) will lead to a
> performance hit that is hard to justify - but let's see what talos thinks of
> it.

I ran through some timings of this on OSX, using the rdtsc to count cycles. I built with a local optimized build, loaded a series of URL's from the guardian, then loaded a local copy of the HTML5 spec, then quit and dumped the stats.  The average time is given with a count of how many times a given piece of code was hit.  I separated out the initial call to BypassShapedWordCache from subsequent calls for each font.

Timing within gfxFont::SplitAndInitTextRun (avg time per call, (count)):

  CheckForFeaturesInvolvingSpace 1612272.100000 clocks (    40)
  BypassShapedWordCache               24.165527 clocks (353930)
  ShapeTextWithoutWordCache        56463.173709 clocks (  9182)
  InitWordCache                       23.677077 clocks (344788)
  shaping w/ word cache            10661.229994 clocks (344788)

The CheckForFeaturesInvolvingSpace value is the cost of checking the GSUB/GPOS features.  It's expensive but only done once per font for the lifetime of the browser run.

So I don't think adding the a couple more flags to BypassShapedWordCache matters very much, it's equivalent to 0.2% of the overall cost of shaping with the word cache.
Revised based on Behdad's comments.  I also removed the no-kerning/kerning at 20px transition.  Kerning will only be applied at spaces if the author specifically enables the kern feature.  That's basically existing behavior and we can work on improving that in the future so that it can be on by default always.

Win7 try server run:

  https://tbpl.mozilla.org/?tree=Try&rev=4d226c891ae4

This seems to indicate a perf improvement over existing code so I need to look a little more closely at the talos output and make sure the results make sense.

I haven't looked at the reason for the reftest failures, that's up next.
Attachment #726138 - Attachment is obsolete: true
Analysis of perf numbers in tryserver build in the last comment:

https://docs.google.com/spreadsheet/ccc?key=0ArCKGq7OfNmMdGd0X1QzZUNYejZxelZJdlpXR2VKSXc&usp=sharing

The perf improvements appear to be across the board, still a little puzzled by this, need to do some logging with the actual testpages to see if I can figure out what the difference is.  The first run numbers do increase but not hugely so (2.6%).  That includes the analysis needed for each font which is a once-per-font-per-browser-run analysis.  Need to understand why ratio of the first run time to the median time increases so dramatically for one of the first few Chinese sites, alipay.com.  Possibly a Chinese font with a large array of GSUB/GPOS features?
What's with this:

-    bool result = hb_set_has(glyphs, aGlyph);
+    bool result = false;
+    if (!hb_set_is_empty(glyphs)) {
+        result = hb_set_has(glyphs, aGlyph);
+    }

The spacesFound variables should be renamed to glyphFound.

In this piece of code:

+    do {
+        len = NS_ARRAY_LENGTH(featureTags);
+        hb_ot_layout_language_get_feature_tags(aFace, aTableTag,
+                                               aScriptIndex, aLangIndex, 0,
+                                               &len, featureTags);
+        for (i = 0; i < len; i++) {
+            uint32_t featureIndex;
+            hb_ot_layout_language_find_feature(aFace, aTableTag,
+                                               aScriptIndex, aLangIndex,
+                                               featureTags[i], &featureIndex);
+
+            spacesFound = HasLookupRuleWithGlyphByFeature(aFace, aTableTag,
+                                                          featureIndex, aGlyph);
+            if (spacesFound) {
+                bool foundExcluded = false;
+                for (hb_tag_t *tag = aExcludeFeatures; *tag; tag++) {
+                    if (featureTags[i] == *tag) {
+                        foundExcluded = true;
+                        aHasSpacesExluded = true;
+                        break;
+                    }
+                }
+                if (!foundExcluded) {
+                    aHasSpaces = true;
+                }
+            }
+        }
+        offset += len;
+    } while (len == NS_ARRAY_LENGTH(featureTags));

hb_ot_layout_language_find_feature() linearly walks over all features in the langsys, which makes the look O(n^2) unnecessarily.  If you switch to using hb_ot_layout_language_get_feature_indexes() and hb_ot_layout_table_get_feature_tags() instead, the loop would be linear.  Only relevant to rogue fonts, but though I mention.

Also, if you have any feedback on the HarfBuzz API, I'd like to hear.
Revised based on comments above.

Note on the API: the code here is using hb_ot_layout_language_get_feature_tags to do a single index ==> tag lookup, which it didn't quite occur to me to be the way to do this initially.  The "offset" parameter is somewhat curious until you look at the implementation source and realize what it's for.  Maybe a helper method that took a single index argument with no length parameter?
Attachment #730200 - Attachment is obsolete: true
Attachment #731037 - Flags: review?(mozilla)
Attachment #731037 - Flags: review?(jfkthame)
Tryserver run with latest patch:

  https://tbpl.mozilla.org/?tree=Try&rev=bb621ee783c0

As I've said before I'm somewhat skeptical about the relevance of the tp numbers in this area of code but at least there's not an obvious regression.

  tp5o trunk:            364.75
  tp5o jkew-original:    364.51
  tp5o selective-bypass: 362.57

It would be nice to get this landed before the cutover Monday.  I'll be working on this over the weekend if there are any changes that need to be made.

I'd be happy to file a follow-up bug on "enable kerning at space boundaries by default" or something like that.
(In reply to John Daggett (:jtd) from comment #51)
> Analysis of perf numbers in tryserver build in the last comment:
> 
> https://docs.google.com/spreadsheet/
> ccc?key=0ArCKGq7OfNmMdGd0X1QzZUNYejZxelZJdlpXR2VKSXc&usp=sharing
> 
> The perf improvements appear to be across the board, still a little puzzled
> by this, need to do some logging with the actual testpages to see if I can
> figure out what the difference is.  The first run numbers do increase but
> not hugely so (2.6%).  That includes the analysis needed for each font which
> is a once-per-font-per-browser-run analysis.

But doesn't this imply there's likely to be a significant regression in the time it takes to render the first page when the browser is launched? Depending what fonts it uses, it looks like we could regress that perceived startup speed by several percent here.

Given how much emphasis there has been on startup performance, I'm not sure that'd be acceptable. Are these figures still roughly the same for the latest version of the patch, or is there a significant improvement?
Comment on attachment 731037 [details] [diff] [review]
patch, bypass word cache more selectively

Review of attachment 731037 [details] [diff] [review]:
-----------------------------------------------------------------

I've been concerned all along at the potential cost of the analysis being done here, and the significant increase in first-run times (as noted above) seems to suggest that it is indeed expensive enough to hurt us.

I think if we're going to do a detailed analysis (inspecting individual scripts and features, not just a single pass over the lookups) then we need to structure it so that we don't analyze any given lookup repeatedly.

::: gfx/thebes/gfxFont.cpp
@@ +1529,5 @@
>  }
>  
> +static bool
> +HasLookupRuleWithGlyphByFeature(hb_face_t *aFace, hb_tag_t aTableTag,
> +                                uint32_t featureIndex,

nit, "aFeatureIndex".

@@ +1560,5 @@
> +
> +static void
> +HasLookupRuleWithGlyphByLanguage(hb_face_t *aFace, hb_tag_t aTableTag,
> +                                 bool& aHasGlyph, hb_tag_t *aExcludeFeatures,
> +                                 bool& aHasGlyphExluded, uint16_t aGlyph,

s/Exluded/Excluded/ please! :)

@@ +1591,5 @@
> +
> +        for (i = 0; i < len; i++) {
> +            uint32_t featureIndex = featureIndexes[i];
> +            glyphFound = HasLookupRuleWithGlyphByFeature(aFace, aTableTag,
> +                                                         featureIndex, aGlyph);

As with the iteration over languages (see below), this will often lead to inspecting the same lookup multiple times, as it's common for multiple feature records to refer to the same lookups.

@@ +1621,5 @@
> +}
> +
> +static void
> +HasLookupRuleWithGlyph(hb_face_t *aFace, hb_tag_t aTableTag, bool& aHasGlyph,
> +                       hb_tag_t *aExcludeFeatures, bool& aHasGlyphExluded,

s/Exluded/Excluded/

@@ +1646,5 @@
> +            HasLookupRuleWithGlyphByLanguage(aFace, aTableTag, aHasGlyph,
> +                                             aExcludeFeatures,
> +                                             aHasGlyphExluded,
> +                                             aGlyph, script, lang);
> +        }

Iterating over the languages like this could get expensive, for fonts that support a number of distinct language systems. Note that it is very common for multiple language systems to end up referencing the -same- lookups for many of their features (e.g. all the 'kern' features in Times New Roman point to the same lookup; there's plenty of lookup-sharing among other features, too, both in GSUB and GPOS).

This means that we'll be iterating over the same lookup multiple times while looking for the space glyph. If it is present, that's ok - we'll stop looking once we've found it - but in the common case where it is -not- present, we'll waste time failing to find it many times. I think if we're going to do this level of detailed analysis, we need to make it as efficient as possible, and avoid this redundancy.

@@ +1716,5 @@
> +            offset += len;
> +        } while (len == NS_ARRAY_LENGTH(scriptTags));
> +    }
> +
> +    // GPOS lookups - distinguish kerning from non-kerning features

It's not clear to me why we'd handle space issues on a per-script basis for GSUB, but not have similar logic for GPOS. For example, IIRC Times New Roman has kerning that involves <space> in some scripts (Latin, Cyrillic, Greek), but not in others (Hebrew, Arabic), so those scripts wouldn't need to abandon caching due to kerning being enabled.

::: gfx/thebes/gfxFont.h
@@ +358,5 @@
> +    bool             mHasSpaceFeaturesNonKerning : 1;
> +    bool             mHasSpaceFeaturesSubDefault : 1;
> +
> +    // bitvector of substitution space features per script
> +    uint32_t         mHasSpaceFeaturesSub[(MOZ_NUM_SCRIPT_CODES >> 5) + 1];

This should be [(MOZ_NUM_SCRIPT_CODES + 31) / 32], to avoid allocating an extra element in the case where MOZ_NUM_SCRIPT_CODES is exactly divisible by 32.

@@ +1588,5 @@
> +                     "need to initialize space lookup flags");
> +        NS_ASSERTION(aRunScript < MOZ_NUM_SCRIPT_CODES, "weird script code");
> +        if (aRunScript == MOZ_SCRIPT_INVALID ||
> +            aRunScript >= MOZ_NUM_SCRIPT_CODES)
> +        {

We normally put the open-brace on the end of the preceding line.

@@ +1615,5 @@
> +        // that involve spaces, bypass
> +        if (HasSubstitutionRulesWithSpaceLookups(aRunScript) ||
> +            mFontEntry->mHasSpaceFeaturesNonKerning ||
> +            mFontEntry->mHasSpaceFeaturesSubDefault)
> +        {

ditto

@@ +1622,5 @@
> +
> +        // if kerning explicitly enabled/disabled via font-feature-settings or
> +        // font-kerning and kerning rules use spaces, only bypass when enabled
> +        if (mKerningSet && mFontEntry->mHasSpaceFeaturesKerning)
> +        {

ditto
Analysis of tryserver builds with latest pull (on Friday) from mozilla-central, with Jonathan's original patch and my additional revised patch:

https://docs.google.com/spreadsheet/ccc?key=0ArCKGq7OfNmMdGt2MGI0ZlpFNkNkdzdHNlFzU284X0E&usp=sharing

Builds:

baseline mozcentral: https://tbpl.mozilla.org/?tree=Try&rev=f3af72e71528
jkew patch: https://tbpl.mozilla.org/?tree=Try&rev=9701afa0ebc2
jd revisions patch: https://tbpl.mozilla.org/?tree=Try&rev=bb621ee783c0

This actually shows the revised patch doing better than the baseline both in terms of the tp5o value and in terms of total first load time.  The ratio of first/median is also lower, this was the value I was concerned about.  Part of this is the elimination of the O(n^2) logic that Behdad noted in comment 52.
(In reply to Jonathan Kew (:jfkthame) from comment #56)
> Comment on attachment 731037 [details] [diff] [review]
> patch, bypass word cache more selectively
> 
> I've been concerned all along at the potential cost of the analysis being
> done here, and the significant increase in first-run times (as noted above)
> seems to suggest that it is indeed expensive enough to hurt us.
> 
> I think if we're going to do a detailed analysis (inspecting individual
> scripts and features, not just a single pass over the lookups) then we need
> to structure it so that we don't analyze any given lookup repeatedly.

The logic in this patch is the same as the logic used within the routines within harfbuzz that collect the lookups.  In other words, if you look at what's happening within the lookup collection routines within harfbuzz, they are iterating over script-lang-feature in exactly the same way.  I don't think the iteration here is that different, I think the work is basically the same as the original patch, which is why the first run numbers are generally the same or lower than the original patch.  The revised patch has an early exit condition if both booleans evaluate to true, while the original patch always collects all lookups.
(In reply to John Daggett (:jtd) from comment #58)

> The logic in this patch is the same as the logic used within the routines
> within harfbuzz that collect the lookups...

Scratch that, the harfbuzz code uses sets to collect the lookups so the same lookup isn't read twice.  Working on a patch that collects the GPOS features into kerning and non-kerning sets, that should satisfy this concern.
Duplicate of this bug: 856266
Revised based on review comments.

Code now iterates over languages/features and collects the lookups into kerning and non-kerning sets in the GPOS case.  This avoids the problem of pulling glyphs for a given lookup multiple times.
Attachment #731037 - Attachment is obsolete: true
Attachment #731037 - Flags: review?(mozilla)
Attachment #731037 - Flags: review?(jfkthame)
Attachment #731798 - Flags: review?(jfkthame)
> > +            offset += len;
> > +        } while (len == NS_ARRAY_LENGTH(scriptTags));
> > +    }
> > +
> > +    // GPOS lookups - distinguish kerning from non-kerning features
> 
> It's not clear to me why we'd handle space issues on a per-script
> basis for GSUB, but not have similar logic for GPOS. For example,
> IIRC Times New Roman has kerning that involves <space> in some
> scripts (Latin, Cyrillic, Greek), but not in others (Hebrew,
> Arabic), so those scripts wouldn't need to abandon caching due to
> kerning being enabled.

It's simply a generalization of the data (see attached files for OSX/Win7/Win8 fonts).  For GSUB features, fonts with spaces in lookup rules are fairly uncommon and specific to a few select scripts in general (e.g. ccmp for Hebrew in Arial).  For GPOS features, kerning is the predominant case and in most cases
the rules with spaces exist across all scripts.  Times New Roman and Arial on 
Windows 7 do have the differences you describe but that's not the common case.

I think we could definitely consider this in the future but essentially it makes the general algorithm more complicated only to get kerning in the Arial/TNR Arabic/Hebrew case. I think it makes sense to land this patch before and tweak it once it lands.
Comment on attachment 731798 [details] [diff] [review]
patch, bypass word cache more selectively

Review of attachment 731798 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.cpp
@@ +1519,2 @@
>                                             glyphs, glyphs, glyphs,
>                                             glyphs);

It's probably worth checking hb_set_has (which should be a cheap test) after each lookup is processed, as that will let you break out of the loop early if the result turns true.

E.g. something like

  bool result = false;
  while (!result && hb_set_next(...)) {
    hb_ot_layout_lookup_collect...
    result = hb_set_has(...);
  }

@@ +1536,5 @@
> +    uint32_t i, len, offset;
> +
> +    offset = 0;
> +    do {
> +        len = NS_ARRAY_LENGTH(lookups);

I believe mozilla::ArrayLength is preferred nowadays.

@@ +1543,5 @@
> +        for (i = 0; i < len; i++) {
> +            hb_set_add(aLookups, lookups[i]);
> +        }
> +        offset += len;
> +    } while (len == NS_ARRAY_LENGTH(lookups));

mozilla::ArrayLength

@@ +1565,5 @@
> +    uint32_t i, len, offset;
> +
> +    offset = 0;
> +    do {
> +        len = NS_ARRAY_LENGTH(featureIndexes);

mozilla::ArrayLength

@@ +1583,5 @@
> +                                                   &featureTag);
> +
> +            // excluded feature?
> +            bool excluded = false;
> +            for (hb_tag_t *tag = aExcludeFeatures; *tag; tag++) {

Given that this "exclude" feature is only there for kerning (and it's unlikely, IMO, that we'll want to extend it to other things), using an array of feature tags seems like overkill.

I can understand if you don't want to hard-code the 'kern' tag here, but I'd suggest having the caller pass just a single "excluded tag", and get rid of the loop. No need for a more general mechanism at this point.

If we ever did want to handle a list of excluded features, it wouldn't be hard to replace the single tag with a list at that time, but I'm guessing we'll never need it.

@@ +1595,5 @@
> +            hb_set_t *lookups = excluded ? aExcludedFeatureLookups : aLookups;
> +            CollectLookupsByFeature(aFace, aTableTag, featureIndex, lookups);
> +        }
> +        offset += len;
> +    } while (len == NS_ARRAY_LENGTH(featureIndexes));

mozilla::ArrayLength

@@ +1636,5 @@
> +        hb_ot_layout_lookup_collect_glyphs(aFace, aTableTag, index,
> +                                           glyphs, glyphs, glyphs,
> +                                           glyphs);
> +    }
> +    aHasGlyph = hb_set_has(glyphs, aGlyph);

As in HasLookupRuleWithGlyphByScript, check hb_set_has after each lookup and exit the loop if true.

@@ +1646,5 @@
> +        hb_ot_layout_lookup_collect_glyphs(aFace, aTableTag, index,
> +                                           glyphs, glyphs, glyphs,
> +                                           glyphs);
> +    }
> +    aHasGlyphExcluded = hb_set_has(glyphs, aGlyph);

ditto

@@ +1692,5 @@
> +        hb_tag_t scriptTags[8];
> +
> +        offset = 0;
> +        do {
> +            len = NS_ARRAY_LENGTH(scriptTags);

mozilla::ArrayLength

@@ +1711,5 @@
> +                    }
> +                }
> +            }
> +            offset += len;
> +        } while (len == NS_ARRAY_LENGTH(scriptTags));

mozilla::ArrayLength
Revised based on review comments.
Attachment #731798 - Attachment is obsolete: true
Attachment #731798 - Flags: review?(jfkthame)
Attachment #734331 - Flags: review?
Minor tweak, needed to add more API calls to the symbols.def.in list to make the Windows build function.
Attachment #734331 - Attachment is obsolete: true
Attachment #734331 - Flags: review?
Attachment #734467 - Flags: review?
Comment on attachment 734467 [details] [diff] [review]
patch v5a, bypass word cache more selectively

Review of attachment 734467 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxFont.cpp
@@ +1542,5 @@
> +    do {
> +        len = ArrayLength(lookups);
> +        hb_ot_layout_feature_get_lookups(aFace, aTableTag, aFeatureIndex,
> +                                         offset, &len, lookups);
> +        for (i = 0; i < len; i++) {

Small nit - this is C++, not K&R, so IMO it's nicer to declare the loop index variable right in the for-statement rather than at the top of the function, here and in other instances.

(You've also got other variable declarations that could be done at the point of use rather than bunching them at the start of a block, but the loop indexes are the most obvious.)
Attachment #734467 - Flags: review? → review+
Comment on attachment 699238 [details] [diff] [review]
don't use per-word shaping with fonts that use <space> in opentype lookups.

Flipping this to r+, concerns addressed in separate follow-up patch
Attachment #699238 - Flags: review- → review+
Oh, I also meant to say - we really should have a reftest that tests the distinction between default kerning and explicitly-enabled kerning here. At least tests that use the standard Win7/8 fonts to confirm they behave as intended...if they pass trivially on platforms that don't have the relevant fonts, that's OK.
https://hg.mozilla.org/mozilla-central/rev/8a0073ae1a45
https://hg.mozilla.org/mozilla-central/rev/703e4668b5c8
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
(In reply to Matt Brubeck (:mbrubeck) from comment #70)
> It looks like this caused a regression in the "Trace Malloc Leaks" test:
> http://graphs.mozilla.org/graph.html#tests=[[28,63,8]]&sel=1365600372000,
> 1365773172000&displayrange=7&datatype=running

Hmmm, I see the bump in the leaks number in the Win7 build log.  Any idea if there's a more detailed logfile hanging around somewhere?
Attached patch patch, additional reftests (obsolete) — Splinter Review
This tweaks the original tests Jonathan made and adds (1) kerning tests for Arial/Times New Roman and (2) full-width feature test for Japanese fonts.  The appropriate fonts are only available on OSX or Win7+ so the tests are marked appropriately for other platforms.  Jonathan's original test uses a downloadable font so that is tested on all platforms.

tryserver reftest run:
https://tbpl.mozilla.org/?tree=Try&rev=5f9a81483a34
Attachment #737390 - Flags: review?(jfkthame)
Comment on attachment 737390 [details] [diff] [review]
patch, additional reftests

We should also have tests for the behavior where 'kern' is not explicitly set at all (either to on or off), to confirm that's doing what we expect (i.e. ignoring the kerns with <space>, but still doing kerning within words).
(In reply to Jonathan Kew (:jfkthame) from comment #73)
> Comment on attachment 737390 [details] [diff] [review]
> patch, additional reftests
> 
> We should also have tests for the behavior where 'kern' is not explicitly
> set at all (either to on or off), to confirm that's doing what we expect
> (i.e. ignoring the kerns with <space>, but still doing kerning within words).

So you mean something like <span>AWAY</span> with != default nokern as the test?
For <span>AWAY</span>, we'd expect default == kern on, and default != kern off (or equivalently, check that kern on != kern off and that default == kern on).

And for <span>Q A Q</span>, we'd expect default == kern off, and default != kern on.

Yes?

(The latter will of course need to be updated if/when we make kerning-with-space enabled by default, but we should have a test that confirms we are getting the behavior we expect right now.)
Revised to add additional kerning tests, as per comment 75
Attachment #737390 - Attachment is obsolete: true
Attachment #737390 - Flags: review?(jfkthame)
Attachment #739424 - Flags: review?(jfkthame)
Comment on attachment 739424 [details] [diff] [review]
patch, additional reftests

Review of attachment 739424 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine. One suggestion below is to invert the sense of the space-kerning tests so they "fail" with an UNEXPECTED PASS if the kerning actually does occur.

::: layout/reftests/font-features/fwid-spaces-ref.html
@@ +2,5 @@
> +<html lang="ja">
> +<head>
> +<title>Full-width variations of regular spaces</title>
> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
> +  

nit, there are a couple of stray spaces here that seem to appear in all these files.

::: layout/reftests/font-features/reftest.list
@@ +60,5 @@
> +# requires Japanese font with feature support, WinXP lacks one
> +random-if(!winWidget&&!cocoaWidget) fails-if(/^Windows\x20NT\x205\.1/.test(http.oscpu)) HTTP(..) == fwid-spaces.html fwid-spaces-ref.html
> +# Arial/Times New Roman on Win7+/OSX 10.6+ have kerning pairs that include spaces
> +random-if(!winWidget&&!cocoaWidget) random-if(/^Windows\x20NT\x205\.1/.test(http.oscpu)) HTTP(..) == kerning-spaces-arial-nokern.html kerning-spaces-arial-default.html
> +random-if(!winWidget&&!cocoaWidget) random-if(/^Windows\x20NT\x205\.1/.test(http.oscpu)) HTTP(..) != kerning-spaces-arial-kern.html kerning-spaces-arial-default.html

Actually, on second thought I think the best way to handle the kerning-spaces tests for arial and tnr would be to make the nokern test use !=, and the kern test use ==. After all, that would be the "ideal" result; the fact that currently, they have the opposite behavior is a known limitation of our implementation.

Then simply annotate them as failing on Win7+/OSX (and random elsewhere, as you already have).
Attachment #739424 - Flags: review?(jfkthame) → review+
Depends on: 903475
Depends on: 921858
Depends on: 941474
You need to log in before you can comment on or make changes to this bug.