Closed Bug 975915 Opened 6 years ago Closed 4 years ago

automatic fractions feature is applied by default in some cases

Categories

(Core :: Graphics: Text, defect)

RESOLVED INVALID

People

(Reporter: jtd, Unassigned)

Within harfbuzz code, the OpenType fractions feature is applied automatically to certain text strings. Specifically, numbers separated by a fraction-slash (u+2044) will automatically appear with the OpenType "frac" feature enabled.

Steps to reproduce:

Open the URL in Nightly.

Result: strings with fraction-slashes are automatically formatted as fractions using the OpenType "frac" feature while fractions with normal slashes are not.

Expected result: fractional forms only appear when explicitly enabled.

While I understand the reasoning behind this, that the use of the fraction-slash implies the desire to see fraction-like formatting, I think it will lead to inconsistency with other browsers (neither Chrome nor IE implements this and I strongly suspect they never will). It's also somewhat inconsistent with the OpenType spec, where "frac" is defined as a feature that is off by default:

https://www.microsoft.com/typography/otspec/features_fj.htm#frac

The CSS3 Fonts spec requires default features to be enabled but "frac" is not one of them:

http://www.w3.org/TR/css3-fonts/#default-features

While the details of this are subtle, there are also performance considerations. We can make perf optimizations if we assume a core set of default features. If shapers enable non-default features like this on an ad hoc basis, these perf optimizations get significantly more complex.
Summary: automatic fractions feature is applied by default in some case → automatic fractions feature is applied by default in some cases
(In reply to John Daggett (:jtd) from comment #0)

> Expected result: fractional forms only appear when explicitly enabled.

I disagree with the expectation here. The fraction slash U+2044 is a character that has specific behavior in relation to the characters surrounding it, and this behavior should be implemented (by default) in the rendering system.

This was recently discussed on the OpenType list. As Thomas Phinney pointed out there, the Unicode standard itself specifies this (TUS 6.2, p200):

>>>
*Fraction Slash.* U+2044 fraction slash is used between digits to form numeric fractions, such as 2/3 and 3/9. The standard form of a fraction built using the fraction slash is defined as follows: any sequence of one or more decimal digits (General Category = Nd), followed by the fraction slash, followed by any sequence of one or more decimal digits. Such a fraction should be displayed as a unit, such as ¾ or [vertical 3/4 glyph]. The precise choice of display can depend on additional formatting information.
<<<

Yes, frac (and numr/dnom) is not a "default" feature as mentioned in CSS3 Fonts - it's not applied indiscriminately to all text, like liga or kern or mark. Rather, it's more comparable to the features that the complex-script shapers apply selectively, based on analysis of the text, in order to render individual glyphs and sequences appropriately.
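
A minimal sketch of what "applied selectively, based on analysis of the text" could look like, following the Unicode definition quoted above (one or more Nd digits, U+2044, one or more Nd digits). This is an illustration only, not HarfBuzz's actual code: the function names and the per-character feature sets are hypothetical, and which tags go on which characters (numr before the slash, dnom after it, frac across the whole run) is an assumption made for the example.

```python
import unicodedata

FRACTION_SLASH = "\u2044"


def is_decimal_digit(ch):
    # General Category Nd, as in the Unicode definition of a fraction.
    return unicodedata.category(ch) == "Nd"


def fraction_features(text):
    """Return one set of feature tags per character (hypothetical helper)."""
    features = [set() for _ in text]
    for i, ch in enumerate(text):
        if ch != FRACTION_SLASH:
            continue
        # Extend backwards over the digits before the slash.
        start = i
        while start > 0 and is_decimal_digit(text[start - 1]):
            start -= 1
        # Extend forwards over the digits after the slash.
        end = i + 1
        while end < len(text) and is_decimal_digit(text[end]):
            end += 1
        # Require at least one digit on each side of the slash.
        if start == i or end == i + 1:
            continue
        for j in range(start, i):
            features[j].update({"numr", "frac"})
        features[i].add("frac")
        for j in range(i + 1, end):
            features[j].update({"frac", "dnom"})
    return features


if __name__ == "__main__":
    for ch, tags in zip("3\u20444 cup", fraction_features("3\u20444 cup")):
        print(repr(ch), sorted(tags))
```

Text without a fraction slash, including "3/4" written with an ordinary slash, comes back with every feature set empty, which matches the behavior reported in comment #0.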
I agree with Jonathan. Also, one of the criteria we have been following in HarfBuzz is: when Unicode and OpenType disagree, Unicode wins. That the OpenType spec says frac is off by default is irrelevant. The OpenType spec is a set of recommendations at best, since it has not really been updated or designed to keep up with advances in Unicode and shaping technology. For example, skipping ZWJ/ZWNJ/Default_Ignorables is never mentioned in OpenType. Normalizing text to find the best rendering possible is not specified in OpenType either.
Unlike script handling, I don't think this is a cut-and-dried "must have" feature. Automatically applying the "frac" feature is by no means the common behavior here. No other browser does this, nor do common applications using CoreText like TextEdit on OSX. Nor do I think this is part of author expectations. There just isn't a large volume of content out there that really requires this functionality. The subtleties of fractions usually require some sort of explicit formatting (e.g. to distinguish 1 2/3 from 12/3). The back and forth on the OpenType list I think reflects that there's no clear consensus that this is required behavior.

The downside of this functionality is fairly large, especially for Gecko.  All strings of Latin text need to explicitly be searched to determine whether a fraction-slash is present or not.  Over time I think it's reasonable to expect more and more fonts to support automatic fractions.  If we need to delicately handle these fonts to avoid bypassing the word cache in most cases it definitely adds overhead.  I think the balance of general performance versus specific feature leans towards avoiding this sort of special-case text handling if at all possible.

I think it's perfectly reasonable for harfbuzz to support this sort of feature but I think it needs to be behind an explicit option that in the case of Gecko is disabled. Browsers are just too performance sensitive to be doing this sort of special-case handling for strings of Latin text.
(In reply to John Daggett (:jtd) from comment #3)
> Unlike script handling, I don't think this is a cut and dry "must have"
> feature.  Automatically applying the "frac" feature is by no means the
> common behavior here. No other browser does this, nor do common applications
> using CoreText like TextEdit on OSX.

That depends on the font - at least a couple of Apple's fonts (try Skia and Apple Chancery) do implement this behavior. (Via AAT rather than OpenType 'frac', but that's incidental.)

>  Nor do I think this is part of author
> expectations. There just isn't a large volume of content out there that
> really requires this functionality.

There isn't a large volume of such content, in part (at least) because few systems have yet implemented the Unicode-specified behavior of fraction slash, and therefore there's little incentive for authors to use it. In many fonts, a sequence of <digit><U+2044><digit> actually looks really *bad* because the slash clashes with the adjacent digits; its design and spacing are intended to work with superior and inferior numerals, not normal lining numerals.

>  The subtleties of fractions usually
> requires some sort of explicit formatting (e.g. to distinguish 1 2/3 from
> 12/3).

Appropriate Unicode plain text for something like 1⅔ would be <one><thinspace><two><fraction-slash><three>. This doesn't require any explicit formatting, *provided* the rendering system implements fraction slash properly.
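
To illustrate the 1 2/3 versus 12/3 point, here is a small sketch that applies the Unicode definition of a fraction to both strings. It assumes the thin space is U+2009; `fractions_in` is a hypothetical helper, not part of any shaper.

```python
import re

THIN_SPACE = "\u2009"
FRACTION_SLASH = "\u2044"

# In a Python str pattern, \d matches any Unicode decimal digit (category Nd).
FRACTION = re.compile(r"(\d+)\u2044(\d+)")


def fractions_in(text):
    """Return (numerator, denominator) digit strings for each fraction found."""
    return [(m.group(1), m.group(2)) for m in FRACTION.finditer(text)]


mixed = "1" + THIN_SPACE + "2" + FRACTION_SLASH + "3"  # one and two thirds
improper = "12" + FRACTION_SLASH + "3"                 # twelve thirds

print(fractions_in(mixed))     # [('2', '3')]
print(fractions_in(improper))  # [('12', '3')]
```

The thin space is not a decimal digit, so it ends the numerator: the leading "1" stays outside the fraction in the first string, while "12" is the whole numerator in the second.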

>  The back and forth on the OpenType list I think reflects that
> there's no clear consensus that this is required behavior.
> 
> The downside of this functionality is fairly large, especially for Gecko. 
> All strings of Latin text need to explicitly be searched to determine
> whether a fraction-slash is present or not.  Over time I think it's
> reasonable to expect more and more fonts to support automatic fractions.  If
> we need to delicately handle these fonts to avoid bypassing the word cache
> in most cases it definitely adds overhead.

Fortunately, we don't actually need to avoid the word cache for this, because of the Unicode spec for how Fraction Slash behaves: the effect does not include or reach across spaces. The Fira 'frac' feature may also affect spaces (intended to improve examples such as "1 ²⁄₃" by reducing the space), but to gain *that* benefit authors will need to explicitly apply it. So for purposes of word caching, I think we can legitimately ignore the frac feature, even though harfbuzz may apply it.
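
A rough sketch of why per-word caching stays safe under this reading. It assumes, for illustration only, that words are delimited by U+0020 and that a cached "shaping result" is just the list of fraction spans; `shape_word`, `shape_text` and `shape_whole` are hypothetical names and do not reflect Gecko's actual word cache.

```python
import re
from functools import lru_cache

FRACTION = re.compile(r"\d+\u2044\d+")


@lru_cache(maxsize=None)
def shape_word(word):
    """Stand-in for a cached per-word shaper: fraction spans within one word."""
    return tuple(m.span() for m in FRACTION.finditer(word))


def shape_text(text):
    """Shape word by word through the cache, then shift spans to text offsets."""
    spans = []
    offset = 0
    for word in text.split(" "):
        spans.extend((offset + s, offset + e) for s, e in shape_word(word))
        offset += len(word) + 1  # +1 for the space consumed by split
    return spans


def shape_whole(text):
    """Reference result: look for fractions in the unsplit text."""
    return [m.span() for m in FRACTION.finditer(text)]


text = "add 1 2\u20443 cups and 3\u20444 tsp"
print(shape_text(text) == shape_whole(text))  # True
```

Because a fraction never contains a space, each one falls wholly inside a single word, so splitting at spaces before shaping gives the same fraction spans as processing the full string.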

>  I think the balance of general
> performance versus specific feature leans towards avoiding this sort of
> special-case text handling if at all possible.
> 
> I think it's perfectly reasonable for harfbuzz to support this sort of
> feature but I think it needs to be behind an explicit option that in the
> case of Gecko is disabled. Browsers are just too performance sensitive to be
> doing this sort of special-case handling for strings of Latin text.

I still disagree with this. By this kind of argument, we should never have implemented things like enabled-by-default ligatures and kerning for Latin script. After all, they're typographic subtleties rather than a cut-and-dried "must have" feature; other browsers didn't implement them; and they certainly have a performance overhead.

FWIW, I'd also expect this behavior to appear in Chrome in due course, now that it's built in to harfbuzz.
(In reply to Jonathan Kew (:jfkthame) from comment #4)

> FWIW, I'd also expect this behavior to appear in Chrome in due course, now
> that it's built in to harfbuzz.

Correct.  Definitely not in the "fast-path", but we are working on getting rid of that one too.

Let's not put performance first!
(In reply to Jonathan Kew (:jfkthame) from comment #4)
>> I think it's perfectly reasonable for harfbuzz to support this sort
>> of feature but I think it needs to be behind an explicit option
>> that in the case of Gecko is disabled. Browsers are just too
>> performance sensitive to be doing this sort of special-case
>> handling for strings of Latin text.
>
> I still disagree with this. By this kind of argument, we should
> never have implemented things like enabled-by-default ligatures and
> kerning for Latin script. After all, they're typographic subtleties
> rather than a cut-and-dried "must have" feature; other browsers
> didn't implement them; and they certainly have a performance
> overhead.

That's not a fair comparison. The feature here is one that's applied
to common text strings for the sake of a *single* codepoint, the
fraction-slash, and is not commonly expected behavior.  It's a special
case situation, not common functionality.

Kerning and ligatures have a huge benefit to commonly used text which
is why we accept the performance hit.

> Fortunately, we don't actually need to avoid the word cache for
> this, because of the Unicode spec for how Fraction Slash behaves:
> the effect does not include or reach across spaces. The Fira 'frac'
> feature may also affect spaces (intended to improve examples such as
> "1 ²⁄₃" by reducing the space), but to gain *that* benefit authors
> will need to explicitly apply it. So for purposes of word caching, I
> think we can legitimately ignore the frac feature, even though
> harfbuzz may apply it.

Ok, if we don't have to sniff textruns for the fraction-slash (beyond
what harfbuzz is already doing) and do any sort of special word cache
handling for fonts that support automatic fractions, I'm less opposed
to this.

> FWIW, I'd also expect this behavior to appear in Chrome in due course, now
> that it's built in to harfbuzz.

I hope so, that would be a great thing for the web platform. I'm
skeptical that this will happen for the default Latin text path
however.
(In reply to Behdad Esfahbod from comment #5)

> Let's not put performance first!

Um, we're a browser, performance is critical. We balance feature
quality with performance all the time.  Which is why we need to be
exceedingly careful about adding overhead that brings marginal
improvement to a very small set of actual use cases. In this case, the
use of the word cache is critical to Firefox performance currently
(improvements in layout would make it less critical but it will still
be important). But if this feature doesn't conflict with typical word
cache usage, I'm not as concerned about it.
FWIW, the fraction processing does not impose any performance degradation in HB anymore, as we only do the extra processing if we saw a FRACTION SLASH in the buffer while processing Unicode properties.
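
The gating described here can be sketched roughly as follows. This is not HarfBuzz code; the function names are hypothetical and only illustrate the idea of recording a flag during a pass that is needed anyway, then skipping the fraction work when the flag is unset.

```python
import unicodedata

FRACTION_SLASH = "\u2044"


def scan_properties(text):
    """Single pass that is needed anyway: collect general categories and
    note whether a fraction slash was seen along the way."""
    categories = []
    saw_fraction_slash = False
    for ch in text:
        categories.append(unicodedata.category(ch))
        if ch == FRACTION_SLASH:
            saw_fraction_slash = True
    return categories, saw_fraction_slash


def fraction_slash_positions(text, categories):
    """Extra work: indices of fraction slashes with an Nd digit on both sides."""
    found = []
    for i, ch in enumerate(text):
        if (ch == FRACTION_SLASH and 0 < i < len(text) - 1
                and categories[i - 1] == "Nd" and categories[i + 1] == "Nd"):
            found.append(i)
    return found


def process(text):
    categories, saw_fraction_slash = scan_properties(text)
    if not saw_fraction_slash:
        return []  # common case: no extra fraction processing at all
    return fraction_slash_positions(text, categories)


print(process("plain Latin text 3/4"))  # []
print(process("3\u20444"))              # [1]
```

Ordinary text pays only for a comparison per character inside a loop that already runs, which is the sense in which the feature adds no separate scanning pass.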

I'd say this should be closed.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INVALID