use harfbuzz for all text shaping on Windows

VERIFIED FIXED in mozilla28

Status

()

defect
VERIFIED FIXED
7 years ago
3 years ago

People

(Reporter: jfkthame, Assigned: jfkthame)

Tracking

(Blocks 1 bug)

unspecified
mozilla28
All
Windows 7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

With harfbuzz now supporting the full range of complex scripts, we can look at the possibility of switching completely from Windows platform shaping (Uniscribe and DirectWrite) to harfbuzz for all text. This will give us consistent behavior across platforms, and allow us to simplify the codebase and eliminate the multiple Windows shaping back-ends we're currently maintaining.

Before we make this switch, however, we should have a thorough test suite in place to verify that we don't regress text shaping with any of the wide range of fonts and scripts out there... hence marking as dependent on bug 770591.
Jonathan, it would be very useful to allow for windows users to test this out by allowing - via a preference - to switch all text shaping to harfbuzz. gfx.font_rendering.harfbuzz.always = true/false?

That'd facilitate broader testing.
You can do that by setting gfx.font_rendering.harfbuzz.scripts to -1. (It's a set of bit flags -- see comments in libpref/src/init/all.js -- so -1 turns them all on.)
Ah, had noticed this pref but my search for documentation on it returned nothing. Good to know.
Blocks: 854897
I think it's time for us to flip the switch, and start shaping all text through harfbuzz, with one exception (see below). The harfbuzz Indic and SE Asian shapers are sufficiently mature and tested, IMO, that we can move forward here.

This will give us common behavior across all platforms for OpenType fonts; we already use harfbuzz for all text (except Graphite fonts) on Linux, Android, B2G, and OS X except when using AAT fonts (i.e. most OS X system fonts).

For current harfbuzz testing status, see http://people.mozilla.org/~jkew/hb-test/stats.html. (Note that many of the "discrepancies" flagged there are understood and accepted; there are cases where the harfbuzz and uniscribe results are equally acceptable even though not identical, and even cases harfbuzz gives a clearly better result than uniscribe.)

The one exception we should keep on Windows, I think, is Korean. Harfbuzz does not yet have a hangul shaper, which means that Korean text encoded with sequences of jamo (rather than as precomposed hangul syllables) will not shape correctly; see examples such as:

http://ec2-54-226-13-158.compute-1.amazonaws.com/testcase-view.html?file=out/fonts/win8/malgun.ttf/texts/hang/wikipedia/ko.txt.diffs#L_1_1
http://ec2-54-226-13-158.compute-1.amazonaws.com/testcase-view.html?file=out/fonts/win8/malgun.ttf/texts/hang/wikipedia/ko.txt.diffs#G_1_1

AFAICT, such text is extremely rare on the Web; nevertheless, it would be unfortunate to regress it. So until the hangul shaper in harfbuzz is ready, we should keep using the uniscribe/directwrite path for that.
Assignee: nobody → jfkthame
Tryserver run with the patches here:
https://tbpl.mozilla.org/?tree=Try&rev=e73d743cc504
(In reply to Jonathan Kew (:jfkthame) from comment #4)
> The one exception we should keep on Windows, I think, is Korean. Harfbuzz
> does not yet have a hangul shaper, which means that Korean text encoded with
> sequences of jamo (rather than as precomposed hangul syllables) will not
> shape correctly; see examples such as:
>
> [snip]
>
> AFAICT, such text is extremely rare on the Web; nevertheless, it would be
> unfortunate to regress it. So until the hangul shaper in harfbuzz is ready,
> we should keep using the uniscribe/directwrite path for that.

My understanding is that this use case arises in input method situations.
Comment on attachment 826368 [details] [diff] [review]
use harfbuzz on Windows for all text shaping except Hangul

Hmmm, why omit Hangul only under Windows?  Seems like the same exception logic should apply on OSX.  Or does OSX lack the ability to do Hangul shaping with OpenType fonts (i.e. with fonts lacking AAT tables)?
Comment on attachment 826369 [details] [diff] [review]
adjust reftest annotations to account for switch from uniscribe/directwrite to harfbuzz for indic shaping

Would it make sense just to remove the reftest for 553571 rather than mark it completely random?  Either way is fine, but if you leave it please explain what it's doing and fix the grammar.

r+ with those changes.
Attachment #826369 - Flags: review?(jdaggett) → review+
(In reply to John Daggett (:jtd) from comment #8)
> > AFAICT, such text is extremely rare on the Web; nevertheless, it would be
> > unfortunate to regress it. So until the hangul shaper in harfbuzz is ready,
> > we should keep using the uniscribe/directwrite path for that.
> 
> My understanding is that this use case arises in input method situations.

I'd expect that normally the input method takes care of composing sequences of jamo into precomposed syllables as appropriate. And when a syllable is only partially typed, it wouldn't be expected to combine yet anyway.

But yes, it may be that there are some input method situations where this would be relevant.

(In reply to John Daggett (:jtd) from comment #9)
> Comment on attachment 826368 [details] [diff] [review]
> use harfbuzz on Windows for all text shaping except Hangul
> 
> Hmmm, why omit Hangul only under Windows?  Seems like the same exception
> logic should apply on OSX.  Or does OSX lack the ability to do Hangul
> shaping with OpenType fonts (i.e. with fonts lacking AAT tables)?

I believe the default Korean fonts that Apple ships are AAT rather than OpenType Layout fonts, so the issue would rarely arise.

Note that we've been sending all (OpenType) text through harfbuzz on OS X for some time now (since bug 797402, landed for mozilla-19), and the problem, if there is one, hasn't yet been reported. Which of course doesn't absolutely prove anything, but does suggest that it's not a widespread concern.

In fact, AFAICT the Apple Gothic font (for example) doesn't do -any- composition of jamo sequences; the decomposed form simply isn't supported. E.g., try

  data:text/html,<p style="font-family:apple gothic">&%23x3145;&%23x314f;&%23x3141;

which renders (for me on 10.7, at least) as the decomposed sequence, not as "삼".

If we want to investigate behavior of (non-Apple) Korean fonts on (various releases of) OS X, and consider changing the settings if that appears helpful, I think we should spin that off into a separate bug. (But IMO, it's probably not worth pursuing. We should just get full Hangul support into harfbuzz, and consider the issue closed.)
Comment on attachment 826368 [details] [diff] [review]
use harfbuzz on Windows for all text shaping except Hangul

(In reply to Jonathan Kew (:jfkthame) from comment #11)
> Note that we've been sending all (OpenType) text through harfbuzz on OS X
> for some time now (since bug 797402, landed for mozilla-19), and the
> problem, if there is one, hasn't yet been reported. Which of course doesn't
> absolutely prove anything, but does suggest that it's not a widespread
> concern.

Um, that's not such a great way of proving "widespread concern".  Users in Asia often don't report problems they run into.  And our share in Korea is very low due to the unique characteristics of the Korean market (special encryption standards etc.).

But I guess it doesn't matter so much.  I was just puzzled by the inconsistency across platforms.  Might be nice to have a comment explaining the Hangul situation under OSX.
Attachment #826368 - Flags: review?(jdaggett) → review+
(In reply to John Daggett (:jtd) from comment #10)
> Comment on attachment 826369 [details] [diff] [review]
> adjust reftest annotations to account for switch from uniscribe/directwrite
> to harfbuzz for indic shaping
> 
> Would it make sense just to remove the reftest for 553571 rather than mark
> it completely random?  Either way is fine, but if you leave it please
> explain what it's doing and fix the grammar.

Actually, I think we can explicitly mark it "fails" for all platforms, with a comment explaining that the behavior it was testing was Windows-specific and is not currently expected to occur anywhere. Then it'll alert us with an "unexpected pass" if we ever change things back, or if harfbuzz behavior changes with respect to this example.

(In reply to John Daggett (:jtd) from comment #12)
> Comment on attachment 826368 [details] [diff] [review]
> use harfbuzz on Windows for all text shaping except Hangul
> 
> (In reply to Jonathan Kew (:jfkthame) from comment #11)
> > Note that we've been sending all (OpenType) text through harfbuzz on OS X
> > for some time now (since bug 797402, landed for mozilla-19), and the
> > problem, if there is one, hasn't yet been reported. Which of course doesn't
> > absolutely prove anything, but does suggest that it's not a widespread
> > concern.
> 
> Um, that's not such a great way of proving "widespread concern".  Users in
> Asia often don't report problems they run into.  And our share in Korea is
> very low due to the unique characteristics of the Korean market (special
> encryption standards etc.).

True. Still, the fact that the default font on OS X doesn't even seem to support composing hangul syllables from jamo sequences also suggests to me that this is far from a mainstream issue.

Updated comments and pushed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/013c9b686c9b
https://hg.mozilla.org/integration/mozilla-inbound/rev/e4f335f65678
(In reply to Jonathan Kew (:jfkthame) from comment #13)
> (In reply to John Daggett (:jtd) from comment #10)
> > Comment on attachment 826369 [details] [diff] [review]
> > adjust reftest annotations to account for switch from uniscribe/directwrite
> > to harfbuzz for indic shaping
> > 
> > Would it make sense just to remove the reftest for 553571 rather than mark
> > it completely random?  Either way is fine, but if you leave it please
> > explain what it's doing and fix the grammar.
> 
> Actually, I think we can explicitly mark it "fails" for all platforms,

Argh - that doesn't quite work, as devices that completely lack any Sinhala fonts and end up drawing notdef boxes may accidentally "pass" the test. So I've pushed a followup to mark it as random after all. (We may want to tighten up on this at some point when we have more complete font support and can decide exactly what should be tested here.)

https://hg.mozilla.org/integration/mozilla-inbound/rev/b85a27bdccfe
about:support is still showing DirectWrite=true for me on 20131106 Nightly.  Shouldn't it be false now?
(In reply to Alexandre Folle de Menezes from comment #16)
> about:support is still showing DirectWrite=true for me on 20131106 Nightly. 
> Shouldn't it be false now?

DirectWrite shaping is distinct from DirectWrite rendering.  What you're seeing now is HarfBuzz shaping with DirectWrite rendering.  Clear as mud? ;)
Adding "verifyme" for QA to do some exploratory testing here and check for regressions (all types of characters, including other languages, different fonts etc).
Keywords: verifyme
I performed extensive exploratory testing on this matter using the latest Beta (Build ID: 20140224220227) with the following platforms:
- Windows 7 64-bit [1]
- Windows 8.1 64-bit [2]
- Windows XP 64-bit [3]

I was able to confirm the fix using the following locales: fr, de, he and ko on each of the mentioned platforms, with additional testing on special unicode characters using website [4].

Additional notes:
- Some special unicode characters are still displayed using hex code boxes instead, but this might be a font support related issue.
- The ko (Korean) build is displayed completely with hex code boxes on Windows XP [3], again, this seems like a font support issue, but a confirmation on the matter would be appreciated.


[1] Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
[2] Mozilla/5.0 (Windows NT 6.3; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
[3] Mozilla/5.0 (Windows NT 5.2; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
[4] http://shapecatcher.com/
Status: RESOLVED → VERIFIED
Keywords: verifyme
Blocks: 985220
You need to log in before you can comment on or make changes to this bug.