Last Comment Bug 780409 - update HarfBuzz code following the HB Lemongrass (July 2012) hackfest
: update HarfBuzz code following the HB Lemongrass (July 2012) hackfest
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Graphics: Text (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla17
Assigned To: Jonathan Kew (:jfkthame)
:
Mentors:
Depends on:
Blocks: 631159 723966 766217 767249 772380 776085 782908
  Show dependency treegraph
 
Reported: 2012-08-04 12:17 PDT by Jonathan Kew (:jfkthame)
Modified: 2016-04-07 11:38 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch, update harfbuzz to commit 8ba8042... (586.31 KB, patch)
2012-08-04 12:17 PDT, Jonathan Kew (:jfkthame)
jd.bugzilla: review+
Details | Diff | Review
patch 2, update types in gfxFont.h to match harfbuzz changes (2.50 KB, patch)
2012-08-04 12:20 PDT, Jonathan Kew (:jfkthame)
jd.bugzilla: review+
Details | Diff | Review
followup, remove extra comma at end of enum (1.08 KB, patch)
2012-08-07 03:19 PDT, Landry Breuil (:gaston)
jfkthame: review+
Details | Diff | Review

Description Jonathan Kew (:jfkthame) 2012-08-04 12:17:06 PDT
Created attachment 649024 [details] [diff] [review]
patch, update harfbuzz to commit 8ba8042...

During the Toronto work week and following, harfbuzz has made great strides in Indic/SEAsian shaping, as well as some useful performance improvements. So, it's time for us to take an updated copy.

This should also resolve a number of our currently-open HB-related issues.

Tryserver run at https://tbpl.mozilla.org/?tree=Try&rev=0812bde6b33c looks good.
Comment 1 Jonathan Kew (:jfkthame) 2012-08-04 12:20:02 PDT
Created attachment 649025 [details] [diff] [review]
patch 2, update types in gfxFont.h to match harfbuzz changes

A few minor type updates needed for the latest harfbuzz version.
Comment 2 John Daggett (:jtd) 2012-08-05 19:26:35 PDT
What's the plan with the coretext shaping code?  Is this something intended for Chrome only?
Comment 3 John Daggett (:jtd) 2012-08-05 19:58:58 PDT
> +  {HB_SCRIPT_INVALID,	false,     0,BASE_POS_LAST, REPH_POS_DEFAULT,    REPH_MODE_IMPLICIT},
> +  {HB_SCRIPT_DEVANAGARI,true, 0x094D,BASE_POS_LAST, REPH_POS_BEFORE_POST,REPH_MODE_IMPLICIT},
> +  {HB_SCRIPT_BENGALI,	true, 0x09CD,BASE_POS_LAST, REPH_POS_AFTER_SUB,  REPH_MODE_IMPLICIT},
> +  {HB_SCRIPT_GURMUKHI,	true, 0x0A4D,BASE_POS_LAST, REPH_POS_BEFORE_SUB, REPH_MODE_IMPLICIT},
> +  {HB_SCRIPT_GUJARATI,	true, 0x0ACD,BASE_POS_LAST, REPH_POS_BEFORE_POST,REPH_MODE_IMPLICIT},
> +  {HB_SCRIPT_ORIYA,	true, 0x0B4D,BASE_POS_LAST, REPH_POS_AFTER_MAIN, REPH_MODE_IMPLICIT},
> +  {HB_SCRIPT_TAMIL,	true, 0x0BCD,BASE_POS_LAST, REPH_POS_AFTER_POST, REPH_MODE_IMPLICIT},
> +  {HB_SCRIPT_TELUGU,	true, 0x0C4D,BASE_POS_LAST, REPH_POS_AFTER_POST, REPH_MODE_EXPLICIT},
> +  {HB_SCRIPT_KANNADA,	true, 0x0CCD,BASE_POS_LAST, REPH_POS_AFTER_POST, REPH_MODE_IMPLICIT},
> +  {HB_SCRIPT_MALAYALAM,	true, 0x0D4D,BASE_POS_LAST, REPH_POS_AFTER_MAIN, REPH_MODE_LOG_REPHA},
> +  {HB_SCRIPT_SINHALA,	false,0x0DCA,BASE_POS_FIRST,REPH_POS_AFTER_MAIN, REPH_MODE_EXPLICIT},
> +  {HB_SCRIPT_KHMER,	false,0x17D2,BASE_POS_FIRST,REPH_POS_DEFAULT,    REPH_MODE_VIS_REPHA},

We have tests for some of these scripts but I think we should be
adding more tests so that we can establish some level of baseline
support against which to test future updates.  Seems like scripts with
less overlap need the most attention, Khmer for example.

Would it make sense to rename reftests/indic-shaping to reftests/complex-shaping?
Comment 4 John Daggett (:jtd) 2012-08-05 23:51:09 PDT
Comment on attachment 649024 [details] [diff] [review]
patch, update harfbuzz to commit 8ba8042...

+     * - If a ligature is formed of components that some of which are also ligatures
+     *   themselves, and those ligature components had marks attached to *their*
+     *   components, we have to attach the marks to the new ligature component
+     *   positions!  Now *that*'s tricky!  And these marks may be following the
+     *   last component of the whole sequence, so we should loop forward looking
+     *   for them and update them.
+     *
+     *   Eg. the sequence is LAM,LAM,SHADDA,FATHA,HEH, and the font first forms a
+     *   'calt' ligature of LAM,HEH, leaving the SHADDA and FATHA with a ligature
+     *   id and component == 1.  Now, during 'liga', the LAM and the LAM-HEH ligature
+     *   form a LAM-LAM-HEH ligature.  We need to reassign the SHADDA and FATHA to
+     *   the new ligature with a component value of 2.

Here too, it would be nice to have tests to cover these sorts of edge cases.

> +++ b/gfx/harfbuzz/src/hb-ot-shape-complex-indic-machine.hh
> @@ -26,183 +26,772 @@
>   * Google Author(s): Behdad Esfahbod
>   */
>  
>  #ifndef HB_OT_SHAPE_COMPLEX_INDIC_MACHINE_HH
>  #define HB_OT_SHAPE_COMPLEX_INDIC_MACHINE_HH
>  
>  #include "hb-private.hh"
>  
> -HB_BEGIN_DECLS
>  
> -
> -#line 38 "hb-ot-shape-complex-indic-machine.hh.tmp"
> +#line 36 "hb-ot-shape-complex-indic-machine.hh.tmp"
>  static const unsigned char _indic_syllable_machine_trans_keys[] = {
> -	5u, 5u, 1u, 2u, 1u, 2u, 5u, 5u, 1u, 5u, 1u, 2u, 5u, 5u, 1u, 13u, 
> -	4u, 11u, 4u, 11u, 5u, 11u, 1u, 10u, 1u, 10u, 10u, 10u, 10u, 10u, 4u, 10u, 
> -	5u, 10u, 8u, 10u, 5u, 10u, 6u, 10u, 9u, 10u, 4u, 11u, 1u, 13u, 4u, 10u, 
> -	4u, 10u, 5u, 10u, 4u, 10u, 5u, 10u, 8u, 10u, 10u, 10u, 10u, 10u, 4u, 10u, 
> -	4u, 10u, 5u, 10u, 4u, 10u, 5u, 10u, 8u, 10u, 10u, 10u, 10u, 10u, 0
> +	1u, 16u, 13u, 13u, 4u, 14u, 5u, 7u, 7u, 7u, 5u, 7u, 5u, 7u, 7u, 7u, 
> +	5u, 7u, 5u, 7u, 7u, 7u, 5u, 7u, 5u, 7u, 7u, 7u, 4u, 4u, 6u, 6u, 
> +	16u, 16u, 4u, 7u, 6u, 6u, 16u, 16u, 4u, 7u, 6u, 6u, 16u, 16u, 4u, 7u, 
> +	6u, 6u, 16u, 16u, 4u, 14u, 4u, 14u, 4u, 14u, 4u, 14u, 4u, 14u, 4u, 14u, 
> +	4u, 14u, 4u, 14u, 4u, 14u, 1u, 16u, 13u, 13u, 4u, 14u, 5u, 7u, 7u, 7u, 
> +	5u, 7u, 5u, 7u, 7u, 7u, 5u, 7u, 5u, 7u, 7u, 7u, 5u, 7u, 5u, 7u, 
> +	7u, 7u, 4u, 4u, 6u, 6u, 16u, 16u, 4u, 7u, 6u, 6u, 16u, 16u, 4u, 7u, 

Is there any document or reference on how this is generated?

> BUILT_SOURCES += hb-ot-shape-complex-indic-machine.hh
> EXTRA_DIST += hb-ot-shape-complex-indic-machine.rl
> hb-ot-shape-complex-indic-machine.hh: hb-ot-shape-complex-indic-machine.rl
> 	$(AM_V_GEN)$(top_srcdir)/missing --run ragel -e -F1 -o "$@.tmp" "$<" && \
> 	mv "$@.tmp" "$@" || ( $(RM) "$@.tmp" && false )

How is the hb-ot-shape-complex-indic-machine.hh generated here?  I see it in 
Makefile.am but I'm guessing you're building this yourself and checking it in
with each update.  It would be nice to have a note somewhere on how that's done.

r+ with the inclusion of additional tests.
Comment 5 Jonathan Kew (:jfkthame) 2012-08-06 04:18:29 PDT
Pushed to inbound, with an added README-mozilla file that discusses updating, etc.

https://hg.mozilla.org/integration/mozilla-inbound/rev/e62bc7eb4031
https://hg.mozilla.org/integration/mozilla-inbound/rev/f24229bc0ec8

Re test coverage, see bug 770591 (Continuous-testing HarfBuzz), which will provide us with far more extensive and detailed test coverage, especially for the various complex scripts.

I looked at the possibility of using reftests (with the test-pref and ref-pref annotations) to compare harfbuzz shaping with platform-native shaping for these various scripts, but this is not workable in general because we tend to run into subpixel rendering differences due to the engines working with different precisions, rounding at different stages, etc. These do not indicate actual shaping problems, but cause reftest-style comparisons to fail even when the shaping is in fact correct.
Comment 6 Jonathan Kew (:jfkthame) 2012-08-06 04:24:02 PDT
(In reply to John Daggett (:jtd) from comment #2)
> What's the plan with the coretext shaping code?  Is this something intended
> for Chrome only?

The hb-coretext code exists just for testing purposes, to make it easy to compare harfbuzz and core text behaviors using the hb testing utilities (hb-shape, hb-view). It is not currently intended or ready for production use (and I, at least, don't have any current plans to try and bring it to the point where it could replace our existing core text backend).
Comment 8 John Daggett (:jtd) 2012-08-06 17:27:08 PDT
Jonathan, there were two other questions in comment 4.  Could you comment on those.
Comment 9 John Daggett (:jtd) 2012-08-06 17:30:39 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #5)
> Pushed to inbound, with an added README-mozilla file that discusses
> updating, etc.
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/e62bc7eb4031
> https://hg.mozilla.org/integration/mozilla-inbound/rev/f24229bc0ec8
> 
> Re test coverage, see bug 770591 (Continuous-testing HarfBuzz), which will
> provide us with far more extensive and detailed test coverage, especially
> for the various complex scripts.

I agree that continuous-testing harfbuzz will provide much more extensive testing
coverage but I think we should have tests included in our normal test suite that
sanity check some of the complex shaping code in harfbuzz, something that tests
actual browser content.  Right now you're basically checking in new code with no
tests.
Comment 10 Mathieu Pellerin 2012-08-06 18:42:18 PDT
Thanks a bunch for this, you've just made a whole country, Cambodia, able to read Khmer web sites on Android <= 3.0! :)

BTW, on the harfbuzz mailing list, behdad was referring to possible speed boost due to newly implemented caching mechanism. As the (positive) impact of caching mechanism in firefox been analyzed? If so, I'd love to see the numbers.
Comment 11 Jonathan Kew (:jfkthame) 2012-08-07 01:07:25 PDT
(In reply to John Daggett (:jtd) from comment #8)
> Jonathan, there were two other questions in comment 4.  Could you comment on
> those.

The indic-machine.hh file is generated from the indic-machine.rl description by the Ragel (www.complang.org/ragel/) finite state machine compiler. To avoid requiring Ragel as a build dependency for gecko, I've been pre-generating the .hh file and checking it in with the harfbuzz sources (which is also what Behdad does for tarball releases, but we're normally taking the code directly from the Git repo where the .rl file is the canonical source).

(As mentioned in comment #5, I included a README-mozilla file with brief notes on this, and how it gets updated.)
Comment 12 Jonathan Kew (:jfkthame) 2012-08-07 01:24:03 PDT
(In reply to John Daggett (:jtd) from comment #9)
> I think we should have tests included in our normal test suite
> that
> sanity check some of the complex shaping code in harfbuzz, something that
> tests
> actual browser content.  Right now you're basically checking in new code
> with no
> tests.

Not really. We do have "tests that sanity check some of the complex shaping code" using actual browser content. That's what the tests in layout/reftests/indic-shaping do, as well as a few Arabic-script tests in layout/reftests/text (and the font feature tests in layout/reftests/font-features). Adding similar test cases of a few words in another half-dozen scripts - that all go through the exact same codepaths in our code, and the same shaping backend in harfbuzz - would gain us virtually nothing. All we're testing here is that we are in fact calling harfbuzz, "some kind of shaping action" is happening, and we're getting back plausible output. And that's all our current frameworks can reasonably do.

To seriously test the correctness of (and check for regressions in) harfbuzz complex-script shaping, we need a test suite with hundreds of thousands of examples, not a few dozen words, and we need to validate the actual glyphs and positions being returned against results that Uniscribe (for example) would give. We don't have a test framework at present that can do this; Behdad has some great tools but they're not running and reporting in an automated, continuous way. That's what bug 770591 aims to handle.
Comment 13 Landry Breuil (:gaston) 2012-08-07 02:21:14 PDT
Fwiw, this commit broke my openbsd/gcc 4.2.1 builds :

gfx/harfbuzz/src/hb-ot-shape-normalize-private.hh:41: error: comma at end of enumerator list

http://buildbot.rhaalovely.net/builders/mozilla-central-amd64/builds/476/steps/build/logs/stdio

What is the proper way to apply fixes to harfbuzz ? A direct commit or a patch like it's done for skia/libffi ?
Comment 14 Jonathan Kew (:jfkthame) 2012-08-07 02:54:44 PDT
(In reply to Landry Breuil (:gaston) from comment #13)
> Fwiw, this commit broke my openbsd/gcc 4.2.1 builds :
> 
> gfx/harfbuzz/src/hb-ot-shape-normalize-private.hh:41: error: comma at end of
> enumerator list
> 
> http://buildbot.rhaalovely.net/builders/mozilla-central-amd64/builds/476/
> steps/build/logs/stdio
> 
> What is the proper way to apply fixes to harfbuzz ? A direct commit or a
> patch like it's done for skia/libffi ?

Ah, drat - overlooked that one. For now, if you want to post a patch that just removes the spurious comma, I'll be happy to r+ it. You could attach it as a followup here rather than opening a new bug.

I'll ask Behdad to fix it upstream.
Comment 15 Landry Breuil (:gaston) 2012-08-07 03:19:30 PDT
Created attachment 649582 [details] [diff] [review]
followup, remove extra comma at end of enum
Comment 16 Jonathan Kew (:jfkthame) 2012-08-07 03:33:21 PDT
r=me, and I went ahead and pushed it to inbound as I had a tree handy.
https://hg.mozilla.org/integration/mozilla-inbound/rev/2637d896de91
Comment 17 Jonathan Kew (:jfkthame) 2012-08-07 04:07:10 PDT
(In reply to Mathieu Pellerin from comment #10)
> BTW, on the harfbuzz mailing list, behdad was referring to possible speed
> boost due to newly implemented caching mechanism. As the (positive) impact
> of caching mechanism in firefox been analyzed? If so, I'd love to see the
> numbers.

I haven't done extensive/systematic analysis, but on a very text-heavy (but simple) page - loading a Tinderbox log, as in bug 762710) - I saw the contribution from shaping drop from about 13% to under 10% of the total time. So it's clearly made a difference.

The benefit should be greater when using fonts with a lot of lookups (e.g. large pages full of Indic text). But on the other hand that's not typical of most people's browsing experience, so we shouldn't over-emphasize the benefit here. For most people, page load/reflow times are dominated by other things than text shaping.
Comment 18 Ed Morley [:emorley] 2012-08-07 07:32:54 PDT
https://hg.mozilla.org/mozilla-central/rev/2637d896de91
Comment 19 tugstugi 2012-10-08 02:45:32 PDT
@Jonathan

Now firefox is shaping the classic Mongolian correctly, but we have to set gfx.font_rendering.harfbuzz.scripts. Could someone make the harfbuzz shaping for classic Mongolian default on all platforms except Windows?
Comment 20 Jonathan Kew (:jfkthame) 2012-10-08 04:58:59 PDT
Bug 797398 (Linux) and bug 797402 (Mac OS X) intend to do this.

Note You need to log in before you can comment on or make changes to this bug.