Bug 1066160 (Closed) - Opened 5 years ago, Closed 5 years ago

import the Brotli decompressor into gecko, to support WOFF2 font decoding

Categories: Core :: Layout: Text and Fonts (defect)

Status: RESOLVED FIXED
Target Milestone: mozilla35

People: Reporter: jfkthame; Assigned: jfkthame

Attachments: 2 files, 2 obsolete files

This is needed to enable support for the new WOFF2 format (bug 1064737).

For this purpose, we only need the Brotli decoder, not the encoder.

Brotli draft spec:
https://code.google.com/p/font-compression-reference/source/browse/brotli/brotlispec.txt

Upstream source for the Brotli decoder:
https://code.google.com/p/font-compression-reference/source/browse/#git%2Fbrotli%2Fdec
This simply imports the Brotli decoder source from upstream (currently at 6cef49677dc4c650ef6e3f56041e0a41803afa8c) into modules/brotli/dec; is that a reasonable place for it?
Attachment #8488089 - Flags: review?(gps)
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
And this includes it in the build, although nothing links to it yet (that's coming in bug 1064737).
Attachment #8488091 - Flags: review?(gps)
830K patch! I knew this was coming but that's a bit of sticker shock. What kind of binary/memory hit are we looking at? If the savings in font download size is significant enough to warrant a new decompressor, please share the metrics for that. Also, if we take this code, what else could we use it for? Can we replace and remove some legacy decompressor already in the tree to amortize the cost of Brotli?
Whiteboard: [MEMSHRINK]
In the 64-bit OS X build, it looks like Brotli compiles to a little over 200K of object code. (From the look of the code, the vast majority of that is a large static data table that it uses.)

Draft WOFF2 Evaluation Report:
  http://www.w3.org/TR/WOFF20ER/
According to http://www.w3.org/TR/WOFF20ER/#appendixB, WOFF2 using Brotli gets around 20-25% better compression than WOFF 1.0 (results vary considerably depending on the particular fonts involved - sometimes as much as 50% better).

Examples of anecdotal evidence of the benefits to be gained:
  https://twitter.com/patrickhamann/status/497767778703933442
  https://twitter.com/tkadlec/status/509688058573504512
  https://twitter.com/wpseo/status/482516050303807490

If we take this, I don't imagine it will replace existing decompressors for existing file types/formats that are specced to use other compression algorithms; but it might be a possible contender for future use, e.g. as a new HTTP-level compression option instead of the proposed addition of LZMA (bug 366559).
(In reply to Jonathan Kew (:jfkthame) from comment #4)
> In the 64-bit OS X build, it looks like Brotli compiles to a little over
> 200K of object code. (From the look of the code, the vast majority of that
> is a large static data table that it uses.)

Ew, that static array is nasty. Load from compressed file on initialization? Where does that array originate? Just from a casual glance I think that would compress down to something more reasonable.
(In reply to John Daggett (:jtd) from comment #5)
> (In reply to Jonathan Kew (:jfkthame) from comment #4)
> > In the 64-bit OS X build, it looks like Brotli compiles to a little over
> > 200K of object code. (From the look of the code, the vast majority of that
> > is a large static data table that it uses.)
> 
> Ew, that static array is nasty. Load from compressed file on initialization?
> Where does that array originate? Just from a casual glance I think that
> would compress down to something more reasonable.

I'm sure it could be compressed and take up a bit less space on disk, but OTOH it would then need to be decompressed into RAM in order to initialize the decoder; that would increase the RAM footprint and add to initialization time. As it stands, it presumably just gets linked into a read-only data segment and mapped into memory "for free" by the loader.

If you think an alternative approach might be better, that would be something to take up with the Google team via the WebFonts WG. (Or we could of course fork their implementation, but I'd really prefer to avoid having to maintain something separate.)
(In reply to Jonathan Kew (:jfkthame) from comment #6)
> Or we could of course fork their implementation

Let's not do that. I just want to make sure that all stakeholders are aware of the code size hit, and aware of the user benefits. As with any significant additional code, I want us to consider other uses for it, and potential replacement of old code. As Brotli is designed for fast decompression via network streaming, I would think that's good for all sorts of stuff; perhaps it could replace internal uses of gzip/lzma for lazy-loaded data?
(In reply to Jonathan Kew (:jfkthame) from comment #6)

> I'm sure it could be compressed and take up a bit less space on disk, but
> OTOH it would then need to be decompressed into RAM in order to initialize
> the decoder; that would increase the RAM footprint and add to initialization
> time. As it stands, it presumably just gets linked into a read-only data
> segment and mapped into memory "for free" by the loader.

I highly doubt this is "for free" on low-memory devices. This seems like enough of a code size hit that we might have to disable this for low-memory devices.
If this is needed for a web standard, not much we can do about it...
Whiteboard: [MEMSHRINK]
Comment on attachment 8488089 [details] [diff] [review]
pt 1 - import the Brotli decoder source from upstream.

Review of attachment 8488089 [details] [diff] [review]:
-----------------------------------------------------------------

Please do not check external code into the tree without including details on where that code came from so people in the future may verify and update the code. I'm sure I could figure out where this came from with a little effort. But I shouldn't have to.

Let's add a README.mozilla (we have 13 in the tree today and most seem to be from external projects) indicating where this code comes from. In addition, the commit message should include exactly where the code came from, what version/commit was used, whether there were changes to code, etc. See e.g. https://hg.mozilla.org/mozilla-central/rev/a3fa93d4db23
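For illustration, a README.mozilla along the lines gps describes might look something like the sketch below. The exact wording is hypothetical, not the file that eventually landed; the upstream URL and commit are taken from this bug's comments.

```
This directory contains the Brotli decompressor, imported from upstream:

  https://code.google.com/p/font-compression-reference/
  (the brotli/dec subdirectory of that repository)

Currently at upstream commit d9a74803fa884559879e3205cfe6f257a2d85519.

Only the decoder is imported; Gecko does not need the encoder. There are
no local modifications. To update to a newer upstream revision, edit the
commit ID in update.sh and run it from this directory.
```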
Attachment #8488089 - Flags: review?(gps)
Comment on attachment 8488091 [details] [diff] [review]
pt 2 - include Brotli in the gecko build.

Review of attachment 8488091 [details] [diff] [review]:
-----------------------------------------------------------------

Do we need two moz.build files? What happens if you put path prefixes in the modules/brotli/moz.build file? (We try to minimize the number of moz.build files in the tree to reduce overhead.) Alternatively, you could kill modules/brotli/moz.build directly and reference the dec one from config/external/moz.build. A bit unclean, but it gets the job done.
Attachment #8488091 - Flags: review?(gps) → review+
You also probably want to include an update shell script, which pulls the latest upstream and applies any local patches we might have. See e.g. mfbt/decimal/ for an example.
Well, comment 0 here includes the details of where to find upstream; but you're right, we should have this in the tree. Added a brief README, as well as a simple update script.
Attachment #8498788 - Flags: review?(gps)
Attachment #8488089 - Attachment is obsolete: true
Using path prefixes in brotli/moz.build seems to work fine, so we can dispense with the moz.build in the /dec subdir. And we can also use UNIFIED_SOURCES instead of SOURCES, it seems. Carrying over r=gps, though I'll double-check this builds OK on all platforms before trying to land it.
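Based on the description above, the consolidated modules/brotli/moz.build would look roughly like the following sketch. moz.build files are Python syntax; the source file list and the library name here are assumptions drawn from the upstream brotli/dec directory, not the file as landed.

```python
# modules/brotli/moz.build (sketch) -- path-prefixed sources let us drop
# the separate moz.build in the dec/ subdirectory.

UNIFIED_SOURCES += [
    'dec/bit_reader.c',
    'dec/decode.c',
    'dec/huffman.c',
    'dec/safe_malloc.c',
    'dec/streams.c',
]

Library('brotli')
```

Keeping a single moz.build reduces build-system overhead, per gps's comment; the trade-off is that the file lives one directory above the sources it lists.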
Attachment #8488091 - Attachment is obsolete: true
Comment on attachment 8498788 [details] [diff] [review]
pt 1 - Import the Brotli decoder source (commit d9a74803fa884559879e3205cfe6f257a2d85519) from upstream.

Review of attachment 8498788 [details] [diff] [review]:
-----------------------------------------------------------------

The UX issue here is very minor. I'm sure competent people can figure it out :)

::: modules/brotli/update.sh
@@ +10,5 @@
> +mv TEMP/brotli/dec dec
> +rm -rf TEMP
> +
> +echo Updated brotli/dec to $COMMIT.
> +hg stat .

Not sure why you run `hg status` here.

Should you just say "remember to verify and commit the changes to source control?"
Attachment #8498788 - Flags: review?(gps) → review+
(In reply to Gregory Szorc [:gps] from comment #15)
> 
> Not sure why you run `hg status` here.
> 
> Should you just say "remember to verify and commit the changes to source
> control?"

The intent was that it'd show a brief summary of changed files, but simply echoing a message such as you suggest may be better.
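Incorporating that suggestion, the revised update.sh might look roughly like this. It is a sketch based on the excerpt quoted in comment 15, not the script as landed; the clone step is guarded behind a --run flag here purely so the sketch can be dry-run without network access.

```shell
#!/bin/sh
# Hypothetical revision of modules/brotli/update.sh: print a reminder
# instead of running `hg stat`, per gps's review comment.

COMMIT=d9a74803fa884559879e3205cfe6f257a2d85519
REPO=https://code.google.com/p/font-compression-reference

update() {
  # Fetch upstream and replace our copy of the decoder sources.
  rm -rf TEMP dec
  git clone "$REPO" TEMP
  (cd TEMP && git checkout "$COMMIT")
  mv TEMP/brotli/dec dec
  rm -rf TEMP
}

# Guard so the sketch can be exercised without touching the network.
if [ "$1" = "--run" ]; then
  update
fi

echo "Updated brotli/dec to $COMMIT."
echo "Remember to verify and commit the changes to source control."
```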
Now that this big library has made it into the tree, how does it look to Coverity?
Flags: needinfo?(erahm)
We did a run on 2014-10-09 and I didn't see any high-priority issues related to brotli show up. We should probably keep an eye on AWSY to see what kind of memory regressions we get.
Flags: needinfo?(erahm)