Last Comment Bug 366559 - Brotli Accept-Encoding/Content-Encoding
: Brotli Accept-Encoding/Content-Encoding
Status: RESOLVED FIXED
: dev-doc-complete, feature
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: Trunk
: All All
: -- enhancement with 23 votes (vote)
: mozilla44
Assigned To: Patrick McManus [:mcmanus]
:
Mentors:
Depends on: fuzzing-brotli 1207256 1207298 1211580 1211916 1243724 1261318
Blocks: 1242904
  Show dependency treegraph
 
Reported: 2007-01-10 06:41 PST by Neil Harris
Modified: 2016-04-01 17:08 PDT (History)
80 users (show)
dveditz: sec‑review+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed
44+


Attachments
(Part 1: code) Add support for LZMA2 decompression "Content-Encoding: xz" using liblzma (14.45 KB, patch)
2013-04-04 07:38 PDT, Luke Deller
no flags Details | Diff | Review
(Part 2: build) Add support for LZMA2 decompression "Content-Encoding: xz" using liblzma (4.33 KB, patch)
2013-04-04 07:41 PDT, Luke Deller
no flags Details | Diff | Review
patch with adjusted diff settings (Part 1: code) (33.80 KB, patch)
2013-04-04 07:56 PDT, Luke Deller
no flags Details | Diff | Review
patch with adjusted diff settings (Part 2: build) (5.62 KB, patch)
2013-04-04 07:57 PDT, Luke Deller
no flags Details | Diff | Review
patch (part 0: import XZ embedded) (89.69 KB, patch)
2013-04-13 06:49 PDT, Luke Deller
mcmanus: feedback+
Details | Diff | Review
patch (part 1: code) (24.64 KB, patch)
2013-04-13 06:49 PDT, Luke Deller
mcmanus: feedback+
Details | Diff | Review
patch (part 2: build) (12.83 KB, patch)
2013-04-13 06:50 PDT, Luke Deller
no flags Details | Diff | Review
patch (part 3: test) (4.61 KB, patch)
2013-04-14 04:09 PDT, Luke Deller
mcmanus: feedback+
Details | Diff | Review
patch (part 0: import XZ embedded) (89.70 KB, patch)
2013-05-27 07:37 PDT, Luke Deller
gerv: review+
Details | Diff | Review
patch (part 1: code) (23.91 KB, patch)
2013-05-27 07:59 PDT, Luke Deller
mcmanus: review-
Details | Diff | Review
patch (part 2: build) (6.75 KB, patch)
2013-05-27 08:03 PDT, Luke Deller
ted: review-
Details | Diff | Review
patch (part 3: test) (4.53 KB, patch)
2013-05-27 08:08 PDT, Luke Deller
mcmanus: review+
Details | Diff | Review
patch (part 2: build) (6.09 KB, patch)
2013-05-31 07:13 PDT, Luke Deller
ted: review+
Details | Diff | Review
patch (part 1: code) + whitespace fixes (24.48 KB, patch)
2013-08-15 19:00 PDT, Alex Xu
no flags Details | Diff | Review
patch (part 1: code) (24.59 KB, patch)
2014-01-14 05:08 PST, Luke Deller
no flags Details | Diff | Review
patch (part 2: build) (5.46 KB, patch)
2014-01-14 05:16 PST, Luke Deller
ted: review+
Details | Diff | Review
patch (part 3: test) (4.49 KB, patch)
2014-01-14 05:41 PST, Luke Deller
no flags Details | Diff | Review
patch 1, update brotli snapshot (207.03 KB, patch)
2015-09-21 08:07 PDT, Patrick McManus [:mcmanus]
jfkthame: review+
Details | Diff | Review
patch 2, fix nsHTTPCompressConv indentation (33.01 KB, patch)
2015-09-21 08:07 PDT, Patrick McManus [:mcmanus]
daniel: review+
Details | Diff | Review
patch 3, fix nsHTTPCompressConv bracing style (13.20 KB, patch)
2015-09-21 08:08 PDT, Patrick McManus [:mcmanus]
daniel: review+
Details | Diff | Review
patch 4, fix nsHTTPCompressConv namespace (4.89 KB, patch)
2015-09-21 08:08 PDT, Patrick McManus [:mcmanus]
daniel: review+
Details | Diff | Review
patch 5, fix nsHTTPCompressConv manual addref (1.20 KB, patch)
2015-09-21 08:08 PDT, Patrick McManus [:mcmanus]
daniel: review+
Details | Diff | Review
patch 6, support different content encodings for http vs https (11.23 KB, patch)
2015-09-21 08:08 PDT, Patrick McManus [:mcmanus]
daniel: review+
Details | Diff | Review
patch 7, content-encoding brotli for https (25.68 KB, patch)
2015-09-21 08:08 PDT, Patrick McManus [:mcmanus]
daniel: review+
Details | Diff | Review

Description Neil Harris 2007-01-10 06:41:55 PST
Adding LZMA as a transfer-encoding method for HTTP offers the prospect of 30 - 40% greater compression for page downloads, compared to existing gzip compression, and hence could produce significantly increased page-load speed for mobile devices with restricted bandwidth. This would also benefit users in developing countries, where high-speed access is expensive or unavailable: for example, for the version of Firefox in the OLPC laptop.

Example:

On three representative web pages downloaded from the web, I got the following results (figure represent the total length of the three files when each compressed individually)

Uncompressed: 201185 bytes, 100%
gzip:          60911 bytes,  30% of original
lzma:          44111 bytes,  21% of original, a factor of 1.38 less than gzip

Thus, the extra compression offers a 38% increase in download speed relative to gzip encoding, or, to put it another way, could offer a 28% reduction in page download time. Since LZMA, like gzip, is a streaming compression technique, these increases would apply to incremental as well as total page load time.

Advantages:
* LZMA offers a significant speedup for bandwidth-restricted devices, where page load speed is most critical
* Would only affect the HTTP transfer code, would be completely transparent to every other part of the application
* HTTP Accept-transfer-encoding mechanism allows for complete forwards and backwards compatibility with all existing web server and client implementations* Code is small, won't add much to browser footprint
* Will increase both total and time-to-first rendering page load times
* Appears to be unencumbered by patent issues (please check!)
* Free software implementations readily available
* To reiterate: 30%+ speedup!

Possible disadvantages (and mitigating factors):
* LZMA is not yet standardized (but open source implementations are widely available, so implement first, then standardize -- perhaps call the method x-mozilla-lzma until it is standardized) 
* not yet supported by any webservers (this is a chicken-and-egg situation: support in Firefox will certainly drive adoption: in particular, just adding LZMA compression in Apache would instantly make LZMA available for 50% of the web server installed base)
* memory footprint and CPU overhead need investigating (but implementation cannot make anything worse than status quo, even in the worst case, since neither web servers nor browsers would be forced to use it, so it can be switched off where necessary, and both CPU speed and memory are getting bigger much faster than download speeds for mobile devices -- also, since the CPU is often idle while waiting for download data, this may still increase download speeds even on relatively slow platforms: needs benchmarking)
Comment 1 Neil Harris 2007-01-10 16:08:20 PST
The reference implementation for LZMA is available under the LGPL. See http://www.7-zip.org/sdk.html

The busybox code appears to have a GPLv2 implementation of an LZMA decoder. It's surprisingly small: about 500 lines of C. See http://www.busybox.net/cgi-bin/viewcvs.cgi/trunk/busybox/archival/libunarchive/decompress_unlzma.c?rev=17164&view=markup 

This seems almost too small. Perhaps I've missed something and this isn't the whole thing?
Comment 2 Neil Harris 2007-01-10 16:44:05 PST
After a bit more code inspection: no, the busybox code does not seem to use any symbols it does not define, except for things like reading/writing/malloc/free/error handling, and some simple macros.
Comment 3 Neil Harris 2007-01-10 17:03:53 PST
There seems to be another LGPL'd implementation at http://tukaani.org/lzma/
Comment 4 Jesse Ruderman 2007-01-11 03:55:51 PST
See also bug 173044, "Support for bzip2 transfer encoding".
Comment 5 Jesse Ruderman 2007-01-11 03:56:05 PST
I talked to some Dreamhost people about this a while back.  I asked them something like "If Firefox supported LZMA or bzip2 transfer-encoding, would Dreamhost use it?"  IIRC, they concluded that gzip is the best compression method that's fast enough to use on a web server.
Comment 6 Neil Harris 2007-01-11 04:54:04 PST
Yes, LZMA is considerably slower to compress. On my 2 GHz x86 box, separately compressing the same three files mentioned above took 20 milliseconds using gzip, and 180 milliseconds using LZMA, an average of 60ms per file, about 10x slower than gzip; this would mean that LZMA cost would most likely become the dominant factor in pages serving time, which would be bad.

However, If we assume that user time is more valuable than server time, LZMA  still wins for slow links. if the webserver was sending these pages to a mobile device over a 40 kbps GPRS connection, it would have taken an average of 13 seconds each to download them. In this case, spending 60ms of overall web server CPU time per page in compression would save about 3.6 seconds of overall delivery time per page. 

This would probably also be the case for links with long round-trip times, such as satellite links or round-the-world links (UK <--> Australia is 330ms), where slow-start will cause the initial stages of a TCP connection to be very slow, even if there is plenty of bandwidth available on the intermediate links. And, come to think of it, the slow-start problem will also apply to very slow links, because of packetization delay, and to congested links, because packet drops will slow TCP to a crawl.
Comment 7 Neil Harris 2007-01-11 05:26:35 PST
Here are some more benchmark results: http://tukaani.org/lzma/benchmarks

I think Jesse's right: this is not a no-brainer like gzip, the tradeoffs are more complex. I still think it's worth considering: where else can you get a 30% speedup at the cost of 500 lines of code? 

More thoughts:
* since the choice of encoding is ultimately under the control of the server, server admins who don't want to use LZMA won't have to
* since it is ultimately under the control of the server, LZMA encoding could also be applied selectively in real-time, with LZMA encoding only being used when the server CPU load is relatively low, and this would also eliminate the possibility of LZMA being used as a denial-of-service vector
* LZMA encoding could also be applied selectively by a smart webserver, based on one or more of: user-agent string; source IP address (eg. using a geolocation database); per-website configuration; RTT sniffing from the three-way handshake

Comment 8 Jesse Ruderman 2007-01-11 05:42:00 PST
For sites that cache the compressed version of a frequently requested page, the compression time is less important.
Comment 9 Neil Harris 2007-01-11 06:15:51 PST
And, re-working the numbers, LZMA might make more sense for providers than I thought. At a leasing cost of $100/month for a single-CPU server, and a bandwidth cost of $50/Mbit-month, allowing for 2:1 overprovisioning to allow for a reasonable peak-to-mean ratio, I make the costs to be something like:

1 millisecond of CPU time:    $4.45e-8 (depreciation + power + hosting: aabout $1400/year) 
1 kilobyte of data transfer:  $3.08e-7 (about $0.30 per gigabyte)

If it costs 60 CPU-milliseconds to save 10 kilobytes of data output over the alternative of gzip, the overall costs are:

60 ms of CPU:                 $2.67e-6
10 kbytes of data transfer:  -$3.08e-6
 
so the provider is actually a little tiny bit up on the deal. (Depending on your assumptions, it can also go the other way -- still, I think this suggests that there the two effects are of similar magnitude.)
Comment 10 Neil Harris 2007-01-11 06:24:53 PST
...and, as Jesse points out in comment #8, if the page is frequently hit, and can be cached in its compressed state, then LZMA will be a pure win for the service provider, in both data transfer and CPU/diskio terms, as well as increasing the number of pages which can be kept in RAM at a given moment, which can be the dominating factor for many servers. 
Comment 11 Neil Harris 2007-01-11 07:05:45 PST
An example of RTT-bound TCP communication making HTTP speeds much lower than link speeds: 

Hitting the 16 kbytes of http://www.australia.gov.au/index.html (just the HTML without any of its linked content) from my UK 2Mbit ADSL connection typically takes about 1.5 seconds, an effective data rate of 24 kbps which is entirely dominated by TCP slow-start (the 3-way handshake, then four bursts of MTU packets, making up 11 packets in all: one, then two, then four, then the last four, is 4 RTT: this is completely consistent with the earlier estimate of 350ms to Australia)

Note that this scenario would still hold true even if the link speeds were increased greatly: the overhead is entirely in the slow-start.
Comment 12 Neil Harris 2007-01-17 08:54:47 PST
One more thought: since we only need to specify the _decoder_, any of a number of possible encoders could be used, some of which might be fater than there reference implementation: I wonder where the slowdown is in LZMA relative to GZIP: is it in the substring-finding code, or the range encoder, and whether either of those could be made faster?
Comment 13 Neil Harris 2007-01-21 15:33:10 PST
Yet more: it looks like the RTT of TCP connections can be determined using getsockopt(fd,SOL_TCP, TCP_INFO, ...) and looking in the tcpi_rtt field in the returned structure. This could be used to flag long-latency connections at either the web server or client end, allowing the LZMA capability to be turned on or off at either end on a per-connection basis, depending on whether it would give an anticipated advantage.
Comment 14 Michael Buckley 2010-12-12 21:10:31 PST
Some more benchmarks (the last link has rotted): http://stephane.lesimple.fr/wiki/blog/lzop_vs_compress_vs_gzip_vs_bzip2_vs_lzma_vs_lzma2-xz_benchmark_reloaded

Up to 10x compression time and 4x decompress time (on different levels though).  Memory usage can be huge.  Though with some of the options set, I can see use cases for it.
Comment 15 Neil Harris 2010-12-13 12:17:26 PST
After repeating some calculations:

Given the recent huge increases in mobile device data link speeds, and reduction in last-mile RTT, with the release of 3G technologies and the forthcoming availability of VHF/UHF wavebands for mobile use in many countries, I feel this feature offers fewer advantages that I thought for mobiles.

However, for long-RTT intercontinental paths such as might be found in Europe-Australia or US-Africa links, the RTT reduction is still significant, regardless of link speeds, even after allowing for compression time overheads and I believe it's still worth further considering the merits of implementing a decoder for better-than-gzip compression, and then see if it gets used.

If it's not used, it can be taken out, and because of content negotiation, that should not break anything.

Perhaps the experience of the Google/Apache Pagespeed initiative might be relevant here?
Comment 16 Jorge Quiñónez 2011-02-25 22:50:06 PST
Shouldn't the "LZMA" be changed to "XZ" in the title of the Bug? Or am I thinking or confused?
Comment 17 Jorge Quiñónez 2011-02-25 22:51:03 PST
meant to say "Or am I thinking incorrectly or confused?"
Comment 18 Carlo Alberto Ferraris 2011-02-25 23:45:02 PST
(In reply to comment #16)
> Shouldn't the "LZMA" be changed to "XZ" in the title of the Bug? Or am I
> thinking or confused?

XZ is a file format, LZMA is the compression algorithm. Since we're talking about the compression algorithm, LZMA is correct.
Comment 19 Jorge Quiñónez 2011-02-27 19:28:04 PST
Thanks for the correction, Carlo. However, I just read that the XZ file format uses the LZMA2 compression algorithm. So shouldn't the big title be corrected to "LZMA2"?
Comment 20 Carlo Alberto Ferraris 2011-02-27 22:16:23 PST
The SDK is the same, so no big deal. Also, the only difference between LZMA and LZMA2 is, according to wikipedia:

LZMA2 is a modified version of LZMA that offers a better compression ratio for uncompressible data (random data expands about 0.005%, compared to 1.35% with original LZMA), and optionally can compress multiple parts of large files in parallel, greatly increasing compression speed but with a possible reduction in compression ratio.

So basically you just have to tell your webserver not to compress already-compressed files. 

<joking>If you really want better compression, file a bug for PAQ* support</joking>
Comment 21 Justin Lebar (not reading bugmail) 2013-02-28 23:13:16 PST
I think Google just validated [1] that they're willing to trace CPU cycles on encode for a higher compression ratio in at least some circumstances, which argues against Jesse's (extremely old) data in comment 5.

Perhaps we should revisit this bug?  It sounds like it might make a good intern/GSoC project.

[1] http://googledevelopers.blogspot.com/2013/02/compress-data-more-densely-with-zopfli.html
Comment 22 Patrick McManus [:mcmanus] 2013-03-01 09:28:24 PST
(In reply to Justin Lebar [:jlebar] from comment #21)
> I think Google just validated [1] that they're willing to trace CPU cycles
> on encode for a higher compression ratio in at least some circumstances,

A perhaps less charitable reading of that is to say google demonstrated how far they are willing to go to stay compatible with existing formats and definitions..

but mostly I jest.

I'm less concerned about fragmentation than most of my mozillian colleagues and things that can be done with very modest bytes of request headers are generally not very costly - even if they have low usage rates.

The bigger concern is interop problems that will happen with broken intermediaries. To me that's where the risk/reward calculation comes in with adding new Accept-* stuff.

If there were actually a patch for this we could take it behind a pref and actually do some A/B testing around it rather than pre judging it..
Comment 23 Justin Lebar (not reading bugmail) 2013-03-01 09:30:24 PST
> If there were actually a patch for this we could take it behind a pref and actually do 
> some A/B testing around it rather than pre judging it.

How would such an A/B test work?  That is, how would we detect people who were broken by this additional header?
Comment 24 Patrick McManus [:mcmanus] 2013-03-01 09:48:13 PST
(In reply to Justin Lebar [:jlebar] from comment #23)
> > If there were actually a patch for this we could take it behind a pref and actually do 
> > some A/B testing around it rather than pre judging it.
> 
> How would such an A/B test work?  That is, how would we detect people who
> were broken by this additional header?

we'd have to correlate it to some other kind of error rate.. http errors, decode errors, etc.. 

as an aside I would only want to see this as CE; not TE. Yes, I know the difference and yes I wish the world really supported TE, but it doesn't - so let's just keep consistent with how the world already uses CE to mean what TE is supposed to mean and let TE meet its fate.
Comment 25 Neil Harris 2013-03-01 10:31:33 PST
Google's QUIC protocol is another sign that concentrating on the smallest possible data payload is the right way to achieve speed, by reducing the number of packets and round trips: they appear to be using going as far as using FEC to reduce round trips to recover from packet loss. This would be entirely compatible with initiatives such as QUIC and SPDY, as far as I can see.
Comment 26 Luke Deller 2013-04-04 07:38:52 PDT
Created attachment 733317 [details] [diff] [review]
(Part 1: code) Add support for LZMA2 decompression "Content-Encoding: xz" using liblzma
Comment 27 Luke Deller 2013-04-04 07:41:58 PDT
Created attachment 733318 [details] [diff] [review]
(Part 2: build) Add support for LZMA2 decompression "Content-Encoding: xz" using liblzma

I have attached some patches to add support for LZMA2 encoding using liblzma.  This is my first patch for Firefox, and I would be grateful for some feedback.  I have not yet done a test case.
Comment 28 Justin Lebar (not reading bugmail) 2013-04-04 07:44:29 PDT
Comment on attachment 733317 [details] [diff] [review]
(Part 1: code) Add support for LZMA2 decompression "Content-Encoding: xz" using liblzma

Thanks for this patch!

I can tell you that the first thing people are going to ask for is a patch with -p -U8.  You can add the following to your hgrc:

 [diff]
 git = 1
 showfunc = 1
 unified = 8

Before I forget, we will definitely want a runtime pref and some telemetry for this.
Comment 29 Patrick McManus [:mcmanus] 2013-04-04 07:49:32 PDT
could definitely be interesting.

should add it to the accept-encoding header (when pref'd on).

does the library exist (and was your build patch tested with) for all of windows, linux, os x?
Comment 30 Luke Deller 2013-04-04 07:56:02 PDT
Created attachment 733325 [details] [diff] [review]
patch with adjusted diff settings (Part 1: code)

adjusting hg diff settings as per Justin's comment
Comment 31 Luke Deller 2013-04-04 07:57:18 PDT
Created attachment 733327 [details] [diff] [review]
patch with adjusted diff settings (Part 2: build)
Comment 32 Luke Deller 2013-04-04 08:08:29 PDT
@Justin re a runtime pref, does the network.http.accept-encoding pref suffice?  It does not disable any code, but it should prevent servers from giving us xz content if the server is set up right.  See that my patch (part 2) adjusts the default for that pref when built with liblzma.

BTW for testing I have used Apache's "multiviews" feature, along the lines presented here: http://www.johnhawthorn.com/2009/10/gzip-compression-in-apache-through-multiviews/
Comment 33 Justin Lebar (not reading bugmail) 2013-04-04 08:10:08 PDT
Luke, the way we usually deal with dependencies such as liblzma is to import them wholesale into the tree.  It looks like liblzma is in the public domain and supports all the platforms we care about?  They don't explicitly say they support Mac+Clang, but they support Linux+Clang, so that's probably not an issue.

http://tukaani.org/xz/
Comment 34 Justin Lebar (not reading bugmail) 2013-04-04 08:12:54 PDT
> @Justin re a runtime pref, does the network.http.accept-encoding pref suffice?

I'd probably want a pref that prevents this code from running under any circumstances, so that it's easier to disable this code when it hits the field if it has e.g. a security vulnerability.  But that's really a minor point.
Comment 35 Luke Deller 2013-04-06 06:20:19 PDT
@Patrick the liblzma library exists on all of Windows, Linux and OS X, however I have only tested on Linux to date.  I can set myself up with building Firefox on Windows too, but will need help from others to test on Mac OS X.

I had a look at importing liblzma source into the mozilla tree per Justin's comment 33, and found two hurdles:

1) liblzma is one of several build products of the "XZ Utils" source package.  It seems a bit awkward to include the whole package, but extricating liblzma source only looked like a nontrivial hack.

2) XZ Utils must be built with gcc/mingw on Windows - msvc is not supported (though the resulting library binary can be used with msvc).

The same author provides an "XZ Embedded" package which looks like it could be easily imported into the mozilla source tree.  It has a different API and supports a subset of features (eg no SHA256 integrity check).  I'm going to have a shot at an alternative patch which uses this.
Comment 36 Luke Deller 2013-04-13 06:49:06 PDT
Created attachment 737130 [details] [diff] [review]
patch (part 0: import XZ embedded)
Comment 37 Luke Deller 2013-04-13 06:49:40 PDT
Created attachment 737131 [details] [diff] [review]
patch (part 1: code)
Comment 38 Luke Deller 2013-04-13 06:50:27 PDT
Created attachment 737132 [details] [diff] [review]
patch (part 2: build)
Comment 39 Luke Deller 2013-04-13 06:58:06 PDT
Updated patches:
- tested on both Windows (MSVC10) and Linux (Mint 14/amd64/gcc-4.7.2)
- by default uses the XZ Embedded library imported into the mozilla source tree, but if configured with --with-system-lzma then it will use the system's liblzma library instead
- adds two new prefs: "converter.xz.disabled" and "converter.xz.memory_limit_mb"
Comment 40 Luke Deller 2013-04-14 04:09:15 PDT
Created attachment 737233 [details] [diff] [review]
patch (part 3: test)

add a unit test (based on the existing test for Content-Encoding: gzip)
Comment 41 Jason Duell [:jduell] (needinfo? me) 2013-05-07 16:53:51 PDT
Comment on attachment 737131 [details] [diff] [review]
patch (part 1: code)

Review of attachment 737131 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/streamconv/converters/nsHTTPCompressConv.h
@@ +2,5 @@
>  /* This Source Code Form is subject to the terms of the Mozilla Public
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
> +#if !defined (__nsHTTPXzConv__h__)

drive by comment: did you mean to modify nsHttpCompressConv.h like this, or should you be adding a new nsHttpXzConv.h file here?  I'm guessing the latter.  I don't think this patch will compile as-is.
Comment 42 Luke Deller 2013-05-07 23:46:55 PDT
(In reply to Jason Duell (:jduell) from comment #41)
> drive by comment: did you mean to modify nsHttpCompressConv.h like this, or
> should you be adding a new nsHttpXzConv.h file here?  I'm guessing the
> latter.  I don't think this patch will compile as-is.

Hi Jason.  The patch looks like this because I started with "hg copy nsHttpCompressConv.h nsHttpXzConv.h", to retain history of lines I did not touch.  It applies for me using "hg patch" which does not modify nsHttpCompressConv.h.  See that the patch for that file starts with the following:

> diff --git a/netwerk/streamconv/converters/nsHTTPCompressConv.h b/netwerk/streamconv/converters/nsHTTPXzConv.h
> copy from netwerk/streamconv/converters/nsHTTPCompressConv.h
> copy to netwerk/streamconv/converters/nsHTTPXzConv.h
> --- a/netwerk/streamconv/converters/nsHTTPCompressConv.h
> +++ b/netwerk/streamconv/converters/nsHTTPXzConv.h
Comment 43 Patrick McManus [:mcmanus] 2013-05-15 09:46:31 PDT
(In reply to Neil Harris from comment #0)

> * To reiterate: 30%+ speedup!

a 30% reduction in bytes is not a 30% speedup. Also, to be fair, this only applies to resources that are compressible with this scheme and the bulk of a pages byte count are in images which are not applicable here. Indeed less than 30% of the page is generally text to start with.

That being said, I'm interested in this. I'm not especially concerned about format proliferation because this is a transparent transport issue. If it doesn't work out well we can simply remove the support.

Are you planning on offering apache patches to that project too? Adoption by something like mod_pagespeed would make this very interesting.
Comment 44 Patrick McManus [:mcmanus] 2013-05-15 09:46:46 PDT
Comment on attachment 737130 [details] [diff] [review]
patch (part 0: import XZ embedded)

Review of attachment 737130 [details] [diff] [review]:
-----------------------------------------------------------------

Its a fairly short import - that's good :)

you'll want to get gerv involved for licensing

and maybe ted too to decide on where it should be from a build perspective.

it would be good if you could measure how many bytes this adds to an opt windows build... both for the download and code size.
Comment 45 Patrick McManus [:mcmanus] 2013-05-15 09:47:05 PDT
Comment on attachment 737131 [details] [diff] [review]
patch (part 1: code)

splinter won't let me provide inline comments to this patch, so I apologize for the confusing format of this patch feedback.

CreateNewHTTPXzConvFactory () - I know that's copy and pasted, but we can fix some stuff up here.. use NS_ENSURE_ARGS instead of checking !aResult explicitly.. you've also got trailing whitespace in here. also "inst" can be managed with a nsRefPtr instead of as plain pointer.. that will help you plug the leak you've got right now in the QI-fails error path. (just replace RELEASE(inst) with inst.forget())

you change nsNetStartup to fail if nsHttpXzConv::Init() fails. Can we be more robust here? I'd rather networking could continue without xz :)

For the new files nsHttpXzConv.[cpp/h] I would prefer
 * 2 space indents
 * sorted include directives (unless something has a dep)
 * name the file Foo instead of nsFoo. Same thing for the classes
 * put everything in the namespace mozilla::net

(you used old code as a template and the current conventions have changed without updating all the old code)

can you make sure you need all those includes?

you do an hg cp to create nsHttpXzConv.h - don't do that in this case. Just use regular cp and hg add it.

I don't care to support system LZMA (MOZ_NATIVE_LZMA) if we have imported this code into the tree - that's too confusing and hurts testing by increasing diversity. (it also hurts rapid release). Just use the one embedded and drop the macros and build support for system versions.

ironically mInitialised is unitialised )

I'm trying to understand why you have the mLazyInitLock and relatedly why the prefobserver is defined to be THREADSAFE_ISUPPORTS (instead of regular ISUPPORTS). When can this code called from a non-main thread?

Your handling of the disabled pref doesn't seem sufficient. We'll just throw errors. We need to remove it from the Accept header request line when its disabled.

I think you leak smPrefObserver. (which should be named sPrefObserver)

put the nsHttpConv::Init() method after the ctor and dtor

make mListener a nsCOMPTr and then you don't need to init, addref and release it manually

You don't do anything with mSyncConvContext except hold a reference to it. I think its fine to delete that var entirely.

mInpBuffer and mOutBuffer can be converted to nsAutoArrayPtr<unsigned char> to simplify their management

I hate to make comments about syntax, but instead of 
if (foo)
{
you should do "if (foo) {" (one line) to stay consistent with new necko code.. there is a lot of this in ondataavailable()

the coding guidelines also say to not do "if (foo == NULL)" just say "if (!foo)".. also if you do need to say NULL, say nullptr instead.

use c++ casts instead of (unsigned char *)

I would probably set a minimum size for the first allocation instead of letting streamlen be the floor.. that's because you try and reuse that same buffer in future ODA calls. 8KB maybe.

not checking the return value of iStr->Read() is surely a mistake

"(uint64_t)smPrefObserver->mXzMemoryLimitMB*1048576;" needs whitespace

use std::min instead of XPCOM_MIN

I don't understand why the handling of LZMA_BUF_ERROR is the same as LZMA_OK

instead of  __nsHTTPXzConv__h__ please use mozilla_net_HttpXzConv_h
Comment 46 Patrick McManus [:mcmanus] 2013-05-15 09:47:22 PDT
Comment on attachment 737132 [details] [diff] [review]
patch (part 2: build)

Review of attachment 737132 [details] [diff] [review]:
-----------------------------------------------------------------

you need somebody like ted to help with this patch.. I think a compile disable switch is good - but we shouldn't support a system copy of the library in that case.

::: modules/libpref/src/init/all.js
@@ +927,5 @@
>  
>  // Enable http compression: comment this out in case of problems with 1.1
>  // NOTE: support for "compress" has been disabled per bug 196406.
>  // NOTE: separate values with comma+space (", "): see bug 576033
> +pref("network.http.accept-encoding", "xz, gzip, deflate");

this change should be part of the streamconv patch not the build system one

also, make xz last in the list.

::: netwerk/streamconv/converters/Makefile.in
@@ +53,5 @@
>  include $(topsrcdir)/config/rules.mk
>  
>  DEFINES += -DIMPL_NS_NET
> +
> +CFLAGS += -O0

I'm guessing this isn't the right place to do this
Comment 47 Patrick McManus [:mcmanus] 2013-05-15 09:47:47 PDT
Comment on attachment 737233 [details] [diff] [review]
patch (part 3: test)

Review of attachment 737233 [details] [diff] [review]:
-----------------------------------------------------------------

I really appreciate the test

don't do the hg cp thing here.

it would be good if the test set the prefs you rely on explicitly.
Comment 48 Luke Deller 2013-05-17 00:35:36 PDT
(In reply to Patrick McManus [:mcmanus] from comment #44)
> it would be good if you could measure how many bytes this adds to an opt
> windows build... both for the download and code size.

I did an optimized Windows build from today's hg tip with no .mozconfig, before and after my patches (all of them, not just the "import XZ embedded" patch, otherwise the linker would optimize it out).

Before patches:
- xul.dll is 19,309,056 bytes
- firefox-24.0a1.en-US.win32.installer.exe is 20,951,581 bytes
After patches:
- xul.dll is 19,317,760 bytes
- firefox-24.0a1.en-US.win32.installer.exe is 20,955,469 bytes
Comment 49 Luke Deller 2013-05-17 04:26:35 PDT
(In reply to Patrick McManus [:mcmanus] from comment #45)
> Comment on attachment 737131 [details] [diff] [review]
> patch (part 1: code)

Thanks Patrick for your review.  I'm working through adjustments accordingly.  Just a couple that need conversation:

> I'm trying to understand why you have the mLazyInitLock and relatedly why
> the prefobserver is defined to be THREADSAFE_ISUPPORTS (instead of regular
> ISUPPORTS). When can this code called from a non-main thread?

mLazyInitLock is needed because the pref observer is lazily initialized (registered with the pref service) the first time an nsHTTPXzConv is constructed, which I expect can happen in parallel if there are concurrent HTTP requests both requiring xz decoding.

I would have liked to do this initialisation non-lazily, at the point when nsHTTPXzConv::Init is called from nsNetStartup.  However that proved to be too early: the pref service intialisation indirectly calls nsNetStartup (when loading its config file), so the pref service is not yet available to us at that point.

As for why this class is THREADSAFE_ISUPPORTS, that is me following other pref observers that I saw in the netwerk module such as nsHttpHandler, nsDNSService and nsFtpProtocolHandler - but in retrospect they all support other interfaces too, so if in fact the pref updates do not happen concurrently then we can lose the thread safety.

> Your handling of the disabled pref doesn't seem sufficient. We'll just throw
> errors. We need to remove it from the Accept header request line when its
> disabled.

There was already a separate pref to remove it from the Accept header request line: network.http.accept-encoding.

I added another pref here in response to comment 34 above.  Should I make this new pref also trigger an update to network.http.accept-encoding, or leave them separately controllable?

(I would have preferred this pref to be able to remove the xz filter from the stream converter service entirely, but looking at netwerk/streamconv/src/nsStreamConverterService.cpp it seemed that there is not a mechanism to do that currently)
Comment 50 Patrick McManus [:mcmanus] 2013-05-20 06:47:00 PDT
(In reply to Luke Deller from comment #48)

> Before patches:
> - xul.dll is 19,309,056 bytes
> - firefox-24.0a1.en-US.win32.installer.exe is 20,951,581 bytes
> After patches:
> - xul.dll is 19,317,760 bytes
> - firefox-24.0a1.en-US.win32.installer.exe is 20,955,469 bytes

great!
Comment 51 Patrick McManus [:mcmanus] 2013-05-20 06:50:56 PDT
(In reply to Luke Deller from comment #49)
> constructed, which I expect can happen in parallel if there are concurrent
> HTTP requests both requiring xz decoding.

all the decode happens serialized on the main thread. (the requests can be interleaved on the main thread, but the decoder won't actually run in parallel). So you can just MOZ_ASSERT(NS_IsMainThread())

> > Your handling of the disabled pref doesn't seem sufficient. We'll just throw
> > errors. We need to remove it from the Accept header request line when its
> > disabled.
> 
> There was already a separate pref to remove it from the Accept header
> request line: network.http.accept-encoding.

Right.. I think if the disabled pref is set that it should scrub the accept-encoding line. I don't actually care for the fact that we expose the accept header as a pref at all but its kind of too late to change that (addons, etc..) given its not a huge problem.


> (I would have preferred this pref to be able to remove the xz filter from
> the stream converter service entirely, but looking at
> netwerk/streamconv/src/nsStreamConverterService.cpp it seemed that there is
> not a mechanism to do that currently)

you'll need to do it over in httphandler.
Comment 52 Luke Deller 2013-05-27 07:37:32 PDT
Created attachment 754466 [details] [diff] [review]
patch (part 0: import XZ embedded)

minor adjustment to xz_config.h for MSVC boolean from upstream
Comment 53 Luke Deller 2013-05-27 07:59:05 PDT
Created attachment 754472 [details] [diff] [review]
patch (part 1: code)

Updated to address feedback.  Initialisation needed to be reworked: to update the "network.http.accept-encoding" pref in response to changes in the "converter.xz.disabled" pref requires our pref observer to be registered at app startup rather than just the first time someone uses xz encoding.  I put an initialisation call from nsHttpHandler::Init, is that appropriate?
Comment 54 Luke Deller 2013-05-27 08:03:34 PDT
Created attachment 754475 [details] [diff] [review]
patch (part 2: build)

removed support for system liblzma as per feedback
Comment 55 Luke Deller 2013-05-27 08:08:48 PDT
Created attachment 754476 [details] [diff] [review]
patch (part 3: test)

address feedback: don't do hg cp; explicitly set prefs
Comment 56 Gervase Markham [:gerv] 2013-05-28 04:00:05 PDT
Comment on attachment 754466 [details] [diff] [review]
patch (part 0: import XZ embedded)

r=gerv; no additional action needed for code which has been placed in the public domain/given universal permissions using an appropriate notice, which this is.

Gerv
Comment 57 Neil Harris 2013-05-28 05:40:59 PDT
(In reply to Patrick McManus [:mcmanus] from comment #43)
> (In reply to Neil Harris from comment #0)
> 
> > * To reiterate: 30%+ speedup!
> 
> a 30% reduction in bytes is not a 30% speedup. Also, to be fair, this only
> applies to resources that are compressible with this scheme and the bulk of
> a pages byte count are in images which are not applicable here. Indeed less
> than 30% of the page is generally text to start with.
> 

Agreed -- the majority of bytes associated with a page are usually compressed images and fonts, and better text compression won't help with these.

My point here is that the arrival of the initial HTML page content, and to a lesser extent stylesheets and javascript, is the critical path for the first flash of page content on a new page, which is one of the most salient features for user perception of site responsiveness, and these resources are all compressable by LZMA: already-compressed fonts, images and so on can be loaded lazily. 

Better compression of text resource will be even more effective when moving from one page to another on a site where fonts, stylesheets, scripts, site-wide images etc. are already in the cache, and the HTML content and per-page images are the only thing that changes from page to page.

LZMA transfer may also potentially be able to speed up loading of text-encoded dynamic content like JSON resources.
Comment 58 Arco Santosini 2013-05-28 13:29:46 PDT
I mentioned this patch in wikipedia (http://en.wikipedia.org/wiki/HTTP_compression#Content-coding_tokens): lzma[citation needed] - Firefox and Gecko will be supporting LZMA compression, this is particularly interesting for smartphones and tablet where bandwidth is limited: LZMA has a very high compression ratio compared to gzip (patch discussed in [1]).

I think it is important to make this patch known around so that http servers and other browsers might find interest in supporting this compression algorithm.
Comment 59 Justin Lebar (not reading bugmail) 2013-05-28 13:34:27 PDT
> Firefox and Gecko will be supporting LZMA compression

Although I'm excited about LZMA, I don't think we have committed to anything at this point.
Comment 60 Justin Lebar (not reading bugmail) 2013-05-28 13:35:05 PDT
I'd edit the wikipedia article myself, but I think that would count as original research.  :)
Comment 61 Patrick McManus [:mcmanus] 2013-05-28 13:57:13 PDT
(In reply to Arco Santosini from comment #58)
> I mentioned this patch in wikipedia
> (http://en.wikipedia.org/wiki/HTTP_compression#Content-coding_tokens):
> lzma[citation needed] - Firefox and Gecko will be supporting LZMA
>

you really shouldn't say that until it is landed.
Comment 62 Luke Deller 2013-05-30 06:06:00 PDT
I have posted some notes about using this patch onto a wiki page, including how to set up Apache to serve LZMA2 compressed static content:
https://wiki.mozilla.org/LZMA2_Compression
Comment 63 Ted Mielczarek [:ted.mielczarek] 2013-05-30 12:02:55 PDT
Comment on attachment 754475 [details] [diff] [review]
patch (part 2: build)

Review of attachment 754475 [details] [diff] [review]:
-----------------------------------------------------------------

::: configure.in
@@ +4064,5 @@
> +    MOZ_XZ=1)
> +
> +if test "$MOZ_XZ" = 1; then
> +    AC_DEFINE(MOZ_XZ)
> +fi

There isn't a system libxz available on Linux, right?

::: modules/xz-embedded/Makefile.in
@@ +1,4 @@
> +#
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.

I'm not wild about this being in modules/, but I guess that's where libbz2 and zlib are so there isn't a better place to put it.

::: modules/xz-embedded/objs.mk
@@ +8,5 @@
> +		xz_dec_lzma2.c \
> +		xz_dec_stream.c \
> +		$(NULL)
> +
> +MODULES_XZEMBEDDED_SRC_CSRCS := $(addprefix $(topsrcdir)/modules/xz-embedded/, $(MODULES_XZEMBEDDED_SRC_LCSRCS))

This is unnecessary unless this code is going to get built from some other directory (which it shouldn't). Just define these all directly in the Makefile.in and get rid of this objs.mk.

::: netwerk/streamconv/converters/Makefile.in
@@ +44,5 @@
>  
> +ifdef MOZ_XZ
> +CPPSRCS += \
> +		HTTPXzConv.cpp \
> +		$(NULL)

You'll need to move this to moz.build now.
Comment 64 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2013-05-30 12:08:44 PDT
I don't have a strong opinion about whether this is a good idea, but we should be aware of the additional attack surface that parsing/decompressing LZMA adds. Even zlib has had exploitable bugs relatively recently. If we want to do this then we should make sure we have that risk under control. I'm not sure what that means (fuzzing? just verifying that the upstream has a good security story).
Comment 65 Luke Deller 2013-05-30 19:47:15 PDT
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #63)

Thanks for looking at this Ted

> There isn't a system libxz available on Linux, right?

Not for "XZ Embedded" - its README explicitly warns against making a shared library, as it does not commit to any API/ABI stability across versions.

There is a system liblzma available on Linux that we could use there.  It is part of the "XZ Utils" source package that also provides the "xz" command line utility.  This is from the same upstream maintainer as "XZ Embedded".

The earlier version of my patch had a --with-system-lzma configure option to use this, and some #define/#ifdefs in the source to gloss over slight differences in the API between liblzma and XZ Embedded.  Patrick's feedback was to remove this:

(Patrick McManus from comment #45)
> I don't care to support system LZMA (MOZ_NATIVE_LZMA) if we have imported this
> code into the tree - that's too confusing and hurts testing by increasing
> diversity. (it also hurts rapid release). Just use the one embedded and drop
> the macros and build support for system versions.
Comment 66 Ted Mielczarek [:ted.mielczarek] 2013-05-31 05:01:31 PDT
Okay, thanks for the info. This is just the sort of thing that Linux distro maintainers will inevitably ask for. If we're committing to only supporting XZ embedded then it doesn't sound like an issue.
Comment 67 Luke Deller 2013-05-31 07:13:02 PDT
Created attachment 756558 [details] [diff] [review]
patch (part 2: build)

updated build patch to address Ted's feedback
Comment 68 Ted Mielczarek [:ted.mielczarek] 2013-05-31 10:01:33 PDT
Comment on attachment 756558 [details] [diff] [review]
patch (part 2: build)

Review of attachment 756558 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/xz-embedded/Makefile.in
@@ +14,5 @@
> +		xz_crc32.c \
> +		xz_crc64.c \
> +		xz_dec_lzma2.c \
> +		xz_dec_stream.c \
> +		$(NULL)

Please use two-space indent, not tabs. Also note that you'll be racing to land with bug 870406, so you may have to move this to moz.build if you lose that race.

@@ +18,5 @@
> +		$(NULL)
> +
> +LIBRARY_NAME	= mozxz
> +
> +EXPORTS		= xz.h

If you're only going to use this in one place you could leave this out and just use "LOCAL_INCLUDES += $(topsrcdir)/modules/xz-embedded" in the Makefile where you're using it.

::: netwerk/streamconv/converters/moz.build
@@ +29,5 @@
>      ]
> +
> +if CONFIG['MOZ_XZ']:
> +    CPP_SOURCES += [
> +        'HTTPXzConv.cpp',

Make sure that either this patch gets merged with the other one, or this diff hunk gets moved to the other patch before landing, since this patch clearly won't build on its own (if a bug is landed as multiple changesets, they should all be individually buildable).
Comment 69 Daniel Veditz [:dveditz] 2013-05-31 14:46:00 PDT
yes, we should do a security review of this feature to determine how much upstream testing there's been and what fuzzing we want to do on our end. 

If we're importing the code I guess that means we need to add LZMA to the list of things we watch for security updates at the very least.

> +pref("converter.xz.disabled", false);

Our code prefers "enabled" prefs over "disabled" by about 4 to 1, most of us find it slightly more readable. Obviously a sizeable minority of 20% disagree so I only offer it as a suggestion.

Another name nit: shouldn't this be "network.something", probably even "network.http.something"? I'll defer to the network peers on that but this functionality seems very unlike the "converter.html2txt.*" prefs that currently exist.
Comment 70 Patrick McManus [:mcmanus] 2013-07-03 08:22:24 PDT
Comment on attachment 754472 [details] [diff] [review]
patch (part 1: code)

Review of attachment 754472 [details] [diff] [review]:
-----------------------------------------------------------------

I apologize for holding this back - I've just been busy.

This is making real progress - thanks Luke!

The primary issues are couple cases of reference counting and dealing with the defaults in case the prefs aren't set the way you think they are. (prefs vary by product). I also agree with a previous commenter that the pref should be for "enabled" rather than "disabled"

There are a lot of formatting issues, I only mentioned a few.. chiefly, please review for whitespace at the end of lines and functions should look like
foo::bar(uint32_t apple,
        uint32_t pear)
{}

instead of
foo::bar(uint32_t apple,
  uint32_t pear)
{}

More substantively, to move forward with this we need some kind of partner on the server side.. it doesn't need to be a big thing - it can just be a patch included in some other open source project... or maybe even a pledge to do so. I suggested before that you approach the mod_pagespeed folks and gague their interest - have you done that or something else? I don't want to include code that doesn't have a dance partner and can't find one either.

::: netwerk/streamconv/converters/HTTPXzConv.cpp
@@ +20,5 @@
> +namespace net {
> +
> +// nsISupports implementation
> +NS_IMPL_THREADSAFE_ISUPPORTS3(HTTPXzConv,
> +  nsIStreamConverter,

indentation

@@ +64,5 @@
> +{
> +  MOZ_ASSERT(NS_IsMainThread()); // otherwise we may have a race condition
> +
> +  nsresult rv = Preferences::AddStrongObserver(this, kDisabledPrefName);
> +  NS_ENSURE_SUCCESS_VOID(rv); 

whitespace

@@ +77,5 @@
> +  sPrefObserver = nullptr;
> +}
> +
> +nsresult
> +HTTPXzConvPrefObserver::Observe(nsISupports* subject, const char* topic,

its a minor problem that the a-e header only gets rewritten on an observation.. it could actually start out in a way that doesn't match your enabled state - so you probably need to
make sure to do this rewrite at least once. (i.e different products use different pref files, etc..) The c++ code should assume its is disabled unless it sees a pref that enables it (the opposite of what it is now)

@@ +78,5 @@
> +}
> +
> +nsresult
> +HTTPXzConvPrefObserver::Observe(nsISupports* subject, const char* topic,
> +  const PRUnichar* data)

indentation

@@ +97,5 @@
> +  // value we send for the Accept-Encoding: request header accordingly.
> +  // First get the existing value from "network.http.accept-encoding" pref:
> +  nsAutoCString acceptEncoding;
> +  nsresult rv = Preferences::GetCString(kAcceptEncodingPrefName,
> +    &acceptEncoding);

indentation

@@ +163,5 @@
> +}
> +
> +HTTPXzConv::~HTTPXzConv()
> +{
> +  NS_IF_RELEASE(mListener);

this is a smart pointer so manual reference count changes are suspicious

@@ +175,5 @@
> +{
> +  if (!sPrefObserver) {
> +    sPrefObserver = new HTTPXzConvPrefObserver();
> +  }
> +  nsCOMPtr<nsIObserver> prefObserver = sPrefObserver; // this adds a ref

can't you just get rid of the global sPrefObserver and replace it with mPrefObserver = new  HTTPXzConvPrefObserver(); and then use mPrefObserver in its place.. that also lets you get rid of the already_AddRefed return value and the forget()

@@ +180,5 @@
> +  return prefObserver.forget();
> +}
> +
> +NS_IMETHODIMP
> +HTTPXzConv::AsyncConvertData(const char* aFromType, 

trailing white space on all these argument and indentation

@@ +191,5 @@
> +    return NS_ERROR_ABORT; // XZ decoding disabled by pref
> +
> +  // hook ourself up with the receiving listener. 
> +  mListener = aListener;
> +  NS_ADDREF(mListener);

mlistener is a smart pointer.. why are you manually adding another ref?
Comment 71 Luke Deller 2013-07-22 04:56:57 PDT
Thanks for getting to this Patrick.  I have been on vacation for a couple of weeks, back now and looking at addressing your comments.  On a couple of your points:

(In reply to Patrick McManus [:mcmanus] from comment #70)
> More substantively, to move forward with this we need some kind of partner
> on the server side.. it doesn't need to be a big thing - it can just be a
> patch included in some other open source project... or maybe even a pledge
> to do so. I suggested before that you approach the mod_pagespeed folks and
> gague their interest - have you done that or something else? I don't want to
> include code that doesn't have a dance partner and can't find one either.

Does Apache's existing content negotiation feature suffice for this?  This is what I have been using for testing.  I put some notes on how to set this up on the wiki page mentioned above: https://wiki.mozilla.org/LZMA2_Compression

I was a bit shy of approaching the mod_pagespeed people as I wasn't sure whether this functionality would fit better into mod_deflate (or maybe a new module similar to mod_deflate).  I can pursue dynamic compression further if you think it is worthwhile at this stage.

> @@ +175,5 @@
> > +{
> > +  if (!sPrefObserver) {
> > +    sPrefObserver = new HTTPXzConvPrefObserver();
> > +  }
> > +  nsCOMPtr<nsIObserver> prefObserver = sPrefObserver; // this adds a ref
> 
> can't you just get rid of the global sPrefObserver and replace it with
> mPrefObserver = new  HTTPXzConvPrefObserver(); and then use mPrefObserver in
> its place.. that also lets you get rid of the already_AddRefed return value
> and the forget()

We need this preference observer to remain alive throughout the lifetime of the application, so that any changes to the disabled/enabled pref will result in the accept-encoding pref being updated.  The observer will be constructed when the static method containing this code (HTTPXzConv::InitPrefObserver) is called from nsHttpHandler.cpp at startup, and it is destroyed when the preference service releases its reference to it at shutdown.  The member variable mPrefObserver in the HTTPXzConv class is just me being defensive: the HTTPXzConv instance holds a reference to the observer so that it cannot be destroyed while the converter is alive.  Would you rather drop mPrefObserver?  Otherwise do you think this approach is reasonable?
Comment 72 Alex Xu 2013-08-15 19:00:30 PDT
Created attachment 791070 [details] [diff] [review]
patch (part 1: code) + whitespace fixes

I believe I have fixed all of the whitespace issues.

I understand that typically in Mozilla culture, patches are usually not changed by people other than the original submitter.

However, I'd like to poke this into getting moving again. :)

I hope I have not offended.
Comment 73 Jonathan Kew (:jfkthame) 2013-09-10 22:53:02 PDT
Before we decide to press ahead and deploy this, I think we should also consider the new Brotli format under development by Google's data-compression team. See my post in dev.platform for further information.[1]

ISTM that Brotli might be a better choice overall, given the substantially faster decompression rates it offers.


[1] https://groups.google.com/forum/#!topic/mozilla.dev.platform/CBhSPWs3HS8
Comment 74 (dormant account) 2013-09-18 13:40:56 PDT
(In reply to Jonathan Kew (:jfkthame) from comment #73)
> ISTM that Brotli might be a better choice overall, given the substantially
> faster decompression rates it offers.

I think we should land this on development channels and hold it back from release while we discuss it so people can experiment to see pro/cons of lzma+alternatives.
Comment 75 Tim Ruehsen 2014-01-02 03:04:51 PST
Wikipedia http://en.wikipedia.org/wiki/HTTP_compression still talks about 'lzma' token, but as I have seen in the current patch files, it should be 'xz'. What should the article say / promote ?

Just to mention:
I experimentally implemented xz / lzma for the client tool Mget (https://github.com/rockdaboot/mget).
Comment 76 Luke Deller 2014-01-14 05:08:32 PST
Created attachment 8359763 [details] [diff] [review]
patch (part 1: code)

update to work with current hg tip, incorporate feedback from Patrick and whitespace fixes from Alex (thanks!)
Comment 77 Luke Deller 2014-01-14 05:16:26 PST
Created attachment 8359769 [details] [diff] [review]
patch (part 2: build)

update to work with current hg tip, including that we lost the race to land with bug 870406 that Ted warned of :-)
Comment 78 Luke Deller 2014-01-14 05:41:40 PST
Created attachment 8359780 [details] [diff] [review]
patch (part 3: test)

update to work with current hg tip
Comment 79 Ted Mielczarek [:ted.mielczarek] 2014-01-14 06:21:15 PST
Comment on attachment 8359769 [details] [diff] [review]
patch (part 2: build)

Review of attachment 8359769 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/xz-embedded/moz.build
@@ +4,5 @@
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +EXPORTS += [
> +    'xz.h',

As I said before, if you're only using this header in one place it might be better to just use LOCAL_INCLUDES in that place instead.
Comment 80 Patrick McManus [:mcmanus] 2014-01-14 06:33:07 PST
to move forward here we're going to need some kind of server side support that indicates someone really wants to run this on the web. Being able to configure apache to do so, but having no real evidence of anyone willing to do so isn't really good enough to add a new format to the web. We'll want to be able to evaluate the results.

I'm going to clear the review flags until there is evidence of that. But don't take this the wrong way - I'd like to give it a try.
Comment 81 Luke Deller 2014-01-15 03:08:37 PST
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #79)
> As I said before, if you're only using this header in one place it might be
> better to just use LOCAL_INCLUDES in that place instead.

Oh I meant to run this by you: I found that I would need to add LOCAL_INCLUDES to two different moz.build files so does your preference remain the same in this case?

(xz.h is included into HTTPXzConv.h which is included into both netwerk/streamconv/converters/HTTPXzConv.cpp and netwerk/protocol/http/nsHttpHandler.cpp, so I would need to add LOCAL_INCLUDES to both netwerk/streamconv/converters/moz.build and netwerk/protocol/http/moz.build)
Comment 82 Luke Deller 2014-01-15 03:22:26 PST
(In reply to Patrick McManus [:mcmanus] from comment #80)
> to move forward here we're going to need some kind of server side support
> that indicates someone really wants to run this on the web. Being able to
> configure apache to do so, but having no real evidence of anyone willing to
> do so isn't really good enough to add a new format to the web. We'll want to
> be able to evaluate the results.

Ok - I had been hoping that the existing support for this in Apache via content negotiation would suffice initially, but wasn't sure of how you felt about that, so thanks for clarifying and I will pursue something more as you suggest.
Comment 83 Ted Mielczarek [:ted.mielczarek] 2014-01-15 11:26:09 PST
(In reply to Luke Deller from comment #81)
> Oh I meant to run this by you: I found that I would need to add
> LOCAL_INCLUDES to two different moz.build files so does your preference
> remain the same in this case?

I guess do whatever's easiest. Traditionally EXPORTS was "public headers", but that distinction is fuzzy nowadays.
Comment 84 James Willcox (:snorp) (jwillcox@mozilla.com) 2014-07-03 14:00:20 PDT
Patrick, can we get the ball rolling on this again? I would like to use this with the proxy server.

Luke, are you interested in working on this again? The patches mostly seem to apply and work. I just needed a couple tweaks in SpdyStream so that we actually send 'accept-encoding' headers. In SPDY, it's assumed you have gzip and deflate available.
Comment 85 James Willcox (:snorp) (jwillcox@mozilla.com) 2014-07-03 14:01:42 PDT
A git branch with the relevant patches applied on top of a recent gecko is here: https://github.com/snorp/gecko-dev/tree/xz
Comment 86 Patrick McManus [:mcmanus] 2014-07-03 18:15:42 PDT
snorp - I'd like you to schedule a sec-review if you want to champion this. I'm just worried about the attack surface tradeoof
Comment 87 Cristian 2014-07-10 07:37:52 PDT
(In reply to Patrick McManus [:mcmanus] from comment #80)
> to move forward here we're going to need some kind of server side support
> that indicates someone really wants to run this on the web. Being able to
> configure apache to do so, but having no real evidence of anyone willing to
> do so isn't really good enough to add a new format to the web. We'll want to
> be able to evaluate the results.
> 
> I'm going to clear the review flags until there is evidence of that. But
> don't take this the wrong way - I'd like to give it a try.

I think a viable way to get this implemented in a web server is to create a LZMA version of nginx's ngx_http_gzip_static_module. This way, web authors who already use build scripts (e.g. grunt-contrib-compress) to generate .gz files of their static files can add a build step to generate LZMA files. This would enable LZMA for static assets (CSS, JS, web fonts etc).
Comment 88 ashumeow 2014-07-30 00:14:39 PDT
I read this link:- https://wiki.mozilla.org/LZMA2_Compression
I found that cool Mozillians are into "xz embedded" in LZMA. The last version "xz" thingy was stable since 2009, which means that there was no update for last 5 years. LZMA/LZMA2 can be recommended, i also recommend it since i am trying to propose an improve version of LZMA.
And LZMA is good for HTTP encoding, since it has good compression power. (^_^)
Comment 89 ashumeow 2014-07-30 07:07:21 PDT
I have to disagree with the last disadvantage of yours.
I have done benchmark testing for LZMA using an existing tool which i am working for my college project, You can find my results here:- http://geekresearchlab.net/mtechproject/index.php?controller=post&action=view&id_post=16
The memory can have higher compression. (^_^) Look at the CPU utilization power too.
The actually disadvantage is:- The "speed" of compression is not satisfactory.

(In reply to Neil Harris from comment #0)
> Adding LZMA as a transfer-encoding method for HTTP offers the prospect of 30
> - 40% greater compression for page downloads, compared to existing gzip
> compression, and hence could produce significantly increased page-load speed
> for mobile devices with restricted bandwidth. This would also benefit users
> in developing countries, where high-speed access is expensive or
> unavailable: for example, for the version of Firefox in the OLPC laptop.
> 
> Example:
> 
> On three representative web pages downloaded from the web, I got the
> following results (figure represent the total length of the three files when
> each compressed individually)
> 
> Uncompressed: 201185 bytes, 100%
> gzip:          60911 bytes,  30% of original
> lzma:          44111 bytes,  21% of original, a factor of 1.38 less than gzip
> 
> Thus, the extra compression offers a 38% increase in download speed relative
> to gzip encoding, or, to put it another way, could offer a 28% reduction in
> page download time. Since LZMA, like gzip, is a streaming compression
> technique, these increases would apply to incremental as well as total page
> load time.
> 
> Advantages:
> * LZMA offers a significant speedup for bandwidth-restricted devices, where
> page load speed is most critical
> * Would only affect the HTTP transfer code, would be completely transparent
> to every other part of the application
> * HTTP Accept-transfer-encoding mechanism allows for complete forwards and
> backwards compatibility with all existing web server and client
> implementations* Code is small, won't add much to browser footprint
> * Will increase both total and time-to-first rendering page load times
> * Appears to be unencumbered by patent issues (please check!)
> * Free software implementations readily available
> * To reiterate: 30%+ speedup!
> 
> Possible disadvantages (and mitigating factors):
> * LZMA is not yet standardized (but open source implementations are widely
> available, so implement first, then standardize -- perhaps call the method
> x-mozilla-lzma until it is standardized) 
> * not yet supported by any webservers (this is a chicken-and-egg situation:
> support in Firefox will certainly drive adoption: in particular, just adding
> LZMA compression in Apache would instantly make LZMA available for 50% of
> the web server installed base)
> * memory footprint and CPU overhead need investigating (but implementation
> cannot make anything worse than status quo, even in the worst case, since
> neither web servers nor browsers would be forced to use it, so it can be
> switched off where necessary, and both CPU speed and memory are getting
> bigger much faster than download speeds for mobile devices -- also, since
> the CPU is often idle while waiting for download data, this may still
> increase download speeds even on relatively slow platforms: needs
> benchmarking)
Comment 90 Alex Xu 2014-07-30 07:31:20 PDT
Please discuss pros and cons on the mailing list; bugzilla is for technical implementation discussion only.
Comment 91 ashumeow 2014-07-30 09:30:53 PDT
In that comment, i posted a "link" of my benchmark testing thingy of LZMA, which is a part of a technical discussion. Read the full comment before you type. (-_-)

(In reply to Alex Xu from comment #90)
> Please discuss pros and cons on the mailing list; bugzilla is for technical
> implementation discussion only.
Comment 92 naelphin 2014-09-16 02:27:48 PDT
Chrome tried adding bzip2 to the compression list back in 2009: https://code.google.com/p/chromium/issues/detail?id=1480

They found out that were helpful transparent proxies out there than automatically gzipped the content in transit and changed its mime type, so when the browser decoded it thinking it was gzipped, it ended up with a bzip2 blob instead, breaking the display. They had to back out the bzip2 from the header request.

There's little doubt there are still helpful proxies out there that do this, so it is unlikely there will be a non-gzipped compression method anytime soon.
Comment 93 anthonyryan1 2014-09-27 21:35:05 PDT
https://code.google.com/p/chromium/issues/detail?id=14801 actually.

(In reply to naelphin from comment #92)
> Chrome tried adding bzip2 to the compression list back in 2009:
> https://code.google.com/p/chromium/issues/detail?id=1480
Comment 94 ashumeow 2014-09-30 00:07:43 PDT
(In reply to naelphin from comment #92)
> Chrome tried adding bzip2 to the compression list back in 2009:
> https://code.google.com/p/chromium/issues/detail?id=1480
> 
> They found out that were helpful transparent proxies out there than
> automatically gzipped the content in transit and changed its mime type, so
> when the browser decoded it thinking it was gzipped, it ended up with a
> bzip2 blob instead, breaking the display. They had to back out the bzip2
> from the header request.
> 
> There's little doubt there are still helpful proxies out there that do this,
> so it is unlikely there will be a non-gzipped compression method anytime
> soon.

(In reply to anthonyryan1 from comment #93)
> https://code.google.com/p/chromium/issues/detail?id=14801 actually.
> 
> (In reply to naelphin from comment #92)
> > Chrome tried adding bzip2 to the compression list back in 2009:
> > https://code.google.com/p/chromium/issues/detail?id=1480


Actually, BZip2 could be less efficient than LZMA. WHY??? Because earlier Bzip used arithmetic coding and then, in newer versions stable as 2010, they have shifted to Huffman coding.
Speaking of Statistical coding power, Faster Arithmetic coding which is called range coding is being used in LZMA. The faster arithmetic coding is way better than the normal arithmetic coding as well as Huffman coding. That means, LZMA has huge chance even over GZip.
The Question is:- Then, how come GZip could be better than LZMA? The answer is simple. Actually, LZMA has problem in dictionary coding where it uses LZ77 (Sliding Window Mechanism) and that has lot of drawbacks.
An improved version of LZMA is needed though. It just requires minor improvements.
If anybody liked my comment:- Then kindly visit my undergoing project site http://www.geekresearchlab.net/mtechproject which is based on this LZMA thingy (^_^) Happy to participate here!
Comment 95 jonas echterhoff 2014-11-20 04:38:06 PST
I just read the comments here about compression times, and wanted to drop a comment from our point of view:

I work at Unity, and we are working on making the Unity Game Engine export to WebGL (see this blog post for some playable samples: http://blogs.unity3d.com/2014/04/29/on-the-future-of-web-publishing-in-unity/). Our use case is a little different from traditional "web pages" as we use this technology to load huge game worlds in the browser, which means that the uncompressed size of code and data for a page might be well in excess of 100MB, so download times are critical for the user experience. We normally pre-compress all data to gzip before uploading it to the web server, so that there is no server-side compression involved (compression is only done once at player build time). So for us, the tradeoff of having 30-40% smaller downloads at the cost of some slightly longer player build times is a no-brainer.

We would love to see this feature in a future version of Firefox (and other browsers)!
Comment 96 Tim Ruehsen 2014-11-20 06:50:51 PST
30-40% was an estimate somewhere. I really depends very much on your data (e.g. I remember 10-15% for some artificial data).
Could you make a test with a typical build and post the figures here ?
So that we have figures for a real use case.
Thank you.
Comment 97 alpha_one_x86 2014-11-20 07:01:40 PST
http://catchchallenger.first-world.info/official-server/datapack-explorer.html -> for the page ont this section on my dev server, the difference for compressed elements (css, html, js) do a 2x-3x more compact with xz -9 over gzip.
The factor: file size, actually on typical cms, have large js/css/html then xz is useful
File type: js/css mostly minimized take advantage of it, html if is statically doing

On poor states (see http://www.netindex.com/) or over tor/i2p, each byte win is better.
Comment 98 jonas echterhoff 2014-11-20 07:22:54 PST
For some real life numbers, I took our WebGL build of the game Dead Trigger 2 (http://beta.unity3d.com/jonas/DT2/).

The total size of the data and code that that game downloads is 
110.9 MB uncompressed
38.8 MB gzip
31.6 MB lzma 

(so that is a saving of 18.5%. Significant, but indeed less then the quoted 30-40%)

If I specifically look at the game's JavaScript code (without the game data), the numbers are more in favor of lzma:
39.9 MB uncompressed
7 MB gzip
4 MB lzma
(a saving of 43%!).

The latter number is perhaps more important to us. Because we have the capability to stream in game assets as the player progresses in the game, or by showing interactive loading screens. But we cannot do anything before the code is loaded.
Comment 99 James Willcox (:snorp) (jwillcox@mozilla.com) 2014-11-20 07:30:31 PST
(In reply to jonas echterhoff from comment #98)

In the case of Unity, I don't think you really need to have browser support (or at least not the browser support described in this bug). Since you control how the assets are fetched, it should be possible to decompress the LZMA payloads in JS. Native code would be faster, but maybe an emscriptened lzma lib would be good enough.
Comment 100 jonas echterhoff 2014-11-20 07:42:24 PST
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #99)

Indeed we could decompress LZMA in JS and we have been experimenting with doing exactly that. However, at least when we tried, the lzma lib was fairly slow when running in JS through emscripten, so the net result was slower then using gzip through the browser (at least for any decent connection speed). 

Alon Zakai had the following comment on lmza performance in emscripten:
"The slow perf is because all the LZMA implementations I can find use 64-bit ints heavily. These are not close to native speed even in asm.js, as JS has no native way to represent them."

Additionally, the browser could handle the decompression asynchronously while downloading, whereas in JavaScript, I could not do this (without additional tricks such as chunking up the data), as the XmlHttpRequest class will not give me access to the data before the download has finished.
Comment 101 James Willcox (:snorp) (jwillcox@mozilla.com) 2014-11-20 07:59:32 PST
(In reply to jonas echterhoff from comment #100)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #99)
> 
> Indeed we could decompress LZMA in JS and we have been experimenting with
> doing exactly that. However, at least when we tried, the lzma lib was fairly
> slow when running in JS through emscripten, so the net result was slower
> then using gzip through the browser (at least for any decent connection
> speed). 
> 
> Alon Zakai had the following comment on lmza performance in emscripten:
> "The slow perf is because all the LZMA implementations I can find use 64-bit
> ints heavily. These are not close to native speed even in asm.js, as JS has
> no native way to represent them."

Ah, well that's a bummer.

> 
> Additionally, the browser could handle the decompression asynchronously
> while downloading, whereas in JavaScript, I could not do this (without
> additional tricks such as chunking up the data), as the XmlHttpRequest class
> will not give me access to the data before the download has finished.

Chunking it wouldn't be too hard, you could probably just use range requests. I'm not sure if that would cause problems with CDNs, etc, however. The decompression would happen in a web worker. If the decompression itself is just hopelessly slow, though, I guess that's a problem.
Comment 102 Luke Wagner [:luke] 2014-11-20 08:22:24 PST
Supporting what Jonas is saying: minified asm.js seems to compress really well with lzma compared gzip, esp at larger file sizes.  Here are some more data points, taken from the (minified) asm.js files of the recent Humble Bundle release:

                  gzip   lzma    reduction
  Voxatron:       396k   309k    28%
  Osmos:          654k   481k    26%
  Zenbound2:      757k   548k    27%
  Democracy3:     892k   617k    30%
  FTL:           1426k   978k    31%
  Dustforce DX:  1560k   948k    39%
  Super Hexagon: 1562k  1081k    30%
  AaaaaaaAAAaaa: 4952k  3076k    37%
  Jack Lumber:   5520k  3352k    39%

(Note: iiuc, these are all distinct C++ codebases being compiled by Emscripten.)
Comment 103 Jeff Muizelaar [:jrmuizel] 2014-11-20 10:18:10 PST
Can you try out brotli https://github.com/google/brotli. It should give comparable compression performance with much better decode speed.
Comment 104 Luke Wagner [:luke] 2014-11-20 11:27:04 PST
                  gzip   lzma (% smaller gzip)   brotli (% smaller gzip)
  Voxatron:       396k   309k (28%)                343k (13%)
  Osmos:          654k   481k (26%)                542k (17%)
  Zenbound2:      757k   548k (27%)                609k (20%)
  Democracy3:     892k   617k (30%)                703k (21%)
  FTL:           1426k   978k (31%)               1128k (20%)
  Dustforce DX:  1560k   948k (39%)               1126k (28%)
  Super Hexagon: 1562k  1081k (30%)               1261k (19%)
  AaaaaaaAAAaaa: 4952k  3076k (37%)               3719k (25%)
  Jack Lumber:   5520k  3352k (39%)               4051k (26%)

Regarding decompression speed: if it happens on a background thread in a pipelined fashion (it does right?) AND if we assume more than one core (which, based on http://stats.unity3d.com/web/cpu.html, is a pretty reasonable assumption these days, at least for some application domains like gaming) (AND we ignore power), then it seems like all that matters is that lzma can decompress faster than the network can download.  I don't have any data on in-memory decompression speed of lzma; do you?

Also: do you know if the encode speed of the brotli code you linked to is representative?  I ask because encoding is massively slower than lzma.  Like, for AaaaaAAaaa and Jack Lumber, it took ~2m20s whereas xz took ~12s (this is after I changed the makefile to build with -O3).
Comment 105 Patrick McManus [:mcmanus] 2014-11-24 10:53:47 PST
I continue to be open to the prospect of adding a non-deflate based decoder and the negotiation for it as part of Accept-Encoding.

I've said upthread a few times that what we need is a content provider willing to do make a real use case available for negotiation that way - then we can see how it works out. Without a dance partner this is just a lot of buffer management code in the tree and that's never a good thing :)

I don't have my heart set on either lzma or brotli (and I reserve the right to set my heart later :)) but in either event someone would probably have to put an unbitrotted patch set forward.
Comment 106 Jyrki Alakuijala 2014-11-26 02:36:37 PST
Numbers for decompression speed for a corpora of asm.js files with C++ on a modern high-end desktop cpu (E5-1650 v2 @ 3.50GHz):
gzip 370.15 MB/s, compression density 4.745x
brotli 348.21 MB/s, compression density 6.264x
lzma 120.59 MB/s, compression density 7.309x

Decompression speeds are measured from the uncompressed size.

Some fuzzy opinions around the topic:

Handheld devices have lesser cpus. To approximate this, we can look at a 2005 2.4 GHz AMD CPU that showed of 65-83 MB/s for gzip decompression and 20-33 MB/s for LZMA. http://tukaani.org/lzma/benchmarks.html

A decompression speed of 20 MB/s with a 7x compression ratio corresponds to a network speed of 23 Mbit/s. If one has a faster network connection, then decompression will slow it down to an effective 23 Mbit/s. This is even more annoying if the file is coming from a local cache or a near-by network proxy -- or the local cache is force to hold an uncompressed copy.

These compression density numbers for asm.js are not representative for html docs, there brotli and lzma have more of an equal match for compression density, but brotli still decompresses faster. In a multi-lingual reference corpus that I use for html, brotli is 2 % more dense than lzma and 17 % more dense than gzip.

Minified JavaScript benefits less from improved compression, only about 10 % with Brotli vs gzip and 11 % with LZMA vs gzip, but there one still has to pay the full cost of slower decompression with LZMA.
Comment 107 Boyan Ilianov 2015-01-02 12:42:41 PST
Hey guys can you try out using this implementation? https://github.com/nmrugg/LZMA-JS
Comment 108 bugzilla.mozilla.org 2015-06-12 17:06:40 PDT
(In reply to Boyan Ilianov from comment #107)
> Hey guys can you try out using this implementation?
> https://github.com/nmrugg/LZMA-JS

I created some tests, see: https://github.com/pypyjs/pypyjs.github.io/issues/4#issuecomment-111549828
Comment 109 Jyrki Alakuijala 2015-09-17 09:16:35 PDT
I want to share an update about the performance numbers on the asm.js corpus as we have improved brotli's decompression speed and compression density. Gzip is here with a 15 bit window, while lzma, lzham and brotli are applied with a 24 bit window, each algorithm with its highest quality setting.

gzip 370.15 MB/s, compression density 4.745x
brotli 392.73 MB/s, compression density 7.071x
lzma 120.59 MB/s, compression density 7.284x
lzham 242.14 MB/s, compression density 7.008

For this corpus brotli compresses 3 % less than lzma, 1 % more than lzham, and 33 % more densely than gzip.
Comment 110 alpha_one_x86 2015-09-17 09:48:34 PDT
Take care, some algo have similar compression/decompression speed, other not. In my case I clearly need the best ratio, at cost of speed.
Comment 111 Neil Harris 2015-09-18 05:00:51 PDT
(In reply to Jyrki Alakuijala from comment #109)
> I want to share an update about the performance numbers on the asm.js corpus
> as we have improved brotli's decompression speed and compression density.
> Gzip is here with a 15 bit window, while lzma, lzham and brotli are applied
> with a 24 bit window, each algorithm with its highest quality setting.
> 
> gzip 370.15 MB/s, compression density 4.745x
> brotli 392.73 MB/s, compression density 7.071x
> lzma 120.59 MB/s, compression density 7.284x
> lzham 242.14 MB/s, compression density 7.008
> 
> For this corpus brotli compresses 3 % less than lzma, 1 % more than lzham,
> and 33 % more densely than gzip.

Since the low compression speed of LZMA has been the principal objection to implementing this, I think this is a game changer.

It's clear from the above that Brotli now looks like being the superior algorithm, as it has near-LZMA compression size performance at near-gzip compression speed performance. So let's go for Brotli!
Comment 112 alpha_one_x86 2015-09-18 05:27:30 PDT
It's not a problem for static file where the compression is cached.
I live in bolivia, where the private server is hosted on 50KB/s upload. Then where the compression speed need just be greater than 1MB/s to sature the inter-city connexion...
Comment 113 Neil Harris 2015-09-18 05:58:15 PDT
I should just clarify my position: I'm _not_ saying we shouldn't deploy LZMA in favour of Brotli; I think we should [experimentally at first] deploy both, and trial this in the field with real-world webservers to find out the scale of the real speed advantages, which I think will be significant in many cases, particularly in the common cases of mobile or developing countries where RTTs are high and available bandwidth is low.
Comment 114 Neil Harris 2015-09-18 06:06:16 PDT
One final (for now) aside: we should also look at the possibility of supporting SDCH (see https://engineering.linkedin.com/shared-dictionary-compression-http-linkedin ) which potentially offers even higher degrees of compression, but at the cost of a lot of added complexity and inter-relationships within the web platform, and also some possible security risks if weak hashes were used. 

But since implementing SDCH would be a radically different issue from the one in this bug, this isn't the place to discuss it any further; I will look at filing a separate bug for SDCH support.
Comment 115 alpha_one_x86 2015-09-19 03:43:16 PDT
I will add brotli to:
http://catchchallenger.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO
But cearly lack of benchmark/test on internet
Comment 116 Jyrki Alakuijala 2015-09-19 11:32:45 PDT
#115 Quick_Benchmark is about compressing 445 MB single file, while the average web html document is 5000 times smaller. If you want to compare with an html file, you can look at the compression density for 'cp.html' in the Canterbury corpus: https://quixdb.github.io/squash-benchmark/

The '3 % less compression density than lzma' results I showed in #109 were for an asm.js corpus. For a multi-lingual html corpus with 93 languages brotli compresses 11 % more densely than lzma. Brotli needs less warm-up time and is relatively more efficient for small (roughly 1 MB or less) files.

#114 Brotli's current encoder and decoder implementations have an interface for the custom dictionaries in SDCH (sandwich), so later we could serve brotli sandwich, too.
Comment 118 Patrick McManus [:mcmanus] 2015-09-21 08:07:37 PDT
Created attachment 8663695 [details] [diff] [review]
patch 1, update brotli snapshot
Comment 119 Patrick McManus [:mcmanus] 2015-09-21 08:07:59 PDT
Created attachment 8663696 [details] [diff] [review]
patch 2, fix nsHTTPCompressConv indentation
Comment 120 Patrick McManus [:mcmanus] 2015-09-21 08:08:14 PDT
Created attachment 8663697 [details] [diff] [review]
patch 3, fix nsHTTPCompressConv bracing style
Comment 121 Patrick McManus [:mcmanus] 2015-09-21 08:08:25 PDT
Created attachment 8663698 [details] [diff] [review]
patch 4, fix nsHTTPCompressConv namespace
Comment 122 Patrick McManus [:mcmanus] 2015-09-21 08:08:35 PDT
Created attachment 8663699 [details] [diff] [review]
patch 5, fix nsHTTPCompressConv manual addref
Comment 123 Patrick McManus [:mcmanus] 2015-09-21 08:08:45 PDT
Created attachment 8663700 [details] [diff] [review]
patch 6, support different content encodings for http vs https
Comment 124 Patrick McManus [:mcmanus] 2015-09-21 08:08:56 PDT
Created attachment 8663701 [details] [diff] [review]
patch 7, content-encoding brotli for https
Comment 125 Patrick McManus [:mcmanus] 2015-09-21 08:10:09 PDT
Comment on attachment 8663695 [details] [diff] [review]
patch 1, update brotli snapshot

Review of attachment 8663695 [details] [diff] [review]:
-----------------------------------------------------------------

the brotli update includes a streaming interface that this feature needed
Comment 126 Patrick McManus [:mcmanus] 2015-09-21 08:14:01 PDT
I think brotli makes a lot of sense here, both on a performance basis and based on the fact that we're already carrying and using the brotli decoder for use with the WOFF font format. That's good from both a code size and security exposure point of view.

The patches I uploaded restrict brotli to https only. See comment 92.
Comment 127 Patrick McManus [:mcmanus] 2015-09-21 08:18:27 PDT
@dveditz - comment 69 talked about security review of this feature.. given that it is now implemented with brotli using a library already carried in gecko (for woff) is there extra action that should be taken?
Comment 128 Daniel Veditz [:dveditz] 2015-09-21 08:32:37 PDT
I tagged Christoph for sec-review because I wanted this fuzzed more than a formal sec-review. That's still true (I'm not sure we fuzzed brotli versions of WOFF, either.)
Comment 129 Jonathan Kew (:jfkthame) 2015-09-21 08:49:00 PDT
Comment on attachment 8663695 [details] [diff] [review]
patch 1, update brotli snapshot

Review of attachment 8663695 [details] [diff] [review]:
-----------------------------------------------------------------

rs=me for updating our in-tree copy of the brotli decoder.
Comment 130 Neil Harris 2015-09-21 10:51:27 PDT
(In reply to Jyrki Alakuijala from comment #116)

> 
> #114 Brotli's current encoder and decoder implementations have an interface
> for the custom dictionaries in SDCH (sandwich), so later we could serve
> brotli sandwich, too.

That's insanely great. This could lead to massive speedups of time-to-first-content with boilerplate-heavy websites, far beyond that of brotli alone.
Comment 131 Daniel Stenberg [:bagder] 2015-09-21 23:39:51 PDT
Comment on attachment 8663696 [details] [diff] [review]
patch 2, fix nsHTTPCompressConv indentation

Review of attachment 8663696 [details] [diff] [review]:
-----------------------------------------------------------------

Ah, yes it looks good to get this on a two-space indent! =)

Would it also make sense to modify some of the brace positions etc while you're at it and cleaning up the style?

Like the old code now has braces on the line after if, while and else, while the new code follows our more modern style with the braces on the same line. Also, this code is fairly littered with single-line if statements without braces.
Comment 132 Daniel Stenberg [:bagder] 2015-09-21 23:40:46 PDT
Comment on attachment 8663696 [details] [diff] [review]
patch 2, fix nsHTTPCompressConv indentation

Review of attachment 8663696 [details] [diff] [review]:
-----------------------------------------------------------------

bah, nevermind, I see you already did that in the next patch =)
Comment 133 Daniel Stenberg [:bagder] 2015-09-22 00:02:17 PDT
Comment on attachment 8663697 [details] [diff] [review]
patch 3, fix nsHTTPCompressConv bracing style

Review of attachment 8663697 [details] [diff] [review]:
-----------------------------------------------------------------

Getting more consistent bracing is good. Just two minor questions/nits on conditions left without braces but otherwise fine.

::: netwerk/streamconv/converters/nsHTTPCompressConv.cpp
@@ +176,4 @@
>          memset(&d_stream, 0, sizeof (d_stream));
>  
>          if (inflateInit(&d_stream) != Z_OK)
>            return NS_ERROR_FAILURE;

no braces for this one?

@@ +196,3 @@
>              rv = do_OnDataAvailable(request, aContext, aSourceOffset, (char *)mOutBuffer, bytesWritten);
>              if (NS_FAILED (rv))
>                return rv;

I noticed you didn't add braces for several conditions in this style, was that on purpose?
Comment 134 alpha_one_x86 2015-09-22 01:23:04 PDT
For one of my typical content brotli is 25% bigger than lzma.
Comment 135 Daniel Stenberg [:bagder] 2015-09-22 01:47:56 PDT
Comment on attachment 8663700 [details] [diff] [review]
patch 6, support different content encodings for http vs https

Review of attachment 8663700 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpHandler.cpp
@@ +1164,5 @@
>          nsXPIDLCString acceptEncodings;
>          rv = prefs->GetCharPref(HTTP_PREF("accept-encoding"),
>                                    getter_Copies(acceptEncodings));
>          if (NS_SUCCEEDED(rv))
> +            SetAcceptEncodings(acceptEncodings, false);

Opportunity to add braces for the conditional expression?
Comment 136 Chris Bentzel 2015-09-22 04:44:44 PDT
You likely already know this, but we also plan to support Brotli for Content-Encoding in Chromium. We plan to restrict it to https like your plans to avoid many of the middlebox problems encountered when rolling out SDCH-over-HTTP a few years back.

For fuzzing - we also want to step up fuzzing on Chromium side especially since the current use of Brotli-for-WOFF is in a sandbox and the Content-Encoding case will not be. Always good to have multiple fuzzers (especially since it would feed into shared brotli code base) but wondering if there's a way to combine efforts here.
Comment 137 Daniel Stenberg [:bagder] 2015-09-22 06:11:36 PDT
Comment on attachment 8663701 [details] [diff] [review]
patch 7, content-encoding brotli for https

Review of attachment 8663701 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/streamconv/converters/nsHTTPCompressConv.cpp
@@ +156,5 @@
> +  nsHTTPCompressConv *self = static_cast<nsHTTPCompressConv *>(closure);
> +  *countRead = 0;
> +
> +  // this is documented in brotli/dec/decode.h
> +  const uint32_t kOutSize = 128 * 1024;

I would like this constant slightly better explained/motivated so that we can avoid having to look up that header and also, I did look in the header and it wasn't immediately obvious!
Comment 139 Patrick McManus [:mcmanus] 2015-09-22 09:45:03 PDT
looks like the push gave the android build #include pain.

will revert fix and resubmit
Comment 140 Patrick McManus [:mcmanus] 2015-09-22 09:49:51 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/0d687549721e4f0d8902ec4e9704488b0b052cc2
bug 366559 - backout due to android build bustage patch 7 on CLOSED TREE r=backout
Comment 142 Patrick McManus [:mcmanus] 2015-09-22 13:16:35 PDT
a full set of working builds
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa226b9aeb77
Comment 144 Patrick McManus [:mcmanus] 2015-09-22 18:14:32 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/f8e7377270a1614eff46ff5ca9f26b198f6ac222
bug 366559 - patch 7, content-encoding brotli for https r=bagder
Comment 145 Carsten Book [:Tomcat] 2015-09-23 03:34:15 PDT
https://hg.mozilla.org/mozilla-central/rev/f8e7377270a1
Comment 146 Jyrki Alakuijala 2015-10-05 14:47:12 PDT
In https://datatracker.ietf.org/doc/draft-alakuijala-brotli/ section 12. 'IANA Considerations' we are making a reservation for content encoding type 'bro', not 'brotli'.

We are hoping to establish a file ending .bro for brotli compressed files, a command line tool 'bro' for compressing and uncompressing brotli files, and a accept/content encoding type 'bro'.

bro is short for brotli -- there are no hidden meanings in it, just practicality: less typing, less bytes to upload/download, shorter compressed filenames.
Comment 147 Patrick McManus [:mcmanus] 2015-10-06 06:49:10 PDT
Thanks for pointing that out jyrki, can I talk you out of it? Certainly not too late to change the draft registration.

"bro" has a gender problem, even though the dual meaning is unintentional. It comes of misogynistic and unprofessional due to the world it lives in. I received a series of 'bro' jokes in response to my posting about this new feature.

Best to avoid it rather than spending time defending an arbitrary nickname.

My interest is only in content-encoding interop.
Comment 148 Gervase Markham [:gerv] 2015-10-06 07:01:26 PDT
(In reply to Patrick McManus [:mcmanus] from comment #147)
> "bro" has a gender problem.

I don't think (English) words can have gender problems. English words don't have gender.

> It comes of misogynistic and unprofessional due to the world it lives in.

I think you are reading in. For a start, I (and I'm not alone) call several of my closer male friends "bro" on occasion, and this is neither insulting nor (by some mysterious process) makes me a misogynist.

> I
> received a series of 'bro' jokes in response to my posting about this new
> feature.

Does that say more about the word, or about the people you hang with? :-)

He's not suggesting we call the thing .brogrammer, which would be different.

Gerv
Comment 149 Patrick McManus [:mcmanus] 2015-10-06 07:07:52 PDT
fwiw imo the bro assignment will likely be a problem during ietf review for the same reasons.. best to sidestep it now.
Comment 150 Patrick McManus [:mcmanus] 2015-10-06 07:08:43 PDT
and of course gerv's "but I didn't mean it that way" will be part of the discourse - as it always is.

better off being professional and not trying to toe that line.
Comment 151 Jyrki Alakuijala 2015-10-06 07:28:37 PDT
would you be fine with 'br' ?
Comment 152 Patrick McManus [:mcmanus] 2015-10-06 07:38:44 PDT
(In reply to Jyrki Alakuijala from comment #151)
> would you be fine with 'br' ?

sounds good - thanks. I'll make that change and let the content-provider I know is working with this know.
Comment 153 Jyrki Alakuijala 2015-10-06 07:55:54 PDT
No, thank you for the advice. It is always better to fix things like this as early as possible.
Comment 154 Neil Harris 2015-10-08 07:54:29 PDT
(In reply to Patrick McManus [:mcmanus] from comment #152)
> (In reply to Jyrki Alakuijala from comment #151)
> > would you be fine with 'br' ?
> 
> sounds good - thanks. I'll make that change and let the content-provider I
> know is working with this know.

That's excellent. Thank you.
Comment 155 dE 2015-10-08 20:14:57 PDT
Thanks for the fix. Now that a major browser supports it we may see more of it's adoption.
Comment 156 Adrian 2015-10-10 02:33:40 PDT Comment hidden (spam)
Comment 157 Jyrki Alakuijala 2015-10-10 04:10:46 PDT
I have asked a feminist friend from the North American culture-sphere, and she advised against bro. We have found a compromise that satisfies us, so we don't need to discuss this further. Even if we don't understand why people are upset from our cultural standpoint, they would be (unnecessarily) upset and this is enough reason not to use it.
Comment 158 Christian Biere 2015-10-10 04:55:12 PDT Comment hidden (abuse)
Comment 159 Jyrki Alakuijala 2015-10-10 05:17:46 PDT Comment hidden (advocacy)
Comment 160 Christian Biere 2015-10-10 06:26:29 PDT Comment hidden (abuse)
Comment 161 tetyys 2015-10-10 07:27:12 PDT Comment hidden (abuse)
Comment 162 Iambetterthanmew 2015-10-10 08:31:08 PDT Comment hidden (advocacy)
Comment 163 James 2015-10-10 09:21:41 PDT Comment hidden (abuse)
Comment 164 Jamal 2015-10-10 09:49:37 PDT Comment hidden (abuse)
Comment 165 Jamal 2015-10-10 10:02:47 PDT Comment hidden (abuse)
Comment 166 Daniel Veditz [:dveditz] 2015-10-11 14:59:16 PDT
I'll call this sec-review+ because we now have a brotli fuzzer that has successfully uncovered several bugs and can be added to our on-going fuzzing efforts.
Comment 167 Michal Purzynski [:michal`] (use NEEDINFO) 2015-10-12 08:11:32 PDT
I'd don't like it as just "br" for the reasons Jamal gave above. We need to find a new name.
Comment 169 Neil Harris 2015-10-26 10:08:57 PDT
(In reply to Michal Purzynski [:michal`] (use NEEDINFO) from comment #167)
> I'd don't like it as just "br" for the reasons Jamal gave above. We need to
> find a new name.

I think "br" (2 characters) is fine if you think of it as being like "gz" (also 2). On the other hand, "brotli" (5 characters) is fine if you think of it as being like "gzip" (4). Also,  if you are using HTTP/2, which is likely to be the dominant protocol within a few years,once you've used it once, it will then be cached and header-compressed for the next of the life of a connection, so it's not as if the three extra characters in "brotli" would eat 3 more header bytes per transaction in perpetuity.
Comment 170 Neil Harris 2015-10-26 10:10:19 PDT
(In reply to Neil Harris from comment #169)
> (In reply to Michal Purzynski [:michal`] (use NEEDINFO) from comment #167)
> > I'd don't like it as just "br" for the reasons Jamal gave above. We need to
> > find a new name.
> 
> I think "br" (2 characters) is fine if you think of it as being like "gz"
> (also 2). On the other hand, "brotli" (5 characters) is fine if you think of
> it as being like "gzip" (4). Also,  if you are using HTTP/2, which is likely
> to be the dominant protocol within a few years,once you've used it once, it
> will then be cached and header-compressed for the next of the life of a
> connection, so it's not as if the three extra characters in "brotli" would
> eat 3 more header bytes per transaction in perpetuity.

Agh! s/5/6/, s/3/4/
Comment 171 Masatoshi Kimura [:emk] 2015-10-26 20:47:18 PDT
In any case, it should be reflected to the brotli draft:
https://tools.ietf.org/html/draft-alakuijala-brotli-07#section-12
Comment 172 (not reading bugmail) Nick Desaulniers [:\n] 2015-11-06 14:39:40 PST
removing devrel-needed: https://hacks.mozilla.org/2015/11/better-than-gzip-compression-with-brotli/
Comment 173 Ritu Kothari (:ritu) 2016-01-23 12:24:47 PST
Added to Fx44 release notes.

Note You need to log in before you can comment on or make changes to this bug.