Open Bug 1301878 (zstd) Opened 4 years ago Updated 10 days ago

Implement support for Zstandard (zstd)

Categories: (Core :: General, enhancement)
enhancement, Not set, major

People: (Reporter: Virtual, Unassigned, NeedInfo)

References: (Blocks 1 open bug)

Details: (Keywords: feature, nightly-community)
Has Regression Range: --- → irrelevant
Has STR: --- → irrelevant
This bug is quite generic. Support for zstd for what? As a possible replacement where brotli is currently used (WOFF, HTTP compression)? Is there W3C/WHATWG/IETF work underway in that direction?

Relatedly, are the patent licensing terms from zstd (or anything else from Facebook, really) considered problematic? (https://github.com/facebook/zstd/blob/dev/PATENTS https://github.com/facebook/zstd/issues/335)
Flags: needinfo?(gerv)
(In reply to Mike Hommey [:glandium] (VAC: Apr 20-May 4) from comment #1)
> Relatedly, are the patent licensing terms from zstd (or anything else from
> Facebook, really) considered problematic?
> (https://github.com/facebook/zstd/blob/dev/PATENTS
> https://github.com/facebook/zstd/issues/335)

ellee knows the answer to this question :-)

Gerv
Flags: needinfo?(gerv) → needinfo?(ellee)
Flags: needinfo?(ellee)
Mike, I'll reach out to you directly. Thanks!
Elvin, I never heard back from you. I was reminded of the issue by this HN thread: https://news.ycombinator.com/item?id=14779881 , which also reminded me that the same patent license applies to React, which we already ship (and I'm not sure this issue was ever considered when React was added to the Firefox code).
Flags: needinfo?(ellee)
Sorry -- I sent you a note via email. The ASF decision you referenced is helpful to know, and I'm discussing it with some other folks.
Flags: needinfo?(ellee)
Depends on: 1477516
Blocks: 1477516
No longer depends on: 1477516
Type: defect → task

Hello. I've investigated RFC 8478 a bit.

3.1.1.1.3. Dictionary_ID:
However, for frames and dictionaries distributed in public space, Dictionary_ID must be attributed carefully. The following ranges are reserved for use only with dictionaries that have been registered with IANA (see Section 6.3).

6.3. Dictionaries:
However, there are at present no such dictionaries published for public use, so this document makes no immediate request of IANA to create such a registry.

Brotli RFC 7932 includes a single public dictionary optimized for web content. By comparison, the zstd RFC looks like a joke.

I did a web search and found almost nothing. 8 months have passed and nobody seems to be providing these dictionaries. I can't understand how anyone can implement zstd bindings for Firefox without a clear dictionary registry. Zstd will never destroy brotli without effective dictionaries.

Please let me know if I am wrong. Thank you.

Even if zstd doesn't destroy brotli for static content, it can definitely seriously outperform all others for dynamic content. A dynamically generated PHP (or any other dynamic) page can be compressed with zstd instead of deflate while being transmitted to the browser. Brotli decompresses really fast, but it uses a LOT of resources to compress. Zstd can get close-enough compression results with only a small fraction of the resources to compress, while using about the same resources to decompress as brotli.

Do you think this is not reason enough?

Besides, the RFC does state that further study is needed before optimized dictionaries are distributed... I just guess they are still working on them, as written.

Cloudflare was already using zstd successfully back in 2018, and zstd has seen lots of improvements since.

Since the middle of this year, I have strongly believed that (with the current selection) zstd will be the "new" de facto standard for compression, replacing deflate.

IMO: LZ4 is the quick-win compression ("stolen" from deflate), LZMA is the compression at all costs, brotli is the static content compression and zstd is the "balanced" compression (the role deflate has now).

Cloudflare's brotli study concluded that, for streaming, brotli is even worse than zlib/gzip/deflate, so they dropped the idea.

Flags: needinfo?(aladjev.andrew)

> Even if zstd doesn't destroy brotli for static content, it can definitely seriously outperform all others for dynamic content.

Hello. Yes, maybe, but I think that integration is not really possible.

Please imagine that we are trying to implement zstd support in Firefox. A web browser is a special application in terms of compatibility. Suppose we released Firefox v75 with zstd support (without dictionary support) in 2019. That browser should then be able to decompress any zstd content served between 2019 and 2030. The RFC contains the following text:

Frame header: ..., Dictionary_id
This is a variable size field, which contains the ID of the dictionary required to properly decode the frame.

Facebook will provide a dictionary ecosystem in 2023-2024 (just my assumption). Web servers will then produce compressed content with the Dictionary_ID field set. Our Firefox v75, released in 2019, won't be able to decompress this content. So I think that Firefox and other web browsers should wait until a dictionary system appears.
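
For illustration only, here is a minimal Python sketch of the check a decoder could make before attempting a frame: read the optional Dictionary_ID from the frame header and compare it with the dictionaries it has. The field layout follows the frame header description in RFC 8478; the function names and the `installed_ids` set are hypothetical and do not come from Firefox or the zstd library.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528  # Magic_Number, stored little-endian on the wire


def read_dictionary_id(frame: bytes):
    """Return the Dictionary_ID from a zstd frame header, or None if absent."""
    if len(frame) < 5:
        raise ValueError("too short for a zstd frame header")
    (magic,) = struct.unpack_from("<I", frame, 0)
    if magic != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    descriptor = frame[4]                      # Frame_Header_Descriptor
    single_segment = (descriptor >> 5) & 1     # bit 5: Single_Segment_flag
    did_size = (0, 1, 2, 4)[descriptor & 0b11] # bits 1-0: Dictionary_ID_flag
    if did_size == 0:
        return None
    # The 1-byte Window_Descriptor is present only when Single_Segment_flag is 0.
    offset = 5 + (0 if single_segment else 1)
    field = frame[offset:offset + did_size]
    if len(field) != did_size:
        raise ValueError("truncated Dictionary_ID field")
    return int.from_bytes(field, "little")


def can_decode(frame: bytes, installed_ids=frozenset()) -> bool:
    """Hypothetical policy: accept frames that name no dictionary, or a known one."""
    dictionary_id = read_dictionary_id(frame)
    return dictionary_id is None or dictionary_id in installed_ids
```

Under these assumptions, a browser shipped with no dictionaries would call `can_decode(frame)` and reject any frame whose header names a dictionary it does not have, instead of producing garbage output.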

Flags: needinfo?(aladjev.andrew)

It's not immediately obvious why 2020's Firefox would need to be able to decompress zstd-encoded responses from 2030. But maybe this is not a bad moment for a discussion about Accept-Encoding and how to handle dictionary variants of zstd.

(In reply to Andrew from comment #9)

> > Even if zstd doesn't destroy brotli for static content, it can definitely seriously outperform all others for dynamic content.
>
> Hello. Yes, maybe, but I think that integration is not really possible.
>
> Please imagine that we are trying to implement zstd support in Firefox. A web browser is a special application in terms of compatibility. Suppose we released Firefox v75 with zstd support (without dictionary support) in 2019. That browser should then be able to decompress any zstd content served between 2019 and 2030. The RFC contains the following text:
>
> Frame header: ..., Dictionary_id
> This is a variable size field, which contains the ID of the dictionary required to properly decode the frame.
>
> Facebook will provide a dictionary ecosystem in 2023-2024 (just my assumption). Web servers will then produce compressed content with the Dictionary_ID field set. Our Firefox v75, released in 2019, won't be able to decompress this content. So I think that Firefox and other web browsers should wait until a dictionary system appears.

What is the issue? Firefox can ship with zstd already prepared to have dictionaries "plugged in", except that it comes with none installed initially. Later on, a quick update can add official dictionaries as they come out. The feature can even be developed so that Firefox searches a directory for dictionaries and uses whatever it finds there (this can be used for quick patching of PCs that update late).
Nothing prevents Firefox from already shipping with zstd dictionary capability. (Using your dates:) in 2023-2024, when a dictionary is defined in the RFC, we create a ticket and a quick patch adds the dictionary to Firefox. Even for ESR. What's the trouble?

Flags: needinfo?(aladjev.andrew)

> What is the issue? Firefox can ship with zstd already ready to have dictionaries "plugged in", except it comes with none installed initially. Later on, a quick update can add official dictionaries as they come out. This feature can even be developed such that dictionaries are searched in a directory and they are used in firefox (can be used for quick patching like this for PC that update late).

There is no such directory. It is not possible to implement it with an imaginary directory ecosystem.

> when a dictionary is defined in the RFC, we create a ticket and a quick patch will add the dictionary to firefox. Even for LTS. What's the trouble?

RFC 8478 does not define any way of downloading, synchronizing, or protecting dictionaries, etc.

The zstd developers said the following:

> The plan is the opposite: as I described in the caniuse thread, the RFC does not standardize the use of a dictionary. Responses with Content-Encoding: zstd should not use a dictionary. If and when a dictionary-based scheme is standardized for HTTP, it will use a different content-coding identifier.

So for now integration is possible without dictionaries; some workaround for dictionary support will appear later.
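
A rough sketch of what a separate content-coding identifier means for negotiation, in Python. The server only picks a coding that the client listed in Accept-Encoding, so an old client that advertises plain `zstd` would never be sent a dictionary-based variant. The token `x-zstd-dict-example` is made up for this example, the preference order is an assumption, and the parser is simplified (it ignores the `*` wildcard and unusual parameter spacing).

```python
def parse_accept_encoding(header: str) -> dict:
    """Map each content-coding token to its q-value (1.0 when omitted)."""
    codings = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        token, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.lower().startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unparsable weight as "not acceptable"
        codings[token.strip().lower()] = q
    return codings


# Server-side preference order; an assumption for illustration.
SERVER_PREFERENCE = ("zstd", "br", "gzip")


def choose_coding(accept_encoding: str, preference=SERVER_PREFERENCE):
    """Return the first server-preferred coding the client accepts, else None."""
    accepted = parse_accept_encoding(accept_encoding)
    for coding in preference:
        if accepted.get(coding, 0.0) > 0.0:
            return coding
    return None  # fall back to an uncompressed (identity) response


# A client that only knows plain zstd is never sent the hypothetical
# dictionary-based coding, even if the server prefers it.
future_server = ("x-zstd-dict-example", "zstd", "br", "gzip")
print(choose_coding("gzip, br, zstd", preference=future_server))
```

With this kind of negotiation, a later dictionary scheme would not break a browser that shipped with dictionary-less zstd, because that browser never asks for the new token.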

Flags: needinfo?(aladjev.andrew)

(In reply to Andrew from comment #12)

> > What is the issue? Firefox can ship with zstd already ready to have dictionaries "plugged in", except it comes with none installed initially. Later on, a quick update can add official dictionaries as they come out. This feature can even be developed such that dictionaries are searched in a directory and they are used in firefox (can be used for quick patching like this for PC that update late).
>
> There is no such directory. It is not possible to implement it with imaginary directory ecosystem.

There's none because there's no zstd implemented in Firefox.

> > when a dictionary is defined in the RFC, we create a ticket and a quick patch will add the dictionary to firefox. Even for LTS. What's the trouble?
>
> RFC 8478 does not define any way of downloading, synchronizing, protecting dictionaries, etc.

That's fine. Why would it define that, if it's the program's responsibility to ship the dictionary and the OS's responsibility to prevent the files from being tampered with?

> ZSTD developers said the following:
>
> > The plan is the opposite: as I described in the caniuse thread, the RFC does not standardize the use of a dictionary. Responses with Content-Encoding: zstd should not use a dictionary. If and when a dictionary-based scheme is standardized for HTTP, it will use a different content-coding identifier.
>
> So for now integration is possible without dictionary, some workarounds for dictionary support will appear later.

I didn't know that part. If that's so, then what's the problem?
Zstd can be of great use on the web for dynamic content, with good streaming compression at very high speeds!
I don't understand what I'm missing, then...

Flags: needinfo?(aladjev.andrew)

I am with brunoais: Firefox can ship now with the current state of zstd / zstd for the web. If there are updates in the future, they can be added in the future.

Type: task → enhancement

Commonly brotli decodes a web page in one millisecond, fully shadowed by the transfer in any mobile system. Definitely there is no waiting or multithreaded decoding on any mobile platform that I know of.

Dynamic brotli compression is somewhat denser than dynamic zstd compression. This is particularly so for languages written with multi-byte UTF-8 (like Chinese, Russian, Vietnamese, ...).

In my experiments brotli slows down less than zstd in parallel computation. This is mostly because brotli achieves compression density by more computation whereas zstd is relying on more memory access (larger window size).

Brotli is more streamable. Zstd gets its slightly better decoding speed by processing blocks of data. The format is ordered in a way that makes full streaming impossible, and some of the already received data is just not decodable yet. In brotli, streaming was a first-class citizen, and much more data can be output and used for further processing (like HTML tree construction or JavaScript parsing), streaming all the way down to the last LZ copy or literal. The blocked processing of zstd is favorable for datacenter-level computing, but when humans are waiting for data, full streaming is more favorable.

Brotli is fast enough. Decoding speed -- while slightly slower than zstd -- is far faster than any mobile connection. Because of streaming, the decoding can happen during the transfer.

In our mobile and desktop deployments we don't observe the kind of gzip vs. brotli issues that some confused early bloggers reported. Later bloggers called out those early reports as rumors. https://certsimple.com/blog/nginx-brotli writes "To summarize Akamai's study on Brotli performance... Brotli with setting 4 is both significantly smaller AND compresses faster than gzip"

Please note that RFC 8478 paragraph "3.1.1.1.1.2. Single_Segment_Flag" leaves the window size open. Even in normal operation, with the client promising that it can decode zstd, the server has no way of knowing whether the client will actually be able to decode the stream if the stream requires more than 8 MB of memory. In brotli there is no such gray area that can fragment the client space. I'd strongly recommend removing the gray area.
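
To make the gray area concrete, here is a small Python sketch of how a decoder could work out the memory a frame asks for and compare it with the 8 MB figure that RFC 8478 recommends decoders support. The Window_Descriptor and Frame_Content_Size arithmetic follows the RFC; the function names and the rejection policy are assumptions for illustration, not any browser's behavior.

```python
ZSTD_MAGIC_BYTES = bytes.fromhex("28b52ffd")  # 0xFD2FB528, little-endian
RECOMMENDED_LIMIT = 8 * 1024 * 1024           # 8 MB, a recommendation only


def window_size_from_descriptor(byte: int) -> int:
    """Decode Window_Descriptor: exponent in bits 7-3, mantissa in bits 2-0."""
    exponent = byte >> 3
    mantissa = byte & 0b111
    window_base = 1 << (10 + exponent)
    window_add = (window_base // 8) * mantissa
    return window_base + window_add


def required_window(frame: bytes) -> int:
    """Memory (in bytes) the frame header asks the decoder to provide."""
    if len(frame) < 6 or frame[:4] != ZSTD_MAGIC_BYTES:
        raise ValueError("not a complete zstd frame header")
    descriptor = frame[4]
    fcs_flag = descriptor >> 6                  # Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1      # Single_Segment_flag
    did_size = (0, 1, 2, 4)[descriptor & 0b11]  # Dictionary_ID field size
    if not single_segment:
        return window_size_from_descriptor(frame[5])
    # Single segment: no Window_Descriptor, the whole content must fit,
    # so the requirement equals Frame_Content_Size.
    fcs_size = (1, 2, 4, 8)[fcs_flag]
    start = 5 + did_size
    field = frame[start:start + fcs_size]
    if len(field) != fcs_size:
        raise ValueError("truncated Frame_Content_Size field")
    value = int.from_bytes(field, "little")
    return value + 256 if fcs_size == 2 else value  # 2-byte form is offset by 256


def fits(frame: bytes, limit: int = RECOMMENDED_LIMIT) -> bool:
    """Hypothetical client policy: refuse frames that ask for more than `limit`."""
    return required_window(frame) <= limit
```

The server cannot see the client's `limit`, which is the fragmentation risk described above: two clients that both advertise zstd may disagree on whether a given frame is decodable.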

Please note that when zstd beats brotli in compression speed or density benchmarks, it is because it defaults to larger window sizes (often 128 MB vs. brotli's 4 MB). In a browser environment, usual deployments use smaller window sizes, because larger window sizes mean more OOMs in browsers. As one data point, Facebook themselves are using a 512 kB window with brotli for dynamic HTTP compression.

curl now supports zstd.
