Improve seekable stream (szip) compression and CLI

RESOLVED FIXED in mozilla22

Status

()

RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: glandium, Assigned: glandium)

Tracking

Trunk
mozilla22
All
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(8 attachments, 4 obsolete attachments)

3.88 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
5.45 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
6.47 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
4.78 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
10.87 KB, patch
froydnj
: review+
gerv
: review+
Details | Diff | Splinter Review
3.79 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
12.93 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
7.79 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
Comment hidden (empty)
(Assignee)

Comment 1

6 years ago
Created attachment 720773 [details] [diff] [review]
Add buffer length information when initializing a SeekableZStream
Attachment #720773 - Flags: review?(nfroyd)
(Assignee)

Comment 2

6 years ago
Created attachment 720775 [details] [diff] [review]
Add a helper class for a buffer mapped from a file
Attachment #720775 - Flags: review?(nfroyd)
(Assignee)

Comment 3

6 years ago
Created attachment 720776 [details] [diff] [review]
Add a command line argument to szip to decompress an existing seekable stream
Attachment #720776 - Flags: review?(nfroyd)
(Assignee)

Comment 4

6 years ago
Created attachment 720779 [details] [diff] [review]
Sanity check that szip's compressed output can properly be decompressed
Attachment #720779 - Flags: review?(nfroyd)
(Assignee)

Comment 5

6 years ago
Created attachment 720781 [details] [diff] [review]
Use raw zlib streams when compressing with szip

And keep compatibility to decompress old streams.
Attachment #720781 - Flags: review?(nfroyd)
(Assignee)

Comment 6

6 years ago
Created attachment 720782 [details] [diff] [review]
Allow to specify a chunk size on szip command line
Attachment #720782 - Flags: review?(nfroyd)
(Assignee)

Comment 7

6 years ago
Created attachment 720797 [details] [diff] [review]
Add smart filters borrowed from xz-utils to improve SeekableZStream compression rate

This makes (compressed) libxul.so significantly smaller, and closer to the gzipped size (which is better because szip is compressing by chunks), at the expense of decompression time:
- uncompressed: 21311612
- compressed without filter: 11174496
- compressed with filter: 10752640
- gzip -9: 10604797

Since the format allows the filter to be enabled or not, we can decide later if the trade is worth it.

Gerv, the filter code is borrowed from xz-utils, is the comment enough in mozglue/linker/SeekableZStream.cpp? (the file is MPL2 currently)
The original code is
- http://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/simple/arm.c;h=258d870fead34bf32e520f4cdef7132a20a6c610;hb=e7b424d267a34803db8d92a3515528be2ed45abd
- http://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/simple/armthumb.c;h=06c21e4067a56e75105177a2c26a2c30c153d791;hb=e7b424d267a34803db8d92a3515528be2ed45abd
Attachment #720797 - Flags: review?(nfroyd)
Attachment #720797 - Flags: review?(gerv)
(Assignee)

Comment 8

6 years ago
Created attachment 720799 [details] [diff] [review]
Add a dictionary to improve compression rate

The dictionary population is quite stupid, but works well enough.

Combined with filters, this brings us even closer to gzip -9:
- uncompressed: 21311612
- compressed with filter + dict: 10616818
- gzip -9: 10604797
Attachment #720799 - Flags: review?(nfroyd)
Comment on attachment 720797 [details] [diff] [review]
Add smart filters borrowed from xz-utils to improve SeekableZStream compression rate

If it's PD, you don't even legally need the comment - although it's good in case there's a bug in the code and it needs porting either way. So this is fine :-) r=gerv.

Gerv
Attachment #720797 - Flags: review?(gerv) → review+
(Assignee)

Comment 10

6 years ago
Created attachment 721235 [details] [diff] [review]
Sanity check that szip's compressed output can properly be decompressed
Attachment #721235 - Flags: review?(nfroyd)
(Assignee)

Updated

6 years ago
Attachment #720779 - Attachment is obsolete: true
Attachment #720779 - Flags: review?(nfroyd)
(Assignee)

Comment 11

6 years ago
Created attachment 721237 [details] [diff] [review]
Allow to specify a chunk size on szip command line
Attachment #721237 - Flags: review?(nfroyd)
(Assignee)

Updated

6 years ago
Attachment #720782 - Attachment is obsolete: true
Attachment #720782 - Flags: review?(nfroyd)
(Assignee)

Comment 12

6 years ago
Created attachment 721239 [details] [diff] [review]
Add a dictionary to improve compression rate
Attachment #721239 - Flags: review?(nfroyd)
(Assignee)

Updated

6 years ago
Attachment #720799 - Attachment is obsolete: true
Attachment #720799 - Flags: review?(nfroyd)
Attachment #720773 - Flags: review?(nfroyd) → review+
Comment on attachment 720775 [details] [diff] [review]
Add a helper class for a buffer mapped from a file

Review of attachment 720775 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/szip.cpp
@@ +16,5 @@
>  
> +class FileBuffer: public MappedPtr
> +{
> +public:
> +  bool Init(const char *name, bool _writable = false)

Nit: can you make this |writable_| (or similar, I'm not picky about the naming convention) so that it doesn't conflict with the implementation namespace?
Attachment #720775 - Flags: review?(nfroyd) → review+
Comment on attachment 720776 [details] [diff] [review]
Add a command line argument to szip to decompress an existing seekable stream

Review of attachment 720776 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/szip.cpp
@@ +181,5 @@
> +
> +  struct stat st;
> +  int ret = fstat(origBuf.getFd(), &st);
> +  if (ret == -1) {
> +    log("Couldn't stat %s: %s", firstArg[1], strerror(errno));

I think you want to use firstArg[0] here.
Attachment #720776 - Flags: review?(nfroyd) → review+
Comment on attachment 720781 [details] [diff] [review]
Use raw zlib streams when compressing with szip

Review of attachment 720781 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/SeekableZStream.h
@@ +37,5 @@
>    /* Number of chunks */
>    le_uint32 nChunks;
>  
>    /* Size of last chunk (> 0, <= Chunk size) */
> +  le_uint16 lastChunkSize;

Can you add a check to szip.cpp to ensure that we're always going to assign in-range values to lastChunkSize?

Or, hm, looking at szip.cpp, I guess since CHUNK is 16K, we're assuming that lastChunkSize is always going to be < 16K?  A check asserting that would be good.
Attachment #720781 - Flags: review?(nfroyd) → review+
Comment on attachment 721237 [details] [diff] [review]
Allow to specify a chunk size on szip command line

Review of attachment 721237 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/szip.cpp
@@ +245,5 @@
> +      if (!firstArg[0])
> +        break;
> +      if (!GetSize(firstArg[0], &chunkSize) || !chunkSize ||
> +          (chunkSize % 4096) || (chunkSize > 1 << 15)) {
> +        log("Invalid chunk size");

Aha, here's the check I was looking for. :)  If you could make that 15 more obviously dependent on the fields in the header, that would be good.
Attachment #721237 - Flags: review?(nfroyd) → review+
Attachment #721235 - Flags: review?(nfroyd) → review+
Attachment #720797 - Flags: review?(nfroyd) → review+
Comment on attachment 721239 [details] [diff] [review]
Add a dictionary to improve compression rate

Review of attachment 721239 [details] [diff] [review]:
-----------------------------------------------------------------

I think this looks sane, but I'd appreciate clarification on Dictionary's constructor.

::: mozglue/linker/szip.cpp
@@ +90,5 @@
> +    std::vector<stat_pair> statsVec(stats.begin(), stats.end());
> +    std::sort(statsVec.begin(), statsVec.end(), stat_cmp);
> +
> +    piece *dictPieces = reinterpret_cast<piece *>(
> +                        static_cast<void *>(*this));

This code and the loop below made me scratch my head for a while.  I'm guessing this DTRT because the cast to |void*| returns the mapped buffer, yes?  But AFAICS, those bits haven't been initialized by Buffer's constructor...

@@ +130,4 @@
>    {}
>  
> +  const static signed char winSizeLog = 15;
> +  const static size_t winSize = 1 << 15;

|1 << winSizeLog| ?
Attachment #721239 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 18

6 years ago
Created attachment 721503 [details] [diff] [review]
Allow to specify a chunk size on szip command line

Re-requesting review for good measure.
Attachment #721503 - Flags: review?(nfroyd)
(Assignee)

Updated

6 years ago
Attachment #721237 - Attachment is obsolete: true
Comment on attachment 721503 [details] [diff] [review]
Allow to specify a chunk size on szip command line

Works for me, thanks.
Attachment #721503 - Flags: review?(nfroyd) → review+
(Assignee)

Updated

6 years ago
Depends on: 848270
(Assignee)

Updated

6 years ago
Blocks: 848770
You need to log in before you can comment on or make changes to this bug.