Closed Bug 847479 Opened 8 years ago Closed 8 years ago

Improve seekable stream (szip) compression and CLI

Categories

(Core :: mozglue, defect)

All
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla22

People

(Reporter: glandium, Assigned: glandium)

References

Details

Attachments

(8 files, 4 obsolete files)

3.88 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
5.45 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
6.47 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
4.78 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
10.87 KB, patch
froydnj
: review+
gerv
: review+
Details | Diff | Splinter Review
3.79 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
12.93 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
7.79 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
No description provided.
And keep compatibility to decompress old streams.
Attachment #720781 - Flags: review?(nfroyd)
Attachment #720782 - Flags: review?(nfroyd)
This makes (compressed) libxul.so significantly smaller, and closer to the gzipped size (which is better because szip is compressing by chunks), at the expense of decompression time:
- uncompressed: 21311612
- compressed without filter: 11174496
- compressed with filter: 10752640
- gzip -9: 10604797

Since the format allows the filter to be enabled or not, we can decide later if the trade is worth it.

Gerv, the filter code is borrowed from xz-utils, is the comment enough in mozglue/linker/SeekableZStream.cpp? (the file is MPL2 currently)
The original code is
- http://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/simple/arm.c;h=258d870fead34bf32e520f4cdef7132a20a6c610;hb=e7b424d267a34803db8d92a3515528be2ed45abd
- http://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/simple/armthumb.c;h=06c21e4067a56e75105177a2c26a2c30c153d791;hb=e7b424d267a34803db8d92a3515528be2ed45abd
Attachment #720797 - Flags: review?(nfroyd)
Attachment #720797 - Flags: review?(gerv)
The dictionary population is quite stupid, but works well enough.

Combined with filters, this brings us even closer to gzip -9:
- uncompressed: 21311612
- compressed with filter + dict: 10616818
- gzip -9: 10604797
Attachment #720799 - Flags: review?(nfroyd)
Comment on attachment 720797 [details] [diff] [review]
Add smart filters borrowed from xz-utils to improve SeekableZStream compression rate

If it's PD, you don't even legally need the comment - although it's good in case there's a bug in the code and it needs porting either way. So this is fine :-) r=gerv.

Gerv
Attachment #720797 - Flags: review?(gerv) → review+
Attachment #720779 - Attachment is obsolete: true
Attachment #720779 - Flags: review?(nfroyd)
Attachment #721237 - Flags: review?(nfroyd)
Attachment #720782 - Attachment is obsolete: true
Attachment #720782 - Flags: review?(nfroyd)
Attachment #720799 - Attachment is obsolete: true
Attachment #720799 - Flags: review?(nfroyd)
Attachment #720773 - Flags: review?(nfroyd) → review+
Comment on attachment 720775 [details] [diff] [review]
Add a helper class for a buffer mapped from a file

Review of attachment 720775 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/szip.cpp
@@ +16,5 @@
>  
> +class FileBuffer: public MappedPtr
> +{
> +public:
> +  bool Init(const char *name, bool _writable = false)

Nit: can you make this |writable_| (or similar, I'm not picky about the naming convention) so that it doesn't conflict with the implementation namespace?
Attachment #720775 - Flags: review?(nfroyd) → review+
Comment on attachment 720776 [details] [diff] [review]
Add a command line argument to szip to decompress an existing seekable stream

Review of attachment 720776 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/szip.cpp
@@ +181,5 @@
> +
> +  struct stat st;
> +  int ret = fstat(origBuf.getFd(), &st);
> +  if (ret == -1) {
> +    log("Couldn't stat %s: %s", firstArg[1], strerror(errno));

I think you want to use firstArg[0] here.
Attachment #720776 - Flags: review?(nfroyd) → review+
Comment on attachment 720781 [details] [diff] [review]
Use raw zlib streams when compressing with szip

Review of attachment 720781 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/SeekableZStream.h
@@ +37,5 @@
>    /* Number of chunks */
>    le_uint32 nChunks;
>  
>    /* Size of last chunk (> 0, <= Chunk size) */
> +  le_uint16 lastChunkSize;

Can you add a check to szip.cpp to ensure that we're always going to assign in-range values to lastChunkSize?

Or, hm, looking at szip.cpp, I guess since CHUNK is 16K, we're assuming that lastChunkSize is always going to be < 16K?  A check asserting that would be good.
Attachment #720781 - Flags: review?(nfroyd) → review+
Comment on attachment 721237 [details] [diff] [review]
Allow to specify a chunk size on szip command line

Review of attachment 721237 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/szip.cpp
@@ +245,5 @@
> +      if (!firstArg[0])
> +        break;
> +      if (!GetSize(firstArg[0], &chunkSize) || !chunkSize ||
> +          (chunkSize % 4096) || (chunkSize > 1 << 15)) {
> +        log("Invalid chunk size");

Aha, here's the check I was looking for. :)  If you could make that 15 more obviously dependent on the fields in the header, that would be good.
Attachment #721237 - Flags: review?(nfroyd) → review+
Attachment #721235 - Flags: review?(nfroyd) → review+
Attachment #720797 - Flags: review?(nfroyd) → review+
Comment on attachment 721239 [details] [diff] [review]
Add a dictionary to improve compression rate

Review of attachment 721239 [details] [diff] [review]:
-----------------------------------------------------------------

I think this looks sane, but I'd appreciate clarification on Dictionary's constructor.

::: mozglue/linker/szip.cpp
@@ +90,5 @@
> +    std::vector<stat_pair> statsVec(stats.begin(), stats.end());
> +    std::sort(statsVec.begin(), statsVec.end(), stat_cmp);
> +
> +    piece *dictPieces = reinterpret_cast<piece *>(
> +                        static_cast<void *>(*this));

This code and the loop below made me scratch my head for a while.  I'm guessing this DTRT because the cast to |void*| returns the mapped buffer, yes?  But AFAICS, those bits haven't been initialized by Buffer's constructor...

@@ +130,4 @@
>    {}
>  
> +  const static signed char winSizeLog = 15;
> +  const static size_t winSize = 1 << 15;

|1 << winSizeLog| ?
Attachment #721239 - Flags: review?(nfroyd) → review+
Re-requesting review for good measure.
Attachment #721503 - Flags: review?(nfroyd)
Attachment #721237 - Attachment is obsolete: true
Comment on attachment 721503 [details] [diff] [review]
Allow to specify a chunk size on szip command line

Works for me, thanks.
Attachment #721503 - Flags: review?(nfroyd) → review+
Depends on: 848270
Blocks: 848770
You need to log in before you can comment on or make changes to this bug.