Closed Bug 847479 Opened 13 years ago Closed 13 years ago

Improve seekable stream (szip) compression and CLI

Categories

(Core :: mozglue, defect)

All
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla22

People

(Reporter: glandium, Assigned: glandium)

References

Details

Attachments

(8 files, 4 obsolete files)

3.88 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
5.45 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
6.47 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
4.78 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
10.87 KB, patch
froydnj
: review+
gerv
: review+
Details | Diff | Splinter Review
3.79 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
12.93 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
7.79 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
No description provided.
And keep compatibility to decompress old streams.
Attachment #720781 - Flags: review?(nfroyd)
Attachment #720782 - Flags: review?(nfroyd)
This makes (compressed) libxul.so significantly smaller, and closer to the gzipped size (which is better because szip is compressing by chunks), at the expense of decompression time: - uncompressed: 21311612 - compressed without filter: 11174496 - compressed with filter: 10752640 - gzip -9: 10604797 Since the format allows the filter to be enabled or not, we can decide later if the trade is worth it. Gerv, the filter code is borrowed from xz-utils, is the comment enough in mozglue/linker/SeekableZStream.cpp? (the file is MPL2 currently) The original code is - http://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/simple/arm.c;h=258d870fead34bf32e520f4cdef7132a20a6c610;hb=e7b424d267a34803db8d92a3515528be2ed45abd - http://git.tukaani.org/?p=xz.git;a=blob;f=src/liblzma/simple/armthumb.c;h=06c21e4067a56e75105177a2c26a2c30c153d791;hb=e7b424d267a34803db8d92a3515528be2ed45abd
Attachment #720797 - Flags: review?(nfroyd)
Attachment #720797 - Flags: review?(gerv)
The dictionary population is quite stupid, but works well enough. Combined with filters, this brings us even closer to gzip -9: - uncompressed: 21311612 - compressed with filter + dict: 10616818 - gzip -9: 10604797
Attachment #720799 - Flags: review?(nfroyd)
Comment on attachment 720797 [details] [diff] [review] Add smart filters borrowed from xz-utils to improve SeekableZStream compression rate If it's PD, you don't even legally need the comment - although it's good in case there's a bug in the code and it needs porting either way. So this is fine :-) r=gerv. Gerv
Attachment #720797 - Flags: review?(gerv) → review+
Attachment #720779 - Attachment is obsolete: true
Attachment #720779 - Flags: review?(nfroyd)
Attachment #721237 - Flags: review?(nfroyd)
Attachment #720782 - Attachment is obsolete: true
Attachment #720782 - Flags: review?(nfroyd)
Attachment #720799 - Attachment is obsolete: true
Attachment #720799 - Flags: review?(nfroyd)
Attachment #720773 - Flags: review?(nfroyd) → review+
Comment on attachment 720775 [details] [diff] [review] Add a helper class for a buffer mapped from a file Review of attachment 720775 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/linker/szip.cpp @@ +16,5 @@ > > +class FileBuffer: public MappedPtr > +{ > +public: > + bool Init(const char *name, bool _writable = false) Nit: can you make this |writable_| (or similar, I'm not picky about the naming convention) so that it doesn't conflict with the implementation namespace?
Attachment #720775 - Flags: review?(nfroyd) → review+
Comment on attachment 720776 [details] [diff] [review] Add a command line argument to szip to decompress an existing seekable stream Review of attachment 720776 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/linker/szip.cpp @@ +181,5 @@ > + > + struct stat st; > + int ret = fstat(origBuf.getFd(), &st); > + if (ret == -1) { > + log("Couldn't stat %s: %s", firstArg[1], strerror(errno)); I think you want to use firstArg[0] here.
Attachment #720776 - Flags: review?(nfroyd) → review+
Comment on attachment 720781 [details] [diff] [review] Use raw zlib streams when compressing with szip Review of attachment 720781 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/linker/SeekableZStream.h @@ +37,5 @@ > /* Number of chunks */ > le_uint32 nChunks; > > /* Size of last chunk (> 0, <= Chunk size) */ > + le_uint16 lastChunkSize; Can you add a check to szip.cpp to ensure that we're always going to assign in-range values to lastChunkSize? Or, hm, looking at szip.cpp, I guess since CHUNK is 16K, we're assuming that lastChunkSize is always going to be < 16K? A check asserting that would be good.
Attachment #720781 - Flags: review?(nfroyd) → review+
Comment on attachment 721237 [details] [diff] [review] Allow to specify a chunk size on szip command line Review of attachment 721237 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/linker/szip.cpp @@ +245,5 @@ > + if (!firstArg[0]) > + break; > + if (!GetSize(firstArg[0], &chunkSize) || !chunkSize || > + (chunkSize % 4096) || (chunkSize > 1 << 15)) { > + log("Invalid chunk size"); Aha, here's the check I was looking for. :) If you could make that 15 more obviously dependent on the fields in the header, that would be good.
Attachment #721237 - Flags: review?(nfroyd) → review+
Attachment #721235 - Flags: review?(nfroyd) → review+
Attachment #720797 - Flags: review?(nfroyd) → review+
Comment on attachment 721239 [details] [diff] [review] Add a dictionary to improve compression rate Review of attachment 721239 [details] [diff] [review]: ----------------------------------------------------------------- I think this looks sane, but I'd appreciate clarification on Dictionary's constructor. ::: mozglue/linker/szip.cpp @@ +90,5 @@ > + std::vector<stat_pair> statsVec(stats.begin(), stats.end()); > + std::sort(statsVec.begin(), statsVec.end(), stat_cmp); > + > + piece *dictPieces = reinterpret_cast<piece *>( > + static_cast<void *>(*this)); This code and the loop below made me scratch my head for a while. I'm guessing this DTRT because the cast to |void*| returns the mapped buffer, yes? But AFAICS, those bits haven't been initialized by Buffer's constructor... @@ +130,4 @@ > {} > > + const static signed char winSizeLog = 15; > + const static size_t winSize = 1 << 15; |1 << winSizeLog| ?
Attachment #721239 - Flags: review?(nfroyd) → review+
Re-requesting review for good measure.
Attachment #721503 - Flags: review?(nfroyd)
Attachment #721237 - Attachment is obsolete: true
Comment on attachment 721503 [details] [diff] [review] Allow to specify a chunk size on szip command line Works for me, thanks.
Attachment #721503 - Flags: review?(nfroyd) → review+
Depends on: 848270
Blocks: 848770
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: