Closed Bug 747488 Opened 13 years ago Closed 7 years ago

lzma performance testing

Categories

(Tamarin Graveyard :: Library, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: pnkfelix, Assigned: dschaffe)

Attachments

(4 files)

We need performance tests for lzma. In particular we need to double-check that the revised implementation I have suggested (patch S on Bug 729336) is competitive with the original implementation.
Blocks: 729336
irichter reports that there is a 2x slow-down when using a large .zip file as input to compress. I suspect this is as-designed for the implementation technique I used (since it goes through 2 passes rather than 1 pass for non-compressible input); I am working on confirming that claim now.
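A comparison like the one reported above can be sketched as a small timing harness. This is only an illustration under stated assumptions: it uses Python's standard `lzma` module rather than the Tamarin implementation, the two "implementations" are just two presets of that module standing in for the original and revised code, and `os.urandom` output stands in for a large .zip file. The names `best_time` and `compare` are hypothetical.

```python
import lzma
import os
import time


def best_time(func, data, repeats=3):
    """Return the fastest wall-clock time in seconds over `repeats` calls of func(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


def compare(impl_a, impl_b, inputs, repeats=3):
    """Map each input name to (time_a, time_b, time_b / time_a)."""
    results = {}
    for name, data in inputs.items():
        t_a = best_time(impl_a, data, repeats)
        t_b = best_time(impl_b, data, repeats)
        results[name] = (t_a, t_b, t_b / t_a)
    return results


# Stand-ins only: two presets of Python's lzma, not the Tamarin code paths.
def original(data):
    return lzma.compress(data, preset=1)


def revised(data):
    return lzma.compress(data, preset=6)


inputs = {
    "compressible": b"0123456789" * 20_000,
    # Random bytes approximate already-compressed input such as a .zip file.
    "incompressible": os.urandom(200_000),
}

for name, (t_a, t_b, ratio) in compare(original, revised, inputs).items():
    print(f"{name:15s} a={t_a:.4f}s b={t_b:.4f}s b/a={ratio:.2f}")
```

Taking the best of several runs reduces noise from other processes. Reporting the ratio separately for compressible and incompressible input is what would show whether a slow-down is confined to the non-compressible case.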
This is a helper script I hacked together. Its purpose is to generate random inputs of arbitrary length, allowing one to turn a knob to vary the amount of variability in the generated byte sequence. You feed it three arguments: the output file name, the size of the domain to draw from for each byte (in range [1,256]), and the number of bytes to emit. (Choosing a smaller domain for the second argument means that the result is likely to be more compressible, especially as the length gets large.)

As an aesthetic nicety, the domain starts at byte 48 (== ASCII '0') and goes up, wrapping around once you request a domain larger than 208 elements. This is just to make it easy to constrain the generated sequences to just decimal digits, or an interesting and printable subset of ASCII codes.

Example runs:

  % avmshell bigrand.abc -- /dev/stdout 1 20 ; echo
  00000000000000000000
  % avmshell bigrand.abc -- /dev/stdout 2 20 ; echo
  10001001011010001000
  % avmshell bigrand.abc -- /dev/stdout 2 20 ; echo
  00110101100110010010
  % avmshell bigrand.abc -- /dev/stdout 10 20 ; echo
  24883493077522345450
  % avmshell bigrand.abc -- /dev/stdout 78 20 ; echo
  EUP4Tm4G`?qoQDEgBhDP
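For readers without the attachment, the behavior described above can be approximated in a few lines of Python. This is a minimal sketch, not the attached ActionScript: the function names are hypothetical, the `seed` parameter is an addition for repeatability, and the choice of Python's `random` module as the byte source is an assumption.

```python
import random

BASE = 48  # ASCII '0': the first byte of the domain, per the description


def bigrand_bytes(domain, length, seed=None):
    """Return `length` bytes drawn uniformly from `domain` consecutive byte values.

    The domain starts at byte 48 and wraps past 255 back to 0, so domains
    larger than 208 reuse the low byte values.
    """
    if not 1 <= domain <= 256:
        raise ValueError("domain must be in [1, 256]")
    if length < 0:
        raise ValueError("length must be non-negative")
    rng = random.Random(seed)  # seed=None gives a different sequence each run
    return bytes((BASE + rng.randrange(domain)) % 256 for _ in range(length))


def write_bigrand(path, domain, length, seed=None):
    """Write the generated bytes to `path` (mirrors the three-argument usage)."""
    with open(path, "wb") as out:
        out.write(bigrand_bytes(domain, length, seed))
```

A domain of 1 always yields byte 48, a domain of 10 yields only the digits '0' through '9', and a domain of 78 covers bytes 48 through 125, which is consistent with the printable characters in the last example run.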
Assignee: nobody → fklockii
Or perhaps a better illustration of the point of the utility (how compressibility varies with the domain size):

  % FILE=twomill.txt DOM=2 LEN=1000000 ; rm -f $FILE $FILE.gz && avmshell bigrand.abc -- $FILE $DOM $LEN && gzip -c $FILE > $FILE.gz && ls -l $FILE $FILE.gz
  -rw-r--r-- 1 fklockii staff 1000000 Apr 25 15:37 twomill.txt
  -rw-r--r-- 1 fklockii staff 159016 Apr 25 15:37 twomill.txt.gz

  % FILE=tenmill.txt DOM=10 LEN=1000000 ; rm -f $FILE $FILE.gz && avmshell bigrand.abc -- $FILE $DOM $LEN && gzip -c $FILE > $FILE.gz && ls -l $FILE $FILE.gz
  -rw-r--r-- 1 fklockii staff 1000000 Apr 25 15:37 tenmill.txt
  -rw-r--r-- 1 fklockii staff 470625 Apr 25 15:37 tenmill.txt.gz

  % FILE=maxmill.txt DOM=256 LEN=1000000 ; rm -f $FILE $FILE.gz && avmshell bigrand.abc -- $FILE $DOM $LEN && gzip -c $FILE > $FILE.gz && ls -l $FILE $FILE.gz
  -rw-r--r-- 1 fklockii staff 1000000 Apr 25 15:38 maxmill.txt
  -rw-r--r-- 1 fklockii staff 1000185 Apr 25 15:38 maxmill.txt.gz
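The sizes in the listings can be checked against a simple bound: bytes drawn uniformly and independently from a domain of size DOM carry log2(DOM) bits each, so no compressor should do better than a fraction log2(DOM)/8 of the original size. That gives 0.125 for DOM=2 (gzip reached about 0.159), about 0.415 for DOM=10 (gzip reached about 0.471), and 1.0 for DOM=256 (gzip grew the file slightly). The sketch below repeats the experiment in Python, assuming `zlib` as a stand-in for gzip and a hypothetical `random_bytes` helper in place of bigrand.abc.

```python
import math
import random
import zlib


def random_bytes(domain, length, seed=0):
    """Uniform random bytes from `domain` values starting at byte 48 (hypothetical helper)."""
    rng = random.Random(seed)
    return bytes((48 + rng.randrange(domain)) % 256 for _ in range(length))


def compressed_fraction(data):
    """Compressed size divided by original size, using zlib at level 9."""
    return len(zlib.compress(data, 9)) / len(data)


def entropy_floor(domain):
    """Best achievable fraction for uniform independent bytes: log2(domain) / 8."""
    return math.log2(domain) / 8


for domain in (2, 10, 256):
    data = random_bytes(domain, 100_000)
    print(f"domain={domain:3d}  floor={entropy_floor(domain):.3f}  "
          f"zlib={compressed_fraction(data):.3f}")
```

The gap between the floor and the measured fraction reflects the overhead of the DEFLATE encoding, not any structure in the data, since the input has none.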
This is the ad-hoc suite I e-mailed to Ingo on April 23rd. (I had hoped to take the time to clean this up before posting it here. But at this point it is easiest to just post it here.)
FYI: some additional background and dialogue is available to Adobe internally here: https://zerowing.corp.adobe.com/x/34LjJw
Thanks for your feedback, Felix. I used your implementation of the seeded random number generator to get rid of all test files. For the text files I used a different approach: I put some sample text into bytearray-test-helper.as and generate text input from this sample. I hope that the generated text data will be as realistic as possible in terms of word distribution, and that the lzma implementation behaves on this artificial test data much as it would on real text.
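The idea of generating text from an embedded sample with a seeded generator can be sketched as follows. This is an assumption-laden illustration, not the code in bytearray-test-helper.as: the sample string, the function name `generate_text`, and the use of Python's `random.Random` as the seeded generator are all hypothetical.

```python
import random

# Hypothetical sample; the real helper embeds its own text.
SAMPLE_TEXT = (
    "the quick brown fox jumps over the lazy dog while the other dog "
    "watches the fox and the fox ignores the dog"
)


def generate_text(sample, word_count, seed):
    """Return `word_count` words drawn from `sample`, keeping its word frequencies.

    Drawing from the full word list (duplicates included) means a word that
    appears more often in the sample is proportionally more likely to be chosen.
    """
    words = sample.split()
    rng = random.Random(seed)  # fixed seed makes the output reproducible
    return " ".join(rng.choice(words) for _ in range(word_count))


text = generate_text(SAMPLE_TEXT, 12, seed=42)
print(text)
```

One caveat with this approach: words are drawn independently, so the output matches the sample's word distribution but not its phrase structure. Dictionary-based compressors such as LZMA benefit from repeated multi-word sequences, so text generated this way may compress somewhat differently from real prose.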
Attachment #630949 - Flags: review?(fklockii)
Comment on attachment 630949 (simplify performance testcases). Review of attachment 630949: In its current form, the output from the code doesn't match the format of our other performance tests (because it has invented a completely new set of metric names to tag the output from the tests), and so I'm not sure what value we get from it. I guess we might still be able to do run-to-run comparisons of different builds of the shell. So hey, if this is the way QE thinks it can support this, that is fine. That's the main reason I am R+'ing this. (Maybe the reality is that our performance test infrastructure needs revision anyway, and as part of that work, we should support a more flexible encoding of performance tests than one file per individual test, e.g. perhaps the many-tests-in-one-file layout that this code is clearly calling out for.)
Attachment #630949 - Flags: review?(fklockii) → review+
changeset: 7418:d582d61e66fc
user: Brent Baker <brbaker@adobe.com>
summary: Bug 747488: performance test cases/media for LZMA feature (p=ingo.richter)
http://hg.mozilla.org/tamarin-redux/rev/d582d61e66fc
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(Wait; I'll hold off until attachment 630949 is pushed before I close this. Sorry.)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reassigning to Dan, so he can decide whether he wants to land attachment 630949 or just close as is.
Assignee: fklockii → dschaffe
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 7 years ago
Resolution: --- → WONTFIX