Currently the treeherder log parser waits until it has downloaded the full log before decompressing and processing it. We could potentially make it slightly faster if we started parsing before the download was complete. I got curious about how much this could help us, so I wrote something up. Unfortunately my benchmarking suggests it's not particularly helpful (it usually shaves off between 0.1 and 0.4 seconds), but perhaps it's worth adding anyway.
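A minimal sketch of the general idea, assuming the log is served as a single gzip stream and read through a file-like object such as an HTTP response: instead of reading the whole body and then decompressing it, each chunk is fed to a zlib decompression object as it arrives, and complete lines are yielded right away. `iter_gzip_lines` and `chunk_size` are hypothetical names for illustration; this is not treeherder's actual implementation.

```python
import zlib
from typing import BinaryIO, Iterator


def iter_gzip_lines(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield decompressed lines from a gzip stream while it is still being read.

    Hypothetical helper: a sketch of incremental decompression, not
    treeherder's actual code.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""  # partial line carried over between chunks
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += decompressor.decompress(chunk)
        # Emit every complete line; keep the trailing fragment for the next chunk.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line + b"\n"
    # Flush whatever zlib is still buffering; it may contain several lines.
    pending += decompressor.flush()
    *lines, pending = pending.split(b"\n")
    for line in lines:
        yield line + b"\n"
    if pending:
        yield pending  # final line without a trailing newline
```

The response returned by `urllib.request.urlopen` supports `read(n)`, so it could be passed in directly. The parser would then see the first lines of the log while later chunks are still downloading.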
Created attachment 8593669
Benchmark script

On my workstation, I get this set of 10 results on a largish log without the "optimization":

(eideticker)wlach@eideticker:~/src/treeherder-service$ python t.py http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win64/1429215986/mozilla-central-win64-bm82-build1-build170.txt.gz
[0.8731870651245117, 0.9529199600219727, 0.8774969577789307, 0.9509341716766357, 0.8769950866699219, 0.9473130702972412, 0.863955020904541, 0.9525530338287354, 1.4248991012573242, 1.0506041049957275]
0.977085757256

And this set of results with the optimization:

(eideticker)wlach@eideticker:~/src/treeherder-service$ python t.py http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win64/1429215986/mozilla-central-win64-bm82-build1-build170.txt.gz
[1.0908939838409424, 0.8961830139160156, 0.7244958877563477, 0.8918678760528564, 0.7274820804595947, 0.9782431125640869, 1.1817591190338135, 0.9811978340148926, 0.727877140045166, 0.9805409908294678]
0.918054103851
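The attached script itself isn't reproduced here. A minimal harness that produces the same kind of output (a list of per-run times followed by their mean) might look like the following; `time_runs` is a hypothetical name, and `fn` stands in for whatever downloads and parses the log once.

```python
import time
from typing import Callable, List, Tuple


def time_runs(fn: Callable[[], object], runs: int = 10) -> Tuple[List[float], float]:
    """Call fn() `runs` times; return per-run wall-clock times and their mean.

    Hypothetical harness, not the attached t.py.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)
    return timings, sum(timings) / len(timings)
```

Printing the returned list and then the mean would reproduce the two-line output shown above. `time.perf_counter` is used rather than `time.time` because it is monotonic and unaffected by system clock adjustments.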
Created attachment 8593670
PR

This is like the least urgent thing ever, since it doesn't improve performance that much (see above for benchmarks). I'm kind of on the fence about whether we should even commit it, as it makes the code slightly more complex. I mostly just wanted to get it out there so people know that I tried it. Anyway, I wouldn't mind a second opinion.
Testing again against a web server on my local machine (i.e., a basically instantaneous download), I still get an average difference of about 0.1 seconds. I guess it partly depends on (1) how fast the machine is, (2) how saturated the CPU already is, and (3) how long the download takes. This approach will show the most improvement on a slow machine with an unsaturated CPU and slow network performance, since we can use the time it takes to download the file to get a head start on decompression. If the network is the dominating factor (which it seems to be, at least on my workstation), or the CPU is already saturated, expect little improvement from doing things this way.
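The trade-off described above can be made concrete with an idealized model. This is an assumption for illustration, not something measured here: done sequentially, a log costs download time plus processing time; with perfect overlap it costs roughly the larger of the two, so the best-case saving equals the smaller of the two. On a saturated CPU the processing can't really proceed in parallel, so actual savings would fall below this bound. `overlap_savings` is a hypothetical helper.

```python
from typing import Tuple


def overlap_savings(download_s: float, process_s: float) -> Tuple[float, float, float]:
    """Idealized timings for downloading then processing vs. overlapping both.

    Assumes processing can keep pace with incoming chunks on an idle CPU.
    Returns (sequential, pipelined, saved) in seconds.
    """
    sequential = download_s + process_s
    pipelined = max(download_s, process_s)  # the slower stage dominates
    return sequential, pipelined, sequential - pipelined  # saved == min(...)
```

For example, with a 0.8 s download and 0.2 s of decompression and parsing, the model gives 1.0 s sequentially, 0.8 s pipelined, and a saving of 0.2 s. When the download dominates, the saving is capped by the (small) processing time, which matches the observation that network-bound runs improve little.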
I updated the PR to include a more realistic test program (which we can now run any time) that actually uses treeherder's log parser artifact builder classes. In this case, the difference in speed tends to be greater (about 0.2 seconds on average):

Before:

(venv)vagrant@local:~/treeherder-service$ ./manage.py test_parse_log --profile 10 http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/1429500689/mozilla-central-linux-bm74-build1-build125.txt.gz
Timings: [1.5258538722991943, 1.6284010410308838, 1.7903828620910645, 1.7481331825256348, 2.2356438636779785, 1.8331339359283447, 1.715242862701416, 2.031848907470703, 1.862015962600708, 1.8552160263061523]
Average: 1.82258725166
Total: 18.2258725166

After:

(venv)vagrant@local:~/treeherder-service$ ./manage.py test_parse_log --profile 10 http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux/1429500689/mozilla-central-linux-bm74-build1-build125.txt.gz
Timings: [1.6193771362304688, 1.614927053451538, 1.5895788669586182, 1.4775769710540771, 1.4487788677215576, 1.5538148880004883, 1.507270097732544, 2.427928924560547, 1.3921799659729004, 1.4321610927581787]
Average: 1.60635938644
Total: 16.0635938644
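As a quick sanity check, the reported averages and the gap between them can be recomputed from the timings listed above with Python's standard `statistics` module:

```python
from statistics import mean

# Timings (seconds) copied from the two `test_parse_log --profile 10` runs.
before = [1.5258538722991943, 1.6284010410308838, 1.7903828620910645,
          1.7481331825256348, 2.2356438636779785, 1.8331339359283447,
          1.715242862701416, 2.031848907470703, 1.862015962600708,
          1.8552160263061523]
after = [1.6193771362304688, 1.614927053451538, 1.5895788669586182,
         1.4775769710540771, 1.4487788677215576, 1.5538148880004883,
         1.507270097732544, 2.427928924560547, 1.3921799659729004,
         1.4321610927581787]

diff = mean(before) - mean(after)   # absolute saving per parse, in seconds
relative = diff / mean(before)      # saving as a fraction of the "before" time
```

This gives a difference of about 0.22 seconds, or roughly 12% of the "before" average.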
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/ae995bf6fa2fdf6c6ff365525adfba0774daa5a7
Bug 1155451 - Don't block on full log download before parse