Treeherder log parser blocks on completed download before parsing



Tree Management
Treeherder: Data Ingestion
3 years ago
3 years ago


(Reporter: wlach, Assigned: wlach)




(2 attachments)

Currently the Treeherder log parser waits until it has downloaded the full log before unzipping and processing it. We could potentially make it slightly faster by starting to parse the log before the download is complete.

I got curious about how much this could help us, so I wrote something up. Unfortunately my benchmarking suggests it's not particularly helpful (it usually shaves between 0.1 and 0.4 seconds), but perhaps it's worth adding anyway.
Created attachment 8593669
Benchmark script

On my workstation, I get this set of 10 results on a largish log without the "optimization":

(eideticker)wlach@eideticker:~/src/treeherder-service$ python 
[0.8731870651245117, 0.9529199600219727, 0.8774969577789307, 0.9509341716766357, 0.8769950866699219, 0.9473130702972412, 0.863955020904541, 0.9525530338287354, 1.4248991012573242, 1.0506041049957275]

And this set of results with the optimization:

(eideticker)wlach@eideticker:~/src/treeherder-service$ python 
[1.0908939838409424, 0.8961830139160156, 0.7244958877563477, 0.8918678760528564, 0.7274820804595947, 0.9782431125640869, 1.1817591190338135, 0.9811978340148926, 0.727877140045166, 0.9805409908294678]
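
One way to summarize the two samples above is to compare their means and medians. This is only a sketch: the variable names are made up for illustration, and with 10 runs each and a few visible outliers the result should be read as a rough indication rather than a precise measurement.

```python
from statistics import mean, median

# Timings in seconds, copied from the two runs above.
without_opt = [
    0.8731870651245117, 0.9529199600219727, 0.8774969577789307,
    0.9509341716766357, 0.8769950866699219, 0.9473130702972412,
    0.863955020904541, 0.9525530338287354, 1.4248991012573242,
    1.0506041049957275,
]
with_opt = [
    1.0908939838409424, 0.8961830139160156, 0.7244958877563477,
    0.8918678760528564, 0.7274820804595947, 0.9782431125640869,
    1.1817591190338135, 0.9811978340148926, 0.727877140045166,
    0.9805409908294678,
]

# Positive values mean the streaming version was faster.
mean_saving = mean(without_opt) - mean(with_opt)
median_saving = median(without_opt) - median(with_opt)

print("mean saving:   %.3f s" % mean_saving)
print("median saving: %.3f s" % median_saving)
```

The median is less sensitive to the occasional slow run than the mean, so it is worth looking at both.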
Created attachment 8593670

This is like the least urgent thing ever, since it doesn't improve performance that much (see above for benchmarks). I'm kind of on the fence about whether we should even commit it, as it makes the code slightly more complex. Mostly I just wanted to get it out there so people would know that I tried it. Anyway, I wouldn't mind a second opinion.
Attachment #8593670 - Flags: review?(mdoglio)
Testing again against a web server on my local machine (i.e., a basically instantaneous download), I still get an average difference of about 0.1 seconds. I guess it partly depends on (1) how fast the machine is, (2) how saturated the CPU is already, and (3) how long the download takes.

This algorithm will show the most improvement on a slow machine with an unsaturated CPU and slow network performance (since we'll take advantage of the long time it takes to download the file to get a jump start on decompression). If the network is the dominating factor (which it seems to be, at least on my workstation), or the CPU is already saturated, expect little improvement from doing things this way.
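
To illustrate the general idea (this is not the code from the attached patch), here is a minimal sketch of decompressing a gzipped log chunk by chunk and handing complete lines to a consumer as they become available. The names `iter_lines_streaming` and `fake_download` are hypothetical, and the chunked "download" is simulated in memory instead of coming from a real HTTP response.

```python
import gzip
import io
import zlib


def iter_lines_streaming(chunks):
    """Yield decoded log lines as soon as each compressed chunk arrives."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""
    for chunk in chunks:
        pending += decompressor.decompress(chunk)
        # Emit every complete line; keep the trailing partial line for later.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line.decode("utf-8", errors="replace")
    # Collect anything still buffered, e.g. a last line with no newline.
    pending += decompressor.flush()
    for line in pending.splitlines():
        yield line.decode("utf-8", errors="replace")


def fake_download(payload, chunk_size=64):
    """Hypothetical stand-in for a network response read in small chunks."""
    stream = io.BytesIO(payload)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Build a small gzipped "log" and stream it back out line by line.
log_text = "".join("line %d: TEST-PASS\n" % i for i in range(1000))
compressed = gzip.compress(log_text.encode("utf-8"))
streamed_lines = list(iter_lines_streaming(fake_download(compressed)))
```

With a real download, the chunks would come from the response object, and the parser would start consuming lines while later chunks are still in flight. That overlap is where any time saving comes from.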
I updated the PR to include a more realistic test program (which we can now run any time) that actually uses Treeherder's log parser artifact builder classes. In this case, the difference in speed tends to be greater (about 0.2 seconds on average):


(venv)vagrant@local:~/treeherder-service$ ./ test_parse_log --profile 10
Timings: [1.5258538722991943, 1.6284010410308838, 1.7903828620910645, 1.7481331825256348, 2.2356438636779785, 1.8331339359283447, 1.715242862701416, 2.031848907470703, 1.862015962600708, 1.8552160263061523]
Average: 1.82258725166
Total: 18.2258725166


(venv)vagrant@local:~/treeherder-service$ ./ test_parse_log --profile 10
Timings: [1.6193771362304688, 1.614927053451538, 1.5895788669586182, 1.4775769710540771, 1.4487788677215576, 1.5538148880004883, 1.507270097732544, 2.427928924560547, 1.3921799659729004, 1.4321610927581787]
Average: 1.60635938644
Total: 16.0635938644
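
For reference, a repeated-timing harness that produces output in the same shape as the runs above could look roughly like the sketch below. It is an assumption about the general approach, not the actual test program: `profile`, `report`, and `fake_parse` are hypothetical names, and `fake_parse` stands in for the real log parsing work.

```python
import time


def profile(func, runs):
    """Call func `runs` times and return the wall-clock time of each call."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def report(timings):
    """Format timings in the Timings / Average / Total layout shown above."""
    total = sum(timings)
    average = total / len(timings)
    return "Timings: %s\nAverage: %s\nTotal: %s" % (timings, average, total)


def fake_parse():
    # Hypothetical placeholder workload; a real run would parse a log here.
    sum(range(100000))


timings = profile(fake_parse, 10)
print(report(timings))
```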
Attachment #8593670 - Flags: review?(mdoglio) → review+

Comment 5

3 years ago
Commit pushed to master at
Bug 1155451 - Don't block on full log download before parse
Last Resolved: 3 years ago
Resolution: --- → FIXED