Stop using Cython for the log parser if the performance gain is not significant



Tree Management
Treeherder: Data Ingestion
3 years ago
3 years ago


(Reporter: emorley, Assigned: emorley)




(1 attachment)



3 years ago
We currently use Cython for some (but not all) of our log parser code, in an attempt to improve performance.

I'm curious to know how much of a perf benefit we're getting, and so whether it's worth using Cython with the rest of the log parser files (or any other parts of Treeherder)? 

Or alternatively if it's not really worth it (especially if we start to parse fewer logs given bug 1080760), we could just stop using Cython, which would:
1) Simplify the travis/vagrant/prod build
2) Reduce the number of steps/custom things someone contributing to Treeherder has to worry about (eg getting caught out by changes not taking affect until the log_parser files are rebuilt after changes are made to them)

If we're only seeing a 5-20% perf improvement I'd be inclined to ditch Cython - obviously if it's giving us a 2x+ speedup, that's a different matter (and perhaps we should see if it's also helpful for parts of say builds-4hr handling).
I really wonder whether we're getting any substantial performance benefit from using cython, log parsing is basically just doing a lot of regexes, which is fast in python. Then again, someone decided that this was worth doing at some point so... I dunno. Maybe Cam or Mauro would have more to say.

If you're interested in profiling this, you could use my test_parse_log management command:

(be careful to control for the speed of downloading the actual logs, which can lead to dramatically different results -- consider perhaps downloading them in advance and log parsing against the local network)

Comment 2

3 years ago
Also the move from python 2.6 to 2.7 might have helped reduce the gap :-)

Comment 3

3 years ago
It appears that we're only speeding up the log parse by ~2% (this is total parse time, including download; I didn't use the log locally since the results were consistent enough it wasn't necessary).

First batch of 50: 1.4435 seconds/log
Later batch of 50: 1.4412 seconds/log

Without Cython...
First batch of 50: 1.4720 seconds/log
Later batch of 50: 1.4711 seconds/log

For full output, see:

As such, I think we should just ditch Cython for the log parser (IMO anything less than say 20% isn't worth the added complexity).
Assignee: nobody → emorley
Priority: -- → P3

Comment 4

3 years ago
Created attachment 8627462 [details] [review]
Stop using Cython to build the log parser
Attachment #8627462 - Flags: review?(wlachance)
Comment on attachment 8627462 [details] [review]
Stop using Cython to build the log parser

It's nice to reduce complexity where we can!
Attachment #8627462 - Flags: review?(wlachance) → review+

Comment 6

3 years ago
Commit pushed to master at
Bug 1169916 - Stop using Cython to build the log parser

Since it only speeds up parsing by a few percent of total runtime, and
is therefore not worth the added complexity for deployment and local
hack-test-debug cycles when working on the log parser.

The .gitignore and entries will be removed in a later commit,
once the stage/prod src directories have been cleaned up.

Comment 7

3 years ago
This looks fine to me on prod perf wise:[end]=1435683371&tw[start]=1435672571&type=other#id=5b224f746865725472616e73616374696f6e2f43656c6572792f70617273652d6c6f67222c22225d

I've also cleaned up the build/ directory in the src dir on stage+prod.
Last Resolved: 3 years ago
Resolution: --- → FIXED


3 years ago
Blocks: 1178832


3 years ago
Summary: Evaluate our use of Cython (eg for the log parser) → Stop using Cython for the log parser if the performance gain is not significant
You need to log in before you can comment on or make changes to this bug.