Closed
Bug 1169916
Opened 10 years ago
Closed 10 years ago
Stop using Cython for the log parser if the performance gain is not significant
Categories
(Tree Management :: Treeherder: Data Ingestion, defect, P3)
Tree Management
Treeherder: Data Ingestion
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
References
Details
Attachments
(1 file)
We currently use Cython for some (but not all) of our log parser code, in an attempt to improve performance.
I'm curious to know how much of a perf benefit we're getting, and so whether it's worth using Cython with the rest of the log parser files (or any other parts of Treeherder)?
Or alternatively if it's not really worth it (especially if we start to parse fewer logs given bug 1080760), we could just stop using Cython, which would:
1) Simplify the travis/vagrant/prod build
2) Reduce the number of steps/custom things someone contributing to Treeherder has to worry about (eg getting caught out by changes not taking affect until the log_parser files are rebuilt after changes are made to them)
If we're only seeing a 5-20% perf improvement I'd be inclined to ditch Cython - obviously if it's giving us a 2x+ speedup, that's a different matter (and perhaps we should see if it's also helpful for parts of say builds-4hr handling).
Comment 1•10 years ago
|
||
I really wonder whether we're getting any substantial performance benefit from using cython, log parsing is basically just doing a lot of regexes, which is fast in python. Then again, someone decided that this was worth doing at some point so... I dunno. Maybe Cam or Mauro would have more to say.
If you're interested in profiling this, you could use my test_parse_log management command:
https://github.com/mozilla/treeherder/blob/master/treeherder/etl/management/commands/test_parse_log.py
(be careful to control for the speed of downloading the actual logs, which can lead to dramatically different results -- consider perhaps downloading them in advance and log parsing against the local network)
Assignee | ||
Comment 2•10 years ago
|
||
Also the move from python 2.6 to 2.7 might have helped reduce the gap :-)
Assignee | ||
Comment 3•10 years ago
|
||
It appears that we're only speeding up the log parse by ~2% (this is total parse time, including download; I didn't use the log locally since the results were consistent enough it wasn't necessary).
Cython...
First batch of 50: 1.4435 seconds/log
Later batch of 50: 1.4412 seconds/log
Without Cython...
First batch of 50: 1.4720 seconds/log
Later batch of 50: 1.4711 seconds/log
For full output, see:
https://emorley.pastebin.mozilla.org/8838088
As such, I think we should just ditch Cython for the log parser (IMO anything less than say 20% isn't worth the added complexity).
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Priority: -- → P3
Assignee | ||
Comment 4•10 years ago
|
||
Attachment #8627462 -
Flags: review?(wlachance)
Comment 5•10 years ago
|
||
Comment on attachment 8627462 [details] [review]
Stop using Cython to build the log parser
It's nice to reduce complexity where we can!
Attachment #8627462 -
Flags: review?(wlachance) → review+
Comment 6•10 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/58813b0c51969b9471b8afe1b8624eaf46cde0c5
Bug 1169916 - Stop using Cython to build the log parser
Since it only speeds up parsing by a few percent of total runtime, and
is therefore not worth the added complexity for deployment and local
hack-test-debug cycles when working on the log parser.
The .gitignore and update.py entries will be removed in a later commit,
once the stage/prod src directories have been cleaned up.
Assignee | ||
Comment 7•10 years ago
|
||
This looks fine to me on prod perf wise:
https://rpm.newrelic.com/accounts/677903/applications/4180461/transactions?show_browser=false&tw[end]=1435683371&tw[start]=1435672571&type=other#id=5b224f746865725472616e73616374696f6e2f43656c6572792f70617273652d6c6f67222c22225d
I've also cleaned up the build/ directory in the src dir on stage+prod.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•10 years ago
|
Summary: Evaluate our use of Cython (eg for the log parser) → Stop using Cython for the log parser if the performance gain is not significant
You need to log in
before you can comment on or make changes to this bug.
Description
•