Closed Bug 1076770 Opened 10 years ago Closed 9 years ago

Profile the log parser to see if performance can be improved

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P4)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: emorley, Unassigned)

References

Details

Broken out of bug 1074927. There might be some further perf improvements we can make. It would also be interesting to know how much slower each additional regex makes it, and as such, whether it's worth spending time trying to figure out if any are no longer used.
Priority: P2 → P3
Component: Treeherder → Treeherder: Data Ingestion
I know in many places we've intentionally used .match() instead of .search(), since if you're matching from the start of the string it's faster. However, there are places where we've added a '.*' to the start of the regex just so we can use .match() - but interestingly this actually seems to be slower than using .search(), eg (from bug 1121670):

>>> print timeit.timeit(stmt="r.match(s)",
...     setup="import re; s = 'TEST-UNEXPECTED-FAIL | leakcheck | tab process: 42114 bytes leaked (AsyncLatencyLogger, AsyncTransactionTrackersHolder, AudioOutputObserver, BufferRecycleBin, CipherSuiteChangeObserver, ...)'; r = re.compile(r'.*\d+ bytes leaked \((.+)\)$')",
...     number = 10000000)
43.355268762

>>> print timeit.timeit(stmt="r.search(s)",
...     setup="import re; s = 'TEST-UNEXPECTED-FAIL | leakcheck | tab process: 42114 bytes leaked (AsyncLatencyLogger, AsyncTransactionTrackersHolder, AudioOutputObserver, BufferRecycleBin, CipherSuiteChangeObserver, ...)'; r = re.compile(r'\d+ bytes leaked \((.+)\)$')",
...     number = 10000000)
18.8647965157

-> So just over twice as fast to use .search() and drop the '.*' (using Python 2.7.9). Using .+ doesn't seem to be a
Bah, didn't mean to submit. I was going to say that using .+ doesn't seem to be as bad - however it turns out what helped was adding a space between the '.*' or '.+' and the '\d+', eg:

>>> print timeit.timeit(stmt="r.match(s)",
...     setup="import re; s = 'TEST-UNEXPECTED-FAIL | leakcheck | tab process: 42114 bytes leaked (AsyncLatencyLogger, AsyncTransactionTrackersHolder, AudioOutputObserver, BufferRecycleBin, CipherSuiteChangeObserver, ...)'; r = re.compile(r'.* \d+ bytes leaked \((.+)\)$')",
...     number = 10000000)
19.0460573373

Anyway, I guess this shows we need to profile and not assume - and that significant speedups are possible.
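As a minimal sketch of the comparison above, the two timeit transcripts can be re-run as a self-contained script (the log line and both regexes are taken verbatim from the comments; absolute timings will of course differ by Python version and machine, and the originals were measured on Python 2.7.9):

```python
import re
import timeit

# The leakcheck log line used in the timings above.
LINE = ('TEST-UNEXPECTED-FAIL | leakcheck | tab process: 42114 bytes leaked '
        '(AsyncLatencyLogger, AsyncTransactionTrackersHolder, '
        'AudioOutputObserver, BufferRecycleBin, CipherSuiteChangeObserver, ...)')

# .match() anchors at the start of the string, so a leading '.*' is needed
# to reach the interesting part of the line.
MATCH_RE = re.compile(r'.*\d+ bytes leaked \((.+)\)$')

# .search() scans forward itself, so the leading '.*' can be dropped.
SEARCH_RE = re.compile(r'\d+ bytes leaked \((.+)\)$')


def bench(fn, number=100_000):
    """Total seconds for `number` calls of fn(LINE)."""
    return timeit.timeit(lambda: fn(LINE), number=number)


if __name__ == '__main__':
    # Both forms extract the same capture group; only the cost differs.
    assert MATCH_RE.match(LINE).group(1) == SEARCH_RE.search(LINE).group(1)
    print('match()  with leading .* :', bench(MATCH_RE.match))
    print('search() without      .* :', bench(SEARCH_RE.search))
```

The assertion is the important part when experimenting with dropping the '.*': it confirms the two regexes still agree on what they extract, so only the timing changes.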
Priority: P3 → P4
Let's not worry about this unless we start getting backlogs, or log parsing tasks start appearing in the slow-transaction traces in New Relic.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE