Bug 1076770 (Closed): Profile the log parser to see if performance can be improved
Opened: 10 years ago
Closed: 9 years ago
Category: Tree Management :: Treeherder: Data Ingestion (defect, P4)
Status: RESOLVED INCOMPLETE
Reporter: emorley
Assignee: Unassigned
Broken out of bug 1074927.
There might be some further perf improvements we can make.
It would also be interesting to know how much each additional regex slows parsing down, and therefore whether it's worth spending time figuring out if any are no longer used.
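A per-pattern micro-benchmark would be one way to answer that question. A minimal sketch, where PATTERNS and SAMPLE_LINES are hypothetical stand-ins for the parser's real error patterns and log lines:

    import re
    import timeit

    # Hypothetical stand-ins for the parser's real patterns and log lines.
    PATTERNS = [
        r'TEST-UNEXPECTED-FAIL',
        r'\d+ bytes leaked \((.+)\)$',
    ]
    SAMPLE_LINES = [
        'TEST-UNEXPECTED-FAIL | leakcheck | tab process: 42114 bytes leaked (Foo)',
        'INFO - an ordinary log line that matches nothing',
    ]

    for pattern in PATTERNS:
        regex = re.compile(pattern)
        # Cost of running this one regex over the sample lines, repeated.
        elapsed = timeit.timeit(
            lambda: [regex.search(line) for line in SAMPLE_LINES],
            number=100000)
        print('%-35s %.3fs' % (pattern, elapsed))

Sorting the output by elapsed time should make it obvious which patterns are worth scrutinising first.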
Reporter
Updated•10 years ago
Priority: P2 → P3
Reporter
Updated•10 years ago
Component: Treeherder → Treeherder: Data Ingestion
Reporter
Comment 1•10 years ago
I know that in many places we've intentionally used .match() instead of .search(), since matching from the start of the string is faster.
However, there are places where we've added a '.*' to the start of the regex just so we can use .match(), and interestingly this actually seems to be slower than using .search().
e.g. (from bug 1121670):
>>> print timeit.timeit(stmt="r.match(s)",
... setup="import re; s = 'TEST-UNEXPECTED-FAIL | leakcheck | tab process: 42114 bytes leaked (AsyncLatencyLogger, AsyncTransactionTrackersHolder, AudioOutputObserver, BufferRecycleBin, CipherSuiteChangeObserver, ...)'; r = re.compile(r'.*\d+ bytes leaked \((.+)\)$')",
... number = 10000000)
43.355268762
>>> print timeit.timeit(stmt="r.search(s)",
... setup="import re; s = 'TEST-UNEXPECTED-FAIL | leakcheck | tab process: 42114 bytes leaked (AsyncLatencyLogger, AsyncTransactionTrackersHolder, AudioOutputObserver, BufferRecycleBin, CipherSuiteChangeObserver, ...)'; r = re.compile(r'\d+ bytes leaked \((.+)\)$')",
... number = 10000000)
18.8647965157
-> So just over twice as fast to use .search() and drop the '.*' (using Python 2.7.9)
Using .+ doesn't seem to be a
Reporter
Comment 2•10 years ago
Bah, didn't mean to submit.
I was going to say that using '.+' doesn't seem to be as bad; however, it turns out what actually helped was adding a space between the '.*' or '.+' and the '\d+', e.g.:
>>> print timeit.timeit(stmt="r.match(s)",
... setup="import re; s = 'TEST-UNEXPECTED-FAIL | leakcheck | tab process: 42114 bytes leaked (AsyncLatencyLogger, AsyncTransactionTrackersHolder, AudioOutputObserver, BufferRecycleBin, CipherSuiteChangeObserver, ...)'; r = re.compile(r'.* \d+ bytes leaked \((.+)\)$')",
... number = 10000000)
19.0460573373
Anyway, I guess this shows we need to profile rather than assume, and that significant speedups are possible.
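For reference, the comparison above can be rerun as a self-contained script covering all three variants (same sample line as in the REPL sessions; timings vary by machine and Python version, so none are hardcoded here):

    import re
    import timeit

    LINE = ('TEST-UNEXPECTED-FAIL | leakcheck | tab process: 42114 bytes leaked '
            '(AsyncLatencyLogger, AsyncTransactionTrackersHolder, ...)')

    # The three variants discussed above.
    VARIANTS = [
        ("match '.*\\d+'",  'match',  r'.*\d+ bytes leaked \((.+)\)$'),
        ("match '.* \\d+'", 'match',  r'.* \d+ bytes leaked \((.+)\)$'),
        ("search '\\d+'",   'search', r'\d+ bytes leaked \((.+)\)$'),
    ]

    for label, method, pattern in VARIANTS:
        func = getattr(re.compile(pattern), method)
        elapsed = timeit.timeit(lambda: func(LINE), number=1000000)
        print('%-20s %.3fs' % (label, elapsed))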
Reporter
Updated•10 years ago
Priority: P3 → P4
Reporter
Comment 3•10 years ago
Using the New Relic thread profiler:
https://rpm.newrelic.com/accounts/677903/applications/5585473/profiles/1671082
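For local runs without New Relic, the stdlib profiler gives a comparable breakdown. A minimal sketch, where parse_line is a hypothetical stand-in for the parser's per-line entry point:

    import cProfile
    import pstats
    import re

    # Hypothetical stand-in for the log parser's per-line work.
    LEAK_RE = re.compile(r'\d+ bytes leaked \((.+)\)$')

    def parse_line(line):
        return LEAK_RE.search(line)

    def run():
        for _ in range(100000):
            parse_line('TEST-UNEXPECTED-FAIL | leakcheck | 42114 bytes leaked (Foo)')

    # Profile and print the ten most expensive calls by cumulative time.
    cProfile.run('run()', 'parser.prof')
    pstats.Stats('parser.prof').sort_stats('cumulative').print_stats(10)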
Reporter
Comment 4•9 years ago
Let's not worry about this unless we start getting backlogs, or log parsing tasks start appearing in the slow transaction traces in New Relic.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE