Closed
Bug 1156746
Opened 10 years ago
Closed 10 years ago
We should run more log processors in parallel
Categories
(Tree Management :: Treeherder: Data Ingestion, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: wlach, Assigned: wlach)
Attachments
(1 file)
Incidentally, while working on bug 1155451 (which is pretty marginal), I found out that we're effectively only running two log processing tasks at a time (http://celery.readthedocs.org/en/latest/userguide/workers.html#concurrency). Since this process is mostly I/O bound (by the time it takes to download logs), we should definitely increase the number of concurrent tasks. Using the same value we use for pushlog ingestion (5) sounds like a good start.
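The description argues that an I/O-bound workload benefits from more concurrent tasks. A minimal, self-contained sketch of that effect (names, URLs, and timings are illustrative, not Treeherder code): the same batch of simulated log downloads finishes markedly faster with 5 workers than with 2, because the waits overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_log(url):
    """Stand-in for downloading a log: pure I/O wait, no CPU work."""
    time.sleep(0.2)  # pretend network latency
    return url

def process_batch(urls, concurrency):
    """Run the batch with a fixed-size worker pool; return elapsed seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(fetch_log, urls))
    return time.monotonic() - start

# Hypothetical URLs for illustration only.
urls = [f"https://ftp.mozilla.org/logs/{i}.txt.gz" for i in range(10)]
t2 = process_batch(urls, concurrency=2)  # ~1.0s: 10 tasks over 2 workers
t5 = process_batch(urls, concurrency=5)  # ~0.4s: 10 tasks over 5 workers
```

With purely I/O-bound tasks the batch time scales roughly with ceil(tasks / concurrency), which is why bumping the pool from 2 to 5 is worthwhile here.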
Comment 1 (Assignee) • 10 years ago
This should give us a nice speed bump, since we can process more logs while we wait for others to download.
Attachment #8595326 - Flags: review?(mdoglio)
Updated • 10 years ago
Attachment #8595326 - Flags: review?(mdoglio) → review+
Comment 2 • 10 years ago
I guess this will be hard to measure using New Relic, since we don't really have a good metric for "max length of the pending log parser queue in the last day".
Comment 3 • 10 years ago
Though I guess we could look at the peak throughput at busy times over the course of a few days, and see if we can get the peaks higher.
Comment 4 • 10 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/2719eab2135da94ae16ac9035c9418af089db47a
Bug 1156746 - Bump the number of concurrent log processing workers
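The commit body isn't inlined here; as a hypothetical sketch (setting names invented for illustration, not Treeherder's actual configuration), the change amounts to raising the worker pool size for the log-parser queue to the value already used for pushlog ingestion. Celery's `--concurrency` option controls how many tasks one worker process runs in parallel.

```python
# Illustrative settings fragment -- names are assumptions, not Treeherder's
# real config module.
QUEUE_CONCURRENCY = {
    "pushlog": 5,
    "log_parser": 5,  # previously 2, effectively Celery's low default here
}

def worker_command(queue):
    """Build the celery worker invocation for a queue (sketch only)."""
    return f"celery worker -Q {queue} --concurrency={QUEUE_CONCURRENCY[queue]}"
```

For example, `worker_command("log_parser")` yields a command line ending in `--concurrency=5`.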
Comment 5 • 10 years ago
:edmorley I agree the best metric we can look at is throughput (aka rpm). And with the new deploy notifications it *should* be easy enough to distinguish improvements from regressions.
Comment 6 (Assignee) • 10 years ago
(In reply to William Lachance (:wlach) from comment #0)
> Since this process is mostly I/O bound (by the time it takes to download
> logs), we should definitely increase the number of concurrent tasks. Using
> the same value we use for pushlog ingestion (5) sounds like a good start.
By the way, that's a hypothesis based on my testing in the Toronto office. Actual results will vary with the speed of the network connection between ftp.mozilla.org and the environment in which we're running Treeherder. I'd be surprised if this weren't still the biggest factor there, though.
Updated (Assignee) • 10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 7 • 10 years ago
According to New Relic the parse-log task isn't really I/O bound. That's probably the case on your local machine, where you have a decent amount of CPU cycles but the network link is typically <=100 Mb/s. In production the situation is likely the opposite: a gigabit connection and a weak (also single, I guess) CPU. If I'm interpreting the data on NR correctly, 68% of the time is spent parsing/unzipping the log, up to 16% downloading the log, 10% retrieving the bug suggestions, and a cumulative 2% updating/saving the results.
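Taking the percentages in this comment at face value, a back-of-envelope calculation (my own simplification, not New Relic output) shows why extra concurrency helps less once the task is CPU bound: on a single core only the I/O fractions can overlap, so throughput is capped near 1/0.68 ≈ 1.5× the serial rate.

```python
# Fractions of per-task time from the New Relic breakdown quoted above.
fractions = {
    "parse_unzip": 0.68,      # CPU bound: serializes on a single core
    "download": 0.16,         # I/O: overlaps under concurrency
    "bug_suggestions": 0.10,  # I/O (queries): overlaps under concurrency
    "save": 0.02,             # I/O (writes): overlaps under concurrency
}

io_fraction = fractions["download"] + fractions["bug_suggestions"] + fractions["save"]
cpu_fraction = fractions["parse_unzip"]

# With ample concurrency on one core, I/O waits hide behind CPU work, so the
# best-case throughput gain is bounded by the CPU-bound share alone.
max_speedup = 1 / cpu_fraction
```

Under this model, bumping concurrency from 2 to 5 mostly recovers the ~28% of I/O wait; further gains would need more CPUs or a faster parser.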
Comment 8 • 10 years ago
Comment 9 (Assignee) • 10 years ago
(In reply to Mauro Doglio [:mdoglio] from comment #7)
> According to new relic the parse-log task isn't really i/o bound. That's
> probably the case on your local machine, where you have a decent amount of
> cpu cycles but the network link is tipically >=100 MB/s. In production the
> situation is likely the opposite: a gigabit connection and poor (also single
> I guess) cpu. If I'm interpreting the data on NR correctly, 68% of the time
> is spent parsing/unzipping the log, up to 16% to download the log, 10%
> retrieving the bug suggestions and a cumulative 2% spent on updating/saving
> the results.
Yeah, could be; I'm still struggling a bit to interpret the New Relic data correctly. I was basing my assumption on the server view that :edmorley just linked (https://rpm.newrelic.com/accounts/677903/servers/5753304), which indicates that we're not even close to using the (2) CPUs on the log processors most of the time. Let's see how we do with the new settings.