Bug 1156746 (Closed): Opened 10 years ago; Closed 10 years ago

We should run more log processors in parallel

Categories

(Tree Management :: Treeherder: Data Ingestion, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: wlach)

Details

Attachments

(1 file)

Incidentally, while working on bug 1155451 (which is pretty marginal), I found out that we're effectively only running two log processing tasks at a time (http://celery.readthedocs.org/en/latest/userguide/workers.html#concurrency). Since this process is mostly I/O bound (by the time it takes to download logs), we should definitely increase the number of concurrent tasks. Using the same value we use for pushlog ingestion (5) sounds like a good start.
This should give us a nice speed bump, since we can parse logs that have already downloaded while others are still downloading.
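For reference, a minimal sketch of the two usual ways to raise Celery's worker concurrency; the app name, queue name and setting shown here are assumptions for illustration, not taken from the attached patch:

# Hypothetical sketch; Treeherder's real setting/queue names may differ.
# Option 1: set a default in the Django/Celery settings module (Celery 3.x style).
CELERYD_CONCURRENCY = 5  # prefork worker processes per node

# Option 2: pass it per worker on the command line, so only the
# log-parsing workers get the higher value, e.g.:
#   celery -A treeherder worker -Q log_parser --concurrency=5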
Attachment #8595326 - Flags: review?(mdoglio)
Attachment #8595326 - Flags: review?(mdoglio) → review+
I guess this will be hard to measure using New Relic, since we don't really have a good metric for "max length of the pending log parser queue in the last day".
Though I guess we could look at the peak throughput at busy times over the course of a few days, and see if we can get the peaks higher.
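A rough sketch of how such a queue-depth metric could be pulled from the RabbitMQ management API (the queue name, vhost and credentials below are assumptions, not Treeherder's actual configuration):

import requests

# Hypothetical: poll the management API for the depth of the log-parser queue.
# %2F is the URL-encoded default vhost "/".
QUEUE_URL = "http://localhost:15672/api/queues/%2F/log_parser"

def pending_log_parser_tasks():
    resp = requests.get(QUEUE_URL, auth=("guest", "guest"), timeout=10)
    resp.raise_for_status()
    return resp.json()["messages"]  # messages still waiting in the queue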
:edmorley I agree the best metric we can look at is the throughput (aka rpm). And with the new deploy notifications it *should* be easy enough to identify improvements vs regressions.
(In reply to William Lachance (:wlach) from comment #0)
> Since this process is mostly I/O bound (by the time it takes to download
> logs), we should definitely increase the number of concurrent tasks. Using
> the same value we use for pushlog ingestion (5) sounds like a good start.

By the way, that's a hypothesis based on my testing in the Toronto office. Actual results will vary with the speed of the network connection between ftp.mozilla.org and the environment in which we're running Treeherder. I'd be surprised if this weren't still the biggest factor there, though.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
According to New Relic, the parse-log task isn't really I/O bound. That's probably the case on your local machine, where you have a decent amount of CPU cycles but the network link is typically >=100 MB/s. In production the situation is likely the opposite: a gigabit connection and a poor (also single, I guess) CPU. If I'm interpreting the data on NR correctly, 68% of the time is spent parsing/unzipping the log, up to 16% downloading the log, 10% retrieving the bug suggestions, and a cumulative 2% spent on updating/saving the results.
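As a rough back-of-envelope based on those percentages (an illustration, not a measurement), assuming 2 cores on the log processors:

# If ~68% of each task's wall time is CPU-bound parsing/unzipping, then n
# concurrent tasks need roughly n * 0.68 cores, so 2 cores saturate at about
# 2 / 0.68 ~= 3 concurrent tasks; much beyond that mostly just queues work.
cpu_fraction = 0.68
cores = 2
print(cores / cpu_fraction)  # ~2.9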
(In reply to Mauro Doglio [:mdoglio] from comment #7)
> According to New Relic, the parse-log task isn't really I/O bound. That's
> probably the case on your local machine, where you have a decent amount of
> CPU cycles but the network link is typically >=100 MB/s. In production the
> situation is likely the opposite: a gigabit connection and a poor (also
> single, I guess) CPU. If I'm interpreting the data on NR correctly, 68% of
> the time is spent parsing/unzipping the log, up to 16% downloading the log,
> 10% retrieving the bug suggestions, and a cumulative 2% spent on
> updating/saving the results.

Yeah, could be; I'm still struggling a bit to interpret the New Relic data correctly. I was basing my assumption on the server view that :edmorley just linked (https://rpm.newrelic.com/accounts/677903/servers/5753304), which indicates that we're not even close to using the (2) CPUs on the log processors most of the time. Let's see how we do with the new settings.