The populate-performance-series task is causing too much load on the rabbitmq1 node

RESOLVED FIXED

Status

P1
normal
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: emorley, Assigned: wlach)

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
See:
https://rpm.newrelic.com/accounts/677903/applications/4180461_h5411945/transactions?type=all&show_browser=false#sort_by=time_consumed

We're getting alerts due to rabbitmq1's CPU usage being above 90%.

We should:
1) Quickly sanity check the populate-performance-series task isn't doing something obvious it shouldn't.
2) Consider moving populate-performance-series onto another node (either existing or new).
3) Perform a more detailed profile of the populate-performance-series task to find perf wins (if #1 and #2 mean this is still worth the dev time).
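
For step 3, a minimal profiling sketch using the standard library's cProfile might look like the following. The function `populate_series_stand_in` is a hypothetical placeholder, not the real task code; it only parses some JSON blobs so the profiler has something to measure. The idea is to wrap the real task body the same way and read off the top entries by cumulative time.

```python
import cProfile
import io
import json
import pstats


def populate_series_stand_in(blobs):
    # Hypothetical stand-in for the task body (an assumption, not the real
    # code): parse each stored series blob and count its datapoints.
    return sum(len(json.loads(blob)) for blob in blobs)


def profile_call(func, *args):
    """Run func under cProfile; return (result, text report of the top 10
    entries sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    # Synthetic input: 50 blobs of 1000 integers each.
    blobs = [json.dumps(list(range(1000)))] * 50
    total, report = profile_call(populate_series_stand_in, blobs)
    print(total)
    print(report)
```

If JSON parsing dominates the report, that points at the parser rather than the surrounding task logic.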

Will, could I leave you to drive this?
Flags: needinfo?(wlachance)
(Reporter)

Comment 1

4 years ago
There were 1175 tasks in the populate_performance_series queue - I've cleared it to give rabbitmq1 some breathing room.
OK, so I did a bit of digging into where we might be spending a lot of CPU time in Python. Most of the code in this task is extremely straightforward and fast. The only likely culprit I could find was the JSON processing. For a large set of series, we can wind up processing a fair bit of it. Each parse isn't huge (0.07s on my Thinkpad w520 for the biggest series over 90 days), but in aggregate (e.g. 51 series in tp5 * 5 time series) that could add up.

simplejson is nearly 10 times faster (0.009s for the same series) and we already have it installed, so let's just use it. ujson is slightly faster still (0.006s) but it's not installed.
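
To make the "in aggregate" point concrete, here is a rough back-of-the-envelope sketch using only the timings quoted above. It assumes "51 series in tp5 * 5 time series" means 255 parses and that every parse costs as much as the biggest series, so it is an upper bound rather than a measurement.

```python
# Per-parse timings quoted above (seconds, biggest series over 90 days).
SECONDS_PER_PARSE = {"json": 0.07, "simplejson": 0.009, "ujson": 0.006}

# "51 series in tp5 * 5 time series", read here as 255 parses in total
# (an assumption about what the product means).
PARSES = 51 * 5


def aggregate_seconds(per_parse, parses=PARSES):
    """Total parse time if every series cost as much as the biggest one."""
    return per_parse * parses


for name, per_parse in SECONDS_PER_PARSE.items():
    print("%-10s %6.2f s" % (name, aggregate_seconds(per_parse)))
```

Under those assumptions the standard json module lands at roughly 18 seconds of parsing for tp5 alone, versus a bit over 2 seconds with simplejson.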

We're actually already using simplejson for processing the huge builds4h json, presumably for a similar reason.
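
A minimal sketch of what the swap could look like, assuming call sites only use `loads()`/`dumps()`. The `parse_series` helper name is hypothetical, and the fallback to the standard library is an addition for illustration, not necessarily what the patch does.

```python
try:
    # Preferred: the faster third-party parser, when it is installed.
    import simplejson as json
except ImportError:
    # Fallback (an assumption for this sketch): the standard library module,
    # which exposes the same loads()/dumps() entry points.
    import json


def parse_series(blob):
    """Parse one stored series blob (hypothetical helper name)."""
    return json.loads(blob)
```

Because both modules share the same basic interface, aliasing the import keeps the rest of the calling code unchanged.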
Flags: needinfo?(wlachance)
Created attachment 8562310 [details] [review]
Use simplejson for performance series processing
Assignee: nobody → wlachance
Attachment #8562310 - Flags: review?(emorley)
We may still want to put this on another node btw, if it's really critical that we don't use up too much CPU time with this.
(Reporter)

Comment 6

4 years ago
Comment on attachment 8562310 [details] [review]
Use simplejson for performance series processing

r+ but left a comment - though we can just deal with that later I guess (and remove all simplejson uses from the repo at once, if there's no longer any benefit for Python 2.7).
Attachment #8562310 - Flags: review?(emorley) → review+
I wrote up a silly script to benchmark perf on the staging node.

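#!/usr/bin/env python
# Time one loads() call per JSON module on the file given as argv[1].

import json
import sys
import time

import simplejson

# Read the whole file up front so disk I/O isn't part of the timing.
with open(sys.argv[1]) as f:
    contents = f.read()

# simplejson is timed first, then the standard library json.
for module in [simplejson, json]:
    t1 = time.time()
    module.loads(contents)
    print("%s" % (time.time() - t1))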

[wlachance@treeherder-rabbitmq1.stage.private.scl3 ~]$ python tjson.py big_series.json 
0.0112409591675
0.526923179626
[wlachance@treeherder-rabbitmq1.stage.private.scl3 ~]$ python2.7 tjson.py big_series.json 
0.00804281234741
0.0201890468597

Results: simplejson is roughly 47x faster than json on Python 2.6 (0.011s vs 0.527s), and still about 2.5x faster on Python 2.7 (0.008s vs 0.020s).
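
The one-shot script above times a single `loads()` call per module, so the numbers can be noisy. A slightly more careful sketch, assuming Python 3 and using the standard library's `timeit`, repeats each measurement and keeps the best run. The synthetic sample data is a stand-in for big_series.json, and simplejson is treated as optional so the sketch still runs without it.

```python
import json
import timeit

try:
    import simplejson
except ImportError:  # keep the sketch runnable without the third-party package
    simplejson = None


def best_parse_time(module, contents, repeat=5, number=10):
    """Best per-call time in seconds for module.loads(contents)."""
    timings = timeit.repeat(
        lambda: module.loads(contents), repeat=repeat, number=number
    )
    return min(timings) / number


def compare_parsers(contents):
    """Map module name to best per-call parse time for each available parser."""
    modules = [m for m in (simplejson, json) if m is not None]
    return dict((m.__name__, best_parse_time(m, contents)) for m in modules)


if __name__ == "__main__":
    # Synthetic stand-in for big_series.json; real data may behave differently.
    sample = json.dumps([{"x": i, "y": i * 0.5} for i in range(20000)])
    for name, seconds in sorted(compare_parsers(sample).items()):
        print("%s: %.6f s" % (name, seconds))
```

Taking the minimum over several repeats reduces the effect of other load on the node, which matters on a box that is already CPU-bound.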
Comment on attachment 8562310 [details] [review]
Use simplejson for performance series processing

After discussing this on IRC, we figured it would be best to use simplejson elsewhere in the production code too. Can't hurt. Updated the PR to do that.
Attachment #8562310 - Flags: review+ → review?(emorley)
(Reporter)

Comment 9

4 years ago
Comment on attachment 8562310 [details] [review]
Use simplejson for performance series processing

\o/
Attachment #8562310 - Flags: review?(emorley) → review+

Comment 11

4 years ago
Commit pushed to master at https://github.com/mozilla/treeherder-service

https://github.com/mozilla/treeherder-service/commit/4219192709aababab4bea04951839f1a3a2a23df
Bug 1131560 - Use simplejson inside production code

It appears as if simplejson is a fair bit faster than the standard json module,
which can make a difference in some cases.

For example, with the performance data, we can wind up processing a fair bit
of json. Each time isn't huge (0.07s on my laptop for the biggest series
over 90 days), but in aggregate (e.g. 51 series in tp5 * 5 time series)
that could add up. simplejson is nearly 10 times faster (0.009s for the same
series) and we already have it installed, so let's just use it. ujson is
slightly faster still (0.006s) but it's not installed.
Merged.

I'm not sure if we should bother filing a bug to move the performance task or if we should just keep it in the back of our heads?
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
(Reporter)

Comment 13

4 years ago
Let's not bother with a bug for the moment - I think it will be handled as part of a future "let's rebalance all the nodes/tasks" bug.