The /seta/job-priorities/ API spends 50%+ of the profile fetching full-task-graph.json

RESOLVED FIXED

Status

Tree Management
Treeherder: SETA
P2
normal
RESOLVED FIXED
10 months ago
10 months ago

People

(Reporter: emorley, Assigned: rwood)

Tracking

Details

Attachments

(2 attachments)

(Reporter)

Description

10 months ago
In this trace:
https://rpm.newrelic.com/accounts/677903/applications/14179757/transactions?tw%5Bend%5D=1487170601&tw%5Bstart%5D=1487168801#id=5b225765625472616e73616374696f6e2f46756e6374696f6e2f747265656865726465722e7765626170702e6170692e736574613a536574614a6f625072696f72697479566965775365742e6c697374222c22225d&app_trace_id=ecb6456f-f38e-11e6-9c7c-b82a72d22a14_0_2028

...4 seconds out of the total 7 seconds API response time are spent fetching:
https://queue.taskcluster.net/v1/task/czFEDsZRRiGfjulCHPzjfQ/artifacts/public/full-task-graph.json

Thoughts:
1) The request to that URL redirects (HTTP 303) to https://public-artifacts.taskcluster.net/czFEDsZRRiGfjulCHPzjfQ/0/public/full-task-graph.json , which wastes time. If the redirect target is constant, perhaps we should fetch from public-artifacts.taskcluster.net directly instead?

2) For these API requests the latest decision task ID is used by _taskcluster_runnable_jobs() rather than a specific ID, so the response could just be cached for eg N minutes, to save having to fetch the graph at all.
(Reporter)

Comment 1

10 months ago
At first glance it doesn't seem like this API endpoint even fetches the task graph (I had to look through twice even after seeing the request in the New Relic trace, and Rob missed it the first time too).

However it's just a very convoluted call chain:

SetaJobPriorityViewSet calls seta_job_scheduling() (https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/webapp/api/seta.py#L34)
...which calls _process() (via https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/seta/job_priorities.py#L77 or https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/seta/job_priorities.py#L49)
...which calls build_ref_data_names() (https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/seta/job_priorities.py#L22)
...which calls list_runnable_jobs() (https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/etl/seta.py#L181)
...which calls _taskcluster_runnable_jobs() (https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/etl/runnable_jobs.py#L213)
...which fetches the task graph (https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/etl/runnable_jobs.py#L131)

This code seems in need of some refactoring.
(Reporter)

Updated

10 months ago
Blocks: 1343599
(Assignee)

Updated

10 months ago
Assignee: nobody → rwood
Status: NEW → ASSIGNED

Comment 2

10 months ago
Created attachment 8842586 [details] [review]
[treeherder] rwood-moz:bug1339829 > mozilla:master
(Assignee)

Comment 3

10 months ago
Comment on attachment 8842586 [details] [review]
[treeherder] rwood-moz:bug1339829 > mozilla:master

As :emorley suggested, a quick win is to change the taskgraph URL. Just a first patch for this bug, until I make another patch with caching or other improvements.
Attachment #8842586 - Flags: review?(emorley)
(Reporter)

Comment 4

10 months ago
Comment on attachment 8842586 [details] [review]
[treeherder] rwood-moz:bug1339829 > mozilla:master

r+ with wrapping tweak :-)
Attachment #8842586 - Flags: review?(emorley) → review+

Comment 5

10 months ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/8f03e522c4ef60483be661bce1051d3e876e8413
Bug 1339829 - SETA: Change taskcluster taskgraph URL (#2216)

Comment 6

10 months ago
Created attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master
(Assignee)

Comment 7

10 months ago
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

Refactor of job-priorities code, including caching of the reference data names / runnable jobs.
Attachment #8843054 - Flags: review?(jmaher)
(Reporter)

Comment 8

10 months ago
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

Will, would you mind taking a look at the caching parts of this? I think the current in-memory approach won't work across a multi-dyno multi-gunicorn-thread architecture and a memcached approach might be best - which you have more experience with than I :-)
Attachment #8843054 - Flags: review?(wlachance)
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

Yeah, you should use django's caching framework to store stuff that you might want to read-back later. Caching in-memory (as the current patch seems to do) won't work, since we have multiple nodes which might want to make use of the same data.

Fortunately, django's cache is quite easy to use:

https://docs.djangoproject.com/en/1.10/topics/cache/#basic-usage

Cancelling review for now.
Attachment #8843054 - Flags: review?(wlachance)
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

commented in the PR- lets do this for 1 hour of caching, not 10 minutes.  Also will need to re-review after using django caching vs in memory.  I am confused by some of the reshuffling- I would like to learn why we are moving things around between files so much.
Attachment #8843054 - Flags: review?(jmaher) → review-
(Assignee)

Comment 11

10 months ago
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

(In reply to Joel Maher ( :jmaher) from comment #10)
>I would like to learn why we are moving things around between files so much.

I just moved build_ref_data_names out of /etl/seta.py and into /seta/job_priorities.py because the only place it is called is from within job_priorities. I moved the SETA_LOW_VALUE_PRIORITY and SETA_HIGH_VALUE_PRIORITY constants out of treeherder/config/settings.py and into treeherder/seta/settings.py since those values are only specific to SETA. Also I made a couple of small changes within job_priorities.py to reduce the callstack in comment 1 at least a bit.

In this latest PR update I changed the cache to use Django's cache API (it's really cool, thanks guys!).
Attachment #8843054 - Flags: review- → review?(jmaher)
Attachment #8843054 - Flags: review?(jmaher) → review+
(Assignee)

Comment 12

10 months ago
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

Not really necessary but if you want to take a peak at the updated cache code, thanks for mentioning the django cache API.
Attachment #8843054 - Flags: review+ → review?(wlachance)
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

I only looked at the caching parts, not the rest of the patch. One small suggested change but otherwise looks good.
Attachment #8843054 - Flags: review?(wlachance) → review+

Comment 14

10 months ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/76fc09fd1e17fda36e65085fafdba24e5f890359
Bug 1339829 - Refactor of SETA job-priorities API (#2221)
(Assignee)

Updated

10 months ago
Status: ASSIGNED → RESOLVED
Last Resolved: 10 months ago
Resolution: --- → FIXED
(Reporter)

Comment 15

10 months ago
This is working great - thank you! Both the typical and worst case call times are now much lower.

eg:

$ time curl -sSfo /dev/null https://treeherder.mozilla.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m9.465s
user    0m0.000s
sys     0m0.015s

$ time curl -sSfo /dev/null https://treeherder.mozilla.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m2.991s
user    0m0.030s
sys     0m0.000s

$ time curl -sSfo /dev/null https://treeherder.mozilla.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m1.970s
user    0m0.000s
sys     0m0.000s

--

$ time curl -sSfo /dev/null https://treeherder.allizom.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m1.608s
user    0m0.031s
sys     0m0.015s

$ time curl -sSfo /dev/null https://treeherder.allizom.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m1.202s
user    0m0.015s
sys     0m0.031s

$ time curl -sSfo /dev/null https://treeherder.allizom.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m1.248s
user    0m0.015s
sys     0m0.015s
(Reporter)

Comment 16

10 months ago
(Prod was using the old implementation, stage the new)
(Assignee)

Comment 17

10 months ago
(In reply to Ed Morley [:emorley] from comment #16)
> (Prod was using the old implementation, stage the new)

Great! Thanks for the numbers.
You need to log in before you can comment on or make changes to this bug.