Closed Bug 1339829 Opened 7 years ago Closed 7 years ago

The /seta/job-priorities/ API spends 50%+ of the profile fetching full-task-graph.json

Categories

(Tree Management Graveyard :: Treeherder: SETA, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: rwood)

References

Details

Attachments

(2 files)

In this trace:
https://rpm.newrelic.com/accounts/677903/applications/14179757/transactions?tw%5Bend%5D=1487170601&tw%5Bstart%5D=1487168801#id=5b225765625472616e73616374696f6e2f46756e6374696f6e2f747265656865726465722e7765626170702e6170692e736574613a536574614a6f625072696f72697479566965775365742e6c697374222c22225d&app_trace_id=ecb6456f-f38e-11e6-9c7c-b82a72d22a14_0_2028

...4 seconds out of the total 7 seconds API response time are spent fetching:
https://queue.taskcluster.net/v1/task/czFEDsZRRiGfjulCHPzjfQ/artifacts/public/full-task-graph.json

Thoughts:
1) The request to that URL redirects (HTTP 303) to https://public-artifacts.taskcluster.net/czFEDsZRRiGfjulCHPzjfQ/0/public/full-task-graph.json , which wastes time. If the redirect target is constant, perhaps we should fetch from public-artifacts.taskcluster.net directly instead?

2) For these API requests the latest decision task ID is used by _taskcluster_runnable_jobs() rather than a specific ID, so the response could just be cached for eg N minutes, to save having to fetch the graph at all.
At first glance it doesn't seem like this API endpoint even fetches the task graph (I had to look through twice even after seeing the request in the New Relic trace, and Rob missed it the first time too).

However it's just a very convoluted call chain:

SetaJobPriorityViewSet calls seta_job_scheduling() (https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/webapp/api/seta.py#L34)
...which calls _process() (via https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/seta/job_priorities.py#L77 or https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/seta/job_priorities.py#L49)
...which calls build_ref_data_names() (https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/seta/job_priorities.py#L22)
...which calls list_runnable_jobs() (https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/etl/seta.py#L181)
...which calls _taskcluster_runnable_jobs() (https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/etl/runnable_jobs.py#L213)
...which fetches the task graph (https://github.com/mozilla/treeherder/blob/3919352ca82a4195ad2d1ec2eeb0259b56817059/treeherder/etl/runnable_jobs.py#L131)

This code seems in need of some refactoring.
Blocks: 1343599
Assignee: nobody → rwood
Status: NEW → ASSIGNED
Comment on attachment 8842586 [details] [review]
[treeherder] rwood-moz:bug1339829 > mozilla:master

As :emorley suggested, a quick win is to change the taskgraph URL. Just a first patch for this bug, until I make another patch with caching or other improvements.
Attachment #8842586 - Flags: review?(emorley)
Comment on attachment 8842586 [details] [review]
[treeherder] rwood-moz:bug1339829 > mozilla:master

r+ with wrapping tweak :-)
Attachment #8842586 - Flags: review?(emorley) → review+
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

Refactor of job-priorities code, including caching of the reference data names / runnable jobs.
Attachment #8843054 - Flags: review?(jmaher)
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

Will, would you mind taking a look at the caching parts of this? I think the current in-memory approach won't work across a multi-dyno multi-gunicorn-thread architecture and a memcached approach might be best - which you have more experience with than I :-)
Attachment #8843054 - Flags: review?(wlachance)
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

Yeah, you should use django's caching framework to store stuff that you might want to read-back later. Caching in-memory (as the current patch seems to do) won't work, since we have multiple nodes which might want to make use of the same data.

Fortunately, django's cache is quite easy to use:

https://docs.djangoproject.com/en/1.10/topics/cache/#basic-usage

Cancelling review for now.
Attachment #8843054 - Flags: review?(wlachance)
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

commented in the PR- lets do this for 1 hour of caching, not 10 minutes.  Also will need to re-review after using django caching vs in memory.  I am confused by some of the reshuffling- I would like to learn why we are moving things around between files so much.
Attachment #8843054 - Flags: review?(jmaher) → review-
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

(In reply to Joel Maher ( :jmaher) from comment #10)
>I would like to learn why we are moving things around between files so much.

I just moved build_ref_data_names out of /etl/seta.py and into /seta/job_priorities.py because the only place it is called is from within job_priorities. I moved the SETA_LOW_VALUE_PRIORITY and SETA_HIGH_VALUE_PRIORITY constants out of treeherder/config/settings.py and into treeherder/seta/settings.py since those values are only specific to SETA. Also I made a couple of small changes within job_priorities.py to reduce the callstack in comment 1 at least a bit.

In this latest PR update I changed the cache to use Django's cache API (it's really cool, thanks guys!).
Attachment #8843054 - Flags: review- → review?(jmaher)
Attachment #8843054 - Flags: review?(jmaher) → review+
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

Not really necessary but if you want to take a peak at the updated cache code, thanks for mentioning the django cache API.
Attachment #8843054 - Flags: review+ → review?(wlachance)
Comment on attachment 8843054 [details] [review]
[treeherder] rwood-moz:bug1339829-two > mozilla:master

I only looked at the caching parts, not the rest of the patch. One small suggested change but otherwise looks good.
Attachment #8843054 - Flags: review?(wlachance) → review+
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
This is working great - thank you! Both the typical and worst case call times are now much lower.

eg:

$ time curl -sSfo /dev/null https://treeherder.mozilla.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m9.465s
user    0m0.000s
sys     0m0.015s

$ time curl -sSfo /dev/null https://treeherder.mozilla.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m2.991s
user    0m0.030s
sys     0m0.000s

$ time curl -sSfo /dev/null https://treeherder.mozilla.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m1.970s
user    0m0.000s
sys     0m0.000s

--

$ time curl -sSfo /dev/null https://treeherder.allizom.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m1.608s
user    0m0.031s
sys     0m0.015s

$ time curl -sSfo /dev/null https://treeherder.allizom.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m1.202s
user    0m0.015s
sys     0m0.031s

$ time curl -sSfo /dev/null https://treeherder.allizom.org/api/project/mozilla-inbound/seta/job-priorities /
real    0m1.248s
user    0m0.015s
sys     0m0.015s
(Prod was using the old implementation, stage the new)
(In reply to Ed Morley [:emorley] from comment #16)
> (Prod was using the old implementation, stage the new)

Great! Thanks for the numbers.
Product: Tree Management → Tree Management Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: