Closed
Bug 1492768
Opened 7 years ago
Closed 7 years ago
Perfherder graphs frequently don't load due to /alertsummary/ API calls timing out
Categories
(Tree Management :: Perfherder, enhancement, P1)
Tree Management
Perfherder
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jmaher, Assigned: wlach)
Details
Attachments
(1 file)
I have spent a lot of time refreshing pages and trying to get graphs to load- this seems to be a problem for the tart test. Here is a recent example:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=1209600&series=mozilla-inbound,1644601,1&series=autoland,41098223c1c7585b9af92ff51b83ea579d41e6d0,0
If I CTRL+K to see the devtools console, it loads right up, I see this in the console error:
TypeError: "H.zoom is undefined; can't access its "x" property"
zoomGraph https://treeherder.mozilla.org/perf.1085b7484a9698d040ea.bundle.js:21:43553
plotGraph https://treeherder.mozilla.org/perf.1085b7484a9698d040ea.bundle.js:21:45477
processQueue https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:136126
scheduleProcessQueue https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:136792
digest https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:144507
evalAsync https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:146563
completeTask https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:154211
Zi https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:74056
defer https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:74028
evalAsync https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:146532
get https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:135248
scheduleProcessQueue https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:136684
reject https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:137529
rejectPromise https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:137448
reject https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:137853
get https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:41:270288
invoke https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:69230
qi https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:70322
getService https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:68458
<anonymous> https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:41:289817
invoke https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:69230
createInjector https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:70530
forEach https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:45661
createInjector https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:70502
Di https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:55448
bootstrap https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:55796
angularInit https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:54941
<anonymous> https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:269259
jr https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:48658
Fr https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:49077
resolve https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:49310
Dr https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:46768
fireWith https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:47622
fire https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:47657
Dr https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:46768
fireWith https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:47622
ready https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:50917
completed https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:3371
EventListener.handleEvent* https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:51053
<anonymous> https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:1830
<anonymous> https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:1729
__webpack_require__ https://treeherder.mozilla.org/manifest.fea02df2c5ce4f6b21b2.bundle.js:1:114
<anonymous> https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:45:269675
__webpack_require__ https://treeherder.mozilla.org/manifest.fea02df2c5ce4f6b21b2.bundle.js:1:114
<anonymous> https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:8:6648
__webpack_require__ https://treeherder.mozilla.org/manifest.fea02df2c5ce4f6b21b2.bundle.js:1:114
<anonymous> https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:71:12334
__webpack_require__ https://treeherder.mozilla.org/manifest.fea02df2c5ce4f6b21b2.bundle.js:1:114
webpackJsonp https://treeherder.mozilla.org/manifest.fea02df2c5ce4f6b21b2.bundle.js:1:454
<anonymous> https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js:1:1
Possibly unhandled rejection: {} vendor.b94667f4e12c463094ae.bundle.js:45:127585
Source map error: request failed with status 404
Resource URL: https://treeherder.mozilla.org/vendor.b94667f4e12c463094ae.bundle.js
Source Map URL: vendor.b94667f4e12c463094ae.bundle.js.map[Learn More]
:emorley has anything changed in treeherder recently that would be a good cause of this error?
Flags: needinfo?(emorley)
Comment 1•7 years ago
|
||
The graphs URL above works for me?
The line that appears to be the one referenced in the stack trace above, is:
https://github.com/mozilla/treeherder/blob/a0c1b607debad384642b9cfb258d5da9e3c4bba6/ui/js/controllers/perf/graphs.js#L370
The history of that file is here:
https://github.com/mozilla/treeherder/commits/a0c1b607debad384642b9cfb258d5da9e3c4bba6/ui/js/controllers/perf/graphs.js
Though these two calls are very slow - perhaps they timed out for you and Perfherder doesn't correctly handle failed requests/display a nice message?
10:35:02.969 XHR GET https://treeherder.mozilla.org/api/performance/alertsummary/?alerts__series_signature=1644601&repository=2 [HTTP/1.1 200 OK 17593ms]
10:35:03.078 XHR GET https://treeherder.mozilla.org/api/performance/alertsummary/?alerts__series_signature=1653659&repository=77 [HTTP/1.1 200 OK 11816ms]
Flags: needinfo?(emorley)
| Reporter | ||
Comment 2•7 years ago
|
||
It seems as though this morning all my perfherder loads are timing out (I have gigabit fiber to my house, so I doubt it is on my end unless it is my computer or web browser - firefox nightly freshly restarted). The code sheriffs have been leaving notes on bugs that they cannot load the graphs, this is new as of the last 24 hours.
when I try to get the alertsummary lines you provided (that returned in 11 and 18 seconds...that is a long time), I get application error which looks like a heroku message or something.
| Reporter | ||
Comment 3•7 years ago
|
||
using the console trick, I see the same heroku error message:
Possibly unhandled rejection: {"data":"<!DOCTYPE html>\n\t<html>\n\t <head>\n\t\t<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n\t\t<meta charset=\"utf-8\">\n\t\t<title>Application Error</title>\n\t\t<style media=\"screen\">\n\t\t html,body,iframe {\n\t\t\tmargin: 0;\n\t\t\tpadding: 0;\n\t\t }\n\t\t html,body {\n\t\t\theight: 100%;\n\t\t\toverflow: hidden;\n\t\t }\n\t\t iframe {\n\t\t\twidth: 100%;\n\t\t\theight: 100%;\n\t\t\tborder: 0;\n\t\t }\n\t\t</style>\n\t </head>\n\t <body>\n\t\t<iframe src=\"//www.herokucdn.com/error-pages/application-error.html\"></iframe>\n\t </body>\n\t</html>","status":503,"config":{"method":"GET","transformRequest":[null],"transformResponse":[null],"jsonpCallbackParam":"callback","url":"/api/performance/alertsummary/?alerts__series_signature=1651299&repository=77","headers":{"Accept":"application/json, text/plain, */*","X-CSRFToken":"s23IFzzAFPFHOyMl2wnUVn1sgZ7lssGlY68nk4WgYRxQKWEyf751b9hjvTA93KRP"}},"statusText":"Service Unavailable","xhrStatus":"complete"}
I don't know why we are getting errors from the api- that has always worked well; has the API had recent changes?
Flags: needinfo?(emorley)
Summary: perfherder fails to load tart graphs → perfherder fails to load most graphs
Comment 4•7 years ago
|
||
The HTTP 503 means the request has been killed for taking too long (over 20 seconds).
Looking at new relic, I can see that there are indeed Perfherder requests that are taking that long (and so are killed by gunicorn with a SystemExit exception):
https://rpm.newrelic.com/accounts/677903/applications/14179757/filterable_errors?tw%5Bend%5D=1537437517&tw%5Bstart%5D=1537351117#/table?top_facet=transactionUiName&barchart=barchart
(Looking at the New Relic user list I see you have an account already, but here's a screenshot for anyone else: https://screenshots.firefox.com/VJOaGNIaF7fVnJEl/rpm.newrelic.com)
Someone (Ionut?) needs to do some profiling and speed up that API:
https://rpm.newrelic.com/accounts/677903/applications/14179757/transactions?tw%5Bend%5D=1537437798&tw%5Bstart%5D=1537351398#id=5b225765625472616e73616374696f6e2f46756e6374696f6e2f747265656865726465722e7765626170702e6170692e706572666f726d616e63655f646174613a506572666f726d616e6365416c65727453756d6d617279566965775365742e6c697374222c22225d
Screenshots showing profile breakdown:
https://screenshots.firefox.com/qKkjvqE645d0cxF8/rpm.newrelic.com
https://screenshots.firefox.com/taIZNuh3GWnucJlQ/rpm.newrelic.com
In addition, the error handling in the frontend needs to be adjusted such that failed requests display a user-friendly in-app notification message rather than resulting in an exception.
Flags: needinfo?(emorley)
Priority: -- → P1
Summary: perfherder fails to load most graphs → Perfherder graphs frequently don't load due to /alertsummary/ API calls timing out
Comment 5•7 years ago
|
||
1600+ database queries for one API request seems pretty broken:
https://screenshots.firefox.com/taIZNuh3GWnucJlQ/rpm.newrelic.com
| Reporter | ||
Comment 6•7 years ago
|
||
:wlach, I don't know if you have any thoughts here- I think :emorley has outlined where the problems are, maybe you have some background or suspicions based on the original implementation. My main concern is the api timeout and the large volume of queries for a single api call.
:igoldan is out until next week and I will be out as of this afternoon- I suspect :igoldan could start looking into this next week- at the very least getting proper error reporting.
Flags: needinfo?(wlachance)
Comment 7•7 years ago
|
||
The API is defined here:
https://github.com/mozilla/treeherder/blob/d98f2a3d27c6462ad267ca8250a0178d904853a0/treeherder/webapp/api/performance_data.py#L251-L282
Serializer here:
https://github.com/mozilla/treeherder/blob/d98f2a3d27c6462ad267ca8250a0178d904853a0/treeherder/webapp/api/performance_serializers.py#L109-L129
Guessing this just needs a sprinkling of select_related or similar.
Comment 8•7 years ago
|
||
| Assignee | ||
Comment 10•7 years ago
|
||
Ok, the PR seems to improve query times by a factor of 2 on prototype, mainly by prefetching framework and user data. There's still a ridiculous number of queries, but that's a little harder to fix given how the endpoint is constructed -- we're currently filtering on the field of a relation, which is difficult to express in a performant way using django rest framework. We may want to consider either using a custom endpoint which looks up alert summaries based on alerts (essentially the inverse of what we're doing right now) or a graphql query here.
| Assignee | ||
Comment 11•7 years ago
|
||
Comment on attachment 9010652 [details] [review]
Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/4050
Ed, I think you're in the best place to review this.
Attachment #9010652 -
Flags: review?(emorley)
Comment 12•7 years ago
|
||
(In reply to William Lachance (:wlach) (use needinfo!) from comment #10)
> we're currently filtering on the field of a relation,
> which is difficult to express in a performant way using django rest framework
Are you sure it's a limitation of django-rest-framework, and not due to the API using the auto-generated queries via django-filters? Sadly I'm not really familiar enough with d-r-f to suggest anything more concrete. George would have been a good person to ask for this.
Thank you for the PR though and nice work on the 2x speedup :-)
Updated•7 years ago
|
Attachment #9010652 -
Flags: review?(emorley) → review+
| Assignee | ||
Comment 13•7 years ago
|
||
(In reply to Ed Morley [:emorley] from comment #12)
> (In reply to William Lachance (:wlach) (use needinfo!) from comment #10)
> > we're currently filtering on the field of a relation,
> > which is difficult to express in a performant way using django rest framework
>
> Are you sure it's a limitation of django-rest-framework, and not due to the
> API using the auto-generated queries via django-filters? Sadly I'm not
> really familiar enough with d-r-f to suggest anything more concrete. George
> would have been a good person to ask for this.
>
> Thank you for the PR though and nice work on the 2x speedup :-)
No, I'm not sure at all. :) I feel like I'm constantly learning about new things that are possible with drf and django-filters, and SQL for that matter. Basically this PR just fixes some things that were obviously wrong with my initial naive implementation.
Comment 14•7 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/2806e500714486fde9c745fb71acdf445d825825
Bug 1492768 - Optimize alert summary queries executed from graphs view (#4050)
Not a complete solution but hopefully this will improve things enough
to stop the timeouts we're seeing.
Comment 15•7 years ago
|
||
Many thanks!
Assignee: nobody → wlachance
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•