Bug 1203518
Opened 9 years ago
Closed 9 years ago
High prod database response time due to API being hammered
Categories: Tree Management :: Treeherder: Infrastructure (defect, P1)
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: cbook; Assignee: Unassigned
Both nigel and I noticed a slowdown in Treeherder that affects sheriffing (starring jobs, etc.):
05:16 <&mdoglio> Tomcat|Sheriffduty: checking newrelic
05:18 <&mdoglio> I can see we have a slowdown in the job list endpoint since 7am this morning
05:36 <&mdoglio> the only uncommon thing is that one of the task queues has a bit of backlog
05:37 <&mdoglio> but a general slowdown of the database could be the cause
Reporter
Comment 1•9 years ago
Closing trees since this is getting worse, and it makes no sense to try sheriffing while waiting 40+ seconds for a result.
05:50 < Tomcat|Sheriffduty> nigelb: now change to another result and it might get slower :)
05:51 < nigelb> yep.
05:51 < nigelb> exactly.
05:52 < nigelb> It's much slower than usual :(
Comment 2•9 years ago
Both the throughput and the response time of the treeherder.webapp.api.jobs:JobsViewSet.list transaction have increased today:
https://rpm.newrelic.com/accounts/677903/applications/4180461/transactions?tw%5Bend%5D=1441889033&tw%5Bstart%5D=1441845833#id=5b225765625472616e73616374696f6e2f46756e6374696f6e2f747265656865726465722e7765626170702e6170692e6a6f62733a4a6f6273566965775365742e6c697374222c22225d
The increase in response time seems disproportionate to the increase in throughput, but I guess it could still be due to us getting hammered?
Most of the profile time is spent in the job table select (80%).
Example slow JobsViewSet.list transaction:
https://rpm.newrelic.com/accounts/677903/applications/4180461/transactions#id=5b225765625472616e73616374696f6e2f46756e6374696f6e2f747265656865726465722e7765626170702e6170692e6a6f62733a4a6f6273566965775365742e6c697374222c22225d&app_trace_id=3fca17-23883a18-5765-11e5-97ea-b82a72d22a14
Example slow SQL query traces (job table select):
https://rpm.newrelic.com/accounts/677903/applications/4180461/datastores?tw%5Bend%5D=1441889356&tw%5Bstart%5D=1441885756#/overview/All/trace/?id=4eb47b-275e5bab-57ba-11e5-97ea-b82a72d22a14&metric=Datastore%2Fstatement%2FMySQL%2Fjob%2Fselect
https://rpm.newrelic.com/accounts/677903/applications/4180461/datastores?tw%5Bend%5D=1441889221&tw%5Bstart%5D=1441846021#/overview/All/trace/?id=4eb47b-881c8b74-57b9-11e5-97ea-b82a72d22a14&metric=Datastore%2Fstatement%2FMySQL%2Fjob%2Fselect
Updated•9 years ago
Component: Treeherder → Treeherder: Infrastructure
Priority: -- → P1
QA Contact: laura
Comment 3•9 years ago
An IP in Mumbai made ~80,000 requests to us over 2 hours; it has now been banned for an hour.
I've filed bug 1203551 to make the API request that was being made return a 400 rather than a 200 (since it was invalid; I'm guessing it must still hit the DB as-is?), and bug 1203556 to add rate limiting for GETs to our API.
Thank you everyone for your help! :-)
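[Editor's note: for readers unfamiliar with per-IP rate limiting as proposed in bug 1203556, the sketch below shows the general idea with a minimal fixed-window counter. This is purely illustrative and assumes nothing about Treeherder's actual implementation (which would more plausibly use Django REST Framework's built-in throttling); the class and parameter names are hypothetical.]

```python
import time
from collections import defaultdict


class FixedWindowRateLimiter:
    """Illustrative sketch: allow at most `limit` requests per client IP
    within each fixed `window_seconds`-long window.

    Not Treeherder's actual code; names are hypothetical.
    """

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable clock makes the limiter testable
        # (ip, window index) -> number of requests seen in that window
        self.counters = defaultdict(int)

    def allow(self, ip):
        """Record one request from `ip`; return True if it is within the limit."""
        window_index = int(self.clock() // self.window)
        key = (ip, window_index)
        self.counters[key] += 1
        return self.counters[key] <= self.limit


# Usage: at most 3 requests per IP per 60-second window.
limiter = FixedWindowRateLimiter(limit=3, window_seconds=60, clock=lambda: 100.0)
results = [limiter.allow("198.51.100.7") for _ in range(4)]
# The first three requests pass; the fourth in the same window is rejected,
# while a different IP is unaffected.
```

A fixed window is the simplest scheme; production throttles often use sliding windows or token buckets to avoid bursts at window boundaries.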
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•9 years ago
Summary: Treeherder database slowness → High prod database response time due to API being hammered
Comment 4•9 years ago
Some links for posterity:
https://cliang.pastebin.mozilla.org/8845876
https://cliang.pastebin.mozilla.org/8845877
https://graphite-scl3.mozilla.org/render?width=500&from=22:00_20150909&until=18:00_20150910&height=400&target=zeus.zlb1_ops_scl3_mozilla_com.vservers.treeherder_mozilla_org-http.Mbps_in&target=zeus.zlb1_ops_scl3_mozilla_com.vservers.treeherder_mozilla_org-https.Mbps_in&target=zeus.zlb3_ops_scl3_mozilla_com.vservers.treeherder_mozilla_org-http.Mbps_in&target=zeus.zlb3_ops_scl3_mozilla_com.vservers.treeherder_mozilla_org-https.Mbps_in&target=zeus.zlb5_ops_scl3_mozilla_com.vservers.treeherder_mozilla_org-https.Mbps_in&target=zeus.zlb1_ops_scl3_mozilla_com.vservers.treeherder_mozilla_org-http.Mbps_out&target=zeus.zlb1_ops_scl3_mozilla_com.vservers.treeherder_mozilla_org-https.Mbps_out&target=zeus.zlb3_ops_scl3_mozilla_com.vservers.treeherder_mozilla_org-http.Mbps_out&target=zeus.zlb3_ops_scl3_mozilla_com.vservers.treeherder_mozilla_org-https.Mbps_out&target=zeus.zlb5_ops_scl3_mozilla_com.vservers.treeherder_mozilla_org-https.Mbps_out&title=treeherder.mozilla.org
Intermittent errors with page - https://treeherder.mozilla.org/
Unknown repository.
This repository is either unknown to Treeherder or it doesn't exist. If this repository does exist, please file a bug against the Treeherder product in Bugzilla to get it added to the system.
Also, the page is only loading intermittently.