Closed
Bug 1485089
Opened 7 years ago
Closed 7 years ago
Intermittents Commenter experiencing OperationalError: (3024, 'Query execution was interrupted, maximum statement execution time exceeded')
Categories
(Tree Management :: Treeherder, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: sclements, Assigned: sclements)
Details
Attachments
(1 file)
This has happened twice in the past week (and once the week before), according to an alert I've set up in Papertrail. The odd thing is that when I run it manually with the Heroku CLI (against the prod database), I often hit this same error, so I'm unable to run the daily commenter even manually for that day. However, if I run it locally with the db mirror I use for development, it runs successfully. So far this has only affected the daily comments, not the weekly comments.
Transaction trace:
https://rpm.newrelic.com/accounts/677903/applications/14179757/transactions#id=5b224f746865725472616e73616374696f6e2f43656c6572792f696e7465726d697474656e74732d636f6d6d656e746572222c22225d&tab-detail_556f290e-a510-11e8-b58d-0242ac110006_2104_3630=database_queries&app_trace_id=556f290e-a510-11e8-b58d-0242ac110006_2104_3630
Stack trace:
https://rpm.newrelic.com/accounts/677903/applications/14179757/filterable_errors#/show/2c6881e6-a4d7-11e8-b58d-0242ac110006_0_4600/stack_trace?top_facet=transactionUiName&primary_facet=error.class&barchart=barchart&filters=%5B%7B%22key%22%3A%22error.class%22%2C%22value%22%3A%22django.db.utils%3AOperationalError%22%2C%22like%22%3Afalse%7D%5D
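For reference, a minimal sketch of how MySQL error 3024 surfaces through Django; the 60-second cap and the bugscache query below are illustrative assumptions, not the actual Treeherder configuration:

from django.db import OperationalError, connection

try:
    with connection.cursor() as cursor:
        # MAX_EXECUTION_TIME is a MySQL 5.7+ optimizer hint, in milliseconds;
        # if the SELECT runs longer than this, the server aborts it with error 3024.
        cursor.execute(
            "SELECT /*+ MAX_EXECUTION_TIME(60000) */ COUNT(*) FROM bugscache"
        )
except OperationalError as exc:
    # exc.args == (3024, 'Query execution was interrupted, maximum statement
    # execution time exceeded')
    raise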
Comment 1•7 years ago
Updated•7 years ago
Priority: -- → P1
Comment 2•7 years ago
Ed, do you know if the db mirror I use for development is up to date?
Flags: needinfo?(emorley)
Comment 3•7 years ago
Running `SHOW SLAVE STATUS` on the replica will show whether replication is still running (i.e. hasn't errored) and whether there is any lag (a few seconds to a minute is expected). If your username doesn't have permissions to run that command, temporarily log into the replica using the credentials stored under DATABASE_URL on the treeherder-prod Heroku app (but make sure to connect over TLS and to switch back to the other credentials after).
Flags: needinfo?(emorley)
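For anyone repeating that check from a Django shell, a rough sketch, assuming a `replica` database alias pointing at the read replica (the alias name is an assumption; the real credentials live in the DATABASE_URL config var mentioned above):

from django.db import connections

with connections['replica'].cursor() as cursor:
    cursor.execute("SHOW SLAVE STATUS")
    columns = [col[0] for col in cursor.description]
    status = dict(zip(columns, cursor.fetchone()))

# Replication is healthy when both threads are running and lag stays small.
print(status["Slave_IO_Running"], status["Slave_SQL_Running"])
print("Seconds_Behind_Master:", status["Seconds_Behind_Master"])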
Comment 4•7 years ago
I ran that command on the replica and it looks to be in working order. I got the following (edited):
Slave_IO_State: Waiting for master to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table: mysql.plugin,mysql.rds_monitor,mysql.rds_sysinfo,mysql.rds_replication_status,mysql.rds_history,innodb_memcache.config_options,innodb_memcache.cache_policies
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Skip_Counter: 0
Exec_Master_Log_Pos: 2296140
Relay_Log_Space: 25125710
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Comment 5•7 years ago
I ran an `optimize table treeherder.bugscache;` on prod's RDS for bug 1484642; it may help here too.
Comment 6•7 years ago
This hasn't happened for weeks but unfortunately occurred again this morning (daily task).
Comment 7•7 years ago
It failed again this Tuesday (daily task) and again I couldn't run the task manually with the Heroku CLI. Since I set up the Papertrail alerts, this has happened on:
8/18 - Saturday
8/21 - Tuesday
*skipped 3 weeks after Ed optimized bugscache (coincidence??)
9/11 - Tuesday
9/18 - Tuesday
The task runs on the same dyno as the seta-analyze-failures and fetch-push-logs tasks, but those run once a day and every 5 minutes, respectively. Looking in New Relic, I don't see a spike in non-web transaction memory usage at the time the task failed.
I did recently make some optimizations to the commenter's queries (bug 1491133), so it'll be interesting to see if that helps at all.
Cam suggested it could be the RDS instance snapshot (running at GMT-700) overlapping with the commenter task (running at 7am UTC), so it might be worth changing the daily task to run an hour earlier.
Comment 8•7 years ago
Updated•7 years ago
Attachment #9010459 - Flags: review?(emorley)
Comment 9•7 years ago
Comment on attachment 9010459 [details] [review]
Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/4046
Thank you :-)
Attachment #9010459 - Flags: review?(emorley) → review+
Comment 10•7 years ago
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/bb4aee78a3d3d704e9de79f15121bad68115a034
Bug 1485089 - Change commenter daily task
Change daily task time to run one hour earlier since it appears that
the tuesday daily task is running at the same time the RDS snapshot occurs
and might be contributing to the celery task timeout exception.
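The change amounts to shifting the beat schedule entry by an hour; roughly along these lines, assuming the commenter is driven by a crontab entry in the Celery beat schedule (the task name and settings key below are illustrative, not the actual Treeherder names):

from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    'intermittents-commenter-daily': {
        'task': 'intermittents-commenter',
        # was crontab(hour=7, minute=0), which coincided with the RDS
        # snapshot window; run an hour earlier instead
        'schedule': crontab(hour=6, minute=0),
    },
}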
Updated•7 years ago
Assignee: nobody → sclements
Status: NEW → ASSIGNED
Comment 11•7 years ago
Keeping this open for a few more weeks to see if the above-mentioned changes fix the problem.
Comment 12•7 years ago
I haven't seen this particular error reoccur in the past month, so closing it out.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•3 years ago
Component: Intermittent Failures View → TreeHerder