Closed
Bug 638010
Opened 14 years ago
Closed 9 years ago
"deadlock detected" in PHX monitor/processors
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: rhelmer, Unassigned)
References
Details
Monitor and all processors went down around 2011-03-01 21:15, with error messages such as this from processor09:
"""
2011-03-01 21:05:08,456 CRITICAL - Thread-7 - something's gone horribly wrong with the database connection
2011-03-01 21:05:08,457 CRITICAL - Thread-7 - Caught Error: <class 'psycopg2.extensions.TransactionRollbackError'>
2011-03-01 21:05:08,458 CRITICAL - Thread-7 - deadlock detected
DETAIL: Process 20775 waits for RowExclusiveLock on relation 133535 of database 48819; blocked by process 1817.
Process 1817 waits for AccessExclusiveLock on relation 49320 of database 48819; blocked by process 20775.
HINT: See server log for query details.
"""
Puppet picked things back up while bkero and I were looking into it, and everything seems OK as of 2011-03-01 21:25.
Comment 1•14 years ago
We've seen this once before. See bug 575760. Did it occur when the create_partitions cron was running?
Reporter | ||
Comment 2•14 years ago
(In reply to comment #1)
> We've seen this once before. See bug 575760. Did it occur when the
> create_partitions cron was running?
Yes:
05 21 * * 2 socorro /data/socorro/application/scripts/crons/cron_create_partitions.sh
From /var/log/socorro/cron_create_partitions.log:
started 2011-03-01 21:05:02
completed 2011-03-01 21:05:08
Comment 3•14 years ago
We need to look ahead and check that the partitions for the next four weeks exist and are owned by the correct user. If they don't exist, create them manually. If they don't have the same owner as the rest of the partitions, that should be corrected, too.
Like last time, this problem is a rarity. When we get around to refactoring the SQL code, we'll rework the partition creation code with an eye toward preventing this problem.
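The look-ahead check described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the four table prefixes and the `breakpad_rw` owner are taken from the `\dt` listing later in this bug, partitions are assumed to start on Mondays (which matches names such as `reports_20110307`), and `expected_partitions` and `missing_or_misowned` are hypothetical helper names, not part of Socorro.

```python
from datetime import date, timedelta

# Assumption: these are the partitioned tables, as seen in the \dt listing.
PARTITIONED_TABLES = ("extensions", "frames", "plugins_reports", "reports")


def expected_partitions(today, weeks=4, tables=PARTITIONED_TABLES):
    """Return the partition names expected for the next `weeks` weeks.

    Assumes weekly partitions named <table>_YYYYMMDD, where the date is
    the Monday that starts the week. If `today` is a Monday, the window
    starts today.
    """
    days_ahead = (7 - today.weekday()) % 7  # Monday is weekday() == 0
    first_monday = today + timedelta(days=days_ahead)
    names = []
    for table in tables:
        for week in range(weeks):
            start = first_monday + timedelta(weeks=week)
            names.append("%s_%s" % (table, start.strftime("%Y%m%d")))
    return names


def missing_or_misowned(expected, existing, owner="breakpad_rw"):
    """Return expected names that are absent or owned by someone else.

    `existing` maps table name -> owner, e.g. built from pg_tables.
    """
    return [name for name in expected if existing.get(name) != owner]
```

Feeding `missing_or_misowned` a mapping built from `SELECT tablename, tableowner FROM pg_tables` would flag anything that needs to be created or re-owned by hand.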
Reporter | ||
Comment 4•14 years ago
(In reply to comment #3)
> we need to look ahead and see if the partitions for the next four weeks exist
> and owned by the correct user. If they don't exist, create them manually. If
> they don't have the same owner as the rest of the partitions, that should get
> corrected, too.
Looks ok to me:
breakpad=> \dt *_201103*
                    List of relations
 Schema |           Name           | Type  |    Owner
--------+--------------------------+-------+-------------
 public | extensions_20110307      | table | breakpad_rw
 public | extensions_20110314      | table | breakpad_rw
 public | extensions_20110321      | table | breakpad_rw
 public | extensions_20110328      | table | breakpad_rw
 public | frames_20110307          | table | breakpad_rw
 public | frames_20110314          | table | breakpad_rw
 public | frames_20110321          | table | breakpad_rw
 public | frames_20110328          | table | breakpad_rw
 public | plugins_reports_20110307 | table | breakpad_rw
 public | plugins_reports_20110314 | table | breakpad_rw
 public | plugins_reports_20110321 | table | breakpad_rw
 public | plugins_reports_20110328 | table | breakpad_rw
 public | reports_20110307         | table | breakpad_rw
 public | reports_20110314         | table | breakpad_rw
 public | reports_20110321         | table | breakpad_rw
 public | reports_20110328         | table | breakpad_rw
(16 rows)
Comment 5•14 years ago
That looks good to me too. I suggest that, rather than trying to chase the cause of the deadlock, we defer the solution until we refactor the SQL in the system.
Comment 6•14 years ago
Lars, Rob,
Creating a new partition takes a lock on the reports table (etc.) that blocks read queries as well as writes. Hence the deadlock.
The only real way to avoid this is to take an explicit exclusive lock on the partitioned tables, using NOWAIT in a retry loop. That will still block, but it won't create deadlocks.
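A minimal sketch of that NOWAIT retry loop, assuming a DB-API connection such as one from psycopg2. The function name `lock_tables_nowait` and its parameters are hypothetical; `lock_errors` is whatever exception the driver raises when the lock is unavailable (with psycopg2 this is believed to be an `OperationalError` subclass for SQLSTATE 55P03, but verify that against your driver version). Table names are interpolated directly, so they must come from trusted configuration.

```python
import time


def lock_tables_nowait(conn, tables, lock_errors,
                       attempts=10, delay=0.5, sleep=time.sleep):
    """Try to take ACCESS EXCLUSIVE locks on every table in `tables`.

    Returns True with all locks held in the current transaction (the
    caller then creates the partitions and commits), or False if the
    locks could not be obtained within `attempts` tries.
    """
    for _ in range(attempts):
        cur = conn.cursor()
        try:
            for table in tables:
                # NOWAIT makes the statement fail immediately instead of
                # queueing behind other sessions, so we never sit in a
                # lock wait that could become part of a deadlock cycle.
                cur.execute(
                    "LOCK TABLE %s IN ACCESS EXCLUSIVE MODE NOWAIT" % table)
            return True
        except lock_errors:
            # The failed statement aborts the transaction; rolling back
            # also releases any locks already taken in this attempt.
            conn.rollback()
            sleep(delay)
    return False
```

Taking all the locks up front, in one fixed order, means the partition DDL either runs with everything it needs or does not start at all. Processors still wait while the locks are held, as noted above.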
Assignee | ||
Updated•13 years ago
Component: Socorro → General
Product: Webtools → Socorro