Closed Bug 638010 Opened 14 years ago Closed 9 years ago

"deadlock detected" in PHX monitor/processors

Tracking

(Not tracked)

Status:

RESOLVED INVALID

People

(Reporter: rhelmer, Unassigned)


Robert Helmer [:rhelmer]

Reporter

Description

•

14 years ago

Monitor and all processors went down around 2011-03-01 21:15, with error messages such as this from processor09:

"""
2011-03-01 21:05:08,456 CRITICAL - Thread-7 - something's gone horribly wrong with the database connection
2011-03-01 21:05:08,457 CRITICAL - Thread-7 - Caught Error: <class 'psycopg2.extensions.TransactionRollbackError'>
2011-03-01 21:05:08,458 CRITICAL - Thread-7 - deadlock detected
DETAIL: Process 20775 waits for RowExclusiveLock on relation 133535 of database 48819; blocked by process 1817.
Process 1817 waits for AccessExclusiveLock on relation 49320 of database 48819; blocked by process 20775.
HINT: See server log for query details.
"""

puppet picked things back up as bkero and I were looking into it, and everything seems ok as of 2011-03-01 21:25.

Laura Thomson :laura

Comment 1

•

14 years ago

We've seen this once before. See bug 575760. Did it occur when the create_partitions cron was running?

Robert Helmer [:rhelmer]

Reporter

Comment 2

•

14 years ago

(In reply to comment #1)
> We've seen this once before. See bug 575760. Did it occur when the
> create_partitions cron was running?

Yes:

05 21 * * 2 socorro /data/socorro/application/scripts/crons/cron_create_partitions.sh

From /var/log/socorro/cron_create_partitions.log:

started 2011-03-01 21:05:02
completed 2011-03-01 21:05:08

K Lars Lohn [:lars] [:klohn]

Comment 3

•

14 years ago

We need to look ahead and check that the partitions for the next four weeks exist and are owned by the correct user. If they don't exist, create them manually. If they don't have the same owner as the rest of the partitions, that should get corrected, too. Like last time, this problem is a rarity. When we get around to refactoring the SQL code, we'll rework the partition creation code with an eye toward preventing this problem.
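The look-ahead check described above could be sketched roughly as follows. This is only an illustration: the partition naming scheme (`<prefix>_YYYYMMDD`, keyed on the Monday of each week), the function names, and the owner names are assumptions, not taken from the Socorro code. The `rows` argument stands for the result of a query such as `SELECT tablename, tableowner FROM pg_tables`.

```python
from datetime import date, timedelta


def expected_partitions(today, prefix="reports", weeks=4):
    """Return the partition names expected for the next `weeks` weeks.

    Assumption: partitions are weekly and named <prefix>_YYYYMMDD after the
    Monday that starts the week. Adjust to the real naming scheme.
    """
    monday = today - timedelta(days=today.weekday())
    return [
        "%s_%s" % (prefix, (monday + timedelta(weeks=n)).strftime("%Y%m%d"))
        for n in range(1, weeks + 1)
    ]


def audit_partitions(rows, expected, owner):
    """Compare (tablename, tableowner) rows against the expected partitions.

    Returns two lists: partitions that do not exist yet, and partitions that
    exist but are not owned by `owner`.
    """
    owners = dict(rows)
    missing = [name for name in expected if name not in owners]
    wrong_owner = [
        name for name in expected
        if name in owners and owners[name] != owner
    ]
    return missing, wrong_owner
```

Anything reported as missing would be created manually, and anything reported with the wrong owner would have its ownership corrected, as the comment suggests.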

Robert Helmer [:rhelmer]

Reporter

Comment 4

•

14 years ago

K Lars Lohn [:lars] [:klohn]

Comment 5

•

14 years ago

That looks good to me too. I suggest that, rather than trying to chase the cause of the deadlock, we defer the solution until we refactor the SQL in the system.

[:jberkus] Josh Berkus

Comment 6

•

14 years ago

Lars, Rob,

Creating a new partition takes a lock on the reports table (etc.), and that lock blocks read queries as well as writes. Hence, the deadlock. The only real way to avoid this is to take an explicit exclusive lock on the partitioned tables, with NOWAIT in a retry loop. That'll still block, but won't create deadlocks.
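A minimal sketch of the NOWAIT retry loop suggested above, assuming a psycopg2-style DB-API connection in its default non-autocommit mode. The function name, its parameters, and the retry settings are hypothetical and not taken from Socorro. Lock failures are recognised by SQLSTATE 55P03 (lock_not_available), which PostgreSQL reports when NOWAIT cannot acquire the lock immediately.

```python
import time

# PostgreSQL SQLSTATE for lock_not_available, raised when NOWAIT fails.
LOCK_NOT_AVAILABLE = "55P03"


def create_partition_with_lock(conn, parent_tables, create_sql,
                               max_attempts=10, delay_seconds=1.0):
    """Run create_sql while holding explicit locks on parent_tables.

    All tables are locked up front, in one statement and in sorted order,
    so the partition job never waits for a second lock while holding a first.
    parent_tables must be trusted identifiers; they are interpolated as-is.
    Returns the number of attempts that were needed.
    """
    lock_sql = "LOCK TABLE %s IN ACCESS EXCLUSIVE MODE NOWAIT" % (
        ", ".join(sorted(parent_tables)))
    for attempt in range(1, max_attempts + 1):
        cur = conn.cursor()
        try:
            cur.execute(lock_sql)    # fails immediately if the lock is busy
            cur.execute(create_sql)  # partition DDL, run under the lock
            conn.commit()            # commit releases the lock
            return attempt
        except Exception as exc:
            conn.rollback()
            # Retry only on "lock not available"; re-raise anything else.
            if getattr(exc, "pgcode", None) != LOCK_NOT_AVAILABLE:
                raise
        finally:
            cur.close()
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(
        "could not lock %s after %d attempts"
        % (", ".join(sorted(parent_tables)), max_attempts))
```

While the lock is held, processors still wait for the partition job to finish, but the job itself never waits for a lock, so the circular wait seen in the log cannot form on these tables.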

Nobody; OK to take it and work on it

Assignee

Updated

•

13 years ago

Component: Socorro → General

Product: Webtools → Socorro

Selena Deckelmann :selenamarie :selena

Updated

•

12 years ago

Blocks: 823507

Peter Bengtsson [:peterbe]

Comment 7

•

9 years ago

Too old.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → INVALID


Categories

(Socorro :: General, task)
