Closed
Bug 807306
Opened 12 years ago
Closed 12 years ago
Postgres connection blip on staging today
Categories: Socorro :: Database (task)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: laura, Assigned: selenamarie
Details
The processors and the monitor lost their connections and logged error messages. As per lars:
At 6:06, both staging processors and the monitor lost their connections to Postgres. They retried and eventually gave up.
2012-10-31 06:05:26,063 ERROR - MainThread - MainThread retry_wrapper_for_generators: failed too many times on this one operation, iterator_for_all_legacy_to_be_processed
Sheeri restarted processors and monitors and everything resumed as normal.
Selena or Matt: any idea what happened?
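For readers unfamiliar with the retry-then-give-up behaviour behind the log line above, here is a minimal sketch of a bounded retry with exponential backoff. It is not Socorro's actual `retry_wrapper_for_generators`; the names `retry_with_backoff` and `TooManyFailures`, the attempt count, and the delays are all assumptions made for illustration, and `ConnectionError` stands in for whatever exception the database driver raises.

```python
import time


class TooManyFailures(Exception):
    """Raised once an operation has failed on every allowed attempt."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between attempts.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError as error:  # stand-in for a driver's connection error
            last_error = error
            # Back off before the next attempt, but not after the final one.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed: give up, as the processors and monitor did here.
    name = getattr(operation, "__name__", repr(operation))
    raise TooManyFailures(
        f"failed too many times on this one operation, {name}"
    ) from last_error
```

With a wrapper like this, a short blip is absorbed by the retries, while an outage longer than the total backoff window ends in a hard failure that needs a restart, which matches what was seen on staging.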
Comment 1 (Assignee) • 12 years ago
There's nothing in the pgbouncer or Postgres logs before, at or after those times.
I dug around and didn't see anything interesting in any of the logs.
Comment 2 • 12 years ago
There seem to be similar alerts for this same checkganglia last_record_reports on tp-socorro01-master01 before it hit stage. They started at 11:36pm Pacific and recovered at 2:36am Pacific this morning.
Comment 3 (Assignee) • 12 years ago
Asking solarce to review VIP connection settings per details in bug 771218#c95
Comment 4 (Assignee) • 12 years ago
Review revealed that a 10s timeout was in effect on ports 5432, 6432 and 6433, and that "passive monitoring was turned on the backend pool settings". The timeout has been set to 0 and passive monitoring is now turned off.
Per earlier investigations, this will likely solve this problem.
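As a client-side complement to the VIP change above, TCP keepalives can be requested through libpq's `keepalives`, `keepalives_idle`, `keepalives_interval` and `keepalives_count` connection parameters. The sketch below only builds such a connection string. The helper name `keepalive_dsn`, the host and database names, and the half/fifth ratios are assumptions for illustration, as is treating a timeout of 0 as "no idle timeout". Whether keepalive probes actually reset an idle timer depends on the device in the path, so this is not a substitute for the fix applied here.

```python
def keepalive_dsn(host, port, dbname, idle_timeout_s):
    """Build a libpq connection string whose keepalive probes start
    well before an intermediary's idle timeout would expire."""
    parts = [f"host={host}", f"port={port}", f"dbname={dbname}"]
    # Assumption: a timeout of 0 means no idle timeout is enforced,
    # so no keepalive settings are added in that case.
    if idle_timeout_s > 0:
        idle = max(1, idle_timeout_s // 2)      # first probe at half the timeout
        interval = max(1, idle_timeout_s // 5)  # later probes at a fifth of it
        parts += [
            "keepalives=1",
            f"keepalives_idle={idle}",
            f"keepalives_interval={interval}",
            "keepalives_count=3",
        ]
    return " ".join(parts)


# Hypothetical host and database names, with the 10s timeout found in the review.
print(keepalive_dsn("stage-db.example", 6432, "exampledb", 10))
```

The resulting string can be passed to any libpq-based client; the values would need tuning against the real timeout on the VIP.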
Comment 5 (Assignee) • 12 years ago
Yesterday's errors were logged on socorroadm.stage. Today, no errors reported.
Huzzah!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated (Assignee) • 12 years ago
Assignee: nobody → sdeckelmann