Closed Bug 807306 Opened 12 years ago Closed 12 years ago

Postgres connection blip on staging today

Categories

(Socorro :: Database, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: laura, Assigned: selenamarie)

Details

Processors and monitor lost connection, with error messages. As per lars:

At 6:06 both staging processors and the monitor lost their connections to Postgres. They retried and eventually gave up.

2012-10-31 06:05:26,063 ERROR - MainThread - MainThread retry_wrapper_for_generators: failed too many times on this one operation, iterator_for_all_legacy_to_be_processed

Sheeri restarted processors and monitors and everything resumed as normal. Selena or Matt: any idea what happened?
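For readers unfamiliar with the "retried and eventually gave up" behaviour in the log line above, here is a minimal sketch of a bounded retry loop. It is not Socorro's actual retry_wrapper_for_generators; the names retry_bounded and TooManyRetries and the parameters max_attempts and base_delay are hypothetical, chosen only for illustration.

```python
import time


class TooManyRetries(Exception):
    """Raised when an operation keeps failing after the allowed attempts."""


def retry_bounded(operation, max_attempts=5, base_delay=1.0,
                  retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    Hypothetical helper, not Socorro's implementation. Waits
    base_delay * 2**(attempt - 1) seconds between failed attempts.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    name = getattr(operation, "__name__", "operation")
    raise TooManyRetries(
        "failed too many times on this one operation, %s" % name
    ) from last_error
```

Once a loop like this gives up, nothing reconnects until the process is restarted, which would be consistent with the restart having been needed here.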
There's nothing in the pgbouncer or Postgres logs before, at or after those times. I dug around and didn't see anything interesting in any of the logs.
There seem to be similar alerts for this same checkganglia last_record_reports on tp-socorro01-master01 before it hit stage. They started at 11:36 pm Pacific and recovered at 2:36 am Pacific this morning.
Asking solarce to review VIP connection settings per details in bug 771218#c95
Review revealed that a 10s timeout was in effect on 5432, 6432 and 6433 and "passive monitoring was turned on the backend pool settings". The timeout has been set to 0 and the passive monitoring is now turned off. Per earlier investigations, this will likely solve this problem.
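To illustrate why a 10s idle timeout on the VIP can bite long-lived clients, here is a minimal sketch of a client-side guard that reconnects when a connection has sat idle longer than the limit. This is not what was done in this bug (the fix was to set the timeout to 0 on the VIP); IdleAwareConnection, idle_limit and the connect callable are hypothetical names used only for this example.

```python
import time


class IdleAwareConnection:
    """Hand out a connection, replacing it if it has been idle too long.

    Hypothetical sketch: `connect` is any zero-argument callable that
    returns a fresh connection object. `idle_limit` is in seconds.
    """

    def __init__(self, connect, idle_limit=10.0, clock=time.monotonic):
        self._connect = connect
        self._idle_limit = idle_limit
        self._clock = clock
        self._conn = None
        self._last_used = None
        self.reconnects = 0  # how many times a stale connection was replaced

    def get(self):
        now = self._clock()
        stale = (
            self._conn is None
            or (now - self._last_used) >= self._idle_limit
        )
        if stale:
            if self._conn is not None:
                self.reconnects += 1
            # Assume the middlebox may already have dropped the old one.
            self._conn = self._connect()
        self._last_used = now
        return self._conn
```

A guard like this only works around the symptom; removing the timeout at the VIP, as described above, addresses the cause for all clients at once.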
Yesterday's errors were logged on socorroadm.stage. Today, no errors reported. Huzzah!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee: nobody → sdeckelmann