Socorro Postgres Autovac Freeze CRITICAL

Status: RESOLVED FIXED
Product: Socorro
Component: Database
Reported: 5 years ago
Modified: 4 years ago

(Reporter: ericz, Assigned: selenamarie)

Description

5 years ago
We just added a check for Autovac Freeze on the Socorro Postgres databases and it was CRITICAL from the get-go:

POSTGRES_AUTOVAC_FREEZE CRITICAL: (host:10.8.70.100) 'breakpad'=98%:96:97
We put the check in a 90-day downtime a while ago, and the downtime expired:

nagios-phx1: sheeri: tp-socorro01-master01.phx1.mozilla.com:Socorro Postgres Autovac Freeze is CRITICAL - POSTGRES_AUTOVAC_FREEZE CRITICAL: (host:10.8.70.100) 'breakpad'=98%:96:97 Last Checked: 2013-06-10 07:17:52 PDT

It's been ack'd on socorro01-master02:
nagios-phx1: sheeri: tp-socorro01-master02.phx1.mozilla.com:Socorro Postgres Autovac Freeze is ACKNOWLEDGEMENT (CRITICAL) - POSTGRES_AUTOVAC_FREEZE CRITICAL: (host:10.8.70.101) 'breakpad'=98%:96:97 Last Checked: 2013-06-10 07:22:46 PDT

I have put master01 in another 90-day downtime, as this is not urgent.
Selena, can you elaborate on what needs to be done to fix the postgres autovac?
Flags: needinfo?(sdeckelmann)
This requires some analysis that I don't currently have time for. 

Basically, we should be attempting to adjust our autovacuum settings to get ourselves into a "reasonable" vacuum schedule for our workload. This would avoid excessive vacuuming and reduce IO load on the system. 
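
For context, here is a rough sketch of how a freeze percentage like the 98% in the alert can be derived. It assumes the check compares each database's transaction ID age (as returned by `SELECT datname, age(datfrozenxid) FROM pg_database;`) against the `autovacuum_freeze_max_age` setting. The function names are hypothetical, and the 90%/95% thresholds are assumed defaults, not necessarily what this Nagios check is configured with.

```python
# Illustrative sketch only: helper names and thresholds are assumptions,
# not the actual implementation of the Nagios check discussed in this bug.

# PostgreSQL's default autovacuum_freeze_max_age (200 million transactions).
DEFAULT_FREEZE_MAX_AGE = 200_000_000


def freeze_percent(xid_age, freeze_max_age=DEFAULT_FREEZE_MAX_AGE):
    """Return the database's XID age as a rounded percentage of freeze_max_age.

    xid_age is the value of age(datfrozenxid) for one database.
    """
    if freeze_max_age <= 0:
        raise ValueError("freeze_max_age must be positive")
    return round(100 * xid_age / freeze_max_age)


def classify(percent, warning=90, critical=95):
    """Map a percentage to a Nagios-style state using assumed thresholds."""
    if percent >= critical:
        return "CRITICAL"
    if percent >= warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Hypothetical XID age chosen for illustration, not measured on 'breakpad'.
    pct = freeze_percent(196_000_000)
    print(f"'breakpad'={pct}% -> {classify(pct)}")
```

Under these assumptions, a database that always sits near 100% will keep alerting regardless of downtime windows, which is why the fix involves tuning the autovacuum settings or the check itself.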

I don't have time to manage this task right now, and it is not urgent to resolve at this time.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Flags: needinfo?(sdeckelmann)
Resolution: --- → WONTFIX
If it's not urgent to resolve, it should stay open with a WONTFIX.

If it's actually a WONTFIX, should we turn off the Nagios check that has been in CRITICAL status since we added the check (as per the monitoring audit) back in January?
(er, if it's not urgent it should stay open, not be WONTFIX'd....the first sentence should say)
Reopening, because there's a lingering question

If this bug is actually a WONTFIX, should we turn off the Nagios check that has been in CRITICAL status since we added the check (as per the monitoring audit) back in January?

It paged again over the weekend and there doesn't seem to be a point to putting in 90-day outage windows over and over, but I don't want to remove a monitoring check without explicit OK.
Status: RESOLVED → REOPENED
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(laura)
Resolution: WONTFIX → ---
(In reply to Sheeri Cabral [:sheeri] from comment #6)
> Reopening, because there's a lingering question
> 
> If this bug is actually a WONTFIX, should we turn off the Nagios check that
> has been in CRITICAL status since we added the check (as per the monitoring
> audit) back in January?
> 
> It paged again over the weekend and there doesn't seem to be a point to
> putting in 90-day outage windows over and over, but I don't want to remove a
> monitoring check without explicit OK.

Sounds great.  Go ahead.
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(laura)
Done!

Sheeri-Cabral:nagios scabral$ svn commit -m "removing autovac freeze check as per bug 828615"
Sending        manifests/mozilla/checkcommands.pp
Sending        manifests/mozilla/services.pp
Transmitting file data ..
Committed revision 79672.
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago → 4 years ago
Resolution: --- → FIXED