Closed Bug 465687 Opened 17 years ago Closed 16 years ago

Add nagios scripts and cacti trendings for monitoring health of Socorro database

Categories

(mozilla.org Graveyard :: Server Operations, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: morgamic, Assigned: reed)

Not sure if this exists or not. If it does, it should be enabled and not ignored.
Assignee: server-ops → aravind
Whiteboard: coming from postgres consultants
Do we have this in place for Thursday's cutover?
Assignee: aravind → reed
Whiteboard: coming from postgres consultants → week to two weeks ETA
Depends on: 480167
Blocks: 494003
Whiteboard: week to two weeks ETA → 06/19
In progress...
Status: NEW → ASSIGNED
Yay!!!!!!!!!
Whiteboard: 06/19
ETA?
(In reply to comment #4)
> ETA?
Tomorrow. I had it active Monday night, but there was a problem with the password I was using for nagios to access the pgsql database, so I need to fix that. The cacti stuff is all active already as of last week, I think.
Whiteboard: 07/03
Whiteboard: 07/03 → 07/06
ETA?
Bug's been open since November. Need a firm ETA on this, one that won't slip.
Whiteboard: 07/06
Severity: minor → critical
Priority: -- → P1
OS: Other → BeOS
OS: BeOS → All
Bug 480167 is out of scope of what this bug is for...
Mostly working except for the Nagios check for "PostgreSQL Bloat", which is reporting "No matching relations found due to exclusion/inclusion options" when --include=jobs is used. Not sure what this means or what I need to do...
Nagios checks are available at https://nagios.mozilla.org/nagios/cgi-bin/status.cgi?host=tm-breakpad01-master01
Cacti graphs are available at http://nm-dash01.nms.mozilla.org/cacti/graph_view.php?action=tree&tree_id=2&leaf_id=218
Two of the cacti graphs are looking strange, so I'm continuing to look into them.
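For illustration, here is one way an include filter can end up matching nothing: if a plain pattern such as jobs has to equal a relation name exactly, a relation like jobs_pkey would not match it. The Python sketch below is not the Nagios plugin's actual code. The matching rule (exact name unless the pattern starts with ~, which marks a regular expression), the function names, the choice of UNKNOWN for an empty match, and the sample data are all assumptions made for the example. Only the exit codes 0 to 3 follow the standard Nagios plugin convention.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def filter_relations(names: Iterable[str], include: List[str]) -> List[str]:
    """Keep the names selected by the include patterns.

    Assumed rule (hypothetical, for illustration): a plain pattern must equal
    the relation name exactly; a pattern starting with '~' is a regex.
    An empty include list keeps everything.
    """
    if not include:
        return list(names)
    kept = []
    for name in names:
        for pattern in include:
            if pattern.startswith("~"):
                matched = re.search(pattern[1:], name) is not None
            else:
                matched = pattern == name
            if matched:
                kept.append(name)
                break
    return kept


def check_bloat(bloat: Dict[str, float], include: List[str],
                warn: float, crit: float) -> Tuple[int, str]:
    """Return (exit_code, message) for the most bloated selected relation."""
    selected = filter_relations(bloat, include)
    if not selected:
        # Nothing left to check after filtering, so no verdict is possible.
        return UNKNOWN, ("No matching relations found due to "
                         "exclusion/inclusion options")
    worst = max(selected, key=lambda name: bloat[name])
    ratio = bloat[worst]
    if ratio >= crit:
        status = CRITICAL
    elif ratio >= warn:
        status = WARNING
    else:
        status = OK
    return status, f"{worst} bloat ratio {ratio:.1f}"


if __name__ == "__main__":
    # Made-up bloat ratios; no relation is named exactly "jobs".
    sample = {"jobs_pkey": 1.2, "reports": 3.5}
    print(check_bloat(sample, ["jobs"], warn=2.0, crit=4.0))
    print(check_bloat(sample, ["~jobs"], warn=2.0, crit=4.0))
```

Under these assumptions, the first call finds nothing to check because no relation is named exactly jobs, while the regex form selects jobs_pkey. Whether that is what is happening on tm-breakpad01-master01 would need to be confirmed against the plugin's own documentation.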
No longer depends on: 480167
Summary: Add nagios script to monitor health of Socorro database → Add nagios scripts and cacti trendings for monitoring health of Socorro database
(In reply to comment #8)
> Two of the cacti graphs are looking strange, so I'm continuing to look into
> them.
Ok, fixed the problem with the PGBouncer graph. I've been debugging the table bloat graph for the last little bit, and it seems like it's failing due to the pgsql version mismatch on nm-dash01. If I run the command directly on tm-breakpad01-master01, it completes fine, so I think there's just a version mismatch. I'll have to see about installing a newer pgsql client version on nm-dash01 and seeing if that helps.
(In reply to comment #9)
> I've been debugging the table bloat graph for the last little bit, and it seems
> like it's failing due to the pgsql version mismatch on nm-dash01. If I run the
> command directly on tm-breakpad01-master01, it completes fine, so I think
> there's just a version mismatch. I'll have to see about installing a newer
> pgsql client version on nm-dash01 and seeing if that helps.
pgsql client version upgrade didn't help, but I figured out that the user had to be a superuser in order to run the command, so I created a new superuser pgsql account for cacti to use. That is working now.
The only thing left not working is the "PostgreSQL Bloat" nagios check mentioned in comment #8. Aravind, do you have any idea about that?
Whiteboard: all working/done except "PostgreSQL Bloat" nagios check
E-mailed pgexperts for advice.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Whiteboard: all working/done except "PostgreSQL Bloat" nagios check
Product: mozilla.org → mozilla.org Graveyard