Closed Bug 480346 Opened 15 years ago Closed 15 years ago

Make Socorro collector less error-prone against disk failures

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: chizu)

Details

Yesterday we noticed that the crash reporter could not send any crash reports. Marcia filed bug 480234 on that at around 9am. Ten hours later I saw the same problem while trying to investigate a crash. I cc'ed Lars on the bug and the problem was fixed immediately. The reason was that no free inodes were left. Nagios did not detect this problem. As a consequence, we have no crash reports for about 10 hours of yesterday.

The Nagios scripts should be enhanced to cover such disk problems too. It's bad when we lose crash reports and cannot even check our own crashes ourselves.
[09:25:39PM] <reed>  -W, --iwarning=PERCENT%
[09:25:39PM] <reed>     Exit with WARNING status if less than PERCENT of inode space is free
[09:25:39PM] <reed>  -K, --icritical=PERCENT%
[09:25:39PM] <reed>     Exit with CRITICAL status if less than PERCENT of inode space is free
[09:25:47PM] <reed> we may not be doing -W and -K
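For illustration, the kind of inode check that the -W and -K options quoted above describe can be sketched in Python as below. This is only a rough sketch and not the actual Nagios plugin code; the function names and the 10%/5% thresholds mentioned afterwards are made up for the example.

```python
import os

# Exit codes that Nagios plugins conventionally use.
OK, WARNING, CRITICAL = 0, 1, 2


def inode_free_percent(path):
    """Return the percentage of inodes still free on the filesystem holding path."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems report no inode total; treat that as "nothing to alert on".
        return 100.0
    # f_favail: inodes available to unprivileged users; f_files: total inodes.
    return 100.0 * st.f_favail / st.f_files


def classify(free_percent, warn_percent, crit_percent):
    """Map a free-inode percentage to a Nagios-style status code."""
    if free_percent < crit_percent:
        return CRITICAL
    if free_percent < warn_percent:
        return WARNING
    return OK
```

With hypothetical thresholds of 10% (warning) and 5% (critical), classify(inode_free_percent("/"), 10, 5) would return CRITICAL once fewer than 5% of the inodes are free, which is the state the collector disk reached yesterday.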
Keywords: dataloss
Does that mean those warnings were shown for over 10 hours and no one took care of them?
(In reply to comment #2)
> Does that mean those warnings were shown for over 10 hours and no one took care of them?

?? Where did you get that from?
Sorry, I misread your last comment.
Assignee: server-ops → thardcastle
ETA on this? It's just a Nagios check, right?
The available inode count is now watched by Nagios.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard