Make Socorro collector less error-prone against disk failures

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations
--
major
RESOLVED FIXED
9 years ago
3 years ago

People

(Reporter: whimboo, Assigned: chizu)

(Reporter)

Description

9 years ago
Yesterday we noticed that the crash reporter could not send crash reports. Marcia filed bug 480234 on that at around 9am. Ten hours later I saw the same problem while trying to investigate a crash. I cc'ed Lars on the bug and the problem was fixed immediately. The reason was that no free inodes were left. Nagios did not detect this problem. As a consequence, we have no crash reports for about 10 hours of yesterday.

The Nagios scripts should be enhanced to cover these disk problems too. It's bad when we lose crash reports and are not even able to check our own crashes ourselves.
[09:25:39PM] <reed>  -W, --iwarning=PERCENT%
[09:25:39PM] <reed>     Exit with WARNING status if less than PERCENT of inode space is free
[09:25:39PM] <reed>  -K, --icritical=PERCENT%
[09:25:39PM] <reed>     Exit with CRITICAL status if less than PERCENT of inode space is free
[09:25:47PM] <reed> we may not be doing -W and -K
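
For illustration, the kind of check those two options describe could be sketched roughly as follows. This is not the actual check_disk code or the check that ended up deployed; it is a minimal Python sketch that assumes a Unix host (os.statvfs), and the function names and the 10%/5% thresholds are hypothetical. Only the exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) follow the usual Nagios plugin convention.

```python
import os

# Nagios plugin exit codes (standard convention).
OK, WARNING, CRITICAL = 0, 1, 2


def inode_free_percent(path):
    """Return the percentage of inodes still available on the filesystem holding path."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems do not report inode counts; treat them as fully free.
        return 100.0
    # f_favail: inodes available to unprivileged users; f_files: total inodes.
    return 100.0 * st.f_favail / st.f_files


def classify(free_percent, warn_percent, crit_percent):
    """Map a free-inode percentage to a Nagios status code."""
    if free_percent < crit_percent:
        return CRITICAL
    if free_percent < warn_percent:
        return WARNING
    return OK


def check_inodes(path, warn_percent=10.0, crit_percent=5.0):
    """Return (status, message); the default thresholds are hypothetical."""
    free = inode_free_percent(path)
    status = classify(free, warn_percent, crit_percent)
    label = ("OK", "WARNING", "CRITICAL")[status]
    message = "INODES %s - %.1f%% of inodes free on %s" % (label, free, path)
    return status, message
```

A real plugin would print the message and call sys.exit(status) so that Nagios picks up the state. The point is that a filesystem can run out of inodes while still reporting free blocks, so a check that only looks at free space would miss the failure described above.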
Keywords: dataloss
(Reporter)

Comment 2

9 years ago
Does that mean those warnings were shown for over 10 hours and no one took care of them?
(In reply to comment #2)
> Does that mean those warnings were shown for over 10 hours and no one took care of them?

?? Where did you get that from?
(Reporter)

Comment 4

9 years ago
Sorry, I misread your last comment.
(Assignee)

Updated

9 years ago
Assignee: server-ops → thardcastle

Comment 5

9 years ago
ETA on this? It's just a Nagios check, right?
(Assignee)

Comment 6

9 years ago
The available inode count is now watched by Nagios.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard