Bug 1111381
Opened 9 years ago
Closed 9 years ago
make fuzzer-linux3.sec.scl3.mozilla.com an (irc) alert
Categories
(Infrastructure & Operations :: MOC: Projects, task)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: achavez, Assigned: ryanc)
Details
I got an alert for fuzzer-linux3 and wasn't able to ssh to it and follow the run book. I was told by fox2mike that this alert and possibly more fuzzer alerts need to be changed to irc alerts. We don't have access to these machines to fix if broken.
Comment 1•9 years ago
ashish, how does one turn an alert into an IRC-only alert?
Component: MOC: Problems → MOC: Projects
Updated•9 years ago
Assignee: nobody → rwatson
Comment 2•9 years ago (Assignee)
Thu 22:02:58 PST [5862] fuzzer-linux3.sec.scl3.mozilla.com:HP Health is UNKNOWN: UNKNOWN - hanging hpasmdcli processes.
Any update on this?
Status: NEW → ASSIGNED
Flags: needinfo?(rwatson)
Comment 3•9 years ago (Reporter)
[15:46:45] <nagios-scl3> Fri 15:46:45 PDT [5118] fuzzer-linux3.sec.scl3.mozilla.com:HP Health is CRITICAL: CRITICAL - hpasmd needs to be started
Comment 4•9 years ago
My 2c - I prefer that hardware checks alert the oncall, so that important stuff (drive/controller failures, firmware updates, etc.) doesn't get missed.
Updated•9 years ago
Assignee: rwatson → nobody
Flags: needinfo?(rwatson)
Comment 5•9 years ago (Assignee)
Tue 23:37:04 PDT [5379] fuzzer-linux3.sec.scl3.mozilla.com:HP Health is CRITICAL: CHECK_NRPE: Socket timeout after 60 seconds.
Rechecking resolved this. :ashish, I agree. How do you think we should go about this? This bug has been pretty stale.
Flags: needinfo?(ashish)
QA Contact: dmoore → lypulong
Comment 6•9 years ago (Assignee)
:ashish, as you mentioned, it'd only be worth it if there were more than 10 hosts that needed to be switched to IRC alerts. I'd say all fuzzers need this. They should only page for array and disk failures, etc.
Assignee: nobody → rchilds
Comment 7•9 years ago
I think that is the case now - only HP stuff pages oncall
Flags: needinfo?(ashish)
Comment 8•9 years ago
(In reply to Ashish Vijayaram [:ashish] from comment #7)
> I think that is the case now - only HP stuff pages oncall
And their HP daemons keep on breaking, paging you folks.
Comment 9•9 years ago (Assignee)
A lot of these boxes are severely behind on patches, so maybe updating them will remedy these HP alerts. Since I reinstalled fuzzer2 with Puppet, it's been perfect.
[rchilds@admin1a.private.scl3 ~]$ ssh -A fuzzer-linux6.sec.scl3.mozilla.com
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.2.0-45-generic x86_64)
 * Documentation: https://help.ubuntu.com/
327 packages can be updated.
261 updates are security updates.
Updated•9 years ago (Assignee)
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX