Closed
Bug 1172913
Opened 10 years ago
Closed 10 years ago
zlb1.ops.scl3.mozilla.com unresponsive to the world for anything but ping.
Categories
(Infrastructure & Operations Graveyard :: WebOps: Other, task)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 1189638
People
(Reporter: rwatson, Unassigned)
Details
(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/1325] )
Lots of alerts from Nagios:
0:14 <nagios-scl3> (IRC) Tue 05:10:13 PDT [5145] zlb1.ops.scl3.mozilla.com:Out of memory - killed process is CRITICAL: (Return code of 255 is out of bounds) (http://m.mozilla.org/Out+of+memory+-+killed+process)
13:10:35 <nagios-scl3> Tue 05:10:34 PDT [5146] zlb1.ops.scl3.mozilla.com:ZXTM - procs is CRITICAL: (Return code of 255 is out of bounds) (http://m.mozilla.org/ZXTM+-+procs)
13:10:45 <nagios-scl3> Tue 05:10:44 PDT [5148] zlb1.ops.scl3.mozilla.com:gmond procs is CRITICAL: (Return code of 255 is out of bounds) (http://m.mozilla.org/gmond+procs)
13:10:45 <nagios-scl3> Tue 05:10:44 PDT [5149] zlb1.ops.scl3.mozilla.com:ZXTM Process Health is CRITICAL: Error while fetching SNMP (http://m.mozilla.org/ZXTM+Process+Health)
13:11:14 <nagios-scl3> (IRC) Tue 05:11:13 PDT [5150]
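For triage, the alert lines above can be split into fields. A return code of 255 is outside the 0-3 range that Nagios plugins are expected to return, so most of these alerts say the check itself failed to run rather than what state the service was in. The sketch below is only an illustration: the line layout is inferred from the few alerts quoted here, and `ALERT_RE` and `parse_alert` are hypothetical names, not part of Nagios or any existing tooling.

```python
import re

# Assumed layout, inferred only from the alerts quoted in this bug:
#   ... [<id>] <host>:<service> is <STATE>: <detail> (<url>)
ALERT_RE = re.compile(
    r"\[(?P<id>\d+)\]\s+"
    r"(?P<host>[^:\s]+):(?P<service>.+?)\s+is\s+"
    r"(?P<state>OK|WARNING|CRITICAL|UNKNOWN):\s+"
    r"(?P<detail>.*?)"
    r"(?:\s+\((?P<url>https?://\S+)\))?$"
)


def parse_alert(line):
    """Return the alert fields as a dict, or None if the line does not match."""
    match = ALERT_RE.search(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "13:10:45 <nagios-scl3> Tue 05:10:44 PDT [5149] "
        "zlb1.ops.scl3.mozilla.com:ZXTM Process Health is CRITICAL: "
        "Error while fetching SNMP (http://m.mozilla.org/ZXTM+Process+Health)"
    )
    print(parse_alert(sample))
```

Truncated lines such as the last alert above do not match and return None, so they are skipped rather than guessed at.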
Eventually I was able to reach the console, but could not log in because of typing lag. pir forced a reboot, and things started responding again.
Comment 1•10 years ago
There is absolutely nothing in the logs (even via rsyslog), which makes this a pain.
Not sure what else we can do here for an event that has no logs and no root cause. We can do firmware updates during the next real TCW, if permitted, but that's really the sort of thing we should be doing anyways.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard