Closed Bug 1172913 Opened 10 years ago Closed 10 years ago

zlb1.ops.scl3.mozilla.com unresponsive to the world for anything but ping.

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1189638

People

(Reporter: rwatson, Unassigned)

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/1325] )

Lots of alerts from Nagios:
0:14 <nagios-scl3> (IRC) Tue 05:10:13 PDT [5145] zlb1.ops.scl3.mozilla.com:Out of memory - killed process is CRITICAL: (Return code of 255 is out of bounds) (http://m.mozilla.org/Out+of+memory+-+killed+process)
13:10:35 <nagios-scl3> Tue 05:10:34 PDT [5146] zlb1.ops.scl3.mozilla.com:ZXTM - procs is CRITICAL: (Return code of 255 is out of bounds) (http://m.mozilla.org/ZXTM+-+procs)
13:10:45 <nagios-scl3> Tue 05:10:44 PDT [5148] zlb1.ops.scl3.mozilla.com:gmond procs is CRITICAL: (Return code of 255 is out of bounds) (http://m.mozilla.org/gmond+procs)
13:10:45 <nagios-scl3> Tue 05:10:44 PDT [5149] zlb1.ops.scl3.mozilla.com:ZXTM Process Health is CRITICAL: Error while fetching SNMP (http://m.mozilla.org/ZXTM+Process+Health)
13:11:14 <nagios-scl3> (IRC) Tue 05:11:13 PDT [5150]
Eventually able to reach the console, but unable to log in due to typing lag. pir forced a reboot, and things started responding again.
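For anyone triaging a similar flood, the alert lines above share a regular enough shape to be pulled apart mechanically. The following is only a rough sketch: the regular expression is inferred from the handful of lines quoted in this bug, not from any documented nagios-scl3 bot format, and the names `ALERT_RE`, `parse_alert`, and `check_did_not_run` are made up for illustration. The "Return code of 255 is out of bounds" text is what Nagios prints when a plugin exits with a code outside 0-3, which usually means the check itself could not run rather than that it measured a bad value.

```python
import re
from typing import Optional

# Assumed alert shape, inferred only from the lines quoted in this bug:
#   Tue 05:10:13 PDT [5145] host:service is STATE: detail (url)
ALERT_RE = re.compile(
    r"(?P<when>\w{3} \d{2}:\d{2}:\d{2} \w+) "
    r"\[(?P<alert_id>\d+)\] "
    r"(?P<host>[^:\s]+):(?P<service>.+?) is "
    r"(?P<state>OK|WARNING|CRITICAL|UNKNOWN): "
    r"(?P<detail>.*?)"
    r"(?: \((?P<url>https?://[^)\s]+)\))?$"
)


def parse_alert(line: str) -> Optional[dict]:
    """Return the alert fields as a dict, or None if the line does not match."""
    match = ALERT_RE.search(line.strip())
    return match.groupdict() if match else None


def check_did_not_run(alert: dict) -> bool:
    """True when the detail says the plugin's exit code was outside 0-3."""
    detail = alert["detail"]
    return "Return code of" in detail and "out of bounds" in detail


if __name__ == "__main__":
    sample = (
        "13:10:35 <nagios-scl3> Tue 05:10:34 PDT [5146] "
        "zlb1.ops.scl3.mozilla.com:ZXTM - procs is CRITICAL: "
        "(Return code of 255 is out of bounds) "
        "(http://m.mozilla.org/ZXTM+-+procs)"
    )
    alert = parse_alert(sample)
    print(alert["host"], alert["service"], alert["state"])
    print("check failed to execute:", check_did_not_run(alert))
```

Grouping the parsed alerts by host and by `check_did_not_run` would make it easier to see that every check on the box stopped executing at about the same time, which points at the host rather than at any single service.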
There is absolutely nothing in the logs (even via rsyslog), which makes this a pain to diagnose.
Not sure what else we can do here for an event that has no logs and no root cause. We can do firmware updates during the next real TCW, if permitted, but that's really the sort of thing we should be doing anyways.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
Duping this forward.
Resolution: INCOMPLETE → DUPLICATE
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard