Closed Bug 1519300 Opened 6 years ago Closed 6 years ago

CHECK_NRPE STATE CRITICAL for mac-v2-signing9

Categories

(Infrastructure & Operations :: RelOps: Hardware, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: apop, Assigned: dhouse)

During my shift, this alert came into the #platform-ops-alerts channel:

mac-v2-signing9.srv.releng.mdc1.mozilla.com:Puppet freshness is CRITICAL: CHECK_NRPE STATE CRITICAL: Socket timeout after 60 seconds.

I've set a downtime for the Puppet freshness service until 01-12-2019 05:48:37.

Can you please take a look?

Flags: needinfo?(dhouse)

Aki opened a ticket earlier for this host's high CPU usage (kernel_task). That is likely related and may have caused the socket timeout.

Assignee: relops → dhouse
Depends on: 1519261
Flags: needinfo?(dhouse)

We rebooted and reset the SMC, NVRAM, and (in safe mode) the OS cache/other. Nagios shows the machine has been running with normal load since then.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED

Still good after the weekend. We'll see how it does this week during normal business.

https://nagios1.private.releng.mdc1.mozilla.com/releng-mdc1/cgi-bin/status.cgi?host=mac-v2-signing9.srv.releng.mdc1.mozilla.com

Service  Status  Last Check           Duration      Attempt  Status Information
load     OK      01-14-2019 05:04:42  2d 7h 0m 47s  1/3      OK - load average: 1.48, 1.32, 1.24

Status: RESOLVED → VERIFIED
Component: RelOps: Puppet → RelOps: Hardware
QA Contact: mcornmesser

Still good after a day of business:

01-15-2019 16:34:42  3d 18h 32m 0s  1/3  OK - load average: 1.33, 1.41, 1.40
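
For anyone who wants to compare these Nagios readings without eyeballing them, here is a minimal Python sketch that pulls the three load averages out of a status line like the ones pasted above. The function names and the `ceiling` value of 2.0 are hypothetical and chosen only for illustration; they are not the thresholds configured for this host.

```python
import re

# Matches the status text pasted above, e.g. "OK - load average: 1.48, 1.32, 1.24".
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_load(output):
    """Return the (1, 5, 15)-minute load averages found in a load status line."""
    match = LOAD_RE.search(output)
    if match is None:
        raise ValueError(f"no load averages found in: {output!r}")
    return tuple(float(value) for value in match.groups())


def is_load_normal(output, ceiling=2.0):
    """True when every load average is below `ceiling`.

    The default ceiling is an assumed value for illustration,
    not the real Nagios warning threshold for this machine.
    """
    return all(value < ceiling for value in parse_load(output))


if __name__ == "__main__":
    samples = [
        "OK - load average: 1.48, 1.32, 1.24",  # 01-14-2019 reading
        "OK - load average: 1.33, 1.41, 1.40",  # 01-15-2019 reading
    ]
    for line in samples:
        print(parse_load(line), is_load_normal(line))
```

Both readings from this bug stay under the assumed ceiling, which matches the "normal load" observation after the reboot and resets.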