Closed Bug 825072 Opened 12 years ago Closed 12 years ago

Load high on kvm3/4.infra.scl1

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: dustin, Unassigned)

17:21 < nagios-releng> Thu 14:21:19 PST [495] kvm3.infra.scl1.mozilla.com:avg load is CRITICAL: CRITICAL - load average: 23.76, 25.70, 17.25 (http://m.allizom.org/avg+load)
17:36 < nagios-releng> Thu 14:36:49 PST [402] kvm4.infra.scl1.mozilla.com:avg load is CRITICAL: CRITICAL - load average: 32.27, 26.15, 19.38 (http://m.allizom.org/avg+load)

I don't see any smoking guns in htop. I checked buildbot-master40 and buildapi01; neither is swapping, and neither shows high CPU utilization.
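
For anyone repeating the "is it swapping?" check without eyeballing htop, here is a minimal sketch, assuming a Linux host that exposes the `pswpin` and `pswpout` counters in `/proc/vmstat`. The function names (`parse_vmstat`, `swap_activity`, `sample`) and the 5-second interval are made up for illustration; this is not the procedure actually used on these hosts.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before_text, after_text):
    """Return pages swapped in/out between two /proc/vmstat snapshots."""
    before = parse_vmstat(before_text)
    after = parse_vmstat(after_text)
    return {
        name: after.get(name, 0) - before.get(name, 0)
        for name in ("pswpin", "pswpout")
    }


def sample(path="/proc/vmstat", interval=5.0):
    """Take two snapshots `interval` seconds apart (hypothetical helper)."""
    with open(path) as f:
        before_text = f.read()
    time.sleep(interval)
    with open(path) as f:
        after_text = f.read()
    return swap_activity(before_text, after_text)
```

Calling `sample()` on a host and getting non-zero deltas would suggest active swapping during the interval; zeros on both counters match the "neither is swapping" observation above.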

No instance has its primary and secondary on these two nodes, so that doesn't point to a cause either.

I'm about to take off, so Ben, if you have a chance to take a look, maybe you'll see what I'm missing.

I downtimed the load check for 6h.

Both nodes are fine now.

I'm going to go ahead and blame the usual suspect: swap on DRBD.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations