Closed Bug 694774 Opened 13 years ago Closed 13 years ago

1/3 of production tegras are offline - all of them attached to CDU4

Categories

(Infrastructure & Operations :: RelOps: General, task)

Platform: ARM Android
Type: task
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bear, Unassigned)

Details

The CDU itself has power (I can get to its web interface), and I can even request that the outlets go through a reboot cycle, but the tegras all show hard OFFLINE and have been that way for a while:

2011-10-14 17:30:08 tegra-095 p online active active
2011-10-14 17:35:34 tegra-095 p OFFLINE active active

PING tegra-095.build.mtv1.mozilla.com (10.250.49.83): 56 data bytes
Request timeout for icmp_seq 0
Are these still offline? Did you try rebooting the CDU itself?
All of these tegras were in switch4, one of the new switches, which wasn't completely configured yet. I'm not sure how they ended up in that switch. At any rate, they're online now: the switch has been reconfigured.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Ah, I would have called that switch 2, but there's no reason to believe that netops would have gone with chronological numbering instead of a physical sequence (the new switches were added between this one and the "first" switch, where the fiber terminates). Anyway, that switch was previously flat, so if the reconfig didn't leave it running as a flat switch, this all makes sense.
There was already a bug open to get all of the switch info into inventory (I suspect this was a mislabeling or missing-label problem), but it hasn't been top priority because getting the hardware up was more important. We'll make sure things get documented so this doesn't happen again.
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations