Closed Bug 641647 Opened 14 years ago Closed 14 years ago

sjc1 reboot requests

Categories

(Infrastructure & Operations :: RelOps: General, task)

All
Other
task
Not set
minor

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: phong)

talos-r3-snow-032.build.mozilla.org
Alias: reboots
bug 629511 comment #42 and #48 describe previous weird behaviour from talos-r3-snow-032. It probably needs a reimage if it is hung shutting down again.
Assignee: server-ops → server-ops-releng
Component: Server Operations → Server Operations: RelEng
QA Contact: mrz → zandr
talos-r3-snow-032: Hung shutting down, as suspected. Reimaged.

Also picked up from nagios while I was here:
talos-r3-fed-021: date problem
talos-r3-w7-036: gray screen -> rebooted
moz2-darwin9-slave70.build.mozilla.org - not responding to pings or ssh
talos-r3-w7-052
talos-r3-fed-035
mv-moz2-linux-ix-slave23
remove mv-moz2-linux-ix-slave23 in comment 6 - I forgot it was an IX box, and successfully rebooted it via IPMI. It was stuck in the puppet loop.
talos-r3-fed-018
No longer blocks: 642344
FYI: mw32-ix-slave02 - blank screen, rebooted via IPMI
w32-ix-slave41 - no ping
IPMI timed out while connecting
talos-r3-fed64-039 - no ping
talos-r3-fed-037.build - no ping
talos-r3-fed64-014.build - no ping
talos-r3-xp-030.build - no ping
talos-r3-fed-039 - no ping
talos-r3-xp-030 - no ping
(In reply to comment #15)
> talos-r3-xp-030 - no ping

meh. repeating what aki said
talos-r3-w7-036 (note: already rebooted once in comment 2; feel free to spin out into a new bug for further whacking)
Assignee: server-ops-releng → zandr
talos-r3-fed64-001.build - no ping
try-mac-slave35.build.mozilla.org - dead to the world
talos-r3-fed-027
talos-r3-fed64-031
talos-r3-fed-013
moz2-darwin10-slave15 (sjc1)
(In reply to comment #10)
> w32-ix-slave41 - no ping
> IPMI timed out while connecting

Sorry, this is out at IX for repair. Ignore.
talos-r3-fed64-006
talos-r3-fed64-001
talos-r3-xp-030: blank screen -> reboot
talos-r3-w7-036: gray screen
talos-r3-w7-052: gray screen
talos-r3-fed-013: date problem
talos-r3-fed-018: DEAD_FISH_MODE? Up, but no IP
talos-r3-fed-027: gray screen
talos-r3-fed-035: gray screen
talos-r3-fed-037: gray screen
talos-r3-fed-039: date problem
talos-r3-fed64-001: date problem
talos-r3-fed64-006: blank screen -> reboot
talos-r3-fed64-014: gray screen
talos-r3-fed64-031: File system screwed up -> reimaged
talos-r3-fed64-039: date problem

and from Nagios:
talos-r3-xp-042: gray screen
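For anyone triaging a list like the one above, a rough sketch of tallying hosts per symptom may help spot patterns (for example, how many machines show the gray screen versus the date problem). The function name `tally_symptoms` and the one-pair-per-line layout are assumptions made for illustration; the host states are copied from the comment above.

```python
from collections import Counter

# Host states copied from the comment above; the layout (one "host: state"
# pair per line) is an assumption made for this sketch.
REPORT = """\
talos-r3-xp-030: blank screen -> reboot
talos-r3-w7-036: gray screen
talos-r3-w7-052: gray screen
talos-r3-fed-013: date problem
talos-r3-fed-018: DEAD_FISH_MODE? Up, but no IP
talos-r3-fed-027: gray screen
talos-r3-fed-035: gray screen
talos-r3-fed-037: gray screen
talos-r3-fed-039: date problem
talos-r3-fed64-001: date problem
talos-r3-fed64-006: blank screen -> reboot
talos-r3-fed64-014: gray screen
talos-r3-fed64-031: File system screwed up -> reimaged
talos-r3-fed64-039: date problem
talos-r3-xp-042: gray screen
"""


def tally_symptoms(report: str) -> Counter:
    """Count hosts per symptom, ignoring the action taken after '->'."""
    counts: Counter = Counter()
    for line in report.splitlines():
        _host, sep, state = line.partition(":")
        symptom = state.split("->")[0].strip()
        if not sep or not symptom:
            continue  # skip blank lines and lines without a "host: state" pair
        counts[symptom] += 1
    return counts


if __name__ == "__main__":
    # Print symptoms from most to least common.
    for symptom, n in tally_symptoms(REPORT).most_common():
        print(f"{n:2d}  {symptom}")
```

Only the text before `->` is treated as the symptom, so "blank screen -> reboot" and a plain "blank screen" would be counted together.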
talos-r3-fed64-051 - can't SSH to it; commands are timing out and unkillable. I've disabled it in slavealloc, so if it manages to reach the allocator, it will stop burning builds
talos-r3-fed-040
talos-r3-fed64-051 apparently isn't talking to the allocator, but it takes either 60 or 80 minutes to time out on each job, so it only managed to take, screw up, and apparently fail to retrigger 16 jobs Monday.
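As a quick sanity check on the numbers above: 16 jobs at 60 or 80 minutes each is roughly 16 to 21 hours, which fits within a single day. A minimal sketch of that arithmetic, assuming the jobs ran back to back with no gap between them (the helper name `hours_spent` is hypothetical):

```python
def hours_spent(jobs: int, timeout_minutes: int) -> float:
    """Hours consumed if every job runs until it hits the given timeout."""
    return jobs * timeout_minutes / 60


JOBS_TAKEN = 16  # figure quoted in the comment above

if __name__ == "__main__":
    for timeout in (60, 80):  # the two timeout lengths mentioned above
        hours = hours_spent(JOBS_TAKEN, timeout)
        print(f"{JOBS_TAKEN} jobs x {timeout} min = {hours:.1f} h")
```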
(In reply to comment #28)
> talos-r3-fed64-051

remove from this list - moved to bug 651133
talos-r3-w7-036
talos-r3-fed-028
talos-r3-w7-036: gray screen
talos-r3-fed-028: blank screen -> reboot, seems OK
talos-r3-fed-040: date problem
talos-r3-fed-046
talos-r3-w7-052
hm, that's four failed minis today - I wonder if there's some underlying cause? Cooling failure? Dunno..
(In reply to comment #37)
> hm, that's four failed minis today - I wonder if there's some underlying
> cause? Cooling failure? Dunno..

The underlying cause is "Using Mac Minis in a Production Environment".

Cooling was fine in scl1 while I was there. w7-036 is a known repeat offender; the date problem is no surprise.
Consolidated list:

sjc1:
try-mac-slave35
moz2-darwin10-slave15
moz2-darwin9-slave70

scl1:
talos-r3-fed-046
talos-r3-w7-052
Assigning to phong to hit the sjc1 machines in comment 39. Assign back when you're done.
Assignee: zandr → phong
(In reply to comment #39)
> Consolidated list:
>
> sjc1:
> try-mac-slave35
> moz2-darwin10-slave15
> moz2-darwin9-slave70

rebooted.
Assignee: phong → zandr
Flags: colo-trip+
p3-win03 (geriatric windows slave) seems to be stuck at the shutdown screen and needs a reboot.
talos-r3-snow-026
(In reply to comment #42)
> p3-win03 (geriatric windows slave) seems to be stuck at the shutdown screen and
> needs a reboot.

power cycled
talos-r3-fed64-035
talos-r3-snow-009
(In reply to comment #46)
> talos-r3-snow-009

Never mind - it got better..
talos-r3-fed-021
talos-r3-fed64-033
talos-r3-fed-029
talos-r3-fed-004
Component: Server Operations: RelEng → Server Operations: Netops
Flags: colo-trip+
try-mac-slave11 (sjc1)
talos-r3-snow-026: no lease
talos-r3-fed-004: date problem
talos-r3-fed-021: date problem
talos-r3-fed-029: no lease
talos-r3-fed64-033: gray screen

Assigning to phong, for comment 52. Phong, please RESO/FIXE when you're done. Everyone else, new bug please.
Assignee: zandr → phong
talos-r3-fed64-035: usb hang
talos-r3-w7-052: gray screen
talos-r3-fed-046: gray screen
Alias: reboots
phong: if you're not there yet, please also hit moz2-darwin10-slave15
Summary: reboot requests → sjc1 reboot requests
Component: Server Operations: Netops → Server Operations
QA Contact: zandr → mrz
(In reply to comment #55)
> phong: if you're not there yet, please also hit moz2-darwin10-slave15

rebooted.
(In reply to comment #52)
> try-mac-slave11 (sjc1)

won't power on at all. bringing back to MV.
Status: NEW → RESOLVED
Closed: 14 years ago
Component: Server Operations → Server Operations: RelEng
Flags: colo-trip+
QA Contact: mrz → zandr
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations