Closed
Bug 641647
Opened 14 years ago
Closed 14 years ago
sjc1 reboot requests
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: phong)
References
Details
talos-r3-snow-032.build.mozilla.org
Reporter
Updated•14 years ago
Alias: reboots
Comment 1•14 years ago
Bug 629511 comments #42 and #48 describe previous weird behaviour from talos-r3-snow-032. It probably needs a reimage if it is hung shutting down again.
Updated•14 years ago
Assignee: server-ops → server-ops-releng
Component: Server Operations → Server Operations: RelEng
QA Contact: mrz → zandr
Comment 2•14 years ago
talos-r3-snow-032: Hung shutting down, as suspected. Reimaged.
Also picked up from nagios while I was here:
talos-r3-fed-021: date problem
talos-r3-w7-036: gray screen -> rebooted
Comment 3•14 years ago
moz2-darwin9-slave70.build.mozilla.org - not responding to pings or ssh
Comment 4•14 years ago
talos-r3-w7-052
Comment 5•14 years ago
talos-r3-fed-035
Comment 6•14 years ago
mv-moz2-linux-ix-slave23
Comment 7•14 years ago
Please remove mv-moz2-linux-ix-slave23 (comment 6) from the list. I forgot it was an IX box, and I successfully rebooted it via IPMI. It was stuck in the puppet loop.
Comment 8•14 years ago
talos-r3-fed-018
Comment 9•14 years ago
FYI:
mw32-ix-slave02 - blank screen, rebooted via IPMI
Comment 10•14 years ago
w32-ix-slave41 - no ping
IPMI timed out while connecting
Comment 11•14 years ago
talos-r3-fed64-039 - no ping
Comment 12•14 years ago
talos-r3-fed-037.build - no ping
Comment 13•14 years ago
talos-r3-fed64-014.build - no ping
talos-r3-xp-030.build - no ping
Comment 14•14 years ago
talos-r3-fed-039 - no ping
Comment 15•14 years ago
talos-r3-xp-030 - no ping
Comment 16•14 years ago
(In reply to comment #15)
> talos-r3-xp-030 - no ping
meh. repeating what aki said
Comment 17•14 years ago
talos-r3-w7-036 (note: already rebooted once in comment 2; feel free to spin out into a new bug for further whacking)
Updated•14 years ago
Assignee: server-ops-releng → zandr
Comment 18•14 years ago
talos-r3-fed64-001.build - no ping
Reporter
Comment 19•14 years ago
try-mac-slave35.build.mozilla.org - dead to the world
Comment 20•14 years ago
talos-r3-fed-027
Comment 21•14 years ago
talos-r3-fed64-031
Comment 22•14 years ago
talos-r3-fed-013
Comment 23•14 years ago
moz2-darwin10-slave15 (sjc1)
Comment 24•14 years ago
(In reply to comment #10)
> w32-ix-slave41 - no ping
> IPMI timed out while connecting
Sorry, this is out at IX for repair. Ignore.
Comment 25•14 years ago
talos-r3-fed64-006
Comment 26•14 years ago
talos-r3-fed64-001
Comment 27•14 years ago
talos-r3-xp-030: blank screen -> reboot
talos-r3-w7-036: gray screen
talos-r3-w7-052: gray screen
talos-r3-fed-013: date problem
talos-r3-fed-018: DEAD_FISH_MODE? Up, but no IP
talos-r3-fed-027: gray screen
talos-r3-fed-035: gray screen
talos-r3-fed-037: gray screen
talos-r3-fed-039: date problem
talos-r3-fed64-001: date problem
talos-r3-fed64-006: blank screen -> reboot
talos-r3-fed64-014: gray screen
talos-r3-fed64-031: File system screwed up -> reimaged
talos-r3-fed64-039: date problem
and from Nagios:
talos-r3-xp-042: gray screen
Comment 28•14 years ago
talos-r3-fed64-051
- can't SSH to it
- commands timing out, and unkillable
- I've disabled it in slavealloc, so if it manages to reach the allocator, it will stop burning builds
Comment 29•14 years ago
talos-r3-fed-040
Comment 30•14 years ago
talos-r3-fed64-051 apparently isn't talking to the allocator, but it takes either 60 or 80 minutes to time out on each job, so it only managed to take, screw up, and apparently fail to retrigger 16 jobs Monday.
Comment 31•14 years ago
Comment 32•14 years ago
talos-r3-w7-036
Comment 33•14 years ago
talos-r3-fed-028
Comment 34•14 years ago
talos-r3-w7-036: gray screen
talos-r3-fed-028: blank screen -> reboot, seems OK
talos-r3-fed-040: date problem
Comment 35•14 years ago
talos-r3-fed-046
Comment 36•14 years ago
talos-r3-w7-052
Comment 37•14 years ago
hm, that's four failed minis today - I wonder if there's some underlying cause? Cooling failure? Dunno..
Comment 38•14 years ago
(In reply to comment #37)
> hm, that's four failed minis today - I wonder if there's some underlying
> cause? Cooling failure? Dunno..
The underlying cause is "Using Mac Minis in a Production Environment"
Cooling was fine in scl1 while I was there. w7-036 is a known repeat offender, and the date problem is no surprise.
Comment 39•14 years ago
Consolidated list:
sjc1:
try-mac-slave35
moz2-darwin10-slave15
moz2-darwin9-slave70
scl1:
talos-r3-fed-046
talos-r3-w7-052
Comment 40•14 years ago
Assigning to phong to hit the sjc1 machines in comment 39. Assign back when you're done.
Assignee: zandr → phong
Assignee
Comment 41•14 years ago
(In reply to comment #39)
> Consolidated list:
>
> sjc1:
> try-mac-slave35
> moz2-darwin10-slave15
> moz2-darwin9-slave70
rebooted.
Assignee: phong → zandr
Flags: colo-trip+
Comment 42•14 years ago
p3-win03 (geriatric windows slave) seems to be stuck at the shutdown screen and needs a reboot.
Comment 43•14 years ago
talos-r3-snow-026
Comment 44•14 years ago
(In reply to comment #42)
> p3-win03 (geriatric windows slave) seems to be stuck at the shutdown screen and
> needs a reboot.
power cycled
Comment 45•14 years ago
talos-r3-fed64-035
Comment 46•14 years ago
talos-r3-snow-009
Comment 47•14 years ago
(In reply to comment #46)
> talos-r3-snow-009
Never mind - it got better..
Comment 48•14 years ago
talos-r3-fed-021
Comment 49•14 years ago
talos-r3-fed64-033
Comment 50•14 years ago
talos-r3-fed-029
Comment 51•14 years ago
talos-r3-fed-004
Component: Server Operations: RelEng → Server Operations: Netops
Flags: colo-trip+
Comment 52•14 years ago
try-mac-slave11 (sjc1)
Comment 53•14 years ago
talos-r3-snow-026: no lease
talos-r3-fed-004: date problem
talos-r3-fed-021: date problem
talos-r3-fed-029: no lease
talos-r3-fed64-033: gray screen
Assigning to phong, for comment 52. Phong, please RESO/FIXE when you're done.
Everyone else, new bug please.
Assignee: zandr → phong
Comment 54•14 years ago
talos-r3-fed64-035: usb hang
talos-r3-w7-052: gray screen
talos-r3-fed-046: gray screen
Updated•14 years ago
Alias: reboots
Comment 55•14 years ago
phong: if you're not there yet, please also hit moz2-darwin10-slave15
Updated•14 years ago
Summary: reboot requests → sjc1 reboot requests
Updated•14 years ago
Component: Server Operations: Netops → Server Operations
QA Contact: zandr → mrz
Assignee
Comment 56•14 years ago
(In reply to comment #55)
> phong: if you're not there yet, please also hit moz2-darwin10-slave15
rebooted.
Assignee
Comment 57•14 years ago
(In reply to comment #52)
> try-mac-slave11 (sjc1)
won't power on at all. bringing back to MV.
Status: NEW → RESOLVED
Closed: 14 years ago
Component: Server Operations → Server Operations: RelEng
Flags: colo-trip+
QA Contact: mrz → zandr
Resolution: --- → FIXED
Updated•11 years ago
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations