Closed
Bug 661382
Opened 13 years ago
Closed 13 years ago
reboot requests (scl1)
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Assigned: zandr)
References
Details
talos-r3-w7-011 (from bug 660305)
Reporter | ||
Updated•13 years ago
|
colo-trip: --- → scl1
Reporter | ||
Comment 1•13 years ago
|
||
talos-r3-fed-042
Reporter | ||
Comment 2•13 years ago
|
||
talos-r3-fed64-025
Comment 3•13 years ago
|
||
talos-r3-w7-ref
Assignee | ||
Comment 4•13 years ago
|
||
talos-r3-w7-ref is in the midst of being rebuilt; see bug 660244.
Reporter | ||
Comment 5•13 years ago
|
||
talos-r3-fed-029
Reporter | ||
Comment 6•13 years ago
|
||
talos-r3-fed64-021
talos-r3-fed-043
Reporter | ||
Comment 7•13 years ago
|
||
talos-r3-xp-054
Reporter | ||
Comment 8•13 years ago
|
||
talos-r3-w7-032
Reporter | ||
Comment 9•13 years ago
|
||
talos-r3-fed-056
talos-r3-fed-060
(both newly-reimaged hosts from bug 659933)
Reporter | ||
Comment 10•13 years ago
|
||
talos-r3-w7-052
Reporter | ||
Comment 11•13 years ago
|
||
talos-r3-fed-006
Reporter | ||
Comment 12•13 years ago
|
||
talos-r3-fed-031
Reporter | ||
Comment 13•13 years ago
|
||
talos-r3-fed-010
talos-r3-fed64-026
Comment 14•13 years ago
|
||
talos-r3-w7-046
Comment 15•13 years ago
|
||
talos-r3-w7-053
Comment 16•13 years ago
|
||
(In reply to comment #15)
> talos-r3-w7-053
This one came back on its own after a prolonged restart. Please ignore.
Assignee | ||
Comment 17•13 years ago
|
||
talos-r3-xp-054: No problem found
talos-r3-w7-011: gray screen
talos-r3-w7-032: gray screen (Frequent Flier: AHT in progress)
talos-r3-w7-046: Powered off? [1]
talos-r3-w7-052: gray screen
talos-r3-fed-006: DEAD_FISH
talos-r3-fed-010: DEAD_FISH
talos-r3-fed-029: DEAD_FISH
talos-r3-fed-031: DEAD_FISH
talos-r3-fed-042: blank screen -> "USB hang" -> reimaged
talos-r3-fed-043: DEAD_FISH
talos-r3-fed-056: gray screen
talos-r3-fed-060: Powered off? [1]
talos-r3-fed64-021: DEAD_FISH
talos-r3-fed64-025: DEAD_FISH
talos-r3-fed64-026: DEAD_FISH
That's a really bad 10 days, especially for Fedora hosts. Doing something about DEAD_FISH is the next big reliability win.
The two powered-down machines scare me; that's another failure mode to track for bug 662100.
[1] We have seen minis power off due to disk errors during boot. See https://bugzilla.mozilla.org/show_bug.cgi?id=658414#c15
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
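For anyone wanting to check the "especially for Fedora hosts" observation against the list in comment 17, here is a minimal Python sketch that tallies hosts and DEAD_FISH failures per platform. The statuses are transcribed from that comment; the helper names (`platform_of`, `tally`) and the assumption that the platform is the third dash-separated token of the hostname are illustrative only, not part of any RelOps tooling.

```python
from collections import Counter

# Host statuses transcribed from comment 17.
REPORT = """\
talos-r3-xp-054: No problem found
talos-r3-w7-011: gray screen
talos-r3-w7-032: gray screen (Frequent Flier: AHT in progress)
talos-r3-w7-046: Powered off? [1]
talos-r3-w7-052: gray screen
talos-r3-fed-006: DEAD_FISH
talos-r3-fed-010: DEAD_FISH
talos-r3-fed-029: DEAD_FISH
talos-r3-fed-031: DEAD_FISH
talos-r3-fed-042: blank screen -> "USB hang" -> reimaged
talos-r3-fed-043: DEAD_FISH
talos-r3-fed-056: gray screen
talos-r3-fed-060: Powered off? [1]
talos-r3-fed64-021: DEAD_FISH
talos-r3-fed64-025: DEAD_FISH
talos-r3-fed64-026: DEAD_FISH
"""


def platform_of(host):
    """Return the platform token, e.g. 'fed64' for 'talos-r3-fed64-021'.

    Assumes hostnames follow the talos-r3-<platform>-<number> pattern.
    """
    return host.split("-")[2]


def tally(report):
    """Count total hosts and DEAD_FISH failures per platform."""
    hosts = Counter()
    dead_fish = Counter()
    for line in report.strip().splitlines():
        # Split on the first ": " only, since some statuses contain colons.
        host, _, status = line.partition(": ")
        platform = platform_of(host)
        hosts[platform] += 1
        if status.startswith("DEAD_FISH"):
            dead_fish[platform] += 1
    return hosts, dead_fish


hosts, dead_fish = tally(REPORT)
for platform in sorted(hosts):
    print(f"{platform}: {hosts[platform]} hosts, {dead_fish[platform]} DEAD_FISH")
```

On this list, 11 of the 16 hosts are Fedora (fed or fed64), and all 8 DEAD_FISH cases are on Fedora hosts.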
Assignee | ||
Comment 18•13 years ago
|
||
talos-r3-fed64-026 was DEAD_FISH again today, as was -034.
Curiously, -026 didn't stay up long enough for nagios to clear the alert, so nagios thought it had been down for days.
Assignee | ||
Comment 19•13 years ago
|
||
(In reply to comment #17)
> talos-r3-w7-032: gray screen (Frequent Flier: AHT in progress)
AHT reported no errors.
Comment 20•13 years ago
|
||
(In reply to comment #17)
> talos-r3-fed-060: Powered off? [1]
>
> The two powered down machines scare me, another failure mode to track for
> bug 662100.
>
> [1] We have seen minis power off due to disk errors during boot. See
> https://bugzilla.mozilla.org/show_bug.cgi?id=658414#c15
I can ssh into that machine, but it seems to believe that it is the ref machine.
armenzg-laptop $ ssh talos-r3-fed-060.build
Warning: Permanently added 'talos-r3-fed-060.build,10.12.50.225' (RSA) to the list of known hosts.
Last login: Mon May 16 08:23:39 2011 from bm-vpn01.build.mozilla.org
[cltbld@talos-r3-fed-ref ~]$ uname -a
Linux talos-r3-fed-ref.build.mozilla.org 2.6.31.5-127.fc12.i686.PAE #1 SMP Sat Nov 7 21:25:57 EST 2009 i686 i686 i386 GNU/Linux
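The mismatch in the session above (connecting to talos-r3-fed-060 but landing on a prompt for talos-r3-fed-ref) is the kind of thing a post-reimage sanity check could catch. A minimal sketch, assuming the check simply compares the short hostname used for the connection with the one the machine reports; the function names are hypothetical and this is not an existing RelOps script.

```python
def short_name(hostname):
    """Return the first DNS label, e.g. 'talos-r3-fed-060' for 'talos-r3-fed-060.build'."""
    return hostname.split(".")[0].lower()


def identity_mismatch(expected, reported):
    """True when the host reports a different short name than the one we connected to."""
    return short_name(expected) != short_name(reported)


# Values taken from the ssh session above.
expected = "talos-r3-fed-060.build"
reported = "talos-r3-fed-ref.build.mozilla.org"

if identity_mismatch(expected, reported):
    print(f"{short_name(expected)} identifies itself as {short_name(reported)}")
```

Comparing only the first label sidesteps the difference between the short `.build` name and the fully qualified `.build.mozilla.org` name.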
Reporter | ||
Comment 21•13 years ago
|
||
(In reply to comment #20)
> I can ssh into that machine, but it seems to believe that it is the ref machine.
fixed
(In reply to comment #19)
> (In reply to comment #17)
> > talos-r3-w7-032: gray screen (Frequent Flier: AHT in progress)
>
> AHT reported no errors.
This is up and running ATM, for the record.
Reporter | ||
Updated•13 years ago
|
Alias: reboots-scl1
Updated•11 years ago
|
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations