Closed Bug 661382 Opened 13 years ago Closed 13 years ago

reboot requests (scl1)

Categories

(Infrastructure & Operations :: RelOps: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: zandr)

Details

talos-r3-w7-011 (from bug 660305)
colo-trip: --- → scl1
talos-r3-fed-042
talos-r3-fed64-025
talos-r3-w7-ref
talos-r3-w7-ref is in the midst of being rebuilt; see bug 660244.
talos-r3-fed-029
talos-r3-fed64-021, talos-r3-fed-043
talos-r3-xp-054
talos-r3-w7-032
talos-r3-fed-056, talos-r3-fed-060 (both newly reimaged hosts from bug 659933)
talos-r3-w7-052
talos-r3-fed-006
talos-r3-fed-031
talos-r3-fed-010, talos-r3-fed64-026
talos-r3-w7-046
talos-r3-w7-053
(In reply to comment #15)
> talos-r3-w7-053

This one came back on its own after a prolonged restart. Please ignore.
talos-r3-xp-054: No problem found
talos-r3-w7-011: gray screen
talos-r3-w7-032: gray screen (Frequent Flier: AHT in progress)
talos-r3-w7-046: Powered off? [1]
talos-r3-w7-052: gray screen
talos-r3-fed-006: DEAD_FISH
talos-r3-fed-010: DEAD_FISH
talos-r3-fed-029: DEAD_FISH
talos-r3-fed-031: DEAD_FISH
talos-r3-fed-042: blank screen -> "USB hang" -> reimaged
talos-r3-fed-043: DEAD_FISH
talos-r3-fed-056: gray screen
talos-r3-fed-060: Powered off? [1]
talos-r3-fed64-021: DEAD_FISH
talos-r3-fed64-025: DEAD_FISH
talos-r3-fed64-026: DEAD_FISH

That's a really bad 10 days, especially for Fedora hosts. Doing something about DEAD_FISH is the next big reliability win.

The two powered-down machines scare me; that is another failure mode to track for bug 662100.

[1] We have seen minis power off due to disk errors during boot. See https://bugzilla.mozilla.org/show_bug.cgi?id=658414#c15
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
talos-r3-fed64-026 was DEAD_FISH again today, as was -034. Curiously, -026 didn't stay up long enough for Nagios to clear the alert; Nagios thought it had been down for days.
(In reply to comment #17)
> talos-r3-w7-032: gray screen (Frequent Flier: AHT in progress)

AHT reported no errors.
(In reply to comment #17)
> talos-r3-fed-060: Powered off? [1]
>
> The two powered down machines scare me, another failure mode to track for
> bug 662100.
>
> [1] We have seen minis power off due to disk errors during boot. See
> https://bugzilla.mozilla.org/show_bug.cgi?id=658414#c15

I can ssh into that machine, but it seems to believe it is the ref machine:

armenzg-laptop $ ssh talos-r3-fed-060.build
Warning: Permanently added 'talos-r3-fed-060.build,10.12.50.225' (RSA) to the list of known hosts.
Last login: Mon May 16 08:23:39 2011 from bm-vpn01.build.mozilla.org
[cltbld@talos-r3-fed-ref ~]$ uname -a
Linux talos-r3-fed-ref.build.mozilla.org 2.6.31.5-127.fc12.i686.PAE #1 SMP Sat Nov 7 21:25:57 EST 2009 i686 i686 i386 GNU/Linux
(In reply to comment #20)
> I can ssh into that machine but it seems to believe that is ref machine.

Fixed.

(In reply to comment #19)
> (In reply to comment #17)
> > talos-r3-w7-032: gray screen (Frequent Flier: AHT in progress)
>
> AHT reported no errors.

This one is up and running at the moment, for the record.
Alias: reboots-scl1
See Also: → 660305
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations