Closed Bug 635416 Opened 13 years ago Closed 13 years ago

reboot requests

Product/Component: Infrastructure & Operations :: RelOps: General
Type: task
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: dustin
Assigned: zandr
Whiteboard: [slaveduty]

bug 629511 took care of a bunch of reboots, but a few were missed:

mv-moz2-linux-ix-slave07
mv-moz2-linux-ix-slave21
mv-moz2-linux-ix-slave22
w32-ix-slave05

I can't get to the IPMI interface on any of these.  The first three are unpingable, while the last failed doing a MozBuildTools install, and seems to be hung without RDP, VNC, or SSH access available.
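For anyone triaging a similar list, the "no ping" versus "ping, but no remote access" split used in this bug can be sketched roughly as below. This is only an illustration, not the tooling actually used here: the function names, the port map (default SSH/RDP/VNC ports), and the Linux-style `ping` flags are all assumptions.

```python
import socket
import subprocess

# Assumed default ports for the access methods mentioned in this bug.
REMOTE_PORTS = {"ssh": 22, "rdp": 3389, "vnc": 5900}


def can_ping(host, timeout_s=2):
    """Return True if a single ICMP echo is answered (Linux-style ping flags assumed)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # ping binary missing or not executable
        return False
    return result.returncode == 0


def port_open(host, port, timeout_s=2):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def triage(host, ping=can_ping, probe=port_open):
    """Classify a host the way the comments in this bug do."""
    if not ping(host):
        return "no ping"
    reachable = [name for name, port in REMOTE_PORTS.items() if probe(host, port)]
    if not reachable:
        return "ping, but no remote access"
    return "reachable via " + ", ".join(reachable)
```

The `ping` and `probe` arguments are injectable so the classification can be exercised without touching the network, e.g. `triage("w32-ix-slave05", ping=lambda h: True, probe=lambda h, p: False)`.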
Alias: reboots
no ping:
talos-r3-fed-003
talos-r3-fed-018
no ping:
talos-r3-fed-024
talos-r3-fed-008
talos-r3-w7-032
no ping:
talos-r3-w7-036
^^ w7-036 may deserve its own bug for further investigation: it seems to fail to reboot more often than others.  I'll leave that to server ops.
rebooted in bug 634368, but I didn't manage to catch it before it disappeared again:
moz2-darwin9-slave51
ping, but no SSH or VNC.  Not running buildslave, so no worries:
talos-r3-fed64-040
Blocks: 629006
no ping:
talos-r3-fed64-044
argh, my eyes are getting bleary.  ** IGNORE comment 7 **
no ping:
talos-r3-fed-044
talos-r3-fed-039
the two hosts in comment #8 are part of the mass die-off in bug 636051.  I'll re-add them here if we decide a reboot is the appropriate solution.
fed-003: date problem
fed-008: date problem
fed-018: DHCP failure at 19 Feb 07:07
fed-024: DHCP failure at 20 Feb 18:57
fed-039: got it in bug 636051
fed-044: got it in bug 636051

fed64-040: looked like it never got rebooted after imaging?

w7-032: gray screen -> reboot
w7-036: gray screen -> reboot
After zandr's impromptu scl trip, the list is:

moz2-darwin9-slave51
mv-moz2-linux-ix-slave07
mv-moz2-linux-ix-slave21
mv-moz2-linux-ix-slave22
talos-r3-fed-042
w32-ix-slave05

(and yes, talos-r3-fed64-040 hasn't been set up yet)
Assignee: server-ops-releng → zandr
add
mv-moz2-linux-ix-slave10 (no ping)
add
linux-ix-slave16 (fallout from bug 636342)
add
w32-ix-slave10 (stuck at the OPSI prompt; needs a reboot and the event log needs to be cleared (run -> eventvwr, clear out the OPSI list))
add
talos-r3-fed64-030 (no ping)
(In reply to comment #12)
> add
> mv-moz2-linux-ix-slave10 (no ping)

Managed to reset this using IPMI, there was barf on the console from puppetd.
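
An IPMI reset like this one can be scripted along the following lines. This is a hedged sketch only: it assumes `ipmitool` is installed, that the BMC speaks the `lanplus` interface (older boards may need `lan`), and that the password is supplied in the `IPMI_PASSWORD` environment variable. The helper names, host, and user are hypothetical; the comment does not say which command was actually run.

```python
import subprocess

ALLOWED_ACTIONS = {"status", "on", "off", "cycle", "reset"}


def ipmi_power_command(bmc_host, user, action="reset"):
    """Build the ipmitool argument list for a chassis power action."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    # -I lanplus: IPMI v2.0 over LAN (assumption about the BMC)
    # -E: take the password from the IPMI_PASSWORD environment variable
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host, "-U", user, "-E",
        "chassis", "power", action,
    ]


def ipmi_power(bmc_host, user, action="reset"):
    """Run the power action and return the CompletedProcess for inspection."""
    return subprocess.run(
        ipmi_power_command(bmc_host, user, action),
        capture_output=True,
        text=True,
        check=False,
    )
```

Checking `ipmi_power(host, user, "status")` first is a cheap way to confirm the BMC address is right before issuing a reset.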
add:
talos-r3-xp-024 (no ping)
add
w32-ix-slave10 (stuck at the OPSI prompt; needs a reboot and the event log
needs to be cleared (run -> eventvwr, clear out the OPSI list))
add
w32-ix-slave14 (same reason)
add
talos-r3-fed-022 (no ping)
add
cm-bbot-linux-002.mozilla.org

(if you'd like this one on a separate bug, let me know)
add
win64-ix-ref (no ping)
(see bug 635416)
add
talos-r3-w7-036.build.scl1.mozilla.com no ping or ssh
linux-ix-slave16 is on the list, but also has slow io, so maybe it should just be bundled off to IX while it's down?
add
w32-ix-slave08 (no ping, IPMI doesn't work)
add
w32-ix-slave18 (no ping or ssh)
(In reply to comment #27)
> add
> w32-ix-slave18 (no ping or ssh)

ignore this - nick can reach it via vnc and it appears stopped
add
talos-r3-fed-023 (no ping)
talos-r3-xp-024: gray screen -> reboot
talos-r3-w7-036: gray screen -> reboot
talos-r3-fed-022: date problem
talos-r3-fed-023: gray screen -> reboot
talos-r3-fed-042: gray screen -> reboot
talos-r3-fed-051: date problem
talos-r3-fed64-030: gray screen -> reboot
talos-r3-fed64-038: gray screen -> reboot
w32-ix-slave05: Hung in MozillaBuild install -> rebooted, came up normally
IPMI is pending inventory update: correct address is 10.250.50.229

w32-ix-slave08: S.M.A.R.T. status BAD -> powered off, added to repair list
IPMI is pending inventory update: correct address is 10.250.50.232

w32-ix-slave10: Hung at OPSI prompt -> rebooted, came up normally
IPMI is pending inventory update: correct address is 10.250.50.234

w32-ix-slave14: Up, responsive. 
IPMI is pending inventory update: correct address is 10.250.50.238

linux-ix-slave16: rebooted during (inadvertent) move to scl1.

mv-moz2-linux-ix-slave07: No address on eth0, rebooted. bug 636390?
mv-moz2-linux-ix-slave10: Up and responsive
mv-moz2-linux-ix-slave21: host down, IPMI wedged -> bug 639424
mv-moz2-linux-ix-slave22: No address on eth0, rebooted. bug 636390?

I've sent rtucker the inventory update file; it should get applied in a day or so.

That leaves:
moz2-darwin9-slave51
cm-bbot-linux-002
which are MPT, so I filed bug 639425

Thus endeth another reboots bug. :)
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Could you remove the itrequest flag on bug 639425 ?
And thanks!
(In reply to comment #32)
> Could you remove the itrequest flag on bug 639425 ?

done
Alias: reboots
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations