Out of the new pandas in r201, these are the ones that are unable to image properly:

574, 562, 488, 342 - have never been up
468, 548, 555, 570 - continually stuck at fail_android_downloading
472, 554, 598 - seem to be in reboot loops

I've noticed that there are others that image, come up, and then fall off the network (this is with the android image) as well. Right now, the ones that nagios lists as down (out of p2-p6) are:

panda-0598.p6.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:51:14  0d 2h 14m 5s
panda-0574.p6.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:53:13  1d 0h 7m 6s
panda-0562.p6.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:53:39  1d 0h 6m 40s
panda-0495.p5.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:51:05  0d 0h 4m 14s
panda-0492.p5.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:51:39  0d 0h 3m 40s
panda-0488.p5.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:53:13  1d 0h 7m 6s
panda-0486.p5.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:53:05  0d 0h 2m 14s
panda-0484.p5.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:51:13  0d 0h 19m 6s
panda-0470.p5.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:52:12  0d 0h 3m 7s
panda-0432.p4.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:51:03  0d 0h 4m 16s
panda-0429.p4.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:53:03  0d 0h 2m 16s
panda-0419.p4.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:52:12  0d 0h 3m 7s
panda-0417.p4.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:51:03  0d 0h 4m 16s
panda-0416.p4.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:54:12  0d 0h 1m 7s
panda-0405.p4.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:52:02  0d 0h 3m 17s
panda-0342.p3.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:53:00  1d 0h 7m 19s
panda-0316.p3.releng.scl1.mozilla.com  CRITICAL  11-28-2012 14:53:12  0d 0h 2m 7s
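The downtime column in the nagios listing is what separates the pandas that have been down for a long time from the ones that only recently dropped off the network. A minimal sketch of that split, assuming the rows are pasted as plain text in the host / status / last check / duration layout shown above. The function names and the one-hour threshold are illustrative assumptions, not part of any existing tooling.

```python
import re
from datetime import timedelta

# A few rows copied from the nagios listing above.
NAGIOS_ROWS = """\
panda-0598.p6.releng.scl1.mozilla.com CRITICAL 11-28-2012 14:51:14 0d 2h 14m 5s
panda-0574.p6.releng.scl1.mozilla.com CRITICAL 11-28-2012 14:53:13 1d 0h 7m 6s
panda-0495.p5.releng.scl1.mozilla.com CRITICAL 11-28-2012 14:51:05 0d 0h 4m 14s
panda-0342.p3.releng.scl1.mozilla.com CRITICAL 11-28-2012 14:53:00 1d 0h 7m 19s
"""

# host, status, last-check date and time (skipped), then "Nd Nh Nm Ns".
ROW_RE = re.compile(
    r"^panda-(?P<num>\d+)\.\S+\s+(?P<status>\S+)\s+\S+\s+\S+\s+"
    r"(?P<d>\d+)d\s+(?P<h>\d+)h\s+(?P<m>\d+)m\s+(?P<s>\d+)s$"
)


def parse_rows(text):
    """Return (panda_number, downtime) pairs for every row that matches."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue  # ignore blank or unrecognised lines
        downtime = timedelta(
            days=int(match["d"]),
            hours=int(match["h"]),
            minutes=int(match["m"]),
            seconds=int(match["s"]),
        )
        rows.append((int(match["num"]), downtime))
    return rows


def split_by_downtime(rows, threshold=timedelta(hours=1)):
    """Split panda numbers into (long_down, recently_down).

    The one-hour threshold is an arbitrary assumption for illustration.
    """
    long_down = sorted(num for num, dt in rows if dt >= threshold)
    recently_down = sorted(num for num, dt in rows if dt < threshold)
    return long_down, recently_down


if __name__ == "__main__":
    long_down, recently_down = split_by_downtime(parse_rows(NAGIOS_ROWS))
    print("down for a long time:", long_down)
    print("recently dropped:", recently_down)
```

Run against the full listing, the long-down group would be the place to start looking for pandas that never came up, and the short-duration group for the ones that image and then fall off the network.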
> 468, 548, 555, 570 - continually stuck at fail_android_downloading

http://mobile-services.build.scl1.mozilla.com/ui/log.html?device=panda-0468
http://mobile-services.build.scl1.mozilla.com/ui/log.html?device=panda-0548
http://mobile-services.build.scl1.mozilla.com/ui/log.html?device=panda-0555
http://mobile-services.build.scl1.mozilla.com/ui/log.html?device=panda-0570

This is a permanent state. Looking at the logs, it never successfully formats the partitions, so I'm guessing they need new sdcards. Jake?

(P.S. Yes, I know the logs are out of order due to timezone problems. It's fixed already.)

> 472, 554, 598 - seem to be in reboot loops

This indicates that they're having some failure for which we're more permissive, so it's best to see where they land (in a state with a "fail_" prefix) before passing judgement. Now that I look, they're all in failed_pxe_booting, meaning that after power-cycling them, we heard nothing. This is usually an sdcard issue, but it's hard to tell with the bogus logging exactly what went on here. The other possibility is that mobile-imaging-006 did this while it was sick and not receiving any HTTP posts.

I've re-started installs on these three pandas. If they're in a failed_* state tomorrow, give 'em new sdcards.
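The triage rule above (wait for a device to land in a failure state, then swap the sdcard) can be sketched as a simple filter. This is only an illustration: the device-to-state mapping is a hand-written snapshot, the "ready" state is a hypothetical non-failure state, and both the "fail_" and "failed_" prefixes are accepted because both spellings appear in this bug.

```python
# Both prefixes are accepted since the thread uses both spellings.
FAILURE_PREFIXES = ("fail_", "failed_")


def is_failed(state):
    """True if the state name marks a failure state."""
    return state.startswith(FAILURE_PREFIXES)


def sdcard_candidates(states):
    """Return the devices whose current state is a failure state, sorted by name.

    `states` maps a device name to its current state name.
    """
    return sorted(device for device, state in states.items() if is_failed(state))


if __name__ == "__main__":
    # Hypothetical snapshot taken the day after re-starting the installs.
    snapshot = {
        "panda-0472": "failed_pxe_booting",
        "panda-0554": "ready",  # hypothetical non-failure state
        "panda-0570": "fail_android_downloading",
    }
    for device in sdcard_candidates(snapshot):
        print(device, "-> candidate for a new sdcard")
```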
(In reply to Dustin J. Mitchell [:dustin] from comment #1)
I had already let the ones not listed as fail_android_downloading above fall into a failure state of failed_pxe_booting, and retried them after all the relays were up and functional. They're now back in that state.
I added the beginnings of a KB for pandas here:
https://mana.mozilla.org/wiki/display/IT/Panda+Failure+Modes
Let's keep adding to that as we learn more.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 817103
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations