Status

Infrastructure & Operations
RelOps
RESOLVED DUPLICATE of bug 817103
5 years ago
4 years ago

People

(Reporter: arr, Assigned: dividehex)

(Reporter)

Description

5 years ago
Out of the new pandas in r201, these are the ones that are unable to image properly:

574, 562, 488, 342 - have never been up
468, 548, 555, 570 - continually stuck at fail_android_downloading
472, 554, 598 - seem to be in reboot loops

I've noticed that there are others that image, come up, and then fall off the network (this is with the android image) as well.

Right now, the ones that nagios lists as down (out of p2-p6) are:

panda-0598.p6.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:51:14   0d 2h 14m 5s
panda-0574.p6.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:53:13   1d 0h 7m 6s
panda-0562.p6.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:53:39   1d 0h 6m 40s  
panda-0495.p5.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:51:05   0d 0h 4m 14s  
panda-0492.p5.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:51:39   0d 0h 3m 40s  
panda-0488.p5.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:53:13   1d 0h 7m 6s   
panda-0486.p5.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:53:05   0d 0h 2m 14s  
panda-0484.p5.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:51:13   0d 0h 19m 6s  
panda-0470.p5.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:52:12   0d 0h 3m 7s   
panda-0432.p4.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:51:03   0d 0h 4m 16s  
panda-0429.p4.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:53:03   0d 0h 2m 16s  
panda-0419.p4.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:52:12   0d 0h 3m 7s   
panda-0417.p4.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:51:03   0d 0h 4m 16s  
panda-0416.p4.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:54:12   0d 0h 1m 7s   
panda-0405.p4.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:52:02   0d 0h 3m 17s  
panda-0342.p3.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:53:00   1d 0h 7m 19s  
panda-0316.p3.releng.scl1.mozilla.com CRITICAL  11-28-2012 14:53:12   0d 0h 2m 7s
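
A listing like the one above mixes boards that have been down for a day with boards that dropped off a few minutes ago. As a rough sketch (the function names and the one-hour default threshold are assumptions for illustration, not part of nagios or any releng tooling), the entries can be parsed and filtered by how long each host has been down:

```python
import re
from datetime import timedelta

# One status entry: host, state, last-check date and time, then the duration
# as "<d>d <h>h <m>m <s>s". finditer() is used so that two entries pasted
# onto the same line are still picked up separately.
ENTRY_RE = re.compile(
    r"(?P<host>\S+)\s+(?P<status>[A-Z]+)\s+"
    r"(?P<date>\d{2}-\d{2}-\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<d>\d+)d\s+(?P<h>\d+)h\s+(?P<m>\d+)m\s+(?P<s>\d+)s"
)


def parse_nagios(text):
    """Yield (host, status, downtime) for every status entry found in text."""
    for m in ENTRY_RE.finditer(text):
        downtime = timedelta(
            days=int(m["d"]),
            hours=int(m["h"]),
            minutes=int(m["m"]),
            seconds=int(m["s"]),
        )
        yield m["host"], m["status"], downtime


def long_down(text, threshold=timedelta(hours=1)):
    """Return the sorted hosts that have been CRITICAL for at least threshold."""
    return sorted(
        host
        for host, status, downtime in parse_nagios(text)
        if status == "CRITICAL" and downtime >= threshold
    )
```

Calling long_down() with threshold=timedelta(days=1) narrows the result to the boards that have been down for at least a full day.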

Comment 1

> 468, 548, 555, 570 - continually stuck at fail_android_downloading

  http://mobile-services.build.scl1.mozilla.com/ui/log.html?device=panda-0468
  http://mobile-services.build.scl1.mozilla.com/ui/log.html?device=panda-0548
  http://mobile-services.build.scl1.mozilla.com/ui/log.html?device=panda-0555
  http://mobile-services.build.scl1.mozilla.com/ui/log.html?device=panda-0570

This is a permanent state.  Looking at the logs, it never successfully formats the partitions, so I'm guessing they need new sdcards.  Jake?

(P.S. Yes, I know the logs are out of order due to timezone problems.  It's fixed already)
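
The four log links above all follow one pattern, so the link for any other board can be built from its number. A minimal sketch, assuming the device name is always "panda-" plus a zero-padded four-digit number as in the links above (log_url is a made-up helper name, not an existing tool):

```python
from urllib.parse import urlencode

# Base of the per-device log page, copied from the links in this comment.
LOG_UI = "http://mobile-services.build.scl1.mozilla.com/ui/log.html"


def log_url(panda_number):
    """Build the log page URL for one panda, e.g. 468 -> ...?device=panda-0468."""
    device = f"panda-{panda_number:04d}"
    return f"{LOG_UI}?{urlencode({'device': device})}"
```

For example, [log_url(n) for n in (468, 548, 555, 570)] reproduces the four links listed above.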

> 472, 554, 598 - seem to be in reboot loops

This indicates that they're having some failure for which we're more permissive, so it's best to see where they land (in a state with a "fail_" prefix) before passing judgement.  Now that I look, they're all in failed_pxe_booting, meaning that after power-cycling them, we heard nothing.  This is usually an sdcard issue, but it's hard to tell with the bogus logging exactly what went on here.  The other possibility is that mobile-imaging-006 did this while it was sick and not receiving any HTTP posts.  I've re-started installs on these three pandas.  If they're in a failed_* state tomorrow, give 'em new sdcards.
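
The rule of thumb above (wait until a board settles, then swap the sdcard if it settled in a failure state) could be sketched roughly like this. The mapping of device names to state names is assumed to be gathered by hand or from whatever reports the states, and the check matches both the "fail_" and "failed_" spellings used in this bug; needs_new_sdcard is a hypothetical helper, not an existing function:

```python
def needs_new_sdcard(states):
    """Return the sorted devices whose final state name marks a failure.

    states maps a device name to the state it ended up in after the
    re-started install had time to finish.
    """
    return sorted(
        device
        for device, state in states.items()
        # Covers both "fail_..." and "failed_..." state names.
        if state.startswith("fail")
    )
```

Any board that does not show up in the result settled in a non-failure state and can be left alone.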
(Reporter)

Comment 2

5 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #1)

I had already let the ones not listed as fail_android_downloading above fall into a failure state of failed_pxe_booting, and retried them after all the relays were up and functional.  They're now back in that state.
I added the beginnings of a KB for pandas here:
  https://mana.mozilla.org/wiki/display/IT/Panda+Failure+Modes
Let's keep adding to that as we learn more.
(Reporter)

Updated

5 years ago
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 817103
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations