Closed Bug 807163 Opened 12 years ago Closed 9 years ago

logcat chassis 6 panda

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: van, Unassigned)

Details

(Whiteboard: [reit-panda])

Attachments

(5 files)

panda-0073 12 years ago Van Le [:van] 105.73 KB, text/plain		Details
panda-0075 12 years ago Van Le [:van] 45.92 KB, text/plain		Details
panda-0076 12 years ago Van Le [:van] 79.79 KB, text/plain		Details
panda-0080 12 years ago Van Le [:van] 57.56 KB, text/plain		Details
panda-0081 12 years ago Van Le [:van] 42.83 KB, text/plain		Details

Van Le [:van]

Reporter

Description

•

12 years ago

Attached file panda-0073 — Details

The following pandas in chassis 6 are bad and I've attached the logcat for them.

panda-0073
panda-0075
panda-0076
panda-0080
panda-0081

Van Le [:van]

Reporter

Comment 1

•

12 years ago

Attached file panda-0075 — Details

Van Le [:van]

Reporter

Comment 2

•

12 years ago

Attached file panda-0076 — Details

Van Le [:van]

Reporter

Comment 3

•

12 years ago

Attached file panda-0080 — Details

Van Le [:van]

Reporter

Comment 4

•

12 years ago

Attached file panda-0081 — Details

Hal Wine [:hwine] use NI!

Updated

•

12 years ago

Blocks: 799698

Whiteboard: [reit-panda]

Armen [:armenzg]

Comment 5

•

12 years ago

Carrying forward:
(In reply to Van Le [:van] from comment #14)
> I'm going to work on one chassis at a time and open bugs for the pandas I
> can't get to come online.
> 
> Chassis 6: I was NOT able to get the following pandas online after multiple
> SD card swaps, reboots, etc... I have opened bug 807163 and attached the
> logcat for the pandas.
> 
> panda-00[73,75,76,80,81]

Geoff Brown [:gbrown] (pto, back Jun 10)

Comment 6

•

12 years ago

Can someone attach logs for a couple of "good" pandas, for comparison?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 7

•

12 years ago

"adb shell dumpsys" (or is it "adb dumpsys"? I forget) from good/bad devices may also provide some clues.
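For reference, `dumpsys` is a device-side command, so the form to use is `adb shell dumpsys`. A minimal sketch of the good/bad comparison follows, assuming each panda is reachable through adb under some serial. The helper names `dumpsys` and `diff_dumps` are hypothetical and not part of any existing tooling.

```python
import difflib
import subprocess


def dumpsys(serial):
    """Capture `adb shell dumpsys` output from the device with this adb serial.

    Assumes adb is on PATH and the device is already visible to it.
    """
    result = subprocess.run(
        ["adb", "-s", serial, "shell", "dumpsys"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def diff_dumps(good, bad, good_name="good", bad_name="bad"):
    """Return unified-diff lines between two dumpsys captures (plain strings)."""
    return list(
        difflib.unified_diff(
            good.splitlines(),
            bad.splitlines(),
            fromfile=good_name,
            tofile=bad_name,
            lineterm="",
        )
    )
```

Calling `diff_dumps(dumpsys(good_serial), dumpsys(bad_serial))` would list the lines that differ. Timestamps and counters will differ between any two devices, so the output would still need manual filtering.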

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 8

•

12 years ago

This looks pretty telling:

E/EthernetStateMachine( 1397): DhcpHandler: DHCP request failed: Timed out waiting for dhcpcd to start

We should try to find what command line it's using there and run it ourselves to debug further.
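To check whether every attached logcat shows the same failure, something like the following could be run over each attachment. It is a sketch only: the pattern is modelled on the single line quoted above, and the helper name `find_dhcp_failures` is hypothetical.

```python
import re

# Pattern based on the one logcat line quoted in this comment; the exact
# format of other lines in the attachments is an assumption.
DHCP_FAILURE = re.compile(
    r"^E/EthernetStateMachine\(\s*(?P<pid>\d+)\):\s*"
    r"DhcpHandler: DHCP request failed: (?P<reason>.*)$"
)


def find_dhcp_failures(logcat_text):
    """Return (line_number, pid, reason) for each DHCP failure line found."""
    hits = []
    for lineno, line in enumerate(logcat_text.splitlines(), start=1):
        match = DHCP_FAILURE.match(line.strip())
        if match:
            hits.append((lineno, int(match.group("pid")), match.group("reason")))
    return hits
```

Running it on the text of each attachment (panda-0073, panda-0075, and so on) would show whether the dhcpcd timeout appears on all five devices or only some.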

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 9

•

12 years ago

It looks like the daemon is started by setting the 'ctl.start' property to something like 'dhcpcd_eth0:eth0' and then waiting on property 'init.svc.dhcpcd_eth0' to be set. It's timing out while waiting on that property. See dhcp_utils.c in the Linaro Android source.
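The start-and-wait pattern described here can be sketched roughly as below. This is an illustration of the mechanism, not the code in dhcp_utils.c: the property accessors are passed in as plain callables, the `"running"` status value and the timeout are assumptions, and `start_service_and_wait` is a hypothetical name.

```python
import time


def start_service_and_wait(set_prop, get_prop, service, interface,
                           timeout_s=30.0, poll_s=0.1):
    """Ask init to start a service, then poll its status property.

    set_prop/get_prop are caller-supplied stand-ins for the Android property
    API. Returns True once the status reads "running" (assumed value), or
    False if the timeout expires first, which is the failure seen in the log.
    """
    # e.g. ctl.start = "dhcpcd_eth0:eth0"
    set_prop("ctl.start", f"{service}:{interface}")
    status_prop = f"init.svc.{service}"  # e.g. init.svc.dhcpcd_eth0
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_prop(status_prop) == "running":
            return True
        time.sleep(poll_s)
    return False
```

If the status property never changes, the call returns False after the timeout, which corresponds to the "Timed out waiting for dhcpcd to start" message. That would point at init never launching the daemon rather than at DHCP itself.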

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 10

•

12 years ago

Do these machines fail on a consistent basis or is it just sometimes?

Van Le [:van]

Reporter

Comment 11

•

12 years ago

:snorp, the ones in this bug are the ones I was unable to get online after several reboots/SD card swaps/reimages.

Van

Kim Moir [:kmoir] ET

Comment 12

•

12 years ago

Is there any update on the status of these pandas?

Armen [:armenzg]

Comment 13

•

12 years ago

Should we move these pandas to rack#10 and see if we can re-image them with mozpool?

Remove block on bug 799698.

No longer blocks: 799698

Joel Maher ( :jmaher ) (UTC -8)

Comment 14

•

10 years ago

Can we close this bug? It has gone 14 months with no traction.

Hal Wine [:hwine] use NI!

Comment 15

•

10 years ago

We shouldn't just close the bug -- we need to make a decision on these hardware units. The two choices I see listed are:
 a) :armenzg in comment 13 - physically move units to another location for further investigation. That would be 2 new bugs (1 to move, the other for investigation)
 b) my suggestion to declare them BER (beyond economic repair) and file a bug to decom them

Once that decision is made and those bugs are filed, we can close this one.

Callek: do you know enough about the current state to make the call?

Flags: needinfo?(bugspam.Callek)

Justin Wood (:Callek)

Comment 16

•

10 years ago

I don't know enough to identify how difficult/costly a repair is. I do know, however, that we have far more pandas than our current capacity needs require.

So my recommendation is:

(a) store pandas in a "possibly damaged" box, leaving asset tags in place, without sdcards, and document this bug number in inventory as a reference point in case we need to spend time on recovery in future
(b) close this bug.

Any human effort to recover from unknown panda device [hardware] conditions is, imo, not worth it at this time.

Flags: needinfo?(bugspam.Callek)

Hal Wine [:hwine] use NI!

Comment 17

•

10 years ago

Okay, I like minimizing human effort -- so let's leave them in the chassis if :dividehex agrees that won't hurt. That'd save pulling, boxing, and then moving that box to scl3.

Jake: I think this is our first decom-some-pandas-in-chassis case. Can you advise on:
 - whether it's okay to leave physically in the chassis
 - how inventory should be marked so we won't get confused.
 - any other needed changes to procedure in comment 16

Flags: needinfo?(jwatkins)

Jake Watkins [:dividehex]

Comment 18

•

10 years ago

(In reply to Hal Wine [:hwine] (use needinfo) from comment #17)
> Okay, I like the minimize human effort -- so let's leave them in chassis if
> :dividehex agrees that won't hurt. That'd save pulling, boxing, and then
> moving that box to scl3
> 
> Jake: I think this is our first decom-some-pandas-in-chassis case. Can you
> advise on:
>  - whether it's okay to leave physically in the chassis
>  - how inventory should be marked so we won't get confused.
>  - any other needed changes to procedure in comment 16

It is perfectly fine to leave them in the chassis until they can be properly decommed and removed.
Not sure how to mark them in inventory since they aren't actually decommed (removed from chassis). Maybe "error/service." I also think it is a good idea to note the bug# in inventory.

My bigger question is: are these really "bad" pandas? I can't find the original reason they were considered bad and needed dcops to look at them to begin with. This bug is also extremely old, and I believe it was filed during the smoke test era. The environment around it has also changed a lot since then, e.g. mozpool upgrades, PSU adjustments, Ethernet cable reseating, the chassis being moved to a different pod, etc. It is possible that the original interpretation of them as "bad" was due to external influence. They all pass the basic mozpool selftest, so is there any reason not to just add these back to the pool and see if they attract sheriff attention?

Flags: needinfo?(jwatkins)

Van Le [:van]

Reporter

Comment 19

•

9 years ago

No tracking for 1.5 years, so I'm going to close this. Let me know if this is still relevant.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED
