Closed Bug 1069095 (panda-0619) Opened 10 years ago Closed 9 years ago

panda-0619 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

ARM
Android

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

Details

(Whiteboard: [buildduty][buildslaves][capacity])

Hasn't taken a job for 26 days.
Depends on: 1072405
Had to do a little bit of extra work to get this panda back into production. Required a self-test run, followed by a re-image, but it's taking jobs now.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Failing every other job, disabled in slavealloc. (Also rather suspicious that it only did two jobs, while most of this busted set did six or eight today before I disabled them.)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Tools repo updated on foopy so that this panda can be properly rebooted again.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
30% green, 14% orange, 18% red, 38% retry. According to Ouija, we expect 21.1% failure for pandas, not 70%. Disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1094944
replaced SD card, panda passed self test.
Reenabled to build up to strike two.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Score since then: 10 green, 3 orange, 5 red, 16 blue. Expected failure rate (including retries) for a panda is currently at 26.9%, and that's 70.6%.

Strike two, disabled in slavealloc.
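For reference, the 70.6% figure follows directly from the job counts quoted above. This is a minimal sketch, assuming "blue" jobs are the retries and that every non-green result counts as a failure; the function name `failure_rate` is hypothetical and not part of any releng tool.

```python
def failure_rate(green, orange, red, blue):
    """Return the percentage of jobs that did not end green.

    Assumption: orange, red, and blue (retry) results all count as failures.
    """
    total = green + orange + red + blue
    if total == 0:
        raise ValueError("no jobs recorded")
    return 100.0 * (orange + red + blue) / total


# Counts from the comment above: 10 green, 3 orange, 5 red, 16 blue.
observed = failure_rate(green=10, orange=3, red=5, blue=16)
expected = 26.9  # expected panda failure rate quoted above, in percent
print(f"observed {observed:.1f}% vs expected {expected}%")
```

With these counts, 24 of 34 jobs failed, which is where the 70.6% comes from.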
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1107566
this panda had a bad power cable going back to the power control relay board. i moved the power over to bank 8 and updated inventory however the board is not completing the self test. is there another file that needs to be updated to reflect the relay setting or maybe perhaps time to decomm this board?
(In reply to Van Le [:van] from comment #8)
> this panda had a bad power cable going back to the power control relay
> board. i moved the power over to bank 8 and updated inventory however the
> board is not completing the self test. is there another file that needs to
> be updated to reflect the relay setting or maybe perhaps time to decomm this
> board?

Adding a NI for Jake to see if there's something else that needs updating.
Flags: needinfo?(jwatkins)
(In reply to Van Le [:van] from comment #8)
> this panda had a bad power cable going back to the power control relay
> board. i moved the power over to bank 8 and updated inventory however the
> board is not completing the self test. is there another file that needs to
> be updated to reflect the relay setting or maybe perhaps time to decomm this
> board?

Simply changing the system.relay.0 key/value in inventory is correct.  The mozpool inventory sync cron will pick it up at 15 and 45 minutes past each hour.

Bank 8 does not exist on those relay boards.  You might have meant bank 2, relay 8 which is also what inventory shows and what mozpool(bmm) is showing. So you might want to double check which relay it is connected to. At the bottom of this mana page is the board layout. You can use that to see which bank and relay you are actually connected to.
https://mana.mozilla.org/wiki/display/IT/Power+Control+Relay+Board

If you have a volt meter, you can check that the cable is getting power and you can use that meter to see if the selftest is actually hitting the relay you 'think' you are on.

Also, how did you determine the power cable was bad? It may have been a blown fuse. In that case, the power cable connector shorted out on the chassis or the pandaboard smoked and blew the fuse.  And I seem to recall we left the fuses out on the 12th power cable (the one that goes to the empty panda bracket). So if you switched over to that one, make sure there is a fuse for it. And you should remove the fuse for the power connector that is not in place. (so it doesn't short out also)

If you are sure it is receiving power and connected to the correct relay, then the pandaboard is probably smoked, in which case I would say decomm it. If you're sure you're on the right relay and don't have a meter, decomm it.
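To estimate how long an inventory change takes to reach mozpool, the sync schedule mentioned in this comment can be sketched as below. This assumes the sync runs at minutes 15 and 45 of every hour; the names `SYNC_MINUTES` and `next_sync` are hypothetical and not taken from mozpool itself.

```python
from datetime import datetime, timedelta

# Assumption: the inventory sync cron fires at :15 and :45 of every hour.
SYNC_MINUTES = (15, 45)


def next_sync(now, minutes=SYNC_MINUTES):
    """Return the first scheduled sync time strictly after `now`."""
    base = now.replace(second=0, microsecond=0)
    for minute in sorted(minutes):
        candidate = base.replace(minute=minute)
        if candidate > now:
            return candidate
    # Past the last slot of this hour: use the first slot of the next hour.
    next_hour = base.replace(minute=0) + timedelta(hours=1)
    return next_hour.replace(minute=min(minutes))


# Example: an inventory edit saved at 10:20 would be picked up at 10:45.
print(next_sync(datetime(2014, 12, 15, 10, 20)))
```

Under that assumption, the worst-case wait after an inventory edit is 30 minutes.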
Flags: needinfo?(jwatkins)
>Bank 8 does not exist on those relay boards.  You might have meant bank 2, relay 8 which is also what inventory shows and what mozpool(bmm) is showing.

yah inventory is correct, i meant to write bank 2 relay 8, not sure what happened.

>Also, how did you determine the power cable was bad? It may have been a blown fuse.

I tested by moving the cables back and forth and replaced the fuse several times. It could be possible that the relay or cable somehow shorted but I checked the connections as well. I didn't have a volt meter on hand but I'll give it another look before I decomm it. Thanks for the info.
Depends on: 1111812
No longer depends on: 1111812
(In reply to Van Le [:van] from comment #11)
> >Bank 8 does not exist on those relay boards.  You might have meant bank 2, relay 8 which is also what inventory shows and what mozpool(bmm) is showing.
> 
> yah inventory is correct, i meant to write bank 2 relay 8, not sure what
> happened.

I've updated our copy of the relay info:

https://hg.mozilla.org/build/tools/rev/751cafa489e5

> >Also, how did you determine the power cable was bad? It may have been a blown fuse.
> 
> I tested by moving the cables back and forth and replaced the fuse
> several times. It could be possible that relay or cable somehow shorted but
> I checked the connections as well. I didn't have a volt meter on hand but
> I'll give it another look before I decomm it. Thanks for the info.

Did we come to any resolution here, i.e. did we find any further problems? Can I return this panda to service? In the absence of any errors, I'd prefer not to decomm.
Flags: needinfo?(vle)
it's bad, we should decommission it. FWIW, we have 200+ panda spares just sitting around.
Flags: needinfo?(vle)
Depends on: 1116577
Decommissioned.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 9 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard