Closed Bug 1057069 Opened 10 years ago Closed 10 years ago

Determine why batches of pandas stopped reporting on 2014-04-25 and 2014-07-13

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P2)

ARM
Android

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Assigned: coop)

Details

If we look at the slave health aggregation for pandas:

https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=panda

...we can see two distinct groupings in the slaves marked as "broken."

One group stopped reporting on 2014-04-25. This includes the following pandas:

* panda-0320
* panda-0321
* panda-0322
* panda-0323
* panda-0324
* panda-0325
* panda-0326
* panda-0327
* panda-0328
* panda-0329
* panda-0331
* panda-0332

Another group stopped reporting on 2014-07-13. This includes the following pandas:

* panda-0861
* panda-0862
* panda-0863
* panda-0864
* panda-0865
* panda-0866
* panda-0867
* panda-0868
* panda-0871
* panda-0872
* panda-0873

Each group is nearly contiguous numerically (the only gaps are panda-0330, panda-0869, and panda-0870), which makes me suspect chassis or foopy issues following an outage like a colo move.
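
As a rough illustration of the "contiguous numerically" observation, here is a minimal sketch that splits a list of device names into runs of consecutive numbers. The helper name `numeric_runs` is hypothetical and not part of any releng tooling; the two lists are the pandas named above.

```python
import re
from itertools import groupby


def numeric_runs(names):
    """Split names like 'panda-0320' into (first, last) runs of consecutive numbers."""
    numbers = sorted(int(re.search(r"(\d+)$", name).group(1)) for name in names)
    runs = []
    # Consecutive integers share the same (value - position) difference,
    # so grouping on that difference yields one group per run.
    for _, grp in groupby(enumerate(numbers), key=lambda pair: pair[1] - pair[0]):
        values = [value for _, value in grp]
        runs.append((values[0], values[-1]))
    return runs


# Devices listed in this bug (stopped reporting on 2014-04-25 and 2014-07-13).
group_april = [f"panda-{n:04d}" for n in list(range(320, 330)) + [331, 332]]
group_july = [f"panda-{n:04d}" for n in list(range(861, 869)) + [871, 872, 873]]

print(numeric_runs(group_april))
print(numeric_runs(group_july))
```

A small number of long runs per group is what points toward a shared chassis or foopy rather than unrelated per-device failures.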
Bug 1001705 was filed on 2014-04-25. There's a lot going on in that bug that could be keeping those pandas offline.

We added the new emulator slave class on 2014-07-14 in bug 1034055. I'm not sure whether that's related, but I haven't found any other likely events yet.
(In reply to Chris Cooper [:coop] from comment #0) 
> * panda-0320
> * panda-0321
> * panda-0322
> * panda-0323
> * panda-0324
> * panda-0325
> * panda-0326
> * panda-0327
> * panda-0328
> * panda-0329
> * panda-0331
> * panda-0332

These are all on foopy58.

> * panda-0861
> * panda-0862
> * panda-0863
> * panda-0864
> * panda-0865
> * panda-0866
> * panda-0867
> * panda-0868
> * panda-0871
> * panda-0872
> * panda-0873

These are all on foopy101.
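
The lookup above can be sketched as grouping the broken devices by their foopy. This is only an illustration: `group_by_foopy` is a hypothetical helper, and the `device_to_foopy` mapping is assumed to come from wherever the panda-to-foopy assignment is recorded. The sample entries reflect only what this bug states.

```python
from collections import defaultdict


def group_by_foopy(broken, device_to_foopy):
    """Group broken device names by foopy; unmapped devices go under 'unknown'."""
    groups = defaultdict(list)
    for device in broken:
        groups[device_to_foopy.get(device, "unknown")].append(device)
    return {foopy: sorted(devices) for foopy, devices in groups.items()}


# Partial sample mapping, taken from the assignments stated in this bug.
device_to_foopy = {
    "panda-0320": "foopy58",
    "panda-0332": "foopy58",
    "panda-0861": "foopy101",
    "panda-0873": "foopy101",
}

broken = ["panda-0332", "panda-0861", "panda-0320", "panda-0873"]
print(group_by_foopy(broken, device_to_foopy))
```

If every broken device in a date group lands under a single foopy, that foopy is the first place to look.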
(In reply to Chris Cooper [:coop] from comment #2)
> These are all on foopy58. 
> These are all on foopy101.

Bingo. There are no panda-* directories under /builds on either of these two foopies.
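
For anyone repeating this check on other foopies, a minimal sketch of the directory test follows. It assumes one directory per device named after the device (e.g. `/builds/panda-0320`), as implied above. The function names are hypothetical, and this does not reproduce what create_device_dirs does.

```python
from pathlib import Path


def missing_device_dirs(builds_root, devices):
    """Return the devices that have no directory under builds_root."""
    root = Path(builds_root)
    return sorted(d for d in devices if not (root / d).is_dir())


def has_any_device_dirs(builds_root, prefix="panda-"):
    """True if at least one <prefix>* directory exists under builds_root."""
    return any(p.is_dir() for p in Path(builds_root).glob(f"{prefix}*"))


if __name__ == "__main__":
    # Devices expected on the foopy being checked (a subset, for illustration).
    expected = ["panda-0320", "panda-0321", "panda-0322"]
    print(has_any_device_dirs("/builds"))
    print(missing_device_dirs("/builds", expected))
```

Run on a foopy, an empty result from the first function would mean every expected device directory is present.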
I ran create_device_dirs on these two foopies, and they are much happier now.
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
At this point, have we verified that all of the foopies were reinstalled correctly? This is the second problem we've run into (the first was some of the foopies not having the correct version of the software).
(In reply to Amy Rich [:arich] [:arr] from comment #5)
> At this point, have we verified that all of the foopies were reinstalled
> correctly? This is the second problem we've run into (the first was some of
> the foopies not having the correct version of the software).

All of the foopies we care about (i.e. for pandas) are now working correctly.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard