Bug 929823 (panda-0843)

panda-0843 problem tracking

Status: RESOLVED FIXED
Severity: normal
Priority: P1
Opened: 5 years ago
Last modified: 10 months ago

People

(Reporter: Callek, Unassigned)

Details

(Whiteboard: [buildduty][buildslaves][capacity])

How about rather than panda-recovery, we give it panda-extreme-unction? This panda, while disabled, mind you, has made people wait for 62 hours since October 18th, while it spends 20 minutes failing to do anything, then sets RETRY.

We do have some way of actually disabling a panda, don't we?
Severity: normal → critical
Priority: P3 → P1
It has now run mochitest-8 on the merge to beta fifteen times during the course of the last five and a half hours.
Disabled in slavealloc.
Callek, this panda is still taking jobs even though it is disabled in slavealloc:
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?name=panda-0843

I thought slavealloc was supposed to work for pandas now?
Flags: needinfo?(bugspam.Callek)
(Reporter)

Comment 5

5 years ago
(In reply to Ed Morley [:edmorley UTC+1] from comment #4)
> Callek, this panda is still taking jobs even though it is disabled in
> slavealloc:
> https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.
> html?name=panda-0843
> 
> I thought slavealloc was supposed to work for pandas now?

Ugh, apparently buildbot has been running for this panda since July first! Which means it didn't get the change that allowed it to automatically shut down after every job, so it wasn't using any new code for slavealloc.

I killed all the buildbot jobs on this foopy, since they had all been running for that long. (That could explain some of the retry counts for these pandas.)
Flags: needinfo?(bugspam.Callek)
(In reply to Justin Wood (:Callek) from comment #5)
> Ugh, apparently buildbot has been running for this panda since July first!
> Which means it didn't get the change that allowed it to automatically shut
> down after every job, so it wasn't using any new code for slavealloc.
> 
> I killed all buildbot jobs on this foopy, since they were all running for
> that long. (And could explain some of the retry amounts for these pandas)

Great - thank you :-)
Could we check that the same hasn't occurred on any of the other foopies?
Flags: needinfo?(bugspam.Callek)
At first glance, at least the following foopies seem to be exhibiting the same problem (found by selecting a handful of disabled tegras at random and checking whether they are still taking jobs, plus which master they are on):
foopy89
foopy91
foopy92
foopy94
foopy96
foopy97
foopy98

Think we may need to check them all, sadly.
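Auditing the remaining foopies amounts to flagging buildbot processes whose elapsed time is implausibly long (here, running since July). A minimal sketch of that filter in Python, assuming the input is `(etime, command)` pairs taken from `ps -eo etime,args` on each foopy; the helper names are hypothetical, not part of the actual tooling:

```python
def etime_to_days(etime):
    """Convert a ps(1) ELAPSED field ([[dd-]hh:]mm:ss) to fractional days."""
    days = 0
    if "-" in etime:
        d, etime = etime.split("-", 1)
        days = int(d)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:  # pad missing hours (and minutes) fields
        parts.insert(0, 0)
    h, m, s = parts
    return days + (h * 3600 + m * 60 + s) / 86400.0

def stale_processes(ps_lines, max_days=1.0):
    """Given (etime, command) pairs, return commands running longer than max_days."""
    return [cmd for etime, cmd in ps_lines if etime_to_days(etime) > max_days]

# Example: a buildbot process up for 123 days would be flagged; a fresh sshd would not.
stale_processes([("123-04:05:06", "python twistd buildbot"), ("05:30", "sshd")])
```

Since pandas are supposed to restart buildbot after every job, anything older than about a day is suspect, which is why `max_days` defaults to 1.0 in this sketch.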
Moving discussion to bug 888835.
Flags: needinfo?(bugspam.Callek)
(Reporter)

Comment 9

5 years ago
Sending this slave to recovery
--> Automated message.
(Reporter)

Comment 10

5 years ago
recovered by "panda-recovery" bug 902657
Severity: critical → normal
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED

Updated

10 months ago
Product: Release Engineering → Infrastructure & Operations