Closed Bug 843744 (panda-0081) Opened 11 years ago Closed 9 years ago

Decommission panda-0081

Categories

(Infrastructure & Operations :: DCOps, task, P3)

ARM
Android

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kmoir, Unassigned)

Details

(Whiteboard: [buildduty][buildslaves][capacity])

DC ops is looking at this panda, as it's very flaky.

See bug 817103 for more details
I also forgot to mention that I ran chmod og-r on the panda-0081 directory on foopy85 so that it would stay out of commission.
Running echo "bug 843744" > /builds/panda-0081/disabled.flg is a better way to disable a panda.
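For anyone scripting this, here is a minimal Python sketch of the same flag-file approach. The function name `disable_panda` and the `builds_root` parameter are hypothetical, introduced only for illustration; the only things taken from this bug are the /builds/<panda>/disabled.flg path and the "bug NNNNNN" content.

```python
from pathlib import Path


def disable_panda(name: str, bug: int, builds_root: str = "/builds") -> Path:
    """Write disabled.flg for a panda, recording the tracking bug as the reason.

    Hypothetical helper: mirrors `echo "bug N" > /builds/<name>/disabled.flg`.
    The panda directory must already exist, as it would for the echo.
    """
    flag = Path(builds_root) / name / "disabled.flg"
    # echo appends a trailing newline, so do the same here.
    flag.write_text(f"bug {bug}\n")
    return flag
```

Calling `disable_panda("panda-0081", 843744)` on the foopy would then leave the reason for the disablement in the flag file itself, which a permissions change does not.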
The issue that first caused this panda to fail (5 months ago) might have been an issue with the automation process and not really a problem with the panda board itself.  Since I really didn't find anything wrong with this panda board, I issued a reimage via mozpool.
Resolving all panda bugs linked from Bug 817103 that are not in troubleshoot or failed_pxe* state in lifeguard.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
PDU reboot didn't help; needs recovery.
Status: RESOLVED → REOPENED
Depends on: 902657
Resolution: FIXED → ---
Sending this slave to recovery
-->Automated message.
Depends on: 948669
panda-081 -  selftest.py[INFO]: test_preseed_file_integrity[FAILED] boot.scr : 5a5c34aa07d2d8f23e1b69347d49bacf205041dd != 6261fdd19a45db13e6503c5010e3917dbb13eeed

fix - replaced SD card with correct preseed.
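For reference, a rough sketch of the kind of check that produces a failure like the one above. This is not the actual selftest.py code. It assumes the digests are SHA-1 (the 40 hex digits are consistent with that), and the ordering "actual != expected" in the message is also an assumption; the helper names are hypothetical.

```python
import hashlib


def sha1_of(path: str) -> str:
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path: str, expected: str, name: str):
    """Compare a file's digest with the expected one.

    Returns (ok, message). The message format imitates the log line in
    this bug; which side is "actual" is an assumption.
    """
    actual = sha1_of(path)
    if actual == expected:
        return True, f"test_preseed_file_integrity[PASSED] {name}"
    return False, (
        f"test_preseed_file_integrity[FAILED] {name} : {actual} != {expected}"
    )
```

A mismatch on boot.scr means the card's preseed content differs from what the test expects, which fits the fix of swapping in a card with the correct preseed.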
Back in production
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Attempting reboot via Mozpool...Failed.
Filed IT bug for reboot (bug 1067672)
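The escalation order in the automated message above (SSH first, then Mozpool, then an IT bug) can be sketched as a simple fallback chain. This is an illustration only, not the tool that generated the message: `reboot_with_fallback` and the callables passed to it are hypothetical, and the "Succeeded" wording is assumed since only failures appear in this bug.

```python
from typing import Callable, Iterable, List, Tuple


def reboot_with_fallback(
    steps: Iterable[Tuple[str, Callable[[], bool]]]
) -> Tuple[bool, List[str]]:
    """Try each reboot method in order and stop at the first success.

    Each step is (label, attempt), where attempt() returns True on success.
    Exceptions raised by an attempt are treated as a failed attempt.
    """
    log = []
    for label, attempt in steps:
        try:
            ok = bool(attempt())
        except Exception:
            ok = False
        log.append(f"Attempting {label}...{'Succeeded' if ok else 'Failed'}.")
        if ok:
            return True, log
    # Every automated method failed; the caller escalates to a human.
    return False, log
```

When the function returns False, the caller would file the IT reboot bug, as happened here.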
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Didn't recover, hasn't taken a job for 26 days.
QA Contact: armenzg → bugspam.Callek
Depends on: 1072405
Panda was added to the proper foopy and is now taking jobs.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Failing every other job, disabled in slavealloc.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Tools repo updated on foopy so that this panda can be properly rebooted again.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Since then: 2 green, 1 orange, 17 retries, and no jobs at all for the last 4 days. Not sure this was a good choice for backfill. Disabled.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1079329
replaced SD card, panda passed self test
Reenabled and rebooted.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Attempting SSH reboot...Failed.
Attempting reboot via Mozpool...Failed.
Filed IT bug for reboot (bug 1082738)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attempting SSH reboot...Failed.
Attempting reboot via Mozpool...Failed.
Filed IT bug for reboot (bug 1083006)
replaced SD card, panda passed self test.

panda-0081	ready (request 2313518)
Score for the 2014-10-10 SD card: 11 retries, 2 failures, 5 passes.
Score for the 2014-10-20 SD card: 3 retries, 2 failures, 0 passes (and I'm already calling that a complete score since it hasn't taken a job for three days now).

It ain't the SD card. Disabled in slavealloc so we can skip another pointless card replacement the next time it fails to reboot.
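To put numbers on the two scores above, a small sketch that turns the counts into a pass rate. The `CardScore` class is hypothetical; the counts are the ones reported in this comment.

```python
from dataclasses import dataclass


@dataclass
class CardScore:
    """Job outcomes observed while a given SD card was installed."""
    retries: int
    failures: int
    passes: int

    @property
    def total(self) -> int:
        return self.retries + self.failures + self.passes

    @property
    def pass_rate(self) -> float:
        # Guard against a card that never took a job.
        return self.passes / self.total if self.total else 0.0


# Counts taken from the scores reported in this bug.
scores = {
    "2014-10-10": CardScore(retries=11, failures=2, passes=5),
    "2014-10-20": CardScore(retries=3, failures=2, passes=0),
}

for card, score in scores.items():
    print(f"{card}: {score.passes}/{score.total} passed ({score.pass_rate:.0%})")
```

That works out to 5 of 18 jobs passing on the first card and 0 of 5 on the second, so a fresh card did not improve anything.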
Let's decomm this one.
Assignee: nobody → server-ops-dcops
Component: Buildduty → Server Operations: DCOps
Product: Release Engineering → mozilla.org
QA Contact: bugspam.Callek → dmoore
Summary: panda-0081 problem tracking → Decommission panda-0081
colo-trip: --- → scl3
Assignee: server-ops-dcops → nobody
Component: Server Operations: DCOps → Server Operations: MOC
Hey MOC team can you remove this host from nagios before I physically decomm it?  Thanks.
(In reply to Vinh Hua [:vinh] from comment #22)
> Hey MOC team can you remove this host from nagios before I physically decomm
> it?  Thanks.

Pandas are commented out in nagios. Can you update the status of the bug?
Flags: needinfo?(vhua)
Assignee: nobody → server-ops
Component: Server Operations: MOC → Server Operations
Assignee: server-ops → server-ops-dcops
Component: Server Operations → DCOps
Product: mozilla.org → Infrastructure & Operations
Panda-0081 has been physically decomm'd.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 9 years ago
Flags: needinfo?(vhua)
Resolution: --- → FIXED