Closed
Bug 915457
Opened 12 years ago
Closed 12 years ago
Triage tegras with no completed jobs within last 24 hours
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Assigned: coop)
Attachments
(1 file: 778 bytes, text/x-python-script)
We currently have about 145 tegras that have not run/completed a job within the last 24 hours, and we have a large pending count (>1100) for tegra-run jobs.
We need to triage this list, look for any systemic problems, and nurse these back to life as soon as possible.
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=tegra
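The selection criterion above (no completed job within the last 24 hours) can be sketched roughly as follows. This is only an illustration, not how slave_health computes its report: the function name `stale_devices`, the input mapping, and the device names are all hypothetical, and it assumes a last-completed-job timestamp is available per device.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def stale_devices(
    last_completed: Dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return the names of devices with no completed job within max_age.

    last_completed maps a device name to the time its last job finished,
    or None if no completed job is known (hypothetical input shape).
    """
    cutoff = now - max_age
    return sorted(
        name
        for name, finished in last_completed.items()
        if finished is None or finished < cutoff
    )


if __name__ == "__main__":
    # Arbitrary example data; device names are made up for illustration.
    now = datetime(2000, 1, 2, 12, 0, 0)
    report = {
        "tegra-001": now - timedelta(hours=1),
        "tegra-002": now - timedelta(hours=30),
        "tegra-003": None,
    }
    for name in stale_devices(report, now):
        print(name)
```

Devices with no known completed job are treated as stale, on the assumption that they are the ones most in need of triage.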
Comment 1 (Reporter) • 12 years ago
Coop recovered most of these this past evening, so I concentrated on other related work instead of this bug. Handing it to him for now, pending when I get up ;-)
Coop managed the recovery by running the following in a script loop, across foopies/devices:
* Kill any hung/idle buildbot procs
* Force reboot the device
As of a few hours ago we had all but 40 devices up!
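The loop described above might look something like the sketch below. It is not the attached script: the function name `recover_tegras`, the `foopy_for` mapping, and the two callbacks `kill_buildbot` and `force_reboot` are hypothetical placeholders for whatever mechanism actually kills the buildbot processes on a foopy and power-cycles a device.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical callback type: takes (foopy, tegra) and raises on failure.
Action = Callable[[str, str], None]


def recover_tegras(
    tegra_list: Iterable[str],
    foopy_for: Dict[str, str],
    kill_buildbot: Action,
    force_reboot: Action,
) -> List[Tuple[str, str]]:
    """Run kill-then-reboot for each hung tegra and report the outcome.

    One failing device does not stop the loop; its error is recorded
    and the loop moves on to the next device.
    """
    results: List[Tuple[str, str]] = []
    for tegra in tegra_list:
        foopy = foopy_for.get(tegra)
        if foopy is None:
            results.append((tegra, "skipped: no foopy known"))
            continue
        try:
            # Step 1: kill any hung/idle buildbot procs for this device.
            kill_buildbot(foopy, tegra)
            # Step 2: force reboot the device.
            force_reboot(foopy, tegra)
        except Exception as exc:
            results.append((tegra, f"failed: {exc}"))
        else:
            results.append((tegra, "rebooted"))
    return results


if __name__ == "__main__":
    # Dry run with print-only callbacks; names are made up for illustration.
    def show_kill(foopy: str, tegra: str) -> None:
        print(f"[{foopy}] would kill buildbot procs for {tegra}")

    def show_reboot(foopy: str, tegra: str) -> None:
        print(f"[{foopy}] would force reboot {tegra}")

    mapping = {"tegra-001": "foopy01", "tegra-002": "foopy02"}
    for tegra, outcome in recover_tegras(mapping, mapping, show_kill, show_reboot):
        print(tegra, outcome)
```

Passing the two actions in as callbacks keeps the loop itself testable as a dry run before pointing it at real devices.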
Assignee: bugspam.Callek → coop
Comment 2 (Assignee) • 12 years ago
Here's the script I ran last night to resurrect the hung tegras.
'tegra_list' contained a list of hung tegras, as reported by slave_health. I'll get this script added to braindump today.
I think we should turn kittenherder reboots back on for tegras in the short term.
Comment 3 (Assignee) • 12 years ago
The tegra problem tracking bugs are a mess. I'll take some time today to try to resolve any open tegra bugs that shouldn't still be open, provided we don't hit another try-nado or similar apocalypse.
Comment 4 (Assignee) • 12 years ago
This is mostly cleaned up now. Tegras that were in the buildduty queue have all been nudged to their next state, whether that's recovery or production.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated • 8 years ago
Product: Release Engineering → Infrastructure & Operations
Updated • 6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard