Closed
Bug 734798
(tegra-036)
Opened 13 years ago
Closed 10 years ago
tegra-036 problem tracking
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Unassigned)
References
()
Details
(Whiteboard: [buildduty][capacity][buildslaves])
According to my counting-on-my-fingers stats, it has failed 30% of its jobs over the last 200 and 40% of its jobs over the last 50, which puts it below my usual threshold of "this one is failing 50%, it's unquestionably broken" but solidly at the bottom of the "this is one crappy tegra" range. So, I'd like to be able to watch what it does post-reimaging, to see whether the bottom of that range can actually be pulled up into the horrible normal of around 20% failures.
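For anyone wanting to redo the finger-counting, here is a minimal sketch of the arithmetic described above. It assumes the job history is already available as a list of booleans (oldest first, True meaning the job failed); the function names and the "broken" / "marginal" / "normal" labels are made up for illustration, and the 50% and 20% cut-offs are just the rough figures quoted in this comment, not any official policy.

```python
def failure_rate(results, window):
    """Return the fraction of failed jobs among the most recent `window` results.

    `results` is a list of booleans, oldest first; True means the job failed.
    (Hypothetical input format, not something buildapi is known to provide.)
    """
    recent = results[-window:]
    if not recent:
        raise ValueError("no job results to evaluate")
    return sum(recent) / len(recent)


def classify(rate, broken_at=0.5, normal=0.2):
    """Bucket a failure rate using the rough thresholds from the comment.

    The bucket names are illustrative only.
    """
    if rate >= broken_at:
        return "broken"
    if rate > normal:
        return "marginal"
    return "normal"


# Example history: 60 failures over 200 jobs, 20 of them in the last 50.
history = [True] * 40 + [False] * 110 + [True] * 20 + [False] * 30
for window in (200, 50):
    rate = failure_rate(history, window)
    print(f"last {window}: {rate:.0%} failed -> {classify(rate)}")
```

With those example numbers both windows land in the middle bucket, which matches the "not unquestionably broken, but worse than normal" reading.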
Updated•13 years ago
Alias: tegra-036
Summary: Please disable tegra-036 and enroll it in tegra recovery → tegra-036 problem tracking
Whiteboard: [buildduty][capacity][buildslaves]
Comment 1•13 years ago
Ran stop_cp.sh.
Updated•13 years ago
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
Comment 2•13 years ago
Already back in production.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 3•12 years ago
Failed every one of the last 25 jobs.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 4•12 years ago
stop_cp run on it
Updated•12 years ago
Assignee: coop → nobody
Comment 5•12 years ago
Back in production.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 6•12 years ago
https://secure.pub.build.mozilla.org/buildapi/recent/tegra-036 - that didn't help.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 7•12 years ago
8 red in a row.
Comment 8•12 years ago
Ran stop_cp.sh again.
Reporter | ||
Comment 9•12 years ago
SUTAgent deploy brought it back out of the grave, dripping clots of dirt and flesh. Three green runs, but it'll turn back to the red side and I'll be back.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 10•12 years ago
And, no surprise, I'm back: 4 reds in a row, and no reason to think it'll turn back toward the light.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 11•12 years ago
s/4 reds in a row/10 reds in a row/
Too bad the ability to correctly predict whether a tegra has hit a brief bad stretch, or has turned into pure blood-dripping evil, isn't more broadly useful.
Comment 12•12 years ago
I've now sent Milla Jovovich after this Zombie Device, and she did her dastardliest. (she found the stop_cp.sh command and ran it)
Comment 13•12 years ago
Rumor has it that DCOps learned some new magic tricks at Hogwarts recently, or maybe they are the same tricks with just more flair. Either way, let's see if this tegra falls for the tricks, or just tries to eat more brains. -- Assigning to recovery one last time (if this fails within a week of recovery, let's kill it with thermite)
Depends on: 792316
Comment 14•12 years ago
So they reimaged, start_cp running now for it
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 15•12 years ago
49 straight retries.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Updated•12 years ago
Summary: tegra-036 problem tracking → [disable me] tegra-036 problem tracking
Updated•12 years ago
Summary: [disable me] tegra-036 problem tracking → tegra-036 problem tracking
Reporter | ||
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 16•12 years ago
After an impressive run of 54 retries in a row, it tripped over its own feet, did a purple and checked itself offline.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 17•12 years ago
3 passes out of 25 runs:
https://secure.pub.build.mozilla.org/buildapi/recent/tegra-036
Please can this tegra be disabled until investigated/reimaged/beaten with a bat.
Severity: normal → major
Updated•12 years ago
Comment 19•12 years ago
err, stopped.
Comment 20•12 years ago
(In reply to Rail Aliiev [:rail] from comment #19)
> err, stopped.
*actually* stopped now -- apparently it wasn't
bash-3.2$ ps auxwww | grep tegra-036
cltbld 28349 0.0 0.1 2447024 2824 ?? S 23Oct12 4:53.19 /opt/local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.app/Contents/MacOS/Python clientproxy.py -b --tegra=tegra-036 --debug
cltbld 28348 0.0 0.1 2457020 3808 ?? S 23Oct12 1:39.39 /opt/local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.app/Contents/MacOS/Python clientproxy.py -b --tegra=tegra-036 --debug
cltbld 59447 0.0 0.0 2425520 116 s002 R+ 3:42AM 0:00.00 grep tegra-036
Updated•12 years ago
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 21•12 years ago
No jobs taken on this device for >= 7 weeks
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 22•12 years ago
(mass change: filter on tegraCallek02reboot2013)
I just rebooted this device, hoping that many of the ones I'm doing tonight come back automatically. I'll check back in tomorrow to see if it did; if it did not, I'll triage the next step manually on a per-device basis.
---
Command I used (with a manual patch to the fabric script to allow this command):
(fabric)[jwood@dev-master01 fabric]$ python manage_foopies.py -j15 -f devices.json `for i in 021 032 036 039 046 048 061 064 066 067 071 074 079 081 082 083 084 088 093 104 106 108 115 116 118 129 152 154 164 168 169 174 179 182 184 187 189 200 207 217 223 228 234 248 255 264 270 277 285 290 294 295 297 298 300 302 304 305 306 307 308 309 310 311 312 314 315 316 319 320 321 322 323 324 325 326 328 329 330 331 332 333 335 336 337 338 339 340 341 342 343 345 346 347 348 349 350 354 355 356 358 359 360 361 362 363 364 365 367 368 369; do echo '-D' tegra-$i; done` reboot_tegra
The command does the reboots one at a time, from the foopy each device is connected to, with one ssh connection per foopy.
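The backtick loop in that command only expands a list of device numbers into repeated `-D tegra-NNN` arguments. As a rough sketch of the same expansion in Python (the helper names `device_args` and `build_command` are made up here; the flags `-j15`, `-f devices.json`, `-D` and the `reboot_tegra` action are copied from the command line above, and nothing else about manage_foopies.py is assumed):

```python
def device_args(numbers, prefix="tegra-"):
    """Expand device numbers into repeated '-D tegra-NNN' arguments."""
    args = []
    for number in numbers:
        # Zero-pad to three digits so "21" and "021" both become tegra-021.
        args += ["-D", f"{prefix}{int(number):03d}"]
    return args


def build_command(numbers, jobs=15, devices_file="devices.json",
                  action="reboot_tegra"):
    """Assemble an argument list shaped like the one used in this comment."""
    return [
        "python", "manage_foopies.py",
        f"-j{jobs}", "-f", devices_file,
        *device_args(numbers),
        action,
    ]


# Print the command for a few of the devices from the list above.
print(" ".join(build_command(["021", "032", "036"])))
```

Building the arguments as a list rather than one long string keeps each device name as a separate argument if the result is later handed to something like `subprocess.run`.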
Comment 23•12 years ago
Buildbot is properly connected and happy now.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Assignee | ||
Updated•11 years ago
Product: mozilla.org → Release Engineering
Comment 24•11 years ago
Wednesday, October 02, 2013 4:21:17 AM
Comment 25•11 years ago
Flashed and reimaged. If it fails again, we should replace the SD card.
Comment 26•11 years ago
Back in production
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Comment 27•11 years ago
SD card replaced, tegra reimaged and flashed.
[vle@admin1a.private.scl3 ~]$ fping tegra-036.tegra.releng.scl3.mozilla.com
tegra-036.tegra.releng.scl3.mozilla.com is alive
Comment 28•11 years ago
Back in production.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 29•10 years ago
Hasn't taken a job for 13 days.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Updated•10 years ago
QA Contact: armenzg → bugspam.Callek
Reporter | ||
Comment 30•10 years ago
Disabled in slavealloc to stop the pointless stream of reboots.
Comment 31•10 years ago
SD card formatted, tegra flashed and reimaged.
vle@vle-10516 ~ $ telnet tegra-036.tegra.releng.scl3.mozilla.com 20701
Trying 10.26.85.23...
Connected to tegra-036.tegra.releng.scl3.mozilla.com.
Escape character is '^]'.
$>^]
telnet> q
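The telnet session above is just a check that something accepts TCP connections on port 20701 of the device. A minimal sketch of the same check in Python, using only the standard library (the function name `port_is_open` is made up for illustration; the host and port are the ones from the telnet command, and this says nothing about whether the service behind the port is healthy):

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    host = "tegra-036.tegra.releng.scl3.mozilla.com"
    state = "open" if port_is_open(host, 20701) else "unreachable"
    print(f"{host}:20701 is {state}")
```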
Reporter | ||
Comment 32•10 years ago
Reenabled.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard