Bug 722001 (tegra-118)

tegra-118 problem tracking

RESOLVED FIXED

Status

Product: Infrastructure & Operations
Component: CIDuty
Priority: P3
Severity: normal
Status: RESOLVED FIXED
Reported: 7 years ago
Modified: a month ago

People

(Reporter: philor, Unassigned)

Tracking

Details

(Whiteboard: [badslave][buildduty], URL)

(Reporter)

Description

7 years ago
Since the evening of January 13th, it has done over 200 jobs without ever hitting anything but exception and retry.
I called './stop_cp.sh tegra-118' on foopy14 to (hopefully) prevent it doing more jobs.

Updated

7 years ago
Depends on: 722873

Updated

7 years ago
Priority: -- → P3
Assignee: nobody → bhearsum
With the dependent bug fixed, I restarted this tegra.
This tegra is back in production again. The first job it ran was purple, but the next two were green.
Alias: tegra-118
Assignee: bhearsum → nobody
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Component: Release Engineering → Release Engineering: Machine Management
QA Contact: release → armenzg
Resolution: --- → FIXED
Summary: tegra-118 needs help → tegra-118 problem tracking
Duplicate of this bug: 731925

Updated

6 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Updated

6 years ago
Depends on: 778812

Updated

6 years ago
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago → 6 years ago
Resolution: --- → FIXED
failed last 7 jobs
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
ran stop_cp.sh on this tegra
Went offline mid-job, trying a PDU reboot.
Back in production.
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago → 6 years ago
Resolution: --- → FIXED
(Reporter)

Comment 9

6 years ago
21 timeouts in verify.py in a row.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Reporter)

Updated

6 years ago
Summary: tegra-118 problem tracking → [disable me] tegra-118 problem tracking
Summary: [disable me] tegra-118 problem tracking → tegra-118 problem tracking
This tegra was doing ok, now it's having a hard time again. Probably needs recovery. I ran stop_cp.sh on it.
Blocks: 806950
No longer blocks: 806950
Depends on: 806950
Back in production.
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago → 6 years ago
Resolution: --- → FIXED
(Reporter)

Comment 19

6 years ago
34% green.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Reporter)

Updated

6 years ago
Depends on: 808437
Ran ./stop_cp.sh
Blocks: 808468

Updated

6 years ago
Blocks: 813012

Updated

6 years ago
No longer blocks: 813012
Brought back to life.
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago → 6 years ago
Resolution: --- → FIXED
...and dying again, please reimage
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Updated

6 years ago
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago → 6 years ago
Resolution: --- → FIXED
No jobs taken on this device for >= 7 weeks
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(mass change: filter on tegraCallek02reboot2013)

I just rebooted this device, hoping that many of the ones I'm doing tonight come back automatically. I'll check back in tomorrow to see whether it did; if it did not, I'll triage the next step manually on a per-device basis.

---
Command I used (with a manual patch to the fabric script to allow this command)

(fabric)[jwood@dev-master01 fabric]$  python manage_foopies.py -j15 -f devices.json `for i in 021 032 036 039 046  048 061 064 066 067 071 074 079 081 082 083 084 088 093 104 106 108 115 116 118 129 152 154 164 168 169 174 179 182 184 187 189 200 207 217 223 228 234 248 255 264 270 277 285 290 294 295 297 298 300 302 304 305 306 307 308 309 310 311 312 314 315 316 319 320 321 322 323 324 325 326 328 329 330 331 332 333 335 336 337 338 339 340 341 342 343 345 346 347 348 349 350 354 355 356 358 359 360 361 362 363 364 365 367 368 369; do echo '-D' tegra-$i; done` reboot_tegra

The command reboots the devices one at a time from the foopy each device is connected to, with one ssh connection per foopy.
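
For anyone reading the backtick loop in the command above, here is a minimal Python sketch of what it expands to and of the per-foopy grouping just described. The helper names and the device-to-foopy mapping are assumptions made for illustration; they are not taken from manage_foopies.py or devices.json. Only the tegra-118/foopy14 pairing appears earlier in this bug, and the other entries are made up.

```python
from collections import defaultdict


def device_args(numbers):
    """Build the '-D tegra-NNN' pairs that the shell for-loop echoes."""
    args = []
    for number in numbers:
        args.extend(["-D", f"tegra-{number}"])
    return args


def group_by_foopy(devices, device_to_foopy):
    """Group devices by host foopy so each foopy needs only one ssh connection."""
    groups = defaultdict(list)
    for device in devices:
        groups[device_to_foopy[device]].append(device)
    return dict(groups)


# Hypothetical mapping for illustration; the real one would come from devices.json.
DEVICE_TO_FOOPY = {
    "tegra-116": "foopy14",
    "tegra-118": "foopy14",
    "tegra-129": "foopy15",
}

if __name__ == "__main__":
    print(device_args(["116", "118", "129"]))
    for foopy, devices in group_by_foopy(list(DEVICE_TO_FOOPY), DEVICE_TO_FOOPY).items():
        # Within one foopy the reboots would run one at a time, in list order.
        print(foopy, devices)
```

Grouping first and then walking each group in order is one simple way to get "one connection per foopy, one reboot at a time"; the actual fabric script may do it differently.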

Updated

5 years ago
Depends on: 838687

Updated

5 years ago
No longer blocks: 808468
Depends on: 808468
had to cycle clientproxy to bring this back
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago → 5 years ago
Resolution: --- → FIXED
(Assignee)

Updated

5 years ago
Product: mozilla.org → Release Engineering
pdu reboot didn't fix this one
Status: RESOLVED → REOPENED
Depends on: 912682
Resolution: FIXED → ---
recovery didn't help, dunno what to do

Updated

5 years ago
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago → 5 years ago
Resolution: --- → FIXED
2014-01-16 10:35:14 tegra-118 p    online   active  OFFLINE :: error.flg [Automation Error: Unable to connect to device after 5 attempts] 

pdu reboot didn't help
Status: RESOLVED → REOPENED
Depends on: 960642
Resolution: FIXED → ---
SD card replaced & reimaged/flashed.
(In reply to Eric Ramirez [:Eric] from comment #29)
> SD card replaced & reimaged/flashed.

can we try again?
Depends on: 974917

Comment 31

4 years ago
SD card formatted, tegra reimaged and flashed.

[vle@admin1a.private.scl3 ~]$ fping tegra-118.tegra.releng.scl3.mozilla.com
tegra-118.tegra.releng.scl3.mozilla.com is alive

Updated

4 years ago
Depends on: 971859
(Reporter)

Updated

4 years ago
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago → 4 years ago
QA Contact: armenzg → bugspam.Callek
Resolution: --- → FIXED

Updated

a month ago
Product: Release Engineering → Infrastructure & Operations