Bug 838438 (tegra-129): tegra-129 problem tracking
Status: RESOLVED FIXED (Closed)
Opened 12 years ago, closed 11 years ago
Categories: Infrastructure & Operations Graveyard :: CIDuty, task, P3
Tracking: Not tracked
People: Reporter: Callek, Unassigned
Whiteboard: [buildduty][buildslaves][capacity]
No jobs taken on this device for more than 3 weeks (but less than 6 weeks).
Comment 1•12 years ago (Reporter)
(mass change: filter on tegraCallek02reboot2013)
I just rebooted this device, hoping that many of the ones I'm rebooting tonight come back automatically. I'll check back tomorrow to see whether it did; if it did not, I'll triage the next step manually on a per-device basis.
---
Command I used (with a manual patch to the fabric script to allow this command):
(fabric)[jwood@dev-master01 fabric]$ python manage_foopies.py -j15 -f devices.json `for i in 021 032 036 039 046 048 061 064 066 067 071 074 079 081 082 083 084 088 093 104 106 108 115 116 118 129 152 154 164 168 169 174 179 182 184 187 189 200 207 217 223 228 234 248 255 264 270 277 285 290 294 295 297 298 300 302 304 305 306 307 308 309 310 311 312 314 315 316 319 320 321 322 323 324 325 326 328 329 330 331 332 333 335 336 337 338 339 340 341 342 343 345 346 347 348 349 350 354 355 356 358 359 360 361 362 363 364 365 367 368 369; do echo '-D' tegra-$i; done` reboot_tegra
The command reboots the devices one at a time from the foopy each device is connected to, with one SSH connection per foopy.
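For anyone repeating this, here is a minimal Python sketch of the two ideas in the command above: building the repeated `-D tegra-NNN` arguments, and grouping devices by foopy so each foopy gets one connection and reboots its devices in sequence. The helper names `device_flags` and `group_by_foopy` are hypothetical, and the sketch assumes devices.json maps each device name to an object with a "foopy" key; the file's schema is not shown in this bug, so treat that as an assumption.

```python
from collections import defaultdict


def device_flags(numbers):
    """Build the repeated '-D tegra-NNN' arguments that the shell loop emits."""
    flags = []
    for number in numbers:
        flags.extend(["-D", f"tegra-{number}"])
    return flags


def group_by_foopy(devices, selected):
    """Group the selected device names by the foopy they are attached to.

    `devices` is assumed (not confirmed by this bug) to look like
    {"tegra-129": {"foopy": "foopy42"}, ...}.
    Device order within each foopy follows the order of `selected`.
    """
    groups = defaultdict(list)
    for name in selected:
        groups[devices[name]["foopy"]].append(name)
    return dict(groups)
```

With the groups in hand, a caller could open one connection per foopy and walk that foopy's list in order, which matches the one-at-a-time behaviour described above.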
Comment 2•12 years ago (Reporter)
had to cycle clientproxy to bring this back
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•12 years ago (Reporter)
Updated•12 years ago (Assignee)
Product: mozilla.org → Release Engineering
Comment 3•12 years ago (Reporter)
last job: Wednesday, September 04, 2013 5:54:36 PM
Comment 4•12 years ago
flashed and reimaged
Comment 5•12 years ago
Back in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 6•11 years ago
Power cycled, waited a day.
error.flg: Automation Error: Unable to connect to device after 5 attempts
SD card reformat was successful:
$>exec newfs_msdos -F 32 /dev/block/vold/179:9
newfs_msdos: warning, /dev/block/vold/179:9 is not a character device
newfs_msdos: Skipping mount checks
/dev/block/vold/179:9: 15110464 sectors in 236101 FAT32 clusters (32768 bytes/cluster)
bps=512 spc=64 res=32 nft=2 mid=0xf0 spt=16 hds=4 hid=0 bsec=15114240 bspf=1845 rdcl=2 infs=1 bkbs=2
return code [0]
$>exec rebt
$>^]
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 7•11 years ago
fails to connect to buildbot:
./check.sh:
2014-04-08 16:26:01,215 tegra-129 p online active OFFLINE ::
watcher.log:
2714 04/08/2014 16:25:04: INFO: Unable to determine state from Mozpool, falling back to device checks
2715 04/08/2014 16:25:04: INFO: INFO: attempting to ping device
2716 04/08/2014 16:25:04: DEBUG: calling [ping -c 5 tegra-129]
2717 04/08/2014 16:25:08: INFO: Connecting to: tegra-129
2718 04/08/2014 16:25:08: INFO: INFO: Unable to connect to device after 1 try
2719 04/08/2014 16:25:08: INFO: We're going to sleep for 90 seconds
2720 04/08/2014 16:26:38: INFO: Connecting to: tegra-129
2721 04/08/2014 16:26:38: INFO: INFO: Unable to connect to device after 2 try
2722 04/08/2014 16:26:38: INFO: We're going to sleep for 90 seconds
2723 04/08/2014 16:28:08: INFO: Connecting to: tegra-129
2724 04/08/2014 16:28:08: INFO: INFO: Unable to connect to device after 3 try
2725 04/08/2014 16:28:08: INFO: We're going to sleep for 90 seconds
2726 04/08/2014 16:29:38: INFO: Connecting to: tegra-129
2727 04/08/2014 16:29:38: INFO: INFO: Unable to connect to device after 4 try
2728 04/08/2014 16:29:38: INFO: We're going to sleep for 90 seconds
2729 2014-04-08 16:30:01 -- *** ERROR *** failed to aquire lockfile
2730 04/08/2014 16:31:08: INFO: Connecting to: tegra-129
2731 04/08/2014 16:31:08: INFO: /builds/tegra-129/error.flg
2732 04/08/2014 16:31:38: INFO: verifyDevice: failing to telnet
2733 reconnecting socket
2734 reconnecting socket
2735 reconnecting socket
2736 reconnecting socket
2737 reconnecting socket
2738 Automation Error: Unable to connect to device after 5 attempts
2739 2014-04-08 16:31:38 -- Verify procedure failed
2740 2014-04-08 16:31:38 -- *** ERROR *** Exiting due to verify failure
not sure what to do.
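For reference, the retry pattern visible in the watcher.log excerpt (try to connect, sleep 90 seconds, give up after 5 attempts) can be sketched as below. This is not the actual watcher code, only an illustration under stated assumptions: `connect_with_retries` and its parameters are hypothetical names, and `connect` is any callable that returns True on success.

```python
import time


def connect_with_retries(connect, attempts=5, delay=90, sleep=time.sleep):
    """Call connect() up to `attempts` times, sleeping `delay` seconds between tries.

    Returns the 1-based number of the attempt that succeeded.
    Raises ConnectionError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if connect():
            return attempt
        # No point sleeping after the final failed attempt.
        if attempt < attempts:
            sleep(delay)
    raise ConnectionError(
        f"Unable to connect to device after {attempts} attempts"
    )
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting 90 seconds between tries.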
Comment 8•11 years ago
The slave has been rebooted many times, but a connection cannot be made.
My vote is to file for a re-image. This is an escalation from the reformat we did in https://bugzilla.mozilla.org/show_bug.cgi?id=838438#c6.
Comment 9•11 years ago
replaced sd card, reimaged and flashed.
[vle@admin1a.private.scl3 ~]$ telnet tegra-129.tegra.releng.scl3.mozilla.com 20701
Trying 10.26.85.96...
Connected to tegra-129.tegra.releng.scl3.mozilla.com.
Escape character is '^]'.
$>^]q
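The manual telnet check above could be scripted roughly as follows. This is a sketch, assuming the agent on port 20701 sends a `$>` prompt shortly after the connection opens, as the transcript suggests; the function name `sut_prompt_seen` and its parameters are hypothetical.

```python
import socket


def sut_prompt_seen(host, port=20701, timeout=10.0, prompt=b"$>"):
    """Return True if the peer sends `prompt` after we connect, else False.

    Connection failures and timeouts are reported as False rather than raised.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = b""
            while prompt not in data:
                chunk = sock.recv(1024)
                if not chunk:
                    # Peer closed the connection before sending the prompt.
                    break
                data += chunk
            return prompt in data
    except OSError:
        return False
```

A call such as `sut_prompt_seen("tegra-129.tegra.releng.scl3.mozilla.com")` would then stand in for the interactive telnet session.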
Updated•11 years ago
QA Contact: armenzg → bugspam.Callek
Comment 10•11 years ago
Reenabled.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Comment 11•11 years ago
Attempting SSH reboot...Failed.
Attempting PDU reboot...Failed.
Filed IT bug for reboot (bug 1018441)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 12•11 years ago
sd card formatted, tegra flashed and reimaged.
vle@vle-10516 ~ $ telnet tegra-129.tegra.releng.scl3.mozilla.com 20701
Trying 10.26.85.96...
Connected to tegra-129.tegra.releng.scl3.mozilla.com.
Escape character is '^]'.
$>^]
telnet> q
Updated•11 years ago
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard