Closed Bug 778818 (tegra-179) Opened 12 years ago Closed 11 years ago

tegra-179 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

ARM
Android

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Unassigned)

Details

(Whiteboard: [buildduty][buildslaves][capacity])

      No description provided.
Depends on: 778812
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Offline; trying a PDU reboot.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Didn't respond to the PDU reboot. Needs recovery.
Depends on: 786767
Back in production.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Not only is this the only thing keeping bug 718929 in business, it's managing to fail elsewhere when it can't get a reftest job to fail that way on. Maybe needs a new SD card to go with a reimage?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
sal just said he would grab this now (since he is still around after *just* finishing 792316 for the others there).

We're replacing the SD card and reimaging.
Depends on: 792316
Swapped SD cards and reimaged.

[sespinoza@natasha ~]$ ping tegra-179.build.mtv1
PING tegra-179.build.mtv1.mozilla.com (10.250.50.89) 56(84) bytes of data.
64 bytes from tegra-179.build.mtv1.mozilla.com (10.250.50.89): icmp_seq=1 ttl=60 time=21.8 ms

--- tegra-179.build.mtv1.mozilla.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 21.811/21.811/21.811/0.000 ms
ABSOLUTELY no better: https://secure.pub.build.mozilla.org/buildapi/recent/tegra-179

I don't know what's wrong. joel/clint, do you guys want to take a peek here, or shall we just decom it entirely?
(In reply to Justin Wood (:Callek) from comment #8)
> ABSOLUTELY no better:

(Ran stop_cp.sh just now and renamed buildbot.tac to buildbot.tac.disabled so that we don't bring it back by accident for now.)
the only thing I see in the full log is:
E/GeckoConsole( 1541): [JavaScript Error: "DOMApplicationRegistry: Could not read from /mnt/sdcard/tests/reftest/webapps/webapps.json : [Exception... "Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIChannel.asyncOpen]"  nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)"  location: "JS frame :: resource://gre/modules/NetUtil.jsm :: NetUtil_asyncOpen :: line 165"  data: no]" {file: "resource://gre/modules/Webapps.jsm" line: 184}]


I recall seeing this on try server the other day. I would be surprised if one specific piece of hardware always hit this error. That error is a good sign of one of three things: the foopy is not serving the files correctly, the profile is not set up, or something unknown. Looking at the log, the profile is set up properly, so either the foopy is messing up the webserver creation or it is something unknown.
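
As a rough aid for triaging log lines like the one quoted above, the sketch below pulls the missing path and the failure code out of a GeckoConsole error line. The helper name `parse_gecko_file_error` and both regular expressions are assumptions made for illustration; they are not part of any Mozilla tooling.

```python
import re

# Hypothetical patterns, written against the single log line quoted in this bug.
_PATH_RE = re.compile(r"Could not read from (\S+)")
_CODE_RE = re.compile(r"failure code: (0x[0-9A-Fa-f]+) \((NS_ERROR_\w+)\)")


def parse_gecko_file_error(line):
    """Return the unreadable path and nsresult from a log line, or None."""
    path = _PATH_RE.search(line)
    code = _CODE_RE.search(line)
    if not path or not code:
        return None
    return {
        "path": path.group(1),
        "code": code.group(1),
        "name": code.group(2),
    }


# Shortened copy of the line from the full log.
sample = (
    'E/GeckoConsole( 1541): [JavaScript Error: "DOMApplicationRegistry: '
    "Could not read from /mnt/sdcard/tests/reftest/webapps/webapps.json : "
    '[Exception... "Component returned failure code: 0x80520012 '
    '(NS_ERROR_FILE_NOT_FOUND) [nsIChannel.asyncOpen]"'
)
info = parse_gecko_file_error(sample)
```

Counting how often the same path fails across devices on one foopy would help separate a foopy-side serving problem from a single bad device.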
Depends on: 793471
[TODO set to recovery]

Reminder: it has buildbot.tac.disabled and has already had stop_cp run.
Depends on: 806950
Went to recovery, but isn't a production machine.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Agent check is failing, not sure what to do about it. Callek?
Assignee: nobody → bugspam.Callek
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 808437
Ran ./stop_cp.sh
Blocks: 808468
Blocks: 813012
No longer blocks: 813012
Apparently this is on the wrong PDU entry:

bash-3.2$ python sut_tools/tegra_powercycle.py tegra-179
12/04/2012 10:54:53: DEBUG: rebooting tegra-179 at pdu5.df202-1.build.mtv1.mozilla.com .AA2
SNMPv2-SMI::enterprises.1718.3.2.3.1.11.1.1.2 = INTEGER: 3
bash-3.2$ ping tegra-179
PING tegra-179.build.mtv1.mozilla.com (10.250.50.89): 56 data bytes
64 bytes from 10.250.50.89: icmp_seq=0 ttl=64 time=1.112 ms
64 bytes from 10.250.50.89: icmp_seq=1 ttl=64 time=1.153 ms

Please reimage + sdcard swap + verify PDU value
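
The reasoning in this comment (the power cycle reports success, yet the device keeps answering pings) can be sketched as a small check. This is an illustration only: `ping_once` and `pdu_entry_suspect` are hypothetical names, and the use of the system `ping` command with `-c 1` assumes a Unix-like host; none of this is part of sut_tools.

```python
import subprocess


def ping_once(host):
    """Return True if a single ICMP echo to host succeeds (assumes Unix `ping -c`)."""
    result = subprocess.run(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def pdu_entry_suspect(ping_results):
    """Flag a possibly wrong PDU mapping.

    ping_results is a list of booleans sampled right after a power cycle.
    A device that really lost power should miss at least one ping, so an
    unbroken run of successes suggests the wrong outlet was cycled.
    """
    return len(ping_results) > 0 and all(ping_results)
```

A caller would run the power cycle, collect a few `ping_once("tegra-179")` results over the next several seconds, and pass the list to `pdu_entry_suspect`. The sampling window is a judgment call, since a device that reboots very quickly could also pass every ping.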
Depends on: 817995
Tegra-179 reimaged + sd card swapped

--- tegra-179.build.mtv1.mozilla.com ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Depends on: 824833
SUT agent won't come up even after a power cycle; it has been down since 12-21-2012 22:49:47.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
In production after the bug 824833 work.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
10:26 < nagios-releng> Tue 07:26:27 PST [457] tegra-179.build.mtv1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
PDU reboot requested.
Assignee: bugspam.Callek → nobody
Didn't work; off to recovery.
Depends on: 825335
Tegra was reimaged. No SD swap.
Didn't work.  I rebooted the panda and still can't connect to the data port.
Depends on: 830739
Depends on: 833384
(mass change: filter on tegraCallek02reboot2013)

I just rebooted this device, hoping that many of the ones I'm doing tonight come back automatically. I'll check back tomorrow to see if it did; if it did not, I'll triage the next step manually on a per-device basis.

---
Command I used (with a manual patch to the fabric script to allow this command):

(fabric)[jwood@dev-master01 fabric]$  python manage_foopies.py -j15 -f devices.json `for i in 021 032 036 039 046  048 061 064 066 067 071 074 079 081 082 083 084 088 093 104 106 108 115 116 118 129 152 154 164 168 169 174 179 182 184 187 189 200 207 217 223 228 234 248 255 264 270 277 285 290 294 295 297 298 300 302 304 305 306 307 308 309 310 311 312 314 315 316 319 320 321 322 323 324 325 326 328 329 330 331 332 333 335 336 337 338 339 340 341 342 343 345 346 347 348 349 350 354 355 356 358 359 360 361 362 363 364 365 367 368 369; do echo '-D' tegra-$i; done` reboot_tegra

The command reboots the devices one at a time from the foopy each device is connected to, with one SSH connection per foopy.
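
For readers who find the backtick loop hard to scan, here is a minimal sketch of how the same argument list could be assembled in Python. The function name `build_reboot_command` is hypothetical; the flags (`-j15`, `-f devices.json`, `-D`) and the `reboot_tegra` action are copied from the command above, and `reboot_tegra` only works with the manual patch mentioned there.

```python
import shlex


def build_reboot_command(device_ids, jobs=15, devices_file="devices.json"):
    """Build the manage_foopies.py argument list for a batch of tegras.

    device_ids are the zero-padded numeric suffixes, e.g. "179".
    """
    args = ["python", "manage_foopies.py", "-j%d" % jobs, "-f", devices_file]
    for dev in device_ids:
        # One "-D tegra-NNN" pair per device, as the shell loop emits.
        args += ["-D", "tegra-%s" % dev]
    args.append("reboot_tegra")
    return args


cmd = build_reboot_command(["021", "032", "179"])
print(" ".join(shlex.quote(a) for a in cmd))
```

Keeping the arguments as a list rather than a single string avoids shell quoting surprises if the list is later handed to `subprocess.run`.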
Depends on: 838687
No longer blocks: 808468
Depends on: 808468
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Back from recovery
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard