Closed Bug 746071 (tegra-048) Opened 12 years ago Closed 10 years ago

tegra-048 problem tracking

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Hardware: ARM
OS: Android
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

References


Details

(Whiteboard: [buildduty][capacity][buildslaves])

I didn't look further back, but the last 200 jobs it's taken have all wound up in RETRY and tears - it needs to sit in the corner and collect itself.
Ran stop_cp on it.
restarted
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 778812
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Went offline. Trying a PDU reboot.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Back online.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Last Job 20 days, 19:39:58 ago

error.flg [Remote Device Error: Unable to properly remove /mnt/sdcard/tests]

remotely reformatted sdcard
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Green Jobs again
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Please swap the SD card.

error.flg [Remote Device Error: unable to write to sdcard]
Status: RESOLVED → REOPENED
Depends on: 802655
Resolution: FIXED → ---
SD card swapped.
36% green over the last 300 runs.
Blocks: 438871
Whiteboard: [buildduty][capacity][buildslaves] → [buildduty][capacity][buildslaves][orange]
Has been running jobs for awhile now.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Yes, it has been running jobs for a while now. Please, please make it stop running jobs.

Dunno if we have a better report, but according to https://secure.pub.build.mozilla.org/buildapi/reports/slaves (once you "Show all entries" and sort by Slave Name to put the tegras together, because none of the other controls work), a non-broken tegra will run at better than 70% green, up to 100% green. This one has been under 50% for its last 50 jobs, under 50% for its last 100 jobs, under 50% for its last 500 jobs.
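(For anyone repeating this exercise later: a minimal sketch of the green-rate arithmetic, under the assumption that the report can be exported as a newest-first list of per-job result strings with "success" meaning green. The data structure here is hypothetical, not the report's real schema.)

# Hypothetical sketch: compute the green rate over a slave's most recent jobs,
# the way the last-50 / last-100 / last-500 numbers above were eyeballed from
# the buildapi slaves report. The `jobs` list is placeholder data.
def green_rate(jobs, window):
    recent = jobs[:window]              # jobs assumed newest-first
    if not recent:
        return 0.0
    green = sum(1 for result in recent if result == "success")
    return 100.0 * green / len(recent)

jobs = ["retry", "success", "testfailed", "retry"]  # placeholder data
for window in (50, 100, 500):
    print("last %d jobs: %.0f%% green" % (window, green_rate(jobs, window)))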

Randomly choosing from its failures that are currently under my nose: https://tbpl.mozilla.org/php/getParsedLog.php?id=16611152&tree=Mozilla-Inbound shows it barely managing to stay awake long enough to get past verify.py, then falling asleep at the start of the test run. https://tbpl.mozilla.org/php/getParsedLog.php?id=16607708&tree=Mozilla-Inbound was a fairly normal failure; I didn't even notice it was 048, except that it's a "failed to initialize browser" of the sort where... it managed to stay awake through verify.py, and then fell asleep before the test could get started running. That run was a quick turnaround, because on the same push it had already run https://tbpl.mozilla.org/php/getParsedLog.php?id=16606802&tree=Mozilla-Inbound, where it made it through verify.py and then fell asleep before the test could start running.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Seems to be running jobs. There are some red talos jobs with "failed to initialize browser", but I think those are not indicative of tegra failure?
Right; like most of our android failures, which are a very limited set of second-hand reports of something that someone heard was the thing that someone else could see through someone else's window, they are indicative of "talos tried to start up the browser, but didn't get the first thing back from the browser it expected to get." One thing to keep in mind when failure reports all look familiar is that when a code push completely breaks the browser, so that it can't even start up, I can (and sometimes do) star every failure of a completely broken browser with an existing bug. We just don't get or report all that much detail, and we have so many intermittent failures that anything that can happen probably does have a bug.

In the case of failed to initialize, bug 686085, of the fairly small percentage of them that Orange Factor has actually noticed, https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=686085&endday=2012-10-31&startday=2012-10-24&tree=all, one tegra hit it twice, one tegra (of which I'll now be suspicious) hit it five times, and this tegra hit it 14 times, half the instances of it. Assuming the instances Orange Factor bothers to notice are representative, disabling this slave should cut the instances of that hundreds-of-failures bug in half.
Just before it took out your release-mozilla-beta_tegra_android-armv6_test-mochitest-1, it took out a reftest with a bug 689856 "just stop." Of the 18 instances of that in the last week that Orange Factor knows about, 13 were this slave (and 4 were 084, the same one I said in comment 13 I'd now be suspicious of).
Ran stop_cp on it. I have no idea what to do with this one, given that it just came back from recovery. Callek?
Assignee: nobody → bugspam.Callek
(In reply to Ben Hearsum [:bhearsum] from comment #15)
> Ran stop_cp on it. I have no idea what to do with this one, given that it
> just came back from recovery. Callek?

Let's find a nearby garbage chute. Objections?
Depends on: 808437
No longer blocks: 438871
Whiteboard: [buildduty][capacity][buildslaves][orange] → [buildduty][capacity][buildslaves]
Ran ./stop_cp.sh
Blocks: 808468
Blocks: 813012
No longer blocks: 813012
Back in production
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
No jobs taken on this device for >= 7 weeks
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(mass change: filter on tegraCallek02reboot2013)

I just rebooted this device, hoping that many of the ones I'm doing tonight come back automatically. I'll check back tomorrow to see if it did; if it does not, I'll triage the next step manually on a per-device basis.

---
Command I used (with a manual patch to the fabric script to allow this command)

(fabric)[jwood@dev-master01 fabric]$  python manage_foopies.py -j15 -f devices.json `for i in 021 032 036 039 046  048 061 064 066 067 071 074 079 081 082 083 084 088 093 104 106 108 115 116 118 129 152 154 164 168 169 174 179 182 184 187 189 200 207 217 223 228 234 248 255 264 270 277 285 290 294 295 297 298 300 302 304 305 306 307 308 309 310 311 312 314 315 316 319 320 321 322 323 324 325 326 328 329 330 331 332 333 335 336 337 338 339 340 341 342 343 345 346 347 348 349 350 354 355 356 358 359 360 361 362 363 364 365 367 368 369; do echo '-D' tegra-$i; done` reboot_tegra

The command does the reboots one at a time, from the foopy each device is connected to, with one ssh connection per foopy.
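In case the backticked loop is opaque to the next person: it only expands each device number into a -D tegra-NNN argument pair before manage_foopies.py ever runs. A rough Python equivalent of that argument construction (truncated device list; invoking fabric itself is out of scope here):

# Rough equivalent of the backticked shell loop above: turn the device numbers
# into the "-D tegra-NNN" arguments that manage_foopies.py receives, then
# append the action. Only the argument construction is shown.
devices = ["021", "032", "036", "048"]  # truncated sample of the list above

args = ["python", "manage_foopies.py", "-j15", "-f", "devices.json"]
for num in devices:
    args += ["-D", "tegra-%s" % num]
args.append("reboot_tegra")

print(" ".join(args))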
Depends on: 838687
Had to cycle clientproxy and it's back up and happy.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
No longer blocks: 808468
Depends on: 808468
Hasn't run a job in 14 days, 4:37:08
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Back in production
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Ping check failing; PDU reboot didn't help.
Status: RESOLVED → REOPENED
Depends on: 944498
Resolution: FIXED → ---
Depends on: 949447
SD card replaced and reimaged/flashed.
Assignee: bugspam.Callek → nobody
Handled in the last recovery on Dec 16.
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
2014-01-16 10:35:34 tegra-048 p    online   active  OFFLINE :: error.flg [Automation Error: Unable to connect to device after 5 attempts] 

PDU reboot didn't help.
Status: RESOLVED → REOPENED
Depends on: 960642
Resolution: FIXED → ---
SD card reformatted and flashed.
Assignee: nobody → pmoore
Investigation has shown that the device is running, and so is the SUT agent, but the testroot command returns 'unable to determine test root', which causes the verify.py script to fail, which in turn causes watch_devices.sh to disable the buildbot slave to prevent it from taking jobs.

[cltbld@foopy112.tegra.releng.scl3.mozilla.com sut_tools]$ telnet tegra-048 20701
Trying 10.26.85.33...
Connected to tegra-048.
Escape character is '^]'.
$>testroot
##AGENT-WARNING##  unable to determine test root
$>quit
quit
$>Connection closed by foreign host.
[cltbld@foopy112.tegra.releng.scl3.mozilla.com sut_tools]$ 

I will now investigate why the test root cannot be determined...
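For future reference, the manual telnet check above is easy to script. Here is a minimal sketch of the same probe, assuming nothing about the SUT agent beyond what the session shows (port 20701, the testroot and quit commands, and the ##AGENT-WARNING## prefix); prompt and response framing is simplified:

# Minimal sketch of the manual check above: connect to the SUT agent on
# port 20701, ask for the test root, and treat an ##AGENT-WARNING## reply
# as the same failure verify.py trips over. Uses only the commands seen in
# the telnet session (testroot, quit); everything else is assumption.
import socket

def get_testroot(host, port=20701, timeout=30):
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.recv(1024)                      # swallow the initial "$>" prompt
        conn.sendall(b"testroot\n")
        reply = conn.recv(1024).decode("utf-8", "replace").strip()
        conn.sendall(b"quit\n")
    if "##AGENT-WARNING##" in reply:
        raise RuntimeError("agent could not determine test root: %s" % reply)
    return reply

print(get_testroot("tegra-048"))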
QA Contact: armenzg → bugspam.Callek
Since it wound up reenabled and running jobs, I'll wager any evidence of why it was busted back in February is now long gone.
Assignee: pmoore → nobody
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard