Closed Bug 610925 Opened 14 years ago Closed 13 years ago

Very frequent "unrecognized output format" running Talos tpan tsvg tzoom tp4m or twinopen on Android Tegra 250s

Categories

(Testing :: Talos, defect)

Platform: ARM / Android
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: jmaher)

References

Details

(Keywords: intermittent-failure, Whiteboard: [android][talos][android_tier_1])

Not sure whether to blame this on releng or Testing: Talos...

http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1289362164.1289362448.32300.gz
Android Tegra 250 tracemonkey talos remote-tpan on 2010/11/09 20:09:24 

tegra-003: 
		Started Tue, 09 Nov 2010 20:10:08
Running test tpan: 
		Started Tue, 09 Nov 2010 20:10:08
reconnecting socket
FIRE PROC: 'org.mozilla.fennec  -profile /mnt/sdcard/tests/profile http://bm-remote.build.mozilla.org/getInfo.html'
reconnecting socket
FIRE PROC: 'org.mozilla.fennec  -profile /mnt/sdcard/tests/profile http://bm-remote.build.mozilla.org/startup_test/fennecmark/fennecmark.html?test=PanDown%26webServer=bm-remote.build.mozilla.org'
reconnecting socket
pushing directory: /tmp/tmpzx9Gky/profile to /mnt/sdcard/tests/profile
	Screen width/height:0/0
	colorDepth:24
	Browser inner width/height: 1366/743

NOISE: __startSecondTimestamp1289362415796__endSecondTimestamp
Failed tpan: 
		Stopped Tue, 09 Nov 2010 20:14:05
FAIL: Busted: tpan
FAIL: unrecognized output format
Completed test tpan: 
		Stopped Tue, 09 Nov 2010 20:14:05
(In reply to comment #0)
> Not sure whether to blame this on releng or Testing: Talos...
> 

I'm working with jmaher to find out which side of the fence the issue is on.
I sometimes see similar issues on the N900s, mainly on the older codepaths.
When I've seen this it's been due to using older versions of the pageloader.
Sheesh. I really need to learn to look at more than just the current orange.

tpan hasn't ever succeeded, TraceMonkey or Mobile.

Hidden on TraceMonkey.
No longer blocks: 438871
Summary: Intermittent "FAIL: unrecognized output format" running talos on Tegra 250s → Talos tpan on Tegra 250s has never worked
Whiteboard: [orange]
there are two pieces here:
1) the builds are static on 10/26
2) there is a bug in fennecmark that seems to be overwriting the log file (I have a pending hack for it)

Before this was turned on, bear and I had seen this pass dozens of times on device (me in a standalone environment, bear in an end-to-end staging environment).
Assignee: nobody → bear
Whiteboard: [android][talos]
Got better at some point, unhidden on TM.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
I suspect the big batch of these today is because the build isn't working.
Mobile appears to not have a working build, producing "initialization timed out" on every flavor of Talos, but that big bunch are TraceMonkey, from Tuesday to today, on runs where only tpan failed.
(In reply to comment #6)
> When I've seen this it's been due to using older versions of the pageloader.

An |hg ident| in /builds/talos-data/pageloader@mozilla.org/ on foopy0{2,3,4} gives me 31249cbe4f19, which is tip of hg.mozilla.org/build/pageloader .

Looking at the history of these, it's an intermittent orange -- less than 50% of runs, but definitely a common occurrence.
Looking at the logs, it appears that the browser is dumping some info:

reconnecting socket
FIRE PROC: ' "MOZ_CRASHREPORTER_SHUTDOWN=1,MOZ_CRASHREPORTER_NO_REPORT=1,NO_EM_RESTART=1" org.mozilla.fennec  -profile /mnt/sdcard/tests/profile http://bm-remote.build.mozilla.org/getInfo.html'
reconnecting socket
FIRE PROC: ' "MOZ_CRASHREPORTER_SHUTDOWN=1,MOZ_CRASHREPORTER_NO_REPORT=1,NO_EM_RESTART=1" org.mozilla.fennec  -profile /mnt/sdcard/tests/profile http://bm-remote.build.mozilla.org/startup_test/fennecmark/fennecmark.html?test=PanDown%26webServer=bm-remote.build.mozilla.org'
reconnecting socket
pushing directory: /tmp/tmptt9SmZ/profile to /mnt/sdcard/tests/profile


This may be interfering with talos' ability to pick out the data it wants from the browser - causing the unrecognized output error.
Hm, looking closer, it gets past the extra dump from the browser successfully.

It chokes because the test is only dumping the second piece of information that talos requires - the shutdown timestamp.  There appears to be no information at all provided by the browser after the first cycle of the test.
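The check described above can be sketched in a few lines. This is only an illustration, not Talos' actual parser: the `__startSecondTimestamp…__endSecondTimestamp` marker is taken from the log in comment 0, while the `__start_report…__end_report` pair and the helper name `missing_pieces` are assumptions made for the example.

```python
import re

# Seen in the log in comment 0: the shutdown timestamp marker.
SECOND_TIMESTAMP = re.compile(r"__startSecondTimestamp(\d+)__endSecondTimestamp")
# Assumption for illustration: a marker pair wrapping the test's own results.
REPORT = re.compile(r"__start_report(.*?)__end_report", re.DOTALL)


def missing_pieces(browser_output):
    """Return the names of the required pieces absent from the output."""
    missing = []
    if REPORT.search(browser_output) is None:
        missing.append("report")
    if SECOND_TIMESTAMP.search(browser_output) is None:
        missing.append("shutdown timestamp")
    return missing


# The only data line in the failing tpan log from comment 0.
failing_log = "NOISE: __startSecondTimestamp1289362415796__endSecondTimestamp\n"
print(missing_pieces(failing_log))
```

On the failing log only the shutdown timestamp is found, which matches the diagnosis that the test itself reported nothing.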
the test is failing to run.  It could be for many reasons, but if the test isn't outputting the expected stdout, we get this error.  Since this passes sometimes (likewise tzoom and tp4) it falls into the random orange category.

My thoughts are the browser hangs either early on or part way through the test.  It could be an exception internal to the browser or a lack of memory.
Depends on: 649215
Whiteboard: [android][talos] → [android][talos][orange]
Blocks: 438871
Philor,

marking this as fixed since it doesn't appear to have happened in over a week, despite all the Talos and Tracemonkey activity.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 13 years ago
Resolution: --- → FIXED
I'm going to risk the wrath of Joel and punt this one over the fence to the ateam: the logs show that the test is running, that it's the framework reporting the error, and that the test then finishes and the tegra reboots.

If there is something that I can do, then I need a cluestick reminder
Assignee: bear → nobody
Component: Release Engineering → New Frameworks
Product: mozilla.org → Testing
QA Contact: release → new-frameworks
Version: other → unspecified
running this locally, I see that fennec is launched, but appears to crash prematurely and it outputs nothing to the log file (it does create the log file though).

In watching logcat, I see this when we fail:
I/GeckoAppJava( 3493): GeckoAppShell.alertsProgressListener_OnCancel('addons'
D/dalvikvm( 1488): GC_EXPLICIT freed 396 objects / 60424 bytes in 19ms
.
:
I/ActivityManager( 1027): Process org.mozilla.fennec_aurora (pid 3493) has died.
I/WindowManager( 1027): WIN DEATH: Window{444de290 org.mozilla.fennec_aurora/org.mozilla.fennec_aurora.App paused=false}
I/WindowManager( 1027): WIN DEATH: Window{444f0cc8 SurfaceView paused=false}
D/Zygote  (  939): Process 3493 exited cleanly (1)


and I see this when the test is running fine:
I/GeckoAppJava( 3493): GeckoAppShell.alertsProgressListener_OnCancel('addons'
D/dalvikvm( 1488): GC_EXPLICIT freed 388 objects / 60064 bytes in 51ms
.
:
I/GeckoAppJava( 4080): XRE exited
I/GeckoAppJava( 4080): we're done, good bye
I/GeckoApp( 4080): pause
E/JavaBinder( 4080): Unknown binder error code. 0xfffffff7
I/AndroidRuntime( 4080): AndroidRuntime onExit calling exit(0)
I/ActivityManager( 1027): Process org.mozilla.fennec_aurora (pid 4080) has died.
I/WindowManager( 1027): WIN DEATH: Window{44587200 org.mozilla.fennec_aurora/org.mozilla.fennec_aurora.App paused=false}
I/WindowManager( 1027): WIN DEATH: Window{445887f0 SurfaceView paused=false}


notice the extra GeckoAppJava messages.  Also, this doesn't leave the tegra in an unusable state, I am able to run other tests on it successfully without a reboot.
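The difference between the two logcat excerpts can be checked mechanically. A rough sketch, assuming logcat's brief format (`PRIORITY/TAG( PID): message`) as it appears in the excerpts; the function name `exited_cleanly` and the shortened sample lists are hypothetical and not part of Talos.

```python
import re

DIED = re.compile(r"Process (\S+) \(pid (\d+)\) has died")
TAG_PID = re.compile(r"^[A-Z]/(\w+)\s*\(\s*(\d+)\):\s*(.*)$")


def exited_cleanly(logcat_lines, package):
    """True if the process that died logged 'XRE exited' first, False if it
    died without it, None if no death record for the package was found."""
    xre_exited_pids = set()
    for line in logcat_lines:
        m = TAG_PID.match(line)
        if not m:
            continue  # skip elision lines such as "." and ":"
        tag, pid, message = m.groups()
        if tag == "GeckoAppJava" and message.strip() == "XRE exited":
            xre_exited_pids.add(pid)
        died = DIED.search(message)
        if died and died.group(1) == package:
            return died.group(2) in xre_exited_pids
    return None


# Shortened copies of the two excerpts above.
FAILING = [
    "D/dalvikvm( 1488): GC_EXPLICIT freed 396 objects / 60424 bytes in 19ms",
    "I/ActivityManager( 1027): Process org.mozilla.fennec_aurora (pid 3493) has died.",
]
PASSING = [
    "I/GeckoAppJava( 4080): XRE exited",
    "I/GeckoAppJava( 4080): we're done, good bye",
    "I/ActivityManager( 1027): Process org.mozilla.fennec_aurora (pid 4080) has died.",
]

print(exited_cleanly(FAILING, "org.mozilla.fennec_aurora"))
print(exited_cleanly(PASSING, "org.mozilla.fennec_aurora"))
```

Matching on the pid keeps an "XRE exited" line from an earlier run from being credited to a later process that died prematurely.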
Is the premature crash a Fennec bug?
it is hard to tell, it could be a side effect of the way fennecmark works.
in running this test many times locally, I can sometimes get it to pass 10 times in a row and other times get it to fail 5 times in a row.  There is no additional information from logcat and no information written to the log file (although I know the file was truncated, so fennec did launch and get part of the way through fennecmark).

I need to spend an hour or two and see if fennecmark is causing the problem or if fennec is.  Anybody can chime in with ideas!
No longer depends on: 649215
Depends on: 649215
Blocks: 661896
Depends on: 662936
Dumping all the flavors of Talos into here per bug 650650 comment 1.
Summary: Very frequent "Unrecognized output format" running Talos tpan on Tegra 250s → Very frequent "Unrecognized output format" running Talos tpan tsvg tzoom tp4m or twinopen on Android Tegra 250s
Summary: Very frequent "Unrecognized output format" running Talos tpan tsvg tzoom tp4m or twinopen on Android Tegra 250s → Very frequent "unrecognized output format" running Talos tpan tsvg tzoom tp4m or twinopen on Android Tegra 250s
Status: REOPENED → NEW
Component: New Frameworks → Talos
QA Contact: new-frameworks → talos
Joel: any progress since 20 May? As you can see from the comments, this is still happening in production frequently.
John: see bug 662936 for the work we are doing on this bug.
OK, even I had to get bored with the copy-pasting eventually. Just assume this still happens constantly.
Joel, can we make run_tests.py dump out the output it's not recognizing so we can get more info for this?
the unrecognized output error is because there is no output.  Basically talos didn't match any of the regexes on a blank string.
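One way the error message could separate "no output at all" from "output that matched nothing" is sketched here. It is not code from run_tests.py: `describe_failure` and its `tail_chars` parameter are hypothetical, and the single pattern stands in for whatever set of regexes the framework really tries.

```python
import re

# Stand-in for the framework's patterns; this marker appears in comment 0.
PATTERNS = [re.compile(r"__startSecondTimestamp(\d+)__endSecondTimestamp")]


def describe_failure(output, tail_chars=200):
    """Return an error message explaining why nothing was recognized,
    or None if at least one pattern matched."""
    if not output.strip():
        return "unrecognized output format: browser produced no output"
    if not any(p.search(output) for p in PATTERNS):
        tail = output[-tail_chars:]
        return "unrecognized output format: no pattern matched %r" % tail
    return None


print(describe_failure(""))
print(describe_failure("some unexpected text"))
```

With a message like the first one, an empty log file would be visible in the tinderbox output without anyone having to rerun the test locally.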
tp4m is practically permaorange atm, I can hardly find green runs. Is someone actively working on this issue?
we are unable to find the crash because the crashdump goes into libc and we don't have symbols for that.
Depends on: 675750
Whiteboard: [android][talos][orange] → [android][talos][orange][android_tier_1]
Version: unspecified → Trunk
Assignee: nobody → jmaher
http://tbpl.allizom.org/php/getParsedLog.php?id=6042637

(first one I've seen, or at least noticed, since the rm /etc/hosts)
I think this bug is resolved by bug 662936 (removing the /etc/hosts file).  Only one comment in here regarding a similar failure.  I cannot get to the log right now, but I think we are in the clear with this bug.  

I think any new crashes should be filed as new bugs as the root problems and affected tests will be different than what this has historically been.

Any objections to marking this as resolved?
Status: NEW → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Whiteboard: [android][talos][orange][android_tier_1] → [android][talos][android_tier_1]