Closed Bug 977945 Opened 6 years ago Closed 6 years ago

Intermittent PROCESS-CRASH | java-exception | java.lang.StackOverflowError at java.lang.reflect.Constructor.constructNative(Native Method) | followed by testSessionHistory | GeckoEventExpecter - blockForEvent timeout: Gecko:Ready

Categories

(Firefox for Android :: Testing, defect)

ARM
Android
defect
Not set

Tracking

()

RESOLVED INVALID

People

(Reporter: KWierso, Unassigned)

References

Details

(Keywords: crash, intermittent-failure)

https://tbpl.mozilla.org/php/getParsedLog.php?id=35381912&tree=Mozilla-Inbound
slave: tegra-051



63 INFO TEST-PASS | testInputConnection | Can update composition - foobar should equal foobar
64 INFO TEST-PASS | testInputConnection | Can update composition - 6 should equal 6
65 INFO TEST-PASS | testInputConnection | Can update composition - 6 should equal 6
66 INFO TEST-PASS | testInputConnection | Can finish composition - foobar should equal foobar
67 INFO TEST-PASS | testInputConnection | Can finish composition - 6 should equal 6
68 INFO TEST-PASS | testInputConnection | Can finish composition - 6 should equal 6
69 INFO TEST-END | testInputConnection | finished in 9830ms
70 INFO TEST-START | Shutdown
71 INFO Passed: 67
72 INFO Failed: 0
73 INFO Todo: 0
74 INFO SimpleTest FINISHED
INFO | automation.py | Application ran for: 0:00:13.531426
INFO | zombiecheck | Reading PID log: /tmp/tmp_U_8Etpidlog
/data/anr/traces.txt not found
PROCESS-CRASH | java-exception | java.lang.StackOverflowError at java.lang.reflect.Constructor.constructNative(Native Method)
WARNING | leakcheck | refcount logging is off, so leaks can't be detected!
runtests.py | Running tests: end.
Mochi-Remote WARNING | cleaning up pidfile '/builds/tegra-051/test/../runtestsremote.pid' was unsuccessful from the test harness
Mochi-Remote ERROR   | runTests() exited with code 1
Mochi-Remote INFO    | 
--------- beginning of /dev/log/system
02-27 15:40:51.689 I/ActivityManager( 1021): Force stopping package org.mozilla.fennec uid=10033
02-27 15:40:51.709 I/ActivityManager( 1021): Start proc org.mozilla.fennec for added application org.mozilla.fennec: pid=7961 uid=10033 gids={3003, 1015, 1006}
02-27 15:40:51.886 I/ActivityManager( 1021): Starting activity: Intent { act=android.intent.action.MAIN flg=0x10000000 cmp=org.mozilla.fennec/.App (has extras) }
02-27 15:40:51.926 W/InputManagerService( 1021): Starting input on non-focused client com.android.internal.view.IInputMethodClient$Stub$Proxy@4838b968 (uid=10032 pid=1505)
--------- beginning of /dev/log/main
02-27 15:40:52.126 W/dalvikvm( 7961): VFY: unable to find class referenced in signature (Landroid/view/accessibility/AccessibilityNodeInfo;)
02-27 15:40:52.126 I/dalvikvm( 7961): Could not find method android.view.accessibility.AccessibilityEvent.setToIndex, referenced from method org.mozilla.gecko.GeckoAccessibility.populateEventFromJSON
02-27 15:40:52.126 W/dalvikvm( 7961): VFY: unable to resolve virtual method 4294: Landroid/view/accessibility/AccessibilityEvent;.setToIndex (I)V
02-27 15:40:52.126 I/dalvikvm( 7961): Could not find method android.view.accessibility.AccessibilityEvent.setMaxScrollX, referenced from method org.mozilla.gecko.GeckoAccessibility.populateEventFromJSON
02-27 15:40:52.126 W/dalvikvm( 7961): VFY: unable to resolve virtual method 4285: Landroid/view/accessibility/AccessibilityEvent;.setMaxScrollX (I)V
02-27 15:40:52.126 I/dalvikvm( 7961): Could not find method android.view.accessibility.AccessibilityNodeInfo.obtain, referenced from method org.mozilla.gecko.GeckoAccessibility.sendAccessibilityEvent
02-27 15:40:52.126 W/dalvikvm( 7961): VFY: unable to resolve static method 4334: Landroid/view/accessibility/AccessibilityNodeInfo;.obtain (Landroid/view/View;I)Landroid/view/accessibility/AccessibilityNodeInfo;
02-27 15:40:52.126 I/dalvikvm( 7961): Could not find method android.view.accessibility.AccessibilityNodeInfo.setEnabled, referenced from method org.mozilla.gecko.GeckoAccessibility.sendAccessibilityEvent
02-27 15:40:52.126 W/dalvikvm( 7961): VFY: unable to resolve virtual method 4347: Landroid/view/accessibility/AccessibilityNodeInfo;.setEnabled (Z)V
02-27 15:40:52.126 W/dalvikvm( 7961): Unable to resolve superclass of Lorg/mozilla/gecko/GeckoAccessibility$GeckoAccessibilityDelegate; (643)
02-27 15:40:52.126 W/dalvikvm( 7961): Link of class 'Lorg/mozilla/gecko/GeckoAccessibility$GeckoAccessibilityDelegate;' failed
02-27 15:40:52.126 E/dalvikvm( 7961): Could not find class 'org.mozilla.gecko.GeckoAccessibility$GeckoAccessibilityDelegate', referenced from method org.mozilla.gecko.GeckoAccessibility.setDelegate
02-27 15:40:52.126 W/dalvikvm( 7961): VFY: unable to resolve new-instance 1675 (Lorg/mozilla/gecko/GeckoAccessibility$GeckoAccessibilityDelegate;) in Lorg/mozilla/gecko/GeckoAccessibility;
02-27 15:40:52.126 W/GeckoEventDispatcher( 7961): unregisterEventListener: event 'Prompt:ShowTop' has no listeners
02-27 15:40:52.136 I/GeckoApp( 7961): Creating HealthRecorder.



LATER


Thread[GeckoBackgroundThread,5,main]
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:288)
android.os.MessageQueue.next(MessageQueue.java:146)
android.os.Looper.loop(Looper.java:110)
org.mozilla.gecko.util.GeckoBackgroundThread.run(GeckoBackgroundThread.java:32)

Thread[HeapWorker,5,system]
dalvik.system.NativeStart.run(Native Method)

Thread[Binder Thread #1,5,main]
dalvik.system.NativeStart.run(Native Method)


2 INFO TEST-UNEXPECTED-FAIL | testSessionHistory | GeckoEventExpecter - blockForEvent timeout: Gecko:Ready
3 INFO TEST-END | testSessionHistory | finished in 90087ms
4 INFO TEST-START | Shutdown
5 INFO Passed: 0
6 INFO Failed: 1
7 INFO Todo: 0
8 INFO SimpleTest FINISHED
INFO | automation.py | Application ran for: 0:01:33.456031
INFO | zombiecheck | Reading PID log: /tmp/tmpVirviipidlog
/data/anr/traces.txt not found
Automation Error: No crash directory (/mnt/sdcard/tests/profile/minidumps/) found on remote device
WARNING | leakcheck | refcount logging is off, so leaks can't be detected!
runtests.py | Running tests: end.
Mochi-Remote WARNING | cleaning up pidfile '/builds/tegra-051/test/../runtestsremote.pid' was unsuccessful from the test harness
Mochi-Remote ERROR   | runTests() exited with code 1
Mochi-Remote INFO    | 
--------- beginning of /dev/log/main
02-27 15:41:10.906 I/SUTAgentAndroid( 1505): 10.26.84.12 : mkdr /mnt/sdcard/tests/profile/extensions/staged/special-powers@mozilla.org/chrome/specialpowers/modules
02-27 15:41:10.936 I/SUTAgentAndroid( 1505): 10.26.84.12 : push /mnt/sdcard/tests/profile/extensions/staged/special-powers@mozilla.org/chrome/specialpowers/modules/MockFilePicker.jsm 7455
02-27 15:41:10.966 I/SUTAgentAndroid( 1505): 10.26.84.12 : push /mnt/sdcard/tests/profile/extensions/staged/special-powers@mozilla.org/chrome/specialpowers/modules/MockColorPicker.jsm 3923
02-27 15:41:10.996 I/SUTAgentAndroid( 1505): 10.26.84.12 : push /mnt/sdcard/tests/profile/extensions/staged/special-powers@mozilla.org/chrome/specialpowers/modules/MockPermissionPrompt.jsm 3053
02-27 15:41:11.026 I/SUTAgentAndroid( 1505): 10.26.84.12 : isdir /mnt/sdcard/tests/profile/extensions/staged/special-powers@mozilla.org/chrome/specialpowers/content
02-27 15:41:11.136 I/SUTAgentAndroid( 1505): 10.26.84.12 : isdir /mnt
02-27 15:41:11.156 I/SUTAgentAndroid( 1505): 10.26.84.12 : isdir /mnt/sdcard
02-27 15:41:11.176 I/SUTAgentAndroid( 1505): 10.26.84.12 : isdir /mnt/sdcard/tests
Bug 968409 landed testInputConnection just before this bug was created - I wonder if it isn't related.
The output in comment 0 and comment 17 appear to appear after the 14th assertion of testInputConnection. However, why then, does the test continue? Perhaps the output is not being returned in order, though it's curious that it occurred after assertion 14 each time.
02-27 15:41:00.679 W/dalvikvm( 1210): Stack overflow on call to interp (req=524320 top=0x430cc000 cur=0x430cbba4 size=12288 Lcom/android/inputmethod/latin/LatinKeyboardView;.<init>)
02-27 15:41:00.679 I/dalvikvm( 1210): threadid=1: stack overflow on call to Lcom/android/inputmethod/latin/LatinKeyboardView;.<init>:VLL

Perhaps the IME gets killed instead of Fennec?
Assertion 14 from comment 19 is:

02-27 15:41:00.439 I/Robocop ( 7961): 14 INFO TEST-PASS | testInputConnection | Given message occurred for registered event: {"errorType":"","bgColor":"transparent","type":"DOMContentLoaded","tabID":0} - DOMContentLoaded should equal DOMContentLoaded

It's unrelated to the IME and nothing specific to testInputConnection appears to run yet - the line of the assertion is [1], which is just loading the initial page. However, if the page has begun loading, we do focus the <input> element on the page, which should open the IME. If the IME is killed and restarted for this StackOverflowError, then it makes sense that the test could continue running.

[1]: https://mxr.mozilla.org/mozilla-central/source/mobile/android/base/tests/testInputConnection.java?rev=88184c55d335#22
(In reply to Michael Comella (:mcomella) from comment #21)
> However, if the page has begun loading, we do focus the
> <input> element on the page, which should open the IME.

Specifically, https://mxr.mozilla.org/mozilla-central/source/mobile/android/base/tests/robocop_input.html?rev=ca6fe7cea127&force=1#14
Jim, any ideas?
Flags: needinfo?(nchen)
(In reply to Michael Comella (:mcomella) from comment #44)
> Jim, any ideas?

From the logcat,

> 04-15 09:04:20.897 W/dalvikvm( 1211): Stack overflow on call to interp (req=524320 top=0x430cc000 cur=0x430cbba4 size=12288 Lcom/android/inputmethod/latin/LatinKeyboardView;.<init>)
> 04-15 09:04:20.897 I/dalvikvm( 1211): threadid=1: stack overflow on call to Lcom/android/inputmethod/latin/LatinKeyboardView;.<init>:VLL
> 04-15 09:04:20.897 I/dalvikvm( 1211):   method requires 262140+20+262140=524300 bytes, fp is 0x430cbb90 (10384 left)

I think it's a bug in Froyo dalvik. 262140 is a nonsensical value (4 * 0xFFFF).

I think the best we can do is to ignore this particular exception in the test harness.
Flags: needinfo?(nchen)
(In reply to Jim Chen [:jchen :nchen] from comment #46)
> I think the best we can do is to ignore this particular exception in the
> test harness.

gbrown, where can we hook in?
Flags: needinfo?(gbrown)
PROCESS-CRASH | java-exception | java.lang.StackOverflowError at java.lang.reflect.Constructor.constructNative(Native Method)

comes from mozcrash.check_for_java_exception -- http://hg.mozilla.org/mozilla-central/annotate/1f932e462b84/testing/mozbase/mozcrash/mozcrash/mozcrash.py#l164.

I originally introduced this functionality in bug 823452, scraping logcat for "REPORTING UNCAUGHT EXCEPTION", which in all likelihood would only come from Fennec. 

:edmorley expanded this in bug 872116 to also catch "FATAL EXCEPTION", which seemed reasonable at the time, but can come from other processes.

Now that I see this bug, where we are reporting exceptions that are not even thrown by Fennec, I wonder if checking for FATAL EXCEPTION is appropriate. Or could we still check for FATAL EXCEPTION but restrict this somehow to only report exceptions coming from Fennec?

:edmorley -- any thoughts?
Flags: needinfo?(gbrown) → needinfo?(emorley)
(In reply to Geoff Brown [:gbrown] from comment #48)
> Now that I see this bug, where we are reporting exceptions that are not even
> thrown by Fennec, I wonder if checking for FATAL EXCEPTION is appropriate.
> Or could we still check for FATAL EXCEPTION but restrict this somehow to
> only report exceptions coming from Fennec?
> 
> :edmorley -- any thoughts?

The regex was tweaked so we'd provide more parsable log output to pick up bug 870762 (and presumably more since then, but since the differentiation between "REPORTING UNCAUGHT EXCEPTION" and "FATAL EXCEPTION" isn't surfaced in the parsed line, I don't know for what other bugs). 

For the failure mode in this bug, what is the desired outcome? Should the tests have actually be marked as passing? I'm presuming not, given the other robocop failures in the log. Therefore I presume we're just saying that the java-exception log line is misleading and is noise that we're hoping to suppress? (My preferred next step depends on the answer to this :-))
Flags: needinfo?(emorley) → needinfo?(gbrown)
Yes, in the context of this bug, I think we're just saying that the java-exception log line is misleading and is noise that we're hoping to suppress. The other robocop failures would remain and the test job would still fail.

I think it is conceivable that in other circumstances, some system component could report FATAL EXCEPTION unrelated to Fennec or our tests and we would want to ignore it.

But then again, FATAL EXCEPTION checking has not caused us any other trouble that I can think of, and any component reporting a FATAL EXCEPTION during a test is an unusual, possibly interesting, possibly dangerous condition.
Flags: needinfo?(gbrown)
Jim, come to think of it, we never seem to crash on a lone StackOverflowError - there always appears to be a failure following it. Do you think there's any correlation here?
Flags: needinfo?(nchen)
(In reply to Michael Comella (:mcomella) from comment #52)
> Jim, come to think of it, we never seem to crash on a lone
> StackOverflowError - there always appears to be a failure following it. Do
> you think there's any correlation here?

Could be. Seems like the keyboard crash leaves the system in some kind of limbo state, and the next test gets affected by failing to initialize.
Flags: needinfo?(nchen)
This seems to only happen on tegra-051. Let's re-image that tegra and not worry about making any changes here (including the java exception handling that I was talking about).
(In reply to Geoff Brown [:gbrown] from comment #55)
> This seems to only happen on tegra-051. 

Oh yeah good spot! haha
Have disabled in slavealloc and commented in bug 778896.
Status: NEW → RESOLVED
Closed: 6 years ago
Depends on: tegra-051
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.