Closed Bug 691073 Opened 13 years ago Closed 13 years ago

while running jsreftests on my local tegra, I get a crash

Categories

(Core :: JavaScript Engine, defect)

ARM
Android
defect
Not set
normal

RESOLVED FIXED
mozilla10

People

(Reporter: jmaher, Assigned: mjrosenb)

Details

(Keywords: intermittent-failure, Whiteboard: [mobile_unittests][android_tier_1])

Attachments

(1 file)

Currently we run jsreftests in 2 chunks on our automation. If I run them all together in a single run on my local tegra, I get a crash. I tried a debug build of fennec, but that crashes on startup.

Nothing in the logfile or logcat indicates what is happening.

I believe this is the cause of about half of the oranges we see on crashtest/jsreftest/reftest.
OK, I reproduced this with a working debug build and found this:
http://people.mozilla.org/~jmaher/reftest_debug_crash.log
Please let me know whether this stack dump is useful. I have reproduced this 3 times today, so something is definitely fishy here.
FWIW: 0xbbadbeef is a JS JIT assertion, and this particular assertion is at http://mxr.mozilla.org/mozilla-central/source/js/src/assembler/assembler/ARMAssembler.h#1286

Looks like a full instruction is passed to nameCC, but it only expects a condition code. It should be easy to see who is doing this once we get a full stack trace.
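To illustrate the kind of check that would trip here, below is a minimal sketch, not the actual nameCC from ARMAssembler.h. It relies on the fact that ARM encodes the condition in bits 28..31 of an instruction word; the names isBareCondition, conditionName and checkConditionName are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstdint>

// ARM encodes the condition in bits 28..31 of every instruction word.
constexpr uint32_t kConditionFieldMask = 0xf0000000u;

// True only if `value` is a bare condition code, i.e. no bits are set
// outside the condition field.
constexpr bool isBareCondition(uint32_t value) {
    return (value & ~kConditionFieldMask) == 0;
}

// Hypothetical helper (not the real nameCC): returns the mnemonic suffix for
// a bare condition code, or nullptr if the caller passed something else,
// such as a complete instruction word.
inline const char* conditionName(uint32_t cc) {
    static const char* const names[16] = {
        "eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
        "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"
    };
    if (!isBareCondition(cc))
        return nullptr;
    return names[cc >> 28];
}

// Small self-check; call it from a test or from main().
inline void checkConditionName() {
    assert(conditionName(0x10000000u) != nullptr);  // NE is a bare condition
    assert(conditionName(0xe1a00000u) == nullptr);  // a full instruction word is rejected
}
```

The real code asserts instead of returning nullptr, which is presumably why a debug build dies with 0xbbadbeef; the sketch only shows why a value with any of the low 28 bits set cannot be a valid condition.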
(In reply to Jim Chen [:jchen] from comment #3)
> once we get a full stack trace.

Basically the current stack trace only has the top frame (the unwinder is not seeing libxul's text section for some reason?). A logcat stack trace would be good too.
I reproduced this crash and captured the logcat output, but there is no useful data in it. Here is essentially what I see:
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
D/dalvikvm( 1441): GC_EXPLICIT freed 405 objects / 54912 bytes in 22ms
I/wpa_supplicant( 1349): CTRL-EVENT-STATE-CHANGE id=-1 state=2
V/WifiMonitor( 1021): Event [CTRL-EVENT-STATE-CHANGE id=-1 state=2]
V/WifiStateTracker( 1021): Changing supplicant state: INACTIVE ==> SCANNING
D/NetworkStateTracker( 1021): setDetailed state, old =IDLE and new state=SCANNING
D/ConnectivityService( 1021): Dropping ConnectivityChange for WIFI: DISCONNECTED/SCANNING
I/wpa_supplicant( 1349): CTRL-EVENT-SCAN-RESULTS  Ready
D/dalvikvm( 1441): GC_EXPLICIT freed 182 objects / 13416 bytes in 21ms
D/dalvikvm( 1441): GC_EXPLICIT freed 209 objects / 14784 bytes in 19ms
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/wpa_supplicant( 1349): CTRL-EVENT-STATE-CHANGE id=-1 state=1
V/WifiMonitor( 1021): Event [CTRL-EVENT-STATE-CHANGE id=-1 state=1]
V/WifiStateTracker( 1021): Changing supplicant state: SCANNING ==> INACTIVE
D/NetworkStateTracker( 1021): setDetailed state, old =SCANNING and new state=IDLE
D/ConnectivityService( 1021): Dropping ConnectivityChange for WIFI: DISCONNECTED/IDLE
D/dalvikvm( 1441): GC_EXPLICIT freed 423 objects / 55624 bytes in 26ms
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
D/pull warning( 1441): more bytes read than expected
D/dalvikvm( 1441): GC_EXPLICIT freed 184 objects / 13504 bytes in 25ms
D/dalvikvm( 1441): GC_EXPLICIT freed 201 objects / 14424 bytes in 23ms
I/ActivityManager( 1021): Process org.mozilla.fennec (pid 6526) has died.
I/WindowManager( 1021): WIN DEATH: Window{445ce8a0 org.mozilla.fennec/org.mozilla.fennec.App paused=false}
I/WindowManager( 1021): WIN DEATH: Window{44532f30 SurfaceView paused=false}
D/Zygote  (  939): Process 6526 exited cleanly (1)
I/UsageStats( 1021): Unexpected resume of com.mozilla.SUTAgentAndroid while already resumed in org.mozilla.fennec

Here is a link to the entire logcat session from the full run:
http://people.mozilla.org/~jmaher/reftest_dump/jsreftests.log

Is there a way to generate a logcat stack trace?
I added some dump statements and turned on the default verbose ones. Here is what I get:

passing test:
REFTEST INFO | START http://192.168.1.109:9999/jsreftest/tests/jsreftest.html?test=ecma/String/15.5.4.7-1.js
REFTEST INFO | [CONTENT] 314: in recvloadscripttest, now StartTestURI with type_script
REFTEST INFO | [CONTENT] 314: in StartTestURI
REFTEST INFO | [CONTENT] 314: in StartTestURI, calling LoadURI
REFTEST INFO | [CONTENT] go to webnavigation and loaduri
REFTEST INFO | [CONTENT] OnDocumentLoad triggering AfterOnLoadScripts
REFTEST INFO | Initializing canvas snapshot
REFTEST INFO | [CONTENT] AfterOnLoadScripts belatedly entering WaitForTestEnd
REFTEST INFO | [CONTENT] WaitForTestEnd: Adding listeners
REFTEST INFO | Initializing canvas snapshot
REFTEST INFO | [CONTENT] AfterOnLoadScripts belatedly entering WaitForTestEnd
REFTEST INFO | [CONTENT] WaitForTestEnd: Adding listeners
REFTEST INFO | [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | [CONTENT] MakeProgress: dispatching MozReftestInvalidate
REFTEST INFO | [CONTENT] MakeProgress: STATE_WAITING_FOR_REFTEST_WAIT_REMOVAL
REFTEST INFO | [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
REFTEST INFO | [CONTENT] MakeProgress: Completed
REFTEST INFO | [CONTENT] RecordResult fired
REFTEST INFO | RecordResult fired


crashing test:
REFTEST TEST-START | http://192.168.1.109:9999/jsreftest/tests/jsreftest.html?test=ecma/String/15.5.4.7-2.js
REFTEST INFO | START http://192.168.1.109:9999/jsreftest/tests/jsreftest.html?test=ecma/String/15.5.4.7-2.js
314:  calling sendloadscripttest
314:  sent async message to load script test
REFTEST INFO | [CONTENT] 314: in recvloadscripttest, now StartTestURI with type_script
REFTEST INFO | [CONTENT] 314: in StartTestURI
REFTEST INFO | [CONTENT] 314: in StartTestURI, calling LoadURI
REFTEST INFO | [CONTENT] go to webnavigation and loaduri


Here is the code that outputs the OnDocumentLoad message, which is the first log statement missing in the crash scenario:
http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/reftest-content.js#490
Product: Fennec → Core
QA Contact: general → general
Assignee: nobody → general
Component: General → JavaScript Engine
QA Contact: general → general
David, this sounds like it is blocking test automation on Android and/or ARM.
Joel, rumor has it this is blocking you on something. Is that correct? What do you need?
:dmandelin, I am able to see fennec crash while running the jsreftests on our automation environment (android tegras).  This is pretty reliable and I suspect this crash is one of the more common oranges we encounter during our automation.

I am not sure whether anybody on the JavaScript team has access to a tegra. I can help with that, as well as with setting up an environment to reproduce this problem.
Blocks: 438871
can we get some movement on this bug?
Assignee: general → mrosenberg
I've been working on this.  After not being able to run any tests on my phone, I'm attempting to run the tests with the fix for Bug 694241 in place.
OK, I have everything running, and have gotten some assertion failures.
In a "how did this ever work" type of bug, the mask meant to strip the extra bit we ram into the condition codes explicitly does not cover that bit.

This causes us to generate instructions that we did not intend to. I am unsure whether this was causing the crash/failure, but it is a bug.

diff --git a/js/src/assembler/assembler/MacroAssemblerARM.h b/js/src/assembler/assembler/MacroAssemblerARM.h
--- a/js/src/assembler/assembler/MacroAssemblerARM.h
+++ b/js/src/assembler/assembler/MacroAssemblerARM.h
@@ -38,18 +38,18 @@
 
 #include "ARMAssembler.h"
 #include "AbstractMacroAssembler.h"
 
 namespace JSC {
 
 class MacroAssemblerARM : public AbstractMacroAssembler<ARMAssembler> {
     static const int DoubleConditionMask = 0x0f;
-    static const int DoubleConditionBitSpecial = 0x10;
-    COMPILE_ASSERT(!(DoubleConditionBitSpecial & DoubleConditionMask), DoubleConditionBitSpecial_should_not_interfere_with_ARMAssembler_Condition_codes);
+    static const int DoubleConditionBitSpecial = 0x8;
+    //COMPILE_ASSERT(!(DoubleConditionBitSpecial & DoubleConditionMask), DoubleConditionBitSpecial_should_not_interfere_with_ARMAssembler_Condition_codes);
 public:
     enum Condition {
         Equal = ARMAssembler::EQ,
         NotEqual = ARMAssembler::NE,
         Above = ARMAssembler::HI,
         AboveOrEqual = ARMAssembler::CS,
         Below = ARMAssembler::CC,
         BelowOrEqual = ARMAssembler::LS,
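The effect of the constant change can be checked with a small standalone sketch. It assumes the marker bit is stripped with `cond & ~DoubleConditionMask` before the condition is handed to the assembler; that call site is not part of the hunk above, so treat it as an assumption. The k-prefixed names and the helpers stripMarker and isBareCondition are hypothetical; the constant values are the ones from the patch, and 0x10000000 is the ARM encoding of the NE condition.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kConditionFieldMask = 0xf0000000u;  // ARM condition field, bits 28..31
constexpr uint32_t kNE = 0x10000000u;                   // ARM "not equal" condition

constexpr uint32_t kDoubleConditionMask = 0x0f;  // unchanged by the patch
constexpr uint32_t kOldSpecialBit = 0x10;        // value before the patch
constexpr uint32_t kNewSpecialBit = 0x8;         // value after the patch

// Assumed usage: clear the marker bits before emitting the condition.
constexpr uint32_t stripMarker(uint32_t cond) {
    return cond & ~kDoubleConditionMask;
}

// A bare condition has no bits set outside the condition field.
constexpr bool isBareCondition(uint32_t value) {
    return (value & ~kConditionFieldMask) == 0;
}

// Old value: 0x10 lies outside the 0x0f mask, so it survives stripping and
// ends up in the emitted instruction word.
static_assert(stripMarker(kNE | kOldSpecialBit) == (kNE | kOldSpecialBit),
              "old marker bit is not removed by the mask");
static_assert(!isBareCondition(stripMarker(kNE | kOldSpecialBit)),
              "result is no longer a bare condition code");

// New value: 0x8 lies inside the mask, so stripping restores the bare condition.
static_assert(stripMarker(kNE | kNewSpecialBit) == kNE,
              "new marker bit is removed by the mask");
```

Under that assumption, the original COMPILE_ASSERT demanded exactly the wrong property: it required the marker bit to lie outside the mask. A check in the opposite direction, that the marker bit is contained in the mask, would express the intent of the fix.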
OK, I have done quite a few tests with this on a tegra and a Samsung Galaxy tablet. I can't reproduce a failure on the Samsung tablet with this change. I still run into some problems on the tegra, but instead of a 100% repro case it now reproduces less than 50% of the time.

So let's get this checked in and see where we are!

Thanks for finding this problem.
can we get this checked in?
Comment on attachment 569552 [details] [diff] [review]
actually mask out the extra bit used for "double condition codes"

Review of attachment 569552 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/assembler/assembler/MacroAssemblerARM.h
@@ +43,5 @@
>  
>  class MacroAssemblerARM : public AbstractMacroAssembler<ARMAssembler> {
>      static const int DoubleConditionMask = 0x0f;
> +    static const int DoubleConditionBitSpecial = 0x8;
> +    //COMPILE_ASSERT(!(DoubleConditionBitSpecial & DoubleConditionMask), DoubleConditionBitSpecial_should_not_interfere_with_ARMAssembler_Condition_codes);

If the assert is no longer valid, just delete it.
Attachment #569552 - Flags: review?(dvander) → review+
https://hg.mozilla.org/mozilla-central/rev/e5bd32b653b8
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Whiteboard: [orange][mobile_unittests][android_tier_1] → [mobile_unittests][android_tier_1]