Bug 691073 - while running jsreftests on my local tegra, I get a crash
Status: RESOLVED FIXED
Whiteboard: [mobile_unittests][android_tier_1]
Keywords: intermittent-failure
Product: Core
Classification: Components
Component: JavaScript Engine
Version: Trunk
Platform: ARM Android
Importance: -- normal
Target Milestone: mozilla10
Assigned To: Marty Rosenberg [:mjrosenb]
Mentors:
Duplicates: 696754
Depends on:
Blocks: 438871

Reported: 2011-10-01 11:18 PDT by Joel Maher (:jmaher)
Modified: 2012-11-25 19:31 PST


Attachments
actually mask out the extra bit used for "double condition codes" (1.10 KB, patch)
2011-10-25 17:07 PDT, Marty Rosenberg [:mjrosenb]
dvander: review+

Description Joel Maher (:jmaher) 2011-10-01 11:18:35 PDT
Currently we run jsreftests in 2 chunks on our automation. If I run them all together in a single run on my local Tegra, I get a crash. I have tried a debug build of Fennec, but that crashes on startup.

Nothing in the logfile or logcat indicates what is happening.

I believe this is the cause of about half of the oranges we see on crashtest/jsreftest/reftest.
Comment 1 Joel Maher (:jmaher) 2011-10-03 07:33:55 PDT
ok, so I reproduced this with a working debug build and I found this:
http://people.mozilla.org/~jmaher/reftest_debug_crash.log
Comment 2 Joel Maher (:jmaher) 2011-10-03 14:18:00 PDT
Please let me know whether this stack dump is useful. I have reproduced this 3 separate times today, so something is definitely fishy here.
Comment 3 Jim Chen [:jchen] [:darchons] 2011-10-03 22:14:28 PDT
FWIW: 0xbbadbeef is a JS JIT assertion, and this particular assertion is at http://mxr.mozilla.org/mozilla-central/source/js/src/assembler/assembler/ARMAssembler.h#1286

Looks like a full instruction is passed to nameCC, but it only expects a condition code. Should be easy to see who is doing this once we get a full stack trace.
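
To illustrate the kind of check that can trip here, a minimal sketch, assuming the helper only accepts a bare ARM condition code. ARM encodes the condition in bits 31:28 of an instruction word, so a bare condition code has no low bits set. The functions `isPureConditionCode` and `conditionName` are hypothetical stand-ins, not the actual code in ARMAssembler.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a nameCC-style helper; not the actual Mozilla code.
// ARM keeps the condition in bits 31:28, so a bare condition code must have
// the low 28 bits clear.
inline bool isPureConditionCode(uint32_t value) {
    return (value & 0x0fffffffu) == 0;
}

inline const char* conditionName(uint32_t cc) {
    // Standard ARM condition mnemonics, indexed by the top four bits.
    static const char* const names[16] = {
        "eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
        "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"
    };
    // Fires when a full instruction (or a code with stray low bits) is passed.
    assert(isPureConditionCode(cc));
    return names[cc >> 28];
}
```

Under these assumptions, `conditionName(0x80000000u)` yields the HI mnemonic, while a full instruction word such as `0xE1A00000` (mov r0, r0) fails the check because its low 28 bits are not zero.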
Comment 4 Jim Chen [:jchen] [:darchons] 2011-10-03 22:20:57 PDT
(In reply to Jim Chen [:jchen] from comment #3)
> once we get a full stack trace.

Basically the current stack trace only has the top frame (the unwinder is not seeing libxul's text section for some reason?). A logcat stack trace would be good too.
Comment 5 Joel Maher (:jmaher) 2011-10-04 04:18:47 PDT
I reproduced this crash and captured logcat output, but logcat contains no useful data. Here is essentially what I see:
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
D/dalvikvm( 1441): GC_EXPLICIT freed 405 objects / 54912 bytes in 22ms
I/wpa_supplicant( 1349): CTRL-EVENT-STATE-CHANGE id=-1 state=2
V/WifiMonitor( 1021): Event [CTRL-EVENT-STATE-CHANGE id=-1 state=2]
V/WifiStateTracker( 1021): Changing supplicant state: INACTIVE ==> SCANNING
D/NetworkStateTracker( 1021): setDetailed state, old =IDLE and new state=SCANNING
D/ConnectivityService( 1021): Dropping ConnectivityChange for WIFI: DISCONNECTED/SCANNING
I/wpa_supplicant( 1349): CTRL-EVENT-SCAN-RESULTS  Ready
D/dalvikvm( 1441): GC_EXPLICIT freed 182 objects / 13416 bytes in 21ms
D/dalvikvm( 1441): GC_EXPLICIT freed 209 objects / 14784 bytes in 19ms
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/wpa_supplicant( 1349): CTRL-EVENT-STATE-CHANGE id=-1 state=1
V/WifiMonitor( 1021): Event [CTRL-EVENT-STATE-CHANGE id=-1 state=1]
V/WifiStateTracker( 1021): Changing supplicant state: SCANNING ==> INACTIVE
D/NetworkStateTracker( 1021): setDetailed state, old =SCANNING and new state=IDLE
D/ConnectivityService( 1021): Dropping ConnectivityChange for WIFI: DISCONNECTED/IDLE
D/dalvikvm( 1441): GC_EXPLICIT freed 423 objects / 55624 bytes in 26ms
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): AndroidBridge::GetDPI
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
I/Gecko   ( 6526): WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/m-cen-lnx-andrd-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1890
D/pull warning( 1441): more bytes read than expected
D/dalvikvm( 1441): GC_EXPLICIT freed 184 objects / 13504 bytes in 25ms
D/dalvikvm( 1441): GC_EXPLICIT freed 201 objects / 14424 bytes in 23ms
I/ActivityManager( 1021): Process org.mozilla.fennec (pid 6526) has died.
I/WindowManager( 1021): WIN DEATH: Window{445ce8a0 org.mozilla.fennec/org.mozilla.fennec.App paused=false}
I/WindowManager( 1021): WIN DEATH: Window{44532f30 SurfaceView paused=false}
D/Zygote  (  939): Process 6526 exited cleanly (1)
I/UsageStats( 1021): Unexpected resume of com.mozilla.SUTAgentAndroid while already resumed in org.mozilla.fennec

here is a link to the entire logcat session from the full run:
http://people.mozilla.org/~jmaher/reftest_dump/jsreftests.log

Is there a way to generate a logcat stack trace?
Comment 6 Joel Maher (:jmaher) 2011-10-04 12:09:39 PDT
I added some dump statements and turned on the default verbose ones. Here is what I get:

passing test:
REFTEST INFO | START http://192.168.1.109:9999/jsreftest/tests/jsreftest.html?test=ecma/String/15.5.4.7-1.js
REFTEST INFO | [CONTENT] 314: in recvloadscripttest, now StartTestURI with type_script
REFTEST INFO | [CONTENT] 314: in StartTestURI
REFTEST INFO | [CONTENT] 314: in StartTestURI, calling LoadURI
REFTEST INFO | [CONTENT] go to webnavigation and loaduri
REFTEST INFO | [CONTENT] OnDocumentLoad triggering AfterOnLoadScripts
REFTEST INFO | Initializing canvas snapshot
REFTEST INFO | [CONTENT] AfterOnLoadScripts belatedly entering WaitForTestEnd
REFTEST INFO | [CONTENT] WaitForTestEnd: Adding listeners
REFTEST INFO | Initializing canvas snapshot
REFTEST INFO | [CONTENT] AfterOnLoadScripts belatedly entering WaitForTestEnd
REFTEST INFO | [CONTENT] WaitForTestEnd: Adding listeners
REFTEST INFO | [CONTENT] MakeProgress: STATE_WAITING_TO_FIRE_INVALIDATE_EVENT
REFTEST INFO | [CONTENT] MakeProgress: dispatching MozReftestInvalidate
REFTEST INFO | [CONTENT] MakeProgress: STATE_WAITING_FOR_REFTEST_WAIT_REMOVAL
REFTEST INFO | [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
REFTEST INFO | [CONTENT] MakeProgress: Completed
REFTEST INFO | [CONTENT] RecordResult fired
REFTEST INFO | RecordResult fired


crashing test:
REFTEST TEST-START | http://192.168.1.109:9999/jsreftest/tests/jsreftest.html?test=ecma/String/15.5.4.7-2.js
REFTEST INFO | START http://192.168.1.109:9999/jsreftest/tests/jsreftest.html?test=ecma/String/15.5.4.7-2.js
314:  calling sendloadscripttest
314:  sent async message to load script test
REFTEST INFO | [CONTENT] 314: in recvloadscripttest, now StartTestURI with type_script
REFTEST INFO | [CONTENT] 314: in StartTestURI
REFTEST INFO | [CONTENT] 314: in StartTestURI, calling LoadURI
REFTEST INFO | [CONTENT] go to webnavigation and loaduri


Here is the code that outputs the OnDocumentLoad message (the next expected log statement, which is missing in the crash scenario):
http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/reftest-content.js#490
Comment 7 Doug Turner (:dougt) 2011-10-05 10:11:21 PDT
David, this sounds like it is blocking test automation on Android and/or ARM.
Comment 8 David Mandelin [:dmandelin] 2011-10-05 16:10:55 PDT
Joel, rumor has it this is blocking you on something. Is that correct? What do you need?
Comment 9 Joel Maher (:jmaher) 2011-10-05 18:15:03 PDT
:dmandelin, I am able to see fennec crash while running the jsreftests on our automation environment (android tegras).  This is pretty reliable and I suspect this crash is one of the more common oranges we encounter during our automation.

I am not sure whether anybody on the JavaScript team has access to a Tegra; I can help with that, as well as with setting up an environment to reproduce this problem.
Comment 10 Joel Maher (:jmaher) 2011-10-11 09:58:52 PDT
can we get some movement on this bug?
Comment 11 Marty Rosenberg [:mjrosenb] 2011-10-13 17:22:27 PDT
I've been working on this.  After not being able to run any tests on my phone, I'm attempting to run the tests with the fix for Bug 694241 in place.
Comment 12 Marty Rosenberg [:mjrosenb] 2011-10-14 13:04:20 PDT
OK, I have everything running, and have hit some assertion failures.
In a "how did this ever work" type of bug, the mask meant to strip the extra bit we ram into the condition codes explicitly does not cover that bit.

This causes us to generate instructions that we did not intend to. I am unsure whether this was causing the crash/failure, but it is a bug.

diff --git a/js/src/assembler/assembler/MacroAssemblerARM.h b/js/src/assembler/assembler/MacroAssemblerARM.h
--- a/js/src/assembler/assembler/MacroAssemblerARM.h
+++ b/js/src/assembler/assembler/MacroAssemblerARM.h
@@ -38,18 +38,18 @@
 
 #include "ARMAssembler.h"
 #include "AbstractMacroAssembler.h"
 
 namespace JSC {
 
 class MacroAssemblerARM : public AbstractMacroAssembler<ARMAssembler> {
     static const int DoubleConditionMask = 0x0f;
-    static const int DoubleConditionBitSpecial = 0x10;
-    COMPILE_ASSERT(!(DoubleConditionBitSpecial & DoubleConditionMask), DoubleConditionBitSpecial_should_not_interfere_with_ARMAssembler_Condition_codes);
+    static const int DoubleConditionBitSpecial = 0x8;
+    //COMPILE_ASSERT(!(DoubleConditionBitSpecial & DoubleConditionMask), DoubleConditionBitSpecial_should_not_interfere_with_ARMAssembler_Condition_codes);
 public:
     enum Condition {
         Equal = ARMAssembler::EQ,
         NotEqual = ARMAssembler::NE,
         Above = ARMAssembler::HI,
         AboveOrEqual = ARMAssembler::CS,
         Below = ARMAssembler::CC,
         BelowOrEqual = ARMAssembler::LS,
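
For illustration, a minimal sketch of the mask arithmetic behind this change. It assumes the marker is stripped with `cond & ~DoubleConditionMask` and that ARM condition codes sit in bits 31:28 (so NE is 0x10000000). The names `kNE`, `stripMarker`, and the other `k...` constants are hypothetical and are not taken from the tree.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: condition codes occupy bits 31:28 of the instruction word.
constexpr uint32_t kNE = 0x10000000u;               // stands in for ARMAssembler::NE
constexpr uint32_t kDoubleConditionMask = 0x0fu;    // same value as in the patch
constexpr uint32_t kOldSpecialBit = 0x10u;          // marker value before the patch
constexpr uint32_t kNewSpecialBit = 0x8u;           // marker value after the patch

// Hypothetical helper: clear the low marker bits before the value is used
// as a plain condition code.
constexpr uint32_t stripMarker(uint32_t cond) {
    return cond & ~kDoubleConditionMask;
}

// The old marker lies outside the mask, so it survives the strip and ends up
// in the emitted condition value.
static_assert(stripMarker(kNE | kOldSpecialBit) == (kNE | 0x10u),
              "old marker bit is not removed");

// The new marker lies inside the mask, so only the condition code remains.
static_assert(stripMarker(kNE | kNewSpecialBit) == kNE,
              "new marker bit is removed");
```

Under these assumptions, the old COMPILE_ASSERT required the marker to sit outside the mask, which is exactly what let it leak through; with the marker at 0x8, the assertion can no longer hold.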
Comment 13 Joel Maher (:jmaher) 2011-10-19 14:39:46 PDT
OK, I have run quite a few tests with this on a Tegra and a Samsung Galaxy tablet. I can't reproduce a failure on the Samsung tablet with this change. I still run into some problems on the Tegra, but instead of a 100% repro case, it now reproduces less than 50% of the time.

So let's get this checked in and see where we are!

Thanks for finding this problem.
Comment 14 Joel Maher (:jmaher) 2011-10-24 19:53:11 PDT
can we get this checked in?
Comment 15 Marty Rosenberg [:mjrosenb] 2011-10-25 17:07:08 PDT
Created attachment 569552
actually mask out the extra bit used for "double condition codes"
Comment 16 David Anderson [:dvander] 2011-10-25 17:19:49 PDT
Comment on attachment 569552
actually mask out the extra bit used for "double condition codes"

Review of attachment 569552:
-----------------------------------------------------------------

::: js/src/assembler/assembler/MacroAssemblerARM.h
@@ +43,5 @@
>  
>  class MacroAssemblerARM : public AbstractMacroAssembler<ARMAssembler> {
>      static const int DoubleConditionMask = 0x0f;
> +    static const int DoubleConditionBitSpecial = 0x8;
> +    //COMPILE_ASSERT(!(DoubleConditionBitSpecial & DoubleConditionMask), DoubleConditionBitSpecial_should_not_interfere_with_ARMAssembler_Condition_codes);

If the assert is no longer valid, just delete it.
Comment 17 Christian Holler (:decoder) 2011-10-25 17:37:38 PDT
*** Bug 696754 has been marked as a duplicate of this bug. ***
Comment 18 Marco Bonardo [::mak] 2011-10-27 01:46:14 PDT
https://hg.mozilla.org/mozilla-central/rev/e5bd32b653b8
