Closed Bug 687147 Opened 8 years ago Closed 8 years ago

fennec is crashing while running reftests and mochitests

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Firefox 9

People

(Reporter: jmaher, Assigned: glandium)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [mobile_unittests][android_tier_1][inbound][QA?])

Attachments

(3 files)

While running reftests and mochitests we would run into a situation where we would launch fennec and our scripts would fail because fennec was not in the process list.  This has been an increasing problem on the automation infrastructure and we were able to debug this here and found this stack trace via logcat:

I/ActivityManager( 1017): Starting activity: Intent { act=android.intent.action.MAIN flg=0x10000000 pkg=org.mozilla.fennec cmp=org.mozilla.fennec/.App (has extras) }
I/ActivityManager( 1017): Start proc org.mozilla.fennec for activity org.mozilla.fennec/.App: pid=4393 uid=10033 gids={3003, 1015}
D/dalvikvm(  938): GC_EXPLICIT freed 301 objects / 11408 bytes in 33ms
D/dalvikvm( 1428): GC_EXPLICIT freed 1225 objects / 123272 bytes in 51ms
D/dalvikvm(  938): GC_EXPLICIT freed 57 objects / 2528 bytes in 29ms
D/dalvikvm(  938): GC_EXPLICIT freed 2 objects / 48 bytes in 27ms
D/dalvikvm( 1428): GC_EXPLICIT freed 347 objects / 53536 bytes in 40ms
D/dalvikvm( 4393): GC_FOR_MALLOC freed 2070 objects / 152800 bytes in 26ms
I/GeckoApp( 4393): create
I/GeckoApp( 4393): Checking for an update
I/GeckoApp( 4393): start
I/GeckoApp( 4393): resume
D/dalvikvm( 4393): Trying to load lib /data/data/org.mozilla.fennec/lib/libmozutils.so 0x442ca338
D/dalvikvm( 4393): Added shared lib /data/data/org.mozilla.fennec/lib/libmozutils.so 0x442ca338
D/dalvikvm( 4393): No JNI_OnLoad found in /data/data/org.mozilla.fennec/lib/libmozutils.so 0x442ca338, skipping init
I/GeckoAppShell( 4393): env0: ANDROID_SOCKET_zygote=10
I/GeckoAppShell( 4393): env1: ANDROID_BOOTLOGO=1
I/GeckoAppShell( 4393): env2: EXTERNAL_STORAGE=/mnt/sdcard
I/GeckoAppShell( 4393): env3: ANDROID_ASSETS=/system/app
I/GeckoAppShell( 4393): env4: ASEC_MOUNTPOINT=/mnt/asec
I/GeckoAppShell( 4393): env5: PATH=/sbin:/system/sbin:/system/bin:/system/xbin
I/GeckoAppShell( 4393): env6: ANDROID_DATA=/data
I/GeckoAppShell( 4393): env7: BOOTCLASSPATH=/system/framework/core.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/com.nvidia.graphics.jar
I/GeckoAppShell( 4393): env8: ANDROID_PROPERTY_WORKSPACE=9,32768
I/GeckoAppShell( 4393): env9: ANDROID_ROOT=/system
I/GeckoAppShell( 4393): env10: LD_LIBRARY_PATH=/system/lib
I/GeckoAppShell( 4393): env11: MOZ_CRASHREPORTER_NO_REPORT=1
I/GeckoAppShell( 4393): env12: MOZ_CRASHREPORTER=1
I/GeckoAppShell( 4393): env13: XPCOM_MEM_BLOAT_LOG=/tmp/tmpJNuzc2/runreftest_leaks.log
I/GeckoAppShell( 4393): env14: XPCOM_DEBUG_BREAK=stack
I/GeckoAppShell( 4393): env15: MOZ_PROCESS_LOG=/tmp/tmpkzILk_pidlog
I/GeckoAppShell( 4393): env16: NO_EM_RESTART=1
I/GeckoAppShell( 4393): env17: null
W/InputManagerService( 1017): Starting input on non-focused client com.android.internal.view.IInputMethodClient$Stub$Proxy@44489d78 (uid=10031 pid=1428)
I/GeckoSurfaceView( 4393): surface created
I/DEBUG   (  936): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG   (  936): Build fingerprint: 'nvidia/harmony/harmony/harmony:2.2/FRF91/20101105.183900:eng/test-keys'
I/DEBUG   (  936): pid: 4393, tid: 4401  >>> org.mozilla.fennec <<<
I/DEBUG   (  936): signal 4 (SIGILL), fault addr 70000940
I/DEBUG   (  936):  r0 70001ae0  r1 00000001  r2 70000921  r3 70000923
I/DEBUG   (  936):  r4 70001ae4  r5 00000001  r6 00000004  r7 00000000
I/DEBUG   (  936):  r8 428ef000  r9 80c0d40e  10 00000f6c  fp 80c0d40e
I/DEBUG   (  936):  ip 80c0e12c  sp 4738f720  lr 80c0745b  pc 70000940  cpsr 00000030
I/DEBUG   (  936):  d0  0000007442e80000  d1  3ff0000042e80000
I/DEBUG   (  936):  d2  0000000050baf6de  d3  0000000000000000
I/DEBUG   (  936):  d4  00000011000000b4  d5  3fe999999999999a
I/DEBUG   (  936):  d6  3fe8000000000000  d7  3eaaaaab3f800000
I/DEBUG   (  936):  d8  0000000000000000  d9  0000000000000000
I/DEBUG   (  936):  d10 0000000000000000  d11 0000000000000000
I/DEBUG   (  936):  d12 0000000000000000  d13 0000000000000000
I/DEBUG   (  936):  d14 0000000000000000  d15 0000000000000000
I/DEBUG   (  936):  scr 60000012
I/DEBUG   (  936): 
I/GeckoSurfaceView( 4393): surfaceChanged: fmt: -1 dim: 1680 1025
I/ActivityManager( 1017): Displayed activity org.mozilla.fennec/.App: 382 ms (total 382 ms)
I/DEBUG   (  936):          #00  pc 00000940  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):          #01  lr 80c0745b  /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG   (  936): 
I/DEBUG   (  936): code around pc:
I/DEBUG   (  936): 70000920 bf004770 b108b510 efa0f7ff bf00bd10 
I/DEBUG   (  936): 70000930 f7ffb510 bd10efa8 4604b510 f7ff4620 
I/DEBUG   (  936): 70000940 b910efa2 f8a4f000 bd10e7f8 f7ffb510 
I/DEBUG   (  936): 70000950 bd10efbe 4605b570 4628460c f7ff4621 
I/DEBUG   (  936): 70000960 4606efb6 f7ffb938 6803ef88 d0022b16 
I/DEBUG   (  936): 
I/DEBUG   (  936): code around lr:
I/DEBUG   (  936): 80c07438 2301b90a 1e4be004 0483eb00 33fff04f 
I/DEBUG   (  936): 80c07448 460d009e 6822e006 1e5319a4 d8003303 
I/DEBUG   (  936): 80c07458 3d014790 dcf62d00 bf00bd70 4b19b51f 
I/DEBUG   (  936): 80c07468 22004601 681b447b d005428b 00a4f8d3 
I/DEBUG   (  936): 80c07478 b110461a e7f74603 4813b989 143df240 
I/DEBUG   (  936): 
I/DEBUG   (  936): stack:
I/DEBUG   (  936):     4738f6e0  70000add  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     4738f6e4  00000002  
I/DEBUG   (  936):     4738f6e8  700007d0  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     4738f6ec  80c0f920  
I/DEBUG   (  936):     4738f6f0  70000218  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     4738f6f4  70000538  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     4738f6f8  00000002  
I/DEBUG   (  936):     4738f6fc  80c0fc28  
I/DEBUG   (  936):     4738f700  00005368  
I/DEBUG   (  936):     4738f704  80c0c984  /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG   (  936):     4738f708  0000001e  
I/DEBUG   (  936):     4738f70c  80c0fc54  
I/DEBUG   (  936):     4738f710  70001b9c  
I/DEBUG   (  936):     4738f714  0000001e  
I/DEBUG   (  936):     4738f718  df002777  
I/DEBUG   (  936):     4738f71c  e3a070ad  
I/DEBUG   (  936): #00 4738f720  80c0fc54  
I/DEBUG   (  936):     4738f724  00002731  
I/DEBUG   (  936):     4738f728  0000001e  
I/DEBUG   (  936):     4738f72c  80c08807  /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG   (  936):     4738f730  80c0fc54  
I/DEBUG   (  936):     4738f734  00000004  
I/DEBUG   (  936):     4738f738  0000001e  
I/DEBUG   (  936):     4738f73c  00000000  
I/DEBUG   (  936):     4738f740  428ef000  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     4738f744  80c0fc54  
I/DEBUG   (  936):     4738f748  80c0fc54  
I/DEBUG   (  936):     4738f74c  ffffffff  
I/DEBUG   (  936):     4738f750  0000001e  
I/DEBUG   (  936):     4738f754  00000000  
I/DEBUG   (  936):     4738f758  428ef000  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     4738f75c  80c0d40e  /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG   (  936):     4738f760  00000f6c  
I/DEBUG   (  936):     4738f764  80c08a3d  /data/data/org.mozilla.fennec/lib/libmozutils.so
I/WindowManager( 1017): WIN DEATH: Window{444dbcc8 SurfaceView paused=false}
D/Zygote  (  938): Process 4393 terminated by signal (4)
I/ActivityManager( 1017): Process org.mozilla.fennec (pid 4393) has died.
I/WindowManager( 1017): WIN DEATH: Window{443f0b58 org.mozilla.fennec/org.mozilla.fennec.App paused=false}
I/UsageStats( 1017): Unexpected resume of com.mozilla.SUTAgentAndroid while already resumed in org.mozilla.fennec
this appear intermittent while running on the same tegra/install, but I found that removing /data/data/org.mozilla.fennec/cache and /app_tmp directories yielded a reproduction....could be coincidence.
so talos will run into this problem when we update the talos extensions to have an unlimited maxversion.
Have you tried a new build? This looks to be the same as Bug 686763, except that crash involves the system linker and this crash involves our custom linker.

Very strange crash nonetheless... AFAICT, the crash happens when the linker tries to call .init_array in libmozalloc.so. The address to call is 0x70000920, but for some reason PC ends up at 0x70000940 (even crazier, various logcats show PC values at 0x70000940, 0x70000960, and 0x70000980; 0x20 increments).
See Also: → 686763
this is still happening on tinderbox, so we are not in the clear yet
ok, I got the crash data from a live tegra, it looks very similar, just a little different: 



I/ActivityManager( 1017): Starting activity: Intent { act=android.intent.action.VIEW dat=http://people.mozilla.org/~jmaher/talos/getInfo.html flg=0x10000000 pkg=org.mozilla.fennec cmp=org.mozilla.fennec/.App (has extras) }
I/ActivityManager( 1017): Start proc org.mozilla.fennec for activity org.mozilla.fennec/.App: pid=2063 uid=10033 gids={3003, 1015, 1007}
D/dalvikvm( 1449): GC_EXPLICIT freed 518 objects / 33704 bytes in 31ms
D/dalvikvm( 1449): GC_EXPLICIT freed 376 objects / 59080 bytes in 26ms
D/dalvikvm( 2063): GC_FOR_MALLOC freed 1798 objects / 142432 bytes in 24ms
I/GeckoApp( 2063): create
I/GeckoApp( 2063): Checking for an update
I/GeckoApp( 2063): start
I/GeckoApp( 2063): resume
D/dalvikvm( 2063): Trying to load lib /data/data/org.mozilla.fennec/lib/libmozutils.so 0x4430a418
D/dalvikvm( 2063): Added shared lib /data/data/org.mozilla.fennec/lib/libmozutils.so 0x4430a418
D/dalvikvm( 2063): No JNI_OnLoad found in /data/data/org.mozilla.fennec/lib/libmozutils.so 0x4430a418, skipping init
I/GeckoAppShell( 2063): env0: null
W/InputManagerService( 1017): Starting input on non-focused client com.android.internal.view.IInputMethodClient$Stub$Proxy@444f8930 (uid=10031 pid=1449)
I/GeckoSurfaceView( 2063): surface created
I/GeckoSurfaceView( 2063): surfaceChanged: fmt: -1 dim: 1024 743
I/ActivityManager( 1017): Displayed activity org.mozilla.fennec/.App: 281 ms (total 281 ms)
I/DEBUG   (  936): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG   (  936): Build fingerprint: 'nvidia/harmony/harmony/harmony:2.2/FRF91/20101105.183900:eng/test-keys'
I/DEBUG   (  936): pid: 2063, tid: 2071  >>> org.mozilla.fennec <<<
I/DEBUG   (  936): signal 11 (SIGSEGV), fault addr 70002f34
I/DEBUG   (  936):  r0 70001ae0  r1 00000001  r2 70000921  r3 70000923
I/DEBUG   (  936):  r4 70001ae4  r5 00000001  r6 00000004  r7 00000000
I/DEBUG   (  936):  r8 428f0000  r9 80c0d476  10 00000f6c  fp 80c0d476
I/DEBUG   (  936):  ip 70002bf8  sp 47252718  lr 700009e7  pc 700008a4  cpsr 00000010
I/DEBUG   (  936):  d0  0000007442e80000  d1  3ff0000042e80000
I/DEBUG   (  936):  d2  0000000050baf6de  d3  0000000000000000
I/DEBUG   (  936):  d4  0000002f000000b4  d5  3fe999999999999a
I/DEBUG   (  936):  d6  3fe8000000000000  d7  3eaaaaab3f800000
I/DEBUG   (  936):  d8  0000000000000000  d9  0000000000000000
I/DEBUG   (  936):  d10 0000000000000000  d11 0000000000000000
I/DEBUG   (  936):  d12 0000000000000000  d13 0000000000000000
I/DEBUG   (  936):  d14 0000000000000000  d15 0000000000000000
I/DEBUG   (  936):  scr 60000012
I/DEBUG   (  936): 
I/DEBUG   (  936):          #00  pc 000008a4  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):          #01  lr 700009e7  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936): 
I/DEBUG   (  936): code around pc:
I/DEBUG   (  936): 70000884 e28fc600 e28cca01 e5bcf34c e28fc600 
I/DEBUG   (  936): 70000894 e28cca01 e5bcf344 e28fc600 e28cca01 
I/DEBUG   (  936): 700008a4 e5bcf33c e28fc600 e28cca01 e5bcf334 
I/DEBUG   (  936): 700008b4 e28fc600 e28cca01 e5bcf32c e28fc600 
I/DEBUG   (  936): 700008c4 e28cca01 e5bcf324 e28fc600 e28cca01 
I/DEBUG   (  936): 
I/DEBUG   (  936): code around lr:
I/DEBUG   (  936): 700009c4 f7ffb510 bd10ef76 4604b510 f7ff4620 
I/DEBUG   (  936): 700009d4 b910ef70 f85af000 bd10e7f8 f7ffb510 
I/DEBUG   (  936): 700009e4 bd10ef80 4605b570 4628460c f7ff4621 
I/DEBUG   (  936): 700009f4 b910ef78 f84af000 bd70e7f7 f7ffb510 
I/DEBUG   (  936): 70000a04 bd10ef52 4605b570 4628460c f7ff4621 
I/DEBUG   (  936): 
I/DEBUG   (  936): stack:
I/DEBUG   (  936):     472526d8  00000000  
I/DEBUG   (  936):     472526dc  00000000  
I/DEBUG   (  936):     472526e0  70000add  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     472526e4  00000002  
I/DEBUG   (  936):     472526e8  700007d0  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     472526ec  80c0f920  
I/DEBUG   (  936):     472526f0  70000218  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     472526f4  70000538  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     472526f8  00000002  
I/DEBUG   (  936):     472526fc  80c0fc28  
I/DEBUG   (  936):     47252700  00005384  
I/DEBUG   (  936):     47252704  80c0c9ec  /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG   (  936):     47252708  0000001e  
I/DEBUG   (  936):     4725270c  80c0fc54  
I/DEBUG   (  936):     47252710  df002777  
I/DEBUG   (  936):     47252714  e3a070ad  
I/DEBUG   (  936): #00 47252718  70001ae4  
I/DEBUG   (  936):     4725271c  80c074a7  /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG   (  936):     47252720  80c0fc54  
I/DEBUG   (  936):     47252724  00002731  
I/DEBUG   (  936):     47252728  0000001e  
I/DEBUG   (  936):     4725272c  80c08853  /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG   (  936):     47252730  80c0fc54  
I/DEBUG   (  936):     47252734  00000004  
I/DEBUG   (  936):     47252738  0000001e  
I/DEBUG   (  936):     4725273c  00000000  
I/DEBUG   (  936):     47252740  428f0000  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     47252744  80c0fc54  
I/DEBUG   (  936):     47252748  80c0fc54  
I/DEBUG   (  936):     4725274c  ffffffff  
I/DEBUG   (  936):     47252750  0000001e  
I/DEBUG   (  936):     47252754  00000000  
I/DEBUG   (  936):     47252758  428f0000  /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG   (  936):     4725275c  80c0d476  /data/data/org.mozilla.fennec/lib/libmozutils.so
I/WindowManager( 1017): WIN DEATH: Window{4443b530 org.mozilla.fennec/org.mozilla.fennec.App paused=false}
Could you get the libmozalloc.so file from the apk that gave you that crash and attach it to this bug? Thanks.
So, the calling code would be
000009e0 <moz_realloc>:
 9e0:	b510      	push	{r4, lr}
 9e2:	f7ff ef80 	blx	8e4 <moz_malloc_usable_size-0x40>
 9e6:	bd10      	pop	{r4, pc}

which ... makes no sense, because the crash is at 0x8a4, which is *before* 0x8e4 (and both these addresses are in the PLT)
Attached patch WorkaroundSplinter Review
Workaround
Attachment #561247 - Flags: review?(mark.finkle)
Comment on attachment 561247 [details] [diff] [review]
Workaround

Talked to a few folks about this. Looks good to workaround the problem and get testing infra back on it's feet. Mike (glandium) also agrees.

Let's land as a workaround, but keep the bug open to figure out the "why" part.
Attachment #561247 - Flags: review?(mark.finkle) → review+
pushed woraround to m-c:
https://hg.mozilla.org/mozilla-central/rev/b15856d4b114

leaving bug open to do deeper digging into what's happening
So, on the tegra board where I could reliably reproduce the issue, I seem to have reliably fixed it by moving the stuff done by mLibLoadThread in http://mxr.mozilla.org/mozilla-central/source/embedding/android/GeckoApp.java#404 on the main thread. It seems like dalvik is doing its own threading and doesn't do interworking properly (switching between arm and thumb). That's my current guess at the root cause.
Trying this:
https://hg.mozilla.org/try/rev/48eb2ecba6aa
https://tbpl.mozilla.org/?usebuildbot=1&tree=Try&rev=48eb2ecba6aa

Please feel free to re-trigger tests if necessary.

Yeah, that seriously sucks :( If this is confirmed to solve these crashes, this basically means we have to stop the world while our libs are being loaded. But, if dalvik effectively has a weird way to do threads, maybe we can make the linker code use real threads...

Note that an alternative explanation would be a bug in the bionic libc or in the kernel, which would be even scarier.
Duplicate of this bug: 686763
Some more info: it also doesn't happen if i add logging of the address being called, but keeping the separate thread.
In the end, it was a cache coherency problem between the instruction cache and the dat cache. It didn't happen before because it was quite unlikely to happen. But the drop in size of libmozalloc.so with the move of jemalloc into mozutils made it much more likely.
Attachment #561794 - Flags: review?(mwu)
Comment on attachment 561794 [details] [diff] [review]
Ensure Instruction and Data cache coherency after extracting libraries

Good catch.
Attachment #561794 - Flags: review?(mwu) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/e299d7fed89d
Assignee: nobody → mh+mozilla
Whiteboard: [orange][mobile_unittests][android_tier_1] → [orange][mobile_unittests][android_tier_1][inbound]
https://hg.mozilla.org/mozilla-central/rev/e299d7fed89d
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 9
Whiteboard: [orange][mobile_unittests][android_tier_1][inbound] → [orange][mobile_unittests][android_tier_1][inbound][QA?]
Whiteboard: [orange][mobile_unittests][android_tier_1][inbound][QA?] → [mobile_unittests][android_tier_1][inbound][QA?]
You need to log in before you can comment on or make changes to this bug.