Closed
Bug 687147
Opened 13 years ago
Closed 13 years ago
fennec is crashing while running reftests and mochitests
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Firefox 9
People
(Reporter: jmaher, Assigned: glandium)
References
Details
(Keywords: intermittent-failure, Whiteboard: [mobile_unittests][android_tier_1][inbound][QA?])
Attachments
(3 files)
3.86 KB,
application/octet-stream
|
Details | |
1.05 KB,
patch
|
mfinkle
:
review+
|
Details | Diff | Splinter Review |
4.25 KB,
patch
|
mwu
:
review+
|
Details | Diff | Splinter Review |
While running reftests and mochitests we would run into a situation where we would launch fennec and our scripts would fail because fennec was not in the process list. This has been an increasing problem on the automation infrastructure and we were able to debug this here and found this stack trace via logcat:
I/ActivityManager( 1017): Starting activity: Intent { act=android.intent.action.MAIN flg=0x10000000 pkg=org.mozilla.fennec cmp=org.mozilla.fennec/.App (has extras) }
I/ActivityManager( 1017): Start proc org.mozilla.fennec for activity org.mozilla.fennec/.App: pid=4393 uid=10033 gids={3003, 1015}
D/dalvikvm( 938): GC_EXPLICIT freed 301 objects / 11408 bytes in 33ms
D/dalvikvm( 1428): GC_EXPLICIT freed 1225 objects / 123272 bytes in 51ms
D/dalvikvm( 938): GC_EXPLICIT freed 57 objects / 2528 bytes in 29ms
D/dalvikvm( 938): GC_EXPLICIT freed 2 objects / 48 bytes in 27ms
D/dalvikvm( 1428): GC_EXPLICIT freed 347 objects / 53536 bytes in 40ms
D/dalvikvm( 4393): GC_FOR_MALLOC freed 2070 objects / 152800 bytes in 26ms
I/GeckoApp( 4393): create
I/GeckoApp( 4393): Checking for an update
I/GeckoApp( 4393): start
I/GeckoApp( 4393): resume
D/dalvikvm( 4393): Trying to load lib /data/data/org.mozilla.fennec/lib/libmozutils.so 0x442ca338
D/dalvikvm( 4393): Added shared lib /data/data/org.mozilla.fennec/lib/libmozutils.so 0x442ca338
D/dalvikvm( 4393): No JNI_OnLoad found in /data/data/org.mozilla.fennec/lib/libmozutils.so 0x442ca338, skipping init
I/GeckoAppShell( 4393): env0: ANDROID_SOCKET_zygote=10
I/GeckoAppShell( 4393): env1: ANDROID_BOOTLOGO=1
I/GeckoAppShell( 4393): env2: EXTERNAL_STORAGE=/mnt/sdcard
I/GeckoAppShell( 4393): env3: ANDROID_ASSETS=/system/app
I/GeckoAppShell( 4393): env4: ASEC_MOUNTPOINT=/mnt/asec
I/GeckoAppShell( 4393): env5: PATH=/sbin:/system/sbin:/system/bin:/system/xbin
I/GeckoAppShell( 4393): env6: ANDROID_DATA=/data
I/GeckoAppShell( 4393): env7: BOOTCLASSPATH=/system/framework/core.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/com.nvidia.graphics.jar
I/GeckoAppShell( 4393): env8: ANDROID_PROPERTY_WORKSPACE=9,32768
I/GeckoAppShell( 4393): env9: ANDROID_ROOT=/system
I/GeckoAppShell( 4393): env10: LD_LIBRARY_PATH=/system/lib
I/GeckoAppShell( 4393): env11: MOZ_CRASHREPORTER_NO_REPORT=1
I/GeckoAppShell( 4393): env12: MOZ_CRASHREPORTER=1
I/GeckoAppShell( 4393): env13: XPCOM_MEM_BLOAT_LOG=/tmp/tmpJNuzc2/runreftest_leaks.log
I/GeckoAppShell( 4393): env14: XPCOM_DEBUG_BREAK=stack
I/GeckoAppShell( 4393): env15: MOZ_PROCESS_LOG=/tmp/tmpkzILk_pidlog
I/GeckoAppShell( 4393): env16: NO_EM_RESTART=1
I/GeckoAppShell( 4393): env17: null
W/InputManagerService( 1017): Starting input on non-focused client com.android.internal.view.IInputMethodClient$Stub$Proxy@44489d78 (uid=10031 pid=1428)
I/GeckoSurfaceView( 4393): surface created
I/DEBUG ( 936): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG ( 936): Build fingerprint: 'nvidia/harmony/harmony/harmony:2.2/FRF91/20101105.183900:eng/test-keys'
I/DEBUG ( 936): pid: 4393, tid: 4401 >>> org.mozilla.fennec <<<
I/DEBUG ( 936): signal 4 (SIGILL), fault addr 70000940
I/DEBUG ( 936): r0 70001ae0 r1 00000001 r2 70000921 r3 70000923
I/DEBUG ( 936): r4 70001ae4 r5 00000001 r6 00000004 r7 00000000
I/DEBUG ( 936): r8 428ef000 r9 80c0d40e 10 00000f6c fp 80c0d40e
I/DEBUG ( 936): ip 80c0e12c sp 4738f720 lr 80c0745b pc 70000940 cpsr 00000030
I/DEBUG ( 936): d0 0000007442e80000 d1 3ff0000042e80000
I/DEBUG ( 936): d2 0000000050baf6de d3 0000000000000000
I/DEBUG ( 936): d4 00000011000000b4 d5 3fe999999999999a
I/DEBUG ( 936): d6 3fe8000000000000 d7 3eaaaaab3f800000
I/DEBUG ( 936): d8 0000000000000000 d9 0000000000000000
I/DEBUG ( 936): d10 0000000000000000 d11 0000000000000000
I/DEBUG ( 936): d12 0000000000000000 d13 0000000000000000
I/DEBUG ( 936): d14 0000000000000000 d15 0000000000000000
I/DEBUG ( 936): scr 60000012
I/DEBUG ( 936):
I/GeckoSurfaceView( 4393): surfaceChanged: fmt: -1 dim: 1680 1025
I/ActivityManager( 1017): Displayed activity org.mozilla.fennec/.App: 382 ms (total 382 ms)
I/DEBUG ( 936): #00 pc 00000940 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): #01 lr 80c0745b /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG ( 936):
I/DEBUG ( 936): code around pc:
I/DEBUG ( 936): 70000920 bf004770 b108b510 efa0f7ff bf00bd10
I/DEBUG ( 936): 70000930 f7ffb510 bd10efa8 4604b510 f7ff4620
I/DEBUG ( 936): 70000940 b910efa2 f8a4f000 bd10e7f8 f7ffb510
I/DEBUG ( 936): 70000950 bd10efbe 4605b570 4628460c f7ff4621
I/DEBUG ( 936): 70000960 4606efb6 f7ffb938 6803ef88 d0022b16
I/DEBUG ( 936):
I/DEBUG ( 936): code around lr:
I/DEBUG ( 936): 80c07438 2301b90a 1e4be004 0483eb00 33fff04f
I/DEBUG ( 936): 80c07448 460d009e 6822e006 1e5319a4 d8003303
I/DEBUG ( 936): 80c07458 3d014790 dcf62d00 bf00bd70 4b19b51f
I/DEBUG ( 936): 80c07468 22004601 681b447b d005428b 00a4f8d3
I/DEBUG ( 936): 80c07478 b110461a e7f74603 4813b989 143df240
I/DEBUG ( 936):
I/DEBUG ( 936): stack:
I/DEBUG ( 936): 4738f6e0 70000add /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 4738f6e4 00000002
I/DEBUG ( 936): 4738f6e8 700007d0 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 4738f6ec 80c0f920
I/DEBUG ( 936): 4738f6f0 70000218 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 4738f6f4 70000538 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 4738f6f8 00000002
I/DEBUG ( 936): 4738f6fc 80c0fc28
I/DEBUG ( 936): 4738f700 00005368
I/DEBUG ( 936): 4738f704 80c0c984 /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG ( 936): 4738f708 0000001e
I/DEBUG ( 936): 4738f70c 80c0fc54
I/DEBUG ( 936): 4738f710 70001b9c
I/DEBUG ( 936): 4738f714 0000001e
I/DEBUG ( 936): 4738f718 df002777
I/DEBUG ( 936): 4738f71c e3a070ad
I/DEBUG ( 936): #00 4738f720 80c0fc54
I/DEBUG ( 936): 4738f724 00002731
I/DEBUG ( 936): 4738f728 0000001e
I/DEBUG ( 936): 4738f72c 80c08807 /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG ( 936): 4738f730 80c0fc54
I/DEBUG ( 936): 4738f734 00000004
I/DEBUG ( 936): 4738f738 0000001e
I/DEBUG ( 936): 4738f73c 00000000
I/DEBUG ( 936): 4738f740 428ef000 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 4738f744 80c0fc54
I/DEBUG ( 936): 4738f748 80c0fc54
I/DEBUG ( 936): 4738f74c ffffffff
I/DEBUG ( 936): 4738f750 0000001e
I/DEBUG ( 936): 4738f754 00000000
I/DEBUG ( 936): 4738f758 428ef000 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 4738f75c 80c0d40e /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG ( 936): 4738f760 00000f6c
I/DEBUG ( 936): 4738f764 80c08a3d /data/data/org.mozilla.fennec/lib/libmozutils.so
I/WindowManager( 1017): WIN DEATH: Window{444dbcc8 SurfaceView paused=false}
D/Zygote ( 938): Process 4393 terminated by signal (4)
I/ActivityManager( 1017): Process org.mozilla.fennec (pid 4393) has died.
I/WindowManager( 1017): WIN DEATH: Window{443f0b58 org.mozilla.fennec/org.mozilla.fennec.App paused=false}
I/UsageStats( 1017): Unexpected resume of com.mozilla.SUTAgentAndroid while already resumed in org.mozilla.fennec
Reporter | ||
Comment 1•13 years ago
|
||
this appear intermittent while running on the same tegra/install, but I found that removing /data/data/org.mozilla.fennec/cache and /app_tmp directories yielded a reproduction....could be coincidence.
Reporter | ||
Comment 2•13 years ago
|
||
so talos will run into this problem when we update the talos extensions to have an unlimited maxversion.
Comment 3•13 years ago
|
||
Have you tried a new build? This looks to be the same as Bug 686763, except that crash involves the system linker and this crash involves our custom linker.
Very strange crash nonetheless... AFAICT, the crash happens when the linker tries to call .init_array in libmozalloc.so. The address to call is 0x70000920, but for some reason PC ends up at 0x70000940 (even crazier, various logcats show PC values at 0x70000940, 0x70000960, and 0x70000980; 0x20 increments).
See Also: → 686763
Reporter | ||
Comment 4•13 years ago
|
||
this is still happening on tinderbox, so we are not in the clear yet
Reporter | ||
Comment 5•13 years ago
|
||
ok, I got the crash data from a live tegra, it looks very similar, just a little different:
I/ActivityManager( 1017): Starting activity: Intent { act=android.intent.action.VIEW dat=http://people.mozilla.org/~jmaher/talos/getInfo.html flg=0x10000000 pkg=org.mozilla.fennec cmp=org.mozilla.fennec/.App (has extras) }
I/ActivityManager( 1017): Start proc org.mozilla.fennec for activity org.mozilla.fennec/.App: pid=2063 uid=10033 gids={3003, 1015, 1007}
D/dalvikvm( 1449): GC_EXPLICIT freed 518 objects / 33704 bytes in 31ms
D/dalvikvm( 1449): GC_EXPLICIT freed 376 objects / 59080 bytes in 26ms
D/dalvikvm( 2063): GC_FOR_MALLOC freed 1798 objects / 142432 bytes in 24ms
I/GeckoApp( 2063): create
I/GeckoApp( 2063): Checking for an update
I/GeckoApp( 2063): start
I/GeckoApp( 2063): resume
D/dalvikvm( 2063): Trying to load lib /data/data/org.mozilla.fennec/lib/libmozutils.so 0x4430a418
D/dalvikvm( 2063): Added shared lib /data/data/org.mozilla.fennec/lib/libmozutils.so 0x4430a418
D/dalvikvm( 2063): No JNI_OnLoad found in /data/data/org.mozilla.fennec/lib/libmozutils.so 0x4430a418, skipping init
I/GeckoAppShell( 2063): env0: null
W/InputManagerService( 1017): Starting input on non-focused client com.android.internal.view.IInputMethodClient$Stub$Proxy@444f8930 (uid=10031 pid=1449)
I/GeckoSurfaceView( 2063): surface created
I/GeckoSurfaceView( 2063): surfaceChanged: fmt: -1 dim: 1024 743
I/ActivityManager( 1017): Displayed activity org.mozilla.fennec/.App: 281 ms (total 281 ms)
I/DEBUG ( 936): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG ( 936): Build fingerprint: 'nvidia/harmony/harmony/harmony:2.2/FRF91/20101105.183900:eng/test-keys'
I/DEBUG ( 936): pid: 2063, tid: 2071 >>> org.mozilla.fennec <<<
I/DEBUG ( 936): signal 11 (SIGSEGV), fault addr 70002f34
I/DEBUG ( 936): r0 70001ae0 r1 00000001 r2 70000921 r3 70000923
I/DEBUG ( 936): r4 70001ae4 r5 00000001 r6 00000004 r7 00000000
I/DEBUG ( 936): r8 428f0000 r9 80c0d476 10 00000f6c fp 80c0d476
I/DEBUG ( 936): ip 70002bf8 sp 47252718 lr 700009e7 pc 700008a4 cpsr 00000010
I/DEBUG ( 936): d0 0000007442e80000 d1 3ff0000042e80000
I/DEBUG ( 936): d2 0000000050baf6de d3 0000000000000000
I/DEBUG ( 936): d4 0000002f000000b4 d5 3fe999999999999a
I/DEBUG ( 936): d6 3fe8000000000000 d7 3eaaaaab3f800000
I/DEBUG ( 936): d8 0000000000000000 d9 0000000000000000
I/DEBUG ( 936): d10 0000000000000000 d11 0000000000000000
I/DEBUG ( 936): d12 0000000000000000 d13 0000000000000000
I/DEBUG ( 936): d14 0000000000000000 d15 0000000000000000
I/DEBUG ( 936): scr 60000012
I/DEBUG ( 936):
I/DEBUG ( 936): #00 pc 000008a4 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): #01 lr 700009e7 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936):
I/DEBUG ( 936): code around pc:
I/DEBUG ( 936): 70000884 e28fc600 e28cca01 e5bcf34c e28fc600
I/DEBUG ( 936): 70000894 e28cca01 e5bcf344 e28fc600 e28cca01
I/DEBUG ( 936): 700008a4 e5bcf33c e28fc600 e28cca01 e5bcf334
I/DEBUG ( 936): 700008b4 e28fc600 e28cca01 e5bcf32c e28fc600
I/DEBUG ( 936): 700008c4 e28cca01 e5bcf324 e28fc600 e28cca01
I/DEBUG ( 936):
I/DEBUG ( 936): code around lr:
I/DEBUG ( 936): 700009c4 f7ffb510 bd10ef76 4604b510 f7ff4620
I/DEBUG ( 936): 700009d4 b910ef70 f85af000 bd10e7f8 f7ffb510
I/DEBUG ( 936): 700009e4 bd10ef80 4605b570 4628460c f7ff4621
I/DEBUG ( 936): 700009f4 b910ef78 f84af000 bd70e7f7 f7ffb510
I/DEBUG ( 936): 70000a04 bd10ef52 4605b570 4628460c f7ff4621
I/DEBUG ( 936):
I/DEBUG ( 936): stack:
I/DEBUG ( 936): 472526d8 00000000
I/DEBUG ( 936): 472526dc 00000000
I/DEBUG ( 936): 472526e0 70000add /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 472526e4 00000002
I/DEBUG ( 936): 472526e8 700007d0 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 472526ec 80c0f920
I/DEBUG ( 936): 472526f0 70000218 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 472526f4 70000538 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 472526f8 00000002
I/DEBUG ( 936): 472526fc 80c0fc28
I/DEBUG ( 936): 47252700 00005384
I/DEBUG ( 936): 47252704 80c0c9ec /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG ( 936): 47252708 0000001e
I/DEBUG ( 936): 4725270c 80c0fc54
I/DEBUG ( 936): 47252710 df002777
I/DEBUG ( 936): 47252714 e3a070ad
I/DEBUG ( 936): #00 47252718 70001ae4
I/DEBUG ( 936): 4725271c 80c074a7 /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG ( 936): 47252720 80c0fc54
I/DEBUG ( 936): 47252724 00002731
I/DEBUG ( 936): 47252728 0000001e
I/DEBUG ( 936): 4725272c 80c08853 /data/data/org.mozilla.fennec/lib/libmozutils.so
I/DEBUG ( 936): 47252730 80c0fc54
I/DEBUG ( 936): 47252734 00000004
I/DEBUG ( 936): 47252738 0000001e
I/DEBUG ( 936): 4725273c 00000000
I/DEBUG ( 936): 47252740 428f0000 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 47252744 80c0fc54
I/DEBUG ( 936): 47252748 80c0fc54
I/DEBUG ( 936): 4725274c ffffffff
I/DEBUG ( 936): 47252750 0000001e
I/DEBUG ( 936): 47252754 00000000
I/DEBUG ( 936): 47252758 428f0000 /dev/ashmem/lib/libmozalloc.so (deleted)
I/DEBUG ( 936): 4725275c 80c0d476 /data/data/org.mozilla.fennec/lib/libmozutils.so
I/WindowManager( 1017): WIN DEATH: Window{4443b530 org.mozilla.fennec/org.mozilla.fennec.App paused=false}
Assignee | ||
Comment 6•13 years ago
|
||
Could you get the libmozalloc.so file from the apk that gave you that crash and attach it to this bug? Thanks.
Reporter | ||
Comment 7•13 years ago
|
||
Assignee | ||
Comment 8•13 years ago
|
||
So, the calling code would be
000009e0 <moz_realloc>:
9e0: b510 push {r4, lr}
9e2: f7ff ef80 blx 8e4 <moz_malloc_usable_size-0x40>
9e6: bd10 pop {r4, pc}
which ... makes no sense, because the crash is at 0x8a4, which is *before* 0x8e4 (and both these addresses are in the PLT)
Comment 10•13 years ago
|
||
Comment on attachment 561247 [details] [diff] [review]
Workaround
Talked to a few folks about this. Looks good to workaround the problem and get testing infra back on it's feet. Mike (glandium) also agrees.
Let's land as a workaround, but keep the bug open to figure out the "why" part.
Attachment #561247 -
Flags: review?(mark.finkle) → review+
Comment 11•13 years ago
|
||
pushed woraround to m-c:
https://hg.mozilla.org/mozilla-central/rev/b15856d4b114
leaving bug open to do deeper digging into what's happening
Assignee | ||
Comment 12•13 years ago
|
||
So, on the tegra board where I could reliably reproduce the issue, I seem to have reliably fixed it by moving the stuff done by mLibLoadThread in http://mxr.mozilla.org/mozilla-central/source/embedding/android/GeckoApp.java#404 on the main thread. It seems like dalvik is doing its own threading and doesn't do interworking properly (switching between arm and thumb). That's my current guess at the root cause.
Assignee | ||
Comment 13•13 years ago
|
||
Trying this:
https://hg.mozilla.org/try/rev/48eb2ecba6aa
https://tbpl.mozilla.org/?usebuildbot=1&tree=Try&rev=48eb2ecba6aa
Please feel free to re-trigger tests if necessary.
Yeah, that seriously sucks :( If this is confirmed to solve these crashes, this basically means we have to stop the world while our libs are being loaded. But, if dalvik effectively has a weird way to do threads, maybe we can make the linker code use real threads...
Note that an alternative explanation would be a bug in the bionic libc or in the kernel, which would be even scarier.
Assignee | ||
Comment 15•13 years ago
|
||
Some more info: it also doesn't happen if i add logging of the address being called, but keeping the separate thread.
Assignee | ||
Comment 16•13 years ago
|
||
In the end, it was a cache coherency problem between the instruction cache and the dat cache. It didn't happen before because it was quite unlikely to happen. But the drop in size of libmozalloc.so with the move of jemalloc into mozutils made it much more likely.
Assignee | ||
Comment 17•13 years ago
|
||
Assignee | ||
Updated•13 years ago
|
Attachment #561794 -
Flags: review?(mwu)
Comment 18•13 years ago
|
||
Comment on attachment 561794 [details] [diff] [review]
Ensure Instruction and Data cache coherency after extracting libraries
Good catch.
Attachment #561794 -
Flags: review?(mwu) → review+
Assignee | ||
Comment 19•13 years ago
|
||
Assignee: nobody → mh+mozilla
Whiteboard: [orange][mobile_unittests][android_tier_1] → [orange][mobile_unittests][android_tier_1][inbound]
Comment 20•13 years ago
|
||
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 9
Updated•13 years ago
|
Whiteboard: [orange][mobile_unittests][android_tier_1][inbound] → [orange][mobile_unittests][android_tier_1][inbound][QA?]
Updated•12 years ago
|
Keywords: intermittent-failure
Updated•12 years ago
|
Whiteboard: [orange][mobile_unittests][android_tier_1][inbound][QA?] → [mobile_unittests][android_tier_1][inbound][QA?]
You need to log in
before you can comment on or make changes to this bug.
Description
•