Closed Bug 959804 Opened 12 years ago Closed 10 years ago

Intermittent tcheck2,tp4m | application crashed [@ libc.so + 0x11e78]

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(firefox28 unaffected, firefox29 affected, firefox30 affected, firefox-esr24 unaffected)

RESOLVED WORKSFORME
Tracking Status
firefox28 --- unaffected
firefox29 --- affected
firefox30 --- affected
firefox-esr24 --- unaffected

People

(Reporter: RyanVM, Assigned: esawin)

References

Details

(Keywords: crash, intermittent-failure)

This has been maddening to track down because as best I can tell, it started on m-c with the last b-i -> m-c merge (rev 3a264db9f9e7), but I can't get it to reproduce on b-i until much later. Does not appear to be a slave-specific issue. https://tbpl.mozilla.org/php/getParsedLog.php?id=32978458&tree=Mozilla-Central Android 2.2 Tegra mozilla-central talos remote-trobocheck2 on 2014-01-14 07:47:16 PST for push 3a264db9f9e7 slave: tegra-182 INFO : Reading 'browser_version' from remoteapp.ini => 29.0a1 INFO : Reading 'browser_name' from remoteapp.ini => Fennec INFO : Reading 'repository' from remoteapp.ini => http://hg.mozilla.org/mozilla-central INFO : Reading 'sourcestamp' from remoteapp.ini => 3a264db9f9e7 tegra-182: Started Tue, 14 Jan 2014 07:52:31 Running test tcheck2: Started Tue, 14 Jan 2014 07:52:31 INFO : EventExpecter: no longer listening for Gecko:Ready EventExpecter: no longer listening for DOMContentLoaded __startBeforeLaunchTimestamp1389715269271__endBeforeLaunchTimestamp __startAfterTerminationTimestamp1389715286288__endAfterTerminationTimestamp browser_name:Fennec browser_version:29.0a1 buildID:20140114064832 PROCESS-CRASH | tcheck2 | application crashed [@ libc.so + 0x11e78] Crash dump filename: /tmp/tmpvIe5em/2d8b2a86-10e7-faeb-77c6ebbb-2f4b1801.dmp Operating system: Android 0.0.0 Linux 2.6.32.9-00002-gd8084dc-dirty #1 SMP PREEMPT Wed Feb 2 11:32:06 PST 2011 armv7l nvidia/harmony/harmony/harmony:2.2/FRF91/20110202.102810:eng/test-keys CPU: arm 0 CPUs Crash reason: SIGSEGV Crash address: 0xdeadbaad Thread 43 (crashed) 0 libc.so + 0x11e78 r4 = 0xafd42328 r5 = 0x00000000 r6 = 0x60cffc8c r7 = 0x0000a000 r8 = 0x56d84904 r9 = 0x56799856 r10 = 0x00000000 fp = 0x0000000c sp = 0x60cffc78 lr = 0xdeadbaad pc = 0xafd11e78 Found by: given as instruction pointer in context 1 libc.so + 0x438e2 sp = 0x60cffc7c pc = 0xafd438e4 Found by: stack scanning 2 libc.so + 0x42326 sp = 0x60cffc84 pc = 0xafd42328 Found by: stack scanning 3 libc.so + 0x42326 sp = 0x60cffc94 pc = 0xafd42328 Found by: stack scanning 4 libc.so + 0x4372a sp = 0x60cffc98 pc = 0xafd4372c Found by: stack scanning 5 libc.so + 0xbe4b sp = 0x60cffca0 pc = 0xafd0be4d Found by: stack scanning 6 libc.so + 0x3ad85 sp = 0x60cffca4 pc = 0xafd3ad87 Found by: stack scanning 7 libc.so + 0x43cbe sp = 0x60cffcb8 pc = 0xafd43cc0 Found by: stack scanning 8 libc.so + 0xcdad sp = 0x60cffcc8 pc = 0xafd0cdaf Found by: stack scanning
This single crash also appeared once, so I'm assuming it's related though I don't have proof of it. https://tbpl.mozilla.org/php/getParsedLog.php?id=32985568&tree=B2g-Inbound
Summary: Intermittent tcheck2 | application crashed [@ libc.so + 0x11e78] → Intermittent tcheck2,tp4m | application crashed [@ libc.so + 0x11e78]
Geoff, this is a pretty frequent failure. Can you please help find an owner?
I reviewed several recent tcheck2 failures. In all cases, I saw nothing helpful in the crashdump. In all cases, the crash happens shortly after receiving DOMContentLoaded. This sequence seems typical: 02-10 08:21:08.480 D/Robocop ( 2090): waiting for Gecko:Ready 02-10 08:21:09.570 E/GeckoConsole( 2090): zerdatime 1392020469578 - browser chrome startup finished. ... 02-10 08:21:09.750 D/Robocop ( 2090): handleMessage called for: Gecko:Ready; expecting: Gecko:Ready 02-10 08:21:09.750 D/Robocop ( 2090): received event Gecko:Ready 02-10 08:21:09.750 D/Robocop ( 2090): unblocked on expecter for Gecko:Ready 02-10 08:21:09.773 I/Robocop ( 2090): EventExpecter: no longer listening for Gecko:Ready ... 02-10 08:21:12.580 D/Robocop ( 2090): waiting for DOMContentLoaded ... 02-10 08:21:25.903 D/GeckoToolbar( 2090): onTabChanged: LOADED 02-10 08:21:25.903 D/GeckoBrowserApp( 2090): BrowserApp.onTabChanged: 0: LOADED 02-10 08:21:25.913 D/Robocop ( 2090): handleMessage called for: DOMContentLoaded; expecting: DOMContentLoaded 02-10 08:21:25.913 D/Robocop ( 2090): received event DOMContentLoaded 02-10 08:21:25.913 D/Robocop ( 2090): unblocked on expecter for DOMContentLoaded 02-10 08:21:25.913 I/Robocop ( 2090): EventExpecter: no longer listening for DOMContentLoaded 02-10 08:21:26.130 D/Robocop ( 2090): Received drawFinished notification 02-10 08:21:26.160 D/Robocop ( 2090): Received drawFinished notification 02-10 08:21:26.200 D/Robocop ( 2090): Received drawFinished notification 02-10 08:21:26.230 D/Robocop ( 2090): Received drawFinished notification 02-10 08:21:26.270 D/Robocop ( 2090): Received drawFinished notification 02-10 08:21:26.300 D/GeckoTabs( 2090): handleMessage: Content:PageShow 02-10 08:21:26.300 D/GeckoToolbar( 2090): onTabChanged: PAGE_SHOW 02-10 08:21:26.300 D/GeckoBrowserApp( 2090): BrowserApp.onTabChanged: 0: PAGE_SHOW 02-10 08:21:26.300 D/GeckoToolbar( 2090): onTabChanged: FAVICON 02-10 08:21:26.300 D/GeckoToolbarDisplayLayout( 2090): updateFavicon(android.graphics.Bitmap@484eecc0) 02-10 08:21:26.300 D/GeckoBrowserApp( 2090): BrowserApp.onTabChanged: 0: FAVICON 02-10 08:21:26.300 I/dalvikvm( 2090): Jit: resizing JitTable from 4096 to 8192 02-10 08:21:26.340 D/GeckoTabs( 2090): handleMessage: Content:StateChange 02-10 08:21:26.340 D/GeckoToolbar( 2090): onTabChanged: STOP 02-10 08:21:26.340 I/GeckoToolbarDisplayLayout( 2090): zerdatime 660424 - Throbber stop 02-10 08:21:26.340 D/GeckoBrowserApp( 2090): BrowserApp.onTabChanged: 0: STOP 02-10 08:21:27.373 I/DEBUG ( 937): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** 02-10 08:21:27.373 I/DEBUG ( 937): Build fingerprint: 'nvidia/harmony/harmony/harmony:2.2/FRF91/20110202.102810:eng/test-keys' 02-10 08:21:27.373 I/DEBUG ( 937): pid: 2090, tid: 2233 >>> org.mozilla.fennec <<< 02-10 08:21:27.373 I/DEBUG ( 937): signal 11 (SIGSEGV), fault addr deadbaad
The deadbaad fault address indicates memory corruption; probably trying to free a bad pointer. I don't see anything in the merge set from comment 0 that would trigger this; it's a pretty small set of changes and really only touches b2g.
As a practical work-around, I thought sleeping immediately after the page load might help, but it doesn't: https://tbpl.mozilla.org/?tree=Try&rev=0eeac6add557.
Un-cc'ing myself. needinfo me if you want anything from me.
Kats: I believe the deadbaad address indicates that Bionic called abort(), not necessarily random memory corruption.
Checking into this on try, it seems the crash always happens in verifyHomePagerHidden, during the waitForCondition(). It looks to me like it is operating on a valid HomePager view. https://tbpl.mozilla.org/php/getParsedLog.php?id=34621004&tree=Try&full=1#error0
Flags: needinfo?(gbrown)
:margaret -- I think you wrote verifyHomePagerHidden...any thoughts here?
Flags: needinfo?(margaret.leibovic)
I don't know how to explain this: https://tbpl.mozilla.org/?tree=Try&rev=7cb671698c85 Is something wrong in BaseTest.waitForCondition()??
(In reply to Geoff Brown [:gbrown] from comment #151) > I don't know how to explain this: > https://tbpl.mozilla.org/?tree=Try&rev=7cb671698c85 > > Is something wrong in BaseTest.waitForCondition()?? Robocop tests aren't run on the UI thread are they? Views aren't thread-safe, so I wonder if posting this to the UI thread would help.
Oh, looking over the logs, I see that the waitForCondition test doesn't even execute in the first place....
(In reply to Geoff Brown [:gbrown] from comment #151) > I don't know how to explain this: > https://tbpl.mozilla.org/?tree=Try&rev=7cb671698c85 > > Is something wrong in BaseTest.waitForCondition()?? Thinking over this again, it seems unlikely that there's a problem in waitForCondition, or even verifyHomePagerHidden...more likely those are sleeping when there is a segv in gecko -- something related to page load, or post-load.
Sorry I've been slow to look into this, but it sounds like you no longer think verifyHomePagerHidden is to blame? It looks like that method is only used in PixelTest now, but it would be odd if this problem wasn't appearing in other PixelTests. Is there something different about the talos harness that would cause this crash to happen on talos tests, but not on regular mochitest robocop tests?
Flags: needinfo?(margaret.leibovic)
(In reply to :Margaret Leibovic from comment #194) Correct - I do not think verifyHomePagerHidden is to blame...but I don't know what is to blame! Why are these Talos tests affected and not others? That's a good question. As I recall, tcheck2 and tp4m do more page loads per check-in than any other Android test -- that might be a factor. Also, Talos and non-Talos tests tend to report failures differently, so the same crash might be reported in other bugs -- but I have looked and cannot find evidence of that (we have various startup crashes reported in bug 722166 and others, but those look quite different from this crash).
There are environment differences between Talos and non-Talos. Talos (tcheck2): 02-18 21:36:28.498 D/GeckoLoader( 2804): Gecko environment env0: MOZ_CRASHREPORTER_SHUTDOWN=1 02-18 21:36:28.498 D/GeckoLoader( 2804): env1: MOZ_CRASHREPORTER_NO_REPORT=1 02-18 21:36:28.498 D/GeckoLoader( 2804): env2: NO_EM_RESTART=1 02-18 21:36:28.498 D/GeckoLoader( 2804): env3: null vs Robocop: 02-18 07:49:39.016 D/GeckoLoader( 1814): Gecko environment env0: MOZ_CRASHREPORTER=1 02-18 07:49:39.016 D/GeckoLoader( 1814): env1: MOZ_CRASHREPORTER_NO_REPORT=1 02-18 07:49:39.016 D/GeckoLoader( 1814): env2: XPCOM_DEBUG_BREAK=stack 02-18 07:49:39.016 D/GeckoLoader( 1814): env3: XPCOM_MEM_BLOAT_LOG=/tmp/tmprTWYhK/runtests_leaks.log 02-18 07:49:39.016 D/GeckoLoader( 1814): env4: null vs jsreftest: 02-15 11:53:02.967 D/GeckoLoader( 1757): Gecko environment env0: ANDROID_SOCKET_zygote=10 02-15 11:53:02.967 D/GeckoLoader( 1757): env1: ANDROID_BOOTLOGO=1 02-15 11:53:02.967 D/GeckoLoader( 1757): env2: EXTERNAL_STORAGE=/mnt/sdcard 02-15 11:53:02.967 D/GeckoLoader( 1757): env3: ANDROID_ASSETS=/system/app 02-15 11:53:02.967 D/GeckoLoader( 1757): env4: ASEC_MOUNTPOINT=/mnt/asec 02-15 11:53:02.967 D/GeckoLoader( 1757): env5: PATH=/sbin:/system/sbin:/system/bin:/system/xbin 02-15 11:53:02.967 D/GeckoLoader( 1757): env6: ANDROID_DATA=/data 02-15 11:53:02.967 D/GeckoLoader( 1757): env7: BOOTCLASSPATH=/system/framework/core.jar:/system/framework/ext.jar:/system/framework/framework.jar:/system/framework/android.policy.jar:/system/framework/services.jar:/system/framework/com.nvidia.graphics.jar 02-15 11:53:02.967 D/GeckoLoader( 1757): env8: ANDROID_PROPERTY_WORKSPACE=9,32768 02-15 11:53:02.967 D/GeckoLoader( 1757): env9: ANDROID_ROOT=/system 02-15 11:53:02.967 D/GeckoLoader( 1757): env10: LD_LIBRARY_PATH=/system/lib 02-15 11:53:02.967 D/GeckoLoader( 1757): env11: MOZ_CRASHREPORTER_NO_REPORT=1 02-15 11:53:02.967 D/GeckoLoader( 1757): env12: MOZ_CRASHREPORTER=1 02-15 11:53:02.967 D/GeckoLoader( 1757): env13: XPCOM_MEM_BLOAT_LOG=/tmp/tmpx3kjcY.mozrunner/runreftest_leaks.log 02-15 11:53:02.967 D/GeckoLoader( 1757): env14: XPCOM_DEBUG_BREAK=stack 02-15 11:53:02.967 D/GeckoLoader( 1757): env15: MOZ_PROCESS_LOG=/tmp/tmpPUErt8pidlog 02-15 11:53:02.967 D/GeckoLoader( 1757): env16: NO_EM_RESTART=1 02-15 11:53:02.967 D/GeckoLoader( 1757): env17: null And there may be slight profile differences also.
Eugen, can you take a look? You're the expert on weird-ass test crashes now :)
Assignee: nobody → esawin
It looks like it has moved from inbound to aurora and is now only affecting the beta channel. Not reproducing on m-c: https://tbpl.mozilla.org/?tree=Try&rev=9934f6e0e73b.
I think I have seen this pattern in logcats for other crashes as well: 07:39:19 INFO - 04-25 07:38:38.703 E/GeckoConsole( 2106): [JavaScript Error: "browsers is null" {file: "jar:jar:file:///data/app/org.mozilla.fennec-1.apk!/assets/omni.ja!/components/SessionStore.js" line: 230}] 07:39:19 INFO - 04-25 07:38:40.554 F/libc ( 2106): Fatal signal 11 (SIGSEGV) at 0x5a5a5a5a (code=2) ...but I can't think of examples just now, and I don't know what's happening in SessionStore.
See Also: → 984555
Bug 924622 made changes to shutdown that may have affected / fixed this -- landed 05/23. This wasn't a frequent failure for the last few weeks though, so we should keep it open for now.
Inactive; closing (see bug 1180138).
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.