Closed Bug 825996 Opened 8 years ago Closed 6 years ago

crash in libdvm.so@0x45... on JB

Categories

(Firefox for Android :: General, defect)

ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox17 --- affected
firefox18 --- affected
firefox19 - affected
firefox20 - affected
firefox21 --- affected
firefox22 --- affected
firefox23 --- affected
firefox24 --- affected
firefox25 --- affected

People

(Reporter: kbrosnan, Assigned: jchen)

References

Details

(Keywords: crash, topcrash-android-armv7, Whiteboard: [native-crash])

Crash Data

Attachments

(1 file)

This is a follow up for the remaining crashes.

+++ This bug was initially created as a clone of Bug #780831 +++

It's #17 top crasher in 15.0b3 and #22 in 16.0a2 while only #485 in 14.0.1 and #93 in 17.0a1.

Signature 	libdvm.so@0x45dd0
UUID	cad1fbe6-5d08-4521-b9fd-9242c2120807
Date Processed	2012-08-07 07:28:42
Uptime	964
Last Crash	16.2 minutes before submission
Install Age	3.0 days since version was first installed.
Install Time	2012-08-04 07:31:09
Product	FennecAndroid
Version	15.0
Build ID	20120731145644
Release Channel	beta
OS	Linux
OS Version	0.0.0 Linux 3.1.10-g52027f9 #1 SMP PREEMPT Thu Jun 28 16:19:26 PDT 2012 armv7l
Build Architecture	arm
Build Architecture Info	
Crash Reason	SIGSEGV
Crash Address	0xdeadd00d
App Notes 	
AdapterVendorID: grouper, AdapterDeviceID: Nexus 7.
AdapterDescription: 'Model: 'Nexus 7', Product: 'nakasi', Manufacturer: 'asus', Hardware: 'grouper''.
EGL? EGL+ GL Context? GL Context+ GL Layers? GL Layers+ 
asus Nexus 7
google/nakasi/grouper:4.1.1/JRO03D/402395:user/release-keys
Processor Notes 	CSignatureTool: No proper signature could be created because no good data for the crashing thread (24) was found
EMCheckCompatibility	True
Adapter Vendor ID	grouper
Adapter Device ID	Nexus 7

Frame 	Module 	Signature 	Source
0 	libdvm.so 	libdvm.so@0x45dd0

More reports at:
https://crash-stats.mozilla.com/report/list?signature=libdvm.so%400x45dd0
Keywords: topcrash
It's still #1 on recent betas, so it's still a topcrash.
Keywords: topcrash
Whiteboard: [native-crash][summary in comment 23] → [native-crash]
Kats, this is a clone of Bug #780831, hence passing it on to you for further investigation. Can you please take a look?
Assignee: nobody → bugmail.mozilla
The crashes on 18 don't have enough useful info to diagnose the problem, and the crashes don't seem to be occurring on 19 or 20, so I can't get any useful info from those channels either. Any fixes here would be speculative at best, and given that 18 is almost out the door I don't think it's really worth spending the time to look at this much more.
Actually maybe that's not entirely true. I found a couple of crashes on 20 that also match the libdvm.so@0x45... signature:

https://crash-stats.mozilla.com/report/index/eed1ff79-015a-46f4-9705-7c66c2121228
https://crash-stats.mozilla.com/report/index/20d8031a-2c0f-4c58-9838-c42362121224

I looked at the logcat dumps for these, and both show this output from dalvikvm as it crashes:

W dalvikvm: Invalid indirect reference 0x41f58fa8 in decodeIndirectRef
E dalvikvm: VM aborting

According to [1], this could be a result of passing NULL or "" as a jstring when invoking a Java method via JNI. A quick check through AndroidBridge.cpp shows that we do this in AndroidBridge::EmptyClipboard.

There's also a NULL in the call to eglCreateWindowSurface which might be a problem, and NewJavaString could return NULL and that's used all over the place (these two are less likely to be a problem though since it would happen a lot more often or be logged).

I can try to come up with STR for this; if my analysis/data above is correct then triggering a call to AndroidBridge::EmptyClipboard should be sufficient to repro this.

[1] http://stackoverflow.com/questions/11055609/android-ics-ndk-invalid-indirect-reference-on-newobject-call
I forced a call to AndroidBridge::EmptyClipboard and it didn't crash, so that's not the problem.
This is only #20 in early 18.0 release data while it's #9 in 17.0 release data, so bug 780831 definitely helps, but we need to keep tracking this and trying to find solutions for the remaining problems.
We'll need to see how this does in early Beta 19 data.
The crash volume here is lower now in 18.0beta. We're going to check the crash data one more time in the next week before untracking.
FYI, it's down to #13 on 19.0b1 at this time.
This crash volume is now at the same level as FF15. Given that, I don't expect tracking to get us to resolution any sooner.
Just as an additional data point, it's #21 now on 18.0 release, so not tracking is surely the right thing.
That said, there are still crashes happening with this and it would be good if we could eliminate more or all of those.
Crash Signature: [@ libdvm.so@0x45dd0] [@ libdvm.so@0x45c90] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab0a] [@ data@app@org.mozilla.firefox_beta-2.apk@classes.dex@0xaab0a] [@ data@app@org.mozilla.firefox_beta-1.apk@classes.dex@0xaab26] [@ data@app@or… → [@ libdvm.so@0x45dd0] [@ libdvm.so@0x45c90] [@ libdvm.so@0x45e50 ] [@ libdvm.so@0x45e88 ] [@ libdvm.so@0x45ed0 ]
Version: Firefox 15 → Trunk
With combined signatures, it's the #25 crasher in 20.0 and #13 in 21.0b1.
Keywords: topcrash
Crash Signature: [@ libdvm.so@0x45dd0] [@ libdvm.so@0x45c90] [@ libdvm.so@0x45e50 ] [@ libdvm.so@0x45e88 ] [@ libdvm.so@0x45ed0 ] → [@ libdvm.so@0x45dd0] [@ libdvm.so@0x45c90] [@ libdvm.so@0x45e50 ] [@ libdvm.so@0x45e88 ] [@ libdvm.so@0x45ed0 ] [@ libdvm.so@0x45d08 ] [@ libdvm.so@0x45ad8 ]
This is a top crash again.
What's really interesting to me about these is that we're regularly crashing at 0xdeadd00d. Does anyone know what that poison address means?

Also, the stacks are almost always useless because of "No proper signature could be created because no good data for the crashing thread (20) was found" which is a little worrying to me. Ted, do you know what that message actually means and whether that means we're not collecting good minidumps?
Flags: needinfo?(ted)
Flags: needinfo?(blassey.bugs)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #13)
> What's really interesting to me about these is that we're regularly crashing
> at 0xdeadd00d. Does anyone know what that poison address means?

That's a dalvik VM abort. It *usually* means some JNI call threw an exception and then we tried to do something else without clearing it.
I'm not actively working on this so somebody else should take it if it's a high priority.
Assignee: bugmail.mozilla → nobody
Jim has been looking at a similar crash.
Assignee: nobody → nchen
Flags: needinfo?(blassey.bugs)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #13)
> Also, the stacks are almost always useless because of "No proper signature
> could be created because no good data for the crashing thread (20) was
> found" which is a little worrying to me. Ted, do you know what that message
> actually means and whether that means we're not collecting good minidumps?

That message is a result of the junky stack, not the cause. It basically means "everything on the crashing thread was in the skiplist":
https://github.com/mozilla/socorro/blob/master/socorro/processor/signature_utilities.py#L176
Flags: needinfo?(ted)
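To illustrate the "everything on the crashing thread was in the skiplist" explanation above, here is a minimal sketch of how a signature generator can end up with that note. It is not Socorro's actual code: the function name make_signature, the single skip pattern (frames that are only "module@0xADDR" with no symbol), and the "EMPTY: ..." placeholder are assumptions made for illustration.

```python
import re

# Assumption for illustration: a frame is "irrelevant" when it is only a raw
# module offset such as "libdvm.so@0x45dd0", i.e. it carries no symbol name.
SKIP_PATTERN = re.compile(r"@0x[0-9a-fA-F]{2,}$")


def make_signature(frames, crashed_thread):
    """Return (signature, notes) for a list of frame strings, top frame first."""
    notes = []
    for frame in frames:
        if SKIP_PATTERN.search(frame):
            continue  # skipped: nothing usable in this frame
        return frame, notes  # first frame that survives the skip list

    # Every frame was skipped, so record a note and fall back to the top frame.
    notes.append(
        "No proper signature could be created because no good data "
        f"for the crashing thread ({crashed_thread}) was found"
    )
    fallback = frames[0] if frames else "EMPTY: no frame data available"
    return fallback, notes
```

With a stack consisting only of "libdvm.so@0x45dd0", this sketch returns that raw frame as the signature plus the note, which is consistent with the note being a symptom of the junky stack rather than its cause.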
It's probably worthwhile to take some of these dumps on a trip through dump-lookup to see if there's anything useful hiding on the stack.
dump-lookup results for bp-af85ef98-63a7-424d-becc-e68f02140113

The relevant frames seem to be:
0x6046e08c: libxul.so!_JNIEnv::CallStaticObjectMethod(_jclass*, _jmethodID*, ...) [jni.h:da9ce9ea0d96 : 779 + 0x11]
0x6046e09c: libxul.so!mozilla::AndroidBridge::HandleGeckoMessageWrapper(nsAString_internal const&) [GeneratedJNIWrappers.cpp:da9ce9ea0d96 : 1223 + 0x7]
0x6046e0a0: dalvik-LinearAlloc (deleted) + 0x360a08
0x6046e0bc: libxul.so!mozilla::AndroidBridge::HandleGeckoMessage(nsAString_internal const&, nsAString_internal&) [AndroidBridge.cpp:da9ce9ea0d96 : 1068 + 0x1]
0x6046e0dc: libxul.so!nsAndroidBridge::HandleGeckoMessage(nsAString_internal const&, nsAString_internal&) [AndroidBridge.cpp:da9ce9ea0d96 : 1563 + 0x1]
0x6046e0f4: libxul.so!NS_InvokeByIndex [xptcinvoke_arm.cpp:da9ce9ea0d96 : 165 + 0x1]
0x6046e144: libxul.so!XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) [XPCWrappedNative.cpp:da9ce9ea0d96 : 2139 + 0x1]
... JS code and then hard to tell
I can reproduce this crash on Samsung Galaxy Nexus (Android 4.2) on latest Nightly 04-08-2014 with the following steps:
1. Go to http://mozilla.github.com/webrtc-landing/gum_test.html
2. Tap audio & video
Keywords: steps-wanted
Hm, I could not reproduce this with the same setup (Galaxy Nexus, 4.2 build JOP40C, 2014-08-04 nightly). Did you use a new profile?
Flags: needinfo?(teodora.vermesan)
I cannot reproduce the issue with the steps provided in comment 20 on latest Nightly (2014-08-25) on Samsung Galaxy Nexus (Android 4.2, build JOP40D).
Flags: needinfo?(teodora.vermesan)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
File a new bug? This seems rather different from the original issue. Snorp, what are your thoughts?
Flags: needinfo?(snorp)
Yeah, this looks like a different crash/bug with the same signature. From the logs:

> 05:36:58     INFO -  07-16 05:36:34.507 W/dalvikvm( 2297): dvmFindClassByName rejecting 'android/media/MediaCodec'
> 05:36:58     INFO -  07-16 05:36:34.507 W/dalvikvm( 2297): JNI WARNING: JNI method called with exception pending
> 05:36:58     INFO -  07-16 05:36:34.507 W/dalvikvm( 2297):              in Ldalvik/system/NativeStart;.run:()V (NewGlobalRef)
> 05:36:58     INFO -  07-16 05:36:34.507 W/dalvikvm( 2297): Pending exception is:
> 05:36:58     INFO -  07-16 05:36:34.507 I/dalvikvm( 2297): java.lang.ClassNotFoundException: android/media/MediaCodec
> 05:36:58     INFO -  07-16 05:36:34.507 I/dalvikvm( 2297): 	at dalvik.system.BaseDexClassLoader.findClass(BaseDexClassLoader.java:61)
> 05:36:58     INFO -  07-16 05:36:34.507 I/dalvikvm( 2297): 	at java.lang.ClassLoader.loadClass(ClassLoader.java:501)
> 05:36:58     INFO -  07-16 05:36:34.507 I/dalvikvm( 2297): 	at java.lang.ClassLoader.loadClass(ClassLoader.java:461)
> 05:36:58     INFO -  07-16 05:36:34.507 I/dalvikvm( 2297): 	at dalvik.system.NativeStart.run(Native Method)
> 05:36:58     INFO -  07-16 05:36:34.507 I/dalvikvm( 2297): "Thread-106" prio=5 tid=21 NATIVE
> 05:36:58     INFO -  07-16 05:36:34.507 I/dalvikvm( 2297):   | group="main" sCount=0 dsCount=0 obj=0x41346718 self=0x10eb078
> 05:36:58     INFO -  07-16 05:36:34.507 I/dalvikvm( 2297):   | sysTid=2570 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=16741992
> 05:36:58     INFO -  07-16 05:36:34.507 I/dalvikvm( 2297):   | schedstat=( 0 0 0 ) utm=0 stm=0 core=0
> 05:36:58     INFO -  07-16 05:36:34.507 I/dalvikvm( 2297):   at dalvik.system.NativeStart.run(Native Method)
> 05:36:58     INFO -  07-16 05:36:34.507 I/dalvikvm( 2297):
> 05:36:58     INFO -  07-16 05:36:34.507 E/dalvikvm( 2297): VM aborting
Flags: needinfo?(snorp)
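Since several crashes share this signature but have different causes, a small script can help triage logcat dumps like the one quoted above. The following is a rough sketch only: the function name summarize_dalvik_abort and the returned dictionary keys are hypothetical, and it assumes the "W/dalvikvm( PID): message" line format shown in this log (it does not handle the "W dalvikvm: ..." format quoted earlier in this bug).

```python
import re

# Matches the tag and message of a logcat line such as
# "07-16 05:36:34.507 W/dalvikvm( 2297): VM aborting".
DALVIK_LINE = re.compile(r"[VDIWEF]/dalvikvm\(\s*\d+\):\s?(.*)$")


def summarize_dalvik_abort(log_text):
    """Collect hints about a Dalvik VM abort from logcat text."""
    summary = {
        "aborted": False,
        "exception_pending": False,
        "pending_exception": None,
    }
    expect_exception = False
    for line in log_text.splitlines():
        match = DALVIK_LINE.search(line)
        if not match:
            continue  # not a dalvikvm line
        message = match.group(1).strip()
        if message == "VM aborting":
            summary["aborted"] = True
        elif "JNI method called with exception pending" in message:
            summary["exception_pending"] = True
        elif message == "Pending exception is:":
            expect_exception = True  # the next non-empty message names it
        elif expect_exception and message:
            summary["pending_exception"] = message
            expect_exception = False
    return summary
```

Run against the log above, the pending exception line would be the ClassNotFoundException for android/media/MediaCodec, which is what distinguishes this report from the original issue.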