Closed Bug 751262 Opened 13 years ago Closed 13 years ago

Dalvik VM aborts and Fennec dies sometimes after Fennec and Android device are idle

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
critical

Tracking

(firefox14 fixed, firefox15 verified, blocking-fennec1.0 +)

VERIFIED FIXED
Firefox 15
Tracking Status
firefox14 --- fixed
firefox15 --- verified
blocking-fennec1.0 --- +

People

(Reporter: aaronmt, Assigned: kats)

References

Details

(Keywords: crash, crashreportid, regression, Whiteboard: [native-crash])

Crash Data

Attachments

(5 files, 1 obsolete file)

In the past two days I have seen Fennec randomly die twice with a Dalvik VM abort after Fennec and my Android device are completely idle. Both times, Fennec was the foreground application and not on about:home. See attached log W/dalvikvm(21970): JNI WARNING: too many PopLocalFrame calls W/System.err(21970): java.lang.OutOfMemoryError: out of stack in JNI PushLocalFrame W/System.err(21970): at org.mozilla.gecko.GeckoAppShell.nativeRun(Native Method) W/System.err(21970): at org.mozilla.gecko.GeckoAppShell.runGecko(GeckoAppShell.java:488) W/System.err(21970): at org.mozilla.gecko.GeckoThread.run(GeckoThread.java:112) bp-94d1af8c-5f67-41f1-bb85-ae3122120502 https://crash-stats.mozilla.com/report/index/bp-94d1af8c-5f67-41f1-bb85-ae3122120502 - Samsung Galaxy Nexus (Android 4.0.4) Nightly (05/02) Aurora (05/02)
qawanted to try to get STR
Assignee: nobody → bugmail.mozilla
Keywords: qawanted
Attached file Full raw log (Nightly 05/03) (obsolete) —
I can reproduce this visiting http://www.iex.nl/ (after the Android app message), and leaving my phone idle for about five minutes. When my screen turns off GeckoAppShell continues to taking whole-screen screenshots, is that right?
Last log line before crashing is: W/Surface ( 1781): Surface.finalize() has work. You should have called release() (4347488, 0) Sounds like a case for super-snorp!
blocking-fennec1.0: ? → +
Another full raw log (Nightly 05/05) - idling on a mobile Gizmodo article for about 15 minutes.
Attachment #620683 - Attachment is obsolete: true
The new log indicates a dependency on bug 748531. The nightly from the 5th didn't have the patches from that bug; an updated log with a newer nightly might be useful in pinpointing the problem.
Depends on: 748531
Attached patch patchSplinter Review
this is a speculative fix based on what cpeterson described the bug to be on irc
Attachment #622225 - Flags: review?
Yet another raw log with some "JNI WARNING: too many PopLocalFrame calls" proceeding a crash.
Attachment #622225 - Flags: review? → review?(bugmail.mozilla)
Comment on attachment 622225 [details] [diff] [review] patch Review of attachment 622225 [details] [diff] [review]: ----------------------------------------------------------------- The patch itself is fine, but it won't fix anything except remove a bunch of JNI warnings from the log. The root cause is still us running out of memory somewhere and causing the PushLocalFrame to fail. ::: widget/android/AndroidBridge.h @@ +569,3 @@ > // any local refs that you need to keep around in global refs! > void Purge() { > if (mJNIEnv) { Do we even use Purge anywhere? Can we just axe it?
Attachment #622225 - Flags: review?(bugmail.mozilla) → review+
Looking around for other people's use of NewDirectByteBuffer, I found this example: http://svn.apache.org/repos/asf/tomcat/native/trunk/native/src/bb.c Note how they call NewDirectByteBuffer and if it returns null, they free the memory they passed in. This could leak massively in our case (looking at AndroidBridge::TakeScreenshot as an example) because Java_org_mozilla_gecko_GeckoAppShell_allocateDirectBuffer does a malloc and directly passes the result to NewDirectByteBuffer. If the malloc succeeds but the NewDirectByteBuffer fails, the malloc'd memory will be leaked. Thoughts?
(In reply to Kartikaya Gupta (:kats) from comment #10) > Looking around for other people's use of NewDirectByteBuffer, I found this > example: > > http://svn.apache.org/repos/asf/tomcat/native/trunk/native/src/bb.c > > Note how they call NewDirectByteBuffer and if it returns null, they free the > memory they passed in. This could leak massively in our case (looking at > AndroidBridge::TakeScreenshot as an example) because > Java_org_mozilla_gecko_GeckoAppShell_allocateDirectBuffer does a malloc and > directly passes the result to NewDirectByteBuffer. If the malloc succeeds > but the NewDirectByteBuffer fails, the malloc'd memory will be leaked. > > Thoughts? You are correct, it would leak. But do we know that's actually happening in practice very often? Also, this would not have any impact on the Java heap.
I have no idea what's actually happening in practice :)
Attached patch moar patchSplinter Review
Attachment #622396 - Flags: review?(blassey.bugs)
Attachment #622396 - Flags: review?(blassey.bugs) → review+
In the top log, the following gets dumped: W/dalvikvm(21970): JNI local reference table (0xfcd580) dump: W/dalvikvm(21970): Last 10 entries (of 512): W/dalvikvm(21970): 511: 0x416ac280 java.lang.String "System.err" W/dalvikvm(21970): 510: 0x40a6e478 java.lang.Class<android.util.Log> W/dalvikvm(21970): 509: 0x416a6e68 java.lang.OutOfMemoryError W/dalvikvm(21970): 508: 0x4195e530 java.lang.OutOfMemoryError W/dalvikvm(21970): 507: 0x41b7a658 java.lang.OutOfMemoryError W/dalvikvm(21970): 506: 0x41823a70 java.lang.OutOfMemoryError W/dalvikvm(21970): 505: 0x41a2c358 java.lang.OutOfMemoryError W/dalvikvm(21970): 504: 0x41a2bc60 java.lang.OutOfMemoryError W/dalvikvm(21970): 503: 0x4195ea38 java.lang.OutOfMemoryError W/dalvikvm(21970): 502: 0x417ac950 java.lang.OutOfMemoryError W/dalvikvm(21970): Summary: W/dalvikvm(21970): 2 of java.lang.Class (2 unique instances) W/dalvikvm(21970): 2 of java.lang.String (2 unique instances) W/dalvikvm(21970): 126 of java.lang.OutOfMemoryError (126 unique instances) W/dalvikvm(21970): 382 of java.nio.ReadWriteDirectByteBuffer (382 unique instances) I'd say the likely cause for actually running out of memory is the 382 instances of ReadWriteDirectByteBuffer. We need to find out where those are coming from.
Target Milestone: Firefox 15 → ---
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #15) > I'd say the likely cause for actually running out of memory is the 382 > instances of ReadWriteDirectByteBuffer. We need to find out where those are > coming from. I don't think the ReadWriteDirectByteBuffers are taking up a significant amount of space, I think those are just local refs left behind in the JNI table from us not handling the OOMs properly. That is, the OOM happens first and then we start leaking those refs in the local ref table, and then we crash when the table fills up, rather than the buffers being leaked and causing the OOMs.
Also, if you want the bug to stay open you should take it and put a whiteboard annotation, rather than changing the target milestone.
Target Milestone: --- → Firefox 15
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment on attachment 622225 [details] [diff] [review] patch [Approval Request Comment] Regression caused by (bug #): User impact if declined: possibly crashes? not really sure if this actually helps or not. Testing completed (on m-c, etc.): on m-c Risk to taking this patch (and alternatives if risky): android-only, fairly low risk String changes made by this patch: none
Attachment #622225 - Flags: approval-mozilla-aurora?
Comment on attachment 622396 [details] [diff] [review] moar patch [Approval Request Comment] Regression caused by (bug #): User impact if declined: potential memory leak Testing completed (on m-c, etc.): on m-c Risk to taking this patch (and alternatives if risky): android-only, fairly low risk String changes made by this patch: none
Attachment #622396 - Flags: approval-mozilla-aurora?
Verified Fixed via (Nightly 05/14), Galaxy Nexus (4.0.4)
Attachment #622225 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #622396 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
The issue has been fixed and uplifted to Aurora. Removing regression-wanted.
Status: RESOLVED → VERIFIED
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: