Closed
Bug 751262
Opened 13 years ago
Closed 13 years ago
Dalvik VM aborts and Fennec dies sometimes after Fennec and Android device are idle
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(firefox14 fixed, firefox15 verified, blocking-fennec1.0 +)
VERIFIED
FIXED
Firefox 15
People
(Reporter: aaronmt, Assigned: kats)
References
Details
(Keywords: crash, crashreportid, regression, Whiteboard: [native-crash])
Crash Data
Attachments
(5 files, 1 obsolete file)
|
52.42 KB,
text/plain
|
Details | |
|
179.97 KB,
text/plain
|
Details | |
|
1.75 KB,
patch
|
kats
:
review+
mfinkle
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
|
809.18 KB,
text/plain
|
Details | |
|
1.41 KB,
patch
|
blassey
:
review+
mfinkle
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
In the past two days I have seen Fennec randomly die twice with a Dalvik VM abort after Fennec and my Android device are completely idle. Both times, Fennec was the foreground application and not on about:home.
See attached log
W/dalvikvm(21970): JNI WARNING: too many PopLocalFrame calls
W/System.err(21970): java.lang.OutOfMemoryError: out of stack in JNI PushLocalFrame
W/System.err(21970): at org.mozilla.gecko.GeckoAppShell.nativeRun(Native Method)
W/System.err(21970): at org.mozilla.gecko.GeckoAppShell.runGecko(GeckoAppShell.java:488)
W/System.err(21970): at org.mozilla.gecko.GeckoThread.run(GeckoThread.java:112)
bp-94d1af8c-5f67-41f1-bb85-ae3122120502
https://crash-stats.mozilla.com/report/index/bp-94d1af8c-5f67-41f1-bb85-ae3122120502
-
Samsung Galaxy Nexus (Android 4.0.4)
Nightly (05/02)
Aurora (05/02)
Comment 1•13 years ago
|
||
qawanted to try to get STR
Assignee: nobody → bugmail.mozilla
Keywords: qawanted
Comment 2•13 years ago
|
||
https://crash-stats.mozilla.com/report/index/bp-94d1af8c-5f67-41f1-bb85-ae3122120502
0 libdvm.so libdvm.so@0x5098e
I guess related to bug 730890, bug 750965 and bug 751343.
| Reporter | ||
Comment 3•13 years ago
|
||
I can reproduce this visiting http://www.iex.nl/ (after the Android app message), and leaving my phone idle for about five minutes.
When my screen turns off GeckoAppShell continues to taking whole-screen screenshots, is that right?
| Assignee | ||
Comment 4•13 years ago
|
||
Last log line before crashing is:
W/Surface ( 1781): Surface.finalize() has work. You should have called release() (4347488, 0)
Sounds like a case for super-snorp!
Updated•13 years ago
|
blocking-fennec1.0: ? → +
| Reporter | ||
Comment 5•13 years ago
|
||
Another full raw log (Nightly 05/05) - idling on a mobile Gizmodo article for about 15 minutes.
| Reporter | ||
Updated•13 years ago
|
Attachment #620683 -
Attachment is obsolete: true
| Assignee | ||
Comment 6•13 years ago
|
||
The new log indicates a dependency on bug 748531. The nightly from the 5th didn't have the patches from that bug; an updated log with a newer nightly might be useful in pinpointing the problem.
Depends on: 748531
Comment 7•13 years ago
|
||
this is a speculative fix based on what cpeterson described the bug to be on irc
Attachment #622225 -
Flags: review?
Comment 8•13 years ago
|
||
Yet another raw log with some "JNI WARNING: too many PopLocalFrame calls" proceeding a crash.
Updated•13 years ago
|
Attachment #622225 -
Flags: review? → review?(bugmail.mozilla)
| Assignee | ||
Comment 9•13 years ago
|
||
Comment on attachment 622225 [details] [diff] [review]
patch
Review of attachment 622225 [details] [diff] [review]:
-----------------------------------------------------------------
The patch itself is fine, but it won't fix anything except remove a bunch of JNI warnings from the log. The root cause is still us running out of memory somewhere and causing the PushLocalFrame to fail.
::: widget/android/AndroidBridge.h
@@ +569,3 @@
> // any local refs that you need to keep around in global refs!
> void Purge() {
> if (mJNIEnv) {
Do we even use Purge anywhere? Can we just axe it?
Attachment #622225 -
Flags: review?(bugmail.mozilla) → review+
| Assignee | ||
Comment 10•13 years ago
|
||
Looking around for other people's use of NewDirectByteBuffer, I found this example:
http://svn.apache.org/repos/asf/tomcat/native/trunk/native/src/bb.c
Note how they call NewDirectByteBuffer and if it returns null, they free the memory they passed in. This could leak massively in our case (looking at AndroidBridge::TakeScreenshot as an example) because Java_org_mozilla_gecko_GeckoAppShell_allocateDirectBuffer does a malloc and directly passes the result to NewDirectByteBuffer. If the malloc succeeds but the NewDirectByteBuffer fails, the malloc'd memory will be leaked.
Thoughts?
Comment 11•13 years ago
|
||
(In reply to Kartikaya Gupta (:kats) from comment #10)
> Looking around for other people's use of NewDirectByteBuffer, I found this
> example:
>
> http://svn.apache.org/repos/asf/tomcat/native/trunk/native/src/bb.c
>
> Note how they call NewDirectByteBuffer and if it returns null, they free the
> memory they passed in. This could leak massively in our case (looking at
> AndroidBridge::TakeScreenshot as an example) because
> Java_org_mozilla_gecko_GeckoAppShell_allocateDirectBuffer does a malloc and
> directly passes the result to NewDirectByteBuffer. If the malloc succeeds
> but the NewDirectByteBuffer fails, the malloc'd memory will be leaked.
>
> Thoughts?
You are correct, it would leak. But do we know that's actually happening in practice very often? Also, this would not have any impact on the Java heap.
| Assignee | ||
Comment 12•13 years ago
|
||
I have no idea what's actually happening in practice :)
| Assignee | ||
Comment 13•13 years ago
|
||
Attachment #622396 -
Flags: review?(blassey.bugs)
Updated•13 years ago
|
Attachment #622396 -
Flags: review?(blassey.bugs) → review+
| Assignee | ||
Comment 14•13 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/111174836d83
https://hg.mozilla.org/integration/mozilla-inbound/rev/a439eb29c5eb
status-firefox15:
--- → fixed
Target Milestone: --- → Firefox 15
Comment 15•13 years ago
|
||
In the top log, the following gets dumped:
W/dalvikvm(21970): JNI local reference table (0xfcd580) dump:
W/dalvikvm(21970): Last 10 entries (of 512):
W/dalvikvm(21970): 511: 0x416ac280 java.lang.String "System.err"
W/dalvikvm(21970): 510: 0x40a6e478 java.lang.Class<android.util.Log>
W/dalvikvm(21970): 509: 0x416a6e68 java.lang.OutOfMemoryError
W/dalvikvm(21970): 508: 0x4195e530 java.lang.OutOfMemoryError
W/dalvikvm(21970): 507: 0x41b7a658 java.lang.OutOfMemoryError
W/dalvikvm(21970): 506: 0x41823a70 java.lang.OutOfMemoryError
W/dalvikvm(21970): 505: 0x41a2c358 java.lang.OutOfMemoryError
W/dalvikvm(21970): 504: 0x41a2bc60 java.lang.OutOfMemoryError
W/dalvikvm(21970): 503: 0x4195ea38 java.lang.OutOfMemoryError
W/dalvikvm(21970): 502: 0x417ac950 java.lang.OutOfMemoryError
W/dalvikvm(21970): Summary:
W/dalvikvm(21970): 2 of java.lang.Class (2 unique instances)
W/dalvikvm(21970): 2 of java.lang.String (2 unique instances)
W/dalvikvm(21970): 126 of java.lang.OutOfMemoryError (126 unique instances)
W/dalvikvm(21970): 382 of java.nio.ReadWriteDirectByteBuffer (382 unique instances)
I'd say the likely cause for actually running out of memory is the 382 instances of ReadWriteDirectByteBuffer. We need to find out where those are coming from.
status-firefox15:
fixed → ---
Target Milestone: Firefox 15 → ---
| Assignee | ||
Comment 16•13 years ago
|
||
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #15)
> I'd say the likely cause for actually running out of memory is the 382
> instances of ReadWriteDirectByteBuffer. We need to find out where those are
> coming from.
I don't think the ReadWriteDirectByteBuffers are taking up a significant amount of space, I think those are just local refs left behind in the JNI table from us not handling the OOMs properly. That is, the OOM happens first and then we start leaking those refs in the local ref table, and then we crash when the table fills up, rather than the buffers being leaked and causing the OOMs.
| Assignee | ||
Comment 17•13 years ago
|
||
Also, if you want the bug to stay open you should take it and put a whiteboard annotation, rather than changing the target milestone.
status-firefox15:
--- → fixed
Target Milestone: --- → Firefox 15
Comment 18•13 years ago
|
||
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 19•13 years ago
|
||
| Assignee | ||
Comment 20•13 years ago
|
||
Comment on attachment 622225 [details] [diff] [review]
patch
[Approval Request Comment]
Regression caused by (bug #):
User impact if declined: possibly crashes? not really sure if this actually helps or not.
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): android-only, fairly low risk
String changes made by this patch: none
Attachment #622225 -
Flags: approval-mozilla-aurora?
| Assignee | ||
Comment 21•13 years ago
|
||
Comment on attachment 622396 [details] [diff] [review]
moar patch
[Approval Request Comment]
Regression caused by (bug #):
User impact if declined: potential memory leak
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): android-only, fairly low risk
String changes made by this patch: none
Attachment #622396 -
Flags: approval-mozilla-aurora?
| Reporter | ||
Comment 22•13 years ago
|
||
Verified Fixed via (Nightly 05/14), Galaxy Nexus (4.0.4)
Keywords: qawanted
| Reporter | ||
Updated•13 years ago
|
status-firefox14:
--- → affected
Updated•13 years ago
|
Attachment #622225 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•13 years ago
|
Attachment #622396 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
| Assignee | ||
Comment 23•13 years ago
|
||
Comment 24•13 years ago
|
||
The issue has been fixed and uplifted to Aurora. Removing regression-wanted.
Keywords: regressionwindow-wanted
Updated•13 years ago
|
Status: RESOLVED → VERIFIED
Updated•4 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•