Closed Bug 1577192 Opened 10 months ago Closed 8 months ago

Crash in [@ java.lang.OutOfMemoryError: at dalvik.system.VMRuntime.newNonMovableArray(Native Method)]

Categories

(GeckoView :: General, defect, P1)

69 Branch
Unspecified
Android
defect

Tracking

(firefox-esr68 unaffected, firefox69 wontfix, firefox70 wontfix, firefox71 wontfix, firefox72 fixed)

RESOLVED FIXED
mozilla72

People

(Reporter: marcia, Assigned: droeh)

References

Details

(Keywords: crash, regression, Whiteboard: [geckoview:m1909] [geckoview:m1910])

Crash Data

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1569312 +++

Creating a new bug since this signature appears to be the #2 crash in Fenix according to https://crash-stats.mozilla.com/topcrashers/?product=Fenix&version=69.0b0&process_type=any

It also appears to be a different stack than what is showing for the Fennec bug.
Bug 1569312#c4 has a good summary of what we think is happening in Fenix.

java.lang.OutOfMemoryError: Failed to allocate a 7115052 byte allocation with 4967648 free bytes and 4MB until OOM
	at dalvik.system.VMRuntime.newNonMovableArray(Native Method)
	at android.graphics.Bitmap.nativeCreate(Native Method)
	at android.graphics.Bitmap.createBitmap(Bitmap.java:843)
	at android.graphics.Bitmap.createBitmap(Bitmap.java:820)
	at android.graphics.Bitmap.createBitmap(Bitmap.java:787)
	at org.mozilla.gecko.GeckoThread.runUiThreadCallback(Native Method)
	at org.mozilla.gecko.GeckoThread$1.run(GeckoThread.java:2)
	at android.os.Handler.handleCallback(Handler.java:742)
	at android.os.Handler.dispatchMessage(Handler.java:95)
	at android.os.Looper.loop(Looper.java:157)
	at android.app.ActivityThread.main(ActivityThread.java:5601)
	at java.lang.reflect.Method.invoke(Native Method)
	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:774)
	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:652)

The GV team's theory is that Fenix is probably taking page screenshots more often than it needs to. Is Fenix destroying the screenshot Bitmaps? Do we reuse the Bitmap? But would that screenshot theory explain why Fennec is affected? There were over 1500 Fennec crash reports with this signature last week.

The crash spiked in mid-July, which is when we released ARM64 Fennec 68. It's surprising that we would see more OOMs on ARM64. See a similar stack trace from Fenix crash bug 1574992.

See Also: → 1569312
Duplicate of this bug: 1574992
See Also: → 1577526

Adding this bug to GV's September sprint because this is a top crash in Fenix

Emily is investigating.

Assignee: nobody → etoop
Priority: P2 → P1
Whiteboard: [geckoview:fenix:m8] → [geckoview:m1909]

AndroidComponents only takes a screenshot on Page Load Stop, which should not be too many calls, and then discards the Bitmap. Fenix does nothing different here, as they don't use screenshots in the tabs tray, only site icons.

However, considering that Fenix doesn't use the screenshots, there seems no need to continue to take them on page load stop. AC has been updated to make this only occur when the ThumbnailsFeature is enabled, but Fenix currently has that feature enabled. I believe this is intended to be disabled as it is unused.

All in all, this is unlikely to be caused by screenshotting in GeckoView and soon this feature will be disabled in Fenix anyway.

The GV interface to capture the contents of GeckoView is provided by GeckoView.capturePixels(). This calls requestScreenPixels in the Compositor, which hooks into the renderer. This hook captures the requested data and then calls back into Java for a Bitmap object to copy the data into. The Bitmap object is then passed back as the result.

The allocation of the Bitmap object in Java is where the OOM crash occurs.
The typical maximum size of the Java heap on Android is 36MB, so larger render captures are in danger of blowing the heap.
This allocation is hindered by:
  The allocation must be contiguous so a fragmented heap effectively reduces available memory.
  The allocation can be effectively the full screen of a device. On a 4k device this would require just less than 32MB of contiguous heap.
  There is no way to pass in an existing Bitmap object for re-use or recycling.
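
As a rough check of the figures above, here is a minimal sketch of the arithmetic. It assumes 4 bytes per pixel (as in Android's ARGB_8888 config) and a 3840x2160 "4k" surface; the BitmapMemory class and its methods are illustrative names only, not part of GeckoView.

```java
// Hypothetical helper, not part of GeckoView: estimates the contiguous
// allocation needed for an uncompressed full-surface capture.
final class BitmapMemory {
    // Assumption: 4 bytes per pixel, as in Android's ARGB_8888 config.
    static final int BYTES_PER_PIXEL = 4;

    // Size in bytes of an uncompressed width x height capture.
    static long bytesFor(int width, int height) {
        return (long) width * height * BYTES_PER_PIXEL;
    }

    // Converts a byte count to mebibytes.
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long fourK = bytesFor(3840, 2160);      // assumed "4k" dimensions
        long heapLimit = 36L * 1024 * 1024;     // the 36MB heap figure quoted above
        System.out.println("4k capture bytes: " + fourK);
        System.out.println("4k capture MiB:   " + toMiB(fourK));
        System.out.println("fits in heap:     " + (fourK < heapLimit));
    }
}
```

Under these assumptions a full 4k capture is 33,177,600 bytes, about 31.6 MiB, which is most of a 36MB heap before anything else is allocated.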

It is suggested that this functionality is replaced with something that minimises memory usage and allows for the Java allocation to be controlled by the App.
For the API, a Screenshot class is suggested to be used and returned instead of Bitmap object.
The class would provide calls for requesting bitmap objects, image regions and scaling, and would allow bitmaps to be passed as arguments, either as a draw target or for memory recycling/reuse.
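
To illustrate the recycling idea only, here is a minimal sketch in which the app owns the capture target and hands the same buffer back for each capture. An int[] of pixels stands in for android.graphics.Bitmap so the example runs off-device; ReusableTarget and its methods are hypothetical and are not the proposed Screenshot class.

```java
// Hypothetical sketch of an app-controlled capture target. An int[] of ARGB
// pixels stands in for android.graphics.Bitmap so the example runs anywhere.
final class ReusableTarget {
    private int[] pixels = new int[0];
    private int allocations = 0;

    // Returns a buffer with room for width * height pixels, allocating a new
    // one only when the current buffer is too small.
    int[] obtain(int width, int height) {
        int needed = Math.multiplyExact(width, height);
        if (pixels.length < needed) {
            pixels = new int[needed];
            allocations++;
        }
        return pixels;
    }

    // Number of times a fresh buffer had to be allocated.
    int allocationCount() {
        return allocations;
    }

    public static void main(String[] args) {
        ReusableTarget target = new ReusableTarget();
        target.obtain(270, 480);   // first capture allocates
        target.obtain(270, 480);   // same size: buffer is reused
        target.obtain(135, 240);   // smaller capture: buffer is reused
        System.out.println("allocations: " + target.allocationCount());
    }
}
```

The point of the design is that repeated captures of the same or smaller size cost one allocation in total, rather than one large contiguous allocation per capture.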

Assignee: etoop → estirling

Tracking for GV's October sprint. Dylan says he will pick up Elliot's patch.

IIUC, Fenix is taking screenshots, but not actually using them. FxR plans to start using screenshots soon.

Assignee: estirling → droeh
Whiteboard: [geckoview:m1909] → [geckoview:m1910]
Whiteboard: [geckoview:m1910] → [geckoview:m1909] [geckoview:m1910]
Duplicate of this bug: 1588985

Marking this as fix optional for 71 to get this out of the Platform regression queue.

Pushed by droeh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1d8e31bd28db
Adds a screenshot api using ScreenshotBuilder. r=geckoview-reviewers,snorp,rbarker

I guess the failure was caused by the flakiness I wrote in bug 1588985 comment 0.

As far as I can tell, the old snapshot machinery (I haven't read the new one carefully) takes images from the compositor directly, so it's possible that the image doesn't yet reflect changes made just before the snapshot was taken. In the reftest harness (or similar ones) we normally wait for MozAfterPaint events to be sure the compositor has composited the changes. I believe the Android snapshot machinery itself doesn't need to wait for MozAfterPaint, but the test cases do.

(In reply to Hiroyuki Ikezoe (:hiro) from comment #12)

I guess the failure was caused by the flakiness I wrote in bug 1588985 comment 0.

As far as I can tell, the old snapshot machinery (I haven't read the new one carefully) takes images from the compositor directly, so it's possible that the image doesn't yet reflect changes made just before the snapshot was taken. In the reftest harness (or similar ones) we normally wait for MozAfterPaint events to be sure the compositor has composited the changes. I believe the Android snapshot machinery itself doesn't need to wait for MozAfterPaint, but the test cases do.

Thanks, this was very helpful! I think waiting for the ContentDelegate.onFirstContentfulPaint callback in Java is sufficient for the test cases; I was no longer able to reproduce this locally once I waited for that. I've let it run for a few iterations on try just to be safe before re-landing it.

Flags: needinfo?(droeh)
Pushed by droeh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bb34b6826ada
Adds a screenshot api using ScreenshotBuilder. r=geckoview-reviewers,snorp,rbarker
Backout by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ffc26be8527d
Backed out changeset bb34b6826ada for geckoview failures on VerticalClippingTest.

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&tochange=ffc26be8527d57b51e4ceaa03a5047e72efcb798&fromchange=bb34b6826ada3fdd31567c9bbfac1deba4d28256&searchStr=android%2C7.0%2Cx86-64%2Cdebug%2Ctest-android-em-7.0-x86_64%2Fdebug-geckoview-junit-e10s%2C%28gv-junit%29

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=274160747&repo=autoland&lineNumber=10087

Backout link: https://hg.mozilla.org/integration/autoland/rev/ffc26be8527d57b51e4ceaa03a5047e72efcb798

[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: test=verticalClippingSucceeds
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: class=org.mozilla.geckoview.test.VerticalClippingTest
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stack=org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutRunnable.run(UiThreadUtils.java:58)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at android.os.Handler.handleCallback(Handler.java:751)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at android.os.Handler.dispatchMessage(Handler.java:95)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.util.UiThreadUtils.waitForCondition(UiThreadUtils.java:161)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.waitUntilCalled(GeckoSessionTestRule.java:1524)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.waitUntilCalled(GeckoSessionTestRule.java:1466)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.waitUntilCalled(GeckoSessionTestRule.java:1419)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.VerticalClippingTest.verticalClippingSucceeds(VerticalClippingTest.kt:69)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at java.lang.reflect.Method.invoke(Native Method)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.rule.GeckoSessionTestRule$2.lambda$evaluate$0$GeckoSessionTestRule$2(GeckoSessionTestRule.java:1257)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at org.mozilla.geckoview.test.rule.-$$Lambda$GeckoSessionTestRule$2$mzZNnl5Bu5F2_4xGxj0DHU4J33I.run(lambda)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at android.app.Instrumentation$SyncRunnable.run(Instrumentation.java:1950)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at android.os.Handler.handleCallback(Handler.java:751)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at android.os.Handler.dispatchMessage(Handler.java:95)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at android.os.Looper.loop(Looper.java:154)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at android.app.ActivityThread.main(ActivityThread.java:6077)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at java.lang.reflect.Method.invoke(Native Method)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:866)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:756)
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test |
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: current=472
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS_CODE: -2
[task 2019-11-01T21:18:59.576Z] 21:18:59 WARNING - TEST-UNEXPECTED-FAIL | org.mozilla.geckoview.test.VerticalClippingTest.verticalClippingSucceeds | status -2
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - TEST-INFO took 31264ms
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: numtests=546
[task 2019-11-01T21:18:59.576Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stream=
[task 2019-11-01T21:18:59.577Z] 21:18:59 INFO - org.mozilla.geckoview.test | org.mozilla.geckoview.test.WebExecutorTest:
[task 2019-11-01T21:18:59.577Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
[task 2019-11-01T21:18:59.577Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: test=testFetchUnknownHost
[task 2019-11-01T21:18:59.577Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: class=org.mozilla.geckoview.test.WebExecutorTest
[task 2019-11-01T21:18:59.577Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: current=473
[task 2019-11-01T21:18:59.577Z] 21:18:59 INFO - org.mozilla.geckoview.test | INSTRUMENTATION_STATUS_CODE: 1

Flags: needinfo?(droeh)
Pushed by droeh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2a8ed7fb920c
Adds a screenshot api using ScreenshotBuilder. r=geckoview-reviewers,snorp,rbarker
Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72

Does this need a Beta uplift request for GV71?

(In reply to Ryan VanderMeulen [:RyanVM] from comment #19)

Does this need a Beta uplift request for GV71?

Not as far as I know.

Flags: needinfo?(droeh)

Adds a screenshot api using ScreenshotBuilder. r=geckoview-reviewers,snorp,rbarker

Dylan, do Fenix or A-C need to update any of their code to use the new ScreenshotBuilder API?

Flags: needinfo?(droeh)

(In reply to Chris Peterson [:cpeterson] from comment #21)

Adds a screenshot api using ScreenshotBuilder. r=geckoview-reviewers,snorp,rbarker

Dylan, do Fenix or A-C need to update any of their code to use the new ScreenshotBuilder API?

The old code will continue to work, but it would probably be good to update to take advantage of the new functionality.

Sebastian: The patch for this adds the ability to set a source rectangle for screenshots, the ability to set the output size for screenshots in a few different ways (directly, aspect-preserving, and scale of the source), and allows you to recycle an existing bitmap as the output. If (as we think) the crash here was due to the size of screenshots, setting the output size should help with this.
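
To show why setting the output size should help, here is a minimal sketch of how the three sizing modes could map to output dimensions and memory. It is not the ScreenshotBuilder API: OutputSize and its methods are hypothetical, and 4 bytes per pixel and a 1080x1920 source are assumed.

```java
// Hypothetical helper, not the GeckoView ScreenshotBuilder API: it only
// illustrates how three sizing modes could map to output dimensions.
final class OutputSize {
    final int width;
    final int height;

    OutputSize(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // Mode 1: the caller states the output size directly.
    static OutputSize direct(int outWidth, int outHeight) {
        return new OutputSize(outWidth, outHeight);
    }

    // Mode 2: the caller gives a width; the height keeps the source aspect ratio.
    static OutputSize aspectPreserving(int srcWidth, int srcHeight, int outWidth) {
        int outHeight = (int) Math.round((double) outWidth * srcHeight / srcWidth);
        return new OutputSize(outWidth, outHeight);
    }

    // Mode 3: the output is the source scaled by a factor.
    static OutputSize scaled(int srcWidth, int srcHeight, double scale) {
        return new OutputSize((int) Math.round(srcWidth * scale),
                              (int) Math.round(srcHeight * scale));
    }

    // Assumption: 4 bytes per pixel (ARGB_8888).
    long bytes() {
        return (long) width * height * 4;
    }

    public static void main(String[] args) {
        OutputSize full = direct(1080, 1920);
        OutputSize thumb = scaled(1080, 1920, 0.25);
        System.out.println("full:  " + full.bytes() + " bytes");
        System.out.println("thumb: " + thumb.bytes() + " bytes");
    }
}
```

With these assumptions, a quarter-scale capture of a 1080x1920 source needs 518,400 bytes instead of 8,294,400, a 16x smaller allocation.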

Flags: needinfo?(droeh) → needinfo?(s.kaspari)
Regressions: 1533724
Regressions: 1594142
Flags: needinfo?(s.kaspari)