Closed
Bug 1416015
Opened 7 years ago
Closed 7 years ago
Broken tiles and crashes with SurfaceTexture tiles enabled
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
RESOLVED
FIXED
mozilla59
People
(Reporter: esawin, Assigned: jnicol)
References
(Blocks 1 open bug, )
Details
Attachments
(4 files)
STR (Nexus 6P, Android 8.0, gfx.use-surfacetexture-textures enabled) 1. Open an article on zeit.de (reproduced on http://www.zeit.de/politik/ausland/2017-11/saudi-arabien-mohammed-bin-salman-massenverhaftungen) 2. Pinch-zoom in 3. Slowly pinch-zoom out until surrounding tiles stop updating [1] (remain white or grey) 4. (Repeat 2. and 3. until issue occurs) At this point the browser tab is unresponsive, reloading won't work, neither will opening new tabs (remain white). Only restarting the browser fixes the state. Sometimes the issue reproduces with a crash instead [2]. [1] https://photos.google.com/share/AF1QipNEW4QMmQiELMS3J3L-5vC-VEXpVJOjMZvsFlM2nk_fF9edm74oqvw252e33zmmkQ?key=MGktRmFTcUxqeHFtcThRU2k0NENhSkNEc2pmV0RR [2] https://crash-stats.mozilla.com/report/index/00b1a079-9bd6-4f7c-b550-e32990171109
Reporter | ||
Comment 1•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Attachment #8927048 -
Attachment mime type: text/html → text/plain
Assignee | ||
Comment 2•7 years ago
|
||
Thanks for testing. The crash is an oom, so ANativeWindow_lock fails, and we MOZ_CRASH. We need to handle the failure properly. Also need to investigate whether we're doing something to cause the oom in the first place too. Nothing in the log for the hang. Will dig deeper on monday. To confirm, can you reproduce this easily?
Assignee: nobody → jnicol
Reporter | ||
Comment 3•7 years ago
|
||
Yes, I can easily reproduce it when pinch-zooming. It also seems independent from the content, as I can reproduce it on most sites that allow zooming that I've tried.
Eugen do you get a bunch of failures in logcat related releaseTexImage()? If that goes south when we're destroying a tile, it could mean the memory doesn't get freed correctly. I remember there was an issue where we weren't destroying with the right GLContext current, but I thought maybe that was fixed? Something to look at at least.
Whoops I really meant detachFromGLContext(), which we call during destruction. And there are lots of failures there in the logs.
Reporter | ||
Comment 6•7 years ago
|
||
This one seems to have better logs, including E GLConsumer: [SurfaceTexture-0-13300-693] detachFromContext: invalid current EGLDisplay E GLConsumer: [SurfaceTexture-0-13300-735] syncForReleaseLocked: error dup'ing native fence fd: 0x3000 E GLConsumer: [SurfaceTexture-0-13300-735] syncForReleaseLocked failed (slot=1), err=-2147483648 W GeckoSurfaceTexture: releaseTexImage() failed W GeckoSurfaceTexture: java.lang.RuntimeException: Error during updateTexImage
OK that looks like we're also running out of fds, which indicates a leak.
Assignee | ||
Comment 8•7 years ago
|
||
I'm first going to make this resilient to out-of-memory situations, then we can find and fix the leak. ANativeWindow_lock returns -ENOMEM when we run out of memory. We should simply return false rather than crashing. That will fix the crash. The hang occurs when ReleaseTexImage (in SurfaceTextureHost::NotifyNotUsed) fails. > W Adreno-GSL: <gsl_syncobj_dup_fd:265>: 0xb28e4cd8 dup(212) error 24 Too many open files > GLConsumer: [SurfaceTexture-0-20515-729] syncForReleaseLocked: error dup'ing native fence fd: 0x3000 > E GLConsumer: [SurfaceTexture-0-20515-729] syncForReleaseLocked failed (slot=1), err=-2147483648 > W GeckoSurfaceTexture: releaseTexImage() failed > W GeckoSurfaceTexture: java.lang.RuntimeException: Error during updateTexImage (see logcat for details) > W GeckoSurfaceTexture: at android.graphics.SurfaceTexture.nativeReleaseTexImage(Native Method) > W GeckoSurfaceTexture: at android.graphics.SurfaceTexture.releaseTexImage(SurfaceTexture.java:252) > W GeckoSurfaceTexture: at org.mozilla.gecko.gfx.GeckoSurfaceTexture.releaseTexImage(GeckoSurfaceTexture.java:111) (There is a copy paste error in the android source which makes it say "error during updateTexImage" when it means "error during releaseTexImage" - https://android.googlesource.com/platform/frameworks/base/+/master/core/jni/android/graphics/SurfaceTexture.cpp#342) ANativeWindow_lock hangs waiting for a buffer, since it was not released. I'm not sure how best to fix this. It might be possible to get the TextureHost to not release its read lock, making the TiledContentClient then discard the buffer.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 10•7 years ago
|
||
An explanation of the problem for if future people come across this bug: ReleaseTexImage() was failing to create a sync object, because we ran out of FDs. We discovered this occurred because the fds are only closed when the SurfaceTexture object was finalized by the garbage collector, and it was quite easy to reach the limit before this happens.
Assignee | ||
Comment 11•7 years ago
|
||
mozreview-review |
Comment on attachment 8930292 [details] Bug 1416015 - Ensure SurfaceTexture desctruction happens correctly https://reviewboard.mozilla.org/r/201364/#review206856 ::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/gfx/GeckoSurfaceTexture.java:48 (Diff revision 1) > private GeckoSurfaceTexture(int handle, boolean singleBufferMode) { > super(0, singleBufferMode); > init(handle, singleBufferMode); > } > > + @Override I'm not sure how things are done in Java, but should this perhaps be protected, and "throws Throwable" (with the try-catch at the call site) so that it matches the super class' definition? ::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/gfx/GeckoSurfaceTexture.java:185 (Diff revision 1) > + } > - } > + } > - } > + } > > - release(); > + @WrapForJNI > + public static void destroyUnused(long context) { I don't see what doing this gets us over simply calling DetachFromGLContext when the TextureHost drops its reference to the SurfaceTexture. Especially now that the TextureHost is responsible for attaching it in the first place, that seems easier, and conceptually cleaner to me, and frees the resources sooner. We could assert that mAttachedContext == 0 when the final reference is dropped, to catch any mistakes.
Comment 12•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8930292 [details] Bug 1416015 - Ensure SurfaceTexture desctruction happens correctly https://reviewboard.mozilla.org/r/201364/#review206856 > I'm not sure how things are done in Java, but should this perhaps be protected, and "throws Throwable" (with the try-catch at the call site) so that it matches the super class' definition? Yup, these are good points, I'll change it. > I don't see what doing this gets us over simply calling DetachFromGLContext when the TextureHost drops its reference to the SurfaceTexture. Especially now that the TextureHost is responsible for attaching it in the first place, that seems easier, and conceptually cleaner to me, and frees the resources sooner. > > We could assert that mAttachedContext == 0 when the final reference is dropped, to catch any mistakes. The problem is that the SurfaceTexture is ref counted, and the Compositor isn't always the last reference holder. The lifetime of the Surface on the content side is shared with that of its SurfaceTexture, so sometimes the content thread/process is the last to unref a SurfaceTexture. In that case, the decrementUse() is called on a binder thread and not the compositor thread, so this destroyUnused() pattern is used to make sure we do the destruction from the compositor. It would be nicer if we could just queue a Runnable to run on the Compositor thread, but we don't have a Java Looper or anything like that to allow this.
Assignee | ||
Comment 13•7 years ago
|
||
mozreview-review |
Comment on attachment 8930292 [details] Bug 1416015 - Ensure SurfaceTexture desctruction happens correctly https://reviewboard.mozilla.org/r/201364/#review206918
Attachment #8930292 -
Flags: review?(jnicol) → review+
Comment 14•7 years ago
|
||
Pushed by jwillcox@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/efc5b1e03ead Ensure SurfaceTexture desctruction happens correctly r=jnicol
Comment 15•7 years ago
|
||
Pushed by jwillcox@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/840cf9044de3 Fix checkstyle failure r=me
Comment 16•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/efc5b1e03ead https://hg.mozilla.org/mozilla-central/rev/840cf9044de3
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox59:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
I think this needs backed out, I'm still getting hangs.
Oops wrong bug.
Comment 19•7 years ago
|
||
I guess this can ride the trains to 59, please go ahead and request uplift to beta if not.
Assignee | ||
Comment 20•7 years ago
|
||
This is still off behind a pref anyway so no need to uplift
You need to log in
before you can comment on or make changes to this bug.
Description
•