Closed Bug 741319 Opened 12 years ago Closed 12 years ago

Adreno200-EGL: eglLockWindowSurface: failed to map the memory

Categories

(Firefox for Android Graveyard :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(blocking-fennec1.0 beta+)

VERIFIED FIXED
Firefox 14
Tracking Status
blocking-fennec1.0 --- beta+

People

(Reporter: xti, Assigned: ajuma)

Details

Attachments

(2 files, 2 obsolete files)

Attached file logcat
Firefox 14.0a1 (2012-04-01)
Device: HTC Desire
OS: Android 2.2

Steps to reproduce:
1. Open Fennec
2. Go to https://wiki.mozilla.org/Mobile/Notes/28-Mar-2012#Chris_Lord_.28cwiiis.29
3. Pan and zoom the page until the screen turns to persistent black

Expected result:
No errors should occur.

Actual result:
Many "E/Adreno200-EGL( 5842): eglLockWindowSurface: failed to map the memory" errors appear in the console.
If step 3 is continued, these errors keep occurring until OOM.
Also, the issue persists if the page is reloaded.
When I tap the URL bar, I get these errors in the console:

I/ActivityManager( 1333): Displayed org.mozilla.fennec/org.mozilla.gecko.AwesomeBar: +349ms
E/Surface ( 5655): surface (identity=571) is invalid, err=-19 (No such device)
E/Surface ( 5655): surface (identity=571) is invalid, err=-19 (No such device)
E/Surface ( 5655): surface (identity=571) is invalid, err=-19 (No such device)
E/Adreno200-EGL( 5655): egliSwapWindowSurface: unable to dequeue native buffer
E/Surface ( 5655): surface (identity=571) is invalid, err=-19 (No such device)
E/Surface ( 5655): surface (identity=571) is invalid, err=-19 (No such device)
E/Adreno200-EGL( 5655): eglLockWindowSurface: unable to dequeue native buffer
E/Surface ( 5655): surface (identity=571) is invalid, err=-19 (No such device)
I/GeckoApp( 5655): stop

Should I file another bug or is it related to this one?
blocking-fennec1.0: --- → ?
(In reply to Cristian Nicolae (:xti) from comment #1)
> Should I file another bug or is it related to this one?

It seems related to this one.
Discussed in triage - beta+ blocking and ali to own or reassign
Assignee: nobody → ajuma
blocking-fennec1.0: ? → beta+
While I haven't been able to reproduce this so far, I believe I know what's happening. The error "eglLockWindowSurface: failed to map the memory" can occur on Adreno devices when we continually generate textures (using glGenTextures) but either fail to delete them (using glDeleteTextures) or delete them using a different context from the one used to create them. And, indeed, it turns out our texture deletion in ~TextureImageEGL and ~BasicTextureImage is just plain wrong for OMTC: we check if we're on the main thread (which, of course, we're not), and if not, we delete the textures using the global shared context rather than the context used to create them.

We need to check if we're on the same thread used to create the GLContext, not if we're on the main thread.
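
A minimal sketch of the check described above, not the actual patch. FakeGLContext, ReleaseTexture, DeferDelete and the counters are hypothetical stand-ins invented for illustration; no real GL calls are made, and the owner thread id is injectable only so the off-thread path can be exercised without spawning a thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GL context; this is not Gecko's GLContext.
class FakeGLContext {
public:
    // By default the context is owned by the thread that creates it.
    explicit FakeGLContext(std::thread::id owner = std::this_thread::get_id())
        : mOwningThread(owner) {}

    // The key test: compare against the creating thread, not the main thread.
    bool IsOwningThreadCurrent() const {
        return std::this_thread::get_id() == mOwningThread;
    }

    // Stand-in for calling glDeleteTextures with this context current.
    void DeleteTexture(std::uint32_t texture) {
        assert(IsOwningThreadCurrent());
        mDeleted.push_back(texture);
    }

    // Stand-in for handing the deletion over to the owning thread.
    void DeferDelete(std::uint32_t texture) {
        std::lock_guard<std::mutex> lock(mMutex);
        mPending.push_back(texture);
    }

    std::size_t DeletedCount() const { return mDeleted.size(); }

    std::size_t PendingCount() const {
        std::lock_guard<std::mutex> lock(mMutex);
        return mPending.size();
    }

private:
    std::thread::id mOwningThread;
    std::vector<std::uint32_t> mDeleted;   // touched only on the owning thread
    mutable std::mutex mMutex;             // guards mPending
    std::vector<std::uint32_t> mPending;
};

// Delete with the creating context when on its thread; otherwise defer
// instead of falling back to some other (shared) context.
inline void ReleaseTexture(FakeGLContext& context, std::uint32_t texture) {
    if (context.IsOwningThreadCurrent()) {
        context.DeleteTexture(texture);
    } else {
        context.DeferDelete(texture);
    }
}
```

The point of the sketch is only the branch in ReleaseTexture: the decision depends on which thread created the context, so a compositor thread that owns the context deletes directly even though it is not the main thread.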
This patch makes us check if we're on the thread used to create the context, not if we're on the main thread.

It also seems to fix some of the black flashes/checkerboarding we've been seeing recently.
Attachment #612615 - Flags: review?(joe)
Whiteboard: [autoland-try:-p all -u all -t all]
Whiteboard: [autoland-try:-p all -u all -t all] → [autoland-in-queue]
Comment on attachment 612615 [details] [diff] [review]
Delete textures using the same context used to create them, if on the thread that owns that context

love it.

Should we also change the NS_IsMainThread in ImageLayerOGL.cpp? It looks like it puts it off to the main thread if not, so it might not matter, but we might be able to simplify things.
Attachment #612615 - Flags: review?(joe) → review+
(In reply to Joe Drew (:JOEDREW!) from comment #6)
> Should we also change the NS_IsMainThread in ImageLayerOGL.cpp? It looks
> like it puts it off to the main thread if not, so it might not matter, but
> we might be able to simplify things.

Yes, we definitely should. It looks like GLTexture::Release() in ImageLayerOGL.cpp is actually causing our compositor's GL context to be made current on the main thread!

I'll update the patch to include the changes needed in ImageLayerOGL.cpp.
Patch updated to also change the NS_IsMainThread in ImageLayerOGL.cpp, and to dispatch an event to the context's owning thread (rather than to the main thread) when we're not on the context's owning thread.
Attachment #612615 - Attachment is obsolete: true
Attachment #612683 - Flags: review?(joe)
Attachment #612683 - Flags: review?(joe) → review+
Autoland Failure
Specified patches [612615] do not exist, or are not posted to this bug.
Autoland Patchset:
	Patches: 612683
	Branch: mozilla-central => try
	Destination: http://hg.mozilla.org/try/pushloghtml?changeset=7ebd63a35b95
Try run started, revision 7ebd63a35b95. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=7ebd63a35b95
There were a couple issues with the previous version of this patch.

First, in GLTexture::Release, mContext.forget() was getting called before mContext.DispatchToOwningThread. This was easy to fix.
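
To illustrate the ordering problem, here is a rough sketch using std::shared_ptr in place of nsRefPtr, with moving out of the pointer standing in for forget(). FakeContext, FakeTexture and RunQueued are hypothetical names, and this is an assumption about the shape of the fix rather than the code that landed.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical context that queues work for its owning thread.
struct FakeContext {
    std::vector<std::function<void()>> queue;
    std::vector<std::uint32_t> deleted;

    void DispatchToOwningThread(std::function<void()> event) {
        queue.push_back(std::move(event));
    }

    // Stand-in for the owning thread processing its events.
    void RunQueued() {
        std::vector<std::function<void()>> events;
        events.swap(queue);
        for (auto& event : events) {
            event();
        }
    }
};

// Hypothetical texture wrapper holding a strong reference to its context.
class FakeTexture {
public:
    FakeTexture(std::shared_ptr<FakeContext> context, std::uint32_t name)
        : mContext(std::move(context)), mTexture(name) {}

    void Release() {
        if (!mContext) {
            return;  // already released
        }
        // Moving out empties mContext, much like forget(). Calling
        // mContext->DispatchToOwningThread(...) after this line would
        // dereference a null pointer, which is the ordering bug.
        std::shared_ptr<FakeContext> context = std::move(mContext);
        const std::uint32_t name = mTexture;
        mTexture = 0;
        // Grab the dispatch target first, then let the event keep the
        // context alive until the deletion has run.
        FakeContext* target = context.get();
        target->DispatchToOwningThread(
            [context, name] { context->deleted.push_back(name); });
    }

    bool HasContext() const { return mContext != nullptr; }

private:
    std::shared_ptr<FakeContext> mContext;
    std::uint32_t mTexture;
};
```

Note that a queued event holds a strong reference to the context whose queue it sits in, so an event that is never run keeps the context alive.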

The more serious issue is that we were leaking TextureDeleters on OS X, and consequently also leaking GLContexts and an nsThread. It turns out that when we dispatch TextureDeleters to the main thread during shutdown, they're arriving too late to get processed, so they enter the event queue but never get released. This seems to be a bug in the way late-arriving events are being handled. We weren't hitting this before when using NS_DispatchToMainThread since when this is called during shutdown, it fails to get the main thread, and hence doesn't add events to the event queue (that is, the events are simply dropped). This suggests a workaround we can use when dispatching events: try to get the main thread, and if this fails, don't dispatch an event since we're probably shutting down. This solves the leak for me locally (and I'll verify this on try).
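
The workaround can be sketched roughly as follows. FakeEventQueue and FakeRuntime are hypothetical stand-ins for a thread's event queue and for the process state that makes the main thread un-gettable during shutdown; the free function below only borrows the name DispatchToOwningThread and is not the real member function.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical event queue standing in for a thread's event target.
class FakeEventQueue {
public:
    void Dispatch(std::function<void()> event) {
        mEvents.push_back(std::move(event));
    }

    void ProcessAll() {
        while (!mEvents.empty()) {
            std::function<void()> event = std::move(mEvents.front());
            mEvents.pop_front();
            event();
        }
    }

    std::size_t Pending() const { return mEvents.size(); }

private:
    std::deque<std::function<void()>> mEvents;
};

// Hypothetical process state: once shutdown starts, the main thread can
// no longer be obtained.
struct FakeRuntime {
    std::shared_ptr<FakeEventQueue> mainThread =
        std::make_shared<FakeEventQueue>();
    bool shuttingDown = false;

    std::shared_ptr<FakeEventQueue> GetMainThread() const {
        if (shuttingDown) {
            return nullptr;
        }
        return mainThread;
    }
};

// Dispatch only if the main thread is still get-able; otherwise drop the
// event so it cannot sit unprocessed in a queue and leak.
// Returns true if the event was queued.
inline bool DispatchToOwningThread(const FakeRuntime& runtime,
                                   FakeEventQueue& owningThread,
                                   std::function<void()> event) {
    if (!runtime.GetMainThread()) {
        return false;  // probably shutting down: drop the event
    }
    owningThread.Dispatch(std::move(event));
    return true;
}
```

Dropping the event here mirrors what was already happening with NS_DispatchToMainThread during shutdown, which is why behaviour without OMTC should be unchanged.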

So it seems we have at least a couple options here:
1) Set aside the ImageLayerOGL changes, land the rest, and file a follow-up for the ImageLayerOGL changes.
2) Use the workaround described above, and file a bug for the issue we're seeing with the way late-arriving events are handled.

I prefer (2), since:
-Otherwise, with OMTC, we're making the compositor's context current on the main thread when deleting textures in ImageLayerOGL, and this is going to cause instability.
-This doesn't change our current behaviour without OMTC: events that were going to be dispatched to the main thread will still be dispatched, and events that were going to be dropped will still be dropped.
Attachment #612683 - Attachment is obsolete: true
Attachment #613341 - Flags: review?(joe)
(In reply to Ali Juma [:ajuma] from comment #11)
> (and I'll verify this on try).

Try run looks good: https://tbpl.mozilla.org/?tree=Try&rev=e245587d976b
Comment on attachment 613341 [details] [diff] [review]
Delete textures using the same context used to create them, on the thread that owns that context

Review of attachment 613341 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContext.h
@@ +657,5 @@
> +    void DispatchToOwningThread(nsIRunnable *event) {
> +        // Before dispatching, we need to ensure we're not in the middle of
> +        // shutting down. Dispatching runnables in the middle of shutdown
> +        // (that is, when the main thread is no longer get-able) can cause them
> +        // to leak. See Bug X.

X=741319

I presume NS_GetMainThread fails if we're shutting down?
Attachment #613341 - Flags: review?(joe) → review+
(In reply to Joe Drew (:JOEDREW!) from comment #13)

> I presume NS_GetMainThread fails if we're shutting down?

Correct.
https://hg.mozilla.org/integration/mozilla-inbound/rev/b1421c3cd5c8
Whiteboard: [autoland-in-queue]
Target Milestone: --- → Firefox 14
https://hg.mozilla.org/mozilla-central/rev/b1421c3cd5c8
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Verified fixed on:

Firefox 14.0a1 (2012-04-17)
Device: HTC Desire
OS: Android 2.2
Status: RESOLVED → VERIFIED
Try run for 7ebd63a35b95 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=7ebd63a35b95
Results (out of 277 total builds):
    exception: 4
    success: 208
    warnings: 41
    failure: 24
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-7ebd63a35b95
 Timed out after 12 hours without completing.
Product: Firefox for Android → Firefox for Android Graveyard