SkiaGL crashes with async compositing

RESOLVED FIXED in mozilla25

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: snorp, Assigned: snorp)

Tracking

({crash})

unspecified
mozilla25
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 4 obsolete attachments)

Sometimes we crash during reftests with SkiaGL enabled on Android. We normally end up crashing in SurfaceTextureHostOGL::Lock(). After a little debugging, I found that the SurfaceStream we have there is garbage, as it has been destroyed by the content thread. I can reproduce this pretty regularly by running the layout/reftest/bugs/608636-1.html test by itself.
Keywords: crash
I suspect that this might also be a potential issue with WebGL on mac.

The difference is that for WebGL the GLContext's lifetime is tied to that of the DOM element (I believe), so it always lives long enough for it to have been removed from the compositor.

With SkiaGL 2d canvas, the GLContext reference is part of mDrawTarget and we can drop that fairly quickly in some cases (no waiting for garbage collection).

We can't just take a reference to the GLContext on the compositor side, since it could have been destroyed by then.

I think we need to take a strong reference to the GLContext (or anything else that will guarantee our SurfaceStream stays alive - but I think only the GLContext is refcounted) here: 

http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/CanvasClient.cpp#126

Then when we stop using the SurfaceStream on the compositor we can remove it.

GLContext has thread-safe counting, so we should be fine there.

Deciding how to implement and abstract this is the fun part :)
My preference would be to make SurfaceStream threadsafe refcounted, and take a strong reference when we pass it to the compositor. Then make the compositor responsible for releasing that when it no longer needs it.

Not sure how that would affect the rest of the SurfaceStream and friends design though.

Over to Jeff, since this is his baby :)
Flags: needinfo?(jgilbert)
(In reply to Matt Woodrow (:mattwoodrow) from comment #2)
> My preference would be to make SurfaceStream threadsafe refcounted, and take
> a strong reference when we pass it to the compositor. Then make the
> compositor responsible for releasing that when it no longer needs it.
> 
> Not sure how that would affect the rest of the SurfaceStream and friends
> design though.
> 
> Over to Jeff, since this is his baby :)

We can't do this until we have a decent way to guarantee DeleteTextures and friends will work if the original context has been destroyed. Also, it's *probably* fine that SurfStream outlives the GLScreenBuffer, but we might have to poke at that a little, too.

It would certainly be the safest to just hold a ref to the GLContext until we don't need its contents anymore.
Flags: needinfo?(jgilbert)
(In reply to Jeff Gilbert [:jgilbert] from comment #3)
> 
> It would certainly be the safest to just hold a ref to the GLContext until
> we don't need its contents anymore.


a) We only pass the SurfaceStream to the compositor now, so we would have to go back to the old way of passing the whole GLContext. I assume there was a reason it was changed, so do we really want to do this?

b) If we unref in the compositor, the GLContext could be destroyed there. Is it ok to destroy a GLContext from the non-owning thread?
Posted patch This is disgusting (obsolete) — Splinter Review
Attachment #777181 - Flags: review?(matt.woodrow)
Attachment #777181 - Flags: review?(jgilbert)
Comment on attachment 777181 [details] [diff] [review]
This is disgusting

Review of attachment 777181 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +336,5 @@
> +    SurfaceStream* surfStream = SurfaceStream::FromHandle(streamDesc.handle());
> +    if (surfStream) {
> +      MOZ_ASSERT(surfStream->GLContext());
> +      surfStream->GLContext()->Release();
> +    }

Ugh. I think this is really broken.

We've setup the SurfaceStreamHost/Client part as being double buffered, even though the buffer passing logic is handled internally within SurfaceStream.

What we really want is it to be single buffered and we pass the surface stream across when when we create it. From then on, the CanvasClientWebGL::Update() function can just do nothing, and we pull the new surfaces across using the SurfaceSteam API each time we composite.

I think this is probably my fault, so can you file a new bug please to fix this and assign to me. I'll do it after the texture host refactoring lands.

So for now we're getting a new SurfaceStreamDescriptor on every SurfaceStreamHostOGL::SwapTexture call, though it happens to be the same point.

We need to run this release code in that function, and also in the destructor (if mBuffer isn't null).

SetCompositor shouldn't be a reason to drop the reference.
Comment on attachment 777774 [details] [diff] [review]
Ref GLContext before sending SurfaceStream to compositor

Review of attachment 777774 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good (relatively speaking :) ) to me. I still think you should get jgilbert to review the gfx/gl components.

Do we have any way of confirming that we're not leaking the GLContext? I can't see any way it would, but this is the sort of thing that we'll probably regress during the texture host refactor.
Attachment #777774 - Flags: review?(matt.woodrow) → review+
Attachment #777774 - Flags: review?(jgilbert)
Attachment #777181 - Attachment is obsolete: true
Attachment #777181 - Flags: review?(matt.woodrow)
Attachment #777181 - Flags: review?(jgilbert)
(In reply to Matt Woodrow (:mattwoodrow) from comment #8)
> Do we have any way of confirming that we're not leaking the GLContext? I
> can't see any way it would, but this is the sort of thing that we'll
> probably regress during the texture host refactor.

As discussed on IRC, this path gets hit on Mac with WebGL, so the leakcheck build/tests would catch it there (and in fact did with my initial patch on the graphics branch).
Comment on attachment 777774 [details] [diff] [review]
Ref GLContext before sending SurfaceStream to compositor

It is really, really worth taking the time to think about how to avoid doing manual calls to AddRef and Release --- which make code very hard to read and assess for correctness, and very prone to breaking in the future --- and instead abstracting this behind a RefPtr.
Attachment #777774 - Flags: review-
Comment on attachment 777774 [details] [diff] [review]
Ref GLContext before sending SurfaceStream to compositor

Review of attachment 777774 [details] [diff] [review]:
-----------------------------------------------------------------

Talked with Snorp, who convinced me that due to this being passed over IPC, we had a real reason to do these manual AddRef and Release calls here. This is still very dangerous and unsatisfying, especially as --- as Snorp noted --- we can't prove that the IPC message will be deserialized and that the GLContext will be Release()'d, but should be OK as a short-term fix; it will indeed only be a short-term fix as GLContext multiplexing, which we should have soon, will make this a non-issue (by having the GLContext no longer being destroyed so early).
Attachment #777774 - Flags: review-
Comment on attachment 777774 [details] [diff] [review]
Ref GLContext before sending SurfaceStream to compositor

Review of attachment 777774 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/CanvasClient.cpp
@@ +124,5 @@
>  #endif
>    } else {
>      SurfaceStreamHandle handle = stream->GetShareHandle();
>      mDeprecatedTextureClient->SetDescriptor(SurfaceStreamDescriptor(handle, false));
> +    aLayer->mGLContext->AddRef();

Isn't `aLayer->mGLContext` the compositor GLContext?

Also, why don't we just set/unset it to/from some RefPtr?
(In reply to Jeff Gilbert [:jgilbert] from comment #12)
> Comment on attachment 777774 [details] [diff] [review]
> Ref GLContext before sending SurfaceStream to compositor
> 
> Review of attachment 777774 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/client/CanvasClient.cpp
> @@ +124,5 @@
> >  #endif
> >    } else {
> >      SurfaceStreamHandle handle = stream->GetShareHandle();
> >      mDeprecatedTextureClient->SetDescriptor(SurfaceStreamDescriptor(handle, false));
> > +    aLayer->mGLContext->AddRef();
> 
> Isn't `aLayer->mGLContext` the compositor GLContext?

As we discussed on IRC, no. It is the ClientCanvasLayer, which has the WebGL/SkiaGL context.

> 
> Also, why don't we just set/unset it to/from some RefPtr?

We can use a nsRefPtr in SurfaceStreamHostOGL. I'm about to post a patch that does just that. Can't avoid the manual AddRef, though.
Attachment #777774 - Attachment is obsolete: true
Attachment #777774 - Flags: review?(jgilbert)
Comment on attachment 778206 [details] [diff] [review]
Ref GLContext before sending SurfaceStream to compositor

Review of attachment 778206 [details] [diff] [review]:
-----------------------------------------------------------------

R+ for this patch, but as I'm talking with snorp on IRC, I have larger concerns about why this is made necessary.
Attachment #778206 - Flags: review?(jgilbert) → review+
Posted patch bustage fix (obsolete) — Splinter Review
Attachment #778273 - Flags: review?(matt.woodrow)
Comment on attachment 778273 [details] [diff] [review]
bustage fix

Review of attachment 778273 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +354,5 @@
>  {
>    MOZ_ASSERT(aImage.type() == SurfaceDescriptor::TSurfaceStreamDescriptor,
>               "Invalid descriptor");
> +
> +  if (mBuffer && mBuffer->type() == SurfaceDescriptor::TSurfaceStreamDescriptor) {

aImage!

mBuffer is the previous frame that we haven't swapped out yet.
Attachment #778273 - Flags: review?(matt.woodrow) → review+
Posted patch even more bustage fix (obsolete) — Splinter Review
Attachment #778273 - Attachment is obsolete: true
Attachment #778277 - Flags: review?(matt.woodrow)
Attachment #778277 - Attachment is obsolete: true
Attachment #778277 - Flags: review?(matt.woodrow)
Attachment #778279 - Flags: review?(matt.woodrow)
Comment on attachment 778279 [details] [diff] [review]
third time is a charm

Review of attachment 778279 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +368,5 @@
> +    SurfaceStream* surfStream = SurfaceStream::FromHandle(streamDesc.handle());
> +    if (surfStream) {
> +      mStreamGL = dont_AddRef(surfStream->GLContext());
> +    }
> +  } 

Whitespace!
Attachment #778279 - Flags: review?(matt.woodrow) → review+
https://hg.mozilla.org/mozilla-central/rev/fbdfcb52c475
https://hg.mozilla.org/mozilla-central/rev/a8aa3006efb9
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
You need to log in before you can comment on or make changes to this bug.