Closed Bug 879297 Opened 11 years ago Closed 11 years ago

ImageLayerOGL keeps lock on external buffers

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:leo+, firefox24 unaffected, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 wontfix, b2g-v1.1hd fixed)

RESOLVED FIXED
blocking-b2g leo+
Tracking Status
firefox24 --- unaffected
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- wontfix
b2g-v1.1hd --- fixed

People

(Reporter: diego, Assigned: bjacob)

References

Details

(Whiteboard: [YouTubeCertBlocker+])

Attachments

(1 file, 2 obsolete files)

B2G video playback and camera preview are rendered by binding an external buffer to a GL texture in ImageLayerOGL.

https://mxr.mozilla.org/mozilla-b2g18/source/gfx/layers/opengl/ImageLayerOGL.cpp#969

This binding can lock the external buffer until it is unbound. If someone tries to reuse the buffer before it's unbound/unlocked can cause some locking errors like this:

E/libgenlock(  132): perform_lock_unlock_operation: GENLOCK_IOC_DREADLOCK failed (lockType0x1,        >err=Connection timed out fd=31)
E/omx_vdec(  132): Failed to acquire genlock, ret = 1
I have a working patch that I'll share ASAP
Assignee: nobody → dwilson
blocking-b2g: --- → leo?
Blocks: 878343
What's the impact towards YouTube videos here? What's the end user impact?
(In reply to Jason Smith [:jsmith] from comment #2)
> What's the impact towards YouTube videos here? What's the end user impact?

If this isn't fixed, YouTube videos freeze on a single frame mid-way while audio continues.
Blocks: 877024
This will fix 878977 right?
Blocks: 878977
Whiteboard: [YouTubeCertBlocker+]
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #4)
> This will fix 878977 right?

Kinda. It will fix the screen freeze caused as a side effect of bug 878977.

Bug 878977 will still make it fall back to GPU when the "network activity" icon blinks, which is bad for power performance. Not ideal, but not a blocker either.
OK, the solution is not as straightforward as I thought.

My first thought was unbinding the buffer after rendering using this:

https://mxr.mozilla.org/mozilla-b2g18/source/gfx/gl/GLContextProviderEGL.cpp#448

But that still sometimes fails to unlock the buffer!

I'll keep trying. Suggestions are welcome.
Indeed, BindExternalBuffer and UnbindExternalBuffer were extremely confusing as, contrary to what their names suggested, they were not doing the converse of each other --- one was operating on TEXTURE_2D and the other on TEXTURE_EXTERNAL. So they have been mercilessly axed from mozilla-central. I suppose that the present bug is b2g-v1, I'll try to look at this tomorrow.
Flags: needinfo?(bjacob)
Ah, I've looked at the code link in comment 0 now --- ImageLayerOGL.cpp:969. Hg history suggests that the right person to ask is Kan-Ru

     kanru: #ifdef MOZ_WIDGET_GONK
     kanru:     if (img
     kanru:         && (img->type() == SharedImage::TSurfaceDescriptor)
     kanru:         && (img->get_SurfaceDescriptor().type() == SurfaceDescriptor::TSurfaceDescriptorGralloc)) {
     kanru:         const SurfaceDescriptorGralloc& desc = img->get_SurfaceDescriptor().get_SurfaceDescriptorGralloc();
     kanru:         sp<GraphicBuffer> graphicBuffer = GrallocBufferActor::GetFrom(desc);
     kanru:         mSize = gfxIntSize(graphicBuffer->getWidth(), graphicBuffer->getHeight());
    sikeda:         mPictureRect = nsIntRect(0, 0, desc.size().width, desc.size().height);
     kanru:         if (!mExternalBufferTexture.IsAllocated()) {
     kanru:           mExternalBufferTexture.Allocate(gl());
     kanru:         }
     kanru:         gl()->MakeCurrent();
     kanru:         gl()->fActiveTexture(LOCAL_GL_TEXTURE0);
     kanru:         gl()->BindExternalBuffer(mExternalBufferTexture.GetTextureID(), graphicBuffer->getNativeBuffer());
     kanru:         mImageVersion = imgVersion;
     kanru:     }
     kanru: #endif
Flags: needinfo?(bjacob)
Is the buffer being kept alive by the GPU?  Though the really hardcore unbind (with a teximage2d) is likely going to cause a pipeline flush anyway.
Per discussed with chiajung I think we should unlock the buffer before return it to the child. I'll check this today.
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #9)
> Is the buffer being kept alive by the GPU?  Though the really hardcore
> unbind (with a teximage2d) is likely going to cause a pipeline flush anyway.

If you're refering to what UnbindExternalBuffer does, it actually doesn't do anything here because of a texture target mismatch (2D != EXTERNAL) just being an INVALID_OPERATION.
I meant if it was actually done on the proper target :)
We should unbind and unlock gralloc buffer before this point

https://mxr.mozilla.org/mozilla-b2g18/source/gfx/layers/ipc/ImageContainerParent.cpp#33
Flags: needinfo?(kchen)
(In reply to Kan-Ru Chen (:kanru) (Mozilla Corporation) from comment #13)
> We should unbind and unlock gralloc buffer before this point
> 
> https://mxr.mozilla.org/mozilla-b2g18/source/gfx/layers/ipc/
> ImageContainerParent.cpp#33

Thanks Kan-Ru. Now I know the "where". Just to figure out the "how".
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #12)
> I meant if it was actually done on the proper target :)

If this answers your question: the GPU will definitely be holding on to any buffer that it is sampling from, but that won't cause a lock failure (because different handle), instead that will cause content to wait to acquire its next lock on it.
(In reply to Diego Wilson [:diego] from comment #14)
> (In reply to Kan-Ru Chen (:kanru) (Mozilla Corporation) from comment #13)
> > We should unbind and unlock gralloc buffer before this point
> > 
> > https://mxr.mozilla.org/mozilla-b2g18/source/gfx/layers/ipc/
> > ImageContainerParent.cpp#33
> 
> Thanks Kan-Ru. Now I know the "where". Just to figure out the "how".

Indeed the "how" is a bit nontrivial as the EGL API does not have a way of explicitly undoing the effect of a glEGLImageTargetTexture2D call.

I know 2 ways that you can still achieve this:
 - 1. either bind another EGLImage to that GL texture
 - 2. or delete that GL texture (glDeleteTextures).

Way 2. is of course a bit scary to do frequently, so try to do 1. instead. If you look at Android's SurfaceTexture, what is being done is that the same GL texture gets reused so that when a new GraphicBuffer is presented, the previous one automatically gets unlocked (as the new glEGLImageTargetTexture2D undoes the effect of the previous one).
blocking-b2g: leo? → leo+
We're on a tight schedule here so can we all please give as much information as possible instead of just dribbling it out a little at a time? Ideally in the form of a suggested patches?

How would we"bind another EGLImage to that GL texture"? What's the best way to create such an image?
Why is glDeleteTextures scary to do frequently?

Can we use a single EGLImage for this and bind it to every GL texture?

Why would texImage2D on the bound texture cause a GPU pipeline flush?
It appears the only safe way to force an unbind is to bind to another buffer. Using a dummy buffer like bug 775649 right after the draw worked well for me. Unless there's any objections with this approach I'll create a patch.

P.s. I was sure I had already added this comment earlier today. If you see it in another bug pleas ignore it :)
Please create a patch!
Attached patch Patch for b2g18 (obsolete) — Splinter Review
Could you test this patch?
Attached patch updated patch (obsolete) — Splinter Review
How about this version? I think we can avoid adding glFlush after painting each layer, by deferring the unbinding until after LayerManagerOGL has completed rendering and done its own glFlush.

Testing this now.
Attachment #759074 - Flags: feedback?(kchen)
Comment on attachment 759074 [details] [diff] [review]
updated patch

Review of attachment 759074 [details] [diff] [review]:
-----------------------------------------------------------------

Seems to work OK. Thanks Kan-Ru!
Attachment #759074 - Flags: review?(bjacob)
It would be great to have Diego test this too :-)
Comment on attachment 759074 [details] [diff] [review]
updated patch

Review of attachment 759074 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, this should work.

Important here is that both mDummyGrallocBuffer and mExternalBufferTexture are members of the same class, so we have a 1:1 mapping between them. For a while in mozilla-central (no longer) we had a single "dummy gralloc buffer" for all textures and this ran into slow paths on certain phones' drivers (at least the Peak). Anyway, good that this isn't being done here.

::: gfx/layers/opengl/LayerManagerOGL.cpp
@@ +1147,5 @@
>  
>    mGLContext->fDisableVertexAttribArray(vcattr);
>    mGLContext->fDisableVertexAttribArray(tcattr);
>  
>    mGLContext->fFlush();

I realize that this is pre-existing code, but I have to ask what is this glFlush call supposed to achieve? The OpenGL specification doesn't say anything definite about glFlush --- it is entirely up to the implementation whether it is a no-operation or not. So any call to glFlush should be accompanied by a comment explaining on what driver it was found to have a useful effect there.

Where a glFlush is used, generally what is wanted instead is either a WaitSync, or a glFinish.

::: gfx/layers/opengl/LayerManagerOGL.h
@@ +414,5 @@
>    /** Region we're clipping our current drawing to. */
>    nsIntRegion mClippingRegion;
>  
> +  /** List of LayerOGLs that need a post-flush callback */
> +  nsTArray<LayerOGL*> mPostFlushCallbacks;

Since it looks like this array will constantly be grown to a small size and then cleared, this is probably a good use case for nsAutoTArray ...?
Attachment #759074 - Flags: review?(bjacob) → review+
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #18)
> Why is glDeleteTextures scary to do frequently?

Because textures (even without any image data) are not so lightweight objects and it is scary, from a performance perspective, to be creating and destroying them all the time. We had in-depth conversations about this in Taipei as we were still struggling to solve that problem on mozilla-central then (it's now been solved by adopting SurfaceTexture's design, see bug 875211.

> 
> Can we use a single EGLImage for this and bind it to every GL texture?

That is what we have been doing for a little while on mozilla-central before we changed it in bug 875211, and that is what turned out to run slowly in certain phones' drivers, like the Peak, see bug 869696.

> 
> Why would texImage2D on the bound texture cause a GPU pipeline flush?

It depends on fine details of the GPU, like whether it is an immediate or a deferred GPU. I still don't know for sure the extent to which Adreno's are deferred, but have heard several people saying that they are. On a deferred GPU, you can issue GL calls for a frame while the previous frame is still being rendered in fragment shaders. In that case, issuing a texImage2D call on a texture that is still being sampled for the previous frame causes the pipeline to stall. I have been trying to assemble some documentation on deferred GPUs in this wiki page:
   https://wiki.mozilla.org/Platform/GFX/DeferredGPUs
(this page boldly claims that Adrenos are deferred. Again, I am not certain of that).
The page moved to: https://wiki.mozilla.org/Platform/GFX/MobileGPUs
Wait, I r+'d this a little too soon. Please explain why it is safe for the LayerManager to have this array of plain pointers to Layers? Is it guaranteed that Layers cannot go away between the time when the raw pointer is recorded and the time when it is dereferenced and cleared?
Flags: needinfo?(roc)
Comment on attachment 759074 [details] [diff] [review]
updated patch

Review of attachment 759074 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/LayerManagerOGL.cpp
@@ +1153,5 @@
> +
> +  for (uint32_t i = 0; i < mPostFlushCallbacks.Length(); ++i) {
> +    mPostFlushCallbacks[i]->InvokePostFlushCallback();
> +  }
> +  mPostFlushCallbacks.Clear();

On my device LayerManagerOGL::Render() returns earlier here:

https://mxr.mozilla.org/mozilla-b2g18/source/gfx/layers/opengl/LayerManagerOGL.cpp#1071

But if I paste the chunk above there it works. So LGTM with that.
(In reply to Benoit Jacob [:bjacob] from comment #25)
> I realize that this is pre-existing code, but I have to ask what is this
> glFlush call supposed to achieve? The OpenGL specification doesn't say
> anything definite about glFlush --- it is entirely up to the implementation
> whether it is a no-operation or not. So any call to glFlush should be
> accompanied by a comment explaining on what driver it was found to have a
> useful effect there.
> 
> Where a glFlush is used, generally what is wanted instead is either a
> WaitSync, or a glFinish.

Yeah, existing code, I don't know, we're definitely not taking it out on the 18 branch :-).

> ::: gfx/layers/opengl/LayerManagerOGL.h
> @@ +414,5 @@
> >    /** Region we're clipping our current drawing to. */
> >    nsIntRegion mClippingRegion;
> >  
> > +  /** List of LayerOGLs that need a post-flush callback */
> > +  nsTArray<LayerOGL*> mPostFlushCallbacks;
> 
> Since it looks like this array will constantly be grown to a small size and
> then cleared, this is probably a good use case for nsAutoTArray ...?

Actually the most common size for this array is probably 0. I don't think this matters much either way.
Flags: needinfo?(roc)
(In reply to Benoit Jacob [:bjacob] from comment #28)
> Wait, I r+'d this a little too soon. Please explain why it is safe for the
> LayerManager to have this array of plain pointers to Layers? Is it
> guaranteed that Layers cannot go away between the time when the raw pointer
> is recorded and the time when it is dereferenced and cleared?

It's supposed to be, but Diego pointed out that it's not in all cases. So I'll fix that up.
Attached patch v2Splinter Review
Attachment #759071 - Attachment is obsolete: true
Attachment #759074 - Attachment is obsolete: true
Attachment #759074 - Flags: feedback?(kchen)
Attachment #759484 - Flags: review?(bjacob)
Attachment #759484 - Flags: feedback?(kchen)
Questions for the review:

Can you clarify comment 31? What does "i'll fix that up" mean?

Also, how would you feel about having these pointers be nsRefPtr's? Since layers aren't cycle-collected, and as far as I can see in Layers.h they use NS_INLINE_DECL_REFCOUNTING, addref/release'ing them is cheap.
Flags: needinfo?(roc)
(In reply to Benoit Jacob [:bjacob] from comment #33)
> Can you clarify comment 31? What does "i'll fix that up" mean?

I'll make sure that the post-render callbacks are called (and the list cleared) on every possible exit path from Render(). Patch v2 does this by doing it right after the (sole) call to Render.

> Also, how would you feel about having these pointers be nsRefPtr's? Since
> layers aren't cycle-collected, and as far as I can see in Layers.h they use
> NS_INLINE_DECL_REFCOUNTING, addref/release'ing them is cheap.

Unnecessary. Layers can't be deleted during Render(), that would break the world.
Flags: needinfo?(roc)
Comment on attachment 759484 [details] [diff] [review]
v2

Review of attachment 759484 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/LayerManagerOGL.cpp
@@ +702,1 @@
>        Render();

Does this patch rely on this being the only place where Render() is called? I have only checked that it is so in this file and in the .h file. I don't know about other files. Please check/confirm.
Attachment #759484 - Flags: review?(bjacob) → review+
Yes it does. That's OK I think.
Comment on attachment 759484 [details] [diff] [review]
v2

Review of attachment 759484 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/opengl/ImageLayerOGL.cpp
@@ +965,5 @@
>            mExternalBufferTexture.Allocate(gl());
>          }
> +        if (!mDummyGrallocBuffer.get()) {
> +          mDummyGrallocBuffer = new GraphicBuffer(2, 2, HAL_PIXEL_FORMAT_YV12,
> +                                                  GraphicBuffer::USAGE_HW_TEXTURE);

I forgot to add the comment "For YV12 the size of the buffer has to be at least 2x2"

::: gfx/layers/opengl/LayerManagerOGL.cpp
@@ +702,1 @@
>        Render();

It's important that ImageContainerParent::RecvPublishImage runs on the same thread of the Compositor, otherwise we might return the image to the child too early. Need to document this at somewhere.
Attachment #759484 - Flags: feedback?(kchen) → feedback+
Thanks Kan-Ru. Pushed with your comments addressed.
https://hg.mozilla.org/releases/mozilla-b2g18/rev/cd9f8960933c
Benoit, can you port this to central? Everything's utterly changed with the layers refactoring so the patch basically has to be rewritten.
Assignee: dwilson → bjacob
Central should not have any such problem anymore --- we ran into many such problems there during layers-refactoring, but for all I know, these problems are behind us now. And since Nical's work in bug 875211, we even have adopted the nice idea from Android's SurfaceTexture so we don't even need dummy GraphicBuffers to bind anymore.
Calling this fixed based on comment 40.

https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/cd9f8960933c
https://hg.mozilla.org/releases/mozilla-b2g18_v1_1_0_hd/rev/cedffa70cd98
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Flags: in-moztrap-
(In reply to Benoit Jacob [:bjacob] from comment #40)
> Central should not have any such problem anymore --- we ran into many such
> problems there during layers-refactoring, but for all I know, these problems
> are behind us now. And since Nical's work in bug 875211, we even have
> adopted the nice idea from Android's SurfaceTexture so we don't even need
> dummy GraphicBuffers to bind anymore.

How sure of this are you? I'm seeing a lot of genlock contention with h264 playback on master, and I'm wondering if this would help.
Blocks: 909851
No longer blocks: 909851
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: