Closed Bug 952507 Opened 11 years ago Closed 10 years ago

New ContentClient doesn't Lock/Unlock properly

Categories

(Core :: Graphics: Layers, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: nical, Assigned: nical)

References

Details

Attachments

(4 files, 4 obsolete files)

Some TextureClient types assert that they are previously locked on Unlock() and unlocked when Lock() is called. When using them with the new ContentClient we hit the assertion in Unlock() at startup.

This blocks using the new ContentClient on Windows and B2G.
This patch doesn't solve the problem, but makes it easy to reproduce on Mac and Linux.
ContentClient is also accessing an unlocked texture's DrawTarget which is bad.
Here is an indented log of what's going on in ContentClient, ClientThebesLayer and RotatedBuffer:

 * RenderLayer
    * BeginPaint
    * PaintThebes
       * SyncFrontBufferToBackBuffer
       * BeginPaint
          * CreateBuffer
             * BuildTextureClients
                * DestroyBuffers 
                * CreateAndAllocateTextureClient
             * GetAsDrawTarget (!!! the texture is not locked !!!)
          * GetContextForQuadrantUpdate
             * EnsureBuffer 
       * PaintBuffer 
          * Updated 
             * GetUpdatedRegion
    * EndPaint 
       * Unlock (!!! the texture is not locked !!!)
 * SwapBuffers
ContentClient/RotatedBuffer also keeps around the DrawTarget as a member which makes tracking usage of the texture outside of locks a bit tedious. I think we could simplify this by not storing the DrawTargets, and always ask the DrawTargets to the texture clients. Or clearly define and document when those DrawTarget members should exist. For example it looks like (but it's rather easy to get lost in this code so I need confirmation) we only use the DrawTargets in SyncFrontBufferToBackBuffer and in PaintThebes.

If that's true, let's not keep the DrawTargets around outside of these two methods (or outside of the smallest interval that includes these two methods), and do the lock/unlock stuff in the same scopes if possible.

Nick, is what I am saying here making sense?
Flags: needinfo?(ncameron)
So It looks like if I over-simplify, things are going on like this: 

* RenderLayer()
    * SyncFrontToBack()
        * if allocation needed -> create and allocate textures
        * do the sync front to back thing
    * PaintStuff()
* SwapBuffers()

could this be changed into making sure we have everything allocated before we enter the rest of the logic, like below?

* RenderLayer()
   * if allocation needed -> create and allocate textures
   * Lock the textures // the DrawTargets can be used between now and Unlock
   * SyncFrontToBack()
   * PaintStuff()
   * SetDTBuffer(null) // the DrawTargets must NOT be kept/used after Unlock
   * Unlock the textures
* SwapBuffers()

This way the places to lock/unlock the textures and the validity of the DrawTarget would become more obvious.
Why does RotatedBuffer have a mBufferProvider member, while the ContentClient also stores mTextureClient? they sometimes don't point to the same TextureClient. Is that intended?
(In reply to Nicolas Silva [:nical] from comment #6)
> Why does RotatedBuffer have a mBufferProvider member, while the
> ContentClient also stores mTextureClient? they sometimes don't point to the
> same TextureClient. Is that intended?

Yes, that is intended. (A caveat, I think that the RotatedBuffer/ContentClient is really, really nasty code. It evolved rather than was designed and is begging for a new approach. However, I believe the plan is to throw this away and just use tiling and to do so relatively soon, so it is not worth the effort).

It is intended because... The buffer provider is meant to be ephemeral - it is set at the beginning of the paint cycle and unset at the end, but will not be changed during that time. mTextureClient is 'permanent', but might be changed to a new one by calling CreateBuffer, if the old one is the wrong size or format. I am not sure if this is necessary, tbh. We could _probably_ get rid of mTextureClient, it would not be simple though.
(In reply to Nicolas Silva [:nical] from comment #4)
> ContentClient/RotatedBuffer also keeps around the DrawTarget as a member
> which makes tracking usage of the texture outside of locks a bit tedious. I
> think we could simplify this by not storing the DrawTargets, and always ask
> the DrawTargets to the texture clients. Or clearly define and document when
> those DrawTarget members should exist. For example it looks like (but it's
> rather easy to get lost in this code so I need confirmation) we only use the
> DrawTargets in SyncFrontBufferToBackBuffer and in PaintThebes.
> 
> If that's true, let's not keep the DrawTargets around outside of these two
> methods (or outside of the smallest interval that includes these two
> methods), and do the lock/unlock stuff in the same scopes if possible.
> 
> Nick, is what I am saying here making sense?

I think RotatedBuffer uses the draw targets, but called mBuffer[OnWhite]. RotatedBuffer keeps them around for the duration of a paint cycle (pulled out in EnsureBuffer, forgotten when SetBufferProvider is called from EndPaint with nullptr). Not sure if/how that affects your plan.
Flags: needinfo?(ncameron)
I'm also changing some of this around for bug 951556.
Blocks: 950050
Attached patch contentclient-lock (obsolete) — Splinter Review
So, I logged pretty much everything that is happening in ContentClient.cpp and RotatedBuffer.cpp and used that to identify symptoms of when locking poblems happen. For instance I found that it always happens when front and back buffer do not differ, etc.

So with that I added lock/unlock calls in places where it seems to make sense.
The entire thing is so full of states that are sometimes redundant and sometimes not and, has too many different ways to branch depending on the different states for me to be able to have a clear mental model of what is going on (especially around mDTBuffer vs mBufferProvider vs mTextureClient, etc. and their respective lifetimes and semantics), so this is a very uneducated patch.

I put in the comments my (confused) thinking around the code that I changed. If this patch is correct, I'll clean it up and comment it properly, so don't pay attention to style yet.

With this patch I don't run into locking problems anymore (tested by scrolling around and resizing the window like crazy on Linux+OMTC) but I sometimes see some black artifacts when resizing. Note that We already have rendering problems before the patch, for instance when moving tabs.
Assignee: nobody → nical.bugzilla
Attachment #8358489 - Flags: feedback?(ncameron)
Also, this is on top of the patches of bug 951554.
(In reply to Nicolas Silva [:nical] from comment #10)

> With this patch I don't run into locking problems anymore (tested by
> scrolling around and resizing the window like crazy on Linux+OMTC) but I
> sometimes see some black artifacts when resizing. Note that We already have
> rendering problems before the patch, for instance when moving tabs.

I wasn't aware of existing problems - are these platform/configuration specific? Are the black artefacts just where the resizing reveals new areas of the window or elsewhere?
Comment on attachment 8358489 [details] [diff] [review]
contentclient-lock

Review of attachment 8358489 [details] [diff] [review]:
-----------------------------------------------------------------

The scheme here seems to be that we lock texture clients on creation and we keep them locked until destruction. Should we be locking and unlocking every frame? And using read only locks for the front buffer?

f- for the first comment f+ for all the other changes.

::: gfx/layers/client/ContentClient.cpp
@@ +128,5 @@
> +  // somewhere in EnsureBuffer...
> +  // XXX - I am wondering if this may cause rendering issues if at some point
> +  // later we are expecting mBufferProvider to be non null. If so, we are in
> +  // trouble because there seems to be no way to know if the texture will be locked
> +  // as we may have unlocked mTextureClient here.

This seems very bad. Do we really need to unlock mTextureClient now? Can't we do it when we clear mOldTextures and the texture client actually dies? By that stage we are guaranteed that it is no longer the buffer provider.

@@ +137,5 @@
>    mTextureClient = nullptr;
>  
>    if (mTextureClientOnWhite) {
> +    if (mBufferProviderOnWhite == mTextureClientOnWhite) {
> +      mBufferProviderOnWhite = nullptr; // same here

same as above

@@ +951,5 @@
> +    // I added a lock here, since I observed that most of the locking errors
> +    // that I observed where happenning after we early return here becase front 
> +    // and back don't differ. It sorta makes some sense because below we do lock
> +    // the texture and I suppose that code that executes after PrepareFrame expects
> +    // the texture to be locked.

I think this makes sense.

@@ +958,5 @@
> +    if (mTextureClient) {
> +      bool locked = mTextureClient->Lock(OPEN_READ_WRITE);
> +      MOZ_ASSERT(locked);
> +    } else {
> +      logln("mTextureClient is null, can't lock");

We should find out why we get a null texture client. I thought content clients always had one? We should lock when the texture client is created, if it happens after PrepareFrame - you are already doing that in CreateBuffer, which looks good.

@@ +964,5 @@
>      return;
>    }
>  
>    RefPtr<DrawTarget> backBuffer = GetDTBuffer();
>    if (!backBuffer && mTextureClient) {

I guess we should assert that if backbuffer exists, then it has been locked
Attachment #8358489 - Flags: feedback?(ncameron)
(In reply to Nick Cameron [:nrc] from comment #13)
> Comment on attachment 8358489 [details] [diff] [review]
> contentclient-lock
> 
> Review of attachment 8358489 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> The scheme here seems to be that we lock texture clients on creation and we
> keep them locked until destruction. Should we be locking and unlocking every
> frame? And using read only locks for the front buffer?

Yeah we should use READ_ONLY/WRITE_ONLY when it makes sense. I'll do that in a followup patch.

> This seems very bad. Do we really need to unlock mTextureClient now? Can't
> we do it when we clear mOldTextures and the texture client actually dies? By
> that stage we are guaranteed that it is no longer the buffer provider.

Sounds good, I tried this approach and it works. The only problem is that in the double buffered case, the textures that are added to mOldTextures are sometimes locked, sometimes not. So I had to expose an IsLocked() method in TextureClient in order to work around that which is a bit of a shame because it will make it more tempting for people to do complicated locking logic like here.


> @@ +958,5 @@
> > +    if (mTextureClient) {
> > +      bool locked = mTextureClient->Lock(OPEN_READ_WRITE);
> > +      MOZ_ASSERT(locked);
> > +    } else {
> > +      logln("mTextureClient is null, can't lock");
> 
> We should find out why we get a null texture client. I thought content
> clients always had one? We should lock when the texture client is created,
> if it happens after PrepareFrame - you are already doing that in
> CreateBuffer, which looks good.

iirc this was happenig something like the first time the ContenClient is used and the texture was probably going to be lazily created at some point later but I may be wrong.
Attachment #8358489 - Attachment is obsolete: true
Attachment #8359287 - Flags: review?(ncameron)
Attachment #8359288 - Flags: review?(ncameron)
Comment on attachment 8359287 [details] [diff] [review]
Fix the locking issue with single buffered ContentClient

Review of attachment 8359287 [details] [diff] [review]:
-----------------------------------------------------------------

I don't understand what the point of the lock is - we are not locking on the compositor thread, indeed we can't because we always hold the lock on the main thread, so why do we need to lock at all?
Comment on attachment 8359288 [details] [diff] [review]
Fix the locking issue with double buffered ContentClient

Review of attachment 8359288 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/ContentClient.cpp
@@ +654,5 @@
>      BorrowDrawTargetForQuadrantUpdate(aUpdateRegion.GetBounds(), BUFFER_BLACK);
>  
> +  if (!destDT) {
> +    // XXX - We should do something!
> +    // This happened while resizing the window.

good catch. I don't think we need to anything though, just return (doesn't even need to warn)

::: gfx/layers/client/TextureClient.h
@@ +171,5 @@
>    virtual bool Lock(OpenMode aMode) { return IsValid(); }
>  
>    virtual void Unlock() {}
>  
> +  virtual bool IsLocked() const = 0;

Could you use an UnlockIfLocked style API instead?
Same patch rebased
Attachment #8359288 - Attachment is obsolete: true
Attachment #8359288 - Flags: review?(ncameron)
Attachment #8362945 - Flags: review?(ncameron)
Blocks: 946720
(In reply to Nick Cameron [:nrc] from comment #17)
> Comment on attachment 8359287 [details] [diff] [review]
> Fix the locking issue with single buffered ContentClient
> 
> Review of attachment 8359287 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I don't understand what the point of the lock is - we are not locking on the
> compositor thread, indeed we can't because we always hold the lock on the
> main thread, so why do we need to lock at all?

Lock/Unlock also has the map/unmap kind of semantics with some of the underlying APIs only letting you access the memory between lock and unlock. We should only be holding the lock when we want to access the underlying data. And we do hold an actual mutex lock with some of the texture types (gralloc and D3D11 textures for instance)
(In reply to Nick Cameron [:nrc] from comment #18)
> > +  virtual bool IsLocked() const = 0;
> 
> Could you use an UnlockIfLocked style API instead?

I prefer having the user of the texture be responsible for knowing when it is allowed to perform some operations with the texture or not. There are other operations like GetAsDrawTarget that are only valid if the texture is successfully locked. I like when the API is unforgiving enough that you can't let yourself do potentially unsupported things.
(In reply to Nicolas Silva [:nical] from comment #21)
> (In reply to Nick Cameron [:nrc] from comment #18)
> > > +  virtual bool IsLocked() const = 0;
> > 
> > Could you use an UnlockIfLocked style API instead?
> 
> I prefer having the user of the texture be responsible for knowing when it
> is allowed to perform some operations with the texture or not. There are
> other operations like GetAsDrawTarget that are only valid if the texture is
> successfully locked. I like when the API is unforgiving enough that you
> can't let yourself do potentially unsupported things.

Aren't you doing the opposite by exposing IsLocked? - you can always do |if (IsLocked()) GetAsDrawTarget();| and so forth, i.e., it presents an *IfLocked API for all methods that require you be locked. Whereas, adding UnlockIfLocked (and not IsLocked) only lets you be unsure if you are going to Unlock.

I guess another option is to make IsLocked (or UnlockIfLocked) private and friend classes you want to be able to use it. That makes it clear that is an emergency-only API.
Comment on attachment 8359287 [details] [diff] [review]
Fix the locking issue with single buffered ContentClient

Review of attachment 8359287 [details] [diff] [review]:
-----------------------------------------------------------------

Do you need to lock in FinalizeFrame too where we sync the last frame to the current one?
Attachment #8359287 - Flags: review?(ncameron) → review+
Comment on attachment 8362945 [details] [diff] [review]
Fix the locking issue with double buffered ContentClient (rebased)

Review of attachment 8362945 [details] [diff] [review]:
-----------------------------------------------------------------

I'm still not really convinced that IsLocked is the way to do this, but I don't feel too strongly about it. r=me with that or any of the alternatives I suggested
Attachment #8362945 - Flags: review?(ncameron) → review+
(In reply to Nick Cameron [:nrc] from comment #23)
> Comment on attachment 8359287 [details] [diff] [review]
> Fix the locking issue with single buffered ContentClient
> 
> Review of attachment 8359287 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Do you need to lock in FinalizeFrame too where we sync the last frame to the
> current one?

I think we already do (or at least with these patches) because FinalizeFrame calls UpdateDestinationFrom which BorrowDrawTargetForQuadrantUpdate which calls EnsureBuffer inside of which we make sure we have a locked TextureClient (and then we get its DrawTarget which can only happen when the texture is locked).
(In reply to Nicolas Silva [:nical] from comment #25)
> (In reply to Nick Cameron [:nrc] from comment #23)
> > Comment on attachment 8359287 [details] [diff] [review]
> > Fix the locking issue with single buffered ContentClient
> > 
> > Review of attachment 8359287 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Do you need to lock in FinalizeFrame too where we sync the last frame to the
> > current one?
> 
> I think we already do (or at least with these patches) because FinalizeFrame
> calls UpdateDestinationFrom which BorrowDrawTargetForQuadrantUpdate which
> calls EnsureBuffer inside of which we make sure we have a locked
> TextureClient (and then we get its DrawTarget which can only happen when the
> texture is locked).

I was thinking of the front buffers, but it looks like we already lock/unlock them for read only access
(In reply to Nick Cameron [:nrc] from comment #22)
> (In reply to Nicolas Silva [:nical] from comment #21)
> > (In reply to Nick Cameron [:nrc] from comment #18)
> > > > +  virtual bool IsLocked() const = 0;
> > > 
> > > Could you use an UnlockIfLocked style API instead?
> > 
> > I prefer having the user of the texture be responsible for knowing when it
> > is allowed to perform some operations with the texture or not. There are
> > other operations like GetAsDrawTarget that are only valid if the texture is
> > successfully locked. I like when the API is unforgiving enough that you
> > can't let yourself do potentially unsupported things.
> 
> Aren't you doing the opposite by exposing IsLocked? - you can always do |if
> (IsLocked()) GetAsDrawTarget();| and so forth, i.e., it presents an
> *IfLocked API for all methods that require you be locked. Whereas, adding
> UnlockIfLocked (and not IsLocked) only lets you be unsure if you are going
> to Unlock.
> 
> I guess another option is to make IsLocked (or UnlockIfLocked) private and
> friend classes you want to be able to use it. That makes it clear that is an
> emergency-only API.

The best (whenever possible) is even to go
if (!texture->Lock(...)) {
  // print scary warning or maybe even crash now
  return false; // as early as possible instead of going through the rest of the logic with an invalid state
}

I have a feeling anything more complicated than the above will eventually generate some bugmail. I like the emergency API idea though.
Fixed the build porblem on OSX and pushed to try https://tbpl.mozilla.org/?tree=Try&rev=55aae78b35a2
The two previous patches fixed the fact that GetAsDrawTarget was used on unlocked textures, and that some textures were locked twice or unlocked twice.
But it still crashes on Windows inside D2D. I looked into that and it looks like we unlock the source of DrawBufferQuadrants before unlocking the destination, which causes the source texture to be inaccessible when the destination flushes it's command queue.
This patch fixes that but in a very unsatisfying way: it adds a Flush() call on the destination after we do the rotation, which is probably very bad for performance.

A better fix would be to find out which DrawTarget corresponds to which TextureClient, and unlock everything in the correct order.

In this patch I also needed to add some scopes and move PainteBuffer into PaintThebes (its only caller) so that I can null out gfxContext and DrawTarget earlier.

At the moment D3D9 doesn't crash but while the chrome renders properly, the web page is just black.
Interestingly the deprecated ContentClient never seems to unlock the front buffer, it just locks it (while it does unlock the back buffer). Weird. This would mean the AcquireSync is called on the d3d texture but ReleaseSync is never called. Bas, is that supposed to work? Also is it possible for the compositor to composite the texture while it is "Acquired" by the content thread ?
Flags: needinfo?(bas)
(In reply to Nicolas Silva [:nical] from comment #34)
> Interestingly the deprecated ContentClient never seems to unlock the front
> buffer, it just locks it (while it does unlock the back buffer). Weird. This
> would mean the AcquireSync is called on the d3d texture but ReleaseSync is
> never called. Bas, is that supposed to work? Also is it possible for the
> compositor to composite the texture while it is "Acquired" by the content
> thread ?

No, that would not work. However, maybe this is only in the single-buffered version? Which we don't use with D3D11 textures.
Flags: needinfo?(bas)
Turns out the deprecated ContentClient is unlocking, and it is also Flushing the back buffer after calling UpdateDestinationFrom (which the new ContentClient wasn't doing before this patch).
This patch:
 * adds the flush
 * restricts a bunch of lifetimes to shorter scopes to ensure that DrawTarget references are dropped before we unlock the texture client.
 * inlines PaintBuffer into PaintThebes (its only caller) in order to to null out some references earlier than before (also making the call stack one item shorter which is nice when reading the code).

With that, the new ContentClient works with D3D11.
Attachment #8364519 - Attachment is obsolete: true
Attachment #8366045 - Flags: review?(bas)
Comment on attachment 8366045 [details] [diff] [review]
Flush the back buffer affer syncing with the front before it becomes inaccessible.

Review of attachment 8366045 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/ClientThebesLayer.cpp
@@ +64,5 @@
>        SetAntialiasingFlags(this, target);
>  
>        nsRefPtr<gfxContext> ctx = gfxContext::ContextForDrawTarget(target);
> +
> +      ContentClientRemote* contentClientRemote = static_cast<ContentClientRemote*>(mContentClient.get());

Why'd you decide to inline this?
No longer blocks: 950050
Same patch without inlining in ClientThebesLayer
Attachment #8366045 - Attachment is obsolete: true
Attachment #8366045 - Flags: review?(bas)
Attachment #8369962 - Flags: review?(bas)
Attachment #8369962 - Flags: review?(bas) → review+
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Target Milestone: --- → mozilla30
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: