Closed Bug 1144906 Opened 9 years ago Closed 9 years ago

WebGL uses readback on Windows with E10S

Categories

(Core :: Graphics: CanvasWebGL, defect)

38 Branch
All
Windows 8.1
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
e10s m6+ ---
firefox41 --- fixed

People

(Reporter: jgilbert, Assigned: jgilbert)

References

Details

Attachments

(16 files, 4 obsolete files)

68.00 KB, patch
nical
: review+
Details | Diff | Splinter Review
28.76 KB, patch
nical
: review+
Details | Diff | Splinter Review
4.61 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
9.22 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
3.68 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
11.56 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
92.75 KB, patch
Details | Diff | Splinter Review
11.72 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
1.34 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
1.55 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
4.42 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
3.53 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
1.49 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
4.31 KB, patch
mattwoodrow
: review+
jrmuizel
: review+
jgilbert
: checkin-
Details | Diff | Splinter Review
1.27 KB, patch
Details | Diff | Splinter Review
2.87 KB, patch
Details | Diff | Splinter Review
We should stop doing readback.
Attachment #8579633 - Flags: review?(matt.woodrow)
Attachment #8579633 - Flags: review?(dglastonbury)
Comment on attachment 8579633 [details] [diff] [review]
0001-Attach-TexClient-for-DGXI-shared-HANDLE-to-ShSurfANG.patch

Review of attachment 8579633 [details] [diff] [review]:
-----------------------------------------------------------------

I'll defer to nical, since this is primarily TextureClient changes.

::: gfx/layers/d3d11/TextureD3D11.h
@@ -59,5 @@
>    virtual gfx::DrawTarget* BorrowDrawTarget() MOZ_OVERRIDE;
>  
> -  virtual bool AllocateForSurface(gfx::IntSize aSize,
> -                                  TextureAllocationFlags aFlags = ALLOC_DEFAULT) MOZ_OVERRIDE;
> -

Doesn't this break d2d content rendering? Texture clients that are used for ContentClient need to implement AllocateForSurface to work.

@@ +85,5 @@
>  };
>  
>  /**
> + */
> +class TextureThinClientDXGI : public TextureThinClient

We have a SharedTextureClientD3D9 already, which is almost identical to this, except the keep alive object is IDirect3DTexture9.

Could we share these and pass in a KeepAlive<T> object, or just IUnknown?
Attachment #8579633 - Flags: review?(matt.woodrow) → review?(nical.bugzilla)
Comment on attachment 8579633 [details] [diff] [review]
0001-Attach-TexClient-for-DGXI-shared-HANDLE-to-ShSurfANG.patch

Review of attachment 8579633 [details] [diff] [review]:
-----------------------------------------------------------------

With nits

::: gfx/layers/client/CanvasClient.cpp
@@ +44,5 @@
>                                   TextureFlags aFlags)
>  {
> +#if defined(MOZ_WIDGET_GONK) || defined(XP_WIN)
> +  const bool hasNonReadbackIPCSupport = true;
> +#else 

To the god of WS...

@@ +47,5 @@
> +  const bool hasNonReadbackIPCSupport = true;
> +#else 
> +  const bool hasNonReadbackIPCSupport = false;
> +#endif
> +  

Ditto

::: gfx/layers/client/TextureClient.h
@@ +562,5 @@
> +
> +  virtual TemporaryRef<TextureClient>
> +  CreateSimilar(TextureFlags, TextureAllocationFlags) const MOZ_OVERRIDE
> +  {
> +    return nullptr;

Should this ASSERT that it shouldn't be used?

::: gfx/layers/composite/TextureHost.cpp
@@ +875,5 @@
>    RefPtr<TextureSource> texSource;
>    switch (abstractSurf->mType) {
>  #ifdef XP_WIN
> +    case gl::SharedSurfaceType::EGLSurfaceANGLE:
> +      MOZ_CRASH();

Explanation for why this crashes?
Attachment #8579633 - Flags: review?(nical.bugzilla)
Attachment #8579633 - Flags: review?(matt.woodrow)
Attachment #8579633 - Flags: review?(dglastonbury)
Attachment #8579633 - Flags: review+
(In reply to Matt Woodrow (:mattwoodrow) from comment #1)
> Comment on attachment 8579633 [details] [diff] [review]
> 0001-Attach-TexClient-for-DGXI-shared-HANDLE-to-ShSurfANG.patch
> 
> Review of attachment 8579633 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'll defer to nical, since this is primarily TextureClient changes.
> 
> ::: gfx/layers/d3d11/TextureD3D11.h
> @@ -59,5 @@
> >    virtual gfx::DrawTarget* BorrowDrawTarget() MOZ_OVERRIDE;
> >  
> > -  virtual bool AllocateForSurface(gfx::IntSize aSize,
> > -                                  TextureAllocationFlags aFlags = ALLOC_DEFAULT) MOZ_OVERRIDE;
> > -
> 
> Doesn't this break d2d content rendering? Texture clients that are used for
> ContentClient need to implement AllocateForSurface to work.
All callers of AllocateForSurface immediately follow the creation of one of these TextureClient subclasses.
I believe we're supposed to be replacing ctor+AllocateForSurface with static fallible constructors, but the patches are in limbo somewhere.

> @@ +85,5 @@
> >  };
> >  
> >  /**
> > + */
> > +class TextureThinClientDXGI : public TextureThinClient
> 
> We have a SharedTextureClientD3D9 already, which is almost identical to
> this, except the keep alive object is IDirect3DTexture9.
> 
> Could we share these and pass in a KeepAlive<T> object, or just IUnknown?

I'll look.
SharedTextureClientD3D9 allows many more operations than TextureThinClientDXGI, so I would prefer to keep TextureThinClientDXGI. More complications tend to cause issues (sometimes subtle) when someone later assumes that all SharedTextureClientD3D9s are more or less the same. I believe it's beneficial to limit the capabilities of objects in webgl's compositing pipeline.
Ok, on second look, not 'many more operations', but I really think it helps to keep a gap between the thin TexClients we need so far for WebGL and the more full-featured ones used by content rendering, etc.

For instance, SharedTextureClientD3D9 pretends to implement locking, and claims to have an internal buffer.

TextureThinClientDXGI does not implement locking at the TexClient level, and does not claim to have an internal buffer, since it's just keeping alive the d3d object. (I'm not sure what practical effect 'HasInternalBuffer()' has, anyway)

Since it sounds like they could be merged, I'd rather replace SharedTextureClientD3D9 with TextureThinClientDXGI in order to be explicit that this is not a TextureClient that can be manipulated directly by the client. It would save me a bunch of mental load to be able to ignore the possibility of TexClient locking on WebGL compositing objects. (Hence a separate abstract class for TextureThinClient)
The HasInternalBuffer thing sounds like a bug, SharedTextureClientD3D9 definitely should be returning false.

We only use SharedTextureClientD3D9 for the output of video decoding, so the use case is identical. We should be able to combine the two fairly easily.
(In reply to Matt Woodrow (:mattwoodrow) from comment #6)
> The HasInternalBuffer thing sounds like a bug, SharedTextureClientD3D9
> definitely should be returning false.
> 
> We only use SharedTextureClientD3D9 for the output of video decoding, so the
> use case is identical. We should be able to combine the two fairly easily.

Alright, that's good to hear. I'll merge them.
Flags: needinfo?(jbecerra)
This has some 'todo' MOZ_CRASH()s still, but how's the concept look?
Attachment #8579633 - Attachment is obsolete: true
Attachment #8579633 - Flags: review?(matt.woodrow)
Attachment #8584898 - Flags: feedback?(nical.bugzilla)
Comment on attachment 8584898 [details] [diff] [review]
0001-Stack-ShSurfs-on-TexClients.patch

Review of attachment 8584898 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't do a thorough review, but LGTM. Yay for SharedSurfaces owned by TextureClient!
Attachment #8584898 - Flags: feedback?(nical.bugzilla) → feedback+
(In reply to Nicolas Silva [:nical] from comment #9)
> Comment on attachment 8584898 [details] [diff] [review]
> 0001-Stack-ShSurfs-on-TexClients.patch
> 
> Review of attachment 8584898 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I didn't do a thorough review, but LGTM. Yay for SharedSurfaces owned by
> TextureClient!

Awesome, good to hear.
Blocks: e10s-gfx
No longer blocks: 1111396
Hi Jeff, what's the status here? We'd like to have to have these m6 bugs finished before the end of the month.
Flags: needinfo?(jgilbert)
(In reply to Bill McCloskey (:billm) from comment #11)
> Hi Jeff, what's the status here? We'd like to have to have these m6 bugs
> finished before the end of the month.

I'm chasing down lifetime issues on Android, but we're close.
Flags: needinfo?(jgilbert)
Flags: needinfo?(jbecerra)
Blocks: 1155649
30 files changed, 472 insertions(+), 645 deletions(-)
Net -173 lines!
Attachment #8584898 - Attachment is obsolete: true
Attachment #8598957 - Flags: review?(nical.bugzilla)
16 files changed, 84 insertions(+), 347 deletions(-)
Net -263 lines. :)
Attachment #8598958 - Flags: review?(nical.bugzilla)
21 files changed, 219 insertions(+), 112 deletions(-)
So net +107, but made up for in the other patches!
Attachment #8598960 - Flags: review?(nical.bugzilla)
Attachment #8598960 - Flags: review?(matt.woodrow)
(In reply to Jeff Gilbert [:jgilbert] from comment #16)
> Created attachment 8598960 [details] [diff] [review]
> 0003-Fix-usage-of-AtomicRefCountedWithFinalize.patch
> 
>  21 files changed, 219 insertions(+), 112 deletions(-)
> So net +107, but made up for in the other patches!

This is a pretty massive patch, can you summarize what it does and why you need it?
Comment on attachment 8598957 [details] [diff] [review]
0001-Stack-ShSurfs-on-TexClients.patch

Review of attachment 8598957 [details] [diff] [review]:
-----------------------------------------------------------------

I would prefer that we end up merging SharedSurface and TextureClient completely but it doesn't have to happen right now. The fact that the TextureClient now owns the underlying texture (be it wrapped in a shared surface or not) is important and should let us make the transaction asynchronous again.

r+ with the commented out code removed.

::: gfx/gl/SharedSurface.h
@@ +251,4 @@
>  protected:
>      virtual UniquePtr<SharedSurface> CreateShared(const gfx::IntSize& size) = 0;
>  
> +    //std::queue<RefPtr<layers::SharedSurfaceTextureClient>> mRecyclePool;

nit: remove the commented out code

@@ +256,4 @@
>  
>  public:
>      UniquePtr<SharedSurface> NewSharedSurface(const gfx::IntSize& size);
> +    //TemporaryRef<ShSurfHandle> NewShSurfHandle(const gfx::IntSize& size);

nit: remove the commented out code

::: gfx/gl/SharedSurfaceGL.cpp
@@ +108,3 @@
>  // SharedSurface_GLTexture
>  
> +/*

nit: remove the commented out code

::: gfx/gl/SharedSurfaceGL.h
@@ +92,4 @@
>      }
>  };
>  
> +/*

Remove the commented out code.

::: gfx/layers/client/TextureClientSharedSurface.h
@@ +29,5 @@
> +class SharedSurfaceTextureClient : public TextureClient
> +{
> +protected:
> +  const UniquePtr<gl::SharedSurface> mSurf;
> +  const WeakPtr<gl::SurfaceFactory> mFactory;

If we want to use this TextureClient on several threads (and CanvasWorkers will want to do that), we'll have to remove WeakPtr which is not thread-safe.
(In reply to Nicolas Silva [:nical] from comment #18)
> 
> r+ with the commented out code removed.

Ah, this is done in another patch.
Attachment #8598958 - Flags: review?(nical.bugzilla) → review+
Attachment #8598957 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8598960 [details] [diff] [review]
0003-Fix-usage-of-AtomicRefCountedWithFinalize.patch

Review of attachment 8598960 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/AtomicRefCountedWithFinalize.h
@@ +63,5 @@
> +
> +    template<class U>
> +    friend struct ::RunnableMethodTraits;
> +
> +    //friend class mozilla::gl::SurfaceFactory;

commented code

@@ +65,5 @@
> +    friend struct ::RunnableMethodTraits;
> +
> +    //friend class mozilla::gl::SurfaceFactory;
> +
> +    void AddRefManually(const char* funcName, const char* fileName, uint32_t lineNum) {

ok why not

::: gfx/layers/client/TextureClientSharedSurface.cpp
@@ +27,5 @@
> +  // AtomicRefCountedWithFinalize is a little strange. It attempts to recycle when
> +  // Release drops the ref count to 1. The idea is that the recycler holds a strong ref.
> +  // Here, we want to keep it simple, and always have a recycle callback, but only recycle
> +  // if the WeakPtr mFactory is still non-null. Therefore, we keep mRecyclingRef as ref to
> +  // ourself! Call StopRecycling() to end the cycle.

Is that really simpler? plus, WeakPtr makes the object not thread-safe which will be a problem when we want to use it with canvas worker.
(In reply to Matt Woodrow (:mattwoodrow) from comment #17)
> (In reply to Jeff Gilbert [:jgilbert] from comment #16)
> > Created attachment 8598960 [details] [diff] [review]
> > 0003-Fix-usage-of-AtomicRefCountedWithFinalize.patch
> > 
> >  21 files changed, 219 insertions(+), 112 deletions(-)
> > So net +107, but made up for in the other patches!
> 
> This is a pretty massive patch, can you summarize what it does and why you
> need it?

It fixes object lifetime management for the TexClients.
(In reply to Nicolas Silva [:nical] from comment #20)
> Comment on attachment 8598960 [details] [diff] [review]
> 0003-Fix-usage-of-AtomicRefCountedWithFinalize.patch
> 
> Review of attachment 8598960 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/AtomicRefCountedWithFinalize.h
> @@ +63,5 @@
> > +
> > +    template<class U>
> > +    friend struct ::RunnableMethodTraits;
> > +
> > +    //friend class mozilla::gl::SurfaceFactory;
> 
> commented code
> 
> @@ +65,5 @@
> > +    friend struct ::RunnableMethodTraits;
> > +
> > +    //friend class mozilla::gl::SurfaceFactory;
> > +
> > +    void AddRefManually(const char* funcName, const char* fileName, uint32_t lineNum) {
> 
> ok why not
> 
> ::: gfx/layers/client/TextureClientSharedSurface.cpp
> @@ +27,5 @@
> > +  // AtomicRefCountedWithFinalize is a little strange. It attempts to recycle when
> > +  // Release drops the ref count to 1. The idea is that the recycler holds a strong ref.
> > +  // Here, we want to keep it simple, and always have a recycle callback, but only recycle
> > +  // if the WeakPtr mFactory is still non-null. Therefore, we keep mRecyclingRef as ref to
> > +  // ourself! Call StopRecycling() to end the cycle.
> 
> Is that really simpler? plus, WeakPtr makes the object not thread-safe which
> will be a problem when we want to use it with canvas worker.

It's a formalization of just buffering the ref count by one by using RefPtr instead of manually addref/releasing. This saves us from having to maintain a set of refptrs in the recycler, and explicitly search-and-release the relevent refptr when we're done recycling an object.

Using a WeakPtr matcches the intent perfectly: "Recycle this unless the recycler is gone." (Rather than doing this implicitly via refcounting and juggling recycler pfns)

I found the recycles-at-ref=1 to be confusing when the normal approach is that it gets destroyed at 0. Longer-term, I would hope to replace this with a system that recycles at 0 rather than 1, but certainly didn't want to hold this feature up on that refactor.
Comment on attachment 8598960 [details] [diff] [review]
0003-Fix-usage-of-AtomicRefCountedWithFinalize.patch

Review of attachment 8598960 [details] [diff] [review]:
-----------------------------------------------------------------

There are way too many unrelated changes in this patch. I see at least the ADDREF_MANUALLY debugging helpers, the actual recycling changes, the removal of RefPtrQueue and a whole bunch of indentation/renaming/various cleanup. These should all be separate patches to make it easier for the reviewers and to understand the history.

What's the idea behind adding AddRefManually? I guess the idea is just a debugging helper to ensure all manual addref/releases are balanced? This often isn't the case (as you've seen with the changes in SharedPlanarYCbCrImage), without actual indicating a bug.

I'm not entirely convinced that this extra code and overhead is worth it, do you have examples of how it prevents legitimate bugs? We also have the downside that AtomicRefCountedWithFinalize no longer matches the API of RefCounted/NS_INLINE_DECL_REFCOUNTING which is the API most developers are accustomed to. Having a very similar, but subtly different API for this sounds like it will cause pain.

The RefPtrQueue and most of the cleanup changes look great, will happily r+ those as standalone patches.

::: gfx/layers/client/TextureClientSharedSurface.cpp
@@ +41,5 @@
> +  // value. This means the dtor will see that mRecyclingRef is non-null, and try to
> +  // Release it, even though we're mid-Release! We need to clear mRecyclingRef so the dtor
> +  // doesn't try to Release it.
> +  auto pretendToUse = mRecyclingRef.forget().take();
> +  (void)pretendToUse;

mozilla::unused << mRecyclingRef.forget().take() is the style preferred according to the style guide.
Attachment #8598960 - Flags: review?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #23)
> Comment on attachment 8598960 [details] [diff] [review]
> 0003-Fix-usage-of-AtomicRefCountedWithFinalize.patch
> 
> Review of attachment 8598960 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> There are way too many unrelated changes in this patch. I see at least the
> ADDREF_MANUALLY debugging helpers, the actual recycling changes, the removal
> of RefPtrQueue and a whole bunch of indentation/renaming/various cleanup.
> These should all be separate patches to make it easier for the reviewers and
> to understand the history.
I do not see how a fuller picture of the history would make this easier to review, since the development is a jumbled mix of changes, which eventually coalesce into the final form of the patch. Other reviews in this very bug express confusion over changes made in patch N that were fixed in patch N+1.

It would take a disproportionately long time to completely split this apart compared to the time it takes to review it in aggregate. (at least in my experience in both roles) This patch is only 27KB anyways, which is not unreasonably massive. (again, in my experience)
> 
> What's the idea behind adding AddRefManually? I guess the idea is just a
> debugging helper to ensure all manual addref/releases are balanced? This
> often isn't the case (as you've seen with the changes in
> SharedPlanarYCbCrImage), without actual indicating a bug.

Just because we don't currently have a bug doesn't mean it's a good idea to keep. Doing manual addref/release is Known Dangerous, so we should at least call this out, and make it much easier to debug if it does go wrong.

I spent days hunting down the double-release I mention in comments, and part of this was naturally trying to isolate and identify all manual addref/releases so ensure that those are not being unbalanced, causing a release-after-release. (which seems the obvious suspect, as opposed to the allegedly known-good RefPtr concepts)

If anything causes asserts after this lands, (similar to what I found with SharedPlanarYCbCrImage) it must either be an easy fix, or represents a serious bug and/or gap in our understanding. We can't afford to rely on "don't write bugs" to save us from manual addref/release issues.

> 
> I'm not entirely convinced that this extra code and overhead is worth it, do
> you have examples of how it prevents legitimate bugs? We also have the
> downside that AtomicRefCountedWithFinalize no longer matches the API of
> RefCounted/NS_INLINE_DECL_REFCOUNTING which is the API most developers are
> accustomed to. Having a very similar, but subtly different API for this
> sounds like it will cause pain.

A) It already doesn't match, and is already a pain. AtomicRefCountedWIthFinalize does Something Weird when refcount=1, and we have to juggle whether there's a callback function still around at dtor time. Also, Finalization is really really Sharp, as I found out with my double-release issue.
B) The compiler will yell at you if you try to AddRef/Release like a normal RefCounted<T>, so turn-around time and confusion is minimal.
C) We get a great invariant that we did the Known Dangerous AddRef/Release calls the same number of times, which saves a bunch of wasted effort during lifetime debugging.
D) Known Dangerous things are called out loudly, and can be easily spewed/breakpointed. The Known Good classes get access to the AddRef/Release calls directly.
E) I would absolutely love to see this support in our other refcounting methods, but that wasn't the problem case here, and it's not a battle I want to fight in this scope.
F) I've talked to a number of people during my debugging attempts, and received positive support for my approaches, particularly regarding being very explicit about manual refcounting.
G) There is no overhead in non-DEBUG builds.

> 
> The RefPtrQueue and most of the cleanup changes look great, will happily r+
> those as standalone patches.
> 
> ::: gfx/layers/client/TextureClientSharedSurface.cpp
> @@ +41,5 @@
> > +  // value. This means the dtor will see that mRecyclingRef is non-null, and try to
> > +  // Release it, even though we're mid-Release! We need to clear mRecyclingRef so the dtor
> > +  // doesn't try to Release it.
> > +  auto pretendToUse = mRecyclingRef.forget().take();
> > +  (void)pretendToUse;
> 
> mozilla::unused << mRecyclingRef.forget().take() is the style preferred
> according to the style guide.

Works for me. I only saw this once before.
(In reply to Jeff Gilbert [:jgilbert] from comment #24)

> I do not see how a fuller picture of the history would make this easier to
> review, since the development is a jumbled mix of changes, which eventually
> coalesce into the final form of the patch. Other reviews in this very bug
> express confusion over changes made in patch N that were fixed in patch N+1.
> 
> It would take a disproportionately long time to completely split this apart
> compared to the time it takes to review it in aggregate. (at least in my
> experience in both roles) This patch is only 27KB anyways, which is not
> unreasonably massive. (again, in my experience)

Sorry, I didn't mean the history of this patch, I meant the permanent HG history that will be recorded when this lands. It makes regression ranges and understanding the development of code significantly easier when changesets are limited to a single purpose.

Take this patch for example, your (well justified) reasons for adding the ADDREF_MANUALLY code wasn't covered at all by the description of 'It fixes object lifetime management for the TexClients', and led to confusion.

Splitting patches shouldn't take more than about a minute using qcrecord, it's well worth learning since it makes life easier for everyone.


> 
> A) It already doesn't match, and is already a pain.
> AtomicRefCountedWIthFinalize does Something Weird when refcount=1, and we
> have to juggle whether there's a callback function still around at dtor
> time. Also, Finalization is really really Sharp, as I found out with my
> double-release issue.
> B) The compiler will yell at you if you try to AddRef/Release like a normal
> RefCounted<T>, so turn-around time and confusion is minimal.
> C) We get a great invariant that we did the Known Dangerous AddRef/Release
> calls the same number of times, which saves a bunch of wasted effort during
> lifetime debugging.
> D) Known Dangerous things are called out loudly, and can be easily
> spewed/breakpointed. The Known Good classes get access to the AddRef/Release
> calls directly.
> E) I would absolutely love to see this support in our other refcounting
> methods, but that wasn't the problem case here, and it's not a battle I want
> to fight in this scope.
> F) I've talked to a number of people during my debugging attempts, and
> received positive support for my approaches, particularly regarding being
> very explicit about manual refcounting.
> G) There is no overhead in non-DEBUG builds.

Ok, consider me convinced then :)
Comment on attachment 8598960 [details] [diff] [review]
0003-Fix-usage-of-AtomicRefCountedWithFinalize.patch

Review of attachment 8598960 [details] [diff] [review]:
-----------------------------------------------------------------

I understand that WeakPtr makes more sense than recycling when the ref count goes to 1. If you want to fix the other places where we recycle TextureClient, then having both systems now is not that bad.
My concern is that it is not thread-safe and that canvas workers are around the corner, so this code will have to be changed soon. At this point why not just use the ugly-but-thread-safe way for everything (or add a thread-safe way so that we can convert the rest of the code to it)?
r+ because the patch is correct and blocks some important stuff, but I am still a bit reluctant to adding new code that will need to be changed while there is already a (uglier) solution that covers all cases in tree.
Attachment #8598960 - Flags: review?(nical.bugzilla) → review+
(In reply to Nicolas Silva [:nical] from comment #26)
> Comment on attachment 8598960 [details] [diff] [review]
> 0003-Fix-usage-of-AtomicRefCountedWithFinalize.patch
> 
> Review of attachment 8598960 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I understand that WeakPtr makes more sense than recycling when the ref count
> goes to 1. If you want to fix the other places where we recycle
> TextureClient, then having both systems now is not that bad.
> My concern is that it is not thread-safe and that canvas workers are around
> the corner, so this code will have to be changed soon. At this point why not
> just use the ugly-but-thread-safe way for everything (or add a thread-safe
> way so that we can convert the rest of the code to it)?
> r+ because the patch is correct and blocks some important stuff, but I am
> still a bit reluctant to adding new code that will need to be changed while
> there is already a (uglier) solution that covers all cases in tree.

Yeah, I can agree with this. WebGL in Workers is right around the corner, so this won't be left to fester long.
r=nical already
Attachment #8598960 - Attachment is obsolete: true
Attachment #8601840 - Flags: review?(matt.woodrow)
r=nical already
Attachment #8601841 - Flags: review?(matt.woodrow)
r=nical already
Attachment #8601842 - Flags: review?(matt.woodrow)
r=nical already
Attachment #8601843 - Flags: review?(matt.woodrow)
Attachment #8601840 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8601841 [details] [diff] [review]
0004-Make-explicit-AddRef-Release-explicit.patch

Review of attachment 8601841 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/composite/FPSCounter.cpp
@@ +394,4 @@
>      Rect drawRect = Rect(aOffsetX + n * FontWidth, aOffsetY, FontWidth, FontHeight);
>      Rect clipRect = Rect(0, 0, 300, 100);
>      aCompositor->DrawQuad(drawRect, clipRect,
> +  aEffectChain, opacity, transform);

Indenting is still wrong here :)
Attachment #8601841 - Flags: review?(matt.woodrow) → review+
Attachment #8601842 - Flags: review?(matt.woodrow) → review+
Attachment #8601843 - Flags: review?(matt.woodrow) → review+
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/f5bbfe33ed78

Android 4.0 API11+ debug, mostly mochitest-5 and -7, sometimes -8 and reftest-3: "Assertion failure: status != 0 (ClientWaitSync generated an error. Has mSync already been destroyed?)" a la https://treeherder.mozilla.org/logviewer.html#?job_id=9622488&repo=mozilla-inbound

ASan mochitest-e10s 1, bc1, bc3 and reftest: "AddressSanitizer: heap-use-after-free /builds/slave/m-in-l64-asan-0000000000000000/build/src/obj-firefox/gfx/layers/../../dist/include/mozilla/RefPtr.h:294 assign" a la https://treeherder.mozilla.org/logviewer.html#?job_id=9623479&repo=mozilla-inbound

OS X 10.10 mochitest-gl: unexpected passes in three tests, a la https://treeherder.mozilla.org/logviewer.html#?job_id=9624493&repo=mozilla-inbound
ASAN should now be green.  Working on the rest.
Flags: needinfo?(jgilbert)
The current branch is on github here: https://github.com/jdashg/gecko-dev/tree/tex-client-shsurf
We're currently hitting an assert because eglClientWaitSync is returning 0, which means there was an error. This is generally because we deleted the sync object in the content process already. This indicates that there's a lifetime mismatch between the TexClient and the TexHost.

Historically, this was because we recycled the EGLImage and used a different EGLSync. I made the EGL backend non-recycling in order to avoid this. It's worth double-checking that it's not still recycling.

I briefly tried to reproduce this most recent issue locally on Android, but was not successful. If this is really uncommon, we can probably just changes this to an NS_WARNING and fix it in a follow-up, but it does generally mean something is wrong.
Thanks for the patch and summary.  Jeff M. is going to follow up on this while Jeff G. is on PTO next two days so that we keep it going.
Flags: needinfo?(jmuizelaar)
James, random thoughts on the failures and comment 41?
Flags: needinfo?(snorp)
I'm not up on the latest TexClient/Host lifetime mechanisms, but we've definitely had problems like this before. At one point I thought the plan was to put the client side stuff in some limbo until the host side was destroyed, and then decide after that if we want to destroy or recycle the client? I think nical will have the best ideas here wrt to the lifetime stuff.

If I have some time, I'll try to build this and debug locally.
Flags: needinfo?(snorp) → needinfo?(nical.bugzilla)
The mSync object that we're calling fClientWaitSync is 0, which explains the complaint. My guess is that this test machine doesn't support this extension but we don't check for it.
Flags: needinfo?(jmuizelaar)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #45)
> The mSync object that we're calling fClientWaitSync is 0, which explains the
> complaint. My guess is that this test machine doesn't support this extension
> but we don't check for it.

It seems we do check for the extensions in SharedSurfaceEGL when the fence is created, but we just leave it set to 0 if we lack extensions or fail to create the sync. We should probably just kill this assertion that's failing?
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #46)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #45)
> > The mSync object that we're calling fClientWaitSync is 0, which explains the
> > complaint. My guess is that this test machine doesn't support this extension
> > but we don't check for it.
> 
> It seems we do check for the extensions in SharedSurfaceEGL when the fence
> is created, but we just leave it set to 0 if we lack extensions or fail to
> create the sync. We should probably just kill this assertion that's failing?

We should only call WaitSync if we have a fence.

Here's a patch that does:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1db22660a5c3
(In reply to Jeff Muizelaar [:jrmuizel] from comment #47)
> 
> Here's a patch that does:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=1db22660a5c3

It looks like we are not drawing subsequent frames in R3 on Android 4.0 API11
(In reply to Jeff Muizelaar [:jrmuizel] from comment #48)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #47)
> > 
> > Here's a patch that does:
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=1db22660a5c3
> 
> It looks like we are not drawing subsequent frames in R3 on Android 4.0 API11

and only when we don't have alpha
That is to say:

webgl-color-test.html?frame=6 fails as long as there's no alpha argument
I tried mucking around with a couple alpha related things, but found no success.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #46)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #45)
> > The mSync object that we're calling fClientWaitSync is 0, which explains the
> > complaint. My guess is that this test machine doesn't support this extension
> > but we don't check for it.
> 
> It seems we do check for the extensions in SharedSurfaceEGL when the fence
> is created, but we just leave it set to 0 if we lack extensions or fail to
> create the sync. We should probably just kill this assertion that's failing?

Rather, only assert if we have the extension. I'm checking if our Try slaves lack the extension.
(In reply to Jeff Gilbert [:jgilbert] from comment #52)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #46)
> > (In reply to Jeff Muizelaar [:jrmuizel] from comment #45)
> > > The mSync object that we're calling fClientWaitSync is 0, which explains the
> > > complaint. My guess is that this test machine doesn't support this extension
> > > but we don't check for it.
> > 
> > It seems we do check for the extensions in SharedSurfaceEGL when the fence
> > is created, but we just leave it set to 0 if we lack extensions or fail to
> > create the sync. We should probably just kill this assertion that's failing?
> 
> Rather, only assert if we have the extension. I'm checking if our Try slaves
> lack the extension.

This still asserts: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f2352c7611d1
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #44)
> I'm not up on the latest TexClient/Host lifetime mechanisms, but we've
> definitely had problems like this before. At one point I thought the plan
> was to put the client side stuff in some limbo until the host side was
> destroyed, and then decide after that if we want to destroy or recycle the
> client? I think nical will have the best ideas here wrt to the lifetime
> stuff.
> 
> If I have some time, I'll try to build this and debug locally.

Looking at the last few comments my understanding is that the issue is related to the availability of a gl extension so clearing the needinfo. Please needinfo-me back if I got it wrong.
But in short, right now webgl goes through a very slow path where the content side synchronously waits for the host to have let go of the texture before releasing it, specifically because the TextureClient doesn't own the SharedSurface and can't make assumptions about its lifetime. This patch is making the SharedSurface owned by TextureClient which will let us remove this slow lifetime-paranoia-mode eventually but as far as i know we still go through the slow and safe route with the patch.
Flags: needinfo?(nical.bugzilla)
(In reply to Jeff Gilbert [:jgilbert] from comment #53)
> (In reply to Jeff Gilbert [:jgilbert] from comment #52)
> > (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #46)
> > > (In reply to Jeff Muizelaar [:jrmuizel] from comment #45)
> > > > The mSync object that we're calling fClientWaitSync is 0, which explains the
> > > > complaint. My guess is that this test machine doesn't support this extension
> > > > but we don't check for it.
> > > 
> > > It seems we do check for the extensions in SharedSurfaceEGL when the fence
> > > is created, but we just leave it set to 0 if we lack extensions or fail to
> > > create the sync. We should probably just kill this assertion that's failing?
> > 
> > Rather, only assert if we have the extension. I'm checking if our Try slaves
> > lack the extension.
> 
> This still asserts:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=f2352c7611d1

I don't see any asserts in that job.
(In reply to Nicolas Silva [:nical] from comment #54)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #44)
> > I'm not up on the latest TexClient/Host lifetime mechanisms, but we've
> > definitely had problems like this before. At one point I thought the plan
> > was to put the client side stuff in some limbo until the host side was
> > destroyed, and then decide after that if we want to destroy or recycle the
> > client? I think nical will have the best ideas here wrt to the lifetime
> > stuff.
> > 
> > If I have some time, I'll try to build this and debug locally.
> 
> Looking at the last few comments my understanding is that the issue is
> related to the availability of a gl extension so clearing the needinfo.
> Please needinfo-me back if I got it wrong.
> But in short, right now webgl goes through a very slow path where the
> content side synchronously waits for the host to have let go of the texture
> before releasing it, specifically because the TextureClient doesn't own the
> SharedSurface and can't make assumptions about its lifetime. This patch is
> making the SharedSurface owned by TextureClient which will let us remove
> this slow lifetime-paranoia-mode eventually but as far as i know we still go
> through the slow and safe route with the patch.

It's not extension-related. I falsified this theory.
Flags: needinfo?(nical.bugzilla)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #55)
> (In reply to Jeff Gilbert [:jgilbert] from comment #53)
> > (In reply to Jeff Gilbert [:jgilbert] from comment #52)
> > > (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #46)
> > > > (In reply to Jeff Muizelaar [:jrmuizel] from comment #45)
> > > > > The mSync object that we're calling fClientWaitSync is 0, which explains the
> > > > > complaint. My guess is that this test machine doesn't support this extension
> > > > > but we don't check for it.
> > > > 
> > > > It seems we do check for the extensions in SharedSurfaceEGL when the fence
> > > > is created, but we just leave it set to 0 if we lack extensions or fail to
> > > > create the sync. We should probably just kill this assertion that's failing?
> > > 
> > > Rather, only assert if we have the extension. I'm checking if our Try slaves
> > > lack the extension.
> > 
> > This still asserts:
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=f2352c7611d1
> 
> I don't see any asserts in that job.

The reds: m5,7,8 and R3.
(In reply to Jeff Gilbert [:jgilbert] from comment #57)
> The reds: m5,7,8 and R3.

Indeed. I must have been looking at the wrong job.
Fence is called twice, which destroys the original sync object.

The second fence call is from GLContext::Readback, where we call ProducerRelease (after Acquiring it) as added here:
http://hg.mozilla.org/mozilla-central/rev/ed186e5af4a1
https://bugzilla.mozilla.org/show_bug.cgi?id=1116190
Flags: needinfo?(nical.bugzilla)
Attachment #8609051 - Flags: review?(matt.woodrow)
Attachment #8609054 - Flags: review?(matt.woodrow)
Attachment #8609055 - Flags: review?(jmuizelaar)
Comment on attachment 8609055 [details] [diff] [review]
0009-Use-ConsumerAcquire-instead-of-ProducerAcquire.patch

Review of attachment 8609055 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContext.cpp
@@ +2516,5 @@
>          ScopedBindFramebuffer autoFB(this);
>  
> +        // We're consuming from the producer side, so which do we use?
> +        // Really, we just want a read-only lock, so ConsumerAcquire is the best match.
> +        src->ConsumerAcquire();

We need the producer lock to be able to read from the surface.
Attachment #8609055 - Flags: review?(jmuizelaar) → review+
Attachment #8609055 - Flags: review+ → review-
Attachment #8609055 - Attachment is obsolete: true
Attachment #8609077 - Flags: review?(jmuizelaar)
Comment on attachment 8609077 [details] [diff] [review]
0009-Add-producer-read-only-lock-to-ShSurf.patch

Review of attachment 8609077 [details] [diff] [review]:
-----------------------------------------------------------------

Works for me.
Attachment #8609077 - Flags: review?(jmuizelaar) → review+
Attachment #8609060 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8609058 [details] [diff] [review]
0010-Assert-if-Fencing-twice-and-don-t-always-expect-a-sy.patch

Review of attachment 8609058 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/SharedSurfaceEGL.cpp
@@ +117,4 @@
>          mGL->IsExtensionSupported(GLContext::OES_EGL_sync))
>      {
>          if (mSync) {
> +            MOZ_RELEASE_ASSERT(false, "Non-recycleable should not Fence twice.");

Since this is going to crash even in release builds, that make the two lines below dead code.

Can't we simplify the entire block to just MOZ_RELEASE_ASSERT(!mSync, ...)?
Attachment #8609058 - Flags: review?(matt.woodrow) → review+
Attachment #8609054 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8609051 [details] [diff] [review]
0007-Hold-recycle-ref-in-recycler.patch

Review of attachment 8609051 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/SharedSurface.h
@@ +289,4 @@
>  protected:
>      SurfaceCaps mDrawCaps;
>      SurfaceCaps mReadCaps;
> +    RefQueue<layers::SharedSurfaceTextureClient> mRecycleFreePool;

Why do we need this change? Having our own RefQueue class seems like unnecessary complexity if queue<RefPtr<T>> works
(In reply to Matt Woodrow (:mattwoodrow) from comment #70)
> Comment on attachment 8609058 [details] [diff] [review]
> 0010-Assert-if-Fencing-twice-and-don-t-always-expect-a-sy.patch
> 
> Review of attachment 8609058 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/gl/SharedSurfaceEGL.cpp
> @@ +117,4 @@
> >          mGL->IsExtensionSupported(GLContext::OES_EGL_sync))
> >      {
> >          if (mSync) {
> > +            MOZ_RELEASE_ASSERT(false, "Non-recycleable should not Fence twice.");
> 
> Since this is going to crash even in release builds, that make the two lines
> below dead code.
> 
> Can't we simplify the entire block to just MOZ_RELEASE_ASSERT(!mSync, ...)?

MOZ_RELEASE_ASSERT does not assert in non-DEBUG Beta and Release builds, though it asserts in other non-DEBUG builds.
(In reply to Matt Woodrow (:mattwoodrow) from comment #71)
> Comment on attachment 8609051 [details] [diff] [review]
> 0007-Hold-recycle-ref-in-recycler.patch
> 
> Review of attachment 8609051 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/gl/SharedSurface.h
> @@ +289,4 @@
> >  protected:
> >      SurfaceCaps mDrawCaps;
> >      SurfaceCaps mReadCaps;
> > +    RefQueue<layers::SharedSurfaceTextureClient> mRecycleFreePool;
> 
> Why do we need this change? Having our own RefQueue class seems like
> unnecessary complexity if queue<RefPtr<T>> works

There's a difference in surety for me. I'm fairly sure queue<RefPtr<>> works, but it's really hard to be very sure. I *know* this custom one works the way I expect. It's designed to be stubbed right out though, so we can easily switch it to use std::queue again.
(In reply to Jeff Gilbert [:jgilbert] from comment #73)
> (In reply to Matt Woodrow (:mattwoodrow) from comment #71)
> > Comment on attachment 8609051 [details] [diff] [review]
> > 0007-Hold-recycle-ref-in-recycler.patch
> > 
> > Review of attachment 8609051 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > ::: gfx/gl/SharedSurface.h
> > @@ +289,4 @@
> > >  protected:
> > >      SurfaceCaps mDrawCaps;
> > >      SurfaceCaps mReadCaps;
> > > +    RefQueue<layers::SharedSurfaceTextureClient> mRecycleFreePool;
> > 
> > Why do we need this change? Having our own RefQueue class seems like
> > unnecessary complexity if queue<RefPtr<T>> works
> 
> There's a difference in surety for me. I'm fairly sure queue<RefPtr<>>
> works, but it's really hard to be very sure. I *know* this custom one works
> the way I expect. It's designed to be stubbed right out though, so we can
> easily switch it to use std::queue again.

That doesn't really apply to other readers though. i.e this adds a layer of abstraction that readers need to look at to understand what it does. What are you worried about not working with queue<RefPtr<>> or set<RefPtr<>>?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #74)
> That doesn't really apply to other readers though. i.e this adds a layer of
> abstraction that readers need to look at to understand what it does.
This is largely fixed by stating that we're tightly wrapping the std container. By inspection, it's very quickly clear that it functions as advertised. 

> What are you worried about not working with queue<RefPtr<>> or set<RefPtr<>>?
The AtomicRefCountedWithFinalize is very sensitive to what happens when there is only one reference left. It is treacherous to use classes which may make a copy at an inopportune time. "We *should* be safe from this" is my thinking, but it was easy enough to remove doubt.
(In reply to Jeff Gilbert [:jgilbert] from comment #75)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #74)
> > That doesn't really apply to other readers though. i.e this adds a layer of
> > abstraction that readers need to look at to understand what it does.
> This is largely fixed by stating that we're tightly wrapping the std
> container. By inspection, it's very quickly clear that it functions as
> advertised. 
> 
> > What are you worried about not working with queue<RefPtr<>> or set<RefPtr<>>?
> The AtomicRefCountedWithFinalize is very sensitive to what happens when
> there is only one reference left. It is treacherous to use classes which may
> make a copy at an inopportune time. "We *should* be safe from this" is my
> thinking, but it was easy enough to remove doubt.

I'm not entirely convinced, but a minimum you should add this kind of rationale to RefQueue so that people don't wonder why it exists.
Comment on attachment 8609051 [details] [diff] [review]
0007-Hold-recycle-ref-in-recycler.patch

Review of attachment 8609051 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/SharedSurface.cpp
@@ +349,5 @@
>      ret = new layers::SharedSurfaceTextureClient(mAllocator, mFlags, Move(surf), this);
> +
> +    StartRecycling(ret);
> +
> +    mRecycleTotalPool.insert(ret);

We already did this in StartRecycling, right?

@@ +381,5 @@
>  {
>      MOZ_ASSERT(NS_IsMainThread());
>  
> +    RefPtr<layers::SharedSurfaceTextureClient> tc = (layers::SharedSurfaceTextureClient*)rawTC;
> +    SurfaceFactory* factory = (SurfaceFactory*)rawFactory;

Prefer c++ style static_cast<> instead of c style.
(In reply to Jeff Gilbert [:jgilbert] from comment #75)
 
> > What are you worried about not working with queue<RefPtr<>> or set<RefPtr<>>?
> The AtomicRefCountedWithFinalize is very sensitive to what happens when
> there is only one reference left. It is treacherous to use classes which may
> make a copy at an inopportune time. "We *should* be safe from this" is my
> thinking, but it was easy enough to remove doubt.

I'm not convinced either, the queue should always be holding at least one ref (it might hold more while copying), but that shouldn't an issue.

If we have issues with races then this doesn't seem like a great way to avoid them and could easily break in the future.
(In reply to Matt Woodrow (:mattwoodrow) from comment #77)
> Comment on attachment 8609051 [details] [diff] [review]
> 0007-Hold-recycle-ref-in-recycler.patch
> 
> Review of attachment 8609051 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/gl/SharedSurface.cpp
> @@ +349,5 @@
> >      ret = new layers::SharedSurfaceTextureClient(mAllocator, mFlags, Move(surf), this);
> > +
> > +    StartRecycling(ret);
> > +
> > +    mRecycleTotalPool.insert(ret);
> 
> We already did this in StartRecycling, right?

Yep! (It fails quietly, but we should not attempt it)

> 
> @@ +381,5 @@
> >  {
> >      MOZ_ASSERT(NS_IsMainThread());
> >  
> > +    RefPtr<layers::SharedSurfaceTextureClient> tc = (layers::SharedSurfaceTextureClient*)rawTC;
> > +    SurfaceFactory* factory = (SurfaceFactory*)rawFactory;
> 
> Prefer c++ style static_cast<> instead of c style.

OK.
(In reply to Matt Woodrow (:mattwoodrow) from comment #78)
> (In reply to Jeff Gilbert [:jgilbert] from comment #75)
>  
> > > What are you worried about not working with queue<RefPtr<>> or set<RefPtr<>>?
> > The AtomicRefCountedWithFinalize is very sensitive to what happens when
> > there is only one reference left. It is treacherous to use classes which may
> > make a copy at an inopportune time. "We *should* be safe from this" is my
> > thinking, but it was easy enough to remove doubt.
> 
> I'm not convinced either, the queue should always be holding at least one
> ref (it might hold more while copying), but that shouldn't an issue.
It's a potential problem if it AddRefs then Releases back down to 1, triggering a recycle when it's already in a recycled state.
> 
> If we have issues with races then this doesn't seem like a great way to
> avoid them and could easily break in the future.

This is not meant to address races.
Attachment #8611000 - Flags: review?(matt.woodrow)
Attachment #8611001 - Flags: review?(matt.woodrow)
Attachment #8611001 - Flags: review?(jmuizelaar)
Attachment #8611001 - Flags: review?(jmuizelaar) → review+
Attachment #8611000 - Flags: review?(matt.woodrow) → review+
Attachment #8609051 - Flags: review?(matt.woodrow) → review+
Attachment #8611001 - Flags: review?(matt.woodrow) → review+
Attachment #8611001 - Flags: checkin-
Sotaro: Any insight into why this might be happening?
https://treeherder.mozilla.org/logviewer.html#?job_id=7997876&repo=try
Flags: needinfo?(jgilbert) → needinfo?(sotaro.ikeda.g)
DropGrallocBuffer() seems to be called from both client and host sides. It should be called only from one side for one gralloc buffer. If client call it, host side should not call it.

From client side, GrallocTextureClientOGL::~GrallocTextureClientOGL() does it.
 https://dxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/GrallocTextureClient.cpp#
Flags: needinfo?(sotaro.ikeda.g)
This is looking good, though m12 debug hasn't finished yet. (It was crashing at ~50m, and is now at 80m)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cfd61c608516
Attachment #8614446 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 8614446 [details] [diff] [review]
0013-Mark-ShSurfGralloc-TexClient-as-mShared-to-prevent-d.patch

Review of attachment 8614446 [details] [diff] [review]:
-----------------------------------------------------------------

"TextureClient::mShared == true" means TextureClient will have TextureHost. It is already set correctly by TextureClient::InitIPDLActor(). So, it is not good to set it manually. I reproduced the problematic symptom on my flame-kk. Then I am going to investigate why it happens.
Attachment #8614446 - Flags: review?(sotaro.ikeda.g)
----------------------------------------------------------
> 
> "TextureClient::mShared == true" means TextureClient will have TextureHost.
> It is already set correctly by TextureClient::InitIPDLActor(). So, it is not
> good to set it manually.

It could cause gralloc resource leak.
(In reply to Sotaro Ikeda [:sotaro] from comment #91)
> Comment on attachment 8614446 [details] [diff] [review]
> 0013-Mark-ShSurfGralloc-TexClient-as-mShared-to-prevent-d.patch
> 
> Review of attachment 8614446 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> "TextureClient::mShared == true" means TextureClient will have TextureHost.
> It is already set correctly by TextureClient::InitIPDLActor(). So, it is not
> good to set it manually. I reproduced the problematic symptom on my
> flame-kk. Then I am going to investigate why it happens.

mShared is never set to false again, and I set it to true just as if we called InitIPDLActor on it. We don't call InitIPDLActor because we're actually using a different TexClient, which owns a SharedSurface, which in turn owns the Gralloc TexClient. We should not be leaking, because we communicate the descriptor of the gralloc TexClient, though we act through the IDPL actor of the TexClientShSurf.
I seems to understand what happening.

SharedSurfaceTextureClient wraps GrallocTextureClient via gl::SharedSurface. Therefore in effect, GrallocTextureHost's peer is SharedSurfaceTextureClient, not GrallocTextureClient. GrallocTextureClient is not directly related to GrallocTextureHost. One exception is ToSurfaceDescriptor(). GrallocTextureClientOGL::ToSurfaceDescriptor() is called from SharedSurfaceTextureClient::ToSurfaceDescriptor().

ToSurfaceDescriptor() is called only from TextureClient::InitIPDLActor(). And when ToSurfaceDescriptor() is called, TextureClient always calls "mActor->IPCOpen()". It creates TextureHost.

Therefore when SharedSurface_Gralloc::ToSurfaceDescriptor() is called, if we set "GrallocTextureClient::mShare = true", mShare is correctly set in correct timing. Then attachment 8614446 [details] [diff] [review] should work correctly on gonk.
attachment 8614446 [details] [diff] [review] seems to fix the problem on gonk. But it seems that mShare should be set to true for all wrapped TextureClient.
nical, how do you think about comment 95?
Flags: needinfo?(nical.bugzilla)
(In reply to Jeff Gilbert [:jgilbert] from comment #93)
> (In reply to Sotaro Ikeda [:sotaro] from comment #91)
> > Comment on attachment 8614446 [details] [diff] [review]
> > 0013-Mark-ShSurfGralloc-TexClient-as-mShared-to-prevent-d.patch
> > 
> > Review of attachment 8614446 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > "TextureClient::mShared == true" means TextureClient will have TextureHost.
> > It is already set correctly by TextureClient::InitIPDLActor(). So, it is not
> > good to set it manually. I reproduced the problematic symptom on my
> > flame-kk. Then I am going to investigate why it happens.
> 
> mShared is never set to false again, and I set it to true just as if we
> called InitIPDLActor on it. We don't call InitIPDLActor because we're
> actually using a different TexClient, which owns a SharedSurface, which in
> turn owns the Gralloc TexClient. We should not be leaking, because we
> communicate the descriptor of the gralloc TexClient, though we act through
> the IDPL actor of the TexClientShSurf.

Yhea, it does not do leak on gonk. But as in comment 95, it seems better to update mShared of all wrapped TextureClient.
(In reply to Sotaro Ikeda [:sotaro] from comment #95)
> attachment 8614446 [details] [diff] [review] seems to fix the problem on
> gonk. But it seems that mShare should be set to true for all wrapped
> TextureClient.

I am fine with whatever temporary hack we do in the short term to address this, as long as we don't keep wrapping a TextureClient in a sharedsurface in another TextureClient for too long.

We should either have only TextureClient own a SharedSurface, or better: merge TextureClient and SharedSurface.
Flags: needinfo?(nical.bugzilla)
(In reply to Nicolas Silva [:nical] from comment #98)
> 
> I am fine with whatever temporary hack we do in the short term to address
> this, as long as we don't keep wrapping a TextureClient in a sharedsurface
> in another TextureClient for too long.

Yea, in longer term, it is better to remove the hack.
Flags: needinfo?(jgilbert)
https://hg.mozilla.org/mozilla-central/rev/10f2a5e84c91
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Blocks: 1186926
Bug 1144906 breaks fence handling. TextureClientSharedSurface does fence handling between Host side. But GrallocTextureClient does fence handling between SharedSurface.
No longer depends on: 1216549
Depends on: 1194185
Depends on: 1241484
Depends on: 1247611
This caused a tracked regression bug 1247611, Jeff, what's the next step here?
Flags: needinfo?(jgilbert)
(In reply to Milan Sreckovic [:milan] from comment #106)
> This caused a tracked regression bug 1247611, Jeff, what's the next step
> here?

Let's talk in that bug.
Flags: needinfo?(jgilbert)
Depends on: 1273132
Blocks: 1264505
You need to log in before you can comment on or make changes to this bug.