Closed Bug 747811 Opened 8 years ago Closed 7 years ago

TiledThebesLayer is not Multi-process safe

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla26

People

(Reporter: BenWa, Assigned: cwiiis)

References

Details

Attachments

(4 files, 11 obsolete files)

33.98 KB, patch
nrc
: review+
Details | Diff | Splinter Review
24.66 KB, patch
cwiiis
: review+
nrc
: review+
Details | Diff | Splinter Review
16.73 KB, patch
cwiiis
: review+
Details | Diff | Splinter Review
14.32 KB, patch
BenWa
: review+
Details | Diff | Splinter Review
We currently share pointers directly. We need store the tiles in shared memory and make this multi-process safe. This was overlooked to get tiling done in time for Fennec but normally should of be done before landing.
Blocks: 748649
Attached patch wip (obsolete) — Splinter Review
Assignee: nobody → bgirard
Status: NEW → ASSIGNED
Attached patch patch (obsolete) — Splinter Review
Attachment #618798 - Attachment is obsolete: true
Attached patch patch (obsolete) — Splinter Review
Attachment #619085 - Attachment is obsolete: true
Attachment #619089 - Flags: review?(jones.chris.g)
Depends on: 749723
Comment on attachment 619089 [details] [diff] [review]
patch

In general, I prefer the impl we discussed on IRC, where
 - there's a "busy" bit on the shared image
 - busy-ness is just volatile-load/store'd (i.e. loads/stores don't
   need a total ordering)
 - we unset busy on the compositor thread after the surface is first
   uploaded, to free it up for use on the content thread ASAP

But the fixes here are more important, so we can do the rest in a
followup.

>diff --git a/gfx/layers/basic/BasicTiledThebesLayer.cpp b/gfx/layers/basic/BasicTiledThebesLayer.cpp

>+void
>+BasicTiledLayerBuffer::GetSurfaceDescriptorTiles(SurfaceDescriptorTiles *aDescriptor)
>+{
>+  InfallibleTArray<TileDescriptor> tiles;
>+
>+  for (size_t i = 0; i < mRetainedTiles.Length(); i++) {
>+    TileDescriptor tileDesc;
>+    if (mRetainedTiles.SafeElementAt(i, GetPlaceholderTile()) == GetPlaceholderTile()) {
>+      tileDesc = PlaceholderTileDescriptor(0);

I don't understand this, see below.

> BasicTiledLayerTile
> BasicTiledLayerBuffer::ValidateTileInternal(BasicTiledLayerTile aTile,
>                                             const nsIntPoint& aTileOrigin,
>                                             const nsIntRect& aDirtyRect)
> {

>-    aTile = BasicTiledLayerTile(tmpTile);
>+    nsRefPtr<gfxSharedImageSurface> sharedImage =
>+      gfxSharedImageSurface::CreateUnsafe(mThebesLayer->GetShadow(),
>+                                          gfxIntSize(GetTileLength(), GetTileLength()),
>+                                          GetFormat());

Please doc why you need to use CreateUnsafe() here.

>diff --git a/gfx/layers/basic/BasicTiledThebesLayer.h b/gfx/layers/basic/BasicTiledThebesLayer.h

>+  BasicTiledLayerBuffer(const nsIntRegion& aValidRegion,
>+                        const nsIntRegion& aLastPaintRegion,
>+                        const InfallibleTArray<TileDescriptor>& aTiles,
>+                        int aRetainedWidth,
>+                        int aRetainedHeight)
>+  {

This belongs in the .cpp.

>diff --git a/gfx/layers/ipc/PLayers.ipdl b/gfx/layers/ipc/PLayers.ipdl

>+struct PlaceholderTileDescriptor {
>+  bool empty; // void
>+};

What's a placeholder tile descriptor?  Comment here would help.

Depending on what this is, you probably want |void_t| here, which
doesn't occupy any space, unlike |bool|.

>+struct SurfaceDescriptorTiles {

I think there's perhaps a more/better descriptive name for this
creature, but I don't have any good suggestions.

>+  // Later when we want to support more complex tile types we will
>+  // use a TileDescriptor union.
>+  TileDescriptor[] tiles;

This is already a union, yeah?

>+  int         retainedWidth;
>+  int         retainedHeight;

int32_t

>diff --git a/gfx/thebes/gfxReusableSurfaceWrapper.cpp b/gfx/thebes/gfxReusableSurfaceWrapper.cpp
>diff --git a/gfx/thebes/gfxReusableSurfaceWrapper.h b/gfx/thebes/gfxReusableSurfaceWrapper.h

I don't know anything about gfxReusableSurfaceWrapper.  Is there
someone else who can review this code?  I can review the IPC usage if
someone else takes the class semantics.  Or you can catch me up on IRC
and I can review both parts.

>diff --git a/gfx/thebes/gfxSharedImageSurface.cpp b/gfx/thebes/gfxSharedImageSurface.cpp

> struct SharedImageInfo {
>     PRInt32 width;
>     PRInt32 height;
>     PRInt32 format;
>+    PRInt32 readCount;

This should be declared |volatile|, use the NSPR atomic type and be
placed first in this struct.

>+void
>+gfxSharedImageSurface::ReadLock()
>+{
>+void
>+gfxSharedImageSurface::ReadUnlock()
>+{
>+int
>+gfxSharedImageSurface::GetReadCount()

Please MOZ_ASSERT() here that the reader count is always >= 0.

>diff --git a/gfx/thebes/gfxSharedImageSurface.h b/gfx/thebes/gfxSharedImageSurface.h

>+    // To be used by gfxReusableSurfaceWrapper
>+    void ReadLock();
>+    void ReadUnlock();
>+    int GetReadCount();
>+

Naming nits:
  LockForRead()
  Unlock()
  GetReaderCount()

This is looking good.
Attachment #619089 - Flags: review?(jones.chris.g)
(In reply to Chris Jones [:cjones] [:warhammer] from comment #4)
> In general, I prefer the impl we discussed on IRC, where

I feel like the patch here does this:

>  - there's a "busy" bit on the shared image

We use a counter instead of a busy bit. This is because async transaction are queued in the compositors event queue, thus the same tile can be sent multiple times to the compositor and thus more then 1 copy can be sent in the compositor queue.

>  - busy-ness is just volatile-load/store'd (i.e. loads/stores don't
>    need a total ordering)

If we do go with the count is are the pr atomic read/write what we want here.

>  - we unset busy on the compositor thread after the surface is first
>    uploaded, to free it up for use on the content thread ASAP
> 

Currently I do it after everything is uploaded. We can certainly unlock tile by tile. 

Another optimization is to predict which tiles the compositor will read so we can get away with locking less tiles. This is the type of work I rather do when they show up in profile. It takes us a while to get back in painting code so we've likely uploaded everything by then. So I rather wait for evidence before doing something more complex.
This patch rotted a fair amount.  Some further updates

(In reply to Chris Jones [:cjones] [:warhammer] from comment #4)
> Comment on attachment 619089 [details] [diff] [review]
> patch
> 
> >diff --git a/gfx/layers/basic/BasicTiledThebesLayer.h b/gfx/layers/basic/BasicTiledThebesLayer.h
> 
> >+struct PlaceholderTileDescriptor {
> >+  bool empty; // void
> >+};
>
> Depending on what this is, you probably want |void_t| here, which
> doesn't occupy any space, unlike |bool|.
> 

You can now use an empty struct

  struct PlaceholderTileDescriptor { };

but I don't recall what this was being used for.  Rebasing the patch might remind me.

(In reply to Benoit Girard (:BenWa) from comment #5)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #4)
> > In general, I prefer the impl we discussed on IRC, where
> 
> I feel like the patch here does this:
> 
> >  - there's a "busy" bit on the shared image
> 
> We use a counter instead of a busy bit. This is because async transaction
> are queued in the compositors event queue, thus the same tile can be sent
> multiple times to the compositor and thus more then 1 copy can be sent in
> the compositor queue.
> 

Per previous comment this work can go to a followup, but as I recall from our discussion a long time back, all tiles with refcount > 0 are uploaded, which is not what we want.  The busy bit is simpler and can slightly optimize the upload pipeline.
Attached patch rebased, but broken (obsolete) — Splinter Review
I rebased the dumb stuff, but this patch fails to compile with more interesting problems

../../dist/include/BasicTiledThebesLayer.h:70: error: 'BasicTileDescriptor' has not been declared
../../dist/include/BasicTiledThebesLayer.h:71: error: ISO C++ forbids declaration of 'BasicTileDescriptor' with no type
../../dist/include/BasicTiledThebesLayer.h:71: error: expected ',' or '...' before '&' token
In file included from /home/cjones/mozilla/inbound/gfx/layers/opengl/TiledThebesLayerOGL.h:13,
                 from /home/cjones/mozilla/inbound/gfx/layers/opengl/ReusableTileStoreOGL.h:9,
                 from /home/cjones/mozilla/inbound/gfx/layers/opengl/ReusableTileStoreOGL.cpp:6:
../../dist/include/BasicTiledThebesLayer.h:93: error: 'TileDescriptor' was not declared in this scope
../../dist/include/BasicTiledThebesLayer.h:93: error: template argument 1 is invalid
../../dist/include/BasicTiledThebesLayer.h: In constructor 'mozilla::layers::BasicTiledLayerBuffer::BasicTiledLayerBuffer(const nsIntRegion&, const nsIntRegion&, const int&, int, int)':
../../dist/include/BasicTiledThebesLayer.h:102: error: request for member 'Length' in 'aTiles', which is of non-class type 'const int'
../../dist/include/BasicTiledThebesLayer.h:103: error: invalid types 'const int[size_t]' for array subscript
../../dist/include/BasicTiledThebesLayer.h:103: error: 'TileDescriptor' has not been declared
../../dist/include/BasicTiledThebesLayer.h:106: error: expected initializer before '&' token
../../dist/include/BasicTiledThebesLayer.h:107: error: 'basicTileDesc' was not declared in this scope

BenWa volunteered to pick this back up tomorrow, definitely the right person for the job :).
Attached patch rebased (obsolete) — Splinter Review
Attachment #632147 - Attachment is obsolete: true
BenWa and I discussed this for a while and there are some unresolved questions around this impl, and some followup work that this will "compete with", so we're going to go with the better-understood gralloc impl for now.  This patch is very good to have in reserve, thanks!
No longer blocks: b2g-layers-work, b2g-e10s-work
As discuss I'll be putting this patch on hold. Here's what is needed to fix this patch:
This patch will leak Shmem. This is because we now have more then one gfxReusableSurfaceWrapper pointing to the same Shmem. Also deleting the gfxSharedImageSurface doesn't release the Shmem. To fix this the gfxReusableSurfaceWrapper needs a 'ISurfaceDeAllocator' and must be changed to use the read lock of the Shmem to release it.
BenWa, when you add features like the tile "generation counting" that will need work for multi-process, can you set those bugs to block this?  Thanks!
I'm not sure what you mean by 'generation counting'. We already support reusing the same tile across multiple paints.
Skipping upload of a buffer if the "paint count" of the tile is higher (newer) than that of the buffer.  That is, if there's a newer buffer in the upload queue.
That's already implemented:
http://mxr.mozilla.org/mozilla-central/source/gfx/layers/opengl/TiledThebesLayerOGL.cpp#134

When we receive a new buffer we abort any pending upload, we unlock the current tiles, receive the new tiles and update the region to upload.
Assignee: bgirard → nobody
Blocks: 887819
I'm currently rebasing BenWa's patch on m-c.
Assignee: nobody → chrislord.net
Attached patch Ported to recent m-c (obsolete) — Splinter Review
This still has the issue that the shared memory isn't freed at any point, I'll tackle that next - would just like to get feedback that this rebase is ok.

I've tested it on Android, where it seems to work fine, except for the occasional  crash where it tries to delete a tile that still has a read-lock (and subsequently aborts).
Attachment #619089 - Attachment is obsolete: true
Attachment #632350 - Attachment is obsolete: true
Attachment #786349 - Flags: feedback?(bgirard)
Comment on attachment 786349 [details] [diff] [review]
Ported to recent m-c

Review of attachment 786349 [details] [diff] [review]:
-----------------------------------------------------------------

Rebase looks nice

::: gfx/layers/TiledLayerBuffer.h
@@ +18,5 @@
>  
>  #include "nsRect.h"
>  #include "nsRegion.h"
>  #include "nsTArray.h"
> +#include "mozilla/layers/LayersSurfaces.h"

IIRC you only need to forward declare for passing by reference.

::: gfx/layers/client/TextureClient.cpp
@@ +207,5 @@
>  void
>  DeprecatedTextureClientTile::EnsureAllocated(gfx::IntSize aSize, gfxASurface::gfxContentType aType)
>  {
>    if (!mSurface ||
>        mSurface->Format() != gfxPlatform::GetPlatform()->OptimalFormatForContent(aType)) {

I wonder if this should be check if the size has changed or perhaps it's handled elsewhere.
Attachment #786349 - Flags: feedback?(bgirard) → feedback+
(In reply to Benoit Girard (:BenWa) from comment #17)
> ::: gfx/layers/TiledLayerBuffer.h
> @@ +18,5 @@
> >  
> >  #include "nsRect.h"
> >  #include "nsRegion.h"
> >  #include "nsTArray.h"
> > +#include "mozilla/layers/LayersSurfaces.h"
> 
> IIRC you only need to forward declare for passing by reference.
> 

Just to clarify. Here you're including LayersSurfaces.h but I think you only need to forward declare SurfaceDescriptorTiles.
Attached patch Make tiles multi-process safe (obsolete) — Splinter Review
This is BenWa's original patch, ported to more current m-c. Tested to work on Android, bar issues due to gfxReusableSurfaceWrapper not handling destruction correctly (see next patch for that).

Please pass on the review if you think you aren't appropriate :)
Attachment #786349 - Attachment is obsolete: true
Attachment #787606 - Flags: review?(ncameron)
This makes gfxReusableSurfaceWrapper handle deallocation. I wasn't clever enough to figure a race-free way to do this without adding an extra field to gfxBaseSharedImageSurface.

Applying these two patches appears to work fine on Android, I still need to test this on b2g. These patches will be folded when committed.
Attachment #787607 - Flags: review?(nical.bugzilla)
Attachment #787607 - Flags: review?(bgirard)
Try is looking green with these patches applied: https://tbpl.mozilla.org/?tree=Try&rev=a6c400a25b86
Comment on attachment 787606 [details] [diff] [review]
Make tiles multi-process safe

Review of attachment 787606 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/TiledLayerBuffer.h
@@ +190,5 @@
>    /**
>     * Update the current retained layer with the updated layer data.
>     * The BasicTiledLayerBuffer is expected to be in the ReadLock state
>     * prior to this being called. aTiledBuffer is copy constructed and
>     * is retained until it has been uploaded/copyed and unlocked.

please update this comment

::: gfx/layers/client/TextureClient.cpp
@@ +209,5 @@
>  {
>    if (!mSurface ||
>        mSurface->Format() != gfxPlatform::GetPlatform()->OptimalFormatForContent(aType)) {
> +    nsRefPtr<gfxSharedImageSurface> sharedImage =
> +      gfxSharedImageSurface::CreateUnsafe(mForwarder,

Where do we release the shmem under this image?

::: gfx/layers/client/TextureClient.h
@@ +203,5 @@
>  {
>  public:
>    DeprecatedTextureClientTile(const DeprecatedTextureClientTile& aOther);
>    DeprecatedTextureClientTile(CompositableForwarder* aForwarder,
>                      const TextureInfo& aTextureInfo);

Do we still need this constructor?

::: gfx/layers/client/TiledContentClient.cpp
@@ +110,5 @@
>  }
>  
>  
>  void
> +BasicTiledLayerTile::GetTileDescriptor(BasicTileDescriptor *aDesc)

return the result rather than use an out param

@@ +125,5 @@
> +    new DeprecatedTextureClientTile(nullptr, TextureInfo(BUFFER_TILED), surface));
> +}
> +
> +void
> +BasicTiledLayerBuffer::GetSurfaceDescriptorTiles(SurfaceDescriptorTiles *aDescriptor)

return the result rather than use an out param

@@ +154,5 @@
> +                                            aDescriptor.paintedRegion(),
> +                                            aDescriptor.tiles(),
> +                                            aDescriptor.retainedWidth(),
> +                                            aDescriptor.retainedHeight(),
> +                                            aDescriptor.resolution());

might make more sense for the BasicTiledLayerBuffer to take SurfaceDescriptorTile as an arg.

::: gfx/layers/client/TiledContentClient.h
@@ +126,5 @@
> +    mValidRegion = aValidRegion;
> +    mPaintedRegion = aPaintedRegion;
> +    mRetainedWidth = aRetainedWidth;
> +    mRetainedHeight = aRetainedHeight;
> +    mResolution = aResolution;

Use initialisers for all of these rather than assignment

@@ +130,5 @@
> +    mResolution = aResolution;
> +
> +    for(size_t i = 0; i < aTiles.Length(); i++) {
> +      if (aTiles[i].type() == TileDescriptor::TPlaceholderTileDescriptor) {
> +        mRetainedTiles.AppendElement( GetPlaceholderTile() );

nit: remove spaces

@@ +133,5 @@
> +      if (aTiles[i].type() == TileDescriptor::TPlaceholderTileDescriptor) {
> +        mRetainedTiles.AppendElement( GetPlaceholderTile() );
> +      } else {
> +        const BasicTileDescriptor& basicTileDesc = aTiles[i].get_BasicTileDescriptor();
> +        mRetainedTiles.AppendElement( BasicTiledLayerTile::OpenDescriptor(basicTileDesc) );

same nit

::: gfx/layers/ipc/LayersSurfaces.ipdlh
@@ +74,5 @@
>     */
>    bool isRBSwapped;
>  };
>  
> +struct BasicTileDescriptor {

Why 'Basic'? In the sense of simple or using software layers rendering? If the former, please change the name.

@@ +79,5 @@
> +  Shmem reusableSurface;
> +};
> +
> +struct PlaceholderTileDescriptor {
> +  bool empty; // void

not sure what this comment means

::: gfx/thebes/gfxReusableSurfaceWrapper.cpp
@@ +29,5 @@
>  };
>  
>  gfxReusableSurfaceWrapper::~gfxReusableSurfaceWrapper()
>  {
> +  NS_ABORT_IF_FALSE(mSurface->GetReadCount() == 0, "Should not be locked when released");

Maybe move this assertion to the destructor of gfxSharedImageSurface?

@@ +47,5 @@
>  void
>  gfxReusableSurfaceWrapper::ReadUnlock()
>  {
> +  mSurface->ReadUnlock();
> +  NS_ABORT_IF_FALSE(mSurface->GetReadCount() >= 0, "Should not be negative");

This assertion should probably be moved to mSurface->ReadUnlock()

@@ +56,4 @@
>  {
>    NS_CheckThreadSafe(_mOwningThread.GetThread(), "Only the owner thread can call GetWritable");
>  
> +  if (mSurface->GetReadCount() == 0) {

Can we change the API to something like IsLockedForReading and then get rid of the GetReadCount method? I feel the read count could be encapsulated in the image much better.

@@ +62,5 @@
>    }
>  
>    // Something else is reading the surface, copy it
> +  nsRefPtr<gfxSharedImageSurface> copySurface =
> +    gfxSharedImageSurface::CreateUnsafe(aAllocator, mSurface->GetSize(), mSurface->Format());

Where does this shmem get deallocated?

@@ +74,5 @@
>  
> +const unsigned char*
> +gfxReusableSurfaceWrapper::GetReadOnlyData() const
> +{
> +  NS_ABORT_IF_FALSE(mSurface->GetReadCount() > 0, "Should have read lock");

I'm not sure if this assert is very useful since it doesn't check that WE have the read lock. If this is the best we can do, then maybe move it to mSurface->Data()
Attachment #787606 - Flags: review?(ncameron) → review+
(In reply to Chris Lord [:cwiiis] from comment #19)
> Created attachment 787606 [details] [diff] [review]
> Make tiles multi-process safe
> 
> This is BenWa's original patch, ported to more current m-c. Tested to work
> on Android, bar issues due to gfxReusableSurfaceWrapper not handling
> destruction correctly (see next patch for that).
> 
> Please pass on the review if you think you aren't appropriate :)

Missed the comment on deallocation, so ignore my queries about that in the review above :-)
Try run with patches: https://tbpl.mozilla.org/?tree=Try&rev=a6c400a25b86
Without: https://tbpl.mozilla.org/?tree=Try&rev=5cf5f7e55f76

rck2 gets a consistently worse score (iirc, rck2 can be between 0 and 100, where 0 is 'no checkerboarding occurred' and 100 is 'nothing was drawn to the screen at all'). I guess because of the overhead of the atomic operations, shared memory allocation and extra indirection.

rp gets a marginally worse score on Android 2.2 and marginally better score on 4.0 - I guess it improves on 4.0 because it's possible that less texture upload is happening, or less rendering is happening; I guess it declines on 2.2 because of the cost of the atomic operations and freeing of the shm being shared between compositor and renderer(?) (that last one seems unlikely as the memory is reused in the common case)

So question; do we think this performance decrease is reasonable? Do we want to provide an Android path that doesn't use shared memory (with the assumption that it increases overhead)? Do we want to devise a way of doing this with fewer atomic operations?

Thinking aloud, do readlock/unlock even need to be atomic at all? If we only sent changed tiles to the client, couldn't we could avoid the need for multiple read-locks, and thus lock/unlock wouldn't need to be atomic as there's no situation where they could happen concurrently?
Comment on attachment 787607 [details] [diff] [review]
Make tiles multi-process safe, part 2

Review of attachment 787607 [details] [diff] [review]:
-----------------------------------------------------------------

I am not familiar with how tiling works so my review was more focused on the plumbing of passing around the ISurfaceAllocator and how it's used, which is rather simple here.
Attachment #787607 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 787607 [details] [diff] [review]
Make tiles multi-process safe, part 2

Review of attachment 787607 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to fully understand the 'Users' concept before r+ing.

::: gfx/layers/client/TiledContentClient.h
@@ +8,5 @@
>  
>  #include "mozilla/layers/ContentClient.h"
>  #include "TiledLayerBuffer.h"
>  #include "gfxPlatform.h"
> +#include "mozilla/layers/ISurfaceAllocator.h"

Don't need to include this either. class ISurfaceAllocator is enough.

::: gfx/layers/composite/TiledContentHost.h
@@ +7,5 @@
>  #define GFX_TILEDCONTENTHOST_H
>  
>  #include "ContentHost.h"
>  #include "ClientTiledThebesLayer.h" // for BasicTiledLayerBuffer
> +#include "mozilla/layers/ISurfaceAllocator.h"

This include isn't needed

::: gfx/thebes/gfxBaseSharedMemorySurface.h
@@ +151,5 @@
>          return shmInfo->readCount;
>      }
>  
> +    int32_t
> +    IncrementUsers()

Why are you using different terminology here? This looks to be essentially a cross process reference count. If you do introduce this then you should assert that users == 0 when released.

::: gfx/thebes/gfxReusableSurfaceWrapper.cpp
@@ +25,5 @@
> +    // If the read count isn't zero, it means we destroyed this wrapper before
> +    // another user was about to acquire the surface. In this situation, do
> +    // nothing; the other user will be responsible for freeing the shared
> +    // memory surface.
> +    if (mSurface->GetReadCount() != 0) {

I don't know if this is correct but perhaps users isn't really a reference count?
Attachment #787607 - Flags: review?(bgirard) → review-
(In reply to Benoit Girard (:BenWa) from comment #26)
> Comment on attachment 787607 [details] [diff] [review]
> Make tiles multi-process safe, part 2
> 
> Review of attachment 787607 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'd like to fully understand the 'Users' concept before r+ing.
> 
> ::: gfx/thebes/gfxBaseSharedMemorySurface.h
> @@ +151,5 @@
> >          return shmInfo->readCount;
> >      }
> >  
> > +    int32_t
> > +    IncrementUsers()
> 
> Why are you using different terminology here? This looks to be essentially a
> cross process reference count. If you do introduce this then you should
> assert that users == 0 when released.
> 
> ::: gfx/thebes/gfxReusableSurfaceWrapper.cpp
> @@ +25,5 @@
> > +    // If the read count isn't zero, it means we destroyed this wrapper before
> > +    // another user was about to acquire the surface. In this situation, do
> > +    // nothing; the other user will be responsible for freeing the shared
> > +    // memory surface.
> > +    if (mSurface->GetReadCount() != 0) {
> 
> I don't know if this is correct but perhaps users isn't really a reference
> count?

It pretty much is a reference count - the idea is that for every place that could take the responsibility of freeing it, it'll hold onto one of these references - the race I'm accounting for here is if the client decides to destroy the tile whilst it's in-flight to the host.

So the usual flow is this;

1- Client creates tile, adds reference and ReadLock
2- Host receives tile, adds reference
3- Host uploads tile, removes ReadLock, removes reference
4- Client either reuses tile, or destroys it and frees shared memory

But there's also this;

1- Client creates tile, adds reference and ReadLock
2- Host receives tile, adds reference
3- Client wants to reuse tile, creates a copy due to ReadLock, removes reference on old tile
4- Host uploads tile, removes ReadLock, removes reference and frees shared memory

And this;

1- Client creates tile, adds reference and ReadLock
2- Client wants to reuse tile, creates a copy due to ReadLock, removes reference on old tile. Old tile has zero references, but ReadLock, so stays alive
3- Host receives tile, adds reference
4- Host uploads tile, removes ReadLock, removes reference and frees shared memory

And this;

1- Client creates tile, adds reference and ReadLock, sends tile to host
2- Client wants to reuse tile, creates a copy due to ReadLock, removes reference on old tile. Old tile has zero references, but ReadLock, so stays alive
3- Client sends new tile to host
4- Host receives first tile, adds reference
5- Host receives second tile, removes readlock and reference from first tile, frees shared memory
6- Host uploads second tile, removes ReadLock, removes reference
7- Client reuses or destroys second tile, the latter of which removes the reference and frees the shared memory


And probably several more permutations, but those were the ones I was primarily worried about. Given that a tile can have zero references, but remain alive due to the readlock, is it ok that we call these references? And if so, have you got any good ideas for naming? (because mSurface is a refable object already, so AddRef won't do)
Flags: needinfo?(bgirard)
I think we can capture everything using only the read lock. The compositor must always hold a ReadLock if it wants to use the shared image, if it removes the last Readlock it should dealloc the surface.

For content it should always hold a ReadLock to the image, if it releases the last ReadLock it should dealloc the object. All we need to do is change GetWritable to require a release of the ReadLock. If its the owner of the last ReadLock it returns the current surface, otherwise it copies the surface.
Flags: needinfo?(bgirard)
(In reply to Benoit Girard (:BenWa) from comment #28)
> I think we can capture everything using only the read lock. The compositor
> must always hold a ReadLock if it wants to use the shared image, if it
> removes the last Readlock it should dealloc the surface.
> 
> For content it should always hold a ReadLock to the image, if it releases
> the last ReadLock it should dealloc the object. All we need to do is change
> GetWritable to require a release of the ReadLock. If its the owner of the
> last ReadLock it returns the current surface, otherwise it copies the
> surface.

I'm not sure how what you're suggesting works - are you saying that if there aren't any readlocks during destruction, then it should free the memory? If so, that's racy (if content destroys after compositor readunlocks, it'll get freed twice)

The compositor always releases the readlock, content never readunlocks - I'm not really sure what you're suggesting? If it's that they should both hold readlocks, what happens if content readlocks, ends the transaction and readunlocks before the compositor receives and read-locks the tiles?

I guess content could readlock twice and destroying would call readunlock, and whoever unlocks last frees... But in that case, how does reusing a tile work without races?
Flags: needinfo?(bgirard)
We resolved this on IRC where synchronous communicate was more effective.
Flags: needinfo?(bgirard)
I think I've addressed most of your comments bar, the initialiser one - I couldn't do that, I assume due to some template nonsense... I didn't look much into it, but I don't suppose it's too critical either.

The patch is functionally the same, I just want to r? again to make sure I didn't miss anything you'd really like to see fixed/changed.
Attachment #787606 - Attachment is obsolete: true
Attachment #790315 - Flags: review?(ncameron)
And here's the deallocation part, using just ReadLock/Unlock, as suggested. I've tried to be quite liberal with the documentation. There are fewer areas where locks are taken/released now and try is looking green, so I think this is alright.
Attachment #787607 - Attachment is obsolete: true
Attachment #790318 - Flags: review?(ncameron)
Attachment #790318 - Flags: review?(bgirard)
Comment on attachment 790318 [details] [diff] [review]
Make tiles multi-process safe, part 2

Review of attachment 790318 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is great. I'd like to see the read lock count be made a bit more robust before landing.

::: gfx/layers/composite/TiledContentHost.h
@@ +138,5 @@
>      : ContentHost(aTextureInfo)
>      , mPendingUpload(false)
>      , mPendingLowPrecisionUpload(false)
>    {}
> +  ~TiledContentHost() {}

If you don't mind please add MOZ_COUNT_CTOR/DTOR

::: gfx/thebes/gfxReusableSurfaceWrapper.cpp
@@ +22,2 @@
>    MOZ_COUNT_DTOR(gfxReusableSurfaceWrapper);
> +  ReadUnlock();

Assert the read count is not negative.

@@ +36,5 @@
> +  int32_t readCount = mSurface->ReadUnlock();
> +  NS_ABORT_IF_FALSE(readCount >= 0, "Read count should not be negative");
> +
> +  if (readCount == 0) {
> +    mAllocator->DeallocShmem(mSurface->GetShmem());

Here we deallocate the Shmem but in ~gfxReusableSurfaceWrapper we called ReadUnlock which will bring the readCount to -1. This isn't right.

@@ +51,5 @@
>      *aSurface = mSurface;
>      return this;
>    }
>  
> +  NS_ABORT_IF_FALSE(readCount > 1, "A ReadLock must be held when calling GetWritable");

This should be > 0 and moved to the start.

::: gfx/thebes/gfxReusableSurfaceWrapper.h
@@ +46,5 @@
>  
> +  /**
> +   * Create a gfxReusableSurfaceWrapper from the shared memory segment of a
> +   * gfxSharedImageSurface. A ReadLock must be held, which will be adopted by
> +   * the returned gfxReusableSurfaceWrapper.

If we wanted we could actually make this staticly enforced. i.e. When sending the surface you readlock the surface which gives you a LockedReusableSurfaceDescriptor and we can add a constructor here that takes a LockedReusableSurfaceDescriptor called from the compositor.

... But I don't want to derail this work so lets not do anything unless the lifetimes cause us more problems.
Attachment #790318 - Flags: review?(bgirard) → review-
(In reply to Benoit Girard (:BenWa) from comment #33)
> Comment on attachment 790318 [details] [diff] [review]
> Make tiles multi-process safe, part 2
> 
> Review of attachment 790318 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This patch is great. I'd like to see the read lock count be made a bit more
> robust before landing.
> 
> ::: gfx/layers/composite/TiledContentHost.h
> @@ +138,5 @@
> >      : ContentHost(aTextureInfo)
> >      , mPendingUpload(false)
> >      , mPendingLowPrecisionUpload(false)
> >    {}
> > +  ~TiledContentHost() {}
> 
> If you don't mind please add MOZ_COUNT_CTOR/DTOR

Sure

> ::: gfx/thebes/gfxReusableSurfaceWrapper.cpp
> @@ +22,2 @@
> >    MOZ_COUNT_DTOR(gfxReusableSurfaceWrapper);
> > +  ReadUnlock();
> 
> Assert the read count is not negative.

This is already asserted in the ReadUnlock implementation, why assert it here too?

> @@ +36,5 @@
> > +  int32_t readCount = mSurface->ReadUnlock();
> > +  NS_ABORT_IF_FALSE(readCount >= 0, "Read count should not be negative");
> > +
> > +  if (readCount == 0) {
> > +    mAllocator->DeallocShmem(mSurface->GetShmem());
> 
> Here we deallocate the Shmem but in ~gfxReusableSurfaceWrapper we called
> ReadUnlock which will bring the readCount to -1. This isn't right.

I'm not sure what you mean? This is the gfxReusableSurfaceWrapper::ReadUnlock implementation, which is what we call in the destructor - this is the only place gfxSharedImageSurface::ReadUnlock is called, and it asserts that the readcount remains positive. I can confirm that this patch works correctly too (tested locally on both Android and b2g, and try is completely green) - Are you sure you're reading this correctly?

> @@ +51,5 @@
> >      *aSurface = mSurface;
> >      return this;
> >    }
> >  
> > +  NS_ABORT_IF_FALSE(readCount > 1, "A ReadLock must be held when calling GetWritable");
> 
> This should be > 0 and moved to the start.

Yes, it should...

> ::: gfx/thebes/gfxReusableSurfaceWrapper.h
> @@ +46,5 @@
> >  
> > +  /**
> > +   * Create a gfxReusableSurfaceWrapper from the shared memory segment of a
> > +   * gfxSharedImageSurface. A ReadLock must be held, which will be adopted by
> > +   * the returned gfxReusableSurfaceWrapper.
> 
> If we wanted we could actually make this staticly enforced. i.e. When
> sending the surface you readlock the surface which gives you a
> LockedReusableSurfaceDescriptor and we can add a constructor here that takes
> a LockedReusableSurfaceDescriptor called from the compositor.
> 
> ... But I don't want to derail this work so lets not do anything unless the
> lifetimes cause us more problems.

This isn't a huge change, I'll see how long it'll take me and decide based on that - I like the idea.


Requesting needinfo about the ReadUnlock thing above, either to confirm I'm correct or clarify what I've gotten wrong.
Flags: needinfo?(bgirard)
Comment on attachment 790318 [details] [diff] [review]
Make tiles multi-process safe, part 2

Review of attachment 790318 [details] [diff] [review]:
-----------------------------------------------------------------

Right, it was my bad, I read the code wrong. Ignore that.
Attachment #790318 - Flags: review- → review+
This addresses the other review comments, carrying r+ - decided not to make a type to greater enforce the ref-counting, just to try and get this done a bit more quickly... I hope I don't regret that.
Attachment #790318 - Attachment is obsolete: true
Attachment #790318 - Flags: review?(ncameron)
Attachment #791364 - Flags: review+
Flags: needinfo?(bgirard)
Comment on attachment 791364 [details] [diff] [review]
Make tiles multi-process safe, part 2

I think it'd be good for nrc to see this as well as the first patch.
Attachment #791364 - Flags: review?(ncameron)
This separates out the sharable interface of gfxReusableSurfaceWrapper, a follow-up patch will add a non-shm implementation for Android to (hopefully) avoid the performance regression, presumably caused by using shared memory.
Attachment #791365 - Flags: review?(bgirard)
I'm not super happy with the amount of #ifdef's in this, so suggestions welcome (and of course suggestions about anything else).

This adds a non-shm implementation of gfxReusableSurfaceWrapper that is basically the same as before this patch and the necessary ipc and support code to use it. I settled on doing it this way as I thought it'd be a nice way of experimenting with alternative tile allocation methods, such as Gralloc (or tfp on Linux desktop).
Attachment #791366 - Flags: review?(bgirard)
Try run without patches: https://tbpl.mozilla.org/?tree=Try&rev=99946207f85e
Try run with patches: https://tbpl.mozilla.org/?tree=Try&rev=f722ff628d46

I'm really hoping that not using shm on Android fixes the performance regression, for the most part (I wouldn't be surprised at a slight regression as there's likely a bit more ipc overhead and allocation churn communicating the tiles like this rather than just a pointer to a copy of the buffer). If it doesn't, I'll have to verify where the perf regression is coming from and reassess from there.

For the record, this seems to work really nicely on b2g - anecdotally, it feels smoother, but there is definitely less visible glitching than using the standard backend with Gralloc-backed surfaces.
Comment on attachment 791365 [details] [diff] [review]
Separate gfxReusableSurfaceWrapper into a base class and implementation

Review of attachment 791365 [details] [diff] [review]:
-----------------------------------------------------------------

You should put the bugzilla description of this patch in the commit.

::: gfx/thebes/gfxReusableSharedImageSurfaceWrapper.h
@@ +10,5 @@
> +#include "mozilla/layers/ISurfaceAllocator.h"
> +
> +class gfxSharedImageSurface;
> +
> +class gfxReusableSharedImageSurfaceWrapper : public gfxReusableSurfaceWrapper {

This class needs a comment. If I understand correctly this extends gfxReusableSurfaceWrapper to provide a reusable surface that can be shared across process.

::: gfx/thebes/gfxReusableSurfaceWrapper.h
@@ +25,4 @@
>   * 2) The surface is sent from content to the compositor once
>   *    or potentially many time, each increasing a read lock.
>   * 3) When the compositor has processed the surface and uploaded
>   *    the content it then releases the read lock.

We need to update this comment
Attachment #791365 - Flags: review?(bgirard) → review+
Comment on attachment 791366 [details] [diff] [review]
Add and use gfxReusableImageSurfaceWrapper

Review of attachment 791366 [details] [diff] [review]:
-----------------------------------------------------------------

r- because I'd like to see the patch again with corrections even if the patch is close.

::: gfx/layers/client/TiledContentClient.cpp
@@ +115,5 @@
>  
> +BasicShmTileDescriptor
> +BasicTiledLayerTile::GetShmTileDescriptor()
> +{
> +  return BasicShmTileDescriptor(static_cast<gfxReusableSharedImageSurfaceWrapper*>(GetSurface())->GetShmem());

I'd rather if we could assert these types, at least in debug. I've seen this go wrong so many times. But actually in the header I asked to merge these so we should be able to get the TileDescriptor without having to worry about the type of BasicTiledLayerTile.

@@ +122,4 @@
>  BasicTileDescriptor
>  BasicTiledLayerTile::GetTileDescriptor()
>  {
> +  return BasicTileDescriptor(uintptr_t(GetSurface()));

The above should also fix this.

@@ +157,5 @@
>      } else {
> +      // If we're using OMTC, we can save some cycles by not using shared
> +      // memory. Using shared memory here is a small, but significant
> +      // performance regression.
> +#ifdef MOZ_ANDROID_OMTC

And also remove this ifdef

::: gfx/layers/client/TiledContentClient.h
@@ +73,5 @@
>    BasicTileDescriptor GetTileDescriptor();
> +  BasicShmTileDescriptor GetShmTileDescriptor();
> +
> +  static BasicTiledLayerTile OpenDescriptor(ISurfaceAllocator *aAllocator, const BasicShmTileDescriptor& aDesc);
> +  static BasicTiledLayerTile OpenDescriptor(const BasicTileDescriptor& aDesc);

Can we use TileDescriptor and only expose one version of GetTileDescriptior. On the other hand we should have explicit names for the two types of Opens: OpenShmTileDescriptor/OpenTileDescriptor.

@@ +130,2 @@
>    {
>      mValidRegion = aValidRegion;

I think it's time to pull this into the .cpp given the size of this method.
Attachment #791366 - Flags: review?(bgirard) → review-
Thanks for the quick review, will make the changes and resubmit. Also, not using shm fixes the performance regression for the most part, so that's a relief... 

This does bring another problem to light though, that tiles are allocated and deallocated quite a lot on Android just scrolling around - I guess this means that the display-port is changing size, perhaps by a row/column of tiles as you scroll vertically/horizontally over a threshold (off by one?)

A tile pool with some thresholds for freeing/allocating would fix this, but perhaps we should look at why that is happening too. Not in this bug though.
Add comments and commit message, carrying r+.
Attachment #791365 - Attachment is obsolete: true
Attachment #791657 - Flags: review+
Ok, this is a bit more than what you asked for, but I think it's too much of a risk to leave in multiple places where the type of the tile/surface had to be inferred, so I added types to gfxReusableSurfaceWrapper.

Now the surface wrapper type is only picked in TextureClientTile::EnsureAllocated and everywhere else can just handle all types based on whatever gets allocated there.
Attachment #791366 - Attachment is obsolete: true
Attachment #791661 - Flags: review?(bgirard)
Comment on attachment 791661 [details] [diff] [review]
Add and use gfxReusableImageSurfaceWrapper

Review of attachment 791661 [details] [diff] [review]:
-----------------------------------------------------------------

Much better. r+
Attachment #791661 - Flags: review?(bgirard) → review+
(In reply to Chris Lord [:cwiiis] from comment #31)
> Created attachment 790315 [details] [diff] [review]
> Make tiles multi-process safe, part 1
> 
> I think I've addressed most of your comments bar, the initialiser one - I
> couldn't do that, I assume due to some template nonsense... I didn't look
> much into it, but I don't suppose it's too critical either.
> 
> The patch is functionally the same, I just want to r? again to make sure I
> didn't miss anything you'd really like to see fixed/changed.

I'm a bit confused - most of my comments don't seem to be addressed, but then a whole lot of this patch seems to get changed in the later patches anyway, so I'm not really sure what is important. Could you refactor the patches a little bit so I can see which bits of this patch will stick around?
Comment on attachment 791364 [details] [diff] [review]
Make tiles multi-process safe, part 2

Review of attachment 791364 [details] [diff] [review]:
-----------------------------------------------------------------

I think this looks good, I haven't looked in too much detail since BenWa has.

::: gfx/layers/composite/TiledContentHost.cpp
@@ +61,5 @@
>    static_cast<ThebesLayerComposite*>(aLayer)->EnsureTiled();
>  }
>  
>  void
> +TiledContentHost::PaintedTiledLayerBuffer(ISurfaceAllocator* aAllocator, const SurfaceDescriptorTiles& aTiledDescriptor)

nit: long line

::: gfx/thebes/gfxReusableSurfaceWrapper.cpp
@@ +36,5 @@
> +  int32_t readCount = mSurface->ReadUnlock();
> +  NS_ABORT_IF_FALSE(readCount >= 0, "Read count should not be negative");
> +
> +  if (readCount == 0) {
> +    mAllocator->DeallocShmem(mSurface->GetShmem());

why do we no longer need to do this on the main thread?

@@ +46,5 @@
>  {
>    NS_CheckThreadSafe(_mOwningThread.GetThread(), "Only the owner thread can call GetWritable");
>  
> +  int32_t readCount = mSurface->GetReadCount();
> +  NS_ABORT_IF_FALSE(readCount > 0, "A ReadLock must be held when calling GetWritable");

We really ought to check that the caller has the read lock, not just someone, somewhere. That seems vulnerable to subtle errors. Should we also check in non-debug builds and return null?

::: gfx/thebes/gfxReusableSurfaceWrapper.h
@@ +33,5 @@
>    NS_INLINE_DECL_THREADSAFE_REFCOUNTING(gfxReusableSurfaceWrapper)
>  public:
>    /**
> +   * Pass the gfxSharedImageSurface to the wrapper. The wrapper will ReadLock
> +   * on creation and ReadUnlock on destruction.

Why do this?
Attachment #791364 - Flags: review?(ncameron) → review+
(In reply to Nick Cameron [:nrc] from comment #48)
> Comment on attachment 791364 [details] [diff] [review]
> Make tiles multi-process safe, part 2
> 
> Review of attachment 791364 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I think this looks good, I haven't looked in too much detail since BenWa has.
> 
> ::: gfx/layers/composite/TiledContentHost.cpp
> @@ +61,5 @@
> >    static_cast<ThebesLayerComposite*>(aLayer)->EnsureTiled();
> >  }
> >  
> >  void
> > +TiledContentHost::PaintedTiledLayerBuffer(ISurfaceAllocator* aAllocator, const SurfaceDescriptorTiles& aTiledDescriptor)
> 
> nit: long line

Will split.

> ::: gfx/thebes/gfxReusableSurfaceWrapper.cpp
> @@ +36,5 @@
> > +  int32_t readCount = mSurface->ReadUnlock();
> > +  NS_ABORT_IF_FALSE(readCount >= 0, "Read count should not be negative");
> > +
> > +  if (readCount == 0) {
> > +    mAllocator->DeallocShmem(mSurface->GetShmem());
> 
> why do we no longer need to do this on the main thread?

It wasn't really ever required to do this on the main thread I don't think, but that's a marked difference pre and post patch, deallocation can happen in either thread/process now. For the cross-thread version, we could put it back on the main thread, but I wanted to minimise differences (we could do the same with ipc too, but I don't think the overhead is worth it).

> @@ +46,5 @@
> >  {
> >    NS_CheckThreadSafe(_mOwningThread.GetThread(), "Only the owner thread can call GetWritable");
> >  
> > +  int32_t readCount = mSurface->GetReadCount();
> > +  NS_ABORT_IF_FALSE(readCount > 0, "A ReadLock must be held when calling GetWritable");
> 
> We really ought to check that the caller has the read lock, not just
> someone, somewhere. That seems vulnerable to subtle errors. Should we also
> check in non-debug builds and return null?

I'm not sure how we could check this - the readlockcount is just an int (or in the cross-thread version, it's the ref count). That's one of the reasons for readlock/unlock on creation/destruction, it makes it more likely to mean that if you have a pointer to a surface wrapper, you also hold a reference(/readlock).

> ::: gfx/thebes/gfxReusableSurfaceWrapper.h
> @@ +33,5 @@
> >    NS_INLINE_DECL_THREADSAFE_REFCOUNTING(gfxReusableSurfaceWrapper)
> >  public:
> >    /**
> > +   * Pass the gfxSharedImageSurface to the wrapper. The wrapper will ReadLock
> > +   * on creation and ReadUnlock on destruction.
> 
> Why do this?

ReadLock/Unlock are pretty much just a cross-process ref-count. It just makes it much easier to track lifecycle this way, rather than manually calling lock/unlock about the place. In the cross-thread version, ReadLock/Unlock just call AddRef/Release.
These are the comments I addressed in the updated part 1 patch:

(In reply to Nick Cameron [:nrc] from comment #22)
> ::: gfx/layers/TiledLayerBuffer.h
> @@ +190,5 @@
> >    /**
> >     * Update the current retained layer with the updated layer data.
> >     * The BasicTiledLayerBuffer is expected to be in the ReadLock state
> >     * prior to this being called. aTiledBuffer is copy constructed and
> >     * is retained until it has been uploaded/copyed and unlocked.
> 
> please update this comment

Done

> ::: gfx/layers/client/TextureClient.cpp
> @@ +209,5 @@
> >  {
> >    if (!mSurface ||
> >        mSurface->Format() != gfxPlatform::GetPlatform()->OptimalFormatForContent(aType)) {
> > +    nsRefPtr<gfxSharedImageSurface> sharedImage =
> > +      gfxSharedImageSurface::CreateUnsafe(mForwarder,
> 
> Where do we release the shmem under this image?

In part 2, as you point out later.

> ::: gfx/layers/client/TextureClient.h
> @@ +203,5 @@
> >  {
> >  public:
> >    DeprecatedTextureClientTile(const DeprecatedTextureClientTile& aOther);
> >    DeprecatedTextureClientTile(CompositableForwarder* aForwarder,
> >                      const TextureInfo& aTextureInfo);
> 
> Do we still need this constructor?

I removed the extra constructor and just altered the second existing one to take a surface pointer (which defaults to null).

> ::: gfx/layers/client/TiledContentClient.cpp
> @@ +110,5 @@
> >  }
> >  
> >  
> >  void
> > +BasicTiledLayerTile::GetTileDescriptor(BasicTileDescriptor *aDesc)
> 
> return the result rather than use an out param

Done.

> @@ +125,5 @@
> > +    new DeprecatedTextureClientTile(nullptr, TextureInfo(BUFFER_TILED), surface));
> > +}
> > +
> > +void
> > +BasicTiledLayerBuffer::GetSurfaceDescriptorTiles(SurfaceDescriptorTiles *aDescriptor)
> 
> return the result rather than use an out param

Done.

> @@ +154,5 @@
> > +                                            aDescriptor.paintedRegion(),
> > +                                            aDescriptor.tiles(),
> > +                                            aDescriptor.retainedWidth(),
> > +                                            aDescriptor.retainedHeight(),
> > +                                            aDescriptor.resolution());
> 
> might make more sense for the BasicTiledLayerBuffer to take
> SurfaceDescriptorTile as an arg.

I don't think I did this, not really for any good reason... We can change this later if need be.

> ::: gfx/layers/client/TiledContentClient.h
> @@ +126,5 @@
> > +    mValidRegion = aValidRegion;
> > +    mPaintedRegion = aPaintedRegion;
> > +    mRetainedWidth = aRetainedWidth;
> > +    mRetainedHeight = aRetainedHeight;
> > +    mResolution = aResolution;
> 
> Use initialisers for all of these rather than assignment

Using initialisers for this causes a build error - I assume some templating or scoping or whatever C++ feature I don't fully understand means this isn't possible.

> @@ +130,5 @@
> > +    mResolution = aResolution;
> > +
> > +    for(size_t i = 0; i < aTiles.Length(); i++) {
> > +      if (aTiles[i].type() == TileDescriptor::TPlaceholderTileDescriptor) {
> > +        mRetainedTiles.AppendElement( GetPlaceholderTile() );
> 
> nit: remove spaces

Done.

> @@ +133,5 @@
> > +      if (aTiles[i].type() == TileDescriptor::TPlaceholderTileDescriptor) {
> > +        mRetainedTiles.AppendElement( GetPlaceholderTile() );
> > +      } else {
> > +        const BasicTileDescriptor& basicTileDesc = aTiles[i].get_BasicTileDescriptor();
> > +        mRetainedTiles.AppendElement( BasicTiledLayerTile::OpenDescriptor(basicTileDesc) );
> 
> same nit

Done.

> ::: gfx/layers/ipc/LayersSurfaces.ipdlh
> @@ +74,5 @@
> >     */
> >    bool isRBSwapped;
> >  };
> >  
> > +struct BasicTileDescriptor {
> 
> Why 'Basic'? In the sense of simple or using software layers rendering? If
> the former, please change the name.

It's the latter, so I didn't change it.

> @@ +79,5 @@
> > +  Shmem reusableSurface;
> > +};
> > +
> > +struct PlaceholderTileDescriptor {
> > +  bool empty; // void
> 
> not sure what this comment means

Ends up empty structs in ipdl are ok these days, so I removed this entirely.

> ::: gfx/thebes/gfxReusableSurfaceWrapper.cpp
> @@ +29,5 @@
> >  };
> >  
> >  gfxReusableSurfaceWrapper::~gfxReusableSurfaceWrapper()
> >  {
> > +  NS_ABORT_IF_FALSE(mSurface->GetReadCount() == 0, "Should not be locked when released");
> 
> Maybe move this assertion to the destructor of gfxSharedImageSurface?

Part 2 of the patch changes this completely anyway.

> @@ +47,5 @@
> >  void
> >  gfxReusableSurfaceWrapper::ReadUnlock()
> >  {
> > +  mSurface->ReadUnlock();
> > +  NS_ABORT_IF_FALSE(mSurface->GetReadCount() >= 0, "Should not be negative");
> 
> This assertion should probably be moved to mSurface->ReadUnlock()

Ditto.

> @@ +56,4 @@
> >  {
> >    NS_CheckThreadSafe(_mOwningThread.GetThread(), "Only the owner thread can call GetWritable");
> >  
> > +  if (mSurface->GetReadCount() == 0) {
> 
> Can we change the API to something like IsLockedForReading and then get rid
> of the GetReadCount method? I feel the read count could be encapsulated in
> the image much better.

Ditto here, though we need the read count to assert that it's only locked in one place - I guess we could still hide the count behind API, but we'd need both a IsLocked and IsLockedElsewhere, I think the count is ok given how small this class is (and especially as it maps to addref/release/mrefcnt nicely)

> @@ +62,5 @@
> >    }
> >  
> >    // Something else is reading the surface, copy it
> > +  nsRefPtr<gfxSharedImageSurface> copySurface =
> > +    gfxSharedImageSurface::CreateUnsafe(aAllocator, mSurface->GetSize(), mSurface->Format());
> 
> Where does this shmem get deallocated?

Part 2.

> @@ +74,5 @@
> >  
> > +const unsigned char*
> > +gfxReusableSurfaceWrapper::GetReadOnlyData() const
> > +{
> > +  NS_ABORT_IF_FALSE(mSurface->GetReadCount() > 0, "Should have read lock");
> 
> I'm not sure if this assert is very useful since it doesn't check that WE
> have the read lock. If this is the best we can do, then maybe move it to
> mSurface->Data()

I guess to see if we are the owner of the lock, we could check that mRefCnt == GetReadCount, but that isn't going to work if we had two wrappers in the same process referencing the same surface, and that isn't going to work for the cross-thread implementation, so I wonder how useful a thing that is. I'd vote to leave it as is for now, but I'd take any advice on the matter :)
Nick, do comment #49 and comment #50 clear things up enough that you'd be ok to see this committed?
Flags: needinfo?(ncameron)
(In reply to Chris Lord [:cwiiis] from comment #51)
> Nick, do comment #49 and comment #50 clear things up enough that you'd be ok
> to see this committed?

Yes. Thanks for the breakdown, that is useful.

The thing about not checking that the caller actually has the lock is pretty worrying, but I also don't see a solution, so I guess it will have to do.
Flags: needinfo?(ncameron)
Attachment #790315 - Flags: review?(ncameron) → review+
You need to log in before you can comment on or make changes to this bug.