Closed Bug 765734 Opened 12 years ago Closed 12 years ago

add gralloc-based layer buffers for gonk

Categories

(Core :: Graphics: Layers, defect)

x86
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla16
blocking-basecamp +

People

(Reporter: gal, Assigned: cjones)

References

Details

Attachments

(14 files, 10 obsolete files)

5.87 KB, patch
bent.mozilla
: review+
Details | Diff | Splinter Review
2.17 KB, patch
mwu
: review+
Details | Diff | Splinter Review
5.94 KB, patch
roc
: review+
Details | Diff | Splinter Review
14.83 KB, patch
roc
: review+
Details | Diff | Splinter Review
15.73 KB, patch
cjones
: review+
Details | Diff | Splinter Review
65.69 KB, patch
nrc
: review+
BenWa
: review+
Details | Diff | Splinter Review
36.62 KB, patch
roc
: review+
Details | Diff | Splinter Review
1.10 KB, patch
BenWa
: review+
Details | Diff | Splinter Review
27.50 KB, patch
cjones
: review+
Details | Diff | Splinter Review
3.55 KB, patch
Details | Diff | Splinter Review
1.02 KB, patch
Details | Diff | Splinter Review
2.39 KB, patch
Details | Diff | Splinter Review
512 bytes, patch
Details | Diff | Splinter Review
10.84 KB, patch
roc
: review+
Details | Diff | Splinter Review
As a stepping stone towards cross-process texture sharing, add gralloc-based layer buffers to gonk using GraphicBuffer. We render into the buffer by locking it and mapping it into CPU memory, and on the parent side lock it again and upload it by copying via CPU memory. This shouldn't be much worse than shmem performance wise. Once this part works we will add direct mapping of the texture instead of uploading it.
Attached patch v1 (obsolete) — Splinter Review
Assignee: nobody → gal
Attachment #634037 - Flags: review?(jones.chris.g)
Comment on attachment 634037 [details] [diff] [review]
v1

This is looking really good, but r- for the comments below.

>diff --git a/gfx/layers/ipc/PLayers.ipdl b/gfx/layers/ipc/PLayers.ipdl

>@@ -49,6 +50,7 @@ union SurfaceDescriptor {
>   Shmem;
>   SurfaceDescriptorD3D10;
>   SurfaceDescriptorX11;
>+  SurfaceDescriptorGralloc;

We need separate notions of a "gralloc buffer descriptor", which can
be used to share a GraphicBuffer from one process to another, and a
"gralloc handle" that we can use to quickly look up a GraphicBuffer.

>diff --git a/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp b/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp

>+static gfxASurface::gfxImageFormat
>+ImageFormatForPixel(android::PixelFormat aFormat)
>+{
>+    switch (aFormat) {

Nit: 2-space indent.

>+static android::PixelFormat
>+PixelFormatForContentType(gfxASurface::gfxContentType aContentType)
>+{
>+  if (aContentType == gfxASurface::CONTENT_COLOR)
>+    return PIXEL_FORMAT_RGBX_8888;
>+  return PIXEL_FORMAT_RGBA_8888;
>+}

Use gfxPlatform::OptimalFormatForContent().

>+/*static*/ already_AddRefed<gfxASurface>
>+ShadowLayerForwarder::PlatformOpenDescriptor(const SurfaceDescriptor& aSurface)
>+{

>+  status_t status = buffer->lock(GRALLOC_USAGE_SW_READ_OFTEN | GRALLOC_USAGE_SW_WRITE_OFTEN,
>+                                 &vaddr);

We need a version of this function that takes a flag parameter
describing the access required for the mapped buffer.  This
implementation is going to block the content process on the
compositor and vice versa.

>+static bool
>+PlatformDestroySharedSurface(SurfaceDescriptor* aSurface)
>+{
>+  if (SurfaceDescriptor::TSurfaceDescriptorGralloc != aSurface->type()) {
>+    return false;
>+  }
>+  GraphicBuffer *buffer = aSurface->get_SurfaceDescriptorGralloc().mGraphicBuffer.get();
>+  return buffer->unlock() == OK;

This is the "really seriously destroy the buffer" method.  I don't
believe this is actually going to destroy the gralloc surface.

I think what you're using this "unlock/unmap", which I think we'll
need a new interface for.

We can't land this as-is because it would regress the current pipeline
and leak gralloc memory, AFAICT.
Attachment #634037 - Flags: review?(jones.chris.g) → review-
Chris from the code it seems to me that destroy surface means unmap. Check it out.
There is a ref count on the gralloc buffer. When the child drops the SurfaceDescriptor we should dealloc it.
I can assure you that DestroySharedSurface() means just that.  I wrote that code.

Are you referring to a refcount on a particular GraphicBuffer, or on the underlying pmem region, or both?  If this will all work magically, then you need to document why.
Ok that's a great simplification if I can destroy the buffer in DestroySharedSurface. There is refcnt and the pmem level I believe.
I'm hitting the following assert immediately upon launch of the calc app.  Tip gecko with this patch:
I/Gecko   (  342): [Child 342] ###!!! ABORT: unexpected SurfaceDescriptor type!: file /local/mnt/workspace/mvines/ics.b2g/gecko/gfx/layers/ipc/ShadowLayers.cpp, line 483
BasicShadowableThebesLayer::CreateBuffer()
   - or be smarter and hand that decision off to PlatformAllocBuffer hand back gralloc buffer
   - PlatformAllocBuffer adds to table, sends BufferCreated() message to parent process
   - lock(READ|WRITE)
BasicShadowableThebesLayer::Paint()
   - normal code
   - after painting, unlock()
== transaction goes across ==
ThebesLayerOGL::Swap()
   - take sent gralloc buffer, make it front buffer.  don't lock()
   - hand back old buffer, if it exists.  on first paint it'll be null_t()
BasicShadowableThebesLayer::SetBackBufferAndSync()
   - if the "read-only front buffer" exists, lock(READ).  it will exist on the first paint
   - if the "new back buffer" exists, lock(READ|WRITE).  it won't exist on the first paint
   - if both, copy front-->back
== second paint ==
BasicShadowableThebesLayer::CreateBuffer()
  - allocates another buffer, lock(READ|WRITE)
BasicShadowableThebesLayer::Paint()
  - same code, unlock()
ThebesLayerOGL::Swap()
  - takes new front buffer, hands back old front buffer
BasicShadowableThebesLayer::SetBackBufferAndSync()
  - newBackBuffer->lock(READ|WRITE)
  - newReadOnlyFront->lock(READ)
  - copy
after that it's in steady state
An easier/better way to do the gralloc-buffer tracking might be something like

PLayers.ipdl:
  include protocol PGrallocBuffer;

  struct SurfaceDescriptorGralloc {
    PGrallocBuffer buffer;
  }

  union SurfaceDescriptor {
    // ...
    SurfaceDescriptorGralloc;
  };

  sync protocol PLayers {
    manages PGrallocBuffer;
    //...

  parent:
    async PGrallocBuffer(/* flattened gralloc thing with fds */);
  };

then you can reuse the IPDL actor tracking machinery to do the (fast) gralloc buffer serialize/lookup.

We use this pattern here 
 http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PPluginInstance.ipdl#205
 http://mxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PPluginSurface.ipdl
Assignee: gal → jones.chris.g
blocking-basecamp: --- → ?
Attached patch rollup patch (obsolete) — Splinter Review
This is a rollup of 5 patches.  The first 4 are yak-shaving, tested and working on desktop.  The fifth adds gralloc, and not tested (but compiles).  Likely to crash and/or deadlock.  Will fix/polish tomorrow.
Attachment #634037 - Attachment is obsolete: true
After fixing a dumb crash, I'm now seeing errors with sending invalid fds.  The code in attachment 634037 [details] [diff] [review] doesn't seem to be sharing correctly.
Attached patch rollup v2, some dumb stuff fixed (obsolete) — Splinter Review
With serializaton code in this patch (which should "right"), I get an EINVAL when opening the buffer in the child process and the buffer fails to be created (which leads to lock() failing).  I get the same exact error on sgs2 so looks like we're doing something wrong in gecko.  Will poke a bit more.
Attachment #635747 - Attachment is obsolete: true
OK, I think I figured this out.  We're using the vendor gralloc in the parent process, as expected, but in the child process we somehow end up choosing gralloc.default.so o_O.  (gralloc.default.so is not loaded in the parent process at all.)  I can sort of understand why trying to share gralloc handles across two different implementations would make for unhappy.

I'm betting this is a quirk of GL initialization.  Will see if I can cook up a workaround.
That's not loading gralloc.  Hmm.
Forcing the content process to create a FrameBufferNativeWindow doesn't work either.  Nor does chopping hardware/libhardware/modules/gralloc/Android.mk out of the build (can't find gralloc module in content process).
It looks like the property DB is inaccessible in content processes.
Yeeeeaaah pretty bad, but doesn't affect any existing code AFAIK.
Attachment #636028 - Attachment is obsolete: true
Attachment #636069 - Flags: review?(bent.mozilla)
Attached patch part 2: Remove unused code (obsolete) — Splinter Review
Attachment #636072 - Flags: review?(matt.woodrow)
Attachment #636073 - Flags: superreview?(roc)
Attachment #636073 - Flags: review?(ncameron)
Attachment #636073 - Flags: review?(matt.woodrow)
Attachment #636073 - Flags: review?(bgirard)
Attachment #636070 - Flags: review?(mwu) → review+
Comment on attachment 636073 [details] [diff] [review]
part 3: Add an RAII helper to open/close SurfaceDescriptors

Review of attachment 636073 [details] [diff] [review]:
-----------------------------------------------------------------

Can you explain more about what's going on here?

It seems bad that we have to deal with surface-or-descriptor explicitly in some places. Is there any way to avoid that?
Yeah, I meant to write that up.

The interface to gralloc looks like

  Buffer b = Create();
  while (live) {
    DrawTarget t = b.lock(/*read, write*/);
    // draw
    b.unlock();
    //send to compositor
  }
  Destroy(b);

lock() means "ensure this memory is mapped into this process's address space, and assert a read or write lock".  There's no guarantee that the memory returned by lock() is still valid after unlock().  It may be unmapped or protected.

This violates the current assumption that a gfxASurface opened from a SurfaceDescriptor is valid as long as the SurfaceDescriptor is valid.  The model is changing to one in which the gfxASurface is valid only between lock()/unlock().

I started to manually insert the analogue of unlock() everywhere it was needed, but that was a fool's errand, too complicated and fragile.  So instead, I made AutoOpenSurface to do the lock()/unlock() automatically.  The gfxASurface* returned by Get() is only valid in the scope of the AutoOpenSurface.  (That's why it returns a bare pointer.)

While sprinkling AutoOpenSurface around, I noticed there were a bunch of places that we OpenDescriptor()'d just to check the underlying surface's content type and/or size.  The descriptor often knows those without having to map the surface, so I added a special interface for that to AutoOpenSurface.  The fallback is to map the gfxASurface and ask it.
I should also note that these patches get us to the point where we use gralloc buffers as a kind of fancy shmem, just a transport of pixels up to texture upload.  Using gralloc for that is slightly more expensive than using "normal" shmem.  In the next bug, we'll do the GL magic to avoid the upload and bind the gralloc buffer directly to texture.  And win will be had by all.
Changes a comment and fixes a thinko in the previous version.
Attachment #636075 - Attachment is obsolete: true
Attachment #636075 - Flags: review?(gal)
Attachment #636117 - Flags: review?(gal)
Comment on attachment 636117 [details] [diff] [review]
Integrate gralloc buffers into the shadow-layers pipelines, v1.1

Gave a couple optional nits offline.
Attachment #636117 - Flags: review?(gal) → review+
Comment on attachment 636073 [details] [diff] [review]
part 3: Add an RAII helper to open/close SurfaceDescriptors

Review of attachment 636073 [details] [diff] [review]:
-----------------------------------------------------------------

It seems that there is a lot of code duplication around the idiom of returning either a surface or a surface descriptor, then checking which is non-null and perhaps getting a reference to the underlying surface. Could all of this be encapsulated by using a helper object? (Oh, that is pretty much what roc said, ho-hum).
Comment on attachment 636071 [details] [diff] [review]
part 1: Migrate ImageLayers to SurfaceDescriptor

Review of attachment 636071 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicLayers.cpp
@@ +2092,5 @@
>          gfxUtils::ClipToRegion(aTarget, aLayer->GetEffectiveVisibleRegion());
>        }
>        AutoSetOperator setOperator(aTarget, container->GetOperator());
> +      // XXX is this meant to be save/restore???
> +      //gfxMatrix temp = aTarget->CurrentMatrix();

nrc would probably know more about this. Doesn't appear to be doing anything currently though.

::: gfx/layers/opengl/ImageLayerOGL.cpp
@@ +673,5 @@
> +    nsRefPtr<gfxImageSurface> surfY = asurfY->GetAsImageSurface();
> +    nsRefPtr<gfxASurface> asurfU =
> +      ShadowLayerForwarder::OpenDescriptor(yuv.Udata());
> +    nsRefPtr<gfxImageSurface> surfU = asurfU->GetAsImageSurface();
> +    // XXX do we really need to open this?

I can't see why we would, remove it?

@@ +722,5 @@
>        const YUVImage& yuv = aNewFront.get_YUVImage();
>  
> +      nsRefPtr<gfxASurface> asurfY =
> +        ShadowLayerForwarder::OpenDescriptor(yuv.Ydata());
> +      nsRefPtr<gfxImageSurface> surfY = asurfY->GetAsImageSurface();

I assume that the SurfaceDescriptors being used will always return surfaces that implement GetAsImageSurface correctly. Might be worth adding an assertion, if there isn't already one somewhere else.
Attachment #636071 - Flags: review?(matt.woodrow) → review+
Attachment #636072 - Flags: review?(matt.woodrow) → review+
(In reply to Matt Woodrow (:mattwoodrow) from comment #30)
> Comment on attachment 636071 [details] [diff] [review]
> part 1: Migrate ImageLayers to SurfaceDescriptor
> 
> Review of attachment 636071 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/basic/BasicLayers.cpp
> @@ +2092,5 @@
> >          gfxUtils::ClipToRegion(aTarget, aLayer->GetEffectiveVisibleRegion());
> >        }
> >        AutoSetOperator setOperator(aTarget, container->GetOperator());
> > +      // XXX is this meant to be save/restore???
> > +      //gfxMatrix temp = aTarget->CurrentMatrix();
> 
> nrc would probably know more about this. Doesn't appear to be doing anything
> currently though.
> 

I'm 99% sure that was used for debugging and left in by mistake. Removing it is the right thing to do.
Attached patch rollup (obsolete) — Splinter Review
With this patch applied and omtc turned on:

 pref("layers.offmainthreadcomposition.enabled", true);
 pref("dom.ipc.tabs.disabled", true);

I got

 W/GraphicBufferMapper( 3237): lock(...) failed -22 (Invalid argument)
 I/Gecko   ( 3237): ###!!! ABORT: unexpected SurfaceDescriptor type!: file /home/kanru/zone2/mozilla/central/gfx/layers/ipc/ShadowLayers.cpp, line 441

on nexus-s
kanru can you please post a stack with gdb?
It hit the assertion at ShadowLayers.cpp:441

#0  mozilla::layers::ShadowLayerForwarder::OpenDescriptor (aMode=<value optimized out>, aSurface=...) at /home/kanru/zone2/mozilla/central/gfx/layers/ipc/ShadowLayers.cpp:435
#1  0x40c97ab4 in mozilla::layers::AutoOpenSurface::Get (this=0xbeb1a328) at /home/kanru/zone2/mozilla/central/gfx/layers/ipc/ShadowLayers.cpp:663
#2  0x40c85926 in mozilla::layers::BasicShadowableThebesLayer::SyncFrontBufferToBackBuffer (this=0x456d88) at /home/kanru/zone2/mozilla/central/gfx/layers/basic/BasicLayers.cpp:2531
#3  0x40c87af2 in mozilla::layers::BasicThebesLayer::PaintThebes (this=0x456d88, aContext=0xf23620, aMaskLayer=<value optimized out>, aCallback=0, aCallbackData=0x0, aReadback=0xbeb1adfc) at /home/kanru/zone2/mozilla/central/gfx/layers/basic/BasicLayers.cpp:671
#4  0x40c87f7e in mozilla::layers::BasicShadowableThebesLayer::PaintThebes (this=0x456d88, aContext=0xf23620, aMaskLayer=0x0, aCallback=0, aCallbackData=0x0, aReadback=0xbeb1adfc) at /home/kanru/zone2/mozilla/central/gfx/layers/basic/BasicLayers.cpp:2481
#5  0x40c872a4 in mozilla::layers::BasicLayerManager::PaintLayer (this=0xcd4bd8, aTarget=0xf23620, aLayer=0x456d88, aCallback=<value optimized out>, aCallbackData=0x0, aReadback=0xbeb1adfc) at /home/kanru/zone2/mozilla/central/gfx/layers/basic/BasicLayers.cpp:2065
#6  0x40c8731c in mozilla::layers::BasicLayerManager::PaintLayer (this=0xcd4bd8, aTarget=0xf23620, aLayer=0x4569b8, aCallback=<value optimized out>, aCallbackData=0x0, aReadback=0xbeb1b244) at /home/kanru/zone2/mozilla/central/gfx/layers/basic/BasicLayers.cpp:2080
#7  0x40c8731c in mozilla::layers::BasicLayerManager::PaintLayer (this=0xcd4bd8, aTarget=0xf23620, aLayer=0xbba490, aCallback=<value optimized out>, aCallbackData=0x0, aReadback=0x0) at /home/kanru/zone2/mozilla/central/gfx/layers/basic/BasicLayers.cpp:2080
#8  0x40c87a36 in mozilla::layers::BasicLayerManager::EndTransactionInternal (this=0xcd4bd8, aCallback=0, aCallbackData=0x0, aFlags=<value optimized out>) at /home/kanru/zone2/mozilla/central/gfx/layers/basic/BasicLayers.cpp:1729
#9  0x40c87a90 in mozilla::layers::BasicLayerManager::EndEmptyTransaction (this=0x1) at /home/kanru/zone2/mozilla/central/gfx/layers/basic/BasicLayers.cpp:1769
#10 0x40c883ac in mozilla::layers::BasicShadowLayerManager::EndEmptyTransaction (this=0x1) at /home/kanru/zone2/mozilla/central/gfx/layers/basic/BasicLayers.cpp:3670
#11 0x4060648a in PresShell::Paint (this=0xa328d0, aViewToPaint=0xa1e500, aWidgetToPaint=0xa34b10, aDirtyRegion=..., aIntDirtyRegion=..., aWillSendDidPaint=<value optimized out>) at /home/kanru/zone2/mozilla/central/layout/base/nsPresShell.cpp:5253
#12 0x408034ca in nsViewManager::Refresh (this=0xa34ae0, aView=<value optimized out>, aWidget=0xa34b10, aRegion=<value optimized out>, aWillSendDidPaint=false) at /home/kanru/zone2/mozilla/central/view/src/nsViewManager.cpp:341
#13 0x40803aaa in nsViewManager::DispatchEvent (this=0xa34ae0, aEvent=0xbeb1b678, aView=0xa1e500, aStatus=<value optimized out>) at /home/kanru/zone2/mozilla/central/view/src/nsViewManager.cpp:770
#14 0x40801e7c in HandleEvent (aEvent=0xbeb1b678) at /home/kanru/zone2/mozilla/central/view/src/nsView.cpp:127
#15 0x40b0527a in nsWindow::DoDraw () at /home/kanru/zone2/mozilla/central/widget/gonk/nsWindow.cpp:220
#16 0x40b0350c in nsAppShell::ProcessNextNativeEvent (this=0x3a7190, mayWait=<value optimized out>) at /home/kanru/zone2/mozilla/central/widget/gonk/nsAppShell.cpp:583
#17 0x40b1f6ae in nsBaseAppShell::DoProcessNextNativeEvent (this=0xa34ae0, mayWait=true, recursionDepth=4550416) at /home/kanru/zone2/mozilla/central/widget/xpwidgets/nsBaseAppShell.cpp:139
#18 0x40b1f774 in nsBaseAppShell::OnProcessNextEvent (this=0x3a7190, thr=0x1e01a0, mayWait=false, recursionDepth=0) at /home/kanru/zone2/mozilla/central/widget/xpwidgets/nsBaseAppShell.cpp:286
#19 0x40c27d84 in nsThread::ProcessNextEvent (this=0x1e01a0, mayWait=false, result=0xbeb1b89f) at /home/kanru/zone2/mozilla/central/xpcom/threads/nsThread.cpp:586
#20 0x40c08dea in NS_ProcessNextEvent_P (thread=0x3a7190, mayWait=false) at /home/kanru/zone2/mozilla/B2G/objdir-gecko/xpcom/build/nsThreadUtils.cpp:217
#21 0x40b8cd1c in mozilla::ipc::MessagePump::Run (this=0x1ce098, aDelegate=0x1ce288) at /home/kanru/zone2/mozilla/central/ipc/glue/MessagePump.cpp:82
#22 0x40c474ac in MessageLoop::RunInternal (this=0x4105dc38) at /home/kanru/zone2/mozilla/central/ipc/chromium/src/base/message_loop.cc:208
#23 0x40c47562 in MessageLoop::RunHandler (this=0x1ce288) at /home/kanru/zone2/mozilla/central/ipc/chromium/src/base/message_loop.cc:201
#24 MessageLoop::Run (this=0x1ce288) at /home/kanru/zone2/mozilla/central/ipc/chromium/src/base/message_loop.cc:175
#25 0x40b1f274 in nsBaseAppShell::Run (this=0x3a7190) at /home/kanru/zone2/mozilla/central/widget/xpwidgets/nsBaseAppShell.cpp:163
#26 0x40a5f264 in nsAppStartup::Run (this=0x3a7138) at /home/kanru/zone2/mozilla/central/toolkit/components/startup/nsAppStartup.cpp:256
#27 0x404f3b9a in XREMain::XRE_mainRun (this=0xbeb1ba5c) at /home/kanru/zone2/mozilla/central/toolkit/xre/nsAppRunner.cpp:3786
#28 0x404f6310 in XREMain::XRE_main (this=0xbeb1ba5c, argc=<value optimized out>, argv=0xbeb1dc44, aAppData=<value optimized out>) at /home/kanru/zone2/mozilla/central/toolkit/xre/nsAppRunner.cpp:3863
#29 0x404f6468 in XRE_main (argc=1, argv=0xbeb1dc44, aAppData=0xa970, aFlags=<value optimized out>) at /home/kanru/zone2/mozilla/central/toolkit/xre/nsAppRunner.cpp:3939
#30 0x0000896a in do_main (argc=1, argv=0xbeb1dc44) at /home/kanru/zone2/mozilla/central/b2g/app/nsBrowserApp.cpp:153
#31 main (argc=1, argv=0xbeb1dc44) at /home/kanru/zone2/mozilla/central/b2g/app/nsBrowserApp.cpp:236
Attachment #636069 - Flags: review?(bent.mozilla) → review+
(In reply to Kan-Ru Chen [:kanru] from comment #35)
> It hit the assertion at ShadowLayers.cpp:441

The backtrace is from nexus-s.

No problem on sgs2.
Attachment #636074 - Flags: review?(bgirard) → review+
Comment on attachment 636073 [details] [diff] [review]
part 3: Add an RAII helper to open/close SurfaceDescriptors

Review of attachment 636073 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicImplData.h
@@ +102,5 @@
>    /**
>     * Return a surface for this layer. Will use an existing surface, if
> +   * possible, or may create a temporary surface.  Implement this
> +   * method for any layers that might be used as a mask.  Should only
> +   * return false if a surface cannor be created.  If true is

cannot*

::: gfx/layers/basic/BasicLayers.cpp
@@ +97,2 @@
>         bool maskIs2D =
>           aMaskLayer->GetEffectiveTransform().CanDraw2D(aMaskTransform);

Unrelated to your patch here but we're computing maskIs2D uselessly for production builds.
Attachment #636073 - Flags: review?(bgirard) → review+
> ::: gfx/layers/basic/BasicLayers.cpp
> @@ +97,2 @@
> >         bool maskIs2D =
> >           aMaskLayer->GetEffectiveTransform().CanDraw2D(aMaskTransform);
> 
> Unrelated to your patch here but we're computing maskIs2D uselessly for
> production builds.

Not true, although it probably needs a comment because I keep thinking the same, aMaskTransform is an out parameter of both CanDraw2D and this method, so the call is necessary in production builds to calculate aMaskTransform. maskIs2D will be returned in any case, so the only unnecessary work is assigning it into a local variable, which the compiler should optimise away, and is no big thing if it doesn't.
(In reply to Matt Woodrow (:mattwoodrow) from comment #30)
> Comment on attachment 636071 [details] [diff] [review]
> part 1: Migrate ImageLayers to SurfaceDescriptor
> 
> @@ +722,5 @@
> >        const YUVImage& yuv = aNewFront.get_YUVImage();
> >  
> > +      nsRefPtr<gfxASurface> asurfY =
> > +        ShadowLayerForwarder::OpenDescriptor(yuv.Ydata());
> > +      nsRefPtr<gfxImageSurface> surfY = asurfY->GetAsImageSurface();
> 
> I assume that the SurfaceDescriptors being used will always return surfaces
> that implement GetAsImageSurface correctly. Might be worth adding an
> assertion, if there isn't already one somewhere else.

Going back over this again, I don't like how this works.  I'm instead going to add a "capabilities" flag to CreateBuffer(), with a MapAsImageSurface flag used by ImageLayer.
(In reply to Matt Woodrow (:mattwoodrow) from comment #30)
> Comment on attachment 636071 [details] [diff] [review]
> part 1: Migrate ImageLayers to SurfaceDescriptor
> 
> Review of attachment 636071 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/basic/BasicLayers.cpp
> @@ +2092,5 @@
> >          gfxUtils::ClipToRegion(aTarget, aLayer->GetEffectiveVisibleRegion());
> >        }
> >        AutoSetOperator setOperator(aTarget, container->GetOperator());
> > +      // XXX is this meant to be save/restore???
> > +      //gfxMatrix temp = aTarget->CurrentMatrix();
> 
> nrc would probably know more about this. Doesn't appear to be doing anything
> currently though.
> 

Removed.

> ::: gfx/layers/opengl/ImageLayerOGL.cpp
> @@ +673,5 @@
> > +    nsRefPtr<gfxImageSurface> surfY = asurfY->GetAsImageSurface();
> > +    nsRefPtr<gfxASurface> asurfU =
> > +      ShadowLayerForwarder::OpenDescriptor(yuv.Udata());
> > +    nsRefPtr<gfxImageSurface> surfU = asurfU->GetAsImageSurface();
> > +    // XXX do we really need to open this?
> 
> I can't see why we would, remove it?
> 

Removed.

> @@ +722,5 @@
> >        const YUVImage& yuv = aNewFront.get_YUVImage();
> >  
> > +      nsRefPtr<gfxASurface> asurfY =
> > +        ShadowLayerForwarder::OpenDescriptor(yuv.Ydata());
> > +      nsRefPtr<gfxImageSurface> surfY = asurfY->GetAsImageSurface();
> 
> I assume that the SurfaceDescriptors being used will always return surfaces
> that implement GetAsImageSurface correctly. Might be worth adding an
> assertion, if there isn't already one somewhere else.

Implemented plan in comment 39.
(In reply to Benoit Girard (:BenWa) from comment #37)
> Comment on attachment 636073 [details] [diff] [review]
> part 3: Add an RAII helper to open/close SurfaceDescriptors
> 
> Review of attachment 636073 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/basic/BasicImplData.h
> @@ +102,5 @@
> >    /**
> >     * Return a surface for this layer. Will use an existing surface, if
> > +   * possible, or may create a temporary surface.  Implement this
> > +   * method for any layers that might be used as a mask.  Should only
> > +   * return false if a surface cannor be created.  If true is
> 
> cannot*
> 

Fixed.
(In reply to Nick Cameron [:nrc] from comment #29)
> Comment on attachment 636073 [details] [diff] [review]
> part 3: Add an RAII helper to open/close SurfaceDescriptors
> 
> Review of attachment 636073 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It seems that there is a lot of code duplication around the idiom of
> returning either a surface or a surface descriptor, then checking which is
> non-null and perhaps getting a reference to the underlying surface. Could
> all of this be encapsulated by using a helper object? (Oh, that is pretty
> much what roc said, ho-hum).

I added an AutoMaskData helper to manage this.
Attachment #636071 - Attachment is obsolete: true
Attachment #636072 - Attachment is obsolete: true
Attachment #636073 - Attachment is obsolete: true
Attachment #636074 - Attachment is obsolete: true
Attachment #636117 - Attachment is obsolete: true
Attachment #636282 - Attachment is obsolete: true
Attachment #636073 - Flags: superreview?(roc)
Attachment #636073 - Flags: review?(ncameron)
Attachment #636073 - Flags: review?(matt.woodrow)
Attachment #636074 - Flags: review?(matt.woodrow)
Attachment #640448 - Flags: review?(roc)
Carrying over r=mattwoodrow, but this changed enough that it's probably worth a second look.
Attachment #640449 - Flags: review?(roc)
Carrying over r=mattwoodrow.
Attachment #640450 - Flags: review+
Carrying over r=BenWa.  Pinch-review to roc.
Attachment #640452 - Flags: review?(roc)
Comment on attachment 640448 [details] [diff] [review]
part 1: Let clients specify capabilities required of cross-process surfaces.  Only MapAsImageSurface needed for now.

Review of attachment 640448 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with that.

::: gfx/layers/ipc/ShadowLayers.h
@@ +43,5 @@
> +  /** 
> +   * The allocated buffer must be efficiently mappable as a
> +   * gfxImageSurface.
> +   */
> +  MapAsImageSurface = 1 << 0,

more common style is to use DEFAULT_BUFFER_CAPS, MAP_AS_IMAGE_SURFACE for constants like these.
Attachment #640448 - Flags: review?(roc) → review+
Comment on attachment 640449 [details] [diff] [review]
part 2: Migrate ImageLayers to SurfaceDescriptor, v2

Review of attachment 640449 [details] [diff] [review]:
-----------------------------------------------------------------

Actually, you could simplify a bunch of code here by adding ShadowLayerForwarder::OpenDescriptorAsImageSurface, but I'll let it slide because you rewrite all this in the later patch anyway...
Comment on attachment 640451 [details] [diff] [review]
part 4: Add an RAII helper to open/close SurfaceDescriptors, v2

Review of attachment 640451 [details] [diff] [review]:
-----------------------------------------------------------------

This is looking better than it did before :-).

::: gfx/layers/ipc/ShadowLayers.h
@@ +340,5 @@
> +                                   gfxIntSize* aSize,
> +                                   gfxASurface** aSurface);
> +
> +  static already_AddRefed<gfxASurface>
> +  OpenDescriptor(const SurfaceDescriptor& aSurface);

These should really all be methods of SurfaceDescriptor. Maybe we can find a way to do that?

@@ +384,5 @@
> +  /** This can't escape the scope of AutoOpenSurface. */
> +  gfxASurface* Get();
> +
> +  /** This can't escape the scope of AutoOpenSurface. */
> +  gfxImageSurface* GetAsImage();

Does this always succeed? If not, document what happens when it doesn't, and under what conditions it will succeed.
Comment on attachment 640452 [details] [diff] [review]
part 5: Mark usage of SurfaceDescriptor as ReadOnly or ReadWrite, v2

Review of attachment 640452 [details] [diff] [review]:
-----------------------------------------------------------------

Would it hurt much to fold this into 4? Probably it would help since I think you're basically just changing the same lines of code all over again.

::: gfx/layers/ipc/ShadowLayers.h
@@ +48,5 @@
>  };
>  
> +enum OpenMode {
> +  OpenReadOnly,
> +  OpenReadWrite

OPEN_READ_ONLY, OPEN_READ_WRITE
Comment on attachment 640451 [details] [diff] [review]
part 4: Add an RAII helper to open/close SurfaceDescriptors, v2

Review of attachment 640451 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the small change below

::: gfx/layers/basic/BasicImageLayer.cpp
@@ +161,3 @@
>  {
>    if (!mContainer) {
>      return nsnull;

return false;
Attachment #640451 - Flags: review?(ncameron) → review+
Comment on attachment 640451 [details] [diff] [review]
part 4: Add an RAII helper to open/close SurfaceDescriptors, v2

Review of attachment 640451 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicLayersImpl.h
@@ +61,5 @@
>      return static_cast<BasicLayerManager*>(mManager);
>    }
>  };
>  
> +class NS_STACK_CLASS AutoMaskData {

Needs a comment explaining what this class is for.

@@ +70,5 @@
> +  void Construct(const gfxMatrix& aTransform,
> +                 gfxASurface* aSurface);
> +
> +  void Construct(const gfxMatrix& aTransform,
> +                 const SurfaceDescriptor& aSurface);

... and what these methods do.

::: gfx/layers/basic/BasicThebesLayer.cpp
@@ +229,5 @@
>  }
>  
> +/**
> + * AutoOpenBuffer tracks buffer "mappings" (making visible to the CPU)
> + * of texturable memory.  For most layer types, that's pretty easy.

Is this a bit specific? It's not a given that we must have CPU-addressable buffers to render into a ThebesLayer. In fact with D3D10/D2D we don't have that. I think this comment needs to be generalized a bit.

@@ +461,5 @@
>  
>    mIsNewBuffer = true;
>  
> +  gfxASurface* buffer = mBufferTracker->CreatedBuffer(mBackBuffer);
> +  return nsRefPtr<gfxASurface>(buffer).forget();

Slightly less magical to declare buffer as nsRefPtr and then return buffer.forget()

::: gfx/layers/ipc/ShadowLayers.cpp
@@ +615,5 @@
> +AutoOpenSurface::~AutoOpenSurface()
> +{
> +  if (mSurface) {
> +    mSurface = nsnull;
> +    //ShadowLayerForwarder::CloseDescriptor(mDescriptor);

Setting mSurface to null here seems redundant since its destructor is about to run?
Attachment #640451 - Flags: superreview?(roc) → superreview+
Attachment #640454 - Flags: review?(bgirard) → review+
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #53)
> Comment on attachment 640452 [details] [diff] [review]
> part 5: Mark usage of SurfaceDescriptor as ReadOnly or ReadWrite, v2
> 
> Review of attachment 640452 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Would it hurt much to fold this into 4? Probably it would help since I think
> you're basically just changing the same lines of code all over again.
> 

No, I only split like this to make it easier for folks to concentrate on remembering whether +r or +rw was needed.
Comment on attachment 640451 [details] [diff] [review]
part 4: Add an RAII helper to open/close SurfaceDescriptors, v2

Review of attachment 640451 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/basic/BasicBuffers.h
@@ +57,5 @@
>      gfxASurface* aBuffer,
>      gfxASurface* aSource, const nsIntRect& aRect, const nsIntPoint& aRotation,
>      const nsIntRegion& aUpdateRegion);
>  
> +  void MapBuffer(gfxASurface* aBuffer)

I don't fell like these add much value over making SetBuffer public other then a more descriptive name. It will get inline so I don't object.

@@ +96,5 @@
>    {
>      MOZ_COUNT_DTOR(ShadowThebesLayerBuffer);
>    }
>  
> +  void Swap(const nsIntRect& aNewRect, const nsIntPoint& aNewRotation,

As discussed on IRC this is very confusing since we're not swapping surfaces as you'd normally expect. Here's my attempt:

// Swap must happen when there's no buffer mapped. This means we wont be swapping
// between a physical surface. However next time we map a buffer it will be a different
// buffer with a new rect and rotation.
Attachment #640451 - Flags: review?(bgirard) → review+
(In reply to Chris Jones [:cjones] [:warhammer] from comment #56)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #53)
> > Comment on attachment 640452 [details] [diff] [review]
> > part 5: Mark usage of SurfaceDescriptor as ReadOnly or ReadWrite, v2
> > 
> > Review of attachment 640452 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Would it hurt much to fold this into 4? Probably it would help since I think
> > you're basically just changing the same lines of code all over again.
> > 
> 
> No, I only split like this to make it easier for folks to concentrate on
> remembering whether +r or +rw was needed.

To be clear, were you waiting on the fold to review this?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #50)
> Comment on attachment 640448 [details] [diff] [review]
> part 1: Let clients specify capabilities required of cross-process surfaces.
> Only MapAsImageSurface needed for now.
> 
> Review of attachment 640448 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r+ with that.
> 
> ::: gfx/layers/ipc/ShadowLayers.h
> @@ +43,5 @@
> > +  /** 
> > +   * The allocated buffer must be efficiently mappable as a
> > +   * gfxImageSurface.
> > +   */
> > +  MapAsImageSurface = 1 << 0,
> 
> more common style is to use DEFAULT_BUFFER_CAPS, MAP_AS_IMAGE_SURFACE for
> constants like these.

*shrug*.  Changed.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #52)
> Comment on attachment 640451 [details] [diff] [review]
> part 4: Add an RAII helper to open/close SurfaceDescriptors, v2
> 
> ::: gfx/layers/ipc/ShadowLayers.h
> @@ +340,5 @@
> > +                                   gfxIntSize* aSize,
> > +                                   gfxASurface** aSurface);
> > +
> > +  static already_AddRefed<gfxASurface>
> > +  OpenDescriptor(const SurfaceDescriptor& aSurface);
> 
> These should really all be methods of SurfaceDescriptor. Maybe we can find a
> way to do that?
> 

Per IRC chat, we could do that with a subclass or container class, but the interface here includes alloc/delloc which requires an IPC context.  That's hard to easily/safely stuff in along with these guys.

> @@ +384,5 @@
> > +  /** This can't escape the scope of AutoOpenSurface. */
> > +  gfxASurface* Get();
> > +
> > +  /** This can't escape the scope of AutoOpenSurface. */
> > +  gfxImageSurface* GetAsImage();
> 
> Does this always succeed? If not, document what happens when it doesn't, and
> under what conditions it will succeed.

It succeeds when gfxASurface::GetAsImageSurface() does.  Noted that, and also that clients should use MAP_AS_IMAGE_SURFACE when this helper is important.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #53)
> Comment on attachment 640452 [details] [diff] [review]
> part 5: Mark usage of SurfaceDescriptor as ReadOnly or ReadWrite, v2
> 
> Review of attachment 640452 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Would it hurt much to fold this into 4? Probably it would help since I think
> you're basically just changing the same lines of code all over again.
> 
> ::: gfx/layers/ipc/ShadowLayers.h
> @@ +48,5 @@
> >  };
> >  
> > +enum OpenMode {
> > +  OpenReadOnly,
> > +  OpenReadWrite
> 
> OPEN_READ_ONLY, OPEN_READ_WRITE

Changed.
(In reply to Nick Cameron [:nrc] from comment #54)
> Comment on attachment 640451 [details] [diff] [review]
> part 4: Add an RAII helper to open/close SurfaceDescriptors, v2
> 
> Review of attachment 640451 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r+ with the small change below
> 
> ::: gfx/layers/basic/BasicImageLayer.cpp
> @@ +161,3 @@
> >  {
> >    if (!mContainer) {
> >      return nsnull;
> 
> return false;

Fixed.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #55)
> Comment on attachment 640451 [details] [diff] [review]
> part 4: Add an RAII helper to open/close SurfaceDescriptors, v2
> 
> Review of attachment 640451 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/basic/BasicLayersImpl.h
> @@ +61,5 @@
> >      return static_cast<BasicLayerManager*>(mManager);
> >    }
> >  };
> >  
> > +class NS_STACK_CLASS AutoMaskData {
> 
> Needs a comment explaining what this class is for.
> 

/**
 * Drawing with a mask requires a mask surface and a transform.
 * Sometimes the mask surface is a direct gfxASurface, but other times
 * it's a SurfaceDescriptor.  For SurfaceDescriptor, we need to use a
 * scoped AutoOpenSurface to get a gfxASurface for the
 * SurfaceDescriptor.
 *
 * This helper class manages the gfxASurface-or-SurfaceDescriptor
 * logic.
 */

> @@ +70,5 @@
> > +  void Construct(const gfxMatrix& aTransform,
> > +                 gfxASurface* aSurface);
> > +
> > +  void Construct(const gfxMatrix& aTransform,
> > +                 const SurfaceDescriptor& aSurface);
> 
> ... and what these methods do.
> 
  /**
   * Construct this out of either a gfxASurface or a
   * SurfaceDescriptor.  Construct() must only be called once.
   * GetSurface() and GetTransform() must not be called until this has
   * been constructed.
   */

> ::: gfx/layers/basic/BasicThebesLayer.cpp
> @@ +229,5 @@
> >  }
> >  
> > +/**
> > + * AutoOpenBuffer tracks buffer "mappings" (making visible to the CPU)
> > + * of texturable memory.  For most layer types, that's pretty easy.
> 
> Is this a bit specific? It's not a given that we must have CPU-addressable
> buffers to render into a ThebesLayer. In fact with D3D10/D2D we don't have
> that. I think this comment needs to be generalized a bit.
> 

 * AutoOpenBuffer is a helper that builds on top of AutoOpenSurface,
 * which we need to get a gfxASurface from a SurfaceDescriptor.  For
 * other layer types, simple lexical scoping of AutoOpenSurface is
 * easy.  For ThebesLayers, the lifetime of buffer mappings doesn't
 * exactly match simple lexical scopes, so naively putting
 * AutoOpenSurfaces on the stack doesn't always work.  We use this
 * helper to track openings instead.
 *
 * Any surface that's opened while painting this ThebesLayer will
 * notify this helper and register itself for unmapping.

> @@ +461,5 @@
> >  
> >    mIsNewBuffer = true;
> >  
> > +  gfxASurface* buffer = mBufferTracker->CreatedBuffer(mBackBuffer);
> > +  return nsRefPtr<gfxASurface>(buffer).forget();
> 
> Slightly less magical to declare buffer as nsRefPtr and then return
> buffer.forget()
> 

Changed.

> ::: gfx/layers/ipc/ShadowLayers.cpp
> @@ +615,5 @@
> > +AutoOpenSurface::~AutoOpenSurface()
> > +{
> > +  if (mSurface) {
> > +    mSurface = nsnull;
> > +    //ShadowLayerForwarder::CloseDescriptor(mDescriptor);
> 
> Setting mSurface to null here seems redundant since its destructor is about
> to run?

This is to order the destruction of the gfxASurface before CloseDescriptor(), which might unmap the memory pointed at gfxASurface.  Not that I don't trust thebes/cairo, I just don't see any reason to be gung ho about that ...
(In reply to Benoit Girard (:BenWa) from comment #57)
> Comment on attachment 640451 [details] [diff] [review]
> part 4: Add an RAII helper to open/close SurfaceDescriptors, v2
> 
> Review of attachment 640451 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/basic/BasicBuffers.h
> @@ +57,5 @@
> >      gfxASurface* aBuffer,
> >      gfxASurface* aSource, const nsIntRect& aRect, const nsIntPoint& aRotation,
> >      const nsIntRegion& aUpdateRegion);
> >  
> > +  void MapBuffer(gfxASurface* aBuffer)
> 
> I don't fell like these add much value over making SetBuffer public other
> then a more descriptive name. It will get inline so I don't object.
> 

   * When BasicThebesLayerBuffer is used with layers that hold
   * SurfaceDescriptor, this buffer only has a valid gfxASurface in
   * the scope of an AutoOpenSurface for that SurfaceDescriptor.  That
   * is, it's sort of a "virtual buffer" that's only mapped an
   * unmapped within the scope of AutoOpenSurface.  None of the
   * underlying buffer attributes (rect, rotation) are affected by
   * mapping/unmapping.
   *
   * These helpers just exist to provide more descriptive names of the
   * map/unmap process.

> @@ +96,5 @@
> >    {
> >      MOZ_COUNT_DTOR(ShadowThebesLayerBuffer);
> >    }
> >  
> > +  void Swap(const nsIntRect& aNewRect, const nsIntPoint& aNewRotation,
> 
> As discussed on IRC this is very confusing since we're not swapping surfaces
> as you'd normally expect. Here's my attempt:
> 
> // Swap must happen when there's no buffer mapped. This means we wont be
> swapping
> // between a physical surface. However next time we map a buffer it will be
> a different
> // buffer with a new rect and rotation.

   * Swap in the old "virtual buffer" (see above) attributes in aNew*
   * and return the old ones in aOld*.
   *
   * Swap() must only be called when the buffer is in its "unmapped"
   * state, that is the underlying gfxASurface is not available.  It
   * is expected that the owner of this buffer holds an unmapped
   * SurfaceDescriptor as the backing storage for this buffer.  That's
   * why no gfxASurface or SurfaceDescriptor parameters appear here.
Trivial, to be folded.  Not requesting rereview.
The interesting hunk here is the assertion removal, which got irc-r from roc and nrc.  The other two hunks are trivial, no rereview.
This is a bad one.  The original patch had AutoOpenSurface store

  const SurfaceDescriptor& mDescriptor

as a member variable.  I don't remember whether this was a type or I did it intentionally for some reason.  Anyway, it's bad and led to some really subtle and hard to debug crashes.  So the fix is to store

  SurfaceDescriptor mDescriptor

The problem is, now we need the definition of SurfaceDescriptor, can't forward declare it.  That requires some ipdl-generated headers, which pull in chromium stuff that includes their NSPR fork and <windows.h>.  ShadowLayers.h gets transitively included more than I'd like so fixing up the base/basictypes.h and windows.h problems turned into a tar baby.

Instead, this patch splits AutoOpenSurface into its own header.  None of the rest of the impl changed beyond what's described above.
Attachment #641395 - Flags: review?(roc)
Got all the way up to |hg push| and then ran into win32 PGO tree bustage, everyone's favorite.  Hopefully can land when I wake up.
Blocks: 775436
Blocks: 775649
No longer blocks: 775649
blocking-basecamp: ? → +
Blocks: 820316
(In reply to Andreas Gal :gal from comment #28)
> Comment on attachment 636117 [details] [diff] [review]
> Integrate gralloc buffers into the shadow-layers pipelines, v1.1
> 
> Gave a couple optional nits offline.

> https://hg.mozilla.org/mozilla-central/rev/089b9510e595

For the record, this is the single worst changeset that I've seen in 3 years at Mozilla, in terms of how many bugs it has caused, how difficult they have been to debug, how much time they have drawn away from more interesting projects.

The basic mistake here is to wrap an existing ref-counted class (android::GraphicBuffer) in a non-refcounted class (GrallocBufferActor).

Just over the past week, here are the fascinating things that we've been working on: bug 900012, bug 912725.

We need to prioritize bug 879681 to end this.
Blocks: 879681
No longer blocks: 879681
Depends on: 879681
(In reply to Benoit Jacob [:bjacob] from comment #73)
> (In reply to Andreas Gal :gal from comment #28)
> > Comment on attachment 636117 [details] [diff] [review]
> > Integrate gralloc buffers into the shadow-layers pipelines, v1.1
> > 
> > Gave a couple optional nits offline.
> 
> > https://hg.mozilla.org/mozilla-central/rev/089b9510e595
> 
> For the record, this is the single worst changeset that I've seen in 3 years
> at Mozilla, in terms of how many bugs it has caused, how difficult they have
> been to debug, how much time they have drawn away from more interesting
> projects.
> 
> The basic mistake here is to wrap an existing ref-counted class
> (android::GraphicBuffer) in a non-refcounted class (GrallocBufferActor).
> 
> Just over the past week, here are the fascinating things that we've been
> working on: bug 900012, bug 912725.
> 
> We need to prioritize bug 879681 to end this.

I definitely agree that GrallocBufferActor's memory model is causing a lot of trouble. I tried to remove it but it is quite hard because it is used everywhere in Gecko, and because we need a solution for the problem of not serializing/deserializing the gralloc's file descriptor several time (GrallocBufferActor provides at least that). I have plans for the serialization thing, but just so you know there are a few dependencies for this bug and removing GrallocBufferActor is not a short term thing.

In the shorter term, what I suggest is to move to the new textures, because they make the gralloc actor wrapped and fully owned by the texture host on the compositor side, and the new TextureHost itself has proper memory management. Once all the uses of GrallocBufferActor have been wrapped into TextureClient and TextureHost, GrallocBufferActor will be isolated enough that it will cause less problems and be easier to remove or replace by a solution that is not gralloc specific.
This isn't meant for anyone in particular, just myself.

Some comments read of late-night frustration, which I totally get.  Made more than my share.  So my first reaction is to ignore them.  But I was reminded of a quote of Richard Dawkins's, to paraphrase, there's a danger in folks assuming that the makers of strident comments took the elementary precaution of being right.  I hate that this is going to sound like sour grapes, but I don't know any other way of saying it.

It makes me sad that things got to this point.  It makes me sad that during the rewrite, I can count on one hand the number of questions I was asked about the old code, let alone review requests, even while I was a MoCo employee.  It makes me especially sad that here three years, a rewrite, and who knows how much sweat and tears later, the model of the original code is *still* misunderstood by folks hacking on it, as some comments suggest.

That makes it hard for me to gin up sympathy, but no matter.  It's best for everyone to close things off de jure, instead of de facto (not that it makes a difference anyway ;) ).  I won't be responding to new comments.

Enjoy and good luck.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #75)
> It makes me sad that during
> the rewrite, I can count on one hand the number of questions I was asked
> about the old code, let alone review requests, even while I was a MoCo
> employee.

There is nothing surprising about us not asking many questions about that before you left, given the timing:

 - You left in March;
 - That's when we were just starting to work on gralloc-in-new-layers (bug 833128);
 - It's only a month later that we seriously got to fixing the kind of crashes that these issues caused (bug 862324);
 - It's only another month later, during the Taiwan work week, that the notion that PGrallocBuffer was inherently a mistake, became understood by everyone (leading to bug 879681).

> It makes me especially sad that here three years, a rewrite, and
> who knows how much sweat and tears later, the model of the original code is
> *still* misunderstood by folks hacking on it, as some comments suggest.

Please explain more specifically: what do you think we don't understand?
By the way, in case anyone is confused into thinking that PGrallocBuffer wasn't a cause of problems before the layers-refactoring, a quick google search for PGrallocBufferParent Send__delete__ quickly brings up occurrences of old-layers code running into the same kind of issues, e.g. bug 776940, and I distinctly remember that as I was working on the original support for gralloc in new-layers in March, this kind of PGrallocBufferParent Send__delete__ crashes were common in the old-layers, pre-layers-refactoring B2G master branch. (That is, obviously, something that we checked as we were running into these issues, as we wanted to know "how does old-layers do it?" and the answer was "old-layers has issues too".)
More examples on old-layers:
 * bug 829747 (e.g.  bug 829747 comment 40)
 * bug 853960 (which affected both old and new layers).
>  - It's only another month later, during the Taiwan work week, that the
> notion that PGrallocBuffer was inherently a mistake, became understood by
> everyone (leading to bug 879681).

I also atttended Taiwan work week, but I was too busy for the b2g task and do not know the above conclusion. Someone always says it's a mistake. I do not think so. It's a good implementation for b2g18. It could be changed and improved, but not a mistake. To judge the past such a way is not fair. You mentioned bug 879681. I feel current way of implementation is going to make mistake.

From different point of view, almost b2g members attended in oslo b2g work week think that graphics team totally failed the Gfx layer refactoring from b2g point of view, actual bug's regression number shows that graphics regression is most majority of b2g master's regression from v1.1. And I believe almost all current b2g master graphics problem is not related to PGrallocBufferParent. It says graphics team still do not understand the gralloc buffer. And from my point of view, current Graphic's team do not pay enough attention and tale care of b2g. b2g team faces very very serious bad affect from Graphics Regression. I hope to understand the current serious problem happening on b2g. b2g v1.2 is almost no worth as a mobile platform by the flaw of graphics. To mitigate this some, during the work week I worked from 8am to 2 am every day. They wrer Bug 912134 Bug 916264 both were not related to PGrallocBufferParent.

I do not want to blame about the past each other. Can we talk positively about the future?
> I do not want to blame about the past each other. Can we talk positively
> about the future?

Yes, this sounds like an excellent idea.

I believe the ball has been set rolling to get more tests etc. so that we don't have another situation where platform/gfx changes can regress b2g without anyone knowing.

Sotaro - if you think the approach in bug 879681 is incorrect, please comment there. It would be excellent to have your input. If you think there is something that is not understood about gralloc, please add it to https://wiki.mozilla.org/Platform/GFX/Gralloc, again, your input there would be valuable.

Without saying someone or someone else made a mistake, it is obvious that we need to support gralloc in a less buggy way than we currently do. Getting input from people who have worked a lot on gralloc issues for b2g is essential for that to succeed. I'm not sure what the best way to collect that is - should we file a bug for gralloc discussion? Or is a wiki page better?
I criticized one C++ class, I didn't criticize any _person_. Seeing how this criticism of a C++ class has been taken, I now regret to have started it. I apologize for that.

Sotaro, your tremendous work last week fixing these critical bugs is very much appreciated. Also, I didn't say that the presently-discussed C++ class was linked to _all_ our B2G gfx issues. In particular, it is absolutely true that the bugs on which you worked last week were not related to it at all. On the other hand, I worked on a different set of bugs last week that were related to it: bug 915869, bug 900012, bug 912725.

Sotaro, I also appreciate all your help over the past year understanding B2G-specific issues. Since you helped us so much on these matters, surely you know that B2G has been by far the platform on which the Gfx team has spent the most time over the past year.

As Nick said, if someone really thinks that we don't understand gralloc, it would be very helpful to edit https://wiki.mozilla.org/Platform/GFX/Gralloc (which, by the way, was jointly written with the B2G team during the last rendering work week, so I thought that we were on the same page when it came to our understanding of this topic).
I think current master b2g graphics problem come from architectural inconsistency between gecko and gonk. For me, it is not a gralloc problem. It is a architectural inconsistency/ffaw problem. Gonk is actually a part of android framework. Therefore they have own framework's requirements. Mobile platform is resource scarce, therefore the requirements becomes stricter than other pc's platform. For b2g18, we worked very hard to mitigate to combine two different architecture. Even I was very surprised that camera preview and video playback works well without modification to the camera hal and hw codec hal!

We faced a lot of PGrallocBuffer*** crashes during development. From this, it is easier to mention to the classes. But it is not a source of the problem. Architectural incosistency is the actual problem. And gecko's ipc characteristic make this situation worse. gecko IPC objects are not reference counted and the lot of classes uses the IPC object directly. It make the crash around IPC when client side handles the IPC object incorrectly.

Since two weeks ago, I started to look into the new gfx layers. So, I am a still beginner in this area. It looks like TextureHost/TextureClient are going to solves the IPC objects reference problem by wrapping them in TextureHost/TextureClient. But it is failed now, because GonkNativeWindow heavily breaks this. 
By reading the code around TextureHost/TextureClient, it is not clear about the roles of TextureHost / TextureClient. Around ClientThebesLayer, TextureHost/TextureClient seems to works as a abstraction of buffers. Around gecko's media area, the role seems different. abstraction of buffers are provided by Image class. And in b2g's video playback and camera preview cases, TextureHost/TextureClient are used just as a trailer of buffers. It deliver buffers from ImageContainer to related ImageLayerComposite for composition.

Current TextureHost/TextureClient seems to have a role of buffer trailer for a layer they belong to. Within gecko's media area, buffers can be rendered by multiple layers. It does not fit well to  TextureHost/TextureClient. We can write such code to do it, but it does not good from architecture point of view. My Idea to this problem is just simple, create a reference counted wrapper of buffer IPC object.
Currently TextureHost/TextureClient also works as a reference counted wrapper. Split the role to a new wapper object. TextureHost/TextureClient becomes free from buffers lifetime management and can focus to the role of the buffer trailer for a layer. I already wrote a little bit to Bug 897452 comment #1.
(In reply to Sotaro Ikeda [:sotaro] from comment #82)
> We faced a lot of PGrallocBuffer*** crashes during development. From this,
> it is easier to mention to the classes. But it is not a source of the
> problem. Architectural incosistency is the actual problem. And gecko's ipc
> characteristic make this situation worse. gecko IPC objects are not
> reference counted and the lot of classes uses the IPC object directly. It
> make the crash around IPC when client side handles the IPC object
> incorrectly.

You are right. The problem that I was blaming GrallocBufferActor for is not specific at all to GrallocBufferActor. It just happens to be the one that I ran into. In that sense, my criticism was unfair to GrallocBufferActor. I, too, wish that IPC objects were reference-counted.

> Since two weeks ago, I started to look into the new gfx layers. [...]

Two weeks only, and you already were able to make such a patch as attachment 804019 [details] [diff] [review]. That is very impressive. But let me suggest that that is also a very good indication of the benefits that we are starting to reap from the layers-refactoring.

Here is why. The layers code before layers-refactoring was very complex to understand. That was true on all platforms; perhaps even more so on B2G, due to the large number of B2G-specific code paths. For example, the code to wrap a gralloc buffer as an OpenGL texture was done in several different places, for different kinds of layers (ImageLayerOGL.cpp had its own code for that, etc). By contrast, in the new-layers code, it is done in only one place (GrallocTextureHost) (well, there is still the Deprecated version, but it's meant to go away soon).

So in the old-layers code, you couldn't have made a patch like you did, that added what was needed to the CompositableHost class. Instead, you would have had to fix one by one all the different kinds of layers that we had (ImageLayerOGL, CanvasLayerOGL, ThebesLayerOGL...). They each had their different kind of logic around that.

What's worse, the old-layers code used some very confusing functions, like BindExternalBuffer and UnbindExternalBuffer which, contrary to what their names suggested, were not to be used together (one was working with the TEXTURE_2D target, the other one with TEXTURE_EXTERNAL). That alone was the cause of more than one bug.

So before the layers-refactoring, B2G graphics found itself in a nasty place where really nobody in the gfx team (and, from trying to find people to work on incoming bugs, probably nobody at Mozilla) understood well how layers worked on B2G, and when bugs in B2G v1.0 came up and needed to be fixed, nobody was in a rush to volunteer... I remember that distinctly, because I was one of the people who had to work on some of these bugs. Examples include bug 860483 and bug 879297.

So I'm very impressed by your patch and it makes me happy that you could do that after only "two weeks" of looking at this code. I don't remember anyone doing such a feat on the old layers code.

> It looks like TextureHost/TextureClient are
> going to solves the IPC objects reference problem by wrapping them in
> TextureHost/TextureClient. But it is failed now, because GonkNativeWindow
> heavily breaks this.

Here and below it sounds like you are not disagreeing with the direction that the refactoring is taking --- you are rather pointing out ideas for further improvements, and noting consequences of us having to limit the scope of this project (we didn't do all the changes that we initially envisioned because we didn't have the time/resources). Anyway, your comments here sound very interesting, but I am not really the right person on these matters, Nical or Nick would be more knowledgeable. Maybe the best approach is that you file individual bugs about each of your ideas.
(In reply to Benoit Jacob [:bjacob] from comment #83)
> Here and below it sounds like you are not disagreeing with the direction

I like the way of layer refactoring. The code becomes more readable. gecko was originally used only for application's framework. b2g is using application's framework as system framework. b2g is also challenging this difference. From system framework point of view, I tend to like the more modular and atomic approach. Creating the wrapper of the buffer IPC object and keep separate from TextureHost/TextureClient comes from this thinking. I worked long time for development of android smart phones and the idea is strongly affected from android's system architecture. In this case, Android binder IPC is the source of the idea.

http://elinux.org/Android_Binder
Depends on: 1145389
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: