Efficiently share BasicThebesLayer pixel data with BasicShadowThebesLayer

RESOLVED FIXED

Status

()

defect
RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: cjones, Assigned: cjones)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(13 attachments, 15 obsolete attachments)

8.86 KB, patch
roc
: review+
Details | Diff | Splinter Review
916 bytes, patch
Details | Diff | Splinter Review
1.17 KB, patch
roc
: review+
Details | Diff | Splinter Review
2.90 KB, patch
roc
: review+
shaver
: superreview+
Details | Diff | Splinter Review
10.33 KB, patch
karlt
: review+
shaver
: superreview+
Details | Diff | Splinter Review
23.25 KB, patch
roc
: review+
Details | Diff | Splinter Review
1.64 KB, patch
Details | Diff | Splinter Review
1.27 KB, patch
karlt
: review+
jrmuizel
: review+
Details | Diff | Splinter Review
6.29 KB, patch
karlt
: review+
Details | Diff | Splinter Review
14.14 KB, patch
roc
: review+
shaver
: superreview+
Details | Diff | Splinter Review
10.41 KB, patch
karlt
: review+
shaver
: superreview+
Details | Diff | Splinter Review
21.31 KB, patch
roc
: review+
shaver
: superreview+
Details | Diff | Splinter Review
69.18 KB, patch
Details | Diff | Splinter Review
Currently, BasicThebesLayer has a ThebesLayerBuffer which owns a gfxASurface into which content is painted.  To share this pixel data with the layer's shadow, we create two new gfxSharedImageSurfaces (backed by IPDL Shmem) as front and back buffers, both the size of the ThebesLayerBuffer, and do a dumb paint from the ThebesLayerBuffer into the back-buffer gfxSharedImageSurface on every BasicThebesLayer::Paint().  This is horribly inefficient; we use 50% more memory than necessary and copy from 50% to 100% more pixels than we need to on each Paint().  The gfx/thebes APIs used for the buffer-to-buffer copy are probably not even the best too, since that code as written by a novice (me).

A better (possibly optimal) implementation is to have the ThebesLayerBuffer (or a subtype thereof) in the content process draw directly into a gfxSharedImageSurface back buffer, then swap that with the browser process's front buffer on PLayers::Update (IPC-forwarded transaction). 

One lingering issue is (copied from a comment in part c of the series in bug 570294)

+   * XXX should the receiving process blit updates from the new front
+   * buffer to the previous front buffer (new back buffer) while it has
+   * access to the new front buffer?  Or is it better to fill the
+   * updates bits in anew on the new back buffer?
+   *
+   * Seems like memcpy()s from new-front to new-back would have to
+   * always be no slower than any kind of fill from content, so one
+   * would expect the former to win in terms of total throughput.
+   * However, that puts the blit on the critical path of
+   * publishing-process-blocking-on-receiving-process, so
+   * responsiveness might suffer, pointing to the latter.  Experience
+   * will tell!  (Maybe possible to choose between both depending on
+   * size of blit vs. expense of re-fill?)
(In reply to comment #0)
> Currently, BasicThebesLayer has a ThebesLayerBuffer which owns a gfxASurface
> into which content is painted.  To share this pixel data with the layer's
> shadow, we create two new gfxSharedImageSurfaces (backed by IPDL Shmem) as
> front and back buffers, both the size of the ThebesLayerBuffer, and do a dumb
> paint from the ThebesLayerBuffer into the back-buffer gfxSharedImageSurface on
> every BasicThebesLayer::Paint().  This is horribly inefficient; we use 50% more
> memory than necessary and copy from 50% to 100% more pixels than we need to on
> each Paint().

Not sure I understand; you're going to need a buffer for the content process and a buffer for the display process, right?  Or are there additional buffers that are being created?  (I'm also confused by your percentages -- you're saying, I think, that we use twice as much memory [could use 50% less memory], and that we copy from 2x to 4x the amount we'd need (?))

> The gfx/thebes APIs used for the buffer-to-buffer copy are
> probably not even the best too, since that code as written by a novice (me).

This should just be a memcpy -- see gfxImageSurface::CopyFrom.

> A better (possibly optimal) implementation is to have the ThebesLayerBuffer (or
> a subtype thereof) in the content process draw directly into a
> gfxSharedImageSurface back buffer, then swap that with the browser process's
> front buffer on PLayers::Update (IPC-forwarded transaction).

Oh, now I understand I think.  Sure, you should be able to just give the thebes layer the gfxSharedImageSurface backbuffer (which just looks like a gfxImageSurface, right?).

> +   * XXX should the receiving process blit updates from the new front
> +   * buffer to the previous front buffer (new back buffer) while it has
> +   * access to the new front buffer?  Or is it better to fill the
> +   * updates bits in anew on the new back buffer?

Well, you shouldn't have to swap buffers -- you can just copy from back -> front, thus always having the back buffer either have newer or same-age data as what's in the front buffer.  If we're only updating a portion of the screen, that'd probably be a pretty big win.

> +   * Seems like memcpy()s from new-front to new-back would have to
> +   * always be no slower than any kind of fill from content, so one
> +   * would expect the former to win in terms of total throughput.
> +   * However, that puts the blit on the critical path of
> +   * publishing-process-blocking-on-receiving-process, so
> +   * responsiveness might suffer, pointing to the latter.  Experience
> +   * will tell!  (Maybe possible to choose between both depending on
> +   * size of blit vs. expense of re-fill?)

This could certainly be interesting, if you had some way to notify the content process "hey your buffer was lost, rerender it entirely", perhaps in response to the entire display changing quickly as opposed to just some subregions.
(Hit enter too soon)

So, I would suggest *not* doing the swap-and-partial update approach; instead I'd start out with just:

- content process draws to back buffer always
- chrome process blits from back -> front
- chrome process always presents front buffer to screen

(no swapping of the buffers)

And then potentially add:

- content process draws to back buffer always
- chrome process swaps back and front, and notifies content that entire buffer was lost
- chrome process always presents front buffer to screen

(swap, but only a full swap, with a full re-render required for the next update)
(In reply to comment #1)
> (In reply to comment #0)
> > Currently, BasicThebesLayer has a ThebesLayerBuffer which owns a gfxASurface
> > into which content is painted.  To share this pixel data with the layer's
> > shadow, we create two new gfxSharedImageSurfaces (backed by IPDL Shmem) as
> > front and back buffers, both the size of the ThebesLayerBuffer, and do a dumb
> > paint from the ThebesLayerBuffer into the back-buffer gfxSharedImageSurface on
> > every BasicThebesLayer::Paint().  This is horribly inefficient; we use 50% more
> > memory than necessary and copy from 50% to 100% more pixels than we need to on
> > each Paint().
> 
> Not sure I understand; you're going to need a buffer for the content process
> and a buffer for the display process, right?  Or are there additional buffers
> that are being created?  (I'm also confused by your percentages -- you're
> saying, I think, that we use twice as much memory [could use 50% less memory],
> and that we copy from 2x to 4x the amount we'd need (?))
> 

Not quite.  This is just a dumb placeholder impl, so it's not that important to understand, but FTR it's approximately

  // child process
  class ThebesLayer:
    gfxASurface buffer;               // X bytes
    gfxSharedImageSurface back;       // (!!) X bytes

    def Paint():
      buffer.FillInvalidatedRegion()  // writes between 0 and X bytes
      CopyFrom(back <-- buffer)       // (!!) writes X bytes
      IPC->SwapBuffers(back, &back)   // swap |back| with parent's |front|

  class ShadowThebesLayer:
    gfxSharedImageSurface front;      // X bytes
    def Paint():
      CopyFrom(widget <-- front)      // writes ~X bytes

(Currently there's another copy from back --> widget in the child process, but that's just bug 570620 so I omitted it.)

The inefficiencies are marked (!!).  We map 3*X bytes when we could be using 2*X bytes, and we write between ~2*X and 3*X bytes when we could be writing ~1*X to 2*X.

> > The gfx/thebes APIs used for the buffer-to-buffer copy are
> > probably not even the best too, since that code as written by a novice (me).
> 
> This should just be a memcpy -- see gfxImageSurface::CopyFrom.
> 

Ah, I was using gfxContext (like I said, novice ;) ).  But in the pseudocode above, |back| and |buffer| might be different formats, and the fast path probably wouldn't be taken then.

> > A better (possibly optimal) implementation is to have the ThebesLayerBuffer (or
> > a subtype thereof) in the content process draw directly into a
> > gfxSharedImageSurface back buffer, then swap that with the browser process's
> > front buffer on PLayers::Update (IPC-forwarded transaction).
> 
> Oh, now I understand I think.  Sure, you should be able to just give the thebes
> layer the gfxSharedImageSurface backbuffer (which just looks like a
> gfxImageSurface, right?).
> 

Exactly.  It's a gfxASurface, in fact.

> > +   * XXX should the receiving process blit updates from the new front
> > +   * buffer to the previous front buffer (new back buffer) while it has
> > +   * access to the new front buffer?  Or is it better to fill the
> > +   * updates bits in anew on the new back buffer?
> 
> Well, you shouldn't have to swap buffers -- you can just copy from back ->
> front, thus always having the back buffer either have newer or same-age data as
> what's in the front buffer.  If we're only updating a portion of the screen,
> that'd probably be a pretty big win.
> 

I don't follow all the details here, but see below.

> > +   * Seems like memcpy()s from new-front to new-back would have to
> > +   * always be no slower than any kind of fill from content, so one
> > +   * would expect the former to win in terms of total throughput.
> > +   * However, that puts the blit on the critical path of
> > +   * publishing-process-blocking-on-receiving-process, so
> > +   * responsiveness might suffer, pointing to the latter.  Experience
> > +   * will tell!  (Maybe possible to choose between both depending on
> > +   * size of blit vs. expense of re-fill?)
> 
> This could certainly be interesting, if you had some way to notify the content
> process "hey your buffer was lost, rerender it entirely", perhaps in response
> to the entire display changing quickly as opposed to just some subregions.

Yep, this should be relatively straightforward in theory.

(In reply to comment #2)
> (Hit enter too soon)
> 
> So, I would suggest *not* doing the swap-and-partial update approach; instead
> I'd start out with just:
> 
> - content process draws to back buffer always
> - chrome process blits from back -> front
> - chrome process always presents front buffer to screen
> 
> (no swapping of the buffers)
> 

By "blits back -> front", do you mean writing just the dirty rect from back -> front, or a full back -> front copy?  If the former, that's basically what I had in mind with swap-and-partial-update; you've rightly pointed out that the swap is unnecessary (although it's free in IPDL so it doesn't matter much).  The full-copy impl is probably the best next step to take in this bug, but in the long term I don't think it's desirable to put that operation on the critical path of content-blocked-on-chrome (and of course disregard this if you meant the former).

> And then potentially add:
> 
> - content process draws to back buffer always
> - chrome process swaps back and front, and notifies content that entire buffer
> was lost
> - chrome process always presents front buffer to screen
> 
> (swap, but only a full swap, with a full re-render required for the next
> update)

Yeah, I definitely think we'll want this for cases like user-scrolls-really-fast.  Agree it's not a top priority.
(In reply to comment #3)
> (In reply to comment #1)

Oops, this was supposed to say

+   // parent process
>   class ShadowThebesLayer:
>     gfxSharedImageSurface front;      // X bytes
>     def Paint():
>       CopyFrom(widget <-- front)      // writes ~X bytes
This will cut down on memory usage but also force entirely client-side rendering for BasicThebesLayer.  Posting in case someone wants to try it out.
Assignee: nobody → jones.chris.g
Attachment #472000 - Attachment is obsolete: true
Attachment #472038 - Attachment is obsolete: true
Attachment #472057 - Flags: review?(roc)
Attachment #472038 - Flags: review?(roc)
I wonder if https://wiki.mozilla.org/Gecko:AsyncPluginPainting is relevant here. I guess it's basically the same problem as here.

And don't we want to not be using image surfaces here, at least if we're not sandboxing? On X it seems to me we should be drawing into a pixmap and sharing that with the compositor process. The ultimate solution performance-wise would be "content process draws into X pixmap, compositor process does texture_from_pixmap and composites with GL", right? Then for the next update, the content process would copy that pixmap to a new (or recycled pixmap), update the dirty area, and show that.
Yep, that's the general idea.  For this bug, I just want to get "drawing into X pixmap" and updating parts working.
Able to load nytimes.com, zoom in, and pan around for a while with all this before OOM.  We're still OOMing way earlier than I would expect, but there's some low-hanging fruit: the prefetch region seems to be a bit on the large side, and the displayport seems to get larger on panning with a scale-zoom applied, which causes us to throw out buffers.  For example,

  [OOM] set displayport to <x=-41, y=33, w=450, h=1125>(CSS pixels)
  [OOM] +++ alloc double buffer <w=1636,h=4500> (x2!) ...
  [OOM] ... ok
  [OOM] --- dealloc buffer <w=1636,h=3628>
  [OOM] --- dealloc buffer <w=1636,h=3628>

which seems to be about where I OOM on device.
Posted patch "Rollup" v2 (obsolete) — Splinter Review
Trying to use ImageSurface::Copy directly, previous code may have been hitting a slow path in whatever drawing backend android is using.
Attachment #472730 - Attachment is obsolete: true
Comment on attachment 472057 [details] [diff] [review]
part 0: Don't throw out buffers when scrolling with a resolution applied, v1

+  nsIntSize destBufferDims = ScaledSize(visibleBounds.Size(),
+                                      aXResolution, aYResolution);

Fix indent
Attachment #472057 - Flags: review?(roc) → review+
Good news and bad news.  Good news first: I have a very very rough impl of shadow-layers-using-X-surfaces, and the panning perf is vastly improved on the n900.  Pretty close to 2.0a1 AFAICT, except that my estimate isn't very accurate because ...

Bad news: my patches are triggering an X error after swapping back/front a few (apparently nondeterministic) number of times.  The error is

###!!! ABORT: RenderCreatePicture: RenderBadFormat (invalid PictFormat parameter); 5 requests ago; id=0x54000c6

Hopefully this is my patches doing something dumb.  Otherwise might be an unforeseen synchronization problem.  Will investigate tomorrow.
Works quite well on my n900.  Might blow up on android.  Getting closer to something shippable ...
Attachment #472058 - Attachment is obsolete: true
Attachment #472791 - Attachment is obsolete: true
Attachment #473254 - Attachment is obsolete: true
Attachment #473263 - Attachment is obsolete: true
Trying to set up an android dev env, looks to be time consuming.  If anyone is already set up for profiling, run this patch |MOZ_FT="log.txt" fennec| to see some rough-ish timing info.
Oops, wrong patch.
Attachment #473395 - Attachment is obsolete: true
Posted file Log (obsolete) —
Something is definitely rotten in android land.  We need a consistent testcase to compare apples to apples, but similar-looking operations across android/n900 look to be ~100x slower (~1000x slower for allocs) than on the n900, even taking XSync()s into account.  My kingdom for an oprofile run ...
Status: NEW → UNCONFIRMED
Ever confirmed: false
Oops.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Turns out we were getting the 2-3 orders of magnitude, does-not-compute worse perf on android for the worst possible reason among various candidates, namely that we were mmap()ing non-shmem, real files.  Oops headdesk facepalm etc.  The shared memory story on android is not great from native code.

Regardless, this rollup approximately represents the perf we can expect from SW-composited rendering.  It's really quite respectable, until we OOM ;).

Couple of issues, though.  First, the new patch in this rollup has us allocate shmem from /dev/ashmem (android-added interface), which has the contract that mapped regions can be thrown out whenever there's memory pressure.  (Though there is a notification we can probably listen to.)  This will pretty much guarantee that the fronted will crash when memory is low, but I'm OK with that for b1.  Second, /dev/ashmem isn't forgiving wrt dumb reallocs.  I can't load nytimes.com to completion, apparently because of surface reallocs leading to alloc fail.  This is bad and good though, because it means we'll need to tighten up our surface allocations, which benefits all platforms.
Attachment #473271 - Attachment is obsolete: true
Attachment #473399 - Attachment is obsolete: true
Attachment #473408 - Attachment is obsolete: true
Attachment #473410 - Attachment is obsolete: true
(In reply to comment #13)
> Comment on attachment 472057 [details] [diff] [review]
> part 0: Don't throw out buffers when scrolling with a resolution applied, v1
> 
> +  nsIntSize destBufferDims = ScaledSize(visibleBounds.Size(),
> +                                      aXResolution, aYResolution);
> 
> Fix indent

Done.
Attachment #473877 - Flags: superreview? → superreview?(shaver)
Attachment #473878 - Flags: superreview? → superreview?(shaver)
Attachment #473879 - Flags: superreview? → superreview?(shaver)
Attachment #473880 - Flags: superreview? → superreview?(shaver)
Comment on attachment 473876 [details] [diff] [review]
part -1: ThebesLayerBuffer::SetBuffer needs to hold its ref to the old buffer

+    nsRefPtr<gfxASurface> tmp = mBuffer;

Make it mBuffer.forget().
Attachment #473876 - Flags: review?(roc) → review+
+  const nsIntSize& BufferDims() const { return mBufferDims; }
   const nsIntRect& BufferRect() const { return mBufferRect; }
   const nsIntPoint& BufferRotation() const { return mBufferRotation; }

We need comments explaining what these are and particularly why BufferDims is different from BufferRect().Size()

I don't really understand what part 1 is trying to do. BasicShadowableThebesLayer::CreateBuffer is creating three buffers now?
Comment on attachment 473878 [details] [diff] [review]
part 2: Add a SurfaceDescriptorX11 datatype that abstracts what's needed to share an Xlib surface to another process

>+    : mId(0)

"mId(None)" would make it clearer that this is an id.

>+    , mSize(0, 0)

This may not be necessary.

>+  bool operator==(const SurfaceDescriptorX11& aOther) const {
>+    // XXX the semantics of == for this are open to debate.  Just do
>+    // something fast and simple for now.
>+    return mId == aOther.mId;
>+  }

That looks fine to me.
If one surface has different sizes or formats something is wrong.

>+  XRenderPictFormat mFormat;

I'd use XRenderPictFormat* here.  The id is the only part that needs to be
sent on the wire.  XRenderPictFormat is usually a pointer to a long-living
struct on the Display.  I'm guessing that temporary XRenderPictFormats are
likely to cause problems (as you alluded to in your comments).

Serialize XRenderPictFormat* by just writing PictFormat .id;
I'd change
 XRenderPictFormat*
 GetXlibPointerToFormat(Display* aDisplay, const XRenderPictFormat* aFormat)
to
 XRenderPictFormat*
 GetXlibPointerFromFormat(Display* aDisplay, const PictFormat aFormat)
and use it to deserialize back to the pointer.
(In reply to comment #36)
> Comment on attachment 473876 [details] [diff] [review]
> part -1: ThebesLayerBuffer::SetBuffer needs to hold its ref to the old buffer
> 
> +    nsRefPtr<gfxASurface> tmp = mBuffer;
> 
> Make it mBuffer.forget().

Done.
(In reply to comment #37)
> +  const nsIntSize& BufferDims() const { return mBufferDims; }
>    const nsIntRect& BufferRect() const { return mBufferRect; }
>    const nsIntPoint& BufferRotation() const { return mBufferRotation; }
> 
> We need comments explaining what these are and particularly why BufferDims is
> different from BufferRect().Size()
> 

OK.

> I don't really understand what part 1 is trying to do.
> BasicShadowableThebesLayer::CreateBuffer is creating three buffers now?

It was *previously* creating three buffers, just out of simplicity.  This patch makes it only create two buffers.  The old scheme was

  ShadowableThebes { mBuffer /*"real" buffer*/; mBackBuffer };
  ShadowThebes { mFrontBuffer; };

  ShadowableThebes::Paint
    Paint(mBuffer);
    Copy(mBackBuffer, mBuffer);
    -->Update(mBackBuffer)
== IPC ==
  newBackBuffer ShadowThebes::Update(newFrontBuffer)
    newBack = mFrontBuffer
    mFrontBuffer = newFrontBuffer
    return newBack

after which the ShadowableThebes would set |newBack| to be its mBackBuffer.

In these patches, we make ShadowableThebes render directly into the back buffer.  That is, the "real" buffer becomes the back buffer.  Then when the compositor processes a transaction, we copy the ShadowableThebes's back buffer contents into its ShadowThebes's front buffer.  (There's a swap additionally, but that's just for generality.  It's essentially a no-op.  Also currently, we copy the entire back buffer to the front buffer on updates.)

This back->front dirty-copy on update isn't always going to be what we want, but I'd like to defer figuring out what to do there to a followup bug where we can make a decision with hard data in hand.
(In reply to comment #38)
> Comment on attachment 473878 [details] [diff] [review]
> part 2: Add a SurfaceDescriptorX11 datatype that abstracts what's needed to
> share an Xlib surface to another process
> 
> >+    : mId(0)
> 
> "mId(None)" would make it clearer that this is an id.
> 

Makes sense.

> >+    , mSize(0, 0)
> 
> This may not be necessary.
> 

That's a good idea.  I'll let valgrind complain when these are used inappropriately.  That will do away with the mId() initialization above.

> >+  bool operator==(const SurfaceDescriptorX11& aOther) const {
> >+    // XXX the semantics of == for this are open to debate.  Just do
> >+    // something fast and simple for now.
> >+    return mId == aOther.mId;
> >+  }
> 
> That looks fine to me.
> If one surface has different sizes or formats something is wrong.
> 
> >+  XRenderPictFormat mFormat;
> 
> I'd use XRenderPictFormat* here.  The id is the only part that needs to be
> sent on the wire.  XRenderPictFormat is usually a pointer to a long-living
> struct on the Display.  I'm guessing that temporary XRenderPictFormats are
> likely to cause problems (as you alluded to in your comments).
> 

OK.  Guess this is the "Xlib way" ;).

> Serialize XRenderPictFormat* by just writing PictFormat .id;
> I'd change
>  XRenderPictFormat*
>  GetXlibPointerToFormat(Display* aDisplay, const XRenderPictFormat* aFormat)
> to
>  XRenderPictFormat*
>  GetXlibPointerFromFormat(Display* aDisplay, const PictFormat aFormat)
> and use it to deserialize back to the pointer.

Makes sense, thanks.
(In reply to comment #40)
> This back->front dirty-copy on update isn't always going to be what we want,
> but I'd like to defer figuring out what to do there to a followup bug where we
> can make a decision with hard data in hand.

I thought about this a bit more and I think swapping is pretty much always going to win over dirty-copy.  The cases where dirty copy would win are probably edge enough we don't need to worry about them.  I'll implement this as part 9.
Rob, I'm not sure I answered all the questions you have (missed you on IRC), but this adds the comments you requested.
Attachment #473877 - Attachment is obsolete: true
Attachment #473953 - Flags: superreview?
Attachment #473953 - Flags: review?(roc)
Attachment #473877 - Flags: superreview?(shaver)
Attachment #473877 - Flags: review?(roc)
Comment on attachment 473953 [details] [diff] [review]
part 1: Only use back/front buffers for BasicThebesLayer and update back->front in the compositor process, v2

Gum, "shaver@mozilla", not ":shaver".
Attachment #473953 - Flags: superreview? → superreview?(shaver)
This patch gets us to the following "swap" painting model

  Child                                       Parent
 -------                                     --------
  Paint(region):
    ThebesCallback(backBuffer, region)
    Update(backBuffer, region)
                    ========== IPC =========> Swap(newFront, updatedRegion, out newBack):
                                                frontBuffer.validRegion.invalidate(updatedRegion)
                                                *newBack = frontBuffer
                    <--------- reply --------   frontBuffer = newFront
    SetBackBuffer(newBack, newValidRegion)
      backBuffer = newBack
      validRegion = newValidRegion

On each update, the child swaps its back buffer for the parent's old front buffer, and invalidates the content that was painted into the back buffer since the last swap.

To improve on this model, karl's suggestion in bug 556487 comment 113, we're going to need to revive the patches for https://wiki.mozilla.org/IPDL/Proposal:Shmem_access_control or toss out the current Shmem ownership model in favor of willy-nilly access.  Probably best to wait for data on this.
Attachment #473992 - Flags: superreview?(shaver)
Attachment #473992 - Flags: review?(roc)
Comment on attachment 473954 [details] [diff] [review]
part 2: Add a SurfaceDescriptorX11 datatype that abstracts what's needed to share an Xlib surface to another process, v2

>+namespace mozilla { namespace layers {
>+struct SurfaceDescriptorX11 {
>+  bool operator==(const SurfaceDescriptorX11&) const { return false; }
>+};
>+} }

I guess the default operator== for an empty struct is also inline but
returns true?   I'm curious if/why the variant is needed.  

>+  int screen = DefaultScreen(display);
>+
>+  XRenderPictFormat* format = GetXRenderPictFormatFromId(display, mFormat);
>+  nsRefPtr<gfxXlibSurface> surf =
>+    new gfxXlibSurface(XScreenOfDisplay(display, screen),

I didn't know about *X*ScreenOfDisplay.  Looks like it's a function that does the same as the ScreenOfDisplay macro.
The macro might be a little less code.  Here you can 

  Screen *screen = DefaultScreenOfDisplay(display);

    new gfxXlibSurface(screen, ...

(which should expand to the same code as the macro, only a slightly different order.)
Attachment #473954 - Flags: review?(karlt) → review+
Comment on attachment 473884 [details] [diff] [review]
part 7: Add gfxXlibSurface::ReleasePixmap() to undo TakePixmap()

Looks good to me, but should check with a gfx peer.
Attachment #473884 - Flags: review?(karlt)
Attachment #473884 - Flags: review?(jmuizelaar)
Attachment #473884 - Flags: review+
Comment on attachment 473954 [details] [diff] [review]
part 2: Add a SurfaceDescriptorX11 datatype that abstracts what's needed to share an Xlib surface to another process, v2

>+XRenderPictFormat*
>+GetXRenderPictFormatFromId(Display* aDisplay, PictFormat aFormatId)

This can have static linkage.
(In reply to comment #47)
> Comment on attachment 473954 [details] [diff] [review]
> part 2: Add a SurfaceDescriptorX11 datatype that abstracts what's needed to
> share an Xlib surface to another process, v2
> 
> >+namespace mozilla { namespace layers {
> >+struct SurfaceDescriptorX11 {
> >+  bool operator==(const SurfaceDescriptorX11&) const { return false; }
> >+};
> >+} }
> 
> I guess the default operator== for an empty struct is also inline but
> returns true?   I'm curious if/why the variant is needed.  
> 

C++-style structs don't have a default operator==.

> >+  int screen = DefaultScreen(display);
> >+
> >+  XRenderPictFormat* format = GetXRenderPictFormatFromId(display, mFormat);
> >+  nsRefPtr<gfxXlibSurface> surf =
> >+    new gfxXlibSurface(XScreenOfDisplay(display, screen),
> 
> I didn't know about *X*ScreenOfDisplay.  Looks like it's a function that does
> the same as the ScreenOfDisplay macro.
> The macro might be a little less code.  Here you can 
> 
>   Screen *screen = DefaultScreenOfDisplay(display);
> 
>     new gfxXlibSurface(screen, ...
> 
> (which should expand to the same code as the macro, only a slightly different
> order.)

Will fix, thanks.
(In reply to comment #49)
> Comment on attachment 473954 [details] [diff] [review]
> part 2: Add a SurfaceDescriptorX11 datatype that abstracts what's needed to
> share an Xlib surface to another process, v2
> 
> >+XRenderPictFormat*
> >+GetXRenderPictFormatFromId(Display* aDisplay, PictFormat aFormatId)
> 
> This can have static linkage.

Oh right, I think this was lost in the shuffle.  Thanks for the catch.
Comment on attachment 473880 [details] [diff] [review]
part 4: Put support in place for allocating platform-specific backing buffers for IPC layers, and synchronizing properly during transactions (i.e. XSync() on X11). The platform-specific interfaces jus

>+   * Platform-specific buffers.  These default to using
>+   * Shmem/gfxSharedImageSurface.

Perhaps "In the absence of platform-specific buffers these fall back to
Shmem/gfxSharedImageSurface."

>+#if !defined(MOZ_HAVE_PLATFORM_SPECIFIC_LAYER_BUFFERS)

I'd be tempted to make these inline and put them in the header file.

>   Frame()->ShadowLayersUpdated();
> 
>+  ShadowLayerManager::PlatformSyncBeforeReplyUpdate();

I don't know what you have planned for ShadowLayersUpdated, but if that is
going to do some operations that don't need to be synced here, it may be
better to only sync the necessary operations by calling PlatformSyncBeforeReplyUpdate before ShadowLayersUpdated.
Attachment #473880 - Flags: review?(karlt) → review+
Comment on attachment 473885 [details] [diff] [review]
part 8: Share Xlib surfaces across processes on X11 platforms

>+      NS_ERROR("not reached");

NS_NOTREACHED, I guess.

>+  NS_ABORT_IF_FALSE(xrenderFormat, "should have a render format by now");

This would be hit on displays without the render extension.  I don't know
whether we care but it would be simple to return NULL here to make
PlatformAllocDoubleBuffer return FALSE, and fall back to
gfxSharedImageSurface.
Attachment #473885 - Flags: review?(karlt) → review+
Comment on attachment 473885 [details] [diff] [review]
part 8: Share Xlib surfaces across processes on X11 platforms

>+static PRBool
>+TakeAndDestroyXlibSurface(SurfaceDescriptor* aSurface)
>+{
>+  nsRefPtr<gfxXlibSurface> surf =
>+    aSurface->get_SurfaceDescriptorX11().OpenForeign();
>+  surf->TakePixmap();

XFreePixmap(DefaultXDisplay(), aSurface->get_SurfaceDescriptorX11().mId)
would be simpler and more efficient.
But I see the symmetry of gfxXlibSurface Create/Destroy,
so up to you.
Attachment #473879 - Flags: superreview?(shaver) → superreview+
Comment on attachment 473880 [details] [diff] [review]
part 4: Put support in place for allocating platform-specific backing buffers for IPC layers, and synchronizing properly during transactions (i.e. XSync() on X11). The platform-specific interfaces jus

A description in the code of what the expected protocol/pattern is for calling the various update routines wouldn't be excessive, even though they are internal to the implementation.  I had to chase a bunch of code to figure out what the assumptions were about when they'd be called, and I expect that future srs will have to do similarly.
Attachment #473880 - Flags: superreview?(shaver) → superreview+
Comment on attachment 473954 [details] [diff] [review]
part 2: Add a SurfaceDescriptorX11 datatype that abstracts what's needed to share an Xlib surface to another process, v2

I don't like the XXX in SurfaceDescriptorX11::operator==, but I'm not sure with what to replace it.  Perhaps a short framing of the debate, or at least what the basis of the comparison decision was?  (It seems to be "this works well enough for now, and is fast", which is just fine with me, but then I don't think it needs a XXX.)
Attachment #473954 - Flags: superreview?(shaver) → superreview+
Comment on attachment 473953 [details] [diff] [review]
part 1: Only use back/front buffers for BasicThebesLayer and update back->front in the compositor process, v2

CAVEAT EMPTOR in SetBackingBuffer would like an assertion defending it, I think.  Going from "rendering glitch" to "buffer size mismatch" will be a pretty painful debugging process without at least that assistance, I think.

+  // We hand this out to our ThebesLayerBuffer, but hold an extra ref

I *think* you mean "we transfer ownership of mBackBuffer", in which case that would probably be clearer.  (If you mean something *else*, then it should probably be clearer too!)

I would like also to draw your attention to "XXX error handling", for which we should perhaps have an hg hook. :-P  Pre-existing, but we don't have a Good Samaritan rule here!
Attachment #473953 - Flags: superreview?(shaver) → superreview+
Comment on attachment 473992 [details] [diff] [review]
part 9: When updating thebes layers, swap out back/front buffers and invalidate the newly-painted region on the old front buffer

Oh look, my previous review comments addressed before I made them. Ahem, sorry for not checking that more thoroughly, I guess.
Attachment #473992 - Flags: superreview?(shaver) → superreview+
(In reply to comment #52)
> Comment on attachment 473880 [details] [diff] [review]
> part 4: Put support in place for allocating platform-specific backing buffers
> for IPC layers, and synchronizing properly during transactions (i.e. XSync() on
> X11). The platform-specific interfaces jus
> 
> >+   * Platform-specific buffers.  These default to using
> >+   * Shmem/gfxSharedImageSurface.
> 
> Perhaps "In the absence of platform-specific buffers these fall back to
> Shmem/gfxSharedImageSurface."
> 

Sounds good, thanks,

> >+#if !defined(MOZ_HAVE_PLATFORM_SPECIFIC_LAYER_BUFFERS)
> 
> I'd be tempted to make these inline and put them in the header file.
> 

Sure.

> >   Frame()->ShadowLayersUpdated();
> > 
> >+  ShadowLayerManager::PlatformSyncBeforeReplyUpdate();
> 
> I don't know what you have planned for ShadowLayersUpdated, but if that is
> going to do some operations that don't need to be synced here, it may be
> better to only sync the necessary operations by calling
> PlatformSyncBeforeReplyUpdate before ShadowLayersUpdated.

After part 1, the content process draws directly into a back buffer, then sends the back buffer to the browser.  The browser copies back->front, and then sends back a new back buffer for content.  Before returning from this RecvUpdate() method in the browser, we need to ensure that the back->front copy has finished, or the content process might start scribbling over memory in the new back buffer that's still being copied to the new front buffer.

With part 9, we stop doing the update-copy and don't need the XSync() here for thebes layer.  I sort of lean towards leaving the XSync() in for future-proofing, of course taking it out if it lags the UI, but I'm open to ideas.
(In reply to comment #59)
> (In reply to comment #52)
> > >+#if !defined(MOZ_HAVE_PLATFORM_SPECIFIC_LAYER_BUFFERS)
> > 
> > I'd be tempted to make these inline and put them in the header file.
> > 
> 
> Sure.
> 

Hm, looked into this but it require pulling potentially lot of headers before we find out !defined(MOZ_HAVE_PLATFORM_SPECIFIC_LAYER_BUFFERS).  Would prefer to leave these in .cpp if you're OK with that.
(In reply to comment #54)
> Comment on attachment 473885 [details] [diff] [review]
> part 8: Share Xlib surfaces across processes on X11 platforms
> 
> >+static PRBool
> >+TakeAndDestroyXlibSurface(SurfaceDescriptor* aSurface)
> >+{
> >+  nsRefPtr<gfxXlibSurface> surf =
> >+    aSurface->get_SurfaceDescriptorX11().OpenForeign();
> >+  surf->TakePixmap();
> 
> XFreePixmap(DefaultXDisplay(), aSurface->get_SurfaceDescriptorX11().mId)
> would be simpler and more efficient.
> But I see the symmetry of gfxXlibSurface Create/Destroy,
> so up to you.

Hmm .... I think I prefer keeping this in gfxXlibSurface.
(In reply to comment #55)
> Comment on attachment 473880 [details] [diff] [review]
> part 4: Put support in place for allocating platform-specific backing buffers
> for IPC layers, and synchronizing properly during transactions (i.e. XSync() on
> X11). The platform-specific interfaces jus
> 
> A description in the code of what the expected protocol/pattern is for calling
> the various update routines wouldn't be excessive, even though they are
> internal to the implementation.  I had to chase a bunch of code to figure out
> what the assumptions were about when they'd be called, and I expect that future
> srs will have to do similarly.

I added a long-ish comment in ShadowLayers.h that describes a bit more how the ShadowForwarder/IPDL/ShadowManager interaction works.  I suspect we're probably still lacking in docs though, will look at putting together something better, perhaps using multiple media!
(In reply to comment #56)
> Comment on attachment 473954 [details] [diff] [review]
> part 2: Add a SurfaceDescriptorX11 datatype that abstracts what's needed to
> share an Xlib surface to another process, v2
> 
> I don't like the XXX in SurfaceDescriptorX11::operator==, but I'm not sure with
> what to replace it.  Perhaps a short framing of the debate, or at least what
> the basis of the comparison decision was?  (It seems to be "this works well
> enough for now, and is fast", which is just fine with me, but then I don't
> think it needs a XXX.)

Changed this to

    // Define == as two descriptors having the same XID for now,
    // ignoring size and render format.  If the two indeed refer to
    // the same valid XID, then size/format are "actually" the same
    // anyway, regardless of the values of the fields in
    // SurfaceDescriptorX11.
(In reply to comment #57)
> Comment on attachment 473953 [details] [diff] [review]
> part 1: Only use back/front buffers for BasicThebesLayer and update back->front
> in the compositor process, v2
> 
> CAVEAT EMPTOR in SetBackingBuffer would like an assertion defending it, I
> think.  Going from "rendering glitch" to "buffer size mismatch" will be a
> pretty painful debugging process without at least that assistance, I think.
> 

Yes, good point.  When I wrote this patch we didn't have the ability to assert the sizes were the same, but we now do thanks to Oleg.

> +  // We hand this out to our ThebesLayerBuffer, but hold an extra ref
> 
> I *think* you mean "we transfer ownership of mBackBuffer", in which case that
> would probably be clearer.  (If you mean something *else*, then it should
> probably be clearer too!)
> 

Updated.

> I would like also to draw your attention to "XXX error handling", for which we
> should perhaps have an hg hook. :-P  Pre-existing, but we don't have a Good
> Samaritan rule here!

Heh, yeah.  It's still not clear to me what we should do when we fail to allocate these buffers.  Most other code in this area seems to silently ignore, which is OK I guess.  I would prefer to postpone this debate.
(In reply to comment #58)
> Comment on attachment 473992 [details] [diff] [review]
> part 9: When updating thebes layers, swap out back/front buffers and invalidate
> the newly-painted region on the old front buffer
> 
> Oh look, my previous review comments addressed before I made them. Ahem, sorry
> for not checking that more thoroughly, I guess.

No, your comment is still valid.  It's expected that the old size matches the new size.  If the buffer attributes (rect/rotation) are garbled or we invalidate incorrectly, we'll still get glitches.
Posted patch RollupSplinter Review
Attachment #473472 - Attachment is obsolete: true
(In reply to comment #60)
> Hm, looked into this but it require pulling potentially lot of headers before
> we find out !defined(MOZ_HAVE_PLATFORM_SPECIFIC_LAYER_BUFFERS).  Would prefer
> to leave these in .cpp if you're OK with that.

Yes, sure.
The compiler should be able to work out most of the optimization anyway.
Attachment #473883 - Flags: superreview?(roc) → superreview+
Attachment #473884 - Flags: superreview?(roc) → superreview+
In part 7, it might actually make sense to have ReleasePixmap return the pixmap, so we have one getter that doesn't transfer ownership and another getter that does.
Sounds good.  Done.
Attachment #473884 - Flags: review?(jmuizelaar) → review+
Comment on attachment 473953 [details] [diff] [review]
part 1: Only use back/front buffers for BasicThebesLayer and update back->front in the compositor process, v2

+   * There are two ways they may differ: first, if a ThebesLayer is
+   * drawn at a non-1.0 resolution, and second, if our buffer rect
+   * shrunk and we re-used the old buffer instead of allocating a new
+   * one.

mBufferDims isn't needed for the second case, right? Because we can just make the valid region be smaller than mBufferRect. So at least we should change the comment here.
Attachment #473953 - Flags: review?(roc) → review+
(In reply to comment #59)
> (In reply to comment #52)
> > Comment on attachment 473880 [details] [diff] [review] [details]
> > >   Frame()->ShadowLayersUpdated();
> > > 
> > >+  ShadowLayerManager::PlatformSyncBeforeReplyUpdate();
> > 
> > I don't know what you have planned for ShadowLayersUpdated, but if that is
> > going to do some operations that don't need to be synced here, it may be
> > better to only sync the necessary operations by calling
> > PlatformSyncBeforeReplyUpdate before ShadowLayersUpdated.
> 

I missed the point Karl was making here (and he pointed that even after part 9 we need to sync here to finish previous compositing operations).  In the current implementation, ShadowLayersUpdate() just ends up calling nsIFrame::Invalidate(), so it doesn't really matter where we sync.  But it seems better to sync before we call out, just after the shadow-layers update.  Will fix.
(In reply to comment #70)
> Comment on attachment 473953 [details] [diff] [review]
> part 1: Only use back/front buffers for BasicThebesLayer and update back->front
> in the compositor process, v2
> 
> +   * There are two ways they may differ: first, if a ThebesLayer is
> +   * drawn at a non-1.0 resolution, and second, if our buffer rect
> +   * shrunk and we re-used the old buffer instead of allocating a new
> +   * one.
> 
> mBufferDims isn't needed for the second case, right? Because we can just make
> the valid region be smaller than mBufferRect. So at least we should change the
> comment here.

Hm, yes.  OK.
You need to log in before you can comment on or make changes to this bug.