Bug 649417 (Closed) - Opened 13 years ago, Closed 13 years ago

Do YUV -> RGB conversion on the GPU on mobile

Categories

(Core :: Graphics, defect)

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED

People

(Reporter: mattwoodrow, Assigned: mattwoodrow)

References

Details

Attachments

(2 files, 5 obsolete files)

      No description provided.
Attachment #525444 - Flags: review?(jones.chris.g)
Forgot to qref some of my cleanups.
Attachment #525467 - Attachment is obsolete: true
Attachment #525467 - Flags: review?(jones.chris.g)
Attachment #526098 - Flags: review?(jones.chris.g)
Why can't we just add YUVImage to the SurfaceDescriptor union?
Also see bug https://bugzilla.mozilla.org/show_bug.cgi?id=610155, which switches ImageLayer to use SurfaceDescriptor...
I also think it makes sense to have BasicPlanarYCbCrImage store the Shmem directly instead of going through the temporary mData (removing one memcpy).
+ With bug 634557 fixed we could speed up the software path by doing gfx::ScaleYCbCrToRGB565 directly into the context in BasicShadowImageLayer::Paint.
(In reply to comment #4)
> Why can't we just add YUVImage to the SurfaceDescriptor union?
> Also see bug https://bugzilla.mozilla.org/show_bug.cgi?id=610155, which
> switches ImageLayer to use SurfaceDescriptor...

SurfaceDescriptor identifies a shareable gfxASurface.  YUV images can't become gfxASurfaces (and I'm told by Jeff and Bas that they don't belong in 2D drawing APIs).  I think it makes sense to have another Image-specific type.  You and Matt will have to race to see who rots the other :).  Bug 610155 is still useful, but the SurfaceDescriptor would be a variant alongside the YUVImage being added here.
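
As a rough sketch of what consuming the union could then look like: only SharedImage::TShmem appears in the quoted diff; TYUVImage is what part 2 here would generate and TSurfaceDescriptor is what bug 610155 would add, so treat both of those names as assumptions following IPDL's usual T<TypeName> convention.

  static PRBool IsYUVFrontBuffer(const SharedImage& aFront)
  {
    switch (aFront.type()) {
    case SharedImage::TYUVImage:    // planar YCbCr, converted at composite time
      return PR_TRUE;
    default:                        // TShmem / TSurfaceDescriptor: already an RGB surface
      return PR_FALSE;
    }
  }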

(In reply to comment #5)
> + With bug 634557 fixed we could speed up the software path by doing
> gfx::ScaleYCbCrToRGB565 directly into the context in BasicShadowImageLayer::Paint.

The current plan is to do that in the content process, once.  Lazier YUV->RGB conversion should only win if we're dropping a lot of frames.  But we can measure when the time comes.
Comment on attachment 525444 [details] [diff] [review]
Part 1: Create a SharedImage ipdl union and pass this to ShadowImageLayer::Swap

>diff --git a/gfx/layers/ipc/ShadowLayers.h b/gfx/layers/ipc/ShadowLayers.h

>+  virtual PRBool Init(SharedImage aImage, const nsIntSize& aSize) = 0;
>+  Swap(SharedImage aImage) = 0;

|const SharedImage&| for both of these.

> 
>   /**
>    * CONSTRUCTION PHASE ONLY
>    *
>    * Destroy the current front buffer.
>    */
>   virtual void DestroyFrontBuffer() = 0;
> 
>diff --git a/gfx/layers/ipc/ShadowLayersParent.cpp b/gfx/layers/ipc/ShadowLayersParent.cpp

>+      NS_ASSERTION(newFront.type() == SharedImage::TShmem, "PANIC!");

This assertion isn't necessary.

r=me with those fixed.
Attachment #525444 - Flags: review?(jones.chris.g) → review+
(In reply to comment #6)
> The current plan is to do that in the content process, once.  Lazier YUV->RGB
> conversion should only win if we're dropping a lot of frames.  But we can
> measure when the time comes.

I don't think this is quite true. Right now we have to drag the whole image through cache to convert+scale, and then drag it through cache a second time to composite. See the profiling data in bug 634557 comment 75. The only time lazier conversion should not be a win is if we have to composite the same data more than once (i.e., a paused video, or something animating faster than the video framerate). It should be easy to detect that and fall back to the current behavior (except in the chrome process).

This would also put decode and conversion on separate threads (by the nature of having them happen in separate processes), which should be a _huge_ win for the new dual-core A9 chips.
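To put rough numbers on that, a back-of-the-envelope sketch only, assuming a 1280x720 frame composited 1:1 to an RGB565 target (these are not figures from the bug 634557 profile):

  #include <cstddef>

  // Bytes dragged through cache per frame: convert-to-temporary vs. convert-on-composite.
  size_t BytesWithTemporary(size_t w, size_t h)
  {
    size_t yuv420 = w * h * 3 / 2;  // source read by convert+scale
    size_t rgb565 = w * h * 2;      // temporary written, read back, then window written
    return yuv420 + 3 * rgb565;     // ~6.9 MB at 1280x720
  }

  size_t BytesConvertingDuringComposite(size_t w, size_t h)
  {
    return w * h * 3 / 2 + w * h * 2;  // source read once, window written once: ~3.2 MB
  }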
(In reply to comment #8)
> (In reply to comment #6)
> > The current plan is to do that in the content process, once.  Lazier YUV->RGB
> > conversion should only win if we're dropping a lot of frames.  But we can
> > measure when the time comes.
> 
> I don't think this is quite true. Right now we have to drag the whole image
> through cache to convert+scale, and then drag it through cache a second time to
> composite. See the profiling data in bug 634557 comment 75.

Hm, those results aren't particularly clear to me.  Are you suggesting a convert+scale->temporary, composite temporary->window implementation, or one that would convert+scale+composite->window row-by-row?  It's not obvious to me that the former would have sufficiently better cache usage to offset the convert+scale on the critical painting path, but I guess that's what profilers are for :).  The latter is interesting.

Note too that there's a known unnecessary memcpy() in the current implementation.  We should fix that, but I've been waiting on bug 598868 which will make things easier.

> The only time
> lazier conversion should not be a win is if we have to composite the same data
> more than once (i.e., a paused video, or something animating faster than the
> video framerate). It should be easy to detect that and fall back to the current
> behavior (except in the chrome process).

That's not a problem if we convert+scale->temporary; we can just hold on to the temporary with some heuristics for letting it go.  It would be a problem if we convert+scale+composite in one pass, but what you suggest sounds fine.

> This would also put decode and conversion on separate threads (by the nature of
> having them happen in separate processes), which should be a _huge_ win for the
> new dual-core A9 chips.

Good point, for the current implementation.  We'll get a lot more flexibility here when bug 592833 lands.
Comment on attachment 526098 [details] [diff] [review]
Part 2: Add a YUV option to SharedImage and use it to share YUV data across processes v2

>diff --git a/gfx/layers/ImageLayers.h b/gfx/layers/ImageLayers.h

>+  /**
>+   * Ask this Image to not convert YUV to RGB during SetData, and make
>+   * the original data available through GetData. This is optional,
>+   * and not all PlanaerYCbCrImages will support it.
>+   */
>+  virtual void SetDelayedConversion(PRBool aDelayed) { };

Stray semicolon.  (Burns maemo gcc.)

>+
>+  /**
>+   * Grab the original YUV data. This is optional.
>+   */
>+  virtual const Data* GetData() { return nsnull; }

>+
> protected:
>   PlanarYCbCrImage(void* aImplData) : Image(aImplData, PLANAR_YCBCR) {}
> };
> 
> /**
>  * Currently, the data in a CairoImage surface is treated as being in the
>  * device output color space.
>  */

>diff --git a/gfx/layers/basic/BasicImages.cpp b/gfx/layers/basic/BasicImages.cpp

>+  const Data* GetData()
>+  {
>+    return &mData;
>+  }
>+

Nit: style here would put the function body on the same line as its
signature.

>   void SetOffscreenFormat(gfxImageFormat aFormat) { mOffscreenFormat = aFormat; }
>   gfxImageFormat GetOffscreenFormat() { return mOffscreenFormat; }
> 
> protected:
>   nsAutoArrayPtr<PRUint8>              mBuffer;
>   nsCountedRef<nsMainThreadSurfaceRef> mSurface;
>   gfxIntSize                           mScaleHint;
>   PRInt32                              mStride;
>   gfxImageFormat                       mOffscreenFormat;
>+  Data                                 mData;
>+  PRUint32                             mBufferSize;
>+  PRBool                               mDelayedConversion;

PRPackedBool.

> void
> BasicPlanarYCbCrImage::SetData(const Data& aData)

I hope that Joe can review this.

>+  // XXX: If we forced delayed conersion, are we ever going to hit this?
>+  // We may need to implement the conversion here.
>+  if (!mBuffer || mDelayedConversion) {

We'll hit this if something in the content process does a drawWindow()
with USE_WIDGET_LAYERS.  Supporting this is somewhat intricate; we
need to only GetCurrentAsSurface() when the manager is
!mUsingDefaultTarget.  I think I'd be OK with fixing this in a
followup bug (if this patch doesn't cause any tests to fail ;).
Please file if so.
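
For concreteness, the fallback could have roughly this shape; this is a sketch under assumptions (ConvertDataToSurface is a hypothetical stand-in for the software YUV->RGB path, and the method name is a placeholder):

  already_AddRefed<gfxASurface>
  GetAsSurfaceWithFallback()
  {
    if (mSurface) {
      // SetData already converted (delayed conversion was not requested).
      nsRefPtr<gfxASurface> result = mSurface.get();
      return result.forget();
    }
    // Delayed conversion was requested, but a reader (e.g. drawWindow() with
    // USE_WIDGET_LAYERS on a !mUsingDefaultTarget manager) needs RGB anyway,
    // so convert on demand from the retained YUV data.
    nsRefPtr<gfxASurface> result = ConvertDataToSurface(mData);  // hypothetical helper
    return result.forget();
  }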

>diff --git a/gfx/layers/ipc/PLayers.ipdl b/gfx/layers/ipc/PLayers.ipdl

>+struct OpYUVBufferSwap { 
>+  PLayer layer;
>+  SurfaceDescriptor newYBuffer;
>+  SurfaceDescriptor newUBuffer;
>+  SurfaceDescriptor newVBuffer;
>+};

  struct OpImageSwap { PLayer layer; SharedImage newBackImage; }

makes more sense here.


>diff --git a/gfx/layers/ipc/ShadowLayers.h b/gfx/layers/ipc/ShadowLayers.h

>   void CreatedImageBuffer(ShadowableLayer* aImage,
>                           nsIntSize aSize,
>                           gfxSharedImageSurface* aInitialFrontSurface);
>+  void CreatedYUVImageBuffer(ShadowableLayer* aImage,
>+                             nsIntSize aSize,
>+                             gfxSharedImageSurface* aYChannel,
>+                             gfxSharedImageSurface* aUChannel,
>+                             gfxSharedImageSurface* aVChannel);

Unify this and the above under

  void CreatedImageBuffer(..., const SharedImage&);

>   void PaintedImage(ShadowableLayer* aImage,
>                     gfxSharedImageSurface* aNewFrontSurface);
>+  void PaintedYUVImage(ShadowableLayer* aImage,
>+                       gfxSharedImageSurface* aYChannel,
>+                       gfxSharedImageSurface* aUChannel,
>+                       gfxSharedImageSurface* aVChannel);

  PaintedImage(..., const SharedImage&);

>+   struct BufferData
>+   {
>+      nsRefPtr<gfxSharedImageSurface> mBufferOne;
>+      nsRefPtr<gfxSharedImageSurface> mBufferTwo;
>+      nsRefPtr<gfxSharedImageSurface> mBufferThree;

This isn't necessary.  Remove it.

>-  virtual already_AddRefed<gfxSharedImageSurface>
>+  virtual BufferData
>   Swap(SharedImage aImage) = 0;

Param should be |const SharedImage&|.  Return a |SharedImage|.

>diff --git a/gfx/layers/opengl/ImageLayerOGL.cpp b/gfx/layers/opengl/ImageLayerOGL.cpp

Joe needs to review the GL stuff here.

>diff --git a/gfx/layers/opengl/ImageLayerOGL.h b/gfx/layers/opengl/ImageLayerOGL.h

>+  // We only ever use one of these at a time.
>   nsRefPtr<TextureImage> mTexImage;
>+  GLTexture mYUVTexture[3];

Would it make more sense to have three TextureImages?  Joe should
weigh in.

>   // XXX FIXME holding to free
>   nsRefPtr<gfxSharedImageSurface> mDeadweight;
>-
>+  nsRefPtr<gfxSharedImageSurface> mDeadweight2;
>+  nsRefPtr<gfxSharedImageSurface> mDeadweight3;

Oy vey, yuck.  We shouldn't need these anymore because we don't
double-buffer in GL.  I'd be happy if you removed them here but we
could do that in another bug too.

> 
> };
> 
> } /* layers */
> } /* mozilla */
> #endif /* GFX_IMAGELAYEROGL_H */

>diff --git a/gfx/src/nsRect.cpp b/gfx/src/nsRect.cpp

>+  //NS_ASSERTION((IsFloatInteger(aXMult) || IsFloatInteger(1/aXMult)) &&
>+  //             (IsFloatInteger(aYMult) || IsFloatInteger(1/aYMult)),
>+  //             "Multiplication factors must be integers or 1/integer");

???

>diff --git a/gfx/thebes/GLContextProviderEGL.cpp b/gfx/thebes/GLContextProviderEGL.cpp

Whups.
Attachment #526098 - Flags: review?(jones.chris.g)
Attachment #526098 - Flags: review?(joe)
Attachment #526098 - Flags: review-
(In reply to comment #10)
> >-  virtual already_AddRefed<gfxSharedImageSurface>
> >+  virtual BufferData
> >   Swap(SharedImage aImage) = 0;
> 
> Param should be |const SharedImage&|.  Return a |SharedImage|.

Actually this should be

  void Swap(const SharedImage&, SharedImage*);
Fixed review comments. Carrying forward r=cjones
Attachment #525444 - Attachment is obsolete: true
Attachment #527453 - Flags: review+
Fixed review comments and removed mystery hunks from other patches.
Attachment #526098 - Attachment is obsolete: true
Attachment #526098 - Flags: review?(joe)
Attachment #527454 - Flags: review?(jones.chris.g)
Attachment #527454 - Flags: review?(joe)
Comment on attachment 527454 [details] [diff] [review]
Part 2: Add a YUV option to SharedImage and use it to share YUV data across processes v3

I'm not a huge fan of the surface1/2/3 stuff but don't have better
suggestions.  Maybe Joe does.

> protected:
>   nsRefPtr<Image> mImage;
>   gfxIntSize mScaleHint;
>   gfxImageFormat mOffscreenFormat;
>+  PRBool mDelayed;

PRPackedBool here too.

>+      BasicManager()->CreatedImageBuffer(BasicManager()->Hold(this),
>+                                         nsIntSize(mSize.width, mSize.height),
>+                                         SharedImage(yuv));

>+    BasicManager()->PaintedImage(BasicManager()->Hold(this),
>+                                 SharedImage(yuv));

>     BasicManager()->CreatedImageBuffer(BasicManager()->Hold(this),
>                                        nsIntSize(mSize.width, mSize.height),
>-                                       tmpFrontSurface);
>+                                       SharedImage(tmpFrontSurface->GetShmem()));

>   BasicManager()->PaintedImage(BasicManager()->Hold(this),
>-                               mBackSurface);
>+                               SharedImage(mBackSurface->GetShmem()));

>+  *aNewBack = SharedImage(mFrontSurface->GetShmem());

Forgot to mention it on first pass, but at all these callsites you can
use the constructors and operator=()s generated for the variants of
SharedImage and save some code.  For example, both

   BasicManager()->PaintedImage(BasicManager()->Hold(this), yuv);

and

  *aNewBack = mFrontSurface->GetShmem();

will Just Work.

r=me with that.
Attachment #527454 - Flags: review?(jones.chris.g) → review+
Joe: review ping, I'd quite like to land this soon :)
Comment on attachment 527454 [details] [diff] [review]
Part 2: Add a YUV option to SharedImage and use it to share YUV data across processes v3

Review of attachment 527454 [details] [diff] [review]:

I like it. Make sure you address cjones' comment 14 too, though.

::: gfx/layers/ImageLayers.h
@@ +423,5 @@
 
+  /**
+   * Ask this Image to not convert YUV to RGB during SetData, and make
+   * the original data available through GetData. This is optional,
+   * and not all PlanaerYCbCrImages will support it.

Planar (typo)

::: gfx/layers/basic/BasicImages.cpp
@@ +141,5 @@
 };
 
 void
 BasicPlanarYCbCrImage::SetData(const Data& aData)
 {

The best part of this implementation is that I already reviewed it in bug 590735 :)

It'd be nice if we could, at some point, factor this out to be shared among implementations.

@@ +190,5 @@
+    }
+    mData.mYSize = aData.mPicSize;
+    mData.mYStride = mData.mYSize.width;
+    mBufferSize = mData.mCbCrStride * mData.mCbCrSize.height * 2 +
+                mData.mYStride * mData.mYSize.height;

vertically align mData here

@@ +195,5 @@
+    mBuffer = new PRUint8[mBufferSize];
+
+    if (!mBuffer) {
+      return;
+    }

New is infallible; if you want a fallible new, you have to explicitly ask for it.
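
To illustrate the difference (std::nothrow here is just a generic stand-in for Gecko's own fallible allocation, not what the patch should literally use):

  #include <new>       // std::nothrow
  #include <cstdint>
  #include <cstddef>

  uint8_t* AllocateBufferFallibly(std::size_t aSize)
  {
    // A plain `new uint8_t[aSize]` in Gecko aborts on OOM instead of returning
    // null, so checking its result is dead code.  To get a null on failure,
    // the allocation has to be requested as fallible, e.g.:
    uint8_t* buf = new (std::nothrow) uint8_t[aSize];
    return buf;  // may be null; the caller must check
  }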

@@ +321,5 @@
     nsRefPtr<gfxASurface> result = mSurface.get();
     return result.forget();
   }
 
+  // XXX: If we forced delayed conersion, are we ever going to hit this?

conversion (typo)

::: gfx/layers/basic/BasicLayers.cpp
@@ +2010,3 @@
   nsRefPtr<gfxSharedImageSurface> mBackSurface;
+  nsRefPtr<gfxSharedImageSurface> mBackSurface2;
+  nsRefPtr<gfxSharedImageSurface> mBackSurface3;

If you wanted, you could call mBackSurface2 mBackSurfaceCb, and similarly for mBackSurface3.

::: gfx/layers/opengl/ImageLayerOGL.cpp
@@ +1022,5 @@
+
+    yuvProgram->Activate();
+    yuvProgram->SetLayerQuadRect(nsIntRect(0, 0,
+                                        mSize.width,
+                                        mSize.height));

It's not clear from the diff viewer, but make sure mSize aligns directly below the 0 here.
Attachment #527454 - Flags: review?(joe) → review+
Oh - I forgot to ask! Why can't ImageLayer decide on its own whether to convert on draw or on SetData(), rather than leaving it up to the user?
Comment on attachment 527454 [details] [diff] [review]
Part 2: Add a YUV option to SharedImage and use it to share YUV data across processes v3

>+    // Round up the values for width and height to make sure we sample enough data
>+    // for the last pixel - See bug 590735
>+    if (width_shift && (aData.mPicSize.width & 1)) {
>+      mData.mCbCrStride++;
>+      mData.mCbCrSize.width++;
>+    }

For what it's worth, this is wrong when aData.mPicX is odd (and similarly for mPicSize.height and mPicY). The correct behavior should be equivalent to cropping after converting to RGB. This is the reason that the code retains the offsets up to this point instead of just updating the buffer pointers.
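
To make the indexing concrete, a small sketch (hypothetical helper, not code from the patch) of which Cb/Cr sample covers a given luma pixel in 4:2:0 data; the picture offsets have to be applied before the subsampling shift:

  // Hypothetical helper, not the patch's code.
  static size_t CbCrIndexFor(PRInt32 aX, PRInt32 aY,        // pixel inside the picture rect
                             PRInt32 aPicX, PRInt32 aPicY,  // picture offsets into the planes
                             PRInt32 aCbCrStride)
  {
    // The offsets are added *before* the subsampling shift.  When aPicX is odd,
    // the first visible column shares its chroma sample with the cropped column
    // to its left, so simply widening mCbCrSize samples from the wrong edge.
    PRInt32 cx = (aPicX + aX) >> 1;
    PRInt32 cy = (aPicY + aY) >> 1;
    return size_t(cy) * aCbCrStride + cx;
  }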
(In reply to comment #17)
> Oh - I forgot to ask! Why can't ImageLayer decide on its own whether to
> convert on draw or on SetData(), rather than leaving it up to the user?

ImageLayer can have a null manager, and we can't check for accelerated compositing in that case.
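Roughly, the check the Image itself can't make reliably would be something like this sketch (the backend query is an assumption about where such a decision would live, not code from the patch):

  static PRBool CanDelayConversion(LayerManager* aManager)
  {
    // With no manager there is nothing to ask about the compositing backend,
    // so the caller has to decide up front whether to defer conversion.
    return aManager &&
           aManager->GetBackendType() == LayerManager::LAYERS_OPENGL;
  }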
(In reply to comment #18)
> Comment on attachment 527454 [details] [diff] [review] [review]
> Part 2: Add a YUV option to SharedImage and use it to share YUV data across
> processes v3
> 
> >+    // Round up the values for width and height to make sure we sample enough data
> >+    // for the last pixel - See bug 590735
> >+    if (width_shift && (aData.mPicSize.width & 1)) {
> >+      mData.mCbCrStride++;
> >+      mData.mCbCrSize.width++;
> >+    }
> 
> For what it's worth, this is wrong when aData.mPicX is odd (and similarly
> for mPicSize.height and mPicY). The correct behavior should be equivalent to
> cropping after converting to RGB. This is the reason that the code retains
> the offsets up to this point instead of just updating the buffer pointers.

Hmm, I added this code when we were failing with odd-sized videos, and it fixed the reftest failure. Since this bug already exists in the current code, I'll file a follow-up to move this repeated code into one place (as Joe suggested) and fix the problem there. Suggestions on how to fix it would be appreciated :)
Rebased against tip, with the SurfaceDescriptor changes.
Attachment #527453 - Attachment is obsolete: true
Attachment #531513 - Flags: review+
Rebased against tip, and fixed review comments.

Carrying forward r=cjones,joe
Attachment #531514 - Flags: review+
Attachment #527454 - Attachment is obsolete: true
Filed bug 656185 for the odd-sized video bug.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Depends on: 659217
Depends on: 659768
Assignee: nobody → matt.woodrow+bugzilla