Closed Bug 656185 Opened 13 years ago Closed 13 years ago

Consolidate YUV data copying code and handle odd sized images correctly.

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla7

People

(Reporter: mattwoodrow, Unassigned)

References

Details

(Whiteboard: [inbound])

Attachments

(5 files, 7 obsolete files)

We now have this code in two places, BasicPlanarYCbCrImage and PlanarYCbCrImageOGL, we should share this code in a central location.

We also are handling odd sized videos incorrectly:

> For what it's worth, this is wrong when aData.mPicX is odd (and similarly
> for mPicSize.height and mPicY). The correct behavior should be equivalent to
> cropping after converting to RGB. This is the reason that the code retains
> the offsets up to this point instead of just updating the buffer pointers.
It's not necessarily odd sizes that are incorrect, but odd crop offsets. This can be incorrect even with an even size, as follows:

A 6x2 4:2:0 image (Y=luma, C=both chroma channels)
+---+---+---+---+---+---+
| Y | Y | Y | Y | Y | Y |
+---C---+---C---+---C---+
| Y | Y | Y | Y | Y | Y |
+---+---+---+---+---+---+
Each chroma value is shared with its four neighboring luma pixels.

If you crop 1 pixel off the left and right, you still need all 3 chroma values
    +---+---+---+---+
    | Y | Y | Y | Y |
    C---+---C---+---C
    | Y | Y | Y | Y |
    +---+---+---+---+
even though the width is now 4.

What we're doing now would shift the chroma one (luma) pixel to the right, i.e., we would covert as if it were
    +---+---+---+---+
    | Y | Y | Y | Y |
    +---C---+---C---+
    | Y | Y | Y | Y |
    +---+---+---+---+

For the basic case, we need to keep an offset of at least 1 when the original offset was odd (if we're sub-sampling chroma along that axis). For OpenGL, we need to do pretty much the same thing, and munge the texture coordinates a little to make things line up properly.
Attached patch WIP (obsolete) — Splinter Review
How does this look as a proof of concept, at least as far as correctness goes?

Obviously needs a lot of cleanup, sharing the code etc.

The four cases I see are:

Odd crop offset, even width: Drop half a pixel from each edge of the Cb and Cr planes
Odd crop offset, odd width: Drop half a pixel from the left edge of the Cb and Cr planes.
Even crop offset, even width: Planes should line up correctly.
Even crop offset, odd width: Drop half a pixel from the right edge of the Cb and Cr planes.

Does this sound correct? The patch should be implementing this.

I also feel we need a test for this. How would I go about getting a suitable video (A single frame with different colours along a long we are going to clip) made for this?
Attachment #531834 - Flags: feedback?(tterribe)
Attached patch WIP v2 (obsolete) — Splinter Review
Fixed texture coords bug.
Attachment #531834 - Attachment is obsolete: true
Attachment #531834 - Flags: feedback?(tterribe)
Attachment #531836 - Flags: feedback?(tterribe)
Comment on attachment 531836 [details] [diff] [review]
WIP v2

Review of attachment 531836 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think your texture coordinate corrections are correct, and there are a few other issues. See comments inline.

::: gfx/layers/opengl/ImageLayerOGL.cpp
@@ +397,5 @@
> +  GLuint texCoordAttribIndex =
> +    aProg->AttribLocation(LayerProgram::TexCoordAttrib);
> +  GLuint cbcrtexCoordAttribIndex =
> +    aProg->AttribLocation(LayerProgram::CbCrTexCoordAttrib);
> +  NS_ASSERTION(texCoordAttribIndex != GLuint(-1), "no texture coords?");

Should you have an assert for cbcrtexCoordAttribIndex also?

@@ +418,5 @@
> +                0.0f, 0.0f, 1.0f, 1.0f,
> +                /* tex coords */
> +                0.0f, 0.0f, 1.0f, 1.0f);
> +
> +  cbcrrects.addRect(/* dest rectangle */

I don't think these texture coordinate corrections are correct. Let's assume you fix the two scaling issues I mention below, so that the cbcr texture coordinates range from 0 to 1, which maps to texels from 0 to mCbCrSize.width. Let's also assume that you add checks for width_shift and height_shift when you set aSkipLeft, etc., so these are only set when the chroma plane is actually subsampled along the corresponding axis. For simplicitly, lets abbreviate mPicSize.width as pW and mCbCrSize.width as cW. Then you should have the following invariant:

(x1-x0)*cW*2 == pW

That is, the step size in chroma texels should be exactly twice the step size in luma texels.

Let's consider a few cases:

a) aSkipLeft && !aSkipRight
In this case pW is odd, so cW == ((pW>>1)+1) == 0.5*pW+0.5
This gives

(1.0 - 0.5/pW)*(0.5*pW+0.5)*2 == (1.0 - 0.5/pW)*(pW+1) == (pW+1 - (0.5/pW)*(pW+1)) != pW

What we want is for the (0.5/pW)*(pW+1) term to be 1. This suggests the _actual_ offset should not be 0.5/pW, but 1.0/(pW+1).

b) !aSkipLeft && aSkipRight
This is basically the same as a).

c) aSkipLeft && aSkipRight
In this case pW is even, so cW == ((pW>>1)+1) == 0.5*pW+1.0
This gives

((pW - 0.5)/pW - 0.5/pW)*(0.5*pW+1.0)*2 == (1.0 - 2*(0.5/pW))*(pW+2) == (pW+2 - 2*(0.5/pW)*(pW+2)) != pW
Again, we want the (0.5/pW)*(pW+2) term to be 1.0, suggesting the _actual_ offset should be 1.0/(pW+2).

Note that in both cases, the divisor is cW*2.0, so the correct offset should simply be
0.5f/mCbCrSize.width, not 0.5f/mPicSize.width.

The cases for height are exactly analogous.

@@ +421,5 @@
> +
> +  cbcrrects.addRect(/* dest rectangle */
> +                    0.0f, 0.0f, 1.0f, 1.0f,
> +                    /* tex coords */
> +                    aSkipLeft ? 0.5f : 0.0f / GLfloat(aTexSize.width),

/ has higher precedence than ?:, so it is only applied to 0.0f, not the 0.5f. I think you want (aSkipLeft ? 0.5f / GLfloat(aTexSize.width) : 0.0f), no? (you could just add parentheses, but you really only need the divide in the aSkipLeft case).

@@ +423,5 @@
> +                    0.0f, 0.0f, 1.0f, 1.0f,
> +                    /* tex coords */
> +                    aSkipLeft ? 0.5f : 0.0f / GLfloat(aTexSize.width),
> +                    aSkipTop ? 0.5f : 0.0f / GLfloat(aTexSize.height),
> +                    GLfloat(aTexSize.width) - (aSkipRight ? 0.5f : 0.0f),

Shouldn't this also be divided by aTexSize.width to scale between 0 and 1?

@@ +501,5 @@
>      program->SetLayerOpacity(GetEffectiveOpacity());
>      program->SetRenderOffset(aOffset);
>      program->SetYCbCrTextureUnits(0, 1, 2);
>  
>      mOGLManager->BindAndDrawQuad(program);

Shouldn't this line be removed?

@@ +762,5 @@
> +  mSkipRight = PR_FALSE;
> +  mSkipTop = PR_FALSE;
> +  mSkipBottom = PR_FALSE;
> +
> +  if (aData.mPicX & 1) {

Nothing in this section checks to see if we are actually subsampling chroma on either axis (i.e., the if (width_shift && ...) and if (height_shift && ...) part of the checks below). Since BindAndDrawYCbCrQuad also doesn't check, this means it will add a chroma offset even when it shouldn't.

@@ +764,5 @@
> +  mSkipBottom = PR_FALSE;
> +
> +  if (aData.mPicX & 1) {
> +    mSkipLeft = PR_TRUE;
> +    if (!aData.mPicSize.width & 1) {

! has higher precedence than &, and width should be non-zero, so this is effectively if (true) {

@@ +773,5 @@
> +  }
> +  
> +  if (aData.mPicY & 1) {
> +    mSkipTop = PR_TRUE;
> +    if (!aData.mPicSize.height & 1) {

Same here.
Attachment #531836 - Flags: feedback?(tterribe) → feedback-
(In reply to comment #2)
> I also feel we need a test for this. How would I go about getting a suitable
> video (A single frame with different colours along a long we are going to
> clip) made for this?

If you give me a PNG, I'd be happy to make an Ogg file with offsets for the various cases out of it.
(In reply to comment #4)
> Note that in both cases, the divisor is cW*2.0, so the correct offset should
> simply be
> 0.5f/mCbCrSize.width, not 0.5f/mPicSize.width.

This is indeed the calculation I meant to perform, and what my sketches show as the correct answer. Thanks for the full breakdown, and for catching this.

I think the rest of the comments fall into approximately the same category, which "I should have tested this before uploading it"

Thanks Tim! Will fix those and work on a test.
Attached image Reftest image
I think this should work as a test image if we use the clip rect {21, 21, 60, 60}.

All the colour bands should be multiples of 2 in width and height (so the subsampled chroma channels stay exactly what I'd expect) and this clip rect should need to sample from both sides of the literal rect.
I'm not sure what you mean (the blue lines are 1 pixel wide, which last I checked was not an integer multiple of 2), however http://people.xiph.org/~tterribe/theora/color-crop-c60x60+21+21.ogg is what I think you asked for.
Interesting, MS Paint must have lied to me. The blue strips were meant to be 2 pixels wide.

I don't think it matters, I was taking incorrect guesses as how the conversion to YUV would work.

Does anyone have ideas as to how I could create a reference image for this?

I've tried using the reftest output of the correctly rendered video (which is a png data url) and clipping to just be the 60x60 image and using that but it has slight differences with every pixel.

I also tried creating a second identical video except shifted one pixel to the left so the picture rect lies on an even offset. This is also slightly different on the edges.
Attachment #531836 - Attachment is obsolete: true
Attachment #532815 - Flags: review?(tterribe)
Attached patch D3D9 fix (obsolete) — Splinter Review
This still needs the shaders to be generated and tested.
Attachment #532817 - Flags: review?(bas.schouten)
Attached patch D3D10 fix (obsolete) — Splinter Review
Same as the d3d9 patch.
Attachment #532818 - Flags: review?(bas.schouten)
We still have the problem that the accelerated layers GetCurrentAsSurface implementation is calling ConvertYCbCrToRGB with only the picture rect data and isn't being offset correctly.

I'm not overly keen to make the right changes throughout the entirety of the ycbcr/ folder, so other alternatives would be nice.

We could just make a copy of the entire frame, which should work, but could possibly be wasteful.

We could also just pad the luma channel out with an extra pixel at the start end and use a picture rect offset of 1 in this case. I'll have a look to see if I can make this work without being too ugly.
Attached patch Fix GetAsSurface for OpenGL (obsolete) — Splinter Review
This scares me, it's a horrible hack.

I think it's correct as we are only lying about luma pixels outside of the picture rect, and these should never be read.

The issue of (xoffset + width) > stride could be a problem but it doesn't appear to hit any assertions.

It does fix the problem (tested by drawing the video to a canvas to force this code path).

Every other solution to the problem requires massive rewrites of our YUV layers code. All the current code (in all 3 accelerated backends) assumes they can only copy data within the picture rect and dump the rest of the frame. Changing this for a very rare edge case seems unnecessary.

I think we can get away with doing this if we document it properly, but I'd accept someone putting their foot down here.
Attachment #532848 - Flags: feedback?(tterribe)
I was talking with Tim about this on IRC.

Our optimal design is to decode each frame directly into a shared-memory buffer, and in the compositor process upload the buffer as a texture, and then on the GPU perform YUV conversion as we composite the texture to the destination.

Given that design, there's no point in having the content process be responsible for dealing with the picture rect, since it won't be touching the data after decoding. Instead the content process should pass the decoded YUV data and the picture rect up to the compositor process and let it deal with the picture rect.

The compositor process can handle the picture rect as follows, I think:
-- For non-GPU compositing, just use the ycbcr code as now.
-- For GPU compositing, upload the entire YUV buffer to a texture. Then draw a quad corresponding to the picture rect, with YUV conversion. Tim tells me that the ARM YUV+scaling code already takes partial samples of pixels outside the picture rect, and this is OK.
In particular, the current gfxSharedImageSurface is not adequate for this task, as it does not allow you specify the stride. The patch in attachment 532815 [details] [diff] [review] blatantly ignores the declared stride of the buffer and copies the data as a single block in BasicShadowableImageLayer::Paint(), with a potentially different stride. It should be doing a row-by-row copy, but really we should not be doing a separate copy at all (even when we're not decoding into shmem directly).

Keep in mind also that decoding directly into shmem will expect a single block of memory for all 3 planes.
Attachment #532848 - Attachment is obsolete: true
Attachment #532848 - Flags: feedback?(tterribe)
Should be fairly straightforward, just need this outside of ThebesLayer now.
Attachment #533518 - Flags: review?
Attachment #533518 - Flags: review? → review?(joe)
Rewritten from scratch.

We now copy the entire frame data and upload this to the textures. Textures coordinates are used to only draw the picture rect.

Looks much better!
Attachment #532815 - Attachment is obsolete: true
Attachment #532815 - Flags: review?(tterribe)
Attachment #533519 - Flags: review?(tterribe)
Attached patch D3D10 fix v2Splinter Review
Attachment #532818 - Attachment is obsolete: true
Attachment #532818 - Flags: review?(bas.schouten)
Attachment #533520 - Flags: review?(bas.schouten)
Attached patch D3D9 fix v2Splinter Review
This patch queue passes on tryserver and visibly fixes the bug on Mac.

I still haven't managed to get a reftest image to pass successfully, very interested in ideas for this.
Attachment #532817 - Attachment is obsolete: true
Attachment #532817 - Flags: review?(bas.schouten)
Attachment #533521 - Flags: review?(bas.schouten)
Comment on attachment 533519 [details] [diff] [review]
Handle odd crop offsets correctly v2

Review of attachment 533519 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with changes. This looks like a much cleaner approach.

::: gfx/layers/ImageLayers.h
@@ +450,5 @@
> +   */
> +  PRUint8 *CopyData(Data& aDest, gfxIntSize& aDestSize,
> +                    PRUint32& aDestBufferSize, const Data& aData);
> +
> +  virtual PRUint8* AllocateBuffer(PRUint32 aSize) { return new PRUint8[aSize]; };

Should you use a fallible new here? This buffer could be several MB in size, and you already have the NULL check.

::: gfx/layers/Layers.cpp
@@ +458,5 @@
> +PRUint8*
> +PlanarYCbCrImage::CopyData(Data& aDest, gfxIntSize& aDestSize,
> +                           PRUint32& aDestBufferSize, const Data& aData)
> +{
> +  aDest = aData;

This will set the strides to those in aData, which are usually larger than the actual width of the rows (because the images are padded for motion compensation purposes, at least for VP8 and Theora). I'd recommend explicitly setting aDest.mYStride and aDest.mCbCrStride to the corresponding widths (they'll always be multiples of 16, so the alignment should be fine). If you're doing the row-by-row copy anyway, you might as well save the 32 or 64 bytes per row.

Also, that way I don't have to think about whether the memcpy()s below can read past the end of the buffer on the last row when aDest.mYStride > aDest.mYSize.width (they can't, but only because there are also extra rows in the original buffer).

::: gfx/layers/basic/BasicImages.cpp
@@ +148,5 @@
>      NS_ERROR("Illegal width or height");
>      return;
>    }
>    
> +  gfx::YUVType type = 

type is no longer needed by CopyData, so you can move this afterwards to avoid computing it when mDelayedConversion is true.

::: gfx/layers/basic/BasicLayers.cpp
@@ +2065,5 @@
>        
> +    for (int i = 0; i < data->mYSize.height; i++) {
> +      memcpy(mBackBufferY->Data() + i * mBackBufferY->Stride(),
> +             data->mYChannel + i * data->mYStride,
> +             data->mYStride);

mBackBufferY is created with a width of mSize.width, not mYStride. The latter will almost certainly be larger, meaning this could overrun the buffer on the last row. Same for the two loops below. If you update the strides like I suggest above, I don't think this is a problem, but there's no reason for this code to rely on mSize.width==mYStride.

I'd still like this extra copy to go away entirely, but I guess that is the subject of another bug.

::: gfx/ycbcr/yuv_convert.cpp
@@ +30,5 @@
>  const int kFractionBits = 16;
>  const int kFractionMax = 1 << kFractionBits;
>  const int kFractionMask = ((1 << kFractionBits) - 1);
>  
> +NS_GFX_(YUVType) TypeFromSize(int ywidth, 

This hunk should be added to gfx/ycbcr/convert.patch so it will be applied by update.sh.

::: gfx/ycbcr/yuv_convert.h
@@ +40,5 @@
>    FILTER_BILINEAR_V = 2,  // Bilinear vertical filter.
>    FILTER_BILINEAR = 3     // Bilinear filter.
>  };
>  
> +NS_GFX_(YUVType) TypeFromSize(int ywidth, int yheight, int cbcrwidth, int cbcrheight);

This hunk should be added to gfx/ycbcr/convert.patch so it will be applied by update.sh.
Attachment #533519 - Flags: review?(tterribe) → review+
Fixed review comments.

Carrying forward r=derf
Attachment #533519 - Attachment is obsolete: true
Attachment #534343 - Flags: review+
Comment on attachment 534343 [details] [diff] [review]
Handle odd crop offsets correctly v3

>       YUVImage yuv(tmpYSurface->GetShmem(),
>                    tmpUSurface->GetShmem(),
>-                   tmpVSurface->GetShmem());
>+                   tmpVSurface->GetShmem(),
>+                   nsIntRect());

I meant to ask before, but apparently forgot, should the last parameter here use the picture rect from data, rather than an empty rectangle? I realize the OpCreateImageBuffer implementation doesn't currently use it, but it seems cleaner to pass valid data rather than not.
Comment on attachment 533518 [details] [diff] [review]
Move BindAndDrawQuadWithTextureRect into LayerManagerOGL

Review of attachment 533518 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #533518 - Flags: review?(joe) → review+
Attachment #533520 - Flags: review?(bas.schouten) → review+
Comment on attachment 533521 [details] [diff] [review]
D3D9 fix v2

Review of attachment 533521 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/d3d9/ImageLayerD3D9.cpp
@@ -386,5 @@
>  void
>  PlanarYCbCrImageD3D9::SetData(const PlanarYCbCrImage::Data &aData)
>  {
> -  // XXX - For D3D9Ex we really should just copy to systemmem surfaces here.
> -  // For now, we copy the data

Is this comment no longer true?
Attachment #533521 - Flags: review?(bas.schouten) → review+
> I meant to ask before, but apparently forgot, should the last parameter here
> use the picture rect from data, rather than an empty rectangle? I realize
> the OpCreateImageBuffer implementation doesn't currently use it, but it
> seems cleaner to pass valid data rather than not.

Fair enough, fixed.(In reply to comment #25)

> >  void
> >  PlanarYCbCrImageD3D9::SetData(const PlanarYCbCrImage::Data &aData)
> >  {
> > -  // XXX - For D3D9Ex we really should just copy to systemmem surfaces here.
> > -  // For now, we copy the data
> 
> Is this comment no longer true?

The comment still applies, the patches just change the bounds of what gets uploaded to a texture.

Landed as:

http://hg.mozilla.org/integration/mozilla-inbound/rev/f18032387a59
http://hg.mozilla.org/integration/mozilla-inbound/rev/f67ea1c7da36
http://hg.mozilla.org/integration/mozilla-inbound/rev/5841deff3454
http://hg.mozilla.org/integration/mozilla-inbound/rev/3657ab7434b7
Whiteboard: [inbound]
And backed out again because msvc doesn't like my fallible new syntax for some reason.

http://hg.mozilla.org/integration/mozilla-inbound/rev/05058eacd33a
Whiteboard: [inbound]
Depends on: 670573
Depends on: 671756
Hi, sorry I'm not familiar with your source/bugs reporting system.
But I notice this code is based on code I wrote originally for chromium.

I've since rewritten it for webrtc, but in a separate package libyuv, if you just want the conversion and/or scaling functions.  Its SSSE3 and Neon optimized now and roughly 2x faster than the original MMX table method.

If you consider using libyuv, please post an issue
http://code.google.com/p/libyuv/issues/list
with your requirements/changes.  I'd also welcome your direct contribution.
Thanks Frank.  I looked at moving to libyuv a long time ago, but never had a chance to do much about it.  I thought we had a bug on file, but I can't find it now, so I filed bug 791941.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: