Bug 557982 (opened 10 years ago, closed 10 years ago)

Eliminate unnecessary copy of video data

Component: Core :: Audio/Video (defect)

Status: RESOLVED FIXED

People

(Reporter: roc, Assigned: rich)


Attachments

(2 files, 1 obsolete file)

Currently nsOggDecoder copies video frames from the libtheora buffer into its own queue. When a frame is ready to be presented, we create a PlanarYCbCrImage object for the frame and make that the current image of the ImageContainer. Currently, the semantics of PlanarYCbCrImage::SetData require the image object to make a copy of the passed-in data. In the BasicLayers backend this is not a performance issue since we do YUV conversion at that point anyway, so the copy is free. However, in the GL layers backend, we have to make a copy of the YUV data. I see two ways to eliminate that extra copy:

1) Have the GL layers backend synchronously upload the data to a texture in the call to PlanarYCbCrImage::SetData. I don't know if that can be done within GL's threading constraints.

2) Have the decoder create an Image object for each frame directly from the libtheora buffers; the decoder's frame queue becomes a queue of Image objects. The BasicLayers backend implementation of PlanarYCbCrImage would not YUV-convert in SetData anymore, instead it would YUV-convert when the image becomes the current image of an ImageContainer. The GL backend implementation would remain unchanged (or could do a synchronous texture upload, for possibly even greater wins).

Approach #2 seems like the right one, primarily because that structure is also needed to accommodate the bc-cat + Leonora DSP decoder combination on the N900. In that implementation, we will need to allocate a PackedYCbCrImage for each frame before decoding it; the PackedYCbCrImage will use bc-cat to allocate a buffer so that the DSP decoder can decode directly into the buffer.
Note that approach #2 would be performance-neutral for BasicLayers; the copy we currently do in nsOggDecoder would effectively move into BasicPlanarYCbCrImage::SetData.
While we're doing this, we should get rid of the copying here:
http://mxr.mozilla.org/mozilla-central/source/content/media/nsBuiltinDecoderStateMachine.cpp#1036
which would be a win for all backends.
Attached patch Patch attempt 1 (obsolete) — Splinter Review
I've changed the queued VideoData (frame) objects so that they now hold an Image object. The Image is created directly from the theora buffers, eliminating the unnecessary copying that you identified. Copying is now done within the Image's SetData() method.

For the moment I haven't changed any of the SetData() methods. BasicPlanarYCbCrImage's implementation still converts to RGB straight away. A future patch will delay the RGB conversion so that it doesn't occur until the Image is added to a container. Note that this will mean making an extra copy.

There are a couple of things about my implementation that I'm not too sure about, and would appreciate feedback on.

1. The VideoData::Create() method now needs access to the state machine's nsVideoInfo object. This is because the nsVideoInfo object contains information needed to create the Image object. To make the nsVideoInfo available I'm threading it as a parameter through many of the nsOggReader and nsBuiltinDecoderReader methods. But maybe passing it as a parameter through a dozen methods is heavier than necessary. I could alternatively access the state machine directly, but I would then need to cast the state machine into an nsBuiltinDecoderStateMachine. I avoided doing this because I thought that it might pollute the decoder's nicely designed builtin vs ogg abstractions.

Alternatively I could try and avoid using the nsVideoInfo object entirely and just use information in the YCbCrBuffer. However doing that would change the behaviour of the code, something I don't really want to do!

2. I've tried to preserve all the error checking that occurred in the original code. This included a mix of conditions and outcomes: some errors are ignored, some cause failures, and some result in a partial success (e.g. success for some channels but not for others). I've tried my best to preserve the same behaviour.

Now most of these error conditions occur when the Image is created (in VideoData::Create()) rather than when it is rendered (in nsBuiltinDecoderStateMachine::RenderVideoFrame()). I've changed the Create() method to return an nsresult so that the calling code can handle errors if it wishes. The code all seems to work, and the tests pass, but it would be great if you could look over the code in this area, especially at how errors are handled as they fall out of VideoData::Create(), through nsOggReader::DecodeTheora() and into nsOggReader::DecodeVideoFrame(). Extra eyes here would definitely be appreciated.
Basically looks good. We don't need to distinguish errors, so we should keep on returning null. mImage doesn't need to be initialized to null.
Is it worth storing the nsVideoInfo object in the nsBuiltinDecoderReader instead of passing it around to all the functions? Maybe move it from the nsBuiltinDecoderStateMachine to the reader and have the state machine object ask the reader for it when it needs it?
Attached patch Patch attempt 2 — Splinter Review
Updated based on Rob and Chris's feedback.
Attachment #445045 - Attachment is obsolete: true
I actually think we can take this patch as-is without doing this part:

> The BasicLayers backend implementation of PlanarYCbCrImge would not YUV-convert
> in SetData anymore, instead it would YUV-convert when the image becomes the
> current image of an ImageContainer.

Delaying that YUV-conversion has two benefits and one drawback:
1) If we skip frames, we save the cost of YUV-conversion for those frames.
2) Our frame queue uses half the memory (for 4:2:0 videos).
3) But it introduces an extra copy into the pipeline.

If we're able to keep up with playback (not skipping any frames) then 1 is not an issue. Also, 1 matters less now that we have considerably faster YUV conversion code, and less again now that we can skip to the next keyframe, which avoids decoding the frames along the way and thus avoids YUV-converting them.

2 is probably not a big issue for almost all users.

Matthew suggests (and I agree) that we should focus on optimizing the case where we are able to keep up, because when we can't keep up the experience is inevitably not very good.
+  // We check for invalid frame sizes before setting data's attributes.
+  // If we skip a channel then it will have 0 dimensions, thanks to our constructor.
+  PRUint32 size = 0;
+  if (MulOverflow32(PR_ABS(aInfo.mFrame.width),
+                    PR_ABS(aInfo.mFrame.height),
+                    size))
+  {
+    data.mYChannel = aBuffer.mPlanes[0].mData;
+    data.mYSize = gfxIntSize(aInfo.mFrame.width, aInfo.mFrame.height);
+    data.mYStride = aBuffer.mPlanes[0].mStride;
+  }
+  if (MulOverflow32(PR_ABS(aBuffer.mPlanes[1].mWidth),
+                    PR_ABS(aBuffer.mPlanes[1].mHeight),
+                    size))

We should really just bail out if either of these fails. Proceeding with 0-size channels is just confusing.

+  // This frame's image, in PLANAR_YCBCR format.
+  nsRefPtr<Image> mImage;

The comment shouldn't mention PLANAR_YCBCR --- this image could be in any format, and we might want to use that in the future.

The rest looks fine. I will update the patch to make it ready for landing.
Attached patch patch v3 — Splinter Review
Updated patch. I revised the validation logic at the beginning of VideoData::Create, so you should check that.
Attachment #445642 - Flags: review?(chris.double)
Attachment #445642 - Flags: review?(chris.double) → review?(kinetik)
Assignee: chris.double → rich
Comment on attachment 445642 [details] [diff] [review]
patch v3

+                             ImageContainer *aContainer,
+                             PRInt64 aOffset,
                              PRInt64 aTime,
                              const YCbCrBuffer &aBuffer,

* and & are on the wrong side of the whitespace.

+  NS_ASSERTION(v->mImage->GetFormat() == Image::PLANAR_YCBCR,
+               "Wrong format?");

Seems like ImageContainer::CreateImage should be enforcing this internally. 

+  // Gets presentation info required for playback.
+  nsVideoInfo& GetInfo() {
+    return mInfo;
+  }

I'd prefer this to return a const nsVideoInfo &.  There's only one place that needs to modify it:

+  if (!info.mHasVideo) {
+    info.mCallbackPeriod = 1000 / AUDIO_FRAME_RATE;
   }

Which can be pushed down into the readers where the rest of the structure is initialized.  It's better to initialize it completely in a single place if possible.
Attachment #445642 - Flags: review?(kinetik) → review+
(In reply to comment #10)
> +  NS_ASSERTION(v->mImage->GetFormat() == Image::PLANAR_YCBCR,
> +               "Wrong format?");
> 
> Seems like ImageContainer::CreateImage should be enforcing this internally.

Absolutely, this is just a sanity check.

> +  // Gets presentation info required for playback.
> +  nsVideoInfo& GetInfo() {
> +    return mInfo;
> +  }
> 
> I'd prefer this to return a const nsVideoInfo &.  There's only one place that
> needs to modify it:
> 
> +  if (!info.mHasVideo) {
> +    info.mCallbackPeriod = 1000 / AUDIO_FRAME_RATE;
>    }
> 
> Which can be pushed down into the readers where the rest of the structure is
> initialized.  It's better to initialize it completely in a single place if
> possible.

OK, that makes good sense.
Whiteboard: [needs landing]
http://hg.mozilla.org/mozilla-central/rev/e530c2b50c0a
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing]
Duplicate of this bug: 774347