To avoid an extra copy, the buffers we decode into should be allocated in a platform-specific way. For example, with D3D10 I expect we should be decoding directly into a D3D10_USAGE_STAGING resource; the video frames can then be asynchronously copied to the GPU.
We're limited with the existing Ogg backend, because it uses liboggplay which manages its own buffers. With the new Ogg backend, we'll have a lot more freedom (it's just waiting on review). We need to change things to use bc-cat on N900 as well.

Currently the setup is:
1) The Ogg backend maintains its own queue of YUV buffers (in liboggplay). Every time libtheora decodes a frame, we copy from libtheora's frame buffer into a liboggplay buffer.
2) When a buffer becomes the current frame, we create an Image object for that buffer. BasicLayers converts YUV to RGB at this time (off the main thread). I guess D3D/OGL do texture upload at this time.

Here's how I think the final setup will work:
1) The Ogg backend maintains a queue of Image objects. Before we decode a frame, we allocate an Image object.
-- When using libtheora, we ask it to decode a frame and it gives us back a pointer to an internal buffer. We can't just tell it to decode into our own buffer, I presume because it needs its own copy to correctly decode inter-frames. Basically, if we are overlapping decoding with texture upload, we need to make a copy somewhere in main memory.
-- So after obtaining the pointer to libtheora's buffer, we'll need to pass that pointer to the Image object and have it do something quickly and return.
-- For BasicLayers, we'll copy the YUV data into a main memory buffer and return.
-- For D3D/OGL, we can copy the YUV data into a buffer and start an asynchronous texture upload, or we can do a synchronous texture upload and avoid the copy. Given this is all on a dedicated decoding thread, which is currently even per-video, I suspect we probably should do the synchronous upload. The only case where that is suboptimal is if we're playing a giant video so that Theora decoding plus non-overlapped synchronous texture upload delay can't keep up. In that case we *might* be better off copying the YUV data in main memory and doing asynchronous texture upload.
-- The N900 story is different because we use the DSP for decoding. There, creating the Image will allocate a bc-cat texture streaming buffer and we'll pass that address to the DSP decoder. The DSP decoder will make a copy from its internal buffers into the bc-cat buffer, swizzling into bc-cat's packed-YUV format at the same time. The CPU never touches the decoded data.
2) When a buffer becomes the current frame, we make the Image the current image for the ImageContainer.
-- For BasicLayers, we perform YUV to RGB conversion at this time.
-- For D3D/OGL (including N900), I guess we don't really need to do anything except update the internal pointers so that the Image's texture data gets used for the next draw.
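
To make the BasicLayers path above concrete, here is a rough sketch of "pass the decoder's pointer to the Image, copy quickly, return, then queue the Image". It is illustrative only: PlaneView, PlanarYuvImage and ImageQueue are hypothetical names invented for this sketch, not the actual layers or libtheora API, and it assumes the decoder exposes each plane as a pointer plus width, height and stride.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical view of one plane inside the decoder's internal frame buffer.
struct PlaneView {
  const std::uint8_t* data;  // first byte of the plane, owned by the decoder
  int width;                 // visible bytes per row
  int height;                // number of rows
  int stride;                // bytes between row starts (may exceed width)
};

// Hypothetical stand-in for a BasicLayers-style Image: it owns a tightly
// packed main-memory copy of the Y, Cb and Cr planes.
class PlanarYuvImage {
 public:
  // Copies the planes out of the decoder's buffer and returns, so the
  // decoder is free to reuse its buffer for the next frame.
  void SetData(const PlaneView& y, const PlaneView& cb, const PlaneView& cr) {
    CopyPlane(y, &y_);
    CopyPlane(cb, &cb_);
    CopyPlane(cr, &cr_);
  }

  const std::vector<std::uint8_t>& Y() const { return y_; }
  const std::vector<std::uint8_t>& Cb() const { return cb_; }
  const std::vector<std::uint8_t>& Cr() const { return cr_; }

 private:
  // Row-by-row copy that drops any stride padding from the source.
  static void CopyPlane(const PlaneView& src, std::vector<std::uint8_t>* dst) {
    assert(src.data && src.width > 0 && src.height > 0);
    assert(src.stride >= src.width);
    const std::size_t rowBytes = static_cast<std::size_t>(src.width);
    dst->resize(rowBytes * static_cast<std::size_t>(src.height));
    for (int row = 0; row < src.height; ++row) {
      std::memcpy(dst->data() + static_cast<std::size_t>(row) * rowBytes,
                  src.data + static_cast<std::size_t>(row) * src.stride,
                  rowBytes);
    }
  }

  std::vector<std::uint8_t> y_, cb_, cr_;
};

// Hypothetical queue of decoded Images kept by the backend. The front entry
// is the next one to become the current frame.
class ImageQueue {
 public:
  void Push(std::unique_ptr<PlanarYuvImage> image) {
    queue_.push_back(std::move(image));
  }

  // Removes and returns the oldest Image, or nullptr if the queue is empty.
  std::unique_ptr<PlanarYuvImage> PopCurrent() {
    if (queue_.empty()) {
      return nullptr;
    }
    std::unique_ptr<PlanarYuvImage> image = std::move(queue_.front());
    queue_.pop_front();
    return image;
  }

  std::size_t Size() const { return queue_.size(); }

 private:
  std::deque<std::unique_ptr<PlanarYuvImage>> queue_;
};
```

In this sketch the decode thread would call SetData() right after each frame is decoded and then Push() the Image; whatever presents frames would call PopCurrent() when the frame's time arrives. A D3D/OGL Image could keep the same SetData() shape but do a synchronous texture upload there instead of the memcpy, which is the trade-off discussed above.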
Is this effectively implemented? Can we close this bug?
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
It's not completely clear to me if this is fixed or not. Can someone please clarify?
This is fixed.