Closed Bug 847267 Opened 7 years ago Closed 7 years ago

Enable DXVA2 on Windows Vista+ to accelerate H.264 video decoding

Categories

(Core :: Graphics: Layers, defect)

x86_64
Windows 8
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla23
Tracking Status
relnote-firefox --- 23+

People

(Reporter: cpearce, Assigned: cpearce)

References

(Depends on 1 open bug)

Details

Attachments

(4 files, 7 obsolete files)

2.49 KB, patch
jgilbert
: review+
Details | Diff | Splinter Review
23.36 KB, patch
bas.schouten
: review+
Details | Diff | Splinter Review
37.82 KB, patch
cpearce
: review+
Details | Diff | Splinter Review
770 bytes, patch
Callek
: review+
philip.chee
: feedback+
Details | Diff | Splinter Review
We need to enable DXVA2 in our Windows Media Foundation video decoder. Slow video painting performance is killing us on Windows 8 tablets, particularly the low end ones, and even the Surface Pro has glitches if it's running with a "Power Saver" power plan.

Our Windows Media Foundation decoder (WMFReader [1]) uses Windows Media Foundation's IMFSourceReader [2] to decode video/audio samples. In order to have the SourceReader use DXVA2 [3] we need to wrap a IDirect3D9 object in a DXVA2CreateDirect3DDeviceManager9 and pass that into the SourceReader as it's constructed.

Other objects that use the IDirect3D9Device are supposed to lock it using the DXVA2CreateDirect3DDeviceManager9 while in use.

Once we've set the SourceReader to use DXVA2, the decoded video frames come out inside IDirect3D9Surface's, instead of in CPU memory buffers, which should be faster to paint.

On Windows 8 we could use IMFDXGIDeviceManager instead and that works with D3D11, but that wouldn't work on Windows 7, and we want to support Windows 7 if possible.

So, we need to figure out how to get an IDirect3D9Device interface pointer to the WMFReader, and how to get IDirect3D9Surface's down the layers/gfx pipeline and onto the screen.

[1] http://mxr.mozilla.org/mozilla-central/source/content/media/wmf/WMFReader.h
[2] http://msdn.microsoft.com/en-us/library/windows/desktop/dd940436(v=vs.85).aspx
[3] http://msdn.microsoft.com/en-us/library/windows/desktop/dd389281(v=vs.85).aspx#hardware_acceleration
This patch creates a dummy windows and uses that to initialize D3D9 and DXVA2 hardware accelerated video decoding in the SourceReader.

This speeds up decoding considerably, but we're still using the same software rendering path, so we're doing a readback from the D3D9 surface in GPU memory into CPU memory and then rendering the frame using our existing code. That process cripples our rendering performance, so we still need to alter our rendering pipeline to handle video frames stored in IDirect3D9Surfaces.

I'm only initializing D3D9 here, we can do something similar for D3D11, but only on Windows 8.

If I alter our decoder to not do the readback, and just allocate and paint equivalently sized blank frames, even on low end hardware our decode easily keeps up with playback, there are no audio stutters, and we use a fraction of the CPU. :D

Bas: How does this approach to initializing DXVA2 look? Any hints as to how we can take a IDirect3D9Surface and get that into compositing/layers and onscreen?
Attachment #724285 - Flags: feedback?(bas)
(In reply to Chris Pearce (:cpearce) from comment #1)
> Created attachment 724285 [details] [diff] [review]
> WIP: Enable DXVA support in Source Reader
> 
> Bas: How does this approach to initializing DXVA2 look? Any hints as to how
> we can take a IDirect3D9Surface and get that into compositing/layers and
> onscreen?

This looks fine really! I need to dig in more how to make it work with D3D10/11 on Win7. I'm still hoping the platform update will bring the possibilities we need, but it's not so sure.

We could use D3D9/10 interop, but we know it to be -very- unstable on a lot of systems/drivers. We'll have to see.
Status update: I have DXVA working using D3D9Ex and rendering frames correctly onscreen in the D3D9 layers backend. Next step is to get frames stored in D3D9Ex surfaces shared through to DXGI to the D3D10.1 layers backend.
Assignee: nobody → cpearce
Summary: Enable DXVA2 on Windows7+ to accelerate H.264 video decoding → Enable DXVA2 on Windows Vista+ to accelerate H.264 video decoding
Actually rather than using D3D9Ex and sharing the decoded frames' D3D9 surfaces into the D3D10.1 layers backend, a better solution might be to use D3D11 for DXVA where available, and then use DXGI to share the resources with the D3D10.1 layers backend, and then presumably the upcoming D3D11 layers backend as well.

Then for the D3D9 layers backend we can then either use the D3D11->D3D9Ex surface sharing techniques described here: http://msdn.microsoft.com/en-nz/library/windows/desktop/ee913554(v=vs.85).aspx#interoperability_between_direct3d_9ex_and_dxgi_based_apis or we can just use D3D9Ex in DXVA.
Work in progress patch, enables DXVA using D3D9Ex. Requires forcing using D3D9 layers backend.
Attachment #724285 - Attachment is obsolete: true
Attachment #724285 - Flags: feedback?(bas)
Blocks: 799315
This patch is based on top of the previous D3D9Ex version, and initializes DXVA using D3D11. The decoded frames are stored in NV12 format, which is a Y plane followed by interleaved UV samples in a single plane. I copy the decoded frame into a D3D11 texture2d that is sharable using DXGI, and then that is put into a new image type. This copy appears to make it all the way into the D3D10ImageLayer::RenderLayer(), but now I'm stuck.

Bas: How do I render an NV12 texture 2d in the D3D10 backend? Do I need to write another pixel shader? Where do I start??
Attachment #733628 - Flags: feedback?(bas)
* Create a new object DXVA2Manager, which manages the creation of a D3D device, and converts video MFSample objects into layers::Image objects. We use D3D9Ex for DXVA, and the decoded frames come out as IDirect3DSurface9 objects. DXVA2 uses its own D3D device, and the D3D surface DXVA gives us aren't sharable, so our DXVA2Manager creates a sharable copy of the decoded frame using ID3DDevice9::StretchRect(). StretchRect also preforms YCbCr->RGB conversion for us on the GPU.
* Add a new Image type D3D9_RGB32_TEXTURE, implemented by D3DSurfaceImage class. This wraps a D3D9 Texture in RGB format. The DXVA2Manager converts decoded video frames to Images.
* I've designed the DXVA2Manager so that it's transparent to the WMFReader what D3D version it's using, so that it's easy to implement D3D11 support for DXVA if needed in future.
* I haven't implemented DXVA using D3D11 support, since it seems to work fine with D3D9Ex+resource sharing on the devices I have available here. If once we've had some more extensive testing and we find D3D9 is too unreliable, we can pref it off and I can work on a D3D11 DXA2Manager.
* WMFReader finds the LayerManager and only uses DXVA if we're using D3D9 or D3D10 layer managers. I found that video painting performance was worse if I readback the frames from GPU memory in order to be used by the Basic layers backend.
* The D3D9/D3D10 layers backends open the Image texture using resource sharing, and paint frames using code paths pretty similar to the code paths that we already have there.
* I implement D3DSurfaceImage::GetAsSurface() by doing readback.
* There's a pref to disable DXVA support, it's enabled by default in this patch.
* For comparison, Chrome's code to init/use DXVA2 is here:
https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu/media/dxva_video_decode_accelerator.cc&sq=package:chromium
* CPU usage is much reduced by this patch, and we playback 1080p pretty smoothly on the Surface Pro when running on battery in power saver mode. We're still dropping about 5% of frames in rendering though (dropping meaning that the frame isn't being drawn before ImageContainer::SetCurrentImage() is called again with the next frame).
* https://hg.mozilla.org/try/rev/31d1a32824de
Attachment #729963 - Attachment is obsolete: true
Attachment #733628 - Attachment is obsolete: true
Attachment #733628 - Flags: feedback?(bas)
Attachment #735605 - Flags: review?(roc)
Attachment #735605 - Flags: feedback?(bas)
Attachment #735605 - Flags: review?(paul)
> CPU usage is much reduced by this patch, and we playback 1080p pretty smoothly on the Surface Pro when running on battery in power saver mode. We're still dropping about 5% of frames in rendering though (dropping meaning that the frame isn't being drawn before ImageContainer::SetCurrentImage() is called again with the next frame).

Are you using the Enhanced Video Renderer for rendering decoded frames?
No. The frame is stored in a texture, and that's just being rendered as a textured quad inside our regular pipeline.
I realised I had broke things when I was cleaning up the patch, and DXVA wasn't being used in my v1 patch.

There's a test failure, I'll work on fixing that, and address it in a follow up patch, or merge the fix into this one while addressing review comments:
https://tbpl.mozilla.org/?tree=Try&rev=ffa29f9995b1

Note, you can download a build to test this feature here:
https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/cpearce@mozilla.com-ffa29f9995b1/try-win32/firefox-23.0a1.en-US.win32.zip
Attachment #735605 - Attachment is obsolete: true
Attachment #735605 - Flags: review?(roc)
Attachment #735605 - Flags: review?(paul)
Attachment #735605 - Flags: feedback?(bas)
Attachment #736067 - Flags: review?(roc)
Attachment #736067 - Flags: feedback?(bas)
Attachment #736067 - Flags: review?(paul)
Comment on attachment 736067 [details] [diff] [review]
Patch v2: Use D3D9Ex in DXVA2 for GPU assisted video decoding

Review of attachment 736067 [details] [diff] [review]:
-----------------------------------------------------------------

I barely skimmed the Windows API parts of the code, relying on Bas for that :-).

Overall it looks nice. It's great that we can plug this in so cleanly.

Might be worth splitting the patch up into "imagelayer changes", "add GetOwner()", "DXVA2Manager", and "WMF integration"

::: content/media/wmf/DXVA2Manager.h
@@ +18,5 @@
> +class Image;
> +class ImageContainer;
> +}
> +
> +class DXVA2Manager {

Add a comment explaining what this class is for

::: gfx/layers/d3d9/ImageLayerD3D9.cpp
@@ +617,5 @@
> +  const unsigned srcPitch = rect.Pitch;
> +  for (int y = 0; y < mData->mSize.height; y++) {
> +    memcpy(surface->Data() + surface->Stride() * y,
> +           (unsigned char*)(src) + srcPitch * y,
> +           mData->mSize.width * 4);

There must be a copy of this code somewhere already we can share...
Attachment #736067 - Flags: review?(roc) → review+
Attachment #736067 - Flags: feedback?(bas) → review?(bas)
I've tracked down the mochitest failure. It's the Khronos webgl conformance test tex-image-and-sub-image-2d-with-video.html. The test loads a 2x2 pixel red video, writes the video frame to a canvas, and checks that its rgb=(255,0,0). But when using DXVA2 on an nvidia card, the colour of the frame comes out as rgb=(253,1,0), and the test fails.

The mochitest machines have nvidia graphics cards, I can repro this on my laptop (which has Optimus graphics) when I run in Discrete graphics mode, not in Optimus mode where we use the Intel Graphics.

Hopefully there's some antialiasing settings I can tweak for the D3D devices being used, or we could just make our version of that test fuzzy.
Depends on: 860999
The failing test can be run online at:
https://www.khronos.org/registry/webgl/sdk/tests/conformance/textures/tex-image-and-sub-image-2d-with-video.html

This test passes in Chrome; they use DXVA2.

I talked to jgilbert on IRC. Recent versions of that test have a fuzz factor (which explains why Chrome passes), and our mochitest copy of that test does not, so we'll pass the test once we've updated the testsuite. jgilbert has said he'll update the test suite next week. I filed bug 860999 to track that.
Comment on attachment 735605 [details] [diff] [review]
Patch: Use D3D9Ex in DXVA2 for GPU assisted video decoding

Review of attachment 735605 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/wmf/DXVA2Manager.cpp
@@ +103,5 @@
> +                              D3DCREATE_FPU_PRESERVE |
> +                              D3DCREATE_MULTITHREADED |
> +                              D3DCREATE_MIXED_VERTEXPROCESSING,
> +                              &params,
> +                              NULL,

I assume this is okay, even when using the full-screen API.
Attachment #735605 - Flags: review+
Attachment #736067 - Flags: review?(paul) → review+
Comment on attachment 736067 [details] [diff] [review]
Patch v2: Use D3D9Ex in DXVA2 for GPU assisted video decoding

Review of attachment 736067 [details] [diff] [review]:
-----------------------------------------------------------------

nit: bunch of whitespace issues in here, I won't point them all out.

Also, let's call D3DSurfaceImage something more descriptive and closer to the image type enum. Like D3D9TextureImage or something like that (or D3D9SurfaceImage if we're going with the changes I proposed).

::: content/media/wmf/DXVA2Manager.cpp
@@ +61,5 @@
> +typedef HRESULT (WINAPI*Direct3DCreate9Func)(UINT SDKVersion, IDirect3D9Ex**);
> +
> +HRESULT
> +D3D9DXVA2Manager::Init()
> +{

This all seems somewhat wasteful if we've already created a device manager inside Layers. But we can forego that for now I suppose.

@@ +86,5 @@
> +  NS_ENSURE_TRUE(SUCCEEDED(hr), hr);
> +
> +  // Create D3D9DeviceEx.
> +  D3DPRESENT_PARAMETERS params = {0};
> +  params.BackBufferWidth = 1;

nit: might want to make indent consistent

@@ +91,5 @@
> +  params.BackBufferHeight = 1;
> +  params.BackBufferFormat           = D3DFMT_UNKNOWN;
> +  params.BackBufferCount            = 1;
> +  params.SwapEffect                 = D3DSWAPEFFECT_DISCARD;
> +  params.hDeviceWindow              = ::GetShellWindow();

Hrm, is it safe to use the ShellWindow here? I guess if you tested it it won't hurt? Was this taken from an MS example?

@@ +166,5 @@
> +  nsRefPtr<IDirect3DSurface9> textureSurface;
> +  hr = texture->GetSurfaceLevel(0, getter_AddRefs(textureSurface));
> +  NS_ENSURE_TRUE(SUCCEEDED(hr), hr);
> +
> +  hr = mDevice->StretchRect(surface, NULL, textureSurface, NULL, D3DTEXF_NONE);

Let's be clear this copy is not free, especially on shared memory architectures where memory bandwidth is not there in abundance. In the future, is there a way we could avoid using the shared handle if we're already compositing with a D3D9Ex device anyway? This ties into the Init question (i.e. we'd share the device). Especially since places where we're resorting to D3D9 will be places with lower-end hardware in general.

@@ +180,5 @@
> +  nsAutoPtr<D3DSurfaceImage::Data> data = new D3DSurfaceImage::Data();
> +
> +  data->mTexture = texture;
> +  data->mDevice = mDevice;
> +  data->mShareHandle = shareHandle;

I suggest as a more 'future-proof' solution here we consider passing the IDirect3DSurface9 into the data here. This will allow different backends to do the 'optimal' thing here. This code would basically go into layers.

::: gfx/layers/d3d10/ImageLayerD3D10.cpp
@@ +145,5 @@
> +      D3DSurfaceImage* d3dImage = reinterpret_cast<D3DSurfaceImage*>(aImage);    
> +      const D3DSurfaceImage::Data* data = d3dImage->GetData();
> +
> +      nsRefPtr<ID3D10Resource> tempRes10;
> +      hr = device()->OpenSharedResource(data->mShareHandle, IID_ID3D10Resource, (void**)getter_AddRefs(tempRes10));

I think you'll be able to open an IID_ID3D10Texture2D directly and save some code here.

::: gfx/layers/d3d9/ImageLayerD3D9.cpp
@@ +121,5 @@
> +                                      D3DUSAGE_RENDERTARGET,
> +                                      D3DFMT_X8R8G8B8,
> +                                      D3DPOOL_DEFAULT,
> +                                      getter_AddRefs(sharedTexture),
> +                                      &aShareHandle);

I'd like to understand what's providing the synchronization between the LayerManager's D3D9 device and the DXVA D3D9 device right now or if we're just generally synchronized by sheer luck here. I suppose if we don't see any artifacts from this in practice we can live with it for now! It would be nice to see a comment about this so it doesn't get forgotten.

@@ +605,5 @@
> +                                                   getter_AddRefs(systemMemorySurface),
> +                                                   0);
> +  NS_ENSURE_TRUE(SUCCEEDED(hr), nullptr);
> +
> +  hr = mData->mDevice->GetRenderTargetData(textureSurface, systemMemorySurface);  

If we did my earlier suggestion this codepath could be called directly on the IDirect3DSurface9 preventing the copy in GPU memory that we're doing now even if we're going to take this fallback path anyway.
(In reply to Bas Schouten (:bas.schouten) from comment #15)
> Comment on attachment 736067 [details] [diff] [review]
> Patch v2: Use D3D9Ex in DXVA2 for GPU assisted video decoding
> 
> Review of attachment 736067 [details] [diff] [review]:


> @@ +91,5 @@
> > +  params.BackBufferHeight = 1;
> > +  params.BackBufferFormat           = D3DFMT_UNKNOWN;
> > +  params.BackBufferCount            = 1;
> > +  params.SwapEffect                 = D3DSWAPEFFECT_DISCARD;
> > +  params.hDeviceWindow              = ::GetShellWindow();
> 
> Hrm, is it safe to use the ShellWindow here? I guess if you tested it it
> won't hurt? Was this taken from an MS example?

This worked fine on every device/machine I tested on. Chromium's code does this too, FWIW. ;)


> 
> @@ +166,5 @@
> > +  nsRefPtr<IDirect3DSurface9> textureSurface;
> > +  hr = texture->GetSurfaceLevel(0, getter_AddRefs(textureSurface));
> > +  NS_ENSURE_TRUE(SUCCEEDED(hr), hr);
> > +
> > +  hr = mDevice->StretchRect(surface, NULL, textureSurface, NULL, D3DTEXF_NONE);
> 
> Let's be clear this copy is not free, especially on shared memory
> architectures where memory bandwidth is not there in abundance.

Right, but overall decoding with DXVA plus this copy to a sharable surface appears to be faster than CPU decoding and then uploading the decoded frame to GPU memory? ;)

But yes, I agree that if we can avoid the extra copy for D3D9 layers that would be great. I'd like to see if I can remove this copy in a follow up bug. We're still dropping about 5% of frames with this patch on the Surface Pro in Power Saver mode, and I'd like to improve on that (it uses D3D10 layers however).

This StretchRect copy also does NV12->RGB conversion, so if we didn't do the copy we'd need to write a shader to do the NV12->RGB conversion, or find some other way to do it - but ideally writing shaders is the sort of thing someone on the graphics team should do.


> In the
> future, is there a way we could avoid using the shared handle if we're
> already compositing with a D3D9Ex device anyway? This ties into the Init
> question (i.e. we'd share the device). Especially since places where we're
> resorting to D3D9 will be places with lower-end hardware in general.

As far as I can tell DXVA doesn't create its surfaces with sharing enabled, so we must copy them to a shared surface to make them accessible to another d3d device.

Also on some hardware (specifically the Win7 Desktop with an nvidia card sitting under my desk) we can't actually hang onto more than a couple of the IDirect3DSurface9s that come out of DXVA without the application locking up ("a couple" is some number >2 and <=5, the values I tried).

Our video stack tries to keep 10 decoded frames in its buffers at all times, so if we were storing references to DXVA's surfaces we'd need to reduce the number of decoded frames we buffer. Though for a 1080p XRGB video frame we'd need 8.29 MB/frame, or 82.9MB for 10 of them, so reducing the number of buffered frames would probably be a good idea anyway...


> @@ +180,5 @@
> > +  nsAutoPtr<D3DSurfaceImage::Data> data = new D3DSurfaceImage::Data();
> > +
> > +  data->mTexture = texture;
> > +  data->mDevice = mDevice;
> > +  data->mShareHandle = shareHandle;
> 
> I suggest as a more 'future-proof' solution here we consider passing the
> IDirect3DSurface9 into the data here. This will allow different backends to
> do the 'optimal' thing here. This code would basically go into layers.

Right, we'd need to pass in the device too, otherwise we couldn't copy it to a shared surface if we needed to.

> ::: gfx/layers/d3d9/ImageLayerD3D9.cpp
> @@ +121,5 @@
> > +                                      D3DUSAGE_RENDERTARGET,
> > +                                      D3DFMT_X8R8G8B8,
> > +                                      D3DPOOL_DEFAULT,
> > +                                      getter_AddRefs(sharedTexture),
> > +                                      &aShareHandle);
> 
> I'd like to understand what's providing the synchronization between the
> LayerManager's D3D9 device and the DXVA D3D9 device right now or if we're
> just generally synchronized by sheer luck here. I suppose if we don't see
> any artifacts from this in practice we can live with it for now! It would be
> nice to see a comment about this so it doesn't get forgotten.

I think we're synchronizing via sheer luck. ;)

Chromium uses a D3D query and they flush that and then call Sleep(1) inside a loop until the device flush finishes:
https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu/media/dxva_video_decode_accelerator.cc&sq=package:chromium&l=346

I had that but it didn't seem to have any effect so I removed it. Actually now that I think about it, we try to keep 10 frames decoded in advance, so under optimal conditions that would give us something a like half a second between a frame being decoded and the frame being displayed, which is probably why I'm not seeing artifacts.

I will add in a loop like Chromium's, unless you know a better way to enforce synchronization?


So, I will:
* Rename the image type to D3D9TextureImage.
* Add the D3DQuery flush and loop to ensure synchronization between D3D devices.
* Pass the IDirect3DSurface9 and IDirect3DDevice9 into the D3D9TextureImage's data so that the backend can do the right thing.
* Try reducing the number of buffered frames when using DXVA.
* Address the nits you picked up.

We can look at removing the copy in a follow up bug.
(In reply to Chris Pearce (:cpearce) from comment #16)
> (In reply to Bas Schouten (:bas.schouten) from comment #15)
> > Comment on attachment 736067 [details] [diff] [review]
> > Patch v2: Use D3D9Ex in DXVA2 for GPU assisted video decoding
> > 
> > Review of attachment 736067 [details] [diff] [review]:
> 
> 
> > @@ +91,5 @@
> > > +  params.BackBufferHeight = 1;
> > > +  params.BackBufferFormat           = D3DFMT_UNKNOWN;
> > > +  params.BackBufferCount            = 1;
> > > +  params.SwapEffect                 = D3DSWAPEFFECT_DISCARD;
> > > +  params.hDeviceWindow              = ::GetShellWindow();
> > 
> > Hrm, is it safe to use the ShellWindow here? I guess if you tested it it
> > won't hurt? Was this taken from an MS example?
> 
> This worked fine on every device/machine I tested on. Chromium's code does
> this too, FWIW. ;)

Wfm :-)

> > @@ +166,5 @@
> > > +  nsRefPtr<IDirect3DSurface9> textureSurface;
> > > +  hr = texture->GetSurfaceLevel(0, getter_AddRefs(textureSurface));
> > > +  NS_ENSURE_TRUE(SUCCEEDED(hr), hr);
> > > +
> > > +  hr = mDevice->StretchRect(surface, NULL, textureSurface, NULL, D3DTEXF_NONE);
> > 
> > Let's be clear this copy is not free, especially on shared memory
> > architectures where memory bandwidth is not there in abundance.
> 
> Right, but overall decoding with DXVA plus this copy to a sharable surface
> appears to be faster than CPU decoding and then uploading the decoded frame
> to GPU memory? ;)
> 
> But yes, I agree that if we can avoid the extra copy for D3D9 layers that
> would be great. I'd like to see if I can remove this copy in a follow up
> bug. We're still dropping about 5% of frames with this patch on the Surface
> Pro in Power Saver mode, and I'd like to improve on that (it uses D3D10
> layers however).

Dropping 5% of frames? That seems excessive! Hrm, yeah, we should definitely fix that!

> This StretchRect copy also does NV12->RGB conversion, so if we didn't do the
> copy we'd need to write a shader to do the NV12->RGB conversion, or find
> some other way to do it - but ideally writing shaders is the sort of thing
> someone on the graphics team should do.

Hrm, really? I'm not sure what that shader would do, I think that's handled automatically, but I might be wrong! I can do that shader if we need it of course.

> 
> 
> > In the
> > future, is there a way we could avoid using the shared handle if we're
> > already compositing with a D3D9Ex device anyway? This ties into the Init
> > question (i.e. we'd share the device). Especially since places where we're
> > resorting to D3D9 will be places with lower-end hardware in general.
> 
> As far as I can tell DXVA doesn't create its surfaces with sharing enabled,
> so we must copy them to a shared surface to make them accessible to another
> d3d device.

Right, but if we use the -same- device, we don't need sharing :-). I.e. we'd initialize the DXVA device manager with the device we also use on Layers.

> 
> Also on some hardware (specifically the Win7 Desktop with an nvidia card
> sitting under my desk) we can't actually hang onto more than a couple of the
> IDirect3DSurface9s that come out of DXVA without the application locking up
> ("a couple" is some number >2 and <=5, the values I tried).

Let's e-mail Microsoft/NVidia about that once this is landed, that seems weird.

> 
> Our video stack tries to keep 10 decoded frames in its buffers at all times,
> so if we were storing references to DXVA's surfaces we'd need to reduce the
> number of decoded frames we buffer. Though for a 1080p XRGB video frame we'd
> need 8.29 MB/frame, or 82.9MB for 10 of them, so reducing the number of
> buffered frames would probably be a good idea anyway...

Agreed.

> > @@ +180,5 @@
> > > +  nsAutoPtr<D3DSurfaceImage::Data> data = new D3DSurfaceImage::Data();
> > > +
> > > +  data->mTexture = texture;
> > > +  data->mDevice = mDevice;
> > > +  data->mShareHandle = shareHandle;
> > 
> > I suggest as a more 'future-proof' solution here we consider passing the
> > IDirect3DSurface9 into the data here. This will allow different backends to
> > do the 'optimal' thing here. This code would basically go into layers.
> 
> Right, we'd need to pass in the device too, otherwise we couldn't copy it to
> a shared surface if we needed to.

No, we don't :-) - see http://msdn.microsoft.com/en-us/library/windows/desktop/bb205880%28v=vs.85%29.aspx (IDirect3DSurface9 inherits from IDirect3DResource9)
 
> > ::: gfx/layers/d3d9/ImageLayerD3D9.cpp
> > @@ +121,5 @@
> > > +                                      D3DUSAGE_RENDERTARGET,
> > > +                                      D3DFMT_X8R8G8B8,
> > > +                                      D3DPOOL_DEFAULT,
> > > +                                      getter_AddRefs(sharedTexture),
> > > +                                      &aShareHandle);
> > 
> > I'd like to understand what's providing the synchronization between the
> > LayerManager's D3D9 device and the DXVA D3D9 device right now or if we're
> > just generally synchronized by sheer luck here. I suppose if we don't see
> > any artifacts from this in practice we can live with it for now! It would be
> > nice to see a comment about this so it doesn't get forgotten.
> 
> I think we're synchronizing via sheer luck. ;)
> 
> Chromium uses a D3D query and they flush that and then call Sleep(1) inside
> a loop until the device flush finishes:
> https://code.google.com/p/chromium/codesearch#chromium/src/content/common/
> gpu/media/dxva_video_decode_accelerator.cc&sq=package:chromium&l=346
> 
> I had that but it didn't seem to have any effect so I removed it. Actually
> now that I think about it, we try to keep 10 frames decoded in advance, so
> under optimal conditions that would give us something a like half a second
> between a frame being decoded and the frame being displayed, which is
> probably why I'm not seeing artifacts.
> 
> I will add in a loop like Chromium's, unless you know a better way to
> enforce synchronization?

Aye, but make sure you send in the query -as early- as possibly after executing the copy to the shared surface, and then only do the loop when actually trying to draw the shared surface. This will minimize the risk of actually blocking in this loop when we don't need to.

> So, I will:
> * Rename the image type to D3D9TextureImage.
> * Add the D3DQuery flush and loop to ensure synchronization between D3D
> devices.
> * Pass the IDirect3DSurface9 and IDirect3DDevice9 into the
> D3D9TextureImage's data so that the backend can do the right thing.
> * Try reducing the number of buffered frames when using DXVA.
> * Address the nits you picked up.
> 
> We can look at removing the copy in a follow up bug.

Sounds great!
I split the patches up for easier understandability.

Implements a new Image type for storing D3D surfaces, with review comments addressed:
* Called class D3D9TextureImage.
* Pass in IDirect3DSurface9 into D3D9TextureImage::SetData().
* Synchronize texture copies with D3D Queries.
* Removed all trailing whitespace.
Attachment #736067 - Attachment is obsolete: true
Attachment #736067 - Flags: review?(bas)
Attachment #741145 - Flags: review?(bas)
This is basically the content/media changes. It's pretty much the same as before, except that I've moved the copy-to-shared surface to the previous patch.
Attachment #741147 - Flags: review?(bas)
This patch disable webgl conformance tex-image-and-sub-image-2d-with-video on Windows Vista and later so that we can land hardware accelerated h.264 decoding support without waiting for bug 860999 to be fixed. We can re-enable the tex-image-and-sub-image-2d-with-video conformance test in 860999.

We can't just mark the test as a known failure, as it only fails on some graphics hardware, in particular the nvidia cards on the mochitest releng machines.
Attachment #741149 - Flags: review?(jgilbert)
Comment on attachment 741149 [details] [diff] [review]
Patch 3: Disable webgl conformance tex-image-and-sub-image-2d-with-video on Windows Vista+

Review of attachment 741149 [details] [diff] [review]:
-----------------------------------------------------------------

We should add support for basic comments for skipped_tests_ files, so we can label why things are skipped, and/or attach bug numbers.
Attachment #741149 - Flags: review?(jgilbert) → review+
(In reply to Jeff Gilbert [:jgilbert] from comment #21)
> Comment on attachment 741149 [details] [diff] [review]
> Patch 3: Disable webgl conformance tex-image-and-sub-image-2d-with-video on
> Windows Vista+
> 
> Review of attachment 741149 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> We should add support for basic comments for skipped_tests_ files, so we can
> label why things are skipped, and/or attach bug numbers.

Looks like I'll also have to disable conformance/textures/texture-npot-video.html:

https://tbpl.mozilla.org/?tree=Try&rev=edc22819037a
Comment on attachment 741145 [details] [diff] [review]
Patch 1 v1: Image class for D3D Texture

Review of attachment 741145 [details] [diff] [review]:
-----------------------------------------------------------------

All in all I think this is a great improvement! A couple of small things. I'd like to see fixed before we land it.

::: gfx/layers/D3D9TextureImage.cpp
@@ +9,5 @@
> +namespace mozilla {
> +namespace layers {
> +
> +HRESULT
> +D3D9TextureImage::SetData(IDirect3DSurface9* aSurface, const nsIntSize& aSize)

We should make this consistent with other image classes, and pass in const Data &aData.

@@ +54,5 @@
> +  NS_ENSURE_TRUE(SUCCEEDED(hr), hr);
> +
> +  nsRefPtr<IDirect3DQuery9> query;
> +  hr = device->CreateQuery(D3DQUERYTYPE_EVENT, getter_AddRefs(query));
> +  NS_ENSURE_TRUE(SUCCEEDED(hr), hr);

I'm personally not a huge fan of checking the result codes of functions you no succeeds. It contributes to the high overhead of COM and introduces a huge amount of conditional branches. However I'm fine with it if you prefer it.

@@ +86,5 @@
> +    return;
> +  }
> +  int iterations = 0;
> +  while (iterations < 10 && S_FALSE == mData->mQuery->GetData(NULL, 0, D3DGETDATA_FLUSH)) {
> +    Sleep(1);

It might be better to do a Sleep(0) here? This makes me very sad :(.

@@ +112,5 @@
> +  EnsureSynchronized();
> +
> +  // Readback the texture from GPU memory into system memory, so that
> +  // we can copy it into the Cairo image. This is expensive.
> +  nsRefPtr<IDirect3DSurface9> textureSurface;

Please use mfbt's RefPtr in new code.

::: gfx/layers/D3D9TextureImage.h
@@ +12,5 @@
> +
> +namespace mozilla {
> +namespace layers {
> +
> +class D3D9TextureImage : public Image {

Let's call it D3D9SurfaceImage as I suggested previously I think, now that that's what it is :-)

@@ +19,5 @@
> +  struct Data {
> +    nsRefPtr<IDirect3DTexture9> mTexture;
> +    nsRefPtr<IDirect3DQuery9> mQuery;
> +    HANDLE mShareHandle;
> +    nsIntSize mSize;

We should move mQuery and mShareHandle to be class members to be consistent with other Image types.

@@ +46,5 @@
> +
> +  already_AddRefed<gfxASurface> GetAsSurface() MOZ_OVERRIDE;
> +
> +private:
> +  nsAutoPtr<Data> mData;

Data should probably not be a Pointer but just a member. Again, if only to be consistent with how other Images work.

::: gfx/layers/d3d10/ImageLayerD3D10.cpp
@@ +159,5 @@
> +      hr = device()->CreateShaderResourceView(dat->mTexture, NULL, getter_AddRefs(dat->mSRView));
> +      NS_ENSURE_TRUE(SUCCEEDED(hr) && dat->mSRView, nullptr);
> +
> +      aImage->SetBackendData(mozilla::layers::LAYERS_D3D10, dat.forget());
> +    }

aHasAlpha needs to be set here.
Attachment #741145 - Flags: review?(bas) → review-
Comment on attachment 741147 [details] [diff] [review]
Patch 2 v1: Initialize DXVA in WMFReader

Review of attachment 741147 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, we'll need some minor changes to adjust to changing the SetData function to use the Data class but I trust this won't pose an issue.
Attachment #741147 - Flags: review?(bas) → review+
(In reply to Bas Schouten (:bas.schouten) from comment #23)
> Comment on attachment 741145 [details] [diff] [review]
> Patch 1 v1: Image class for D3D Texture

Thanks for the review!

> ::: gfx/layers/D3D9TextureImage.cpp
> @@ +9,5 @@
> > +namespace mozilla {
> > +namespace layers {
> > +
> > +HRESULT
> > +D3D9TextureImage::SetData(IDirect3DSurface9* aSurface, const nsIntSize& aSize)
> 
> We should make this consistent with other image classes, and pass in const
> Data &aData.

You mean we should change D3D9SurfaceImage::Data to have only two members, a RefPtr<IDirect3DSurface9> and an nsIntSize? We can't hold onto references to the DXVA's device's IDirect3DSurface9 as I mentioned earlier, as some nvidia cards can't handle this. So we'd need to move mTexture out of D3D9SurfaceImage::Data to be class members as well, and add an accessor for the texture, and not store the Data object.


> @@ +46,5 @@
> > +
> > +  already_AddRefed<gfxASurface> GetAsSurface() MOZ_OVERRIDE;
> > +
> > +private:
> > +  nsAutoPtr<Data> mData;
> 
> Data should probably not be a Pointer but just a member. Again, if only to
> be consistent with how other Images work.
> 

Right, if we're holding a reference to the DXVA device's surface in D3D9SurfaceImage::Data, we shouldn't keep a reference to it inside D3D9SurfaceImage else we'll cause problems on nvidia cards.
(In reply to Bas Schouten (:bas.schouten) from comment #23)
> Comment on attachment 741145 [details] [diff] [review]
> Patch 1 v1: Image class for D3D Texture
 
> @@ +86,5 @@
> > +    return;
> > +  }
> > +  int iterations = 0;
> > +  while (iterations < 10 && S_FALSE == mData->mQuery->GetData(NULL, 0, D3DGETDATA_FLUSH)) {
> > +    Sleep(1);
> 
> It might be better to do a Sleep(0) here? This makes me very sad :(.


Um... Well won't that mean we can actually end up not waiting when we need to?

At least with a Sleep(1), sleeping for at most 10 iterations means at most 10ms (plus timer slop, but we'll be enabling high precision timers in bug 866561), whereas looping for 10 iterations with Sleep(0) could end up looping for something very close to 0ms when there's no other CPU threads to run and so Sleep(0) returns immediately. We may actually need to wait in that case, but we'll run that loop without context switching or sleeping in that case.
Review comments addressed:
* Renamed Image to D3D9SurfaceImage.
* D3D9SurfaceImage::SetData() takes a D3D9SurfaceImage::Data temporary instead of raw parameters.
* D3D9SurfaceImage::Data now only stores the surface and size, the other members are now class members of D3D9SurfaceImage.
* Expose the D3DSURFACE_DESC of the surface on D3D9SurfaceImage, that way we don't need to expose the Texture (which was only used to get the D3DSURFACE_DESC anyway).
* D3D9SurfaceImage now only exposes its underlying D3D resource via the share HANDLE. Users have the use resource sharing to open the resource.
* Use RefPtr instead of nsRefPtr.
* The query loop is still Sleep(1), but I can make it Sleep(0) before landing if you really think it's a good idea.
Attachment #741145 - Attachment is obsolete: true
Attachment #744984 - Flags: review?(bas)
Updated to reflect changes in Patch 1.
Attachment #741147 - Attachment is obsolete: true
Attachment #744985 - Flags: review+
Comment on attachment 744984 [details] [diff] [review]
Patch 1 v1: Image class for D3D9 Surface

Review of attachment 744984 [details] [diff] [review]:
-----------------------------------------------------------------

One important comment about aHasAlpha, looks good otherwise!

::: gfx/layers/D3D9SurfaceImage.cpp
@@ +60,5 @@
> +  RefPtr<IDirect3DQuery9> query;
> +  hr = device->CreateQuery(D3DQUERYTYPE_EVENT, byRef(query));
> +  NS_ENSURE_TRUE(SUCCEEDED(hr), hr);
> +
> +  hr = device->StretchRect(surface, NULL, textureSurface, NULL, D3DTEXF_NONE);

nit: I'd like the create query to be moved below this to the issue call to keep them together.

@@ +86,5 @@
> +    return;
> +  }
> +  int iterations = 0;
> +  while (iterations < 10 && S_FALSE == mQuery->GetData(NULL, 0, D3DGETDATA_FLUSH)) {
> +    Sleep(1);

My concern is sleep(1) could sleep for a fairly long time on some older systems (but that might not be relevant), sleep(0) I believe just relinguishes your timeslice and gets back to you after a round of scheduling. Having said that I like that your current approach allows us to max our waiting at 10 ms. So let's leave it for now.

::: gfx/layers/d3d10/ImageLayerD3D10.cpp
@@ +156,5 @@
> +      NS_ENSURE_TRUE(SUCCEEDED(hr) && dat->mSRView, nullptr);
> +
> +      aImage->SetBackendData(mozilla::layers::LAYERS_D3D10, dat.forget());
> +      aHasAlpha = false;
> +    }

You need to said aHasAlpha outside of this if block, so we set it on subsequent draws of the same frame as well (i.e. where we have backend data)

::: gfx/layers/d3d9/ImageLayerD3D9.cpp
@@ +363,5 @@
> +      if (backendData->mTexture) {
> +        aImage->SetBackendData(mozilla::layers::LAYERS_D3D9, backendData.forget());
> +      }
> +    }
> +    aHasAlpha = false;

Like you do here! :-)
Attachment #744984 - Flags: review?(bas) → review+
> DXVA2Manager.obj : error LNK2001: unresolved external symbol _MR_BUFFER_SERVICE
> xul.dll : fatal error LNK1120: 1 unresolved externals

So where does this symbol come from?
strmiids.lib

Sorry, looks like we're not linking this when WebRTC is disabled. I'll prepare a patch to fix that.
Does this fix the build for you?
Attachment #745698 - Flags: review?(philip.chee)
Blocks: 868848
Comment on attachment 745698 [details] [diff] [review]
Patch: Link strmiids when WebRTC is disabled

Review of attachment 745698 [details] [diff] [review]:
-----------------------------------------------------------------

untested but r+ unsure if ratty would trust his own review here
Attachment #745698 - Flags: review+
Comment on attachment 745698 [details] [diff] [review]
Patch: Link strmiids when WebRTC is disabled

Yes, this patch allows me to compile SeaMonkey. f=me as I'm not a peer of anything outside SeaMonkey
Attachment #745698 - Flags: review?(philip.chee) → feedback+
Depends on: 872375
This has been noted in the Aurora 23 release notes:

http://www.mozilla.org/en-US/firefox/23.0a2/auroranotes/

If you would like to make any changes or have questions/concerns please contact me directly.
Depends on: 879099
Are there any demos or live sites using this tech that QA can use to test as Firefox 23 enters Beta? In addition are there any high risk areas we should pay attention to?
Basically anything with HTML5 <video> that uses H.264.

A basic example is here:
http://people.mozilla.com/~cpearce/h264.html

You should pay attention to non-perfect video playback, in particular, audio stuttering or glitches, and error appearing on the decoded video. For example, issues like bug 872375.

Testing on a range of hardware would be good. Note that we use DXVA to accelerate video decoding on any Windows Vista or later system with D3D9 or D3D10 layers acceleration. So testing on a computer without layers acceleration won't test this feature.
Do we know of any live sites serving html5 h264 video?
Keywords: verifyme
If you set media.webm.enabled to false, then the HTML5 trial on YouTube will serve you H.264 (it serves WebM by default to Firefox in the HTML5 trial). You need to opt into the HTML5 trial at http://youtube.com/html5 and you might want to disable flash in about:addons just to be sure you don't get served Flash.
Thanks for that information Chris. We'll use that when testing Firefox 23.0b1 candidates.
Mozilla/5.0 (Windows NT 6.2; WOW64; rv:23.0) Gecko/20130619 Firefox/23.0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20130619 Firefox/23.0

Looks fine in latest Aurora 23.0a2 (buildID: 20130619004018) using the information from comment 42 and comment 44.
Status: RESOLVED → VERIFIED
Bogdan, can you please explain in more detail what was tested here, in particular which hardware configurations were used?
QA Contact: bogdan.maris
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #47)
> Bogdan, can you please explain in more detail what was tested here, in
> particular which hardware configurations were used?

I set media.webm.enabled to false in about:config, disabled Flash in about:addons, went on http://people.mozilla.com/~cpearce/h264.html and played all videos (no distortions or glitches were found), then opted for HTML trial on youtube and played some videos from there.

Hardware configuration:
- For Windows 7 x64
  Graphics

        Adapter Description
        AMD Radeon HD 6450

        Adapter Drivers
        aticfx64 aticfx64 aticfx64 aticfx32 aticfx32 aticfx32 atiumd64 atidxx64 atidxx64 atiumdag atidxx32 atidxx32 atiumdva atiumd6a atitmm64

        Adapter RAM
        1024

        Device ID
        0x6779

        Direct2D Enabled
        true

        DirectWrite Enabled
        true (6.2.9200.16492)

        Driver Date
        12-19-2012

        Driver Version
        9.12.0.0

        GPU #2 Active
        false

        GPU Accelerated Windows
        1/1 Direct3D 10

        Vendor ID
        0x1002

        WebGL Renderer
        Google Inc. -- ANGLE (AMD Radeon HD 6450)

        AzureCanvasBackend
        direct2d

        AzureContentBackend
        direct2d

        AzureFallbackCanvasBackend
        cairo

- For Windows 8 x64:
Graphics

        Adapter Description
        Intel(R) HD Graphics

        Adapter Drivers
        igdumd64 igd10umd64 igd10umd64 igdumd32 igd10umd32 igd10umd32

        Adapter RAM
        Unknown

        Device ID
        0x0102

        Direct2D Enabled
        true

        DirectWrite Enabled
        true (6.2.9200.16433)

        Driver Date
        3-8-2013

        Driver Version
        9.17.10.3062

        GPU #2 Active
        false

        GPU Accelerated Windows
        1/1 Direct3D 10

        Vendor ID
        0x8086

        WebGL Renderer
        Google Inc. -- ANGLE (Intel(R) HD Graphics)

        AzureCanvasBackend
        direct2d

        AzureContentBackend
        direct2d

        AzureFallbackCanvasBackend
        cairo
Thanks Bogdan, can you also test this on a machine with an NVidia GPU?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #49)
> Thanks Bogdan, can you also test this on a machine with an NVidia GPU?

Sure. It looks fine on a Nvidia GPU as well.
- For Windows 7 x64:
Graphics

        Adapter Description
        NVIDIA GeForce GT 620

        Adapter Drivers
        nvd3dumx nvwgf2umx nvwgf2umx nvd3dum nvwgf2um nvwgf2um

        Adapter RAM
        1024

        Device ID
        0x0f01

        Direct2D Enabled
        true

        DirectWrite Enabled
        true (6.2.9200.16492)

        Driver Date
        1-18-2013

        Driver Version
        9.18.13.1106

        GPU #2 Active
        false

        GPU Accelerated Windows
        1/1 Direct3D 10

        Vendor ID
        0x10de

        WebGL Renderer
        Google Inc. -- ANGLE (NVIDIA GeForce GT 620)

        AzureCanvasBackend
        direct2d

        AzureContentBackend
        direct2d

        AzureFallbackCanvasBackend
        cairo
Thanks Bogdan. I think we can safely call this verified now. Just for extra sanity I will incorporate H264 video testing in a Beta testday this cycle.
Keywords: verifyme
not fixed!!
pleaase look here
https://bugzilla.mozilla.org/show_bug.cgi?id=901944
ok on seven, but please fix it for Vista32, thanks, this is important.
(green video: html5 h264, until FF22 all was ok)
(In reply to banakon from comment #52)
> not fixed!!
> pleaase look here
> https://bugzilla.mozilla.org/show_bug.cgi?id=901944

Let's keep investigating your issue on your bug.
ok, sorry ;)
I will provide all the infos you asked
here #901944
Depends on: 901944
Depends on: 945645
Depends on: 1057805
You need to log in before you can comment on or make changes to this bug.