Closed Bug 1223270 Opened 6 years ago Closed 4 years ago

Add support for texture recycling to ffmpeg backend on Windows

Categories

(Core :: Audio/Video: Playback, defect, P3)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- fixed

People

(Reporter: ajones, Assigned: jya)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Whiteboard: [qf:p1])

Attachments

(10 files)

Texture allocation is slow on nVidia GPUs so recycling the textures will be faster. The ffmpeg backend also has an unnecessary copy that could be replaced by a texture upload.
Flags: needinfo?(matt.woodrow)
Assignee: nobody → matt.woodrow
Depends on: 1214462
I think this is basically what we want, extending it for ffmpeg is only a few lines.

I don't see any improvements on my machine though (only dropping frames on level 5.2 of Anthony's frame drop test).

https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb1052feea43
Flags: needinfo?(matt.woodrow)
Assignee: matt.woodrow → jyavenard
Status: NEW → ASSIGNED
Assignee: jyavenard → matt.woodrow
Priority: -- → P2
Matt, I had a shot at rebasing this change, but there are things too convoluted for me to be satisfied in the knowledge that my rebase was correct.

In particular the changes to gfx/layers/d3d11/TextureD3D11.cpp ; there are modifications to the YCbCrKeepAliveD3D11 class, class that was introduced in the middle of bug 1200595 only to be removed in commit https://hg.mozilla.org/mozilla-central/rev/49e11078c536

And since, anything related to *YCbCrTextureClient classes have been removed.

Could you rebase this patch please? We would like to integrate it and further improve our VP9 decoding performance.

Thank you
Flags: needinfo?(matt.woodrow)
Attached patch Rebased patchSplinter Review
Flags: needinfo?(matt.woodrow)
Review commit: https://reviewboard.mozilla.org/r/49639/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/49639/
Attachment #8746942 - Flags: review?(nical.bugzilla)
Attachment #8746943 - Flags: review?(matt.woodrow)
Comment on attachment 8746942 [details]
Bug 1223270: P2. Add D3D11YCbCrImage type.

https://reviewboard.mozilla.org/r/49639/#review46407
Attachment #8746942 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8746943 [details]
Bug 1223270: P3. Remove extra copy of YUV buffer on Windows.

https://reviewboard.mozilla.org/r/49641/#review46419
Attachment #8746943 - Flags: review+
Comment on attachment 8746942 [details]
Bug 1223270: P2. Add D3D11YCbCrImage type.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49639/diff/1-2/
Comment on attachment 8746943 [details]
Bug 1223270: P3. Remove extra copy of YUV buffer on Windows.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49641/diff/1-2/
https://reviewboard.mozilla.org/r/49637/#review46455

I had missed two files in the rebase and where added to version control, gfx/layers/D3D11YCbCrImage.cpp/h

nical, you may want to have a 2nd look again...
What happens when we get a device reset and the layers backend changes?  It's handled?
https://reviewboard.mozilla.org/r/49639/#review46643

::: gfx/thebes/gfxWindowsPlatform.cpp:2261
(Diff revision 2)
>      ContentAdapterIsParentAdapter(mD3D11ImageBridgeDevice);
>    }
> +
> +  RefPtr<ID3D10Multithread> multi;
> +  mD3D11ImageBridgeDevice->QueryInterface(__uuidof(ID3D10Multithread), getter_AddRefs(multi));
> +  multi->SetMultithreadProtected(TRUE);

Need to null check 'multi' here.
Comment on attachment 8746943 [details]
Bug 1223270: P3. Remove extra copy of YUV buffer on Windows.

https://reviewboard.mozilla.org/r/49641/#review46645
Attachment #8746943 - Flags: review?(matt.woodrow) → review+
I looked at the file that was missing and I'm fine with that (no need to reset the review flag or whatnot). I would prefer to extend TextureClient with an efficient way to initialize it from ycbcr data rather than have an extra Image class (which pokes into the TextureClient's internals), but that can be addressed later.
Comment on attachment 8746942 [details]
Bug 1223270: P2. Add D3D11YCbCrImage type.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49639/diff/2-3/
Comment on attachment 8746943 [details]
Bug 1223270: P3. Remove extra copy of YUV buffer on Windows.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49641/diff/2-3/
Comment on attachment 8746942 [details]
Bug 1223270: P2. Add D3D11YCbCrImage type.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49639/diff/3-4/
Comment on attachment 8746943 [details]
Bug 1223270: P3. Remove extra copy of YUV buffer on Windows.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49641/diff/3-4/
Comment on attachment 8746942 [details]
Bug 1223270: P2. Add D3D11YCbCrImage type.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49639/diff/4-5/
Comment on attachment 8746943 [details]
Bug 1223270: P3. Remove extra copy of YUV buffer on Windows.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49641/diff/4-5/
Flags: needinfo?(matt.woodrow)
Assignee: matt.woodrow → jyavenard
Mass change P2 -> P3
Priority: P2 → P3
Comment on attachment 8746942 [details]
Bug 1223270: P2. Add D3D11YCbCrImage type.

https://reviewboard.mozilla.org/r/49639/#review139010

::: gfx/layers/D3D11YCbCrImage.cpp:16
(Diff revision 7)
> +using namespace mozilla::gfx;
> +
> +namespace mozilla {
> +namespace layers {
> +
> +D3D11YCbCrImage::D3D11YCbCrImage()

For the most part, this Image implementation seems really similar to IMFYCbCrImage.

The main differences are that IMFYCbCrImage holds a strong ref to the (locked) data, so that it can do uploads on demand (GetTextureClient) rather than needing to do it immediately.

It seems like we could combine all the shared code into one class, and have it support both modes (own data, or upload immediately).

If you don't want to do that, could you at least move AutoLockTexture and the texture allocation/upload code into methods on DXGIYCbCrTextureData?
The IMFYCbCrImage version is slightly better (has the mt->Enter/Leave bits, and comments/history about how we got there), so using that for both seems preferable.

::: gfx/layers/D3D11YCbCrImage.cpp:146
(Diff revision 7)
> +  RefPtr<ID3D11Texture2D> textureY;
> +  HRESULT hr =
> +    mDevice->CreateTexture2D(&newDesc, nullptr, getter_AddRefs(textureY));
> +  NS_ENSURE_TRUE(SUCCEEDED(hr), nullptr);
> +
> +  // TODO: Specify CbCr size!

Yes, please :)

::: gfx/layers/ImageContainer.cpp:438
(Diff revision 7)
>    }
>  }
>  
> +#ifdef XP_WIN
> +D3D11YCbCrRecycleAllocator*
> +ImageContainer::GetD3D11YCbCrRecycleAllocator(KnowsCompositor* aAllocator)

Is it possible that we change aAllocator?

Please either assert that this doesn't happen, or reallocate mD3D11YCbCrRecycleAllocator if it does.
Attachment #8746942 - Flags: review?(matt.woodrow)
Comment on attachment 8746943 [details]
Bug 1223270: P3. Remove extra copy of YUV buffer on Windows.

https://reviewboard.mozilla.org/r/49641/#review139012

Looks good!
Comment on attachment 8746942 [details]
Bug 1223270: P2. Add D3D11YCbCrImage type.

https://reviewboard.mozilla.org/r/49639/#review139010

> For the most part, this Image implementation seems really similar to IMFYCbCrImage.
> 
> The main differences are that IMFYCbCrImage holds a strong ref to the (locked) data, so that it can do uploads on demand (GetTextureClient) rather than needing to do it immediately.
> 
> It seems like we could combine all the shared code into one class, and have it support both modes (own data, or upload immediately).
> 
> If you don't want to do that, could you at least move AutoLockTexture and the texture allocation/upload code into methods on DXGIYCbCrTextureData?
> The IMFYCbCrImage version is slightly better (has the mt->Enter/Leave bits, and comments/history about how we got there), so using that for both seems preferable.

Which part is common with IMFYCbCrImage really?

I'm not sure they could ever be used interchangeably either. That is if one type is created it's always to be used with the same purpose.
They also don't take the same argument type (PlanarYCbCrData vs IMFMediaBuffer)...
In all, I'm not sure there's much to gain.

Yes, there's AutoLockTexture that is duplicated. This comes from the original patch that you wrote. I will limit that.
Comment on attachment 8746942 [details]
Bug 1223270: P2. Add D3D11YCbCrImage type.

https://reviewboard.mozilla.org/r/49639/#review139010

> Which part is common with IMFYCbCrImage really?
> 
> I'm not sure they could ever be used interchangeably either. That is if one type is created it's always to be used with the same purpose.
> They also don't take the same argument type (PlanarYCbCrData vs IMFMediaBuffer)...
> In all, I'm not sure there's much to gain.
> 
> Yes, there's AutoLockTexture that is duplicated. This comes from the original patch that you wrote. I will limit that.

Actually, I'll take this into a separate bug... This could be used to ease porting this functionality to other backend/architecture...
YouTube performance is really important and there's some good work going on here that I believe will be very relevant to improving the performance.
Blocks: 1357571
Whiteboard: [qf:p1]
(In reply to Jean-Yves Avenard [:jya] from comment #31)

> Which part is common with IMFYCbCrImage really?
> 
> I'm not sure they could ever be used interchangeably either. That is if one
> type is created it's always to be used with the same purpose.
> They also don't take the same argument type (PlanarYCbCrData vs
> IMFMediaBuffer)...
> In all, I'm not sure there's much to gain.
> 
> Yes, there's AutoLockTexture that is duplicated. This comes from the
> original patch that you wrote. I will limit that.

The bit I'm mainly worried about is the code for UpdateSubresource. The IMFYCbCrImage version uses D3D10Multithread to prevent races (maybe), and has comments about why we do this. This is the bit that I really want to be shared.

There's also code for allocating the textures, which matters less, but still duplicated.
It may be worth for someone from gfx to take over this bug, because of some memory OOM occurring and errors I do not understand.

like here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=38e33865582b9cd4f17a0be381735e23752cd006

The failure in WebGL and other components show a problem.

From a media perspective, everything is done in P3 and P4.
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(bas)
Upon further thoughts. I'm wondering how valid this approach is. 

By uploading the data directly to the D3D11 surface we may potentially use a big amount of GPU ram. 

With software decoding, the MDSM keeps a minimum of 10 decoded frames in Its queue.
Attachment #8864532 - Flags: review?(matt.woodrow)
Attachment #8746942 - Flags: review?(matt.woodrow)
Attachment #8864533 - Flags: review?(matt.woodrow)
Dropping review requests for now since this isn't finished.


(In reply to Jean-Yves Avenard [:jya] from comment #40)
> Upon further thoughts. I'm wondering how valid this approach is. 
> 
> By uploading the data directly to the D3D11 surface we may potentially use a
> big amount of GPU ram. 
> 
> With software decoding, the MDSM keeps a minimum of 10 decoded frames in Its
> queue.

That is indeed a concern, we might want to change the condition to only keep 3 frames if the output is going to be stored in GPU memory, rather than a hardware accelerated decoder.


(In reply to Jean-Yves Avenard [:jya] from comment #39)
> It may be worth for someone from gfx to take over this bug, because of some
> memory OOM occurring and errors I do not understand.
> 
> like here:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=38e33865582b9cd4f17a0be381735e23752cd006
> 
> The failure in WebGL and other components show a problem.
> 
> From a media perspective, everything is done in P3 and P4.

These looks like issues with drawing video->WebGL, so probably an issue with GetAsSourceSurface.
(In reply to Matt Woodrow (:mattwoodrow) from comment #41)
> Dropping review requests for now since this isn't finished.
> 
> 
> (In reply to Jean-Yves Avenard [:jya] from comment #40)
> > Upon further thoughts. I'm wondering how valid this approach is. 
> > 
> > By uploading the data directly to the D3D11 surface we may potentially use a
> > big amount of GPU ram. 
> > 
> > With software decoding, the MDSM keeps a minimum of 10 decoded frames in Its
> > queue.
> 
> That is indeed a concern, we might want to change the condition to only keep
> 3 frames if the output is going to be stored in GPU memory, rather than a
> hardware accelerated decoder.

FWIW, DirectX transparently pages to regular RAM for unused textures.

> (In reply to Jean-Yves Avenard [:jya] from comment #39)
> > It may be worth for someone from gfx to take over this bug, because of some
> > memory OOM occurring and errors I do not understand.
> > 
> > like here:
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=38e33865582b9cd4f17a0be381735e23752cd006
> > 
> > The failure in WebGL and other components show a problem.
> > 
> > From a media perspective, everything is done in P3 and P4.
> 
> These looks like issues with drawing video->WebGL, so probably an issue with
> GetAsSourceSurface.

Yeah, I've fixed all these.
Flags: needinfo?(bas)
Depends on: 1373163
See Also: → 1373229
Comment on attachment 8878667 [details]
Bug 1223270: P6. Disable WebGL fast path tests on windows.

https://reviewboard.mozilla.org/r/149964/#review154702
Attachment #8878667 - Flags: review?(jgilbert) → review+
Comment on attachment 8864533 [details]
Bug 1223270: P4. Remove extra copy of YUV buffer for Theora.

https://reviewboard.mozilla.org/r/136208/#review155454
Attachment #8864533 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8864532 [details]
Bug 1223270: P1. Extract AutoLockD3D11Texture.

https://reviewboard.mozilla.org/r/136206/#review155456
Attachment #8864532 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8878666 [details]
Bug 1223270: P5. Reduce VideoQueue size for GPU based images.

https://reviewboard.mozilla.org/r/149962/#review155458
Attachment #8878666 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8746942 [details]
Bug 1223270: P2. Add D3D11YCbCrImage type.

https://reviewboard.mozilla.org/r/49639/#review155450

This looks good, I'm just a little worried about D3D devices + multiple threads, since we've had real issues in the past.

I'm going to go through this again tomorrow, plus I'm adding Bas to take a second look.

::: gfx/layers/D3D11YCbCrImage.cpp:163
(Diff revision 10)
> +  if (!mt->GetMultithreadProtected()) {
> +    gfxCriticalError() << "Device used not marked as multithread-safe.";
> +    return nullptr;
> +  }
> +
> +  class D3D11MTAutoEnter

Can we use this for SetData too?
Attachment #8746942 - Flags: review?(matt.woodrow)
Flags: needinfo?(matt.woodrow)
Attachment #8746942 - Flags: review?(bas)
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8b06f81e24a8
P1. Extract AutoLockD3D11Texture. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/5d4f3028f48e
P2. Add D3D11YCbCrImage type. r=nical
https://hg.mozilla.org/integration/autoland/rev/c79b6d337d04
P3. Remove extra copy of YUV buffer on Windows. r=mattwoodrow,nical
https://hg.mozilla.org/integration/autoland/rev/b0b4667e25df
P4. Remove extra copy of YUV buffer for Theora. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/3e74b1fb1b79
P5. Reduce VideoQueue size for GPU based images. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/fc623f4ec820
P6. Disable WebGL fast path tests on windows. r=jgilbert
Caused compilation error due to bug 1374389, will be backed out
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/35093d05505c
Backed out changeset fc623f4ec820 
https://hg.mozilla.org/integration/autoland/rev/25d328d1bc16
Backed out changeset 3e74b1fb1b79 
https://hg.mozilla.org/integration/autoland/rev/5d2d884164ea
Backed out changeset b0b4667e25df 
https://hg.mozilla.org/integration/autoland/rev/3cfa1e95fa2e
Backed out changeset c79b6d337d04 
https://hg.mozilla.org/integration/autoland/rev/3f095ad6ccc2
Backed out changeset 5d4f3028f48e 
https://hg.mozilla.org/integration/autoland/rev/e92b4f95ddf0
Backed out changeset 8b06f81e24a8 for build/linker issues on a CLOSED TREE
Attachment #8746942 - Flags: review?(matt.woodrow) → review?(bas)
Attachment #8746942 - Flags: review?(matt.woodrow)
Comment on attachment 8746942 [details]
Bug 1223270: P2. Add D3D11YCbCrImage type.

https://reviewboard.mozilla.org/r/49639/#review155450

The logic is similar to what is done in IMFYCbCrImage. So if it's a problem here, it will be a problem there too.

Now, some changes are made in bug 1373229, the allows for fallback if D3D11 multi-threading isn't supported.

> Can we use this for SetData too?

I've done so in P7 (and also used in in IMFYCbCrImage)
Comment on attachment 8879671 [details]
Bug 1223270: P7. Extract D3D11MTAutoEnter and reuse.

https://reviewboard.mozilla.org/r/151020/#review156166
Attachment #8879671 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8746942 [details]
Bug 1223270: P2. Add D3D11YCbCrImage type.

https://reviewboard.mozilla.org/r/49639/#review156170
Attachment #8746942 - Flags: review?(matt.woodrow) → review+
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/52b484c23679
P1. Extract AutoLockD3D11Texture. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/04ce252928ce
P2. Add D3D11YCbCrImage type. r=mattwoodrow,nical
https://hg.mozilla.org/integration/autoland/rev/89e57eae53e9
P3. Remove extra copy of YUV buffer on Windows. r=mattwoodrow,nical
https://hg.mozilla.org/integration/autoland/rev/809aef3e8b19
P4. Remove extra copy of YUV buffer for Theora. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/79d9bc4917e1
P5. Reduce VideoQueue size for GPU based images. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/f2dce9290174
P6. Disable WebGL fast path tests on windows. r=jgilbert
https://hg.mozilla.org/integration/autoland/rev/6df92fc83b61
P7. Extract D3D11MTAutoEnter and reuse. r=mattwoodrow
See Also: → 1403618
Depends on: 1409141
Duplicate of this bug: 1210233
Depends on: 1418831
See Also: → 1479175
You need to log in before you can comment on or make changes to this bug.