Closed Bug 1223270 Opened 9 years ago Closed 8 years ago

Add support for texture recycling to ffmpeg backend on Windows

Categories

(Core :: Audio/Video: Playback, defect, P3)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla56
Performance Impact high
Tracking Status
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- fixed

People

(Reporter: ajones, Assigned: jya)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(10 files)

Texture allocation is slow on nVidia GPUs so recycling the textures will be faster. The ffmpeg backend also has an unnecessary copy that could be replaced by a texture upload.
Flags: needinfo?(matt.woodrow)
Assignee: nobody → matt.woodrow
Depends on: 1214462
I think this is basically what we want, extending it for ffmpeg is only a few lines. I don't see any improvements on my machine though (only dropping frames on level 5.2 of Anthony's frame drop test). https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb1052feea43
Flags: needinfo?(matt.woodrow)
Assignee: matt.woodrow → jyavenard
Status: NEW → ASSIGNED
Assignee: jyavenard → matt.woodrow
Priority: -- → P2
Matt, I had a shot at rebasing this change, but there are things too convoluted for me to be satisfied in the knowledge that my rebase was correct. In particular the changes to gfx/layers/d3d11/TextureD3D11.cpp ; there are modifications to the YCbCrKeepAliveD3D11 class, class that was introduced in the middle of bug 1200595 only to be removed in commit https://hg.mozilla.org/mozilla-central/rev/49e11078c536 And since, anything related to *YCbCrTextureClient classes have been removed. Could you rebase this patch please? We would like to integrate it and further improve our VP9 decoding performance. Thank you
Flags: needinfo?(matt.woodrow)
Attached patch Rebased patchSplinter Review
Flags: needinfo?(matt.woodrow)
Attachment #8746942 - Flags: review?(nical.bugzilla)
Attachment #8746943 - Flags: review?(matt.woodrow)
Attachment #8746942 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8746943 [details] Bug 1223270: P3. Remove extra copy of YUV buffer on Windows. https://reviewboard.mozilla.org/r/49641/#review46419
Attachment #8746943 - Flags: review+
Comment on attachment 8746942 [details] Bug 1223270: P2. Add D3D11YCbCrImage type. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49639/diff/1-2/
Comment on attachment 8746943 [details] Bug 1223270: P3. Remove extra copy of YUV buffer on Windows. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49641/diff/1-2/
https://reviewboard.mozilla.org/r/49637/#review46455 I had missed two files in the rebase and where added to version control, gfx/layers/D3D11YCbCrImage.cpp/h nical, you may want to have a 2nd look again...
What happens when we get a device reset and the layers backend changes? It's handled?
https://reviewboard.mozilla.org/r/49639/#review46643 ::: gfx/thebes/gfxWindowsPlatform.cpp:2261 (Diff revision 2) > ContentAdapterIsParentAdapter(mD3D11ImageBridgeDevice); > } > + > + RefPtr<ID3D10Multithread> multi; > + mD3D11ImageBridgeDevice->QueryInterface(__uuidof(ID3D10Multithread), getter_AddRefs(multi)); > + multi->SetMultithreadProtected(TRUE); Need to null check 'multi' here.
Comment on attachment 8746943 [details] Bug 1223270: P3. Remove extra copy of YUV buffer on Windows. https://reviewboard.mozilla.org/r/49641/#review46645
Attachment #8746943 - Flags: review?(matt.woodrow) → review+
I looked at the file that was missing and I'm fine with that (no need to reset the review flag or whatnot). I would prefer to extend TextureClient with an efficient way to initialize it from ycbcr data rather than have an extra Image class (which pokes into the TextureClient's internals), but that can be addressed later.
Comment on attachment 8746942 [details] Bug 1223270: P2. Add D3D11YCbCrImage type. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49639/diff/2-3/
Comment on attachment 8746943 [details] Bug 1223270: P3. Remove extra copy of YUV buffer on Windows. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49641/diff/2-3/
Comment on attachment 8746942 [details] Bug 1223270: P2. Add D3D11YCbCrImage type. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49639/diff/3-4/
Comment on attachment 8746943 [details] Bug 1223270: P3. Remove extra copy of YUV buffer on Windows. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49641/diff/3-4/
Comment on attachment 8746942 [details] Bug 1223270: P2. Add D3D11YCbCrImage type. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49639/diff/4-5/
Comment on attachment 8746943 [details] Bug 1223270: P3. Remove extra copy of YUV buffer on Windows. Review request updated; see interdiff: https://reviewboard.mozilla.org/r/49641/diff/4-5/
Flags: needinfo?(matt.woodrow)
Assignee: matt.woodrow → jyavenard
Mass change P2 -> P3
Priority: P2 → P3
Comment on attachment 8746942 [details] Bug 1223270: P2. Add D3D11YCbCrImage type. https://reviewboard.mozilla.org/r/49639/#review139010 ::: gfx/layers/D3D11YCbCrImage.cpp:16 (Diff revision 7) > +using namespace mozilla::gfx; > + > +namespace mozilla { > +namespace layers { > + > +D3D11YCbCrImage::D3D11YCbCrImage() For the most part, this Image implementation seems really similar to IMFYCbCrImage. The main differences are that IMFYCbCrImage holds a strong ref to the (locked) data, so that it can do uploads on demand (GetTextureClient) rather than needing to do it immediately. It seems like we could combine all the shared code into one class, and have it support both modes (own data, or upload immediately). If you don't want to do that, could you at least move AutoLockTexture and the texture allocation/upload code into methods on DXGIYCbCrTextureData? The IMFYCbCrImage version is slightly better (has the mt->Enter/Leave bits, and comments/history about how we got there), so using that for both seems preferable. ::: gfx/layers/D3D11YCbCrImage.cpp:146 (Diff revision 7) > + RefPtr<ID3D11Texture2D> textureY; > + HRESULT hr = > + mDevice->CreateTexture2D(&newDesc, nullptr, getter_AddRefs(textureY)); > + NS_ENSURE_TRUE(SUCCEEDED(hr), nullptr); > + > + // TODO: Specify CbCr size! Yes, please :) ::: gfx/layers/ImageContainer.cpp:438 (Diff revision 7) > } > } > > +#ifdef XP_WIN > +D3D11YCbCrRecycleAllocator* > +ImageContainer::GetD3D11YCbCrRecycleAllocator(KnowsCompositor* aAllocator) Is it possible that we change aAllocator? Please either assert that this doesn't happen, or reallocate mD3D11YCbCrRecycleAllocator if it does.
Attachment #8746942 - Flags: review?(matt.woodrow)
Comment on attachment 8746943 [details] Bug 1223270: P3. Remove extra copy of YUV buffer on Windows. https://reviewboard.mozilla.org/r/49641/#review139012 Looks good!
Comment on attachment 8746942 [details] Bug 1223270: P2. Add D3D11YCbCrImage type. https://reviewboard.mozilla.org/r/49639/#review139010 > For the most part, this Image implementation seems really similar to IMFYCbCrImage. > > The main differences are that IMFYCbCrImage holds a strong ref to the (locked) data, so that it can do uploads on demand (GetTextureClient) rather than needing to do it immediately. > > It seems like we could combine all the shared code into one class, and have it support both modes (own data, or upload immediately). > > If you don't want to do that, could you at least move AutoLockTexture and the texture allocation/upload code into methods on DXGIYCbCrTextureData? > The IMFYCbCrImage version is slightly better (has the mt->Enter/Leave bits, and comments/history about how we got there), so using that for both seems preferable. Which part is common with IMFYCbCrImage really? I'm not sure they could ever be used interchangeably either. That is if one type is created it's always to be used with the same purpose. They also don't take the same argument type (PlanarYCbCrData vs IMFMediaBuffer)... In all, I'm not sure there's much to gain. Yes, there's AutoLockTexture that is duplicated. This comes from the original patch that you wrote. I will limit that.
Comment on attachment 8746942 [details] Bug 1223270: P2. Add D3D11YCbCrImage type. https://reviewboard.mozilla.org/r/49639/#review139010 > Which part is common with IMFYCbCrImage really? > > I'm not sure they could ever be used interchangeably either. That is if one type is created it's always to be used with the same purpose. > They also don't take the same argument type (PlanarYCbCrData vs IMFMediaBuffer)... > In all, I'm not sure there's much to gain. > > Yes, there's AutoLockTexture that is duplicated. This comes from the original patch that you wrote. I will limit that. Actually, I'll take this into a separate bug... This could be used to ease porting this functionality to other backend/architecture...
YouTube performance is really important and there's some good work going on here that I believe will be very relevant to improving the performance.
Blocks: 1357571
Whiteboard: [qf:p1]
(In reply to Jean-Yves Avenard [:jya] from comment #31) > Which part is common with IMFYCbCrImage really? > > I'm not sure they could ever be used interchangeably either. That is if one > type is created it's always to be used with the same purpose. > They also don't take the same argument type (PlanarYCbCrData vs > IMFMediaBuffer)... > In all, I'm not sure there's much to gain. > > Yes, there's AutoLockTexture that is duplicated. This comes from the > original patch that you wrote. I will limit that. The bit I'm mainly worried about is the code for UpdateSubresource. The IMFYCbCrImage version uses D3D10Multithread to prevent races (maybe), and has comments about why we do this. This is the bit that I really want to be shared. There's also code for allocating the textures, which matters less, but still duplicated.
It may be worth for someone from gfx to take over this bug, because of some memory OOM occurring and errors I do not understand. like here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=38e33865582b9cd4f17a0be381735e23752cd006 The failure in WebGL and other components show a problem. From a media perspective, everything is done in P3 and P4.
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(bas)
Upon further thoughts. I'm wondering how valid this approach is. By uploading the data directly to the D3D11 surface we may potentially use a big amount of GPU ram. With software decoding, the MDSM keeps a minimum of 10 decoded frames in Its queue.
Attachment #8864532 - Flags: review?(matt.woodrow)
Attachment #8746942 - Flags: review?(matt.woodrow)
Attachment #8864533 - Flags: review?(matt.woodrow)
Dropping review requests for now since this isn't finished. (In reply to Jean-Yves Avenard [:jya] from comment #40) > Upon further thoughts. I'm wondering how valid this approach is. > > By uploading the data directly to the D3D11 surface we may potentially use a > big amount of GPU ram. > > With software decoding, the MDSM keeps a minimum of 10 decoded frames in Its > queue. That is indeed a concern, we might want to change the condition to only keep 3 frames if the output is going to be stored in GPU memory, rather than a hardware accelerated decoder. (In reply to Jean-Yves Avenard [:jya] from comment #39) > It may be worth for someone from gfx to take over this bug, because of some > memory OOM occurring and errors I do not understand. > > like here: > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=38e33865582b9cd4f17a0be381735e23752cd006 > > The failure in WebGL and other components show a problem. > > From a media perspective, everything is done in P3 and P4. These looks like issues with drawing video->WebGL, so probably an issue with GetAsSourceSurface.
(In reply to Matt Woodrow (:mattwoodrow) from comment #41) > Dropping review requests for now since this isn't finished. > > > (In reply to Jean-Yves Avenard [:jya] from comment #40) > > Upon further thoughts. I'm wondering how valid this approach is. > > > > By uploading the data directly to the D3D11 surface we may potentially use a > > big amount of GPU ram. > > > > With software decoding, the MDSM keeps a minimum of 10 decoded frames in Its > > queue. > > That is indeed a concern, we might want to change the condition to only keep > 3 frames if the output is going to be stored in GPU memory, rather than a > hardware accelerated decoder. FWIW, DirectX transparently pages to regular RAM for unused textures. > (In reply to Jean-Yves Avenard [:jya] from comment #39) > > It may be worth for someone from gfx to take over this bug, because of some > > memory OOM occurring and errors I do not understand. > > > > like here: > > https://treeherder.mozilla.org/#/ > > jobs?repo=try&revision=38e33865582b9cd4f17a0be381735e23752cd006 > > > > The failure in WebGL and other components show a problem. > > > > From a media perspective, everything is done in P3 and P4. > > These looks like issues with drawing video->WebGL, so probably an issue with > GetAsSourceSurface. Yeah, I've fixed all these.
Flags: needinfo?(bas)
Depends on: 1373163
See Also: → 1373229
Attachment #8878667 - Flags: review?(jgilbert) → review+
Comment on attachment 8864533 [details] Bug 1223270: P4. Remove extra copy of YUV buffer for Theora. https://reviewboard.mozilla.org/r/136208/#review155454
Attachment #8864533 - Flags: review?(matt.woodrow) → review+
Attachment #8864532 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8878666 [details] Bug 1223270: P5. Reduce VideoQueue size for GPU based images. https://reviewboard.mozilla.org/r/149962/#review155458
Attachment #8878666 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8746942 [details] Bug 1223270: P2. Add D3D11YCbCrImage type. https://reviewboard.mozilla.org/r/49639/#review155450 This looks good, I'm just a little worried about D3D devices + multiple threads, since we've had real issues in the past. I'm going to go through this again tomorrow, plus I'm adding Bas to take a second look. ::: gfx/layers/D3D11YCbCrImage.cpp:163 (Diff revision 10) > + if (!mt->GetMultithreadProtected()) { > + gfxCriticalError() << "Device used not marked as multithread-safe."; > + return nullptr; > + } > + > + class D3D11MTAutoEnter Can we use this for SetData too?
Attachment #8746942 - Flags: review?(matt.woodrow)
Flags: needinfo?(matt.woodrow)
Attachment #8746942 - Flags: review?(bas)
Pushed by jyavenard@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8b06f81e24a8 P1. Extract AutoLockD3D11Texture. r=mattwoodrow https://hg.mozilla.org/integration/autoland/rev/5d4f3028f48e P2. Add D3D11YCbCrImage type. r=nical https://hg.mozilla.org/integration/autoland/rev/c79b6d337d04 P3. Remove extra copy of YUV buffer on Windows. r=mattwoodrow,nical https://hg.mozilla.org/integration/autoland/rev/b0b4667e25df P4. Remove extra copy of YUV buffer for Theora. r=mattwoodrow https://hg.mozilla.org/integration/autoland/rev/3e74b1fb1b79 P5. Reduce VideoQueue size for GPU based images. r=mattwoodrow https://hg.mozilla.org/integration/autoland/rev/fc623f4ec820 P6. Disable WebGL fast path tests on windows. r=jgilbert
Caused compilation error due to bug 1374389, will be backed out
Backout by cbook@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/35093d05505c Backed out changeset fc623f4ec820 https://hg.mozilla.org/integration/autoland/rev/25d328d1bc16 Backed out changeset 3e74b1fb1b79 https://hg.mozilla.org/integration/autoland/rev/5d2d884164ea Backed out changeset b0b4667e25df https://hg.mozilla.org/integration/autoland/rev/3cfa1e95fa2e Backed out changeset c79b6d337d04 https://hg.mozilla.org/integration/autoland/rev/3f095ad6ccc2 Backed out changeset 5d4f3028f48e https://hg.mozilla.org/integration/autoland/rev/e92b4f95ddf0 Backed out changeset 8b06f81e24a8 for build/linker issues on a CLOSED TREE
Attachment #8746942 - Flags: review?(matt.woodrow) → review?(bas)
Attachment #8746942 - Flags: review?(matt.woodrow)
Comment on attachment 8746942 [details] Bug 1223270: P2. Add D3D11YCbCrImage type. https://reviewboard.mozilla.org/r/49639/#review155450 The logic is similar to what is done in IMFYCbCrImage. So if it's a problem here, it will be a problem there too. Now, some changes are made in bug 1373229, the allows for fallback if D3D11 multi-threading isn't supported. > Can we use this for SetData too? I've done so in P7 (and also used in in IMFYCbCrImage)
Attachment #8879671 - Flags: review?(matt.woodrow) → review+
Attachment #8746942 - Flags: review?(matt.woodrow) → review+
Pushed by jyavenard@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/52b484c23679 P1. Extract AutoLockD3D11Texture. r=mattwoodrow https://hg.mozilla.org/integration/autoland/rev/04ce252928ce P2. Add D3D11YCbCrImage type. r=mattwoodrow,nical https://hg.mozilla.org/integration/autoland/rev/89e57eae53e9 P3. Remove extra copy of YUV buffer on Windows. r=mattwoodrow,nical https://hg.mozilla.org/integration/autoland/rev/809aef3e8b19 P4. Remove extra copy of YUV buffer for Theora. r=mattwoodrow https://hg.mozilla.org/integration/autoland/rev/79d9bc4917e1 P5. Reduce VideoQueue size for GPU based images. r=mattwoodrow https://hg.mozilla.org/integration/autoland/rev/f2dce9290174 P6. Disable WebGL fast path tests on windows. r=jgilbert https://hg.mozilla.org/integration/autoland/rev/6df92fc83b61 P7. Extract D3D11MTAutoEnter and reuse. r=mattwoodrow
See Also: → 1403618
Depends on: 1409141
Depends on: 1418831
See Also: → 1479175
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: