[Wayland] Implement h.264 VA-API decode by ffmpeg
Categories
(Core :: Audio/Video: Playback, enhancement, P3)
Tracking
()
| | Tracking | Status |
|---|---|---|
| firefox75 | --- | fixed |
People
(Reporter: stransky, Assigned: stransky)
References
(Blocks 1 open bug)
Details
Attachments
(5 files)
Implement VA-API decode by ffmpeg on Wayland.
Comment 1•6 years ago (Assignee)
Comment 2•6 years ago (Assignee)
Implement a VA-API decoder on top of FFmpegDataDecoder.
Implement a VAAPIFrameHolder class to hold the decoded H.264 image used by the GL backend. We need to keep a reference to the frame because ffmpeg tends to reuse it for other video frames.
Depends on D63132
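The lifetime problem described above can be modelled in a few lines. This is a toy sketch in Python, not the patch's C++ code: `SurfacePool`, `FrameHolder` and `compositor_sees` are hypothetical names, and a small list stands in for the decoder's pool of hardware surfaces. It only illustrates why a held reference keeps a surface out of the reuse rotation.

```python
class SurfacePool:
    """Toy stand-in for a decoder's fixed pool of hardware surfaces (assumption)."""

    def __init__(self, size):
        self._contents = [None] * size
        self._refcounts = [0] * size

    def decode(self, frame_label):
        """Write a frame into the first surface nobody holds and return its index."""
        for index, refs in enumerate(self._refcounts):
            if refs == 0:
                self._contents[index] = frame_label
                return index
        raise RuntimeError("no free surface: every surface is still held")

    def read(self, index):
        return self._contents[index]

    def acquire(self, index):
        self._refcounts[index] += 1

    def release(self, index):
        self._refcounts[index] -= 1


class FrameHolder:
    """Keeps one surface out of the reuse rotation until release() is called."""

    def __init__(self, pool, index):
        self._pool = pool
        self.index = index
        self._released = False
        pool.acquire(index)

    def release(self):
        if not self._released:
            self._pool.release(self.index)
            self._released = True


def compositor_sees(hold_reference):
    """Return what the compositor reads if the decoder moves on before it draws."""
    pool = SurfacePool(size=2)
    first = pool.decode("frame 1")
    holder = FrameHolder(pool, first) if hold_reference else None
    pool.decode("frame 2")  # decoder produces the next frame in the meantime
    seen = pool.read(first)
    if holder is not None:
        holder.release()
    return seen
```

With a holder, `compositor_sees(True)` still reads "frame 1" after the decoder has produced the next frame; without one, the same surface has already been overwritten with "frame 2".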
Comment 3•6 years ago (Assignee)
Depends on D63133
Comment 4•6 years ago
Do you intend to complete things so that there are no readbacks?
When we last played with this approach, performance was consistently worse than with software decoders.
Comment 5•6 years ago (Assignee)
(In reply to Jean-Yves Avenard [:jya] from comment #4)
> Do you intend to complete things so that there are no readbacks?
> When we last played with this approach, performance was consistently worse than with software decoders.
I'm not sure what you mean by "no readbacks". Do you mean that we should copy every frame, right after creation, from the vasurface DRM buffer to a new GL texture and then render from it? I can check that.
Right now I render directly from the VA surface and I see a significant performance improvement over SW decoding. I checked mpv and it uses the same method (drawing directly from the vasurface), and it's the fastest rendering/playback I've seen so far.
There's also a difference depending on whether the video is played by WebRender or the GL compositor: the GL compositor seems to work better, and I see rendering artifacts in WebRender. Right now I'd go for direct rendering and the GL compositor, which works for me.
I can implement the vasurface -> GL texture copy in WaylandDMABUFSurface as an option that can be enabled, so there may not be much difference in the patches I submitted (we just don't need to keep a reference to the DRM buffers in this case).
Comment 6•6 years ago
Concerning Webrender, note bug 1579235, especially https://bugzilla.mozilla.org/show_bug.cgi?id=1579235#c8
Comment 7•6 years ago (Assignee)
(In reply to Jean-Yves Avenard [:jya] from comment #4)
> Do you intend to complete things so that there are no readbacks?
> When we last played with this approach, performance was consistently worse than with software decoders.
I tested it today with the GL compositor. I tested playback of 2K / Full HD and 720p clips on a 4K display on Intel 630 / Fedora 31 / Wayland.
With VA-API I get a constant 4-5% CPU utilization (on a 6-core CPU with HT on), which means one core is active and running at about 50% no matter which clip is played.
With SW decode + GL rendering I get 9% CPU usage for the 720p clip and 12-15% CPU usage for the Full HD/2K clips, which means one core is at about 100% and another one at 20-30%.
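The per-core reading of these numbers follows from simple arithmetic, assuming the percentages are a share of all logical CPUs (6 cores with Hyper-Threading, so 12). `busy_cores` is a hypothetical helper written for this illustration.

```python
def busy_cores(total_cpu_percent, physical_cores=6, threads_per_core=2):
    """Convert a whole-machine CPU percentage into 'fully busy logical CPUs'.

    Assumes the reported percentage is relative to all logical CPUs.
    """
    logical_cpus = physical_cores * threads_per_core
    return total_cpu_percent * logical_cpus / 100


if __name__ == "__main__":
    measurements = [
        ("VA-API", 4),
        ("VA-API", 5),
        ("SW decode, 720p", 9),
        ("SW decode, Full HD/2K", 12),
        ("SW decode, Full HD/2K", 15),
    ]
    for label, percent in measurements:
        print(f"{label}: {percent}% total is about {busy_cores(percent):.2f} logical CPUs")
```

Under that assumption, 4-5% corresponds to roughly half of one logical CPU, and 9-15% to between one and two, which is in line with the estimates given in the comment.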
Comment 8•6 years ago (Assignee)
For reference, mpv --hwdec=vaapi gives me about 2% CPU utilization no matter which clip is played.
Comment 9•6 years ago
(In reply to Martin Stránský [:stransky] from comment #5)
> (In reply to Jean-Yves Avenard [:jya] from comment #4)
> > Do you intend to complete things so that there are no readbacks?
> > When we last played with this approach, performance was consistently worse than with software decoders.
> I'm not sure what you mean by "no readbacks". Do you mean that we should copy every frame, right after creation, from the vasurface DRM buffer to a new GL texture and then render from it? I can check that.
No, I mean supporting the VA-OpenGL surface in the compositor and painting it directly, with no readback into a software buffer or DMA mapping.
This will require a much more extensive change, as you need to share the HW context with the compositor and have native support for VA-GL images.
In the meantime, I'd want this to be behind a pref that is disabled by default.
Comment 10•6 years ago (Assignee)
(In reply to Jean-Yves Avenard [:jya] from comment #9)
> No, I mean supporting the VA-OpenGL surface in the compositor and painting it directly, with no readback into a software buffer or DMA mapping.
I see. AFAIK a VA-OpenGL surface can be used on GLX only.
On Wayland we use dmabuf to share the HW context, and that is implemented by WaylandDMABUFSurfaces (Bug 1572697).
So yes, under Wayland we can render directly from a VASurface. This is also the reason the VAAPIFrameHolder class is used in the patch: it references the HW buffer for as long as it's used by the Gecko compositor. Without the reference, VASurfaces are reused by ffmpeg and video playback is scattered.
> This will require a much more extensive change, as you need to share the HW context with the compositor and have native support for VA-GL images.
Yes, that was worked on in Bug 1572697. We already have it implemented for the WebRender/GL compositor, and WebGL can also use it (Bug 1586696).
> In the meantime, I'd want this to be behind a pref that is disabled by default.
It's off by default; Bug 1616680 has the platform changes needed to enable it under a preference. I'd need to update the patch here for it.
Comment 12•6 years ago (Assignee)
Load and bind symbols from libva and libav needed for HW accelerated video decode.
Depends on D63134
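For illustration, run-time loading and symbol binding of this kind can be sketched with Python's `ctypes`. This is not the patch's code (the patch is C++); `bind_symbols` is a hypothetical helper, and the library and symbol names in the usage note are only examples. The point is the pattern: treat the feature as unavailable unless the library loads and every required symbol resolves.

```python
import ctypes
import ctypes.util


def bind_symbols(library_name, symbol_names):
    """Load a shared library and resolve every requested symbol.

    Returns a dict mapping symbol name to a callable, or None if the library
    is missing or any symbol cannot be resolved. Callers can then fall back
    to another code path (for example software decoding).
    """
    path = ctypes.util.find_library(library_name)
    if path is None:
        return None  # library not installed
    try:
        library = ctypes.CDLL(path)
    except OSError:
        return None  # found but could not be loaded
    bound = {}
    for name in symbol_names:
        try:
            bound[name] = getattr(library, name)
        except AttributeError:
            return None  # library is too old or lacks this entry point
    return bound
```

A call such as `bind_symbols("va", ["vaInitialize", "vaTerminate"])` returns `None` on a system without libva, so the caller can detect that hardware decoding is unavailable instead of crashing on a missing symbol.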
Comment 13•6 years ago
Thank you for this contribution. Quite nice.
I'm a bit unfamiliar with the mapping of the GPU image to be used later by the Wayland code. Does that perform a readback, or is the mapping handled like it would be with a GL surface handle?
And how does this work in a multi-process environment?
Decoding is currently done in the content process (but not for much longer), while compositing is in the GPU process.
Once bug 1595994 lands, decoding will be done in the RDD (remote data decoder) process.
I'm unfamiliar with how this is done, as I haven't worked with the latest vaapi version. The last time I implemented a vaapi decoder was almost 10 years ago (in MythTV); things have changed since.
Ultimately, on Windows we had to implement a GPU process and run the decoding there, because HW decoders and drivers have proven to be very crashy. So to enable this we will have to wait on bug 1595994.
Comment 14•6 years ago (Assignee)
(In reply to Jean-Yves Avenard [:jya] from comment #13)
> I'm a bit unfamiliar with the mapping of the GPU image to be used later by the Wayland code. Does that perform a readback, or is the mapping handled like it would be with a GL surface handle?
That was the most difficult part of the work. FFmpeg-decoded VASurfaces are GEM objects which can be mapped as a dmabuf object (an fd in user space), so there isn't any copy here. We use the EXT_image_dma_buf_import extension to map the dmabuf fd as an EGLImage without a copy. The issue is that the VASurface/GEM to dmabuf mapping isn't exact: the dmabuf object is live until the fd is closed, but the underlying GEM object can change.
So the problem is that the VASurface/GEM behind the same dmabuf fd gets altered, because VASurfaces are reused by the VA-API HW decoder. That's the reason for the frame holder class here: it keeps the VASurface/GEM mapped to the exact dmabuf object for as long as the dmabuf/EGLImage on top of it is used by the Gecko compositor.
So yes, we do direct rendering from VA-API decoded frames in Gecko without any copy.
I'm not sure what you mean by 'GL surface handle'. We use EGLImage, which is an abstraction over GPU memory and can be mapped as a texture or framebuffer, so it's quite versatile.
> And how does this work in a multi-process environment?
The VASurfaces/GEM mapped as dmabuf can be shared as an fd or as an EGLImage. I use the fd, and SurfaceDescriptorDMABuf is used for sharing.
> Decoding is currently done in the content process (but not for much longer), while compositing is in the GPU process.
> Once bug 1595994 lands, decoding will be done in the RDD (remote data decoder) process.
That's not a problem. But Wayland does not use the GPU process.
> I'm unfamiliar with how this is done, as I haven't worked with the latest vaapi version. The last time I implemented a vaapi decoder was almost 10 years ago (in MythTV); things have changed since.
> Ultimately, on Windows we had to implement a GPU process and run the decoding there, because HW decoders and drivers have proven to be very crashy. So to enable this we will have to wait on bug 1595994.
I guess there's no difference from the Wayland POV. It does not matter which process does the decoding, as the results are always shared via SurfaceDescriptorDMABuf.
Thanks.
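Sharing a buffer between processes as a file descriptor relies on a general Unix mechanism: an fd can be sent over a Unix domain socket, and the underlying object stays alive as long as any fd refers to it. The sketch below shows that mechanism with an ordinary temporary file standing in for a dmabuf, and with both socket ends in one process for brevity. It is not Gecko's IPC code, `share_fd_demo` is a hypothetical name, and it assumes Python 3.9+ on a Unix system for `socket.send_fds` and `socket.recv_fds`.

```python
import os
import socket
import tempfile


def share_fd_demo(payload: bytes) -> bytes:
    """Send a file descriptor over a Unix socket pair and read via the received copy."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as backing:
            backing.write(payload)
            backing.flush()
            # The fd travels as ancillary data next to a small regular message.
            socket.send_fds(sender, [b"frame"], [backing.fileno()])
            _msg, fds, _flags, _addr = socket.recv_fds(receiver, 16, 1)
        # The sender's handle is closed now; the received fd keeps the file alive.
        received = fds[0]
        try:
            os.lseek(received, 0, os.SEEK_SET)
            return os.read(received, len(payload))
        finally:
            os.close(received)
    finally:
        sender.close()
        receiver.close()
```

The data is still readable through the received fd after the sender has closed its own handle. That is the same property that keeps a dmabuf object live until its last fd is closed.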
Comment 15•6 years ago
Comment 16•6 years ago (Assignee)
Try for all four patches: https://treeherder.mozilla.org/#/jobs?repo=try&revision=90296ee316295b3b548dd3cfa9c68684972da633
Comment 18•6 years ago
Comment 19•6 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/e9ba11d2516b
https://hg.mozilla.org/mozilla-central/rev/da012bb39f2b
https://hg.mozilla.org/mozilla-central/rev/e0f4279ea250
https://hg.mozilla.org/mozilla-central/rev/5ebaa08b1816
Comment hidden (obsolete)
Comment 22•6 years ago (Assignee)
Please try the latest Nightly and open a new bug if it's broken for you there. VA-API support is not finished in Firefox 75.
Comment hidden (obsolete)
Comment hidden (obsolete)
Comment 25•5 years ago
System: Host: a Kernel: 5.7.9-1-MANJARO x86_64 bits: 64 compiler: gcc v: 10.1.0 Desktop: KDE Plasma 5.19.3
Distro: Manjaro Linux
Machine: Type: Laptop System: Dell product: XPS 15 9560 v: N/A serial: <filter>
Mobo: Dell model: 0YH90J v: A04 serial: <filter> UEFI: Dell v: 1.19.2 date: 05/22/2020
CPU: Topology: Quad Core model: Intel Core i7-7700HQ bits: 64 type: MT MCP arch: Kaby Lake rev: 9 L2 cache: 6144 KiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 44817
Speed: 1000 MHz min/max: 800/2800 MHz Core speeds (MHz): 1: 1000 2: 1000 3: 1000 4: 1000 5: 1000 6: 1001 7: 1000
8: 1000
Graphics: Device-1: Intel HD Graphics 630 vendor: Dell driver: i915 v: kernel bus ID: 00:02.0
Device-2: NVIDIA GP107M [GeForce GTX 1050 Mobile] driver: N/A bus ID: 01:00.0
Display: x11 server: X.Org 1.20.8 driver: modesetting resolution: 3840x2160~60Hz
OpenGL: renderer: Mesa Intel HD Graphics 630 (KBL GT2) v: 4.6 Mesa 20.1.3 direct render: Yes
Audio: Device-1: Intel CM238 HD Audio vendor: Dell driver: snd_hda_intel v: kernel bus ID: 00:1f.3
Sound Server: ALSA v: k5.7.9-1-MANJARO
Nightly 80.0a1 (2020-07-24)
env MOZ_X11_EGL=1 MOZ_LOG="PlatformDecoderModule:5" firefox-nightly
Since today (or yesterday?) video playback is quite broken. It plays, but from time to time it shows an all-green frame and flickers. Sometimes it jumps back to the first frame (00:00) and then plays again.
I'll attach logs.
Both normal MP4 files and YouTube videos are affected.
Comment 26•5 years ago
Comment 27•5 years ago (Assignee)
(In reply to dontdieych from comment #25)
> Since today (or yesterday?) video playback is quite broken. It plays, but from time to time it shows an all-green frame and flickers. Sometimes it jumps back to the first frame (00:00) and then plays again.
Please try the latest Nightly; it should be fixed now. File a new bug if you still see it.
Thanks.