Closed Bug 1616185 Opened 1 year ago Closed 1 year ago

[Wayland] Implement h.264 VA-API decode by ffmpeg

Categories

(Core :: Audio/Video: Playback, enhancement, P3)

Tracking

()

RESOLVED FIXED
mozilla75
Tracking Status
firefox75 --- fixed

People

(Reporter: stransky, Assigned: stransky)

References

(Blocks 1 open bug)

Details

Attachments

(5 files)

Implement VA-API decode by ffmpeg on Wayland.

Assignee: nobody → stransky
Status: NEW → ASSIGNED

Implement VA-API decoder on top of FFmpegDataDecoder.
Implement a VAAPIFrameHolder class to hold the decoded h264 image used by the GL backend.
We need to keep a reference to the frame because ffmpeg tends to reuse it for other video frames.

Depends on D63132

Do you intend to complete things so that there are no readbacks?

When we last played with this approach, performance was consistently worse than with software decoders.

Flags: needinfo?(stransky)

(In reply to Jean-Yves Avenard [:jya] from comment #4)

Do you intend to complete things so that there are no readbacks?

When we last played with this approach, performance was consistently worse than with software decoders.

I'm not sure what you mean by "no readbacks". Do you mean that we should copy every frame, right after creation, from the vasurface DRM buffer to a new GL texture and then render from it? I can check that.

Right now I use direct rendering from the va surface and I see a significant performance improvement over SW decoding. I checked mpv and it uses the same method (drawing directly from the vasurface), and it's the fastest rendering/playback I've seen so far.

There's also a difference depending on whether the video is played by WebRender or the GL compositor: the GL compositor seems to work better, and I see rendering artifacts in WebRender. Right now I'd go for direct rendering & the GL compositor, which works for me.

I can implement the vasurface -> GL texture copy in WaylandDMABUFSurface as an option that can be enabled, so there may not be much difference in the patches I submitted (we just don't need to keep a reference to the DRM buffers in this case).

Flags: needinfo?(stransky)
See Also: → 1579235
Depends on: 1616590

(In reply to Jean-Yves Avenard [:jya] from comment #4)

Do you intend to complete things so that there are no readbacks?

When we last played with this approach, performance was consistently worse than with software decoders.

I tested it today with GL compositor. I tested 2K / full HD and 720p clips playback on 4K display on Intel 630 / Fedora 31 / Wayland.

With VAAPI I have constant 4-5% cpu utilization (on 6 core CPU + HT on) which means one core is active and running about 50% no matter which clip is played.

With SW decode + GL rendering I have 9% cpu usage for 720p clip and 12-15% cpu usage for FullHD/2K clips which means one core is about 100% and another one 20-30%.

For reference mpv --hwdec=vaapi gives me about 2% cpu utilization no matter which clip is played.
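
The per-core reading of these percentages follows from the 12 logical CPUs of a 6-core CPU with HT: the system-wide utilization multiplied by 12 gives the load in units of one fully busy logical core. A minimal sketch of that arithmetic, assuming the reported figures are system-wide averages; `BusyCores` and `PrintBusyCores` are hypothetical helper names, not anything from the patches:

```cpp
#include <cstdio>

// Assumption: 6 physical cores with Hyper-Threading = 12 logical CPUs,
// as described in the comment above.
constexpr int kLogicalCpus = 6 * 2;

// Convert a system-wide CPU utilization (in percent) into the equivalent
// number of fully busy logical cores.
constexpr double BusyCores(double overallPercent) {
  return overallPercent / 100.0 * kLogicalCpus;
}

// Print the conversion for the utilization figures quoted in the comment.
void PrintBusyCores() {
  const double samples[] = {2.0, 4.0, 5.0, 9.0, 12.0, 15.0};
  for (double percent : samples) {
    std::printf("%5.1f%% overall ~ %.2f logical cores\n", percent,
                BusyCores(percent));
  }
}
```

By this measure 4-5% is roughly half of one logical core, while 9% is slightly more than one full core and 12-15% is between one and two cores, which roughly matches the description above.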

No longer depends on: 1616590

(In reply to Martin Stránský [:stransky] from comment #5)

(In reply to Jean-Yves Avenard [:jya] from comment #4)

Do you intend to complete things so that there are no readbacks?

When we last played with this approach, performance was consistently worse than with software decoders.

I'm not sure what you mean by "no readbacks". Do you mean that we should copy every frame, right after creation, from the vasurface DRM buffer to a new GL texture and then render from it? I can check that.

No, I mean supporting VA-OpenGL surfaces in the compositor and painting them directly. No readback into a software buffer or DMA mapping.

This will require a much more extensive change, as you need to share the HW context with the compositor and have native support for VA-GL images.

In the meantime, I'd want this to be behind a pref that is disabled by default.

(In reply to Jean-Yves Avenard [:jya] from comment #9)

No, I mean supporting VA-OpenGL surfaces in the compositor and painting them directly. No readback into a software buffer or DMA mapping.

I see. AFAIK VA-OpenGL surface can be used on GLX only.

On Wayland we use dmabuf to share hw context and it's implemented by WaylandDMABUFSurfaces - Bug 1572697.

So yes, under Wayland we can render directly from a VASurface. This is also the reason why the VAAPIFrameHolder() class is used in the patch - it references the HW buffer for as long as it's used by the Gecko compositor. Without the reference, VASurfaces are reused by ffmpeg and video playback is corrupted.
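
The frame-holder idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not the VAAPIFrameHolder from the patch: `Surface`, `SurfacePool` and `FrameHolder` are hypothetical names, and a `std::shared_ptr` reference count stands in for the real references to ffmpeg/VA-API buffers. The pool, like the decoder, recycles any surface that nobody else references; the holder pins a surface while the compositor still needs it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoder-owned surface (a VASurface in the bug).
struct Surface {
  explicit Surface(int aId) : id(aId) {}
  int id;
  int frameNumber = -1;  // which decoded frame currently lives in the surface
};

// Toy decoder pool: it reuses any surface that nobody else references.
class SurfacePool {
 public:
  explicit SurfacePool(std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
      mSurfaces.push_back(std::make_shared<Surface>(static_cast<int>(i)));
    }
  }

  // Decode the next frame into a free surface; nullptr if all are pinned.
  std::shared_ptr<Surface> Decode(int frameNumber) {
    for (auto& surface : mSurfaces) {
      if (surface.use_count() == 1) {  // only the pool holds it, so it is free
        surface->frameNumber = frameNumber;
        return surface;
      }
    }
    return nullptr;
  }

 private:
  std::vector<std::shared_ptr<Surface>> mSurfaces;
};

// Holder handed to the compositor: pins the surface until it is destroyed.
class FrameHolder {
 public:
  explicit FrameHolder(std::shared_ptr<Surface> surface)
      : mSurface(std::move(surface)) {
    assert(mSurface && "a holder needs a decoded surface");
  }
  int SurfaceId() const { return mSurface->id; }
  int FrameNumber() const { return mSurface->frameNumber; }

 private:
  std::shared_ptr<Surface> mSurface;
};
```

With a pool of two surfaces, a `FrameHolder` created from `pool.Decode(1)` keeps reporting frame 1 while later `Decode` calls cycle through the other surface; once the holder is destroyed, the pinned surface becomes available for reuse again.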

This will require a much more extensive change, as you need to share the HW context with the compositor and have native support for VA-GL images.

Yes, that was worked on in Bug 1572697. We already have that implemented for WebRender/GL compositor, and WebGL can also use it (Bug 1586696).

In the meantime, I'd want this to be behind a pref that is disabled by default.

It's off by default. Bug 1616680 has the platform changes needed to enable it behind a preference. I'd need to update the patch here for it.

Duplicate of this bug: 1616128

Load and bind symbols from libva and libav needed for HW accelerated video decode.

Depends on D63134

Priority: -- → P3

Thank you for this contribution. Quite nice.

I'm a bit unfamiliar with the mapping of the GPU image to be used later by the Wayland code. Does that perform a readback, or is the mapping handled like it would be with a GL surface handle?
And how does this work in a multi-process environment?
Decoding is currently done in the content process (but not for much longer), while compositing is in the GPU process.
Once bug 1595994 lands, decoding will be done in the RDD (remote data decoder) process.

I'm unfamiliar with how this is done; I haven't done work with the latest vaapi version. The last time I implemented a vaapi decoder was almost 10 years ago (in mythtv), and things have changed since.

Ultimately, on Windows we had to implement a GPU process and run the decoding there, because HW decoders and drivers have proven to be very crashy. So to enable this we will have to wait on bug 1595994.

(In reply to Jean-Yves Avenard [:jya] from comment #13)

I'm a bit unfamiliar with the mapping of the GPU image to be used later by the Wayland code. Does that perform a readback, or is the mapping handled like it would be with a GL surface handle?

That was the most difficult part of the work. VASurfaces decoded by FFmpeg are GEM objects which can be mapped as a dmabuf object (an fd in user space), so there isn't any copy here. We use the EXT_image_dma_buf_import extension to map the dmabuf fd as an EGLImage without a copy. The issue here is that the VASurface/GEM to dmabuf mapping isn't exact: the dmabuf object is live until the fd is closed, but the underlying GEM object can be changed.

So the problem is that the VASurface/GEM behind the same dmabuf fd gets altered, because VASurfaces are reused by the va-api hw decoder. That's the reason for the frame holder class here - to keep the VASurface/GEM mapped to the exact dmabuf object for as long as the dmabuf/EGLImage on top of it is used by the Gecko compositor.

So yes, we do direct rendering from va-api decoded frames in gecko without any copy.
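
The zero-copy import described above boils down to passing a dmabuf description to EGL. The following is a sketch, not the code from the patch: it only builds the attribute list that EXT_image_dma_buf_import expects for a single-plane buffer. `DmabufPlane`, `Fourcc` and `BuildDmabufImageAttribs` are hypothetical names, and the token values are copied from the EGL headers so that the snippet compiles without EGL installed; real code should include `<EGL/egl.h>` and `<EGL/eglext.h>` instead. Decoded video surfaces often have more than one plane (NV12, for example), which needs the additional `PLANE1` attributes not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Token values as defined in EGL/egl.h and EGL/eglext.h, copied here only so
// the sketch builds stand-alone. Real code should use the official headers.
constexpr std::int32_t kEGL_NONE = 0x3038;
constexpr std::int32_t kEGL_HEIGHT = 0x3056;
constexpr std::int32_t kEGL_WIDTH = 0x3057;
constexpr std::int32_t kEGL_LINUX_DRM_FOURCC_EXT = 0x3271;
constexpr std::int32_t kEGL_DMA_BUF_PLANE0_FD_EXT = 0x3272;
constexpr std::int32_t kEGL_DMA_BUF_PLANE0_OFFSET_EXT = 0x3273;
constexpr std::int32_t kEGL_DMA_BUF_PLANE0_PITCH_EXT = 0x3274;

// DRM fourcc codes pack four ASCII characters, least significant byte first.
constexpr std::uint32_t Fourcc(char a, char b, char c, char d) {
  return static_cast<std::uint32_t>(a) | (static_cast<std::uint32_t>(b) << 8) |
         (static_cast<std::uint32_t>(c) << 16) |
         (static_cast<std::uint32_t>(d) << 24);
}

// Hypothetical description of a single-plane dmabuf.
struct DmabufPlane {
  int fd;                // dmabuf file descriptor
  std::int32_t width;    // in pixels
  std::int32_t height;   // in pixels
  std::uint32_t fourcc;  // DRM pixel format code
  std::int32_t offset;   // byte offset of the plane inside the buffer
  std::int32_t pitch;    // bytes per row
};

// Build the EGL_NONE-terminated attribute list for the import.
std::vector<std::int32_t> BuildDmabufImageAttribs(const DmabufPlane& plane) {
  assert(plane.fd >= 0 && plane.width > 0 && plane.height > 0);
  return {
      kEGL_WIDTH, plane.width,
      kEGL_HEIGHT, plane.height,
      kEGL_LINUX_DRM_FOURCC_EXT, static_cast<std::int32_t>(plane.fourcc),
      kEGL_DMA_BUF_PLANE0_FD_EXT, plane.fd,
      kEGL_DMA_BUF_PLANE0_OFFSET_EXT, plane.offset,
      kEGL_DMA_BUF_PLANE0_PITCH_EXT, plane.pitch,
      kEGL_NONE,
  };
}
```

With the real headers, such a list would be handed to `eglCreateImageKHR(display, EGL_NO_CONTEXT, EGL_LINUX_DMA_BUF_EXT, nullptr, attribs.data())`. The resulting EGLImage refers to the same GPU memory as the dmabuf, so no pixel data is copied.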

I'm not sure what you mean by 'GL surface handle'. We use EGLImage, which is an abstraction over GPU memory and can be mapped as a texture/framebuffer, so it's quite versatile.

And how does this work in a multi-process environment?

The VASurfaces/GEM mapped as dmabuf can be shared as fd or EGLImage. I use fd and SurfaceDescriptorDMABuf is used for sharing.
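
For illustration, one standard way to hand a file descriptor such as a dmabuf fd to another process on Linux is an `SCM_RIGHTS` control message over a Unix domain socket. The sketch below shows only that generic mechanism; it is not the SurfaceDescriptorDMABuf code, and `SendFd` and `RecvFd` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstring>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Send one file descriptor over a connected Unix domain socket.
bool SendFd(int sock, int fdToSend) {
  assert(sock >= 0 && fdToSend >= 0);
  char payload = 'F';  // one byte of ordinary data accompanies the fd
  iovec iov{&payload, 1};

  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
  msghdr msg{};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);

  // The fd travels in an SCM_RIGHTS control message, not in the payload.
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  std::memcpy(CMSG_DATA(cmsg), &fdToSend, sizeof(int));

  return sendmsg(sock, &msg, 0) == 1;
}

// Receive one file descriptor; returns -1 on failure.
int RecvFd(int sock) {
  char payload = 0;
  iovec iov{&payload, 1};

  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
  msghdr msg{};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);

  if (recvmsg(sock, &msg, 0) != 1) {
    return -1;
  }
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
      cmsg->cmsg_type != SCM_RIGHTS) {
    return -1;
  }
  // The kernel installs a new descriptor that refers to the same open file.
  int received = -1;
  std::memcpy(&received, CMSG_DATA(cmsg), sizeof(int));
  return received;
}
```

The receiver gets its own descriptor number for the same underlying object, which is why the buffer contents never need to be copied between processes. The sender can close its copy independently.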

Decoding is currently done in the content process (but not for much longer), while compositing is in the GPU process.
Once bug 1595994 lands, decoding will be done in the RDD (remote data decoder) process.

That's not a problem. But Wayland does not use GPU process.

I'm unfamiliar with how this is done; I haven't done work with the latest vaapi version. The last time I implemented a vaapi decoder was almost 10 years ago (in mythtv), and things have changed since.

Ultimately, on Windows we had to implement a GPU process and run the decoding there, because HW decoders and drivers have proven to be very crashy. So to enable this we will have to wait on bug 1595994.

I guess there's no difference from the Wayland POV. It does not matter which process does the decoding, as the results are always shared by SurfaceDescriptorDMABuf.

Thanks.

Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e9ba11d2516b
[Wayland] Implement VA-API decode in FFmpegDataDecoder, r=jya
Duplicate of this bug: 1616129
Summary: [Wayland] Implement VA-API decode by ffmpeg → [Wayland] Implement h.264 VA-API decode by ffmpeg
Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/da012bb39f2b
[Wayland] Implement VA-API decode in FFmpegVideoDecoder, r=jya
https://hg.mozilla.org/integration/autoland/rev/e0f4279ea250
[Wayland] Build VA-API support for ffmpeg58 and Wayland only, r=jya
https://hg.mozilla.org/integration/autoland/rev/5ebaa08b1816
[Wayland] Load library symbols for VA-API r=jya
Duplicate of this bug: 1210727
Regressions: 1619544

Please try latest nightly and open a new bug if it's broken for you there. Firefox 75 does not have va-api support finished.

System:    Host: a Kernel: 5.7.9-1-MANJARO x86_64 bits: 64 compiler: gcc v: 10.1.0 Desktop: KDE Plasma 5.19.3 
           Distro: Manjaro Linux 
Machine:   Type: Laptop System: Dell product: XPS 15 9560 v: N/A serial: <filter> 
           Mobo: Dell model: 0YH90J v: A04 serial: <filter> UEFI: Dell v: 1.19.2 date: 05/22/2020 
CPU:       Topology: Quad Core model: Intel Core i7-7700HQ bits: 64 type: MT MCP arch: Kaby Lake rev: 9 L2 cache: 6144 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 44817 
           Speed: 1000 MHz min/max: 800/2800 MHz Core speeds (MHz): 1: 1000 2: 1000 3: 1000 4: 1000 5: 1000 6: 1001 7: 1000 
           8: 1000 
Graphics:  Device-1: Intel HD Graphics 630 vendor: Dell driver: i915 v: kernel bus ID: 00:02.0 
           Device-2: NVIDIA GP107M [GeForce GTX 1050 Mobile] driver: N/A bus ID: 01:00.0 
           Display: x11 server: X.Org 1.20.8 driver: modesetting resolution: 3840x2160~60Hz 
           OpenGL: renderer: Mesa Intel HD Graphics 630 (KBL GT2) v: 4.6 Mesa 20.1.3 direct render: Yes 
Audio:     Device-1: Intel CM238 HD Audio vendor: Dell driver: snd_hda_intel v: kernel bus ID: 00:1f.3 
           Sound Server: ALSA v: k5.7.9-1-MANJARO 

Nightly 80.0a1 (2020-07-24)

env MOZ_X11_EGL=1 MOZ_LOG="PlatformDecoderModule:5" firefox-nightly

Since today (yesterday?) video playback is quite broken. It plays, but from time to time it shows an all-green frame and flickers. Sometimes it jumps back to the first frame (00:00) and then plays again.

I'll attach logs.
Normal mp4 files and YouTube videos are affected.

(In reply to dontdieych from comment #25)

Since today (yesterday?) video playback is quite broken. It plays, but from time to time it shows an all-green frame and flickers. Sometimes it jumps back to the first frame (00:00) and then plays again.

Please try latest nightly, should be fixed now. File a new bug if you still see it.
Thanks.
