Open Bug 1539735 Opened 5 years ago Updated 2 years ago

Perform on demand decoding driven by the compositor and move all decoding into the GPU process

Categories

(Core :: Audio/Video: Playback, enhancement, P3)

People

(Reporter: jya, Unassigned, NeedInfo)

References

(Blocks 2 open bugs)

Details

This idea came up during a 1:1 with :mattwoodrow.

Currently the playback stack on Windows is something like:

Demuxing (CP) -> IPC -> Decoding (GPU) -> Copy into a new sync surface (GPU) -> IPC -> MediaFormatReader/MediaDecoderStateMachine which will buffer at least 3 frames (Content) -> VideoSink (CP) -> ImageContainer (CP) -> IPC -> Compositor (GPU) -> Upload to surface / convert to RGB (GPU)
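For reference, a minimal sketch (hypothetical types, not actual Gecko code) that enumerates the stages above together with the process each one runs in, which makes the IPC hops easy to count:

#include <cstdio>
#include <string>
#include <vector>

enum class Process { Content, GPU };

struct Stage {
  std::string name;
  Process process;
};

int main() {
  // The Windows playback pipeline as described above, in order.
  std::vector<Stage> pipeline = {
      {"Demuxing", Process::Content},
      {"Decoding", Process::GPU},
      {"Copy into a new sync surface", Process::GPU},
      {"MediaFormatReader/MDSM (buffers at least 3 frames)", Process::Content},
      {"VideoSink", Process::Content},
      {"ImageContainer", Process::Content},
      {"Compositor", Process::GPU},
      {"Upload to surface / convert to RGB", Process::GPU},
  };

  int ipcHops = 0;
  for (size_t i = 1; i < pipeline.size(); ++i) {
    // Every time a frame crosses a process boundary it goes through IPC.
    if (pipeline[i].process != pipeline[i - 1].process) {
      ++ipcHops;
    }
  }
  std::printf("%zu stages, %d IPC hops per frame\n", pipeline.size(), ipcHops);
  return 0;
}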

On Windows, a DXVA decoder will use and recycle around 4 surfaces. As such, if you attempt to decode a 5th frame, it will typically reuse the surface that held the first frame.

The VideoSink / ImageContainer itself will keep over 10 frames in its queue.

As such, we must perform a blit/copy into a newly allocated surface before returning that image to the MDSM.
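A minimal sketch of why that copy is unavoidable today, assuming a small recycled pool like DXVA's (all types here are hypothetical stand-ins, not the real decoder wrapper):

#include <array>
#include <cstdio>
#include <memory>
#include <vector>

struct Surface {
  int id;
};

// Stand-in for a DXVA-style decoder that recycles a pool of ~4 surfaces:
// decoding frame N+4 overwrites the surface that held frame N.
class PooledDecoder {
  std::array<Surface, 4> mPool{{{0}, {1}, {2}, {3}}};
  size_t mNext = 0;

 public:
  Surface* Decode() { return &mPool[mNext++ % mPool.size()]; }
};

// The allocation + blit we would like to eliminate.
std::unique_ptr<Surface> CopyToSyncSurface(const Surface* aSrc) {
  return std::make_unique<Surface>(*aSrc);
}

int main() {
  PooledDecoder decoder;
  // Stand-in for the VideoSink/ImageContainer queue, which holds 10+ frames.
  std::vector<std::unique_ptr<Surface>> queue;
  for (int i = 0; i < 10; ++i) {
    Surface* decoded = decoder.Decode();
    // Without this copy, frame 5 would clobber the surface the queue still
    // references for frame 1.
    queue.push_back(CopyToSyncSurface(decoded));
  }
  std::printf("kept %zu independent copies alive\n", queue.size());
  return 0;
}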

In bug 1536449, the profiler reveals that copying into the sync surface is where most of the time is spent, leading to disastrous performance.

One solution would be to deal only with compressed frames and feed those to the compositor instead.

When the compositor needs to paint a new frame, it would ask a decoder to decode it on the fly. If decoding is too slow, that information would be propagated back to the VideoSink/MDSM, which would then activate the skip-to-next-frame logic.

We would no longer need to keep a massive queue of decoded frames, and could likely do everything within the 4-frame limit imposed by the Windows MFT.
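A rough sketch of the pull model described above, under the assumption of a compositor-side hook that can request a frame for a target time and report lateness back; all of the names (OnDemandDecoder, CompositorFeedback, ...) are hypothetical, not existing Gecko APIs:

#include <chrono>
#include <cstdio>
#include <optional>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::duration<double, std::milli>;

struct DecodedFrame {
  double presentationTimeMs;
};

// Feedback channel back to the VideoSink/MDSM.
struct CompositorFeedback {
  bool decodeTooSlow = false;
};

class OnDemandDecoder {
 public:
  // Decode only the frame needed for aTargetTimeMs; nothing is queued, so we
  // never exceed the decoder's small surface pool.
  std::optional<DecodedFrame> DecodeFor(double aTargetTimeMs) {
    // ...feed compressed samples until the target frame comes out...
    return DecodedFrame{aTargetTimeMs};
  }
};

// Called by the compositor when it needs to paint a new frame.
bool CompositeFrame(OnDemandDecoder& aDecoder, double aTargetTimeMs,
                    Ms aFrameBudget, CompositorFeedback& aFeedback) {
  auto start = Clock::now();
  std::optional<DecodedFrame> frame = aDecoder.DecodeFor(aTargetTimeMs);
  Ms elapsed = Clock::now() - start;
  if (!frame || elapsed > aFrameBudget) {
    // Too slow: keep showing the previous frame and tell the MDSM so it can
    // activate its skip-to-next-frame logic.
    aFeedback.decodeTooSlow = true;
    return false;
  }
  // ...hand *frame straight to the compositing pass, no extra copy...
  return true;
}

int main() {
  OnDemandDecoder decoder;
  CompositorFeedback feedback;
  bool painted = CompositeFrame(decoder, /*aTargetTimeMs=*/16.6, Ms(16.6), feedback);
  std::printf("painted=%d decodeTooSlow=%d\n", painted, feedback.decodeTooSlow);
  return 0;
}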

Rank: 20
Priority: -- → P3
Blocks: 1594677

jya, any idea what the priority of this is these days?

Flags: needinfo?(jyavenard)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #1)

jya, any idea what the priority of this is these days?

I'd like to start looking into this after the Fission work has completed.

Flags: needinfo?(jyavenard)
See Also: → 1619882
See Also: → 1589165

Adding more information from bug 1589165:

(In reply to Jean-Yves Avenard [:jya] from bug 1589165 comment #7)

The idea is to have all decoders in the GPU process; there will be no need to copy the decoded image into a shared buffer. The aim is to render directly what comes out of the decoder, be it software or hardware.

Summary: Perform on demand decoding driven by the compositor → Perform on demand decoding driven by the compositor and move all decoding into the GPU process

I'm worried that decoding on demand will add too much time to our frame time and cause us to blow our frame budget. It feels like we should be able to overlap decoding and drawing. From my quick investigation on my Skylake Windows machine using GPUView, video decode can definitely overlap with 3D rendering. Further, I see 4K H.264 video decoding times of 6-10 ms, which would eat a lot of the frame budget (especially when running at higher frame rates like 144 Hz).
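For reference, a quick back-of-the-envelope check of those numbers (the 6-10 ms decode times and the 144 Hz refresh rate come from the comment above; the rest is plain arithmetic):

#include <cstdio>

int main() {
  const double refreshRatesHz[] = {60.0, 120.0, 144.0};
  const double decodeMsLow = 6.0;    // observed 4K H.264 decode time, low end
  const double decodeMsHigh = 10.0;  // observed 4K H.264 decode time, high end
  for (double hz : refreshRatesHz) {
    double budgetMs = 1000.0 / hz;
    std::printf("%5.1f Hz -> %5.2f ms budget; decode alone uses %3.0f%% to %3.0f%%\n",
                hz, budgetMs, 100.0 * decodeMsLow / budgetMs,
                100.0 * decodeMsHigh / budgetMs);
  }
  return 0;
}

At 144 Hz the budget is about 6.9 ms, so a 6-10 ms decode can consume or exceed it on its own, whereas at 60 Hz there is roughly 16.7 ms to work with.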

Also, do we know what Chrome does about this problem?

Judging by their blog post, it seems they already use this pull-based approach in their rendering pipeline.

Looks like the VTXDecoder and/or RDD process has unbounded memory growth on Nightly (94.0a1).
Here's a test case (use 4K settings) to repro:

https://www.youtube.com/watch?v=fDIUdXkeBEY

I hope all's well with y'all <3

Edit: looks like this issue is now logged as a regression from bug 1731815

Flags: needinfo?(jmuizelaar)
Severity: normal → S3