Bug 1713276 Comment 13 Edit History

(In reply to Alastor Wu [:alwu] from comment #12)
> Hi, Martin,
> I'm still investigating the code to see how we can implement that, now I have some rough idea and I collect them into this [doc](https://docs.google.com/document/d/1HqV4iyd7Ln_RSEraJU-k8JcuOlSJkkVYNdNs1f9_oy4/edit?usp=sharing). Would you mind to check if the direction of my thought is correct or not?
> Thank you so much.

Hello Alastor,

There are some facts which need to be considered:

- direct decoding to dmabuf (on Linux) is very slow. That's because ffmpeg also reads from the buffer during decoding and expects the buffer to be zeroed. Direct decoding to dmabuf takes 700%!! CPU on my box while indirect decoding takes 40-60%. GPU memory is generally very slow to read (if reading it is supported at all). I don't know how that is handled on Windows/Mac but I expect the situation is similar (or the data is cached and then moved to the GPU at once, but that's what we do now anyway). The correct (and only working) way here is to create a texture, upload the video buffer into it and use that texture, which is what gecko does later anyway.

- recycled buffers can be held for a long time by different parts of gecko / different processes.
- you may need to use avcodec_default_get_buffer2() for video buffer formats which are not supported by us (at least in the beginning).
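The fallback to avcodec_default_get_buffer2() could look roughly like the sketch below. The AVCodecContext/AVFrame types and the default allocator are stubbed out here so the snippet is self-contained (in the real decoder these come from libavcodec, where the callback has the same `int (*)(AVCodecContext *, AVFrame *, int)` signature); `is_format_supported()` and `alloc_shm_buffer()` are hypothetical helpers, not existing gecko code:

```c
#include <stdlib.h>

/* Stubs standing in for the real libavcodec types -- only the fields
 * this sketch touches. */
typedef struct AVFrame { int format; void *data; } AVFrame;
typedef struct AVCodecContext { int dummy; } AVCodecContext;

enum { AV_PIX_FMT_YUV420P = 0, AV_PIX_FMT_EXOTIC = 100 };

/* Stub for libavcodec's default buffer allocator. */
static int avcodec_default_get_buffer2(AVCodecContext *s, AVFrame *f, int flags) {
    (void)s; (void)flags;
    f->data = malloc(1); /* placeholder allocation */
    return f->data ? 0 : -1;
}

/* Hypothetical: does our shm-backed allocator understand this format? */
static int is_format_supported(int format) {
    return format == AV_PIX_FMT_YUV420P;
}

/* Hypothetical shm-backed allocator (details elided). */
static int alloc_shm_buffer(AVCodecContext *s, AVFrame *f, int flags) {
    (void)s; (void)flags;
    f->data = malloc(1); /* would be shm-backed in the real thing */
    return f->data ? 0 : -1;
}

/* Our get_buffer2 callback: take the shm path for formats we handle,
 * fall back to ffmpeg's default allocator for everything else. */
static int our_get_buffer2(AVCodecContext *s, AVFrame *frame, int flags) {
    if (is_format_supported(frame->format))
        return alloc_shm_buffer(s, frame, flags);
    return avcodec_default_get_buffer2(s, frame, flags);
}
```

The point is only the dispatch: unsupported formats keep working through the default allocator while the shm path is built out format by format.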

Right now the decoding sequence is (on Linux):

1) ffmpeg allocates a video buffer and decodes the video data into it
2) we allocate a shm buffer and copy the video data there
3) we allocate a GL texture and copy the data there

So there are 3 allocations and 3 copies of the video data.
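The sequence above can be sketched as a toy model; plain malloc/memcpy stand in for the real ffmpeg, shm and GL-texture allocations, and the counters just make the 3-allocation/3-copy cost explicit:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FRAME_SIZE (640 * 480 * 3 / 2)  /* one 640x480 YUV420 frame */

static int allocations, copies;
static unsigned char decoded[FRAME_SIZE]; /* stands in for the decoder's output */

static void *alloc_buffer(size_t size) {
    allocations++;
    return malloc(size);
}

static void copy_frame(void *dst, const void *src, size_t size) {
    copies++;
    memcpy(dst, src, size);
}

/* The current Linux path: each frame is allocated and copied three times. */
static void decode_one_frame(void) {
    /* 1) ffmpeg allocates a video buffer and decodes into it */
    unsigned char *ffmpeg_buf = alloc_buffer(FRAME_SIZE);
    copy_frame(ffmpeg_buf, decoded, FRAME_SIZE);

    /* 2) we allocate a shm buffer and copy the video data there */
    unsigned char *shm_buf = alloc_buffer(FRAME_SIZE);
    copy_frame(shm_buf, ffmpeg_buf, FRAME_SIZE);

    /* 3) we allocate a GL texture and copy the data there */
    unsigned char *texture = alloc_buffer(FRAME_SIZE);
    copy_frame(texture, shm_buf, FRAME_SIZE);

    free(ffmpeg_buf);
    free(shm_buf);
    free(texture);
}
```

Decoding straight into the shm buffer would drop step 1's separate buffer and copy, leaving two allocations and two copies per frame.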

Given these facts I suggest decoding into shm memory buffers only. We can allocate a buffer for every frame now and implement a frame pool later if that's needed; I don't know how expensive shm allocation is (we already allocate shm memory for every frame now).
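For reference, a per-frame shm allocation on Linux can be as small as a memfd plus mmap. This is only a sketch of the mechanism (memfd_create needs Linux >= 3.17 / glibc >= 2.27), not the actual gecko shm API:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate an anonymous shared-memory buffer of `size` bytes whose fd
 * can be passed to another process; returns the mapping or NULL. */
static void *alloc_frame_shm(size_t size, int *fd_out) {
    int fd = memfd_create("video-frame", 0);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd; /* caller sends this fd over IPC, then close()s it */
    return p;
}
```

If per-frame allocation like this turns out to be the bottleneck, a pool of recycled shm buffers is the obvious next step, with the caveat above that other processes can hold on to them for a long time.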