WebRTC: mediaRecorder.start() timeslice not working

Status: NEW
Assignee: Unassigned
Component: Core :: Audio/Video: Recording
Priority: P2
Severity: normal
Rank: 19
Opened: 27 days ago
Last modified: 9 days ago

People

(Reporter: Divyesh, Unassigned)

Tracking

(Depends on: 1 bug)

Version: 56 Branch
Hardware: Unspecified
OS: All
Points: ---

Firefox Tracking Flags

(firefox-esr52 affected, firefox56 affected, firefox57 affected, firefox58 affected)

Attachments

(6 attachments)

(Reporter)

Description

27 days ago
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36

Steps to reproduce:

Just start a MediaRecorder with a timeslice.


Actual results:

The ondataavailable event fires at the timeslice interval, but event.data.size is zero most of the time; only one event every few seconds carries a large chunk of data (e.g. 2.0 MB).
So the timeslice is effectively useless here.

mediaRecorder.ondataavailable = handleDataAvailable;
mediaRecorder.start(1000); 

function handleDataAvailable(event) {
    console.log('data avail -> ',event.data.size);
}

Output of five seconds of dataavailable events:
data avail -> 312
data avail -> 0
data avail -> 0
data avail -> 0
data avail -> 0
data avail -> 212212


Expected results:

The timeslice should work properly with mediaRecorder.start(), for example:
mediaRecorder.ondataavailable = handleDataAvailable;
mediaRecorder.start(1000); 

function handleDataAvailable(event) {
    console.log('data avail -> ',event.data.size);
}

Expected output over five seconds of dataavailable events:
data avail -> 312
data avail -> 12457
data avail -> 25412
data avail -> 10023
data avail -> 10111
data avail -> 19845
I can repro: https://jsfiddle.net/jib1/73u0ekbz/

Looks like we're not following the intent of the spec here; instead we seem to be following some internal interval. Open to a fix here if it is trivial.
Has STR: --- → yes
Rank: 19
OS: Unspecified → All
Priority: -- → P2
Status: UNCONFIRMED → NEW
status-firefox56: --- → affected
status-firefox57: --- → affected
status-firefox58: --- → affected
status-firefox-esr52: --- → affected
Ever confirmed: true
Component: WebRTC → Audio/Video: Recording
I tracked this to EbmlComposer. For VP8 it will only finish a cluster (and write it to the blob) once a keyframe appears, see [1]. We encode a keyframe every 60 frames [2]. There appears to be a fallback at ~33 seconds if you don't have VP8 [3] (so audio in WebM, but we only support audio in Ogg [4]).

The WebM spec says, in [5]:
> - Key frames SHOULD be placed at the beginning of clusters.
>   - Having key frames at the beginning of clusters should make seeking faster and easier for the client.

So the question here is whether we should go against this SHOULD (and what that would even mean; we'd still have keyframes at the beginning of clusters, but not all clusters would start with a keyframe), encode more keyframes, set the keyframe interval based on the timeslice, or do something else. A rough sketch of the current pattern follows the references below.


[1] http://searchfox.org/mozilla-central/rev/dd47bee6468de7e1221b4d006342ad6b9813d0e5/dom/media/webm/EbmlComposer.cpp#131
[2] http://searchfox.org/mozilla-central/rev/dd47bee6468de7e1221b4d006342ad6b9813d0e5/dom/media/encoder/VP8TrackEncoder.cpp#212
[3] http://searchfox.org/mozilla-central/rev/dd47bee6468de7e1221b4d006342ad6b9813d0e5/dom/media/webm/EbmlComposer.cpp#137
[4] http://searchfox.org/mozilla-central/rev/dd47bee6468de7e1221b4d006342ad6b9813d0e5/dom/media/MediaRecorder.cpp#1487
[5] https://www.webmproject.org/docs/container/
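To make the pattern concrete, here is a minimal, hypothetical sketch (invented names and types, not the actual EbmlComposer code) of muxing logic that only hands buffered data onward once the next keyframe closes the current cluster:

#include <cstdint>
#include <vector>

// Hypothetical illustration only; the real EbmlComposer differs in structure and naming.
struct EncodedFrame {
  bool mKeyframe;
  std::vector<uint8_t> mBytes;
};

class ClusterOnlyWriter {
public:
  // Frames accumulate in mPending; the buffer is only flushed to aOutput (i.e.
  // becomes available for a dataavailable blob) when the next keyframe arrives
  // and closes the cluster. With a keyframe every 60 frames at ~15 fps, that
  // means data roughly every 4 seconds, regardless of the requested timeslice.
  void WriteFrame(const EncodedFrame& aFrame, std::vector<uint8_t>& aOutput) {
    if (aFrame.mKeyframe && !mPending.empty()) {
      // Close the current cluster and hand it over.
      aOutput.insert(aOutput.end(), mPending.begin(), mPending.end());
      mPending.clear();
    }
    // Start of a new cluster (keyframe) or continuation of the open one.
    mPending.insert(mPending.end(), aFrame.mBytes.begin(), aFrame.mBytes.end());
  }

private:
  std::vector<uint8_t> mPending;  // data of the cluster that is still open
};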
Another, perhaps simpler, option is to let a blob contain half-finished clusters, but that might take some refactoring of EbmlComposer.

Comment 4

24 days ago
I have been playing with this problem for a couple of days, so here are a few notes:

- As mentioned above, config.kf_max_dist is set to 60 frames, which means that at approx. 15 frames per second (a rate common for most low/mid-cost webcams) you get AV data only every 3-4 seconds. At lower frame rates it takes even longer.
- The ideal solution would be to rewrite the whole logic in EbmlComposer so that, instead of waiting for a cluster to be finished, it just flushes out all data and continues. Currently there is an internal buffer holding the unfinished cluster data, so you can't simply take the data, clear the buffer, and wait for more; it has to be rewritten carefully.
- Keyframes at the beginning of a cluster are a must. I faced such a problem a year ago when trying to find out why a WebM/VP8 stream was not playing via MSE (I can't remember whether the problem was in Firefox or Chrome), and fixing the stream to have keyframes at the beginning of each cluster was the correct solution.

So for now the simplest solution, in my view, is a small change that calculates and sets a fixed keyframe interval according to the timeslice given to MediaRecorder, so that clusters end up smaller than the current 60 frames.
 
I already tried to make a patch; the changes are attached. The code is taken from the firefox-57.0b10 release. I compiled it, did a few tests, and it works without any problem. The main change is in VP8TrackEncoder, which now contains the logic for computing the best keyframe interval from the track rate, the timeslice, and the frame duration. The frame duration is taken from the first frame, so the keyframe interval is computed and set before the data is passed to the encoder. The changes in the other modules only hand over the timeslice value (MediaRecorder -> MediaEncoder -> VP8TrackEncoder). If a timeslice/flush time is not set (which is done only from MediaRecorder), the keyframe interval is kept as it is now (60 frames).
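For illustration only, a back-of-the-envelope sketch of this kind of calculation (a hypothetical helper with invented names, not the attached patch):

#include <algorithm>
#include <cstdint>

// Hypothetical sketch, not the attached patch: derive a VP8 keyframe interval
// (kf_max_dist) from the MediaRecorder timeslice and the observed frame duration.
static uint32_t KeyframeIntervalForTimeslice(uint32_t aTimesliceMs,
                                             uint64_t aFrameDurationUs,
                                             uint32_t aDefaultInterval = 60)
{
  if (aTimesliceMs == 0 || aFrameDurationUs == 0) {
    // No timeslice (or no frame seen yet): keep the current 60-frame default.
    return aDefaultInterval;
  }
  // Number of frames that fit into one timeslice at the observed frame rate.
  uint64_t frames = (uint64_t(aTimesliceMs) * 1000) / aFrameDurationUs;
  // Clamp to [1, default]: at most one keyframe per frame, and never space
  // keyframes further apart than today.
  return static_cast<uint32_t>(std::clamp<uint64_t>(frames, 1, aDefaultInterval));
}

With a 1000 ms timeslice and ~66 ms frames (about 15 fps), this would request a keyframe roughly every 15 frames instead of every 60.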

Could somebody please review this patch and let me know if it is suitable to move forward?

Thanks.

Comment 5

24 days ago
Created attachment 8923196 [details] [diff] [review]
MediaEncoder.cpp.patch
Attachment #8923196 - Flags: feedback+

Comment 6

24 days ago
Created attachment 8923197 [details] [diff] [review]
MediaEncoder.h.patch
Attachment #8923197 - Flags: feedback+

Comment 7

24 days ago
Created attachment 8923198 [details] [diff] [review]
MediaRecorder.cpp.patch
Attachment #8923198 - Flags: feedback+

Comment 8

24 days ago
Created attachment 8923199 [details] [diff] [review]
TrackEncoder.h.patch
Attachment #8923199 - Flags: feedback+

Comment 9

24 days ago
Created attachment 8923200 [details] [diff] [review]
VP8TrackEncoder.cpp.patch
Attachment #8923200 - Flags: feedback+

Comment 10

24 days ago
Created attachment 8923201 [details] [diff] [review]
VP8TrackEncoder.h.patch
Attachment #8923201 - Flags: feedback+
I think John meant to mark these feedback? not feedback+
Flags: needinfo?(bvandyk)

Comment 12

22 days ago
Yes, sorry, I didn't know what the difference between feedback+ and feedback? was.

Thanks.
Various thoughts:

Based on my reading of the spec, the timeslice is a lower bound on when a blob is fired. I think it's intuitive to fire at the rate specified by the timeslice, but there is some wiggle room here (the UA may set a minimum timeslice that the argument cannot go under) [1].

I do not believe Chrome sets its own bound; it will fire an event once the elapsed time has exceeded the user-set timeslice [2]. It looks like the call chain there is VideoTrackRecorder::Encoder::OnFrameEncodeCompleted -> MediaRecorderHandler::OnEncodedVideo -> WebmMuxer::OnEncodedVideo, which then starts to call into libwebm [3]. MediaRecorderHandler::WriteData gets called back by libwebm as part of its data-writing callback. I believe libwebm's muxer will write on block boundaries.

Agreed that keyframes at the start of clusters are a must.

Agreed we can do better here. What use cases are we interested in? The most compelling use cases for getting data available as soon as possible are Media Source Extensions and Real Time Communication (though for RTC perhaps we'd send the input to RTC rather than the recorder's output data?). In the MSE case I think we can output unfinished clusters that can be appended to source buffers (like libwebm, we could write out at the block scale).

We could also tweak the existing keyframe rate so that we continue to write out on cluster boundaries, but are more flexible. I'm concerned about the results of this for small timeslices.

So there are two major solutions: write on blocks, not clusters, or adjust the keyframe rate. For the first we could either rework EbmlComposer, or replace the existing machinery with libwebm. The former is likely easier, though I have an old branch that uses libwebm (I'll see if I can dredge it up and whether it rebases). A rough sketch of the block-level option follows the references below.

Interested in other people's thoughts.

[1]: https://w3c.github.io/mediacapture-record/MediaRecorder.html#mediarecorder-methods
[2]: https://cs.chromium.org/chromium/src/content/renderer/media_recorder/media_recorder_handler.cc?sq=package:chromium&l=406
[3]: https://github.com/webmproject/libwebm
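For comparison, a minimal sketch of the block-level option, assuming we keep starting clusters on keyframes but flush whatever is buffered once the timeslice elapses (invented names, not a proposed patch):

#include <cstdint>
#include <vector>

// Hypothetical illustration of "write on blocks, not clusters": a possibly
// half-finished cluster is handed to the blob whenever the timeslice expires.
struct EncodedBlock {
  uint64_t mTimeMs;             // capture time of the block
  std::vector<uint8_t> mBytes;  // encoded payload
};

class BlockLevelWriter {
public:
  explicit BlockLevelWriter(uint32_t aTimesliceMs) : mTimesliceMs(aTimesliceMs) {}

  // Appends the block to aBuffer and returns true when the caller should gather
  // the buffer into a blob and fire dataavailable. A real muxer would still emit
  // Cluster element headers on keyframes, so the WebM SHOULD is preserved.
  bool WriteBlock(const EncodedBlock& aBlock, std::vector<uint8_t>& aBuffer) {
    aBuffer.insert(aBuffer.end(), aBlock.mBytes.begin(), aBlock.mBytes.end());
    if (aBlock.mTimeMs - mLastFlushMs >= mTimesliceMs) {
      mLastFlushMs = aBlock.mTimeMs;
      return true;  // flush now, even if the current cluster is not finished
    }
    return false;
  }

private:
  uint64_t mLastFlushMs = 0;
  uint32_t mTimesliceMs;
};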
Flags: needinfo?(bvandyk)
"When multiple Blobs are returned (because of timeslice or requestData()), the individual Blobs need not be playable, but the combination of all the Blobs from a completed recording MUST be playable."

So given that, why do blobs need to start with keyframes?
(In reply to Randell Jesup [:jesup] from comment #14)
> "When multiple Blobs are returned (because of timeslice or requestData()),
> the individual Blobs need not be playable, but the combination of all the
> Blobs from a completed recording MUST be playable."
> 
> So given that, why do blobs need to start with keyframes?

They don't, it's just our code's legacy. It would be some work to write to the blob when we finish a block rather than a cluster. FWIW I think this refactor (write on clusters -> blocks) would be the cleanest. Bryce, do you have an idea of how much work that would be?
Flags: needinfo?(bvandyk)
(In reply to Randell Jesup [:jesup] from comment #14)
> "When multiple Blobs are returned (because of timeslice or requestData()),
> the individual Blobs need not be playable, but the combination of all the
> Blobs from a completed recording MUST be playable."
> 
> So given that, why do blobs need to start with keyframes?

It's okay if the blobs don't start with keyframes. However, if, once the blobs are assembled, they constitute a WebM where a cluster starts with a non-keyframe, that's not desirable, as demuxers may rely on clusters starting with keyframes (sensibly, IMO, given the spec).

> Bryce, do you have an idea of how much work that would be?

I'll take a look. Off the top of my head I'd estimate a week or two, but it's been a while since I dug into the code. I'm about to relocate, but I'm hoping to be largely back online at the start of next week and will revise the estimate as needed then.
Flags: needinfo?(bvandyk)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a2de3be9d5832743033fe372834730d47e69991a
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fc8636296eac6aa89d4e31ef835429bf414a4618
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1f801a6b10bb1a73b761ecdafccc163e57af9ca8
Most of the work to get us onto the new backend is hopefully done. However, I remember that some of this work requires us to do more rigorous syncing within the muxer. Right now we'll happily pass frames with out-of-order timestamps to the muxer, which works fine with our libmkv-based backend (though those frames will likely be discarded by various demuxers), but the libwebm-based backend will reject them.
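As an illustration of the kind of syncing needed, a hypothetical guard that keeps muxer input monotonic (invented names; this is not the WIP from Bug 1014393):

#include <cstdint>
#include <optional>

// Hypothetical sketch: drop frames whose timestamps go backwards so that a
// strict muxer (like the libwebm-based backend) never sees out-of-order input.
class MonotonicTimestampFilter {
public:
  // Returns the timestamp to pass to the muxer, or std::nullopt if the frame
  // should be dropped because an equal or later timestamp was already seen.
  std::optional<uint64_t> Accept(uint64_t aTimestampUs) {
    if (mLast && aTimestampUs <= *mLast) {
      return std::nullopt;
    }
    mLast = aTimestampUs;
    return aTimestampUs;
  }

private:
  std::optional<uint64_t> mLast;
};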

I've got some old WIP on Bug 1014393 to do with better handling around this behaviour. Going to rebase and get that done to enable these changes.
Depends on: 1014393