Closed Bug 1666487 Opened 7 months ago Closed 2 months ago

Passing Timeslice interval to MediaRecorder doesn't affect keyframe generation

Categories

(Core :: Audio/Video: Recording, defect, P2)

defect

Tracking

()

RESOLVED FIXED
87 Branch
Tracking Status
firefox87 --- fixed

People

(Reporter: modler.daniel, Assigned: pehrsons)

Details

Attachments

(15 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36

Steps to reproduce:

  • Record a video using MediaRecorder (Source was getDisplayMedia() with a rather high resolution)
  • Call recorder.start(50000)
  • Recorded a static, non-changing screen
  • Save the video and played it back

Actual results:

The video contained a keyframe approximately every second. This leads to the picture quality to become bad (showing artifacts) every second. Only a very high bitrate can compensate for that. This is not necessary in Chromium.

Expected results:

I expected the video to display my static desktop image, at high quality.

I guess this is because of the following:
https://searchfox.org/mozilla-central/source/dom/media/MediaRecorder.cpp#1100
This line sets the keyframe interval to the timeslice passed to start() or one second, whatever is larger

https://searchfox.org/mozilla-central/source/dom/media/encoder/TrackEncoder.cpp#759
This line sets the keyframe interval to the requested interval or one second, whatever is smaller.

So the keyframe interval is always one second.

This sounds like bug 1605588. That bug mentions values less than 1000ms are problematic, but it sounds like any args may be busted.

Status: UNCONFIRMED → RESOLVED
Closed: 7 months ago
Component: Audio/Video → Audio/Video: Recording
Resolution: --- → DUPLICATE
See Also: → 1605588
Duplicate of bug: 1605588
See Also: 1605588

There has been no intention to encode keyframes at the timeslice interval.

We made it so values less than 1s would generate keyframes at that interval to improve latency in cases when the application wanted low-latency. This was an external contribution and a mitigation that could ship sooner than a proper streaming fix.

Historically we had keyframes at a 1s interval, I believe to help with seeking, though this behavior predates my time at Mozilla. We lack cues in produced webm files, but you need to find a keyframe either way when seeking.

Considering this I don't think this is a dupe of bug 1605588.

If we were to emit keyframes at a high interval based on the timeslice, then latency would suffer unless we refactor to permit emitting open clusters. The example in comment 0 uses 50s as timeslice, and then we would at best be able to fire a non-empty "dataavailable" blob every 50s.

It would be interesting to see whether Chrome encodes keyframes at an interval at all and if so, at what interval and with what settings. We have configured the encoder to use variable bitrate to help with keyframe quality in the past. There might be other improvements that could help.

Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: DUPLICATE → ---

Google chrome outputs chunks at approx. timeslice interval, even if clusters haven't finished then, so keyframe intervals are much longer there.

"If we were to emit keyframes at a high interval based on the timeslice, then latency would suffer unless we refactor to permit emitting open clusters"
If users want low latency, they would choose a low timeslice, because that's what the timeslice is for. ( https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/start ). But I agree that open clusters would be more useful than a shorter keyframe interval (and shorter clusters) for that case.

On google chrome, users use tools like ts-ebml to do segmentation for MSE and possibly add cues afterwards for seeking, so I think there's no real need to emit complete clusters, since that's not done by Chrome anyway.

I believe that raising the 1s limit here: https://searchfox.org/mozilla-central/source/dom/media/encoder/TrackEncoder.cpp#759

would solve the problem of the too-frequent keyframe generation without negative impact, because users wanting low intervals would still be able to force them by selecting a low timeslice. I'd agree that the 1s lower limit for the keyframe generation in MediaRecorder.cpp might be wise

(In reply to modler.daniel from comment #3)

Google chrome outputs chunks at approx. timeslice interval, even if clusters haven't finished then, so keyframe intervals are much longer there.

"If we were to emit keyframes at a high interval based on the timeslice, then latency would suffer unless we refactor to permit emitting open clusters"
If users want low latency, they would choose a low timeslice, because that's what the timeslice is for. ( https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/start ). But I agree that open clusters would be more useful than a shorter keyframe interval (and shorter clusters) for that case.

You're right, sorry. I'll say the inverse instead: the timeslice is supposed to be a control of latency, not keyframe interval. If we let it affect the keyframe interval we might very well paint us into a corner we can't get out of, because applications will expect a certain keyframe behavior from setting the timeslice. Even if we update the spec with a keyframe control, old uses might linger.

On google chrome, users use tools like ts-ebml to do segmentation for MSE and possibly add cues afterwards for seeking, so I think there's no real need to emit complete clusters, since that's not done by Chrome anyway.

Sure, the reason that we emit only complete clusters is one of legacy.

(In reply to modler.daniel from comment #4)

I believe that raising the 1s limit here: https://searchfox.org/mozilla-central/source/dom/media/encoder/TrackEncoder.cpp#759

would solve the problem of the too-frequent keyframe generation without negative impact, because users wanting low intervals would still be able to force them by selecting a low timeslice. I'd agree that the 1s lower limit for the keyframe generation in MediaRecorder.cpp might be wise

The concern with large keyframe intervals is mainly one of seeking.

Severity: -- → S2
Priority: -- → P1
Severity: S2 → S4
Priority: P1 → P3

Comparison to Chromium

I did some testing, also compiled Chromium and debugged and compared configuration and values.

Chromium (also) sets kf_mode=VPX_KF_AUTO but sets kf_max_dist=100 and does not force keyframe generation at all. This results in a keyframe being generated about every 3-5 seconds (I verified that in the finished webm file), the same video having a much lower effective bitrate. I could not see any seeking issues with that video, also not with MSE (apart from the cues missing as always).

Suggested changes

I tested and suggest the following changes:

https://searchfox.org/mozilla-central/source/dom/media/encoder/TrackEncoder.cpp#32
Add

static const unsigned int MAX_KEYFRAME_INTERVAL_MS = 10000;

above that. (This would actually lead to the code doing what the comments above suggest)

https://searchfox.org/mozilla-central/source/dom/media/encoder/TrackEncoder.cpp#759
Change to

mKeyFrameInterval = std::min(aKeyFrameInterval, MAX_KEYFRAME_INTERVAL_MS);

Finally, in https://searchfox.org/mozilla-central/source/dom/media/encoder/VP8TrackEncoder.cpp#29
Change to

#define MAX_KEYFRAME_INTERVAL 100

This leads to a very similar video as with chrome, with similar or same input, and there is some (and not no, as of no) control of chunk sizes via the timeslice parameter (albeit still via keyframes), and the behaviour is more similar to Chromium.

Thanks. I was looking into something similar yesterday. I need to ponder through a bit how our unittests should treat this, but in general I think it sounds good to use VPX_KF_AUTO. The biggest problem with that is that there's a max frame interval to control the frequency of the keyframes, which can become a very long wall-clock interval if the source is very low framerate. Falling back to MAX_KEYFRAME_INTERVAL_MS would work for that. Would 5s be ok with you?

Would you mind submitting a patch? That way you get due credit and can call yourself a Firefox contributor. I can help to review, land and fix tests.

FWIW I have a jsfiddle that I use to do manual testing of MediaRecorder changes (perceived quality that's hard to catch in automation). It should cover your use case and others if you are in business for one.

If the framerate is known in advance we could calculate the max frame interval according to that. Don't know if there is some variable portion here, e.g. if the desktop grabber isn't able to capture at 30FPS. For that case the MAX_KEYFRAME_INTERVAL_MS would help of course, however I'd choose the value so that it is usually ~2 seconds larger than the configured kf_max_dist interval and so true manual keyframe generation is the exception.

Question is, at the moment the keyframe interval is also meant to be affected by the timeslice parameter, to generate smaller clusters (and mitigate the fact that open clusters are currently not put out). We could do that using the kf_max_dist configuration parameter, like this:

cfg.kf_max_dist = expected_fps * std::min(std::max(1, timeslice * 1000, MAX_KEYFRAME_INTERVAL);

or just use the "fallback" (which means a bit less code changes).

I'd submit a patch if you like, as this also helps our product ofcourse.

(This was meant to be timeslice / 1000, of course, if timeslice is given in milliseconds)

(In reply to modler.daniel from comment #8)

If the framerate is known in advance we could calculate the max frame interval according to that. Don't know if there is some variable portion here, e.g. if the desktop grabber isn't able to capture at 30FPS. For that case the MAX_KEYFRAME_INTERVAL_MS would help of course, however I'd choose the value so that it is usually ~2 seconds larger than the configured kf_max_dist interval and so true manual keyframe generation is the exception.

In general nothing can be assumed about framerate. A captured canvas is for instance controlled entirely by the application. In practice however, since MediaRecorder doesn't allow switching a track for another, we do know a bit about the type of source that's giving us video frames.

  • Cameras have a framerate setting
  • Screen/Window capture likewise, but it's set to 30
  • Canvas capture can have any framerate
  • Media element capture is bound by the source of the media element, which is most likely fixed but gives no guarantees
  • A received RTCPeerConnection can have any framerate -- the remote side can swap out a track for any other transparently (though spec says it SHOULD be no less than 1fps)

I think a more scalable solution would be to start out with a reasonable assumption and then reconfigure the encoder as we go. A rolling average that we check in on every so often could work for instance.

The RollingMean class might help with something like that.

Question is, at the moment the keyframe interval is also meant to be affected by the timeslice parameter, to generate smaller clusters (and mitigate the fact that open clusters are currently not put out). We could do that using the kf_max_dist configuration parameter, like this:

cfg.kf_max_dist = expected_fps * std::min(std::max(1, timeslice * 1000, MAX_KEYFRAME_INTERVAL);

or just use the "fallback" (which means a bit less code changes).

I'd submit a patch if you like, as this also helps our product ofcourse.

I think it makes sense to go to some lengths to avoid the "fallback". This could be split up in several patches of course, or even bugs.

Daniel, let me know if you'll have a patch up soon. I did some work around this to have the tests prepared, and to try to improve the frame-dropping logic to be more resilient to short-term changes. I can incorporate your suggestions too. Or stack it on top of your patch if you have one coming.

Flags: needinfo?(modler.daniel)
Priority: P3 → P2
Assignee: nobody → apehrson
Status: REOPENED → ASSIGNED

Sorry for the radio silence, have been busy with another project during the last month. If still helpful, I'd create and submit a first patch.

Flags: needinfo?(modler.daniel)

I do have a long patch queue lined up that fixes this and does a bunch of refactoring, but it's not quite ready just yet. If you want the credit, submit a patch and I'll incorporate it in my queue.

I'm not after the credit at all, just after having this fixed sometimes :) If you already have that, that's fine. If I can be of help, let me know.

Ok, NP. When I have something to show I might ask you to have a look and to give it a shot.

Before this patch we would re-allocate a full-size vpx image on Init and
Reconfigure, despite the intention being to not allocate at all -- see the
comment on passing nullptr to vpx_img_wrap.

On top of that we allocated a separate full-size vpx image in mI420Frame and
did bookkeeping of that ourselves - including re-allocating a larger one when
input size increased.

This patch merges these two allocation paths to only use vpx_img_wrap and
vpx_img_free for bookkeeping planes and strides, and only re-allocate when an
input frame exceeds the extant i420 buffer.

The net result is less allocations, which in case of a large video resolution,
like in the case of screensharing, has very noticable impact.

These members were declared on VideoTrackEncoder but never used outside of the
VP8TrackEncoder subclass.

This changes the type of the interval to be unit aware, and makes it a Maybe to
allow for denoting a value that has not yet been set. This also moves the state
to VP8TrackEncoder to reduce the amount of indirection.

This changes the frame-skipping (and keyframe force-encoding) logic from only
looking at the frame-duration and wall-clock-encode-duration of the last frame
to looking at a rolling window of 30 frames, to better avoid jittery durations.
For instance in cases where the frame-encode-duration is long due to a
reconfiguration.

For simplicity, and for better handling of re-init if other parameters would
require it, like max keyframe distance in a future patch.

Video tracks do not by spec have constant framerate, while the vp8 encoder works
best with a constant framerate, since the key frame interval is defined as a
number of frames.

This patch lets us configure this value based on an observed framerate, updated
every 5 seconds, such that the actual keyframe interval doesn't get too long.

This patch more importantly moves away from forcing keyframes, except for some
corner cases since it stays as a fallback. Instead we go to lengths to let the
encoder itself decide where to put keyframes, since that results in better
perceived video quality and lower bitrate, especially at large resolutions, like
screenshare.

This patch makes VideoTrackEncoder wait for at least 5 full (having an end time)
frames, i.e. 6 incoming frames, before Init()ing. This so it can use the last 3
frames of those 5 to measure they're mean framerate and configure the vp8
encoder accordingly.

Configuring the vp8 encoder with an accurate framerate from the start avoids
a re-init if the keyframe distance would later have to be updated. A re-init
means a larger quality degradation than just a reconfig leading to a new
keyframe.

Sources with a framerate that varies wildly will require re-inits to adjust.
The typical source for this would be a canvas where the application cannot or
does not want to keep a consistent rate. Camera capture, screen/window capture,
peer connections (with camera/screen sources) are pretty stable in framerate
and should cause re-inits. Most playback content has constant framerate, so
media element capture is generally exempt from re-inits too, unless paused.

Most importantly this disables the keyframe encoding, but it also reduces the
margin for when to trigger a skipped frame to 85%.

Suddenly encoding a keyframe causes a quality degradation since the codec is
trying to constrain itself to its configured bitrate. Skipping a frame is less
intrusive since it essentially reduces the output framerate but retains the
visible quality of the frames. This is especially noticable with large
resolutions where keyframes can become very blocky.

An application can configure videoBitsPerSecond to be <1000, leading to the
configuration of the vp8 encoder with rc_target_bitrate = 0. A target bitrate
of 0 makes the encoder skip all frames.

Without this we tripped an assert in a crashtest with the whole stack of patches
applied.

This patch is based on an observation that our decoder stack can fail an assert
if given an initial frame whose resolution differs too much from the size
provided in the webm header.

Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/3a248308dc61
Remove UniquePtr wrappers around vpx state in VP8TrackEncoder. r=bryce
https://hg.mozilla.org/integration/autoland/rev/c13960f77f32
Only re-allocate vpx image data when strictly needed. r=bryce
https://hg.mozilla.org/integration/autoland/rev/ef30b76b1d98
Constify some RollingMean getters. r=glandium
https://hg.mozilla.org/integration/autoland/rev/3153ea880408
Change SetConfigurationValues to a static function that doesn't affect state. r=bryce
https://hg.mozilla.org/integration/autoland/rev/0cfdab0ac564
Create VP8Metadata only once on Init instead of in getter. r=bryce
https://hg.mozilla.org/integration/autoland/rev/f8866040f97e
Move mFrameWidth/Height from VideoTrackEncoder to VP8TrackEncoder. r=bryce
https://hg.mozilla.org/integration/autoland/rev/6454bf741983
Move key frame interval state from VideoTrackEncoder to VP8TrackEncoder. r=bryce
https://hg.mozilla.org/integration/autoland/rev/1f5570ef1fc9
Rework VP8TrackEncoder's frame skipping logic. r=bryce
https://hg.mozilla.org/integration/autoland/rev/7845ea82b89c
Let VP8TrackEncoder::Reconfigure handle re-init when needed, instead of the caller. r=bryce
https://hg.mozilla.org/integration/autoland/rev/875a4e58285b
Make VP8TrackEncoder's kf_max_dist configurable and dynamically adapted. r=bryce
https://hg.mozilla.org/integration/autoland/rev/75b6f286671f
Wait for a few frames to observe framerate before vp8 encoder Init(). r=bryce
https://hg.mozilla.org/integration/autoland/rev/fffda786f8c2
Skip framerate estimation on EOS. r=bryce
https://hg.mozilla.org/integration/autoland/rev/9c56a50cc6d3
Effectively disable encoding keyframes when on a tight time budget. r=bryce
https://hg.mozilla.org/integration/autoland/rev/dd2bab72c12d
Don't encode vp8 with 0 bitrate. r=bryce
https://hg.mozilla.org/integration/autoland/rev/76227a7f4ed9
Init the video encoder with the known size if available. r=bryce
You need to log in before you can comment on or make changes to this bug.