Passing Timeslice interval to MediaRecorder doesn't affect keyframe generation
Categories
(Core :: Audio/Video: Recording, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox87 | --- | fixed |
People
(Reporter: modler.daniel, Assigned: pehrsons)
Details
Attachments
(15 files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
Bug 1666487 - Change SetConfigurationValues to a static function that doesn't affect state. r?bryce!
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36
Steps to reproduce:
- Record a video using MediaRecorder (Source was getDisplayMedia() with a rather high resolution)
- Call recorder.start(50000)
- Recorded a static, non-changing screen
- Save the video and played it back
Actual results:
The video contained a keyframe approximately every second. This leads to the picture quality to become bad (showing artifacts) every second. Only a very high bitrate can compensate for that. This is not necessary in Chromium.
Expected results:
I expected the video to display my static desktop image, at high quality.
I guess this is because of the following:
https://searchfox.org/mozilla-central/source/dom/media/MediaRecorder.cpp#1100
This line sets the keyframe interval to the timeslice passed to start() or one second, whatever is larger
https://searchfox.org/mozilla-central/source/dom/media/encoder/TrackEncoder.cpp#759
This line sets the keyframe interval to the requested interval or one second, whatever is smaller.
So the keyframe interval is always one second.
This sounds like bug 1605588. That bug mentions values less than 1000ms are problematic, but it sounds like any args may be busted.
Assignee | ||
Comment 2•4 years ago
|
||
There has been no intention to encode keyframes at the timeslice interval.
We made it so values less than 1s would generate keyframes at that interval to improve latency in cases when the application wanted low-latency. This was an external contribution and a mitigation that could ship sooner than a proper streaming fix.
Historically we had keyframes at a 1s interval, I believe to help with seeking, though this behavior predates my time at Mozilla. We lack cues in produced webm files, but you need to find a keyframe either way when seeking.
Considering this I don't think this is a dupe of bug 1605588.
If we were to emit keyframes at a high interval based on the timeslice, then latency would suffer unless we refactor to permit emitting open clusters. The example in comment 0 uses 50s as timeslice, and then we would at best be able to fire a non-empty "dataavailable" blob every 50s.
It would be interesting to see whether Chrome encodes keyframes at an interval at all and if so, at what interval and with what settings. We have configured the encoder to use variable bitrate to help with keyframe quality in the past. There might be other improvements that could help.
Reporter | ||
Comment 3•4 years ago
|
||
Google chrome outputs chunks at approx. timeslice interval, even if clusters haven't finished then, so keyframe intervals are much longer there.
"If we were to emit keyframes at a high interval based on the timeslice, then latency would suffer unless we refactor to permit emitting open clusters"
If users want low latency, they would choose a low timeslice, because that's what the timeslice is for. ( https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/start ). But I agree that open clusters would be more useful than a shorter keyframe interval (and shorter clusters) for that case.
On google chrome, users use tools like ts-ebml to do segmentation for MSE and possibly add cues afterwards for seeking, so I think there's no real need to emit complete clusters, since that's not done by Chrome anyway.
Reporter | ||
Comment 4•4 years ago
|
||
I believe that raising the 1s limit here: https://searchfox.org/mozilla-central/source/dom/media/encoder/TrackEncoder.cpp#759
would solve the problem of the too-frequent keyframe generation without negative impact, because users wanting low intervals would still be able to force them by selecting a low timeslice. I'd agree that the 1s lower limit for the keyframe generation in MediaRecorder.cpp might be wise
Assignee | ||
Comment 5•4 years ago
|
||
(In reply to modler.daniel from comment #3)
Google chrome outputs chunks at approx. timeslice interval, even if clusters haven't finished then, so keyframe intervals are much longer there.
"If we were to emit keyframes at a high interval based on the timeslice, then latency would suffer unless we refactor to permit emitting open clusters"
If users want low latency, they would choose a low timeslice, because that's what the timeslice is for. ( https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/start ). But I agree that open clusters would be more useful than a shorter keyframe interval (and shorter clusters) for that case.
You're right, sorry. I'll say the inverse instead: the timeslice is supposed to be a control of latency, not keyframe interval. If we let it affect the keyframe interval we might very well paint us into a corner we can't get out of, because applications will expect a certain keyframe behavior from setting the timeslice. Even if we update the spec with a keyframe control, old uses might linger.
On google chrome, users use tools like ts-ebml to do segmentation for MSE and possibly add cues afterwards for seeking, so I think there's no real need to emit complete clusters, since that's not done by Chrome anyway.
Sure, the reason that we emit only complete clusters is one of legacy.
(In reply to modler.daniel from comment #4)
I believe that raising the 1s limit here: https://searchfox.org/mozilla-central/source/dom/media/encoder/TrackEncoder.cpp#759
would solve the problem of the too-frequent keyframe generation without negative impact, because users wanting low intervals would still be able to force them by selecting a low timeslice. I'd agree that the 1s lower limit for the keyframe generation in MediaRecorder.cpp might be wise
The concern with large keyframe intervals is mainly one of seeking.
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Reporter | ||
Comment 6•4 years ago
|
||
Comparison to Chromium
I did some testing, also compiled Chromium and debugged and compared configuration and values.
Chromium (also) sets kf_mode=VPX_KF_AUTO
but sets kf_max_dist=100
and does not force keyframe generation at all. This results in a keyframe being generated about every 3-5 seconds (I verified that in the finished webm file), the same video having a much lower effective bitrate. I could not see any seeking issues with that video, also not with MSE (apart from the cues missing as always).
Suggested changes
I tested and suggest the following changes:
https://searchfox.org/mozilla-central/source/dom/media/encoder/TrackEncoder.cpp#32
Add
static const unsigned int MAX_KEYFRAME_INTERVAL_MS = 10000;
above that. (This would actually lead to the code doing what the comments above suggest)
https://searchfox.org/mozilla-central/source/dom/media/encoder/TrackEncoder.cpp#759
Change to
mKeyFrameInterval = std::min(aKeyFrameInterval, MAX_KEYFRAME_INTERVAL_MS);
Finally, in https://searchfox.org/mozilla-central/source/dom/media/encoder/VP8TrackEncoder.cpp#29
Change to
#define MAX_KEYFRAME_INTERVAL 100
This leads to a very similar video as with chrome, with similar or same input, and there is some (and not no, as of no) control of chunk sizes via the timeslice parameter (albeit still via keyframes), and the behaviour is more similar to Chromium.
Assignee | ||
Comment 7•4 years ago
|
||
Thanks. I was looking into something similar yesterday. I need to ponder through a bit how our unittests should treat this, but in general I think it sounds good to use VPX_KF_AUTO. The biggest problem with that is that there's a max frame interval to control the frequency of the keyframes, which can become a very long wall-clock interval if the source is very low framerate. Falling back to MAX_KEYFRAME_INTERVAL_MS would work for that. Would 5s be ok with you?
Would you mind submitting a patch? That way you get due credit and can call yourself a Firefox contributor. I can help to review, land and fix tests.
FWIW I have a jsfiddle that I use to do manual testing of MediaRecorder changes (perceived quality that's hard to catch in automation). It should cover your use case and others if you are in business for one.
Reporter | ||
Comment 8•4 years ago
|
||
If the framerate is known in advance we could calculate the max frame interval according to that. Don't know if there is some variable portion here, e.g. if the desktop grabber isn't able to capture at 30FPS. For that case the MAX_KEYFRAME_INTERVAL_MS
would help of course, however I'd choose the value so that it is usually ~2 seconds larger than the configured kf_max_dist
interval and so true manual keyframe generation is the exception.
Question is, at the moment the keyframe interval is also meant to be affected by the timeslice
parameter, to generate smaller clusters (and mitigate the fact that open clusters are currently not put out). We could do that using the kf_max_dist
configuration parameter, like this:
cfg.kf_max_dist = expected_fps * std::min(std::max(1, timeslice * 1000, MAX_KEYFRAME_INTERVAL);
or just use the "fallback" (which means a bit less code changes).
I'd submit a patch if you like, as this also helps our product ofcourse.
Reporter | ||
Comment 9•4 years ago
|
||
(This was meant to be timeslice / 1000
, of course, if timeslice
is given in milliseconds)
Assignee | ||
Comment 10•4 years ago
|
||
(In reply to modler.daniel from comment #8)
If the framerate is known in advance we could calculate the max frame interval according to that. Don't know if there is some variable portion here, e.g. if the desktop grabber isn't able to capture at 30FPS. For that case the
MAX_KEYFRAME_INTERVAL_MS
would help of course, however I'd choose the value so that it is usually ~2 seconds larger than the configuredkf_max_dist
interval and so true manual keyframe generation is the exception.
In general nothing can be assumed about framerate. A captured canvas is for instance controlled entirely by the application. In practice however, since MediaRecorder doesn't allow switching a track for another, we do know a bit about the type of source that's giving us video frames.
- Cameras have a framerate setting
- Screen/Window capture likewise, but it's set to 30
- Canvas capture can have any framerate
- Media element capture is bound by the source of the media element, which is most likely fixed but gives no guarantees
- A received RTCPeerConnection can have any framerate -- the remote side can swap out a track for any other transparently (though spec says it SHOULD be no less than 1fps)
I think a more scalable solution would be to start out with a reasonable assumption and then reconfigure the encoder as we go. A rolling average that we check in on every so often could work for instance.
The RollingMean class might help with something like that.
Question is, at the moment the keyframe interval is also meant to be affected by the
timeslice
parameter, to generate smaller clusters (and mitigate the fact that open clusters are currently not put out). We could do that using thekf_max_dist
configuration parameter, like this:cfg.kf_max_dist = expected_fps * std::min(std::max(1, timeslice * 1000, MAX_KEYFRAME_INTERVAL);
or just use the "fallback" (which means a bit less code changes).
I'd submit a patch if you like, as this also helps our product ofcourse.
I think it makes sense to go to some lengths to avoid the "fallback". This could be split up in several patches of course, or even bugs.
Assignee | ||
Comment 11•4 years ago
|
||
Daniel, let me know if you'll have a patch up soon. I did some work around this to have the tests prepared, and to try to improve the frame-dropping logic to be more resilient to short-term changes. I can incorporate your suggestions too. Or stack it on top of your patch if you have one coming.
Assignee | ||
Updated•4 years ago
|
Reporter | ||
Comment 12•4 years ago
|
||
Sorry for the radio silence, have been busy with another project during the last month. If still helpful, I'd create and submit a first patch.
Assignee | ||
Comment 13•4 years ago
|
||
I do have a long patch queue lined up that fixes this and does a bunch of refactoring, but it's not quite ready just yet. If you want the credit, submit a patch and I'll incorporate it in my queue.
Reporter | ||
Comment 14•4 years ago
|
||
I'm not after the credit at all, just after having this fixed sometimes :) If you already have that, that's fine. If I can be of help, let me know.
Assignee | ||
Comment 15•4 years ago
|
||
Ok, NP. When I have something to show I might ask you to have a look and to give it a shot.
Assignee | ||
Comment 16•4 years ago
|
||
Assignee | ||
Comment 17•4 years ago
|
||
Before this patch we would re-allocate a full-size vpx image on Init and
Reconfigure, despite the intention being to not allocate at all -- see the
comment on passing nullptr to vpx_img_wrap.
On top of that we allocated a separate full-size vpx image in mI420Frame and
did bookkeeping of that ourselves - including re-allocating a larger one when
input size increased.
This patch merges these two allocation paths to only use vpx_img_wrap and
vpx_img_free for bookkeeping planes and strides, and only re-allocate when an
input frame exceeds the extant i420 buffer.
The net result is less allocations, which in case of a large video resolution,
like in the case of screensharing, has very noticable impact.
Assignee | ||
Comment 18•4 years ago
|
||
Assignee | ||
Comment 19•4 years ago
|
||
Assignee | ||
Comment 20•4 years ago
|
||
Assignee | ||
Comment 21•4 years ago
|
||
These members were declared on VideoTrackEncoder but never used outside of the
VP8TrackEncoder subclass.
Assignee | ||
Comment 22•4 years ago
|
||
This changes the type of the interval to be unit aware, and makes it a Maybe to
allow for denoting a value that has not yet been set. This also moves the state
to VP8TrackEncoder to reduce the amount of indirection.
Assignee | ||
Comment 23•4 years ago
|
||
This changes the frame-skipping (and keyframe force-encoding) logic from only
looking at the frame-duration and wall-clock-encode-duration of the last frame
to looking at a rolling window of 30 frames, to better avoid jittery durations.
For instance in cases where the frame-encode-duration is long due to a
reconfiguration.
Assignee | ||
Comment 24•4 years ago
|
||
For simplicity, and for better handling of re-init if other parameters would
require it, like max keyframe distance in a future patch.
Assignee | ||
Comment 25•4 years ago
|
||
Video tracks do not by spec have constant framerate, while the vp8 encoder works
best with a constant framerate, since the key frame interval is defined as a
number of frames.
This patch lets us configure this value based on an observed framerate, updated
every 5 seconds, such that the actual keyframe interval doesn't get too long.
This patch more importantly moves away from forcing keyframes, except for some
corner cases since it stays as a fallback. Instead we go to lengths to let the
encoder itself decide where to put keyframes, since that results in better
perceived video quality and lower bitrate, especially at large resolutions, like
screenshare.
Assignee | ||
Comment 26•4 years ago
|
||
This patch makes VideoTrackEncoder wait for at least 5 full (having an end time)
frames, i.e. 6 incoming frames, before Init()ing. This so it can use the last 3
frames of those 5 to measure they're mean framerate and configure the vp8
encoder accordingly.
Configuring the vp8 encoder with an accurate framerate from the start avoids
a re-init if the keyframe distance would later have to be updated. A re-init
means a larger quality degradation than just a reconfig leading to a new
keyframe.
Sources with a framerate that varies wildly will require re-inits to adjust.
The typical source for this would be a canvas where the application cannot or
does not want to keep a consistent rate. Camera capture, screen/window capture,
peer connections (with camera/screen sources) are pretty stable in framerate
and should cause re-inits. Most playback content has constant framerate, so
media element capture is generally exempt from re-inits too, unless paused.
Assignee | ||
Comment 27•4 years ago
|
||
Assignee | ||
Comment 28•4 years ago
|
||
Most importantly this disables the keyframe encoding, but it also reduces the
margin for when to trigger a skipped frame to 85%.
Suddenly encoding a keyframe causes a quality degradation since the codec is
trying to constrain itself to its configured bitrate. Skipping a frame is less
intrusive since it essentially reduces the output framerate but retains the
visible quality of the frames. This is especially noticable with large
resolutions where keyframes can become very blocky.
Assignee | ||
Comment 29•4 years ago
|
||
An application can configure videoBitsPerSecond to be <1000, leading to the
configuration of the vp8 encoder with rc_target_bitrate = 0. A target bitrate
of 0 makes the encoder skip all frames.
Without this we tripped an assert in a crashtest with the whole stack of patches
applied.
Assignee | ||
Comment 30•3 years ago
|
||
This patch is based on an observation that our decoder stack can fail an assert
if given an initial frame whose resolution differs too much from the size
provided in the webm header.
Comment 31•3 years ago
|
||
Pushed by pehrsons@gmail.com: https://hg.mozilla.org/integration/autoland/rev/3a248308dc61 Remove UniquePtr wrappers around vpx state in VP8TrackEncoder. r=bryce https://hg.mozilla.org/integration/autoland/rev/c13960f77f32 Only re-allocate vpx image data when strictly needed. r=bryce https://hg.mozilla.org/integration/autoland/rev/ef30b76b1d98 Constify some RollingMean getters. r=glandium https://hg.mozilla.org/integration/autoland/rev/3153ea880408 Change SetConfigurationValues to a static function that doesn't affect state. r=bryce https://hg.mozilla.org/integration/autoland/rev/0cfdab0ac564 Create VP8Metadata only once on Init instead of in getter. r=bryce https://hg.mozilla.org/integration/autoland/rev/f8866040f97e Move mFrameWidth/Height from VideoTrackEncoder to VP8TrackEncoder. r=bryce https://hg.mozilla.org/integration/autoland/rev/6454bf741983 Move key frame interval state from VideoTrackEncoder to VP8TrackEncoder. r=bryce https://hg.mozilla.org/integration/autoland/rev/1f5570ef1fc9 Rework VP8TrackEncoder's frame skipping logic. r=bryce https://hg.mozilla.org/integration/autoland/rev/7845ea82b89c Let VP8TrackEncoder::Reconfigure handle re-init when needed, instead of the caller. r=bryce https://hg.mozilla.org/integration/autoland/rev/875a4e58285b Make VP8TrackEncoder's kf_max_dist configurable and dynamically adapted. r=bryce https://hg.mozilla.org/integration/autoland/rev/75b6f286671f Wait for a few frames to observe framerate before vp8 encoder Init(). r=bryce https://hg.mozilla.org/integration/autoland/rev/fffda786f8c2 Skip framerate estimation on EOS. r=bryce https://hg.mozilla.org/integration/autoland/rev/9c56a50cc6d3 Effectively disable encoding keyframes when on a tight time budget. r=bryce https://hg.mozilla.org/integration/autoland/rev/dd2bab72c12d Don't encode vp8 with 0 bitrate. r=bryce https://hg.mozilla.org/integration/autoland/rev/76227a7f4ed9 Init the video encoder with the known size if available. r=bryce
Comment 32•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3a248308dc61
https://hg.mozilla.org/mozilla-central/rev/c13960f77f32
https://hg.mozilla.org/mozilla-central/rev/ef30b76b1d98
https://hg.mozilla.org/mozilla-central/rev/3153ea880408
https://hg.mozilla.org/mozilla-central/rev/0cfdab0ac564
https://hg.mozilla.org/mozilla-central/rev/f8866040f97e
https://hg.mozilla.org/mozilla-central/rev/6454bf741983
https://hg.mozilla.org/mozilla-central/rev/1f5570ef1fc9
https://hg.mozilla.org/mozilla-central/rev/7845ea82b89c
https://hg.mozilla.org/mozilla-central/rev/875a4e58285b
https://hg.mozilla.org/mozilla-central/rev/75b6f286671f
https://hg.mozilla.org/mozilla-central/rev/fffda786f8c2
https://hg.mozilla.org/mozilla-central/rev/9c56a50cc6d3
https://hg.mozilla.org/mozilla-central/rev/dd2bab72c12d
https://hg.mozilla.org/mozilla-central/rev/76227a7f4ed9
Description
•