Open Bug 1706917 Opened 4 years ago Updated 3 years ago

Verify codec configuration in conduits and codec implementations

Categories

(Core :: WebRTC, task, P2)

People

(Reporter: ng, Assigned: bwc)

References

(Blocks 1 open bug)

Details

Systematically find all codec parameters (mainly video) supported upstream, and:

  • Make sure they are configured correctly by conduits, and
  • Make sure they are supported by our own codec implementations
No longer blocks: 1706925
No longer blocks: 1706921
No longer blocks: 1706924
No longer blocks: 1706920
Assignee: nobody → docfaraday

VideoCodecVP8::complexity -> Set to 0 (kComplexityNormal) here, which we then use here. Then, this is mapped to MediaDataEncoder::VPXSpecific::Complexity and passed to MediaDataEncoder from here.

VideoCodecVP8::numberOfTemporalLayers -> Set to 1 here, which we then use here. Passed to MediaDataEncoder here.

VideoCodecVP8::denoisingOn -> Set to true here, which we then use here. However, we override this, setting it to false iff we're doing screencasting. We ignore the value returned by WebrtcVideoConduit::Denoising here; if |denoising| is true, |codec_default_denoising| is false, causing us to use true here, whereas if |denoising| is false, |codec_default_denoising| is true, also causing us to use true in the same place. I do not see anything else using WebrtcVideoConduit::Denoising in the tree. There is either a bug here, or we can eliminate WebrtcVideoConduit::Denoising. Passed to MediaDataEncoder here.
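
To make that branch analysis concrete, here is a minimal sketch (function and variable names are hypothetical condensations of the conduit/libwebrtc code paths, not the real functions) of why the value returned by WebrtcVideoConduit::Denoising can never change the outcome:

```python
# Condensed sketch of the suspect denoising logic described above.
# All names here are hypothetical stand-ins for the real code paths.

def choose_denoising(denoising: bool) -> bool:
    """Mirror the two-step logic: the conduit's flag is inverted into
    codec_default_denoising, which then selects between the VP8 default
    (True) and the conduit's flag itself."""
    codec_default_denoising = not denoising
    if codec_default_denoising:
        return True  # fall back to the VideoCodecVP8::denoisingOn default
    return denoising  # denoising is True in this branch

# Both branches collapse to True, which is why the Denoising value
# appears to have no effect (the bug, or dead code, noted above).
assert choose_denoising(True) is True
assert choose_denoising(False) is True
```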

VideoCodecVP8::automaticResizeOn -> Set to false here, which we then use here; however, we override the value based on this code, setting it to true for non-screencast unicast, and false otherwise. Passed to MediaDataEncoder here.

VideoCodecVP8::frameDroppingOn -> Set to true here, which we then use here, but we override, setting it to true iff we aren't screencasting. Passed to MediaDataEncoder here.

VideoCodecVP8::keyFrameInterval -> Set to 3000 here, which we then use here. Passed to MediaDataEncoder here.

^ Is this the kind of analysis we're looking for here?

Flags: needinfo?(na-g)

VideoCodecVP9::complexity -> Set to 0 (kComplexityNormal) here, which we then use here. Then, this is mapped to MediaDataEncoder::VPXSpecific::Complexity and passed to MediaDataEncoder from here.

VideoCodecVP9::numberOfTemporalLayers -> Set to 1 here, which we then use here. Passed to MediaDataEncoder here.

VideoCodecVP9::denoisingOn -> Set to true here, which we then use here. However, we override this if WebrtcVideoConduit::Denoising returns false or if screencasting. Passed to MediaDataEncoder here.

VideoCodecVP9::frameDroppingOn -> Set to true here, and also by us here, even when doing screencast. Passed to MediaDataEncoder here.

VideoCodecVP9::keyFrameInterval -> Set to 3000 here, which we then use here. Passed to MediaDataEncoder here.

VideoCodecVP9::adaptiveQpMode -> Set to true here, which we then use here. Passed to MediaDataEncoder here.

VideoCodecVP9::automaticResizeOn -> Set to true here, which we then use here. We do not set this to false in the screencapture case like we do for VP8. Passed to MediaDataEncoder here.

VideoCodecVP9::numberOfSpatialLayers -> Set to 2 in the screencast case, or the return of WebrtcVideoConduit::SpatialLayers otherwise. The return of WebrtcVideoConduit::SpatialLayers is either 1, or equal to the value of the 'media.peerconnection.video.svc.spatial' pref, although it is unclear whether values other than 1 will work. Passed to MediaDataEncoder here.
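
A condensed sketch of that selection logic (the pref name is real; the function names are hypothetical, and as noted above it is unclear whether pref values other than 1 actually work):

```python
# Sketch of the VP9 spatial-layer selection described above. The
# 'media.peerconnection.video.svc.spatial' pref is modeled as a plain
# argument; function names are hypothetical condensations of the conduit.

def spatial_layers(svc_spatial_pref: int) -> int:
    # WebrtcVideoConduit::SpatialLayers: either 1, or the pref value.
    return svc_spatial_pref if svc_spatial_pref > 1 else 1

def number_of_spatial_layers(screencast: bool, svc_spatial_pref: int = 1) -> int:
    return 2 if screencast else spatial_layers(svc_spatial_pref)

assert number_of_spatial_layers(screencast=True) == 2
assert number_of_spatial_layers(screencast=False) == 1
assert number_of_spatial_layers(screencast=False, svc_spatial_pref=3) == 3
```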

VideoCodecVP9::flexibleMode -> Set to false here, which we then use here. Passed to MediaDataEncoder at encoder creation time here, and then again at encode init time here? Weird that we're setting this in two different places. Someone who knows more about this should give this a look.

VideoCodecVP9::interLayerPred -> Set to On here, which we then use here. Not passed on to MediaDataEncoder at any point. Query: Are we doing the right thing here?

VideoCodecH264::frameDroppingOn -> We set this to true iff we are not screencasting. I don't see any code actually using this field, however. Query: Are we doing the right thing here?

VideoCodecH264::keyFrameInterval -> Set to 3000 in GetDefaultH264Settings, which we call in ConfigureVideoEncoderSettings. The only read I see is in MultiplexEncoderAdapter::InitEncode, and that only has an effect in MultiplexEncoderAdapter::Encode. I do not know if this is in the path for any of our H264 encoding. Further verification will be needed.

VideoCodecH264::scaleDownBy -> Set to 0 with a memset, and never read/written again. This is a double, so zeroing it with memset is a little sketchy, but it isn't being used anyway.

VideoCodecH264::numberOfTemporalLayers -> Set to 1 in VideoEncoder::GetDefaultH264Settings, which we call in ConfigureVideoEncoderSettings. Not read in any mozilla specific code, and only appears to be used in libwebrtc code to disable the frame dropper in some cases. Seems to have no real effect in our build.

VideoCodecH264::packetizationMode -> Set in ConfigureVideoEncoderSettings based on fmtp negotiation. Used to set CodecSpecificInfoH264::packetization_mode inside WebrtcGmpVideoEncoder::InitEncode and InitCodecSpecficInfo.

(In reply to Byron Campen [:bwc] from comment #3)

> ^ Is this the kind of analysis we're looking for here?

Yes, this is exactly what we are looking for.

Flags: needinfo?(na-g)

VideoCodec::codecType -> Surely this is being set.

VideoCodec::width and VideoCodec::height -> These aren't configured, but are set based on frames.

VideoCodec::startBitrate -> This is set:

Our conduit code uses a totally different entry point for starting bitrate that does not seem to have anything to do with VideoCodec::startBitrate, and there are a lot of hops:

There really do seem to be at least two distinct implementations of starting bitrate in here, and I am not sure how (or if) these things are reconciled. All I have now is a headache.

VideoCodec::maxBitrate -> This is set:

The way that we are trying to set max bitrate is:

VideoCodec::minBitrate -> We do not attempt to configure this in our conduit code, since that is not something that we negotiate, or configure from JS. We do use it in WebrtcGmpVideoEncoder::InitEncode and WebrtcMediaDataEncoder::SetupConfig

VideoCodec::maxFramerate -> We do not set this in our conduit code, although we do have a field (EncodingConstraints::maxFps) that we set based on SDP negotiation. This ends up being used in SelectSendFrameRate, which is called by WebrtcVideoConduit::OnControlConfigChange and WebrtcVideoConduit::SelectSendResolution to set WebrtcVideoConduit::mSendingFramerate, which is used to set VideoStreamFactory::mSendingFramerate, which is used by VideoStreamFactory::CreateEncoderStreams to set the VideoStream::max_framerate in the streams it creates. VideoStream::max_framerate is in turn used:

maxFramerate is used in WebrtcGmpVideoEncoder::InitEncode, WebrtcMediaDataEncoder::SetupConfig, and WebrtcMediaDataEncoder::CreateEncoder

VideoCodec::active -> Seems to do what the comment says; the bitrate allocator looks at this and allocates 0 if false, and nothing else pays any attention to it. Set either based on:

VideoCodec::qpMax

Substantial reads:

Writes:

I do not see this VideoCodec::qpMax/SpatialLayer::qpMax/VideoStream::max_qp triad making it to our platform encoders in any way... Query: Do our platform encoders support this at all?

VideoCodec::expect_encode_from_texture

I only see this set in one place, and never read by anything. The Chromium Code Search turns up only one use, in an EXPECT_EQ in a unit-test:

https://source.chromium.org/search?q=expect_encode_from_texture&sq=&ss=chromium

VideoCodec::buffer_pool_size

We don't seem to be using this.

VideoCodec::legacy_conference_mode

We aren't using this.

VideoStream::width and VideoStream::height

Set by VideoStreamFactory::CreateEncoderStreams, based on VideoStreamFactory::mCodecConfig, which holds information on the number of simulcast streams and the scale-down value for each, and which is set when the VideoStreamFactory is constructed by WebrtcVideoConduit::OnControlConfigChange. (Note: This code uses the passed parameter |config| to determine the stream count which is used to index into mCodecConfig.mEncodings, and not the size of mCodecConfig.mEncodings. This strikes me as fragile.) This scale-down information is passed to VideoAdapter::OnScaleResolutionBy, and then the width/height is adjusted with a call to VideoAdapter::AdaptFrameResolution.
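
To illustrate why the indexing noted above is fragile, here is a sketch (names illustrative, not the real CreateEncoderStreams shape) of what happens when the stream count from |config| and the size of mCodecConfig.mEncodings disagree:

```python
# Illustration of the fragile pattern noted above: the stream count comes
# from the passed |config|, not from the encodings list itself, so the
# per-stream lookup can run past the list if the two ever disagree.

def scale_for_stream(encodings: list, config_stream_count: int,
                     stream_index: int) -> float:
    # Faithful to the fragile shape: the loop bound is config_stream_count,
    # but the lookup is into encodings, whose length may differ.
    if stream_index >= len(encodings):
        raise IndexError("config stream count exceeds mEncodings size")
    return encodings[stream_index]

# Fine while the two counts agree...
assert scale_for_stream([1.0, 2.0], config_stream_count=2, stream_index=1) == 2.0

# ...but a config claiming more streams than mEncodings holds would fault.
try:
    scale_for_stream([1.0], config_stream_count=2, stream_index=1)
    assert False, "expected IndexError"
except IndexError:
    pass
```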

VideoStream::min_bitrate_bps, VideoStream::target_bitrate_bps, and VideoStream::max_bitrate_bps

Set in SelectBitrates which is called by VideoStreamFactory::CreateEncoderStreams, based on:

VideoStream::scale_resolution_down_by

I don't see anything setting this besides things like bulk adjustment factors, ceilings, and floors. The comment seems to indicate that setting this does not cause the dimensions to be further downscaled, but it is possible that failing to set this field will confuse things further down the line, possibly in subtle and hard to diagnose ways. Chromium does seem to set this.

Query: What bugs might this be causing?

If this is causing a bug, it would be related to AlignmentAdjuster::GetAlignmentAndMaybeAdjustScaleFactors, the only user of this field. This is called from VideoStreamEncoder::ReconfigureEncoder right before the call to VideoStreamFactory::CreateEncoderStreams. So, we might be getting our alignment calculations for simulcast wrong. I am not 100% sure what the consequences are of this, but this comment hints that we could end up with bad cropping or aspect ratio.

VideoStream::num_temporal_layers

VideoStreamFactory::CreateEncoderStreams sets this to 1 for simulcast screenshare, and 2 for regular simulcast, and not at all in non-simulcast.
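
Sketched out (names illustrative, not the real CreateEncoderStreams signature), the three cases above are:

```python
# Sketch of the num_temporal_layers selection described above.

def num_temporal_layers(simulcast: bool, screenshare: bool):
    if not simulcast:
        return None  # left unset in the non-simulcast case
    return 1 if screenshare else 2

assert num_temporal_layers(simulcast=True, screenshare=True) == 1
assert num_temporal_layers(simulcast=True, screenshare=False) == 2
assert num_temporal_layers(simulcast=False, screenshare=False) is None
```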

VideoStream::bitrate_priority

VideoStreamFactory::CreateEncoderStreams sets this based on VideoEncoderConfig::bitrate_priority within VideoStreamEncoder::encoder_config_ which is set as described in comment 15. VideoEncoderConfig::bitrate_priority is only ever set in WebrtcVideoConduit::OnControlConfigChange, to 1, although it looks like there's a TODO there to connect it to the webrtc-priority w3c work.

VideoStream::active

We are not using this, but will probably start doing so as part of bug 1676855.

VideoEncoderConfig::video_format

Exactly the same story as codec_type; seems to be redundant information, and there is even a TODO for eliminating one of these.

VideoEncoderConfig::video_stream_factory

WebrtcVideoConduit::OnControlConfigChange sets this to a dedicated VideoStreamFactory with additional config bits that are not captured in WebrtcVideoConduit::mEncoderConfig.

VideoEncoderConfig::spatial_layers

Nothing ever sets this in our codebase, and it seems that the same is true in Chromium? Query: Is this just dead code?

VideoEncoderConfig::encoder_specific_settings

WebrtcVideoConduit::OnControlConfigChange sets this, based on the return of ConfigureVideoEncoderSettings, which ends up being a wrapper around either a VideoCodecH264 (covered in comment 5), a VideoCodecVP8 (covered in comment 2), or a VideoCodecVP9 (covered in comment 4).

VideoEncoderConfig::min_transmit_bitrate_bps

WebrtcVideoConduit::OnControlConfigChange sets this to WebrtcVideoConduit::mMinBitrate, which is set in the constructor, ultimately based on a combination of the "media.peerconnection.video.min_bitrate" pref and kViEMinCodecBitrate_bps with a current default of 0. We are configuring this same thing on VideoStream::min_bitrate_bps (as described in comment 22), via VideoStreamFactory. Query: Couldn't we simplify this? VideoStreamFactory::CreateEncoderStreams is passed the config object, which has a copy of the same minimum bitrate in it, making VideoStreamFactory::mMinBitrate unnecessary...

VideoEncoderConfig::max_bitrate_bps

WebrtcVideoConduit::OnControlConfigChange sets this to the minimum of:

It should be noted that this calculation is similar to the one carried out by SelectBitrates, as described in comment 22, but not quite the same. Query: Should we reconcile these? I don't think we want there to be two different implementations of this basic logic. Given that the config object has a max bitrate field, and the factory has access to that config object, we ought to be performing this calculation in the conduit code, and have the factory copy the field from the config object when creating VideoStreams.

VideoEncoderConfig::simulcast_layers

I don't see anything in our codebase that puts anything in here... it seems that these are intended to be filled by something before the video_stream_factory gets a crack at the config (which makes it less of a factory, and more of a converter or mutator). AlignmentAdjuster::GetAlignmentAndMaybeAdjustScaleFactors (see the potential alignment problem noted in comment 23) only operates on elements in this array, so to fix that problem, we would need to start using VideoEncoderConfig::simulcast_layers as well as VideoStream::scale_resolution_down_by. Query: What other problems are we going to trip over by using a factory as a factory?

There's also the question of whether we set codec params properly from signaling, like we do here for opus. If any params for opus were added upstream in the update, we might need to add them, and then check that they are propagated properly (like you've been doing).

There are fewer SDP params for video, I believe, but we should verify this, especially that those that exist are handled by our own codecs.

SdpAudioFormat::name, SdpAudioFormat::clockrate_hz, SdpAudioFormat::num_channels

WebrtcAudioConduit::CodecConfigToLibwebrtcFormat sets these, based on AudioCodecConfig::mName, AudioCodecConfig::mFreq, and AudioCodecConfig::mChannels respectively. This is in turn called by WebrtcAudioConduit::OnControlConfigChange (as well as WebrtcAudioConduit::CodecConfigToWebRTCCodec, which appears to be unused; Query: Do we get rid of this, or did we miss something here?). JsepCodecDescToAudioCodecConfig performs the conversion from JsepCodecDescription to AudioCodecConfig, which carries over JsepCodecDescription::mName, JsepCodecDescription::mClock, and JsepCodecDescription::mChannels (subject to a JsepAudioCodecDescription::mForceMono check). The name/frequency/channels are hard-coded in JsepSessionImpl::SetupDefaultCodecs, while mForceMono is negotiated in JsepAudioCodecDescription::Negotiate for Opus send streams, based on whether a stereo fmtp is present.

SdpAudioFormat::Parameters

This is used in many places in libwebrtc:

"useinbandfec"/kCodecParamUseInbandFec: WebrtcAudioConduit::CodecConfigToLibwebrtcFormat sets this to the value of AudioCodecConfig::mFECEnabled, which JsepCodecDescToAudioCodecConfig sets to JsepAudioCodecDescription::mFECEnabled. For send streams, this is negotiated, and for receive streams, this is set based on the "media.navigator.audio.use_fec" pref (which is true by default). (In other words, the receiver decides whether to use FEC, and right now both Firefox and Chrome do this) Both our sipcc and rust SDP parsers look like they support this fmtp.

"usedtx"/kCodecParamUseDtx: WebrtcAudioConduit::CodecConfigToLibwebrtcFormat sets this to the value of AudioCodecConfig::mDTXEnabled, which JsepCodecDescToAudioCodecConfig sets to JsepAudioCodecDescription::mDTXEnabled, which JsepAudioCodecDescription::Negotiate sets based on fmtp for opus (but only for send). Like FEC, the receiver decides whether DTX will be used on a given stream, but neither Chrome nor Firefox use the usedtx fmtp right now, since the default for opus is to use variable bitrate (DTX allows a dramatic slowdown in packet transmission when there is silence, to save bandwidth, but variable bitrate also saves bandwidth when there is silence). Both our sipcc and rust SDP parsers appear to support this fmtp.

"cbr"/kCodecParamCbr: WebrtcAudioConduit::CodecConfigToLibwebrtcFormat sets this to AudioCodecConfig::mCbrEnabled, which JsepCodecDescToAudioCodecConfig sets to JsepAudioCodecDescription::mCbrEnabled, which JsepAudioCodecDescription::Negotiate sets based on the cbr fmtp for opus (again, only for send). Neither Firefox nor Chrome uses the cbr fmtp. Both our sipcc and rust SDP parsers appear to support this fmtp.

"maxaveragebitrate"/kCodecParamMaxAverageBitrate: WebrtcAudioConduit::CodecConfigToLibwebrtcFormat sets this to AudioCodecConfig::mMaxAverageBitrate, which JsepCodecDescToAudioCodecConfig sets to JsepAudioCodecDescription::mMaxAverageBitrate, which JsepAudioCodecDescription::Negotiate sets based on the maxaveragebitrate fmtp for opus (again, only for send). Neither Firefox nor Chrome uses this fmtp. Both our sipcc and rust SDP parsers appear to support this fmtp.

"minptime"/kCodecParamMinPTime: WebrtcAudioConduit::CodecConfigToLibwebrtcFormat sets this to AudioCodecConfig::mMinFrameSizeMs, which JsepCodecDescToAudioCodecConfig sets to JsepAudioCodecDescription::mMinFrameSizeMs, which JsepAudioCodecDescription::Negotiate sets based on the minptime opus fmtp (again, only for send). Firefox does not use this fmtp, but Chrome does by default (with a value of 10). Our sipcc-based SDP parser does not support it. Our rust-based SDP parser does appear to support minptime. Query: This almost certainly explains some of the fmtp differences we're seeing in the field. Nico?

"maxptime"/kCodecParamMaxPTime: WebrtcAudioConduit::CodecConfigToLibwebrtcFormat sets this to AudioCodecConfig::mMaxFrameSizeMs, which JsepCodecDescToAudioCodecConfig sets to JsepAudioCodecDescription::mMaxFrameSizeMs, which JsepAudioCodecDescription::Negotiate sets based on the maxptime opus fmtp (again, only for send). Neither Firefox nor Chrome uses this fmtp. Our sipcc-based SDP parser does not support this fmtp, but our rust SDP parser does. Query: Might this explain some of the fmtp differences we're seeing in the field? Nico?

"stereo"/kCodecParamStereo: WebrtcAudioConduit::CodecConfigToLibwebrtcFormat sets this to true iff the number of channels is 2 (this is duplicative, but whatever), which is set in the manner described in comment 40. Both our sipcc and rust SDP parsers support this fmtp.

"maxplaybackrate"/kCodecParamMaxPlaybackRate: WebrtcAudioConduit::CodecConfigToLibwebrtcFormat sets this to AudioCodecConfig::mMaxPlaybackRate, which JsepCodecDescToAudioCodecConfig sets to JsepAudioCodecDescription::mMaxPlaybackRate, which JsepAudioCodecDescription::Negotiate sets based on the maxplaybackrate opus fmtp (again, only for send). Firefox always sets this to 48000, while Chrome does not emit it. about:webrtc displays maxplaybackrate=0 for Chrome's SDP, which is a bit concerning. Query: Is there some parse code that is not assigning the correct default of 48000 when the fmtp is not present?
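
For reference, RFC 7587 gives maxplaybackrate a default of 48000 when the fmtp is absent, so the expected defaulting behavior queried above would look like this sketch (function name hypothetical):

```python
# Sketch of the defaulting behavior queried above: a parser reporting 0
# for an absent maxplaybackrate fmtp (as about:webrtc appears to show for
# Chrome's SDP) would be failing to apply the RFC 7587 default of 48000.

def effective_max_playback_rate(fmtp_value=None) -> int:
    return fmtp_value if fmtp_value is not None else 48000

assert effective_max_playback_rate() == 48000        # fmtp absent: RFC default
assert effective_max_playback_rate(16000) == 16000   # fmtp present: use it
```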

"ptime"/kCodecParamPTime: WebrtcAudioConduit::CodecConfigToLibwebrtcFormat sets this to AudioCodecConfig::mFrameSizeMs, which JsepCodecDescToAudioCodecConfig sets to JsepAudioCodecDescription::mFrameSizeMs, which JsepAudioCodecDescription::Negotiate sets based on either the a=maxptime SDP attribute, or if that is not set, the a=ptime attribute, or if that is not set, the ptime opus fmtp param (but only for the send direction, and only for opus). Neither Firefox nor Chrome puts any of these things in their SDP. Our sipcc and rust SDP parsers seem to support a=ptime and a=maxptime, but only our rust parser supports the ptime fmtp. Additionally, the ptime-related logic in JsepAudioCodecDescription::Negotiate seems pretty strange to me. Query: Should this be setting mMaxFrameSizeMs instead of mFrameSizeMs?
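
The precedence chain described above (the strangeness being that a=maxptime, a cap, ends up feeding mFrameSizeMs, a target) can be sketched as (names are hypothetical condensations of JsepAudioCodecDescription::Negotiate):

```python
# Sketch of the send-side precedence chain described above: a=maxptime
# wins, else a=ptime, else the opus ptime fmtp param.

def choose_frame_size_ms(maxptime_attr=None, ptime_attr=None, ptime_fmtp=None):
    if maxptime_attr is not None:
        return maxptime_attr
    if ptime_attr is not None:
        return ptime_attr
    return ptime_fmtp

assert choose_frame_size_ms(maxptime_attr=40, ptime_attr=20) == 40
assert choose_frame_size_ms(ptime_attr=20, ptime_fmtp=10) == 20
assert choose_frame_size_ms(ptime_fmtp=10) == 10
assert choose_frame_size_ms() is None
```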

Flags: needinfo?(na-g)

Is there anything else we need to examine here?

Flags: needinfo?(na-g)
Flags: needinfo?(apehrson)

Yes, SdpVideoFormat, and most importantly its parameters. We have our own platform codec impls for video and they need to handle these parameters properly too. Decoders can generally inspect the bitstream but encoders would have to rely on parameters. If we know what the parameters are per-codec we can always divide the work of verifying our codec impls later.

Great work on the others so far. Could you summarize any action points when you're done? I'm low on cycles and this is a lot to go through.

Flags: needinfo?(apehrson)

This will be very useful as we look at deleting SipCC and refactoring WebRTC-SDP.

Flags: needinfo?(na-g)

(In reply to Andreas Pehrson [:pehrsons] from comment #44)

> Yes, SdpVideoFormat, and most importantly its parameters. We have our own platform codec impls for video and they need to handle these parameters properly too. Decoders can generally inspect the bitstream but encoders would have to rely on parameters. If we know what the parameters are per-codec we can always divide the work of verifying our codec impls later.
>
> Great work on the others so far. Could you summarize any action points when you're done? I'm low on cycles and this is a lot to go through.

I don't see any of our code writing SdpVideoFormat::parameters, is that something we expect to see?

Flags: needinfo?(apehrson)

Yeah, the stuff we're negotiating in SDP is definitely never ending up in SdpVideoFormat::parameters, and therefore not making it down into the code in WebrtcMediaDataEncoder::CreateEncoder. We just end up using the default in ParseSdpProfileLevelId. Now, this just happens to be the same thing that we negotiate by default, but we end up doing the wrong thing if SDP negotiation settles on something else.

(In reply to Byron Campen [:bwc] from comment #47)

> Yeah, the stuff we're negotiating in SDP is definitely never ending up in SdpVideoFormat::parameters, and therefore not making it down into the code in WebrtcMediaDataEncoder::CreateEncoder. We just end up using the default in ParseSdpProfileLevelId. Now, this just happens to be the same thing that we negotiate by default, but we end up doing the wrong thing if SDP negotiation settles on something else.

Yeah, we need a bug here.

(In reply to Byron Campen [:bwc] from comment #46)

> I don't see any of our code writing SdpVideoFormat::parameters, is that something we expect to see?

Great, so we shouldn't expect a regression from this. But we should look at what is supported by the upstream codecs, and what Chromium is writing, because I'd guess there's something we should pass on from negotiation, although not as much as for audio.

Flags: needinfo?(apehrson)

(In reply to Nico Grunbaum [:ng, @chew:mozilla.org] from comment #48)

> (In reply to Byron Campen [:bwc] from comment #47)
>
>> Yeah, the stuff we're negotiating in SDP is definitely never ending up in SdpVideoFormat::parameters, and therefore not making it down into the code in WebrtcMediaDataEncoder::CreateEncoder. We just end up using the default in ParseSdpProfileLevelId. Now, this just happens to be the same thing that we negotiate by default, but we end up doing the wrong thing if SDP negotiation settles on something else.
>
> Yeah, we need a bug here.

Filed bug 1746205.

See Also: → 1746205
See Also: → 1746466
See Also: → 1746469
See Also: → 1746472
See Also: → 1746477
See Also: → 1746481
See Also: → 1746491
See Also: → 1746492
See Also: → 1746497
See Also: → 1746514