Bug 1631263 Comment 63 Edit History
Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.
I'm not exactly sure what review tools are available for MDN, so:

> The framework fires an rtctransform event at the worker global scope on construction of the corresponding RTCRtpScriptTransform, and whenever an encoded frame is enqueued for processing.

rtctransform is only for the initial setup; frames are handed off via a ReadableStream.

> The worker script must implement a handler for the rtctransform event, creating a pipe chain that pipes the event.transformer.readable stream through an appropriate TransformStream to the event.transformer.writable stream.

So, while TransformStream is likely to be the strategy most will use, it isn't actually required; the only expectation is that the worker code somehow reads frames from the provided ReadableStream, and writes those frames, with whatever modifications it wants, to the provided WritableStream, in the same order and without any duplicates (it may drop some, though).

> This can cause a delay for new users joining a WebRTC conference application, because they can't display video until they have received their first key frame. Similarly, if an encoded transform was used to encrypt frames, the recipient would not be able to display video until they get the first key frame encrypted with their key.

There's some subtlety here. WebRTC already has mechanisms to prompt the sending of new keyframes (this is built into the RTP specifications). I think the intent here is to give JS code a "way out" when it needs a keyframe to do its transform work, for whatever reason.

> A receiver can request that the sender send a new key frame. However it can't specify the RID as you can if you call getKeyFrame() at the sender. Why is that not needed? Is it because the sender implicitly knows what stream the receiver is getting, but the sender perhaps does not?

Browsers never receive more than one rid at a time, and do not know or care about rids on receive streams.
The point of sending more than one rid at a time (this is called simulcast) is to allow a middlebox to have a variety of levels of video quality available for transmission to other participants in the conference; it will only ever send one at a time to any particular participant, and when it switches, that will just look like a quality change to the receiver, not a different stream. This can be used to handle differences in available bandwidth between participants, and it can also be used to switch resolution rapidly on the fly (for example, the middlebox forwards low-quality video for everyone except the active speaker).

> The rid for generateKeyFrame() appears to be an identifier that you might specify in the transceiver creation. However AFAIK you don't have to explicitly add a transceiver if you addTrack() so I have suggested you can get this by querying the sender parameters for encoders. FMI, is this correct? And what are they likely to be - do they get allocated standard values like "1, 2, 3"?

This ends up being kinda complicated. If you use addTrack, _and_ you're also the offerer, you can't do simulcast at all. If you use addTrack but you're the answerer, the offer SDP can set up simulcast for you (with rids chosen by the offerer). addTransceiver is the only way to do simulcast as an offerer, and also the only way for JS to select the rids.

> Further, the format of rid appears to be alphanumeric ascii + hyphen and underscore. Is there any limit on the length (doesn't appear to be in spec).

The specs are inconsistent, but the upshot is that it can be between 1 and 255 alphanumeric characters. (RFC 8851 allows '-' and '_' and unlimited length, but RFC 8852 disagrees; see https://www.rfc-editor.org/errata/eid7132)

> is the return value of generateKeyFrame the timestamp (a promise that fulfills as the timestamp?)

Yes. **RTCRtpSender**.generateKeyFrame does not, but that's probably a spec bug.

> For an encoded video frame the type can be "empty". What does that mean/what do you do with it?

I think the only situation where that could come up is if the transform held onto a reference to a frame _after_ it had written it to the writable. Eventually, that frame data will be transferred to another thread, leaving the frame empty? I don't think we implement changing the type to "empty" when that occurs right now.
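Going back to the pipe-chain question at the top, the handler setup might look something like the sketch below. This is a minimal illustration, not a definitive implementation: `createTransform` is a hypothetical name, and the transform body is a no-op placeholder where real frame modification would go.

```javascript
// Sketch of a worker script for the WebRTC Encoded Transform API.
// createTransform is an illustrative helper, not part of the API.
function createTransform() {
  return new TransformStream({
    transform(encodedFrame, controller) {
      // encodedFrame.data is an ArrayBuffer; inspect or modify it here,
      // then enqueue the frame so it continues down the pipe.
      controller.enqueue(encodedFrame);
    },
  });
}

// In a real worker, the rtctransform event fires once, when the
// corresponding RTCRtpScriptTransform is constructed, delivering the
// readable/writable pair. Guarded so the sketch is inert outside a worker.
if (typeof self !== "undefined" && "onrtctransform" in self) {
  self.onrtctransform = (event) => {
    const { readable, writable } = event.transformer;
    readable.pipeThrough(createTransform()).pipeTo(writable);
  };
}
```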
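And since, as noted above, a TransformStream isn't strictly required, a worker could equally read and write the two streams by hand, as long as frames come out in the same order they went in. A hypothetical sketch (`pump` and `modify` are illustrative names, not API):

```javascript
// Sketch: consume the transformer's streams without a TransformStream.
// Frames must be written in the order they were read; dropping a frame
// is allowed, duplicating one is not.
async function pump(readable, writable, modify = (frame) => frame) {
  const reader = readable.getReader();
  const writer = writable.getWriter();
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    await writer.write(modify(frame));
  }
  await writer.close();
}
```

In the worker handler this would be called as `pump(event.transformer.readable, event.transformer.writable, myModifier)`.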
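On the question of discovering rids from the sender parameters: a helper along these lines would do it. `sendRids` is a hypothetical name, not a real API; it assumes the transceiver was created with explicit sendEncodings, e.g. `pc.addTransceiver(track, { sendEncodings: [{ rid: "low" }, { rid: "mid" }, { rid: "high" }] })`, since (per the above) that is the only way for JS to pick the rids.

```javascript
// Hypothetical helper: list the rids an RTCRtpSender is configured to send.
// Encodings without a rid (the non-simulcast case) are filtered out.
function sendRids(sender) {
  return sender
    .getParameters()
    .encodings.map((encoding) => encoding.rid)
    .filter((rid) => rid !== undefined);
}
```

The values returned are whatever strings the JS (or the remote offer) chose; nothing allocates standard values like "1, 2, 3" for you.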
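The rid format discussion above can be captured in a small check. This reflects my reading of the two specs (RFC 8852 as corrected by errata 7132 versus RFC 8851's looser grammar) and is a sketch, not normative:

```javascript
// Strict reading (RFC 8852 + errata eid7132): 1-255 ASCII alphanumerics.
const RID_STRICT = /^[A-Za-z0-9]{1,255}$/;
// Looser RFC 8851 grammar: also permits '-' and '_', with no length cap.
const RID_LOOSE = /^[A-Za-z0-9_-]+$/;

// A rid satisfying the strict rule is safe under either spec, so it is
// the sensible choice for portable applications.
function isPortableRid(rid) {
  return RID_STRICT.test(rid);
}
```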