Closed Bug 1631263 Opened 4 years ago Closed 8 months ago

Support RTCRtpScriptTransform (formerly webrtc insertable streams)

Categories

(Core :: WebRTC: Networking, enhancement, P2)

68 Branch
enhancement

Tracking

()

RESOLVED FIXED
117 Branch
Webcompat Priority P1
Tracking Status
relnote-firefox --- 117+
firefox117 --- fixed

People

(Reporter: jerome.bouat, Assigned: bwc)

References

(Blocks 3 open bugs, Regressed 3 open bugs)

Details

(Keywords: dev-doc-complete, Whiteboard: [jitsi-meet], [wptsync upstream])

Attachments

(4 files, 8 obsolete files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0

Steps to reproduce:

Insertable streams provide a hook between the RTP (de)packetizer and the encoder/decoder as described here :
http://www.w3.org/2019/09/18-mediaprocessing-harald-insertable-media-processing.pdf

The Jisti team implemented an end-to-end encryption by using the insertable streams of Chrome with the Jisti Meet system :
https://jitsi.org/blog/e2ee/

This allows their video bridge (Selective Forwarding Unit) to easily forward packets whatever they are encrypted or not, whatever the receiver quality is.

On the end points, the insertable stream can be processed into a separated thread (or process ?), besides the usual (de)packetizers and encoder/decoder.

Actual results:

It seems Firefox doesn't support it.

Expected results:

Plan a support for this feature ? Are their any security issues ?

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → WebRTC: Audio/Video
Product: Firefox → Core

Jib, would you mind taking a initial triage pass here?

Flags: needinfo?(jib)

Please add insertable streams support to Firefox.

Access give access to:

  • end-to-end encryption
  • adding more reliability with forward error correction

For more information, please watch the “Jitsi Community Call” (Streamed live on 21 Apr 2020) at 21’30”
https://youtu.be/y4CW3c_Es4I?t=1290

100% support for Firefox (and other non-Chrome browsers)
https://github.com/jitsi/jitsi-meet/issues/4758

It is understood that Chromium has the insertable streams feature.

What do you think?

Thank you

I believe Mozilla is working on a position on insertable streams. We should probably open an issue on https://github.com/mozilla/standards-positions/

Flags: needinfo?(jib) → needinfo?(drno)

Correct. Mozilla first needs to decide about it position on https://github.com/mozilla/standards-positions/ what we think about Google insertable streams proposal. These discussions don't happen in bugzilla.

Flags: needinfo?(drno)

(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #4) and (In reply to Nils Ohlmeier [:drno] from comment #5)
https://github.com/mozilla/standards-positions/issues/330

It looks like this is being worked on in the standards-positions tracker. I am closing this as moved for now, it can be reopened or recreated when a position is taken.

Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → MOVED

I guess this issue can be reopened now that a position is taken.

Hi,
any news about this?

(In reply to Nico Grunbaum [:ng, @chew:mozilla.org] from comment #7)

It looks like this is being worked on in the standards-positions tracker. I am closing this as moved for now, it can be reopened or recreated when a position is taken.

Now that a position has been taken could this bug be reopened? Or is there another bug tracking work on WebRTC insertable streams?

Status: RESOLVED → UNCONFIRMED
Resolution: MOVED → ---
Status: UNCONFIRMED → NEW
Ever confirmed: true

Just to update this issue that the plan is to implement the latest spec, which is RTCRtpScriptTransform, and not any previous Chrome API. This is the API we'd implement that most matches the requests from major services.

Separately, the spec also offers the SFrameTransform API, which would provide sframe encoding natively without requiring this to be done by JS. This API is a bit less mature, and we haven't gotten requests for it yet. I'll open a separate issue on that.

Severity: normal → S3
Component: WebRTC: Audio/Video → WebRTC: Networking
Priority: -- → P2
Summary: support webrtc insertable streams → Support RTCRtpScriptTransform (formerly webrtc insertable streams)

Jitsi Meet has just implemented RTCRtpScriptTransform support, so E2EE on Jitsi Meet will work in Firefox once Firefox gains RTCRtpScriptTransform support.

Since firefox 96 fixed many issues about webRTC support https://bugzilla.mozilla.org/show_bug.cgi?id=1654112#c371. This is one important remaining bug. Is there any ETA?

Depends on: 1749547
No longer depends on: 1474543

Any update on this?

Any news?

https://bugzilla.mozilla.org/show_bug.cgi?id=1715625#c3 says that this is blocking Facetime support in Firefox.

Webcompat Priority: --- → ?

Given that we can get support for Facetime, setting webcompat priority as P1.

Webcompat Priority: ? → P1

Any news on this? This is very important for firefox compatibility, especially since the not support pushes Firefox in more unsecure areas when using e.g. Jitsi or any other Web-Video-Conference and e2ee is not possible due to the lack of support for this feature in FF.

Has there been any progress now that this is P1?

As others have said this is a major blocker with using Firefox for end to end encryption in webRTC.

The next version 106 will contain many enhancements of WebRTC https://www.mozilla.org/en-US/firefox/106.0beta/releasenotes/, so maybe they will focus on that later on, I hope.

Sorry to be the person to bother everyone again, but I think people would appreciate if you had some idea of what the priority and potential timeframe on this is, considering that it's the sole blocker for making multiple major services usable (or usable with increased functionality) in Firefox.

Flags: needinfo?(docfaraday)
Assignee: nobody → jib

(In reply to Justin Peter from comment #23)

Sorry to be the person to bother everyone again, but I think people would appreciate if you had some idea of what the priority and potential timeframe on this is, considering that it's the sole blocker for making multiple major services usable (or usable with increased functionality) in Firefox.

We're currently working on getting this implemented!

Flags: needinfo?(docfaraday)
See Also: → 1835076
Blocks: 1835405
See Also: → 1835405
See Also: 1835405
Attachment #9334590 - Attachment is obsolete: true
Attachment #9335741 - Attachment is obsolete: true

Depends on D179099

Depends on D179731

Depends on D179732

Depends on D179733

Attachment #9335916 - Attachment is obsolete: true
Depends on: 1837570
Depends on: 1838080

Comment on attachment 9337092 [details]
WIP: Bug 1631263: (WIP) libwebrtc modifications (needs moving to a different bug)

Revision D179731 was moved to bug 1838080. Setting attachment 9337092 [details] to obsolete.

Attachment #9337092 - Attachment is obsolete: true
Attachment #9337093 - Attachment description: WIP: Bug 1631263: (WIP) Test-cases for bug. → Bug 1631263: Test-cases for bug. r?jib
Attachment #9337094 - Attachment description: WIP: Bug 1631263: (WIP) Add generic "event with options" internal API to Worker. → Bug 1631263: Add generic "event with options" internal API to Worker. r?asuth
Attachment #9337095 - Attachment description: WIP: Bug 1631263: RTCRtpScriptTransform webidl. → Bug 1631263: RTCRtpScriptTransform webidl. r?#webidl
Attachment #9337096 - Attachment description: WIP: Bug 1631263: (WIP) Implement RTCRtpScriptTransform. → Bug 1631263: Implement RTCRtpScriptTransform. r?pehrsons,jib
Attachment #9338201 - Attachment description: WIP: Bug 1631263: Work around bug in libwebrtc where the last ref to a taskqueue can be in a lambda running on that taskqueue. → Bug 1631263: Work around bug in libwebrtc where the last ref to a taskqueue can be in a lambda running on that taskqueue. r?pehrsons
Attachment #9337096 - Attachment description: Bug 1631263: Implement RTCRtpScriptTransform. r?pehrsons,jib → Bug 1631263: Implement RTCRtpScriptTransform. r?pehrsons,jib,asuth,#webidl
Attachment #9337095 - Attachment is obsolete: true
Attachment #9337094 - Attachment is obsolete: true
Duplicate of this bug: 1804949
Attachment #9337093 - Attachment description: Bug 1631263: Test-cases for bug. r?jib → Bug 1631263: Expand test-cases for RTCRtpScriptTransform. r?jib

This was breaking code in RTCRtpScriptTransformer that relied on the first rid
having 0 as its simulcast index. It was probably also resulting in a priority
inversion on simulcast streams.
(see https://www.rfc-editor.org/rfc/rfc8853.html#section-7.1)

Also remove some unused cruft.

Depends on D179735

Attachment #9338201 - Attachment is obsolete: true
Duplicate of this bug: 1836295
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4cfe27d450e4
Expand test-cases for RTCRtpScriptTransform. r=jib
https://hg.mozilla.org/integration/autoland/rev/d51f80b53059
Implement RTCRtpScriptTransform. r=pehrsons,jib,asuth,emilio,saschanaz
https://hg.mozilla.org/integration/autoland/rev/2dd5c7b8fe5b
Remove some reverse iteration that was causing the rid order to be backwards in libwebrtc. r=pehrsons
https://hg.mozilla.org/integration/autoland/rev/717eac481650
1838080: apply code formatting via Lando
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/40939 for changes under testing/web-platform/tests
Whiteboard: [jitsi-meet] → [jitsi-meet], [wptsync upstream]

Backed out for causing VideoConduitTest related failures

Backout link

Push with failures

Failure log

Flags: needinfo?(jib)
Upstream PR was closed without merging
Assignee: jib → docfaraday
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7a2c4be3701d
Expand test-cases for RTCRtpScriptTransform. r=jib
https://hg.mozilla.org/integration/autoland/rev/581c7a4d8f5f
Implement RTCRtpScriptTransform. r=pehrsons,jib,asuth,emilio,saschanaz
https://hg.mozilla.org/integration/autoland/rev/dc46d2be4c25
Remove some reverse iteration that was causing the rid order to be backwards in libwebrtc. r=pehrsons
https://hg.mozilla.org/integration/autoland/rev/a857a01fb321
1838080: apply code formatting via Lando
Regressions: 1844403
Regressions: 1844412

Backed out for causing wpt failures in script-transform-generateKeyFrame.https.html

  • Backout link
  • Push with failures
  • Failure Log
  • Failure line: TEST-UNEXPECTED-TIMEOUT | /webrtc-encoded-transform/script-transform-generateKeyFrame.https.html | generateKeyFrame works with simulcast rids - Test timed out
Flags: needinfo?(docfaraday)

Looks like that tester is just too slow to reliably run that test. I might be able to break the test file up into multiple smaller test files.

Flags: needinfo?(docfaraday)
Upstream PR was closed without merging

Breaking the test file up seems to have resolved the timeouts.

Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ebdc00fb1689
Expand test-cases for RTCRtpScriptTransform. r=jib
https://hg.mozilla.org/integration/autoland/rev/1012734b7311
Implement RTCRtpScriptTransform. r=pehrsons,jib,asuth,emilio,saschanaz
https://hg.mozilla.org/integration/autoland/rev/b0348f1f8d71
Remove some reverse iteration that was causing the rid order to be backwards in libwebrtc. r=pehrsons
https://hg.mozilla.org/integration/autoland/rev/98677779a3a4
1838080: apply code formatting via Lando
Regressions: 1844623
Upstream PR merged by moz-wptsync-bot
Regressions: 1844710
Flags: needinfo?(jib)

Do we want to call this out in the Fx117 relnotes?

Flags: needinfo?(docfaraday)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #52)

Do we want to call this out in the Fx117 relnotes?

Definitely, yes!

Flags: needinfo?(docfaraday)

I'll post an intent to ship today or tomorrow. Sorry for the delay on that.

Release Note Request (optional, but appreciated)
[Why is this notable]:
RTCRtpScriptTransform is a feature that is in use in major services (eg; Facetime), and we are the second major implementer. Chrome supports an alternative standard.
[Affects Firefox for Android]:
In theory, although I'm not sure how many services use this feature on mobile.
[Suggested wording]:
Implemented RTCRtpScriptTransform.
[Links (documentation, blog post, etc)]:
https://w3c.github.io/webrtc-encoded-transform/

@jib, do we have anything to link to here besides the spec?

relnote-firefox: --- → ?
Flags: needinfo?(jib)
Keywords: dev-doc-needed

MDN doesn't have a page for it yet, but there's https://caniuse.com/?search=RTCRtpScriptTransform

WebRTC Samples is failing over bug 1396922:

Uncaught TypeError: transceiver.setCodecPreferences is not a function    

I've been meaning to write an adapter.js shim for that, but not gotten around to it.

In the meantime this fiddle demonstrates use of the API successfully in Firefox (for VP8).

Flags: needinfo?(jib)

I've submitted a fix to WebRTC samples that makes it work. In the meantime, you can try it here https://jan-ivar.github.io/samples/src/content/insertable-streams/endtoend-encryption/

Added to the Fx117 relnotes.

FF117 MDN Docs work for this can be tracked in https://github.com/mdn/content/issues/28280

For the docs, I have a few questions on RTCRtpScriptTransform(worker, options, transfer)

  • Can you provide a practical example of when you would need to use the transfer option and how your would do it? I think you could do it like:
    const myMessage = "encode";
    const mySomeObject= new SomeObject().
    videoSender.transform = new RTCRtpScriptTransform(worker, {
    transform_type: , thingy: mySomeObject
    }, [mySomeObject, myMessage ]);
    
    But not sure when you would.
  • For docs I've said you get an exception if an object in the transfer array can't be transferred. Is that precise enough? I mean I expect you would understand you can't transfer an already detached object etc?

A slightly bigger question more "for my understanding". The RTCRtpScriptTransform hooks you into an RTC pipeline for incoming/outgoing frames. I understand it has an internal readable/writable that are piped encoded frames. These readable/writable are exposed to the worker in a RTCRtpScriptTransformer.
My problem is this: in the spec it seems to indicate that the readable/writable are transferred to the worker. I thought transferring made them inaccessible to the main thread. So I was wondering how the rest of the RTC pipeline can still add frames to these variables. Is it that there is some magic going on here outside of JavaScript? I.e. you've transferred the readable writable but RTC itself is writing to the same memory in C++ directly, not going via the original JavaScript values in RTCRtpScriptTransform? Sorry if this is very dumb!

Flags: needinfo?(docfaraday)

(In reply to Hamish Willee from comment #60)

For the docs, I have a few questions on RTCRtpScriptTransform(worker, options, transfer)

  • Can you provide a practical example of when you would need to use the transfer option and how your would do it? I think you could do it like:
    const myMessage = "encode";
    const mySomeObject= new SomeObject().
    videoSender.transform = new RTCRtpScriptTransform(worker, {
    transform_type: , thingy: mySomeObject
    }, [mySomeObject, myMessage ]);
    
    But not sure when you would.
  • For docs I've said you get an exception if an object in the transfer array can't be transferred. Is that precise enough? I mean I expect you would understand you can't transfer an already detached object etc?

So, the wpt for this use |transfer| to pass an extra MessagePort that is used to communicate with the worker-side object that is handling that specific transform (instead of postMessage and demuxing):

https://searchfox.org/mozilla-central/rev/d7a8eadc28298c31381119cbf25c8ba14b8712b3/testing/web-platform/tests/webrtc-encoded-transform/routines.js#23-30

I could see webdevs using a similar pattern. It is likely that webdevs will pass crypto-related stuff there too.

A slightly bigger question more "for my understanding". The RTCRtpScriptTransform hooks you into an RTC pipeline for incoming/outgoing frames. I understand it has an internal readable/writable that are piped encoded frames. These readable/writable are exposed to the worker in a RTCRtpScriptTransformer.
My problem is this: in the spec it seems to indicate that the readable/writable are transferred to the worker. I thought transferring made them inaccessible to the main thread. So I was wondering how the rest of the RTC pipeline can still add frames to these variables. Is it that there is some magic going on here outside of JavaScript? I.e. you've transferred the readable writable but RTC itself is writing to the same memory in C++ directly, not going via the original JavaScript values in RTCRtpScriptTransform? Sorry if this is very dumb!

Yeah, the spec is really wonky here, and overspecifies implementation details. The essential property here is that there are readable/writable exposed to the worker, and how that happens under the hood should not matter.

Flags: needinfo?(docfaraday)
Attachment #9336594 - Attachment is obsolete: true

Thanks very much @docfaraday - that was very helpful.

Getting there with the docs. Would it be possible to get a review of Using_Encoded_Transforms in particular? .... and https://github.com/mdn/content/pull/28439 if you have more time (that's still a little "under construction").

A few more questions, that came up as I was documenting:

  • A receiver can request that the sender send a new key frame. However it can't specify the RID as you can if you call getKeyFrame() at the sender. Why is that not needed? Is it because the sender implicitly knows what stream the reciever is getting, but the sender perhaps does not?
  • The rid for generateKeyFrame() appears to be an identifier that you might specify in the transciever creation. However AFAIK you don't have to explicitly add a transceiver if you addTrack() so I have suggested you can get this by querying the sender parameters for encoders. FMI, is this correct? And what are they likely to be - do they get allocated standard values like "1, 2, 3"? Further, the format of rid appears to be alphanumeric ascii + hyphen and underscore. Is there any limit on the length (doesn't appear to be in spec).
  • is the return value of generateKeyFrame the timestamp (a promise that fulfills as the timestamp?)
  • For an encoded video frame the type can be "empty". What does that mean/what do you do with it?

I don't think these are obvious from the spec, but I might be being dim!

Flags: needinfo?(docfaraday)

I'm not exactly sure what review tools are available for MDN, so:

The framework fires an rtctransform event at the worker global scope on construction of the corresponding RTCRtpScriptTransform, and whenever an encoded frame is enqueued for processing.

rtctransform is only for the initial setup, frames are handed off via a ReadableStream.

The worker script must implement a handler for the rtctransform event, creating a pipe chain that pipes the event.transformer.readable stream through an appropriate TransformStream to the event.transformer.writable stream.

So, while TransformStream is likely to be the strategy most use, it isn't actually required; the only expectation is that the worker code reads frames from the provided ReadableStream somehow, and writes those frames with whatever modifications it wants to the provided WritableStream somehow, in the same order without any duplications (it could drop some though).

This can cause a delay for new users joining a WebRTC conference application, because they can't display video until they have received their first key frame. Similarly, if an encoded transform was used to encrypt frames, the recipient would not be able to display video until they get the first key frame encrypted with their key.

There's some subtlety here. WebRTC already has mechanisms to prompt the sending of new keyframes (this is built into the RTP specifications). I think the intent here is to allow JS code a "way out" when it needs a keyframe to do its transform work, for whatever reason.

A receiver can request that the sender send a new key frame. However it can't specify the RID as you can if you call getKeyFrame() at the sender. Why is that not needed? Is it because the sender implicitly knows what stream the reciever is getting, but the sender perhaps does not?

Browsers never receive more than one rid at a time, and do not know or care about rids on receive streams. The point of sending more than one rid at a time (this is called simulcast) is to allow a middlebox to have a variety of levels of video quality available for transmission to other participants in the conference; it will only ever send one at a time to any particular participant, and when it switches it will just look like a quality change to the receiver, not a different stream. This can be used to handle differences in available bandwidth for the participants, and it can also be used to switch resolution rapidly on the fly (for example, the middlebox forwards low-quality video for everyone except the active speaker).

The rid for generateKeyFrame() appears to be an identifier that you might specify in the transciever creation. However AFAIK you don't have to explicitly add a transceiver if you addTrack() so I have suggested you can get this by querying the sender parameters for encoders. FMI, is this correct? And what are they likely to be - do they get allocated standard values like "1, 2, 3"?

This ends up being kinda complicated. If you use addTrack, and you're also the offerer, you can't do simulcast at all. If you use addTrack, but you're the answerer, the offer SDP can set up simulcast for you (with rids chosen by the offerer). addTransceiver is the only way to do simulcast as an offerer, and also the only way for JS to select the rids.

Further, the format of rid appears to be alphanumeric ascii + hyphen and underscore. Is there any limit on the length (doesn't appear to be in spec).

The specs are inconsistent, but the upshot is that it can be between 1 and 255 characters of alpha-numeric. (RFC 8851 allows '-' and '_' and unlimited length, but RFC 8852 disagrees, see https://www.rfc-editor.org/errata/eid7132)

is the return value of generateKeyFrame the timestamp (a promise that fulfills as the timestamp?)

Yes. RTCRtpSender.generateKeyFrame does not, but that's probably a spec bug.

For an encoded video frame the type can be "empty". What does that mean/what do you do with it?

I think the only situation where that could come up is if the transform held onto a reference to a frame after it had written it to the writable. Eventually, that frame data will be transferred to another thread, leaving the frame empty? I don't think we implement changing the type to empty when that occurs right now.

Flags: needinfo?(docfaraday)

Hi Byron,

I am very grateful for your support and happy to get it any way that works for you!
That said, FYI MDN docs source is hosted on github, and we benefit from excellent source commenting and differencing tools (much better than Bugzilla). So for example your comment on RTCRtpSender.generateKeyFrame() has been copied back to here and can be discussed in a threaded way associated with particular text.

W.r.t. that comment, note that FF appears to implement the CPP for RTCRtpSender.generateKeyFrame() but not IDL - doesn't that mean it is not exposed in JavaScript?

Anyway, I'll go through this and get back to you if I have any questions.

(In reply to Hamish Willee from comment #64)

Hi Byron,

I am very grateful for your support and happy to get it any way that works for you!
That said, FYI MDN docs source is hosted on github, and we benefit from excellent source commenting and differencing tools (much better than Bugzilla). So for example your comment on RTCRtpSender.generateKeyFrame() has been copied back to here and can be discussed in a threaded way associated with particular text.

W.r.t. that comment, note that FF appears to implement the CPP for RTCRtpSender.generateKeyFrame() but not IDL - doesn't that mean it is not exposed in JavaScript?

Correct, and I think the function signature is wrong anyway. We'd need a little refactoring to expose this to JS.

Depends on: 1849422

Thanks @Byron. Just FYI the docs are now "complete" and in review. They aren't inspiring, but they are fairly complete :-).

I did have one oddity in RTCEncodedAudioFrame.getMetadata() that I was getting only payloadType and synchronizationSource in the returned object and was expecting other values like sequence. This was from a webcam asking for both video and audio, and the video frames seemed to have the right kind of information.

I've incorporated some of your info on RID (more should go in, but what I took was appropriate for this API).
I didn't modify the text on generating Key frames to state that WebRTC can auto request them on need, though this is implied. My take is that even if RTC can order frames when it knows it needs them, it can't always know that the transform does - for example if a transform starts adding new encryption keys or whatever. Anyway, what it says now should not be "wrong".

(In reply to Hamish Willee from comment #66)

Thanks @Byron. Just FYI the docs are now "complete" and in review. They aren't inspiring, but they are fairly complete :-).

I did have one oddity in RTCEncodedAudioFrame.getMetadata() that I was getting only payloadType and synchronizationSource in the returned object and was expecting other values like sequence. This was from a webcam asking for both video and audio, and the video frames seemed to have the right kind of information.

Was this an outgoing frame? Spec says that |sequenceNumber| is only set on incoming audio frames.

https://www.w3.org/TR/webrtc-encoded-transform/#dom-rtcencodedaudioframemetadata-sequencenumber

Also, there's a bug in libwebrtc that prevents us from having |contributingSources| on outgoing audio frames:

https://bugzilla.mozilla.org/show_bug.cgi?id=1835076

Thanks very much - that's exactly what it was.

Regressions: 1858809
You need to log in before you can comment on or make changes to this bug.