Support RTCRtpScriptTransform (formerly webrtc insertable streams)
Categories
(Core :: WebRTC: Networking, enhancement, P2)
Tracking
()
People
(Reporter: jerome.bouat, Assigned: bwc)
References
(Blocks 3 open bugs, Regressed 2 open bugs)
Details
(Keywords: dev-doc-complete, webcompat:platform-bug, Whiteboard: [jitsi-meet], [wptsync upstream])
Attachments
(4 files, 8 obsolete files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Steps to reproduce:
Insertable streams provide a hook between the RTP (de)packetizer and the encoder/decoder, as described here:
http://www.w3.org/2019/09/18-mediaprocessing-harald-insertable-media-processing.pdf
The Jitsi team implemented end-to-end encryption for the Jitsi Meet system using Chrome's insertable streams:
https://jitsi.org/blog/e2ee/
This allows their video bridge (Selective Forwarding Unit) to forward packets easily whether or not they are encrypted, and whatever the receiver quality is.
On the endpoints, the insertable stream can be processed in a separate thread (or process?), alongside the usual (de)packetizers and encoder/decoder.
Actual results:
It seems Firefox doesn't support it.
Expected results:
Plan support for this feature? Are there any security issues?
Comment 1•5 years ago
Bugbug thinks this bug should belong to this component, but please revert this change in case of error.
Comment 2•5 years ago
Jib, would you mind taking an initial triage pass here?
Please add insertable streams support to Firefox.
Insertable streams give access to:
- end-to-end encryption
- adding more reliability with forward error correction
For more information, please watch the “Jitsi Community Call” (Streamed live on 21 Apr 2020) at 21’30”
https://youtu.be/y4CW3c_Es4I?t=1290
100% support for Firefox (and other non-Chrome browsers)
https://github.com/jitsi/jitsi-meet/issues/4758
It is understood that Chromium has the insertable streams feature.
What do you think?
Thank you
Comment 4•5 years ago
I believe Mozilla is working on a position on insertable streams. We should probably open an issue on https://github.com/mozilla/standards-positions/
Comment 5•5 years ago
Correct. Mozilla first needs to decide on its position, at https://github.com/mozilla/standards-positions/, regarding Google's insertable streams proposal. These discussions don't happen in Bugzilla.
(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #4) and (In reply to Nils Ohlmeier [:drno] from comment #5)
https://github.com/mozilla/standards-positions/issues/330
Comment 7•5 years ago
It looks like this is being worked on in the standards-positions tracker. I am closing this as moved for now; it can be reopened or recreated when a position is taken.
Comment 8•4 years ago
I guess this issue can be reopened now that a position is taken.
Comment 10•4 years ago
(In reply to Nico Grunbaum [:ng, @chew:mozilla.org] from comment #7)
It looks like this is being worked on in the standards-positions tracker. I am closing this as moved for now, it can be reopened or recreated when a position is taken.
Now that a position has been taken could this bug be reopened? Or is there another bug tracking work on WebRTC insertable streams?
Comment 11•3 years ago
Just to update this issue that the plan is to implement the latest spec, which is RTCRtpScriptTransform, and not any previous Chrome API. This is the API we'd implement that most matches the requests from major services.
Separately, the spec also offers the SFrameTransform API, which would provide sframe encoding natively without requiring this to be done by JS. This API is a bit less mature, and we haven't gotten requests for it yet. I'll open a separate issue on that.
Comment 13•3 years ago
Jitsi Meet has just implemented RTCRtpScriptTransform support, so E2EE on Jitsi Meet will work in Firefox once Firefox gains RTCRtpScriptTransform support.
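For services in a similar position, a minimal feature-detection sketch might look like the following. It assumes a page that prefers RTCRtpScriptTransform and falls back to Chrome's earlier createEncodedStreams() API; the helper name pickEncodedTransformApi and its return strings are hypothetical, not taken from Jitsi Meet.

```javascript
// Hypothetical helper: decide which encoded-transform API a page can use.
// `scope` defaults to the global object, so the function can also be
// exercised outside a browser with a stand-in object.
function pickEncodedTransformApi(scope = globalThis) {
  // Standard API (the one this bug tracks).
  if (typeof scope.RTCRtpScriptTransform === "function") {
    return "script-transform";
  }
  // Chrome's earlier, non-standard API lives on RTCRtpSender.prototype.
  const sender = scope.RTCRtpSender;
  if (
    sender &&
    sender.prototype &&
    typeof sender.prototype.createEncodedStreams === "function"
  ) {
    return "legacy-insertable-streams";
  }
  return "unsupported";
}
```

Checking for the constructor rather than sniffing the user agent means the same code path starts working once a browser ships the API.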
Comment 14•3 years ago
Firefox 96 fixed many WebRTC support issues (https://bugzilla.mozilla.org/show_bug.cgi?id=1654112#c371). This is one important remaining bug. Is there any ETA?
Comment 15•3 years ago
Any update on this?
Comment 16•2 years ago
Any news?
Comment 17•2 years ago
https://bugzilla.mozilla.org/show_bug.cgi?id=1715625#c3 says that this is blocking Facetime support in Firefox.
Comment 18•2 years ago
Given that we can get support for Facetime, setting webcompat priority as P1.
Comment 19•2 years ago
Any news on this? This is very important for Firefox compatibility, especially since the lack of support makes Firefox the less secure choice for Jitsi or any other web video-conferencing service: E2EE is not possible in Firefox without this feature.
Comment 20•2 years ago
Has there been any progress now that this is P1?
As others have said, this is a major blocker for using Firefox for end-to-end encryption in WebRTC.
Comment 21•2 years ago
The next version, 106, will contain many WebRTC enhancements (https://www.mozilla.org/en-US/firefox/106.0beta/releasenotes/), so maybe they will focus on this afterwards, I hope.
Comment 23•2 years ago
Sorry to be the person to bother everyone again, but I think people would appreciate if you had some idea of what the priority and potential timeframe on this is, considering that it's the sole blocker for making multiple major services usable (or usable with increased functionality) in Firefox.
Comment 24•2 years ago
(In reply to Justin Peter from comment #23)
Sorry to be the person to bother everyone again, but I think people would appreciate if you had some idea of what the priority and potential timeframe on this is, considering that it's the sole blocker for making multiple major services usable (or usable with increased functionality) in Firefox.
We're currently working on getting this implemented!
Comment 25•2 years ago
Assignee
Comment 26•1 year ago
Assignee
Comment 27•1 year ago
Assignee
Comment 28•1 year ago
Assignee
Comment 29•1 year ago
Depends on D179099
Assignee
Comment 30•1 year ago
Depends on D179099
Assignee
Comment 31•1 year ago
Depends on D179731
Assignee
Comment 32•1 year ago
Depends on D179732
Assignee
Comment 33•1 year ago
Depends on D179733
Assignee
Comment 34•1 year ago
Depends on D179734
Assignee
Comment 35•1 year ago
Depends on D179735
Comment 36•1 year ago
Comment on attachment 9337092 [details]
WIP: Bug 1631263: (WIP) libwebrtc modifications (needs moving to a different bug)
Revision D179731 was moved to bug 1838080. Setting attachment 9337092 [details] to obsolete.
Assignee
Comment 38•1 year ago
This was breaking code in RTCRtpScriptTransformer that relied on the first rid
having 0 as its simulcast index. It was probably also resulting in a priority
inversion on simulcast streams.
(see https://www.rfc-editor.org/rfc/rfc8853.html#section-7.1)
Also remove some unused cruft.
Depends on D179735
Comment 40•1 year ago
Comment 42•1 year ago
Backed out for causing VideoConduitTest related failures
Comment 44•1 year ago
Comment 45•1 year ago
Backed out for causing wpt failures in script-transform-generateKeyFrame.https.html
- Backout link
- Push with failures
- Failure Log
- Failure line: TEST-UNEXPECTED-TIMEOUT | /webrtc-encoded-transform/script-transform-generateKeyFrame.https.html | generateKeyFrame works with simulcast rids - Test timed out
Assignee
Comment 46•1 year ago
Looks like that tester is just too slow to reliably run that test. I might be able to break the test file up into multiple smaller test files.
Assignee
Comment 48•1 year ago
Breaking the test file up seems to have resolved the timeouts.
Comment 49•1 year ago
Comment 50•1 year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/ebdc00fb1689
https://hg.mozilla.org/mozilla-central/rev/1012734b7311
https://hg.mozilla.org/mozilla-central/rev/b0348f1f8d71
https://hg.mozilla.org/mozilla-central/rev/98677779a3a4
Comment 52•1 year ago
Do we want to call this out in the Fx117 relnotes?
Assignee
Comment 53•1 year ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #52)
Do we want to call this out in the Fx117 relnotes?
Definitely, yes!
Comment 54•1 year ago
I'll post an intent to ship today or tomorrow. Sorry for the delay on that.
Assignee
Comment 55•1 year ago
Release Note Request (optional, but appreciated)
[Why is this notable]:
RTCRtpScriptTransform is a feature that is in use in major services (e.g., FaceTime), and we are the second major implementer. Chrome supports an alternative standard.
[Affects Firefox for Android]:
In theory, although I'm not sure how many services use this feature on mobile.
[Suggested wording]:
Implemented RTCRtpScriptTransform.
[Links (documentation, blog post, etc)]:
https://w3c.github.io/webrtc-encoded-transform/
@jib, do we have anything to link to here besides the spec?
Comment 56•1 year ago
MDN doesn't have a page for it yet, but there's https://caniuse.com/?search=RTCRtpScriptTransform
WebRTC Samples is failing over bug 1396922:
Uncaught TypeError: transceiver.setCodecPreferences is not a function
I've been meaning to write an adapter.js shim for that, but haven't gotten around to it.
In the meantime this fiddle demonstrates use of the API successfully in Firefox (for VP8).
Comment 57•1 year ago
I've submitted a fix to WebRTC samples that makes it work. In the meantime, you can try it here https://jan-ivar.github.io/samples/src/content/insertable-streams/endtoend-encryption/
Comment 59•1 year ago
FF117 MDN Docs work for this can be tracked in https://github.com/mdn/content/issues/28280
Comment 60•1 year ago
For the docs, I have a few questions on RTCRtpScriptTransform(worker, options, transfer):
- Can you provide a practical example of when you would need to use the transfer option and how you would do it? I think you could do it like:
    const myMessage = "encode";
    const mySomeObject = new SomeObject();
    videoSender.transform = new RTCRtpScriptTransform(worker, { transform_type: myMessage, thingy: mySomeObject }, [mySomeObject, myMessage]);
  But not sure when you would.
- For docs I've said you get an exception if an object in the transfer array can't be transferred. Is that precise enough? I mean I expect you would understand you can't transfer an already detached object etc?
A slightly bigger question, more "for my understanding". The RTCRtpScriptTransform hooks you into an RTC pipeline for incoming/outgoing frames. I understand it has an internal readable/writable through which encoded frames are piped. These readable/writable are exposed to the worker in an RTCRtpScriptTransformer.
My problem is this: in the spec it seems to indicate that the readable/writable are transferred to the worker. I thought transferring made them inaccessible to the main thread. So I was wondering how the rest of the RTC pipeline can still add frames to these variables. Is it that there is some magic going on here outside of JavaScript? I.e. you've transferred the readable/writable but RTC itself is writing to the same memory in C++ directly, not going via the original JavaScript values in RTCRtpScriptTransform? Sorry if this is very dumb!
Assignee
Comment 61•1 year ago
(In reply to Hamish Willee from comment #60)
For the docs, I have a few questions on RTCRtpScriptTransform(worker, options, transfer):
- Can you provide a practical example of when you would need to use the transfer option and how you would do it? I think you could do it like:
    const myMessage = "encode";
    const mySomeObject = new SomeObject();
    videoSender.transform = new RTCRtpScriptTransform(worker, { transform_type: myMessage, thingy: mySomeObject }, [mySomeObject, myMessage]);
  But not sure when you would.
- For docs I've said you get an exception if an object in the transfer array can't be transferred. Is that precise enough? I mean I expect you would understand you can't transfer an already detached object etc?
So, the wpt for this uses |transfer| to pass an extra MessagePort that is used to communicate with the worker-side object that is handling that specific transform (instead of postMessage and demuxing):
I could see webdevs using a similar pattern. It is likely that webdevs will pass crypto-related stuff there too.
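The MessagePort pattern described above might be sketched as follows. This is an illustration under stated assumptions, not the wpt code: createTransformWithPort and the option names `side` and `port` are hypothetical, and the only API relied on is the RTCRtpScriptTransform(worker, options, transfer) constructor discussed in this thread plus the standard MessageChannel.

```javascript
// Sketch only: createTransformWithPort is a hypothetical helper name.
// It builds an RTCRtpScriptTransform and hands one end of a MessageChannel
// to the worker through the `transfer` argument.
function createTransformWithPort(worker, side) {
  const { port1, port2 } = new MessageChannel();
  // port2 appears both in `options` (so the worker can find it) and in the
  // transfer list (so it is moved to the worker instead of being cloned).
  const transform = new RTCRtpScriptTransform(
    worker,
    { side, port: port2 },
    [port2]
  );
  // The page keeps port1 to talk to this one transform's worker-side code.
  return { transform, port: port1 };
}
```

On the worker side the port would then be reachable as event.transformer.options.port inside the rtctransform handler, which gives each transform its own channel without demultiplexing messages sent through worker.postMessage().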
A slightly bigger question more "for my understanding". The RTCRtpScriptTransform hooks you into an RTC pipeline for incoming/outgoing frames. I understand it has an internal readable/writable that are piped encoded frames. These readable/writable are exposed to the worker in a RTCRtpScriptTransformer.
My problem is this: in the spec it seems to indicate that the readable/writable are transferred to the worker. I thought transferring made them inaccessible to the main thread. So I was wondering how the rest of the RTC pipeline can still add frames to these variables. Is it that there is some magic going on here outside of JavaScript? I.e. you've transferred the readable writable but RTC itself is writing to the same memory in C++ directly, not going via the original JavaScript values in RTCRtpScriptTransform? Sorry if this is very dumb!
Yeah, the spec is really wonky here, and overspecifies implementation details. The essential property here is that there are readable/writable exposed to the worker, and how that happens under the hood should not matter.
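From the worker's point of view, then, all that matters is the readable/writable pair on the transformer. A minimal sketch of a worker script that uses them is shown below. The XOR step is a toy stand-in for real work such as encryption and is not secure; xorFrameData, installTransformHandler and the key value are hypothetical names chosen for illustration.

```javascript
// Toy byte-flip standing in for real processing such as encryption.
// NOT secure; it only shows where frame bytes can be touched.
function xorFrameData(frame, key) {
  const bytes = new Uint8Array(frame.data); // frame.data is an ArrayBuffer
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] ^= key;
  }
  frame.data = bytes.buffer; // write the modified bytes back to the frame
  return frame;
}

// Hypothetical wiring for a dedicated worker: connect the readable and
// writable that the browser exposes on event.transformer.
function installTransformHandler(scope, key) {
  scope.addEventListener("rtctransform", (event) => {
    const { readable, writable } = event.transformer;
    const xor = new TransformStream({
      transform(frame, controller) {
        controller.enqueue(xorFrameData(frame, key));
      },
    });
    readable.pipeThrough(xor).pipeTo(writable);
  });
}

// In a real worker script this would be called as:
// installTransformHandler(self, 0x5a);
```

Because XOR is its own inverse, running the same handler on the receive side with the same key restores the original bytes, which makes the sketch easy to check end to end.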
Comment 62•1 year ago
Thanks very much @docfaraday - that was very helpful.
Getting there with the docs. Would it be possible to get a review of Using_Encoded_Transforms in particular? .... and https://github.com/mdn/content/pull/28439 if you have more time (that's still a little "under construction").
A few more questions that came up as I was documenting:
- A receiver can request that the sender send a new key frame. However it can't specify the RID as you can if you call generateKeyFrame() at the sender. Why is that not needed? Is it because the sender implicitly knows what stream the receiver is getting, but the sender perhaps does not?
- The rid for generateKeyFrame() appears to be an identifier that you might specify in the transceiver creation. However AFAIK you don't have to explicitly add a transceiver if you addTrack(), so I have suggested you can get this by querying the sender parameters for encoders. FMI, is this correct? And what are they likely to be - do they get allocated standard values like "1, 2, 3"? Further, the format of rid appears to be alphanumeric ASCII + hyphen and underscore. Is there any limit on the length (doesn't appear to be in spec)?
- Is the return value of generateKeyFrame the timestamp (a promise that fulfills as the timestamp?)
- For an encoded video frame the type can be "empty". What does that mean/what do you do with it?
I don't think these are obvious from the spec, but I might be being dim!
Assignee
Comment 63•1 year ago
I'm not exactly sure what review tools are available for MDN, so:
The framework fires an rtctransform event at the worker global scope on construction of the corresponding RTCRtpScriptTransform, and whenever an encoded frame is enqueued for processing.
rtctransform is only for the initial setup, frames are handed off via a ReadableStream.
The worker script must implement a handler for the rtctransform event, creating a pipe chain that pipes the event.transformer.readable stream through an appropriate TransformStream to the event.transformer.writable stream.
So, while TransformStream is likely to be the strategy most use, it isn't actually required; the only expectation is that the worker code reads frames from the provided ReadableStream somehow, and writes those frames with whatever modifications it wants to the provided WritableStream somehow, in the same order without any duplications (it could drop some though).
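To illustrate the point that TransformStream is optional, here is a hedged sketch that reads and writes frames by hand. pumpFrames and its shouldDrop callback are hypothetical names; the sketch relies only on the standard ReadableStream reader and WritableStream writer interfaces.

```javascript
// Hypothetical helper: read frames by hand, optionally drop some, and write
// the rest in their original order. No TransformStream is involved.
async function pumpFrames(readable, writable, shouldDrop) {
  const reader = readable.getReader();
  const writer = writable.getWriter();
  try {
    for (;;) {
      const { done, value: frame } = await reader.read();
      if (done) break;
      if (shouldDrop(frame)) continue; // dropping a frame is allowed
      await writer.write(frame); // each frame is written at most once, in order
    }
    await writer.close();
  } finally {
    reader.releaseLock();
    writer.releaseLock();
  }
}
```

In a worker this could be started from the rtctransform handler with event.transformer.readable and event.transformer.writable, for example with a shouldDrop callback that skips frames whose data has zero length.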
This can cause a delay for new users joining a WebRTC conference application, because they can't display video until they have received their first key frame. Similarly, if an encoded transform was used to encrypt frames, the recipient would not be able to display video until they get the first key frame encrypted with their key.
There's some subtlety here. WebRTC already has mechanisms to prompt the sending of new keyframes (this is built into the RTP specifications). I think the intent here is to allow JS code a "way out" when it needs a keyframe to do its transform work, for whatever reason.
A receiver can request that the sender send a new key frame. However it can't specify the RID as you can if you call generateKeyFrame() at the sender. Why is that not needed? Is it because the sender implicitly knows what stream the receiver is getting, but the sender perhaps does not?
Browsers never receive more than one rid at a time, and do not know or care about rids on receive streams. The point of sending more than one rid at a time (this is called simulcast) is to allow a middlebox to have a variety of levels of video quality available for transmission to other participants in the conference; it will only ever send one at a time to any particular participant, and when it switches it will just look like a quality change to the receiver, not a different stream. This can be used to handle differences in available bandwidth for the participants, and it can also be used to switch resolution rapidly on the fly (for example, the middlebox forwards low-quality video for everyone except the active speaker).
The rid for generateKeyFrame() appears to be an identifier that you might specify in the transceiver creation. However AFAIK you don't have to explicitly add a transceiver if you addTrack() so I have suggested you can get this by querying the sender parameters for encoders. FMI, is this correct? And what are they likely to be - do they get allocated standard values like "1, 2, 3"?
This ends up being kinda complicated. If you use addTrack, and you're also the offerer, you can't do simulcast at all. If you use addTrack, but you're the answerer, the offer SDP can set up simulcast for you (with rids chosen by the offerer). addTransceiver is the only way to do simulcast as an offerer, and also the only way for JS to select the rids.
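For the offerer case, where JS selects the rids through addTransceiver, a sketch could look like this. simulcastInit and the layer values are hypothetical; sendEncodings, rid and scaleResolutionDownBy are the standard RTCRtpTransceiverInit and encoding-parameter fields.

```javascript
// Hypothetical helper: build the init dictionary for addTransceiver().
// `layers` is a list like [{ rid: "hi", scale: 1 }, { rid: "lo", scale: 4 }].
function simulcastInit(layers) {
  return {
    direction: "sendonly",
    sendEncodings: layers.map(({ rid, scale }) => ({
      rid,
      scaleResolutionDownBy: scale,
    })),
  };
}

// Browser usage (pc is an RTCPeerConnection, track a video MediaStreamTrack):
// const transceiver = pc.addTransceiver(track, simulcastInit([
//   { rid: "hi", scale: 1 },
//   { rid: "lo", scale: 4 },
// ]));
```

The rids chosen here are the same strings that would later be passed to generateKeyFrame() in the worker.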
Further, the format of rid appears to be alphanumeric ascii + hyphen and underscore. Is there any limit on the length (doesn't appear to be in spec).
The specs are inconsistent, but the upshot is that it can be between 1 and 255 characters of alpha-numeric. (RFC 8851 allows '-' and '_' and unlimited length, but RFC 8852 disagrees, see https://www.rfc-editor.org/errata/eid7132)
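A conservative check that follows the "1 to 255 alphanumeric characters" reading above could be written as below. isConservativeRid is a hypothetical name, and the rule is deliberately the stricter of the two interpretations; it is not a statement of what any particular browser enforces.

```javascript
// Conservative check based on the comment above: 1 to 255 alphanumeric
// ASCII characters, with no '-' or '_'. isConservativeRid is a hypothetical name.
function isConservativeRid(rid) {
  return typeof rid === "string" && /^[A-Za-z0-9]{1,255}$/.test(rid);
}
```

Rids that pass this check satisfy both RFC readings, so they avoid the inconsistency entirely.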
is the return value of generateKeyFrame the timestamp (a promise that fulfills as the timestamp?)
Yes. RTCRtpSender.generateKeyFrame does not, but that's probably a spec bug.
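A worker-side sketch of both key frame paths follows. requestKeyFrame and the `side` values are hypothetical; generateKeyFrame(rid) and sendKeyFrameRequest() are the RTCRtpScriptTransformer methods from the spec, used here on the assumption that the transformer belongs to a video sender or receiver respectively.

```javascript
// Hypothetical worker-side helper. `transformer` is the RTCRtpScriptTransformer
// from the rtctransform event; `side` is whatever the page passed in options.
async function requestKeyFrame(transformer, side, rid) {
  if (side === "send") {
    // The promise fulfills with the timestamp of the generated key frame.
    const timestamp =
      rid === undefined
        ? await transformer.generateKeyFrame()
        : await transformer.generateKeyFrame(rid);
    return { side, timestamp };
  }
  // Receive side: ask the remote sender for a key frame; no rid applies here.
  await transformer.sendKeyFrameRequest();
  return { side };
}
```

This mirrors the asymmetry discussed above: only the sending side can name a rid, because a receiver only ever sees one stream.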
For an encoded video frame the type can be "empty". What does that mean/what do you do with it?
I think the only situation where that could come up is if the transform held onto a reference to a frame after it had written it to the writable. Eventually, that frame data will be transferred to another thread, leaving the frame empty? I don't think we implement changing the type to empty when that occurs right now.
Comment 64•1 year ago
Hi Byron,
I am very grateful for your support and happy to get it any way that works for you!
That said, FYI MDN docs source is hosted on GitHub, and we benefit from excellent source commenting and diffing tools (much better than Bugzilla). So for example your comment on RTCRtpSender.generateKeyFrame() has been copied back to here and can be discussed in a threaded way associated with particular text.
W.r.t. that comment, note that FF appears to implement the CPP for RTCRtpSender.generateKeyFrame() but not IDL - doesn't that mean it is not exposed in JavaScript?
Anyway, I'll go through this and get back to you if I have any questions.
Assignee
Comment 65•1 year ago
(In reply to Hamish Willee from comment #64)
Hi Byron,
I am very grateful for your support and happy to get it any way that works for you!
That said, FYI MDN docs source is hosted on github, and we benefit from excellent source commenting and differencing tools (much better than Bugzilla). So for example your comment on RTCRtpSender.generateKeyFrame() has been copied back to here and can be discussed in a threaded way associated with particular text.
W.r.t. that comment, note that FF appears to implement the CPP for RTCRtpSender.generateKeyFrame() but not IDL - doesn't that mean it is not exposed in JavaScript?
Correct, and I think the function signature is wrong anyway. We'd need a little refactoring to expose this to JS.
Comment 66•1 year ago
Thanks @Byron. Just FYI the docs are now "complete" and in review. They aren't inspiring, but they are fairly complete :-).
I did have one oddity in RTCEncodedAudioFrame.getMetadata(): I was getting only payloadType and synchronizationSource in the returned object and was expecting other values like sequence. This was from a webcam asking for both video and audio, and the video frames seemed to have the right kind of information.
I've incorporated some of your info on RID (more should go in, but what I took was appropriate for this API).
I didn't modify the text on generating Key frames to state that WebRTC can auto request them on need, though this is implied. My take is that even if RTC can order frames when it knows it needs them, it can't always know that the transform does - for example if a transform starts adding new encryption keys or whatever. Anyway, what it says now should not be "wrong".
Assignee
Comment 67•1 year ago
(In reply to Hamish Willee from comment #66)
Thanks @Byron. Just FYI the docs are now "complete" and in review. They aren't inspiring, but they are fairly complete :-).
I did have one oddity in RTCEncodedAudioFrame.getMetadata() that I was getting only payloadType and synchronizationSource in the returned object and was expecting other values like sequence. This was from a webcam asking for both video and audio, and the video frames seemed to have the right kind of information.
Was this an outgoing frame? Spec says that |sequenceNumber| is only set on incoming audio frames.
https://www.w3.org/TR/webrtc-encoded-transform/#dom-rtcencodedaudioframemetadata-sequencenumber
Also, there's a bug in libwebrtc that prevents us from having |contributingSources| on outgoing audio frames:
Comment 68•1 year ago
Thanks very much - that's exactly what it was.
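Given the difference between incoming and outgoing audio frames described in comment 67, code that inspects metadata may want to treat the optional fields defensively. The sketch below assumes only getMetadata() and the field names mentioned in this thread; summarizeAudioMetadata is a hypothetical helper.

```javascript
// Hypothetical helper: summarise an encoded audio frame's metadata without
// assuming that optional fields are present.
function summarizeAudioMetadata(frame) {
  const meta = frame.getMetadata();
  return {
    payloadType: meta.payloadType,
    synchronizationSource: meta.synchronizationSource,
    // Per the discussion above, expect this only on incoming frames.
    sequenceNumber: meta.sequenceNumber ?? null,
    // May be missing on outgoing frames; fall back to an empty list.
    contributingSources: meta.contributingSources ?? [],
  };
}
```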