Sites using Twilio Video SDK broke in Firefox 138
Categories
(Web Compatibility :: Site Reports, defect, P1)
Tracking
(Webcompat Priority:P2, Webcompat Score:5, firefox-esr128 unaffected, firefox138 wontfix, firefox139+ fixed, firefox140+ verified)
People
(Reporter: pehrsons, Assigned: pehrsons)
References
(Regression, )
Details
(Keywords: regression, webcompat:platform-bug, webcompat:site-report)
User Story
platform:windows,mac,linux,android impact:workflow-broken configuration:general affects:all branch:release diagnosis-team:video-conferencing user-impact-score:160
This was filed with Twilio as https://github.com/twilio/twilio-video.js/issues/2101
We have been able to reproduce this issue using https://networktest.twilio.com/
136 and 137 are passing all tests, whereas Nightly fails the last two tests which are for testing video using Twilio's TURN servers.
Comment 1•6 months ago
Regression range: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=21e870dc027c167507213dc56df75a052adf85a4&tochange=f5cf5b1bf1f5e9d20e69a012453dba77a17f4731
Comment 2•6 months ago
Set release status flags based on info from the regressing bug 1949282
Comment 3•6 months ago
Hello, this is Luis.
I'm part of the team working on the Twilio Video SDK. After compiling the source locally and doing some investigation, I identified the commit that appears to be causing the issue:
https://phabricator.services.mozilla.com/rMOZILLACENTRAL4de509660d334ea7eb0746f300177ee419f62171
Please let me know if you need any additional information or if there's anything I can help clarify.
Comment 4•6 months ago
Set release status flags based on info from the regressing bug 1949282
Comment 5•6 months ago
(In reply to Luis Rivas from comment #3)
> Hello, this is Luis.
> I'm part of the team working on the Twilio Video SDK. After compiling the source locally and doing some investigation, I identified the commit that appears to be causing the issue:
> https://phabricator.services.mozilla.com/rMOZILLACENTRAL4de509660d334ea7eb0746f300177ee419f62171
> Please let me know if you need any additional information or if there's anything I can help clarify.
Thanks Luis for your effort on this. That's the same regressor we found. I believe I have caught the bug on https://networktest.twilio.com with rr and Pernosco, so we should be able to figure this out shortly.
If you do employ a workaround for this issue, it'd be great to keep some test page for us to verify a fix against. We should also be able to add a test case for this issue in automation, but end-to-end verification is always good to have in addition.
Comment 6•6 months ago
Comment 7•6 months ago
(In reply to Andreas Pehrson [:pehrsons] from comment #5)
> Thanks Luis for your effort on this. That's the same regressor we found. I believe I have caught the bug on https://networktest.twilio.com with rr and Pernosco, so we should be able to figure this out shortly.
> If you do employ a workaround for this issue, it'd be great to keep some test page for us to verify a fix against. We should also be able to add a test case for this issue in automation, but always good with the end-to-end verification in addition.
Absolutely, we can keep that site as is so you can use it for validations. While we considered workarounds for Twilio Video, supporting the Mozilla team in addressing the issue seems best. A quick fix might cause problems with other use cases and future Firefox versions, as it's hard to pinpoint the exact issue.
Comment 8•5 months ago
(In reply to Luis Rivas from comment #7)
> Absolutely, we can keep that site as is so you can use it for validations. While we considered workarounds for Twilio Video, supporting the Mozilla team in addressing the issue seems best. A quick fix might cause problems with other use cases and future Firefox versions, as it's hard to pinpoint the exact issue.
Thank you Luis. Please note we have confirmed that three different fixes, at different levels, all address the issue of not receiving video. Not all are up yet.
During our investigation on https://networktest.twilio.com we found there are three offer/answer exchanges taking place. We don't think that's relevant but haven't finished a minimal test case yet.
The final local description for Firefox, an answer, contains two video m-lines, one inactive and one recvonly. This is relevant and triggers the bug. A number of things have to hold for this bug to trigger:
- a video m-section A must have been the only m-section, and active with a recv direction, when negotiated
- m-section A may at no point have had an a=ssrc line
- a renegotiation must happen where A is inactive and another video m-section B is active with a recv direction, and with the same payload types that A was configured for when active
- m-sections A and B must be combined with BUNDLE
- no MID RTP header extension, at least for m-section A
If all this holds and packets destined for m-section B are received, our code gets confused and routes them to A instead (which is inactive, so the packets are ultimately ignored), and B gets reconfigured internally for some other recv SSRC that we generate on the fly.
edit May 14: added the bit on MID
edit May 15: rewritten with the bits on a=ssrc lines and renegotiation
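The conditions listed above can be sketched as a check over two SDP blobs. This is only an illustrative sketch, not Firefox's logic: the names `MSection`, `parse`, and `matches_trigger` are hypothetical, and it assumes the inputs are the remote descriptions from an earlier and a later negotiation, so "recv direction" for Firefox is read as the remote side having `sendonly` or `sendrecv`. It also simplifies "the only m-section" to "the only video m-section".

```python
from dataclasses import dataclass

MID_EXT_URI = "urn:ietf:params:rtp-hdrext:sdes:mid"
DIRECTIONS = {"sendrecv", "sendonly", "recvonly", "inactive"}


@dataclass
class MSection:  # hypothetical helper type, not part of any library
    kind: str
    payload_types: frozenset
    mid: str = ""
    direction: str = "sendrecv"  # SDP default when no direction attribute is present
    has_ssrc: bool = False
    has_mid_ext: bool = False


def parse(sdp):
    """Return (bundled_mids, m_sections) for an SDP string."""
    bundled, sections = set(), []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=group:BUNDLE"):
            bundled = set(line.split()[1:])
        elif line.startswith("m="):
            parts = line[2:].split()
            sections.append(MSection(kind=parts[0], payload_types=frozenset(parts[3:])))
        elif sections and line.startswith("a="):
            cur, attr = sections[-1], line[2:]
            if attr.startswith("mid:"):
                cur.mid = attr[4:]
            elif attr in DIRECTIONS:
                cur.direction = attr
            elif attr.startswith("ssrc:"):
                cur.has_ssrc = True
            elif attr.startswith("extmap:") and MID_EXT_URI in attr:
                cur.has_mid_ext = True
    return bundled, sections


def remote_sends(m):
    # Assumption: inputs are remote descriptions, so Firefox receives
    # when the remote side is sendonly or sendrecv.
    return m.direction in ("sendonly", "sendrecv")


def matches_trigger(earlier_sdp, final_sdp):
    """True if the pair of descriptions has the shape described in the comment."""
    _, earlier = parse(earlier_sdp)
    bundled, final = parse(final_sdp)
    early_video = [m for m in earlier if m.kind == "video"]
    # A must have been the only video m-section, active, and without a=ssrc.
    if len(early_video) != 1:
        return False
    a_early = early_video[0]
    if not remote_sends(a_early) or a_early.has_ssrc:
        return False
    # After renegotiation A must be inactive, still without a=ssrc or MID extension.
    a = next((m for m in final if m.kind == "video" and m.mid == a_early.mid), None)
    if a is None or a.direction != "inactive" or a.has_ssrc or a.has_mid_ext:
        return False
    # Some other bundled video m-section B must be active with A's old payload types.
    return any(
        b is not a
        and b.kind == "video"
        and remote_sends(b)
        and b.payload_types == a_early.payload_types
        and {a.mid, b.mid} <= bundled
        for b in final
    )
```

Calling `matches_trigger(earlier_offer, later_offer)` with the SDP strings from two consecutive negotiations returns a boolean. Adding a MID `a=extmap` line or an `a=ssrc` line to m-section A makes it return `False`.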
Comment 9•5 months ago
:dbaker, since you are the author of the regressor, bug 1949282, could you take a look?
For more information, please visit BugBot documentation.
Comment 10•5 months ago
Looking more into this situation with MID, I think Twilio is in violation of RFC8843 section 9.1 here.
On MID with BUNDLE it says:
The RTP MID header extension MUST be enabled, by including an SDP 'extmap' attribute [RFC8285], with a 'urn:ietf:params:rtp-hdrext:sdes:mid' URI value, in each bundled RTP-based "m=" section in every offer and answer.
I see a=group:BUNDLE 0 1 application0 audio0 video0 and no MID extension.
Luis, FYI.
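A rough way to check an SDP blob against the RFC 8843 requirement quoted above is sketched below. The function name `bundled_sections_missing_mid_ext` is hypothetical, and the parsing is deliberately minimal: it assumes a single BUNDLE group and treats any m-section whose proto field contains "RTP" as RTP-based.

```python
MID_EXT_URI = "urn:ietf:params:rtp-hdrext:sdes:mid"


def bundled_sections_missing_mid_ext(sdp):
    """Return the MIDs of bundled RTP m-sections that lack the MID extmap."""
    bundled = []
    sections = []  # one dict per m-section: mid, proto, has_mid_ext
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("a=group:BUNDLE"):
            # Assumption: only one BUNDLE group is present.
            bundled = line.split()[1:]
        elif line.startswith("m="):
            fields = line[2:].split()
            proto = fields[2] if len(fields) > 2 else ""
            sections.append({"mid": None, "proto": proto, "has_mid_ext": False})
        elif sections and line.startswith("a=mid:"):
            sections[-1]["mid"] = line[len("a=mid:"):]
        elif sections and line.startswith("a=extmap:") and MID_EXT_URI in line.split():
            sections[-1]["has_mid_ext"] = True
    return [
        s["mid"]
        for s in sections
        if s["mid"] in bundled and "RTP" in s["proto"] and not s["has_mid_ext"]
    ]
```

An empty list means every bundled RTP m-section carries the extension. Data-channel m-sections (proto `UDP/DTLS/SCTP`) are skipped because the requirement applies only to RTP-based sections.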
Comment 11•5 months ago
(In reply to Andreas Pehrson [:pehrsons] from comment #10)
> Looking more into this situation with MID, I think Twilio is in violation of RFC8843 section 9.1 here.
> On MID with BUNDLE it says:
> The RTP MID header extension MUST be enabled, by including an SDP 'extmap' attribute [RFC8285], with a 'urn:ietf:params:rtp-hdrext:sdes:mid' URI value, in each bundled RTP-based "m=" section in every offer and answer.
> I see a=group:BUNDLE 0 1 application0 audio0 video0 and no MID extension.
> Luis, FYI.
Hi, Andreas
Thank you very much for the heads up. I will report that mismatch internally so we can review it. Is your team planning to apply stricter rules for this in the coming versions? We offer another type of configuration where SDP is not modified, so I confirmed what you mentioned regarding MID extension is not happening there, but it is still impossible to establish a call between Firefox 137 and Firefox 138 if Firefox 137 connects first. The only way to set up a video call correctly was to connect Firefox 138 first and then join a call using Firefox 137 or any other browser, which is something out of our control.
That said, we decided not to use a workaround for Firefox 138 since we considered it would have more downsides down the road. However, we would be more than happy to discuss how we could collaborate on this matter going forward to offer a good experience for Firefox users.
Comment 12•5 months ago
The bug is marked as tracked for firefox139 (beta) and tracked for firefox140 (nightly). We have limited time to fix this, the soft freeze is in 7 days. However, the bug still isn't assigned.
:denschub, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Comment 13•5 months ago
Assigning Andreas to make the bot happy - but please feel free to assign someone else if needed.
Comment 14•5 months ago
(In reply to Luis Rivas from comment #11)
> Hi, Andreas
> Thank you very much for the heads up. I will report that mismatch internally so we can review it. Is your team planning to apply stricter rules for this in the coming versions? We offer another type of configuration where SDP is not modified, so I confirmed what you mentioned regarding MID extension is not happening there, but it is still impossible to establish a call between Firefox 137 and Firefox 138 if Firefox 137 connects first. The only way to set up a video call correctly was to connect Firefox 138 first and then join a call using Firefox 137 or any other browser, which is something out of our control.
> Said that, we decided not to use a workaround for Firefox 138 since we considered it would have more downsides down the road. However, we would be more than happy to discuss how we could collaborate on this matter going forward to offer a good experience for Firefox users.
No stricter rules planned, that seems risky wrt breaking sites.
Note I have finished making an automated test case for this issue. The trigger case is even more narrow than I made out in comment 8. In addition to
- BUNDLE
- No MID extension
- Inactive and active (receiving) video transceiver present
you also need
- No a=ssrc line for the inactive transceiver, also for any earlier negotiation involving that transceiver
- The inactive transceiver must have been active when finishing an earlier negotiation, when it also must have been the only active receiving transceiver with the payload type that later gets sent to the other active transceiver
That's a lot of stars aligning -- you should be able to figure out a workaround fairly easily. Let me know if I can assist further in that.
See my test case on https://phabricator.services.mozilla.com/D249545.
Do you have a test page where I can reproduce the bug with the other type of configuration for myself? I can look for bugs. Or check whether my slew of fixes addresses it.
Comment 15•5 months ago
I seem to have encountered this issue in Firefox 138 (it was working fine in Firefox 137). I've created a simple demo here, could you please help me resolve it?
Comment 16•5 months ago
This issue causes our RTC service to become unusable in Firefox 138 (received video renders as black screen).
In the past few days, we have received thousands of user reports, and after investigation, we have pinpointed this issue. I hope this problem can be resolved quickly (is it possible to revert the recent commit?) to minimize the impact on more users.
Thanks...
Comment 17•5 months ago
(In reply to xuanshu from comment #16)
> This issue causes our RTC service to become unusable in Firefox 138 (received video renders as black screen).
> In the past few days, we have received thousands of user reports, and after investigation, we have pinpointed this issue. I hope this problem can be resolved quickly (is it possible to revert the recent commit?) to minimize the impact on more users.
> Thanks...
Thank you for that test case. It does indicate that neither absence of the MID RTP header extension nor absence of a=ssrc is required to reproduce. I'll take a look to see how this failure mode is different from the one we previously found. We are working on a fix for Firefox 139. For a shorter-term fix than that, you'll need a workaround. I'll try to come up with something.
Comment 18•5 months ago
Here's a profile for the test case in comment 15. The failure mode is indeed different. This seems like a regression due to the packet filter now learning about new SSRCs even when it already knows about some.
Comment 19•5 months ago
(In reply to Andreas Pehrson [:pehrsons] from comment #18)
> Here's a profile for the test case in comment 15. The failure mode is indeed different. This seems like a regression due to the packet filter now learning about new SSRCs even when it already knows about some.
Hi, Andreas
Thank you for your response.
Given the critical and potentially devastating impact this issue is having on our operations, we kindly request an expedited resolution—even a temporary workaround—at your earliest convenience.
To explain further: many of our clients rely on the third-party SDK we provide. Implementing a Web SDK workaround would require extensive client-side upgrades, which is unfortunately not feasible in the short term due to deployment complexities. For this reason, a server-side or browser-level fix for Firefox would be invaluable to mitigate the issue immediately.
We sincerely appreciate your understanding and support in prioritizing this matter.
Best regards,
shu
Comment 20•5 months ago
Hi all.
Our service is currently experiencing this issue in Firefox 138. The minimal reproduction is here.
This reproduction contains two transceivers, both of which negotiate normally. In certain scenarios, the first transceiver may not start sending RTP, which causes the second one to render nothing. This scenario is quite common when connected to an SFU.
In my opinion, this is a critical bug that already impacted several use cases, some of which are quite common. I strongly recommend reverting this change as soon as possible.
Comment 21•5 months ago
In latest Nightly, https://networktest.twilio.com and https://wpj5jv.csb.app/ (comment 15) are now working.
We'll have to do something more for the repro case in comment 20. I'll file another bug.
Comment 22•5 months ago
(In reply to Andreas Pehrson [:pehrsons] from comment #21)
> In latest Nightly, https://networktest.twilio.com and https://wpj5jv.csb.app/ (comment 15) are now working.
> We'll have to do something more for the repro case in comment 20. I'll file another bug.
Thanks Andreas!! Could you kindly confirm the estimated timeline for merging this fix into the codebase and which Firefox version it will be included in?
Comment 23•5 months ago
(In reply to xuanshu from comment #22)
> (In reply to Andreas Pehrson [:pehrsons] from comment #21)
> > In latest Nightly, https://networktest.twilio.com and https://wpj5jv.csb.app/ (comment 15) are now working.
> > We'll have to do something more for the repro case in comment 20. I'll file another bug.
> Thanks Andreas!! Could you kindly confirm the estimated timeline for merging this fix into the codebase and which Firefox version it will be included in?
We hope to get it into 139 which releases in about a week.
Comment 24•5 months ago
Hi, Andreas!
(In reply to Andreas Pehrson [:pehrsons] from comment #14)
> Do you have a test page where I can reproduce the bug with the other type of configuration for myself? I can look for bugs. Or check whether my slew of fixes addresses it.
We wanted to let you know we were planning to provide you with a private deployment so you could test it, but today our QE team confirmed that the most recent build (140.0a1) also fixes the issues in that configuration.
Comment 25•5 months ago
(In reply to Luis Rivas from comment #24)
> Hi, Andreas!
> (In reply to Andreas Pehrson [:pehrsons] from comment #14)
> > Do you have a test page where I can reproduce the bug with the other type of configuration for myself? I can look for bugs. Or check whether my slew of fixes addresses it.
> We wanted to let you know we were planning to provide you with a private deployment so you could test it, but today our QE team confirmed that the most recent build (140.0a1) also fixes the issues in that configuration.
Great, thank you for confirming.
We found so far that bug 1965960 fixes most issues. Bug 1967189 should fix the rest. Bug 1966185 will be for some tests, and cleanup of non-critical paths.
Comment 26•5 months ago
The primary "Depends on" bugs, Bug 1965960 and Bug 1967189, have been uplifted to Fx139, so Fx139 is considered "fixed".
Bug 1966185 will follow in Fx140.
Comment 28•4 months ago
Based on comment #0
> This was filed with Twilio as https://github.com/twilio/twilio-video.js/issues/2101
> We have been able to reproduce this issue using https://networktest.twilio.com/
> 136 and 137 are passing all tests, whereas Nightly fails the last two tests which are for testing video using Twilio's TURN servers.
Verified, it passes all the tests on https://networktest.twilio.com/
Tested with:
- Browser / Version: Firefox 140.0-candidate build 1
- Operating System: Windows 10