Closed Bug 904598 Opened 11 years ago Closed 11 years ago

WebRTC not working on Ubuntu and Mac OS X machines

Categories

(Core :: WebRTC: Audio/Video, defect)

defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla26
Tracking Status
firefox23 --- verified
firefox24 + verified
firefox25 + verified
firefox26 + verified

People

(Reporter: adalucinet, Assigned: ekr)

Details

Attachments

(8 files, 1 obsolete file)

Reproducible on Firefox 24 beta 2 (Build ID: 20130812173056): Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0
Reproducible on the latest Nightly (Build ID: 20130812030209): Mozilla/5.0 (X11; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0
Reproducible on the latest Aurora (Build ID: 20130813004004): Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0
STR:
1. Launch Firefox.
2. Navigate to http://tokbox.com/opentok/webrtc/demo.
3. Perform a 1 (Ubuntu machine) to 1 (Ubuntu machine) call on the same network, then perform a 1 (Ubuntu machine) to 1 (Mac OS X 10.7.5 machine) call on the same network.
Expected results: After step 3, an audio/video call is performed successfully.
Actual results: After step 3, there is no audio (on both machines) and, within at most 5 seconds, the participants' camera freezes.
Notes:
1. I will investigate more and come back with updates.
2. On a 1:1 call (Chrome to Chrome) on the same network, I encountered no issues: audio and video work as expected.
Please let me know if more details are needed.
Please add some more details to the steps-to-reproduce. Does "Perform" mean start a call, then end it? How long? Or are there 2 simultaneous calls using 4 machines? Does this happen using apprtc.appspot.com? Please retest with that instead of opentok.
I can reproduce this in apprtc with two machines. Debugging.
(In reply to Randell Jesup [:jesup] from comment #1)
> Please add some more details to the steps-to-reproduce
>
> Does "Perform" mean start a call, then end it? How long? Or are there 2
> simultaneous calls using 4 machines?
Yes, "perform" means to start a call (there is no need to end it): first, one call using two Ubuntu machines, and then another call using one Ubuntu machine and one Mac OS X machine. Just wait for the actual results to happen (at most 5 seconds).
> Does this happen using apprtc.appspot.com? Please retest with that instead
> of opentok.
Sure, I will test with this URL and update this issue by tomorrow.
RelMan, is this something important enough to hold 24beta2 for?
Severity: normal → major
(In reply to Alexandra Lucinet [QA] from comment #3)
> (In reply to Randell Jesup [:jesup] from comment #1)
> > Please add some more details to the steps-to-reproduce
> >
> > Does "Perform" mean start a call, then end it? How long? Or are there 2
> > simultaneous calls using 4 machines?
> Yes, perform means to start a call (there is no need to end it): first, 1
> call using 2 Ubuntu machines and then, another call using 1 Ubuntu machine
> and 1 Mac OS X machine; just wait for the actual results to happen (maxim 5
> seconds);
To be clear: is the second call between a different Ubuntu machine, or one of the Ubuntu machines used in the first call? We also need to know if this affects FF23 (release).
Tracking - please do confirm if this affects FF23 so we can consider our options for 23.0.1 which will be going to build this week. There's no reason to 'hold' the beta as we will have another Beta build on Thursday so we should take the time to investigate and confirm a low risk fix before uplifting.
I was able to reproduce this with two Macs, one on head of tree and one on a random beta. More information:
1. It fails on Apprtc.
2. I'm seeing errors in the TURN permission refresh when I run with logs/debugging.
3. I just tried with webrtcme.herokuapp.com (which does not supply a TURN server) and that seems to work fine.
4. I have not tried with any TURN server other than Google's.
Can you please verify that this works for you with webrtcme? (Note: that does not supply *any* STUN server, so we get the default. Don't expect this to work between Chrome and Firefox.)
lsblakk: assuming I am correct, we should consider the following actions:
* For 23.0.x, either:
  (a) just tell people not to use TURN with FF 23. I can take care of tokbox and apprtc easily.
  (b) explicitly disable TURN on FF 23. This should be pretty easy and doesn't require reverting the code.
* For 24, we should try to figure out a real fix.
I am investigating now and should have more by end of day.
I prefer not to encourage application developers to version-sniff, so if this doesn't work in 23, I prefer not to have them sniffing versions. But we'll know more once we have a little more investigation.
Restoring the original warning message from before bug 855769 would be non-trivial, because that was before the webidl change-over, so this patch logs to NSPR instead (the warning message may have.
Attachment #789772 - Flags: review?(rjesup)
After some testing with FF22, I'm not sure there's much benefit in preserving the "use TURN as STUN" feature in the patch above over a patch that turns the pref off, at least not from a legacy/compatibility point of view. The reason is that FF22 only did this fallback if you omitted credentials, which is rare and actually invalid TURN.
In FF22, if you pass in a TURN URI WITHOUT credentials:
  { iceServers: [{ url: "turn:turn.example.org" }] }
you get this warning in the web console:
  In RTCConfiguration passed to RTCPeerConnection constructor: TURN servers not yet supported. Treating as STUN: "turn:turn.example.org"
In FF22, if you pass in a PROPER TURN URI:
  { iceServers: [{ url: "turn:turn.example.org", username:"p", credential:"p" }] }
you get different behavior:
  In RTCConfiguration passed to RTCPeerConnection constructor: Credentials not yet implemented. Omitting "turn:turn.example.org"
This is wrong in hindsight, but too late now. So in practice, there is no fallback for valid TURN URIs in FF22.
That is, unless we see inherent benefit in this fallback behavior going forward, in which case I would argue our code is wrong in not falling back when the pref is off.
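For readers following along, the FF22 behavior described above can be modelled roughly as below. This is only an illustrative Python sketch, not Firefox code: the function name `ff22_handle_ice_server` and the rule "any `username` or `credential` key counts as credentials" are assumptions made for the example; the two messages are the ones quoted in the comment.

```python
PREFIX = "In RTCConfiguration passed to RTCPeerConnection constructor: "


def ff22_handle_ice_server(server):
    """Return (action, console_message) for one iceServers entry.

    Hypothetical model of the FF22 behavior described in the thread:
    - non-TURN URLs are used as given,
    - TURN without credentials falls back to STUN,
    - TURN with credentials is dropped entirely.
    """
    url = server["url"]
    if not url.startswith("turn:"):
        return ("use", None)
    # Assumption: either key being present counts as "credentials supplied".
    if "username" in server or "credential" in server:
        return ("omit",
                PREFIX + 'Credentials not yet implemented. Omitting "%s"' % url)
    return ("treat_as_stun",
            PREFIX + 'TURN servers not yet supported. Treating as STUN: "%s"' % url)


if __name__ == "__main__":
    print(ff22_handle_ice_server({"url": "turn:turn.example.org"}))
    print(ff22_handle_ice_server(
        {"url": "turn:turn.example.org", "username": "p", "credential": "p"}))
```

The point of the sketch is that the STUN fallback only ever triggers for the credential-less (invalid) form, which is why preserving it buys little.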
I'm fine with a patch for pref off.
Attachment #789772 - Flags: review?(rjesup) → review+
While I r+'d the patch, I agree preffing off makes more sense. It still needs to be tested; please submit a patch for that and a Try (and put the patch up for review... ;-)
Assignee: nobody → ekr
Status: NEW → ASSIGNED
(In reply to Eric Rescorla (:ekr) from comment #17)
> Created attachment 789808 [details] [diff] [review]
> Fix TURN long-term auth for Permissions Requests
This is a WIP patch for investigation purposes. Please try and see if it helps.
Here's a preliminary analysis of at least part of the problem: Ordinarily, we get the realm for a request out of the 401 Unauthorized, but for Permissions requests we already know the realm. We correctly stuff it in the outgoing request, but this doesn't have the side effect of having it in the ctx where it's needed for the response, making us reject the response. This patch explicitly shoves it in the context, and now we accept the response. It also adds a unit test that explicitly checks this. Before we didn't check it because the permissions request succeeds on the TURN server and we don't start failing until the retries have given up and the data send/receive tests are shorter than that timeout.
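To illustrate the analysis above, here is a simplified Python sketch of why a missing realm in the context makes the client reject an otherwise valid response. It is not the actual mtransport/nICEr code: the `Ctx` class, `send_permission_request`, and the `store_realm_in_ctx` flag are hypothetical names for this example. The key derivation (MD5 over `username:realm:password`) and HMAC-SHA1 integrity follow the STUN long-term credential mechanism, but real STUN computes the MAC over the encoded message with an adjusted length field and applies SASLprep to the password, both of which are skipped here.

```python
import hashlib
import hmac


def long_term_key(username, realm, password):
    # Long-term credential key: MD5(username ":" realm ":" password).
    return hashlib.md5(("%s:%s:%s" % (username, realm, password)).encode()).digest()


def integrity(key, message):
    # Simplified MESSAGE-INTEGRITY: HMAC-SHA1 over the raw message bytes.
    return hmac.new(key, message, hashlib.sha1).digest()


class Ctx:
    """Hypothetical client context; realm is normally learned from a 401."""

    def __init__(self, username, password):
        self.username = username
        self.password = password
        self.realm = None

    def check_response(self, message, mac):
        if self.realm is None:
            return False  # no realm in ctx: key cannot be derived, reject
        key = long_term_key(self.username, self.realm, self.password)
        return hmac.compare_digest(integrity(key, message), mac)


def send_permission_request(ctx, known_realm, store_realm_in_ctx):
    # The realm is always put in the outgoing request...
    request = {"method": "CreatePermission", "realm": known_realm}
    # ...but only the fixed behavior also records it in the context.
    if store_realm_in_ctx:
        ctx.realm = known_realm
    return request


if __name__ == "__main__":
    response = b"create-permission-success"
    mac = integrity(long_term_key("user", "example.org", "pass"), response)
    for fixed in (False, True):
        ctx = Ctx("user", "pass")
        send_permission_request(ctx, "example.org", store_realm_in_ctx=fixed)
        print("fixed" if fixed else "buggy", ctx.check_response(response, mac))
```

In this model the server accepts the request either way, which matches the observation that the failure only shows up on the client side once it has to validate the response.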
Comment on attachment 789797 [details] [diff] [review]
Disable TURN by default (Set media.peerconnection.turn.disable=true)
[Triage Comment]
Pre-approving this based on the try results and conversation with jesup in IRC. Due to time constraints, we'll get QA on the builds for 23.0.1 to ensure this is working as expected.
Attachment #789797 - Flags: approval-mozilla-release+
Attachment #789797 - Flags: review?(rjesup) → review+
(In reply to Randell Jesup [:jesup] from comment #5)
> To be clear: is the second call between a different Ubuntu machine, or one
> of the Ubuntu machines used in the first call?
It is one of the Ubuntu machines used in the first call. Please let me know how I can help further.
See also: https://bugzilla.mozilla.org/show_bug.cgi?id=905150 I believe this explains why this TURN failure is causing total failure rather than just failure of TURN connections.
Comment on attachment 789797 [details] [diff] [review]
Disable TURN by default (Set media.peerconnection.turn.disable=true)
[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A
User impact if declined: Failure on sites using TURN servers
Testing completed (on m-c, etc.): in 23.0.1; tested by QA today I believe
Risk to taking this patch (and alternatives if risky): Minimal risk; we already took it for 23.0.1
String or IDL/UUID changes made by this patch: none
Attachment #789797 - Flags: approval-mozilla-beta?
Comment on attachment 789797 [details] [diff] [review]
Disable TURN by default (Set media.peerconnection.turn.disable=true)
Approving the pref-off for TURN server support in beta as well (already in 23.0.1) until a final tested fix is available to resolve this bug.
Attachment #789797 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Keywords: verifyme
While we didn't set out to explicitly verify this bug is fixed we did do extensive WebRTC testing in Firefox 23.0.1. See https://wiki.mozilla.org/Releases/Firefox_23/Test_Plan#WebRTC I think it's safe to call this verified fixed for 23 based on this testing. Note that we will be repeating this testing for Beta in Firefox 24.0b4 early next week.
Can you please test th
Flags: needinfo?(anthony.s.hughes)
Hit save too soon
Flags: needinfo?(anthony.s.hughes)
Tracy, can you update this bug with the results of the WebRTC testing in 24b4?
Flags: needinfo?(twalker)
Overnight testers worked on 24b4, their results can be found here: https://wiki.mozilla.org/Releases/Firefox_24/Test_Plan#Regression_Tests_4. bug 907213 and bug 907214 were filed as a result of that testing.
Flags: needinfo?(twalker)
Target Milestone: --- → mozilla26
Attachment #795157 - Flags: review?(adam)
(In reply to Eric Rescorla (:ekr) from comment #34)
> Try that covers this patch at:
> https://tbpl.mozilla.org/?tree=Try&rev=d7355aacf380
Mihaela, please test this try build to verify this patch is working correctly.
QA Contact: mihaela.velimiroviciu
Comment on attachment 795157 [details] [diff] [review]
Fix TURN long-term auth for Permissions Requests
Review of attachment 795157 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good to me. r+ with nits.
::: media/mtransport/test/ice_unittest.cpp
@@ +571,5 @@
> + g_turn_user, g_turn_password);
> +
> + ASSERT_TRUE(Gather(true));
> + Connect();
> +}
This is line-for-line the same as "TestConnectTurn".
::: media/mtransport/test/turn_unittest.cpp
@@ +314,5 @@
> TEST_F(TurnClient, SendToSelf) {
> Allocate();
> SendTo(relay_addr_);
> ASSERT_TRUE_WAIT(received() == 100, 1000);
> + PR_Sleep(10000); // Wait 10 seconds and try again
We might expand this comment to explain why we're doing this (i.e., to give the transaction time to time out).
Attachment #795157 - Flags: review?(adam) → review+
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #35)
> (In reply to Eric Rescorla (:ekr) from comment #34)
> > Try that covers this patch at:
> > https://tbpl.mozilla.org/?tree=Try&rev=d7355aacf380
>
> Mihaela, please test this try build to verify this patch is working
> correctly.
I built Firefox from the source of https://hg.mozilla.org/try/rev/d7355aacf380 and tried to verify the fix using the https://apprtc.appspot.com/ demo (the tokbox demo is no longer available), but it remains in the "Connecting..." state (1:1 call between 2 Ubuntu machines on the same network). Let me know if there is anything else I can try.
Please provide the JS console logs from Web Console.
Attached file js logs
(In reply to Eric Rescorla (:ekr) from comment #38)
> Please provide the JS console logs from Web Console.
JS logs from the Web Console.
OK, this is bug 908740. I was hoping Google would fix their server, but it sounds like we need to be more aggressive.
Attachment #795157 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
ekr, which path do you want to take for Aurora 25? We've got ~8 weeks till release, if that colors your proposal.
Flags: needinfo?(ekr)
Suggest we have QA test this briefly and if all looks fine we uplift to Aurora. Note that Aurora has TURN on so this should be low-risk compared to doing nothing.
Flags: needinfo?(ekr)
Mozilla/5.0 (X11; Linux i686; rv:26.0) Gecko/20100101 Firefox/26.0, build id: 20130903030201
I still cannot make a call between 2 Ubuntu machines using the latest Nightly build: the second machine gets the message "Sorry, this room is full. Click here to try again.". There are no JS logs in the web console for the 2nd machine, and for the first one there is only "[12:48:51.868] calling createOffer without failureCallback is deprecated!"
(In reply to Mihaela Velimiroviciu [QA] (:mihaelav) from comment #47)
> Mozilla/5.0 (X11; Linux i686; rv:26.0) Gecko/20100101 Firefox/26.0, build
> id: 20130903030201
>
> I still cannot make a call between 2 Ubuntu machines using the latest
> Nightly build: the second machine gets the message "Sorry, this room is
> full. Click here to try again.".
This is probably not a problem with the WebRTC implementation. It's probably either user error or some bug in the AppRTC conferencing logic.
> There are no JS logs in the web console for
> the 2nd machine, and for the first one there is only "[12:48:51.868] calling
> createOffer without failureCallback is deprecated!"
This is normal.
Mihaela, please give this another attempt tonight. Maire has advised me that this may just be a temporary conferencing server issue. Please give this another shot and we'll decide how to proceed when we have your results in the morning.
I still cannot make a call between 2 Nightlies from Mac OS X 10.8.4 - Mac OS X 10.7.5 and Mac OS X 10.8.4 - Ubuntu 13.04: both the callee and the caller remain stuck in "Connecting..." mode. Attached you can find all the logs from the Web Console.
Something is really messed up here, but I don't think it's us.
From the Mac log:
Initializing; room=24682468
And from the Ubuntu log:
[11:18:32.463] "Initializing; room=13579135."
We shouldn't expect this to work b/c you need to be in the same room.
1. Can you describe exactly what steps you used here, since this looks like either user error or a bug in apprtc.
2. Are you doing this same test regularly with 23.01? Does that work?
(In reply to Eric Rescorla (:ekr) from comment #51)
> Something is really messed up here, but I don't think it's us. From the
> Mac log:
>
> Initializing; room=24682468
>
>
> And from the Ubuntu log:
> [11:18:32.463] "Initializing; room=13579135."
>
>
> We shouldn't expect this to work b/c you need to be in the same room.
The room in the Mac logs is from the Mac OS X 10.8.4 - 10.7.5 call, and the room in the Ubuntu logs is from a Mac 10.8 - Ubuntu 13.04 call. Sorry for the misunderstanding. I have attached again a pair of logs from the same call: Ubuntu 13.04/Nightly - Mac OS X 10.8/Nightly
> 2. Are you doing this same test regularly with 23.01? Does that work?
Yes, it works with 23.0.1
OK, the problem is that apprtc is busted again. Look below for the lines labelled with <-- HERE. Apprtc returned all-zero TURN addresses. Looks like we need to prioritize bug 908740, since we can't trust apprtc. FWIW, apprtc seems to be fixed again. I just tried with Nightly and all was good.
[15:48:30.265] "C->S: {"type":"offer","sdp":"v=0
o=Mozilla-SIPUA-26.0a1 19957 0 IN IP4 0.0.0.0
s=SIP Call
t=0 0
a=ice-ufrag:89c0f920
a=ice-pwd:df6a4828052d5103e0eabf93f416b47a
a=fingerprint:sha-256 2C:BC:F0:F0:86:60:55:77:0A:36:A5:AB:E7:55:B1:95:9E:1E:C1:9A:EE:30:17:95:F1:12:D9:B7:88:AA:C8:8D
m=audio 0 RTP/SAVPF 109 0 8 101
c=IN IP4 0.0.0.0
a=rtpmap:109 opus/48000/2
a=ptime:20
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendrecv
a=candidate:0 1 UDP 2130379007 192.168.12.186 55353 typ host
a=candidate:1 1 UDP 1694236671 82.137.35.243 52696 typ srflx raddr 192.168.12.186 rport 55353
a=candidate:2 1 UDP 1694236671 82.137.35.243 63363 typ srflx raddr 192.168.12.186 rport 55353
a=candidate:3 1 UDP 16515071 0.0.0.0 0 typ relay raddr 0.0.0.0 rport 0
a=candidate:0 2 UDP 2130379006 192.168.12.186 54758 typ host
a=candidate:1 2 UDP 1694236670 82.137.35.243 54055 typ srflx raddr 192.168.12.186 rport 54758
a=candidate:2 2 UDP 1694236670 82.137.35.243 46182 typ srflx raddr 192.168.12.186 rport 54758
a=candidate:3 2 UDP 16515070 0.0.0.0 0 typ relay raddr 0.0.0.0 rport 0
m=video 0 RTP/SAVPF 120
c=IN IP4 0.0.0.0
a=rtpmap:120 VP8/90000
a=sendrecv
a=rtcp-fb:120 nack
a=rtcp-fb:120 nack pli
a=rtcp-fb:120 ccm fir
a=candidate:0 1 UDP 2130379007 192.168.12.186 51218 typ host
a=candidate:1 1 UDP 1694236671 82.137.35.243 28585 typ srflx raddr 192.168.12.186 rport 51218
a=candidate:2 1 UDP 1694236671 82.137.35.243 59788 typ srflx raddr 192.168.12.186 rport 51218
a=candidate:3 1 UDP 16515071 0.0.0.0 0 typ relay raddr 0.0.0.0 rport 0 <-- HERE
a=candidate:0 2 UDP 2130379006 192.168.12.186 61510 typ host
a=candidate:1 2 UDP 1694236670 82.137.35.243 24795 typ srflx raddr 192.168.12.186 rport 61510
a=candidate:2 2 UDP 1694236670 82.137.35.243 34165 typ srflx raddr 192.168.12.186 rport 61510
a=candidate:3 2 UDP 16515070 0.0.0.0 0 typ relay raddr 0.0.0.0 rport 0 <-- HERE
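As an aside for anyone triaging similar logs, the all-zero relay candidates in the offer above can be picked out mechanically. The following is a rough Python sketch, not part of any patch in this bug: the function name `unusable_relay_candidates` and the regular expression are assumptions for illustration, and the pattern only covers the candidate fields that appear in this log.

```python
import re

# Matches: a=candidate:<foundation> <component> <transport> <priority> <addr> <port> typ <type>
CANDIDATE_RE = re.compile(
    r"a=candidate:(\S+) (\d+) (\S+) (\d+) (\S+) (\d+) typ (\S+)"
)


def unusable_relay_candidates(sdp):
    """Return relay candidates whose address is 0.0.0.0 or whose port is 0."""
    bad = []
    for match in CANDIDATE_RE.finditer(sdp):
        foundation, component, _transport, _priority, addr, port, ctype = match.groups()
        if ctype == "relay" and (addr == "0.0.0.0" or port == "0"):
            bad.append((foundation, int(component), addr, int(port)))
    return bad


if __name__ == "__main__":
    sample = (
        "a=candidate:0 1 UDP 2130379007 192.168.12.186 51218 typ host\n"
        "a=candidate:1 1 UDP 1694236671 82.137.35.243 28585 typ srflx "
        "raddr 192.168.12.186 rport 51218\n"
        "a=candidate:3 1 UDP 16515071 0.0.0.0 0 typ relay raddr 0.0.0.0 rport 0\n"
    )
    for candidate in unusable_relay_candidates(sample):
        print(candidate)
```

Because it uses `finditer` rather than splitting on newlines, the same function also works on an SDP blob that has been flattened onto one line, as console logs often are.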
This wireshark trace shows 0.0.0.0 from 192.158.29.39, so I think this is a bug in google's TURN server
Comment on attachment 798086 [details] [diff] [review]
Bug 904598 - Fix TURN long-term auth for Permissions Requests.
Review of attachment 798086 [details] [diff] [review]:
-----------------------------------------------------------------
[Approval Request Comment]
Bug caused by (feature/regressing bug #): 904598, 905150, 915420
User impact if declined: Users will not be able to connect with WebRTC whenever a site offers TURN service. With 904598 and 905150 but not 915420, this will only impact Linux users.
Testing completed (on m-c, etc.): Bugs 904598, 905150 tested by EKR and then EKR and ABR. Bug 915420 fixed today and tested by EKR and ABR. Recommend re-testing by QA prior to uplift.
Risk to taking this patch (and alternatives if risky): This shouldn't make things any worse. The alternative is to turn TURN off entirely.
String or IDL/UUID changes made by this patch: None.
Attachment #798086 - Flags: approval-mozilla-aurora?
(In reply to Eric Rescorla (:ekr) from comment #56)
> Risk to taking this patch (and alternatives if risky):
> This shouldn't make things any worse. The alternative is to
> turn TURN off entirely.
I suspect we'd make this decision during the Beta cycle, but would like to salvage what we've got currently. I support that.
Attachment #798086 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #798086 - Flags: checkin?
Comment on attachment 798086 [details] [diff] [review]
Bug 904598 - Fix TURN long-term auth for Permissions Requests.
https://hg.mozilla.org/releases/mozilla-aurora/rev/9256e4aef3c8
Attachment #798086 - Flags: checkin? → checkin+
I have been trying to verify this on nightly and aurora using builds with the fix. The plain case where you establish a call by visiting apprtc.webrtc.org works for both branches, on OS X as well as Linux (Ubuntu). However, I can't seem to get the call to go through if I block traffic so that the communication goes through TURN (see bug 915420). The commands I used to do this are:
sudo ipfw add allow udp from any to any 67 out keep-state
sudo ipfw add allow udp from any 67 to any 68
sudo ipfw add allow udp from any to any dst-port 68 keep-state
sudo ipfw add allow udp from any to any src-port 68 keep-state
sudo ipfw add allow udp from any to any dst-port 53 keep-state
sudo ipfw add allow udp from any to any dst-port 3478 keep-state
sudo ipfw add deny udp from any to any
So I must not have verified bug 915420 correctly, because yesterday I was able to do this on the nightly and in both instances (TURN/No TURN) it worked fine.
I have been trying this again a few more times using nightlies and auroras on both Mac OS X and Ubuntu, and establishing calls between the nightlies, between the auroras, and between a nightly and aurora, and it is now working for me. I tried connecting to Byron's machine (bwc) and that worked too. He also confirmed we were using TURN. I don't know what to attribute the earlier problem to, but since I have had consistent results now, I say this is verified.
Thanks Juan, can you please confirm which versions of Firefox you tested? This should be fixed for Firefox 24, 25, and 26.
Tested on 24, 25, 26, 27.
Thanks a lot for your help, Juan.