Closed
Bug 888274
Opened 11 years ago
Closed 11 years ago
Upgrade to RFC 5389 STUN
Categories
(Core :: WebRTC: Networking, defect)
Core
WebRTC: Networking
Tracking
()
VERIFIED
FIXED
mozilla25
Tracking | Status | |
---|---|---|
firefox23 | --- | unaffected |
firefox24 | + | verified |
firefox25 | --- | fixed |
fennec | 24+ | --- |
People
(Reporter: ekr, Assigned: ekr)
References
Details
(Whiteboard: [WebRTC][blocking-webrtc-])
Attachments
(2 files, 1 obsolete file)
5.26 KB,
patch
|
abr
:
review+
bajaj
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
612.64 KB,
text/plain
|
Details |
Firefox currently generates RFC 3489 backward-compatible STUN requests,
but it actually wants to receive a 5389 XOR-MAPPED-ADDRESS. This creates
problems with servers which actually do backward compatibility.
We should just upgrade to RFC5389, which is what Google does.
Try removing:
'USE_RFC_3489_BACKWARDS_COMPATIBLE',
from nicer.gyp
Assignee | ||
Comment 1•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → ekr
Assignee | ||
Comment 2•11 years ago
|
||
Update: this patch works with the Firefox STUN server but not the Google servers because Google is offering MAPPED-ADDRESS instead of XOR-MAPPED-ADDRESS. Investigations continue but I think this may be a problem on Google's side.
Updated•11 years ago
|
OS: Mac OS X → All
Hardware: x86 → All
Whiteboard: [WebRTC] [blocking-webrtc-]
Comment 4•11 years ago
|
||
Adding a note here that one of the affected STUN servers is the following:
[{url:"stun:stun.l.google.com:19302"}]
Which causes the talky.io and tokbox demo to fail occasionally.
A couple of compatibility questions:
1. What is the web developer workaround here? Specifically, what STUN servers should we suggest web developers to use vs. not use due to this bug existing?
2. What's the breadth of STUN servers impacted here?
Comment 5•11 years ago
|
||
Given that this is causing call failures on at least two WebRTC demos on fxAndroid to desktop, I'm adding the tracking-fennec flag ? that was on the dupe of this bug.
Blocks: android-webrtc
tracking-fennec: --- → ?
Updated•11 years ago
|
Whiteboard: [WebRTC] [blocking-webrtc-] → [WebRTC][blocking-webrtc-][android-webrtc+]
Updated•11 years ago
|
tracking-firefox24:
--- → ?
Updated•11 years ago
|
status-firefox23:
--- → unaffected
status-firefox24:
--- → affected
status-firefox25:
--- → affected
Assignee | ||
Comment 6•11 years ago
|
||
I have reached out to Google to see about them fixing their (noncompliant) server.
Possibilities:
1. They just fix it and we land the above patch.
2. They fix it but make it conditional on the SOFTWARE field, which we just add and set to nICEr or something.
3. They don't fix it and we land a patch that accepts MAPPED-ADDRESS.
Comment 7•11 years ago
|
||
As Ekr says in comment 6, Google's server is noncompliant, and this issue is not a bug in our code (and it's definitely not specific to Android).
Whiteboard: [WebRTC][blocking-webrtc-][android-webrtc+] → [WebRTC][blocking-webrtc-][android-webrtc-]
Assignee | ||
Comment 8•11 years ago
|
||
(In reply to Maire Reavy [:mreavy] from comment #7)
> As Ekr says in comment 6, Google's server is noncompliant, and this issue is
> not a bug in our code (and it's definitely not specific to Android).
This isn't quite correct.
There is a defect in our code in that we are offering 3489 STUN but not
accepting MAPPED-ADDRESS in response. (See the description).
However, even when this defect is fixed we will still have a problem with
Google's servers.
Comment 9•11 years ago
|
||
(In reply to Eric Rescorla (:ekr) from comment #8)
> (In reply to Maire Reavy [:mreavy] from comment #7)
> > As Ekr says in comment 6, Google's server is noncompliant, and this issue is
> > not a bug in our code (and it's definitely not specific to Android).
>
> This isn't quite correct.
>
> There is a defect in our code in that we are offering 3489 STUN but not
> accepting MAPPED-ADDRESS in response. (See the description).
>
> However, even when this defect is fixed we will still have a problem with
> Google's servers.
Thanks for clarifying. Referring to the possibilities in comment 6: How likely is Google to fix their server code? How much time/effort is it to do #3 (land a patch that accepts MAPPED-ADDRESS)?
Comment 10•11 years ago
|
||
Right, that was my understanding as well - there's a fix we need here on our side.
Also - although this isn't directly related to the Android code, it's impacting important demos for Android. The dupes demonstrate this is preventing usage of at least two WebRTC demos. That's enough of a case to block on. So I disagree on not blocking on this. We're getting going to negative reviews on Google Play if common WebRTC demos aren't working.
Whiteboard: [WebRTC][blocking-webrtc-][android-webrtc-] → [WebRTC][blocking-webrtc-][android-webrtc+]
Assignee | ||
Comment 11•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #10)
> Right, that was my understanding as well - there's a fix we need here on our
> side.
No, that's actually not obviously true either. The STUN servers I
have seen (aside from Google) generally respond with XOR-MAPPED-ADDRESS.
> Also - although this isn't directly related to the Android code, it's
> impacting important demos for Android. The dupes demonstrate this is
> preventing usage of at least two WebRTC demos. That's enough of a case to
> block on. So I disagree on not blocking on this. We're getting going to
> negative reviews on Google Play if common WebRTC demos aren't working.
WebRTC for Android should not be preffed on in release builds in any case.
I don't really care about the flags here, but
Comment 12•11 years ago
|
||
Let's take this to email with release managers & fxandroid drivers cc-ed at this point. I'll start the email thread shortly.
Updated•11 years ago
|
Whiteboard: [WebRTC][blocking-webrtc-][android-webrtc+] → [WebRTC][blocking-webrtc-][android-webrtc?]
Assignee | ||
Comment 13•11 years ago
|
||
Update:
I have contacted Google and they are going to look into fixing their
server. If they don't fix it, I have a workaround ready to go.
Updated•11 years ago
|
tracking-fennec: ? → 24+
Assignee | ||
Comment 14•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #768958 -
Attachment is obsolete: true
Assignee | ||
Updated•11 years ago
|
Attachment #781383 -
Flags: review?(adam)
Comment 15•11 years ago
|
||
Is there an update on this? If we don't have a fix from google, then we'll need your workaround to land before Monday 08/05. Thank you.
Assignee | ||
Comment 16•11 years ago
|
||
Google isn't going to fix this in a timely fashion. We need to land the patch above
Comment 17•11 years ago
|
||
Comment on attachment 781383 [details] [diff] [review]
Merged patch to do 5389 and MAPPED-ADDRESS
Review of attachment 781383 [details] [diff] [review]:
-----------------------------------------------------------------
r+ with nits.
stun_client_ctx.c:577: tab character
stun_client_ctx.c:578: tab character
stun_client_ctx.c:579: tab character
stun_client_ctx.c:580: tab character
stun_client_ctx.c:581: tab character
stun_client_ctx.c:582: tab character
stun_client_ctx.c:583: tab character
stun_client_ctx.c:584: tab character
stun_proc.c:220: tab character
::: media/mtransport/third_party/nICEr/src/ice/ice_candidate.c
@@ +655,5 @@
> switch(cand->u.srvrflx.stun->state){
> /* OK, we should have a mapped address */
> case NR_STUN_CLIENT_STATE_DONE:
> /* Copy the address */
> + nr_transport_addr_copy(&cand->addr, &cand->u.srvrflx.stun->results.stun_binding_response.mapped_addr);
This change appears to make "nr_stun_client_stun_binding_response_stund_0_96_results" effectively deadwood. I think you want to finish this off by stripping it out of stun_client_ctx.h, and changing its one reference in stun_client_ctx.c to use stun_binding_response instead.
Attachment #781383 -
Flags: review?(adam) → review+
Assignee | ||
Comment 18•11 years ago
|
||
(In reply to Adam Roach [:abr] (Traveling through Aug 5) from comment #17)
> Comment on attachment 781383 [details] [diff] [review]
> Merged patch to do 5389 and MAPPED-ADDRESS
>
> Review of attachment 781383 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> r+ with nits.
>
> stun_client_ctx.c:577: tab character
> stun_client_ctx.c:578: tab character
> stun_client_ctx.c:579: tab character
> stun_client_ctx.c:580: tab character
> stun_client_ctx.c:581: tab character
> stun_client_ctx.c:582: tab character
> stun_client_ctx.c:583: tab character
> stun_client_ctx.c:584: tab character
> stun_proc.c:220: tab character
Done
> ::: media/mtransport/third_party/nICEr/src/ice/ice_candidate.c
> @@ +655,5 @@
> > switch(cand->u.srvrflx.stun->state){
> > /* OK, we should have a mapped address */
> > case NR_STUN_CLIENT_STATE_DONE:
> > /* Copy the address */
> > + nr_transport_addr_copy(&cand->addr, &cand->u.srvrflx.stun->results.stun_binding_response.mapped_addr);
>
> This change appears to make
> "nr_stun_client_stun_binding_response_stund_0_96_results" effectively
> deadwood. I think you want to finish this off by stripping it out of
> stun_client_ctx.h, and changing its one reference in stun_client_ctx.c to
> use stun_binding_response instead.
I don't think this is right. STUN and ICE are separate modules here and
some other STUN user might want this.
Comment 19•11 years ago
|
||
Okay, that makes sense.
Assignee | ||
Comment 20•11 years ago
|
||
Comment 21•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Updated•11 years ago
|
Comment 22•11 years ago
|
||
Looking for QA to verify that this patch fixes the problems reported. If so, we'll request uplift to Fx24. Thanks.
Keywords: verifyme
Comment 23•11 years ago
|
||
Aaron & I can take a look at this tomorrow.
Updated•11 years ago
|
Whiteboard: [WebRTC][blocking-webrtc-][android-webrtc?] → [WebRTC][blocking-webrtc-][android-webrtc?][webrtc-uplift]
Comment 24•11 years ago
|
||
Jason and I ran through a variety of testing combinations and demos this afternoon and this was our findings
Builds/Config:
* Nightly (08/07) on Desktop (Mac OSX 10.8.4 & Windows 7 64-Bit) and Android (4.3/4.2)
* Nexus 4 (Android 4.3), Samsung Galaxy Nexus (Android 4.2)
* Residential Wi-Fi and Office Wi-Fi (Mozilla Network)
Talky.io:
* We both are not able to see connected clients to a session. This will need further investigation into the issue. As it stands, I am re-opening bug 877701.
* We noticed this error:
[14:08:47.308] NS_ERROR_UNEXPECTED: Unexpected error @ https://talky.io/talky.0ecfcb9a.js:19385
TokBox (http://tokbox.com/opentok/webrtc/demo)
* Confirmed working using the WebRTC demo using Desktop and Android
* The demo did not work on 4G data connection for Jason; perhaps expected. We will check with a follow-up to see if Wi-Fi works out for him
AppRTC (https://apprtc.appspot.com)
* Confirmed working for us using the WebRTC demo using Desktop and Android
* The demo did not work on 4G data connection for Jason; perhaps expected?
* Call went through with Android phone on office WiFi (Mozilla) to Desktop Firefox on (Mozilla-G) WiFi, but performance was extremely slow for video/audio output
* Residential WiFi Desktop to Android on the same network was flawless
* Toronto to Mountain View (Firefox on Android to Firefox on Android), call went through but the session yielded too much latency and was generally slow
* Interop call between Chrome Beta for Android (Mountain View) to Firefox on Android (Toronto), call went through but there was still much latency but far less than Firefox on Android to Firefox on Android
* Mountain View call to Toronto from Desktop to Android on first attempt call failed to establish a connection. On second attempt the call was very smooth, little latency
* Toronto call to Mountain View with Firefox on Android to Chrome on Desktop, the call went through with little latency
Comment 25•11 years ago
|
||
One additional note about the error below:
[14:08:47.308] NS_ERROR_UNEXPECTED: Unexpected error @ https://talky.io/talky.0ecfcb9a.js:19385
When I inspected the JS code, this appears to be failing during the creation of the peer connection object with the STUN server - [{url:"stun:stun.l.google.com:19302"}].
We need to deep dive to figure out why this is happening. bug 877701 has been reopened as a result.
Comment 26•11 years ago
|
||
Just retested also with tokbox over wifi on desktop firefox --> firefox for android - call went through for me as well.
Assignee | ||
Comment 27•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #25)
> One additional note about the error below:
>
> [14:08:47.308] NS_ERROR_UNEXPECTED: Unexpected error @
> https://talky.io/talky.0ecfcb9a.js:19385
>
> When I inspected the JS code, this appears to be failing during the creation
> of the peer connection object with the STUN server -
> [{url:"stun:stun.l.google.com:19302"}].
>
> We need to deep dive to figure out why this is happening. bug 877701 has
> been reopened as a result.
Hmmm.... Can you provide the entire JS? This seems like a separate issue.
Comment 28•11 years ago
|
||
Comment 29•11 years ago
|
||
Also - looks like calls with talky.io are completely failing entirely, even on desktop firefox --> desktop firefox.
I get the feeling this patch might have broke talky.io entirely. We need a regression range to confirm.
Comment 31•11 years ago
|
||
Did some regression digging around the talky.io failure - this appears to be broken on release 23 with firing an NS_ERROR_FAILURE on creation of the peer connection & the NS_ERROR_UNEXPECTED on nightly before and after this patch. So it looks like this is confirmed to be a different issue. I'll file a separate bug for it.
Comment 32•11 years ago
|
||
Filed bug 902615 for the talky.io issue we've seen.
Comment 33•11 years ago
|
||
I think the analysis above confirms this patch is okay to take to Beta.
We do need to do a separate analysis on what's happening on bug 902615. Additionally, we need to continue to study ways to improve performance of calls (especially in the mobile environment) and figure out faster paths to diagnose call failures based on the analysis done above.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Comment 34•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #33)
> I think the analysis above confirms this patch is okay to take to Beta.
>
> We do need to do a separate analysis on what's happening on bug 902615.
> Additionally, we need to continue to study ways to improve performance of
> calls (especially in the mobile environment) and figure out faster paths to
> diagnose call failures based on the analysis done above.
Update on this - we went back and did some more testing around this. We used a test app that made use of the affected STUN server directly to ensure that we were hitting the affected path. talky.io was unavailable to use due to another site bug found during testing. What we found is summarized below.
1. Call 1 - Desktop to FxAndroid
Result - Call failed - iceConnectionState indicated "failed"
2. Call 2 - FxAndroid to FxAndroid
Result - Call established - video streaming performance was acceptable, but audio was chopped up by each syllable.
3. Call 3 - FxAndroid to FxAndroid (followup to same session from call 2)
Result - Call failed - likely the issue being seen in bug 904598.
4. Call 4 - FxAndroid to FxAndroid (cold start of a new session)
Result - Call established - video streaming performance was bad and audio was chopped up by each syllable.
Conclusion - There's a lot of unpredictability here, which is leading QA to think that we should avoid taking the uplift here. The calls did go through on some cases, but the unpredictable results questions whether this is safe to uplift.
Assignee | ||
Comment 35•11 years ago
|
||
Can you please compare this to beta without this patch. I suppose it's possible this would cause problems with Android video once it is established, but it seems rather unlikely.
Comment 36•11 years ago
|
||
(In reply to Eric Rescorla (:ekr) from comment #35)
> Can you please compare this to beta without this patch. I suppose it's
> possible this would cause problems with Android video once it is
> established, but it seems rather unlikely.
When this was tested before this patch landed, the apprtc performance issues seen were already present in that build. Doing calls with the affected stun server also failed 100%.
Now we see the same performance issues on apprtc. Except doing calls with the affected stun server appears to establish on FxAndroid --> FxAndroid. So this is an improvement.
Assignee | ||
Comment 37•11 years ago
|
||
(In reply to Jason Smith [:jsmith] from comment #36)
> (In reply to Eric Rescorla (:ekr) from comment #35)
> > Can you please compare this to beta without this patch. I suppose it's
> > possible this would cause problems with Android video once it is
> > established, but it seems rather unlikely.
>
> When this was tested before this patch landed, the apprtc performance issues
> seen were already present in that build. Doing calls with the affected stun
> server also failed 100%.
>
> Now we see the same performance issues on apprtc. Except doing calls with
> the affected stun server appears to establish on FxAndroid --> FxAndroid. So
> this is an improvement.
OK, so since this is a connectivity problem on desktop, this seems like
that's not enough of an argument against uplift....
Comment 38•11 years ago
|
||
Yeah, now that I think about this again, this is actually more favorable to take the patch given that we do get FxAndroid --> FxAndroid calls established with no other regressions seen.
Comment 39•11 years ago
|
||
:ekr,
Are we planning on uplifting this ? If yes, that should happen sooner than later to get maximum beta testing .
Flags: needinfo?(ekr)
Comment 40•11 years ago
|
||
(In reply to bhavana bajaj [:bajaj] from comment #39)
> :ekr,
>
> Are we planning on uplifting this ? If yes, that should happen sooner than
> later to get maximum beta testing .
Please note we are planned to gtb for our next beta this afternoon PT (noon- 1PM) hence landing this asap will be helpful as Maire just confirmed that we are certain that this needs an uplift.
Assignee | ||
Comment 41•11 years ago
|
||
I'm fine with uplifting it. What button do I push?
Flags: needinfo?(ekr)
Assignee | ||
Comment 42•11 years ago
|
||
Comment on attachment 781383 [details] [diff] [review]
Merged patch to do 5389 and MAPPED-ADDRESS
[Approval Request Comment]
Bug caused by (feature/regressing bug #): This has been there since the beginning.
User impact if declined:
Failure during calls between two machines that are behind different NATs on sites which use Google's STUN servers.
Testing completed (on m-c, etc.):
See above in bug.
Risk to taking this patch (and alternatives if risky):
Most probable risk is that some other server behaves in another unpredictable way and so some other calls don't work. Aside from that, the usual risks in taking any patch.
String or IDL/UUID changes made by this patch:
None.
Attachment #781383 -
Flags: approval-mozilla-beta?
Comment 43•11 years ago
|
||
Comment on attachment 781383 [details] [diff] [review]
Merged patch to do 5389 and MAPPED-ADDRESS
Given we do not want ship android webrtc with this bug, approving this on beta given QA verification.
Please let QA know if there is any additional testing that may be needed here .
Attachment #781383 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 44•11 years ago
|
||
Whiteboard: [WebRTC][blocking-webrtc-][android-webrtc?][webrtc-uplift] → [WebRTC][blocking-webrtc-]
Comment 45•11 years ago
|
||
I've verified that what I see on mozilla-beta with this patch on the latest candidate matches the results as seen on mozilla-central that Jason and I conducted last week.
You need to log in
before you can comment on or make changes to this bug.
Description
•