
onicecandidate(null) takes 20 seconds longer to fire on systems with VMs (vs. Chrome only takes 10 seconds longer)

NEW
Unassigned

Status

()

Core
WebRTC: Networking
P5
normal
Rank:
45
3 years ago
10 months ago

People

(Reporter: jib, Unassigned)

Tracking

38 Branch
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [tech-debt], URL)

Attachments

(3 attachments)

There's a marked difference between platforms, which I've noticed for a while, and then it came up on #media today that Chrome apparently doesn't have this problem:

> bemasc: Chrome gathers candidates and immediately emits onicecandidate(null). It looks like
>         Firefox, by default, delays the null candidate (gathering finished) for a long time.

This was on Linux. People seem to be trying to work around it with JS.

Separately, I've observed a marked difference in Firefox on my MBP OSX 10.9.5 vs. Windows 7, both on my home wifi:

STR:
- Go to URL, press the Start Button and wait up to 20 seconds.

I get:

On Firefox on OSX:  null candidate took 0.04 seconds.
On Firefox on Win7: null candidate took 20.02 seconds.

On Chrome  on OSX:  null candidate took 0.15 seconds.
On Chrome  on Win7: null candidate took 9.52 seconds.

The numbers are quite stable for me. Interestingly, Chrome also struggles on Windows, but takes half as long as Firefox.

If someone could try the fiddle on Firefox and Chrome on Linux, that would be great.

Comment 1

3 years ago
I don't have a Windows box. Can you post the results of this test with R_LOG_LEVEL=9 R_LOG_DESTINATION=stderr?
Created attachment 8614352 [details]
output.log

Comment 3

3 years ago
Looking at this, it actually looks like it's just that STUN is taking forever
to time out. Do you have some kind of UDP blocking firewall? What candidates
do you get?
Created attachment 8614423 [details]
icecandidates.txt

I've updated the fiddle to dump the candidates, and here's the output.

On Windows, pretty much all the candidates listed appear within the first second, and then it sits there. This seems true on both Chrome and Firefox.

It's a reasonably new ISP-provided router that I haven't messed with yet, and a stock Windows setup. I haven't noticed any UDP blocking, and there appear to be UDP candidates. The Mac is on the same network and has no delay (though it also produces far fewer candidates, as you can see).
Can you timestamp the output?  That would help a bunch.  Also turn on signaling:5,timestamp,mtransport:5 logging.

What are the ICE servers configured to?  Same for Chrome & FF?
Flags: needinfo?(jib)
Created attachment 8614457 [details]
output2.log (timestamped)

The ICE server is configured in line 5 of the fiddle. New output:

> 0.04 - setLocal finished...
> 
> 0.13 - {"candidate":"candidate:0 1 UDP 2128609535 192.168.1.5 65264 typ host","sdpMid":"","sdpMLineIndex":0}
> 
> 0.14 - {"candidate":"candidate:2 1 UDP 2128543999 192.168.197.1 65265 typ host","sdpMid":"","sdpMLineIndex":0}
> 
> 0.14 - {"candidate":"candidate:4 1 UDP 2128478463 192.168.245.1 65266 typ host","sdpMid":"","sdpMLineIndex":0}
> 
> 0.14 - {"candidate":"candidate:6 1 UDP 2128412927 192.168.56.1 65267 typ host","sdpMid":"","sdpMLineIndex":0}
> 
> 0.16 - {"candidate":"candidate:1 1 UDP 1692467199 24.47.153.9 65264 typ srflx raddr 192.168.1.5 rport 65264","sdpMid":"","sdpMLineIndex":0}
> 
> 20.14 - null candidate took 20.10 seconds
Flags: needinfo?(jib)
It seems that you have some extra interfaces that can't reach the STUN server. What are those, anyway?

Comment 8

3 years ago
(In reply to Byron Campen [:bwc] from comment #7)
> It seems that you have some extra interfaces that can't reach the STUN
> server. What are those, anyway?

Yes, this seems like a correct analysis. As far as I can tell, we're behaving
correctly here. It might be worth turning down our STUN timeouts, but part
of the point of trickle ICE is that you don't need to wait for the end of
candidates.
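To illustrate the trickle ICE point above, here is a minimal sketch of a handler that forwards each candidate as soon as it is gathered, so nothing blocks on the null candidate. `sendToPeer` is a hypothetical signaling callback and the message shapes are assumptions, not part of any real API; `pc` is expected to be an RTCPeerConnection (or anything with an `onicecandidate` property).

```javascript
// Sketch only: forward candidates as they arrive instead of waiting for the
// null candidate. `sendToPeer` and the message format are hypothetical.
function trickleCandidates(pc, sendToPeer) {
  pc.onicecandidate = (event) => {
    if (event.candidate) {
      // A real candidate: send it right away.
      sendToPeer({ type: "candidate", candidate: event.candidate });
    } else {
      // Null candidate: gathering finished (possibly much later).
      sendToPeer({ type: "end-of-candidates" });
    }
  };
}
```

With this pattern, a late null candidate only delays the end-of-candidates notice, not the candidates themselves.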
Why no delay on OSX?

(In reply to Byron Campen [:bwc] from comment #7)
> It seems that you have some extra interfaces that can't reach the STUN server. What are those, anyway?

Can you point to them? I would like to know as well.
Attachment #8614457 - Attachment mime type: text/x-log → text/plain
(In reply to Jan-Ivar Bruaroey [:jib] from comment #9)
> Why no delay on OSX?
> 
> (In reply to Byron Campen [:bwc] from comment #7)
> > It seems that you have some extra interfaces that can't reach the STUN server. What are those, anyway?
> 
> Can you point to them? I would like to know as well.

You have 4 host candidates there, each with a different IP address. The only one that seems to be working is 192.168.1.5 (see the raddr in the srflx candidate).
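The same check can be sketched in code: parse the candidate lines from the timestamped output and list the host addresses that no srflx candidate points back to via raddr. The function names here are made up for illustration, and the parser only handles the simple space-separated form seen in this log.

```javascript
// Sketch only: helper names are hypothetical; the parser assumes the
// "candidate:<foundation> <component> <transport> <priority> <addr> <port> typ <type> ..." layout.
function parseCandidate(line) {
  const parts = line.trim().split(/\s+/);
  const typIndex = parts.indexOf("typ");
  const raddrIndex = parts.indexOf("raddr");
  return {
    address: parts[4],
    port: Number(parts[5]),
    type: typIndex >= 0 ? parts[typIndex + 1] : undefined,
    relatedAddress: raddrIndex >= 0 ? parts[raddrIndex + 1] : undefined,
  };
}

// Host addresses that never produced a server-reflexive candidate.
function hostsWithoutReflexive(lines) {
  const parsed = lines.map(parseCandidate);
  const mapped = new Set(
    parsed.filter((c) => c.type === "srflx").map((c) => c.relatedAddress)
  );
  return parsed
    .filter((c) => c.type === "host" && !mapped.has(c.address))
    .map((c) => c.address);
}

// Candidate strings copied from the output above.
const loggedCandidates = [
  "candidate:0 1 UDP 2128609535 192.168.1.5 65264 typ host",
  "candidate:2 1 UDP 2128543999 192.168.197.1 65265 typ host",
  "candidate:4 1 UDP 2128478463 192.168.245.1 65266 typ host",
  "candidate:6 1 UDP 2128412927 192.168.56.1 65267 typ host",
  "candidate:1 1 UDP 1692467199 24.47.153.9 65264 typ srflx raddr 192.168.1.5 rport 65264",
];

console.log(hostsWithoutReflexive(loggedCandidates));
// prints [ '192.168.197.1', '192.168.245.1', '192.168.56.1' ]
```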

Comment 11

3 years ago
Only impacting VMs (dev/test), so not hitting the main user base. Can we identify VM interfaces and default to not including them, but behind a pref for testing/dev edge cases?

Ultimately this is how ICE works; it could be an issue for gateways that don't interface with trickle ICE.
backlog: --- → webRTC+
Rank: 45
Priority: -- → P4
Whiteboard: [tech-debt]
(In reply to :shell escalante from comment #11)
> only impacting VM's (dev/test) - so not hitting main user base.  can we
> identify vm interfaces and default to not include them - but behind pref for
> testing/dev for edge case.

This doesn't seem like a great idea. Lots of people run VMs and
unless you deliberately don't do trickle, there's no problem.


> ultimately this is how ICE works, could be issue for gateways that don't
> interface with trickle ICE.

They can implement their own timers.
Thanks for the investigation. I agree it's only corner-case scripts that wait for null candidate that will run into this, and it seems to me that those scripts can easily be tuned to time out sooner by some formula of time and candidates found, once they learn about this problem with VMs.
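For scripts that do wait for the null candidate, one possible tuning is a quiet-period timer: finish on the null candidate, or once no new candidate has arrived for a while. This is a sketch under assumptions; `gatherCandidates` and the 1000 ms default are made up for illustration, and `pc` is expected to be an RTCPeerConnection (or anything with an `onicecandidate` property).

```javascript
// Sketch only: resolve when the null candidate fires, or when no candidate
// has arrived for `quietMs` milliseconds. Names and defaults are hypothetical.
function gatherCandidates(pc, quietMs = 1000) {
  return new Promise((resolve) => {
    const candidates = [];
    let timer = null;

    const finish = (timedOut) => {
      clearTimeout(timer);
      pc.onicecandidate = null; // stop listening once we are done
      resolve({ candidates, timedOut });
    };

    // (Re)start the quiet-period timer.
    const armTimer = () => {
      clearTimeout(timer);
      timer = setTimeout(() => finish(true), quietMs);
    };

    pc.onicecandidate = (event) => {
      if (!event.candidate) {
        finish(false); // real end of gathering
        return;
      }
      candidates.push(event.candidate);
      armTimer();
    };

    // Also gives up if no candidate at all arrives within quietMs.
    armTimer();
  });
}
```

In the timestamped log above, every candidate arrived within 0.16 seconds, so a quiet period of about a second would have ended the wait long before the 20-second null candidate.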

I suppose the remaining question is whether, with this information, we find it valuable to tune our STUN timeouts to align more closely with Chrome's. Or are we good?
Summary: onicecandidate(null) takes 20+ seconds to fire on Windows+Linux, vs. instant on OSX, vs. faster in Chrome → onicecandidate(null) takes 20 seconds longer to fire on systems with VMs (vs. Chrome only takes 10 seconds longer)
Mass change P4->P5 to align with new Mozilla triage process.
Priority: P4 → P5