Closed Bug 1622759 Opened 4 years ago Closed 4 years ago

WebRTC: failed to create any UDP candidates (error = 10) on ICE restart

Categories

(Core :: WebRTC: Networking, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla76
Tracking Status
firefox-esr68 --- unaffected
firefox74 --- wontfix
firefox75 --- wontfix
firefox76 --- fixed

People

(Reporter: mawojtcz, Assigned: mawojtcz)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Attached file aboutWebrtc.html

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36

Steps to reproduce:

We've found this issue when testing our Cisco Meeting Server WebRTC app. It only happens for clients running on PCs connected to wifi. When the user in our app starts a call in a special "presentation mode" that doesn't use camera or microphone and starts to share their screen, we get ICE connection failure. We can see in the logs that when user started presenting and we changed the direction of one of the transceivers, a new SDP offer was created and resulted in ICE restart. At that point Firefox fails to gather any UDP local candidates and ICE connection fails.

Example relevant part of the ICE log (full log attached):
(ice/INFO) ICE(PC:1584352704099000 (id=6442450945 url=https://pajoin.cisco.com/en-us/meeting/cb55c7c5-d58a-42c2-bb3c-eb1a8bdb28dc))/STREAM(PC:1584352704099000 (id=6442450945 url=https://pajoin.cisco.com/en-us/meeting/cb55c7c5-d58a-42c2-bb3c-eb1a8bdb28dc) transport-id=transport_0 - 25dfc27a:98c16fd8900210ee65c36a2156fa00ac)/COMP(1): Consent refreshed
(ice/ERR) ICE(PC:1584352704099000 (id=6442450945 url=https://pajoin.cisco.com/en-us/meeting/cb55c7c5-d58a-42c2-bb3c-eb1a8bdb28dc)): unable to add interface
(ice/ERR) ICE(PC:1584352704099000 (id=6442450945 url=https://pajoin.cisco.com/en-us/meeting/cb55c7c5-d58a-42c2-bb3c-eb1a8bdb28dc)): Failed to create candidate of type host
(ice/INFO) ICE(PC:1584352704099000 (id=6442450945 url=https://pajoin.cisco.com/en-us/meeting/cb55c7c5-d58a-42c2-bb3c-eb1a8bdb28dc)): failed to create UDP candidates with error 10
(ice/INFO) Z:/task_1584302485/build/src/media/mtransport/third_party/nICEr/src/net/nr_socket_multi_tcp.c:173 function nr_socket_multi_tcp_create_stun_server_socket skipping UDP STUN server(addr:)
(ice/WARNING) Z:/task_1584302485/build/src/media/mtransport/third_party/nICEr/src/net/nr_socket_multi_tcp.c:617 function nr_socket_multi_tcp_listen failed with error 3
(ice/WARNING) ICE(PC:1584352704099000 (id=6442450945 url=https://pajoin.cisco.com/en-us/meeting/cb55c7c5-d58a-42c2-bb3c-eb1a8bdb28dc)): failed to create passive TCP host candidate: 3
(ice/ERR) ICE(PC:1584352704099000 (id=6442450945 url=https://pajoin.cisco.com/en-us/meeting/cb55c7c5-d58a-42c2-bb3c-eb1a8bdb28dc)): Failed to create candidate of type host
(ice/WARNING) ICE(PC:1584352704099000 (id=6442450945 url=https://pajoin.cisco.com/en-us/meeting/cb55c7c5-d58a-42c2-bb3c-eb1a8bdb28dc)): failed to create active TCP host candidate: 10
(ice/ERR) ICE(PC:1584352704099000 (id=6442450945 url=https://pajoin.cisco.com/en-us/meeting/cb55c7c5-d58a-42c2-bb3c-eb1a8bdb28dc)): Failed to create candidate of type relay
(ice/INFO) ICE(PC:1584352704099000 (id=6442450945 url=https://pajoin.cisco.com/en-us/meeting/cb55c7c5-d58a-42c2-bb3c-eb1a8bdb28dc)): failed to create TCP candidates with error 10
(ice/ERR) ICE(PC:1584352704099000 (id=6442450945 url=https://pajoin.cisco.com/en-us/meeting/cb55c7c5-d58a-42c2-bb3c-eb1a8bdb28dc)): couldn't create any valid candidates

Actual results:

We've done some investigation and found that this issue has only started happening recently, after this change was delivered: https://hg.mozilla.org/mozilla-unified/rev/63d700057a9d1087c0d53989ac2d3ad541787fab

These are the detailed steps of what actually happens in the code:

  1. User has multiple network interfaces (for example ethernet and wifi)
  2. WebRTC connection is created and ICE connection established without user accepting any camera/mic permissions
    2.1 As a result, the flag NR_ICE_CTX_FLAGS_ONLY_DEFAULT_ADDRS is set to true and we add only 1 interface to InterfacePrioritizer
  3. User decides to start screen sharing -> the app asks for permissions, user grants them -> flag NR_ICE_CTX_FLAGS_ONLY_DEFAULT_ADDRS is unset
  4. The direction of one of the transceivers is changed (because of the screen sharing), so a new SDP offer is required
  5. New local SDP offer is created -> ICE is restarted
  6. ICE candiadate gathering is started -> nr_ice_set_local_addresses() is called, this time there are multiple interfaces:
    6.1 The first call to nr_interface_prioritizer_add_interface() succeeds -> InterfacePrioritizer::sorted_ is reset to false
    6.2 Second call to nr_interface_prioritizer_add_interface() matches the default interface that was added in point 2.1 above, so the call returns R_ALREADY
    6.2.1 ABORT() is called and therefore nr_interface_prioritizer_sort_preference() is not called and InterfacePrioritizer remains unsorted
  7. nr_ice_component_initialize() calls nr_ice_component_initialize_udp(), which calls nr_ice_candidate_create()
    7.1. nr_ice_candidate_create() fails on the call to nr_ice_candidate_compute_priority(), because priority cannot be computed by the InterfacePrioritizer (because InterfacePrioritizer::sorted_ flag is set to false)
    7.2 Firefox fails to gather any UDP candidates -> ICE connection is broken

Expected results:

Candidates should have been gathered and ICE connection should have been established after the ICE restart.

This allows us to continue adding remaining interfaces from addrs array and
makes sure nr_interface_prioritizer_sort_preference() is called at the end.

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → WebRTC: Networking
Product: Firefox → Core

Michael, looks like a potential regression from Bug 1548318. Could you please have a look and confirm?

Flags: needinfo?(mfroman)
Regressed by: 1548318
Has Regression Range: --- → yes
Assignee: nobody → mawojtcz

Do you have try access already? If not, I can push this to try for you.

Flags: needinfo?(mawojtcz)

No, I don't have try access. If you could push this to try for me, that would be great, thanks

Flags: needinfo?(mawojtcz)

I can see some gtest failures in the results, for example:
Assertion failure: on, at /builds/worker/checkouts/gecko/media/mtransport/nricectx.cpp:854

I think this is caused by my test calling SetCtxFlags() directly instead of using test_utils_->sts_target()->Dispatch(). I'll try to fix this later today or tomorrow

I didn't get these when running ./mach gtest locally - I guess I didn't have #define DEBUG defined.

I usually have a mozconfig [1] with these options set:
ac_add_options --enable-optimize
ac_add_options --enable-debug
ac_add_options --enable-warnings-as-errors

That will enable assertions. The last I checked, warnings-as-errors was enabled for CI builds, but not local builds, which can lead to some unfortunate surprises.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Configuring_Build_Options

Flags: needinfo?(mfroman)
Priority: -- → P3
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: P3 → P2

(In reply to Marcin Wojtczak from comment #7)

I can see some gtest failures in the results, for example:
Assertion failure: on, at /builds/worker/checkouts/gecko/media/mtransport/nricectx.cpp:854

I think this is caused by my test calling SetCtxFlags() directly instead of using test_utils_->sts_target()->Dispatch(). I'll try to fix this later today or tomorrow

I didn't get these when running ./mach gtest locally - I guess I didn't have #define DEBUG defined.

I looked at the mda mochitest failures, and those look like known intermittents, so don't worry about those.

I've updated the phabricator with a new patch that fixes the assert in media/mtransport/nricectx.cpp:854
I guess we need to push to try again?

OK, I see there is 1 more problem: the "Linux 18.04 x64 asan opt GTest" job failed because AddressSanitizer has detected heap-use-after-free

This is actually an issue with existing test code:
The IceTestPeer::remote_ field is a raw pointer, which points to an already deleted object p1_ when p2_ is being deleted in TearDown(). This issue affects all tests that do ICE restart, but it just doesn't happen in them purely because they do ICE restart on p2_ instead of p1_. I don't know if this was done like this by design or by luck, but I've modified my test to also do ICE restart on p2_ now so it should be fine now (phabricator updated).
I guess the proper way to fix this would be to use some kind of weak ptr instead of raw pointer in IceTestPeer::remote_.

I've had a look at all the other failures and they don't seem to be related to my changes at all, so most likely they are existing intermittent issues, but it's probably better if someone more experienced with this code base confirms that.

(In reply to Marcin Wojtczak from comment #13)

I've had a look at all the other failures and they don't seem to be related to my changes at all, so most likely they are existing intermittent issues, but it's probably better if someone more experienced with this code base confirms that.

I've had a look, and can confirm this.

I will go ahead and do one last try push for this, to verify that gtest is working as expected with the changes described in comment 12.

Try looks good.

Did you want to do the honors and land?

Flags: needinfo?(mawojtcz)

Thanks, but I don't think I have the permissions and I'm happy for you to land it.

Flags: needinfo?(mawojtcz) → needinfo?(docfaraday)
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/236044537150
Don't fail when nr_interface_prioritizer_add_interface() returns R_ALREADY r=bwc
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
Flags: needinfo?(docfaraday) → in-testsuite+

Thanks for the time spent diagnosing and fixing this!

No problem, glad I could help. Thanks for a quick review and help with landing it.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: