Closed Bug 1240897 Opened 4 years ago Closed 9 months ago

Firefox incorrectly generates "a=setup" line in answer when negotiated DTLS role is "passive".

Categories

(Core :: WebRTC: Signaling, defect, P3)

43 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: deadbeef, Assigned: bwc)

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36

Steps to reproduce:

1. Create two RTCPeerConnection objects (let's call them "pc1" and "pc2")
2. Create a data channel for pc1.
3. Create an offer from pc1 and create an answer from pc2, calling setLocalDescription/setRemoteDescription as needed.
4. Wait until the connection is established (exchanging ICE candidates as necessary).
5. Now do the reverse of step #3: Create an offer from pc2, and answer from pc1.


Actual results:

In the first offer/answer exchange, "pc2" takes the DTLS role of "active" while pc1 takes the role of passive.

However, in the second offer/answer exchange, pc1's answer contains "a=setup:active", indicating that it's switching to the "active" role.


Expected results:

pc1's answer should have contained "a=setup:passive", since the first offer/answer exchange established that pc1 takes the "passive" DTLS role and pc2 takes the "active" role.

You can reproduce the issue using this page: https://jsfiddle.net/7x44dwjr/2/

Simply look in the console log, and notice that the pc1 renegotiation answer contains "a=setup:active"..
Component: Untriaged → WebRTC: Signaling
Product: Firefox → Core
First of all thank you very much for providing a jsfiddle with your bug report! Awesome.

Unfortunately RFC 4145 (https://tools.ietf.org/html/rfc4145#section-4) does not take renegotiation into consideration, but I think the important fact here is that pc2 in the re-offer sends 'a=setup:actpass' and that is why pc1 then chooses 'active' again like for an initial offer answer exchange.
The following is in the jsep draft (in 5.6.3):

DTLS setup value, which MUST be set according to the rules
specified in [RFC5763], Section 5, and MUST be consistent with
the selected role of the current DTLS connection, if one
exists.[TODO: may need revision, i.e., use of actpass
RFC 4145 does take renegotiation into consideration; see https://tools.ietf.org/html/rfc4145#section-7.3. Also, the whole purpose of the "connection" attribute in RFC 4145 is renegotiation.

It would be desirable if a WebRTC endpoint could offer "a=setup:passive" or "a=setup:active", but RFC 5763 requires that offers use "a=setup:actpass" (https://tools.ietf.org/html/rfc5763#section-5). I'm not sure if this is an oversight, but it's what both Chrome and Firefox have implemented.

So, it's up to the answerer to ensure that the correct role is selected. And "MUST be consistent with the selected role of the current DTLS connection" would seem to indicate that Firefox should use "passive" in this situation.
(In reply to Taylor Brandstetter from comment #3)
> RFC 4145 does take renegotiation into consideration; see
> https://tools.ietf.org/html/rfc4145#section-7.3. Also, the whole purpose of
> the "connection" attribute in RFC 4145 is renegotiation.

Interesting. Unfortunately does not seem to say much about the setup attribute in case of renegotiation. And RFC 5763 explicitly forbids the connection attribute:

      The endpoint MUST NOT use the connection attribute defined in [RFC4145].

> It would be desirable if a WebRTC endpoint could offer "a=setup:passive" or
> "a=setup:active", but RFC 5763 requires that offers use "a=setup:actpass"
> (https://tools.ietf.org/html/rfc5763#section-5). I'm not sure if this is an
> oversight, but it's what both Chrome and Firefox have implemented.

Agreed. But the the first bullet point in RFC 5763 section 5 has pretty strong language that the offerer has to use actpass, and why they recommend that the answerer should choose active and not passive (to avoid the extra round trip if he chooses passive).
 
> So, it's up to the answerer to ensure that the correct role is selected. And
> "MUST be consistent with the selected role of the current DTLS connection"
> would seem to indicate that Firefox should use "passive" in this situation.

I would wish that the JSEP draft more clearly overwrites what RFC 5763 seems to demand in this case.

Now the real question for me is: do you encounter any problem with the current way Firefox behaves?

I'm asking because my assumption would be that in a renegotiation scenario we should just keep on using the existing connection and not touch ICE or DTLS if we can avoid it. In other words: "don't do another round of ICE checks and DTLS handshakes".
If we can agree on that the follow up question to me would be: should we either
A) "silently" agree on re-using (not doing another round of DTLS handshakes) then it sounds reasonable to stick to previously negotiated DTLS roles as they are going be ignored any way.
B) have a special attribute or value which indicates that we are not going through another round of DTLS handshakes, e.g. remove the setup attribute or set it to a new value of 'keeproles'
> Now the real question for me is: do you encounter any problem with the
> current way Firefox behaves?

There's no blocking technical problem, no. Chrome could simply ignore the "setup" attribute if there's already a working DTLS connection. However, doing that would both hide a spec compliance issue, and be a spec compliance issue in itself (see below).

> I'm asking because my assumption would be that in a renegotiation scenario
> we should just keep on using the existing connection and not touch ICE or
> DTLS if we can avoid it. In other words: "don't do another round of ICE
> checks and DTLS handshakes".

Yes; if possible it would be preferable to re-use the existing DTLS connection.

> If we can agree on that the follow up question to me would be: should we
> either
> A) "silently" agree on re-using (not doing another round of DTLS handshakes)
> then it sounds reasonable to stick to previously negotiated DTLS roles as
> they are going be ignored any way.
This would go against RFC 5763 (https://tools.ietf.org/html/rfc5763#section-6.6):

"Note that if the active/passive status of the endpoints changes, a new connection MUST be established."

However, since JSEP assures that the active/passive status *won't* change (without the fingerprint changing), Chrome throws an error rather than establishing a new connection.

> B) have a special attribute or value which indicates that we are not going
> through another round of DTLS handshakes, e.g. remove the setup attribute or
> set it to a new value of 'keeproles'
We don't need that, because we already have an existing mechanism. JSEP section 5.9: 

   o  If the remote DTLS fingerprint has been changed, tear down the
      existing DTLS connection.

   o  If no valid DTLS connection exists, prepare to start a DTLS
      connection, using the specified roles and fingerprints, on any
      underlying ICE components, once they are active.

So, if an endpoint wants to do another DTLS handshake, it must change its fingerprint. This also allows it to change the role. What's *not* allowed is changing the role without changing the fingerprint.
Status: UNCONFIRMED → NEW
Rank: 29
Ever confirmed: true
Priority: -- → P2
As a temporary workaround should it work if in the second offer/answer exchange the "a=setup:active" is replaced by "a=setup:passive", or is that too simple thought?
Too by more specific, replace the "a=setup:active" line in the sdp string by "a=setup:passive".
Yes, replacing "active" with "passive" in the answer before calling setRemoteDescription should work. However, you would need to be careful to only replace "active" with "passive" when appropriate. I.e., when the current local description has a role of "active" in the corresponding media section.
Thanks, looks like it works between Canaray and Firefox now as I don't get the error. Although there is still a problem with webcam streams. Sometimes it just doesn't start or get kicked out. Different things happen depending on the order of events (who starts a datachannel/stream). But it might be some audio codec interop thing as Canaray may give the following error message:

Failed to set local answer sdp: Session error code: ERROR_CONTENT. Session error description: Failed to set local audio description recv parameters..
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3

I believe this issue still relevant, and impacting interoperability. It's kind of hard to test in a succinct way (or perhaps I'm just not seasoned enough with WebRTC to know how to do this easily), but given this gist https://gist.github.com/argggh/677154f5001f1407c1f08e3d95e10de5 (run under node 10, with chromedriver@2.46.0, geckodriver@1.16.0 and selenium-webdriver@3.6.0, I observe the following behavior (output heavily edited):

*** Browser 1: Mozilla/5.0 (X11; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0
*** Browser 2: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36
*** Initial offer from browser 1:
v=0
o=mozilla...THIS_IS_SDPARTA-67.0 37961081938154012 0 IN IP4 0.0.0.0
m=video 9 UDP/TLS/RTP/SAVPF 120 121
c=IN IP4 0.0.0.0
a=sendrecv
a=mid:0
a=msid:{e9b220ff-4c03-4386-9a56-cf0ac5d2e4f5} {a510f115-3beb-4ec9-b993-c1ffe73d77f2}
a=setup:actpass

*** Initial answer from browser 2:
v=0
o=- 6376292017037794710 2 IN IP4 127.0.0.1
m=video 9 UDP/TLS/RTP/SAVPF 120 121
a=setup:active
a=mid:0
a=recvonly

*** Connected
*** Add track offer from browser 2:
v=0
o=- 6376292017037794710 3 IN IP4 127.0.0.1
m=video 52309 UDP/TLS/RTP/SAVPF 120 121 97 99 100 101 102 122 127 119 125 107 108 109 124 118 123
a=setup:actpass
a=mid:0
a=sendrecv
a=msid:7cd8256b-dab0-4c0b-a65b-192c0d1b9b0a 88fc9367-ef2b-4d05-b0a9-620c100bfc47

*** Final answer from browser 1:
v=0
o=mozilla...THIS_IS_SDPARTA-67.0 37961081938154012 1 IN IP4 0.0.0.0
m=video 39126 UDP/TLS/RTP/SAVPF 120
a=sendrecv
a=mid:0
a=msid:{e9b220ff-4c03-4386-9a56-cf0ac5d2e4f5} {a510f115-3beb-4ec9-b993-c1ffe73d77f2}
a=setup:active

*** Result: Failed to execute 'setRemoteDescription' on 'RTCPeerConnection': Failed to set remote answer sdp: Failed to apply the description for 0: Failed to set SSL role for the transport.

So it seems Chrome 73 is unhappy that Firefox has switched to a=setup:active in its final answer. You can play around with browser combinations. Running the test chrome-chrome results in "OK", and you can see that the final answer contains a=setup:passive.

Ok, there's two bits of silliness here. The first is the reoffer using actpass (it should be using active, although the JSEP spec is totally silent about a=setup in reoffers). The second is Firefox choosing active when there's already a DTLS transport negotiated. Let's just go ahead and fix this.

Assignee: nobody → docfaraday

The more I look at this, the more this looks underspecified. draft-jsep says nothing about a=setup for reoffers whatsoever, which is where most of the complexity lies:

  • If not restarting ICE, must we reoffer the same role we negotiated last time?
  • If reoffers must use the same role (unless restarting ICE), what about new m-sections that aren't bundle-only, and might end up with their own transport? Do those need to follow the example of the previously established transport, or can they go the other way?
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/75fba0356211
Part 0: Test that we do not gratuitously switch DTLS roles on renegotiation. r=mjf
https://hg.mozilla.org/integration/autoland/rev/7899cc839c4f
Part 1: When creating an answer, do not allow DTLS role to change unless ICE is restarting. r=mjf
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

That's impressive turnaround. Thanks a lot, @bwc.

You need to log in before you can comment on or make changes to this bug.