Closed Bug 1654399 Opened 4 years ago Closed 4 years ago

[wfh] Implement RTCDtlsTransport state and onstatechange interface

Categories

(Core :: WebRTC: Networking, defect, P2)

defect

Tracking

()

RESOLVED FIXED
82 Branch
Tracking Status
firefox82 --- fixed

People

(Reporter: mjf, Assigned: mjf)

References

(Blocks 1 open bug)

Details

Attachments

(6 files)

This is to track the partial implementation of RTCDtlsTransport (state and onstatechange) for wfh.

Severity: -- → S3
Priority: -- → P2
Assignee: nobody → mfroman
Blocks: 1307996
See Also: → 1654303
See Also: → 1303867

Partial implementation of RTCDtlsTransport (state and onstatechange)
for wfh. Stubbed out methods so everything builds.

See Also: → 1657394

Similar to changes for Bug 1303867 to make sure we destroy the NrIceMediaStream
after removing the transport.

Depends on D85204

Pushed by mfroman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/71349f66e5c6
pt1 - webidl changes for RTCDtlsTransport. r=webidl,smaug
https://hg.mozilla.org/integration/autoland/rev/7574a5f17d08
pt2 - implement RTCDtlsTransport state and onstatechange. r=bwc
https://hg.mozilla.org/integration/autoland/rev/ab571bfa5fe4
pt3 - add rollback support for RTCDtlsTransports. r=bwc
https://hg.mozilla.org/integration/autoland/rev/786d353021bf
pt4 - better close_notify support during renegotiation. r=bwc
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/25337 for changes under testing/web-platform/tests

The upstream PR here never merged, because it seems one of the added tests is intermittently timing out in Chrome [1]. Can you take a look?

[1] https://github.com/web-platform-tests/wpt/pull/25337/checks?check_run_id=1156045136

Flags: needinfo?(mfroman)

I'm looking into this now.

Flags: needinfo?(mfroman)

James, I believe the issue is that we're using addIceCandidate(...) in the newly added test. This method fails with the error message "TypeError: Failed to execute 'addIceCandidate' on 'RTCPeerConnection': Candidate missing values for both sdpMid and sdpMLineIndex". When I run other wpt tests that use addIceCandidate in Chrome, they also fail with the same error message. Is there something equivalent to our *.ini files under testing/web-platform/meta/webrtc that indicates a test is expected to fail for Chrome?

Flags: needinfo?(james)

The problem is not expected-to-fail as such but failing intermittently. In particular it looks like restartIce() retains dtls transports is sometimes passing and sometimes timing out, and depending on what happens there it either passes the remaining tests or they don't get run.

If the problem is just that Chrome is unavoidably intermittent here we can merge the PR but ought to file a bug on them. If the problem is actually in the test we should fix it. Upstream wpt doesn't have metadata as such so that isn't something we can adjust here.

Flags: needinfo?(james)

In my experience, it fails every time on Chrome when I run it here on my macOS laptop. I don't find it to be intermittent at all.

Upstream PR was closed without merging

I'll note that now that I've tried it on my Ubuntu box, the test passes on Chrome, even though the same error is displayed at the top of the test page. That may be the "intermittent" nature, that it is passing (w/ an error displayed) on Ubuntu, but timing out (w/ an error displayed) on macOS.

https://github.com/web-platform-tests/wpt/pull/25337/checks?check_run_id=1360850081 is the retrigger, which shows a similar but not identical pattern. Note that this is all on Linux.

Did you try locally with ./mach wpt --product chrome --verify /webrtc/RTCDtlsTransport-state.html, which should approximately replicate the behaviour of the CI? (or ./wpt run --verify chrome /webrtc/RTCDtlsTransport-state.html if you have a GitHub checkout).

Pushed by mfroman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/79e739e76a26
pt5 - fix wpt test to better deal with Chrome timeouts. r=bwc

Looks like it's still unstable :(

I tried adding a long timeout, but it still fails locally in Chrome. In the failing cases I see

RTCDtlsTransport-state.html:97 Uncaught (in promise) TypeError: Failed to execute 'addIceCandidate' on 'RTCPeerConnection': Candidate missing values for both sdpMid and sdpMLineIndex
    at RTCPeerConnection.pc1.onicecandidate (RTCDtlsTransport-state.html:97)
pc1.onicecandidate @ RTCDtlsTransport-state.html:97
RTCDtlsTransport-state.html:98 Uncaught (in promise) TypeError: Failed to execute 'addIceCandidate' on 'RTCPeerConnection': Candidate missing values for both sdpMid and sdpMLineIndex
    at RTCPeerConnection.pc2.onicecandidate (RTCDtlsTransport-state.html:98)
pc2.onicecandidate @ RTCDtlsTransport-state.html:98

although I don't know if that also happens in the passing cases.

If this is really just flake in Chrome itself I can admin merge the PR.

The flake report originally wasn't for that test (RTCDtlsTransport-state.html) but for RTCPeerConnection-restartIce.https.html. That is the test I fixed.

For the record, it seems Chrome doesn't like the simpler form that we typically use:

pc2.addIceCandidate(e.candidate)

Chrome seems to insist on:

pc2.addIceCandidate({
        candidate: e.candidate.candidate,
        sdpMid: e.candidate.sdpMid
      });

When I run ./wpt run -y --verify --verify-log-full --binary /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome chrome ./webrtc/RTCDtlsTransport-state.html it passes, but I think that is because every test fails the same way on Chrome. Let me see if making that change from Comment 22 in RTCDtlsTransport-state.html helps since it was part of the change I made for RTCPeerConnection-restartIce.https.html.

Making the change from Comment 22 eliminates those error messages, but it consistently times out on every test running in Chrome (still passing on the verify run). I'll try to find out where the timeout is happening.

The timeout in the wpt on Chrome is happening here[1] waiting for a call to stoppedTransceiver.receiver.track.onended. If I run a wpt that does something very similar (RTCRtpTransceiver.https.html), it also stalls at the same point, waiting for onended to be called[2].

To me, this looks like Chrome hasn't implemented stopping a transceiver, as evidenced here[3] and here[4].

I will push a change to that RTCDtlsTransport-state.html that fixes the error messages, but it will not fix the consistent timeout.

[1] https://searchfox.org/mozilla-central/rev/50215d649d4854812837f1343e8f47bd998dacb5/testing/web-platform/tests/webrtc/RTCDtlsTransport-state.html#129
[2] https://searchfox.org/mozilla-central/rev/50215d649d4854812837f1343e8f47bd998dacb5/testing/web-platform/tests/webrtc/RTCRtpTransceiver.https.html#1240
[3] https://developer.mozilla.org/en-US/docs/Web/API/RTCRtpTransceiver/stop
[4] https://stackoverflow.com/questions/59650954/how-to-circumvent-missing-stop-implementation-of-rtcrtptransceiver-in-chrome

Flags: needinfo?(james)

Make these calls more acceptable to Chrome.

Depends on 96662

Pushed by mfroman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/feab772596ad
pt6 - improve RTCDtlsTransport-state.html addIceCandidate calls. r=bwc

https://github.com/web-platform-tests/wpt/pull/25337/checks?check_run_id=1391326755 is looking better in Chrome, but not yet perfect.

If the conclusion here is that the instability is just a Chrome bug I can just merge the PR (although it would be great if we also filed a Chromium issue so that they can look at the bug; can you do that?)

Flags: needinfo?(james) → needinfo?(mfroman)
Regressions: 1681025
Upstream PR merged by jgraham

(In reply to Michael Froman [:mjf] from comment #12)

"TypeError: Failed to execute 'addIceCandidate' on 'RTCPeerConnection': Candidate missing values for both sdpMid and sdpMLineIndex"

This is https://crbug.com/978582 but should be unrelated to the timeout.

(In reply to Michael Froman [:mjf] from comment #22)

For the record, it seems Chrome doesn't like the simpler form that we typically use:

pc2.addIceCandidate(e.candidate)

Chrome seems to insist on:

pc2.addIceCandidate({
        candidate: e.candidate.candidate,
        sdpMid: e.candidate.sdpMid
      });

It's not insisting on that, but rather on the line ahead of it:

if (e.candidate) {

I'm trying to push on Chrome to fix this, so we can update these tests to spec, and even fail tests over them instead of leaving them to be discovered in logs.

Flags: needinfo?(mfroman)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: