[e10s] WebRTC DTLS handshake problems with e10s enabled

RESOLVED FIXED in Firefox 41

Status

()

Core
WebRTC: Audio/Video
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: ehugg, Assigned: ehugg)

Tracking

(Blocks: 2 bugs)

40 Branch
mozilla41
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox41 fixed)

Details

Attachments

(3 attachments, 1 obsolete attachment)

(Assignee)

Description

3 years ago
WebRTC connection works fine without e10s enabled but will not get to "SSL handshake complete" with e10s.  Attaching mtransport logs.
(Assignee)

Comment 1

3 years ago
Created attachment 8606421 [details]
mtransportlognoe10s.txt.gz


From a test that works with e10s off.
(Assignee)

Comment 2

3 years ago
Created attachment 8606422 [details]
mtransportloge10s.txt.gz


mtransport log from failed test with e10s on.
(Assignee)

Comment 3

3 years ago
This test was done using Spark and has been 100% repeatable.  Have not seen this problem connecting peer to other FF or Chrome.

Comment 4

3 years ago
This seems like a regression, right? I believe I've had calls with WebRTC and E10S.
(Assignee)

Comment 5

3 years ago
Not with Spark.  Check the log, the incoming packet is 1636 on the success run but gets truncated to 1500 with e10s perhaps.  Still investigating.

Comment 6

3 years ago
Sounds like there may be two bugs:

1. E10S truncating.
2. Your side failing to back off its MTU estimate.
(Assignee)

Comment 7

3 years ago
Created attachment 8606475 [details] [diff] [review]
WebRTC Fix DTLS handshake by expanding UDP buffer
(Assignee)

Updated

3 years ago
Assignee: nobody → ethanhugg
Status: NEW → ASSIGNED
(Assignee)

Comment 8

3 years ago
Comment on attachment 8606475 [details] [diff] [review]
WebRTC Fix DTLS handshake by expanding UDP buffer

Review of attachment 8606475 [details] [diff] [review]:
-----------------------------------------------------------------

This hack fixes the handshake problem with Spark.
Going back a long time...
* DTLS/e10s/etc acts like a fixed-size network link with no fragmentation or aggregation of IP packets.  MTU discovery should help here... but it's bad to depend on that to be successful always, especially at the start of a call (slows things down).
* To avoid unnecessary problems, we should make this 'link' larger than any practical MTU we'll see.  Local nets may run jumbo packets of up to perhaps 8K+? 

In practice we likely won't see much above an Ethernet framesize, but per this bug it can happen.
(Assignee)

Comment 10

3 years ago
Well we were a bit surprised to see recvfrom return 1636 bytes here:
https://dxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptio.c#2047

Is this how it is supposed to work?  Is this new with e10s?  I think other places in the code might be surprised to get more than one packet from a recvfrom call.
Tim points out to me that you might have IP fragmentation
Right, the 1636-byte packet may have been sent by spark and automatically fragmented.  It would be reassembled (typically) and delivered to the application via recvfrom().  If it then passes through a 1500-max-size e10s IPC channel (which from the comments here is what I presume the channel is limited to currently), then it will get truncated or dropped depending on the logic.

To determine the MTU and and block fragmentation it needs to use the DF bit/Path MTU Discovery/etc.

This is why I indicated the e10s IPC channel should allow for packets as large as the largest likely fragmented packet we'd receive (or the largest Jumbo GigE packet).
(Assignee)

Comment 13

3 years ago
Created attachment 8606576 [details] [diff] [review]
WebRTC Fix DTLS handshake by expanding UDP buffer
(Assignee)

Updated

3 years ago
Attachment #8606475 - Attachment is obsolete: true
(Assignee)

Comment 14

3 years ago
Comment on attachment 8606576 [details] [diff] [review]
WebRTC Fix DTLS handshake by expanding UDP buffer


OK, I misunderstood IP fragmentation so it looks like we just need a larger buffer.  Using 8K.

Also, in the error case the remainder 136 bytes appear to have been discarded every time.  It just took the first 1500 each time.

Not sure who should review this so assigning to Randell is hopes that he knows.
Attachment #8606576 - Flags: review?(rjesup)
Comment on attachment 8606576 [details] [diff] [review]
WebRTC Fix DTLS handshake by expanding UDP buffer

Review of attachment 8606576 [details] [diff] [review]:
-----------------------------------------------------------------

8K is safe, if perhaps overly large - that can matter a little on B2G, and B2G never deals with Jumbo packets (and no one should really be sending 8K UDP packets if they're smart).  I'm fine with landing this, but would suggest we ifdef the size to be smaller for b2g - but still >1500.  2048 is likely a reasonable size for b2g.  This is optional, as the 8K size is just stack space, and may have limited impact on actual stack use; hard to tell if this is on the "critical path" for stack depth.  If it isn't or close to, then 2K or 8K doesn't really matter.

Net: let's land this as-is and file a followup bug with [memshrink] to consider reducing the size for B2G
Attachment #8606576 - Flags: review?(rjesup) → review+
(Assignee)

Updated

3 years ago
Blocks: 849746
(Assignee)

Updated

3 years ago
Blocks: 1165546
(Assignee)

Comment 16

3 years ago
Added memshrink followup as bug 1165546
(Assignee)

Comment 18

3 years ago
Marking as checkin-needed.  Try run in comment #17.
Keywords: checkin-needed
Please note that this reflects a bug in your code, which should be reducing the packet size on retransmit (See RFC 6347 S 4.1.1). Also, I would advise starting with a smaller packet size.
https://hg.mozilla.org/mozilla-central/rev/6dd7de1d3027
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox41: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
(Assignee)

Updated

3 years ago
Depends on: 1170319
(Assignee)

Updated

3 years ago
No longer depends on: 1170319
You need to log in before you can comment on or make changes to this bug.