Closed Bug 1165423 Opened 9 years ago Closed 9 years ago

[e10s] WebRTC DTLS handshake problems with e10s enabled

Categories

(Core :: WebRTC: Audio/Video, defect)

40 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox41 --- fixed

People

(Reporter: ehugg, Assigned: ehugg)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 1 obsolete file)

WebRTC connection works fine without e10s enabled but will not get to "SSL handshake complete" with e10s.  Attaching mtransport logs.
From a test that works with e10s off.
mtransport log from failed test with e10s on.
This test was done using Spark and has been 100% repeatable.  Have not seen this problem connecting peer to other FF or Chrome.
This seems like a regression, right? I believe I've had calls with WebRTC and E10S.
Not with Spark.  Check the log, the incoming packet is 1636 on the success run but gets truncated to 1500 with e10s perhaps.  Still investigating.
Sounds like there may be two bugs:

1. E10S truncating.
2. Your side failing to back off its MTU estimate.
Assignee: nobody → ethanhugg
Status: NEW → ASSIGNED
Comment on attachment 8606475 [details] [diff] [review]
WebRTC Fix DTLS handshake by expanding UDP buffer

Review of attachment 8606475 [details] [diff] [review]:
-----------------------------------------------------------------

This hack fixes the handshake problem with Spark.
Going back a long time...
* DTLS/e10s/etc acts like a fixed-size network link with no fragmentation or aggregation of IP packets.  MTU discovery should help here... but it's bad to depend on that to be successful always, especially at the start of a call (slows things down).
* To avoid unnecessary problems, we should make this 'link' larger than any practical MTU we'll see.  Local nets may run jumbo packets of up to perhaps 8K+? 

In practice we likely won't see much above an Ethernet framesize, but per this bug it can happen.
Well we were a bit surprised to see recvfrom return 1636 bytes here:
https://dxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/pthreads/ptio.c#2047

Is this how it is supposed to work?  Is this new with e10s?  I think other places in the code might be surprised to get more than one packet from a recvfrom call.
Tim points out to me that you might have IP fragmentation
Right, the 1636-byte packet may have been sent by spark and automatically fragmented.  It would be reassembled (typically) and delivered to the application via recvfrom().  If it then passes through a 1500-max-size e10s IPC channel (which from the comments here is what I presume the channel is limited to currently), then it will get truncated or dropped depending on the logic.

To determine the MTU and and block fragmentation it needs to use the DF bit/Path MTU Discovery/etc.

This is why I indicated the e10s IPC channel should allow for packets as large as the largest likely fragmented packet we'd receive (or the largest Jumbo GigE packet).
Attachment #8606475 - Attachment is obsolete: true
Comment on attachment 8606576 [details] [diff] [review]
WebRTC Fix DTLS handshake by expanding UDP buffer


OK, I misunderstood IP fragmentation so it looks like we just need a larger buffer.  Using 8K.

Also, in the error case the remainder 136 bytes appear to have been discarded every time.  It just took the first 1500 each time.

Not sure who should review this so assigning to Randell is hopes that he knows.
Attachment #8606576 - Flags: review?(rjesup)
Comment on attachment 8606576 [details] [diff] [review]
WebRTC Fix DTLS handshake by expanding UDP buffer

Review of attachment 8606576 [details] [diff] [review]:
-----------------------------------------------------------------

8K is safe, if perhaps overly large - that can matter a little on B2G, and B2G never deals with Jumbo packets (and no one should really be sending 8K UDP packets if they're smart).  I'm fine with landing this, but would suggest we ifdef the size to be smaller for b2g - but still >1500.  2048 is likely a reasonable size for b2g.  This is optional, as the 8K size is just stack space, and may have limited impact on actual stack use; hard to tell if this is on the "critical path" for stack depth.  If it isn't or close to, then 2K or 8K doesn't really matter.

Net: let's land this as-is and file a followup bug with [memshrink] to consider reducing the size for B2G
Attachment #8606576 - Flags: review?(rjesup) → review+
Blocks: e10s-webrtc
Blocks: 1165546
Added memshrink followup as bug 1165546
Marking as checkin-needed.  Try run in comment #17.
Keywords: checkin-needed
Please note that this reflects a bug in your code, which should be reducing the packet size on retransmit (See RFC 6347 S 4.1.1). Also, I would advise starting with a smaller packet size.
https://hg.mozilla.org/mozilla-central/rev/6dd7de1d3027
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Depends on: 1170319
No longer depends on: 1170319
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: