Closed Bug 1646715 Opened 2 years ago Closed 1 year ago

DTLS application data packets sometimes exceed MTU significantly

Categories

(Core :: WebRTC: Networking, defect, P3)

77 Branch

RESOLVED FIXED
mozilla80
Tracking Status
firefox80 --- fixed

People

(Reporter: lgrahl, Assigned: lgrahl)

Attachments

(2 files)

Every now and then, when sending a lot of data via a data channel, a DTLS packet of roughly 16 KiB is sent on the wire, which is far beyond any typical path MTU (around 1500 bytes on Ethernet).

This breaks interop with libwebrtc/Chrome, which does not seem able to handle DTLS application data packets of that size (it logs "short DTLS read. flushing"; it probably just doesn't read all of it). In such a case, libwebrtc/Chrome closes the DTLS transport. I'd imagine interop isn't the main problem here, though, since Firefox shouldn't send packets larger than the MTU anyway.
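
For anyone who wants to check a capture for this symptom, here is a minimal sketch in Python (not taken from Firefox or libwebrtc). It walks the DTLS records in one UDP payload using the standard 13-byte record header and compares the datagram size against the payload budget of a given MTU. The function names, the 1500-byte MTU, and the IPv4/UDP header sizes are assumptions for illustration only.

```python
import struct

# DTLS record header: type(1) + version(2) + epoch(2) + sequence(6) + length(2)
DTLS_HEADER_LEN = 13
APPLICATION_DATA = 23  # DTLS/TLS content type for application data


def parse_dtls_records(datagram: bytes):
    """Return (content_type, epoch, length) for each DTLS record in a UDP payload."""
    records = []
    offset = 0
    while offset + DTLS_HEADER_LEN <= len(datagram):
        # "Q" reads epoch (upper 16 bits) and sequence number (lower 48 bits) together.
        content_type, _version, epoch_seq, length = struct.unpack_from(
            "!BHQH", datagram, offset
        )
        records.append((content_type, epoch_seq >> 48, length))
        offset += DTLS_HEADER_LEN + length
    return records


def max_udp_payload(mtu: int = 1500, ip_header: int = 20, udp_header: int = 8) -> int:
    """Largest UDP payload that fits in one packet (assumed IPv4 without options)."""
    return mtu - ip_header - udp_header


def is_oversized(datagram: bytes, mtu: int = 1500) -> bool:
    """True if the UDP payload cannot fit into a single packet of the given MTU."""
    return len(datagram) > max_udp_payload(mtu)
```

Feeding each captured UDP payload to `is_oversized` and then to `parse_dtls_records` shows whether an oversized datagram carries a single large application data record (content type 23) or several smaller ones.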

CCing Michael since this could actually be a usrsctp issue, potentially handing out very large SCTP packets. Have you seen anything like that before?

Flags: needinfo?(tuexen)

Related to MTU issues, I'm only aware of https://github.com/sctplab/usrsctp/issues/410. But responsiveness is not good.

Do you have a way to reproduce this? Maybe one could connect two machines with an MTU of 9000 bytes and configure in usrsctp a smaller MTU. Then one could observe if one sees larger packets... Haven't done this yet.

Flags: needinfo?(tuexen)

It's quite rare (it sometimes happens in a session after ~10 mins of continuously sending large files) and I haven't found a reliable way to reproduce it. I'll try to extract the SCTP packets and will send you a trace if it turns out that there are very large SCTP packets in them.
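
Once the SCTP packets have been extracted, a check along the following lines could flag the suspicious ones. This is a rough Python sketch, not part of usrsctp: it lists the chunks in a raw SCTP packet (12-byte common header, then chunks with a 4-byte header padded to 4 bytes) and reports packets larger than a limit. The function names and the `sctp_mtu` value a caller passes in are hypothetical.

```python
import struct

SCTP_COMMON_HEADER_LEN = 12  # source port, destination port, verification tag, checksum
CHUNK_HEADER_LEN = 4         # type(1) + flags(1) + length(2)
CHUNK_TYPE_DATA = 0


def walk_sctp_chunks(packet: bytes):
    """Return (chunk_type, chunk_length) for each chunk in a raw SCTP packet."""
    chunks = []
    offset = SCTP_COMMON_HEADER_LEN
    while offset + CHUNK_HEADER_LEN <= len(packet):
        chunk_type, _flags, length = struct.unpack_from("!BBH", packet, offset)
        if length < CHUNK_HEADER_LEN:
            break  # malformed chunk, stop instead of looping forever
        chunks.append((chunk_type, length))
        offset += (length + 3) & ~3  # chunks are padded to a multiple of 4 bytes
    return chunks


def find_large_packets(packets, sctp_mtu: int):
    """Return (index, size) for every packet larger than sctp_mtu (caller-chosen limit)."""
    return [(i, len(p)) for i, p in enumerate(packets) if len(p) > sctp_mtu]
```

Running `walk_sctp_chunks` on a flagged packet shows whether it is one huge DATA chunk (type 0) or many bundled chunks, which may help narrow down where the size limit is being ignored.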

Also, I'm not sure what usrsctp commit Firefox currently builds on. Might be worth bumping.

Severity: -- → S3
Priority: -- → P3

Alright, I can somewhat reliably reproduce this and it's definitely an issue for large data transfers. It does look like the large packet is indeed coming from usrsctp, and all I need to do to trigger it is send enough data to fill the buffer of the remote peer. In our case, the remote peer (the Threema app) is uploading data elsewhere in a blocking manner, which completely stalls the flow of incoming data. Michael, I will send you a packet trace and Firefox log in private since this may contain sensitive information (even though I've done this on a test device).

OK, I see. It is definitely a problem in usrsctp. Are you using the latest usrsctp sources? Can you reproduce that with a simpler application than Firefox? For example, with your WebRTC implementation or with usrsctp only? That would simplify things...

I've bumped to master (ea345b6d0c8a0f8701cf49445dba5ec8d34e2305) and that resolved the issue for me. So, it looks like this was a usrsctp issue. I'll provide a patch.

Nils, can you suggest an appropriate reviewer? :)

Flags: needinfo?(drno)

My assessment of the severity is this: even though it's not very easy to reproduce, it's consistently reproducible when transferring 10-50 MiB to a remote peer with worse network connectivity than the sender. IMO, this makes it very severe for any application that relies heavily on data channel transmission.

Duplicate of this bug: 1613889

Nils is out until next week. I'd suggest bwc as a reviewer.

Flags: needinfo?(drno)
Assignee: nobody → lennart.grahl
Status: NEW → ASSIGNED

The try runs in comments 13-15 don't look green enough. Byron, could you try to help Lennart get these resolved?

Flags: needinfo?(docfaraday)

(In reply to Nils Ohlmeier [:drno] from comment #16)
> The try runs in comments 13-15 don't look green enough. Byron, could you try to help Lennart get these resolved?

That's bug 1649855, so not something we need to worry about here. The wpt can be found in comment 14. There are lots of oranges in comment 15, but those look like known intermittents on tests that do not involve DataChannels. I would say that try looks ok.

Flags: needinfo?(docfaraday)

Lennart, anything else you need to do before I land this?

Flags: needinfo?(lennart.grahl)

Should be good to go.

Once https://github.com/sctplab/usrsctp/pull/417 is resolved/merged, we should bump again.

Flags: needinfo?(lennart.grahl)
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/006350f1e363
Simplify usrsctp build file and remove unused Android patch r=bwc
https://hg.mozilla.org/integration/autoland/rev/79f69bdc105a
Bump usrsctp version to ea345b6d0c8a0f8701cf49445dba5ec8d34e2305 r=bwc
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla80