Open Bug 890931 Opened 11 years ago Updated 2 years ago

DTLS message reassembly & retransmission could be better

Categories

(Core :: WebRTC: Networking, defect, P5)

22 Branch
x86
Windows 7
defect

People

(Reporter: akvakh, Unassigned)

Details

Attachments

(2 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 6.1; rv:22.0) Gecko/20100101 Firefox/22.0 (Beta/Release)
Build ID: 20130618035212

Steps to reproduce:

Try to establish a PeerConnection, which involves a DTLS handshake, on an LTE network.


Actual results:

In most networks, DTLS packets are received in the correct order, and the session is established correctly.
But some networks (for example, LTE) often reorder packets in transit. In this case Firefox doesn't sort them by sequence number, and the whole flight is discarded. Since there is no guarantee that packets will be received in the correct order (UDP provides no such mechanism), the whole session can fail even though all packets were transmitted successfully.


Expected results:

DTLS packets should have been sorted according to their sequence numbers, the whole message should have been reconstructed, and the DTLS handshake should have continued.
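
A minimal sketch of the expected behaviour, written in Python and not taken from Firefox or NSS: out-of-order handshake messages are held until the gap is filled, then delivered in order. The class name HandshakeReorderBuffer and the use of one integer sequence number per message are assumptions for illustration; fragment reassembly within a single message is left out.

```python
from typing import Dict, List


class HandshakeReorderBuffer:
    """Hypothetical buffer that releases handshake messages in sequence order."""

    def __init__(self, next_seq: int = 0) -> None:
        self.next_seq = next_seq            # next sequence number to deliver
        self.pending: Dict[int, bytes] = {}  # messages that arrived early

    def receive(self, seq: int, body: bytes) -> List[bytes]:
        """Accept one message and return every message now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # already delivered, or a duplicate of a buffered message
        self.pending[seq] = body
        ready: List[bytes] = []
        # Drain the buffer for as long as the next expected message is present.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = HandshakeReorderBuffer(next_seq=1)
out = [buf.receive(seq, body) for seq, body in
       [(1, b"m1"), (2, b"m2"), (4, b"m4"), (3, b"m3")]]
```

With the arrival order 1 2 4 3, message 4 is held back until 3 arrives, and both are then released together, so no retransmission is needed to complete the flight.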
Summary: DTLS messages not combined correctly if packets are received in wrong order → WebRTC: DTLS messages not combined correctly if packets are received in wrong order
Do you think you could provide this as regular PCAP? I would like to look at it using tcpdump.
Attachment #772077 - Attachment is obsolete: true
This is a packet trace of a DTLS handshake with the same server software sending the same certificate, but with the packets received in the correct order. So the certificate itself and all packets individually seem to be correct.
Component: Untriaged → WebRTC: Networking
Product: Firefox → Core
Is this the entire trace? Both sides should be retransmitting their
flights.

The DTLS stack currently does ignore out of order HS messages, so what
should happen is that it should accept the ServerHello and Certificate
but ignore the ServerHelloDone and CertificateRequest because they come
after the Certificate is complete. Then, on the retransmit, it should
accept them.
Yes, it's the entire trace. The original one was bigger, but I extracted the DTLS packets from it. For a further 30 seconds there were no retransmissions from either side.

We're working now on fixing it on our side (adding server retransmissions). I'll report if it makes any difference.

But there's no guarantee that retransmitted flights will arrive in the correct order. Will the Firefox DTLS stack ignore them as well?
What it does is it processes any in-order packet, so if you get say

1 2 4 3

It stores and processes 1 2 3

Then when 1 2 4 3 is retransmitted it processes 4
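
The behaviour described above can be modelled with a short Python sketch. It is a toy model under the assumption that each message carries one integer sequence number; the function name process_in_order is hypothetical and this is not the actual stack code.

```python
from typing import Iterable, List, Tuple


def process_in_order(arrivals: Iterable[int], next_expected: int) -> Tuple[List[int], int]:
    """Process only the message that is next in sequence; drop everything else."""
    processed: List[int] = []
    for seq in arrivals:
        if seq == next_expected:
            processed.append(seq)
            next_expected += 1
        # Any other sequence number (early or already seen) is silently ignored.
    return processed, next_expected


# First transmission of the flight, reordered in transit.
first, expected = process_in_order([1, 2, 4, 3], next_expected=1)
# Retransmission of the same flight, reordered the same way.
second, expected = process_in_order([1, 2, 4, 3], next_expected=expected)
```

Here first holds 1, 2 and 3, and second holds only 4, which matches the example: the flight completes, but only after one retransmission.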
BTW, I believe that Firefox is behaving correctly here, modulo the fact
that it could try harder to reassemble. I.e., it should not be retransmitting
since the fact that the ServerHello got through means that the client
(Firefox's) data got through. It's the server's job to retransmit.
(In reply to Eric Rescorla (:ekr) from comment #6)
> What it does is it processes any in-order packet, so if you get say
> 
> 1 2 4 3
> 
> It stores and processes 1 2 3
> 
> Then when 1 2 4 3 is retransmitted it processes 4

That's great news. Everything should work fine at last.
(In reply to Eric Rescorla (:ekr) from comment #7)
> BTW, I believe that Firefox is behaving correctly here, modulo the fact
> that it could try harder to reassemble. I.e., it should not be retransmitting
> since the fact that the ServerHello got through means that the client
> (Firefox's) data got through. It's the server's job to retransmit.

I'm not sure about that. The RFC states that
"Partial reads (whether partial messages or only some of the messages in the
      flight) do not cause state transitions or timer resets."

So, if I understand it correctly, it should keep retransmitting until it gets all packets from a particular flight (since it drops packet 4 in your example).
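
One way to read the quoted sentence, sketched in Python as a toy model. The class FlightWaiter and its method names are hypothetical, and this illustrates only the interpretation above, not what Firefox or the RFC's state machine does in full: the retransmit timer stays armed until every message of the expected flight has arrived.

```python
from typing import Iterable, Set


class FlightWaiter:
    """Toy model: a partial flight does not stop the retransmit timer."""

    def __init__(self, expected: Iterable[int]) -> None:
        self.missing: Set[int] = set(expected)  # messages still outstanding
        self.timer_armed = True
        self.retransmits = 0

    def on_message(self, seq: int) -> None:
        """Record a received message; only a complete flight disarms the timer."""
        self.missing.discard(seq)
        if not self.missing:
            self.timer_armed = False

    def on_timeout(self) -> bool:
        """Return True if our own last flight should be retransmitted."""
        if not self.timer_armed:
            return False
        self.retransmits += 1
        return True


waiter = FlightWaiter(expected=[1, 2, 3, 4])
for seq in (1, 2, 3):              # message 4 was dropped as out of order
    waiter.on_message(seq)
resend_after_partial = waiter.on_timeout()
waiter.on_message(4)               # arrives with the peer's retransmission
resend_after_complete = waiter.on_timeout()
```

Under this reading the client retransmits once while message 4 is still missing and stops as soon as the flight is complete.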
Huh. You're right it does say that. I will try to remember why we wrote that,
given my reasoning above.
(In reply to akvakh from comment #9)
> (In reply to Eric Rescorla (:ekr) from comment #7)
> > BTW, I believe that Firefox is behaving correctly here, modulo the fact
> > that it could try harder to reassemble. I.e., it should not be retransmitting
> > since the fact that the ServerHello got through means that the client
> > (Firefox's) data got through. It's the server's job to retransmit.
> 
> I'm not sure about that. The RFC states that
> "Partial reads (whether partial messages or only some of the messages in the
>       flight) do not cause state transitions or timer resets."
> 
> So, if I understand it correctly, it should keep retransmitting until it gets all
> packets from a particular flight (since it drops packet 4 in your example).

Anyway, it doesn't say "must not", so in the real world both approaches seem to be quite reasonable.
(In reply to Eric Rescorla (:ekr) from comment #6)
> What it does is it processes any in-order packet, so if you get say
> 
> 1 2 4 3
> 
> It stores and processes 1 2 3
> 
> Then when 1 2 4 3 is retransmitted it processes 4

We implemented server retransmissions on our side, and it helps to avoid this issue.
Changing title to reflect the remaining issue.
Status: UNCONFIRMED → NEW
backlog: --- → webRTC+
Rank: 42
Ever confirmed: true
Priority: -- → P4
Summary: WebRTC: DTLS messages not combined correctly if packets are received in wrong order → DTLS message reassembly & retransmission could be better
Mass change P4->P5 to align with new Mozilla triage process.
Priority: P4 → P5
Severity: normal → S3