Closed Bug 1009227 Opened 5 years ago Closed 5 years ago

DTLS connection establishment failure

Categories

(NSS :: Libraries, defect, P1)

3.14
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.16.2

People

(Reporter: ekr, Assigned: ekr)

Details

Attachments

(4 files, 3 obsolete files)

If a ChangeCipherSpec is sent in a separate packet from earlier parts of one side's second flight, and then the earlier packet is lost or reordered with respect to the CCS, we generate an error because of being in an unexpected state.

CCS does not come with a handshake message sequence
number and so it gets processed right away by ssl3_HandleChangeCipherSpecs
which detects that we are not yet in state wait_change_cipher and generates
an unexpected message alert:


   if (ws != wait_change_cipher) {
	(void)SSL3_SendAlert(ss, alert_fatal, unexpected_message);
	PORT_SetError(SSL_ERROR_RX_UNEXPECTED_CHANGE_CIPHER);
	return SECFailure;
    }

One fix here is to special-case the logic here for DTLS and ignore
any CCS packets that arrive out of order rather than to generate an error.
That will not recover immediately because the other side must then
retransmit, but it will fix the immediate problem. Outside of this
bug, WTC suggested that he had a better fix in mind.
Eric: I was wrong. We also need your fix. My patch cannot handle
the case where CCS is the first message of an out-of-order record.
Attachment #8421344 - Flags: review?(ekr)
Comment on attachment 8421344 [details] [diff] [review]
Ignore the rest of the messages in the record if it has an out-of-order message

Review of attachment 8421344 [details] [diff] [review]:
-----------------------------------------------------------------

Looking at the pcap file in https://code.google.com/p/chromium/issues/detail?id=369855
closer, I don't think this patch will even handle that particular sequence of events.
All the DTLS records in that pcap file contain a single handshake or CCS message.

So this patch is at most an optimization for the case where the peer stuffs multiple
handshake messages into a DTLS record. We definitely need your patch.

::: lib/ssl/dtlscon.c
@@ -320,5 @@
>  	    } else if (message_seq > ss->ssl3.hs.recvMessageSeq) {
>  		/* Case 2
>                   *
> -		 * Ignore this message. This means we don't handle out of
> -		 * order complete messages that well, but we're still

Note that I removed "complete". I believe we don't know if
the message is complete at this point.

@@ +325,5 @@
>  		 * compliant and this probably does not happen often
>                   *
>  		 * XXX OK for now. Maybe do something smarter at some point?
>  		 */
> +		break;

Should we set rv to SECSuccess explicitly before this break
statement?
Wan-Teh,

How do you want to proceed here? My sense is that my proposed
patch would fix the bug here and that this code is a shortcut
that immediately discards data which would be discarded in any
case. Do you want to do both patches?
Eric:

We need your proposed patch. You're right that my patch is
just a shortcut. We can ignore my patch.
OK. I will try to prepare something in the next week or so (I am at a meeting this week)
Attached patch part 2: Unit test (obsolete) — Splinter Review
Assignee: nobody → ekr
Status: NEW → ASSIGNED
Attached patch part 1: NSS fix (obsolete) — Splinter Review
Wan-Teh,

I have attached a patch along the lines we discussed. It checks for CCS out of order and simply ignores it. I wanted to get your opinion on several points.

1. I don't sanity check that the CCS is correct, since we are ignoring it anyway. I believe this is consistent with our general practice, but I wanted to get your opinion.

2. I decided to return SECSuccess. This is what we do with out of order handshake messages so it should be safe. See:
http://dxr.mozilla.org/mozilla-central/source/security/nss/lib/ssl/dtlscon.c#327

which breaks and thus returns the default value of SECSuccess set in:
http://dxr.mozilla.org/mozilla-central/source/security/nss/lib/ssl/dtlscon.c#187

Since dtls_HandleHandshake and dtls_HandleChangeCipherSpecs are called from adjacent places:

http://dxr.mozilla.org/mozilla-central/source/security/nss/lib/ssl/ssl3con.c#11603

Having them behave similarly on re-ordered packets makes sense.


3. I had to add code to set buf->len = 0 because it's expected that we will consume the
entire buffer. Failure to do so causes an assert later. Does this seem OK?

4. I have included a unit test in the C++ WebRTC mtransport unit test suite. It simply
injects a bogus CCS into the handshake and makes sure it's ignored. You can verify
that the test fails without patch 1 and succeeds with it. I would rather have this in
an NSS suite but it's unfortunately not that easy to add there and it's quite easy to
add one here. I've f?ed you there but you don't need to give it a complete review since
I can have Martin review it.
Attachment #8432008 - Flags: review?(martin.thomson)
Attachment #8432008 - Flags: feedback?(wtc)
Attachment #8432010 - Flags: review?(wtc)
Comment on attachment 8421344 [details] [diff] [review]
Ignore the rest of the messages in the record if it has an out-of-order message

Review of attachment 8421344 [details] [diff] [review]:
-----------------------------------------------------------------

this patch is overtaken by events, so clearing r?
Attachment #8421344 - Flags: review?(ekr)
Comment on attachment 8432008 [details] [diff] [review]
part 2: Unit test

Review of attachment 8432008 [details] [diff] [review]:
-----------------------------------------------------------------

This needs another round.

::: media/mtransport/test/transport_unittests.cpp
@@ +88,5 @@
> +  // Which starts with a certificate message.
> +  virtual bool Inspect(TransportLayer* layer,
> +                       const unsigned char *data, size_t len) {
> +    unsigned char ccs[] = {0x14,  // Type
> +                           0xfe, 0xff, // Version 

trailing space

@@ +131,5 @@
>  
> +    if (inspector_ && !inspector_->Inspect(this, data, len)) {
> +      MOZ_MTLOG(ML_NOTICE, "Dropping packet because of inspector");
> +      ++packet_;
> +      return len;

This is a fair bit of generality given that this inspector doesn't actually cause packets to be dropped ever.  So this is dead code.

@@ +138,4 @@
>      if (loss_mask_ & (1 << (packet_ % 32))) {
>        MOZ_MTLOG(ML_NOTICE, "Dropping packet");
>        ++packet_;
>        return len;

If you want generality, why not go all the way and have the loss mask as a separate inspector?

@@ +179,5 @@
>  
>   private:
>    uint32_t loss_mask_;
>    uint32_t packet_;
> +  Inspector* inspector_;

Where's the destructor?
Attachment #8432008 - Flags: review?(martin.thomson)
OK, this is fair comment. I got excited about generality but
then didn't backfill. Would you prefer that we remove the
generality or do the backfill.




(In reply to Martin Thomson [:mt] from comment #10)
> Comment on attachment 8432008 [details] [diff] [review]
> part 2: Unit test
> 
> Review of attachment 8432008 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This needs another round.
> 
> ::: media/mtransport/test/transport_unittests.cpp
> @@ +88,5 @@
> > +  // Which starts with a certificate message.
> > +  virtual bool Inspect(TransportLayer* layer,
> > +                       const unsigned char *data, size_t len) {
> > +    unsigned char ccs[] = {0x14,  // Type
> > +                           0xfe, 0xff, // Version 
> 
> trailing space
> 
> @@ +131,5 @@
> >  
> > +    if (inspector_ && !inspector_->Inspect(this, data, len)) {
> > +      MOZ_MTLOG(ML_NOTICE, "Dropping packet because of inspector");
> > +      ++packet_;
> > +      return len;
> 
> This is a fair bit of generality given that this inspector doesn't actually
> cause packets to be dropped ever.  So this is dead code.
> 
> @@ +138,4 @@
> >      if (loss_mask_ & (1 << (packet_ % 32))) {
> >        MOZ_MTLOG(ML_NOTICE, "Dropping packet");
> >        ++packet_;
> >        return len;
> 
> If you want generality, why not go all the way and have the loss mask as a
> separate inspector?
> 
> @@ +179,5 @@
> >  
> >   private:
> >    uint32_t loss_mask_;
> >    uint32_t packet_;
> > +  Inspector* inspector_;
> 
> Where's the destructor?

I think I prefer to just have it owned by the parent, so I'll
remove the new in the actual test case.
(In reply to Eric Rescorla (:ekr) from comment #11)
> OK, this is fair comment. I got excited about generality but
> then didn't backfill. Would you prefer that we remove the
> generality or do the backfill.


I'm not all that excited about having two layers here.  Maybe just make a new transport layer that parallels TransportLayerLossy with the new capability.

public class TransportLayerCCSRepeater : TransportLayer {
  TransportLayerCCSRepeater() : sent_(false) {}
  ~TransportLayerCCSRepeater () {}

  virtual TransportResult SendPacket(const unsigned char *data, size_t len) {
    MOZ_MTLOG(ML_NOTICE, LAYER_INFO << "SendPacket(" << len << ")");
    TriggerSend();
    return downward_->SendPacket(data, len);
  }

  void StateChange(TransportLayer *layer, State state) {
    TL_SET_STATE(state);
  }

  void PacketReceived(TransportLayer *layer, const unsigned char *data,
                      size_t len) {
    SignalPacketReceived(this, data, len);
  }
  TRANSPORT_LAYER_ID("ccsrepeater")

 protected:
  virtual void WasInserted() {
    downward_->SignalPacketReceived.
        connect(this,
                &TransportLayerLossy::PacketReceived);
    downward_->SignalStateChange.
        connect(this,
                &TransportLayerLossy::StateChange);

    TL_SET_STATE(downward_->state());
  }

 private:
  bool TriggerSend() {
  }
  bool sent_;
};


It's a little more code, but I think that works.
Attachment #8432812 - Flags: review?(martin.thomson)
Comment on attachment 8432812 [details] [diff] [review]
part 2: Unit test

Review of attachment 8432812 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #8432812 - Flags: review?(martin.thomson) → review+
Attachment #8432008 - Attachment is obsolete: true
Attachment #8432008 - Flags: feedback?(wtc)
I changed // to /* */ and fixed the indentation level.

Patch checked in: https://hg.mozilla.org/projects/nss/rev/8f026c806587
Attachment #8432010 - Attachment is obsolete: true
Attachment #8432010 - Flags: review?(wtc)
Attachment #8433760 - Flags: review+
Attachment #8433760 - Flags: checked-in+
Comment on attachment 8432812 [details] [diff] [review]
part 2: Unit test

Review of attachment 8432812 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc.

::: media/mtransport/test/transport_unittests.cpp
@@ +86,5 @@
> + public:
> +  InspectorCCSInjector() : injected_(false) {}
> +
> +  // Inject a bogus CCS after you see the client's second handshake message
> +  // Which starts with a certificate message.

Nit: don't capitalize "which".

@@ +89,5 @@
> +  // Inject a bogus CCS after you see the client's second handshake message
> +  // Which starts with a certificate message.
> +  virtual void Inspect(TransportLayer* layer,
> +                       const unsigned char *data, size_t len) {
> +    unsigned char ccs[] = {0x14,  // Type

Nit: this array can be static const.

@@ +115,5 @@
> +      return;
> +
> +    layer->SendPacket(ccs, sizeof(ccs));
> +
> +    return;

Nit: this return statement can be omitted.
Attachment #8432812 - Flags: review+
(In reply to Eric Rescorla (:ekr) from comment #8)
>
> 1. I don't sanity check that the CCS is correct, since we are ignoring it
> anyway. I believe this is consistent with our general practice, but I wanted
> to get your opinion.

This is fine by me. How we handle this corner case doesn't matter much in
practice, so I prefer your simpler code.

> 2. I decided to return SECSuccess. This is what we do with out of order
> handshake messages so it should be safe.

I agree with your analysis.
 
> 3. I had to add code to set buf->len = 0 because it's expected that we will
> consume the
> entire buffer. Failure to do so causes an assert later. Does this seem OK?

Yes, this matches what ssl3_HandleChangeCipherSpecs does when it handles a
good ChangeCipherSpec, and also what dtls_HandleHandshake does before it
returns SECSuccess:

SECStatus
dtls_HandleHandshake(sslSocket *ss, sslBuffer *origBuf)
{
    ...
    SECStatus rv = SECSuccess;
    ...
    origBuf->len = 0;   /* So ssl3_GatherAppDataRecord will keep looping. */

    /* XXX OK for now. In future handle rv == SECWouldBlock safely in order
     * to deal with asynchronous certificate verification */
    return rv;
}
OS: Mac OS X → All
Priority: -- → P1
Hardware: x86 → All
Target Milestone: --- → 3.16.2
Version: trunk → 3.14
Keeler,

Do you know when we plan to upgrade NSS in Firefox? We can't
land the unit test above until that happens.
Flags: needinfo?(dkeeler)
Looks like around June 6th. This is what Kai sent to the nss-dev list:

Today we discussed to target the 3.16.2 release in about 3 weeks, around
June 20.

All new feature work for 3.16.2 should ideally be checked in to NSS
trunk by June 6, and on that day we should land an updated beta into
mozilla-central (prior to the Mozilla June 9 branch date).

After June 6 we'll have some more time to get web-crypto related
bugfixes done, which Ryan is working on. Also, next week we'll be making
some changes to the root CA list, and final testing confirmations might
take until after June 9.

We should freeze new NSS feature work after June 6 and limit changes to
bugfixes, until the final release of 3.16.2, to keep it simple (avoid
branching of NSS).

(Note I'll be on PTO starting Wednesday, June 18, to Sunday, June 22.)

Thanks,
Kai


__
nss-dev mailing list
nss-dev@mozilla.org
https://mail.mozilla.org/listinfo/nss-dev
Flags: needinfo?(dkeeler)
Comment on attachment 8433760 [details] [diff] [review]
part 1: NSS fix by Eric Rescorla, v2

Review of attachment 8433760 [details] [diff] [review]:
-----------------------------------------------------------------

[Approval Request Comment]
Bug caused by (feature/regressing bug #):

   In original DTLS code.

User impact if declined:

   Continued occasional DTLS connectivity failures under packet
   loss.

Testing completed (on m-c, etc.):

   Hand tested. Unit test in patch #2.

Risk to taking this patch (and alternatives if risky):

   Low.  Patch is already in NSS and we just didn't pull the latest
   NSS into Firefox yet.

String or IDL/UUID changes made by this patch:

   None.
Attachment #8433760 - Flags: approval-mozilla-aurora?
Eric:

I tried to update NSS to NSS_3_16_2_BETA3 last Thursday, but it
was reverted because of WebCrypto mochitest failure. Since I don't
know how to run mochitest, I could not debug it quickly. With
Richard's help, I finally tracked it down and wrote a patch on
Saturday. I will test my fix today and hopefully retry the NSS
update later today or tomorrow.
Wan-Teh,

Thanks for checking into this.

I suspect that if you update NSS as you describe in c21, it would be better
to just uplift that to Aurora, given that the branch just happened today.

What do you think?
Eric,

I am not familiar with how Firefox branches are created, so
I'd like to only deal with mozilla-inbound and let others
uplift NSS upgrades to Aurora. Is that OK?

But I am wondering why it is fine to just uplift an NSS
update to Aurora without pushing it to mozilla-inbound or
mozilla-central?
Wan-Teh,

I am not too familiar with how NSS gets into Firefox myself, but
here is what I was assuming would happen:

1. We would eventually move this version of NSS into m-i and m-c.

2. We would concurrently either:

   (a) update aurora to use the new version of NSS (I.e., uplift NSS)
   (b) update aurora to just have this one patch if people felt that
       (a) was risky.

Does that seem sensible? Do you have a preference?
Eric,

You mean just uplifting your NSS patch in the bug to Aurora?

Yes, we can apply NSS patches directly when necessary. But
the patches need to be documented in the security/patches
directory (adding a patch file and describing the patch in
the README file). Also, I think it still needs to be pushed
to mozilla-inbound first.

Would you like to do that?
Wan-Teh,

It's my fault for not being clear. My final goal here is to make
sure we have this fix for FF 32 (which just went to Aurora). I'm
flexible about how that is accomplished.

If you think it's possible to land a new version of NSS on m-i
soon and then uplift that to Aurora, that's probably best.
Wan-Teh,

Do you have any updates on the NSS update for Firefox? Were you and Richard
able to fix the build problems?
Flags: needinfo?(wtc)
Eric:

I just pushed NSS_3_16_2_BETA4 to mozilla-inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/957b31f09f9e
Flags: needinfo?(wtc)
Comment on attachment 8433760 [details] [diff] [review]
part 1: NSS fix by Eric Rescorla, v2

The NSS fix was pushed to mozilla-central:
https://hg.mozilla.org/mozilla-central/rev/957b31f09f9e
Attached patch Additional DTLS tests (obsolete) — Splinter Review
Attachment #8441484 - Flags: review?(martin.thomson)
Comment on attachment 8441484 [details] [diff] [review]
Additional DTLS tests

Review of attachment 8441484 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/test/transport_unittests.cpp
@@ +217,5 @@
> +           peer->fingerprint_,
> +           peer->fingerprint_len_);
> +
> +    size_t fingerprint_len = peer->fingerprint_len_;
> +    MOZ_ASSERT(len_offset);

MOZ_ASSERT(peer->fingerprint_len_ + len_offset < TransportLayer::kMaxDigestLength);
MOZ_ASSERT(peer->fingerprint_len_ + len_offset > 0);

@@ +223,5 @@
> +    // Avoid adding a negative number to an unsigned.
> +    if (len_offset > 0) {
> +      fingerprint_len += len_offset;
> +    } else {
> +      fingerprint_len -= (-1 * len_offset);

Want to add the static_cast<size_t> stuff we discussed here to avoid the warning?

@@ +231,5 @@
> +        "sha-1",
> +        fingerprint_to_set, fingerprint_len);
> +
> +    if (fingerprint_len > TransportLayerDtls::kMaxDigestLength) {
> +      ASSERT_FALSE(NS_SUCCEEDED(res));

Just looking at SetVerificationDigest, if the fingerprint is too long, it doesn't change whether verification is required, or the old fingerprint.  Since I assume that the test runs, what is causing it to fail in TestConnectBadDigestSuperLong then?
Attachment #8441484 - Flags: review?(martin.thomson) → review+
Attachment #8441484 - Attachment is obsolete: true
---
 media/mtransport/test/transport_unittests.cpp | 162 +++++++++++++++++++++++++-
 1 file changed, 161 insertions(+), 1 deletion(-)
Attachment #8441534 - Flags: review?(martin.thomson)
Comment on attachment 8441534 [details] [diff] [review]
[PATCH]  part 2: Unit test

Review of attachment 8441534 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/mtransport/test/transport_unittests.cpp
@@ +48,5 @@
> +const uint8_t kTlsHandshakeType =        0x16;
> +
> +const uint8_t kTlsHandshakeCertificate = 0x0b;
> +
> +const unsigned char kTlsFakeChangeCipherSpec[] = {

Why not be consistent and use uint8_t here too?

@@ +122,5 @@
>    void SetLoss(uint32_t packet) {
>      loss_mask_ |= (1 << (packet & 32));
>    }
>  
> +  void SetInspector(Inspector* inspector) {

// transfers ownership of inspector to TransportLayerLossy

@@ +167,5 @@
> + public:
> +  DtlsRecordParser(const unsigned char *data, size_t len)
> +      : buffer_(data, len), offset_(0) {}
> +
> +  bool NextRecord(uint8_t* ct, RefPtr<DataBuffer>* buffer) {

What's the story on pass by reference here?

@@ +239,5 @@
> +  virtual void OnRecord(TransportLayer* layer,
> +                        uint8_t content_type,
> +                        const unsigned char *data, size_t len) {
> +    // Only inject once.
> +    if (injected_)

I'll only say this once, hopefully, but I prefer to see braces on all control statements.
Attachment #8441534 - Flags: review?(martin.thomson) → review+
https://hg.mozilla.org/mozilla-central/rev/882807bc3d73
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Comment on attachment 8433760 [details] [diff] [review]
part 1: NSS fix by Eric Rescorla, v2

Aurora approval granted.
Attachment #8433760 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I'm actually now kind of confused about what the right thing is here,
since this is in NSS. Do we want to just update Aurora to the new NSS?
Flags: needinfo?(lmandel)
Flags: needinfo?(dougt)
Flags: needinfo?(wtc)
(In reply to Eric Rescorla (:ekr) from comment #37)
> I'm actually now kind of confused about what the right thing is here,
> since this is in NSS. Do we want to just update Aurora to the new NSS?

That was my understanding when approving. We're early enough in Aurora where I'm comfortable taking this change if the update is thought to be safe. I won't want to update again after the midpoint on Aurora (end of next week).
Flags: needinfo?(lmandel)
I suspect you'd uplift https://hg.mozilla.org/mozilla-central/rev/957b31f09f9e
Flags: needinfo?(dougt)
(In reply to Doug Turner (:dougt) from comment #39)
> I suspect you'd uplift
> https://hg.mozilla.org/mozilla-central/rev/957b31f09f9e

More or less - see bug 1020695.
Eric: David or I will take care of uplifting a new NSS tag
to mozilla-aurora in bug 1020695.
Flags: needinfo?(wtc)
Wan-Teh: Thank you for your help.
You need to log in before you can comment on or make changes to this bug.