Closed Bug 822433 Opened 12 years ago Closed 12 years ago

Crash in dtls_FreeHandshakeMessages

Categories

(NSS :: Libraries, defect, P1)

defect

Tracking

(firefox19 disabled, firefox20 fixed, firefox-esr10 unaffected, firefox-esr17 unaffected, b2g18 unaffected)

RESOLVED FIXED
3.14.2
Tracking Status
firefox19 --- disabled
firefox20 --- fixed
firefox-esr10 --- unaffected
firefox-esr17 --- unaffected
b2g18 --- unaffected

People

(Reporter: ekr, Assigned: ekr)

References

Details

(Whiteboard: [WebRTC] [blocking-webrtc+] [qa-] [adv-main20-])

Attachments

(1 file, 3 obsolete files)

If you create an SSL object in DTLS mode but don't actually process any packets
and then do PR_Close(), you get:


#0  0x000000010077d1eb in dtls_FreeHandshakeMessages (list=0x120d2ba60) at dtlscon.c:156
#1  0x00000001007946c0 in ssl3_DestroySSL3Info (ss=0x120d2b000) at ssl3con.c:10346
#2  0x00000001007b7db5 in ssl_DestroySocketContents (ss=0x120d2b000) at sslsock.c:406
#3  0x00000001007b7bca in ssl_FreeSocket (ss=0x120d2b000) at sslsock.c:465
#4  0x00000001007a69b6 in ssl_DefClose (ss=0x120d2b000) at ssldef.c:206
#5  0x00000001007b0b05 in ssl_SecureClose (ss=0x120d2b000) at sslsecur.c:1063
#6  0x00000001007bd805 in ssl_Close (fd=0x120074b20) at sslsock.c:2050
#7  0x000000010008d508 in PR_Close (fd=0x120074b20) at /Users/ekr/dev/mozilla-inbound/nsprpu /pr/src/io/priometh.c:104


The problem is that the list for the dtls messages is initialized to all 0s, which is not a valid
state:

  lastMessageFlight = {
    next = 0x0, 
    prev = 0x0
  }, 

Thus, we try to dereference NULL, which is obviously bad.

What is happening here is that we are shutting down DTLS before the
server has called ssl_InitState() in ssl3_HandleClientHello. Since
the default memory state of the hs object is all zeros, this produces
an invalid state of the list.

This crash was introduced in:
https://bug681065.bugzilla.mozilla.org/attachment.cgi?id=664288


When we created the object on the heap, the non-NULL check took care of this,
but now we're just checking for empty and that check freaks out if the
PRClist pointers are NULL:

http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/include/prclist.h#93


I have attached a patch to fix this, but I wonder if we should also fix
PR_CLIST_IS_EMPTY so that it thinks that lists with both pointers being
NULL (which cannot possibly be valid lists) are empty. E.g.,

#define PR_CLIST_IS_EMPTY(_l) \
  (!(_l) || ((_l)->next == (_l)))
Attached patch Fix for crash (obsolete) — Splinter Review
Attachment #693077 - Flags: review?(bsmith)
Blocks: 822434
Comment on attachment 693077 [details] [diff] [review]
Fix for crash

Review of attachment 693077 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/nss/lib/ssl/ssl3con.c
@@ +10346,1 @@
>  	dtls_FreeHandshakeMessages(&ss->ssl3.hs.lastMessageFlight);

I think it is better to have dtls_FreeHandshakeMessages check
for an uninitialized PRCList and return early:

    if (!list->next)
        return;

Another solution is to move the initialization of the PRCList:
    PR_INIT_CLIST(&ss->ssl3.hs.lastMessageFlight);
from ssl3_InitState (in ssl3con.c) to ssl_NewSocket (in
sslsock.c).
(In reply to Wan-Teh Chang from comment #3)
> Comment on attachment 693077 [details] [diff] [review]
> Fix for crash
> 
> Review of attachment 693077 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: security/nss/lib/ssl/ssl3con.c
> @@ +10346,1 @@
> >  	dtls_FreeHandshakeMessages(&ss->ssl3.hs.lastMessageFlight);
> 
> I think it is better to have dtls_FreeHandshakeMessages check
> for an uninitialized PRCList and return early:
> 
>     if (!list->next)
>         return;

This seems like the minimal of the two changes. I will resubmit with that.

> Another solution is to move the initialization of the PRCList:
>     PR_INIT_CLIST(&ss->ssl3.hs.lastMessageFlight);
> from ssl3_InitState (in ssl3con.c) to ssl_NewSocket (in
> sslsock.c).
Attached patch Fix for crash (obsolete) — Splinter Review
Attachment #693077 - Attachment is obsolete: true
Attachment #693077 - Flags: review?(bsmith)
Attachment #693220 - Flags: review?(wtc)
Attachment #693220 - Flags: review?(bsmith)
bsmith: we are seeing this bug with some frequency in Firefox. What's the best way to get this into m-c once it's approved?
With the other PeerConnection-shutdown patches applied, we now crash on the close to 100% (at least on Linux), so this is pretty important to fix.  If it gets r+'d we'd be ready to land it on m-c

If the fix *has* to wait for a while, we'd like to land a patch to leak the DTLS object to avoid crashing.
Comment on attachment 693220 [details] [diff] [review]
Fix for crash

r=wtc.

I now prefer the other solution I proposed. ssl_NewSocket is essentially
the constructor of sslSocket, so calling

    PR_INIT_CLIST(&ss->ssl3.hs.lastMessageFlight);

in ssl_NewSocket will emulate what we would do if this were C++ code:
the PRCList constructor would initialize itself that way.

If you go with this patch, please add a comment to explain this tests
for an uninitialized (but zeroed) PRCList.
Attachment #693220 - Flags: review?(wtc) → review+
(In reply to Wan-Teh Chang from comment #8)
> Comment on attachment 693220 [details] [diff] [review]
> Fix for crash
> 
> r=wtc.
> 
> I now prefer the other solution I proposed. ssl_NewSocket is essentially
> the constructor of sslSocket, so calling
> 
>     PR_INIT_CLIST(&ss->ssl3.hs.lastMessageFlight);
> 
> in ssl_NewSocket will emulate what we would do if this were C++ code:
> the PRCList constructor would initialize itself that way.
> 
> If you go with this patch, please add a comment to explain this tests
> for an uninitialized (but zeroed) PRCList.

OK, I'll prepare a patch with the other solution and send it you for re-review,
since it's totally different.
Attached patch NSS initialize DTLS patch (obsolete) — Splinter Review
Attachment #693220 - Attachment is obsolete: true
Attachment #693220 - Flags: review?(bsmith)
Comment on attachment 693880 [details] [diff] [review]
NSS initialize DTLS patch

Review of attachment 693880 [details] [diff] [review]:
-----------------------------------------------------------------

I moved the initialization and added an assert in its place.

That said, I'm considering initializing in both locations for belt and suspenders. Thoughts?
Attachment #693880 - Flags: review?(wtc)
Severity: normal → critical
Priority: -- → P1
Whiteboard: [WebRTC] [blocking-webrtc+]
Assignee: nobody → ekr
Comment on attachment 693880 [details] [diff] [review]
NSS initialize DTLS patch

Review of attachment 693880 [details] [diff] [review]:
-----------------------------------------------------------------

r=wtc.

::: security/nss/lib/ssl/ssl3con.c
@@ +10007,5 @@
>  
>      PORT_Memset(&ss->xtnData, 0, sizeof(TLSExtensionData));
>  
>      if (IS_DTLS(ss)) {
> +        PR_ASSERT(PR_CLIST_IS_EMPTY(&ss->ssl3.hs.lastMessageFlight));

Use PORT_Assert (NSS's equivalent of PR_ASSERT).
Attachment #693880 - Flags: review?(wtc) → review+
ekr: initializing in both locations is fine by me.
bsmith - so what's the right way to land this?  It's a major crasher and total blocker for webrtc with other patches currently ready to land (I crash close to 100% of the time here with the patches applied to fix other bugs).  I'd like to land it ASAP (pending sec-approval) and it will eventually appear in the next import of NSS to replace our local patch.
Flags: needinfo?(bsmith)
Attachment #693880 - Attachment is obsolete: true
Comment on attachment 694129 [details] [diff] [review]
NSS initialize DTLS patch

Review of attachment 694129 [details] [diff] [review]:
-----------------------------------------------------------------

Now with just the added initialization but leaving the one in ssl3con.c
Attachment #694129 - Flags: review?(wtc)
(In reply to Eric Rescorla (:ekr) from comment #16)
> Now with just the added initialization but leaving the one in ssl3con.c

Is there a reason

This simpler patch (one initialization) does seem better to me, because ssl3_InitState is called in response to sending/receiving data, and this bug is about (AFAICT) a failure that occurs before we send/receive data. The only reason I can think of to initialize it in ssl3_InitState is if we were trying to support the STARTTLS stuff, but we don't support that (AFAICT) for DTLS.

(In reply to Randell Jesup [:jesup] from comment #14)
> bsmith - so what's the right way to land this?  It's a major crasher and
> total blocker for webrtc with other patches currently ready to land (I crash
> close to 100% of the time here with the patches applied to fix other bugs). 
> I'd like to land it ASAP (pending sec-approval) and it will eventually
> appear in the next import of NSS to replace our local patch.

Assuming there are no objections or changes by noon, I will land it in NSS CVS, tag the NSS CVS HEAD with a CVS_HEAD_20121219 tag, and import that tag into mozilla-central.

Do we need to backport this to Aurora? If so then we can do the same for Aurora but we should mention that to the NSS team ASAP.

IF we need to backport it to *beta* then we will need a(nother) private patch for mozilla-beta.

Please let me know if there are other changes to NSS that Gecko needs immediately.
Flags: needinfo?(bsmith)
The reason for leaving it in ssl3_InitState() was just that this code is complicated, and that's where we initialize much of the rest of the structure, so I figured belt and suspenders.
Checking in lib/ssl/sslsock.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsock.c,v  <--  sslsock.c
new revision: 1.99; previous revision: 1.98
done

See bug 823705 for the landing in mozilla-central.
Severity: critical → normal
Priority: P1 → --
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.14.2
> Priority: P1 → --
> Severity: critical → normal
Severity: normal → critical
Priority: -- → P1
(In reply to Brian Smith (:bsmith) from comment #17)
> (In reply to Eric Rescorla (:ekr) from comment #16)
> > Now with just the added initialization but leaving the one in ssl3con.c
> 
> Is there a reason?

I also find the double initialization non-ideal. I agreed to
double initialization because I don't fully understand the
purpose of ssl3_InitState. An obvious question was why we
don't just call ssl3_InitState in ssl_NewSocket. One possible
answer was that ssl_NewSocket doesn't know if the socket will
be used for SSL2.

I just took a look at ssl3_InitState. Although it sets most
members to fixed values, it also initializes a small number
of members based on socket options (for example, ss->vrange.max
and ss->opt.bypassPKCS11). This means:

1. Even if we remove SSL2 support, ssl_NewSocket still can't
simply call ssl3_InitState. sslSocket members that can only
be initialized when the socket is configured must be
initialized late in ssl3_InitState.

2. ss->ssl3.hs.lastMessageFlight can be initialized early in
ssl_NewSocket because the initialization doesn't depend on any
socket option.

Perhaps we can clean up the sslSocket initialization code when
SSL2 support is removed.
(In reply to Wan-Teh Chang from comment #21)
> (In reply to Brian Smith (:bsmith) from comment #17)
> > (In reply to Eric Rescorla (:ekr) from comment #16)
> > > Now with just the added initialization but leaving the one in ssl3con.c
> > 
> > Is there a reason?
> 
> I also find the double initialization non-ideal. I agreed to
> double initialization because I don't fully understand the
> purpose of ssl3_InitState. 

Yep. This was my reasoning as well. I wanted to make this patch
fast and figured that meant taking as few chances as possible.


> An obvious question was why we
> don't just call ssl3_InitState in ssl_NewSocket. One possible
> answer was that ssl_NewSocket doesn't know if the socket will
> be used for SSL2.
> 
> I just took a look at ssl3_InitState. Although it sets most
> members to fixed values, it also initializes a small number
> of members based on socket options (for example, ss->vrange.max
> and ss->opt.bypassPKCS11). This means:
> 
> 1. Even if we remove SSL2 support, ssl_NewSocket still can't
> simply call ssl3_InitState. sslSocket members that can only
> be initialized when the socket is configured must be
> initialized late in ssl3_InitState.
> 
> 2. ss->ssl3.hs.lastMessageFlight can be initialized early in
> ssl_NewSocket because the initialization doesn't depend on any
> socket option.
> 
> Perhaps we can clean up the sslSocket initialization code when
> SSL2 support is removed.

I would be in favor of this.
Attachment #694129 - Flags: review?(wtc)
Whiteboard: [WebRTC] [blocking-webrtc+] → [WebRTC] [blocking-webrtc+] [qa-]
Adding Ami so this can get uplifted to Chrome.
Whiteboard: [WebRTC] [blocking-webrtc+] [qa-] → [WebRTC] [blocking-webrtc+] [qa-] [adv-main20-]
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: