Closed Bug 1314604 (CVE-2016-8635) Opened 9 years ago Closed 8 years ago

NSS 3.21.x branch still crashing with some DH keys, fix from bug 1306103 was apparently incomplete

Categories

(NSS :: Libraries, defect)

3.21
defect
Not set
normal

Tracking

(firefox-esr4551+ affected, firefox51 unaffected, firefox52 unaffected, firefox53 unaffected)

RESOLVED FIXED
Tracking Status
firefox-esr45 51+ affected
firefox51 --- unaffected
firefox52 --- unaffected
firefox53 --- unaffected

People

(Reporter: KaiE, Unassigned)

References

Details

(Keywords: crash, sec-moderate, wsec-dos)

Attachments

(4 files)

This is a follow-up to bug 1306103. During testing Hubert identified and convinced Bob that the original fix introduced a new issue. I'm just the messenger here, so I'll let Hubert and Bob explain the new issue. (Hubert, maybe you could simply copy/paste the relevant portions from the emails that you and Bob exchanged?). I'm attaching a patch, which is based on Bob's suggestion. I assume that RH will take that patch for our Nov 8 update. Should the official Mozilla ESR 45.x release planned for Nov 8 also pick up this fix? Related is: (In reply to Martin Thomson [:mt:] from bug 1306103 comment #16) > After eyeballing the code, it looks like > we narrowly avoid the bug on the client side as well in > sendDHClientKeyExchange.
Flags: needinfo?(hkario)
Keywords: regression
Note the attached patch was originally intended for the NSS_3_21_BRANCH, but I notice that the same code is still used.. Should the patch added to both 3.21.x branch and to the tip of NSS (pre 3.28) ?
I have performed few more tests and I'm afraid that bug 1306103 is not fixed. For some reason RHEL packages handle the error in a way that NSS server doesn't crash. The alerts it (and RHEL packages) sends and the messages it prints, indicate that the server doesn't abort right after receiving the key share with invalid value but rather due to MAC check failure on the Finished message - breakpoints set on ssl_CBCExtractMAC in ssl3_HandleRecord() in ssl3con.c get triggered. If I use NSS_3_21_3_RTM (12729:ee067d70a228) (with or without attachment 8806711 [details] [diff] [review]) the server crashes with following stacktrace upon receiving Client Key Exchange with keys share of 0: LD_LIBRARY_PATH=~/dev/dist/Linux4.7_x86_64_cc_glibc_PTH_64_DBG.OBJ/lib/ valgrind ~/dev/dist/Linux4.7_x86_64_cc_glibc_PTH_64_DBG.OBJ/bin/selfserv -n localhost -p 4433 -d sql:/tmp/nssdb -V tls1.0: -H 1 -U 0 -G ==15430== Memcheck, a memory error detector ==15430== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==15430== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info ==15430== Command: /home/hkario/dev/dist/Linux4.7_x86_64_cc_glibc_PTH_64_DBG.OBJ/bin/selfserv -n localhost -p 4433 -d sql:/tmp/nssdb -V tls1.0: -H 1 -U 0 -G ==15430== selfserv: HDX PR_Read returned error -5961: TCP connection reset by peer ==15430== Thread 2: ==15430== Conditional jump or move depends on uninitialised value(s) ==15430== at 0x4C2F45E: __strncmp_sse42 (vg_replace_strmem.c:645) ==15430== by 0x408405: handle_connection (selfserv.c:1593) ==15430== by 0x406A29: jobLoop (selfserv.c:647) ==15430== by 0x4068D7: thread_wrapper (selfserv.c:616) ==15430== by 0x5CA8674: _pt_root (ptthread.c:216) ==15430== by 0x5EC95C9: start_thread (pthread_create.c:333) ==15430== selfserv: HDX PR_Read returned error -12282: The server has encountered bad data from the client. selfserv: HDX PR_Read returned error -8173: security library: memory allocation failure. selfserv: mpi/mpi.c:4653: mp_read_unsigned_octets: Assertion `mp != ((void *)0) && str != ((void *)0) && len > 0' failed. ==15430== ==15430== Process terminating with default action of signal 6 (SIGABRT): dumping core ==15430== at 0x63166F5: raise (raise.c:54) ==15430== by 0x63182F9: abort (abort.c:89) ==15430== by 0x630EF96: __assert_fail_base (assert.c:92) ==15430== by 0x630F041: __assert_fail (assert.c:101) ==15430== by 0x7406DA0: mp_read_unsigned_octets (mpi.c:4653) ==15430== by 0x73E3406: DH_Derive (dh.c:226) ==15430== by 0x6EFBF74: DH_Derive (loader.c:302) ==15430== by 0x6EE919B: NSC_DeriveKey (pkcs11c.c:6890) ==15430== by 0x53327FF: PK11_PubDerive (pk11skey.c:1901) ==15430== by 0x4E5C418: ssl3_HandleDHClientKeyExchange (ssl3con.c:9919) ==15430== by 0x4E5C86B: ssl3_HandleClientKeyExchange (ssl3con.c:10043) ==15430== by 0x4E6018C: ssl3_HandleHandshakeMessage (ssl3con.c:11640) ==15430== ==15430== HEAP SUMMARY: ==15430== in use at exit: 2,629,580 bytes in 2,384 blocks ==15430== total heap usage: 45,723 allocs, 43,339 frees, 25,820,998 bytes allocated ==15430== ==15430== LEAK SUMMARY: ==15430== definitely lost: 0 bytes in 0 blocks ==15430== indirectly lost: 0 bytes in 0 blocks ==15430== possibly lost: 129,793 bytes in 195 blocks ==15430== still reachable: 2,499,787 bytes in 2,189 blocks ==15430== of which reachable via heuristic: ==15430== length64 : 168,256 bytes in 375 blocks ==15430== suppressed: 0 bytes in 0 blocks ==15430== Rerun with --leak-check=full to see details of leaked memory ==15430== ==15430== For counts of detected and suppressed errors, rerun with: -v ==15430== Use --track-origins=yes to see where uninitialised values come from ==15430== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 0 from 0)
Flags: needinfo?(hkario)
Version: 3.27 → 3.21
Kai pointed out to me, that the crash happens in softokn, which may mean that the check if the client key share is valid may need to be implemented in ssl3_HandleDHClientKeyExchange method, not in softokn. In other words, NSS shouldn't ask softokn to derive keys if it can tell for itself that the value it received is invalid (0, 1, p-1 or larger).
Flags: needinfo?(rrelyea)
I'm concerned if softoken can't handle invalid inputs properly. We do already have checks for 0, 1, >=(p-1) on trunk. Those are probably not in 3.21 though and backporting might be costly.
Hubert, I wonder if something is wrong in your environment. I cannot reproduce the crash with the reproducer you gave to me, the server closes the socket with an alert (handshake failure).
note that we're crashing in assert so we need an unoptimised build, otherwise they will be optimised out I built it with DEBUG=1 MOZ_DEBUG_SYMBOLS=1 USE_64=1 flags set (In reply to Martin Thomson [:mt:] from comment #5) > I'm concerned if softoken can't handle invalid inputs properly. We do > already have checks for 0, 1, >=(p-1) on trunk. Those are probably not in > 3.21 though and backporting might be costly. as far as I can tell, the assert is hit before the check; the check is in dh.c:241 if (mp_cmp_d(&Yb, 1) <= 0 || mp_cmp(&Yb, &psub1) >= 0) { err = MP_BADARG; goto cleanup; } while assert comes from dh.c:226 SECITEM_TO_MPINT(*publicValue, &Yb);
We have checks in lib/ssl. We shouldn't be passing a NULL for the public value though.
(In reply to Kai Engert (:kaie) from comment #6) > Hubert, I wonder if something is wrong in your environment. I cannot > reproduce the crash with the reproducer you gave to me, the server closes > the socket with an alert (handshake failure). I had incomplete information. I can reproduce the crash
Keywords: crash
We confirmed that NSS 3.21.3 is equally affected by this crash, regardless which softokn is used. We crash with both the softokn included with 3.21.3, and also with the older softokn that RHEL uses.
Attached file stack.txt —
Keywords: regression
Summary: Regression caused by bug 1306103 → Still crashing with some DH keys, fix from bug 1306103 was apparently incomplete
Summary: Still crashing with some DH keys, fix from bug 1306103 was apparently incomplete → NSS 3.21.x branch still crashing with some DH keys, fix from bug 1306103 was apparently incomplete
This is a backport of the checks we added in a more recent version. I've added a safety check to avoid the perception that we are doing an out-of-bounds read. I checked, and the issue that ekr identified isn't an issue since we either generate the prime ourselves, or check that it is long enough.
Attachment #8807302 - Flags: review?(kaie)
I forgot to add: the analysis here is that when we read the public value and that public value is zero length, we leave the publicValue.data set to NULL (publicValue.len is set to 0). That causes the NULL deref. Since we don't even do the most basic checks on the number, this crashes.
Attached patch bug1314604-trunk-1.patch — — Splinter Review
Let's be extra paranoid and use this.
Attachment #8807306 - Flags: review?(kaie)
Comment on attachment 8807302 [details] [diff] [review] bug1314604.patch (backport to NSS 3.21.x branch) I confirm it fixes the crash with the reproducer that Hubert gave me. Is that sufficient for an r+ on this backport patch? If you need a real code review, it would be better to ask someone who understands this code.
Attachment #8807302 - Flags: review?(kaie) → review+
Comment on attachment 8807302 [details] [diff] [review] bug1314604.patch (backport to NSS 3.21.x branch) Review of attachment 8807302 [details] [diff] [review]: ----------------------------------------------------------------- Just for the sake of paranoia. Note that other than a straight backport, I'm doing bad things with error codes because trunk has an extra error code that wasn't in 3.21.
Attachment #8807302 - Flags: review?(ekr)
EKR, Bob, would you please be able to complete a review of the backport patch today? It might be good to try to get this fixed for the Firefox 45.5 release, which has slipped one week, and is scheduled for Nov 15.
[Tracking Requested - why for this release]: Dan, if we have a fix ready by tomorrow, can the Nov 15 ESR 45.5 update pick it up?
Flags: needinfo?(dveditz)
Comment on attachment 8807306 [details] [diff] [review] bug1314604-trunk-1.patch Reasonable satefy check for NSS trunk, r=kaie
Attachment #8807306 - Flags: review?(kaie) → review+
Attachment #8807302 - Attachment description: bug1314604.patch → bug1314604.patch (backport to NSS 3.21.x branch)
Attachment #8806711 - Attachment description: 1314604-v1.patch → 1314604-v1.patch (patch from Bob, not fixing the issue)
(In reply to Kai Engert (:kaie) from comment #19) > Dan, if we have a fix ready by tomorrow, can the Nov 15 ESR 45.5 update pick it up? That's a question for Liz at this point. We have already created a 45.5 build that QE has signed off on. If this isn't a "stop ship" kind of bug then probably not unless something else forces a re-build.
Flags: needinfo?(dveditz) → needinfo?(lhenry)
Comment on attachment 8807306 [details] [diff] [review] bug1314604-trunk-1.patch Review of attachment 8807306 [details] [diff] [review]: ----------------------------------------------------------------- r+ because it fixes the issue. I think that there is a more optimal patch. I'd like to verify that the actual crash is dereferencing dh_p->data[-1] and not some issue with geting the size from SECKEY_BitIntengerBitLength(). bob ::: lib/ssl/sslsock.c @@ +1669,5 @@ > int cmp; > > + if (dh_p->len == 0 || dh_Ys->len == 0) { > + return PR_FALSE; > + } This is complete and clear, I don't thing the dh_Ys->len should be necessary unless there's a bug in SECKEY_BitIntengerBitLength, which should return zero if dh_hYs->len == 0. @@ +1675,5 @@ > if ((dh_p->data[dh_p->len - 1] & 0x01) == 0) { > return PR_FALSE; > } > /* dh_Ys can't be 1, or bigger than dh_p. */ > if (size_y <= 1 || size_y > size_p) { Why doesn't this catch our issue? if size_y == 0, then it's definately <= 1 and if size_y is >=1 it is certainly greater than size_p if size_p == 0, so the only think the new check is preventing is a crash dereferencing dh_p->data[-1] (which could crash). If we move this check above the odd check, then we probably fix the crash.
Attachment #8807306 - Flags: feedback+
Comment on attachment 8807302 [details] [diff] [review] bug1314604.patch (backport to NSS 3.21.x branch) Review of attachment 8807302 [details] [diff] [review]: ----------------------------------------------------------------- r+ rrelyea
Attachment #8807302 - Flags: feedback+
So I've fine with either patch. I'd like to make sure my understanding of why the patch is necessary is true (is the crash really when we dereference d_p->data[-1]? If not I'm mystified why the patch would work since we already make that same check further below (in a tricky, but efficient test). Re picking up the new error code: should be fine, The only issue is if we should update the error strings as well. If applications use the error strings, there everything should be fine. If they don't, they will output an unknown NSS error with a raw number (which happens not unfrequently anyway). bob
Flags: needinfo?(rrelyea)
I would rather put this off till 45.6 (the December esr update) unless it is crucial - as Dan says we already have the ESR build in place and QE ongoing. Can we live without this in 45.5.0?
Flags: needinfo?(lhenry)
Regarding Bob's comments: He had not noticed that one of the patches was for trunk, and the other for the older branch. I clarifed that, but he said, his comments are still valid. Bob, the error code SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY that is set in this longer patch has been introduced in year 2010 already, and SSL_ERROR_CLIENT_KEY_EXCHANGE_FAILURE is much older even, so we're good.
Comment on attachment 8806711 [details] [diff] [review] 1314604-v1.patch (from Bob, branch only, optional) Martin: Bob suggested, that in addition to your longer patch, we should also take this smaller patch, to be safe. Do you agree?
Attachment #8806711 - Flags: review?(martin.thomson)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #25) > I would rather put this off till 45.6 (the December esr update) unless it is > crucial - as Dan says we already have the ESR build in place and QE ongoing. > Can we live without this in 45.5.0? I believe that the cause for bug 1306103 will be announced on Nov 15. IIUC the additional crash from this bug is very similar, if one knows about bug 1306103, it can easily inspire potential attackers to try out variations of the bad input (just like Hubert did when finding this additional bug). So, not fixing this bug will expose you to this risk for 6 weeks.
In addition, Red Hat would prefer to ship this patch on Nov 15, even if Mozilla doesn't.
Comment on attachment 8806711 [details] [diff] [review] 1314604-v1.patch (from Bob, branch only, optional) Review of attachment 8806711 [details] [diff] [review]: ----------------------------------------------------------------- Yes, this is a good change. It isn't necessary (rv is initialized to SECFailure and it isn't changed above this line), but it is certainly worth being extra cautious here.
Attachment #8806711 - Flags: review?(martin.thomson) → review+
Attachment #8806711 - Attachment description: 1314604-v1.patch (patch from Bob, not fixing the issue) → 1314604-v1.patch (patch from Bob, patch for both trunk and branch)
Comment on attachment 8806711 [details] [diff] [review] 1314604-v1.patch (from Bob, branch only, optional) (In reply to Martin Thomson [:mt:] from comment #30) > Comment on attachment 8806711 [details] [diff] [review] > > Yes, this is a good change. It isn't necessary (rv is initialized to > SECFailure and it isn't changed above this line), but it is certainly worth > being extra cautious here. Thanks, I didn't notice it's unnecessary. Trunk doesn't need it, because the loser section doesn't use rv, and always return SECFailure.
Attachment #8806711 - Attachment description: 1314604-v1.patch (patch from Bob, patch for both trunk and branch) → 1314604-v1.patch (from Bob, branch only, optional)
Martin, EKR, I think you might have forgotten about this one (see comment 17). Do you want to get this fixed in Firefox 45.x ? If yes, we'd have to get that branch patch reviewed, and landed into the NSS_3_21_BRANCH and a new 3.21.x release created, and uplifted to Firefox 45.x ESR branch. Due to my vacation, returning Dec 7, I probably won't be able to help with this work, since the drivers probably don't want to take an uplift that close to the Dec 13 release date. So, either you try to get the release done without me (please don't forget the old-configure.in change to require the new NSS version, if you do), or wait until mid January for the second round of 45.x
Flags: needinfo?(ekr)
Flags: needinfo?(ekr) → needinfo?(martin.thomson)
Still looking for review, see comment 17.
Flags: needinfo?(martin.thomson)
Back to needinfo EKR, because Martin is waiting for his review.
Flags: needinfo?(ekr)
Looks like RedHat assigned CVE-2016-8635 to this one.
Alias: CVE-2016-8635
Group: crypto-core-security
What happened to this bug? Can it be closed?
Flags: needinfo?(martin.thomson)
Yeah, let's do that. It's either OBE, or fixed (it was fixed on trunk, so the whole thing was about backports).
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(martin.thomson)
Resolution: --- → FIXED
Flags: needinfo?(ekr)
Attachment #8807302 - Flags: review?(ekr)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: