Closed
Bug 1273678
Opened 9 years ago
Closed 8 years ago
AddressSanitizer: heap-use-after-free on address in nssPKIObject_AddRef
Categories
(NSS :: Libraries, defect)
Tracking
(firefox49 wontfix, firefox-esr45- wontfix, firefox50+ wontfix, firefox51+ wontfix, firefox52+ wontfix, firefox-esr52 wontfix, firefox53+ wontfix, firefox54+ wontfix, firefox55+ fixed)
People
(Reporter: bc, Assigned: ttaubert)
References
(Blocks 1 open bug, )
Details
(Keywords: csectype-uaf, sec-moderate, Whiteboard: [adv-main55+][post-critsmash-triage])
Attachments
(4 files, 2 obsolete files)
38.08 KB,
text/plain
|
Details | |
509 bytes,
text/plain
|
Details | |
1.53 KB,
patch
|
Details | Diff | Splinter Review | |
4.76 KB,
patch
|
ttaubert
:
feedback+
|
Details | Diff | Splinter Review |
1. http://www.musica-classica.it/forum/index.php?/topic/3736-il-caso-luchesi/&page=5 with an asan linux x86_64 build.
2. ==9341==ERROR: AddressSanitizer: heap-use-after-free on address 0x61d0010a9110 at pc 0x7f02b3b3f2a8 bp 0x7ffc8fcfa3a0 sp 0x7ffc8fcfa398
WRITE of size 4 at 0x61d0010a9110 thread T0
#0 0x7f02b3b3f2a7 in nssPKIObject_AddRef /builds/slave/m-cen-l64-asan-ntly-0000000000/build/src/security/nss/lib/pki/pkibase.c:156
on rhel6 x86_64.
I have been unable to reproduce this issue on rhel6 or fedora24 but considered it important enough to file anyway. This may have been due to an advertisement. I'll keep trying to reproduce as time permits.
Comment 1•9 years ago
|
||
I'm not sure if this error is really in NSS. It looks like an NSS shutdown issue to me. David could you have a quick look?
Flags: needinfo?(dkeeler)
![]() |
||
Comment 2•9 years ago
|
||
I wrote a small test client that simulates what those stack traces seem to indicate is going on (a TLS connection is happening while the platform is unloading the pkcs11 module containing the default root certificates), and it appears that NSS is not robust against this. I'm not sure it's intended to be (sometimes NSS acquires locks when working with these objects, but sometimes it doesn't appear to).
If this is a use-case that NSS wants to support, the lack of locking should be fixed. In the meantime, Firefox probably shouldn't unload modules while any networking is happening.
Flags: needinfo?(dkeeler)
Updated•9 years ago
|
Keywords: csectype-uaf
Comment 3•9 years ago
|
||
I'll mark this sec-high because it sounds like a shutdown ordering kind of issue, which will be more difficult to exploit.
Keywords: sec-high
Comment 4•9 years ago
|
||
Thanks for your assessment David. Can you attach your test code?
It sounds to me like NSS should lock here unless Wan-Teh has a reason why we don't.
Flags: needinfo?(wtc)
Flags: needinfo?(dkeeler)
![]() |
||
Comment 5•9 years ago
|
||
Here's what I came up with. Let me know if the steps to reproduce aren't clear enough (see README).
Flags: needinfo?(dkeeler)
Updated•9 years ago
|
Comment 6•8 years ago
|
||
David, can you look at this again? This sec-high is just kind of hanging out, getting old but no less bad.
Flags: needinfo?(dkeeler)
![]() |
||
Comment 7•8 years ago
|
||
This would be best suited to someone who understands how NSS is supposed to handle this situation (see comment 2). Maybe Tim?
Flags: needinfo?(dkeeler) → needinfo?(ttaubert)
Assignee | ||
Comment 8•8 years ago
|
||
I'm trying to understand what's happening here but Franziskus and I have trouble reproducing this with the script you provided. Does that still work for you?
Flags: needinfo?(ttaubert) → needinfo?(dkeeler)
![]() |
||
Comment 9•8 years ago
|
||
It took a bit of fiddling around, but yes, this still reproduces for me (this version of the testcase spins off two different threads that connect to the server, which seems to make it reproduce faster).
Things to try:
* nss/nspr compiled with USE_ASAN? (I eventually just grabbed a build from https://treeherder.mozilla.org/#/jobs?repo=nss )
* is a TLS server running on port 8000?
* is LD_LIBRARY_PATH correct when running ./nsslclient?
With all that, it still takes about 1-3 minutes to reproduce. Unfortunately, the error message isn't very elucidating since asan appears to be encountering an error of its own:
==24541==ERROR: AddressSanitizer: heap-use-after-free on address 0x61d000323cc0 at pc 0x7f688e8df733 bp 0x7f68884374f0 sp 0x7f68884374e8
WRITE of size 4 at 0x61d000323cc0 thread T2
#0 0x7f688e8df732 in nssToken_AddRef /home/worker/nss/lib/dev/devtoken.c:62:5
#1 0x7f688e8df462 in nssSlot_GetToken /home/worker/nss/lib/dev/devslot.c:221:16
#2 0x7f688e8c4f5d in nssTrustDomain_FindTrustForCertificate /home/worker/nss/lib/pki/trustdomain.c:1075:20
#3 0x7f688e8dd92b in fill_CERTCertificateFields /home/worker/nss/lib/pki/pki3hack.c:798:17
#4 0x7f688e8d9ad2 in stan_GetCERTCertificate /home/worker/nss/lib/pki/pki3hack.c:896:9
#5 0x7f688e8d9c56 in STAN_GetCERTCertificate /home/worker/nss/lib/pki/pki3hack.c:929:12
#6 0x7f688e8b1a7d in CERT_NewTempCertificate /home/worker/nss/lib/certdb/stanpcertdb.c:393:10
#7 0x7f688de550e6 in ssl3_CompleteHandleCertificate /home/worker/nss/lib/ssl/ssl3con.c:10594:24
#8 0x7f688de7dfe3 in ssl3_HandleCertificate /home/worker/nss/lib/ssl/ssl3con.c:10507:12
#9 0x7f688de61ba0 in ssl3_HandlePostHelloHandshakeMessage /home/worker/nss/lib/ssl/ssl3con.c:11741:18
#10 0x7f688de5cc7a in ssl3_HandleHandshakeMessage /home/worker/nss/lib/ssl/ssl3con.c:11688:22
#11 0x7f688de698fa in ssl3_HandleHandshake /home/worker/nss/lib/ssl/ssl3con.c:11869:18
#12 0x7f688de64d47 in ssl3_HandleRecord /home/worker/nss/lib/ssl/ssl3con.c:12632:22
#13 0x7f688de8ddcb in ssl3_GatherCompleteHandshake /home/worker/nss/lib/ssl/ssl3gthr.c:474:22
#14 0x7f688de93567 in ssl_GatherRecord1stHandshake /home/worker/nss/lib/ssl/sslcon.c:78:10
#15 0x7f688deb1e04 in ssl_Do1stHandshake /home/worker/nss/lib/ssl/sslsecur.c:65:14
#16 0x7f688deb5c4c in SSL_ForceHandshake /home/worker/nss/lib/ssl/sslsecur.c:413:14
#17 0x4e577a in doConnect /home/keeler/src/nsslclient/nsslclient.c:90:7
#18 0x4e5850 in connectThreadMain /home/keeler/src/nsslclient/nsslclient.c:105:5
#19 0x7f688e4bca2f in _pt_root /home/worker/nspr/Linux3.13_x86_64_clang-3.9_glibc_PTH_ASAN_64_DBG.OBJ/pr/src/pthreads/../../../../pr/src/pthreads/ptthread.c:216:5
#20 0x7f688d9105c9 in start_thread (/lib64/libpthread.so.0+0x75c9)
#21 0x7f688cd1df6c in __clone (/lib64/libc.so.6+0x102f6c)
ASAN:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.
Flags: needinfo?(dkeeler)
Comment 10•8 years ago
|
||
Any luck reproducing this, Tim?
status-firefox51:
--- → affected
status-firefox52:
--- → affected
status-firefox53:
--- → affected
status-firefox-esr45:
--- → ?
tracking-firefox51:
--- → ?
tracking-firefox52:
--- → ?
tracking-firefox53:
--- → ?
tracking-firefox-esr45:
--- → ?
Flags: needinfo?(ttaubert)
Assignee | ||
Comment 11•8 years ago
|
||
Haven't found the time to try again thus far, but it's on my list of things to fix rather sooner than later.
Flags: needinfo?(ttaubert)
Comment 14•8 years ago
|
||
Track 52+. Tim, I don't suppose you've had more luck reproducing this since November?
Flags: needinfo?(ttaubert)
Assignee | ||
Comment 15•8 years ago
|
||
I tried to a few times since David's last message. I can't reproduce it on Docker/Linux, on a Linux desktop machine, and also Franziskus can't reproduce it on his Linux desktop machine. Doesn't reproduce on macOS either :(
Flags: needinfo?(ttaubert)
Comment 16•8 years ago
|
||
Tim can't reproduce. David, can you take a look again? Would be too bad if this remains unfixed much longer.
Flags: needinfo?(dkeeler)
![]() |
||
Comment 17•8 years ago
|
||
I don't know why I didn't think to do this earlier, but I ran my testcase with rr, which enabled me to get a better idea of what's going on. I'm still getting a handle on this, but it looks like the presence of a Fedora-specific token (the "System Trust" token) is causing the uaf, which might indicate why this doesn't reproduce on non-Fedora systems (which is what I'm assuming Tim is using?). I'll keep investigating.
Flags: needinfo?(dkeeler)
Assignee | ||
Comment 18•8 years ago
|
||
(In reply to David Keeler [:keeler] (use needinfo?) from comment #17)
> might indicate why this doesn't reproduce on non-Fedora systems (which is
> what I'm assuming Tim is using?).
Yay, a lead! I've been indeed using a virtual and a real Ubuntu only. Franziskus uses Arch.
![]() |
||
Comment 19•8 years ago
|
||
I figured out how to reproduce this (well, not exactly the same uaf as in comment 0, but a similar root-unloading race condition) in an ubuntu-based docker image, so I don't think this is fedora-specific. If you have docker set up, all you should need to do is:
(download this Dockerfile to cwd)
sudo docker build -t nssthreading .
sudo docker run nssthreading
Attachment #8759742 -
Attachment is obsolete: true
Attachment #8799045 -
Attachment is obsolete: true
Flags: needinfo?(ttaubert)
Assignee | ||
Comment 20•8 years ago
|
||
(In reply to David Keeler [:keeler] (use needinfo?) from comment #19)
> I figured out how to reproduce this (well, not exactly the same uaf as in
> comment 0, but a similar root-unloading race condition) in an ubuntu-based
> docker image, so I don't think this is fedora-specific. If you have docker
> set up, all you should need to do is:
Thanks David, this is great! I was able to reproduce this on Docker (with macOS as the host) as soon as I set the number of CPUs > 1. Franziskus was able to reproduce on his Arch desktop too.
Now we "only" have to find out what's happening... definitely looks like a race condition due to improper locking.
Flags: needinfo?(ttaubert)
Comment 21•8 years ago
|
||
Mark 51 won't fix as 51 was released.
Comment 22•8 years ago
|
||
Tim, did you have a chance to find what is happening there?
Flags: needinfo?(ttaubert)
Assignee | ||
Comment 23•8 years ago
|
||
All attempts to reproduce this outside of the docker image have failed for me, unfortunately. I can reproduce this if I use David's image, as mentioned in comment #20. But that doesn't build anymore because the artifacts have expired. We'd also have to adjust the Makefile a little now that we're using GYP for our ASan builds.
I have not managed to reproduce this when building NSS myself, with e.g. ./build.sh --asan --ubsan.
David, I hate to steal more of your time but can you reproduce this when building NSS yourself?
Another thing I spotted is that the docker image installs `libnss3-tools` for certutil. That install a bunch of libraries, including NSS and NSPR. I can't do "make nssthreading" with that removed. I wonder if there's some weird interaction between system libs and the ones taken from the Taskcluster build. OTOH, the UAF looks legit.
Flags: needinfo?(ttaubert) → needinfo?(dkeeler)
![]() |
||
Comment 24•8 years ago
|
||
I can still reproduce this locally. Perhaps try with this patch (maybe without asan?) - it poisons the PK11SlotInfo memory when freeing it and introduces some delay in nssTrustDomain_FindTrustForCertificate, which is where I'm currently seeing this reproduce (the original stack has nssTrustDomain_RemoveTokenCertsFromCache, but I think this is from the same underlying cause).
To run locally, you'll probably have to change the "dist/Linux3.13_x86_64_clang-3.9_glibc_PTH_ASAN_64_DBG.OBJ/" paths in the Makefile and source file to wherever your local build put the dist folder (also make sure that LD_LIBRARY_PATH is set appropriately - "dist/Debug/lib/" is what's working for me, with dist as a symlink to the nss dist dir).
The dockerfile installs nss utils because the Makefile uses certutil to do some setup, but I imagine that might not even be necessary (yeah - if you replace the NSS_Init with NSS_InitReadWrite you don't need to call certutil).
Flags: needinfo?(dkeeler)
Assignee | ||
Comment 25•8 years ago
|
||
After setting up my new Linux desktop (running Arch) I tried again and still can't reproduce it :( Franziskus can't either. I also couldn't reproduce it after building on the Docker image and linking against the self-built library. I wouldn't know how to debug and fix this otherwise...
The probably only thing left to try is to set up a Fedora virtual machine and see if it's reproducible in there? I'm a little out of ideas. I hope we don't have to spend the second week of May in front of David's machine and fix it ;)
Updated•8 years ago
|
Comment 27•8 years ago
|
||
Tim really, really can't reproduce :)
Can you pick this up again, keeler?
Flags: needinfo?(dkeeler)
![]() |
||
Comment 28•8 years ago
|
||
Looking at this again, it seems incorrect that the structures NSSSlot and NSSToken could each have a pointer to a PK11SlotInfo and yet not cause the PK11SlotInfo's refcount to increase (in my testcase, what sometimes happens is a PK11SlotInfo is destroyed while there are still NSSSlot/NSSTokens that reference it). Once I fixed that, I found that while a comment in nssToken_Destroy claims that an NSSToken will always have the last reference to an NSSSlot, this is demonstrably not true, so I fixed it to null out the NSSSlot->NSSToken pointer, since it would no longer be valid once the NSSToken is destroyed. Then I had to add a null check in nssSlot_GetToken since it could be null. Tim, what do you think? (with this patch, I can't reproduce the asan failures I was seeing)
Flags: needinfo?(dkeeler)
Attachment #8850230 -
Flags: feedback?(ttaubert)
Assignee | ||
Comment 29•8 years ago
|
||
David, thanks for the patch! I've been looking at it and it does look very promising indeed. I'm currently still trying to reproduce based on what's in your patch. Will get back to you soon about it.
Comment 30•8 years ago
|
||
I figure 54 and 55 are likely to be affected too.
We are a week away from shipping 53 release, but I'll keep tracking this for now, in case it's something we end up wanting to fix in mid-cycle.
status-firefox54:
--- → ?
status-firefox55:
--- → ?
Updated•8 years ago
|
status-firefox-esr52:
--- → affected
tracking-firefox-esr52:
--- → ?
Comment 31•8 years ago
|
||
This missed 53, but we could still take a patch for 54. Tim or David, do you think 54 is affected?
Flags: needinfo?(ttaubert)
Assignee | ||
Comment 32•8 years ago
|
||
Comment on attachment 8850230 [details] [diff] [review]
possible patch
Review of attachment 8850230 [details] [diff] [review]:
-----------------------------------------------------------------
I still can't reproduce locally after trying a few more things but this looks very promising, definitely something we need to fix.
::: lib/dev/devslot.c
@@ +66,5 @@
> nssSlot_AddRef(
> NSSSlot *slot)
> {
> PR_ATOMIC_INCREMENT(&slot->base.refCount);
> + PK11_ReferenceSlot(slot->pk11slot);
(See below.)
::: lib/dev/devtoken.c
@@ +36,5 @@
> * When the token is actually destroyed, that ref must go too.
> */
> + nssSlot_EnterMonitor(tok->slot);
> + tok->slot->token = NULL;
> + nssSlot_ExitMonitor(tok->slot);
How did you find an instance where the token doesn't have the last reference to the slot? Instead of the three lines here and the null-check in `nssSlot_GetToken()` I added a `PORT_Assert(tok->slot->base.refCount == 1)`. Running a few tests locally didn't hit that. It would be good if we could properly assert that...
@@ +63,5 @@
> nssToken_AddRef(
> NSSToken *tok)
> {
> PR_ATOMIC_INCREMENT(&tok->base.refCount);
> + PK11_ReferenceSlot(tok->pk11slot);
Referencing the slot once on creation is fine, but do we need to increase the refCount every time someone references the token? I think we can remove this and, above, move the `PK11_FreeSlot()` call into the case where refCount==0 after the atomic decrement.
Attachment #8850230 -
Flags: feedback?(ttaubert) → feedback+
Assignee | ||
Comment 33•8 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #31)
> This missed 53, but we could still take a patch for 54. Tim or David, do you
> think 54 is affected?
Should be affected, but I'm not sure this is something we need to uplift. It seems quite difficult to exploit. David what's your take on this?
Flags: needinfo?(ttaubert)
Assignee | ||
Comment 34•8 years ago
|
||
(In reply to Tim Taubert [:ttaubert] from comment #32)
> How did you find an instance where the token doesn't have the last reference
> to the slot? Instead of the three lines here and the null-check in
> `nssSlot_GetToken()` I added a `PORT_Assert(tok->slot->base.refCount == 1)`.
> Running a few tests locally didn't hit that. It would be good if we could
> properly assert that...
Pushed to try with that assertion and the core parts of your patch and we don't hit it there either:
https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=416422a1fdf9bd9fa27a3f29598dd34f612195dd
Comment 35•8 years ago
|
||
There are no more scheduled releases for esr45, so marking wontfix.
![]() |
||
Comment 36•8 years ago
|
||
(In reply to Tim Taubert [:ttaubert] from comment #32)
> ::: lib/dev/devtoken.c
> @@ +36,5 @@
> > * When the token is actually destroyed, that ref must go too.
> > */
> > + nssSlot_EnterMonitor(tok->slot);
> > + tok->slot->token = NULL;
> > + nssSlot_ExitMonitor(tok->slot);
>
> How did you find an instance where the token doesn't have the last reference
> to the slot? Instead of the three lines here and the null-check in
> `nssSlot_GetToken()` I added a `PORT_Assert(tok->slot->base.refCount == 1)`.
> Running a few tests locally didn't hit that. It would be good if we could
> properly assert that...
IIRC, I encountered a situation where if one thread removes the module the token and slot are on while another thread is operating on the slot, the token will not have the last reference to the slot.
> @@ +63,5 @@
> > nssToken_AddRef(
> > NSSToken *tok)
> > {
> > PR_ATOMIC_INCREMENT(&tok->base.refCount);
> > + PK11_ReferenceSlot(tok->pk11slot);
>
> Referencing the slot once on creation is fine, but do we need to increase
> the refCount every time someone references the token? I think we can remove
> this and, above, move the `PK11_FreeSlot()` call into the case where
> refCount==0 after the atomic decrement.
Ah, I think I see. Unless I'm forgetting some issue I ran into, I agree that should work.
![]() |
||
Comment 37•8 years ago
|
||
(In reply to Tim Taubert [:ttaubert] from comment #33)
> (In reply to Liz Henry (:lizzard) (needinfo? me) from comment #31)
> > This missed 53, but we could still take a patch for 54. Tim or David, do you
> > think 54 is affected?
>
> Should be affected, but I'm not sure this is something we need to uplift. It
> seems quite difficult to exploit. David what's your take on this?
I believe 54 is still affected, but I think this would be very difficult to exploit, particularly without user intervention (I think this is only possible when unloading PKCS#11 modules or at shutdown).
Assignee | ||
Comment 39•8 years ago
|
||
I promised David that I'll take it from here.
Assignee: nobody → ttaubert
Status: NEW → ASSIGNED
Flags: needinfo?(dkeeler)
Comment 40•8 years ago
|
||
(In reply to Franziskus Kiefer [:fkiefer or :franziskus] from comment #4)
> Thanks for your assessment David. Can you attach your test code?
> It sounds to me like NSS should lock here unless Wan-Teh has a reason why we
> don't.
I just realized Franziskus asked me a question in comment 4.
I am sorry I missed his question. I assume Tim and David already
figured out the root cause of this bug.
Flags: needinfo?(wtc)
Assignee | ||
Comment 41•8 years ago
|
||
(In reply to David Keeler [:keeler] (use needinfo?) from comment #36)
> (In reply to Tim Taubert [:ttaubert] from comment #32)
> > ::: lib/dev/devtoken.c
> > @@ +36,5 @@
> > > * When the token is actually destroyed, that ref must go too.
> > > */
> > > + nssSlot_EnterMonitor(tok->slot);
> > > + tok->slot->token = NULL;
> > > + nssSlot_ExitMonitor(tok->slot);
> >
> > How did you find an instance where the token doesn't have the last reference
> > to the slot? Instead of the three lines here and the null-check in
> > `nssSlot_GetToken()` I added a `PORT_Assert(tok->slot->base.refCount == 1)`.
> > Running a few tests locally didn't hit that. It would be good if we could
> > properly assert that...
>
> IIRC, I encountered a situation where if one thread removes the module the
> token and slot are on while another thread is operating on the slot, the
> token will not have the last reference to the slot.
Yeah, that seems like it could happen. Not sure if that's a common occurrence but we should guard against as you suggested.
Assignee | ||
Comment 42•8 years ago
|
||
This is basically David's patch, reduced a little as suggested by me above.
https://nss-review.dev.mozaws.net/D332
Assignee | ||
Comment 43•8 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.31
Comment 44•8 years ago
|
||
If we intend to fix this for Fx54 and ESR52.2, we'll need NSS 3.30 and 3.28 point releases ASAP since the RC builds are happening on Monday.
Assignee | ||
Comment 45•8 years ago
|
||
I don't think this really should be sec-high, maybe rather sec-moderate. It seems really hard to exploit, only when shutting down the browser. I'd let it ride the trains and hit beta in ~2 weeks.
Flags: needinfo?(ttaubert)
Updated•8 years ago
|
Group: crypto-core-security → core-security-release
Updated•8 years ago
|
Keywords: sec-high → sec-moderate
Comment 46•8 years ago
|
||
Track 54+/55+ and mark 54 won't fix per comment #45.
Reporter | ||
Comment 47•8 years ago
|
||
Looks like the same issue on fennec beta:
https://treeherder.mozilla.org/logviewer.html#?job_id=104139235&repo=mozilla-beta&lineNumber=855
Comment 48•8 years ago
|
||
Sounds like this is wontfix for ESR52, though maybe we should keep it on the radar for ride-along inclusion if/when we find ourselves shipping another 3.28.x point release anyway.
Updated•8 years ago
|
Whiteboard: [adv-main55+]
Comment 49•8 years ago
|
||
For Fedora, I have recently packaged NSS 3.31, which includes this fix. It is still under testing but I received a report from a user saying that the package is causing a deadlock in nssSlot_IsTokenPresent(), with a stacktrace similar to comment 9:
https://bugzilla.redhat.com/show_bug.cgi?id=1470352#c8
It seems that nssSlot_GetToken() obtains a lock before calling nssSlot_IsTokenPresent(), while nssSlot_IsTokenPresent() also tries to lock the slot for C_GetSlotInfo:
https://hg.mozilla.org/projects/nss/rev/859a4f82207b#l1.33
https://hg.mozilla.org/projects/nss/file/tip/lib/dev/devslot.c#l134
Tim, David, do you have any idea on this?
Flags: needinfo?(ttaubert)
![]() |
||
Comment 50•8 years ago
|
||
It would be fairly straightforward to re-work nssSlot_GetToken to not hold the "monitor" (lock) while calling nssSlot_IsTokenPresent.
Comment 51•8 years ago
|
||
Re comment 49 and 50, Daiki fixed it with bug 1381784.
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(ttaubert)
Updated•8 years ago
|
Flags: qe-verify?
Whiteboard: [adv-main55+] → [adv-main55+][post-critsmash-triage]
Updated•7 years ago
|
Group: core-security-release
Updated•5 years ago
|
Blocks: asan-maintenance
You need to log in
before you can comment on or make changes to this bug.
Description
•