1508776 - (CVE-2019-11756) UAF in sftk_FreeSession due to improper refcounting

Reporter

Description

•

6 years ago

This bug was filed from the Socorro interface and is report bp-ec6e8cfb-9c0b-4e68-9917-977e80181120. ============================================================= Seen while looking at nightly crash stats. 1 crash so far, but there are several other similar signatures in the same build ID: 20181120100045. Some of the crash reports have safebrowing stuff in the stack. *https://crash-stats.mozilla.com/report/index/5e48d3a0-ac12-4463-84ee-fb62f0181120 - [@ sftk_ObjectFromHandle] * https://crash-stats.mozilla.com/report/index/aa4a675d-1dae-4b22-9698-355130181120 - [@ sftk_SessionFromHandle ] * https://crash-stats.mozilla.com/report/index/18de7f1b-8ec5-4fc1-8784-42ac90181120 * https://crash-stats.mozilla.com/report/index/7b929956-332a-4de6-8f53-6c5100181120 Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e44bb5b4bc79be613d29b3f95d7b508e68e3d128&tochange=eeddcefcdad847bf8a5737153079e9619ee5aa66 Top 10 frames of crashing thread: 0 @0x1cdb1e67a80 1 softokn3.dll sftk_FreeSession security/nss/lib/softoken/pkcs11u.c:1868 2 softokn3.dll NSC_DigestUpdate security/nss/lib/softoken/pkcs11c.c:1794 3 nss3.dll PK11_DigestOp security/nss/lib/pk11wrap/pk11cxt.c:783 4 xul.dll nsCryptoHash::Update security/manager/ssl/nsCryptoHash.cpp:116 5 xul.dll nsresult mozilla::safebrowsing::LookupCacheV4::VerifyChecksum toolkit/components/url-classifier/LookupCacheV4.cpp:416 6 xul.dll mozilla::safebrowsing::LookupCacheV4::LoadFromFile toolkit/components/url-classifier/LookupCacheV4.cpp:205 7 xul.dll nsresult mozilla::safebrowsing::LookupCache::LoadPrefixSet toolkit/components/url-classifier/LookupCache.cpp:478 8 xul.dll mozilla::safebrowsing::LookupCache::Open toolkit/components/url-classifier/LookupCache.cpp:83 9 xul.dll class RefPtr<mozilla::safebrowsing::LookupCache> mozilla::safebrowsing::Classifier::GetLookupCache toolkit/components/url-classifier/Classifier.cpp:1526 =============================================================

Marcia Knous [:marcia]

Reporter

Comment 1

•

6 years ago

No further crashes on 65 nightly since I originally filed this. sftk_ObjectFromHandle signature and sftk_SessionFromHandle also doesn't appear to have any further crashes on nightly.

Randell Jesup [:jesup] (needinfo me)

Comment 2

•

6 years ago

2 total crashes in 65; volume is low enough that nightly isn't a great marker. Crashes are a mix of bad pointers and execs of bad pointers -> sec-high. At least one is a clear e5 UAF Crashes seem to go back to FF 52 or earlier

Group: core-security

status-firefox64: unaffected → affected

status-firefox-esr60: --- → affected

Keywords: csectype-uaf, csectype-wildptr, sec-high

Julien Cristau [:jcristau]

Updated

•

6 years ago

status-firefox64: affected → wontfix

status-firefox66: --- → affected

Daniel Veditz [:dveditz]

Comment 3

•

6 years ago

I don't know if this is a bug in the NSS implementation or a bug in how Safe Browsing is using it.

Group: core-security → core-security-release

Component: Security → Safe Browsing

Product: Core → Toolkit

Jared Wein [:jaws] (please needinfo? me)

Comment 4

•

6 years ago

Low crash rate and unclear steps forward. Marking as wontfix for 65.

status-firefox65: affected → wontfix

Justin Dolske [:Dolske]

Comment 5

•

6 years ago

sec-high, can we get an owner and priority assigned for investigation?

Flags: needinfo?(dlee)

Dimi Lee [:dimi]

Updated

•

6 years ago

Assignee: nobody → dlee

Status: NEW → ASSIGNED

Flags: needinfo?(dlee)

Priority: -- → P2

Justin Dolske [:Dolske]

Comment 6

•

6 years ago

Can we get an update and ETA here? This is currently a sec-high bug.

Flags: needinfo?(ettseng)

Flags: needinfo?(dlee)

Dimi Lee [:dimi]

Comment 7

•

6 years ago

I'll update this before the end of this week

Flags: needinfo?(ettseng)

Flags: needinfo?(dlee)

Priority: P2 → P1

Dimi Lee [:dimi]

Comment 8

•

6 years ago

By checking those crashes with SafeBrowsing call stacks, I didn't find any suspicious behavior and pointers passed to NSS are also valid.
The way SafeBrowsing uses NSS is very straightforward:
When there is a SafeBrowsing update, the worker thread Init the lib, update the buffer with SHA256 hash and then call Finish().
Since there are also crashes with a similar signature from different components, I guess this probably is not a SafeBrowsing bug.

I checked minidumps from different components, as mentioned in Comment 2, it looks like the pointer used in |sftk_FreeSession| is corrupt.

I am not familiar with NSS stuff, so I'll need some help.

Hi J.C.
Do you know who might be able to help on this bug? Thanks!

Flags: needinfo?(jjones)

J.C. Jones [:jcj] (he/they)

Assignee

Comment 9

•

6 years ago

That's pretty much me. I'll get started on this as soon as I can, no later than Monday the 18th, and update the bug. (Leaving the NI intact)

J.C. Jones [:jcj] (he/they)

Assignee

Comment 10

•

6 years ago

I was a week late here due to coming down very ill. I've begun looking into this today. So far it appears to be a race condition, but I haven't yet nailed it down.

Flags: needinfo?(jjones)

J.C. Jones [:jcj] (he/they)

Assignee

Comment 11

•

6 years ago

So I still don't have a solid lead on this, but it's clearly an NSS race condition, so let's start by moving components.

I'm pretty confident this is not going to be remotely exploitable. We can probably lower this to a sec-moderate, as I don't see how to trigger this, nor in a way to prompt useful corruption.

The stacks indicate this is a crash from a double-free on a given SFTKSession, which would be a ref-miscount in sftk_FreeSession, probably caused by a race on the refCount during destruction itself.

SFTKSession intends to be a smart-pointer with a handy thread-safety lock, but since it cannot track its own state, upon deletion,
further uses of itself are bogus and will be uncaught. So it's a smart-pointer that can't be relied upon to fail-safe once it is
deallocated. Hurray.

Because of how this works, methods like sftk_SessionFromHandle find the session for a given handle, but as far as I can tell right now, the handle can long outlive the session. Destroying the session doesn't remove it from the handle. Perhaps that's the simplest remediation.

[Marking wontfix for 66, affecting 67 (though we haven't seen a crash yet, it's rare), and I don't think this is a regression.]

Assignee: dlee → jjones

Group: core-security-release → crypto-core-security

Severity: critical → normal

status-firefox66: affected → wontfix

status-firefox67: --- → affected

Component: Safe Browsing → Libraries

Keywords: regression

OS: Windows 10 → Windows

Product: Toolkit → NSS

QA Contact: jjones

Summary: Crash in sftk_FreeSession → Mutex collision in sftk_FreeSession

Version: Trunk → other

J.C. Jones [:jcj] (he/they)

Assignee

Comment 12

•

6 years ago

My summary update was improper; much earlier I thought this was a mutex collision, but it's actually a UAF from improper refcounting. Specifically, that once the refcount indicates we can destroy the object, we destroy the refcount, causing future accesses to be invalid.

The most obvious options here are:

as mentioned in comment 11, to return result from sftk_FreeSession that obligates callers to zero their session pointers.
add sentinel values
build an indirection type that can be more like a real smart pointer; this is likely very hard for this type

Summary: Mutex collision in sftk_FreeSession → UAF in sftk_FreeSession due to improper refcounting

J.C. Jones [:jcj] (he/they)

Assignee

Updated

•

6 years ago

Updated

•

6 years ago

Comment 13

•

5 years ago

Is this still something you may work on? Or, that we might find more help to investigate further?
I'm going through some untouched-for-a-while sec-high issues to make sure we've investigated all we can before marking them stalled.

Flags: needinfo?(jjones)

J.C. Jones [:jcj] (he/they)

Assignee

Comment 14

•

5 years ago

kjacobs and I spent another 8 hours or so (each) on this in August. This bug is a perfect example of why even clever C is dangerous.

Where we left off is that we're going to need to build something to help us find this bug. And this is not the only UAF caused by this improper refcounting in NSS, so it'd be nice to get the whole class of bugs fixed.

I don't know if we should mark this stalled or not. When I have time to work on NSS again, I intend to return to working on this bug.

Flags: needinfo?(jjones)

J.C. Jones [:jcj] (he/they)

Assignee

Comment 15

•

5 years ago

Attached file Bug 1508776 - Remove unneeded refcounting from SFTKSession r?mt,kjacobs — Details

sftk_SessionFromHandle reads SFTKSession objects from the slot, and it's
possible that during slot session closure/shutdown that a given session may have
already had its refcount subtracted. In that case, just don't return the
session.

J.C. Jones [:jcj] (he/they)

Assignee

Comment 16

•

5 years ago

Every instance of sftk_FreeSession is correctly guarded to ensure it is never passed a null pointer, which further implicates that the whole structure has been freed or in some other way clobbered before the crash. So the actual crashing code is probably one of these two lines:

    PZLock *lock = SFTK_SESSION_LOCK(slot, session->handle);

    PZ_Lock(lock);

And it's probably SFTK_SESSION_LOCK, since as a macro would put sfk_FreeSession be the top of the crashing stack frames, which it is. Which would mean that (session->)slot or slot->sessionLock are null. Since session isn't null, that implies its contents are no longer consistent.

This is all to say, my initial analysis seems accurate - sftk_FreeSession is being passed a non-null session that has already been deallocated, but it can't tell because upon deallocation the memory can be overwritten. (There could be wider heap corruption, too, but this seems more likely.)

In practically all cases, sftk_FreeSession is being passed a session from sftk_SessionFromHandle which is getting it from the slot via sftk_SlotFromSessionHandle.

The only way to remove a session from the slot is NSC_CloseSession, which is called indirectly via pk11_CloseSession. If the refcounting works correctly, it's not possible to have a deallocated session until pk11_CloseSession is called, as the slot holds a reference effectively itself, complete with an assertion in NSC_CloseSession.

There are locks on the refcounts, and PR_Lock is used to enforce the critical sections. The locks are weird, in that they use a hash function to assign multiple sessions to a single lock, but they are consistent in the practice, so that's probably not the issue.

It is notable however that the lock belongs to the slot, not the session, yet we exit the critical section before destroying the session, which could be a race condition, but one which is still depending on some kind of refcount error elsewhere.

My only reasonable-level suggestion here is to check the session refcount in sftk_SessionFromHandle and return NULL if it's zero, which is patch D47010. This will need some testing.

Martin Thomson [:mt:]

Comment 17

•

5 years ago

•

Edited

Eek. I don't like seeing refcounted objects that are accessible when their refcount is zero. That's a violation of an important invariant.

This is really hard to think through. I'm not sure that my sequence of events here is right either, but I don't think that this change addresses the core problem.

Looking at NSC_CloseSession, say that two threads attempt to call that NSC_CloseSession at the same time. Both could pull a valid pointer from sftk_SessionFromHandle if the two calls happen at around same time. The lock is released several times in this process. I think that the lock is consistent in covering both the refCount and the reference to the session from the slot, but the interaction of the two could be a problem.

Assuming that refCount == 1 to start, this looks like a problematic sequence of events. The order of the first two steps doesn't matter.

A: sftk_SessionFromHandle => refCount++
B: sftk_SessionFromHandle => refCount++
A: sftkqueue_delete and refCount--
B: skip the delete block because the session is no longer attached to the slot
B: sftk_FreeSession => refCount--
B: refCount was reduced to 1, call free()
A: sftk_FreeSession => UAF on sftk_SlotFromSession as it attempts to read session->slot

All of those steps, aside from the last two, are independently covered by the lock. The scary part is that covering the free() with the lock wouldn't help. The point is that A is holding a reference to an object that has been freed.

The problem is that there is an interaction between reference counting and the ownership here. Given that this object is pinned to the slot and only freed on shutdown, I have a more dramatic simplification to suggest: remove the reference count entirely. Then use the lock to govern its insertion and removal to and from the slot. Only delete the object if it was removed from the slot.

That doesn't address the problem of sessions that are in active use during shutdown. Notice that NSC_CloseSession doesn't assert that the reference count is 1, it asserts that it is greater than zero, which suggests that this code might not be careful enough about tracking outstanding uses before hitting the shutdown code. Changing that assertion to be a ==1 and running tests would be illuminating. Other than that check, I would advocate for dumping the reference count entirely.

J.C. Jones [:jcj] (he/they)

Assignee

Comment 18

•

5 years ago

Thank you for the further analysis, Martin - running a concrete example is, to use your words, illuminating.

Your hypothesis that asserting refcount==1 after the sftkqueue_delete in NSC_CloseSession seems to work fine in NSS ASAN builds on MacOS. I'm currently testing webcrypto tests in m-c and working on a patch to try removing the refcount entirely.

That said, these kind of fixes we'd probably want to bury in a big try-push against mozilla-central to fully evaluate on all platforms.

Phabricator Automation

Updated

•

5 years ago

Attachment #9095024 - Attachment description: Bug 1508776 - Catch corner case for sftksession objects r?mt,kjacobs → Bug 1508776 - Remove unneeded refcounting from SFTKSession r?mt,kjacobs

J.C. Jones [:jcj] (he/they)

Assignee

Comment 19

•

5 years ago

Attached file Bug 1577822 - land NSS e022f47937ca UPGRADE_NSS_RELEASE (obsolete) — Details

Phabricator Automation

Updated

•

5 years ago

Attachment #9095363 - Attachment description: Bug 1508776 - land NSS e022f47937ca UPGRADE_NSS_RELEASE → Bug 1577822 - land NSS e022f47937ca UPGRADE_NSS_RELEASE

J.C. Jones [:jcj] (he/they)

Assignee

Comment 20

•

5 years ago

D47184 is a mozilla-central uplift with a collection of NSS changes and the patch from D47010. I do not intend to land D47184, but am planning to run a full all/all try run against mozilla-central using it, once we have some initial thoughts on the refcount removal patch.

J.C. Jones [:jcj] (he/they)

Assignee

Comment 21

•

5 years ago

I have an all-all try push running against mozilla-central ( https://treeherder.mozilla.org/#/jobs?repo=try&revision=8fc5474b750a6225b5b10b7608d67c1ddbe85ea8&selectedJob=268655312 ) and it has only one semi-suspicious crash, with retriggers in-progress. Overall, this change looks like it works for Firefox.

The nature of the change - particularly to PKCS11_CloseAllSessions - is that consumers of NSS that do a lot of module unloading are going to be most likely to run into any issues from this change. That would be users unplugging smart cards, most likely.

I'll see if I can dig some up for testing.

J.C. Jones [:jcj] (he/they)

Assignee

Comment 22

•

5 years ago

Comment on attachment 9095024 [details]
Bug 1508776 - Remove unneeded refcounting from SFTKSession r?mt,kjacobs

Security Approval Request

How easily could an exploit be constructed based on the patch?: Not easily. This might be impossible to trigger.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Unknown
Which older supported branches are affected by this flaw?: All
If not all supported branches, which bug introduced the flaw?: None
Do you have backports for the affected branches?: Yes
If not, how different, hard to create, and risky will they be?: This would be easy to add to the NSS uplifts, but three days of testing in nightly might not be enough. Still, I don't know how this could be triggered (except at shutdown or by removing a smart card repeatedly), so I think maybe it could land in nightly and uplift later...maybe?
How likely is this patch to cause regressions; how much testing does it need?: It's unlikely to cause problems for the majority of Firefox users, but those using smart cards could see random issues. We're going to need some testing to feel confident in it before going to a wide audience.

Attachment #9095024 - Flags: sec-approval?

Al Billings [:abillings - ex-MoCo]

Comment 23

•

5 years ago

Comment on attachment 9095024 [details]
Bug 1508776 - Remove unneeded refcounting from SFTKSession r?mt,kjacobs

sec-approval+ for mozilla-central.

Attachment #9095024 - Flags: sec-approval? → sec-approval+

Al Billings [:abillings - ex-MoCo]

Updated

•

5 years ago

status-firefox67: affected → wontfix

status-firefox69: --- → wontfix

status-firefox70: --- → affected

status-firefox71: --- → affected

status-firefox-esr60: affected → wontfix

status-firefox-esr68: --- → affected

tracking-firefox70: --- → +

tracking-firefox71: --- → +

tracking-firefox-esr68: --- → 70+

J.C. Jones [:jcj] (he/they)

Assignee

Updated

•

5 years ago

Blocks: 1566873

Phabricator Automation

Updated

•

5 years ago

Attachment #9095363 - Attachment is obsolete: true

J.C. Jones [:jcj] (he/they)

Assignee

Comment 24

•

5 years ago

https://hg.mozilla.org/projects/nss/rev/5619cbbca3db5f583a888181ec024069a46087aa

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

Resolution: --- → FIXED

Target Milestone: --- → 3.47

J.C. Jones [:jcj] (he/they)

Assignee

Comment 25

•

5 years ago

I feel this bug is too risky to uplift into ESR or even probably Beta, and while it's a wildptr, there's no known exploit path. I'd like to not uplift this, and let it instead ride the trains.

Tom, would that be you now to give a blessing thus?

Flags: needinfo?(tom)

Tom Ritter [:tjr]

Comment 26

•

5 years ago

I think so; that plan sounds ok to me.

Flags: needinfo?(tom)

J.C. Jones [:jcj] (he/they)

Assignee

Comment 27

•

5 years ago

Thanks, Tom.

No longer blocks: 1566873

J.C. Jones [:jcj] (he/they)

Assignee

Updated

•

5 years ago

status-firefox70: affected → wontfix

status-firefox-esr68: affected → wontfix

Julien Cristau [:jcristau]

Comment 28

•

5 years ago

https://hg.mozilla.org/mozilla-central/rev/32eea6049fe700ff689a5bd7d11abe249ec0c610

status-firefox71: affected → fixed

tracking-firefox-esr68: 70+ → ---

Daniel Veditz [:dveditz]

Updated

•

5 years ago

Group: crypto-core-security → core-security-release

Brindusa Tot, Desktop QA

Updated

•

5 years ago

Flags: qe-verify-

Whiteboard: [post-critsmash-triage]

J.C. Jones [:jcj] (he/they)

Assignee

Comment 29

•

5 years ago

Can you assign a CVE to this, Tom? Thanks.

Flags: needinfo?(tom)

Tom Ritter [:tjr]

Updated

•

5 years ago

Alias: CVE-2019-11756

Flags: needinfo?(tom)

Tom Ritter [:tjr]

Updated

•

5 years ago

Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main71+]

Tom Ritter [:tjr]

Comment 30

•

5 years ago

Attached file advisory.txt — Details

Huzaifa Sidhpurwala

Comment 31

•

5 years ago

Since this flaw is already mentioned in the release notes at:
https://developer.mozilla.org/en-US/docs/Mozilla/Projects/NSS/NSS_3.47_release_notes

Can we please make this bug public?

Flags: needinfo?(jjones)

Mark Straver

Comment 32

•

5 years ago

Please do not rush making these bugs open to the public. We are still in the process of adjusting our code.

J.C. Jones [:jcj] (he/they)

Assignee

Comment 33

•

5 years ago

Marking security bugs as public is the responsibility of the security management team. Overall we are generally more conservative for NSS, and more conservative still for sec-high bugs, but the marking-public operation is done in bulk periodically based on bugqueries.

It there a specific reason you need this sec-high disclosed early, Huzaifa? The description of removing the locks is much less useful than the descriptions in this bug and within the patch for generating a PoC. The early opening would need to be justified to Tom Ritter, not me.

Flags: needinfo?(jjones) → needinfo?(huzaifas)

Huzaifa Sidhpurwala

Comment 34

•

5 years ago

(In reply to J.C. Jones [:jcj] (he/him) from comment #33)

Marking security bugs as public is the responsibility of the security management team. Overall we are generally more conservative for NSS, and more conservative still for sec-high bugs, but the marking-public operation is done in bulk periodically based on bugqueries.

It there a specific reason you need this sec-high disclosed early, Huzaifa? The description of removing the locks is much less useful than the descriptions in this bug and within the patch for generating a PoC. The early opening would need to be justified to Tom Ritter, not me.

Making the summary and the patch known while keeping the bug private just seems odd to me and hence the request. fwiw this bug is already disclosed via the patch and the entry in the release notes

Flags: needinfo?(huzaifas)

Mark Straver

Comment 35

•

5 years ago

(In reply to Huzaifa Sidhpurwala from comment #34)

Making the summary and the patch known while keeping the bug private just seems odd to me and hence the request. fwiw this bug is already disclosed via the patch and the entry in the release notes

Ah, but there is a distinct difference between having a summary (describing the generics of the bug) and code patch (for which efforts are made to not paint a bulls-eye on the vulnerability) public, and disclosing a bug that generally has a ready proof-of-concept and detailed discussion of the vulnerability and an analysis of exactly why it occurs.

From the former, it's extremely difficult to come to a way to exploit a vulnerability. From the latter, it'll be relatively (very) easy. This is also why tests, that expose a way to trigger the problem (similar to a PoC) tend to be landed (much) later than the patches for the problem.

While you are right that having the patch public might make it obvious where the problem lies, it won't hand anything on a silver platter; but it's required for Open Source development -- so this is the best compromise.

David Lawrence [:dkl]

Comment 36

•

5 years ago

Removing employee no longer with company from CC list of private bugs.

Daniel Veditz [:dveditz]

Updated

•

4 years ago

Group: core-security-release

Bug 1508776 - Remove unneeded refcounting from SFTKSession r?mt,kjacobs 5 years ago J.C. Jones [:jcj] (he/they) 47 bytes, text/x-phabricator-request	abillings : sec-approval+	Details \| Review
Bug 1577822 - land NSS e022f47937ca UPGRADE_NSS_RELEASE 5 years ago J.C. Jones [:jcj] (he/they) 47 bytes, text/x-phabricator-request		Details \| Review
advisory.txt 5 years ago Tom Ritter [:tjr] 182 bytes, text/plain		Details