Closed Bug 1607449 Opened 5 years ago Closed 4 years ago

ThreadSanitizer: data race [@ fill_CERTCertificateFields] vs. [@ CERT_DestroyCertificate] with potential use-after-free

Categories

(NSS :: Libraries, defect, P1)

x86_64
Linux
defect

Tracking

(firefox-esr68 wontfix, firefox-esr78 wontfix, firefox72 wontfix, firefox73 wontfix, firefox74 wontfix, firefox82 wontfix, firefox83 wontfix, firefox84 fixed)

RESOLVED FIXED

People

(Reporter: decoder, Assigned: kjacobs)

References

(Blocks 1 open bug)

Details

(Keywords: csectype-race, sec-moderate, Whiteboard: [post-critsmash-triage][adv-main84+r])

Crash Data

Attachments

(3 files, 1 obsolete file)

The attached crash information was detected while running CI tests with ThreadSanitizer on mozilla-central revision bc5880b621d5.

In this race it looks like one thread is about to destroy an NSSCertificate while the main thread is still working on it. Filing this as a security bug because this can lead to a potential use-after-free. I don't know who exactly is responsible for the proper synchronization, but since PSM seems to be the owner of the freeing thread, filing it here.
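
For illustration only (not NSS code): a minimal, deliberately racy reduction of the pattern TSan is flagging here -- one thread still filling in fields of a shared certificate-like object while another thread frees it. All names in the sketch are hypothetical stand-ins; building with -fsanitize=thread produces a report of the same shape.

  /* Hypothetical stand-in, NOT NSS code: reduces the reported race to its shape.
   * Thread A keeps filling in fields of a shared object (like
   * fill_CERTCertificateFields), while thread B frees it (like
   * CERT_DestroyCertificate). Compile with:
   *   cc -g -fsanitize=thread race.c -lpthread
   */
  #include <pthread.h>
  #include <stdlib.h>
  #include <string.h>

  struct fake_cert {          /* stand-in for CERTCertificate */
    char subject[64];
    int trust;
  };

  static struct fake_cert *shared;   /* unsynchronized shared pointer */

  static void *filler(void *arg) {
    (void)arg;
    /* races with free() below: no lock held, no reference held */
    strcpy(shared->subject, "CN=example");
    shared->trust = 1;
    return NULL;
  }

  static void *destroyer(void *arg) {
    (void)arg;
    free(shared);                    /* races with the writes above */
    return NULL;
  }

  int main(void) {
    pthread_t a, b;
    shared = calloc(1, sizeof(*shared));
    pthread_create(&a, NULL, filler, NULL);
    pthread_create(&b, NULL, destroyer, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
  }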

General information about TSan reports

Why fix races?

Data races are undefined behavior and can cause crashes as well as correctness issues. Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.

Rating

If you think this race can cause crashes or correctness issues, it would be great to rate the bug appropriately as P1/P2 and/or to note this in the bug. That makes it much easier for us to assess the actual impact of these reports and whether they are helpful to you.

False Positives / Benign Races

Typically, races reported by TSan are not false positives [1], but it is possible that the race is benign. Even in this case it would be nice to come up with a fix, provided it is easily doable and does not regress performance. Every race that we cannot fix has to remain on the suppression list, which slows down overall TSan performance. Also note that seemingly benign races can still be harmful (depending on the compiler, the optimizations, and the architecture) [2][3].

[1] One major exception is the involvement of uninstrumented code from third-party libraries.
[2] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[3] How to miscompile programs with "benign" data races: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf

Suppressing unfixable races

If the bug cannot be fixed, then a runtime suppression needs to be added in mozglue/build/TsanOptions.cpp. Suppressions match on the full stack, so the suppressed frame should be chosen such that it is unique to this particular race. This bug's number should also be included, so we have some documentation on why the suppression was added.
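
For reference, a sketch of what such an entry might look like, assuming TsanOptions.cpp still builds its suppression list out of concatenated "race:<frame>" strings (the exact surrounding code in that file may differ):

  /* Sketch only -- the surrounding structure of mozglue/build/TsanOptions.cpp
   * may differ. The key parts are the documented "race:<frame>" TSan
   * suppression syntax, a frame unique to this report, and a pointer back to
   * this bug. */

    // Bug 1607449 - data race between fill_CERTCertificateFields and
    // CERT_DestroyCertificate. Remove once the NSS fix lands.
    "race:fill_CERTCertificateFields\n"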

Group: core-security → crypto-core-security
Assignee: nobody → nobody
Component: Security: PSM → Libraries
Product: Core → NSS
QA Contact: jjones
Version: Trunk → other

Kevin, please triage.

Flags: needinfo?(kjacobs.bugzilla)

Could be sec-high if this race could be forced to happen reliably in practice.

This race involves two simultaneous PSM operations:

  1. Fetching all client auth certificates, using ClientAuthDataRunnable in nsNSSIOLayer.cpp, and
  2. Asynchronously verifying the server certificate that we're connected to using SSLServerCertVerificationJob in SSLServerCertVerification.cpp, which is the start of the NSS callback that ultimately runs mozilla::pkix

From inspection, it appears to me that the obvious way to trigger this would be to connect to a server that is using a certificate which Firefox also has configured as a client cert (and holds the private key for). The client auth code might choose the server cert from the cache, which then gets deleted when moz::pkix finishes.

Unfortunately, this happens early enough in cert verification that the server certificate doesn't actually need to be trusted in any way. It does, however, require Firefox to be configured with such a cert in the first place, and I don't believe it's possible to set a server cert for WebRTC, which would be the obvious way to meet these minimum requirements.

Thanks, I didn't get a chance to dig into this until today.

Could an attacker interrogate the client for a legitimate client cert in one connection, then subsequently present it as a server cert (that will fail SSLServerCertVerificationJob, but still trigger the free)? It appears so, as long as https://searchfox.org/mozilla-central/source/security/nss/lib/pki/tdcache.c#718 can be fooled.

Edit: For the first connection you'd need a legitimate server cert chain (since we don't send the client cert until verification is complete), or simply a captured TLS 1.2 handshake...

Flags: needinfo?(kjacobs.bugzilla)

The priority flag is not set for this bug.
:jcj, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jjones)

Something doesn't mesh here. The TSan log shows the server-auth callback running at the same time as the client-cert search; how can that be if they're serial in nature?

Flags: needinfo?(jjones) → needinfo?(kjacobs.bugzilla)

We prevent sending of the client cert until the server chain has been verified. So the Edit only applies to getting a client cert to later reflect and trigger the UAF.

Flags: needinfo?(kjacobs.bugzilla)

The priority flag is not set for this bug.
:jcj, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jjones)

After further conversation, I understand Kevin's comment 8 about how one would go about writing an exploit to reach this UAF. It's possible that this could be provoked in a straightforward manner, which would warrant promotion to sec-high.

Assigning to Kevin.

Dana, we need to talk through how client cert caching works -- and whether we need to use the cache at all there, because that would be the most straightforward fix.

Assignee: nobody → kjacobs.bugzilla
Status: NEW → ASSIGNED
Flags: needinfo?(jjones) → needinfo?(dkeeler)
Priority: -- → P1

I had questioned why these use cases were sharing a single cache (as it would seem to provide very little benefit, if any). Looking back into it though, it makes sense from an NSS perspective and isn't easily changed from PSM like I imagined it might be.

Dana, please correct me if you have a different opinion, but I think NSS just needs to synchronize accesses to these fields.

This cache is entirely an NSS mechanism, so I'm not sure where PSM would come into it.
That said, what is this cache speeding up? Certificate decoding? Perhaps we could remove it entirely.
At the very least, if we need to lock e.g. the trust field as in https://searchfox.org/mozilla-central/rev/ad6234148892bc29bf08d736604ac71925138040/security/nss/lib/pk11wrap/pk11cert.c#381-383, it makes sense we would need to lock other fields as well (incidentally, I noticed that NSS currently uses a global lock for that field, which probably can be improved upon).
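
A minimal sketch of the kind of synchronization being discussed, assuming a single NSPR lock guarding the lazily-filled CERTCertificate fields (analogous to the existing global trust lock). All names below are hypothetical illustrations, not the actual NSS fix:

  /* Sketch only, with hypothetical names -- not the actual NSS change.
   * The idea: bracket every access to the lazily-populated fields with the
   * same lock, so fill_CERTCertificateFields cannot race with readers or
   * with teardown. */
  #include "prlock.h"
  #include "cert.h"

  static PRLock *cert_fields_lock;   /* hypothetical; created during init */

  void hypothetical_init(void) {
    cert_fields_lock = PR_NewLock();
  }

  /* writer side, e.g. inside fill_CERTCertificateFields */
  void hypothetical_fill(CERTCertificate *cc) {
    PR_Lock(cert_fields_lock);
    /* ... populate cc's derived fields here ... */
    PR_Unlock(cert_fields_lock);
  }

  /* Any reader -- and the destruction path -- would take the same lock
   * before touching those fields. */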

Flags: needinfo?(dkeeler)

What specific test triggered this race? I wrote a test for the more trivial exploit, but only managed to trigger bug 1615569.

Flags: needinfo?(choller)

(In reply to Kevin Jacobs [:kjacobs] from comment #13)

What specific test triggered this race? I wrote a test for the more trivial exploit, but only managed to trigger bug 1615569.

As far as I can tell, this happened in or around dom/base/test/test_bug466080.html. The problem is that TSan was running without halt_on_error at the time (to speed up the try runs), so it could also have been one of the tests right before that. The race was also intermittent; I only discovered it pretty late.

If that is not sufficient, we could remove the suppression for a try run and retrigger the relevant test chunks a few times to get more precise information.

Flags: needinfo?(choller)

This has already recurred on CI and was filed under bug 1673960.

I need to revisit my patch w.r.t. Dana's concern in https://phabricator.services.mozilla.com/D64233#1963138. FWIW, my [local] TSAN also stopped reporting this issue shortly after I made that patch.

I'll re-prioritize this, but depending on the report frequency, we might want to re-suppress this one.

Yes, absolutely, just revert my patch in the meantime.

Still an issue, just rare.

Attachment #9184023 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.59
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]

I can't tell if this issue is fixed, and should be included in the internally reported advisories, or just suppressed...

Flags: needinfo?(kjacobs.bugzilla)
Flags: needinfo?(kjacobs.bugzilla)
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main84+r]
Group: core-security-release
