SQL backend does not reload certificates

RESOLVED FIXED in 3.18

Status

RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: elio.maldonado.batiz, Assigned: elio.maldonado.batiz)

Tracking

3.15.1
3.18
x86_64
Linux

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

5 years ago
Originally reported on Red Hat Enterprise Linux.
 Pavel Moravec 2013-05-10 04:09:38 EDT

Description of problem:
SQL backend of NSS should allow reloading certificate database without the need to restart the app using it.

But using reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=576536#c27, it does not seem work.


Version-Release number of selected component (if applicable):
nss-3.14.0.0-12.el6.x86_64


How reproducible:
100%


Steps to Reproduce:
1. Install qpid broker / MRG-M, e.g. from http://download.devel.redhat.com/brewroot/////packages/qpid-cpp/0.18/14.el6/x86_64/
2. PATH=$PATH:/opt/rh-qpid/clients
(that is for path to qpid-send)
3. Follow reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=576536#c27.

  
Actual results:
"./create_test_certificates.sh no yes" fails with:

testing if SSL works (should work):
qpid-send: Failed to connect (reconnect disabled)
1


Expected results:
"./create_test_certificates.sh no yes" should return:

testing if SSL works (should work):
0


Additional info:
- Let me know for any issues with setting up qpid.
- Also let me know if you need to point to qpid source code calling the NSS routines.
- Mareček Aleš (alich) from QE has internal reproducer as well

 Bob Relyea 2013-06-27 18:16:48 EDT

So after looking at the test program I see two things:

1) If what I see trying to be done is what I think your trying to do (include each trusted client auth cert individually), then I can only hope this is not a Red Hat product doing this. It's not really a good sustainable support module.. but...

2) What you are trying to do should work. NSS doesn't 'load' databases, but it searches them. You are using the sql lite DB, which is the correct thing. I can only guess that we are running into a problem with the certificate being cached, so that the trust record isn't being updated correctly on the cert. This may require some surgery to the NSS cert cache.

We may need to flush the SSL SID cache in the server to handle this kind of update.

---
Playing with the test program I see the following:

1) We are always getting an update when we take trust away. (that is the remove cert case is always working correctly).

2) If we remove the cert and add it back without trying to reference the server, then everything works.

3) starting with the cert removed, adding it, then referencing the server works.

4) starting with the cert removed, referencing the server, then adding the cert does not work.

That seems to tell me that we may have a certificate reference leak in the failed client auth case. This could be in NSS, or it could be in the application. Unfortunately I'm having trouble running the tests under a debug version of NSS. Once I solve that, I should be able to quickly determine where the reference leak is... so this is looking more like just a plain bug rather than and RFE.

bob
----
skipping some comments
----
 Bob Relyea 2013-07-02 15:54:17 EDT

OK, I've found the issue. We're caching the 'issuer' certs in the CRL cache. It will take some surgery to completely fix up the CRL cache (it shouldn't be holding on to cert references long term). In the mean time, there are some things we can do to deal with the existing issue:

1) if the client certs are issued from a root CA, you can still directly trust each client cert separately. The problem we are running into is the client certs are self-signed so they they look like their own issuers.

2) I can supply a patch to elio that would skip the CRL check if the issuer and subject are the same, keeping the self-signed leaf certs out of the CRL cache.

Doing either does not fix any issues updating the trust on root certs that have been used in validations.

bob

Created attachment 767986 [details] [diff] [review]
don't hold issuer certs handles in the CRL cache.

Elio: here's a patch that fixes the problem generally (including updates to CAs and itermediate certificates).

Please make some test builds for RHEL 6.5 with this patch.

QA: This patch is a medium risk. The changes are relatively contained, and would most likely affect CRL processing is there is no problem. I'm running the NSS upstream tests right now. The should turn up any serious issues, but this patch probably warrents running some regression tests.

bob

 Bob Relyea 2013-07-02 19:59:36 EDT

Created attachment 767992 [details] [diff] [review]
don't hold issuer certs handles in the CRL cache.

This one works better. (passes the NSS upstream tests).
(Assignee)

Comment 1

5 years ago
I will attach a patch suitably modified for nss-3.15.2 or higher.
(Assignee)

Updated

5 years ago
Assignee: nobody → emaldona

Comment 2

5 years ago
When working with trust, we (Chrome) ended up finding a number of thread safety issues when NSS was used from a multi-threaded application and cert trust changed. I'm really nervous about changing certs' trust records on the fly - we ended up going a completely different route (using libpkix's additional trust anchors) because of this.

I'm also nervous of situations where we might start failing (eg: if a network drive is disconnected/interrupted ) where otherwise we succeeded (eg: because NSS cached the slot contents on init and never registers slot changed events when the underlying DB changes)

Comment 3

5 years ago
This patch doesn't change the in memory certs trust flags. The patch keeps the CRL cache from hanging onto the certificate handle so it could see new trust when the database changes. Unless Chrome is using CRLs, the patch wouldn't affect you.
(Assignee)

Comment 4

4 years ago
(In reply to Elio Maldonado from comment #0)
> Created attachment 767986 [details] [diff] [review]
> don't hold issuer certs handles in the CRL cache.

Hmm, this points to one of the attachments for Bug 887483. I'll fix this.
(Assignee)

Comment 5

4 years ago
Created attachment 8547253 [details] [diff] [review]
dont hold issuer cert handles in crl cache

Bob's patch adapted to current sources.
(Assignee)

Updated

4 years ago
Attachment #8547253 - Flags: review?(ryan.sleevi)

Comment 6

4 years ago
Comment on attachment 8547253 [details] [diff] [review]
dont hold issuer cert handles in crl cache

Review of attachment 8547253 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/certdb/certi.h
@@ +117,5 @@
>      PRLock* lock;
>  #endif
> +    SECItem *issuerDERCert;    /* issuer DER cert. Don't hold a reference
> +				  to the actual cert so the trust can be
> +				  updated on the cert automatically.

This comment seems superfluous / recoverable via source control. It doesn't seem to add value (ditto that on line 180)

@@ +176,5 @@
>         which gets created at the same time as this object */
>      NSSRWLock* lock;
>      CRLDPCache** dps;
>      PLHashTable* distributionpoints;
> +    CERTCertificate* issuer; /* This should be the DER Cert, not a cert handle */

Since this is #ifdef'd out, why don't you fix it now? Or rename it?

Or better yet, entirely remove it, since it is ifdef'd out.

::: lib/certdb/crl.c
@@ +1940,3 @@
>          {
>              dirty = PR_TRUE;
> +	    cache->dbHandle = issuer->dbhandle;

DANGER? This doesn't seem to be safe. TrustDomain is ref-counted and tied to the associated token. If NSSTrustDomain_Destroy is called, things will go bad.

Now, NSSTrustDomain is never called (except on STAN shutdown), but this definitely seems to be breaking some API rules.

The less-dangerous alternative that follows the API conventions, but violates user expectations, is to forcibly use the default domain. Given how much code in NSS does this, this seems more consistent / less dangerous?

::: tests/chains/chains.sh
@@ +973,5 @@
>      #   URI: "http://ocsp.server:2601"
>      OCSP_HOST=$(${BINDIR}/pp -w -t certificate -i ${CERT_FILE} | grep URI | sed "s/.*:\/\///" | sed "s/:.*//")
>      OCSP_PORT=$(${BINDIR}/pp -w -t certificate -i ${CERT_FILE} | grep URI | sed "s/^.*:.*:\/\/.*:\([0-9]*\).*$/\1/")
>  
> +    echo "Cert = ${CERT_NICK}.cert"

Is this necessary?
Attachment #8547253 - Flags: review?(ryan.sleevi) → review-
(Assignee)

Comment 7

4 years ago
Thank you Ryan for the prompt review. I'll let Bob reply as he's the author of the patch.
Flags: needinfo?(rrelyea)

Comment 8

4 years ago
> This comment seems superfluous / recoverable via source control.

I've taken to stating the obvious, because I've found what is obvious to me isn't always obvious to others. I'd rather error on the side of more than less information (as long as it isn't wrong).

> DANGER? This doesn't seem to be safe. TrustDomain is ref-counted and tied to the associated token.
>  If NSSTrustDomain_Destroy is called, things will go bad.

Hmm, this is a general issue. GetDefaultCERTDB always returns an non-referenced CERTCertDBHandle. The handle in the cert doesn't have a reference The old CERTCertDBHandle was not reference counted, so any code that lives in that space doesn't treat it as such. We'll probably have to handle that by creating a CERTCertDBHandle structure which holds a reference to the trustdomain. Rather than trying to fix all the uses inside (and outside) of NSS. Currently both structures are opaque.

> Since this is #ifdef'd out, why don't you fix it now? Or rename it?
> Or better yet, entirely remove it, since it is ifdef'd out.

Probably should just remove it. I'm sure we would have to look more carefully at the code if we need to add locking anyway.

> Is this necessary?

No

bob
Flags: needinfo?(rrelyea)

Comment 9

4 years ago
> The less-dangerous alternative that follows the API conventions, but violates user expectations,
> is to forcibly use the default domain. Given how much code in NSS does this, this seems more
> consistent / less dangerous?

There isn't a consistent use. The expected use is to include a certdb. At some point there will need to be an effort to remove, limit the remaining use of forcibly getting the default trust domain. We are running into more an more applications that want to use separate trust domains, so no new NSS code should use getdefaultCertDB (spelling obviously wrong).

bob
(Assignee)

Comment 10

4 years ago
Created attachment 8552145 [details] [diff] [review]
don't hold issuer cert handles in crl cache - V2
Attachment #8547253 - Attachment is obsolete: true
Attachment #8552145 - Flags: review?(ryan.sleevi)

Comment 11

4 years ago
Comment on attachment 8552145 [details] [diff] [review]
don't hold issuer cert handles in crl cache - V2

Review of attachment 8552145 [details] [diff] [review]:
-----------------------------------------------------------------

I'm adding r+, simply because I'm not confident my complaint about the comment wouldn't be overridden. I've added my grump below, but beyond that, the code is functionally sane - even if our handling of DB handles is somewhat insane/misleading :/

::: lib/certdb/certi.h
@@ +117,5 @@
>      PRLock* lock;
>  #endif
> +    SECItem *issuerDERCert;    /* issuer DER cert. Don't hold a reference
> +				  to the actual cert so the trust can be
> +				  updated on the cert automatically.

I still don't feel this comment adds any meaningful clarity to the code. I'm doing this review, having read the bug, and I'm still confused as to what this comment means.

I understand the desire to be expressive, but I don't think that the necessary context can be easily and succinctly provided, short of back-referencing to this bug. Given that the commit will do so, and a blame would be the first thing I always do when wondering "why is this code this way", I feel even a comment "See this bug X to understand why we do this" would be superfluous.

I am *all* for comments, but I don't think this helps much.
Attachment #8552145 - Flags: review?(ryan.sleevi) → review+
(Assignee)

Updated

4 years ago
Target Milestone: --- → 3.18
(Assignee)

Comment 12

4 years ago
Pushed https://hg.mozilla.org/projects/nss/rev/ac277ab60e8f
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.