Last Comment Bug 479393 - (psm-pkix) Add libpkix-based certificate validation to PSM (off by default)
(psm-pkix)
: Add libpkix-based certificate validation to PSM (off by default)
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Security: PSM (show other bugs)
: Trunk
: All All
: -- enhancement with 3 votes (vote)
: mozilla6
Assigned To: Kai Engert (:kaie)
:
:
Mentors:
: 532158 (view as bug list)
Depends on: 420991 480706 483963 485155 485658 642148 647902 647923 670454 775827
Blocks: 611781 399324 542674 624514 pkix-default
  Show dependency treegraph
 
Reported: 2009-02-19 23:26 PST by Wan-Teh Chang
Modified: 2012-07-19 19:59 PDT (History)
26 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-


Attachments
Proposed patch (1.00 KB, patch)
2009-02-19 23:26 PST, Wan-Teh Chang
kaie: review+
Details | Diff | Splinter Review
performance comparison numbers (74.33 KB, text/plain)
2010-08-23 06:44 PDT, Kai Engert (:kaie)
no flags Details
performance comparison chart (57.83 KB, image/png)
2010-08-23 06:45 PDT, Kai Engert (:kaie)
no flags Details
patch: work in progress (59.84 KB, patch)
2011-04-04 16:38 PDT, Kai Engert (:kaie)
no flags Details | Diff | Splinter Review
Patch: work in progress (52.14 KB, patch)
2011-04-05 15:01 PDT, Kai Engert (:kaie)
no flags Details | Diff | Splinter Review
Patch v4 (49.14 KB, patch)
2011-04-05 15:51 PDT, Kai Engert (:kaie)
no flags Details | Diff | Splinter Review
Patch v5 (48.94 KB, patch)
2011-04-07 23:38 PDT, Kai Engert (:kaie)
brian: review-
rrelyea: review+
Details | Diff | Splinter Review
Patch v6 (51.36 KB, patch)
2011-04-26 08:36 PDT, Kai Engert (:kaie)
brian: review-
Details | Diff | Splinter Review
Patch v7 (51.75 KB, patch)
2011-04-28 09:07 PDT, Kai Engert (:kaie)
brian: review-
Details | Diff | Splinter Review
Patch v8 (54.48 KB, patch)
2011-05-02 11:23 PDT, Kai Engert (:kaie)
kaie: review+
Details | Diff | Splinter Review
Patch v8 - addon a - moves two functions (1.92 KB, patch)
2011-05-02 11:24 PDT, Kai Engert (:kaie)
kaie: review+
Details | Diff | Splinter Review
Patch v8 - addon b - whitespace cleanup (13.40 KB, patch)
2011-05-02 11:26 PDT, Kai Engert (:kaie)
kaie: review+
Details | Diff | Splinter Review
Incremental patch v9 (670 bytes, patch)
2011-05-03 11:43 PDT, Kai Engert (:kaie)
brian: review+
Details | Diff | Splinter Review

Description Wan-Teh Chang 2009-02-19 23:26:08 PST
Created attachment 363270 [details] [diff] [review]
Proposed patch

PSM should call CERT_SetUsePKIXForValidation(PR_TRUE) after
initializing NSS to make the old NSS cert validation APIs
use the new libpkix cert validation engine internally.
Comment 1 Nelson Bolyard (seldom reads bugmail) 2009-02-25 00:25:52 PST
In bug 479508 comment 2, Rob wrote:

> Even with Wan-Teh's patch from bug #479393 added to the mix, I still see the
> 20-cert-long Certificate Hierarchy.

Yes, that is expected.  As I wrote in bug 479508 comment 1, PSM uses the old
code for constructing the chain displayed by Certificate Manager.  That is 
unaffected by this patch.  This patch affects only cert chain validation,
not display.  I believe that additional PSM code changes will be necessary to 
change the chain displayed by the Cert Manager.
Comment 2 Kai Engert (:kaie) 2009-02-25 09:34:27 PST
(In reply to comment #1)
> 
> Yes, that is expected.  As I wrote in bug 479508 comment 1, PSM uses the old
> code for constructing the chain displayed by Certificate Manager.  That is 
> unaffected by this patch.  This patch affects only cert chain validation,
> not display.  I believe that additional PSM code changes will be necessary to 
> change the chain displayed by the Cert Manager.

We need to file a bug to request this change.
Comment 3 Kai Engert (:kaie) 2009-02-25 09:35:51 PST
Comment on attachment 363270 [details] [diff] [review]
Proposed patch

r=kaie

Should landing of this patch be blocked by getting the more complicated chain display work done?
Comment 4 Nelson Bolyard (seldom reads bugmail) 2009-02-25 15:23:16 PST
(In reply to Kai's question in comment #3)
> Should landing of this patch be blocked by getting the more complicated 
> chain display work done?

I say no, it should not be blocked.  I think that changing the chain display
may be a lot of work.  It may necessitate some changes to NSS first.  
Kai, if there's a bug/RFE for changing Cert Mgr's chain display, please 
CC me on it.  If not, then please file such an RFE, and CC me.
I will add a discussion there about the many issues of getting a chain 
"the right way".
Comment 5 Wan-Teh Chang 2009-02-25 17:54:39 PST
I just pushed changeset 209a7cc3150e to mozilla-central
for Firefox 3.2 (trunk) development:
http://hg.mozilla.org/mozilla-central/rev/209a7cc3150e
Comment 6 Dave Camp (:dcamp) 2009-02-25 21:52:51 PST
Looks like this was causing test failures:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1235616516.1235622320.13307.gz

Backed out: http://hg.mozilla.org/mozilla-central/rev/939103e7a4ab
Comment 7 Wan-Teh Chang 2009-02-26 20:45:08 PST
Honza,

One of the tests failures caused by my checkin is the test for
bug 413909 (test_bug413909.html) that you recently added.  The
message from the Tinderbox log file is:

*** 592 INFO Running chrome://mochikit/content/chrome/security/ssl/test_bug413909.html...
*** 593 ERROR TEST-UNEXPECTED-FAIL | chrome://mochikit/content/chrome/security/ssl/test_bug413909.html | Secure page did not load, adding exception failed?

I don't know how to run that test.  Could you help us debug
this test failure?  You'll need to apply my patch (attachment
363270 [review]) because it has been backed out.  Thanks.
Comment 8 Wan-Teh Chang 2009-02-26 20:57:04 PST
Dave,

The other test failures caused by my checkin are all related
to the test xpinstall/tests/browser_signed_untrusted.js that
you recently added in bug 474763.  Could you shed some light
on what these test failures mean?  Here are the messages from
the Tinderbox log file:

chrome://mochikit/content/browser/xpinstall/tests/browser_signed_untrusted.js
TEST-PASS | chrome://mochikit/content/browser/xpinstall/tests/browser_signed_untrusted.js | Should only be 1 item listed in the confirmation dialog
TEST-PASS | chrome://mochikit/content/browser/xpinstall/tests/browser_signed_untrusted.js | Should have seen the name from the trigger list
TEST-PASS | chrome://mochikit/content/browser/xpinstall/tests/browser_signed_untrusted.js | Should have listed the correct url for the item
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/xpinstall/tests/browser_signed_untrusted.js | Should have seen the supposed signer - Got (Author not verified), expected (Unknown Organisation)
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/xpinstall/tests/browser_signed_untrusted.js | Should have listed the item as signed - Got false, expected true
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/xpinstall/tests/browser_signed_untrusted.js | Install should fail - Got 0, expected -260
Comment 9 Honza Bambas (:mayhemer) 2009-02-27 07:34:00 PST
(In reply to comment #7)
> Honza,
> 
> One of the tests failures caused by my checkin is the test for
> bug 413909 (test_bug413909.html) that you recently added.  The
> message from the Tinderbox log file is:
> 
> *** 592 INFO Running
> chrome://mochikit/content/chrome/security/ssl/test_bug413909.html...
> *** 593 ERROR TEST-UNEXPECTED-FAIL |
> chrome://mochikit/content/chrome/security/ssl/test_bug413909.html | Secure page
> did not load, adding exception failed?
> 

I get ###!!! ASSERTION: why did NSS call our bad cert handler if all looks good? Let's cancel the connection: 'Not Reached', f
ile d:/mozilla/mozilla-central/security/manager/ssl/src/nsNSSIOLayer.cpp, line 2997

In cert_VerifyCertChainPkix call to PKIX_BuildChain fails, result is SECFailure but the log is empty and IOLayer believes there is no problem with the certificate.

This is dump of the self-signed server cert for which it fails (build\pgo\certs database, bug413909cert nickname):

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 1 (0x1)
        Signature Algorithm: PKCS #1 SHA-1 With RSA Encryption
        Issuer: "CN=bug413909.xn--hxajbheg2az3al.xn--jxalpdlp"
        Validity:
            Not Before: Mon May 26 20:40:36 2008
            Not After : Sat May 26 20:40:36 2018
        Subject: "CN=bug413909.xn--hxajbheg2az3al.xn--jxalpdlp"
        Subject Public Key Info:
            Public Key Algorithm: PKCS #1 RSA Encryption
            RSA Public Key:
                Modulus:
                    [snip]
                Exponent: 65537 (0x10001)
        Signed Extensions:
            Name: Certificate Subject Alt Name
            RFC822 Name: "bug413909@testing.mozilla.org">bug413909@testing.mozilla.org"

    Signature Algorithm: PKCS #1 SHA-1 With RSA Encryption
    Signature:
        [snip]
    Fingerprint (MD5):
        18:24:64:40:07:38:67:C6:6D:03:D5:C6:BD:27:7A:D9
    Fingerprint (SHA1):
        A4:76:7C:1A:E2:81:CC:14:0D:36:F3:95:DC:F5:87:E4:A1:2D:69:59

    Certificate Trust Flags:
        SSL Flags:
            User
        Email Flags:
            User
        Object Signing Flags:
            User

I will take a deeper look at this.
Comment 10 Nelson Bolyard (seldom reads bugmail) 2009-02-27 07:45:51 PST
(In reply to comment #9)

> In cert_VerifyCertChainPkix call to PKIX_BuildChain fails, result is 
> SECFailure but the log is empty and IOLayer believes there is no problem 
> with the certificate.

I think I just reviewed a patch that I think addresses that very issue
(PKIX_BuildChain returns SECFailure but the log is empty).  It was
Attachment 363348 [details] [diff] for bug 444404.  Wanna try that patch and see if it 
fixes this problem?
Comment 11 Honza Bambas (:mayhemer) 2009-02-27 07:57:05 PST
Unfortunatelly I could not apply it on trunk (even latest). All 3 hunks fail.
Comment 12 Honza Bambas (:mayhemer) 2009-02-27 07:58:04 PST
Ups, that's NSS CVS trunk patch, sorry :)
Comment 13 Honza Bambas (:mayhemer) 2009-02-27 09:43:13 PST
So, it doesn't help. Log is still empty. Does your patch work also for self signed certs w/o any CRLDP and OCSP extensions?
Comment 14 Dave Townsend [:mossop] 2009-02-27 09:57:26 PST
(In reply to comment #8)
> Dave,
> 
> The other test failures caused by my checkin are all related
> to the test xpinstall/tests/browser_signed_untrusted.js that
> you recently added in bug 474763.  Could you shed some light
> on what these test failures mean?  Here are the messages from
> the Tinderbox log file:

It looks like this has changed the behaviour of installing a broken signed xpi. Previously when an xpi was signed with a cert from an unknown CA the install process would detect and display the certificate when checking whether to install, and then the install would fail because the full certificate check would spot that it was untrusted. The new behaviour appears to be to treat the xpi as unsigned and just install.

In a way this is an ok thing to change, since it no longer tells the user that the xpi is signed there is no reason for the install to fail, Dan what are your thoughts?

The code that does the initial signature check which is where the change is happening is here: http://mxr.mozilla.org/mozilla-central/source/xpinstall/src/CertReader.cpp#254
Comment 15 Honza Bambas (:mayhemer) 2009-02-27 10:17:37 PST
(In reply to comment #13)
> So, it doesn't help. Log is still empty. Does your patch work also for self
> signed certs w/o any CRLDP and OCSP extensions?

certsFound list in pkix_Build_GatherCerts is empty after the 'while (state->certStoreIndex < state->buildConstants.numCertStores)' cycle. Then in pkix_BuildForwardDepthFirstSearch state->status goes to BUILD_TRYAIA and then BUILD_AIAPENDING. BUILD_AIAPENDING branch is not executed because 'state->buildConstants.aiaMgr' is null. Cycle ends and and result is pkixErrorCode = PKIX_SECERRORUNKNOWNISSUER. In cert_VerifyCertChainPkix we then get error with errCode = PKIX_NULLARGUMENT.
Comment 16 Nelson Bolyard (seldom reads bugmail) 2009-02-27 18:40:20 PST
Honza, please file a bug against NSS about the issues you documented in 
comment 15.  Assign it to Alexei.volkov, p1.  Make it block this bug.
Comment 17 Honza Bambas (:mayhemer) 2009-02-28 05:11:51 PST
(In reply to comment #16)
> Honza, please file a bug against NSS about the issues you documented in 
> comment 15.  Assign it to Alexei.volkov, p1.  Make it block this bug.

Filled bug 480706.
Comment 18 Kai Engert (:kaie) 2009-03-05 08:07:13 PST
(In reply to comment #4)
> I think that changing the chain display
> may be a lot of work.  It may necessitate some changes to NSS first.  
> Kai, if there's a bug/RFE for changing Cert Mgr's chain display, please 
> CC me on it.  If not, then please file such an RFE, and CC me.
> I will add a discussion there about the many issues of getting a chain 
> "the right way".

Filed bug 481565
Comment 19 Wan-Teh Chang 2009-03-05 08:45:48 PST
The bug number should be bug 481656.
Comment 20 Honza Bambas (:mayhemer) 2009-04-08 09:27:58 PDT
Looks like the test failure (bug 413909) is gone when I apply the patch to todays trunk. Something has changed?
Comment 21 Wan-Teh Chang 2009-04-08 11:22:38 PDT
Kai pushed NSS 3.12.3 to mozilla-central on Monday:
http://hg.mozilla.org/mozilla-central/rev/e4c0eed67bf9
Comment 22 Wan-Teh Chang 2009-04-20 15:04:24 PDT
I don't have time to work on this because this is much more
work than I expected.  Sorry.
Comment 23 Honza Bambas (:mayhemer) 2009-04-21 01:54:14 PDT
I could potentially take a look at this.
Comment 24 Wan-Teh Chang 2009-04-21 07:37:46 PDT
Honza, thanks for your help.
Comment 25 Nelson Bolyard (seldom reads bugmail) 2009-07-01 11:52:31 PDT
This bug suggests a specific solution to a problem for which there are 
multiple solutions.  So, I am changing the bug title to describe the issue
differently.  The issue is that FF/PSM needs to use a full RFC 3280 compliant
cert path building and cert path validation algorithm for ALL its certificate
validations, not only for EV cert validations.  NSS has a new library (new in
NSS 3.12) named libPKIX.  There are several ways to use libPKIX.  One is to 
call CERT_SetUsePKIXForValidation(PR_TRUE).  Another is to replace all PSM's 
calls to CERT_VerifyCert or CERT_VerifyCertificate with calls to CERT_PKIXVerifyCert.
Comment 26 Kai Engert (:kaie) 2010-08-20 14:35:27 PDT
*** Bug 532158 has been marked as a duplicate of this bug. ***
Comment 27 Kai Engert (:kaie) 2010-08-20 14:39:35 PDT
In bug 532158 Dan Veditz proposed that this issue should block2.0

Dan said:

"If we're going to make a change like this we should do so in a major release
like Firefox 4. PKIX gets us all kinds of good things like enforcing name
constraints, and support for cross-signing of certs through which Mozilla could
impose additional constraints on CAs.

There is probably some cost to this. How tested is the code? (it's been around
a while, but I don't know if it has been exposed to a broad range of real-world
certs.) Does it have significantly different performance than the old code?

If we're going to do it it's already getting late."


Based on comment 25 I've marked bug 532158 as a duplicate of this one.
Comment 28 Kai Engert (:kaie) 2010-08-20 15:13:37 PDT
(In reply to comment #27)
> 
> If we're going to make a change like this we should do so in a major release
> like Firefox 4. PKIX gets us all kinds of good things like enforcing name
> constraints, and support for cross-signing of certs through which Mozilla could
> impose additional constraints on CAs.

Good point.

It would also be necessary to fetch missing intermediate CA certs (if we want that, see bug 399324).


> There is probably some cost to this. How tested is the code? (it's been around
> a while, but I don't know if it has been exposed to a broad range of real-world
> certs.)

The history of this bug shows that there might be many traps we could run into. It turned out to be trickier than expected, and nobody has focused on the issues yet.

It may be worth to re-run the experiment and see which test failures we get.


FWIW, Anyone is able to run the new code by setting environment variable
  NSS_ENABLE_PKIX_VERIFY="1"
prior to starting the Mozilla app.


I've been running my primary work Firefox and Thunderbird profiles with this configuration for a long time, probably around a year already.

Until recently I had seen frequent crashes, which were (hopefully) all caused by bug 550929. Meanwhile we've finally upgraded to a fixed version, NSS 3.12.7, and so far it seems to work well. Besides from the crashes, I haven't experienced other issues.


> Does it have significantly different performance than the old code?

I haven't checked. Would the results of a tryserver build answer that question?


> If we're going to do it it's already getting late.

True. I propose I start a tryserver build with the attached small patch and we can have a look at the test and performance results.
Comment 29 Kai Engert (:kaie) 2010-08-23 06:41:58 PDT
Try builds:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/kaie@kuix.de-503ca5c20c53/

I'll attach performance comparisons, text and chart.
Comment 30 Kai Engert (:kaie) 2010-08-23 06:44:28 PDT
Created attachment 468298 [details]
performance comparison numbers

Origin of results from http://tinyurl.com/2vq5gsq
(Look at the attachment before you decide you need to click the link, the link dynamically re-evaluates the data)
Comment 31 Kai Engert (:kaie) 2010-08-23 06:45:28 PDT
Created attachment 468299 [details]
performance comparison chart

Chart created using Pike's talos-node tool.
Comment 32 Benjamin Smedberg [:bsmedberg] 2010-08-24 09:03:01 PDT
wanted, but not blocking
Comment 33 Kai Engert (:kaie) 2010-08-26 09:43:24 PDT
Comment on attachment 468298 [details]
performance comparison numbers

Apparently this attachment didn't work, so please click the link from comment 30.
Comment 34 Daniel Veditz [:dveditz] 2010-09-13 17:46:24 PDT
Comment on attachment 468299 [details]
performance comparison chart

I can't authoritatively review this. The graph looks OK to me, some of the numbers in the other chart are concerning (but I don't know how important they are, and others look good). I'd like to punt this to Johnathan who can weigh this in the context of the various ongoing perf efforts.
Comment 35 Kai Engert (:kaie) 2011-03-25 05:22:07 PDT
Comment on attachment 468299 [details]
performance comparison chart

clearing review request.

I understand we want to land, because we need PKIX for improved security functionality.
Comment 36 Kai Engert (:kaie) 2011-03-25 05:30:26 PDT
I ran the mochitests for security.
I get same results w and w/o this patch applied.

(I saw two TODO warnings in mixedcontent/test_dynDelayedUnsecureXHR.html )
Comment 37 Kai Engert (:kaie) 2011-03-25 05:34:25 PDT
Based on comment 4, I'm removing the strict dependency on bug 481656.
Comment 38 Kai Engert (:kaie) 2011-03-25 05:42:21 PDT
If this is an important immediate goal (and I understand it is),

I would like to ask for help with dependent bug 481763 and bug 481764.

Either we should get them fixed very soon,
or we should reach a conclusion whether those bugs can be ignored/postponed.

I have written lots of comments in those bugs.
They are important for the default behaviour of the PKIX verification code,
since that's what we going to do,
use PKIX with it's default settings.
Comment 39 Honza Bambas (:mayhemer) 2011-03-25 06:25:04 PDT
(In reply to comment #36)
> (I saw two TODO warnings in mixedcontent/test_dynDelayedUnsecureXHR.html )

This is expected.

(In reply to comment #38)
> I would like to ask for help with dependent bug 481763 and bug 481764.

I could take them no sooner then after allhands/platform-meeting and some time of restore from jet-lag.
Comment 40 Kai Engert (:kaie) 2011-03-25 07:06:40 PDT
> (In reply to comment #38)
> > I would like to ask for help with dependent bug 481763 and bug 481764.

The primary need for help is to review my thoughts around default verification behaviour in bug 481764 and give feedback.
Comment 41 Kai Engert (:kaie) 2011-04-04 16:38:59 PDT
Created attachment 524024 [details] [diff] [review]
patch: work in progress
Comment 42 Kai Engert (:kaie) 2011-04-05 15:01:23 PDT
Created attachment 524093 [details] [diff] [review]
Patch: work in progress
Comment 43 Kai Engert (:kaie) 2011-04-05 15:51:49 PDT
Created attachment 524102 [details] [diff] [review]
Patch v4

First real patch.
Comment 44 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-07 16:24:51 PDT
We discussed changing this patch so that the use of libpkix is a compile-time decision (#ifdef). It would be better if this could be conditional an undocumented runtime pref, so that if/when we run into problems, we can flip the pref, restart, and try again with the old mechanism.
Comment 45 Kai Engert (:kaie) 2011-04-07 23:38:48 PDT
Created attachment 524583 [details] [diff] [review]
Patch v5

(In reply to comment #44)
> It would be better if this could be conditional an undocumented runtime pref

ok. Done.
Comment 46 Kai Engert (:kaie) 2011-04-07 23:40:54 PDT
Let's reconsider to add this patch for Firefox 5 by Friday evening.

The patch no longer changes the default behaviour.

It simply allows to switch to the libPKIX verification engine, if you know about the hidden prefs.
Comment 47 Robert Relyea 2011-04-08 16:37:35 PDT
Comment on attachment 524583 [details] [diff] [review]
Patch v5

r+ for the NSS specific portions. I have not figured out if all the Mozilla autopointers are operating correctly.

Also get brians approval before landing this patch.

One Semantic change I noticed. Previously calls to set the ocspstatus (Now revocation status) did not flush the SSL cache if we were turning off OCSP. Specifically, this means we didn't flush the SSL cache if OCSP were off and we were trying to get the the certusages. Now we will full the cache in all those cases. (Flushing the SSL cache will require all SSL sessions to do full handshakes if restarted. This already happens if OCSP is turned on.

bob
Comment 48 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-13 12:11:22 PDT
We cannot have a mismatch between the cert chain display in the UI and the actual cert chain we built, because people are using that UI to determine what to do during SSL errors and/or to debug other badness, so I am adding back the dependency on bug 481656. Also, bug 635384 would be a known regression resulting in a crash.
Comment 49 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-14 12:48:02 PDT
Comment on attachment 524583 [details] [diff] [review]
Patch v5

First, how are the current (non-libpkix) PSM OCSP policies different from the pre-defined policies returned CERT_GetClassicOCSP*Policy()? If they are the same then we should just use the predefined libpkix policies until we decide know how to properly enable CRL fetching and AIA.

Please factor out the CRL and AIA logic into separate patches. The CRL logic here doesn't look right. In particular, (a) if we have a fresh CRL, why would we also do a OCSP request (CERT_REV_M_CONTINUE_TESTING_ON_FRESH_INFO)? And, how do we protect ourselves from downloading a massive CRL? And, how do we delete these CRLs when the user clears history, in private browsing, etc. The same considerations apply to AIA, There are a lot of changes needed to properly support these two new features, and we should work on them in other bugs.

There is a lot of duplicate code like this:

  nsRefPtr<nsCERTValInParamWrapper> survivingParams;
  nsCOMPtr<nsINSSComponent> inss;
  CERTValOutParam cvout[1];
  cvout[0].type = cert_po_end;
  inss = do_GetService(kNSSComponentCID, &rv);
  if (!inss) {
    goto loser;
  }
  inss->GetDefaultCERTValInParam(survivingParams);
  rv = CERT_PKIXVerifyCert(...);

Let's put this common logic into a static function so that we only have to read/understand/verify this logic one time.

Calls to CERT_DisableOCSPChecking and related functions are not conditional on the pref to use libpkix. All the calls that control the old cert chain building library should be conditional on the pref so that we know which code to remove in the future, and so we ensure there is no unexpected dependency between libpkix and the settings that these functions control.

Add a comment above the call to nssComponent->SkipOcsp() to remind us to delete the SkipOcsp method when we remove the option to use the old cert chain building library.
Comment 50 Kai Engert (:kaie) 2011-04-19 09:20:40 PDT
Brian asked on IRC:

<bsmith> One thing I didn't understand was the mutex usage
<bsmith> before calling setValidationOptions
<bsmith> why not put the mutex lock inside setValidationOptions?
<bsmith> For example, why doesn't SkipOcspOff need to lock the mutex?

My answer:


nsNSSComponent::InitializeNSS calls setOCSPOptions with the mutex already
locked, so we can't add the lock there.

The observer did not yet lock the mutex, so I added the lock.

However, you correctly identified that we must add a lock in SkipOcspOff.

Ideally this should be rewritten to use a monitor (recursive reference counted
locking), but maybe we can do that later, given that the number of callers is
still manageable.
Comment 51 Kai Engert (:kaie) 2011-04-19 10:17:25 PDT
Self-review request: Make sure the code actually assumes the non-pkix defaults, to it works as desired, even without having any default values in our prefs.js files.


(In reply to comment #48)
> We cannot have a mismatch between the cert chain display in the UI and the
> actual cert chain we built, because people are using that UI to determine what
> to do during SSL errors and/or to debug other badness, so I am adding back the
> dependency on bug 481656. 

I agree, but I propose we do the work in steps, land this bug first, and do the other work on top of it. Shouldn't be a problem, because non-pkix is default.


> Also, bug 635384 would be a known regression
> resulting in a crash.

But they only can run into it, when they enable the non-default prefs, so we can work to fix this before enabling pkix by default.


(In reply to comment #49)
> First, how are the current (non-libpkix) PSM OCSP policies different from the
> pre-defined policies returned CERT_GetClassicOCSP*Policy()? If they are the
> same then we should just use the predefined libpkix policies until we decide
> know how to properly enable CRL fetching and AIA.

I disagree. We want the flexibility to use our own settings, which this patch enables. I don't see how it helps us to write a third variant of code that implements your proposal, but will have be replaced in the short future anyway (with an approach that is as flexible as this patch).


> Please factor out the CRL and AIA logic into separate patches.

I had a long debate on IRC with Brian over this.

I think I was able to agree with my proposals, because of the following arguments:

- this patch doesn't enable CRL/AIA by default
- this patch merely gives us the opportunity to test this functionality
  with standard nightly builds by switching prefs
- yes, our CRL/AIA downloading code "might" still be inferior,
  but we don't know for sure until we have a chance to give it some
  real world testing
- yes, some properties of our CRL downloading story is still inferior,
  because of redownloading etc.,
  but we need to able to play with it (by enabling prefs)
- your worry about the behaviour of non-default code pathes is laudable,
  but we allow non-perfect code that is preffed off.
  In fact, the strategy for the aurora/experimental branches is
  "if it doesn't work, then pref it off",
  which in fact proves that it's accetable to ship a product
  that contains non-perfect code that is preffed off,
  so, you don't need to worry too much about code pathes that are
  preffed off, but are not yet perfect


> The CRL logic
> here doesn't look right. In particular, (a) if we have a fresh CRL, why would
> we also do a OCSP request (CERT_REV_M_CONTINUE_TESTING_ON_FRESH_INFO)?

That is my proposal.

My thinking was, CRLs are too expensive to download frequently, so it might be worthwile to check for more frequent information from OCSP.

This is a policy decision, and I propose we have the discussion about the correct policy elsewhere, possibly in a separate bug named "decide about correct revocation checking policy for Firefox.next". The purpose of this bug is to allow us to proceed and start testing.


> And, how
> do we protect ourselves from downloading a massive CRL? 

Because the pref is off by default at this time.

(If you deliberately turn it on for testing, we currently do not have protection.)


> And, how do we delete
> these CRLs when the user clears history, in private browsing, etc. 

We don't right now. You could propose separate prefs (in a separate bug) that restrict the persistent storage of revocation information in private-browsing mode. But right now, it's not enabled by default.

I propose to morph the name of this bug into "add code that enables PSM to optionally use libPKIX for certificate verification".

I propose to have another bug that says "enable libPKIX verification by default", then we can have all your other concern bugs as dependencides for the "enable-by-default-bug", while we continue to work on this this milestone.


> The same
> considerations apply to AIA, There are a lot of changes needed to properly
> support these two new features, and we should work on them in other bugs.

Our decision might well be, that despite enabling libPKIX verification by default, we disable CRL/AIA by default.


> There is a lot of duplicate code like this:
> 
>   nsRefPtr<nsCERTValInParamWrapper> survivingParams;
>   nsCOMPtr<nsINSSComponent> inss;
>   CERTValOutParam cvout[1];
>   cvout[0].type = cert_po_end;
>   inss = do_GetService(kNSSComponentCID, &rv);
>   if (!inss) {
>     goto loser;
>   }
>   inss->GetDefaultCERTValInParam(survivingParams);
>   rv = CERT_PKIXVerifyCert(...);
>
> Let's put this common logic into a static function so that we only have to
> read/understand/verify this logic one time.

I'll look into it.


> Calls to CERT_DisableOCSPChecking and related functions are not conditional on
> the pref to use libpkix.

These calls have the bad side effect that they are effective globally (all threads), even though our intention is to do just a local operation with this off.

My initial hope was to get completely rid of using the very unfortunate CERT_DisableOCSPChecking calls, that's probably why I had overlooked this, when I made the follow up patch to introduce the use-libpkix pref.

Also, it's a bad idea to add the pref checking to those functions, because the pref-checking has thread restrictions, but this code can run on any thread.

It shouldn't be a problem to keep these calls for now, because we had them always anyway, and even if we start to use libPKIX by default, those calls won't have any effect.


> All the calls that control the old cert chain building
> library should be conditional on the pref so that we know which code to remove
> in the future, and so we ensure there is no unexpected dependency between
> libpkix and the settings that these functions control.

I propose to avoid adding more pref checking, because it's expensive and has thread restrictions.

My counter offer is to add a new
  #define ENABLE_CLASSIC_PSM_VERIFY_CODE
and add all code that it supposed to be removed inside
  #ifdef ENABLE_CLASSIC_PSM_VERIFY_CODE
that enables it to easily remove it later.


> Add a comment above the call to nssComponent->SkipOcsp() to remind us to delete
> the SkipOcsp method when we remove the option to use the old cert chain
> building library.

Unnecessary if we do the above.
Comment 52 Kai Engert (:kaie) 2011-04-19 10:20:02 PDT
(In reply to comment #47)
> 
> One Semantic change I noticed. Previously calls to set the ocspstatus (Now
> revocation status) did not flush the SSL cache if we were turning off OCSP.
> Specifically, this means we didn't flush the SSL cache if OCSP were off and we
> were trying to get the the certusages. Now we will full the cache in all those
> cases. (Flushing the SSL cache will require all SSL sessions to do full
> handshakes if restarted. This already happens if OCSP is turned on.

I'll try to fix this in the next patch.
(It was unnecessary before Brian asked me to add the use-libpkix pref)
Comment 53 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-19 14:33:18 PDT
Comment on attachment 524583 [details] [diff] [review]
Patch v5

12345678901234567890123456789012345678901234567890123456789012345678901234567890

>+  enum aia_config { aia_off = 0, aia_on };

AIA is an ambiguous name. The pointer to the OCSP responder is in the AIA too. Please rename these from "aia" to "missing_cert_download" or similar, so that the names of the prefs correspond to the names of these member variables.

>+  nsRefPtr<nsCERTValInParamWrapper> survivingParams;
>+  nsCOMPtr<nsINSSComponent> inss;

Please move these declarations down to where they are used (in CommonVerifySignature)

>+#include "sslerr.h"

This include doesn't seem necessary.

>+    if ( rv == SECSuccess && isServer ) {
>+        /* cert is OK.  This is the client side of an SSL connection.
>+        * Now check the name field in the cert against the desired hostname.
>+        * NB: This is our only defense against Man-In-The-Middle (MITM) attacks!
>+        */
>+        if (hostname && hostname[0])
>+            rv = CERT_VerifyCertName(peerCert, hostname);
>+        else
>+            rv = SECFailure;
>+        if (rv != SECSuccess)
>+            PORT_SetError(SSL_ERROR_BAD_CERT_DOMAIN);
>+    }
>+        
>+    PORT_Free(hostname);

I understand this code was copy/pasted from libssl's SSL_AuthCertificate. But, it doesn't match the logic used in nsNSSBadCertHandler. We need to file a bug (against libssl or against PSM) for whichever version is wrong. When I write the patch for bug 650296, I will remove the duplication between this and nsNSSBadCertHandler, so if the nsNSSBadCertHandler logic is the correct one, then let's copy/paste the code from there instead of from libssl.

>+  inss->GetDefaultCERTValInParam(survivingParams);

We need to check the return value of GetDefaultCERTValInParam each time we call it (there are multiple instances of this problem), or document why it cannot fail.

>+#include "cert.h"
> NS_IMETHODIMP
> nsNSSComponent::SkipOcspOff()
> {
>-  setOCSPOptions(mPrefBranch);
>+  setValidationOptions(mPrefBranch);
>   return NS_OK;
> }

As mentioned on IRC, we need to grab the mutex here. And, we need to document what the mutex is protecting. In the documentation for setValidationOptions, document that it requires that the mutex already be held.

>+    rv = mPrefBranch->GetBoolPref("security.use_libpkix_verification", &globalConstFlagUsePKIXVerification);

This will cause my compiler to warn that rv is assigned a value that is never read. use "(void) " instead of "rv = "

>+      // static validation options for usagesarray - do not hit the network
>+      nsRefPtr<nsCERTValInParamWrapper> mDefaultCERTValInParamLocalOnly = new nsCERTValInParamWrapper;
>+      if (mDefaultCERTValInParamLocalOnly.get()) {
>+        mDefaultCERTValInParamLocalOnly->Construct(
>+          nsCERTValInParamWrapper::aia_off,
>+          nsCERTValInParamWrapper::crl_local_only,
>+          nsCERTValInParamWrapper::ocsp_off,
>+          nsCERTValInParamWrapper::ocsp_relaxed,
>+          nsCERTValInParamWrapper::any_revo_relaxed,
>+	  "ocsp");
>+      }

"nsRefPtr<nsCERTValInParamWrapper> mDefaultCERTValInParamLocalOnly" should be "mDefaultCERTValInParamLocalOnly" in order to initialize the member variable. operator new cannot fail so o need to check for null. But, Construct() can fail, so you need to check for failure there.

>+      nsAutoLock lock(mutex);

nsAutoLock is deprecated (there are multiple instances of this in the patch). See https://developer.mozilla.org/En/XPCOM_Thread_Synchronization

>+    CERTValOutParam cvout[2];
>+    cvout[0].type = cert_po_errorLog;
>+    cvout[0].value.pointer.log = verify_log;
>+    // NOTE: I hope this is right. It seems to be in+out parameter,
>+    //       we must allocate the outer log structure to hold the results
>+    cvout[1].type = cert_po_end;

Remove the "I hope this is right." comment. I looked at the libpkix source code and it does require value.pointer.log to be allocated by the caller.
Comment 54 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-19 14:46:50 PDT
(In reply to comment #51)
> I propose to morph the name of this bug into "add code that enables PSM to
> optionally use libPKIX for certificate verification".
> 
> I propose to have another bug that says "enable libPKIX verification by
> default", then we can have all your other concern bugs as dependencides for 
> the "enable-by-default-bug", while we continue to work on this this
> milestone.

+1. I will file the other bug and mark it as depending on this one.

> > Let's put this common logic into a static function so that we only have to
> > read/understand/verify this logic one time.
> 
> I'll look into it.

I will do this in a follow-up patch, because I already know I will eliminate most of these calls.

> My counter offer is to add a new
>   #define ENABLE_CLASSIC_PSM_VERIFY_CODE
> and add all code that it supposed to be removed inside
>   #ifdef ENABLE_CLASSIC_PSM_VERIFY_CODE
> that enables it to easily remove it later.

This can happen in a follow-up bug. No need to do it now, as there are *lots* of these kinds of changes that need to be made, and the patches for the other follow-up bugs will eliminate/change some of them.
Comment 55 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-19 14:56:47 PDT
Bug 651246 is the new bug to make libpkix the default.
Comment 56 Kai Engert (:kaie) 2011-04-26 08:28:45 PDT
(In reply to comment #53)
> 
> AIA is an ambiguous name. The pointer to the OCSP responder is in the AIA too.
> Please rename these from "aia" to "missing_cert_download" or similar, so that
> the names of the prefs correspond to the names of these member variables.

done


> >+  nsRefPtr<nsCERTValInParamWrapper> survivingParams;
> >+  nsCOMPtr<nsINSSComponent> inss;
> 
> Please move these declarations down to where they are used (in
> CommonVerifySignature)

Not possible, because this old code uses "goto", and you get an compiler error about lifetime of pointers crossing scopes.


> >+#include "sslerr.h"
> 
> This include doesn't seem necessary.

I'm adding includes for a reason, only :)

Without the include I get:
/plaindata/moz/mocent/mozilla/security/manager/ssl/src/nsNSSCallbacks.cpp:1008: error: 'SSL_ERROR_BAD_CERT_DOMAIN' was not declared in this scope

Let's keep the include, if it's necessary on some platforms.


> >+  inss->GetDefaultCERTValInParam(survivingParams);
> 
> We need to check the return value of GetDefaultCERTValInParam each time we call
> it (there are multiple instances of this problem), or document why it cannot
> fail.

Ok.


> 
> >+#include "cert.h"
> > NS_IMETHODIMP
> > nsNSSComponent::SkipOcspOff()
> > {
> >-  setOCSPOptions(mPrefBranch);
> >+  setValidationOptions(mPrefBranch);
> >   return NS_OK;
> > }
> 
> As mentioned on IRC, we need to grab the mutex here. And, we need to document
> what the mutex is protecting. In the documentation for setValidationOptions,
> document that it requires that the mutex already be held.


I've changed this code.

The intention of this function was limited to "switch ocsp back on, if necessary".

It's limited to the non-pkix code.

I've decided that I was too lazy, when you asked me to keep this code. We should not call the full set of new code.

Instead, in the new patch, I've changed this code to only care for the old portions, and we don't need the mutex (which is supposed to protect access to the default pkix parameter objects).



> >+    rv = mPrefBranch->GetBoolPref("security.use_libpkix_verification", &globalConstFlagUsePKIXVerification);
> 
> This will cause my compiler to warn that rv is assigned a value that is never
> read. use "(void) " instead of "rv = "


I've now done something better, which I failed to do in the initial patch. If there is a failure reading this patch, let's use a compile-time-default, which I've now added (false).


> "nsRefPtr<nsCERTValInParamWrapper> mDefaultCERTValInParamLocalOnly" should be
> "mDefaultCERTValInParamLocalOnly" in order to initialize the member variable.

Yes, thanks for catching this copy-paste mistake.


> operator new cannot fail so o need to check for null.

Ok, that's news...


> But, Construct() can
> fail, so you need to check for failure there.

Ok.


> >+      nsAutoLock lock(mutex);
> 
> nsAutoLock is deprecated (there are multiple instances of this in the patch).
> See https://developer.mozilla.org/En/XPCOM_Thread_Synchronization

... and doesn't compile any longer. Replaced with MutexAutoLock.


> >+    CERTValOutParam cvout[2];
> >+    cvout[0].type = cert_po_errorLog;
> >+    cvout[0].value.pointer.log = verify_log;
> >+    // NOTE: I hope this is right. It seems to be in+out parameter,
> >+    //       we must allocate the outer log structure to hold the results
> >+    cvout[1].type = cert_po_end;
> 
> Remove the "I hope this is right." comment. I looked at the libpkix source code
> and it does require value.pointer.log to be allocated by the caller.

Thanks. Removed.


(see next patch after this comment)
Comment 57 Kai Engert (:kaie) 2011-04-26 08:36:56 PDT
Created attachment 528316 [details] [diff] [review]
Patch v6

Bob's request addressed.
Most of Brian's requests addressed.
Comment 58 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-27 13:05:28 PDT
This change will change the semantics, if not the signatures, of certificate-related APIs.
Comment 59 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-27 20:01:40 PDT
Comment on attachment 528316 [details] [diff] [review]
Patch v6

Review of attachment 528316 [details] [diff] [review]:

::: security/manager/ssl/src/nsNSSComponent.cpp
@@ +154,5 @@
+#define OCSP_REQUIRED_DEFAULT 0
+#define FRESH_REVOCATION_REQUIRED_DEFAULT PR_FALSE
+#define MISSING_CERT_DOWNLOAD_DEFAULT PR_FALSE
+#define FIRST_REVO_METHOD_DEFAULT "ocsp"
+#define USE_NSS_LIBPKIX_DEFAULT PR_FALSE

The code is easier to read/navigate if these defaults are moved to the the single scope in which they are used (setValidationOptions). Also, these shouldn't be macros since we don't need to change the values at build time.

(OCSP_ENABLED_DEFAULT is an exception, because it is used in two scopes. But, it can be made a static const PRBool instead of a macro.)

@@ +694,5 @@
 nsNSSComponent::SkipOcspOff()
 {
+  nsNSSShutDownPreventionLock locker;
+  PRInt32 ocspEnabled;
+  mPrefBranch->GetIntPref("security.OCSP.enabled", &ocspEnabled);

should be:

PRInt32 ocspEnabled = OCSP_ENABLED_DEFAULT;
(void) mPrefBranch->GetIntPref("security.OCSP.enabled", 
                               &ocspEnabled);

@@ +1176,5 @@
+  rv = pref->GetIntPref("security.OCSP.enabled", &ocspEnabled);
+  // 0 = disabled, 1 = enabled, 
+  // 2 = enabled with given default responder
+  if (NS_FAILED(rv))
+    ocspEnabled = OCSP_ENABLED_DEFAULT;

When accessing prefs, let's use this coding style:

   PRBool crlDownloading = PR_FALSE;
   (void) pref->GetBoolPref("security.CRL_download.enabled",
                            &crlDownloading);

This way, when we grep for "security.CRL_download.enabled" we can see the default value without navigating further, and without needing (most) (redundant) *_DEFAULT macros.

@@ +1206,5 @@
+                           : ocspMode_FailureIsNotAVerificationFailure);
+
+  nsRefPtr<nsCERTValInParamWrapper> newCVIN = new nsCERTValInParamWrapper;
+  if (newCVIN.get()) {
+    newCVIN->Construct(

This should be:

  nsRefPtr<nsCERTValInParamWrapper> newCVIN 
     = new nsCERTValInParamWrapper;
  if (NS_FAILED(newCVIN->Construct(...

because operator new cannot fail but Construct() can. See https://developer.mozilla.org/en/Infallible_memory_allocation

@@ +1227,5 @@
+
+  if (crlDownloading || ocspEnabled || ocspRequired || anyFreshRequired) {
+    /*
+     * This might change the validity of already established SSL sessions,
+     * let's not reuse them.

There's no need to optimize this; we can just always clear the session cache in the rare case when any of these prefs change. Then we don't have to think about under what specific conditions we need to clear the cache.

@@ +1786,5 @@
       mNSSInitialized = PR_TRUE;
 
+      // Configure the old cert validation APIs (CERT_VerifyCert and
+      // CERT_VerifyCertificate) to use the new libpkix cert validation engine.
+      CERT_SetUsePKIXForValidation(globalConstFlagUsePKIXVerification);

This call should be removed. The default libpkix settings won't match the semantics we use elsewhere in this patch AND the semantics won't match the old semantics. It's a third set of semantics that we should avoid.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +3377,5 @@
+    return cancel_and_failure(infoObject);
+  nsRefPtr<nsCERTValInParamWrapper> survivingParams;
+  nsrv = inss->GetDefaultCERTValInParam(survivingParams);
+  if (NS_FAILED(nsrv))
+    return cancel_and_failure(infoObject);

We're supposed to avoid calling cancel_and_failure() here, because we didn't set up an alternative way of reporting the error, right?

::: security/manager/ssl/src/nsUsageArrayHelper.cpp
@@ +218,5 @@
+                      cvout, NULL);
+
+  usages = cvout[0].value.scalar.usages;
+}
+

This relies on CERT_PKIXVerifyCert clearing the error state returned by PR_GetError(), because we don't check the return value of CERT_PKIXVerifyCert. I am not sure it always does that. We should check the return value instead of relying on this undocumented/unknown behavior.
Comment 60 Kai Engert (:kaie) 2011-04-28 04:38:56 PDT
(In reply to comment #59)
> 
> ::: security/manager/ssl/src/nsNSSComponent.cpp
> @@ +154,5 @@
> +#define OCSP_REQUIRED_DEFAULT 0
> +#define FRESH_REVOCATION_REQUIRED_DEFAULT PR_FALSE
> +#define MISSING_CERT_DOWNLOAD_DEFAULT PR_FALSE
> +#define FIRST_REVO_METHOD_DEFAULT "ocsp"
> +#define USE_NSS_LIBPKIX_DEFAULT PR_FALSE
> 
> The code is easier to read/navigate if these defaults are moved to the the
> single scope in which they are used (setValidationOptions).

Fine with me.


> Also, these
> shouldn't be macros since we don't need to change the values at build time.

?

These are the build time defaults, in case no default values can be read from configuration files at run time. The built time is in fact the only time when we might want to ever change these default values.


> (OCSP_ENABLED_DEFAULT is an exception, because it is used in two scopes. But,
> it can be made a static const PRBool instead of a macro.)

Yes it can, but why?

In my opinion, the readability and maintainability is better, if the full group of defaults can be found in a single place, next to each other.

If we use a #define, the compiler is able to embed the zero default directly into code. If you require it to be a variable, you force a couple of bytes in the binary's data section, right?
Comment 61 Kai Engert (:kaie) 2011-04-28 04:57:04 PDT
(In reply to comment #59)
> 
> @@ +1176,5 @@
> +  rv = pref->GetIntPref("security.OCSP.enabled", &ocspEnabled);
> +  // 0 = disabled, 1 = enabled, 
> +  // 2 = enabled with given default responder
> +  if (NS_FAILED(rv))
> +    ocspEnabled = OCSP_ENABLED_DEFAULT;
> 
> When accessing prefs, let's use this coding style:
> 
>    PRBool crlDownloading = PR_FALSE;
>    (void) pref->GetBoolPref("security.CRL_download.enabled",
>                             &crlDownloading);
> 
> This way, when we grep for "security.CRL_download.enabled" we can see the
> default value without navigating further, and without needing (most)
> (redundant) *_DEFAULT macros.

I've never seen a requirement that we must optimize our code for searching it with grep.

My code is correct and has an advantage over yours:

Some functions modify the out parameter prior to returning (such as initializing it to some generic default). In my understanding, the definition of an out parameters is: if a function returns a failure, you must not rely on the contents of the out variable.

But your proposal would require to rely on the function to never touch the out parameter, and keep it at the value that we had originally passed in.

Now you can argue that it might be safe in this particular context, because the current implementation of libpref makes the promise of not touching the out parameter in a failure scenario. But who knows whether that will ever change?

My code is correct and failsafe. And my code was quicker to write, because I didn't have to spend time on reading the caller function, I'm simply sticking to the function interface.
Comment 62 Kai Engert (:kaie) 2011-04-28 09:06:26 PDT
> @@ +1206,5 @@
> +                           : ocspMode_FailureIsNotAVerificationFailure);
> +
> +  nsRefPtr<nsCERTValInParamWrapper> newCVIN = new nsCERTValInParamWrapper;
> +  if (newCVIN.get()) {
> +    newCVIN->Construct(
> 
> This should be:
> 
>   nsRefPtr<nsCERTValInParamWrapper> newCVIN 
>      = new nsCERTValInParamWrapper;
>   if (NS_FAILED(newCVIN->Construct(...
> 
> because operator new cannot fail but Construct() can. See
> https://developer.mozilla.org/en/Infallible_memory_allocation

I think you meant NS_SUCCEEDED. Yes, it's a good idea to check and avoid swapping the objects on failure.


> 
> @@ +1227,5 @@
> +
> +  if (crlDownloading || ocspEnabled || ocspRequired || anyFreshRequired) {
> +    /*
> +     * This might change the validity of already established SSL sessions,
> +     * let's not reuse them.
> 
> There's no need to optimize this; we can just always clear the session cache in
> the rare case when any of these prefs change. Then we don't have to think about
> under what specific conditions we need to clear the cache.

I'm OK with your proposal, especially after having changed the code to not call setValidationOptions from the skipOCSP* functions.


> @@ +1786,5 @@
>        mNSSInitialized = PR_TRUE;
> 
> +      // Configure the old cert validation APIs (CERT_VerifyCert and
> +      // CERT_VerifyCertificate) to use the new libpkix cert validation
> engine.
> +      CERT_SetUsePKIXForValidation(globalConstFlagUsePKIXVerification);
> 
> This call should be removed. The default libpkix settings won't match the
> semantics we use elsewhere in this patch AND the semantics won't match the old
> semantics. It's a third set of semantics that we should avoid.

My motivation for adding this, maybe we still call some other APIs that indirectly call into the old cert verification APIs?

IMHO having this call is reasonable and doesn't cause harm.

When calling with FALSE, we will still use the old semantics.

When calling with TRUE, we really don't want the old semantics. We want to use libPKIX. If we discover undesired semantics, we must work on that in follow up patches.


> @@ +3377,5 @@
> +    return cancel_and_failure(infoObject);
> +  nsRefPtr<nsCERTValInParamWrapper> survivingParams;
> +  nsrv = inss->GetDefaultCERTValInParam(survivingParams);
> +  if (NS_FAILED(nsrv))
> +    return cancel_and_failure(infoObject);
> 
> We're supposed to avoid calling cancel_and_failure() here, because we didn't
> set up an alternative way of reporting the error, right?

This does the same as the other fatal error scenarios (see remainder of that function).

Whenever an unexpected failure happens (and most of those can only be caused by out-of-memory), we already cancel without setting up more error reporting.

Unless we decide to rewrite this function to deal differently with unexpected failures (would have to be done in a separate bug), this change is consistent with the remainder of the function.


> ::: security/manager/ssl/src/nsUsageArrayHelper.cpp
> @@ +218,5 @@
> +                      cvout, NULL);
> +
> +  usages = cvout[0].value.scalar.usages;
> +}
> +
> 
> This relies on CERT_PKIXVerifyCert clearing the error state returned by
> PR_GetError(), because we don't check the return value of CERT_PKIXVerifyCert.
> I am not sure it always does that. We should check the return value instead of
> relying on this undocumented/unknown behavior.

I agree it makes sense to check the return values of the verification-function-calls.

I rewrote this piece of the code.
Comment 63 Kai Engert (:kaie) 2011-04-28 09:07:07 PDT
Created attachment 528864 [details] [diff] [review]
Patch v7
Comment 64 Kai Engert (:kaie) 2011-04-28 09:09:47 PDT
(In reply to comment #58)
> This change will change the semantics, if not the signatures, of
> certificate-related APIs.

I hope we can avoid changing signatures.

With this bug we shouldn't affect semantics either, because we're keeping the old defaults. That's the intention of the bug.

I think your comment rather applies to the other bug, where we discuss to change the defaults.
Comment 65 Kai Engert (:kaie) 2011-04-28 09:11:05 PDT
I want to note, I've not yet adjusted the whitespace/indenting in this patch, to make the reviewing process easier.
I intent to clean up whitespace/indenting prior to checkin.
Comment 66 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-28 14:35:17 PDT
(In reply to comment #60)
> If we use a #define, the compiler is able to embed the zero default directly
> into code. If you require it to be a variable, you force a couple of bytes in
> the binary's data section, right?

Every modern compiler will be able to optimize the two ways exactly the same. 

(In reply to comment #61)
> My code is correct and failsafe. And my code was quicker to write, because I
> didn't have to spend time on reading the caller function, I'm simply sticking
> to the function interface.

Resolved over the phone. Kai doesn't want to use this style in PSM.

(In reply to comment #62)
> > @@ +1786,5 @@
> >        mNSSInitialized = PR_TRUE;
> > 
> > +      // Configure the old cert validation APIs (CERT_VerifyCert and
> > +      // CERT_VerifyCertificate) to use the new libpkix cert validation
> > engine.
> > +      CERT_SetUsePKIXForValidation(globalConstFlagUsePKIXVerification);
> > 
> > This call should be removed. The default libpkix settings won't match the
> > semantics we use elsewhere in this patch AND the semantics won't match the 
> > old semantics. It's a third set of semantics that we should avoid.
> 
> My motivation for adding this, maybe we still call some other APIs that
> indirectly call into the old cert verification APIs?

If so, we need to find and fix them, because the semantics for those verifications will be different than the semantics of the verifications you change in this patch, and there's no way to get the default libpkix semantics to match the semantics we want to use for verification, because of the custom in/out parameters we use. Anyway, after bug 611781 is fixed, we will not even have the old CERT_* functions available on some (most) important platforms.

> This does the same as the other fatal error scenarios (see remainder of that
> function).
> Unless we decide to rewrite this function to deal differently with unexpected
> failures (would have to be done in a separate bug), this change is consistent
> with the remainder of the function.

File bug 653565 about this. Maybe it is correct, but some of these calls to cancel_and_failure look wrong to me based on previous discussion about this issue in other bugs.
Comment 67 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-28 15:03:13 PDT
Comment on attachment 528864 [details] [diff] [review]
Patch v7

Review of attachment 528864 [details] [diff] [review]:

r+ with these changes.

::: security/manager/ssl/src/nsNSSCertificateDB.cpp
@@ +557,5 @@
   }
 
   certdb = CERT_GetDefaultCertDB();
   certusage = certUsageEmailRecipient;
+  certificateusage = certificateUsageEmailSigner;

This should be certificateUsageEmailRecipient, right? If not, please add a comment, because it looks wrong.

::: security/manager/ssl/src/nsNSSComponent.cpp
@@ +688,5 @@
+  nsNSSShutDownPreventionLock locker;
+  PRInt32 ocspEnabled;
+  mPrefBranch->GetIntPref("security.OCSP.enabled", &ocspEnabled);
+  // 0 = disabled, 1 = enabled, 
+  // 2 = enabled with given default responder

ocspEnabled could be left uninitialized during a failure of GetIntPref. (A problem in the old code too.)

@@ +1161,5 @@
+#define MISSING_CERT_DOWNLOAD_DEFAULT PR_FALSE
+#define FIRST_REVO_METHOD_DEFAULT "ocsp"
+#define USE_NSS_LIBPKIX_DEFAULT PR_FALSE
+
+void nsNSSComponent::setValidationOptions(nsIPrefBranch * pref)

Please add the comment that the mutex must be held when calling this function.

@@ +1783,5 @@
 
+      // Configure the old cert validation APIs (CERT_VerifyCert and
+      // CERT_VerifyCertificate) to use the new libpkix cert validation engine.
+      CERT_SetUsePKIXForValidation(globalConstFlagUsePKIXVerification);
+

Please remove this. We can do this in newly-filed bug 653572.

::: security/manager/ssl/src/nsUsageArrayHelper.cpp
@@ +249,5 @@
 #if 0
   check(suffix, usages & certificateUsageAnyCA, count, outUsages);
 #endif
 
+  if (localOnly && nssComponent) {

test nsNSSComponent::globalConstFlagUsePKIXVerification instead of nssComponent, so that it is clear that this code should be removed when we remove the calls to the old code.
Comment 68 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-04-30 18:52:47 PDT
Also, the new files need mode lines and license boilerplate
Comment 69 Kai Engert (:kaie) 2011-05-02 10:53:46 PDT
(In reply to comment #67)
> 
>    certdb = CERT_GetDefaultCertDB();
>    certusage = certUsageEmailRecipient;
> +  certificateusage = certificateUsageEmailSigner;
> 
> This should be certificateUsageEmailRecipient, right? If not, please add a
> comment, because it looks wrong.

You're right, thanks for catching this. Changed.


> ::: security/manager/ssl/src/nsNSSComponent.cpp
> @@ +688,5 @@
> +  nsNSSShutDownPreventionLock locker;
> +  PRInt32 ocspEnabled;
> +  mPrefBranch->GetIntPref("security.OCSP.enabled", &ocspEnabled);
> +  // 0 = disabled, 1 = enabled, 
> +  // 2 = enabled with given default responder
> 
> ocspEnabled could be left uninitialized during a failure of GetIntPref. (A
> problem in the old code too.)

I agree, important enough to fix now.
I propose to use the #define now added here,
and to simply move the ::SkipOcsp* functions further down,
so that all OCSP setting functions are next to each other in the file.



> +void nsNSSComponent::setValidationOptions(nsIPrefBranch * pref)
> 
> Please add the comment that the mutex must be held when calling this function.

done



> +      // Configure the old cert validation APIs (CERT_VerifyCert and
> +      // CERT_VerifyCertificate) to use the new libpkix cert validation
> engine.
> +      CERT_SetUsePKIXForValidation(globalConstFlagUsePKIXVerification);
> +
> 
> Please remove this. We can do this in newly-filed bug 653572.

OK. I've commented on that bug.


>  #if 0
>    check(suffix, usages & certificateUsageAnyCA, count, outUsages);
>  #endif
> 
> +  if (localOnly && nssComponent) {
> 
> test nsNSSComponent::globalConstFlagUsePKIXVerification instead of
> nssComponent

I don't want to remove the non-null-check of nssComponent that has existed in the old code.

But I agree that I should also check for
  !nsNSSComponent::globalConstFlagUsePKIXVerification
(just as done further above in the function, prior to calling SkipOcspOn.
Comment 70 Kai Engert (:kaie) 2011-05-02 11:20:51 PDT
(In reply to comment #68)
> Also, the new files need mode lines and license boilerplate

done
Comment 71 Kai Engert (:kaie) 2011-05-02 11:23:36 PDT
Created attachment 529523 [details] [diff] [review]
Patch v8

Addresses the final review comments from Brian.
Has therefore r=bsmith
Comment 72 Kai Engert (:kaie) 2011-05-02 11:24:58 PDT
Created attachment 529524 [details] [diff] [review]
Patch v8 - addon a - moves two functions

This is required on top of patch v8 to make it build.

It simply moves two functions to a different location further down in the file.

rs=me
Comment 73 Kai Engert (:kaie) 2011-05-02 11:26:15 PDT
Created attachment 529526 [details] [diff] [review]
Patch v8 - addon b - whitespace cleanup

This fixes the whitespace (indendation levels) that I left out until now, to simplify the reviewing process.

rs=me
Comment 74 Kai Engert (:kaie) 2011-05-02 11:28:01 PDT
This code depends on NSS 3.12.10, because it uses two new allocation functions.

NSS 3.12.0 will be available soon, and land soon afterwards, adding dependency.
Comment 75 Kai Engert (:kaie) 2011-05-03 11:43:43 PDT
Created attachment 529785 [details] [diff] [review]
Incremental patch v9

The automated tests showed a bug in my patch.

The NSS code in SSL_AuthCertificate uses this expression to do an early exit:
    if ( rv != SECSuccess || isServer )
        return rv;

I made a mistake when reverting this expression.
I incorrectly turned this into:
    if ( rv == SECSuccess && isServer ) {

The correct reverted code is:
    if ( rv == SECSuccess && !isServer ) {

I'll attach an incremental patch.
Comment 76 Kai Engert (:kaie) 2011-05-03 11:44:54 PDT
Using v9 the mochitests pass.
Comment 77 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-05-04 13:27:50 PDT
Comment on attachment 529785 [details] [diff] [review]
Incremental patch v9

Review of attachment 529785 [details] [diff] [review]:

r+. Sorry I overlooked it the first time.
Comment 78 Kai Engert (:kaie) 2011-05-05 14:10:42 PDT
given that this bug doesn't change the defaults,
I've moved keywords APIchange, dev-doc-needed to bug 651246

(I thought I had suggested it in this bug, already, but can't find it right now)
Comment 80 Octoploid 2011-05-06 10:54:30 PDT
This breaks the build (Linux, amd64):

/var/tmp/mozilla-central/security/manager/ssl/src/nsCERTValInParamWrapper.cpp: In destructor ‘virtual nsCERTValInParamWrapper::~nsCERTValInParamWrapper()’:
/var/tmp/mozilla-central/security/manager/ssl/src/nsCERTValInParamWrapper.cpp:55:41: error: ‘CERT_DestroyCERTRevocationFlags’ was not declared in this scope
/var/tmp/mozilla-central/security/manager/ssl/src/nsCERTValInParamWrapper.cpp: In member function ‘nsresult nsCERTValInParamWrapper::Construct(nsCERTValInParamWrapper::missing_cert_download_config, nsCERTValInParamWrapper::crl_download_config, nsCERTValInParamWrapper::ocsp_download_config, nsCERTValInParamWrapper::ocsp_strict_config, nsCERTValInParamWrapper::any_revo_fresh_config, const char*)’:
/var/tmp/mozilla-central/security/manager/ssl/src/nsCERTValInParamWrapper.cpp:77:40: error: ‘CERT_AllocCERTRevocationFlags’ was not declared in this scope
nsRecentBadCerts.cpp
Comment 81 Octoploid 2011-05-06 10:57:45 PDT
This happens with nss-3.12.9 and --with-system-nss.
Comment 82 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-05-06 12:38:09 PDT
Octoploid, that is now being tracked in bug 655340.
Comment 83 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-07-18 14:24:50 PDT
AFAICT, this actually landed in the Firefox 6 cycle.

Note You need to log in before you can comment on or make changes to this bug.