Bug 321755 (CRLDP)

implement crlDistributionPoint extension in libPKIX

RESOLVED FIXED in 3.12.4

Status

enhancement
P2
normal
RESOLVED FIXED
14 years ago
5 years ago

People

(Reporter: ken2006, Assigned: alvolkov.bgs)

Tracking

(Blocks 2 bugs)

trunk
3.12.4
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: PKIX, )

Attachments

(16 attachments, 14 obsolete attachments)

36.75 KB, patch
alvolkov.bgs
: review+
Details | Diff | Splinter Review
18.17 KB, patch
julien.pierre
: review+
Details | Diff | Splinter Review
10.23 KB, patch
alvolkov.bgs
: review+
Details | Diff | Splinter Review
7.75 KB, patch
alvolkov.bgs
: review?
nelson
Details | Diff | Splinter Review
131.48 KB, patch
Details | Diff | Splinter Review
23.60 KB, patch
julien.pierre
: superreview-
Details | Diff | Splinter Review
11.24 KB, patch
Details | Diff | Splinter Review
17.49 KB, patch
Details | Diff | Splinter Review
3.96 KB, patch
Details | Diff | Splinter Review
27.54 KB, patch
Details | Diff | Splinter Review
28.20 KB, patch
alvolkov.bgs
: review-
nelson
: review-
Details | Diff | Splinter Review
29.63 KB, patch
Details | Diff | Splinter Review
10.69 KB, patch
Details | Diff | Splinter Review
112.26 KB, patch
Details | Diff | Splinter Review
4.07 KB, patch
alvolkov.bgs
: review+
Details | Diff | Splinter Review
635 bytes, patch
nelson
: review+
Details | Diff | Splinter Review
Reporter

Description

14 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6

Server at 65.125.174.105:443 has crl revocation URl attrib set to 'critical' (experiment to see if CRL checking can be enforced).

Attempting to load that URL (using firefox 1.0.x) alerts error code -8151 (SEC_ERROR_UNKNOWN_CRITICAL_EXTENSION). 

Since crlDistributionPoint is a known extension, and one that can be tested with a pass or fail value, one should expect to see this succeed, at least to a point where a pass or fail code is presnted within the context of the CRL status being valid/not.

Reproducible: Always

Steps to Reproduce:
1. Attempt load of https://65.125.174.105/public/



Though the genrally recommended state for crlDistributionPoint is critical=false, it is desirable in some cases to force CRL use, even at the expense of losing some incompatible clients, and at risk of some clients ignoring the critical flag (i.e. ie6).

If the moz suite were to support this, then an organization could endorse it's use by the assertion that CRLs are explicitly tested when critical=true.

Updated

14 years ago
Assignee: wtchang → julien.pierre.bugs
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Thows SEC_ERROR_UNKNOWN_CRITICAL_EXTENSION if crlDistributionPoint is set critical → Throws SEC_ERROR_UNKNOWN_CRITICAL_EXTENSION if crlDistributionPoint is set critical
It is appropriate for NSS to dishonor certs containing critical extensions 
that are unknown to it (that is, unimplemented).
NSS doesn't implement CRLDPs because of intellectual property (patent) concerns. 
I am unaware of any license for the relevant patent that is fully compatible 
with the MPL.  
Reporter

Comment 2

14 years ago
(In reply to comment #1)
> NSS doesn't implement CRLDPs because of intellectual property (patent)
> concerns. 

Nelson, what is that patent(s) number? I'd like to investigate.. If the patent claims a system that automatically retrieves CRLs when a flag is set, that seems obvious to me and disputable. I think there are even some prior arts related to that basic claim.
Reporter

Comment 3

14 years ago
I've searched through several u.s. patents, including 5,699,431 and its referenced ones. None that I saw made claims on 'automatic retrieval', beyond what Mozilla already does when one opens a CRL and chooses a refresh interval.. In fact, the entire concept of CRL and Distrution Points as we know it appears to be 'claimed' by several of those patents -and apparently are already implemented in Moz.. I don't think that auto retrieval triggered via a critical flag adds any overlap, based one ones I looked at.. An one might argue that that trigger is 'obvious' within the scope of CRLDPs, and x509 crtical-flags now being 'prior' arts..

Comment 4

14 years ago
Ken,

This issue can only be cleared by lawyers and/or the patent holder, unfortunately. I'm not at liberty to say anything more than we are working it.
OS: Windows 2000 → All
Hardware: PC → All

Updated

14 years ago
Priority: -- → P2
Target Milestone: --- → 3.12
QA Contact: jason.m.reid → libraries
News: July 25, 2007: 08:00 AM EST
http://money.cnn.com/news/newsfeeds/articles/prnewswire/LAW05125072007-1.htm

"Entrust will supply its certificate revocation list distribution points (CRL-DP) patent 5,699,431 to Sun under a royalty-free license for incorporation of that capability into the Mozilla open-source libraries."

Watch this space for more news. :)
Severity: normal → enhancement
Version: unspecified → trunk
Posted patch patch, process CRLDP, v1 (obsolete) — Splinter Review
This patch applies cleanly to the trunk, builds, and passes all.sh, 
which means it causes no regressions.  However, AFAIK there are no 
tests yet in all.sh that would exercise this new code (much), so I
cannot say that it is tested.  

The next step is probably to devise a way to actively test this new 
code (e.g. to enable vfyserv to test it).
Comment on attachment 274738 [details] [diff] [review]
patch, process CRLDP, v1

I have brought this patch forward to the trunk (again).
I would like some review from someone who has seen this code
before I made the most recent changes to it.
Attachment #274738 - Flags: review?(julien.pierre.boogz)
Summary: Throws SEC_ERROR_UNKNOWN_CRITICAL_EXTENSION if crlDistributionPoint is set critical → accept critical crlDistributionPoint extension in libPKIX
Whiteboard: PKIX
Comment on attachment 274738 [details] [diff] [review]
patch, process CRLDP, v1

Please don't let this huge patch get stale!
Attachment #274738 - Attachment description: PRELIMINARY patch, v0 → patch, process CRLDP, v1

Comment 9

12 years ago
Comment on attachment 274738 [details] [diff] [review]
patch, process CRLDP, v1

I am only about 15% through the patch, but I have already found some bad things with the templates and decoding that warrant r-. I hope to have all the details of the complete review by the end of today.
Attachment #274738 - Flags: review?(julien.pierre.boogz) → review-

Comment 10

12 years ago
OK, I have taken a full pass at it now. Since the past is so big, I may have missed some thing and will go over it again most likely next week. But here is what I have to say for now.

- in xcrldist.c :

There arre problems with the new templates for the CRL issuing distribution point extension.

In CRLIssuingDistributionPointTemplate :
- The first entry in the sequence, for derDistPoint, is wrong in several ways :
a) there should not be a SEC_ASN1_EXPLICIT nor SEC_ASN1_CONSTRUCTED
b) we should not use SEC_AnyTemplate

This field is defined in RFC3280 as follows :

       distributionPoint          [0] DistributionPointName OPTIONAL,

And DistributionPointName is :

DistributionPointName ::= CHOICE {
     fullName                [0]     GeneralNames,
     nameRelativeToCRLIssuer [1]     RelativeDistinguishedName }

The corresponding code in CERT_DecodeCRLIssuingDistributionPoint is wrong . It tries to emulate choice by looking at the DER.

        /* get the data if the distributionPointName is not omitted */
        if (value->derDistPoint.data != NULL) {
            value->distPointName.distPointType = (DistributionPointTypes)
                    ((value->derDistPoint.data[0] & 0x1f) +1);

Ugh! We need to use a choice in the template here.
It appears that the same problem exists for the CRLDistributionPointStr structure, which was already there. They should both be fixed
to use a CHOICE in the template for the distributionPointName component. This is probably why the error was perpetuated . Sigh :(.

CERT_DecodeCRLIssuingDistributionPoint appears to have a leak :

   *pValue = value; /* consider failure free XXX */

In failure cases the structure needs to be freed. *pValue does not need to be set, or could be set to NULL.

c) We'll need SEC_ASN1_SUB / SEC_ASN1_MKSUB, since some subtemplates that are referenced are moving to libnssutil ...


- in pkix_pl_pki.h :

The definitions of some structures is repeated many times in the comments, for example for each field of the issuingDistributionPoint extension, for each accessor function . This is unnecessary. It only needs to be there once. Having it so many times just clutters the header file.

- in pkix_tools.h, there is a change to an ifdef .

I don't think that change is correct nor the original code . We can't import PKIX_ERRORNAMES accross DLLs with the __declspec(dllimport) convention.
It may work because it is not getting accessed accross DLLs . So maybe it just needs to be extern . I don't know what the "IN_LIBPKIX" macro means.
Since the test programs that uses the libpkix API are now consolidated into one statically-linked program, I think we can remove the ifdef and just use the else clause .


- in pkix_pl_cert.c :

There is a double-checked locking idiom that has bitten us so many times again :
 	

        /* if we don't have a cached copy from before, we create one */
	if (cert->distPoints == NULL) {
	    PKIX_OBJECT_LOCK(cert);
	    if (cert->distPoints == NULL) {
	       ...
	    }
	    PKIX_OBJECT_UNLOCK(cert);
        }
        PKIX_INCREF(cert->distPoints);
        *pDistPoints = cert->distPoints;

The problem is if cert->distPoints was not NULL upon entry . Because there was no locking, there is no way to tell if the
current thread/CPU has a proper view of the memory pointed to . Even on architectures that don't do writes out-of-order, and
when the code sets the pointer last (as is the case here), without locking, there is still a risk that the compiler can reorder
things and another thread will catch the pointer before what it points to is fully filled up.

I think the solution here is to move the decoding of the extension earlier to the time of the cert object creation, before it is
shared and needs to be locked. This function would then become merely a read-only accessor, and would not need to lock the cert object.

Otherwise, we could lock before the NULL pointer check, which is much simpler of a change, but also less efficient.

Whatever solution we choose should be applied to the decoding of other extensions, as they most likely have the same problem.

- the copyright header in new files, such as pkix_pl_certdp.c, needs to be updated to be consistent with the rest of libpkix .

- Double-checked locking problem exists again in PKIX_PL_CertDistributionPoint_GetDistributionPointName.
There is no good reason to do the late decoding there. When DPName object is created, it should be decoded.

- the double-checked locking problem again exists in PKIX_PL_CertDistributionPoint_GetCrlIssuerNames and PKIX_PL_CRL_GetIssuingDistributionPoint,
PKIX_PL_DistributionPointName_GetFullNames
Group: security
Julien, you changed this bug to be "security sensitive".
Was that intentional or accidential?
It must have been accidental, since this is an RFE, not a bug in old code.
Group: security

Comment 13

12 years ago
It was accidental. Sorry.
Alias: CRLDP
Alexei, please review this patch. 

The original patch (v1 above) included a large change to several files
to alphabetize the contents of some large defines that are used to 
generate enums.  That change is very helpful because it makes it 
a lot easier to compare those long defines to see if they've gotten
out of sync, and see if any items are present in one list but not the 
other.  But that change obscures the additions being made for CRLDP.
So, in this patch I have separated out the changes for alphabetization
(and other cosmetic whitespace changes) into a separate patch.  
If this patch passes review, I will commit it before the rest of the 
CRLDP patch.

Alexei, please check to see if reordering the defines in this way will 
or will not have any unexpected side effects on the operation of libPKIX.
Assignee: julien.pierre.boogz → nelson
Status: NEW → ASSIGNED
Attachment #299468 - Flags: review?(alexei.volkov.bugs)
Assignee

Comment 15

12 years ago
Comment on attachment 299468 [details] [diff] [review]
alphabetize list of classes, error codes and error strings (checked in)

r=alexei.
Changes looks ok. I've also run tests. Result shows no regressions.
Attachment #299468 - Flags: review?(alexei.volkov.bugs) → review+
Comment on attachment 299468 [details] [diff] [review]
alphabetize list of classes, error codes and error strings (checked in)

pkix_errorstrings.h; new revision: 1.14; previous revision: 1.13
pkixt.h;             new revision: 1.10; previous revision: 1.9
Attachment #299468 - Attachment description: alphabetize list of classes, error codes and error strings → alphabetize list of classes, error codes and error strings (checked in)
Comment on attachment 299468 [details] [diff] [review]
alphabetize list of classes, error codes and error strings (checked in)

The checkin shown in the preceding comment committed the wrong patch.
This checkin backs out that patch and commits the intended and 
reviewed patch.

pkix_errorstrings.h; new revision: 1.15; previous revision: 1.14
pkixt.h;             new revision: 1.11; previous revision: 1.10
Priority: P2 → P1
Summary: accept critical crlDistributionPoint extension in libPKIX → implement crlDistributionPoint extension in libPKIX
Target Milestone: 3.12 → 3.12.1
Julien,  
I would like to try to summarize your review comments in comment 10,
and then respond to some of them, and then propose how to move forward.

I think your review comments in comment 10 may be summarized as follows:
1a) use of SEC_ASN1_EXPLICIT | SEC_ASN1_CONSTRUCTED in xcrldist.c
1b) use of SEC_ASN1_ANY and multiple decoding steps (including code to 
interpret DER directly rather that using a decoder), rather than use 
of SEC_ASN1_CHOICE and decoding all in one step in xcrldist.c.

2) Excessively repeated comments in pkix_pl_pki.h

3) Use of __declspec(dllimport) in pkix_tools.h

4) Double-checked locking in a bunch of places

5) old copyright headers in new source files

Is that an accurate summary?  
Are those all your concerns?
Would it be OK to proceed to commit changes in any files not affected by
the above issues?

I would like to respond here to two of the above concerns, because I think 
they are not entirely valid.

Re: 1a) use of SEC_ASN1_EXPLICIT | SEC_ASN1_CONSTRUCTED in xcrldist.c

As I've documented in bug 436957 comment 3, 4 and 5, context-specific tagged 
choices are always encoded with explicit tagging, even when declared to be 
implicit, and explicit tags are always constructed.  All but one of the 
templates in NSS that use SEC_ASN1_EXPLICIT also specify SEC_ASN1_CONSTRUCTED, so it does not appear to be an error to do so.
So, I believe the issue summarized as 1a is a non-issue.

Re: 3) Use of __declspec(dllimport) in pkix_tools.h 

The use of __declspec(dllimport) in a declaration of data found in 
another DLL (that is, data not found in the DLL or EXE where the 
declaration is used) is allowable and correct if used properly.  
Data declared that way cannot be used in initializers of structs or 
other data declarations, but can be used in assignment statements in
executable code.  We have been using it that way for years in libSSL
and programs that link with libSSL.  It is how those programs access 
the table of implemented SSL cipher suites from the SSL DLL.  See 
<http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/security/nss/lib/ssl/ssl.h&rev=1.28&mark=53-54,62,65#53>

The DATA declaration of PKIX_ERRORNAMES is already in nss.def.  The 
change to the ifdef was to make it properly accessible to code outside
of the DLL.  I imagine that this was so that test programs could access
the error strings.  

With this technique, the code that is inside the same DLL where the array 
actually exists should NOT use the special __declspec(dllimport) to
access the data, but the code that is outside of that DLL MUST use that 
special declaration.  That is the purpose of the IN_LIBPKIX symbol.
It is defined when compiling code that it built into libPKIX and is not
defined for other code.  

As you probably know, there is an RFE to rid NSS entirely of the array of
PKIX_ERRORNAMES and all referencs to it in libPKIX and related code.  IF
& when that happens, this declaration may become superfluous.

Going forward, I propose to make a series of smaller patches, each of which
adds a part of the code ultimately needed to implemented CRLDP handling.
I hope to get them reviewed and committed over time, rather than all at once
to avoid reviewing monster patches and to avoid bit rot.  
I'm not going to attempt to create a single new monster patch to replace
patch v1 above.  Instead, I'm going to try to create a series of patches
to replace it is a somewhat bottom-up fashion.  I decided to start with 
the code to decode CRL IDP extensions, since that was an area where Julien
had numerous review comments for patch v1.  I modified this code a lot to 
address those comments.  Having done that, I have several observations.

1. There is no code to test this (nor was there any code to test this part 
of the original patch).  I seems to me that we should have one or more 
utility programs to decode and print CRLs, and also to create CRLs, and that
they should be extended to test out this new code.  But I need some pointers
about which programs to extend.

2. This code is for decoding only.  There is no corresponding code to encode
these extensions.  I'm not at all confident that the new templates would 
work correctly in the encoder.  There's an optional explicit tagged choice 
in this template, and I am not at all sure that the template will allow 
the optional item to be omitted.  I'd like to find the utility that encodes
CRLs (we have one, right?) and extend it to call the encoder, to ensure 
that this is right before committing the new templates and associated structs.

This patch also cleans up some other existing code.  It rewrites one private function and gives it a properly formatted name, and declares it in a private
header file.  

I'd like the review of this patch to look at the clean up parts and to 
offer any comments and perspective on the new templates.
Attachment #274738 - Attachment is obsolete: true
Attachment #326598 - Flags: review?(julien.pierre.boogz)
This patch is more complete than the previous one.  It does 2 more things:
a) exports the new functions from libnss3.so, and 
b) uses the new functions in the common "pretty printing" code used by
crlutil and pp.  

Unfortunately, it's still not thoroughly tested because I haven't yet found 
a CA with a CRL with the IDP extension. :(
Attachment #326598 - Attachment is obsolete: true
Attachment #326846 - Flags: review?(julien.pierre.boogz)
Attachment #326598 - Flags: review?(julien.pierre.boogz)

Comment 21

11 years ago
Comment on attachment 326846 [details] [diff] [review]
patch for decoding and displaying CRL IDP, v2

This looks mostly OK.

My comments are all about error checking.

1) I would like to see more error checking in cert_ByteArrayToBitArray, at least that the pointer arguments are non-NULL.  It would be nice if this function didn't return void also.

2) CERT_FindCRLIssuingDistPointExten is a public function, and should check that both crl and pValue are non-NULL, and set the invalid argument error if not.

3) CERT_DecodeCRLIssuingDistributionPoint is a public function. It should check the arena argument even for optimized code, not just assert. And before that, it should check that encodedValue is non-NULL also, or fail and set the invalid arguments error.
Attachment #326846 - Flags: review?(julien.pierre.boogz) → review+
I'm planning to add code to crlutil to generate the IDP extension, 
and while I was looking at the sources for that function, I decided to 
do some cleanup.  Mostly, this patch does these things:
a) eliminates lots of duplicated checks for input arguments, 
b) sets an INVALID_ARGS error on invalid args
c) unmarks marked arenas before returning successfully.

It also fixes at least one potential crash in a path that would goto loser
leaving the variable mark uninitialized, which was then passed to 
PORT_ArenaRelease.

It also removes a few assertions.  That might be controversial.
Attachment #327528 - Flags: review?(alexei.volkov.bugs)
Assignee

Comment 23

11 years ago
Comment on attachment 327528 [details] [diff] [review]
cleanup crlgen.c before adding new code, v1

looks good. r+
Attachment #327528 - Flags: review?(alexei.volkov.bugs) → review+
Assignee

Comment 24

11 years ago
Example of crl generation script:

-------------
update=20080820100011Z
addcert 100-199 20080818100011Z
addext crlIdp 0  fullName="CN=TestCA,...,..." onlyUserCerts=TRUE onlyCaCerts=FALSE onlySomeReasons=1,2 onlyAttrCerts=FALSE indirectCrl=FALSE
-------------

Adding non-critical(0) crl ipd extension with DP full name "CN=TestCA" that contains only user certs with keyCompromise and cACompromise reason codes. 

Default value for boolean types is FALSE, thus no need to specify other keyword/value paid than onlyUserCerts.

DP Name can only be one of the two: full name(keyword: fullName), or relative name(keyword: relativeName). The name should be surrounded by double quotes if it has a space in its value.

SPC is also a keyword/value pair delimiter, thus can be used instead of '='.

Reason codes(keyword: onlySomeReasons) is a comma separated list of reason codes.
Attachment #335603 - Flags: review?(nelson)
Assignee

Comment 25

11 years ago
Posted patch Merged with NSS_CRLDP_BRANCH (obsolete) — Splinter Review
Assignee

Comment 26

11 years ago
Didn't include new files to the attachment 338336 [details] [diff] [review]. This patch fixes it.
Attachment #338336 - Attachment is obsolete: true
Priority: P1 → P2
Target Milestone: 3.12.1 → 3.12.5
Comment on attachment 338798 [details] [diff] [review]
NSS_CRLDP_BRANCH merge(with new files)

Does this patch replace attachment 326846 [details] [diff] [review]? If yes, you might want to mark that other patch as obsolete (it has r+ but is not checked in)
Comment on attachment 335603 [details] [diff] [review]
crlutil patch - CRLIDP generation

Does this patch replace attachment 327528 [details] [diff] [review]? If yes, you might want to mark that other patch as obsolete (it has r+ but is not checked in)
Comment on attachment 335603 [details] [diff] [review]
crlutil patch - CRLIDP generation

Patch does not compile. crlDpReasonIndex is used but never declared.
Comment on attachment 338798 [details] [diff] [review]
NSS_CRLDP_BRANCH merge(with new files)

Nelson proposed, maybe it's possible to take the non-libpkix portions of this patch first, and delay the rest. Maybe this would make it easier to get those portions in. Maybe those might help us in getting some basic crldp support done outside of libpkix.
Alexei, I think the question in comment 28 was being asked of you.
Kai, it's very often unclear who is meant by "you" in bug comments, so be 
sure to name names. :)
In reply to comment 30, yes I proposed that.  Then I remembered what the 
problem is with that code outside of libPKIX.  The templates for the ASN.1
decoder are constructed in such a way that they won't work with the encoder.
I asked Alexei to enhance crlutil so that I could test the encoder with it,
so that I could try to develop templates that work with both encoder and 
decoder, and I believe he did so (attachment 335603 [details] [diff] [review]).  But that work was interrupted so many times that I never got back to it.
Posted patch hack patch v1 (obsolete) — Splinter Review
Attachment #358899 - Flags: review?(nelson)
Assignee

Comment 35

11 years ago
(In reply to comment #28)
> (From update of attachment 335603 [details] [diff] [review])
> Does this patch replace attachment 327528 [details] [diff] [review]? If yes, you might want to mark that
> other patch as obsolete (it has r+ but is not checked in)

I guess Nelson has forgotten to integrate it. It cleans a bunch of places in crlgen.c.
Comment on attachment 358899 [details] [diff] [review]
hack patch v1

I have not yet done a complete review of this patch, in part because this 
is an Hg/git style patch, and bugzilla's patch analysis tools only understand
CVS diffs.  So, I request that the next version of this patch be submitted 
for review as a CVS diff against the CVS tip of trunk.

Here are some "preliminary" review comments.  Please address these before 
submitting a new CVS diff patch.

1. 
>+CERT_IsCRLAvailableAndIsItFresh(CERTCertificate* cert, CERTCertificate* issuer,
Please rename this new function to 
  CERT_IsFreshCRLAvailable

2. in  security/nss/lib/certhigh/certvfy.c we see:

>+extern SECStatus
>+ocsp_ParseURL(const char *url, char **pHostname, PRUint16 *pPort, char **pPath);

Please put this declaration in a suitable private NSS header file, 
not in a .c file.

3. New function sec_ImportNetworkCRL doesn't follow NSS style conventions, 
and doesn't match style conventions used in the rest of that file.

4. Many new lines exceed the 80 column length limit, especially the new 
code added to SEC_CheckCRL, but also many other places.
Attachment #358899 - Flags: review?(nelson) → review-
Comment on attachment 358899 [details] [diff] [review]
hack patch v1

More comments on this patch.

1. This patch doesn't affect libPKIX
2. This patch changes the default behavior of the old pre-libPKIX cert 
verification functions.
3. This patch doesn't offer control over leaf checks versus chain checks.
4. In SEC_DownloadImportCrl, you need to add 1 to the amount passed to 
PR_Malloc.  Otherwise, location[uri->len] = 0; actually writes to a byte 
past the end of the space malloced.
5. I'd prefer dpoints->distPoints[0] to *dpoints->distPoints and
dpoints->distPoints[1] to *(dpoints->distPoints+1)
6. You should test 
dpoints->distPoints[0]->distPoint.fullName to be non-NULL before fetching 
dpoints->distPoints[0]->distPoint.fullName->type or
dpoints->distPoints[0]->distPoint.fullName->name.other
7.  I don't think it's necessary to preclude CDPs with more than one 
distribution point.  What's necessaary is to check that at least one 
point is a kind we can handle.

Please ask Julien to (also) review the next rev of this patch.
(In reply to comment #36)
> >+CERT_IsCRLAvailableAndIsItFresh(CERTCertificate* cert, CERTCertificate* issuer,
> Please rename this new function to 
>   CERT_IsFreshCRLAvailable

What about CERT_FindLocalCRLAndValidity ?


> >+ocsp_ParseURL(const char *url, char **pHostname, PRUint16 *pPort, char **pPath);
> Please put this declaration in a suitable private NSS header file


moved to ocspi.h


> 3. New function sec_ImportNetworkCRL doesn't follow NSS style conventions, 
> and doesn't match style conventions used in the rest of that file.

Does it look better now?


> 4. Many new lines exceed the 80 column length limit, especially the new 
> code added to SEC_CheckCRL, but also many other places.

##############


> 1. This patch doesn't affect libPKIX

True. Would you rather see this patch moved to a different bug?
Or should we change the bug's subject and remove "libpkix"? (since the bug alias is the general crldp...)


> 2. This patch changes the default behavior of the old pre-libPKIX cert 
> verification functions.

Right, although only if an HTTP client callback has been registered.
But still, I agree, this default behavior should not get changed.
I added a function that requires the application to explicitly enable the use of automatic CRL downloading:

=> SEC_CRLDownloadEnable


> 3. This patch doesn't offer control over leaf checks versus chain checks.

True. I would like to skip this for now.
I guess CRLs for intermediates are much smaller than those listed in leaf certs, so it should be reasonably acceptable we can't disable chain CRL downloading, do you agree?


> 4. In SEC_DownloadImportCrl, you need to add 1 to the amount passed to 
> PR_Malloc.  Otherwise, location[uri->len] = 0; actually writes to a byte 
> past the end of the space malloced.

Thanks for catching this, shame on me.


> 5. I'd prefer dpoints->distPoints[0] to *dpoints->distPoints and
> dpoints->distPoints[1] to *(dpoints->distPoints+1)

Changed.


> 6. You should test 
> dpoints->distPoints[0]->distPoint.fullName to be non-NULL before fetching 
> dpoints->distPoints[0]->distPoint.fullName->type or
> dpoints->distPoints[0]->distPoint.fullName->name.other

Changed.


> 7.  I don't think it's necessary to preclude CDPs with more than one 
> distribution point.  What's necessaary is to check that at least one 
> point is a kind we can handle.

Yeah, but this is tricky. Because I don't fully understand (yet) how to distinguish URIs that refer to a fragemented/incremental CRL from URIs that refer to a full CRL! I don't know how to do that! Advice welcome.

My thought has been:
- if the cert lists only a single URI, then it's probably a full URI
- if there are multiple URIs, how shall I know what's what???

This is a place for future improvement.
But I tried to do better.
Please see the new implementation of SEC_CheckCRL which has a lot of comments.

(Will attach patch soon, after some more cleanup)
obsoletes hack-patch v1
Attachment #358899 - Attachment is obsolete: true
Comment on attachment 359003 [details] [diff] [review]
lib-classic patch v2

Some comments about moved data, to make reviewing easier:

>-struct OCSPCacheItemStr {
>-struct OCSPCacheDataStr {

The above types have been moved to file ocspti.h
NO changes.


> struct OCSPGlobalStruct {

This type has been moved to file ocspti.h
NO changes, but added three members at the end.

This global variable is now shared between OCSP and CRL download code.
I'm willing to rename this later if you want better names.
Renaming would cause the patch to increase very much in size.
I propose to delay any renaming, in order to simplify the review.


>-static struct OCSPGlobalStruct {
>+struct OCSPGlobalStruct OCSP_Global = { 

Removed static, because it's now needed in both ocsp.c and certvfy.c
Attachment #359003 - Flags: superreview?(julien.pierre.boogz)
Attachment #359003 - Flags: review?(nelson)
Comment on attachment 359003 [details] [diff] [review]
lib-classic patch v2

Nelson, you might be interested in the new implementation of function SEC_CheckCRL,

and in the following two new functions:

>+SECStatus
>+SEC_CRLDownloadEnable(PRBool enable)

>+SECStatus
>+SEC_CRLDownloadSettings(PRUint32 crlTimeoutSeconds,
>+                        PRUint32 maxCrlDownloadSizeInKB)
Nelson, I know that you prefer CVS diffs and that you like the incremental diff ability.

"part A" attachment 359005 [details] [diff] [review] is equivalent to the patch you have reviewed already
(hack patch v1)

In order to make the bugzilla diff tool work, I split my new patch (lib-classic patch v2) into two pieces:

"part B" attachment 359006 [details] [diff] [review] is the patch subset changes the same set of files as the above.

So, please click on "part B", select "diff" against "part A" and you'll see the incremental changes...

Finally, look at "part C" to see the changes to additional files.

Updated

11 years ago
Attachment #359003 - Flags: superreview?(julien.pierre.boogz) → superreview-

Comment 46

11 years ago
Comment on attachment 359003 [details] [diff] [review]
lib-classic patch v2

Kai,

Unfortunately, I found a lot of problems with your patch.

First, some documentation  since you had a question in your comments : the "dp" argument of several of the CRL cache APIs, it is supposed to be the distribution point name field of the certificate's CRL DP extension . Currently, the cache is missing some of the code to handle distribution points, so NULL is a valid value. But this will need to be fixed in the cache in the future. Look for the string XCRL to see the places that need to be fixed. Basically, a hash table needs to be added to have multiple CRLs by issuer under a single distribution point.

Now, onto the patch. I don't think you need to make a copy of CERT_CheckCRL . This is a private function, so it is OK to change its prototype.

What you have done is costlier in terms of performance - you go to the cache twice, once to find out if you have a CRL in it and return its validity period, and then another time to actually lookup the cert. This means going through locks several times - once to get the CRL validity period, and one more time to check the serial number against the CRL. The lookup of the serial number in the CRL is just a very cheap hash table lookup. So if you are going through the cache locks in the first place, there is no reason to skip the local lookup. It's possible that the local revocation information is more up to date than the network revocation information, in case you are subjected to a replay attack of an older CRL.

Also note that the CRL never expires, unlike a cerificate. The nextUpdate field is actually optional. The most meaningful field is the thisUpdate field, which tells you how old the revocation information actually is. It is always OK to use an older CRL (though it may be slightly wasteful in some rare cases, such as if thisUpdate < EEcert->notBefore). This is another argument for not skipping the local lookup. This is also why the CRL cache and certificate verification code path never uses SEC_CheckCrlTimes anywhere. That API is now only used for the KRL case, and IMO it should be either completely deleted, or renamed as sec_ (lowercase prefix).

Also, note that the CRL cache will be changed in the future to support distribution point CRLs, and perhaps also delta CRLs. The approach you took in CERT_FindLocalCRLAndValidity pokes into the internals of the cache, by looking at dpcache->selected, which is currently a full CRL for the current issuer, but is subject to change. This field is currently hidden in all the CRL cache functions, and should remain that way. You may want to take a different approach than returning the CRL validity period, and instead perhaps return a simple PRBool that states whether a revocation lookup has been performed in a CRL whose nextUpdate is either missing, or, if present has already passed - ie. a bit that tells you that you may want to do a network fetch to get newer revocation information.

Re: sec_ImportNetworkCRL . There should be no need to do the CRL signature verification in this step. The CRL cache already always takes care of it during lookup.

I am not sure that it's really necessary to import the CRL to the DB. 

This is not very future proof, as it will not be possible with CRLs with IDP or delta CRLs, since the softoken currently only supports full CRLs. I don't know if we will enhance softoken to support those. We may or may not want to do that. That's partially the subject of bug 217392.

You can actually replace the entire function sec_ImportNetworkCRL with a call to CERT_CacheCRL . This will simply insert it into the CRL cache, in RAM . That's what the web server does today when it fetches a CRL dynamically. All CRL decoding and signature verification will automatically be taken care of when needed. This also has the advantage of obviating the problem of finding the right issuer certificate for the CRL in this spot. It doesn't obviate it completely of course - that's still the subject of bug 233118 . But at least that problem should only need to be fixed in one place, not two. The only downside is that the CRL is not persisted. If you really want to persist the CRL to disk, you can use PK11_ImportCRL, not SEC_NewCrl, and use CRL_IMPORT_DEFAULT_OPTIONS . But for now that call will only work with full CRLs.

My biggest problem is with SEC_CheckCRL . From what I understand, your patch is currently attempting to implement only support for full CRLs. Unfortunately, you cannot always know in advance whether you are going to get a full CRL before fetching it. You correctly exclude indirect CRLs if crlIssuer is present.

You also need to exclude CRLs for specific reason codes by looking at the reasons field of the CRLDP extension. You shouldn't try to fetch it either if it's set.

Regarding the fact that there may be multiple URIs, it is OK to try multiple. It's possible that multiple protocols are supported for example. This should not be a reason to give up. 

However, even after you make those changes, you may still end up getting a CRL with a CRL IDP extension . This is currently a critical extension, marked as unsupported in 3.12, pending further changes. This means that you would have done the whole CRL download for nothing in this case - quite a waste of network resources.

When the CRL IDP extension is present in the CRL, it is almost always a partial or indirect CRL. There are only two exceptions :
1) all the CRL IDP extension fields are empty . This is something the client can tell
2) the CA has decided not to partition the CRL. But the client cannot know that this is the case. And the CA can dynamically partition in the future. Any CRL with an IDP containing a DistributionPointName must be assumed to be a partial CRL.

Comment 47

11 years ago
FYI, I would prefer we don't try to enhance the legacy code. This code is supposed to be obsolete in 3.12 . This bug is about implementing CRL DP in libpkix. We should implement CRL DP in certs and CRL IDP in CRLs at the same time. Otherwise, we will run into really messy situation where we waste time fetching CRLs that we cannot support.
Comment on attachment 359003 [details] [diff] [review]
lib-classic patch v2

Clearing review request.

This patch has been requected in discussions.

We are waiting for Alexei to produce a patch that implements something similar at the libpkix level, and waiting for Julien to implement something in the CRLDP cache that can remember the fact "downloaded CRL is partial" in the in-memory-only cache.
Attachment #359003 - Flags: review?(nelson)

Updated

11 years ago
Blocks: 474606

Updated

10 years ago
Blocks: 480683
Assignee

Comment 49

10 years ago
Comments to the patch will follow.
Assignee: nelson → alexei.volkov.bugs
Attachment #365324 - Flags: review?
Comment on attachment 365324 [details] [diff] [review]
Patch v1 - implement crl fetching using cert CRL DP 

I think Alexei meant to ask me to review this.
Attachment #365324 - Flags: review? → review?(nelson)
* * *  * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
I am going to split off a new bug from this one, a bug to cover the 
implementation of CIDP extensions in CRLs.  Many of the patches attached
to this bug are actually about CIDP extension implementation, not about
fetching of CRLs from CDP extensions in certs.  I wish I could move those
attachments and their review flags to the new bug. :(

Until recently, it was thought that the implementation of CIDP extension
processing in CRLs was prerequisite to fetching CRLs from URLs in CDPs.
There were (and still are) big issues attached to CIDP extension processing,
the biggest of which is the need to handle partitioned CRLs.  But a recent
break through was the realization that CIDP extension handling is NOT a 
prerequisite to CRL fetching from UDPs in CDPs, and so the work to to the
fetching is proceeding with all due haste.
* * *  * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Comment on attachment 365324 [details] [diff] [review]
Patch v1 - implement crl fetching using cert CRL DP 

Alexei and I reviewed this patch, at least partially, last week.
Alexei has the review notes.   I am expecting an updated patch soon.
Alexei, please let me know if you disagree with what I've written here.
Attachment #365324 - Flags: review?(nelson) → review-
Assignee

Comment 53

10 years ago
Attachment #365324 - Attachment is obsolete: true
Attachment #367046 - Flags: review?(nelson)
Assignee

Comment 54

10 years ago
Attachment #367047 - Flags: review?(nelson)
Assignee

Comment 55

10 years ago
The patch does not implement the handling on partitioned crl. Only full crls(ones what do not have crlidp extension) will be cached.
The following changes are made:
* allows to cache the crl without copping it's der
* each Issuer cache new creates a hashtable to track crldps and theirs crl caches
* DPs that point to partial crl will be stored to avoid new attempts to fetch partial crls.
Attachment #367052 - Flags: superreview?(nelson)
Attachment #367052 - Flags: review?(julien.pierre.boogz)
Attachment #367046 - Attachment description: Patch v2 - implement crl fetching using cert CRL DP. Addressed review comments. → Patch v2 part 1 - implement crl fetching using cert CRL DP. Addressed review comments.
Attachment #367047 - Attachment description: supplemental patch v1 to implement crl fetching using cert CRL DP. Addressed review comments. → Patch v2 part 2 - patch to files not in v1
Attachment #367052 - Attachment description: Patch v1 - crlcache changes to support crldp fetching → Patch v2 part 1 subset - crlcache changes to support crldp fetching - for Julien's review
Attachment #367052 - Flags: superreview?(nelson)

Updated

10 years ago
Attachment #367052 - Flags: review?(julien.pierre.boogz) → review-

Comment 56

10 years ago
Comment on attachment 367052 [details] [diff] [review]
Patch v2 part 1 subset - crlcache changes to support crldp fetching - for Julien's review

Alexei,

I have a few problems with attachment 367052 [details] [diff] [review].

1) Why did you add CERT_CacheCRLWithFlags ?

a) What's the use case for the decoding flags in libpkix ?

b) Regarding the adoption return argument, you call it isAdopted in the header file, wasAdopted in the implementation. 

From what I see of the implementation, you are always trying to adopt the memory in CERT_CacheCRLWithFlags. As long as the function succeeds, the memory is always adopted. But CERT_CacheCRL has a predefined behavior, and it does not adopt the memory, ie. it is not the sole owner of that memory. It shares it with the caller, the application. The caller really owns the memory, until CERT_UncacheCRL is called.

So, I think the code you have written effectively breaks CERT_CacheCRL, since it now calls CERT_CacheCRLWithFlags, and the later always tries to adopt the memory.

I think generally, when our APIs have given the choice of adopting the memory or not, this has been an input flag, or in/out, but I don't think it can be an output flag only.

r- because you appear to have broken the memory management in a legacy API, CERT_CacheCRL

My other question is, once the CRL is "adopted" by the cache, how is it ever freed ?

The model for CERT_CacheCRL / CERT_UncacheCRL was simple - the application would put a CRL into the cache, and would remove it when it no longer needed it, eg. at shutdown, or when it found a new one.

It would appear that in this case, the "adopted" CRL is actually leaked, since it can never be removed from the cache.

All other CRLs can currently be removed from the cache. Even token CRLs can be removed from the cache when CRLs are deleted from the token.

It seems that you intend for libpkix to download a CRL, put it into the cache, and then just forget about it forever. I don't think that's right.

Otherwise, whenever there is a new CRL, it will just be added to the CRL cache. Old CRLs don't get deleted automatically from the cache, they have to be pulled somehow.

If you want to do some kind of automatic CRL deletion in the cache, some more design work is needed.
Assignee

Comment 57

10 years ago
(In reply to comment #56)

> 1) Why did you add CERT_CacheCRLWithFlags ? 
Two reasons: avoid copying der every time we cache the crl and make cache to be
an owner of the der memory.

> a) What's the use case for the decoding flags in libpkix ?
libpkix will deliver a der into crl cache and make it to own the der by using the following flags CRL_DECODE_DONT_COPY_DER | CRL_DECODE_ADOPT_HEAP_DER.

 
> b) Regarding the adoption return argument, you call it isAdopted in the header
> file, wasAdopted in the implementation. 
Just typo. Will be fixed. In implementation, the flag is set to TRUE if crl was created. It does not tell whether the crl was created with or without copying its der. It is up to caller to use the flag that in the patch was called isAdopted, to determinate what to do with the memory of the crl der.

 
> From what I see of the implementation, you are always trying to adopt the
> memory in CERT_CacheCRLWithFlags. As long as the function succeeds, the memory
> is always adopted.
Both of the statements above are not true. The behavior depends on the used flags.

> But CERT_CacheCRL has a predefined behavior, and it does not
> adopt the memory, ie. it is not the sole owner of that memory. It shares it
> with the caller, the application. The caller really owns the memory, until
> CERT_UncacheCRL is called.
CERT_CacheCRL calls CERT_CacheCRLWithFlags with the CRL_DECODE_DONT_COPY_DER flag that will tell der memory not to be copied. An application that provided the der will use CERT_UncacheCRL. It is consistent with the old code.

> So, I think the code you have written effectively breaks CERT_CacheCRL, since
> it now calls CERT_CacheCRLWithFlags, and the later always tries to adopt the
> memory.
Not true. 

> I think generally, when our APIs have given the choice of adopting the memory
> or not, this has been an input flag, or in/out, but I don't think it can be an
> output flag only.
CacheCrlWhithFlags has new input and output flags: the input flag that tells what to do with der memory(crlDecoderFlags) and the output flag that tells if crl was decoded or not(badly names wasAdopted or isAdopted). 


> r- because you appear to have broken the memory management in a legacy API,
> CERT_CacheCRL
I don't see the breakage.
 
> My other question is, once the CRL is "adopted" by the cache, how is it ever
> freed ?
My understanding was that want to keep all crls we got to provide more detailed revocation info. But we may not need to have it when caching crls what were download using a DP cert extension. We may define the maximum number of crl we want to store against a DP and then every time we cache a new crl that we got using a DP, we may delete the oldest from the cache.

> 
> The model for CERT_CacheCRL / CERT_UncacheCRL was simple - the application
> would put a CRL into the cache, and would remove it when it no longer needed
> it, eg. at shutdown, or when it found a new one.
It is easy from crl cache stand point, but what should application do with these der? It would require additional storage to keep tack of all the ders.
Does not seem right.

> 
> It would appear that in this case, the "adopted" CRL is actually leaked, since
> it can never be removed from the cache.
It is not leaked. But we may need to control how many issuer crls we want to keep simultaneously in the cache.

> It seems that you intend for libpkix to download a CRL, put it into the cache,
> and then just forget about it forever. I don't think that's right.
One would think that keeping control of all crl ders memory outside of the cache and for cache to operates on the crl based on the memory it does not own it not right either.
Hardware: All → Other

Comment 58

10 years ago
Alexei, Nelson and I had a meeting today to discuss the remaining issues. I sent some feedback by email. There are still a few pending questions, but there was agreement that the memory ownership issue at least needs some changes.
Assignee

Updated

10 years ago
Attachment #369180 - Flags: review? → review?(julien.pierre.boogz)
Assignee

Comment 60

10 years ago
Attachment #367046 - Attachment is obsolete: true
Attachment #369443 - Flags: review?(nelson)
Attachment #367046 - Flags: review?(nelson)
Assignee

Comment 61

10 years ago
Attachment #367047 - Attachment is obsolete: true
Attachment #369444 - Flags: review?(nelson)
Attachment #367047 - Flags: review?(nelson)

Comment 62

10 years ago
Comment on attachment 369180 [details] [diff] [review]
Patch v3 part 1 subset - crlcache changes to support crldp fetching - for Julien's review   

I think unfortunately that we need to start over with this issue. Don't take it personally, but I have only been looking at this the last week and a half, and I think there are too many error cases that we overlooked when trying to do some quick milestone.

1) nit : the comment on line 101 is misplaced (and has a typo)

2) optional: in CachedCrlStr, perhaps the refetchAfter should be part of a union, for variables that are origin-specific. See if there are others in that structure that are the same.

3) nit: certHasPartitionedOrInvalidCrl has a typo in argument names.

4) the comment for cert_CacheCRLWithFlags is incorrect ("same as above"). It probably should say "Same as CERT_CacheCRL".

5) The comment for DPCache_HasFetchedCrl is incorrect. I also think the name of the function is confusing. It turns out what it does is return whether the DP cache contains any CRL that has been downloaded (origin of CRL_OriginFetchedWithTimeout). Please try to find a better name.

6) In DPCache_HasFetchedCrl, if (!cache->crls), there should be an assertion .

7) In DPCache_GetUpToDate, line 1925, you need to change the comment.
Also, there probably should be a PORT_Assert(savcrl) before the if statement (just a suggestion, not a problem with your patch).

8) In DPCache_GetUpToDate, please see if you can find any way to keep the PR_Now() call optional, rather than having it at the beginning of the function unconditionally. It was conditional before. This call can be expensive and needs to be used sparingly. The time is only necessary if there are CRLs, but if there are none, it should not be. It doesn't appear that you need it before line 1891 and 1935 in your patch.

9) In IssuerCache_GetDPCache, you removed some assertions there that were correct for the non-XCRL case, but are not for the XCRL case. I think the conditional compilation makes things confusing, and I would favor stopping the use of the XCRL macro and just cleaning up the code assuming it's always set. XCRL was supposed to be temporary.

10) Now that we are starting to use the "dp" argument in AcquireDPCache, and other places, we need to document what it actually is. More on this later.

11) nit: typo on line 2716

12) cert_CacheCRLWithFlags is the function that I have the biggest problems with.
The call to CERT_DecodeCRLWithFlags is commented.
You look up SEC_OID_X509_ISSUING_DISTRIBUTION_POINT, yet still assign cerDpName to derDpName, so what was the point of looking up the IDP ?

The rest is more of a design issue. It has to do mostly with error cases.

a) The certDpName argument should actually be the DER of a single GeneralName representing the URL the object was downloaded from, extracted
from the certificate's CDP extension. It should not be a distributionPointName, which may represent multiple URLs.
This is partly my fault for missing this in the review last week.

b) This function allows CRLs to be indexed in the cache by a subject other than their own subject, by taking a certSubjName argument.
This was never possible before for CRLs of other origins - CRL_OriginToken or CRL_OriginExplicit.
As the designer of the CRL cache I think it violates the design of the cache, and think that it should not be done.
Storing CRLs with mismatched subject will lead to revocation check failures when those CRLs don't verify.
CRLs should only be stored in the CRL cache by components of the CRLs, such as subject, or part of the IDP.

There should be a comparison between certSubjName and crl->subject, and if there is a mismatch, the CRL could additionally be indexed
by download URL alone, in a separate top-level cache.

c) The issue is very similar when it comes to dpName, we should not use the dpName from the cert's CDP, and let it override the dpName of the IDP for the purpose of indexing within the cache. That will lead to a similar type of failures.
It is more complicated for this case though, because for proper IDP processing, both the CDP's dpName and the IDP's dpName need to be broken down into individual URLs so that a determination can be made whether they match or not.
But that should best be done by upper level code at the time of cert verification. That comparison shouldn't need to be done twice - once when storing the CRL, and another time when verifying it.
Also, the dpName can be completely missing from the IDP. There is no place in the CRL cache to store and retrieve such CRLs.
As I said above, we should replace dpName with a GeneralName, and we can index/hash by that name instead of dpName. The caller of
CERT_CacheCRLWithFlags has already had to break down the CDP to extract the GeneralName of the download URL. But this function needs to decode and break down IDP.
If there is a mismatch, ie. the CDP's General Name doesn't match any name in the IDP, then this CRL also needs to be indexed by URL.

d) In this patch, invalid DER CRLs are also cached under the "assumed" subject name and DPname from the cert. This will also cause revocation check failures
for those "assumed" issuers and DPs until such time that those CRLs are removed from the cache. I think we should avoid those failures.
Those bogus DER CRLs could be added to the URL cache, purely for the purpose of avoiding extra downloads.

e) As a MUCH simpler alternative, we can solve all the complex problems in a, b, c, and d by having the download engine index all downloaded objects by their URLs.
The download-specific variables like timeouts for refetch can be stored in that cache. It doesn't matter if the objects are good or bad DER CRLs,
what their subject, or dpname is, if any. The download engine doesn't need to know that.
The objects can just be fed into the CRL cache with CERT_CacheCRL, there is no need for CERT_CacheCRLWithFlags and its extra arguments at all.
The download engine can manage the objects' expiration itself, and if CERT_CacheCRL succeeded, it can call CERT_UncacheCRL when the object has expired and there is a new one.
If CERT_CacheCRL failed, it can even throw the downloaded contet away and just keep the download date to know when to try again.
If you do that, and only want to support cert's CDPs with full CRLs (CRLs without IDP), you actually don't need a single code change to crl.c.

That work is a single top-level hash table of URLs to a simple structure of SECItem + timeouts, which can hardly be a lot of code, and doing that obviates the need for this entire patch, IMO.
To fully support IDP, we can reuse some code from this patch, but much is still missing.
Attachment #369180 - Flags: review?(julien.pierre.boogz) → review-

Updated

10 years ago
Hardware: Other → All
Comment on attachment 369443 [details] [diff] [review]
Patch v3 part 1 - implement crl fetching using cert CRL DP. Addressed review comments.

Since this patch is a superset of another patch which has gotten r- 
from Julien, this patch must also get r-  :(
Attachment #369443 - Flags: review?(nelson) → review-

Comment 64

10 years ago
Whilre taking another look at 369180, I noticed another problem I had missed. In DPCache_GetUpToDate, the "purging" of the old CRL happens when the CRL cache is acquired. But it can not call back into the download engine. So, I think at least one revocation check will be done without any CRL, until the next download. That's a security hole. Somebody could either get in without a revocation check, or could get denied access if the NIST policy is in place (must have a CRL - and it's missing).
At that place in the code, the CRL cache cannot call back into the download engine to get the latest CRL, like it can go to tokens. So, that is the wrong place to do the purge. The purge needs to be done in an atomic way. Ideally : insert the newly downloaded CRL first, then delete the old CRL. That is best done at the highest level, ie. the download engine, rather than in the middle of the CRL cache, which cannot do an atomatic update.
With token CRLs the update is atomic, because when the CRL cache does the fetch, it holds a lock and it can insert the newly fetched token CRL while the cache lock is still held, which means deleting the old token CRLs from the cache does not have the nefarious effects mentioned above of letting someone in without a revocation check at update time.

Comment 65

10 years ago
This patch is a work in progress. I am attaching it because it contains the API for the CRL cache and should allow Alexei to write the corresponding changes in libpkix to use it.

It includes one necessary change to compile in libpkix since I modified one of the existing private CRL cache APIs.

Another necessary change for proper runtime is in bug 487736 .

The remaining items to finish this work are :

1) complete testing of all.sh . The only remaining crashes I have seen were related to bug 487736, but I haven't finished running everything again with the fix to verify that all is green.
2) test cert_CacheCRLByGeneralName . I haven't done any testing on this new code  yet. libpkix is planned to be the only consumer at this time, but I will test it with a hacked crlutil . I may do some refactoring in that function as I test it.
3) document the memory management for cert_CacheCRLByGeneralName
4) reviews, obviously. Feel free to comment on the patch. The official review request should be sometime tomorrow after I complete items 1-3.

Comment 66

10 years ago
I forgot to note that I found some issues with the way the NIST CRL policy was implemented while writing this. Specifically, DPCache_Lookup would return a NULL entry for a serial number if either it didn't find an entry in the cached CRL, or if the CRL cache was empty - ie. if there was no CRL in it. Given this, there was no way for the caller to differentiate the two cases. The caller would have to poke into the CRLDPCache structure to figure it out. I didn't check libpkix to see if it does. In this patch, I changed DPCache_Lookup to return a new argument, PRBool* emptyCache, which will help the caller if NIST policy is enabled.
I would like to know if we have any test cases for the NIST CRL policy case.

Comment 67

10 years ago
I have completed my testing with this patch. It now passes all.sh .
I have also tested, fixed and documented the cache function cert_CacheCRLByGeneralName .

Note that it is supposed to take a canonicalized version of a general name as an index, but if such a version is not available, the DER encoding of the whole name will do in the meantime, as long as all the callers of this function agree that's what they use. Since only libpkix will use it at the moment, it should be OK.

I think the problem of name matching is something that will be relevant not just for CRLs but for certs as well, and we will need to implement some form of GeneralName canonicalization to meet the requirements of RFC 5280.
Attachment #372183 - Flags: review?(alexei.volkov.bugs)
Assignee

Updated

10 years ago
Attachment #372183 - Flags: review?(nelson)
Assignee

Comment 68

10 years ago
Comment on attachment 372183 [details] [diff] [review]
Julien's CRL cache patch (Tested)

Asking nelson for additional review.
Comment on attachment 372183 [details] [diff] [review]
Julien's CRL cache patch (Tested)

Wait, doesn't this patch make a binary-incompatible change to the signature of a public function?

Comment 70

10 years ago
Nelson,

Re: comment 69, which function would that be ?
CERT_CheckCRL

Comment 72

10 years ago
That is not a public function.
> That is not a public function.
OK, then it's declared in the wrong header file.  
Please move the declaration to a private header file.

Comment 74

10 years ago
That function is a left-over from NSS 2.8, and never got exported in 3.2 . Moving it to another header file should be done, but separately from this bug.

Comment 75

10 years ago
We cleared up that there is no incompatible change to CERT_CheckCRL . Only an argument was changed from non-const to const. And it is private. When we do move it to a private header, it should be renamed to a lower case prefix (cert_).

Here is a summary of the changes implemented in attachment 372183 [details] [diff] [review] :

- A new cache is created that indexes CRLs by SECItem. The index is meant to be the canonicalized encoding of a general name . This cache is called the "named CRL cache"

- Private APIs are added to acquire, release, query and insert DER CRLs into the cache. For now, this is for exclusive consumption by the CRL download engine in libpkix. This named cache will allow this engine to always know if a CRL has already been downloaded at a given URL (one form of GeneralName).

- the "insert" function, cert_CacheCRLByGeneralName, has the property that it "swallows" the CRL. The caller completely relinquishes ownership of the CRL memory to this function. If the DER decodes, the CRL will be added to the CRL cache. Any decodable CRL that comes in with the same name will replace the old one . The CRL cache is updated atomically.

- a new function, cert_CheckRevocationStatus, expands on CERT_CheckCRL by adding the ability to check if the revocation status was "unknown", for example in the case of a missing CRL. This is required for proper NIST policy implementation.
Attachment #369444 - Flags: review?(nelson)
Comment on attachment 372183 [details] [diff] [review]
Julien's CRL cache patch (Tested)

This is a partial review.  I will continue to review this patch.
I don't expect a new patch yet.  Most, perhaps all, of these issues 
can be addressed by adding more comments (or rewording the existing 
ones).  If all these can be resolved by just adding comments, then 
these won't necessarily result in r-.  But if resolving these requires
changing more than mere comments, then r- is more likely.

My review comments below apply to the changes to one of the header files.

> /* check if a particular SN is in the CRL cache and return its entry */
> SECStatus DPCache_Lookup(CRLDPCache* cache, SECItem* sn,
>-                         CERTCrlEntry** returned);
>+                         CERTCrlEntry** returned, PRBool* emptyCache);

The comment should explain the emptyCache parameter.  I gather it is an
output parameter, but is it also an input parameter?  What does it mean??


>+ * canonicalizedName represents the source of the CRL (a GeneralName). This

A DER encoded GeneralName?  

>+ * should be canonicalized by the caller, 

Explain/define "canonicalized".  Is this completely free form?  Is the caller
free to define the contents entirely as he wishes, because the cache will make
no intepretation of the comments at all, treating it purely as a "blob"? 
Or is there some form that the input must take? and if so, what is it?

>+ *                                        since this function will not attempt
>+ * to do any name matching.

I gather you mean "will do only the simplest form of name matching, doing a straight binary comparison of the entire field value for simple equality or inequality".  That's not the same as "not attempt to do any name matching".


>+struct NamedCRLCacheStr {
>+    PRLock* lock;
>+    PLHashTable* entries;
>+};

Should that be a public structure? Should the struct members be visible 
to the API's callers?  or should it be private to the CRL cache code only?
Should there be any arenapool associated with the structure above or below?

>+struct NamedCRLCacheEntryStr {
>+    SECItem* crl;                   /* DER, kept only if CRL
>+                                         * is successfully cached */
>+    PRBool inCRLCache;
>+    PRTime successfulInsertionTime; /* insertion time */
>+    PRTime lastAttemptTime;         /* time of last call to
>+                              cert_CacheCRLByGeneralName with this name */
>+    PRBool badDER;      /* ASN.1 error */
>+    PRBool dupe;        /* matching DER CRL already in CRL cache */
>+    PRBool unsupported; /* IDP, delta, any other reason */
>+};

I gather that some of the above fields are effectively outputs, set by some 
function.  Which function?

>+typedef enum {
>+    certRevocationStatusRevoked = 0,
>+    certRevocationStatusValid = 1,
>+    certRevocationStatusUnknown = 2,
>+} CERTRevocationStatus;
>+
>+/* Returns detailed status of the cert(revStatus variable). Tells if
>+ * issuer cache has OriginFetchedWithTimeout crl in it. */
>+SECStatus
>+cert_CheckCertRevocationStatus(CERTCertificate* cert, CERTCertificate* issuer,
>+                               const SECItem* dp, PRTime t, void *wincx,
>+                               CERTRevocationStatus *revStatus,
>+                               CERTCRLEntryReasonCode *revReason);
>+
>+
>+SECStatus cert_AcquireNamedCRLCache();

Does this function take no arguments?  If so, it should be declared (void).
Or are you trying to do a non-ANSI declaration here, one that says "this 
is a function whose arguments are unspecified." ?

Since the above function seems to take no arguments and return no handles,
I gather that it basically acquires a global lock on some entire cache.
If so, I'd prefer that you call it "locked" rather than "acquired".  

>+/* This must be called only while cache is acquired, and the entry is only
>+ * valid until cache is released.

It wasn't clear to me if the "This" above referred to the function declared
just above it or just below it.  

While what cache is acquired?  The entire CRL cache? A cached entry?
Does one of the arguments point to the acquired thing? 
Or is that acquired thing a global or thread-local variable? 

>+ */
>+SECStatus cert_FindCRLByGeneralName(const SECItem* canonicalizedName,
>+                               NamedCRLCacheEntry** retEntry);
>+
>+SECStatus cert_ReleaseNamedCRLCache();

Does this function have any arguments?  

NSS is frequently criticized for its many "singletons", implied global 
states for which there is no explicit handle.  I think that in most cases,
this is a valid criticism, and since we're aware of it, we should not design
new APIs that only work with singletons.  But If you tell me that this is purely a short-term expedient for Alexei's CRLDP work, that's probably a good excuse.  :)

Comment 77

10 years ago
Nelson,

Re: comment 76,

a) In DPCache_Lookup, emptyCache is an output parameter only. It means that the CRL cache for that Issuer/DP is empty, ie. there is no CRL in it. This is not fatal for the legacy code, but is fatal with libpkix if NIST policy is set. I added the argument specifically for the NIST policy while reviewing the code. It's not specifically related to CRL DP, but it is required by libpkix to properly determined the "unknown" revocation status.

b) For now, yes, it will have to be a DER-encoded GeneralName, until we have a proper canonicalization function. I will file an RFE.

Ideally, it's not meant to be completely free-form, and I will specify the format when the function is available.

But yes, the function will work with any format as long as the callers are consistent about it.

c) Yes, it does binary name matching of course, it's not totally random. It doesn't go beyond that.

d) NamedCRLCacheStr is a private structure, and doesn't need to be visible to the API callers. There is no need for an arena pool because everything inserted into the table - CRLs - is previously allocated by the caller. We don't want to copy them into an arena pool because that would increase the working set of the application unnecessarily.

e) re. NamedCRLCacheEntryStr, this structure is filled in by cert_CacheCRLByGeneralName, and read by cert_FindCRLByGeneralName

f) re. cert_acquireNamedCRLCache and cert_ReleaseNamedCRLCache, they take no arguments and should be declared as taking void. "acquire" vs "locked" is more of a stylistic matter, and the former is already used in the CRL cache.

g) "This" was referring to the function just below, cert_FindCRLByGeneralName. I think all our comments are that way as a rule in header files.

cert_FindCRLByGeneralName returns an entry in the hash table. The purpose of this function is mainly for the caller to look at time stamps in the structure and determine whether another download is needed. It should be held a very short time.

h) singleton issue

It was not an expedient thing.

The table really does have to be locked because PLHashTable is not thread-safe.

The reason cert_FindCRLByGeneralName itself does not lock is that the caller may want to do several consecutive lookups in the table, if the cert's CDP contains multiple GeneralNames (URLs), and the hash table only needs to be locked/unlocked once.

cert_CacheCRLByGeneralName is the one doing the update to the table. There may be an opportunity to lock for shorter periods of time within that function as an optimization, but that will also increase the number of times the table will have to be locked/unlocked, and extra lookups will be necessary.
Attachment #372183 - Attachment description: Tested patch → Julien's CRL cache patch (Tested)
Assignee

Comment 78

10 years ago
Comment on attachment 372183 [details] [diff] [review]
Julien's CRL cache patch (Tested)

Where are a few things found in the patch:
1. some compiler warning
2. crl.c:2724 - time variable "t" should be asserted before making time validity check on the issuer cert. 
3. crl.c:2743 - remove the assertion. The look up function can return SECFailure because of the signature on the crl was not checked and the time t is set to 0. Julien suggested to have an argument that will tell if the cache has invalid state. It is sufficient, in combination with time value, to judge if status of the certificate should be changed from revoked, due to invalid cache, to unknown.    

t == 0 is an indication that revocation test is done during chain building phase. During this phase, cert and crl signature checks are not allowed.
Attachment #372183 - Flags: review?(alexei.volkov.bugs) → review-
Attachment #372183 - Flags: review?(nelson) → review-
Comment on attachment 372183 [details] [diff] [review]
Julien's CRL cache patch (Tested)

In reply to comment 77, and in follow up to my earlier comments on the 
header files, here are more comments.  Please read this list all the 
way to the bottom before acting on it.

a) thanks for this info.  Please add it as a comment to the header file.

b+c) the comment needs to clearly say tha the cache imposes no format 
constraints on the GeneralizedNames, they are just binary blobs, but 
that all users of the API must agree to a common definition/syntax.
Then it should recommend that a canonicalized name format be used to 
facilitate X.500 name matching rules.  

d) I wish we had separate header files for declarations that are needed 
by users of the CRL API, and declarations that are truly private to the
CRL code itself.  Since we don't, I won't require it here, but it would
be a very good idea.

e) thanks. Please make sure the comment makes that clear.

f+h) This is my big issue with the whole patch.  I'll speak to it below.

g) Please don't use pronouns in the header file descriptions.  Use the
name of the function or structure every time.  Yes, it's more verbose.
It's also MUCH clearer.

g (again)) please add this comment as a block comment describing the role
of cert_FindCRLByGeneralName with regards to 

OK, now I've reviewed the patch to crl.c.  I believe the .c code implements 
the API defined in the .f file reasonably well, but I am not happy with that
API because of the singletons.  Here are some suggested things to do about 
it.

1) If you "acquire" something, you should explicitly get the thing you 
acquired.  The cert_AcquireNamedCRLCache() function should return a handle.
I suggest that it return a handle that is thread dependent, not a constant.
PR_GetThreadID(PR_GetCurrentThread()) is one value to use for this.
Another is PR_GetCurrentThread().  cert_AcquireNamedCRLCache should avoid
deadlocks due to re-entrant locks, either by asserting or by using a 
PRMonitor.  NSPR has new public macros that are handy for asserting that 
the calling thread holds a lock or monitor.

2) The other functions that depend on this cache lock must require that
handle as an argument.  They must assert that the handle is the handle of
the lock holder (there are NSPR macros for this).  

3) I really dislike "assert(0)" because in some cases, one CANNOT tell 
what is the problem to which it is objecting.  It's least objectionable
when it occurs right after an if test, e.g.
   if (condition) {
      assert(0)
but in other cases, such as in cert_CacheCRLByGeneralName at line 3144, 
it's not so clear.  Be very sure it's not asserting that the values that 
come from outside sources (e.g. inside of CRLs) are good values.

Now, given that all of this API in internal and private to NSS, if we're
really in a time crunch, such that we cannot fix this and still have 
Alexei check in by end of Friday, then I'm willing to give this r+. 
Otherwise, I'd like to see this fixed.  

Since Alexei already gave this patch r-, I feel safe giving it another. :)
You can see my "druthers" from the paragraphs above.  But again, how much 
of this needs to be fixed in the short term is a function of how much 
delay it will cause.
Assignee

Comment 80

10 years ago
Patch is tested, but I'll wait for Julien final patch to create final crldp patch.
Attachment #369180 - Attachment is obsolete: true
Attachment #369443 - Attachment is obsolete: true
Attachment #369444 - Attachment is obsolete: true

Comment 81

10 years ago
Alexei,

Unfortunately, my final patch will have some changes, including to the API. There are some issues that Nelson and I discussed over IM today, notably regarding the output of DPCache_Lookup. This is what I have coded and am testin right now :

+typedef enum {
+    dpcacheNoEntry = 0,             /* no entry found for this SN */
+    dpcacheFoundEntry = 1,          /* entry found for this SN */
+    dpcacheCallerError = 2,         /* invalid args */
+    dpcacheInvalidCacheError = 3,   /* CRL in cache may be bad DER */
+                                    /* or unverified */
+    dpcacheEmpty = 4,               /* no CRL in cache */
+    dpcacheLookupError = 5          /* internal error */
+} dpcacheStatus;
+
...

 /* check if a particular SN is in the CRL cache and return its entry */
-SECStatus DPCache_Lookup(CRLDPCache* cache, SECItem* sn,
-                         CERTCrlEntry** returned);
+dpcacheStatus DPCache_Lookup(CRLDPCache* cache, SECItem* sn,
+                             CERTCrlEntry** returned);

While this change won't affect the caller of cert_CheckRevocationStatus, since that interface won't change, it does affect some more code libpkix. 

It seems that pkix_pl_Pk11CertStore_CheckRevByCrl is essentially duplicating what CERT_CheckCRL was doing. I have made some changes to my tree to accomodate the new interface for DPCache_Lookup, but they may conflict with your patch. I'm wondering if that function could be changed to call cert_CheckRevocationStatus, to avoid the code duplication.

I still have other changes to write/test. I am about halfway done.
Assignee

Updated

10 years ago
Attachment #373269 - Attachment is patch: true
Attachment #373269 - Attachment mime type: application/octet-stream → text/plain

Comment 83

10 years ago
Comment on attachment 373392 [details] [diff] [review]
Implement some changes from Alexei, and some from Nelson (checked in)

Per our concall, I have committed this patch to the trunk, so that Alexei can do his integration on top of it.

This patch was tested with all.sh on Solaris x86 and passed.

Checking in lib/certdb/cert.h;
/cvsroot/mozilla/security/nss/lib/certdb/cert.h,v  <--  cert.h
new revision: 1.77; previous revision: 1.76
done
Checking in lib/certdb/certi.h;
/cvsroot/mozilla/security/nss/lib/certdb/certi.h,v  <--  certi.h
new revision: 1.29; previous revision: 1.28
done
Checking in lib/certdb/crl.c;
/cvsroot/mozilla/security/nss/lib/certdb/crl.c,v  <--  crl.c
new revision: 1.63; previous revision: 1.62
done
Checking in lib/libpkix/pkix_pl_nss/module/pkix_pl_pk11certstore.c;
/cvsroot/mozilla/security/nss/lib/libpkix/pkix_pl_nss/module/pkix_pl_pk11certstore.c,v  <--  pkix_pl_pk11certstore.c
new revision: 1.16; previous revision: 1.15
done
Attachment #373392 - Attachment description: Implement some changes from Alexei, and some from Nelson → Implement some changes from Alexei, and some from Nelson (checked in)

Comment 84

10 years ago
There are a couple of issues in cert_CacheCRLByGeneralName that my limited testing with it did not find, but which Alexei's did. The problems are :

1) the key value (canonicalizedName) is not duplicated before the entry is added to hash table. So the hash table points to memory that ends up going away. The simplest fix is to make a copy of the name into NamedCRLCacheEntryStr, and pass that pointer to PL_HashTableAdd .

2) when adding an entry with PL_HashTableAdd, if a previous entry exists, NSPR destroys the old entry first before adding the new one. The freeEntry function is invoked, which in this case causes the name to be destroyed. However, the hash table ends up keeping a pointer to the old key. So, the fix described in 1) does not really work unless one calls PL_HashTableRemove first to remove the old entry, to make sure NSPR no longer has a pointer to the old key, before calling PL_HashTableAdd with the new entry and the new key pointer. Alternately, allocOps could be used, but that is much more code.

3) There was a missing call to CERT_UncacheCRL when a good CRL entry got replaced by a new one .

I have a patch to fix these issues, but no simple way to test it unfortunately. My time might best be spent writing a good test case now.

Comment 85

10 years ago
I believe this patch fixes all the remaining issues I'm aware of, reported by both Alexei and Nelson, except for the use of NSPR lock macros which are only available in NSPR 4.8, which hasn't RTM'ed yet.

I'm running all.sh on it now. I haven't tested the changes to cert_FindCRLByGeneralName yet.
Assignee

Comment 86

10 years ago
Review comments + corrected for the last crl.c patch 373430
Attachment #373269 - Attachment is obsolete: true
Attachment #373437 - Flags: review?(nelson)

Comment 87

10 years ago
This patch includes a one-line correction that Alexei was talking about in comment 86. It has passed all.sh .

Also noteworthy is that all but two tinderboxes have run with attachment 373392 [details] [diff] [review], and are green. One is still pending (ciotat 64 DBG), and the other, goride 32 OPT, ran into an unrelated failure, bug 485145. So, I think it's likely that the patch did not introduce regressions.
Attachment #373430 - Attachment is obsolete: true
Assignee

Comment 88

10 years ago
As checked in. Changes according late night review( Thank you. Nelson) + latest Julien changes to the name cache. Required additional testing.
Attachment #373437 - Attachment is obsolete: true
Attachment #373437 - Flags: review?(nelson)
Assignee

Comment 89

10 years ago
(In reply to comment #88)
> Required additional testing.
It passed all.sh on the local machine.
I used globalsign certs for testing. Please list any websites that you think may help in CRL DP testing. Thx.
Comment on attachment 369443 [details] [diff] [review]
Patch v3 part 1 - implement crl fetching using cert CRL DP. Addressed review comments.

The later patches, after this one, did not include two files that
were found in this patch.  So, the revisions to those files that
were made after this patch was reviewed (r-) have not been reviewed.
Nonetheless, I have found it necessary to commit them to get the 
tree to stop burning.

libpkix/pkix_pl_nss/pki/pkix_pl_crldp.c; initial revision: 1.1
libpkix/pkix_pl_nss/pki/pkix_pl_crldp.h; initial revision: 1.1

Comment 91

10 years ago
We have had many tests failing last night in tinderbox, particularly on ciotat, but also on attic. I looked at many logs, and they are failures during SSL stress tests. The server cert is considered revoked when it shouldn't be. This is probably a race condition.

There was also a failure in chains.sh on nssmac2 related to an invalid HTTP response from server. I'm not sure if that's a bug or an environmental problem with the HTTP server.
Assignee

Comment 92

10 years ago
I've to find the problem today. Problem is in cert_CheckCertRevocationStatus function as it returns certRevocationStatusRevoked status, which is default status that this function returns in case of error. Expected status is certRevocationStatusValid: the cert is not in the list of certs revoked by crl.

The problem shows up when libpkix validates the whole chain - date argument is set(not NULL). CRL signature is validated before cert status is returned.

Here is the stack:
=>[1] pkix_pl_Pk11CertStore_CheckRevByCrl(store = 0x2b6880, pkixCert = 0x2b71d8, pkixIssuer = 0x2bb488, date = 0x259ea0, crlDownloadDone = 0, pReasonCode = 0xfe2f80f4, pStatus = 0xfe2f80e4, plContext = 0x2539f8), line 547 in "pkix_pl_pk11certstore.c"
  [2] pkix_CrlChecker_CheckLocal(cert = 0x2b71d8, issuer = 0x2bb488, date = 0x259ea0, checkerObject = 0x15fbd8, procParams = 0x220958, methodFlags = 35U, chainVerificationState = 1, pRevStatus = 0xfe2f817c, pReasonCode = 0x2b87d8, plContext = 0x2539f8), line 279 in "pkix_crlchecker.c"
  [3] PKIX_RevocationChecker_Check(cert = 0x2b71d8, issuer = 0x2bb488, revChecker = 0x2b6930, procParams = 0x220958, chainVerificationState = 1, testingLeafCert = 1, pRevStatus = 0xfe2f825c, pReasonCode = 0x2b87d8, pNbioContext = 0xfe2f8294, plContext = 0x2539f8), line 397 in "pkix_revocationchecker.c"
  [4] pkix_CheckChain(certs = 0x15f3a8, numCerts = 1U, anchor = 0x189178, checkers = 0x2af0e8, revChecker = 0x2b6930, removeCheckedExtOIDs = 0x16a920, procParams = 0x220958, pCertCheckedIndex = 0x2b87c4, pCheckerIndex = 0x2b87c8, pRevChecking = 0x2b87e8, pReasonCode = 0x2b87d8, pNBIOContext = 0xfe2f8378, pFinalSubjPubKey = 0xfe2f8384, pPolicyTree = 0xfe2f8380, pVerifyTree = (nil), plContext = 0x2539f8), line 797 in "pkix_validate.c"
 ....

Comment 93

10 years ago
Thanks, Alexei.

Try setting breakpoints or asserts in CachedCrl_Verify and see if it ever fails during the SSL stress tests.
Assignee

Comment 94

10 years ago
So the race is in the part of the code that deals with validation of the cache when signature on the crl is checked for the first time. More to come...
Alexei,  I wish you were on AIM. :(
If this is a race, who are the racers?  Are there multiple threads involved?
Assignee

Comment 96

10 years ago
Set a break point at cert_CheckCertRevocationStatus right after call to AcquireDPCache. Time argument is not NULL. Data printouts below show, that
cache is in invalid state even though crl signature was validated.  


(dbx) list
 2748       ds = DPCache_Lookup(dpcache, &cert->serialNumber, &entry);
 2749       switch (ds)
 2750       {
 2751           case dpcacheFoundEntry:
 2752               PORT_Assert(entry);
 2753               /* check the time if we have one */
 2754               if (entry->revocationDate.data && entry->revocationDate.len)
 2755               {
 2756                   PRTime revocationDate = 0;
 2757                   if (SECSuccess == DER_DecodeTimeChoice(&revocationDate,
(dbx) p *dpcache        
*dpcache = {
    lock              = 0x29bf30
    issuer            = 0x2b7c40
    subject           = 0x20bfb0
    distributionPoint = (nil)
    ncrls             = 1U
    crls              = 0x49570
    selected          = (nil)
    invalid           = 1U
    refresh           = 0
    mustchoose        = 0
    lastfetch         = 1240178223832544LL
    lastcheck         = 0
}
(dbx) p *dpcache->crls[0]
*dpcache->crls[0] = {
    crl         = 0x2b5950
    origin      = CRL_OriginToken
    entries     = (nil)
    prebuffer   = (nil)
    sigChecked  = 1
    sigValid    = 1
    unbuildable = 0
}
(dbx) p pt               
dbx: "pt" is not defined in the scope `libnss3.so`crl.c`cert_CheckCertRevocationStatus`
dbx: see `help scope' for details
(dbx) p t 
t = 1240178223849531LL
Here's my proposed solution for the bugs that Alexei and I found this 
afternoon.
Attachment #373593 - Flags: review?(alexei.volkov.bugs)
Assignee

Updated

10 years ago
Attachment #373593 - Flags: review?(alexei.volkov.bugs) → review+
Assignee

Comment 98

10 years ago
Comment on attachment 373593 [details] [diff] [review]
proposed patch for bugs in crl.c Alexei found this afternoon. (checked in)

r=alexei
Attachment #373593 - Attachment description: proposed patch for bugs in crl.c Alexei found this afternoon. → proposed patch for bugs in crl.c Alexei found this afternoon. (checked in)

Comment 99

10 years ago
I have been running some QA overnight in a loop at home and am seeing the following test failures. I don't know if they are the same as the ones in tinderbox yet.

#2142: Bridge: Verifying certificate(s) UserBridge.der BridgeNavy.der Navy.der with flags -d ArmyOnlyDB

#3929: Stress TLS ECDH-ECDSA AES 128 CBC with SHA (no reuse, client auth) 	Failed

#4414: AnyPolicyWithLevel: Verifying certificate(s) EE2CA23.der RootCA.der CA1RootCA.der CA22CA1.der CA23CA22.der with flags -d AllDB -o OID.2.0 -t RootCA.der

Updated

10 years ago
Attachment #373443 - Attachment description: Corrected patch → Corrected patch. Subset of attachment 373447, which is checked in.

Comment 100

10 years ago
A change in attachment 373593 [details] [diff] [review] was made to addCRLToCache to avoid a crash when entry was uninitialized and NamedCRLCacheEntry_Create failed. But the fix also introduced a leak, by not freeing crl . We only run into this code path in out-of-memory conditions.
Attachment #373766 - Flags: review?(nelson)
Rob & Robin,  
It would help the testing of CRLDP a lot if you could provide a list of 
numerous (e.g. 20) web sites with Comodo EV certificates which bear CRLDP
extensions, but not AIA OCSP extensions.  Feel free to send them to me by
email, if you'd rather not post them here.
Comment on attachment 373766 [details] [diff] [review]
Fix leak (checked, in)

thanks, Julien.
Attachment #373766 - Flags: review?(nelson) → review+

Comment 103

10 years ago
Comment on attachment 373766 [details] [diff] [review]
Fix leak (checked, in)

Thanks, Nelson.

Checking in crl.c;
/cvsroot/mozilla/security/nss/lib/certdb/crl.c,v  <--  crl.c
new revision: 1.66; previous revision: 1.65
done
Attachment #373766 - Attachment description: Fix leak → Fix leak (checked, in)
Is this resolved yet? Will it ever make 3.5? Some sites still don't get EV even though they did previously in 3.0.x, not sure if this is the right place for this though...
(In reply to comment #104)
> Is this resolved yet? Will it ever make 3.5? Some sites still don't get EV even
> though they did previously in 3.0.x, not sure if this is the right place for
> this though...

The NSS team have developed and landed support for CRLDP in their own repository, as part of the 3_12_4 version. This is their bug, so presumably they'll close it when it is resolved to their satisfaction, but to be clear, that won't necessarily mean any change in Firefox.

We don't take NSS directly, instead, we take specific tags of the NSS repository. We haven't yet taken an NSS version with this support on the 1.9.1 branch because it's the kind of thing that needs a significant amount of baking.  We will consider it in a point release though, if we can be confident the risk is low (since the reward is significant). mozilla-central currently *is* using a 3_12_4 version of NSS, though, so I would expect it to have this support (unless we're using an older 3_12_4 tag, in which case that's something we can update).
Johnathan, 
How (by what criteria) will we know when CRLDP has "baked" enough?
Has acceptance of the CRLDP work been stalled because this bug was not 
marked resolved? :(

The CRLDP work has been "baking" since 2009-04-20, about 50 days now.
On 2009-05-21, after it had been baking about 30 days, we agreed that it was
premature to include CRLDP, because the secondary effects (page load delays,
additional network load) were not (IMO) sufficiently known or understood.
However, on that date, I understood that the FF 3.5 release was *imminent*.
Now, 3 weeks later, with FF 3.5 still not released (and no longer seeming 
like it's imminent), it seems we've lost 3 weeks of valuable time to be 
testing CRLDP in a 3.5 RC.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: 3.12.5 → 3.12.4
(In reply to comment #106)
> Johnathan, 
> How (by what criteria) will we know when CRLDP has "baked" enough?
> Has acceptance of the CRLDP work been stalled because this bug was not 
> marked resolved? :(
> 
> The CRLDP work has been "baking" since 2009-04-20, about 50 days now.
> On 2009-05-21, after it had been baking about 30 days, we agreed that it was
> premature to include CRLDP, because the secondary effects (page load delays,
> additional network load) were not (IMO) sufficiently known or understood.
> However, on that date, I understood that the FF 3.5 release was *imminent*.
> Now, 3 weeks later, with FF 3.5 still not released (and no longer seeming 
> like it's imminent), it seems we've lost 3 weeks of valuable time to be 
> testing CRLDP in a 3.5 RC.

The release of 3.5 is (I guess by definition) more imminent than ever - we are completely locked down and waiting on resolution of some JS blockers before spinning a release candidate.

Baking is principally a question of understanding and mitigating risk, not calendar time. A CSS change may need almost no baking, since the risk is well understood and easily assessed (does the change look right on every platform we support?)

An NSS change is harder for us to understand the risk on through direct inspection, so we rely on test coverage and exposure to a significant chunk of users for a significant period of time, to see what bugs or crashes shake out.

On trunk right now, which is using a 3_12_4-based tag, if I look at recent crash reports sorted by frequency:

http://crash-stats.mozilla.com/query/query?do_query=1&product=Firefox&version=Firefox%3A3.6a1pre&date=&range_value=4&range_unit=weeks&query_search=signature

I see 5 NSS-looking crashes in the top 50:

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6a1pre&query_search=signature&query_type=exact&query=&date=&range_value=4&range_unit=weeks&do_query=1&signature=NSSRWLock_LockRead_Util

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6a1pre&query_search=signature&query_type=exact&query=&date=&range_value=4&range_unit=weeks&do_query=1&signature=libnssutil3.dylib%400xa381

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6a1pre&query_search=signature&query_type=exact&query=&date=&range_value=4&range_unit=weeks&do_query=1&signature=nssutil3.dll%400x34c0

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6a1pre&query_search=signature&query_type=exact&query=&date=&range_value=4&range_unit=weeks&do_query=1&signature=libnssutil3.so%400x79da

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6a1pre&query_search=signature&query_type=exact&query=&date=&range_value=4&range_unit=weeks&do_query=1&signature=libnssutil3.so%400xc6ed

Some of those may be variants on the same thing, some of them may be unrelated to changes in 3_12_4, but knowing how much risk they represent is what we mean by baking.

When we can say "This has been on central for some time, the crashers have all been remedied or at least well understood, and here is the test coverage that confirms our logic" we should file a bug to get this landed on 1.9.1.x (and possibly 1.9.0.x?)

Comment 108

10 years ago
Jonathan,

Are bugs filed on any of these top crashers ?
I looked at a few of the stacks, and I am not at all convinced that the crashes have anything to do with NSS code changes made between 3.12.3 and 3.12.4 .
The problem once again is that most of your stacks don't contain proper debug info for NSPR/NSS symbols. That makes it nearly impossible to diagnose the last 4 crashers.

For some reason, the first crasher does have the necessary debug info. And from it, I can tell you that it's not an NSS bug. It's a PSM bug where it calls an NSS function even though NSS is not initialized (or it has already been shut down).
(In reply to comment #108)
> Jonathan,
> 
> Are bugs filed on any of these top crashers ?

I don't know - I haven't filed them personally. My hope was to answer Nelson's question about what constitutes "baking" for the purposes of shipping the code to 300m users.  I don't claim to know which (if any!) of them represent "real" fallout from this code, but they are examples of the questions to which we'd need answers.

> I looked at a few of the stacks, and I am not at all convinced that the crashes
> have anything to do with NSS code changes made between 3.12.3 and 3.12.4 .
> The problem once again is that most of your stacks don't contain proper debug
> info for NSPR/NSS symbols. That makes it nearly impossible to diagnose the last
> 4 crashers.
> 
> For some reason, the first crasher does have the necessary debug info. And from
> it, I can tell you that it's not an NSS bug. It's a PSM bug where it calls an
> NSS function even though NSS is not initialized (or it has already been shut
> down).

That is good information to have, and getting definitive answers to all of these is how we build confidence in the quality of the code. If you want me to file bugs about investigating these 5, I could do that, but I would expect that someone from the NSS team would be interested in/already watching crash stats from Firefox, since it represents such a broadly deployed instance and is likely to be a good source of crash data?  I am willing to believe, though, that the crash-stats site was not sufficiently advertised before.

I think the build symbols issue should have been resolved with bug 458553, right?
Johnathan,
Thanks for your comment 107.  I had not seen that tool before, and it is 
useful to understand how Mozilla folks identify top issues.  

I believe none of the crashes you identified in comment 107 is new to NSS
since 3.12.3 (or whatever FF 3.5 is currently using).  I did find one other
crash that may or may not be new.  (We don't know because there were no 
symbols for NSS in Firefox builds for so long).  It is 
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6a1pre&query_search=signature&query_type=exact&query=&date=&range_value=4&range_unit=weeks&do_query=1&signature=PR_DestroyMonitor

However, that table shows that the Firefox "trunk" builds are still experiencing crashes that the NSS team believes we've fixed since 3.12.3 
was released.  And this makes me wonder just how current in the code in 
FF 3.5 and are there bug that are mistakenly believed to be fixed in FF 3.5
because they are fixed in NSS now, but whose fixes have not yet gotten into
FF 3.5?

Updated

10 years ago
Depends on: 504405

Updated

10 years ago
Depends on: 504408
You need to log in before you can comment on or make changes to this bug.