Closed Bug 418544 Opened 13 years ago Closed 12 years ago

Pathological OCSP inefficiency with libPKIX

Categories

(NSS :: Libraries, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED
3.12.2

People

(Reporter: nelson, Assigned: alvolkov.bgs)

Details

(Whiteboard: PKIX)

Attachments

(5 files)

It seems that libPKIX does a LOT more OCSP checking than one would expect.

While trying to debug bug 418367 in a debugger using the NSS test program
OCSPCLNT, I set a breakpoint in the (new) code that detects a zero-length signatures.  (See the patch attached to that bug for the two places to set
the breakpoint.)

When the environment variable NSS_ENABLE_PKIX_VERIFY is not set, that 
breakpoint gets hit exactly once, as expected.  When that environment 
variable is set to "1", that breakpoint is hit 7 times!
I expected it to be hit 2-3 times at most, not 7 times.  

I think there is some O(n^2) activity going on here. 
Is the OCSP cache working?  
Are there actually 7 OCSP requests being made? 
or is one response's signature being checked 7 times? or ?

Steps to reproduce:
Using a trunk build, setup a clean cert and key DB.  Put nssckbi into the
same directory as the DB files.  Using certutil, put two CA certs into 
the cert DB.  (I will attach those CA certs shortly.)  
Then use ocspclnt to do a verification on the first CA cert.

After downloading the two certs (named digicert.002 and digicert.003)
this shell command will install the certs into your DB
   for a in digi*; do certutil -d DB -A -n  $a -i $a -t ',,'; done

Then the test is conducted with this command:

    ocspclnt -d DB -V digicert.002 -u C -vvvv
Whiteboard: PKIX
This is a binary (DER) cert
This is also a binary (DER) cert
Now that the OCSP responder that was the subject of bug 418367 has been fixed,
the steps given above can no longer be used to reproduce this problem. :(
It may be necessary to resolve this WORKSFORME.
(In reply to comment #3)
> Now that the OCSP responder that was the subject of bug 418367 has been fixed,
> the steps given above can no longer be used to reproduce this problem. :(
> It may be necessary to resolve this WORKSFORME.


If you want to reproduce the original behavior, maybe one could explicitly patch NSS to set sig.len=0 / subjectCert->signatureWrap.signature.len=0 in the two places you propose to change in bug 418367.

(Patch NSS in the debugging environment only.)
Here's the original response to the first OCSP request.  
Don't know if this is very helpful.
This behavior is related to inefficiency of algorithm of the pkix_BuildForwardDepthFirstSearch function. Each cert from local cert store is validated at least three(!) times. 
(In reply to comment #6)
> This behavior is related to inefficiency of algorithm of the
> pkix_BuildForwardDepthFirstSearch function. Each cert from local cert store is
> validated at least three(!) times. 

That strikes me as pathological, and in need of a fix!  
Would you agree that it's at least p2 for 3.12?

Agreed. P2 for 3.12.
Priority: -- → P2
I think the chain cache was supposed to take care of the multiple verifications ... Is the whole verification being performed again, or just the revocation check ?
Target Milestone: 3.12 → 3.12.2
(In reply to comment #9)
> I think the chain cache was supposed to take care of the multiple verifications
> ... Is the whole verification being performed again, or just the revocation
> check ?
chain cache can not be a part of the solution as the problem is in algorithm that builds a chain. So the chain is not yet build and is not in the cache at this point.

Algorithm has bogus logic of querying local cert store multiple times instead of checking local cert store for a set of certs and later,  if none of the local certs fit,  try to fetch certs through AIA.

This behavior leads to same error to be repeated multiple times in the result, as well as multiple revocation checks on the cert.

In details, I've found that variable alreadyTriedAIA from PKIX_ForwardBuilderStateStruct does not carry any logical load. Proposing remove the variable and simplify the algorithm. I'll attach the patch shortly.
Assignee: nobody → alexei.volkov.bugs
Status: NEW → ASSIGNED
Attachment #327185 - Flags: review?(nelson)
One more reason for doing ocsp on the same cert multiple times: cert collected twice from the cert stores that have local flag set to true. Check pkix_build.c:2003
                if ((state->useOnlyLocal) == PKIX_FALSE) {
                    certStoreCanBeUsed = PKIX_TRUE;
                } else {
                    PKIX_CHECK(PKIX_CertStore_GetLocalFlag
                            (certStore, &certStoreCanBeUsed, plContext),
                            PKIX_CERTSTOREGETLOCALFLAGFAILED);
                }
If state->useOnlyLocal local cert stores will be asked for certs again. Should be fix to improve performance.
Alexei, I think your comment 10 is saying, in effect, that libPKIX does
revocation checks while it's trying to build the chain, and again when
the chain is built.  Is that correct?  

I'm wondering what we can do to ensure that we don't do more than one 
revocation check per cert per path validation.  After we've done one
revocation check on a cert, and know that it is not revoked, we certainly
should not check AGAIN for AT LEAST the duration of the time it takes to
complete the patch construction and validation.  

I can see that your patch may reduce the number of times we check a cert
by as much as half, but will it really reduce the number of times to one?
Comment on attachment 327185 [details] [diff] [review]
Patch v1 - remove alreadyTriedAIA variable (checked in)

I'm convinced this patch is low risk, but not entirely convinced that the code with this patch is correct. 
r=nelson
Attachment #327185 - Flags: review?(nelson) → review+
libpkix need checks non-local cert stores, if attempt to collect certs through AIA has failed. The current implementation has the flaw that makes libpkix collect certs from local stores for the second time.

The patch fixes this problem, and compiler warning (pkix_BuildForwardDepthFirstSearch call).
Attachment #328779 - Flags: review?(nelson)
Comment on attachment 328779 [details] [diff] [review]
Patch v2 - do not check local cert stores for the second time (checked in)

r+ with the following change:
> 
>+        /* make sure we have no certs in the list */
>+        PKIX_DECREF(state->candidateCerts);
>+        PKIX_CHECK(PKIX_List_Create(&state->candidateCerts, plContext),
>+                   PKIX_LISTCREATEFAILED);
>+

If the candidateCerts object (list?) is empty, then there is no need to 
destroy it and create a new one.  It can just be reused.
If the candidateCerts object is NOT empty, that is an error made by the
caller.  
I suggest an assertion here that the caller's list of candidateCerts is
empty, followed by an if test that only executes the above two new lines
if the list is non-empty.
Attachment #328779 - Flags: review?(nelson) → review+
Both patches have been integrated. But there is a big mess how candidateCerts list get handled. Will need another patch to clean the code. I also suspect, that the list will not be empty in some cases.  For this reason I've ifdefed the assertion.
Attachment #327185 - Attachment description: Patch v1 - remove alreadyTriedAIA variable → Patch v1 - remove alreadyTriedAIA variable (checked in)
Attachment #328779 - Attachment description: Patch v2 - do not check local cert stores for the second time → Patch v2 - do not check local cert stores for the second time (checked in)
Unsetting target milestone in unresolved bugs whose targets have passed.
Target Milestone: 3.12.2 → ---
Was fixed when patch 345779 for bug 205434 was integrated.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.12.2
OS: Windows XP → All
Hardware: x86 → All
You need to log in before you can comment on or make changes to this bug.