Closed
Bug 149854
Opened 23 years ago
Closed 23 years ago
CRL cache to avoid DER CRL decodes on every cert verification
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
3.6
People
(Reporter: julien.pierre, Assigned: julien.pierre)
References
Details
Attachments
(1 file)
45.45 KB,
patch
|
Details | Diff | Splinter Review |
When an application verifies a certificate, for example using CERT_VerifyCert
explicitly, or on a server during an SSL client auth handshake, NSS searches all
PKCS#11 tokens for a CRL matching the certificate's issuer. If if finds one, the
token (softoken) returns the matching DER CRL, which is then decoded, searched,
and immediately discarded.
Every time there is a new certification verification in the application, the
process repeats, and the DER CRL is decoded all over again, even if it hasn't
been modified.
This is an extremely wasteful use of CPU and memory. The repeated decode
operations are not necessary.
To give an example of just how bad the impact of this bug is in the real-world :
1) install a CRL with 1200000 entries into cert7.db
2) start NES web server with that DB and set it up for client auth
3) login to the site with your client cert
4) the server's CPU peaks for 3 minutes decoding it
5) go back to 3.
As you see, this limits the server - an ultra10 at 440 MHz - to a mere 20 client
auth handshakes per *hour*, at peak CPU usage.
If a CRL is not used, the same server will be able to do about 30000 handshakes
per hour, 3 orders of magnitude more.
While the first 3 minute decoding step is inevitable, the 19 subsequent ones can
be avoided. This would mean that the server with revocation would perform about
the same with CRLs as without, once it had got past the initial long CRL
decoding.
The solution to avoid repeated CRL decoding is to implement a cache of CRL
objects. The cache should keep track of both decoded and DER versions of CRL
objects by issuer. This cache should be above the pk11wrap layer and below the
cert verification layer. I will post more details about the design of this cache
as they become available.
![]() |
Assignee | |
Comment 1•23 years ago
|
||
Taking bug.
Severity: normal → major
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → 3.6
![]() |
Assignee | |
Comment 3•23 years ago
|
||
I have created an implementation of the cache based on the design we discussed.
It isn't complete yet, but it is functional enough to allow it to be tested. The
two missing parts are :
- code to check that the PKCS#11 object has been deleted in the token. This
means currently the CRL is cached forever
- a hash table of serial numbers to entries. Currently the cache is doing linear
searches
Hopefully I can complete these parts tomorrow.
For now, I have been able to successfully test this patch with NES 6.1. It shows
that client cert validation is much faster, but there are many other parts in
the code path besides cert validation that are more expensive than the cache
lookup, so this turned out not to be a good test.
The benchmark I decided to use is a modified verification of certutil that does
multiple iterations and collects the time spent on doing verification the first
time and for subsequent operations.
Here are the results on a Solaris debug build.
First, when trying to verify a good certificate that is not on the CRL. This is
the worst case, but also the most common : the entire array is traversed linearly.
(strange)/export/home/jpierre/nss/36/mozilla/dist/SunOS5.8_DBG.OBJ/bin{85} ./good
Time for initial uncached verification : 0:58.177100
Time for 99 cached verifications : 0:52.425020
Average per cached verification : 0: 0.529540
certutil: certificate is valid
73.70u 1.70s 1:50.99 67.9%
As you can see, the initial uncached verification takes 58 seconds. This time is
spent looking up the 26 MB CRL, fully decoding it, then traversing the array of
1.2 million entries and not find a match. [Bob, this lookup + decode was only
taking 23 seconds two days ago on the debug build. The decode part is 11s. This
means the softoken lookup part went up from 12s to 47s over the last couple
days. I don't know why yet. This might be a problem in my environment. But this
decrease in performance in the uncached case doesn't concern the cache which is
the subject of this bug]
Then, the subsequent cached lookups take about a half second each. This time is
mostly spent traversing the entire array of entries.
With a revoked certificate, the situation is much better :
(strange)/export/home/jpierre/nss/36/mozilla/dist/SunOS5.8_DBG.OBJ/bin{84} ./revoked
Time for initial uncached verification : 0:55.290150
Time for 99 cached verifications : 0: 0.143320
Average per cached verification : 0: 0. 1440
certutil: certificate is invalid: Peer's Certificate has been revoked.
This is most likely because the serial number is listed early in the CRL array
and therefore the linear search matches quickly and stops.
In summary, the cache improves performance by a factor of 40000 on the revoked
certificate, and about 100 on the good certificate. Keep in mind this was in the
debug build - I expect the improvement to be less in the optimized build. But I
still have some bugs in the cache and the optimized build dumps core right now,
so I couldn't measure performance with it.
By implementing the hash table of serial numbers to entries, both cases of
revoked and valid certificates would end up taking the same short amount of time
to look up in the CRL. Therefore, the improvement would be the maximum in both
cases.
It is also important to note that in the cache, the CRL is protected by a lock,
unlike in the past where each thread had its own full separate copy of it. This
means that if we keep it as an array, and we end up having long linear searches
(most users on a server are not revoked, so this is the common case), then this
means all certificate verifications on the same issuer in other threads are
effectively serialized. With a lookup time of 0.5 s to traverse the full CRL,
this can have a very bad effect in a multiprocessor server.
Of course, this is probably a moot point compared to the current horrid
performance of CRLs without the cache ... But since we are trying to solve the
problem, I think we should solve it completely, and not just half of it.
![]() |
||
Comment 4•23 years ago
|
||
Sounds like the CRL cache entry wants to have a reader/writer lock
rather than an ordinary lock. There is an implementation of reader/writer
locks in lib/util/nssrwlk*.[ch] that may be suitable.
![]() |
Assignee | |
Comment 5•23 years ago
|
||
Yes, the reader/writer lock would be useful. I will look at using it.
FYI, about the 58s time last night for the original lookup - it was probably
something with the state of the disk cache on my Solaris box, because today I am
running the same exact test script with identical binaries and the inital
uncached verification time is back down to 22s.
![]() |
Assignee | |
Comment 6•23 years ago
|
||
It turns out the crashes I was seeing last week were unrelated to my cache code.
It was working fine.
The R/W lock implementation that Nelson suggested indeed turned out to be quite
suitable.
The test below validates a good certificate, which means it has to traverse the
whole array, since I don't have the hash table implemented yet. It was performed
on a 2-way machine. The 200 validations are done in two threads, 100 in each
thread. In the first test below, I use PRLock, which serializes access to the CRL.
(nes02)/h/strange/export/home/jpierre/nss/36/mozilla/dist/SunOS5.8_OPT.OBJ/bin{44}
./good
Time for initial uncached verification : 0:11.898470
Time for 200 cached verifications : 1: 7. 87600
Average time per cached verification : 0: 0.335430
certutil: certificate is valid
73.51u 1.20s 1:19.63 93.8%
In the second test, I use NSSRWLock in read mode when reading, write mode when
updating. There were no writes during this test.
(nes02)/h/strange/export/home/jpierre/nss/36/mozilla/dist/SunOS5.8_OPT.OBJ/bin{47}
./good
Time for initial uncached verification : 0: 7.343350
Time for 200 cached verifications : 0:36. 10650
Average time per cached verification : 0: 0.180050
certutil: certificate is valid
76.77u 1.15s 0:43.85 177.6%
In the second test, both CPUs were peaked 100%. And as you can see, the time
went down from 69s to 36s, not quite half, but close.
![]() |
Assignee | |
Comment 7•23 years ago
|
||
Scratch that. I wasn't running the lock properly. I wasn't running with any lock
at all. Maybe it's time to head home.
![]() |
Assignee | |
Comment 8•23 years ago
|
||
OK, this time I really got the rwlock code in there. And actually it performs
about the same as no locking, with no updates going in the background.
(nes02)/h/strange/export/home/jpierre/nss/36/mozilla/dist/SunOS5.8_OPT.OBJ/bin{59}
./good
Time for initial uncached verification : 0: 7.291630
Time for 200 cached verifications : 0:37.465910
Average time per cached verification : 0: 0.187320
certutil: certificate is valid
79.71u 1.00s 0:45.27 178.2%
![]() |
Assignee | |
Comment 9•23 years ago
|
||
The NSSRWLock is fine if the check is long-lived, ie. as it currently is with
the linear search of the array of entries. In the tests I ran last night, one
verification was taking about 330,000 CPU-microseconds . Therefore, there wasn't
much lock contention.
However, when the hash-table is implemented (very soon), the lookup will be much
shorter. I don't have the hash table implemented yet, but I have a test that
comes very close to it - the verification of a cert listed on the CRL, which
always completes in under 1000 microseconds, as it is found on the CRL.
So I ran this revoked cert test with the NSSRwLock, and I found that things
weren't nearly as good.
Here are the results :
- 100 threads doing 100 iterations each, 2 CPUs enabled :
Cached lookups took 7.9s to complete
- 100 threads doing 100 iterations each, 1 CPU enabled :
Cached lookups took 5.5s to complete
- 2 threads doing 10000 iterations each, 2 CPUs enabled :
Cached lookups took 9.9s to complete
- 2 threads doing 10000 iterations each, 1 CPU enabled :
Cached lookups took 9.6s to complete
This shows that with 100 threads, the NSSRWLock has negative scalability of
about 33% when a second CPU is added on Solaris. With two threads it is roughly
on par between 1 and 2 CPUs, but there is no benefit from adding a second CPU.
![]() |
Assignee | |
Comment 10•23 years ago
|
||
I have completed a functional implementation of the CRL cache.
The items that were missing from last week that are now implemented are :
1) the cache always checks the token first to see if the CRL object still exists
2) there is an optional reader/writer lock to protected the decoded CRL - or
strictly speaking, the "CRLDPCache" which is a distribution point cache object.
I still have second thoughts about this, so this is #ifdef'ed by USE_RWLOCK .
When this macro is not defined, a PRLock is used instead.
3) serial numbers are matched through a hash table instead of a linear search
4) there is a new ShutdownCrlCache() cleanup function that frees the CRL cache.
It notably deletes all issuer certs and CRL objects used by the cache at
NSS_Shutdown() time
I have verified that performance is equal for both revoked and good certs. Here
are the results on lookups in a 26 MB, optimized build, on a single AMD CPU on
OS/2, using the reader/writer lock implementation.
[e:\dev\nss\36\mozilla\dist\os22.45_icc_opt.obj\bin]good
Time for initial uncached verification : 0: 7. 61028
Time for 10000 cached verifications : 0: 6.238655
Threads : 100. Iterations : 100.
Average time per cached verification : 0: 0. 617
E:\DEV\NSS\36\MOZILLA\DIST\OS22.45_ICC_OPT.OBJ\BIN\CERTUTIL.EXE: certificate is
valid
[e:\dev\nss\36\mozilla\dist\os22.45_icc_opt.obj\bin]revoked.cmd
Time for initial uncached verification : 0: 7. 48772
Time for 10000 cached verifications : 0: 6.674798
Threads : 100. Iterations : 100.
Average time per cached verification : 0: 0. 657
E:\DEV\NSS\36\MOZILLA\DIST\OS22.45_ICC_OPT.OBJ\BIN\CERTUTIL.EXE: certificate is
invalid: Peer's Certificate has been revoked.
[e:\dev\nss\36\mozilla\dist\os22.45_icc_opt.obj\bin]
Keep in mind the time for the uncached looked is optimistic - it is only so
short because of the 15,000 rpm SCSI disk. On my Ultrasparc 10 with IDE disk, it
typically takes over 20s.
I also checked that it worked fine on Solaris.
![]() |
Assignee | |
Comment 11•23 years ago
|
||
This patch contains all necessary changes for the CRL cache.
Most of the new code is at the end of certdb/crl.c
![]() |
Assignee | |
Comment 12•23 years ago
|
||
I have attached a patch which contains the implementation of the CRL cache.
It is fully function and I have fixed all the bugs that I know of. But the
source hasn't been reviewed by anyone yet, or double-checked. I am not very
happy with the code (function naming, organization). I was really struggling
when writing it - it should have been written in C++ rather than C.
There is still a scalability problem on SMP that remains a mystery.
Here is an example of a result on a dual CPU box, when verifying my Thawte
certificate.
With two CPUs enabled :
Time for initial uncached verification : 0: 0.129470
Time for 10000 cached verifications : 0:41.128490
Threads : 100. Iterations : 100.
Average time per cached verification : 0: 0. 4110
With one CPU enabled :
Time for initial uncached verification : 0: 0. 99230
Time for 10000 cached verifications : 0:26.120070
Threads : 100. Iterations : 100.
Average time per cached verification : 0: 0. 2610
As you see, with a single CPU, we complete the 10000 verifications in 2/3rd of
the time as with two CPUs.
I have monitored the CPU usage with vmstat. Both CPUs are peaked at all times -
there isn't any idling. In single CPU mode, the lone CPU is also peaked.
FYI, most of the time in this test is not spent in the cache, but rather in the
certificate signature verification. My Thawte cert has a 2048-bit key.
The previous tests I conducted were with certs that had a key of only 512 bits.
This is why they could verify in only about 600 microseconds - vs 2600
microseconds for this Thawte cert with a 2048 bits key.
I will write a new test case that only verifies the CRL rather than perform the
whole certificate verification, to eliminate this variable from the tests, as
well as others, such as the number of certs in the cert chain that are getting
verified.
![]() |
Assignee | |
Comment 13•23 years ago
|
||
FYI, I also measured a very noticeable improvement with PSM.
For my test, I used Mozilla 1.1 with my current cert/key database - which has
about 10 certs, including AOL certs and Thawte certs, and AOL and Thawte CRLs. I
measured the time to open PSM, on the 400 MHz OS/2 machine in my office.
- with NSS 3.6 tip : 15 seconds
- with NSS 3.6 + CRL cache : 5 seconds
This is in large part because PSM is doing 40 verifications for each cert, and
therefore pulls the CRL 40 times. So the CRL cache is a huge win for PSM too,
not just for servers.
![]() |
Assignee | |
Comment 14•23 years ago
|
||
Code checked in. NSS 3.6 is now feature complete
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•