Closed Bug 121916 Opened 20 years ago Closed 16 years ago

Tracking bug: Certificate manager slowness

Categories

(Core Graveyard :: Security: UI, defect, P3)

1.0 Branch
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: julien.pierre, Unassigned)

References

Details

(Keywords: meta, Whiteboard: [kerh-m])

Attachments

(1 file, 2 obsolete files)

I have a large certificate database - 2.5 MB. On Windows it takes 7s to display
the certificates when going to edit/preferences/privacy&security/certificates/manage
certificates. On OS/2 it takes 35s. During that time, the CPU and disk are busy.
It definitely should not take that long. The certutil tool from NSS can list the
certs from my cert database in under a second, so I don't think my problem is
with NSS, but rather in the client code. Even on Windows it should not take 7s.

I have attached my certdb which only contains public certificates so nothing is
being compromised.
What are the CPU speeds on the Windows and OS/2 systems?
Sorry, forgot to mention it.
Pentium II 450 with 384 MB RAM for the Windows box; Pentium II 400 with 192 MB
RAM for the OS/2 box. 

The difference in time was still the same even before the Windows box got its
RAM tripled from 128 to 384 MB last year.

I don't believe the problem is related to hardware differences but software instead.

Ivan and/or I will give this a look and report back here about our 
findings. 
Most of the OS/2 versus Win32 time difference is because of disk
caching. The OS/2 HPFS 2 MB disk cache size limit hurts quite a bit. On
my 700 MHz ThinkPad, it takes 32 sec to open manage on HPFS, but only 12 sec on
an OS/2 JFS drive with a 10 MB cache. I recommend you evaluate using OS/2 JFS.

Much of the remaining difference seems due to PR_Lock/PR_Unlock implementation
differences between OS/2 Mozilla and Windows Mozilla (earlier versions of
OS/2 Mozilla lack an asm PR_Lock).

As to why the load takes 7-10 sec instead of 1-2 sec with certutil, I think that
is beyond the scope of this reply. Most of the time is spent in
SEC_ASN1DecoderUpdate and its child routines.
Sam     
  
       
Sam,

My test was performed with Mozilla on a JFS volume, using a 24 MB disk cache.
Still the difference between it and Windows performance was quite high. I did
notice some disk I/O, but not that much. The Windows box does have a faster
UltraWide SCSI disk on a Symbios controller, vs busmaster DMA IDE disk on the
OS/2 box, but still neither has a slow disk. I don't understand why so much I/O
needs to be done in this case though. The database is 2.5 MB so it should all
fit in the  disk cache on both systems. The rest of the difference should be
explained by CPU-consuming routines.
If we are spending a lot more in the ASN.1 decoder on OS/2 than Windows to do
the same operations, we should open an NSS bug on it. There may be some assembly
code involved that's not being invoked on OS/2. I did make some changes to use
more asm code on OS/2 with NSS 3.4, and perhaps those changes haven't made it
into the client tree. We should definitely retest this with the client once NSS
3.4 is in the client tree, if it is not already.


Just talked to Nelson about it. If the security manager is verifying certs, it
would call into the low-level crypto code for RSA, which gets huge improvements
from assembly code on OS/2 in NSS 3.4 by the changes I made, and should be on
par with Windows. This is probably the explanation for the 5x difference.
Hopefully the client will use NSS 3.4 soon and this bug can be resolved.
The reason for the moderate, incomplete caching is that the file is read in
4 KB records with random access, reading all over the file rather than in
any sequential order. The file system never kicks in its read-ahead cache.
Sam   
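
As a rough illustration of why a fixed 2 MB cache can hurt with this access pattern, the following Python sketch simulates an LRU cache of 4 KB pages under uniformly random reads of a 2.5 MB file. The LRU policy, the uniform access pattern, the read count, and all names are assumptions made for illustration. It is not a model of HPFS or JFS internals, and it ignores other files competing for the same cache.

```python
import random
from collections import OrderedDict

PAGE_SIZE = 4 * 1024  # 4 KB records, as described above


def hit_rate(file_bytes, cache_bytes, reads=20000, seed=0):
    """Return the fraction of random page reads served from an LRU cache."""
    pages_in_file = file_bytes // PAGE_SIZE
    cache_capacity = cache_bytes // PAGE_SIZE
    rng = random.Random(seed)
    cache = OrderedDict()  # page number -> None, oldest entry first
    hits = 0
    for _ in range(reads):
        page = rng.randrange(pages_in_file)
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            cache[page] = None
            if len(cache) > cache_capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / reads


MB = 1024 * 1024
small = hit_rate(int(2.5 * MB), 2 * MB)   # cache smaller than the file
large = hit_rate(int(2.5 * MB), 10 * MB)  # cache larger than the file
print(f"2 MB cache: {small:.2%} hits, 10 MB cache: {large:.2%} hits")
```

Once the cache is larger than the file, every page misses at most once, so the only misses are the initial loads. With the smaller cache, random reads keep evicting pages that will be needed again.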
It sounds like the disk cache algorithm could use improvements then, if it can't
figure out that this relatively small file (compared to the cache size) is being
accessed so frequently and just cache it all. Windows seems to be able to do it,
though arguably it has a dynamically sized cache, which makes it easier.

I ran some tests on my 333 MHz machine with 128 MB of RAM, using HPFS. The
problem, it seems, is much worse using the 9.7 OS/2 driver (12/26) than it is
in the 6.2.1 OS/2 driver (9.4+ equivalent).

6.2.1 --> first time 34 seconds,  close window rehit manage button 29 seconds.

9.7   --> first time 97 seconds,  close window rehit manage button 49 seconds.

on NT 6.2.1 on same machine I got 10 seconds followed by 9.7 seconds.

The problem is much worse on 9.7 than it was in 9.4.  That is why Sam and Julien
have such different results.


Ivan
Stephane,

Do you know what changed between 0.9.4.1 and 0.9.7 in PSM that would make the
security manager that much slower (3x)?
This problem sounds like a good case for keeping the NSS cert DB all in memory
rather than access it sequentially from disk. Of course we have memory
constraints from the client folks, but I think this should be an option for
better performance.

FYI I attempted an OS/2 client build with NSS_3_4=1 to see if the performance
would get better due to the asm code in freebl. NSS 3.4 built as part of the
client tree, but PSM did not. It was looking for libsoftoken.lib when trying to
build mozilla/security/manager/ssl/src . There was no such library. The softoken
was built as softokn3.lib / softokn3.dll successfully, though.
You really don't want to do that. Cert.db's can have a huge number of certs in
them, and can take up quite a lot of memory. Better just to cache the certs you
are using often.

bob
I should add that I was using Netscape 6.2 on Windows (based on mozilla 0.9.4.1)
and a current build of Mozilla (0.9.7+) on OS/2. I wasn't comparing the same
builds. I don't know what the current build of Mozilla on Windows would give -
I'm using Netscape 6.2 as a production browser for mail, so I don't want to
switch at the moment.
I'll try an 0.9.4.1 build on OS/2 to do an apples-to-apples comparison.
I just found that having your 2.5 meg cert7.db file added 2 full seconds
to my startup time on OS/2 9.4+ (a 10%+ increase).
The increased launch time is partially due to the fact that NSS
initialization is delayed now. There aren't significant changes in NSS between
6.2.1 and 0.9.7.

Also, presenting the Cert dialog is the one operation that will scale with the
size of the certDB. Other operations access certs through DB indexes and
shouldn't be affected by the size of the certdb.
When I said it added 2 seconds to my startup time I meant Browser startup time
without Turbo option.
I just installed the 1/29 build of OS/2 Mozilla 9.8+.
The large cert7.db did not seem to change startup time. The manage
certificate numbers are listed below along with previous measurements:

6.2.1 --> first time 34 seconds,  close window rehit manage button 29 seconds.

9.7   --> first time 97 seconds,  close window rehit manage button 49 seconds.

9.8+  --> first time 53 seconds,  close window rehit manage button 43 seconds.

It appears that os/2 6.2.1 (9.4+) is quicker than the more current drivers.

Ivan 
With trunk builds, NSS doesn't get initialized until the first time you do a
crypto operation.  This means that the NSS database won't get loaded unless you
do some crypto operation at start up, like SSL/IMAP or encrypted wallet. 
Regardless, neither of these will happen until *after* the first window is
opened (not including the window for selecting a profile).
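
The deferred initialization described above follows a common lazy-loading pattern, which is also why the first "manage certificates" request can be slower than later ones. A minimal Python sketch of the pattern follows. The class, the loader, and the simulated delay are hypothetical and do not correspond to actual PSM or NSS code.

```python
import time


class LazyCertStore:
    """Hypothetical stand-in for a component that loads a cert DB on demand."""

    def __init__(self, loader):
        self._loader = loader  # callable that performs the expensive load
        self._db = None        # nothing is loaded at startup
        self.load_count = 0

    def _ensure_loaded(self):
        if self._db is None:
            self._db = self._loader()
            self.load_count += 1
        return self._db

    def list_certs(self):
        # The first call pays the load cost; later calls reuse the loaded DB.
        return list(self._ensure_loaded())


def slow_loader():
    time.sleep(0.05)  # simulated disk and parsing cost (assumed value)
    return ["cert-a", "cert-b"]


store = LazyCertStore(slow_loader)
start = time.perf_counter()
store.list_certs()
first = time.perf_counter() - start
start = time.perf_counter()
store.list_certs()
second = time.perf_counter() - start
print(f"first call: {first:.3f}s, second call: {second:.3f}s")
```

Startup stays cheap because constructing the store does no work. The cost moves to whichever operation touches the database first.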

Either way, there is work to reduce the time spent loading the cert db's during
NSS initialization which will land as soon as NSS 3.4 goes into the trunk.
I retested this on OS/2 today. The test case was to create a new Mozilla profile
and copy the cert7.db attached to this defect, then go to
edit/preferences/privacy & security/manage certificates, and wait for the dialog
to be fully drawn. I used the system clock time to measure time.

With an OS/2 0.9.4.1 build provided by IBM, the load time was 22 seconds.
With an OS/2 daily build 2002013016 (0.9.7+ trunk), also provided by IBM, the
load time was 38 seconds.

Both builds were using the same version of NSS - 3.3.

So it looks like there is a large regression in performance occurring between
0.9.4 and the current trunk. I will do some comparison tests on Windows as well.
As was discussed earlier today, current Mozilla builds load PSM/NSS on demand on
the first crypto operation. That adds some time to the first load of this
problem as "manage certificates" is the first crypto operation.
I repeated the test in the 0.9.7+ daily OS/2 build and it was 34s the second
time around, about 4 seconds less. That may be attributable either to NSS not
having to be reloaded or to disk caching, or both.
Here are the results of the Windows tests:
with Netscape 6.2 (Mozilla 0.9.4.1): 7s to 8s
with Mozilla 0.9.7+ 01/28 build using NSS 3.3: 22s first load, 19s second load
with Mozilla 0.9.7+ 01/28 build using NSS 3.4: could not display the DB, the
client crashed. This was an optimized build so I had no chance to investigate
the problem.

I still don't have an OS/2 build with NSS 3.4 due to remaining build issues with
PSM when 3.4 is engaged. NSS 3.4 standalone does build on OS/2.
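
To put the NSS 3.3 measurements from the last few comments side by side, this Python sketch computes the slowdown factor on each platform. The load times are the ones reported above. Using the midpoint of "7s to 8s" for Netscape 6.2 is an assumption, and the Windows comparison is between Netscape 6.2 and a Mozilla daily build rather than two Mozilla builds.

```python
# Load times in seconds, taken from the measurements reported in this bug.
os2_old = 22.0         # OS/2 0.9.4.1 build, NSS 3.3
os2_new = 38.0         # OS/2 daily build 2002013016 (0.9.7+), NSS 3.3
win_old = (7 + 8) / 2  # Netscape 6.2 "7s to 8s"; the midpoint is an assumption
win_new = 22.0         # Windows 0.9.7+ 01/28 build, NSS 3.3, first load


def slowdown(old_seconds, new_seconds):
    """Return how many times slower the new build is than the old one."""
    return new_seconds / old_seconds


os2_factor = slowdown(os2_old, os2_new)
win_factor = slowdown(win_old, win_new)
print(f"OS/2: {os2_factor:.2f}x slower, Windows: {win_factor:.2f}x slower")
```

Both platforms slow down with the same NSS version, which is consistent with a regression outside NSS.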
Changing Platform & OS to All since this is a cross-platform performance regression.
OS: OS/2 → All
Hardware: PC → All
Have you tried visiting an SSL site first, then viewing the certificate manager?

How do those numbers compare?
Javier - I tried going to https://www.etrade.com first, and repeating the tests:
on OS/2 0.9.4.1 : 24s first load, 20s second load
on OS/2 0.9.7+ : 37s first load, 35s second load.

I think this is within the margin of error of the results I got yesterday, so I
don't believe going to an SSL site before manage certs made a difference.
Julien, let's revisit when 3.4 is the standard version on daily builds.
Priority: -- → P3
Target Milestone: --- → 2.2
Stephane,

My tests have shown that the regression did not come from NSS - with the same
3.3 version of NSS, things got much slower between 0.9.4 and 0.9.7+ as of last week.
It's fine if you want to wait to investigate, but I don't think NSS 3.4 can buy
you all the performance that you lost somewhere else.
Using a 02/06 0.9.8+ build including NSS 3.4 on OS/2, manage certs now takes 43s
to display the contents of my cert db.
One of the performance problems that I identified is the certificate
verification, which is done repeatedly for each certificate in order to display
certificate usage. This patch switches to a new, more efficient verification
function.
Depends on: 149832
Attached patch: patch update (obsolete)
This replaces the previous patch, which contained some errors in checking the
bitfields that caused it not to display the correct status after calling the
new function.
Attachment #87618 - Attachment is obsolete: true
By debugging my CERT_VerifyCertificate function with the patch, I noticed
several problems.

First, there were still multiple calls to CERT_VerifyCertificate made for each
certificate, even though I had applied my patch to fill the usage array in a
single call.

The problem is that Mozilla's cert manager queries the status again for each
column and usage, e.g. once for the "status" column, and then again for each of
the "purposes"! This results in 5 calls to CERT_VerifyCertificate per certificate.

Previously, there were 5 x 8 = 40 CERT_VerifyCert calls per certificate.
This can be reduced to 1 call if the mozilla cert manager is smart enough to
save the usage array in an object per cell, instead of discarding it.
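
The call counts above can be illustrated with a small Python sketch. The `verify_all_usages` function is a hypothetical stand-in for a verification call that returns the status of every usage at once. It is not the NSS API, and the four purpose columns are an assumption chosen so that status plus purposes gives the 5 queries per certificate mentioned above.

```python
calls = 0  # counts invocations of the expensive verification


def verify_all_usages(cert_id):
    """Hypothetical expensive check returning the status of every usage."""
    global calls
    calls += 1
    return {"status": "ok", "ssl_client": True, "ssl_server": True,
            "email_signer": True, "email_recipient": False}


# 1 status column + 4 purpose columns (column names are assumed)
COLUMNS = ["status", "ssl_client", "ssl_server",
           "email_signer", "email_recipient"]


def render_uncached(cert_ids):
    # Each cell re-runs verification and discards the rest of the result.
    return [[verify_all_usages(c)[col] for col in COLUMNS] for c in cert_ids]


_usage_cache = {}  # cert id -> full usage result, kept for later cells


def cached_usages(cert_id):
    if cert_id not in _usage_cache:
        _usage_cache[cert_id] = verify_all_usages(cert_id)
    return _usage_cache[cert_id]


def render_cached(cert_ids):
    return [[cached_usages(c)[col] for col in COLUMNS] for c in cert_ids]


certs = ["cert-1", "cert-2", "cert-3"]
calls = 0
uncached_rows = render_uncached(certs)
uncached_calls = calls
calls = 0
cached_rows = render_cached(certs)
cached_calls = calls
print(f"uncached: {uncached_calls} calls, cached: {cached_calls} calls")
```

Both renderings produce the same cells. The cached version calls the verification once per certificate instead of once per cell. A real implementation would also need to decide when cached results become stale, which this sketch does not address.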

I have not yet measured the actual performance benefit, but I expect it to
be significant when CRLs are installed. Currently OCSP always gets disabled in
the cert manager, so it isn't possible to measure. I will comment out the code
that disables OCSP for cert manager next week to do these measurements.
Thawte is currently not serving their CRL, so I'm not able to do a performance
check for CRLs. Our internal CRL is too small to show a difference.
I will however try OCSP with our internal certs.
Depends on: 124037, 149834
Keywords: nsbeta1
Comment on attachment 90172
patch update

patch has been moved to separate dependent bug #149834
Attachment #90172 - Attachment is obsolete: true
Keywords: nsbeta1
Summary: Manage certificates horribly slow → Tracking bug: Certificate manager slowness
Target Milestone: 2.2 → Future
Mass reassign ssaux bugs to nobody
Assignee: ssaux → nobody
Mass change "Future" target milestone to "--" on bugs that now are assigned to
nobody.  Those targets reflected the prioritization of past PSM management.
Many of these should be marked invalid or wontfix, I think.
Target Milestone: Future → ---
Product: PSM → Core
Keywords: meta
Whiteboard: [kerh-m]
I'm changing this to worksforme, as all tracked dependencies have been fixed.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME
Version: psm1.01 → 1.0 Branch
Product: Core → Core Graveyard