Closed Bug 121916 Opened 23 years ago Closed 19 years ago

Tracking bug: Certificate manager slowness

Categories

(Core Graveyard :: Security: UI, defect, P3)

1.0 Branch
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: julien.pierre, Unassigned)

References

Details

(Keywords: meta, Whiteboard: [kerh-m])

Attachments

(1 file, 2 obsolete files)

I have a large certificate database - 2.5 MB. On Windows it takes 7s to display the certificates when going to edit/preferences/privacy&security/certificates/manage certificates. On OS/2 it takes 35s. During that time, the CPU and disk are busy. It definitely should not take that long. The certutil tool from NSS can list the certs from my cert database in under a second, so I don't think my problem is with NSS, but rather in the client code. Even on Windows it should not take 7s. I have attached my certdb, which only contains public certificates, so nothing is being compromised.
What are the CPU speeds on the Windows and OS/2 systems?
Sorry, forgot to mention it. Pentium II 450 with 384 MB RAM for the Windows box; Pentium II 400 with 192 MB RAM for the OS/2 box. The difference in time was still the same even before the Windows box got its RAM tripled from 128 to 384 MB last year. I don't believe the problem is related to hardware differences but software instead.
Ivan and/or I will give this a look and report back here about our findings.
Most of the os2 versus win32 time difference is because of disk caching. The os2 hpfs 2mb disk cache size limit hurts quite a bit. On my 700MHz thinkpad, it takes 32 sec to open manage on hpfs, but only 12 sec on an os2 jfs drive with 10mb cache. Recommend you evaluate using os2 jfs. Much of the remaining difference seems due to PR_Lock/PR_Unlock implementation differences between os2 mozilla and windows mozilla (lack of asm PR_Lock on earlier versions of os2 mozilla). As to why the load is 7-10 sec instead of 1-2 sec with certutil, I think that is beyond the scope of this reply. Most of the time is spent in SEC_ASN1DecoderUpdate and children routines of that routine. Sam
Sam, My test was performed with Mozilla on a JFS volume, using a 24 MB disk cache. Still the difference between it and Windows performance was quite high. I did notice some disk I/O, but not that much. The Windows box does have a faster UltraWide SCSI disk on a Symbios controller, vs busmaster DMA IDE disk on the OS/2 box, but still neither has a slow disk. I don't understand why so much I/O needs to be done in this case though. The database is 2.5 MB so it should all fit in the disk cache on both systems. The rest of the difference should be explained by CPU-consuming routines. If we are spending a lot more in the ASN.1 decoder on OS/2 than Windows to do the same operations, we should open an NSS bug on it. There may be some assembly code involved that's not being invoked on OS/2. I did make some changes to use more asm code on OS/2 with NSS 3.4, and perhaps those changes haven't made it into the client tree. We should definitely retest this with the client once NSS 3.4 is in the client tree, if it is not already.
Just talked to Nelson about it. If the security manager is verifying certs, it would call into the low-level crypto code for RSA, which gets huge improvements from assembly code on OS/2 in NSS 3.4 by the changes I made, and should be on par with Windows. This is probably the explanation for the 5x difference. Hopefully the client will use NSS 3.4 soon and this bug can be resolved.
The reason for moderate/incomplete caching is that the file is read in 4K byte records with 'random access', reading all over the file, not in any sequential order. The file system never kicks in read-ahead cache. Sam
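To illustrate the access-pattern point in the comment above, here is a minimal sketch of why scattered 4K reads give a read-ahead heuristic nothing to work with. It assumes a deliberately simple model in which read-ahead only helps when a read starts exactly where the previous one ended; the real HPFS and JFS cache logic is not described in this bug, and the function name is hypothetical.

```python
import random

RECORD_SIZE = 4096  # 4K records, as described in the comment above


def sequential_fraction(offsets, record_size=RECORD_SIZE):
    """Fraction of reads that start exactly where the previous read ended.

    Assumption: this stands in for "reads a simple read-ahead could serve".
    """
    if len(offsets) < 2:
        return 0.0
    pairs = zip(offsets, offsets[1:])
    hits = sum(1 for prev, cur in pairs if cur == prev + record_size)
    return hits / (len(offsets) - 1)


# A 2.5 MB file is 640 records of 4 KB each.
offsets = [i * RECORD_SIZE for i in range(640)]
in_order = sequential_fraction(offsets)

# Same records, visited in a scattered order (fixed seed for repeatability).
shuffled = offsets[:]
random.Random(0).shuffle(shuffled)
scattered = sequential_fraction(shuffled)
```

In this model the in-order pass is fully sequential, while the scattered pass has almost no consecutive reads, which matches the observation that read-ahead never kicks in.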
It sounds like the disk cache algorithm could use improvements then, if it can't figure out that this relatively small file (compared to the cache size) is being accessed so frequently and just cache it all. Windows seems to be able to do it, though arguably it has a dynamically sized cache, which makes it easier.
I ran some tests on my 333 MHz machine / 128 MB with HPFS files. The problem seems to be much worse using the 9.7 OS/2 driver (12/26) than it is in the 6.2.1 OS/2 driver (9.4+ equivalent). 6.2.1 --> first time 34 seconds, close window rehit manage button 29 seconds. 9.7 --> first time 97 seconds, close window rehit manage button 49 seconds. On NT 6.2.1 on the same machine I got 10 seconds followed by 9.7 seconds. The problem is much worse on 9.7 than it was in 9.4. That is why Sam and Julien have such different results. Ivan
Stephane, Do you know what changed between 0.9.4.1 and 0.9.7 in PSM that would make the security manager that much slower (3x) ?
This problem sounds like a good case for keeping the NSS cert DB all in memory rather than access it sequentially from disk. Of course we have memory constraints from the client folks, but I think this should be an option for better performance. FYI I attempted an OS/2 client build with NSS_3_4=1 to see if the performance would get better due to the asm code in freebl. NSS 3.4 built as part of the client tree, but PSM did not. It was looking for libsoftoken.lib when trying to build mozilla/security/manager/ssl/src . There was no such library. The softoken was built as softokn3.lib / softokn3.dll successfully, though.
You really don't want to do that. Cert.db's can have a huge number of certs in them, and can take up quite a lot of memory. Better just to cache the certs you are using often. bob
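A minimal sketch of the "cache the certs you are using often" idea from the comment above, assuming a least-recently-used eviction policy. The class name, the `loader` callback, and the capacity are all hypothetical; nothing here reflects how NSS actually caches certificates.

```python
from collections import OrderedDict


class CertCache:
    """Small LRU cache in front of a slow, database-backed loader (hypothetical)."""

    def __init__(self, loader, capacity):
        self._loader = loader          # called only on a cache miss
        self._capacity = capacity      # maximum number of cached entries
        self._entries = OrderedDict()  # oldest entry first, newest last
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            # Mark as most recently used and return the cached value.
            self._entries.move_to_end(key)
            return self._entries[key]
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry to bound memory use.
            self._entries.popitem(last=False)
        return value


cache = CertCache(loader=lambda nickname: nickname.upper(), capacity=2)
for nickname in ["a", "b", "a", "c", "a", "b"]:
    cache.get(nickname)
```

Memory stays bounded by the capacity regardless of how many certs the database holds, which is the trade-off being argued for over loading the whole cert DB into memory.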
I should add that I was using Netscape 6.2 on Windows (based on mozilla 0.9.4.1) and a current build of Mozilla (0.9.7+) on OS/2. I wasn't comparing the same builds. I don't know what the current build of Mozilla on Windows would give - I'm using Netscape 6.2 as a production browser for mail, so I don't want to switch at the moment. I'll try an 0.9.4.1 build on OS/2 to do an apples-to-apples comparison.
I just found that by having your 2.5 meg cert7.db file, it added 2 full seconds to my startup time on OS/2 9.4+ time (10%+ increase).
The increased launch time is partially due to the fact that NSS initialization is delayed now. There aren't significant changes in NSS between 6.2.1 and 0.9.7. Also, presenting the Cert dialog is the one operation that will scale with the size of the certDB. Other operations access certs through DB indexes and shouldn't be affected by the size of the certdb.
When I said it added 2 seconds to my startup time I meant Browser startup time without Turbo option.
I just installed the 1/29 build of OS/2 Mozilla 9.8+. The large cert7.db did not seem to change startup time. The manage certificate numbers are listed below along with previous measurements: 6.2.1 --> first time 34 seconds, close window rehit manage button 29 seconds. 9.7 --> first time 97 seconds, close window rehit manage button 49 seconds. 9.8+ --> first time 53 seconds, close window rehit manage button 43 seconds. It appears that OS/2 6.2.1 (9.4+) is quicker than the more current drivers. Ivan
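For reference, the slowdown relative to the 6.2.1 driver can be worked out directly from the timings quoted in the comment above. This is only arithmetic on those reported numbers; the variable names are illustrative.

```python
# (first open, reopen) in seconds, as reported in the comment above
timings = {
    "6.2.1": (34, 29),
    "9.7": (97, 49),
    "9.8+": (53, 43),
}

baseline_first, baseline_reopen = timings["6.2.1"]

# Slowdown factor of each build relative to the 6.2.1 baseline
slowdown = {
    build: (first / baseline_first, reopen / baseline_reopen)
    for build, (first, reopen) in timings.items()
}

for build, (first_x, reopen_x) in slowdown.items():
    print(f"{build}: first open {first_x:.2f}x, reopen {reopen_x:.2f}x")
```

On these figures the first open with 9.7 is a little under 3 times the 6.2.1 baseline, and 9.8+ is still roughly 1.5 times slower on both the first open and the reopen.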
With trunk builds, NSS doesn't get initialized until the first time you do a crypto operation. This means that the NSS database won't get loaded unless you do some crypto operation at startup, like SSL/IMAP or encrypted wallet. Regardless, neither of these will happen until *after* the first window is opened (not including the window for selecting a profile). Either way, there is work to reduce the time spent loading the cert DBs during NSS initialization, which will land as soon as NSS 3.4 goes into the trunk.
I retested this on OS/2 today. The test case was to create a new Mozilla profile and copy the cert7.db attached to this defect, then go to edit/preferences/privacy & security/manage certificates, and wait for the dialog to be fully drawn. I used the system clock time to measure time. With an OS/2 0.9.4.1 build provided by IBM, the load time was 22 seconds. With an OS/2 daily build 2002013016 (0.9.7+ trunk), also provided by IBM, the load time was 38 seconds. Both builds were using the same version of NSS - 3.3. So it looks like there is a large regression in performance occurring between 0.9.4 and the current trunk. I will do some comparison tests on Windows as well.
As was discussed earlier today, current Mozilla builds load PSM/NSS on demand on the first crypto operation. That adds some time to the first load of this problem as "manage certificates" is the first crypto operation. I repeated the test in the 0.9.7+ daily OS/2 build and it was 34s the second time around, about 4 seconds less. That may be attributable either to NSS not having to be reloaded or to disk caching, or both.
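The on-demand loading described above can be sketched as a lazily initialized store: nothing is loaded at startup, the first operation pays the load cost, and later operations reuse the result. The class and the `loader` callback are hypothetical stand-ins, not PSM or NSS code.

```python
class LazyCertStore:
    """Defers the expensive load until the first operation that needs it."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical expensive initialization
        self._certs = None     # None means "not initialized yet"
        self.load_count = 0

    def _ensure_loaded(self):
        if self._certs is None:
            # First use: pay the initialization cost exactly once.
            self._certs = list(self._loader())
            self.load_count += 1
        return self._certs

    def list_certs(self):
        return list(self._ensure_loaded())


store = LazyCertStore(loader=lambda: ["cert-A", "cert-B"])
loads_at_startup = store.load_count  # nothing has been loaded yet
first = store.list_certs()           # triggers the one-time load
second = store.list_certs()          # reuses the already loaded data
```

Under this model only the first open of the dialog includes the initialization cost, which is consistent with the second measurement being a few seconds faster, although disk caching may account for part of that difference as noted above.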
Here are the results of the Windows tests: with Netscape 6.2 (Mozilla 0.9.4.1): 7s to 8s; with Mozilla 0.9.7+ 01/28 build using NSS 3.3: 22s first load, 19s second load; with Mozilla 0.9.7+ 01/28 build using NSS 3.4: could not display the DB, the client crashed. This was an optimized build so I had no chance to check the problem. I still don't have an OS/2 build with NSS 3.4 due to remaining build issues with PSM when 3.4 is engaged. NSS 3.4 standalone does build on OS/2.
Changing Platform & OS to All since this is a cross-platform performance regression.
OS: OS/2 → All
Hardware: PC → All
Have you tried visiting an SSL site first, then viewing the certificate manager? How do those numbers compare?
Javier - I tried going to https://www.etrade.com first and repeating the tests. On OS/2 0.9.4.1: 24s first load, 20s second load. On OS/2 0.9.7+: 37s first load, 35s second load. I think this is within the margin of error of the results I got yesterday, so I don't believe going to an SSL site before manage certs made a difference.
Julien, let's revisit when 3.4 is the standard version on daily builds.
Priority: -- → P3
Target Milestone: --- → 2.2
Stephane, My tests have shown that the regression did not come from NSS - with the same 3.3 version of NSS, things got much slower between 0.9.4 and 0.9.7+ as of last week. It's fine if you want to wait to investigate, but I don't think NSS 3.4 can buy you all the performance that you lost somewhere else.
Using a 02/06 0.9.8+ build including NSS 3.4 on OS/2, manage certs now takes 43s to display the contents of my cert db.
One of the performance problems that I identified is the certificate verification, which is done repeatedly for each certificate in order to display certificate usage. This patch uses a new, more efficient verification function.
Depends on: 149832
Attached patch patch update (obsolete) — Splinter Review
This replaces the previous patch, which contained some errors in checking the bitfields that caused it not to display the correct status after calling the new function.
Attachment #87618 - Attachment is obsolete: true
By debugging my CERT_VerifyCertificate function with the patch, I noticed several problems. First, there were still multiple calls to CERT_VerifyCertificate made for each certificate, even though I had applied my patch to fill the usageArray in a single call. The problem is that Mozilla's cert manager queries the status again for each column and usage, e.g. once for the "status" column, and then once more for each of the "purposes"! This results in 5 calls to CERT_VerifyCertificate per certificate. Previously, there were 5 x 8 = 40 CERT_VerifyCert calls per certificate. This can be reduced to 1 call if the Mozilla cert manager is smart enough to save the usage array in an object per cell, instead of discarding it. I have not yet measured the actual performance benefit, but I expect it to be significant when CRLs are installed. Currently OCSP always gets disabled in the cert manager, so it isn't possible to measure. I will comment out the code that disables OCSP for the cert manager next week to do these measurements.
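The call counts in the comment above (40, then 5, then 1 per certificate) can be reproduced with a small model. This is a sketch only: `CountingVerifier` and the render functions are hypothetical stand-ins for the cert manager and the verification functions, the usage names are placeholders, and the cache is keyed per certificate as an assumption.

```python
USAGES = [f"usage-{i}" for i in range(8)]  # 8 usages, placeholder names
COLUMNS = 5                                # status column plus the purposes


class CountingVerifier:
    """Stand-in for the verification layer; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def verify_one(self, cert, usage):
        # Models the old style: one call checks a single usage.
        self.calls += 1
        return True

    def verify_all(self, cert):
        # Models the new style: one call fills in every usage at once.
        self.calls += 1
        return {usage: True for usage in USAGES}


def render_per_usage(verifier, cert):
    """Every column re-checks every usage separately."""
    for _ in range(COLUMNS):
        for usage in USAGES:
            verifier.verify_one(cert, usage)


def render_per_column(verifier, cert):
    """Every column makes one combined call but discards the result."""
    for _ in range(COLUMNS):
        verifier.verify_all(cert)


def render_cached(verifier, cert, cache):
    """The first column verifies; the other columns reuse the saved result."""
    for _ in range(COLUMNS):
        if cert not in cache:
            cache[cert] = verifier.verify_all(cert)
        _usage_array = cache[cert]


def count_calls(render, *extra):
    verifier = CountingVerifier()
    render(verifier, "cert-A", *extra)
    return verifier.calls


calls_old = count_calls(render_per_usage)
calls_patched = count_calls(render_per_column)
calls_cached = count_calls(render_cached, {})
```

The saving from the cached variant grows with the cost of each verification, which is why the benefit is expected to matter most once CRL or OCSP checks are involved.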
Thawte is currently not serving their CRL, so I'm not able to do a performance check for CRLs. Our internal CRL is too small to show a difference. I will however try OCSP with our internal certs.
Depends on: 124037, 149834
Keywords: nsbeta1
Comment on attachment 90172 [details] [diff] [review] patch update patch has been moved to separate dependent bug #149834
Attachment #90172 - Attachment is obsolete: true
Keywords: nsbeta1
Summary: Manage certificates horribly slow → Tracking bug: Certificate manager slowness
Target Milestone: 2.2 → Future
Mass reassign ssaux bugs to nobody
Assignee: ssaux → nobody
Mass change "Future" target milestone to "--" on bugs that now are assigned to nobody. Those targets reflected the prioritization of past PSM management. Many of these should be marked invalid or wontfix, I think.
Target Milestone: Future → ---
Product: PSM → Core
Keywords: meta
Whiteboard: [kerh-m]
I'm changing this to worksforme, as all tracked dependencies have been fixed.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → WORKSFORME
Version: psm1.01 → 1.0 Branch
Product: Core → Core Graveyard