Last Comment Bug 177366 - clean up cert reference counting
: clean up cert reference counting
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: 3.4
: All All
: P1 normal (vote)
: 3.6.1
Assigned To: Ian McGreer
: Bishakha Banerjee
Mentors:
Depends on:
Blocks: psmleaks 177260
  Show dependency treegraph
 
Reported: 2002-10-29 13:52 PST by Ian McGreer
Modified: 2002-11-26 13:48 PST (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Patch v1 - clean up cert ref counting (12.99 KB, patch)
2002-10-29 13:56 PST, Ian McGreer
no flags Details | Diff | Splinter Review
experimental patch, Ian says this will not work for servers (2.97 KB, patch)
2002-10-31 10:05 PST, Kai Engert (:kaie)
no flags Details | Diff | Splinter Review
v2, need to remove addref when certs are put in cache (13.64 KB, patch)
2002-10-31 11:11 PST, Ian McGreer
no flags Details | Diff | Splinter Review

Description Ian McGreer 2002-10-29 13:52:00 PST
This bug arose because Kai found there are still problems with the handling of
cert reference counts in NSS.  These problems were introduced in 3.4. 
Workarounds for previous bugs have seemed to work for the most part, but there
are apparently some instances where problems may still occur.

Before 3.4, once a cert entered the system (became a CERTCertificate), it was
immediately stored in the temp db (there is one exception,
CERT_DecodeDERCertificate could create a "free-floating" cert).  Entries in the
temp db consisted of references to both perm certs and temp certs.  When a cert
was destroyed, the lock to the the temp db was obtained, and if the refcount was
zero, the temp db reference was removed and freed.  Thus, the 0th reference to
the cert could be thought of as protected by the temp db's lock.

From 3.4 on, the temp db split into a true temporary storage (hashtable), and a
cache (also a hashtable).  These modules are separate.  The implementation of
the first is within NSSCryptoContext, and the second is within NSSTrustDomain. 
In Stan, it was intended that NSSCryptoContext would manage a set of temp certs,
and they would be freed when the context was freed.  Also, the trust domain's
cache would not be fixed-size, and thus could manage cert references on its own.

Unfortunately, those ideas found their way into the 3.4 code, which must operate
as previous versions did.  The crypto context store and the trust domain cache
both held true references to the cert, which had to be freed along with the last
reference to the cert.  This created timing problems when one thread was trying
to extract a cert from the cache while another was destroying it.  Those
problems were fixed with some workarounds, which seemed to solve server-side
problems, but client-side problems remain (the S/MIME code in particular).

I have decided to do the obvious, which is to fully restore the old NSS
behavior.  What I did was to expose the locks for the crypto context store and
the trust domain cache.  When a cert is freed, it obtains the appropriate lock,
decrements its refcount, and if the refcount is zero it removes itself from
whichever store it is in.  This is essentially grafting a 3.X way of doing
things into 4.0 code, but it is necessary for 3.4+ to work correctly.

patch forthcoming.
Comment 1 Ian McGreer 2002-10-29 13:56:13 PST
Created attachment 104555 [details] [diff] [review]
Patch v1 - clean up cert ref counting
Comment 2 Kai Engert (:kaie) 2002-10-31 09:54:15 PST
Comment on attachment 104555 [details] [diff] [review]
Patch v1 - clean up cert ref counting

With this patch, I see additional NSS shutdown failures.
Comment 3 Kai Engert (:kaie) 2002-10-31 10:05:06 PST
Created attachment 104768 [details] [diff] [review]
experimental patch, Ian says this will not work for servers

Ian, thanks a lot for working on this. However, I fear more work might is
necessary.

I'm attaching this patch for completeness, it makes it easier to explain the
problem I see. It is the patch I had sent to you by email, and which you said
will not work in servers.


Patch v1 in bug 177260 fixes all shutdown failures I can currently reproduce.
That patch contains fixes to some NSS issues (tracked in separate bugs).
The patch in bug 177260 also includes this small patch to NSS I'm attaching in
here.

My first test with Ian's patch was bug 157937.

The interesting this is:
- bug 157937 is fixed by patch v1 from bug 177260, regardless whether I use the
full patch, or whether I exclude this small snippet from this attachment.
(However, the snippet is required to fix other leaks)

- bug 157937 is not fixed with the patch "clean up cert ref counting" attached
to this bug

So I fear the patch v1 in this bug is introducing a new leak.
Comment 4 Kai Engert (:kaie) 2002-10-31 10:09:12 PST
Of course, when I said:
  bug 157937 is not fixed with the patch "clean up cert ref counting" 
  attached to this bug
I meant:
  combine
    patch v1 from 177260, but exclude the snippet attached here,
    titled "experimental patch"
  with
    Patch v1 from this bug.
Comment 5 Ian McGreer 2002-10-31 11:11:22 PST
Created attachment 104773 [details] [diff] [review]
v2,  need to remove addref when certs are put in cache
Comment 6 Kai Engert (:kaie) 2002-10-31 12:12:43 PST
Comment on attachment 104773 [details] [diff] [review]
v2,  need to remove addref when certs are put in cache

This patch v2, together with patch v1 from bug 177260, works in all the cases I
have tested.
Comment 7 Kai Engert (:kaie) 2002-11-04 04:56:21 PST
I'd like to nominate this patch for checkin to the 3_6 branch.
Comment 8 Robert Relyea 2002-11-04 12:51:11 PST
OK by me. Kaie, do you have a list of patches we are trying to get through drivers?

bob
Comment 9 Ian McGreer 2002-11-06 11:05:59 PST
patch checked in to tip.
Comment 10 Ian McGreer 2002-11-21 09:29:35 PST
marking this fixed on the tip.
Comment 11 Wan-Teh Chang 2002-11-26 13:48:01 PST
The patch (v2) has been checked into the NSS_3_6_BRANCH.
Set the target milestone to 3.6.1.

Note You need to log in before you can comment on or make changes to this bug.