Closed Bug 177366 Opened 22 years ago Closed 22 years ago

clean up cert reference counting

Categories

(NSS :: Libraries, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugz, Assigned: bugz)

References

Details

Attachments

(2 files, 1 obsolete file)

This bug arose because Kai found there are still problems with the handling of
cert reference counts in NSS.  These problems were introduced in 3.4. 
Workarounds for previous bugs have seemed to work for the most part, but there
are apparently some instances where problems may still occur.

Before 3.4, once a cert entered the system (became a CERTCertificate), it was
immediately stored in the temp db (there is one exception,
CERT_DecodeDERCertificate could create a "free-floating" cert).  Entries in the
temp db consisted of references to both perm certs and temp certs.  When a cert
was destroyed, the lock to the the temp db was obtained, and if the refcount was
zero, the temp db reference was removed and freed.  Thus, the 0th reference to
the cert could be thought of as protected by the temp db's lock.

From 3.4 on, the temp db split into a true temporary storage (hashtable), and a
cache (also a hashtable).  These modules are separate.  The implementation of
the first is within NSSCryptoContext, and the second is within NSSTrustDomain. 
In Stan, it was intended that NSSCryptoContext would manage a set of temp certs,
and they would be freed when the context was freed.  Also, the trust domain's
cache would not be fixed-size, and thus could manage cert references on its own.

Unfortunately, those ideas found their way into the 3.4 code, which must operate
as previous versions did.  The crypto context store and the trust domain cache
both held true references to the cert, which had to be freed along with the last
reference to the cert.  This created timing problems when one thread was trying
to extract a cert from the cache while another was destroying it.  Those
problems were fixed with some workarounds, which seemed to solve server-side
problems, but client-side problems remain (the S/MIME code in particular).

I have decided to do the obvious, which is to fully restore the old NSS
behavior.  What I did was to expose the locks for the crypto context store and
the trust domain cache.  When a cert is freed, it obtains the appropriate lock,
decrements its refcount, and if the refcount is zero it removes itself from
whichever store it is in.  This is essentially grafting a 3.X way of doing
things into 4.0 code, but it is necessary for 3.4+ to work correctly.

patch forthcoming.
Blocks: psmleaks, 177260
Comment on attachment 104555 [details] [diff] [review]
Patch v1 - clean up cert ref counting

With this patch, I see additional NSS shutdown failures.
Attachment #104555 - Flags: needs-work+
Ian, thanks a lot for working on this. However, I fear more work might is
necessary.

I'm attaching this patch for completeness, it makes it easier to explain the
problem I see. It is the patch I had sent to you by email, and which you said
will not work in servers.


Patch v1 in bug 177260 fixes all shutdown failures I can currently reproduce.
That patch contains fixes to some NSS issues (tracked in separate bugs).
The patch in bug 177260 also includes this small patch to NSS I'm attaching in
here.

My first test with Ian's patch was bug 157937.

The interesting this is:
- bug 157937 is fixed by patch v1 from bug 177260, regardless whether I use the
full patch, or whether I exclude this small snippet from this attachment.
(However, the snippet is required to fix other leaks)

- bug 157937 is not fixed with the patch "clean up cert ref counting" attached
to this bug

So I fear the patch v1 in this bug is introducing a new leak.
Attachment #104555 - Attachment description: clean up cert ref counting → Patch v1 - clean up cert ref counting
Of course, when I said:
  bug 157937 is not fixed with the patch "clean up cert ref counting" 
  attached to this bug
I meant:
  combine
    patch v1 from 177260, but exclude the snippet attached here,
    titled "experimental patch"
  with
    Patch v1 from this bug.
Attachment #104555 - Attachment is obsolete: true
Comment on attachment 104773 [details] [diff] [review]
v2,  need to remove addref when certs are put in cache

This patch v2, together with patch v1 from bug 177260, works in all the cases I
have tested.
I'd like to nominate this patch for checkin to the 3_6 branch.
OK by me. Kaie, do you have a list of patches we are trying to get through drivers?

bob
patch checked in to tip.
marking this fixed on the tip.
Status: NEW → RESOLVED
Closed: 22 years ago
Priority: -- → P1
Resolution: --- → FIXED
Target Milestone: --- → 3.7
The patch (v2) has been checked into the NSS_3_6_BRANCH.
Set the target milestone to 3.6.1.
Target Milestone: 3.7 → 3.6.1
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: