2.97 KB, patch
|Details | Diff | Splinter Review|
13.64 KB, patch
|Details | Diff | Splinter Review|
This bug arose because Kai found there are still problems with the handling of cert reference counts in NSS. These problems were introduced in 3.4. Workarounds for previous bugs have seemed to work for the most part, but there are apparently some instances where problems may still occur. Before 3.4, once a cert entered the system (became a CERTCertificate), it was immediately stored in the temp db (there is one exception, CERT_DecodeDERCertificate could create a "free-floating" cert). Entries in the temp db consisted of references to both perm certs and temp certs. When a cert was destroyed, the lock to the the temp db was obtained, and if the refcount was zero, the temp db reference was removed and freed. Thus, the 0th reference to the cert could be thought of as protected by the temp db's lock. From 3.4 on, the temp db split into a true temporary storage (hashtable), and a cache (also a hashtable). These modules are separate. The implementation of the first is within NSSCryptoContext, and the second is within NSSTrustDomain. In Stan, it was intended that NSSCryptoContext would manage a set of temp certs, and they would be freed when the context was freed. Also, the trust domain's cache would not be fixed-size, and thus could manage cert references on its own. Unfortunately, those ideas found their way into the 3.4 code, which must operate as previous versions did. The crypto context store and the trust domain cache both held true references to the cert, which had to be freed along with the last reference to the cert. This created timing problems when one thread was trying to extract a cert from the cache while another was destroying it. Those problems were fixed with some workarounds, which seemed to solve server-side problems, but client-side problems remain (the S/MIME code in particular). I have decided to do the obvious, which is to fully restore the old NSS behavior. What I did was to expose the locks for the crypto context store and the trust domain cache. When a cert is freed, it obtains the appropriate lock, decrements its refcount, and if the refcount is zero it removes itself from whichever store it is in. This is essentially grafting a 3.X way of doing things into 4.0 code, but it is necessary for 3.4+ to work correctly. patch forthcoming.
Comment on attachment 104555 [details] [diff] [review] Patch v1 - clean up cert ref counting With this patch, I see additional NSS shutdown failures.
Created attachment 104768 [details] [diff] [review] experimental patch, Ian says this will not work for servers Ian, thanks a lot for working on this. However, I fear more work might is necessary. I'm attaching this patch for completeness, it makes it easier to explain the problem I see. It is the patch I had sent to you by email, and which you said will not work in servers. Patch v1 in bug 177260 fixes all shutdown failures I can currently reproduce. That patch contains fixes to some NSS issues (tracked in separate bugs). The patch in bug 177260 also includes this small patch to NSS I'm attaching in here. My first test with Ian's patch was bug 157937. The interesting this is: - bug 157937 is fixed by patch v1 from bug 177260, regardless whether I use the full patch, or whether I exclude this small snippet from this attachment. (However, the snippet is required to fix other leaks) - bug 157937 is not fixed with the patch "clean up cert ref counting" attached to this bug So I fear the patch v1 in this bug is introducing a new leak.
Of course, when I said: bug 157937 is not fixed with the patch "clean up cert ref counting" attached to this bug I meant: combine patch v1 from 177260, but exclude the snippet attached here, titled "experimental patch" with Patch v1 from this bug.
Created attachment 104773 [details] [diff] [review] v2, need to remove addref when certs are put in cache
Comment on attachment 104773 [details] [diff] [review] v2, need to remove addref when certs are put in cache This patch v2, together with patch v1 from bug 177260, works in all the cases I have tested.
I'd like to nominate this patch for checkin to the 3_6 branch.
OK by me. Kaie, do you have a list of patches we are trying to get through drivers? bob
patch checked in to tip.
marking this fixed on the tip.
The patch (v2) has been checked into the NSS_3_6_BRANCH. Set the target milestone to 3.6.1.