Closed Bug 176168 Opened 23 years ago Closed 23 years ago

CERT_AddTempCertToPerm leaks

Categories

(NSS :: Libraries, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: KaiE, Assigned: julien.pierre)

Details

(Whiteboard: [3.5.1])

Attachments

(1 file)

I was hunting for the possible cause of bug 175153. I am able to prove the call to CERT_AddTempCertToPerm is producing a leak and causes NSS to fail shutting itself down. I could track it down to the PK11_ReferenceSlot call in fill_CERTCertificateFields in file pki3hack.c. By not increasing the refcount, but only assigning the slot, the leak and shutdown failure goes away. Trying to understand whether this is a safe fix or not, I began to believe member ownSlot is not used by any shutdown code that could free the slot. I suspect this member is obsolete. I can't tell whether the reference counting for slot in CERTCertificate is still needed or not. At least I'm attaching a patch that deactivates that member, removes a couple of reference counting, found by searching for ownSlot occurences in the code. So far I did not crash with that patch, but I simply might have been lucky.
Oops, this should have been a NSS bug. Reassigning.
Assignee: ssaux → wtc
Component: Client Library → Libraries
Product: PSM → NSS
QA Contact: junruh → bishakhabanerjee
Please review whether this patch is safe. If not, we need some global code that finds all certs with ownSlot set to TRUE and derefences the slots.
This experimental patch seems to fix the list of bugs that I'm adding as dependent ones.
Depends on: 175153, 175154, 175157, 175159, 175161
Kai, Certificates are obtained from slots (tokens, properly). They must have a reference to the slot they came from, else the slot could be destroyed, leaving a dangling cert with an invalid pointer. If you look at the patch you made, you'll notice that you removed code to both reference and free the slot. As far as I can tell, there is no bug in the way certs are associated with slots, in that NSS correctly obtains a reference when the cert is created, and deletes it when the cert is destroyed. What is more likely is that there is a leak of a certificate reference, causing a leak of a slot reference. Your patch will simply mask the first leak. If the cert in question is properly destroyed (via CERT_DestroyCertificate), but the slot reference is not deleted, that would be a bug; but I don't see it. The "ownSlot" field is a member of CERTCertificate. It has no relevance to shutdown code, because the shutdown code does not free certificates. It is, however, utilized by the cert code (maybe not so much after 3.4, but I don't think it is safe to ignore it). I would suggest debugging the calls to CERT_DupCertificate and CERT_DestroyCertificate for the cert that causes the leak. You should find that the cert itself is leaked somewhere; once that leak is fixed the slot will be properly freed.
Assigned the bug to Julien. The fix for this bug may need to be merged on the NSS_3_5_BRANCH and MOZILLA_1_0_BRANCH for Mozilla 1.0.x embedding users.
Assignee: wtc → jpierre
Priority: -- → P1
Whiteboard: [3.5.1]
Target Milestone: --- → 3.7
A first partial answer: > The "ownSlot" field is a member of CERTCertificate. It has no relevance to > shutdown code, because the shutdown code does not free certificates. It is, > however, utilized by the cert code (maybe not so much after 3.4, but I don't > think it is safe to ignore it). Besides the places I have changed, there is not a single code fragment elsewhere that accesses member ownSlot.
Kai, what is the version (cvs tag) of NSS in which you found this bug?
> what is the version (cvs tag) of NSS in which you found this bug? The current NSS_CLIENT_TAG.
So the NSS version is 3.6 Beta3. When we have tracked down the leak, we should find out whether the NSS 3.5 branch has the same leak.
Version: unspecified → 3.6
I had a lengthy chat with Ian. I already understood the requirement of references and dangling pointers, I produced that patch for demonstration purposes and because I did not yet fully understand the internal details which certificate data structure references whom. I now understand better, and we created two helper debugging functions, that enable me to dump reference counts from within PSM. With that new functionality, I have begun to trace the counts during code execution. As a result, I now take this bug report back, it is invalid. I can explain why I thought it would be this function that causes the problem. If the peer cert is not added to the permanent store, it does not carry a reference to a slot. It just has a null pointer. Because of that, it has no effect whether that certificate gets ever freed or not. Only after the cert is added to the perm storage, it carries the slot reference, and it matters whether that cert is getting freed or not. And it is not. After doing some more debugging, I found two more problems. I'm only writing them in here for your info, but I will repeat the statement once I file a new patch to solve the other problems. First, I found a true leak in PSM. It looks it was my own leak... :( The leak is introduced during the SSL handshake callback. However, that is not all. So far I have found one more problem. Tracing the reference counting of certificates has shown that we run into a bad situation. After we have called NSS_Shutdown, we still arrive in certificate destroy code... The reason is that our current attempt in Mozilla, to free all NSS resources before shutting down NSS, has not yet succeed. I learned, that although we shut down the network, there is still some global object, getting destroyed after PSM, that has a reference to a HTTP channel object, which references a SSL status object, which references a certificate. I finally agree with suggestions from Darin, that PSM must take action to use weak references instead of strong references. I will introduce a global list to all currently know SSL status information objects. As soon as the requirement for a NSS restart comes up, I will iterate over the global list and change all references to NSS objects into null, and make future calls to the SSL information objects fail gracefully. I obviously can not rely on other parts of Mozilla to behave good enough for us. Besides the SSL status information objects, I'm thinking about introducing those lists and the graceful fail mode for other PSM data structures, too.
Status: NEW → RESOLVED
Closed: 23 years ago
No longer depends on: 175153, 175154, 175157, 175159, 175161
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: