Closed
Bug 176168
Opened 23 years ago
Closed 23 years ago
CERT_AddTempCertToPerm leaks
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
INVALID
3.7
People
(Reporter: KaiE, Assigned: julien.pierre)
Details
(Whiteboard: [3.5.1])
Attachments
(1 file)
3.46 KB,
patch
|
Details | Diff | Splinter Review |
I was hunting for the possible cause of bug 175153.
I am able to prove the call to CERT_AddTempCertToPerm is producing a leak and
causes NSS to fail shutting itself down.
I could track it down to the PK11_ReferenceSlot call in
fill_CERTCertificateFields in file pki3hack.c. By not increasing the refcount,
but only assigning the slot, the leak and shutdown failure goes away.
Trying to understand whether this is a safe fix or not, I began to believe
member ownSlot is not used by any shutdown code that could free the slot. I
suspect this member is obsolete.
I can't tell whether the reference counting for slot in CERTCertificate is still
needed or not.
At least I'm attaching a patch that deactivates that member, removes a couple of
reference counting, found by searching for ownSlot occurences in the code. So
far I did not crash with that patch, but I simply might have been lucky.
Reporter | ||
Comment 1•23 years ago
|
||
Oops, this should have been a NSS bug. Reassigning.
Assignee: ssaux → wtc
Component: Client Library → Libraries
Product: PSM → NSS
QA Contact: junruh → bishakhabanerjee
Reporter | ||
Comment 2•23 years ago
|
||
Please review whether this patch is safe.
If not, we need some global code that finds all certs with ownSlot set to TRUE
and derefences the slots.
Reporter | ||
Comment 3•23 years ago
|
||
This experimental patch seems to fix the list of bugs that I'm adding as
dependent ones.
Comment 4•23 years ago
|
||
Kai,
Certificates are obtained from slots (tokens, properly). They must have a
reference to the slot they came from, else the slot could be destroyed, leaving
a dangling cert with an invalid pointer.
If you look at the patch you made, you'll notice that you removed code to both
reference and free the slot. As far as I can tell, there is no bug in the way
certs are associated with slots, in that NSS correctly obtains a reference when
the cert is created, and deletes it when the cert is destroyed.
What is more likely is that there is a leak of a certificate reference, causing
a leak of a slot reference. Your patch will simply mask the first leak.
If the cert in question is properly destroyed (via CERT_DestroyCertificate), but
the slot reference is not deleted, that would be a bug; but I don't see it.
The "ownSlot" field is a member of CERTCertificate. It has no relevance to
shutdown code, because the shutdown code does not free certificates. It is,
however, utilized by the cert code (maybe not so much after 3.4, but I don't
think it is safe to ignore it).
I would suggest debugging the calls to CERT_DupCertificate and
CERT_DestroyCertificate for the cert that causes the leak. You should find that
the cert itself is leaked somewhere; once that leak is fixed the slot will be
properly freed.
Comment 5•23 years ago
|
||
Assigned the bug to Julien.
The fix for this bug may need to be merged on the
NSS_3_5_BRANCH and MOZILLA_1_0_BRANCH for Mozilla
1.0.x embedding users.
Assignee: wtc → jpierre
Priority: -- → P1
Whiteboard: [3.5.1]
Target Milestone: --- → 3.7
Reporter | ||
Comment 6•23 years ago
|
||
A first partial answer:
> The "ownSlot" field is a member of CERTCertificate. It has no relevance to
> shutdown code, because the shutdown code does not free certificates. It is,
> however, utilized by the cert code (maybe not so much after 3.4, but I don't
> think it is safe to ignore it).
Besides the places I have changed, there is not a single code fragment elsewhere
that accesses member ownSlot.
Comment 7•23 years ago
|
||
Kai, what is the version (cvs tag) of NSS in which
you found this bug?
Reporter | ||
Comment 8•23 years ago
|
||
> what is the version (cvs tag) of NSS in which you found this bug?
The current NSS_CLIENT_TAG.
Comment 9•23 years ago
|
||
So the NSS version is 3.6 Beta3.
When we have tracked down the leak, we should find
out whether the NSS 3.5 branch has the same leak.
Version: unspecified → 3.6
Reporter | ||
Comment 10•23 years ago
|
||
I had a lengthy chat with Ian. I already understood the requirement of
references and dangling pointers, I produced that patch for demonstration
purposes and because I did not yet fully understand the internal details which
certificate data structure references whom.
I now understand better, and we created two helper debugging functions, that
enable me to dump reference counts from within PSM.
With that new functionality, I have begun to trace the counts during code execution.
As a result, I now take this bug report back, it is invalid.
I can explain why I thought it would be this function that causes the problem.
If the peer cert is not added to the permanent store, it does not carry a
reference to a slot. It just has a null pointer. Because of that, it has no
effect whether that certificate gets ever freed or not. Only after the cert is
added to the perm storage, it carries the slot reference, and it matters whether
that cert is getting freed or not. And it is not.
After doing some more debugging, I found two more problems. I'm only writing
them in here for your info, but I will repeat the statement once I file a new
patch to solve the other problems.
First, I found a true leak in PSM. It looks it was my own leak... :(
The leak is introduced during the SSL handshake callback.
However, that is not all. So far I have found one more problem.
Tracing the reference counting of certificates has shown that we run into a bad
situation. After we have called NSS_Shutdown, we still arrive in certificate
destroy code...
The reason is that our current attempt in Mozilla, to free all NSS resources
before shutting down NSS, has not yet succeed. I learned, that although we shut
down the network, there is still some global object, getting destroyed after
PSM, that has a reference to a HTTP channel object, which references a SSL
status object, which references a certificate.
I finally agree with suggestions from Darin, that PSM must take action to use
weak references instead of strong references.
I will introduce a global list to all currently know SSL status information
objects. As soon as the requirement for a NSS restart comes up, I will iterate
over the global list and change all references to NSS objects into null, and
make future calls to the SSL information objects fail gracefully. I obviously
can not rely on other parts of Mozilla to behave good enough for us.
Besides the SSL status information objects, I'm thinking about introducing those
lists and the graceful fail mode for other PSM data structures, too.
You need to log in
before you can comment on or make changes to this bug.
Description
•