Closed
Bug 324103
Opened 19 years ago
Closed 19 years ago
DB reference leaks in softoken prevent cert DB from being flushed at close
Categories
(NSS :: Libraries, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
3.11.1
People
(Reporter: nelson, Assigned: nelson)
Details
Attachments
(2 files, 1 obsolete file)
715 bytes,
patch
|
rrelyea
:
review+
julien.pierre
:
superreview+
|
Details | Diff | Splinter Review |
1.20 KB,
patch
|
rrelyea
:
review+
julien.pierre
:
review+
|
Details | Diff | Splinter Review |
While investigating bug 324033, I came to an unexpected discovery, one that
probably explains some of the cert DB corruption that's been reported lately.
I found that when NSS_Shutdown calls C_Finalize, C_Finalize properly closes
the key3.db, but NOT the cert8.db. That means that the cert8.db's header
isn't always flushed to disk when the file is closed, in fact, it's rather
seldom! I spent most of the day investigating why this is, and found that
it's due to a reference leak inside softoken.
Softoken associates a reference count with each DB "handle". Each softoken objects effectively hold a reference to the DB. The DB's handle counts those
references. The ref count is initialized to one when the DB is opened.
C_Finalize doesn't close the DB unless the reference count is exactly 1.
So, the next question was: what is the source of the reference leak?
I found one source of such a leak. There may be others.
I discovered this by debugging the following simple command:
certutil -L -X -d .
The -X option forces certutil to open the cert and key DBs read-write.
After it finished listing the one cert in that DB, I found that the
key3.db file's timestamp had been updated (as expected), but cert8.db's
had not. Debugging it showed that the hash_close function was not being
called for cert8.db at all.
I found that there were two separate reference leaks for a file that had
a single cert in it. They were through two paths to the same function.
The leaked reference is obtained in function DecodeTrustEntry in pcertdb.c
at about line 4382. The function nsslowcert_DestroyTrust does not free
this reference. A single call to nssTrust_GetCERTCertTrustForCert causes
two such leaks, one through NSC_FindObjectsInit and the other through
NSC_GetAttributeValue. Here are the stacks for the acquisitions of those
two leaked references. The stacks are very similar. Non-identical lines
are marked with arrows.
DecodeTrustEntry() line 4383
FindTrustByKey() line 4756 + 17 bytes
nsslowcert_FindTrustByKey() line 4787 + 15 bytes
nsslowcert_FindTrustByIssuerAndSN() line 4943 + 16 bytes <---
sftk_searchCertsAndTrust() line 4711 + 13 bytes <---
sftk_searchTokenList() line 5004 + 45 bytes <---
NSC_FindObjectsInit() line 5062 + 29 bytes <---
find_objects() line 409 + 21 bytes <---
find_objects_by_template() line 539 + 29 bytes <--
nssToken_FindTrustForCertificate() line 1288 + 25 bytes <--
nssTrustDomain_FindTrustForCertificate() line 1217 + 34 bytes <--
nssTrust_GetCERTCertTrustForCert() line 594 + 13 bytes
fill_CERTCertificateFields() line 760 + 13 bytes
stan_GetCERTCertificate() line 813 + 17 bytes
STAN_GetCERTCertificate() line 837 + 11 bytes
PK11_TraverseCertsInSlot() line 1835 + 11 bytes
PK11_ListCertsInSlot() line 2385 + 18 bytes
listCerts() line 681 + 9 bytes
ListCerts() line 720 + 33 bytes
certutil_main() line 2829 + 94 bytes
main() line 3161 + 15 bytes
mainCRTStartup() line 338 + 17 bytes
DecodeTrustEntry() line 4382
FindTrustByKey() line 4756 + 17 bytes
nsslowcert_FindTrustByKey() line 4787 + 15 bytes
sftk_getTrust() line 370 + 16 bytes <---
sftk_FindTrustAttribute() line 1083 + 9 bytes <---
sftk_FindTokenAttribute() line 1315 + 13 bytes <---
sftk_FindAttribute() line 1343 + 22 bytes <---
NSC_GetAttributeValue() line 4183 + 22 bytes <----
nssCKObject_GetAttributes() line 154 + 22 bytes <---
nssCryptokiTrust_GetAttributes() line 586 + 29 bytes <---
nssTrust_Create() line 1004 + 35 bytes <--
nssTrustDomain_FindTrustForCertificate() line 1239 + 16 bytes <---
nssTrust_GetCERTCertTrustForCert() line 594 + 13 bytes
fill_CERTCertificateFields() line 760 + 13 bytes
stan_GetCERTCertificate() line 813 + 17 bytes
STAN_GetCERTCertificate() line 837 + 11 bytes
PK11_TraverseCertsInSlot() line 1835 + 11 bytes
PK11_ListCertsInSlot() line 2385 + 18 bytes
listCerts() line 681 + 9 bytes
ListCerts() line 720 + 33 bytes
certutil_main() line 2829 + 94 bytes
main() line 3161 + 15 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 7c816d4f()
Bob, I'm asking you to look at this. You're our Softken expert.
Assignee | ||
Comment 1•19 years ago
|
||
The failure to close the DB occurs here:
sftk_freeCertDB() line 292
sftk_DBShutdown() line 2759 + 9 bytes
SFTK_ShutdownSlot() line 2791 + 9 bytes
SFTK_DestroySlotData() line 2803 + 9 bytes
nscFreeAllSlots() line 2918 + 9 bytes
nsc_CommonFinalize() line 3065 + 15 bytes
NSC_Finalize() line 3111 + 11 bytes
SECMOD_UnloadModule() line 406 + 20 bytes
SECMOD_SlotDestroyModule() line 808 + 9 bytes
PK11_DestroySlot() line 446 + 14 bytes
PK11_FreeSlot() line 459 + 9 bytes
SECMOD_DestroyModule() line 779 + 18 bytes
SECMOD_DestroyModuleListElement() line 825 + 12 bytes
SECMOD_DestroyModuleList() line 841 + 17 bytes
SECMOD_Shutdown() line 99 + 11 bytes
NSS_Shutdown() line 601 + 5 bytes
certutil_main() line 3147 + 11 bytes
main() line 3161 + 15 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 7c816d4f()
Assignee | ||
Comment 2•19 years ago
|
||
This patch would cause all.sh to become much better at detecting
DB reference leaks.
Attachment #209073 -
Flags: review?(rrelyea)
Assignee | ||
Comment 3•19 years ago
|
||
This patch changes nsslowcert_DestroyTrust to free its DB handle reference.
With this patch in place, the simple test command given above now
succesfully closes cert8.db, but I'm not sure this is right in all cases.
Bob, please review and advise. Thanks.
Attachment #209074 -
Flags: review?(rrelyea)
Comment 4•19 years ago
|
||
Comment on attachment 209073 [details] [diff] [review]
Proposed patch to detect DB reference leaks
>- if (certHandle) {
>+ if (certHandle) { PORT_Assert(certHandle->ref == 1);
> sftk_freeCertDB(certHandle);
> }
This patch uses an unusual coding style for "if"
statements.
Comment 5•19 years ago
|
||
Comment on attachment 209073 [details] [diff] [review]
Proposed patch to detect DB reference leaks
We should only assert of the slotID we are shutting down is the NETSCAPE_SLOT_ID, PRIVATE_KEY_SLOLT_ID, or FIPS_SLOT_ID.
It is possible for applications to have other slots open which are closed on the fly.
The above test will still help detect leaks in the 3 common slots (99.9% of the cases).
bob
Attachment #209073 -
Flags: review?(rrelyea) → review-
Comment 6•19 years ago
|
||
Comment on attachment 209074 [details] [diff] [review]
Proposed fix
r+ definitely check this one in.
Attachment #209074 -
Flags: review?(rrelyea) → review+
Assignee | ||
Comment 8•19 years ago
|
||
(In reply to comment #4)
> This patch uses an unusual coding style for "if" statements.
Yes, with the particular debugger I'm using, when editing and
recompiling on the fly, it helps to not insert or delete lines,
because otherwise all the breakpoints cease to be on the
intended lines.
Assignee | ||
Comment 9•19 years ago
|
||
(In reply to comment #5)
> We should only assert of the slotID we are shutting down is the
> NETSCAPE_SLOT_ID, PRIVATE_KEY_SLOLT_ID, or FIPS_SLOT_ID.
> It is possible for applications to have other slots open
Other slots in softoken?
I guess I didn't realize that that is already implemented.
> which are closed on the fly.
I don't understand that point about on-the-fly.
Presumably it is wrong to close the DBs for any slot at any
time when there are outstanding (leaked) references for that
slot's DB(s). Right?
Is it your concern that if they're closed before NSS_Shutdown
that they'll get closed again in NSS_Shutdown?
It should be true that, after they're closed, slot->certDB and
slot->keyDB will be NULL, (RIGHT ? I hope :-) in which case
there should be no danger of these assertions triggering on an
already closed slot.
> The above test will still help detect leaks in the 3 common slots
> (99.9% of the cases).
I'm willing to add slotID tests into the assertions, once I understand
why they shouldn't apply to those other slots. (Taking bug).
Assignee: rrelyea → nelson
Assignee | ||
Comment 10•19 years ago
|
||
Comment on attachment 209074 [details] [diff] [review]
Proposed fix
Julien, please SR for checkin on the 3.11 branch.
Attachment #209074 -
Flags: superreview?(julien.pierre.bugs)
Assignee | ||
Comment 11•19 years ago
|
||
Comment on attachment 209074 [details] [diff] [review]
Proposed fix
Checked in on trunk.
Checking in pcertdb.c; new revision: 1.56; previous revision: 1.55
Awaiting SR for the 3.11 branch.
Attachment #209074 -
Attachment description: Proposed fix (?) → Proposed fix
Assignee | ||
Comment 12•19 years ago
|
||
Bob, is this what you had in mind?
Attachment #209073 -
Attachment is obsolete: true
Attachment #209175 -
Flags: review?(rrelyea)
Comment 13•19 years ago
|
||
Comment on attachment 209074 [details] [diff] [review]
Proposed fix
Looks fine. zero'ing trust->handle after the call to sftk_freeCertDB would be a nice extra.
Attachment #209074 -
Flags: superreview?(julien.pierre.bugs) → superreview+
Comment 14•19 years ago
|
||
Comment on attachment 209175 [details] [diff] [review]
Detect DB reference leaks, v2
yes, r+ relyea
Alternately you can check for >= SFTK_MIN_USER_SLOT
Attachment #209175 -
Flags: review?(rrelyea) → review+
Assignee | ||
Comment 15•19 years ago
|
||
Thanks for the reviews.
Julien, re: comment 13, Notice that the entire trust structure is memset to zero
right after this.
Bob, Please read comment 9 and then explain why this test is inappropriate for DBs in "User slots".
Status: NEW → ASSIGNED
Assignee | ||
Comment 16•19 years ago
|
||
Proposed fix committed on 3.11 branch
Checking in pcertdb.c; new revision: 1.53.2.2; previous revision: 1.53.2.1
Assignee | ||
Comment 17•19 years ago
|
||
Comment on attachment 209175 [details] [diff] [review]
Detect DB reference leaks, v2
Checked in patch to detect leaks at shutdown. On trunk.
Checking in pkcs11.c; new revision: 1.113; previous revision: 1.112
Assignee | ||
Updated•19 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
OS: Windows XP → All
Priority: -- → P2
Resolution: --- → FIXED
Assignee | ||
Comment 18•19 years ago
|
||
Comment on attachment 209175 [details] [diff] [review]
Detect DB reference leaks, v2
This bug got fixed on the trunk, but not on the branch.
julien, please review for the 3.11 branch.
Attachment #209175 -
Flags: review?(julien.pierre.bugs)
Assignee | ||
Comment 19•19 years ago
|
||
reopening until last patch get checked in on branch.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•19 years ago
|
Attachment #209175 -
Flags: review?(julien.pierre.bugs) → review+
Assignee | ||
Comment 20•19 years ago
|
||
Last patch checked in on 3.11 branch.
Checking in pkcs11.c; new revision: 1.112.2.2; previous revision: 1.112.2.1
Status: REOPENED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•