Closed Bug 324103 Opened 19 years ago Closed 19 years ago

DB reference leaks in softoken prevent cert DB from being flushed at close

Categories

(NSS :: Libraries, defect, P2)

3.11
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.11.1

People

(Reporter: nelson, Assigned: nelson)

Details

Attachments

(2 files, 1 obsolete file)

While investigating bug 324033, I came to an unexpected discovery, one that probably explains some of the cert DB corruption that's been reported lately. I found that when NSS_Shutdown calls C_Finalize, C_Finalize properly closes the key3.db, but NOT the cert8.db. That means that the cert8.db's header isn't always flushed to disk when the file is closed, in fact, it's rather seldom! I spent most of the day investigating why this is, and found that it's due to a reference leak inside softoken. Softoken associates a reference count with each DB "handle". Each softoken objects effectively hold a reference to the DB. The DB's handle counts those references. The ref count is initialized to one when the DB is opened. C_Finalize doesn't close the DB unless the reference count is exactly 1. So, the next question was: what is the source of the reference leak? I found one source of such a leak. There may be others. I discovered this by debugging the following simple command: certutil -L -X -d . The -X option forces certutil to open the cert and key DBs read-write. After it finished listing the one cert in that DB, I found that the key3.db file's timestamp had been updated (as expected), but cert8.db's had not. Debugging it showed that the hash_close function was not being called for cert8.db at all. I found that there were two separate reference leaks for a file that had a single cert in it. They were through two paths to the same function. The leaked reference is obtained in function DecodeTrustEntry in pcertdb.c at about line 4382. The function nsslowcert_DestroyTrust does not free this reference. A single call to nssTrust_GetCERTCertTrustForCert causes two such leaks, one through NSC_FindObjectsInit and the other through NSC_GetAttributeValue. Here are the stacks for the acquisitions of those two leaked references. The stacks are very similar. Non-identical lines are marked with arrows. DecodeTrustEntry() line 4383 FindTrustByKey() line 4756 + 17 bytes nsslowcert_FindTrustByKey() line 4787 + 15 bytes nsslowcert_FindTrustByIssuerAndSN() line 4943 + 16 bytes <--- sftk_searchCertsAndTrust() line 4711 + 13 bytes <--- sftk_searchTokenList() line 5004 + 45 bytes <--- NSC_FindObjectsInit() line 5062 + 29 bytes <--- find_objects() line 409 + 21 bytes <--- find_objects_by_template() line 539 + 29 bytes <-- nssToken_FindTrustForCertificate() line 1288 + 25 bytes <-- nssTrustDomain_FindTrustForCertificate() line 1217 + 34 bytes <-- nssTrust_GetCERTCertTrustForCert() line 594 + 13 bytes fill_CERTCertificateFields() line 760 + 13 bytes stan_GetCERTCertificate() line 813 + 17 bytes STAN_GetCERTCertificate() line 837 + 11 bytes PK11_TraverseCertsInSlot() line 1835 + 11 bytes PK11_ListCertsInSlot() line 2385 + 18 bytes listCerts() line 681 + 9 bytes ListCerts() line 720 + 33 bytes certutil_main() line 2829 + 94 bytes main() line 3161 + 15 bytes mainCRTStartup() line 338 + 17 bytes DecodeTrustEntry() line 4382 FindTrustByKey() line 4756 + 17 bytes nsslowcert_FindTrustByKey() line 4787 + 15 bytes sftk_getTrust() line 370 + 16 bytes <--- sftk_FindTrustAttribute() line 1083 + 9 bytes <--- sftk_FindTokenAttribute() line 1315 + 13 bytes <--- sftk_FindAttribute() line 1343 + 22 bytes <--- NSC_GetAttributeValue() line 4183 + 22 bytes <---- nssCKObject_GetAttributes() line 154 + 22 bytes <--- nssCryptokiTrust_GetAttributes() line 586 + 29 bytes <--- nssTrust_Create() line 1004 + 35 bytes <-- nssTrustDomain_FindTrustForCertificate() line 1239 + 16 bytes <--- nssTrust_GetCERTCertTrustForCert() line 594 + 13 bytes fill_CERTCertificateFields() line 760 + 13 bytes stan_GetCERTCertificate() line 813 + 17 bytes STAN_GetCERTCertificate() line 837 + 11 bytes PK11_TraverseCertsInSlot() line 1835 + 11 bytes PK11_ListCertsInSlot() line 2385 + 18 bytes listCerts() line 681 + 9 bytes ListCerts() line 720 + 33 bytes certutil_main() line 2829 + 94 bytes main() line 3161 + 15 bytes mainCRTStartup() line 338 + 17 bytes KERNEL32! 7c816d4f() Bob, I'm asking you to look at this. You're our Softken expert.
The failure to close the DB occurs here: sftk_freeCertDB() line 292 sftk_DBShutdown() line 2759 + 9 bytes SFTK_ShutdownSlot() line 2791 + 9 bytes SFTK_DestroySlotData() line 2803 + 9 bytes nscFreeAllSlots() line 2918 + 9 bytes nsc_CommonFinalize() line 3065 + 15 bytes NSC_Finalize() line 3111 + 11 bytes SECMOD_UnloadModule() line 406 + 20 bytes SECMOD_SlotDestroyModule() line 808 + 9 bytes PK11_DestroySlot() line 446 + 14 bytes PK11_FreeSlot() line 459 + 9 bytes SECMOD_DestroyModule() line 779 + 18 bytes SECMOD_DestroyModuleListElement() line 825 + 12 bytes SECMOD_DestroyModuleList() line 841 + 17 bytes SECMOD_Shutdown() line 99 + 11 bytes NSS_Shutdown() line 601 + 5 bytes certutil_main() line 3147 + 11 bytes main() line 3161 + 15 bytes mainCRTStartup() line 338 + 17 bytes KERNEL32! 7c816d4f()
This patch would cause all.sh to become much better at detecting DB reference leaks.
Attachment #209073 - Flags: review?(rrelyea)
Attached patch Proposed fix Splinter Review
This patch changes nsslowcert_DestroyTrust to free its DB handle reference. With this patch in place, the simple test command given above now succesfully closes cert8.db, but I'm not sure this is right in all cases. Bob, please review and advise. Thanks.
Attachment #209074 - Flags: review?(rrelyea)
Comment on attachment 209073 [details] [diff] [review] Proposed patch to detect DB reference leaks >- if (certHandle) { >+ if (certHandle) { PORT_Assert(certHandle->ref == 1); > sftk_freeCertDB(certHandle); > } This patch uses an unusual coding style for "if" statements.
Comment on attachment 209073 [details] [diff] [review] Proposed patch to detect DB reference leaks We should only assert of the slotID we are shutting down is the NETSCAPE_SLOT_ID, PRIVATE_KEY_SLOLT_ID, or FIPS_SLOT_ID. It is possible for applications to have other slots open which are closed on the fly. The above test will still help detect leaks in the 3 common slots (99.9% of the cases). bob
Attachment #209073 - Flags: review?(rrelyea) → review-
Comment on attachment 209074 [details] [diff] [review] Proposed fix r+ definitely check this one in.
Attachment #209074 - Flags: review?(rrelyea) → review+
Fix should go into 3.11 as well.
Target Milestone: --- → 3.11.1
(In reply to comment #4) > This patch uses an unusual coding style for "if" statements. Yes, with the particular debugger I'm using, when editing and recompiling on the fly, it helps to not insert or delete lines, because otherwise all the breakpoints cease to be on the intended lines.
(In reply to comment #5) > We should only assert of the slotID we are shutting down is the > NETSCAPE_SLOT_ID, PRIVATE_KEY_SLOLT_ID, or FIPS_SLOT_ID. > It is possible for applications to have other slots open Other slots in softoken? I guess I didn't realize that that is already implemented. > which are closed on the fly. I don't understand that point about on-the-fly. Presumably it is wrong to close the DBs for any slot at any time when there are outstanding (leaked) references for that slot's DB(s). Right? Is it your concern that if they're closed before NSS_Shutdown that they'll get closed again in NSS_Shutdown? It should be true that, after they're closed, slot->certDB and slot->keyDB will be NULL, (RIGHT ? I hope :-) in which case there should be no danger of these assertions triggering on an already closed slot. > The above test will still help detect leaks in the 3 common slots > (99.9% of the cases). I'm willing to add slotID tests into the assertions, once I understand why they shouldn't apply to those other slots. (Taking bug).
Assignee: rrelyea → nelson
Comment on attachment 209074 [details] [diff] [review] Proposed fix Julien, please SR for checkin on the 3.11 branch.
Attachment #209074 - Flags: superreview?(julien.pierre.bugs)
Comment on attachment 209074 [details] [diff] [review] Proposed fix Checked in on trunk. Checking in pcertdb.c; new revision: 1.56; previous revision: 1.55 Awaiting SR for the 3.11 branch.
Attachment #209074 - Attachment description: Proposed fix (?) → Proposed fix
Bob, is this what you had in mind?
Attachment #209073 - Attachment is obsolete: true
Attachment #209175 - Flags: review?(rrelyea)
Comment on attachment 209074 [details] [diff] [review] Proposed fix Looks fine. zero'ing trust->handle after the call to sftk_freeCertDB would be a nice extra.
Attachment #209074 - Flags: superreview?(julien.pierre.bugs) → superreview+
Comment on attachment 209175 [details] [diff] [review] Detect DB reference leaks, v2 yes, r+ relyea Alternately you can check for >= SFTK_MIN_USER_SLOT
Attachment #209175 - Flags: review?(rrelyea) → review+
Thanks for the reviews. Julien, re: comment 13, Notice that the entire trust structure is memset to zero right after this. Bob, Please read comment 9 and then explain why this test is inappropriate for DBs in "User slots".
Status: NEW → ASSIGNED
Proposed fix committed on 3.11 branch Checking in pcertdb.c; new revision: 1.53.2.2; previous revision: 1.53.2.1
Comment on attachment 209175 [details] [diff] [review] Detect DB reference leaks, v2 Checked in patch to detect leaks at shutdown. On trunk. Checking in pkcs11.c; new revision: 1.113; previous revision: 1.112
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
OS: Windows XP → All
Priority: -- → P2
Resolution: --- → FIXED
Comment on attachment 209175 [details] [diff] [review] Detect DB reference leaks, v2 This bug got fixed on the trunk, but not on the branch. julien, please review for the 3.11 branch.
Attachment #209175 - Flags: review?(julien.pierre.bugs)
reopening until last patch get checked in on branch.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #209175 - Flags: review?(julien.pierre.bugs) → review+
Last patch checked in on 3.11 branch. Checking in pkcs11.c; new revision: 1.112.2.2; previous revision: 1.112.2.1
Status: REOPENED → RESOLVED
Closed: 19 years ago19 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: