Closed
Bug 167615
Opened 22 years ago
Closed 22 years ago
crash in pk11_FindCrlAttribute
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
3.6
People
(Reporter: julien.pierre, Assigned: julien.pierre)
References
Details
Attachments
(7 files, 7 obsolete files)
1.00 KB,
patch
|
Details | Diff | Splinter Review | |
1.17 KB,
patch
|
Details | Diff | Splinter Review | |
81 bytes,
text/plain
|
Details | |
83.67 KB,
application/octet-stream
|
Details | |
cert database containing a copy of the CRL, required by batch. Place in mozilla/dist/$OBJDIR/bin/sav
108.00 KB,
application/octet-stream
|
Details | |
794 bytes,
patch
|
Details | Diff | Splinter Review | |
3.15 KB,
patch
|
wtc
:
review+
|
Details | Diff | Splinter Review |
1. download the AOL CRL from http://certificates.netscape.com/getCRL?issuepoint=MasterCRL&op=importCRL and install it into Mozilla / Netscape 7 2. Send encrypted email to an employee 3. witness the following crash : pk11_FindCrlAttribute(PK11TokenObjectStr * 0x027cf780, unsigned long 0x00000011) line 1004 + 5 bytes pk11_FindTokenAttribute(PK11TokenObjectStr * 0x027cf780, unsigned long 0x00000011) line 1111 + 13 bytes pk11_FindAttribute(PK11ObjectStr * 0x027cf780, unsigned long 0x00000011) line 1141 + 22 bytes pk11_objectMatch(PK11ObjectStr * 0x027cf780, CK_ATTRIBUTE * 0x0012dbd4, int 0x00000001) line 2340 + 22 bytes pk11_tokenMatch(PK11SlotStr * 0x027a0198, SECItemStr * 0x0012da40, unsigned long 0x50000000, CK_ATTRIBUTE * 0x0012dbd4, int 0x00000001) line 2752 + 17 bytes pk11_crl_collect(SECItemStr * 0x0012da58, SECItemStr * 0x0012da40, int 0x00000004, void * 0x0012da7c) line 3585 + 31 bytes nsslowcert_TraverseDBEntries(NSSLOWCERTCertDBHandleStr * 0x027f9798, int 0x00000004, int (SECItemStr *, SECItemStr *, int, void *)* 0x0392a769 pk11_crl_collect(SECItemStr *, SECItemStr *, int, void *), void * 0x0012da7c) line 4006 + 19 bytes pk11_searchCrls(PK11SlotStr * 0x027a0198, SECItemStr * 0x0012dadc, int 0x00000000, unsigned long 0x0000000d, PK11SearchResultsStr * 0x02817d00, CK_ATTRIBUTE * 0x0012dbd4, unsigned long 0x00000001) line 3621 + 20 bytes pk11_searchTokenList(PK11SlotStr * 0x027a0198, PK11SearchResultsStr * 0x02817d00, CK_ATTRIBUTE * 0x0012dbd4, long 0x00000001, int * 0x0012db78, int 0x00000000) line 4261 + 33 bytes NSC_FindObjectsInit(unsigned long 0x01000001, CK_ATTRIBUTE * 0x0012dbd4, unsigned long 0x00000001) line 4306 + 29 bytes pk11_FindObjectByTemplate(PK11SlotInfoStr * 0x02b24fd8, CK_ATTRIBUTE * 0x0012dbd4, int 0x00000001) line 176 + 23 bytes pk11_getcerthandle(PK11SlotInfoStr * 0x02b24fd8, CERTCertificateStr * 0x02b60988, CK_ATTRIBUTE * 0x0012dbd4, int 0x00000001) line 1829 + 17 bytes PK11_FindObjectForCert(CERTCertificateStr * 0x02b60988, void * 0x02804d40, PK11SlotInfoStr * * 0x0012dc08) line 2447 + 25 bytes PK11_FindKeyByAnyCert(CERTCertificateStr * 0x02b60988, void * 0x02804d40) line 2476 + 17 bytes NSS_CMSSignerInfo_Sign(NSSCMSSignerInfoStr * 0x02aa8ea8, SECItemStr * 0x02acfba8, SECItemStr * 0x038dc7c4) line 140 + 19 bytes NSS_CMSSignedData_Encode_AfterData(NSSCMSSignedDataStr * 0x02aa8df0) line 271 + 26 bytes nss_cms_after_data(NSSCMSEncoderContextStr * 0x02aa11a0) line 377 + 12 bytes nss_cms_encoder_notify(void * 0x02aa11a0, int 0x00000000, void * 0x02aa8e2c, int 0x00000004) line 208 + 9 bytes sec_asn1e_notify_after(sec_EncoderContext_struct * 0x02b5e8c0, void * 0x02aa8e2c, int 0x00000004) line 180 + 23 bytes sec_asn1e_next_in_sequence(sec_asn1e_state_struct * 0x02b5eaa8) line 1131 + 25 bytes SEC_ASN1EncoderUpdate(sec_EncoderContext_struct * 0x02b5e8c0, const char * 0x00000000, unsigned long 0x00000000) line 1213 + 9 bytes NSS_CMSEncoder_Finish(NSSCMSEncoderContextStr * 0x02aa11a0) line 731 + 15 bytes PIPNSS! 60a6a6d2() MSGSMIME! 609620bb() MSGSMIME! 60961b27() MSGCOMPO! 60843830() MSGCOMPO! 60846afe() MSGCOMPO! 60847308() MSGCOMPO! 608485fd() MSGCOMPO! 60837e04() MSGCOMPO! 6083825a() XPCOM! 611774d0() XPC3250! 60d5239d() XPC3250! 60d5591c() JS3250! 60e4a7f8() JS3250! 60e4f898() JS3250! 60e4a835() XPC3250! 60d4fdc6() XPC3250! 60d4e745() XPCOM! 611769a7() XPCOM! 611774d0() XPC3250! 60d5239d() XPC3250! 60d5591c() JS3250! 60e4a7f8() JS3250! 60e4f898() JS3250! 60e4a835() JS3250! 60e4aacf() JS3250! 60e34957() JSDOM! 606f3ce6() JSDOM! 6070e502() GKCONTENT! 60308df4() GKCONTENT! 6030a0b5() GKCONTENT! 602ced32() GKLAYOUT! 603d6ea0() GKLAYOUT! 60434966() GKLAYOUT! 60434799() GKLAYOUT! 603d6e16() GKLAYOUT! 603d6ceb() GKCONTENT! 6030e492() GKCONTENT! 6030d248() GKLAYOUT! 603d6e3e() GKLAYOUT! 603d6c5d() GKVIEW! 60547853() GKVIEW! 6054130d() GKVIEW! 60541b5d() GKWIDGET! 60554792() GKWIDGET! 60557d86() GKWIDGET! 60558176() GKWIDGET! 60554cae() USER32! 77e11b60() USER32! 77e11cca() USER32! 77e183f1() APPSHELL! 600d7232() NETSCP! 00401bdf() NETSCP! 004038b6() KERNEL32!
Comment 1•22 years ago
|
||
I can't reproduce this crash. Julien, could you look into this?
Assignee: wtc → jpierre
Priority: -- → P1
Target Milestone: --- → 3.6
Assignee | ||
Comment 2•22 years ago
|
||
Try also grabbing the two CRLs from https://www.thawte.com/cgi/lifecycle/roots.exe . I also had them in my database. I assumed that the AOL CRL was the problem because the crash occurred when verifying signatures of AOL employees. But in fact, the crash also happens when going to "Manage CRLs". The problem is in fact triggered by the larger Thawte CRLs which use the blob support and get created in the cert7.dir subdirectory. If I manually delete the content of that directory, the CRLs go away as well as the crash.
Assignee | ||
Comment 3•22 years ago
|
||
Some more information I found out about the test case : The problem happens only in the situation where you already had those large CRLs in your database before NSS 3.6 - ie. within cert7.db rather than as separate files in cert7.dir . Here is the full test case : 1. start with Netscape 7 and NSS 3.5 DLLs 2. create a new profile 3. grab the two CRLs from https://www.thawte.com/cgi/lifecycle/roots.exe 4. check that edit/preferences/privacy and security/validation/manage CRLs works 5. exit 6. copy the NSS 3.6 DLLs over 7. restart Netscape 7 8. go to edit/preferences/privacy and security/validation/manage and delete both CRLs 9. grab the two CRLs from https://www.thawte.com/cgi/lifecycle/roots.exe 10. go to edit/preferences/privacy and security/validation/manage CRLs : it crashes
Assignee | ||
Comment 4•22 years ago
|
||
I am having a difficult time with this one. I have been able to identify the specific CRL with which the problem occurs. It is only the second of the Thawte CRLs - the one called "Thawte Server CA, CRL". That CRL is only 85 KB. The other one for the Freemail CA, which is 340 KB, doesn't cause the problem. I have been trying to create a test case that didn't involve Mozilla, but so far I haven't succeeded. I believe the corruption of the database occurs either during the deletion of the old CRL, or during the import of the new one. What I see from the stack before the crash is that pk11_getCrl returned NULL. This is because of a bad database error being reported from the lower level.
Assignee | ||
Comment 5•22 years ago
|
||
I found that if I change my test case at step 9 and import the CRL from the local disk instead of the web site, the crash does not happen at step 10 (opening "Manage CRLs"). However, the program exits if I then try to delete that newly-imported CRL. There is no exception / crash at that point. This tends to suggest that something is getting corrupt in memory within the Netscape process. This is further confirmed by the fact that using the same database, I am unable to reproduce the crash with our command-line tools. I think I need to purify this code to find the problem. I don't have the proper licenses currently (temporary one that expired) so I will have to take care of this tomorrow.
Assignee | ||
Comment 6•22 years ago
|
||
I have confirmed that this problem is related to the blob code. If I comment the following lines in dbs_put : if (data->size > DBS_MAX_ENTRY_SIZE) { dbs_mkBlob(key,data,&blob); ret = dbs_writeBlob(dbsp->blobdir, dbsp->mode, &blob, data); data = &blob; } Then the blob support is effectively disabled (no blobs ever get created) and I don't see the crash. However, I haven't been able to find out what's wrong in the shim that ends up causing the crash.
Comment 7•22 years ago
|
||
Assignee | ||
Comment 8•22 years ago
|
||
FYI, I have reproduced the crash on two platforms - the same problem occurs on OS/2. I'm trying to narrow it down further in the debugger. One of the symptoms I see is that nsslowcert_FindCrlByKey is called with a crlKey that has a length of 0xC8 when using the regular DBM code. This happens when I click on "manage CRLs" which enumerates all CRLs. With my test case, after the CRL has been deleted from the DB and added as a blob, when I click "manage CRLs" to display, the function gets called with a crlKey that has a length of 0xC9 . In other words, the database key is different. It is getting corrupt somehow, but I still don't know how.
Assignee | ||
Comment 9•22 years ago
|
||
I have finally been able to create a test case for this problem that does not involve the browser. What I did was modify crlutil in the following ways : 1) I added a way to delete CRLs when the issuer cert wasn't in cert7.db . The method for doing that is the same as PSM - calling SEC_LookupCrls, then SEC_FindCrlByName , and then SEC_DeletePermCrl . The call to SEC_FindCrlByName is necessary because the objects returned by SEC_LookupCrls don't have a slot pointer which causes the deletion to fail 2) I added a testing option that reproduces the sequence of steps that I had in the browser. The test case starts with a database created with a previous version of NSS that contains the 80KB thawte CRL. The test code I added in crlutil does the following : 1) delete the CRL 2) import the CRL (which writes a blob) 3) list the CRLs When I call this test code, I can reproduce the exact same crash on step 3 as in the browser However, I have also created a batch file to do the same thing, but calling crlutil 3 times - once for each of the 3 steps. In that case, the crash does not occur ! This suggests that something is being corrupt in memory. I still haven't got a chance to purify, but it should be possible now with the smaller test case outside of the browser.
Assignee | ||
Comment 10•22 years ago
|
||
FYI, Purify didn't find any error with my test case. I think this is a problem in the dbm code.
Assignee | ||
Comment 11•22 years ago
|
||
I just ran another test, with the blob code disabled. 1. I started with an empty cert database. 2. I imported the 85 KB thawte CRL with crlutil. 3. I deleted the CRL with crlutil 4. went back to step 2, in a loop At each iteration after the initial addition, the cert database grew by 4 KB. Then, ultimately, it reached a peak size of 417,792 bytes. At that point, even if I tried to import the CRL into it, it wouldn't show anymore with crlutil -L. Therefore, there is a pre-existing problem in DBM. I repeated the test with the Netscape CRL, which is below 64 KB. The same problem occurred, at a different peak size for cert7.db (229,376 bytes). When I use a 470 or a 2524 bytes CRL, I don't get the problem. The cert7.db stays at 16,384 bytes the entire time. I don't have any CRL in between 2524 and 29313 bytes (Netscape CRL) so I couldn't test any other case to find out what exact threshold is the problem. This problmem may or may not be directly related to what we are seeing right now with the blob code, but I think there is a good chance that it is.
Comment 12•22 years ago
|
||
That would be my suspicion as well. The problem seems to be inherent in storing large objects in the database, and these all look like symptoms of the problem we see with very large objects. It seems we have 3 options: 1) re-enable the blob code and ask people not to store large objects in older versions of netscape: This would be bad for people who already stored these objects, but at this point they are already in danger. 2) try to fix dbm. The multi-block write code is one of the most complicated parts of the DBM code, and seems to have some nasty address limits and funny branching code. In the end we still have pretty terrible performance with large objects. 3) Some combination of 1 & 2. This has the performance advantages of 1, but still has all the work that 2 entails. bob
Assignee | ||
Comment 13•22 years ago
|
||
Assignee | ||
Comment 14•22 years ago
|
||
Assignee | ||
Comment 15•22 years ago
|
||
Assignee | ||
Comment 16•22 years ago
|
||
Assignee | ||
Comment 17•22 years ago
|
||
I have added everything to this defect that is needed to reproduce this problem : - a patch to rollback to the blob support - a copy of a database containing the CRL (created with a previous version of NSS) - a batch file tha calls crlutil with the -T option that deletes and adds back the CRL, causing the crash I'm reassigning the bug to Bob as I don't feel like there I can resolve this with my current limited knowledge of the DBM code.
Assignee: jpierre → relyea
Assignee | ||
Comment 18•22 years ago
|
||
Assignee | ||
Comment 19•22 years ago
|
||
Assignee | ||
Comment 20•22 years ago
|
||
I have attached two patches that can avoid the crash even when the blob code is enabled and the database gets corrupt (of course, the CRL is not found with the blob code still, but at least we don't crash). - One crash was in softoken, trying to dereference the CRL object that wasn't found in the database ater hte corruption. - The other crash was in pk11wrap, trying to dereference the NULL CRL object returned from softoken. We may want to do a third patch to ensure that softoken never even returns such NULL CRL objects to the upper layer, but I'm not sure how to do that.
Comment 21•22 years ago
|
||
Assigned the bug to Julien. Bob, Nelson please review the two patches Julien just attached.
Assignee: relyea → jpierre
Comment 22•22 years ago
|
||
Review comments: 1. I'd find the patch to pk11_CollectCrls() clearer if it read as follows: if (fetchCrl[1].pValue && *((CK_BBOOL *)fetchCrl[1].pValue)) new_node->type = SEC_KRL_TYPE; else new_node->type = SEC_CRL_TYPE; 2. Function pk11_FindCrlAttribute() returns NULL in two places, but does not call PORT_SetError in either place. This patch adds a third place where it will return NULL. IMO, any time this function returns NULL, the error code should have been explicitly set, either by this function, or by a function it has just called. I'd like this fix to address that, also.
Assignee | ||
Comment 23•22 years ago
|
||
Nelson, 1. I agree this is more readable 2. In the case I added where NULL is returned, the NSPR error has already been set at a lower level, namely SEC_BAD_DATABASE_ERROR . So I don't think I should call PR_SetError. As far as the two other cases : a. the "return NULL" in the default case of the first switch statement should set an invalid attribute error. There is a PKCS#11 error defined for it - CKR_ATTRIBUTE_TYPE_INVALID, but there is no way to return it in that code path. I'm not aware of an NSPR error to match this. There aren't any NSPR errors for PKCS#11 defined in secerr.h . Perhaps I should just use a generic SEC_ERROR_INVALID_ARGS ? b. the "return NULL" at the end of the function can never be reached, that case currently gets caught by the default case in a).
Assignee | ||
Comment 24•22 years ago
|
||
Attachment #100635 -
Attachment is obsolete: true
Assignee | ||
Comment 25•22 years ago
|
||
Attachment #100636 -
Attachment is obsolete: true
Comment 26•22 years ago
|
||
Your updated patch id=100655 looks very very different from the previous version of that patch. Perhaps it contains some debugging code, or other changes that are irrelevant to this bug?
Assignee | ||
Comment 27•22 years ago
|
||
oops, sorry. I did a cvs diff in my softoken directory; the other files weren't intended to be part of this patch.
Assignee | ||
Comment 28•22 years ago
|
||
don't include irrelevant files as part of the patch
Attachment #100655 -
Attachment is obsolete: true
Comment 29•22 years ago
|
||
r=nelsonb for patches 100659 and 100670.
Comment 30•22 years ago
|
||
Comment on attachment 100659 [details] [diff] [review] 100636: patch to pk11wrap to gracefully handle NULL CRLs returned from token instead of crashing >Index: pk11cert.c [...] >- new_node->type = *((CK_BBOOL *)fetchCrl[1].pValue) ? >- SEC_KRL_TYPE : SEC_CRL_TYPE; >+ if (fetchCrl[1].pValue && *((CK_BBOOL *)fetchCrl[1].pValue)) >+ new_node->type = SEC_KRL_TYPE; >+ else >+ new_node->type = SEC_CRL_TYPE; >+ This patch assumes that if fetchCrl[1].pValue is NULL, it is a CRL. But it seems that a NULL fetchCrl[1].pValue is an error. Why don't we return a failure status if fetchCrl[1].pValue is NULL? That is, if (!fetchCrl[1].pValue) { PORT_SetError(...); goto loser; } new_node->type = *((CK_BBOOL *)fetchCrl[1].pValue) ? SEC_KRL_TYPE
Assignee | ||
Comment 31•22 years ago
|
||
Attachment #100659 -
Attachment is obsolete: true
Assignee | ||
Comment 32•22 years ago
|
||
PK11_GetAttributes returns a CK_RV, rather than a SECStatus . Yet we check for "rv == SECFailure", calling it in pk11_collectCrls : rv = PK11_GetAttributes(head->arena,slot,crlID,fetchCrl,fetchCrlSize); if (rv == SECFailure) { goto loser; } In this case, rv = 18 , or CKR_ATTRIBUTE_TYPE_INVALID, but we don't catch it. I just wish we were using a C++ compiler to compile the NSS code, even if it remains C code. It would have caught this error. As far as the other suggestion I had, which was to have softoken not return CRL objects that were corrupt, this isn't really possible, because we don't know about the corruption until we try to fetch the specific attribute (value). So this is probably as good as the patches are going to get.
Assignee | ||
Comment 33•22 years ago
|
||
Attachment #100807 -
Attachment is obsolete: true
Updated•22 years ago
|
Attachment #100809 -
Flags: needs-work+
Comment 34•22 years ago
|
||
Comment on attachment 100809 [details] [diff] [review] fix using correct types Good catch, Juline. Two suggested changes. 1. A stylistic nit: Relyea uses 'crv' for the return value of the CK_RV type. I suggest you change 'attr_rv' to 'crv'. 2. >- if (fetchCrl[1].pValue && *((CK_BBOOL *)fetchCrl[1].pValue)) >+ if (!fetchCrl[1].pValue) { >+ rv = SECFailure; >+ goto loser; >+ } You should call PORT_SetError() to set an error code.
Assignee | ||
Comment 35•22 years ago
|
||
Wan-Teh, I'm not sure I agree about setting the error code. If it comes from softoken, then the error is already set in this case - SEC_ERROR_BAD_DATABASE. That information would be lost if I set it. On the other hand, if it comes from a third party module, there wouldn't be a PR error set. It seems that in many other places in the wrapper, the PR error code is set to PK11_MapError(crv), so I'm going to do that here too.
Assignee | ||
Comment 36•22 years ago
|
||
This patch also contains very minor fixes for warnings when I compiled this file as C++.
Attachment #100809 -
Attachment is obsolete: true
Comment 37•22 years ago
|
||
The new patch looks corrrect for the error case out of PK11_GetAttributes() (using PK11_MapError() is the correct thing to do there). Wan-teh's comment specifically pointed out the error case where fetchCRL[1].pValue was NULL. In that case we need to do a PORT_SetError() (probably bad DAtabase, or create a new error of a BAD CRL) for the case where PK11_GetAttributes returned fine, but the object had a NULL pValue; bob
Assignee | ||
Comment 38•22 years ago
|
||
Set the error to SEC_ERROR_CRL_INVALID for the case where crv is CKR_OK but pValue is NULL . I also moved this test up before the allocation which isn't needed if this fails.
Attachment #100828 -
Attachment is obsolete: true
Comment 39•22 years ago
|
||
Comment on attachment 100945 [details] [diff] [review] updated patch r=wtc.
Attachment #100945 -
Flags: review+
Assignee | ||
Comment 40•22 years ago
|
||
Fix checked in.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•