crash in pk11_FindCrlAttribute

RESOLVED FIXED in 3.6

Status

NSS
Libraries
P1
normal
RESOLVED FIXED
16 years ago
16 years ago

People

(Reporter: Julien Pierre, Assigned: Julien Pierre)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(7 attachments, 7 obsolete attachments)

1.00 KB, patch
Details | Diff | Splinter Review
1.17 KB, patch
Details | Diff | Splinter Review
81 bytes, text/plain
Details
83.67 KB, application/octet-stream
Details
108.00 KB, application/octet-stream
Details
794 bytes, patch
Details | Diff | Splinter Review
3.15 KB, patch
Wan-Teh Chang
: review+
Details | Diff | Splinter Review
(Assignee)

Description

16 years ago
1. download the AOL CRL from
http://certificates.netscape.com/getCRL?issuepoint=MasterCRL&op=importCRL and
install it into Mozilla / Netscape 7

2. Send encrypted email to an employee

3. witness the following crash :

pk11_FindCrlAttribute(PK11TokenObjectStr * 0x027cf780, unsigned long 0x00000011)
line 1004 + 5 bytes
pk11_FindTokenAttribute(PK11TokenObjectStr * 0x027cf780, unsigned long
0x00000011) line 1111 + 13 bytes
pk11_FindAttribute(PK11ObjectStr * 0x027cf780, unsigned long 0x00000011) line
1141 + 22 bytes
pk11_objectMatch(PK11ObjectStr * 0x027cf780, CK_ATTRIBUTE * 0x0012dbd4, int
0x00000001) line 2340 + 22 bytes
pk11_tokenMatch(PK11SlotStr * 0x027a0198, SECItemStr * 0x0012da40, unsigned long
0x50000000, CK_ATTRIBUTE * 0x0012dbd4, int 0x00000001) line 2752 + 17 bytes
pk11_crl_collect(SECItemStr * 0x0012da58, SECItemStr * 0x0012da40, int
0x00000004, void * 0x0012da7c) line 3585 + 31 bytes
nsslowcert_TraverseDBEntries(NSSLOWCERTCertDBHandleStr * 0x027f9798, int
0x00000004, int (SECItemStr *, SECItemStr *, int, void *)* 0x0392a769
pk11_crl_collect(SECItemStr *, SECItemStr *, int, void *), void * 0x0012da7c)
line 4006 + 19 bytes
pk11_searchCrls(PK11SlotStr * 0x027a0198, SECItemStr * 0x0012dadc, int
0x00000000, unsigned long 0x0000000d, PK11SearchResultsStr * 0x02817d00,
CK_ATTRIBUTE * 0x0012dbd4, unsigned long 0x00000001) line 3621 + 20 bytes
pk11_searchTokenList(PK11SlotStr * 0x027a0198, PK11SearchResultsStr *
0x02817d00, CK_ATTRIBUTE * 0x0012dbd4, long 0x00000001, int * 0x0012db78, int
0x00000000) line 4261 + 33 bytes
NSC_FindObjectsInit(unsigned long 0x01000001, CK_ATTRIBUTE * 0x0012dbd4,
unsigned long 0x00000001) line 4306 + 29 bytes
pk11_FindObjectByTemplate(PK11SlotInfoStr * 0x02b24fd8, CK_ATTRIBUTE *
0x0012dbd4, int 0x00000001) line 176 + 23 bytes
pk11_getcerthandle(PK11SlotInfoStr * 0x02b24fd8, CERTCertificateStr *
0x02b60988, CK_ATTRIBUTE * 0x0012dbd4, int 0x00000001) line 1829 + 17 bytes
PK11_FindObjectForCert(CERTCertificateStr * 0x02b60988, void * 0x02804d40,
PK11SlotInfoStr * * 0x0012dc08) line 2447 + 25 bytes
PK11_FindKeyByAnyCert(CERTCertificateStr * 0x02b60988, void * 0x02804d40) line
2476 + 17 bytes
NSS_CMSSignerInfo_Sign(NSSCMSSignerInfoStr * 0x02aa8ea8, SECItemStr *
0x02acfba8, SECItemStr * 0x038dc7c4) line 140 + 19 bytes
NSS_CMSSignedData_Encode_AfterData(NSSCMSSignedDataStr * 0x02aa8df0) line 271 +
26 bytes
nss_cms_after_data(NSSCMSEncoderContextStr * 0x02aa11a0) line 377 + 12 bytes
nss_cms_encoder_notify(void * 0x02aa11a0, int 0x00000000, void * 0x02aa8e2c, int
0x00000004) line 208 + 9 bytes
sec_asn1e_notify_after(sec_EncoderContext_struct * 0x02b5e8c0, void *
0x02aa8e2c, int 0x00000004) line 180 + 23 bytes
sec_asn1e_next_in_sequence(sec_asn1e_state_struct * 0x02b5eaa8) line 1131 + 25 bytes
SEC_ASN1EncoderUpdate(sec_EncoderContext_struct * 0x02b5e8c0, const char *
0x00000000, unsigned long 0x00000000) line 1213 + 9 bytes
NSS_CMSEncoder_Finish(NSSCMSEncoderContextStr * 0x02aa11a0) line 731 + 15 bytes
PIPNSS! 60a6a6d2()
MSGSMIME! 609620bb()
MSGSMIME! 60961b27()
MSGCOMPO! 60843830()
MSGCOMPO! 60846afe()
MSGCOMPO! 60847308()
MSGCOMPO! 608485fd()
MSGCOMPO! 60837e04()
MSGCOMPO! 6083825a()
XPCOM! 611774d0()
XPC3250! 60d5239d()
XPC3250! 60d5591c()
JS3250! 60e4a7f8()
JS3250! 60e4f898()
JS3250! 60e4a835()
XPC3250! 60d4fdc6()
XPC3250! 60d4e745()
XPCOM! 611769a7()
XPCOM! 611774d0()
XPC3250! 60d5239d()
XPC3250! 60d5591c()
JS3250! 60e4a7f8()
JS3250! 60e4f898()
JS3250! 60e4a835()
JS3250! 60e4aacf()
JS3250! 60e34957()
JSDOM! 606f3ce6()
JSDOM! 6070e502()
GKCONTENT! 60308df4()
GKCONTENT! 6030a0b5()
GKCONTENT! 602ced32()
GKLAYOUT! 603d6ea0()
GKLAYOUT! 60434966()
GKLAYOUT! 60434799()
GKLAYOUT! 603d6e16()
GKLAYOUT! 603d6ceb()
GKCONTENT! 6030e492()
GKCONTENT! 6030d248()
GKLAYOUT! 603d6e3e()
GKLAYOUT! 603d6c5d()
GKVIEW! 60547853()
GKVIEW! 6054130d()
GKVIEW! 60541b5d()
GKWIDGET! 60554792()
GKWIDGET! 60557d86()
GKWIDGET! 60558176()
GKWIDGET! 60554cae()
USER32! 77e11b60()
USER32! 77e11cca()
USER32! 77e183f1()
APPSHELL! 600d7232()
NETSCP! 00401bdf()
NETSCP! 004038b6()
KERNEL32!

Comment 1

16 years ago
I can't reproduce this crash.  Julien, could you
look into this?
Assignee: wtc → jpierre
Priority: -- → P1
Target Milestone: --- → 3.6
(Assignee)

Comment 2

16 years ago
Try also grabbing the two CRLs from
https://www.thawte.com/cgi/lifecycle/roots.exe . I also had them in my database.
I assumed that the AOL CRL was the problem because the crash occurred when
verifying signatures of AOL employees.
But in fact, the crash also happens when going to "Manage CRLs".

The problem is in fact triggered by the larger Thawte CRLs which use the blob
support and get created in the cert7.dir subdirectory.
If I manually delete the content of that directory, the CRLs go away as well as
the crash.
(Assignee)

Comment 3

16 years ago
Some more information I found out about the test case :

The problem happens only in the situation where you already had those large CRLs
in your database before NSS 3.6 - ie. within cert7.db rather than as separate
files in cert7.dir .

Here is the full test case :
1. start with Netscape 7 and NSS 3.5 DLLs
2. create a new profile
3. grab the two CRLs from https://www.thawte.com/cgi/lifecycle/roots.exe
4. check that edit/preferences/privacy and security/validation/manage CRLs works
5. exit
6. copy the NSS 3.6 DLLs over
7. restart Netscape 7
8. go to edit/preferences/privacy and security/validation/manage and delete both
CRLs
9. grab the two CRLs from https://www.thawte.com/cgi/lifecycle/roots.exe
10. go to edit/preferences/privacy and security/validation/manage CRLs : it crashes
(Assignee)

Comment 4

16 years ago
I am having a difficult time with this one. I have been able to identify the
specific CRL with which the problem occurs. It is only the second of the Thawte
CRLs - the one called "Thawte Server CA, CRL". That CRL is only 85 KB.

The other one for the Freemail CA, which is 340 KB, doesn't cause the problem.

I have been trying to create a test case that didn't involve Mozilla, but so far
I haven't succeeded.

I believe the corruption of the database occurs either during the deletion of
the old CRL, or during the import of the new one.

What I see from the stack before the crash is that pk11_getCrl returned NULL.
This is because of a bad database error being reported from the lower level.
(Assignee)

Comment 5

16 years ago
I found that if I change my test case at step 9 and import the CRL from the
local disk instead of the web site, the crash does not happen at step 10
(opening "Manage CRLs"). However, the program exits if I then try to delete that
newly-imported CRL. There is no exception / crash at that point.

This tends to suggest that something is getting corrupt in memory within the
Netscape process. This is further confirmed by the fact that using the same
database, I am unable to reproduce the crash with our command-line tools. 

I think I need to purify this code to find the problem. I don't have the proper
licenses currently (temporary one that expired) so I will have to take care of
this tomorrow.
(Assignee)

Comment 6

16 years ago
I have confirmed that this problem is related to the blob code.

If I comment the following lines in dbs_put :

	if (data->size > DBS_MAX_ENTRY_SIZE) {
	    dbs_mkBlob(key,data,&blob);
	    ret = dbs_writeBlob(dbsp->blobdir, dbsp->mode, &blob, data);
	    data = &blob;
	}

Then the blob support is effectively disabled (no blobs ever get created) and I
don't see the crash.

However, I haven't been able to find out what's wrong in the shim that ends up
causing the crash.

Comment 7

16 years ago
Created attachment 98973 [details] [diff] [review]
Disable the blob db shim code for NSS 3.6 Beta1
(Assignee)

Comment 8

16 years ago
FYI, I have reproduced the crash on two platforms - the same problem occurs on OS/2.
I'm trying to narrow it down further in the debugger.

One of the symptoms I see is that
nsslowcert_FindCrlByKey
is called with a crlKey that has a length of 0xC8 when using the regular DBM code.
This happens when I click on "manage CRLs" which enumerates all CRLs.

With my test case, after the CRL has been deleted from the DB and added as a
blob, when I click "manage CRLs" to display, the function gets called with a
crlKey that has a length of 0xC9 .

In other words, the database key is different. It is getting corrupt somehow,
but I still don't know how.
(Assignee)

Comment 9

16 years ago
I have finally been able to create a test case for this problem that does not
involve the browser.

What I did was modify crlutil in the following ways :
1) I added a way to delete CRLs when the issuer cert wasn't in cert7.db .
The method for doing that is the same as PSM - calling SEC_LookupCrls, then
SEC_FindCrlByName , and then SEC_DeletePermCrl . The call to SEC_FindCrlByName
is necessary because the objects returned by SEC_LookupCrls don't have a slot
pointer which causes the deletion to fail

2) I added a testing option that reproduces the sequence of steps that I had in
the browser. The test case starts with a database created with a previous
version of NSS that contains the 80KB thawte CRL. The test code I added in
crlutil does the following :
1) delete the CRL
2) import the CRL (which writes a blob)
3) list the CRLs

When I call this test code, I can reproduce the exact same crash on step 3 as in
the browser

However, I have also created a batch file to do the same thing, but calling
crlutil 3 times - once for each of the 3 steps. In that case, the crash does not
occur ! This suggests that something is being corrupt in memory.
I still haven't got a chance to purify, but it should be possible now with the
smaller test case outside of the browser.
(Assignee)

Comment 10

16 years ago
FYI, Purify didn't find any error with my test case.
I think this is a problem in the dbm code.
(Assignee)

Comment 11

16 years ago
I just ran another test, with the blob code disabled.
1. I started with an empty cert database.
2. I imported the 85 KB thawte CRL with crlutil.
3. I deleted the CRL with crlutil
4. went back to step 2, in a loop

At each iteration after the initial addition, the cert database grew by 4 KB.
Then, ultimately, it reached a peak size of 417,792 bytes. At that point, even
if I tried to import the CRL into it, it wouldn't show anymore with crlutil -L.

Therefore, there is a pre-existing problem in DBM.

I repeated the test with the Netscape CRL, which is below 64 KB. The same
problem occurred, at a different peak size for cert7.db (229,376 bytes).

When I use a 470 or a 2524 bytes CRL, I don't get the problem. The cert7.db
stays at 16,384 bytes the entire time.

I don't have any CRL in between 2524 and 29313 bytes (Netscape CRL) so I
couldn't test any other case to find out what exact threshold is the problem.

This problmem may or may not be directly related to what we are seeing right now
with the blob code, but I think there is a good chance that it is.

Comment 12

16 years ago
That would be my suspicion as well. The problem seems to be inherent in storing
large objects in the database, and these all look like symptoms of the problem
we see with very large objects.

It seems we have 3 options:

1) re-enable the blob code and ask people not to store large objects in older
versions of netscape:
    This would be bad for people who already stored these objects, but at this
point they are already in danger.

2) try to fix dbm.
    The multi-block write code is one of the most complicated parts of the DBM
code, and seems to have some nasty address limits and funny branching code. In
the end we still have pretty terrible performance with large objects.

3) Some combination of 1 & 2.
     This has the performance advantages of 1, but still has all the work that 2
entails.

bob
(Assignee)

Updated

16 years ago
Depends on: 169573
(Assignee)

Comment 13

16 years ago
Created attachment 99938 [details] [diff] [review]
patch to restore blob code. In the interest of testing
(Assignee)

Comment 14

16 years ago
Created attachment 99939 [details]
batch to reproduce the crash
(Assignee)

Comment 15

16 years ago
Created attachment 99940 [details]
CRL used in the batch file to reproduce the crash
(Assignee)

Comment 16

16 years ago
Created attachment 99941 [details]
cert database containing a copy of the CRL, required by batch. Place in mozilla/dist/$OBJDIR/bin/sav
(Assignee)

Comment 17

16 years ago
I have added everything to this defect that is needed to reproduce this problem :
- a patch to rollback to the blob support
- a copy of a database containing the CRL (created with a previous version of NSS)
- a batch file tha calls crlutil with the -T option that deletes and adds back
the CRL, causing the crash

I'm reassigning the bug to Bob as I don't feel like there I can resolve this
with my current limited knowledge of the DBM code.
Assignee: jpierre → relyea
(Assignee)

Comment 18

16 years ago
Created attachment 100635 [details] [diff] [review]
patch to softoken to gracefully handle database lookup failure instead of crashing
(Assignee)

Comment 19

16 years ago
Created attachment 100636 [details] [diff] [review]
patch to pk11wrap to gracefully handle NULL CRLs return from token instead of crashing
(Assignee)

Comment 20

16 years ago
I have attached two patches that can avoid the crash even when the blob code is
enabled and the database gets corrupt (of course, the CRL is not found with the
blob code still, but at least we don't crash).

- One crash was in softoken, trying to dereference the CRL object that wasn't
found in the database ater hte corruption.

- The other crash was in pk11wrap, trying to dereference the NULL CRL object
returned from softoken.

We may want to do a third patch to ensure that softoken never even returns such
NULL CRL objects to the upper layer, but I'm not sure how to do that.

Comment 21

16 years ago
Assigned the bug to Julien.  Bob, Nelson please
review the two patches Julien just attached.
Assignee: relyea → jpierre
Review comments:

1. I'd find the patch to pk11_CollectCrls() clearer if it read as follows:

    if (fetchCrl[1].pValue && *((CK_BBOOL *)fetchCrl[1].pValue))
        new_node->type = SEC_KRL_TYPE;
    else
        new_node->type = SEC_CRL_TYPE;

2. Function pk11_FindCrlAttribute() returns NULL in two places, but does not
call PORT_SetError in either place.  This patch adds a third place where 
it will return NULL. IMO, any time this function returns NULL, the error 
code should have been explicitly set, either by this function, or by a 
function it has just called.  I'd like this fix to address that, also.

(Assignee)

Comment 23

16 years ago
Nelson,

1. I agree this is more readable

2. In the case I added where NULL is returned, the NSPR error has already been
set at a lower level, namely SEC_BAD_DATABASE_ERROR . So I don't think I should
call PR_SetError.

As far as the two other cases :

a. the "return NULL" in the default case of the first switch statement should
set an invalid attribute error. There is a PKCS#11 error defined for it -
CKR_ATTRIBUTE_TYPE_INVALID, but there is no way to return it in that code path.
I'm not aware of an NSPR error to match this. There aren't any NSPR errors for
PKCS#11 defined in secerr.h . Perhaps I should just use a generic
SEC_ERROR_INVALID_ARGS ?

b. the "return NULL" at the end of the function can never be reached, that case
currently gets caught by the default case in a).
(Assignee)

Comment 24

16 years ago
Created attachment 100655 [details] [diff] [review]
 100635: patch to softoken to gracefully handle database lookup failure instead of crashing
Attachment #100635 - Attachment is obsolete: true
(Assignee)

Comment 25

16 years ago
Created attachment 100659 [details] [diff] [review]
100636: patch to pk11wrap to gracefully handle NULL CRLs returned from token instead of crashing
Attachment #100636 - Attachment is obsolete: true
Your updated patch id=100655 looks very very different from the previous 
version of that patch.  Perhaps it contains some debugging code, or other
changes that are irrelevant to this bug?
(Assignee)

Comment 27

16 years ago
oops, sorry. I did a cvs diff in my softoken directory; the other files weren't
intended to be part of this patch.
(Assignee)

Comment 28

16 years ago
Created attachment 100670 [details] [diff] [review]
patch to softoken to gracefully handle database lookup failure instead of crashing

don't include irrelevant files as part of the patch
Attachment #100655 - Attachment is obsolete: true
r=nelsonb for patches 100659 and 100670.

Comment 30

16 years ago
Comment on attachment 100659 [details] [diff] [review]
100636: patch to pk11wrap to gracefully handle NULL CRLs returned from token instead of crashing

>Index: pk11cert.c
[...]
>-    new_node->type =  *((CK_BBOOL *)fetchCrl[1].pValue)  ? 
>-						SEC_KRL_TYPE : SEC_CRL_TYPE;
>+    if (fetchCrl[1].pValue && *((CK_BBOOL *)fetchCrl[1].pValue))
>+        new_node->type = SEC_KRL_TYPE;
>+    else
>+        new_node->type = SEC_CRL_TYPE;
>+

This patch assumes that if fetchCrl[1].pValue is
NULL, it is a CRL.  But it seems that a NULL
fetchCrl[1].pValue is an error.  Why don't we
return a failure status if fetchCrl[1].pValue is
NULL?  That is,

    if (!fetchCrl[1].pValue) {
	PORT_SetError(...);
	goto loser;
    }
    new_node->type =  *((CK_BBOOL *)fetchCrl[1].pValue)  ? 
						SEC_KRL_TYPE
(Assignee)

Comment 31

16 years ago
Created attachment 100807 [details] [diff] [review]
patch for pk11wrap using Wan-Teh's suggestion
Attachment #100659 - Attachment is obsolete: true
(Assignee)

Comment 32

16 years ago
PK11_GetAttributes returns a CK_RV, rather than a SECStatus .
Yet we check for "rv == SECFailure", calling it in pk11_collectCrls :

    rv = PK11_GetAttributes(head->arena,slot,crlID,fetchCrl,fetchCrlSize);
    if (rv == SECFailure) {
	goto loser;
    }

In this case, rv = 18 , or CKR_ATTRIBUTE_TYPE_INVALID, but we don't catch it.

I just wish we were using a C++ compiler to compile the NSS code, even if it
remains C code. It would have caught this error.

As far as the other suggestion I had, which was to have softoken not return CRL
objects that were corrupt, this isn't really possible, because we don't know
about the corruption until we try to fetch the specific attribute (value). So
this is probably as good as the patches are going to get.

(Assignee)

Comment 33

16 years ago
Created attachment 100809 [details] [diff] [review]
fix using correct types
Attachment #100807 - Attachment is obsolete: true

Updated

16 years ago
Attachment #100809 - Flags: needs-work+

Comment 34

16 years ago
Comment on attachment 100809 [details] [diff] [review]
fix using correct types

Good catch, Juline.  Two suggested changes.

1. A stylistic nit: Relyea uses 'crv' for
the return value of the CK_RV type.  I
suggest you change 'attr_rv' to 'crv'.

2. 

>-    if (fetchCrl[1].pValue && *((CK_BBOOL *)fetchCrl[1].pValue))
>+    if (!fetchCrl[1].pValue) {
>+	rv = SECFailure;
>+	goto loser;
>+    }

You should call PORT_SetError() to set an
error code.
(Assignee)

Comment 35

16 years ago
Wan-Teh,

I'm not sure I agree about setting the error code.
If it comes from softoken, then the error is already set in this case -
SEC_ERROR_BAD_DATABASE. That information would be lost if I set it.

On the other hand, if it comes from a third party module, there wouldn't be a PR
error set. It seems that in many other places in the wrapper, the PR error code
is set to PK11_MapError(crv), so I'm going to do that here too.
(Assignee)

Comment 36

16 years ago
Created attachment 100828 [details] [diff] [review]
patch setting the error

This patch also contains very minor fixes for warnings when I compiled this
file as C++.
Attachment #100809 - Attachment is obsolete: true

Comment 37

16 years ago
The new patch looks corrrect for the error case out of PK11_GetAttributes()
(using PK11_MapError() is the correct thing to do there). Wan-teh's comment
specifically pointed out the error case where fetchCRL[1].pValue was NULL. In
that case we need to do a PORT_SetError() (probably bad DAtabase, or create a
new error of a BAD CRL) for the case where PK11_GetAttributes returned fine, but
the object had a NULL pValue;

bob
(Assignee)

Comment 38

16 years ago
Created attachment 100945 [details] [diff] [review]
updated patch

Set the error to SEC_ERROR_CRL_INVALID for the case where crv is CKR_OK but
pValue is NULL . I also moved this test up before the allocation which isn't
needed if this fails.
Attachment #100828 - Attachment is obsolete: true

Comment 39

16 years ago
Comment on attachment 100945 [details] [diff] [review]
updated patch

r=wtc.
Attachment #100945 - Flags: review+
(Assignee)

Comment 40

16 years ago
Fix checked in.
Status: NEW → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.