Closed Bug 272484 Opened 20 years ago Closed 18 years ago

Certificate manager crashes [@ _PR_MD_ATOMIC_DECREMENT - PK11_FreeSymKey]

Categories

(NSS :: Libraries, defect, P2)

x86
Windows XP
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jamesrome, Assigned: rrelyea)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(5 files, 3 obsolete files)

in both the 1.8a5 build and 2002113004 the certificate manager crashes if I look
at it. The last talkback is TB2278660H. There are a bunch more under my e-mail
address.

The certificate manager either crashes the second time I open it, or more
recently it crashes when I try to import a certificate into the Dallas JButton
token.

I also have FIPS security enabled.
Stack Signature _PR_MD_ATOMIC_DECREMENT 0b6c1c59 
Product ID MozillaTrunk 
Build ID 2004113004 
Trigger Time 2004-11-30 16:06:35.0 
Platform Win32 
Operating System Windows NT 5.1 build 2600 
Module nspr4.dll + (00016919) 
URL visited  
User Comments importing a certificate 
Since Last Crash 93 sec 
Total Uptime 93 sec 
Trigger Reason Access violation 
Source File, Line No. 
c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/nsprpub/pr/src/md/win
dows/ntmisc.c, line 800 
Stack Trace  

_PR_MD_ATOMIC_DECREMENT  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/nsprpub/pr/src/md/wi
ndows/ntmisc.c, line 800]
PK11_FreeSymKey  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/pk1
1wrap/pk11skey.c, line 249]
sec_pkcs7_decoder_notify  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/pkc
s7/p7decode.c, line 1045]
sec_asn1d_notify_before  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/uti
l/secasn1d.c, line 449]
sec_asn1d_next_in_sequence  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/uti
l/secasn1d.c, line 2007]
SEC_ASN1DecoderUpdate  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/uti
l/secasn1d.c, line 2662]
SEC_PKCS7DecoderUpdate  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/pkc
s7/p7decode.c, line 1189]
sec_pkcs12_decoder_wrap_p7_update  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/pkc
s12/p12d.c, line 750]
SEC_ASN1DecoderUpdate  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/uti
l/secasn1d.c, line 2779]
sec_pkcs12_decoder_asafes_callback  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/pkc
s12/p12d.c, line 850]
sec_pkcs7_decoder_work_data  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/pkc
s7/p7decode.c, line 207]
SEC_ASN1DecoderUpdate  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/uti
l/secasn1d.c, line 2779]
SEC_PKCS7DecoderUpdate  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/pkc
s7/p7decode.c, line 1189]
sec_pkcs12_decode_asafes_cinfo_update  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/pkc
s12/p12d.c, line 950]
SEC_ASN1DecoderUpdate  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/uti
l/secasn1d.c, line 2779]
SEC_PKCS12DecoderUpdate  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/pkc
s12/p12d.c, line 1276]
nsPKCS12Blob::inputToDecoder  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/manager/ssl
/src/nsPKCS12Blob.cpp, line 546]
0x027054f8
0x02610678
0x0288ba48
Assignee: general → wtchang
Severity: major → critical
Component: General → Libraries
Product: Mozilla Application Suite → NSS
QA Contact: general → bishakhabanerjee
Summary: Certificate manager crashes → Certificate manager crashes [@ _PR_MD_ATOMIC_DECREMENT]
QA Contact: bishakhabanerjee → jason.m.reid
Keywords: topcrash
Status: NEW → ASSIGNED
Taking.
Assignee: wtchang → b.jacques
Status: ASSIGNED → NEW
Attached patch fix (obsolete) — Splinter Review
Attachment #191919 - Flags: review?(wtchang)
Could you explain why this patch fixes the crash?
Sure. I think this occurs because bulkkey may be null for some reason.
PK11_FreeSymKey() dereferences the bulkkey that it is passed, which causes the
crash in _PR_MD_ATOMIC_DECREMENT when bulkkey is null. See also timeless's URL.

By moving the call to PK11_FreeSymKey() into the if ( [snip] && bulkkey ) {}
block, we ensure that bulkkey is non-null, which hopefully prevents the crash
from happening.
Actually, looking at my patch again, I think it prevents the key from being
freed in some cases.
Attached patch add null check (obsolete) — Splinter Review
This patch adds a seperate null check instead.
Attachment #191919 - Attachment is obsolete: true
Attachment #192041 - Flags: review?(wtchang)
Attachment #191919 - Flags: review?(wtchang)
Two comments here:

1) I'm worried we are papering over a real underlying problem. It's very clear
we are doing PKCS #5, because a quick examination of the code shows that this is
the only case where bulkkey can possibly become NULL without either bailing or
crashing horribly in the code above it. This is consistant with the stack trace
(showing PKCS #12) and the description  "... or more recently it crashes when I
try to import a certificate into the Dallas JButton token."

2) Part of the bug description is "in both the 1.8a5 build and 2002113004 the
certificate manager crashes if I look at it." The certificate manager does not
invoke the code listed on the stacktrace when it's opened, only when it is
trying to import a certificate.

Anyway I believe there are 2 separate bugs. The patch is acceptible for the
first bug (crash when I import into the token) only if we verify that it's
legitamate for the PKCS #5 object to have an NULL key (it may be it's an error
case, in which case we need to error out, not ignore it).

The second problem (cert manager crashes on launch) is likely in
module/nsNSSCertTree.cpp. There was a recent patch to that code which also makes
a 'null' check and bails. On debug builds, though I sometimes see thrashed
datastructures. It looks like a nasty race condition that does not always
happen. The symptoms are you crash when you bring up the cert manager.

bob
Comment on attachment 192041 [details] [diff] [review]
add null check

>@@ -698,17 +698,18 @@ sec_pkcs7_decoder_start_decrypt (SEC_PKC
>     if ( SEC_PKCS5IsAlgorithmPBEAlg(&(enccinfo->contentEncAlg)) && bulkkey ) {
> 	SEC_PKCS5KeyAndPassword *keyPwd = (SEC_PKCS5KeyAndPassword *)bulkkey;
> 	bulkkey = keyPwd->key;
>     }
> 
>     /*
>      * We are done with (this) bulkkey now.
>      */
>-    PK11_FreeSymKey (bulkkey);
>+    if (bulkkey)
>+        PK11_FreeSymKey (bulkkey);

As Bob Relyea said, the only way bulkkey can be NULL here is
that keyPwd->key is NULL.  So the NULL pointer check of bulkkey
should be done inside the if statement.  But we should find out
why it is NULL so we know the proper way to handle that condition.
Attachment #192041 - Flags: review?(wtchang) → review-
The basic problem is the horrible hack we have in our code to deal with PBE
encrypted data, where part of the password item is needed not only to generate
the key, but also the IV. The hack creates a data structure which passes both
the key and the IV, then 'cleverly' intercepts its use to restor just the key
value.

This patch will fix the crash situation. The user will find, however that he
won't be able to import a cert into the desired token in these cases.
Attachment #192041 - Attachment is obsolete: true
The reason we couldn't generate the PBE is because something failed when
passing the PBE to the token. This code tries to then generate the PBE using
the softoken. Later, when the key is use, the PKCS12 blob will likely be
decrypted and parsed by the softoken and imported into the target token (rather
then the target token unwrapping it directly).
Neither patch is tested, it would be nice to try it against the JButton token.

Also, in looking into the code, it's clear that the old code and patch1 will
leak a slot reference periodically!

bob
I never have been able to use a JButton with Firefox or Mozilla. It kinda worked
with Netscape. So yes, please test this! I managed to impoirt my certificate,
but I could never use it if it was stored in the JButton.

I have 20 JButtons that are now quite useless.....
This patch would only allow import. I wonder how old the PKCS #11 module is.
There were some fixes early on to Netscape where Netscape was incorrectly using
PKCS #11, though most of those fixes were done in ways to preserve
compatibility. There was one case, though in about 2000, where Netscape started
relying on PKCS #11 semantics that it hadn't relied on before. Some of the more
poorly written PKCS #11 modules broke at that point.

My guess is Dallas hasn't kept the PKCS #11 module for the JButtons up to date,
but it could be some latent bug in NSS as well.

Anyway my 'testing' point is I can't test it because I don't have any JButtons.
I was looking for a volunteer who did have one to try it. If you need a build I
can provide it.

bob
Assignee: b.jacques → rrelyea
Comment on attachment 192154 [details] [diff] [review]
Basic patch to fix the crash problem

This patch seems to make sense, but we need to test it to
verify the theory.
Comment on attachment 192156 [details] [diff] [review]
Fix the underlying issue...

>     /* if no slot specified, use the internal key slot */
>-    if(!keyPwd->slot) {
>-	keyPwd->slot = PK11_GetInternalKeySlot();
>+    if(keyPwd->slot) {
>+	slot = PK11_ReferenceSlot(keyPwd->slot);
>+    } else {
>+	slot = PK11_GetInternalKeySlot();
>     }

One change from the original code is that if keyPwd->slot
is NULL, it will remain NULL because we are now using the
local variable "slot" instead in this function.  Is this
intentional?

Other than that, this patch seems to make sense although
I didn't trace the code carefully.
Comment on attachment 192156 [details] [diff] [review]
Fix the underlying issue...

I traced the code in p12d.c a little more.  I found that
keyPwd->slot equals p12cx->slot (see sec_pkcs12_decoder_asafes_notify)
and p12cx->slot is never NULL (see SEC_PKCS12DecoderStart).
This means in the following code in sec_pkcs12_decoder_get_decrypt_key,
keyPwd->slot cannot be NULL, and that it isn't leaking slot references
because the code inside the if statement is never executed.

    /* if no slot specified, use the internal key slot */
    if(!keyPwd->slot) {
	keyPwd->slot = PK11_GetInternalKeySlot();
    }

So now I think this patch is correct.  We just need to test it...
Attachment #192156 - Flags: review+
Comment on attachment 192154 [details] [diff] [review]
Basic patch to fix the crash problem

This basic patch is fine even though it doesn't fix the (potential)
leak of the internal key slot, which I determined by code
inspection can't possibly happen (see comment 18).
Attachment #192154 - Flags: review+
The 'crash when starting the cert manager' is covered in bug  225034.
*** Bug 304367 has been marked as a duplicate of this bug. ***
This is 2nd place on the Firefox trunk topcrashers list.
Keywords: topcrashtopcrash+
Really, with this exact crash signature:

_PR_MD_ATOMIC_DECREMENT  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/nsprpub/pr/src/md/wi
ndows/ntmisc.c, line 800]
PK11_FreeSymKey  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/pk1
1wrap/pk11skey.c, line 249]
sec_pkcs7_decoder_notify  
[c:/builds/tinderbox/MozillaTrunk/WINNT_5.0_Clobber/mozilla/security/nss/lib/pkc
s7/p7decode.c, line 1045]

Or is it that just _PR_MD_ATOMIC_DECREMENT on the top of stack. If the latter,
then you have a different bug and you need a new bug report. If the former, then
that means a *LOT* of people have smart cards installed (which I would love, but
find hard to believe).

bob
Priority: -- → P2
Target Milestone: --- → 3.11
(In reply to comment #23)
> Really, with this exact crash signature:

Sorry, my bad :)

The topcrasher isn't the same bug ...
Keywords: topcrash+topcrash
Summary: Certificate manager crashes [@ _PR_MD_ATOMIC_DECREMENT] → Certificate manager crashes [@ _PR_MD_ATOMIC_DECREMENT - PK11_FreeSymKey]
Fix the issue addressed in this bug by removing the hack altogether. 

This patch add a new feature which allows applications to attach data to a key.
The PBE code then uses this feature to attach the pwitem to the key rather than
hacking up a fake structure which isn't a key to begin with.

This patch also removes a long standing issue with symetric keys were we have
'session thrashing'. Sometimes we need new key structures where the session
will not be owned by the key itself (this happens a lot in an SSL connection).
In the old code we would get a new key, create a session for it, then
immediately close that session and inherit the parent's session. In this
current code the caller tells the sym key create code wether or not it needs a
session, and an appropriate key structure is returned to it. On freeing, keys
which own their own sessions are placed on separte queues so that those
sessions can be reused to hold new keys.
Attachment #196938 - Flags: review?(wtchang)
Bob, I have reviewed your whole patch, which consists
of two unrelated subsets.  I have more questions about
the subset that deals with "session thrashing".  The
subset that deals with the SEC_PKCS5KeyAndPassword as
symkey hack looks correct.  (I didn't try to understand
what the code does.  I just made sure the changes look
reasonable and there are no obvious mistakes.)

I only have one suggested change: make
PK11_SetSymKeyUserData return void.  The only reason
PK11_SetSymKeyUserData may fail is that the symKey
pointer is bad, but a bad symKey pointer can also
cause PK11_GetSymKeyUserData to crash/fail (that function
returns void).	Not to mention your code doesn't
check the return value of PK11_SetSymKeyUserData anyway.
Attachment #198027 - Flags: review+
This is the other subset of Bob's patch
"Fix the crash problem by removing the hack"
(attachment 196938 [details] [diff] [review]).
Attachment #196938 - Attachment is obsolete: true
Attachment #198028 - Flags: review?(wtchang)
Comment on attachment 198027 [details] [diff] [review]
Fix the crash problem by removing the hack (relevant subset)

Bob, we should also remove the definition of the
SEC_PKCS5KeyAndPassword type from keythi.h, or at
least add a "unused and deprecated" comment.

keythi.h is exported in spite of the "i" in its
name.  Still, I think it's safe to remove the
definition of the SEC_PKCS5KeyAndPassword type.
Comment on attachment 198027 [details] [diff] [review]
Fix the crash problem by removing the hack (relevant subset)

In comment 26, I said that I didn't try to understand
what the code in this patch does.  I just spent more
time doing that.  So now this patch gets my full endorsement.

I found another minor improvement.  In p12d.c, we have:

	/* if one is being decoded, finish the decode */
	if(p12dcx->currentASafeP7Dcx != NULL) {
	    if(!SEC_PKCS7DecoderFinish(p12dcx->currentASafeP7Dcx)) {
		p12dcx->currentASafeP7Dcx = NULL;
		p12dcx->errorValue = PORT_GetError();
		goto loser;
	    }
	    p12dcx->currentASafeP7Dcx = NULL;
	}
	p12dcx->currentASafeP7Dcx = NULL;

The last line can be removed.
Comment on attachment 198028 [details] [diff] [review]
Bob's fix for symkey "session thrashing"

r=wtc.	I have many suggested changes for this patch to
make the code easier to understand, but the code is correct.

Among my suggestions, only one is code change, which I will
describe here:

In pk11wrap/pk11skey.c, we have

>     if (symKey) {
> 	symKey->next = NULL;
>-	if ((symKey->series != slot->series) || (!symKey->sessionOwner))
>-    	    symKey->session = pk11_GetNewSession(slot,&symKey->sessionOwner);
>+	if (!ownSession) {
>+	    return symKey;
>+	}
>+	/* if we are getting an owner key, make sure we have a valid session.
>+         * session could be invalid if the token has been removed or because
>+         * we got it from the non-owner free list */
>+	if ((symKey->series != slot->series) || 
>+			 (symKey->session == CK_INVALID_SESSION)) {
>+	    symKey->session = pk11_GetNewSession(slot, &symKey->sessionOwner);
>+	}
> 	PORT_Assert(symKey->session != CK_INVALID_SESSION);
> 	if (symKey->session != CK_INVALID_SESSION)
> 	    return symKey;
>-        PK11_FreeSymKey(symKey);
>+	PK11_FreeSymKey(symKey);
>     }

I think we should do "return NULL;" at the end of the if block,
rather than falling through.  If pk11_GetNewSession just failed
here, it will most likely fail again.  So I don't see any point to
fall through (which will retry pk11_GetNewSession).

The rest of my suggested changes are comment improvements and possibly
renaming the ownSession function parameter.  I was confused by "owner",
"sessionOwner", "ownSession".  They all look similar.  "owner" and
"sessionOwner" are not documented.  "ownSession" actually does not
imply "sessionOwner" but rather means "create the symkey with a session,
which may not be owned by the symkey".	I've figured them all out after
reading the all the code, but it could be made more clear.  I will give
my suggestions to Bob in person.
Attachment #198028 - Flags: review?(wtchang) → review+
Comment on attachment 198027 [details] [diff] [review]
Fix the crash problem by removing the hack (relevant subset)

When I think about this patch more, I have a new concern
about the new PK11_GetSymKeyUserData and PK11_SetSymKeyUserData
functions.  They are more flexible than what we need to fix
this bug.  My concern is that multiple users may set their
data to the same symkey, so that they interfere with each
other or interfere with NSS's own use of the user data.

So, it may be safer to call these functions
PK11_GetPBESymKeyPassword and PK11_SetPBESymKeyPassword.
Or, we can make these functions for NSS internal use only
(__PK11_GetSymKeyUserData and __PK11_SetSymKeyUserData).
Or, we should at least document that users of these two
functions must make sure they are the only users of the
symkey's user data.
re wtc. I intended them to be flexible, and callable by users. The PBE keys are
not readily available to the user, so there isn't an issuer of usage there. I
figured if we needed the function for PBE, users may need to use it for keys
that they create as well.

I seems reasonable to document the usage:

/*
 * PK11_SetSymKeyUserData
 *   sets generic user data on keys (usually a pointer to a data structure)
 * that can later be retrieved by PK11_GetSymKeyUserData().
 *    data - data to be set.
 *    freefunc - function used to free the data.
 * Setting user data on symKeys with existing user data already set will cause
 * the existing user data to be freed before the new user data is set.
 * Freeing user data is done by calling the user specified freefunc.
 * If freefunc is NULL, the user data is assumed to be global or static an
 * not freed. Passing NULL for user data to PK11_SetSymKeyUserData has the
 * effect of freeing any existing user data, and clearing the user data
 * pointer. If user data exists when the symKey is finally freed, that
 * data will be freed with freefunc.
 *
 * Applications should only use this function on keys which the application
 * has created directly, as there is only one user data slot.
 */
Attachment #196938 - Flags: review?(wtchang)
Remove the now unused type definition.  We only need
to set p12dcx->currentASafeP7Dcx to NULL if it isn't
NULL.
Attachment #202174 - Flags: review?(rrelyea)
Attachment #202174 - Flags: review?(rrelyea) → review+
Is the "code cleanup patch" going in soon?  I just crashed in PR_MD_ATOMIC_DECREMENT...
(In reply to comment #34)
> Is the "code cleanup patch" going in soon?  I just crashed in
> PR_MD_ATOMIC_DECREMENT...

Er, nevermind, sorry.  That's just code removal.  Sigh...
This bug is only fixed in NSS 3.11, which is not yet
in any Mozilla client release.
Comment on attachment 202174 [details] [diff] [review]
Code cleanup patch

I checked in this code cleanup patch on the NSS
trunk (NSS 3.12).

Checking in cryptohi/keythi.h;
/cvsroot/mozilla/security/nss/lib/cryptohi/keythi.h,v  <--  keythi.h
new revision: 1.9; previous revision: 1.8
done
Checking in pkcs12/p12d.c;
/cvsroot/mozilla/security/nss/lib/pkcs12/p12d.c,v  <--  p12d.c
new revision: 1.30; previous revision: 1.29
done
QA Contact: jason.m.reid → libraries
(In reply to comment #36)
> This bug is only fixed in NSS 3.11, which is not yet
> in any Mozilla client release.

That's no longer true anymore, right?
It is working in seamonkey (which I switched to)
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Status: REOPENED → RESOLVED
Closed: 18 years ago18 years ago
Resolution: --- → FIXED
Crash Signature: [@ _PR_MD_ATOMIC_DECREMENT - PK11_FreeSymKey]
Blocks: 229295
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: