crash in token object cache during SSL client auth stress test with smartcard

RESOLVED FIXED in 3.9

Status

NSS
Libraries
P1
normal
RESOLVED FIXED
15 years ago
14 years ago

People

(Reporter: Julien Pierre, Assigned: Wan-Teh Chang)

Tracking

x86
Windows 2000

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 3 obsolete attachments)

(Reporter)

Description

15 years ago
I was running a stress test with a smartcard. It crashed twice in the same way
while I was showing this test to John Unruh.

 	nspr4.dll!PR_Lock(PRLock * lock=0x0056a1b0)  Line 235 + 0x28	C
 	nss3.dll!nssTokenObjectCache_Clear(nssTokenObjectCacheStr * cache=0x0056a170)
 Line 606 + 0xd	C
 	nss3.dll!nssToken_Remove(NSSTokenStr * tok=0x005698c0)  Line 170 + 0xc	C
 	nss3.dll!nssSlot_IsTokenPresent(NSSSlotStr * slot=0x0056a358)  Line 321 + 0xc	C
 	nss3.dll!search_for_objects(nssTokenObjectCacheStr * cache=0x0056a170)  Line
765 + 0x9	C
 	nss3.dll!get_token_certs_for_cache(nssTokenObjectCacheStr * cache=0x0056a170)
 Line 826 + 0x9	C
 	nss3.dll!nssTokenObjectCache_FindObjectsByTemplate(nssTokenObjectCacheStr *
cache=0x0056a170, unsigned long objclass=1, CK_ATTRIBUTE * otemplate=0x0228fad8,
unsigned long otlen=3, unsigned int maximumOpt=0, int * statusOpt=0x0228faa4) 
Line 1140 + 0x9	C
 	nss3.dll!find_objects_by_template(NSSTokenStr * token=0x005698c0,
nssSessionStr * sessionOpt=0x00564850, CK_ATTRIBUTE * obj_template=0x0228fad8,
unsigned long otsize=3, unsigned int maximumOpt=0, int * statusOpt=0x0228fb3c) 
Line 527 + 0x20	C
 	nss3.dll!nssToken_FindCertificatesBySubject(NSSTokenStr * token=0x005698c0,
nssSessionStr * sessionOpt=0x00564850, NSSItemStr * subject=0x00570ef4, int
searchType=2, unsigned int maximumOpt=0, int * statusOpt=0x0228fb3c)  Line 729
+ 0x1d	C
 	nss3.dll!nssTrustDomain_FindCertificatesBySubject(NSSTrustDomainStr *
td=0x00564650, NSSItemStr * subject=0x00570ef4, NSSCertificateStr * *
rvOpt=0x00000000, unsigned int maximumOpt=0, NSSArenaStr * arenaOpt=0x07b22130)
 Line 624 + 0x1d	C
 	nss3.dll!find_cert_issuer(NSSCertificateStr * c=0x00570ec0, NSSTimeStr *
timeOpt=0x00000000, NSSUsageStr * usage=0x0228fbc4, NSSPoliciesStr *
policiesOpt=0x00000000)  Line 423 + 0x18	C
 	nss3.dll!nssCertificate_BuildChain(NSSCertificateStr * c=0x00570ec0,
NSSTimeStr * timeOpt=0x00000000, NSSUsageStr * usage=0x0228fc3c, NSSPoliciesStr
* policiesOpt=0x00000000, NSSCertificateStr * * rvOpt=0x00000000, unsigned int
rvLimit=0, NSSArenaStr * arenaOpt=0x00000000, int * statusOpt=0x00000000)  Line
504 + 0x15	C
 	nss3.dll!NSSCertificate_BuildChain(NSSCertificateStr * c=0x00570ec0,
NSSTimeStr * timeOpt=0x00000000, NSSUsageStr * usage=0x0228fc3c, NSSPoliciesStr
* policiesOpt=0x00000000, NSSCertificateStr * * rvOpt=0x00000000, unsigned int
rvLimit=0, NSSArenaStr * arenaOpt=0x00000000, int * statusOpt=0x00000000)  Line
541 + 0x25	C
 	nss3.dll!CERT_CertChainFromCert(CERTCertificateStr * cert=0x0056f218, int
usage=0, int includeRoot=0)  Line 1055 + 0x19	C
 	ssl3.dll!ssl3_HandleCertificateRequest(sslSocketStr * ss=0x00598bf8, unsigned
char * b=0x07f504fa, unsigned int length=0)  Line 5151 + 0x13	C
 	ssl3.dll!ssl3_HandleHandshakeMessage(sslSocketStr * ss=0x00598bf8, unsigned
char * b=0x07f503ac, unsigned int length=334)  Line 8011 + 0x11	C
 	ssl3.dll!ssl3_HandleHandshake(sslSocketStr * ss=0x00598bf8, sslBufferStr *
origBuf=0x00598da0)  Line 8111 + 0x19	C
 	ssl3.dll!ssl3_HandleRecord(sslSocketStr * ss=0x00598bf8, SSL3Ciphertext *
cText=0x0228fdfc, sslBufferStr * databuf=0x00598da0)  Line 8380 + 0xd	C
 	ssl3.dll!ssl3_GatherCompleteHandshake(sslSocketStr * ss=0x00598bf8, int
flags=0)  Line 203 + 0x16	C
 	ssl3.dll!ssl_GatherRecord1stHandshake(sslSocketStr * ss=0x00598bf8)  Line
1260 + 0xb	C
 	ssl3.dll!ssl_Do1stHandshake(sslSocketStr * ss=0x00598bf8)  Line 149 + 0xd	C
 	ssl3.dll!ssl_SecureSend(sslSocketStr * ss=0x00598bf8, const unsigned char *
buf=0x004093e0, int len=21, int flags=0)  Line 1028 + 0x9	C
 	ssl3.dll!ssl_SecureWrite(sslSocketStr * ss=0x00598bf8, const unsigned char *
buf=0x004093e0, int len=21)  Line 1062 + 0x13	C
 	ssl3.dll!ssl_Write(PRFileDesc * fd=0x0061dff0, const void * buf=0x004093e0,
int len=21)  Line 1282 + 0x15	C
 	nspr4.dll!PR_Write(PRFileDesc * fd=0x0061dff0, const void * buf=0x004093e0,
int amount=21)  Line 143 + 0x14	C
 	strsclnt.exe!handle_connection(PRFileDesc * ssl_sock=0x0061dff0, int
connection=660)  Line 675 + 0x1d	C
 	strsclnt.exe!do_connects(void * a=0x0012fea0, void * b=0x0056d620, int
connection=660)  Line 802 + 0xd	C
 	strsclnt.exe!thread_wrapper(void * arg=0x004122e0)  Line 383 + 0x1a	C
 	nspr4.dll!_PR_NativeRunThread(void * arg=0x0057d4c8)  Line 433 + 0xd	C
 	msvcrt.dll!780060ce()
(Reporter)

Comment 1

15 years ago
I forgot to mention that the crash occurred just after I pulled out the token.
(Assignee)

Comment 2

15 years ago
Created attachment 128963 [details] [diff] [review]
Proposed patch

This crash is an assertion failure in the debug build.
The assertion failure means that the thread is locking
the token object cache recursively.  It first locks the
token object cache in nssTokenObjectCache_FindObjectsByTemplate,
and then tries to lock the token object cache again in
nssTokenObjectCache_Clear.

After poking around in the source tree for a while, the
best solution I came up with is to release the cache lock
while calling nssSlot_IsTokenPresent.  Ian, is it safe to
release the cache lock while calling nssSlot_IsTokenPresent?
(Assignee)

Updated

15 years ago
Attachment #128963 - Flags: review?(bugz)
(Assignee)

Updated

15 years ago
Attachment #128963 - Flags: superreview?(jpierre)
(Assignee)

Comment 3

15 years ago
Another solution I considered is to add a version of
nssSlot_IsTokenPresent that assumes the token object
cache is already locked.  That solution is more change
and the new function would have a strange prototype
and only be used here.  That's why I don't like it.

Ideally, we should lock the token object cache only
when necessary, instead of locking it throughout a
lengthy operation.  But it will take a lot of work
to do that.  So I am only making this small change
to release the token object cache lock while calling
nssSlot_IsTokenPresent.
(Assignee)

Updated

15 years ago
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → 3.9
(Reporter)

Comment 4

15 years ago
Comment on attachment 128963 [details] [diff] [review]
Proposed patch

I'd prefer a patch where nssSlot_IsTokenPresent would take an additional PRBool
argument, telling it whether to lock or not.

Existing callers would call it with PR_TRUE, this one with PR_FALSE.
This avoids the unlocking/locking of the cache lock 

While reviewing this patch, I noticed that the clear_cache call if the token is
removed has been commented. Why is that ?
Attachment #128963 - Flags: superreview?(jpierre) → superreview-

Comment 5

15 years ago
Comment on attachment 128963 [details] [diff] [review]
Proposed patch

Looking back at the code, I think it would be more straightforward to reorder
things to avoid this.  I think the call to nssSlot_IsTokenPresent() should be
moved into nssTokenObjectCache_FindObjectsByTemplate, before the cache is
locked.
Attachment #128963 - Flags: review?(bugz) → review-
(Assignee)

Comment 6

15 years ago
Comment on attachment 128963 [details] [diff] [review]
Proposed patch

Ian, thanks for the suggestion.  Your suggestion is not
that easy to implement, however.  If you still remember
how that code works, could you put together a patch?
Otherwise I will do that, but I'll need to spend some
time to study the code first.
(Assignee)

Comment 7

15 years ago
Created attachment 131203 [details] [diff] [review]
Proposed patch v2

Ian, this is my attempt at implementing your suggestion.
Please let me know if this is what you have in mind.

I had to modify the comments, too, because the search_for_objects
function no longer checks if the token is present.

Question: I am grabbing a reference of cache->token's slot
and pass that to nssSlot_IsTokenPresent.  I am wondering if
it is necessary to grab that reference.  Can I pass
cache->token->slot directly to nssSlot_IsTokenPresent or
call nssToken_IsPresent (which is an NSS_3_4_CODE function)?
Attachment #128963 - Attachment is obsolete: true
(Assignee)

Comment 8

15 years ago
Comment on attachment 131203 [details] [diff] [review]
Proposed patch v2

Please review this patch.  I thought it'd be a good idea
to rename search_for_objects to better reflect what it
does now.  Any suggestion?
Attachment #131203 - Flags: review?(bugz)

Comment 9

15 years ago
Comment on attachment 131203 [details] [diff] [review]
Proposed patch v2

this is what I had in mind.
Attachment #131203 - Flags: review?(bugz) → review+
(Assignee)

Comment 10

15 years ago
Created attachment 131289 [details] [diff] [review]
Proposed patch v2.1

Differences from the previous patch.

1. I added a new function token_is_present for the nssToken_GetSlot,
nssSlot_IsTokenPresent, nssSlot_Destroy sequence.  This simplifies
the code.

2. I found that nssTokenObjectCache_FindObjectsByTemplate stores
PR_SUCCESS in *statusOpt in the original code, so I made the new
code do that, too.
Attachment #131203 - Attachment is obsolete: true
(Assignee)

Comment 11

15 years ago
Created attachment 131353 [details] [diff] [review]
Proposed patch v2.2

The only difference from v2.1 is a new comment that says
token_is_present must not be called with cache->lock locked.
(Assignee)

Updated

15 years ago
Attachment #131289 - Attachment is obsolete: true
(Assignee)

Comment 12

15 years ago
I've checked in the patch on the NSS trunk (3.9).

Ian, I'd still like you to review patch v2.2 on two points.

1. Verify that in nssTokenObjectCache_FindObjectsByTemplate,
it is correct to set status to *PR_SUCCESS* (with rvObjects
initialized to NULL) if the token is NOT present.

2. What's Stan's naming convention for static functions that
are called *without* locking the object's lock?  The all
lowercase function name that I'm using (token_is_present)
seems to be intended for static functions that are called
with the object locked.  I am worried that I violated this
naming convention.
Status: ASSIGNED → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED
(Assignee)

Comment 13

14 years ago
Ian, could you answer my second question in comment 12?
Thanks.
You need to log in before you can comment on or make changes to this bug.