NSS crashes in cert cache during client auth stress test

RESOLVED FIXED in 3.9

Status

NSS
Libraries
P1
normal
RESOLVED FIXED
14 years ago
14 years ago

People

(Reporter: Julien Pierre, Assigned: Nelson Bolyard (seldom reads bugmail))

Tracking

x86
Windows XP

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

14 years ago
I got an access violation running my client auth test. This is on the client
auth, in strsclnt.

The line is :

	cache->objects[cachedCerts][i] = create_cert(objects[i], &status);

cachedCerts is a macro equal to zero .
cache->[objects] contains 3 objects that are all NULL. So the assignment fails.

The stack is as follows :

get_token_certs_for_cache(nssTokenObjectCacheStr * 0x00526088) line 876 + 31 bytes
nssTokenObjectCache_FindObjectsByTemplate(nssTokenObjectCacheStr * 0x00526088,
unsigned long 1, CK_ATTRIBUTE * 0x0423fb60, unsigned long 3, unsigned int 0, int
* 0x0423fb2c) line 1175 + 9 bytes
find_objects_by_template(NSSTokenStr * 0x0052e340, nssSessionStr * 0x00000000,
CK_ATTRIBUTE * 0x0423fb60, unsigned long 3, unsigned int 0, int * 0x0423fbcc)
line 545 + 32 bytes
nssToken_FindCertificatesByNickname(NSSTokenStr * 0x0052e340, nssSessionStr *
0x00000000, char * 0x078c8320, int 2, unsigned int 0, int * 0x0423fbcc) line 782
+ 29 bytes
PK11_FindCertFromNickname(char * 0x078c8320, void * 0x00505d30) line 1315 + 25 bytes
CERT_FindUserCertByUsage(NSSTrustDomainStr * 0x005291c0, char * 0x00505d48, int
0, int 0, void * 0x00505d30) line 272 + 13 bytes
FindCertAndKey(cert_and_key * 0x0012ff18) line 850 + 29 bytes
StressClient_GetClientAuthData(void * 0x0012ff18, PRFileDesc * 0x046c5d28,
CERTDistNamesStr * 0x0423fc84, CERTCertificateStr * * 0x04717810,
SECKEYPrivateKeyStr * * 0x04717814) line 914 + 9 bytes
ssl3_HandleCertificateRequest(sslSocketStr * 0x046bd980, unsigned char *
0x078a40ac, unsigned int 0) line 5122 + 48 bytes
ssl3_HandleHandshakeMessage(sslSocketStr * 0x046bd980, unsigned char *
0x078a3fcc, unsigned int 224) line 8009 + 17 bytes
ssl3_HandleHandshake(sslSocketStr * 0x046bd980, sslBufferStr * 0x046bdb28) line
8109 + 25 bytes
ssl3_HandleRecord(sslSocketStr * 0x046bd980, SSL3Ciphertext * 0x0423fdfc,
sslBufferStr * 0x046bdb28) line 8378 + 13 bytes
ssl3_GatherCompleteHandshake(sslSocketStr * 0x046bd980, int 0) line 203 + 22 bytes
ssl_GatherRecord1stHandshake(sslSocketStr * 0x046bd980) line 1260 + 11 bytes
ssl_Do1stHandshake(sslSocketStr * 0x046bd980) line 149 + 13 bytes
ssl_SecureSend(sslSocketStr * 0x046bd980, const unsigned char * 0x004093d0, int
21, int 0) line 1028 + 9 bytes
ssl_SecureWrite(sslSocketStr * 0x046bd980, const unsigned char * 0x004093d0, int
21) line 1062 + 19 bytes
ssl_Write(PRFileDesc * 0x046c5d28, const void * 0x004093d0, int 21) line 1282 +
21 bytes
PR_Write(PRFileDesc * 0x046c5d28, const void * 0x004093d0, int 21) line 143 + 20
bytes
handle_connection(PRFileDesc * 0x046c5d28, int 707) line 666 + 29 bytes
do_connects(void * 0x0012fe98, void * 0x00534240, int 707) line 793 + 13 bytes
thread_wrapper(void * 0x004123a0) line 374 + 26 bytes
_PR_NativeRunThread(void * 0x04656a20) line 433 + 13 bytes
MSVCRT! 77c37fb8()
KERNEL32! 77e7d33b()
(Reporter)

Comment 1

14 years ago
I forgot to mention that this crash happened as I pulled out the smartcard in
the middle of the test.
(Reporter)

Comment 2

14 years ago
I got another very similar crash at line 887 of the same devutil.c file .

	for (j=0; j<i; j++) {
	    /* sigh */
	    nssToken_AddRef(cache->objects[cachedCerts][i]->object->token);
	    nssArena_Destroy(cache->objects[cachedCerts][i]->arena);
        }


The crash is on the "nssToken_AddRef".
I notice that the "j" index is not used anywhere in the body of the loop. This
is odd. But even if it was, I believe it would still crash, as all the
cache->objects are NULL pointers.
(Reporter)

Updated

14 years ago
Priority: -- → P1
(Assignee)

Comment 3

14 years ago
Julien, please add directions on how to reproduce, including specific hardware
if that is relevant.

Comment 4

14 years ago
Assigned the bug to Nelson.

Something is wrong with the for loop in comment 2.
At least nssArena_Destroy should not be called
repeatedly on the same arena.
Assignee: wtc → nelsonb
Target Milestone: --- → 3.9
(Reporter)

Comment 5

14 years ago
Nelson,

This crash was very hard to reproduce. You need to run a strsclnt test against
an SSL server using a smartcard, and then pull out the smartcard in the middle.
It may take a few dozen attempts to reproduce this particular crash.

Comment 6

14 years ago
I took a look at the two crashes reported in this bug.
The condition that caused the second crash (reported in
comment 2) is just absurd -- if all the cache->objects
are NULL pointers, we should have crashed earlier in that
function.  This leads me to believe that we are bitten
by the fact that access to token->cache, in particular
the creation of token->cache->objects[cachedCerts], is
not thread safe.  Also, some of the functions we call
may return NULL while setting the 'status' output argument
to PR_SUCCESS, or not setting it at all.  So if 'status'
is PR_SUCCESS, 'objects' may be NULL, and create_object_array
will return NULL, and cache->objects[cachedCerts] will
be NULL.

Comment 7

14 years ago
I agree with Wan-Teh.  These crashes, and the one seen in bug 214535, are likely
due to the fact that the creation and destruction of the token object cache are
not safe.  I guess I didn't account for multithreaded apps with removable tokens :(

I suppose the safest solution is to use the default slot lock from PK11SlotInfo
to protect those operations.

Comment 8

14 years ago
I looked at this further and found that I was wrong.
Access to token->cache *is* thread-safe -- it is
protected by token->cache->lock.  That lock is
acquired in nssTokenObjectCache_FindObjectsByTemplate
before it calls get_token_certs_for_cache.  I missed
that PZ_Lock(cache->lock) call earlier.

I will look into the first crash.  I cannot explain
the second crash now.
(Assignee)

Comment 9

14 years ago
I believe both of these crashes were caused by undetected memory allocation
failures.  The functions create_object_array and create_object both had paths
that would set the "status" output variable to PR_SUCCESS (or simply leave it
unchanged) when a memory allocation failure occured.  This caused the caller
to store a NULL pointer buf not detect failure, since the status variable 
contained PR_SUCCESS.  

I will shortly attach a patch that I believe fixes these crashes.

(Assignee)

Comment 10

14 years ago
Created attachment 131782 [details] [diff] [review]
Proposed patch v1

This patch also does a wee bit of cleanup in the affected functions.
(Assignee)

Updated

14 years ago
Attachment #131782 - Flags: superreview?(wchang0222)
Attachment #131782 - Flags: review?(jpierre)

Comment 11

14 years ago
Comment on attachment 131782 [details] [diff] [review]
Proposed patch v1

> 	PRUint32 j;
> 	for (j=0; j<i; j++) {
> 	    /* sigh */
>-	    nssToken_AddRef(cache->objects[cachedCerts][i]->object->token);
>-	    nssArena_Destroy(cache->objects[cachedCerts][i]->arena);
>+	    nssToken_AddRef(cache->objects[cachedCerts][j]->object->token);
>+	    nssArena_Destroy(cache->objects[cachedCerts][j]->arena);
> 	}

There are two other for loops like this, for Trust and CRLs.  We
should fix them, too.
Attachment #131782 - Flags: superreview?(wchang0222) → superreview+

Comment 12

14 years ago
Nelson, you wrote:
> The functions create_object_array and create_object both had paths
> that would set the "status" output variable to PR_SUCCESS (or simply leave it
> unchanged) when a memory allocation failure occured.

By PR_SUCCESS, you really meant PR_FALSE, which has the same value
(0) as PR_SUCCESS, right?

(Assignee)

Comment 13

14 years ago
Created attachment 131910 [details] [diff] [review]
patch v2

This patch is a superset of patch v1.  It applies the same fixes to 
the trust and CRL caches as patch v1 applied to the cert cache. 

In answer to the question in comment 12, I should have written that the 
code returned a value that was interpreted by the caller as PR_SUCCESS.  
Yes, the code that returned PR_FALSE was one of the problems.
(Assignee)

Updated

14 years ago
Attachment #131782 - Attachment is obsolete: true
(Assignee)

Comment 14

14 years ago
Comment on attachment 131910 [details] [diff] [review]
patch v2

Please review this larger patch.
Attachment #131910 - Flags: superreview?(wchang0222)
Attachment #131910 - Flags: review?(jpierre)

Comment 15

14 years ago
Comment on attachment 131910 [details] [diff] [review]
patch v2

I noticed a problem I missed when I reviewed the previous
version of this patch.

In create_object, we now goto loser if the nssArena_Create()
call fails.  But under the loser label, we call
nssArena_Destroy(arena) without testing arena.	If
nssArena_Create() fails, arena is NULL, and
nssArena_Destroy(arena) will crash.

This is the only problem with the patch.  I verified that
create_object_array and create_object always set *status
before they return.

One minor optimization: in create_object_array,

    } else if (*numObjects > 0) {

can be replaced by simply

    } else {

because *numObjects cannot be zero at that point with your
patch applied.
Attachment #131910 - Flags: superreview?(wchang0222) → superreview-

Comment 16

14 years ago
Created attachment 131922 [details] [diff] [review]
find_objects_by_template is not setting *statusOpt before one return statement

I found this bug while reviewing the code near Nelson's patch.

Comment 17

14 years ago
Comment on attachment 131922 [details] [diff] [review]
find_objects_by_template is not setting *statusOpt before one return statement

Please review this patch.
Attachment #131922 - Flags: review?(MisterSSL)
(Assignee)

Comment 18

14 years ago
Created attachment 131976 [details] [diff] [review]
patch v3

Addresses comment 15 above.
Attachment #131910 - Attachment is obsolete: true
(Assignee)

Updated

14 years ago
Attachment #131976 - Flags: superreview?(wchang0222)
Attachment #131976 - Flags: review?(jpierre)
(Assignee)

Comment 19

14 years ago
Comment on attachment 131922 [details] [diff] [review]
find_objects_by_template is not setting *statusOpt before one return statement

r=MisterSSL
Attachment #131922 - Flags: review?(MisterSSL) → review+

Comment 20

14 years ago
Comment on attachment 131976 [details] [diff] [review]
patch v3

>-    if (*numObjects == MAX_LOCAL_CACHE_OBJECTS) {
>+    if (*numObjects >= MAX_LOCAL_CACHE_OBJECTS) {

Actually *numObjects cannot be greater than
MAX_LOCAL_CACHE_OBJECTS because we pass
MAX_LOCAL_CACHE_OBJECTS as the third argument to
the nssToken_FindXXX() functions, which construct
the objects arrays for this function.  But this is
good defensive programming.

r=wtc.
Attachment #131976 - Flags: superreview?(wchang0222) → superreview+

Comment 21

14 years ago
Comment on attachment 131922 [details] [diff] [review]
find_objects_by_template is not setting *statusOpt before one return statement

I checked in this patch on the NSS trunk.
(Assignee)

Comment 22

14 years ago
Patch v3 checked in.  Resolved/Fixed.
Status: NEW → RESOLVED
Last Resolved: 14 years ago
Resolution: --- → FIXED
(Reporter)

Updated

14 years ago
Attachment #131976 - Flags: review?(jpierre)
(Reporter)

Updated

14 years ago
Attachment #131910 - Flags: review?(jpierre)
(Reporter)

Updated

14 years ago
Attachment #131782 - Flags: review?(jpierre)
You need to log in before you can comment on or make changes to this bug.