Last Comment Bug 204549 - NSS crashes in cert cache during client auth stress test
: NSS crashes in cert cache during client auth stress test
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: 3.8
: x86 Windows XP
: P1 normal (vote)
: 3.9
Assigned To: Nelson Bolyard (seldom reads bugmail)
: Bishakha Banerjee
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2003-05-05 19:20 PDT by Julien Pierre
Modified: 2003-09-24 13:36 PDT (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Proposed patch v1 (2.54 KB, patch)
2003-09-19 18:54 PDT, Nelson Bolyard (seldom reads bugmail)
wtc: superreview+
Details | Diff | Splinter Review
patch v2 (4.28 KB, patch)
2003-09-22 14:34 PDT, Nelson Bolyard (seldom reads bugmail)
wtc: superreview-
Details | Diff | Splinter Review
find_objects_by_template is not setting *statusOpt before one return statement (528 bytes, patch)
2003-09-22 17:44 PDT, Wan-Teh Chang
nelson: review+
Details | Diff | Splinter Review
patch v3 (4.73 KB, patch)
2003-09-23 11:07 PDT, Nelson Bolyard (seldom reads bugmail)
wtc: superreview+
Details | Diff | Splinter Review

Description Julien Pierre 2003-05-05 19:20:30 PDT
I got an access violation running my client auth test. This is on the client
auth, in strsclnt.

The line is :

	cache->objects[cachedCerts][i] = create_cert(objects[i], &status);

cachedCerts is a macro equal to zero .
cache->[objects] contains 3 objects that are all NULL. So the assignment fails.

The stack is as follows :

get_token_certs_for_cache(nssTokenObjectCacheStr * 0x00526088) line 876 + 31 bytes
nssTokenObjectCache_FindObjectsByTemplate(nssTokenObjectCacheStr * 0x00526088,
unsigned long 1, CK_ATTRIBUTE * 0x0423fb60, unsigned long 3, unsigned int 0, int
* 0x0423fb2c) line 1175 + 9 bytes
find_objects_by_template(NSSTokenStr * 0x0052e340, nssSessionStr * 0x00000000,
CK_ATTRIBUTE * 0x0423fb60, unsigned long 3, unsigned int 0, int * 0x0423fbcc)
line 545 + 32 bytes
nssToken_FindCertificatesByNickname(NSSTokenStr * 0x0052e340, nssSessionStr *
0x00000000, char * 0x078c8320, int 2, unsigned int 0, int * 0x0423fbcc) line 782
+ 29 bytes
PK11_FindCertFromNickname(char * 0x078c8320, void * 0x00505d30) line 1315 + 25 bytes
CERT_FindUserCertByUsage(NSSTrustDomainStr * 0x005291c0, char * 0x00505d48, int
0, int 0, void * 0x00505d30) line 272 + 13 bytes
FindCertAndKey(cert_and_key * 0x0012ff18) line 850 + 29 bytes
StressClient_GetClientAuthData(void * 0x0012ff18, PRFileDesc * 0x046c5d28,
CERTDistNamesStr * 0x0423fc84, CERTCertificateStr * * 0x04717810,
SECKEYPrivateKeyStr * * 0x04717814) line 914 + 9 bytes
ssl3_HandleCertificateRequest(sslSocketStr * 0x046bd980, unsigned char *
0x078a40ac, unsigned int 0) line 5122 + 48 bytes
ssl3_HandleHandshakeMessage(sslSocketStr * 0x046bd980, unsigned char *
0x078a3fcc, unsigned int 224) line 8009 + 17 bytes
ssl3_HandleHandshake(sslSocketStr * 0x046bd980, sslBufferStr * 0x046bdb28) line
8109 + 25 bytes
ssl3_HandleRecord(sslSocketStr * 0x046bd980, SSL3Ciphertext * 0x0423fdfc,
sslBufferStr * 0x046bdb28) line 8378 + 13 bytes
ssl3_GatherCompleteHandshake(sslSocketStr * 0x046bd980, int 0) line 203 + 22 bytes
ssl_GatherRecord1stHandshake(sslSocketStr * 0x046bd980) line 1260 + 11 bytes
ssl_Do1stHandshake(sslSocketStr * 0x046bd980) line 149 + 13 bytes
ssl_SecureSend(sslSocketStr * 0x046bd980, const unsigned char * 0x004093d0, int
21, int 0) line 1028 + 9 bytes
ssl_SecureWrite(sslSocketStr * 0x046bd980, const unsigned char * 0x004093d0, int
21) line 1062 + 19 bytes
ssl_Write(PRFileDesc * 0x046c5d28, const void * 0x004093d0, int 21) line 1282 +
21 bytes
PR_Write(PRFileDesc * 0x046c5d28, const void * 0x004093d0, int 21) line 143 + 20
bytes
handle_connection(PRFileDesc * 0x046c5d28, int 707) line 666 + 29 bytes
do_connects(void * 0x0012fe98, void * 0x00534240, int 707) line 793 + 13 bytes
thread_wrapper(void * 0x004123a0) line 374 + 26 bytes
_PR_NativeRunThread(void * 0x04656a20) line 433 + 13 bytes
MSVCRT! 77c37fb8()
KERNEL32! 77e7d33b()
Comment 1 Julien Pierre 2003-05-05 19:25:32 PDT
I forgot to mention that this crash happened as I pulled out the smartcard in
the middle of the test.
Comment 2 Julien Pierre 2003-05-05 20:26:18 PDT
I got another very similar crash at line 887 of the same devutil.c file .

	for (j=0; j<i; j++) {
	    /* sigh */
	    nssToken_AddRef(cache->objects[cachedCerts][i]->object->token);
	    nssArena_Destroy(cache->objects[cachedCerts][i]->arena);
        }


The crash is on the "nssToken_AddRef".
I notice that the "j" index is not used anywhere in the body of the loop. This
is odd. But even if it was, I believe it would still crash, as all the
cache->objects are NULL pointers.
Comment 3 Nelson Bolyard (seldom reads bugmail) 2003-05-13 20:51:25 PDT
Julien, please add directions on how to reproduce, including specific hardware
if that is relevant.
Comment 4 Wan-Teh Chang 2003-05-14 10:27:52 PDT
Assigned the bug to Nelson.

Something is wrong with the for loop in comment 2.
At least nssArena_Destroy should not be called
repeatedly on the same arena.
Comment 5 Julien Pierre 2003-05-14 14:33:17 PDT
Nelson,

This crash was very hard to reproduce. You need to run a strsclnt test against
an SSL server using a smartcard, and then pull out the smartcard in the middle.
It may take a few dozen attempts to reproduce this particular crash.
Comment 6 Wan-Teh Chang 2003-07-30 18:33:28 PDT
I took a look at the two crashes reported in this bug.
The condition that caused the second crash (reported in
comment 2) is just absurd -- if all the cache->objects
are NULL pointers, we should have crashed earlier in that
function.  This leads me to believe that we are bitten
by the fact that access to token->cache, in particular
the creation of token->cache->objects[cachedCerts], is
not thread safe.  Also, some of the functions we call
may return NULL while setting the 'status' output argument
to PR_SUCCESS, or not setting it at all.  So if 'status'
is PR_SUCCESS, 'objects' may be NULL, and create_object_array
will return NULL, and cache->objects[cachedCerts] will
be NULL.
Comment 7 Ian McGreer 2003-07-30 21:27:11 PDT
I agree with Wan-Teh.  These crashes, and the one seen in bug 214535, are likely
due to the fact that the creation and destruction of the token object cache are
not safe.  I guess I didn't account for multithreaded apps with removable tokens :(

I suppose the safest solution is to use the default slot lock from PK11SlotInfo
to protect those operations.
Comment 8 Wan-Teh Chang 2003-07-31 13:38:59 PDT
I looked at this further and found that I was wrong.
Access to token->cache *is* thread-safe -- it is
protected by token->cache->lock.  That lock is
acquired in nssTokenObjectCache_FindObjectsByTemplate
before it calls get_token_certs_for_cache.  I missed
that PZ_Lock(cache->lock) call earlier.

I will look into the first crash.  I cannot explain
the second crash now.
Comment 9 Nelson Bolyard (seldom reads bugmail) 2003-09-19 18:50:35 PDT
I believe both of these crashes were caused by undetected memory allocation
failures.  The functions create_object_array and create_object both had paths
that would set the "status" output variable to PR_SUCCESS (or simply leave it
unchanged) when a memory allocation failure occured.  This caused the caller
to store a NULL pointer buf not detect failure, since the status variable 
contained PR_SUCCESS.  

I will shortly attach a patch that I believe fixes these crashes.

Comment 10 Nelson Bolyard (seldom reads bugmail) 2003-09-19 18:54:35 PDT
Created attachment 131782 [details] [diff] [review]
Proposed patch v1

This patch also does a wee bit of cleanup in the affected functions.
Comment 11 Wan-Teh Chang 2003-09-19 20:05:49 PDT
Comment on attachment 131782 [details] [diff] [review]
Proposed patch v1

> 	PRUint32 j;
> 	for (j=0; j<i; j++) {
> 	    /* sigh */
>-	    nssToken_AddRef(cache->objects[cachedCerts][i]->object->token);
>-	    nssArena_Destroy(cache->objects[cachedCerts][i]->arena);
>+	    nssToken_AddRef(cache->objects[cachedCerts][j]->object->token);
>+	    nssArena_Destroy(cache->objects[cachedCerts][j]->arena);
> 	}

There are two other for loops like this, for Trust and CRLs.  We
should fix them, too.
Comment 12 Wan-Teh Chang 2003-09-19 20:20:54 PDT
Nelson, you wrote:
> The functions create_object_array and create_object both had paths
> that would set the "status" output variable to PR_SUCCESS (or simply leave it
> unchanged) when a memory allocation failure occured.

By PR_SUCCESS, you really meant PR_FALSE, which has the same value
(0) as PR_SUCCESS, right?

Comment 13 Nelson Bolyard (seldom reads bugmail) 2003-09-22 14:34:18 PDT
Created attachment 131910 [details] [diff] [review]
patch v2

This patch is a superset of patch v1.  It applies the same fixes to 
the trust and CRL caches as patch v1 applied to the cert cache. 

In answer to the question in comment 12, I should have written that the 
code returned a value that was interpreted by the caller as PR_SUCCESS.  
Yes, the code that returned PR_FALSE was one of the problems.
Comment 14 Nelson Bolyard (seldom reads bugmail) 2003-09-22 14:35:02 PDT
Comment on attachment 131910 [details] [diff] [review]
patch v2

Please review this larger patch.
Comment 15 Wan-Teh Chang 2003-09-22 17:22:12 PDT
Comment on attachment 131910 [details] [diff] [review]
patch v2

I noticed a problem I missed when I reviewed the previous
version of this patch.

In create_object, we now goto loser if the nssArena_Create()
call fails.  But under the loser label, we call
nssArena_Destroy(arena) without testing arena.	If
nssArena_Create() fails, arena is NULL, and
nssArena_Destroy(arena) will crash.

This is the only problem with the patch.  I verified that
create_object_array and create_object always set *status
before they return.

One minor optimization: in create_object_array,

    } else if (*numObjects > 0) {

can be replaced by simply

    } else {

because *numObjects cannot be zero at that point with your
patch applied.
Comment 16 Wan-Teh Chang 2003-09-22 17:44:05 PDT
Created attachment 131922 [details] [diff] [review]
find_objects_by_template is not setting *statusOpt before one return statement

I found this bug while reviewing the code near Nelson's patch.
Comment 17 Wan-Teh Chang 2003-09-22 17:44:35 PDT
Comment on attachment 131922 [details] [diff] [review]
find_objects_by_template is not setting *statusOpt before one return statement

Please review this patch.
Comment 18 Nelson Bolyard (seldom reads bugmail) 2003-09-23 11:07:35 PDT
Created attachment 131976 [details] [diff] [review]
patch v3

Addresses comment 15 above.
Comment 19 Nelson Bolyard (seldom reads bugmail) 2003-09-23 11:13:32 PDT
Comment on attachment 131922 [details] [diff] [review]
find_objects_by_template is not setting *statusOpt before one return statement

r=MisterSSL
Comment 20 Wan-Teh Chang 2003-09-23 13:25:21 PDT
Comment on attachment 131976 [details] [diff] [review]
patch v3

>-    if (*numObjects == MAX_LOCAL_CACHE_OBJECTS) {
>+    if (*numObjects >= MAX_LOCAL_CACHE_OBJECTS) {

Actually *numObjects cannot be greater than
MAX_LOCAL_CACHE_OBJECTS because we pass
MAX_LOCAL_CACHE_OBJECTS as the third argument to
the nssToken_FindXXX() functions, which construct
the objects arrays for this function.  But this is
good defensive programming.

r=wtc.
Comment 21 Wan-Teh Chang 2003-09-23 13:35:05 PDT
Comment on attachment 131922 [details] [diff] [review]
find_objects_by_template is not setting *statusOpt before one return statement

I checked in this patch on the NSS trunk.
Comment 22 Nelson Bolyard (seldom reads bugmail) 2003-09-24 13:36:35 PDT
Patch v3 checked in.  Resolved/Fixed.

Note You need to log in before you can comment on or make changes to this bug.