Closed
Bug 204549
Opened 21 years ago
Closed 21 years ago
NSS crashes in cert cache during client auth stress test
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
3.9
People
(Reporter: julien.pierre, Assigned: nelson)
Details
Attachments
(2 files, 2 obsolete files)
528 bytes,
patch
|
nelson
:
review+
|
Details | Diff | Splinter Review |
4.73 KB,
patch
|
wtc
:
superreview+
|
Details | Diff | Splinter Review |
I got an access violation running my client auth test. This is on the client auth, in strsclnt. The line is : cache->objects[cachedCerts][i] = create_cert(objects[i], &status); cachedCerts is a macro equal to zero . cache->[objects] contains 3 objects that are all NULL. So the assignment fails. The stack is as follows : get_token_certs_for_cache(nssTokenObjectCacheStr * 0x00526088) line 876 + 31 bytes nssTokenObjectCache_FindObjectsByTemplate(nssTokenObjectCacheStr * 0x00526088, unsigned long 1, CK_ATTRIBUTE * 0x0423fb60, unsigned long 3, unsigned int 0, int * 0x0423fb2c) line 1175 + 9 bytes find_objects_by_template(NSSTokenStr * 0x0052e340, nssSessionStr * 0x00000000, CK_ATTRIBUTE * 0x0423fb60, unsigned long 3, unsigned int 0, int * 0x0423fbcc) line 545 + 32 bytes nssToken_FindCertificatesByNickname(NSSTokenStr * 0x0052e340, nssSessionStr * 0x00000000, char * 0x078c8320, int 2, unsigned int 0, int * 0x0423fbcc) line 782 + 29 bytes PK11_FindCertFromNickname(char * 0x078c8320, void * 0x00505d30) line 1315 + 25 bytes CERT_FindUserCertByUsage(NSSTrustDomainStr * 0x005291c0, char * 0x00505d48, int 0, int 0, void * 0x00505d30) line 272 + 13 bytes FindCertAndKey(cert_and_key * 0x0012ff18) line 850 + 29 bytes StressClient_GetClientAuthData(void * 0x0012ff18, PRFileDesc * 0x046c5d28, CERTDistNamesStr * 0x0423fc84, CERTCertificateStr * * 0x04717810, SECKEYPrivateKeyStr * * 0x04717814) line 914 + 9 bytes ssl3_HandleCertificateRequest(sslSocketStr * 0x046bd980, unsigned char * 0x078a40ac, unsigned int 0) line 5122 + 48 bytes ssl3_HandleHandshakeMessage(sslSocketStr * 0x046bd980, unsigned char * 0x078a3fcc, unsigned int 224) line 8009 + 17 bytes ssl3_HandleHandshake(sslSocketStr * 0x046bd980, sslBufferStr * 0x046bdb28) line 8109 + 25 bytes ssl3_HandleRecord(sslSocketStr * 0x046bd980, SSL3Ciphertext * 0x0423fdfc, sslBufferStr * 0x046bdb28) line 8378 + 13 bytes ssl3_GatherCompleteHandshake(sslSocketStr * 0x046bd980, int 0) line 203 + 22 bytes ssl_GatherRecord1stHandshake(sslSocketStr * 0x046bd980) line 1260 + 11 bytes ssl_Do1stHandshake(sslSocketStr * 0x046bd980) line 149 + 13 bytes ssl_SecureSend(sslSocketStr * 0x046bd980, const unsigned char * 0x004093d0, int 21, int 0) line 1028 + 9 bytes ssl_SecureWrite(sslSocketStr * 0x046bd980, const unsigned char * 0x004093d0, int 21) line 1062 + 19 bytes ssl_Write(PRFileDesc * 0x046c5d28, const void * 0x004093d0, int 21) line 1282 + 21 bytes PR_Write(PRFileDesc * 0x046c5d28, const void * 0x004093d0, int 21) line 143 + 20 bytes handle_connection(PRFileDesc * 0x046c5d28, int 707) line 666 + 29 bytes do_connects(void * 0x0012fe98, void * 0x00534240, int 707) line 793 + 13 bytes thread_wrapper(void * 0x004123a0) line 374 + 26 bytes _PR_NativeRunThread(void * 0x04656a20) line 433 + 13 bytes MSVCRT! 77c37fb8() KERNEL32! 77e7d33b()
Reporter | ||
Comment 1•21 years ago
|
||
I forgot to mention that this crash happened as I pulled out the smartcard in the middle of the test.
Reporter | ||
Comment 2•21 years ago
|
||
I got another very similar crash at line 887 of the same devutil.c file . for (j=0; j<i; j++) { /* sigh */ nssToken_AddRef(cache->objects[cachedCerts][i]->object->token); nssArena_Destroy(cache->objects[cachedCerts][i]->arena); } The crash is on the "nssToken_AddRef". I notice that the "j" index is not used anywhere in the body of the loop. This is odd. But even if it was, I believe it would still crash, as all the cache->objects are NULL pointers.
Reporter | ||
Updated•21 years ago
|
Priority: -- → P1
Assignee | ||
Comment 3•21 years ago
|
||
Julien, please add directions on how to reproduce, including specific hardware if that is relevant.
Comment 4•21 years ago
|
||
Assigned the bug to Nelson. Something is wrong with the for loop in comment 2. At least nssArena_Destroy should not be called repeatedly on the same arena.
Assignee: wtc → nelsonb
Target Milestone: --- → 3.9
Reporter | ||
Comment 5•21 years ago
|
||
Nelson, This crash was very hard to reproduce. You need to run a strsclnt test against an SSL server using a smartcard, and then pull out the smartcard in the middle. It may take a few dozen attempts to reproduce this particular crash.
Comment 6•21 years ago
|
||
I took a look at the two crashes reported in this bug. The condition that caused the second crash (reported in comment 2) is just absurd -- if all the cache->objects are NULL pointers, we should have crashed earlier in that function. This leads me to believe that we are bitten by the fact that access to token->cache, in particular the creation of token->cache->objects[cachedCerts], is not thread safe. Also, some of the functions we call may return NULL while setting the 'status' output argument to PR_SUCCESS, or not setting it at all. So if 'status' is PR_SUCCESS, 'objects' may be NULL, and create_object_array will return NULL, and cache->objects[cachedCerts] will be NULL.
Comment 7•21 years ago
|
||
I agree with Wan-Teh. These crashes, and the one seen in bug 214535, are likely due to the fact that the creation and destruction of the token object cache are not safe. I guess I didn't account for multithreaded apps with removable tokens :( I suppose the safest solution is to use the default slot lock from PK11SlotInfo to protect those operations.
Comment 8•21 years ago
|
||
I looked at this further and found that I was wrong. Access to token->cache *is* thread-safe -- it is protected by token->cache->lock. That lock is acquired in nssTokenObjectCache_FindObjectsByTemplate before it calls get_token_certs_for_cache. I missed that PZ_Lock(cache->lock) call earlier. I will look into the first crash. I cannot explain the second crash now.
Assignee | ||
Comment 9•21 years ago
|
||
I believe both of these crashes were caused by undetected memory allocation failures. The functions create_object_array and create_object both had paths that would set the "status" output variable to PR_SUCCESS (or simply leave it unchanged) when a memory allocation failure occured. This caused the caller to store a NULL pointer buf not detect failure, since the status variable contained PR_SUCCESS. I will shortly attach a patch that I believe fixes these crashes.
Assignee | ||
Comment 10•21 years ago
|
||
This patch also does a wee bit of cleanup in the affected functions.
Assignee | ||
Updated•21 years ago
|
Attachment #131782 -
Flags: superreview?(wchang0222)
Attachment #131782 -
Flags: review?(jpierre)
Comment 11•21 years ago
|
||
Comment on attachment 131782 [details] [diff] [review] Proposed patch v1 > PRUint32 j; > for (j=0; j<i; j++) { > /* sigh */ >- nssToken_AddRef(cache->objects[cachedCerts][i]->object->token); >- nssArena_Destroy(cache->objects[cachedCerts][i]->arena); >+ nssToken_AddRef(cache->objects[cachedCerts][j]->object->token); >+ nssArena_Destroy(cache->objects[cachedCerts][j]->arena); > } There are two other for loops like this, for Trust and CRLs. We should fix them, too.
Attachment #131782 -
Flags: superreview?(wchang0222) → superreview+
Comment 12•21 years ago
|
||
Nelson, you wrote:
> The functions create_object_array and create_object both had paths
> that would set the "status" output variable to PR_SUCCESS (or simply leave it
> unchanged) when a memory allocation failure occured.
By PR_SUCCESS, you really meant PR_FALSE, which has the same value
(0) as PR_SUCCESS, right?
Assignee | ||
Comment 13•21 years ago
|
||
This patch is a superset of patch v1. It applies the same fixes to the trust and CRL caches as patch v1 applied to the cert cache. In answer to the question in comment 12, I should have written that the code returned a value that was interpreted by the caller as PR_SUCCESS. Yes, the code that returned PR_FALSE was one of the problems.
Assignee | ||
Updated•21 years ago
|
Attachment #131782 -
Attachment is obsolete: true
Assignee | ||
Comment 14•21 years ago
|
||
Comment on attachment 131910 [details] [diff] [review] patch v2 Please review this larger patch.
Attachment #131910 -
Flags: superreview?(wchang0222)
Attachment #131910 -
Flags: review?(jpierre)
Comment 15•21 years ago
|
||
Comment on attachment 131910 [details] [diff] [review] patch v2 I noticed a problem I missed when I reviewed the previous version of this patch. In create_object, we now goto loser if the nssArena_Create() call fails. But under the loser label, we call nssArena_Destroy(arena) without testing arena. If nssArena_Create() fails, arena is NULL, and nssArena_Destroy(arena) will crash. This is the only problem with the patch. I verified that create_object_array and create_object always set *status before they return. One minor optimization: in create_object_array, } else if (*numObjects > 0) { can be replaced by simply } else { because *numObjects cannot be zero at that point with your patch applied.
Attachment #131910 -
Flags: superreview?(wchang0222) → superreview-
Comment 16•21 years ago
|
||
I found this bug while reviewing the code near Nelson's patch.
Comment 17•21 years ago
|
||
Comment on attachment 131922 [details] [diff] [review] find_objects_by_template is not setting *statusOpt before one return statement Please review this patch.
Attachment #131922 -
Flags: review?(MisterSSL)
Assignee | ||
Comment 18•21 years ago
|
||
Addresses comment 15 above.
Attachment #131910 -
Attachment is obsolete: true
Assignee | ||
Updated•21 years ago
|
Attachment #131976 -
Flags: superreview?(wchang0222)
Attachment #131976 -
Flags: review?(jpierre)
Assignee | ||
Comment 19•21 years ago
|
||
Comment on attachment 131922 [details] [diff] [review] find_objects_by_template is not setting *statusOpt before one return statement r=MisterSSL
Attachment #131922 -
Flags: review?(MisterSSL) → review+
Comment 20•21 years ago
|
||
Comment on attachment 131976 [details] [diff] [review] patch v3 >- if (*numObjects == MAX_LOCAL_CACHE_OBJECTS) { >+ if (*numObjects >= MAX_LOCAL_CACHE_OBJECTS) { Actually *numObjects cannot be greater than MAX_LOCAL_CACHE_OBJECTS because we pass MAX_LOCAL_CACHE_OBJECTS as the third argument to the nssToken_FindXXX() functions, which construct the objects arrays for this function. But this is good defensive programming. r=wtc.
Attachment #131976 -
Flags: superreview?(wchang0222) → superreview+
Comment 21•21 years ago
|
||
Comment on attachment 131922 [details] [diff] [review] find_objects_by_template is not setting *statusOpt before one return statement I checked in this patch on the NSS trunk.
Assignee | ||
Comment 22•21 years ago
|
||
Patch v3 checked in. Resolved/Fixed.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Reporter | ||
Updated•21 years ago
|
Attachment #131976 -
Flags: review?(jpierre)
Reporter | ||
Updated•21 years ago
|
Attachment #131910 -
Flags: review?(jpierre)
Reporter | ||
Updated•21 years ago
|
Attachment #131782 -
Flags: review?(jpierre)
You need to log in
before you can comment on or make changes to this bug.
Description
•