Open
Bug 943144
Opened 11 years ago
Updated 1 year ago
Deadlock in the trust domain lock and the object lock
Categories
(NSS :: Libraries, defect, P3)
Tracking
(Not tracked)
ASSIGNED
People
(Reporter: elio.maldonado.batiz, Assigned: rrelyea)
References
Details
Attachments
(1 file, 2 obsolete files)
4.27 KB,
patch
|
ryan.sleevi
:
review-
|
Details | Diff | Splinter Review |
A Red Hat customer reported experiencing about once a day what they referred to as a hang. Meaning that their client will make a connection to their ldap server and it will not reply. The ldapsearch did make a bind but never returned.
They waited 10 minutes before killing it. During this time Directory Server made no log updates. The only way to fix this was either to restart ns-slapd or reboot the server.
Under certain conditions, when server is processing multiple outgoing replication or windows sync agreements using TLS/SSL, and processing incoming client requests that use TLS/SSL and Simple Paged Results, the server will become unresponsive to new incoming client requests.
Comment 1•11 years ago
|
||
Do you have more info that establishes it as a trust domain lock?
Reporter | ||
Comment 2•11 years ago
|
||
Patch by Bob Relyea, minimally to the latest sources.
Reporter | ||
Comment 3•11 years ago
|
||
(In reply to Ryan Sleevi from comment #1)
> Do you have more info that establishes it as a trust domain lock?
Adding Bob and Rich which did the analysis and tested the fix.
Reporter | ||
Updated•11 years ago
|
Reporter | ||
Comment 4•11 years ago
|
||
Unfortunately only Red Hat folks can currently access this link.
Comment 5•11 years ago
|
||
Comment on attachment 8338147 [details] [diff] [review]
fix-deadlock.patch
Review of attachment 8338147 [details] [diff] [review]:
-----------------------------------------------------------------
::: lib/pki/tdcache.c
@@ +394,2 @@
> static void
> +remove_token_certs(NSSCertificate *c, struct token_cert_dtor *dtor)
style: This function seems to have obtained a stray space at the EOL
@@ +454,5 @@
> PZ_Lock(td->cache->lock);
> + nssHash_Iterate(td->cache->issuerAndSN, cert_iter, (void *)certList);
> + PZ_Unlock(td->cache->lock);
> +
> + /* find the certs that match this token without olding the td cache lock */
typo: olding -> holding
@@ +455,5 @@
> + nssHash_Iterate(td->cache->issuerAndSN, cert_iter, (void *)certList);
> + PZ_Unlock(td->cache->lock);
> +
> + /* find the certs that match this token without olding the td cache lock */
> + iter=nssList_CreateIterator(certList);
style: The rest of this function uses " = " (whitespace around assignment). It would be good to be consistent here.
@@ +462,5 @@
> + }
> + for (c = (NSSCertificate *)nssListIterator_Start(iter);
> + c != (NSSCertificate *)NULL;
> + c = (NSSCertificate *)nssListIterator_Next(iter)) {
> + remove_token_certs( c, &dtor);
style: seemingly unnecessary whitespace between "( c"
BUG? |certList| holds a reference to the certificate, per line 391. This ref is undone if and only if object->instances[i]->token == dtor->token (in remove_token_certs). If it doesn't, this reference is leaked, is it not?
Further, because of the added ref, it seems like you will never hit the condition in line 476, because td->cache->issuerAndSN will still be holding a reference (because the reference was copied when creating certList).
@@ +487,5 @@
> }
> }
> +
> + nspr_rv = PR_SUCCESS;
> +loser:
Don't we typically have an additional line break before loser labels?
Attachment #8338147 -
Flags: review-
Reporter | ||
Updated•11 years ago
|
Assignee: nobody → rrelyea
Reporter | ||
Updated•11 years ago
|
Status: NEW → ASSIGNED
Reporter | ||
Comment 6•9 years ago
|
||
Updated for the current sources. Bob, could you please address Ryan's concerns in the previous review? Thank you.
Attachment #8338147 -
Attachment is obsolete: true
Flags: needinfo?(rrelyea)
Attachment #8741607 -
Flags: review?(ryan.sleevi)
Attachment #8741607 -
Flags: review?(rrelyea)
Assignee | ||
Comment 7•9 years ago
|
||
> BUG? |certList| holds a reference to the certificate, per line 391. This ref is undone if and only if
> object->instances[i]->token == dtor->token (in remove_token_certs). If it doesn't, this reference is leaked,
> is it not
Good catch. This actually this is a bug. Before we destroy the list we should walk through and Free the references. I was expecting the nssList_Destroy did that. The ref you are refering to is the ref held by instances array, which needs to be freed because we are dropping the instance.
That means that we probably need 2 cert_iter functions, one for this case and one for nssTrustDomain_GetCertsFromCache (where the latter returns an array holding the cert references).
> Further, because of the added ref, it seems like you will never hit the condition in line 476, because
> td->cache->issuerAndSN will still be holding a reference (because the reference was copied when creating
> certList).
I'm not sure what you mean by the condition of line 476.
In the new code line 476 is a comment. In the old code it's in another function. I'm guessing you might mean line 479 in the new code. I'm not sure what the td->cache->issuerAndSN holding a reference has to do with the numInstances. Instances are dropped from the array in remove_token_certs. The instance itself may not be gone (it's refrenced counted), but the numInstances referenced by this object will decrement independent of the reference count on the object. When the last token gets removed from the cert, it gets removed from the cache.
Flags: needinfo?(rrelyea)
Assignee | ||
Comment 8•9 years ago
|
||
Comment on attachment 8741607 [details] [diff] [review]
fix deadlock - updated
Review of attachment 8741607 [details] [diff] [review]:
-----------------------------------------------------------------
r- Ryan pointed an actual leak. We should include the code fragment I've suggested here.
::: lib/pki/tdcache.c
@@ +495,5 @@
> }
> +
> + nspr_rv = PR_SUCCESS;
> +loser:
> + if (certList) {
As Ryan* pointed out, we need to free the reference we made from the certlist before we destroy the certlist:
for (c = (NSSCertificate *)nssListIterator_Start(iter);
c != (NSSCertificate *)NULL;
c = (NSSCertificate *)nssListIterator_Next(iter)) {
nssCertificate_Destroy(c);
}
Attachment #8741607 -
Flags: review?(rrelyea) → review-
Comment 9•9 years ago
|
||
Comment on attachment 8741607 [details] [diff] [review]
fix deadlock - updated
Review of attachment 8741607 [details] [diff] [review]:
-----------------------------------------------------------------
setting r- because it's still not addressed
Attachment #8741607 -
Flags: review?(ryan.sleevi) → review-
Reporter | ||
Comment 10•9 years ago
|
||
Adresses review requests
Attachment #8741607 -
Attachment is obsolete: true
Attachment #8751032 -
Flags: review?(ryan.sleevi)
Comment 11•9 years ago
|
||
Comment on attachment 8751032 [details] [diff] [review]
fix deadlock
Review of attachment 8751032 [details] [diff] [review]:
-----------------------------------------------------------------
::: lib/pki/tdcache.c
@@ +457,5 @@
> PZ_Lock(td->cache->lock);
> + nssHash_Iterate(td->cache->issuerAndSN, cert_iter, (void *)certList);
> + PZ_Unlock(td->cache->lock);
> +
> + /* find the certs that match this token without olding the td cache lock */
typo: s/olding/holding/
Which I flagged previously.
However, this comment, while accurate, doesn't explain 'why', which is far more important.
@@ +458,5 @@
> + nssHash_Iterate(td->cache->issuerAndSN, cert_iter, (void *)certList);
> + PZ_Unlock(td->cache->lock);
> +
> + /* find the certs that match this token without olding the td cache lock */
> + iter=nssList_CreateIterator(certList);
I also flagged this previously.
@@ +478,2 @@
> for (i=0; i<dtor.numCerts; i++) {
> if (dtor.certs[i]->object.numInstances == 0) {
It doesn't seem the substance of the concerns I raised previously has been addressed here. Could you clarify why you believe it is?
@@ +494,5 @@
> }
> }
> +
> + nspr_rv = PR_SUCCESS;
> +loser:
Also not addressed
@@ +499,5 @@
> + if (certList) {
> + for (c = (NSSCertificate *)nssListIterator_Start(iter);
> + c != (NSSCertificate *)NULL;
> + c = (NSSCertificate *)nssListIterator_Next(iter)) {
> + nssList_Destroy(certList);
BUG: This seems obviously incorrect and will double-free memory.
@@ +506,2 @@
> nss_ZFreeIf(dtor.certs);
> + return nspr_rv;;
Stray ;
Attachment #8751032 -
Flags: review?(ryan.sleevi) → review-
Updated•2 years ago
|
Severity: normal → S3
Updated•1 year ago
|
Severity: S3 → S4
Priority: -- → P3
You need to log in
before you can comment on or make changes to this bug.
Description
•