Open Bug 943144 Opened 11 years ago Updated 1 year ago

Deadlock in the trust domain lock and the object lock

Categories

(NSS :: Libraries, defect, P3)

3.15.2
x86_64
Linux

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: elio.maldonado.batiz, Assigned: rrelyea)

References

Details

Attachments

(1 file, 2 obsolete files)

A Red Hat customer reported experiencing about once a day what they referred to as a hang. Meaning that their client will make a connection to their ldap server and it will not reply. The ldapsearch did make a bind but never returned. They waited 10 minutes before killing it. During this time Directory Server made no log updates. The only way to fix this was either to restart ns-slapd or reboot the server. Under certain conditions, when server is processing multiple outgoing replication or windows sync agreements using TLS/SSL, and processing incoming client requests that use TLS/SSL and Simple Paged Results, the server will become unresponsive to new incoming client requests.
Do you have more info that establishes it as a trust domain lock?
Attached patch fix-deadlock.patch (obsolete) — Splinter Review
Patch by Bob Relyea, minimally to the latest sources.
(In reply to Ryan Sleevi from comment #1) > Do you have more info that establishes it as a trust domain lock? Adding Bob and Rich which did the analysis and tested the fix.
Unfortunately only Red Hat folks can currently access this link.
Comment on attachment 8338147 [details] [diff] [review] fix-deadlock.patch Review of attachment 8338147 [details] [diff] [review]: ----------------------------------------------------------------- ::: lib/pki/tdcache.c @@ +394,2 @@ > static void > +remove_token_certs(NSSCertificate *c, struct token_cert_dtor *dtor) style: This function seems to have obtained a stray space at the EOL @@ +454,5 @@ > PZ_Lock(td->cache->lock); > + nssHash_Iterate(td->cache->issuerAndSN, cert_iter, (void *)certList); > + PZ_Unlock(td->cache->lock); > + > + /* find the certs that match this token without olding the td cache lock */ typo: olding -> holding @@ +455,5 @@ > + nssHash_Iterate(td->cache->issuerAndSN, cert_iter, (void *)certList); > + PZ_Unlock(td->cache->lock); > + > + /* find the certs that match this token without olding the td cache lock */ > + iter=nssList_CreateIterator(certList); style: The rest of this function uses " = " (whitespace around assignment). It would be good to be consistent here. @@ +462,5 @@ > + } > + for (c = (NSSCertificate *)nssListIterator_Start(iter); > + c != (NSSCertificate *)NULL; > + c = (NSSCertificate *)nssListIterator_Next(iter)) { > + remove_token_certs( c, &dtor); style: seemingly unnecessary whitespace between "( c" BUG? |certList| holds a reference to the certificate, per line 391. This ref is undone if and only if object->instances[i]->token == dtor->token (in remove_token_certs). If it doesn't, this reference is leaked, is it not? Further, because of the added ref, it seems like you will never hit the condition in line 476, because td->cache->issuerAndSN will still be holding a reference (because the reference was copied when creating certList). @@ +487,5 @@ > } > } > + > + nspr_rv = PR_SUCCESS; > +loser: Don't we typically have an additional line break before loser labels?
Attachment #8338147 - Flags: review-
Assignee: nobody → rrelyea
Status: NEW → ASSIGNED
Attached patch fix deadlock - updated (obsolete) — Splinter Review
Updated for the current sources. Bob, could you please address Ryan's concerns in the previous review? Thank you.
Attachment #8338147 - Attachment is obsolete: true
Flags: needinfo?(rrelyea)
Attachment #8741607 - Flags: review?(ryan.sleevi)
Attachment #8741607 - Flags: review?(rrelyea)
> BUG? |certList| holds a reference to the certificate, per line 391. This ref is undone if and only if > object->instances[i]->token == dtor->token (in remove_token_certs). If it doesn't, this reference is leaked, > is it not Good catch. This actually this is a bug. Before we destroy the list we should walk through and Free the references. I was expecting the nssList_Destroy did that. The ref you are refering to is the ref held by instances array, which needs to be freed because we are dropping the instance. That means that we probably need 2 cert_iter functions, one for this case and one for nssTrustDomain_GetCertsFromCache (where the latter returns an array holding the cert references). > Further, because of the added ref, it seems like you will never hit the condition in line 476, because > td->cache->issuerAndSN will still be holding a reference (because the reference was copied when creating > certList). I'm not sure what you mean by the condition of line 476. In the new code line 476 is a comment. In the old code it's in another function. I'm guessing you might mean line 479 in the new code. I'm not sure what the td->cache->issuerAndSN holding a reference has to do with the numInstances. Instances are dropped from the array in remove_token_certs. The instance itself may not be gone (it's refrenced counted), but the numInstances referenced by this object will decrement independent of the reference count on the object. When the last token gets removed from the cert, it gets removed from the cache.
Flags: needinfo?(rrelyea)
Comment on attachment 8741607 [details] [diff] [review] fix deadlock - updated Review of attachment 8741607 [details] [diff] [review]: ----------------------------------------------------------------- r- Ryan pointed an actual leak. We should include the code fragment I've suggested here. ::: lib/pki/tdcache.c @@ +495,5 @@ > } > + > + nspr_rv = PR_SUCCESS; > +loser: > + if (certList) { As Ryan* pointed out, we need to free the reference we made from the certlist before we destroy the certlist: for (c = (NSSCertificate *)nssListIterator_Start(iter); c != (NSSCertificate *)NULL; c = (NSSCertificate *)nssListIterator_Next(iter)) { nssCertificate_Destroy(c); }
Attachment #8741607 - Flags: review?(rrelyea) → review-
Comment on attachment 8741607 [details] [diff] [review] fix deadlock - updated Review of attachment 8741607 [details] [diff] [review]: ----------------------------------------------------------------- setting r- because it's still not addressed
Attachment #8741607 - Flags: review?(ryan.sleevi) → review-
Attached patch fix deadlockSplinter Review
Adresses review requests
Attachment #8741607 - Attachment is obsolete: true
Attachment #8751032 - Flags: review?(ryan.sleevi)
Comment on attachment 8751032 [details] [diff] [review] fix deadlock Review of attachment 8751032 [details] [diff] [review]: ----------------------------------------------------------------- ::: lib/pki/tdcache.c @@ +457,5 @@ > PZ_Lock(td->cache->lock); > + nssHash_Iterate(td->cache->issuerAndSN, cert_iter, (void *)certList); > + PZ_Unlock(td->cache->lock); > + > + /* find the certs that match this token without olding the td cache lock */ typo: s/olding/holding/ Which I flagged previously. However, this comment, while accurate, doesn't explain 'why', which is far more important. @@ +458,5 @@ > + nssHash_Iterate(td->cache->issuerAndSN, cert_iter, (void *)certList); > + PZ_Unlock(td->cache->lock); > + > + /* find the certs that match this token without olding the td cache lock */ > + iter=nssList_CreateIterator(certList); I also flagged this previously. @@ +478,2 @@ > for (i=0; i<dtor.numCerts; i++) { > if (dtor.certs[i]->object.numInstances == 0) { It doesn't seem the substance of the concerns I raised previously has been addressed here. Could you clarify why you believe it is? @@ +494,5 @@ > } > } > + > + nspr_rv = PR_SUCCESS; > +loser: Also not addressed @@ +499,5 @@ > + if (certList) { > + for (c = (NSSCertificate *)nssListIterator_Start(iter); > + c != (NSSCertificate *)NULL; > + c = (NSSCertificate *)nssListIterator_Next(iter)) { > + nssList_Destroy(certList); BUG: This seems obviously incorrect and will double-free memory. @@ +506,2 @@ > nss_ZFreeIf(dtor.certs); > + return nspr_rv;; Stray ;
Attachment #8751032 - Flags: review?(ryan.sleevi) → review-
Severity: normal → S3
Severity: S3 → S4
Priority: -- → P3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: