Closed Bug 212112 Opened 21 years ago Closed 21 years ago

A recursive deadlock on td->cache->lock

Categories

(NSS :: Libraries, defect)

3.7.3
PowerPC
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wtc, Assigned: wtc)

References

Details

(Whiteboard: [3.8.2][3.7.8])

Attachments

(1 file)

The NSS version is 3.7.7.

An NSS-based application was hung because it tried to lock
td->cache->lock recursively:

(gdb) where
#0  0x90034828 in semaphore_wait_trap ()
#1  0x90009d18 in pthread_mutex_lock ()
#2  0x003a45a0 in PR_Lock (lock=0xf3ce60) at
../../../../mozilla/nsprpub/pr/src/pthreads/ptsynch.c:190
#3  0x0095f868 in nssTrustDomain_RemoveTokenCertsFromCache (td=0xf46bb8,
token=0xf4b838) at tdcache.c:486
#4  0x008ef4f0 in nssToken_NotifyCertsNotVisible (tok=0xf4b838) at dev3hack.c:311
#5  0x00967a14 in nssSlot_IsTokenPresent (slot=0xf4c248) at devslot.c:360
#6  0x00967ac4 in nssSlot_GetToken (slot=0xf4c248) at devslot.c:403
#7  0x0095e808 in nssTrustDomain_FindTrustForCertificate (td=0xf46bb8,
c=0x37c3e80) at trustdomain.c:1194
#8  0x00965d60 in nssTrust_GetCERTCertTrustForCert (c=0x37c3e80, cc=0x37c4660)
at pki3hack.c:537
#9  0x009663f0 in fill_CERTCertificateFields (c=0x37c3e80, cc=0x37c4660,
forced=1) at pki3hack.c:690
#10 0x00966508 in stan_GetCERTCertificate (c=0x37c3e80, forceUpdate=1) at
pki3hack.c:718
#11 0x00966594 in STAN_ForceCERTCertificateUpdate (c=0x37c3e80) at pki3hack.c:734
#12 0x0095f8fc in nssTrustDomain_RemoveTokenCertsFromCache (td=0xf46bb8,
token=0xf4b838) at tdcache.c:492
#13 0x008ef4f0 in nssToken_NotifyCertsNotVisible (tok=0xf4b838) at dev3hack.c:311
#14 0x00967a14 in nssSlot_IsTokenPresent (slot=0xf4c248) at devslot.c:360
#15 0x0096bed0 in nssToken_IsPresent (token=0xf4b838) at devtoken.c:1620
#16 0x00911120 in pk11_IsPresentCertLoad (slot=0xf45140, loadCerts=1) at
pk11slot.c:2012
#17 0x0091131c in PK11_IsPresent (slot=0xf45140) at pk11slot.c:2059
#18 0x0090da28 in PK11_FindSlotByName (name=0x3628ba0 "AOL Key") at pk11slot.c:584

We believe that the error occurs in the vicinity of functions
nssTrustDomain_RemoveTokenCertsFromCache and STAN_ForceCERTCertificateUpdate.

Two questions:

1. Does nssTrustDomain_RemoveTokenCertsFromCache need to call
STAN_ForceCERTCertificateUpdate at all?

2. If the answer to question 1 is yes, does
nssTrustDomain_RemoveTokenCertsFromCache need to call
STAN_ForceCERTCertificateUpdate while holding td->cache->lock?
I looked at the other places where we call STAN_ForceCERTCertificateUpdate
and concluded that it is not necessary to call that function while holding
td->cache->lock.

This patch moves the STAN_ForceCERTCertificateUpdate call outside the
td->cache->lock critical section.  I have to iterate through the whole
dtor.certs array twice.  It should be possible to avoid that, but I think
these arrays (the certs on a removable token) should be small so that
the cost of iterating through the whole array is not high.
Comment on attachment 127290 [details] [diff] [review]
Call STAN_ForceCERTCertificateUpdate without holding td->cache->lock

Nelson, please review this patch.  You might also
want to use LXR to verify that we are calling
STAN_ForceCERTCertificateUpdate without holding
td->cache->lock elsewhere in NSS.

Ian, if you could answer my two questions and review
this patch, I'd appreciate it.
Attachment #127290 - Flags: superreview?(bugz)
Attachment #127290 - Flags: review?(nelsonb)
Comment on attachment 127290 [details] [diff] [review]
Call STAN_ForceCERTCertificateUpdate without holding td->cache->lock

Certainly looks correct to me.	I agree performance is hopefully not a problem
in this are, since the number of token certs should be small (on a removable
token).
Attachment #127290 - Flags: superreview?(bugz) → superreview+
Fix checked into the NSS tip (NSS 3.9), NSS_3_8_BRANCH (NSS 3.8.2),
and NSS_3_7_BRANCH (NSS 3.7.8).
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Whiteboard: [3.8.2][3.7.8]
Target Milestone: --- → 3.9
Comment on attachment 127290 [details] [diff] [review]
Call STAN_ForceCERTCertificateUpdate without holding td->cache->lock

The solution employed here solves the deadlock but doesn't address the
recursion itself, which I believe it fundamentally unnecessary.  I wish that
recursion could actually be eliminated.  But I'm satisfied that this fix is
correct.
Attachment #127290 - Flags: review?(nelsonb) → review+
Good point, Nelson.  I opened bug 212474 for the elimination
of the recursion.
*** Bug 216945 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: