Closed Bug 126087 Opened 23 years ago Closed 23 years ago

Hot Locks (PK11SymKey->refLock)

Categories

(NSS :: Libraries, defect, P1)

Sun
Solaris
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: kirk.erickson, Assigned: kirk.erickson)

References

Details

(Whiteboard: [adt2 RTM])

Attachments

(11 files, 5 obsolete files)

15.30 KB, text/plain
Details
7.01 KB, text/plain
Details
8.35 KB, text/plain
Details
2.02 KB, patch
Details | Diff | Splinter Review
5.48 KB, text/html
Details
4.46 KB, text/html
Details
526 bytes, patch
Details | Diff | Splinter Review
665 bytes, patch
Details | Diff | Splinter Review
1.53 KB, patch
wtc
: review+
Details | Diff | Splinter Review
2.35 KB, patch
Details | Diff | Splinter Review
2.76 KB, patch
Details | Diff | Splinter Review
Created this bug to track changes expected to be completed by next week to address lock contention reported in the meeting called by Wan-Teh with the Sun tuning team last Thursday. The data from Ning Sun is the first attachment. Note that iWS 6.0 SP1 was profiled using stock NSS (3.3); the sessionLock fix that made it into 3.4 recently was not present (see bug 121523). My plan is to benchmark before and after applying each of two sets of changes in turn. The first is to simply remove the lock in prng_GenerateGlobalRandomBytes(). It was reported that this lock is not needed at all. The second is to remove all the locking we can around instances of PK11SlotInfo->refLock, by using PR_Atomic instructions instead. To test the latter, we need to ensure the underlying Sparc4 instructions are engaged. Wan-Teh said to create a symbolic link thusly:

kirke@soupnazi[10] pwd
/export2/kirke/workarea-nss/mozilla/dist/SunOS5.8_OPT.OBJ/lib
kirke@soupnazi[11] ln -s libultrasparc4.so libatomic.so
Note, the sessionLock fix already checked in addresses the sessionLock (lock 6).
soupnazi

When I remove the lock in prng_GenerateGlobalRandomBytes(), I'm unable to benchmark. I saw 100% CPU usage and 0.90 ops/sec. I updated my tree this morning, and made sure we worked fine before making only the following change:

Index: mozilla/security/nss/lib/freebl/prng_fips1861.c
===================================================================
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/prng_fips1861.c,v
retrieving revision 1.12
diff -b -u -r1.12 prng_fips1861.c
--- mozilla/security/nss/lib/freebl/prng_fips1861.c	15 Nov 2001 02:41:17 -0000	1.12
+++ mozilla/security/nss/lib/freebl/prng_fips1861.c	19 Feb 2002 19:02:31 -0000
@@ -366,13 +366,10 @@
 	PORT_SetError(SEC_ERROR_INVALID_ARGS);
 	return SECFailure;
     }
-    /* --- LOCKED --- */
-    PZ_Lock(rng->lock);
     /* Check the amount of seed data in the generator. If not enough,
      * don't produce any data.
      */
     if (rng->seedCount < MIN_SEED_COUNT) {
-	PZ_Unlock(rng->lock);
 	PORT_SetError(SEC_ERROR_NEED_RANDOM);
 	return SECFailure;
     }
@@ -392,8 +389,6 @@
 	len -= num;
 	output += num;
     }
-    PZ_Unlock(rng->lock);
-    /* --- UNLOCKED --- */
     return rv;
 }

Bob, is this unsafe? Or are the Sun folks right, and this lock can be removed and I'm just removing it incorrectly?
As I look at the code, it appears to be unsafe. There are two possible issues here. First, in a race condition, two different callers may get the same random bits (this is bad because it creates an 'undetectable' security leak). The second problem is probably what is tripping you up: in a race condition, avail may become negative. We need to think about a way we can code grabbing more random data concurrently without horking the state of our random number generator. We need to get the resulting code reviewed very carefully to make sure we aren't introducing other problems. bob
The latest NSS library (Ian-0228-opt) works fine on our machine without any errors. Our 1-way SPECweb99-SSL result with the new library shows slight performance degradation compared to the original NSS (367 vs. 354 ops/s). But scalability has been improved significantly with the new library: the 8-way result goes up by ~43%. It scales well up to 2-way but starts to tail off at 4-way. We collected plockstat data on both 1-way and 4-way; the sleep count of the first 5 hot locks on 4-way increased by over 100 times compared to 1-way. Please see the two attached files for detailed statistics and the stack trace for each hot lock. Here is a brief summary of the data.

1-way:
1. pk11 slot objectLock var pk11_handleObject()
2. symKey refLock used by PK11_FreeSymKey()
3. PK11SlotInfo reference lock var PK11_ReferenceSlot()
4. sessionLock
5. objectLock from pk11_GetObjectFromList()
6. connection queue (not related to NSS)
7. slot freeListLock pk11_getKeyFromList()
8. lock from NSFC cache (not related to NSS)
9. object attributeLock from pk11_FindAttribute()
4-way:
1. pk11 slot objectLock var pk11_handleObject()
2. symKey refLock used by PK11_FreeSymKey()
3. sessionLock
4. PK11SlotInfo reference lock var PK11_ReferenceSlot()
5. slot freeListLock pk11_getKeyFromList()
6. objectLock from pk11_GetObjectFromList()
7. prng_GenerateGlobalRandomBytes()
8. connection queue (not related to NSS)
9. lock from NSFC cache (not related to NSS)
10. nssPKIObject_AddRef() ?
11. LockArena() from NSPR
12. PK11SlotList lock from PK11_GetFirstSafe()
13. object attributeLock from pk11_FindAttribute()

The locks are sorted roughly according to sleep count, as suggested by Alex.
Re: the PRNG lock. It cannot just be removed. It protects the full state of the global PRNG. The PRNG itself is a serial algorithm; that is, state[N+1] = f(state[N]). I don't see how this can be done without a single lock. One way to avoid repeated accesses to the lock would be to increase the amount of pseudo-random data generated on each pass. Currently, only 20 bytes is generated. I don't know if that would decrease the total amount of time spent waiting on the lock, however.
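The suggestion above (generate more than 20 bytes per lock acquisition) can be sketched as follows. This is a simplified, hypothetical model: the mutex, state update, and byte counts are illustrative stand-ins, not the real prng_fips1861.c code. A single lock still serializes the serial state[N+1] = f(state[N]) update, but each acquisition now produces a larger block, so the lock is taken far less often for the same amount of output.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical block size: 160 bytes per pass vs. the original 20. */
#define BLOCK_BYTES 160

static pthread_mutex_t rng_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char rng_state = 1;        /* stand-in for the PRNG state */
static unsigned long lock_acquisitions = 0;

/* state[N+1] = f(state[N]): inherently serial, must stay under the lock. */
static unsigned char next_byte(void)
{
    rng_state = (unsigned char)(rng_state * 167u + 13u);
    return rng_state;
}

void generate_random(unsigned char *out, size_t len)
{
    while (len > 0) {
        size_t n = len < BLOCK_BYTES ? len : BLOCK_BYTES;
        pthread_mutex_lock(&rng_lock);
        lock_acquisitions++;           /* one acquisition per whole block */
        for (size_t i = 0; i < n; i++)
            out[i] = next_byte();
        pthread_mutex_unlock(&rng_lock);
        out += n;
        len -= n;
    }
}

unsigned long locks_taken(void) { return lock_acquisitions; }
```

With a 160-byte block, a 320-byte request costs 2 lock acquisitions instead of 16, which is the whole point of the amortization; whether that reduces total wait time under contention is exactly the open question in the comment above.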
I see a small gain on soupnazi (4p) with this patch. It removes the reference to slot->objectLock in pk11_handleObject(). Rather than taking the highest ops/sec, I've made my harness throw out the first two samples and average the remainder.

With a clean pull and build of today's tip: 532.22 (14% idle)
With only this change: 538.00 (13% idle)

It's possible Ning will see a more dramatic improvement, but we need Ian's checkin for her to run the tip under Deimos.
Kirk, My checkins are all on the tip. The last build I gave her was a tip build, as of Friday.
A small gain in both full and restart times. The entries with names ending with -atomicHO designate the runs with the above patch in pk11_handleObject(), which uses PR_AtomicIncrement() instead of slot->objectLock. I've supplied Ning with new libs and commands with these bits for iWS 6.0sp2 comparisons under Deimos (ftp://box.red.iplanet.com/pub/deimos/2002-0304/).
This patch includes the previous patch and addresses the third-hottest lock. It removes slot->refLock from PK11_ReferenceSlot() by using PR_AtomicIncrement(). I saw no speedup in my selfserv stress, but perhaps Ning will in her iWS stress tests. I'll update the bits on the ftp site in hopes she can compare this patch with the previous patch.
Attachment #72469 - Attachment is obsolete: true
Comment on attachment 72670 [details] [diff] [review] AtomicIncrement in pk11_handleObject and PK11_ReferenceSlot Neither Ning nor I saw a gain with atomic instructions for hot locks (1) and (3).
Attachment #72670 - Attachment is obsolete: true
Using PR_AtomicDecrement() rather than symKey->refLock yielded a gain for me. I will provide Ning with this library set; I pulled from the tip and benchmarked before and after applying this patch only.

204.84 full-zones 1% idle
214.32 full-zones-atomic2 1% idle
---------------------------------------
9.48 ops/sec speedup on full handshakes
9.48/204.84 = 4.63% gain

537.02 restart-zones 15% idle
542.07 restart-zones-atomic2 14% idle
----------------------------------------
5.05 ops/sec speedup on restarts
5.05/537.02 = 0.94% gain
Attachment #73359 - Attachment is obsolete: true
Ning and I both saw a gain with the previous patch. This patch combines the previous change (removing symKey->refLock) with a new one: it reduces contention on slot->freeListLock by using atomic instructions on slot->keyCount. To do more with slot->freeListLock, we need to create additional locks and round-robin their use. That entails profiling to ascertain an "ideal" number of locks, or a mechanism in software to detect the contention and create locks on demand. Before we take this step, I'd like to see the big picture again. Please do your 1- and 4-way runs again with this set, and let's make sure it's still really hot and worth the effort. I'd like to know contention hasn't fallen elsewhere before embarking down that path. This change was based on comments back from Ning and Bob Relyea:

> The freeListLock is locking a very short piece of code, you can see this
> in pk11skey.c:
>
> In getKeyFromList:
>
>     PK11_USE_THREADS(PZ_Lock(slot->freeListLock);)
>     if (slot->freeSymKeysHead) {
>         symKey = slot->freeSymKeysHead;
>         slot->freeSymKeysHead = symKey->next;
>         slot->keyCount--;
>     }
>     PK11_USE_THREADS(PZ_Unlock(slot->freeListLock);)
>
> and in FreeSymKey:
>
>     PK11_USE_THREADS(PZ_Lock(slot->freeListLock);)
>     if (slot->keyCount < slot->maxKeyCount) {
>         symKey->next = slot->freeSymKeysHead;
>         slot->freeSymKeysHead = symKey;
>         slot->keyCount++;
>         symKey->slot = NULL;
>         freeit = PR_FALSE;
>     }
>     PK11_USE_THREADS(PZ_Unlock(slot->freeListLock);)
>
> I can see a couple of things you can do to limit the locks:
>
> 1st, move the slot->keyCount stuff out of the locks and use atomic
> increment/decrement. This mostly helps the free case because you can
> simply move the lock inside the if statement.
>
> 2nd, the only other thing that can be done is to use a round-robin set
> of free lists, each with its own lock. You'll have to do profiling to
> see how many free lists you would need. Be sure to adjust maxKeyCount
> down by the number of freelists you have. It might be best to make the
> number of lists a power of two; it makes the math to figure out which
> list faster. I don't think you really need to use locks to protect the
> round-robin value, since it's not a serious problem if you accidentally
> skip a bucket once because of contention.
>
> bob
>
> Ning Sun wrote:
>
> > Hi Kirk,
> >
> > I tested your new NSS bits; it shows about 2% improvement on 4-way
> > over the Ian-0228-opt version.
> >
> > Thanks a lot for the help.
> >
> > You are right, the second lock in my report should be slot->freeListLock
> > instead of the refLock. The contention on this is very high. Hopefully
> > we can do something about it.
> >
> > Thanks,
> > -ning

Compared to the previous patch, I saw a gain from 214.32 (1% idle) to 216.41 (0% idle) on full handshakes, and from 542.07 (14% idle) to 545.95 (15% idle) on restarts. Compared to the tip (which is closed and about to release as NSS 3.4):

216.41 - 204.84 = 11.57 ops/sec gain on full handshakes (5.65%)
545.95 - 537.02 = 8.93 ops/sec gain on restarts (1.66%)
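Bob's second suggestion, a round-robin set of free lists, might look like the sketch below. All names and the bucket count are hypothetical; the real PK11SlotInfo keeps a single freeSymKeysHead. The list count is a power of two so bucket selection is a cheap mask, and the round-robin counter is deliberately left as a bare atomic rather than lock-protected, since occasionally skipping a bucket under contention is harmless, exactly as Bob notes.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define N_LISTS 4   /* power of two, so (x & (N_LISTS - 1)) picks a bucket */

typedef struct sym_key { struct sym_key *next; } sym_key;  /* stand-in */

typedef struct {
    pthread_mutex_t lock;   /* one lock per list, instead of one global */
    sym_key *head;
} freelist;

static freelist lists[N_LISTS];
static atomic_uint rr;      /* unlocked round-robin counter */

void freelists_init(void)
{
    for (int i = 0; i < N_LISTS; i++) {
        pthread_mutex_init(&lists[i].lock, NULL);
        lists[i].head = NULL;
    }
}

static unsigned pick_bucket(void)
{
    return atomic_fetch_add(&rr, 1) & (N_LISTS - 1);
}

void free_key(sym_key *k)
{
    freelist *fl = &lists[pick_bucket()];
    pthread_mutex_lock(&fl->lock);
    k->next = fl->head;
    fl->head = k;
    pthread_mutex_unlock(&fl->lock);
}

sym_key *get_key(void)
{
    unsigned start = pick_bucket();
    /* Scan each bucket once before falling back to a fresh allocation. */
    for (unsigned i = 0; i < N_LISTS; i++) {
        freelist *fl = &lists[(start + i) & (N_LISTS - 1)];
        pthread_mutex_lock(&fl->lock);
        sym_key *k = fl->head;
        if (k)
            fl->head = k->next;
        pthread_mutex_unlock(&fl->lock);
        if (k)
            return k;
    }
    return malloc(sizeof(sym_key));
}
```

As Bob says, the number of lists would have to come from profiling, and a per-slot maxKeyCount would need to be divided down by the number of lists so the total cached-key budget stays the same.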
I stressed the previous patch overnight with no problems. Here are the performance numbers from soupnazi, an UltraSPARC II machine with four 450 MHz CPUs. They include results from the current NSS 3.4 tip, from NSS 3.3.1, and from November (just before Bob's big checkin) for comparison.

The results from this patch are named ...zones-atomic2a; the results from the previous version, ...zones-atomic2. In these runs, the server was hit by nine machines on the same subnet. It ran with 100 threads, as did all clients.
Attachment #72495 - Attachment is obsolete: true
OS: Linux → Solaris
Priority: -- → P1
Target Milestone: --- → 3.4.1
Without the zone allocator, heap contention appears to mask gains. Ning is using SmartHeap, and saw this patch perform better than the previous patch, but found that the official svbld bits performed better than either 0308 or 0314. As a result, I tried to isolate what build-environment differences might account for the general drop with my builds. I found that I was using an older compiler than Sonja, and upgraded. I found no other differences in build environment variables, compiler command lines, or linker command lines. The only other difference was that I was doing a noimport build, pulling NSPRPUB_RELEASE_4_2_BETA2 to pick up Wan-Teh's version of the zone allocator. I backed out to NSPRPUB_RELEASE_4_1_2, which is the release NSS 3.4 will ship with as things stand. Then I applied my own zone allocator changes, which I used previously with NSPR 4.1.2. These are the results of my selfserv stress runs. Once again, these changes appear to help selfserv stress, especially on full handshake runs (207.39 -> 222.58), where idle dropped back to zero for the first time too. Also, 561.50 is a new record for restarts (see restart-zones-34-atomic). Both with and without this patch (see names with -atomic), NSS 3.4 (see names with -34) shows a jump in idle CPU time when the zone allocator (-zones) is engaged.
Checked in the patch above and closed this bug. Will open a new bug when we get a new hotlist that shows where contention has fallen. The final patch addresses contention on slot->freeListLock and &symKey->refCount, touching pk11wrap/pk11skey.c only.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
While debugging some changes I made, I noticed that tstclnt was not shutting down cleanly as it used to. It seemed the internal slot was being leaked (several references), and after looking at checkins since 3.4 RTM I realized it had to be here. I made this change and it is fine now. Checking in; just noting the change here.
Comment on attachment 74085 [details] [diff] [review]
address hotlock 2: symKey->refLock and slot->freeListLock

Kirk,

Thanks for putting together this patch and testing it. I see two issues with this patch that need to be addressed.

1. I don't think there is a gain in using atomic increment and decrement on slot->keyCount, because the slot->keyCount-- and slot->keyCount++ operations are only a small portion of the code between lock and unlock; there is still code to operate on the free list. Roughly, you reduce the critical region from 4 lines of code to 3 lines.

2. The use of atomic increment and decrement on symKey->refLock should be a win, but you did not replace symKey->refCount++ with PR_AtomicIncrement(&symKey->refCount). :-)

>@@ -264,15 +264,15 @@
>         PORT_Free(symKey->data.data);
>     }
>     slot = symKey->slot;
>-    PK11_USE_THREADS(PZ_Lock(slot->freeListLock);)
>     if (slot->keyCount < slot->maxKeyCount) {
>+        PK11_USE_THREADS(PZ_Lock(slot->freeListLock);)

With this change, you are now testing slot->keyCount before you grab the lock. This is not thread-safe.

>         symKey->next = slot->freeSymKeysHead;
>         slot->freeSymKeysHead = symKey;
>-        slot->keyCount++;
>+        PK11_USE_THREADS(PZ_Unlock(slot->freeListLock);)
>+        PR_AtomicIncrement(&slot->keyCount);
>         symKey->slot = NULL;
>         freeit = PR_FALSE;
>     }
>-    PK11_USE_THREADS(PZ_Unlock(slot->freeListLock);)
>     if (freeit) {
>         pk11_CloseSession(symKey->slot, symKey->session,
>                           symKey->sessionOwner);

The change to shorten the critical region here is great! I am pretty sure that freeit = PR_FALSE; can be done after the unlock call. However, I think symKey->slot = NULL; cannot be after the unlock call. It does not need to be in the critical region, but it should be before the lock call.
Attachment #74085 - Flags: needs-work+
To elaborate on my comment on this change:

>         symKey->next = slot->freeSymKeysHead;
>         slot->freeSymKeysHead = symKey;
>-        slot->keyCount++;
>+        PK11_USE_THREADS(PZ_Unlock(slot->freeListLock);)
>+        PR_AtomicIncrement(&slot->keyCount);
>         symKey->slot = NULL;
>         freeit = PR_FALSE;
>     }
>-    PK11_USE_THREADS(PZ_Unlock(slot->freeListLock);)
>     if (freeit) {
>         pk11_CloseSession(symKey->slot, symKey->session,
>                           symKey->sessionOwner);

Once you add symKey to the freelist and unlock, you no longer have a valid reference to it, so you can't set symKey->slot to NULL after you unlock. You can set symKey->slot to NULL either before the lock call or inside the critical region.
Wan-Teh wrote:

> >     if (slot->keyCount < slot->maxKeyCount) {
> >+        PK11_USE_THREADS(PZ_Lock(slot->freeListLock);)
>
> With this change, you are now testing slot->keyCount before
> you grab the lock. This is not thread-safe.

I recall balking at this too, but Bob had reason to believe it was safe. maxKeyCount is initialized by PK11_NewSlotInfo()/PK11_InitToken() in pk11slot.c and never touched subsequently. My recollection is that it's safe because it's only after initialization is complete that the reference in PK11_FreeSymKey() can occur. I'll come up with a new patch that addresses your other concerns, and perhaps Bob will clear up this aspect.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This bug was brought out by my last fix. The slot cannot be set to NULL before leaving the freelist, or another thread may grab the key off the freelist and have their slot NULL'ed.
Sorry, I should have read the last comments more carefully, but I was deep in debugging. I made the obvious change of moving the symKey->slot = NULL back into the critical region, and checked it in to fix the tinderboxes. The other option, of doing it before the critical region, is just as valid, so if that is preferable go ahead and move it again.
Wan Teh is write, we can't mark the slot as NULL once we've added the key to the free list unless we are still holding the free list lock. However, we can set the slot to NULL before we add the key to the free list. It's safe at that point because we know there are no other references to the key (we are in the middle of destroying it), so the key is free to be tweaked until we put it on the free list. (Just be sure to set it to NULL *after* we've made a copy of the slot, since we still need that ;). bob
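The ordering Bob describes can be sketched like this. The types are illustrative stand-ins, not the real PK11SymKey/PK11SlotInfo layout: the slot pointer is copied first, the key's fields are finalized while we still hold the only reference, and only then is the key published on the free list under the lock. Once it is on the list, another thread may pop and reuse it immediately, so no field may be touched after publication.

```c
#include <pthread.h>
#include <stddef.h>

/* Illustrative types; not the real NSS structures. */
typedef struct slot_info {
    int key_count, max_key_count;
    struct sym_key *free_head;
    pthread_mutex_t free_list_lock;
} slot_info;

typedef struct sym_key {
    struct sym_key *next;
    slot_info *slot;
} sym_key;

/* Returns nonzero if the caller should destroy the key (list was full). */
int free_sym_key(sym_key *key)
{
    slot_info *slot = key->slot;  /* copy before clearing, as Bob notes */
    int freeit = 1;

    /* Safe here: we hold the only reference and the key is not yet
     * visible on the free list, so no other thread can see it. */
    key->slot = NULL;

    pthread_mutex_lock(&slot->free_list_lock);
    if (slot->key_count < slot->max_key_count) {
        key->next = slot->free_head;
        slot->free_head = key;    /* published: hands off after this */
        slot->key_count++;
        freeit = 0;
    }
    pthread_mutex_unlock(&slot->free_list_lock);
    return freeit;
}
```

Setting key->slot = NULL inside the critical region (as Ian's checkin did) is equally correct; doing it before the lock just keeps the critical region one statement shorter.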
OK, I meant Wan-Teh is 'right' not 'write':). It's too late, I think I need to go home;).
There are still occasional tinderbox failures in stress tests, almost assuredly due to these changes. I wonder if Wan-Teh's other observation is the problem: that is, the PR_AtomicDecrement was not matched with a PR_AtomicIncrement. We have:

PK11SymKey *
PK11_ReferenceSymKey(PK11SymKey *symKey)
{
    PK11_USE_THREADS(PZ_Lock(symKey->refLock);)
    symKey->refCount++;
    PK11_USE_THREADS(PZ_Unlock(symKey->refLock);)
    return symKey;
}

The lock no longer protects the refCount from being decremented, so it is useless. And I assume mixing the ++ operator with PR_AtomicDecrement is not safe.
Ian, you are right. The PK11_ReferenceSymKey function should say:

PK11SymKey *
PK11_ReferenceSymKey(PK11SymKey *symKey)
{
    PR_AtomicIncrement(&symKey->refCount++);
    return symKey;
}

Bob, is it okay to use NSPR atomic routines here? NSPR atomic routines are implemented with NSPR locks on machines that don't have the appropriate atomic instructions. It would appear that this is not PKCS#11 compliant because we won't be using the PKCS#11 mutexes. Do we have stubs for PR_AtomicIncrement and PR_AtomicDecrement?
Correction: the ++ operator should be deleted.

PK11SymKey *
PK11_ReferenceSymKey(PK11SymKey *symKey)
{
    PR_AtomicIncrement(&symKey->refCount);
    return symKey;
}
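For comparison, here is a minimal sketch of the same lock-free refcount pattern using C11 atomics in place of PR_AtomicIncrement/PR_AtomicDecrement (NSPR's routines return the new value; C11 fetch operations return the old one, so the decrement adjusts). The struct is illustrative, not the real PK11SymKey. As Ian notes above, every increment and decrement must go through the atomic routines; mixing them with the plain ++ operator is unsafe.

```c
#include <stdatomic.h>

/* Illustrative stand-in for a reference-counted key; not real NSS. */
typedef struct {
    atomic_int ref_count;
} sym_key;

/* Take a reference; no lock needed, the increment itself is atomic. */
sym_key *reference(sym_key *k)
{
    atomic_fetch_add(&k->ref_count, 1);
    return k;
}

/* Drop a reference. Returns 1 when the last reference was released,
 * meaning the caller may destroy the key; 0 otherwise. */
int release(sym_key *k)
{
    /* fetch_sub returns the value *before* the decrement. */
    return atomic_fetch_sub(&k->ref_count, 1) - 1 == 0;
}
```

This is also why symKey->refLock becomes dead weight once the conversion is complete: with every reader and writer of ref_count going through atomic operations, the lock protects nothing and can be deleted, as noted later in this bug.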
Wan-Teh, I checked in that change a while ago, as an experiment to see if it fixes the intermittent tinderbox failures. As for not using a lock, IINM this should not affect PKCS#11 because it is above the line; that is, this code is in libnss, so we can do whatever we want.
> I see two issues with this patch that need to be addressed.
>
> 1. I don't think there is a gain to use atomic increment
> and decrement on slot->keyCount because the slot->keyCount--
> or slot->keyCount++ operations are only a small portion of
> the code between lock and unlock; there is still code to
> operate on the free list. Roughly, you reduce the critical
> region from 4 lines of code to 3 lines.

I've created a new patch and attached it here for review, which backs out of the change in pk11_getKeyFromList():

    PK11_USE_THREADS(PZ_Lock(slot->freeListLock);)
    if (slot->freeSymKeysHead) {
        symKey = slot->freeSymKeysHead;
        slot->freeSymKeysHead = symKey->next;
        slot->keyCount--;
    }
    PK11_USE_THREADS(PZ_Unlock(slot->freeListLock);)

> 2. The use of atomic increment and decrement on symKey->refLock
> should be a win, but you did not replace symKey->refCount++
> with PR_AtomicIncrement(&sym->refCount). :-)

I gather from Ian's and Wan-Teh's checkins that this has been resolved. I was addressing the hot lock which Ning pointed to in PK11_FreeSymKey() and neglected the instance in PK11_ReferenceSymKey. We're setting symKey->slot to NULL before the unlock as things stand.
Comment on attachment 78918 [details] [diff] [review] Patch to back out of atomic slot-keyCount change Kirk, Thanks for the patch. This patch is correct but is incomplete. You also need to back out the PR_AtomicIncrement(&slot->keyCount) change. Also, now that we use atomic increment/decrement routines on symKey->refCount, we can remove symKey->refLock. Could you take care of that too?
Attachment #78918 - Flags: needs-work+
Attachment #78918 - Attachment is obsolete: true
OK, this patch catches both PK11_FreeSymKey() and pk11_getKeyFromList(). This is what we had originally, and what this patch takes us back to:

    PK11_USE_THREADS(PZ_Lock(slot->freeListLock);)
    if (slot->keyCount < slot->maxKeyCount) {
        symKey->next = slot->freeSymKeysHead;
        slot->freeSymKeysHead = symKey;
        slot->keyCount++;
        symKey->slot = NULL;
        freeit = PR_FALSE;
    }
    PK11_USE_THREADS(PZ_Unlock(slot->freeListLock);)
Comment on attachment 78948 [details] [diff] [review] backout of atomic slot->keyCount change r=wtc. Thanks.
Attachment #78948 - Flags: review+
Since we are using PR atomic routines on symKey->refCount, symKey->refLock should be deleted. PR atomic routines take a PRInt32 * pointer as the argument, so symKey->refCount should be declared as PRInt32.
Changed the QA contact to Bishakha.
QA Contact: sonja.mirtitsch → bishakhabanerjee
Moved to NSS 3.4.2.
Target Milestone: 3.4.1 → 3.4.2
Comment on attachment 82289 [details] [diff] [review] Patch for the NSS_3_4_BRANCH This patch has been checked into the NSS_3_4_BRANCH.
Marked the bug fixed. Note that I updated the bug's summary to reflect what lock contention actually got addressed.
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
Summary: Hot Locks (GenerateRandom and PK11SlotInfo->refLock) → Hot Locks (PK11SymKey->refLock)
Blocks: 145836
adt1.0.1+ (on ADT's behalf) for checkin to the 1.0 branch, pending Driver's approval. Pls check this in asap. thanks!
Whiteboard: [adt2 RTM]
Keywords: mozilla1.0.1, fixed1.0.1
bishakhabanerjee@netscape.com - can you verify this bug fix in 1.01 branch? When verified, pls replace fixed1.0.1 keyword with verified1.0.1. Thanks.
marking verified
Status: RESOLVED → VERIFIED