Shifted from bug 116255 (see comments #23 through #27). This bug was opened to split off the scalability problems reported by Shih-Hao Hung (of Sun) from the 7.4% performance degradation (between 3.3.1 and 3.4); they are separate issues. Note also that the contention on slot->sessionLock that Shih-Hao Hung reported (by profiling with a tool called plockstat under Solaris) was observed on a 12-CPU box. He added recently:

  It seems to me that the session lock affects the performance of a large
  system. In our experiments, we did not see its effect when the number of
  processors is less than or equal to 4. It might be the reason why you did
  not get good results from your 4-way system.

iWS6+Deimos on 1GHz Serengeti (HTTP_LOAD):

  #cpu   ops/s (maxprocs=1)   ops/s (maxprocs=12)
    1      298 (0% idle)        296 (0% idle)
    2      520 (0% idle)        516 (0% idle)
    4      950 (0% idle)        960 (0% idle)
    8      900 (3% idle)       1720 (0% idle)
   12     1050 (58% idle)      2448 (0% idle)

The comments in bug 116255 by Bob Relyea outlined a plan of attack, which I have followed through on. There is still a SEGV to iron out, and tuning to do on the bucket counts and total session hash entries. The plan is to update this bug with a patch for review, targeting the NSS 3.4.1 release. We'll feed a build of the patched library back to Shih-Hao Hung for comparison and profiling before check-in. He is using NSS with iWS 6.0, whereas my benchmarking is with selfserv.

In addition to converting slot->sessionLock into a bucket of locks, we may pool sessions. Further investigation is needed. I will update this bug with a stack trace for Bob to inspect, showing where sessions are being destroyed (assuming this is the case).
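The "bucket of locks" idea mentioned above can be sketched as follows. This is an illustrative reimplementation of lock striping, not the actual NSS code; the names and sizes are hypothetical. Instead of one mutex serializing the whole session hash table, each group of adjacent hash buckets gets its own lock, so concurrent threads touching different buckets no longer contend.

```c
/* Hypothetical sketch of lock striping over a session hash table.
 * Names and sizes are illustrative, not the actual NSS identifiers. */
#define HASH_SIZE 1024
#define LOCKS       32                 /* must evenly divide HASH_SIZE */

/* Map a session handle to the index of the lock guarding its bucket. */
static unsigned lock_index(unsigned long session_handle) {
    unsigned bucket = (unsigned)(session_handle % HASH_SIZE);
    return bucket / (HASH_SIZE / LOCKS);  /* adjacent buckets share a lock */
}
```

With 32 locks over 1024 buckets, up to 32 threads can operate on the table at once instead of one, which is the effect the maxprocs=12 column above is chasing.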
Created attachment 66792 [details] [diff] [review]
Fixes contention on slot->sessionLock

This patch replaces the single slot->sessionLock with a series of locks. It affects 3 files:

  mozilla/security/nss/lib/softoken/pkcs11.c
  mozilla/security/nss/lib/softoken/pkcs11i.h
  mozilla/security/nss/lib/softoken/pkcs11u.c

I've performed several benchmarking runs from home and work to settle on a matrix of hash table size, locks per bucket, number of buckets, and sessionID-to-hash-table-index macros:

  /* SESSION_LOCK_LOG2_PER_BUCKET must be a prime number.
   * With SESSION_HASH_SIZE=1024, LOG2 can be 9, 5, or 1.
   * With SESSION_HASH_SIZE=4096, LOG2 can be 11, 9, 5, or 1.
   *
   *  HASH_SIZE  LOG2_PER_BUCKET  LOCK_PER_BUCKET  BUCKETS
   *    1024           9                512            2
   *    1024           5                 32           32
   *    1024           1                  2          512
   *    4096          11               2048            2
   *    4096           9                512            8
   *    4096           5                 32          128
   *    4096           1                  2         2048
   */
  #define SESSION_LOCK_LOG2_PER_BUCKET 5
  #define SESSION_LOCK_PER_BUCKET (2 << (SESSION_LOCK_LOG2_PER_BUCKET-1))
  #define SESSION_LOCK_BUCKETS (SESSION_HASH_SIZE/SESSION_LOCK_PER_BUCKET)

  /* NOSPREAD sessionID to hash table index macro has been slower. */
  #if 0
  #define NOSPREAD
  #endif

Here are the selfserv runs I made. Results appear in the following attachment, and are available by visiting http://box.red.iplanet.com/nss/selfserv:

  selfserv --optimize --name 1024-9-nospread
  selfserv --optimize --name 1024-9-spread
  selfserv --optimize --name 1024-5-spread
  selfserv --optimize --name 1024-5-nospread
  selfserv --optimize --name 1024-1-nospread
  selfserv --optimize --name 1024-1-spread
  selfserv --optimize --name 4096-1-spread
  selfserv --optimize --name 4096-1-nospread
  selfserv --optimize --name 4096-11-nospread
  selfserv --optimize --name 4096-11-spread
  selfserv --optimize --name 4096-5-spread
  selfserv --optimize --name 4096-5-nospread
  selfserv --optimize --name 4096-9-nospread
  selfserv --optimize --name 4096-9-spread

Best results were yielded by the 4096-5-spread run, hitting selfserv on soupnazi.red.iplanet.com from seven neighboring client machines.
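The difference between the SPREAD and NOSPREAD index macros can be illustrated like this. This is a sketch of the two mapping strategies, not the patch itself; the function names are hypothetical, and note the (later renamed) quirk that SESSION_LOCK_PER_BUCKET actually counts buckets per lock. NOSPREAD assigns contiguous runs of buckets to the same lock, while SPREAD interleaves, so consecutive session handles land on different locks.

```c
#define SESSION_HASH_SIZE            1024
#define SESSION_LOCK_LOG2_PER_BUCKET 5
#define SESSION_LOCK_PER_BUCKET      (2 << (SESSION_LOCK_LOG2_PER_BUCKET - 1))
#define SESSION_LOCK_BUCKETS         (SESSION_HASH_SIZE / SESSION_LOCK_PER_BUCKET)

/* NOSPREAD: contiguous runs — buckets 0..31 share lock 0, 32..63 lock 1, ... */
static unsigned nospread_lock(unsigned long h) {
    return (unsigned)((h % SESSION_HASH_SIZE) / SESSION_LOCK_PER_BUCKET);
}

/* SPREAD: interleaved — consecutive handles hit different locks. */
static unsigned spread_lock(unsigned long h) {
    return (unsigned)(h % SESSION_LOCK_BUCKETS);
}
```

Since session handles are typically allocated sequentially, the interleaved mapping keeps concurrently active sessions from piling onto one lock, which is consistent with the SPREAD runs generally benchmarking faster.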
The server and all clients ran with 100 threads each.

soupnazi (Solaris 5.8) 220.127.116.11 (DS lab 17/3383)
SunOS soupnazi 5.8 Generic sun4u sparc SUNW,Ultra-4
System Configuration: Sun Microsystems sun4u Sun Enterprise 450 (4 X UltraSPARC-II 400MHz)
System clock frequency: 100 MHz
Memory size: 1024 Megabytes

Nearby client machines:

  18.104.22.168   alphachip        (SunOS 5.8)
  22.214.171.124  foofighter       (SunOS 5.8)
  126.96.36.199   gods             (RedHat Linux 7.1)
  188.8.131.52    lola             (SunOS 5.8)
  184.108.40.206  meatball         (SunOS 5.8)
  220.127.116.11  repltest-linux1  (RedHat Linux 6.2)

When we used fewer locks, CPU usage went up. This result surprised me; I expected less contention would mean smaller %idle values. The first entry in the results table, named "1-session-lock", is the result using the same harness before applying the patch. Comparing that result (508.81) with the "4096-5-spread" result (574.08) shows a 12.8% performance gain:

  100/x = 508.81/574.08
  57408 = 508.81x
  x = 57408/508.81
  x = 112.82797

While debugging this patch, I noticed we're freeing and allocating sessions repeatedly. A separate attachment follows with the traces Bob requested to identify the callers. I expect there are additional gains to be realized by pooling sessions.
Created attachment 66794 [details]
pk11_DestroySession() callers

Bob,

Attached are the stack traces you requested identifying pk11_DestroySession() callers. You said that if we pool contexts we should get sessions with them, but apparently contexts are not being pooled. Also, I was confused by the fact that pk11_FreeContext() is called by pk11_DestroySession(). Doesn't this make the session structure higher level? If we pool sessions, will we avoid reallocating contexts? Let me know if we can fix the callers or whether I should pool sessions as part of the fix for this bug.

kirk
Status: NEW → ASSIGNED
Summary: PKCPKCS Lock Contention (sessionLock and scalability) → PKCS Lock Contention (sessionLock and scalability)
Targeting the NSS 3.4.1 release. Sent 3 sets of Solaris builds to Shih-Hao Hung for testing:

  libsoftokn3-1024-1-nospread.so*   libsoftokn-1024-1-nospread.a
  libsoftokn3-4096-1-spread.so*     libsoftokn-4096-1-spread.a
  libsoftokn3-4096-5-spread.so*     libsoftokn-4096-5-spread.a
Priority: -- → P1
Attachment #66792 - Attachment description: Fixes contention on slot->sessionLock contention → Fixes contention on slot->sessionLock
Per our discussion in the conference call this afternoon, we want to include Kirk's session lock fix in NSS 3.4. Kirk, please attach your patch to this bug for Bob to review. Thanks.
Target Milestone: --- → 3.4
Created attachment 67692 [details] [diff] [review]
sessionLock patch: 1024-1-nospread rather than 4096-5-spread

Avoids the lock in PK11_GenerateRandom when using the internal random number generator.
Created attachment 67693 [details] [diff] [review]
sessionLock patch: 1024-1-nospread rather than 4096-5-spread

Avoids the lock in PK11_GenerateRandom when using the internal random number generator.
Kirk,

You described your patch as "1024-1-nospread", but your results showed that "spread" is better than "nospread". Why didn't you include "spread" in your patch?
Wan-Teh asked:

> You described your patch as "1024-1-nospread", but your results showed
> that "spread" is better than "nospread". Why didn't you include "spread"
> in your patch?

My benchmarking on soupnazi showed 1024-1-nospread nearly matching my fastest configuration (4096-5-spread), so I included it in the set I provided Shih-Hao for testing. Shih-Hao found that the 4096 table size yielded insignificant improvement over the 1024-1-nospread config. This appears to be a "sweet spot". It's true that I generally saw better numbers in SPREAD runs, but this one outdid them all. We can provide Shih-Hao with the other configs:

  1024-1-spread
  1024-5-nospread
  1024-5-spread

to verify it's best for him too, but I believe 1024-1-nospread is the safest choice to check in at this point.
Created attachment 67781 [details] [diff] [review] sessionLock patch (with relaxed lock in PK11_GenerateRandom) This is after Bob's first review. We made it possible to have 1 lock per bucket, and simplified places where the hash table index could be used rather than the session handle.
Created attachment 67817 [details] [diff] [review]
sessionLock fix

Changed names:

  :%s/SESSION_LOCK_PER_BUCKET/BUCKETS_PER_SESSION_LOCK/g
  :%s/SESSION_LOCK_BUCKETS/NUMBER_OF_SESSION_LOCKS/g
  :%s/SESSION_LOCK_LOG2_PER_BUCKET/LOG2_BUCKETS_PER_SESSION_LOCK/g

Results follow; my benchmarks showed 1024-1-spread fastest. That's the bucket config of this patch. Sending an array of libs to Shih-Hao for additional benchmark comparisons.
Created attachment 67822 [details]
results after update today, and latest sessionLock fix

On soupnazi:

soupnazi (Solaris 5.8) 18.104.22.168 (DS lab 17/3383)
SunOS soupnazi 5.8 Generic sun4u sparc SUNW,Ultra-4
System Configuration: Sun Microsystems sun4u Sun Enterprise 450 (4 X UltraSPARC-II 400MHz)
System clock frequency: 100 MHz
Memory size: 1024 Megabytes

Best selfserv result was achieved in the 1024-1-spread run:

  SESSION_HASH_SIZE              1024
  LOG2_BUCKETS_PER_SESSION_LOCK  1
  NOSPREAD                       undefined

That's 2 buckets per lock, and 512 session locks.
Created attachment 67994 [details] [diff] [review]
sessionLock fix

This patch is based on Kirk's patch (attachment 67817 [details] [diff] [review]) with the following changes.

1. I changed "pk11_SessionLock" to "PK11_SESSION_LOCK" to make it clear that this is a macro.

2. In softoken/pkcs11.c, function pk11_DestroySlotData, I added code to destroy slot->slotLock.

3. In softoken/pkcs11.c, function NSC_CloseSession, we were grabbing the slot lock and the session lock simultaneously. This makes lock contention more likely. Bob told me how to rework the code so that we only need to grab one lock at a time.

Kirk, please give this patch a try.

TO DO:

1. In softoken/pkcs11.c, function PK11_SlotInit, there is a memory leak in the error path if we can't create all the locks. When that happens, the locks that we have created should be destroyed.

2. Bob needs to document which lock protects which data in the PK11Slot structure.
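The rework described in item 3 can be sketched like this (pthreads sketch with hypothetical struct fields, not the actual NSC_CloseSession code): rather than nesting the slot lock inside a session lock, each lock is taken and released on its own, so no thread ever holds two of them at once.

```c
#include <pthread.h>

/* Sketch of the "one lock at a time" rework. Types and fields are
 * illustrative, not the real PK11Slot layout. */
struct slot {
    pthread_mutex_t slotLock;     /* guards slot-wide counters        */
    pthread_mutex_t sessionLock;  /* guards the session hash chain    */
    int             sessionCount;
};

static void close_session(struct slot *s) {
    /* 1. Unlink the session while holding only the session lock. */
    pthread_mutex_lock(&s->sessionLock);
    /* ... remove the session from its hash bucket here ... */
    pthread_mutex_unlock(&s->sessionLock);

    /* 2. Then update slot-wide state while holding only the slot lock. */
    pthread_mutex_lock(&s->slotLock);
    s->sessionCount--;
    pthread_mutex_unlock(&s->slotLock);
}
```

Besides reducing the window in which other threads block, holding one lock at a time also removes any lock-ordering constraint between the two mutexes.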
Attachment #67817 - Attachment is obsolete: true
Created attachment 68000 [details] [diff] [review] Apply this after the above patch for SlotInit failure leaks fix
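The leak being fixed here is the classic partial-initialization pattern: if creating lock i fails, locks 0 through i-1 must be destroyed before returning, or they leak. A generic sketch of the unwind (not the actual PK11_SlotInit code; the function name is hypothetical):

```c
#include <pthread.h>
#include <stddef.h>

/* Sketch: initialize an array of locks, unwinding the ones already
 * created if any single initialization fails. */
static int init_locks(pthread_mutex_t *locks, size_t n) {
    size_t i;
    for (i = 0; i < n; i++) {
        if (pthread_mutex_init(&locks[i], NULL) != 0) {
            while (i-- > 0)                      /* destroy locks 0..i-1 */
                pthread_mutex_destroy(&locks[i]);
            return -1;                           /* report failure */
        }
    }
    return 0;
}
```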
Created attachment 68017 [details] [diff] [review] sessionLock fix I tweaked the white space in Kirk's patch to make the diffs easier to read. I checked this patch into the tip. Thanks, Kirk!
Attachment #67994 - Attachment is obsolete: true
Comment on attachment 68000 [details] [diff] [review]
Apply this after the above patch for SlotInit failure leaks fix

Kirk, your patch is fine, but I found that it does not address the memory leaks on all the error paths in that function (PK11_SlotInit). Then I noticed that those memory leaks on error paths are a pre-existing problem, so I don't think we need to address them in this bug. I filed bug 123693 for the memory leaks in the error paths of PK11_SlotInit().
Attachment #68000 - Flags: needs-work+
Kirk, The only thing left to do for this bug is to document which locks protect which data in the PK11Slot structure. The locks are: slotLock sessionLock[i] objectLock Thanks.
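The requested documentation typically takes the form of a per-field comment naming the guarding lock. A hypothetical fragment showing the style (field names and sizes are illustrative, not the real PK11Slot layout):

```c
#include <pthread.h>

/* Sketch of lock-ownership comments on a slot structure.
 * Every mutable field names the lock that protects it. */
struct pk11_slot_sketch {
    pthread_mutex_t slotLock;
    pthread_mutex_t sessionLock[512];
    pthread_mutex_t objectLock;

    int   sessionIDCount;   /* protected by slotLock                       */
    int   sessionCount;     /* protected by slotLock                       */
    void *head[1024];       /* hash chain i protected by sessionLock[i/2]  */
    int   tokObjHashSize;   /* protected by objectLock                     */
};
```

Spelling out the mapping this way makes it auditable that no field is touched without its lock, which is exactly the gap Bob calls out below for the unprotected fields.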
Created attachment 68572 [details] [diff] [review]
SlotStr lock comments

Bob, please review this when you get a minute. I'll check in and close this bug on your approval.
You need to specify the actual elements slotLock protects (there are lots of unprotected fields). bob
Kirk, I would like to know exactly what sessionLock[i] protects. It seems to be controlled by the PK11_SESSION_LOCK macro.
BTW, don't forget that sessionLock also protects the reference count of the session objects in that bucket.
Cool, looks good kirk. bob
Bob approved the SlotStr comments, and I've checked them in. Closing this bug.

Shih-Hao achieved record numbers using a bucketing configuration different from the current setting (1024-1-spread): he used 1024-0-nospread. He hasn't yet had an opportunity to try the array of configurations I provided. It's possible we'll want to change the config after receiving his feedback; a separate bug will be opened in that case.
Status: ASSIGNED → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED