PKCS Lock Contention (sessionLock and scalability)

RESOLVED FIXED in 3.4

Status

Product: NSS
Component: Libraries
Priority: P1
Severity: normal
Status: RESOLVED FIXED
Reported: 17 years ago
Modified: 17 years ago

People

(Reporter: Kirk Erickson, Assigned: Kirk Erickson)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(6 attachments, 7 obsolete attachments)

(Assignee)

Description

17 years ago
Shifted from bug 116255 (see Comments #23 through #27).
This bug was split out from the 7.4% performance degradation
(between 3.3.1 and 3.4) because the scalability problems reported
by Shih-Hao Hung (of Sun) are a separate issue.

Note also that the contention on slot->sessionLock that Shih-Hao Hung
reported (found by profiling with the Solaris tool plockstat)
was observed on a 12-CPU box.  He added recently:

It seems to me that the session lock affects the performance of a large 
system. In our experiments, we did not see its effect when the number of 
processors is less than or equal to 4. That might be the reason why you did not 
get good results from your 4-way system.

iWS6+Deimos on 1GHz Serengeti: (HTTP_LOAD)
#cpu    ops/s (maxprocs=1)  ops/s (maxprocs=12)
 1       298 (0% idle)          296 (0% idle)
 2       520 (0% idle)          516 (0% idle)
 4       950 (0% idle)          960 (0% idle)
 8       900 (3% idle)         1720 (0% idle)
 12     1050 (58% idle)        2448 (0% idle)

The comments in 116255 by Bob Relyea outlined a plan of attack
I have followed through on.  There's still a SEGV to iron out,
and tuning to do on the buckets and total session hash entries.

The plan is to update this bug with a patch for review, and target 
the 3.4.1 NSS release.

We'll feed a build of the patched library back to Shih-Hao Hung
for comparison and profiling before check-in.  He is using NSS 
with iWS 6.0, whereas my benchmarking is with selfserv.  

In addition to converting slot->sessionLock into a bucket of locks,
we may pool sessions.  Further investigation is needed.  I will 
update this bug with a stack trace for Bob to inspect, showing where
sessions are being destroyed from (assuming this is the case).
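
For readers who want the shape of the change: the idea is to replace the one
slot->sessionLock with an array of locks and pick a lock by hashing the session
handle, so threads working on different sessions rarely collide.  Below is a
minimal, hypothetical sketch of that pattern using NSPR locks; the names
(Slot, SESSION_LOCK, NUM_SESSION_LOCKS, slot_init_session_locks) are
illustrative only, not the identifiers in the actual patch.

#include "prlock.h"   /* NSPR: PR_NewLock, PR_Lock, PR_Unlock, PR_DestroyLock */
#include "prtypes.h"  /* PRBool, PR_TRUE, PR_FALSE */

#define NUM_SESSION_LOCKS 32          /* illustrative; a power of two */

typedef unsigned long SessionHandle;  /* stand-in for CK_SESSION_HANDLE */

typedef struct SlotStr {
    PRLock *sessionLock[NUM_SESSION_LOCKS]; /* was: a single sessionLock */
    /* ... session hash table and other slot fields ... */
} Slot;

/* Choose the lock that guards the bucket this session handle hashes into. */
#define SESSION_LOCK(slot, handle) \
    ((slot)->sessionLock[(handle) & (NUM_SESSION_LOCKS - 1)])

static PRBool slot_init_session_locks(Slot *slot)
{
    int i;
    for (i = 0; i < NUM_SESSION_LOCKS; i++) {
        slot->sessionLock[i] = PR_NewLock();
        if (slot->sessionLock[i] == NULL) {
            while (--i >= 0)                 /* clean up on failure */
                PR_DestroyLock(slot->sessionLock[i]);
            return PR_FALSE;
        }
    }
    return PR_TRUE;
}

static void with_session(Slot *slot, SessionHandle handle)
{
    PRLock *lock = SESSION_LOCK(slot, handle);
    PR_Lock(lock);
    /* ... touch only the hash bucket(s) this lock owns ... */
    PR_Unlock(lock);
}

Two threads operating on handles that hash to different locks can now proceed
in parallel instead of serializing on a single mutex.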
(Assignee)

Comment 1

17 years ago
Created attachment 66792 [details] [diff] [review]
Fixes contention on slot->sessionLock

This patch replaces the single slot->sessionLock with a series of locks.
It affects 3 files:

mozilla/security/nss/lib/softoken/pkcs11.c 
mozilla/security/nss/lib/softoken/pkcs11i.h
mozilla/security/nss/lib/softoken/pkcs11u.c

I've performed several benchmarking runs from home and work to settle on
a matrix of hash table size, locks per bucket, number of buckets, and
sessionID-to-hash-table-index macros:

/*
SESSION_LOCK_LOG2_PER_BUCKET must be a prime number.
With SESSION_HASH_SIZE=1024, LOG2 can be 9, 5, or 1.
With SESSION_HASH_SIZE=4096, LOG2 can be 11, 9, 5, or 1.

HASH_SIZE   LOG2_PER_BUCKET   LOCK_PER_BUCKET   BUCKETS
1024        9                 512               2
1024        5                 32                32
1024        1                 2                 512
4096        11                2048              2
4096        9                 512               8
4096        5                 32                128
4096        1                 2                 2048
*/
#define SESSION_LOCK_LOG2_PER_BUCKET 5
#define SESSION_LOCK_PER_BUCKET (2 << (SESSION_LOCK_LOG2_PER_BUCKET-1))
#define SESSION_LOCK_BUCKETS (SESSION_HASH_SIZE/SESSION_LOCK_PER_BUCKET)
/* NOSPREAD sessionID to hash table index macro has been slower. */
#if 0
#define NOSPREAD
#endif
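
To make the NOSPREAD switch above concrete, here is one plausible sketch of
the two mapping strategies (SESSION_LOCK_INDEX is a hypothetical name, and the
patch's actual macros may differ in detail): NOSPREAD keeps consecutive session
handles on the same lock until a bucket boundary is crossed, while the default
SPREAD mapping rotates consecutive handles across different locks.

/* Hypothetical illustration only; not the macros from the patch. */
#ifdef NOSPREAD
/* NOSPREAD: consecutive handles share a lock until they cross a
 * bucket boundary. */
#define SESSION_LOCK_INDEX(handle) \
    (((handle) >> SESSION_LOCK_LOG2_PER_BUCKET) & (SESSION_LOCK_BUCKETS - 1))
#else
/* SPREAD: consecutive handles rotate across different locks, which the
 * benchmarking described here generally found faster. */
#define SESSION_LOCK_INDEX(handle) \
    ((handle) & (SESSION_LOCK_BUCKETS - 1))
#endif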

Here are the selfserv runs I made.  Results appear in the following 
attachment, and are available by visiting:

http://box.red.iplanet.com/nss/selfserv

selfserv --optimize --name 1024-9-nospread
selfserv --optimize --name 1024-9-spread
selfserv --optimize --name 1024-5-spread
selfserv --optimize --name 1024-5-nospread
selfserv --optimize --name 1024-1-nospread
selfserv --optimize --name 1024-1-spread
selfserv --optimize --name 4096-1-spread
selfserv --optimize --name 4096-1-nospread
selfserv --optimize --name 4096-11-nospread
selfserv --optimize --name 4096-11-spread
selfserv --optimize --name 4096-5-spread
selfserv --optimize --name 4096-5-nospread
selfserv --optimize --name 4096-9-nospread
selfserv --optimize --name 4096-9-spread

Best results were yielded with the 4096-5-spread run hitting selfserv
on soupnazi.red.iplanet.com from seven neighboring client machines.
The server and all clients ran with 100 threads each.

soupnazi (Solaris 5.8)
    192.18.71.52    (DS lab 17/3383)
    SunOS soupnazi 5.8 Generic sun4u sparc SUNW,Ultra-4
    System Configuration:  Sun Microsystems  sun4u Sun
	Enterprise 450 (4 X UltraSPARC-II 400MHz)
    System clock frequency: 100 MHz
    Memory size: 1024 Megabytes
    Nearby client machines:
	192.18.71.54  alphachip 	(SunOS 5.8)
	192.18.71.87  foofighter	(SunOS 5.8)
	192.18.71.232 gods		(RedHat Linux 7.1)
	192.18.71.53  lola		(SunOS 5.8)
	192.18.71.55  meatball		(SunOS 5.8)
	192.18.71.97  repltest-linux1	(RedHat Linux 6.2)

When we used fewer locks, CPU usage went up.  This result surprised me.
I expected less contention would mean smaller %idle values.

The first entry in the results table, named "1-session-lock" is
the result using the same harness previous to applying the patch.
Comparing that result (508.81) and the "4096-5-spread" result
(574.08) showed a 12.8% performance gain:

    100/x = 508.81/574.08
    57408 = 508.81x
    x = 57408/508.81
    x = 112.82797

While debugging this patch, I noticed we're freeing and allocating
sessions repeatedly.  A separate attachment follows with traces Bob
requested to identify the callers.  I expect there are additional
gains to be realized by pooling sessions.
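
"Pooling sessions" here would mean parking destroyed sessions on a free list
and reusing them instead of hitting the allocator on every connection.  A
minimal, hypothetical free-list sketch follows; the names (pooledSessions,
freeListLock, get_pooled_session, return_pooled_session) and the use of
PORT_ZAlloc are illustrative assumptions, not code from the patch.

#include "prlock.h"    /* PR_Lock / PR_Unlock */
#include "secport.h"   /* PORT_ZAlloc */

typedef struct SessionStr Session;
struct SessionStr {
    Session *next;       /* free-list link while pooled */
    /* ... handle, contexts, reference count, etc. ... */
};

static Session *pooledSessions;   /* head of the free list */
static PRLock  *freeListLock;     /* created once at module init */

static Session *get_pooled_session(void)
{
    Session *sp;
    PR_Lock(freeListLock);
    sp = pooledSessions;
    if (sp != NULL)
        pooledSessions = sp->next;
    PR_Unlock(freeListLock);
    if (sp == NULL)
        sp = (Session *)PORT_ZAlloc(sizeof(Session));
    return sp;
}

static void return_pooled_session(Session *sp)
{
    /* reset per-use state here instead of freeing the whole structure */
    PR_Lock(freeListLock);
    sp->next = pooledSessions;
    pooledSessions = sp;
    PR_Unlock(freeListLock);
}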
(Assignee)

Comment 2

17 years ago
Created attachment 66793 [details]
Results stressing selfserv with above patch
(Assignee)

Comment 3

17 years ago
Created attachment 66794 [details]
pk11_DestroySession() callers

Bob,

Attached are the stack traces you requested identifying
pk11_DestroySession() callers.  You said that if we pool contexts we
should get sessions with them, but apparently contexts are not being
pooled.

Also, I was confused by the fact that pk11_FreeContext() is called
by pk11_DestroySession().  Doesn't this make the session structure
higher level?  If we pool sessions, will we avoid reallocating contexts?

Let me know if we can fix the callers or whether I should pool sessions
as part of the fix for this bug.

kirk
(Assignee)

Updated

17 years ago
Status: NEW → ASSIGNED
Summary: PKCPKCS Lock Contention (sessionLock and scalability) → PKCS Lock Contention (sessionLock and scalability)
(Assignee)

Comment 4

17 years ago
Targeting NSS 3.4.1 release.

Sent 3 sets of Solaris builds to Shih-Hao Hung for testing:

libsoftokn3-1024-1-nospread.so*     
libsoftokn3-4096-1-spread.so*
libsoftokn3-4096-5-spread.so*

libsoftokn-1024-1-nospread.a
libsoftokn-4096-1-spread.a
libsoftokn-4096-5-spread.a

Priority: -- → P1
(Assignee)

Updated

17 years ago
Attachment #66792 - Attachment description: Fixes contention on slot->sessionLock contention → Fixes contention on slot->sessionLock

Comment 5

17 years ago
Per our discussion in the conference call this afternoon, we
want to include Kirk's session lock fix in NSS 3.4.

Kirk, please attach your patch to this bug for Bob to review.
Thanks.
Target Milestone: --- → 3.4
(Assignee)

Updated

17 years ago
Attachment #66792 - Attachment is obsolete: true
(Assignee)

Comment 6

17 years ago
Created attachment 67692 [details] [diff] [review]
sessionLock patch

1024-1-nospread rather than 4096-5-spread.
Avoids taking the lock in PK11_GenerateRandom when using the internal
random number generator.
(Assignee)

Updated

17 years ago
Attachment #67692 - Attachment is obsolete: true
(Assignee)

Comment 7

17 years ago
Created attachment 67693 [details] [diff] [review]
sessionLock patch 

1024-1-nospread rather than 4096-5-spread.
Avoids taking the lock in PK11_GenerateRandom when using the internal
random number generator.

Comment 8

17 years ago
Kirk,

You described your patch as "1024-1-nospread", but
your results showed that "spread" is better than
"nospread".  Why didn't you include "spread" in
your patch?
(Assignee)

Comment 9

17 years ago
Wan-Teh asked:
> You described your patch as "1024-1-nospread", but
> your results showed that "spread" is better than
> "nospread".  Why didn't you include "spread" in
> your patch?

My benchmarking on soupnazi showed the 1024-1-nospread
as nearly matching my fastest (4096-5-spread).  So I 
included it in the set I provided Shih-Hao for testing.
Shih-Hao found that the 4096 table size yielded 
insignificant improvement over the 1024-1-nospread config.

This appears to be a "sweet spot".  It's true that I generally
saw better numbers in SPREAD runs, but this one outdid them
all.  We can provide Shih-Hao with the other configs:

	1024-1-spread
	1024-5-nospread
	1024-5-spread

to verify it's best for him too, but I believe 1024-1-nospread
is the safest choice to check in at this point.
(Assignee)

Updated

17 years ago
Attachment #67693 - Attachment is obsolete: true
(Assignee)

Comment 10

17 years ago
Created attachment 67781 [details] [diff] [review]
sessionLock patch (with relaxed lock in PK11_GenerateRandom)

This is after Bob's first review.
We made it possible to have 1 lock per bucket, and
simplified places where the hash table index could be
used rather than the session handle.
(Assignee)

Updated

17 years ago
Attachment #67781 - Attachment is obsolete: true
(Assignee)

Comment 11

17 years ago
Created attachment 67817 [details] [diff] [review]
sessionLock fix

Changed names:
:%s/SESSION_LOCK_PER_BUCKET/BUCKETS_PER_SESSION_LOCK/g
:%s/SESSION_LOCK_BUCKETS/NUMBER_OF_SESSION_LOCKS/g
:%s/SESSION_LOCK_LOG2_PER_BUCKET/LOG2_BUCKETS_PER_SESSION_LOCK/g

Results follow - my benchmarks showed 1024-1-spread fastest.
That's the bucket config of this patch.
Sending an array of libs to Shih-Hao for additional benchmark
comparisons.
(Assignee)

Comment 12

17 years ago
Created attachment 67822 [details]
results after update today, and latest sessionLock fix

On soupnazi:

soupnazi (Solaris 5.8)
    192.18.71.52    (DS lab 17/3383)
    SunOS soupnazi 5.8 Generic sun4u sparc SUNW,Ultra-4
    System Configuration:  Sun Microsystems  sun4u Sun
	Enterprise 450 (4 X UltraSPARC-II 400MHz)
    System clock frequency: 100 MHz
    Memory size: 1024 Megabytes

Best selfserv result was achieved in the 1024-1-spread run:

SESSION_HASH_SIZE 1024
LOG2_BUCKETS_PER_SESSION_LOCK 1
NOSPREAD undefined

That's 2 buckets per lock, and 512 session locks.
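
For reference, the arithmetic behind that configuration:

BUCKETS_PER_SESSION_LOCK = 2^LOG2_BUCKETS_PER_SESSION_LOCK = 2^1 = 2
NUMBER_OF_SESSION_LOCKS  = SESSION_HASH_SIZE / BUCKETS_PER_SESSION_LOCK
                         = 1024 / 2 = 512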

Comment 13

17 years ago
Created attachment 67994 [details] [diff] [review]
sessionLock fix

This patch is based on Kirk's patch (attachment 67817 [details] [diff] [review]) with
the following changes.

1. I changed "pk11_SessionLock" to "PK11_SESSION_LOCK" to make
it clear that this is a macro.

2. In softoken/pkcs11.c, function pk11_DestroySlotData, I added
code to destroy slot->slotLock.

3. In softoken/pkcs11.c, function NSC_CloseSession, we were
grabbing the slot lock and the session lock simultaneously.
This makes lock contention more likely.  Bob told
me how to rework the code so that we only need to grab one
lock at a time (a rough sketch of the pattern appears after this comment).

Kirk, please give this patch a try.

TO DO:
1. In softoken/pkcs11.c, function PK11_SlotInit, there is a memory
leak in the error path if we can't create all the locks.  When
that happens, the locks that we have created should be destroyed.

2. Bob needs to document which lock protects which data in the
PK11Slot structure.
Attachment #67817 - Attachment is obsolete: true
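
For item 3 above, a rough sketch of the "one lock at a time" pattern
(illustrative only, not the checked-in code): it assumes the Slot and Session
types from the earlier sketches, an illustrative slotLock/sessionCount pair of
slot-wide fields, hypothetical helpers find_and_unlink_session and
free_session, and pkcs11t.h for CK_RV and the CKR_* codes.

CK_RV close_session_sketch(Slot *slot, SessionHandle hSession)
{
    Session *session;
    PRLock *lock = SESSION_LOCK(slot, hSession);

    /* 1. Bucket-local work under the session lock only: find the
     *    session and unlink it from its hash chain. */
    PR_Lock(lock);
    session = find_and_unlink_session(slot, hSession);
    PR_Unlock(lock);
    if (session == NULL)
        return CKR_SESSION_HANDLE_INVALID;

    /* 2. Slot-wide bookkeeping under the slot lock only; the session
     *    lock is no longer held, so the two are never nested. */
    PR_Lock(slot->slotLock);
    slot->sessionCount--;            /* illustrative field */
    PR_Unlock(slot->slotLock);

    free_session(session);
    return CKR_OK;
}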
(Assignee)

Comment 14

17 years ago
Created attachment 68000 [details] [diff] [review]
Apply this after the above patch for SlotInit failure leaks fix

Comment 15

17 years ago
Created attachment 68017 [details] [diff] [review]
sessionLock fix

I tweaked the white space in Kirk's patch to make the
diffs easier to read.  I checked this patch into the
tip.  Thanks, Kirk!
Attachment #67994 - Attachment is obsolete: true

Comment 16

17 years ago
Comment on attachment 68000 [details] [diff] [review]
Apply this after the above patch for SlotInit failure leaks fix

Kirk, your patch is fine, but I found that it does
not address the memory leaks on all the error paths
in that function (PK11_SlotInit).

Then I noticed that those memory leaks on error paths
are a pre-existing problem.  So I don't think we need
to address this problem in this bug.

I filed bug 123693 for the memory leaks in the error
paths in PK11_SlotInit().
Attachment #68000 - Flags: needs-work+

Comment 17

17 years ago
Kirk,

The only thing left to do for this bug is to document which locks
protect which data in the PK11Slot structure.  The locks are:
  slotLock
  sessionLock[i]
  objectLock

Thanks.
(Assignee)

Comment 18

17 years ago
Created attachment 68572 [details] [diff] [review]
SlotStr lock comments

Bob, please review this when you get a minute.
I'll check in and close this bug on your approval.

Comment 19

17 years ago
You need to specify the actual elements slotLock protects (there are lots of
unprotected fields).

bob

Comment 20

17 years ago
Kirk,

I would like to know exactly what sessionLock[i] protects.
It seems to be controlled by the PK11_SESSION_LOCK macro.

Comment 21

17 years ago
BTW, don't forget that sessionLock[] protects the reference counts of the
session objects in that bucket.
(Assignee)

Updated

17 years ago
Attachment #68572 - Attachment is obsolete: true
(Assignee)

Comment 22

17 years ago
Created attachment 68892 [details] [diff] [review]
SlotStr comments

Updated with feedback from Bob.
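
For readers without access to the attachment, the comments asked for are of
roughly this shape (a hypothetical sketch: only the three lock names come from
this bug, the field types and wording are illustrative, and the checked-in
text lives in attachment 68892).

struct PK11SlotStr {
    /* slotLock: protects slot-wide state; the checked-in comments list
     * the exact fields (see comment 19). */
    PRLock *slotLock;

    /* sessionLock[i]: protects hash chain i of the session hash table,
     * including the reference counts of the sessions in that bucket. */
    PRLock *sessionLock[NUMBER_OF_SESSION_LOCKS];

    /* objectLock: protects the slot's object-related state. */
    PRLock *objectLock;

    /* ... every remaining field annotated with the lock that owns it ... */
};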

Comment 23

17 years ago
Cool, looks good, Kirk.

bob
(Assignee)

Comment 24

17 years ago
Bob approved the SlotStr comments; I've checked them in.
Closing this bug.

Shih-Hao achieved record numbers using a bucketing configuration
different from the current setting (1024-1-spread).  He used
1024-0-nospread.  He hasn't had an opportunity yet to try the
array of configurations I provided.  It's possible we'll want
to change the config after receiving feedback from Shih-Hao.
A separate bug will be opened in that case.
Status: ASSIGNED → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED