Closed Bug 200708 Opened 22 years ago Closed 19 years ago

Some locks are not contended for

Categories

(NSS :: Libraries, defect, P2)

3.3.2
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugz, Assigned: nelson)

Details

(Whiteboard: 3.3.5)

Attachments

(3 files, 3 obsolete files)

Pallab has reported that NSS has many locks that are created but never contended for. One class of these are a result of creating arenas that are only used by one thread. There are two obvious cases (borne out by data Pallab collected): 1. Throwaway arenas There are many instances of the following code in NSS: some_function() { PLArenaPool *tmparena; tmparena = PORT_NewArena(); /* code that uses the arena */ PORT_FreeArena(tmparena); } Since the lifetime of the arena is entirely within the function, it obviously does not need to be threadsafe. In fact, we shouldn't have to do any allocation at all for such arenas, as a PLArenaPool can be on the stack. 2. Read-only objects I suspect many NSS objects only use their arena when they are constructed. For such objects, the arena does not need to be threadsafe. Experimentally, SECKEYPrivateKey seems to be such an object, though I haven't verified by code inspection. Fixing (1) would be easy, (2) is a little trickier, because we would have to be careful about not breaking things. I can think of two ways to change this behavior for arenas that do not need to be threadsafe: A. Stop using the PORT_ functions and just use NSPR directly. B. Create special PORT_ functions that don't do any locking. I would prefer A.
Really? Prefer A? I thought Sun was trying to stop using NSPR entirely. How about option c c) Create a new form of PORT_NewArena that takes an argument that specifies whether or not to create the lock. Then change the other functions to test the lock pointer before attempting to (un)lock it.
I have analyzed Pallab's data, which I will attach shortly. The data is based on 3.3.4. I am morphing the bug because it isn't as simple as modifying the use of arenas. In a single SSL connection, at startup, NSS (under the webserver) creates 56 uncontended locks. They are categorized as follows: 1) sslSocket (8) When the socket is not shared among threads, these locks have no contention. 2) Arenas in read-only NSS structures (7) 2a) CERTCertificateList (1) 2b) SECKEYPrivateKey (5) 2c) SECKEYDBKey (1) Unfortunately, the first two are public types. So while NSS seems to use them in a read-only manner (once they are created, the arena is not used again), we don't have control over how applications use them. 3) PKCS #11 session locks (18) 4) PKCS #11 object locks (2) 5) crypto context (PK11Context) locks (10) The lock is for session multiplexing, in the case where NSS exhausts the sessions available from a token, and thus has to switch the context between different operations. 6) ASN.1 decoding (5) 7) PKCS #5 decoding of algorithm ID (1)
Summary: Some arenas don't require locks → Some locks are not contended for
Made a mistake. The first item should be: 1) sslSocket (13) Another note: changes on the tip would reduce the number of PKCS#11 session locks to 10, and the number of crypto context locks to 6. Additionally, this data was obtained from an initial SSL connection, before the PKCS#11 session freelist is built, so in the long run the average number of new sessions (and hence new uncontended locks) in a given connections should be 4. With those considerations, on the tip the total should be about 38.
Some general observations. A) The ASN.1 locks are contained within SEC_ASN1DecodeItem. The arena is created to hold the decoder state and is only used within this function, so I believe this is an arena that doesn't need to be locked. B) The problem with the read-only objects, as I mentioned, is that they are public. For Stan, this should not be a problem. C) The crypto context, PKCS#11 session, PKCS#11 object, and sslSocket locks are all a result of NSS needing to provide thread safety, even when it may not be used. I wonder if something can be done about (C) for Stan. Could the softoken act like its not threadsafe, and rely on the above-PKCS#11 locking we use for such tokens? Then, NSS could make decisions about when to lock. We could write code in such a way that when we know sessions won't be exhausted and aren't being shared among threads, then the associated contexts and sessions wouldn't need locks, much the same way I proposed of having arenas without locking. But maybe that would be too difficult, I haven't really thought it out.
What was the test environment? If you only have one thread per SSL socket, then many locks will not be contended. But SSL is designed to allow "full duplex" (a.k.a. "two way simultaneous") communication with handshakes going on during the application data flow. An application that uses these features will need those other locks. Over the years, we've had MANY MANY cases where an application had more than one thread using the same socket, despite many denials from the application developers. Those situations led to crashes in NSS, and hence to blame of NSS software. NSS put all those locks in to protect itself against applications that sometimes use an SSL socket in more than one thread, even when they deny doing so. Prior to the addition of those locks, many application developers who claimed their applications never had multiple threads acting on the same socket had a poor opinion of NSS, because they claimed it crashed a lot. After putting in those locks, their opinion of NSS rose, because it stopped crashing. Of course, the implication of this historic fact is that the applications did indeed have multiple threads acting on the same sockets. I'd rather have NSS have uncontended locks (they shouldn't cost much, right) and have NSS be perceived as being relatively good quality than to remove those locks and have NSS be perceived as lower quality. If there was some way to be absolutely sure that an application really does not ever use an SSL socket in more than one thread, then I'd agree that we could eliminate those locks. But until then, ...
Added myself to the CC-list.
Just to clarify, the patch from comment #9 is for comment #2 case 6 the patch from comment #10 is for comment #2 case 2c (there were actually two instances of those locks, I miscounted)
Attachment #120740 - Flags: review?(nelsonb)
Attachment #120742 - Flags: review?(nelsonb)
Attachment #120745 - Flags: review?(relyea)
Whiteboard: 3.3.5
P2, at least. Ian, are you going to check this in on the trunk?
Priority: -- → P2
Yes. I've been meaning to redo the patches. For the first one (the code for creating non-locking arenas), I plan to add the thread id of the thread in which the arena was created as part of the arena, so that it can be asserted that the non-locking arenas are only used in one thread (this was suggested by Nelson). For the other two patches, I will add comments warning any and all readers of the code that the arena must never escape from the function it was created in, since it is not threadsafe.
Target Milestone: --- → 3.9
Attachment #120740 - Attachment is obsolete: true
Attachment #120742 - Attachment is obsolete: true
Attachment #120745 - Attachment is obsolete: true
Attachment #123178 - Flags: review?(nelsonb)
selfserv stress was a wash performance wise, but held up nicely. I did fullhandshakes for several hours.
Attachment #120740 - Flags: review?(nelsonb)
Attachment #120745 - Flags: review?(relyea)
Attachment #120742 - Flags: review?(nelsonb)
Comment on attachment 123178 [details] [diff] [review] revised, combined patch Ian, I apologize for taking so long to review this simple patch. Just this week, I learned how to search bugzilla for outstanding review requests, and found this one. r=MisterSSL, with one requested change, and one suggested change. >Index: mozilla/security/nss/lib/util/secport.c >@@ -77,12 +77,16 @@ > #endif /* THREADMARK */ > > /* The value of this magic must change each time PORTArenaPool changes. */ >-#define ARENAPOOL_MAGIC 0xB8AC9BDF >+#define ARENAPOOL_MAGIC 0xB8AC9BE0 1. Please keep the magic number ODD. We wanna crash if this is ever used as a pointer, and an odd value increases the likelihood of a crash. 2. The following lines of code appear in numerous places in your patch. >+#ifdef DEBUG >+ if (!pool->lock) >+ PORT_Assert(pool->creating_thread == PR_GetCurrentThread()); >+#endif /* DEBUG */ I suggest replacing those 4 lines with this line: PORT_Assert(pool->lock || pool->creating_thread == PR_GetCurrentThread());
Attachment #123178 - Flags: review?(MisterSSL) → review+
Moved to NSS 3.10. Ian, what's the status of this bug?
Target Milestone: 3.9 → 3.10
Taking Ian's perf bugs for 3.11.
Assignee: bugz → nelson
QA Contact: bishakhabanerjee → jason.m.reid
Target Milestone: 3.10 → 3.11
QA Contact: jason.m.reid → libraries
Fixed in NSS 3.11 with introduction of SSL_NOLOCKS socket option.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: