Closed
Bug 403240
Opened 17 years ago
Closed 16 years ago
threads hanging in nss_InitLock
Categories
(NSS :: Libraries, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
3.11.10
People
(Reporter: nelson, Assigned: julien.pierre)
References
Details
Attachments
(4 files, 4 obsolete files)
18.13 KB,
patch
|
wtc
:
review+
|
Details | Diff | Splinter Review |
5.21 KB,
patch
|
nelson
:
review+
wtc
:
superreview+
|
Details | Diff | Splinter Review |
1.03 KB,
patch
|
Details | Diff | Splinter Review | |
1.63 KB,
patch
|
Details | Diff | Splinter Review |
Bug 399304 comment 1 and Bug 399304 comment 2 both report hangs in which many threads are stuck inside of nss_InitLock. Bug 399304 proposes to work around this problem by not using nss_InitLock for that particular lock, and doing "early initialization" on that lock. nss_InitLock continues to be used for other locks that are initialized on first use. But the truth is that there is a problem with nss_InitLock. It is evidently unreliable, and we should either fix it or stop using it altogether. I think we need to understand WHY it fails before we make any decision. If we then decide that the problem is intractable, then it would be valid to decide to abandon it. But we should not abandon it simply because we do not understand how and why it fails.
Reporter | ||
Updated•17 years ago
|
Priority: -- → P1
Assignee | ||
Comment 1•17 years ago
|
||
I think we should remove all uses of nss_InitLock where possible . We have had many, many problems with it. Understanding the exact root cause would be nice, but we have observed time and again that it doesn't work. There are 2 approaches : a) fix the function and keep using it . We have already tried that many times, and obviously failed. We could try again for many years and still not get it to work right on all platforms and compilers. b) stop using the broken function where we don't need to do late initialization. This is a much cleaner and better solution, IMO, not to mention it's a lot easier. Late initialization is not in itself evil, but late initialization of anything must be protected by a memory barrier in order to work on all platforms regardless of CPU store order and compiler optimizations. The implementation of nss_InitLock does not go through a memory barrier. It only uses atomics, which are not a memory barrier. Memory barriers are usually available when one calls an OS lock function. Or we have to use some platform-specific assembly instruction that may be cheaper. On platforms without total store order, such as PowerPC, Itanium, and PA-RISC, threads running on different CPUs can have a different concurrent view of *ppLock . Ie, one thread can have performed the assignment and the other threads are not seeing that yet. I would expect that to cause a memory leak - several threads will attempt to create the lock. Actually, I see that Slavo filed a bug about this very problem : bug 399296 . What's a little surprising is that it happened on Linux x86, a platform which is supposed to have total store order. The likely explanation for this is too aggressive of a compiler optimization. There have been many issues in the past due to nss_initLock. See : - bug 237934 - nss_InitLock not atomic . I already attempted to fix the function, but it was known that it would not work in every case. - bug 273637 - bug 399304 - bug 399296 After the fix for bug 399296 is checked in, the only 2 remaining cases of nss_InitLock use where a race is possible will be in libssl . A workaround for their use was checked in a long time ago as part of bug 273637. So, IMO, that would warrant lowering this bug to a P3 bug, not P1. I would suggest we fix that P3 bug the following way : 1) before the release of NSS 3.12, we delete this function from libnssutil3, and move it to libssl3 as a private function, since it will be its only remaining use. The one previously exported from libnss3 can be changed to a no-op since it is marked as __. It would be good not to have this broken function in our new libnssutil3 shared library. 2) we change all uses of nss_InitLock outside of libssl3 to simple calls to PZ_NewLock() since they are already protected some other way, by virtue of being called in some initialization function that is defined as non thread-safe (NSS_Initialize, C_Initialize). After that, we can optionally : 3) create a proper libssl initialization function . This can be done without requiring any application changes by making NSS_Initialize load libssl3 and call it; and by also having SSL_ConfigServerSessionIDCache and SSL_InheritMPServerSIDCache call it. Then we can delete the broken nss_InitLock from libssl altogether and close this bug as INVALID. Or we can just keep things the way they are, since servers already work correctly as SSL_ConfigServerSessionIDCache and SSL_InheritMPServerSIDCache have a fix to create all the libssl locks early . I'm not saying it wouldn't be nice to have a working nss_InitLock . I think it's of great academic interest. But it's not required by NSS.
Assignee | ||
Comment 2•17 years ago
|
||
In comment 1, I meant "after the fix for bug 399304 is checked in".
Reporter | ||
Comment 3•17 years ago
|
||
I reject the proposal to make NSS_Init* always initialize libSSL, even in programs that don't use libSSL. This problem exists on systems that have no troubles with store order. Therefore, this problem is not caused by store ordering. Let's not run away from this problem. If we minimize our use of nss_InitLock before we understand the bug's cause, we will delay the arrival of that understanding. In bug 399304 comment 7, Wan-Teh suggested that this may be an optimization issue. After returning from a function call, optimizers are supposed to assume that they no longer know the values of ANY variables other than local variables whose addresses have not been revealed to code outside of the function's scope. Wan-Teh suggested a simple experiment to eliminate that possibility, forcing the compiler to not optimize away the accesses to the memory location by declaring it volatile. Let's try that.
Assignee | ||
Comment 4•17 years ago
|
||
Nelson, re: comment 3, a) That was only the 3rd part of my proposal in comment 1, and I designated it as optional. What about parts 1 and 2 ? These will not really reduce our dependency on nss_InitLock, since it is already used as a one-time initialier, except in libssl. b) The exact cause of the hang in nss_initLock is not currently known. However, we know for sure it's code in nss_InitLock. Not using nss_Initlock anymore will not give us an understanding of the bug's root cause, but I don't believe having that understanding is P1 for any release. We aren't going to stop shipping NSS releases or writing code because that we don't have that precise understanding, especially when we can get NSS to work just fine or better without using this code. So, I'm lowering this to P3. I have been doing a lot of reading this afternoon of various papers related to double-checked locking pattern, which is not quite the same as what we have here since there is no actual locking at all in nss_initLock, but similar (at least, the double-checked part, and late initialization). Many of the papers mention that the use of volatile is interpreted widely differently by compilers. One mentioned that VC2005 will properly handle volatile for use with DCLP, but older versions will not for example. I'm less than confident that volatile will work on all platforms. I will try it on the one where I have been able to reproducible the problem, which is Linux AMD64. But even if that fixes it in this one case, it may not in all cases. PR_CallOnce and PR_CallOnceWithArg are supposed to solve the problem of late initialization, and they at least have a lock available, which is created an NSPR initialization time. So they have a better chance of working, in case where late initialization is really needed. There are still some edge cases theoritically that don't work on non-TSO platforms, but they are not have we have run into here.
Priority: P1 → P3
Comment 5•17 years ago
|
||
The memory leak of a lock that Slavo reported in bug 399296 has nothing to do with nss_InitLock. The lock certTrustLock is leaked because there is no PZ_DestroyLock(certTrustLock) call in NSS.
Assignee | ||
Comment 6•17 years ago
|
||
Wan-Teh, Good catch ! My fix for bug 399304 however added that missing PZ_DestroyLock.
Comment 7•17 years ago
|
||
Legend: I(x): after atomic increment, the value is x BR : break out of the while loop Y : thread yield D(x): after atomic increment, the value is x Each thread repeats the I Y D sequence in the while loop. A thread breaks out of the while loop after I(1). Example of a live lock: three threads A, B, and C, and two locks L1 and L2. Thread A tries to create lock L1. Thread B and Thread C try to create lock L2. Initial value = 0 A B C I(1) I(2) D(1) Y BR I(2) Y D(1) I(2) Y D(1) I(2) Y D(1) I(2) Y D(1) I(2) Y D(1) I(2) Y
Assignee | ||
Comment 8•17 years ago
|
||
Thank you very much, Wan-Teh. That made a lot of sense. I actually postulated this afternoon with Nelson in my office that having multiple locks being created simultaneously could be the cause of the problem, especially since I knew that I was fixing two of them. But I hadn't figured out in which case that could happen. Now we know. One fix that comes to mind would be to yield after the decrement rather than before it. This might solve the particular case here, but not the general case. I conclude from your finding that we can't reuse a single "static initializers" variable to protect multiple concurrent separate allocations. Do you agree ? Or do you think it's possible to rewrite the function to not run into this deadlock with just one such variable ?
Comment 9•17 years ago
|
||
If each thread is running on a different processor, thread yield is a no-op. So moving the thread yield won't fix the bug. You can use a spin lock: static PRInt32 lock; PORT_Assert( ppLock != NULL); /* acquire the spin lock */ while (PR_AtomicSet(&lock, 1) == 1) { PR_Sleep(PR_INTERVAL_NO_WAIT); /* PR_Yield() */ } if (!*ppLock) { *ppLock = PZ_NewLock(ltype); } /* release the spin lock */ lock = 0; return (*ppLock != NULL) ? SECSuccess : SECFailure; But we'll need to use memory barriers, and NSPR ran into problems with using spin locks for PR_StackPush/PR_StackPop on Solaris.
Comment 10•17 years ago
|
||
I believe the thread yield in this spin lock should prevent the problem that NSPR ran into with spin locks for PR_StackPush/PR_StackPop on Solaris, so memory barriers are the only remaining issue. To avoid dealing with memory barriers, we have two options. 1. Use PR_CallOnce to create a PRLock, and then use that around the code that creates *ppLock. 2. Create a PRLock in NSS initialization, and then use that around the code that creates *ppLock. The downside is that nss_InitLock can only be used after NSS is initialized.
Assignee | ||
Comment 11•17 years ago
|
||
Wan-Teh, I think a spinlock is not very efficient, and we probably want to avoid them due to the history of issues. If we use a lock as you suggest in comment 10, then we don't need the spinlock at all. We just have : static PRInt32 onceInitLock; static PRLock* initLock; PR_CallOnce(createInitLock); PR_Lock(initLock); if (!*ppLock) { *ppLock = PZ_NewLock(ltype); } PR_Unlock(initLock); I would be in favor of doing that, but of course it won't be very efficient either. IMO, doing that would be acceptable only in code that is infrequently executed. Since NSS is a library, there is no real way to predict how often functions are called in applications, except if for functions that are explicitly defined to be 1-time, like initialization functions. So, I think it is preferable not to do this late initialization if at all possible. nss_InitLock is currently part of the util shared library . It would be best for it to be able to operate prior to NSS initialization. It's possible for it to be called without libnss3 being loaded at all, for example from softoken, as is currently the case. However, all such uses can be replaced with PZ_Lock, and should be. This will leave libssl as the only caller of this function. The proposed implementation, with a spinlock or a lock, will be correct, but also quite inefficient. It would be an added point of contention on a global lock for all threads in a web server, for instance. Here is a proposal to further reduce this last need for nss_InitLock : 1) Create an SSL_Init function This will initialize the locks with PZ_Lock and set an sslInitialized flag 2) Have SSL_ConfigServerSessionIDCache and SSL_InheritMPServerSIDCache call SSL_Init, since these are normally done in single-thread mode 3) In places where the locks are used, test for sslInitialized . If true, just use the locks. If false, use nss_InitLock with PR_CallOnce, which code would be in libssl only This way, SSL servers will not need to pay the penalty of using the correct nss_InitLock that requires a lock. Other SSL applications (clients) would also be free to call SSL_Init if they don't want to pay the price for that lock.
Comment 12•17 years ago
|
||
We can use PR_CallOnce in libSSL to create those locks. Note: the current implementation of nss_InitLock is also a spin lock.
Assignee | ||
Comment 13•17 years ago
|
||
Attachment #288967 -
Flags: superreview?(nelson)
Attachment #288967 -
Flags: review?(wtc)
Comment 14•17 years ago
|
||
Comment on attachment 288967 [details] [diff] [review] Delete broken and unnecessary nss_InitLock and nss_InitMonitor . Make libssl use PR_CallOnce to create locks Does this patch compile? Two functions that return SECStatus don't have a return statement. One function that has no parameter is passed a NULL argument.
Reporter | ||
Comment 15•17 years ago
|
||
Comment on attachment 288967 [details] [diff] [review] Delete broken and unnecessary nss_InitLock and nss_InitMonitor . Make libssl use PR_CallOnce to create locks r- 1) This patch changes ssl_InitClientSessionCacheLock() into a function that can only be called correctly and safely if it is called "early". But it is not always called early. It is only called early in servers, not in clients. The code that it replaces did work safely when called late, IFF all the callers were trying to initialize the same lock. So this is a step backwards. This function is called from function lock_cache(), which is called from several places, including functions ssl_FreeSID(), ssl_LookupSID(), CacheSID(), LockAndUncacheSID(), SSL_ClearSessionCache(). All of those (except the last) are called during the course of an SSL handshake, and so two SSL client threads both trying to do handshakes at the same time, when no other handshake has been done before, will race, potentially producing two locks and leaking one. One potential solution is to change this function to use PR_CallOnce itself. But IIRC, back when I wrote this function, I was concerned that PR_CallOnce was too expensive to call in high performance code paths. That's why the existing spin lock code was written. It was intended to be much less overhead in the more frequent case where the lock is already created. Minor things: 2) Since new function ssl_InitLocks is only called in sslsnce.c, I suggest that it be defined there, and not in sslnonce.c. Leave the declaration of ssl_InitClientSessionCacheLock in sslimpl.h, and call it from initlocks() in sslsnce.c. 3) style issues. New code doesn't follow the existing style in numerous places. e.g. a) defining the function type and function name on the same line, instead of breaking the line so that the function name starts at the beginning of a new line. b) these source files use } else { but the new code uses } else { 4) Call PORT_SetError in this function: > SECStatus __nss_InitLock( PZLock **ppLock, nssILockType ltype ) > { >- return __nss_InitLock_Util(ppLock, ltype); >+ return SECFailure; > } 5) Please replace at least one of the occurrences of this new code: >+ if (PR_SUCCESS == status) { >+ return SECSuccess; >+ } else { >+ return SECFailure; >+ } With either return (SECStatus)status; or return (PR_SUCCESS == status) ? SECSuccess ? SECFailure; I strongly prefer the former.
Attachment #288967 -
Flags: superreview?(nelson) → superreview-
Comment 16•17 years ago
|
||
Comment on attachment 288967 [details] [diff] [review] Delete broken and unnecessary nss_InitLock and nss_InitMonitor . Make libssl use PR_CallOnce to create locks Julien, I suggest that you split this patch into two patches. The first patch deals with everything but lib/ssl, which should be straightforward to review. The second patch deals with lib/ssl. In lib/util/secport.c, remove #include "nspr.h". It is not necessary. Don't forget to fix nssRWLock_AtomicCreate. We may not be able to cvs remove nsslocks.h because it is an exported header file. We can move it to lib/nss and have it declare only the nss_InitLock function. In lib/nss/utilwrap.c, you can use a functional implementation of __nss_InitLock temporarily, to relieve the pressure of removing the use of nss_InitLock in lib/ssl.
Assignee | ||
Updated•17 years ago
|
Attachment #288967 -
Flags: review?(wtc)
Assignee | ||
Updated•16 years ago
|
Target Milestone: 3.11.9 → 3.11.10
Assignee | ||
Comment 17•16 years ago
|
||
Wan-Teh, Re: comment 14, yes the patch compiled, without warnings on Solaris studio, and with some about the lack of return statements on Linux with gcc. But there wasn't any warning about the extraneous NULL argument passed to initLocks in either case. Odd ! I will split the work into 2 patches in the hope to get things reviewed faster. Re: comment 16, how did you want me to fix nssRWLock_AtomicCreate ? Presumably by deleting it, the same way as nss_InitLock . This was a private function, and the only caller was in secoid_Init . This is called from NSS_Init and from C_Initialize which are presumed not to be thread-safe. So, I think the lock creation doesn't have to be atomic for that case.
Assignee | ||
Comment 18•16 years ago
|
||
A separate patch will follow for libssl to deal with Nelson's comments .
Attachment #288967 -
Attachment is obsolete: true
Assignee | ||
Updated•16 years ago
|
Attachment #303201 -
Flags: review?(wtc)
Assignee | ||
Comment 19•16 years ago
|
||
Nelson, Re: comment 15 : 1) you are right, I forgot to make one important change . lock_cache was supposed to stop calling ssl_InitClientSessionCacheLock() and call ssl_InitLocks(PR_TRUE), which would make it go through PR_CallOnce . But I forgot to make that change before I submitted the patch for review. 2) this is no longer the case, since lock_cache is in sslnonce.c and it calls ssl_InitLocks() in this patch 3) a) I have tried to conform to this, but there were already many instances were this style was not employed, in particular in the code I was replacing. b) there was one instance of this which I fixed 4) this is relevant to the other patch (not libssl), attachment 303201 [details] [diff] [review] . I forgot to add the PORT_SetError in there. I can add it before checkin after it's reviewed. 5) I fixed this
Attachment #303205 -
Flags: review?(nelson)
Assignee | ||
Comment 20•16 years ago
|
||
Attachment 303205 [details] [diff] (libssl patch) is for both the trunk and the NSS_3_11_BRANCH . I have checked that it applied to both. Attachment 303201 [details] [diff] (everything else) is for the trunk only.
Priority: P3 → P2
Comment 21•16 years ago
|
||
Comment on attachment 303201 [details] [diff] [review] Rid all of NSS except libssl of evil, broken late initialization of locks r=wtc. But this patch needs work before it can be checked in. 1. In nss/utilwrap.c > SECStatus __nss_InitLock( PZLock **ppLock, nssILockType ltype ) > { >- return __nss_InitLock_Util(ppLock, ltype); >+ return SECFailure; > } Please add a comment saying that this function is obsolete. You should call PORT_SetError(PR_NOT_IMPLEMENTED_ERROR). Since none of the current callers of nss_InitLock check its return value, there is a small risk to just return SECFailure. I suggest that you call abort(). 2. Removal of nsslocks.h Since nsslocks.h is an exported header file, I'm not sure if it's okay to remove it. At least, you also need to remove it from the Solaris package file mozilla/security/nss/pkg/solaris/SUNWtlsd/prototype. 3. I agree that we should remove nssRWLock_AtomicCreate rather than trying to make it work, assuming we can fix #4 below correctly. 4. In util/secoid.c, the changes in this file need to be reviewed carefully, because secoid_InitDynOidData is also called by SECOID_AddEntry. You may want to request a second review for the nssRWLock_AtomicCreate changes in this patch. In secoid_InitDynOidData(), if you're sure that NSS can only be used by a single thread at that time, then the NSSRWLock_LockWrite and NSSRWLock_UnlockWrite are also unnecessary and should be removed.
Attachment #303201 -
Flags: review?(wtc) → review+
Assignee | ||
Comment 22•16 years ago
|
||
Wan-Teh, Thanks for your review in comment 21 . 1. I will add the PORT_SetError(PR_NOT_IMPLEMENTED_ERROR) . This was also requested by Nelson . I don't think calling abort is necessary. Any following code that tries to PR_Lock the NULL lock will have the same effect. 2. Good point. I will talk to Christophe about that. If this is not possible, I will change it to an empty file. 3. Good. 4. I think the best solution is to remove the call to secoid_InitDynOidData from SECMOD_AddEntry . This function shouldn't be called before secoid_init which is called by NSS_Initialize and C_Initialize .
Assignee | ||
Comment 23•16 years ago
|
||
Changes from the previous patch : 1) rename secoid_Init to SECOID_Init, since it is exported for libnssutil3 2) change the callers of secoid_Init to call SECOID_Init, and make sure they check the result 3) remove call to secoid_InitDynOidData from SECMOD_AddEntry 4) add PORT_SetError(PR_NOT_IMPLEMENTED_ERROR) to __nss_InitLock in utilwrap 5) don't delete nsslocks.h . Just change its content.
Attachment #303201 -
Attachment is obsolete: true
Attachment #303383 -
Flags: review?(wtc)
Comment 24•16 years ago
|
||
Comment on attachment 303383 [details] [diff] [review] Update for trunk - remove late initialization of locks accross NSS except for libssl r=wtc. Good work. 1. In lib/nss/utilwrap.c, include "prerror.h" instead of "prerr.h". ("prerror.h" is the documented header. "prerr.h" is a generated file included by "prerror.h".) 2. In lib/util/manifest.mn, add nsslocks.h back to the EXPORTS list because you're not removing it from CVS. If Christophe thinks it's OK to remove nsslocks.h from the NSS package, I think we should cvs remove nsslocks.h. 3. In lib/util/secoid.c, secoid_InitDynOidData, you should remove the NSSRWLock_LockWrite and NSSRWLock_UnlockWrite calls and the if (!dynOidPool) test. The block comment before this function should be updated. In SECOID_AddEntry, the comment "Caller has set error code" didn't make sense before, and is more wrong now. Please replace it with a PORT_SetError(SEC_ERROR_NOT_INITIALIZED) call. In SECOID_Init, the beginning should now look like this: if (oidhash) { return SECSuccess; /* already initialized */ } if (secoid_InitDynOidData() != SECSuccess) { PORT_SetError(SEC_ERROR_LIBRARY_FAILURE); PORT_Assert(0); /*This function should never fail. */ return SECFailure; }
Attachment #303383 -
Flags: review?(wtc) → review+
Assignee | ||
Comment 25•16 years ago
|
||
Thanks, Wan-Teh. This patch should have all the changes you requested. I can't check this in to the trunk until Nelson reviews the patch for libssl.
Attachment #303383 -
Attachment is obsolete: true
Attachment #303628 -
Flags: review?(wtc)
Assignee | ||
Comment 26•16 years ago
|
||
Also, I should note that Christophe stated the Solaris patch mechanism couldn't remove files (sigh), so it is better to update the nsslocks.h header to be empty than to remove it from updated packages.
Comment 27•16 years ago
|
||
Comment on attachment 303628 [details] [diff] [review] Update with Wan-Teh's comments . For trunk only, and not for libssl r=wtc. In lib/util/secoid.c, we should delete the second sentence in the block comment before secoid_InitDynOidData: "This function MIGHT create the lock ... whether or not to call this function." because it doesn't make sense now.
Attachment #303628 -
Flags: review?(wtc) → review+
Assignee | ||
Comment 28•16 years ago
|
||
Thanks, Wan-Teh. I thought about removing that comment, but that case is still possible, if we run out of memory after creating the lock but before creating the pool. Not very likely, but just about the same as before. But I made sure all the callers of SECOID_Init check for failure so this should not be an actual problem.
Assignee | ||
Comment 29•16 years ago
|
||
Attachment #303205 -
Attachment is obsolete: true
Attachment #303657 -
Flags: review?(nelson)
Attachment #303205 -
Flags: review?(nelson)
Reporter | ||
Updated•16 years ago
|
Attachment #303657 -
Flags: review?(nelson) → review+
Assignee | ||
Comment 30•16 years ago
|
||
Thanks, Nelson. I checked in both patches to the trunk . Checking in cmd/bltest/blapitest.c; /cvsroot/mozilla/security/nss/cmd/bltest/blapitest.c,v <-- blapitest.c new revision: 1.52; previous revision: 1.51 done Checking in lib/certdb/certdb.c; /cvsroot/mozilla/security/nss/lib/certdb/certdb.c,v <-- certdb.c new revision: 1.90; previous revision: 1.89 done Checking in lib/certdb/stanpcertdb.c; /cvsroot/mozilla/security/nss/lib/certdb/stanpcertdb.c,v <-- stanpcertdb.c new revision: 1.78; previous revision: 1.77 done Checking in lib/nss/nssinit.c; /cvsroot/mozilla/security/nss/lib/nss/nssinit.c,v <-- nssinit.c new revision: 1.90; previous revision: 1.89 done Checking in lib/nss/utilwrap.c; /cvsroot/mozilla/security/nss/lib/nss/utilwrap.c,v <-- utilwrap.c new revision: 1.4; previous revision: 1.3 done Checking in lib/softoken/pkcs11.c; /cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v <-- pkcs11.c new revision: 1.146; previous revision: 1.145 done Checking in lib/softoken/legacydb/lginit.c; /cvsroot/mozilla/security/nss/lib/softoken/legacydb/lginit.c,v <-- lginit.c new revision: 1.12; previous revision: 1.11 done Checking in lib/ssl/ssl3con.c; /cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v <-- ssl3con.c new revision: 1.106; previous revision: 1.105 done Checking in lib/ssl/ssl3ecc.c; /cvsroot/mozilla/security/nss/lib/ssl/ssl3ecc.c,v <-- ssl3ecc.c new revision: 1.20; previous revision: 1.19 done Checking in lib/ssl/sslimpl.h; /cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v <-- sslimpl.h new revision: 1.63; previous revision: 1.62 done Checking in lib/ssl/sslnonce.c; /cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v <-- sslnonce.c new revision: 1.21; previous revision: 1.20 done Checking in lib/ssl/sslsnce.c; /cvsroot/mozilla/security/nss/lib/ssl/sslsnce.c,v <-- sslsnce.c new revision: 1.42; previous revision: 1.41 done Checking in lib/util/manifest.mn; /cvsroot/mozilla/security/nss/lib/util/manifest.mn,v <-- manifest.mn new revision: 1.17; previous revision: 1.16 done Removing lib/util/nsslocks.c; /cvsroot/mozilla/security/nss/lib/util/nsslocks.c,v <-- nsslocks.c new revision: delete; previous revision: 1.6 done Checking in lib/util/nsslocks.h; /cvsroot/mozilla/security/nss/lib/util/nsslocks.h,v <-- nsslocks.h new revision: 1.5; previous revision: 1.4 done Checking in lib/util/nssrwlk.c; /cvsroot/mozilla/security/nss/lib/util/nssrwlk.c,v <-- nssrwlk.c new revision: 1.8; previous revision: 1.7 done Checking in lib/util/nssutil.def; /cvsroot/mozilla/security/nss/lib/util/nssutil.def,v <-- nssutil.def new revision: 1.6; previous revision: 1.5 done Checking in lib/util/secoid.c; /cvsroot/mozilla/security/nss/lib/util/secoid.c,v <-- secoid.c new revision: 1.41; previous revision: 1.40 done Checking in lib/util/secoid.h; /cvsroot/mozilla/security/nss/lib/util/secoid.h,v <-- secoid.h new revision: 1.9; previous revision: 1.8 done Checking in lib/util/secport.c; /cvsroot/mozilla/security/nss/lib/util/secport.c,v <-- secport.c new revision: 1.21; previous revision: 1.20 done Checking in lib/util/utilrename.h; /cvsroot/mozilla/security/nss/lib/util/utilrename.h,v <-- utilrename.h new revision: 1.4; previous revision: 1.3 done
Assignee | ||
Comment 31•16 years ago
|
||
Comment on attachment 303657 [details] [diff] [review] Update for libssl per IM Wan-Teh, please review this for the branch. Thanks.
Attachment #303657 -
Flags: superreview?(wtc)
Comment 32•16 years ago
|
||
Comment on attachment 303657 [details] [diff] [review] Update for libssl per IM Julien, could you ask someone else to review this patch? For the NSS 3.11 branch, why don't you just make nss_InitLock good enough? All you need is the change I suggested in comment 9.
Comment 33•16 years ago
|
||
Comment on attachment 303657 [details] [diff] [review] Update for libssl per IM r=wtc. 1. In ssl3con.c, ssl_InitSymWrapKeysLock(void), we have: >+ if (!symWrapKeysLock) { >+ symWrapKeysLock = PZ_NewLock(nssILockOther); >+ } >+ return symWrapKeysLock ? SECSuccess:SECFailure; The "if (!symWrapKeysLock)" test should be removed. This test has been replaced by the "if (LocksInitializedEarly)" test and the PR_CallOnce call in ssl_InitLocks. Nit: add spaces before and after the colon (:). 2. In sslnonce.c, ssl_InitClientSessionCacheLock, we have: >+ if (!cacheLock) { >+ cacheLock = PZ_NewLock(nssILockCache); >+ } >+ return cacheLock ? SECSuccess:SECFailure; The same two comments as above. 3. In sslnonce.c, ssl_InitLocks: >+/* lateInit means that the call is not happening during a 1-time >+ * initialization function >+ */ >+SECStatus >+ssl_InitLocks(PRBool lateInit) The comment seems wrong -- lateInit means that the call IS happening during PR_CallOnce, which I believe is what you meant by "a 1-time initialization function", right? It would be less confusing to reverse the sense of the parameter and rename it earlyInit. Since none of the callers of ssl_InitLocks check its return value, ssl_InitLocks should just return void.
Attachment #303657 -
Flags: superreview?(wtc) → superreview+
Comment 34•16 years ago
|
||
(In reply to comment #28) The comment "This function MIGHT create the lock, but not the pool, so code should test for dynOidPool, not dynOidLock, when deciding whether or not to call this function." describes how secoid_InitDynOidData should be called in the original code: secoid_Init: 1805 if (!dynOidPool && secoid_InitDynOidData() != SECSuccess) { 1806 return SECFailure; 1807 } SECOID_AddEntry: 1717 if (!dynOidPool && secoid_InitDynOidData() != SECSuccess) { 1718 /* Caller has set error code. */ 1719 return ret; 1720 } Now we only call secoid_InitDynOidData() in SECOID_Init and we do NOT test for dynOidPool to decide whether or not to call this function. So the comment should be removed or updated to reflect the new code. We should also delete the following comment, which describes the nssRWLock_AtomicCreate call that was deleted: 1603 /* This function will create the lock if it doesn't exist, 1604 ** and will return the address of the lock, whether it was 1605 ** previously created, or was created by the function. 1606 */
Reporter | ||
Comment 35•16 years ago
|
||
(In reply to comment #33) > >+/* lateInit means that the call is not happening during a 1-time > >+ * initialization function > The comment seems wrong -- lateInit means that the call IS > happening during PR_CallOnce, which I believe is what you meant > by "a 1-time initialization function", right? Wan-Teh, Having discussed this with Julien at length recently, I can tell you that by "a 1-time initialization function", he did not mean a function called through PR_CallOnce, but rather meant a function that is called only once during process initialization code on at most a single thread, and is not called within code paths that may be repeated during program execution, such as in socket processing that may occur on multiple threads. Functions such as libSSL's server session cache initialization functions are presumed to be of the type to which Julien referred with that phrase. Perhaps there is a better phrase that can be used to convey that meaning. Having written that attempted clarification, I must express doubt about the view that early initialization is somehow always exempt from the issues that motivate this work. Julien believes that any initialization done in a primordial thread, before any other threads are created, is exempt from any worries about storage order, because (he believes) the act of creating a new thread will perform a memory barrier call (or calls) sufficient to prevent the new threads from seeing memory in a state inconsistent with that seen by the primordial thread when it performed initialization. I'm not entirely certain that's true, but I think that's a reasonable expectation for how it SHOULD work. My concern is because nothing in the definition of libSSL's cache initialization routines requires them to be executed only on a primordial thread before any and all other threads have been created. It is true that NSS's own server test program does that, but it is not (stated as) required. A single-process server might (for example) create numerous threads, and then initialize the SSL server session cache on one of them before calling PR_Accept or PR_Listen for the first time. In such a case, even though there is no chance of the cache initialization function being called multiple times, nothing seems to guarantee that the multiple threads will, thereafter, see a consistent view of the memory initialized during the cache initialization call.
Comment 36•16 years ago
|
||
Julien, perhaps you can say something like "lateInit means that the call is not happening during a 1-time initialization function but rather during dynamic, lazy initialization".
Assignee | ||
Comment 37•16 years ago
|
||
Wan-Teh, I checked in this supplemental patch to the trunk based on your feedback in comment 34 . Checking in secoid.c; /cvsroot/mozilla/security/nss/lib/util/secoid.c,v <-- secoid.c new revision: 1.42; previous revision: 1.41
Assignee | ||
Comment 38•16 years ago
|
||
Wan-Teh, Re: comment 32, I prefer to keep the 3.11 branch and the trunk closer in sync. And this doesn't require additional coding work . Thanks for deciding to review my patch. re: comment 33 issue 3, even though the callers of ssl_InitLocks don't check its return value currently, they may want to at some point. I dislike void functions and prefer to report the failure in case it is needed later. I have addressed your other issues in this patch, which I checked in to the trunk : Checking in ssl3con.c; /cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v <-- ssl3con.c new revision: 1.107; previous revision: 1.106 done Checking in sslnonce.c; /cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v <-- sslnonce.c new revision: 1.22; previous revision: 1.21 done Nelson, Re: comment 35, Thanks for clarifying that for Wan-Teh. I didn't feel well enough to do any work during the long weekend. The main premise of these changes to libssl (and existing patches made before this one, such as in bug 273934) is that server applications normally initialize the SSL server session cache fairly early on. SSL server apps don't necessarily initialize the session cache it in the primordial thread, or before NSS_Initialize, as you point out. But they must already be doing it before processing any incoming SSL connections, in order to actually take advantage of the SSL server session cache for those connections. The hope is that they use a locking mechanism that also provides a memory barrier, such as a PRLock, which we provide as part of our NSPR/NSS libraries, to ensure that these calls to libssl happens after the initialization of the session cache, and thus have a view of memory that includes both the lock pointers and their content. If the server application uses a different type of lock which isn't a memory barrier, then the premise of these changes may not be true for the non-TSO platforms. If the application doesn't use any type of lock, then I would content that the application is already broken.
Assignee | ||
Comment 39•16 years ago
|
||
Nelson, The end of my comment 38 about using locks only applies if the server application has pre-started its worker threads. Another way for the server application to meet the requirement of the SSL server API to initialize the cache before processing the incoming SSL connections, is for the application not to start its worker threads until after after it initializes the SSL server session cache. I believe that's what most server applications do. In those cases, the application doesn't need to use locks . It is then be up to the OS thread creation function, such as pthread_create, to provide a memory view to the new worker threads that includes the current changes made to memory up to this point in time to the current thread, including the SSL server session cache lock and its pointer. Anything else would be an operating system bug.
Assignee | ||
Comment 40•16 years ago
|
||
I checked in the combination of attachment 303657 [details] [diff] [review] and attachment 304350 [details] [diff] [review] to the NSS_3_11_BRANCH : Checking in ssl3con.c; /cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v <-- ssl3con.c new revision: 1.76.2.23; previous revision: 1.76.2.22 done Checking in ssl3ecc.c; /cvsroot/mozilla/security/nss/lib/ssl/ssl3ecc.c,v <-- ssl3ecc.c new revision: 1.3.2.13; previous revision: 1.3.2.12 done Checking in sslimpl.h; /cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v <-- sslimpl.h new revision: 1.42.2.14; previous revision: 1.42.2.13 done Checking in sslnonce.c; /cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v <-- sslnonce.c new revision: 1.17.2.1; previous revision: 1.17 done Checking in sslsnce.c; /cvsroot/mozilla/security/nss/lib/ssl/sslsnce.c,v <-- sslsnce.c new revision: 1.36.2.5; previous revision: 1.36.2.4 done
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•