Closed Bug 403240 Opened 17 years ago Closed 16 years ago

threads hanging in nss_InitLock

Categories

(NSS :: Libraries, defect, P2)

3.11.7

Tracking

(Not tracked)

RESOLVED FIXED
3.11.10

People

(Reporter: nelson, Assigned: julien.pierre)

References

Details

Attachments

(4 files, 4 obsolete files)

Bug 399304 comment 1 and Bug 399304 comment 2 both report hangs in which 
many threads are stuck inside of nss_InitLock.  Bug 399304 proposes to 
work around this problem by not using nss_InitLock for that particular
lock, and doing "early initialization" on that lock.  nss_InitLock 
continues to be used for other locks that are initialized on first use.

But the truth is that there is a problem with nss_InitLock.  It is 
evidently unreliable, and we should either fix it or stop using it 
altogether.  I think we need to understand WHY it fails before we make
any decision.  If we then decide that the problem is intractable, then
it would be valid to decide to abandon it.  But we should not abandon 
it simply because we do not understand how and why it fails.
Priority: -- → P1
I think we should remove all uses of nss_InitLock where possible . We have had many, many problems with it. Understanding the exact root cause would be nice, but we have observed time and again that it doesn't work.

There are 2 approaches :
a) fix the function and keep using it . We have already tried that many times, and obviously failed. We could try again for many years and still not get it to work right on all platforms and compilers.
b) stop using the broken function where we don't need to do late initialization. This is a much cleaner and better solution, IMO, not to mention it's a lot easier.

Late initialization is not in itself evil, but late initialization of anything must be protected by a memory barrier in order to work on all platforms regardless of CPU store order and compiler optimizations. The implementation of nss_InitLock does not go through a memory barrier. It only uses atomics, which are not a memory barrier. Memory barriers are usually available when one calls an OS lock function. Or we have to use some platform-specific assembly instruction that may be cheaper.

On platforms without total store order, such as PowerPC, Itanium, and PA-RISC, threads running on different CPUs can have a different concurrent view of *ppLock . Ie, one thread can have performed the assignment and the other threads are not seeing that yet. I would expect that to cause a memory leak - several threads will attempt to create the lock. Actually, I see that Slavo filed a bug about this very problem : bug 399296 . What's a little surprising is that it happened on Linux x86, a platform which is supposed to have total store order.
The likely explanation for this is too aggressive of a compiler optimization. 

There have been many issues in the past due to nss_initLock. See :
- bug 237934 - nss_InitLock not atomic . I already attempted to fix the function, but it was known that it would not work in every case.
- bug 273637
- bug 399304
- bug 399296

After the fix for bug 399296 is checked in, the only 2 remaining cases of nss_InitLock use where a race is possible will be in libssl . A workaround for their use was checked in a long time ago as part of bug 273637. So, IMO, that would warrant lowering this bug to a P3 bug, not P1.

I would suggest we fix that P3 bug the following way :

1) before the release of NSS 3.12, we delete this function from libnssutil3, and move it to libssl3 as a private function, since it will be its only remaining use. The one previously exported from libnss3 can be changed to a no-op since it is marked as __. It would be good not to have this broken function in our new libnssutil3 shared library.

2) we change all uses of nss_InitLock outside of libssl3 to simple calls to PZ_NewLock() since they are already protected some other way, by virtue of being called in some initialization function that is defined as non thread-safe (NSS_Initialize, C_Initialize).

After that, we can optionally :

3) create a proper libssl initialization function . This can be done without requiring any application changes by making NSS_Initialize load libssl3 and call it; and by also having SSL_ConfigServerSessionIDCache and SSL_InheritMPServerSIDCache call it. Then we can delete the broken nss_InitLock from libssl altogether and close this bug as INVALID.
Or we can just keep things the way they are, since servers already work correctly as SSL_ConfigServerSessionIDCache and SSL_InheritMPServerSIDCache have a fix to create all the libssl locks early .

I'm not saying it wouldn't be nice to have a working nss_InitLock . I think it's of great academic interest. But it's not required by NSS.
In comment 1, I meant "after the fix for bug 399304 is checked in".
I reject the proposal to make NSS_Init* always initialize libSSL, even in 
programs that don't use libSSL.  

This problem exists on systems that have no troubles with store order.
Therefore, this problem is not caused by store ordering.

Let's not run away from this problem.  If we minimize our use of 
nss_InitLock before we understand the bug's cause, we will delay the 
arrival of that understanding.

In bug 399304 comment 7, Wan-Teh suggested that this may be an optimization
issue.   After returning from a function call, optimizers are supposed to 
assume that they no longer know the values of ANY variables other than 
local variables whose addresses have not been revealed to code outside of 
the function's scope.  Wan-Teh suggested a simple experiment to eliminate
that possibility, forcing the compiler to not optimize away the accesses 
to the memory location by declaring it volatile.  Let's try that. 
Nelson,

re: comment 3,

a) That was only the 3rd part of my proposal in comment 1, and I designated it as optional. What about parts 1 and 2 ? These will not really reduce our dependency on nss_InitLock, since it is already used as a one-time initialier, except in libssl.

b) The exact cause of the hang in nss_initLock is not currently known. However, we know for sure it's code in nss_InitLock. Not using nss_Initlock anymore will not give us an understanding of the bug's root cause, but I don't believe having that understanding is P1 for any release. We aren't going to stop shipping NSS releases or writing code because that we don't have that precise understanding, especially when we can get NSS to work just fine or better without using this code. So, I'm lowering this to P3.

I have been doing a lot of reading this afternoon of various papers related to double-checked locking pattern, which is not quite the same as what we have here since there is no actual locking at all in nss_initLock, but similar (at least, the double-checked part, and late initialization). Many of the papers mention that the use of volatile is interpreted widely differently by compilers. One mentioned that VC2005 will properly handle volatile for use with DCLP, but older versions will not for example. I'm less than confident that volatile will work on all platforms. I will try it on the one where I have been able to reproducible the problem, which is Linux AMD64. But even if that fixes it in this one case, it may not in all cases.

PR_CallOnce and PR_CallOnceWithArg are supposed to solve the problem of late initialization, and they at least have a lock available, which is created an NSPR initialization time. So they have a better chance of working, in case where late initialization is really needed. There are still some edge cases theoritically that don't work on non-TSO platforms, but they are not have we have run into here.
Priority: P1 → P3
The memory leak of a lock that Slavo reported in bug 399296
has nothing to do with nss_InitLock.  The lock certTrustLock
is leaked because there is no PZ_DestroyLock(certTrustLock)
call in NSS.

Wan-Teh,

Good catch ! My fix for bug 399304 however added that missing PZ_DestroyLock.
Legend:

I(x):  after atomic increment, the value is x
BR  :  break out of the while loop
Y   :  thread yield
D(x):  after atomic increment, the value is x

Each thread repeats the I Y D sequence in the while loop.

A thread breaks out of the while loop after I(1).

Example of a live lock: three threads A, B, and C, and
two locks L1 and L2.

Thread A tries to create lock L1.
Thread B and Thread C try to create lock L2.

Initial value = 0

A       B       C
I(1)
        I(2)
D(1)    Y
BR             I(2)
               Y
        D(1)
        I(2)
        Y
               D(1)
               I(2)
               Y
        D(1)
        I(2)
        Y
               D(1)
               I(2)
               Y
        D(1)
        I(2)
        Y
Thank you very much, Wan-Teh. That made a lot of sense. I actually postulated this afternoon with Nelson in my office that having multiple locks being created simultaneously could be the cause of the problem, especially since I knew that I was fixing two of them. But I hadn't figured out in which case that could happen.
Now we know. One fix that comes to mind would be to yield after the decrement rather than before it. This might solve the particular case here, but not the general case.

I conclude from your finding that we can't reuse a single "static initializers" variable to protect multiple concurrent separate allocations. Do you agree ? Or do you think it's possible to rewrite the function to not run into this deadlock with just one such variable ? 
If each thread is running on a different processor, thread
yield is a no-op.  So moving the thread yield won't fix the
bug.

You can use a spin lock:

    static PRInt32 lock;

    PORT_Assert( ppLock != NULL);

    /* acquire the spin lock */
    while (PR_AtomicSet(&lock, 1) == 1) {
        PR_Sleep(PR_INTERVAL_NO_WAIT);    /* PR_Yield() */
    }
    if (!*ppLock) {
        *ppLock = PZ_NewLock(ltype);
    }
    /* release the spin lock */
    lock = 0;

    return (*ppLock != NULL) ? SECSuccess : SECFailure;

But we'll need to use memory barriers, and NSPR ran into
problems with using spin locks for PR_StackPush/PR_StackPop
on Solaris. 
I believe the thread yield in this spin lock should prevent
the problem that NSPR ran into with spin locks for
PR_StackPush/PR_StackPop on Solaris, so memory barriers
are the only remaining issue.

To avoid dealing with memory barriers, we have two options.
1. Use PR_CallOnce to create a PRLock, and then use that
around the code that creates *ppLock.
2. Create a PRLock in NSS initialization, and then use that
around the code that creates *ppLock.  The downside is that
nss_InitLock can only be used after NSS is initialized.
Wan-Teh,

I think a spinlock is not very efficient, and we probably want to avoid them due to the history of issues. If we use a lock as you suggest in comment 10, then we don't need the spinlock at all.
We just have :

    static PRInt32 onceInitLock;
    static PRLock* initLock;

    PR_CallOnce(createInitLock);
    PR_Lock(initLock);
    if (!*ppLock) {
        *ppLock = PZ_NewLock(ltype);
    }
    PR_Unlock(initLock);

I would be in favor of doing that, but of course it won't be very efficient either. IMO, doing that would be acceptable only in code that is infrequently executed. Since NSS is a library, there is no real way to predict how often functions are called in applications, except if for functions that are explicitly defined to be 1-time, like initialization functions. So, I think it is preferable not to do this late initialization if at all possible.

nss_InitLock is currently part of the util shared library . It would be best for it to be able to operate prior to NSS initialization. It's possible for it to be called without libnss3 being loaded at all, for example from softoken, as is currently the case. However, all such uses can be replaced with PZ_Lock, and should be.

This will leave libssl as the only caller of this function. The proposed implementation, with a spinlock or a lock, will be correct, but also quite inefficient. It would be an added point of contention on a global lock for all threads in a web server, for instance.

Here is a proposal to further reduce this last need for nss_InitLock :

1) Create an SSL_Init function
This will initialize the locks with PZ_Lock and set an sslInitialized flag

2) Have
SSL_ConfigServerSessionIDCache and
SSL_InheritMPServerSIDCache call SSL_Init, since these are normally done in single-thread mode

3) In places where the locks are used, test for sslInitialized . If true, just use the locks. If false, use nss_InitLock with PR_CallOnce, which code would be in libssl only

This way, SSL servers will not need to pay the penalty of using the correct nss_InitLock that requires a lock.

Other SSL applications (clients) would also be free to call SSL_Init if they don't want to pay the price for that lock.
We can use PR_CallOnce in libSSL to create those locks.

Note: the current implementation of nss_InitLock is also a
spin lock.
Comment on attachment 288967 [details] [diff] [review]
Delete broken and unnecessary nss_InitLock and nss_InitMonitor . Make libssl use PR_CallOnce to create locks

Does this patch compile?  Two functions that return SECStatus
don't have a return statement.  One function that has no parameter
is passed a NULL argument.
Comment on attachment 288967 [details] [diff] [review]
Delete broken and unnecessary nss_InitLock and nss_InitMonitor . Make libssl use PR_CallOnce to create locks

r- 

1) This patch changes ssl_InitClientSessionCacheLock() into a 
function that can only be called correctly and safely if it is
called "early".  But it is not always called early.  It is only
called early in servers, not in clients.  The code that it replaces
did work safely when called late, IFF all the callers were trying 
to initialize the same lock.  So this is a step backwards.

This function is called from function lock_cache(), which is called 
from several places, including functions ssl_FreeSID(), ssl_LookupSID(), 
CacheSID(), LockAndUncacheSID(), SSL_ClearSessionCache().  All of those 
(except the last) are called during the course of an SSL handshake, and so 
two SSL client threads both trying to do handshakes at the same time, when 
no other handshake has been done before, will race, potentially producing 
two locks and leaking one.

One potential solution is to change this function to use PR_CallOnce itself.  
But IIRC, back when I wrote this function, I was concerned that PR_CallOnce 
was too expensive to call in high performance code paths.  That's why the 
existing spin lock code was written.  It was intended to be much less 
overhead in the more frequent case where the lock is already created.

Minor things:

2) Since new function ssl_InitLocks is only called in sslsnce.c, I suggest
that it be defined there, and not in sslnonce.c.  Leave the declaration 
of ssl_InitClientSessionCacheLock in sslimpl.h, and call it from 
initlocks() in sslsnce.c.

3) style issues.  
New code doesn't follow the existing style in numerous places.  e.g. 

a) defining the function type and function name on the same line, instead
   of breaking the line so that the function name starts at the beginning 
   of a new line.

b) these source files use
       } else {  
   but the new code uses
       } else
       {

4) Call PORT_SetError in this function:
> SECStatus __nss_InitLock(   PZLock    **ppLock, nssILockType ltype )
> {
>-    return __nss_InitLock_Util(ppLock, ltype);
>+    return SECFailure;
> }


5) Please replace at least one of the occurrences of this new code:

>+        if (PR_SUCCESS == status) {
>+            return SECSuccess;
>+        } else {
>+            return SECFailure;
>+        }

With either 
     return (SECStatus)status;
or 
     return (PR_SUCCESS == status) ? SECSuccess ? SECFailure;
I strongly prefer the former.
Attachment #288967 - Flags: superreview?(nelson) → superreview-
Comment on attachment 288967 [details] [diff] [review]
Delete broken and unnecessary nss_InitLock and nss_InitMonitor . Make libssl use PR_CallOnce to create locks

Julien, I suggest that you split this patch into two patches.
The first patch deals with everything but lib/ssl, which
should be straightforward to review.  The second patch deals
with lib/ssl.

In lib/util/secport.c, remove #include "nspr.h".  It is not
necessary.

Don't forget to fix nssRWLock_AtomicCreate.

We may not be able to cvs remove nsslocks.h because it is an
exported header file.  We can move it to lib/nss and have it
declare only the nss_InitLock function.

In lib/nss/utilwrap.c, you can use a functional implementation
of __nss_InitLock temporarily, to relieve the pressure of removing
the use of nss_InitLock in lib/ssl.
Attachment #288967 - Flags: review?(wtc)
Target Milestone: 3.11.9 → 3.11.10
Wan-Teh,

Re: comment 14, yes the patch compiled, without warnings on Solaris studio, and with some about the lack of return statements on Linux with gcc. But there wasn't any warning about the extraneous NULL argument passed to initLocks in either case. Odd !

I will split the work into 2 patches in the hope to get things reviewed faster. Re: comment 16, how did you want me to fix nssRWLock_AtomicCreate ? Presumably by deleting it, the same way as nss_InitLock . This was a private function, and the only caller was in secoid_Init . This is called from NSS_Init and from C_Initialize which are presumed not to be thread-safe. So, I think the lock creation doesn't have to be atomic for that case.
A separate patch will follow for libssl to deal with Nelson's comments .
Attachment #288967 - Attachment is obsolete: true
Attachment #303201 - Flags: review?(wtc)
Attached patch Updated patch for libssl (obsolete) — Splinter Review
Nelson,

Re: comment 15 :

1) you are right, I forgot to make one important change . lock_cache was supposed to stop calling ssl_InitClientSessionCacheLock() and call ssl_InitLocks(PR_TRUE), which would make it go through PR_CallOnce . But I forgot to make that change before I submitted the patch for review.

2) this is no longer the case, since lock_cache is in sslnonce.c and it calls ssl_InitLocks() in this patch

3)
a) I have tried to conform to this, but there were already many instances were this style was not employed, in particular in the code I was replacing.
b) there was one instance of this which I fixed

4) this is relevant to the other patch (not libssl), attachment 303201 [details] [diff] [review] . I forgot to add the PORT_SetError in there. I can add it before checkin after it's reviewed.

5) I fixed this
Attachment #303205 - Flags: review?(nelson)
Attachment 303205 [details] [diff] (libssl patch) is for both the trunk and the NSS_3_11_BRANCH . I have checked that it applied to both.
Attachment 303201 [details] [diff] (everything else) is for the trunk only.
Priority: P3 → P2
Comment on attachment 303201 [details] [diff] [review]
Rid all of NSS except libssl of evil, broken late initialization of locks

r=wtc.  But this patch needs work before it can be checked in.

1. In nss/utilwrap.c

> SECStatus __nss_InitLock(   PZLock    **ppLock, nssILockType ltype )
> {
>-    return __nss_InitLock_Util(ppLock, ltype);
>+    return SECFailure;
> }

Please add a comment saying that this function is obsolete.
You should call PORT_SetError(PR_NOT_IMPLEMENTED_ERROR).
Since none of the current callers of nss_InitLock check its
return value, there is a small risk to just return SECFailure.
I suggest that you call abort().

2. Removal of nsslocks.h

Since nsslocks.h is an exported header file, I'm not sure if
it's okay to remove it.  At least, you also need to remove it
from the Solaris package file
mozilla/security/nss/pkg/solaris/SUNWtlsd/prototype.

3. I agree that we should remove nssRWLock_AtomicCreate rather
than trying to make it work, assuming we can fix #4 below correctly.

4. In util/secoid.c, the changes in this file need to be reviewed
carefully, because secoid_InitDynOidData is also called by SECOID_AddEntry.
You may want to request a second review for the nssRWLock_AtomicCreate
changes in this patch.

In secoid_InitDynOidData(), if you're sure that NSS can only be used
by a single thread at that time, then the NSSRWLock_LockWrite and
NSSRWLock_UnlockWrite are also unnecessary and should be removed.
Attachment #303201 - Flags: review?(wtc) → review+
Wan-Teh,

Thanks for your review in comment 21 .

1. I will add the PORT_SetError(PR_NOT_IMPLEMENTED_ERROR) . This was also requested by Nelson . I don't think calling abort is necessary. Any following code that tries to PR_Lock the NULL lock will have the same effect.

2. Good point. I will talk to Christophe about that. If this is not possible, I will change it to an empty file.

3. Good.

4. I think the best solution is to remove the call to secoid_InitDynOidData from SECMOD_AddEntry . This function shouldn't be called before secoid_init which is called by NSS_Initialize and C_Initialize .
Changes from the previous patch :
1) rename secoid_Init to SECOID_Init, since it is exported for libnssutil3
2) change the callers of secoid_Init to call SECOID_Init, and make sure they check the result
3) remove call to secoid_InitDynOidData from SECMOD_AddEntry 
4) add PORT_SetError(PR_NOT_IMPLEMENTED_ERROR) to __nss_InitLock in utilwrap
5) don't delete nsslocks.h . Just change its content.
Attachment #303201 - Attachment is obsolete: true
Attachment #303383 - Flags: review?(wtc)
Comment on attachment 303383 [details] [diff] [review]
Update for trunk - remove late initialization of locks accross NSS except for libssl

r=wtc.  Good work.

1. In lib/nss/utilwrap.c, include "prerror.h" instead of "prerr.h".
("prerror.h" is the documented header.  "prerr.h" is a generated
file included by "prerror.h".)

2. In lib/util/manifest.mn, add nsslocks.h back to the EXPORTS list
because you're not removing it from CVS.

If Christophe thinks it's OK to remove nsslocks.h from the NSS package,
I think we should cvs remove nsslocks.h.

3. In lib/util/secoid.c, secoid_InitDynOidData, you should remove
the NSSRWLock_LockWrite and NSSRWLock_UnlockWrite calls and the
if (!dynOidPool) test.  The block comment before this function should
be updated.

In SECOID_AddEntry, the comment "Caller has set error code" didn't
make sense before, and is more wrong now.  Please replace it with a
PORT_SetError(SEC_ERROR_NOT_INITIALIZED) call.

In SECOID_Init, the beginning should now look like this:

    if (oidhash) {
        return SECSuccess; /* already initialized */
    }

    if (secoid_InitDynOidData() != SECSuccess) {
        PORT_SetError(SEC_ERROR_LIBRARY_FAILURE);
        PORT_Assert(0); /*This function should never fail. */
        return SECFailure;
    }
Attachment #303383 - Flags: review?(wtc) → review+
Thanks, Wan-Teh. This patch should have all the changes you requested.
I can't check this in to the trunk until Nelson reviews the patch for libssl.
Attachment #303383 - Attachment is obsolete: true
Attachment #303628 - Flags: review?(wtc)
Also, I should note that Christophe stated the Solaris patch mechanism couldn't remove files (sigh), so it is better to update the nsslocks.h header to be empty than to remove it from updated packages.
Comment on attachment 303628 [details] [diff] [review]
Update with Wan-Teh's comments . For trunk only, and not for libssl

r=wtc.

In lib/util/secoid.c, we should delete the second sentence in the block comment
before secoid_InitDynOidData:

"This function MIGHT create the lock ... whether or not to call this function."

because it doesn't make sense now.
Attachment #303628 - Flags: review?(wtc) → review+
Thanks, Wan-Teh. I thought about removing that comment, but that case is still possible, if we run out of memory after creating the lock but before creating the pool. Not very likely, but just about the same as before. But I made sure all the callers of SECOID_Init check for failure so this should not be an actual problem.
Attachment #303205 - Attachment is obsolete: true
Attachment #303657 - Flags: review?(nelson)
Attachment #303205 - Flags: review?(nelson)
Blocks: 397486
Attachment #303657 - Flags: review?(nelson) → review+
Thanks, Nelson.

I checked in both patches to the trunk .

Checking in cmd/bltest/blapitest.c;
/cvsroot/mozilla/security/nss/cmd/bltest/blapitest.c,v  <--  blapitest.c
new revision: 1.52; previous revision: 1.51
done
Checking in lib/certdb/certdb.c;
/cvsroot/mozilla/security/nss/lib/certdb/certdb.c,v  <--  certdb.c
new revision: 1.90; previous revision: 1.89
done
Checking in lib/certdb/stanpcertdb.c;
/cvsroot/mozilla/security/nss/lib/certdb/stanpcertdb.c,v  <--  stanpcertdb.c
new revision: 1.78; previous revision: 1.77
done
Checking in lib/nss/nssinit.c;
/cvsroot/mozilla/security/nss/lib/nss/nssinit.c,v  <--  nssinit.c
new revision: 1.90; previous revision: 1.89
done
Checking in lib/nss/utilwrap.c;
/cvsroot/mozilla/security/nss/lib/nss/utilwrap.c,v  <--  utilwrap.c
new revision: 1.4; previous revision: 1.3
done
Checking in lib/softoken/pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.146; previous revision: 1.145
done
Checking in lib/softoken/legacydb/lginit.c;
/cvsroot/mozilla/security/nss/lib/softoken/legacydb/lginit.c,v  <--  lginit.c
new revision: 1.12; previous revision: 1.11
done
Checking in lib/ssl/ssl3con.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v  <--  ssl3con.c
new revision: 1.106; previous revision: 1.105
done
Checking in lib/ssl/ssl3ecc.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3ecc.c,v  <--  ssl3ecc.c
new revision: 1.20; previous revision: 1.19
done
Checking in lib/ssl/sslimpl.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v  <--  sslimpl.h
new revision: 1.63; previous revision: 1.62
done
Checking in lib/ssl/sslnonce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v  <--  sslnonce.c
new revision: 1.21; previous revision: 1.20
done
Checking in lib/ssl/sslsnce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsnce.c,v  <--  sslsnce.c
new revision: 1.42; previous revision: 1.41
done
Checking in lib/util/manifest.mn;
/cvsroot/mozilla/security/nss/lib/util/manifest.mn,v  <--  manifest.mn
new revision: 1.17; previous revision: 1.16
done
Removing lib/util/nsslocks.c;
/cvsroot/mozilla/security/nss/lib/util/nsslocks.c,v  <--  nsslocks.c
new revision: delete; previous revision: 1.6
done
Checking in lib/util/nsslocks.h;
/cvsroot/mozilla/security/nss/lib/util/nsslocks.h,v  <--  nsslocks.h
new revision: 1.5; previous revision: 1.4
done
Checking in lib/util/nssrwlk.c;
/cvsroot/mozilla/security/nss/lib/util/nssrwlk.c,v  <--  nssrwlk.c
new revision: 1.8; previous revision: 1.7
done
Checking in lib/util/nssutil.def;
/cvsroot/mozilla/security/nss/lib/util/nssutil.def,v  <--  nssutil.def
new revision: 1.6; previous revision: 1.5
done
Checking in lib/util/secoid.c;
/cvsroot/mozilla/security/nss/lib/util/secoid.c,v  <--  secoid.c
new revision: 1.41; previous revision: 1.40
done
Checking in lib/util/secoid.h;
/cvsroot/mozilla/security/nss/lib/util/secoid.h,v  <--  secoid.h
new revision: 1.9; previous revision: 1.8
done
Checking in lib/util/secport.c;
/cvsroot/mozilla/security/nss/lib/util/secport.c,v  <--  secport.c
new revision: 1.21; previous revision: 1.20
done
Checking in lib/util/utilrename.h;
/cvsroot/mozilla/security/nss/lib/util/utilrename.h,v  <--  utilrename.h
new revision: 1.4; previous revision: 1.3
done
Comment on attachment 303657 [details] [diff] [review]
Update for libssl per IM

Wan-Teh, please review this for the branch. Thanks.
Attachment #303657 - Flags: superreview?(wtc)
Comment on attachment 303657 [details] [diff] [review]
Update for libssl per IM

Julien, could you ask someone else to review this
patch?

For the NSS 3.11 branch, why don't you just make
nss_InitLock good enough?  All you need is the
change I suggested in comment 9.
Comment on attachment 303657 [details] [diff] [review]
Update for libssl per IM

r=wtc.

1. In ssl3con.c, ssl_InitSymWrapKeysLock(void), we have:

>+    if (!symWrapKeysLock) {
>+        symWrapKeysLock = PZ_NewLock(nssILockOther);
>+    }
>+    return symWrapKeysLock ? SECSuccess:SECFailure;

The "if (!symWrapKeysLock)" test should be removed.  This
test has been replaced by the "if (LocksInitializedEarly)"
test and the PR_CallOnce call in ssl_InitLocks.

Nit: add spaces before and after the colon (:).

2. In sslnonce.c, ssl_InitClientSessionCacheLock, we have:

>+    if (!cacheLock) {
>+        cacheLock = PZ_NewLock(nssILockCache);
>+    }
>+    return cacheLock ? SECSuccess:SECFailure;

The same two comments as above.

3. In sslnonce.c, ssl_InitLocks:

>+/* lateInit means that the call is not happening during a 1-time
>+ * initialization function
>+ */
>+SECStatus
>+ssl_InitLocks(PRBool lateInit)

The comment seems wrong -- lateInit means that the call IS
happening during PR_CallOnce, which I believe is what you meant
by "a 1-time initialization function", right?

It would be less confusing to reverse the sense of the parameter
and rename it earlyInit.

Since none of the callers of ssl_InitLocks check its return
value, ssl_InitLocks should just return void.
Attachment #303657 - Flags: superreview?(wtc) → superreview+
(In reply to comment #28)

The comment "This function MIGHT create the lock, but not the pool, so
code should test for dynOidPool, not dynOidLock, when deciding whether
or not to call this function." describes how secoid_InitDynOidData should
be called in the original code:

secoid_Init:
1805     if (!dynOidPool && secoid_InitDynOidData() != SECSuccess) {
1806         return SECFailure;
1807     }

SECOID_AddEntry:
1717     if (!dynOidPool && secoid_InitDynOidData() != SECSuccess) {
1718         /* Caller has set error code. */
1719         return ret;
1720     }

Now we only call secoid_InitDynOidData() in SECOID_Init and we do NOT
test for dynOidPool to decide whether or not to call this function.
So the comment should be removed or updated to reflect the new code.

We should also delete the following comment, which describes the
nssRWLock_AtomicCreate call that was deleted:

1603     /* This function will create the lock if it doesn't exist,
1604     ** and will return the address of the lock, whether it was 
1605     ** previously created, or was created by the function.
1606     */
(In reply to comment #33)
> >+/* lateInit means that the call is not happening during a 1-time
> >+ * initialization function

> The comment seems wrong -- lateInit means that the call IS
> happening during PR_CallOnce, which I believe is what you meant
> by "a 1-time initialization function", right?

Wan-Teh, Having discussed this with Julien at length recently, I can tell
you that by "a 1-time initialization function", he did not mean a function
called through PR_CallOnce, but rather meant a function that is called
only once during process initialization code on at most a single thread, 
and is not called within code paths that may be repeated during program 
execution, such as in socket processing that may occur on multiple threads.  
Functions such as libSSL's server session cache initialization functions are
presumed to be of the type to which Julien referred with that phrase.
Perhaps there is a better phrase that can be used to convey that meaning.  

Having written that attempted clarification, I must express doubt about the
view that early initialization is somehow always exempt from the issues that
motivate this work.  Julien believes that any initialization done in a 
primordial thread, before any other threads are created, is exempt from any 
worries about storage order, because (he believes) the act of creating a new 
thread will perform a memory barrier call (or calls) sufficient to prevent 
the new threads from seeing memory in a state inconsistent with that seen by 
the primordial thread when it performed initialization.  I'm not entirely 
certain that's true, but I think that's a reasonable expectation for how it 
SHOULD work.  

My concern is because nothing in the definition of libSSL's cache initialization routines requires them to be executed only on a primordial 
thread before any and all other threads have been created.  It is true that 
NSS's own server test program does that, but it is not (stated as) required.
A single-process server might (for example) create numerous threads, and 
then initialize the SSL server session cache on one of them before calling 
PR_Accept or PR_Listen for the first time.  In such a case, even though there is no chance of the cache initialization function being called multiple times,
nothing seems to guarantee that the multiple threads will, thereafter, see a 
consistent view of the memory initialized during the cache initialization call.
Julien, perhaps you can say something like "lateInit means that the call
is not happening during a 1-time initialization function but rather during
dynamic, lazy initialization".
Wan-Teh, I checked in this supplemental patch to the trunk based on your feedback in comment 34 .

Checking in secoid.c;
/cvsroot/mozilla/security/nss/lib/util/secoid.c,v  <--  secoid.c
new revision: 1.42; previous revision: 1.41
Wan-Teh,

Re: comment 32, I prefer to keep the 3.11 branch and the trunk closer in sync. And this doesn't require additional coding work . Thanks for deciding to review my patch.

re: comment 33 issue 3, even though the callers of ssl_InitLocks don't check its return value currently, they may want to at some point. I dislike void functions and prefer to report the failure in case it is needed later.

I have addressed your other issues in this patch, which I checked in to the trunk :

Checking in ssl3con.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v  <--  ssl3con.c
new revision: 1.107; previous revision: 1.106
done
Checking in sslnonce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v  <--  sslnonce.c
new revision: 1.22; previous revision: 1.21
done

Nelson,

Re: comment 35,
Thanks for clarifying that for Wan-Teh. I didn't feel well enough to do any work during the long weekend.

The main premise of these changes to libssl (and existing patches made before this one, such as in bug 273934) is that server applications normally initialize the SSL server session cache fairly early on.

SSL server apps  don't necessarily initialize the session cache it in the primordial thread, or before NSS_Initialize, as you point out. But they must already be doing it before processing any incoming SSL connections, in order to actually take advantage of the SSL server session cache for those connections. The hope is that they use a locking mechanism that also provides a memory barrier, such as a PRLock, which we provide as part of our NSPR/NSS libraries, to ensure that these calls to libssl happens after the initialization of the session cache, and thus have a view of memory that includes both the lock pointers and their content. If the server application uses a different type of lock which isn't a memory barrier, then the premise of these changes may not be true for the non-TSO platforms. If the application doesn't use any type of lock, then I would content that the application is already broken.
Nelson,

The end of my comment 38 about using locks only applies if the server application has pre-started its worker threads. Another way for the server application to meet the requirement of the SSL server API to initialize the cache before processing the incoming SSL connections, is for the application not to start its worker threads until after after it initializes the SSL server session cache. I believe that's what most server applications do. In those cases, the application doesn't need to use locks . It is then be up to the OS thread creation function, such as pthread_create, to provide a memory view to the new worker threads that includes the current changes made to memory up to this point in time to the current thread, including the SSL server session cache lock and its pointer. Anything else would be an operating system bug.
I checked in the combination of attachment 303657 [details] [diff] [review] and attachment 304350 [details] [diff] [review] to the NSS_3_11_BRANCH :

Checking in ssl3con.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3con.c,v  <--  ssl3con.c
new revision: 1.76.2.23; previous revision: 1.76.2.22
done
Checking in ssl3ecc.c;
/cvsroot/mozilla/security/nss/lib/ssl/ssl3ecc.c,v  <--  ssl3ecc.c
new revision: 1.3.2.13; previous revision: 1.3.2.12
done
Checking in sslimpl.h;
/cvsroot/mozilla/security/nss/lib/ssl/sslimpl.h,v  <--  sslimpl.h
new revision: 1.42.2.14; previous revision: 1.42.2.13
done
Checking in sslnonce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslnonce.c,v  <--  sslnonce.c
new revision: 1.17.2.1; previous revision: 1.17
done
Checking in sslsnce.c;
/cvsroot/mozilla/security/nss/lib/ssl/sslsnce.c,v  <--  sslsnce.c
new revision: 1.36.2.5; previous revision: 1.36.2.4
done
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: