When NT fibers are used in an SSL server, there is a possibility of deadlock in
the current (3.4) implementation of the session cache. The code acquires a Win32
mutex which is a blocking system call. If the fiber yields after acquiring the
mutex (as a result of sleep or NSPR I/O for example), another fiber could be
scheduled on the same native thread and block indefinitely when trying to
acquire the same Win32 mutex. This would be a deadlock. It is somewhat of a
theoritical problem that hasn't been demonstrated to occur with the current
code, but it is conceivable in the future that if the cache gets modified this
might happen. For this reason it is necessary to implement a robust locking
mechanism that works with fibers.
If the fiber application is single-process, the solution is to use a PRLock for
locking, as PRLocks are safe for use with NT fibers and won't cause deadlocks.
However, PRLocks are local to a process and cannot work if the NT fiber
application is multi-process. In that case, a 2-level mechanism for locking is
required. First, a PRLock is used to prevent the aforementioned problem with NT
fibers within a process. Then, the Win32 mutex can be safely acquired to make
sure that the operation is happening safely accross processes.
The implementation of this scheme however is a little bit complicated,
especially because the code is also shared with the Win95 implementation of locking.
The patch involves modifying the Win32 sslMutex object from a lone Win32 mutex
(which, as demonstrated above, isn't adequate for fiber applications) to a
structure that can contain both a PRLock and a Win32 mutex, as well as a flag
indicating whether the mutex is cross-process (shared) or not. This information
was already passed to sslMutex_init but is needed further at runtime so it is
From then on there are 3 cases :
1) for a single-process application, on either Win95 or WINNT :
Only the PRLock in the structure gets used . The higher-level sslMutex_ calls
fall back to generic single_process_sslMutex calls.
2) for a multi-process application on Win95.
Only the Win32 mutex in the structure gets used . This is equivalent to the old
implementation and is adequate since fibers are not available on WINNT. The
Win32 mutex provides exclusion from any threads in any process and is therefore
3) for a multi-process application on WINNT.
This is the most complicated case. It may end up being of somewhat academic use,
as no multi-process servers are currently known to use fibers on NT. selfserv is
in the process of being modified to do that but iWS probably won't be.
For this case, the parent process creates sslMutex objects containing only a
Win32 mutex in shared memory, at the time the multi-process session cache is
When child processes of the application execute, the session cache inheritance
function makes a local copy of the sidCacheLock structures, and then updates
them to add a PRLock into the sslMutex object. This has to be done in the
high-level code in sslsnce.c as the lower-level code only manipulates sslMutex
objects which are initially created in shared memory. Fortunately this code is
very well isolated within an #ifdef WINNT .
Later on, when locking is needed, the PRLock is first acquired for fiber &
thread safety, then the Win32 mutex for process-safety.
Created attachment 51290 [details] [diff] [review]
I have attached a proposed patch. Please review and comment. I may not exactly
have preserved all the existing code style yet but don't expect this to be quite
the final patch for this yet because of its size.
In addition to the above-mentioned changes, I have also changed the Unix
implementations to always use a PRLock if the application is single-process. If
the application is multi-process, the code falls back to the older Unix
implementations of sslMutex. This should cause performance to improve on
single-process SSL servers for Unix.
It appears there is still a problem right now on Unix (Solaris). Please hold off
the review. Thanks.
Created attachment 51366 [details] [diff] [review]
new proposed patch
The problem with Solaris was related to values I wasn't returning properly to
the higher level. I fixed the formatting to be more consistent with the rest of
the code, and to look OK in vi. I also added an implementation for "other"
platforms (non-Windows and non-Unix) that will work in single-process mode.
I just attached a new patch which solves all the known issues. I have tested it
with selfserv in single and multi-process on NT with fibers, NT with native
threads, Solaris, and AIX .
Marking NEW so it can get a review/sr= for the patch.
Keyser Sosez: This is an NSS change for servers, not going into mozilla,
not subject to the usual browser review rules.
Created attachment 52159 [details] [diff] [review]
one more patch for review, hopefully good enough for check-in
I reviewed your latest patch (attachment 52159 [details] [diff] [review]). In general
your patch is good. Most of my suggested changes are just
1. In single_process_sslMutex_Destroy(), we should return
SECFailure after the PORT_SetError() call.
2. In SECStatus sslMutex_ChildInit(sslMutex *sem), it is
better to rename the 'sem' parameter as 'pMutex'. Also,
not only the child processes but the parent process
should also call this function, right? I would suggest
that you not add the new function sslMutex_ChildInit().
We can just call single_process_sslMutex_Init().
3. In the Windows version of sslMutex_Destroy, the data type
of 'retvalue' should be SECStatus.
4. In sslmutex.h, it is better to say
<foo.h> should only be used for standard headers. NSPR is
not there yet ;-)
5. In sslmutex.h, the Unix version of sslMutex, it would be
better to rename 'sslMutx' as something like 'sem' or 'sslSem'
to suggest the fact that it is a sem_t.
6. In sslsnce.c, it is better to use PORT_Alloc() in place of malloc().
We also need to make sure that the new WINNT sidCacheLocks are freed
when we destroy the cache.
2. I have renamed the childinit function to sslMutex_2LevelInit, and created a
matching sslMutex_2LevelDestroy . This needed to be done in the parent as well
which was not the case, so I have fixed that now.
6. I have a comment in the code about the need to free the memory, when a
session cache free function is created
Created attachment 52276 [details] [diff] [review]
last patch hopefully
Comment on attachment 52276 [details] [diff] [review]
last patch hopefully
This patch is good. I only suggested one minor
change (make sslMutex_2LevelDestroy static).
Checked in the fix.