Last Comment Bug 102251 - SSL session cache locking issue with NT fibers
: SSL session cache locking issue with NT fibers
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: 3.4
: x86 Windows NT
: -- normal (vote)
: ---
Assigned To: Julien Pierre
: Sonja Mirtitsch
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2001-09-28 14:30 PDT by Julien Pierre
Modified: 2001-10-05 17:38 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
proposed patch (17.75 KB, patch)
2001-09-28 14:31 PDT, Julien Pierre
no flags Details | Diff | Splinter Review
new proposed patch (26.70 KB, patch)
2001-09-28 18:20 PDT, Julien Pierre
no flags Details | Diff | Splinter Review
one more patch for review, hopefully good enough for check-in (19.62 KB, patch)
2001-10-04 17:51 PDT, Julien Pierre
wtc: review+
Details | Diff | Splinter Review
last patch hopefully (20.18 KB, patch)
2001-10-05 15:17 PDT, Julien Pierre
wtc: review+
Details | Diff | Splinter Review

Description Julien Pierre 2001-09-28 14:30:51 PDT
When NT fibers are used in an SSL server, there is a possibility of deadlock in
the current (3.4) implementation of the session cache. The code acquires a Win32
mutex which is a blocking system call. If the fiber yields after acquiring the
mutex (as a result of sleep or NSPR I/O for example), another fiber could be
scheduled on the same native thread and block indefinitely when trying to
acquire the same Win32 mutex. This would be a deadlock. It is somewhat of a
theoritical problem that hasn't been demonstrated to occur with the current
code, but it is conceivable in the future that if the cache gets modified this
might happen. For this reason it is   necessary to implement a robust locking
mechanism that works with fibers.

If the fiber application is single-process, the solution is to use a PRLock for
locking, as PRLocks are safe for use with NT fibers and won't cause deadlocks.
However, PRLocks are local to a process and cannot work if the NT fiber
application is multi-process. In that case, a 2-level mechanism for locking is
required. First, a PRLock is used to prevent the aforementioned problem with NT
fibers within a process. Then, the Win32 mutex can be safely acquired to make
sure that the operation is happening safely accross processes.

The implementation of this scheme however is a little bit complicated,
especially because the code is also shared with the Win95 implementation of locking.
The patch involves modifying the Win32 sslMutex object from a lone Win32 mutex
(which, as demonstrated above, isn't adequate for fiber applications) to a
structure that can contain both a PRLock and a Win32 mutex, as well as a flag
indicating whether the mutex is cross-process (shared) or not. This information
was already passed to sslMutex_init but is needed further at runtime so it is
now stored.

From then on there are 3 cases :
1) for a single-process application, on either Win95 or WINNT :
Only the PRLock in the structure gets used . The higher-level sslMutex_ calls
fall back to generic single_process_sslMutex calls.

2) for a multi-process application on Win95.
Only the Win32 mutex in the structure gets used . This is equivalent to the old
implementation and is adequate since fibers are not available on WINNT. The
Win32  mutex provides exclusion from any threads in any process and is therefore
adequate.

3) for a multi-process application on WINNT.

This is the most complicated case. It may end up being of somewhat academic use,
as no multi-process servers are currently known to use fibers on NT. selfserv is
in the process of being modified to do that but iWS probably won't be.

For this case, the parent process creates sslMutex objects containing only a
Win32 mutex in shared memory, at the time the multi-process session cache is
initialized.

When child processes of the application execute, the session cache inheritance
function makes a local copy of the sidCacheLock structures, and then updates
them to add a PRLock into the sslMutex object. This has to be done in the
high-level code in sslsnce.c as the lower-level code only manipulates sslMutex
objects which are initially created in shared memory. Fortunately this code is
very well isolated within an #ifdef WINNT .
Later on, when locking is needed, the PRLock is first acquired for fiber &
thread safety, then the Win32 mutex for process-safety.
Comment 1 Julien Pierre 2001-09-28 14:31:56 PDT
Created attachment 51290 [details] [diff] [review]
proposed patch
Comment 2 Julien Pierre 2001-09-28 14:34:13 PDT
I have attached a proposed patch. Please review and comment. I may not exactly
have preserved all the existing code style yet but don't expect this to be quite
the final patch for this yet because of its size.

In addition to the above-mentioned changes, I have also changed the Unix
implementations to always use a PRLock if the application is single-process. If
the application is multi-process, the code falls back to the older Unix
implementations of sslMutex. This should cause performance to improve on
single-process SSL servers for Unix.
Comment 3 Julien Pierre 2001-09-28 15:09:18 PDT
It appears there is still a problem right now on Unix (Solaris). Please hold off
the review. Thanks.
Comment 4 Julien Pierre 2001-09-28 18:20:55 PDT
Created attachment 51366 [details] [diff] [review]
new proposed patch
Comment 5 Julien Pierre 2001-09-28 18:23:42 PDT
The problem with Solaris was related to values I wasn't returning properly to
the higher level. I fixed the formatting to be more consistent with the rest of
the code, and to look OK in vi. I also added an implementation for "other"
platforms (non-Windows and non-Unix) that will work in single-process mode.

I just attached a new patch which solves all the known issues. I have tested it
with selfserv in single and multi-process on NT with fibers, NT with native
threads, Solaris, and AIX .

Comment 6 Keyser Sose 2001-10-01 20:37:01 PDT
Marking NEW so it can get a review/sr= for the patch.
Comment 7 Nelson Bolyard (seldom reads bugmail) 2001-10-01 21:26:21 PDT
Keyser Sosez: This is an NSS change for servers, not going into mozilla,
not subject to the usual browser review rules.
Comment 8 Julien Pierre 2001-10-04 17:51:30 PDT
Created attachment 52159 [details] [diff] [review]
one more patch for review, hopefully good enough for check-in
Comment 9 Wan-Teh Chang 2001-10-05 13:37:46 PDT
Julien,

I reviewed your latest patch (attachment 52159 [details] [diff] [review]).  In general
your patch is good.  Most of my suggested changes are just
stylistic changes.

1. In single_process_sslMutex_Destroy(), we should return
   SECFailure after the PORT_SetError() call.

2. In SECStatus sslMutex_ChildInit(sslMutex *sem), it is
   better to rename the 'sem' parameter as 'pMutex'.  Also,
   not only the child processes but the parent process
   should also call this function, right?  I would suggest
   that you not add the new function sslMutex_ChildInit().
   We can just call single_process_sslMutex_Init().

3. In the Windows version of sslMutex_Destroy, the data type
   of 'retvalue' should be SECStatus.

4. In sslmutex.h, it is better to say
   #include "prtypes.h"
   #include "prlock.h"
   
   <foo.h> should only be used for standard headers.  NSPR is
   not there yet ;-)

5. In sslmutex.h, the Unix version of sslMutex,  it would be
   better to rename 'sslMutx' as something like 'sem' or 'sslSem'
   to suggest the fact that it is a sem_t.

6. In sslsnce.c, it is better to use PORT_Alloc() in place of malloc().
   We also need to make sure that the new WINNT sidCacheLocks are freed
   when we destroy the cache.


Comment 10 Julien Pierre 2001-10-05 15:16:51 PDT
1. fixed
2. I have renamed the childinit function to sslMutex_2LevelInit, and created a
matching sslMutex_2LevelDestroy . This needed to be done in the parent as well
which was not the case, so I have fixed that now.
3. fixed
4. fixed
5. fixed
6. I have a comment in the code about the need to free the memory, when a
session cache free function is created
Comment 11 Julien Pierre 2001-10-05 15:17:15 PDT
Created attachment 52276 [details] [diff] [review]
last patch hopefully
Comment 12 Wan-Teh Chang 2001-10-05 17:10:33 PDT
Comment on attachment 52276 [details] [diff] [review]
last patch hopefully

This patch is good.  I only suggested one minor
change (make sslMutex_2LevelDestroy static).
Comment 13 Julien Pierre 2001-10-05 17:38:37 PDT
Checked in the fix.

Note You need to log in before you can comment on or make changes to this bug.