Closed Bug 331096 Opened 18 years ago Closed 16 years ago

NSS Softoken must detect forks on all unix-ish platforms

Categories

(NSS :: Libraries, enhancement, P1)

3.11
enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nelson, Assigned: julien.pierre)

Details

Attachments

(1 file, 5 obsolete files)

There are numerous products that work with NSS/Softoken and do NOT
work with most other PKCS#11 modules, because they initialize NSS
and then fork.  Apparenlty, NSS's Softoken is forgiving about this.

It shouldn't be.  The applications that don't work with other PKCS#11 
modules are now blaming NSS for their troubles, suggesting that once their 
programs have been tested and passed with NSS, they should work with any 
properly implemented PKCS#11 module.  They aregue that if it is necessary
to test/qualify their apps with every supported PKCS#11 module, that 
expense is so large as to cause them to reconsider use of PKCS#11.
This is political bad news for NSS.  This needs to be P1 or P2.

On unix platforms, NSC_Initialize should call getpid and stash the PID 
away in memory.  Then other PKCS11 operations in softoken should call 
getpid again, and ensure that the pid is the same as the one found by 
NSC_Initialize.  These operations should return a PKCS#11 error 
appropriate to this problem.  

We don't want to make a getpid system call in every PKCS#11 function call, 
because in small fast frequently used PKCS#11 functions, it would slow 
down the application unacceptably.  We want to do this in functions that 
are called infrequently, or that are slow enough that the cost of one
extra system call will go unnoticed, such as for RSA private key ops.

The functions that do the PID check can then set a private flag in 
softoken, that causes ALL further calls to softoken (except possibly
NSC_Finalize) to also report the fork problem.  That means that NSC
functions that don't want to call getpid every time can still check the 
library's have_forked flag every time and take appropriate action.

There are some versions of Unix where threads have their own PIDs,
even when they're still logically part of the same process address 
space.  I don't know if NSS supports them any more (or ever did),
but if so, then we need another solution for those platforms.  
Maybe we merely treat them as non-Unix platforms and don't do the 
PID check for them.  

Finally, perhaps there should be a way to enable/disable this check
at run time, so that broken applications can continue to function,
once they discover this problem, until they are fixed.
Please try to do this for 3.11.1
Priority: -- → P2
QA Contact: jason.m.reid → libraries
Priority: P2 → P1
Target Milestone: 3.11.1 → 3.11.2
Nelson,

We can use pthread_at_fork to add a handler when softoken is initialized.
This fork handler would set a flag in softoken private memory in the child process that would make all further PKCS#11 calls fail (except C_Finalize and C_Initialize). There should be no need to ever call getpid() .

The reason that softoken is apparently "forgiving" of the fork behavior problem is that servers typically read their certs and keys at initialization time, and that's the only time the databases are ever accessed in typical cases (non client-auth). But once you setup client auth, the fork problem will show even with softoken, because there is only one file handle for each of the cert and key databases accross all processes. The application which has the fork problem supports client auth, but does not QA their product with client auth anymore than they do PKCS#11 accelerators, so this bug of theirs slipped through.

IMO, enforcing this no-fork behavior in softoken is the right thing to do. The Solaris Crypto Framework already does that. I think this should be the default behavior, and we could have a way to disable the check, such as an environment variable; but I'm not convinced that we need one.

I don't think we can get away with only providing the softoken fork handler.  We will further break applications that are already partially broken when we do that. At least one server product is currently forking after NSS init, and hasn't produced a fixed for this bug yet. We can't break them 100% by enforcing no-fork in softoken. Also, the LDAP facility in Solaris 10 calls into NSS. Any Solaris 10 system with LDAP configured may invoke NSS when calling gethostbyname(). It's impossible to enforce this no-fork behavior in all programs that call gethostbyname(). We can't force all Solaris developers not to fork if they call gethostbyname() .

I suggest that we try to solve this problem by using pthread_at_fork for a second fork handler, an NSS one, rather than a softoken one. The NSS fork handler could shutdown all PKCS#11 modules in the child process that NSS had already loaded in the parent, including softoken, and then reinitialize them all again in the child process. Of course, there are other considerations with softoken such as safety for read/write. But this should work at least for the read-only case. This fix still won't be a panacea, since any PKCS#11 object handles obtained in the parent process may not be valid in the child after the modules are reloaded, depending on how the modules are implemented; but at least NSS and the PKCS#11 modules will not be in a state where they don't work at all. I don't think there is anything we can do about those handles, unfortunately; the application programmers really have to take care not to persist them and expect them to be valid after the fork ...

I think it's important to have the 2 fork handlers separate, since some programs use softoken directly from the Java PKCS#11 provider for example; and other programs use softoken through NSS. Other programs also use softoken through both NSS and the Java PKCS#11 provider.
Target Milestone: 3.11.2 → 3.11.3
Target Milestone: 3.11.3 → 3.12
Assignee: neil.williams → julien.pierre.boogz
Attached patch also link libpthread on AIX (obsolete) — Splinter Review
Attachment #300807 - Flags: review?(nelson)
Comment on attachment 300803 [details] [diff] [review]
add pthread_atfork handler, and make all PKCS#11 calls in the child process return CKR_DEVICE_ERROR

>+    CheckFork();

After macro expansion, there will be two semicolons at the end of line.
Wan-Teh,

That's true. It's also the case for other macros that already (fips check). That doesn't seem to be causing any problem or warning, not even with the AIX compiler.
Comment on attachment 300807 [details] [diff] [review]
also link libpthread on AIX

Why is this patch needed only on AIX and not on all posix/pthreads
platforms?  How does libpthreads get linked in on other posix/pthreads
platforms?  Seems to me that the linking of libpthreads should be done
in a similar way on all posix/pthreads platforms.
Comment on attachment 300803 [details] [diff] [review]
add pthread_atfork handler, and make all PKCS#11 calls in the child process return CKR_DEVICE_ERROR

I have a lot of issues with this patch.

1. effectiveness: Most of these new fork checks are added in 
functions that only get called during NSS_Init*, and not again
thereafter.  Consequently, these checks are likely to never get
called in a child that was forked after NSS was initialized.

Ideally, we want the very first call to a NSC_ function that occurs
after fork to do the fork check.  So, the question is, what NSC_
functions are most likely to be called first after a fork that follows
NSS initialization?  Maybe Bob relyea can make some suggestions here.

2. The fork checking macro should not be duplicated in multiple
source files, IMO.  

3. CheckFork is not a function, and doesn't behave like a function,
since it causes a return from the function that invokes it. 
So, rename the macro from CheckFork() to CHECK_FORK or FORK_CHECK 
and omit the parentheses.  

4. Generally, if statements outside of basic blocks are potentially
dangerous in macros.  It's best to embed them in basic blocks. 
The usual ploy is:

#define IF_MACRO \
   do { if (test) statement; } while (0)
Attachment #300803 - Flags: review?(nelson) → review-
Nelson,

1. My patch adds checks to every public PKCS#11 call that the child process could make. Please remember that the softoken may be used by applications that don't even call NSS init, for example Java applications. The PKCS#11 spec allows all other functions to be called after C_Initialize. We can't predict the order of PKCS#11 calls since it is a public API. My patch will ensure that if an application forks right after C_Initialize, all the PKCS#11 calls in the child process will fail, and the application will be alerted of the problem no matter what the next call is.

2. I will fix that in a later version.

3/4. I can change the case for CheckFork to denote that it is a macro.
There is already a precedent in fipstokn.c of macros that have parentheses if and return statements. See SFTK_FIPSCHECK and SFTK_FIPSFATALCHECK in fipstokn.c . I wrote CheckFork based on the same model.
Nelson,

None of the other platforms that I tested, that is to say the ones we build on at Sun - AIX, HP-UX, Linux, Solaris - produced link errors. Only AIX had a link error. That's why the patch is only needed there. I suspect other platforms like BSD may need additions, but I can't write the patch for the platforms I don't have access to, as the library names may be different.
In reply to comment 9: Perhaps I missed some of the NSC_ functions I was
hoping to find.  If the patch had been generated with -p, it would have
been much easier to see all the relevant function names.

Yes, there is precedent for macros that look like function calls but don't
behave like functions, but it's not a good precedent.

In reply to comment 10: Then evidently the other platforms are linking in 
libpthreads due to some linker option being supplied somewhere else, in 
some other makefile.  I'm saying it would be best if AIX put that linker
option in the same place relative to where the other platforms do it.
E.g. if Solaris does it in (say) some file in coreconf, then AIX should also.
Where does Solaris do it?
Comment on attachment 300803 [details] [diff] [review]
add pthread_atfork handler, and make all PKCS#11 calls in the child process return CKR_DEVICE_ERROR

Is there any way to unregister a fork handler?
What happens if (say)
- libsoftoken is initialized and we register a fork handler
- libsoftoken is shutdown and unloaded
- the process forks
?
Nelson,

No, there is no way for the application to manually unregister a fork handler.
I was wondering the very same questions yesterday when I wrote the patch. So I tested that scenario in pk11mode, which dynamically loads and unloads softoken. The fork handler did not get called again when the process forked after unloading libsofokten. I think dlclose takes care of automatically unregistering the fork handler.
Nelson,

On Solaris, the pthread_atfork reference is satisfied by linking with libc. There is no need to explicitly link to libpthread.
Note also that sqlite uses some pthreads functions, and its config.mk has this for AIX only as well. I don't think this should go in coreconf since the other shared libs (libnss3, libssl3, libnssckbi, libsmime3) don't need to link with libpthread.
Comment on attachment 300807 [details] [diff] [review]
also link libpthread on AIX

OK, I'm convinced for AIX. :)
Attachment #300807 - Flags: review?(nelson) → review+
Attached patch Update with Nelson's feedback (obsolete) — Splinter Review
- renamed CheckFork to CHECK_FORK
- used do { } while (0) in CHECK_FORK
- moved CHECK_FORK to softoken.h
- generated diff with -p to show that only FC_ and NSC_ functions invoke CHECK_FORK

I want to retain the parentheses for the macro. I think it would just make the code look odd in the macro invocations without them. The uppercase is sufficient to denote that it's a macro.
Attachment #300803 - Attachment is obsolete: true
Attachment #300947 - Flags: review?(nelson)
Attached patch Separate patch for softoken.h (obsolete) — Splinter Review
This allows you to diff the previous 2 patches
Attachment #300949 - Flags: review?(nelson)
Comment on attachment 300947 [details] [diff] [review]
Update with Nelson's feedback

Can the FIPS token just rely on the checks done in the NSC_ functions?
Comment on attachment 300949 [details] [diff] [review]
Separate patch for softoken.h

As a reviewer, when I see a line of code that looks like this:
    FunctionName();
I tend to assume it is a function call because of the parentheses.  
Even if I don't know exactly what that function does, I know that, 
since it is a function call, it cannot effect a goto or return in 
the code I am reading.  Since we don't allow setjmp/longjmp in NSS,
the flow of execution from a function call can only be to proceed 
to the next statement in line.  

Having the function name be in all CAPS with underscores clues me
that this is a macro and not a function, but the empty parentheses
still tell me that this macro invocation will behave like a function
call, and will not affect flow of execution in the caller.  AFAIAC,
that is the only purpose of trailing empty parentheses in a macro 
definition.  I don't mind having macros with empty parentheses when 
those macros behave like functions, and do not affect flow of 
execution in the caller.  

I don't think that macro invocations without parentheses look odd.
   FORK_CHECK;
doesn't strike me as odd or wrong at all.  BTW, neither does 
   FORK_CHECK  
(note the absence of a trailing semicolon) which avoids the problem 
of a meaningless semicolon when the macro expansion is empty.  

Nonetheless, Since this file already has a precedent for a macro 
that affects flow of execution while looking like a function, 
I'll let this one pass.  r+
Attachment #300949 - Flags: review?(nelson) → review+
Comment on attachment 300947 [details] [diff] [review]
Update with Nelson's feedback

Julien, 
Thanks for doing the diff -p. 

You wrote:
> My patch adds checks to every public PKCS#11 call that the child process
> could make.

Perhaps it does for the functions in files fipstokn.c and pkcs11.c, but 
it does not do so for any of the NSC functions in pkcs11c.c.  
I went looking for functions with names such as
NSC_Decrypt
NSC_DecryptInit
NSC_Encrypt
NSC_EncryptInit
(etc.) 
and did not find them in the patch.  All those functions are in pkcs11c.c
which is not touched by this patch.
Attachment #300947 - Flags: review?(nelson) → review-
Attached patch Separate patch for pkcs11c.c (obsolete) — Splinter Review
Good catch Nelson, I missed this source file. Since no changes were necessary for attachment 300947 [details] [diff] [review], please give that patch an r+ after you review this one.

Wan-Teh, potentially, the FIPS functions could rely on the checks from the NSC_ functions, but not all of them call the lower functions, or call them immediately. Anything that calls code before the checks and goes through a lock may cause a deadlock after the fork. The check is fairly inexpensive, and I wouldn't be too worried about doing 2 checks for some functions in FIPS mode since we don't care about the softoken performance in FIPS mode. Better safe than sorry in this case, IMO.
Attachment #300954 - Flags: review?(nelson)
Comment on attachment 300954 [details] [diff] [review]
Separate patch for pkcs11c.c

r+ for this patch.
But I still have an issue with the patch to pkcs11.c. 
See next comment.
Attachment #300954 - Flags: review?(nelson) → review+
Comment on attachment 300947 [details] [diff] [review]
Update with Nelson's feedback

This patch prevents NSC_Finalize from succeeding.  Don't we want the child to at least be able to close the open FDs?
Comment on attachment 300947 [details] [diff] [review]
Update with Nelson's feedback

I should have written:
prevents both NSC_Finalize and FC_Finalize from working.
I suspect there are a number of functions that release resources.  Maybe all those should continue to work.
A deadlock is also a failure.  Your goal is to make the softoken fail.

The checks in most FC_ functions are redundant.  The FC_ functions
that don't call the corresponding NSC_functions all call
nsc_Common* functions instead, and we can do the fork checks there.

You should also allow softoken initialization after fork in the
child processes.  This can be easily done.
Nelson,

Re: comment 25,

In theory, we would want NSC_Finalize and FC_Finalize to work in the child and
be able to do the cleanup. But to make that work would require making sure that
no locks are held in the parent before fork so that NSC_Finalize can complete
without any deadlock. The right place to clear the locks would be in a "prepare"
fork handler in the parent (first argument to pthread_atfork). However, I don't
know how to actually write such a handler.  I think it would be quite tricky
and more likely to make the parent deadlock before fork. So, I'm not even trying and making the child just fail. The caller needs to know not to fork after initializing softoken. This is the goal of the patch.

Wan-Teh,

Correct, a deadlock is a failure, and having the fork checks at the earlier possible time, upon entry of FC_ and NSC_ functions, ensures that there will not be a deadlock. I won't dispute that it's possible to remove the CHECK_FORK in the FC_ functions that are only a direct call to the nsc lower level functions. I just think it is much messier. If the FIPS and NSC implementations diverge in any way in the future, some of the CHECK_FORK may be forgotten or may be in the wrong place - ie. they would be in the nsc lower function, but not in the FC_ higher level. So, I think having the CHECK_FORK upon entry of all the public functions (FC_ and NSC_) is the most logical and maintainable place to have them.

As far as allowing softoken initialization after fork in child processes, how easy do you think it would be ? To do it right would require cleaning up and basically making NSC_Finalize and FC_Finalize work in the child, or making C_Initialize do that same cleanup in the child before reinitializing . See my response to Nelson earler in this comment about doing that.

We could try to reinitialize in the child without doing the full cleanup, say, by zero'ing all data structures, but then we'll run into issues of memory leaks, lock leaks, and potentially shared file descriptors with the parent if the child tries to reopen the same files as the parent did. I don't think that's going to be easy to fix.
Comment on attachment 300947 [details] [diff] [review]
Update with Nelson's feedback

(In reply to comment #26)
> The checks in most FC_ functions are redundant.  The FC_ functions
> that don't call the corresponding NSC_functions all call
> nsc_Common* functions instead, and we can do the fork checks there.

Yeah, they're redundant.  But I like the fact that it's easily provable
that all the relevant public functions make the checks.  That's less easy
to prove if you do the checks in common functions.

> You should also allow softoken initialization after fork in the
> child processes.  This can be easily done.

Well, as the patch stands right now, If softoken is not initialized when 
fork happens, then subsequent softoken initialization is not prevented.
If softoken is initialized when the fork happens, it cannot be shut down in 
the child.  I think it would have to be shut down before it could be 
re-initialized in the child.  So, the point you're raising seems to be
that, as the patch stands now, even if softoken could be shut down in the 
child after fork, it could not then be subsequently re-initialized.  

I agree, in principle, that it would be best if the child could shut down 
softoken and then, if desired, reinitialize softoken.  But I wouldn't stop
this patch while waiting for that much greater enhancement.

I find Julien's argument persuasive about causing even the _Finalize 
functions to fail unless and until we can find a way to make them succeed
without risk of deadlock.
Attachment #300947 - Flags: review- → review+
BTW, do we want this fix for 3.11.next?
I forgot to submit a comment that describes the scenario.  Sorry about the confusion.

Here is the scenario:
1. The parent does NOT initialize the softoken.
2. The parent forks a child.
3. The child initializes the softoken.

Note that the parent never initializes the softoken, so finalizing and reinitializing
softoken isn't an issue.

The current at-fork handler would prevent this from working.  This can be easily
made to work if the handler is defined like this:

+PRBool forked = PR_FALSE;
+
+void ForkedChild(void)
+{
+    if (nsc_init || nsf_init)
+        forked = PR_TRUE;
+}
If the parent never initializes softoken, then the atfork handler does not 
get registered, and hence is not called at fork time.  Right?
Wan-Teh,

Re: comment 30,

In this case, there is no problem, because pthread_atfork is only called when FC_Initialize or NSC_Initialize are successful. So the child forks, and has no problem initializing the softoken since ForkedChild is never called.

The problem would show up if :

1. The parent initializes the softoken.
2. The parent shuts down the softoken - but does not unload it
3. The parent forks a child.
4. The child initializes the softoken.

In this case, the child would fail to initialize softoken with my patch for no good reason.  Your change would fix that, so I will incorporate it.
Nelson,

Thanks for the reviews. This patch includes the change recommended by Wan-Teh in comment 30 in pkcs11.c .

There is also a change that Bob asked me to make in softoken.h and pkcs11.h to allow for disabling this code. Apparently Red Hat needs to build a version of softoken that does not depend on pthreads. So, I have added a check for !defined(NO_PTHREADS) in the ifdef's.

fipstokn.c, config.mk and pkcs11c.c are unchanged from the patches you submitted.
Attachment #300807 - Attachment is obsolete: true
Attachment #300947 - Attachment is obsolete: true
Attachment #300949 - Attachment is obsolete: true
Attachment #300954 - Attachment is obsolete: true
Attachment #300975 - Flags: review?(nelson)
Nelson,

Re: comment 29, I'm not sure if we want to do this in 3.11.10. We probably would want to run a bunch of servers against it before we ship NSS that way so they get time to fix themselves if they are broken. And we probably want the Solaris folks LDAP guys to test it too.
In comment 33, I meant "unchanged from the patches you reviewed".
Sorry about the confusion again.  I didn't read the entire patch
and wasn't thinking when pthread_atfork is called.

I am worried about the inability to unregister at-fork handlers.
What Julien reported in comment 13 is not documented, so it is
implementation dependent.

In Linux, NSPR registers a thread-specific data destructor.
If libnspr4.so is unloaded, that function (the infamous _pt_thread_death
function) is still called when the thread terminates, leading to
a crash.  We had to work around that problem.
Fair point. Unfortunately the spec doesn't say anything about this. I only performed the test I described in comment 13 on Solaris. I had created a pthread_atfork handler that would dump core. I got no core file so I knew that it didn't get invoked in the child.

I will check what happens with pk11mode on AIX, HP-UX, and a few versions of Linux, when adding a fork() after the PR_UnloadLibrary for softoken.
Comment on attachment 300975 [details] [diff] [review]
Consolidated patch incorporating Wan-Teh and Bob's changes

r=nelson, provided that the tests you proposed to do in comment 37 all pass (don't crash).
Attachment #300975 - Flags: review?(nelson) → review+
For my test, I was using the following fork handler :

void ForkedChild(void)
{
    printf("You evil forker.\n");
    PORT_Assert(0);
    if (nsc_init || nsf_init) {
        forked = PR_TRUE;
    }
}

The tests succeeded on Solaris 10 32 bits, RHEL3 32-bit, RHEL4 64-bits. No printout or core dump.

On AIX 32 bits and HP-UX 32 bits however, the fork handler was invoked - several times. 

[jp96085@nssaixb]/h/monstre/export/home/julien/nss/tip/mozilla/dist/AIX5.1_DBG.OBJ/bin 6 % time ./pk11mode
Loaded FC_GetFunctionList for FIPS MODE; slotID 0
You evil forker.
Assertion failure: 0, at /h/monstre/export/home/julien/nss/tip/mozilla/security/nss/lib/softoken/pkcs11.c:474
You evil forker.
Assertion failure: 0, at /h/monstre/export/home/julien/nss/tip/mozilla/security/nss/lib/softoken/pkcs11.c:474
**** Total number of TESTS ran in FIPS MODE is 87. ****
**** ALL TESTS PASSED ****
35.148u 0.057s 0:38.67 91.0%    3+6k 0+0io 44pf+0w

[jp96085@hploan1]/h/monstre/export/home/julien/nss/tip/mozilla/dist/HP-UXB.11.11_DBG.OBJ/bin 49 % ./pk11mode
Loaded FC_GetFunctionList for FIPS MODE; slotID 0
You evil forker.
Assertion failure: 0, at pkcs11.c:474
You evil forker.
Assertion failure: 0, at pkcs11.c:474
**** Total number of TESTS ran in FIPS MODE is 87. ****
**** ALL TESTS PASSED ****

I was only forking once, right before the end of main, so that was fairly surprising.

It turns out that the tests that pk11mode runs actually cause fork on those platform. I couldn't determine what they were trying to fork exactly due to the piss poor debugger support. It may have been a netstat fork. But it doesn't matter.

When I reduced pk11mode to do only C_Initialize and C_Finalize, and left my fork() call at the end of main after the PR_UnloadLibrary, the fork handler in the child was not invoked on either platform. This means the other PKCS#11 calls made by pk11mode were causing extra forks somehow. But after the library is unloaded, the fork handler is not involved in the child. So, this means my code is OK on all platforms that I tested.

Thus, I checked this in to the trunk :

Checking in config.mk;
/cvsroot/mozilla/security/nss/lib/softoken/config.mk,v  <--  config.mk
new revision: 1.23; previous revision: 1.22
done
Checking in fipstokn.c;
/cvsroot/mozilla/security/nss/lib/softoken/fipstokn.c,v  <--  fipstokn.c
new revision: 1.26; previous revision: 1.25
done
Checking in pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.145; previous revision: 1.144
done
Checking in pkcs11c.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11c.c,v  <--  pkcs11c.c
new revision: 1.100; previous revision: 1.99
done
Checking in softoken.h;
/cvsroot/mozilla/security/nss/lib/softoken/softoken.h,v  <--  softoken.h
new revision: 1.17; previous revision: 1.16
done


Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Did we implement the following from the description of the bug ?

"Finally, perhaps there should be a way to enable/disable this check
at run time, so that broken applications can continue to function,
once they discover this problem, until they are fixed."

I couldn't find it in the patch.
No, we did not.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: