Last Comment Bug 462293 - Crash on fork after Softoken is dlClose'd on some Unix platforms in NSS 3.12
: Crash on fork after Softoken is dlClose'd on some Unix platforms in NSS 3.12
Status: RESOLVED FIXED
regression crash
: crash, regression
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: 3.12
: All All
: P1 critical (vote)
: 3.12.3
Assigned To: Julien Pierre
:
:
Mentors:
: 462453 (view as bug list)
Depends on:
Blocks: 462453
  Show dependency treegraph
 
Reported: 2008-10-29 22:20 PDT by Julien Pierre
Modified: 2008-11-18 16:17 PST (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Use PID checks on Solaris <10 (3.20 KB, patch)
2008-10-30 17:12 PDT, Julien Pierre
no flags Details | Diff | Splinter Review
Use PID checks on Solaris < 10 . Update. (3.61 KB, patch)
2008-10-30 22:34 PDT, Julien Pierre
nelson: review+
Details | Diff | Splinter Review
Add fork tests to pk11mode (7.02 KB, patch)
2008-10-30 22:41 PDT, Julien Pierre
no flags Details | Diff | Splinter Review
Update with Nelson's feedback (checked in) (3.48 KB, patch)
2008-10-31 18:49 PDT, Julien Pierre
no flags Details | Diff | Splinter Review
Test update (7.03 KB, patch)
2008-10-31 19:00 PDT, Julien Pierre
nelson: review-
Details | Diff | Splinter Review
Second test update (checked in) (6.89 KB, patch)
2008-11-06 17:42 PST, Julien Pierre
nelson: review+
Details | Diff | Splinter Review
Fix crash on other Unix platforms (4.19 KB, patch)
2008-11-07 18:38 PST, Julien Pierre
wtc: review+
Details | Diff | Splinter Review
Patch update (196 bytes, patch)
2008-11-12 18:34 PST, Julien Pierre
no flags Details | Diff | Splinter Review
Correct file (6.96 KB, patch)
2008-11-13 15:52 PST, Julien Pierre
wtc: review+
Details | Diff | Splinter Review
Incorporate Wan-Teh's feedback. Add one more pk11mode test. (12.19 KB, patch)
2008-11-14 18:04 PST, Julien Pierre
wtc: review+
Details | Diff | Splinter Review

Description Julien Pierre 2008-10-29 22:20:56 PDT
We need an alternate method of detecting forks on that platform.
The logic could be the following :

- In C_Initialize, record the PID to a global.
- Change the CHECK_FORK macro to call getpid() and compare it to the global variable.

This is less efficient than using pthread_atfork, so it should be done conditionally only for s8 and s9. We can continue to use the current method on s10+.
Comment 1 Julien Pierre 2008-10-29 22:40:59 PDT
The reason this is P1 is that an app can dlopen softoken, call C_Initialize, call C_Finalize, then dlclose softoken , then fork.

On s8 and s9, the child will crash calling into an address that's no longer in memory.

I tested this scenario as part of bug 331096 and the behavior was OK on s10. I didn't test s9 and s8, and they are broken.

One possible fix is to link softoken with -z nodelete. I don't particularly like the idea of leaking memory especially since we can't make that conditional for s8/s9.
Comment 2 Nelson Bolyard (seldom reads bugmail) 2008-10-30 12:26:02 PDT
I suggest that you create a new check fork FUNCTION, and have the macro 
call it, on S8 and S9.  The function can also check to see if global value
is zero, and initialize it with an initial call to getPID if so.  

Also, C_Finalize should reset this global value.
Comment 3 Julien Pierre 2008-10-30 13:57:33 PDT
Nelson,

I think resetting the state in C_Finalize is something unrelated to the crash due to the fork handler still being registered, and should be dealt with separately. I will email you separately about that issue.
Comment 4 Nelson Bolyard (seldom reads bugmail) 2008-10-30 15:31:02 PDT
When C_Finalize succeeds, it must clear the PID.  Otherwise, the following sequence will always fail:

parent
  C_Initialize
  C_Finalize
  fork
child
  C_Initialize

There is nothing wrong with that behavior.  That behavior should always work.
Comment 5 Nelson Bolyard (seldom reads bugmail) 2008-10-30 15:51:35 PDT
Of course, what I wrote for the PID also goes for the "forked" flag, on
platforms that use that technique.  

In today's meeting, I believe there was consensus that NSS must not disallow 
children to call C_Initialize, even if NSS was initialized before the fork.
NSS must not be more strict that the PKCS#11 spec allows it to be.

Some went further and expressed the desire to allow C_Finalize and other 
functions related to the releasing of resources (such as C_CloseSession, C_CloseAllSessions, C_Logout, and maybe C_FindObjectsFinal, C_CancelFunction) 
but there was not clear consensus about that.
Comment 6 Julien Pierre 2008-10-30 16:24:24 PDT
Nelson,

Thanks for clarifying by email. Your comment 2 was not clear about which situations you wanted C_Finalize to reset the global value. I agree that C_Finalize should reset the state, ie. the global PID, in the case of a successful C_Finalize, ie. not in the case where a fork was detected. This will allow the situation in comment 4 to work, which I agree we must support.

Re: comment 5, IMO, the same does not go for the "forked" flag. The meaning of the "forked" flag is that the parent process forked after C_Initialize. The forked flag is set only in the child, never in the parent.

It's currently not possible for the child to do a successful C_Finalize in that case. The child always is hosed, period - there is currently nothing it can do to recover - not C_Initialize, nor C_Finalize. The only thing that *might* currently work would be for the app to dlclose() the softoken module, and that would take care of clearing the forked flag at the same time.

I will file a separate bug/RFE about allowing the child to call C_Initialize after the parent forked, since I think it is unrelated to this fork handler crash.
Comment 7 Julien Pierre 2008-10-30 17:12:11 PDT
Created attachment 345628 [details] [diff] [review]
Use PID checks on Solaris <10

This is my first draft. Very little testing has been done at this point.

So far, I have only verified with the printfs that the Solaris version check works correctly on s9 and s10 machines. I haven't tried to fork any process yet with either scheme.
Comment 8 Julien Pierre 2008-10-30 22:34:34 PDT
Created attachment 345666 [details] [diff] [review]
Use PID checks on Solaris < 10 . Update.

This patch has been tested on s9 and s10. Both the PID check and fork handler methods were verified.

There was a bug in the previous patch for the PID check. myPid was being reset in NSC_Finalize , where it was supposed to be in nsc_CommonFinalize. The change was not effective for FIPS mode. Now it is.

The other change is that I added an environment variable called NSS_STRICT_NOFORK . When this variable is set to 1, the softoken will assert and dump core in every PKCS#11 function if the parent forked while softoken was initialized. This functionality is present in debug builds only. I found it very helpful to detect and fix the aforementioned bug, so I think it should stay.
Comment 9 Julien Pierre 2008-10-30 22:41:30 PDT
Created attachment 345667 [details] [diff] [review]
Add fork tests to pk11mode

This patch adds 5 optional fork tests to the pk11mode PKCS#11 test program on Unix platforms. They are triggered by the -F argument (for fork).

The test cases are :
1) fork before softoken is loaded or initialized
2) fork with softoken loaded, but not initialized
3) fork with softoken both loaded and initialized
4) fork with softoken still loaded, but de-initialized
5) fork with softoken unloaded

Status of the tests is verified by means of process exit code, which is an arbitrary value in tests 1 and 5, CKR_CRYPTOKI_NOT_INITIALIZED in tests 2 and 5, and CKR_DEVICE_ERROR in test 3.

I believe this covers all the fork cases we need to worry about, but if there are others, feel free to suggest some more.

I was able to find the bug in NSC_Finalize my initial patch using these tests.
Comment 10 Nelson Bolyard (seldom reads bugmail) 2008-10-31 18:08:41 PDT
Comment on attachment 345666 [details] [diff] [review]
Use PID checks on Solaris < 10 . Update.

r+. Please make these changes.
1. Remove the commented-out debug printfs.
2. Change the sense of the sense of the NSS_STRICT_NOFORK envariable 
so that the default (in its absence) is to be strict.  The variable
should make it be less strict.

Oh, wait until the tinderbox tree goes green before committing this..
It's orange now.
Comment 11 Nelson Bolyard (seldom reads bugmail) 2008-10-31 18:11:04 PDT
Comment on attachment 345667 [details] [diff] [review]
Add fork tests to pk11mode

Is this patch missing a piece? 
It adds options to pk11mode, but I don't see any changes to any test script called from all.sh to actually use these new options.
Comment 12 Julien Pierre 2008-10-31 18:14:58 PDT
Nelson,

You are right, there is no change to the test script. I was going to let Slavo do that part. We only needs to add the -F argument to pk11mode on Unix platforms.
Comment 13 Julien Pierre 2008-10-31 18:49:48 PDT
Created attachment 345829 [details] [diff] [review]
Update with Nelson's feedback (checked in)

- Removed debug printf.
- Change the default to assert when a PKCS#11 call is made in a child after fork. This can be turned off by setting NSS_STRICT_NOFORK=0 in the environment.

I will check this in when the tree is green.
Comment 14 Julien Pierre 2008-10-31 19:00:07 PDT
Created attachment 345830 [details] [diff] [review]
Test update

Changed the default to run fork tests.
The -F argument will now turn off the fork tests.
This is easier than trying to figure out the platform in the shell scripts and set -F only on Unix platforms.  With this update to pk11mode, no script change is needed.
Comment 15 Nelson Bolyard (seldom reads bugmail) 2008-10-31 19:06:11 PDT
Comment on attachment 345829 [details] [diff] [review]
Update with Nelson's feedback (checked in)

An observation:  The expression:

 ( (!forkAssert) || ( forkAssert && (0 == strcmp(forkAssert, "1"))) ) 

Is exactly equivalent to the shorter and more efficient

 ( (!forkAssert) || (0 == strcmp(forkAssert, "1") )
Comment 16 Julien Pierre 2008-11-03 13:33:24 PST
Comment on attachment 345829 [details] [diff] [review]
Update with Nelson's feedback (checked in)

I checked in this fix to the trunk.

Checking in pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.153; previous revision: 1.152
done
Checking in softoken.h;
/cvsroot/mozilla/security/nss/lib/softoken/softoken.h,v  <--  softoken.h
new revision: 1.18; previous revision: 1.17
done
Comment 17 Julien Pierre 2008-11-03 13:35:23 PST
I might have been too quick to mark this resolved. The bug is fixed, but I would still like a review on the additional test cases so we don't run into this again.
Comment 18 Nelson Bolyard (seldom reads bugmail) 2008-11-06 16:12:48 PST
Comment on attachment 345830 [details] [diff] [review]
Test update



>+#if defined(XP_UNIX) && !defined(NO_PTHREADS)
>+#include <unistd.h>
>+#include <sys/wait.h>
>+#endif

I suggest that you add a #define in that block that defines a symbol
that can be used as a "feature test macro" below.  IOW, add something
like #define DO_FORK_CHECK 1  and then below, just do #ifdef DO_FORK_CHECK
rather than repeating that more complex #if.  This has the side benefit
that if you ever decide to change that more complex #if, you only need
to change it in one place, not in many.


>+CK_RV PKM_ForkCheck(int expected, CK_FUNCTION_LIST_PTR fList);

Please #ifdef all references to ForkCheck.  I don't want this 
program to be calling ForkCheck on non-Unix platforms.  I also don't
want to be doing unnecessary putenv calls on Windows.
Yes I know it does nothing on them, but still.


>-    PLOptState *opt = PL_CreateOptState(argc, argv, "nvhf:d:p:");
>+    PLOptState *opt = PL_CreateOptState(argc, argv, "nvhf:Fd:p:");

This code adds "F" unconditionally to the list of known option 
characters.  But below, the case that handles 'F' is conditinally
compiled.  That means that adding -F on Windows will cause a failure
but adding -F on Unix will not.  I suggest you fix that by 
removing the ifdef from the code below. Go ahead set set doForkTests
to false.  That just won't mean anything on Windows.

>+#if defined(XP_UNIX) && !defined(NO_PTHREADS)
>+        case 'F':  /* disable fork tests */
>+            doForkTests = PR_FALSE;
>+            break;
>+#endif

Please ifdef the block below, and all similar new blocks that begin 
with if (doForkTests)

>+    if (doForkTests)
>+    {
>+        /* first, try to fork without softoken loaded to make sure
>+         * everything is OK */
>+        crv = PKM_ForkCheck(123, NULL);
>+        if (crv != CKR_OK)
>+            goto cleanup;
>+    }

The convention used in this file seems to be that "case" statements are 
indented at the same level of indentation as the switch that encloses them.
Please use that convention here also, and unindent the entire contents of
this new switch by one level.

>+    switch (child) {
>+        case -1:
>+            PKM_Error("Fork failed.\n");
>+            crv = CKR_DEVICE_ERROR;
>+            break;
>+        case 0:
>+            if (fList) {
>+                /* If softoken is loaded, make a PKCS#11 call in the child.

Finally, regarding the new code in the parent that tries to manage the child 
after the fork, I don't see any code there that deals with early returns from
wait() due to EINTR.  Also, does the code need to take precautions against 
SIGCHLD interrupts?
Comment 19 Julien Pierre 2008-11-06 17:42:39 PST
Created attachment 346798 [details] [diff] [review]
Second test update (checked in)

1) I added the DO_FORK_CHECK macro .

2) I did not wait to pepper the main functions with #ifdef . The entire source has very few ifdef's. I prefer to have just one #ifdef inside PKM_ForkCheck.

I moved the putenv calls into PKM_ForkCheck, and added one extra argument to PKM_ForkCheck. So now we really don't do anything on Windows regardless of the state of the doForkChecks variable.

3) I made the option available on all platforms, but it will have no effect on non-Unix platforms. I added the word "Unix" to the usage for this option.

4) I think this was the same request as 2), and I did not want to do add unnecessary ifdef's.

5) I fixed the indentation in the switch statement.

6) I'm not sure if any extra code is needed. It has always worked in all my tests so far. I believe wait() is implemented by waiting for SIGCHILD, so we shouldn't worry about that signal interruption. Maybe other signals.
Comment 20 Nelson Bolyard (seldom reads bugmail) 2008-11-06 18:15:08 PST
Comment on attachment 346798 [details] [diff] [review]
Second test update (checked in)

ok, r=nelson, thanks
Comment 21 Julien Pierre 2008-11-06 18:22:25 PST
Thanks for the review, Nelson.

I think we can finally mark this resolved.

Checking in pk11mode.c;
/cvsroot/mozilla/security/nss/cmd/pk11mode/pk11mode.c,v  <--  pk11mode.c
new revision: 1.19; previous revision: 1.18
done
Comment 22 Julien Pierre 2008-11-06 18:27:03 PST
I completely forgot the tree was frozen. I will check the tinderbox detailed logs to see if this causes any new problem.
Comment 23 Julien Pierre 2008-11-06 18:44:53 PST
*** Bug 462453 has been marked as a duplicate of this bug. ***
Comment 24 Julien Pierre 2008-11-07 15:48:04 PST
Unfortunately, I have to reopen this once more.
The last test - fork after C_Finalize and dlclose() - caused a crash on AIX.
So, this problem is unfortunately not limited to Solaris 8/9.

I am very surprised by this, because I reported having manually verified that things were OK for this case on AIX in https://bugzilla.mozilla.org/show_bug.cgi?id=331096#c39 .

The HP-UX tests haven't finished running. The issue might exist there too.
Comment 25 Wan-Teh Chang 2008-11-07 16:07:26 PST
Perhaps we should take a more conservative approach.

Rather than determining by experiments whether dlclose
unregisters the fork handlers defined in the shared
library it is unloading, we should ask the OS vendors
for an authoritative answer.  In the absence of an
affirmative answer from the OS vendor, we should default
to the getpid approach.
Comment 26 Julien Pierre 2008-11-07 17:38:22 PST
I agree with comment 25, the default needs to be changed.
It looks like pk11mode causes an infinite loop on HP-UX. I still don't know why.
I am trying to debug with printf because I don't see a working debugger on the system.
Comment 27 Wan-Teh Chang 2008-11-07 17:44:30 PST
The debugger on HP-UX is gdb, and it's installed in /opt/langtools/bin.
There are gdb32 for 32-bit apps and gdb64 for 64-bit apps.
Comment 28 Julien Pierre 2008-11-07 17:54:02 PST
Thanks, Wan-Teh.

The problem on HP-UX is also in the last fork test - after the PKCS#11 module is finalized and unloaded. After the fork, the child goes into an infinite loop. I can't even debug it with gdb. ctrl-c won't work. I think this calls for using the getpid() method on HP-UX as well.

It looks like pthread_atfork only works as we expect on s10 and Linux :(

I will prepare a new patch to reflect these findings. The default will be getpid.
Comment 29 Julien Pierre 2008-11-07 18:38:34 PST
Created attachment 347026 [details] [diff] [review]
Fix crash on other Unix platforms

This makes the getpid method the default for Unix platforms.
Linux uses pthread_atfork since we have since it succeeed on all our machines.
Solaris still has the runtime check to switch between getpid and pthread_atfork.

This patch also contains a small fix to pk11mode - skip the last test if the PKCS#11 library was not unloaded.
Comment 30 Julien Pierre 2008-11-07 18:39:12 PST
Comment on attachment 347026 [details] [diff] [review]
Fix crash on other Unix platforms

I have tested this patch on AIX, HP-UX, Solaris 8, 9, and even Windows to make sure it compiled.
Comment 31 Julien Pierre 2008-11-07 18:49:39 PST
Comment on attachment 347026 [details] [diff] [review]
Fix crash on other Unix platforms

Cancelling review request. I didn't test linux before submitting the patch. And of course there is a problem there :(
Comment 32 Julien Pierre 2008-11-07 19:29:09 PST
Comment on attachment 347026 [details] [diff] [review]
Fix crash on other Unix platforms

This patch is OK after all even on Linux. I just had an environment problem.
Comment 33 Wan-Teh Chang 2008-11-07 20:04:47 PST
Comment on attachment 347026 [details] [diff] [review]
Fix crash on other Unix platforms

What is this NO_PTHREADS macro?  This MXR search
shows it's not defined:
http://mxr.mozilla.org/security/search?string=NO_PTHREADS

Instead of using LINUX, you should define a macro
to indicate it's okay to use pthread_atfork.  This
makes it easier to enable pthread_atfork for a new
platform in the future.

You can use PR_BEGIN_MACRO and PR_END_MACRO instead
of "do {" and "} while (0)".

I prefer the shorter
  #ifdef FOO
to
  #if defined(FOO)

You should add the sftk_ prefix to extern variables
'forked', 'myPid', and 'usePthread_atfork'.

'usePthread_atfork' looks strange.  How about
'usePthreadAtFork'?
Comment 34 Wan-Teh Chang 2008-11-07 20:06:05 PST
Comment on attachment 347026 [details] [diff] [review]
Fix crash on other Unix platforms

You may want to ask Nelson to review this patch instead
because he reviewed previous versions of this fix.
Comment 35 Wan-Teh Chang 2008-11-07 20:16:39 PST
Comment on attachment 347026 [details] [diff] [review]
Fix crash on other Unix platforms

ForkedChild should be declared static because it
is used in only one file (lib/softoken/pkcs11.c).
Comment 36 Wan-Teh Chang 2008-11-07 20:20:30 PST
Comment on attachment 347026 [details] [diff] [review]
Fix crash on other Unix platforms

In the Solaris definition of CHECK_FORK, I think it is
better to write the test as

(usePthread_atfork && forked) || (!usePthread_atfork && myPid != getpid())
Comment 37 Nelson Bolyard (seldom reads bugmail) 2008-11-08 17:15:40 PST
(In reply to comment #36)

> (usePthread_atfork && forked) || (!usePthread_atfork && myPid != getpid())

Testing the same variable twice can be avoided by rewriting the expression as

(usePthread_atfork ? forked : (myPid != getpid()))
Comment 38 Slavomir Katuscak 2008-11-10 03:32:01 PST
Since last patch related to this bug (pk11mode.c) was integrated, pk11mode is failing on AIX. I see it was already mentioned in comment 24, but rather to have it clear:

fips.sh: Run PK11MODE in FIPSMODE  -----------------
pk11mode -d ../fips -p fips- -f ../tests.fipspw.602244
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 

FIPS MODE PKM_Error: Child misbehaved.
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Child return status : 0.
**** Total number of TESTS ran in FIPS MODE is 92. ****
fips.sh: #175: Run PK11MODE in FIPS mode (pk11mode) . - Core file is detected - FAILED
Comment 39 Wan-Teh Chang 2008-11-10 16:51:20 PST
Comment on attachment 347026 [details] [diff] [review]
Fix crash on other Unix platforms

r=wtc.  Before you check this in, please make as many of the suggested
changes as you see fit.

I have some more suggested changes.

If DEBUG is defined, you should define FORK_ASSERT() as a function to
avoid code duplication from macro expansion.

In FORK_ASSERT(), we have
        if ( (!forkAssert) || \
             ( forkAssert && (0 == strcmp(forkAssert, "1"))) ) { \
You can rewrite the Boolean expression as
        if ( (!forkAssert) || (0 == strcmp(forkAssert, "1")) ) { \

In the default definition of CHECK_FORK(), we have
        if (myPid && myPid != getpid()) { \
            FORK_ASSERT(); \
            return CKR_DEVICE_ERROR; \
        } \
You may be able to rewrite the Boolean expression as
        if (myPid != getpid()) { \

In NSC_ModuleDBFunc, we have
#if defined(XP_UNIX) && !defined(NO_PTHREADS)
    if (forked) return NULL;
#endif
Should that be removed, or replaced by a macro that is similar to
CHECK_FORK() but returns NULL instead of CKR_DEVICE_ERROR?
Comment 40 Wan-Teh Chang 2008-11-10 16:54:48 PST
Comment on attachment 347026 [details] [diff] [review]
Fix crash on other Unix platforms

cmd/pk11mode/pk11mode.c also has one instance of NO_PTHREADS left.
I wonder if that should be removed.
Comment 41 Wan-Teh Chang 2008-11-11 15:30:00 PST
Comment on attachment 347026 [details] [diff] [review]
Fix crash on other Unix platforms

In nsc_CommonFinalize, we have
>     if (!usePthread_atfork) {
>         myPid = 0; /* allow reinitialization */
>     }
>+#elif defined(XP_UNIX) && !defined(LINUX)
>+    myPid = 0; /* allow reinitialization */
> #endif

I don't understand why we need to reset myPid to 0 to
allow reinitialization.  The initialization code doesn't
check myPid to be 0 before setting it to getpid().

If it's necessary, should we also reset 'forked' to PR_FALSE?
Comment 42 Julien Pierre 2008-11-12 13:48:50 PST
Wan-Teh,

Thanks for the review.

Re: comment 33, the use of the NO_PTHREADS macro comes from a request from Bob Relyea to not have a dependency on pthreads in softoken on Linux.

See https://bugzilla.mozilla.org/show_bug.cgi?query_format=specific&order=relevance+desc&bug_status=__open__&id=331096#c33

Apparently, my latest patch broke that case, since it assumes that Linux always uses pthread_atfork .

Previously, defining NO_PTHREADS resulted in disabling fork checks. But now it can result in switching to the getpid() method.

I will provide an updated patch for that case and address your comment as well. This should be a feature macro.

Re: comment 35,
I agree. ForkedChild wasn't introduced or changed in the latest patch, but it belongs as a static.

Re: comment 36 and comment 37, I will settle for Nelson's solution .

Re: comment 39,

I am not at all concerned about code duplication from macro expansion in DEBUG code. Is there any reason that we should be ?

I will shorten the FORK_ASSERT macro as well as the default CHECK_FORK macro.


Good catch about NSC_ModuleDBFunc. I completely forgot about that case. This is the one exception to the rule - a pseudo-PKCS#11 call that unfortunately doesn't return a CK_RV . I have some mixed feelings about that.
I don't want to write a second macro. I will come up with a solution using an intermediate function that calls CHECK_FORK .

Re: comment 40, yes, that particular instance of NO_PTHREADS can be removed since the fork check is now implemented in a non pthread-dependent fashion when using getpid().

Given the amount of changes suggested/requested, I unfortunately feel that a new patch with review is necessary.
Comment 43 Julien Pierre 2008-11-12 16:38:21 PST
Re: comment 41,

The code that assigns getpid() to myPid is invoked as part of C_Initialize . Currently, there is a CHECK_FORK in that function, just like every other PKCS#11 call. CHECK_FORK checks that myPid is either zero, or that its value is the same as getpid(). Ie. the macro checks that either softoken is not initialized, or we are still in the same process that did the initialization.

If we don't reset myPid to zero in C_Finalize(), the process may fork after C_Finalize, and then a subsequent initialization in the child process would fail, since myPid would not match getpid(). Believe it or not, this is another case we have encountered. And I don't have a test for it in pk11mode yet. Sigh.

On the other hand, the "forked" global variable is only ever set by the pthread_atfork child handler, and should never be reset, since the softoken cannot currently clean up after itself after a fork in the child process, or in any of its grandchildren.
Comment 44 Julien Pierre 2008-11-12 18:34:17 PST
Created attachment 347902 [details] [diff] [review]
Patch update

This makes of the changes requested as discussed.

You now get your choice of build macros to get the desired behavior.
- defining NO_CHECK_FORK will turn off all fork checks
- defining CHECK_FORK_MIXED will choose the Solaris "mixed" implementation
- defining CHECK_FORK_PTHREAD will choose the pthread_atfork implementation
- defining CHECK_FORK_GETPID will choose the getpid implementation

If none of them are defining, softoken.h tries to figure out the best one for the platform according to our current knowledge. That is mixed for solaris, pthread for Linux, and getpid for every other Unix.

I had to keep the check for myPid && myPid != getpid() in CHECK_FORK, otherwise the very first C_Initialize would fail since myPid is initially 0.

I tested this on Solaris, Linux, AIX and Windows.

I still need to write one more test for pk11mode but it won't be today and it can be reviewed separately.
Comment 45 Wan-Teh Chang 2008-11-13 15:32:18 PST
Comment on attachment 347902 [details] [diff] [review]
Patch update

You attached the wrong file.
Comment 46 Julien Pierre 2008-11-13 15:52:13 PST
Created attachment 348090 [details] [diff] [review]
Correct file
Comment 47 Wan-Teh Chang 2008-11-13 17:07:07 PST
Comment on attachment 348090 [details] [diff] [review]
Correct file

r=wtc.

I still think the global variables 'forked', 'myPid', 'usePthread_atfork'
should have a prefix such as sftk_ or nsc_.

>+#error Incorrect fork method.

"fork method" => "fork check method".

>     if (!usePthread_atfork) {
>         myPid = 0; /* allow reinitialization */
>     }
>+#elif defined(XP_UNIX) && !defined(LINUX)
>+    myPid = 0; /* allow reinitialization */
> #endif

You may want to change "allow reinitialization" to something like
"allow the CHECK_FORK() in NSC_Initialize to succeed.".  My previous
confusion was that I thought "reinitialization" meant the
reinitialization of myPid.

You can also solve the reinitialization problem by removing the
CHECK_FORK() from NSC_Initialize.

>+ * This section should be updated as more platforms get pthread fixes for
>+ * dlclose.

You may want to say what the "pthread fixes" are: "unregister fork handlers
in dlclose".

>+#elif defined(LINUX)
>+
>+/* Linux uses pthread_atfork */

This comment can be removed now.

>+        if (usePthread_atfork ? forked : myPid && myPid != getpid() ) { \

You may want to add parentheses () around "myPid && myPid != getpid()".

>+/* non-Unix platforms, or fork check disabled */
> 
> #define CHECK_FORK()
> 
>+#define NO_FORK_CHECK

This should be protected with #ifndef NO_FORK_CHECK, otherwise
you may get a macro redefinition error if NO_FORK_CHECK is defined
by the build system.
Comment 48 Wan-Teh Chang 2008-11-13 17:14:12 PST
Comment on attachment 348090 [details] [diff] [review]
Correct file

It's not clear why NSC_ModuleDBFunc needs a fork check.
NSC_ModuleDBFunc is not a PKCS #11 function.  You don't
need to be logged in to use it.
Comment 49 Julien Pierre 2008-11-14 18:04:07 PST
Created attachment 348293 [details] [diff] [review]
Incorporate Wan-Teh's feedback. Add one more pk11mode test.

Wan-Teh,

I made most of your suggested changes, except those around variable naming.

Re: the need for CHECK_FORK NSC_ModuleDBFunc, even though it is not a PKCS#11 entry point, and it doesn't appear to use locks currently, it is still part of the softoken module. With the current state of the code, we don't support the softoken module at all in the child if the parent initialized before forked. So, I think it makes sense for this check to be in all entry points. We had already discussed the same thing for C_Initialize as part of bug 331096 also. If we improve the softoken's fork-safety, we can remove the CHECK_FORK from some or all of the entry points.

The main thing to review here is the additional test I made in pk11mode to attempt to fork and initialize in the child, after the parent has shutdown NSS. This is one of the cases we need to support. The DS does that.
Comment 50 Wan-Teh Chang 2008-11-14 21:53:52 PST
Comment on attachment 348293 [details] [diff] [review]
Incorporate Wan-Teh's feedback. Add one more pk11mode test.

r=wtc.

> CK_RV PKM_ForkCheck(int expected, CK_FUNCTION_LIST_PTR fList,
>-		    PRBool forkAssert);
>+		    PRBool forkAssert, CK_C_INITIALIZE_ARGS_NSS* initArgs);

Nit: be consistent with the position of *.  Use the "type *var" style
in this file.

> CK_RV PKM_ForkCheck(int expected, CK_FUNCTION_LIST_PTR fList,
>-		    PRBool forkAssert)
>+		    PRBool forkAssert, CK_C_INITIALIZE_ARGS_NSS* initArgs)

Same as above.

>+            if (!initArgs) {

Nit: In an if-else statement, I like to see the condition to
have a positive sense, i.e., if (initArgs).

>+                 * it fails with CKR_CRYPTOKI_NOT_INITIALIZED .
and
>+                 * kick in, and make it return CKR_DEVICE_ERROR .

Nit: no space before the period (.).  Same thing in the 'else' block.
Comment 51 Wan-Teh Chang 2008-11-14 22:04:17 PST
Not checking for fork in NSC_ModuleDBFunc allows us to remove
the special ForkCheck function.  NSC_ModuleDBFunc logically
belongs to the pk11wrap layer.  It has to live in the softoken
after we made dbm an internal component of the softoken.  So
NSC_ModuleDBFunc doesn't need to follow the PKCS #11 rule on
forking.
Comment 52 Julien Pierre 2008-11-18 16:17:34 PST
Thanks for the review, Wan-Teh. I checked this in to the trunk. I hope this is finally resolved.

Checking in cmd/pk11mode/pk11mode.c;
/cvsroot/mozilla/security/nss/cmd/pk11mode/pk11mode.c,v  <--  pk11mode.c
new revision: 1.20; previous revision: 1.19
done
Checking in lib/softoken/pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.154; previous revision: 1.153
done
Checking in lib/softoken/softoken.h;
/cvsroot/mozilla/security/nss/lib/softoken/softoken.h,v  <--  softoken.h
new revision: 1.20; previous revision: 1.19
done

Note You need to log in before you can comment on or make changes to this bug.