Last Comment Bug 473505 - softoken's C_Initialize and C_Finalize should succeed after a fork in a child process
: softoken's C_Initialize and C_Finalize should succeed after a fork in a child...
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: 3.12.1
: All All
: P1 blocker (vote)
: 3.12.3
Assigned To: Julien Pierre
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-01-13 20:30 PST by Julien Pierre
Modified: 2009-02-03 15:19 PST (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Skip over all locks (9.11 KB, patch)
2009-01-13 20:40 PST, Julien Pierre
no flags Details | Diff | Splinter Review
Also skip over all lock destructions in NSC_Finalize (12.19 KB, patch)
2009-01-14 18:39 PST, Julien Pierre
no flags Details | Diff | Splinter Review
Work in progress (32.44 KB, patch)
2009-01-15 01:34 PST, Julien Pierre
no flags Details | Diff | Splinter Review
Working patch, except for sqlite (33.81 KB, patch)
2009-01-15 14:45 PST, Julien Pierre
rrelyea: superreview-
Details | Diff | Splinter Review
updated patch, working with sqlite (40.80 KB, patch)
2009-01-26 20:39 PST, Julien Pierre
rrelyea: review-
Details | Diff | Splinter Review
Updated patch (39.60 KB, patch)
2009-01-28 17:36 PST, Julien Pierre
rrelyea: superreview-
Details | Diff | Splinter Review
Take care of corner case (39.55 KB, patch)
2009-02-02 16:58 PST, Julien Pierre
rrelyea: review+
nelson: superreview+
Details | Diff | Splinter Review
Patch as checked in (38.39 KB, patch)
2009-02-02 21:35 PST, Julien Pierre
no flags Details | Diff | Splinter Review
Fix build breakage on Windows. As checked in. (1.98 KB, patch)
2009-02-03 15:19 PST, Julien Pierre
no flags Details | Diff | Splinter Review

Description Julien Pierre 2009-01-13 20:30:17 PST
According to the PKCS#11 spec, it is legal to call C_Initialize in a child process after a fork if the parent had initialized the PKCS#11 module. The child's C_Initialize should be able to succeed in reinitializing the module.

The NSS softoken does not comply with that and currently fails. C_Initialize returns with an error due to the fork check.

The PKCS#11 spec does not require C_Finalize to succeed in a child process, but it is trivial to do so if we have made C_Initialize work, since re-initializing requires finalizing and initializing the module.

One problem in this situation is that we don't know the state of the parent process before the fork, and whether any softoken locks are currently held.
So, in the case of a re-initialization, finalize needs to skip over, zero, and therefore leak any softoken locks that may be in an unknown held state in the parent's address space.
Comment 1 Julien Pierre 2009-01-13 20:40:00 PST
Created attachment 356895 [details] [diff] [review]
Skip over all locks

This patch is not for review or checkin. I created it mechanically.
This patch skips over all instances of PR_Lock and PR_Unlock that have NSC_Finalize in the call stack. I verified in the debugger that this was the case.

This patch will be used to identify all the locks we have to zero or skip over conditionally based on fork check status. There is a total of 13 different locking / unlocking code paths to modify. Some of these are in layers below softoken, such as freebl and nssdbm . There are also internal NSPR locks in PR_GetEnv and PR_UnloadLibrary, which I dealt with by skipping over those calls, though this may not be the right strategy.

This patch did not attempt to prevent the destruction of any locks yet, but most likely no code attempts to free locks without unlocking first, so this patch should still provide an exhaustive list of locks to deal with.
Comment 2 Julien Pierre 2009-01-14 18:39:05 PST
Created attachment 357084 [details] [diff] [review]
Also skip over all lock destructions in NSC_Finalize

Not for checkin.
I did not attempt to deal with the locks in sqlite. I was only running with the default DBM.
Comment 3 Julien Pierre 2009-01-14 21:50:55 PST
Here are two stacks in internal NSPR locks :

There is a possible race if the parent process holds of those locks in one thread, and then forks in another. This could be because one thread is in the middle of C_Finalize, or because the application is an NSPR application and is calling either PR_GetEnv or PR_UnloadLibrary. The two possible fixes include not calling these after a fork is detected in softoken, or making a change to NSPR to make it fork safe in those cases.

t@1 (l@1) stopped in PR_Lock at line 205 in file "ptsynch.c"
  205       PR_ASSERT(lock != NULL);
(dbx) w
current thread: t@1
=>[1] PR_Lock(lock = 0x808fee8), line 205 in "ptsynch.c"
  [2] PR_GetEnv(var = 0xfe8e8e54 "NSS_DISABLE_UNLOAD"), line 81 in "prenv.c"
  [3] sftkdbCall_Shutdown(), line 416 in "lgglue.c"
  [4] sftkdb_Shutdown(), line 2671 in "sftkdb.c"
  [5] nsc_CommonFinalize(pReserved = (nil), isFIPS = 0), line 2659 in "pkcs11.c"
  [6] NSC_Finalize(pReserved = (nil)), line 2728 in "pkcs11.c"
  [7] SECMOD_UnloadModule(mod = 0x8092e68), line 463 in "pk11load.c"
  [8] SECMOD_SlotDestroyModule(module = 0x8092e68, fromSlot = 1), line 808 in "pk11util.c"
  [9] PK11_DestroySlot(slot = 0x80bbbe0), line 451 in "pk11slot.c"
  [10] PK11_FreeSlot(slot = 0x80bbbe0), line 464 in "pk11slot.c"
  [11] SECMOD_DestroyModule(module = 0x8092e68), line 779 in "pk11util.c"
  [12] SECMOD_DestroyModuleListElement(element = 0x808fa20), line 825 in "pk11util.c"
  [13] SECMOD_DestroyModuleList(list = 0x808fa20), line 841 in "pk11util.c"
  [14] SECMOD_Shutdown(), line 99 in "pk11util.c"
  [15] NSS_Shutdown(), line 888 in "nssinit.c"
  [16] certutil_main(argc = 4, argv = 0x80467b4, initialize = 1), line 2921 in "certutil.c"
  [17] main(argc = 4, argv = 0x80467b4), line 2934 in "certutil.c"
(dbx) c
t@1 (l@1) stopped in PR_Lock at line 205 in file "ptsynch.c"
  205       PR_ASSERT(lock != NULL);
(dbx) w
current thread: t@1
=>[1] PR_Lock(lock = 0x809196c), line 205 in "ptsynch.c"
  [2] PR_EnterMonitor(mon = 0x8091968), line 531 in "ptsynch.c"
  [3] PR_UnloadLibrary(lib = 0x8090370), line 1256 in "prlink.c"
  [4] sftkdbCall_Shutdown(), line 418 in "lgglue.c"
  [5] sftkdb_Shutdown(), line 2671 in "sftkdb.c"
  [6] nsc_CommonFinalize(pReserved = (nil), isFIPS = 0), line 2659 in "pkcs11.c"
  [7] NSC_Finalize(pReserved = (nil)), line 2728 in "pkcs11.c"
  [8] SECMOD_UnloadModule(mod = 0x8092e68), line 463 in "pk11load.c"
  [9] SECMOD_SlotDestroyModule(module = 0x8092e68, fromSlot = 1), line 808 in "pk11util.c"
  [10] PK11_DestroySlot(slot = 0x80bbbe0), line 451 in "pk11slot.c"
  [11] PK11_FreeSlot(slot = 0x80bbbe0), line 464 in "pk11slot.c"
  [12] SECMOD_DestroyModule(module = 0x8092e68), line 779 in "pk11util.c"
  [13] SECMOD_DestroyModuleListElement(element = 0x808fa20), line 825 in "pk11util.c"
  [14] SECMOD_DestroyModuleList(list = 0x808fa20), line 841 in "pk11util.c"
  [15] SECMOD_Shutdown(), line 99 in "pk11util.c"
  [16] NSS_Shutdown(), line 888 in "nssinit.c"
  [17] certutil_main(argc = 4, argv = 0x80467b4, initialize = 1), line 2921 in "certutil.c"
  [18] main(argc = 4, argv = 0x80467b4), line 2934 in "certutil.c"
Comment 4 Julien Pierre 2009-01-14 22:37:38 PST
The same stacks are in the loader for freebl.

t@1 (l@1) stopped in PR_Lock at line 205 in file "ptsynch.c"
  205       PR_ASSERT(lock != NULL);
(dbx) w
current thread: t@1
=>[1] PR_Lock(lock = 0x808fee8), line 205 in "ptsynch.c"
  [2] PR_GetEnv(var = 0xfe8ead40 "NSS_DISABLE_UNLOAD"), line 81 in "prenv.c"
  [3] BL_Unload(), line 924 in "loader.c"
  [4] nsc_CommonFinalize(pReserved = (nil), isFIPS = 0), line 2672 in "pkcs11.c"
  [5] NSC_Finalize(pReserved = (nil)), line 2733 in "pkcs11.c"
  [6] SECMOD_UnloadModule(mod = 0x8092e68), line 463 in "pk11load.c"
  [7] SECMOD_SlotDestroyModule(module = 0x8092e68, fromSlot = 1), line 808 in "pk11util.c"
  [8] PK11_DestroySlot(slot = 0x80bbc40), line 451 in "pk11slot.c"
  [9] PK11_FreeSlot(slot = 0x80bbc40), line 464 in "pk11slot.c"
  [10] SECMOD_DestroyModule(module = 0x8092e68), line 779 in "pk11util.c"
  [11] SECMOD_DestroyModuleListElement(element = 0x808fa20), line 825 in "pk11util.c"
  [12] SECMOD_DestroyModuleList(list = 0x808fa20), line 841 in "pk11util.c"
  [13] SECMOD_Shutdown(), line 99 in "pk11util.c"
  [14] NSS_Shutdown(), line 888 in "nssinit.c"
  [15] certutil_main(argc = 4, argv = 0x80467b4, initialize = 1), line 2921 in "certutil.c"
  [16] main(argc = 4, argv = 0x80467b4), line 2934 in "certutil.c"
(dbx) c
t@1 (l@1) stopped in PR_Lock at line 205 in file "ptsynch.c"
  205       PR_ASSERT(lock != NULL);
(dbx) w
current thread: t@1
=>[1] PR_Lock(lock = 0x809196c), line 205 in "ptsynch.c"
  [2] PR_EnterMonitor(mon = 0x8091968), line 531 in "ptsynch.c"
  [3] PR_UnloadLibrary(lib = 0x80903f0), line 1256 in "prlink.c"
  [4] BL_Unload(), line 926 in "loader.c"
  [5] nsc_CommonFinalize(pReserved = (nil), isFIPS = 0), line 2672 in "pkcs11.c"
  [6] NSC_Finalize(pReserved = (nil)), line 2733 in "pkcs11.c"
  [7] SECMOD_UnloadModule(mod = 0x8092e68), line 463 in "pk11load.c"
  [8] SECMOD_SlotDestroyModule(module = 0x8092e68, fromSlot = 1), line 808 in "pk11util.c"
  [9] PK11_DestroySlot(slot = 0x80bbc40), line 451 in "pk11slot.c"
  [10] PK11_FreeSlot(slot = 0x80bbc40), line 464 in "pk11slot.c"
  [11] SECMOD_DestroyModule(module = 0x8092e68), line 779 in "pk11util.c"
  [12] SECMOD_DestroyModuleListElement(element = 0x808fa20), line 825 in "pk11util.c"
  [13] SECMOD_DestroyModuleList(list = 0x808fa20), line 841 in "pk11util.c"
  [14] SECMOD_Shutdown(), line 99 in "pk11util.c"
  [15] NSS_Shutdown(), line 888 in "nssinit.c"
  [16] certutil_main(argc = 4, argv = 0x80467b4, initialize = 1), line 2921 in "certutil.c"
  [17] main(argc = 4, argv = 0x80467b4), line 2934 in "certutil.c"
(dbx) c
Comment 5 Julien Pierre 2009-01-15 01:34:04 PST
Created attachment 357117 [details] [diff] [review]
Work in progress

This patch attempts to skip over all lock acquisitions during C_Finalize and C_Initialize in softoken in the child process if we detect that the parent forked with the module initialized.

It does this mainly by extending the fork detection code and setting and testing global variables where needed. The macro SKIP_AFTER_FORK is used to skip over lock acquisitions and deletions.

There are many locks in other shared libraries than softokn3, such as libfreebl, libnssutil3, libnssdbm3, libsqlite3.

The patch deals with the locks in freebl and util.

For nssdbm3, I wasn't quite sure how to propagate the fork state. There seems to be two different interfaces that softoken uses into nssdbm - the "lg_" and "legacy_". I added a "legacy_SetForkState" entry point, but right now it is never being called yet. Hopefully Bob can tell me how this should best be done.

I also did not attempt to do anything with sqlite, but the problem exists there too - it has its own locks.

There are also two NSPR internal lock acquisitions that we trip over, as mentioned in previous comment, and this patch doesn't attempt to skip them.

Note that I did not do a full code review of all lock acquisitions and deletions. I only determined the right places to fix by using breakpoints in the debugger. The 2 programs I used were pk11mode and certutil -L. There may be some remaining cases of locks that aren't always acquired or deleted on finalize.

One such case was in freebl - there was a remaining late initialization of a lock for RSA blinding parameters. I changed that to be an earlier init. But there could be similar remaining cases.

One systematic approach to dealing with the problem once and for all would be to have a special macro to replace PZ_Lock, PZ_Unlock, and PZ_DestroyLock, that we would always call into in the softoken, and would check the fork state bit. This would affect code size, as well as performance slightly. But it would be more maintainable. With the current approach in the patch, one will need to add SKIP_AFTER_FORK around any new lock acquisitions or deletions that may occur in new code in softoken as part of C_Finalize in the future. This may or may not be maintainable.
Comment 6 Wan-Teh Chang 2009-01-15 13:55:01 PST
(In reply to comment #3)
> Here are two stacks in internal NSPR locks :
> 
> There is a possible race if the parent process holds of those locks in one
> thread, and then forks in another. This could be because one thread is in the
> middle of C_Finalize, or because the application is an NSPR application and is
> calling either PR_GetEnv or PR_UnloadLibrary. The two possible fixes include
> not calling these after a fork is detected in softoken, or making a change to
> NSPR to make it fork safe in those cases.

It's a lot of work to make NSPR fork-safe.  It'd
be easier for softoken to not call PR_GetEnv or
PR_UnloadLibrary after a fork is detected.

That requirement in PKCS #11 may have been written
before pthreads were commonplace.  The requirement
is difficult to implement for a pthread-based
PKCS #11 module.
Comment 7 Julien Pierre 2009-01-15 14:32:52 PST
Thanks, Wan-Teh. I am not really certain if we need to worry about this race or not. As far as PR_GetEnv, we could look up the value of the environment variable during initialization and cache it. For PR_UnloadLibrary we probably want to keep the call in there, though.
Comment 8 Julien Pierre 2009-01-15 14:45:03 PST
Created attachment 357238 [details] [diff] [review]
Working patch, except for sqlite
Comment 9 Nelson Bolyard (seldom reads bugmail) 2009-01-19 18:51:22 PST
Comment on attachment 357238 [details] [diff] [review]
Working patch, except for sqlite

I reviewed this patch.
I found it to be mostly OK, but I had a few concerns which I discussed with Julien last week.  The two biggies were:
a) this patch breaks blapitest for RSA, because it does not modify blapitest to initialize freebl.
2) Rather than add a new init function for every algorithm that needs early initialization, I propose to create a function BL_Init which calls any other new future init functions.  Note that I do not propose to include RNG_Init functions in that.
Comment 10 Robert Relyea 2009-01-22 18:04:39 PST
Comment on attachment 357238 [details] [diff] [review]
Working patch, except for sqlite

r-, though since this is an intermediate patch, this is probably to be expected.

Comments with weight (what really should be fixed).

1. The pk11mode.c code appears to add a doForkTests after an already existing doForkTests block, which expects exactly the opposite of what your test is supposed to do. I think the old block needs to be removed.

2. RSA_Init... I agree with Nelson, make it BL_Init (or some such thing).

3. new delcaration of nsc_init in pk11fips.c, but no use of that variable.

4. sftkdb_closeDB() you only set the state before the final close, not before the attempt to close the update database. It's possible in a partial merge case to have the update database open at this stage.

5. even though you haven't check the sqlite code, you should at least have an sdb_ interface for your forkState, or you will crash if the sdb code is used. (always, since you call the set state unconditionally.

6. You need to arrange for the state to be cleared after the database is closed.... probably just after we open it, or make sure the state is cleared in the 'basic' database code (lgopen and sdb_open... well the latter isn't a problem since it's already part of your API).

General comments.

I dislike the idea of the global. It seems most of the places where we need this is in cleanup code, which could take a 'isfork' parameter. That being said I can see where that may be tricky in some cases (shutdown called as a result of a free, so free needs the paramter, then free is called from many places, etc.) This is more of a 'for your consideration'. If the next patch still has a global I will not r- for that.

Given that you have a global, I like the fact you added the setting on the sdb_ structure. This means if we ever support external db modules, the code should continue to work.

Other than those comments this looks good.

bob
Comment 11 Robert Relyea 2009-01-22 18:07:13 PST
> the 'basic' database code (lgopen and sdb_open... well the latter isn't a
> problem since it's already part of your API).

That should have read 'since it's already part of softoken'. sdb_ is statically linked with softoken, so it has access to your forkcached variable. It still needs the api so you don't crash when setting it.
Comment 12 Julien Pierre 2009-01-22 18:54:11 PST
Bob,

Thanks for your review.
Re: comment 9,

1. actually the two consecutive fork tests in pk11mode are not the exact opposite.
The test depends on the argument of PKM_ForkCheck . The third argument tells the softoken whether to assert or not when an "illegal" fork is detected by softoken. The fourth argument is a CK_C_INITIALIZE_ARGS_NSS structure. When not passed, the test calls C_GetTokenInfo after fork in the child; when passed, it calls C_Initialize .

2. This will be in the next patch.

3. good catch. This went in before I moved the reset sequence to its own sftkForkReset function, which I just called from 4 different places. I didn't want to duplicate that code between NSC_Initialize, NSC_Finalize, FC_Initialize and FC_Finalize.

4. good catch. I will add another call before the update.

5. Did you mean that I should have a check for the sdb_SetForkState pointer to prevent the crash ? I will add that.

6. Right now I am resetting the fork state to PR_FALSE in lgglue.c . Will that ever get called in the sqlite case or not ? My guess is the later, and if not, then I will need to find another spot to do it.  However, right now the sqlite3 DLL doesn't have those entry points anyway ... That is my next step.

I know the global isn't ideal, but I think it's by far the simplest to implement. I think the patch would be much larger if I tried to pass the fork state around. And it really doesn't need to be passed most of the time. 
The stack size will also remain the same as it is now as a result of using the global.
Comment 13 Robert Relyea 2009-01-23 14:28:02 PST
> 1.
>

Ok, a comment is probably in order there then.


> 5. Did you mean that I should have a check for the sdb_SetForkState pointer to
> prevent the crash ? I will add that.

No, you should add an sdb_SetForkState handler to sdb.c. Any time you see sdb->sdb_XXXX, it can either be from the dbm loaded library (lg_) or from sdb.c (sdb_). In the future it may be even another externally loaded library (the reserved extern: keyword). When you change that structure, you need to update all occurances. Fortunately we do not have any external devices yet, so it's not a frozen interface. Now sdb.c is statically linked with the rest of softoken, so it doesn't have to do anything with that flag.



> 6. Right now I am resetting the fork state to PR_FALSE in lgglue.c. 

Oh, right. I think this is a bit problematic. We should either reset it explicitly in sftkdb_open() or in the shared library itself (either in lg_close or lg_open would be fine, lg_close probably makes the most sense). We shouldn't neeed to export it at all explicity from the library since it's part of the sdb_ structure. You should drop all your existing changes to lgglue.c (other than your comments on the PR_GetEnv and PR_UnloadLibrary).

Basically the structure should be: assume the legacy db isn't the only loaded provider of the sdb_ interface.

> Will that
> ever get called in the sqlite case or not ?

I don't know what 'that' refers to. lgglue?

> However, right now the sqlite3
> DLL doesn't have those entry points anyway ... That is my next step.

sdb.c is different from sqlite3. We should not patch sqlite3 unless you are willing to push the changes upstream. We import sqlite3 from the upstream producers, and we allow people to use their own compiled versions of sqlite3. Red Hat, for instance, ships sqlite3 with their system and we use the system version of the library. Mozilla also has their own version of sqlite3 (In fact we convinced them to take the upstream version so they didn't have to carry all of their changes).

So we should make sure we test shared DB with the fork code (and make sure all of our locks are dealt with), but we should deal with sqlite3 the same way we deal with NSPR.

bob



> I know the global isn't ideal, but I think it's by far the simplest to
> implement. I think the patch would be much larger if I tried to pass the fork
> state around. And it really doesn't need to be passed most of the time. 
> The stack size will also remain the same as it is now as a result of using the
> global.

I suspect not (bigger patch), but it would be more work tracking down all the stacks (which would take some time). In any case I just submitted it for your consideration (though I suspect you know that before you started).... 'nuf said, bigger fish;).

bob
Comment 14 Julien Pierre 2009-01-26 20:39:46 PST
Created attachment 358991 [details] [diff] [review]
updated patch, working with sqlite

This patch implements most of the review feedback, but not all of it.
It "works" with sqlite - meaning all.sh passes and there are no longer crashes in shared db tests, however the locks in sqlite3 are still unsafe, and I didn't make any attempt to skip over them and leak them yet.

Bob,

RE: comment 13, I added that handler to sdb.c . Right now, it's a no-op.
However, it seems that do its proper job, it would need to propagate the flag to other shared libraries, ie. to sqlite3 or any other external devices.

I am not sure what's the best place in the sdb interface to do that. I looked at sdb_close, but it's called many times from many places. Is there the equivalent concept of an sdb_finalize function ? That's where I would like to call sdb_setForkState(PR_FALSE) . But so far I haven't found the right place to do it. I haven't deleted the lgglue.c changes yet because of this.

Regarding your question, yes , "that" referred to the lgglue code.

As far as sqlite3, I would probably want to push the changes upstream  after I make them. I don't know if we want to treat its locks the same way as the NSPR locks - ie. ignore them altogether. These locks may be a bit more likely to be tripped over since they are used from many PKCS#11 calls. If the parent forks and has them in use, the locks could be acquired, so the child might deadlock still.
Comment 15 Robert Relyea 2009-01-27 13:21:55 PST
> RE: comment 13, I added that handler to sdb.c . Right now, it's a no-op.
> However, it seems that do its proper job, it would need to propagate the flag
> to other shared libraries, ie. to sqlite3 or any other external devices.

I think the stub should be fine. I've looked at sqlite3 and there are only 2 locks (both of them global). It probably should have it's own fork() handler to deal with the fork() issue. (which would require not interface changes, and make it safe for all apps that use threads and fork()).


> I am not sure what's the best place in the sdb interface to do that. I looked
> at sdb_close, but it's called many times from many places. Is there the
> equivalent concept of an sdb_finalize function ? That's where I would like to
> call sdb_setForkState(PR_FALSE) . But so far I haven't found the right place to
> do it. I haven't deleted the lgglue.c changes yet because of this.


->sdb_Close() Is only called in sftkdb_CloseDB, right after the sdb_setForkState(). It is the equivalent to Finalize as far as db connections are concerned, so it's the appropriate place to reset the 'setForkState'.
(BTW we are actually talking about lg_Close() in this case).

bob
Comment 16 Robert Relyea 2009-01-27 13:37:54 PST
Comment on attachment 358991 [details] [diff] [review]
updated patch, working with sqlite

r-, but it's really close.

The required changes are logically one change (removing the hack from lgglue.c).

1. lgglue.c - only keep the comments on PR_GetEnv and PR_UnloadLibrary. The rest should revert to the original.

2. nssdbm.def - revert to original.

3. lginit.c - 
 3a. legacy_SetForkState should be renamed to lg_SetForkState like the rest of the function in this file.
 3b. reset fork state at the end of lg_Close().

other than that it looks good to go.

bob
Comment 17 Julien Pierre 2009-01-27 16:10:08 PST
Bob,

I implemented your changes, but now I am tripping on a lock again in nsslowcert_LockFreeList, because nobody has called sdb_SetForkState yet at that point . That's why I had the changes in lgglue.c .
Here is the relevant stack :

  [1] PR_Lock(lock = 0x80b34a0), line 205 in "ptsynch.c"
  [2] nsslowcert_LockFreeList(), line 226 in "pcertdb.c"
  [3] DestroyCertEntryFreeList(), line 1065 in "pcertdb.c"
  [4] nsslowcert_DestroyFreeLists(), line 5269 in "pcertdb.c"
  [5] legacy_Shutdown(), line 666 in "lginit.c"
=>[6] sftkdbCall_Shutdown(), line 414 in "lgglue.c"
  [7] sftkdb_Shutdown(), line 2686 in "sftkdb.c"
  [8] nsc_CommonFinalize(pReserved = (nil), isFIPS = 0), line 2685 in "pkcs11.c"
  [9] sftkForkReset(pReserved = (nil), crv = 0x80462e8), line 2733 in "pkcs11.c"
  [10] NSC_Finalize(pReserved = (nil)), line 2757 in "pkcs11.c"

This stack happens after sftkdb_CloseDB is called (twice, since I'm closing both the FIPS and non-FIPS parts), and thus, after the fork state has been reset to PR_FALSE in lg_Close.

Is there perhaps another place where sdb_SetForkState needs to be called ? If not, I think I have to keep my changes in lgglue.c .
Comment 18 Robert Relyea 2009-01-27 18:13:10 PST
Clearly the end of legacy_Shutdown()....

this brings up an interesting question, though. what happens if you open and close a legacy database before you shutdown...

You probably should pass the forkstate to legacy_Shutdown() and let it set the global at it's beginning as well. It's a private function, so adding a paramenter should be OK.

bob
Comment 19 Julien Pierre 2009-01-28 17:36:01 PST
Created attachment 359424 [details] [diff] [review]
Updated patch

Bob,

I believe this patch does the right thing. I changed it to only reset the fork state in legacy_Shutdown, but not in lg_Close, so I think we should always be OK.
Comment 20 Robert Relyea 2009-01-29 10:26:27 PST
I'm OK with just setting the state to 'false' in legacy_Shutdown, and not in lg_Close, I think we have a corner case, however, that needs a solution.

Case: Application starts NSS, requesting an sql database. That database doesn't exist yet, but there is a legacy DB in the same directory. Automatic update starts, and the application is able to complete the update. At this point the legacy database has been opened and closed, and we have nssdbm3 loaded. Now the application forks and then inits. We will call legacy_Shutdown without ever having set the Shutdown state.

This can be solved 2 ways: keep track of the legacy databases that are openned and when the last one is closed, call legacy_Shutdown, or pass the forkState to legacy_Shutdown.

I think the latter is probably easier.

bob
Comment 21 Robert Relyea 2009-02-02 11:44:51 PST
Comment on attachment 359424 [details] [diff] [review]
Updated patch

r-

This patch is basically correct, but the one corner case should really be fixed.

bob
Comment 22 Julien Pierre 2009-02-02 16:58:32 PST
Created attachment 360199 [details] [diff] [review]
Take care of corner case

I couldn't recreate the corner case, but I believe this patch should take care of it theoritically. Bob, please review.
Comment 23 Robert Relyea 2009-02-02 18:03:10 PST
Comment on attachment 360199 [details] [diff] [review]
Take care of corner case

r+ That should do it.
Thanks Julien.

bob
Comment 24 Nelson Bolyard (seldom reads bugmail) 2009-02-02 18:37:27 PST
Comment on attachment 360199 [details] [diff] [review]
Take care of corner case

R+ after suggested changes.  All my comments are about code style issues.
I find no logic faults.  So, I won't give r-, but I do want 
these to be addressed before checkin.

The easy way to make these changes, btw, is to do a global 
search and replace on the patch file, then apply the new patch 
file to a fresh tree and make a new patch from that tree.

1. Lines too long.
This patch adds a number of lines that exceed 80 columns.  
The single most frequent cause of this is the length of the 
new variable named cachedParentForkedAfterC_Initialize.
Please remove the word "cached" from the beginning of that 
name.  It doesn't add anything to the meaning of this variable,
and removing it will solve most of the line length problems.

2. Some function names don't follow NSS conventions.
Util_SetForkState should be UTIL_SetForkState.
sftkForkReset should be SFTK_ForkReset, or sftk_ForkReset, 
since it's not static.

3. Comments in the wrong place?
+    disableUnload = PR_GetEnv("NSS_DISABLE_UNLOAD"); /* there is a lock in
+                                                      * PR_GetEnv */
     if (!disableUnload) {
-        PR_UnloadLibrary(legacy_glue_lib);
+        PR_UnloadLibrary(legacy_glue_lib); /* there is a lock in
+                                            * PR_UnloadLibrary */

While those comments are true, it's not clear why these facts 
are relevant to this place in the code.  When a future developer
reads those comments, his question will be: so what?
What do these comments tell a developer to do to this code?
What changes to these lines of code do those comments suggest?  
I think these comments suggest that changes to the implementations of 
those functions are desired, but is this file the place to suggest that?
If these comments belong in the code at all, perhaps they belong elsewhere.  
Maybe they're best left in the bug, but I don't have a specific suggestion
on what else to do with them.
Comment 25 Julien Pierre 2009-02-02 21:35:17 PST
Bob, Nelson,

Thanks for the reviews. I committed an updated patch .

Checking in cmd/bltest/blapitest.c;
/cvsroot/mozilla/security/nss/cmd/bltest/blapitest.c,v  <--  blapitest.c
new revision: 1.56; previous revision: 1.55
done
Checking in cmd/pk11mode/pk11mode.c;
/cvsroot/mozilla/security/nss/cmd/pk11mode/pk11mode.c,v  <--  pk11mode.c
new revision: 1.22; previous revision: 1.21
done
Checking in lib/freebl/blapi.h;
/cvsroot/mozilla/security/nss/lib/freebl/blapi.h,v  <--  blapi.h
new revision: 1.29; previous revision: 1.28
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/blapii.h,v
done
Checking in lib/freebl/blapii.h;
/cvsroot/mozilla/security/nss/lib/freebl/blapii.h,v  <--  blapii.h
initial revision: 1.1
done
Checking in lib/freebl/ldvector.c;
/cvsroot/mozilla/security/nss/lib/freebl/ldvector.c,v  <--  ldvector.c
new revision: 1.20; previous revision: 1.19
done
Checking in lib/freebl/loader.c;
/cvsroot/mozilla/security/nss/lib/freebl/loader.c,v  <--  loader.c
new revision: 1.41; previous revision: 1.40
done
Checking in lib/freebl/loader.h;
/cvsroot/mozilla/security/nss/lib/freebl/loader.h,v  <--  loader.h
new revision: 1.23; previous revision: 1.22
done
Checking in lib/freebl/prng_fips1861.c;
/cvsroot/mozilla/security/nss/lib/freebl/prng_fips1861.c,v  <--  prng_fips1861.c
new revision: 1.29; previous revision: 1.28
done
Checking in lib/freebl/rsa.c;
/cvsroot/mozilla/security/nss/lib/freebl/rsa.c,v  <--  rsa.c
new revision: 1.39; previous revision: 1.38
done
Checking in lib/softoken/fipstokn.c;
/cvsroot/mozilla/security/nss/lib/softoken/fipstokn.c,v  <--  fipstokn.c
new revision: 1.28; previous revision: 1.27
done
Checking in lib/softoken/lgglue.c;
/cvsroot/mozilla/security/nss/lib/softoken/lgglue.c,v  <--  lgglue.c
new revision: 1.11; previous revision: 1.10
done
Checking in lib/softoken/lgglue.h;
/cvsroot/mozilla/security/nss/lib/softoken/lgglue.h,v  <--  lgglue.h
new revision: 1.3; previous revision: 1.2
done
Checking in lib/softoken/pkcs11.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11.c,v  <--  pkcs11.c
new revision: 1.160; previous revision: 1.159
done
Checking in lib/softoken/pkcs11i.h;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11i.h,v  <--  pkcs11i.h
new revision: 1.50; previous revision: 1.49
done
Checking in lib/softoken/pkcs11u.c;
/cvsroot/mozilla/security/nss/lib/softoken/pkcs11u.c,v  <--  pkcs11u.c
new revision: 1.81; previous revision: 1.80
done
Checking in lib/softoken/sdb.c;
/cvsroot/mozilla/security/nss/lib/softoken/sdb.c,v  <--  sdb.c
new revision: 1.8; previous revision: 1.7
done
Checking in lib/softoken/sdb.h;
/cvsroot/mozilla/security/nss/lib/softoken/sdb.h,v  <--  sdb.h
new revision: 1.4; previous revision: 1.3
done
Checking in lib/softoken/sftkdb.c;
/cvsroot/mozilla/security/nss/lib/softoken/sftkdb.c,v  <--  sftkdb.c
new revision: 1.19; previous revision: 1.18
done
Checking in lib/softoken/sftkpwd.c;
/cvsroot/mozilla/security/nss/lib/softoken/sftkpwd.c,v  <--  sftkpwd.c
new revision: 1.6; previous revision: 1.5
done
Checking in lib/softoken/softoken.h;
/cvsroot/mozilla/security/nss/lib/softoken/softoken.h,v  <--  softoken.h
new revision: 1.22; previous revision: 1.21
done
Checking in lib/softoken/legacydb/keydb.c;
/cvsroot/mozilla/security/nss/lib/softoken/legacydb/keydb.c,v  <--  keydb.c
new revision: 1.11; previous revision: 1.10
done
Checking in lib/softoken/legacydb/lgdb.h;
/cvsroot/mozilla/security/nss/lib/softoken/legacydb/lgdb.h,v  <--  lgdb.h
new revision: 1.5; previous revision: 1.4
done
Checking in lib/softoken/legacydb/lginit.c;
/cvsroot/mozilla/security/nss/lib/softoken/legacydb/lginit.c,v  <--  lginit.c
new revision: 1.13; previous revision: 1.12
done
Checking in lib/softoken/legacydb/pcertdb.c;
/cvsroot/mozilla/security/nss/lib/softoken/legacydb/pcertdb.c,v  <--  pcertdb.c
new revision: 1.7; previous revision: 1.6
done
Checking in lib/util/nssutil.def;
/cvsroot/mozilla/security/nss/lib/util/nssutil.def,v  <--  nssutil.def
new revision: 1.8; previous revision: 1.7
done
Checking in lib/util/secoid.c;
/cvsroot/mozilla/security/nss/lib/util/secoid.c,v  <--  secoid.c
new revision: 1.46; previous revision: 1.45
done
Checking in lib/util/secoid.h;
/cvsroot/mozilla/security/nss/lib/util/secoid.h,v  <--  secoid.h
new revision: 1.11; previous revision: 1.10
done
Comment 26 Julien Pierre 2009-02-02 21:35:50 PST
Created attachment 360247 [details] [diff] [review]
Patch as checked in
Comment 27 Julien Pierre 2009-02-03 15:19:26 PST
Created attachment 360394 [details] [diff] [review]
Fix build breakage on Windows. As checked in.

Note You need to log in before you can comment on or make changes to this bug.