Last Comment Bug 451024 - certutil.exe crashes with Segmentation fault inside PR_Cleanup
: certutil.exe crashes with Segmentation fault inside PR_Cleanup
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Libraries (show other bugs)
: unspecified
: x86 Windows XP
: -- normal (vote)
: 3.12.2
Assigned To: Wan-Teh Chang
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-08-18 03:27 PDT by Maks Romih
Modified: 2008-08-25 15:48 PDT (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Proposed patch (1.45 KB, patch)
2008-08-22 21:07 PDT, Wan-Teh Chang
no flags Details | Diff | Splinter Review
Proposed patch v2 (1.01 KB, patch)
2008-08-23 22:24 PDT, Wan-Teh Chang
nelson: review+
Details | Diff | Splinter Review

Description Maks Romih 2008-08-18 03:27:06 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008081311 Minefield/3.0.1
Build Identifier: nss 3.12

May be I have a somewhat unusual build, but may be that is why the bug at least showed up. I've succeeded to build Firefox unofficially on Windows XP with gcc+mingw+buildtools. Firefox runs OK.

Then in the same build directories, inside mozilla/security/nss/cmd/certutil I've built certutil.exe with the command

  /bin/make -C $(pwd) MAKE="/bin/make -j1" -j1 CC="gcc -g -mno-cygwin" SOURCE_MD_DIR=d:/Maksr/Produkti/Install/Mozilla/mozilla/dist DIST=d:/Maksr/Produkti/Install/Mozilla/mozilla/dist NSPR_INCLUDE_DIR=d:/Maksr/Produkti/Install/Mozilla/mozilla/dist/include/nspr NSPR_LIB_DIR=d:/Maksr/Produkti/Install/Mozilla/mozilla/dist/lib MOZILLA_CLIENT=1 NO_MDUPDATE=1 NSS_ENABLE_ECC=1 BUILD_TREE=d:/Maksr/Produkti/Install/Mozilla/mozilla BUILD_OPT=1 OPT_CODE_SIZE=1 NS_USE_GCC=1 NS_USE_NATIVE= OS_TARGET=WIN95

I add the .../mozilla/dist/bin to my PATH.

I copy the three files, cert8.db, key3.db, secmod.db from my Firefox profile directory to a temporary working directory and run the command certuil -L -d .
It lists a few certs, then it crashes with a Dialog box to send an error report to Microsoft.

I've tried to run it with gdb and I've found out, that at the end of the certutil_main function in nss/cmd/certutil/certutil.c there is a call to NSS_Shutdown, which unloads among others the library nssckbi.dll, but after that the call to PR_Cleanup wants to clean some thread and calls a destructor at address 0x63c0b96c that is no longer in mermory, because it has been unloaded with nssckbi.

Here is an excerpt from my session with gdb:

Breakpoint 9, certutil_main (argc=0, argv=0x22fc98, initialize=1)
    at certutil.c:2894
2894        if ((initialized == PR_TRUE) && NSS_Shutdown() != SECSuccess) {
(gdb) info share
From        To          Syms Read   Shared Object Library
0x7c901000  0x7c9afe88  Yes         C:\WINNT\system32\ntdll.dll
0x7c801000  0x7c8f4bec  Yes         C:\WINNT\system32\kernel32.dll
0x77c11000  0x77c67d74  Yes         C:\WINNT\system32\msvcrt.dll
0x643c1000  0x644943ec  Yes         D:\Maksr\Produkti\Install\Mozilla\mozilla\di
st\bin\nss3.dll
0x64f41000  0x64f727b0  Yes         D:\Maksr\Produkti\Install\Mozilla\mozilla\di
st\bin\nspr4.dll
0x77dd1000  0x77e6ab38  Yes         C:\WINNT\system32\advapi32.dll
0x77e71000  0x77f01464  Yes         C:\WINNT\system32\rpcrt4.dll
0x77fe1000  0x77ff0884  Yes         C:\WINNT\system32\secur32.dll
0x76b41000  0x76b6c8b4  Yes         C:\WINNT\system32\winmm.dll
0x7e411000  0x7e49fde8  Yes         C:\WINNT\system32\user32.dll
0x77f11000  0x77f568a0  Yes         C:\WINNT\system32\gdi32.dll
0x71ad1000  0x71ad804c  Yes         C:\WINNT\system32\wsock32.dll
0x71ab1000  0x71ac6dc8  Yes         C:\WINNT\system32\ws2_32.dll
0x71aa1000  0x71aa7324  Yes         C:\WINNT\system32\ws2help.dll
0x622c1000  0x622c9110  Yes         D:\Maksr\Produkti\Install\Mozilla\mozilla\di
st\bin\plds4.dll
0x6ce41000  0x6ce49140  Yes         D:\Maksr\Produkti\Install\Mozilla\mozilla\di
st\bin\plc4.dll
0x68f81000  0x68f947cc  Yes         D:\Maksr\Produkti\Install\Mozilla\mozilla\di
st\bin\nssutil3.dll
0x6e1c1000  0x6e1e2718  Yes         D:\Maksr\Produkti\Install\Mozilla\mozilla\di
st\bin\smime3.dll
0x64a81000  0x64aafe08  Yes         D:\Maksr\Produkti\Install\Mozilla\mozilla\di
st\bin\softokn3.dll
0x61c01000  0x61c65f84  Yes         D:\Maksr\Produkti\Install\Mozilla\mozilla\di
st\bin\sqlite3.dll
0x679c1000  0x679e2914  Yes         D:\Maksr\Produkti\Install\Mozilla\mozilla\di
st\bin\nssdbm3.dll
0x696c1000  0x69707f04  Yes         D:\Maksr\Produkti\Install\Mozilla\mozilla\di
st\bin\freebl3.dll
0x63c01000  0x63c4c030  Yes         D:\Maksr\Produkti\Install\Mozilla\mozilla\di
st\bin\nssckbi.dll
(gdb) info symbol 0x63c0b96c
PR_Free in section .text
(gdb) next
2898        PR_Cleanup();
(gdb) info symbol 0x63c0b96c
No symbol matches 0x63c0b96c.
(gdb) next

Breakpoint 6, _PR_DestroyThreadPrivate (self=0x3d4790) at prtpd.c:245
245     {
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x63c0b96c in ?? ()
(gdb)
(gdb) backtrace
#0  0x63c0b96c in ?? ()
#1  0x64f484ed in _PR_DestroyThreadPrivate (self=0x3d4790) at prtpd.c:265
#2  0x64f57fe2 in _PR_CleanupThread (thread=0x3d4790) at prcthr.c:62
#3  0x64f4d8ee in PR_Cleanup () at prinit.c:424
#4  0x00406f5d in certutil_main (argc=0, argv=0x22fc98, initialize=1)
    at certutil.c:2898
#5  0x004095aa in main (argc=4, argv=0x3d2890) at certutil.c:2910
(gdb)

The most informing are two different answers to "info symbol 0x63c0b96c".

Maks Romih.


Reproducible: Sometimes

Steps to Reproduce:
1.
2.
3.
Comment 1 Nelson Bolyard (seldom reads bugmail) 2008-08-18 13:35:32 PDT
What version of NSPR are you using?  
Where/how did you get it?
Comment 2 Wan-Teh Chang 2008-08-18 15:39:19 PDT
Maks,

PR_Free is the NSPR thread private data destructor installed by
this call:
http://mxr.mozilla.org/security/source/security/nss/lib/base/error.c#103

100 static PRStatus
101 error_once_function ( void)
102 {
103   return PR_NewThreadPrivateIndex(&error_stack_index, PR_Free);
104 }

I need to know the version of NSS you're using.
1. In mozilla/client.mk, what's the value of the NSS_CO_TAG variable?
2. In mozilla/security/nss/lib/nss/nss.h, what's the value of the
   NSS_VERSION macro?
Comment 3 Maks Romih 2008-08-19 01:54:03 PDT
(In reply to comment #1)
> What version of NSPR are you using?  
> Where/how did you get it?
> 

I've got NSPR with the Firefox source tarball firefox-3.0.1-source.tar.bz2 that I downloaded on Jully 28th.

Properties/Version on my nspr4.dll shows 4.7.1.0.
Comment 4 Maks Romih 2008-08-19 02:21:40 PDT
> 
> PR_Free is the NSPR thread private data destructor installed by
> this call:
> http://mxr.mozilla.org/security/source/security/nss/lib/base/error.c#103
> 
> 100 static PRStatus
> 101 error_once_function ( void)
> 102 {
> 103   return PR_NewThreadPrivateIndex(&error_stack_index, PR_Free);
> 104 }
> 

Yes, exactly. This function in my case gets called twice. Problematic is second call which is made after loading nssckbi.dll. Here is the call stack:

#0  PR_NewThreadPrivateIndex (newIndex=0x63c43040, dtor=0x63c0b96c <PR_Free>) at prtpd.c:131
#1  0x63c0b35a in error_once_function () at error.c:99
#2  0x64f4df74 in PR_CallOnce (once=0x63c43050, func=0x63c0b348 <error_once_function>) at prinit.c:808
#3  0x63c0b37a in error_get_my_stack () at error.c:120
#4  0x63c43050 in error_stack_index () from D:\Maksr\Produkti\Install\Mozilla\mozilla\dist\bin\nssckbi.dll
#5  0x63c0b348 in nss_arena_hash_free_entry () from D:\Maksr\Produkti\Install\Mozilla\mozilla\dist\bin\nssckbi.dll
#6  0x63c0b453 in nss_ClearErrorStack () at error.c:277
#7  0x63c0ac09 in NSSArena_Create () at arena.c:385
#8  0x63c07a7c in nssCKFWInstance_Create (pInitArgs=0x6445b70a, LockingState=MultiThreaded, mdInstance=0x63c10080, pError=0x22f954) at instance.c:217
#9  0x63c01efa in NSSCKFWC_Initialize (pFwInstance=0x63c43020, mdInstance=0x63c10080, pInitArgs=0x6445b70a) at wrap.c:205
#10 0x63c01195 in builtinsC_Initialize (pInitArgs=0x6445b70a) at d:/Maksr/Produkti/Install/Mozilla/mozilla/dist/public/nss/nssck.api:117
#11 0x643d54f7 in secmod_ModuleInit (mod=0xb38b08, alreadyLoaded=0x22f9c8) at pk11load.c:147
#12 0x643d58bf in SECMOD_LoadPKCS11Module (mod=0xb38b08) at pk11load.c:379
#13 0x643e4930 in SECMOD_AddModule (newModule=0xb38b08) at pk11util.c:487
#14 0x643e49b8 in SECMOD_AddNewModuleEx (moduleName=0x6445a244 "Root Certs", dllPath=0xb389e8 "./nssckbi.dll", defaultMechanismFlags=0, cipherEnableFlags=0, modparms=0x0, nssparms=0x0) at pk11util.c:586
#15 0x643e4a81 in SECMOD_AddNewModule (moduleName=0x6445a244 "Root Certs", dllPath=0xb389e8 "./nssckbi.dll", defaultMechanismFlags=0, cipherEnableFlags=0) at pk11util.c:626
#16 0x643c1ae6 in nss_Init (configdir=0x421e90 ".", certPrefix=0x417141 "", keyPrefix=0x417141 "", secmodName=0x418ab3 "secmod.db", updateDir=0x6445a00c "", updCertPrefix=0x6445a00c "", updKeyPrefix=0x6445a00c "", updateID=0x6445a00c "", updateName=0x6445a00c "", readOnly=1, noCertDB=0, noModDB=0, forceOpen=0, noRootInit=0, optimizeSpace=0, noSingleThreadedModules=0, allowAlreadyInitializedModules=0, dontFinalizeModules=0) at nssinit.c:388
#17 0x643c1ca1 in NSS_Initialize (configdir=0x421e90 ".", certPrefix=0x417141 "", keyPrefix=0x417141 "", secmodName=0x418ab3 "secmod.db", flags=1) at nssinit.c:661
#18 0x00406d9c in certutil_main (argc=4288833, argv=0x1, initialize=1) at certutil.c:2324
#19 0x004095aa in main (argc=4, argv=0x3d2980) at certutil.c:2910

In my nssckbi.dll the PR_Free is imported, so the address, that gets transferred in the second parameter to PR_NewThreadPrivateIndex is from the segment of nssckbi.dll. This causes segfault afterwards, when the NSS_Shutdown unloads the nssckbi.dll before PR_Cleanup tries to destroy the thread.

I will investigate now if the MSVC build has the same problem, however I'm not so at home wih WinDbg as I'm with Gdb. Maybe import mechanism between DLLs in MSVC works differently than in GCC.

> I need to know the version of NSS you're using.
> 1. In mozilla/client.mk, what's the value of the NSS_CO_TAG variable?
> 2. In mozilla/security/nss/lib/nss/nss.h, what's the value of the
>    NSS_VERSION macro?
> 

NSS_CO_TAG           = FIREFOX_3_0_1_RELEASE
#define NSS_VERSION  "3.12.0.3" _NSS_ECC_STRING _NSS_CUSTOMIZED

Comment 5 timeless 2008-08-19 03:39:03 PDT
yeah, you're basically passing a thunk to the cleanup library. and then unloading the library. by the time you call pr_cleanup, the library containing the thunk is dead and the pointer isn't valid.

you can use nspr to lookup the real PR_Free function instead.
Comment 6 Wan-Teh Chang 2008-08-19 08:10:14 PDT
Maks,

Thanks for the NSS version info.  I was worried that you're
using a very old version of NSS (used in Firefox 1.0.x), whose
nssckbi.dll has its own stub of PR_Free:
http://mxr.mozilla.org/aviarybranch/source/security/nss/lib/ckfw/nsprstub.c#317

If you unload that old nssckbi.dll, it would make sense that
its PR_Free stub is also gone from the address space.

You are using the latest version of NSS, whose nssckbi.dll uses
the real PR_Free exported from nspr4.dll.  Except that in this
case it is a thunk in nssckbi.dll for PR_Free that is passed to
PR_NewThreadPrivateIndex.

The fix that timeless suggested in comment 5 should work, but is
ugly.

Can you modify GCC (or suggest to the GCC maintainers) so that
GCC uses the address of the real function rather than the thunk
when the address is passed to a function defined in another DLL?

We fixed a related memory leak (bug 431805) before.  I believe
the same fix applied to nssckbi.dll would fix this crash.  In
mozilla/security/nss/lib/ckfw/wrap.c, function NSSCKFWC_Finalize,
could you add this after the two NULL pointer checks?

  nss_DestroyErrorStack();

There are two other ways to fix this crash:

1. Make lib/base a shared library.  (We could just make it part
of nssutil3.dll.)  Right now lib/base is a static library.  So
anything that links with lib/base (nss3.dll and nssckbi.dll) has
its own copy of the NSS error stack, and each one needs to call
nss_DestroyErrorStack() before shutdown.

2. Remove the NSS error stack.  I don't think anyone is using it.
Comment 7 Nelson Bolyard (seldom reads bugmail) 2008-08-19 09:31:39 PDT
Is this a consequence of the fix for bug 337887, which added 
-mnop-fun-dllimport to the CFLAGS for GCC on win32?

That was a workaround for an unnecessary gcc incompatibility.
Maybe we can get the gcc folks to fix the cause of bug 337887 and
then back out that workaround?
Comment 8 Maks Romih 2008-08-20 06:37:13 PDT
(In reply to comment #6)

Wan-Teh, thank you for the solution!

> We fixed a related memory leak (bug 431805) before.  I believe
> the same fix applied to nssckbi.dll would fix this crash.  In
> mozilla/security/nss/lib/ckfw/wrap.c, function NSSCKFWC_Finalize,
> could you add this after the two NULL pointer checks?
> 
>   nss_DestroyErrorStack();
> 

This really helps! No more crash.

I've also verified with gdb that the destructor PR_Free is now called only once from _PR_DestroyThreadPrivate.

Maks.
Comment 9 Maks Romih 2008-08-21 05:05:41 PDT
(In reply to comment #8)
> (In reply to comment #6)
> 
> Wan-Teh, thank you for the solution!
> 
> > We fixed a related memory leak (bug 431805) before.  I believe
> > the same fix applied to nssckbi.dll would fix this crash.  In
> > mozilla/security/nss/lib/ckfw/wrap.c, function NSSCKFWC_Finalize,
> > could you add this after the two NULL pointer checks?
> > 
> >   nss_DestroyErrorStack();
> > 
> 
> This really helps! No more crash.
> 
I've tried certutil and it really works now. But there's still a problem with modutil, that still crashes in a similar manner. I think it would be better to put nss_DestroyErrorStack after the row

  error = nssCKFWInstance_Destroy(*pFwInstance);

because, when linked into modutil, this row creates a new error stack.

Maks.
Comment 10 Wan-Teh Chang 2008-08-21 11:02:39 PDT
Maks,

You're right.  The NSSArena_Destroy call at the end of
nssCKFWInstance_Destroy creates the error stack if it
doesn't exist.

We should put nss_DestroyErrorStack right before the
"return error;" statement at the end of NSSCKFWC_Finalize.

I am wondering why certutil doesn't crash when we put
nss_DestroyErrorStack before the line

  error = nssCKFWInstance_Destroy(*pFwInstance);

nssCKFWInstance_Destroy should also create a new error stack
when linked into certutil, right?  Can you find out what
happens in gdb?
Comment 11 Maks Romih 2008-08-22 02:10:39 PDT
(In reply to comment #10)
> 
> We should put nss_DestroyErrorStack right before the
> "return error;" statement at the end of NSSCKFWC_Finalize.
> 

Now I've put it there and both, certutil and modutil seem to work OK.

> I am wondering why certutil doesn't crash when we put
> nss_DestroyErrorStack before the line
> 
>   error = nssCKFWInstance_Destroy(*pFwInstance);
> 
> nssCKFWInstance_Destroy should also create a new error stack
> when linked into certutil, right?  Can you find out what
> happens in gdb?

Yeah, I was wondering also. The fact is in comment 8 and comment 9 I was mistaken. This bug was as elusive as dangling pointers can be (I marked it as "Reproducible: Sometimes" in comment 1).

When tracing with gdb I was misled, because in the function _PR_DestroyThreadPrivate the destructor really got called one time less than before, so I thought it was OK. But one call less was the result of my applying the patch from bug 431805, not the fix suggested by Wan-Teh.

Now, that I've put the nss_DestroyErrorStack just before return in NSSCKFWC_Finalize, the destructor PR_Free doesn't get called at all. Both error stacks are destroyed before PR_Cleanup, one from NSSCKFWC_Finalize (called indirectly from NSS_Shutdown) and one from NSS_Shutdown itself.

I hope this concludes the matter.

This was my first reported bug in this bugzilla and I'm really not handy enough to make an official patch. Wan-Teh, will you take care?

Maks.
Comment 12 Maks Romih 2008-08-22 03:48:55 PDT
(In reply to comment #11)

> NSSCKFWC_Finalize, the destructor PR_Free doesn't get called at all. Both 

PR_Free of course do get called, but I mean not from _PR_DestroyThreadPrivate when PR_Cleanup destroys the threads.

Maks.
Comment 13 Wan-Teh Chang 2008-08-22 21:07:51 PDT
Created attachment 335151 [details] [diff] [review]
Proposed patch

Maks, please test this patch, which is slightly different
from what you tested before.

Note that I call nss_DestroyErrorStack() right before
NSSCKFWC_Finalize returns, but only if NSSCKFWC_Finalize
returns successfully (CKR_OK).  The reason is that
people may want to examine the error stack if
NSSCKFWC_Finalize returns an error.

Of course, people may ignore the NSSCKFWC_Finalize
failure and unload the nssckbi.dll module anyway,
so this may not be a good idea.
Comment 14 Nelson Bolyard (seldom reads bugmail) 2008-08-23 20:12:13 PDT
Comment on attachment 335151 [details] [diff] [review]
Proposed patch

If I remove the very large comment from the patch, it boils down to:

>   switch( error ) {
>   PRInt32 remainingInstances;
>   case CKR_OK:
>     remainingInstances = PR_AtomicDecrement(&liveInstances);
>     if (!remainingInstances) {
> 	nssArena_Shutdown();
>     }
>+    nss_DestroyErrorStack();
>     break;
>   case CKR_CRYPTOKI_NOT_INITIALIZED:
>   case CKR_FUNCTION_FAILED:
>   case CKR_GENERAL_ERROR:
>   case CKR_HOST_MEMORY:
>     break;
>   default:
>     error = CKR_GENERAL_ERROR;
>     break;
>   }
> 
>   return error;

Seems like the crash will occur in the event that "error" is any value
other than CKR_OK.  No?
Comment 15 Wan-Teh Chang 2008-08-23 20:31:57 PDT
Right.  But that's my dilemma.  By definition, the
error stack is used to provide more information about
an error.  That's why I don't want to destroy the
error stack on error.

But I suspect that our users will ignore the return value
of NSSCKFWC_Finalize and unload the nssckbi.dll module
anyway.  So I'll be happy to move the nss_DestroyErrorStack()
call before the "return error;" statement.
Comment 16 Nelson Bolyard (seldom reads bugmail) 2008-08-23 20:47:06 PDT
Well, my first preference would be to find a fix that avoids/prevents 
passing an address from within nssckbi.dll to PR_NewThreadPrivateIndex.
Comment 17 Nelson Bolyard (seldom reads bugmail) 2008-08-23 20:55:03 PDT
Thinking out loud here.

Comment 16 wasn't meant to imply that I oppose the fix you proposed in 
comment 14.

Maybe the new line you're proposing to add should be ifdef'ed for MinGW too.
I think that's entirely appropriate.  

Or Maybe nssckbi should call NSPR's equivalent of dlsym to get the address 
of PR_Free, and then pass that address to PR_NewThreadPrivateIndex, but only
for these MinGW builds.  
Comment 18 Wan-Teh Chang 2008-08-23 21:13:29 PDT
The nss_DestroyErrorStack() call fixes the leak of the
primordial thread's error stack if PR_Cleanup is not
called, so it should not be ifdef'ed for MinGW.  (This
leak is similar to the leak of bug 431805, and is caused
by lib/base being a static library and being included
in nss3.dll and nssckbi.dll, resulting in two error
stacks per thread.)

I don't want to spend too much time on this MinGW bug.
I'd rather be working on more important NSS bugs.
Comment 19 Nelson Bolyard (seldom reads bugmail) 2008-08-23 21:32:10 PDT
Well, if you attach the patch that I think you were suggesting at the end 
of comment 15, I'm willing to r+ it.
Comment 20 Wan-Teh Chang 2008-08-23 22:24:19 PDT
Created attachment 335228 [details] [diff] [review]
Proposed patch v2

Thanks, Nelson.  Here is the patch that Maks tested.
Comment 21 Nelson Bolyard (seldom reads bugmail) 2008-08-23 23:00:27 PDT
Comment on attachment 335228 [details] [diff] [review]
Proposed patch v2

r=nelson
Comment 22 Wan-Teh Chang 2008-08-25 15:48:15 PDT
I checked in the patch on the NSS trunk (NSS 3.12.2).

Checking in wrap.c;
/cvsroot/mozilla/security/nss/lib/ckfw/wrap.c,v  <--  wrap.c
new revision: 1.17; previous revision: 1.16
done

Note You need to log in before you can comment on or make changes to this bug.