On NT, NSPR starts a continuation thread in PR_Init(). After PR_Cleanup() returns, the continuation thread is still running. This causes problems for applications that need to unload NSPR.
Component: Libraries → NSPR
Product: NSS → NSPR
Version: 3.9.2 → 4.5
There is another thread that is also started only on Windows NT: the TimerInit thread. It is started as a local thread, so it will not show up unless NSPR_NATIVE_THREADS_ONLY is set. This timer thread also never terminates: there is no code to end it, and the loop in TimerInit contains no return or break statement.
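To illustrate what an exit path for such a loop could look like, here is a minimal sketch of a worker loop that checks a shutdown flag on every iteration. This is not NSPR code: `timer_state`, `timer_loop`, `timer_request_shutdown` and `stop_after_three` are hypothetical names, and the real timer thread would block on its timer queue rather than call a tick callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state shared between the timer thread and the cleanup path. */
typedef struct {
    atomic_bool shutdown;   /* set by cleanup to ask the loop to exit */
    int ticks;              /* number of iterations run so far */
} timer_state;

typedef void (*tick_fn)(timer_state *);

void timer_state_init(timer_state *ts)
{
    assert(ts != NULL);
    atomic_init(&ts->shutdown, false);
    ts->ticks = 0;
}

/* Called from the cleanup path (hypothetical). */
void timer_request_shutdown(timer_state *ts)
{
    assert(ts != NULL);
    atomic_store(&ts->shutdown, true);
}

/* Thread body: unlike an unconditional loop, this one has an exit condition.
 * Returns the number of iterations that ran. */
int timer_loop(timer_state *ts, tick_fn on_tick)
{
    assert(ts != NULL && on_tick != NULL);
    while (!atomic_load(&ts->shutdown)) {
        ts->ticks++;
        on_tick(ts);        /* stand-in for "wait for and fire the next timer" */
    }
    return ts->ticks;
}

/* Example callback that requests shutdown once three ticks have run. */
void stop_after_three(timer_state *ts)
{
    if (ts->ticks >= 3)
        timer_request_shutdown(ts);
}
```

In a real fix, the cleanup routine would presumably set the flag, wake the thread if it is blocked, and join it before returning, so that no thread is still executing library code when the DLL is unloaded.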
Note that this test program should not be linked with NSPR; it loads NSPR directly. This is to better reproduce the behavior of an application we have, which is a plug-in to a third-party server program, and which initializes and cleans up NSPR itself. The plug-in is linked directly against NSPR, but the server unloads and reloads the plug-in under various circumstances, which causes the NSPR DLL to be unloaded from the process as well.

If you let the program run, you will see that it leaks one native thread per invocation of the "TestNSPR" function (two if you set NSPR_NATIVE_THREADS_ONLY to 1). Eventually the program crashes, when the thread count gets into the 600 range, but the crash is never in the same place: sometimes in NT kernel code, sometimes in NSPR assertions, sometimes in other NSPR code, sometimes in the MSVCRT library.

I think part of the problem is that the leaked threads are sometimes running code that has been unloaded by FreeLibrary(), and this obviously causes problems. I don't understand how FreeLibrary() on NSPR actually succeeds while the leaked threads are still running NSPR functions, but it does.

I have also tried running the same program with the Win95 version of NSPR instead of the NT version. The only change needed is on the LoadLibrary line: replace "libnspr4.dll" with "nspr4.dll". With the Win95 version, the program does not leak threads, and the test appears to run fine. However, the task manager shows that the memory usage of the test program grows very rapidly, easily a few MB per second, which indicates that memory allocated in PR_Init isn't freed in PR_Cleanup. I'm looking for a copy of and a license for Purify to investigate that issue, but the memory leak is less serious than the thread leak.
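The attached test case is not reproduced in this report. The following is only a sketch of what one load / init / cleanup / unload cycle of that kind might look like. It makes several assumptions: that PR_Init and PR_Cleanup are exported under those names with the default C calling convention, that the enum arguments can be passed as plain ints, and that the values 1 and 1 correspond to PR_SYSTEM_THREAD and PR_PRIORITY_NORMAL (the arguments visible in the Solaris stack elsewhere in this bug). `test_nspr_once` and the `LIB_*` macros are hypothetical names; the non-Windows branch exists only so that the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
typedef HMODULE lib_handle;
#define LIB_OPEN(name)   LoadLibraryA(name)
#define LIB_SYM(h, sym)  ((void *)GetProcAddress((h), (sym)))
#define LIB_CLOSE(h)     FreeLibrary(h)
#else
#include <dlfcn.h>
typedef void *lib_handle;
#define LIB_OPEN(name)   dlopen((name), RTLD_NOW)
#define LIB_SYM(h, sym)  dlsym((h), (sym))
#define LIB_CLOSE(h)     dlclose(h)
#endif

/* Assumed shapes of the two NSPR entry points, declared without NSPR headers. */
typedef void (*pr_init_fn)(int type, int priority, unsigned int maxPTDs);
typedef int  (*pr_cleanup_fn)(void);

/* One cycle: load the library, init, cleanup, unload.
 * Returns PR_Cleanup's status (0 is assumed to mean success),
 * or -1 if the library or either symbol could not be found. */
int test_nspr_once(const char *libname)
{
    assert(libname != NULL);

    lib_handle h = LIB_OPEN(libname);
    if (!h)
        return -1;

    pr_init_fn init = (pr_init_fn)LIB_SYM(h, "PR_Init");
    pr_cleanup_fn cleanup = (pr_cleanup_fn)LIB_SYM(h, "PR_Cleanup");

    int rc = -1;
    if (init && cleanup) {
        init(1, 1, 1);      /* assumed: PR_SYSTEM_THREAD, PR_PRIORITY_NORMAL, 1 */
        rc = cleanup();
    }

    /* Any thread the library started and did not stop is now running
     * code that is about to be unmapped. */
    LIB_CLOSE(h);
    return rc;
}
```

Calling `test_nspr_once("libnspr4.dll")` in a loop while watching the process's thread count in the task manager should show whether threads accumulate from one cycle to the next.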
This test is just a simple loop of PR_Init / PR_Cleanup. It fails on all platforms with assertions. On Solaris, the stack is:

  __lwp_kill(0x0, 0x6, 0x0, 0x7f73c000, 0x2, 0xff0000), at 0x7f71f69c
  raise(0x6, 0x0, 0xffbfeb88, 0x7f73c000, 0x0, 0x0), at 0x7f6d0888
  abort(0x5f, 0x7f93cc34, 0x7f93c3d0, 0x7f93c378, 0x7c, 0x2abd1), at 0x7f6b6ce0
  PR_Assert(s = 0x7f93c3d0 "&_pr_faulty_methods == fd->methods", file = 0x7f93c378 "../../../../pr/src/io/prfdcach.c", ln = 124), line 538 in "prlog.c"
=> _PR_Getfd(), line 124 in "prfdcach.c"
  pt_SetMethods(osfd = 0, type = PR_DESC_FILE, isAcceptedSocket = 0, imported = 1), line 3307 in "ptio.c"
  _PR_InitIO(), line 1158 in "ptio.c"
  _PR_InitStuff(), line 238 in "prinit.c"
  _PR_ImplicitInitialization(), line 258 in "prinit.c"
  PR_Init(type = PR_SYSTEM_THREAD, priority = PR_PRIORITY_NORMAL, maxPTDs = 1U), line 309 in "prinit.c"
  TestNSPR(), line 9 in "prtest2.c"
  main(), line 20 in "prtest2.c"

The problem is that the cleanup functions clearly weren't implemented in a way that allows subsequent reinitialization. Some structures get freed but not zeroed. Also, the code relies on the initial values of global variables, but these values aren't reset during cleanup. Several patches in various places will be needed to solve this problem.

Fortunately, the first test case, with dynamic loading of NSPR, is the one we are concerned about, and the zeroing and resetting of global variables doesn't need to be fixed for that case.
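The reinitialization failure described above can be illustrated with a small self-contained sketch. It is not NSPR code: `module_init`, `module_cleanup`, `cache` and `cache_count` are hypothetical stand-ins for a library whose init routine relies on the load-time values of its globals. The `reset_globals` parameter exists only to show the buggy and fixed cleanup side by side.

```c
#include <assert.h>
#include <stdlib.h>

#define CACHE_SLOTS 16

/* Hypothetical module state, standing in for a library's globals. */
static int *cache = NULL;
static int cache_count = 0;    /* init relies on this starting at 0 */

/* Returns 0 on success, -1 if the state is not pristine or allocation fails. */
int module_init(void)
{
    if (cache_count != 0)
        return -1;             /* plays the role of the failed assertion */
    cache = calloc(CACHE_SLOTS, sizeof *cache);
    if (cache == NULL)
        return -1;
    cache_count = CACHE_SLOTS;
    return 0;
}

/* reset_globals == 0 models "freed but not zeroed";
 * reset_globals != 0 restores the values init expects. */
void module_cleanup(int reset_globals)
{
    assert(cache_count != 0);  /* only valid after a successful init */
    free(cache);
    if (reset_globals) {
        cache = NULL;
        cache_count = 0;
    }
}
```

With the resetting cleanup, an init / cleanup / init sequence succeeds. With the non-resetting cleanup, the second init sees stale state and fails. Unloading and reloading the library sidesteps this, because the globals get their initial values again when the library is loaded.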
Summary: Continuation thread survives PR_Cleanup → Continuation and TimerManager threads survive PR_Cleanup
Note that the ISAPI server and plug-in test cases that I have attached to bug 254983 can also be used to reproduce this problem. Just set "restarts" to 1 or greater to see the thread leak at each iteration.
Attachment #155862 - Attachment description: Test case demonstrating the thread leak → Test case demonstrating the thread leaks. Loads and unloads NSPR dynamically.
Attachment #155862 - Attachment is patch: false
Attachment #155871 - Attachment is patch: false
Attachment #155991 - Flags: review?(wtc)
Comment on attachment 155991: reset some variables after cleanup to prevent assertions in second test (checked in)
r=wtc.
Attachment #155991 - Flags: review?(wtc) → review+
I checked in attachment 155991 to the trunk.

Checking in prfdcach.c;
/cvsroot/mozilla/nsprpub/pr/src/io/prfdcach.c,v  <--  prfdcach.c
new revision: 3.14; previous revision: 3.13
done

This does not fix the thread leaks, however; it only fixes the crashes during reinitialization when the variables are accessed.
Attachment #155991 - Attachment description: reset some variables after cleanup to prevent assertions in second test → reset some variables after cleanup to prevent assertions in second test (checked in)