Continuation and TimerManager threads survive PR_Cleanup

Status: NEW
Type: defect
Opened: 15 years ago
Updated: 11 years ago

Reporter: julien.pierre
Assigned to: wtc
Platform: x86, Windows NT
Firefox Tracking Flags: Not tracked
Attachments: 3

On NT, NSPR starts a continuation thread in PR_Init().
After PR_Cleanup() returns, the continuation thread is still running.
This causes problems for applications that need to unload NSPR.
Component: Libraries → NSPR
Product: NSS → NSPR
Version: 3.9.2 → 4.5
Another thread is also started only on Windows NT: the TimerInit thread. It is
started as a local thread, so it does not show up as a separate native thread
unless NSPR_NATIVE_THREADS_ONLY is set. This timer thread also never terminates:
there is no code to end it, and the loop in TimerInit has no return or break
statement.
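
A loop with an exit condition, plus a cleanup routine that waits for the thread, might look like the following minimal sketch. It is not NSPR code and not a proposed patch: it uses POSIX threads purely for illustration, and the timer_manager type and the timer_manager_start / timer_manager_stop functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical timer manager; none of these names come from NSPR. */
typedef struct {
    pthread_t       thread;
    pthread_mutex_t lock;
    pthread_cond_t  wake;
    bool            shutting_down;  /* set by timer_manager_stop() */
    bool            exited;         /* set by the thread just before it returns */
} timer_manager;

static void *timer_loop(void *arg)
{
    timer_manager *tm = arg;

    pthread_mutex_lock(&tm->lock);
    /* The loop condition gives the thread a way out. */
    while (!tm->shutting_down) {
        /* A real timer thread would expire due timers here before sleeping. */
        pthread_cond_wait(&tm->wake, &tm->lock);
    }
    tm->exited = true;
    pthread_mutex_unlock(&tm->lock);
    return NULL;
}

/* Start the timer thread. Returns 0 on success, -1 on failure. */
int timer_manager_start(timer_manager *tm)
{
    assert(tm != NULL);
    tm->shutting_down = false;
    tm->exited = false;
    if (pthread_mutex_init(&tm->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&tm->wake, NULL) != 0) {
        pthread_mutex_destroy(&tm->lock);
        return -1;
    }
    if (pthread_create(&tm->thread, NULL, timer_loop, tm) != 0) {
        pthread_cond_destroy(&tm->wake);
        pthread_mutex_destroy(&tm->lock);
        return -1;
    }
    return 0;
}

/* Ask the thread to exit and wait for it, so that no thread is still
 * running library code once cleanup has returned.
 * Returns 0 on success, -1 on failure. */
int timer_manager_stop(timer_manager *tm)
{
    assert(tm != NULL);
    pthread_mutex_lock(&tm->lock);
    tm->shutting_down = true;
    pthread_cond_signal(&tm->wake);
    pthread_mutex_unlock(&tm->lock);

    if (pthread_join(tm->thread, NULL) != 0)
        return -1;
    pthread_cond_destroy(&tm->wake);
    pthread_mutex_destroy(&tm->lock);
    return 0;
}
```

A caller would pair timer_manager_start() in its initialization path with timer_manager_stop() in its cleanup path. The join is the important part: once it returns, the library can be unloaded without leaving a thread executing its code.
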
Note that this test program should not be linked with NSPR; it loads NSPR
dynamically at run time. This is to better reproduce the behavior of an
application we have, which is a plug-in to a third-party server program, which
initializes and cleans up NSPR itself. The plug-in is linked directly against
NSPR, but the server unloads and reloads the plug-in under various
circumstances, which causes the NSPR DLL to also be unloaded from the process.

If you let the program run, you will see that it leaks one native thread per
invocation of the "TestNSPR" function (two if you set NSPR_NATIVE_THREADS_ONLY
to 1). Eventually the program crashes, once the thread count reaches the 600
range, but the crash is never in the same place: sometimes in NT kernel code,
sometimes in NSPR assertions, sometimes in other NSPR code, sometimes in the
MSVCRT library. I think part of the problem is that the leaked threads are
sometimes running code that has been unloaded by FreeLibrary(), which obviously
causes problems. I don't understand how FreeLibrary() on NSPR actually succeeds
while the leaked threads are still running NSPR functions, but it does.
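
The load / initialize / clean up / unload cycle that the test case exercises can be sketched roughly as follows. This is not the attachment itself: the function name load_init_cleanup_unload, the result codes, and the LIB_* wrapper macros are hypothetical, and the numeric values passed to PR_Init are assumed to match PR_SYSTEM_THREAD and PR_PRIORITY_NORMAL in NSPR's headers, which a real program would include instead of hard-coding them.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef HMODULE lib_handle;
#define LIB_OPEN(path)   LoadLibraryA(path)
#define LIB_SYM(h, name) ((void *)GetProcAddress((h), (name)))
#define LIB_CLOSE(h)     FreeLibrary(h)
#else
#include <dlfcn.h>
typedef void *lib_handle;
#define LIB_OPEN(path)   dlopen((path), RTLD_NOW)
#define LIB_SYM(h, name) dlsym((h), (name))
#define LIB_CLOSE(h)     dlclose(h)
#endif

/* Hypothetical result codes for one cycle. */
enum {
    CYCLE_OK = 0,
    CYCLE_LOAD_FAILED = 1,
    CYCLE_SYMBOL_MISSING = 2,
    CYCLE_CLEANUP_FAILED = 3
};

/* Simplified stand-ins for the NSPR prototypes (assumption: the enum
 * arguments are passed as int, and PR_Cleanup returns 0 on success). */
typedef void (*pr_init_fn)(int type, int priority, unsigned int max_ptds);
typedef int  (*pr_cleanup_fn)(void);

/* Load the library, run PR_Init and PR_Cleanup once, then unload it. */
int load_init_cleanup_unload(const char *library_path)
{
    assert(library_path != NULL);

    lib_handle lib = LIB_OPEN(library_path);
    if (!lib)
        return CYCLE_LOAD_FAILED;

    pr_init_fn    init    = (pr_init_fn)LIB_SYM(lib, "PR_Init");
    pr_cleanup_fn cleanup = (pr_cleanup_fn)LIB_SYM(lib, "PR_Cleanup");

    int result = CYCLE_SYMBOL_MISSING;
    if (init && cleanup) {
        /* Assumed values: PR_SYSTEM_THREAD = 1, PR_PRIORITY_NORMAL = 1. */
        init(1, 1, 1);
        result = (cleanup() == 0) ? CYCLE_OK : CYCLE_CLEANUP_FAILED;
    }

    /* Any thread the library started and did not stop is now running
     * code that may no longer be mapped into the process. */
    LIB_CLOSE(lib);
    return result;
}
```

Calling load_init_cleanup_unload("libnspr4.dll") in a loop while watching the process's thread count is one way to observe the leak described above.
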

I have also tried running the same program with the Win95 version of NSPR
instead of the NT version. The only change needed is on the LoadLibrary line:
replace "libnspr4.dll" with "nspr4.dll". With the Win95 version, the program
does not leak threads, and the test appears to run fine. However, the task
manager shows that the memory usage of the test program grows very rapidly,
easily a few MB per second, which indicates that memory allocated in PR_Init
isn't freed in PR_Cleanup. I'm looking for a copy of and license for Purify to
investigate that issue, but the memory leak is less serious than the thread leak.


This test is just a simple loop of PR_Init / PR_Cleanup.
This test fails on all platforms with assertions. On Solaris, the stack is:

  [1] __lwp_kill(0x0, 0x6, 0x0, 0x7f73c000, 0x2, 0xff0000), at 0x7f71f69c
  [2] raise(0x6, 0x0, 0xffbfeb88, 0x7f73c000, 0x0, 0x0), at 0x7f6d0888
  [3] abort(0x5f, 0x7f93cc34, 0x7f93c3d0, 0x7f93c378, 0x7c, 0x2abd1), at 0x7f6b6ce0
  [4] PR_Assert(s = 0x7f93c3d0 "&_pr_faulty_methods == fd->methods", file = 0x7f93c378 "../../../../pr/src/io/prfdcach.c", ln = 124), line 538 in "prlog.c"
=>[5] _PR_Getfd(), line 124 in "prfdcach.c"
  [6] pt_SetMethods(osfd = 0, type = PR_DESC_FILE, isAcceptedSocket = 0, imported = 1), line 3307 in "ptio.c"
  [7] _PR_InitIO(), line 1158 in "ptio.c"
  [8] _PR_InitStuff(), line 238 in "prinit.c"
  [9] _PR_ImplicitInitialization(), line 258 in "prinit.c"
  [10] PR_Init(type = PR_SYSTEM_THREAD, priority = PR_PRIORITY_NORMAL, maxPTDs = 1U), line 309 in "prinit.c"
  [11] TestNSPR(), line 9 in "prtest2.c"
  [12] main(), line 20 in "prtest2.c"

The problem is that the cleanup functions clearly weren't implemented in a way
that allows subsequent reinitialization. Some structures get freed but not
zeroed.

Also, there is reliance on the initial values of global variables, but these
values aren't reset during cleanup. Several patches in various places will be
needed to solve this problem.
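
The pattern at issue can be illustrated with a minimal sketch. It is not NSPR code: the fd_cache structure and the module_init / module_cleanup functions are hypothetical, and they only show what "reset during cleanup" means for state that relies on its initial value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical module state; the names are illustrative only. */
typedef struct {
    int    *slots;
    size_t  count;
} fd_cache;

static fd_cache *cache = NULL;        /* code relies on this starting as NULL */
static bool      initialized = false; /* and on this starting as false */

/* Allocate the module state. Returns 0 on success, -1 on failure. */
int module_init(size_t count)
{
    if (initialized)
        return 0;

    cache = calloc(1, sizeof *cache);
    if (cache == NULL)
        return -1;
    cache->slots = calloc(count, sizeof *cache->slots);
    if (cache->slots == NULL) {
        free(cache);
        cache = NULL;
        return -1;
    }
    cache->count = count;
    initialized = true;
    return 0;
}

/* Free the module state and put every global back to its initial value,
 * so that a later module_init() starts from the same state as the first. */
void module_cleanup(void)
{
    if (!initialized)
        return;

    assert(cache != NULL);
    free(cache->slots);
    free(cache);
    cache = NULL;         /* freed and also reset */
    initialized = false;  /* otherwise the next init would return early */
}
```

If module_cleanup() freed the cache without resetting the two globals, a second module_init() would return immediately and leave the caller using a dangling pointer.
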

Fortunately, however, the first test case, which loads NSPR dynamically, is the
one we are concerned about, and the zeroing and resetting of global variables
doesn't need to be fixed for that case.
Summary: Continuation thread survives PR_Cleanup → Continuation and TimerManager threads survive PR_Cleanup
Note that the ISAPI server and plug-in test cases that I have attached to bug
254983 can also be used to reproduce this problem. Just set "restarts" to 1 or
greater to see the thread leak at each iteration.
Attachment #155862 - Attachment description: Test case demonstrating the thread leak → Test case demonstrating the thread leaks . Loads and unloads NSPR dynamically.
QA Contact: bishakhabanerjee → nspr
Attachment #155862 - Attachment is patch: false
Attachment #155871 - Attachment is patch: false
Comment on attachment 155991
reset some variables after cleanup to prevent assertions in second test (checked in)

r=wtc.
Attachment #155991 - Flags: review?(wtc) → review+
I checked in attachment 155991 to the trunk.

Checking in prfdcach.c;
/cvsroot/mozilla/nsprpub/pr/src/io/prfdcach.c,v  <--  prfdcach.c
new revision: 3.14; previous revision: 3.13
done

However, this does not fix the thread leak; it only fixes the crashes during reinitialization when the variables are accessed.
Duplicate of this bug: 307370
Attachment #155991 - Attachment description: reset some variables after cleanup to prevent assertions in second test → reset some variables after cleanup to prevent assertions in second test (checked in)