Closed Bug 222588 Opened 21 years ago Closed 21 years ago

Mozilla creates too many threads and seems to never terminate them

Categories

(Core :: Networking, defect)

x86
Windows NT
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla1.6alpha

People

(Reporter: rsokol, Assigned: darin.moz)

Details

(Keywords: memory-leak)

Attachments

(3 files)

User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)
Build Identifier: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.6a) Gecko/20031015

Mozilla creates a lot of threads (more than 50 is not uncommon)
and seems to never terminate them. I was able to generate 90 threads
in 30 minutes just by intensive browsing.

The problem exists in trunk builds (tried with Mozilla/5.0 (Windows;
U; WinNT4.0; en-US; rv:1.6a) Gecko/20031015) and is inexistant in
1.5 branch builds.

The problem is evident while using HTTP proxy (Squid, actually).
I didn't try to reproduce it with proxy disabled. It doesn't seem
to be connected with Java or Flash (i have Java disabled and the
pages that cause new threads to be created do not contain Flash
objects).


Reproducible: Always

Steps to Reproduce:
1. Start Mozilla
2. Start monitoring 'Thread count' in Windows Performance Monitor
3. Browse

Actual Results:  
Thread count grows endlessy while browsing (opening new pages) and stabilizes 
when the browser is idle displaying a page. I reached 90 threads in 30 minutes 
of intensive browsing, and over 200 (with earlier trunk build) while browsing 
with browser window open for several days.

Expected Results:  
Thread count should fluctuate, but not grow endlessy. Mozilla 1.5 tends to keep 
less than 20 threads open.
Attached a screenshot of Windows NT Performance Monitor
after 30 minutes of browsing with 2003-10-15 trunk build.
Thread count keeps growing, as is Handle count too (as
each thread needs at least one handle for itself).
-> me (most likely the result of recent necko threading changes)
Assignee: general → darin
Status: UNCONFIRMED → NEW
Component: Browser-General → Networking
Ever confirmed: true
Severity: normal → major
Status: NEW → ASSIGNED
Keywords: mlk
Target Milestone: --- → mozilla1.6alpha
Radoslaw: it would help me a great deal if you could try a trunk (pre-1.6a)
build dated anytime between 9/12 and 10/5.  thanks!
I just tried 2003-09-16, 2003-09-22 and 2003-10-03 trunk builds
(with same profile as the faulty 2003-10-15 build). They are all
unaffected by this bug (less than 15 threads open all the time,
open threads are being closed when they are no longer needed).
ok, i suspect then that it is somehow the threads created in nsIOThreadPool.cpp,
which are not closing.  but there's nothing obviously wrong in the code that
could be causing this... hmm...
I'm also seeing this with Mozilla Firebird: Mozilla/5.0 (Windows; U; Windows NT
5.1; en-US; rv:1.6a) Gecko/20031017 Firebird/0.7+ (aebrahim), with 20030714
build at one time I saw around 500 threads, I'll post a SS with result of my
current build.
Attached image example
please note I had it set to sample every 2 minutes and a space between two
vertical lines is around 4 samples, so firebird went from 12 to more than 100
threads in 45-50 minutes, during this time all I did was looking through many
bugs in bugzilla occasionally visting some other sites.
it seems that NSPR requires joinable threads to stick around until someone calls
PR_JoinThread.  that's a surprise to me.  i would have thought it could manage
to do so without requiring the thread itself to stick around.  i should ask wtc
about this.

at any rate, after inspecting the old thread pool code, i see that it was
creating unjoinable threads instead of joinable threads.  this is why the
problem never appeared before.  on shutdown the thread pool would wait for each
child thread to signal it via a condition variable that shutdown completed. 
it's unfortunate that that should be required.
Attached patch v1 patchSplinter Review
this patch is not fully tested yet, but it makes nsIOThreadPool create
unjoinable threads like the old nsThreadPool did.  it uses a similar algorithm
to implement synchronous shutdown.  of course, this solution is not 100% ideal
since the main thread may still run to completion before all background threads
return.  but, since this is what we used to do, i think it is reasonable
behavior here.	there shouldn't be any risk of a crash since we are not calling
PR_Cleanup on the main thread and each thread owns a reference back to the
nsIOThreadPool object.
Comment on attachment 133733 [details] [diff] [review]
v1 patch

ok, i've tested this patch, and i think it's what we want.  this bug should
probably be an alpha blocker.
Attachment #133733 - Flags: superreview?(bzbarsky)
Attachment #133733 - Flags: review?(dougt)
Flags: blocking1.6a?
Comment on attachment 133733 [details] [diff] [review]
v1 patch

> nsIOThreadPool::IsOnCurrentThread(PRBool *result)
> {
>     // fudging this a bit since we actually cover several threads...
>-    *result = (GetCurrentThreadIndex() != -1);
>+    *result = PR_FALSE;

I guess this will at worst cause some things to be indirected via a proxy,
right?	If so, this looks good.  sr=bzbarsky.
Attachment #133733 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 133733 [details] [diff] [review]
v1 patch

> +    NS_ASSERTION(mNumThreads == 0, "leaking thread");
should be leaking thread(s)

> +    else if (mNumThreads < MAX_THREADS) {
>          // try to create a new thread unless we have reached our maximum...
> comment doesn't match, right?

> +        NS_ADDREF_THIS(); // the thread owns a reference to us
> +        mNumThreads++;
> +        PRThread *thread = PR_CreateThread(PR_USER_THREAD,
> +        if (!thread) {
> +            mNumThreads--;
> +            rv = NS_ERROR_OUT_OF_MEMORY;
>          }
need to NS_RELEASE_THIS() for failure
Attachment #133733 - Flags: review?(dougt) → review+
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Flags: blocking1.6a?
Resolution: --- → FIXED
Blocks: 223191
You must join with a joinable thread using PR_JoinThread.
PR_JoinThread does two things: 1. synchronize with the
termination of the target thread, and 2. cause all the
resources used by the target thread to be released.

I looked at v1 patch and found one minor flaw in the code.

             if (PR_CLIST_IS_EMPTY(&pool->mEventQ) && !pool->mShutdown) {
                 pool->mNumIdleThreads++;
                 PR_WaitCondVar(pool->mIdleThreadCV, THREAD_IDLE_TIMEOUT);
                 pool->mNumIdleThreads--;
             }
 
             // if the queue is still empty, then kill this thread...
             if (PR_CLIST_IS_EMPTY(&pool->mEventQ))
                 break;

You assume that if the event queue is still empty, then either
pool->mShutdown is true or the thread idle timeout has expired.
This is wrong.  PR_WaitCondVar may return for no reason (called
a spurious wakeup) or another thread may beat this thread to
servicing the event on the queue so that by the time this thread
reaquires the lock, the queue is empty again.  If either of these
happens, this thread will exit prematurely.  I suspect that your
thread pool can recover from this automatically.

To defend against spurious wakeups and "another thread beating
you to the lock", you must always call PR_WaitCondVar in a while
loop.

To really be sure that the timeout has expired, you must call
PR_IntervalNow before and after the PR_WaitCondVar call and
calculate the elapsed time yourself.
wtc: thx for the comments...

i understand the issues with the thread exit code, but i would have expected the
thread library to be able to implement the "join" operation without requiring
the target thread to remain around.  i would expect that only a small chunk of
data representing the thread's exit state would be required.  but, at any rate,
i understand why that probably cannot be a portable assumption.  i'm happier
with the new code anyways.  i think the shutdown code is now much simpler
without having to concern itself with PR_JoinThread calls.  for one thing, i now
no longer need to keep around a list of active threads :-)

as for the spurious wakeups issue... you're right, the code is able to absorb a
spurious wakeup, but if they were to happen very frequently, then we might end
up creating threads more often then we expect.  i suppose it could be a
performance hit, so i will fix it.  thx!
i filed bug 223352 for this PR_WaitCondVar issue.
Darin wrote:
> i would have expected the thread library to be able to
> implement the "join" operation without requiring the
> target thread to remain around.  i would expect that only
> a small chunk of data representing the thread's exit state
> would be required.

Yes.  You also need the thread's ID so that you can refer
to it in the "join" operation, and that's why the thread
will show up in thread listing.

Spurious wakeups happen rarely.  Spurious wakeups are allowed
by the specs to give thread library implementors more flexibility
to get a high-performance implementation. But the while loop
is not just for defending against spurious wakeups.  It also
defends the "another thread beating it to the lock" issue.

Here is a typical code fragment.  I've expanded PR_WaitCondVar
to pseudo code that represents its semantics:

    PR_Lock(lock);  /* #1 */
    while (!condition) {
        PR_Unlock(lock); <----+-- atomically
        wait on condvar; <----|
        PR_Lock(lock);  /* #2 */
    }
    work on shared data;
    PR_Unlock(lock);

PR_WaitCondVar is equivalent to the three lines inside
the while loop, with the first two lines (unlock and wait)
done atomically.  The key is that PR_WaitCondVar needs
to reaquire the lock (line #2) before it returns.  Another
thread, entering from outside on line #1, may beat it to
the lock, see that the condition is true, work on shared
data, causing the condition to be false again.  So by the
time the awakened thread gets the lock on line #2, the
condition is false, and the while loop correctly causes
the thread to wait again.
No longer blocks: 223191
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: