Closed Bug 222588 · Opened 22 years ago · Closed 22 years ago

Mozilla creates too many threads and seems to never terminate them

Categories: Core :: Networking, defect
Status: RESOLVED FIXED
Target Milestone: mozilla1.6alpha
People: Reporter: rsokol, Assigned: darin.moz
Keywords: memory-leak
Attachments: 3 files
- 13.50 KB, image/gif
- 18.44 KB, image/png
- 7.80 KB, patch (timeless: review+, bzbarsky: superreview+)
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)
Build Identifier: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.6a) Gecko/20031015
Mozilla creates a lot of threads (more than 50 is not uncommon)
and never seems to terminate them. I was able to generate 90 threads
in 30 minutes just by intensive browsing.
The problem exists in trunk builds (tried with Mozilla/5.0 (Windows;
U; WinNT4.0; en-US; rv:1.6a) Gecko/20031015) and is absent in
1.5 branch builds.
The problem is evident while using an HTTP proxy (Squid, actually).
I didn't try to reproduce it with the proxy disabled. It doesn't seem
to be connected with Java or Flash (I have Java disabled, and the
pages that cause new threads to be created do not contain Flash
objects).
Reproducible: Always
Steps to Reproduce:
1. Start Mozilla
2. Start monitoring 'Thread count' in Windows Performance Monitor
3. Browse
Actual Results:
Thread count grows endlessly while browsing (opening new pages) and stabilizes
when the browser is idle displaying a page. I reached 90 threads in 30 minutes
of intensive browsing, and over 200 (with an earlier trunk build) while browsing
with the browser window open for several days.
Expected Results:
Thread count should fluctuate, but not grow endlessly. Mozilla 1.5 tends to keep
fewer than 20 threads open.
Reporter · Comment 1 · 22 years ago
Attached a screenshot of Windows NT Performance Monitor
after 30 minutes of browsing with the 2003-10-15 trunk build.
The thread count keeps growing, and so does the handle count
(each thread needs at least one handle for itself).
Assignee · Comment 2 · 22 years ago
-> me (most likely the result of recent necko threading changes)
Assignee: general → darin
Status: UNCONFIRMED → NEW
Component: Browser-General → Networking
Ever confirmed: true
Assignee · Updated · 22 years ago

Assignee · Comment 3 · 22 years ago
Radoslaw: it would help me a great deal if you could try a trunk (pre-1.6a)
build dated anytime between 9/12 and 10/5. thanks!
Reporter · Comment 4 · 22 years ago
I just tried 2003-09-16, 2003-09-22 and 2003-10-03 trunk builds
(with same profile as the faulty 2003-10-15 build). They are all
unaffected by this bug (less than 15 threads open all the time,
open threads are being closed when they are no longer needed).
Assignee · Comment 5 · 22 years ago
ok, i suspect then that it is somehow the threads created in nsIOThreadPool.cpp
that are not closing. but there's nothing obviously wrong in the code that
could be causing this... hmm...
Comment 6 · 22 years ago
I'm also seeing this with Mozilla Firebird: Mozilla/5.0 (Windows; U; Windows NT
5.1; en-US; rv:1.6a) Gecko/20031017 Firebird/0.7+ (aebrahim). With the 20030714
build I at one time saw around 500 threads. I'll post a screenshot with the
result of my current build.
Comment 7 · 22 years ago
Please note I had it set to sample every 2 minutes, and the space between two
vertical lines is around 4 samples, so Firebird went from 12 to more than 100
threads in 45-50 minutes. During this time all I did was look through many
bugs in Bugzilla, occasionally visiting some other sites.
Assignee · Comment 8 · 22 years ago
it seems that NSPR requires joinable threads to stick around until someone calls
PR_JoinThread. that's a surprise to me. i would have thought it could manage
to do so without requiring the thread itself to stick around. i should ask wtc
about this.
at any rate, after inspecting the old thread pool code, i see that it was
creating unjoinable threads instead of joinable threads. this is why the
problem never appeared before. on shutdown the thread pool would wait for each
child thread to signal it via a condition variable that shutdown completed.
it's unfortunate that that should be required.
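The scheme described above can be sketched in modern C++. This is a minimal hypothetical model, not the actual nsThreadPool/nsIOThreadPool code (which uses NSPR primitives such as PR_CreateThread and PR_WaitCondVar): workers are created detached ("unjoinable"), and shutdown blocks on a condition variable until the last worker has signaled that it is done, so no join is ever needed.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical sketch of a pool of detached ("unjoinable") threads with
// synchronous, condition-variable-based shutdown, as comment 8 describes
// the old nsThreadPool doing. Names are illustrative only.
class DetachedPool {
public:
    void spawn(int n) {
        std::lock_guard<std::mutex> g(mLock);
        for (int i = 0; i < n; ++i) {
            ++mNumThreads;
            std::thread([this] { workerLoop(); }).detach();  // unjoinable
        }
    }
    void shutdown() {
        std::unique_lock<std::mutex> g(mLock);
        mShutdown = true;
        mWakeup.notify_all();
        // synchronous shutdown: block until the last worker signals exit
        mExited.wait(g, [this] { return mNumThreads == 0; });
    }
    int numThreads() {
        std::lock_guard<std::mutex> g(mLock);
        return mNumThreads;
    }
private:
    void workerLoop() {
        std::unique_lock<std::mutex> g(mLock);
        while (!mShutdown)
            mWakeup.wait(g);       // a real pool would process events here
        --mNumThreads;
        mExited.notify_all();      // tell shutdown() this worker is gone
    }
    std::mutex mLock;
    std::condition_variable mWakeup, mExited;
    int mNumThreads = 0;
    bool mShutdown = false;
};
```

Because the exit signal travels through the pool's own condition variable rather than the thread library, the OS-level thread object never has to stick around waiting for a join, which is exactly what the bug's fix restores.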
Assignee · Comment 9 · 22 years ago
this patch is not fully tested yet, but it makes nsIOThreadPool create
unjoinable threads like the old nsThreadPool did. it uses a similar algorithm
to implement synchronous shutdown. of course, this solution is not 100% ideal
since the main thread may still run to completion before all background threads
return. but, since this is what we used to do, i think it is reasonable
behavior here. there shouldn't be any risk of a crash since we are not calling
PR_Cleanup on the main thread and each thread owns a reference back to the
nsIOThreadPool object.
Assignee · Comment 10 · 22 years ago
Comment on attachment 133733 [details] [diff] [review]
v1 patch
ok, i've tested this patch, and i think it's what we want. this bug should
probably be an alpha blocker.
Attachment #133733 - Flags: superreview?(bzbarsky)
Attachment #133733 - Flags: review?(dougt)
Assignee · Updated · 22 years ago
Flags: blocking1.6a?
Comment 11 · 22 years ago
Comment on attachment 133733 [details] [diff] [review]
v1 patch
> nsIOThreadPool::IsOnCurrentThread(PRBool *result)
> {
> // fudging this a bit since we actually cover several threads...
>- *result = (GetCurrentThreadIndex() != -1);
>+ *result = PR_FALSE;
I guess this will at worst cause some things to be indirected via a proxy,
right? If so, this looks good. sr=bzbarsky.
Attachment #133733 - Flags: superreview?(bzbarsky) → superreview+
Comment 12 · 22 years ago
Comment on attachment 133733 [details] [diff] [review]
v1 patch
> + NS_ASSERTION(mNumThreads == 0, "leaking thread");
should be leaking thread(s)
> + else if (mNumThreads < MAX_THREADS) {
> // try to create a new thread unless we have reached our maximum...
comment doesn't match, right?
> + NS_ADDREF_THIS(); // the thread owns a reference to us
> + mNumThreads++;
> + PRThread *thread = PR_CreateThread(PR_USER_THREAD,
> + if (!thread) {
> + mNumThreads--;
> + rv = NS_ERROR_OUT_OF_MEMORY;
> }
need to NS_RELEASE_THIS() for failure
Attachment #133733 - Flags: review?(dougt) → review+
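The review point about the failure path can be illustrated with a hypothetical mock. MockPool and its plain integer refs stand in for nsIOThreadPool and its XPCOM reference count; this is not the actual patch code, just the shape of the fix timeless asked for:

```cpp
// Hypothetical sketch: if thread creation fails after the reference was
// taken on behalf of the would-be thread, that reference must be dropped
// again, or the pool object leaks.
struct MockPool {
    int refs = 1;        // stands in for the XPCOM refcount
    int numThreads = 0;
    bool createThread(bool creationSucceeds) {
        ++refs;          // NS_ADDREF_THIS(): the thread owns a reference
        ++numThreads;
        if (!creationSucceeds) {   // PR_CreateThread returned null
            --numThreads;
            --refs;      // NS_RELEASE_THIS(): undo the reference (the fix)
            return false;          // NS_ERROR_OUT_OF_MEMORY in the patch
        }
        return true;
    }
};
```

Without the decrement on the failure path, every failed PR_CreateThread call would leave the pool's refcount one too high, so the pool (and everything it holds) would never be destroyed.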
Assignee · Comment 13 · 22 years ago
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Flags: blocking1.6a?
Resolution: --- → FIXED
Comment 14 · 22 years ago
You must join with a joinable thread using PR_JoinThread.
PR_JoinThread does two things: 1. synchronize with the
termination of the target thread, and 2. cause all the
resources used by the target thread to be released.
I looked at v1 patch and found one minor flaw in the code.
    if (PR_CLIST_IS_EMPTY(&pool->mEventQ) && !pool->mShutdown) {
        pool->mNumIdleThreads++;
        PR_WaitCondVar(pool->mIdleThreadCV, THREAD_IDLE_TIMEOUT);
        pool->mNumIdleThreads--;
    }
    // if the queue is still empty, then kill this thread...
    if (PR_CLIST_IS_EMPTY(&pool->mEventQ))
        break;
You assume that if the event queue is still empty, then either
pool->mShutdown is true or the thread idle timeout has expired.
This is wrong. PR_WaitCondVar may return for no reason (called
a spurious wakeup) or another thread may beat this thread to
servicing the event on the queue so that by the time this thread
reacquires the lock, the queue is empty again. If either of these
happens, this thread will exit prematurely. I suspect that your
thread pool can recover from this automatically.
To defend against spurious wakeups and "another thread beating
you to the lock", you must always call PR_WaitCondVar in a while
loop.
To really be sure that the timeout has expired, you must call
PR_IntervalNow before and after the PR_WaitCondVar call and
calculate the elapsed time yourself.
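This advice can be sketched with C++ primitives. The names (waitForEvent, idleTimeout) are illustrative and not from the patch: the timed wait runs inside a loop, and the deadline is computed once up front, so the remaining time is effectively recalculated on every wakeup rather than trusting a single timed wait.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>

// Hypothetical sketch of wtc's advice: always call the timed wait in a
// loop, and track elapsed time yourself, so a spurious wakeup (or another
// thread draining the queue first) cannot make the idle timeout appear
// expired when it has not.
template <typename T>
bool waitForEvent(std::queue<T>& eventQ, std::mutex& lock,
                  std::condition_variable& cv,
                  std::chrono::milliseconds idleTimeout, T* out) {
    std::unique_lock<std::mutex> g(lock);
    // fixed deadline == PR_IntervalNow() before the wait, plus the timeout
    auto deadline = std::chrono::steady_clock::now() + idleTimeout;
    while (eventQ.empty()) {
        // wait_until rechecks the clock each iteration, so a spurious
        // wakeup just loops back instead of ending the wait early
        if (cv.wait_until(g, deadline) == std::cv_status::timeout &&
            eventQ.empty())
            return false;  // idle timeout genuinely expired: exit thread
    }
    *out = eventQ.front();
    eventQ.pop();
    return true;
}
```

With NSPR's relative-timeout PR_WaitCondVar, the equivalent is to sample PR_IntervalNow before and after each wait and subtract, as the comment says; the C++ wait_until form bakes that bookkeeping into the deadline.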
Assignee · Comment 15 · 22 years ago
wtc: thx for the comments...
i understand the issues with the thread exit code, but i would have expected the
thread library to be able to implement the "join" operation without requiring
the target thread to remain around. i would expect that only a small chunk of
data representing the thread's exit state would be required. but, at any rate,
i understand why that probably cannot be a portable assumption. i'm happier
with the new code anyways. i think the shutdown code is now much simpler
without having to concern itself with PR_JoinThread calls. for one thing, i now
no longer need to keep around a list of active threads :-)
as for the spurious wakeups issue... you're right, the code is able to absorb a
spurious wakeup, but if they were to happen very frequently, then we might end
up creating threads more often than we expect. i suppose it could be a
performance hit, so i will fix it. thx!
Assignee · Comment 16 · 22 years ago
i filed bug 223352 for this PR_WaitCondVar issue.
Comment 17 · 22 years ago
Darin wrote:
> i would have expected the thread library to be able to
> implement the "join" operation without requiring the
> target thread to remain around. i would expect that only
> a small chunk of data representing the thread's exit state
> would be required.
Yes. You also need the thread's ID so that you can refer
to it in the "join" operation, and that's why the thread
will show up in the thread listing.
Spurious wakeups happen rarely. Spurious wakeups are allowed
by the specs to give thread library implementors more flexibility
to get a high-performance implementation. But the while loop
is not just for defending against spurious wakeups. It also
defends against the "another thread beating it to the lock" issue.
Here is a typical code fragment. I've expanded PR_WaitCondVar
to pseudo code that represents its semantics:
    PR_Lock(lock);          /* #1 */
    while (!condition) {
        PR_Unlock(lock);    <----+-- atomically
        wait on condvar;    <----|
        PR_Lock(lock);      /* #2 */
    }
    work on shared data;
    PR_Unlock(lock);
PR_WaitCondVar is equivalent to the three lines inside
the while loop, with the first two lines (unlock and wait)
done atomically. The key is that PR_WaitCondVar needs
to reacquire the lock (line #2) before it returns. Another
thread, entering from outside on line #1, may beat it to
the lock, see that the condition is true, work on shared
data, causing the condition to be false again. So by the
time the awakened thread gets the lock on line #2, the
condition is false, and the while loop correctly causes
the thread to wait again.
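In modern C++ the same pattern is captured by the predicate overload of condition_variable::wait, which is literally the while loop above. A minimal hypothetical example (SharedFlag is illustrative, not from any Mozilla code):

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical rendering of the pseudo-code above: the predicate overload
// of wait() is the "while (!condition) wait" loop, so both spurious
// wakeups and a competitor thread consuming the condition first are
// absorbed automatically.
struct SharedFlag {
    std::mutex lock;
    std::condition_variable cv;
    bool condition = false;

    void waitUntilTrue() {
        std::unique_lock<std::mutex> g(lock);       // PR_Lock (#1)
        cv.wait(g, [this] { return condition; });   // unlock+wait+relock loop
        // work on shared data while still holding the lock
        condition = false;                          // consume the condition
    }                                               // PR_Unlock on scope exit
    void setTrue() {
        { std::lock_guard<std::mutex> g(lock); condition = true; }
        cv.notify_one();
    }
};
```

If two threads call waitUntilTrue() and setTrue() fires once, whichever waiter reacquires the lock first consumes the condition; the loser's wait() re-evaluates the predicate, sees false, and correctly goes back to sleep, exactly as described above.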