Closed
Bug 222588
Opened 21 years ago
Closed 21 years ago
Mozilla creates too many threads and seems to never terminate them
Categories
(Core :: Networking, defect)
Tracking
RESOLVED
FIXED
mozilla1.6alpha
People
(Reporter: rsokol, Assigned: darin.moz)
Details
(Keywords: memory-leak)
Attachments
(3 files)
13.50 KB, image/gif | Details
18.44 KB, image/png | Details
7.80 KB, patch | timeless: review+, bzbarsky: superreview+ | Details | Diff | Splinter Review
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)
Build Identifier: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.6a) Gecko/20031015

Mozilla creates a lot of threads (more than 50 is not uncommon) and seems to never terminate them. I was able to generate 90 threads in 30 minutes just by intensive browsing. The problem exists in trunk builds (tried with Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.6a) Gecko/20031015) and is nonexistent in 1.5 branch builds. The problem is evident while using an HTTP proxy (Squid, actually); I didn't try to reproduce it with the proxy disabled. It doesn't seem to be connected with Java or Flash (I have Java disabled, and the pages that cause new threads to be created do not contain Flash objects).

Reproducible: Always

Steps to Reproduce:
1. Start Mozilla
2. Start monitoring 'Thread count' in Windows Performance Monitor
3. Browse

Actual Results:
Thread count grows endlessly while browsing (opening new pages) and stabilizes when the browser is idle displaying a page. I reached 90 threads in 30 minutes of intensive browsing, and over 200 (with an earlier trunk build) while browsing with the browser window open for several days.

Expected Results:
Thread count should fluctuate, but not grow endlessly. Mozilla 1.5 tends to keep fewer than 20 threads open.
Reporter
Comment 1 • 21 years ago
Attached is a screenshot of Windows NT Performance Monitor after 30 minutes of browsing with the 2003-10-15 trunk build. The Thread count keeps growing, as does the Handle count (each thread needs at least one handle for itself).
Assignee
Comment 2 • 21 years ago
-> me (most likely the result of recent necko threading changes)
Assignee: general → darin
Status: UNCONFIRMED → NEW
Component: Browser-General → Networking
Ever confirmed: true
Assignee
Updated • 21 years ago
Assignee
Comment 3 • 21 years ago
Radoslaw: it would help me a great deal if you could try a trunk (pre-1.6a) build dated anytime between 9/12 and 10/5. thanks!
Reporter
Comment 4 • 21 years ago
I just tried the 2003-09-16, 2003-09-22 and 2003-10-03 trunk builds (with the same profile as the faulty 2003-10-15 build). They are all unaffected by this bug (fewer than 15 threads open at all times, and open threads are closed when they are no longer needed).
Assignee
Comment 5 • 21 years ago
ok, i suspect then that it is somehow the threads created in nsIOThreadPool.cpp, which are not closing. but there's nothing obviously wrong in the code that could be causing this... hmm...
Comment 6 • 21 years ago
I'm also seeing this with Mozilla Firebird: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6a) Gecko/20031017 Firebird/0.7+ (aebrahim). With the 20030714 build I at one time saw around 500 threads. I'll post a screenshot with the result of my current build.
Comment 7 • 21 years ago
Please note I had it set to sample every 2 minutes, and the space between two vertical lines is around 4 samples, so Firebird went from 12 to more than 100 threads in 45-50 minutes. During this time all I did was look through many bugs in Bugzilla, occasionally visiting some other sites.
Assignee
Comment 8 • 21 years ago
it seems that NSPR requires joinable threads to stick around until someone calls PR_JoinThread. that's a surprise to me. i would have thought it could manage to do so without requiring the thread itself to stick around. i should ask wtc about this. at any rate, after inspecting the old thread pool code, i see that it was creating unjoinable threads instead of joinable threads. this is why the problem never appeared before. on shutdown the thread pool would wait for each child thread to signal it via a condition variable that shutdown completed. it's unfortunate that that should be required.
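The joinable/unjoinable distinction darin describes can be illustrated with POSIX threads, whose joinable vs. detached states correspond to NSPR's PR_JOINABLE_THREAD and PR_UNJOINABLE_THREAD. This is a sketch, not the nsIOThreadPool code; all function names are illustrative:

```c
/* Sketch of joinable vs. "unjoinable" (detached) threads in POSIX
 * threads. NSPR's PR_JOINABLE_THREAD / PR_UNJOINABLE_THREAD map onto
 * these states. Illustrative only -- not the nsIOThreadPool code. */
#include <pthread.h>

static void *worker(void *arg)
{
    return arg;  /* exit value that a joiner can collect */
}

/* Joinable: the thread's ID and exit status must stick around until
 * someone calls pthread_join (the PR_JoinThread analogue), which both
 * synchronizes with termination and releases the thread's resources. */
static long run_joinable(void)
{
    pthread_t t;
    void *exit_value = NULL;
    pthread_create(&t, NULL, worker, (void *)42L);
    pthread_join(t, &exit_value);
    return (long)exit_value;
}

/* Detached: resources are reclaimed automatically when the thread
 * returns, so no exited thread lingers waiting to be joined -- but
 * nothing can synchronize with its termination, which is why the pool
 * needs a condition variable for shutdown. Returns 0 on success. */
static int run_detached(void)
{
    pthread_t t;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    int rc = pthread_create(&t, &attr, worker, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}
```

The symptom in this bug matches the joinable case: exited-but-unjoined threads keep showing up in the Performance Monitor thread count.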
Assignee
Comment 9 • 21 years ago
this patch is not fully tested yet, but it makes nsIOThreadPool create unjoinable threads like the old nsThreadPool did. it uses a similar algorithm to implement synchronous shutdown. of course, this solution is not 100% ideal since the main thread may still run to completion before all background threads return. but, since this is what we used to do, i think it is reasonable behavior here. there shouldn't be any risk of a crash since we are not calling PR_Cleanup on the main thread and each thread owns a reference back to the nsIOThreadPool object.
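The shutdown scheme sketched above (each unjoinable thread drops a live-thread count and signals a condition variable as it exits, and shutdown waits for the count to reach zero rather than joining) can be illustrated in POSIX threads. All names here are illustrative stand-ins, not code from the actual patch:

```c
/* Sketch of synchronous shutdown for a pool of detached threads, in the
 * spirit of the v1 patch: no PR_JoinThread, just a counter plus a
 * condition variable. Names (worker, spawn_worker, ...) are illustrative. */
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
static int num_threads = 0;   /* live workers, cf. mNumThreads */
static int shutting_down = 0; /* cf. mShutdown */

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!shutting_down)
        pthread_cond_wait(&cv, &lock);  /* a real pool would process events here */
    /* last act before exiting: drop our slot and wake the shutdown waiter */
    num_threads--;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void spawn_worker(void)
{
    pthread_t t;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    pthread_mutex_lock(&lock);
    num_threads++;                      /* count it before it starts */
    pthread_mutex_unlock(&lock);

    if (pthread_create(&t, &attr, worker, NULL) != 0) {
        pthread_mutex_lock(&lock);
        num_threads--;                  /* thread never started: undo */
        pthread_mutex_unlock(&lock);
    }
    pthread_attr_destroy(&attr);
}

/* Returns the number of threads still alive afterwards (0 on success). */
static int shutdown_pool(void)
{
    pthread_mutex_lock(&lock);
    shutting_down = 1;
    pthread_cond_broadcast(&cv);
    while (num_threads > 0)             /* wait for every worker to exit */
        pthread_cond_wait(&cv, &lock);
    int remaining = num_threads;
    pthread_mutex_unlock(&lock);
    return remaining;
}
```

Because no one joins, shutdown_pool can return while a worker is still executing its final few instructions after the broadcast, which matches the "main thread may run to completion before all background threads return" caveat above.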
Assignee
Comment 10 • 21 years ago
Comment on attachment 133733 [details] [diff] [review]
v1 patch

ok, i've tested this patch, and i think it's what we want. this bug should probably be an alpha blocker.
Attachment #133733 - Flags: superreview?(bzbarsky)
Attachment #133733 - Flags: review?(dougt)
Assignee
Updated • 21 years ago
Flags: blocking1.6a?
Comment 11 • 21 years ago
Comment on attachment 133733 [details] [diff] [review]
v1 patch

> nsIOThreadPool::IsOnCurrentThread(PRBool *result)
> {
>     // fudging this a bit since we actually cover several threads...
>-    *result = (GetCurrentThreadIndex() != -1);
>+    *result = PR_FALSE;

I guess this will at worst cause some things to be indirected via a proxy, right? If so, this looks good. sr=bzbarsky.
Attachment #133733 - Flags: superreview?(bzbarsky) → superreview+
Comment 12 • 21 years ago
Comment on attachment 133733 [details] [diff] [review]
v1 patch

> +    NS_ASSERTION(mNumThreads == 0, "leaking thread");

should be "leaking thread(s)"

> +    else if (mNumThreads < MAX_THREADS) {
>         // try to create a new thread unless we have reached our maximum...

comment doesn't match, right?

> +        NS_ADDREF_THIS(); // the thread owns a reference to us
> +        mNumThreads++;
> +        PRThread *thread = PR_CreateThread(PR_USER_THREAD,
> +        if (!thread) {
> +            mNumThreads--;
> +            rv = NS_ERROR_OUT_OF_MEMORY;
>         }

need to NS_RELEASE_THIS() for failure
Attachment #133733 - Flags: review?(dougt) → review+
Assignee
Comment 13 • 21 years ago
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Flags: blocking1.6a?
Resolution: --- → FIXED
Comment 14 • 21 years ago
You must join with a joinable thread using PR_JoinThread. PR_JoinThread does two things: 1. synchronize with the termination of the target thread, and 2. cause all the resources used by the target thread to be released.

I looked at the v1 patch and found one minor flaw in the code.

    if (PR_CLIST_IS_EMPTY(&pool->mEventQ) && !pool->mShutdown) {
        pool->mNumIdleThreads++;
        PR_WaitCondVar(pool->mIdleThreadCV, THREAD_IDLE_TIMEOUT);
        pool->mNumIdleThreads--;
    }
    // if the queue is still empty, then kill this thread...
    if (PR_CLIST_IS_EMPTY(&pool->mEventQ))
        break;

You assume that if the event queue is still empty, then either pool->mShutdown is true or the thread idle timeout has expired. This is wrong. PR_WaitCondVar may return for no reason (called a spurious wakeup), or another thread may beat this thread to servicing the event on the queue, so that by the time this thread reacquires the lock, the queue is empty again. If either of these happens, this thread will exit prematurely. I suspect that your thread pool can recover from this automatically.

To defend against spurious wakeups and "another thread beating you to the lock", you must always call PR_WaitCondVar in a while loop. To really be sure that the timeout has expired, you must call PR_IntervalNow before and after the PR_WaitCondVar call and calculate the elapsed time yourself.
Assignee
Comment 15 • 21 years ago
wtc: thx for the comments... i understand the issues with the thread exit code, but i would have expected the thread library to be able to implement the "join" operation without requiring the target thread to remain around. i would expect that only a small chunk of data representing the thread's exit state would be required. but, at any rate, i understand why that probably cannot be a portable assumption.

i'm happier with the new code anyways. i think the shutdown code is now much simpler without having to concern itself with PR_JoinThread calls. for one thing, i now no longer need to keep around a list of active threads :-)

as for the spurious wakeups issue... you're right, the code is able to absorb a spurious wakeup, but if they were to happen very frequently, then we might end up creating threads more often than we expect. i suppose it could be a performance hit, so i will fix it. thx!
Assignee
Comment 16 • 21 years ago
i filed bug 223352 for this PR_WaitCondVar issue.
Comment 17 • 21 years ago
Darin wrote:
> i would have expected the thread library to be able to
> implement the "join" operation without requiring the
> target thread to remain around. i would expect that only
> a small chunk of data representing the thread's exit state
> would be required.
Yes. You also need the thread's ID so that you can refer
to it in the "join" operation, and that's why the thread
will show up in thread listing.
Spurious wakeups happen rarely. Spurious wakeups are allowed
by the specs to give thread library implementors more flexibility
to get a high-performance implementation. But the while loop
is not just for defending against spurious wakeups. It also
defends the "another thread beating it to the lock" issue.
Here is a typical code fragment. I've expanded PR_WaitCondVar
to pseudo code that represents its semantics:
PR_Lock(lock); /* #1 */
while (!condition) {
PR_Unlock(lock); <----+-- atomically
wait on condvar; <----|
PR_Lock(lock); /* #2 */
}
work on shared data;
PR_Unlock(lock);
PR_WaitCondVar is equivalent to the three lines inside
the while loop, with the first two lines (unlock and wait)
done atomically. The key is that PR_WaitCondVar needs
to reacquire the lock (line #2) before it returns. Another
thread, entering from outside on line #1, may beat it to
the lock, see that the condition is true, work on shared
data, causing the condition to be false again. So by the
time the awakened thread gets the lock on line #2, the
condition is false, and the while loop correctly causes
the thread to wait again.