Closed Bug 222588 Opened 22 years ago Closed 22 years ago

Mozilla creates too many threads and seems to never terminate them

Categories

Core :: Networking, defect
Platform: x86 Windows NT
Type: defect
Priority: Not set
Severity: major

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla1.6alpha

People

(Reporter: rsokol, Assigned: darin.moz)

Details

(Keywords: memory-leak)

Attachments

(3 files)

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)
Build Identifier: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.6a) Gecko/20031015

Mozilla creates a lot of threads (more than 50 is not uncommon) and seems to never terminate them. I was able to generate 90 threads in 30 minutes just by intensive browsing. The problem exists in trunk builds (tried with Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.6a) Gecko/20031015) and is nonexistent in 1.5 branch builds. The problem is evident while using an HTTP proxy (Squid, actually); I didn't try to reproduce it with the proxy disabled. It doesn't seem to be connected with Java or Flash (I have Java disabled, and the pages that cause new threads to be created do not contain Flash objects).

Reproducible: Always

Steps to Reproduce:
1. Start Mozilla
2. Start monitoring 'Thread count' in Windows Performance Monitor
3. Browse

Actual Results: Thread count grows endlessly while browsing (opening new pages) and stabilizes when the browser is idle displaying a page. I reached 90 threads in 30 minutes of intensive browsing, and over 200 (with an earlier trunk build) after keeping the browser window open for several days.

Expected Results: Thread count should fluctuate, but not grow endlessly. Mozilla 1.5 tends to keep fewer than 20 threads open.
Attached a screenshot of the Windows NT Performance Monitor after 30 minutes of browsing with the 2003-10-15 trunk build. The thread count keeps growing, as does the handle count (each thread needs at least one handle for itself).
-> me (most likely the result of recent necko threading changes)
Assignee: general → darin
Status: UNCONFIRMED → NEW
Component: Browser-General → Networking
Ever confirmed: true
Severity: normal → major
Status: NEW → ASSIGNED
Keywords: mlk
Target Milestone: --- → mozilla1.6alpha
Radoslaw: it would help me a great deal if you could try a trunk (pre-1.6a) build dated anytime between 9/12 and 10/5. thanks!
I just tried the 2003-09-16, 2003-09-22 and 2003-10-03 trunk builds (with the same profile as the faulty 2003-10-15 build). They are all unaffected by this bug (fewer than 15 threads open at all times, and open threads are closed when they are no longer needed).
ok, i suspect then that it is somehow the threads created in nsIOThreadPool.cpp, which are not closing. but there's nothing obviously wrong in the code that could be causing this... hmm...
I'm also seeing this with Mozilla Firebird: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6a) Gecko/20031017 Firebird/0.7+ (aebrahim). With a 20030714 build I at one time saw around 500 threads; I'll post a screenshot with the result of my current build.
Attached image example
please note I had it set to sample every 2 minutes and the space between two vertical lines is around 4 samples, so Firebird went from 12 to more than 100 threads in 45-50 minutes. During this time all I did was look through many bugs in Bugzilla, occasionally visiting some other sites.
it seems that NSPR requires joinable threads to stick around until someone calls PR_JoinThread. that's a surprise to me. i would have thought it could manage to do so without requiring the thread itself to stick around. i should ask wtc about this.

at any rate, after inspecting the old thread pool code, i see that it was creating unjoinable threads instead of joinable threads. this is why the problem never appeared before. on shutdown the thread pool would wait for each child thread to signal it via a condition variable that shutdown completed. it's unfortunate that that should be required.
Attached patch v1 patchSplinter Review
this patch is not fully tested yet, but it makes nsIOThreadPool create unjoinable threads like the old nsThreadPool did. it uses a similar algorithm to implement synchronous shutdown. of course, this solution is not 100% ideal since the main thread may still run to completion before all background threads return. but, since this is what we used to do, i think it is reasonable behavior here. there shouldn't be any risk of a crash since we are not calling PR_Cleanup on the main thread and each thread owns a reference back to the nsIOThreadPool object.
Comment on attachment 133733 [details] [diff] [review] v1 patch ok, i've tested this patch, and i think it's what we want. this bug should probably be an alpha blocker.
Attachment #133733 - Flags: superreview?(bzbarsky)
Attachment #133733 - Flags: review?(dougt)
Flags: blocking1.6a?
Comment on attachment 133733 [details] [diff] [review]
v1 patch

> nsIOThreadPool::IsOnCurrentThread(PRBool *result)
> {
>     // fudging this a bit since we actually cover several threads...
>-    *result = (GetCurrentThreadIndex() != -1);
>+    *result = PR_FALSE;

I guess this will at worst cause some things to be indirected via a proxy, right? If so, this looks good. sr=bzbarsky.
Attachment #133733 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 133733 [details] [diff] [review]
v1 patch

> +    NS_ASSERTION(mNumThreads == 0, "leaking thread");

should be "leaking thread(s)"

> +    else if (mNumThreads < MAX_THREADS) {
>         // try to create a new thread unless we have reached our maximum...

comment doesn't match, right?

> +        NS_ADDREF_THIS(); // the thread owns a reference to us
> +        mNumThreads++;
> +        PRThread *thread = PR_CreateThread(PR_USER_THREAD,
> +        if (!thread) {
> +            mNumThreads--;
> +            rv = NS_ERROR_OUT_OF_MEMORY;
>         }

need to NS_RELEASE_THIS() for failure
Attachment #133733 - Flags: review?(dougt) → review+
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Flags: blocking1.6a?
Resolution: --- → FIXED
Blocks: 223191
You must join with a joinable thread using PR_JoinThread. PR_JoinThread does two things: 1. synchronize with the termination of the target thread, and 2. cause all the resources used by the target thread to be released.

I looked at the v1 patch and found one minor flaw in the code:

    if (PR_CLIST_IS_EMPTY(&pool->mEventQ) && !pool->mShutdown) {
        pool->mNumIdleThreads++;
        PR_WaitCondVar(pool->mIdleThreadCV, THREAD_IDLE_TIMEOUT);
        pool->mNumIdleThreads--;
    }
    // if the queue is still empty, then kill this thread...
    if (PR_CLIST_IS_EMPTY(&pool->mEventQ))
        break;

You assume that if the event queue is still empty, then either pool->mShutdown is true or the thread idle timeout has expired. This is wrong. PR_WaitCondVar may return for no reason (called a spurious wakeup), or another thread may beat this thread to servicing the event on the queue, so that by the time this thread reacquires the lock, the queue is empty again. If either of these happens, this thread will exit prematurely. I suspect that your thread pool can recover from this automatically.

To defend against spurious wakeups and "another thread beating you to the lock", you must always call PR_WaitCondVar in a while loop. To really be sure that the timeout has expired, you must call PR_IntervalNow before and after the PR_WaitCondVar call and calculate the elapsed time yourself.
wtc: thx for the comments... i understand the issues with the thread exit code, but i would have expected the thread library to be able to implement the "join" operation without requiring the target thread to remain around. i would expect that only a small chunk of data representing the thread's exit state would be required. but, at any rate, i understand why that probably cannot be a portable assumption.

i'm happier with the new code anyways. i think the shutdown code is now much simpler without having to concern itself with PR_JoinThread calls. for one thing, i now no longer need to keep around a list of active threads :-)

as for the spurious wakeups issue... you're right, the code is able to absorb a spurious wakeup, but if they were to happen very frequently, then we might end up creating threads more often than we expect. i suppose it could be a performance hit, so i will fix it. thx!
i filed bug 223352 for this PR_WaitCondVar issue.
Darin wrote:
> i would have expected the thread library to be able to
> implement the "join" operation without requiring the
> target thread to remain around. i would expect that only
> a small chunk of data representing the thread's exit state
> would be required.

Yes. You also need the thread's ID so that you can refer to it in the "join" operation, and that's why the thread will show up in a thread listing.

Spurious wakeups happen rarely. They are allowed by the specs to give thread library implementors more flexibility to get a high-performance implementation. But the while loop is not just for defending against spurious wakeups; it also defends against the "another thread beating it to the lock" issue. Here is a typical code fragment, with PR_WaitCondVar expanded into pseudocode that represents its semantics:

    PR_Lock(lock);               /* #1 */
    while (!condition) {
        PR_Unlock(lock);    <----+-- atomically
        wait on condvar;    <----|
        PR_Lock(lock);           /* #2 */
    }
    work on shared data;
    PR_Unlock(lock);

PR_WaitCondVar is equivalent to the three lines inside the while loop, with the first two lines (unlock and wait) done atomically. The key is that PR_WaitCondVar needs to reacquire the lock (line #2) before it returns. Another thread, entering from outside on line #1, may beat it to the lock, see that the condition is true, work on the shared data, and cause the condition to be false again. So by the time the awakened thread gets the lock on line #2, the condition is false, and the while loop correctly causes the thread to wait again.
No longer blocks: 223191
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: