Closed Bug 36750 Opened 24 years ago Closed 24 years ago

Fix nsThreadPool to minimize threads

Categories

(Core :: XPCOM, defect, P3)

All
Windows NT
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: warrensomebody, Assigned: dougt)

Details

(Keywords: embed, memory-footprint, Whiteboard: [nsbeta3-] fixed on trunk)

Attachments

(6 files)

At the last minute, I had to comment out the code to allow threads to terminate 
from the thread pool code because they still hung around waiting to be joined 
with. This effectively leaked them.

Here's a message from Wan-Teh about how we could possibly fix this:

Subject: 
             Re: clearing PR_JOINABLE_THREAD
       Date: 
             Fri, 21 Apr 2000 09:05:53 -0700
       From: 
             wtc@netscape.com (Wan-Teh Chang)
          To: 
             Warren Harris <warren@netscape.com>
         CC: 
             Doug Turner <dougt@netscape.com>
 References: 
             1




Warren Harris wrote:
> 
> Hi Wan-Teh,
> 
> I've got a problem with nsThread.cpp that I'm hoping you can help me
> solve. Doug and I have changed the implementation of nsThreadPool to
> do a little better job managing the threads in the pool, letting the
> count of threads grow up to a max, and shrink back down to a minimum
> number when not needed.

That's good!

> However, the problem comes in because I've
> made the threads be joinable so that we can synchronize with them when
> the thread pool gets destroyed. However, for the threads that
> terminate themselves (because they're no longer needed), there's no
> way that I can tell to clear the joinable bit. This causes them to not
> actually go away, but hang out waiting to be joined with even though
> the thread pool has now forgotten about them. Is there some way to
> clear the joinable bit?

No.  Once an NSPR thread is create joinable or unjoinable,
there is no way to change that attribute.  I'm not sure
whether that's an oversight or a conscious design decision.

Pthreads has the operation you want: pthread_detach.

It's not hard to work around this limitation.  You'd create
all your threads PR_UNJOINABLE_THREAD.  Then you use a
thread count and a condition variable to synchronize with
them when the thread pool gets destroyed, like this:

void WorkerThread(void *arg)
{
    ....
    // need to exit, either voluntarily or because the
    // thread pool is being destroyed.
    PR_Lock(thread_pool_lock);
    if (--thread_count == 0) {
        PR_NotifyCondVar(all_threads_exited_cv);
    }
    PR_Unlock(thread_pool_lock);
}

void TheThreadThatDestroysTheThreadPool(void *arg)
{
    ...
    // signals all the threads in the pool to exit
    ...
    // synchronize with them
    PR_Lock(thread_pool_lock);  // you might already be holding this
lock when you signal the threads to exit
    while (thread_count) {
        PR_WaitCondVar(all_threads_exited_cv, PR_INTERVAL_NO_TIMEOUT);
    }
    PR_Unlock(thread_pool_lock);
    ....
}

I will take a look at your new thread pool code tonight.

Wan-Teh



Maybe Doug would be good enough to take this task on again and help me solve 
this problem. 

(BTW Doug, If you look for the #if 0 in nsThread.cpp you'll see how I was 
attempting to let threads go away. I don't think there was any need for the 
initial Wait with timeout that you had before -- if there are too many 
already, just let them go.)
Target Milestone: --- → M18
Doug: Here's the thread-pool bug. 
looking at this.
Assignee: warren → dougt
Keywords: embed, footprint, nsbeta3
Whiteboard: nsbeta3+
Whiteboard: nsbeta3+ → [nsbeta3+]
wtc, what are the chances that we add a detach function to nspr?
I don't have time to do that.  Sorry.
minusing per PDT rules.
Whiteboard: [nsbeta3+] → [nsbeta3-]
I have almost fixed this in my tree.  It will reduce running footprint by
reducing thread count by 75 percent (eg there are usually 4 running file
transport thread all the time.  I will reduce this to one)  If you startup FTP,
you will get another four.  This should be fixed, if not for seamonkey, for
mozilla and other clients.
Whiteboard: [nsbeta3-] → [nsbeta3+]
Attached patch Proposed fix - 1Splinter Review
Doug: Brendan's comments not withstanding, we're going to have to get together 
to talk through this change like we did before. I want us both to convince 
ourselves that this fix really works (and I'm going to need your help). How does 
Thurs look?
Warren, I haven't given a= yet, and I'd like to see a new patch attached based 
on my comments, but I think dougt is heading in the right direction, if not 
already there.

/be
Doug: I'll try your patch tonight. However, GetThreadPrivate/SetThreadPrivate 
still bug me. I don't think they're methods that should go on the interface 
class since there's no way for 2 clients to coordinate who owns the thread 
private data. Nspr has a concept that TLS has a key that's used for this. I 
think these methods should go on the implementation class.
Sorry, I meant GetPrivateData/SetPrivateData. And I don't even see where these 
get used anywhere.
Subject: 
             Re: thread pool diff
       Date: 
             Tue, 26 Sep 2000 00:53:11 -0700
       From: 
             Warren Harris <warren@netscape.com>
          To: 
             dougt@netscape.com
         CC: 
             valeski@netscape.com
 References: 
             1




Ok, I reviewed this tonight some more, and I don't like it. A lot of the 
synchronization looks wrong to me (or removed). For
instance, there's no synchronization when the thread starts up with the thread 
putting the event in the queue. I found this to be
important in the past. Also, there are numerous places where you let go of the 
lock to do something, but don't check invariants
when you re-enter. I'm not convinced this is safe. 

Let's try to get together again to talk this through. Tomorrow afternoon? (BTW, 
I'll post a slightly cleaned-up diff). 

Warren 
  
per email with Jud, changing nsbeta3+ to nsbeta3- on all "embed" keyword bugs
since embedding changes will not be made in the mn6 branch. If you feel this bug
fix needs to go into mn6 branch, please list the reasons/user impact/risk and
nominate for rtm. Thanks.
QA Contact: leger → lchiang
Whiteboard: [nsbeta3+] → [nsbeta-]
I really think this should go into b3. Restoring +
Whiteboard: [nsbeta-] → [nsbeta3+]
Ok, I'll send this to pdt mailing list for review..  PDT hasn't been looking at 
bugs with embed in the keyword since we've been leaving that up to embed 
team/Judson.
Warren, why do you think this should go into nsbeta3 rather than rtm?  How are
users affected either way, and also, what is the risk involved?
Here's why I want to fix this: As it stands, we have a parameter that controls 
the min and max number of threads that deal with file i/o. But this is broken, 
and we always hang out with the max number of threads (currently 4). This means 
that usually there are 4 threads sitting idle, taking up memory and holding 
references to objects (e.g. file transport objects and their associated urls, 
etc.). What's worse is that if the user goes to download a file, a file fetcher 
thread gets employed, and if the user downloads more than 4 things, we 
stall because we don't want to allow more than 4 threads.

If we can fix this bug, we'd be able to set the min threads down to 1, so that 
usually there was only one file fetcher hanging around, and set the max to a 
large number so that the user effectively never runs into the threshold while 
downloading files.
So the worst scenario is stalling when trying to download more than 4 files
concurrently?  That doesn't sound like a beta-stopper to me.
I think it's pretty important to fix.
Yes, you've said that. PDT is trying to understand why you it should go into the
branch rather than just rtm, which at this point is happening to very few bugs.
Reading the discussion about the fix, I think it is too late and too risky for
PR3 (hopefully finalized on Thursday of this week).
We need a clear unedrstanding of what the restriction is (no more than 4
downloads at once would be a big restriction IMO), or the size of a leak that
might be induced by the thread never being destroyed.  That info would help us
weigh the risk (when you think you really have this nailed) against the benefit,
to try to make the RTM call. For now, I've nominated for rtm, but taken it off
the PR3 radar (minus for nsbeta3).
Keywords: rtm
Whiteboard: [nsbeta3+] → [nsbeta3-]
Attached patch Reviewed PatchSplinter Review
the last attachment is unified diff that wtc warren and I reviewed.
I attached some slightly cleaned up diffs. I've run with them and they work 
great. I even upped the file transport count to 16 (not included in the diff) 
and found that on startup, we can have up to 14 threads running in parallel. 
They promptly drop back down, grow to 7, drop again -- pretty good. I don't know 
if we want to allow 14 threads at once -- I'll have to do some timing analysis 
to see if it's a net win. For file transfers (and ftp downloads), I had other 
problems that prevented me from determining whether this helped that or not. See 
bug 54648 for the ftp problems.
Okay.  I will check these into the TRUNK today given that warren has review and
brendan as approved.  Warren, you are going to have to fight pdt to get this one
into the branch.
Fix checked in:

Checking in nsThread.cpp;
/cvsroot/mozilla/xpcom/threads/nsThread.cpp,v  <--  nsThread.cpp
new revision: 1.27; previous revision: 1.26
done
Checking in nsThread.h;
/cvsroot/mozilla/xpcom/threads/nsThread.h,v  <--  nsThread.h
new revision: 1.16; previous revision: 1.15
done
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
If you want this on PDT's radar to review for the branch, I'm going to reopen
this and the module/area owner needs to mark rtm+.  This way, PDT will review
and this bug doesn't get lost.

I'll add FIXED IN TRUNK to the status whiteboard.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [nsbeta3-] → [nsbeta3-] fixed on trunk
Just out of curiousity, are we going to be introducing any negative performance
ramifications for the Mac with these changes? I know the simulated threads on
the Mac are pretty expensive right? It sounds like we used to have a max of 4
running threads (at least on startup) and after reading Warren's comments I
thought he said we could exceed that number. Or do the diffs still place a limit
on the total # of threads and Warren is just experimenting with increasing that
number?

Just curious...
Marking as fixed.  this is not going to be in n6.

mscott, the threads in the file thread pool will allocate up to four threads at
startup.

http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsFileTransportService.h#31
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Cool! I've been running with this patch for a week or two now and it's looked
good so far. Nice job guys.
If this is not going to be in NS6, please mark this rtm-. 
QA Contact: lchiang → asa
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: