Closed Bug 97833 Opened 24 years ago Closed 24 years ago

http should periodically prune it's idle connection list of dead connections

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla0.9.7

People

(Reporter: darin.moz, Assigned: darin.moz)

References

Details

(Whiteboard: [patch needs r=])

Attachments

(5 files)

Attaching a quick fix patch to this problem based on a timer. 24 years ago Vinay Badami 4.99 KB, patch		Details \| Diff \| Splinter Review
comments incorporated into patch 24 years ago Vinay Badami 6.53 KB, patch		Details \| Diff \| Splinter Review
darin's comments + fix for a race condition 24 years ago Vinay Badami 6.56 KB, patch		Details \| Diff \| Splinter Review
with changes for xpcom-shutdown event handling 24 years ago Vinay Badami 6.72 KB, patch		Details \| Diff \| Splinter Review
incorporating darins latest comments 24 years ago Vinay Badami 6.43 KB, patch	dougt : review+ darin.moz : superreview+	Details \| Diff \| Splinter Review

Darin Fisher

Assignee

Description

•

24 years ago

http should periodically prune it's idle connection list of dead connections

Darin Fisher

Assignee

Updated

•

24 years ago

Priority: -- → P4

Target Milestone: --- → mozilla0.9.5

neeti

Updated

•

24 years ago

Target Milestone: mozilla0.9.5 → mozilla0.9.6

Darin Fisher

Assignee

Comment 1

•

24 years ago

-> me.. meant to assign this to myself.

Assignee: neeti → darin

Bradley Baetz (:bbaetz)

Comment 2

•

24 years ago

*** Bug 104138 has been marked as a duplicate of this bug. ***

Darin Fisher

Assignee

Updated

•

24 years ago

Target Milestone: mozilla0.9.6 → mozilla0.9.7

James Ralston

Comment 3

•

24 years ago

Someone who has the ability: please add "mozilla1.0" to the Keywords section. IMO, it's absolutely critical that this bug be fixed before mozilla1.0 is released; see Bug 104138 (the complete comments) for an explanation as to why.

Darin Fisher

Assignee

Comment 4

•

24 years ago

the fact that this bug is targeted for mozilla 0.9.7 means that it should be fixed by mozilla 1.0, so no reason to nominate it.

Status: NEW → ASSIGNED

James Ralston

Comment 5

•

24 years ago

The reason why I wanted to nominate this bug for mozilla1.0 is because it has already been "pushed back" multiple releases. I didn't want to see it pushed back beyond the 1.0 release. Regardless, as long as the bug is fixed before 1.0 is released, then I (and the rest of the world, which has had to suffer years of Netscape's completely brain-damaged handling of TCP sockets) will be happy.

Vinay Badami

Comment 6

•

24 years ago

Attached patch Attaching a quick fix patch to this problem based on a timer. — Details — Splinter Review

Create a timer that will trigger once a minute and walk the available connection list to get rid of dead connections.

Darin Fisher

Assignee

Comment 7

•

24 years ago

Comment on attachment 59536 [details] [diff] [review] Attaching a quick fix patch to this problem based on a timer. thanks for putting together this patch... here's some comments: >Index: nsHttpConnection.h >@@ -82,6 +82,7 @@ > void DontReuse() { mKeepAliveMask = PR_FALSE; > mKeepAlive = PR_FALSE; > mIdleTimeout = 0; } >+ PRBool IsAlive(); > > nsAHttpTransaction *Transaction() { return mTransaction; } > nsHttpConnectionInfo *ConnectionInfo() { return mConnectionInfo; } >@@ -102,7 +103,6 @@ > > nsresult SetupSSLProxyConnect(); > >- PRBool IsAlive(); you want nsHttpConnection::CanReuse, not nsHttpConnection::IsAlive. >Index: nsHttpHandler.cpp >+ , terminating(PR_FALSE) >+ , mTimer(nsnull) terminating should be mTerminating >@@ -256,6 +260,13 @@ > observerSvc->AddObserver(this, "profile-before-change", PR_TRUE); > observerSvc->AddObserver(this, "session-logout", PR_TRUE); > } >+ >+ mTimer = do_CreateInstance("@mozilla.org/timer;1", &rv); >+ if (NS_FAILED(rv)) >+ return rv; failure to create this timer should not be a fatal error. >+ mTimer->Init(DeadConnectionCleanupCB, this, 60*1000, >+ NS_PRIORITY_NORMAL, NS_TYPE_REPEATING_SLACK); >+ >+nsresult >+nsHttpHandler::PurgeDeadConnections() >+{ >+ if (terminating == PR_FALSE) { >+ nsAutoLock lock(mConnectionLock); >+ for (PRInt32 i = 0; i < mIdleConnections.Count(); i++) { >+ nsHttpConnection* conn = (nsHttpConnection*) mIdleConnections[i]; >+ if (conn && (PR_FALSE == conn->IsAlive())) { >+ // Dead and idle connection; purge it >+ mIdleConnections.RemoveElement(conn); >+ } >+ } >+ } >+ return NS_OK; >+} how about: if (conn && !conn->CanReuse()) { you must also NS_RELEASE(conn) otherwise it will not actually be destroyed. >+void >+nsHttpHandler::DeadConnectionCleanupCB(nsITimer* aTimer, void* aClosure) >+{ >+ nsHttpHandler* me = NS_STATIC_CAST(nsHttpHandler*, aClosure); >+ me->PurgeDeadConnections(); >+} >+ >+ looks like you have an extra newline here. >Index: nsHttpHandler.h >=================================================================== >RCS file: /cvsroot/mozilla/netwerk/protocol/http/src/nsHttpHandler.h,v >retrieving revision 1.20 >diff -u -r1.20 nsHttpHandler.h >--- nsHttpHandler.h 2001/11/04 05:46:06 1.20 >+++ nsHttpHandler.h 2001/11/28 16:30:57 >@@ -42,6 +42,7 @@ > #include "nsWeakReference.h" > #include "nsVoidArray.h" > #include "nsIIDNService.h" >+#include "timer/nsITimer.h" should be #include "nsITimer.h" add "timer" to the REQUIRES variable in Makefile.in and makefile.win you may also need to edit the netwerk/macbuild/netwerk.mcp to get this to work on the mac. >@@ -76,6 +77,9 @@ > // Returns a pointer to the one and only HTTP handler > static nsHttpHandler *get() { return mGlobalInstance; } > >+ // CB function for timer to clean up dead http connections >+ static void nsHttpHandler::DeadConnectionCleanupCB(nsITimer* aTimer, void* aClosure); >+ indentation problem here. >@@ -123,6 +127,9 @@ > // longer than mMaxRequestDelay. > nsresult ProcessTransactionQ(); > >+ // Purge dead connections >+ nsresult PurgeDeadConnections(); >+ indentation problem here. >@@ -237,6 +245,7 @@ > nsCOMPtr<nsICacheSession> mCacheSession_MEM; > PRUint32 mLastUniqueID; > PRUint32 mSessionStartTime; >+ PRBool terminating; how about making this a PRPackedBool and placing it near the end with the other PRPackedBool's. on some platforms these are packed together in one word to reduce the size of the data structure.

Attachment #59536 - Flags: needs-work+

Vinay Badami

Comment 8

•

24 years ago

Attached patch comments incorporated into patch — Details — Splinter Review

Darrin Have incorporated ur comments. Hope I got them all. I was also forced to make changes to mozilla/netwerk/build/Makefile.in and mozilla/netwerk/build/Makefile.win once I moved from #include "timer/nsITimer.h" to #include "nsITimer.h". can u help out on the MAC part of it please.

Darin Fisher

Assignee

Comment 9

•

24 years ago

Comment on attachment 59672 [details] [diff] [review] comments incorporated into patch > LOG(("Deleting nsHttpHandler [this=%x]\n", this)); >+ mterminating = PR_TRUE; prefer |mTerminating| over |mterminating| > >+ if (mTimer != nsnull) { >+ mTimer->Cancel(); >+ } and just a minor nit: please do away with the extra braces if possible. these two points are really just style nits, but as they say: "when in rome, stay in rome." ;-) >+ mTimer = do_CreateInstance("@mozilla.org/timer;1", &rv); >+ // If we were unable to create mTimer, we will still work. >+ // But connections are not aggressively reaped. >+ if (mTimer) { >+ // Reap once a minute >+ mTimer->Init(DeadConnectionCleanupCB, this, 60*1000, >+ NS_PRIORITY_NORMAL, NS_TYPE_REPEATING_SLACK); >+ } >+ no need to capture |rv| if you're not going to use it. > // Returns a pointer to the one and only HTTP handler > static nsHttpHandler *get() { return mGlobalInstance; } > >+ // CB function for timer to clean up dead http connections >+ static void nsHttpHandler::DeadConnectionCleanupCB(nsITimer* aTimer, void* aClosure); >+ this should be moved into the private section of the nsHttpHandler class. > >+ // Purge dead connections >+ nsresult PurgeDeadConnections(); >+ this comment seems a bit redundant. how about leaving it out since it doesn't add anything.

Attachment #59672 - Flags: needs-work+

Darin Fisher

Assignee

Comment 10

•

24 years ago

once we have a good patch, i'll give it a try on my mac, and see what project changes might need to be made.

Vinay Badami

Comment 11

•

24 years ago

Attached patch darin's comments + fix for a race condition — Details — Splinter Review

Darin done. also added a fix for a possible race condition.

Darin Fisher

Assignee

Comment 12

•

24 years ago

Comment on attachment 59679 [details] [diff] [review] darin's comments + fix for a race condition > nsHttpHandler::~nsHttpHandler() > { > LOG(("Deleting nsHttpHandler [this=%x]\n", this)); >+ mTerminating = PR_TRUE; > >+ // scope for the nsAutoLock >+ { >+ nsAutoLock lock(mConnectionLock); >+ if (mTimer != nsnull) >+ mTimer->Cancel(); >+ } >+ i don't understand why this lock is necessary. please explain. >+nsresult >+nsHttpHandler::PurgeDeadConnections() >+{ >+ if (mTerminating == PR_FALSE) { >+ nsAutoLock lock(mConnectionLock); >+ if (mTerminating == PR_FALSE) { >+ for (PRInt32 i = 0; i < mIdleConnections.Count(); i++) { >+ nsHttpConnection* conn = (nsHttpConnection*) mIdleConnections[i]; >+ if (conn && !conn->CanReuse()) { >+ // Dead and idle connection; purge it >+ mIdleConnections.RemoveElement(conn); >+ NS_RELEASE(conn); >+ } >+ } >+ } >+ } >+ return NS_OK; >+} indentation seems all screwed up here.

Attachment #59679 - Flags: needs-work+

Vinay Badami

Comment 13

•

24 years ago

If we have both the destructor and the nsHttpHandler::PurgeDeadConnections() then there can exist a race. I'am assuming (given my limited knowledge) of this code base, that the destructor for HttpHandler would get invoked only on shutdown. So this race can exist only at shutdown. Imagine the following scenario. Timer has fired and nsHttpHandler::PurgeDeadConnections() is getting rid of the dead connections in the list. In this time, the destructor has also fired as a result of shutdown. Then the desturctor can also be touching the same data structures. Hence it is required to mark terminating in the destructor. Lock and cancel the timer. The destructor will not get the lock if the timer has fired and PurgeDeadConnections is executing. Also notice the usual double check pattern for checks on mTerminating in PurgeDeadConnections. With this we ensure correctness of nsHttpHandler irrespective of the underlying architecture. ex Was the underlying architecture designed to always cancel all timers on shutdown so that such races do not exist etc etc.

Darin Fisher

Assignee

Comment 14

•

24 years ago

we have bigger problems if the destructor is running while the timer function is still live... afterall it has a weak reference to the handler. we should instead make the http handler observe the xpcom-shutdown event. see nsIObserverService in xpcom/ds. on xpcom-shutdown we can cancel the timer. since the handler is a service (hence owned by the service manager) we can be sure that the handler won't be destroyed prior to xpcom-shutdown.

Vinay Badami

Comment 15

•

24 years ago

Attached patch with changes for xpcom-shutdown event handling — Details — Splinter Review

Darin Fisher

Assignee

Comment 16

•

24 years ago

Comment on attachment 59828 [details] [diff] [review] with changes for xpcom-shutdown event handling >Index: protocol/http/src/nsHttpHandler.cpp >+ , mTerminating(PR_FALSE) >+ , mTimer(nsnull) > { > NS_INIT_ISUPPORTS(); looks like we don't need mTerminating anymore. > nsHttpHandler::~nsHttpHandler() > { >+ // We do not deal with the timer cancellation in the destructor since >+ // it is taken care of in xpcom shutdown event in the Observe method. >+ > LOG(("Deleting nsHttpHandler [this=%x]\n", this)); >+ mTerminating = PR_TRUE; please fix indentation here. >+ else if (!nsCRT::strcmp(topic, "xpcom-shutdown")) { >+ if (mTimer != nsnull) { >+ mTimer->Cancel(); >+ mTimer = nsnull; >+ } >+ } should probably use NS_XPCOM_SHUTDOWN_OBSERVER_ID instead of "xpcom-shutdown" just clean up these few things, and it looks like this patch'll do the trick.

Attachment #59828 - Flags: needs-work+

Vinay Badami

Comment 17

•

24 years ago

Attached patch incorporating darins latest comments — Details — Splinter Review

removed mTerminating since it is not required. indentation changes

Darin Fisher

Assignee

Comment 18

•

24 years ago

Comment on attachment 60119 [details] [diff] [review] incorporating darins latest comments looks good, sr=darin

Attachment #60119 - Flags: superreview+

Darin Fisher

Assignee

Comment 19

•

24 years ago

actually, i thought a bit more about this patch, and i think that it might be better to use a 15 second timeout instead of a 60 second timeout. this is because many web servers frequently send keep-alive timeouts of 15 seconds. moreover the cost of the timer callback is small enough that running it once every 15 seconds should not in any degrade user perceived performance.

Darin Fisher

Assignee

Updated

•

24 years ago

Keywords: patch

Darin Fisher

Assignee

Updated

•

24 years ago

Whiteboard: [patch needs r=]

Darin Fisher

Assignee

Comment 20

•

24 years ago

spoke with vinay about the timeout, and we decided to change it to 15 seconds... will do that when i check this in.

Doug Turner (:dougt)

Comment 21

•

24 years ago

Comment on attachment 60119 [details] [diff] [review] incorporating darins latest comments looks fine

Attachment #60119 - Flags: review+

Darin Fisher

Assignee

Comment 22

•

24 years ago

patch checked in... vinay: thx :)

Status: ASSIGNED → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

James Ralston

Comment 23

•

24 years ago

Ideally, Mozilla should be checking for closed connections on the order of once every 1 second, or maybe once every 5 seconds. However, if the only way to discover closed connections is to walk a tree structure and poll each connection individually, then the performance penalty associated with those actions is probably too great to walk the tree every 1 second. On unix systems, the poll() and/or select() system calls can be used to check an arbitrary number of file descriptors at once. Do these system calls exist on Windows and Macintosh? If so, it might be worth it to, for example, maintain (in addition to the tree structure) an array of fds suitable for passing to poll(). That way, instead of walking the tree and checking each fd individually, you can simply call poll() with a zero timeout, see what fds have data, and then walk the tree to find each fd in question and process it. If, given a fd, there's an index of some type that allows you to jump directly to that fd's spot in the tree, then using poll() as per the above example would *really* be worth it, because you wouldn't have to walk the tree at all. (But I haven't looked closely at the code; I don't know if the tree is indexed in that way.) At any rate, the current solution--walking the tree every 15 seconds and closing any connection that was closed by the server--is probably good enough for the time being. This will prevent servers around the world from having FIN_WAIT_2 connections stick around forever. Thanks, all.

You need to log in before you can comment on or make changes to this bug.