Closed Bug 97833 Opened 24 years ago Closed 24 years ago

http should periodically prune it's idle connection list of dead connections

Categories

(Core :: Networking: HTTP, defect, P4)

defect

Tracking

()

RESOLVED FIXED
mozilla0.9.7

People

(Reporter: darin.moz, Assigned: darin.moz)

References

Details

(Whiteboard: [patch needs r=])

Attachments

(5 files)

http should periodically prune it's idle connection list of dead connections
Priority: -- → P4
Target Milestone: --- → mozilla0.9.5
Target Milestone: mozilla0.9.5 → mozilla0.9.6
-> me.. meant to assign this to myself.
Assignee: neeti → darin
*** Bug 104138 has been marked as a duplicate of this bug. ***
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Someone who has the ability: please add "mozilla1.0" to the Keywords section. IMO, it's absolutely critical that this bug be fixed before mozilla1.0 is released; see Bug 104138 (the complete comments) for an explanation as to why.
the fact that this bug is targeted for mozilla 0.9.7 means that it should be fixed by mozilla 1.0, so no reason to nominate it.
Status: NEW → ASSIGNED
The reason why I wanted to nominate this bug for mozilla1.0 is because it has already been "pushed back" multiple releases. I didn't want to see it pushed back beyond the 1.0 release. Regardless, as long as the bug is fixed before 1.0 is released, then I (and the rest of the world, which has had to suffer years of Netscape's completely brain-damaged handling of TCP sockets) will be happy.
Create a timer that will trigger once a minute and walk the available connection list to get rid of dead connections.
Comment on attachment 59536 [details] [diff] [review] Attaching a quick fix patch to this problem based on a timer. thanks for putting together this patch... here's some comments: >Index: nsHttpConnection.h >@@ -82,6 +82,7 @@ > void DontReuse() { mKeepAliveMask = PR_FALSE; > mKeepAlive = PR_FALSE; > mIdleTimeout = 0; } >+ PRBool IsAlive(); > > nsAHttpTransaction *Transaction() { return mTransaction; } > nsHttpConnectionInfo *ConnectionInfo() { return mConnectionInfo; } >@@ -102,7 +103,6 @@ > > nsresult SetupSSLProxyConnect(); > >- PRBool IsAlive(); you want nsHttpConnection::CanReuse, not nsHttpConnection::IsAlive. >Index: nsHttpHandler.cpp >+ , terminating(PR_FALSE) >+ , mTimer(nsnull) terminating should be mTerminating >@@ -256,6 +260,13 @@ > observerSvc->AddObserver(this, "profile-before-change", PR_TRUE); > observerSvc->AddObserver(this, "session-logout", PR_TRUE); > } >+ >+ mTimer = do_CreateInstance("@mozilla.org/timer;1", &rv); >+ if (NS_FAILED(rv)) >+ return rv; failure to create this timer should not be a fatal error. >+ mTimer->Init(DeadConnectionCleanupCB, this, 60*1000, >+ NS_PRIORITY_NORMAL, NS_TYPE_REPEATING_SLACK); >+ >+nsresult >+nsHttpHandler::PurgeDeadConnections() >+{ >+ if (terminating == PR_FALSE) { >+ nsAutoLock lock(mConnectionLock); >+ for (PRInt32 i = 0; i < mIdleConnections.Count(); i++) { >+ nsHttpConnection* conn = (nsHttpConnection*) mIdleConnections[i]; >+ if (conn && (PR_FALSE == conn->IsAlive())) { >+ // Dead and idle connection; purge it >+ mIdleConnections.RemoveElement(conn); >+ } >+ } >+ } >+ return NS_OK; >+} how about: if (conn && !conn->CanReuse()) { you must also NS_RELEASE(conn) otherwise it will not actually be destroyed. >+void >+nsHttpHandler::DeadConnectionCleanupCB(nsITimer* aTimer, void* aClosure) >+{ >+ nsHttpHandler* me = NS_STATIC_CAST(nsHttpHandler*, aClosure); >+ me->PurgeDeadConnections(); >+} >+ >+ looks like you have an extra newline here. >Index: nsHttpHandler.h >=================================================================== >RCS file: /cvsroot/mozilla/netwerk/protocol/http/src/nsHttpHandler.h,v >retrieving revision 1.20 >diff -u -r1.20 nsHttpHandler.h >--- nsHttpHandler.h 2001/11/04 05:46:06 1.20 >+++ nsHttpHandler.h 2001/11/28 16:30:57 >@@ -42,6 +42,7 @@ > #include "nsWeakReference.h" > #include "nsVoidArray.h" > #include "nsIIDNService.h" >+#include "timer/nsITimer.h" should be #include "nsITimer.h" add "timer" to the REQUIRES variable in Makefile.in and makefile.win you may also need to edit the netwerk/macbuild/netwerk.mcp to get this to work on the mac. >@@ -76,6 +77,9 @@ > // Returns a pointer to the one and only HTTP handler > static nsHttpHandler *get() { return mGlobalInstance; } > >+ // CB function for timer to clean up dead http connections >+ static void nsHttpHandler::DeadConnectionCleanupCB(nsITimer* aTimer, void* aClosure); >+ indentation problem here. >@@ -123,6 +127,9 @@ > // longer than mMaxRequestDelay. > nsresult ProcessTransactionQ(); > >+ // Purge dead connections >+ nsresult PurgeDeadConnections(); >+ indentation problem here. >@@ -237,6 +245,7 @@ > nsCOMPtr<nsICacheSession> mCacheSession_MEM; > PRUint32 mLastUniqueID; > PRUint32 mSessionStartTime; >+ PRBool terminating; how about making this a PRPackedBool and placing it near the end with the other PRPackedBool's. on some platforms these are packed together in one word to reduce the size of the data structure.
Attachment #59536 - Flags: needs-work+
Darrin Have incorporated ur comments. Hope I got them all. I was also forced to make changes to mozilla/netwerk/build/Makefile.in and mozilla/netwerk/build/Makefile.win once I moved from #include "timer/nsITimer.h" to #include "nsITimer.h". can u help out on the MAC part of it please.
Comment on attachment 59672 [details] [diff] [review] comments incorporated into patch > LOG(("Deleting nsHttpHandler [this=%x]\n", this)); >+ mterminating = PR_TRUE; prefer |mTerminating| over |mterminating| > >+ if (mTimer != nsnull) { >+ mTimer->Cancel(); >+ } and just a minor nit: please do away with the extra braces if possible. these two points are really just style nits, but as they say: "when in rome, stay in rome." ;-) >+ mTimer = do_CreateInstance("@mozilla.org/timer;1", &rv); >+ // If we were unable to create mTimer, we will still work. >+ // But connections are not aggressively reaped. >+ if (mTimer) { >+ // Reap once a minute >+ mTimer->Init(DeadConnectionCleanupCB, this, 60*1000, >+ NS_PRIORITY_NORMAL, NS_TYPE_REPEATING_SLACK); >+ } >+ no need to capture |rv| if you're not going to use it. > // Returns a pointer to the one and only HTTP handler > static nsHttpHandler *get() { return mGlobalInstance; } > >+ // CB function for timer to clean up dead http connections >+ static void nsHttpHandler::DeadConnectionCleanupCB(nsITimer* aTimer, void* aClosure); >+ this should be moved into the private section of the nsHttpHandler class. > >+ // Purge dead connections >+ nsresult PurgeDeadConnections(); >+ this comment seems a bit redundant. how about leaving it out since it doesn't add anything.
Attachment #59672 - Flags: needs-work+
once we have a good patch, i'll give it a try on my mac, and see what project changes might need to be made.
Darin done. also added a fix for a possible race condition.
Comment on attachment 59679 [details] [diff] [review] darin's comments + fix for a race condition > nsHttpHandler::~nsHttpHandler() > { > LOG(("Deleting nsHttpHandler [this=%x]\n", this)); >+ mTerminating = PR_TRUE; > >+ // scope for the nsAutoLock >+ { >+ nsAutoLock lock(mConnectionLock); >+ if (mTimer != nsnull) >+ mTimer->Cancel(); >+ } >+ i don't understand why this lock is necessary. please explain. >+nsresult >+nsHttpHandler::PurgeDeadConnections() >+{ >+ if (mTerminating == PR_FALSE) { >+ nsAutoLock lock(mConnectionLock); >+ if (mTerminating == PR_FALSE) { >+ for (PRInt32 i = 0; i < mIdleConnections.Count(); i++) { >+ nsHttpConnection* conn = (nsHttpConnection*) mIdleConnections[i]; >+ if (conn && !conn->CanReuse()) { >+ // Dead and idle connection; purge it >+ mIdleConnections.RemoveElement(conn); >+ NS_RELEASE(conn); >+ } >+ } >+ } >+ } >+ return NS_OK; >+} indentation seems all screwed up here.
Attachment #59679 - Flags: needs-work+
If we have both the destructor and the nsHttpHandler::PurgeDeadConnections() then there can exist a race. I'am assuming (given my limited knowledge) of this code base, that the destructor for HttpHandler would get invoked only on shutdown. So this race can exist only at shutdown. Imagine the following scenario. Timer has fired and nsHttpHandler::PurgeDeadConnections() is getting rid of the dead connections in the list. In this time, the destructor has also fired as a result of shutdown. Then the desturctor can also be touching the same data structures. Hence it is required to mark terminating in the destructor. Lock and cancel the timer. The destructor will not get the lock if the timer has fired and PurgeDeadConnections is executing. Also notice the usual double check pattern for checks on mTerminating in PurgeDeadConnections. With this we ensure correctness of nsHttpHandler irrespective of the underlying architecture. ex Was the underlying architecture designed to always cancel all timers on shutdown so that such races do not exist etc etc.
we have bigger problems if the destructor is running while the timer function is still live... afterall it has a weak reference to the handler. we should instead make the http handler observe the xpcom-shutdown event. see nsIObserverService in xpcom/ds. on xpcom-shutdown we can cancel the timer. since the handler is a service (hence owned by the service manager) we can be sure that the handler won't be destroyed prior to xpcom-shutdown.
Comment on attachment 59828 [details] [diff] [review] with changes for xpcom-shutdown event handling >Index: protocol/http/src/nsHttpHandler.cpp >+ , mTerminating(PR_FALSE) >+ , mTimer(nsnull) > { > NS_INIT_ISUPPORTS(); looks like we don't need mTerminating anymore. > nsHttpHandler::~nsHttpHandler() > { >+ // We do not deal with the timer cancellation in the destructor since >+ // it is taken care of in xpcom shutdown event in the Observe method. >+ > LOG(("Deleting nsHttpHandler [this=%x]\n", this)); >+ mTerminating = PR_TRUE; please fix indentation here. >+ else if (!nsCRT::strcmp(topic, "xpcom-shutdown")) { >+ if (mTimer != nsnull) { >+ mTimer->Cancel(); >+ mTimer = nsnull; >+ } >+ } should probably use NS_XPCOM_SHUTDOWN_OBSERVER_ID instead of "xpcom-shutdown" just clean up these few things, and it looks like this patch'll do the trick.
Attachment #59828 - Flags: needs-work+
removed mTerminating since it is not required. indentation changes
Comment on attachment 60119 [details] [diff] [review] incorporating darins latest comments looks good, sr=darin
Attachment #60119 - Flags: superreview+
actually, i thought a bit more about this patch, and i think that it might be better to use a 15 second timeout instead of a 60 second timeout. this is because many web servers frequently send keep-alive timeouts of 15 seconds. moreover the cost of the timer callback is small enough that running it once every 15 seconds should not in any degrade user perceived performance.
Keywords: patch
Whiteboard: [patch needs r=]
spoke with vinay about the timeout, and we decided to change it to 15 seconds... will do that when i check this in.
Comment on attachment 60119 [details] [diff] [review] incorporating darins latest comments looks fine
Attachment #60119 - Flags: review+
patch checked in... vinay: thx :)
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Ideally, Mozilla should be checking for closed connections on the order of once every 1 second, or maybe once every 5 seconds. However, if the only way to discover closed connections is to walk a tree structure and poll each connection individually, then the performance penalty associated with those actions is probably too great to walk the tree every 1 second. On unix systems, the poll() and/or select() system calls can be used to check an arbitrary number of file descriptors at once. Do these system calls exist on Windows and Macintosh? If so, it might be worth it to, for example, maintain (in addition to the tree structure) an array of fds suitable for passing to poll(). That way, instead of walking the tree and checking each fd individually, you can simply call poll() with a zero timeout, see what fds have data, and then walk the tree to find each fd in question and process it. If, given a fd, there's an index of some type that allows you to jump directly to that fd's spot in the tree, then using poll() as per the above example would *really* be worth it, because you wouldn't have to walk the tree at all. (But I haven't looked closely at the code; I don't know if the tree is indexed in that way.) At any rate, the current solution--walking the tree every 15 seconds and closing any connection that was closed by the server--is probably good enough for the time being. This will prevent servers around the world from having FIN_WAIT_2 connections stick around forever. Thanks, all.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: