Last Comment Bug 604796 - react to server initiated close of idle http persistent connection
: react to server initiated close of idle http persistent connection
Status: RESOLVED FIXED
[http-conn]
:
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: Trunk
: x86_64 Linux
: -- enhancement with 2 votes (vote)
: ---
Assigned To: Patrick McManus [:mcmanus]
:
Mentors:
Depends on: 624739
Blocks:
  Show dependency treegraph
 
Reported: 2010-10-15 15:18 PDT by Patrick McManus [:mcmanus]
Modified: 2012-06-21 06:42 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
react-to-server-close.1 (16.41 KB, patch)
2011-01-10 06:44 PST, Patrick McManus [:mcmanus]
no flags Details | Diff | Splinter Review
react to server close v2 (17.25 KB, patch)
2011-04-07 17:26 PDT, Patrick McManus [:mcmanus]
honzab.moz: review+
Details | Diff | Splinter Review
nshttpconneciton-alternate v1 (10.36 KB, patch)
2011-05-12 13:15 PDT, Patrick McManus [:mcmanus]
no flags Details | Diff | Splinter Review
nshttpconneciton-alternate v2 (11.08 KB, patch)
2011-05-13 12:58 PDT, Patrick McManus [:mcmanus]
honzab.moz: review+
Details | Diff | Splinter Review

Description Patrick McManus [:mcmanus] 2010-10-15 15:18:19 PDT
server closes an idle connection we don't react to it in any way until either
our timeout elapses or we happen to try and use that connection. During this
time the socket sits in CLOSE-WAIT status and we don't generate a FIN - which
is kinda rude to the server who is trying to reclaim the connection in an
orderly way and it keeps our tables full of useless idleconns that still count
against maxconn, etc... If some flavor of listener was established on the
idleconns they could just reap themselves when that happened. (and I do
generally think the servers timeout before we do - which makes sense we have
little incentive to release that resource, but the server might want to do so
for scaling reasons.)
Comment 1 Patrick McManus [:mcmanus] 2011-01-10 06:44:07 PST
Created attachment 502476 [details] [diff] [review]
react-to-server-close.1

I think this code actually simplifies management of the persistent connection pool - a lot of redundant code (checking if we need to stop the purge timer, managing the nsTArray, adjusting global counters, etc..) is centralized now.

Other reasons to apply:

* Fixes our current state where we leave some compliant servers in limbo - they have sent a close and we haven't responded, which prevents them from reclaiming memory etc right away. This generally leads to the server eventually sending a RST which is more or less an error path for a non-erroneous condition. Slashdot.org is an easy place to see this.

* Our current approach means we have some unusable connections contributing to mNumIdleConns until our timer goes off. This puts pressure on us to close some connections when we exceeds the global limit and we don't do anything to ensure that we are closing the unusable ones - so the pressure effectively reduces the size of our pool.

* Right now, because unusable connections can linger in the pool, we may discover via system calls that connections are unusable right when we go to use them. Tossing them away and trying the next one adds latency at a latency sensitive moment - better to do that asynchronously when we can.
Comment 2 Jason Duell [:jduell] (needinfo me) 2011-01-11 15:40:59 PST
Comment on attachment 502476 [details] [diff] [review]
react-to-server-close.1

I'm barely available other than to (occasionally) check email for the next week or so, and Honza knows this code better than me anyway.   You could also try biesi or bz if Honza's too busy.
Comment 3 Patrick McManus [:mcmanus] 2011-01-11 16:18:50 PST
(In reply to comment #2)
> Comment on attachment 502476 [details] [diff] [review]
> react-to-server-close.1
> 
> I'm barely available other than to (occasionally) check email for the next week
> or so, and Honza knows this code better than me anyway.   You could also try
> biesi or bz if Honza's too busy.

I don't mind the r? transfer, but I know this will all wait for ff 4, so no rush.
Comment 4 Patrick McManus [:mcmanus] 2011-04-07 17:26:37 PDT
Created attachment 524550 [details] [diff] [review]
react to server close v2

update bit rot.. base on top of 624739 as they conflict.. candidate for firefox 5
Comment 5 Honza Bambas (:mayhemer) 2011-05-06 10:03:25 PDT
I could steal this review...
Comment 6 Honza Bambas (:mayhemer) 2011-05-11 12:56:16 PDT
Comment on attachment 524550 [details] [diff] [review]
react to server close v2

Review of attachment 524550 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab with the comments

::: netwerk/protocol/http/nsHttpConnection.cpp
@@ +498,5 @@
> +nsHttpConnection::WaitForInput(nsIInputStreamCallback *callback,
> +                               nsIEventTarget *target)
> +{
> +    NS_ABORT_IF_FALSE(PR_GetCurrentThread() == gSocketThread, "wrong thread");
> +    mSocketIn->AsyncWait(callback, 0, 0, target);

Check for mSocketIn != null and new line after NS_ABORT...

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +1179,5 @@
> +    // protocol stack are the place to deal with any exception logic.
> +
> +    if (!mConn->CanReuse()) {
> +        LOG(("Server closed idle conn %x\n", mConn.get()));
> +        nsRefPtr<nsHttpConnection> conn = mConn;

Have you observed a case that the conn object has been destroyed with |this|?  If so, then something is probably wrong, because the connection should always be referred, we always AddRef it before call to AddIdleConnection.

My point is that you probably don't need to use nsRefPtr here.

@@ +1192,5 @@
> +
> +nsHttpConnection *
> +nsHttpConnectionMgr::RemoveIdleConnection(nsConnectionEntry *ent, PRInt32 index)
> +{
> +    NS_ABORT_IF_FALSE(PR_GetCurrentThread() == gSocketThread, "wrong thread");

New line here.

@@ +1195,5 @@
> +{
> +    NS_ABORT_IF_FALSE(PR_GetCurrentThread() == gSocketThread, "wrong thread");
> +    nsRefPtr<nsIdleConnection> handle = ent->mIdleConns[index];
> +    nsHttpConnection *rv = handle->Connection();
> +    rv->WaitForInput(nsnull, nsnull);

Please use |conn| or |connection| instead of |rv| as a name.

@@ +1204,5 @@
> +    // sure that we are not pruning dead connections anymore.
> +    if (0 == mNumIdleConns)
> +        StopPruneDeadConnectionsTimer();
> +
> +    handle->Release();               /* reference added in AddIdleConnection */

This release is here because we no longer refer the nsIdleConnection object in ent->mIdleConns, right?

Then please move this release right after call to RemoveElementAt to let the two code pieces be close.

@@ +1211,5 @@
> +
> +nsHttpConnection *
> +nsHttpConnectionMgr::GetIdleConnection(nsConnectionEntry *ent, PRInt32 index)
> +{
> +    NS_ABORT_IF_FALSE(PR_GetCurrentThread() == gSocketThread, "wrong thread");

New line here.

@@ +1224,5 @@
> +    NS_ABORT_IF_FALSE(PR_GetCurrentThread() == gSocketThread, "wrong thread");
> +
> +    // Keep The idle connection list sorted with the connections that
> +    // have moved the largest data pipelines at the front because these
> +    // connections have the largest cwnds on the server.

This comment should be right above the cycle bellow, I think.

@@ +1227,5 @@
> +    // have moved the largest data pipelines at the front because these
> +    // connections have the largest cwnds on the server.
> +
> +    nsIdleConnection *handle = new nsIdleConnection(conn, ent, this);
> +    handle->AddRef();

As the comment above, move this AddRef right after InsertElementAt.

And rather keep |handle| in nsRefPtr?  If someone changes the code bellow to return early we might leak.

@@ +1244,1 @@
>  

Could we please define (and declare) these 3 methods in a more logical order like Add,Get,Remove?  It is more readable then.

::: netwerk/protocol/http/nsHttpConnectionMgr.h
@@ +201,5 @@
> +                         nsHttpConnectionMgr *mgr)
> +            : mConn (conn),
> +            mEnt(ent),
> +            mMgr(mgr)
> +            { }

Please rather use style (also remove the space after mConn):

: mConn(conn)
, mEnt(ent)
, mMgr(mgr)
{ }

@@ +207,5 @@
> +        
> +    private:
> +        nsRefPtr<nsHttpConnection>    mConn;
> +        nsConnectionEntry            *mEnt;
> +        nsRefPtr<nsHttpConnectionMgr> mMgr;

Why exactly keep a ref to the manager?

This makes a cycle.  Also the manager has a Shutdown method that always wipes out the mIdleConns array, so it will always live longer then this class.  But maybe I miss something?
Comment 7 Honza Bambas (:mayhemer) 2011-05-11 13:05:39 PDT
One more general question and a suggestion:

why shouldn't the existing code of nsHttpConnection be used to turn the connection to an idle-wait state and use mSocketIn internally to wait?  It already implements the callback interface.  We can have a flag nsHttpConnection::mIsIdle that changes the OnInputStreamReady behavior.  Patrick, did you think of this?  It could make the patch much simpler (no new class, fewer new methods, less complexity).
Comment 8 Patrick McManus [:mcmanus] 2011-05-12 09:31:39 PDT
(In reply to comment #7)
> One more general question and a suggestion:
> 
> why shouldn't the existing code of nsHttpConnection be used to turn the
> connection to an idle-wait state and use mSocketIn internally to wait? 

I thought about it and decided against it because it involved changing more public interfaces.. But the connectionmgr code did turn out to be a little more involved than I thought in the end. I'll make a nshttpconnection based version of the patch and we can compare and choose upon seeing them both.
Comment 9 Patrick McManus [:mcmanus] 2011-05-12 13:15:01 PDT
Created attachment 532013 [details] [diff] [review]
nshttpconneciton-alternate v1

This is an alternate patch, based on driving this off of state in nsHttpConnection rather than having the Connection manager have full ownership of the socket.

As honza suggested, this is a bit smaller and I think flows better. I prefer this patch, but would be ok with either one. Will send this one off to try.
Comment 10 Patrick McManus [:mcmanus] 2011-05-13 06:22:34 PDT
(In reply to comment #9)
> Created attachment 532013 [details] [diff] [review] [review]
> nshttpconneciton-alternate v1
> Will send this one off to try.

Try is green with this alternate approach.
Comment 11 Honza Bambas (:mayhemer) 2011-05-13 11:17:45 PDT
Comment on attachment 532013 [details] [diff] [review]
nshttpconneciton-alternate v1

Review of attachment 532013 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, but:

::: netwerk/protocol/http/nsHttpConnection.cpp
@@ +502,5 @@
> +{
> +    LOG(("nsHttpConnection::BeginIdleMonitoring [this=%p]\n", this));
> +    NS_ABORT_IF_FALSE(PR_GetCurrentThread() == gSocketThread, "wrong thread");
> +    NS_ABORT_IF_FALSE(!mTransaction, "BeginIdleMonitoring() while active");
> +    

Please add here a LOG that this connection is switching to the IdleMonitoring mode.

@@ +515,5 @@
> +    LOG(("nsHttpConnection::EndIdleMonitoring [this=%p]\n", this));
> +    NS_ABORT_IF_FALSE(PR_GetCurrentThread() == gSocketThread, "wrong thread");
> +    NS_ABORT_IF_FALSE(!mTransaction, "EndIdleMonitoring() while active");
> +
> +    if (mIdleMonitoring) {

Please add here a LOG (inside the condition block) that we are switching off the IdleMonitoring mode.

@@ +809,5 @@
> +        // any exception logic.
> +
> +        if (!CanReuse()) {
> +            LOG(("Server initiated close of idle conn %p\n", this));
> +            gHttpHandler->ConnMgr()->ReclaimConnection(this);

Hmmm... Using code for connection reclaim, that must be tweaked because of it, doesn't seems the best way to close the connection to me.  It also means to call CanReuse, that is relatively expensive, two/three times.

If there were not a special reason, and you feel this has to be async (what I personally don't), then let's have a new CloseIdleConnection/OnMsgCloseIdleConnection method pair that will handle this.

Method OnMsg.. then can be used from nsHttpConnectionMgr::PurgeExcessIdleConnectionsCB, nsHttpConnectionMgr::PruneDeadConnectionsCB and nsHttpConnectionMgr::ShutdownPassCB.


But IMHO simplest is just to close and remove the connection here directly.

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +1074,5 @@
>              NS_RELEASE(temp);
>              mNumActiveConns--;
>          }
> +        else if (ent->mIdleConns.RemoveElement(conn)) {
> +            NS_ABORT_IF_FALSE(!conn->CanReuse(),

A side note: I don't think it is a good idea to call method like this, that might change state of the socket or might return different results when called more then ones, in a debugging macro (which also means called in debug build, not in optimized build).  Would be good to avoid it.
Comment 12 Patrick McManus [:mcmanus] 2011-05-13 12:58:09 PDT
Created attachment 532318 [details] [diff] [review]
nshttpconneciton-alternate v2

updated patch based on review comment 11 + irc
Comment 13 Honza Bambas (:mayhemer) 2011-05-13 13:08:07 PDT
Comment on attachment 532318 [details] [diff] [review]
nshttpconneciton-alternate v2

Review of attachment 532318 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab with few nits on the logs:

::: netwerk/protocol/http/nsHttpConnection.cpp
@@ +503,5 @@
> +    LOG(("nsHttpConnection::BeginIdleMonitoring [this=%p]\n", this));
> +    NS_ABORT_IF_FALSE(PR_GetCurrentThread() == gSocketThread, "wrong thread");
> +    NS_ABORT_IF_FALSE(!mTransaction, "BeginIdleMonitoring() while active");
> +    
> +    LOG(("Entering Idle Moitoring Mode"));

Typo "Moitoring" - missing 'n'.  Please add also |this| pointer to the log.

@@ +517,5 @@
> +    NS_ABORT_IF_FALSE(PR_GetCurrentThread() == gSocketThread, "wrong thread");
> +    NS_ABORT_IF_FALSE(!mTransaction, "EndIdleMonitoring() while active");
> +
> +    if (mIdleMonitoring) {
> +        LOG(("Leaving Idle Moitoring Mode"));

As well here.
Comment 14 Patrick McManus [:mcmanus] 2011-05-25 08:34:08 PDT
http://hg.mozilla.org/mozilla-central/rev/4a95b60ecbe3

Note You need to log in before you can comment on or make changes to this bug.