Last Comment Bug 623948 - Re-Implement HTTP Connection Acceleration (i.e. Syn Retry)
: Re-Implement HTTP Connection Acceleration (i.e. Syn Retry)
Status: RESOLVED FIXED
[http-conn]
:
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: Trunk
: All All
: -- enhancement with 2 votes (vote)
: mozilla6
Assigned To: Patrick McManus [:mcmanus]
:
Mentors:
: 623201 (view as bug list)
Depends on: 623134 623921 631801 647574 656989 671053 746255
Blocks: 595316 603503 634278 651635
  Show dependency treegraph
 
Reported: 2011-01-07 10:04 PST by Patrick McManus [:mcmanus]
Modified: 2012-06-21 06:42 PDT (History)
25 users (show)
bzbarsky: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
syn retry v1 (59.10 KB, patch)
2011-02-03 10:56 PST, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
syn retry v2 (58.71 KB, patch)
2011-02-08 12:50 PST, Patrick McManus [:mcmanus]
honzab.moz: review-
Details | Diff | Review
syn retry v3 (61.02 KB, patch)
2011-02-17 11:30 PST, Patrick McManus [:mcmanus]
no flags Details | Diff | Review
syn retry v4 (61.64 KB, patch)
2011-03-10 06:41 PST, Patrick McManus [:mcmanus]
honzab.moz: review+
Details | Diff | Review
syn retry v5 (61.94 KB, patch)
2011-03-21 08:10 PDT, Patrick McManus [:mcmanus]
mcmanus: review+
Details | Diff | Review
syn retry v6 (61.96 KB, patch)
2011-03-22 11:48 PDT, Patrick McManus [:mcmanus]
mcmanus: review+
Details | Diff | Review
followup to fix gcc 4.5 issue (1.98 KB, patch)
2011-04-01 13:49 PDT, Daniel Holbert [:dholbert]
ehsan: review+
Details | Diff | Review

Description Patrick McManus [:mcmanus] 2011-01-07 10:04:44 PST
This feature was briefly implemented as part of Gecko 2.0 and removed before release. It can be re added in version.next with the following changes:

* tests (was bug 623201)
* investigate client certificate issues (was bug 623785)
* rearrangement to drive the connections from the connection manager code instead of the nsHttpConnection code. (613977 comment 79)

The backout bug and patch are found in bug 623921.. the actual commit was
http://hg.mozilla.org/mozilla-central/rev/c1b1fde638f9

The background information on this feature (from its original bug 592284) is:
Some packets are more painful to lose than others. In particular, the SYN
setting up an HTTP channel is really painful because the OS relies on large
constant values before resending it. The values are different between
platforms, but sending a retry after 3 seconds and another at the 9 second mark
(6 seconds after the first retry) is typical. Obviously a 3 second blocker in
reaction to a single lost packet is very painful.

Chrome still lets the OS set the retry rate, but if 250ms expires after sending
the first connection packet it opens a second parallel connection. If the
problem is just a normal packet loss that happened to impact the first syn,
then things get going well.

It makes a lot of sense - this is a very small amount of very useful traffic,
so it is worth the risk of duplication.

This is the Firefox pattern:

00:00:00.000000 IP 192.168.16.214.53431 > 192.168.37.1.80:
00:00:02.999657 IP 192.168.16.214.53431 > 192.168.37.1.80: 
00:00:08.999659 IP 192.168.16.214.53431 > 192.168.37.1.80: 

This is the Chrome pattern:

00:00:00.000000 IP 192.168.16.214.53429 > 192.168.37.1.80:
00:00:00.251162 IP 192.168.16.214.53430 > 192.168.37.1.80:
00:00:02.991019 IP 192.168.16.214.53429 > 192.168.37.1.80: 
00:00:03.251023 IP 192.168.16.214.53430 > 192.168.37.1.80: 
00:00:08.991023 IP 192.168.16.214.53429 > 192.168.37.1.80: 
00:00:09.251025 IP 192.168.16.214.53430 > 192.168.37.1.80:

Some anecdotal positive performance data on this feature is found here:
http://bitsup.blogspot.com/2010/12/accelerated-connection-retry-for-http.html
Comment 1 Patrick McManus [:mcmanus] 2011-01-07 10:12:54 PST
*** Bug 623201 has been marked as a duplicate of this bug. ***
Comment 2 Patrick McManus [:mcmanus] 2011-01-07 10:16:28 PST
*** Bug 623785 has been marked as a duplicate of this bug. ***
Comment 3 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-01-07 10:49:29 PST
Why not be more aggressive than Chrome? e.g. upon navigating to a document, open two connections immediately to the document's host, on the assumption that we are going to open a second connection for sub-resources (scripts, css, images, etc.) from the same host?

Note that I'm not suggesting we should always open 2x connections for each connection we need, but only do so when we have a reasonable expectation of opening a second connection to the same server.

Or, even extend this so that, for a cached document, we immediately open up the same number of connections to the same host that we used in the last connection for that document.
Comment 4 Patrick McManus [:mcmanus] 2011-01-07 11:35:43 PST
(In reply to comment #3)
> Why not be more aggressive than Chrome? e.g. upon navigating to a document,
> open two connections immediately to the document's host, on the assumption that
> we are going to open a second connection for sub-resources (scripts, css,
> images, etc.) from the same host?
> 

right - this whole area starts to drift into prefetching connections pretty quickly and certainly even limiting ourselves to timeout based retries sometimes picks that up as a side benefit. And that's probably a good idea, that I'm more willing to explore in ff.next than in ff.4.. 

one case I worry about is "redirect to canonical hostname" which probably only uses 1 connection. Same for a bunch of cases of revalidation - but at least we can guess that one at request time and not prefetch accordingly.

in any event, there are all sorts of variations that can be tried.

if we are going to start speculatively opening connections we need to be more aggressive about actively closing them when they are not needed which we are extremely casual about right now.
Comment 5 Emanuel Hoogeveen [:ehoogeveen] 2011-01-07 11:39:22 PST
Is that hardcoded maximum connection limit of 50 still in effect? I don't remember ever seeing a bug filed specifically to address it, but then I'm not exactly in the loop..
Comment 6 Patrick McManus [:mcmanus] 2011-01-07 11:50:22 PST
(In reply to comment #5)
> Is that hardcoded maximum connection limit of 50 still in effect?

yes

> I don't
> remember ever seeing a bug filed specifically to address it, but then I'm not
> exactly in the loop..

607741
Comment 7 Jürgen Kreileder 2011-01-08 13:42:59 PST
Note that bug 623785 wasn't caused by this.
Comment 8 Patrick McManus [:mcmanus] 2011-02-03 10:56:45 PST
Created attachment 509492 [details] [diff] [review]
syn retry v1

Patch refactored to be driven from connection manager. This has a really cool side effect for even normal (non retry) connections: if a persistent connection becomes available for use after we start to open the TCP connection, but before we complete it then we will use the pconn instead of waiting for connection completion as we had been doing in the past. Anecdotally, this seems to happen to a lot of our new connections: http://bitsup.blogspot.com/2011/02/separating-http-connections-from-tcp.html
Comment 9 Patrick McManus [:mcmanus] 2011-02-08 12:50:29 PST
Created attachment 510735 [details] [diff] [review]
syn retry v2
Comment 10 Honza Bambas (:mayhemer) 2011-02-15 17:17:13 PST
Comment on attachment 510735 [details] [diff] [review]
syn retry v2

> nsSocketTransport::SetSecurityCallbacks(nsIInterfaceRequestor *callbacks)
>+        SOCKET_LOG(("Reset callbacks for secinfo=%p callbacks=%p\n", mSecInfo.get(), mCallbacks.get()));

80 lines please.

>+nsHttpConnection::Init(nsHttpConnectionInfo *info, PRUint16 maxHangTime,
>+                  nsISocketTransport *transport, nsIAsyncInputStream *instream,
>+                  nsIAsyncOutputStream *outstream,
>+                  nsIInterfaceRequestor *callbacks,
>+                  nsIEventTarget *callbackTarget)

Indention please.

>+                       
>+                       

Remove these new lines.

> {
>+    NS_ABORT_IF_FALSE (transport && instream && outstream, "init precond");

Remove the space before the opening bracket and finish the text please.

>+    mSocketTransport = transport;
>+    mSocketIn = instream;
>+    mSocketOut = outstream;
>+    nsresult rv = mSocketTransport->SetEventSink(this, nsnull);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+    mCallbacks = callbacks;
>+    mCallbackTarget = callbackTarget;
>+    rv = mSocketTransport->SetSecurityCallbacks(this);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+

This is of a personal taste, but I would rather add new lines between blocks that logically belongs together.  At least after NS_ENSUREs it became a habit.

>+++ b/netwerk/protocol/http/nsHttpConnectionMgr.cpp
>         if (!offline) {
>             sts = do_GetService(NS_SOCKETTRANSPORTSERVICE_CONTRACTID, &rv);
>+            if (!mSTS && sts) {
>+                mSTS = do_QueryInterface(sts);
>+                NS_ABORT_IF_FALSE(mSTS.get(), "nonsensical failed qi");
>+            }

Why exactly is this needed?  Or is that a left over?

>@@ -469,16 +479,17 @@ nsHttpConnectionMgr::PruneDeadConnection
>     // if this entry is empty, then we can remove it.
>     if (ent->mIdleConns.Length()   == 0 &&
>         ent->mActiveConns.Length() == 0 &&
>+        ent->mHalfOpens.Length() == 0 &&
>         ent->mPendingQ.Length()    == 0) {

Indention of '== 0 &&' ?

>+nsHttpConnectionMgr::CreateTransport(nsConnectionEntry *ent,
>+    NS_ABORT_IF_FALSE (mSTS, "no socket transport?");

No space before (

>+
>+    nsRefPtr<nsHalfOpenSocket> sock = new nsHalfOpenSocket(ent, this, trans);
>+    nsresult rv = sock->SetupPrimaryStreams();
>+    NS_ENSURE_SUCCESS(rv, rv);

SetupPrimaryStreams() may create circular references on the 'socket', AsyncWait does it.  In case of failure, you should call Abandon(), or ensure that SetupPrimaryStreams() does release all circulars it self.

>+    StartedConnect();

Could this potentially be rather called from nsHalfOpenSocket to leave it enclosed a bit?

>+    sock->SetupBackupTimer();
>+    ent->mHalfOpens.AppendElement(sock);
>+    return NS_OK;
>+}

Maybe some new lines here as well?

>+nsHttpConnectionMgr::nsHalfOpenSocket::SetupBackupTimer()
>+            mSynTimer->InitWithFuncCallback(SynTimeout, this,
>+                                            timeout, nsITimer::TYPE_ONE_SHOT);
>+            NS_ADDREF(this);

Rather let the object implement nsIObserver and then use mSynTimer->Init, it does referencing for you and the code is cleaner.

>+nsHttpConnectionMgr::nsHalfOpenSocket::Abandon()
>+{
>+    PRInt32 refs = 0;
>+    
>+    if (mStreamOut) {
>+        mStreamOut->AsyncWait(nsnull, 0, 0, nsnull);
>+        mMgr->RecvdConnect();
>+        mStreamOut = nsnull;
>+        refs++;

Why is actually needed to AddRef manually in SetupStreams() when the stream holds a ref to the callback (this object) ?

>+    mEnt = nsnull;
>+    for (PRInt32 i = 0; i < refs; i++) {
>+        NS_RELEASE_THIS();
>+    }

I would like to get rid of coding like this, even though it works correctly.

>+nsHalfOpenSocket::OnOutputStreamReady(nsIAsyncOutputStream *out)
>+        conn->SetIsReused(PR_TRUE);

I really want to do this a different way.  This might lead to duplicate requests being sent and processed by the server.  I would rather allow restart of a transaction when the connection was lingering for a while.  You are probably a more experienced to specify the time and decide, but something around a second or two could make it?  This is important.

>+    class nsHalfOpenSocket : public nsIOutputStreamCallback,
>+        nsHttpConnectionMgr            *mMgr;

You probably don't need to reference the manager.  It is accessible through gHttpHandler->ConnMgr().

>         nsHttpConnectionInfo        *mConnInfo;
>         nsTArray<nsHttpTransaction*> mPendingQ;    // pending transaction queue
>         nsTArray<nsHttpConnection*>  mActiveConns; // active connections
>         nsTArray<nsHttpConnection*>  mIdleConns;   // idle persistent connections
>+        nsTArray<nsHalfOpenSocket*>  mHalfOpens;

I would love to turn these all to nsTArray< nsRefPtr<ns*> > one day.  But not in this bug, it is complex a lot already.


In general, please add LOGs to your new code.  When something goes wrong, we have to be able to trace it.

r- for the IsReused flag thing.



So, I did testing on this patch, and it works well.  

However, when I was suggesting to move this to the connection manager and using the code we already have I hoped the patch will simplify.  Was there a particular reason to create a new class for a half open connection and introduce a new array for it?

I played with this a bit as well and I created a patch that is throwing all transaction to the pending queue and just ensures connections for all of them (if in the limit) which pop transactions from PendingQ in OnOutputStreamReady.  There is relation one nsHttpTransaction <- two nsHttpConnections each with its own socket transport.  Connections are marked as 'half-open' until OOSR() has been reached.  I wanted to check it is doable and possibly suggest to turn rather the way of a simpler patch, however I would need more time to stabilize it and it would end up doing your work, what no one wants.  Just enfolding that to make a note we could potentially go simpler next time when doing changes like these.
Comment 11 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2011-02-15 17:28:09 PST
(In reply to comment #3)
> Why not be more aggressive than Chrome? e.g. upon navigating to a document,
> open two connections immediately to the document's host, on the assumption 
> that we are going to open a second connection for sub-resources (scripts,
> css, images, etc.) from the same host?

IE9 and Chrome do this, according to this blog post:
http://www.belshe.com/2011/02/10/the-era-of-browser-preconnect/

It doesn't seem great to wait 250ms for something that we could wait 0ms to do.
If we had preconnect, would we even need syn retry? Could we make zero the default value of network.http.connection-retry-timeout?
Comment 12 Patrick McManus [:mcmanus] 2011-02-15 18:54:48 PST
(In reply to comment #11)

> IE9 and Chrome do this, according to this blog post:
see 632948
Comment 13 Patrick McManus [:mcmanus] 2011-02-15 18:55:50 PST
> see 632948

That was dumb. see 634278
Comment 14 Patrick McManus [:mcmanus] 2011-02-15 19:30:56 PST
(In reply to comment #10)

> >+nsHalfOpenSocket::OnOutputStreamReady(nsIAsyncOutputStream *out)
> >+        conn->SetIsReused(PR_TRUE);
> 
> I really want to do this a different way.  This might lead to duplicate
> requests being sent and processed by the server.  I would rather allow restart
> of a transaction when the connection was lingering for a while.  You are
> probably a more experienced to specify the time and decide, but something
> around a second or two could make it?  This is important.
> 

can you be more clear on what your concern is here? This is exactly the same algorithm as the code that was previously merged which I do not believe had a problem. If the extra connection is not used for 5 seconds, for whatever reason, it is thrown away... and this code is only governing connections that turn out to be superfluous.

Marking it reused allows us to recover from the race condition of receiving a FIN while our request is in flight and only in those cases where there has been no data recvd. This is exactly what happens on every use of a persistent connection, which is well over 50% of all transactions in my observations - there is no real risk of a fin when we send the request as soon as the connection is open but if we wait even a second or two it becomes a rare possibility that the server generated fin and our request pass as ships in the night. Normally you have 30 or 60 seconds before the close, but the other day I saw nginx configured with a non-standard timeout of 4 seconds. I don't think we need to shoot for 100% under the minimum, 5 seconds captures the vast majority of cases, handling the close gracefully with isreused() is enough.

landing 604796 would help avoid most of the restarts, right now we see a bunch of them because we only check for fins on idle connections on a timer instead of the fin arrival event. Admittedly they cause a restart() in firefox, but the request never makes it to the wire the first time for this particular class of transaction.

I'm not sure what the alternative is. If we let the transaction that gets the FIN race condition generate an error (instead of restarting because the connection is not marked isreused) then we open bug 614677 again. Arguments about resending POSTs don't resonate with me for a whole bunch of reasons, but primarily because we already do exactly this in the incredibly common scenario of an idle persistent connection reuse. We're not breaking any new ground, these superfluous connections are just more idle pconns ready for use.
Comment 15 Patrick McManus [:mcmanus] 2011-02-15 20:36:24 PST
> Was there a
> particular reason to create a new class for a half open connection and
> introduce a new array for it?
>

it seems a natural place to store the redundant data about primary and backup transports and streams.. I tried to isolate the nshttpconnection from all of that in this version of the patch. More or less managing the sockets was new functionality for the CM to have and I kind of like having it live mostly in its own class.


>>+        mStreamOut->AsyncWait(nsnull, 0, 0, nsnull);

> Why is actually needed to AddRef manually in SetupStreams() when the stream
> holds a ref to the callback (this objec

I didn't want the asyncwait cancellation (asyncwait(nsnull,0,0,nsnull)) to drop the last ref on the object part way through abandon().. I will reorganize that through a series of local variable swaps that will make abandon() safe.. do you happen to know if nsItimer holds a ref on its closure? (I don't think it does off the top of my head).
Comment 16 Honza Bambas (:mayhemer) 2011-02-16 13:20:38 PST
(In reply to comment #14)
> (In reply to comment #10)

I understand why you set the flag.  I'll try to explain my concern differently:

In the current code, w/o this patch, we allow automatic restart on network failure only if no data has been sent AND no data received or the connection has already been passed a request/response transaction (is reused).  It means that we are using the connection for a SECOND request and it failed probably because it was a stall connection.

With your patch, we also allow restart for a connection that has never been used before (no data passed up nor down through it) and lingers for 0 to 5000 milliseconds.  Important is that the connection may be used immediately (very close to 0 ms after it has been established).  In that case when a failure is detected, we cannot (must not) assume the connection was a stall connection.  There is a risk we have sent the data, the server processed the request, but the connection has been dropped (before the server could send back the response) from a real reason, not because the connection stalled.  Then we must not restart, but rather say there were an error on the wire.

So, my suggestion is to allow restart if and only if the connection has been lingering unused (better say: never used) for some amount of time.  The time is more up to you then me, because you have probably a better real-world experience with servers and accelerators out there, for instance see your bug 613977 comment 19, paragraph 3.

If it is still unclear or you believe I'm not right, lets discuss it further by mail out of this bug.

(In reply to comment #15)
> it seems a natural place to store the redundant

Yep, I don't have a strong opinion, just asking.  Anyway, have you considered to create a second nsHttpConnection object for the backup conn rather then just a transport held in mBack* members of the new object?  You always create nsHttpConnection for an established socket transport anyway.  It seemed to me much simpler approach and I expected you did it that way.  Actually a lot of code could be removed and the patch be generally simplified.

> through a series of local variable swaps that will make abandon() safe.. do you
> happen to know if nsItimer holds a ref on its closure? (I don't think it does
> off the top of my head).

Yep, just do nsRefPtr<nsHalfOpenSocket> kungFuDeathGrip(this) if needed.

nsITimer doesn't hold a ref to the closure when using InitWithCallback, it sees it as a void pointer only.  However, using nsIObserver and Init() method does it, so you don't need to bother with proper placing of NS_ADDREF()/NS_RELEASE().  You just need to cancel the timer in destructor, what you already do I think.
Comment 17 Patrick McManus [:mcmanus] 2011-02-16 13:44:26 PST
(In reply to comment #16)

> 
> So, my suggestion is to allow restart if and only if the connection has been
> lingering unused (better say: never used) for some amount of time.  

ah this helps a lot - thank you. I was confused by the way 'lingering' was used. Its a reasonable extra level of cautiousness.
Comment 18 Patrick McManus [:mcmanus] 2011-02-17 11:30:41 PST
Created attachment 513192 [details] [diff] [review]
syn retry v3

updated to reflect review comments
Comment 19 Honza Bambas (:mayhemer) 2011-03-07 14:48:16 PST
Comment on attachment 513192 [details] [diff] [review]
syn retry v3

># HG changeset patch
># Parent 19d6c295cfec3fb8e3e16dc24abacf6656e4ac49
>Accelerate TCP connection retries in HTTP
>
>Losing a TCP SYN requires a long painful (typically 3 second) delay
>before being retried. This patch creates a second parallel connection
>attempt for any nsHttpConnection which has not become writable before
>a timeout occurs.
>
>If you assume .5% packet loss, this converts a full 3 second delay
>from a 1 in 200 event into a 1 in 40,000 event.
>
>Whichever connection establishes itself first is used. If another one
>has been started and it does connect before the one being used is
>closed then the extra one is handed to the connection manager for use
>by a different transaction - essentially a persistent connection with
>0 previous transactions on it. (Another way to think about is
>pre-fetching a 3WHS on a high latency connection).
>
>The pref network.http.connection-retry-timeout controls the amount of
>time in ms to wait for success on the initial connection before beginning
>the second one. Setting it to 0 disables the parallel connection, the
>default is 250.
>

Personally, I don't think we want such a long comment in the push log.  Reference to the bug # is enough in general.


>+nsHttpConnection::Init(nsHttpConnectionInfo *info,
>     LOG(("nsHttpConnection::Init [this=%x]\n", this));

Add reference to the transport and streams to the log.

> nsHttpConnection::CanReuse()
> {
>+    // An idle persistent connection should not have data waiting to be read
>+    // before a request is sent. Data here is likely a 408 timeout response
>+    // which we would deal with later on through the restart logic, but that
>+    // path is more expensive than just closing the socket now. SSL check can
>+    // be removed with fixing of 631801
>+
>+    PRUint32 dataSize;
>+    if (rv && mSocketIn && !mConnInfo->UsingSSL() &&
>+        NS_SUCCEEDED(mSocketIn->Available(&dataSize)) && dataSize) {
>+        LOG(("Socket not reusable because read data pending on it: %s %d\n",
>+             mConnInfo->Host(), dataSize));
>+        rv = PR_FALSE;
>+    }
>+    return rv;

Why wasn't this code needed in the previous version of the patch?

>+PRBool
>+nsHttpConnection::IsReused()
>+{
>+    return mIsReused || 
>+        (mConsiderReusedAfter && (PR_Now() >= mConsiderReusedAfter));
>+}
>+
>+void
>+nsHttpConnection::SetIsReusedAfter(PRUint32 afterMiliseconds)
>+{
>+    mConsiderReusedAfter = PR_Now() + afterMiliseconds * PR_USEC_PER_MSEC;
>+}
>+

Please don't use PR_Now(), it is not monotonically increasing.  Use PR_IntervalNow() or mozilla::TimeStamp.  The letter only if doesn't mean to modify dependencies.

I understand we are using PR_Now() in the http code, that should probably be fixed as well, but not in this bug.  Let's just not repeat the same mistake for a new code.

>+    void     SetIsReused(PRBool val) {mIsReused = val;}

You don't need this method.

>+nsresult
>+nsHttpConnectionMgr::
>+nsHalfOpenSocket::SetupStreams(nsISocketTransport **transport,
>+                               nsIAsyncInputStream **instream,
>+                               nsIAsyncOutputStream **outstream)

Wrong indention.

Preserve new lines in this method that were in nsHttpConnection::CreateTransport.  This is not very well readable code.

>+nsresult
>+nsHttpConnectionMgr::nsHalfOpenSocket::SetupPrimaryStreams()
>+{
>+    nsresult rv = SetupStreams(getter_AddRefs(mSocketTransport),
>+                               getter_AddRefs(mStreamIn),
>+                               getter_AddRefs(mStreamOut));
>+    LOG(("SetupPrimaryStream %x %s %X", this, mEnt->mConnInfo->Host(), rv));
>+    if (NS_FAILED(rv)) {
>+        if (mStreamOut) {
>+            gHttpHandler->ConnMgr()->RecvdConnect();

You probably don't need to call RecvdConnect() here.  If SetupStreams() fails it won't increase the counter.

The same for nsHttpConnectionMgr::nsHalfOpenSocket::SetupBackupStreams()

>+void
>+nsHttpConnectionMgr::nsHalfOpenSocket::Abandon()
>+{
>+    nsCOMPtr<nsIAsyncOutputStream> primary, secondary;
>+    LOG(("nsHalfOpen::Abandon %x %s", this, mEnt->mConnInfo->Host()));
>+    if (mStreamOut) {
>+        gHttpHandler->ConnMgr()->RecvdConnect();
>+        primary.swap(mStreamOut);
>+    }
>+    if (mBackupStreamOut) {
>+        gHttpHandler->ConnMgr()->RecvdConnect();
>+        secondary.swap(mBackupStreamOut);
>+    }
>+    if (mSynTimer) {
>+        mSynTimer->Cancel();
>+        mSynTimer = nsnull;
>+    }
>+    
>+    mEnt = nsnull;
>+    if (primary)
>+        primary->AsyncWait(nsnull, 0, 0, nsnull);
>+    if (secondary)
>+        secondary->AsyncWait(nsnull, 0, 0, nsnull);
>+}

If mSynTimer is the only referrer to this nsHalfOpenSocket object then you will crash on the 'mEnt = nsnull;' line.  Use nsRefPt<nsHalfOpenSocket> kungFuDeathGrip(this) here to protect against early release.

>+NS_IMETHODIMP
>+nsHttpConnectionMgr::
>+nsHalfOpenSocket::OnOutputStreamReady(nsIAsyncOutputStream *out)
>+    if (out == mStreamOut) {
>+        rv = conn->Init(mEnt->mConnInfo,
>+                        gHttpHandler->ConnMgr()->mMaxRequestDelay,
>+                        mSocketTransport, mStreamIn, mStreamOut,
>+                        callbacks, callbackTarget);
>+        mStreamOut = nsnull;
>+    }
>+    else {
>+        rv = conn->Init(mEnt->mConnInfo,
>+                        gHttpHandler->ConnMgr()->mMaxRequestDelay,
>+                        mBackupTransport, mBackupStreamIn, mBackupStreamOut,
>+                        callbacks, callbackTarget);
>+        mBackupStreamOut = nsnull;
>+    }
>+
>+    if (NS_FAILED(rv))
>+        return rv;

Not sure about this, but shouldn't you throw mSocketTransport (or mBackupTransport) and streams away here?  There might probably still be a cycle between this nsHalfOpenSocket and m*Transport's progress sink or security callbacks.  Worth to double check this.



According to LOGs() you have added:
- use following format pattern:
"FullClassName::FullMethodName [this=%p ent=%s rv=%X]"
- don't use %x for logging pointer in your new logs or logs you modify
- add logs to constructor and destructor of your new class
- add more logs to nsHalfOpenSocket::OnOutputStreamReady logic:
  - that you are canceling the timer
  - log the pointer value of the new connection object you have created
  - definitely that conn->Init failed and with what result



As a transaction might get progress information from two separate transports, it might get it duplicated and messed in order (from point of view of the consumer).  This might confuse consumers like net monitor of Firebug.  We should fix this, but probably not in this bug.  It's way too complicated enough.  The way we should fix it should be to don't allow a 'lower' state be propagated through the same transaction that has already observed a 'higher' state, but this is just the first thought.


How far are you with the tests for this functionality?
Comment 20 Patrick McManus [:mcmanus] 2011-03-07 15:13:29 PST
(In reply to comment #19)

> >+    PRUint32 dataSize;
> >+    if (rv && mSocketIn && !mConnInfo->UsingSSL() &&
> >+        NS_SUCCEEDED(mSocketIn->Available(&dataSize)) && dataSize) {
> >+        LOG(("Socket not reusable because read data pending on it: %s %d\n",
> >+             mConnInfo->Host(), dataSize));
> >+        rv = PR_FALSE;
> >+    }
> >+    return rv;
> 
> Why wasn't this code needed in the previous version of the patch?
> 

because this version is better :)

This code is an optimization, the previous version worked. This catches the case of some servers that send a timeout response (typically a 408) without a request being sent, which is a bug imho but that's kind of beside the point. If we reuse that connection we do deal with the 408 fine - but it involves restart()ing the transaction and sending it twice. This code lets us avoid that rigamarole when its clear the connection isn't in a clean state for reuse.

Due to races in when that 408 stuff arrives vs when the request is sent, all the old code is still needed, but this catches the base case and just says - "use a different connection".
Comment 21 Honza Bambas (:mayhemer) 2011-03-07 15:25:44 PST
OK, so this piece of code is actually not needed for the patch, right?  We could actually land it separately?

My point is that you are adding more code to an already reviewed patch.  Please try to avoid this next time and rather open a new bug or at least explain your changes in the comment when adding a new patch version.  

Let's go with this for now also because we have already talked about this optimization.

Can you just rename the variable 'rv' to 'canReuse' ?
Comment 22 Patrick McManus [:mcmanus] 2011-03-07 17:08:32 PST
> >+nsresult
> >+nsHttpConnectionMgr::
> >+nsHalfOpenSocket::SetupStreams(nsISocketTransport **transport,
> >+                               nsIAsyncInputStream **instream,
> >+                               nsIAsyncOutputStream **outstream)
> 
> Wrong indention.
> 

please tell me your preferred style in this case which is 80-col challenged. This is my preference but I'll go whatever way the wind is blowing.
Comment 23 Honza Bambas (:mayhemer) 2011-03-09 10:49:56 PST
(In reply to comment #22)
> please tell me your preferred style in this case which is 80-col challenged.
> This is my preference but I'll go whatever way the wind is blowing.

Ah!  My apologies.  This looks like a Firefox/Win7 bug: even a Courier fixed size font used by bugzilla is displayed with a variable width.  It appears on my screen like the first letter of the type name on the second line would be under the opening bracket, unaligned.  On OSX the patch looks well formed.
Comment 24 Patrick McManus [:mcmanus] 2011-03-09 12:21:20 PST
(In reply to comment #19)

> >+void
> >+nsHttpConnectionMgr::nsHalfOpenSocket::Abandon()
> >+{
> >+    nsCOMPtr<nsIAsyncOutputStream> primary, secondary;
> >+    LOG(("nsHalfOpen::Abandon %x %s", this, mEnt->mConnInfo->Host()));
> >+    if (mStreamOut) {
> >+        gHttpHandler->ConnMgr()->RecvdConnect();
> >+        primary.swap(mStreamOut);
> >+    }
> >+    if (mBackupStreamOut) {
> >+        gHttpHandler->ConnMgr()->RecvdConnect();
> >+        secondary.swap(mBackupStreamOut);
> >+    }
> >+    if (mSynTimer) {
> >+        mSynTimer->Cancel();
> >+        mSynTimer = nsnull;
> >+    }
> >+    
> >+    mEnt = nsnull;
> >+    if (primary)
> >+        primary->AsyncWait(nsnull, 0, 0, nsnull);
> >+    if (secondary)
> >+        secondary->AsyncWait(nsnull, 0, 0, nsnull);
> >+}
> 
> If mSynTimer is the only referrer to this nsHalfOpenSocket object then you will
> crash on the 'mEnt = nsnull;' line.  

I don't think that is true - the function is arranged to not touch the object after the mEnt = nsnull line.


>Use nsRefPt<nsHalfOpenSocket>
> kungFuDeathGrip(this) here to protect against early release

However, I like your suggestion. So I will do that (calling it deleteProtector instead even though I am aware of the kungFu precedent) and also get rid of the primary, and secondary temporary stack variables which will no longer be necessary.
Comment 25 Honza Bambas (:mayhemer) 2011-03-09 12:50:37 PST
(In reply to comment #24)
> calling it deleteProtector

Please use kungFuDeathGrip.  It is common in the source code and everybody then easily recognize what is going on.
Comment 26 Patrick McManus [:mcmanus] 2011-03-10 06:39:30 PST
(In reply to comment #19)

> 
> How far are you with the tests for this functionality?

Because we can't seem to blackbox test this due to interactions with network equipment, I'm working on a handful of tests which directly insturment the code and more or less simulate similar events to the packet loss.

I'm really trying to keep it to the minimum as it has the potential to be invasive and I'm not convinced 100% of their value anyhow. (I don't mean of tests in general, I mean these tests which are just making rough simulations of what would happen in a packet drop situation). but they're coming - now that I have a websockets draft out, they'll come faster :)

Of course - Now I also have to work through the other 9 or 10 pipelining patches and remove PR_Now() :)

I have hand tested all of this using iptables and null routes, so that's something good.
Comment 27 Patrick McManus [:mcmanus] 2011-03-10 06:41:15 PST
Created attachment 518362 [details] [diff] [review]
syn retry v4

updates for review comments modulo c24
Comment 28 Honza Bambas (:mayhemer) 2011-03-17 13:22:49 PDT
Comment on attachment 518362 [details] [diff] [review]
syn retry v4

>+    LOG(("nsHalfOpenSocket::nsHalfOpenSocket [this=%p trans=%p ent=%s]\n",
>+         this, trans, ent->mConnInfo->Host()));

Please change the ctor/dtor logs to "Creating .." and "Destroying .." as nsHttpConnection and other classes do.

>+    LOG(("nsHalfOpenSocket::~nsHalfOpenSocket [this=%p]\n"));

You forgot to add 'this' as an arg.

>+    LOG(("nsHalfOpenSocket::Abandon [this=%p ent=%s]",
>+         mEnt->mConnInfo->Host()));

'this' missing among args.

>+        LOG(("nsHalfOpenSocket::OnOutputStreamReady %p "
>+             "Backup connection timer canceled\n", this));

It might be enough to just log "   backup connection timer canceled\n" in these nested logs, but up to you.

r=honzab with those nits.
Comment 29 Honza Bambas (:mayhemer) 2011-03-17 13:29:33 PDT
Comment on attachment 518362 [details] [diff] [review]
syn retry v4

Forgot to mark this r+ first.

However!  I want to re-check the code you handle PR_IntervalNow().  I'll let you know soon.

For now please consider the patch as unreviewed.
Comment 30 Honza Bambas (:mayhemer) 2011-03-17 13:52:17 PDT
PR_IntervalNow() may overflow (overlap).  Maybe what I want to say will be better explained by a graph:

      B  2  C                     1   A
   [XXXXX|---------  ... ---------|XXXXX]
   
   
[  ] is the range of PR_IntervalValues() - actually the whole PRUint64 or 32, not sure at the moment.

XXXXXXXXXX is the delay to not consider the connection as reused

1 is the time you mark a connection to be considered as stall after a certain amount of time

2 is a value you then have stored in mConsiderReusedAfter, with v4 of the patch

A,B,C are times you check for a connection to be considered stall.  Only at time C your check must return true.  But your code will also return true for time A, which is wrong.

The solution is:
- store the time 1 to a field name e.g. mEpoch, i.e. do mEpoch = PR_IntervalNow()
- if you want the time for consideration to reuse be parametric, then store also the time you want for the delay ; you may use your new member for that
- when checking you have to do the following math:

    isReused/Stall = ((PR_IntervalNow() - mEpoch) >= mConsiderReusedAfter);

the subtraction ensure correct handling of the overlap.

r=honzab with that fixed as well.
Comment 31 Patrick McManus [:mcmanus] 2011-03-21 08:10:54 PDT
Created attachment 520642 [details] [diff] [review]
syn retry v5

r=honzab

updates from c28 and c30

I really appreciated comment 30 btw. you might want to take a quick look to make sure I got it right, but I'm confident in it.
Comment 32 Honza Bambas (:mayhemer) 2011-03-21 16:01:55 PDT
Comment on attachment 520642 [details] [diff] [review]
syn retry v5

Looks good.

>+nsHttpConnection::IsReused()
>+{
>+    if (mIsReused) return PR_TRUE;
>+    if (!mConsiderReusedAfterInterval) return PR_FALSE;

Maybe change to

if (mIsReused) 
  return PR_TRUE;

if (!mConsiderReusedAfterInterval)
  return PR_FALSE;

before landing.  Better for debugging.
Comment 33 Patrick McManus [:mcmanus] 2011-03-22 11:48:33 PDT
Created attachment 520980 [details] [diff] [review]
syn retry v6

r=honzab

whitespace updates from c32
Comment 34 tenzoKritz 2011-03-30 08:25:35 PDT
Is this not going in to Fx5?
Comment 35 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-03-31 12:34:54 PDT
Patrick, you should add this to your ~/.hgrc:

 [defaults]
 qnew = -U

so your patches will have From lines....
Comment 36 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-03-31 22:02:35 PDT
http://hg.mozilla.org/projects/cedar/rev/1714b4facb21
Comment 37 :Ehsan Akhgari (out sick) 2011-04-01 13:36:06 PDT
http://hg.mozilla.org/mozilla-central/rev/faa363b57d48

dholbert said that this breaks his build with gcc 4.5.
Comment 38 Daniel Holbert [:dholbert] 2011-04-01 13:43:35 PDT
Yeah - the build failure is:
> nsHttpConnectionMgr.cpp: In member function ‘virtual nsresult nsHttpConnectionMgr::nsHalfOpenSocket::OnOutputStreamReady(nsIAsyncOutputStream*)’:
> nsHttpConnectionMgr.cpp:1411:139: error: cannot pass objects of non-trivially-copyable type ‘class nsRefPtr<nsHttpConnection>’ through ‘...’
> nsHttpConnectionMgr.cpp:1442:141: error: cannot pass objects of non-trivially-copyable type ‘class nsRefPtr<nsHttpConnection>’ through ‘...’

The failure is from this code:
> 1410 nsRefPtr<nsHttpConnection> conn = new nsHttpConnection();
> 1411 LOG(("nsHalfOpenSocket::OnOutputStreamReady "
> 1412       "Created new nshttpconnection %p\n", conn));

I think s/conn/conn.get()/ should fix this?  Trying locally...
Comment 39 Daniel Holbert [:dholbert] 2011-04-01 13:49:36 PDT
Created attachment 523673 [details] [diff] [review]
followup to fix gcc 4.5 issue

Yup, .get() fixes this.  Here's the followup patch.
Comment 40 :Ehsan Akhgari (out sick) 2011-04-01 14:36:52 PDT
Comment on attachment 523673 [details] [diff] [review]
followup to fix gcc 4.5 issue

OMG, don't know how I missed this!
Comment 41 Daniel Holbert [:dholbert] 2011-04-01 14:45:43 PDT
Landed followup: http://hg.mozilla.org/mozilla-central/rev/df19f4679db1
Comment 42 Bill Gianopoulos [:WG9s] 2011-04-02 06:48:47 PDT
Just to put my $0.02 in here, SYN retry is, unfortunately, likely to trigger SYN flood blocking on the site you are connecting to and be really counter productive.
Comment 43 Patrick McManus [:mcmanus] 2011-04-02 08:30:23 PDT
(In reply to comment #42)
> Just to put my $0.02 in here, SYN retry is, unfortunately, likely to trigger
> SYN flood blocking on the site you are connecting to and be really counter
> productive.

This code doesn't actually change the total maximum number of outstanding syns to any host by more than 1, so I'm not worried.
Comment 44 Patrick McManus [:mcmanus] 2011-04-02 08:32:34 PDT
ehsan, daniel, thanks for landing this and fixing the .get() while I was traveling.. a nice surprise to see in my mailbox this morning.
Comment 45 Bill Gianopoulos [:WG9s] 2011-04-02 08:35:23 PDT
(In reply to comment #43)
> (In reply to comment #42)
> > Just to put my $0.02 in here, SYN retry is, unfortunately, likely to trigger
> > SYN flood blocking on the site you are connecting to and be really counter
> > productive.
> 
> This code doesn't actually change the total maximum number of outstanding syns
> to any host by more than 1, so I'm not worried.

That is probably not going to trigger SYN flood avoidance then.
Comment 46 Jo Hermans 2011-04-02 11:35:23 PDT
(In reply to comment #42)
> Just to put my $0.02 in here, SYN retry is, unfortunately, likely to trigger
> SYN flood blocking on the site you are connecting to and be really counter
> productive.

I work for a major router manufacturer, but I can assure you that it won't trigger the SYN-flood protection. We're not counting the number of SYN's per second, but the number of connections. Otherwise you're only protected against SYN attack, not against attacks that open a regular connection, then nothing more. Like any garden variety script-kiddie will do.

Note that there are lots of devices that don't respect the 3-second timer anyway, changing that won't cause bug problems.
Comment 47 Eric Shepherd [:sheppy] 2011-04-13 02:11:12 PDT
Actually, after further review, I don't think this really does impact docs, so removing the doc needed keyword. If I'm mistaken, please explain how. :)
Comment 48 Masatoshi Kimura [:emk] 2011-07-01 13:04:27 PDT
Actually, this change missed the Fx5 train.
Comment 49 Tobias Oberstein 2011-07-31 02:47:18 PDT
I have a question if this feature might be related to the following behavior with WebSockets in FF:

When WebSocket is opened from JS, FF will connect using 1 TCP connection, perform WebSockets handshake, and immediately after the handshake has finished open a 2nd TCP connection to the server.

On that 2nd TCP, no data is sent by FF at all, and the TCP is actively closed by FF after approx. 5s.

My concern is: there might be WebSockets servers which want to implement protection mechanisms like: block any IP from which a) TCP was accepted, but WebSockets handshake was never finished or for which b) no data is received after TCP open within N seconds.

There is a little bit more descr. on the third comment on https://bugzilla.mozilla.org/show_bug.cgi?id=664344

Maybe this is all unrelated to the feature here .. but I thought it might be ok to ask.
Comment 50 Patrick McManus [:mcmanus] 2011-07-31 06:10:17 PDT
we'll deal with this in the websockets bug, not here.
Comment 51 Honza Bambas (:mayhemer) 2011-08-16 09:06:44 PDT
Bug 679090, BTW, shows that the timer implementing TCP early retry is started at bad time.  It should start when CONNECTING_TO state is observed on the primary socket.  Also, I think that simply when any other state is observed the timer has to be canceled (e.g. when connection to a host fail) and restarted on any subsequent CONNECTING_TO.

I have also seen we open significant number of connections that are never used when acceleration is used.

Based on that, we should check the code for HTTP acceleration again and make it more robust.

Note You need to log in before you can comment on or make changes to this bug.