623948 - Re-Implement HTTP Connection Acceleration (i.e. Syn Retry)

Assignee

Description

•

14 years ago

This feature was briefly implemented as part of Gecko 2.0 and removed before release. It can be re added in version.next with the following changes: * tests (was bug 623201) * investigate client certificate issues (was bug 623785) * rearrangement to drive the connections from the connection manager code instead of the nsHttpConnection code. (613977 comment 79) The backout bug and patch are found in bug 623921.. the actual commit was http://hg.mozilla.org/mozilla-central/rev/c1b1fde638f9 The background information on this feature (from its original bug 592284) is: Some packets are more painful to lose than others. In particular, the SYN setting up an HTTP channel is really painful because the OS relies on large constant values before resending it. The values are different between platforms, but sending a retry after 3 seconds and another at the 9 second mark (6 seconds after the first retry) is typical. Obviously a 3 second blocker in reaction to a single lost packet is very painful. Chrome still lets the OS set the retry rate, but if 250ms expires after sending the first connection packet it opens a second parallel connection. If the problem is just a normal packet loss that happened to impact the first syn, then things get going well. It makes a lot of sense - this is a very small amount of very useful traffic, so it is worth the risk of duplication. This is the Firefox pattern: 00:00:00.000000 IP 192.168.16.214.53431 > 192.168.37.1.80: 00:00:02.999657 IP 192.168.16.214.53431 > 192.168.37.1.80: 00:00:08.999659 IP 192.168.16.214.53431 > 192.168.37.1.80: This is the Chrome pattern: 00:00:00.000000 IP 192.168.16.214.53429 > 192.168.37.1.80: 00:00:00.251162 IP 192.168.16.214.53430 > 192.168.37.1.80: 00:00:02.991019 IP 192.168.16.214.53429 > 192.168.37.1.80: 00:00:03.251023 IP 192.168.16.214.53430 > 192.168.37.1.80: 00:00:08.991023 IP 192.168.16.214.53429 > 192.168.37.1.80: 00:00:09.251025 IP 192.168.16.214.53430 > 192.168.37.1.80: Some anecdotal positive performance data on this feature is found here: http://bitsup.blogspot.com/2010/12/accelerated-connection-retry-for-http.html

Patrick McManus [:mcmanus]

Assignee

Updated

•

14 years ago

Assignee: nobody → mcmanus

Status: NEW → ASSIGNED

:Ms2ger (he/him; ⌚ UTC+1/+2)

Updated

•

14 years ago

Depends on: post2.0

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 3

•

14 years ago

Why not be more aggressive than Chrome? e.g. upon navigating to a document, open two connections immediately to the document's host, on the assumption that we are going to open a second connection for sub-resources (scripts, css, images, etc.) from the same host? Note that I'm not suggesting we should always open 2x connections for each connection we need, but only do so when we have a reasonable expectation of opening a second connection to the same server. Or, even extend this so that, for a cached document, we immediately open up the same number of connections to the same host that we used in the last connection for that document.

Patrick McManus [:mcmanus]

Assignee

Comment 4

•

14 years ago

(In reply to comment #3) > Why not be more aggressive than Chrome? e.g. upon navigating to a document, > open two connections immediately to the document's host, on the assumption that > we are going to open a second connection for sub-resources (scripts, css, > images, etc.) from the same host? > right - this whole area starts to drift into prefetching connections pretty quickly and certainly even limiting ourselves to timeout based retries sometimes picks that up as a side benefit. And that's probably a good idea, that I'm more willing to explore in ff.next than in ff.4.. one case I worry about is "redirect to canonical hostname" which probably only uses 1 connection. Same for a bunch of cases of revalidation - but at least we can guess that one at request time and not prefetch accordingly. in any event, there are all sorts of variations that can be tried. if we are going to start speculatively opening connections we need to be more aggressive about actively closing them when they are not needed which we are extremely casual about right now.

Emanuel Hoogeveen [:ehoogeveen]

Comment 5

•

14 years ago

Is that hardcoded maximum connection limit of 50 still in effect? I don't remember ever seeing a bug filed specifically to address it, but then I'm not exactly in the loop..

Patrick McManus [:mcmanus]

Assignee

Comment 6

•

14 years ago

(In reply to comment #5) > Is that hardcoded maximum connection limit of 50 still in effect? yes > I don't > remember ever seeing a bug filed specifically to address it, but then I'm not > exactly in the loop.. 607741

Jürgen Kreileder

Comment 7

•

14 years ago

Note that bug 623785 wasn't caused by this.

Daniel C

Updated

•

14 years ago

Depends on: 623921

Patrick McManus [:mcmanus]

Assignee

Comment 8

•

14 years ago

Attached patch syn retry v1 (obsolete) — Details — Splinter Review

Patch refactored to be driven from connection manager. This has a really cool side effect for even normal (non retry) connections: if a persistent connection becomes available for use after we start to open the TCP connection, but before we complete it then we will use the pconn instead of waiting for connection completion as we had been doing in the past. Anecdotally, this seems to happen to a lot of our new connections: http://bitsup.blogspot.com/2011/02/separating-http-connections-from-tcp.html

Attachment #509492 - Flags: feedback?(honzab.moz)

Patrick McManus [:mcmanus]

Assignee

Comment 9

•

14 years ago

Attached patch syn retry v2 (obsolete) — Details — Splinter Review

Attachment #509492 - Attachment is obsolete: true

Attachment #510735 - Flags: review?(honzab.moz)

Attachment #509492 - Flags: feedback?(honzab.moz)

Patrick McManus [:mcmanus]

Assignee

Updated

•

14 years ago

Depends on: 631801

Patrick McManus [:mcmanus]

Assignee

Updated

•

14 years ago

Blocks: 634278

Honza Bambas (:mayhemer)

Comment 10

•

14 years ago

Comment on attachment 510735 [details] [diff] [review] syn retry v2 > nsSocketTransport::SetSecurityCallbacks(nsIInterfaceRequestor *callbacks) >+ SOCKET_LOG(("Reset callbacks for secinfo=%p callbacks=%p\n", mSecInfo.get(), mCallbacks.get())); 80 lines please. >+nsHttpConnection::Init(nsHttpConnectionInfo *info, PRUint16 maxHangTime, >+ nsISocketTransport *transport, nsIAsyncInputStream *instream, >+ nsIAsyncOutputStream *outstream, >+ nsIInterfaceRequestor *callbacks, >+ nsIEventTarget *callbackTarget) Indention please. >+ >+ Remove these new lines. > { >+ NS_ABORT_IF_FALSE (transport && instream && outstream, "init precond"); Remove the space before the opening bracket and finish the text please. >+ mSocketTransport = transport; >+ mSocketIn = instream; >+ mSocketOut = outstream; >+ nsresult rv = mSocketTransport->SetEventSink(this, nsnull); >+ NS_ENSURE_SUCCESS(rv, rv); >+ mCallbacks = callbacks; >+ mCallbackTarget = callbackTarget; >+ rv = mSocketTransport->SetSecurityCallbacks(this); >+ NS_ENSURE_SUCCESS(rv, rv); >+ This is of a personal taste, but I would rather add new lines between blocks that logically belongs together. At least after NS_ENSUREs it became a habit. >+++ b/netwerk/protocol/http/nsHttpConnectionMgr.cpp > if (!offline) { > sts = do_GetService(NS_SOCKETTRANSPORTSERVICE_CONTRACTID, &rv); >+ if (!mSTS && sts) { >+ mSTS = do_QueryInterface(sts); >+ NS_ABORT_IF_FALSE(mSTS.get(), "nonsensical failed qi"); >+ } Why exactly is this needed? Or is that a left over? >@@ -469,16 +479,17 @@ nsHttpConnectionMgr::PruneDeadConnection > // if this entry is empty, then we can remove it. > if (ent->mIdleConns.Length() == 0 && > ent->mActiveConns.Length() == 0 && >+ ent->mHalfOpens.Length() == 0 && > ent->mPendingQ.Length() == 0) { Indention of '== 0 &&' ? >+nsHttpConnectionMgr::CreateTransport(nsConnectionEntry *ent, >+ NS_ABORT_IF_FALSE (mSTS, "no socket transport?"); No space before ( >+ >+ nsRefPtr<nsHalfOpenSocket> sock = new nsHalfOpenSocket(ent, this, trans); >+ nsresult rv = sock->SetupPrimaryStreams(); >+ NS_ENSURE_SUCCESS(rv, rv); SetupPrimaryStreams() may create circular references on the 'socket', AsyncWait does it. In case of failure, you should call Abandon(), or ensure that SetupPrimaryStreams() does release all circulars it self. >+ StartedConnect(); Could this potentially be rather called from nsHalfOpenSocket to leave it enclosed a bit? >+ sock->SetupBackupTimer(); >+ ent->mHalfOpens.AppendElement(sock); >+ return NS_OK; >+} Maybe some new lines here as well? >+nsHttpConnectionMgr::nsHalfOpenSocket::SetupBackupTimer() >+ mSynTimer->InitWithFuncCallback(SynTimeout, this, >+ timeout, nsITimer::TYPE_ONE_SHOT); >+ NS_ADDREF(this); Rather let the object implement nsIObserver and then use mSynTimer->Init, it does referencing for you and the code is cleaner. >+nsHttpConnectionMgr::nsHalfOpenSocket::Abandon() >+{ >+ PRInt32 refs = 0; >+ >+ if (mStreamOut) { >+ mStreamOut->AsyncWait(nsnull, 0, 0, nsnull); >+ mMgr->RecvdConnect(); >+ mStreamOut = nsnull; >+ refs++; Why is actually needed to AddRef manually in SetupStreams() when the stream holds a ref to the callback (this object) ? >+ mEnt = nsnull; >+ for (PRInt32 i = 0; i < refs; i++) { >+ NS_RELEASE_THIS(); >+ } I would like to get rid of coding like this, even though it works correctly. >+nsHalfOpenSocket::OnOutputStreamReady(nsIAsyncOutputStream *out) >+ conn->SetIsReused(PR_TRUE); I really want to do this a different way. This might lead to duplicate requests being sent and processed by the server. I would rather allow restart of a transaction when the connection was lingering for a while. You are probably a more experienced to specify the time and decide, but something around a second or two could make it? This is important. >+ class nsHalfOpenSocket : public nsIOutputStreamCallback, >+ nsHttpConnectionMgr *mMgr; You probably don't need to reference the manager. It is accessible through gHttpHandler->ConnMgr(). > nsHttpConnectionInfo *mConnInfo; > nsTArray<nsHttpTransaction*> mPendingQ; // pending transaction queue > nsTArray<nsHttpConnection*> mActiveConns; // active connections > nsTArray<nsHttpConnection*> mIdleConns; // idle persistent connections >+ nsTArray<nsHalfOpenSocket*> mHalfOpens; I would love to turn these all to nsTArray< nsRefPtr<ns*> > one day. But not in this bug, it is complex a lot already. In general, please add LOGs to your new code. When something goes wrong, we have to be able to trace it. r- for the IsReused flag thing. So, I did testing on this patch, and it works well. However, when I was suggesting to move this to the connection manager and using the code we already have I hoped the patch will simplify. Was there a particular reason to create a new class for a half open connection and introduce a new array for it? I played with this a bit as well and I created a patch that is throwing all transaction to the pending queue and just ensures connections for all of them (if in the limit) which pop transactions from PendingQ in OnOutputStreamReady. There is relation one nsHttpTransaction <- two nsHttpConnections each with its own socket transport. Connections are marked as 'half-open' until OOSR() has been reached. I wanted to check it is doable and possibly suggest to turn rather the way of a simpler patch, however I would need more time to stabilize it and it would end up doing your work, what no one wants. Just enfolding that to make a note we could potentially go simpler next time when doing changes like these.

Attachment #510735 - Flags: review?(honzab.moz) → review-

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 11

•

14 years ago

(In reply to comment #3) > Why not be more aggressive than Chrome? e.g. upon navigating to a document, > open two connections immediately to the document's host, on the assumption > that we are going to open a second connection for sub-resources (scripts, > css, images, etc.) from the same host? IE9 and Chrome do this, according to this blog post: http://www.belshe.com/2011/02/10/the-era-of-browser-preconnect/ It doesn't seem great to wait 250ms for something that we could wait 0ms to do. If we had preconnect, would we even need syn retry? Could we make zero the default value of network.http.connection-retry-timeout?

Patrick McManus [:mcmanus]

Assignee

Comment 12

•

14 years ago

(In reply to comment #11) > IE9 and Chrome do this, according to this blog post: see 632948

Patrick McManus [:mcmanus]

Assignee

Comment 13

•

14 years ago

> see 632948 That was dumb. see 634278

Patrick McManus [:mcmanus]

Assignee

Comment 14

•

14 years ago

(In reply to comment #10) > >+nsHalfOpenSocket::OnOutputStreamReady(nsIAsyncOutputStream *out) > >+ conn->SetIsReused(PR_TRUE); > > I really want to do this a different way. This might lead to duplicate > requests being sent and processed by the server. I would rather allow restart > of a transaction when the connection was lingering for a while. You are > probably a more experienced to specify the time and decide, but something > around a second or two could make it? This is important. > can you be more clear on what your concern is here? This is exactly the same algorithm as the code that was previously merged which I do not believe had a problem. If the extra connection is not used for 5 seconds, for whatever reason, it is thrown away... and this code is only governing connections that turn out to be superfluous. Marking it reused allows us to recover from the race condition of receiving a FIN while our request is in flight and only in those cases where there has been no data recvd. This is exactly what happens on every use of a persistent connection, which is well over 50% of all transactions in my observations - there is no real risk of a fin when we send the request as soon as the connection is open but if we wait even a second or two it becomes a rare possibility that the server generated fin and our request pass as ships in the night. Normally you have 30 or 60 seconds before the close, but the other day I saw nginx configured with a non-standard timeout of 4 seconds. I don't think we need to shoot for 100% under the minimum, 5 seconds captures the vast majority of cases, handling the close gracefully with isreused() is enough. landing 604796 would help avoid most of the restarts, right now we see a bunch of them because we only check for fins on idle connections on a timer instead of the fin arrival event. Admittedly they cause a restart() in firefox, but the request never makes it to the wire the first time for this particular class of transaction. I'm not sure what the alternative is. If we let the transaction that gets the FIN race condition generate an error (instead of restarting because the connection is not marked isreused) then we open bug 614677 again. Arguments about resending POSTs don't resonate with me for a whole bunch of reasons, but primarily because we already do exactly this in the incredibly common scenario of an idle persistent connection reuse. We're not breaking any new ground, these superfluous connections are just more idle pconns ready for use.

Patrick McManus [:mcmanus]

Assignee

Comment 15

•

14 years ago

> Was there a > particular reason to create a new class for a half open connection and > introduce a new array for it? > it seems a natural place to store the redundant data about primary and backup transports and streams.. I tried to isolate the nshttpconnection from all of that in this version of the patch. More or less managing the sockets was new functionality for the CM to have and I kind of like having it live mostly in its own class. >>+ mStreamOut->AsyncWait(nsnull, 0, 0, nsnull); > Why is actually needed to AddRef manually in SetupStreams() when the stream > holds a ref to the callback (this objec I didn't want the asyncwait cancellation (asyncwait(nsnull,0,0,nsnull)) to drop the last ref on the object part way through abandon().. I will reorganize that through a series of local variable swaps that will make abandon() safe.. do you happen to know if nsItimer holds a ref on its closure? (I don't think it does off the top of my head).

Honza Bambas (:mayhemer)

Comment 16

•

14 years ago

(In reply to comment #14) > (In reply to comment #10) I understand why you set the flag. I'll try to explain my concern differently: In the current code, w/o this patch, we allow automatic restart on network failure only if no data has been sent AND no data received or the connection has already been passed a request/response transaction (is reused). It means that we are using the connection for a SECOND request and it failed probably because it was a stall connection. With your patch, we also allow restart for a connection that has never been used before (no data passed up nor down through it) and lingers for 0 to 5000 milliseconds. Important is that the connection may be used immediately (very close to 0 ms after it has been established). In that case when a failure is detected, we cannot (must not) assume the connection was a stall connection. There is a risk we have sent the data, the server processed the request, but the connection has been dropped (before the server could send back the response) from a real reason, not because the connection stalled. Then we must not restart, but rather say there were an error on the wire. So, my suggestion is to allow restart if and only if the connection has been lingering unused (better say: never used) for some amount of time. The time is more up to you then me, because you have probably a better real-world experience with servers and accelerators out there, for instance see your bug 613977 comment 19, paragraph 3. If it is still unclear or you believe I'm not right, lets discuss it further by mail out of this bug. (In reply to comment #15) > it seems a natural place to store the redundant Yep, I don't have a strong opinion, just asking. Anyway, have you considered to create a second nsHttpConnection object for the backup conn rather then just a transport held in mBack* members of the new object? You always create nsHttpConnection for an established socket transport anyway. It seemed to me much simpler approach and I expected you did it that way. Actually a lot of code could be removed and the patch be generally simplified. > through a series of local variable swaps that will make abandon() safe.. do you > happen to know if nsItimer holds a ref on its closure? (I don't think it does > off the top of my head). Yep, just do nsRefPtr<nsHalfOpenSocket> kungFuDeathGrip(this) if needed. nsITimer doesn't hold a ref to the closure when using InitWithCallback, it sees it as a void pointer only. However, using nsIObserver and Init() method does it, so you don't need to bother with proper placing of NS_ADDREF()/NS_RELEASE(). You just need to cancel the timer in destructor, what you already do I think.

Patrick McManus [:mcmanus]

Assignee

Comment 17

•

14 years ago

(In reply to comment #16) > > So, my suggestion is to allow restart if and only if the connection has been > lingering unused (better say: never used) for some amount of time. ah this helps a lot - thank you. I was confused by the way 'lingering' was used. Its a reasonable extra level of cautiousness.

Patrick McManus [:mcmanus]

Assignee

Comment 18

•

14 years ago

Attached patch syn retry v3 (obsolete) — Details — Splinter Review

updated to reflect review comments

Attachment #510735 - Attachment is obsolete: true

Attachment #513192 - Flags: review?(honzab.moz)

Patrick McManus [:mcmanus]

Assignee

Updated

•

14 years ago

Blocks: 595316

Patrick McManus [:mcmanus]

Assignee

Updated

•

14 years ago

Blocks: 603503

Honza Bambas (:mayhemer)

Comment 19

•

14 years ago

Comment on attachment 513192 [details] [diff] [review] syn retry v3 ># HG changeset patch ># Parent 19d6c295cfec3fb8e3e16dc24abacf6656e4ac49 >Accelerate TCP connection retries in HTTP > >Losing a TCP SYN requires a long painful (typically 3 second) delay >before being retried. This patch creates a second parallel connection >attempt for any nsHttpConnection which has not become writable before >a timeout occurs. > >If you assume .5% packet loss, this converts a full 3 second delay >from a 1 in 200 event into a 1 in 40,000 event. > >Whichever connection establishes itself first is used. If another one >has been started and it does connect before the one being used is >closed then the extra one is handed to the connection manager for use >by a different transaction - essentially a persistent connection with >0 previous transactions on it. (Another way to think about is >pre-fetching a 3WHS on a high latency connection). > >The pref network.http.connection-retry-timeout controls the amount of >time in ms to wait for success on the initial connection before beginning >the second one. Setting it to 0 disables the parallel connection, the >default is 250. > Personally, I don't think we want such a long comment in the push log. Reference to the bug # is enough in general. >+nsHttpConnection::Init(nsHttpConnectionInfo *info, > LOG(("nsHttpConnection::Init [this=%x]\n", this)); Add reference to the transport and streams to the log. > nsHttpConnection::CanReuse() > { >+ // An idle persistent connection should not have data waiting to be read >+ // before a request is sent. Data here is likely a 408 timeout response >+ // which we would deal with later on through the restart logic, but that >+ // path is more expensive than just closing the socket now. SSL check can >+ // be removed with fixing of 631801 >+ >+ PRUint32 dataSize; >+ if (rv && mSocketIn && !mConnInfo->UsingSSL() && >+ NS_SUCCEEDED(mSocketIn->Available(&dataSize)) && dataSize) { >+ LOG(("Socket not reusable because read data pending on it: %s %d\n", >+ mConnInfo->Host(), dataSize)); >+ rv = PR_FALSE; >+ } >+ return rv; Why wasn't this code needed in the previous version of the patch? >+PRBool >+nsHttpConnection::IsReused() >+{ >+ return mIsReused || >+ (mConsiderReusedAfter && (PR_Now() >= mConsiderReusedAfter)); >+} >+ >+void >+nsHttpConnection::SetIsReusedAfter(PRUint32 afterMiliseconds) >+{ >+ mConsiderReusedAfter = PR_Now() + afterMiliseconds * PR_USEC_PER_MSEC; >+} >+ Please don't use PR_Now(), it is not monotonically increasing. Use PR_IntervalNow() or mozilla::TimeStamp. The letter only if doesn't mean to modify dependencies. I understand we are using PR_Now() in the http code, that should probably be fixed as well, but not in this bug. Let's just not repeat the same mistake for a new code. >+ void SetIsReused(PRBool val) {mIsReused = val;} You don't need this method. >+nsresult >+nsHttpConnectionMgr:: >+nsHalfOpenSocket::SetupStreams(nsISocketTransport **transport, >+ nsIAsyncInputStream **instream, >+ nsIAsyncOutputStream **outstream) Wrong indention. Preserve new lines in this method that were in nsHttpConnection::CreateTransport. This is not very well readable code. >+nsresult >+nsHttpConnectionMgr::nsHalfOpenSocket::SetupPrimaryStreams() >+{ >+ nsresult rv = SetupStreams(getter_AddRefs(mSocketTransport), >+ getter_AddRefs(mStreamIn), >+ getter_AddRefs(mStreamOut)); >+ LOG(("SetupPrimaryStream %x %s %X", this, mEnt->mConnInfo->Host(), rv)); >+ if (NS_FAILED(rv)) { >+ if (mStreamOut) { >+ gHttpHandler->ConnMgr()->RecvdConnect(); You probably don't need to call RecvdConnect() here. If SetupStreams() fails it won't increase the counter. The same for nsHttpConnectionMgr::nsHalfOpenSocket::SetupBackupStreams() >+void >+nsHttpConnectionMgr::nsHalfOpenSocket::Abandon() >+{ >+ nsCOMPtr<nsIAsyncOutputStream> primary, secondary; >+ LOG(("nsHalfOpen::Abandon %x %s", this, mEnt->mConnInfo->Host())); >+ if (mStreamOut) { >+ gHttpHandler->ConnMgr()->RecvdConnect(); >+ primary.swap(mStreamOut); >+ } >+ if (mBackupStreamOut) { >+ gHttpHandler->ConnMgr()->RecvdConnect(); >+ secondary.swap(mBackupStreamOut); >+ } >+ if (mSynTimer) { >+ mSynTimer->Cancel(); >+ mSynTimer = nsnull; >+ } >+ >+ mEnt = nsnull; >+ if (primary) >+ primary->AsyncWait(nsnull, 0, 0, nsnull); >+ if (secondary) >+ secondary->AsyncWait(nsnull, 0, 0, nsnull); >+} If mSynTimer is the only referrer to this nsHalfOpenSocket object then you will crash on the 'mEnt = nsnull;' line. Use nsRefPt<nsHalfOpenSocket> kungFuDeathGrip(this) here to protect against early release. >+NS_IMETHODIMP >+nsHttpConnectionMgr:: >+nsHalfOpenSocket::OnOutputStreamReady(nsIAsyncOutputStream *out) >+ if (out == mStreamOut) { >+ rv = conn->Init(mEnt->mConnInfo, >+ gHttpHandler->ConnMgr()->mMaxRequestDelay, >+ mSocketTransport, mStreamIn, mStreamOut, >+ callbacks, callbackTarget); >+ mStreamOut = nsnull; >+ } >+ else { >+ rv = conn->Init(mEnt->mConnInfo, >+ gHttpHandler->ConnMgr()->mMaxRequestDelay, >+ mBackupTransport, mBackupStreamIn, mBackupStreamOut, >+ callbacks, callbackTarget); >+ mBackupStreamOut = nsnull; >+ } >+ >+ if (NS_FAILED(rv)) >+ return rv; Not sure about this, but shouldn't you throw mSocketTransport (or mBackupTransport) and streams away here? There might probably still be a cycle between this nsHalfOpenSocket and m*Transport's progress sink or security callbacks. Worth to double check this. According to LOGs() you have added: - use following format pattern: "FullClassName::FullMethodName [this=%p ent=%s rv=%X]" - don't use %x for logging pointer in your new logs or logs you modify - add logs to constructor and destructor of your new class - add more logs to nsHalfOpenSocket::OnOutputStreamReady logic: - that you are canceling the timer - log the pointer value of the new connection object you have created - definitely that conn->Init failed and with what result As a transaction might get progress information from two separate transports, it might get it duplicated and messed in order (from point of view of the consumer). This might confuse consumers like net monitor of Firebug. We should fix this, but probably not in this bug. It's way too complicated enough. The way we should fix it should be to don't allow a 'lower' state be propagated through the same transaction that has already observed a 'higher' state, but this is just the first thought. How far are you with the tests for this functionality?

Attachment #513192 - Flags: review?(honzab.moz)

Patrick McManus [:mcmanus]

Assignee

Comment 20

•

14 years ago

(In reply to comment #19) > >+ PRUint32 dataSize; > >+ if (rv && mSocketIn && !mConnInfo->UsingSSL() && > >+ NS_SUCCEEDED(mSocketIn->Available(&dataSize)) && dataSize) { > >+ LOG(("Socket not reusable because read data pending on it: %s %d\n", > >+ mConnInfo->Host(), dataSize)); > >+ rv = PR_FALSE; > >+ } > >+ return rv; > > Why wasn't this code needed in the previous version of the patch? > because this version is better :) This code is an optimization, the previous version worked. This catches the case of some servers that send a timeout response (typically a 408) without a request being sent, which is a bug imho but that's kind of beside the point. If we reuse that connection we do deal with the 408 fine - but it involves restart()ing the transaction and sending it twice. This code lets us avoid that rigamarole when its clear the connection isn't in a clean state for reuse. Due to races in when that 408 stuff arrives vs when the request is sent, all the old code is still needed, but this catches the base case and just says - "use a different connection".

Honza Bambas (:mayhemer)

Comment 21

•

14 years ago

OK, so this piece of code is actually not needed for the patch, right? We could actually land it separately? My point is that you are adding more code to an already reviewed patch. Please try to avoid this next time and rather open a new bug or at least explain your changes in the comment when adding a new patch version. Let's go with this for now also because we have already talked about this optimization. Can you just rename the variable 'rv' to 'canReuse' ?

Patrick McManus [:mcmanus]

Assignee

Comment 22

•

14 years ago

> >+nsresult > >+nsHttpConnectionMgr:: > >+nsHalfOpenSocket::SetupStreams(nsISocketTransport **transport, > >+ nsIAsyncInputStream **instream, > >+ nsIAsyncOutputStream **outstream) > > Wrong indention. > please tell me your preferred style in this case which is 80-col challenged. This is my preference but I'll go whatever way the wind is blowing.

Honza Bambas (:mayhemer)

Comment 23

•

14 years ago

(In reply to comment #22) > please tell me your preferred style in this case which is 80-col challenged. > This is my preference but I'll go whatever way the wind is blowing. Ah! My apologies. This looks like a Firefox/Win7 bug: even a Courier fixed size font used by bugzilla is displayed with a variable width. It appears on my screen like the first letter of the type name on the second line would be under the opening bracket, unaligned. On OSX the patch looks well formed.

Patrick McManus [:mcmanus]

Assignee

Comment 24

•

14 years ago

(In reply to comment #19) > >+void > >+nsHttpConnectionMgr::nsHalfOpenSocket::Abandon() > >+{ > >+ nsCOMPtr<nsIAsyncOutputStream> primary, secondary; > >+ LOG(("nsHalfOpen::Abandon %x %s", this, mEnt->mConnInfo->Host())); > >+ if (mStreamOut) { > >+ gHttpHandler->ConnMgr()->RecvdConnect(); > >+ primary.swap(mStreamOut); > >+ } > >+ if (mBackupStreamOut) { > >+ gHttpHandler->ConnMgr()->RecvdConnect(); > >+ secondary.swap(mBackupStreamOut); > >+ } > >+ if (mSynTimer) { > >+ mSynTimer->Cancel(); > >+ mSynTimer = nsnull; > >+ } > >+ > >+ mEnt = nsnull; > >+ if (primary) > >+ primary->AsyncWait(nsnull, 0, 0, nsnull); > >+ if (secondary) > >+ secondary->AsyncWait(nsnull, 0, 0, nsnull); > >+} > > If mSynTimer is the only referrer to this nsHalfOpenSocket object then you will > crash on the 'mEnt = nsnull;' line. I don't think that is true - the function is arranged to not touch the object after the mEnt = nsnull line. >Use nsRefPt<nsHalfOpenSocket> > kungFuDeathGrip(this) here to protect against early release However, I like your suggestion. So I will do that (calling it deleteProtector instead even though I am aware of the kungFu precedent) and also get rid of the primary, and secondary temporary stack variables which will no longer be necessary.

Honza Bambas (:mayhemer)

Comment 25

•

14 years ago

(In reply to comment #24) > calling it deleteProtector Please use kungFuDeathGrip. It is common in the source code and everybody then easily recognize what is going on.

Patrick McManus [:mcmanus]

Assignee

Comment 26

•

14 years ago

(In reply to comment #19) > > How far are you with the tests for this functionality? Because we can't seem to blackbox test this due to interactions with network equipment, I'm working on a handful of tests which directly insturment the code and more or less simulate similar events to the packet loss. I'm really trying to keep it to the minimum as it has the potential to be invasive and I'm not convinced 100% of their value anyhow. (I don't mean of tests in general, I mean these tests which are just making rough simulations of what would happen in a packet drop situation). but they're coming - now that I have a websockets draft out, they'll come faster :) Of course - Now I also have to work through the other 9 or 10 pipelining patches and remove PR_Now() :) I have hand tested all of this using iptables and null routes, so that's something good.

Patrick McManus [:mcmanus]

Assignee

Comment 27

•

14 years ago

Attached patch syn retry v4 (obsolete) — Details — Splinter Review

updates for review comments modulo c24

Attachment #513192 - Attachment is obsolete: true

Attachment #518362 - Flags: review?(honzab.moz)

Honza Bambas (:mayhemer)

Comment 28

•

14 years ago

Comment on attachment 518362 [details] [diff] [review] syn retry v4 >+ LOG(("nsHalfOpenSocket::nsHalfOpenSocket [this=%p trans=%p ent=%s]\n", >+ this, trans, ent->mConnInfo->Host())); Please change the ctor/dtor logs to "Creating .." and "Destroying .." as nsHttpConnection and other classes do. >+ LOG(("nsHalfOpenSocket::~nsHalfOpenSocket [this=%p]\n")); You forgot to add 'this' as an arg. >+ LOG(("nsHalfOpenSocket::Abandon [this=%p ent=%s]", >+ mEnt->mConnInfo->Host())); 'this' missing among args. >+ LOG(("nsHalfOpenSocket::OnOutputStreamReady %p " >+ "Backup connection timer canceled\n", this)); It might be enough to just log " backup connection timer canceled\n" in these nested logs, but up to you. r=honzab with those nits.

Honza Bambas (:mayhemer)

Comment 29

•

14 years ago

Comment on attachment 518362 [details] [diff] [review] syn retry v4 Forgot to mark this r+ first. However! I want to re-check the code you handle PR_IntervalNow(). I'll let you know soon. For now please consider the patch as unreviewed.

Honza Bambas (:mayhemer)

Comment 30

•

14 years ago

PR_IntervalNow() may overflow (overlap). Maybe what I want to say will be better explained by a graph: B 2 C 1 A [XXXXX|--------- ... ---------|XXXXX] [ ] is the range of PR_IntervalValues() - actually the whole PRUint64 or 32, not sure at the moment. XXXXXXXXXX is the delay to not consider the connection as reused 1 is the time you mark a connection to be considered as stall after a certain amount of time 2 is a value you then have stored in mConsiderReusedAfter, with v4 of the patch A,B,C are times you check for a connection to be considered stall. Only at time C your check must return true. But your code will also return true for time A, which is wrong. The solution is: - store the time 1 to a field name e.g. mEpoch, i.e. do mEpoch = PR_IntervalNow() - if you want the time for consideration to reuse be parametric, then store also the time you want for the delay ; you may use your new member for that - when checking you have to do the following math: isReused/Stall = ((PR_IntervalNow() - mEpoch) >= mConsiderReusedAfter); the subtraction ensure correct handling of the overlap. r=honzab with that fixed as well.

Honza Bambas (:mayhemer)

Updated

•

14 years ago

Attachment #518362 - Flags: review?(honzab.moz) → review+

Patrick McManus [:mcmanus]

Assignee

Comment 31

•

14 years ago

Attached patch syn retry v5 (obsolete) — Details — Splinter Review

r=honzab updates from c28 and c30 I really appreciated comment 30 btw. you might want to take a quick look to make sure I got it right, but I'm confident in it.

Attachment #520642 - Flags: review+

Honza Bambas (:mayhemer)

Comment 32

•

14 years ago

Comment on attachment 520642 [details] [diff] [review] syn retry v5 Looks good. >+nsHttpConnection::IsReused() >+{ >+ if (mIsReused) return PR_TRUE; >+ if (!mConsiderReusedAfterInterval) return PR_FALSE; Maybe change to if (mIsReused) return PR_TRUE; if (!mConsiderReusedAfterInterval) return PR_FALSE; before landing. Better for debugging.

Patrick McManus [:mcmanus]

Assignee

Comment 33

•

14 years ago

Attached patch syn retry v6 — Details — Splinter Review

r=honzab whitespace updates from c32

Attachment #518362 - Attachment is obsolete: true

Attachment #520642 - Attachment is obsolete: true

Attachment #520980 - Flags: review+

tenzoKritz

Comment 34

•

14 years ago

Is this not going in to Fx5?

Boris Zbarsky [:bzbarsky]

Comment 35

•

14 years ago

Patrick, you should add this to your ~/.hgrc: [defaults] qnew = -U so your patches will have From lines....

Boris Zbarsky [:bzbarsky]

Comment 36

•

14 years ago

http://hg.mozilla.org/projects/cedar/rev/1714b4facb21

Flags: in-testsuite?

Whiteboard: fixed-in-cedar

(no longer active)

Comment 37

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/faa363b57d48 dholbert said that this breaks his build with gcc 4.5.

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

No longer depends on: post2.0

Resolution: --- → FIXED

Whiteboard: fixed-in-cedar

Target Milestone: --- → mozilla2.2

Daniel Holbert [:dholbert]

Comment 38

•

14 years ago

Yeah - the build failure is: > nsHttpConnectionMgr.cpp: In member function ‘virtual nsresult nsHttpConnectionMgr::nsHalfOpenSocket::OnOutputStreamReady(nsIAsyncOutputStream*)’: > nsHttpConnectionMgr.cpp:1411:139: error: cannot pass objects of non-trivially-copyable type ‘class nsRefPtr<nsHttpConnection>’ through ‘...’ > nsHttpConnectionMgr.cpp:1442:141: error: cannot pass objects of non-trivially-copyable type ‘class nsRefPtr<nsHttpConnection>’ through ‘...’ The failure is from this code: > 1410 nsRefPtr<nsHttpConnection> conn = new nsHttpConnection(); > 1411 LOG(("nsHalfOpenSocket::OnOutputStreamReady " > 1412 "Created new nshttpconnection %p\n", conn)); I think s/conn/conn.get()/ should fix this? Trying locally...

Daniel Holbert [:dholbert]

Comment 39

•

14 years ago

Attached patch followup to fix gcc 4.5 issue — Details — Splinter Review

Yup, .get() fixes this. Here's the followup patch.

Attachment #523673 - Flags: review?(mcmanus)

(no longer active)

Comment 40

•

14 years ago

Comment on attachment 523673 [details] [diff] [review] followup to fix gcc 4.5 issue OMG, don't know how I missed this!

Attachment #523673 - Flags: review?(mcmanus) → review+

Daniel Holbert [:dholbert]

Comment 41

•

14 years ago

Landed followup: http://hg.mozilla.org/mozilla-central/rev/df19f4679db1

Bill Gianopoulos [:WG9s]

Comment 42

•

14 years ago

Just to put my $0.02 in here, SYN retry is, unfortunately, likely to trigger SYN flood blocking on the site you are connecting to and be really counter productive.

Patrick McManus [:mcmanus]

Assignee

Comment 43

•

14 years ago

(In reply to comment #42) > Just to put my $0.02 in here, SYN retry is, unfortunately, likely to trigger > SYN flood blocking on the site you are connecting to and be really counter > productive. This code doesn't actually change the total maximum number of outstanding syns to any host by more than 1, so I'm not worried.

Patrick McManus [:mcmanus]

Assignee

Comment 44

•

14 years ago

ehsan, daniel, thanks for landing this and fixing the .get() while I was traveling.. a nice surprise to see in my mailbox this morning.

Bill Gianopoulos [:WG9s]

Comment 45

•

14 years ago

(In reply to comment #43) > (In reply to comment #42) > > Just to put my $0.02 in here, SYN retry is, unfortunately, likely to trigger > > SYN flood blocking on the site you are connecting to and be really counter > > productive. > > This code doesn't actually change the total maximum number of outstanding syns > to any host by more than 1, so I'm not worried. That is probably not going to trigger SYN flood avoidance then.

Jo Hermans

Comment 46

•

14 years ago

(In reply to comment #42) > Just to put my $0.02 in here, SYN retry is, unfortunately, likely to trigger > SYN flood blocking on the site you are connecting to and be really counter > productive. I work for a major router manufacturer, but I can assure you that it won't trigger the SYN-flood protection. We're not counting the number of SYN's per second, but the number of connections. Otherwise you're only protected against SYN attack, not against attacks that open a regular connection, then nothing more. Like any garden variety script-kiddie will do. Note that there are lots of devices that don't respect the 3-second timer anyway, changing that won't cause bug problems.

[not reading bugmail]

Updated

•

14 years ago

Depends on: 647574

Eric Shepherd [:sheppy]

Updated

•

14 years ago

Keywords: dev-doc-needed

Eric Shepherd [:sheppy]

Comment 47

•

14 years ago

Actually, after further review, I don't think this really does impact docs, so removing the doc needed keyword. If I'm mistaken, please explain how. :)

Keywords: dev-doc-needed

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Updated

•

14 years ago

Blocks: 651635

Honza Bambas (:mayhemer)

Updated

•

14 years ago

Depends on: 656989

Dave Townsend [:mossop]

Updated

•

14 years ago

Depends on: 623134

Masatoshi Kimura [:emk]

Comment 48

•

14 years ago

Actually, this change missed the Fx5 train.

Target Milestone: mozilla5 → mozilla6

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

14 years ago

Depends on: 671053

Tobias Oberstein

Comment 49

•

14 years ago

I have a question if this feature might be related to the following behavior with WebSockets in FF: When WebSocket is opened from JS, FF will connect using 1 TCP connection, perform WebSockets handshake, and immediately after the handshake has finished open a 2nd TCP connection to the server. On that 2nd TCP, no data is sent by FF at all, and the TCP is actively closed by FF after approx. 5s. My concern is: there might be WebSockets servers which want to implement protection mechanisms like: block any IP from which a) TCP was accepted, but WebSockets handshake was never finished or for which b) no data is received after TCP open within N seconds. There is a little bit more descr. on the third comment on https://bugzilla.mozilla.org/show_bug.cgi?id=664344 Maybe this is all unrelated to the feature here .. but I thought it might be ok to ask.

Patrick McManus [:mcmanus]

Assignee

Comment 50

•

14 years ago

we'll deal with this in the websockets bug, not here.

Honza Bambas (:mayhemer)

Comment 51

•

14 years ago

Bug 679090, BTW, shows that the timer implementing TCP early retry is started at bad time. It should start when CONNECTING_TO state is observed on the primary socket. Also, I think that simply when any other state is observed the timer has to be canceled (e.g. when connection to a host fail) and restarted on any subsequent CONNECTING_TO. I have also seen we open significant number of connections that are never used when acceleration is used. Based on that, we should check the code for HTTP acceleration again and make it more robust.

Honza Bambas (:mayhemer)

Updated

•

13 years ago

Depends on: 746255

Honza Bambas (:mayhemer)

Updated

•

13 years ago

Whiteboard: [http-conn]

syn retry v1 14 years ago Patrick McManus [:mcmanus] 59.10 KB, patch		Details \| Diff \| Splinter Review
syn retry v2 14 years ago Patrick McManus [:mcmanus] 58.71 KB, patch	mayhemer : review-	Details \| Diff \| Splinter Review
syn retry v3 14 years ago Patrick McManus [:mcmanus] 61.02 KB, patch		Details \| Diff \| Splinter Review
syn retry v4 14 years ago Patrick McManus [:mcmanus] 61.64 KB, patch	mayhemer : review+	Details \| Diff \| Splinter Review
syn retry v5 14 years ago Patrick McManus [:mcmanus] 61.94 KB, patch	mcmanus : review+	Details \| Diff \| Splinter Review
syn retry v6 14 years ago Patrick McManus [:mcmanus] 61.96 KB, patch	mcmanus : review+	Details \| Diff \| Splinter Review
followup to fix gcc 4.5 issue 14 years ago Daniel Holbert [:dholbert] 1.98 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review