Closed Bug 1426366 Opened 2 years ago Closed 2 years ago

Turn off TFO if we detect connection stalls

Categories

(Core :: Networking: HTTP, enhancement, P1)

59 Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: dragana, Assigned: dragana)

References

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file, 1 obsolete file)

I will try to detect stalls like the one explained in bug 1395494.
Similar stalls can happen with tls1.2 on some older versions of PaloAlto firewall (this bug is fixed the newer versions).
Blocks: 1426367
If a connection that is using TFO(really sending data during the tcp handshake) and TLS is idle for about 10s, a firewall will delete all connection states and no more packet for that connection will go through.

To detect this:
 - when a new transaction is added to a connection, check if the connection has been idle for 10s (pref network.tcp.tcp_fastopen_http_check_for_stalls_only_if_idle_for). If yes, mark the connection for checking (mCheckNetworkStallsWithTFO = true)
 - when the transaction data has been written to the network set mLastRequestBytesSentTime.
 - if there is no response for 20s (pref network.tcp.tcp_fastopen_http_stalls_timeout), increase mFastOpenStallsCounter
 - if mFastOpenStallsLimit (pref network.tcp.tcp_fastopen_http_stalls_limit set to 3) is reached, disable TFO
Attached patch bug_detect_stalls_with_tfo.patch (obsolete) — Splinter Review
Attachment #8938111 - Flags: review?(hurley)
Comment on attachment 8938111 [details] [diff] [review]
bug_detect_stalls_with_tfo.patch

Review of attachment 8938111 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good, just a couple questions on the determination of a stall before r+ing.

::: netwerk/protocol/http/Http2Session.cpp
@@ +274,5 @@
>  
> +  uint32_t nextTick = UINT32_MAX;
> +  if (mCheckNetworkStallsWithTFO && mLastRequestBytesSentTime) {
> +    PRIntervalTime initialResponseDelta = now - mLastRequestBytesSentTime;
> +    if (initialResponseDelta > gHttpHandler->FastOpenStallsTimeout()) {

Should this be >=? I tend to think of the timeout as being hit as soon as the time is elapsed. Also, with just > we could potentially get in a state in the else where nextTick is 0.

@@ +387,5 @@
>    }
>  
>    mStreamIDHash.Put(aNewID, stream);
> +
> +  // if the TCP fast Open has need used and conection was idle for some time

nit: capitalize the I in 'if', and should 'need' be 'been'? (You can drop the 'the' before 'TCP', as well.)

@@ +388,5 @@
>  
>    mStreamIDHash.Put(aNewID, stream);
> +
> +  // if the TCP fast Open has need used and conection was idle for some time
> +  // we will be cautious and watch out for bug 1395494. 

nit: trailing whitespace

@@ +393,5 @@
> +  if (!mCheckNetworkStallsWithTFO && mConnection) {
> +    RefPtr<nsHttpConnection> conn = mConnection->HttpConnection();
> +    if (conn && (conn->GetFastOpenStatus() == TFO_DATA_SENT) &&
> +        gHttpHandler->CheckIfConnectionIsStalledOnlyIfIdleForThisAmountOfSeconds() &&
> +        IdleTime() > gHttpHandler->CheckIfConnectionIsStalledOnlyIfIdleForThisAmountOfSeconds()) {

Again, maybe |IdleTime() >=| here? Same reasoning as above.

::: netwerk/protocol/http/nsHttpConnection.cpp
@@ +642,5 @@
>  
>      NS_ENSURE_ARG_POINTER(trans);
>      NS_ENSURE_TRUE(!mTransaction, NS_ERROR_IN_PROGRESS);
>  
> +    // if the TCP fast Open has need used and conection was idle for some time

Same nit here as in the Http2Session version of this code.

@@ +646,5 @@
> +    // if the TCP fast Open has need used and conection was idle for some time
> +    // we will be cautious and watch out for bug 1395494.
> +    if (mNPNComplete && (mFastOpenStatus == TFO_DATA_SENT) &&
> +        gHttpHandler->CheckIfConnectionIsStalledOnlyIfIdleForThisAmountOfSeconds() &&
> +        IdleTime() > gHttpHandler->CheckIfConnectionIsStalledOnlyIfIdleForThisAmountOfSeconds()) {

Same question on >= here as above.

@@ +1371,5 @@
>  
> +    // Check for the TCP Fast Open related stalls.
> +    if (mCheckNetworkStallsWithTFO && mLastRequestBytesSentTime) {
> +        PRIntervalTime initialResponseDelta = now - mLastRequestBytesSentTime;
> +        if (initialResponseDelta > gHttpHandler->FastOpenStallsTimeout()) {

Same >= question again :)
Attachment #8938111 - Flags: review?(hurley)
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #3)
> Comment on attachment 8938111 [details] [diff] [review]
> bug_detect_stalls_with_tfo.patch
> 
> Review of attachment 8938111 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks pretty good, just a couple questions on the determination of a
> stall before r+ing.
> 
> ::: netwerk/protocol/http/Http2Session.cpp
> @@ +274,5 @@
> >  
> > +  uint32_t nextTick = UINT32_MAX;
> > +  if (mCheckNetworkStallsWithTFO && mLastRequestBytesSentTime) {
> > +    PRIntervalTime initialResponseDelta = now - mLastRequestBytesSentTime;
> > +    if (initialResponseDelta > gHttpHandler->FastOpenStallsTimeout()) {
> 
> Should this be >=? I tend to think of the timeout as being hit as soon as
> the time is elapsed. Also, with just > we could potentially get in a state
> in the else where nextTick is 0.
> 

I agree, otherwise we will have a strange behavior. Thanks.
Attachment #8938111 - Attachment is obsolete: true
Attachment #8938343 - Flags: review?(hurley)
Comment on attachment 8938343 [details] [diff] [review]
bug_detect_stalls_with_tfo.patch

Review of attachment 8938343 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome, thanks!
Attachment #8938343 - Flags: review?(hurley) → review+
Pushed by dd.mozilla@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e288b04ea679
Detect http transaction stalls with TFO. r=hurley
https://hg.mozilla.org/mozilla-central/rev/e288b04ea679
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
You need to log in before you can comment on or make changes to this bug.