network.http.response.timeout breaks applications with long lived http connetions

RESOLVED FIXED in Firefox 31

Status

()

Firefox
Untriaged
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: Karl O. Pinc, Assigned: sworkman)

Tracking

29 Branch
Firefox 33
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox31 fixed, firefox32 fixed, firefox33 fixed)

Details

Attachments

(4 attachments)

(Reporter)

Description

4 years ago
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140429 Firefox/24.0 Iceweasel/24.5.0 (Nightly/Aurora)
Build ID: 20140429161455

Steps to reproduce:

Write web application that takes more than 5 minutes to return a response.

Try to use application using Firefox 29 or greater.

Get connection timeout response.  Connection to server is broken and server cannot deliver results to client.


Actual results:

In Firefox 29 an enhanement was introduced to drop connections if they take "too long".  See:
https://developer.mozilla.org/en-US/Firefox/Releases/29/Site_Compatibility#Networking
and
bug #947391.

This breaks our application.  (And other folks applications as well per comments in bug #947391.)

FWIW, it took me quite a while to trace this back to firefox.  I suspect other's have had the problem and simply abandoned firefox rather than bothering to report the problem.

Here's some search engine keywords: long-lived long lived http response broken connection


Expected results:

Applications should be able to take as long as is necessary to respond to the client.

If you want to pop up a dialog box to warn the user that the remote site is taking a long time to respond, that sounds reasonable.  Anything that allows the user to eventually receive results from the server would be fine.  Aborting the connection is not reasonable.

Since HTTP 1.1 POST requests are not idempotent premature connection reset for these reqeusts are especially henious.

It is also not reasonable to ask all the firefox users to change their network.http.response.timeout setting in order to use the application.  Nor is it reasonable to require the user to install some sort of plugin that changes the setting so that the application can be used.

Helpful hints are a nice bit of frosting to hand to the end-user, but not at the expense of breaking backwards compatibility and removing real functionatlity.  (It's not as if the user does not know she's not yet gotten a result -- they see the little spinning ball and know it means something's waiting.)

If you're going to warn the user about how long the remote site is taking then it might be nice to give the remote site some way to tell firefox how long it expects to take to supply a the next response.  Sounds like a strange road to go down but I suppose you could introduce some non-standard http header.

Meanwhile, until there's a fix that does not introduce regressions -- as bug #947391 did, a quick fix would be a new firefox release with network.http.response.timeout set to 0.  (If the user wants a timeout message they can frob the setting themselves.)  One would think that changing the setting to 0 would fall into the category of something that could quickly be released to immediately fix the problem.
(Reporter)

Updated

4 years ago
Version: 24 Branch → 29 Branch
(Reporter)

Updated

4 years ago
OS: Linux → All
Hardware: x86_64 → All
(Reporter)

Comment 1

4 years ago
What problem is solved by bug #947391's closing of the connection?  If the connetion needs to be closed why not let the user do it?
If I read Bug 947391 correctly, async XMLHttpRequests (Ajax) are not affected. An app taking 5+ minutes to load is a _pretty_ bad user experience, so you should rewrite you app using XHR and maybe a loading status/icon.
(Reporter)

Comment 3

4 years ago
Putting a little spinning ball on the screen (or whatever) is not going to affect my user's mood.
(Assignee)

Comment 4

4 years ago
Karl, please see bug 1005808 for more information.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1005808
The timer in question was introduced in 947391 for Firefox 29

in Firefox 30, we introduced refined functionality via the TCP Keep-Alive mechanism via bug 444328

The KA code is nicer in many ways than the timer code - it has a finer granularity for detecting the same problem and doesn't require us to put a cap on the application's response time.

However, it relies on active probing and therefore is not as energy friendly. It might be disabled in some environments (either now or in the future as events warrant). At this time it is universally enabled at distribution time.

Given all of that I think the right thing to do is to automatically disable the response timeout introduced in 947391 when the KA timer is found to be enabled. (i.e the response timeout code should check the KA pref and not arm itself if KA is enabled). That will make these paths mutually exclusive with KA being preferred. That ought to maximize compatibility while keeping our back up plan in tact.

We can take that simple patch right in this bug - steve has volunteered to write it.
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: DUPLICATE → ---
(Reporter)

Comment 6

4 years ago
Thank you!  That should address my concerns nicely.
(Assignee)

Comment 7

4 years ago
Created attachment 8445467 [details] [diff] [review]
v1.0 Only enable HTTP response timeout when TCP Keepalives are disabled for HTTP

-- Adds a check in nsHttpHandler to only enable HTTP response timeout if TCP keepalives for HTTP are disabled.
-- nsHttpConnection will only enabled the timeout if gHttpHandler->ResponseTimeoutEnabled(). I also added an assertion for gHttpHandler->ResponseTimeoutEnabled() when mResponseTimeoutEnabled is true in ReadTimeoutTick.
Assignee: nobody → sworkman
Status: REOPENED → ASSIGNED
Attachment #8445467 - Flags: review?(mcmanus)
(Assignee)

Comment 8

4 years ago
Created attachment 8445472 [details] [diff] [review]
v1.0 Update test for HTTP response timeout; disable with TCP keepalives for HTTP

Updated xpcshell test:
-- Enable response timeouts at test start; reset keepalive prefs in test cleanup.
-- Add test cases to cover different combinations of the keepalive prefs being enabled/disabled.
-- Change how the test runs; use add_test()/run_next_test() to get a better idea of which test case is running in the logs.

Try looks good for xpcshell tests across all platforms:
https://tbpl.mozilla.org/?tree=Try&rev=2eed41289e91
Attachment #8445472 - Flags: review?(mcmanus)
Attachment #8445472 - Flags: review?(mcmanus) → review+
Comment on attachment 8445467 [details] [diff] [review]
v1.0 Only enable HTTP response timeout when TCP Keepalives are disabled for HTTP

Review of attachment 8445467 [details] [diff] [review]:
-----------------------------------------------------------------

r+ if you agree with the changes

::: netwerk/protocol/http/nsHttpConnection.cpp
@@ +1110,5 @@
>  
>      uint32_t nextTickAfter = UINT32_MAX;
>      // Timeout if the response is taking too long to arrive.
>      if (mResponseTimeoutEnabled) {
> +        MOZ_ASSERT(gHttpHandler->ResponseTimeoutEnabled());

I think this is racy when people change prefs during runtime. you can WARN

::: netwerk/protocol/http/nsHttpHandler.cpp
@@ +141,5 @@
>      , mFastFallbackToIPv4(false)
>      , mProxyPipelining(true)
>      , mIdleTimeout(PR_SecondsToInterval(10))
>      , mSpdyTimeout(PR_SecondsToInterval(180))
> +    , mResponseTimeout(0)

this should match the default pref (i.e. leave it at 600s)
Attachment #8445467 - Flags: review?(mcmanus) → review+
Thanks Pat!

One minor thing: nsHttpHandler::mResponseTimeout should be 300; the original value of 600 did not match the pref default. So, I set it to 300s as per the intention of your comment.

try run: https://tbpl.mozilla.org/?tree=Try&rev=612cd11bd368

m-i is currently closed; I'll land this as soon as it opens.
Wait, I had a more recent try run than that one: https://tbpl.mozilla.org/?tree=Try&rev=184803507897
Landed on inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/2700ebff7d64

We should be able to uplift it to aurora and beta once it's baked on mozilla-central for a couple of days.
https://hg.mozilla.org/mozilla-central/rev/2700ebff7d64
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 33
Created attachment 8448960 [details] [diff] [review]
v1.0 BUNDLED FOR AURORA Only enable HTTP response timeout when TCP Keepalives are disabled for HTTP

NOTE: This is a bundled patch that applies to aurora, builds locally, and the included test passes.

Approval Request Comment
[Feature/regressing bug #]: Bug 444328
[User impact if declined]: Pages that take longer than 5 minutes to load will timeout (e.g. some legacy report generating services).
[Describe test coverage new/current, TBPL]: Test in patch; patch has been on m-c for 3 days.
[Risks and why]: Minimal; returning to previous behavior of not timing out.
[String/UUID change made/needed]: None.
Attachment #8448960 - Flags: approval-mozilla-aurora?
Created attachment 8448963 [details] [diff] [review]
v1.0 BUNDLED FOR BETA Only enable HTTP response timeout when TCP Keepalives are disabled for HTTP

NOTE: This is a bundled patch that applies to beta (no big difference, it just applies cleanly), builds locally, and the included test passes.

Approval Request Comment
[Feature/regressing bug #]: Bug 444328
[User impact if declined]: Pages that take longer than 5 minutes to load will timeout (e.g. some legacy report generating services).
[Describe test coverage new/current, TBPL]: Test in patch; patch has been on m-c for 3 days.
[Risks and why]: Minimal; returning to previous behavior of not timing out.
[String/UUID change made/needed]: None.
Attachment #8448963 - Flags: approval-mozilla-beta?
status-firefox31: --- → affected
status-firefox32: --- → affected
status-firefox33: --- → fixed
Attachment #8448960 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8448963 [details] [diff] [review]
v1.0 BUNDLED FOR BETA Only enable HTTP response timeout when TCP Keepalives are disabled for HTTP

taking in beta because 31 is an ESR and I guess that mainly impacts enterprise users.
Attachment #8448963 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(Reporter)

Comment 17

4 years ago
(In reply to Sylvestre Ledru [:sylvestre] from comment #16)
> Comment on attachment 8448963 [details] [diff] [review]
> v1.0 BUNDLED FOR BETA Only enable HTTP response timeout when TCP Keepalives
> are disabled for HTTP
> 
> taking in beta because 31 is an ESR and I guess that mainly impacts
> enterprise users.

Yes.  Thank you.

Note that regards enterprise use this is not just for "legacy" applications.
It also matters anytime something dirt simple (minimal investment/no javascript)
is needed that takes a "long time" for the server to process.  I wouldn't
think that this applies to consumer-oriented applications but it sure can
be handy to knock out a solution for within an organization.

Updated

4 years ago
Depends on: 1060202

Updated

4 years ago
Depends on: 1063358
You need to log in before you can comment on or make changes to this bug.