Last Comment Bug 407190 - Network (HTTP) should timeout, if server does not react
: Network (HTTP) should timeout, if server does not react
Status: ASSIGNED
[lame-network][necko-backlog]
:
Product: Core
Classification: Components
Component: Networking (show other bugs)
: Trunk
: All All
: -- major with 7 votes (vote)
: ---
Assigned To: Patrick McManus [:mcmanus]
:
: Patrick McManus [:mcmanus]
Mentors:
Depends on:
Blocks: 142326 tomtom 450710
  Show dependency treegraph
 
Reported: 2007-12-06 11:51 PST by Ben Bucksch (:BenB)
Modified: 2016-02-01 14:07 PST (History)
22 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Make connection timeouts configurable (5.84 KB, patch)
2007-12-06 12:27 PST, Wladimir Palant
ben.bucksch: review+
Details | Diff | Splinter Review
Same patch, bitrot removed (5.83 KB, patch)
2008-04-08 01:59 PDT, Wladimir Palant
cbiesinger: review+
Details | Diff | Splinter Review
Patch v2 (5.86 KB, patch)
2008-04-08 03:30 PDT, Wladimir Palant
cbiesinger: review+
Details | Diff | Splinter Review
nsIHttpChannel attribute (15.17 KB, patch)
2008-09-02 14:33 PDT, Ben Bucksch (:BenB)
no flags Details | Diff | Splinter Review
Patch v4 with nsIHttpChannel attribute (21.18 KB, patch)
2008-09-02 16:36 PDT, Ben Bucksch (:BenB)
no flags Details | Diff | Splinter Review
Patch v2 updated (3.54 KB, patch)
2008-12-15 02:41 PST, Wladimir Palant
trev.moz: review+
Details | Diff | Splinter Review
Patch v5 (15.57 KB, patch)
2012-01-04 14:05 PST, Ben Bucksch (:BenB)
mcmanus: review-
Details | Diff | Splinter Review

Description Ben Bucksch (:BenB) 2007-12-06 11:51:33 PST
Reproduction:
1. Have a server which accepts your request, but needs 2 minutes to give you any data
2. You (e.g. an AJAX app) want a response within 10 seconds or an error, you don't want to wait 2 minutes.

Actual result:
You never get any response, your application hangs (unless the user manually aborts), or hangs 1m30s-2 minutes, which is practically the same.

Expected result:
- A backend pref network.http.read.timeout, default 20-30 seconds
- An attribute on nsIHttpChannel which an application (including AJAX apps) can use to set the timeout for a particular call/server very low, without changing the timeout globally for the app.

Implementation:
Wladimir and me were working on this and have a working implementation.

Somewhere deep down (in nsSocketTransportService2.cpp), there's a poll(). poll() allows a timeout - that should be set. nsSocketTransportService already allows to set it, and the mail protocols use it, but necko / HTTP never sets it. nsHttpConnection should set it, based on a pref from nsHttpHandler (where all other prefs are read).
As said, nsIHttpChannel should also have an attribute, which nsHttpConnection should check and use, if set, otherwise use the pref from nHttpHandler.
0 should be allowed, meaning no timeout.

x-ref: Bug 142326
Comment 1 Wladimir Palant 2007-12-06 12:27:21 PST
Created attachment 291925 [details] [diff] [review]
Make connection timeouts configurable
Comment 2 Ben Bucksch (:BenB) 2007-12-06 12:28:16 PST
Comment on attachment 291925 [details] [diff] [review]
Make connection timeouts configurable

r=BenB
Comment 3 Wladimir Palant 2008-04-08 01:59:55 PDT
Created attachment 314298 [details] [diff] [review]
Same patch, bitrot removed
Comment 4 Christian :Biesinger (don't email me, ping me on IRC) 2008-04-08 02:51:51 PDT
of course your ajax app could always do timeout handling by itself
Comment 5 Christian :Biesinger (don't email me, ping me on IRC) 2008-04-08 02:53:59 PDT
actually could you use the pref names from bug 142326 comment 3?
Comment 6 Wladimir Palant 2008-04-08 03:30:31 PDT
Created attachment 314313 [details] [diff] [review]
Patch v2

Renamed network.http.read.timeout pref into network.http.request.timeout, corresponding variables renamed as well.

Do I need an additional review for that? Also, is there a chance this patch will make it into 1.9?
Comment 7 Lalo Martins 2008-06-26 15:59:48 PDT
Sad to see this more or less abandoned, I think I'll apply the patch by hand for the time being.

Here's a different reason why I need this, although it may be the "wrong" fix for the problem, but it does the job: I'm in China.  The "Great Firewall" (Golden Shield) usually works by making Firefox "think" it has connected to the server, but then never sending a reply.  I need to wait however long is the default OS-level timeout (minutes!) before I realise it's been blocked.  Having the ability to set a much lower (1 minute, maybe even less, will have to experiment) read timeout will make my entire browsing experience a lot better.
Comment 8 benc 2008-07-13 00:43:41 PDT
So, there are probably ways of implementing a "read" timeout vs. a "request" timeout... so I think the most important aspect isn't making the pref match the pref, but making the pref name descriptive.

Comment 9 benc 2008-07-19 11:32:54 PDT
Okay, I've had time to look at the patches.

Two points:

1- I'd like to see the connection time out separated from the request timeout. I know they seem like the same thing, but we don't have a lot of truly detailed information in the other bug reports, about why there were timeouts. Although the code checkin for supporting both is pretty simple, the actual selection of default pref and verification is going to vary. We should probably split them up (for example, I doubt I'd be able to verify them in one pass).

2- The code was changed from "read" to "request", but I'm not clear on which behavior the code will implement.

Does this mean: if I don't get a respose to my request in x seconds -or- in the last x seconds, I haven't received any new data when I read the socket ?

There can be a difference between a very slow start of response and a very slowly transmitted response.

Also, neither a read or request timeout will solve the problem with very slow connections where the request is sent, but takes a long time for ACKs to return. The OS value for the transmission timeout will trip, and kill the connection. (We should return a good error here, I don't know if we do...)
Comment 10 Ben Bucksch (:BenB) 2008-07-21 03:33:16 PDT
> I'd like to see the connection time out separated from the request timeout.
> There can be a difference between a very slow start of response and a very
> slowly transmitted response.

We're implementing all that we can implement given the options by the socket transport. Creating new options - if possible at all - would be deeper in the code (nsSocketTransportServivce2.cpp etc.).

It's also a question of whether it matters. It could be that a dynamic server returns a static template header (HTTP headers or even the start of the body) until it starts generating the actual response. See e.g. PHP inlined in HTML.

> The code was changed from "read" to "request", but I'm not clear on which
> behavior the code will implement.

Good question.
No, we don't change it, we keep the existing "request" that HTTPHandler uses, although the nsISocketTtransport uses "read". Thus, I was for using "read" in the pref, but Wladimir for "request" (IIRC Wladimir said the latter is correct and we even tested it?).

> very slow connections

That would only be relevant if we were to change the default. Which we don't (in this patch).
Comment 11 Ben Bucksch (:BenB) 2008-09-02 14:33:23 PDT
Created attachment 336548 [details] [diff] [review]
nsIHttpChannel attribute

> An attribute on nsIHttpChannel which an application (including AJAX apps) can
> use to set the timeout for a particular call/server very low, without changing
> the timeout globally for the app.

Implemented that.

The connection (nsHttpConnection) doesn't seem to have an interface, though, just the channel (nsIHttpChannel.idl), which is accessible via XMLHttpRequest.channel or what nsIIOService returns.
Not knowing a better alternative, I used nsHttpConnectionInfo to go from channel to connection.

It doesn't work, though, it seems it's too late. Attaching for biesi to look at it.
Comment 12 Ben Bucksch (:BenB) 2008-09-02 16:36:11 PDT
Created attachment 336574 [details] [diff] [review]
Patch v4 with nsIHttpChannel attribute
Comment 13 Ben Bucksch (:BenB) 2008-09-02 16:39:08 PDT
I'm not claiming the patch is pretty, but at least it seems I get the value from JS to the SocketTransportService.
I don't see it having any effect, but that's another matter :).
Comment 14 Ben Bucksch (:BenB) 2008-09-02 16:40:07 PDT
JS code would be:
var r = new XMLHttpRequest();
r.open("GET", url);
var channel = r.channel.QueryInterface(Ci.nsIHttpChannel2);
channel.connectTimeout = 5;
channel.requestTimeout = 5;
Comment 15 Ben Bucksch (:BenB) 2008-09-05 15:15:13 PDT
Requesting wanted1.9.1, because Thunderbird needs it for autoconfig
<https://wiki.mozilla.org/Thunderbird:Autoconfiguration> / bug 450710 / bug 422814
Comment 16 Wladimir Palant 2008-12-15 02:41:40 PST
Created attachment 353006 [details] [diff] [review]
Patch v2 updated

Attaching updated version of Patch v2 (diffed against 1.9.1b2 and proper format), taking review flag over from previous patch.
Comment 17 Ben Bucksch (:BenB) 2008-12-15 04:17:23 PST
Note: Patch v2 is not sufficient. (I think Wladimir attached it because he needed it for TomTom HOME's XULRunner.) As said in comment 0 and comment 15, I (and other AJAX apps) need the ability to set the timeout per request, not only globally as pref. That's what patch 4 does, the latter should go into Mozilla.
Comment 18 David :Bienvenu 2009-01-08 14:16:43 PST
any chance this is going to go anywhere? Should we request a different reviewer?  I think there's someone new at moco working on networking code...
Comment 19 Ben Bucksch (:BenB) 2009-01-08 15:04:53 PST
Comment on attachment 336574 [details] [diff] [review]
Patch v4 with nsIHttpChannel attribute

> I think there's someone new at moco working on networking code...

I wouldn't know who that is or whether he can review.

Asking for (super)review by darin.
Comment 20 Adam Porter 2009-07-07 17:38:34 PDT
Any updates on this?  I'm currently stuck on a connection with high intermittent packet loss and many dropped connections.  Firefox doesn't care, though, it'll happily wait forever for more data from the server, even if it's never going to come.
Comment 21 Ben Bucksch (:BenB) 2009-07-07 17:48:10 PDT
Patch (v4) is waiting for review by biesi and darin
Comment 22 Ben Bucksch (:BenB) 2009-10-29 05:40:04 PDT
Changed review request to bz.
Comment 23 Boris Zbarsky [:bz] (still a bit busy) 2009-11-01 20:16:10 PST
Comment on attachment 336574 [details] [diff] [review]
Patch v4 with nsIHttpChannel attribute

Jason, can you take a look?

One obvious issue: this new IDL file shouldn't be in SDK_XPIDLSRCS.

In fact, is there a reason to not just put this on nsIHttpChannelInternal?
Comment 24 Jeff Walden [:Waldo] (remove +bmo to email) 2009-11-01 20:23:37 PST
|if ( ! expr)| isn't prevailing style anywhere in Mozilla code, last I checked -- should be |if (!expr)|.

If you try accessing and setting the new attributes on a channel that's not opened, do you properly not crash?  Just looking at the code (and having no idea what the internal invariants of the members are) I'm going to guess you crash, but I could be wrong.

You should be able to write tests for this using httpd.js and nsIHttpResponse.seizePower.
Comment 25 Ben Bucksch (:BenB) 2009-11-01 21:48:03 PST
bz, I intentionally made a public interface, because applications and extensions need to be able to use it - that's the whole point. Both Thunderbird and TomTom HOME (XULrunner app) need it.
Comment 26 Boris Zbarsky [:bz] (still a bit busy) 2009-11-01 21:50:01 PST
"public" in what sense?  Only frozen interfaces should be in SDK_XPIDLSRCS.
Comment 27 Ben Bucksch (:BenB) 2009-11-01 21:57:06 PST
Well, "public" in the way that XULrunner apps and extensions can use it without using "Internal" interfaces.

See initial description:
> - An attribute on nsIHttpChannel which an application (including AJAX apps) can
> use to set the timeout for a particular call/server very low, without changing
> the timeout globally for the app.

Actually, I should (later?) add an XMLHttpRequest attribute, too, to make it accessible for AJAX webpages.
Comment 28 Boris Zbarsky [:bz] (still a bit busy) 2009-11-01 21:59:26 PST
There's nothing wrong with using nsIHttpChannelInternal, the naming notwithstanding.
Comment 29 Ben Bucksch (:BenB) 2009-11-01 22:15:11 PST
OK. I'll quote you on that when somebody claims I shouldn't :). At least it's shipped in the XULRunner SDK in included/ .
Comment 30 Boris Zbarsky [:bz] (still a bit busy) 2009-11-01 22:22:32 PST
Oh, and as far as XHR goes that's something to bring up with the relevant W3C group, no?
Comment 31 Ben Bucksch (:BenB) 2009-11-01 22:39:43 PST
OK, I filed Bug 525816 for that.
Comment 32 timeless 2009-11-02 00:46:33 PST
fwiw, from memory, this bug was flagged by our connectivity people as something which hurt MicroB's user experience while traversing flaky cellular networks.

+    attribute PRInt16 connectTimeout;

what does x.connectTimeout = -2 do?

+nsHttpChannel::GetConnectTimeout(PRInt16* value)
+{
+    NS_ENSURE_ARG_POINTER(value);

for attributes, this check shouldn't need to exist (this isn't the case for out arguments of functions)

+    printf("channel::setrequesttimeout\n");

obviously this should go or be moved to prlogging

+nsHttpConnection::SetConnectTimeout(PRInt16 value)

typically the argument should be aValue to remind people it started as an argument (even if you do clobber it later)

+    if ( ! mSocketTransport)
+        return NS_ERROR_NOT_INITIALIZED;

if this is truly a caller bug, then NS_ENSURE_STATE ?

the idl doesn't explain that 0 is magical

there, this crosses something off my todo list from around august.
Comment 33 Ben Bucksch (:BenB) 2010-04-30 16:39:16 PDT
jduell: ping
Comment 34 Jason Duell [:jduell] (needinfo me) 2010-05-01 15:40:25 PDT
Comment on attachment 336574 [details] [diff] [review]
Patch v4 with nsIHttpChannel attribute

I'm willing to look at this--seems like a good bug to fix--but not before we get necko/e10s working.   Feel free to assign to someone else, or re-assign to me in a month or two.
Comment 35 Ben Bucksch (:BenB) 2010-08-25 17:40:40 PDT
Comment on attachment 336574 [details] [diff] [review]
Patch v4 with nsIHttpChannel attribute

jduell wrote 2010-05-01:
> Feel free to ... re-assign to me in a month or two.
Comment 36 Patrick McManus [:mcmanus] 2011-12-19 19:46:47 PST
there is code in the pipelining changes that can probably provide a basis for this. I'll put it on my radar.
Comment 37 Jason Duell [:jduell] (needinfo me) 2012-01-04 12:15:19 PST
I'm going to let Patrick figure out if the patch here still makes sense, and/or how else to proceed.
Comment 38 Ben Bucksch (:BenB) 2012-01-04 12:44:23 PST
Yes, it still makes sense.
Comment 39 Ben Bucksch (:BenB) 2012-01-04 14:05:02 PST
Created attachment 585886 [details] [diff] [review]
Patch v5

- Unbitrot the patch
- remove httpchannel2 interface, because we can now add new attributes to existing interfaces.
- to be resolved: the inherintance hierarchy changed, it's not obvious to me where the function implementations should go now or why there's no DECL_NSIHTTPCHANNEL anywhere.

/mnt/compile/mozilla/firefox/trunk/source/netwerk/protocol/http/nsHttpHandler.cpp:1385: error: cannot allocate an object of abstract type ‘mozilla::net::HttpChannelChild’
../../../dist/include/mozilla/net/HttpChannelChild.h:79: note:   because the following virtual functions are pure within ‘mozilla::net::HttpChannelChild’:
../../../dist/include/nsIHttpChannel.h:58: note:        virtual nsresult nsIHttpChannel::GetConnectTimeout(PRInt16*)
...
/mnt/compile/mozilla/firefox/trunk/source/netwerk/protocol/http/nsHttpHandler.cpp:1387: error: cannot allocate an object of abstract type ‘nsHttpChannel’
/mnt/compile/mozilla/firefox/trunk/source/netwerk/protocol/http/nsHttpChannel.h:90: note:   because the following virtual functions are pure within ‘nsHttpChannel’:
../../../dist/include/nsIHttpChannel.h:58: note:        virtual nsresult nsIHttpChannel::GetConnectTimeout(PRInt16*)
...
make[1]: *** [nsHttpHandler.o] Fehler 1
Comment 40 Ben Bucksch (:BenB) 2012-01-04 14:07:15 PST
Comment on attachment 585886 [details] [diff] [review]
Patch v5

I need a pointer for that last thing.
Comment 41 Patrick McManus [:mcmanus] 2012-01-08 07:59:52 PST
Comment on attachment 585886 [details] [diff] [review]
Patch v5

Review of attachment 585886 [details] [diff] [review]:
-----------------------------------------------------------------

hi ben, I'm sorry this has languished for so long - its a little bit complicated and the code in question is in flux - but I like the basic idea. let's see if we can't iterate it forward.

* the biggest problem here is that you can't drive the connection timeout out of nshttpconnection - the sockets are actually made out of nshttpconnectionmgr, by the time nshttpconnection is called the socket is already tcp connected.

* a related issue is that the connection creation is not tied to a transaction (and therefore your channel timeout params). A new transaction that can't be placed on an existing connection forces a new connection to be made and when that socket is connected the transaction queue is consulted to put them back together - but that transcation might have been dispatched elsewhere in the meantime (for instance another persistent connection became available while waiting for the new connection to complete) - that's a feature. That suggests to me that the transaction might need a timer associated with it instead of trying to use the socket transport logic. see my comments below about the read timeout for an (almost) existing timer tick this might be able to be attached to.

* as far as the read timeouts on active connections the basic approach will proabably work better.. I'd prefer these are called read timeouts instead of request timeouts. (you are actually timing out the response, not the request.. but read or recv is clearer).

* the read timeout has to be cleared when this goes back into the idle connection pool

* bug 603514 also implements some read timeout logic, but it does it off a http specific timer instead of the socket logic. (that patch requires it because it needs some pretty fancy pipeline rearranging not necessarily aborting the transaction). I'd prefer if you could build on that because then we could get approrpiate logging and error codes that were specific to your new property.


* the idl change should be on nsihttpchannelinternal and don't forget to bump the uuid. If for no other reason than I prefer to avoid churning uuid's on interfaces that included by lots of addons.

* as far as why decl_nsihttpchannel went away it has something to do with e10s - jason is the expert. you can look at Get/SetAllowSpdy in HttpBaseChannel for a template on how to add an attribute.
Comment 42 Patrick McManus [:mcmanus] 2012-01-08 08:06:25 PST
a couple other thoughts:

* be sure the read timer does not apply while we're sending a large upload

* what does your timeout mean in the context of two different queues.. first in the connection manager when transactions can sit in a wait queue while all 6 allowable connections are already busy, and second when the transaction has been dispatched as a later part of a pipeline of transactions on one connection and may experience head of line blocking.
Comment 43 dE 2014-12-01 21:33:30 PST
It appears that the OS (at least Linux) has it's own way to handle this.

Linux has net.ipv4.tcp_retries2 which specifies the no. of retries (it'll resend the packet) before returning the application an error code.

Firefox ignores this. So does wget (but it has a --timeout option).
Comment 44 Ben Bucksch (:BenB) 2015-10-26 07:02:18 PDT
Would be nice to land this. The low-level networking APIs allow to set a timeout. Callers of XHR should be able to use them.
Comment 45 Honza Bambas (:mayhemer) 2015-10-26 08:36:36 PDT
If there has to be an optional timeout feature, the start time point must be taken no sooner than we send the data to the socket.  Problem is that it may be held in local buffers.  Timeouts are hard and I personally was always against them being lower than on the application level.  Timeouts always cause pain and never work reliably.

Note You need to log in before you can comment on or make changes to this bug.