Open Bug 407190 Opened 17 years ago Updated 2 years ago

Network (HTTP) should timeout, if server does not react

Categories

(Core :: Networking, defect, P3)

defect

Tracking

()

People

(Reporter: BenB, Unassigned)

References

Details

(Whiteboard: [lame-network][necko-backlog])

Attachments

(1 file, 6 obsolete files)

Reproduction:
1. Have a server which accepts your request, but needs 2 minutes to give you any data
2. You (e.g. an AJAX app) want a response within 10 seconds or an error, you don't want to wait 2 minutes.

Actual result:
You never get any response, your application hangs (unless the user manually aborts), or hangs 1m30s-2 minutes, which is practically the same.

Expected result:
- A backend pref network.http.read.timeout, default 20-30 seconds
- An attribute on nsIHttpChannel which an application (including AJAX apps) can use to set the timeout for a particular call/server very low, without changing the timeout globally for the app.

Implementation:
Wladimir and me were working on this and have a working implementation.

Somewhere deep down (in nsSocketTransportService2.cpp), there's a poll(). poll() allows a timeout - that should be set. nsSocketTransportService already allows to set it, and the mail protocols use it, but necko / HTTP never sets it. nsHttpConnection should set it, based on a pref from nsHttpHandler (where all other prefs are read).
As said, nsIHttpChannel should also have an attribute, which nsHttpConnection should check and use, if set, otherwise use the pref from nHttpHandler.
0 should be allowed, meaning no timeout.

x-ref: Bug 142326
Blocks: 406684
Blocks: tomtom
No longer blocks: 406684
Assignee: ben.bucksch → trev.moz
Attachment #291925 - Flags: review?(cbiesinger)
Comment on attachment 291925 [details] [diff] [review]
Make connection timeouts configurable

r=BenB
Attachment #291925 - Flags: review+
Attached patch Same patch, bitrot removed (obsolete) — Splinter Review
Attachment #291925 - Attachment is obsolete: true
Attachment #314298 - Flags: review?(cbiesinger)
Attachment #291925 - Flags: review?(cbiesinger)
Attachment #314298 - Flags: review?(cbiesinger) → review+
of course your ajax app could always do timeout handling by itself
actually could you use the pref names from bug 142326 comment 3?
Attached patch Patch v2 (obsolete) — Splinter Review
Renamed network.http.read.timeout pref into network.http.request.timeout, corresponding variables renamed as well.

Do I need an additional review for that? Also, is there a chance this patch will make it into 1.9?
Attachment #314298 - Attachment is obsolete: true
Attachment #314313 - Flags: review?(cbiesinger)
Attachment #314313 - Flags: review?(cbiesinger) → review+
Sad to see this more or less abandoned, I think I'll apply the patch by hand for the time being.

Here's a different reason why I need this, although it may be the "wrong" fix for the problem, but it does the job: I'm in China.  The "Great Firewall" (Golden Shield) usually works by making Firefox "think" it has connected to the server, but then never sending a reply.  I need to wait however long is the default OS-level timeout (minutes!) before I realise it's been blocked.  Having the ability to set a much lower (1 minute, maybe even less, will have to experiment) read timeout will make my entire browsing experience a lot better.
So, there are probably ways of implementing a "read" timeout vs. a "request" timeout... so I think the most important aspect isn't making the pref match the pref, but making the pref name descriptive.

Okay, I've had time to look at the patches.

Two points:

1- I'd like to see the connection time out separated from the request timeout. I know they seem like the same thing, but we don't have a lot of truly detailed information in the other bug reports, about why there were timeouts. Although the code checkin for supporting both is pretty simple, the actual selection of default pref and verification is going to vary. We should probably split them up (for example, I doubt I'd be able to verify them in one pass).

2- The code was changed from "read" to "request", but I'm not clear on which behavior the code will implement.

Does this mean: if I don't get a respose to my request in x seconds -or- in the last x seconds, I haven't received any new data when I read the socket ?

There can be a difference between a very slow start of response and a very slowly transmitted response.

Also, neither a read or request timeout will solve the problem with very slow connections where the request is sent, but takes a long time for ACKs to return. The OS value for the transmission timeout will trip, and kill the connection. (We should return a good error here, I don't know if we do...)
> I'd like to see the connection time out separated from the request timeout.
> There can be a difference between a very slow start of response and a very
> slowly transmitted response.

We're implementing all that we can implement given the options by the socket transport. Creating new options - if possible at all - would be deeper in the code (nsSocketTransportServivce2.cpp etc.).

It's also a question of whether it matters. It could be that a dynamic server returns a static template header (HTTP headers or even the start of the body) until it starts generating the actual response. See e.g. PHP inlined in HTML.

> The code was changed from "read" to "request", but I'm not clear on which
> behavior the code will implement.

Good question.
No, we don't change it, we keep the existing "request" that HTTPHandler uses, although the nsISocketTtransport uses "read". Thus, I was for using "read" in the pref, but Wladimir for "request" (IIRC Wladimir said the latter is correct and we even tested it?).

> very slow connections

That would only be relevant if we were to change the default. Which we don't (in this patch).
Attached patch nsIHttpChannel attribute (obsolete) — Splinter Review
> An attribute on nsIHttpChannel which an application (including AJAX apps) can
> use to set the timeout for a particular call/server very low, without changing
> the timeout globally for the app.

Implemented that.

The connection (nsHttpConnection) doesn't seem to have an interface, though, just the channel (nsIHttpChannel.idl), which is accessible via XMLHttpRequest.channel or what nsIIOService returns.
Not knowing a better alternative, I used nsHttpConnectionInfo to go from channel to connection.

It doesn't work, though, it seems it's too late. Attaching for biesi to look at it.
Assignee: trev.moz → ben.bucksch
Attachment #314313 - Attachment is obsolete: true
Attachment #336548 - Attachment is obsolete: true
Attachment #336574 - Flags: review?
Attachment #336574 - Flags: review? → review?(cbiesinger)
I'm not claiming the patch is pretty, but at least it seems I get the value from JS to the SocketTransportService.
I don't see it having any effect, but that's another matter :).
JS code would be:
var r = new XMLHttpRequest();
r.open("GET", url);
var channel = r.channel.QueryInterface(Ci.nsIHttpChannel2);
channel.connectTimeout = 5;
channel.requestTimeout = 5;
Requesting wanted1.9.1, because Thunderbird needs it for autoconfig
<https://wiki.mozilla.org/Thunderbird:Autoconfiguration> / bug 450710 / bug 422814
Flags: wanted1.9.1?
Attached patch Patch v2 updated (obsolete) — Splinter Review
Attaching updated version of Patch v2 (diffed against 1.9.1b2 and proper format), taking review flag over from previous patch.
Attachment #353006 - Flags: review+
Note: Patch v2 is not sufficient. (I think Wladimir attached it because he needed it for TomTom HOME's XULRunner.) As said in comment 0 and comment 15, I (and other AJAX apps) need the ability to set the timeout per request, not only globally as pref. That's what patch 4 does, the latter should go into Mozilla.
Attachment #353006 - Attachment is obsolete: true
Blocks: 450710
any chance this is going to go anywhere? Should we request a different reviewer?  I think there's someone new at moco working on networking code...
Comment on attachment 336574 [details] [diff] [review]
Patch v4 with nsIHttpChannel attribute

> I think there's someone new at moco working on networking code...

I wouldn't know who that is or whether he can review.

Asking for (super)review by darin.
Attachment #336574 - Flags: superreview?(darin.moz)
Attachment #336574 - Attachment description: nsIHttpChannel attribute, v4 → Patch v4 with nsIHttpChannel attribute
Any updates on this?  I'm currently stuck on a connection with high intermittent packet loss and many dropped connections.  Firefox doesn't care, though, it'll happily wait forever for more data from the server, even if it's never going to come.
Patch (v4) is waiting for review by biesi and darin
Attachment #336574 - Flags: superreview?(darin.moz)
Attachment #336574 - Flags: review?(cbiesinger)
Attachment #336574 - Flags: review?(bzbarsky)
Changed review request to bz.
Flags: wanted1.9.1?
Attachment #336574 - Flags: review?(bzbarsky) → review?(jduell.mcbugs)
Comment on attachment 336574 [details] [diff] [review]
Patch v4 with nsIHttpChannel attribute

Jason, can you take a look?

One obvious issue: this new IDL file shouldn't be in SDK_XPIDLSRCS.

In fact, is there a reason to not just put this on nsIHttpChannelInternal?
|if ( ! expr)| isn't prevailing style anywhere in Mozilla code, last I checked -- should be |if (!expr)|.

If you try accessing and setting the new attributes on a channel that's not opened, do you properly not crash?  Just looking at the code (and having no idea what the internal invariants of the members are) I'm going to guess you crash, but I could be wrong.

You should be able to write tests for this using httpd.js and nsIHttpResponse.seizePower.
bz, I intentionally made a public interface, because applications and extensions need to be able to use it - that's the whole point. Both Thunderbird and TomTom HOME (XULrunner app) need it.
"public" in what sense?  Only frozen interfaces should be in SDK_XPIDLSRCS.
Well, "public" in the way that XULrunner apps and extensions can use it without using "Internal" interfaces.

See initial description:
> - An attribute on nsIHttpChannel which an application (including AJAX apps) can
> use to set the timeout for a particular call/server very low, without changing
> the timeout globally for the app.

Actually, I should (later?) add an XMLHttpRequest attribute, too, to make it accessible for AJAX webpages.
There's nothing wrong with using nsIHttpChannelInternal, the naming notwithstanding.
OK. I'll quote you on that when somebody claims I shouldn't :). At least it's shipped in the XULRunner SDK in included/ .
Oh, and as far as XHR goes that's something to bring up with the relevant W3C group, no?
OK, I filed Bug 525816 for that.
fwiw, from memory, this bug was flagged by our connectivity people as something which hurt MicroB's user experience while traversing flaky cellular networks.

+    attribute PRInt16 connectTimeout;

what does x.connectTimeout = -2 do?

+nsHttpChannel::GetConnectTimeout(PRInt16* value)
+{
+    NS_ENSURE_ARG_POINTER(value);

for attributes, this check shouldn't need to exist (this isn't the case for out arguments of functions)

+    printf("channel::setrequesttimeout\n");

obviously this should go or be moved to prlogging

+nsHttpConnection::SetConnectTimeout(PRInt16 value)

typically the argument should be aValue to remind people it started as an argument (even if you do clobber it later)

+    if ( ! mSocketTransport)
+        return NS_ERROR_NOT_INITIALIZED;

if this is truly a caller bug, then NS_ENSURE_STATE ?

the idl doesn't explain that 0 is magical

there, this crosses something off my todo list from around august.
Summary: Network (HTTP) should time out, if server does not react → Network (HTTP) should timeout, if server does not react
jduell: ping
Comment on attachment 336574 [details] [diff] [review]
Patch v4 with nsIHttpChannel attribute

I'm willing to look at this--seems like a good bug to fix--but not before we get necko/e10s working.   Feel free to assign to someone else, or re-assign to me in a month or two.
Attachment #336574 - Flags: review?(jduell.mcbugs)
Attachment #336574 - Flags: review?(ben.bucksch)
Comment on attachment 336574 [details] [diff] [review]
Patch v4 with nsIHttpChannel attribute

jduell wrote 2010-05-01:
> Feel free to ... re-assign to me in a month or two.
Attachment #336574 - Flags: review?(ben.bucksch) → review?(jduell.mcbugs)
there is code in the pipelining changes that can probably provide a basis for this. I'll put it on my radar.
Whiteboard: [lame-network]
I'm going to let Patrick figure out if the patch here still makes sense, and/or how else to proceed.
Assignee: ben.bucksch → mcmanus
Attachment #336574 - Flags: review?(jduell.mcbugs) → review?(mcmanus)
Yes, it still makes sense.
Attached patch Patch v5Splinter Review
- Unbitrot the patch
- remove httpchannel2 interface, because we can now add new attributes to existing interfaces.
- to be resolved: the inherintance hierarchy changed, it's not obvious to me where the function implementations should go now or why there's no DECL_NSIHTTPCHANNEL anywhere.

/mnt/compile/mozilla/firefox/trunk/source/netwerk/protocol/http/nsHttpHandler.cpp:1385: error: cannot allocate an object of abstract type ‘mozilla::net::HttpChannelChild’
../../../dist/include/mozilla/net/HttpChannelChild.h:79: note:   because the following virtual functions are pure within ‘mozilla::net::HttpChannelChild’:
../../../dist/include/nsIHttpChannel.h:58: note:        virtual nsresult nsIHttpChannel::GetConnectTimeout(PRInt16*)
...
/mnt/compile/mozilla/firefox/trunk/source/netwerk/protocol/http/nsHttpHandler.cpp:1387: error: cannot allocate an object of abstract type ‘nsHttpChannel’
/mnt/compile/mozilla/firefox/trunk/source/netwerk/protocol/http/nsHttpChannel.h:90: note:   because the following virtual functions are pure within ‘nsHttpChannel’:
../../../dist/include/nsIHttpChannel.h:58: note:        virtual nsresult nsIHttpChannel::GetConnectTimeout(PRInt16*)
...
make[1]: *** [nsHttpHandler.o] Fehler 1
Attachment #336574 - Attachment is obsolete: true
Attachment #585886 - Flags: review?
Attachment #336574 - Flags: review?(mcmanus)
Comment on attachment 585886 [details] [diff] [review]
Patch v5

I need a pointer for that last thing.
Attachment #585886 - Flags: review? → review?(mcmanus)
Comment on attachment 585886 [details] [diff] [review]
Patch v5

Review of attachment 585886 [details] [diff] [review]:
-----------------------------------------------------------------

hi ben, I'm sorry this has languished for so long - its a little bit complicated and the code in question is in flux - but I like the basic idea. let's see if we can't iterate it forward.

* the biggest problem here is that you can't drive the connection timeout out of nshttpconnection - the sockets are actually made out of nshttpconnectionmgr, by the time nshttpconnection is called the socket is already tcp connected.

* a related issue is that the connection creation is not tied to a transaction (and therefore your channel timeout params). A new transaction that can't be placed on an existing connection forces a new connection to be made and when that socket is connected the transaction queue is consulted to put them back together - but that transcation might have been dispatched elsewhere in the meantime (for instance another persistent connection became available while waiting for the new connection to complete) - that's a feature. That suggests to me that the transaction might need a timer associated with it instead of trying to use the socket transport logic. see my comments below about the read timeout for an (almost) existing timer tick this might be able to be attached to.

* as far as the read timeouts on active connections the basic approach will proabably work better.. I'd prefer these are called read timeouts instead of request timeouts. (you are actually timing out the response, not the request.. but read or recv is clearer).

* the read timeout has to be cleared when this goes back into the idle connection pool

* bug 603514 also implements some read timeout logic, but it does it off a http specific timer instead of the socket logic. (that patch requires it because it needs some pretty fancy pipeline rearranging not necessarily aborting the transaction). I'd prefer if you could build on that because then we could get approrpiate logging and error codes that were specific to your new property.


* the idl change should be on nsihttpchannelinternal and don't forget to bump the uuid. If for no other reason than I prefer to avoid churning uuid's on interfaces that included by lots of addons.

* as far as why decl_nsihttpchannel went away it has something to do with e10s - jason is the expert. you can look at Get/SetAllowSpdy in HttpBaseChannel for a template on how to add an attribute.
Attachment #585886 - Flags: review?(mcmanus) → review-
a couple other thoughts:

* be sure the read timer does not apply while we're sending a large upload

* what does your timeout mean in the context of two different queues.. first in the connection manager when transactions can sit in a wait queue while all 6 allowable connections are already busy, and second when the transaction has been dispatched as a later part of a pipeline of transactions on one connection and may experience head of line blocking.
It appears that the OS (at least Linux) has it's own way to handle this.

Linux has net.ipv4.tcp_retries2 which specifies the no. of retries (it'll resend the packet) before returning the application an error code.

Firefox ignores this. So does wget (but it has a --timeout option).
Would be nice to land this. The low-level networking APIs allow to set a timeout. Callers of XHR should be able to use them.
If there has to be an optional timeout feature, the start time point must be taken no sooner than we send the data to the socket.  Problem is that it may be held in local buffers.  Timeouts are hard and I personally was always against them being lower than on the application level.  Timeouts always cause pain and never work reliably.
Whiteboard: [lame-network] → [lame-network][necko-backlog]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3

Hey Honza,
Do you happen to know if this issue is still relevant today or it can be closed?

Flags: needinfo?(honzab.moz)

Yes. Last time I checked, the Mozilla implementation of XmlHTTPRequest.timeout was a simple setTimeout(). While that's convenient, it doesn't abort the network request in some states, which matters to the server. I found some states where the result will be ignored as far as the client API is concerned, so for the client side, it will be aborted, but the request continues towards the server. That's wasting server resources for cases where the client wants a very short timeout and the timeout is a common situation.

QA Whiteboard: qa-not-actionable

Unassigning bugs owned by Patrick.

Assignee: mcmanus → nobody
Status: ASSIGNED → NEW

Redirect a needinfo that is pending on an inactive user to the triage owner.
:dragana, since the bug has high severity, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(honzab.moz) → needinfo?(dd.mozilla)
Flags: needinfo?(dd.mozilla)

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: major → --
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: