Preconnect considered harmful?

RESOLVED INVALID

Status

Product: Firefox
Component: Untriaged
Status: RESOLVED INVALID
Opened: 6 years ago
Modified: 4 years ago

People

Reporter: Yann HAMON
Assignee: Unassigned

Tracking

Version: 10 Branch
Platform: x86_64 Linux
Points: ---

Firefox Tracking Flags

(Not tracked)

Details

Description (Reporter), 6 years ago
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
Build ID: 20120216100510

Steps to reproduce:

Hello Firefox Team, 

This is not a bug per se (most browsers behave similarly), but I am trying to raise your awareness that in some cases the side effects of preconnect can worsen the end-user experience, in the hope that the web-browser-building community can find a medium-term workaround, as I consider the current situation suboptimal. Hopefully you'll be able to come up with some magic :)

I currently work for an ad-serving business, which has the following characteristics:
- When the user loads a page, the page loads one JavaScript file from one domain.
- That JavaScript then loads another file from a second domain (we do not have the same caching policies/TCP configuration for both domains).
- Our proxies are configured to cut any incoming connection after 3 seconds if, for whatever reason, ad serving has not finished by then, as it could be blocking the calling page from loading properly.

Over the last year we started to see an increasing number of broken connections, up to 20% of our total incoming connections. Our proxy was returning 400 and 408 errors to the clients. After a longer, in-depth investigation, we identified the problem as being related to preconnect:
* When a user downloads one file from us, the browser opens a second, preconnect connection at the same time. If that connection stays idle for more than 3 seconds, our proxy times it out and sends a 408 error to the browser.
* When a user downloads a file from us (after landing on a page serving an ad) and leaves that page within 3 seconds, the browser closes the preconnect connections and sends a FIN to the server; those are the 400 errors we were seeing.
* We are particularly affected by preconnect "errors" because, in most cases, the browser makes only a single connection to a given domain.

From this I concluded that these frontend errors were "normal", caused by the behaviour of most modern browsers, and that I couldn't do much about them. Sadly, they also prevent us from finding a decent timeout value, as we can no longer measure with any accuracy the ratio of "legitimate" requests that fail.

Being in the dark about what a sufficient timeout would be, we are pushed to set the timeout higher than we would be comfortable with if we could verify that a lower timeout doesn't cause connection failures. A direct consequence is that errors on our side have a larger impact on the user: because the timeout is higher, they could delay page loading by a couple of additional seconds.
Also, 400 or 408 errors can be triggered by improper TCP configurations or network problems, and it's now difficult to filter out which errors are relevant.

There is a similar issue on the Chromium bug tracker: http://code.google.com/p/chromium/issues/detail?id=85229 but there doesn't seem to be much interest in it.

Do you think there could be a way to allow system administrators to identify which connections are failing because they are preconnect TCP connections (and are not affecting the user), and which connection failures have resulted in elements not loading properly?

Thanks!
(Reporter) Updated 6 years ago
Depends on: 634278
Thanks, Yann. It's certainly a development we're all going to have to deal with, but I'm totally convinced this is good for the web.

We currently time out at 5 seconds, if that helps you, and that generally seems like a good value. (No constant is ever good for everyone :))

You'll see more speculative connections from everyone. Latency is not improving (and really can't) the way other resources (bandwidth, CPU, RAM, etc.) are, and if we don't build algorithms around evading it, latency will stagnate the web.

Speculative connections are a good value: they cost a tiny amount of bandwidth and RAM (costs that keep getting cheaper), and some fraction of them save a round trip's worth of latency (which is not getting any cheaper).

I'm going to close the bug, just because this isn't really a bug per se, but I appreciate the comment; it remains linked to 634278.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → INVALID
I think there's a legitimate question here, but it seems like there's plenty one could do on the side of the ad agency.

Each browser opens a deterministic number of preconnections, right? So you can use your HTTP logs to calculate the expected number of preconnections that accompany each real connection, based on how popular each browser is. Suppose Firefox opens 2 preconnections and is 40% of your traffic, and Chrome opens 3 preconnections and is 60% of your traffic; then your expected number of preconnections per real connection is 0.4 * 2 + 0.6 * 3 = 2.6.

So now look at your TCP logs. Suppose we have 30 successfully completed connections and 200 failures. Divide the 230 total by (2.6 + 1) to get the number of real (non-preconnect) connections, about 64 in this case. All of the successful connections are real connections (because you only serve one file per domain), so the success rate of real connections is 30/64 ≈ 47%.
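Justin's estimate can be written out as a short Python sketch. The browser mix (40%/60%, 2 and 3 preconnects) is the hypothetical example from the comment, not measured data, and the function names are illustrative.

```python
def expected_preconnects(share_and_count):
    """Expected preconnections per real connection.

    share_and_count: iterable of (traffic_share, preconnects_opened) pairs,
    one per browser; the shares are assumed to sum to 1.
    """
    return sum(share * count for share, count in share_and_count)


def real_success_rate(successes, failures, preconnects_per_real):
    """Estimate how many connections were real, and their success rate.

    Assumes every successful connection was a real one (one file per domain)
    and every logged connection is either real or a preconnect.
    """
    total = successes + failures
    real = total / (preconnects_per_real + 1)
    return real, successes / real


# Hypothetical mix from the comment: Firefox 40% (2 preconnects), Chrome 60% (3).
per_real = expected_preconnects([(0.4, 2), (0.6, 3)])
real, rate = real_success_rate(30, 200, per_real)
print(f"{per_real:.1f} preconnects per real connection")     # 2.6 preconnects per real connection
print(f"~{real:.0f} real connections, success rate ~{rate:.0%}")  # ~64 real connections, success rate ~47%
```

As Yann points out next, the weak point is the input: the per-browser preconnect counts have to be known and stable for the division to mean anything.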

Now you can experiment with changing your timeout and have a (rough) idea of how it affects the real connections to your server.  Does that solve your problem?
(Reporter) Comment 3, 6 years ago
Hello Justin, I see what you mean; that would indeed give a very rough idea, if we had perfect numbers. But I fear it might be too approximate to be useful, and figuring out the number of expected preconnects from the user agent might be quite a challenge...

Plus, Chromium uses a speculative number of preconnects: http://code.google.com/p/chromium/issues/detail?id=64246 so, from what I understand, sometimes it might open one preconnect (or several) and the next time none.

A way for the server to hint to the browser how many preconnect sessions to open would help: the server could hint the browser to open 10 TCP preconnect sessions, or... none. But I'm not sure how to get this done, considering we're at the TCP level.
> sometimes it might open one preconnect (or several)- and the next time not any, from what I 
> understand. 

I see, that's tricky.

Can you do something with the ordering of the connections?  Like, is the first SYN received usually the real connection, while the other ones are usually the preconnections?
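A minimal sketch of the ordering heuristic asked about here, assuming a hypothetical proxy log that records the client IP, the SYN arrival time and whether a response was served. It treats the first SYN in each per-client burst as the real connection and the rest as likely preconnects. The 0.5 s burst window is an arbitrary assumption, and the heuristic will mislabel traffic when many users share one IP (NAT) or when the browser sends the real request on a socket other than the first.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Conn:
    """One TCP connection from a hypothetical proxy log."""
    client_ip: str
    syn_time: float   # seconds; when the SYN arrived
    succeeded: bool   # True if a response was actually served


def classify_by_order(conns, window=0.5):
    """Label the first SYN of each per-client burst 'real', the rest
    'likely_preconnect'. A burst is every SYN from the same client IP that
    arrives within `window` seconds of the burst's first SYN."""
    by_client = defaultdict(list)
    for c in conns:
        by_client[c.client_ip].append(c)

    labels = []
    for client_conns in by_client.values():
        client_conns.sort(key=lambda c: c.syn_time)
        burst_start = None
        for c in client_conns:
            if burst_start is None or c.syn_time - burst_start > window:
                burst_start = c.syn_time          # a new burst begins here
                labels.append((c, "real"))
            else:
                labels.append((c, "likely_preconnect"))
    return labels


def real_failure_rate(labels):
    """Failure rate among connections labelled 'real' only."""
    real = [c for c, label in labels if label == "real"]
    if not real:
        return 0.0
    return sum(not c.succeeded for c in real) / len(real)


conns = [
    Conn("10.0.0.1", 0.00, True),
    Conn("10.0.0.1", 0.01, False),  # sibling opened at the same time, later a 408
    Conn("10.0.0.2", 5.00, False),
    Conn("10.0.0.2", 5.02, False),
]
print(real_failure_rate(classify_by_order(conns)))  # 0.5
```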
One thing which crosses my mind -- this wouldn't necessarily help Yann, but it could help us -- is if Firefox used historical data to tune how many preconnections to create.

Are preconnections at all a scarce resource on the client side?  If so, we could probably improve our allocation of them by remembering which domains benefit from preconnection and which ones don't.
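One way the historical tuning idea could look, as a hypothetical sketch rather than Firefox code: remember, per host, how many speculative connections were actually claimed by real requests, and open about that many on the next visit. The class name, the cap of 6 and the default of 1 are all assumptions.

```python
from collections import defaultdict


class PreconnectHistory:
    """Hypothetical per-host memory of how useful preconnections were."""

    def __init__(self, max_preconnects=6):
        self.max_preconnects = max_preconnects
        self.page_loads = defaultdict(int)  # host -> number of recorded visits
        self.used = defaultdict(int)        # host -> total preconnections claimed

    def record(self, host, opened, used):
        """Record one visit: `opened` speculative sockets, `used` of them claimed."""
        if used > opened:
            raise ValueError("cannot use more preconnections than were opened")
        self.page_loads[host] += 1
        self.used[host] += used

    def how_many(self, host, default=1):
        """Number of preconnections to open for `host` next time."""
        loads = self.page_loads[host]
        if loads == 0:
            return default  # no history yet: fall back to a default guess
        avg_used = self.used[host] / loads
        return min(self.max_preconnects, round(avg_used))


h = PreconnectHistory()
h.record("ads.example", opened=2, used=0)    # preconnects never helped here
h.record("ads.example", opened=2, used=0)
h.record("tiles.example", opened=3, used=2)  # preconnects were mostly claimed
h.record("tiles.example", opened=3, used=2)
print(h.how_many("ads.example"))    # 0
print(h.how_many("tiles.example"))  # 2
print(h.how_many("new.example"))    # 1
```

A host like Yann's single-file ad domain would quickly drop to zero preconnections under a rule like this, which is the allocation improvement the comment is after.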
(In reply to Justin Lebar [:jlebar] from comment #5)
> One thing which crosses my mind -- this wouldn't necessarily help Yann, but
> it could help us -- is if Firefox used historical data to tune how many
> preconnections to create.
> 
> Are preconnections at all a scarce resource on the client side?  If so, we
> could probably improve our allocation of them by remembering which domains
> benefit from preconnection and which ones don't.

We'll probably do something like that when we really implement this. (There is a reason the linked bug is still open :))

We don't actually implement extra preconnections explicitly yet; we've just got a couple of algorithms that effectively create some of the same thing as an artifact.

Syn-retry is one of them. If we don't get a handshake quickly, we start another connection in parallel. If both eventually complete, we keep the "extra" one around for a few seconds to see if it is useful, given that everybody has already done all the work for it :)
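The syn-retry behaviour can be approximated with asyncio. This is an illustrative sketch, not Firefox's implementation: `open_conn` is a hypothetical connect function (a fake with fixed delays is used so the example runs without a network), `retry_after` stands in for whatever "quickly" means in practice, and error handling is minimal.

```python
import asyncio


def _park_if_connected(task, pool):
    """Done-callback: keep a late-finishing attempt as a spare connection."""
    if not task.cancelled() and task.exception() is None:
        pool.append(task.result())


async def connect_with_backup(open_conn, retry_after, spare_pool):
    """Open a connection, racing a second attempt if the first is slow.

    open_conn:   zero-argument coroutine function returning a connection
                 (hypothetical; e.g. a wrapper around asyncio.open_connection).
    retry_after: seconds to wait before starting the backup attempt.
    spare_pool:  list that receives the losing attempt if it also completes.
    If the winning attempt failed, its exception propagates.
    """
    first = asyncio.ensure_future(open_conn())
    done, _ = await asyncio.wait({first}, timeout=retry_after)
    if done:
        return first.result()

    # Handshake is slow: start a second attempt in parallel.
    second = asyncio.ensure_future(open_conn())
    done, _ = await asyncio.wait({first, second},
                                 return_when=asyncio.FIRST_COMPLETED)
    winner = first if first in done else second
    loser = second if winner is first else first
    # Don't throw the other attempt away; if it completes, keep it as a spare.
    loser.add_done_callback(lambda t: _park_if_connected(t, spare_pool))
    return winner.result()


def fake_opener(delays):
    """Hypothetical stand-in for a real connect: attempt n takes delays[n] s."""
    attempt = 0

    async def open_conn():
        nonlocal attempt
        n = attempt
        attempt += 1
        await asyncio.sleep(delays[n])
        return f"conn-{n}"
    return open_conn


async def demo():
    spares = []
    # First handshake is slow (0.5 s); the backup started at 0.1 s wins.
    conn = await connect_with_backup(fake_opener([0.5, 0.05]), 0.1, spares)
    await asyncio.sleep(0.6)  # let the slow attempt finish too
    return conn, spares


print(asyncio.run(demo()))  # ('conn-1', ['conn-0'])
```

The spare ends up in a plain list here; how long it is kept is a separate policy, which is what the next paragraph's pool is about.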

Transaction cancellations can have the same effect: if we open a connection for a pending transaction and that transaction gets cancelled before the connection is complete, we don't abandon the connection; we just put it in the pool for a few seconds to see if it gets matched up with another soon-to-arrive transaction.
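"Put it in the pool for a few seconds" is essentially an idle pool with a time-to-live. A hypothetical sketch follows; the class name, the injectable clock and close hook, and the 5-second TTL used in the example are illustrative choices, not Firefox's values.

```python
import time
from collections import defaultdict, deque


class SpareConnectionPool:
    """Hypothetical idle pool: connections whose transaction went away are
    parked for `ttl` seconds and handed to the next transaction for the same
    host; anything older is closed instead of reused."""

    def __init__(self, ttl, clock=time.monotonic, close=lambda conn: None):
        self.ttl = ttl
        self.clock = clock
        self.close = close
        self._idle = defaultdict(deque)  # host -> deque of (parked_at, conn)

    def park(self, host, conn):
        self._idle[host].append((self.clock(), conn))

    def claim(self, host):
        """Return a still-fresh spare connection for `host`, or None."""
        self._expire(host)
        queue = self._idle[host]
        # Prefer the most recently parked connection: it is the least likely
        # to have been cut by a server-side idle timeout.
        return queue.pop()[1] if queue else None

    def _expire(self, host):
        now = self.clock()
        queue = self._idle[host]
        while queue and now - queue[0][0] > self.ttl:  # oldest entries first
            _, conn = queue.popleft()
            self.close(conn)


now = [0.0]
closed = []
pool = SpareConnectionPool(ttl=5.0, clock=lambda: now[0], close=closed.append)
pool.park("ads.example", "conn-A")  # its transaction was cancelled at t=0
now[0] = 2.0
print(pool.claim("ads.example"))    # conn-A
pool.park("ads.example", "conn-B")
now[0] = 8.0
print(pool.claim("ads.example"))    # None
print(closed)                       # ['conn-B']
```

Note the interaction with Yann's setup: a server that cuts idle connections after 3 seconds will close some of these spares before a client-side TTL like this one expires.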

We're going to start racing connection establishment against some cache lookups too, because the IO and locking involved in the cache can be significant depending on the host. It's possible we'll make more connections than we really need in the end; those go into the pool too.
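Racing the handshake against a cache lookup might be sketched like this, with hypothetical `cache_lookup` and `open_conn` coroutines standing in for the disk cache and the network; the delays are made up. On a cache hit, the connection that was started anyway is parked instead of closed, matching the remark that such connections go into the pool.

```python
import asyncio


def _park_if_connected(task, pool):
    """Done-callback: keep a finished, successful connection as a spare."""
    if not task.cancelled() and task.exception() is None:
        pool.append(task.result())


async def fetch(url, cache_lookup, open_conn, spare_pool):
    """Start the handshake and the cache lookup at the same time.

    cache_lookup(url) -> cached body or None (can be slow: disk IO, locking)
    open_conn()       -> connection (slow: network round trip)
    """
    conn_task = asyncio.ensure_future(open_conn())  # handshake starts now
    cached = await cache_lookup(url)
    if cached is not None:
        # The connection turned out to be unnecessary; keep it for later.
        conn_task.add_done_callback(lambda t: _park_if_connected(t, spare_pool))
        return ("cache", cached)
    # Cache miss: the handshake has been progressing in the meantime.
    conn = await conn_task
    return ("network", conn)  # a real client would now send the request on conn


async def demo(cache):
    spares = []

    async def cache_lookup(url):
        await asyncio.sleep(0.02)
        return cache.get(url)

    async def open_conn():
        await asyncio.sleep(0.05)
        return "conn"

    result = await fetch("/ad.js", cache_lookup, open_conn, spares)
    await asyncio.sleep(0.1)  # let a speculative handshake finish
    return result, spares


print(asyncio.run(demo({"/ad.js": "cached body"})))  # (('cache', 'cached body'), ['conn'])
print(asyncio.run(demo({})))                         # (('network', 'conn'), [])
```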

The theme here is that the handshake is a real bottleneck, and the penalty of wasting a few connections is something we (in the internet sense) should be willing to architect around. nginx claims the server-side state per connection is 250 bytes in userspace plus 250 bytes in kernel space: not a big deal. (Yann has a different issue, I understand. It's just something that should be considered.)
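To put the quoted nginx figure in perspective, here is the arithmetic for idle spare connections. The 250 + 250 bytes are taken from the comment as stated, not independently measured, and the calculation ignores any other per-connection state (proxy buffers, TLS sessions); the 2.6 preconnects per real connection reuses the hypothetical browser mix from the earlier comment.

```python
USERSPACE_BYTES = 250  # per-connection userspace state, as quoted for nginx
KERNEL_BYTES = 250     # per-connection kernel state, as quoted for nginx


def spare_connection_memory_mb(real_connections, preconnects_per_real):
    """Memory (MB) held by idle preconnections alongside the real ones."""
    spares = real_connections * preconnects_per_real
    return spares * (USERSPACE_BYTES + KERNEL_BYTES) / 1_000_000


print(spare_connection_memory_mb(10_000, 2.6))    # 13.0
print(spare_connection_memory_mb(1_000_000, 1))   # 500.0
```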
(Reporter) Comment 7, 6 years ago
Well, this is getting away from my initial issue, but let me throw out one idea too :)

Cross-domain preconnect (intelligent preconnect IP pooling?): I'm not familiar with Firefox's preconnect implementation (maybe this is already implemented?), and it wouldn't help in my specific case, as my different domains listen on different IP addresses. However, imagine a website like OpenStreetMap or Google Maps that uses several different virtual hosts (a.tiles.*, b.tiles.*, ...), all pointing to the same proxies. I believe this used to be done to get around the network.http.pipelining.maxrequests limit, which used to be fairly low. It could be useful to use the same preconnect connections for all the domains, if Firefox realises that the different domains point to the same proxy.

It's quite tricky as a webmaster: you're trying to get around the maxrequest limit by putting your tiles on several domains, but by doing this you might now be killing TCP optimisations!

This could get funny (continuing with the OpenStreetMap example) if they had 20 proxies in DNS round robin, like this:
a.tile.openstreetmap.org resolves to 11.11.11.11, 33.33.33.33, 77.77.77.77 and 99.99.99.99, and b.tile.openstreetmap.org resolves to 22.22.22.22, 33.33.33.33, 77.77.77.77 and 99.99.99.99. When loading a file from a.tile.openstreetmap.org, Firefox opens two TCP connections to 33.33.33.33 (one regular, one preconnect). A few milliseconds later it needs to load a file from b.tile.openstreetmap.org; it could notice that it already has a preconnect connection open to one of the IPs b.tile.openstreetmap.org resolves to, and reuse that connection.
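The reuse check described here (what the next comment calls IP coalescing) amounts to looking for an idle connection whose peer address appears in the new hostname's DNS answer. A minimal sketch, with a hypothetical `idle_conns` map from IP to idle sockets; it only shows the lookup. Actually sending a request for b.tile.openstreetmap.org over a connection opened for a.tile.openstreetmap.org is only safe if the server really serves both names on that address, and over TLS (as with SPDY or HTTP/2) the certificate must also be valid for the new name.

```python
def find_coalescable(idle_conns, resolved_ips):
    """Return an idle connection to any of `resolved_ips`, or None.

    idle_conns:   dict mapping peer IP -> list of idle (pre)connections
    resolved_ips: addresses the new hostname resolves to, in DNS answer order
    The returned connection is removed from `idle_conns`.
    """
    for ip in resolved_ips:
        if idle_conns.get(ip):
            return idle_conns[ip].pop()
    return None


# After loading from a.tile.openstreetmap.org: the regular connection to
# 33.33.33.33 is busy, the preconnect to the same address is idle.
idle = {"33.33.33.33": ["preconnect-to-33"]}
b_tile_ips = ["22.22.22.22", "33.33.33.33", "77.77.77.77", "99.99.99.99"]
print(find_coalescable(idle, b_tile_ips))  # preconnect-to-33
print(find_coalescable(idle, b_tile_ips))  # None
```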

Not sure if it would be worth the CPU and RAM spent, though, but with this and a speculative number of preconnects as mentioned above, things like map tiles could load significantly faster.
What you are referring to is IP coalescing. This is not what HTTP is currently designed for, but SPDY does that.