Closed Bug 853423 Opened 11 years ago Closed 11 years ago

speculative connections and single thread blocking management servers (modems/routers)

Categories

(Core :: Networking: HTTP, defect)

x86_64
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: mcmanus, Assigned: sworkman)

References

Details

Attachments

(2 files, 7 obsolete files)

15.76 KB, patch
sworkman
: review+
Details | Diff | Splinter Review
9.42 KB, patch
mcmanus
: review+
Details | Diff | Splinter Review
https://groups.google.com/forum/?fromgroups=#!topic/httpfiddler/qnWYKz39a4c[1-25-false]


 	
Mar 19
Let me provide some background for this question.

Using Firefox, I consistently receive a '400 Bad Request' error from my DSL modem (192.168.0.1) after a few clicks on any of the links of its website. This modem uses micro_httpd as its web server. Reviewing the source code for micro_httpd suggests that this error only occurs if the server receives a null request:

    if ( fgets( line, sizeof(line), stdin ) == (char*) 0 )
send_error( 400, "Bad Request", (char*) 0, "No request found." );


When I run Firefox standalone, I get a 400 error every few clicks. However, when I have Fiddler running, I have NEVER gotten a 400 error (even after over 100 clicks). I can only speculate that Fiddler is somehow modifying the browser requests.

Any ideas or insights as to why this is happening?
	EricLaw 	
Mar 19
Generally speaking, Fiddler doesn't modify browser requests unless you tell it to.
 
Interestingly, I probably know why this is happening, since we debugged a similar problem with IE9 beta. The most likely explanation is that your router is single-threaded and can only handle a single connection at once. Most modern browsers will establish multiple connections in parallel so that there's a "free" connection available to service later requests. The problem is that this "free" connection gums up a single-threaded webserver since there's no request on the socket immediately and hence it waits a few seconds, then throws the 400.
 
Putting Fiddler in the middle makes such a problem go away because the browser's speculative/extra connection goes to Fiddler, not to the router. Fiddler only connects to the router when it's actually sending a request, which means that you never have an "empty" connection that causes the router to throw the 400.
I think Eric Lawrence (of fiddler and formerly of MS) has this right.

The embedded webservers are hanging and timing out connections in serial instead of processing the ones that have data.. probably taking so long that they effectively all time out.

We saw this with a dedicated websockets server or two too. (we fixed it by disabling speculative connections around websockets iirc).

It was suggested to me that there isn't a good general answer to this, but we can disable speculative connections to rfc 1918 addresses pretty painlessly and work around most of the consumer embedded class of devices.
Assignee: mcmanus → sworkman
In the case of FF15, I saw this on Windows XP as well, so I changed the platform from Linux to ALL.  Probably should change this from X86_64 to all as well.

The description in comment 1 is exactly correct.  micro_httpd will listen for one connection at a time and time-out connections after a timeout (usually one second) and sends the 400.  If one of the speculative connection gets used just as it times out, the browser suddenly has committed to the connection just as the 400 is on its way to the browser.

The suggested approach, disabling speculative connections for rfc1918 addresses is a good idea as there are hundreds of millions of these webservers out there.

The other approach worth considering is to have the browser detect either

a) the first few terminated speculative connections to an IP address 
b) the single-threaded nature of the webserver as the connections to retrieve graphics on the initial page are not accepted more than one at a time

and then disable speculative connections to that address for the remainder of the session.
OS: Linux → All
Pat, is this the sort of thing you're looking for? I'm assuming these routers are old and don't do IPv6 - is that assumption false and should I include some IPv6 local address filtering?
Attachment #781392 - Flags: feedback?(mcmanus)
Comment on attachment 781392 [details] [diff] [review]
v1.0 Block speculative connections for RFC1918 addresses

Review of attachment 781392 [details] [diff] [review]:
-----------------------------------------------------------------

sure, I think that's a reasonable step.

::: netwerk/protocol/http/nsHttpConnectionInfo.cpp
@@ +116,5 @@
>  
> +bool
> +nsHttpConnectionInfo::HostIsRFC1918Literal()
> +{
> +    PRNetAddr tempAddr;    

whitespace
Attachment #781392 - Flags: feedback?(mcmanus) → feedback+
Hmm, which is a reasonable step: assuming IPv6 is not needed, or including some IPv6 local address filtering? :)
1] your patch only works on the hostname.. unfortunately it needs to work on the resolved address that is being connected to. i.e. you're going to have to let sockettransport know with a flag that you don't want to do the connect if it is a local address and then do the local checking inside sockettransport.

2] I think it would be good to include ipv6 unique local addresses along with rfc 1918 addresses here because of #1.. we're not just talking literals. My thinking has changed on it - sorry for the confusion.
-- Added blocking for IPv6 local addresses.
-- Added blocking for resolved addresses.

I need to do some testing on this, hence feedback only. Also, I want to make sure that the blocking is communication properly back up the chain. Nonetheless, let me know if this is the right direction or not.
Attachment #781392 - Attachment is obsolete: true
Attachment #803390 - Flags: feedback?(mcmanus)
Comment on attachment 803390 [details] [diff] [review]
v2.0 Stop speculative connections for local IP addresses

Review of attachment 803390 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/src/nsSocketTransport2.cpp
@@ +1163,5 @@
> +    // Hosts that are Local IP Literals should not be speculatively
> +    // connected - Bug 853423.
> +    if (mConnectionFlags & nsISocketTransport::SPECULATIVE_CONNECT &&
> +        IsIPAddrLocal(&mNetAddr)) {
> +#ifdef PR_LOGGING

I think what you really want (maybe in addition to this) is LOGMUMBLE_ENABLED (look for log3_enabled in nsHttpTransaction.cpp) - because PR_LOGGING is pretty much always defined

::: netwerk/dns/DNS.h
@@ +159,5 @@
>  bool IsIPAddrAny(const NetAddr *addr);
>  
>  bool IsIPAddrV4Mapped(const NetAddr *addr);
>  
> +bool IsIPAddrLocal(const NetAddr *addr);

you can mark the method const

::: netwerk/protocol/http/nsHttpConnectionInfo.cpp
@@ +116,5 @@
>          return false;
>      return !mProxyInfo->IsDirect();
>  }
>  
> +bool

we also need to consider the proxy case.. if a proxy is being used then that is the host that should be checked

::: netwerk/protocol/http/nsHttpConnectionInfo.h
@@ +96,5 @@
>      // Returns true for any kind of proxy (http, socks, etc..)
>      bool UsingProxy();
>  
> +    // Returns true when mHost is an RFC1918 literal.
> +    bool HostIsLocalIPLiteral();

const method

::: xpcom/base/ErrorList.h
@@ +204,5 @@
>    /* The connection attempt to a proxy failed. */
>    ERROR(NS_ERROR_PROXY_CONNECTION_REFUSED,  FAILURE(72)),
> +  /* The speculative connection attempt was forbidden, e.g. for Local IP
> +   * addreses */
> +  ERROR(NS_ERROR_SPECULATIVE_CONNECTION_FORBIDDEN,  FAILURE(73)),

unfortunately we need to do a lot more to add a new error code.. see

http://mxr.mozilla.org/mozilla-central/search?string=corruptedcontenterror

as the most simple example.

you can do that or just reuse something that's already there.
Attachment #803390 - Flags: feedback?(mcmanus) → feedback+
Thanks for the comments on the other patch, Pat. I'll get to them next.

In the meantime, I've been working on adding 2 groups of sub-tests to test_speculative_connections.js:

1 - Test for IP literals used in the URI: test uses IOService to get the HTTP Handler, in order to get a sync result. IOService.SpeculativeConnect does not return success or failure for the speculative connect. HTTP Handler will return NS_ERROR_ABORT if a local address is detected (Yes, that means I took the easy way and avoided creating a new error type; ABORT, or something similar, will do just fine).

2 - Test for hostnames that resolve to local IPs: I've used some inner knowledge of the DNS resolver here. Rather than trying to get a real list of hostnames that map to local addresses, I'm passing a string version of the address when creating the nsISocketTransport object. It's the same codepath; the resolver will respond async'ly with the IP literal converted to a NetAddr, and then the code will check to see if it's local or not. I don't see anything being done to the hostname in nsSocketTransport from the time of object creation to ResolveHost. So, this should be fine. Let me know if you see an issue here.

I also specified a target in asyncWait, because it gets around the problem of the js callback object being AddRefed/Released off main thread. If a thread is specified in asyncWait, it is added to an nsOutputStreamReadyEvent, on the main thread, and then this object can be (and id) Addrefed/Released on the socket thread.

For the moment, there are two arrays set up: they have identical addresses, but one has braces to be used in URIs, while the other does not, making it suitable for use in hostnames passed to CreateTransport.

TODO:
-- Consider the proxy case (as with the other patch).
-- Consider cases that should succeed for IP literals.

Let me know if I going in the right direction here.
Attachment #806827 - Flags: feedback?(mcmanus)
Made some changes to block speculative connections for local proxies.

I'll set this for review once the tests are complete and pass; they're almost complete. Current test patch will be uploaded next.
Attachment #803390 - Attachment is obsolete: true
Added test cases:

-- remote addresses that should succeed (or fail differently)
-- local and remote proxies (should fail and succeed respectively, as with direct connections).

Proxy tests are a work in progress, so this is just an update to the patch. Review will be requested when it's done.
Attachment #806827 - Attachment is obsolete: true
Attachment #806827 - Flags: feedback?(mcmanus)
Attachment #809996 - Flags: review?(mcmanus)
Updated for proxies (works on my mac, pushed to try to verify on other platforms).

-- Failure cases: Expect an NS_ERROR_ABORT.
-- Success cases: Expect either NS_ERROR_NET_TIMEOUT or NS_ERROR_PROXY_CONNECTION_REFUSED, so that they can be run when addresses are reachable (e.g. running from desktop) or not (try environment ... I think). I've set the connection timeout to 1s so the timeout is pretty quick. And it's only a few addresses it checks, so the combined timeout should be small.

Let me know what you think.

try: https://tbpl.mozilla.org/?tree=Try&rev=d53631838080
Attachment #810003 - Attachment is obsolete: true
Attachment #810898 - Flags: review?(mcmanus)
Comment on attachment 809996 [details] [diff] [review]
v2.1 Stop speculative connections for local IP addresses

Review of attachment 809996 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/public/nsISocketTransport.idl
@@ +179,5 @@
>      /**
> +     * If set, indicates that the socket was initiated for a speculative
> +     * connection.
> +     */
> +    const unsigned long SPECULATIVE_CONNECT = (1 << 5);

lets call this DISABLE_RFC1918

::: netwerk/base/src/nsSocketTransport2.cpp
@@ +1164,5 @@
> +    // Hosts/Proxy Hosts that are Local IP Literals should not be speculatively
> +    // connected - Bug 853423.
> +    if (mConnectionFlags & nsISocketTransport::SPECULATIVE_CONNECT &&
> +        IsIPAddrLocal(&mNetAddr)) {
> +#ifdef PR_LOGGING

put this whole logging block in if SOCKET_LOG_ENABLED(){}

@@ +1177,5 @@
> +                    "address [%s]",
> +                    mHost.get(), mPort, mProxyHost.get(), mProxyPort,
> +                    netAddrCString.get()));
> +#endif
> +        return NS_ERROR_ABORT;

make this NS_ERROR_CONNECTION_REFUSED - that way the failover logic will kick in for the bizare circumstance where the dns record contains both a 1918 and non 1918 address.

::: netwerk/dns/DNS.cpp
@@ +181,5 @@
> +{
> +  MOZ_ASSERT(addr);
> +
> +  // IPv4 RFC1918 and Link Local Addresses.
> +  if (addr->raw.family == AF_INET &&

let's do ntohl() once for the address rather than 5 times?

@@ +183,5 @@
> +
> +  // IPv4 RFC1918 and Link Local Addresses.
> +  if (addr->raw.family == AF_INET &&
> +      // 10/8 prefix (RFC 1918).
> +      (ntohl(addr->inet.ip) >> 24 == 10 ||

== 0x0A (just to keep it in hex like the other addresses)

@@ +185,5 @@
> +  if (addr->raw.family == AF_INET &&
> +      // 10/8 prefix (RFC 1918).
> +      (ntohl(addr->inet.ip) >> 24 == 10 ||
> +       // 172.16/20 prefix (RFC 1918).
> +       (ntohl(addr->inet.ip) >> 16 >= 0xAC10 &&

shouldn't 176.16/20 be
 ((ntohl(addr->inet.ip) >> 12 == 0xAC100) ?

::: netwerk/dns/DNS.h
@@ +159,5 @@
>  bool IsIPAddrAny(const NetAddr *addr);
>  
>  bool IsIPAddrV4Mapped(const NetAddr *addr);
>  
> +bool IsIPAddrLocal(const NetAddr *addr);

you can make this
bool IsIPAddrLocal(const NetAddr *addr) const;

::: netwerk/protocol/http/nsHttpConnectionInfo.cpp
@@ +121,5 @@
> +nsHttpConnectionInfo::HostIsLocalIPLiteral()
> +{
> +    PRNetAddr prAddr;
> +    // If the host/proxy host is not an IP address literal, return false.
> +    if (ProxyHost() &&

if(proxyhost) {
 if (strToAddr(proxyhost) != SUCCESS)
  return false;
} else if (strToAddr(host) != SUCCESS) {
 return false;
}

right? The way it is written it will test strToAddr(host) even when proxyhost() is defined if proxyhost fails the strtoaddr part of the check.

::: netwerk/protocol/http/nsHttpConnectionInfo.h
@@ +92,5 @@
>      // Returns true for any kind of proxy (http, socks, etc..)
>      bool UsingProxy();
>  
> +    // Returns true when mHost is an RFC1918 literal.
> +    bool HostIsLocalIPLiteral();

bool HostIsLocalIPLiteral() const;

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +330,5 @@
> +    // connected - Bug 853423.
> +    if (ci && ci->HostIsLocalIPLiteral()) {
> +        LOG(("nsHttpConnectionMgr::SpeculativeConnect skipping RFC1918 "
> +             "address [%s]", ci->Host()));
> +        return NS_ERROR_ABORT;

return NS_OK here. Its all speculative and nondeterministic afterall.

::: xpcom/base/ErrorList.h
@@ +203,5 @@
>    ERROR(NS_ERROR_NET_INTERRUPT,             FAILURE(71)),
>    /* The connection attempt to a proxy failed. */
>    ERROR(NS_ERROR_PROXY_CONNECTION_REFUSED,  FAILURE(72)),
> +  /* The speculative connection attempt was forbidden, e.g. for Local IP
> +   * addreses */

I think you forgot to remove this
Attachment #809996 - Flags: review?(mcmanus)
Attachment #810898 - Flags: review?(mcmanus) → review+
(In reply to Patrick McManus [:mcmanus] from comment #13)
> Comment on attachment 809996 [details] [diff] [review]
> v2.1 Stop speculative connections for local IP addresses
> 
> Review of attachment 809996 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/base/public/nsISocketTransport.idl
> @@ +179,5 @@
> >      /**
> > +     * If set, indicates that the socket was initiated for a speculative
> > +     * connection.
> > +     */
> > +    const unsigned long SPECULATIVE_CONNECT = (1 << 5);
> 
> lets call this DISABLE_RFC1918

OK. I'll add a comment to say it disables IPv6 equivalents as well.

> ::: netwerk/base/src/nsSocketTransport2.cpp
> @@ +1164,5 @@
> > +    // Hosts/Proxy Hosts that are Local IP Literals should not be speculatively
> > +    // connected - Bug 853423.
> > +    if (mConnectionFlags & nsISocketTransport::SPECULATIVE_CONNECT &&
> > +        IsIPAddrLocal(&mNetAddr)) {
> > +#ifdef PR_LOGGING
> 
> put this whole logging block in if SOCKET_LOG_ENABLED(){}

Done.

> @@ +1177,5 @@
> > +                    "address [%s]",
> > +                    mHost.get(), mPort, mProxyHost.get(), mProxyPort,
> > +                    netAddrCString.get()));
> > +#endif
> > +        return NS_ERROR_ABORT;
> 
> make this NS_ERROR_CONNECTION_REFUSED - that way the failover logic will
> kick in for the bizare circumstance where the dns record contains both a
> 1918 and non 1918 address.

Ah, I see. Makes sense.

> ::: netwerk/dns/DNS.cpp
> @@ +181,5 @@
> > +{
> > +  MOZ_ASSERT(addr);
> > +
> > +  // IPv4 RFC1918 and Link Local Addresses.
> > +  if (addr->raw.family == AF_INET &&
> 
> let's do ntohl() once for the address rather than 5 times?

Ah yes. Stupid.

> @@ +183,5 @@
> > +
> > +  // IPv4 RFC1918 and Link Local Addresses.
> > +  if (addr->raw.family == AF_INET &&
> > +      // 10/8 prefix (RFC 1918).
> > +      (ntohl(addr->inet.ip) >> 24 == 10 ||
> 
> == 0x0A (just to keep it in hex like the other addresses)

Indeed.

> @@ +185,5 @@
> > +  if (addr->raw.family == AF_INET &&
> > +      // 10/8 prefix (RFC 1918).
> > +      (ntohl(addr->inet.ip) >> 24 == 10 ||
> > +       // 172.16/20 prefix (RFC 1918).
> > +       (ntohl(addr->inet.ip) >> 16 >= 0xAC10 &&
> 
> shouldn't 176.16/20 be
>  ((ntohl(addr->inet.ip) >> 12 == 0xAC100) ?

It's actually 172.16/12, so I'm glad you caught it.

Changed to (addr >> 20 == 0xAC1).

> ::: netwerk/dns/DNS.h
> @@ +159,5 @@
> >  bool IsIPAddrAny(const NetAddr *addr);
> >  
> >  bool IsIPAddrV4Mapped(const NetAddr *addr);
> >  
> > +bool IsIPAddrLocal(const NetAddr *addr);
> 
> you can make this
> bool IsIPAddrLocal(const NetAddr *addr) const;

Nope: not a member function, so can't be const'd like that.

> ::: netwerk/protocol/http/nsHttpConnectionInfo.cpp
> @@ +121,5 @@
> > +nsHttpConnectionInfo::HostIsLocalIPLiteral()
> > +{
> > +    PRNetAddr prAddr;
> > +    // If the host/proxy host is not an IP address literal, return false.
> > +    if (ProxyHost() &&
> 
> if(proxyhost) {
>  if (strToAddr(proxyhost) != SUCCESS)
>   return false;
> } else if (strToAddr(host) != SUCCESS) {
>  return false;
> }
> 
> right? The way it is written it will test strToAddr(host) even when
> proxyhost() is defined if proxyhost fails the strtoaddr part of the check.

Yup.

> ::: netwerk/protocol/http/nsHttpConnectionInfo.h
> @@ +92,5 @@
> >      // Returns true for any kind of proxy (http, socks, etc..)
> >      bool UsingProxy();
> >  
> > +    // Returns true when mHost is an RFC1918 literal.
> > +    bool HostIsLocalIPLiteral();
> 
> bool HostIsLocalIPLiteral() const;

Yup.

> ::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
> @@ +330,5 @@
> > +    // connected - Bug 853423.
> > +    if (ci && ci->HostIsLocalIPLiteral()) {
> > +        LOG(("nsHttpConnectionMgr::SpeculativeConnect skipping RFC1918 "
> > +             "address [%s]", ci->Host()));
> > +        return NS_ERROR_ABORT;
> 
> return NS_OK here. Its all speculative and nondeterministic afterall.

Done.

It means there is no way to test the nsHttpHandler api (which calls this directly). But that's ok. We talked on IRC about this: the decision was to remove the test which calls the nsHttpHandler api. Even if the check doesn't work as desired in this function, there is still the check in the socket layer. And that one is testable.

> ::: xpcom/base/ErrorList.h
> @@ +203,5 @@
> >    ERROR(NS_ERROR_NET_INTERRUPT,             FAILURE(71)),
> >    /* The connection attempt to a proxy failed. */
> >    ERROR(NS_ERROR_PROXY_CONNECTION_REFUSED,  FAILURE(72)),
> > +  /* The speculative connection attempt was forbidden, e.g. for Local IP
> > +   * addreses */
> 
> I think you forgot to remove this

Indeed I did.
Attachment #809996 - Attachment is obsolete: true
Attachment #811363 - Flags: review?(mcmanus)
Comment on attachment 811363 [details] [diff] [review]
v2.2 Stop speculative connections for local IP addresses

Review of attachment 811363 [details] [diff] [review]:
-----------------------------------------------------------------

this patch doesn't match the comments in comment 14.. I think its the wrong file
Attachment #811363 - Flags: review?(mcmanus)
Correct patch uploaded (I had made manual changes to the patch, but hadn't qref'd before uploading).
Attachment #811363 - Attachment is obsolete: true
Attachment #811375 - Flags: review?(mcmanus)
Attachment #811375 - Flags: review?(mcmanus) → review+
Trying again: increased connection timeout from 1s to 5s - wondering if the connection was timing out early, and that's why NS_ERROR_NET_TIMEOUT was thrown, and not NS_ERROR_CONNECTION_REFUSED (expected).

https://hg.mozilla.org/integration/mozilla-inbound/rev/6a84bba784c9
https://hg.mozilla.org/integration/mozilla-inbound/rev/df399d443e57
I was emailed regarding a possible clobber issue.
I don't see anything in the patches or in the build system in affected directories that indicates this could be a clobber issue. But to be sure, I recommend pushing this on top of a twig. Does it reproduce after a clobber build? If so, then definitely not a clobber issue.

Since it doesn't occur in Try, it's possible this due to different machine configurations in automation land. You'll need to collect a lot more data (likely via logs) to prove anything. You can also ask RelEng for a loaner machine so you can run tests there, attach debuggers, etc.
I wouldn't go so far as to say that you just can't win, but you certainly can't without a struggle.

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/6a3066431da8 for leaks, so far in OS X crashtests and reftests but the debug ones for other platforms haven't finished yet to know whether or not they also plan on leaking.
Sigh. Thanks for backing out anyhow.
Depends on: 936685
Landed 936685 yesterday to fix the memory leak - it hasn't been backed out, so let's try this one again :)

https://hg.mozilla.org/integration/mozilla-inbound/rev/cf1ca47f2830
https://hg.mozilla.org/integration/mozilla-inbound/rev/ccc9e2696fa5
https://hg.mozilla.org/mozilla-central/rev/cf1ca47f2830
https://hg.mozilla.org/mozilla-central/rev/ccc9e2696fa5
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Depends on: 1076129
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: