User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003
My /etc/resolv.conf contains "search searchdomain.tld". When I navigate to http://www/, the request gets sent to www.searchdomain.tld, but the client reports www:80 as the host. This behaviour is in violation of HTTP 1.1, rfc2616:14.23, because the URI is being resolved to another URI on the client machine. Before the request goes out, it is changed to www.searchdomain.tld, so that should be the header's host.
Reproducible: Always
Steps to Reproduce:
1. (client) $ echo "search myco.tld" > /etc/resolv.conf; mozilla www:9999
2. (server) $ nc -l -p 9999 | grep "Host"
Actual Results: Host: www
Expected Results: Host: www.myco.tld
This makes accessing local resources a hassle and greatly affects university students.
this is another one of those bugs that can only be solved by pushing name resolution much earlier into the load process.
Hmm, I thought I'd commented on this one already. Anyway, I had a large argument on why our behaviour was correct, but then I reread that part of the spec and decided that we were wrong :) For proxies, though, we don't resolve, and will have to rely on the proxy doing this for us. The other problem is that we never know that www maps to www.foo.com. We ask the DNS service to look up 'www', and it gives us back an IP. We obviously can't rely on a reverse lookup of that IP to give a useful result. Unix's gethostbyname return value has an entry for the 'official' name of the host - will that do what we want everywhere? What about other platforms?
bbaetz: good question... i was thinking of just using PRHostEnt::h_name, which will actually apply to all major platforms once we switch Windows over to PR_GetIPNodeByName (required for IPv6 support) and once Mac goes completely mach-o. but, you raise an interesting question... is h_name the right value? cc'ing wtc and jgmyers... do either of you know if h_name is guaranteed to be the FQDN equivalent of the hostname argument passed to PR_GetIPNodeByName? ...on all platforms? thx!
This bug is about the "Host:" header line, right? I'd like to clarify the summary some more. If we asked the resolver for "www", then that is what we think the "host" is. Does the HTTP spec look like it contemplated this problem (Host lines not being what they should be)? This is always a potential problem w/ name resolution; NIS and /etc/hosts files often conspire to send funny requests to a server.
Consider two things when deciding how to handle this: 1) Think about apache virtual host configuration. If we decide to send www as the Host:, then the server will need to be configured to accept www.myco.tld and www as host names. Now tell me how many times you've seen this done on a server you've worked on? 2) Consider the behavior of other competing browsers. MSIE will get to the "www" website, but Mozilla will not.
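To make point 1 concrete, here is a rough Python sketch of how a name-based virtual host lookup might treat "Host: www". The vhost table and the select_vhost helper are hypothetical illustrations, not Apache code; the fallback to the first listed vhost is an assumption modelled on how Apache picks a default for name-based vhosts.

```python
# Hypothetical vhost table in config order: (ServerName, [ServerAlias, ...]).
VHOSTS = [
    ("intranet.myco.tld", []),
    ("www.myco.tld", []),
]


def select_vhost(host_header, vhosts):
    """Return the ServerName of the vhost that would serve this Host value.

    Simplification: the port is stripped with a plain split, so IPv6
    literals such as "[::1]:80" are not handled.
    """
    name = host_header.split(":", 1)[0].lower().rstrip(".")
    for server_name, aliases in vhosts:
        if name == server_name or name in aliases:
            return server_name
    # Assumption: no match falls back to the first vhost listed.
    return vhosts[0][0]


if __name__ == "__main__":
    print(select_vhost("www.myco.tld", VHOSTS))  # matches its own vhost
    print(select_vhost("www", VHOSTS))           # falls back to the default
```

With only the full name configured, the short name lands on the default site; listing "www" as an alias for www.myco.tld is what makes "Host: www" reach the intended vhost.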
hmm... NN4 behaves like mozilla. anyone know how IE6 behaves? RFC 2616 has this to say: The Host field value MUST represent the naming authority of the origin server or gateway given by the original URL. if the original URL is "http://www/" then it seems like we should send "Host: www" what am i missing in this interpretation of the RFC?
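For reference, the "original URL" reading can be sketched in a few lines of Python. This only illustrates that interpretation, not Necko's actual code; host_header_for and DEFAULT_PORTS are made-up names.

```python
from urllib.parse import urlsplit

# Default ports that may be left out of the Host value.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header_for(url):
    """Build a Host value from the URL as typed, with no DNS involved."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # urlsplit drops the brackets around IPv6 literals; restore them.
        host = f"[{host}]"
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"


if __name__ == "__main__":
    print(host_header_for("http://www/"))
    print(host_header_for("http://www:9999/"))
```

Under this reading the resolver's search list never enters the picture: whatever authority the user typed is what goes on the wire.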
Darin: It depends on your definition of 'naming authority', I think. You can read the text both ways, but I think that the rationale makes it clear that the full name is what is wanted so that name based virtual hosts work easily.
bbaetz: right, i agree with you, but i disagree that "this behaviour is in violation of HTTP 1.1, rfc2616:14.23" ... i just don't see it spelled out so clearly. it says "original URL"... it doesn't say anything about the URL resulting from a DNS lookup or anything like that.
-> futuring (we're compatible w/ nav4x, so i don't see the urgency here)
IE6 also behaves like mozilla, so i'm tempted to say WONTFIX unless someone can give me a convincing argument that we are violating the RFC.
Darin: Re: your question in comment #4: I am not sure what you meant by "the FQDN equivalent of the hostname argument passed to PR_GetIPNodeByName". So I'll answer with an example. On Solaris, if you pass "www" to PR_GetIPNodeByName on the Netscape intranet, the h_name you get back is "mcom.com", and "www.netscape.com" is an alias. In this case, the h_name "mcom.com" is a FQDN, but I suspect it is "www.netscape.com" that is the FQDN that you want.
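A small Python sketch of the selection problem this example raises: given the official name and the aliases from a host entry, which one (if any) is the FQDN form of what the user typed? The pick_host_name heuristic below is purely hypothetical (it is not anything NSPR or Necko does); it picks the first name whose leading labels equal the requested name and otherwise keeps the name as typed. socket.gethostbyname_ex is the Python stdlib call that returns the (official name, aliases, addresses) triple.

```python
import socket


def pick_host_name(requested, official, aliases):
    """Hypothetical heuristic: prefer a name that extends `requested`."""
    wanted = requested.lower().rstrip(".")
    for candidate in [official, *aliases]:
        name = candidate.lower().rstrip(".")
        if name == wanted or name.startswith(wanted + "."):
            return name
    # Nothing in the host entry looks like the typed name; keep it as typed.
    return wanted


def lookup_and_pick(requested):
    """Resolve `requested` and apply the heuristic (needs a working resolver)."""
    official, aliases, _addresses = socket.gethostbyname_ex(requested)
    return pick_host_name(requested, official, aliases)


if __name__ == "__main__":
    # The Solaris case described above: h_name is "mcom.com".
    print(pick_host_name("www", "mcom.com", ["www.netscape.com"]))
```

Even this simple rule has to guess, and it can return a name the server was never configured for.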
wtc: yeah, that answers my question, and it also points out why this sort of fixup might be unreliable or at least difficult to get right.
*** Bug 204430 has been marked as a duplicate of this bug. ***
Just had 204430 marked as a duplicate of this, which I would agree with. In reference to http://bugzilla.mozilla.org/show_bug.cgi?id=177919#c11 - I would have said that the implied meaning of www on a host which has a resolv.conf search entry of 'foo.org' is that the full authority is www.foo.org (this is the way a call to nslookup would handle such an authority). If the authority is www.subnet1 then the first authority to be checked is www.subnet1 (which fails), followed by www.subnet1.foo.org, which resolves and is the authority. In the case of nslookup as the client, it doesn't then respond 'www.subnet1' resolved to xxx.xxx.xxx.xxx; it responds www.subnet1.foo.org resolved to xxx.xxx.xxx.xxx - I would say that a web browser should behave similarly if it is going to use gethostbyname and therefore risk having extra domain information appended to the requested URL. btw - on my system (with a resolv.conf search path including unimelb.edu.au) the structure returned by gethostbyname (called on "www") gives "www.unimelb.edu.au" as the name, so it should be recoverable. Note - when calling gethostbyname on an alias, I get the alias name (which is the one that would need to be sent as the Host: header) in the aliases field (as opposed to the official name) - so this would need to be taken into account. (See http://bugzilla.mozilla.org/show_bug.cgi?id=204430 for my comments on aliases.)
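The lookup order described in this comment can be sketched roughly as follows in Python. It assumes the usual resolver "ndots" rule with its default of 1 (a name containing at least that many dots is tried as-is first, then with each search domain appended); candidate_names is a hypothetical helper and real resolvers may differ in detail.

```python
def candidate_names(name, search, ndots=1):
    """List the names a stub resolver would try, in order (assumed ndots rule)."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search list applied.
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as typed first, then the search list.
        return [name] + expanded
    # Too few dots: search list first, the bare name last.
    return expanded + [name]


if __name__ == "__main__":
    print(candidate_names("www", ["foo.org"]))
    print(candidate_names("www.subnet1", ["foo.org"]))
```

Whichever candidate resolves first is the name nslookup reports back, which is the name this comment argues should go in the Host: header.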
-> default owner
I think this is wontfix. I'm going to cash some cred and mark it as such. It seems like something relatively simple has been made complicated. The basic idea is that virtual hosting meant that the web site needed to know what was in the hostname of the URL. I doubt that anyone really meant that the browser should figure out the "real" hostname of the IP address that is returned from a lookup. No other DNS-based application that I know of does this. And, the real problem in this case lies in the URL, or more specifically the author. A URL like that is inherently ambiguous: it could mean a top-level domain "www" or a host "www" in an implied domain. Whichever it is, there are only two possible results: 1- The author didn't really know what they were doing, and they should have used FQDNs. The URL spec basically says something like "all domain names in URLs should be FQDNs". 2- The author really did know what they were doing, which means they would have configured their (hopefully intranet-only) server to handle this correctly. Changing the HTTP<->DNS integration to return an FQDN is probably not the right behavior; it falls into the category of auto-magic network features that seem to inevitably create serious problems down the road. The better solution is to add support for an application-level (browser) user setting for a default domain name, for when people are stuck with content that is not using FQDNs.