Closed Bug 177919 Opened 22 years ago Closed 17 years ago

Host: line uses "SERV" when connecting to "SERV.SEARCHDOMAIN.TLD"

Categories

(Core :: Networking, defect, P5)

x86
Linux
defect

RESOLVED WONTFIX

People

(Reporter: bugzilla.mozilla.org, Unassigned)

References

Details

(Whiteboard: [http/1.1])

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003

My /etc/resolv.conf contains "search searchdomain.tld"

When I navigate to http://www/, the request gets sent to
www.searchdomain.tld, but the client reports www:80 as the
host.

This behaviour is in violation of HTTP 1.1 (RFC 2616, section 14.23), because the URI is
being resolved to another URI on the client machine. Before the request goes
out, it is changed to www.searchdomain.tld, so that should be the header's host.

Reproducible: Always

Steps to Reproduce:
1. (client) $ echo "search myco.tld" > /etc/resolv.conf; mozilla www:9999
2. (server) $ nc -l -p 9999 | grep "Host"
Actual Results:  
Host: www

Expected Results:  
Host: www.myco.tld
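
For anyone who wants to check the header without nc and grep, here is a minimal Python sketch that pulls the Host value out of a captured request. The function name extract_host and the sample request bytes are made up for illustration; they are not taken from Mozilla or from the report.

```python
def extract_host(raw_request: bytes):
    """Return the value of the Host header in a raw HTTP request, or None."""
    # Only the header section (before the first blank line) is of interest.
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    lines = head.decode("iso-8859-1").split("\r\n")
    for line in lines[1:]:  # lines[0] is the request line, e.g. "GET / HTTP/1.1"
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip()
    return None


# Hypothetical capture resembling the "Actual Results" above.
sample = b"GET / HTTP/1.1\r\nHost: www\r\nAccept: */*\r\n\r\n"
print(extract_host(sample))
```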

This makes accessing local resources a hassle and greatly affects university students.
To http...

Assignee: new-network-bugs → darin
Status: UNCONFIRMED → NEW
Component: Networking → Networking: HTTP
Ever confirmed: true
QA Contact: benc → httpqa
this is another one of those bugs that can only be solved by pushing name
resolution much earlier into the load process.
Priority: -- → P2
Whiteboard: [http/1.1]
Target Milestone: --- → mozilla1.3alpha
Hmm, I thought I'd commented on this one already.

Anyway, I had a long argument for why our behaviour was correct, but then I
reread that part of the spec and decided that we were wrong :)

For proxies, though, we don't resolve, and will have to rely on the proxy doing
this for us.

The other problem is that we never know that www maps to www.foo.com. We ask the
DNS service to look up 'www', and it gives us back an IP. We obviously can't
rely on a reverse lookup to give a useful result. The struct returned by Unix's
gethostbyname has an entry for the 'official' name of the host (h_name) - will
that do what we want everywhere? What about other platforms?
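
To see what the platform resolver reports as the 'official' name, a small sketch using Python's socket.gethostbyname_ex (which wraps the same kind of lookup and returns the official name, the alias list and the addresses) may help. The wrapper name official_name_and_aliases and the injectable resolver argument are assumptions added for illustration; what comes back depends entirely on the local resolver configuration.

```python
import socket


def official_name_and_aliases(host, resolver=socket.gethostbyname_ex):
    """Return (official_name, aliases) as reported by the resolver.

    `resolver` defaults to the system lookup; it can be replaced by a
    stand-in with the same return shape (name, aliases, addresses).
    """
    name, aliases, _addresses = resolver(host)
    return name, list(aliases)


if __name__ == "__main__":
    try:
        # Try a short name such as "www" on a machine with a search domain.
        print(official_name_and_aliases("localhost"))
    except OSError as exc:  # socket.gaierror and socket.herror are OSErrors
        print("lookup failed:", exc)
```

Running this with a short name on several platforms is one way to compare whether the first element is the expanded name or something else.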
bbaetz: good question... i was thinking of just using PRHostEnt::h_name, which
will actually apply to all major platforms once we switch Windows over to
PR_GetIPNodeByName (required for IPv6 support) and once Mac goes completely
mach-o.  but, you raise an interesting question... is h_name the right value?

cc'ing wtc and jgmyers... do either of you know if h_name is guaranteed to be
the FQDN equivalent of the hostname argument passed to PR_GetIPNodeByName? 
...on all platforms?  thx!
This bug is about the "Host:" header line, right? I'd like to clarify the summary
some more.

If we asked the resolver for "www", then that is what we think the "host" is.
Does the HTTP spec look like it contemplated this problem (Host lines not being
what they should be)? This is always a potential problem with name resolution:
NIS and /etc/hosts files often conspire to send funny requests to a server.
Consider two things when deciding how to handle this:

1) Think about Apache virtual host configuration. If we decide to send www as
the Host:, then the server will need to be configured to accept both
www.myco.tld and www as host names. Now tell me how many times you've seen this
done on a server you've worked on?

2) Consider the behavior of other competing browsers. MSIE will get to the "www"
website, but Mozilla will not.
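
The virtual host point in 1) can be illustrated with a rough Python sketch of name-based dispatch. This is not how Apache is implemented; the table VHOSTS, the paths and the function select_site are hypothetical and only show why "Host: www" misses unless the short name is configured as well.

```python
# Hypothetical name-based virtual host table: only the full name is configured.
VHOSTS = {"www.myco.tld": "/srv/www/main"}


def select_site(host_header, vhosts, default=None):
    """Pick a site by the Host header value, ignoring any :port suffix.

    Simplification: IPv6 literals in the Host header are not handled.
    """
    name = host_header.split(":", 1)[0].strip().lower().rstrip(".")
    return vhosts.get(name, default)


print(select_site("www.myco.tld", VHOSTS))  # matches the configured name
print(select_site("www", VHOSTS))           # no match: falls back to default

# The server admin would have to list the short name too:
with_alias = dict(VHOSTS, www="/srv/www/main")
print(select_site("www", with_alias))
```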
Priority: P2 → P4
hmm... NN4 behaves like mozilla.  anyone know how IE6 behaves?

RFC 2616 has this to say:

  The Host field value MUST represent the naming authority of the origin server or 
  gateway given by the original URL. 

if the original URL is "http://www/" then it seems like we should send "Host: www".
what am i missing in this interpretation of the RFC?
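
The "original URL" reading can be sketched in Python as follows. This only illustrates that interpretation and is not Mozilla's code; the helper name host_header_from_url and the DEFAULT_PORTS table are assumptions, and IPv6 literals are ignored for brevity.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header_from_url(url):
    """Build the Host value purely from the URL as typed, with no DNS involved."""
    parts = urlsplit(url)
    host = parts.hostname  # lowercased host without the port
    port = parts.port
    # The port is left out when absent or when it is the scheme default.
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"


print(host_header_from_url("http://www/"))
print(host_header_from_url("http://www:9999/"))
```

Under this reading the search domain never enters the picture, because nothing here consults the resolver.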
Status: NEW → ASSIGNED
Darin: It depends on your definition of 'naming authority', I think. You can 
read the text both ways, but I think that the rationale makes it clear that 
the full name is what is wanted so that name based virtual hosts work easily.
bbaetz: right, i agree with you, but i disagree that "this behaviour is in
violation of HTTP 1.1, rfc2616:14,23" ... i just don't see it spelled out so
clearly.  it says "original URL"... it doesn't say anything about the URL
resulting from a DNS lookup or anything like that.
-> futuring (we're compatible w/ nav4x, so i don't see the urgency here)
Severity: major → normal
Target Milestone: mozilla1.3alpha → Future
IE6 also behaves like mozilla, so i'm tempted to say WONTFIX unless someone can
give me a convincing argument that we are violating the RFC.
Severity: normal → minor
Priority: P4 → P5
Darin:

Re: your question in comment #4: I am not sure what you meant
by "the FQDN equivalent of the hostname argument passed to
PR_GetIPNodeByName".  So I'll answer with an example.

On Solaris, if you pass "www" to PR_GetIPNodeByName on the
Netscape intranet, the h_name you get back is "mcom.com",
and "www.netscape.com" is an alias.

In this case, the h_name "mcom.com" is a FQDN, but I suspect
it is "www.netscape.com" that is the FQDN that you want.
wtc: yeah, that answers my question, and it also points out why this sort of
fixup might be unreliable or at least difficult to get right.
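
To make the difficulty concrete, here is a sketch of one possible fixup heuristic in Python: take h_name plus the aliases and pick the first name whose leading label is the host the user typed. The function guess_fqdn is hypothetical (nobody in this bug proposed exactly this), and it is a guess rather than a reliable rule.

```python
def guess_fqdn(typed, h_name, aliases):
    """Return the first of h_name/aliases that extends the typed short name.

    Returns None when no candidate starts with "<typed>.".
    """
    label = typed.lower().rstrip(".")
    for candidate in [h_name, *aliases]:
        name = candidate.lower().rstrip(".")
        if name.startswith(label + "."):
            return name
    return None


# wtc's Solaris example: h_name is "mcom.com", the alias is "www.netscape.com".
print(guess_fqdn("www", "mcom.com", ["www.netscape.com"]))
```

If several aliases start with the same label, the heuristic simply takes the first one, which is part of why this kind of fixup is hard to get right.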
*** Bug 204430 has been marked as a duplicate of this bug. ***
Just had 204430 marked as a duplicate of this, which I would agree with.

In reference to http://bugzilla.mozilla.org/show_bug.cgi?id=177919#c11 - I would
have said that the implied meaning of www on a host which has a resolv.conf
search entry of 'foo.org' is that the full authority is www.foo.org (this is the
way a call to nslookup would handle such an authority).

If the authority is www.subnet1 then the first authority to be checked is
www.subnet1 (which fails) followed by www.subnet1.foo.org which resolves and is
the authority.

In the case of nslookup as the client, it doesn't then respond that 'www.subnet1'
resolved to xxx.xxx.xxx.xxx; it responds that www.subnet1.foo.org resolved to
xxx.xxx.xxx.xxx. I would say that a web browser should behave similarly if it
is going to use gethostbyname and therefore risk having extra domain information
appended to the requested URL.
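
The lookup order described above can be sketched roughly in Python. This models the usual resolv.conf search/ndots behaviour as I understand it (a name with at least ndots dots is tried as typed first, otherwise the search list is tried first); other resolvers may differ, and search_candidates is a made-up helper, not a real resolver API.

```python
def search_candidates(name, search_domains, ndots=1):
    """List the names a stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search list is applied.
        return [name]
    with_search = [f"{name}.{domain}" for domain in search_domains]
    if name.count(".") >= ndots:
        return [name] + with_search  # try as typed first
    return with_search + [name]      # try the search list first


print(search_candidates("www.subnet1", ["foo.org"]))
print(search_candidates("www", ["foo.org"]))
```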

btw - on my system (with a resolv.conf search path including unimelb.edu.au) the
gethostbyname structure for "www" returns "www.unimelb.edu.au" as the name, so
it should be recoverable. Note - when gethostbyname is called on an alias, the
alias name (which is the one that would need to be sent as the Host: header)
comes back in the aliases field rather than as the official name, so this would
need to be taken into account (see
http://bugzilla.mozilla.org/show_bug.cgi?id=204430 for my comments on aliases).
-> default owner
Assignee: darin → nobody
Status: ASSIGNED → NEW
Component: Networking: HTTP → Networking
QA Contact: networking.http → networking
Target Milestone: Future → ---
I think this is wontfix. I'm going to cash some cred and mark it as such.

It seems like something relatively simple has been made complicated. The basic idea is that virtual hosting meant that the web site needed to know what was in the hostname of the URL.

I doubt that anyone really meant that the browser should figure out the "real" hostname of the IP address that is returned from a lookup. No other DNS-based application that I know of does this.

And the real problem in this case lies in the URL, or more specifically with the author. A URL like that is inherently ambiguous: it could mean a top-level domain "www", or a host "www" in an implied domain.

Whichever it is, there are only two possible results:

1- The author didn't really know what they were doing, and they should have used FQDNs. The URL spec basically says something like "all domain names in URLs should be FQDNs".

2- The author really did know what they were doing, which means they would have configured their (hopefully intranet-only) server to handle this correctly.

Changing the HTTP<->DNS integration to return an FQDN is probably not the right behavior; this falls into the category of auto-magic network features that seem to inevitably create serious problems down the road.

The better solution is to add support for an application-level (browser) user setting for a default domain name, for when people are stuck with content that does not use FQDNs.
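
A rough Python sketch of what such a default-domain setting might do: rewrite single-label hosts in the URL before any request is built, so that both the lookup and the Host header use the full name. No such preference exists in this bug; apply_default_domain and its localhost exemption are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def apply_default_domain(url, default_domain):
    """Append default_domain to a single-label host; otherwise return url unchanged."""
    parts = urlsplit(url)
    host = parts.hostname
    # Leave alone: no host, dotted names, IPv6 literals, localhost, empty setting.
    if not host or "." in host or ":" in host or host == "localhost":
        return url
    if not default_domain:
        return url
    fqdn = f"{host}.{default_domain.strip('.')}"
    # Preserve any userinfo ("user:pass@") and explicit port from the original.
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = f"{userinfo}{at}{fqdn}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(apply_default_domain("http://www:9999/", "myco.tld"))
```

Because the rewrite happens at the URL level, the user can see the expanded name, unlike a silent fixup inside the resolver path.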
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WONTFIX
Summary: HTTP client reports SERV as host, not SERV.SEARCHDOMAIN.TLD → Host: line uses "SERV" when connecting to "SERV.SEARCHDOMAIN.TLD"