(This bug imported from BugSplat, Netscape's internal bugsystem. It was known there as bug #114981 http://scopus.netscape.com/bugsplat/show_bug.cgi?id=114981 Imported into Bugzilla on 02/03/99 18:48) If you set an HTTP proxy (e.g. w3proxy.netscape.com:8080) and type an Internet Keyword in the Location field, the proxy comes back with an error saying that it could not find the host (using the keyword as a hostname). When you type in "dogs and cats" it will work. What's happening is that in the proxy case it's using the proxy before adding the "go " to the front.
I have a slightly different theory about what's happening. When you set an HTTP proxy, the client uses it to try to resolve the hostname (or what it thinks must be a hostname, even though we've typed a keyword). If you type "toyota", the proxy turns that into "www.toyota.com", and succeeds in finding it. If you type "blurfl", the proxy fails to find a host called "blurfl".
Guha will be out of town soon. The 4.x branch will be merged into Nova on 5/11/98. Does this bug need to be fixed before the Nova merge?
I wasn't aware that any proxies did the "www.X.com" trick. Most don't. Anyway, the problem is that there may be no DNS on a client using a proxy, and so the algorithm needs to be modified. Since some proxies do indeed resolve unqualified names (e.g. warp), when someone types in a single keyword netlib must give it to the proxy for resolution. This implies that netlib needs to know why the proxy failed. Since the proxy doesn't distinguish between a general failure and a DNS failure in the result code (I think), netlib is going to have to treat 404 and other error codes differently in that case. Referring to Judson for an opinion...
Date: Fri, 24 Apr 1998 00:22:33 -0700 Erik van der Poel wrote: > > Hi Lou and Ari, > > If Navigator is using an HTTP proxy, and it asks the proxy to resolve > a URL that contains a non-existent hostname (i.e. gethostbyname() > fails in the proxy), what is the proxy supposed to return? Is it > supposed to return 404, or 400, or what? Our UNIX proxy returns 500. It shouldn't be 404 because I think HTTP status codes refer to objects within servers, not the existence or non-existence of servers themselves. Since the proxy is not the originator of the data, it cannot definitively determine the existence or non-existence of a server (or an object within a server) -- in a case where it fails to connect to one (regardless of whether the reason is DNS, server down, or midstream proxy down). Only an origin server can say something like "404 Not found" or "410 Gone". It shouldn't be 400 because the request format itself was ok. I further think that it shouldn't be any 4xx class status code (client's request bad), since the problem may be a temporary DNS problem, or a problem on the proxy itself. I would therefore continue to vote that 500 (or some other 5xx class status code) is the correct code for this case. Our NT proxy may return some other status, though. I don't have one up right now so I can't check. I'm Cc'ing Neel who might be able to answer that. --Neel? --Thanks!! -- Ari Luotonen, Mail-Stop MV-068 Opinions my own, not Netscape's. Netscape Communications Corp. firstname.lastname@example.org 501 East Middlefield Road http://people.netscape.com/ari/ Mountain View, CA 94043, USA Netscape Proxy Server Development ============================= Neel later replied that the NT proxy also returns 500 in this case. -- erik
Note also that Ari sent my email to Jim Gettys (HTTP guy) in order to get the HTTP spec itself tightened up in this regard (i.e. lack of specification for this case).
note: what we've done has completely removed the client's ability to explicitly notify the user that a host can't be found. Anytime a DNS lookup fails (local or proxy), the text that failed is sent to keyword.netscape.com. Someone needs to resolve which HTTP response code from the proxy server we need to catch when the proxy's DNS lookup fails. Once we have this, the "send off to keyword.netscape.com" code can be injected where we catch HTTP response codes. This problem is magnified when Proxy Auto configuration is used.
reassigning to guha as he's apparently the one who's working on keyword stuff. I just spoke with Ari and he said Erik has the proxy return code info. All that needs to happen here is the proxy return code needs to be caught in mhttp.c>net_parse_first_http_line() and the keyword call(s) need to be inserted (this should be similar to the 304 redirect case I'm thinking).
Yes, I pasted the proxy return code info into this bug report. See above.
removed Rick Takeda as QA assigned - no longer with NSCP.
assigning Greg as QA Assigned to, please assign to the appropriate engineer
Ok, bear with me as I am not a proxy guru. 6/2 Win32 4.06 build. I manually entered "w3proxy.netscape.com:8080" and configed for manual proxy, tried "toyota" which resolved and then tried "dogs and cats" and "blurfl". The latter two returned: "Error, The requested item could not be loaded by this proxy. Netscape Proxy is unable to locate the server: keyword.netscape.com The server does not have a DNS entry. Check the server name in the Location (URL) and try again" I take it this is what should occur, and this bug is fixed, or should it have resolved and dumped me to the search engine?
keyword.netscape.com is currently mapped to uruk, an internal machine (inside the firewall). w3proxy.netscape.com is outside the firewall, and cannot find keyword.netscape.com in the DNS database. keyword.netscape.com will properly exist outside the firewall when it is ready. In the meantime, it would be best to verify this bug fix by using an internal proxy (inside the firewall). I have one running at bluejay:8080. Please try this. Essentially, you should get the same results, whether or not you are using a proxy. If not, this bug has not been fixed properly.
Ok, used bluejay as proxy vs. direct connection. Enter "toyota", resolved properly to toyota site. Enter "mice and men" dumped to search engine in both cases. Enter garbage URL and with proxy I get the Error message; on direct connect I get dumped to search engine. So for this last test it would seem we are still broken since the internal proxy should have worked to access uruk the internal server? Or am I misunderstanding the last explanation?
I think you performed the test correctly.
Reopening as we apparently are still broken
I have no idea how to fix this or whether this is even a bug. Yes, if some proxies behave badly, keywords won't work and the browser will work exactly as 4.05 used to work. But there is nothing we can do about it.
Assigning to erik for re-evaluation....please see Guha's comments
If you enter a "garbage URL" (with non-existent hostname), then the proxy will fail to lookup the host in DNS, and return an HTTP status code of 500. There may be some way to detect this in the client, and then to failover to the search query instead. I am not that familiar with libnet. One of the libnet experts would be able to comment on the feasibility of such a fix.
so if we don't ship SB features this goes away. if we do it's a release note item.
Clearing fixed rez as it is not fixed yet, and it appears from last entries this needed some more look over and if SB stays in a possible fix.
assign to jud for one last look, then going to close it as wont fix.
Is pondering done? would like to mark as Won't Fix and Release Note.
Putting in Status Summary comments.
Changing TFV to M14. valeski, could you either confirm that it will or will not be fixed in Nova?
Judson, we need a determination from you on whether this will be fixed in 4.5 or whether we'll declare that certain proxy servers won't work with Smart Browsing. I believe you're in the best position to make that call from a policy and standard standpoint.
Last entry is from mourey asking for final input yet bug is marked won't fix?..so reopening to flag for final input from Judson.
I've indicated what would need to be done, whoever is familiar with the keyword code (Guha I think) can take my suggestion and run with it, otherwise we'll just need to take Chris Hofmann's suggestion to release note that keywords don't work through proxies. I'm reassigning to guha to get this off my plate.
resolving as wontfix per bug triage mtg.
Guys, I understand that our plates are too full, but please consider that Internet Keywords is considered one of the most strategic things we as a company are doing. I really think that we should fix this.
jar - can you help with this one? Moving to 4.5b2 whilst more discussion happens. kristif - this is a Release Note for M14.
I looked at this bug, and I'm pretty sure it is a proxy server problem. I used bluejay in my test, per instructions above, except I used telnet to manually test the proxy (I had to guess how to use it), and I was able to see the actual return codes. The first time I tried: telnet bluejay 8080 GET http://dogs and cats The response I got began: HTTP/1.0 500 Error from proxy That response caused the keywords function to work like a charm (when I tried this with Navigator 4.5). Alas, when I tried the following: telnet bluejay 8080 GET http://nevernevernevernever Then I did *NOT* get the leading error code! All I got was the error page. Try it yourself using telnet. Perhaps I don't know how proxies work (very possible), but the above test seemed to show that bluejay did *NOT* return a leading error code, or any of the common header stuff (re: mime-version, proxy-agent, content-type) that should appear before the page. As a result, the navigator can't detect the fact that a page was NOT served. Simply put, this is a server error (unless someone can tell me how a proxy server is supposed to work.). We could try to work around this proxy server error, but it would be hard (actually impossible to be perfect... and a hack all the way). I think the correct resolution is a Wontfix, but it should really be raised as a bug to the proxy server folks. For now, we should release note the fact that a solo keyword will not put you into the search system if a Netscape proxy is used. My apology in advance if I'm misunderstanding how to mess with a proxy server... but my tests sure explain the behavior we're seeing on Navigator.
No, you wrote an invalid request to the proxy. First of all, an HTTP/1.x request looks like this: GET http://hostname HTTP/1.0 Header: value Header: value <EMPTY LINE> So you were missing the "HTTP/1.0" or "HTTP/1.1" after the URL. Secondly, it's illegal to type in spaces into the URL when it's in HTTP. You gotta encode them with %20. So "dogs and cats" should issue the request: http://dogs%20and%20cats Third, spaces are illegal characters in DNS hostnames, so if the URL has spaces in the hostname portion, the Navigator could immediately determine that it's a keyword search.
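For illustration, the three points in the comment above can be sketched in Python. This is not Navigator or netlib code: `looks_like_keyword` and `build_proxy_request` are hypothetical names, and the sketch assumes the typed text is used as the host part of an `http://` URL.

```python
from urllib.parse import quote


def looks_like_keyword(typed: str) -> bool:
    """Spaces cannot appear in a DNS hostname, so such input must be a keyword."""
    return " " in typed.strip()


def build_proxy_request(typed: str) -> bytes:
    """Build a minimal HTTP/1.0 request for a proxy (hypothetical helper)."""
    # Percent-encode characters that are not legal in a URL; a space becomes %20.
    url = "http://" + quote(typed.strip(), safe="")
    # Request line including the protocol version, then an empty line to end the headers.
    return f"GET {url} HTTP/1.0\r\n\r\n".encode("ascii")
```

A client that applies the first check before contacting the proxy never has to interpret a proxy error for multi-word input, which would be consistent with "dogs and cats" working in the reports above.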
Interesting. I had tested my conjecture about how to contact a proxy server with the telnet sequence: GET http://florida/index.html <BLANK LINE> and had retrieved my web page (with a success code!). I was lulled into a sense of understanding by that premature success. Ari's correction to my attempt at a protocol suggests I was exploiting "undefined behavior" in the proxy (getting responses without specifying the HTTP/1.x protocol). The interesting question is then what exactly is 4.5 sending to the proxy? If it is sending the correct request format, then it is an error (bug) that the Navigator does not properly parse the Error Code (and toss the URL into the Keywords handling code). This might also fit with Ari's other comment, that the Navigator could (and since this works, probably does) realize that spaces in the wanna-be-host-name preclude it from being a valid URL, and cause the keyword code to handle the case without even contacting the proxy (and hence never having to parse a proxy error code!). Is there a log on a proxy server to verify that the request is full and valid (including the HTTP/1.x element)? I'd like to understand this a little better... as we'll eventually have to later this to 5.0, since this bug is way beyond the current crash/data loss triage level for 4.5. This investigation (and set of comments) can at least be used on a 5.0 fix. Thanks in advance for all contributions, Jim
Telnet'ting to the proxy as: GET URL<RETURN> is the old, HTTP/0.9 protocol. It is defined such that it does *not* send a status code, just the document (or text form error message) back to the client (for simplicity). So in fact, the proxy is working ok even in this case. Note that the empty line is not present in this format, either. I've set up a proxy at: ski-rack.mcom.com:8888 You can look at the access and error logs at: /u/luotonen/work3/sol-3.5-ski-rack/proxy-JAR/logs/ Let me know when you're done and I can shut this one down.
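A rough way to tell the two reply styles apart, sketched in Python. The function name `status_code` is made up for this example; it only looks at the first bytes of the reply and is not taken from any proxy or client source.

```python
import re
from typing import Optional

# An HTTP/1.x reply starts with a status line such as "HTTP/1.0 500 Error from proxy".
_STATUS_LINE = re.compile(rb"^HTTP/\d+\.\d+ (\d{3})(?: |\r?\n|$)")


def status_code(response: bytes) -> Optional[int]:
    """Return the status code, or None for an HTTP/0.9-style reply with no status line."""
    match = _STATUS_LINE.match(response)
    return int(match.group(1)) if match else None
```

With an HTTP/0.9-style request the reply is just the document or error text, so the function returns None and there is no code for a client to act on.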
Thanks to Ari's comments, I can now summarize the bug (correctly this time). This is not a server bug (I was wrong), the problem is that several distinct errors are lumped together by a proxy server, and we have to decide if we want the Navigator to catch all such causes, and redirect handling to the keywords code. The problem only appears when a solo word is supplied, and can be mistaken for a host name. The proxy server returns an identical return code (500) for the case of "Server does not have DNS entry" and "Server may be down or unreachable." As pointed out by Valeski in detail on 5/18 above, folks *could* catch the 500 error and redirect it to the keywords handling code. As he also pointed out, this would mean that we would lose the ability to distinguish down-machines from non-existent-machines when a proxy was active :-/. Hence, if we "fixed" this bug, and always passed off 500 error codes to keywords-handling, then we would have a new bug :-/. We could then tell if we are using a proxy by typing in a URL to a host that has a server down. When a proxy is not in use, then we'll be told that the host is down (via a pop-up). When a proxy is in use, we'd be deprived of this info, and be fed off to keywords. The new "alternate bug" would say "Internet Keywords don't work when machine is down unless proxy is in use." If not that, then it might say "Internet Keywords are too aggressive when server is down and proxy is in use." I wouldn't know which to file... but there would be a clear (bug) difference depending on whether you are using a proxy or not :-(. IMO, the current bug is not that much "worse" than the bug that we'd get from a fix. A techie would like the current bug, which tells you when a machine is down. A non-techie would probably rather hear about other related sites when his target machine is down (since a non-techie doesn't even follow the distinction between client, server, net, and ISP).
IMO, without a clear spec (with some justification), this is a "bug" that should not be fixed. Folks could even decide to change proxy servers... and then we could keep clients on identical footing with/without servers... but I don't honestly know what the desired behaviour should be... perhaps even another preference :-( :-(. I'll now sit on this bug until some compelling arguments appear for "the right" behavior (whatever that is)... or till the triage requirements are enforced, and this non-crashing bug is latered (current triage rules state that is our current 4.5 policy).
I am latering this from 4.5 to 5.0 in response to further triage requests. We need to figure out what we want to do... and if we want to be consistent... and then do it.
This is not, IMO, a P0 bug. It is not stopping folks from testing... nor blocking other engineers from doing their work. There is a feature here that is not spec'd completely... and that is really what is holding up a resolution. See my comments above for a detailed explanation of the options, and implications. I'm downgrading this to P1.
Not an FE bug since all the keyword magic happens in the backend, including the redirect to keywords.netscape.com. Reassigning to dp.
Jud could you keep this in your pool and decide based on jar's comment.
Jud not around, giving to dp.
I'll accept just to get it off the NEW list, but I can't do anything with it until the feature behavior is decided on.
removing myself from cc: list
qa contact set to email@example.com
qa contact set to firstname.lastname@example.org
JAR's analysis of the bug is close to correct, but he is failing to note that the current behavior for single words w/o a proxy matches what he says would be a bug if we fixed this bug. I.e., right now if you type an intranet server name, and that server happens to be down, the 4.5 client will time-out and send the single word to the keyword system. Judson Valeski was just suggesting that we do the same thing for proxy server cases - if the proxy server returns a 500, then the client should roll over to send the word to the keyword system. Since 99% of the single words people are typing are likely to be internet host names, and not intranet host names, these behaviors (with or w/o proxy) support the majority.
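The rule being proposed here can be written down as a small decision function. This is only a sketch of the policy as described in this comment, not the 4.5 implementation; the function and parameter names are invented, and "single word" is assumed to mean input with no space, dot, slash, or colon.

```python
from typing import Optional


def is_single_word(typed: str) -> bool:
    """A lone word could be either a host name or a keyword."""
    text = typed.strip()
    return bool(text) and not any(ch in text for ch in " ./:")


def should_fall_back_to_keyword(typed: str, direct_lookup_failed: bool,
                                proxy_status: Optional[int]) -> bool:
    """Apply the same rule with or without a proxy (multi-word input is handled earlier)."""
    if not is_single_word(typed):
        return False
    if proxy_status is None:
        # No proxy: the client's own lookup or connection attempt failed or timed out.
        return direct_lookup_failed
    # Proxy in use: a 500 stands in for "could not reach that host".
    return proxy_status == 500
```

Under this rule a down intranet server and a nonexistent host are treated alike in both configurations, which is the trade-off discussed in the comments above.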
Ok, I understand. The point is moot until we support proxies in 5.0
Setting to M5 Target Milestone to get fixed before PR1.
If I understand the proposed fix correctly, we want the client to notice some class of proxy errors and then retry the request as a keyword. This assumes that one would never want the user to see those proxy errors and their corresponding HTML pages. This all assumes that you can tell DNS or connection errors apart from other HTTP errors returned by the proxy. This is non-trivial since various proxies return various things and there is some overlap in meaning for some of the error codes. I don't think you would want to roll over to keywords for all 4?? and 5?? error codes, for example. And the HTTP spec isn't much help since its suggestion (504?) is not widely implemented (it's only for HTTP 1.1 for starters). The latest Squid (for example) seems to return 503. The RFC claims that others return 400 or 500. There was some talk last year about doing something evil like parsing the descriptive HTML for hints that it was a DNS error, but that is fragile. I'm of the opinion that the correct place to fix this is in the proxy itself. Already today if you ask for http://foo the proxy needs to choose how to turn that into a FQDN. It's not clear that the default need be the domain of the proxy server's host. Squid lets you configure what happens with the append_domain configuration variable. It does nothing by default. If you could rely on that (which you can't) then you could choose never to send single word hosts to the proxy in the first place. I have some patches to Squid that implement keywords. If the hostname has no dot and the DNS resolution fails, then it treats it as a keyword. I will submit the changes to the Squid maintainers if no-one objects. Perhaps it will catch on with other proxies, then again, perhaps not. -jg
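The proxy-side rule mentioned at the end of this comment (no dot in the hostname and DNS resolution fails) is simple enough to sketch. This is Python for illustration, not the Squid patch itself; `treat_as_keyword` and `resolves` are hypothetical names, and the resolver is passed in so the rule can be exercised without a network.

```python
import socket
from typing import Callable


def resolves(hostname: str) -> bool:
    """Default check using the system resolver."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def treat_as_keyword(hostname: str,
                     can_resolve: Callable[[str], bool] = resolves) -> bool:
    """Rule from the comment: no dot in the name AND resolution fails."""
    return "." not in hostname and not can_resolve(hostname)
```

Because the decision is made where the DNS result is actually known, no status code has to be interpreted by the client.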
I know of only one keyword server (ours), but if others are created in the future, and if we want to allow users to set their keyword server in the prefs file, then we want the client (not the proxy) to make the keyword request. So this is one drawback if we get the proxy to do the keywords. However, I don't think it's common to change the keyword server in prefs (and, indeed, maybe Netcenter doesn't even want people to change it :-)
I like the idea of fixing proxy servers, but am doubtful that it will solve this problem for the majority of users. Here's a question: if the default mode of the browser was that non-URLs were sent to the designated keyword server (without a local host lookup), would this bypass the proxy problem?
As far as I know, some people use proxies because they don't have a DNS server, in which case they couldn't access the keyword server unless we hard-coded the keyword server's IP address into the client. Also, I heard recently that some ISPs force their customers to use HTTP proxies by disallowing access to port 80 (HTTP) on other hosts.
IE5 autodiscovers proxies.
something to keep on the necko radar... will this ever be possible...
once we resolve how we want to treat proxy response codes wrt dns failure, this can be fixed. The component of this really should change though to some new component that will be handling keywords.
Changing all Networking Library/Browser bugs to Networking-Core component for Browser. Occasionally, Bugzilla will burp and cause Verified bugs to reopen when I do this in a bulk change. If this happens, I will fix. ;-)
Isn't the dependency r'ship the wrong way round, ie this should depend on 10276?
Bulk move of all Necko (to be deleted component) bugs to new Networking component.
Date: Mon, 27 Dec 1999 15:51:01 -0800 From: "Tom Williams" <email@example.com> I found that this same problem occurs even on Netscape 4.7 using our Apache web server's proxy capability. If I browse "cdnow", I get errors from the proxy indicating that "Get http://cdnow/" can't be handled by the proxy. When I specify http://www.cdnow.com, everything works fine.
making this bug dependent on the proxy bug.
Putting on PDT- radar for beta1 - due to being the same behavior as 4.7.
I'm pushing the milestone on this out. Same behavior as 4.x as leger points out.
not getting to this for awhile. marking it as an enhancement because it's an interoperability issue w/ existing proxy servers.
Moving to M17 (also beta2).
spam, changing qa contact from paulmac to firstname.lastname@example.org on networking/RDF bugs
Does anyone have an update that describes this behavior in Netscape 6? Also, can someone comment if this is in Mozilla, or is this a commercial feature now. If this a Netscape-only feature, perhaps it needs to be moved to bugscape? Compared to other functional proxy problems, this seems like a low priority, so I'm not likely to analyze this problem anytime soon.
qa to me.
After reviewing a variety of Location, DNS and IK bugs, I think the solution is that when we hit a 5xx class error, we should call IK and request again. My reasoning is this: If the proxy is having internal errors (bad config, low memory, no internet connectivity), then a second request will probably return a 5xx error again anyhow. If the 5xx error is related to a request forwarding failure (server down, not in DNS, whatever), then by failing over to an IK request, you can generate some information for the user. The other solution would be to implement bug 88217 or bug 127872. These would decrease the situations where a server 500 error would be the final result presented to the user.
IMO, it would be a privacy problem to fallback to IK, if the desired server is e.g. down. See bug 127872 for discussion about privacy differences between DNS and IK.
-> URL bar (for Internet Keywords) When IK==on, anything failing in DNS is sent to the IK server, so for any HTTP Proxy setup, this might be an issue now.
[RFE] is deprecated in favor of severity: enhancement. They have the same meaning.
*** Bug 150580 has been marked as a duplicate of this bug. ***
Focusing the summary on the actual change needed to fix the problem. This underlies problems w/ getting IK and domain guessing to work correctly in proxy environments.
old bug. i think that this is now invalid.
hmm... old bug yes, but seems valid to me. should be a RFE instead of a normal bug though.
The IK and Domain Guessing owners should decide if they even care about working via proxies. If they don't, I would like to WONTFIX these bugs.
Well, I'm not the owner, but I for one would like to have this working -- at least domain guessing; IK is less important. The method that seems to be used in other browsers is that the proxy setting simply doesn't affect the domain name lookup routine. And it seems to me that an HTTP proxy should only be used for HTTP requests, not domain name lookups.
> it seems to me that an HTTP proxy should only be used for HTTP requests, not > domain name lookups. Are you suggesting to try to make a DNS lookup from the client, ignoring the proxy? That will fail here, and I don't know how (long delay or resolution failure), because the client might not have access to any DNS server or the DNS server might carry internal names only or the DNS server setting might be intentionally misconfigured to avoid and reveal abuse.
I see your point. Elsewhere I suggested adding an optional 'DNS proxy'; if unset, the client would attempt DNS lookup/guessing as if there were no (HTTP) proxy. I think this would prove to be a cleaner and more useful fix than trying to intelligently "guess" appropriate behavior from the proxy response. Those who want proxied DNS can simply specify it. For the record, my proxy server [Proxomitron] returns '400 Host Not Found'. So I think it's safe to say that proxy servers cannot be relied upon to return a predictable response for domain guessing (based on reports on other proxy server responses in above comments).
BenB's points are right on the money. That is why SOCKS4 was considered useless in the real world, and had to be replaced by SOCKSV5, which allows HTTP (URL) proxy style DNS delegation. The reason newer proxies do not have server-level support for domain guessing, is because the world has changed, and it doesn't make any sense anymore, we just don't have a replacement for it yet. http://www.mozilla.org/docs/end-user/domain-guessing.html
I suggested the addition of a DNS-only proxy preference as an effective solution to this problem. HTTP proxy servers perhaps *should* handle domain guessing (or uniformly return an error code), but the fact is, they don't -- with the consequence being apparently "broken" domain guessing in Mozilla. I think adding a DNS proxy pref would roundly resolve this issue without getting proxy-vendor-specific or breaking existing functionality in Moz. Those who need DNS proxying could specify it, and those who want DNS handled by the local machine can leave it blank. This would also have the side benefit of permitting users to specify a different HTTP and DNS proxy. I haven't heard any comments specifically against this suggestion, so I'm wondering if it makes sense for me to start working on adding this feature. I don't want to begin work, though, if there is little or no chance of it being accepted into the build tree. Please comment.
DNS delegation has been mentioned in other bugs, but I think you should create your own bug. It would not solve all problems, because DG and IK are both hooked into docshell, not just the DNS service. Proxy servers abstract all connectivity, the design aspects are broader than just DNS.
*** Bug 280854 has been marked as a duplicate of this bug. ***
*** Bug 269519 has been marked as a duplicate of this bug. ***
*** Bug 336734 has been marked as a duplicate of this bug. ***
Now, I admit I haven't read through all the comments, I just skimmed them, but from what I've seen there is no clear solution that everyone agrees on, because each proxy behaves differently and the HTTP specification does not require a specific behavior. Thus I would propose a simple hack: from what I've noted, almost all sane proxy servers issue a 400 or 500 family status code, and also send as the response body a minimal HTML page that states the error code and a short description. Now we could observe that the page contains an H1 header line with the words "host not found". So my proposed solution is very simple and works in almost all cases: if the proxy responds with a 400 or 500 status code and inside the page we can find (with a case-insensitive regular expression) either <title>.*host not found.*</title> or <h1>.*host not found.*</h1>, then we could assume that the host does not exist. Now I don't know how easy this is to implement, so please don't kill me :)
Ciprian, and if the proxy is translated? :) That's one of the many reasons why programmers hate string parsing and invented error codes.
Oh, and my proxy says "No such domain" (not "host"), of course in a <h2>, not <h1>, just to annoy us, and the <title> contains nothing of the sort, it just says "502 Bad Gateway" plus some nonsense, although it's not the gateway that's bad. --- Personally, I ended up considering this bug a feature. I don't want Google to know what I looked for just because I misspelled the domain/hostname.
Indeed if the proxy is translated (which happens in how many cases?) there could be a problem. The same goes for an <h2> header instead of <h1>... But we could parametrize the regular expression that should be searched for (for example a key named network.proxy.http_host_not_found_re), and thus everyone could customize it according to their needs... As I said in the beginning, what I've proposed is a *very bad* hack, but it is doable. --- > Personally, I ended up considering this bug a feature. I don't want Google > to know what I looked for just because I misspelled the domain/hostname. For this there is the setting keyword.enabled, and I intend to use the keyword feature as a shortcut for my personal wiki (through the setting keyword.URL).
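To make the proposal concrete, here is a minimal Python sketch of the configurable check. It is an illustration of the hack as described in this thread, not code from any build: `proxy_says_host_not_found` is a hypothetical name, and the `pattern` argument stands in for the proposed (not existing) network.proxy.http_host_not_found_re setting.

```python
import re

# Default pattern built from the two expressions suggested earlier in the thread.
DEFAULT_HOST_NOT_FOUND_RE = (
    r"<title>.*host not found.*</title>|<h1>.*host not found.*</h1>"
)


def proxy_says_host_not_found(status: int, body: str,
                              pattern: str = DEFAULT_HOST_NOT_FOUND_RE) -> bool:
    """True when a 4xx/5xx proxy error page matches the configured pattern."""
    if not 400 <= status <= 599:
        return False
    return re.search(pattern, body, re.IGNORECASE) is not None
```

A page that says "No such domain" in an <h2> is missed by the default pattern and only matches once the user supplies a different expression, which shows both how the parametrization helps and how fragile the approach remains.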
I don't know if it helps, but I'm from Argentina, I have a proxy at the university, and the response is:
HTTP/1.0 503 Service Unavailable
Server: squid/3.0.STABLE18
Mime-Version: 1.0
Date: Thu, 24 Jun 2010 13:05:24 GMT
Content-Type: text/html
Content-Length: 2203
X-Squid-Error: ERR_DNS_FAIL 0
X-Cache: MISS from localhost
X-Cache-Lookup: MISS from localhost:8000
Via: 1.0 localhost (squid/3.0.STABLE18)
Proxy-Connection: close
The contents are in English. Thanks for all.
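The response above carries an X-Squid-Error header naming the failure, so for this particular proxy a client could check the header instead of guessing from the status code or the HTML. A small Python sketch follows; the helper names are invented, and the assumption (based only on the sample above) is that the header value begins with the error name. Other proxies should not be expected to send this header.

```python
def parse_headers(raw: str) -> dict:
    """Parse a raw response head into a dict of lower-cased header names."""
    headers = {}
    for line in raw.splitlines()[1:]:  # skip the status line
        if not line.strip():
            break  # an empty line ends the header block
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def squid_dns_failure(raw: str) -> bool:
    """True if the reply carries Squid's ERR_DNS_FAIL marker."""
    value = parse_headers(raw).get("x-squid-error", "")
    return value.split()[:1] == ["ERR_DNS_FAIL"]
```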