Closed Bug 155344 (host:port) Opened 22 years ago Closed 22 years ago

URL bar doesn't understand ports without protocol (no default to http)

Categories

(SeaMonkey :: Location Bar, defect)

defect
Not set
minor

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bjorncarlin, Assigned: adamlock)

References

Details

(Keywords: qawanted, regression)

Attachments

(2 files, 2 obsolete files)

From Bugzilla Helper: User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.1a+) Gecko/20020701 BuildID: 2002070103 When entering (for instance) "locahost:80" Mozilla claims "localhost is not a supported protocol". "http://locahost:80" works fine. Mozilla 1.0 final did not have this problem. Reproducible: Always Steps to Reproduce: enter "locahost:80" in the URL bar
To adamlock; this is a URI fixup issue...
Assignee: hewitt → adamlock
Whiteboard: DUPEME
The fix for bug 100176 adds the unregistered protocol error message. It didn't get fixed in 1.0.0. It might be possible to add something to URI fixup to detect the port but I wonder if it's better to mark this WONTFIX. Fixup is intended for novices, who wouldn't usually be specifying port numbers. Internet Explorer doesn't fix up localhost:80 either.
WorksForMe using FizzillaCFM/2002062203. Accessing "localhost:80" results in "http://localhost/" being accessed (though the port should still have been displayed). Accessing "localhost:23" results in the error, "Access to the port number has been disabled for security reasons." Adding qawanted for duplication per DUPEME on the whiteboard.
Keywords: qawanted
*** Bug 155306 has been marked as a duplicate of this bug. ***
Duped the other bug. All/all. I agree that this bug should be wontfix. It's important to display an error message when a user, novice or expert, enters an URL with an invalid protocol, rather than doing nothing. If the user knows about port numbers and is frustrated by an error message, they most likely know about protocols, too, and can type in the right one.
OS: MacOS X → All
Hardware: Macintosh → All
Whiteboard: DUPEME
Marking WONTFIX. I believe fixup should confine itself to fixing common mistakes made by novices. It will just tie itself in knots if we keep going after things like this.
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → WONTFIX
is it feasible to have a preference (without a UI) to disable the fixup behavour? the mozilla browser is primarly a web browser (personally i can't remember the last time i used it for anything other than http(s)). i feel it makes more sense for mozilla to weight its guesses with bias towards the http protocol, as it previously did. if i wrote code to add a preference to disable this element of the fixup code, would i just be wasting my time?
It would certainly be possible to override the URI fixup object (which is a service), though I doubt a pref would be accepted in the UI unless it was something a lot of people would want to switch off completely. You might like to pitch the idea in netscape.public.mozilla.ui newsgroup.
I would love to see such a pref, with or without UI. I spend a lot of time accessing HTTP servers on ports other than 80 and hate having to type in http:// all the time.
*** Bug 157579 has been marked as a duplicate of this bug. ***
I believe that the WONTFIX status of this bug was a little bit hasty. In order to have this working again it would not be necessary to stop doing the URL fixup. You just watch out for this special case. You would still display an error message when the URL doesnt work. I feel that this is a regression and the reason for marking this bug WONTFIX is not clear. Thanks.
Okay, time has had its effect. I changed my mind. I recommend reopening this bug to implement behavior similar to what I suggested in bug 100176 comment 22, if that or another scheme is feasible.
There are two heuristics we can use: 1. If the "protocol" contains dots, it's a hostname and not a protocol. 2. If the first part of the "path" is a (small?) number, it's a port and the thing before the colon is a hostname. Both of these would fix slashdot.org:80. Only #2 would fix localhost:80, but #2 might break a future protocol, so #1 is safer. Opera uses #2. Netscape 4 uses #2 but only if it doesn't recognize the protocol. IE does no fixup. ftp:80 localhost:80 slashdot:80 Netscaspe 4 can't find host "." http://localhost:80/ http://slashdot.org:80/ Opera 6 http://ftp:80/ http://localhost:80/ http://slashdot.org:80/ IE 6 ftp://80/ local:80 (???) invalid syntax Adam, I think address bar fixup should be for convinience and "doing what the user expects" rather than just being for "fixing common mistakes made by novices". It's been years since typing "slashdot.org" rather than "http://slashdot.org/" into the location bar has been considered a mistake.
Status: RESOLVED → UNCONFIRMED
Keywords: 4xp
Resolution: WONTFIX → ---
I might be able to add some code to count the dots in the scheme and do fixup instead of return an error. I could also check if the bit after the colon contained digits. I don't think localhost:80 should be fixed up though, because localhost could be a scheme.
*** Bug 158892 has been marked as a duplicate of this bug. ***
*** Bug 159058 has been marked as a duplicate of this bug. ***
Status: UNCONFIRMED → NEW
Ever confirmed: true
I was halfway through implementing this when I noticed that the rfc covering URIs explicitly states that schemes make contain dots, pluses, minuses and alpha numeric characters. It would not be possible to assume that slashdot.org:80 is an address, rather than an invalid scheme without violating this. See section 3.1: http://www.faqs.org/rfcs/rfc2396.html I will attach a work in progress patch anyway. I've never heard of a scheme with a dot in it so it may be the lesser of two evils to assume that one that does is probably an address, especially when Mozilla doesn't recognize it as a valid protocol.
adam, Wouldnt it make more sense to have a default fall back behavior if you do not recognize the scheme as being valid? So In the case of localhost:80, after you realize that localhost is not a scheme you would treat it as being a hostname. This way you dont have to make any assumptions and you would give the desired behavior to both people that use schemes and people that assume the http protocol as the default.
Attached patch Work in progress (obsolete) — Splinter Review
Work in progress patch. As mentioned before, this patch counts the dots in the protocol to determine the liklihood that it is a malformed http uri. If I don't count the dots then I increase the risk that fixup compounds the error rather than giving a reasonable error message that might help the user correct a simple mistake.
Attached patch Work in progress mk 2 (obsolete) — Splinter Review
Same as before but restricted to diffs nsDefaultURIFixup.cpp. The other diffs were bogus.
Attachment #92624 - Attachment is obsolete: true
Re: comment 18 - that was the problem before. If a user typed in a valid but unsupported protocol, like wais:site.org or an invalid protocol, like httpffr://www.blah.com Mozilla would respond with an error message. That's a bummer about schemes with dots in them. OTOH, aren't all port numbers supposed to entered as decimal numbers, between 1 and 65536? A number within that range could not be an IP address. If so, we could check to see if the number following the colon is within that range, and if so treat it as the port number for an HTTP address.
*** Bug 159270 has been marked as a duplicate of this bug. ***
Re: comment 21 -- That was partially wrong, sorry. The previous problem was that when a valid but unsupported protocol or an invalid protocol was requested, Mozilla did not display an error message. Instead, nothing happened. The current behavior is superior to that. Now, however, we face a different challenge.
Keywords: regression
*** Bug 160081 has been marked as a duplicate of this bug. ***
*** Bug 161263 has been marked as a duplicate of this bug. ***
Alias: host:port
*** Bug 161473 has been marked as a duplicate of this bug. ***
(this bug is going to get a ton of duplicates...) I've been re-writing the URL parser tests, so I've been spending some quality time with RFC Adam mentioned, as well as 1630, 1738, and 1808, the contributing documents. This problem also affects relative URL parsing. I want to strangle the person who actually sugested that a colon be used for hostname|IP:port. netstat had it right, do w.x.y.z.port or host.domain.tld.port. As I understand it, the first char of the TLD cannot be numeric, so this would be easily parsable. (rant) As for the matter at hand, what we have is URL parsing that favors invalid protocol detection to hostname detection. I think I am the inadvertent cause of this because I filed bug 100176. What I really wanted, (which still doesn't work as of Mozilla 1.1b) is more error handling at the URL level, not at URL bar (which uses URIfixup to translate user input into a URL). If you parse to a colon, then it could be a hostname or a scheme. If the first character after the colon is not a number, it cannot be port, so send everything to Necko as a URL. If the colon is followed by a number, read the rest of the input, if it is not a valid port number (any non numeric chars or a decimal number too large) then send everything to Necko as a URL. At this point, if you believe the right side of the colon *COULD* be a valid number, then check the left hand side to see if it is a registered scheme. If not, then treat it as a hostname|IP:port number. (I'm assuming that URL bar can find out what protocols are registered somehow). This should ensure that you fixup the most hostname:port combinations w/o stomping on new schemes or calling for port numbers that are junk.
*** Bug 156004 has been marked as a duplicate of this bug. ***
*** Bug 164807 has been marked as a duplicate of this bug. ***
*** Bug 165701 has been marked as a duplicate of this bug. ***
nominating: asking for fix from both mozilla and commercial teams.
Keywords: mozilla1.2, nsbeta1
I'll try and fixup the address if it matches this: [a-zA-Z0-9.-]+:[0-9]{5,}[/]? Or in English, a hostname of alphanumeric chars, dots & dashes, followed by a colon, followed by 5 or less digits followed a forward slash or the end of the string. Anything else triggers an invalid scheme error as per usual.
Maybe it is possible to fix this partly in the urlparser. Currently we parse localhost:80 as localhost://80/. If I remember correctly previous versions of the urlparser contained some code to deal with situations like this. If we would parse this as ://localhost:80/ and then error because the protocol is missing, URIfixup has a chance to do its job.
Excuse me, but I don't really see the problem. If you have something with a colon that is NOT followed by a slash, and that pattern cannot be interpreted as a protocol OR it doesn't connect OR it gives an error upon connection? Why not just try with 'http://' prepended, as a last resort? Or you could make an exception for 'localhost' -- or indeed anything that doesn't match a protocol name but that does resolve to an IP address or that can be interpreted as an IP address... Or am I now waltzing over a subtle reasoning why we don't want this sort of thing?
greetings, i'm responsible for one of the many dups. i also agree with R.J.V., why not try pinging, why should the parser catch it. anyway, i can't help but goof with reg expressions for the proposed parser: [a-zA-Z0-9.-]+:[0-9]{5,}[/]? it should probably be: [a-zA-Z0-9.-]+:[0-9]{,5}[/]? the subtle difference is that the latter actually takes 5 or less digits in the port. i noticed this using egrep. wouldn't imagine it behaves much differently than the regex library used in mozilla.
Attached patch PatchSplinter Review
This patch adds a new method to URI fixup to guess if a scheme is actually a hostname, basically it does what I said it should check for and throws an invalid protocol otherwise. So these urls will be fixed up: www.foo.com:80 www.foo.com:80/ www.foo.com:88/blah/path/somewhere localhost:8888 While these won't localhost:89a localhost:8888888 www.foo.com:x The patch also sorts out a mess of different string calling conventions used in some other methods.
Attachment #92625 - Attachment is obsolete: true
*** Bug 165854 has been marked as a duplicate of this bug. ***
What about IP4 and IP6 addresses as hostnames?
right, an IPv6 address literal can contain ':' chars.
I still think URIFixup is the wrong place to do this kind of parsing. Wouldn't it be better to let the normal urlparser figure out that it deals with a host:port string and return a specific error like NS_ERROR_URL_MISSING_SCHEME. Then URIFixup can deal with it in prepending http:// for example.
here's how i see it... currently, docshell just hands URL strings from the URL bar off to necko, and if necko understands the URL string, then everything is good. if necko doesn't understand the URL string, then it will reject it with NS_ERROR_MALFORMED_URI and docshell performs URL fixup and then sends the new URL to necko. necko may consult the appropriate protocol handler and/or DNS to verify the URL. the problem is that when necko sees localhost:8888, it thinks it should consult the localhost protocol handler, which does not exist. the result is NS_ERROR_UNKNOWN_PROTOCOL. if localhost were a registered protocol handler, however, then necko would probably have to treat "localhost:8888" as an URL for the localhost protocol handler. that is the one situation where this can get ambiguous. for example, if someone had a LAN setup with a machine named http, then http:8888 would greatly confuse necko. so, i think this points to the fact that a layer above necko must make a decision about what URLs entered in the URL bar really mean. that is, the URL bar code should probably treat "http:8888" as "http://http:8888/" but that does require such code to know a lot about what is valid, etc. hmm... i'm not really sure what the best solution is.
I agree, but there are certain things we can do in necko. We can identify a IPv4 and IPv6 addresses, we can identify a port, we can check the scheme for valid characters, we can even check for a well formed hostname (although this might become difficult with iDNS?). 8888 for example would not be a vaild hostname, so necko would be able to see that this can not be scheme:host but must be something else. This would fit into host:port, but then necko could error because of the missing scheme. Maybe it is time to use more specific error codes to allow for more differenciated URIfixup. What bugs me is having very similar urlparsing code in the URLParser and in docshell.
yeah, it also bugs me to have similar code in both docshell and necko, but in the case of "blah:8888", necko cannot just assume that 8888 is a port number. blah:8888 maybe a perfectly valid/fully-qualified URI of type "blah". even if there is no registered protocol handler for "blah", there is the unknown protocol handler (implemented under windows, mac, and os2) that would handle this URI by poking the OS. now, necko does determine if a URI scheme is internal or external, but does that really solve the problem? what about a server named "keyword"? keyword is a registered, internal protocol handler. how should necko treat such a URI... it seems like it has to ask the keyword protocol handler to parse it, and then i suppose if that fails, necko would have to return the keyword protocol handler's error, whatever it might be. but guess what? "keyword:8888" is a perfectly valid URI... it causes a keyword lookup on the string "8888" which returns a page full of search results. it seems to me that the URI fixup code could simply check for the NS_ERROR_UNKNOWN_PROTOCOL exception, and then do the normal URI fixup when that error occurs.
> it seems to me that the URI fixup code could simply check for the > NS_ERROR_UNKNOWN_PROTOCOL exception, and then do the normal URI fixup when > that error occurs. That is exactly the approach the patch takes. If URI fixup receives NS_ERROR_UNKNOWN_PROTOCOL it tests the url to see if it might be hostname:port and proceeds on the result of the test. If it doesn't look like a hostname:port it throws invalid protocol error as usual, otherwise it http:// (or ftp://) and uses that. I don't see any need in this case to put anything in necko. The user has typed a duff url and uri fixup is trying to guess what they meant. If we put the code in necko then it would affect every uri, irrespective on whether it was entered by a user or not. I'd like a review on the patch please.
Keywords: patch
Necko needs some code to support the problems w/ schemes and relative URLs. If URI fixup can use some of this stuff to do fixup, that would be great. However, I agree w/ adam, lets not add anything to necko that does not relate to URL parsing. Docshell and URL bar should be arch'd so their function is to take user-inputed-strings and convert them to a usable URL before they send it to necko.
benc: sure, but the problem remains that docshell cannot determine if an URL is valid unless it asks necko to try loading it.
Hadn't thought of that. If docshell can ask necko about the validity of a scheme, that sounds like the right way to structure the code. My main point is that docshell should handle the human interaction, fixup code should not be pushed down into Necko.
*** Bug 166851 has been marked as a duplicate of this bug. ***
*** Bug 166910 has been marked as a duplicate of this bug. ***
Can I have a review on this please?
*** Bug 168586 has been marked as a duplicate of this bug. ***
tweak summary for searchability
Summary: URL bar doesn't understand ports without protocol → URL bar doesn't understand ports without protocol (no default to http)
*** Bug 168753 has been marked as a duplicate of this bug. ***
*** Bug 168849 has been marked as a duplicate of this bug. ***
3rd time lucky - can I have a review on the patch please? Surely someone is interested in seeing this patch go in?
Please can I have a review? Surely someone of the 30 odd people CC'd to this bug is interested in seeing it go in!
Keywords: review
Adam, I'm cautiously in favour. I only say cautiously because my unserstanding of the patch is limited but the intent as explained is good. Andrew Steele
For me the issue of IPv6 adresses is still open with this patch. They are not detected as a valid host.
Andreas, I've raised bug 171148 for general IPv6 issues with fixup. If the code is okay for IPv4 I'd like to get it in ASAP.
I will take a look at the patch later today.
adam: lot's of folks depend on IPv6 working correctly. we can't take any IPv6 regressions. NSPR provides a function to convert a literal IP address string into a PRNetAddr (PR_StringToNetAddr) which can be used to test if a string is a valid IP address (v4 or v6).
This isn't an IPv6 regression. The URI fixup object just won't stick the http:// on the front of IPv6 addresses is all and they break. AFAIK correct URIs using IPv6 addresses will work just fine. I raised bug 171148 specifically to cover the whole issue of IPv6 and fixup.
ok, sounds reasonable then. sorry for the noise. i'll sr= the patch after andreas has had a look.
The current patch does not build on linux. It fails with nsDefaultURIFixup.cpp: In method `nsresult nsDefaultURIFixup::ConvertFileToStringURI(const nsAString &, nsCString &)': nsDefaultURIFixup.cpp:391: no matching function for call to `nsAString::get () const' This is the line in question, looks like strng foo ... const PRUnichar * up = aIn;
I think .get() should be replaced by .First()
Attached patch updated patchSplinter Review
updated the patch to fix the unix build problem. Also removed 2 unused new boolean variables and (while being at it) removed the check if the string starts with \ on unix. With unix doing no \ to / fixup this check makes no sense.
Comment on attachment 100944 [details] [diff] [review] updated patch I'm giving an r= on the patch, except for the small changes I did myself. Anybody want to r= the changes I did?
Attachment #100944 - Flags: review+
Comment on attachment 100944 [details] [diff] [review] updated patch r=adamlock The unix backslash changes aren't really related to this bug, but it looks right.
Comment on attachment 100944 [details] [diff] [review] updated patch >Index: mozilla/docshell/base/nsDefaultURIFixup.cpp >+ { >+ chunkSize++; >+ ++iter; >+ } nit: how about |++chunkSize| just for kicks? >+ if (iter == iterEnd) >+ { nit: wierd indentation... looks like there's a tab char in front of the if () stmt. looks good otherwise. sr=darin
Attachment #100944 - Flags: superreview+
Fix is checked in with Darin's suggestions.
Status: NEW → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
*** Bug 171836 has been marked as a duplicate of this bug. ***
Product: Core → SeaMonkey
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: