Closed Bug 32895 Opened 20 years ago Closed 19 years ago
Converting \ to / in urls on windows only (was: RFC 2396 $2.4.3 non-compliance?)
From Bugzilla Helper:
User-Agent: Mozilla/4.72 [en] (X11; U; Linux 2.2.12-20 i686)
BuildID: M14

By the BNF of 3.2.1 in RFC 2068, the path is broken into segments specifically by "/", which means that the hypothetical page "back\slash.html" is not equivalent to "back/slash.html".

Reproducible: Always
Steps to Reproduce:
1. go to http://www.undergrad.math.uwaterloo.ca/~dj3vande/ie.html
2. click on link
3. if browser is rfc-compliant, you go one place. If else, you go another

Actual Results: Browser gets http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc/compliance.html instead of http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc\compliance.html

Expected Results: display http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc\compliance.html
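To illustrate the segment argument above, here is a minimal Python sketch. The helper name `path_segments` is made up for this example. It shows that splitting the two test URLs on "/" alone yields different segment lists, so they name different resources.

```python
from urllib.parse import urlsplit

def path_segments(url):
    """Split the path component of a URL into segments on "/" only."""
    return urlsplit(url).path.split("/")

# The two URLs from the report; no network access is made here.
with_slash = path_segments(
    "http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc/compliance.html")
with_backslash = path_segments(
    "http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc\\compliance.html")

# "\" is not a separator, so the second path has one segment fewer.
print(with_slash)      # ['', '~dj3vande', 'rfc', 'compliance.html']
print(with_backslash)  # ['', '~dj3vande', 'rfc\\compliance.html']
```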
Hmmm... On my linux moz build 2000042809 it replaces the '\' with %5C, on my windows 95 moz build 2000042708 it replaces the '\' with a '/', giving me two different documents. IMHO a URI should I the same R, no matter what platform you're on. RFC 2068 has been obsoleted by RFC 2616, which points at RFC 2396 for allowed URI. Section 2.4.3 explicitly marks '\' as unwise and therefore not a valid character in URI as defined in Appendix A. email@example.com's URL should therefore officially not work. Of course mozilla is allowed to DWIM (do what I mean) and convert "illegal" URIs to correct ones. The question is, what kind of conversion should be used? IMHO, for all schemes '\' and other "illegal" characters should be escaped. On OSen with '\' as file path separator it would make sense to convert '\' to '/' for file:// (and related?) schemes. Can someone with more knowledge / experience in this field please comment on this? Changing the summary, changing the OS (the linux build seems to do the right thing, the win build doesn't), marking confirmed new.
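The "escape the illegal characters" option can be sketched as follows. This is an illustration only, not Mozilla code: the function name `escape_unwise` is hypothetical, and the character set is the "unwise" list from RFC 2396 section 2.4.3.

```python
# The "unwise" characters of RFC 2396 section 2.4.3: { } | \ ^ [ ] `
UNWISE = '{}|\\^[]`'

def escape_unwise(uri_part):
    """Percent-encode every unwise character, leave everything else alone."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in UNWISE else ch
        for ch in uri_part
    )

# The same input gives the same result on every platform.
print(escape_unwise("/~dj3vande/rfc\\compliance.html"))
# /~dj3vande/rfc%5Ccompliance.html
```

The sketch assumes its input has not been escaped yet; it does not try to detect existing %XX sequences.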
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → Windows 95
Summary: RFC 2068 3.2.1 non-compliance → RFC 2616 $2.4.3 non-compliance?
Putting myself on the CC.
Adding timeless to the CC per his request.
Yes, this was done for all those windows users who can't distinguish a \ from a /. This happens very often. We even have requests to allow http:\\server\path\file.htm as a valid url that should do the right thing. We don't do that, a line has to be drawn somewhere. But the above thing happens very often ... so whatever we do we will break some pages. Maybe we could make this configurable, sort of quirks-mode. Making this protocol dependent is also a very good idea. I will look into that.
Assignee: gagan → andreas.otte
Target Milestone: --- → M18
I think the problem here is that there aren't enough requests which _beg_ you not to allow \ as path separator because it only encourages people to not correct their mistakes and will slowly force other software packages to have to support the \ in their software. Consider this the first request :-) *beg* Besides, pages which rely on \ being a path separator should break (heck, they will on Nav4 under linux) and their writer should be gently notified of the existence of RFCs, standards, and why it's a good thing to adhere to them. I know computers are supposed to make things easier for humans, and not the other way around, but the moment humans start making life difficult for other humans with their silly requests, a line should be drawn.
There are at least two places where we convert \ to / inside urlpaths in mozilla. One is inside the urlparser. The other is inside the docshell where we try to fix a string (for XP_PC) to a valid url when the first try to parse the url fails. Also there are some converter-functions which convert from a native path to an url path and the other way around. Normally access from a file to an url should go through these converter routines, but I'm not sure this is always true. I don't like this conversion inside the urlparser either. I will do some tests without it and see what breaks.
Status: NEW → ASSIGNED
I'd have to second the beg that we convert \ to %5C, and not /. Allowing \ only encourages it, and what's worse, pages that use it will break on other OSes. Adding pp.
Yes, it's a real problem. NC 4.x supports \ and its conversion to /, IE does it too. It's used in some pages. If we want to reach platform parity quickly it will go the other way around: we will convert \ to / for UNIX/Linux/Mac too. In the long run, it makes sense to remove that support and replace it with a fallback. First try it without conversion; if that worked all is okay, if not and the string contains \, convert them to / and try again. But that is a massive undertaking since we have to change every place where a relative or absolute URI string is used to create an URI and wrap it into the fallback code.
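The fallback described above could look roughly like the following Python sketch. Everything here is hypothetical: `load_with_fallback` and the `try_load` callback (which returns `None` on failure) are names invented for the illustration and do not correspond to any Mozilla interface.

```python
from typing import Callable, Optional

def load_with_fallback(uri: str,
                       try_load: Callable[[str], Optional[bytes]]
                       ) -> Optional[bytes]:
    """Try the URI as written; only if that fails and it contains "\\",
    retry once with every "\\" converted to "/"."""
    result = try_load(uri)
    if result is not None or "\\" not in uri:
        return result
    return try_load(uri.replace("\\", "/"))

# Usage with a dict standing in for a web server (hypothetical data).
pages = {"http://host/dir/page.html": b"converted"}
print(load_with_fallback("http://host/dir\\page.html", pages.get))
```

A resource whose name really contains a \ wins, because the literal URI is tried first; the conversion only applies once that request has failed.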
Andreas suggests that all platforms convert \ to / now, and that it be phased out later. If we can't do it now, during the complete rewrite, it will never be done. More web page authors will start using \. Breaking the RFC will become the standard that all web browsers have to implement.
Gagan voted against removing the \-conversion code now although he agrees that it is the right thing to do, including implementing some sort of "quirks" mode, maybe like that one I described above.
Another problem is that while windows users are used to \ as a file separator, unix users are used to \ being a shell escape. So in file: urls on unix, you should be able to use "one\ name" to reference a directory or file with a space in the middle. Currently you have to use %20. Not saying that \ should be implemented this way, but I'm just giving another example of the problems with this situation.
Well, I don't think Unix users ought to be able to do that. If we assume shell escapes are resolved first, you still have an unescaped space character in the URL, which is invalid per the RFC. Of course, this is no greater non-compliance than converting '\' to '/' in the first place.
back to gagan for reassignment, my time schedule is getting worse, I have to stop sitting on this bug.
Assignee: andreas.otte → gagan
Status: ASSIGNED → NEW
mass move, v2. qa to me.
QA Contact: tever → benc
Works on Linux... is this still broken in windows?
The parsercode for conversion of \ to / is still in there (nsURLHelper.cpp).
Yes, while Netscape 4.78 gets the "Your browser is compliant with RFC 2068." page in the testcase! Both on WindowsME (Mozilla trunk CVS build on 20010804).
Compliance with RFC2396 2.4.3 means in this case: Make the urlparser(!) completely ignorant of \ as a separator for directory structures. Treat it as what it is: An unwise character which should and will be escaped.

This involves:
- nsURLHelper/CoaleseDirs will no longer convert from \ to / on XP_PC.
- nsURLHelper/CoaleseDirs will ignore \ as a directory separator (all platforms).
- nsNoAuthUrlparser will no longer recognize \ as a possible directory separator on XP_PC used for drive detection.

Consequences: An embedded URL inside a HTML document is always treated as an URL, using \ as directory separator in it will no longer work! Maybe docshell can do some urifixup if the first try to load the url fails. "Doing the right thing" (TM) lies only in the hands of the file system specific conversion routines from and to file-urls!

This means:
- correct conversion/escaping of filesystem specific characters which collide with url syntax. For example: having filenames with /, : or something similar in them needs to trigger special escaping of these chars before making them part of a file url.
- use nsIFile and its conversion routines everywhere and use them correctly.
- Do the right thing with UNC filepaths.
- Do escaping/unescaping in the right order and number.

Other stuff: GetFile and SetFile from nsStdURL should be moved into the file system specific parts of the implementation. Having it ifdefed inside nsStdURL is really bad style!
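As a rough picture of what "ignore \ as a directory separator" means for directory coalescing, here is a simplified Python sketch. It is not the nsURLHelper code: `coalesce_dirs` is a hypothetical stand-in that only handles absolute paths and only resolves "." and ".." segments.

```python
def coalesce_dirs(path):
    """Resolve "." and ".." in an absolute URL path.

    Only "/" separates segments; a "\\" is an ordinary character, so
    "b\\..\\c" is a single segment and is left untouched.
    """
    out = []
    for seg in path.split("/"):
        if seg == ".":
            continue
        if seg == "..":
            if len(out) > 1:  # keep the leading "" of an absolute path
                out.pop()
            continue
        out.append(seg)
    return "/".join(out)

print(coalesce_dirs("/a/b/../c"))      # /a/c
print(coalesce_dirs("/a/b\\..\\c"))    # /a/b\..\c
```

Escaping the remaining \ to %5C would then be a separate step, applied after coalescing.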
Summary: RFC 2616 $2.4.3 non-compliance? → RFC 2396 $2.4.3 non-compliance?
*** Bug 34239 has been marked as a duplicate of this bug. ***
Gagan, Judson? Anyone want to do a review?
Summary: RFC 2396 $2.4.3 non-compliance? → Converting \ to / in urls on windows only (was: RFC 2396 $2.4.3 non-compliance?)
to try to clarify the desired behavior: RFC 1738 lists "\" as an unsafe character, then says: "All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding." So we should encode the offending backslash, on all platforms. I had originally thought that it was a "reserve"-able character, which would make it URL scheme specific, and possibly legal in file URLs. I was wrong. In regards to how this affects windows users w/ file paths, I think that what we need is entrypoints (like the command line parser and "Open URL" dialogs) to be path friendly, and do conversion to file URLs automatically. We might also want to change the display of file URLs to a local path in the local formatting, but I think our internal handling of slashes should be compliant.
Yes, that's exactly right. We have functions that convert from local filepaths to urls and the other way around. These functions are aware of the filesystem specific delimiters and special characters and convert them. If these functions are used and are correct, all is fine. We end up with a valid url or a valid filepath. What will get broken when this goes active are websites which have *urls* in their documents which use \ not as a normal char but as the path delimiter. This is clearly wrong and a case for tech evangelism. But you always have to deal with the argument: IE (windows) can handle this, why can't you ...
I suggest we get this in early in the 0.9.5 cycle to get early response ...
So what's the final fix? Convert typed in URLs with \ to /? What about href URLs, will clicking a link on MS vs Mac go to the same place?
No, with the patch we won't do that conversion on any platform anymore. A \ that is part of an *url* will be just an unwise char which will get escaped to %5C. On the other hand if we get a \ as part of a *filepath* on windows/os2 we will convert it to / when doing a conversion from filepath to url. But that is part of the local filepath handling (and some uri fixup in docshell) and has nothing to do anymore with the url parser.
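The split between the two cases can be sketched in Python as below. This is only an illustration under stated assumptions: both function names are hypothetical, the file path helper covers drive-letter paths only (no UNC), and it is not the nsIFile conversion code.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote

def windows_path_to_file_url(native_path):
    """Filepath case: "\\" is the native separator and becomes "/"."""
    posix_form = PureWindowsPath(native_path).as_posix()
    # Keep "/" and the drive colon, percent-encode the rest as needed.
    return "file:///" + quote(posix_form, safe="/:")

def escape_url_path(url_path):
    """URL case: "\\" is just an unwise character and is escaped."""
    return quote(url_path, safe="/")

print(windows_path_to_file_url("C:\\docs\\read me.txt"))
# file:///C:/docs/read%20me.txt
print(escape_url_path("/rfc\\compliance.html"))
# /rfc%5Ccompliance.html
```

The point of the design is that only the code that knows it holds a native filepath is allowed to treat \ as a separator.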
Comment on attachment 48700: new diff to prevent bitrott
sr=darin ... who reviewed this?
Attachment #48700 - Flags: superreview+
if you click on the "View Bug Activity" you will see that I did. :-)
fix checked in. Can be verified by using the test url http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc\compliance.html It will now verify compliance for mozilla windows too. On the other hand we will probably see some more reports about non working links which use \ instead of /. All these bugs can go straight to tech evangelism.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
*** Bug 119457 has been marked as a duplicate of this bug. ***
*** Bug 150475 has been marked as a duplicate of this bug. ***
verified on 10/1/02 Win2000
Status: RESOLVED → VERIFIED
(In reply to comment #36)
> Firefox should be tolerant as well, or at least provide a means to enable
> tolerancy

why should firefox assume that someone meant to type / when he typed \? those are different characters, and can well be different files on the web server.
I agree with biesi here. I'm all for tolerance, but if we break a RFC in the process a line is crossed. This would happen here: we would break all those pages that have a \ as part of a file or directory name. There are not very many, but those pages exist.
*** Bug 176918 has been marked as a duplicate of this bug. ***