Closed Bug 32895 Opened 20 years ago Closed 19 years ago

Converting \ to / in urls on windows only (was: RFC 2396 $2.4.3 non-compliance?)

Categories

(Core :: Networking, defect, P3)

x86
Windows 95
defect

Tracking

()

VERIFIED FIXED
Future

People

(Reporter: jacoby, Assigned: gagan)

Details

(Keywords: platform-parity)

Attachments

(1 file, 1 obsolete file)

From Bugzilla Helper:
User-Agent: Mozilla/4.72 [en] (X11; U; Linux 2.2.12-20 i686)
BuildID:    M14

By the BNF of 3.2.1 in RFC 2068, the path is broken into segments specifically
by "/", which means that the hypothetical page "back\slash.html" is not
equivalent to "back/slash.html". 

Reproducible: Always
Steps to Reproduce:
1. go to  http://www.undergrad.math.uwaterloo.ca/~dj3vande/ie.html
2. click on link
3. if the browser is rfc-compliant, you go to one place. If not, you go to another

Actual Results:  Browser gets
http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc/compliance.html
instead of 
http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc\compliance.html

Expected Results:  display
http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc\compliance.html
Hmmm... On my linux moz build 2000042809 it replaces the '\' with %5C, on my 
windows 95 moz build 2000042708 it replaces the '\' with a '/', giving me two 
different documents. IMHO a URI should I the same R, no matter what platform 
you're on.

RFC 2068 has been obsoleted by RFC 2616, which points at RFC 2396 for allowed 
URI. Section 2.4.3 explicitly marks '\' as unwise and therefore not a valid 
character in URI as defined in Appendix A. jacoby@ecn.purdue.edu's URL should 
therefore officially not work.

Of course mozilla is allowed to DWIM (do what I mean) and convert "illegal" URIs 
to correct ones. The question is, what kind of conversion should be used? IMHO, 
for all schemes '\' and other "illegal" characters should be escaped. On OSen 
with '\' as file path separator it would make sense to convert '\' to '/' for 
file:// (and related?) schemes.
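
A minimal sketch of the proposal above, not Mozilla's actual parser code. The function name and the `backslash_is_native_separator` flag are hypothetical; the flag stands in for "we are on an OS where '\' separates path components".

```python
def normalize_backslashes(url: str, backslash_is_native_separator: bool = False) -> str:
    """Hypothetical helper illustrating the proposed rule.

    - file: URLs on platforms where '\\' is the path separator: '\\' -> '/'
    - everything else: '\\' is an unwise character and is escaped as %5C
    """
    # Take the scheme as whatever precedes the first ':' (an assumption that
    # is good enough for absolute URLs like the ones in this report).
    scheme = url.split(":", 1)[0].lower() if ":" in url else ""
    if scheme == "file" and backslash_is_native_separator:
        return url.replace("\\", "/")
    return url.replace("\\", "%5C")
```

With this rule an http URL resolves to the same resource on every platform, and only file: URLs depend on the local path convention.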

Can someone with more knowledge / experience in this field please comment on 
this?

Changing the summary, changing the OS (the linux build seems to do the right 
thing, the win build doesn't), marking confirmed new.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → Windows 95
Summary: RFC 2068 3.2.1 non-compliance → RFC 2616 $2.4.3 non-compliance?
Putting myself on the CC.
Adding timeless to the CC per his request.
Yes, this was done for all those windows users who can't distinguish a \ from a
/. This happens very often. We even have requests to allow
http:\\server\path\file.htm as a valid url that should do the right thing. We
don't do that, a line has to be drawn somewhere. 

But the above thing happens very often ... so whatever we do we will break some
pages. Maybe we could make this configurable, sort of quirks-mode. Making this
protocol dependent is also a very good idea. I will look into that.
->andreas.
Assignee: gagan → andreas.otte
Target Milestone: --- → M18
I think the problem here is that there aren't enough requests which _beg_ you 
not to allow \ as path separator because it only encourages people to not 
correct their mistakes and will slowly force other software packages to have to 
support the \ in their software. Consider this the first request :-) *beg*

Besides, pages which rely on \ being a path separator should break (heck, they 
will on Nav4 under linux) and their writer should be gently notified of the 
existence of RFCs, standards, and why it's a good thing to adhere to them.

I know computers are supposed to make things easier for humans, and not the 
other way around, but the moment humans start making life difficult for other 
humans with their silly requests, a line should be drawn.
There are at least two places where we convert \ to / inside urlpaths in
mozilla. One is inside the urlparser. The other is inside the docshell where we
try to fix a string (for XP_PC) to a valid url when the first try to parse the
url fails.

Also there are some converter-functions which convert from a native path to an
url path and the other way around. Normally access from a file to a url should
go through these converter routines, but I'm not sure this is always true.

I don't like this conversion inside the urlparser either. I will do some tests
without it and see what breaks.
Status: NEW → ASSIGNED
I'd have to second the beg that we convert \ to %5C, and not /. Allowing \ only
encourages it, and what's worse, pages that use it will break on other OSes.
Adding pp.
Keywords: pp
Yes, it's a real problem. NC 4.x supports \ and its conversion to /, IE does it
too. It's used in some pages. If we want to reach platform parity quick it will
go the other way around, we will convert \ to / for UNIX/Linux/Mac too.

In the long run, it makes sense to remove that support and replace it with a
fallback support. First try it without conversion, if that worked all is okay,
if not and the string contains \ convert them to / and try again. But that is a
massive undertaking since we have to change every place where a relative or
absolute URI string is used to create a URI and wrap it into the fallback code.
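
The fallback described above could look roughly like this. It is only a sketch: `load_with_fallback` and the `try_load` callback are hypothetical names, and `try_load` is assumed to return `None` when a load fails.

```python
from typing import Callable, Optional


def load_with_fallback(
    url: str, try_load: Callable[[str], Optional[bytes]]
) -> Optional[bytes]:
    """Try the URL as written; retry with '\\' -> '/' only if that fails."""
    # First attempt: no conversion at all, so a resource whose name really
    # contains a backslash is still reachable.
    result = try_load(url)
    if result is not None or "\\" not in url:
        return result
    # Fallback: treat the backslashes as mistyped forward slashes.
    return try_load(url.replace("\\", "/"))
```

The order matters: converting first would make any resource with a literal \ in its name unreachable, which is the non-compliance this bug is about.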
Andreas suggests that all platforms convert \ to / now, and that it be phased out
later. If we can't do it now, during the complete rewrite, it will never be
done. More web page authors will start using \. Breaking the RFC will become the
standard that all web browsers have to implement.
Gagan voted against removing the \-conversion code now although he agrees that
it is the right thing to do, including implementing some sort of "quirks" mode,
maybe like that one I described above.
Another problem is that while windows users are used to \ as a file separator,
unix users are used to \ being a shell escape.  So in file: urls on unix, you
should be able to use "one\ name" to reference a directory or file with a space
in the middle.  Currently you have to use %20.  Not saying that \ should be
implemented this way, but I'm just giving another example of the problems with
this situation.
Well, I don't think Unix users ought to be able to do that. If we assume shell
escapes are resolved first, you still have an unescaped space character in the
URL, which is invalid per the RFC. Of course, this is no greater non-compliance
than converting '\' to '/' in the first place.
back to gagan for reassignment, my time schedule is getting worse, I have to
stop sitting on this bug.
Assignee: andreas.otte → gagan
Status: ASSIGNED → NEW
Target Milestone: M18 → Future
Blocks: 61999
Blocks: 63736
mass move, v2.
qa to me.
QA Contact: tever → benc
Works on Linux... is this still broken in windows?
i think that's now bug 90383.
Depends on: 90383
The parsercode for conversion of \ to / is still in there (nsURLHelper.cpp). 
Yes, while Netscape 4.78 gets the "Your browser is compliant with RFC 2068."
page in the testcase! Both on WindowsME (Mozilla trunk CVS build on 20010804).
Compliance with RFC2396 2.4.3 means in this case: Make the urlparser(!)
completely ignorant of \ as a separator for directory structures. Treat it as
what it is: An unwise character which should and will be escaped.

This involves:

- nsURLHelper/CoaleseDirs will no longer convert from \ to / on XP_PC.
- nsURLHelper/CoaleseDirs will ignore \ as a directory separator (all platforms).
- nsNoAuthUrlparser will no longer recognize \ as a possible directory separator
on XP_PC used for drive detection.
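
To illustrate what "ignore \ as a directory separator" means for directory coalescing, here is a small Python sketch. It is not the nsURLHelper code; `coalesce_dirs` is a hypothetical stand-in and it assumes an absolute path that starts with "/".

```python
def coalesce_dirs(path: str) -> str:
    """Resolve '.' and '..' in an absolute URL path.

    Only '/' separates segments; a backslash is ordinary segment content.
    """
    segments = path.split("/")
    out = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == "..":
            # Drop the previous segment, but never the leading empty
            # segment that stands for the root "/".
            if len(out) > 1:
                out.pop()
            if is_last:
                out.append("")  # keep the trailing slash
        elif segment == ".":
            if is_last:
                out.append("")
        else:
            out.append(segment)
    return "/".join(out)
```

A path such as `/a/b\..\c.html` comes back unchanged, because `b\..\c.html` is a single segment once \ no longer counts as a separator.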

Consequences:

An embedded URL inside an HTML document is always treated as a URL, using \ as
directory separator in it will no longer work! Maybe docshell can do some
urifixup if the first try to load the url fails.

"Doing the right thing" (TM) lies only in the hand of the file system specific
conversion routines from and to file-urls! This means:

- correct conversion/escaping of filesystem specific characters which collide with
url syntax. For example: having filenames with /, : or something similar in it
needs to trigger special escaping of these chars before making it part of a file
url.
- use nsIFile and its conversion routines everywhere and use them correctly.
- Do the right thing with UNC filepaths.
- Do escaping/unescaping in the right order and number.
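
As a rough illustration of the first and last points (this is not the nsIFile implementation), the two hypothetical functions below split a native path on its own separator and escape each segment exactly once. They assume absolute paths, and the Windows variant assumes the "C:\..." drive form; UNC paths are deliberately left out.

```python
from urllib.parse import quote


def unix_path_to_file_url(path: str) -> str:
    """'/' is the separator; a '\\' in a file name is data and becomes %5C."""
    # safe="" makes quote() escape every reserved character in a segment.
    return "file://" + "/".join(quote(seg, safe="") for seg in path.split("/"))


def windows_path_to_file_url(path: str) -> str:
    """'\\' is the separator here, so it maps to '/' in the URL."""
    drive, rest = path[:2], path[2:]  # e.g. "C:" and "\docs\file.txt"
    segments = rest.split("\\")
    return "file:///" + drive + "/".join(quote(seg, safe="") for seg in segments)
```

For example, `unix_path_to_file_url("/tmp/a\\b.txt")` gives `file:///tmp/a%5Cb.txt`, while `windows_path_to_file_url("C:\\docs\\my file.txt")` gives `file:///C:/docs/my%20file.txt`. The same character is handled differently because only the file system specific code knows what it means.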

Other stuff:

GetFile and SetFile from nsStdURL should be moved into the file system specific
parts of the implementation. Having it ifdefed inside nsStdURL is really bad style!
Summary: RFC 2616 $2.4.3 non-compliance? → RFC 2396 $2.4.3 non-compliance?
*** Bug 34239 has been marked as a duplicate of this bug. ***
Gagan, Judson? Anyone want to do a review?
Summary: RFC 2396 $2.4.3 non-compliance? → Converting \ to / in urls on windows only (was: RFC 2396 $2.4.3 non-compliance?)
to try to clarify the desired behavior:

RFC 1738 lists "\" as an unsafe character, then says:
" All unsafe characters must always be encoded within a URL. For
   example, the character "#" must be encoded within URLs even in
   systems that do not normally deal with fragment or anchor
   identifiers, so that if the URL is copied into another system that
   does use them, it will not be necessary to change the URL encoding."

So we should encode the offending backslash, on all platforms.

I had originally thought that it was a "reserve"-able character, which would
make it URL scheme specific, and possibly legal in file URL's. I was wrong.

In regards to how this affects windows users w/ file paths, I think that what we
need is entrypoints (like the command line parser and "Open URL" dialogs) to be
path friendly, and do conversion to file URLs automatically. We might also want
to change the displays of file URL's to a local path in the local formatting,
but I think our internal handling of slashes should be compliant.
Yes, that's exactly right. We have functions that convert from local filepaths
to urls and the other way around. These functions are aware of the filesystem
specific delimiters and special characters and convert them. If these functions
are used and are correct, all is fine. We end up with a valid url or a valid
filepath.

What will get broken when this goes active are websites which have *urls* in
their documents which use \ not as a normal char but as the path delimiter. This
is clearly wrong and a case for tech evangelism. But you always have to deal
with the argument: IE (windows) can handle this, why can't you ... 
I suggest we get this in early in the 0.9.5 cycle to get early response ...
Keywords: review
So what's the final fix? Convert typed in URLs with \ to /? What about href
URLs, will clicking a link on MS vs Mac go to the same place?
No, with the patch we won't do that conversion on any platform anymore. A \ that
is part of an *url* will be just an unwise char which will get escaped to %5C.

On the other hand if we get a \ as part of a *filepath* on windows/os2 we will
convert it to / when doing a conversion from filepath to url. But that is part
of the local filepath handling (and some uri fixup in docshell) and has nothing
to do anymore with the url parser.
Attachment #46676 - Attachment is obsolete: true
Attachment #48700 - Flags: review+
Comment on attachment 48700 [details] [diff] [review]
new diff to prevent bitrott

sr=darin ... who reviewed this?
Attachment #48700 - Flags: superreview+
If you click on "View Bug Activity" you will see that I did. :-)
fix checked in. Can be verified by using the test url

http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc\compliance.html

It will now verify compliance for mozilla windows too. On the other hand we will
probably see some more reports about non working links which use \ instead of /.
All these bugs can go straight to tech evangelism.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
*** Bug 119457 has been marked as a duplicate of this bug. ***
*** Bug 150475 has been marked as a duplicate of this bug. ***
verified on 10/1/02 Win2000
Status: RESOLVED → VERIFIED
another thread on same topic:
http://forums.mozillazine.org/viewtopic.php?p=629335#629335

Firefox should be tolerant as well, or at least provide a means to enable
tolerancy (i.e. give users a choice between strict mode and tolerant mode). 
This strictness can only lead to bad press in comparisons.  Even the open source
Apache web server is tolerant (or can be configured to be tolerant) for URL and
spelling mistakes (mod_speling). There's no reason Firefox can't be this way too.

Broken page:
http://www.nywatertaxi.com/about.php

One bad press example due to strictness...
http://computergripes.com/firefox.html

mod_speling:
http://httpd.apache.org/docs-2.0/mod/mod_speling.html

Note: one way to work around Firefox and Mozilla's strictness might be to use a
proxy that can rewrite the URLs, replacing the bad '\' with a good '/'.  However
I still think this should be part of a robust browser tolerant of
mistakes...perhaps like how bad javascript is handled (put a warning sign
somewhere that the script isn't quite right, but still make a best-effort at
rendering it correctly.)
(In reply to comment #36)
> Firefox should be tolerant as well, or at least provide a means to enable
> tolerancy

why should firefox assume that someone meant to type / when he typed \? those
are different characters, and can well be different files on the web server.
I agree with biesi here. I'm all for tolerance, but if we break a RFC in the
process a line is crossed. This would happen here, we would break all those
pages that have a \ as part of a file- or directory name. There are not very
many, but those pages exist.
*** Bug 176918 has been marked as a duplicate of this bug. ***
Duplicate of this bug: 438348
Duplicate of this bug: 637001