Closed Bug 32895 Opened 25 years ago Closed 24 years ago

Converting \ to / in urls on windows only (was: RFC 2396 $2.4.3 non-compliance?)

Categories

(Core :: Networking, defect, P3)

Product:

Component:

Platform:

x86

Windows 95

Type:

defect

Priority:

P3

Severity:

normal

Tracking

()

Status:

VERIFIED FIXED

Milestone:

Future

People

(Reporter: jacoby, Assigned: gagan)

Details

(Keywords: platform-parity)

Attachments

(1 file, 1 obsolete file)

patch to remove conversion of \ to / from the urlparser (24 years ago, Andreas Otte, 2.28 KB, patch)
new diff to prevent bitrott (24 years ago, Andreas Otte, 2.27 KB, patch; dougt: review+, darin.moz: superreview+)

Reporter

Description

•

25 years ago

From Bugzilla Helper:
User-Agent: Mozilla/4.72 [en] (X11; U; Linux 2.2.12-20 i686)
BuildID: M14

By the BNF of 3.2.1 in RFC 2068, the path is broken into segments specifically by "/", which means that the hypothetical page "back\slash.html" is not equivalent to "back/slash.html".

Reproducible: Always

Steps to Reproduce:
1. Go to http://www.undergrad.math.uwaterloo.ca/~dj3vande/ie.html
2. Click on the link.
3. If the browser is RFC-compliant, you go one place. If not, you go another.

Actual Results: Browser gets http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc/compliance.html instead of http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc\compliance.html

Expected Results: display http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc\compliance.html

Peter "jag" Annema

Comment 1

•

25 years ago

Hmmm... On my Linux moz build 2000042809 it replaces the '\' with %5C; on my Windows 95 moz build 2000042708 it replaces the '\' with a '/', giving me two different documents. IMHO a URI should I the same R, no matter what platform you're on.

RFC 2068 has been obsoleted by RFC 2616, which points at RFC 2396 for allowed URIs. Section 2.4.3 explicitly marks '\' as unwise and therefore not a valid character in a URI as defined in Appendix A. jacoby@ecn.purdue.edu's URL should therefore officially not work.

Of course mozilla is allowed to DWIM (do what I mean) and convert "illegal" URIs to correct ones. The question is, what kind of conversion should be used? IMHO, for all schemes '\' and other "illegal" characters should be escaped. On OSen with '\' as file path separator it would make sense to convert '\' to '/' for file:// (and related?) schemes. Can someone with more knowledge / experience in this field please comment on this?

Changing the summary, changing the OS (the Linux build seems to do the right thing, the Windows build doesn't), marking confirmed new.
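
For illustration only, the escaping proposed in this comment could look roughly like the sketch below. It is not Mozilla code: the function name `escape_unwise` is hypothetical, and the character set is the "unwise" list from RFC 2396 section 2.4.3.

```python
# Characters that RFC 2396 section 2.4.3 classifies as "unwise".
UNWISE = "{}|\\^[]`"


def escape_unwise(path: str) -> str:
    """Percent-encode unwise characters and leave everything else untouched.

    Hypothetical helper: it never turns "\\" into "/", it only escapes it.
    """
    return "".join(
        "%{:02X}".format(ord(ch)) if ch in UNWISE else ch for ch in path
    )


print(escape_unwise("/~dj3vande/rfc\\compliance.html"))
```

With this approach the same input yields the same URI on every platform, which is the parity asked for above.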

Status: UNCONFIRMED → NEW

Ever confirmed: true

OS: Linux → Windows 95

Summary: RFC 2068 3.2.1 non-compliance → RFC 2616 $2.4.3 non-compliance?

Peter "jag" Annema

Comment 2

•

25 years ago

Putting myself on the CC.

Peter "jag" Annema

Comment 3

•

25 years ago

Adding timeless to the CC per his request.

Comment 4

•

25 years ago

Yes, this was done for all those Windows users who can't distinguish a \ from a /. This happens very often. We even have requests to allow http:\\server\path\file.htm as a valid url that should do the right thing. We don't do that; a line has to be drawn somewhere. But the above thing happens very often ... so whatever we do we will break some pages. Maybe we could make this configurable, as a sort of quirks mode. Making this protocol dependent is also a very good idea. I will look into that.

Assignee

Comment 5

•

25 years ago

->andreas.

Assignee: gagan → andreas.otte

Target Milestone: --- → M18

Peter "jag" Annema

Comment 6

•

25 years ago

I think the problem here is that there aren't enough requests which _beg_ you not to allow \ as path separator because it only encourages people to not correct their mistakes and will slowly force other software packages to have to support the \ in their software. Consider this the first request :-) *beg* Besides, pages which rely on \ being a path separator should break (heck, they will on Nav4 under Linux) and their writers should be gently notified of the existence of RFCs, standards, and why it's a good thing to adhere to them. I know computers are supposed to make things easier for humans, and not the other way around, but the moment humans start making life difficult for other humans with their silly requests, a line should be drawn.

Comment 7

•

25 years ago

There are at least two places where we convert \ to / inside url paths in mozilla. One is inside the urlparser. The other is inside the docshell, where we try to fix a string (for XP_PC) to a valid url when the first try to parse the url fails. There are also some converter functions which convert from a native path to a url path and the other way around. Normally access from a file to a url should go through these converter routines, but I'm not sure this is always true. I don't like this conversion inside the urlparser either. I will do some tests without it and see what breaks.

Status: NEW → ASSIGNED

Jeremy M. Dolan

Comment 8

•

25 years ago

I'd have to second the beg that we convert \ to %5C, and not /. Allowing \ only encourages it, and what's worse, pages that use it will break on other OSes. Adding pp.

Keywords: pp

Comment 9

•

25 years ago

Yes, it's a real problem. NC 4.x supports \ and its conversion to /, and IE does it too. It's used in some pages. If we want to reach platform parity quickly it will go the other way around: we will convert \ to / for UNIX/Linux/Mac too. In the long run, it makes sense to remove that support and replace it with fallback support. First try it without conversion; if that works, all is okay. If not, and the string contains \, convert them to / and try again. But that is a massive undertaking, since we have to change every place where a relative or absolute URI string is used to create a URI and wrap it in the fallback code.
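
The fallback described here can be sketched as follows. This is a minimal illustration, not the actual docshell or urlparser code: `load_with_fallback` and the `try_load` callback are hypothetical names, and `example.test` is a placeholder host.

```python
from typing import Callable, Optional


def load_with_fallback(spec: str, try_load: Callable[[str], bool]) -> Optional[str]:
    """Return the spec that loaded, or None if every attempt failed.

    try_load is a stand-in for whatever actually fetches the resource.
    """
    # First attempt: use the string exactly as written.
    if try_load(spec):
        return spec
    # Second attempt: only if the string contains "\", retry with "/".
    if "\\" in spec:
        converted = spec.replace("\\", "/")
        if try_load(converted):
            return converted
    return None


# Toy "server" that only knows the forward-slash document.
known = {"http://example.test/rfc/compliance.html"}
print(load_with_fallback("http://example.test/rfc\\compliance.html", known.__contains__))
```

Because the unconverted string is tried first, a document whose name really contains \ still wins over the converted guess.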

Jeremy M. Dolan

Comment 10

•

25 years ago

Andreas suggests that all platforms convert \ to / now, and that it be phased out later. If we can't do it now, during the complete rewrite, it will never be done. More web page authors will start using \. Breaking the RFC will become the standard that all web browsers have to implement.

Comment 11

•

25 years ago

Gagan voted against removing the \-conversion code now, although he agrees that it is the right thing to do, including implementing some sort of "quirks" mode, maybe like the one I described above.

Comment 12

•

25 years ago

Another problem is that while Windows users are used to \ as a file separator, Unix users are used to \ being a shell escape. So in file: urls on Unix, you should be able to use "one\ name" to reference a directory or file with a space in the middle. Currently you have to use %20. Not saying that \ should be implemented this way, but I'm just giving another example of the problems with this situation.

Comment 13

•

25 years ago

Well, I don't think Unix users ought to be able to do that. If we assume shell escapes are resolved first, you still have an unescaped space character in the URL, which is invalid per the RFC. Of course, this is no greater non-compliance than converting '\' to '/' in the first place.

Comment 14

•

25 years ago

back to gagan for reassignment, my time schedule is getting worse, I have to stop sitting on this bug.

Assignee: andreas.otte → gagan

Status: ASSIGNED → NEW

Assignee

Updated

•

25 years ago

Target Milestone: M18 → Future

Updated

•

25 years ago

Blocks: 61999

Updated

•

25 years ago

Blocks: 63736

Comment 15

•

24 years ago

mass move, v2. qa to me.

QA Contact: tever → benc

Jeremy M. Dolan

Comment 16

•

24 years ago

Works on Linux... is this still broken in windows?

Comment 17

•

24 years ago

i think that's now bug 90383.

Depends on: 90383

Comment 18

•

24 years ago

The parser code for conversion of \ to / is still in there (nsURLHelper.cpp).

Comment 19

•

24 years ago

Yes, while Netscape 4.78 gets the "Your browser is compliant with RFC 2068." page in the testcase! Both on WindowsME (Mozilla trunk CVS build on 20010804).

Comment 20

•

24 years ago

Compliance with RFC 2396 2.4.3 means in this case: make the urlparser(!) completely ignorant of \ as a separator for directory structures. Treat it as what it is: an unwise character which should and will be escaped.

This involves:
- nsURLHelper/CoaleseDirs will no longer convert from \ to / on XP_PC.
- nsURLHelper/CoaleseDirs will ignore \ as a directory separator (all platforms).
- nsNoAuthUrlparser will no longer recognize \ as a possible directory separator on XP_PC used for drive detection.

Consequences: An embedded URL inside an HTML document is always treated as a URL; using \ as a directory separator in it will no longer work! Maybe docshell can do some urifixup if the first try to load the url fails.

"Doing the right thing" (TM) lies only in the hands of the file-system-specific conversion routines from and to file urls! This means:
- correct conversion/escaping of filesystem-specific characters which collide with url syntax. For example: filenames with /, : or something similar in them need to trigger special escaping of these chars before they become part of a file url.
- use nsIFile and its conversion routines everywhere and use them correctly.
- do the right thing with UNC filepaths.
- do escaping/unescaping in the right order and number.

Other stuff: GetFile and SetFile from nsStdURL should be moved into the file-system-specific parts of the implementation. Having it ifdefed inside nsStdURL is really bad style!
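
To make the second point concrete, here is a rough sketch of directory coalescing that treats only "/" as a separator. It is not the real CoaleseDirs from nsURLHelper; `coalesce_dirs` is a hypothetical name, and the sketch assumes an absolute path that starts with "/".

```python
def coalesce_dirs(path: str) -> str:
    """Resolve "." and ".." segments in an absolute path.

    Segments are split on "/" only, so "\\" is ordinary data on every platform.
    """
    segments = path.split("/")
    out = []
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg in (".", ".."):
            if seg == ".." and len(out) > 1:
                out.pop()  # drop the previous directory, never the root
            if is_last:
                out.append("")  # keep the trailing "/"
        else:
            out.append(seg)
    return "/".join(out)


print(coalesce_dirs("/a/b/../c"))    # ".." is resolved
print(coalesce_dirs("/a/..\\b/c"))   # "..\b" is one ordinary segment
```

In the second call nothing is coalesced, because "..\b" is a single segment name once \ stops being a separator.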

Summary: RFC 2616 $2.4.3 non-compliance? → RFC 2396 $2.4.3 non-compliance?

Comment 21

•

24 years ago

Attached patch: patch to remove conversion of \ to / from the urlparser (obsolete)

Comment 22

•

24 years ago

*** Bug 34239 has been marked as a duplicate of this bug. ***

Comment 23

•

24 years ago

Gagan, Judson? Anyone want to do a review?

Updated

•

24 years ago

Summary: RFC 2396 $2.4.3 non-compliance? → Converting \ to / in urls on windows only (was: RFC 2396 $2.4.3 non-compliance?)

Comment 24

•

24 years ago

To try to clarify the desired behavior: RFC 1738 lists "\" as an unsafe character, then says:

"All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding."

So we should encode the offending backslash, on all platforms. I had originally thought that it was a "reserve"-able character, which would make it URL scheme specific, and possibly legal in file URLs. I was wrong.

In regard to how this affects Windows users with file paths, I think that what we need is for entry points (like the command line parser and "Open URL" dialogs) to be path friendly and do conversion to file URLs automatically. We might also want to change the display of file URLs to a local path in the local formatting, but I think our internal handling of slashes should be compliant.

Comment 25

•

24 years ago

Yes, that's exactly right. We have functions that convert from local filepaths to urls and the other way around. These functions are aware of the filesystem-specific delimiters and special characters and convert them. If these functions are used and are correct, all is fine. We end up with a valid url or a valid filepath. What will get broken when this goes active are websites which have *urls* in their documents which use \ not as a normal char but as the path delimiter. This is clearly wrong and a case for tech evangelism. But you always have to deal with the argument: IE (Windows) can handle this, why can't you ...

Comment 26

•

24 years ago

Attached patch: new diff to prevent bitrott

Comment 27

•

24 years ago

I suggest we get this in early in the 0.9.5 cycle to get early response ...

Keywords: review

Jeremy M. Dolan

Comment 28

•

24 years ago

So what's the final fix? Convert typed in URLs with \ to /? What about href URLs, will clicking a link on MS vs Mac go to the same place?

Comment 29

•

24 years ago

No, with the patch we won't do that conversion on any platform anymore. A \ that is part of a *url* will be just an unwise char which will get escaped to %5C. On the other hand, if we get a \ as part of a *filepath* on Windows/OS2, we will convert it to / when doing a conversion from filepath to url. But that is part of the local filepath handling (and some uri fixup in docshell) and no longer has anything to do with the url parser.
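
The filepath side of that split can be illustrated as below. This is only a sketch under stated assumptions: `windows_path_to_file_url` is a hypothetical name rather than the nsIFile conversion code, it handles drive-letter paths only (no UNC), and it expects an unescaped path as input.

```python
from urllib.parse import quote


def windows_path_to_file_url(path: str) -> str:
    """Convert a Windows drive-letter path to a file URL.

    In a local path "\\" really is the separator, so it becomes "/" here.
    """
    # Keep "/" and the drive colon; percent-encode the rest (e.g. spaces).
    return "file:///" + quote(path.replace("\\", "/"), safe="/:")


print(windows_path_to_file_url("C:\\docs\\read me.txt"))
```

The same \ inside a string that is already a url never reaches this function; there it is escaped to %5C instead.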

Updated

•

24 years ago

Attachment #46676 - Attachment is obsolete: true

Doug Turner (:dougt)

Updated

•

24 years ago

Attachment #48700 - Flags: review+

Comment 30

•

24 years ago

Comment on attachment 48700: new diff to prevent bitrott

sr=darin ... who reviewed this?

Attachment #48700 - Flags: superreview+

Doug Turner (:dougt)

Comment 31

•

24 years ago

If you click on "View Bug Activity" you will see that I did. :-)

Comment 32

•

24 years ago

Fix checked in. Can be verified by using the test url http://www.undergrad.math.uwaterloo.ca/~dj3vande/rfc\compliance.html It will now verify compliance for mozilla on Windows too. On the other hand, we will probably see some more reports about non-working links which use \ instead of /. All these bugs can go straight to tech evangelism.

Status: NEW → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Comment 33

•

24 years ago

*** Bug 119457 has been marked as a duplicate of this bug. ***

Comment 34

•

23 years ago

*** Bug 150475 has been marked as a duplicate of this bug. ***

Comment 35

•

23 years ago

verified on 10/1/02 Win2000

Status: RESOLVED → VERIFIED

Comment 36

•

21 years ago

Another thread on the same topic: http://forums.mozillazine.org/viewtopic.php?p=629335#629335

Firefox should be tolerant as well, or at least provide a means to enable tolerancy (i.e. give users a choice between strict mode and tolerant mode). This strictness can only lead to bad press in comparisons. Even the open source Apache web server is tolerant (or can be configured to be tolerant) of URL and spelling mistakes (mod_speling). There's no reason Firefox can't be this way too.

Broken page: http://www.nywatertaxi.com/about.php
One bad press example due to strictness: http://computergripes.com/firefox.html
mod_speling: http://httpd.apache.org/docs-2.0/mod/mod_speling.html

Note: one way to work around Firefox and Mozilla's strictness might be to use a proxy that can rewrite the URLs, replacing the bad '\' with a good '/'. However, I still think this should be part of a robust browser tolerant of mistakes... perhaps like how bad javascript is handled (put a warning sign somewhere that the script isn't quite right, but still make a best effort at rendering it correctly).

Christian :Biesinger (don't email me, ping me on IRC)

Comment 37

•

21 years ago

(In reply to comment #36)
> Firefox should be tolerant as well, or at least provide a means to enable
> tolerancy

Why should Firefox assume that someone meant to type / when he typed \? Those are different characters, and can well be different files on the web server.

Comment 38

•

21 years ago

I agree with biesi here. I'm all for tolerance, but if we break an RFC in the process a line is crossed. That would happen here: we would break all those pages that have a \ as part of a file or directory name. There are not very many, but those pages exist.

Boris Zbarsky [:bzbarsky]

Comment 39

•

20 years ago

*** Bug 176918 has been marked as a duplicate of this bug. ***
