Closed Bug 7399 Opened 21 years ago Closed 21 years ago

Escaping illegal chars in URLs

Categories

(Core :: Networking, defect, P3)

x86
Windows NT
defect

VERIFIED DUPLICATE of bug 10373

People

(Reporter: hjtoi-bugzilla, Assigned: gagan)

Details

Recently, links on http://www.biztalk.org contained spaces. They worked in IE
and Opera, but not in Netscape or Gecko. The site later changed the spaces to
underscores.

In the XML world, at least, the browser should escape ALL illegal characters in
URLs (I just read a mail about that today, but can't remember which list it was
on). So if there are spaces in URLs, the browser should automatically escape
them as %20. Gecko understands escaped URLs; it is just a matter of doing the
escaping...
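
To illustrate the kind of escaping proposed here, the following is a minimal
Python sketch, not Gecko code. The function name escape_url, the choice of
"safe" characters, and the example.org addresses are assumptions made for the
example; it relies only on the standard library's urllib.parse.quote.

```python
from urllib.parse import quote

# Characters left untouched: URI delimiters plus "%", so that escapes
# already present in the link (such as %20) are not escaped a second time.
# This set is an assumption for the sketch, not taken from the bug.
SAFE = ":/?#[]@!$&'()*+,;=%"


def escape_url(url: str) -> str:
    """Percent-escape spaces and other characters outside SAFE."""
    return quote(url, safe=SAFE)


print(escape_url("http://example.org/my file.html"))
```

A link that is already escaped passes through unchanged, which is the
behaviour a browser would want when a page author has done the escaping.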

The document at this bug's URL contains one link that points to a file with a
space in its name. IE handles that fine; NS and Gecko fail.

There are some problems with this:

1) Different URL RFCs have different ideas of which characters are illegal.
2) Should the URL as given in the document already be legal?  Is it the job of
the browser to correct a URL when the correction might mess up the server?
(What do current browsers do here?)

I think one may end up having to stick to tradition on this, but I'm not really
sure what the URL RFCs say about correcting URLs.

(When the site you mention above had spaces in links, was the whole thing in
quotes?  If not, then the problem was with parsing.)

It took some time to find where I had read that piece about illegal characters
in URIs (note, _URI_). The URLs below should answer your questions.

The discussion happened on XML-DEV. Here is a link to the archive and the
thread you should read:

http://www.lists.ic.ac.uk/hypermail/xml-dev/xml-dev-May-1999/0573.html

Here are some extracted relevant URLs from the discussion:

http://www.w3.org/TR/WD-charmod#URIs
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2

According to those last two links, which point to the last working draft of
the W3C Character Model and HTML 4 section B.2.1 respectively, we should
indeed be escaping URIs.
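
For reference, HTML 4 section B.2.1 recommends handling non-ASCII characters
in URI attribute values by representing each one in UTF-8 and escaping every
resulting byte as %HH. A minimal Python sketch of that step follows; the
function name escape_non_ascii and the example.org address are hypothetical.

```python
def escape_non_ascii(uri: str) -> str:
    """Escape each non-ASCII character as the %HH form of its UTF-8 bytes."""
    out = []
    for ch in uri:
        if ord(ch) < 128:
            # ASCII characters are passed through unchanged in this sketch.
            out.append(ch)
        else:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(escape_non_ascii("http://example.org/caf\u00e9"))
```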

1) We should probably take the superset. That way all bases are covered.
2) Yes, the URI in the document should indeed be legal. No, I would say that it
   is not our job to correct it. However, we should certainly not be sending
   invalid URIs to servers, so I suggest encoding would be best.
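
One way to picture the "superset" approach is sketched below in Python. The
ILLEGAL set is an illustrative assumption, not a list copied from any one
RFC, and encode_illegal is a hypothetical name. The sketch leaves well-formed
%HH escapes alone, so a URI that is already legal is sent unchanged, and it
escapes a stray "%" as %25.

```python
import re

# Assumed superset of characters to escape (illustrative only).
ILLEGAL = set(' <>"{}|\\^`')

# A well-formed escape sequence: "%" followed by two hex digits.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def encode_illegal(uri: str) -> str:
    """Escape illegal characters without touching existing escapes."""
    out = []
    i = 0
    while i < len(uri):
        ch = uri[i]
        if ch == "%":
            if _ESCAPE.match(uri, i):
                # Already escaped: copy the three characters as they are.
                out.append(uri[i:i + 3])
                i += 3
                continue
            out.append("%25")  # stray percent sign
        elif ch in ILLEGAL or ord(ch) < 0x20 or ord(ch) >= 0x7F:
            # Controls, non-ASCII, and listed characters become %HH bytes.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(encode_illegal("http://example.org/a b%20c|d"))
```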

Currently, we are dropping spaces in URIs altogether (this happens somewhere in
the content sink, see bug 8319). We should certainly not be doing this.

Pushed past necko landing...

Changing all Networking Library/Browser bugs to Networking-Core component for
Browser.

Occasionally, Bugzilla will burp and cause Verified bugs to reopen when I do
this in a bulk change.  If this happens, I will fix. ;-)

Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE

*** This bug has been marked as a duplicate of 10373 ***

Status: RESOLVED → VERIFIED

Bulk move of all Networking-Core (to be deleted component) bugs to new
Networking component.