Closed Bug 7399 Opened 26 years ago Closed 26 years ago

Escaping illegal chars in URLs

Status:

VERIFIED DUPLICATE of bug 10373

People

(Reporter: hjtoi-bugzilla, Assigned: gagan)

References (URL)

Heikki Toivonen (remove -bugzilla when emailing directly)

Reporter

Description

26 years ago

Recently, http://www.biztalk.org had spaces in its links. They worked in IE and Opera, but not in Netscape or Gecko; the site later changed the spaces to underscores. At least in the XML world, the browser should escape ALL illegal characters in URLs (I read a mail about this today but can't remember which list it was on). So if a URL contains spaces, the browser should automatically escape them as %20. Gecko already understands escaped URLs; it is just a matter of doing the escaping. The URL field of this bug points to a document containing one link to a file with a space in its name. IE handles it fine; NS and Gecko fail.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 1

26 years ago

There are some problems with this:
1) Different URL RFCs have different ideas of what the illegal characters are.
2) Should the URL, as given in the document, already be legal? Is it the job of the browser to correct a URL when the correction might mess up the server? (What do current browsers do here?)
I think one may end up having to stick to tradition on this, but I'm not really sure what the URL RFCs say about correcting URLs. (When the site you mention above had spaces in links, was the whole URL in quotes? If not, the problem was with parsing.)

Heikki Toivonen (remove -bugzilla when emailing directly)

Reporter

Comment 2

26 years ago

It took some time to find where I had read that piece about illegal characters in URIs (note: _URI_). The URLs below should answer your questions. The discussion happened on XML-DEV; here is a link to the archive entry for the thread you should read:

http://www.lists.ic.ac.uk/hypermail/xml-dev/xml-dev-May-1999/0573.html

Here are some relevant URLs extracted from the discussion:

http://www.w3.org/TR/WD-charmod#URIs
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2

Hixie (not reading bugmail)

Comment 3

26 years ago

According to those last two links, which point to the last working draft of the W3C Character Model and HTML 4 section B.2.1 respectively, we should indeed be escaping URIs.
1) We should probably take the superset of the characters the various RFCs consider illegal. That way all bases are covered.
2) Yes, the URI in the document should indeed be legal. No, I would say that it is not our job to correct it. However, we should certainly not be sending invalid URIs to servers, so I suggest that encoding them would be best.
Currently, we are dropping spaces in URIs altogether (this happens somewhere in the content sink; see bug 8319). We should certainly not be doing this.
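A minimal Python sketch of the encode-rather-than-drop approach may help illustrate the idea. It is not Gecko code. The helper name `escape_uri` and the choice of "legal" characters are assumptions for illustration: the safe set used here is the RFC 3986 reserved set (a later standard than the RFCs discussed in this bug) plus `%`, so that escapes already present in the document are not double-encoded. Following HTML 4 section B.2.1, any other character is converted to UTF-8 and each byte is written as %HH, so a space becomes %20 instead of disappearing.

```python
from urllib.parse import quote

# Assumption: punctuation allowed unescaped in a URI is the RFC 3986
# reserved set plus "%" (so existing %HH escapes pass through untouched).
# quote() never escapes letters, digits, or the unreserved "-._~".
LEGAL_URI_PUNCTUATION = ":/?#[]@!$&'()*+,;=%"


def escape_uri(href: str) -> str:
    """Percent-encode characters that are illegal in a URI (hypothetical helper).

    quote() encodes str input as UTF-8 and writes every byte outside the
    safe set as %HH, matching the HTML 4 B.2.1 recommendation for
    non-ASCII characters. A space becomes %20 rather than being dropped.
    """
    return quote(href, safe=LEGAL_URI_PUNCTUATION)


if __name__ == "__main__":
    # prints http://www.example.com/my%20file.xml
    print(escape_uri("http://www.example.com/my file.xml"))
```

Leaving `%` in the safe set avoids double-encoding an already-escaped link, at the cost of passing through a stray `%` that is not followed by two hex digits; a stricter version would check for that case before deciding whether to escape it.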

Gagan

Assignee

Comment 4

26 years ago

Pushed past necko landing...

leger

Comment 5

26 years ago

Changing all Networking Library/Browser bugs to Networking-Core component for Browser. Occasionally, Bugzilla will burp and cause Verified bugs to reopen when I do this in a bulk change. If this happens, I will fix. ;-)

Gagan

Assignee

Updated

26 years ago

Status: ASSIGNED → RESOLVED

Closed: 26 years ago

Resolution: --- → DUPLICATE

Gagan

Assignee

Comment 6

26 years ago

*** This bug has been marked as a duplicate of 10373 ***

Paul MacQuiddy

Updated

26 years ago

Status: RESOLVED → VERIFIED

leger

Comment 7

26 years ago

Bulk move of all Networking-Core (to be deleted component) bugs to new Networking component.

Categories

(Core :: Networking, defect, P3)

Security

(public)