Closed Bug 49939 Opened 25 years ago Closed 25 years ago

encoding of non-ASCII URLs is sub-optimal

Categories

(Core :: Internationalization, defect, P3)

VERIFIED DUPLICATE of bug 43852

People

(Reporter: peter.williams, Assigned: nhottanscp)

Details

Mozilla uses (at least) two totally incompatible schemes to encode non-ASCII characters in URLs. One scheme is URL encoding (i.e. %hh) the ISO-8859-1 numeric value of the character. The other is, for non-Latin-1 characters, converting the character into its UTF-8 octet sequence and then URL encoding that sequence.

Having two incompatible methods for encoding text would be bad enough, but it appears that you can have Latin-1 characters and UTF-8 characters in the same URL. In either case there is no good way to determine which underlying encoding was used for any given octet sequence.

For instance, if a CGI script receives the following "%E3%81%8D%E3%81%BE+%E5%27&" (this is 2 Japanese characters, a space, a lowercase a-ring, an apostrophe, and an ampersand) as part of a query string, how do you decode this? Part of the URL is UTF-8, part of it is ISO-8859-1, and there is no way to tell the difference (note that 229, the first of the last three octets 229, 39, 38 after URL decoding, is a perfectly valid lead octet for a three-octet UTF-8 character, so a decoder that does not check continuation octets will swallow the apostrophe and ampersand).

Of course the above example is a bit unlikely (having Japanese and Swedish chars in the same form results). But it is not unreasonable to expect a mix of, say, German and Polish or Italian and Greek, which I suspect would work the same way (Polish and Greek would be UTF-8ed; German and Italian would be ISO-8859-1ed).

A much better approach would be to UTF-8 all non-ASCII characters so that applications could reliably decode URL-encoded data from Mozilla. I know that the way Mozilla currently works is a pretty common approach, but it is pure brain-damage.
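For anyone wanting to reproduce the decoding problem on the receiving side, here is a minimal sketch in Python using only the standard library. It is not Mozilla code; the helper names (`percent_decode`, `is_strict_utf8`) and the hand-picked split index are assumptions made for illustration. It decodes the fragment from the report as ISO-8859-1, as strict UTF-8, and as the mix the sender apparently intended, then shows what the same text looks like when every character is UTF-8 encoded before escaping.

```python
from urllib.parse import quote_plus, unquote_plus, unquote_to_bytes

# Query-string fragment quoted in the report.
RAW = "%E3%81%8D%E3%81%BE+%E5%27&"


def percent_decode(fragment: str) -> bytes:
    """Undo %hh escapes and form-style '+' spaces, yielding raw octets."""
    return unquote_to_bytes(fragment.replace("+", " "))


def is_strict_utf8(octets: bytes) -> bool:
    """Return True if the octets form a well-formed UTF-8 sequence."""
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


octets = percent_decode(RAW)

# ISO-8859-1 maps every octet to a character, so this decode never fails;
# it silently garbles the two Japanese characters instead.
as_latin1 = octets.decode("iso-8859-1")

# A strict UTF-8 decoder rejects the fragment as a whole, because 0xE5 is
# not followed by continuation octets.
strict_ok = is_strict_utf8(octets)

# Only out-of-band knowledge of where the encoding switches recovers the
# text. The split index 7 is hand-picked for this one example.
mixed = octets[:7].decode("utf-8") + octets[7:].decode("iso-8859-1")

# Octets that are well-formed in both encodings cannot be told apart at all.
ambiguous = b"\xc3\xa9"
print(ambiguous.decode("utf-8"), ambiguous.decode("iso-8859-1"))

# The proposal in this report: UTF-8 encode every character, then escape.
uniform = quote_plus(mixed, encoding="utf-8")
print(uniform)
```

A strict validator happens to catch this particular fragment, but the `ambiguous` case shows why validation alone is not a general answer: two Latin-1 characters can form a well-formed UTF-8 sequence, so the receiver still has to guess. With the uniform scheme, `unquote_plus(uniform)` returns the original text without any guessing.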
A similar bug has been filed (bug 43852).
So is this a duplicate of bug 43852 ("Send URLs as UTF-8" not working)?
I'm going to confirm this bug, but at the same time we should subsume it under bug 43852. That bug discusses the overall framework under which we would be dealing with non-ASCII URLs. The author of this bug should participate in that bug.
Status: UNCONFIRMED → NEW
Ever confirmed: true
*** This bug has been marked as a duplicate of 43852 ***
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → DUPLICATE
Verified as dup.
Status: RESOLVED → VERIFIED