Closed
Bug 49939
Opened 25 years ago
Closed 25 years ago
encoding of non-ASCII URLs is sub-optimal
Categories
(Core :: Internationalization, defect, P3)
People
(Reporter: peter.williams, Assigned: nhottanscp)
Details
Mozilla uses (at least) two totally incompatible schemes to encode non-ASCII characters in URLs. One scheme is to URL-encode (i.e. %hh) the ISO-8859-1 numeric value of the character. The other, used for non-Latin-1 characters, is to convert the character into its UTF-8 octet sequence and then URL-encode that sequence. Having two incompatible methods for encoding text would be bad enough, but it appears that you can have Latin-1 characters and UTF-8 characters in the same URL. In either case there is no good way to determine which underlying encoding was used for any given octet sequence.
For instance, if a CGI script receives the following "%E3%81%8D%E3%81%BE+%E5%27&" (this is 2 Japanese characters, a space, a lowercase a-ring, an apostrophe, and an ampersand) as part of a query string, how do you decode this? Part of the URL is UTF-8, part of it is ISO-8859-1, and there is no way to tell the difference (note that the octet sequence 229,39,38, the last three octets after URL decoding, begins with a valid UTF-8 lead octet, so a lenient decoder that does not check continuation octets would treat it as a single UTF-8 character).
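The mixed behaviour described above can be modelled in a few lines of Python. This is only a sketch: `mixed_quote` is a hypothetical helper, not Mozilla code, and it assumes the per-character rule the report describes (ISO-8859-1 where the character fits, UTF-8 otherwise). It reproduces the query string from the example and shows two different inputs that collapse to the same encoded octets.

```python
from urllib.parse import quote_plus


def mixed_quote(text: str) -> str:
    """Hypothetical model of the reported behaviour, not Mozilla's code.

    Each character is form-encoded as ISO-8859-1 when it fits in Latin-1,
    and as UTF-8 otherwise.
    """
    out = []
    for ch in text:
        try:
            out.append(quote_plus(ch, encoding="iso-8859-1"))
        except UnicodeEncodeError:
            # Character is outside Latin-1, so fall back to UTF-8 octets.
            out.append(quote_plus(ch, encoding="utf-8"))
    return "".join(out)


# The data from the report: two Japanese characters, a space,
# a lowercase a-ring and an apostrophe.
print(mixed_quote("きま å'"))   # %E3%81%8D%E3%81%BE+%E5%27

# Greek alpha (sent as UTF-8) and the Latin-1 pair "Î±" (sent as
# ISO-8859-1) produce identical octets, so the receiver cannot tell
# which one was typed.
print(mixed_quote("α"))    # %CE%B1
print(mixed_quote("Î±"))   # %CE%B1
```

Because the last two calls return the same string, no decoder on the server side can recover the original text from the octets alone.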
Of course the above example is a bit unlikely (having Japanese and Swedish characters in the same form results). But it is not unreasonable to expect a mix of, say, German and Polish or Italian and Greek, which I suspect would work the same way (Polish and Greek would be UTF-8ed; German and Italian would be ISO-8859-1ed).
A much better approach would be to UTF-8 encode all non-ASCII characters so that applications could reliably decode URL-encoded data from Mozilla. I know that the way Mozilla currently works is a pretty common approach, but it is pure brain-damage.
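As a rough illustration of the proposal, the sketch below encodes every character as UTF-8 before percent-encoding and decodes with the same single rule. The helper names `utf8_quote` and `utf8_unquote` are made up for this example; only the standard-library `urllib.parse` functions are real.

```python
from urllib.parse import quote_plus, unquote_plus


def utf8_quote(text: str) -> str:
    """Form-encode text using UTF-8 for every non-ASCII character."""
    return quote_plus(text, encoding="utf-8")


def utf8_unquote(encoded: str) -> str:
    """Decode form-encoded text, assuming the sender always used UTF-8."""
    # errors="strict" raises UnicodeDecodeError on octets that are not
    # valid UTF-8 instead of silently substituting characters.
    return unquote_plus(encoded, encoding="utf-8", errors="strict")


original = "きま å'&"
encoded = utf8_quote(original)
print(encoded)                 # %E3%81%8D%E3%81%BE+%C3%A5%27%26
print(utf8_unquote(encoded))   # きま å'&
```

With one encoding on the wire, the a-ring is sent as `%C3%A5` rather than `%E5`, and the receiving application needs no guesswork to round-trip the text.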
Comment 3•25 years ago
I'm going to confirm this bug but at the same time we should
subsume this bug under Bug 43852. That bug discusses
the overall framework under which we would be dealing with
non-ASCII URLs. The author of this bug should participate
in that bug.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 4•25 years ago
*** This bug has been marked as a duplicate of 43852 ***
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → DUPLICATE