Closed
Bug 329202
Opened 18 years ago
Closed 17 years ago
The URL bar encodes non-Latin URLs as $LANG/hex instead of UTF8/hex
Categories
(Firefox :: Address Bar, defect)
RESOLVED
INVALID
People
(Reporter: ilatypov, Unassigned)
Details
(Whiteboard: CLOSEME 07/24)
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060209 Debian/1.5.dfsg+1.5.0.1-2 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060209 Debian/1.5.dfsg+1.5.0.1-2 Firefox/1.5.0.1

This issue is about the way Firefox automatically translates Unicode symbols typed into the address bar by the user as part of the URL path or query.

The HTML standard recommends that browsers represent the user input as a sequence of Unicode symbols and encode it with UTF-8, then HEX (%XX escapes), before sending the HTTP GET request. [The reason for this recommendation is that the browser is unaware of the encoding preferred by the server or used by the page.]

Pasting the Unicode Cyrillic (U+0400..U+04FF) link generated below into Firefox's address bar and hitting Enter unexpectedly transforms it to the 8-bit (0..0xff) Cyrillic character set KOI8-R before encoding it with UTF-8/HEX. My environment variable LANG is set to "ru_RU.KOI8-R".

The second link uses Latin (U+0000..U+00FF) characters only. It is encoded correctly. Perhaps the unexpected transformation of Unicode symbols to the 8-bit character set specified in $LANG (KOI8-R) happens to leave these characters unchanged.

Curiously, clicking a Unicode Cyrillic link inside a UTF-8 web page works as expected, i.e., the Unicode symbols are encoded directly to UTF-8/HEX.

function decode_utf_hex() {
    UTFHEX="$1"
    decoded=$(echo -n "$UTFHEX" \
        | perl -pe 's/%([a-fA-F0-9]{2,2})/chr(hex($1))/eg;')
    url="http://en.wikipedia.org/wiki/$decoded"
    echo "$url" > /tmp/f
    gedit --encoding=utf-8 /tmp/f
}

# "kirillitsa" in U+0400..U+04FF
decode_utf_hex "%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0"

# "Re'sume'" in U+0000..U+00FF
decode_utf_hex "R%C3%A9sum%C3%A9"

-----------------
Note 1. Using Wikipedia here is just a convenience.
Note 2. There isn't any assumed conversion/redirection of the link on the server side.
Note 3. I am intentionally avoiding any Unicode symbols in the text of this bug because this Bugzilla's server character set isn't UTF-8.

Reproducible: Always
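For reference, the encoding the report expects (Unicode text to UTF-8 bytes to %XX escapes) can be sketched in Python. This is an illustration only, not Firefox code: `urllib.parse.quote` stands in for the browser's escaping step, and the helper name `expected_wiki_url` is hypothetical.

```python
from urllib.parse import quote


def expected_wiki_url(title: str) -> str:
    """Build the URL the report expects: UTF-8 bytes, then %XX escapes."""
    # quote() encodes the text as UTF-8 by default and hex-escapes
    # every byte that is not an unreserved ASCII character.
    return "http://en.wikipedia.org/wiki/" + quote(title, safe="")


if __name__ == "__main__":
    print(expected_wiki_url("кириллица"))
    print(expected_wiki_url("Résumé"))
```

The two printed URLs should carry the same escape sequences that the report passes to decode_utf_hex.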
Comment 1 • 18 years ago (Reporter)
The URL expected in the address bar:

http://en.wikipedia.org/wiki/%D0%9A%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0

The URL actually observed:

http://en.wikipedia.org/wiki/%C3%8B%C3%89%C3%92%C3%89%C3%8C%C3%8C%C3%89%C3%83%C3%81

Note. The bug occurs only on the Linux platform. On Windows, the Unicode URLs are encoded correctly, i.e. directly to UTF-8/HEX.
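The observed escape sequence is consistent with one possible explanation: the typed text is converted to KOI8-R bytes, each byte is then treated as a Latin-1 code point, and the result is encoded as UTF-8/HEX. The Python sketch below reproduces the observed sequence under that assumption. It does not claim to show Firefox's actual code path, and `mangled_escape` is a hypothetical name.

```python
from urllib.parse import quote


def mangled_escape(text: str, locale_charset: str = "koi8-r") -> str:
    """Reproduce the observed escapes under an assumed double conversion."""
    # Step 1: convert the Unicode text to the 8-bit charset from $LANG.
    legacy_bytes = text.encode(locale_charset)
    # Step 2 (assumption): each legacy byte is misread as a Latin-1 code point.
    misread = legacy_bytes.decode("latin-1")
    # Step 3: encode the misread text as UTF-8 and hex-escape it.
    return quote(misread, safe="")


if __name__ == "__main__":
    print(mangled_escape("кириллица"))
```

Each Cyrillic letter ends up as a two-byte %C3%XX pair, which is what UTF-8 produces for code points in the U+00C0..U+00FF range.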
Comment 2 • 18 years ago (Reporter)
(In reply to comment #1)
> The URL expected in the address bar:
>
> http://en.wikipedia.org/wiki/%D0%9A%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0

Here Wikipedia changes the first letter to upper case via an HTTP redirect, thus contaminating the experiment. Firefox's expected role here is to encode the Unicode URL to UTF-8/HEX, producing the value passed to decode_utf_hex:

http://en.wikipedia.org/wiki/%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0
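The difference between the two URLs can be checked by decoding both escape sequences. The following is a minimal Python sketch, with `urllib.parse.unquote` standing in for the perl one-liner in the description; the helper name `decode_title` is hypothetical.

```python
from urllib.parse import unquote

# Escape sequence after Wikipedia's redirect (quoted from comment 1).
REDIRECTED = "%D0%9A%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0"
# Escape sequence originally passed to decode_utf_hex.
SUBMITTED = "%D0%BA%D0%B8%D1%80%D0%B8%D0%BB%D0%BB%D0%B8%D1%86%D0%B0"


def decode_title(escaped: str) -> str:
    """Turn %XX escapes back into text, reading the bytes as UTF-8."""
    return unquote(escaped, encoding="utf-8")


if __name__ == "__main__":
    print(decode_title(REDIRECTED))
    print(decode_title(SUBMITTED))
```

The two decoded titles differ only in the case of the first letter (%D0%9A versus %D0%BA), which matches the redirect described in this comment.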
Comment 3 • 18 years ago
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9a1) Gecko/20060303 Firefox/1.6a1 ID:0000000000

Works fine for me, but my LANG environment variable is en_GB.UTF-8.
Comment 4 • 17 years ago
Ilguiz, are you able to reproduce this with a current 2.x, 3.x, or trunk build? Thanks.
Whiteboard: CLOSEME 2007-07-24
Comment 5 • 17 years ago (Reporter)
Yes, the bug remains. I am using Iceweasel 2.0.0.4 with Gecko/20070508 from Debian unstable. I did not test Firefox 3, though.

Just clicking the correct non-Latin link above brings me to the expected Wikipedia page. When I copy the name of the page from its contents, paste it over the UTF-8/hex-encoded part of the URL, and hit Enter, I am brought to another page with a garbled link.

I don't think this is because of Wikipedia's server-side rewriting of URLs. See, for example, a Usemod wiki page:

http://ei.homeip.net/wiki?кириллица

Copy the name of the page, paste it into the URL in place of the UTF-8/hex-encoded part, and hit Enter to reproduce the issue.
Comment 6 • 17 years ago (Reporter)
To sum up, Firefox 2.0 sends the LANG hex encoding of a Unicode URL instead of UTF hex encoding.
Updated • 17 years ago (Reporter)
Summary: Unexpected Unicode-to-$LANG transformation when typing in a non-Latin path or query. → The URL bar encodes non-Latin URLs as $LANG/hex instead of UTF8/hex
Comment 7 • 17 years ago
http://localhost/тест.txt

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.5pre) Gecko/20070703 BonEcho/2.0.0.5pre
en_GB.UTF-8: http://localhost/%D1%82%D0%B5%D1%81%D1%82.txt
ru_RU.KOI8-R: http://localhost/%D4%C5%D3%D4.txt (and the file cannot be loaded)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a7pre) Gecko/20070703 Minefield/3.0a7pre
en_GB.UTF-8: http://localhost/%D1%82%D0%B5%D1%81%D1%82.txt
ru_RU.KOI8-R: http://localhost/%D1%82%D0%B5%D1%81%D1%82.txt

That's because the default for network.standard-url.encode-utf8 is "false" on the former and "true" on the latter. This bug is either INVALID or a DUPLICATE of a bug that caused the default change.
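The two escape sequences reported in this comment correspond to percent-encoding the same file name from two different byte encodings. Here is a minimal Python sketch, assuming `urllib.parse.quote` as a stand-in for the browser's escaping step (the helper name `escape_path` is hypothetical and this is not Firefox code):

```python
from urllib.parse import quote


def escape_path(path: str, charset: str) -> str:
    """Percent-encode path after converting it to bytes in charset."""
    # The encoding argument selects which byte sequence gets hex-escaped.
    return quote(path, safe="/", encoding=charset)


if __name__ == "__main__":
    for charset in ("utf-8", "koi8-r"):
        print(charset, "http://localhost/" + escape_path("тест.txt", charset))
```

With "utf-8" every Cyrillic letter becomes two escaped bytes; with "koi8-r" it becomes one, so a server that stores the file name in UTF-8 will not find a match for the KOI8-R form.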
Version: unspecified → 2.0 Branch
Updated • 17 years ago (Reporter)
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago
Resolution: --- → DUPLICATE
Comment 9 • 17 years ago
No, it's not a duplicate of bug 105909.
1. That bug doesn't cause Error 404.
2. See comment #37 there about the pref.
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
Comment 10 • 17 years ago (Reporter)
I would think that the reason the file could not be loaded was the default value of the option you pointed me to,
> network.standard-url.encode-utf8=false
I understand that the change of the above option's default value to true fixes my problem. (I don't even see a reason to keep this option available for modifications).
Correct me if I am wrong.
Updated • 17 years ago
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → INVALID
Whiteboard: CLOSEME 2007-07-24 → CLOSEME 07/24