Closed
Bug 1477983
Opened 7 years ago
Closed 6 years ago
Local utf-8 page and characters shown in Western Encoding
Categories
(Core :: Internationalization, defect, P3)
RESOLVED
DUPLICATE
of bug 1071816
People
(Reporter: t20, Unassigned)
Details
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0
Build ID: 20180704003137
Steps to reproduce:
1. Create local page containing
<!DOCTYPE HTML>
and
<meta http-equiv="content-type" content="text/html;charset=utf-8">
in the HEAD.
2. Open this page in Firefox.
3. Use Firefox > Web developer > Network to verify that Response Header contains
Content-Type text/html; charset=UTF-8
Actual results:
1. Firefox > View > Text encoding either shows Western encoding (does this mean iso-8859-1 ? ANSI ? Windows Code Page 1252 ?) or it shows Unicode encoding (what does this mean?) or it is a disabled menu item. Any of these three results seem to occur randomly.
2. In any case, regardless of the reported encoding, the string "TEST™®©" (TEST followed by tm symbol, Registered symbol, and Copyright symbol) is displayed as "TEST���" (TEST followed by three diamonds each containing a question mark).
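One possible reading of result 2, offered as an assumption rather than something this report confirms: three replacement characters are what a UTF-8 decoder produces when the file on disk was actually saved as Windows-1252. The opposite mismatch (UTF-8 bytes read as Windows-1252) produces a different kind of garbage. A minimal Python sketch of both cases:

```python
# Hypothetical illustration; the reporter's actual file was never attached.
text = "TEST\u2122\u00ae\u00a9"  # TEST followed by TM, Registered, Copyright

# Case A: the editor saved the file as Windows-1252, the browser decodes as UTF-8.
cp1252_bytes = text.encode("cp1252")  # one byte per symbol
as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

# Case B: the file really is UTF-8, but it is decoded as Windows-1252.
utf8_bytes = text.encode("utf-8")  # two or three bytes per symbol
as_cp1252 = utf8_bytes.decode("cp1252")

print(repr(as_utf8))    # each invalid byte becomes U+FFFD
print(repr(as_cp1252))  # each symbol becomes two or three Latin characters
```

Case A matches the "three diamonds" symptom; case B would instead show sequences beginning with "â" or "Â".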
Expected results:
1. The selected character encoding should have been UTF-8, not "Western" or "Unicode".
2. The UTF-8 characters should have been displayed correctly.
Comments:
1. Opening the original file using Notepad in utf-8 mode erroneously shows all the lines run together (possible incorrect handling of \n character), but does show the string TEST™®© correctly.
2. I am guessing that Firefox is not fully compliant with the modern standard utf-8 encoding. I see no reason why such compliance should be difficult to implement or verify.
3. A similar problem is seen in the Thunderbird email client.
4. It is possible that this problem is caused by some local encoding error that I have not been able to find.
Environment: HP laptop, Windows 10 Home (64), Firefox 61.0.1 (64), Thunderbird 52.9.1 (32), PHP 7.0.30.
Comment 1•7 years ago (Reporter)
Further investigation raises doubt that the same problem is happening in Thunderbird. I was sending the emails in iso-8859-1 encoding. However, the problem reported here in Firefox appears unaffected by my error.
Comment 2•7 years ago
The Text Encoding menu is off by default, and whether it is shown depends on the localized build. So P3.
Priority: -- → P3
Comment 3•7 years ago
"Unicode" in the Firefox encoding menu means UTF-8. Someone among the Firefox developers considered the term "UTF-8" too technical for average users.
(In reply to David Spector from comment #0)
> 1. Firefox > View > Text encoding either shows Western encoding (does this
> mean iso-8859-1 ? ANSI ? Windows Code Page 1252 ?) or it shows Unicode
> encoding (what does this mean?) or it is a disabled menu item. Any of these
> three results seem to occur randomly.
Does it happen with the same file? If so, could you attach a testcase? I can't reproduce the issue.
Comment 4•7 years ago (Reporter)
Reply to Comment #2: Encoding menus should ALWAYS be on, in my opinion, so users can correct for mistaken choices in encodings, which DO happen.
General comment: UTF-8 is the FUTURE. It allows whatever degree of encoding complexity needed to represent alphabetic and other needed glyphs on web pages. If there are problems, they should be resolved to assume/force UTF-8, always allowing the user to override this easily to accommodate the current years of transition. (In fact, UTF-8 is highly unfair to most of the world. It puts English in the one-byte range, which caters to and encourages the obsolete handling of one character in one byte still assumed by conservative tools like PHP. Programming languages should orient their string handling to a new data type, character (and char ptr where relevant), instead of ever allowing the handling of characters as bytes.)
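To make the byte-versus-character point concrete, here is a small Python sketch. The sample characters are chosen only for illustration. It shows how many bytes UTF-8 uses per character and how a byte count differs from a character count:

```python
# Sample characters chosen for illustration only.
samples = ["A", "\u00e9", "\u2122", "\U0001F600"]  # A, e-acute, TM sign, an emoji

# Number of UTF-8 bytes needed for each single character.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

word = "na\u00efve"                       # "naive" with a diaeresis on the i
char_count = len(word)                    # Python str counts code points
byte_count = len(word.encode("utf-8"))    # a byte-oriented tool sees this

print(lengths)
print(char_count, byte_count)
```

Only the ASCII letter fits in one byte; a tool that treats one byte as one character would miscount the length of the second string.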
Reply to Comment #3: Calling an encoding "Western" or "Unicode" is not needed, because ignorance should not be and need not be encouraged. Firefox stands out as one of the few tools that exposes the dirty truth of encoding in such a poor way. Why not start a project to change these words and the programming behind them to something more precise and helpful and consistent with standards? And, during this transition, why not scan strings a little and optionally switch automatically to a more appropriate encoding, irrespective of what the page header claims should be used? No browser that shows "garbage characters" is user-friendly.
Also to Comment #3: The request for a test case is reasonable, and I will work on it soon.
Updated•6 years ago
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
Comment 6•6 years ago
<meta http-equiv="content-type" content="text/html;charset=utf-8">
This should have been enough to trigger UTF-8 even in Firefox 61. Reporter, are you sure that your test file contained the above tag and not something different and typoed? If it contained precisely that tag, I'd still be interested in seeing a test case.
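A rough way to check a candidate test file before attaching it is sketched below in Python. The function name `inspect_page`, the simplified regular expression, and the sample bytes are all hypothetical; this is not Firefox's detection code, and the regex is only a loose stand-in for the HTML standard's prescan, which examines the first 1024 bytes of the file.

```python
import re

# Loose approximation of a charset declaration; not the real prescan algorithm.
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE)

def inspect_page(data: bytes) -> dict:
    """Report the declared charset and whether the bytes are valid UTF-8."""
    has_bom = data.startswith(b"\xef\xbb\xbf")
    match = META_CHARSET.search(data[:1024])  # only the start of the file is scanned
    declared = match.group(1).decode("ascii").lower() if match else None
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"bom": has_bom, "declared": declared, "valid_utf8": valid_utf8}

# Hypothetical file: declares UTF-8 but was saved as Windows-1252.
sample = (
    b'<!DOCTYPE HTML><html><head>'
    b'<meta http-equiv="content-type" content="text/html;charset=utf-8">'
    b'</head><body>' + "TEST\u2122\u00ae\u00a9".encode("cp1252") + b'</body></html>'
)
report = inspect_page(sample)
print(report)
```

If `declared` is `utf-8` but `valid_utf8` is false, the tag is correct and the file's bytes are not, which would point to the editor's save encoding rather than the browser.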