Closed
Bug 562096
Opened 15 years ago
Closed 11 years ago
Support charset aliasing per Encoding Standard
Categories
(Core :: Internationalization, defect)
RESOLVED DUPLICATE of bug 801402
People
(Reporter: jshin1987, Assigned: smontagu)
References
(Blocks 1 open bug)
Attachments
(1 file, 1 obsolete file)
5.22 KB, text/plain
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html
has the following:
-------------------
When a user agent would otherwise use an encoding given in the first column of the following table to either convert content to Unicode characters or convert Unicode characters to bytes, it must instead use the encoding given in the cell in the second column of the same row. When a byte or sequence of bytes is treated differently due to this encoding aliasing, it is said to have been misinterpreted for compatibility.
Character encoding overrides
Input encoding    Replacement encoding    References
EUC-KR            windows-949             [EUCKR] [WIN949]
GB2312            GBK                     [RFC1345] [GBK]
GB_2312-80        GBK                     [RFC1345] [GBK]
ISO-8859-1        windows-1252            [RFC1345] [WIN1252]
ISO-8859-9        windows-1254            [RFC1345] [WIN1254]
ISO-8859-11       windows-874             [ISO885911] [WIN874]
KS_C_5601-1987    windows-949             [RFC1345] [WIN949]
Shift_JIS         Windows-31J             [SHIFTJIS] [WIN31J]
TIS-620           windows-874             [TIS620] [WIN874]
US-ASCII          windows-1252            [RFC1345] [WIN1252]
--------------------
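For illustration only, the override table quoted above can be read as a plain lookup applied before a converter is chosen. The following is a minimal Python sketch, not Gecko code; the names ENCODING_OVERRIDES and effective_encoding are hypothetical, and lowercasing the label before lookup is an assumption.

```python
# Hypothetical sketch of the quoted override table (not Gecko's implementation).
# Keys are lowercased input labels; values are the replacement encodings.
ENCODING_OVERRIDES = {
    "euc-kr": "windows-949",
    "gb2312": "gbk",
    "gb_2312-80": "gbk",
    "iso-8859-1": "windows-1252",
    "iso-8859-9": "windows-1254",
    "iso-8859-11": "windows-874",
    "ks_c_5601-1987": "windows-949",
    "shift_jis": "windows-31j",
    "tis-620": "windows-874",
    "us-ascii": "windows-1252",
}


def effective_encoding(label: str) -> str:
    """Return the encoding to actually use for a declared label.

    Labels not listed in the table are returned unchanged (lowercased).
    """
    key = label.strip().lower()
    return ENCODING_OVERRIDES.get(key, key)
```

Because the same lookup would be used for decoding and encoding, a converter for an input encoding in the first column is never selected directly.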
We already do some of the above aliasing (e.g. ISO-8859-1 -> windows-1252), but not all of it. For the Korean-specific issue, see bug 562091.
Because HTML5 stipulates that we do the above aliasing in both directions, we can get rid of some of the converters to save some space.
I tried to find an existing bug on this but couldn't find one. If it's already filed/resolved on the trunk, please accept my apologies.
Comment 1•13 years ago
This section would be superseded by the Encoding Standard.
Summary: Support charset aliasing per HTML5 → Support charset aliasing per Encoding Standard
Comment 2•13 years ago
This charsetalias.properties will have the following effects.
A. The following encodings will no longer be available.
A.1 XSS vulnerable encodings:
x-mac-arabic, x-mac-farsi, x-mac-hebrew, x-imap4-modified-utf7,
UTF-7, T.61-8bit
A.2 IBM encodings other than ibm864 and ibm866
IBM850, IBM852, IBM855, IBM857, IBM862, IBM864i
A.3 Mac encodings other than macintosh (MacRoman)
x-mac-ce, x-mac-croatian, x-mac-devanagari, x-mac-greek, x-mac-gujarati,
x-mac-gurmukhi, x-mac-icelandic, x-mac-romanian, x-mac-turkish
A.4 Vietnamese encodings
x-viet-tcvn5712, x-viet-vps, VISCII
A.5 Others
x-euc-tw, armscii-8, x-johab, x-user-defined, ISO-IR-111, ISO-2022-CN,
ISO-8859-6-E, ISO-8859-6-I, ISO-8859-8-E, ISO-8859-8-I
B. The following aliases will be removed.
UTF-16BE: csunicode11, csunicode, csunicodeascii, csunicodelatin1,
iso-10646-j-1, iso-10646-ucs-2, iso-10646-ucs-basic,
iso-10646-unicode-latin1, iso-10646, x-iso-10646-ucs-2-be,
x-iso-10646-ucs-2-le
IBM864: 864, csibm864, ibm-864
IBM866: 866, csibm866, cp-866
windows-1250: cp1250
windows-1251: cp1251, ansi-1251
windows-1252: cp1252, x-cp1252
windows-1253: x-cp1253
windows-1254: cp1254, x-cp1254
windows-1255: x-cp1255
windows-1256: x-cp1256
windows-1257: cp1257, x-cp1257
windows-1258: x-cp1258
windows-874: ibm874
ISO-8859-1: ibm819, cp819, iso-ir-100, iso88591
ISO-8859-2: iso88592, iso8859-3, iso88593
ISO-8859-4: iso8859-4, iso88594
ISO-8859-5: iso8859-5, iso88595
ISO-8859-6: asmo-708, iso8859-6, iso88596
ISO-8859-7: iso8859-7, iso88597, sun_eu_greek
ISO-8859-8: iso8859-8, iso88598
ISO-8859-9: iso8859-9, iso88599, iso_8859-9
ISO-8859-10: iso885910
ISO-8859-11: iso8859-11, iso885911
ISO-8859-12: iso885912
ISO-8859-13: iso8859-13, iso885913
ISO-8859-14: iso885914
ISO-8859-15: iso8859-15, iso885915
EUC-KR: 5601
us-ascii: 646
Shift_JIS: cp932
ISO-2022-JP: csiso2022jp2, iso-2022-jp-2
TIS-620: tis620
gbk: windows-936
GB2312: zh_cn.euc
Big5: zh_tw-big5
C. The following encodings will be replaced with other (usually superset)
encodings.
GB2312->gbk, us-ascii->windows-1252, ISO-8859-1->windows-1252,
ISO-8859-9->windows-1254, ISO-8859-11->windows-874, TIS-620->windows-874,
iso-8859-8-i->ISO-8859-8, Big5-HKSCS->Big5, UTF-16->UTF-16LE
EUC-KR->x-windows-949 (the Encoding Standard refers to windows-949 as EUC-KR)
D. The following aliases will be added.
cn-big5=Big5
sjis=Shift_JIS
windows-949=EUC-KR
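As a rough illustration of how sections A, C and D interact, label resolution can be thought of as three steps: reject a removed label, map an alias to its canonical name, then apply any replacement. The Python sketch below is hypothetical: the names REMOVED, ADDED_ALIASES, REPLACEMENTS and resolve are made up, only a subset of section A is listed, and chaining the alias step into the replacement step is an assumption about how the properties file is consumed.

```python
# Hypothetical sketch; not the actual charsetalias.properties handling.
REMOVED = {"utf-7", "x-mac-arabic", "t.61-8bit"}  # small subset of section A

ADDED_ALIASES = {  # section D
    "cn-big5": "Big5",
    "sjis": "Shift_JIS",
    "windows-949": "EUC-KR",
}

REPLACEMENTS = {  # section C, keyed by lowercased canonical name
    "gb2312": "gbk",
    "us-ascii": "windows-1252",
    "iso-8859-1": "windows-1252",
    "iso-8859-9": "windows-1254",
    "iso-8859-11": "windows-874",
    "tis-620": "windows-874",
    "iso-8859-8-i": "ISO-8859-8",
    "big5-hkscs": "Big5",
    "utf-16": "UTF-16LE",
    "euc-kr": "x-windows-949",
}


def resolve(label):
    """Return the converter name for a label, or None if it was removed."""
    stripped = label.strip()
    key = stripped.lower()
    if key in REMOVED:
        return None
    # Step 1: alias -> canonical name (unknown labels pass through as given).
    name = ADDED_ALIASES.get(key, stripped)
    # Step 2: canonical name -> replacement encoding, if one is defined.
    return REPLACEMENTS.get(name.lower(), name)
```

Under these assumptions, resolve("windows-949") first becomes EUC-KR and then x-windows-949, while resolve("UTF-7") yields None.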
Updated•13 years ago
Attachment #617222 - Attachment is patch: false
Comment 3•13 years ago
The following aliases (and more) have been added again.
IBM864: csibm864, ibm-864
IBM866: 866, csibm866
windows-1250: cp1250
windows-1251: cp1251
windows-1252: cp1252, x-cp1252
windows-1253: x-cp1253
windows-1254: cp1254, x-cp1254
windows-1255: x-cp1255
windows-1256: x-cp1256
windows-1257: cp1257, x-cp1257
windows-1258: x-cp1258
ISO-8859-1: ibm819, cp819, iso-ir-100, iso88591
ISO-8859-2: iso88592, iso8859-3, iso88593
ISO-8859-4: iso8859-4, iso88594
ISO-8859-5: iso8859-5, iso88595
ISO-8859-6: asmo-708, iso8859-6, iso88596
ISO-8859-7: iso8859-7, iso88597, sun_eu_greek
ISO-8859-8: iso8859-8, iso88598
ISO-8859-9: iso8859-9, iso88599, iso_8859-9
ISO-8859-10: iso885910
ISO-8859-11: iso8859-11, iso885911
ISO-8859-12: iso885912
ISO-8859-13: iso8859-13, iso885913
ISO-8859-14: iso885914
ISO-8859-15: iso8859-15, iso885915
Attachment #617222 - Attachment is obsolete: true
Comment 4•12 years ago
Does it make sense to do this all in one big change? Some of these changes have bigger compat implications than others, and some might just be spec bugs. Wouldn't it be more prudent to make this a tracker bug and pursue changes bit by bit in individual bugs?
Depends on: 802059
Comment 5•11 years ago
The browser side of this was fixed by bug 801402. Moving the legacy code to comm-central is bug 943268.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE