Support charset aliasing per Encoding Standard

RESOLVED DUPLICATE of bug 801402

Status

9 years ago
5 years ago

People

(Reporter: jshin1987, Assigned: smontagu)

Tracking

(Blocks: 1 bug)

Trunk

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

9 years ago
http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html
has the following:


-------------------
When a user agent would otherwise use an encoding given in the first column of the following table to either convert content to Unicode characters or convert Unicode characters to bytes, it must instead use the encoding given in the cell in the second column of the same row. When a byte or sequence of bytes is treated differently due to this encoding aliasing, it is said to have been misinterpreted for compatibility.

Character encoding overrides
Input encoding    Replacement encoding    References
EUC-KR            windows-949             [EUCKR] [WIN949]
GB2312            GBK                     [RFC1345] [GBK]
GB_2312-80        GBK                     [RFC1345] [GBK]
ISO-8859-1        windows-1252            [RFC1345] [WIN1252]
ISO-8859-9        windows-1254            [RFC1345] [WIN1254]
ISO-8859-11       windows-874             [ISO885911] [WIN874]
KS_C_5601-1987    windows-949             [RFC1345] [WIN949]
Shift_JIS         Windows-31J             [SHIFTJIS] [WIN31J]
TIS-620           windows-874             [TIS620] [WIN874]
US-ASCII          windows-1252            [RFC1345] [WIN1252]
--------------------
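
As a rough illustration of the override rule quoted above, here is a minimal Python sketch. The table entries are transcribed from the quoted text; the function names and the mapping to Python codec names (cp949, cp1252, and so on) are assumptions made for this example and are not how Gecko implements it.

```python
# Override table transcribed from the quoted HTML5 text (input -> replacement).
OVERRIDES = {
    "euc-kr": "windows-949",
    "gb2312": "gbk",
    "gb_2312-80": "gbk",
    "iso-8859-1": "windows-1252",
    "iso-8859-9": "windows-1254",
    "iso-8859-11": "windows-874",
    "ks_c_5601-1987": "windows-949",
    "shift_jis": "windows-31j",
    "tis-620": "windows-874",
    "us-ascii": "windows-1252",
}

# Assumption: Python stdlib codecs standing in for the replacement encodings.
PYTHON_CODECS = {
    "windows-949": "cp949",
    "gbk": "gbk",
    "windows-1252": "cp1252",
    "windows-1254": "cp1254",
    "windows-874": "cp874",
    "windows-31j": "cp932",
}


def effective_encoding(label: str) -> str:
    """Return the replacement encoding for a label, or the label itself."""
    key = label.strip().lower()
    return OVERRIDES.get(key, key)


def decode(data: bytes, label: str) -> str:
    """Decode bytes using the replacement encoding instead of the declared one."""
    name = effective_encoding(label)
    return data.decode(PYTHON_CODECS.get(name, name))
```

With this sketch, `decode(b"\x80", "ISO-8859-1")` yields the euro sign, because byte 0x80 is interpreted as windows-1252 rather than as the C1 control character that ISO-8859-1 assigns to it. That is the kind of byte the quoted text calls "misinterpreted for compatibility".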

We already do some of the above aliasing (e.g. ISO-8859-1 -> windows-1252), but not all of it. For the Korean-specific issue, see bug 562091.

Because HTML5 stipulates that we do the above aliasing in both directions, we can get rid of some of the converters to save some space.

I tried to find an existing bug on this but couldn't find one. If it's already filed or resolved on trunk, please accept my apology.
Depends on: 600715
Depends on: 712876
Blocks: 746911
This section would be superseded by the Encoding Standard.
Summary: Support charset aliasing per HTML5 → Support charset aliasing per Encoding Standard
Created attachment 617222 [details]
charsetalias.properties per Encoding Standard

This charsetalias.properties will have the following effects.

A. The following encodings will no longer be available.
A.1 XSS vulnerable encodings:
    x-mac-arabic, x-mac-farsi, x-mac-hebrew, x-imap4-modified-utf7,
    UTF-7, T.61-8bit
A.2 IBM encodings other than ibm864 and ibm866
    IBM850, IBM852, IBM855, IBM857, IBM862, IBM864i
A.3 Mac encodings other than macintosh (MacRoman)
    x-mac-ce, x-mac-croatian, x-mac-devanagari, x-mac-greek, x-mac-gujarati,
    x-mac-gurmukhi, x-mac-icelandic, x-mac-romanian, x-mac-turkish
A.4 Vietnamese encodings
    x-viet-tcvn5712, x-viet-vps, VISCII
A.5 Others
    x-euc-tw, armscii-8, x-johab, x-user-defined, ISO-IR-111, ISO-2022-CN,
    ISO-8859-6-E, ISO-8859-6-I, ISO-8859-8-E, ISO-8859-8-I

B. The following aliases will be removed.
   UTF-16BE: csunicode11, csunicode, csunicodeascii, csunicodelatin1,
             iso-10646-j-1, iso-10646-ucs-2, iso-10646-ucs-basic,
             iso-10646-unicode-latin1, iso-10646, x-iso-10646-ucs-2-be,
             x-iso-10646-ucs-2-le
   IBM864: 864, csibm864, ibm-864
   IBM866: 866, csibm866, cp-866
   windows-1250: cp1250
   windows-1251: cp1251, ansi-1251
   windows-1252: cp1252, x-cp1252
   windows-1253: x-cp1253
   windows-1254: cp1254, x-cp1254
   windows-1255: x-cp1255
   windows-1256: x-cp1256
   windows-1257: cp1257, x-cp1257
   windows-1258: x-cp1258
   windows-874: ibm874
   ISO-8859-1: ibm819, cp819, iso-ir-100, iso88591
   ISO-8859-2: iso88592, iso8859-3, iso88593
   ISO-8859-4: iso8859-4, iso88594
   ISO-8859-5: iso8859-5, iso88595
   ISO-8859-6: asmo-708, iso8859-6, iso88596
   ISO-8859-7: iso8859-7, iso88597, sun_eu_greek
   ISO-8859-8: iso8859-8, iso88598
   ISO-8859-9: iso8859-9, iso88599, iso_8859-9
   ISO-8859-10: iso885910
   ISO-8859-11: iso8859-11, iso885911
   ISO-8859-12: iso885912
   ISO-8859-13: iso8859-13, iso885913
   ISO-8859-14: iso885914
   ISO-8859-15: iso8859-15, iso885915
   EUC-KR: 5601
   us-ascii: 646
   Shift_JIS: cp932
   ISO-2022-JP: csiso2022jp2, iso-2022-jp-2
   TIS-620: tis620
   gbk: windows-936
   GB2312: zh_cn.euc
   Big5: zh_tw-big5

C. The following encodings will be replaced with other (usually superset)
   encodings.
   GB2312->gbk, us-ascii->windows-1252, ISO-8859-1->windows-1252,
   ISO-8859-9->windows-1254, ISO-8859-11->windows-874, TIS-620->windows-874,
   iso-8859-8-i->ISO-8859-8, Big5-HKSCS->Big5, UTF-16->UTF-16LE
   EUC-KR->x-windows-949 (the Encoding Standard refers to windows-949 as EUC-KR)

D. The following aliases will be added.
   cn-big5=Big5
   sjis=Shift_JIS
   windows-949=EUC-KR
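
The `alias=canonical` entries in section D can be read with a few lines of code. The following Python sketch is hypothetical: `parse_aliases` and `resolve` are names invented for this example, the sample text contains only the three entries listed in D, and the real charsetalias.properties consumer in Gecko is not shown in this bug.

```python
# Sample text in the alias=canonical form used in section D.
SAMPLE = """\
# alias=canonical
cn-big5=Big5
sjis=Shift_JIS
windows-949=EUC-KR
"""


def parse_aliases(text):
    """Build a lowercase alias -> canonical name table from properties-style text."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        alias, _, canonical = line.partition("=")
        table[alias.strip().lower()] = canonical.strip()
    return table


def resolve(label, table):
    """Return the canonical name for a label, or None if it is not listed."""
    return table.get(label.strip().lower())


ALIASES = parse_aliases(SAMPLE)
```

In this sketch a label that is absent from the table resolves to `None`, which models the effect described in sections A and B: once an encoding or alias is dropped from the file, lookups for it simply fail.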
Attachment #617222 - Attachment is patch: false
Created attachment 618267 [details]
updated to the latest spec

The following aliases (and more) have been added back.
   IBM864: csibm864, ibm-864
   IBM866: 866, csibm866
   windows-1250: cp1250
   windows-1251: cp1251
   windows-1252: cp1252, x-cp1252
   windows-1253: x-cp1253
   windows-1254: cp1254, x-cp1254
   windows-1255: x-cp1255
   windows-1256: x-cp1256
   windows-1257: cp1257, x-cp1257
   windows-1258: x-cp1258
   ISO-8859-1: ibm819, cp819, iso-ir-100, iso88591
   ISO-8859-2: iso88592, iso8859-3, iso88593
   ISO-8859-4: iso8859-4, iso88594
   ISO-8859-5: iso8859-5, iso88595
   ISO-8859-6: asmo-708, iso8859-6, iso88596
   ISO-8859-7: iso8859-7, iso88597, sun_eu_greek
   ISO-8859-8: iso8859-8, iso88598
   ISO-8859-9: iso8859-9, iso88599, iso_8859-9
   ISO-8859-10: iso885910
   ISO-8859-11: iso8859-11, iso885911
   ISO-8859-12: iso885912
   ISO-8859-13: iso8859-13, iso885913
   ISO-8859-14: iso885914
   ISO-8859-15: iso8859-15, iso885915
Attachment #617222 - Attachment is obsolete: true
Depends on: 802030
Does it make sense to do this all in one big change?  Some of these changes have bigger compat implications than others, and some might just be spec bugs.  Wouldn't it be more prudent to make this a tracker bug and pursue changes bit by bit in individual bugs?
Depends on: 802059
The browser side of this was fixed by bug 801402. Moving the legacy code to comm-central is bug 943268.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 801402