Closed Bug 562096 Opened 15 years ago Closed 11 years ago

Support charset aliasing per Encoding Standard

Status:

RESOLVED DUPLICATE of bug 801402

People

(Reporter: jshin1987, Assigned: smontagu)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

charsetalias.properties per Encoding Standard (obsolete) | 13 years ago | Masatoshi Kimura [:emk] | 3.80 KB, text/plain
updated to the latest spec | 13 years ago | Masatoshi Kimura [:emk] | 5.22 KB, text/plain

Jungshik Shin

Reporter

Description

•

15 years ago

http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html has the following:

-------------------
When a user agent would otherwise use an encoding given in the first column of the following table to either convert content to Unicode characters or convert Unicode characters to bytes, it must instead use the encoding given in the cell in the second column of the same row.

When a byte or sequence of bytes is treated differently due to this encoding aliasing, it is said to have been misinterpreted for compatibility.

Character encoding overrides

Input encoding    Replacement encoding    References
EUC-KR            windows-949             [EUCKR] [WIN949]
GB2312            GBK                     [RFC1345] [GBK]
GB_2312-80        GBK                     [RFC1345] [GBK]
ISO-8859-1        windows-1252            [RFC1345] [WIN1252]
ISO-8859-9        windows-1254            [RFC1345] [WIN1254]
ISO-8859-11       windows-874             [ISO885911] [WIN874]
KS_C_5601-1987    windows-949             [RFC1345] [WIN949]
Shift_JIS         Windows-31J             [SHIFTJIS] [WIN31J]
TIS-620           windows-874             [TIS620] [WIN874]
US-ASCII          windows-1252            [RFC1345] [WIN1252]
--------------------

We already do some of the above aliasing (e.g. ISO-8859-1 -> windows-1252), but not all. For the Korean-specific issue, see bug 562091. Because HTML5 stipulates that we do the above aliasing in both directions, we can get rid of some of the converters to save some space.

I tried to find an existing bug on this but couldn't find one. If one is already filed or resolved on trunk, please accept my apology.
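As a rough illustration of the override rule quoted above, the table can be treated as a lookup that is consulted before any decoder is chosen. This is only a sketch, not how Gecko implements it: the function names `effective_encoding` and `decode` are hypothetical, and the Python codec names (`cp949`, `cp1252`, and so on) are assumed stand-ins that may differ from the referenced encodings in some byte mappings.

```python
# Override table copied from the quoted spec text; keys are lowercased labels.
OVERRIDES = {
    "euc-kr": "windows-949",
    "gb2312": "gbk",
    "gb_2312-80": "gbk",
    "iso-8859-1": "windows-1252",
    "iso-8859-9": "windows-1254",
    "iso-8859-11": "windows-874",
    "ks_c_5601-1987": "windows-949",
    "shift_jis": "windows-31j",
    "tis-620": "windows-874",
    "us-ascii": "windows-1252",
}

# Assumption: Python codec names used as approximations of the replacements.
PYTHON_CODEC = {
    "windows-949": "cp949",
    "gbk": "gbk",
    "windows-1252": "cp1252",
    "windows-1254": "cp1254",
    "windows-874": "cp874",
    "windows-31j": "cp932",
}


def effective_encoding(label):
    """Return the replacement encoding for a label, or the label itself."""
    key = label.strip().lower()
    return OVERRIDES.get(key, key)


def decode(data, label):
    """Decode bytes using the overridden encoding rather than the declared one."""
    enc = effective_encoding(label)
    return data.decode(PYTHON_CODEC.get(enc, enc))
```

With this sketch, the byte 0x80 in content declared as ISO-8859-1 decodes through the windows-1252 mapping, which is the "misinterpreted for compatibility" case the spec text describes.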

Masatoshi Kimura [:emk]

Updated

•

14 years ago

Depends on: 600715

Gordon P. Hemsley [:GPHemsley]

Updated

•

13 years ago

Depends on: 712876

Masatoshi Kimura [:emk]

Updated

•

13 years ago

Blocks: encoding

Masatoshi Kimura [:emk]

Comment 1

•

13 years ago

This section of the HTML spec is to be superseded by the Encoding Standard.

Summary: Support charset aliasing per HTML5 → Support charset aliasing per Encoding Standard

Masatoshi Kimura [:emk]

Comment 2

•

13 years ago

Attached file charsetalias.properties per Encoding Standard (obsolete) — Details

This charsetalias.properties will have the following effects.

A. The following encodings will no longer be available.

A.1 XSS vulnerable encodings:
    x-mac-arabic, x-mac-farsi, x-mac-hebrew, x-imap4-modified-utf7, UTF-7, T.61-8bit
A.2 IBM encodings other than ibm864 and ibm866:
    IBM850, IBM852, IBM855, IBM857, IBM862, IBM864i
A.3 Mac encodings other than macintosh (MacRoman):
    x-mac-ce, x-mac-croatian, x-mac-devanagari, x-mac-greek, x-mac-gujarati, x-mac-gurmukhi, x-mac-icelandic, x-mac-romanian, x-mac-turkish
A.4 Vietnamese encodings:
    x-viet-tcvn5712, x-viet-vps, VISCII
A.5 Others:
    x-euc-tw, armscii-8, x-johab, x-user-defined, ISO-IR-111, ISO-2022-CN, ISO-8859-6-E, ISO-8859-6-I, ISO-8859-8-E, ISO-8859-8-I

B. The following aliases will be removed.

    UTF-16BE: csunicode11, csunicode, csunicodeascii, csunicodelatin1, iso-10646-j-1, iso-10646-ucs-2, iso-10646-ucs-basic, iso-10646-unicode-latin1, iso-10646, x-iso-10646-ucs-2-be, x-iso-10646-ucs-2-le
    IBM864: 864, csibm864, ibm-864
    IBM866: 866, csibm866, cp-866
    windows-1250: cp1250
    windows-1251: cp1251, ansi-1251
    windows-1252: cp1252, x-cp1252
    windows-1253: x-cp1253
    windows-1254: cp1254, x-cp1254
    windows-1255: x-cp1255
    windows-1256: x-cp1256
    windows-1257: cp1257, x-cp1257
    windows-1258: x-cp1258
    windows-874: ibm874
    ISO-8859-1: ibm819, cp819, iso-ir-100, iso88591
    ISO-8859-2: iso88592, iso8859-3, iso88593
    ISO-8859-4: iso8859-4, iso88594
    ISO-8859-5: iso8859-5, iso88595
    ISO-8859-6: asmo-708, iso8859-6, iso88596
    ISO-8859-7: iso8859-7, iso88597, sun_eu_greek
    ISO-8859-8: iso8859-8, iso88598
    ISO-8859-9: iso8859-9, iso88599, iso_8859-9
    ISO-8859-10: iso885910
    ISO-8859-11: iso8859-11, iso885911
    ISO-8859-12: iso885912
    ISO-8859-13: iso8859-13, iso885913
    ISO-8859-14: iso885914
    ISO-8859-15: iso8859-15, iso885915
    EUC-KR: 5601
    us-ascii: 646
    Shift_JIS: cp932
    ISO-2022-JP: csiso2022jp2, iso-2022-jp-2
    TIS-620: tis620
    gbk: windows-936
    GB2312: zh_cn.euc
    Big5: zh_tw-big5

C. The following encodings will be replaced with other (usually superset) encodings.

    GB2312->gbk
    us-ascii->windows-1252
    ISO-8859-1->windows-1252
    ISO-8859-9->windows-1254
    ISO-8859-11->windows-874
    TIS-620->windows-874
    iso-8859-8-i->ISO-8859-8
    Big5-HKSCS->Big5
    UTF-16->UTF-16LE
    EUC-KR->x-windows-949 (the Encoding Standard calls windows-949 "EUC-KR")

D. The following aliases will be added.

    cn-big5=Big5
    sjis=Shift_JIS
    windows-949=EUC-KR
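The entries in section D are written as alias=canonical pairs. A minimal sketch of how such pairs could be loaded into a case-insensitive lookup follows; it assumes a plain key=value file with `#` comments, which may not match every detail of the real charsetalias.properties, and `parse_aliases` and `resolve` are hypothetical helper names.

```python
def parse_aliases(text):
    """Parse alias=canonical lines into a dict keyed by lowercased alias."""
    aliases = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (assumed comment syntax: leading '#').
        if not line or line.startswith("#"):
            continue
        alias, sep, canonical = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignore it
        aliases[alias.strip().lower()] = canonical.strip()
    return aliases


def resolve(label, aliases):
    """Return the canonical name for a label, or None if it is not listed."""
    return aliases.get(label.strip().lower())


# The three aliases from section D of the comment above.
SAMPLE = """\
# aliases to be added
cn-big5=Big5
sjis=Shift_JIS
windows-949=EUC-KR
"""

aliases = parse_aliases(SAMPLE)
print(resolve("SJIS", aliases))   # Shift_JIS
print(resolve("utf-7", aliases))  # None
```

Returning None for an unlisted label mirrors the effect described in section A: a label with no entry simply has no encoding to resolve to.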

Gordon P. Hemsley [:GPHemsley]

Updated

•

13 years ago

Attachment #617222 - Attachment is patch: false

Masatoshi Kimura [:emk]

Comment 3

•

13 years ago

Attached file updated to the latest spec — Details

The following aliases (and even more) have been added again.

    IBM864: csibm864, ibm-864
    IBM866: 866, csibm866
    windows-1250: cp1250
    windows-1251: cp1251
    windows-1252: cp1252, x-cp1252
    windows-1253: x-cp1253
    windows-1254: cp1254, x-cp1254
    windows-1255: x-cp1255
    windows-1256: x-cp1256
    windows-1257: cp1257, x-cp1257
    windows-1258: x-cp1258
    ISO-8859-1: ibm819, cp819, iso-ir-100, iso88591
    ISO-8859-2: iso88592, iso8859-3, iso88593
    ISO-8859-4: iso8859-4, iso88594
    ISO-8859-5: iso8859-5, iso88595
    ISO-8859-6: asmo-708, iso8859-6, iso88596
    ISO-8859-7: iso8859-7, iso88597, sun_eu_greek
    ISO-8859-8: iso8859-8, iso88598
    ISO-8859-9: iso8859-9, iso88599, iso_8859-9
    ISO-8859-10: iso885910
    ISO-8859-11: iso8859-11, iso885911
    ISO-8859-12: iso885912
    ISO-8859-13: iso8859-13, iso885913
    ISO-8859-14: iso885914
    ISO-8859-15: iso885915
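Comparing this list with the "aliases will be removed" list in Comment 2 shows which aliases stay removed. The sketch below does that comparison for a few rows only, using the alias names exactly as they appear in the two comments; `still_removed` is a hypothetical helper, and the actual attachment may contain more aliases than either comment lists.

```python
# Aliases listed as removed in Comment 2 (a few rows only).
REMOVED = {
    "IBM864": {"864", "csibm864", "ibm-864"},
    "IBM866": {"866", "csibm866", "cp-866"},
    "windows-1250": {"cp1250"},
    "windows-1251": {"cp1251", "ansi-1251"},
}

# Aliases listed as added again in Comment 3 (same rows).
RESTORED = {
    "IBM864": {"csibm864", "ibm-864"},
    "IBM866": {"866", "csibm866"},
    "windows-1250": {"cp1250"},
    "windows-1251": {"cp1251"},
}


def still_removed(removed, restored):
    """Return, per encoding, the aliases removed earlier and not restored."""
    result = {}
    for encoding, names in removed.items():
        remaining = names - restored.get(encoding, set())
        if remaining:
            result[encoding] = sorted(remaining)
    return result


for encoding, names in still_removed(REMOVED, RESTORED).items():
    print(encoding, names)
```

For these rows, the aliases that remain removed are 864, cp-866, and ansi-1251.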

Attachment #617222 - Attachment is obsolete: true

Masatoshi Kimura [:emk]

Updated

•

12 years ago

Depends on: 802030

Aryeh Gregor (:ayg) (no longer with Mozilla)

Comment 4

•

12 years ago

Does it make sense to do this all in one big change? Some of these changes have bigger compat implications than others, and some might just be spec bugs. Wouldn't it be more prudent to make this a tracker bug and pursue changes bit by bit in individual bugs?

Depends on: 802059

Aryeh Gregor (:ayg) (no longer with Mozilla)

Updated

•

12 years ago

Depends on: 802069

Aryeh Gregor (:ayg) (no longer with Mozilla)

Updated

•

12 years ago

Depends on: 802082

Henri Sivonen (:hsivonen)

Comment 5

•

11 years ago

The browser side of this was fixed by bug 801402. Moving the legacy code to comm-central is bug 943268.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → DUPLICATE


Categories

(Core :: Internationalization, defect)
