Closed Bug 470523 Opened 16 years ago Closed 10 years ago

ISO-2022-CN decoder is horribly broken

Tracking

()

Status:

RESOLVED WONTFIX

People

(Reporter: emk, Assigned: emk)

Details

Attachments

(2 files, 1 obsolete file)

Sample text from RFC 1922 16 years ago Masatoshi Kimura [:emk] 20 bytes, text/plain; charset=ISO-2022-CN		Details
overhauled ISO-2022-CN decoder 16 years ago Masatoshi Kimura [:emk] 34.56 KB, patch		Details \| Diff \| Splinter Review
fixed some XSS vectors 16 years ago Masatoshi Kimura [:emk] 35.59 KB, patch		Details \| Diff \| Splinter Review

Masatoshi Kimura [:emk]

Assignee

Description

•

16 years ago

Attached file Sample text from RFC 1922 — Details

It can't decode even RFC 1922 sample text correctly.

Masatoshi Kimura [:emk]

Assignee

Comment 1

•

16 years ago

Attached patch overhauled ISO-2022-CN decoder (obsolete) — Details — Splinter Review

Assignee: smontagu → VYV03354

Status: NEW → ASSIGNED

Attachment #353945 - Flags: review?(smontagu)

Masatoshi Kimura [:emk]

Assignee

Comment 2

•

16 years ago

Attached patch fixed some XSS vectors — Details — Splinter Review

Oops, the previous patch was still vulnerable to some XSS attacks.

Attachment #353945 - Attachment is obsolete: true

Attachment #354014 - Flags: review?(smontagu)

Attachment #353945 - Flags: review?(smontagu)

Simon Montagu :smontagu

Comment 3

•

16 years ago

The patch looks technically correct, but I'm not really sure that we need it. I don't know how commonly used ISO-2022-CN is in the first place, and it must be very unusual to have traditional and simple Chinese mixed in the same message, let alone the same line.

FWIW, iconv doesn't decode the sample text correctly either (the expected decoding is 交换交換)

Masatoshi Kimura [:emk]

Assignee

Comment 4

•

16 years ago

> The patch looks technically correct, but I'm not really sure that we need it.
If we don't need it, we should completely remove it rather than leaving it broken. Broken support is worse than no support. (Remember Nav4 CSS support!)
> I don't know how commonly used ISO-2022-CN is in the first place,
I'm afraid that it's rarely used. ISO-2022-JP is an only survivor of 7-bit ISO/IEC 2022 family encodings. But it's defined in RFC, registered by IANA, and supported by Microsoft Windows (CP50227 and 50229).
> and it must be
> very unusual to have traditional and simple Chinese mixed in the same message,
> let alone the same line.
The current decoder treats a single shift as it's a locking shift. So traditional Chinese only text will not be parsed correctly if it contains plane 2 or 3 characters.
> FWIW, iconv doesn't decode the sample text correctly either (the expected
decoding is 交换交換)
Which iconv? GNU iconv 1.11 which is included in MozillaBuild handled it correctly.

Masatoshi Kimura [:emk]

Assignee

Comment 5

•

13 years ago

I'm going to consider dropping support for ISO-2022-CN because:
1. IE and WebKit do not support this encoding.
2. Our implementation has been broken for a long time.

Masatoshi Kimura [:emk]

Assignee

Comment 6

•

13 years ago

Why ISO-2022-CN is defined in Encoding Standard?
http://dvcs.w3.org/hg/encoding/raw-file/78076fc4d607/Overview.html#iso-2022-cn

Anne (:annevk)

Comment 7

•

13 years ago

I'm open to removing it. I believe Chrome specifically added basic support (not backed by encoding tables though) for it to prevent XSS exploits.

Anne (:annevk)

Comment 8

•

13 years ago

http://code.google.com/p/chromium/issues/detail?id=21354 is what Chrome has done. For "iso-2022-cn" and "iso-2022-cn-ext".

Glenn Maynard

Comment 9

•

13 years ago

One idea was to always treat text/html; charset=iso-2022-cn as text/plain to prevent XSS problems.  That makes it explicitly not work at all, rather than be parsed as HTML with the wrong charset (leading to the XSS issue).

I don't know if that would complicate the parser; for example, meta charset could cause the content type to change instead of just the charset.

Masatoshi Kimura [:emk]

Assignee

Comment 10

•

13 years ago

Hm, IE9 was vulnerable to this XSS attack.
http://i18nl10n.com/chrome/iso2022cn1.html
http://i18nl10n.com/chrome/iso2022cn2.html

Anne (:annevk)

Comment 11

•

13 years ago

Given that Internet Explorer does not support this encoding at all, maybe we should just remove it from all browsers (and the spec)?

Masatoshi Kimura [:emk]

Assignee

Comment 12

•

10 years ago

The ISO-2022-CN decoder will soon be removed (bug 997115).
We have introduced the "replacement" encoding to avoid the security issue.

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → WONTFIX

Simon Montagu :smontagu

Updated

•

10 years ago

Attachment #354014 - Flags: review?(smontagu)

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

ISO-2022-CN decoder is horribly broken

Categories

(Core :: Internationalization, defect)

Tracking

()

People

(Reporter: emk, Assigned: emk)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files, 1 obsolete file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Updated

Attachment

General

Description

File Name

Content Type