Closed Bug 1508136 Opened 7 years ago Closed 7 years ago

ISO-2022-JP text is displayed with a replacement character <?> if it contains a zero-length ASCII run due to concatenation

Status: RESOLVED INVALID

People

(Reporter: kzmizzz, Unassigned)

Attachments

(2 files)

sample.eml, details.txt, and screenshot PNGs (Iwasa Kazmi, 7 years ago): 62.17 KB, application/x-zip-compressed
iso-2022-jp.html (Jorg K (CEST = GMT+2), 7 years ago): 91 bytes, text/html

Iwasa Kazmi

Reporter

Description

7 years ago

Attached file: sample.eml, details.txt, and screenshot PNGs

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0

Steps to reproduce:
Open "sample.eml" with Thunderbird (60.3.1, Windows).

"sample.eml" contains ISO-2022-JP text like:
<ESC>$Bこんに<ESC>(B<ESC>$Bちは<ESC>(B

This text can also be seen as the concatenation of two separate ISO-2022-JP texts:
<ESC>$Bこんに<ESC>(B
<ESC>$Bちは<ESC>(B

The "From", "To", and "Subject" headers in "sample.eml" also contain such text. These are described in "details.txt".

Actual results:
See the screenshots.
Thunderbird 60.3.1 -> TB60.3.1(wrong).png
Thunderbird 54.0.b3 -> TB54.0.b3(good).png
In 60.3.1, U+FFFD (REPLACEMENT CHARACTER) is displayed in the text. 54.0.b3 appears to display it correctly.

Expected results:
U+FFFD should not appear in the text.

The root cause is probably in the WHATWG Encoding Standard: https://encoding.spec.whatwg.org/
The reference implementation of the Encoding Standard inserts U+FFFD for zero-length content between escape sequences. The Encoding Standard says:
> The ISO-2022-JP encoder is the only encoder for which the concatenation of multiple outputs can result in an error when run through the corresponding decoder.
(https://encoding.spec.whatwg.org/#iso-2022-jp-encoder)

This behavior is also specified for TextDecoder and its underlying libraries. See bug 1506049.

However, Thunderbird should display ISO-2022-JP text correctly. Inserting U+FFFD may reduce the potential risk of XSS, but Thunderbird should not rely on it.
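The concatenation described above can be reproduced with Python's standard "iso2022_jp" codec, used here only as a stand-in for the mailing-list software (an assumption; the actual system is not identified in this bug). Each separately encoded fragment ends with a switch back to ASCII, so joining two fragments yields ESC ( B immediately followed by ESC $ B, with nothing in between.

```python
# Encode each fragment independently, as a system that concatenates
# already-encoded pieces would.
first = "こんに".encode("iso2022_jp")
second = "ちは".encode("iso2022_jp")
concatenated = first + second

# Encoding the whole string at once does not produce the redundant pair.
whole = "こんにちは".encode("iso2022_jp")

# ESC ( B (switch to ASCII) immediately followed by ESC $ B (switch back).
ZERO_LENGTH_ASCII_RUN = b"\x1b(B\x1b$B"

print(concatenated.hex(" "))
print(ZERO_LENGTH_ASCII_RUN in concatenated)  # True
print(ZERO_LENGTH_ASCII_RUN in whole)         # False
```

The concatenated bytes match the message body quoted in comment 1, minus the trailing CR LF.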

Jorg K (CEST = GMT+2)

Comment 1

7 years ago

Thanks for the detailed report. I'll copy some of it into this comment for easier access. Let's focus on the body problem:

Message content: <ESC>$B$3$s$K<ESC>(B<ESC>$B$A$O<ESC>(B
Message display: こんに<?>ちは

ISO-2022-JP text in the message body (hex):
1b 24 42 : <ESC $ B> select JIS X 0208-1983 to be used
24 33    : character "こ" in JIS X 0208-1983
24 73    : character "ん" in JIS X 0208-1983
24 4b    : character "に" in JIS X 0208-1983
1b 28 42 : <ESC ( B> select ASCII to be used
1b 24 42 : <ESC $ B> select JIS X 0208-1983 to be used
24 41    : character "ち" in JIS X 0208-1983
24 4f    : character "は" in JIS X 0208-1983
1b 28 42 : <ESC ( B> select ASCII to be used
0d 0a    : <CR LF>

The replacement character appears where the first <ESC>(B, the one immediately followed by <ESC>$B, appears in the message body.

So much for the facts as presented. Now our reply:

Yes, the handling of all encodings changed in bug 1363281 in Thunderbird 56 beta, so 54 and 60 may behave differently. We discussed "zero-length ASCII runs" at length in bug 1374149; see bug 1374149 comment #3 and below. As per bug 1374149 comment #5, these zero-length runs are invalid. We only tolerate them at the end of an RFC 2047 token, not in the middle of a string. Sorry.

Where do these invalid messages come from?

Henri, anything to add here?
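A minimal Python sketch of the decoder behavior under discussion may help. It mimics only the "output flag" idea of the WHATWG ISO-2022-JP decoder: an escape sequence that follows another escape sequence with no decoded content in between produces U+FFFD. Assumptions: only ESC ( B and ESC $ B (the two sequences in this message) are recognized, character decoding of each run is delegated to Python's "iso2022_jp" codec, and the function names are hypothetical. This is not encoding_rs or Thunderbird code.

```python
import re

# Only the two escape sequences occurring in this bug are modeled (assumption).
ESC_ASCII = b"\x1b(B"    # ESC ( B: switch to ASCII
ESC_JIS0208 = b"\x1b$B"  # ESC $ B: switch to JIS X 0208-1983
ESCAPE_RE = re.compile(rb"\x1b(?:\(B|\$B)")


def decode_run(escape: bytes, run: bytes) -> str:
    """Hypothetical helper: decode one run of bytes that follows `escape`.

    Wraps the run in its escape sequence plus a final switch back to ASCII
    and lets Python's iso2022_jp codec do the character decoding.
    """
    return (escape + run + ESC_ASCII).decode("iso2022_jp")


def decode_whatwg_like(data: bytes) -> str:
    """Hypothetical sketch of the WHATWG "ISO-2022-JP output" flag logic."""
    out = []
    escape = ESC_ASCII   # the decoder starts in the ASCII state
    output_flag = False  # set by an escape sequence, cleared by decoded content
    pos = 0
    for match in ESCAPE_RE.finditer(data):
        run = data[pos:match.start()]
        if run:
            out.append(decode_run(escape, run))
            output_flag = False
        if output_flag:
            # Two escape sequences with nothing decoded in between:
            # the zero-length run is reported as an error (U+FFFD).
            out.append("\ufffd")
        output_flag = True
        escape = match.group()
        pos = match.end()
    tail = data[pos:]
    if tail:
        out.append(decode_run(escape, tail))
    return "".join(out)


body = b"\x1b$B$3$s$K\x1b(B\x1b$B$A$O\x1b(B\r\n"  # bytes from the hex dump
print(decode_whatwg_like(body) == "こんに\ufffdちは\r\n")  # True
```

A lenient decoder would simply skip the empty run; the sketch instead flags it, which is why the <?> shows up between "こんに" and "ちは".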

Status: UNCONFIRMED → RESOLVED

Closed: 7 years ago

Flags: needinfo?(hsivonen)

Resolution: --- → INVALID

Summary: "concatenated" ISO-2022-JP text is displayed incorrectly → ISO-2022-JP text is displayed with a replacement character <?> if it contains a zero-length ASCII run due to concatenation

Jorg K (CEST = GMT+2)

Updated

7 years ago

Component: Folder and Message Lists → Internationalization

Product: Thunderbird → Core

Version: 60 → 60 Branch

Iwasa Kazmi

Reporter

Comment 2

7 years ago

I found this issue in some emails from a mailing-list system. That mailing-list system concatenates ISO-2022-JP text when it changes the Subject header or modifies the content. It is not common, but it generates valid ISO-2022-JP text.

Henri Sivonen (:hsivonen)

Comment 3

7 years ago

(In reply to Jorg K (GMT+1) from comment #1)
> Henri, anything to add here?

I still don't understand the benefit of the U+FFFD generation as an XSS defense, considering that there are other cases left undefended:
https://github.com/whatwg/encoding/issues/115#issuecomment-312645847

OTOH, generating a U+FFFD when there is no content between ISO-2022-JP shift sequences is mentioned in the Unicode Security Considerations:
https://www.unicode.org/reports/tr36/#Some_Output_For_All_Input

If you are interested in getting this changed, the next steps would be:
1) finding out what IE, Edge, Chrome and Safari do, and
2) finding out why the Unicode Security Considerations say what they say about this.

Jorg K (CEST = GMT+2)

Comment 4

7 years ago

Attached file: iso-2022-jp.html

(In reply to Henri Sivonen (:hsivonen) from comment #3)
> 1) Finding out what IE, Edge, Chrome and Safari do and 2) finding out why the
> Unicode Security Considerations say what they say about this.

For number 1) you can use the attached page. Thunderbird displays the body as Firefox would display a web page.

While I'm here, I tried IE and Edge; neither shows the <?>. I don't have Chrome (yet) on this fairly new machine, but bug 1374149 comment #3 says that Chrome matches encoding_rs, that is, TB/FF.

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Comment 5

7 years ago

Edge and IE don't generate a REPLACEMENT CHARACTER. Firefox, Chrome and Safari do. (With the caveat that my Mac is stuck on El Capitan, so I couldn't test the latest Safari.)

Henri Sivonen (:hsivonen)

Comment 6

7 years ago

I posted to the Unicode mailing list about this: https://www.unicode.org/mail-arch/unicode-ml/y2018-m11/0106.html

Jorg K (CEST = GMT+2)

Comment 7

7 years ago

Thanks Henri, that's a very long post. From a layman's point of view, showing the <?> in otherwise readable text doesn't seem very "useful", and uconv and the Microsoft browsers don't insert it. I have trouble understanding why this is done; maybe it's related to this:
"Security software written to the formal specification may not detect malicious text (for example, "delete" with a shift-to-double-byte then an immediate shift-to-ASCII in the middle)."
How would one create or hide "malicious text" using those "no-op escape sequences"?

Henri Sivonen (:hsivonen)

Comment 8

7 years ago

(In reply to Jorg K (GMT+1) from comment #7)
> I have trouble understanding why this is done, maybe it's related to this:
> "Security software written to the formal specification may not detect
> malicious text
> (for example, "delete" with a shift-to-double-byte then an immediate
> shift-to-ASCII
> in the middle)."
>
> How would one create or hide "malicious text" using those "no-op escape
> sequences"?

If the ASCII string "delete" has ISO-2022-JP shift sequences inserted between its characters, "security software" scanning the content at the byte level does not see the byte sequence as containing "delete", but after decoding, the text says "delete" unless REPLACEMENT CHARACTERs are injected.
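The scenario Henri describes can be illustrated in a few lines of Python. The byte-level check stands in for the hypothetical "security software", and Python's own "iso2022_jp" codec serves as an example of a decoder that accepts the no-op escape pair without injecting U+FFFD (an observation about that codec only, not about any browser or mail client).

```python
# "delete" with a shift to JIS X 0208 (ESC $ B) immediately followed by a
# shift back to ASCII (ESC ( B) inserted in the middle.
payload = b"de" + b"\x1b$B" + b"\x1b(B" + b"lete"

# A naive byte-level scanner does not find the keyword...
print(b"delete" in payload)  # False

# ...but a decoder that silently accepts the empty run reconstructs it.
decoded = payload.decode("iso2022_jp")
print(decoded == "delete")  # True
```

A decoder following the WHATWG behavior would instead produce "de", U+FFFD, "lete", so the decoded text no longer matches the keyword either.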

Jorg K (CEST = GMT+2)

Comment 9

7 years ago

I got it now, thanks. Surely the security software needs to be a bit context aware, no? HTML del<span></span>ete is also not detected as "delete".

Magnus Melin [:mkmelin]

Updated

2 years ago

Duplicate of this bug: 1864978


Categories: Core :: Internationalization, defect

Security: public
