Bug 564679 - (CVE-2010-1210) Bytes mapped to U+FFFD in 8-bit encodings make the following byte/character disappear
Status: RESOLVED FIXED
Whiteboard: [sg:moderate] possible XSS hazard
Keywords: verified1.9.2
Product: Core
Classification: Components
Component: Internationalization
Version: unspecified
Platform: All All
Importance: -- normal
Target Milestone: ---
Assigned To: Simon Montagu :smontagu
:
: Makoto Kato [:m_kato]
Mentors:
http://coq.no/X/charset5/test8bit.php...
Depends on:
Blocks: xss
Reported: 2010-05-09 06:09 PDT by O. Andersen
Modified: 2010-07-20 16:15 PDT
smontagu: in‑testsuite+
needed
.7-fixed
unaffected


Attachments
Fix (794 bytes, patch)
2010-05-10 01:36 PDT, Simon Montagu :smontagu
VYV03354: review+
dveditz: approval1.9.2.7+
Test (2.02 KB, patch)
2010-05-10 01:45 PDT, Simon Montagu :smontagu
no flags
Test (1.99 KB, patch)
2010-05-10 01:49 PDT, Simon Montagu :smontagu
no flags

Description O. Andersen 2010-05-09 06:09:50 PDT
User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.3a5pre) Gecko/20100508 Minefield/3.7a5pre
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.3a5pre) Gecko/20100508 Minefield/3.7a5pre

Many 8-bit encodings contain undefined positions, which are mapped to U+FFFD. In the latest Minefield for Mac, such bytes are correctly mapped to U+FFFD, but the immediately following byte disappears(!).

For example, the sequence {'\xD1', '\xD2', '\xD3', '\xD4'} in windows-1253 should result in {U+3A1, U+FFFD, U+3A3, U+3A4} (i.e., the string "Ρ�ΣΤ"), but the actual result is the shorter sequence {U+3A1, U+FFFD, U+3A4} with no U+3A3 character (i.e., the string "Ρ�Τ", with no 'Σ').

(Two consecutive bytes both mapped to U+FFFD result in only one U+FFFD character instead of two.)
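The expected result described above can be checked outside the browser. The following is a minimal sketch that assumes Python's built-in cp1253 codec as a reference for windows-1253; it is not Gecko's decoder, and it only shows what a correct decoder should produce for the two cases in this report.

```python
# Reference decoding with Python's cp1253 codec (an assumption: it stands in
# for the expected windows-1253 behaviour and is unrelated to Gecko's code).
data = bytes([0xD1, 0xD2, 0xD3, 0xD4])
expected = data.decode("cp1253", errors="replace")

# A correct decoder yields one character per input byte; 0xD2 is undefined
# in this code page and becomes U+FFFD.
print([f"U+{ord(c):04X}" for c in expected])

# Two consecutive undefined bytes should yield two replacement characters.
double = bytes([0xD2, 0xD2]).decode("cp1253", errors="replace")
print(len(double), [f"U+{ord(c):04X}" for c in double])
```

In an affected build the first case loses U+03A3 and the second case collapses to a single U+FFFD.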

This seems to be a general problem; it applies to several windows-* and ISO-8859-* encodings.
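To see which bytes can trigger the problem in a given encoding, one can enumerate the undefined positions. This sketch again assumes Python's codec tables as a stand-in; Gecko's own tables may differ in detail, and the helper name `undefined_bytes` is hypothetical.

```python
def undefined_bytes(encoding):
    """Return the byte values the codec rejects in strict mode."""
    missing = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            missing.append(value)
    return missing

# A few single-byte encodings with holes in their tables (Python codec names).
for name in ("cp1253", "iso8859_7", "cp1255", "iso8859_8"):
    holes = undefined_bytes(name)
    print(name, [f"0x{b:02X}" for b in holes])
```

Any byte in these lists, followed by an ordinary character, is a candidate test case for the behaviour reported here.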

Firefox 3.6.3 (release) shows the same incorrect behaviour. This bug did not exist in Firefox 3.5.8.

[Incidentally, it might make sense to map bytes in the range 0x7F..0x9F to U+7F..U+9F and not to U+FFFD for many of the affected encodings, but that is a separate issue and would in any case not solve the current problem completely since many encodings, including windows-1253, have undefined characters outside this range, for which U+FFFD is the only reasonable mapping.]
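The alternative mapping mentioned in the aside can be sketched as a custom decode error handler. This is only an illustration in Python of the idea (undefined bytes in 0x7F..0x9F pass through as the same code point, everything else undefined becomes U+FFFD); the handler name `c1_passthrough` is hypothetical and nothing here reflects what Gecko does.

```python
import codecs

def c1_passthrough(err):
    """Map undefined bytes 0x7F..0x9F to U+007F..U+009F, others to U+FFFD."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    out = []
    for b in err.object[err.start:err.end]:
        out.append(chr(b) if 0x7F <= b <= 0x9F else "\ufffd")
    # Resume decoding right after the bytes that were just handled.
    return "".join(out), err.end

codecs.register_error("c1_passthrough", c1_passthrough)

# 0x81 is undefined and inside the range; 0xD2 is undefined and outside it.
sample = bytes([0x81, 0xD1, 0xD2, 0xD3])
decoded = sample.decode("cp1253", errors="c1_passthrough")
print([f"U+{ord(c):04X}" for c in decoded])
```

As the aside notes, 0xD2 still needs U+FFFD under this scheme, so it would not remove the need for the fix.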

Reproducible: Always
Comment 1 Simon Montagu :smontagu 2010-05-09 22:04:30 PDT
Investigating. There is more to this than meets the eye: I see the failure at http://coq.no/X/charset5/test8bit.php?enc=windows-1253&mime=windows-1253, but on the other hand data:text/html;charset=windows-1253,%d1%d2%d3%d4 decodes as expected to Ρ�ΣΤ.

Unfortunately our own unit tests at intl/uconv/tests/unit/test_decode_*.js don't test undefined code points, and I will fix this, but I tried adding 0xd2 to test_decode_CP1253.js manually, and that didn't show the bug either.
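The gap in test coverage can be expressed as an invariant: for a single-byte encoding, decoding any two bytes together must give the same result as decoding each byte on its own and concatenating. The sketch below checks that invariant against Python's codecs; it is not the xpcshell test under intl/uconv/tests/unit, and the function name is hypothetical.

```python
def check_no_swallowed_bytes(encoding):
    """Assert that no byte is lost after an undefined byte."""
    singles = [bytes([b]).decode(encoding, errors="replace") for b in range(256)]
    for first in range(256):
        for second in range(256):
            decoded = bytes([first, second]).decode(encoding, errors="replace")
            # A swallowed byte would make `decoded` one character short.
            assert decoded == singles[first] + singles[second], (encoding, first, second)

for name in ("cp1253", "iso8859_7"):
    check_no_swallowed_bytes(name)
    print(name, "ok")
```

A decoder with the behaviour reported in this bug would fail the assertion for every pair whose first byte is undefined.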
Comment 3 Simon Montagu :smontagu 2010-05-09 23:51:36 PDT
This could be an XSS vulnerability
Comment 4 Simon Montagu :smontagu 2010-05-10 00:31:40 PDT
(In reply to comment #1)
> Investigating. There is more to this than meets the eye: I see the failure at
> http://coq.no/X/charset5/test8bit.php?enc=windows-1253&mime=windows-1253, but
> on the other hand data:text/html;charset=windows-1253,%d1%d2%d3%d4 decodes as
> expected to Ρ�ΣΤ.

This turns out to be true only on trunk with the HTML5 parser enabled. With it disabled, and also on 3.6, data:text/html;charset=windows-1253,%d1%d2%d3%d4 decodes to Ρ�Τ
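For trying other byte sequences and charsets, a data: URL like the one above can be generated rather than typed by hand. A small sketch, assuming Python's urllib for the percent-encoding; the helper name `data_url` is hypothetical.

```python
from urllib.parse import quote_from_bytes

def data_url(payload, charset):
    """Build a data: URL whose body is the raw bytes, percent-encoded."""
    return f"data:text/html;charset={charset}," + quote_from_bytes(payload, safe="")

url = data_url(bytes([0xD1, 0xD2, 0xD3, 0xD4]), "windows-1253")
print(url)
```

The hex digits come out in upper case, which is equivalent to the lower-case form used in the comment above.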
Comment 5 Simon Montagu :smontagu 2010-05-10 01:36:43 PDT
Created attachment 444362
Fix
Comment 6 Simon Montagu :smontagu 2010-05-10 01:45:21 PDT
Created attachment 444363
Test
Comment 7 Simon Montagu :smontagu 2010-05-10 01:49:20 PDT
Created attachment 444365
Test
Comment 8 Boris Zbarsky [:bz] (still a bit busy) 2010-05-10 06:39:28 PDT
Simon, did you mean to ask someone in particular for review?
Comment 9 Simon Montagu :smontagu 2010-05-10 08:19:42 PDT
(In reply to comment #8)
> Simon, did you mean to ask someone in particular for review?

Thanks for spotting that, Boris. Not only did I mean to do so, I *did* do so, but it failed, and silently at that. This is the supremely annoying bug 372539, and I have been bitten by it before...
Comment 10 Masatoshi Kimura [:emk] 2010-05-10 09:02:29 PDT
Comment on attachment 444362
Fix

> +  <title>Test for Unicode non-characters</title>
Fix the test title. r=me with this.
Comment 12 Simon Montagu :smontagu 2010-05-24 01:07:41 PDT
Comment on attachment 444362
Fix

Requesting branch approval after trunk baking. This is a very low-risk change which prevents illegal codepoints from corrupting the following character. It is more important to have this on the branch than on trunk, since the HTML5 parser mitigates its effect in some cases.
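The symptom of an illegal code point corrupting the following character can be modelled with a toy decoder. This is a hypothetical illustration only: the patch itself is not shown on this page, and the "extra skip" below is an assumed mechanism chosen because it reproduces both reported symptoms, not a description of the actual Gecko code or fix.

```python
REPLACEMENT = "\ufffd"

def decode_one(byte, encoding):
    """Decode a single byte, or return None if it is undefined."""
    try:
        return bytes([byte]).decode(encoding)
    except UnicodeDecodeError:
        return None

def toy_decode(data, encoding="cp1253", skip_after_error=False):
    """Toy single-byte decoder; skip_after_error=True models the symptom."""
    out = []
    i = 0
    while i < len(data):
        ch = decode_one(data[i], encoding)
        i += 1  # the byte is consumed whether or not it was valid
        if ch is None:
            out.append(REPLACEMENT)
            if skip_after_error:
                i += 1  # hypothetical double advance: the next byte is lost
        else:
            out.append(ch)
    return "".join(out)

data = bytes([0xD1, 0xD2, 0xD3, 0xD4])
print(toy_decode(data, skip_after_error=True))   # symptom: one character short
print(toy_decode(data, skip_after_error=False))  # expected behaviour
```

With the extra skip enabled, two consecutive undefined bytes also collapse into a single U+FFFD, matching the original report.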
Comment 13 christian 2010-06-11 15:53:23 PDT
Is 1.9.1 affected as well?
Comment 14 Masatoshi Kimura [:emk] 2010-06-11 18:54:02 PDT
No. Bug 174351 was not landed on the 1.9.1 branch.
Comment 15 Daniel Veditz [:dveditz] 2010-06-14 10:24:25 PDT
Comment on attachment 444362
Fix

Approved for 1.9.2.6, a=dveditz for release-drivers
Comment 16 Simon Montagu :smontagu 2010-06-19 13:09:43 PDT
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/8478cfe10e43
Comment 17 Simon Montagu :smontagu 2010-06-19 13:11:10 PDT
and tests: http://hg.mozilla.org/releases/mozilla-1.9.2/rev/4422b1e5b0dc
Comment 18 Al Billings [:abillings] 2010-07-16 12:13:01 PDT
Verified for 1.9.2 with passing tests.
