Closed Bug 715319 (CVE-2012-0471) Opened 13 years ago Closed 12 years ago

More multi-octet encoding issues

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla14
Tracking Status
firefox9 --- wontfix
firefox10 --- wontfix
firefox11 - wontfix
firefox12 + verified
firefox13 + verified
firefox14 + verified
firefox-esr10 12+ verified
blocking1.9.2 --- needed
status1.9.2 --- wontfix

People

(Reporter: annevk, Assigned: smontagu)

References

Details

(Keywords: testcase, Whiteboard: [sg:high/moderate][qa+:ashughes])

Attachments

(12 files, 5 obsolete files)

68 bytes, text/html; charset=hz-gb-2312
Details
54 bytes, text/html; charset=euc-jp
Details
5.41 KB, text/html; charset=x-johab
Details
76 bytes, text/html
Details
78 bytes, text/html
Details
7.01 KB, patch
emk
: review+
Details | Diff | Splinter Review
8.71 KB, patch
Details | Diff | Splinter Review
1.14 KB, patch
emk
: review+
Details | Diff | Splinter Review
1.35 KB, patch
emk
: review+
Details | Diff | Splinter Review
3.85 KB, patch
emk
: review+
Details | Diff | Splinter Review
12.73 KB, patch
smontagu
: review+
Details | Diff | Splinter Review
181 bytes, text/html; charset=gb2312
Details
Since this seems to be the same kind of issues as bug 690225 marking it security sensitive for now.

In euc-jp 0x8F "eats" the next two octets.

In hz-gb-2312 anything above 0x7E will eat the next octet. (Every other implementation of hz-gb-2312 I have seen just replaces each octet above 0x7E with U+FFFD by the way.)

I have not really done extensive research, but I suspect there is more.
This would not be a problem unless lead bytes "eat" any ASCII characters (<= 0x7F).
Or do you have any concrete exploits?
Yes, they eat regardless of what the next octet is, which includes "illegal" octets in the range 0x00 to 0x7E.
Hm, we should not blindly eat the second bytes, at least.
Attached file HZ test case
Attached file EUC-JP test case
Would be nice to get this fixed for Fx 10 so the first ESR isn't affected.
Whiteboard: [sg:high/moderate]
blocking1.9.2: --- → needed
I investigated other encodings. The results that I found is the following:

hz-gb-2312: [0x80-0xFF] eat the next octet. 
T.61-8bit: [0xC0-0xCF] eat the next octet.
GB2312/gbk/gb18030: [0x81-0xFE]+[0x30-0x39] eat the next two octets.
Another one:

x-johab: [0xE0-0xF9] eat the next octet.
Attached file x-johab test case
Attachment #588876 - Attachment mime type: text/plain → text/html; charset=x-johab
As you can see attachment 58876 [details] [diff] [review], I made a mistake. The correct is:

x-johab: [0xD9-0xF9](not including 0xDB/0xDF) eat the next octet.
Simon: can you add your diagnosis to this bug? Thanks.
(In reply to Anne van Kesteren from comment #0)
> (Every other
> implementation of hz-gb-2312 I have seen just replaces each octet above 0x7E
> with U+FFFD by the way.)

Not quite accurate: IE decodes 8-bit data in GB mode but not in ASCII mode.
Attached patch Patch for hz-gb-2312 (obsolete) — Splinter Review
This makes us compatible with IE on http://coq.no/X/charset5/test-HZ.php?HZ-GB-2312 and I think solves the over-consuming ASCII issue.

More patches to come for the other encodings
Attachment #602350 - Flags: review?(VYV03354)
Comment on attachment 602350 [details] [diff] [review]
Patch for hz-gb-2312

Review of attachment 602350 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/uconv/ucvcn/nsHZToUnicode.cpp
@@ +119,5 @@
> +        oddByte = 0;
> +        mHZState &= ~HZ_STATE_ODD_BYTE_FLAG;
> +      }
> +      continue;
> +    } 

Per RFC 1843, '~\n' looks to be valid line continuation only in ASCII mode.
Is this simulating other existing implementation?

@@ +143,4 @@
>            *aDest++ = UCS2_NO_MAPPING;
> +          if (HZ_ENCODING_STATE != HZ_STATE_GB) {
> +            aSrc--;
> +          }

When is this condition hold?
oddByte can only be HZLEAD1 or 0 in ASCII mode.
Attached file Line-continuation test
BTW, line-continuation support can cause XSS. IE10 on Win8CP was not vulnerable.
Attachment #602350 - Flags: review?(VYV03354) → review-
Some time ago, I investigated ignored bytes. Also, Firefox ignores ESC ( J on iso-2022-jp page.
Attachment #602570 - Attachment mime type: text/plain → text/html
Attachment #602350 - Attachment is obsolete: true
Attachment #602762 - Flags: review?(VYV03354)
Attachment #602762 - Flags: review?(VYV03354) → review+
Attached patch Patch for gb18030 4 byte (obsolete) — Splinter Review
Attachment #603426 - Flags: review?(VYV03354)
Attached patch Patch for t.61 (obsolete) — Splinter Review
Attachment #603428 - Flags: review?(VYV03354)
Attached patch Patch for EUC-JP (obsolete) — Splinter Review
Attachment #603429 - Flags: review?(VYV03354)
Attached patch Lots of testsSplinter Review
(In reply to Masato Kinugawa from comment #10)
> As you can see attachment 58876 [details] [diff] [review], I made a mistake.
> The correct is:
> 
> x-johab: [0xD9-0xF9](not including 0xDB/0xDF) eat the next octet.

I don't follow this: as far as I can tell they only eat the next octet when it makes up a defined double-byte sequence. See http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/JOHAB.TXT for reference.

(In reply to Masato Kinugawa from comment #16)
> Created attachment 602570 [details]
> iso-2022-jp_ignoredbytes
> 
> Some time ago, I investigated ignored bytes. Also, Firefox ignores ESC ( J
> on iso-2022-jp page.

Kimura-san and I discussed this some time ago in bug 381412 comment 26 and following and didn't reach a solution. All other browsers I tested also fail this XSS test, FWIW.
Comment on attachment 603426 [details] [diff] [review]
Patch for gb18030 4 byte

Review of attachment 603426 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/uconv/ucvcn/nsGBKToUnicode.cpp
@@ +222,5 @@
>            *aDest = UCS2_NO_MAPPING; 
>          }
> +        if (*aDest == UCS2_NO_MAPPING) {
> +          // If the decoding failed, resynchronize on the first ASCII byte.
> +          for (PRInt32 j = 1; j < 4; ++j) {

Don't resync if the src sequence is 84 31 A4 37 (Yes, GB18030 can represent any Unicode code points except isolated surrogate).
Moreover, why do you scan the ascii byte? aSrc[1] will always be ascii (0x30 to 0x39) because it passed LEGAL_GBK_4BYTE_SECOND_BYTE() test.
Attachment #603426 - Flags: review?(VYV03354) → review-
(In reply to Simon Montagu from comment #22)
> I don't follow this: as far as I can tell they only eat the next octet when
> it makes up a defined double-byte sequence. See
> http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/JOHAB.TXT for
> reference.
I thought x-johab can eat any bytes. But in fact can't. You are right - this is the correct behavior. And I noticed x-johab also has risk of bug 703868.
Comment on attachment 603428 [details] [diff] [review]
Patch for t.61

Is this encoding required for Web at all?
(In reply to Masatoshi Kimura [:emk] from comment #23)

> Don't resync if the src sequence is 84 31 A4 37 (Yes, GB18030 can represent
> any Unicode code points except isolated surrogate).

Ha ha ha, great catch! (But I think there is actually a bug in the converter, because bug 174351 didn't touch nsGBKToUnicode). New patch coming up
Attached patch Patch for GB18030 4 byte v.2 (obsolete) — Splinter Review
Attachment #603426 - Attachment is obsolete: true
Attachment #603866 - Flags: review?(VYV03354)
Comment on attachment 603866 [details] [diff] [review]
Patch for GB18030 4 byte v.2

I'll remove the debugging code in the second hunk before checkin
(In reply to Masatoshi Kimura [:emk] from comment #25)
> Comment on attachment 603428 [details] [diff] [review]
> Patch for t.61
> 
> Is this encoding required for Web at all?

I don't think so. It seems from http://coq.no/character-tables/misc/en that there is no interoperable implementation, and I doubt that it is actually used in web content. However, I see from comments in other bugs that it may be used by LDAP, so I've marked it with .isXSSVulnerable rather than removing the decoder completely
Attachment #603428 - Attachment is obsolete: true
Attachment #603428 - Flags: review?(VYV03354)
Attachment #603876 - Flags: review?(VYV03354)
Comment on attachment 603866 [details] [diff] [review]
Patch for GB18030 4 byte v.2

Is resync required if the sequence is in the valid 4 bytes range (that is, [0x81-0xfe][0x30-0x39][0x81-0xfe][0x30-0x39])? I doubt it is actually exploitable because only digits are possible ascii characters.
Futhermore, It doesn't match other browsers' behavior.
Attachment #603876 - Flags: review?(VYV03354) → review+
Attachment #603866 - Attachment is obsolete: true
Attachment #603866 - Flags: review?(VYV03354)
Attachment #604215 - Flags: review?(VYV03354)
Attachment #604215 - Flags: review?(VYV03354) → review+
Comment on attachment 603429 [details] [diff] [review]
Patch for EUC-JP

Review of attachment 603429 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/uconv/ucvja/nsJapaneseToUnicode.cpp
@@ +385,5 @@
>              *dest++ = 0xFFFD;
> +            // Undefined JIS 0212 two byte sequence and the second byte is in
> +            // the ASCII range: save it.
> +            if ( (PRUint8)*src < (PRUint8)0x7f )
> +              --src;

Chrome: do not also eat C1-controls range (<= 0x9f).
IE10: do not eat unless the second byte is in the valid GR range ([^\xa1-\xfe])
We should match our behavior to one of their behaviors.
I prefer the IE behaviour, but I don't have access to IE for testing right now. Chrome's behaviour is very different from ours after resynchronizing, because their handling of characters outside the GR range is very different from ours in general. For example, they decode data:text/html;charset=euc-jp,%8f%81%82 and data:text/html;charset=euc-jp,%81%82 both as U+0081 U+0082, while we decode the first as U+FFFD and the second as U+FFFD U+FFFD. 

Another case: Chrome decodes data:text/html;charset=euc-jp,%8f%8e%c1 as U+FF81, while we decode it as U+FFFD. I think both are wrong, and the preferred behaviour would be to decode to U+FFFD U+FF81.

How does IE handle these cases?
Version: unspecified → Trunk
In IE10,
euc-jp,%8f%81%82 -> U+FFFD U+FFFD U+FFFD
euc-jp,%81%82 -> U+FFFD U+FFFD
euc-jp,%8f%8e%c1 -> U+FFFD U+FF81
This aims to emulate IE's behaviour consistently
Attachment #603429 - Attachment is obsolete: true
Attachment #603429 - Flags: review?(VYV03354)
Attachment #605236 - Flags: review?(VYV03354)
Hm, the current Encoding Standard draft has a different algorithm from any existing implementations.
0x8E or 0x8F will disappear if the following byte is neither valid nor ASCII.
For the byte sequence 0x8F 0x8E 0xC1:
0x8F: Increment the byte pointer (Step 6) and set euc-jp first to 0x8F (Step 12).
0x8E: Increment the byte pointer (Step 6) and set euc-jp first to 0x8E (Step 12).
0xC1: Increment the byte pointer (Step 6) and emit U+FE81 (Step 8).
For the byte sequence 0x8F 0x81 0x82:
0x8F: Increment the byte pointer (Step 6) and set euc-jp first to 0x8F (Step 12).
0x81: Increment the byte pointer (Step 6) and emit U+FFFD (Step 14).
0x82: Increment the byte pointer (Step 6) and emit U+FFFD (Step 14).
But it will not eat ASCII chars. For the byte sequence 0x8F 0x22:
0x8F: Increment the byte pointer (Step 6) and set euc-jp first to 0x8F (Step 12).
0x22: Set euc-jp first to 0x00 and emit U+FFFD (Step 4, note that the byte pointer will not be increased).
0x22: Emit U+0022 (Step 4, reading the second byte again because the byte pointer has not been increased).

Anne, what do you think?
I think this is OK because eating non-ASCII chars will not be exploitable.
> 0x22: Emit U+0022 (Step 4, reading the second byte again because the byte pointer has not been increased).
It was
0x22: Increment the byte pointer (Step 6) and emit U+0022 (Step 11, reading the second byte again because the byte pointer has not been increased).
I think step 12 should also check that euc-jp first is 0x00.
I rewrote parts of the euc-jp algorithm. Hopefully it's more sensible now.
So are Anne's algorithm, IE's implementation and my patch interoperable? Because of the changes, it's impossible to follow comment 36 now.
Target Milestone: --- → mozilla13
I think so. The current specification only consumes bytes in valid ranges. If a byte at any point in a multi-byte sequence is out of range, the lead byte is replaced and all the bytes following it are reinterpreted. (The iso-2022-* encodings are treated somewhat differently to this. There is no backtracking once you enter double-byte mode there.)
For the byte sequence 8F C1 8E C1:
8F: Increment the byte pointer (Step 5) and set euc-jp first to 0x8F (Step 11).
C1: Increment the byte pointer (Step 5), set euc-jp first to 0x00, and euc-jp second to 0xC1 (Step 8).
8E: Increment the byte pointer (Step 5), let lead be 0xC1 and set euc-jp second to 0x00 (Step 6).
    Decrement the byte pointer (Step 6.4) and emit U+FFFD (Step 6.5).
8E: Increment the byte pointer (Step 5) and set euc-jp first to 0x8E (Step 11).
C1: Increment the byte pointer (Step 5) and emit U+FF81 (Step 7).
But IE emits U+FFFD U+FFFD U+FF81. So this algorithm is not interoperable to IE's implementation yet.
That said, I'm no longer sure simulating IE's behavior is a good thing. It is inconsistent with ISO-2022-JP decoder and UTF-8 decoder.
Also, IE swallows the incomplete byte(s) before EOF.
(In reply to Masatoshi Kimura [:emk] from comment #42)
> That said, I'm no longer sure simulating IE's behavior is a good thing. It
> is inconsistent with ISO-2022-JP decoder and UTF-8 decoder.

So are you ready to give the patch r+ or do you want me to make more changes?
(In reply to Simon Montagu from comment #44)
> So are you ready to give the patch r+ or do you want me to make more changes?
What the current patch behavior? I haven't test the current patch yet.
I found yet another discrepancy. For the byte sequence 8E E0 A1 A2 A1 A2,
IE and the latest spec: U+FFFD U+71F9 U+25C6
Chrome: U+00A2 U+3001 U+3001
Firefox 11: U+FFFD U+3001 U+3001
We are right here because E0-FE is considered a valid but undefined code range.
I withdraw comment 32 because existing implementations are too peculiar.
The latest spec and Gecko agree on 8E E0 actually (if no match in step 7 it will always go through substeps of 9 and since E0 is in the valid range the byte pointer is not decreased). I should probably change the spec for 8F though to decrease the byte pointer by 2. Thanks for the review!
Comment on attachment 605236 [details] [diff] [review]
Patch for euc-jp v.2

Let's fix any remaining problems in follow-up bugs.
Attachment #605236 - Flags: review?(VYV03354) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/db02a6ae2953
Target Milestone: mozilla13 → mozilla14
https://hg.mozilla.org/mozilla-central/rev/db02a6ae2953
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Carrying over r=emk. I hope it's still early enough in the beta cycle to take this fix.

[Approval Request Comment]
Regression caused by (bug #): Not a regression
User impact if declined: Users could be exposed to XSS on a crafted page
Testing completed (on m-c, etc.): Baked on trunk for 10 days
Risk to taking this patch (and alternatives if risky): None known
String changes made by this patch: None
Attachment #609747 - Flags: review+
Attachment #609747 - Flags: approval-mozilla-esr10?
Attachment #609747 - Flags: approval-mozilla-beta?
Attachment #609747 - Flags: approval-mozilla-aurora?
Comment on attachment 609747 [details] [diff] [review]
Combined patch as checked in

[Triage Comment]
The security team is recommending that we take this patch for FF12, and we're early enough to land there. Approving for all branches.
Attachment #609747 - Flags: approval-mozilla-esr10?
Attachment #609747 - Flags: approval-mozilla-esr10+
Attachment #609747 - Flags: approval-mozilla-beta?
Attachment #609747 - Flags: approval-mozilla-beta+
Attachment #609747 - Flags: approval-mozilla-aurora?
Attachment #609747 - Flags: approval-mozilla-aurora+
Keywords: testcase
Whiteboard: [sg:high/moderate] → [sg:high/moderate][qa+]
When I compare the rendering of testcases attached or linked to here in nightly and current Aurora builds, I'm not seeing rendering issues. Can someone give some guidance here on whether I should or how to verify the fix here?
Alias: CVE-2012-0471
(In reply to Al Billings [:abillings] from comment #53)
> When I compare the rendering of testcases attached or linked to here in
> nightly and current Aurora builds, I'm not seeing rendering issues. Can
> someone give some guidance here on whether I should or how to verify the fix
> here?

Anyone?
Hmm...I must have done something stupid before. Verified in trunk with testcases in comment 4 and 5.
Status: RESOLVED → VERIFIED
Target Milestone: mozilla14 → ---
Version: Trunk → Other Branch
Group: core-security
Landed xpcshell-tests now that the bug is open https://hg.mozilla.org/integration/mozilla-inbound/rev/16c6de083029
Should the tests land on branches also?
Flags: in-testsuite? → in-testsuite+
Resetting flags changed in comment 55 (I presume inadvertently).

Tests merged:
https://hg.mozilla.org/mozilla-central/rev/16c6de083029
Target Milestone: --- → mozilla14
Version: Other Branch → Trunk
(In reply to Masato Kinugawa from comment #9)
> Created attachment 588876 [details]
> x-johab test case

On both Fx13 and Fx14 I see ? inside a black diamond background for everything past 7E:  

Safari displays chars correctly.
Tried to verify on Firefox 10.0.5 ESR:
The issue from comment 58 also reproduces on ESR.

For the line-continuation test, "<� script>alert('hello');" is displayed in the browser. Its source code is not displayed right in Firefox:

<!DOCTYPE html>
<meta charset=HZ-GB-2312>
<�
script>alert('hello');</script
Mozilla/5.0 (Windows NT 6.1; rv:10.0.5) Gecko/20100101 Firefox/10.0.5

For the HZ test case, �丂、、 �"�@��!" is displayed in the browser.
For the EUC-JP test case, 丂�0��!�0! is displayed in the browser.
For the iso-2022-jp_ignoredbytes test case, an alert with the word "hello" is displayed. The arrow between "<" and "(Jscrip..." is ignored. This arrow is not even displayed in Firefox when viewing the test case's source.
Can someone from Engineering please comment on this? QA has attempted three times to verify this and all of our results seem to indicate this is not fixed.
Multiple issues have been brought up in the bug, not all of which have been fixed.

I'll try to go through them all seriatim and point to test cases.

Comment 0:
In euc-jp 0x8f "eats" the next two octets
In attachment 585905 [details], PASS=丂�0��!�0! (including two each of ASCII 0 and !)

In hz-gb-2312 anything above 0x7E will eat the next octet
In attachment 602569 [details], there should be no alert box

Comment 7:
hz-gb-2312: [0x80-0xFF] eat the next octet. 
See above

T.61-8bit: [0xC0-0xCF] eat the next octet
Fixed by making T.61-8bit unavailable to browser

GB2312/gbk/gb18030: [0x81-0xFE]+[0x30-0x39] eat the next two octets.
I'll attach a testcase in a second

Comment 8:
x-johab: [0xE0-0xF9] eat the next octet.
This is inaccurate -- see comment 22 and comment 24

Comment 16:
Firefox ignores ESC ( J on iso-2022-jp page.
In attachment 602570 [details] there should be no alert box. This is NOT fixed. See comment 22.
Attached file GB2312 testcase
The input field in this testcase should read "�2 pass" to pass.
Just going by the attached testcases, here are my results with Firefox 12, 13RC, and 14.0a2-latest:
 * attachment 585905 [details]: 丂�0��!�0! (PASS)
 * attachment 602569 [details]: no alert (PASS)
 * attachment 602570 [details]: no alert (KNOWN FAIL - "hello" alerts)
 * attachment 629810 [details]: �2 pass (FAIL - "2 pass" displays)

Apart from the last one, I'm not sure we can call this verified fixed. Simon, please advise.
Comment on attachment 629810 [details]
GB2312 testcase

Sorry, I forgot that <meta charset> doesn't override the Content-type header that bugzilla automatically sets. Can you retest?
Attachment #629810 - Attachment mime type: text/html → text/html; charset=gb2312
Yup, that testcase works as expected now. Tested in Firefox 12.0, 13.0, 14.0a2, and 10.0.5esr.

Marking this verified -- please let me know if there is something more to test here.

Thanks
Whiteboard: [sg:high/moderate][qa+] → [sg:high/moderate][qa+:ashughes]
Guys!

The situation with this bug is similar to bug #690225. In both these Bugzilla entries there is information concerning the holes, which I've informed Mozilla already in March 2009. And that time Mozilla ignored my letter and left Internet users with risk of XSS attacks. And fixed these vulnerabilities only now.

It's good that they were fixed at last - partly after 2,5 years and partly after 3 years after my informing of Mozilla. I've wrote about it in bug #715319.
Depends on: CVE-2012-4207
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: