1173100 - Clean up OneUcs4ToUtf8Char

Reporter

Description

•

9 years ago

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:36.0) Gecko/20100101 Firefox/36.0
Build ID: 20150305021524

Steps to reproduce:

js_OneUcs4ToUtf8Char (38.0.1\js\src\jsstr.cpp) outputs more than the documented number of characters if supplied with an input character that has certain negative values when viewed as a CharT. This can cause overruns of callers' buffers.

For example, assume ucs4Char == 0xffffffff. The assert on line 4934 does nothing, because it's not there in release builds. Control skips from line 4935 to line 4938 because ucs4Char is an unsigned 0xffffffff. Lines 4939-44 then compute utf8Length == 7, and lines 4946-50 write 7 characters into the caller's buffer that should, according to js_OneUcs4ToUtf8Char's header comment, be "at least 4 bytes long":

4925: /*
4926:  * Convert one UCS-4 char and write it into a UTF-8 buffer, which must be at
4927:  * least 4 bytes long.  Return the number of UTF-8 bytes of data written.
4928:  */
4929: int
4930: js_OneUcs4ToUtf8Char(uint8_t* utf8Buffer, uint32_t ucs4Char)
4931: {
4932:     int utf8Length = 1;
4933:
4934:     MOZ_ASSERT(ucs4Char <= 0x10FFFF);
4935:     if (ucs4Char < 0x80) {
4936:         *utf8Buffer = (uint8_t)ucs4Char;
4937:     } else {
4938:         int i;
4939:         uint32_t a = ucs4Char >> 11;
4940:         utf8Length = 2;
4941:         while (a) {
4942:             a >>= 5;
4943:             utf8Length++;
4944:         }
4945:         i = utf8Length;
4946:         while (--i) {
4947:            utf8Buffer[i] = (uint8_t)((ucs4Char & 0x3F) | 0x80);
4948:            ucs4Char >>= 6;
4949:         }
4950:         *utf8Buffer = (uint8_t)(0x100 - (1 << (8-utf8Length)) + ucs4Char);
4951:     }
4952:     return utf8Length;
4953: }

The function also returns 7 instead of 4.

js_OneUcs4ToUtf8Char has 3 direct callers of which I know:

  38.0.1\js\src\jsstr.cpp line 4723 (Encode):
     size_t L = js_OneUcs4ToUtf8Char(utf8buf, v);
     The buffer here is 4 bytes, and on the stack, declared on line 4722: uint8_t utf8buf[4];
     (Maximum overrun 3 bytes).

  38.0.1\js\src\ctypes\CTypes.cpp line 154 (DeflateStringToUTF8Buffer):
     utf8Len = js_OneUcs4ToUtf8Char(utf8buf, v);
     The buffer here is 6 bytes, and on the stack, declared on line 125: uint8_t utf8buf[6];
     (Maximum overun 1 bytes).

  38.0.1\js\src\vm\CharacterEncoding.cpp line 119 (DeflateStringToUTF8Buffer):
     utf8Len = js_OneUcs4ToUtf8Char(utf8buf, v);
     The buffer here is 4 bytes, and on the stack, declare on line 118: uint8_t utf8buf[4];
     (Maximum overrun 3 bytes).

In all these cases, the string appears to have external origin, though it might have been scrubbed by other code. The data written is modified by js_OneUcs4ToUtf8Char, so this bug is less exploitable than if the data were written unchanged, but any overrun of a stack-based buffer raises the possibility of exploitation by modification of other stack variables and/or return addresses.

Note that js_OneUcs4ToUtf8Char's direct callers might be inlined into their callers, so the corruption might extend beyond the direct callers. Also the effects will differ depending upon platforms, word size, and compiler version.

Interestingly, where Encode is the direct caller (and using the Windows x86 build), the bug corrupts hexBuf [0], stomping the "%" with which it was initialized. In the case I tested using the debugger, this ended up corrupting a URL, turning what should have been something like "https:%2f%2fduckduckgo.com%2f" into "https뾿FE뾿83뾿BF뾿BF뾿BF뾿BF뾿42뾿2F뾿2Fduckduckgo.com뾿2F".

Presumably this bug, as exploited using this path, might be used to cause the user's browser to load a script from an attacker's site into a different site's context, or at least to cause the browser to navigate to an unexpected site, or an expected site with an unexpected URL argument. I understand that ICANN now allows Unicode domains to be registered. https://en.wikipedia.org/wiki/Internationalized_domain_name .

Benjamin Smedberg

Updated

•

9 years ago

Component: Untriaged → JavaScript Engine

Product: Firefox → Core

Al Billings [:abillings - ex-MoCo]

Updated

•

9 years ago

Flags: sec-bounty?

Jan de Mooij [:jandem]

Assignee

Comment 1

•

9 years ago

I don't see how a value of 0xFFFFFFFF can be passed to OneUcs4ToUtf8Char.

(In reply to q1 from comment #0)
>   38.0.1\js\src\jsstr.cpp line 4723 (Encode):
>      size_t L = js_OneUcs4ToUtf8Char(utf8buf, v);
>      The buffer here is 4 bytes, and on the stack, declared on line 4722:
> uint8_t utf8buf[4];
>      (Maximum overrun 3 bytes).

Let's look at the input range of |v| here:

(1) If c < 0xD800 || c > 0xDBFF, v = c, so v has range [0, 0xFFFF].

(2) Else:

c  must be [0xD800, 0xDBFF]
c2 must be [0xDC00, 0xDFFF]

v = ((c - 0xD800) << 10) + (c2 - 0xDC00) + 0x10000;

If we try the min and max values for c and c2, we get either:

  v = (0 << 10) + 0 + 0x10000 = 0x10000
  v = (1023 << 10) + 1023 + 0x10000 = 0x10ffff

Note that 0x10ffff matches the assert in OneUcs4ToUtf8Char. That does seem like it's the maximum value we can pass here in Encode, unless I'm missing something subtle.

Can you post some JS that should fail the assert in OneUcs4ToUtf8Char in debug builds?

Flags: needinfo?(q1)

q1

Reporter

Comment 2

•

9 years ago

Good catch. I made an important mistake, for which I apologize: I assumed that char16_t is signed. It is *unsigned*, per C++11 s.3.9.1(5). The upshot is that the maximum value that 

4710:                v = c;

can ever assign to the uint32_t v is 0xffff, which appears to be a safe value. The same limit seems to apply in all the callers I listed, so this bug appears currently to be unrealized.

Nonetheless, js_OneUcs4ToUtf8Char should be fixed (limiting its output to <= 4 characters) to prevent this bug from ever being realized.

Flags: needinfo?(q1)

q1

Reporter

Updated

•

9 years ago

Summary: Bug in js_OneUcs4ToUtf8Char overwrites stack variable(s) in callers → Bug in js_OneUcs4ToUtf8Char could overrun buffer if callers don't range-check argument

Jan de Mooij [:jandem]

Assignee

Comment 3

•

9 years ago

(In reply to q1 from comment #2)
> Good catch. I made an important mistake, for which I apologize: I assumed
> that char16_t is signed.

No worries, I'm glad you filed it. Don't hesitate to report more bugs.

> Nonetheless, js_OneUcs4ToUtf8Char should be fixed (limiting its output to <=
> 4 characters) to prevent this bug from ever being realized.

Yes, this function is old code and could use some cleanup. I'll try to get to it soon.

Flags: needinfo?(jdemooij)

Daniel Veditz [:dveditz]

Comment 4

•

9 years ago

Jandem: you're saying this is unexploitable and can be unhidden?

Jan de Mooij [:jandem]

Assignee

Updated

•

9 years ago

Group: core-security

Jan de Mooij [:jandem]

Assignee

Updated

•

9 years ago

Summary: Bug in js_OneUcs4ToUtf8Char could overrun buffer if callers don't range-check argument → Clean up OneUcs4ToUtf8Char

Jan de Mooij [:jandem]

Assignee

Comment 5

•

9 years ago

Attached patch Patch — Details — Splinter Review

Some random OneUcs4ToUtf8Char cleanup. I also changed the range assert to MOZ_RELEASE_ASSERT, as an extra check.

Assignee: nobody → jdemooij

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Flags: needinfo?(jdemooij)

Attachment #8639818 - Flags: review?(jwalden+bmo)

Jeff Walden [:Waldo]

Comment 6

•

9 years ago

Comment on attachment 8639818 [details] [diff] [review]
Patch

Review of attachment 8639818 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +5009,3 @@
>  js::OneUcs4ToUtf8Char(uint8_t* utf8Buffer, uint32_t ucs4Char)
>  {
> +    MOZ_RELEASE_ASSERT(ucs4Char <= 0x10FFFF);

Frankly, I don't think we need an assertion *everywhere* here, including imposing a cost on release builds, on a character-by-character basis.  I would make this a normal assertion.

If we truly were worried, I would add a new type like Codepoint or something that wrapped a uint32_t, that had some sort of checks on its only being created with a value <= 0x10FFFF, or by construction enforced this.  But I'm really not that worried about it.

Attachment #8639818 - Flags: review?(jwalden+bmo) → review+

q1

Reporter

Comment 7

•

9 years ago

(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #6)
> If we truly were worried, I would add a new type like Codepoint or something
> that wrapped a uint32_t, that had some sort of checks on its only being
> created with a value <= 0x10FFFF, or by construction enforced this.  But I'm
> really not that worried about it.

Or just replace

+    MOZ_ASSERT(utf8Length <= 4);

with a release assert or failout, which cuts somewhat the cost of a release assert on every character.

Pulsebot

Comment 8

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/b4ea303cbe27

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 9

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/b4ea303cbe27

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

status-firefox43: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla43

Daniel Veditz [:dveditz]

Updated

•

9 years ago

Flags: sec-bounty? → sec-bounty-

Bugzilla

Quick Search

Clean up OneUcs4ToUtf8Char

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: q1, Assigned: jandem)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Updated

Updated

Comment 1

Comment 2

Updated

Comment 3

Comment 4

Updated

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Attachment

General

Description

File Name

Content Type