Closed Bug 1173100 Opened 9 years ago Closed 9 years ago

Clean up OneUcs4ToUtf8Char

Categories

(Core :: JavaScript Engine, defect)

38 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox43 --- fixed

People

(Reporter: q1, Assigned: jandem)

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:36.0) Gecko/20100101 Firefox/36.0
Build ID: 20150305021524

Steps to reproduce:

js_OneUcs4ToUtf8Char (38.0.1\js\src\jsstr.cpp) outputs more than the documented number of characters if supplied with an input character that has certain negative values when viewed as a CharT. This can cause overruns of callers' buffers.

For example, assume ucs4Char == 0xffffffff. The assert on line 4934 does nothing, because it's not there in release builds. Control skips from line 4935 to line 4938 because ucs4Char is an unsigned 0xffffffff. Lines 4939-44 then compute utf8Length == 7, and lines 4946-50 write 7 characters into the caller's buffer that should, according to js_OneUcs4ToUtf8Char's header comment, be "at least 4 bytes long":

4925: /*
4926:  * Convert one UCS-4 char and write it into a UTF-8 buffer, which must be at
4927:  * least 4 bytes long.  Return the number of UTF-8 bytes of data written.
4928:  */
4929: int
4930: js_OneUcs4ToUtf8Char(uint8_t* utf8Buffer, uint32_t ucs4Char)
4931: {
4932:     int utf8Length = 1;
4933:
4934:     MOZ_ASSERT(ucs4Char <= 0x10FFFF);
4935:     if (ucs4Char < 0x80) {
4936:         *utf8Buffer = (uint8_t)ucs4Char;
4937:     } else {
4938:         int i;
4939:         uint32_t a = ucs4Char >> 11;
4940:         utf8Length = 2;
4941:         while (a) {
4942:             a >>= 5;
4943:             utf8Length++;
4944:         }
4945:         i = utf8Length;
4946:         while (--i) {
4947:            utf8Buffer[i] = (uint8_t)((ucs4Char & 0x3F) | 0x80);
4948:            ucs4Char >>= 6;
4949:         }
4950:         *utf8Buffer = (uint8_t)(0x100 - (1 << (8-utf8Length)) + ucs4Char);
4951:     }
4952:     return utf8Length;
4953: }

The function also returns 7 instead of 4.

js_OneUcs4ToUtf8Char has 3 direct callers of which I know:

  38.0.1\js\src\jsstr.cpp line 4723 (Encode):
     size_t L = js_OneUcs4ToUtf8Char(utf8buf, v);
     The buffer here is 4 bytes, and on the stack, declared on line 4722: uint8_t utf8buf[4];
     (Maximum overrun 3 bytes).

  38.0.1\js\src\ctypes\CTypes.cpp line 154 (DeflateStringToUTF8Buffer):
     utf8Len = js_OneUcs4ToUtf8Char(utf8buf, v);
     The buffer here is 6 bytes, and on the stack, declared on line 125: uint8_t utf8buf[6];
     (Maximum overun 1 bytes).

  38.0.1\js\src\vm\CharacterEncoding.cpp line 119 (DeflateStringToUTF8Buffer):
     utf8Len = js_OneUcs4ToUtf8Char(utf8buf, v);
     The buffer here is 4 bytes, and on the stack, declare on line 118: uint8_t utf8buf[4];
     (Maximum overrun 3 bytes).

In all these cases, the string appears to have external origin, though it might have been scrubbed by other code. The data written is modified by js_OneUcs4ToUtf8Char, so this bug is less exploitable than if the data were written unchanged, but any overrun of a stack-based buffer raises the possibility of exploitation by modification of other stack variables and/or return addresses.

Note that js_OneUcs4ToUtf8Char's direct callers might be inlined into their callers, so the corruption might extend beyond the direct callers. Also the effects will differ depending upon platforms, word size, and compiler version.

Interestingly, where Encode is the direct caller (and using the Windows x86 build), the bug corrupts hexBuf [0], stomping the "%" with which it was initialized. In the case I tested using the debugger, this ended up corrupting a URL, turning what should have been something like "https:%2f%2fduckduckgo.com%2f" into "https뾿FE뾿83뾿BF뾿BF뾿BF뾿BF뾿42뾿2F뾿2Fduckduckgo.com뾿2F".

Presumably this bug, as exploited using this path, might be used to cause the user's browser to load a script from an attacker's site into a different site's context, or at least to cause the browser to navigate to an unexpected site, or an expected site with an unexpected URL argument. I understand that ICANN now allows Unicode domains to be registered. https://en.wikipedia.org/wiki/Internationalized_domain_name .
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
Flags: sec-bounty?
I don't see how a value of 0xFFFFFFFF can be passed to OneUcs4ToUtf8Char.

(In reply to q1 from comment #0)
>   38.0.1\js\src\jsstr.cpp line 4723 (Encode):
>      size_t L = js_OneUcs4ToUtf8Char(utf8buf, v);
>      The buffer here is 4 bytes, and on the stack, declared on line 4722:
> uint8_t utf8buf[4];
>      (Maximum overrun 3 bytes).

Let's look at the input range of |v| here:

(1) If c < 0xD800 || c > 0xDBFF, v = c, so v has range [0, 0xFFFF].

(2) Else:

c  must be [0xD800, 0xDBFF]
c2 must be [0xDC00, 0xDFFF]

v = ((c - 0xD800) << 10) + (c2 - 0xDC00) + 0x10000;

If we try the min and max values for c and c2, we get either:

  v = (0 << 10) + 0 + 0x10000 = 0x10000
  v = (1023 << 10) + 1023 + 0x10000 = 0x10ffff

Note that 0x10ffff matches the assert in OneUcs4ToUtf8Char. That does seem like it's the maximum value we can pass here in Encode, unless I'm missing something subtle.

Can you post some JS that should fail the assert in OneUcs4ToUtf8Char in debug builds?
Flags: needinfo?(q1)
Good catch. I made an important mistake, for which I apologize: I assumed that char16_t is signed. It is *unsigned*, per C++11 s.3.9.1(5). The upshot is that the maximum value that 

4710:                v = c;

can ever assign to the uint32_t v is 0xffff, which appears to be a safe value. The same limit seems to apply in all the callers I listed, so this bug appears currently to be unrealized.

Nonetheless, js_OneUcs4ToUtf8Char should be fixed (limiting its output to <= 4 characters) to prevent this bug from ever being realized.
Flags: needinfo?(q1)
Summary: Bug in js_OneUcs4ToUtf8Char overwrites stack variable(s) in callers → Bug in js_OneUcs4ToUtf8Char could overrun buffer if callers don't range-check argument
(In reply to q1 from comment #2)
> Good catch. I made an important mistake, for which I apologize: I assumed
> that char16_t is signed.

No worries, I'm glad you filed it. Don't hesitate to report more bugs.

> Nonetheless, js_OneUcs4ToUtf8Char should be fixed (limiting its output to <=
> 4 characters) to prevent this bug from ever being realized.

Yes, this function is old code and could use some cleanup. I'll try to get to it soon.
Flags: needinfo?(jdemooij)
Jandem: you're saying this is unexploitable and can be unhidden?
Group: core-security
Summary: Bug in js_OneUcs4ToUtf8Char could overrun buffer if callers don't range-check argument → Clean up OneUcs4ToUtf8Char
Attached patch PatchSplinter Review
Some random OneUcs4ToUtf8Char cleanup. I also changed the range assert to MOZ_RELEASE_ASSERT, as an extra check.
Assignee: nobody → jdemooij
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(jdemooij)
Attachment #8639818 - Flags: review?(jwalden+bmo)
Comment on attachment 8639818 [details] [diff] [review]
Patch

Review of attachment 8639818 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +5009,3 @@
>  js::OneUcs4ToUtf8Char(uint8_t* utf8Buffer, uint32_t ucs4Char)
>  {
> +    MOZ_RELEASE_ASSERT(ucs4Char <= 0x10FFFF);

Frankly, I don't think we need an assertion *everywhere* here, including imposing a cost on release builds, on a character-by-character basis.  I would make this a normal assertion.

If we truly were worried, I would add a new type like Codepoint or something that wrapped a uint32_t, that had some sort of checks on its only being created with a value <= 0x10FFFF, or by construction enforced this.  But I'm really not that worried about it.
Attachment #8639818 - Flags: review?(jwalden+bmo) → review+
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #6)
> If we truly were worried, I would add a new type like Codepoint or something
> that wrapped a uint32_t, that had some sort of checks on its only being
> created with a value <= 0x10FFFF, or by construction enforced this.  But I'm
> really not that worried about it.

Or just replace

+    MOZ_ASSERT(utf8Length <= 4);

with a release assert or failout, which cuts somewhat the cost of a release assert on every character.
https://hg.mozilla.org/mozilla-central/rev/b4ea303cbe27
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Flags: sec-bounty? → sec-bounty-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: