Open Bug 1037108 Opened 6 years ago Updated 2 years ago

Potential buffer overflow in UTF8ToUTF16Char


(Toolkit :: Crash Reporting, defect)

Not set




(Reporter: erahm, Unassigned)


(Blocks 1 open bug)


(Keywords: coverity, sec-other, Whiteboard: [CID 1064600])

11/01/13 (change by
Action:	Undecided	→	Fix Required

If the calling routines don't protect against utf8 chars above \uFFFF then this could overflow.
Do you have more details? As filed the bug is close to useless.
Sorry this made more sense in the context of coverity. From what I can tell line 62 is doing pointer arithmetic wrong, the |* sizeof(...)| should be removed.

Full details:

int UTF8ToUTF16Char(const char *in, int in_length, uint16_t out[2]) {
 59  const UTF8 *source_ptr = reinterpret_cast<const UTF8 *>(in);
 60  const UTF8 *source_end_ptr = source_ptr + sizeof(char);
1. alias: Assigning: target_ptr = out. target_ptr now points to element 0 of out (which consists of 2 2-byte elements).
 61  uint16_t *target_ptr = out;
CID 1064600 (#1 of 1): Illegal address computation (OVERRUN)2. illegal_address: target_ptr + 4UL evaluates to an address that is at byte offset 8 of an array of 4 bytes.
 62  uint16_t *target_end_ptr = target_ptr + 2 * sizeof(uint16_t);
 63  out[0] = out[1] = 0;
 65  // Process one character at a time
3. Condition true /* 1 */, taking true branch
 66  while (1) {
 67    ConversionResult result = ConvertUTF8toUTF16(&source_ptr, source_end_ptr,
 68                                                 &target_ptr, target_end_ptr,
 69                                                 strictConversion);
4. Condition result == conversionOK, taking true branch
 71    if (result == conversionOK)
 72      return static_cast<int>(source_ptr - reinterpret_cast<const UTF8 *>(in));
 74    // Add another character to the input stream and try again
 75    source_ptr = reinterpret_cast<const UTF8 *>(in);
 76    ++source_end_ptr;
 78    if (source_end_ptr > reinterpret_cast<const UTF8 *>(in) + in_length)
 79      break;
 80  }
 82  return 0;
Source links would help make this clearer, FWIW. This is the code you're talking about:

I don't know enough about character encodings to say anything meaningful about this code, but your analysis sounds sensible.
On the plus side, this is probably difficult to meaningfully exploit because these conversion functions are only called on strings derived from the OS, like shared library names.
This doesn't really sound exploitable. Please adjust as needed.
Keywords: sec-other
Ted are you confident that this is only filenames/OS stuff?
Flags: needinfo?(ted)
Yeah, I did some DXRing to find the Linux callers and it comes down to this:*%2C+unsigned+int%2C+MDLocationDescriptor+*%29%22

So you've got mapping filenames (shared libs etc):*%2C+unsigned+int%2C+MDLocationDescriptor+*%29%22

Shared libs again:

and the output of uname:

On Mac it's only used in two places:
For the OS version:

and shared library names:

Obviously none of this code gets used on Windows since we rely on Microsoft's implementation there.

I'd say that the only way this is exploitable is through a vector that already has system-level access, with the ability to memory-map files with broken unicode filenames.
Flags: needinfo?(ted)
Group: core-security
Mentor: benjamin
Mentor: benjamin
You need to log in before you can comment on or make changes to this bug.