Last Comment Bug 241440 - memory overflow in UTF8ToNewUnicode
: memory overflow in UTF8ToNewUnicode
Status: RESOLVED FIXED
[sg:fix]
: fixed-aviary1.0.1, fixed1.4.5, fixed1.7.6
Product: Core
Classification: Components
Component: String (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla1.8beta1
Assigned To: Christopher Aillon (sabbatical, not receiving bugmail)
:
Mentors:
Depends on:
Blocks: sg-ff101 sg-tb101 sg-moz176
  Show dependency treegraph
 
Reported: 2004-04-23 03:51 PDT by wind li
Modified: 2006-03-12 17:30 PST (History)
15 users (show)
dveditz: blocking1.7.6+
dveditz: blocking‑aviary1.0.1+
asa: blocking1.8b+
asa: blocking‑aviary1.5+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
remove mLength = 0; (639 bytes, patch)
2004-04-26 21:57 PDT, wind li
dbaron: review+
dveditz: superreview+
dveditz: approval‑aviary1.0.1+
dveditz: approval1.7.6+
dveditz: approval1.8b+
Details | Diff | Review
Fix attempt (1.87 KB, patch)
2005-02-12 16:39 PST, Christopher Aillon (sabbatical, not receiving bugmail)
dbaron: review-
Details | Diff | Review
Patch for 1.4 branch (850 bytes, patch)
2005-06-29 23:43 PDT, Leon Sha
dveditz: superreview+
Details | Diff | Review

Description wind li 2004-04-23 03:51:08 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113

in UTF8ToNewUnicode
>copy_string(aSource.BeginReading(start), aSource.EndReading(end),
>                calculator);
if aSource include none-UTF8 chachacter such as 0xFC 0xDF 
calculator.Length() == 0
>PRUnichar *result = NS_STATIC_CAST(PRUnichar*,
>        nsMemory::Alloc(sizeof(PRUnichar) * (calculator.Length() + 1)));
length of result = 2
>
>    ConvertUTF8toUTF16 converter(result);
>    copy_string(aSource.BeginReading(start), aSource.EndReading(end),
                converter).write_terminator();
if aSource is like "something(0xFC 0xDF)" result will be "something". Danger.

A walk through is to remove the mLength = 0; in
xpcom/string/public/nsUTF8Utils.h:256

Reproducible: Always
Steps to Reproduce:
1.
2.
3.
Comment 1 timeless 2004-04-23 14:22:44 PDT
why would non utf8 characters be in a class clearly labeled utf8?
Comment 2 Simon Montagu :smontagu 2004-04-23 16:30:44 PDT
(In reply to comment #1)
> why would non utf8 characters be in a class clearly labeled utf8?

It happens ;-) See bug 236941 for a recent example.
Comment 3 Christian :Biesinger (don't email me, ping me on IRC) 2004-04-24 04:14:28 PDT
that sounds more like a bug in the caller
Comment 4 wind li 2004-04-26 21:57:06 PDT
Created attachment 147104 [details] [diff] [review]
remove mLength = 0;
Comment 5 Daniel Veditz [:dveditz] 2004-06-07 09:33:14 PDT
Confirming bug. Exploitability would depend on the ability of an attacker to get
bad UTF8 into this section of code, but there are enough places where we parse
UTF8 off the web that there's probably some easy ways.

I'm not sure I like the proposed fix. Instead of returning a known error value
(0) it would return a partial length. Then we have to hope ConvertUTF8toUTF16
fails no later than CalculateUTF8Length. It seems better for UTF8toNewUnicode to
check for a zero length before allocating and deal with the error at that level.
Comment 6 Christian :Biesinger (don't email me, ping me on IRC) 2004-06-07 15:29:47 PDT
I'd expect UTF-8 off the web to go to a different place, namely intl's
nsUTF8ToUnicode...
Comment 7 chris hofmann 2005-02-02 12:16:49 PST
callion to try and work on this a bit.  can use some help
Comment 8 Christopher Aillon (sabbatical, not receiving bugmail) 2005-02-12 16:39:12 PST
Created attachment 174183 [details] [diff] [review]
Fix attempt

Another fix for the issue, handling the zero-length given by the calculator.

The question to ask is do we protect against this when we clearly state that
UTF8ToNewUnicode takes a UTF-8 string?	Sure, callers should get fixed, but we
probably shouldn't stomp memory if they hand us garbage...

I'm also wondering if the previously submitted patch (removing the mLength = 0
in the calculatr) is correct, since the converter will in fact write up to the
invalid character and hand back a string of that length.  It makes sense to
have the converter and calculator match, I think, with their output length.

I also took a look at ToNewUTF8String which I thought might have had the same
issue, though I don't believe it does after reading through its respective
Convert and Calculate classes.
Comment 9 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2005-02-15 15:52:55 PST
Comment on attachment 147104 [details] [diff] [review]
remove mLength = 0;

The converter and the calculator should definitely match.  It looks to me like
this makes them do so, so r=dbaron on this patch.  It would probably be good to
add comments to the header of all 4 classes (the two converters and the two
calculators) about that (i.e., that ConvertUTF8toUTF16 and CalculateUTF8Length
should match and that ConvertUTF16toUTF8 and CalculateUTF8Size should match).
Comment 10 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2005-02-15 15:54:17 PST
Comment on attachment 174183 [details] [diff] [review]
Fix attempt

I prefer the other approach.
Comment 11 Daniel Veditz [:dveditz] 2005-02-15 16:36:09 PST
Comment on attachment 147104 [details] [diff] [review]
remove mLength = 0;

sr=dveditz
Comment 12 Daniel Veditz [:dveditz] 2005-02-15 16:37:42 PST
Comment on attachment 147104 [details] [diff] [review]
remove mLength = 0;

a=dveditz for landing everywhere
Comment 13 Simon Montagu :smontagu 2005-02-15 22:24:10 PST
If I'm not mistaken, the convertor and calculator will still not match in the
case of an incomplete multi-byte sequence (e.g. 110xxxxx not followed by
10xxxxxx, or 1110xxxx not followed by two of 10xxxxxx, etc.).

If by "match" you just mean the convertor should never write more than the
length returned by the calculator, that's fine.
Comment 14 Simon Montagu :smontagu 2005-02-15 22:53:45 PST
(In reply to comment #9)
> (From update of attachment 147104 [details] [diff] [review] [edit])
> The converter and the calculator should definitely match.  It looks to me like
> this makes them do so, so r=dbaron on this patch.  It would probably be good to
> add comments to the header of all 4 classes (the two converters and the two
> calculators) about that (i.e., that ConvertUTF8toUTF16 and CalculateUTF8Length
> should match and that ConvertUTF16toUTF8 and CalculateUTF8Size should match).

It would also be good to have comments explaining which errors the converter
considers fatal and which it considers recoverable by emitting a REPLACEMENT
CHARACTER, and why. I assume that the distinction is between byte sequences that
look like they are in another encoding (fatal) and byte sequences that look like
UTF-8 with errors (recoverable), but that's only a guess.
Comment 15 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2005-02-17 12:11:01 PST
Actually, though, there's another case where the computed length could be less
-- which is if we actually get a 5-byte or 6-byte UTF-8 sequence.  It looks like
we'll attempt to fit it into a surrogate pair even though it doesn't fit.
Comment 16 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2005-02-17 12:12:04 PST
Er, actually, never mind.  We do check for that case.
Comment 17 David Baron :dbaron: ⌚️UTC-7 (review requests must explain patch) 2005-02-17 12:31:27 PST
Fix checked in to trunk, 2005-02-17 12:17 -0800.
Fix checked in to MOZILLA_1_7_BRANCH, 2005-02-17 12:24 -0800.
Fix checked in to AVIARY_1_0_1_20050124_BRANCH, 2005-02-17 12:29 -0800.

Thanks for the patch.
Comment 18 Jay Patel [:jay] 2005-02-17 20:24:37 PST
Does anyone have a testcase we can use to verify this fix?
Comment 19 Leon Sha 2005-06-29 23:43:50 PDT
Created attachment 187760 [details] [diff] [review]
Patch for 1.4 branch
Comment 20 Daniel Veditz [:dveditz] 2005-07-27 17:37:07 PDT
Comment on attachment 187760 [details] [diff] [review]
Patch for 1.4 branch

sr=dveditz
Comment 21 Leon Sha 2005-07-27 23:23:17 PDT
Checking in nsUTF8Utils.h;
/cvsroot/mozilla/xpcom/string/public/nsUTF8Utils.h,v  <--  nsUTF8Utils.h
new revision: 1.3.2.1; previous revision: 1.3
done

Note You need to log in before you can comment on or make changes to this bug.