Closed
Bug 600974
Opened 14 years ago
Closed 14 years ago
crash (nsUTF8ToUnicode overruns buffer by 2 bytes outputting surrogate pair) on :first-letter punctuation test in CSS 2.1 test suite
Categories
(Core :: Internationalization, defect)
Core
Internationalization
Tracking
()
RESOLVED
FIXED
mozilla2.0
People
(Reporter: dbaron, Assigned: smontagu)
References
()
Details
(Keywords: crash, css2, Whiteboard: [sg:critical?] [qa-examined-191] [qa-examined-192] [qa-needs-STR] private until 634257 is unhidden)
Attachments
(6 files, 2 obsolete files)
3.48 KB,
patch
|
emk
:
review+
dbaron
:
review+
dveditz
:
approval1.9.2.14+
dveditz
:
approval1.9.1.17+
|
Details | Diff | Splinter Review |
32.74 KB,
patch
|
Details | Diff | Splinter Review | |
12.00 KB,
text/html; charset=UTF-32BE
|
Details | |
9.21 KB,
patch
|
emk
:
review+
dveditz
:
approval1.9.2.14+
dveditz
:
approval1.9.1.17+
|
Details | Diff | Splinter Review |
40.54 KB,
patch
|
Details | Diff | Splinter Review | |
10.18 KB,
patch
|
Details | Diff | Splinter Review |
Loading the following test in the CSS 2.1 test suite (which is derived from a test I wrote):
crashes Firefox intermittently.
Steps to reproduce:
1. load http://test.csswg.org/suites/css2.1/20100917/html4/first-letter-punct-before-036.htm
2. reload the page a few times
Actual results: crash, generally after less than 10 reloads
I've tested that this occurs on both 4.0b6 and on the 2010-09-30-03-mozilla-central nightly, in both cases on 64-bit Linux.
In a debug build, loading the page reliably triggers the assertions:
###!!! ASSERTION: The Unicode decoder wrote too much data.: 'end <= NS_HTML5_STREAM_PARSER_READ_BUFFER_SIZE', file /home/dbaron/builds/mozilla-central/mozilla/parser/html/nsHtml5StreamParser.cpp, line 495
A buffer overrun causing a crash may well be exploitable (it's harder, but still possible, if it's a heap buffer overrun), so marking security-sensitive.
Reporter | ||
Updated•14 years ago
|
blocking2.0: --- → ?
Reporter | ||
Comment 1•14 years ago
|
||
I suspect this is related to incorrect handling of characters that are represented in UTF-16 as surrogate pairs, i.e., non-BMP characters.
Reporter | ||
Comment 2•14 years ago
|
||
Actually, the bug may well be that nsUTF8ToUnicode::Convert will happily write one PRUnichar past the end of the buffer when it's outputting a surrogate pair.
Reporter | ||
Comment 3•14 years ago
|
||
These steps to reproduce are specific to the HTML5 parser, which isn't on old branches, but the bug is on those old branches and it's likely possible to exploit it there.
I think it only allows overrunning the buffer by only two bytes, though.
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
Reporter | ||
Updated•14 years ago
|
Assignee: nobody → dbaron
Component: HTML: Parser → Internationalization
OS: Linux → All
QA Contact: parser → i18n
Hardware: x86_64 → All
Reporter | ||
Comment 4•14 years ago
|
||
I think this is probably the safest approach to fixing it.
I started to write a different approach on the assumption that nobody would be silly enough to repeatedly call convert with a 1 char output buffer... but then (a) again, it's possible that somebody could be and (b) fixing it by backing up introduces significant complexity that I wasn't confident I could get right.
This patch should apply all the way back to 1.9.0, with fuzz. (I actually accidentally wrote it in my 1.9.0 tree while looking to see how far back the bug went.)
Attachment #479926 -
Flags: review?(smontagu)
Reporter | ||
Updated•14 years ago
|
Summary: crash (Unicode decoder overruns HTML5 parser's buffer?) on :first-letter punctuation test in CSS 2.1 test suite → crash (nsUTF8ToUnicode overruns buffer by 2 bytes outputting surrogate pair) on :first-letter punctuation test in CSS 2.1 test suite
Reporter | ||
Comment 5•14 years ago
|
||
I should probably also add a comment here:
+ if (out >= outend) {
+ mRemainingSurrogate = lowSurr;
noting that out >= outend means this is our last iteration through the loop, but we don't want to |break| because we need the assignments below.
Reporter | ||
Comment 6•14 years ago
|
||
Attachment #479926 -
Attachment is obsolete: true
Attachment #479989 -
Flags: review?(smontagu)
Attachment #479926 -
Flags: review?(smontagu)
Reporter | ||
Comment 7•14 years ago
|
||
Actually, maybe this isn't the best approach, since it seems like the other decoders don't split surrogate pairs across output blocks.
Assignee | ||
Updated•14 years ago
|
Attachment #479989 -
Flags: review?(smontagu) → review+
Assignee | ||
Comment 8•14 years ago
|
||
Crashtest?
Assignee | ||
Comment 9•14 years ago
|
||
(In reply to comment #7)
> Actually, maybe this isn't the best approach, since it seems like the other
> decoders don't split surrogate pairs across output blocks.
I was wondering about that too. Do you think splitting the surrogate pair will cause problems?
Assignee | ||
Comment 10•14 years ago
|
||
FWIW, nsUTF32ToUnicode and nsGBKToUnicode seem to have the same bug.
Reporter | ||
Comment 11•14 years ago
|
||
nsUTF32ToUnicode doesn't look like it overruns the buffer; it just drops a character on the floor, I think.
ucvlatin/nsUCS2BEToUnicode.cpp also has correctness bug when a surrogate pair
ought to cross a block boundary in the destination buffer (it outputs a replacement char instead of the surrogates it ought to)
nsGBKToUnicode::ConvertNoBuff does indeed look broken; I hadn't yet gotten to any of the ConvertNoBuff implementations, or for that matter even figured out how nsBufferDecoderSupport::Convert worked or who derived from it.
We probably ought to have a more general audit here...
(In reply to comment #9)
> I was wondering about that too. Do you think splitting the surrogate pair will
> cause problems?
It depends what the callers do with the result. If they're just pasting the chunks back together then they should be fine. But if they look into them before doing so, there could be problems.
Comment 12•14 years ago
|
||
(In reply to comment #9)
> (In reply to comment #7)
> > Actually, maybe this isn't the best approach, since it seems like the other
> > decoders don't split surrogate pairs across output blocks.
>
> I was wondering about that too. Do you think splitting the surrogate pair will
> cause problems?
The decode loop in the HTML5 parser assumes that if the result is NS_PARTIAL_MORE_OUTPUT the decoder hasn't consumed all input bytes. This is wrong, of course, if the decoder consumed a 4-byte sequence but had room to write only the first surrogate.
http://mxr.mozilla.org/mozilla-central/source/parser/html/nsHtml5StreamParser.cpp#536
My bad. :-(
In any case, the decoder interface contract should be documented to cover this case so that users of the API know what to do.
If the decode loop in the HTML5 parser were fixed to take into account the case mentioned above, the worst that could happen with the high and low surrogates going into different output buffers would be a discretionary flush happening in the middle of a surrogate pair and the first surrogate got notified to layout before the second is available. However, if the decoder doesn't emit the high surrogate until it has consumed the entire 4-bytes UTF-8 sequence, AFAICT, there's no way for a discretionary flush to take place in the middle of a surrogate pair.
So the HTML5 parser should be OK once the decode loop itself has been fixed--assuming that the new API contract makes it possible to grab the second surrogate out of the decoder without pushing new bytes to it so that the parser can always obtain the complete surrogate pair inside the same runnable task that pushed the 4-byte sequence into the decoder.
Updated•14 years ago
|
Comment 13•14 years ago
|
||
Comment on attachment 479989 [details] [diff] [review]
patch
From the comments sounds like we don't need a separate branch patch. We'll wait for trunk landing (or at least blocking) first.
Attachment #479989 -
Flags: approval1.9.2.12?
Attachment #479989 -
Flags: approval1.9.1.15?
Reporter | ||
Comment 14•14 years ago
|
||
(In reply to comment #12)
> If the decode loop in the HTML5 parser were fixed to take into account the case
> mentioned above, the worst that could happen with the high and low surrogates
> going into different output buffers would be a discretionary flush happening in
> the middle of a surrogate pair and the first surrogate got notified to layout
> before the second is available.
I think we don't want that to happen.
Assignee | ||
Comment 15•14 years ago
|
||
Comment on attachment 479989 [details] [diff] [review]
patch
So the consensus seems to be that we don't want to do it this way. David, do you want to make a new patch, or would you rather punt to me (in view of what you said in comment 4)?
Attachment #479989 -
Flags: review-
Attachment #479989 -
Flags: review+
Attachment #479989 -
Flags: approval1.9.2.12?
Attachment #479989 -
Flags: approval1.9.1.15?
Reporter | ||
Comment 16•14 years ago
|
||
I'd be happy if you wrote the new patch.
Updated•14 years ago
|
Assignee: dbaron → smontagu
blocking2.0: ? → final+
Reporter | ||
Comment 17•14 years ago
|
||
There are a bunch of other bugs (bug 597849, bug 596874, bug 599600) at least some of the occurrences of which are probably this bug... and they're public.
Do you know when you'll have a chance to work on this? Should I take it back?
Reporter | ||
Comment 19•14 years ago
|
||
There are also public comments in bug 564008.
Assignee | ||
Comment 20•14 years ago
|
||
This is about 90% done, but I got distracted by other bugs.
Assignee | ||
Comment 21•14 years ago
|
||
Attachment #479989 -
Attachment is obsolete: true
Attachment #482587 -
Flags: review?
Assignee | ||
Comment 22•14 years ago
|
||
Reftest rather than crashtest in order to test for correct decoding.
Did you forget to request a reviewer? This needs to get on someones review queue.
Assignee | ||
Updated•14 years ago
|
Attachment #482587 -
Flags: review?(dbaron)
Attachment #482587 -
Flags: review?(VYV03354)
Attachment #482587 -
Flags: review?
Assignee | ||
Comment 24•14 years ago
|
||
(In reply to comment #23)
> Did you forget to request a reviewer? This needs to get on someones review
> queue.
Bug 372539 must die!
Reporter | ||
Comment 25•14 years ago
|
||
Comment on attachment 482587 [details] [diff] [review]
Patch
r=dbaron
Attachment #482587 -
Flags: review?(dbaron) → review+
Comment 26•14 years ago
|
||
Comment on attachment 482587 [details] [diff] [review]
Patch
I couldn't run reftests (even without patches), but the patch looks fine.
Attachment #482587 -
Flags: review?(VYV03354) → review+
Assignee | ||
Comment 27•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/4c146ed860b3
http://hg.mozilla.org/mozilla-central/rev/3a24c22ddd4f
Keeping open for other decoders per comment 10 and 11.
I'm not so convinced that we need this on branches without STR, and if we do take it we should probably take fixes for other buffer overruns that were exposed by the HTML5 parser. I remember bug 563618, and I think there were others.
Assignee | ||
Comment 28•14 years ago
|
||
Both UTF-32 and GB18030 trigger the assertions. Since HTML5 recommends against supporting UTF-32, maybe the time has come to remove it rather than invest time in fixing it. I opened bug 604317 on that.
Comment 29•14 years ago
|
||
(In reply to comment #12)
> The decode loop in the HTML5 parser assumes that if the result is
> NS_PARTIAL_MORE_OUTPUT the decoder hasn't consumed all input bytes. This is
> wrong, of course, if the decoder consumed a 4-byte sequence but had room to
> write only the first surrogate.
> http://mxr.mozilla.org/mozilla-central/source/parser/html/nsHtml5StreamParser.cpp#536
> My bad. :-(
>
> In any case, the decoder interface contract should be documented to cover this
> case so that users of the API know what to do.
What's the new API contract? How should I change nsHtml5StreamParser.cpp?
Comment 30•14 years ago
|
||
Sayre, we need to fix the UTF-32 decoder unless we remove the encoder itself.
Assignee | ||
Comment 31•14 years ago
|
||
UTF-32 testcase (asserts and crashes)
Assignee | ||
Updated•14 years ago
|
Attachment #483185 -
Attachment mime type: text/html → text/html; charset-UTF-32BE
Assignee | ||
Updated•14 years ago
|
Attachment #483185 -
Attachment mime type: text/html; charset-UTF-32BE → text/html; charset=UTF-32BE
Assignee | ||
Comment 32•14 years ago
|
||
umm, actually the UTF-32 testcase has different assertions and doesn't seem to crash, so maybe it's not such high priority after all.
###!!! ASSERTION: The Unicode decoder consumed the wrong number of bytes.: 'totalByteCount == (PRInt32)aCount', file /home/smontagu/mozwork/hgtree/mozilla/parser/html/nsHtml5StreamParser.cpp, line 540
###!!! ASSERTION: Wrong number of stream bytes written/sniffed.: 'writeCount == aLength', file /home/smontagu/mozwork/hgtree/mozilla/parser/html/nsHtml5StreamParser.cpp, line 682
Assignee | ||
Comment 33•14 years ago
|
||
(In reply to comment #29)
> What's the new API contract? How should I change nsHtml5StreamParser.cpp?
I don't think there's any change, except for the assertion that you linked to before.
* Unless there is not enough output space, this method must consume all the
* available input data!
This doesn't necessarily imply the converse, i.e. that where there isn't enough output space, not all input data will be consumed. I'll add a sentence to clarify that.
Assignee | ||
Comment 34•14 years ago
|
||
(In reply to comment #29)
> What's the new API contract? How should I change nsHtml5StreamParser.cpp?
The assertion
NS_ASSERTION(byteCount > 0 || NS_FAILED(convResult),
"The decoder consumed too few bytes but did not signal an error.");
will also need to change to cover the case where the decoder returns NS_PARTIAL_MORE_OUTPUT at the very end of the input, and the caller has to call Convert() again with no input.
Assignee | ||
Comment 35•14 years ago
|
||
Attachment #484264 -
Flags: review?(VYV03354)
Assignee | ||
Comment 36•14 years ago
|
||
Comment 37•14 years ago
|
||
Comment on attachment 484264 [details] [diff] [review]
Patch for GB18030 and UTF-16
Please set a language and a font-family to the <html> element explicitly. Otherwise reftests will fail on non-English locales because those locales may select a non-Western or sans-serif font for UTF-8 documents. At least the tests don't pass on Japanese locale unless I add lang="en" style="font-family: serif" to the <html> element of 600974-1.html and 600974-3.html.
Otherwise looks good.
Attachment #484264 -
Flags: review?(VYV03354) → review+
Assignee | ||
Comment 38•14 years ago
|
||
(In reply to comment #37)
> At least the tests
> don't pass on Japanese locale unless I add lang="en" style="font-family: serif"
> to the <html> element of 600974-1.html and 600974-3.html.
I originally did this in 600974-2.html, because it didn't pass on en-US locale, but even with the language specified it still fails. I couldn't find a solution, so I ended up giving up and making 2 reference versions, 600974-1-ref.html and 600974-2-ref.html, which are identical except for the charset. Do you have any idea how to avoid this? I'm not sure if it's a bug in font selection or expected behaviour.
Comment 39•14 years ago
|
||
Did you also specify font-family: serif? On Japanese and Chinese locale, the default font-family is sans-serif which seems to affect a font selection for comma characters (U+002C). I could match 600974-2-ref.html rednering to 600974-1-ref.html by adding style="font-family: serif".
Assignee | ||
Comment 40•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/7c6637b28e51
http://hg.mozilla.org/mozilla-central/rev/a7f4a58793c0
font-family: serif wasn't enough, but with font-size as well I was able to remove the duplicate reference rendering.
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Updated•14 years ago
|
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
Assignee | ||
Comment 41•14 years ago
|
||
wrt branches, see comment 27.
Assignee | ||
Updated•14 years ago
|
Attachment #482587 -
Flags: approval1.9.2.14?
Attachment #482587 -
Flags: approval1.9.1.17?
Assignee | ||
Updated•14 years ago
|
Attachment #484264 -
Flags: approval1.9.2.14?
Attachment #484264 -
Flags: approval1.9.1.17?
Comment 42•14 years ago
|
||
Comment on attachment 482587 [details] [diff] [review]
Patch
Approved for 1.9.2.14 and 1.9.1.17, a=dveditz for release-drivers
Attachment #482587 -
Flags: approval1.9.2.14?
Attachment #482587 -
Flags: approval1.9.2.14+
Attachment #482587 -
Flags: approval1.9.1.17?
Attachment #482587 -
Flags: approval1.9.1.17+
Comment 43•14 years ago
|
||
Comment on attachment 484264 [details] [diff] [review]
Patch for GB18030 and UTF-16
Approved for 1.9.2.14 and 1.9.1.17, a=dveditz for release-drivers
Attachment #484264 -
Flags: approval1.9.2.14?
Attachment #484264 -
Flags: approval1.9.2.14+
Attachment #484264 -
Flags: approval1.9.1.17?
Attachment #484264 -
Flags: approval1.9.1.17+
Updated•14 years ago
|
blocking1.9.1: ? → .17+
blocking1.9.2: ? → .14+
Assignee | ||
Comment 44•14 years ago
|
||
Assignee | ||
Comment 45•14 years ago
|
||
Comment 46•14 years ago
|
||
Have we seen this crash on 1.9.2 or 1.9.1? Using 1.9.2.13 and 1.9.1.16 released builds on XP with either the original URL or the testcase attached, I don't get any crashes.
Whiteboard: [sg:critical?] → [sg:critical?] [qa-examined-191] [qa-examined-192] [qa-needs-STR]
Comment 47•14 years ago
|
||
Comment 48•14 years ago
|
||
(In reply to comment #34)
> (In reply to comment #29)
> > What's the new API contract? How should I change nsHtml5StreamParser.cpp?
>
> The assertion
> NS_ASSERTION(byteCount > 0 || NS_FAILED(convResult),
> "The decoder consumed too few bytes but did not signal an error.");
> will also need to change to cover the case where the decoder returns
> NS_PARTIAL_MORE_OUTPUT at the very end of the input, and the caller has to call
> Convert() again with no input.
Filed bug 634262 as a follow-up.
Updated•14 years ago
|
Whiteboard: [sg:critical?] [qa-examined-191] [qa-examined-192] [qa-needs-STR] → [sg:critical?] [qa-examined-191] [qa-examined-192] [qa-needs-STR] private until 634257 is unhidden
Updated•14 years ago
|
Group: core-security
Updated•14 years ago
|
Target Milestone: --- → mozilla2.0
You need to log in
before you can comment on or make changes to this bug.
Description
•