Closed Bug 575175 Opened 14 years ago Closed 14 years ago

OOM crash [@ linux-gate.so@0x430 ] at 2wire.com/bandwidth, with "###!!! ASSERTION: The decoder signaled an error but consumed all input.: 'totalByteCount < aCount', file ../../../mozilla/parser/html/nsHtml5StreamParser.cpp, line 501"

Categories

(Core :: DOM: HTML Parser, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: dholbert, Assigned: hsivonen)

References

()

Details

(4 keywords)

Crash Data

Attachments

(1 file, 1 obsolete file)

STR:
Visit 2wire.com/bandwidth

EXPECTED RESULTS:
Page should load with a bandwidth report, within ~10 seconds.

ACTUAL RESULTS:
- In debug builds, this assertion is spammed repeatedly:
###!!! ASSERTION: The decoder signaled an error but consumed all input.: 'totalByteCount < aCount', file ../../../mozilla/parser/html/nsHtml5StreamParser.cpp, line 501
- In both debug & opt builds, firefox consumes 100% CPU & eats up all of my available memory over the period of about a minute, after which it crashes.

BROKEN:     My mozilla-central nightly:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.3a6pre) Gecko/20100626 Minefield/3.7a6pre

NOT BROKEN: My 1.9.2 nightly:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.7pre) Gecko/20100627 Namoroka/3.6.7pre
blocking2.0: --- → ?
Crash report:
http://crash-stats.mozilla.com/report/pending/ee81f30d-9937-4c40-ac12-16ccb2100627
Keywords: crashreportid
Summary: OOM Crash at 2wire.com/bandwidth, with "###!!! ASSERTION: The decoder signaled an error but consumed all input.: 'totalByteCount < aCount', file ../../../mozilla/parser/html/nsHtml5StreamParser.cpp, line 501" → OOM crash [@ linux-gate.so@0x430 ] at 2wire.com/bandwidth, with "###!!! ASSERTION: The decoder signaled an error but consumed all input.: 'totalByteCount < aCount', file ../../../mozilla/parser/html/nsHtml5StreamParser.cpp, line 501"
The assertion is:
500       NS_ASSERTION(totalByteCount < aCount,
501                    "The decoder signaled an error but consumed all input.");
http://mxr.mozilla.org/mozilla-central/source/parser/html/nsHtml5StreamParser.cpp#500

If I break there after the assertion has started failing, I get these values:
(gdb) p totalByteCount
$22 = 4294967295
(gdb) p aCount
$23 = 14600

Note that the totalByteCount value is 2^32 - 1.  It remains at this value every subsequent time I hit this line.

There also seems to be some sort of race condition involved, because if I put a condition on the breakpoint (to try to catch the first time the assertion is violated)...
> (gdb) cond 1 totalByteCount >= aCount
... then the conditional-breakpoint never gets hit (and no assertions are spammed).  Firefox does hang as long as the conditional breakpoint is enabled, though (while gdb eats up my CPU) -- and then it loads the page *successfully* when I break in and disable the breakpoint.
Looking at the contextual code (mxr link in comment 2), we're actually trapped in an infinite loop when this assertion is spamming.

Towards the beginning of the loop, we call  mUnicodeDecoder->Convert() and store the rv in "convResult".  In this case, convResult is *always* a failure value ( NS_ERROR_UNEXPECTED, from nsUTF8ToUnicode.cpp:368 [1]).  Lower down in the loop, we fall into the "NS_FAILED(convResult)" case, which *only* lets us exit the loop if totalByteCount == aCount (which it's not).  So we loop again.

Each time we go through the loop, Convert() stores a failure value in convResult, and totalByteCount and aCount remain unchanged (and not equal).  So we loop forever, AFAICT.  (allocating more and more memory with each iteration)

[1] http://mxr.mozilla.org/mozilla-central/source/intl/uconv/src/nsUTF8ToUnicode.cpp?mark=368-368#361
It appears that the converter is violating the nsIUnicodeDecoder API contract and claiming it consumed -1 bytes...
288 NS_IMETHODIMP nsUTF8ToUnicode::Convert(const char * aSrc,

362         /* Current octet is neither in the US-ASCII range nor a legal first
363          * octet of a multi-octet sequence.
364          *
365          * Return an error condition. Caller is responsible for flushing and
366          * refilling the buffer and resetting state.
367          */
368 res = NS_ERROR_UNEXPECTED;

382           /* End of the multi-octet sequence. mUcs4 now contains the final
383            * Unicode codepoint to be output
384            *
385            * Check for illegal sequences and codepoints.
386            */
387 
388           // From Unicode 3.1, non-shortest form is illegal
389           if (((2 == mBytes) && (mUcs4 < 0x0080)) ||
390               ((3 == mBytes) && (mUcs4 < 0x0800)) ||
391               ((4 == mBytes) && (mUcs4 < 0x10000)) ||
392               (4 < mBytes) ||
393               // From Unicode 3.2, surrogate characters are illegal
394               ((mUcs4 & 0xFFFFF800) == 0xD800) ||
395               // Codepoints outside the Unicode range are illegal
396               (mUcs4 > 0x10FFFF)) {
397             res = NS_ERROR_UNEXPECTED;

416         /* ((0xC0 & (*in) != 0x80) && (mState != 0))
417          * 
418          * Incomplete multi-octet sequence. Unconsume this
419          * octet and return an error condition. Caller is responsible
420          * for flushing and refilling the buffer and resetting state.
421          */
422         in--;
423         res = NS_ERROR_UNEXPECTED;


145    * @return            NS_PARTIAL_MORE_INPUT if only a partial conversion was
146    *                    done; more input is needed to continue
147    *                    NS_PARTIAL_MORE_OUTPUT if only  a partial conversion
148    *                    was done; more output space is needed to continue
149    *                    NS_ERROR_ILLEGAL_INPUT if an illegal input sequence
150    *                    was encountered and the behavior was set to "signal";
151    *                    the caller must skip over one byte, reset the decoder
152    *                    and retry.
153    */
154   NS_IMETHOD Convert(const char * aSrc, PRInt32 * aSrcLength, 
155       PRUnichar * aDest, PRInt32 * aDestLength) = 0;

Well, I guess we can change the NS_ERROR_UNEXPECTED to NS_ERROR_ILLEGAL_INPUT
This patch makes the HTML5 parser more resilient against this case, but the real bug is in the decoder.
Attachment #454470 - Flags: review?(jonas)
Moving over to i18n, because the decoder claims to have consumed -1 bytes, which doesn't conform to the interface.
Assignee: nobody → smontagu
Component: HTML: Parser → Internationalization
QA Contact: parser → i18n
hsivonen: from reading other code, it seems clear that it wants to return NS_ERROR_UNEXPECTED.

Your patch isn't right:
     nsresult convResult = mUnicodeDecoder->Convert((const char*)aFromSegment, &byteCount, mLastBuffer->getBuffer() + end, &utf16Count);
 
+    if (byteCount < 0) {
+      NS_NOTREACHED("Decoder claimed to consume a negative number of bytes.");

This will call NS_NOTREACHED for a case you know will happen. It should only declare NS_NOTREACHED if NS_SUCCEEDED(convResult).
(In reply to comment #8)
> hsivonen: from reading other code, it seems clear that it wants to return
> NS_ERROR_UNEXPECTED.

The problem is that as long as the caller can't trust that all decoders return NS_ERROR_ILLEGAL_INPUT on illegal input, callers have to check for general failure the way nsScanner does.

> Your patch isn't right:
>      nsresult convResult = mUnicodeDecoder->Convert((const char*)aFromSegment,
> &byteCount, mLastBuffer->getBuffer() + end, &utf16Count);
> 
> +    if (byteCount < 0) {
> +      NS_NOTREACHED("Decoder claimed to consume a negative number of bytes.");
> 
> This will call NS_NOTREACHED for a case you know will happen. It should only
> declare NS_NOTREACHED if NS_SUCCEEDED(convResult).

I won't know it will happen when the converter part of this bug is fixed, too.
(In reply to comment #4)
> It appears that the converter is violating the nsIUnicodeDecoder API contract
> and claiming it consumed -1 bytes...

Agreed that we should change the decoder to return NS_ERROR_ILLEGAL_INPUT instead of NS_ERROR_UNEXPECTED, but I'm not sure that returning -1 in aSrcLength is a contract violation: 

The caller is expected to "skip over one byte, reset the decoder and retry".

What is happening here is that the previous buffer ended with an incomplete multibyte sequence, and the first byte of the current buffer is not a valid continuation byte. In this case we definitely want to retry from the invalid byte and reinterpret it as ASCII range if possible (see http://www.unicode.org/reports/tr36/#Ill-Formed_Subsequences for why this is important), so we have no choice but to return -1 and let the caller retry at position 0.

(In reply to comment #8)
> +    if (byteCount < 0) {
> +      NS_NOTREACHED("Decoder claimed to consume a negative number of bytes.");
> 
> This will call NS_NOTREACHED for a case you know will happen. It should only
> declare NS_NOTREACHED if NS_SUCCEEDED(convResult).

I agree.
(In reply to comment #10)
> (In reply to comment #4)
> > It appears that the converter is violating the nsIUnicodeDecoder API contract
> > and claiming it consumed -1 bytes...
> 
> Agreed that we should change the decoder to return NS_ERROR_ILLEGAL_INPUT
> instead of NS_ERROR_UNEXPECTED, but I'm not sure that returning -1 in
> aSrcLength is a contract violation: 
> 
> The caller is expected to "skip over one byte, reset the decoder and retry".
> 
> What is happening here is that the previous buffer ended with an incomplete
> multibyte sequence, and the first byte of the current buffer is not a valid
> continuation byte. In this case we definitely want to retry from the invalid
> byte and reinterpret it as ASCII range if possible (see
> http://www.unicode.org/reports/tr36/#Ill-Formed_Subsequences for why this is
> important), 

I'm not disputing that it's the responsibility of the caller to reset the decoder and retry.

> so we have no choice but to return -1 and let the caller retry at
> position 0.

The API doc says "after conversion will contain the number of bytes read". If no bytes were consumed, I'd expect an implementation conforming to the interface to say that 0 bytes were consumed. Not that -1 bytes were consumed.

> (In reply to comment #8)
> > +    if (byteCount < 0) {
> > +      NS_NOTREACHED("Decoder claimed to consume a negative number of bytes.");
> > 
> > This will call NS_NOTREACHED for a case you know will happen. It should only
> > declare NS_NOTREACHED if NS_SUCCEEDED(convResult).
> 
> I agree.

I didn't mean the patch here to be the entire fix for this bug. I expected the UTF-8 decoder to be fixed, too. Hence, the recategorization under Internationalization.
We seem to be talking past each other here... What I'm trying to say is that in this specific case, if the decoder returns that there was an error and 0 bytes were consumed, the caller will skip over the byte at position 0 and retry from position 1, which will produce the wrong result.

Saying that we consumed -1 bytes is certainly a bit paradoxical, but what it means is "not only did we consume 0 bytes, the last byte of the previous buffer needs to be unconsumed". Is there any other way to make the caller restart from position 0?
Comment on attachment 454470 [details] [diff] [review]
Make nsHtml5StreamParser more resilient against broken nsIUnicodeDecoder implementations

(In reply to comment #12)
> Saying that we consumed -1 bytes is certainly a bit paradoxical, but what it
> means is "not only did we consume 0 bytes, the last byte of the previous buffer
> needs to be unconsumed". Is there any other way to make the caller restart from
> position 0?

Oh, I see. I'll adjust the caller code accordingly. I wish this had been documented in nsIUnicodeDecoder.h. :-( I'll adjust the interface docs, too.
Attachment #454470 - Attachment is obsolete: true
Attachment #454470 - Flags: review?(jonas)
Assignee: smontagu → hsivonen
Status: NEW → ASSIGNED
Attachment #457840 - Flags: review?
Assignee: hsivonen → nobody
Component: Internationalization → HTML: Parser
OS: Linux → All
QA Contact: i18n → parser
Hardware: x86 → All
Attachment #457840 - Flags: review? → review?(smontagu)
Something went wrong with bugzilla when requesting review, so CCing smontagu.
Priority: -- → P2
Attachment #457840 - Flags: review?(smontagu) → review+
Can we get a testcase for this?
(In reply to comment #16)
> Can we get a testcase for this?

What's a good way to test it without setting us up for random orange in case the Necko buffer boundaries vary?
Comment on attachment 457840 [details] [diff] [review]
Make the HTML5 parser deal with Unicode decoder signaling -1 consumed bytes

Requesting approval, since this is a crasher. There's no test case at the moment, because I don't know how to write a robust test for this.
Attachment #457840 - Flags: approval2.0?
(IMHO this should have blocking+ rather than approval+, since it's a regression & a crash. On the other hand, it's had blocking=? for about a month already, so maybe the approval request will get triaged faster...)
Yeah, there may have been a hole in the triage coverage. Agreed it should be blocking in any case.
blocking2.0: ? → betaN+
Comment on attachment 457840 [details] [diff] [review]
Make the HTML5 parser deal with Unicode decoder signaling -1 consumed bytes

Ah, thanks bsmedberg!  (Yeah, this bug jumped back and forth between components a bit - that may be part of the reason it was missed in triage.)

/me cancels no-longer-necessary approval request
Attachment #457840 - Flags: approval2.0?
Landed for Henri (ok'd with him in IRC):
http://hg.mozilla.org/mozilla-central/rev/ce820bc46229
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Crash Signature: [@ linux-gate.so@0x430 ]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: