575175 - OOM crash [@ linux-gate.so@0x430 ] at 2wire.com/bandwidth, with "###!!! ASSERTION: The decoder signaled an error but consumed all input.: 'totalByteCount < aCount', file ../../../mozilla/parser/html/nsHtml5StreamParser.cpp, line 501"

Reporter

Description

•

15 years ago

STR: Visit 2wire.com/bandwidth EXPECTED RESULTS: Page should load with a bandwidth report, within ~10 seconds. ACTUAL RESULTS: - In debug builds, this assertion is spammed repeatedly: ###!!! ASSERTION: The decoder signaled an error but consumed all input.: 'totalByteCount < aCount', file ../../../mozilla/parser/html/nsHtml5StreamParser.cpp, line 501 - In both debug & opt builds, firefox consumes 100% CPU & eats up all of my available memory over the period of about a minute, after which it crashes. BROKEN: My mozilla-central nightly: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.3a6pre) Gecko/20100626 Minefield/3.7a6pre NOT BROKEN: My 1.9.2 nightly: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.7pre) Gecko/20100627 Namoroka/3.6.7pre

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

blocking2.0: --- → ?

Daniel Holbert [:dholbert]

Reporter

Comment 1

•

15 years ago

Crash report: http://crash-stats.mozilla.com/report/pending/ee81f30d-9937-4c40-ac12-16ccb2100627

Keywords: crashreportid

Summary: OOM Crash at 2wire.com/bandwidth, with "###!!! ASSERTION: The decoder signaled an error but consumed all input.: 'totalByteCount < aCount', file ../../../mozilla/parser/html/nsHtml5StreamParser.cpp, line 501" → OOM crash [@ linux-gate.so@0x430 ] at 2wire.com/bandwidth, with "###!!! ASSERTION: The decoder signaled an error but consumed all input.: 'totalByteCount < aCount', file ../../../mozilla/parser/html/nsHtml5StreamParser.cpp, line 501"

Daniel Holbert [:dholbert]

Reporter

Updated

•

15 years ago

URL: 2wire.com/bandwidth → http://2wire.com/bandwidth

Daniel Holbert [:dholbert]

Reporter

Comment 2

•

15 years ago

The assertion is: 500 NS_ASSERTION(totalByteCount < aCount, 501 "The decoder signaled an error but consumed all input."); http://mxr.mozilla.org/mozilla-central/source/parser/html/nsHtml5StreamParser.cpp#500 If I break there after the assertion has started failing, I get these values: (gdb) p totalByteCount $22 = 4294967295 (gdb) p aCount $23 = 14600 Note that the totalByteCount value is 2^32 - 1. It remains at this value every subsequent time I hit this line. There also seems to be some sort of race condition involved, because if I put a condition on the breakpoint (to try to catch the first time the assertion is violated)... > (gdb) cond 1 totalByteCount >= aCount ... then the conditional-breakpoint never gets hit (and no assertions are spammed). Firefox does hang as long as the conditional breakpoint is enabled, though (while gdb eats up my CPU) -- and then it loads the page *successfully* when I break in and disable the breakpoint.

Daniel Holbert [:dholbert]

Reporter

Comment 3

•

15 years ago

Looking at the contextual code (mxr link in comment 2), we're actually trapped in an infinite loop when this assertion is spamming. Towards the beginning of the loop, we call mUnicodeDecoder->Convert() and store the rv in "convResult". In this case, convResult is *always* a failure value ( NS_ERROR_UNEXPECTED, from nsUTF8ToUnicode.cpp:368 [1]). Lower down in the loop, we fall into the "NS_FAILED(convResult)" case, which *only* lets us exit the loop if totalByteCount == aCount (which it's not). So we loop again. Each time we go through the loop, Convert() stores a failure value in convResult, and totalByteCount and aCount remain unchanged (and not equal). So we loop forever, AFAICT. (allocating more and more memory with each iteration) [1] http://mxr.mozilla.org/mozilla-central/source/intl/uconv/src/nsUTF8ToUnicode.cpp?mark=368-368#361

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Comment 4

•

15 years ago

It appears that the converter is violating the nsIUnicodeDecoder API contract and claiming it consumed -1 bytes...

timeless

Comment 5

•

15 years ago

288 NS_IMETHODIMP nsUTF8ToUnicode::Convert(const char * aSrc, 362 /* Current octet is neither in the US-ASCII range nor a legal first 363 * octet of a multi-octet sequence. 364 * 365 * Return an error condition. Caller is responsible for flushing and 366 * refilling the buffer and resetting state. 367 */ 368 res = NS_ERROR_UNEXPECTED; 382 /* End of the multi-octet sequence. mUcs4 now contains the final 383 * Unicode codepoint to be output 384 * 385 * Check for illegal sequences and codepoints. 386 */ 387 388 // From Unicode 3.1, non-shortest form is illegal 389 if (((2 == mBytes) && (mUcs4 < 0x0080)) || 390 ((3 == mBytes) && (mUcs4 < 0x0800)) || 391 ((4 == mBytes) && (mUcs4 < 0x10000)) || 392 (4 < mBytes) || 393 // From Unicode 3.2, surrogate characters are illegal 394 ((mUcs4 & 0xFFFFF800) == 0xD800) || 395 // Codepoints outside the Unicode range are illegal 396 (mUcs4 > 0x10FFFF)) { 397 res = NS_ERROR_UNEXPECTED; 416 /* ((0xC0 & (*in) != 0x80) && (mState != 0)) 417 * 418 * Incomplete multi-octet sequence. Unconsume this 419 * octet and return an error condition. Caller is responsible 420 * for flushing and refilling the buffer and resetting state. 421 */ 422 in--; 423 res = NS_ERROR_UNEXPECTED; 145 * @return NS_PARTIAL_MORE_INPUT if only a partial conversion was 146 * done; more input is needed to continue 147 * NS_PARTIAL_MORE_OUTPUT if only a partial conversion 148 * was done; more output space is needed to continue 149 * NS_ERROR_ILLEGAL_INPUT if an illegal input sequence 150 * was encountered and the behavior was set to "signal"; 151 * the caller must skip over one byte, reset the decoder 152 * and retry. 153 */ 154 NS_IMETHOD Convert(const char * aSrc, PRInt32 * aSrcLength, 155 PRUnichar * aDest, PRInt32 * aDestLength) = 0; Well, I guess we can change the NS_ERROR_UNEXPECTED to NS_ERROR_ILLEGAL_INPUT

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Comment 6

•

15 years ago

Attached patch Make nsHtml5StreamParser more resilient against broken nsIUnicodeDecoder implementations (obsolete) — Details — Splinter Review

This patch makes the HTML5 parser more resilient against this case, but the real bug is in the decoder.

Attachment #454470 - Flags: review?(jonas)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Comment 7

•

15 years ago

Moving over to i18n, because the decoder claims to have consumed -1 bytes, which doesn't conform to the interface.

Assignee: nobody → smontagu

Component: HTML: Parser → Internationalization

QA Contact: parser → i18n

timeless

Comment 8

•

15 years ago

hsivonen: from reading other code, it seems clear that it wants to return NS_ERROR_UNEXPECTED. Your patch isn't right: nsresult convResult = mUnicodeDecoder->Convert((const char*)aFromSegment, &byteCount, mLastBuffer->getBuffer() + end, &utf16Count); + if (byteCount < 0) { + NS_NOTREACHED("Decoder claimed to consume a negative number of bytes."); This will call NS_NOTREACHED for a case you know will happen. It should only declare NS_NOTREACHED if NS_SUCCEEDED(convResult).

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Comment 9

•

15 years ago

(In reply to comment #8) > hsivonen: from reading other code, it seems clear that it wants to return > NS_ERROR_UNEXPECTED. The problem is that as long as the caller can't trust that all decoders return NS_ERROR_ILLEGAL_INPUT on illegal input, callers have to check for general failure the way nsScanner does. > Your patch isn't right: > nsresult convResult = mUnicodeDecoder->Convert((const char*)aFromSegment, > &byteCount, mLastBuffer->getBuffer() + end, &utf16Count); > > + if (byteCount < 0) { > + NS_NOTREACHED("Decoder claimed to consume a negative number of bytes."); > > This will call NS_NOTREACHED for a case you know will happen. It should only > declare NS_NOTREACHED if NS_SUCCEEDED(convResult). I won't know it will happen when the converter part of this bug is fixed, too.

Simon Montagu :smontagu

Comment 10

•

15 years ago

(In reply to comment #4) > It appears that the converter is violating the nsIUnicodeDecoder API contract > and claiming it consumed -1 bytes... Agreed that we should change the decoder to return NS_ERROR_ILLEGAL_INPUT instead of NS_ERROR_UNEXPECTED, but I'm not sure that returning -1 in aSrcLength is a contract violation: The caller is expected to "skip over one byte, reset the decoder and retry". What is happening here is that the previous buffer ended with an incomplete multibyte sequence, and the first byte of the current buffer is not a valid continuation byte. In this case we definitely want to retry from the invalid byte and reinterpret it as ASCII range if possible (see http://www.unicode.org/reports/tr36/#Ill-Formed_Subsequences for why this is important), so we have no choice but to return -1 and let the caller retry at position 0. (In reply to comment #8) > + if (byteCount < 0) { > + NS_NOTREACHED("Decoder claimed to consume a negative number of bytes."); > > This will call NS_NOTREACHED for a case you know will happen. It should only > declare NS_NOTREACHED if NS_SUCCEEDED(convResult). I agree.

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Comment 11

•

15 years ago

(In reply to comment #10) > (In reply to comment #4) > > It appears that the converter is violating the nsIUnicodeDecoder API contract > > and claiming it consumed -1 bytes... > > Agreed that we should change the decoder to return NS_ERROR_ILLEGAL_INPUT > instead of NS_ERROR_UNEXPECTED, but I'm not sure that returning -1 in > aSrcLength is a contract violation: > > The caller is expected to "skip over one byte, reset the decoder and retry". > > What is happening here is that the previous buffer ended with an incomplete > multibyte sequence, and the first byte of the current buffer is not a valid > continuation byte. In this case we definitely want to retry from the invalid > byte and reinterpret it as ASCII range if possible (see > http://www.unicode.org/reports/tr36/#Ill-Formed_Subsequences for why this is > important), I'm not disputing that it's the responsibility of the caller to reset the decoder and retry. > so we have no choice but to return -1 and let the caller retry at > position 0. The API doc says "after conversion will contain the number of bytes read". If no bytes were consumed, I'd expect an implementation conforming to the interface to say that 0 bytes were consumed. Not that -1 bytes were consumed. > (In reply to comment #8) > > + if (byteCount < 0) { > > + NS_NOTREACHED("Decoder claimed to consume a negative number of bytes."); > > > > This will call NS_NOTREACHED for a case you know will happen. It should only > > declare NS_NOTREACHED if NS_SUCCEEDED(convResult). > > I agree. I didn't mean the patch here to be the entire fix for this bug. I expected the UTF-8 decoder to be fixed, too. Hence, the recategorization under Internationalization.

Simon Montagu :smontagu

Comment 12

•

15 years ago

We seem to be talking past each other here... What I'm trying to say is that in this specific case, if the decoder returns that there was an error and 0 bytes were consumed, the caller will skip over the byte at position 0 and retry from position 1, which will produce the wrong result. Saying that we consumed -1 bytes is certainly a bit paradoxical, but what it means is "not only did we consume 0 bytes, the last byte of the previous buffer needs to be unconsumed". Is there any other way to make the caller restart from position 0?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Comment 13

•

15 years ago

Comment on attachment 454470 [details] [diff] [review] Make nsHtml5StreamParser more resilient against broken nsIUnicodeDecoder implementations (In reply to comment #12) > Saying that we consumed -1 bytes is certainly a bit paradoxical, but what it > means is "not only did we consume 0 bytes, the last byte of the previous buffer > needs to be unconsumed". Is there any other way to make the caller restart from > position 0? Oh, I see. I'll adjust the caller code accordingly. I wish this had been documented in nsIUnicodeDecoder.h. :-( I'll adjust the interface docs, too.

Attachment #454470 - Attachment is obsolete: true

Attachment #454470 - Flags: review?(jonas)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Comment 14

•

15 years ago

Attached patch Make the HTML5 parser deal with Unicode decoder signaling -1 consumed bytes — Details — Splinter Review

Assignee: smontagu → hsivonen

Status: NEW → ASSIGNED

Attachment #457840 - Flags: review?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Updated

•

15 years ago

Assignee: hsivonen → nobody

Component: Internationalization → HTML: Parser

OS: Linux → All

QA Contact: i18n → parser

Hardware: x86 → All

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Updated

•

15 years ago

Attachment #457840 - Flags: review? → review?(smontagu)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Comment 15

•

15 years ago

Something went wrong with bugzilla when requesting review, so CCing smontagu.

Priority: -- → P2

Simon Montagu :smontagu

Updated

•

15 years ago

Attachment #457840 - Flags: review?(smontagu) → review+

Simon Montagu :smontagu

Comment 16

•

15 years ago

Can we get a testcase for this?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Comment 17

•

15 years ago

(In reply to comment #16) > Can we get a testcase for this? What's a good way to test it without setting us up for random orange in case the Necko buffer boundaries vary?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05)

Assignee

Comment 18

•

15 years ago

Comment on attachment 457840 [details] [diff] [review] Make the HTML5 parser deal with Unicode decoder signaling -1 consumed bytes Requesting approval, since this is a crasher. There's no test case at the moment, because I don't know how to write a robust test for this.

Attachment #457840 - Flags: approval2.0?

Daniel Holbert [:dholbert]

Reporter

Comment 19

•

15 years ago

(IMHO this should have blocking+ rather than approval+, since it's a regression & a crash. On the other hand, it's had blocking=? for about a month already, so maybe the approval request will get triaged faster...)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

15 years ago

Assignee: nobody → hsivonen

Benjamin Smedberg

Comment 20

•

15 years ago

Yeah, there may have been a hole in the triage coverage. Agreed it should be blocking in any case.

blocking2.0: ? → betaN+

Daniel Holbert [:dholbert]

Reporter

Comment 21

•

15 years ago

Comment on attachment 457840 [details] [diff] [review] Make the HTML5 parser deal with Unicode decoder signaling -1 consumed bytes Ah, thanks bsmedberg! (Yeah, this bug jumped back and forth between components a bit - that may be part of the reason it was missed in triage.) /me cancels no-longer-necessary approval request

Attachment #457840 - Flags: approval2.0?

Daniel Holbert [:dholbert]

Reporter

Comment 22

•

15 years ago

Landed for Henri (ok'd with him in IRC): http://hg.mozilla.org/mozilla-central/rev/ce820bc46229

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

14 years ago

Crash Signature: [@ linux-gate.so@0x430 ]

Make nsHtml5StreamParser more resilient against broken nsIUnicodeDecoder implementations 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05) 2.27 KB, patch		Details \| Diff \| Splinter Review
Make the HTML5 parser deal with Unicode decoder signaling -1 consumed bytes 15 years ago Henri Sivonen (:hsivonen) (away from Bugzilla until 2026-01-05) 9.94 KB, patch	smontagu : review+	Details \| Diff \| Splinter Review