Closed Bug 1309834 Opened 3 years ago Closed 3 years ago

Assertion failure: charBufferLen + aLength <= charBuffer.length (About to memcpy past the end of the buffer!), at nsHtml5TreeBuilderCppSupplement.h:959

Categories

(Core :: HTML: Parser, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox49 --- wontfix
firefox-esr45 50+ fixed
firefox50 + fixed
firefox51 + fixed
firefox52 + fixed
firefox53 --- fixed

People

(Reporter: cbook, Assigned: hsivonen)

References

(Blocks 1 open bug, )

Details

(4 keywords, Whiteboard: [adv-main50.1+][adv-esr45.6+] bug 1286911 mitigates in Fx50+)

Crash Data

Attachments

(2 files, 3 obsolete files)

Attached file bughunter stack
found via bughunter and reproduced via recent tinderbox windows debug build based on m-c

affects builds trunk to beta

Steps to reproduce:
-> https://public-artifacts.taskcluster.net/cCAdbuPmQa6p_aptbdJ9Nw/0/public/logs/live_backing.log
--> Assertion failure: charBufferLen + aLength <= charBuffer.length (About to memcpy past the end of the buffer!), at c:\builds\moz2_slave\m-cen-w32-d-000000000000000000\build\src\parser\html\nsHtml5TreeBuilderCppSupplement.h:959

Also affects opt builds https://crash-stats.mozilla.com/report/index/f66748f5-a1f6-4b81-9ab5-fb8bc2161013
[Tracking Requested - why for this release]:

real live site (well our own in this case :) and affects opt builds
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Component: DOM: Core & HTML → HTML: Parser
Warning: The log decompresses to 2 GB, so even when handled without bugs, it might take a lot of RAM if using tools that try to buffer the whole thing.
So:
 * This is text/plain, so the parser behavior is as simple as it gets. Just about the only parser opportunities for parsing bugs are around carriage return and U+0000 handling. (The file does have carriage returns.)
 * As noted, the file size is around 2 GB.
 * A 64-bit Windows build MOZ_CRASHes with IPC message size is too large.

I'm suspecting an integer overflow in buffer size calculation in 32-bit builds.
Group: core-security
OTOH, with a 32-bit build, the crash happens much sooner than what it took a 64-build to download enough data to crash.
(In reply to Henri Sivonen (:hsivonen) from comment #3)
> I'm suspecting an integer overflow in buffer size calculation in 32-bit
> builds.

Using checked integer math for buffer size calculation doesn't fix this, and on 32-bit Linux, the crash occurs when charBuffer.length is 2^28 code units, i.e. 2^29 bytes, which is in *two* doublings away from 2^31, and, therefore, should be far enough from 2^31 not to overflow the sort of math operations that should be involved.

If this isn't about integer overflow, just about the only bug opportunity left is CRLF normalization. I still haven't managed to spot a bug in the CRLF handling code, though.
(In reply to Henri Sivonen (:hsivonen) from comment #6)
> If this isn't about integer overflow,

I still can't reproduce with 64-bit builds (Linux, Windows), so this could still be an integer overflow issue even though:
 1) It doesn't look like an integer overflow in the obvious place (buffer size calculation).
 2) The sizes the trigger the assertion are not wildly off as one would except from an integer wrapping around.
Note that Bughunter has only seen this on 32 bit builds. We don't test Linux 32 bits but do test both 32 bit and 64 bit on Windows 7 and 10.
Crash stats also show all the crashes are on 32-bit x86 or 32-bit ARM.
Some observations:
 * All HTML parser code translated from Java uses int32_t for indices and length in both 32-bit and 64-bit builds.
 * AFAICT, the only buffer length calculations done with machine-dependent types in the HTML parser are the EnsureBufferSpace() calculations, but as noted in comment 6, changing those to overflow-checking operations doesn't show an overflow. Even lowering the max size checks to check for > (INT32_MAX / 2) instead of > INT32_MAX makes no difference.
Attached patch Debugging patch (obsolete) — Splinter Review
* I built with UBSan's integer overflow sanitizer enabled. Nothing caught.
 * To avoid size_t to int32_t conversions, I changed EnsureBufferSpace() methods to perform computations using CheckedInt<int32_t> instead of size_t.
 * Since INT32_MAX is not a power-of-two, it's bogus to use it there. I changed that max size cap to 0x40000000, which is the highest power-of-two that fits in int32_t.
 * I explicitly changed the last argument of memcpy to convert from int32_t to size_t before multiplying by sizeof(char16_t) in order to make sure the multiplication by two happens in the unsigned domain.
 * I explicitly changed the argument to "new (mozilla::fallible) char16_t[]" to convert to size_t in case leaving it as int32_t performed the multiplication by 2 on int32_t. (I don't think it would, but I did this to rule out the possibility.)

None of this helped. I fail to see where else 32-bit build vs. 64-bit build could make a difference. 64-bit builds use int32_t for tracking the buffer size and indices and the above should eliminate all questionable conversions from size_t and ensure that the conversions to size_t are proper.

Then I made nsHtml5TreeBuilder::EnsureBufferSpace() record the current aLength and zero a counter and I made accumulateCharacters check if its aLength plus the counter exceed the recorded length and to add its aLength to the counter. Curiously, this assertion doesn't fire. So the accumulateCharacters calls that happen between any pair of nsHtml5TreeBuilder::EnsureBufferSpace() stay within the range they are supposed to. Yet, charBufferLen has grown more that one would expect from that observation. And only on 32-bit builds.
(In reply to Henri Sivonen (:hsivonen) from comment #11)
> None of this helped.

Whoa. It seems that ./mach run was running something older than the output of the most recent ./mach build. I may have a fix already after all!
Attached patch Overfix this (obsolete) — Splinter Review
This turned out to be a matter of a failure to propagate OOM and the OOM only happened on 32-bit systems. While searching for cause from integer effects, I ended up making the code that was questionable but not the root cause better.
Attachment #8802460 - Attachment is obsolete: true
Tracking 52+ as this is sec high and affects live sites.
Attached patch Overfix this, v2 (obsolete) — Splinter Review
The root cause was that the failure of nsHtml5TreeBuilder::EnsureBufferSpace() was not propagated out of nsHtml5Tokenizer::EnsureBufferSpace().

While at it, this patch
 1) Changes the INT32_MAX limits to the max positive power of two that fits in int32_t.
 2) Explicitly converts int32_t to size_t before computing memcpy length.
 3) Explicitly converts int32_t to size_t before passing it to new[].
 4) Uses overflow-checking math when computing the buffer sizes. (With the buffer sizes we use for dealing with data from network, overflows should be impossible anyway, but I added these checks just in case, because scriptable entry points might pass in extremely large strings and it seemed safer to just check for overflow.)
Attachment #8802862 - Attachment is obsolete: true
Attachment #8803231 - Flags: review?(wchen)
Crash Signature: [@ nsHtml5TreeBuilder::accumulateCharacters]
Comment on attachment 8802862 [details] [diff] [review]
Overfix this

Review of attachment 8802862 [details] [diff] [review]:
-----------------------------------------------------------------

::: parser/html/nsHtml5TokenizerCppSupplement.h
@@ +32,5 @@
>    // Adding to the general worst case instead of only the
>    // TreeBuilder-exposed worst case to avoid re-introducing a bug when
>    // unifying the tokenizer and tree builder buffers in the future.
> +  worstCase += 2;
> +  if (!worstCase.isValid()) {

It looks like you can remove the previous two checks because CheckedInt preserves the validity/invalidity in subsequent operations, although the way you currently have it makes it a bit more obvious that the + operator is doing something extra. I'll leave it up to you.
Attachment #8802862 - Flags: review+
Attached patch Overfix this, v3Splinter Review
Attaching a patch with the review comment addressed. Re-requesting review, just in case, because the previous r+ was (accidentally?) on an obsolete patch.
Attachment #8803231 - Attachment is obsolete: true
Attachment #8803231 - Flags: review?(wchen)
Attachment #8805071 - Flags: review?(wchen)
Attachment #8805071 - Flags: review?(wchen) → review+
Group: core-security → dom-core-security
Comment on attachment 8805071 [details] [diff] [review]
Overfix this, v3

[Security approval request comment]
> How easily could an exploit be constructed based on the patch?

It's very easy to construct something that triggers the bug. (A gzip bomb easily gets around the problem of transferring a gigabyte over the network.) However, it's not clear that the bug being fixed here is exploitable in builds that contain the patch for bug 1286911. As far as disclosure goes, the concern is more about exploitability of releases 43 to 49 (inclusive), i.e. builds that have bug 1286911 but not its patch, than about builds strictly without this patch. With the affected builds, it's not clear how easy the exploit would be, precisely, but the ease of exploiting a general write-past-end-of-buffer is a good approximation.

> Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

Comments as well as the code itself do.

> Which older supported branches are affected by this flaw?

43 to 49, inclusive. (50 has the patch for bug 1286911, which likely, though I'm not certain, makes this remaining bug not a security bug.)

> If not all supported branches, which bug introduced the flaw?

Bug 489820.

> Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

The same patch applies.

> How likely is this patch to cause regressions; how much testing does it need?

Very unlikely. There should be no impact on pages that don't have text nodes larger than 2^30 UTF-16 code units.
Attachment #8805071 - Flags: sec-approval?
[Tracking Requested - why for this release]:
Can we wait for 51 to fix it? We are in RC week and, afaik, there is not active exploit of it.

NI Ritu so that she is aware.
Flags: needinfo?(rkothari)
(In reply to Sylvestre Ledru [:sylvestre] from comment #20)
> Can we wait for 51 to fix it? We are in RC week and, afaik, there is not
> active exploit of it.

It seems to me that it's OK not to land this fix on the 50 branch, since 50 already has the patch for bug 1286911 as a mitigation. However, if we believe that reason for not putting this into 50, there isn't really a reason to defer landing far into the next cycle after 50 has been released and users of 49 have had a chance to upgrade.
i think the main release where we would need this patch is the esr45 given https://bugzilla.mozilla.org/show_bug.cgi?id=1309834#c18
(In reply to Carsten Book [:Tomcat] from comment #22)
> i think the main release where we would need this patch is the esr45 given
> https://bugzilla.mozilla.org/show_bug.cgi?id=1309834#c18

The same patch that mitigates this in 50 also landed on esr45 for the point release that coincides with 50.
I'll wait for the awesome sec team (Al, Dan) to guide me on whether this is a must for 50 or not. If there is an alternate mitigation for this (as stated in comment 21) perhaps this can be a wontfix for 50. Please let me know.
Flags: needinfo?(rkothari)
Flags: needinfo?(dveditz)
Flags: needinfo?(abillings)
Given the comments about this not being as much of an issue in 50 in comment 18 and the late timeline, I think we shouldn't take this on 50.

I'll give sec-approval for the 29th, as that is two weeks into the next cycle.
Flags: needinfo?(abillings)
Whiteboard: [checkin on 11/29]
Comment on attachment 8805071 [details] [diff] [review]
Overfix this, v3

sec-approval for 11/29 checkin. We'll want it on Aurora and Beta then after it lands and, I guess, ESR45.
Attachment #8805071 - Flags: sec-approval? → sec-approval+
If the fix for bug 1286911 neuters this in Fx50 I don't think we need to wait so long to land this on mozilla-central. (I agree it doesn't need to put pushed into 50, but I'd like it on 51)
Blocks: 489820
Flags: needinfo?(dveditz)
Keywords: regression
Whiteboard: [checkin on 11/29] → [checkin on 11/29] bug 1286911 mitigates in Fx50+
Let's check this in on 11/22 then.
Whiteboard: [checkin on 11/29] bug 1286911 mitigates in Fx50+ → [checkin on 11/22] bug 1286911 mitigates in Fx50+
another real-website where this seem to happen now is 	http://hattivatti.myftp.org/aaa.txt
Whiteboard: [checkin on 11/22] bug 1286911 mitigates in Fx50+ → bug 1286911 mitigates in Fx50+
https://hg.mozilla.org/mozilla-central/rev/8704571a427f
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Group: dom-core-security → core-security-release
Hi :hsivonen,
could you please nominate this uplift to Beta51 and Aurora52?
Flags: needinfo?(hsivonen)
Comment on attachment 8805071 [details] [diff] [review]
Overfix this, v3

[Approval Request Comment]
> If this is not a sec:{high,crit} bug, please state case for ESR consideration:

sec-high

> User impact if declined: 

32-bit builds can be caused to MOZ_CRASH reliably with a gzip bomp (denial of service). If it turned out that bug 1286911 wasn't a full mitigation for worse exploits, there could be worse exploits, but bug 1286911 is likely to be a full mitigation.

> Fix Landed on Version:

53

> Risk to taking this patch (and alternatives if risky): 

The changes made in this patch are simple and related to checking integers for excessive size/overflow, so the risk should be very low.

> String or UUID changes made by this patch: 

None.

Approval Request Comment
> [Feature/regressing bug #]:

Bug 489820.

> [User impact if declined]:

32-bit builds can be caused to MOZ_CRASH reliably with a gzip bomp (denial of service). If it turned out that bug 1286911 wasn't a full mitigation for worse exploits, there could be worse exploits, but bug 1286911 is likely to be a full mitigation.

> [Describe test coverage new/current, TreeHerder]:

The fix itself isn't tested with unit tests (to avoid putting an OOMing gzip bomb in the tree). However, our test suite exercises the HTML parser heavily, so the test suite passing is a good indication that this fix didn't break something adjacent.

> [Risks and why]: 

The changes made in this patch are simple and related to checking integers for excessive size/overflow, so the risk should be very low.

> [String/UUID change made/needed]:

None.
Flags: needinfo?(hsivonen)
Attachment #8805071 - Flags: approval-mozilla-esr45?
Attachment #8805071 - Flags: approval-mozilla-beta?
Attachment #8805071 - Flags: approval-mozilla-aurora?
Comment on attachment 8805071 [details] [diff] [review]
Overfix this, v3

Fix a sec-high. Beta51+, Aurora52+, and ESR45+. Should be in 51 beta 3.
Attachment #8805071 - Flags: approval-mozilla-esr45?
Attachment #8805071 - Flags: approval-mozilla-esr45+
Attachment #8805071 - Flags: approval-mozilla-beta?
Attachment #8805071 - Flags: approval-mozilla-beta+
Attachment #8805071 - Flags: approval-mozilla-aurora?
Attachment #8805071 - Flags: approval-mozilla-aurora+
Temporarily reverted from esr45 for reasons. No action needed on your part, this'll be relanded at the appropriate time.

https://hg.mozilla.org/releases/mozilla-esr45/rev/b5cc90c208e6
Given that we landed this on esr45 for the 45.6 release, I think we should reconsider this for 50.1 as well.
Attachment #8805071 - Flags: approval-mozilla-release?
Comment on attachment 8805071 [details] [diff] [review]
Overfix this, v3

Sec-high, meets the triage bar for inclusion in 50.1.0
Attachment #8805071 - Flags: approval-mozilla-release? → approval-mozilla-release+
Whiteboard: bug 1286911 mitigates in Fx50+ → [adv-main50.1+] bug 1286911 mitigates in Fx50+
Whiteboard: [adv-main50.1+] bug 1286911 mitigates in Fx50+ → [adv-main50.1+][adv-esr45.6+] bug 1286911 mitigates in Fx50+
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.