Closed Bug 1512758 Opened 5 years ago Closed 5 years ago

Write beyond bounds in StringBuilder::ToString()

Categories

(Core :: DOM: Core & HTML, defect, P1)

64 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla66
Tracking Status
firefox-esr60 --- unaffected
firefox63 --- unaffected
firefox64 --- wontfix
firefox65 + verified
firefox66 + verified

People

(Reporter: mozillabugs, Assigned: hsivonen)

References

Details

(4 keywords, Whiteboard: [post-critsmash-triage][adv-main65+])

Attachments

(7 files)

Attached file bug_1299_poc_2.htm
StringBuilder::ToString() (dom\base\nsContentUtils.cpp) can cause an integer overflow, with subsequent underallocation of an output buffer and writing beyond bounds. The bug became patent in FF 64b14 (or possibly an earlier beta of 64.0) when the object |BulkAppender appender| was added in place of directly appending to the output string |nsAString &aOut|. [1]

The bug is that the various StringBuilder::Append() functions accumulate a length into |uint32_t mLength| without checking for overflow (e.g, line 9195 below). If the document being serialized is sufficiently large, |mLength| overflows. When ToString() eventually is called, it initializes |BulkAppender appender| with the overflowed length (line 9220, below). If the document contains attribute values (e.g., <p a="......">) totalling > 4 GB, BulkAppender::Append(Span<const char16_t>) (line 9076-77, below), etc., memcpy() beyond the end of the buffer at |mHandle.Elements()|. (The assert on line 9071 is not present in release builds).

	9189:   void [StringBuilder::]AppendWithAttrEncode(nsAutoString* aString, uint32_t aLen)
	9190:   {
	9191:     Unit* u = AddUnit();
	9192:     u->mString = aString;
	9193:     u->mType = Unit::eStringWithEncode;
	9194:     u->mLength = aLen;
	9195:     mLength += aLen;
	9196:   }

	9217:   bool [StringBuilder::]ToString(nsAString& aOut)
	9218:   {
	9219:     nsresult rv;
	9220:     BulkAppender appender(aOut.BulkWrite(mLength, 0, true, rv));
	9221:     if (NS_FAILED(rv)) {
	9222:       return false;
	9223:     }
	9224: 
	9225:     for (StringBuilder* current = this; current; current = current->mNext) {
	9226:       uint32_t len = current->mUnits.Length();
	9227:       for (uint32_t i = 0; i < len; ++i) {
	9228:         Unit& u = current->mUnits[i];
	9229:         switch (u.mType) {
	9230:           case Unit::eAtom:
	9231:             appender.Append(*(u.mAtom));
	9232:             break;
	9233:           case Unit::eString:
	9234:             appender.Append(*(u.mString));
	9235:             break;
	9236:           case Unit::eStringWithEncode:
	9237:             EncodeAttrString(*(u.mString), appender);
	9238:             break;
	...
	9264:         }
	9265:       }
	9266:     }
	9267:     appender.Finish();
	9268:     return true;
	9269:   }

	9068:   void [BulkAppender::]Append(Span<const char16_t> aStr)
	9069:   {
	9070:     size_t len = aStr.Length();
	9071:     MOZ_ASSERT(mPosition + len <= mHandle.Length());
	9072:     // Both mHandle.Elements() and aStr.Elements() are guaranteed
	9073:     // to be non-null (by the string implementation and by Span,
	9074:     // respectively), so not checking the pointers for null before
	9075:     // memcpy does not lead to UB even if len was zero.
	9076:     memcpy(
	9077:       mHandle.Elements() + mPosition, aStr.Elements(), len * sizeof(char16_t));
	9078:     mPosition += len;
	9079:   }
		
Attached is a POC that demonstrates the overwrite. Using it by starting FF 64.0b14 x64 on a machine with >= 16GB of memory, putting the POC files on a webserver, and loading bug_1299_poc_2.htm. Now attach a debugger to the FF processes and wait for FF to crash, which will take ~10 minutes. The debugger will indicate an access violation writing beyond the end of |mHandle.Elements()| with the data from the JS setAttribute() statement on POC line 14 et seq.

------
[1] The bug was latent in older versions of FF where ToString() used nsAString::SetLength() with the overflowed value, then nsAString::Append() to add the string data, since nsAString::Append() checked the output buffer length and increased it if necessary before appending the data. BulkAppender::Append(Span<const char16_t>), etc., in contrast, assume that the buffer is long enough.
"Using it by starting " -> "Use it by starting". More sunlight!
"where ToString() used nsAString::SetLength()" -> "where ToString() used nsAString::SetCapacity()"
Flags: needinfo?(hsivonen)
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Flags: needinfo?(hsivonen)
Blocks: 1488697
Group: core-security → dom-core-security
Keywords: crash, regression
I've added a POC that achieves a similar overwrite, but uses much less memory and occurs in about 15 seconds. Use it just like the old POC, but instead load bug_1299_poc_3.htm in FF.
Attached file bug_1299_poc_3.htm
Attached file Bug 1512758.
I have trouble witnessing the original crash even in an ASAN build, because the wait times are really long, so I'm unsure if I should wait more or not. Reporter, could you, please, check if my patch fixes the problem for you?
Flags: needinfo?(mozillabugs)
(In reply to Henri Sivonen (:hsivonen) from comment #8)
> I have trouble witnessing the original crash even in an ASAN build, because
> the wait times are really long, so I'm unsure if I should wait more or not.
> Reporter, could you, please, check if my patch fixes the problem for you?

Try bug_1299_poc_3.htm and bug_1299_ajax_poc_3.htm . Put both on a webserver, start FF 64b14, attach a debugger to all the FF processes, then load bug_1299_poc_3.htm . It should overwrite in about 15 seconds on a machine with >= 8GB RAM. I haven't tried it with an ASAN build (are there any for Windows?), so I don't know how long that would take.
(In reply to Henri Sivonen (:hsivonen) from comment #8)
> I have trouble witnessing the original crash even in an ASAN build, because
> the wait times are really long, so I'm unsure if I should wait more or not.
> Reporter, could you, please, check if my patch fixes the problem for you?

The patch fixes the overwrite.

However, FF leaks nearly the entire amount of memory allocated by the POC after the code bails out, and restarting the POC allocates and leaks another ~4GB.

BTW, have you been able to reproduce the overwrite using bug_1299_poc_3.htm and bug_1299_ajax_poc_3.htm ? Make sure to put them both in the webserver root, because bug_1299_poc_3.htm uses "/bug_1299_ajax_poc_3.htm" as the destination for an HTTP POST request. FF release-build should spin for about 15 seconds before crashing. FF debug-build takes ~5 min to crash.
Flags: needinfo?(mozillabugs)
(In reply to mozillabugs from comment #10)
> However, FF leaks nearly the entire amount of memory allocated by the POC
> after the code bails out, and restarting the POC allocates and leaks another
> ~4GB.

By code inspection, the serialization code should clean up immediate when it returns--even on failure.

However, the DOM nodes that you appended may not be released until the cycle collector runs.

Can you tell if what you are seeing is the serialization code leaking or the DOM nodes not getting cycle collected? How did you determine that you had put Firefox in a state where the DOM nodes should be eligible for deletion?

> BTW, have you been able to reproduce the overwrite using bug_1299_poc_3.htm
> and bug_1299_ajax_poc_3.htm ?

Yes, using an ASAN build on x86_64 Linux.
Comment on attachment 9030423 [details]
Bug 1512758.

[Security Approval Request]

How easily could an exploit be constructed based on the patch?: Very easy. (The check-in comment paints a bulls-eye only to the extent the code change itself does.)

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Yes

Which older supported branches are affected by this flaw?: 64 and later

If not all supported branches, which bug introduced the flaw?: Bug 1488697

Do you have backports for the affected branches?: No

If not, how different, hard to create, and risky will they be?: The same patch should apply, potentially requiring whitespace changes.

How likely is this patch to cause regressions; how much testing does it need?: Extremely unlikely to cause regressions, since this does not change observable behavior when a 32-bit integer does not overflow. At this point, this doesn't really need further testing.
Attachment #9030423 - Flags: sec-approval?
Severity: normal → critical
Priority: -- → P1
(In reply to Henri Sivonen (:hsivonen) from comment #11)
> (In reply to mozillabugs from comment #10)
> > However, FF leaks nearly the entire amount of memory allocated by the POC
> > after the code bails out, and restarting the POC allocates and leaks another
> > ~4GB.
> 
> By code inspection, the serialization code should clean up immediate when it
> returns--even on failure.
> 
> However, the DOM nodes that you appended may not be released until the cycle
> collector runs.
> 
> Can you tell if what you are seeing is the serialization code leaking or the
> DOM nodes not getting cycle collected? How did you determine that you had
> put Firefox in a state where the DOM nodes should be eligible for deletion?

I am not sure which. However, everything should get deleted when I re-run the script, right? That's what doesn't appear to happen, at least according to Process Explorer's "private bytes" field: after the first run, usage is ~4.7GB. After running the script again, the usage is ~9GB, and it stays there for as long as I've let it sit, maybe 30 minutes.

> > BTW, have you been able to reproduce the overwrite using bug_1299_poc_3.htm
> > and bug_1299_ajax_poc_3.htm ?
> 
> Yes, using an ASAN build on x86_64 Linux.

Good! BTW, this there a Windows ASAN build?
BTW, *is* there a Windows ASAN build?
(In reply to Henri Sivonen (:hsivonen) from comment #15)
> (In reply to mozillabugs from comment #14)
> > BTW, *is* there a Windows ASAN build?
> 
> https://developer.mozilla.org/en-US/docs/Mozilla/Testing/
> ASan_Nightly_Project has a download link.
> https://developer.mozilla.org/en-US/docs/Mozilla/Testing/
> Firefox_and_Address_Sanitizer#Creating_local_builds_on_Windows has build
> instructions.

Thanks!
> > Can you tell if what you are seeing is the serialization code leaking or the
> > DOM nodes not getting cycle collected? How did you determine that you had
> > put Firefox in a state where the DOM nodes should be eligible for deletion?
> 
> I am not sure which. However, everything should get deleted when I re-run
> the script, right? That's what doesn't appear to happen, at least according
> to Process Explorer's "private bytes" field: after the first run, usage is
> ~4.7GB. After running the script again, the usage is ~9GB, and it stays
> there for as long as I've let it sit, maybe 30 minutes.

if you open a tab with about:memory, you can force GC and CC (and minimize).  See if those recover the memory.  Also: take a memory report after GC&CC, and attach it here.  Thanks!
Flags: needinfo?(mozillabugs)
Flags: sec-bounty?
Sec-approval+ for check in on January 2, 2019 because of the obvious nature of the fix.

We'll need a beta branch patch made and nominated at that time as well.
Whiteboard: [checkin on 1/2]
Attachment #9030423 - Flags: sec-approval? → sec-approval+
Comment on attachment 9030423 [details]
Bug 1512758.

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: Bug 1488697

User impact if declined: Attacker can overwrite heap memory. Assume all the usual consequences.

Is this code covered by automated tests?: No

Has the fix been verified in Nightly?: No

Needs manual test from QE?: Yes

If yes, steps to reproduce: Download attachments bug_1299_poc_3.htm and bug_1299_ajax_poc_3.htm and put them into a directory that gets served by an HTTP server (e.g. python -m SimpleHTTPServer 8080 ). In a 64-bit ASAN build, navigate to bug_1299_poc_3.htm as served by the HTTP server. Wait 15 minutes. Observe that ASAN did not report an out-of-bounds write.

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): Not risky, because just adds an integer overflow check.

String changes made/needed: None
Attachment #9030423 - Flags: approval-mozilla-beta?
(In reply to Al Billings [:abillings] from comment #18)
> Sec-approval+ for check in on January 2, 2019 because of the obvious nature
> of the fix.

I'll be out of office on January 2 (back on January 7), so please treat this as checkin-needed on January 2.

> We'll need a beta branch patch made and nominated at that time as well.

The same patch applies.
(In reply to Henri Sivonen (:hsivonen) from comment #19)
> If yes, steps to reproduce: Download attachments bug_1299_poc_3.htm and
> bug_1299_ajax_poc_3.htm and put them into a directory that gets served by an
> HTTP server (e.g. python -m SimpleHTTPServer 8080 ). In a 64-bit ASAN build,
> navigate to bug_1299_poc_3.htm as served by the HTTP server. Wait 15
> minutes. Observe that ASAN did not report an out-of-bounds write.

"a directory" should read "the webserver root directory", since bug_1299_poc_3.htm contains a reference to "/bug_1299_ajax_poc_3.htm". If you put them into some other directory, you'll get a false negative.
(In reply to Randell Jesup [:jesup] from comment #17)
> > > Can you tell if what you are seeing is the serialization code leaking or the
> > > DOM nodes not getting cycle collected? How did you determine that you had
> > > put Firefox in a state where the DOM nodes should be eligible for deletion?
> > 
> > I am not sure which. However, everything should get deleted when I re-run
> > the script, right? That's what doesn't appear to happen, at least according
> > to Process Explorer's "private bytes" field: after the first run, usage is
> > ~4.7GB. After running the script again, the usage is ~9GB, and it stays
> > there for as long as I've let it sit, maybe 30 minutes.
> 
> if you open a tab with about:memory, you can force GC and CC (and minimize).
> See if those recover the memory.  Also: take a memory report after GC&CC,
> and attach it here.  Thanks!

Ok, here are the results. I prepared these by starting FF with the patch, loading bug_1299_poc_3.htm, and waiting for it to complete loading. I judged it to be complete when the loading indicator in the address bar ceased moving, which also was when memory usage in the relevant content process (pid 4292) plateaued and that process stopped using significant CPU. At this point, the process used about 4.7GB of private memory.

At this point, I took a memory report, which is "Memory report 1", attached. I then clicked "minimize memory usage", took a report (omitted), and, noticing no real effect of the minimize, clicked, in sequence, "GC", "CC", and "minimize memory usage" again, waiting for each to complete before clicking the next one, and then took the second memory report, which is "Memory report 2", attached. Even after all this, the content process's memory usage was basically unchanged at 4.7GB, and it still sat around, hours after I first loaded the POC.

I then tried reloading the POC, and the (still-existing!) content process's memory usage increased to 9.3GB, but, at some point within an hour or so (I was interrupted and so can't say exactly when), the additional memory got released, leaving usage again at 4.7GB. I tried reloading the POC again, and memory usage plateaued at about 9.3GB, but once again shrank to 4.7GB, in this case within 1-2 minutes of the plateau. The process is still sitting around, about 30 minutes after the last time I loaded the POC.
Flags: needinfo?(mozillabugs)
Attached file memory report 1
Attached file memory report 2
4,519.49 MB (100.0%) -- explicit
├──4,402.36 MB (97.41%) ── heap-unclassified

So we'll need DMD or a GC/CC log to track it down, unless it's obvious from inspection
(In reply to Randell Jesup [:jesup] from comment #25)
> 4,519.49 MB (100.0%) -- explicit
> ├──4,402.36 MB (97.41%) ── heap-unclassified
> 
> So we'll need DMD or a GC/CC log to track it down, unless it's obvious from
> inspection

Can you reproduce this issue? It is completely reproducible for me.
(In reply to Randell Jesup [:jesup] from comment #25)
> 4,519.49 MB (100.0%) -- explicit
> ├──4,402.36 MB (97.41%) ── heap-unclassified
> 
> So we'll need DMD or a GC/CC log to track it down, unless it's obvious from
> inspection

Stack is truncated unfortunately, but it's all coming out of `nsIDocument::CreateComment`.

> Unreported {
>   2,083,304 blocks in heap block record 1 of 2,113
>   4,266,606,592 bytes (4,266,606,592 requested / 0 slop)
>   Individual block sizes: 2,048 x 2,083,304
>   92.26% of the heap (92.26% cumulative)
>   92.34% of unreported (92.34% cumulative)
>   Allocated at {
>     #01: replace_malloc(unsigned long) (/var/dev/erahm/mozilla-unified/memory/replace/dmd/DMD.cpp:1123)
>     #02: nsTextFragment::SetTo(char16_t const*, int, bool, bool) (/var/dev/erahm/mozilla-unified/dom/base/nsTextFragment.cpp:299)
>     #03: mozilla::dom::CharacterData::SetTextInternal(unsigned int, unsigned int, char16_t const*, unsigned int, bool, CharacterDataChangeInfo::Details*) (/var/dev/erahm/mozilla-unified/dom/base/CharacterData.cpp:264)
>     #04: mozilla::dom::CharacterData::SetText(char16_t const*, unsigned int, bool) (/var/dev/erahm/mozilla-unified/dom/base/CharacterData.cpp:554)
>     #05: nsIDocument::CreateComment(nsTSubstring<char16_t> const&) const (/var/dev/erahm/mozilla-unified/obj-x86_64-pc-linux-gnu-clang-7-debug/dist/include/mozilla/RefPtr.h:231)
>     #06: mozilla::dom::Document_Binding::createComment(JSContext*, JS::Handle<JSObject*>, nsIDocument*, JSJitMethodCallArgs const&) (/var/dev/erahm/mozilla-unified/obj-x86_64-pc-linux-gnu-clang-7-debug/dist/include/mozilla/AlreadyAddRefed.h:145)
>     #07: ??? (???:???)
>   }
> }
https://hg.mozilla.org/mozilla-central/rev/e0d35b7a804e
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Comment on attachment 9030423 [details]
Bug 1512758.

[Triage Comment]
Fixes a sec-crit. Approved for 65.0b8.
Attachment #9030423 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+
Whiteboard: [post-critsmash-triage]
(In reply to Eric Rahm [:erahm] from comment #27)
> (In reply to Randell Jesup [:jesup] from comment #25)
> > 4,519.49 MB (100.0%) -- explicit
> > ├──4,402.36 MB (97.41%) ── heap-unclassified
> > 
> > So we'll need DMD or a GC/CC log to track it down, unless it's obvious from
> > inspection
> 
> Stack is truncated unfortunately, but it's all coming out of
> `nsIDocument::CreateComment`.

Is this actionable enough to make sense to track as a separate bug?
Flags: needinfo?(erahm)
Flags: sec-bounty? → sec-bounty+

(In reply to Henri Sivonen (:hsivonen) from comment #32)

(In reply to Eric Rahm [:erahm] from comment #27)

(In reply to Randell Jesup [:jesup] from comment #25)

4,519.49 MB (100.0%) -- explicit
├──4,402.36 MB (97.41%) ── heap-unclassified

So we'll need DMD or a GC/CC log to track it down, unless it's obvious from
inspection

Stack is truncated unfortunately, but it's all coming out of
nsIDocument::CreateComment.

Is this actionable enough to make sense to track as a separate bug?

It's definitely reproducible, so yes. I'm not sure we can post the STR until this fix rides the trains though.

Flags: needinfo?(erahm)

(In reply to Eric Rahm [:erahm] from comment #33)

It's definitely reproducible, so yes. I'm not sure we can post the STR until this fix rides the trains though.

I agree on both points.

Extracted bug 1518461 as a spin-off so that the memory issue doesn't get forgotten now that the security issue is in the FIXED state.

I've managed to reproduce the issue using the "bug_1299_poc_3.htm" test case and reproduced on affected builds as follows:
Windows10/Ubuntu 18.04
-65.0b3 2018-12-10
-64.0 2018-12-06
-65.0a1 Assan 2018-12-01 (Win 10)

(the tab crashed within 5 minutes for all the above instances).

then proceeded verifying the bug is not reproducible with the fix on:
-65.0b9 2019-01-07
-66.0a1 2019-01-09
For Windows 10 x64 and Ubuntu 18.04 x64.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Duplicate of this bug: 1518958

Please CC me on that bug.

Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main65+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: