Closed Bug 1437600 Opened 6 years ago Closed 6 years ago

mprotect unused LifoAlloc chunks, and chunk content

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- fixed

People

(Reporter: nbp, Assigned: nbp)

References

(Blocks 1 open bug)

Details

(Keywords: sec-want, Whiteboard: [post-critsmash-triage][adv-main62-])

Attachments

(1 file, 1 obsolete file)

This bug is about adding mitigations to protect our LifoAlloc chunks from being manipulated while they are still held either dead or alive as part of a LifoAlloc.

While they are dead, the chunks are located in the list of Dead chunks since Bug 1405795.  We cannot protect the first page of the chunks as we are using the first words of it to keep them in a single linked list.  On the other hand, we can protect any remaining pages of each chunk until we allocated past the first page.

While they are alive, to be more precise when they are being transferred from one thread to another, we can protect all the pages of all chunks, as the linked list should not be changed.
This patch adds best-effort uses of mprotect / VirtualProtect in order to
prevent mutations on dead zones of the BumpChunk.

The goal of this patch is to catch memory corruption of the LifoAlloc content,
and the corrupting stack reported on crash-stat.

Unfortunately this patch is a best-effort as the linked list of chunks is
stored in the BumpChunks. Thus, at the end, due to the page-size restrictions,
only the chunks allocated by Ion compilations are being protected because all
others are too small.

In addition, this adds a mechanism such that Ion compilation which are scheduled
for off-thread compilations are being completely flagged as non-writtable, while
the data is waiting to be processed off-thread. (~20 beta crashes per day)
Attachment #8951046 - Flags: review?(luke)
Generally sounds like a good idea and so far patch looks good.  Looks like this is diagnostic builds-only, but, because nightly perf still matters to a degree: have you checked the effect of this on performance?
Flags: needinfo?(nicolas.b.pierron)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ada5805e9709e014757d805e1e84fe961c098435

There is a bunch of assertion in the SM-rust build, which I have no idea how to test. Nick, any idea?

And a bunch of failures on Windows debug JIT tests failures, which I will try to investigate.

(In reply to Luke Wagner [:luke] from comment #2)
> Generally sounds like a good idea and so far patch looks good.  Looks like
> this is diagnostic builds-only, but, because nightly perf still matters to a
> degree: have you checked the effect of this on performance?

I checked on Octane, and there is at most a ~1% slowdown from the testing I done locally.
Flags: needinfo?(nicolas.b.pierron) → needinfo?(nfitzgerald)
Comment on attachment 8951046 [details] [diff] [review]
Use mprotect to prevent mutations of inaccessible regions.

Review of attachment 8951046 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ds/LifoAlloc.cpp
@@ +79,5 @@
> +
> +# ifdef XP_WIN
> +
> +static
> +void setReadOnly(uint8_t* addr, ptrdiff_t sz)

You should be able to use the "GC" functions at [1] for this.

[1] https://dxr.mozilla.org/mozilla-central/source/js/src/gc/Memory.h#46
(In reply to Nicolas B. Pierron [:nbp] from comment #3)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=ada5805e9709e014757d805e1e84fe961c098435
> 
> There is a bunch of assertion in the SM-rust build, which I have no idea how
> to test. Nick, any idea?

In the logs, I'm seeing:

    Assertion failure: tree.empty(), at /builds/worker/workspace/build/src/js/src/ds/MemoryProtectionExceptionHandler.cpp:71

You should be able to reproduce by:

    $ cd js/rust
    $ cargo test --features debugmozjs

That will also print the binary that cargo ends up invoking (target/debug/...) which is probably easier for passing to rr or gdb.

Let me know if you need an extra hand reproducing.
Flags: needinfo?(nfitzgerald)
Priority: -- → P1
By the way, I should confirm this locally, but IIRC decommitting pages on Windows also makes them inaccessible. If so, perhaps we can kill two birds with one stone here and also reduce our memory footprint.

On Windows, instead of using VirtualProtect we'd just call VirtualFree with MEM_DECOMMIT, and call VirtualAlloc with MEM_COMMIT when we need the pages again (note: this will also zero the pages).

On OSX we could call madvise with MADV_FREE, then mprotect with PROT_NONE - this matches what jemalloc does to reduce the memory footprint on OSX when we take a measurement in about:memory.

On Linux, madvise with MADV_DONTNEED followed by mprotect with PROT_NONE should be roughly equivalent (though MADV_REMOVE and MADV_FREE are also supported on newer kernels).
Depends on: 1444081
Comment on attachment 8951046 [details] [diff] [review]
Use mprotect to prevent mutations of inaccessible regions.

Review of attachment 8951046 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay; forgot to look back at this when I thought a new patch was coming up.

::: js/src/ds/LifoAlloc.cpp
@@ +59,5 @@
>  
> +#ifdef LIFO_CHUNK_PROTECT
> +
> +static
> +const uint8_t* AlignPtrUp(const uint8_t* ptr, uintptr_t align) {

nit, SM style is:

static const uint8_t*
AlignPtrUp(...)
{

Here and below many times

@@ +280,5 @@
>      if (!newChunk)
>          return false;
>      size_t size = newChunk->computedSizeOfIncludingThis();
>      chunks_.append(mozilla::Move(newChunk));
> +    // chunks_.last()->setRWUntil(Loc::End);

This line is commented out: should either be removed or uncommented.
Attachment #8951046 - Flags: review?(luke) → review+
Blocks: 1444930
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #6)
> By the way, I should confirm this locally, but IIRC decommitting pages on
> Windows also makes them inaccessible. If so, perhaps we can kill two birds
> with one stone here and also reduce our memory footprint.
> 
> On Windows, instead of using VirtualProtect we'd just call VirtualFree with
> MEM_DECOMMIT, and call VirtualAlloc with MEM_COMMIT when we need the pages
> again (note: this will also zero the pages).

I will open Bug 1444930 a follow-up bug for it.
(In reply to Luke Wagner [:luke] from comment #7)
> @@ +280,5 @@
> >      if (!newChunk)
> >          return false;
> >      size_t size = newChunk->computedSizeOfIncludingThis();
> >      chunks_.append(mozilla::Move(newChunk));
> > +    // chunks_.last()->setRWUntil(Loc::End);
> 
> This line is commented out: should either be removed or uncommented.

I replaced it by a better comment.  This comment was meant to say that this line is not mandatorry because new allocated BumpChunk already have the intended protection.  I wanted to have a comment to explicit the state in the BumpChunk is.
This bug is held back from landing because some test cases are leaking the world, and this cause the non-fatal all-caps warning message[1] to become fatal.

One solution would be to disable this assertion, In the mean time I will open follow-up bugs to get these warning fixed.

[1] WARNING: YOU ARE LEAKING THE WORLD … AT JS_ShutDown TIME.  FIX THIS!
Attach the patch such that others can test with it.

Delta:
 - Use gc/Memory.h (comment 4)
 - Apply nits (comment 7)
Attachment #8958570 - Flags: review+
Attachment #8951046 - Attachment is obsolete: true
Depends on: 1445392
Depends on: 1445947
Depends on: 1445955
I am still unable to land this patch for the moment because of reproducible (on-try) failures. (Windows 2012 debug, and Windows 10 x64 debug)
I tried to setup a Windows 10 VM to reproduce this issue but failed.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef139a93e8e84235ac41a63da07ea6784f7fa347

The test case is within the Jit tests, which should be actionable, if we manage to reproduce it, and might be related to the bug that we are trying to capture here. (I have no way to tell based on the unexpected exit code, AFAIK)

Test case: jit-test\tests\auto-regress\bug1315943.js
Ted can you give it a try, see if you can reproduce this issue? (comment 12)
Flags: needinfo?(tcampbell)
I can reproduce this locally (on at least debug-opt builds). The crash is a due to a stack overflow in js::SplayTree::checkCoherency. Looking at the frames, it seems that node->right is always null. I am suspecting there is an issue with the SplayTree balancing and we may not have noticed before due to smaller trees.
Flags: needinfo?(tcampbell)
Depends on: 1468984
Comment on attachment 8958570 [details] [diff] [review]
Use mprotect to prevent mutations of inaccessible regions.

[Security approval request comment]
> How easily could an exploit be constructed based on the patch?

I have no idea how one should do.

> Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

The patch highlight that we are trying to investigate a memory corruption issue, it does not tell what come is corrupting it. (this is what we are trying to figure that out).

> Which older supported branches are affected by this flaw?

since at least 2 years.

> If not all supported branches, which bug introduced the flaw?

We do not know …

> Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

This patch should apply cleanly but depends on a few fixes and should ride the train.

> How likely is this patch to cause regressions; how much testing does it need?

This might cause performance regression, but nothing noticed locally.
Attachment #8958570 - Flags: sec-approval?
Comment on attachment 8958570 [details] [diff] [review]
Use mprotect to prevent mutations of inaccessible regions.

This is a sec-want bug. It doesn't need sec-approval to land on trunk.
Attachment #8958570 - Flags: sec-approval?
https://hg.mozilla.org/mozilla-central/rev/44a69a4ebc09
Group: javascript-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Depends on: 1470115
Depends on: 1470123
This crash stats query should work for tracking the crashes:
https://crash-stats.mozilla.com/search/?moz_crash_reason=~Tried%20to%20access%20a%20protected%20region%21&build_id=%3E%3D20180621013659&version=62.0a1&version=61.0b&date=%3E%3D2018-06-20T02%3A00%3A00.000Z&date=%3C2019-06-19T02%3A00%3A00.000Z

Looks like there's already one crash with a stack that implicates GC type information sweeping. Given the noise floor of crashes from bad hardware it probably makes sense to wait and see if a pattern emerges though.
Depends on: 1470732
Depends on: 1470935
Depends on: 1470979
Depends on: 1471234
Depends on: 1471589
Depends on: 1473618
Depends on: 1472638
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Depends on: 1474904
Depends on: 1475186
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main62-]
Depends on: 1488386
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: