Closed
Bug 1437600
Opened 7 years ago
Closed 6 years ago
mprotect unused LifoAlloc chunks, and chunk content
Categories
(Core :: JavaScript Engine, defect, P1)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla62
People
(Reporter: nbp, Assigned: nbp)
References
Details
(Keywords: sec-want, Whiteboard: [post-critsmash-triage][adv-main62-])
Attachments
(1 file, 1 obsolete file)
22.50 KB,
patch
|
nbp
:
review+
|
Details | Diff | Splinter Review |
This bug is about adding mitigations to protect our LifoAlloc chunks from being manipulated while they are still held either dead or alive as part of a LifoAlloc.
While they are dead, the chunks are located in the list of Dead chunks since Bug 1405795. We cannot protect the first page of the chunks as we are using the first words of it to keep them in a single linked list. On the other hand, we can protect any remaining pages of each chunk until we allocated past the first page.
While they are alive, to be more precise when they are being transferred from one thread to another, we can protect all the pages of all chunks, as the linked list should not be changed.
Assignee | ||
Comment 1•7 years ago
|
||
This patch adds best-effort uses of mprotect / VirtualProtect in order to
prevent mutations on dead zones of the BumpChunk.
The goal of this patch is to catch memory corruption of the LifoAlloc content,
and the corrupting stack reported on crash-stat.
Unfortunately this patch is a best-effort as the linked list of chunks is
stored in the BumpChunks. Thus, at the end, due to the page-size restrictions,
only the chunks allocated by Ion compilations are being protected because all
others are too small.
In addition, this adds a mechanism such that Ion compilation which are scheduled
for off-thread compilations are being completely flagged as non-writtable, while
the data is waiting to be processed off-thread. (~20 beta crashes per day)
Attachment #8951046 -
Flags: review?(luke)
Comment 2•7 years ago
|
||
Generally sounds like a good idea and so far patch looks good. Looks like this is diagnostic builds-only, but, because nightly perf still matters to a degree: have you checked the effect of this on performance?
Updated•7 years ago
|
Flags: needinfo?(nicolas.b.pierron)
Assignee | ||
Comment 3•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ada5805e9709e014757d805e1e84fe961c098435
There is a bunch of assertion in the SM-rust build, which I have no idea how to test. Nick, any idea?
And a bunch of failures on Windows debug JIT tests failures, which I will try to investigate.
(In reply to Luke Wagner [:luke] from comment #2)
> Generally sounds like a good idea and so far patch looks good. Looks like
> this is diagnostic builds-only, but, because nightly perf still matters to a
> degree: have you checked the effect of this on performance?
I checked on Octane, and there is at most a ~1% slowdown from the testing I done locally.
Flags: needinfo?(nicolas.b.pierron) → needinfo?(nfitzgerald)
Comment 4•7 years ago
|
||
Comment on attachment 8951046 [details] [diff] [review]
Use mprotect to prevent mutations of inaccessible regions.
Review of attachment 8951046 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/ds/LifoAlloc.cpp
@@ +79,5 @@
> +
> +# ifdef XP_WIN
> +
> +static
> +void setReadOnly(uint8_t* addr, ptrdiff_t sz)
You should be able to use the "GC" functions at [1] for this.
[1] https://dxr.mozilla.org/mozilla-central/source/js/src/gc/Memory.h#46
Comment 5•7 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #3)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=ada5805e9709e014757d805e1e84fe961c098435
>
> There is a bunch of assertion in the SM-rust build, which I have no idea how
> to test. Nick, any idea?
In the logs, I'm seeing:
Assertion failure: tree.empty(), at /builds/worker/workspace/build/src/js/src/ds/MemoryProtectionExceptionHandler.cpp:71
You should be able to reproduce by:
$ cd js/rust
$ cargo test --features debugmozjs
That will also print the binary that cargo ends up invoking (target/debug/...) which is probably easier for passing to rr or gdb.
Let me know if you need an extra hand reproducing.
Flags: needinfo?(nfitzgerald)
Assignee | ||
Updated•7 years ago
|
Priority: -- → P1
Comment 6•7 years ago
|
||
By the way, I should confirm this locally, but IIRC decommitting pages on Windows also makes them inaccessible. If so, perhaps we can kill two birds with one stone here and also reduce our memory footprint.
On Windows, instead of using VirtualProtect we'd just call VirtualFree with MEM_DECOMMIT, and call VirtualAlloc with MEM_COMMIT when we need the pages again (note: this will also zero the pages).
On OSX we could call madvise with MADV_FREE, then mprotect with PROT_NONE - this matches what jemalloc does to reduce the memory footprint on OSX when we take a measurement in about:memory.
On Linux, madvise with MADV_DONTNEED followed by mprotect with PROT_NONE should be roughly equivalent (though MADV_REMOVE and MADV_FREE are also supported on newer kernels).
Comment 7•7 years ago
|
||
Comment on attachment 8951046 [details] [diff] [review]
Use mprotect to prevent mutations of inaccessible regions.
Review of attachment 8951046 [details] [diff] [review]:
-----------------------------------------------------------------
Sorry for the delay; forgot to look back at this when I thought a new patch was coming up.
::: js/src/ds/LifoAlloc.cpp
@@ +59,5 @@
>
> +#ifdef LIFO_CHUNK_PROTECT
> +
> +static
> +const uint8_t* AlignPtrUp(const uint8_t* ptr, uintptr_t align) {
nit, SM style is:
static const uint8_t*
AlignPtrUp(...)
{
Here and below many times
@@ +280,5 @@
> if (!newChunk)
> return false;
> size_t size = newChunk->computedSizeOfIncludingThis();
> chunks_.append(mozilla::Move(newChunk));
> + // chunks_.last()->setRWUntil(Loc::End);
This line is commented out: should either be removed or uncommented.
Attachment #8951046 -
Flags: review?(luke) → review+
Assignee | ||
Comment 8•7 years ago
|
||
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #6)
> By the way, I should confirm this locally, but IIRC decommitting pages on
> Windows also makes them inaccessible. If so, perhaps we can kill two birds
> with one stone here and also reduce our memory footprint.
>
> On Windows, instead of using VirtualProtect we'd just call VirtualFree with
> MEM_DECOMMIT, and call VirtualAlloc with MEM_COMMIT when we need the pages
> again (note: this will also zero the pages).
I will open Bug 1444930 a follow-up bug for it.
Assignee | ||
Comment 9•7 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #7)
> @@ +280,5 @@
> > if (!newChunk)
> > return false;
> > size_t size = newChunk->computedSizeOfIncludingThis();
> > chunks_.append(mozilla::Move(newChunk));
> > + // chunks_.last()->setRWUntil(Loc::End);
>
> This line is commented out: should either be removed or uncommented.
I replaced it by a better comment. This comment was meant to say that this line is not mandatorry because new allocated BumpChunk already have the intended protection. I wanted to have a comment to explicit the state in the BumpChunk is.
Assignee | ||
Comment 10•7 years ago
|
||
This bug is held back from landing because some test cases are leaking the world, and this cause the non-fatal all-caps warning message[1] to become fatal.
One solution would be to disable this assertion, In the mean time I will open follow-up bugs to get these warning fixed.
[1] WARNING: YOU ARE LEAKING THE WORLD … AT JS_ShutDown TIME. FIX THIS!
Assignee | ||
Comment 11•7 years ago
|
||
Attach the patch such that others can test with it.
Delta:
- Use gc/Memory.h (comment 4)
- Apply nits (comment 7)
Attachment #8958570 -
Flags: review+
Assignee | ||
Updated•7 years ago
|
Attachment #8951046 -
Attachment is obsolete: true
Assignee | ||
Comment 12•6 years ago
|
||
I am still unable to land this patch for the moment because of reproducible (on-try) failures. (Windows 2012 debug, and Windows 10 x64 debug)
I tried to setup a Windows 10 VM to reproduce this issue but failed.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef139a93e8e84235ac41a63da07ea6784f7fa347
The test case is within the Jit tests, which should be actionable, if we manage to reproduce it, and might be related to the bug that we are trying to capture here. (I have no way to tell based on the unexpected exit code, AFAIK)
Test case: jit-test\tests\auto-regress\bug1315943.js
Assignee | ||
Comment 13•6 years ago
|
||
Ted can you give it a try, see if you can reproduce this issue? (comment 12)
Flags: needinfo?(tcampbell)
Comment 14•6 years ago
|
||
I can reproduce this locally (on at least debug-opt builds). The crash is a due to a stack overflow in js::SplayTree::checkCoherency. Looking at the frames, it seems that node->right is always null. I am suspecting there is an issue with the SplayTree balancing and we may not have noticed before due to smaller trees.
Flags: needinfo?(tcampbell)
Assignee | ||
Comment 15•6 years ago
|
||
Comment on attachment 8958570 [details] [diff] [review]
Use mprotect to prevent mutations of inaccessible regions.
[Security approval request comment]
> How easily could an exploit be constructed based on the patch?
I have no idea how one should do.
> Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
The patch highlight that we are trying to investigate a memory corruption issue, it does not tell what come is corrupting it. (this is what we are trying to figure that out).
> Which older supported branches are affected by this flaw?
since at least 2 years.
> If not all supported branches, which bug introduced the flaw?
We do not know …
> Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
This patch should apply cleanly but depends on a few fixes and should ride the train.
> How likely is this patch to cause regressions; how much testing does it need?
This might cause performance regression, but nothing noticed locally.
Attachment #8958570 -
Flags: sec-approval?
Comment 16•6 years ago
|
||
Comment on attachment 8958570 [details] [diff] [review]
Use mprotect to prevent mutations of inaccessible regions.
This is a sec-want bug. It doesn't need sec-approval to land on trunk.
Attachment #8958570 -
Flags: sec-approval?
Comment 17•6 years ago
|
||
Group: javascript-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox60:
--- → wontfix
status-firefox61:
--- → wontfix
status-firefox62:
--- → fixed
status-firefox-esr52:
--- → wontfix
status-firefox-esr60:
--- → wontfix
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Assignee | ||
Comment 18•6 years ago
|
||
Follow-up to fix beta builds:
https://hg.mozilla.org/mozilla-central/rev/1c33a38da75d
Comment 19•6 years ago
|
||
Comment 20•6 years ago
|
||
This crash stats query should work for tracking the crashes:
https://crash-stats.mozilla.com/search/?moz_crash_reason=~Tried%20to%20access%20a%20protected%20region%21&build_id=%3E%3D20180621013659&version=62.0a1&version=61.0b&date=%3E%3D2018-06-20T02%3A00%3A00.000Z&date=%3C2019-06-19T02%3A00%3A00.000Z
Looks like there's already one crash with a stack that implicates GC type information sweeping. Given the noise floor of crashes from bad hardware it probably makes sense to wait and see if a pattern emerges though.
Updated•6 years ago
|
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Updated•6 years ago
|
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main62-]
Updated•5 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•