Rooting hazard with Maybe<AutoClearTypeInferenceStateOnOOM> in ObjectGroup::sweep

RESOLVED FIXED in Firefox 47

Status

()

Core
JavaScript Engine
RESOLVED FIXED
a year ago
6 months ago

People

(Reporter: jandem, Assigned: jonco)

Tracking

(Blocks: 1 bug, {leave-open, sec-high})

unspecified
mozilla49
leave-open, sec-high
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox46 wontfix, firefox47+ fixed, firefox48+ fixed, firefox49+ fixed, firefox-esr45 unaffected)

Details

(Whiteboard: [post-critsmash-triage][adv-main47+])

Attachments

(3 attachments, 4 obsolete attachments)

(Reporter)

Description

a year ago
decoder found a hard-to-reduce OOM crash. I think what's happening is that ObjectGroup::sweep does:

  Maybe<AutoClearTypeInferenceStateOnOOM> fallbackOOM;

The rooting analysis doesn't know the destructor of this thing can GC, so various callers can call this with unrooted pointers on the stack.
(Reporter)

Comment 1

a year ago
Note that we have some (top)crashes in ObjectGroup::sweep and related code, so maybe fixing this will help there as well.
status-firefox48: --- → affected
tracking-firefox48: --- → ?
Keywords: sec-high
(Reporter)

Comment 2

a year ago
sfink, is there anything we can do to make the analysis understand the situation in comment 0, and related footguns?
Flags: needinfo?(sphink)
(Reporter)

Comment 3

a year ago
Hm, the GC stack looks like this:

* ~Maybe
* ~AutoClearTypeInferenceStateOnOOM
* Zone::discardJitCode
* ZoneCellIter::ZoneCellIter
* evictNursery
...

I'm not sure if it's caused by the Maybe<>, mrgiggles tells me Zone::discardJitCode and ~AutoClearTypeInferenceStateOnOOM also cannot GC, but maybe that's just because all callers currently suppress GC or something?
(Reporter)

Updated

a year ago
status-firefox46: --- → affected
status-firefox47: --- → affected
tracking-firefox46: --- → ?
tracking-firefox47: --- → ?
Too late for a fix for 46, tracking for 47+ since it's rated sec-high.
status-firefox46: affected → wontfix
tracking-firefox46: ? → +
tracking-firefox47: ? → +
tracking-firefox48: ? → +
Ok, this is a weird one. There was a bug in the analysis where it lost track of the GC call possibility (due to an internally-generated destructor). I fixed that, and immediately got hazard reports for something that looks very much like this. But when I looked closer, it seemed to me like it was a false positive -- the things the analysis were worried about actually passed in nullptr for the AutoClearTypeInferenceStateOnOOM* parameter, so in actual practice it should never be able to GC. I "fixed" it by telling the analysis that when that oom* parameter was nullptr, that it wouldn't GC. But this bug suggests that the annotation is incorrect. (In a debug build, this annotation will now cause an assert crash if it really does GC, but given that it may only happen on OOM I'm skeptical that will help.)

Can I see a more complete crash stack? I want to know if this was from a pre write barrier, or a GC. Is SweepArenaList in the stack or not?
Flags: needinfo?(sphink) → needinfo?(jdemooij)
(Reporter)

Comment 6

a year ago
(In reply to Steve Fink [:sfink] [:s:] from comment #5)
> But when I looked closer, it seemed to me like it was a
> false positive -- the things the analysis were worried about actually passed
> in nullptr for the AutoClearTypeInferenceStateOnOOM* parameter, so in actual
> practice it should never be able to GC.

I think most of these functions construct an AutoClearTypeInferenceStateOnOOM "on demand" if their |oom| argument is nullptr. For the case I was looking at we did it here I think:

https://dxr.mozilla.org/mozilla-central/rev/6adc822f5e27a55551faeb6c47a9bd8b0859a23b/js/src/vm/TypeInference.cpp#4190-4191

See also the stack in comment 3, the ~Maybe was the one in ObjectGroup::sweep IIRC. I don't have the test handy but I can try to reproduce again on Linux if really needed.
Flags: needinfo?(jdemooij) → needinfo?(sphink)
Ok, seems like the stack the analysis was reporting wasn't possible in practice, but the stack here is.

It starts at NativeObject.cpp's UpdateShapeTypeAndValue which calls ObjectGroup::newScript() which ends up calling ObjectGroup::sweep(), which creates an on-demand AutoClearTypeInferenceStateOnOOM (ugh; I missed that). In its destructor, it can call discardJitCode, which makes a ZoneCellIter, which evicts the nursery. :(

In this particular case, we don't *need* to evict the nursery, because we're only iterating over scripts which are not currently nursery-allocated.
Flags: needinfo?(sphink)
(Assignee)

Comment 8

a year ago
(In reply to Steve Fink [:sfink] [:s:] from comment #7)
This has come about because I changed Zone::discardJitCode to use ZoneCellIter rather than ZoneCellIterUnderGC in bug 1244412 (it's not always called under GC).  Previously this OOM situation would have resulted in an assertion failure in debug builds, but would have worked (I think) in release builds.

We can fix ZoneCellIter to not evict the nursery unnecessarily which is worth doing for its own sake.  But it sounds like we should also fix the analysis as well.
(Assignee)

Comment 9

a year ago
Created attachment 8748106 [details] [diff] [review]
bug1264300-dont-evict-unnecessarily

Only evict the nursery when iterating nursery allocated things.
Attachment #8748106 - Flags: review?(sphink)
Comment on attachment 8748106 [details] [diff] [review]
bug1264300-dont-evict-unnecessarily

Review of attachment 8748106 [details] [diff] [review]:
-----------------------------------------------------------------

Heh. This is what I finally figured out that we'd want, and then I saw your review request. Perfect!

I think I'll probably tell the analysis about this by doing an AutoSuppressGCAnalysis higher up in the stack, when we still know IsNurseryAllocable is going to be false. But I'm still trying to fix the hazard bug that is once again preventing it from reporting this problem.

Thanks, Jon!
Attachment #8748106 - Flags: review?(sphink) → review+
(Assignee)

Comment 11

a year ago
Comment on attachment 8748106 [details] [diff] [review]
bug1264300-dont-evict-unnecessarily

[Security approval request comment]
How easily could an exploit be constructed based on the patch? Difficult and would require triggering OOM at just the right moment.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?  No.

Which older supported branches are affected by this flaw? Aurora only.

If not all supported branches, which bug introduced the flaw? Bug 1244412.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Same patch should apply.

How likely is this patch to cause regressions; how much testing does it need? Unlikely.
Attachment #8748106 - Flags: sec-approval?
(Assignee)

Updated

a year ago
Blocks: 1244412
(Assignee)

Comment 12

a year ago
(In reply to Jon Coppeard (:jonco) from comment #11)
> Which older supported branches are affected by this flaw? Aurora only.

It seems the regressing patch has been uplifted to beta too now.
Comment on attachment 8748106 [details] [diff] [review]
bug1264300-dont-evict-unnecessarily

Approval Request Comment: see comments 11 and 12.

[Feature/regressing bug #]: Bug 1244412
[User impact if declined]: theoretically exploitable UAF-type thing
[Describe test coverage new/current, TreeHerder]: this is probably showing up as an intermittent orange. There's an unreducible fuzz test that hits it.
[Risks and why]: low risk. It avoids doing some unnecessary and dangerous work, but it also asserts that the work was not needed.
[String/UUID change made/needed]: none
Attachment #8748106 - Flags: approval-mozilla-beta?
Attachment #8748106 - Flags: approval-mozilla-aurora?
sec-approval+ for trunk. I'm approving for others since it is early in the cycle.
status-firefox49: --- → affected
tracking-firefox46: + → ---
tracking-firefox49: --- → +
Attachment #8748106 - Flags: sec-approval?
Attachment #8748106 - Flags: sec-approval+
Attachment #8748106 - Flags: approval-mozilla-beta?
Attachment #8748106 - Flags: approval-mozilla-beta+
Attachment #8748106 - Flags: approval-mozilla-aurora?
Attachment #8748106 - Flags: approval-mozilla-aurora+
(Assignee)

Updated

a year ago
Keywords: leave-open
(Assignee)

Comment 15

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/9f957b27ddff
https://hg.mozilla.org/mozilla-central/rev/9f957b27ddff
Assignee: nobody → jcoppeard
status-firefox-esr45: --- → unaffected
https://hg.mozilla.org/releases/mozilla-aurora/rev/7229e6a595af
status-firefox48: affected → fixed
status-firefox49: affected → fixed
Target Milestone: --- → mozilla49
has problems uplifting to beta:

grafting 343376:7229e6a595af "Bug 1264300 - Don't evict the nursery unnecessarily in ZoneCellIter r=sfink a=abillings"
merging js/src/jsgcinlines.h
warning: conflicts while merging js/src/jsgcinlines.h! (edit, then use 'hg resolve --mark')
Flags: needinfo?(jcoppeard)
(Assignee)

Comment 19

a year ago
(In reply to Carsten Book [:Tomcat] from comment #18)
The patch depends on subsequent changes made in bug 1259042.  I'll request uplift for that too.
Depends on: 1259042
Flags: needinfo?(jcoppeard)
(Assignee)

Comment 20

a year ago
This also depends on the changes in bug 1263572.
Depends on: 1263572
Depends on: 1271110
(Assignee)

Comment 21

a year ago
So it seems like we do need to do a minor GC when discarding JIT code after all (see bug 1271110).  Is the issue in this bug confined to UpdateShapeTypeAndValue?  If so it seem like we should just fix the rooting there.
Flags: needinfo?(sphink)
It is not constrained to that function.

Right now, it leaks out into all pre barriers on anything at all. But I think that one is relatively easy to resolve.

I'm not sure how many other things are affected. I'll run it through the analysis.
You're right about that function, though; it is trivially rootable. (The only caller already has Rooteds, so it's not even adding any roots.)

I snuck that into the bug 1259850 patch, but it doesn't change anything here.

One reason that ~AutoClearTypeInferenceStateOnOOM needs to GC is that discardJitCode() needs to empty the store buffer (and it now does, again, as of bug 1271110.) Bug 1272167 seems to suggest that we might be able to do away with that requirement if we can use RelocatablePtr for everything stored in ICs, and run their destructors. Or jandem mentions allocating from the "per-Zone allocator" over there, but I don't understand what that means yet. Perhaps that makes it easier to run the destructors for just the types we need?

Alternatively, we could make yet another Ptr thing that modifies the post barriers to always insert the whole JSScript into the store buffer instead of pointing into the IonScript/BaselineScript. UnstablePtr or something. DontLookAtMePtr.
Flags: needinfo?(sphink)
Blocks: 1008341
(Assignee)

Comment 24

a year ago
(In reply to Steve Fink [:sfink] [:s:] from comment #23)
Another approach would be to make BaselineScript a GC thing.  This might actually be the simplest fix too.
(Assignee)

Comment 25

a year ago
(In reply to Jon Coppeard (:jonco) from comment #24
...and an even simpler approach might be delay freeing the memory until after the next minor GC, by putting it on a list somewhere.
(Assignee)

Comment 26

a year ago
Created attachment 8754418 [details] [diff] [review]
bug1264300-baseline-script

The problem is that there can be storebuffer entries pointing into the fallback stub space and we don't run any destructors when freeing this space.  This patch defers freeing the memory allocated to the fallback stub space until after the next minor GC, unless we are already collecting.

If you think this is OK I'll r? jandem too
Attachment #8754418 - Flags: review?(sphink)
(Assignee)

Comment 27

a year ago
Created attachment 8754419 [details] [diff] [review]
bug1264300-ion-script

Ditto for IonScript.  Also removes the problematic calls to evictNursery.
Attachment #8754419 - Flags: review?(sphink)
(Assignee)

Comment 28

a year ago
Comment on attachment 8754418 [details] [diff] [review]
bug1264300-baseline-script

So this causes xpcshell tests to crash while marking the storebuffer... cancelling review.
Attachment #8754418 - Flags: review?(sphink)
(Assignee)

Updated

a year ago
Attachment #8754419 - Flags: review?(sphink)
(Assignee)

Comment 29

a year ago
(In reply to Jon Coppeard (:jonco) from comment #28)
I fixed the xpcshell problem but now I get mochitest failures like "Timed out while polling clipboard for pasted data"... on Windows 7 only.
That's not your fault. That's from Win7 testing on AWS that's enabled on Try only at the moment.
(Assignee)

Comment 31

a year ago
Created attachment 8755482 [details] [diff] [review]
bug1264300-baseline-script v2

Updated patches.
Attachment #8754418 - Attachment is obsolete: true
Attachment #8754419 - Attachment is obsolete: true
Attachment #8755482 - Flags: review?(sphink)
(Assignee)

Comment 32

a year ago
Created attachment 8755483 [details] [diff] [review]
bug1264300-ion-script v2
Attachment #8755483 - Flags: review?(sphink)
Comment on attachment 8755482 [details] [diff] [review]
bug1264300-baseline-script v2

Review of attachment 8755482 [details] [diff] [review]:
-----------------------------------------------------------------

Whoa, ok, given the existing machinery to postpone freeing until after sweeping, this fits in remarkably cleanly. This looks pretty solid.
Attachment #8755482 - Flags: review?(sphink) → review+
Comment on attachment 8755483 [details] [diff] [review]
bug1264300-ion-script v2

Review of attachment 8755483 [details] [diff] [review]:
-----------------------------------------------------------------

It's so nice to see all those "evict nursery here because of complicated action at a distance" calls go.
Attachment #8755483 - Flags: review?(sphink) → review+
https://hg.mozilla.org/releases/mozilla-beta/rev/cb27eacbe04a
(This was the original patch that failed to uplift to beta in comment 18. Unsure what that means for the status flags in here, and whether the new patch will also need to be uplifted.)
status-firefox47: affected → fixed
Flags: needinfo?(jcoppeard)
(Assignee)

Comment 36

a year ago
Comment on attachment 8755482 [details] [diff] [review]
bug1264300-baseline-script v2

Requesting additional review for JIT changes.
Attachment #8755482 - Flags: review?(jdemooij)
(Assignee)

Comment 37

a year ago
Comment on attachment 8755483 [details] [diff] [review]
bug1264300-ion-script v2

Requesting additional review for JIT changes.
Attachment #8755483 - Flags: review?(jdemooij)
(Assignee)

Comment 38

a year ago
(In reply to Wes Kocher (:KWierso) from comment #35)
We should uplift the new patches too.
Flags: needinfo?(jcoppeard)
(Assignee)

Updated

a year ago
Duplicate of this bug: 1273355
(Reporter)

Comment 40

a year ago
Comment on attachment 8755482 [details] [diff] [review]
bug1264300-baseline-script v2

Review of attachment 8755482 [details] [diff] [review]:
-----------------------------------------------------------------

Nice idea. Thanks a lot for fixing this!

::: js/src/jit/BaselineJIT.h
@@ +607,5 @@
>  
> +namespace JS {
> +
> +template <>
> +struct DeletePolicy<js::jit::BaselineScript>

Why do we need this DeletePolicy, I think that's mostly used for UniquePtr etc?

@@ +609,5 @@
> +
> +template <>
> +struct DeletePolicy<js::jit::BaselineScript>
> +{
> +    DeletePolicy(JSRuntime* rt) : rt_(rt) {}

Nit: explicit
Attachment #8755482 - Flags: review?(jdemooij) → review+
(Reporter)

Updated

a year ago
Attachment #8755483 - Flags: review?(jdemooij) → review+
(Assignee)

Comment 41

a year ago
(In reply to Jan de Mooij [:jandem] from comment #40)
> Comment on attachment 8755482 [details] [diff] [review]

> Why do we need this DeletePolicy, I think that's mostly used for UniquePtr
> etc?

Yes, BaselineCompiler::compile uses a UniquePtr while initialising a BaselineScript.

> Nit: explicit

Thanks, that would have been another static analysis build failure for sure.
(Assignee)

Comment 42

a year ago
Created attachment 8756334 [details] [diff] [review]
bug1264300-defer-free-combined

Combined patch, carrying r+.
Attachment #8755482 - Attachment is obsolete: true
Attachment #8755483 - Attachment is obsolete: true
Attachment #8756334 - Flags: review+
(Assignee)

Comment 43

a year ago
Comment on attachment 8756334 [details] [diff] [review]
bug1264300-defer-free-combined

[Security approval request comment]
How easily could an exploit be constructed based on the patch? I'd say very difficult, this requires triggering OOM while type information is being swept lazily.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No.

Which older supported branches are affected by this flaw? Everything back to 31.

If not all supported branches, which bug introduced the flaw? The problem this patch fixes goes all the way back to Bug 619558.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Backports should be straightforward.

How likely is this patch to cause regressions; how much testing does it need? Unlikely but we should let this bake for a little while before uplifting.
Attachment #8756334 - Flags: sec-approval?
sec-approval+ for trunk. You should ask for Aurora approval. I don't think this should go to Beta since we're too close to release.
Attachment #8756334 - Flags: sec-approval? → sec-approval+
(Assignee)

Comment 45

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/3dd0686489c6
https://hg.mozilla.org/mozilla-central/rev/3dd0686489c6
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main47+]
Group: javascript-core-security → core-security-release
(Assignee)

Comment 47

a year ago
Created attachment 8759123 [details] [diff] [review]
bug1264300-aurora

Approval Request Comment
[Feature/regressing bug #]: Bug 619558
[User impact if declined]: Potential crash / security vulnerability.
[Describe test coverage new/current, TreeHerder]: On central since 26th May.
[Risks and why]: Low.
[String/UUID change made/needed]: None
Attachment #8759123 - Flags: review+
Attachment #8759123 - Flags: approval-mozilla-aurora?
Depends on: 1277545
Comment on attachment 8759123 [details] [diff] [review]
bug1264300-aurora

Fix a crash, taking it
Attachment #8759123 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 49

a year ago
https://hg.mozilla.org/releases/mozilla-aurora/rev/5c8f37e61ac7

Updated

10 months ago
Duplicate of this bug: 1272167
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.