Closed Bug 1264300 Opened 8 years ago Closed 8 years ago

Rooting hazard with Maybe<AutoClearTypeInferenceStateOnOOM> in ObjectGroup::sweep

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox46 --- wontfix
firefox47 + fixed
firefox48 + fixed
firefox49 + fixed
firefox-esr45 --- unaffected

People

(Reporter: jandem, Assigned: jonco)

References

(Blocks 1 open bug)

Details

(Keywords: sec-high, Whiteboard: [post-critsmash-triage][adv-main47+])

Attachments

(3 files, 4 obsolete files)

decoder found a hard-to-reduce OOM crash. I think what's happening is that ObjectGroup::sweep does:

  Maybe<AutoClearTypeInferenceStateOnOOM> fallbackOOM;

The rooting analysis doesn't know the destructor of this thing can GC, so various callers can call this with unrooted pointers on the stack.
Note that we have some (top)crashes in ObjectGroup::sweep and related code, so maybe fixing this will help there as well.
sfink, is there anything we can do to make the analysis understand the situation in comment 0, and related footguns?
Flags: needinfo?(sphink)
Hm, the GC stack looks like this:

* ~Maybe
* ~AutoClearTypeInferenceStateOnOOM
* Zone::discardJitCode
* ZoneCellIter::ZoneCellIter
* evictNursery
...

I'm not sure if it's caused by the Maybe<>, mrgiggles tells me Zone::discardJitCode and ~AutoClearTypeInferenceStateOnOOM also cannot GC, but maybe that's just because all callers currently suppress GC or something?
Too late for a fix for 46, tracking for 47+ since it's rated sec-high.
Ok, this is a weird one. There was a bug in the analysis where it lost track of the GC call possibility (due to an internally-generated destructor). I fixed that, and immediately got hazard reports for something that looks very much like this. But when I looked closer, it seemed to me like it was a false positive -- the things the analysis were worried about actually passed in nullptr for the AutoClearTypeInferenceStateOnOOM* parameter, so in actual practice it should never be able to GC. I "fixed" it by telling the analysis that when that oom* parameter was nullptr, that it wouldn't GC. But this bug suggests that the annotation is incorrect. (In a debug build, this annotation will now cause an assert crash if it really does GC, but given that it may only happen on OOM I'm skeptical that will help.)

Can I see a more complete crash stack? I want to know if this was from a pre write barrier, or a GC. Is SweepArenaList in the stack or not?
Flags: needinfo?(sphink) → needinfo?(jdemooij)
(In reply to Steve Fink [:sfink] [:s:] from comment #5)
> But when I looked closer, it seemed to me like it was a
> false positive -- the things the analysis were worried about actually passed
> in nullptr for the AutoClearTypeInferenceStateOnOOM* parameter, so in actual
> practice it should never be able to GC.

I think most of these functions construct an AutoClearTypeInferenceStateOnOOM "on demand" if their |oom| argument is nullptr. For the case I was looking at we did it here I think:

https://dxr.mozilla.org/mozilla-central/rev/6adc822f5e27a55551faeb6c47a9bd8b0859a23b/js/src/vm/TypeInference.cpp#4190-4191

See also the stack in comment 3, the ~Maybe was the one in ObjectGroup::sweep IIRC. I don't have the test handy but I can try to reproduce again on Linux if really needed.
Flags: needinfo?(jdemooij) → needinfo?(sphink)
Ok, seems like the stack the analysis was reporting wasn't possible in practice, but the stack here is.

It starts at NativeObject.cpp's UpdateShapeTypeAndValue which calls ObjectGroup::newScript() which ends up calling ObjectGroup::sweep(), which creates an on-demand AutoClearTypeInferenceStateOnOOM (ugh; I missed that). In its destructor, it can call discardJitCode, which makes a ZoneCellIter, which evicts the nursery. :(

In this particular case, we don't *need* to evict the nursery, because we're only iterating over scripts which are not currently nursery-allocated.
Flags: needinfo?(sphink)
(In reply to Steve Fink [:sfink] [:s:] from comment #7)
This has come about because I changed Zone::discardJitCode to use ZoneCellIter rather than ZoneCellIterUnderGC in bug 1244412 (it's not always called under GC).  Previously this OOM situation would have resulted in an assertion failure in debug builds, but would have worked (I think) in release builds.

We can fix ZoneCellIter to not evict the nursery unnecessarily which is worth doing for its own sake.  But it sounds like we should also fix the analysis as well.
Only evict the nursery when iterating nursery allocated things.
Attachment #8748106 - Flags: review?(sphink)
Comment on attachment 8748106 [details] [diff] [review]
bug1264300-dont-evict-unnecessarily

Review of attachment 8748106 [details] [diff] [review]:
-----------------------------------------------------------------

Heh. This is what I finally figured out that we'd want, and then I saw your review request. Perfect!

I think I'll probably tell the analysis about this by doing an AutoSuppressGCAnalysis higher up in the stack, when we still know IsNurseryAllocable is going to be false. But I'm still trying to fix the hazard bug that is once again preventing it from reporting this problem.

Thanks, Jon!
Attachment #8748106 - Flags: review?(sphink) → review+
Comment on attachment 8748106 [details] [diff] [review]
bug1264300-dont-evict-unnecessarily

[Security approval request comment]
How easily could an exploit be constructed based on the patch? Difficult and would require triggering OOM at just the right moment.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?  No.

Which older supported branches are affected by this flaw? Aurora only.

If not all supported branches, which bug introduced the flaw? Bug 1244412.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Same patch should apply.

How likely is this patch to cause regressions; how much testing does it need? Unlikely.
Attachment #8748106 - Flags: sec-approval?
Blocks: 1244412
(In reply to Jon Coppeard (:jonco) from comment #11)
> Which older supported branches are affected by this flaw? Aurora only.

It seems the regressing patch has been uplifted to beta too now.
Comment on attachment 8748106 [details] [diff] [review]
bug1264300-dont-evict-unnecessarily

Approval Request Comment: see comments 11 and 12.

[Feature/regressing bug #]: Bug 1244412
[User impact if declined]: theoretically exploitable UAF-type thing
[Describe test coverage new/current, TreeHerder]: this is probably showing up as an intermittent orange. There's an unreducible fuzz test that hits it.
[Risks and why]: low risk. It avoids doing some unnecessary and dangerous work, but it also asserts that the work was not needed.
[String/UUID change made/needed]: none
Attachment #8748106 - Flags: approval-mozilla-beta?
Attachment #8748106 - Flags: approval-mozilla-aurora?
sec-approval+ for trunk. I'm approving for others since it is early in the cycle.
Attachment #8748106 - Flags: sec-approval?
Attachment #8748106 - Flags: sec-approval+
Attachment #8748106 - Flags: approval-mozilla-beta?
Attachment #8748106 - Flags: approval-mozilla-beta+
Attachment #8748106 - Flags: approval-mozilla-aurora?
Attachment #8748106 - Flags: approval-mozilla-aurora+
Keywords: leave-open
has problems uplifting to beta:

grafting 343376:7229e6a595af "Bug 1264300 - Don't evict the nursery unnecessarily in ZoneCellIter r=sfink a=abillings"
merging js/src/jsgcinlines.h
warning: conflicts while merging js/src/jsgcinlines.h! (edit, then use 'hg resolve --mark')
Flags: needinfo?(jcoppeard)
(In reply to Carsten Book [:Tomcat] from comment #18)
The patch depends on subsequent changes made in bug 1259042.  I'll request uplift for that too.
Depends on: 1259042
Flags: needinfo?(jcoppeard)
This also depends on the changes in bug 1263572.
Depends on: 1263572
So it seems like we do need to do a minor GC when discarding JIT code after all (see bug 1271110).  Is the issue in this bug confined to UpdateShapeTypeAndValue?  If so it seem like we should just fix the rooting there.
Flags: needinfo?(sphink)
It is not constrained to that function.

Right now, it leaks out into all pre barriers on anything at all. But I think that one is relatively easy to resolve.

I'm not sure how many other things are affected. I'll run it through the analysis.
You're right about that function, though; it is trivially rootable. (The only caller already has Rooteds, so it's not even adding any roots.)

I snuck that into the bug 1259850 patch, but it doesn't change anything here.

One reason that ~AutoClearTypeInferenceStateOnOOM needs to GC is that discardJitCode() needs to empty the store buffer (and it now does, again, as of bug 1271110.) Bug 1272167 seems to suggest that we might be able to do away with that requirement if we can use RelocatablePtr for everything stored in ICs, and run their destructors. Or jandem mentions allocating from the "per-Zone allocator" over there, but I don't understand what that means yet. Perhaps that makes it easier to run the destructors for just the types we need?

Alternatively, we could make yet another Ptr thing that modifies the post barriers to always insert the whole JSScript into the store buffer instead of pointing into the IonScript/BaselineScript. UnstablePtr or something. DontLookAtMePtr.
Flags: needinfo?(sphink)
(In reply to Steve Fink [:sfink] [:s:] from comment #23)
Another approach would be to make BaselineScript a GC thing.  This might actually be the simplest fix too.
(In reply to Jon Coppeard (:jonco) from comment #24
...and an even simpler approach might be delay freeing the memory until after the next minor GC, by putting it on a list somewhere.
Attached patch bug1264300-baseline-script (obsolete) — Splinter Review
The problem is that there can be storebuffer entries pointing into the fallback stub space and we don't run any destructors when freeing this space.  This patch defers freeing the memory allocated to the fallback stub space until after the next minor GC, unless we are already collecting.

If you think this is OK I'll r? jandem too
Attachment #8754418 - Flags: review?(sphink)
Attached patch bug1264300-ion-script (obsolete) — Splinter Review
Ditto for IonScript.  Also removes the problematic calls to evictNursery.
Attachment #8754419 - Flags: review?(sphink)
Comment on attachment 8754418 [details] [diff] [review]
bug1264300-baseline-script

So this causes xpcshell tests to crash while marking the storebuffer... cancelling review.
Attachment #8754418 - Flags: review?(sphink)
Attachment #8754419 - Flags: review?(sphink)
(In reply to Jon Coppeard (:jonco) from comment #28)
I fixed the xpcshell problem but now I get mochitest failures like "Timed out while polling clipboard for pasted data"... on Windows 7 only.
That's not your fault. That's from Win7 testing on AWS that's enabled on Try only at the moment.
Attached patch bug1264300-baseline-script v2 (obsolete) — Splinter Review
Updated patches.
Attachment #8754418 - Attachment is obsolete: true
Attachment #8754419 - Attachment is obsolete: true
Attachment #8755482 - Flags: review?(sphink)
Attached patch bug1264300-ion-script v2 (obsolete) — Splinter Review
Attachment #8755483 - Flags: review?(sphink)
Comment on attachment 8755482 [details] [diff] [review]
bug1264300-baseline-script v2

Review of attachment 8755482 [details] [diff] [review]:
-----------------------------------------------------------------

Whoa, ok, given the existing machinery to postpone freeing until after sweeping, this fits in remarkably cleanly. This looks pretty solid.
Attachment #8755482 - Flags: review?(sphink) → review+
Comment on attachment 8755483 [details] [diff] [review]
bug1264300-ion-script v2

Review of attachment 8755483 [details] [diff] [review]:
-----------------------------------------------------------------

It's so nice to see all those "evict nursery here because of complicated action at a distance" calls go.
Attachment #8755483 - Flags: review?(sphink) → review+
https://hg.mozilla.org/releases/mozilla-beta/rev/cb27eacbe04a
(This was the original patch that failed to uplift to beta in comment 18. Unsure what that means for the status flags in here, and whether the new patch will also need to be uplifted.)
Flags: needinfo?(jcoppeard)
Comment on attachment 8755482 [details] [diff] [review]
bug1264300-baseline-script v2

Requesting additional review for JIT changes.
Attachment #8755482 - Flags: review?(jdemooij)
Comment on attachment 8755483 [details] [diff] [review]
bug1264300-ion-script v2

Requesting additional review for JIT changes.
Attachment #8755483 - Flags: review?(jdemooij)
(In reply to Wes Kocher (:KWierso) from comment #35)
We should uplift the new patches too.
Flags: needinfo?(jcoppeard)
Comment on attachment 8755482 [details] [diff] [review]
bug1264300-baseline-script v2

Review of attachment 8755482 [details] [diff] [review]:
-----------------------------------------------------------------

Nice idea. Thanks a lot for fixing this!

::: js/src/jit/BaselineJIT.h
@@ +607,5 @@
>  
> +namespace JS {
> +
> +template <>
> +struct DeletePolicy<js::jit::BaselineScript>

Why do we need this DeletePolicy, I think that's mostly used for UniquePtr etc?

@@ +609,5 @@
> +
> +template <>
> +struct DeletePolicy<js::jit::BaselineScript>
> +{
> +    DeletePolicy(JSRuntime* rt) : rt_(rt) {}

Nit: explicit
Attachment #8755482 - Flags: review?(jdemooij) → review+
Attachment #8755483 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #40)
> Comment on attachment 8755482 [details] [diff] [review]

> Why do we need this DeletePolicy, I think that's mostly used for UniquePtr
> etc?

Yes, BaselineCompiler::compile uses a UniquePtr while initialising a BaselineScript.

> Nit: explicit

Thanks, that would have been another static analysis build failure for sure.
Combined patch, carrying r+.
Attachment #8755482 - Attachment is obsolete: true
Attachment #8755483 - Attachment is obsolete: true
Attachment #8756334 - Flags: review+
Comment on attachment 8756334 [details] [diff] [review]
bug1264300-defer-free-combined

[Security approval request comment]
How easily could an exploit be constructed based on the patch? I'd say very difficult, this requires triggering OOM while type information is being swept lazily.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No.

Which older supported branches are affected by this flaw? Everything back to 31.

If not all supported branches, which bug introduced the flaw? The problem this patch fixes goes all the way back to Bug 619558.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Backports should be straightforward.

How likely is this patch to cause regressions; how much testing does it need? Unlikely but we should let this bake for a little while before uplifting.
Attachment #8756334 - Flags: sec-approval?
sec-approval+ for trunk. You should ask for Aurora approval. I don't think this should go to Beta since we're too close to release.
Attachment #8756334 - Flags: sec-approval? → sec-approval+
https://hg.mozilla.org/mozilla-central/rev/3dd0686489c6
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main47+]
Group: javascript-core-security → core-security-release
Approval Request Comment
[Feature/regressing bug #]: Bug 619558
[User impact if declined]: Potential crash / security vulnerability.
[Describe test coverage new/current, TreeHerder]: On central since 26th May.
[Risks and why]: Low.
[String/UUID change made/needed]: None
Attachment #8759123 - Flags: review+
Attachment #8759123 - Flags: approval-mozilla-aurora?
Comment on attachment 8759123 [details] [diff] [review]
bug1264300-aurora

Fix a crash, taking it
Attachment #8759123 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Group: core-security-release
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.