Closed Bug 986843 Opened 6 years ago Closed 6 years ago

AutoHoldZone is a hazard with IGC

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla31
Tracking Status
firefox28 --- wontfix
firefox29 --- fixed
firefox30 --- fixed
firefox31 --- fixed
firefox-esr24 29+ fixed
b2g18 --- wontfix
b2g-v1.1hd --- wontfix
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: bholley, Assigned: jonco)

References

Details

(Keywords: csectype-uaf, sec-high, Whiteboard: [qa-][adv-main29+][adv-esr24.5+])

Attachments

(2 files, 1 obsolete file)

I spent a few hours mulling over bug 960828 with sfink, terrence, and jonco yesterday. We were still stuck when we left off, but I had an epiphany this morning, and think I have this figured out.

The key point here is that the global doesn't actually get collected (that's why we never see the finalizer running). But but the global's _compartment_ does. So global->global() (the culprit, as identified in bug 960828 comment 72), which compiles down to global()->compartment()->maybeGlobal(), returns garbage, rather than identity.

The scenario goes like this. XPConnect invokes JS_NewGlobalObject, whose first act of business is to invoke js::NewCompartment. This creates a compartment, and cannot GC. Then, before the call to GlobalObject::create (which can GC), we instantiate an AutoHoldZone instance. Once the global object is created, it is rooted (by virtue of either being on the stack or being in a Rooted<>, depending on the rooting configuration), and therefore its compartment is held alive too.

But AutoHoldZone isn't actually good enough. It prevents the zone (and therefore the compartment) from being swept, but it doesn't actually mark anything. So if we happen to begin a GC at the very beginning of GlobalObject::create (before the global itself is actually allocated), we'll do the marking phase, and fail to mark the freshly-allocated compartment. We then return from JS_NewGlobalObject (at which point the AutoHoldZone is out of scope), and at some point later finish up the incremental sweeping. SweepZones invokes SweepCompartments, which finds the compartment unmarked, and deletes it. Boom.

We could mark AutoHoldZone smarter, but it might be simpler to just manually GC (if we're getting close) at the beginning of JS_NewGlobalObject, and then use an AutoLockGC for the rest of the call, since this is touchy stuff.

GC hackers - thoughts? Once we agree on the approach, I'll go ahead and write the patch.
Flags: needinfo?(jcoppeard)
OS: Mac OS X → All
Hardware: x86 → All
Nice investigation, Bobby!  Once there's a patch ready, we probably want to land this everywhere, as it might help with various very intermittent JS crashes touching dead objects, like bug 984101.
FWIW, AutoHoldZone was added in bug 639270.
Jon and I discussed this IRL last night and we think we should have |class AutoCompartmentRooter : public CustomAutoRooter<JSCompartment*>| and just forward the trace method. Seems much less difficult than messing with GC timings.
Sounds good. I'll write a patch.
Attached patch bug986843-removeAutoHoldZone (obsolete) — Splinter Review
bholley, sorry I didn't see your last comment and wrote this patch!

I wasn't able to come up with a testcase to reproduce this though -- if anyone has any ideas it would be great if we could test this.  I was trying creating new globals with different frequency values for gczeal 9.
Attachment #8395667 - Flags: review?(terrence)
Flags: needinfo?(jcoppeard)
Comment on attachment 8395667 [details] [diff] [review]
bug986843-removeAutoHoldZone

Review of attachment 8395667 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi.cpp
@@ +2494,3 @@
>      Rooted<GlobalObject *> global(cx);
>      {
> +        AutoCompartmentRooter hold(cx, compartment);

This scoping worries me. How about just giving AutoCompartmentRooter an operator JSCompartment*() doing:

AutoCompartmentRooter compartment(cx, NewCompartment(...))?
Comment on attachment 8395667 [details] [diff] [review]
bug986843-removeAutoHoldZone

Review of attachment 8395667 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

::: js/src/jsapi.cpp
@@ -2410,5 @@
>  {
>    public:
> -    explicit AutoHoldZone(Zone *zone
> -                          MOZ_GUARD_OBJECT_NOTIFIER_PARAM)
> -      : holdp(&zone->hold)

We should be able to kill JSCompartment::hold now too.
Attachment #8395667 - Flags: review?(terrence) → review+
Patch updated with comments.

Also I renamed the confusing one argument overload of JSCompartment::mark() to markRoots() leaving the no-argument version to set the compartment's mark flag.
Attachment #8395667 - Attachment is obsolete: true
Attachment #8395785 - Flags: review+
This should have had sec-approval before landing no?
https://wiki.mozilla.org/Security/Bug_Approval_Process

Also, does this affect B2G?
Flags: needinfo?(jcoppeard)
Yes on both. ;)
Keywords: leave-open
Whiteboard: [leave open]
Ah, I'm sorry, I didn't realise that!

It's worth pointing out that patch this doesn't necessarily fix the problem, but it's a nice cleanup anyway.

Here's the approval request details:

How easily could an exploit be constructed based on the patch?

I think it would be hard to create an exploit based on this as the GC would have to happen at exactly the right point.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

The patch is a tidy up and removes some unnecessary functionality, so it's not obvious that this is a security fix.

Which older supported branches are affected by this flaw?

All of them - this code was introduced in 2011 in bug 639270.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

Should be trivial.

How likely is this patch to cause regressions; how much testing does it need?

Unlikely to cause regressions.
Flags: needinfo?(jcoppeard)
It looks like bug 960828 hasn't happened on inbound since it landed yesterday, which is promising.
Yeah, bug 960828 hasn't happened anywhere since this happened, so it looks like this worked.  Awesome!  Can you nominate for landing this on branches, Jon?  Thanks.  It seems like you can also close both bugs.
Assignee: bobbyholley → jcoppeard
Flags: needinfo?(jcoppeard)
Duplicate of this bug: 960828
Comment on attachment 8395785 [details] [diff] [review]
bug986843-removeAutoHoldZone

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 639270
User impact if declined: Potential security issue
Testing completed: On mozilla-central for a week.
Risk to taking this patch (and alternatives if risky): Low
String or UUID changes made by this patch: None
Attachment #8395785 - Flags: approval-mozilla-esr24?
Attachment #8395785 - Flags: approval-mozilla-beta?
Attachment #8395785 - Flags: approval-mozilla-b2g28?
Attachment #8395785 - Flags: approval-mozilla-b2g26?
Attachment #8395785 - Flags: approval-mozilla-aurora?
Flags: needinfo?(jcoppeard)
https://hg.mozilla.org/mozilla-central/rev/e05b578dcd6d
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment on attachment 8395785 [details] [diff] [review]
bug986843-removeAutoHoldZone

Per https://wiki.mozilla.org/Release_Management/B2G_Landing, this will land on the B2G branches once approved and landed on the other Gecko branches.
Attachment #8395785 - Flags: approval-mozilla-b2g28?
Attachment #8395785 - Flags: approval-mozilla-b2g26?
Attachment #8395785 - Flags: approval-mozilla-beta?
Attachment #8395785 - Flags: approval-mozilla-beta+
Attachment #8395785 - Flags: approval-mozilla-aurora?
Attachment #8395785 - Flags: approval-mozilla-aurora+
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #18)
> Comment on attachment 8395785 [details] [diff] [review]
> bug986843-removeAutoHoldZone
> 
> Per https://wiki.mozilla.org/Release_Management/B2G_Landing, this will land
> on the B2G branches once approved and landed on the other Gecko branches.

Does the "will land" part here mean that this process happens automatically by virtue of having landed on aurora/beta?
From the link: "sec-high and sec-critical patches have automatic approval to land if the fix has landed on all affected Firefox branches"

And the bottom of the page has the bug queries that are looked at daily for such uplifts ;)
Attachment #8395785 - Flags: approval-mozilla-esr24? → approval-mozilla-esr24+
FYI, Ryan will do landings on older branches for you.  He has some queries that look for things with aurora approval set, so you don't even need to set checkin-needed.

If you do want to land stuff yourself, then you need to set the status once you land (except on inbound).
> aurora approval
and similarly for all the other branches, obviously...
(In reply to Jon Coppeard (:jonco) from comment #23)
> https://hg.mozilla.org/releases/mozilla-esr24/rev/42531d5e6ca2

Whilst this has an intermittent present, there seems to be other failures too:
https://tbpl.mozilla.org/php/getParsedLog.php?id=37140911&tree=Mozilla-Esr24

have retriggered.
I've backed this out of mozilla-esr24 as the test failures seem related to the patch:

https://hg.mozilla.org/releases/mozilla-esr24/rev/32160fd34f6b
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The assertion that's triggering is complaining that a compartment is not being destroyed even though the zone contains no allocated GC things.  The compartment is not being destroyed because it is marked by AutoCompartmentRooter.

This is in fact an allowable state now - the point of adding AutoCompartmentRooter was to mark the compartment in case we start a GC before we manage to create its global.

It's not happening on recent versions because the code that creates new type objects now suppresses GC for part of its operation and allocates GC things into the zone without the possibility of triggering GC.

I think the fix for this is to also not destroy zones if they contain any marked compartments, even if they don't contain any GC things.  I'm currently testing a patch for this.

I also think it's worth applying this change to mozilla-central (and everywhere else) since it's only an accident of the current arrangement of the object creation code that this doesn't happen there.
Patch to change SweepZones() not destroy a zone if it contains any marked compartments, even if that zone doesn't contain any GC things.
Attachment #8401750 - Flags: review?(terrence)
Comment on attachment 8401750 [details] [diff] [review]
bug986843-dontSweepMarkedZones

Review of attachment 8401750 [details] [diff] [review]:
-----------------------------------------------------------------

r=me  Clear ownership semantics for compartments? What novelty!
Attachment #8401750 - Flags: review?(terrence) → review+
https://hg.mozilla.org/mozilla-central/rev/ded76bd93630
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
QA Contact: andrei.vaida
QA Contact: andrei.vaida
Does this fix require manual QA? If that is the case, could you please advise on how to verify it manually?
Flags: needinfo?(jcoppeard)
I don't know what the criteria are for requiring manual QA, but this would be hard to test manually as it was very intermittent in the first place, so I'd say not.
Flags: needinfo?(jcoppeard)
(In reply to Jon Coppeard (:jonco) from comment #38)
> I don't know what the criteria are for requiring manual QA, but this would
> be hard to test manually as it was very intermittent in the first place, so
> I'd say not.
Thanks for the quick reply, Jon! Marking this as [qa-] based on Comment 38.
Whiteboard: [qa-]
Group: dom-core-security → core-security
Whiteboard: [qa-] → [qa-][adv-main29+][adv-esr24.5+]
Group: core-security
You need to log in before you can comment on or make changes to this bug.