AutoHoldZone is a hazard with IGC

RESOLVED FIXED in Firefox 29

Status

()

defect
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: bholley, Assigned: jonco)

Tracking

({csectype-uaf, sec-high})

unspecified
mozilla31
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox28 wontfix, firefox29 fixed, firefox30 fixed, firefox31 fixed, firefox-esr2429+ fixed, b2g18 wontfix, b2g-v1.1hd wontfix, b2g-v1.2 fixed, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed, b2g-v2.0 fixed)

Details

(Whiteboard: [qa-][adv-main29+][adv-esr24.5+])

Attachments

(2 attachments, 1 obsolete attachment)

Reporter

Description

5 years ago
I spent a few hours mulling over bug 960828 with sfink, terrence, and jonco yesterday. We were still stuck when we left off, but I had an epiphany this morning, and think I have this figured out.

The key point here is that the global doesn't actually get collected (that's why we never see the finalizer running). But but the global's _compartment_ does. So global->global() (the culprit, as identified in bug 960828 comment 72), which compiles down to global()->compartment()->maybeGlobal(), returns garbage, rather than identity.

The scenario goes like this. XPConnect invokes JS_NewGlobalObject, whose first act of business is to invoke js::NewCompartment. This creates a compartment, and cannot GC. Then, before the call to GlobalObject::create (which can GC), we instantiate an AutoHoldZone instance. Once the global object is created, it is rooted (by virtue of either being on the stack or being in a Rooted<>, depending on the rooting configuration), and therefore its compartment is held alive too.

But AutoHoldZone isn't actually good enough. It prevents the zone (and therefore the compartment) from being swept, but it doesn't actually mark anything. So if we happen to begin a GC at the very beginning of GlobalObject::create (before the global itself is actually allocated), we'll do the marking phase, and fail to mark the freshly-allocated compartment. We then return from JS_NewGlobalObject (at which point the AutoHoldZone is out of scope), and at some point later finish up the incremental sweeping. SweepZones invokes SweepCompartments, which finds the compartment unmarked, and deletes it. Boom.

We could mark AutoHoldZone smarter, but it might be simpler to just manually GC (if we're getting close) at the beginning of JS_NewGlobalObject, and then use an AutoLockGC for the rest of the call, since this is touchy stuff.

GC hackers - thoughts? Once we agree on the approach, I'll go ahead and write the patch.
Reporter

Updated

5 years ago
Flags: needinfo?(jcoppeard)
OS: Mac OS X → All
Hardware: x86 → All
Nice investigation, Bobby!  Once there's a patch ready, we probably want to land this everywhere, as it might help with various very intermittent JS crashes touching dead objects, like bug 984101.
FWIW, AutoHoldZone was added in bug 639270.
Jon and I discussed this IRL last night and we think we should have |class AutoCompartmentRooter : public CustomAutoRooter<JSCompartment*>| and just forward the trace method. Seems much less difficult than messing with GC timings.
Reporter

Comment 4

5 years ago
Sounds good. I'll write a patch.
Assignee

Comment 5

5 years ago
Posted patch bug986843-removeAutoHoldZone (obsolete) — Splinter Review
bholley, sorry I didn't see your last comment and wrote this patch!

I wasn't able to come up with a testcase to reproduce this though -- if anyone has any ideas it would be great if we could test this.  I was trying creating new globals with different frequency values for gczeal 9.
Attachment #8395667 - Flags: review?(terrence)
Flags: needinfo?(jcoppeard)
Reporter

Comment 6

5 years ago
Comment on attachment 8395667 [details] [diff] [review]
bug986843-removeAutoHoldZone

Review of attachment 8395667 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi.cpp
@@ +2494,3 @@
>      Rooted<GlobalObject *> global(cx);
>      {
> +        AutoCompartmentRooter hold(cx, compartment);

This scoping worries me. How about just giving AutoCompartmentRooter an operator JSCompartment*() doing:

AutoCompartmentRooter compartment(cx, NewCompartment(...))?
Comment on attachment 8395667 [details] [diff] [review]
bug986843-removeAutoHoldZone

Review of attachment 8395667 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

::: js/src/jsapi.cpp
@@ -2410,5 @@
>  {
>    public:
> -    explicit AutoHoldZone(Zone *zone
> -                          MOZ_GUARD_OBJECT_NOTIFIER_PARAM)
> -      : holdp(&zone->hold)

We should be able to kill JSCompartment::hold now too.
Attachment #8395667 - Flags: review?(terrence) → review+
Assignee

Comment 8

5 years ago
Patch updated with comments.

Also I renamed the confusing one argument overload of JSCompartment::mark() to markRoots() leaving the no-argument version to set the compartment's mark flag.
Attachment #8395667 - Attachment is obsolete: true
Attachment #8395785 - Flags: review+
This should have had sec-approval before landing no?
https://wiki.mozilla.org/Security/Bug_Approval_Process

Also, does this affect B2G?
Flags: needinfo?(jcoppeard)
Yes on both. ;)
Keywords: leave-open
Whiteboard: [leave open]
Assignee

Comment 12

5 years ago
Ah, I'm sorry, I didn't realise that!

It's worth pointing out that patch this doesn't necessarily fix the problem, but it's a nice cleanup anyway.

Here's the approval request details:

How easily could an exploit be constructed based on the patch?

I think it would be hard to create an exploit based on this as the GC would have to happen at exactly the right point.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

The patch is a tidy up and removes some unnecessary functionality, so it's not obvious that this is a security fix.

Which older supported branches are affected by this flaw?

All of them - this code was introduced in 2011 in bug 639270.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

Should be trivial.

How likely is this patch to cause regressions; how much testing does it need?

Unlikely to cause regressions.
Flags: needinfo?(jcoppeard)
It looks like bug 960828 hasn't happened on inbound since it landed yesterday, which is promising.
Yeah, bug 960828 hasn't happened anywhere since this happened, so it looks like this worked.  Awesome!  Can you nominate for landing this on branches, Jon?  Thanks.  It seems like you can also close both bugs.
Assignee: bobbyholley → jcoppeard
Flags: needinfo?(jcoppeard)
Assignee

Updated

5 years ago
Duplicate of this bug: 960828
Assignee

Comment 16

5 years ago
Comment on attachment 8395785 [details] [diff] [review]
bug986843-removeAutoHoldZone

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 639270
User impact if declined: Potential security issue
Testing completed: On mozilla-central for a week.
Risk to taking this patch (and alternatives if risky): Low
String or UUID changes made by this patch: None
Attachment #8395785 - Flags: approval-mozilla-esr24?
Attachment #8395785 - Flags: approval-mozilla-beta?
Attachment #8395785 - Flags: approval-mozilla-b2g28?
Attachment #8395785 - Flags: approval-mozilla-b2g26?
Attachment #8395785 - Flags: approval-mozilla-aurora?
Flags: needinfo?(jcoppeard)
Assignee

Comment 17

5 years ago
https://hg.mozilla.org/mozilla-central/rev/e05b578dcd6d
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Comment on attachment 8395785 [details] [diff] [review]
bug986843-removeAutoHoldZone

Per https://wiki.mozilla.org/Release_Management/B2G_Landing, this will land on the B2G branches once approved and landed on the other Gecko branches.
Attachment #8395785 - Flags: approval-mozilla-b2g28?
Attachment #8395785 - Flags: approval-mozilla-b2g26?
Attachment #8395785 - Flags: approval-mozilla-beta?
Attachment #8395785 - Flags: approval-mozilla-beta+
Attachment #8395785 - Flags: approval-mozilla-aurora?
Attachment #8395785 - Flags: approval-mozilla-aurora+
Reporter

Comment 19

5 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #18)
> Comment on attachment 8395785 [details] [diff] [review]
> bug986843-removeAutoHoldZone
> 
> Per https://wiki.mozilla.org/Release_Management/B2G_Landing, this will land
> on the B2G branches once approved and landed on the other Gecko branches.

Does the "will land" part here mean that this process happens automatically by virtue of having landed on aurora/beta?
From the link: "sec-high and sec-critical patches have automatic approval to land if the fix has landed on all affected Firefox branches"

And the bottom of the page has the bug queries that are looked at daily for such uplifts ;)
Attachment #8395785 - Flags: approval-mozilla-esr24? → approval-mozilla-esr24+
FYI, Ryan will do landings on older branches for you.  He has some queries that look for things with aurora approval set, so you don't even need to set checkin-needed.

If you do want to land stuff yourself, then you need to set the status once you land (except on inbound).
> aurora approval
and similarly for all the other branches, obviously...
(In reply to Jon Coppeard (:jonco) from comment #23)
> https://hg.mozilla.org/releases/mozilla-esr24/rev/42531d5e6ca2

Whilst this has an intermittent present, there seems to be other failures too:
https://tbpl.mozilla.org/php/getParsedLog.php?id=37140911&tree=Mozilla-Esr24

have retriggered.
Assignee

Comment 28

5 years ago
I've backed this out of mozilla-esr24 as the test failures seem related to the patch:

https://hg.mozilla.org/releases/mozilla-esr24/rev/32160fd34f6b
Assignee

Updated

5 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee

Comment 31

5 years ago
The assertion that's triggering is complaining that a compartment is not being destroyed even though the zone contains no allocated GC things.  The compartment is not being destroyed because it is marked by AutoCompartmentRooter.

This is in fact an allowable state now - the point of adding AutoCompartmentRooter was to mark the compartment in case we start a GC before we manage to create its global.

It's not happening on recent versions because the code that creates new type objects now suppresses GC for part of its operation and allocates GC things into the zone without the possibility of triggering GC.

I think the fix for this is to also not destroy zones if they contain any marked compartments, even if they don't contain any GC things.  I'm currently testing a patch for this.

I also think it's worth applying this change to mozilla-central (and everywhere else) since it's only an accident of the current arrangement of the object creation code that this doesn't happen there.
Assignee

Comment 32

5 years ago
Patch to change SweepZones() not destroy a zone if it contains any marked compartments, even if that zone doesn't contain any GC things.
Attachment #8401750 - Flags: review?(terrence)
Comment on attachment 8401750 [details] [diff] [review]
bug986843-dontSweepMarkedZones

Review of attachment 8401750 [details] [diff] [review]:
-----------------------------------------------------------------

r=me  Clear ownership semantics for compartments? What novelty!
Attachment #8401750 - Flags: review?(terrence) → review+
https://hg.mozilla.org/mozilla-central/rev/ded76bd93630
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago5 years ago
Resolution: --- → FIXED
QA Contact: andrei.vaida
QA Contact: andrei.vaida
Does this fix require manual QA? If that is the case, could you please advise on how to verify it manually?
Flags: needinfo?(jcoppeard)
Assignee

Comment 38

5 years ago
I don't know what the criteria are for requiring manual QA, but this would be hard to test manually as it was very intermittent in the first place, so I'd say not.
Flags: needinfo?(jcoppeard)
(In reply to Jon Coppeard (:jonco) from comment #38)
> I don't know what the criteria are for requiring manual QA, but this would
> be hard to test manually as it was very intermittent in the first place, so
> I'd say not.
Thanks for the quick reply, Jon! Marking this as [qa-] based on Comment 38.
Whiteboard: [qa-]
Group: dom-core-security → core-security
Whiteboard: [qa-] → [qa-][adv-main29+][adv-esr24.5+]
Group: core-security
You need to log in before you can comment on or make changes to this bug.