Make sure the AnimationManager/TransitionManager objects stay alive for the duration their DispatchEvents() calls

RESOLVED FIXED in Firefox 45

Status

()

defect
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: dholbert, Assigned: smaug)

Tracking

({sec-high})

Trunk
mozilla47
Points:
---

Firefox Tracking Flags

(firefox45+ fixed, firefox46+ fixed, firefox47+ fixed, firefox-esr3845+ fixed, firefox-esr45 fixed)

Details

(Whiteboard: [adv-main45+][adv-esr38.7+][post-critsmash-triage])

Attachments

(2 attachments)

(Reporter)

Description

3 years ago
I have some concerns about the code that we fixed in bug 1211334, and the way we fixed it.

In particular:
 - Bug 1211334 comment 1 suggests that we were crashing due to calling context->AnimationManager()->DispatchEvents(), with a null AnimationManager() pointer.
 - As noted in that comment, this suggests that the context->AnimationManager() was changing to null *while we were running the TransitionManager's DispatchEvents() call*.
 - This would happen via a call to nsPresContext::SetShell(nullptr), which nsPresShell calls in its Destroy method. So we're hitting this code:
> 1187     if (mTransitionManager) {
> 1188       mTransitionManager->Disconnect();
> 1189       mTransitionManager = nullptr;
> 1190     }
> 1191     if (mAnimationManager) {
> 1192       mAnimationManager->Disconnect();
> 1193       mAnimationManager = nullptr;
> 1194     }

These manager pointers are RefPtr<>s, so when we clear them here, we might be destroying these objects.  This happens *while we're inside of a call to a TransitionManager method* (DispatchEvents).  These MXR searches suggest to me that there are no other owning references to these objects (though I'm not 100% sure), which means these nulling-out operations probably delete these objects:
http://mxr.mozilla.org/mozilla-central/search?string=RefPtr.*AnimationManager&regexp=1
http://mxr.mozilla.org/mozilla-central/search?string=RefPtr.*TransitionManager&regexp=1

So, it looks to me like we could end up deleting the TransitionManager partway through its DispatchEvents method, which suggests that DispatchEvents could end up using deleted data for the remainder of its run.

(This same problem could also happen during TransitionManager::DispatchEvents.)
(Reporter)

Updated

3 years ago
Summary: Make sure the AnimationManager/TransitionManager objects stay alive for the duration their >DispatchEvents() calls → Make sure the AnimationManager/TransitionManager objects stay alive for the duration their DispatchEvents() calls
(Assignee)

Comment 1

3 years ago
We end up calling
http://mxr.mozilla.org/mozilla-central/source/layout/style/AnimationCommon.h#317
If I read the code right, everything else there is safe, except prescontext usage.
The method takes a reference to something in the owning manager, and that owner might die.

So yes, smells like a sec-critical to me.

And we should probably land the fix for this at the same time as the fix for Bug 1243583.
(Assignee)

Comment 2

3 years ago
I guess the easiest fix would be to add kungfuDeathGrip to DispatchEvents methods before calling 
mEventDispatcher.DispatchEvents(mPresContext);
(Assignee)

Comment 3

3 years ago
It is possibly a bit hard to write a testcase for this though, since both transition manager and animation manager are cycle collectable objects, so snowwhite killer will call delete on them asynchronously.
But if some event listener uses sync XHR or calls alert() or something, that spins event loop, the object can be deleted while the events are still being dispatched.
(Reporter)

Comment 4

3 years ago
(In reply to Daniel Holbert [:dholbert] from comment #0)
> So, it looks to me like we could end up deleting the TransitionManager
> partway through its DispatchEvents method, which suggests that
> DispatchEvents could end up using deleted data for the remainder of its run.
> 
> (This same problem could also happen during
> TransitionManager::DispatchEvents.)

Sorry -- in that final parenthetical, I meant to say AnimationManager. (Both AnimationManager and TransitionManager seem to be able to die during their DispatchEvents methods.)
(Reporter)

Comment 5

3 years ago
(In reply to Olli Pettay [:smaug] from comment #1)
> We end up calling
> http://mxr.mozilla.org/mozilla-central/source/layout/style/AnimationCommon.h#317
> If I read the code right, everything else there is safe [...]

(This is the DispatchEvents method in DelayedEventDispatcher.)

Mmm, I suppose you're right - at first glance I was concerned with the "mPendingEvents" usage, but looking more closely, we drain mPendingEvents into a local variable up-front, and we don't use any member-variables after that point.

> [...] except prescontext usage. The method takes a reference to something in the owning manager, and that
> owner might die.

In the call from nsRefreshDriver::DispatchAnimationEvents() at least, we have a local RefPtr<nsPresContext> keeping the pres-context alive, so I suspect we're fine there (as long as this RefPtr is holding the same nsPresContext as the one that gets passed into DispatchEvents -- and I think we're guaranteed that it is).

Do you agree that this callsite looks OK?

(There is one other callsite which might *not* be OK -- PresShell::FlushPendingNotifications makes a call to   mPresContext->AnimationManager()->DispatchEvents()...
  http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.cpp#3992
...which could destroy stuff; and there, we *don't* have a local RefPtr keeping the nsPresContext alive for the duration of the DispatchEvents call.)
(Reporter)

Updated

3 years ago
Flags: needinfo?(bugs)
Good catch.  The "reference to prescontext member" there assumes the object
stays alive.  The kungfuDeathGrip fix that Olli suggested seems the least risky
considering we probably want to uplift it to branches.

That "reference to member" thing feels like a code smell.  I suspect we can
remove CommonAnimationManager::mPresContext though, since it's mainly used for
the purpose of checking if it's been nulled out.  The other uses in Transition /
AnimationManager can probably get a prescontext directly from the nsStyleContext
object they process.

Then we can change DispatchEvents() to take a prescontext pointer that it
does a kungfuDeathGrip on itself.  It can check GetPresShell()==null
to do the early return.
(In reply to Daniel Holbert [:dholbert] from comment #5)
> In the call from nsRefreshDriver::DispatchAnimationEvents() at least, we
> have a local RefPtr<nsPresContext> keeping the pres-context alive, so I
> suspect we're fine there (as long as this RefPtr is holding the same
> nsPresContext as the one that gets passed into DispatchEvents -- and I think
> we're guaranteed that it is).

I don't think that helps.  Even if the nsPresContext is alive, someone might
call SetShell(nullptr) on it which wipes out the strong refs to these managers.
So the "reference to member" is now pointing into deallocated memory which
might be reallocated and filled...
(Reporter)

Comment 8

3 years ago
Ah, thank you - I missed the fact that our nsPresContext *pointer* we're using here is a member-var on a maybe-deleted object (even if the nsPresContext itself were still alive).
(Assignee)

Comment 9

3 years ago
(Mats answered here already. Clearing needinfo.)
Flags: needinfo?(bugs)
(In reply to Daniel Holbert [:dholbert] from comment #5)
> (There is one other callsite which might *not* be OK --
> PresShell::FlushPendingNotifications makes a call to  
> mPresContext->AnimationManager()->DispatchEvents()...
>  
> http://mxr.mozilla.org/mozilla-central/source/layout/base/nsPresShell.cpp#3992
> ...which could destroy stuff; and there, we *don't* have a local RefPtr
> keeping the nsPresContext alive for the duration of the DispatchEvents call.)

Yeah, it's a bit odd that we don't have a kungfuDeathGrip on mPresContext there.
I guess we expect each of the called methods to manage that on their own if they
need it.

I think making DelayedEventDispatcher::DispatchEvents resilient against
the prescontext/shell/Transition/AnimationManager going away is probably
a good idea either way.
(Assignee)

Comment 11

3 years ago
This should be the simplest and safest possible patch for branches, I think.
Attachment #8713806 - Flags: review?(mats)
Comment on attachment 8713806 [details] [diff] [review]
kungfudeathgrip

Thanks.  I think this is a safe fix for branches.
Attachment #8713806 - Flags: review?(mats) → review+
(Assignee)

Comment 13

3 years ago
Looks like esr38 needs a tiny bit different patch.
(Assignee)

Comment 14

3 years ago
Posted patch esr38Splinter Review
Assignee: nobody → bugs
Attachment #8713834 - Flags: review?(mats)
Comment on attachment 8713834 [details] [diff] [review]
esr38

It looks like this aspect of the code works in the same way here; the code have
just been shuffled around somewhat since then.  So this fix looks fine to me.
Attachment #8713834 - Flags: review?(mats) → review+
FTR, it looks like bug 1211334 wasn't fixed, and can occur, on esr38.
The calls comes from here on that branch:
http://mxr.mozilla.org/mozilla-esr38/source/layout/base/nsPresShell.cpp#4298
So 'mPresContext' can become null after each of the Flush* calls there.
AFAICT, it should be a safe null-pointer crash though, so I don't think
we need to fix that on esr38.
(Assignee)

Comment 17

3 years ago
Comment on attachment 8713834 [details] [diff] [review]
esr38

[Approval Request Comment]
User impact if declined: 
Possible sec-critical crashes
Fix Landed on Version:
Hopefully will land on 47 soon
Risk to taking this patch (and alternatives if risky):
This should be very safe 
String or UUID changes made by this patch: 
NA
Attachment #8713834 - Flags: approval-mozilla-esr38?
(Assignee)

Comment 18

3 years ago
Comment on attachment 8713806 [details] [diff] [review]
kungfudeathgrip


[Security approval request comment]
How easily could an exploit be constructed based on the patch?
kungFuDeathGrip kind of pinpoints what the issue is.
But we don't have an exploit for this, this is all based on code inspection.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
commit could be something vague like 
"Bug 1244250, make sure to dispatch all the needed transition and animation events, r=mats"

Which older supported branches are affected by this flaw?
esr38, aurora, beta

If not all supported branches, which bug introduced the flaw?
(didn't look at how long we've had this. I assume quite awhile)

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
esr38 needs a separate patch. That is already reviewed, and just puts the same code in a bit different place.

How likely is this patch to cause regressions; how much testing does it need?
Should be very safe.
Attachment #8713806 - Flags: sec-approval?
Attachment #8713806 - Flags: approval-mozilla-beta?
Attachment #8713806 - Flags: approval-mozilla-aurora?
Marking 45 and 46 as affected, tracking on the likelihood that this may be sec-critical.
Keywords: sec-high
sec-approval+ for checkin on February 9. We should take it on Aurora, Beta, and ESR38 after trunk is good.
Whiteboard: [checkin on 2/9]
Attachment #8713834 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38+
Attachment #8713806 - Flags: sec-approval?
Attachment #8713806 - Flags: sec-approval+
Attachment #8713806 - Flags: approval-mozilla-beta?
Attachment #8713806 - Flags: approval-mozilla-beta+
Attachment #8713806 - Flags: approval-mozilla-aurora?
Attachment #8713806 - Flags: approval-mozilla-aurora+
Keywords: checkin-needed
Whiteboard: [checkin on 2/9]
https://hg.mozilla.org/mozilla-central/rev/5c4a80fca486
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Group: layout-core-security → core-security-release
Whiteboard: [adv-main45+][adv-esr38.7+]
Whiteboard: [adv-main45+][adv-esr38.7+] → [adv-main45+][adv-esr38.7+][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.