Closed Bug 1335450 Opened 7 years ago Closed 7 years ago

Crash in RefPtr<T>::RefPtr<T> | nsTArray_Impl<T>::AppendElement<T> | TakeFrameRequestCallbacksFrom

Categories

(Core :: Web Painting, defect)

45 Branch
x86
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox-esr45 52+ fixed
firefox51 --- wontfix
firefox52 + fixed
firefox-esr52 --- fixed
firefox53 + fixed
firefox54 + fixed

People

(Reporter: jesup, Assigned: bzbarsky)

References

Details

(4 keywords, Whiteboard: [post-critsmash-triage][adv-main52+][adv-esr45.8+])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-1dcb9d26-8bbf-4c76-a141-91fcd2170131.
=============================================================

UAF in code called from TakeFrameRequestCallbacksFrom() (which was added or last changed in Bug 1145439).  Called from the RefreshDriver tick.  

High frequency crash; 300 in the last week.

Almost all crashes are in 50 or newer.  Spiked in mid-December to current levels.  There are some older crashes, but at very low frequencies:

https://crash-stats.mozilla.com/signature/?product=Firefox&address=~e5e5&signature=RefPtr%3CT%3E%3A%3ARefPtr%3CT%3E%20%7C%20nsTArray_Impl%3CT%3E%3A%3AAppendElement%3CT%3E%20%7C%20TakeFrameRequestCallbacksFrom&date=%3E%3D2016-10-31T16%3A11%3A53.000Z&date=%3C2017-01-31T16%3A11%3A53.000Z#graphs
Group: core-security → layout-core-security
Component: Layout → Layout: Web Painting
We're looping through mThrottledFrameRequestCallbackDocs, which is an array of raw nsIDocument pointers and it looks like we crash trying to access the document pointer. It seems like there is nothing preventing documents in mThrottledFrameRequestCallbackDocs from dying. Or am I missing something obvious that guarantees they are live?
Looking at the initial changeset that added the raw nsIDocument* pointer array (called mAnimationFrameListenerDocs at the time): https://hg.mozilla.org/mozilla-central/rev/c05c386324c0 it looks like the idea was that documents call RevokeAnimationFrameListeners before they go away, from nsDocument::DeleteShell(). Maybe that's not happening the way it should.
Tracking for 51+ since this is rated sec-critical. 
Markus, do you think there is anything actionable here? Are you still investigating?
Flags: needinfo?(mstange)
I'm not investigating at the moment. Timothy, do you have thoughts on comment 2?
Flags: needinfo?(mstange) → needinfo?(tnikkel)
Comment 2 is right, comment 1 missed that.

The places in nsDocument where frame callbacks are revoked are in several different places and have different conditions. Someone more familiar with nsDocument could take a look to see if they see any obvious mistakes there.
Flags: needinfo?(tnikkel)
bz, can you help here? Thanks!
Flags: needinfo?(bzbarsky)
Comment 2 is right.  Documents with no presshell should not be in this list.  Documents that had a presshell should not be getting destroyed without the presshell being destroyed; for one thing the presshell holds a strong ref to the document.

In general, the idea is that a document is registered with the refresh driver only when all the following are true:

1) mPresShell != nullptr
2) IsEventHandlingEnabled()
3) !AnimationsPaused()
4) !mFrameRequestCallbacks.IsEmpty()

Let's look through places that might affect those conditions:

* mPresShell being non-null.  This can only change in
  nsDocument::DeleteShell and nsDocument::CreateShell. 
 * DeleteShell calls RevokeAnimationFrameNotifications() if 
   IsEventHandlingEnabled() && !AnimationsPaused() (and recall that our 
   invariant is that if either condition were false then the document is
   not registered with the refresh driver to start with).
   RevokeAnimationFrameNotifications only unregisters if !mFrameRequestCallbacks.IsEmpty().
 * CreateShell calls MaybeRescheduleAnimationFrameNotifications() which checks mPresShell and
   !mFrameRequestCallbacks.IsEmpty() and IsEventHandlingEnabled() but NOT !AnimationsPaused().

* IsEventHandlingEnabled being true.  This can change due to mEventsSuppressed changing or
  mScriptGlobalObject changing.
 * mEventsSuppressed can become nonzero in nsDocument::SuppressEventHandling, which
   calls RevokeAnimationFrameNotifications if we're transitioning to either
   !IsEventHandlingEnabled() or AnimationsPaused().
 * mEventsSuppressed can become zero in DecreaseEventSuppression, which calls
   MaybeRescheduleAnimationFrameNotifications().
 * mScriptGlobalObject can change in nsDocument::SetScriptGlobalObject which calls
   MaybeRescheduleAnimationFrameNotifications() when going non-null and calls
   RevokeAnimationFrameNotifications if 
   mPresShell && !EventHandlingSuppressed() && !AnimationsPaused()

* AnimationsPaused() being false.  This can change in SuppressEventHandling() or
  ResumeAnimations().  We already looked at the former; the latter calls
  MaybeRescheduleAnimationFrameNotifications().

* mFrameRequestCallbacks.IsEmpty() being false.  This can change in
  ScheduleFrameRequestCallback(), which calls  ScheduleFrameRequestCallbacks
  if mPresShell && IsEventHandlingEnabled() && !AnimationsPaused() and
  transitioning to a nonzero count.  It can also change in
  nsIDocument::CancelFrameRequestCallback( which will remove if
  mPresShell && IsEventHandlingEnabled() (but does NOT check AnimationsPaused()).

So at first glance there are two bugs:

1) nsIDocument::CancelFrameRequestCallback should not call RevokeFrameRequestCallbacks if AnimationsPaused(), because there is no need: they're already revoked.

2) MaybeRescheduleAnimationFrameNotifications should check AnimationsPaused().

But #2 is presumably what's causing problems here.  If we pause animations and _then_ create a presshell, then we will incorrectly register ourselves with the refresh driver.  If we then proceed to delete the presshell and kill the document, it won't get unregistered, and we can get these crashes.

I knew this all sounded familiar; see bug 1191942...
Flags: needinfo?(bzbarsky)
So the question is: when is AnimationsPaused() true?  It used to be the case that we'd do:

  mSuspendedDoc->SuppressEventHandling(nsIDocument::eAnimationsOnly);

in nsGlobalWindow::EnterModalState.  See bug 956704.  But bug 1316330 recently changed this to do:

      topDoc->SuppressEventHandling(nsIDocument::eEvents);

Doesn't this effectively regress bug 956704?

Anyway, bug 1316330 is on 53, as of 4 days ago.  With that landed, AnimationsPaused() is never true, and all that animation-pausing code is dead code and can go away.

So here's a question: did landing bug 1316330 on Aurora make this crash disappear in 53?
Flags: needinfo?(rjesup)
Flags: needinfo?(mrbkap)
Flags: needinfo?(jjong)
Flags: needinfo?(bugs)
I believe this should fix things; we probably want to backport this to various branches
Attachment #8837196 - Flags: review?(mrbkap)
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Comment on attachment 8837196 [details] [diff] [review]
Fix our condition for rescheduling frame request callbacks

[Security approval request comment]
How easily could an exploit be constructed based on the patch?  Not that easily.  I tried to do this in the past, in bug 1191942, and couldn't even get a reliable crash.  It's probably doable, but you have to get the timing and GC behavior and so forth _just_ right somehow.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?  It's pretty clear from the patch that this has to do with frame request callbacks and the AnimationsPaused() condition.  Code searching will quickly lead one to see that the latter is only true in modal state.  And looking at ScheduleFrameRequestCallbacks will probably quickly show that a weak pointer is held to the document.  But apart from that, nothing in the patch really points out the issues.

Which older supported branches are affected by this flaw? All of them.

If not all supported branches, which bug introduced the flaw?  Bug 956704.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?  I expect this patch to cleanly apply everywhere; it certainly applies cleanly to ESR45.

How likely is this patch to cause regressions; how much testing does it need?  I _think_ regressions are fairly unlikely, but this code is finicky.  More testing is likely better.
Attachment #8837196 - Flags: sec-approval?
Comment on attachment 8837196 [details] [diff] [review]
Fix our condition for rescheduling frame request callbacks

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: I expect this is sec:crit.
User impact if declined: Crashes 
Fix Landed on Version: Will be 54.
Risk to taking this patch (and alternatives if risky): Moderate, but I don't
   think there are any safer alternatives here, apart from having
   likely-exploitable crashes.
String or UUID changes made by this patch: None.

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 956704
[User impact if declined]: Crashes
[Is this code covered by automated tests?]: Not very well, no.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: It's unclear what the
   steps to reproduce are....
[List of other uplifts needed for the feature/fix]: No.
[Is the change risky?]: Moderately.
[Why is the change risky/not risky?]: It's touching pretty finicky code.  I
   _think_ my analysis in this bug, and my boolean tests, are both correct,
   but I wouldn't bet my life on it.
[String changes made/needed]: None.
Attachment #8837196 - Flags: approval-mozilla-release?
Attachment #8837196 - Flags: approval-mozilla-esr52?
Attachment #8837196 - Flags: approval-mozilla-esr45?
Attachment #8837196 - Flags: approval-mozilla-beta?
Attachment #8837196 - Flags: approval-mozilla-aurora?
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #8)

> So here's a question: did landing bug 1316330 on Aurora make this crash
> disappear in 53?

There weren't many crashes on aurora 53, but it looks like the last one was from a build from Feb. 11:

https://crash-stats.mozilla.com/signature/?version=53.0a2&signature=RefPtr%3CT%3E%3A%3ARefPtr%3CT%3E%20%7C%20nsTArray_Impl%3CT%3E%3A%3AAppendElement%3CT%3E%20%7C%20TakeFrameRequestCallbacksFrom&date=%3E%3D2017-01-31T20%3A47%3A27.000Z&date=%3C2017-02-14T20%3A47%3A27.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports

The patch from bug 1316330 landed on Aurora on Feb. 10th, which means it should be in the Feb. 11th build. So maybe that patch didn't fix the issue completely.
That suggests the patch I attached won't completely fix it either...
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #8)

> Anyway, bug 1316330 is on 53, as of 4 days ago.  With that landed,
> AnimationsPaused() is never true, and all that animation-pausing code is
> dead code and can go away.
> 
> So here's a question: did landing bug 1316330 on Aurora make this crash
> disappear in 53?

https://crash-stats.mozilla.com/report/index/ea83ab5a-d305-4abf-91e2-1ab192170212 says no, it didn't go away.  (BuildID 20170211xxxx, 3 days ago; 2.5 days before your comment).  You may want to verify if that fix is in that build or not.

(I think this is also what liz was saying)
Flags: needinfo?(rjesup)
Comment on attachment 8837196 [details] [diff] [review]
Fix our condition for rescheduling frame request callbacks

Review of attachment 8837196 [details] [diff] [review]:
-----------------------------------------------------------------

Given comment 8, I don't understand how this patch fixes this bug.
Attachment #8837196 - Flags: review?(mrbkap)
Flags: needinfo?(mrbkap)
Should this go in even if this doesn't completely fix the bug? The last few comments leave me confused.
Flags: needinfo?(bzbarsky)
So here's my take on this:

1)  The logic in nsDocument is clearly buggy per comment 7.
2)  On m-c, that bugginess may be irrelevant because AnimationsPaused() is always false, afaict.
3)  The thing that makes AnimationsPaused() always be false is not on various
    branches (52 and earlier), so we absolutely need to fix the logic on those
    branches to have any hope of not hitting this crash there.
4)  Fixing the logic may not be enough to fix all the crashes,
    given comment 12 and comment 14.

I'm happy to spin off a separate bug for "fix the logic and backport everywhere" if people would prefer that.
Flags: needinfo?(bzbarsky)
Comment on attachment 8837196 [details] [diff] [review]
Fix our condition for rescheduling frame request callbacks

See comment 18.
Attachment #8837196 - Flags: review?(mrbkap)
Attachment #8837196 - Flags: approval-mozilla-esr52?
Comment on attachment 8837196 [details] [diff] [review]
Fix our condition for rescheduling frame request callbacks

sec-approval+.
I'll give aurora, and beta approval as well.
Attachment #8837196 - Flags: approval-mozilla-beta?
Attachment #8837196 - Flags: approval-mozilla-beta+
Attachment #8837196 - Flags: approval-mozilla-aurora?
Attachment #8837196 - Flags: approval-mozilla-aurora+
Comment on attachment 8837196 [details] [diff] [review]
Fix our condition for rescheduling frame request callbacks

Review of attachment 8837196 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks. We should definitely file a followup at least for trunk to figure out if we can remove AnimationsPaused.
Attachment #8837196 - Flags: review?(mrbkap) → review+
Attachment #8837196 - Flags: sec-approval? → sec-approval+
https://hg.mozilla.org/mozilla-central/rev/492fa407d12c
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
> We should definitely file a followup at least for trunk to figure out if we can remove AnimationsPaused.

Jessica filed bug 1340086.
Comment on attachment 8837196 [details] [diff] [review]
Fix our condition for rescheduling frame request callbacks

Fix a sec-critical. ESR45+.
Attachment #8837196 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
Group: layout-core-security → core-security-release
Clearing ni since this has been resolved and bug 1340086 has landed too.
Flags: needinfo?(jjong)
Comment on attachment 8837196 [details] [diff] [review]
Fix our condition for rescheduling frame request callbacks

We don't have then plan to have dot release for 51. Rel51-.
Attachment #8837196 - Flags: approval-mozilla-release? → approval-mozilla-release-
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main52+][adv-esr45.8+]
See Also: → 1342823
Bug 1342823 is tracking the fact that this is still happening on trunk.  I have no idea why so far.  :(
Blocks: 1349110
Flags: needinfo?(bugs)
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: