Fix crash in bug 1321066 more thoroughly (so it doesn't crash at all)

RESOLVED FIXED in Firefox 51

Status

()

RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: dholbert, Assigned: birtles)

Tracking

({crash, sec-other, testcase})

Trunk
mozilla53
crash, sec-other, testcase
Points:
---
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(firefox-esr45 wontfix, firefox50 wontfix, firefox51 fixed, firefox52 fixed, firefox53 fixed)

Details

(Whiteboard: [adv-main51-], crash signature, URL)

Attachments

(2 attachments, 6 obsolete attachments)

In bug 1321066, we crashed because:
 - We're iterating across a nsTArray, using a raw pointer to its Elements(). (Actually it's a nsTPriorityQueue, but that's backed by a nsTArray.)
 - During that iteration, we add entries to the array.
 - This ends up reallocating the array's buffer.
 - So from that point on, our iteration is walking through freed memory.

For now, we're taking a band-aid fix that detects the array-modification and safely crashes at that point.  This should prevent any badness. HOWEVER: we should dig in & fix this more thoroughly, so that the testcases don't crash at all.  I'm filing this bug on that part.

The testcases in bug 1321066 should serve as testcases for this bug as well.

POSSIBLE FIX STRATEGIES:
 * Strategy #1: If we're sure that these modifications aren't supposed to be possible during iteration: we need to figure out why they're happening & how to avoid them. (Note that we have a (debug-only) assertion that indicates that we expect the array to be un-changed during iteration right now. So I think this might be feasible.)
 * Strategy #2: Change nsTPriorityQueue (or at least the one used here) to be backed by a nsTObserverArray, which *is* safe to modify during iteration. (If we follow this strategy, we could perhaps remove the release-build assertions added in bug 1321066, since the modifications caught by those assertions would become safe.)

Birtles, I'm hoping you're up for looking into this -- IIRC the SMIL milestone stuff is super-complicated, and you're the most likely to remember how it's supposed to fit together. :)  Assigning to you for now.
(I've marked this bug as security sensitive, but it should be safe to open this up once we've opened up bug 1321066. This is just a followup for that bug, and it has the same level of sensitivity.)
Testcase: https://bugzilla.mozilla.org/attachment.cgi?id=8815501
Keywords: crash, testcase
OS: Unspecified → All
Hardware: Unspecified → All
Keywords: sec-other
(In reply to Daniel Holbert [:dholbert] from comment #0)
> POSSIBLE FIX STRATEGIES:
>  * Strategy #1: If we're sure that these modifications aren't supposed to be
> possible during iteration: we need to figure out why they're happening & how
> to avoid them. (Note that we have a (debug-only) assertion that indicates
> that we expect the array to be un-changed during iteration right now.

This ^^ was referring to the following MOZ_ASSERT( (w/ comparison to queueLength) here:
=====
  const MilestoneEntry* p = mMilestoneEntries.Elements();
#if DEBUG
  uint32_t queueLength = mMilestoneEntries.Length();
#endif
  while (p < mMilestoneEntries.Elements() + mMilestoneEntries.Length()) {
    mozilla::dom::SVGAnimationElement* elem = p->mTimebase.get();
    elem->TimedElement().HandleContainerTimeChange();
    MOZ_ASSERT(queueLength == mMilestoneEntries.Length(),
               "Call to HandleContainerTimeChange resulted in a change to the "
               "queue of milestones");
    ++p;
=====
https://dxr.mozilla.org/mozilla-central/rev/a69583d2dbc6fdc18f63761a89cf539c356668be/dom/smil/nsSMILTimeContainer.cpp#329
(Assignee)

Comment 4

2 years ago
(In reply to Daniel Holbert [:dholbert] from comment #0)
> Birtles, I'm hoping you're up for looking into this -- IIRC the SMIL
> milestone stuff is super-complicated, and you're the most likely to remember
> how it's supposed to fit together. :)  Assigning to you for now.

Absolutely! Thank you so much for all your work on the emergency fix. I saw the bug report after you'd already reviewed the patch for it and didn't have anything to add at that point. I was thinking about how to fix this properly last night but I agree that both of those approaches appear promising (although fixing nsTPriorityQueue seems safest at this stage).
Group: layout-core-security
(Assignee)

Comment 5

2 years ago
Having looked into that assertion, I don't think we can reasonably expect the set of milestones won't change from calling nsSMILTimedElement::HandleContainerTimeChange.

I think there are at least two possible fixes:

1. Iterate over the milestones once and pull out the elements from the milestones into a separate array (containing owning references to the elements). Then iterate over the array of elements and call HandleContainerTimeChange on each one.

2. Change nsTPriorityQueue to use nsTObserverArray and expose the necessary iterators (at least EndLimitedIterator for this case).

(2) seems more robust so I'll make up a patch for that. I might make up a patch for (1) as well so we can compare (and in case it's more suitable for uplifting).
Flags: in-testsuite?

Comment 6

2 years ago
FYI, I just posted a more real world like test case with an actual animation in bug 1321066.
(In reply to Yuhong Bao from comment #6)
> FYI, I just posted a more real world like test case with an actual animation
> in bug 1321066.

Thanks. I added a crash signature here in case it starts happening a lot in the wild.
Crash Signature: [@ nsSMILTimeContainer::AddMilestone ]
status-firefox50: --- → affected
status-firefox51: --- → affected
status-firefox52: --- → affected
status-firefox-esr45: --- → affected
(Assignee)

Comment 8

2 years ago
Created attachment 8816010 [details] [diff] [review]
Approach 2 - Make nsTPriorityQueue use nsTObserverArray

This took a little longer than I anticipated due to bug 1321468.

The patch ended up being larger than I anticipated too so I'll have a look at approach 1 from comment 5 to see how it looks.
(Assignee)

Comment 9

2 years ago
Created attachment 8816027 [details] [diff] [review]
Approach 1 - Copy the elements to a separate array before iterating over them

This is the first approach from comment 5.
(Assignee)

Comment 11

2 years ago
Created attachment 8816028 [details] [diff] [review]
Approach 1 - Copy the elements to a separate array before iterating over them (v2)

Made a few tweaks to the comment.
Attachment #8816027 - Attachment is obsolete: true
(Assignee)

Comment 12

2 years ago
Created attachment 8816042 [details] [diff] [review]
Approach 2 - Make nsTPriorityQueue use nsTObserverArray (v2)

Fix a namespace issue.
Attachment #8816010 - Attachment is obsolete: true
(Assignee)

Comment 13

2 years ago
(In reply to Brian Birtles (:birtles) from comment #10)
> Approach 2:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=c7dabe4d8e3b82e407dfe1f990a4c08609b2ccf3

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b52f53f137c030981cee7a082e4314165590d798
(Assignee)

Comment 14

2 years ago
Daniel, I've prepared two different approaches to this and currently I'm thinking we could:

* Land approach 1 on branches since it's quite simple. It keeps the
  mHoldingEntries check too just in case, but we might decide to drop that.

* Land approach 2 on trunk and split it up a little:
  - Add SwapElementsAt to nsTObserverArray
  - Port nsTPriorityQueue to nsTObserverArray and update the two call sites in
    nsSMILTimeContainer that rely on Elements() since that would no longer be
    available.
    (We could possibly add a couple more gtests to TestPriorityQueue.cpp but
    I suspect they'd just be testing nsTObserverArray functionality.)
  - Drop mHoldingEntries from nsSMILTimeContainer and add the crashtest

If you agree, I'll go ahead and split up part 2 and put it all up for review.
Flags: needinfo?(dholbert)
(Assignee)

Comment 15

2 years ago
Oh, that's assuming try looks good!
(Assignee)

Comment 16

2 years ago
Actually, I think approach 2 is not ultimately right. If we do hit this situation where we end up adding milestones while iterating over them, then, because we're using a priority queue, we may end up shuffling the array around and any active iterators will visit the wrong set of elements. So, while it won't crash, it won't produce the correct result either.

So, I think we should just land approach 1. Then, in a separate patch, perhaps even a separate bug and on trunk only:

- Drop mHoldingEntries
- Harden nsTPriorityQueue by basing it on nsTObserverArray and only exposing iterators from nsTObserverArray instead of Elements().
(Assignee)

Updated

2 years ago
Flags: needinfo?(dholbert)
Attachment #8816028 - Flags: review?(dholbert)
Comment on attachment 8816028 [details] [diff] [review]
Approach 1 - Copy the elements to a separate array before iterating over them (v2)

Review of attachment 8816028 [details] [diff] [review]:
-----------------------------------------------------------------

Commit message nit:
(1)  "This patch leaves mHoldingEntries is place for now"
  s/is/in/

 "we should probably remove it when/after landing."
(2) IMO, we shouldn't remove it -- we should just convert it to debug-only:
  - We should wrap all its usages with #ifdef DEBUG (not making it DebugOnly<> -- since IIRC DebugOnly takes up space as a member-variable)
  - We should convert the MOZ_RELEASE_ASSERTs to just MOZ_ASSERT.

::: dom/smil/nsSMILTimeContainer.cpp
@@ +330,4 @@
>      ++p;
>    }
> +
> +  p = nullptr;

The "p = nullptr" here is a bit awkward, since we never use "p" after this point.  I think it's an attempt at enforcing scope -- so let's just replace the "while" loop with a "for" loop, so that p is *actually* scoped without the need for this explicit nulling.

I think this should do it:
  for (const MilestoneEntry* p = mMilestoneEntries.Elements();
       p < mMilestoneEntries.Elements() + mMilestoneEntries.Length();
       ++p) {
    elems.AppendElement(p->mTimebase.get());
  }

What do you think?
(In reply to Brian Birtles (:birtles) from comment #16)
> Actually, I think approach 2 is not ultimately right. If we do hit this
> situation where we end up adding milestones while iterating over them, then,
> because we're using a priority queue, we may end up shuffling the array
> around and any active iterators will visit the wrong set of elements. So,
> while it won't crash, it won't produce the correct result either.

Agreed.

> So, I think we should just land approach 1.

Agreed (modulo review comments).

> Then, in a separate patch, perhaps even a separate bug and on trunk only:
> 
> - Drop mHoldingEntries
I think we should preserve mHoldingEntries as a debug-only thing, per comment 17.

> - Harden nsTPriorityQueue by basing it on nsTObserverArray and only exposing
> iterators from nsTObserverArray instead of Elements().

I'm not sure this is a good idea at this point.  IIUC we won't need to have nsTObserverArray under the hood anymore, for this usage at least, after we've taken Approach 1.  And I expect that the conversion to nsTObserverArray isn't free, so it doesn't make sense that we'd shake up this implementation & take a runtime penalty for no clear benefit.  (If you disagree, though, feel free to file the followup bug on this part, and we can continue this discussion over there.)
Flags: needinfo?(bbirtles)
(Assignee)

Comment 19

2 years ago
Yes, thinking about this overnight I came to the same conclusion that it's probably not worth fiddling with nsTPriorityQueue. I'll make the changes to Approach 1.
Flags: needinfo?(bbirtles)
(Assignee)

Comment 20

2 years ago
Created attachment 8816317 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

MozReview-Commit-ID: GjkXYlhoeoy
Attachment #8816317 - Flags: review?(dholbert)
(Assignee)

Updated

2 years ago
Attachment #8816028 - Attachment is obsolete: true
Attachment #8816028 - Flags: review?(dholbert)
(Assignee)

Comment 21

2 years ago
Created attachment 8816318 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

MozReview-Commit-ID: 2plUpGMj1FB
Attachment #8816318 - Flags: review?(dholbert)
(Assignee)

Updated

2 years ago
Attachment #8816042 - Attachment is obsolete: true
Comment on attachment 8816317 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

Review of attachment 8816317 [details] [diff] [review]:
-----------------------------------------------------------------

This is basically r+ with nits addressed, but I'll mark it r- and take a look at the final patch to be on the cautious side.

::: dom/smil/nsSMILTimeContainer.cpp
@@ +318,5 @@
>    // with significant transitions yet to fire should have their next milestone
>    // registered. Other timed elements don't matter.
> +
> +  // Copy the timed elements to a separate array before calling
> +  // HandleContainerTimeChange on each them in case doing so mutates

Typo -- s/each them/each of them/

@@ +329,5 @@
> +       p < mMilestoneEntries.Elements() + mMilestoneEntries.Length();
> +       ++p) {
> +    elems.AppendElement(p->mTimebase.get());
> +  }
> +  mHoldingEntries = false;

Technically, "mHoldingEntries = false" here is bogus.  You're assuming that it was false before we set it to true, but you don't actually know that.  We should really be letting |saveHolding| restore the saved value here.

I think the only way to do that is to force the AutoRestore to go out of scope at this point -- i.e. to add an extra layer of curly-braces, starting from

  nsTArray<RefPtr<mozilla::dom::SVGAnimationElement>> elems;
  // Scope for saveHolding:
  {
    AutoRestore<bool> saveHolding(mHoldingEntries);
    mHoldingEntries = true;	
    for (...) {
      ...
    }
  }
Attachment #8816317 - Flags: review?(dholbert) → review-
Comment on attachment 8816318 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

Review of attachment 8816318 [details] [diff] [review]:
-----------------------------------------------------------------

r=me on part 2, though it'll need to be rebased on top of the updates to "part 1" of course.

::: dom/smil/nsSMILTimeContainer.cpp
@@ +340,2 @@
>    mHoldingEntries = false;
> +#endif

(This "mHoldingEntries = false;" assignment won't exist anymore, per my previous comment, so this chunk of the patch won't be needed.)
Attachment #8816318 - Flags: review?(dholbert) → review+
(Assignee)

Comment 24

2 years ago
Created attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

MozReview-Commit-ID: GjkXYlhoeoy
Attachment #8816325 - Flags: review?(dholbert)
(Assignee)

Updated

2 years ago
Attachment #8816317 - Attachment is obsolete: true
(Assignee)

Comment 25

2 years ago
Created attachment 8816326 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

MozReview-Commit-ID: 2plUpGMj1FB
Attachment #8816326 - Flags: review?(dholbert)
(Assignee)

Updated

2 years ago
Attachment #8816318 - Attachment is obsolete: true
(Assignee)

Comment 26

2 years ago
Comment on attachment 8816326 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

Carry forward r+ from comment 23.
Attachment #8816326 - Flags: review?(dholbert) → review+
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

Review of attachment 8816325 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks - this looks good! r=me
Attachment #8816325 - Flags: review?(dholbert) → review+

Comment 29

2 years ago
Pushed by bbirtles@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8c1e9210ea85
part 1 - Copy elements from mMilestoneEntries before iterating over them; r=dholbert
https://hg.mozilla.org/integration/mozilla-inbound/rev/7004ffd040dc
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only; r=dholbert

Comment 30

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/8c1e9210ea85
https://hg.mozilla.org/mozilla-central/rev/7004ffd040dc
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox53: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
(Assignee)

Comment 31

2 years ago
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

Approval Request Comment
[Feature/Bug causing the regression]: bug 474743
[User impact if declined]: (Safe) crash with some content
[Is this code covered by automated tests?]: Yes (included in this patch)
[Has the fix been verified in Nightly?]: It has landed in central, but not shipped in Nightly yet
[Needs manual test from QE? If yes, steps to reproduce]: Who is OE? In any case, I don't think so since the patch includes an automated test. If someone wants to verify, the test steps from bug 1321066 should apply although in this case we don't expect a crash.
[List of other uplifts needed for the feature/fix]: Just this patch is enough (part 2 is optional)
[Is the change risky?]: Not particularly.
[Why is the change risky/not risky?]: It keeps the safe-guard checks introduced in bug 1321066.
[String changes made/needed]: None
Attachment #8816325 - Flags: approval-mozilla-beta?
Attachment #8816325 - Flags: approval-mozilla-aurora?
What about 45ESR, since the (safe) crash is still there?
(In reply to Cameron Kaiser [:spectre] from comment #32)
> What about 45ESR, since the (safe) crash is still there?

This is just a stability fix at this point -- i.e. a fix fora safe crash (and one that's unlikely to be hit by normal web content, since SMIL is so rarely used, and syncbase-timing-with-scripting is even more rarely used).

And the ESR FAQ page says:
> Maintenance of each ESR, through point releases, is limited to
> high-risk/high-impact security vulnerabilities [...].
> Backports of any functional enhancements and/or stability fixes
> are not in scope.
https://www.mozilla.org/en-US/firefox/organizations/faq/

So I think that makes this out-of-scope for ESR.
status-firefox-esr45: affected → wontfix
I'll backport for TenFourFox separately.
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

avoid crashing with recent sec fix, take in aurora52
Attachment #8816325 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
needs rebasing for aurora
Flags: needinfo?(bbirtles)
Yup, just needs rebasing for the crashtest.list insertion.

I've got an aurora clone handy -- I'll rebase & land.
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

let's fix this crash in beta51
Attachment #8816325 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Rebased to beta for crashtests.list contextual differences, and landed on beta:
https://hg.mozilla.org/releases/mozilla-beta/rev/181d381af5dfff2cafcd816be6167a1897d963a0
status-firefox51: affected → fixed
status-firefox50: affected → wontfix
Whiteboard: [adv-main50.1-]
Whiteboard: [adv-main50.1-] → [adv-main51-]
You need to log in before you can comment on or make changes to this bug.