Last Comment Bug 1321357 - Fix crash in bug 1321066 more thoroughly (so it doesn't crash at all)
: Fix crash in bug 1321066 more thoroughly (so it doesn't crash at all)
Status: RESOLVED FIXED
[adv-main51-]
: crash, sec-other, testcase
Product: Core
Classification: Components
Component: SVG (show other bugs)
: Trunk
: All All
-- normal (vote)
: mozilla53
Assigned To: Brian Birtles (:birtles)
:
: Jet Villegas (:jet)
Mentors:
https://bugzilla.mozilla.org/attachme...
Depends on: CVE-2016-9079
Blocks:
  Show dependency treegraph
 
Reported: 2016-11-30 10:56 PST by Daniel Holbert [:dholbert]
Modified: 2017-01-17 12:23 PST (History)
13 users (show)
ryanvm: in‑testsuite?
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
wontfix
fixed
fixed
fixed
wontfix


Attachments
Approach 2 - Make nsTPriorityQueue use nsTObserverArray (12.69 KB, patch)
2016-11-30 23:00 PST, Brian Birtles (:birtles)
no flags Details | Diff | Splinter Review
Approach 1 - Copy the elements to a separate array before iterating over them (2.79 KB, patch)
2016-11-30 23:30 PST, Brian Birtles (:birtles)
no flags Details | Diff | Splinter Review
Approach 1 - Copy the elements to a separate array before iterating over them (v2) (2.82 KB, patch)
2016-11-30 23:38 PST, Brian Birtles (:birtles)
no flags Details | Diff | Splinter Review
Approach 2 - Make nsTPriorityQueue use nsTObserverArray (v2) (12.71 KB, patch)
2016-11-30 23:50 PST, Brian Birtles (:birtles)
no flags Details | Diff | Splinter Review
part 1 - Copy elements from mMilestoneEntries before iterating over them (3.08 KB, patch)
2016-12-01 16:19 PST, Brian Birtles (:birtles)
dholbert: review-
Details | Diff | Splinter Review
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only (4.96 KB, patch)
2016-12-01 16:19 PST, Brian Birtles (:birtles)
dholbert: review+
Details | Diff | Splinter Review
part 1 - Copy elements from mMilestoneEntries before iterating over them (3.17 KB, patch)
2016-12-01 16:33 PST, Brian Birtles (:birtles)
dholbert: review+
jcristau: approval‑mozilla‑aurora+
jcristau: approval‑mozilla‑beta+
Details | Diff | Splinter Review
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only (4.81 KB, patch)
2016-12-01 16:33 PST, Brian Birtles (:birtles)
bbirtles: review+
Details | Diff | Splinter Review

Description User image Daniel Holbert [:dholbert] 2016-11-30 10:56:46 PST
In bug 1321066, we crashed because:
 - We're iterating across a nsTArray, using a raw pointer to its Elements(). (Actually it's a nsTPriorityQueue, but that's backed by a nsTArray.)
 - During that iteration, we add entries to the array.
 - This ends up reallocating the array's buffer.
 - So from that point on, our iteration is walking through freed memory.

For now, we're taking a band-aid fix that detects the array-modification and safely crashes at that point.  This should prevent any badness. HOWEVER: we should dig in & fix this more thoroughly, so that the testcases don't crash at all.  I'm filing this bug on that part.

The testcases in bug 1321066 should serve as testcases for this bug as well.

POSSIBLE FIX STRATEGIES:
 * Strategy #1: If we're sure that these modifications aren't supposed to be possible during iteration: we need to figure out why they're happening & how to avoid them. (Note that we have a (debug-only) assertion that indicates that we expect the array to be un-changed during iteration right now. So I think this might be feasible.)
 * Strategy #2: Change nsTPriorityQueue (or at least the one used here) to be backed by a nsTObserverArray, which *is* safe to modify during iteration. (If we follow this strategy, we could perhaps remove the release-build assertions added in bug 1321066, since the modifications caught by those assertions would become safe.)

Birtles, I'm hoping you're up for looking into this -- IIRC the SMIL milestone stuff is super-complicated, and you're the most likely to remember how it's supposed to fit together. :)  Assigning to you for now.
Comment 1 User image Daniel Holbert [:dholbert] 2016-11-30 11:01:14 PST
(I've marked this bug as security sensitive, but it should be safe to open this up once we've opened up bug 1321066. This is just a followup for that bug, and it has the same level of sensitivity.)
Comment 2 User image Daniel Holbert [:dholbert] 2016-11-30 11:03:54 PST
Testcase: https://bugzilla.mozilla.org/attachment.cgi?id=8815501
Comment 3 User image Daniel Holbert [:dholbert] 2016-11-30 11:33:49 PST
(In reply to Daniel Holbert [:dholbert] from comment #0)
> POSSIBLE FIX STRATEGIES:
>  * Strategy #1: If we're sure that these modifications aren't supposed to be
> possible during iteration: we need to figure out why they're happening & how
> to avoid them. (Note that we have a (debug-only) assertion that indicates
> that we expect the array to be un-changed during iteration right now.

This ^^ was referring to the following MOZ_ASSERT( (w/ comparison to queueLength) here:
=====
  const MilestoneEntry* p = mMilestoneEntries.Elements();
#if DEBUG
  uint32_t queueLength = mMilestoneEntries.Length();
#endif
  while (p < mMilestoneEntries.Elements() + mMilestoneEntries.Length()) {
    mozilla::dom::SVGAnimationElement* elem = p->mTimebase.get();
    elem->TimedElement().HandleContainerTimeChange();
    MOZ_ASSERT(queueLength == mMilestoneEntries.Length(),
               "Call to HandleContainerTimeChange resulted in a change to the "
               "queue of milestones");
    ++p;
=====
https://dxr.mozilla.org/mozilla-central/rev/a69583d2dbc6fdc18f63761a89cf539c356668be/dom/smil/nsSMILTimeContainer.cpp#329
Comment 4 User image Brian Birtles (:birtles) 2016-11-30 15:04:17 PST
(In reply to Daniel Holbert [:dholbert] from comment #0)
> Birtles, I'm hoping you're up for looking into this -- IIRC the SMIL
> milestone stuff is super-complicated, and you're the most likely to remember
> how it's supposed to fit together. :)  Assigning to you for now.

Absolutely! Thank you so much for all your work on the emergency fix. I saw the bug report after you'd already reviewed the patch for it and didn't have anything to add at that point. I was thinking about how to fix this properly last night but I agree that both of those approaches appear promising (although fixing nsTPriorityQueue seems safest at this stage).
Comment 5 User image Brian Birtles (:birtles) 2016-11-30 18:05:35 PST
Having looked into that assertion, I don't think we can reasonably expect the set of milestones won't change from calling nsSMILTimedElement::HandleContainerTimeChange.

I think there are at least two possible fixes:

1. Iterate over the milestones once and pull out the elements from the milestones into a separate array (containing owning references to the elements). Then iterate over the array of elements and call HandleContainerTimeChange on each one.

2. Change nsTPriorityQueue to use nsTObserverArray and expose the necessary iterators (at least EndLimitedIterator for this case).

(2) seems more robust so I'll make up a patch for that. I might make up a patch for (1) as well so we can compare (and in case it's more suitable for uplifting).
Comment 6 User image Yuhong Bao 2016-11-30 18:14:25 PST
FYI, I just posted a more real world like test case with an actual animation in bug 1321066.
Comment 7 User image Andrew McCreight [:mccr8] 2016-11-30 18:49:35 PST
(In reply to Yuhong Bao from comment #6)
> FYI, I just posted a more real world like test case with an actual animation
> in bug 1321066.

Thanks. I added a crash signature here in case it starts happening a lot in the wild.
Comment 8 User image Brian Birtles (:birtles) 2016-11-30 23:00:03 PST
Created attachment 8816010 [details] [diff] [review]
Approach 2 - Make nsTPriorityQueue use nsTObserverArray

This took a little longer than I anticipated due to bug 1321468.

The patch ended up being larger than I anticipated too so I'll have a look at approach 1 from comment 5 to see how it looks.
Comment 9 User image Brian Birtles (:birtles) 2016-11-30 23:30:42 PST
Created attachment 8816027 [details] [diff] [review]
Approach 1 - Copy the elements to a separate array before iterating over them

This is the first approach from comment 5.
Comment 11 User image Brian Birtles (:birtles) 2016-11-30 23:38:05 PST
Created attachment 8816028 [details] [diff] [review]
Approach 1 - Copy the elements to a separate array before iterating over them (v2)

Made a few tweaks to the comment.
Comment 12 User image Brian Birtles (:birtles) 2016-11-30 23:50:31 PST
Created attachment 8816042 [details] [diff] [review]
Approach 2 - Make nsTPriorityQueue use nsTObserverArray (v2)

Fix a namespace issue.
Comment 13 User image Brian Birtles (:birtles) 2016-11-30 23:53:46 PST
(In reply to Brian Birtles (:birtles) from comment #10)
> Approach 2:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=c7dabe4d8e3b82e407dfe1f990a4c08609b2ccf3

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b52f53f137c030981cee7a082e4314165590d798
Comment 14 User image Brian Birtles (:birtles) 2016-11-30 23:57:02 PST
Daniel, I've prepared two different approaches to this and currently I'm thinking we could:

* Land approach 1 on branches since it's quite simple. It keeps the
  mHoldingEntries check too just in case, but we might decide to drop that.

* Land approach 2 on trunk and split it up a little:
  - Add SwapElementsAt to nsTObserverArray
  - Port nsTPriorityQueue to nsTObserverArray and update the two call sites in
    nsSMILTimeContainer that rely on Elements() since that would no longer be
    available.
    (We could possibly add a couple more gtests to TestPriorityQueue.cpp but
    I suspect they'd just be testing nsTObserverArray functionality.)
  - Drop mHoldingEntries from nsSMILTimeContainer and add the crashtest

If you agree, I'll go ahead and split up part 2 and put it all up for review.
Comment 15 User image Brian Birtles (:birtles) 2016-11-30 23:58:00 PST
Oh, that's assuming try looks good!
Comment 16 User image Brian Birtles (:birtles) 2016-12-01 01:11:43 PST
Actually, I think approach 2 is not ultimately right. If we do hit this situation where we end up adding milestones while iterating over them, then, because we're using a priority queue, we may end up shuffling the array around and any active iterators will visit the wrong set of elements. So, while it won't crash, it won't produce the correct result either.

So, I think we should just land approach 1. Then, in a separate patch, perhaps even a separate bug and on trunk only:

- Drop mHoldingEntries
- Harden nsTPriorityQueue by basing it on nsTObserverArray and only exposing iterators from nsTObserverArray instead of Elements().
Comment 17 User image Daniel Holbert [:dholbert] 2016-12-01 14:06:58 PST
Comment on attachment 8816028 [details] [diff] [review]
Approach 1 - Copy the elements to a separate array before iterating over them (v2)

Review of attachment 8816028 [details] [diff] [review]:
-----------------------------------------------------------------

Commit message nit:
(1)  "This patch leaves mHoldingEntries is place for now"
  s/is/in/

 "we should probably remove it when/after landing."
(2) IMO, we shouldn't remove it -- we should just convert it to debug-only:
  - We should wrap all its usages with #ifdef DEBUG (not making it DebugOnly<> -- since IIRC DebugOnly takes up space as a member-variable)
  - We should convert the MOZ_RELEASE_ASSERTs to just MOZ_ASSERT.

::: dom/smil/nsSMILTimeContainer.cpp
@@ +330,4 @@
>      ++p;
>    }
> +
> +  p = nullptr;

The "p = nullptr" here is a bit awkward, since we never use "p" after this point.  I think it's an attempt at enforcing scope -- so let's just replace the "while" loop with a "for" loop, so that p is *actually* scoped without the need for this explicit nulling.

I think this should do it:
  for (const MilestoneEntry* p = mMilestoneEntries.Elements();
       p < mMilestoneEntries.Elements() + mMilestoneEntries.Length();
       ++p) {
    elems.AppendElement(p->mTimebase.get());
  }

What do you think?
Comment 18 User image Daniel Holbert [:dholbert] 2016-12-01 14:13:02 PST
(In reply to Brian Birtles (:birtles) from comment #16)
> Actually, I think approach 2 is not ultimately right. If we do hit this
> situation where we end up adding milestones while iterating over them, then,
> because we're using a priority queue, we may end up shuffling the array
> around and any active iterators will visit the wrong set of elements. So,
> while it won't crash, it won't produce the correct result either.

Agreed.

> So, I think we should just land approach 1.

Agreed (modulo review comments).

> Then, in a separate patch, perhaps even a separate bug and on trunk only:
> 
> - Drop mHoldingEntries
I think we should preserve mHoldingEntries as a debug-only thing, per comment 17.

> - Harden nsTPriorityQueue by basing it on nsTObserverArray and only exposing
> iterators from nsTObserverArray instead of Elements().

I'm not sure this is a good idea at this point.  IIUC we won't need to have nsTObserverArray under the hood anymore, for this usage at least, after we've taken Approach 1.  And I expect that the conversion to nsTObserverArray isn't free, so it doesn't make sense that we'd shake up this implementation & take a runtime penalty for no clear benefit.  (If you disagree, though, feel free to file the followup bug on this part, and we can continue this discussion over there.)
Comment 19 User image Brian Birtles (:birtles) 2016-12-01 14:45:25 PST
Yes, thinking about this overnight I came to the same conclusion that it's probably not worth fiddling with nsTPriorityQueue. I'll make the changes to Approach 1.
Comment 20 User image Brian Birtles (:birtles) 2016-12-01 16:19:41 PST
Created attachment 8816317 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

MozReview-Commit-ID: GjkXYlhoeoy
Comment 21 User image Brian Birtles (:birtles) 2016-12-01 16:19:47 PST
Created attachment 8816318 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

MozReview-Commit-ID: 2plUpGMj1FB
Comment 22 User image Daniel Holbert [:dholbert] 2016-12-01 16:28:41 PST
Comment on attachment 8816317 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

Review of attachment 8816317 [details] [diff] [review]:
-----------------------------------------------------------------

This is basically r+ with nits addressed, but I'll mark it r- and take a look at the final patch to be on the cautious side.

::: dom/smil/nsSMILTimeContainer.cpp
@@ +318,5 @@
>    // with significant transitions yet to fire should have their next milestone
>    // registered. Other timed elements don't matter.
> +
> +  // Copy the timed elements to a separate array before calling
> +  // HandleContainerTimeChange on each them in case doing so mutates

Typo -- s/each them/each of them/

@@ +329,5 @@
> +       p < mMilestoneEntries.Elements() + mMilestoneEntries.Length();
> +       ++p) {
> +    elems.AppendElement(p->mTimebase.get());
> +  }
> +  mHoldingEntries = false;

Technically, "mHoldingEntries = false" here is bogus.  You're assuming that it was false before we set it to true, but you don't actually know that.  We should really be letting |saveHolding| restore the saved value here.

I think the only way to do that is to force the AutoRestore to go out of scope at this point -- i.e. to add an extra layer of curly-braces, starting from

  nsTArray<RefPtr<mozilla::dom::SVGAnimationElement>> elems;
  // Scope for saveHolding:
  {
    AutoRestore<bool> saveHolding(mHoldingEntries);
    mHoldingEntries = true;	
    for (...) {
      ...
    }
  }
Comment 23 User image Daniel Holbert [:dholbert] 2016-12-01 16:32:32 PST
Comment on attachment 8816318 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

Review of attachment 8816318 [details] [diff] [review]:
-----------------------------------------------------------------

r=me on part 2, though it'll need to be rebased on top of the updates to "part 1" of course.

::: dom/smil/nsSMILTimeContainer.cpp
@@ +340,2 @@
>    mHoldingEntries = false;
> +#endif

(This "mHoldingEntries = false;" assignment won't exist anymore, per my previous comment, so this chunk of the patch won't be needed.)
Comment 24 User image Brian Birtles (:birtles) 2016-12-01 16:33:36 PST
Created attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

MozReview-Commit-ID: GjkXYlhoeoy
Comment 25 User image Brian Birtles (:birtles) 2016-12-01 16:33:43 PST
Created attachment 8816326 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

MozReview-Commit-ID: 2plUpGMj1FB
Comment 26 User image Brian Birtles (:birtles) 2016-12-01 16:34:23 PST
Comment on attachment 8816326 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

Carry forward r+ from comment 23.
Comment 27 User image Daniel Holbert [:dholbert] 2016-12-01 16:35:19 PST
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

Review of attachment 8816325 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks - this looks good! r=me
Comment 29 User image Pulsebot 2016-12-01 18:27:36 PST
Pushed by bbirtles@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8c1e9210ea85
part 1 - Copy elements from mMilestoneEntries before iterating over them; r=dholbert
https://hg.mozilla.org/integration/mozilla-inbound/rev/7004ffd040dc
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only; r=dholbert
Comment 31 User image Brian Birtles (:birtles) 2016-12-02 04:53:40 PST
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

Approval Request Comment
[Feature/Bug causing the regression]: bug 474743
[User impact if declined]: (Safe) crash with some content
[Is this code covered by automated tests?]: Yes (included in this patch)
[Has the fix been verified in Nightly?]: It has landed in central, but not shipped in Nightly yet
[Needs manual test from QE? If yes, steps to reproduce]: Who is OE? In any case, I don't think so since the patch includes an automated test. If someone wants to verify, the test steps from bug 1321066 should apply although in this case we don't expect a crash.
[List of other uplifts needed for the feature/fix]: Just this patch is enough (part 2 is optional)
[Is the change risky?]: Not particularly.
[Why is the change risky/not risky?]: It keeps the safe-guard checks introduced in bug 1321066.
[String changes made/needed]: None
Comment 32 User image Cameron Kaiser [:spectre] 2016-12-02 08:14:29 PST
What about 45ESR, since the (safe) crash is still there?
Comment 33 User image Daniel Holbert [:dholbert] 2016-12-02 08:41:25 PST
(In reply to Cameron Kaiser [:spectre] from comment #32)
> What about 45ESR, since the (safe) crash is still there?

This is just a stability fix at this point -- i.e. a fix fora safe crash (and one that's unlikely to be hit by normal web content, since SMIL is so rarely used, and syncbase-timing-with-scripting is even more rarely used).

And the ESR FAQ page says:
> Maintenance of each ESR, through point releases, is limited to
> high-risk/high-impact security vulnerabilities [...].
> Backports of any functional enhancements and/or stability fixes
> are not in scope.
https://www.mozilla.org/en-US/firefox/organizations/faq/

So I think that makes this out-of-scope for ESR.
Comment 34 User image Cameron Kaiser [:spectre] 2016-12-02 09:51:28 PST
I'll backport for TenFourFox separately.
Comment 35 User image Julien Cristau [:jcristau] 2016-12-05 14:00:18 PST
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

avoid crashing with recent sec fix, take in aurora52
Comment 36 User image Carsten Book [:Tomcat] 2016-12-06 05:57:21 PST
needs rebasing for aurora
Comment 37 User image Daniel Holbert [:dholbert] 2016-12-06 14:33:22 PST
Yup, just needs rebasing for the crashtest.list insertion.

I've got an aurora clone handy -- I'll rebase & land.
Comment 39 User image Julien Cristau [:jcristau] 2016-12-08 15:50:50 PST
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

let's fix this crash in beta51
Comment 40 User image Daniel Holbert [:dholbert] 2016-12-08 16:20:40 PST
Rebased to beta for crashtests.list contextual differences, and landed on beta:
https://hg.mozilla.org/releases/mozilla-beta/rev/181d381af5dfff2cafcd816be6167a1897d963a0

Note You need to log in before you can comment on or make changes to this bug.