The default bug view has changed. See this FAQ.

Fix crash in bug 1321066 more thoroughly (so it doesn't crash at all)

RESOLVED FIXED in Firefox 51

Status

()

Core
SVG
RESOLVED FIXED
4 months ago
2 months ago

People

(Reporter: dholbert, Assigned: birtles)

Tracking

({crash, sec-other, testcase})

Trunk
mozilla53
crash, sec-other, testcase
Points:
---
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(firefox50 wontfix, firefox51 fixed, firefox52 fixed, firefox53 fixed, firefox-esr45 wontfix)

Details

(Whiteboard: [adv-main51-], crash signature, URL)

Attachments

(2 attachments, 6 obsolete attachments)

(Reporter)

Description

4 months ago
In bug 1321066, we crashed because:
 - We're iterating across a nsTArray, using a raw pointer to its Elements(). (Actually it's a nsTPriorityQueue, but that's backed by a nsTArray.)
 - During that iteration, we add entries to the array.
 - This ends up reallocating the array's buffer.
 - So from that point on, our iteration is walking through freed memory.

For now, we're taking a band-aid fix that detects the array-modification and safely crashes at that point.  This should prevent any badness. HOWEVER: we should dig in & fix this more thoroughly, so that the testcases don't crash at all.  I'm filing this bug on that part.

The testcases in bug 1321066 should serve as testcases for this bug as well.

POSSIBLE FIX STRATEGIES:
 * Strategy #1: If we're sure that these modifications aren't supposed to be possible during iteration: we need to figure out why they're happening & how to avoid them. (Note that we have a (debug-only) assertion that indicates that we expect the array to be un-changed during iteration right now. So I think this might be feasible.)
 * Strategy #2: Change nsTPriorityQueue (or at least the one used here) to be backed by a nsTObserverArray, which *is* safe to modify during iteration. (If we follow this strategy, we could perhaps remove the release-build assertions added in bug 1321066, since the modifications caught by those assertions would become safe.)

Birtles, I'm hoping you're up for looking into this -- IIRC the SMIL milestone stuff is super-complicated, and you're the most likely to remember how it's supposed to fit together. :)  Assigning to you for now.
(Reporter)

Comment 1

4 months ago
(I've marked this bug as security sensitive, but it should be safe to open this up once we've opened up bug 1321066. This is just a followup for that bug, and it has the same level of sensitivity.)
(Reporter)

Comment 2

4 months ago
Testcase: https://bugzilla.mozilla.org/attachment.cgi?id=8815501
Keywords: crash, testcase
OS: Unspecified → All
Hardware: Unspecified → All
Keywords: sec-other
(Reporter)

Comment 3

4 months ago
(In reply to Daniel Holbert [:dholbert] from comment #0)
> POSSIBLE FIX STRATEGIES:
>  * Strategy #1: If we're sure that these modifications aren't supposed to be
> possible during iteration: we need to figure out why they're happening & how
> to avoid them. (Note that we have a (debug-only) assertion that indicates
> that we expect the array to be un-changed during iteration right now.

This ^^ was referring to the following MOZ_ASSERT( (w/ comparison to queueLength) here:
=====
  const MilestoneEntry* p = mMilestoneEntries.Elements();
#if DEBUG
  uint32_t queueLength = mMilestoneEntries.Length();
#endif
  while (p < mMilestoneEntries.Elements() + mMilestoneEntries.Length()) {
    mozilla::dom::SVGAnimationElement* elem = p->mTimebase.get();
    elem->TimedElement().HandleContainerTimeChange();
    MOZ_ASSERT(queueLength == mMilestoneEntries.Length(),
               "Call to HandleContainerTimeChange resulted in a change to the "
               "queue of milestones");
    ++p;
=====
https://dxr.mozilla.org/mozilla-central/rev/a69583d2dbc6fdc18f63761a89cf539c356668be/dom/smil/nsSMILTimeContainer.cpp#329
(Assignee)

Comment 4

4 months ago
(In reply to Daniel Holbert [:dholbert] from comment #0)
> Birtles, I'm hoping you're up for looking into this -- IIRC the SMIL
> milestone stuff is super-complicated, and you're the most likely to remember
> how it's supposed to fit together. :)  Assigning to you for now.

Absolutely! Thank you so much for all your work on the emergency fix. I saw the bug report after you'd already reviewed the patch for it and didn't have anything to add at that point. I was thinking about how to fix this properly last night but I agree that both of those approaches appear promising (although fixing nsTPriorityQueue seems safest at this stage).
Group: layout-core-security
(Assignee)

Comment 5

4 months ago
Having looked into that assertion, I don't think we can reasonably expect the set of milestones won't change from calling nsSMILTimedElement::HandleContainerTimeChange.

I think there are at least two possible fixes:

1. Iterate over the milestones once and pull out the elements from the milestones into a separate array (containing owning references to the elements). Then iterate over the array of elements and call HandleContainerTimeChange on each one.

2. Change nsTPriorityQueue to use nsTObserverArray and expose the necessary iterators (at least EndLimitedIterator for this case).

(2) seems more robust so I'll make up a patch for that. I might make up a patch for (1) as well so we can compare (and in case it's more suitable for uplifting).
Flags: in-testsuite?

Comment 6

4 months ago
FYI, I just posted a more real world like test case with an actual animation in bug 1321066.
(In reply to Yuhong Bao from comment #6)
> FYI, I just posted a more real world like test case with an actual animation
> in bug 1321066.

Thanks. I added a crash signature here in case it starts happening a lot in the wild.
Crash Signature: [@ nsSMILTimeContainer::AddMilestone ]
status-firefox50: --- → affected
status-firefox51: --- → affected
status-firefox52: --- → affected
status-firefox-esr45: --- → affected
(Assignee)

Comment 8

4 months ago
Created attachment 8816010 [details] [diff] [review]
Approach 2 - Make nsTPriorityQueue use nsTObserverArray

This took a little longer than I anticipated due to bug 1321468.

The patch ended up being larger than I anticipated too so I'll have a look at approach 1 from comment 5 to see how it looks.
(Assignee)

Comment 9

4 months ago
Created attachment 8816027 [details] [diff] [review]
Approach 1 - Copy the elements to a separate array before iterating over them

This is the first approach from comment 5.
(Assignee)

Comment 10

4 months ago
Approach 1:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=12cc9aa3376571c92bf7f77556815802e45eac0c

Approach 2:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c7dabe4d8e3b82e407dfe1f990a4c08609b2ccf3
(Assignee)

Comment 11

4 months ago
Created attachment 8816028 [details] [diff] [review]
Approach 1 - Copy the elements to a separate array before iterating over them (v2)

Made a few tweaks to the comment.
Attachment #8816027 - Attachment is obsolete: true
(Assignee)

Comment 12

4 months ago
Created attachment 8816042 [details] [diff] [review]
Approach 2 - Make nsTPriorityQueue use nsTObserverArray (v2)

Fix a namespace issue.
Attachment #8816010 - Attachment is obsolete: true
(Assignee)

Comment 13

4 months ago
(In reply to Brian Birtles (:birtles) from comment #10)
> Approach 2:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=c7dabe4d8e3b82e407dfe1f990a4c08609b2ccf3

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b52f53f137c030981cee7a082e4314165590d798
(Assignee)

Comment 14

4 months ago
Daniel, I've prepared two different approaches to this and currently I'm thinking we could:

* Land approach 1 on branches since it's quite simple. It keeps the
  mHoldingEntries check too just in case, but we might decide to drop that.

* Land approach 2 on trunk and split it up a little:
  - Add SwapElementsAt to nsTObserverArray
  - Port nsTPriorityQueue to nsTObserverArray and update the two call sites in
    nsSMILTimeContainer that rely on Elements() since that would no longer be
    available.
    (We could possibly add a couple more gtests to TestPriorityQueue.cpp but
    I suspect they'd just be testing nsTObserverArray functionality.)
  - Drop mHoldingEntries from nsSMILTimeContainer and add the crashtest

If you agree, I'll go ahead and split up part 2 and put it all up for review.
Flags: needinfo?(dholbert)
(Assignee)

Comment 15

4 months ago
Oh, that's assuming try looks good!
(Assignee)

Comment 16

4 months ago
Actually, I think approach 2 is not ultimately right. If we do hit this situation where we end up adding milestones while iterating over them, then, because we're using a priority queue, we may end up shuffling the array around and any active iterators will visit the wrong set of elements. So, while it won't crash, it won't produce the correct result either.

So, I think we should just land approach 1. Then, in a separate patch, perhaps even a separate bug and on trunk only:

- Drop mHoldingEntries
- Harden nsTPriorityQueue by basing it on nsTObserverArray and only exposing iterators from nsTObserverArray instead of Elements().
(Assignee)

Updated

4 months ago
Flags: needinfo?(dholbert)
Attachment #8816028 - Flags: review?(dholbert)
(Reporter)

Comment 17

4 months ago
Comment on attachment 8816028 [details] [diff] [review]
Approach 1 - Copy the elements to a separate array before iterating over them (v2)

Review of attachment 8816028 [details] [diff] [review]:
-----------------------------------------------------------------

Commit message nit:
(1)  "This patch leaves mHoldingEntries is place for now"
  s/is/in/

 "we should probably remove it when/after landing."
(2) IMO, we shouldn't remove it -- we should just convert it to debug-only:
  - We should wrap all its usages with #ifdef DEBUG (not making it DebugOnly<> -- since IIRC DebugOnly takes up space as a member-variable)
  - We should convert the MOZ_RELEASE_ASSERTs to just MOZ_ASSERT.

::: dom/smil/nsSMILTimeContainer.cpp
@@ +330,4 @@
>      ++p;
>    }
> +
> +  p = nullptr;

The "p = nullptr" here is a bit awkward, since we never use "p" after this point.  I think it's an attempt at enforcing scope -- so let's just replace the "while" loop with a "for" loop, so that p is *actually* scoped without the need for this explicit nulling.

I think this should do it:
  for (const MilestoneEntry* p = mMilestoneEntries.Elements();
       p < mMilestoneEntries.Elements() + mMilestoneEntries.Length();
       ++p) {
    elems.AppendElement(p->mTimebase.get());
  }

What do you think?
(Reporter)

Comment 18

4 months ago
(In reply to Brian Birtles (:birtles) from comment #16)
> Actually, I think approach 2 is not ultimately right. If we do hit this
> situation where we end up adding milestones while iterating over them, then,
> because we're using a priority queue, we may end up shuffling the array
> around and any active iterators will visit the wrong set of elements. So,
> while it won't crash, it won't produce the correct result either.

Agreed.

> So, I think we should just land approach 1.

Agreed (modulo review comments).

> Then, in a separate patch, perhaps even a separate bug and on trunk only:
> 
> - Drop mHoldingEntries
I think we should preserve mHoldingEntries as a debug-only thing, per comment 17.

> - Harden nsTPriorityQueue by basing it on nsTObserverArray and only exposing
> iterators from nsTObserverArray instead of Elements().

I'm not sure this is a good idea at this point.  IIUC we won't need to have nsTObserverArray under the hood anymore, for this usage at least, after we've taken Approach 1.  And I expect that the conversion to nsTObserverArray isn't free, so it doesn't make sense that we'd shake up this implementation & take a runtime penalty for no clear benefit.  (If you disagree, though, feel free to file the followup bug on this part, and we can continue this discussion over there.)
(Reporter)

Updated

4 months ago
Flags: needinfo?(bbirtles)
(Assignee)

Comment 19

4 months ago
Yes, thinking about this overnight I came to the same conclusion that it's probably not worth fiddling with nsTPriorityQueue. I'll make the changes to Approach 1.
Flags: needinfo?(bbirtles)
(Assignee)

Comment 20

4 months ago
Created attachment 8816317 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

MozReview-Commit-ID: GjkXYlhoeoy
Attachment #8816317 - Flags: review?(dholbert)
(Assignee)

Updated

4 months ago
Attachment #8816028 - Attachment is obsolete: true
Attachment #8816028 - Flags: review?(dholbert)
(Assignee)

Comment 21

4 months ago
Created attachment 8816318 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

MozReview-Commit-ID: 2plUpGMj1FB
Attachment #8816318 - Flags: review?(dholbert)
(Assignee)

Updated

4 months ago
Attachment #8816042 - Attachment is obsolete: true
(Reporter)

Comment 22

4 months ago
Comment on attachment 8816317 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

Review of attachment 8816317 [details] [diff] [review]:
-----------------------------------------------------------------

This is basically r+ with nits addressed, but I'll mark it r- and take a look at the final patch to be on the cautious side.

::: dom/smil/nsSMILTimeContainer.cpp
@@ +318,5 @@
>    // with significant transitions yet to fire should have their next milestone
>    // registered. Other timed elements don't matter.
> +
> +  // Copy the timed elements to a separate array before calling
> +  // HandleContainerTimeChange on each them in case doing so mutates

Typo -- s/each them/each of them/

@@ +329,5 @@
> +       p < mMilestoneEntries.Elements() + mMilestoneEntries.Length();
> +       ++p) {
> +    elems.AppendElement(p->mTimebase.get());
> +  }
> +  mHoldingEntries = false;

Technically, "mHoldingEntries = false" here is bogus.  You're assuming that it was false before we set it to true, but you don't actually know that.  We should really be letting |saveHolding| restore the saved value here.

I think the only way to do that is to force the AutoRestore to go out of scope at this point -- i.e. to add an extra layer of curly-braces, starting from

  nsTArray<RefPtr<mozilla::dom::SVGAnimationElement>> elems;
  // Scope for saveHolding:
  {
    AutoRestore<bool> saveHolding(mHoldingEntries);
    mHoldingEntries = true;	
    for (...) {
      ...
    }
  }
Attachment #8816317 - Flags: review?(dholbert) → review-
(Reporter)

Comment 23

4 months ago
Comment on attachment 8816318 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

Review of attachment 8816318 [details] [diff] [review]:
-----------------------------------------------------------------

r=me on part 2, though it'll need to be rebased on top of the updates to "part 1" of course.

::: dom/smil/nsSMILTimeContainer.cpp
@@ +340,2 @@
>    mHoldingEntries = false;
> +#endif

(This "mHoldingEntries = false;" assignment won't exist anymore, per my previous comment, so this chunk of the patch won't be needed.)
Attachment #8816318 - Flags: review?(dholbert) → review+
(Assignee)

Comment 24

4 months ago
Created attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

MozReview-Commit-ID: GjkXYlhoeoy
Attachment #8816325 - Flags: review?(dholbert)
(Assignee)

Updated

4 months ago
Attachment #8816317 - Attachment is obsolete: true
(Assignee)

Comment 25

4 months ago
Created attachment 8816326 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

MozReview-Commit-ID: 2plUpGMj1FB
Attachment #8816326 - Flags: review?(dholbert)
(Assignee)

Updated

4 months ago
Attachment #8816318 - Attachment is obsolete: true
(Assignee)

Comment 26

4 months ago
Comment on attachment 8816326 [details] [diff] [review]
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only

Carry forward r+ from comment 23.
Attachment #8816326 - Flags: review?(dholbert) → review+
(Reporter)

Comment 27

4 months ago
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

Review of attachment 8816325 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks - this looks good! r=me
Attachment #8816325 - Flags: review?(dholbert) → review+
(Assignee)

Comment 28

4 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=599c2055807801536c5f008b28d91e0a6f9fadbb

Comment 29

4 months ago
Pushed by bbirtles@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8c1e9210ea85
part 1 - Copy elements from mMilestoneEntries before iterating over them; r=dholbert
https://hg.mozilla.org/integration/mozilla-inbound/rev/7004ffd040dc
part 2 - Make nsSMILTimeContainer::mHoldingEntries debug-only; r=dholbert

Comment 30

4 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/8c1e9210ea85
https://hg.mozilla.org/mozilla-central/rev/7004ffd040dc
Status: NEW → RESOLVED
Last Resolved: 4 months ago
status-firefox53: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
(Assignee)

Comment 31

4 months ago
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

Approval Request Comment
[Feature/Bug causing the regression]: bug 474743
[User impact if declined]: (Safe) crash with some content
[Is this code covered by automated tests?]: Yes (included in this patch)
[Has the fix been verified in Nightly?]: It has landed in central, but not shipped in Nightly yet
[Needs manual test from QE? If yes, steps to reproduce]: Who is OE? In any case, I don't think so since the patch includes an automated test. If someone wants to verify, the test steps from bug 1321066 should apply although in this case we don't expect a crash.
[List of other uplifts needed for the feature/fix]: Just this patch is enough (part 2 is optional)
[Is the change risky?]: Not particularly.
[Why is the change risky/not risky?]: It keeps the safe-guard checks introduced in bug 1321066.
[String changes made/needed]: None
Attachment #8816325 - Flags: approval-mozilla-beta?
Attachment #8816325 - Flags: approval-mozilla-aurora?
What about 45ESR, since the (safe) crash is still there?
(Reporter)

Comment 33

4 months ago
(In reply to Cameron Kaiser [:spectre] from comment #32)
> What about 45ESR, since the (safe) crash is still there?

This is just a stability fix at this point -- i.e. a fix fora safe crash (and one that's unlikely to be hit by normal web content, since SMIL is so rarely used, and syncbase-timing-with-scripting is even more rarely used).

And the ESR FAQ page says:
> Maintenance of each ESR, through point releases, is limited to
> high-risk/high-impact security vulnerabilities [...].
> Backports of any functional enhancements and/or stability fixes
> are not in scope.
https://www.mozilla.org/en-US/firefox/organizations/faq/

So I think that makes this out-of-scope for ESR.
status-firefox-esr45: affected → wontfix
I'll backport for TenFourFox separately.
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

avoid crashing with recent sec fix, take in aurora52
Attachment #8816325 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
needs rebasing for aurora
Flags: needinfo?(bbirtles)
(Reporter)

Comment 37

4 months ago
Yup, just needs rebasing for the crashtest.list insertion.

I've got an aurora clone handy -- I'll rebase & land.
(Reporter)

Comment 38

4 months ago
https://hg.mozilla.org/releases/mozilla-aurora/rev/6c2cb17c6db341f49599522c4fb1350b7c37b9e0
status-firefox52: affected → fixed
Flags: needinfo?(bbirtles)
Comment on attachment 8816325 [details] [diff] [review]
part 1 - Copy elements from mMilestoneEntries before iterating over them

let's fix this crash in beta51
Attachment #8816325 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(Reporter)

Comment 40

4 months ago
Rebased to beta for crashtests.list contextual differences, and landed on beta:
https://hg.mozilla.org/releases/mozilla-beta/rev/181d381af5dfff2cafcd816be6167a1897d963a0
status-firefox51: affected → fixed
status-firefox50: affected → wontfix
Whiteboard: [adv-main50.1-]
Whiteboard: [adv-main50.1-] → [adv-main51-]
You need to log in before you can comment on or make changes to this bug.