Closed Bug 572938 Opened 14 years ago Closed 14 years ago

SMIL animation of "display" on <use> triggers infinite recursion & ASSERTION: Registering content during sample.: '!mRunningSample', ASSERTION: Unregistering content during sample.: '!mRunningSample'

Categories

(Core :: SVG, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla2.0b8
Tracking Status
blocking2.0 --- final+

People

(Reporter: wgianopoulos, Assigned: birtles)

References

Details

(Keywords: assertion, crash, hang)

Attachments

(4 files, 3 obsolete files)

There is a pending patch on bug 558663 that has the added benefit of fixing <use> so that it actually works.

In builds with that patch applied this testcase:

https://bugzilla.mozilla.org/attachment.cgi?id=418898

from bug 536454 loops asserting "Resample dirty flag set during sample!" until it crashes.

Bz asked me on IRC to file this bug and assign it to you to get an opinion on if this is an underlying SVG/SMIL error or if there is something wrong with the patch in bug 536454 that is causing this problem.

I have tried the following workaround, that avoids the loop and the crash under. Linux.  Unfortunately, it does not prevent the crash under Windows.

diff --git a/content/smil/nsSMILAnimationController.cpp b/content/smil/nsSMILAnimationController.cpp
--- a/content/smil/nsSMILAnimationController.cpp
+++ b/content/smil/nsSMILAnimationController.cpp
@@ -395,16 +395,17 @@ nsSMILAnimationController::DoSample(PRBo
   // when the inherited value is *also* being animated, we really should be
   // traversing our animated nodes in an ancestors-first order (bug 501183)
   currentCompositorTable->EnumerateEntries(DoComposeAttribute, nsnull);
 
   // Update last compositor table
   mLastCompositorTable = currentCompositorTable.forget();
 
   NS_ASSERTION(!mResampleNeeded, "Resample dirty flag set during sample!");
+  mResampleNeeded = PR_FALSE;
 }
 
 void
 nsSMILAnimationController::DoMilestoneSamples()
 {
   // We need to sample the timing model but because SMIL operates independently
   // of the frame-rate, we can get one sample at t=0s and the next at t=10min.
   //
Assignee: nobody → dholbert
Status: NEW → ASSIGNED
Attached patch WorkaroundSplinter Review
This avoids the loop.
The bug 558663 patch has now landed, so this loop occurs in current trunk builds.
Sorry for not replying on this earlier.  The patch seems reasonable as a workaround for this sort of thing, but I'd be happier if we could figure out why we're asking for a new sample during a sample, and fix that.

Looking at this in GDB right now...
Here's how mResampleNeeded is getting enabled during the sample:

So during a SMIL sample, we query the computed value of the 'display' property, as the base value for animation, and that query involves a call to shell->FlushPendingNotifications(Flush_Style).  While flushing, we see that the 'display' property has changed since we last saw it, so we recreate frames for the <text> element, and rebuild its anonymous cloned content.  This involves creating a new clone of the <animate> node and calling BindToTree to hook it into our document -- and nsSVGAnimationElement::BindToTree calls 'AnimationNeedsResample' which sets mResampleNeeded to PR_TRUE.
Kind of why I called this a workaround.  I was wondering if the issue was related to the comment prior to the assertion that imply we are not really processing things in the correct order, referencing bug 501183.
Comment 4 is about the *anonymous* <text> element from the start, I think.  Notably, it hits the chunk of code added in bug 558663, and makes that call to RecreateFramesForContent (with 'parent' being a nsSVGUseFrame).

> I was wondering if the issue was
> related to the comment prior to the assertion that imply we are not really
> processing things in the correct order, referencing bug 501183.

Yeah, the iteration order could be relevant here -- we probably end up processing the "original" content first, and that triggers a restyle request on the anonymous version, and that's what we end up hitting in comment 4, later in the sample.

We could work around that by always processing the anonymous content *first* during a SMIL sample, but then the anonymous content would lag one sample behind its <use>-target, which is undesirable.
Perhaps nsCSSFrameConstructor::RecreateFramesForContent could just bail out if we're mid-sample? Trying a patch that does that.
Attached patch fix? (no)Splinter Review
This patch adds a way to check whether we're mid-sample -- and when that flag is set, this patch has us bail out if we try to rebuild our parent's subtree in RecreateFramesForContent.

This makes the anonymous content display forever -- it doesn't ever blink off.  Not ideal, but a reasonable workaround for now, I think... (restores behavior similar to what I originally described in Bug 536454 -- though now 'display' would be perma-on instead of perma-off).
Attachment #457728 - Flags: review?(bzbarsky)
Would another option be to move the resample request to a scriptrunner instead of doing directly in BindToTree?
Hmm...  I think that would break this situation:
 (1) created an <animate> node
 (2) appended it to an element
 (3) checked the value of the animated attribute on that element

Right now, step 2 there triggers the 'needs resample' flag, and step 3 notices that the flag is set & forces a synchronous sample to get the up-to-date animated value.  But if we moved the flag-setting to a scriptrunner, then it'd happen later on, and step 3 would see the flag as being false & woudln't know that it needs to force a synchronous sample.
... though I suppose we could create a scriptrunner in BindToTree **only if** we're in the middle of a sample at that point...
> But if we moved the flag-setting to a scriptrunner, then it'd happen later on

Hold on.  Are the three steps in comment 10 happening in user script?  If so, the scriptrunner would run before step 2 returns control to the user script....
That said, it's not clear to me whether we set up a scriptblocker across the sample.  I guess we can't possibly do it, since we're flushing.

More importantly, now that I read through this again, asserting anything about state after a flush is not workable.  A style flush can run arbitrary script, including spinning the event loop.  So while the attached patch fixes one codepath that can lead to insertion of <animate> elements during a layout flush, it's not that hard to write testcases that trigger similar issues which the patch doesn't catch.
(In reply to comment #12)
> Hold on.  Are the three steps in comment 10 happening in user script?  If so,
> the scriptrunner would run before step 2 returns control to the user script....

Ah, right -- sorry, I misunderstood how script-runners worked.  It sounds like they won't help us unless we have a scriptblocker in place, which we don't.

Perhaps we could do an initial style-flush at the beginning of each sample, and then put a scriptblocker in place for the rest of the sample (and use a scriptrunner inside of nsSVGElement::AnimationNeedsResample, to prevent us from setting the resample-needed flag until we're done with our sample).  That would make animations that build on top of inherited animated values "lag behind" by one sample (as they already do randomly, per bug 501183), but that's not too bad.
(In reply to comment #14)
> Perhaps we could do an initial style-flush at the beginning of each sample, and
> then put a scriptblocker in place for the rest of the sample

Nevermind -- we can't have a scriptblocker in place during a sample, because that would break nsSMILCSSProperty::GetBaseValue().  This method
 (a) temporarily clears the SMILOverrideStyle
 (b) check the computed style
 (c) restores the SMILOverrideStyle
If we've got a script-blocker in place, then part (b) will end up returning an outdated computed style. (with the SMILOverrideStyle applied, which we don't want)
Comment on attachment 457728 [details] [diff] [review]
fix?  (no)

Does this still need review?
Comment on attachment 457728 [details] [diff] [review]
fix?  (no)

Sorry -- no, that patch is old. Meant to cancel review a while back.

Still intending to work on this; just postponing it until after my new-feature work.
Attachment #457728 - Flags: review?(bzbarsky)
I'd like for this to block betaN, as this is a new crash in Firefox 4.0.
blocking2.0: --- → ?
blocking2.0: betaN+ → final+
This now triggers many many copies of this assertion (until we crash):
###!!! ASSERTION: Registering content during sample.: '!mRunningSample', file ../../../mozilla/content/smil/nsSMILAnimationController.cpp, line 180
###!!! ASSERTION: Unregistering content during sample.: '!mRunningSample', file ../../../mozilla/content/smil/nsSMILAnimationController.cpp, line 196
Keywords: assertion, crash
OS: Linux → All
Hardware: x86 → All
Summary: Loop asserting "Resample dirty flag set during sample!" until crash → SMIL animation of "display" on <use> triggers ininfinite recursion & ASSERTION: Registering content during sample.: '!mRunningSample', ASSERTION: Unregistering content during sample.: '!mRunningSample'
Attachment #457728 - Attachment description: fix? → fix? (no)
Brian, feel free to steal this bug from me, if you get a chance to work on it. :)
Attached image jesse's testcase
I made this reduced testcase before realizing there was already a bug on file.
Keywords: hang
Blocks: 602880
Bug 602880 has been assigned to me and I suspect it has the same root cause as this bug so I'm currently looking into this.

It seems ideally we'd like to avoid creating and destroying content within a sample.

However, that's difficult to avoid because of the way we fetch CSS base values which involves temporarily clearing the SMIL override style, getting the base value, and then restoring the override style. That process could potentially trigger a resample, as well as registering new animations or unregistering old ones.

That's bad news because running a resample while we're already in a sample can send us into infinite recursion. Adding / removing animations whilst we're in a sample is also bad because it means altering the hashtable we're currently enumerating.

Some possible steps that have come to mind so far:

A) Disable flushing with a script blocker of some sort (e.g. mozAutoDocUpdate) -- doesn't work because then we'll get an out-of-date base value as Daniel has noted in comment 15 (and my tests confirm :).

B) Disable recreating frames within a sample as per attachment 457728 [details] [diff] [review] but as Boris has noted in comment 13, so long as we're flushing styles, content could still be created or destroyed via a number of other channels.

  Question: Is this still the case? We now flush styles before a sample so that only changes that will get flushed will be those resulting from clearing and restoring the SMIL override style--is it still possible that these changes could trigger scripts, events etc.?

C) Ignore all resample requests so long as we're in a sample. While this is easy, it might be overkill and stop us from detecting legitimate bugs. The only time we're actually expecting to get such resample requests within a sample is if new content is added or removed so that's the real issue. In all other cases, sampling shouldn't be generating resample requests (and it would be nice if we can detect if that's ever violated).

D) Ignore resample requests only whilst we're getting a CSS property base value. This is a bit hacky, but basically we pass a pointer to the animation controller to nsSMILCSSProperty and when it goes to fetch the base value it tells the animation controller to just ignore resample requests at this point until it's done getting the base value.

E) Postpone (or ignore) animation (un)registrations whilst within a sample. If we can assume that the process of clearing and restoring the override style will put us back in the original state in terms of registered animations then we could possibly just ignore animation registrations and unregistrations all together whilst we're fetching a base value. I'm not sure that that's a valid assumption though since perhaps we'll unregister and destroy one animation and create a new one in its place and register that.

So perhaps we'd need two tables inside the animation controller: animations to be added, animations to be removed (stores a raw ptr which is never dereferenced). We'd try to pair them up so if the same animation was both added and removed, or vice versa, we'd do nothing. Any remaining changes would be processed after the sample completed (i.e. after we've finished enumerating over the animation hashtable). (And then resample? I think not?)

As with C/D, we'd probably want to distinguish between attempts to (un)register animations whilst fetching the CSS base value (which we know can sometimes happen) and at any other point in the sample (which we're not expecting to happen).

F) Change the was nsSMILCSSProperty::GetBaseValue works -- no idea if this is possible.

They're some ideas for my own record as much as anything. I'll try implementing some combination of B, D, and E. (B might not be strictly necessary but seems like it would be more efficient if we did this? Otherwise do we end up recreating use content on every sample where display is animated?)
Thanks for the summary.

I suspect that animations that change CSS 'display' values are pretty rare, espcially those that affect SVG <use> (since SVG doesn't really support 'display'), so I don't think performance matters for the cases that would recreate frames during GetBaseValue.
(In reply to comment #24)
>   Question: Is this still the case? We now flush styles before a sample so that
> only changes that will get flushed will be those resulting from clearing and
> restoring the SMIL override style--is it still possible that these changes
> could trigger scripts, events etc.?

I just talked to bz on IRC about this -- the answer is "yes, as long as we might be flushing changes to 'display'".  Changes to display can cause XBL binding attachment, which will run the XBL ctor, which can do a sync XHR, which will spin the event loop.

So basically, we're potentially hosed whenever we flush a "display" change during a sample.

Perhaps the simplest fix to this would be to special-case animations on the "display" property so that we follow two rules:
 (1) Never actually look up the "display" base value.
   - Perhaps we could have nsSMILCSSProperty::GetBaseValue return a "dummy" nsSMILValue in this case, and if we still have that dummy nsSMILValue after compositing, we'd take that to mean "base value!", and we'd just request that our animated value be cleared. (see below)

 (2) Don't make our final modification to the "display" animated value until our sample has completed. (whether we're setting it or clearing it)
   - Here, "our sample has completed" just means "we've stopped doing operations that flush style".
   - We could perhaps pass down some reference to a queue of nsRunnables that get dispatched at the end of our sample, to do the work of clearing/setting any "display" values that need tweaking.

I think that should put us in a better position than we're in right now...  And since it's special-cased to the "display" property, it might be less disruptive than the other solutions that Brian suggested in Comment 24.
I should have been clearer.  The worry is anything that can cause things to get a frame when they didn't have one before.  That can include changes to display, float, and position, anything that affect svg switch statements (can you trigger changes to those?), and some other stuff that I'm pretty sure that SMIL can't affect...
Oh, and realistically also any changes that might trigger content policies (e.g. background images and whatnot).... Those can run script themselves and can trigger XBL.  :(
But maybe that shouldn't be a worry, since we run those at "unsafe" times already.
/me gladly reassigns this to birtles, since he's actively working on this. :)  (Thanks Brian!)
Assignee: dholbert → birtles
Attached patch WIP patch v1 (obsolete) — Splinter Review
A first pass at implementing the first part of dholbert's suggestion in comment 26 as discussed today on IRC.
Needs a bit more thought and testing still.
Attached patch Patch v1a (obsolete) — Splinter Review
Here's an attempt at fixing this. In summary it makes the following changes:
* Ignores resample requests when we're already in the middle of a sample
* Forcibly clears the resample flag at the end of a sample to avoid some cases of infinite looping inside nsPresContext
* Skips setting and unsetting the display property to get the base value -- this is really just an optimisation to avoid recreating anonymous content on every sample and not strictly necessary.

There are four test cases included here:
#1: The test case linked to in the description
#2: Jesse's test case, attachment 475735 [details]
#3: A test verifying that xlink animation doesn't cause similar problems (as discussed with dholbert)
#4: A reduced version of the offending SVG file in bug 602880

If this approach is accepted then I will mark bug 602880 as a dupe since this patch, in its current form, fixes that bug as well.

Boris and others, I haven't requested review since I imagine you've already got enough in your queues at the moment, but please feel free to comment.
Attachment #485218 - Attachment is obsolete: true
Attachment #485943 - Flags: review?(dholbert)
Attached patch Patch v1b (obsolete) — Splinter Review
This patch incorporates dholbert's feedback on IRC to simplify the patch and remove some hackiness.
Attachment #485943 - Attachment is obsolete: true
Attachment #485954 - Flags: review?(dholbert)
Attachment #485943 - Flags: review?(dholbert)
Comment on attachment 485954 [details] [diff] [review]
Patch v1b

>+++ b/content/smil/crashtests/572938-1.svg
[...]
>+    <animate attributeName="display" to="none" dur="1s" begin="0s"
>+      repeatCount="indefinite"/>

Just from glancing at this, this animation might not cause cause problems for the first 500ms, and we might move on to the next crashtest before that time has passed.

Could you change this to be from="none" to="none" (or just use <set>), to make that this test fails early enough, if it fails?  r=dholbert with that.  Thanks for fixing this!

(while you're at it, it might be good to add a setCurrentTime(0) for a forced synchronous sample and some "reftest-wait" sauce for good measure, but I won't hold you to that.)
Attachment #485954 - Flags: review?(dholbert) → review+
...or alternately: leave the animate node as-is, but do a setCurrentTime(0); followed by setCurrentTime(0.5); to get two synchronous samples -- one with the base value, and one with the "display:none" value.  (and add reftest-wait, too.)
(In reply to comment #34)
> Comment on attachment 485954 [details] [diff] [review]
> Patch v1b
> 
> >+++ b/content/smil/crashtests/572938-1.svg
> [...]
> >+    <animate attributeName="display" to="none" dur="1s" begin="0s"
> >+      repeatCount="indefinite"/>
> 
> Just from glancing at this, this animation might not cause cause problems for
> the first 500ms, and we might move on to the next crashtest before that time
> has passed.

Actually, this sets the value immediately since:
a) "display" is not interpolatable and SVG says, "The default mode is 'linear', however if the attribute does not support linear interpolation (e.g. for strings), the ‘calcMode’ attribute is ignored and discrete interpolation is used." (SMIL 3.0 says something similar)
b) discrete to-animation sets the to value for the entire duration as per SMIL 3.0, "Since a to animation has only 1 value, a discrete to animation will simply set the to value for the simple duration."

I know you know all this, but I'm just documenting it here since I had to double-check it myself and because I'm sure this question will come up again.

But to make this clear, I've changed it to a <set> animation.
Attachment #485954 - Attachment is obsolete: true
Attachment #486234 - Flags: review+
Whiteboard: [ready to land]
Ah, thanks -- I'd forgotten about (b) (the special handling of to-animation as opposed to from-to animation, for non-interpolatable properties).

Thanks!
Whiteboard: [ready to land] → [waiting to checkin]
Summary: SMIL animation of "display" on <use> triggers ininfinite recursion & ASSERTION: Registering content during sample.: '!mRunningSample', ASSERTION: Unregistering content during sample.: '!mRunningSample' → SMIL animation of "display" on <use> triggers infinite recursion & ASSERTION: Registering content during sample.: '!mRunningSample', ASSERTION: Unregistering content during sample.: '!mRunningSample'
Pushed: http://hg.mozilla.org/mozilla-central/rev/edaabf3fffdd (finally!)
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b8
Whiteboard: [waiting to checkin]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: