Closed Bug 534975 Opened 15 years ago Closed 15 years ago

Crash when modifying 'display' property for a <use> element whose target is animated [@ nsSMILCSSValueType::ValueFromString(nsCSSProperty, nsIContent*, nsAString_internal const&, nsSMILValue&) const ]

Categories

(Core :: SVG, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta1+

People

(Reporter: marek.raida, Assigned: dholbert)

References

()

Details

(Keywords: crash)

Crash Data

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a1pre) Gecko/20091215 Minefield/3.7a1pre
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a1pre) Gecko/20091215 Minefield/3.7a1pre

Sadly, not able to reproduce it on simplified crash case, sorry :-(
I'm also not sure, if it is related to SMIL and CSS properties animation, but I suppose so, but maybe I'm wrong. 
Happens on Win7 or WinXP for sure

Reproducible: Always

Steps to Reproduce:
1. Open URL of complex SVG+SMIL document, like http://svg.kvalitne.cz/cavern/088/cavern.svg
2. Open it in more tabs, three or more (two seems ok for me)
3. start the game by clicking START link or push Enter key, and than do it quickly also in all other tabs
Actual Results:  
Browser crashes after couple of seconds

Expected Results:  
Continue working without crashing
Do you have any crash report Ids. Type about:crashes in the location bar if you've missed them.
crashes with the testcase url opened in four tabs and waiting several seconds.

Signature	GetCSSComputedValue
UUID	ee2624d1-0250-4a67-b607-4fa0c2091215
Time 	2009-12-15 15:18:51.59294
Uptime	19811
Last Crash	99946 seconds before submission
Product	Firefox
Version	3.7a1pre
Build ID	20091215045531
Branch	1.9.3
OS	Windows NT
OS Version	5.1.2600 Service Pack 3
CPU	x86
CPU Info	GenuineIntel family 15 model 2 stepping 9
Crash Reason	EXCEPTION_ACCESS_VIOLATION
Crash Address	0x0
User Comments	Bug 534975
Processor Notes

Crashing Thread
Frame 	Module 	Signature 	Source
0 	xul.dll 	GetCSSComputedValue 	content/smil/nsSMILCSSProperty.cpp:61
1 	xul.dll 	nsSMILCSSProperty::GetBaseValue() 	content/smil/nsSMILCSSProperty.cpp:103
2 	xul.dll 	nsSMILAnimationFunction::WillReplace() 	content/smil/nsSMILAnimationFunction.cpp:347
3 	xul.dll 	nsSMILCompositor::ComposeAttribute() 	content/smil/nsSMILCompositor.cpp:157

it *looks* like Bug 529387, but its fix got checked in already and the testcase mentioned in Bug 529387 comment 0 does not crash (anymore).
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: crash
Version: unspecified → Trunk
Filed bug 535004 on a slightly different method of reproducing a crash at this spot (opening just one tab of the "Cavern" demo, and closing Firefox while that's still loading).

I'm going to work on fixing that bug -- hopefully that bug's patch will fix the crash on this bug's STR, too.
See Also: → 535004
Version: Trunk → unspecified
Version: unspecified → Trunk
Another crash ID from WinXP is: 4bdf8e17-2088-4029-9a20-b36a12091216
Severity: normal → critical
Summary: Browser crash with more complex SMIL animated documents opened simultaneously → Browser crash with more complex SMIL animated documents opened simultaneously [@ GetCSSComputedValue ]
The patch posted in bug 535004 will hopefully fix this.
Depends on: 535004
Here's a crash report for this crash on my Linux machine:
http://crash-stats.mozilla.com/report/index/23ab2ec4-49c5-4047-abc7-bb19f2091216
OS: Windows 7 → All
Hardware: x86 → All
Bug 535004's patch doesn't actually fix this.

What's generally happening here is this:
* BEGIN SAMPLE
 - We aggregate all our animation functions into a compositor table
 - We proceed through the compositor table, compositing each one.
 - The first time we look up a base value of a CSS property, we get in trouble:
   * nsComputedDOMStyle::GetPropertyCSSValue calls  nsDocument::FlushPendingNotifications(Flush_Style)
   * This triggers an AnonymousContentDestroyer to unbind a subtree of content from our document (including an animation element and its parent-target)[1] I think this subtree is generated content from <use> element.
 - We continue compositing stuff from our Compositor table -- but when we reach the compositor for the target-element that's now been unbound from our tree, we die. Specifically, we die when we call GetCurrentDoc() on this target element and assume that the result is non-null.  (The result is, in fact, null, because this target has been removed from the document)

[1] Incidentally, when we call UnbindFromTree on the removed animate element, this triggers a synchronous Resample() call, which makes us DoSample() while we're already in the middle of a DoSample() call.  This is a problem in and of itself, but it's not the key problem here)
So the key issue is:
 - At the beginning of a sample, we build a compositor table (effectively a list of animation targets that need to be evaluated).
 - In the middle of a sample (while we're iterating through the table), we might look at the base value of a CSS property, which will flush style changes...
   ... and that can trigger us to remove nodes from our tree...
   ... which can kill us later in this sample, if those nodes are in our compositor table, still waiting to be evaluated.  (When we get to them, we assume that they're in a document -- but they aren't anymore.)

I think we can work around the basic issue here by preemptively calling
>  mDocument->FlushPendingNotifications(Flush_Style)
at the very beginning of DoSample.  That way, we'll get any content-tree-modifications up front, **before** we construct our Compositor Table.
Assignee: nobody → dholbert
Status: NEW → ASSIGNED
This testcase uses scripting to quickly toggle the 'display' property between 'none' and the default value, for a node that's referenced via <use>.

When this style modification gets queued up in such a way that it's flushed during a sample, we'll destroy content mid-sample and hit issues as described in my last 2 comments.

This testcase aborts a debug build within a few seconds, with this message:
###!!! ABORT: active animations should only be able to target elements that are in a document: 'doc', file ../../../mozilla/content/smil/nsSMILCSSValueType.cpp, line 297

and it crashes opt builds within a few seconds as well, with this stack:
bp-d5cd272c-7607-45e3-87f6-ec4962091221
NOTE: In the original steps to reproduce here, the multiple-simultaneous-documents probably contributed to the crash by slowing down our handling of restyle events (making it more likely that a restyle would be queued up when we enter a sample).

The reduced testcase crashes just fine with only one instance of it open, though.
(In reply to comment #9)
> I think we can work around the basic issue here by preemptively calling
> >  mDocument->FlushPendingNotifications(Flush_Style)
> at the very beginning of DoSample.  That way, we'll get any
> content-tree-modifications up front, **before** we construct our Compositor
> Table.

Even with that fix, though, I think we a modified version of the reduced testcase could still crash us...  If the testcase tweaked 'display' with SMIL rather than with JS, I think we could still end up hiding nodes during the sample and then stumbling across their null document-pointers later on.

One other solution would be to just remove the assumption that an animation target will have a non-null document, when nsSMILCSS* gets to it. We can add null-checks that would just bail out of compositing.
Here's a modified reduced testcase as described in comment 12.  This still makes us crash, even if I flush restyles at the beginning of each sample. (as posited above)
Attachment #418899 - Attachment description: testcase 2 (crashes Firefox when loaded) → reduced testcase 2 (crashes Firefox when loaded)
This fixes the crash on both testcases, though we still get many copies of this assertion:
###!!! ASSERTION: Resample dirty flag set during sample!: '!mResampleNeeded', file ../../../mozilla/content/smil/nsSMILAnimationController.cpp, line 375

This assertion is the issue described in the footnote to comment 8.  It's a related but separate issue from the crash here, and it needs its own bug.
Attachment #418908 - Flags: review?(roc)
Summary: Browser crash with more complex SMIL animated documents opened simultaneously [@ GetCSSComputedValue ] → Crash when modifying 'display' property for a <use> element whose target is animated [@ nsSMILCSSValueType::ValueFromString(nsCSSProperty, nsIContent*, nsAString_internal const&, nsSMILValue&) const ]
Blocks: 535403
Comment on attachment 418908 [details] [diff] [review]
fix v1: Add checks for null document pointer (and bail out if null)

 GetPresContextForElement(nsIContent* aElem)
 {
   nsIDocument* doc = aElem->GetCurrentDoc();
-  NS_ABORT_IF_FALSE(doc, "active animations should only be able to "
-                    "target elements that are in a document");
+  if (!doc) {
+    // This can happen if we process certain types of restyles mid-sample
+    // and remove anonymous animated content from the document as a result.
+    // See bug 534975.
+    return PR_FALSE;

return nsnull;
Attachment #418908 - Flags: review?(roc) → review+
I think it might be a good idea to flush restyles before we start the sample operation, and do the sample operations with a script blocker in place so that flushes during the operations are disabled.
Although I suspect that the correct way to fix testcase #2 at least would be to have the <use> anonymous nodes owned by the element and present whether the <use> element has a frame or not.
Attachment #418899 - Attachment is patch: false
Attachment #418899 - Attachment mime type: text/plain → image/svg+xml
(In reply to comment #15)
> return nsnull;

Oops, thanks! fixed in my patch queue.

(In reply to comment #16)
> I think it might be a good idea to flush restyles before we start the sample
> operation, and do the sample operations with a script blocker in place so that
> flushes during the operations are disabled.

I just tested that with a script blocker, and it indeed fixes testcase 2.  It also fixes the assertion mentioned in comment 14 (which fix v1 does not).
I don't think we actually want to use a script blocker, though -- if we tweak a node's style during a sample, we want that to be reflected in its descendants' computed style *in that same sample*.  (That doesn't reliably work yet because we don't always compose ancestors before their descendants -- bee Bug 501183 -- but once that bug is fixed, we want this to work.)
(In reply to comment #17)
> Although I suspect that the correct way to fix testcase #2 at least would be to
> have the <use> anonymous nodes owned by the element and present whether the
> <use> element has a frame or not.

So, it looks like those nodes' lifetimes are currently tied to the lifetime of the <use> elements' frame, via
 * nsSVGUseFrame::CreateAnonymousContent()
 * nsSVGUseFrame::Destroy() (which calls nsSVGUseElement::DestroyAnonymousContent)

I'm not entirely sure what the repurcussions of messing with that would be.  

Roc, is it ok if I land "fix v1" for now to fix this crash, and file a followup on fixing the <use> anonymous node ownership issues?
Yes, definitely.

(In reply to comment #19)
> I don't think we actually want to use a script blocker, though -- if we tweak a
> node's style during a sample, we want that to be reflected in its descendants'
> computed style *in that same sample*.  (That doesn't reliably work yet because
> we don't always compose ancestors before their descendants -- bee Bug 501183 --
> but once that bug is fixed, we want this to work.)

Right, OK.
Landed fix v1: http://hg.mozilla.org/mozilla-central/rev/2b253512942b

Will file a followup & then mark this as fixed.
Filed bug 536660 as the followup making <use>'s anonymous content stay around even if it has no frames.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Crash Signature: [@ nsSMILCSSValueType::ValueFromString(nsCSSProperty, nsIContent*, nsAString_internal const&, nsSMILValue&) const ]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: