Closed Bug 601999 Opened 9 years ago Closed 9 years ago

Too-much-recursion crash with css filter property

Categories

(Core :: SVG, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla2.0b8
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: jruderman, Assigned: dholbert)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, testcase)

Attachments

(3 files, 1 obsolete file)

(Split from bug 550306)

I'm hitting this a lot, and it's hard to distinguish too-much-recursion crashes from each other, so this makes fuzzing difficult.
blocking2.0: --- → ?
blocking2.0: ? → betaN+
Component: Layout → SVG
QA Contact: layout → general
    Relevant stack bit:

    169 PresShell::FlushPendingNotifications(mozFlushType) + 706
    (nsPresShell.cpp:4800)
    170 PresShell::HandlePostedReflowCallbacks(int) + 250 (nsPresShell.cpp:4692)
    171 PresShell::DidDoReflow(int) + 40 (nsPresShell.cpp:7215)
    172 PresShell::ProcessReflowCommands(int) + 419 (nsPresShell.cpp:7479)
    173 PresShell::FlushPendingNotifications(mozFlushType) + 706
    (nsPresShell.cpp:4800)
The obvious question: which reflow callback is it?
Status: NEW → ASSIGNED
OS: Mac OS X → All
Hardware: x86 → All
It's this reflow callback for the nsHTMLScrollFrame (inside the <select> element), which gets registered here:
826   // Restore the old scroll position, for now, even if that's not valid anymore
827   // because we changed size. We'll fix it up in a post-reflow callback, because
828   // our current size may only be temporary (e.g. we're compute XUL desired sizes).
829   PlaceScrollArea(state, oldScrollPosition);
830   if (!mInner.mPostedReflowCallback) {
831     // Make sure we'll try scrolling to restored position
832     PresContext()->PresShell()->PostReflowCallback(&mInner);
http://hg.mozilla.org/mozilla-central/annotate/882525a98119/layout/generic/nsGfxScrollFrame.cpp#l825

The callback function itself is nsGfxScrollFrameInner::ReflowFinished
http://hg.mozilla.org/mozilla-central/annotate/882525a98119/layout/generic/nsGfxScrollFrame.cpp#l2873

Regression range for this bug:
  20100813031312 d5e211bdd793
  20100814030707 656d99ca089c
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d5e211bdd793&tochange=656d99ca089c

The lowest push in that range looks most related:
http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=29114207a571
Keywords: regression
(In reply to comment #3)
> The lowest push in that range looks most related:
> http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=29114207a571

Hm, I take that back -- a targeted build at that push is unaffected.

Now I'm suspecting this cset:
 http://hg.mozilla.org/mozilla-central/rev/1bf9a4c8c8b4
 Bug 572689 - Make nsSVGRenderingObservers observe elements instead of frames

I think something like this is going on: we reflow our scrollframe, and that invalidates the filter (which the scrollframe is a part of), and that triggers a reflow of the frames to which the filter is applied (including the scrollframe). This continues in a death-spiral.
Blocks: 572689
Here's a somewhat-simpler testcase with an explicit scrollframe (using "overflow-x: scroll" instead of <select>).

In this testcase, the filter itself & the element to which it applies are both just the div with "overflow-x" (rather than the entire document).
OK -- the first affected changeset is 4a50f3c34d5a, and the crash goes away if I comment out this line that was added in that changeset:

> diff --git a/layout/generic/nsFrame.cpp b/layout/generic/nsFrame.cpp
> @@ -3894,6 +3894,7 @@ nsIFrame::InvalidateInternal(const nsRec
>                               nsIFrame* aForChild, PRUint32 aFlags)
>  {
>  #ifdef MOZ_SVG
> +  nsSVGEffects::InvalidateDirectRenderingObservers(this);
>    if (nsSVGIntegrationUtils::UsingEffectsForFrame(this)) {

http://hg.mozilla.org/mozilla-central/diff/4a50f3c34d5a/layout/generic/nsFrame.cpp
Blocks: 506826
No longer blocks: 572689
Blocks: 587336
NOTE: up until the changeset in Comment 6, mozilla-central only had one call to nsSVGEffects::InvalidateDirectRenderingObservers -- in nsFrame::DestroyFrom(nsIFrame* aDestructRoot).
So I see a few possible strategies here:
 A. Remove the line mentioned in Comment 6.

 B. Check for cycles in nsSVGRenderingObserver::InvalidateViaReferencedElement

 C. Check for cycles in nsSVGEffects::AddRenderingObserver

I just tried (A) -- removing the regressing line -- on TryServer, and it doesn't actually break any unit tests, which is encouraging.  I'm guessing it'd be possible to construct a -moz-element testcase that would end up painting stale data as a result of that change, but I'm not immediately sure what that testcase would look like.

(B) and (C) might be better for correctness, but I'm not entirely sure how to best check for cycles there. Maybe we'd disallow rendering-observer-notifications being sent to an element from any of its ancestors?  And even if that's the right thing to do, it could still be undesirable perf-wise (it'd make us walk the DOM tree on every InvalidateViaReferencedElement or AddRenderingObserver call).

Also: I'm not sure that (C) would work after tree-surgery -- i.e. when we re-parent an element, do we un-register and then re-register it as a rendering observer?  If not, then you could still construct testcases that fail here, if they just do some node-reparenting at the last minute.

I tend to lean towards strategy (A), but I'm open to suggestions/advice.  (Maybe there's another obvious solution I'm missing?)
Attached patch fix: delete regressing line (obsolete) — Splinter Review
This is "Strategy A" from comment 8. (delete the regressing line)

Includes a crashtest from this bug & one from related bug 587336, both of which fail (hang or crash) pre-patching but pass post-patching.
Attachment #488050 - Flags: review?(roc)
Comment on attachment 488050 [details] [diff] [review]
fix: delete regressing line

mstange, I'd appreciate any thoughts you have on this, if you get a chance (in particular, what it might break, & if any better fix-strategies jump out at you).
Attachment #488050 - Flags: review?(mstange)
I think we need that wherever -moz-element needs to observe changes in the rendering of an HTML element.

(In reply to comment #4)
> I think something like this is going on: we reflow our scrollframe, and that
> invalidates the filter (which the scrollframe is a part of), and that triggers
> a reflow of the frames to which the filter is applied (including the
> scrollframe).

Why is invalidation triggering a reflow?
Not sure off the top of my head -- I've been working with the testcases in bug 587336 more recently than the testcases here.  However, just a restyle is sufficient to cause problems, in that bug's testcases -- see e.g. the stack in attachment 486783 [details]. (& description in bug 587336 comment 4).
So to answer comment 11 a little better -- what happens in the testcase here is basically the following:
 (1) We're reflowing the page
 (2) When we reflow the scrollFrame's child-block-frame, that triggers the InvalidateDirectRenderingObservers call from Ccomment 6
 (3) That uses NS_UpdateHint to request an (asynchronous) reflow.
 (4) At the end of our current reflow, we call HandlePostedReflowCallbacks, which runs a callback to update our scroll frame's scrollbars. (noted in comment 3)
 (5) That callback returns true, which makes us call FlushPendingNotifications at the last line of HandlePostedReflowCallbacks.
 (6) This flush makes us synchronously run the reflow that we'd requested in Step 3.
   (Return to step 1; repeat until stack is exhausted)
(In reply to comment #13)
>  (3) That uses NS_UpdateHint to request an (asynchronous) reflow.

Meant to reference this code (nsSVGFilterProperty::DoUpdate) for this step:
http://mxr.mozilla.org/mozilla-central/source/layout/svg/base/src/nsSVGEffects.cpp?mark=276-278#265
(In reply to comment #13)
>  (1) We're reflowing the page
>  (2) When we reflow the scrollFrame's child-block-frame, that triggers the
> InvalidateDirectRenderingObservers call from Ccomment 6
>  (3) That uses NS_UpdateHint to request an (asynchronous) reflow.

OK, the reason we're doing that is to ensure that the ancestors' overflow areas are updated. But during reflow that's going to happen anyway. So we need to suppress that additional reflow request when the frame affected is currently being reflowed. I think you can just check FRAME_IN_REFLOW somewhere.
Attached patch fix v2Splinter Review
Here's a fix along the lines of comment 15 (just guarding the NS_UpdateHint call with a NS_FRAME_IN_REFLOW state-check.)

This fixes this bug's testcases, but not Bug 587336's testcases -- Bug 587336 is going to need its own patch.
Attachment #488050 - Attachment is obsolete: true
Attachment #488377 - Flags: review?(roc)
Attachment #488050 - Flags: review?(roc)
Attachment #488050 - Flags: review?(mstange)
Thanks for looking into this, Daniel, and sorry that I didn't speak up sooner.

(In reply to comment #8)
> I just tried (A) -- removing the regressing line -- on TryServer, and it
> doesn't actually break any unit tests, which is encouraging.

Uh oh, shame on me for lack of test coverage. I'll write a test that would break asap.

The call to InvalidateDirectRenderingObservers has the purpose of invalidating (purely visually) any elements that use the current element as a -moz-element paint server. The fact that it can also cause reflows is unfortunate, and in fact it's never necessary, at least not from nsIFrame::InvalidateInternal. (It is necessary when, for example, a filter's overflow actually changes, but then we're notified elsewhere and don't need to do anything in InvalidateInternal.)

Maybe we should split rendering observers into two types? One "visual" type for -moz-element and another for filters, masks, etc. Then we'd only invalidate visual observers from nsIFrame::InvalidateInternal, which must not cause reflows, and the problem is gone.
Daniel, do you think the approach I suggested in the last comment would fix bug 587336, too?

I mean, the patches here and in bug 587336 are fine, but maybe the fact that we're unnecessarily triggering restyles during invalidation at all has caused more bugs of this kind which we don't know about yet. Restoring the pre-bug 506826 state might be the less risky approach.
(In reply to comment #17)
> Maybe we should split rendering observers into two types? One "visual" type for
> -moz-element and another for filters, masks, etc. Then we'd only invalidate
> visual observers from nsIFrame::InvalidateInternal, which must not cause
> reflows, and the problem is gone.

That would work, and I'd definitely take a patch for that.

However, one of our post-FF4 priorities is to rework invalidation to compare "before" and "after" display lists instead of using the nsIFrame::Invalidate machinery. That will make all these issues go away, or at least behave very differently.
Thanks for the feedback, Markus!  Your suggestion sounds promising, but I personally am a little hesitant to implement that change, when there's a more targeted fix available, primarily because:
 - it sounds like it could be somewhat risky, this late in the game (for the potential benefit)
 - any other as-yet-undetected bugs of this sort are likely to be sg:dos at worst (infinite loops)
 - as roc pointed out, the situation here will change post-Firefox 4
 - there are other blockers that need my attention :)

So, I'd lean towards sticking with the fix strategies in the patches attached here and on bug 587336, and then perhaps implementing Markus's suggestion in a separate bug if anyone can get to it, but probably not blocking the Firefox 4 release on that. Does that sound reasonable?
Landed: http://hg.mozilla.org/mozilla-central/rev/3dac4e710157
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b8
(In reply to comment #20)
> Does that sound reasonable?

Sure. Thanks!
You need to log in before you can comment on or make changes to this bug.