Closed Bug 1397795 Opened 7 years ago Closed 6 years ago

Crash [@ReflowFrame] due to infinite recursion, with multicol, vertical writing-mode, SVG, & self-referential filter

Categories

(Core :: Layout: Block and Inline, defect, P2)

41 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- fixed
firefox61 --- fixed

People

(Reporter: jkratzer, Assigned: jwatt)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, testcase)

Attachments

(8 files)

Attached file trigger.html
Testcase found while fuzzing mozilla-central rev 20170906-d8e238b811d3.
Flags: in-testsuite?
Attached file stacktrace.txt
Severity: normal → critical
Priority: -- → P2
INFO: Last good revision: 22f51211915bf7daff076180847a7140d35aa353 (2016-01-01)
INFO: First bad revision: ce643acfab14d95bea2fb6c4f56477413514b686 (2016-01-02)
INFO: Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=22f51211915bf7daff076180847a7140d35aa353&tochange=ce643acfab14d95bea2fb6c4f56477413514b686

Bug 1208344 maybe?
Has Regression Range: --- → yes
Version: unspecified → 46 Branch
(In reply to Ryan VanderMeulen [:RyanVM] from comment #2)
> Bug 1208344 maybe?

Unlikely. The "-webkit-box-orient" support added there has nothing to do with the "-webkit-box-shadow" used in the testcase here.

I can take a look, though...
Flags: needinfo?(dholbert)
Oh! This would've been:
>	9fbf850dc78d	Daniel Holbert — Bug 1213126: Enable support for webkit-prefixed CSS properties & features by default. r=heycam

So, we just started honoring the testcase's "-webkit-box-shadow" as of that ^^ bug.

I'll bet a modified testcase with "box-shadow" and "-moz-box-shadow" would break in significantly older builds.
Flags: needinfo?(dholbert)
So that testcase goes back to:
 7:01.38 INFO: Last good revision: ccd6b5f5e544c1d708849144943a776941bd3794 (2015-09-20)
 7:01.38 INFO: First bad revision: 039a8490891595736b16a3ccb17f025f4dcf13eb (2015-09-21)
 7:01.38 INFO: Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ccd6b5f5e544c1d708849144943a776941bd3794&tochange=039a8490891595736b16a3ccb17f025f4dcf13eb

...which is probably:
Jonathan Kew — Bug 1205787 - Map the writing-mode values from SVG1.1 to their CSS equivalents. r=heycam

...which (like comment 4) is *also* a change that just activated some CSS in the testcase to enable a feature that normally goes by another name.  Specifically, it made "tb" in the testcase become treated as "vertical-rl".

I'll post another testcase with that changed manually, and its regression range...
This testcase now uses an actual CSS writing-mode keyword, and it regressed (started crashing) with this range:
 3:43.44 INFO: Last good revision: ce863f9d8864 (2015-06-16)
 3:43.44 INFO: First bad revision: d7c148c84594 (2015-06-17)
 3:43.44 INFO: Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ce863f9d8864&tochange=d7c148c84594

In that range, my first guess would be bug 1174450 as the regressor (which was a many-part change to how we handle "writing-mode" in layout).
Blocks: 1174450
Keywords: regression
Version: 46 Branch → 41 Branch
Here's a further-reduced testcase (same regression range as previous one from comment 7).

This one is just a more human-readable testcase -- the JS-generated HTML/SVG is just converted to *actual* HTML/SVG, and the rules have been simplified & better targeted (rather than using a "*" selector).
Attachment #8912060 - Attachment description: testcase 4 (static, better targeting of CSS rules) → testcase 4 (static, better targeting of CSS rules) (WARNING, may crash your content process, like all testcases here)
Hey Daniel, could you perhaps try this testcase under rr and see where the crash happens?
Flags: needinfo?(dholbert)
The crash is from infinite recursion, due to us performing a "followup reflow" in a reflow callback, and then probably requesting another followup reflow during that one, and servicing that in HandlePostedReflowCallbacks.  And each of these reflow-callback-triggered reflows puts us a few stack-levels deeper.

Here's a snippet of the gdb backtrace when we crash -- note the recurring pattern (HandlePostedReflowCallbacks calling down to DidDoReflow [and at the lowest level here, into DoReflow, which would later become DidDoReflow if we hadn't overflowed the stack]).
======
#35 0x00007f5e4d733c3f in nsHTMLScrollFrame::Reflow(nsPresContext*, mozilla::ReflowOutput&, mozilla::ReflowInput const&, nsReflowStatus&) (this=0x7f5e2cc1a158, aPresContext=0x7f5e2c711800, aDesiredSize=..., aReflowInput=..., aStatus=...) at $SRC/layout/generic/nsGfxScrollFrame.cpp:1053
#36 0x00007f5e4d67b229 in nsContainerFrame::ReflowChild(nsIFrame*, nsPresContext*, mozilla::ReflowOutput&, mozilla::ReflowInput const&, int, int, unsigned int, nsReflowStatus&, nsOverflowContinuationTracker*) (this=0x7f5e2cc1a020, aKidFrame=0x7f5e2cc1a158, aPresContext=0x7f5e2c711800, aDesiredSize=..., aReflowInput=..., aX=0, aY=0, aFlags=0, aStatus=..., aTracker=0x0)
    at $SRC/layout/generic/nsContainerFrame.cpp:985
#37 0x00007f5e4d67aab2 in mozilla::ViewportFrame::Reflow(nsPresContext*, mozilla::ReflowOutput&, mozilla::ReflowInput const&, nsReflowStatus&) (this=0x7f5e2cc1a020, aPresContext=0x7f5e2c711800, aDesiredSize=..., aReflowInput=..., aStatus=...)
    at $SRC/layout/generic/ViewportFrame.cpp:335
#38 0x00007f5e4d54e897 in mozilla::PresShell::DoReflow(nsIFrame*, bool) (this=
    0x7f5e2c7d8000, target=0x7f5e2cc1a020, aInterruptible=true)
    at $SRC/layout/base/PresShell.cpp:8963
#39 0x00007f5e4d556d55 in mozilla::PresShell::ProcessReflowCommands(bool) (this=0x7f5e2c7d8000, aInterruptible=true)
    at $SRC/layout/base/PresShell.cpp:9136
#40 0x00007f5e4d55698a in mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush) (this=0x7f5e2c7d8000, aFlush=...)
    at $SRC/layout/base/PresShell.cpp:4250
#41 0x00007f5e4d516a40 in nsIPresShell::FlushPendingNotifications(mozilla::ChangesToFlush) (this=0x7f5e2c7d8000, aType=...)
    at ../../../mozilla/layout/base/nsIPresShell.h:584
#42 0x00007f5e4d555d5b in mozilla::PresShell::DoFlushPendingNotifications(mozilla::FlushType) (this=0x7f5e2c7d8000, aType=mozilla::FlushType::InterruptibleLayout)
    at $SRC/layout/base/PresShell.cpp:4071
#43 0x00007f5e4a8febf6 in nsIPresShell::FlushPendingNotifications(mozilla::FlushType) (this=0x7f5e2c7d8000, aType=mozilla::FlushType::InterruptibleLayout)
    at ../../../mozilla/layout/base/nsIPresShell.h:575
#44 0x00007f5e4d555c39 in mozilla::PresShell::HandlePostedReflowCallbacks(bool) (this=0x7f5e2c7d8000, aInterruptible=true)
    at $SRC/layout/base/PresShell.cpp:4040
#45 0x00007f5e4d54f0d9 in mozilla::PresShell::DidDoReflow(bool) (this=0x7f5e2c7d8000, aInterruptible=true)
    at $SRC/layout/base/PresShell.cpp:8792
#46 0x00007f5e4d556e2e in mozilla::PresShell::ProcessReflowCommands(bool) (this=0x7f5e2c7d8000, aInterruptible=true)
    at $SRC/layout/base/PresShell.cpp:9148
#47 0x00007f5e4d55698a in mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush) (this=0x7f5e2c7d8000, aFlush=...)
    at $SRC/layout/base/PresShell.cpp:4250
#48 0x00007f5e4d516a40 in nsIPresShell::FlushPendingNotifications(mozilla::ChangesToFlush) (this=0x7f5e2c7d8000, aType=...)
    at ../../../mozilla/layout/base/nsIPresShell.h:584
#49 0x00007f5e4d555d5b in mozilla::PresShell::DoFlushPendingNotifications(mozilla::FlushType) (this=0x7f5e2c7d8000, aType=mozilla::FlushType::InterruptibleLayout)
    at $SRC/layout/base/PresShell.cpp:4071
#50 0x00007f5e4a8febf6 in nsIPresShell::FlushPendingNotifications(mozilla::FlushType) (this=0x7f5e2c7d8000, aType=mozilla::FlushType::InterruptibleLayout)
    at ../../../mozilla/layout/base/nsIPresShell.h:575
#51 0x00007f5e4d555c39 in mozilla::PresShell::HandlePostedReflowCallbacks(bool) (this=0x7f5e2c7d8000, aInterruptible=true)
    at $SRC/layout/base/PresShell.cpp:4040
#52 0x00007f5e4d54f0d9 in mozilla::PresShell::DidDoReflow(bool) (this=0x7f5e2c7d8000, aInterruptible=true)
    at $SRC/layout/base/PresShell.cpp:8792
#53 0x00007f5e4d556e2e in mozilla::PresShell::ProcessReflowCommands(bool) (this=0x7f5e2c7d8000, aInterruptible=true)
    at $SRC/layout/base/PresShell.cpp:9148
#54 0x00007f5e4d55698a in mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush) (this=0x7f5e2c7d8000, aFlush=...)
    at $SRC/layout/base/PresShell.cpp:4250
#55 0x00007f5e4d516a40 in nsIPresShell::FlushPendingNotifications(mozilla::ChangesToFlush) (this=0x7f5e2c7d8000, aType=...)
    at ../../../mozilla/layout/base/nsIPresShell.h:584
#56 0x00007f5e4d555d5b in mozilla::PresShell::DoFlushPendingNotifications(mozilla::FlushType) (this=0x7f5e2c7d8000, aType=mozilla::FlushType::InterruptibleLayout)
    at $SRC/layout/base/PresShell.cpp:4071
#57 0x00007f5e4a8febf6 in nsIPresShell::FlushPendingNotifications(mozilla::FlushType) (this=0x7f5e2c7d8000, aType=mozilla::FlushType::InterruptibleLayout)
    at ../../../mozilla/layout/base/nsIPresShell.h:575
#58 0x00007f5e4d555c39 in mozilla::PresShell::HandlePostedReflowCallbacks(bool) (this=0x7f5e2c7d8000, aInterruptible=true)
    at $SRC/layout/base/PresShell.cpp:4040
======

It's possible this is related to bug 1403656 (another stack-overflow-crash with a similar backtrace and with SVG in the testcase)... Marking see-also.
See Also: → 1403656
Summary: Crash [@ReflowFrame] due to infinite recursion → Crash [@ReflowFrame] due to infinite recursion, with multicol, vertical writing-mode, SVG, & self-referential filter
So for testcase 4, there are two cycles involve.

The first is the scroll frame, which post a reflow callback to trigger another flush every time it gets reflowed, [1] for reason I don't know. This is generally fine, because after the first reflow, everything should be clean, and the further flush shouldn't be reflowing the scroll frame again.

However, for this case, before the next flush, we already have a new restyle event posted on the <svg> element, and thus we restyle that element and mark it dirty and all the way up to the root frame as "HAS_DIRTY_CHILDREN", and thus the scroll frame has dirty children before the next reflow, and consequently we reflow the scroll frame again as above.

The restyle event on <svg> element is posted by nsSVGFilterProperty::OnRenderingChange() [2] with nsChangeHint_UpdateOverflow set. That function is called (indirectly) from FlushOverflowChangedTracker() which is called after we restyle everything. The SVG frame is added to the OverflowChangedTracker because the restyle event posted by previous restyle contains nsChangeHint_UpdateOverflow. This is the other cycle.

Actually, the restyle cycle is probably the driving cycle, and the scroll frame cycle just gives it a chance to go into recursion. Without the scroll frame cycle, the restyle cycle can still drive itself via refresh driver, and we would just waste CPU on every tick to restyle it instead of crash due to stack overflow.

We probably should find a way to break the second cycle. Probably need someone familiar with SVG to see what should we do here.


[1] https://searchfox.org/mozilla-central/rev/14d933246211b02f5be21d2e730a57cf087c6606/layout/generic/nsGfxScrollFrame.cpp#1072
[2] https://searchfox.org/mozilla-central/rev/14d933246211b02f5be21d2e730a57cf087c6606/layout/svg/SVGObserverUtils.cpp#363-364
And the reason that nsSVGFilterProperty::OnRenderingChange() is triggered repeatedly is because the visual overflow of the reference element continuously grows. Each iteration it moves 9x9 towards top left, and the size grows 22x22.

The reason of the growing is probably from the styles here.
It is not clear to me, though, why SVG needs to update its visual overflow area with the visual overflow area of its reference element. I have a feeling that filter should just affect how the SVG element itself is painted, and thus it should never affect the visual overflow of the element.

It seems that updating visual overflow of SVG element with that of reference element was initially implemented in bug 775735, but I don't see much explanation there. jwatt, why do we need to do that?


Also it is unclear as well why the overflow area keeps growing... Why do we have anything here consistently extending the overflow area on top of that get from the reference element? Probably it's because of its appearance and box-shadow?
Flags: needinfo?(jwatt)
(In reply to Daniel Holbert [:dholbert] from comment #10)
> The crash is from infinite recursion, due to us performing a "followup
> reflow" in a reflow callback, and then probably requesting another followup
> reflow during that one, and servicing that in HandlePostedReflowCallbacks.

Do we have a use for reflowing in a post-reflow callback? It sounds suspect, and if not, it seems like we should be asserting that that never happens.
Flags: needinfo?(jwatt)
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #13)
> It is not clear to me, though, why SVG needs to update its visual overflow
> area with the visual overflow area of its reference element. I have a
> feeling that filter should just affect how the SVG element itself is
> painted, and thus it should never affect the visual overflow of the element.

By definition, the visual overflow area is (or needs to enclose) the area that is visually painted. If we fail to make sure that happens correctly then display list building and invalidation will go wrong.
The reason that the visual overflow region keeps growing every reflow is because we end up under this stack while updating overflow for the nsSVGOuterSVGFrame:

  nsIFrame::GetVisualOverflowRectRelativeToSelf
  nsLayoutUtils::GetBoxShadowRectForFrame
  ComputeEffectsRect
  nsIFrame::FinishAndStoreOverflow
  nsIFrame::UpdateOverflow
  mozilla::OverflowChangedTracker::Flush

GetBoxShadowRectForFrame is expanding the nsSVGOuterSVGFrame's *current* visual overflow rect every time it's called. We should instead be passing aOverflowRect (the in-progress calculation) into GetBoxShadowRectForFrame and have it expand that.
Assignee: nobody → jwatt
The broken nsLayoutUtils::GetBoxShadowRectForFrame code comes from bug 1174332:

https://hg.mozilla.org/mozilla-central/rev/893e9097a752

nsLayoutUtils::GetBoxShadowRectForFrame was changed to use the frame's visual overflow rect (instead of its frame size) in order to take account of any expansion (over the frames size) that native theming causes prior to any extra expansion from box-shadow. That isn't a valid thing to do though, since it means we're taking a prior (possibly completely wrong anyway) final overflow rect calculation and using it as the input for one of the many steps (box-shadow expansion) we'd normally take under nsIFrame::UpdateOverflow in order to figure out the frame's overflow rect.

There are two callers of nsLayoutUtils::GetBoxShadowRectForFrame:

  * ComputeEffectsRect
  * nsDisplayBoxShadowOuter::GetBoundsInternal

For the former we have the actual partially calculated visual overflow rect that we can pass into GetBoxShadowRectForFrame.

For the latter we can't sensibly do much to pass in the correct value without redoing a lot of the work done to calculate the frame's visual overflow rect. In that case it possibly does just make most sense to expand on the final visual overflow rect. It will generally just mean that we overestimte the nsDisplayBoxShadowOuter's bounds.
Hmm, although I guess if any effects that are applied after the box-shadow (SVG effects, absolute position clipping, CSS transforms), that could actually make the frame's final visual overflow area *smaller* than the area that should be passed into the box-shadow step, in which case we could end up with bounds for nsDisplayBoxShadowOuter that are too small.

Matt, any ideas on how we can fix nsDisplayBoxShadowOuter::GetBoundsInternal? If the pre-box-shadow overflow rect isn't the same as mRect, store it as a frame property (similar to the way we store PreTransformOverflowAreasProperty)?
Flags: needinfo?(matt.woodrow)
That seems like a reasonable solution to me!
Flags: needinfo?(matt.woodrow)
(In reply to Jonathan Watt [:jwatt] (needinfo? me) from comment #17)
>   * ComputeEffectsRect
>   * nsDisplayBoxShadowOuter::GetBoundsInternal
> 
> For the former we have the actual partially calculated visual overflow rect
> that we can pass into GetBoxShadowRectForFrame.

Actually this is not the right thing to do. As best I can make out from the CSS specs, the box shadow is supposed to be based on the border box without using any other effects or other things that might affect visual overflow.

Maybe the correct way to have fixed bug 1174332 was actually to stop using native theming visual output in the creation of box-shadow output? I think that's bug 1327935.

At any rate, for now, to preserve the status quo while preventing this crash it seems best to remove the GetVisualOverflowRectRelativeToSelf call in nsLayoutUtils::GetBoxShadowRectForFrame and instead simply expand the new frame bounds for any native theming.

We could still use a frame property to optimize fetching of the box shadow bounds as I proposed in comment 18, but _if_ we do that it should be a separate patch.
Markus, it's not clear to me what the comment you wrote about opaque theme widgets means, or what border-radius has to do with anything here. While you're reviewing this, can you explain so I can fix up the comments?
Attachment #8958091 - Flags: review?(mstange)
Flags: needinfo?(dholbert)
Let's see what you think of this.
Attachment #8958120 - Flags: review?(mstange)
Comment on attachment 8958091 [details] [diff] [review]
fix nsLayoutUtils::GetBoxShadowRectForFrame to not use GetVisualOverflowRectRelativeToSelf

Review of attachment 8958091 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/nsLayoutUtils.cpp
@@ +8756,5 @@
>    const nsStyleDisplay* styleDisplay = aFrame->StyleDisplay();
>    nsITheme::Transparency transparency;
>    if (aFrame->IsThemed(styleDisplay, &transparency)) {
>      // For opaque (rectangular) theme widgets we can take the generic
>      // border-box path with border-radius disabled.

Good point, this is only true if the opaque widget does not have overflow. I think that's the case for all our opaque widgets, but we shouldn't rely on it.

So it'd probably be better to remove the != opaque check. And for completeness, we should also remove it from nsCSSRendering::HasBoxShadowNativeTheme.
Attachment #8958091 - Flags: review?(mstange) → review+
Comment on attachment 8958120 [details] [diff] [review]
part 2 - avoid recalculating box shadow visual overflow in display list code

Review of attachment 8958120 [details] [diff] [review]:
-----------------------------------------------------------------

What's the reason for this change? Performance? I'd be surprised if it made a big difference.
(In reply to Markus Stange [:mstange] from comment #24)
> What's the reason for this change? Performance? I'd be surprised if it made
> a big difference.

Nah, just because I'd already written the code (before realizing that box-shadow doesn't depend on partially computed visual overflow) and thought I might as well put it up to see what you think. Feel free to r-.
Comment on attachment 8958120 [details] [diff] [review]
part 2 - avoid recalculating box shadow visual overflow in display list code

Oh, I see. :)
Let's not do it then.
Attachment #8958120 - Flags: review?(mstange) → review-
Pushed by jwatt@jwatt.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d7c420c8dc2a
Fix stack overflow crash due to box-shadow. r=mstange
Could you also land the testcase as a crashtest?
Flags: needinfo?(jwatt)
Doh! Will do, thanks.
Flags: needinfo?(jwatt)
Keywords: leave-open
Attachment #8958984 - Flags: review?(mstange) → review+
Pushed by jwatt@jwatt.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c8618ca631bf
part 2 - Crashtest for box-shadow. r=mstange
https://hg.mozilla.org/integration/mozilla-inbound/rev/24a247584c06
part 3 - Don't rely on opaque widgets not having overflow. r=mstange
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/c8618ca631bf
https://hg.mozilla.org/mozilla-central/rev/24a247584c06
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Is this something you wanted to nominate for uplift or let ride the trains?
Flags: needinfo?(jwatt)
Flags: in-testsuite?
Flags: in-testsuite+
Comment on attachment 8958091 [details] [diff] [review]
fix nsLayoutUtils::GetBoxShadowRectForFrame to not use GetVisualOverflowRectRelativeToSelf

Thanks for asking Ryan. We could uplift this safely I think.

Approval Request Comment
[Feature/Bug causing the regression]: various
[User impact if declined]: potential crash, and potential degraded performance due to over invalidation I guess
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: not likely
[Why is the change risky/not risky?]: it simply stops us unnecessarily over inflating the invalidation area for 'box-shadow'
[String changes made/needed]: none
Flags: needinfo?(jwatt)
Attachment #8958091 - Flags: approval-mozilla-beta?
Comment on attachment 8958091 [details] [diff] [review]
fix nsLayoutUtils::GetBoxShadowRectForFrame to not use GetVisualOverflowRectRelativeToSelf

layout fix for beta60
Attachment #8958091 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: