Open Bug 1403656 Opened 3 years ago Updated 3 days ago

stack overflow during reflow, with infinite recursion in ProcessReflowCommands->DidDoReflow->DoFlushPendingNotifications

Categories

(Core :: Layout, defect, P2, critical)

49 Branch
defect

Tracking

()

Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- affected
firefox-esr68 --- affected
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- fix-optional
firefox72 --- fix-optional
firefox73 --- fix-optional

People

(Reporter: tsmith, Unassigned, NeedInfo)

References

(Blocks 7 open bugs)

Details

(Keywords: crash, regression, testcase, Whiteboard: [fuzzblocker][layout:triage-discuss][tbird crash])

Crash Data

Attachments

(6 files, 1 obsolete file)

Attached file test_case.html (obsolete) —
#0  0x00000000004a53ca in __asan_memset () at /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_interceptors.cc:430
#1  0x00007fffe0186b7d in BaseRect () at /src/obj-firefox/dist/include/mozilla/gfx/BaseRect.h:49
#2  nsRect () at /src/obj-firefox/dist/include/nsRect.h:31
#3  GetBoxShadowRectForFrame () at /src/layout/base/nsLayoutUtils.cpp:8317
#4  0x00007fffe030f7e5 in ComputeEffectsRect () at /src/layout/generic/nsFrame.cpp:6770
#5  FinishAndStoreOverflow () at /src/layout/generic/nsFrame.cpp:9082
#6  0x00007fffe0249825 in FinishAndStoreOverflow () at /src/layout/generic/nsIFrame.h:3168
#7  Reflow () at /src/layout/generic/nsBlockFrame.cpp:1459
#8  0x00007fffe02a343b in ReflowChild () at /src/layout/generic/nsContainerFrame.cpp:932
#9  0x00007fffe03b667a in ReflowInFlowChild () at /src/layout/generic/nsGridContainerFrame.cpp:5125
#10 0x00007fffe03bd55a in ReflowChildren () at /src/layout/generic/nsGridContainerFrame.cpp:5697
#11 0x00007fffe03c00f8 in Reflow () at /src/layout/generic/nsGridContainerFrame.cpp:6000
#12 0x00007fffe03f9bba in ReflowFrame () at /src/layout/generic/nsLineLayout.cpp:921
#13 0x00007fffe026a855 in ReflowInlineFrame () at /src/layout/generic/nsBlockFrame.cpp:4220
#14 0x00007fffe0269469 in DoReflowInlineFrames () at /src/layout/generic/nsBlockFrame.cpp:4016
#15 0x00007fffe0260efa in ReflowInlineFrames () at /src/layout/generic/nsBlockFrame.cpp:3890
#16 0x00007fffe025aab9 in ReflowLine () at /src/layout/generic/nsBlockFrame.cpp:2873
#17 0x00007fffe02505c0 in ReflowDirtyLines () at /src/layout/generic/nsBlockFrame.cpp:2409
#18 0x00007fffe0247373 in Reflow () at /src/layout/generic/nsBlockFrame.cpp:1235
#19 0x00007fffe053aaa8 in Reflow () at /src/layout/forms/nsSelectsAreaFrame.cpp:190
#20 0x00007fffe02a343b in ReflowChild () at /src/layout/generic/nsContainerFrame.cpp:932
#21 0x00007fffe0360ca9 in ReflowScrolledFrame () at /src/layout/generic/nsGfxScrollFrame.cpp:548
#22 0x00007fffe036235f in ReflowContents () at /src/layout/generic/nsGfxScrollFrame.cpp:660
#23 0x00007fffe036550a in Reflow () at /src/layout/generic/nsGfxScrollFrame.cpp:1037
#24 0x00007fffe0513de5 in ReflowAsDropdown () at /src/layout/forms/nsListControlFrame.cpp:528
Flags: in-testsuite?
Summary: stack overflow in during reflow → stack overflow during reflow
INFO: Last good revision: 9ce31e9f90cb0e534611b0f617c5bbc232ffe748 (2016-04-26)
INFO: First bad revision: ab0044bfa1df858919797bcd6a9aef76a668cd4a (2016-04-27)
INFO: Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9ce31e9f90cb0e534611b0f617c5bbc232ffe748&tochange=ab0044bfa1df858919797bcd6a9aef76a668cd4a

Starting with jfkthame due to nsLineBreaker on the stack (bug 1265631?).
Blocks: 1265631
Crash Signature: [@ _chkstk | nsLineBreaker::AppendText ]
Has Regression Range: --- → yes
Flags: needinfo?(jfkthame)
Version: Trunk → 49 Branch
I don't think bug 1265631 is relevant here. There's nothing in the testcase that would be affected by its changes, AFAICS, and I tried reverting it locally and the stack-overflow still occurs. nsLineBreaker just happens to be what's at the top of the stack when we finally overflow.

The real problem seems to be that we're getting into an infinitely-recursive sequence of DoFlushPendingNotifications ... ProcessReflowCommands ... DidDoReflow as a result (apparently) of modifying the text-rendering attribute on an SVG element. Stack excerpt:

    frame #36: 0x0000000104566c2d XUL`mozilla::PresShell::ProcessReflowCommands(this=<unavailable>, aInterruptible=<unavailable>) + 381 at PresShell.cpp:9553 [opt]
    frame #37: 0x00000001045667ec XUL`mozilla::PresShell::DoFlushPendingNotifications(this=0x0000000128b85000, aFlush=<unavailable>) + 1644 at PresShell.cpp:4207 [opt]
    frame #38: 0x000000010455f127 XUL`mozilla::PresShell::DidDoReflow(this=0x0000000128b85000, aInterruptible=true) + 39 at PresShell.cpp:9208 [opt]
    frame #39: 0x0000000104566d4c XUL`mozilla::PresShell::ProcessReflowCommands(this=<unavailable>, aInterruptible=<unavailable>) + 668 at PresShell.cpp:9565 [opt]
    frame #40: 0x00000001045667ec XUL`mozilla::PresShell::DoFlushPendingNotifications(this=0x0000000128b85000, aFlush=<unavailable>) + 1644 at PresShell.cpp:4207 [opt]
    frame #41: 0x000000010455f127 XUL`mozilla::PresShell::DidDoReflow(this=0x0000000128b85000, aInterruptible=true) + 39 at PresShell.cpp:9208 [opt]
    frame #42: 0x0000000104566d4c XUL`mozilla::PresShell::ProcessReflowCommands(this=<unavailable>, aInterruptible=<unavailable>) + 668 at PresShell.cpp:9565 [opt]
    frame #43: 0x00000001045667ec XUL`mozilla::PresShell::DoFlushPendingNotifications(this=0x0000000128b85000, aFlush=<unavailable>) + 1644 at PresShell.cpp:4207 [opt]
    frame #44: 0x000000010455f127 XUL`mozilla::PresShell::DidDoReflow(this=0x0000000128b85000, aInterruptible=true) + 39 at PresShell.cpp:9208 [opt]
    frame #45: 0x0000000104566d4c XUL`mozilla::PresShell::ProcessReflowCommands(this=<unavailable>, aInterruptible=<unavailable>) + 668 at PresShell.cpp:9565 [opt]
    frame #46: 0x00000001045667ec XUL`mozilla::PresShell::DoFlushPendingNotifications(this=0x0000000128b85000, aFlush=<unavailable>) + 1644 at PresShell.cpp:4207 [opt]
    frame #47: 0x000000010455f127 XUL`mozilla::PresShell::DidDoReflow(this=0x0000000128b85000, aInterruptible=true) + 39 at PresShell.cpp:9208 [opt]

... and so on, for tens of thousands of stack frames.

Looking at the pushlog from comment 1, I'm taking a guess that maybe bug 1263845 is involved here. A key aspect of the testcase is the use of a percentage on "#htmlvar00009 { top: 22%; }".
Blocks: 1263845
No longer blocks: 1265631
Flags: needinfo?(jfkthame) → needinfo?(bzbarsky)
Definitely regressed by bug 1263845.

I assume the percentage "top" is only relevant insofar as it sets NS_FRAME_CONTAINS_RELATIVE_BSIZE and this can trigger BResize to be happening.
This doesn't crash on trunk, but did crash both before and after the patch for bug 1263845.

So we have a longstanding bug here, and the patch for bug 1263845 just tickled it a bit differently....
So yeah, bug 1263845 was really more or less a red herring.  Something about dynamic reflow of that marker plus having, on the same line, a scrollframe (which does post-reflow callbacks, which is likely relevant here), is bad.
Yeah, so the problem is that nsHTMLScrollFrame::Reflow unconditionally posts a reflow callback if one is not posted already:

  if (!mHelper.mPostedReflowCallback) {
    // Make sure we'll try scrolling to restored position
    PresContext()->PresShell()->PostReflowCallback(&mHelper);
    mHelper.mPostedReflowCallback = true;
  }

Then when ScrollFrameHelper::ReflowFinished is called it returns true (meaning "flush layout") if mUpdateScrollbarAttributes is true.

So in this testcase, we keep doing that: the layout flush reflows the scrollframe, which posts the reflow callback, which says to do the layout flush, etc.  This has come up several times before.  See bug 550306, bug 551987, bug 866767 (with a pretty similar, in some ways, testcase and analysis, actually), 

Oh, and ScrollFrameHelper::LayoutScrollbars does:

  mUpdateScrollbarAttributes = true;
  if (!mPostedReflowCallback) {
    aState.PresContext()->PresShell()->PostReflowCallback(this);
    mPostedReflowCallback = true;
  }

so if that gets called we are absolutely guaranteed more relayout.  So the basic problem is that the post-reflow-callback flush is reentering ScrollFrameHelper::LayoutScrollbars, and then we lose.

OK, so why are getting to LayoutScrollbars?  That's because when we enter nsHTMLScrollFrame::Reflow we have aReflowInput.ShouldReflowAllKids() true.  And that's because aReflowInput.IsBResize() is true, and the scrollframe has the NS_FRAME_CONTAINS_RELATIVE_BSIZE bit set, coming from that "top: 0%" on the span.

We have IsBResize() true because we fell into this case in ReflowInput::InitResizeFlags:

  } else if (mCBReflowInput && !nsLayoutUtils::GetAsBlock(mFrame)) {

and did:

    SetBResize(mCBReflowInput->IsBResizeForWM(wm));

which tests true.  That case is the change from bug 1263845.

Note that even if we avoided this case, we would end up in the next case (on my testcase that crashed before bug 1263845), which does:

  } else if (ComputedBSize() == NS_AUTOHEIGHT) {
    if (eCompatibility_NavQuirks == aPresContext->CompatibilityMode() &&
        mCBReflowInput) {
      SetBResize(mCBReflowInput->IsBResizeForWM(wm))

which is just as bad.

Jet, we really need someone to sort out the vertical resize flag story...  Either we need to be a lot more conservative about them, or we need to change scrollframe to stop doing this infinite-reentry business or something.
Blocks: 550306, 551987, 866767
Flags: needinfo?(bzbarsky) → needinfo?(bugs)
(In reply to Boris Zbarsky [:bz] (still digging out from vacation mail) from comment #6)
> Jet, we really need someone to sort out the vertical resize flag story...

Do you mean bug 1351924? (I can still get to that one, if it's a priority and would help here.)
Flags: needinfo?(bzbarsky)
Huh, I'd forgotten bug 1351924.  I suspect it would in fact help here!
Depends on: 1351924
Flags: needinfo?(bzbarsky)
OK - I will circle back to that bug soon, and we'll see if it helps.

[Canceling ni=jet, since he's asked me to follow up on this, which I'll do via bug 1351924.)
Flags: needinfo?(bugs)
Priority: -- → P2
See Also: → 1420029
We are seeing this bug hundreds of times a day while fuzzing.

Dan, do you know who might be able to look at this?
Flags: needinfo?(dholbert)
Whiteboard: [fuzzblocker]
I don't know - sorry for not having followed up on this yet.  Personally, I won't be able to get to it for at least another week (I'm at a layout-devtools planning thing in Toronto for the next week, which will likely keep me busy).

I can try to take a look at it after I'm back. Alternately, Mats might be a good person to take a look, too, though I don't know what he's got on his plate.

(In the meantime, I'm hoping/assuming that the automated fuzzers that are hitting this can detect and skip past this crash automatically, so that you're not having to manually intervene hundreds of times per day. Though granted, it's still bad because it means we can't thoroughly fuzz certain codepaths because they're blocked by this crash, of course.)
See Also: → 1397795
(still haven't gotten to this, but it sounds like the best way forward is to have somebody fix bug 1351924; I just posted a needinfo there to see if we can find an owner who's got cycles to look at that.)
Flags: needinfo?(dholbert)
Duplicate of this bug: 1454201
See Also: → 1461124
Blocks: 1461124
See Also: 1461124
See Also: → 1502872
There are a few "see-also" bugs here, can you take another look? 
Also, in comment 12, I'm not sure which bug you're referencing.
Flags: needinfo?(dholbert)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #14)
> There are a few "see-also" bugs here, can you take another look?
> Also, in comment 12, I'm not sure which bug you're referencing.

Yup, left out the bug #, sorry -- I meant to say "bug 1351924" there.  We have some diagnosis in comment 6 and comment 8 which suggests that fixing bug 1351924 would help.

For now, it sounds like the best way forward here is to fix bug 1351924.
Flags: needinfo?(dholbert)
See Also: → 1510333
This crash seems to be hit frequently while fuzzing on Android. It would be nice to get this fixed and out of the way of the fuzzers.
Attached file testcase.html
Simplified testcase
Attachment #8912817 - Attachment is obsolete: true
Summary: stack overflow during reflow → stack overflow during reflow, in ProcessReflowCommands
Summary: stack overflow during reflow, in ProcessReflowCommands → stack overflow during reflow, with infinite recursion in ProcessReflowCommands->DidDoReflow->DoFlushPendingNotifications

Going to try to get some movement on bug 1351924 soon, which as mentioned in comment 15 should help fix this. That said, we won't have any fix for that bug in time for 65.

Crash Signature: [@ _chkstk | nsLineBreaker::AppendText ] → [@ _chkstk | nsLineBreaker::AppendText ] [@ nsLineBreaker::AppendText ] [@ nsTArray_Impl<T>::AppendElements<T> | nsLineBreaker::AppendText ]
Duplicate of this bug: 1532084

Sean, checking in to see if there are further updates on this since comment 18 and if we could possibly have a fix to uplift to 67 beta.

Flags: needinfo?(svoisen)

No updates yet. It's still on our backlog. We haven't had the resources to work on it.

Flags: needinfo?(svoisen)
Duplicate of this bug: 1543140
Blocks: 1543140

attachment 9032854 [details] no longer seems to show a crash; I'll check some of the other testcases to look for one that does.

I have multiple test cases. I will reduce a new once and attach it.

This test case was reduced with m-c:
BuildID=20190710154620
SourceStamp=241af4dbb96483e0b9371681d2f19e4f28e5d6ed

That testcase still crashes with the patches to bug 1351924.

I think this really requires more directly addressing the situation described in comment 6; we shouldn't hope that we have enough optimizations to keep us out of it.

Marking this fix-optional for 69/70 to remove it from regression triage.

Flags: needinfo?(svoisen)
Whiteboard: [fuzzblocker] → [fuzzblocker][layout:triage-discuss]

Tyson, do you have other test cases which are still able to reproduce the crash? The test case in comment 25 doesn't crash on my linux box with nightly.

Flags: needinfo?(twsmith)
Duplicate of this bug: 1593721
Attached file testcase.html

This test case requires the following prefs:
layout.accessiblecaret.enabled=true
layout.css.font-variations.enabled=true

Bug 1593721 also has a new test case.

Flags: needinfo?(twsmith)

A Pernosco session can be found here: https://pernos.co/debug/-lpMrnJkrWSEqcpiOjyS4g/index.html
The session will expire in 7 days.

See Also: → 1594332
Attached file testcase_noprefs.html

Another testcase that does not require any extra prefs.

:hiro, Let me know if you have any issues with the new test cases or if you would like a Pernosco session for the latest test case.

Flags: needinfo?(hikezoe.birchill)
Duplicate of this bug: 1594332
Flags: needinfo?(emilio)
Whiteboard: [fuzzblocker][layout:triage-discuss] → [fuzzblocker][layout:triage-discuss][tbird crash]
You need to log in before you can comment on or make changes to this bug.