Closed Bug 1402766 Opened 7 years ago Closed 7 years ago

stylo: crash near null [@ nsInlineFrame::UpdateStyleOfOwnedAnonBoxesForIBSplit]

Categories

(Core :: Layout, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- disabled
firefox57 --- fixed
firefox58 --- fixed

People

(Reporter: tsmith, Assigned: bzbarsky)

References

(Blocks 1 open bug)

Details

(Keywords: crash, csectype-nullptr, testcase)

Attachments

(4 files)

==21743==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000058 (pc 0x7f95458e3288 bp 0x7fff098710b0 sp 0x7fff09871000 T0)
==21743==The signal is caused by a READ memory access.
==21743==Hint: address points to the zero page.
    #0 0x7f95458e3287 in Hdr /src/obj-firefox/dist/include/nsTArray.h:525:32
    #1 0x7f95458e3287 in Elements /src/obj-firefox/dist/include/nsTArray.h:1038
    #2 0x7f95458e3287 in IndexOf<const mozilla::FramePropertyDescriptorUntyped *, mozilla::FrameProperties::PropertyComparator> /src/obj-firefox/dist/include/nsTArray.h:1173
    #3 0x7f95458e3287 in GetInternal /src/layout/base/FrameProperties.h:412
    #4 0x7f95458e3287 in Get<nsContainerFrame> /src/layout/base/FrameProperties.h:234
    #5 0x7f95458e3287 in GetProperty<nsContainerFrame> /src/layout/generic/nsIFrame.h:3508
    #6 0x7f95458e3287 in nsInlineFrame::UpdateStyleOfOwnedAnonBoxesForIBSplit(mozilla::ServoRestyleState&) /src/layout/generic/nsInlineFrame.cpp:1073
    #7 0x7f954580ce03 in nsIFrame::DoUpdateStyleOfOwnedAnonBoxes(mozilla::ServoRestyleState&) /src/layout/generic/nsFrame.cpp:10661:42
    #8 0x7f9545564468 in UpdateStyleOfOwnedAnonBoxes /src/layout/generic/nsIFrame.h:3334:7
    #9 0x7f9545564468 in mozilla::ServoRestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoStyleContext*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) /src/layout/base/ServoRestyleManager.cpp:877
...
see log.txt
Flags: in-testsuite?
Attached file test_case.html
Attached file log.txt
Only reproduces with Stylo enabled. Bisection ends up back on when we first starting building Stylo by default, though.
Has Regression Range: --- → no
Summary: crash near null [@ nsInlineFrame::UpdateStyleOfOwnedAnonBoxesForIBSplit] → stylo: crash near null [@ nsInlineFrame::UpdateStyleOfOwnedAnonBoxesForIBSplit]
Flags: needinfo?(emilio)
Flags: needinfo?(bzbarsky)
Priority: -- → P2
In a debug build, we have:

Assertion failure: nextInline (There is always a trailing inline in an IB split), at /home/bzbarsky/mozilla/vanilla/mozilla/layout/generic/nsInlineFrame.cpp:1068
Assignee: nobody → bzbarsky
Flags: needinfo?(emilio)
Boy, this is fun.  Layout is messing up here.  We get assertions like so:

  ###!!! ASSERTION: frame tree not empty, but caller reported complete status: 'aSubtreeRoot->GetPrevInFlow()', file /home/bzbarsky/mozilla/vanilla/mozilla/layout/base/nsLayoutUtils.cpp, line 7674
  ###!!! ASSERTION: frame tree not empty, but caller reported complete status: 'aSubtreeRoot->GetPrevInFlow()', file /home/bzbarsky/mozilla/vanilla/mozilla/layout/base/nsLayoutUtils.cpp, line 7674
  ###!!! ASSERTION: frame tree not empty, but caller reported complete status: 'start == end || IsInLetterFrame(aSubtreeRoot)', file /home/bzbarsky/mozilla/vanilla/mozilla/layout/base/nsLayoutUtils.cpp, line 7688
  ###!!! ASSERTION: frame tree not empty, but caller reported complete status: 'aSubtreeRoot->GetPrevInFlow()', file /home/bzbarsky/mozilla/vanilla/mozilla/layout/base/nsLayoutUtils.cpp, line 7674
  ###!!! ASSERTION: How did that happen?: 'nextCont', file /home/bzbarsky/mozilla/vanilla/mozilla/layout/painting/nsCSSRendering.cpp, line 321

which all come down to the same thing: we did some layout, then deleted a next-in-flow (nsBlockReflowContext::ReflowBlock calling nsBlockFrame::DeleteNextInFlowChild) that still had kids.  In particular, it had the <hr> frame (which is a block and causes the {ib} split here) and the trailing inline of that ib split.  We killed that stuff, and at least safely dropped the references to it, looks like.  But we ended up with a broken ib split which has an inline, then an empty block, then nothing.  And of course we lost the <hr>!
So to be clear: this crash is happening because we have existing layout code that is buggy and violates an invariant that should hold.  The stylo code is just depending on this invariant, and crashing when it gets violated.

We can certainly insert a null-check in the stylo code to handle the invariant violation.  That said, that invariant violation also pops up in bug 743364 and bug 766868 and probably bug 1144641...  I'll try to figure out what layout is up to.
OK, so on the simplified testcase, which looks like this:

  <style>
    #float { page-break-inside: avoid; float: left }
    span { padding-left: 1px; }
    #columns { columns: 0px; )
  </style>
  <body>
    <div id="columns">
      A <!-- Just need a non-zero-width inline here -->
      <div id="float">
        <span> <!-- Just need an inline with nonzero width -->
          <div></div>
        </span>
      </div>
    </div>
  </body>

we end up columnating the <div id="float">.  It ends up with a first continuation containing the first inline of the ib split and an empty block from the ib split, and a second continuation that contains the continuation of the ib split block (which contains that innermost div's block) and the trailing ib split inline.

Alright, so now we go to reflow that float's first continuation.  We land in ReflowDirtyLines, reflow the inline line and it's complete.  Then we look at the block line, call nsBlockFrame::ReflowBlockFrame, and this condition tests true:

    if ((!aState.mReflowInput.mFlags.mIsTopOfPage || clearedFloats) &&
        availSpace.BSize(wm) < 0) {

because our availSpace.BSize(wm) == -1 and we're not top-of-page (we had the inline line before us).  Then ReflowBlockFrame does:

      *aKeepReflowGoing = false;
      if (ShouldAvoidBreakInside(aState.mReflowInput)) {
        aState.mReflowStatus.SetInlineLineBreakBeforeAndReset();
      } else {
        PushLines(aState, aLine.prev());
        aState.mReflowStatus.SetIncomplete();
      }

In our case we have "page-break-inside: avoid", so we do the SetInlineLineBreakBeforeAndReset() bit but do NOT change aState.mReflowStatus to incomplete.  The docs for SetInlineLineBreakBeforeAndReset() say "Note that other frame completion status isn't expected to matter after calling this method"... but it turns out to matter.  The Reset() bit inside SetInlineLineBreakBeforeAndReset() set mCompletion to FullyComplete, which it was anyway.

Alright, now we return from ReflowBlockFrame to ReflowLine, then to ReflowDirtyLines.  We've set keepGoing to false and aState.mReflowInput.WillReflowAgainForClearance() is false.  Our line has things on it, so we leave the line box in place and break from the loop.  We don't pull anything from our next-in-flow (which is fine in this case) and return back to nsBlockFrame::Reflow. This merges overflow container kid status and pushed float status into aState.mReflowStatus, but it's still FullyComplete.  We do some final bookkeeping, then hand out aState.mReflowStatus, returning to nsBlockReflowContext::ReflowBlock.

Now in ReflowBlock we have this condition:

  if (!aFrameReflowStatus.IsInlineBreakBefore() ||
      (mFrame->GetStateBits() & NS_FRAME_OUT_OF_FLOW)) {

In our case we _do_ have IsInlineBreakBefore() true (because we called SetInlineLineBreakBeforeAndReset()), but we're the float, so NS_FRAME_OUT_OF_FLOW is set.  So we enter this code, and then it does:

    // If frame is complete and has a next-in-flow, we need to delete
    // them now. Do not do this when a break-before is signaled because
    // the frame is going to get reflowed again (and may end up wanting
    // a next-in-flow where it ends up), unless it is an out of flow frame.
    if (aFrameReflowStatus.IsFullyComplete()) {

Our status is complete, so we enter _that_ code and delete our next in flow, together with its descendants.  All the rest of the problems follow.

The code in ReflowBlockFrame that returned early while claiming we were complete dates back to the initial landing of bug 685012 (which implemented "page-break-inside: avoid") back in 2012.  The code in nsBlockReflowContext::ReflowBlock that ignores the break-before status for out-of-flows is ... old.  That "unless it is an out of flow frame" bit in the comment dates back to the fix for bug 145305 in 2002 (patch by karnaze, r=alexsavulov... I had thought we no longer had code like that in the tree).  The rest of that comment and most of the surrounding code is the initial commit of nsBlockReflowContext by kipp back in 1998.

Alright, so for _printing_ maybe this was all OK somehow, because printing doesn't do incremental reflow (at least not back in 2002).  So at the time when karnaze added that code the problem of having break-before on an out-of-flow that had a trailing block with a continuation in the next block couldn't really arise, maybe: that requires doing another reflow after the continuations were created.  Even then, I'd think tables could trigger it...

But in any case, the fix for bug 685012 ended up putting us into this state when "avoid" is used and we have a non-fitting block.

Mats, do you have any thoughts on how these pieces are supposed to interact?  For this specific testcase, we could check, at the point when we make the SetInlineLineBreakBeforeAndReset() call, whether the block has a next-in-flow, and if so mark ourselves incomplete.  But what if this isn't even the first line, and the block involved _is_ complete, but we have a later line that isn't complete?  I really don't understand how avoiding pushing lines in the ShouldAvoidBreakInside() case can be sound in general.
Flags: needinfo?(mats)
Oh, and I think the simplified testcase is pretty minimal: it needs the page-break-inside on a float to triggere the relevant code and the columns and whatnot to trigger pagination...
Priority: P2 → P1
AFAIK, even looking for an incomplete/complete state when the state
is "break-before" has always been bogus.

I haven't looked at the code in detail yet though, so I'm not sure
how exactly the pushing works in that case...  I'll take a look.

(Sorry for delay.  I had some computer issues the last few days.)
OK.  I just filed bug 1405443 to fix the underlying problem.  I'm just going to add a null-check workaround for now.
Flags: needinfo?(mats)
Blocks: 1405461
MozReview-Commit-ID: 3ggJI0qmOJV
Attachment #8914905 - Flags: review?(emilio)
Attachment #8914905 - Flags: review?(emilio) → review+
Pushed by bzbarsky@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/71d02a129beb
Work around layout violating its own invariants and causing stylo code to crash.  r=emilio
Comment on attachment 8914905 [details] [diff] [review]
Work around layout violating its own invariants and causing stylo code to crash

Approval Request Comment
[Feature/Bug causing the regression]: Stylo
[User impact if declined]: Possible null-deref cases in some fairly weird
   situations.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: Just adds a null-check for a situation
   that really shouldn't arise.  If null is detected, we stop the loop,
   because there's nothing else to loop over.
[String changes made/needed]: None.
Attachment #8914905 - Flags: approval-mozilla-beta?
https://hg.mozilla.org/mozilla-central/rev/71d02a129beb
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Comment on attachment 8914905 [details] [diff] [review]
Work around layout violating its own invariants and causing stylo code to crash

Fix a crash, taking it.
Attachment #8914905 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Boris Zbarsky [:bz] (still digging out from vacation mail) from comment #14)
> [Is this code covered by automated tests?]: Yes.
> [Has the fix been verified in Nightly?]: Yes.
> [Needs manual test from QE? If yes, steps to reproduce]: No.

Setting qe-verify- based on Boris's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
See Also: → 1431781
See Also: → 1431232
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: