Closed Bug 1462497 Opened 6 years ago Closed 6 years ago

Crash in MergeState::HasMatchingItemInOldList

Categories

(Core :: Web Painting, defect)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox60 --- unaffected
firefox61 + fixed
firefox62 + fixed

People

(Reporter: marcia, Assigned: mattwoodrow)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(4 files)

This bug was filed from the Socorro interface and is
report bp-83823177-b9b4-4f4a-bceb-efa830180429.
=============================================================

Seen while looking at nightly crash stats: https://bit.ly/2rOH4cb. Crashes go back to when nightly was in 61. 21 crashes/23 installs in 62 so far. All 62 crashes have crash reason: EXCEPTION_BREAKPOINT 

Top 10 frames of crashing thread:

0 xul.dll MergeState::HasMatchingItemInOldList layout/painting/RetainedDisplayListBuilder.cpp:322
1 xul.dll MergeState::ProcessItemFromNewList layout/painting/RetainedDisplayListBuilder.cpp:272
2 xul.dll RetainedDisplayListBuilder::MergeDisplayLists layout/painting/RetainedDisplayListBuilder.cpp:487
3 xul.dll RetainedDisplayListBuilder::AttemptPartialUpdate layout/painting/RetainedDisplayListBuilder.cpp:1097
4 xul.dll nsLayoutUtils::PaintFrame layout/base/nsLayoutUtils.cpp:3805
5 xul.dll mozilla::PresShell::Paint layout/base/PresShell.cpp:6330
6 xul.dll nsViewManager::ProcessPendingUpdatesPaint view/nsViewManager.cpp:480
7 xul.dll nsViewManager::ProcessPendingUpdatesForView view/nsViewManager.cpp:412
8 xul.dll nsViewManager::ProcessPendingUpdates view/nsViewManager.cpp:1102
9 xul.dll nsRefreshDriver::Tick layout/base/nsRefreshDriver.cpp:2053

=============================================================
Had a look at the reports:

2 - nsDisplayBackgroundImage - bug 1462477 - 4 crashes
27 - nsDisplayCompositorHitTestInfo - no bug - 2 crashes
54 - nsDisplayFilter - no bug - 5 crashes
72 - nsDisplayWrapList - no bug - 18 crashes

Those are the item type ids from the crash, and which item they correspond to.

Assuming the crash is similar to bug 1462477, the interesting piece is the wrapping item that was there in one paint and not in the next (nsDisplayFixedPosition).

I'm uploading a patch to print out that type too.

Had a look at the URLs, lots of youtube. Couldn't figure out any STR though.
Assignee: nobody → matt.woodrow
Comment on attachment 8976813 [details]
Bug 1462497 - Part 2: Invalidate the new caret frame in subdocuments, since it will result in a new nsDisplayWrapList being created.

https://reviewboard.mozilla.org/r/244904/#review251148

I think you want to combine the patches somehow.
Attachment #8976813 - Flags: review?(mstange) → review+
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/8c6bdb5e72fe
Add assertions to try diagnose which wrapper item went away. r=mstange
https://hg.mozilla.org/mozilla-central/rev/8c6bdb5e72fe
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Probably should have marked this as leave-open before pushing.
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Target Milestone: mozilla62 → ---
I experienced it on youtube:
bp-2d0f8f98-5008-4b8d-b7ce-e2d830180519
"Item found was in the wrong list! type 72 (outer type was 38 at depth 4, now is 72)"
The vast majority (but not quite all) of the crashes are now showing type 38 as either the new or the old container item.

Type 38 is nsDisplayOpacity, and this error means that the value of 'useOpacity' [1] changed between paints, without anything marking the frame as changed.

HasVisualOpacity, and usingSVGEffects are both solely dependent on style data, and we definitely mark the frame as changed whenever the styles get updated.

nsSVGUtils::CanOptimizeOpacity calls into nsLayoutUtils::HasAnimationOfProperty.

nsDisplayOpacity::NeedsActiveLayer calls EffectCompositor::HasAnimationsForCompositor and ActiveLayerTracker::IsStyleAnimated (which then calls nsLayoutUtils::HasEffectiveAnimation).

IsStyleAnimated can definitely change it's result, since it checks the will-change budget, and it's possible something else new in the display list consumed the budget this time where it didn't previously. We should switch to passing nullptr to the builder for this.

Hiro, can the results of these 3 different animations function change their result without marking the frame as changed (nsIFrame::MarkNeedsDisplayItemRebuild, usually via InvalidateFrame or SchedulePaint)?

I think we probably want to make this condition use the 'MayHaveAnimationOfProperty' variants to make sure that it's stable.

[1] https://searchfox.org/mozilla-central/source/layout/generic/nsFrame.cpp#2957
Flags: needinfo?(hikezoe)
In case of transform animations, EffectCompositor::HasAnimationsForCompositor likely changes the result, e.g. when we knew the layer is too big to run the animation on the compositor when we build the display item for the transform.  But that's not the case here?

As for opacity cases, I don't think nsIFrame::MayHaveOpacityAnimation() works well on continuation frames.  (I've filed a bug for auditing it, bug 1459776).  So, it might be a culprit here.

Is there any site that is able to reproduce this crash reliably?  If we know the one, it would be really nice to debug. Youtube doesn't crash on my linux box unfortunately.  Keep ni? to me, I will revisit this later today.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #10)
> In case of transform animations,
> EffectCompositor::HasAnimationsForCompositor likely changes the result, e.g.
> when we knew the layer is too big to run the animation on the compositor
> when we build the display item for the transform.  But that's not the case
> here?
> 
> As for opacity cases, I don't think nsIFrame::MayHaveOpacityAnimation()
> works well on continuation frames.  (I've filed a bug for auditing it, bug
> 1459776).  So, it might be a culprit here.

I've put up patches to skip HasAnimationsForCompositor, and optionally switch to using only MayHaveOpacityAnimation.

Unsure if continuations are the issue here, but I can have a look into that code too!


> 
> Is there any site that is able to reproduce this crash reliably?  If we know
> the one, it would be really nice to debug. Youtube doesn't crash on my linux
> box unfortunately.  Keep ni? to me, I will revisit this later today.

Unfortunately not, but the crash-stats URLs and comment 8 suggest that YouTube can sometimes reproduce it. I haven't been able to do that myself though.

It sounds like it must involve an opacity animation on a frame that is either svg, or has svg effects (filter or mask).
Comment on attachment 8979133 [details]
Bug 1462497 - Part 3: Don't condition useOpacity on values that might change silently between paints (HasAnimationsForCompositor, and the will-change budget).

https://reviewboard.mozilla.org/r/245370/#review251272

This quite makes sense.  (Sorry, in the previous my comment, I missed, in opacity case, nsDisplayOpacity::NeedsActivelayer() changes its result when the frame size is too small.)

::: layout/generic/nsFrame.cpp:2959
(Diff revision 1)
>    // layer (for async animations), see
>    // nsSVGIntegrationsUtils::PaintMaskAndClipPath or
>    // nsSVGIntegrationsUtils::PaintFilter.
>    bool useOpacity = HasVisualOpacity(effectSet) &&
>                      !nsSVGUtils::CanOptimizeOpacity(this) &&
> -                    (!usingSVGEffects || nsDisplayOpacity::NeedsActiveLayer(aBuilder, this));
> +                    (!usingSVGEffects || nsDisplayOpacity::MayNeedActiveLayer(this));

We should add a comment here why we do use MayNeedActiveLayer, I think copying the commit message here is enough?

Also, it's unrelated to this change, |useOpacity| is  ambiguous, we might want to change the name at some point.
Attachment #8979133 - Flags: review?(hikezoe) → review+
Comment on attachment 8979134 [details]
Bug 1462497 - Part 4: Only use the MayHaveAnimation variants when computing useOpacity in case the more accurate version changes results between paints.

https://reviewboard.mozilla.org/r/245372/#review251276

I am not sure this change contributes to fix the crash either, but let's see what happens.
Attachment #8979134 - Flags: review?(hikezoe) → review+
Comment on attachment 8976812 [details]
Bug 1462497 - Part 1: Don't crash if we find an item from the wrong list, and keep looking instead

https://reviewboard.mozilla.org/r/244902/#review251798

LGTM. Are there any high profile sites that trigger this assertion?
Attachment #8976812 - Flags: review?(mikokm) → review+
cf comment #8, I experienced it on youtube
Comment on attachment 8976813 [details]
Bug 1462497 - Part 2: Invalidate the new caret frame in subdocuments, since it will result in a new nsDisplayWrapList being created.

https://reviewboard.mozilla.org/r/244904/#review251846

::: layout/generic/nsSubDocumentFrame.cpp:466
(Diff revision 2)
>        // in the retained display list.
>        if (mPreviousCaret) {
>          aBuilder->MarkFrameModifiedDuringBuilding(mPreviousCaret);
>        }
> +      if (aBuilder->GetCaretFrame()) {
> +        aBuilder->MarkFrameModifiedDuringBuilding(aBuilder->GetCaretFrame());

Why is this needed?

Shouldn't the call to RebuildAllItemsInCurrentSubtree() above also rebuild the caret frame?
(In reply to Miko Mynttinen [:miko] from comment #20)
> Comment on attachment 8976813 [details]
> Bug 1462497 - Part 2: Invalidate the new caret frame in subdocuments, since
> it will result in a new nsDisplayWrapList being created.
> 
> https://reviewboard.mozilla.org/r/244904/#review251846
> 
> ::: layout/generic/nsSubDocumentFrame.cpp:466
> (Diff revision 2)
> >        // in the retained display list.
> >        if (mPreviousCaret) {
> >          aBuilder->MarkFrameModifiedDuringBuilding(mPreviousCaret);
> >        }
> > +      if (aBuilder->GetCaretFrame()) {
> > +        aBuilder->MarkFrameModifiedDuringBuilding(aBuilder->GetCaretFrame());
> 
> Why is this needed?
> 
> Shouldn't the call to RebuildAllItemsInCurrentSubtree() above also rebuild
> the caret frame?

RebuildAllItemsInCurrentSubtree overrides the dirty rect to match the visible rect, and sets mInInvalidSubtree to prevent descendant stacking contexts from overriding it further.

It doesn't do anything to mark items as invalid for the purposes of merging.

In this case the old list looked something like:

nsDisplaySubDocument(root-frame)
  nsDisplayWrapList(color-frame)
    nsDisplayBackroundColor(color-frame)

and the new one is:

nsDisplaySubDocument(root-frame)
  nsDisplayWrapList(caret-frame)
    nsDisplayWrapList(color-frame)
      nsDisplayBackroundColor(color-frame)
    nsDisplayCaret(caret-frame)

Enabling the caret also adds an extra wrap list, and changes the nesting level of the wrap-list/background-color, and merging gets confused.
Comment on attachment 8976813 [details]
Bug 1462497 - Part 2: Invalidate the new caret frame in subdocuments, since it will result in a new nsDisplayWrapList being created.

https://reviewboard.mozilla.org/r/244904/#review251982

Thank you for the explanation, that makes sense.
Attachment #8976813 - Flags: review?(mikokm) → review+
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/965f3e29cc4a
Part 1: Don't crash if we find an item from the wrong list, and keep looking instead r=miko
https://hg.mozilla.org/integration/autoland/rev/1386a8847bbb
Part 2: Invalidate the new caret frame in subdocuments, since it will result in a new nsDisplayWrapList being created. r=miko,mstange
https://hg.mozilla.org/integration/autoland/rev/7fc66c715a0f
Part 3: Don't condition useOpacity on values that might change silently between paints (HasAnimationsForCompositor, and the will-change budget). r=hiro
https://hg.mozilla.org/integration/autoland/rev/39cb4d5f6602
Part 4: Only use the MayHaveAnimation variants when computing useOpacity in case the more accurate version changes results between paints. r=hiro
I don't think it's related to this crash, but I found an issue in the interaction between retained display list stuff and animations.  The issue is a bit related what I was concerned in comment 10, but anyway I've filed bug 1463605 for that, and I hope that bug will fix annoying edge cases.
Flags: needinfo?(hikezoe)
Looking good on crash-stats so far!
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Please request Beta approval on this when you get a chance. Would be great to get it into 61.0b8 tomorrow.
Flags: needinfo?(matt.woodrow)
Comment on attachment 8976812 [details]
Bug 1462497 - Part 1: Don't crash if we find an item from the wrong list, and keep looking instead

Approval Request Comment
[Feature/Bug causing the regression]: Retained-dl
[User impact if declined]: Crashes on some content.
[Is this code covered by automated tests?]: Not really, these are really transient and hard to reproduce reliably.
[Has the fix been verified in Nightly?]: Not completely, but the crash rates have dropped.
[Needs manual test from QE? If yes, steps to reproduce]: I don't think we can easily, other than watching for crashes to stop.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: First patch adds an extra layer of security when doing the list index lookup to ensure that it can't ever be wrong, and should wallpaper over all crashes (with the possibility of subtle/transient rendering issues instead).
The remaining patches address the actual bugs identified, and just make us a bit more conservative in deciding when we can optimize away opacity items, and invalidating more for caret changes.
[String changes made/needed]: None.
Flags: needinfo?(matt.woodrow)
Attachment #8976812 - Flags: approval-mozilla-beta?
Comment on attachment 8976812 [details]
Bug 1462497 - Part 1: Don't crash if we find an item from the wrong list, and keep looking instead

RDL stability fix. Approved for 61.0b8.
Attachment #8976812 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1463842
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/69dca4b4848f
Follow-up to fix code that was accidentally inside a MOZ_DIAGNOSTIC_ASSERT_ENABLED block.
I guess we're hoping that bug 1464641 helps further reduce the crashes with this signature? Otherwise, we should probably file a new bug for them.
Flags: needinfo?(matt.woodrow)
the signature is still present in 61.0b10 devedition builds - this query is/will be listing the moz_crash_reason fields:
https://bit.ly/2JnOLAp
Blocks: 1466010
This is still a high volume on beta 62. It looks like new work is happening now in bug 1466010. Can we call this a duplicate of bug 1466010?
Yeah it was closed as resolved, and bug 1466010 has taken over the tracking. It's clearly not actually resolved though, so switching the resolution to duplicate is fine with me :)
Flags: needinfo?(matt.woodrow)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: