Closed Bug 1448841 Opened 3 years ago Closed 3 years ago

Crash in MergeState::ProcessItemFromNewList

Categories

(Core :: Web Painting, defect)

Unspecified
Linux
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla61
Tracking Status
firefox-esr52 --- unaffected
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 blocking verified

People

(Reporter: pascalc, Assigned: mattwoodrow)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(4 files)

This bug was filed from the Socorro interface and is
report bp-bbeeaf61-c7c6-4ec1-9361-f9b0a0180326.
=============================================================

Top 10 frames of crashing thread:

0 libxul.so MergeState::ProcessItemFromNewList layout/painting/RetainedDisplayListBuilder.cpp:267
1 libxul.so RetainedDisplayListBuilder::MergeDisplayLists layout/painting/RetainedDisplayListBuilder.cpp:484
2 libxul.so MergeState::ProcessItemFromNewList layout/painting/RetainedDisplayListBuilder.cpp:266
3 libxul.so RetainedDisplayListBuilder::MergeDisplayLists layout/painting/RetainedDisplayListBuilder.cpp:484
4 libxul.so MergeState::ProcessItemFromNewList layout/painting/RetainedDisplayListBuilder.cpp:266
5 libxul.so RetainedDisplayListBuilder::MergeDisplayLists layout/painting/RetainedDisplayListBuilder.cpp:484
6 libxul.so RetainedDisplayListBuilder::AttemptPartialUpdate layout/painting/RetainedDisplayListBuilder.cpp:1085
7 libxul.so nsLayoutUtils::PaintFrame [clone .cold.1118] 
8 libxul.so mozilla::PresShell::Paint 
9 libxul.so nsViewManager::ProcessPendingUpdates 

=============================================================


I got many crashes today when using TweetDeck. Tweetdeck got updated today (3.7) so I don't know if the changes are triggering crashes in Firefox or of today's update in Nightly are triggering the crashes.
There are 76 crashes (from 29 installations) in nightly 61 with buildid 20180326100107. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1443027.

[1] https://hg.mozilla.org/mozilla-central/rev?node=d9f154931d6d
Blocks: clouseau, 1443027
Component: Untriaged → Layout: Web Painting
Flags: needinfo?(matt.woodrow)
Keywords: regression
Product: Firefox → Core
Duplicate of this bug: 1448829
Reproduced similar crashes with the hangout/gtalk feature on Gmail. Reproduce 100% of the time when trying to type a message.
(In reply to Clément Lefèvre from comment #3)
> Reproduce 100% of the time when trying to type a message [in Gmail hangout/gtalk/chat].

For me even opening the chat box crashes the tab; no typing necessary.  (Automatic text field focus event in there may be related though.)
Bug 1443027 was backed out and new Nightly builds are currently running. Should be available within a few hours.
Assignee: nobody → matt.woodrow
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(matt.woodrow)
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Duplicate of this bug: 1448851
Reopening this since it's an existing issue, bug 1443027 just made us crash instead of having incorrect APZ hit testing.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This crash happens when we end up with a cycle in our merged DAG.

The problem in this case is both scrollbars, and nsDisplayCompositorHitTestInfo in nsGfxScrollFrame get their z-index set based on the highest z-index display item created by the scrollframe. When doing partial builds, we can end up getting different results to a full build, and we change the z-index (and thus the order of the display list) without issuing any invalidations.

It seems that we really do want the z-index based on the full list, not the partial list, so we can't easily know the answer during partial building.

I tried writing code to mark the items as needing moving, and then finding their correct position (on top of all content from the given scrollframe) after we've finished moving and have the full list.

The current code puts the items into the Content() list if no z-index items were found, and into the PositionedDescendants() list if there were (along with a z-index value to make sure it's at the top).

Pseudo-stacking contexts can mean that these two source lists can end up in a different final display list, so we really need to be in PositionedDescendants (guaranteed to be higher) to do the post-merge sorting. The only other option is to figure out a way to do post-merge sorting across multiple lists, but that's painful.

We then run into a problem with nsTextAreaFrame, as it combines all source lists into the Content() list. Scrollbars from an nsTextAreaFrame will end up in the Content() list (regardless of where we put them), which will be behind scrollbars from an earlier frame in the PositionedDescendants() list.

layout/forms/test/test_bug1301290.html is a test for this exact example not happening.

It's somewhat fragile, since it relies on the earlier frames contributing their scrollbars to Content(), which only happens if they don't have any other PositionedDescendants.

Does anyone know why text area contributes to the Content list instead of just being a real stacking context and contributing to PositionedDescendants? Moving to that list would solve this issue.

Another option would be to add a new source list type for scrollbars (that goes at the very top, always), and make sure nsTextAreaFrame isn't including that into the combined lists. This would be effectively the same as using INT32_MAX as the z-index for all other cases, so would include a behaviour change.

The only other thing I can think of would be to just accept the regression on text area scrollbar positioning.
Flags: needinfo?(tnikkel)
Comment on attachment 8963518 [details]
Bug 1448841 - Part 1: Add crashtest for CompositorHitTestInfo changing z-index without an invalidation.

https://reviewboard.mozilla.org/r/232450/#review238068
Attachment #8963518 - Flags: review?(mstange) → review+
Comment on attachment 8963519 [details]
Bug 1448841 - Part 2: Only compute the highest z-index for overlay scrollbars.

https://reviewboard.mozilla.org/r/232452/#review238070
Attachment #8963519 - Flags: review?(mstange) → review+
Comment on attachment 8963520 [details]
Bug 1448841 - Part 3: Disable partial display list building when we have overlay scrollbars.

https://reviewboard.mozilla.org/r/232454/#review238072
Attachment #8963520 - Flags: review?(mstange) → review+
Comment on attachment 8963521 [details]
Bug 1448841 - Part 4: Always put the CompositorHitTestInfo for inactive scrollports at the very top.

https://reviewboard.mozilla.org/r/232456/#review238074
Attachment #8963521 - Flags: review?(mstange) → review+
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b895a035f5e3
Part 1: Add crashtest for CompositorHitTestInfo changing z-index without an invalidation. r=mstange
https://hg.mozilla.org/integration/autoland/rev/f49364d5ebad
Part 2: Only compute the highest z-index for overlay scrollbars. r=mstange
https://hg.mozilla.org/integration/autoland/rev/130a8f287efe
Part 3: Disable partial display list building when we have overlay scrollbars. r=mstange
https://hg.mozilla.org/integration/autoland/rev/fba4de2167c2
Part 4: Always put the CompositorHitTestInfo for inactive scrollports at the very top. r=mstange
Talked to Matt on irc.
Flags: needinfo?(tnikkel)
The signature reappeared on nightly 61 with buildid 20180330100123 and probably a consequence of patch for bug 1443027.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Duplicate of this bug: 1450278
Depends on: 1450360
Depends on: 1450407
Depends on: 1450520
No longer depends on: 1450407
This is reliably triggering for me when trying to accept a calendar invite in gmail. https://crash-stats.mozilla.com/report/index/d7589458-af0a-45fd-8a75-1f0090180403#tab-details
There appears to be a small spike for Mac users in the 4-3 build: https://bit.ly/2q31exH
They're probably related to this patch: https://hg.mozilla.org/mozilla-central/rev?node=1e3dc6112e76
Flags: needinfo?(matt.woodrow)
I've hit this a couple of times.
Bug 1443027 has been backed out and new OS X nightlies requested.
This regressed again, because the critical fix piece got accidentally lost during a merge. More details in bug 1450360.
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Flags: needinfo?(matt.woodrow)
Resolution: --- → FIXED
Duplicate of this bug: 1451197
What buildid should have the fix here?  I just got the crash in 20180404100127.

https://crash-stats.mozilla.com/report/index/4027260b-4ed1-4b33-a005-8b11f0180404#tab-details
Flags: needinfo?(matt.woodrow)
(In reply to Ben Kelly [Part-time, not reviewing][:bkelly] from comment #30)
> What buildid should have the fix here?  I just got the crash in
> 20180404100127.
> 
> https://crash-stats.mozilla.com/report/index/4027260b-4ed1-4b33-a005-
> 8b11f0180404#tab-details

That actually looks like a different crash. Any ideas on STR?
Flags: needinfo?(matt.woodrow)
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.