Crash in mozilla::layers::ClipManager::DefineScrollLayers

VERIFIED FIXED in Firefox 64

Status

()

defect
P2
critical
VERIFIED FIXED
10 months ago
6 months ago

People

(Reporter: darkspirit, Assigned: aosmond)

Tracking

(Blocks 2 bugs, {crash, nightly-community})

Trunk
mozilla65
Unspecified
All
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox61 unaffected, firefox62 disabled, firefox63 disabled, firefox64 fixed, firefox65 verified)

Details

(Whiteboard: [gfx-noted], crash signature)

Attachments

(1 attachment)

(Reporter)

Description

10 months ago
Seen on Socorro. Linux and Mac so far.

bp-9ec7baa6-214b-4eb2-89b2-271460180622 build 20180622100109 Linux
> MOZ_RELEASE_ASSERT(mIsSome)
(Reporter)

Comment 1

10 months ago
>  0 	libxul.so 	mozilla::layers::ClipManager::DefineScrollLayers(mozilla::ActiveScrolledRoot const*, nsDisplayItem*, mozilla::layers::StackingContextHelper const&) 	mfbt/Maybe.h:549

Is this something like bug 1467999?
Flags: needinfo?(bugmail)
Maybe. It's odd that there's only 7 crashes, and it looks like 6 of them are from a single user. From the crash stack it's not clear which Maybe<> object is being improperly dereferenced, but I think [1] is most likely. And that might be the result of the user having flipped some random prefs and triggering a weird (unsupported) codepath.

Let's wait and see if this keeps happening. I can upgrade the MOZ_ASSERT just above [1] to a MOZ_RELEASE_ASSERT to make it crash earlier and confirm my theory, if needed.

[1] https://searchfox.org/mozilla-central/rev/14cb8f1c238735ba1abe18ad44e39808983c2572/gfx/layers/wr/ClipManager.cpp#268
Flags: needinfo?(bugmail)
Priority: -- → P3
Whiteboard: [gfx-noted]
Assignee: nobody → bugmail
Priority: P3 → P1
(Reporter)

Comment 3

9 months ago
30 crashes on 3 Windows installations.
Crash Signature: [@ mozilla::layers::ClipManager::DefineScrollLayers ] → [@ mozilla::layers::ClipManager::DefineScrollLayers ] [@ class mozilla::Maybe<T> mozilla::layers::ClipManager::DefineScrollLayers ]
We can't release this to the field, but we can let this ride to beta.
Assignee: kats → nobody
Priority: P1 → P2
#2 non-hang crash on the 9-19 Linux Nightlies, with 6 crashes.
Priority: P2 → P3
Priority: P3 → P2
Comment hidden (offtopic)
Comment hidden (offtopic)
(Assignee)

Updated

6 months ago
Assignee: nobody → aosmond
Flags: needinfo?(aosmond)
(Assignee)

Comment 8

6 months ago
24 reports on beta already with the brief time it has been out, and the highest volume WR one (AFAIK). Probably should stamp this out sooner rather than later.
(Assignee)

Comment 9

6 months ago
I believe the actual line given by the crash report is misleading. The only place (that I have found thus far) where we deref a Maybe without checking first is:

https://searchfox.org/mozilla-central/rev/a7f4d3ba4fbfe3efbde832869f1d672fce7122f6/gfx/layers/wr/ClipManager.cpp#295

Which is calculated by:

https://searchfox.org/mozilla-central/rev/a7f4d3ba4fbfe3efbde832869f1d672fce7122f6/layout/generic/nsGfxScrollFrame.cpp#3959

Which can generate Nothing, if our assumptions are failing to hold.
(Assignee)

Comment 10

6 months ago
Whoops I should have read comment 2 :).
Flags: needinfo?(aosmond)
(Assignee)

Comment 11

6 months ago
Bug 1453367 comment 2 (through to 4 and beyond) seem relevant here.
I think the crash volume is too high to be a result of people flipping that containerful scrollframe pref. But it probably makes sense to eliminate as a possibility, which we can do by adding a MOZ_RELEASE_ASSERT(!gfxPrefs::LayoutUseContainersForRootFrames()) somewhere in the ClipManager constructor or similar.
Also if it were due to that the crashes would all be startup crashes, I think. I haven't checked if that's the case.
(Assignee)

Comment 14

6 months ago
The other path this could happen is if mWillBuildScrollableLayer is false in ComputeScrollMetadata. That seems to boil down to the result from GetDisplayPort(Impl).

https://searchfox.org/mozilla-central/rev/a7f4d3ba4fbfe3efbde832869f1d672fce7122f6/layout/base/nsLayoutUtils.cpp#1016

Could we just not have the display port data setup?
I don't know of a scenario in which that would happen, but I suppose it must be happening. The code that does all this is super hairy and complex and I haven't looked at it for over a year so I wouldn't be surprised if stuff is broken in some cases.

If we want to get rid of the crash and replace it with possibly-incorrect behaviour I think we should detect when ComputeScrollMetadata returns Nothing() and treat it the same as we do the !IsScrollable() case a few lines down - i.e. just return the ancestorScrollId (but also add a MOZ_ASSERT(false) there to catch the case in local debug builds). At least then we might get people filing bugs about scrolling not working properly in some cases and we might get reproducible STR.
(Assignee)

Comment 16

6 months ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)
> I don't know of a scenario in which that would happen, but I suppose it must
> be happening. The code that does all this is super hairy and complex and I
> haven't looked at it for over a year so I wouldn't be surprised if stuff is
> broken in some cases.
> 
> If we want to get rid of the crash and replace it with possibly-incorrect
> behaviour I think we should detect when ComputeScrollMetadata returns
> Nothing() and treat it the same as we do the !IsScrollable() case a few
> lines down - i.e. just return the ancestorScrollId (but also add a
> MOZ_ASSERT(false) there to catch the case in local debug builds). At least
> then we might get people filing bugs about scrolling not working properly in
> some cases and we might get reproducible STR.

Sounds good to me. I'll put up a patch. I reviewed the URL data with the crash reports, and there doesn't seem to be any standouts, except maybe YouTube in general, but I couldn't reproduce the crashes on any of the top links for Windows or Linux.

Looking at the non-WR code, it looks like it will just skip handling the ASR without the metadata:

https://searchfox.org/mozilla-central/rev/72b1e834f384a2ffec6eb4ce405fbd4b5e881109/layout/painting/FrameLayerBuilder.cpp#6169-6171

So it is possible returning the ancestor is the right thing to do anyways? :)
(Assignee)

Comment 17

6 months ago
We are seeing crash reports in the wild where there is no scroll
metadata available for an ASR for a display item and its clip. It
appears that in the non-WR path, it skips such items, so we should
probably do the same thing with WebRender. If the scrolling ends up
being wrong, hopefully a reproducible use case will make its way to use
to further debug, as the crash reports have not yielded anything to date.
(Assignee)

Updated

6 months ago
See Also: → 1405814

Comment 18

6 months ago
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d897bc50f75
Avoid crash with WebRender when the scroll metadata is unavailable. r=kats

Comment 19

6 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6d897bc50f75
Status: NEW → RESOLVED
Last Resolved: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
(Assignee)

Updated

6 months ago
See Also: → 1502454
No crashes on Nightly since this landed \m/. Please nominate this for Beta uplift when you get a chance.
Status: RESOLVED → VERIFIED
Flags: needinfo?(aosmond)
(Assignee)

Comment 21

6 months ago
It morphed into bug 1502454, so I would like to request uplift of them together, once it has landed and been verified.
(Assignee)

Comment 22

6 months ago
Comment on attachment 9019813 [details]
Bug 1471671 - Avoid crash with WebRender when the scroll metadata is unavailable.

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: None

User impact if declined: May occasionally crash the content process if using WebRender.

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: Bug 1471671

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): We were crashing before, now the theoretical worst case would be the scrolling on the page is flaky. Only affects WebRender.

String changes made/needed: N/A
Flags: needinfo?(aosmond)
Attachment #9019813 - Flags: approval-mozilla-beta?
Comment on attachment 9019813 [details]
Bug 1471671 - Avoid crash with WebRender when the scroll metadata is unavailable.

webrender crash fix, approved for 64.0b8
Attachment #9019813 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Andrew Osmond [:aosmond] from comment #22)
> Is this code covered by automated tests?: Yes
> 
> Needs manual test from QE?: No

Marking as qe-verify- per comment 22.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.