Closed Bug 1471671 Opened 6 years ago Closed 6 years ago

Crash in mozilla::layers::ClipManager::DefineScrollLayers

Categories

(Core :: Graphics: WebRender, defect, P2)

Unspecified
All
defect

Tracking

()

VERIFIED FIXED
mozilla65
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox61 --- unaffected
firefox62 --- disabled
firefox63 --- disabled
firefox64 --- fixed
firefox65 --- verified

People

(Reporter: jan, Assigned: aosmond)

References

(Blocks 1 open bug)

Details

(Keywords: crash, nightly-community, Whiteboard: [gfx-noted])

Crash Data

Attachments

(1 file)

Seen on Socorro. Linux and Mac so far.

bp-9ec7baa6-214b-4eb2-89b2-271460180622 build 20180622100109 Linux
> MOZ_RELEASE_ASSERT(mIsSome)
>  0 	libxul.so 	mozilla::layers::ClipManager::DefineScrollLayers(mozilla::ActiveScrolledRoot const*, nsDisplayItem*, mozilla::layers::StackingContextHelper const&) 	mfbt/Maybe.h:549

Is this something like bug 1467999?
Flags: needinfo?(bugmail)
Maybe. It's odd that there's only 7 crashes, and it looks like 6 of them are from a single user. From the crash stack it's not clear which Maybe<> object is being improperly dereferenced, but I think [1] is most likely. And that might be the result of the user having flipped some random prefs and triggering a weird (unsupported) codepath.

Let's wait and see if this keeps happening. I can upgrade the MOZ_ASSERT just above [1] to a MOZ_RELEASE_ASSERT to make it crash earlier and confirm my theory, if needed.

[1] https://searchfox.org/mozilla-central/rev/14cb8f1c238735ba1abe18ad44e39808983c2572/gfx/layers/wr/ClipManager.cpp#268
Flags: needinfo?(bugmail)
Priority: -- → P3
Whiteboard: [gfx-noted]
Assignee: nobody → bugmail
Priority: P3 → P1
30 crashes on 3 Windows installations.
Crash Signature: [@ mozilla::layers::ClipManager::DefineScrollLayers ] → [@ mozilla::layers::ClipManager::DefineScrollLayers ] [@ class mozilla::Maybe<T> mozilla::layers::ClipManager::DefineScrollLayers ]
We can't release this to the field, but we can let this ride to beta.
Assignee: kats → nobody
Priority: P1 → P2
#2 non-hang crash on the 9-19 Linux Nightlies, with 6 crashes.
Priority: P2 → P3
Priority: P3 → P2
Assignee: nobody → aosmond
Flags: needinfo?(aosmond)
24 reports on beta already with the brief time it has been out, and the highest volume WR one (AFAIK). Probably should stamp this out sooner rather than later.
I believe the actual line given by the crash report is misleading. The only place (that I have found thus far) where we deref a Maybe without checking first is:

https://searchfox.org/mozilla-central/rev/a7f4d3ba4fbfe3efbde832869f1d672fce7122f6/gfx/layers/wr/ClipManager.cpp#295

Which is calculated by:

https://searchfox.org/mozilla-central/rev/a7f4d3ba4fbfe3efbde832869f1d672fce7122f6/layout/generic/nsGfxScrollFrame.cpp#3959

Which can generate Nothing, if our assumptions are failing to hold.
Whoops I should have read comment 2 :).
Flags: needinfo?(aosmond)
Bug 1453367 comment 2 (through to 4 and beyond) seem relevant here.
I think the crash volume is too high to be a result of people flipping that containerful scrollframe pref. But it probably makes sense to eliminate as a possibility, which we can do by adding a MOZ_RELEASE_ASSERT(!gfxPrefs::LayoutUseContainersForRootFrames()) somewhere in the ClipManager constructor or similar.
Also if it were due to that the crashes would all be startup crashes, I think. I haven't checked if that's the case.
The other path this could happen is if mWillBuildScrollableLayer is false in ComputeScrollMetadata. That seems to boil down to the result from GetDisplayPort(Impl).

https://searchfox.org/mozilla-central/rev/a7f4d3ba4fbfe3efbde832869f1d672fce7122f6/layout/base/nsLayoutUtils.cpp#1016

Could we just not have the display port data setup?
I don't know of a scenario in which that would happen, but I suppose it must be happening. The code that does all this is super hairy and complex and I haven't looked at it for over a year so I wouldn't be surprised if stuff is broken in some cases.

If we want to get rid of the crash and replace it with possibly-incorrect behaviour I think we should detect when ComputeScrollMetadata returns Nothing() and treat it the same as we do the !IsScrollable() case a few lines down - i.e. just return the ancestorScrollId (but also add a MOZ_ASSERT(false) there to catch the case in local debug builds). At least then we might get people filing bugs about scrolling not working properly in some cases and we might get reproducible STR.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)
> I don't know of a scenario in which that would happen, but I suppose it must
> be happening. The code that does all this is super hairy and complex and I
> haven't looked at it for over a year so I wouldn't be surprised if stuff is
> broken in some cases.
> 
> If we want to get rid of the crash and replace it with possibly-incorrect
> behaviour I think we should detect when ComputeScrollMetadata returns
> Nothing() and treat it the same as we do the !IsScrollable() case a few
> lines down - i.e. just return the ancestorScrollId (but also add a
> MOZ_ASSERT(false) there to catch the case in local debug builds). At least
> then we might get people filing bugs about scrolling not working properly in
> some cases and we might get reproducible STR.

Sounds good to me. I'll put up a patch. I reviewed the URL data with the crash reports, and there doesn't seem to be any standouts, except maybe YouTube in general, but I couldn't reproduce the crashes on any of the top links for Windows or Linux.

Looking at the non-WR code, it looks like it will just skip handling the ASR without the metadata:

https://searchfox.org/mozilla-central/rev/72b1e834f384a2ffec6eb4ce405fbd4b5e881109/layout/painting/FrameLayerBuilder.cpp#6169-6171

So it is possible returning the ancestor is the right thing to do anyways? :)
We are seeing crash reports in the wild where there is no scroll
metadata available for an ASR for a display item and its clip. It
appears that in the non-WR path, it skips such items, so we should
probably do the same thing with WebRender. If the scrolling ends up
being wrong, hopefully a reproducible use case will make its way to use
to further debug, as the crash reports have not yielded anything to date.
See Also: → 1405814
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d897bc50f75
Avoid crash with WebRender when the scroll metadata is unavailable. r=kats
https://hg.mozilla.org/mozilla-central/rev/6d897bc50f75
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
See Also: → 1502454
No crashes on Nightly since this landed \m/. Please nominate this for Beta uplift when you get a chance.
Status: RESOLVED → VERIFIED
Flags: needinfo?(aosmond)
It morphed into bug 1502454, so I would like to request uplift of them together, once it has landed and been verified.
Comment on attachment 9019813 [details]
Bug 1471671 - Avoid crash with WebRender when the scroll metadata is unavailable.

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: None

User impact if declined: May occasionally crash the content process if using WebRender.

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce: 

List of other uplifts needed: Bug 1471671

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): We were crashing before, now the theoretical worst case would be the scrolling on the page is flaky. Only affects WebRender.

String changes made/needed: N/A
Flags: needinfo?(aosmond)
Attachment #9019813 - Flags: approval-mozilla-beta?
Comment on attachment 9019813 [details]
Bug 1471671 - Avoid crash with WebRender when the scroll metadata is unavailable.

webrender crash fix, approved for 64.0b8
Attachment #9019813 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Andrew Osmond [:aosmond] from comment #22)
> Is this code covered by automated tests?: Yes
> 
> Needs manual test from QE?: No

Marking as qe-verify- per comment 22.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: