Crash in [@ mozilla::layers::APZCTreeManager::NotifyScrollbarDragRejected ] when you drag the scrollbar on a busy SVG with high zoom level

VERIFIED FIXED in Firefox -esr60

Status

()

defect
P3
critical
VERIFIED FIXED
6 months ago
3 months ago

People

(Reporter: mayankleoboy1, Assigned: kats)

Tracking

({crash})

Trunk
mozilla66
Points:
---

Firefox Tracking Flags

(firefox-esr60 fixed, firefox64 wontfix, firefox65 fixed, firefox66 fixed)

Details

(crash signature, )

Attachments

(1 attachment)

Reporter

Description

6 months ago
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0

Steps to reproduce:

1. Enable WR
2. Open some large SVG like https://static.mozilla.com/moco/en-US/images/mozilla_eoy_2013_EN.svg
3. Zoom to a ridiculous level like 2000%
4. Rapidly scroll up and down with the scrollbar for some time. CPU usage becomes 100% because WR+SVG
5. After the browser goes into the 100% CPU mode, click on the page. This should make the page white/not responding
6. Try to forcibly click on the scrollbars and drag them


Actual results:

browser crashes with the signature


Expected results:

maybe not so.
Reporter

Updated

6 months ago
Severity: normal → critical
Crash Signature: [@ mozilla::layers::APZCTreeManager::NotifyScrollbarDragRejected ]
Keywords: crash
Reporter

Updated

6 months ago
Component: General → Graphics: WebRender
Summary: Crash in [@ mozilla::layers::APZCTreeManager::NotifyScrollbarDragRejected ] when WR is enabled and you try to scroll on a busy page → Crash in [@ mozilla::layers::APZCTreeManager::NotifyScrollbarDragRejected ] when WR is enabled and you drag the scrollbar on a busy SVG
Thanks! The majority of these crashes seem to be with WR off so I'll move this to the APZ component.
Component: Graphics: WebRender → Panning and Zooming
Both call sites of APZCTreeManager::StartScrollbarDrag make sure they are passing in valid / up to date layers-id [1] [2]. However, both of them then dispatch the call to the controller thread rather than making the call directly. I guess, while the runnable is running on the compositor thread, something else can cause the layer tree to go away, leading to the layers-id being stale when the runnable is invoked?

Does that mean we should just gracefully accept (do nothing) if we can't find the GeckoController for the layers-id? We MOZ_ASSERT the GeckoController in many places, should this apply to more / all of them?

[1] https://searchfox.org/mozilla-central/rev/adec563403271e78d1a057259b3e17fe557dfd91/gfx/layers/ipc/APZCTreeManagerParent.cpp#134
[2] https://searchfox.org/mozilla-central/rev/adec563403271e78d1a057259b3e17fe557dfd91/widget/nsBaseWidget.cpp#1822
(In reply to Botond Ballo [:botond] from comment #3)
> I guess, while the runnable is running on the compositor thread,

That should be "while the runnable is waiting to run on the controller thread".
The thing that puzzled me about this one was the STR don't sound like they should destroy a layers id. The tab/window is open the whole time and it's just dragging around the scrollbar. So I wanted to try and reproduce it and see why the GeckoController was going away.
I just tried reproducing it and couldn't. I also don't know how you got it to zoom to 2000% but I don't think that's normally supported.

Given the crash stats are showing this crash happens in the wild it might be worth just applying a null-check bandaid here.
Priority: -- → P3
Reporter

Comment 7

6 months ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> I just tried reproducing it and couldn't. I also don't know how you got it
> to zoom to 2000% but I don't think that's normally supported.


I did CTRL+Mouse scroll on Windows.

I also reproduced it (but not 100%) on a large page full of emojis.
Even with ctrl+mousewheel I can only scroll to 300% max. I'll write the bandaid patch.
Assignee: nobody → kats
I mean "zoom to 300% max"

Comment 11

5 months ago
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9f971d2709ce
Apply a band-aid null check for crashes happening in the wild. r=botond
This would be a good candidate to uplift. According to crash-stats the crash has been happening on a number of releases and the band-aid patch is trivial. Once it's baked a bit on Nightly we should uplift.
Has Regression Range: --- → no
Has STR: --- → no
OS: Unspecified → All
Hardware: Unspecified → All

Comment 13

5 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9f971d2709ce
Status: NEW → RESOLVED
Last Resolved: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Reporter

Comment 14

5 months ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #8)
> Even with ctrl+mousewheel I can only scroll to 300% max. I'll write the
> bandaid patch.

you have to modify this pref to a large value: 
zoom.maxPercent
That's what I mean by "not supported"
Seems odd to me that we're seeing that many crashes on ESR60 if it requires non-default settings to reproduce, but meh. Agreed that it looks safe-enough to uplift.
Summary: Crash in [@ mozilla::layers::APZCTreeManager::NotifyScrollbarDragRejected ] when WR is enabled and you drag the scrollbar on a busy SVG → Crash in [@ mozilla::layers::APZCTreeManager::NotifyScrollbarDragRejected ] when you drag the scrollbar on a busy SVG with high zoom level
Comment on attachment 9031584 [details]
Bug 1511701 - Apply a band-aid null check for crashes happening in the wild. r?botond

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: unknown

User impact if declined: Crashes while dragging scrollbar

Is this code covered by automated tests?: Yes

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: Yes

If yes, steps to reproduce: No STR for the crash, but would be good to generally exercise scrollbar dragging, particular while trying to switch tabs at the same time

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): Simple null-check bandaid, so very little risk

String changes made/needed: none
Attachment #9031584 - Flags: approval-mozilla-beta?
Flags: qe-verify+
Comment on attachment 9031584 [details]
Bug 1511701 - Apply a band-aid null check for crashes happening in the wild. r?botond

[Triage Comment]
Adds a null check to wallpaper over a crash. Approved for 65.0b5.
Attachment #9031584 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I didn't manage to reproduce the crash.

On Windows 10x64 and Ubuntu 16.04 x64 I've noticed that on the latest Nightly 66.0a1 the scroll is done much smoother than on the affected version and the CPU is about half (40% - affected version and 20% - latest Nightly.)

On the other hand, on macOS 10.13 I get the same behaviour on the latest Nightly like I get from the affected Nightly (2018-12-02). As soon as I touch the scroll bar the whole browser freezes and I get a 300% CPU. 

I am not sure how am I supposed to verify on beta, because the pref. gfx.webrender.enabled doesn't do anything.
Flags: needinfo?(kats)
(In reply to Oana Botisan from comment #20)
> I didn't manage to reproduce the crash.
> 
> On Windows 10x64 and Ubuntu 16.04 x64 I've noticed that on the latest
> Nightly 66.0a1 the scroll is done much smoother than on the affected version
> and the CPU is about half (40% - affected version and 20% - latest Nightly.)

This is likely due to other changes that landed in the same timeframe, not from this patch specifically.

> I am not sure how am I supposed to verify on beta, because the pref.
> gfx.webrender.enabled doesn't do anything.

You can set MOZ_WEBRENDER=1 as an environment variable to force-enable on beta. But not testing on beta is fine too, I think. The patch is pretty trivial.
Flags: needinfo?(kats)
Considering the fact that I can't reproduce the issue, can you please verify the fix on beta 65.0b5(if possible) and latest Nightly 66.0a1? 
Thank you.
Flags: needinfo?(kats)
I can't either - Mayank, can you check if the patch helped?
Flags: needinfo?(kats) → needinfo?(mayankleoboy1)
Reporter

Comment 24

5 months ago
Couldnt repro when I tried.
Flags: needinfo?(mayankleoboy1)

Hi Kats, since ESR60 is affected, should we consider uplifting to ESR60 repo?

Flags: needinfo?(kats)

Yeah, we could. It's a safe enough patch.

Flags: needinfo?(kats)

Comment on attachment 9031584 [details]
Bug 1511701 - Apply a band-aid null check for crashes happening in the wild. r?botond

[ESR Uplift Approval Request]

If this is not a sec:{high,crit} bug, please state case for ESR consideration: Simple crash fix (null check) for crashes that are happening in the wild per crash-stats

User impact if declined: Crashes

Fix Landed on Version: 66

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): Null check

String or UUID changes made by this patch: none

Attachment #9031584 - Flags: approval-mozilla-esr60?

Comment on attachment 9031584 [details]
Bug 1511701 - Apply a band-aid null check for crashes happening in the wild. r?botond

Trivial null check crash fix, approved for 60.5.0esr.

Attachment #9031584 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+

Considering the fact that I can't seem to reproduce the issue and per comment 24, neither could the reporter, I will take the qe-verify + flag out.

Flags: qe-verify+
Status: RESOLVED → VERIFIED
See Also: → 1525450
You need to log in before you can comment on or make changes to this bug.