Closed Bug 1280428 Opened 3 years ago Closed 3 years ago

Crash in nsChildView::DispatchAPZWheelInputEvent

Categories

(Core :: Widget: Cocoa, defect, P3, critical)

47 Branch
Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- fixed
firefox53 --- fixed

People

(Reporter: marcia, Assigned: kats)

Details

(Keywords: crash, regressionwindow-wanted, Whiteboard: tpi:+)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-23ac059c-ffb9-49e6-a498-8712f2160616.
=============================================================

Seen while looking at Mac crash stats - https://crash-stats.mozilla.com/report/list?signature=nsChildView::DispatchAPZWheelInputEvent. Small volume Mac crash which affects multiple branches.

Comments talk about crashing when selecting dropdowns and popup windows - one specific comment "drop-down menu in popup window"
Priority: -- → P1
Whiteboard: tpi:+
I tried to test this issue on Mac OS X 10.10 with FF Nightly 50.0a1 (2016-07-18) and I can't reproduce any crash. I have tried to test with http://www.popuptest.com/
Priority: P1 → P3
Crash volume for signature 'nsChildView::DispatchAPZWheelInputEvent':
 - nightly (version 51): 3 crashes from 2016-08-01.
 - aurora  (version 50): 21 crashes from 2016-08-01.
 - beta    (version 49): 60 crashes from 2016-08-02.
 - release (version 48): 309 crashes from 2016-07-25.
 - esr     (version 45): 0 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       2       1
 - aurora        8       6       2
 - beta         28      15       6
 - release      94      78      23
 - esr           0       0       0

Affected platform: Mac OS X

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly
 - aurora  #133
 - beta    #648
 - release #159
 - esr
I just ran into this when using the "forget" button while using fx50.0a2. It's definitely not easy to reproduce, and it might take 30-40 attempts, but I managed to reproduce the crash three different times using the following STR:

* installed the latest version of m-a (fx50.0a2, buildId: 20160913004005, changeset: 696a981b6d53)
* once installed, moved the forget button from the hamburger menu into the toolbar so it's easier to access
* now quickly, click on the forget button, make sure you drag the mouse cursor over the three possible selections and quickly click on "Forget!"

Crashes:

* https://crash-stats.mozilla.com/report/index/c2d578d1-d19d-4fad-b609-827a42160913
* https://crash-stats.mozilla.com/report/index/c2d578d1-d19d-4fad-b609-827a42160913
* https://crash-stats.mozilla.com/report/index/c7581037-00e7-4607-ae93-8417d2160913
Crash volume for signature 'nsChildView::DispatchAPZWheelInputEvent':
 - nightly (version 52): 4 crashes from 2016-09-19.
 - aurora  (version 51): 13 crashes from 2016-09-19.
 - beta    (version 50): 0 crashes from 2016-09-20.
 - release (version 49): 656 crashes from 2016-09-05.
 - esr     (version 45): 0 crashes from 2016-07-25.

Crash volume on the last weeks (Week N is from 10-17 to 10-23):
            W. N-1  W. N-2  W. N-3  W. N-4
 - nightly       1       1       1       0
 - aurora        7       3       1       0
 - beta          0       0       0       0
 - release     212     195     151      31
 - esr           0       0       0       0

Affected platform: Mac OS X

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly #410
 - aurora  #240
 - beta
 - release #265
 - esr
Looks like the most useful stack frame got optimized away, so it's not easy to tell from the crash stack what the bad dereference is. :/
Given that the topmost stackframe is in operator bool() of a RefPtr, the problem is likely that |this| is null either at [1] or [2]. It seems less likely to be null at [1] because then you'd expect crashes going into the function. I wonder if we need to hold a kungFuDeathGrip to |this| inside the function, because the swipeTracker call at [3] can run basically arbitrary code and the window might get destroyed during it.

[1] https://hg.mozilla.org/releases/mozilla-release/annotate/b0310cb90fd0/widget/cocoa/nsChildView.mm#l2755
[2] https://hg.mozilla.org/releases/mozilla-release/annotate/b0310cb90fd0/widget/cocoa/nsChildView.mm#l2767
[3] https://hg.mozilla.org/releases/mozilla-release/annotate/b0310cb90fd0/widget/cocoa/nsChildView.mm#l2759
Comment on attachment 8823444 [details]
Bug 1280428 - Hold a strong pointer to the nsChildView while dispatching events (speculative crash fix).

https://reviewboard.mozilla.org/r/101968/#review102340

I think by this point it's already too late; `this` is already null. I think we need to hold a strong reference to mGeckoChild in -[ChildView scrollWheel:] instead, or alternatively re-check mGeckoChild after each event dispatch in that function.
Attachment #8823444 - Flags: review?(mstange)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> Given that the topmost stackframe is in operator bool() of a RefPtr, the
> problem is likely that |this| is null either at [1] or [2].

In bp-23ac059c-ffb9-49e6-a498-8712f2160616, the crash is happening at [1].

I found this out by downloading the symbol information of the relevant XUL library from http://symbols.mozilla.org/XUL/BF1E4087BA9F3DC9A552019B311627BE0/XUL.sym and finding the module_offset "0x1bffbba" in the crash report's raw dump. The relevant part from the symbol dump is:

> FUNC 1bffba0 b3a 0 nsChildView::DispatchAPZWheelInputEvent(mozilla::InputData&, bool)
> 1bffba0 1a 2754 12077
> 1bffbba c 289 12060
> 1bffbc6 6 2755 12077
> 1bffbcc 8 2759 12077
> 1bffbd4 16 2761 12077
> 1bffbea 13 568 8311
> 1bffbfd a 569 8311
> 1bffc07 20 88 8390
> 1bffc27 a 92 8390

1bffbba is the second line, and the lines around it are at nsChildView.mm:2754 and nsChildView.mm:2755. (There's a file table in that symbol dump which maps the file identifier 12077 to nsChildView.mm.) And https://hg.mozilla.org/releases/mozilla-release/annotate/b0310cb90fd0/widget/cocoa/nsChildView.mm#l2755 is the mSwipeTracker check.
Oh wow, thanks (both for doing that, and for showing how)! I'll update the patch. I also think crash-stats should automate at least some of what you had to do manually above, I've filed bug 1328585 for that.
Comment on attachment 8823444 [details]
Bug 1280428 - Hold a strong pointer to the nsChildView while dispatching events (speculative crash fix).

https://reviewboard.mozilla.org/r/101968/#review102522

r+ with that additional null check

::: widget/cocoa/nsChildView.mm:4882
(Diff revision 2)
>      [self sendWheelCondition:NO
>                         first:eWheelOperationStart
>                        second:eWheelOperationEnd
>                      forEvent:theEvent];
>    }
>  

I think we need another mGeckoChild null check here. We access the nsChildView through mGeckoChild further down, which can be nulled out during sendWheelCondition even if the widget survives through that refptr. (sendWheelCondition -> ... -> nsChildView::Destroy -> -[ChildView widgetDestroyed])
Attachment #8823444 - Flags: review?(mstange) → review+
After discussion with markus on IRC, I added the extra null check, moved the deathgrip down to be after that second null check, and dereferenced the deathgrip instead of mGeckoChild in the rest of the function. I got r+ for the changes on IRC.
Assignee: nobody → bugmail
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/85d94636124e
Hold a strong pointer to the nsChildView while dispatching events (speculative crash fix). r=mstange
https://hg.mozilla.org/mozilla-central/rev/85d94636124e
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8823444 [details]
Bug 1280428 - Hold a strong pointer to the nsChildView while dispatching events (speculative crash fix).

Approval Request Comment
[Feature/Bug causing the regression]: unknown
[User impact if declined]: low-volume crash
[Is this code covered by automated tests?]: the code in question is exercised by existing tests
[Has the fix been verified in Nightly?]: no, because we don't have concrete STR for the crash. fix is speculative. crash-stats volume on nightly is too low to confirm the fix, but there haven't been any crashes since this landed.
[Needs manual test from QE? If yes, steps to reproduce]: no STR
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: small change to use a pattern that is widely used in our code
[String changes made/needed]: none
Attachment #8823444 - Flags: approval-mozilla-aurora?
Comment on attachment 8823444 [details]
Bug 1280428 - Hold a strong pointer to the nsChildView while dispatching events (speculative crash fix).

speculative fix for mac crash, aurora52+
Attachment #8823444 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.