Closed Bug 1256344 Opened 8 years ago Closed 8 years ago

crash in mozilla::layers::AsyncPanZoomController::OnLongPress

Categories

(Core :: Panning and Zooming, defect)

48 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox48 --- fixed

People

(Reporter: MatsPalmgren_bugz, Assigned: kats)

References

Details

(Keywords: crash, Whiteboard: gfx-noted)

Crash Data

Attachments

(2 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-b9334420-832d-4111-89e8-d77882160313.
=============================================================

This signature appears to have become more frequent in v48.
The stack is similar to bug 1256341 so it might be the same underlying issue.
The volume is low so far (for the past 28 days):

Operating System	Percentage	Number Of Crashes
Windows 10	64.29%	9
Windows 8.1	28.57%	4
Linux	7.14%	1

Product	Version	Percentage	Number Of Crashes
Firefox	48.0a1	78.57%	11
FennecAndroid	48.0a1	7.14%	1
Firefox	46.0a2	7.14%	1
Firefox	47.0a1	7.14%	1

Architecture	Percentage	Number Of Crashes
amd64	50.00%	7
x86	42.86%	6
arm	7.14%	1

Stack:
mozilla::layers::AsyncPanZoomController::OnLongPress(mozilla::TapGestureInput const&)
mozilla::layers::AsyncPanZoomController::HandleGestureEvent(mozilla::InputData const&)
mozilla::layers::GestureEventListener::HandleInputTimeoutLongTap()
MessageLoop::DoDelayedWork(base::TimeTicks*)
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
MessageLoop::RunHandler()
MessageLoop::Run()
nsBaseAppShell::Run()
nsAppShell::Run()
nsAppStartup::Run()
XREMain::XRE_mainRun()


BTW, the Linux crash is bp-a166d953-9f27-414c-ad7a-460a52160313
That stack is slightly different.
Fwiw, this signature is currently at #22 in Firefox 48.0a1 Top Crash list.
(Together with bug 1256341 they would be at #10.)
This comes from:
http://hg.mozilla.org/mozilla-central/diff/2f392da0634f/gfx/layers/apz/src/AsyncPanZoomController.cpp#l1665

CurrentTouchBlock() is null. Kats is a null check appropriate here?
Blocks: 1113386
Flags: needinfo?(bugmail.mozilla)
Whiteboard: gfx-noted
While bug 1113386 is the patch that added the call to CurrentTouchBlock(), it may not be what regressed this, as it landed over a year ago, and we're only hearing about these crashes now. I suspect it's more likely that a more recent patch broke the invariant that CurrentTouchBlock() cannot be null there.
I think the call to CurrentTouchBlock() is definitely wrong, since CurrentTouchBlock() is intended to be called while the active input block is a touch block. In both this bug and bug 1256341, the call happens from a posted runnable at which point the active input block may be something other than the touch block that posted the runnable.

I suspect this problem has existed since this code was added, but is only being exposed on desktop because actually get other kinds of input blocks (in particular, wheel/mouse input) whereas on B2G that never happened. This is probably easy to reproduce if you are interleaving different kinds of input.
Yeah, I think you're right. I thought perhaps we had some logic to cancel the relevant runnables when a new kind of input block is started, but it doesn't look like we do.
Adding a null check in OnLongPress() seems fine - a touch block that's gone away cannot be in a fast fling.

Adding one in GenerateSingleTap() is also safe, but we have a latent bug here:

  - When processing a touch-end, we send an APZStateChange::EndTouch notification, with an
    aWasClick argument that's set to TouchBlockState::mSingleTapOccurred. This is used
    by ActiveElementManager.

  - TouchBlockState::mSingleTapOccurred is set in GenerateSingleTap().

  - The bug is that, in general, GenerateSingleTap() can happen *after* the touch-end is
    processed (if we were waiting for a timeout), so the value of aWasClick being sent
    with the EndTouch notification may not be accurate.

Do we care?
(In reply to Botond Ballo [:botond] from comment #6)
> Adding a null check in OnLongPress() seems fine - a touch block that's gone
> away cannot be in a fast fling.
> 
> Adding one in GenerateSingleTap() is also safe,

I'm not 100% convinced yet. The scenario would be as follows:

a) user kicks off a fling (touch down, bunch of touch moves, touch up with high velocity). All of this is input block #1
b) user does a quick tap (touch down, touch up). This is input block #2 and causes the tap timeout to get scheduled
c) user does a wheel event, input block #3
d) tap timeout fires, with the "current input block" as #3 (i.e. CurrentTouchBlock returns nullptr)

So here input block #2 will have the mDuringFastFling flag set on it, because that is set during input block construction. At step (d) I don't think we want to process it as a tap; instead we want to ignore it because at step (b) the page was flinging fast and the user stopped the fling with the tap.

Or to put it another way, if you omit step (c), we wouldn't generate the single-tap event. So we shouldn't generate it with step (c) either.

> but we have a latent bug
> here:
> 
>   - When processing a touch-end, we send an APZStateChange::EndTouch
> notification, with an
>     aWasClick argument that's set to TouchBlockState::mSingleTapOccurred.
> This is used
>     by ActiveElementManager.
> 
>   - TouchBlockState::mSingleTapOccurred is set in GenerateSingleTap().
> 
>   - The bug is that, in general, GenerateSingleTap() can happen *after* the
> touch-end is
>     processed (if we were waiting for a timeout), so the value of aWasClick
> being sent
>     with the EndTouch notification may not be accurate.
> 
> Do we care?

I think we do care, but maybe in a separate bug. That might actually be of the things that's impacting bug 1227241.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7)
> Or to put it another way, if you omit step (c), we wouldn't generate the
> single-tap event. So we shouldn't generate it with step (c) either.

Oh, good point! I forgot about the exact semantics of mDuringFastFling.

Perhaps we can save the touch block's mDuringFastFling on the GEL (or in the task), and pass it into OnLongPress?
I think we can do one better and skip scheduling the timers entirely if the touch block has mDuringFastFling, because the events they generate are gonna get ignored anyway.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> I think we can do one better and skip scheduling the timers entirely if the
> touch block has mDuringFastFling, because the events they generate are gonna
> get ignored anyway.

I've been trying to reason about whether or not that would work; in particular, whether or not it's OK to omit the state transitions that are triggered when the timers fire.

For example, consider a quick tap followed by a long tap. I think we rely on the max tap timeout expiring and setting the gesture state to NONE to prevent this from being treated as a double-tap.

(An unrelated question about this scenario - it doesn't look like in this scenario the long tap is detected as a long tap. Should it be?)
(In reply to Botond Ballo [:botond] from comment #11)
> I've been trying to reason about whether or not that would work; in
> particular, whether or not it's OK to omit the state transitions that are
> triggered when the timers fire.
> 
> For example, consider a quick tap followed by a long tap. I think we rely on
> the max tap timeout expiring and setting the gesture state to NONE to
> prevent this from being treated as a double-tap.

We can probably do something like keep the state transitions but ditch the events dispatched into APZC. I'll take this bug and see if I can cook up a fix and a gtest.

> (An unrelated question about this scenario - it doesn't look like in this
> scenario the long tap is detected as a long tap. Should it be?)

I'm not sure, that might count as a double tap.
Assignee: nobody → bugmail.mozilla
Flags: needinfo?(bugmail.mozilla)
Blocks: 1244402
Component: Graphics: Layers → Panning and Zooming
I wrote a gtest to exercise this, that was pretty easy. However the fix I tried (passing the IsDuringFastFling into the NewRunnableMethod and throwing away the TapGestureInput if it's true) didn't quite work. The problem is that when we do trigger a long-press, we call InjectNewTouchBlock on the input queue, which tries to copy things over from the current touch block, and if that's null there's still a crash.

Need to figure out if we should keep that old touch block around, or if it's not really valid any more, or what.
Attached patch Part 1 - Gtest (obsolete) — Splinter Review
Here's the gtest anyway.
Attached patch Part 1 - GtestSplinter Review
Attachment #8730864 - Attachment is obsolete: true
Attachment #8731310 - Flags: review?(botond)
Attached patch Part 2 - FixSplinter Review
So I concluded here that if we get a wheel event while the finger is still down, we should abort the long-press because it doesn't really make sense to dispatch that. If the finger is lifted before the wheel event comes in we should not be generating a long-press anyway since we cancel the runnable.
Attachment #8731311 - Flags: review?(botond)
Attachment #8731310 - Flags: review?(botond) → review+
Attachment #8731311 - Flags: review?(botond) → review+
https://hg.mozilla.org/mozilla-central/rev/b0079f81c85b
https://hg.mozilla.org/mozilla-central/rev/33c773637c21
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
For the record I'm able to run into this crash pretty easily while trying to tap/double-tap/drag the scrollbars on the March 16 Windows nightly.
You need to log in before you can comment on or make changes to this bug.