Closed Bug 1009733 (apz-state) Opened 6 years ago Closed 6 years ago

[meta] Audit APZC state changes and interaction with gesture detector and content listeners

Categories

(Core :: Panning and Zooming, defect)

All
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: kats, Assigned: kats)

References

Details

(Keywords: meta)

Attachments

(4 files, 7 obsolete files)

75.02 KB, patch
botond
: review+
Details | Diff | Splinter Review
8.75 KB, patch
kats
: review+
Details | Diff | Splinter Review
8.79 KB, patch
kats
: review+
kats
: checkin-
Details | Diff | Splinter Review
8.59 KB, patch
kats
: review+
Details | Diff | Splinter Review
While investigating bug 995475, I discovered that our state machine in APZC is more broken than I previously thought. For instance, it's easy to leave the APZC in state TOUCHING even though no fingers are on the screen, just by doing a tap. While working on bug 988494 I also ran into extensive problems with our state management. This really needs an overhaul, so filing this bug to implement the overhaul, which should fix both bug 988494 and bug 995475.
Assignee: nobody → bugmail.mozilla
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #0)
> For instance, it's easy to leave
> the APZC in state TOUCHING even though no fingers are on the screen, just by
> doing a tap.

This symptom is consistent with APZC::OnTouchEnd() not being called after a tap, which I diagnosed in bug 1013378 as having been regressed by bug 964750.
Blocks: 964750
I think this bug has expanded sufficiently in scope (at least in my mind) that it's better turned into a metabug.
Alias: apz-state
Keywords: meta
Summary: Audit APZC state changes and interaction with gesture detector and content listeners → [meta] Audit APZC state changes and interaction with gesture detector and content listeners
Attached patch Early WIP (obsolete) — Splinter Review
First compiling WIP. Completely untested, but throwing it up anyway in case people want to get a general idea.
Attached patch WIP v2 (obsolete) — Splinter Review
Now with gtests passing
Attachment #8453965 - Attachment is obsolete: true
Attached patch WIP v3 (obsolete) — Splinter Review
More fixes, adding some documentation.
Attachment #8454430 - Attachment is obsolete: true
Attached patch Refactor boilerplate in tests (obsolete) — Splinter Review
Do you think this is a good idea? I find it really hard to start a new test without cargo-culting a bunch of code from some other test, and I think this might make it easier. If you concur I can split it into a new bug and get it landed since it's pretty independent of this bug.
Attachment #8454583 - Flags: feedback?(botond)
Comment on attachment 8454583 [details] [diff] [review]
Refactor boilerplate in tests

Moved this to bug 1037591
Attachment #8454583 - Attachment is obsolete: true
Attachment #8454583 - Flags: feedback?(botond)
Sorry for the monster patch but I couldn't find a reasonable way to do a nontrivial split so that each part could stand on its own. It all sort of ties together. My suggestion for how to review the code:

- Start by reading the class documentation in TouchBlockState.cpp
- Then read the documentation for the mTouchBlockBalance field in APZC.h
- Then read the new implementation for APZC::ReceiveInputEvent (ignoring the old code/diff, since it's a clean rewrite)
- Rest of code

Note that the changes to TabChild are to ensure that every touch block gets exactly one ContentReceivedTouch callback and to remove edge cases where that might or might not happen.
Attachment #8454580 - Attachment is obsolete: true
Attachment #8456105 - Flags: review?(botond)
I wanted to call this the input thread but the existing documentation refers to it as the controller/UI thread so I went with that.
Attachment #8456143 - Flags: review?(botond)
Comment on attachment 8456105 [details] [diff] [review]
Part 1 - Rewrite touch block code

Review of attachment 8456105 [details] [diff] [review]:
-----------------------------------------------------------------

Very nice patch overall! I think it makes this area of the code significantly easier to understand (in addition to being more correct, of course). I like the symmetry between content-received-touch and timeout-content-response, and the concept of the touch block balance.

The comments are mostly minor, but I'd like to hear back about the touch block lifetime before r+'ing.

::: gfx/layers/apz/public/GeckoContentController.h
@@ +68,5 @@
>     * relative to the current scroll offset. HandleLongTapUp will always be
>     * preceeded by HandleLongTap. However not all calls to HandleLongTap will
>     * be followed by a HandleLongTapUp (for example, if the user drags
> +   * around between the long-tap and lifting their finger, or if content
> +   * notifies the APZ that the long-tap event was prevent-defaulted).

It might be worth adding a comment to PBrowser.ipdl saying that many of the functions there correspond to functions here, and that the reader should refer to this file for additional documentation of those functions.

::: gfx/layers/apz/src/AsyncPanZoomController.cpp
@@ +753,5 @@
>  
> +  while (!mTouchBlockQueue.IsEmpty()) {
> +    delete mTouchBlockQueue[0];
> +    mTouchBlockQueue.RemoveElementAt(0);
> +  }

This becomes simply 'mTouchBlockQueue.Clear()' if we use UniquePtr.

@@ +797,5 @@
>  {
>    return static_cast<AxisLockMode>(gfxPrefs::APZAxisLockMode());
>  }
>  
>  nsEventStatus AsyncPanZoomController::ReceiveInputEvent(const InputData& aEvent) {

Please document (in the header file) the meanings of the return values. (In particular, 'ConsumeDoDefault' meaning 'the event was queued' is not self-explanatory.)

@@ +821,5 @@
>      }
> +  } else if (mTouchBlockQueue.IsEmpty()) {
> +    NS_WARNING("Received a non-start touch event while no touch blocks active!");
> +  } else {
> +    // this touch is part of the current block

This comment appears to use a different sense of "current block" than CurrentTouchBlock() (last in queue vs. first).

@@ +1115,5 @@
>  
>  nsEventStatus AsyncPanZoomController::OnScaleBegin(const PinchGestureInput& aEvent) {
>    APZC_LOG("%p got a scale-begin in state %d\n", this, mState);
>  
> +  if (HasReadyTouchBlock() && !CurrentTouchBlock()->TouchActionAllowsPinchZoom()) {

If we get here, shouldn't HasReadyTouchBlock() always be true?

@@ +2465,5 @@
> +AsyncPanZoomController::ScheduleContentResponseTimeout() {
> +  APZC_LOG("%p scheduling content response timeout\n", this);
> +  PostDelayedTask(
> +    NewRunnableMethod(this, &AsyncPanZoomController::ContentResponseTimeout),
> +    gfxPrefs::APZContentResponseTimeout());

Is there any issue with these tasks keeping the APZC alive? For example, might they delay the destruction of an APZC for longer than we want? (There may be no problem, just thinking aloud.)

@@ +2572,5 @@
> +  if (gfxPrefs::TouchActionEnabled() && aCopyAllowedTouchBehaviorFromCurrent) {
> +    newBlock->CopyAllowedTouchBehaviorsFrom(CurrentTouchBlock());
> +  }
> +
> +  // We're going to starting a new block, so clear out any depleted blocks at the head of the queue.

s/starting/start

::: gfx/layers/apz/src/AsyncPanZoomController.h
@@ +703,5 @@
> +
> +private:
> +  void ScheduleContentResponseTimeout();
> +  void ContentResponseTimeout();
> +  void ProcessPendingInputBlocks();

Document the precondition that there is at least one touch block in the queue.

@@ +710,5 @@
> +  bool HasReadyTouchBlock();
> +
> +private:
> +  // The queue of touch blocks that have not yet been processed by this APZC.
> +  nsTArray<TouchBlockState*> mTouchBlockQueue;

Is there any reason we need to manage the lifetime of the touch blocks manually, rather than using UniquePtr? If not, let's please switch to UniquePtr.

@@ +749,5 @@
> +  // - negative when we are in a state where we have received more timeout expirations
> +  //   than content responses. This means that the next timeout expiration that we will
> +  //   receive will correspond to the first unprocessed block, but the next n content
> +  //   responses need to be ignored as they are for blocks we have already processed.
> +  //   (n is the absolute value of the balance.)

As pointed out on IRC, the description of positive/negative here is backwards wrt. the code.

@@ +758,5 @@
> +  // once an input block has been processed, it can potentially be removed from the queue.
> +  // Therefore the information in that block is lost. An alternative approach would
> +  // be to keep around those blocks until they have received both the content response
> +  // and timeout expiration, but that involves a higher level of memory usage.
> +  int32_t mTouchBlockBalance;

Nice concept and explanation!

::: gfx/layers/apz/src/TouchBlockState.cpp
@@ +14,5 @@
> +namespace mozilla {
> +namespace layers {
> +
> +/**
> + * Default touch behavior (is used when not touch behavior is set).

s/not/no

::: gfx/layers/apz/src/TouchBlockState.h
@@ +21,5 @@
> + * Every touch-start event creates a new touch block. In this case, the
> + * touch block consists of the touch-start, followed by all touch events
> + * up to but not including the next touch-start (except in the case where
> + * a long-tap happens, see below). Note that in particular we cannot know
> + * when a touch block ends until the next one is started. Most touch

Is this the case even when a touch-end is received?

@@ +71,5 @@
> +  /**
> +   * Copy the allowed touch behavior flags from another block.
> +   * @return false if this block already has these flags set, true if not.
> +   */
> +  bool CopyAllowedTouchBehaviorsFrom(TouchBlockState* aOther);

The argument should be const because the function does not modify it, and a reference because it cannot be null.

@@ +77,5 @@
> +  /**
> +   * @return true iff this block has received all the information needed
> +   *         to properly dispatch the events in the block.
> +   */
> +  bool IsReadyForHandling();

Make this method const. Same for IsDefaultPrevented(), SingleTapOccurred(), HasEvents(), and TouchActionAllows*().
Anything I didn't comment on explicitly - agreed and will fix.

(In reply to Botond Ballo [:botond] from comment #11)
> >  
> > +  if (HasReadyTouchBlock() && !CurrentTouchBlock()->TouchActionAllowsPinchZoom()) {
> 
> If we get here, shouldn't HasReadyTouchBlock() always be true?

If the pinch is generated from touch events then yes. However widget code is allowed to send us PinchGestureInput events, in which case we can go directly into the OnScale* functions without having created any touch blocks. I can add a comment to that effect in the code here.

> > +    NewRunnableMethod(this, &AsyncPanZoomController::ContentResponseTimeout),
> > +    gfxPrefs::APZContentResponseTimeout());
> 
> Is there any issue with these tasks keeping the APZC alive? For example,
> might they delay the destruction of an APZC for longer than we want? (There
> may be no problem, just thinking aloud.)

I couldn't think of any scenarios where it would be a problem. All the tasks should eventually run and so release their refs to the APZC, unless we're in some sort of a shutdown scenario in which case it doesn't matter.

> @@ +710,5 @@
> > +  bool HasReadyTouchBlock();
> > +
> > +private:
> > +  // The queue of touch blocks that have not yet been processed by this APZC.
> > +  nsTArray<TouchBlockState*> mTouchBlockQueue;
> 
> Is there any reason we need to manage the lifetime of the touch blocks
> manually, rather than using UniquePtr? If not, let's please switch to
> UniquePtr.

Yeah we should be able to do that. I'll update the patch.

>
> ::: gfx/layers/apz/src/TouchBlockState.h
> @@ +21,5 @@
> > + * Every touch-start event creates a new touch block. In this case, the
> > + * touch block consists of the touch-start, followed by all touch events
> > + * up to but not including the next touch-start (except in the case where
> > + * a long-tap happens, see below). Note that in particular we cannot know
> > + * when a touch block ends until the next one is started. Most touch
> 
> Is this the case even when a touch-end is received?

Yes. For example, say you get a set of events for a pinch-pan that looks like this:
touchstart touchstart touchmove touchmove touchend touchmove touchmove touchend

In this case there are two touch blocks. The first one consists of just the first touchstart. The second one consists of the second touchstart all the way to the last touchend (and might keep going beyond that). Note in particular that the 5th event (a touchend) does not break the input block.
Comment on attachment 8456106 [details] [diff] [review]
Part 2 - Some more prevent-default tests

Review of attachment 8456106 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/tests/gtest/TestAsyncPanZoomController.cpp
@@ +199,5 @@
>    {
>      apzc->Destroy();
>    }
>  
> +  void MakeMetricsMayHaveTouchListeners()

Not a big fan of this name. SetMayHaveTouchListeners?

@@ +204,2 @@
>    {
> +    FrameMetrics frameMetrics = apzc->GetFrameMetrics();

We could add to TestAsyncPanZoomController a 'FrameMetrics& GetFrameMetrics()' method (and adjust the current one to be 'const FrameMetrics& GetFrameMetrics() const', which is what it should be anyways), and then this function just becomes 'apzc->GetFrameMetrics().mMayHaveTouchListeners = true'.
Attachment #8456106 - Flags: review?(botond) → review+
Comment on attachment 8456143 [details] [diff] [review]
Part 3 - Bolt on some thread safety checks

Review of attachment 8456143 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/apz/src/AsyncPanZoomController.cpp
@@ +410,5 @@
> +static void
> +AssertOnControllerThread() {
> +  static bool sControllerThreadDetermined = false;
> +  if (!sControllerThreadDetermined) {
> +    sControllerThread = pthread_self();

How do we know the first call to this function will be on the controller thread?

@@ +810,5 @@
>    return static_cast<AxisLockMode>(gfxPrefs::APZAxisLockMode());
>  }
>  
>  nsEventStatus AsyncPanZoomController::ReceiveInputEvent(const InputData& aEvent) {
> +  if (GetThreadAssertionsEnabled()) {

Since all uses of AssertOnControllerThread() are conditioned on GetThreadAssertionsEnabled(), we can move the condition inside AssertOnControllerThread() (or introduce a wrapper that includes the condition).
(In reply to Botond Ballo [:botond] from comment #14)
> 
> How do we know the first call to this function will be on the controller
> thread?

We don't. But the important thing is that all accesses of the touch block queue are on the same thread, regardless of what we call that thread. If the first call to the function is wrong then the second call to the function will assert, which should alert us to the problem. I don't really want to make widget code responsible for reporting to the APZC which thread is actually the controller thread.
Comment on attachment 8456143 [details] [diff] [review]
Part 3 - Bolt on some thread safety checks

Review of attachment 8456143 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)
> (In reply to Botond Ballo [:botond] from comment #14)
> > 
> > How do we know the first call to this function will be on the controller
> > thread?
> 
> We don't. But the important thing is that all accesses of the touch block
> queue are on the same thread, regardless of what we call that thread. If the
> first call to the function is wrong then the second call to the function
> will assert, which should alert us to the problem. I don't really want to
> make widget code responsible for reporting to the APZC which thread is
> actually the controller thread.

That's reasonable. Perhaps we could mention this in a comment in AssertOnControllerThread(), so that someone reading the implementation doesn't wonder about it.
Attachment #8456143 - Flags: review?(botond) → review+
Updated to address all review comments.
Attachment #8456105 - Attachment is obsolete: true
Attachment #8456105 - Flags: review?(botond)
Attachment #8456595 - Flags: review?(botond)
Updated to address review comments, carrying r+
Attachment #8456106 - Attachment is obsolete: true
Attachment #8456596 - Flags: review+
Comment on attachment 8456595 [details] [diff] [review]
Part 1 - Rewrite touch block code (v2)

Review of attachment 8456595 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM!
Attachment #8456595 - Flags: review?(botond) → review+
Backed out part 3 for windows bustage:
https://hg.mozilla.org/integration/mozilla-inbound/rev/65206717c04c
Whiteboard: [leave open]
New attempt at part 3, using PRThread* instead of pthread_t which should be windows-friendly. Try push including this patch at https://tbpl.mozilla.org/?tree=Try&rev=3d7cb20f3660; carrying r+ since it's a simple change compared to the previous version.
Attachment #8456917 - Flags: review+
Whiteboard: [leave open]
No longer blocks: 1038002
Duplicate of this bug: 1038002
Trivial follow-up patch to fix an APZC_LOG statement that was broken by the switchover to UniquePtr and that I didn't catch before landing.

https://hg.mozilla.org/integration/mozilla-inbound/rev/d616c7070518
Depends on: 1046428
This issue could reproduced easily at FxOS v2.0
Could this patch land to FxOS v2.0?
Flags: needinfo?(bugmail.mozilla)
(In reply to jongsoo.oh from comment #30)
> This issue could reproduced easily at FxOS v2.0
> Could this patch land to FxOS v2.0?

Which issue are you referring to? This is a meta bug that is a pretty big rewrite of a lot of code, and i don't think it's a good idea to try to uplift it.
Flags: needinfo?(bugmail.mozilla)
Depends on: 1052023
Hi Jongsoo -

Could you elaborate more on what you meant in Comment#30? Not sure what issue you are referring to

Thanks

Vance
Flags: needinfo?(zzongsoo)
We detected Bug 1038002(Bug 1038002 - Holding your finger down and then trying to pan gets stuck until the next touch)in dialer.
So we needed to uplift this patches to v2.0
But bug 1038002 is not reproduced the latest v2.0 flame.(We will check which code solve this issue)
When We meet specific issue for APZC, we will inform again. 
Thanks
Flags: needinfo?(zzongsoo)
(In reply to jongsoo.oh from comment #33)
> We detected Bug 1038002(Bug 1038002 - Holding your finger down and then
> trying to pan gets stuck until the next touch)in dialer.
> So we needed to uplift this patches to v2.0
> But bug 1038002 is not reproduced the latest v2.0 flame.(We will check which
> code solve this issue)
> When We meet specific issue for APZC, we will inform again. 
> Thanks

Thanks Jongsoo for the update, if you find another new issue for APZC, please submit a new issue to Bugzilla to track

Thanks for your help

Vance
You need to log in before you can comment on or make changes to this bug.