Closed
Bug 1298173
Opened 9 years ago
Closed 9 years ago
Intermittent gfx/layers/apz/test/mochitest/test_bug1285070.html | helper_bug1285070.html | Event pointerup should be generated once - got +0, expected 1
Categories
(Core :: Panning and Zooming, defect, P3)
Tracking
()
RESOLVED
FIXED
mozilla52
Tracking | Status | |
---|---|---|
firefox50 | --- | unaffected |
firefox51 | --- | fixed |
firefox52 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: kats)
References
Details
(Keywords: intermittent-failure, Whiteboard: [gfx-noted])
Attachments
(2 files)
58 bytes,
text/x-review-board-request
|
dvander
:
review+
gchang
:
approval-mozilla-aurora+
|
Details |
58 bytes,
text/x-review-board-request
|
dvander
:
review+
gchang
:
approval-mozilla-aurora+
|
Details |
Assignee | ||
Updated•9 years ago
|
Priority: -- → P3
Version: unspecified → 51 Branch
Updated•9 years ago
|
Assignee: nobody → sshih
Comment 1•9 years ago
|
||
19:49:26 INFO - Received event pointerover
19:49:26 INFO - Received event pointerenter
19:49:26 INFO - Received event pointerdown
Test failures...
19:49:28 INFO - Received event pointerup
19:49:28 INFO - Received event pointerout
19:49:28 INFO - Received event pointerleave
From the log, it looks like the click event was fired before pointerup. Have no idea how this could be happened and trying to reproduce it.
Assignee | ||
Comment 2•9 years ago
|
||
This sounds like the sort of thing that could be regressed by bug 1289650. There were a number of other intermittent failures that also spiked in volume yesterday, which are likely also a result of threading changes from the same bug. I'll try to investigate today.
Assignee | ||
Comment 3•9 years ago
|
||
The pointerup event is triggered by the touchup event. The path for that is as follows:
- nsBaseWidget calls APZCTreeManager::ReceiveInputEvent with it on the main thread [1]
- nsBaseWidget then dispatches it to the parent process event mechanism [2]
- the event is sent over PContent to the child process main thread [3]
Whereas the click event is also spawned from the touchup event using this path:
- nsBaseWidget calls APZCTreeManager::ReceiveInputEvent with it on the main thread [1]
- AsyncPanZoomController does a PostDelayedTask back to the parent process main thread [4]
- RemoteContentController shifts it to the compositor thread [5]
- the event is sent over PAPZ to the child process main thread (?) [6]
So it sounds like if PContent is blocked, the touchup event might be delayed. If PAPZ (which now is part of PCompositorBridge rather than PContent) is not blocked, the click event can "overtake" the touchup event and get into the child process main thread queue first. That would result in the click event getting fired before the touchup, which sounds like what Stone was seeing.
In practice this can result in things like bug 965381, as per the comment at [7]. In fact the PostDelayedTask call at [4] was inserted specifically to avoid this, but now that the click event is getting remoted over PCompositorBridge it is less effective.
[1] http://searchfox.org/mozilla-central/rev/f80822840bc5f3d2d3cae3ece621ddbce72e7f54/widget/nsBaseWidget.cpp#1227
[2] http://searchfox.org/mozilla-central/rev/f80822840bc5f3d2d3cae3ece621ddbce72e7f54/widget/nsBaseWidget.cpp#1233
[3] http://searchfox.org/mozilla-central/rev/f80822840bc5f3d2d3cae3ece621ddbce72e7f54/dom/ipc/TabParent.cpp#1641
[4] http://searchfox.org/mozilla-central/rev/f80822840bc5f3d2d3cae3ece621ddbce72e7f54/gfx/layers/apz/src/AsyncPanZoomController.cpp#2043
[5] http://searchfox.org/mozilla-central/rev/f80822840bc5f3d2d3cae3ece621ddbce72e7f54/gfx/layers/ipc/RemoteContentController.cpp#59
[6] http://searchfox.org/mozilla-central/rev/f80822840bc5f3d2d3cae3ece621ddbce72e7f54/gfx/layers/ipc/RemoteContentController.cpp#70
[7] http://searchfox.org/mozilla-central/rev/f80822840bc5f3d2d3cae3ece621ddbce72e7f54/gfx/layers/apz/src/AsyncPanZoomController.cpp#2030
Assignee | ||
Comment 4•9 years ago
|
||
(Note that the above is still speculation on my part, I haven't confirmed it. If I'm right, it should be possible to reproduce by somehow artificially jamming PContent so that messages are delayed).
Assignee | ||
Updated•9 years ago
|
Whiteboard: [gfx-noted]
Assignee | ||
Comment 5•9 years ago
|
||
Assigning to rhunt for now, since I'm pretty sure it's a regression from his change.
Assignee: sshih → rhunt
Assignee | ||
Comment 6•9 years ago
|
||
I caught an instance of this using rr chaos mode. I'll investigate.
Assignee: ryanhunt → bugmail
Assignee | ||
Comment 7•9 years ago
|
||
Comment 3 is indeed what is happening. And it'll get worse with GPU process separation, because the touchup event will have an extra process hop (GPU process -> main process -> content process) whereas the click will go directly from GPU process -> content process) and so will be more likely to "overtake" the touchup.
Not yet sure how to fix this.
Assignee | ||
Comment 8•9 years ago
|
||
As far as I can see there are two main options for fixing this:
1) Route the HandleTap message back through PAPZCTreeManager to the parent process, and from there to the child process (if necessary). That way the HandleTap would "follow" the touchup on the same protocols/threads, and therefore be guaranteed to arrive after the touchup.
2) Add some machinery in APZEventState that queues up and delays the single-tap events from being disabled until after the corresponding touch-up has been handled.
I was initially leaning towards (1) but it's pretty messy to implement because there are no other messages that take that path, and so a bunch of new plumbing needs to be added. Another complication is that smaug is working on some DOM changes that entail receiving input events into the content process off the main thread. So stuff like RecvRealTouchEvent would move off the main thread to some other thread, and RecvHandleTap is in the same bucket and would need to move. As part of that I realized that a couple of other functions like RequestContentRepaint and NotifyAPZStateChange might need moving also. It might be that PAPZ ends up getting shifted out of PCompositorBridge and into some other top-level protocol.
All this to say, I would like to build a solution that is robust at the endpoint so that no matter which protocol we end up using to send the messages, it will remain ordered in the way it's supposed to be - and that means going with option (2). So now I'm leaning towards that, and will try cooking up something that uses that approach.
Assignee | ||
Comment 9•9 years ago
|
||
After some discussion with smaug and keeping in mind future changes I'm leaning towards (1) again. I implemented it and it seems to work. Try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=f4ecf04afffc
I might still try implementing (2) as it might help with bug 1145090.
Assignee | ||
Comment 10•9 years ago
|
||
Here's a better try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca705241b538&group_state=expanded
Assignee | ||
Comment 11•9 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> I might still try implementing (2) as it might help with bug 1145090.
I spent some time thinking about this and didn't get anywhere useful. I was hoping to find a way to add a bunch of state tracking to APZEventState that would ensure the incoming messages for a given input block are handled together and don't get interrupted by messages for a different input block. However it's turning out rather complicated and confusing because I'm not fully clear on which interleavings are possible and am having trouble dealing with all the cases. So I'm going to punt on that for now and get these patches up.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
![]() |
||
Comment 14•9 years ago
|
||
mozreview-review |
Comment on attachment 8792980 [details]
Bug 1298173 - Remove redundant parameter.
https://reviewboard.mozilla.org/r/79792/#review78544
Attachment #8792980 -
Flags: review?(dvander) → review+
![]() |
||
Comment 15•9 years ago
|
||
mozreview-review |
Comment on attachment 8792981 [details]
Bug 1298173 - Push HandleTap from the GPU process back to the parent process before passing it on to the child process.
https://reviewboard.mozilla.org/r/79794/#review78550
::: gfx/layers/ipc/RemoteContentController.cpp:61
(Diff revision 1)
> uint64_t aInputBlockId)
> {
> - if (MessageLoop::current() != mCompositorThread) {
> - // We have to send messages from the compositor thread
> - mCompositorThread->PostTask(NewRunnableMethod<TapType, LayoutDevicePoint, Modifiers,
> - ScrollableLayerGuid, uint64_t>(this,
> + APZThreadUtils::AssertOnControllerThread();
> + const CompositorBridgeParent::LayerTreeState* state =
> + CompositorBridgeParent::GetIndirectShadowTree(aGuid.mLayersId);
> + if (state) {
If the controller thread != compositor thread, this probably isn't safe. LayerTreeState and sIndirectLayerTreeState are mutated on the compositor thread, so stuff like mParent and mApzcTreeManagerParent have to be grabbed inside the lock.
Some helper method like CompositorBridgeParent::GetApzcTreeManagerForRootLayer(aLayersId) would be fine to hide all this.
::: gfx/layers/ipc/RemoteContentController.cpp:75
(Diff revision 1)
> + if (state) {
> + APZCTreeManagerParent* apzctmp = state->mApzcTreeManagerParent;
> + if (apzctmp) {
> + // We're in the GPU process, on the compositor thread. Send the message
> + // to the parent process to deal with.
> + // XXX MOZ_ASSERT(XRE_IsGpuProcess());
Can this only fire in the GPU process because APZCTreeManagerParent won't exist in-process? Is it possible to do a process-type check before we grab layer tree state to make this clearer?
![]() |
||
Comment 16•9 years ago
|
||
mozreview-review |
Comment on attachment 8792981 [details]
Bug 1298173 - Push HandleTap from the GPU process back to the parent process before passing it on to the child process.
https://reviewboard.mozilla.org/r/79794/#review78558
r=me with the LayerTreeState stuff addressed
Attachment #8792981 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 17•9 years ago
|
||
(In reply to David Anderson [:dvander] from comment #15)
> If the controller thread != compositor thread, this probably isn't safe.
> LayerTreeState and sIndirectLayerTreeState are mutated on the compositor
> thread, so stuff like mParent and mApzcTreeManagerParent have to be grabbed
> inside the lock.
>
> Some helper method like
> CompositorBridgeParent::GetApzcTreeManagerForRootLayer(aLayersId) would be
> fine to hide all this.
Sounds good, I'll do that.
> ::: gfx/layers/ipc/RemoteContentController.cpp:75
> (Diff revision 1)
> > + if (state) {
> > + APZCTreeManagerParent* apzctmp = state->mApzcTreeManagerParent;
> > + if (apzctmp) {
> > + // We're in the GPU process, on the compositor thread. Send the message
> > + // to the parent process to deal with.
> > + // XXX MOZ_ASSERT(XRE_IsGpuProcess());
>
> Can this only fire in the GPU process because APZCTreeManagerParent won't
> exist in-process?
Yes. (To be clear, the APZCTreeManagerParent won't exist in-process *for the root layers id* - it will still exist for the content processes.)
> Is it possible to do a process-type check before we grab
> layer tree state to make this clearer?
Sure, but since there isn't a gpu process type check I'll just do a !XRE_IsParentProcess() instead.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 20•9 years ago
|
||
Another try push with the updates: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2762ce6ed62a
Comment 21•9 years ago
|
||
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/037d241ec65c
Remove redundant parameter. r=dvander
https://hg.mozilla.org/integration/mozilla-inbound/rev/c3879b41dc6c
Push HandleTap from the GPU process back to the parent process before passing it on to the child process. r=dvander
Comment 22•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/037d241ec65c
https://hg.mozilla.org/mozilla-central/rev/c3879b41dc6c
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox52:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Comment 23•9 years ago
|
||
Please request Aurora approval on this when you get a chance.
Flags: needinfo?(bugmail)
Assignee | ||
Comment 24•9 years ago
|
||
I wanted to give this one a few days' bake time before requesting approval. There's a bunch of IPDL code changes that I'm mildly worried will cause crashes, because... well it's IPDL. I'm keeping it on my radar and will request uplift sometime next week once I'm more confident in it.
Flags: needinfo?(bugmail)
Assignee | ||
Comment 25•9 years ago
|
||
Comment on attachment 8792980 [details]
Bug 1298173 - Remove redundant parameter.
Approval Request Comment
[Feature/regressing bug #]: bug 1289650
[User impact if declined]: If the user taps (on touch devices) the click event for the tap may be dispatched before the touchend event, which is wrong. This has caused issues on websites in the past.
[Describe test coverage new/current, TreeHerder]: We have an automated test to catch this, and it was failing intermittently
[Risks and why]: relatively low, there's a bunch of IPDL boilerplate changes and some new stuff but if it results in crashes it should show up pretty quickly on crash-stats (and I haven't seen anything so far).
[String/UUID change made/needed]: none
Attachment #8792980 -
Flags: approval-mozilla-aurora?
Assignee | ||
Updated•9 years ago
|
Attachment #8792981 -
Flags: approval-mozilla-aurora?
Comment 26•9 years ago
|
||
Comment on attachment 8792980 [details]
Bug 1298173 - Remove redundant parameter.
Fix a tap/click behavior failure. Take it in 51 aurora.
Attachment #8792980 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•9 years ago
|
Attachment #8792981 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
The second patch hit conflicts in gfx/ipc/RemoteCompositorSession.cpp. Could we get a rebased patch for uplift?
Flags: needinfo?(bugmail)
Assignee | ||
Comment 28•9 years ago
|
||
Rebased and uplifted:
https://hg.mozilla.org/releases/mozilla-aurora/rev/eda2ba9ceb01
https://hg.mozilla.org/releases/mozilla-aurora/rev/277a0cbc72e6
Flags: needinfo?(bugmail)
You need to log in
before you can comment on or make changes to this bug.
Description
•