Closed Bug 1030941 Opened 7 years ago Closed 7 years ago

APZC: Fix issues if TabChild receives touch input events prior to Parent Process

Categories

(Core Graveyard :: Widget: Gonk, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: mchang, Assigned: mchang)

References

Details

(Keywords: perf, Whiteboard: [c=effect p=4 s= u=] )

Attachments

(3 files, 1 obsolete file)

From bug 930939, the current architecture of APZC is that the main thread on the parent process sends touch events to the child process AFTER it has finished processing events through the Gecko Event State Manager. With the new architecture, of putting touch events and APZC on a different thread, this won't be the case. We can now have a race condition where APZC finishes processing a touch event, sends the touch event to TabChild, prior to the parent process ever seeing the touch event. This bug is to investigate and see if we can either (a) live with this or (b) reorder events.
CC'ing smaug also.

At the very least we'll have to deal with things like "HandleSingleTap" notifications getting sent to the child process before the corresponding touchstart/touchend events. This has caused problems in the past. Additionally as Mason describes above, we might want to send the touchstart/touchend events directly to the child process without going through the root process ESM.
Ugly WIP, we actually have 2 threads already interacting with APZ. One is the Compositor Loop when we get a layer transaction, and the other is the main thread loop. 

This WIP keeps the compositor tasks on the compositor thread and posts all other APZ tasks onto the input thread. If we send an IPC message on two different threads, bad things happen and lots of crashes occur. We hit double free issues. We still have race conditions as im sure other TabParent <--> TabChild IPC messages occur on both the main thread and input thread. They don't even have to be related messages and we can double free.
This seems to work so far. At the moment, I no longer send TabParent::SendRealTouchEvent events at all to TabChild. Instead, APZC sends TabChild::RecvRealMultiTouchEvent(MultiTouchInput) and converts the MultiTouchInput into a WidgetTouchInput. I'm not sure if there will be ugly consequences from having the Child::ESM process prior to Parent::ESM processing, but I haven't seen anything so far. Will investigate more.
Attachment #8446910 - Attachment is obsolete: true
Blocks: 1031601
No longer blocks: 1031601
Depends on: 1031601
Priority: P1 → P2
I have to scroll really fast and repeatedly for this to happen though, pretty sure it's a race condition.
Attached file APZC Overscroll Log
Hi Botond, I think it's dying because we cancel the fling while thinking it's not in overscroll, and therefore we never start the StartSnapBack animation. Something changes it to think that one of the axis is in overscroll. I think this is in line 93 of the log. If you see anything else, please let me know. Thanks!
Flags: needinfo?(botond)
Never mind, looks like there is a different issue. We kept sending the child a touch message even though the APZC was in overscrolled state. For some reason, when calling http://dxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/APZCTreeManager.cpp?from=GetTouchInputBlockAPZC&case=true#, even with only sending 1 touch at a time, we can have the aOutOverscrolledApzc be true whereas the returned APZC->IsOverscrolled will be false. Still have to dig to see why that happens.
Flags: needinfo?(botond)
(In reply to Mason Chang [:mchang] from comment #6)
> Never mind, looks like there is a different issue. We kept sending the child
> a touch message even though the APZC was in overscrolled state. For some
> reason, when calling
> http://dxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/
> APZCTreeManager.cpp?from=GetTouchInputBlockAPZC&case=true#, even with only
> sending 1 touch at a time, we can have the aOutOverscrolledApzc be true
> whereas the returned APZC->IsOverscrolled will be false. Still have to dig
> to see why that happens.

My intention is that if inOverscrolledApzc is true, the returned APZC is nullptr - see http://dxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/APZCTreeManager.cpp#1127. Is this not the case?
Currently I'm getting an APZC returned in those cases. The problem is this: http://dxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/APZCTreeManager.cpp?from=GetTouchInputBlockAPZC&case=true#447

We go through the APZC tree, and if we get 1 target, we return the APZC. However, if any APZC along the chain is in overscroll, we set the aOutOverscroll to true. I think this is what is happening from reading the code, but still have to instrument. Will check it out tomorrow.

Unfortunately, I can still cause the freezing, it just occurs much less frequently now :(. Probably because I'm accessing APZC's on the main thread, input thread, and compositor thread. If you can take a look at the log and see anything, I'd appreciate it. Thanks!
(In reply to Mason Chang [:mchang] from comment #8)
> Unfortunately, I can still cause the freezing, it just occurs much less
> frequently now :(. Probably because I'm accessing APZC's on the main thread,
> input thread, and compositor thread. If you can take a look at the log and
> see anything, I'd appreciate it. Thanks!

It would help to see the code which performs the added printfs that show up in the log, such as "Is in overscrolled APZC" or "Cancelling fling thread".
Depends on: 1037191
The issue this bug was filed for is no longer an issue using the new patches on bug 930939. The way it works is like so:

1) touchevent enters on libui thread
2) touchevent queued to compositor thread
3) touchevent goes through APZ processing on compositor thread
4) touchevent triggers tap gesture, HandleSingleTap task is queued on compositor thread using our previous fix for this [1]
5) touchevent processing in APZ finishes and gonk widget queues the untransformed event on the main thread
6a) main thread dispatches the touchevent to gecko, which sends it to the child process
6b) compositor thread runs the task queued in step 4, which re-queues the HandleSingleTap task on the main thread [2]
7) main thread runs the task queued at step 6b, which dispatches the HandleSingleTap to the child process.

Since step 7 is guaranteed to happen after step 6a, we don't need to worry about the TabChild receiving the HandleSingleTap call before it sees all the touch events that generated it.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/apz/src/AsyncPanZoomController.cpp?rev=992050dc1e5f#1724
[2] http://mxr.mozilla.org/mozilla-central/source/layout/ipc/RenderFrameParent.cpp?rev=26f5729d5ccc#141
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #10)
> The issue this bug was filed for is no longer an issue using the new patches
> on bug 930939. The way it works is like so:
> 
> 1) touchevent enters on libui thread
> 2) touchevent queued to compositor thread
> 3) touchevent goes through APZ processing on compositor thread
> 4) touchevent triggers tap gesture, HandleSingleTap task is queued on
> compositor thread using our previous fix for this [1]
> 5) touchevent processing in APZ finishes and gonk widget queues the
> untransformed event on the main thread
> 6a) main thread dispatches the touchevent to gecko, which sends it to the
> child process
> 6b) compositor thread runs the task queued in step 4, which re-queues the
> HandleSingleTap task on the main thread [2]
> 7) main thread runs the task queued at step 6b, which dispatches the
> HandleSingleTap to the child process.
> 
> Since step 7 is guaranteed to happen after step 6a, we don't need to worry
> about the TabChild receiving the HandleSingleTap call before it sees all the
> touch events that generated it.

I'm confused about steps 6a and 6b. Aren't they racy? There isn't a way that the re-queued HandleSingleTap gets processed on the main thread prior to the touch event in 6a?
6a and 6b execute concurrently, but since the main thread can only be doing one thing at a time, it's going to be doing 6a when the re-queue in 6b happens. Even if 6b *starts* before 6a, the re-queue will go into the main-thread queue after the task from step 5 (which is what runs 6a). So there should be no chance that 7 will happen before 6a.
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.