Bug 1812227 Comment 37 Edit History
I believe I tracked down the cause of the bug. It's fairly subtle. The touch event objects processed by APZ store their coordinates in two different data members: [`mScreenPoint`](https://searchfox.org/mozilla-central/rev/181e5bb2645236a617d42e3740420098097f7a0f/widget/InputData.h#175) stores the global coordinates of the touch relative to the screen, and [`mLocalScreenPoint`](https://searchfox.org/mozilla-central/rev/181e5bb2645236a617d42e3740420098097f7a0f/widget/InputData.h#179) stores the coordinates relative to the APZC that is processing the event (more specifically, relative to its composition bounds). The first one is set by widget code when creating the event; the second one is computed when the event is dispatched to an APZC, by applying to the first one a transform that represents the position of the APZC's composition bounds relative to the screen. The interpolation in `MaybeSplitTouchMoveEvent()` operates on screen coordinates (because the touch-move threshold is specified in screen inches), and sets the `mScreenPoint` of the synthesized event. It's then necessary to compute and set the `mLocalScreenPoint` of the synthesized event. [The code does this.](https://searchfox.org/mozilla-central/rev/181e5bb2645236a617d42e3740420098097f7a0f/gfx/layers/apz/src/AsyncPanZoomController.cpp#6726) However, there's an unfortunate snag: if the target APZC is a subframe and the scroll is being handed off to an ancestor, then the composition bounds of the target APZC are themselves moving over the course of the gesture. As a result, the transform from global coordinates to local coordinates is changing too. To handle this, we cache the transform at the beginning of the input block and use the cached transform for all events in the input block, so that their local coordinates are all in the same coordinate system and comparable to each other.
The bug is that `MaybeSplitTouchMoveEvent()` is not using the cached transform from the input block: it's recomputing an up-to-date transform at that moment, and calculating `mLocalScreenPoint` based on that. As a result, the coordinates of the synthesized event are actually in a different coordinate system than the real event before it, and trying to interpret them as being in the same coordinate system results in the observed bad behaviour. The fix is simple: have the input block expose the cached transform and use it to compute the local coordinates of the synthesized event.
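
To make the mismatch concrete, here is a minimal standalone sketch. It is not the actual Gecko code: `Point`, `Transform`, `TouchEvent`, `InputBlock`, `GetCachedTransform()`, `MakeEvent()` and `SynthesizeMidpoint()` are hypothetical stand-ins, and the screen-to-local transform is reduced to a pure translation for illustration. It only shows why the synthesized event has to be converted with the transform cached by the input block rather than a freshly computed one.

```cpp
// Hypothetical stand-ins for the Gecko types; not the real APZ API.
struct Point {
  double x;
  double y;
};

// Screen-to-local transform, simplified here to a pure translation.
struct Transform {
  double dx;
  double dy;
  Point Apply(const Point& aPoint) const {
    return {aPoint.x + dx, aPoint.y + dy};
  }
};

struct TouchEvent {
  Point mScreenPoint;       // relative to the screen
  Point mLocalScreenPoint;  // relative to the APZC's composition bounds
};

// Stand-in for an input block that caches the transform when it starts.
class InputBlock {
 public:
  explicit InputBlock(const Transform& aTransform)
      : mCachedTransform(aTransform) {}
  const Transform& GetCachedTransform() const { return mCachedTransform; }

 private:
  Transform mCachedTransform;
};

// Builds an event whose local coordinates are derived from its screen
// coordinates using the given transform.
TouchEvent MakeEvent(const Point& aScreen, const Transform& aScreenToLocal) {
  return {aScreen, aScreenToLocal.Apply(aScreen)};
}

// Interpolates in screen coordinates (here simply the midpoint), then
// converts the result to local coordinates with the supplied transform.
TouchEvent SynthesizeMidpoint(const TouchEvent& aPrev, const TouchEvent& aNext,
                              const Transform& aScreenToLocal) {
  Point screen{(aPrev.mScreenPoint.x + aNext.mScreenPoint.x) / 2.0,
               (aPrev.mScreenPoint.y + aNext.mScreenPoint.y) / 2.0};
  return MakeEvent(screen, aScreenToLocal);
}
```

Suppose the block cached a translation of `{0, -100}`, and two real events at screen positions `(50, 300)` and `(50, 280)` were given local y-coordinates of 200 and 180. If the composition bounds have since moved so that the up-to-date translation is `{0, -60}`, then synthesizing the midpoint with the up-to-date transform yields a local y of 230, which lies outside the range spanned by the two real events. Passing `block.GetCachedTransform()` instead yields 190, which sits between them as expected.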