Closed Bug 1576524 Opened 5 months ago Closed 5 months ago

Focus on mouse movement is offset when popup is open via "PopUp" add-on

Categories

(Core :: Panning and Zooming, defect, P1)

68 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla71
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- disabled
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- verified
firefox71 --- verified

People

(Reporter: yoasif, Assigned: botond)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

As reported on reddit: https://www.reddit.com/r/firefox/comments/cv08ek/bad_behavior_in_pop_up_window/

Steps to reproduce:

  1. Install https://addons.mozilla.org/es/firefox/addon/popup/
  2. Navigate to https://codepen.io/nadoaa/pen/vbJpyr
  3. Click add-on toolbar icon to open page
  4. Move mouse around the preview area

What happens:

The white circle does not track with the mouse cursor.

Expected result:

White circle should track with mouse cursor as previously.

There is a video of the issue available: https://reddit.com/link/cv08ek/video/c0ucmbzv1hi31/player

19:31.71 INFO: No more inbound revisions, bisection finished.
19:31.71 INFO: Last good revision: 3c4b55694127237c3dd740d5ed055959c4d783fe
19:31.71 INFO: First bad revision: 91d0c9066fd139285e025b0b87a70f61823281af
19:31.71 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=3c4b55694127237c3dd740d5ed055959c4d783fe&tochange=91d0c9066fd139285e025b0b87a70f61823281af

Keywords: regression
Regressed by: 1530661

Henri, this bug was regressed by your patch in bug 1530661 -- can you take a look at this bug?

Flags: needinfo?(hsivonen)

Bug 1565525 was supposed to hide this bug. Do you still see the problem in a build that has the patch for bug 1565525?

Flags: needinfo?(hsivonen) → needinfo?(yoasif)

I am seeing this in the latest Firefox Nightly, which I assume has the patch for bug 1565525.

Flags: needinfo?(yoasif)

Hmm, I can't reproduce this in today's Nightly: for me, the white circle continues to track the cursor in the popup window.

And I can reproduce the issue in a 2019-08-19 nightly, which doesn't include bug 1565525.

Note, the above tests were on Linux. For good measure, I did try the STR on Windows as well, but I was not able to perform them: on Windows, clicking on the addon's button in the toolbar results in an empty popup menu with no menu items.

Update: it worked in a clean profile. Once again, no repro (white circle successfully tracks the cursor in the popup, on Windows 10, using today's (2019-08-26) nightly).

Running 20190826094255 and I still see the issue. I am in a GNOME Wayland session and I see the issue both with MOZ_ENABLE_WAYLAND=0 and MOZ_ENABLE_WAYLAND=1.

This is in a clean profile other than using the add-on to create the popup.

Ok, I was able to reproduce it. The difference seems to be WebRender: the bug requires WebRender to be enabled to reproduce.

(Also means disabling WebRender works as a workaround for the bug.)

Ah, sorry - WebRender is enabled by default on my hardware now, so I forgot to test with it disabled.

Hmm, I can't reproduce it in a local debug build for some reason, even with WebRender enabled.

I can reproduce it in a local opt build. Time related / race condition? I do see us getting into the codepath added by the bug 1565525 fix.

Priority: -- → P1

It does indeed look like a race condition. We are clearing the stale transforms in the codepath added in bug 1565525, but shortly after something (I'm guessing some code called from tasks which are already queued up to run) is re-setting them and we remain stuck with them.

I tried a couple of approaches to get the code to clear the transforms to run after whatever is setting them:

  • Run it in a task posted to the compositor message loop.
  • Run it in a task on the updater thread (which will eventually post it back to the compositor message loop).

Unfortunately, in either case, the code to clear the transforms still runs "too early".

The next step would be to undertake a closer examination of what tasks are setting the transforms, and where they are queued, so we can understand how to post a task that will run after all that.

Here is (the interesting part of) a stacktrace showing what is triggering the transforms to be re-set to their old values:

#19 0x00007ff4b78f765e in mozilla::layers::RemoteContentController::NotifyLayerTransforms (this=0x7ff4a7f3e920, aTransforms=...)
    at /home/botond/dev/mozilla/central/gfx/layers/ipc/RemoteContentController.cpp:35
#20 0x00007ff4b7821a8e in mozilla::layers::APZCTreeManager::SendSubtreeTransformsToChromeMainThread (this=<optimized out>, aAncestor=<optimized out>)
    at /home/botond/dev/mozilla/central/gfx/layers/apz/src/APZCTreeManager.cpp:3318
#21 0x00007ff4b783dd9d in mozilla::layers::APZCTreeManager::UpdateHitTestingTreeImpl<mozilla::layers::WebRenderScrollDataWrapper> (this=0x7ff4b1d9d800, aRoot=..., aIsFirstPaint=false,
    aOriginatingWrRootId=..., aPaintSequenceNumber=<optimized out>) at /home/botond/dev/mozilla/central/gfx/layers/apz/src/APZCTreeManager.cpp:607
#22 0x00007ff4b783b1f8 in mozilla::layers::APZCTreeManager::UpdateHitTestingTree (this=0x7ff4b1d9d800, aScrollWrapper=...,
    aOriginatingWrRootId=<error reading variable: access outside bounds of object referenced via synthetic pointer>, aPaintSequenceNumber=<optimized out>, aIsFirstPaint=<optimized out>)
    at /home/botond/dev/mozilla/central/gfx/layers/apz/src/APZCTreeManager.cpp:638
#23 mozilla::layers::APZUpdater::UpdateScrollDataAndTreeState(mozilla::layers::WRRootId, mozilla::layers::WRRootId, mozilla::wr::Epoch const&, mozilla::layers::WebRenderScrollData&&)::$_28::operator()() (this=<optimized out>) at /home/botond/dev/mozilla/central/gfx/layers/apz/src/APZUpdater.cpp:220
#24 mozilla::detail::RunnableFunction<mozilla::layers::APZUpdater::UpdateScrollDataAndTreeState(mozilla::layers::WRRootId, mozilla::layers::WRRootId, mozilla::wr::Epoch const&, mozilla::layers::WebRenderScrollData&&)::$_28>::Run() (this=<optimized out>) at /home/botond/dev/mozilla/out-of-tree-objdirs/central-desktop-clang-opt/dist/include/nsThreadUtils.h:564
#25 0x00007ff4b7825479 in mozilla::layers::APZUpdater::ProcessQueue (this=<optimized out>) at /home/botond/dev/mozilla/central/gfx/layers/apz/src/APZUpdater.cpp:533
#26 0x00007ff4b7824fda in mozilla::layers::APZUpdater::CompleteSceneSwap (aWindowId=..., aInfo=...) at /home/botond/dev/mozilla/central/gfx/layers/apz/src/APZUpdater.cpp:122
#27 0x00007ff4b7826a26 in apz_post_scene_swap (aWindowId=..., aInfo=...) at /home/botond/dev/mozilla/central/gfx/layers/apz/src/APZUpdater.cpp:597
#28 0x00007ff4bb61da16 in _$LT$webrender_bindings..bindings..APZCallbacks$u20$as$u20$webrender..renderer..SceneBuilderHooks$GT$::post_scene_swap::hf24b64877d4d9769 (self=<optimized out>,
    document_ids=0x7ff4a7efd6c8, info=..., sceneswap_time=<optimized out>) at gfx/webrender_bindings/src/bindings.rs:944
#29 0x00007ff4bb947c6e in webrender::scene_builder::SceneBuilder::forward_built_transactions::h77eaf8cb3ef0f13d (self=0x7ff4a7efe4c0, txns=...) at gfx/wr/webrender/src/scene_builder.rs:607
#30 0x00007ff4bb946aec in webrender::scene_builder::SceneBuilder::run::h367fcc07cee0d257 (self=0x2) at gfx/wr/webrender/src/scene_builder.rs:330

It's in WebRender specific code, which explains why the issue only occurs with WebRender enabled.

Note that the stack trace from the previous comment shows us running a task in the updater queue. Here is the stack trace for when the task in question is enqueued:

#19 0x00007fa9abe9ae33 in mozilla::layers::APZUpdater::UpdateScrollDataAndTreeState(mozilla::layers::WRRootId, mozilla::layers::WRRootId, mozilla::wr::Epoch const&, mozilla::layers::WebRenderScrollData&&) (this=0x7fa9ba894430, aRootLayerTreeId=..., aOriginatingWrRootId=..., aEpoch=...,
    aScrollData=<unknown type in /home/botond/dev/mozilla/out-of-tree-objdirs/central-desktop-clang-opt/dist/bin/libxul.so, CU 0x618068b, DIE 0x6272fb0>)
    at /home/botond/dev/mozilla/central/gfx/layers/apz/src/APZUpdater.cpp:223
#20 0x00007fa9abe5846c in mozilla::layers::WebRenderBridgeParent::UpdateAPZScrollData(mozilla::wr::Epoch const&, mozilla::layers::WebRenderScrollData&&, mozilla::wr::RenderRoot) (
    this=0x7fa99b42a400, aEpoch=..., aData=<unknown type in /home/botond/dev/mozilla/out-of-tree-objdirs/central-desktop-clang-opt/dist/bin/libxul.so, CU 0x5dff4bd, DIE 0x5fe35d7>,
    aRenderRoot=<optimized out>) at /home/botond/dev/mozilla/central/gfx/layers/wr/WebRenderBridgeParent.cpp:1066
#21 0x00007fa9abe58ca1 in mozilla::layers::WebRenderBridgeParent::ProcessRenderRootDisplayListData (this=0x7fa99b42a400, aDisplayList=..., aWrEpoch=..., aTxnStartTime=...,
    aValidTransaction=true, aObserveLayersUpdate=true) at /home/botond/dev/mozilla/central/gfx/layers/wr/WebRenderBridgeParent.cpp:1187
#22 0x00007fa9abe59a32 in mozilla::layers::WebRenderBridgeParent::RecvSetDisplayList(nsTArray<mozilla::layers::RenderRootDisplayListData>&&, nsTArray<mozilla::layers::OpDestroy>&&, unsigned long const&, mozilla::layers::BaseTransactionId<mozilla::layers::TransactionIdType> const&, bool const&, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, nsTString<char> const&, mozilla::TimeStamp const&, nsTArray<mozilla::layers::CompositionPayload>&&) (this=<optimized out>,
    aDisplayLists=<unknown type in /home/botond/dev/mozilla/out-of-tree-objdirs/central-desktop-clang-opt/dist/bin/libxul.so, CU 0x5dff4bd, DIE 0x5fe6dde>,
    aToDestroy=<unknown type in /home/botond/dev/mozilla/out-of-tree-objdirs/central-desktop-clang-opt/dist/bin/libxul.so, CU 0x5dff4bd, DIE 0x5fe6def>,
    aFwdTransactionId=@0x7fa900000001: <error reading variable>, aTransactionId=..., aContainsSVGGroup=@0x7fa9a5dbd900: false, aVsyncId=..., aVsyncStartTime=..., aRefreshStartTime=...,
    aTxnStartTime=..., aTxnURL=..., aFwdTime=...,
    aPayloads=<unknown type in /home/botond/dev/mozilla/out-of-tree-objdirs/central-desktop-clang-opt/dist/bin/libxul.so, CU 0x5dff4bd, DIE 0x5fe6e93>)
    at /home/botond/dev/mozilla/central/gfx/layers/wr/WebRenderBridgeParent.cpp:1320

Importantly, the enqueueing happens after we clear the transforms in RecvAdoptChild. This suggests that the approaches described in comment 14 are not going to work.

I think the takeaway here is that the solution approach implemented in bug 1565525 -- clearing the transforms once in RecvAdoptChild -- is not sufficient, no matter what queue we place that task into.

Instead, I'm thinking of an extended approach along these lines:

  • In addition to clearing the transforms in RecvAdoptChild, tell the the APZCTreeManager that any further transforms it sends for that layers id should be empty.
  • Additionally, there should be a mechanism to clear that state from the APZCTreeManager (in case layers ids eventually get reused, or at least to not accumulate them indefinitely in case of high tab turnover). I'm thinking that when the layers id disappears from the hit testing tree is a good time to clear the state.

The other solution here would be to avoid the problem by fixing bug 1493208 (enabling APZ in popups), but that might be on the risky side for a regression fix.

I implemented the approach described in comment 17 locally and it seems to be working.

Assignee: nobody → botond

This ensures that mDetachedLayersIds doesn't grow indefinitely.

Depends on D44740

Pushed by bballo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bc384b89c51f
Have APZCTreeManager track a set of layers ids for which it should not send MatrixMessages. r=tnikkel,hsivonen
https://hg.mozilla.org/integration/autoland/rev/8d7284764bd6
Clean up APZCTreeManager::mDetachedLayersIds in UpdateHitTestingTree. r=tnikkel
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

Asif, could you verify that the issue is resolved in the latest Nightly? Thanks!

Flags: needinfo?(yoasif)

Botond, the issue is resolved in the latest Nightly. Thanks!

Flags: needinfo?(yoasif)

Great! I will try to get the fix into 70.

Comment on attachment 9090533 [details]
Bug 1576524 - Have APZCTreeManager track a set of layers ids for which it should not send MatrixMessages. r=tnikkel!,hsivonen!

Beta/Release Uplift Approval Request

  • User impact if declined: With WebRender enabled (which is the default on a lot of hardware these days), WebExtensions that move tabs into popups suffer from event targeting problems (where e.g. clicking results in activating an element away from the cursor position rather than at the cursor position).
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The code being touched is somewhat tricky, but what we're doing in this fix is pretty safe and only affects scenarios where a tab is moved from a regular (APZ-emabled) window into a popup (or another window which is APZ-disabled which is rare).
  • String changes made/needed:
Attachment #9090533 - Flags: approval-mozilla-beta?
Attachment #9090534 - Flags: approval-mozilla-beta?

Comment on attachment 9090533 [details]
Bug 1576524 - Have APZCTreeManager track a set of layers ids for which it should not send MatrixMessages. r=tnikkel!,hsivonen!

Fix for WebRender issue, verified in nightly, might be a bit risky but we're early in beta 70.
OK for uplift for beta 6.

Attachment #9090533 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9090534 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

Verified as fixed on the latest Nightly 71.0a1 (Build ID: 20190913092859) and Firefox 70 beta 6 - tested on Windows 10 x64-bit, Ubuntu 18.04 and Mac OS X 10.13.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.