Closed Bug 1660805 Opened 4 years ago Closed 3 years ago

Order of focus and click events reversed when clicking to focus an OOP iframe

Categories

(Core :: DOM: UI Events & Focus Handling, defect, P3)

defect

Tracking

()

RESOLVED FIXED
85 Branch
Fission Milestone M7
Tracking Status
firefox85 --- fixed

People

(Reporter: hsivonen, Assigned: hsivonen)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

As shown by the upcoming test case for bug 1653160, when clicking to focus a different-site iframe, focus fires before click in non-Fission and click fires before focus in Fission.

Severity: -- → S3
Fission Milestone: --- → M6c

In the non-Fission case, we call SetFocusInner for the iframe element with aFlags=0, aFocusChanged=false.

In the Fission case, we call SetFocusInner for the iframe element via RaiseWindow with aFlags=NOSCROLL|RAISE, aFocusChanged=true.

Not yet sure if the problem is the RAISE flag or aFocusChanged=true, but I suspect the latter.

Obviously, click can only be generated on mouseup.

In the non-Fission case, when EventStateManager::PostHandleEvent for mousedown calls nsFocusManager::SetFocusedWindow, we end up firing focus synchronously. Therefore, it is guaranteed to happen before mouseup.

In the Fission case, when EventStateManager::PostHandleEvent for mousedown calls nsFocusManager::SetFocusedWindow, the content process for the framed document bounces a message via the chrome process to the process to request focus from the framer focus. Then that process bounces a message via the chrome process back to the frame document process to fire the focus event. Whether or not this arrives before or after mouseup depends on how long the mouse button remained pressed relative to IPC speed. In automated tests, the mouseup follows mousedown as fast as logically possible, which is unnaturally fast.

Up next: Compare the order of mousedown and focus events and see if the order of focus and mouseup/click looks better if the synthetic mouse click takes more time between mousedown and mouseup. That might not solve the issue for all input devices, since the user may use an input device that emulates a mouse and generates faster clicks that we'd see with a mouse.

That might not solve the issue for all input devices, since the user may use an input device that emulates a mouse and generates faster clicks that we'd see with a mouse.

I tried this with a dictated mouse click, and it was OK.

Also, I was wrong in the meeting about my trackpad generating mouseup upon fingers touching the trackpad. Of course, the mouseup happens when the fingers leave the trackpad surface.

Could QA, please, run the following steps on a slow laptop (preferably with some sites with ads opened in background tabs):

  1. Ensure Fission is enabled.
  2. Navigate to https://hsivonen.fi/fission-host.html
  3. Use an accessibility tool that generates a mouse click without a button or key that actually has down and up motions to click the button labeled "Button" in the first one of the three iframes.
  4. Comment here whether the focus event bullet point in the iframe appeared before the mouseup bullet point or after it.

Here's how to perform step 3 using Windows' built-in speech recognition (which is the first tool that I can think of that fits the criteria of what's needed for step 3 above):

  1. Have the UI language of Windows set to U.S. English.
  2. Type "Speech" into the search box in the task bar.
  3. Choose "Turn on Speech Recognition".
  4. In Windows Settings that opens, flip the "Turn on Speech Recognition" toggle.
  5. Follow the instructions of the wizard.
  6. Once the speech recognition UI shows up at the top of the screen, make sure it says "Listening" and the microphone circle is blue. (Click the circle if not.)
  7. Once at step 3 of the above list, say "Click Button".
  8. Number overlays appear over each of the three instances of the button labeled "Button".
  9. Say "one" followed by "OK".

The point is to get a synthetic click whose time gap between the mousedown and mouseup events is as close as it is with accessibility tools that synthesize mouse events without user control of the click duration. With mouse keys and such, the timing of the mousedown and mouseup events still depends on the key on the keyboard going down and up as pressed by the user.

Flags: needinfo?(camelia.badau)

If Windows speech recognition doesn't recognize your voice out of the box, right-click the microphone icon in the system tray, and choose "Improve voice recognition" from the configuration submenu. If with any prompted sentence the computer doesn't accept your reading of the sentence, click Pause, click Resume, and say the sentence again. (AFAICT, the Pause/Resume cycle resets the expectation to the start of the sentence.)

Probably worthwhile to also try clicking with an actual mouse and trackpad trying to make the click as quick as possible (on a slow laptop).

Assignee: nobody → hsivonen
Status: NEW → ASSIGNED

I experimented a bit. The shortest click I observed was with Windows Speech Recognition: 2 milliseconds. Most of the time, Windows Speech Recognition gives me 3-millisecond clicks (measured in Web content in Firefox with Fission turned off but with e10s on).

With a USB trackpad, I was able to tap and lift my fingers fast enough to get a 30-millisecond click.

Clicks with a physical mouse button take longer. I think we don't need to worry about those.

Problem: Even 30 ms is too short for debug builds. Locally, I need at least 50 ms in the debug case.

I will test an opt build.

(In reply to Henri Sivonen (:hsivonen) from comment #9)

I will test an opt build.

Locally, I can take the delay from mousedown to mouseup to 5 milliseconds in the test harness in the case of an opt build. That's 2 milliseconds longer than what Windows Speech Recognition can synthesize. That is, I see the event reversal if I take the delay to 4 milliseconds.

So far, it looks like we're going to be OK with physical pointing devices (as well as Dragon whose synthetic clicks are long). I'm now a bit worried about Windows Speech Recognition, though I haven't reproduced the event reversal with it.

Jamie, do you have suggestions of tools other than Windows Speech Recognition and Dragon that synthesize mouse clicks such that the duration of the mouse click depends on the tool and not on any human motion? (That is, e.g. mouse keys depend on human motion moving the key on the keyboard, so I don't mean tools like that.)

Flags: needinfo?(jteh)

Hi Henri, if you [https://www.nvaccess.org/](download NVDA) and install it, it simulates mouse clicks on the item you move to with the virtual cursor, AKA in browse mode. So if you load a web page, tab through links, then press Enter, you are in what NVDA calls reading mode. In this mode, the keyboard is completely controlled by NVDA. Pressing Enter here will not be the same as pressing Enter on a link normally would. Instead, NVDA simulates a mouse click on the deepest child, AKA the text or graphic contained in a link, instead, to make sure the element always activates properly for the blind user.

Thanks. After experimenting a bit and reading NVDA docs, it's unclear to me how to cause NVDA to synthesize a click on an object that doesn't have focus.

On https://hsivonen.fi/fission-host.html , how do I use NVDA to click a button labeled "Button" in an iframe without first moving the focus to the iframe?

Flags: needinfo?(mzehe)

Use the arrow keys as if you were in NotePad, then when NVDA speaks the item you want to click on, even if it doesn't have focus, press Enter.

Flags: needinfo?(mzehe)

(In reply to Marco Zehe (:MarcoZ) from comment #13)

Use the arrow keys as if you were in NotePad, then when NVDA speaks the item you want to click on, even if it doesn't have focus, press Enter.

For me, the arrow keys move focus, so by the time I have an opportunity to "click" the button, the focus is already there.

In NVDA's preferences, Browse Mode (or Reading Mode), there is a setting at the bottom that reads "Automatically set focus to focusable items" or similar. It is the last checkbox above the OK and Cancel buttons. In new installations, this should be off by default, but if it is not, uncheck this and try again. Then, NVDA should not move the focus unless you try to activate an element, at which point the mouse click will do it.

(In reply to Marco Zehe (:MarcoZ) from comment #15)

In NVDA's preferences, Browse Mode (or Reading Mode), there is a setting at the bottom that reads "Automatically set focus to focusable items" or similar.

I didn't find a checkbox with that label. I unchecked both focus-related boxes, but arrow keys kept moving focus.

Are you sure you are on NVDA 2020.2? Go to NVDA menu, Help, Check for Updates. You may already have NVDA installed from earlier and not be up to date, then.

Marco's suggestion is where I was headed, too. If that setting is missing, you have an old version of NVDA.

There is an extra dimension of complication here, though. There are two kinds of clicks you might end up with:

  1. NVDA asks Gecko a11y to perform default action, so Gecko a11y synthesises a click internally.
  2. If there's no default action, NVDA synthesises a click itself.

Here are test cases for both of these:

Gecko a11y synthesis:
data:text/html,<button onclick="this.textContent = 'Default action performed';">Perform default action</button>

NVDA synthesis:
data:text/html,<button onclick="this.firstElementChild.textContent = 'Clicked';"><div>Click</div>

Flags: needinfo?(jteh)

(In reply to Marco Zehe (:MarcoZ) from comment #17)

Are you sure you are on NVDA 2020.2? Go to NVDA menu, Help, Check for Updates. You may already have NVDA installed from earlier and not be up to date, then.

I installed NVDA yesterday after you suggested it. I installed 2020.2 on a virtual machine that had never had NVDA installed. About says 2020.2 and Check for Updates doesn't find updates.

(In reply to James Teh [:Jamie] from comment #18)

Here are test cases for both of these:

Thanks.

Ah. Looks like this was still in the Advanced settings in 2020.2. It moved to the Browse Mode settings in 2020.3beta1.

Thanks. I have so far tested the case that I believe to match the "Gecko a11y" synthesis case. In that case, both with and without Fission the focus event fires before mousedown. This suggests that even when NVDA is configured not to move focus on arrow keys, it moves focus to the iframe immediately before the synthetic click. Does that match what you expect it to do?

(The time from mousedown to mouseup is 3 milliseconds.)

Flags: needinfo?(jteh)

If I'm actually getting Gecko synthesis and NVDA synthesis in the cases where I expect, it seems that the latter is generally longer in duration but both can go as fast as 1 ms.

I wonder if Windows Speech Recognition actually invokes Gecko synthesis, too. Where in the source code do we perform the Gecko a11y synthesis? I'd like to try adding a delay to it.

The base implementation is here. Several derived accessible classes do specialized versions if needed as overrides. But this is in general what you need to look for.

Thanks. It turns out that currently there's not even an event loop spin between the events: https://searchfox.org/mozilla-central/rev/f4b4008f5ee00f5afa3095f48c54f16828e4b22b/accessible/generic/Accessible.cpp#1893

Try run to generate a Windows x86_64 build with 30 ms delay for a11y-generated mouseup:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=09580ac16969190c6217167ab86abd6fa247ad9b

Looks like Windows Speech Recognition does not use Gecko a11y to generate the click.

(In reply to Henri Sivonen (:hsivonen) from comment #25)

Try run to generate a Windows x86_64 build with 30 ms delay for a11y-generated mouseup:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=09580ac16969190c6217167ab86abd6fa247ad9b

Patch uploaded to bug 1665630.

(In reply to Henri Sivonen (:hsivonen) from comment #21)

In that case, both with and without Fission the focus event fires before mousedown. This suggests that even when NVDA is configured not to move focus on arrow keys, it moves focus to the iframe immediately before the synthetic click. Does that match what you expect it to do?

Oh. Sorry, I forgot about that, but yes, that's correct. (There was a point in time where that didn't happen, but it got changed for various reasons.)

(In reply to Henri Sivonen (:hsivonen) from comment #24)

Thanks. It turns out that currently there's not even an event loop spin between the events: https://searchfox.org/mozilla-central/rev/f4b4008f5ee00f5afa3095f48c54f16828e4b22b/accessible/generic/Accessible.cpp#1893

Even if we go with bug 1665630, aren't we still going to have problems with the mouse clicks synthesised by a11y clients?

Flags: needinfo?(jteh)

(In reply to James Teh [:Jamie] from comment #28)

(In reply to Henri Sivonen (:hsivonen) from comment #21)

In that case, both with and without Fission the focus event fires before mousedown. This suggests that even when NVDA is configured not to move focus on arrow keys, it moves focus to the iframe immediately before the synthetic click. Does that match what you expect it to do?

Oh. Sorry, I forgot about that, but yes, that's correct. (There was a point in time where that didn't happen, but it got changed for various reasons.)

OK. So then this bug isn't relevant to NVDA.

(In reply to Henri Sivonen (:hsivonen) from comment #24)

Thanks. It turns out that currently there's not even an event loop spin between the events: https://searchfox.org/mozilla-central/rev/f4b4008f5ee00f5afa3095f48c54f16828e4b22b/accessible/generic/Accessible.cpp#1893

Even if we go with bug 1665630, aren't we still going to have problems with the mouse clicks synthesised by a11y clients?

I still haven't ruled out problems with Windows Speech Recognition. Also, I still don't have proof of any of this actually being a Web compat problem.

Regarding other browsers:
I tested Chrome and new Edge with Windows Speech Recognition.

When things were not buggy and the overlay numbers showed up, the events happened in the order mousedown, focus, mouseup and the delay from mousedown to mouseup was about 14 ms.

This suggests that Chromium itself might do the thing that smaug suggested and delay processing of mouseup. Or, alternatively, Chromium has some accidental slowness to the same effect.

When things were buggy and the number overlays didn't show up and instead Windows Speech Recognition just went ahead and clicked one of the buttons, the events were reordered.

Firefox also shows a delay when the focusing happens in Fission. It only see the delay going to 1 ms when the OOP iframe already has focus.

I don't understand why, but since I can't demonstrate an actual problem with Windows Speech Recognition, I'm going to approach this as a test fix only.

Attached image focus.png

Hello Henri, so I was able to follow the steps from Comment 5 and I will attach a picture with the outcome. This was after some tweakings, as there were times it won't understand what to do, or times where Firefox just hanged and needed to be forcibly shut down, but finally succeeded in getting an outcome there.

This was done on Firefox 82.0a1 (2020-09-15) under Windows 10 (v2004).

Please let me know if there is anything else that I can help here with.

Flags: needinfo?(camelia.badau)

(In reply to Catalin Sasca, QA [:csasca] from comment #33)

Hello Henri, so I was able to follow the steps from Comment 5 and I will attach a picture with the outcome.

Thank you.

So far, even if measurements suggest that things should be able to go wrong with Windows Speech Recognition, we haven't witnessed them going wrong. Given this, I think we shouldn't add risky code to address a problem that we haven't actually seen occur.

According to Henri, "Since we’ve been unable to witness the problem that measurements indicated we should expect, I intend to change our test harness and a11y code and not our event queue or focus IPC."
So, moving to M7.

Fission Milestone: M6c → M7
Priority: -- → P3

smaug pointed out synthetic clicks on tap. I need to investigate the timing of those.

(In reply to Henri Sivonen (:hsivonen) from comment #38)

smaug pointed out synthetic clicks on tap. I need to investigate the timing of those.

Test case: https://hsivonen.fi/tap-click.html

The synthetic mouse events start after touchend, so there is no need to squeeze them in before touchend. While Safari makes the synthetic click really short, Chrome does not, which suggests that it would be OK for us to make the duration of the synthetic click longer than it is now.

If we still want to delay mouseup for tap-generated clicks, the right places seems to be:
https://searchfox.org/mozilla-central/rev/16d30bafd4e5276d6d3c632fb52a6c71e739cc44/gfx/layers/apz/util/APZCCallbackHelper.cpp#539

Does a 30-millisecond delay in mouseup/click after tap cause perceptible lag, though?

smaug, do you have an opinion on whether I should add a delay to tap-generated mouseup or whether I should go back to the idea of temporarily stalling the user event queue inside the process hosting the iframe?

Flags: needinfo?(bugs)

(In reply to Henri Sivonen (:hsivonen) from comment #39)

Test case: https://hsivonen.fi/tap-click.html

I tested on my Windows 10 laptop with a touch screen. I see the same event order in Chrome, Chromium Edge, Firefox e10s, and Firefox Fission (whether clicking or tapping the "Button" button): mousedown, focus, mouseup, click. I don't see any touchstart or touchend events.

But I do see touchstart and touchend events generated in these w3schools demos of touch events:

https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_touchstart
https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_touchend

(In reply to Chris Peterson [:cpeterson] from comment #41)

(In reply to Henri Sivonen (:hsivonen) from comment #39)

Test case: https://hsivonen.fi/tap-click.html

I tested on my Windows 10 laptop with a touch screen. I see the same event order in Chrome, Chromium Edge, Firefox e10s, and Firefox Fission (whether clicking or tapping the "Button" button): mousedown, focus, mouseup, click.

Thanks.

Neha, I take it that this counts as us not having been able to repro this as a problem that requires a user-facing fix. I.e. let's move forward with a test-only change again?

I don't see any touchstart or touchend events.

If this state of things remained after reloading the test case, this is really strange and doesn't make sense in the light of kats saying on Matrix that this should work the same way as on Android: i.e. Gecko--not Windows--generating the click.

Flags: needinfo?(bugs) → needinfo?(nkochar)

tnikkel got the same result as cpeterson.

kats solved the touch event mystery: bug 1412485 comment 8. I've updated the test case accordingly.

And regarding mousedown, focus, and mouseup, kats saw the same as cpeterson and tnikkel.

Yes, if this not reproducible in a user-noticeable way, let's only fix our tests.

Flags: needinfo?(nkochar)

It seems to be easy to have a case where this is noticeable. If mousedown listener just happens to take a tiny bit more time
http://mozilla.pettay.fi/moztests/focus_parent.html
And it is not rare to have mousedown which does something heavy.

(In reply to Olli Pettay [:smaug] from comment #47)

It seems to be easy to have a case where this is noticeable. If mousedown listener just happens to take a tiny bit more time
http://mozilla.pettay.fi/moztests/focus_parent.html
And it is not rare to have mousedown which does something heavy.

So we're back to intentionally stalling the input event queue? Edgar, did you already write code for that in another bug?

Flags: needinfo?(echen)

(In reply to Henri Sivonen (:hsivonen) from comment #48)

So we're back to intentionally stalling the input event queue? Edgar, did you already write code for that in another bug?

We already support suspending the input event queue, see https://searchfox.org/mozilla-central/rev/c37038c592a352eda0f5e77dfb58c4929bf8bcd3/dom/ipc/ContentParent.cpp#3081.
But I am not sure if that is something related to the input event queue given that I still see the same thing after disabling the input event queue, i.e. set pref input_event_queue.supported to false.

Flags: needinfo?(echen)

That's in the parent. I was thinking of making the child not take new events from the input event queue until unsuspended.

(In reply to Henri Sivonen (:hsivonen) from comment #48)
Edgar, did you already write code for that in another bug?

(In reply to Henri Sivonen (:hsivonen) from comment #50)
That's in the parent. I was thinking of making the child not take new events from the input event queue until unsuspended.

I see, then no. This is different from my case in another bug.

This simplest patch that could possible work fixes a simple test case, but let's see what it breaks:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6a4dfa3f60c9135371726826333ca0a5f4ca6195

Well, as expected, that turned out to be orange.

Attachment #9188793 - Attachment description: Bug 1660805 - In content processes, run window raising to completion synchronously to ensure correct order of focus and mouseup when focusing an OOP iframe by click. → Bug 1660805 - In OOP iframes, run window raising to completion synchronously to ensure correct order of focus and mouseup when focusing an OOP iframe by click.
Attachment #9182405 - Attachment is obsolete: true
Blocks: 1653457
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3e80dd9589d9
In OOP iframes, run window raising to completion synchronously to ensure correct order of focus and mouseup when focusing an OOP iframe by click. r=nika
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch
QA Whiteboard: [qa-85b-p2]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: