Closed Bug 1842170 Opened 2 years ago Closed 1 year ago

When "Confirm before closing multiple tabs" is enabled, right-clicking the taskbar item and selecting "Close window" causes Firefox to become unresponsive

Categories

(Core :: Widget: Win32, defect, P3)

Desktop
Windows
defect

Tracking

()

VERIFIED FIXED
118 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox-esr115 119+ verified
firefox115 --- wontfix
firefox116 --- wontfix
firefox117 --- wontfix
firefox118 --- verified

People

(Reporter: Tom25519, Assigned: rkraesig)

References

(Regression)

Details

(Keywords: hang, nightly-community, regression)

Attachments

(2 files)

  1. Enable "Confirm before closing multiple tabs"
  2. Open several tabs
  3. Right-click taskbar, and click "Close window", repeat several times
  4. Couldn't operate Firefox UI

The Bugbug bot thinks this bug should belong to the 'Firefox::Tabbed Browser' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Tabbed Browser

(In reply to Tom25519 from comment #0)

  1. Enable "Confirm before closing multiple tabs"
  2. Open several tabs
  3. Right-click taskbar, and click "Close window", repeat several times
  4. Couldn't operate Firefox UI

For the last point, are you saying you can't switch to the Firefox window and confirm the "close multiple tabs" dialog? Can you make a screenshot or screencast to help us understand the problem?

Flags: needinfo?(Tom25519)
Attached video 2023-07-12 12-30-51.mp4
Flags: needinfo?(Tom25519)
Severity: -- → S3
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P3

STR:

  1. Start Browser w/ new profile
  2. Turn on Confirm before closing multiple tabs from about:preferences#general
  3. Restart Browser
  4. Close Make primary browser popup, if any.
  5. Open Several tabs
  6. Right click on taskbar button
  7. Select Close window
  8. Click on Close Tabs button on the dialog

I can reproduce this issue on Nightly117.0a1 Windows10.

Once I click on the [Close Tab] button in the dialog, the browser remains stuck and will not operate any further mouse or keyboard operations.
I must kill it from Windows Task Manager.

Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=243ca18dc17200998c0c8d21979c15fb930e42fb&tochange=7903ed468b9ccbb72574bee396d0588325387502

Suspect: Bug 1732517

Component: Tabbed Browser → Widget: Win32
Product: Firefox → Core
Regressed by: 1732517

I don't see how bug 1732517 could be the cause; it should only affect fullscreen windows, and even then, doesn't touch event handling... but the other commits in that regression range look even less likely.


Note that an alternate, and quite surprising, exit mechanism exists:

  1. Right click on taskbar button
  2. Under Tasks, select Open new window
  3. In the new window, select Quit from the application menu (or press Ctrl+Shift+Q)

The product::component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit BugBot documentation.

Priority: P3 → --

Firefox profile was captured from a second window that had been opened beforehand. https://share.firefox.dev/3NRB7o4

Alice: how did you get a regression window here? When I run in mozregression, step 7 always closes the window, no matter the setting of browser.tabs.warnOnClose (or anything else, as far as I can tell).

Flags: needinfo?(alice0775)

(In reply to Ray Kraesig [:rkraesig] from comment #8)

Alice: how did you get a regression window here? When I run in mozregression, step 7 always closes the window, no matter the setting of browser.tabs.warnOnClose (or anything else, as far as I can tell).

I did not use mozregression.
I used locally saved builds from https://firefox-ci-tc.services.mozilla.com/tasks/index/gecko.v2.autoland.pushdate.2022.06. However, those builds have already expired and been erased from server.

Flags: needinfo?(alice0775)

On further investigation, I don't think anything in that patch could have resulted in this behavior. Removing as regression, at least for now.

This would be S2, but if it's been latent and unnoticed for a year, it's probably an uncommonly-used-enough UI path that S3 will do.

Priority: -- → P3
No longer regressed by: 1732517
Summary: When enable "Confirm before closing multiple tabs", couldn't operate Firefox UI if right-click taskbar, and click "Close window" several times → When "Confirm before closing multiple tabs" is enabled, right-clicking the taskbar item and selecting "Close window" causes Firefox to become unresponsive

Regression triage: Let's run another regression window to see what actually regressed this behavior.

In the bad build of the window of comment#4, setting widget.windows.alternate_fullscreen_heuristics to false fix the issue. (the pref was added by Bug 1732517. And it is no longer available as it was removed due to Bug 1835851.)

So, I think this is regressed by Bug 1732517.

(In reply to Alice0775 White from comment #12)

In the bad build of the window of comment#4, setting widget.windows.alternate_fullscreen_heuristics to false fix the issue.

Well! Yes, that's definitely strong evidence that bug 1732517 is the cause. Restoring, and investigating further.

Regressed by: 1732517

Well, this is awful. Bug 1732517 surfaces the problem, but it was latent, and might have occurred at any point.

Background information: if a thread with a message queue makes certain blocking calls, the Windows kernel may call back into that thread while it is blocked, directly invoking a WndProc to process one or more messages (via the pseudofunction KiUserCallbackDispatch). This is required behavior for some calls: for example, we may react to a WM_KEYDOWN message for F11 by resizing the window via ::SetWindowPos, which itself requires the processing of a number of messages such as WM_WINDOWPOSCHANGING.

There is no documentation anywhere describing which Windows functions may result in reentrant calls to WindowProc, and so WindowProc needs to be safely reentrant at all times. To modern sensibilities, this may seem a poor architectural decision on Microsoft's part; but a) this dates back to the Windows 3.0 era and isn't going anywhere anytime soon, and b) as shall shortly be seen, we're in no position to throw stones.

The sequence of events upon clicking Close Window is approximately as follows:

  1. We receive a WM_ACTIVATE message. This is forwarded to DefWindowProc, which invokes the kernel and blocks.
  2. The kernel steals the stack to get us to process a WM_SETFOCUS message.
  3. While handling the WM_SETFOCUS message, we call (via TaskbarConcealer) the method ITaskbarList2::MarkFullscreen(), which internally sends a Windows message to the shell via ::SendMessageW(), which blocks until the receiving thread has processed the message.
  4. While ::SendMessageW is blocking, the kernel steals the stack again to get us to process a WM_SYSCOMMAND[SC_CLOSE] message.
  5. Processing the WM_SYSCOMMAND[SC_CLOSE] message invokes JavaScript, eventually calling warnOnTabsClosing, which calls openPromptSync, which calls SpinEventLoopUntil, which will not return until the prompt is clicked.

However, the prompt can't be clicked because we never receive an WM_LBUTTONDOWN message — because we never finished processing the WM_ACTIVATE or WM_SETFOCUS messages, so we don't have focus, and Windows won't send us input events!


One wouldn't ordinarily expect the kernel to make us process a WM_SYSCOMMAND[SC_CLOSE] message while we're still processing the WM_ACTIVATE message — it's not logically related nor required. This is at least plausibly a result of a deadlock-avoidance strategy: the WM_SYSCOMMAND[SC_CLOSE] is coming from the Windows shell, while the undocumented message in ITaskbarList2::MarkFullscreen() is going to the Windows shell. If two threads ::SendMessage() each other at the same time, they'll deadlock, and bumping one of those messages past the message queue might solve that. Presumably simply changing either of these to use ::PostMessage() would resolve the issue — neither one really seems to need to block on the other's result — but both of those are in Microsoft-side code, so that's not a reasonable option.

But, of course, well-behaved Windows programs should not do things like SpinEventLoopUntil at all (as is well known). warnOnTabsClosing should really be async, and use async nsIPromptService methods (which exist, as of bug 1621737 — though may not presently work; see bug 1751953). A proper fix would involve asyncifying an entire chain of calls all the way up to at least AppWindow::RequestWindowClose.

A cheap hack, on the other hand, could get away with merely tracking the current recursion depth of our WindowProc and requeuing the WM_SYSCOMMAND[SC_CLOSE] message if we're in a nested event loop. A quick-and-dirty test of that approach does seem to work.

Based on Comment 12 and Comment 13, I'm removing the regressionwindow-wanted keyword.

If we enter SpinEventLoopUntil while certain Windows events are
waiting for resolution, we may enter an unusable state.

Since removing all SpinEventLoopUntil instances -- or even the one
causing this problem -- isn't really an option at present, kick the can
down the road by kicking the can down the road [sic].

Assignee: nobody → rkraesig
Status: NEW → ASSIGNED
Pushed by rkraesig@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/195953c1543e attempt to avoid nested event loops at inconvenient times r=win-reviewers,mhowell
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 118 Branch
Flags: qe-verify+

Reproduced with Fx 117.0a1 (2023-07-12) on Windows 10.
Verified fixed with Fx 118.0b4 and Fx 119.0a1 (2023-09-04) on Windows 10.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Is this something we should uplift to ESR115? It grafts cleanly.

Flags: needinfo?(rkraesig)

Probably a good idea. If this had caused any issues I think we'd have seen them by now.

Flags: needinfo?(rkraesig)

Ah, wait, this hasn't actually hit Release yet -- when I said that, I thought it had landed in 117.

I'd rather wait until 118 has been the current Release for a bit, but I suppose I don't have a strong objection to uplifting, if you'd still rather.

Flags: needinfo?(ryanvm)

Waiting another cycle is fine by me. I've tagged the bug as such. Feel free to nominate whenever you're comfortable doing so.

Flags: needinfo?(ryanvm)

Is this ready for an ESR approval request?

Flags: needinfo?(rkraesig)

Comment on attachment 9348502 [details]
Bug 1842170 - attempt to avoid nested event loops at inconvenient times r?handyman,cmartin,gstoll

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Annoying bug with a low-risk patch.
  • User impact if declined: Possible frustration when attempting to close a Firefox window from the taskbar.
  • Fix Landed on Version: 118
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Not believed risky: small, grafts cleanly, no problems reported along its train-ride to 118 Release.
Flags: needinfo?(rkraesig)
Attachment #9348502 - Flags: approval-mozilla-esr115?

Comment on attachment 9348502 [details]
Bug 1842170 - attempt to avoid nested event loops at inconvenient times r?handyman,cmartin,gstoll

Approved for 115.4esr.

Attachment #9348502 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+

Verified fixed with Fx 115.4.0esr on Windows 10.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: