When "Confirm before closing multiple tabs" is enabled, right-clicking the taskbar item and selecting "Close window" causes Firefox to become unresponsive
Categories
(Core :: Widget: Win32, defect, P3)
Tracking
()
People
(Reporter: Tom25519, Assigned: rkraesig)
References
(Regression)
Details
(Keywords: hang, nightly-community, regression)
Attachments
(2 files)
448.42 KB,
video/mp4
|
Details | |
48 bytes,
text/x-phabricator-request
|
dmeehan
:
approval-mozilla-esr115+
|
Details | Review |
- Enable "Confirm before closing multiple tabs"
- Open several tabs
- Right-click taskbar, and click "Close window", repeat several times
- Couldn't operate Firefox UI
Comment 1•2 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Firefox::Tabbed Browser' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•2 years ago
|
||
(In reply to Tom25519 from comment #0)
- Enable "Confirm before closing multiple tabs"
- Open several tabs
- Right-click taskbar, and click "Close window", repeat several times
- Couldn't operate Firefox UI
For the last point, are you saying you can't switch to the Firefox window and confirm the "close multiple tabs" dialog? Can you make a screenshot or screencast to help us understand the problem?
Updated•2 years ago
|
Comment 4•2 years ago
•
|
||
STR:
- Start Browser w/ new profile
- Turn on
Confirm before closing multiple tabs
from about:preferences#general - Restart Browser
- Close
Make primary browser popup
, if any. - Open Several tabs
- Right click on taskbar button
- Select
Close window
- Click on
Close Tabs
button on the dialog
I can reproduce this issue on Nightly117.0a1 Windows10.
Once I click on the [Close Tab] button in the dialog, the browser remains stuck and will not operate any further mouse or keyboard operations.
I must kill it from Windows Task Manager.
Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=243ca18dc17200998c0c8d21979c15fb930e42fb&tochange=7903ed468b9ccbb72574bee396d0588325387502
Suspect: Bug 1732517
Updated•2 years ago
|
Assignee | ||
Comment 5•2 years ago
|
||
I don't see how bug 1732517 could be the cause; it should only affect fullscreen windows, and even then, doesn't touch event handling... but the other commits in that regression range look even less likely.
Note that an alternate, and quite surprising, exit mechanism exists:
- Right click on taskbar button
- Under
Tasks
, selectOpen new window
- In the new window, select
Quit
from the application menu (or pressCtrl+Shift+Q
)
Comment 6•2 years ago
|
||
The product::component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit BugBot documentation.
Comment 7•2 years ago
|
||
Firefox profile was captured from a second window that had been opened beforehand. https://share.firefox.dev/3NRB7o4
Assignee | ||
Comment 8•2 years ago
|
||
Alice: how did you get a regression window here? When I run in mozregression, step 7 always closes the window, no matter the setting of browser.tabs.warnOnClose
(or anything else, as far as I can tell).
Comment 9•2 years ago
|
||
(In reply to Ray Kraesig [:rkraesig] from comment #8)
Alice: how did you get a regression window here? When I run in mozregression, step 7 always closes the window, no matter the setting of
browser.tabs.warnOnClose
(or anything else, as far as I can tell).
I did not use mozregression.
I used locally saved builds from https://firefox-ci-tc.services.mozilla.com/tasks/index/gecko.v2.autoland.pushdate.2022.06. However, those builds have already expired and been erased from server.
Updated•2 years ago
|
Assignee | ||
Comment 10•2 years ago
•
|
||
On further investigation, I don't think anything in that patch could have resulted in this behavior. Removing as regression, at least for now.
This would be S2, but if it's been latent and unnoticed for a year, it's probably an uncommonly-used-enough UI path that S3 will do.
Comment 11•1 years ago
|
||
Regression triage: Let's run another regression window to see what actually regressed this behavior.
Comment 12•1 years ago
•
|
||
In the bad build of the window of comment#4, setting widget.windows.alternate_fullscreen_heuristics to false fix the issue. (the pref was added by Bug 1732517. And it is no longer available as it was removed due to Bug 1835851.)
So, I think this is regressed by Bug 1732517.
Assignee | ||
Comment 13•1 years ago
•
|
||
(In reply to Alice0775 White from comment #12)
In the bad build of the window of comment#4, setting widget.windows.alternate_fullscreen_heuristics to false fix the issue.
Well! Yes, that's definitely strong evidence that bug 1732517 is the cause. Restoring, and investigating further.
Updated•1 years ago
|
Assignee | ||
Comment 14•1 years ago
|
||
Well, this is awful. Bug 1732517 surfaces the problem, but it was latent, and might have occurred at any point.
Background information: if a thread with a message queue makes certain blocking calls, the Windows kernel may call back into that thread while it is blocked, directly invoking a WndProc
to process one or more messages (via the pseudofunction KiUserCallbackDispatch
). This is required behavior for some calls: for example, we may react to a WM_KEYDOWN
message for F11 by resizing the window via ::SetWindowPos
, which itself requires the processing of a number of messages such as WM_WINDOWPOSCHANGING
.
There is no documentation anywhere describing which Windows functions may result in reentrant calls to WindowProc
, and so WindowProc
needs to be safely reentrant at all times. To modern sensibilities, this may seem a poor architectural decision on Microsoft's part; but a) this dates back to the Windows 3.0 era and isn't going anywhere anytime soon, and b) as shall shortly be seen, we're in no position to throw stones.
The sequence of events upon clicking Close Window is approximately as follows:
- We receive a
WM_ACTIVATE
message. This is forwarded toDefWindowProc
, which invokes the kernel and blocks. - The kernel steals the stack to get us to process a
WM_SETFOCUS
message. - While handling the
WM_SETFOCUS
message, we call (viaTaskbarConcealer
) the methodITaskbarList2::MarkFullscreen()
, which internally sends a Windows message to the shell via::SendMessageW()
, which blocks until the receiving thread has processed the message. - While
::SendMessageW
is blocking, the kernel steals the stack again to get us to process aWM_SYSCOMMAND[SC_CLOSE]
message. - Processing the
WM_SYSCOMMAND[SC_CLOSE]
message invokes JavaScript, eventually callingwarnOnTabsClosing
, which callsopenPromptSync
, which callsSpinEventLoopUntil
, which will not return until the prompt is clicked.
However, the prompt can't be clicked because we never receive an WM_LBUTTONDOWN
message — because we never finished processing the WM_ACTIVATE
or WM_SETFOCUS
messages, so we don't have focus, and Windows won't send us input events!
One wouldn't ordinarily expect the kernel to make us process a WM_SYSCOMMAND[SC_CLOSE]
message while we're still processing the WM_ACTIVATE
message — it's not logically related nor required. This is at least plausibly a result of a deadlock-avoidance strategy: the WM_SYSCOMMAND[SC_CLOSE]
is coming from the Windows shell, while the undocumented message in ITaskbarList2::MarkFullscreen()
is going to the Windows shell. If two threads ::SendMessage()
each other at the same time, they'll deadlock, and bumping one of those messages past the message queue might solve that. Presumably simply changing either of these to use ::PostMessage()
would resolve the issue — neither one really seems to need to block on the other's result — but both of those are in Microsoft-side code, so that's not a reasonable option.
But, of course, well-behaved Windows programs should not do things like SpinEventLoopUntil
at all (as is well known). warnOnTabsClosing
should really be async
, and use async nsIPromptService
methods (which exist, as of bug 1621737 — though may not presently work; see bug 1751953). A proper fix would involve asyncifying an entire chain of calls all the way up to at least AppWindow::RequestWindowClose
.
A cheap hack, on the other hand, could get away with merely tracking the current recursion depth of our WindowProc
and requeuing the WM_SYSCOMMAND[SC_CLOSE]
message if we're in a nested event loop. A quick-and-dirty test of that approach does seem to work.
Comment 15•1 years ago
|
||
Based on Comment 12 and Comment 13, I'm removing the regressionwindow-wanted keyword.
Assignee | ||
Comment 16•1 year ago
|
||
If we enter SpinEventLoopUntil
while certain Windows events are
waiting for resolution, we may enter an unusable state.
Since removing all SpinEventLoopUntil
instances -- or even the one
causing this problem -- isn't really an option at present, kick the can
down the road by kicking the can down the road [sic].
Updated•1 year ago
|
Comment 17•1 year ago
|
||
Comment 18•1 year ago
|
||
bugherder |
Updated•1 year ago
|
Comment 19•1 year ago
|
||
Reproduced with Fx 117.0a1 (2023-07-12) on Windows 10.
Verified fixed with Fx 118.0b4 and Fx 119.0a1 (2023-09-04) on Windows 10.
Comment 20•1 year ago
|
||
Is this something we should uplift to ESR115? It grafts cleanly.
Assignee | ||
Comment 21•1 year ago
|
||
Probably a good idea. If this had caused any issues I think we'd have seen them by now.
Assignee | ||
Comment 22•1 year ago
|
||
Ah, wait, this hasn't actually hit Release yet -- when I said that, I thought it had landed in 117.
I'd rather wait until 118 has been the current Release for a bit, but I suppose I don't have a strong objection to uplifting, if you'd still rather.
Comment 23•1 year ago
|
||
Waiting another cycle is fine by me. I've tagged the bug as such. Feel free to nominate whenever you're comfortable doing so.
Assignee | ||
Comment 25•1 year ago
|
||
Comment on attachment 9348502 [details]
Bug 1842170 - attempt to avoid nested event loops at inconvenient times r?handyman,cmartin,gstoll
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Annoying bug with a low-risk patch.
- User impact if declined: Possible frustration when attempting to close a Firefox window from the taskbar.
- Fix Landed on Version: 118
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Not believed risky: small, grafts cleanly, no problems reported along its train-ride to 118 Release.
Comment 26•1 year ago
|
||
uplift |
Comment 27•1 year ago
|
||
Comment on attachment 9348502 [details]
Bug 1842170 - attempt to avoid nested event loops at inconvenient times r?handyman,cmartin,gstoll
Approved for 115.4esr.
Updated•1 year ago
|
Comment 28•1 year ago
|
||
Verified fixed with Fx 115.4.0esr on Windows 10.
Description
•