[meta] Asynchronous, out-of-process Win32 file picker
Categories
(Core :: Widget: Win32, task, P2)
Tracking
()
People
(Reporter: bugzilla, Assigned: rkraesig)
References
(Depends on 3 open bugs, Blocks 4 open bugs)
Details
(Keywords: meta, Whiteboard: [bhr:CFileOpenSave::Show][win:stability])
Attachments
(6 files)
658 bytes,
patch
|
Details | Diff | Splinter Review | |
943 bytes,
patch
|
Details | Diff | Splinter Review | |
8.86 KB,
patch
|
Details | Diff | Splinter Review | |
26.26 KB,
patch
|
Details | Diff | Splinter Review | |
19.77 KB,
patch
|
Details | Diff | Splinter Review | |
5.65 KB,
patch
|
Details | Diff | Splinter Review |
I've got a plan for moving the COM bits of nsFilePicker
out-of-process. The benefits are twofold:
- It serves as a DLL injection mitigation, as pickers load shell extensions;
- It gives us the opportunity to show pickers asynchronously, which keeps our parent process's main thread (mostly) active.
We can pull this off because Windows provides proxy/stubs for the IFileOpenDialog
, IFileSaveDialog
, and IFileDialog
interfaces. We can create our own implementation of these interfaces that lives out of process, and forwards calls to the "real" object implementation.
Note that for the async stuff we actually need to generate our own async interface, but I think that it's worth it for the responsiveness gains we get on the main thread.
Comment 1•3 years ago
|
||
Are you really doing this now? Remind me when we meet at the next all hands because this needs a serious toast and the tab's on me.
Reporter | ||
Comment 2•3 years ago
|
||
I've landed the prerequisites :-)
Reporter | ||
Comment 3•3 years ago
|
||
Hey Jamie, would you mind trying out this try build to see whether or not I broke file picker a11y? Please note that you will need to run its installer as it registers some components.
Comment 5•3 years ago
|
||
I tried quite hard, but I couldn't break it. Nicely done.
I was mostly worried about focus not being restored correctly when you dismiss a file picker, but it does get restored correctly.
Reporter | ||
Comment 6•3 years ago
|
||
Here are some profiles with this patchset:
Preffed off (ie, still runs synchronously)
Preffed on
Comment 7•2 years ago
|
||
Toshi, can we grab Aaron's WIP before they disappear from the try server? (And attach them to this bug)
Comment 8•2 years ago
|
||
Comment 9•2 years ago
|
||
Comment 10•2 years ago
|
||
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 11•2 years ago
|
||
Comment 12•2 years ago
|
||
Comment 13•2 years ago
|
||
Comment 14•2 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #7)
Toshi, can we grab Aaron's WIP before they disappear from the try server? (And attach them to this bug)
I was curious to find out how to do this when I saw this request come through bugmail. It was easier than expected. Toshi, I figured I'd just attach the patches instead of making you go through the same excercise.
Comment 15•2 years ago
|
||
(In reply to Stephen A Pohl [:spohl] from comment #14)
I was curious to find out how to do this when I saw this request come through bugmail. It was easier than expected. Toshi, I figured I'd just attach the patches instead of making you go through the same excercise.
Thank you for doing it. I was thinking of getting commits with hg pull
, rebasing, and uploading them to phabricator.
Comment 16•2 years ago
|
||
(In reply to Toshihito Kikuchi [:toshi] from comment #15)
Thank you for doing it. I was thinking of getting commits with
hg pull
, rebasing, and uploading them to phabricator.
Good point. If we end up sending these patches for review (rather than writing new patches based on Aaron's work), then yes, that might be the better option and we can remove the attached patches in bugzilla.
Reporter | ||
Comment 17•2 years ago
|
||
I've still got the patchset, which is probably more up to date than whatever was on try. Let me see if I can dig them up.
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 18•2 years ago
|
||
(In reply to (No longer employed by Mozilla) Aaron Klotz from comment #17)
I've still got the patchset, which is probably more up to date than whatever was on try. Let me see if I can dig them up.
Any luck?
Comment 19•2 years ago
|
||
(In reply to Ray Kraesig [:rkraesig] from comment #18)
(In reply to (No longer employed by Mozilla) Aaron Klotz from comment #17)
I've still got the patchset, which is probably more up to date than whatever was on try. Let me see if I can dig them up.
Any luck?
Let's proceed under the assumption that the answer is no, and adjust any patches if Aaron comes through after all.
Updated•2 years ago
|
Updated•2 years ago
|
Updated•1 year ago
|
Comment 20•1 year ago
|
||
Ray, are you still working on this? Could you provide a status update on what's missing before we can land?
Assignee | ||
Comment 21•1 year ago
|
||
I'm afraid this has almost completely fallen by the wayside. (Although the recent commentary on bug 1798791 got me to poke at my notes a bit.)
The conclusion I came to some months ago was that :aklotz's original idea of using the system DLL surrogate resulted in too much complexity at both install and (outside of the happy path) invocation time due to tying us to the global registration mechanism of COM. (See, relatedly, what happened with AccessibleHandler.dll in bug 1670147.) While the problems there are solvable, it would be simpler just to have our own custom host-process in the Firefox installation directory -- and if we have that, there's not much reason to use COM at all.
I've done some initial investigation into using either a) an auxiliary Firefox process or b) a custom host process. Both appear feasible, and I vaguely prefer b), but the metaphorical shovel has not yet struck earth.
Removing myself as the assignee because of that. Feel free to reassign me to it, though, and I'll requeue it and reprioritize things.
Assignee | ||
Comment 22•1 year ago
|
||
Removing myself as the assignee because of that. Feel free to reassign me to it, though, and I'll requeue it and reprioritize things.
Taking, because I'm working on this again.
Assignee | ||
Comment 23•5 months ago
|
||
(Bug 1683425 appears to have been a blocker for :aklotz's externally-hosted COM implementation, but does not block the current XPIDL-based approach.)
Assignee | ||
Updated•4 months ago
|
Assignee | ||
Updated•4 months ago
|
Updated•2 months ago
|
Updated•2 months ago
|
Comment 24•23 days ago
|
||
Hi :rkraesig, I was trying to understand the status of this bug. Given it is a meta-bug, I'd have expected to see some open bugs it depends on, but that seems not to be the case. And FWIW I do not see this bug anymore as top-bug in the background hang monitor data - does this mean it's fixed?
Assignee | ||
Comment 25•23 days ago
|
||
The status is that it's currently in Nightly and is pref-blocked from riding the trains. This is partly to flush out papercut bugs, and partly because we — well, I — want to reduce the failures we're seeing before letting it go out. The latter is blocked on getting more information about failures via telemetry; and getting that telemetry in has been hovering at the top of my personal to-do list for a while, but it keeps getting pushed out by P2s. There should indeed be some open bugs it depends on; I just haven't been very good about linking bugs to it. I'll see about doing that.
On the other hand, the fact that this issue ever showed up in the background hang monitor data was probably an issue with the hang-detector itself — or alternatively, that I'm not understanding what it's detecting. (The latter is quite likely, since I don't think I've ever seen that dashboard before.) The stack from its entry on 2023-10-01 suggests that a) the process wasn't actually "hung", just in a nested (but alien) modal event loop; and b) that it stopped being detected when bug 1858225 landed. That was the "async" part of this bug, which pushed the uncontrolled modal event loop into a separate thread.
Comment 26•22 days ago
|
||
Cool, thanks!
There should indeed be some open bugs it depends on; ... I'll see about doing that.
Yeah, that would be nice.
b) that it stopped being detected when bug 1858225 landed. That was the "async" part of this bug, which pushed the uncontrolled modal event loop into a separate thread.
My (also limited) understanding of the BHM is that it only detects things blocking the main thread. So moving something away on a different thread definitely will make it less harmful for main thread responsiveness and will make it not show up anymore here.
The stack from its entry on 2023-10-01 suggests that a) the process wasn't actually "hung", just in a nested (but alien) modal event loop;
I think the important piece here is the "alien" loop, IIUC that means we are processing native windows events but no normal thread runnables/events which from our responsiveness point of view means to be hanging while waiting, I think. This might or might not be an issue also for the thread you moved this to, depending on what the thread is supposed to do and how you handle it now. An important edge case to consider here is thread shutdown, where the thread needs to be responsive to some extent, otherwise it might block process shutdown (but I did not look at what you actually did here, so my concerns are only hints, no real ones).
Comment 27•22 days ago
|
||
(In reply to Ray Kraesig [:rkraesig] from comment #25)
On the other hand, the fact that this issue ever showed up in the background hang monitor data was probably an issue with the hang-detector itself — or alternatively, that I'm not understanding what it's detecting. (The latter is quite likely, since I don't think I've ever seen that dashboard before.) The stack from its entry on 2023-10-01 suggests that a) the process wasn't actually "hung", just in a nested (but alien) modal event loop; and b) that it stopped being detected when bug 1858225 landed. That was the "async" part of this bug, which pushed the uncontrolled modal event loop into a separate thread.
There was some discussion in bug 1586922 about whether detecting the File Picker as a hang was a false positive. I initially thought it was, but changed my mind with the profile in bug 1586922 comment 1 that shows the main thread being blocked by OS code doing main thread I/O, and the BHR markers being in that time frame. We should not be reporting hangs during the nested event loop while waiting for the user to act on the prompt, the reported hangs should be the time it takes to open the prompt.
The entry from 2023-10-01 you pointed to shows 124,770 hangs reported within a day for a total time of 42,691s, that's an average of 342ms per hang, wait too short for it to be waiting for a user action.
For context, BHR reports a hang and captures a stack whenever the main thread has been blocked for 128ms.
Description
•