Open Bug 1677170 Opened 3 years ago Updated 20 days ago

[meta] Asynchronous, out-of-process Win32 file picker

Categories

(Core :: Widget: Win32, task, P2)

Unspecified
Windows
task

Tracking

()

People

(Reporter: bugzilla, Assigned: rkraesig)

References

(Depends on 3 open bugs, Blocks 4 open bugs)

Details

(Keywords: meta, Whiteboard: [bhr:CFileOpenSave::Show][win:stability])

Attachments

(6 files)

I've got a plan for moving the COM bits of nsFilePicker out-of-process. The benefits are twofold:

  • It serves as a DLL injection mitigation, as pickers load shell extensions;
  • It gives us the opportunity to show pickers asynchronously, which keeps our parent process's main thread (mostly) active.

We can pull this off because Windows provides proxy/stubs for the IFileOpenDialog, IFileSaveDialog, and IFileDialog interfaces. We can create our own implementation of these interfaces that lives out of process, and forwards calls to the "real" object implementation.

Note that for the async stuff we actually need to generate our own async interface, but I think that it's worth it for the responsiveness gains we get on the main thread.

See Also: → 1468250

Are you really doing this now? Remind me when we meet at the next all hands because this needs a serious toast and the tab's on me.

I've landed the prerequisites :-)

Hey Jamie, would you mind trying out this try build to see whether or not I broke file picker a11y? Please note that you will need to run its installer as it registers some components.

See comment 3.

Flags: needinfo?(jteh)

I tried quite hard, but I couldn't break it. Nicely done.

I was mostly worried about focus not being restored correctly when you dismiss a file picker, but it does get restored correctly.

Flags: needinfo?(jteh)

Here are some profiles with this patchset:

Preffed off (ie, still runs synchronously)
Preffed on

Depends on: 1683425
See Also: → 1683033
See Also: → 1321097

Toshi, can we grab Aaron's WIP before they disappear from the try server? (And attach them to this bug)

Flags: needinfo?(tkikuchi)
Attachment #9261053 - Attachment is patch: true
Attachment #9261056 - Attachment is patch: true
Attachment #9261057 - Attachment is patch: true

(In reply to Gian-Carlo Pascutto [:gcp] from comment #7)

Toshi, can we grab Aaron's WIP before they disappear from the try server? (And attach them to this bug)

I was curious to find out how to do this when I saw this request come through bugmail. It was easier than expected. Toshi, I figured I'd just attach the patches instead of making you go through the same excercise.

Flags: needinfo?(tkikuchi)

(In reply to Stephen A Pohl [:spohl] from comment #14)

I was curious to find out how to do this when I saw this request come through bugmail. It was easier than expected. Toshi, I figured I'd just attach the patches instead of making you go through the same excercise.

Thank you for doing it. I was thinking of getting commits with hg pull, rebasing, and uploading them to phabricator.

(In reply to Toshihito Kikuchi [:toshi] from comment #15)

Thank you for doing it. I was thinking of getting commits with hg pull, rebasing, and uploading them to phabricator.

Good point. If we end up sending these patches for review (rather than writing new patches based on Aaron's work), then yes, that might be the better option and we can remove the attached patches in bugzilla.

I've still got the patchset, which is probably more up to date than whatever was on try. Let me see if I can dig them up.

Severity: -- → S3
Priority: -- → P2
Whiteboard: [bhr:CFileOpenSave::Show]
See Also: → 1743393

(In reply to (No longer employed by Mozilla) Aaron Klotz from comment #17)

I've still got the patchset, which is probably more up to date than whatever was on try. Let me see if I can dig them up.

Any luck?

Assignee: bugzilla → rkraesig
See Also: → 1771638

(In reply to Ray Kraesig [:rkraesig] from comment #18)

(In reply to (No longer employed by Mozilla) Aaron Klotz from comment #17)

I've still got the patchset, which is probably more up to date than whatever was on try. Let me see if I can dig them up.

Any luck?

Let's proceed under the assumption that the answer is no, and adjust any patches if Aaron comes through after all.

See Also: → 1670411
Depends on: 1743393
See Also: 1743393
Blocks: 1743393
No longer depends on: 1743393
See Also: → 1794264
See Also: → 1798791
Whiteboard: [bhr:CFileOpenSave::Show] → [bhr:CFileOpenSave::Show][win:stability]

Ray, are you still working on this? Could you provide a status update on what's missing before we can land?

Flags: needinfo?(rkraesig)

I'm afraid this has almost completely fallen by the wayside. (Although the recent commentary on bug 1798791 got me to poke at my notes a bit.)

The conclusion I came to some months ago was that :aklotz's original idea of using the system DLL surrogate resulted in too much complexity at both install and (outside of the happy path) invocation time due to tying us to the global registration mechanism of COM. (See, relatedly, what happened with AccessibleHandler.dll in bug 1670147.) While the problems there are solvable, it would be simpler just to have our own custom host-process in the Firefox installation directory -- and if we have that, there's not much reason to use COM at all.

I've done some initial investigation into using either a) an auxiliary Firefox process or b) a custom host process. Both appear feasible, and I vaguely prefer b), but the metaphorical shovel has not yet struck earth.

Removing myself as the assignee because of that. Feel free to reassign me to it, though, and I'll requeue it and reprioritize things.

Assignee: rkraesig → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(rkraesig)
Depends on: 1704500
Depends on: 1816740

Removing myself as the assignee because of that. Feel free to reassign me to it, though, and I'll requeue it and reprioritize things.

Taking, because I'm working on this again.

Assignee: nobody → rkraesig
Depends on: 1833450
See Also: → 1810341
Blocks: 1582795
See Also: → 112134
See Also: → 1837008
Depends on: 1837079
Blocks: 1845379
Depends on: 1858225
Depends on: 1862712

(Bug 1683425 appears to have been a blocker for :aklotz's externally-hosted COM implementation, but does not block the current XPIDL-based approach.)

No longer depends on: 1683425
Blocks: 1810341
See Also: 1810341
Blocks: 705190
No longer blocks: 1810341
No longer blocks: 1675831
See Also: → 1866517
Summary: Asynchronous, out-of-process Win32 file picker → [meta] Asynchronous, out-of-process Win32 file picker
Depends on: 1872397
Blocks: 1837008
See Also: 1837008
No longer blocks: 705190
Depends on: 705190
No longer blocks: 1661752
See Also: → 1661752
See Also: → 1749130
No longer depends on: 705190

Hi :rkraesig, I was trying to understand the status of this bug. Given it is a meta-bug, I'd have expected to see some open bugs it depends on, but that seems not to be the case. And FWIW I do not see this bug anymore as top-bug in the background hang monitor data - does this mean it's fixed?

Flags: needinfo?(rkraesig)

The status is that it's currently in Nightly and is pref-blocked from riding the trains. This is partly to flush out papercut bugs, and partly because we — well, I — want to reduce the failures we're seeing before letting it go out. The latter is blocked on getting more information about failures via telemetry; and getting that telemetry in has been hovering at the top of my personal to-do list for a while, but it keeps getting pushed out by P2s. There should indeed be some open bugs it depends on; I just haven't been very good about linking bugs to it. I'll see about doing that.

On the other hand, the fact that this issue ever showed up in the background hang monitor data was probably an issue with the hang-detector itself — or alternatively, that I'm not understanding what it's detecting. (The latter is quite likely, since I don't think I've ever seen that dashboard before.) The stack from its entry on 2023-10-01 suggests that a) the process wasn't actually "hung", just in a nested (but alien) modal event loop; and b) that it stopped being detected when bug 1858225 landed. That was the "async" part of this bug, which pushed the uncontrolled modal event loop into a separate thread.

Flags: needinfo?(rkraesig)

Cool, thanks!

There should indeed be some open bugs it depends on; ... I'll see about doing that.

Yeah, that would be nice.

b) that it stopped being detected when bug 1858225 landed. That was the "async" part of this bug, which pushed the uncontrolled modal event loop into a separate thread.

My (also limited) understanding of the BHM is that it only detects things blocking the main thread. So moving something away on a different thread definitely will make it less harmful for main thread responsiveness and will make it not show up anymore here.

The stack from its entry on 2023-10-01 suggests that a) the process wasn't actually "hung", just in a nested (but alien) modal event loop;

I think the important piece here is the "alien" loop, IIUC that means we are processing native windows events but no normal thread runnables/events which from our responsiveness point of view means to be hanging while waiting, I think. This might or might not be an issue also for the thread you moved this to, depending on what the thread is supposed to do and how you handle it now. An important edge case to consider here is thread shutdown, where the thread needs to be responsive to some extent, otherwise it might block process shutdown (but I did not look at what you actually did here, so my concerns are only hints, no real ones).

(In reply to Ray Kraesig [:rkraesig] from comment #25)

On the other hand, the fact that this issue ever showed up in the background hang monitor data was probably an issue with the hang-detector itself — or alternatively, that I'm not understanding what it's detecting. (The latter is quite likely, since I don't think I've ever seen that dashboard before.) The stack from its entry on 2023-10-01 suggests that a) the process wasn't actually "hung", just in a nested (but alien) modal event loop; and b) that it stopped being detected when bug 1858225 landed. That was the "async" part of this bug, which pushed the uncontrolled modal event loop into a separate thread.

There was some discussion in bug 1586922 about whether detecting the File Picker as a hang was a false positive. I initially thought it was, but changed my mind with the profile in bug 1586922 comment 1 that shows the main thread being blocked by OS code doing main thread I/O, and the BHR markers being in that time frame. We should not be reporting hangs during the nested event loop while waiting for the user to act on the prompt, the reported hangs should be the time it takes to open the prompt.

The entry from 2023-10-01 you pointed to shows 124,770 hangs reported within a day for a total time of 42,691s, that's an average of 342ms per hang, wait too short for it to be waiting for a user action.

For context, BHR reports a hang and captures a stack whenever the main thread has been blocked for 128ms.

Depends on: 1883943
Depends on: 1884221
Depends on: 1884426
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: