Closed Bug 1794581 Opened 2 years ago Closed 2 years ago

Add a way to open a background process outside the main thread

Categories

(Core :: XPCOM, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
108 Branch
Tracking Status
firefox108 --- fixed

People

(Reporter: saschanaz, Assigned: saschanaz)

References

Details

Attachments

(1 file)

Currently nsIProcess::RunProcess cannot be called outside main process because it observes xpcom-shutdown in nonblocking mode while nsObserverService is only available on main thread.

The observe function touches mThread, mObserver, and mShutdown:

  • mThread: This is set in nonblocking mode to call Monitor() in a separate thread.
  • mObserver: This is set only if aObserver is set.
  • mShutdown: This is used by Monitor().

I'd like to add a way to opt out monitoring the thread, and then it should be able to be used outside main thread as it won't need any shutdown cleanup.

Summary: Add a way to opt out process monitoring from nsIProcess → Add a way to opt out process monitoring from nsIProcess (for use outside main thread)

Looking at nsProcess::RunProcess further, I think the process monitoring makes sense only if 1) aBlocking is true or 2) aObserver is passed. Monitoring without them doesn't really make sense since nothing can observe the result.

So I'd like to skip calling Monitor() if none of them are passed, what do you think?

Flags: needinfo?(nika)
Flags: needinfo?(kmaglione+bmo)

IIRC this is for bug 1791675. Previously you'd create the process with PR_CreateProcessDetached (https://searchfox.org/mozilla-central/rev/e94c6cb9649bfe4e6a3888460f41bcd4fe30a6ca/netwerk/cache2/CacheFileIOManager.cpp#4094), but that isn't setting CREATE_BREAKAWAY_FROM_JOB on windows, which you want in order to make sure that the background task completes.

In general we probably want to move away from NSPR more, as was noted in bug 1406971 where we moved away from using NSPR in nsProcess, so using some other mechanism here is probably a good idea. One possibility which was mentioned in the original patches is to use base::LaunchApp, which is the same mechanism which we use to launch child processes. OTTOMH it might be reasonable to add a flag to base::LaunchOptions which specifies that the process should be treated as a background task and detached + set as BREAKAWAY_FROM_JOB, which would mean that we could continue to use the best maintained code-path we have for launching new processes for this.

Adding a needinfo? for :jld who is more familiar with LaunchApp and the platform-specific peculiarities of launching processes than I am.

Flags: needinfo?(nika) → needinfo?(jld)

IIRC this is for bug 1791675. Previously you'd create the process with PR_CreateProcessDetached (https://searchfox.org/mozilla-central/rev/e94c6cb9649bfe4e6a3888460f41bcd4fe30a6ca/netwerk/cache2/CacheFileIOManager.cpp#4094), but that isn't setting CREATE_BREAKAWAY_FROM_JOB on windows, which you want in order to make sure that the background task completes.

Yes, exactly.

which is the same mechanism which we use to launch child processes.

Only for XP_UNIX. Is there some reason not to use it on Windows and macOS? Maybe because LaunchOptions is not rich enough for nsIProcess?

See Also: → 1791675

macOS is XP_UNIX. I didn't change the nsIProcess backend on Windows because it wasn't causing problems the way NSPR's eager waiting was on Unix, but as far as I know it could be changed.

I think we could add a flag to LaunchOptions for creating a background task, as described in comment #2; on the Unix side I think we'd implement it by double-forking (so the process is inherited by init and doesn't need to be waited by us if it exits first) and calling setsid (to remove it from its session, process group, and controlling tty; practically speaking, this means that it won't get SIGHUP if the terminal it was run from is closed, nor SIGINT/SIGQUIT from terminal keys).

But, launching processes is also one case where macOS isn't exactly “unix”; we have a separate implementation using posix_spawn instead of fork/exec (and there are reasons for this). macOS does have a nonstandard flag POSIX_SPAWN_SETSID but we may need a fallback; it appears in the public xnu repository as being added in a commit named "xnu-6153.11.26", which is apparently macOS 10.15? And we still support back to 10.12. There's another flag, POSIX_SPAWN_SETEXEC, which skips the built-in fork, so we might be able to fork + fork + setsid + spawn-as-exec or something like that.

Flags: needinfo?(jld)

macOS is XP_UNIX.

Yes, but nsIProcess uses XP_MACOSX for separate impl as you described later.

I think we could add a flag to LaunchOptions for creating a background task, as described in comment #2;

👍, I'll try writing a patch.

on the Unix side I think we'd implement it by double-forking

Does Unix need anything more here though? This bug exists because of Windows-only JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE, I believe other platforms are fine. In other words, I don't want to wait for background task processes.

I agree that, for the sake of launching things off main thread, adding the features we need to LaunchApp makes much more sense than adapting nsIProcess to work off main thread.

That said, I still think the easiest solution for your use case is to dispatch a runnable to the main thread and use nsIProcess there. As I said elsewhere, if you're expecting a background thread to be able to reliably launch a process during shutdown, you need to register an async shutdown blocker that doesn't resolve until the process is launched. And if you do have that shutdown blocker, any runnables you dispatch before the shutdown phase end are guaranteed to run before we fast exit the app. You'll also ideally need to dispatch a runnable to the main thread to clear the shutdown blocker anyway.

Flags: needinfo?(kmaglione+bmo)

(In reply to Kris Maglione [:kmag] from comment #6)

As I said elsewhere, if you're expecting a background thread to be able to reliably launch a process during shutdown, you need to register an async shutdown blocker that doesn't resolve until the process is launched.

For more context, which I also added elsewhere: Right now, fast shutdown always happens after XPCOM thread shutdown, so anything dispatched to an XPCOM thread event queue in an early shutdown phase can reasonably be expected to run to completion before shutdown. And anything it dispatches to the main thread event queue can also be expected to run, since we clear the event queue after we finish each shutdown phase, before we advance the phase and possibly trigger fast shutdown.

But the goal, as I understand it, and as the fast shutdown code code is currently written, is to move the fast shutdown phase to the start of XPCOMShutdownThreads, i.e., before we dispatch the "xpcom-shutdown-threads" observer notification and explicitly shutdown non-main threads. Which means that threads will no longer be able to expect runnables dispatched from earlier shutdown phases to actually run before app shutdown unless they explicitly block shutdown until they have completed.

That said, I still think the easiest solution for your use case is to dispatch a runnable to the main thread and use nsIProcess there.

I tried this and it turns out it fails with no debuggable reason at startup phase. I'd rather not spend more time to make a workaround work. The background task I'm trying to launch will run both at startup and shutdown, so it should work at both phase reliably.

if you're expecting a background thread to be able to reliably launch a process during shutdown, you need to register an async shutdown blocker that doesn't resolve until the process is launched.

I believe this is done elsewhere in the code I'm touching as 1) it's being launched in an existing shutdown cleanup function 2) we have seen shutdown crash there because of long shutdown, that means we absolutely are waiting for the cleanup function to finish. Thus, shutdown blocking shouldn't be something we should worry about. (And that means... XPCOM should always be alive at that point I guess?)

(In reply to Kagami :saschanaz from comment #5)

macOS is XP_UNIX.

Yes, but nsIProcess uses XP_MACOSX for separate impl as you described later.

Thanks for the correction; I'd forgotten about that part of nsProcess.

on the Unix side I think we'd implement it by double-forking

Does Unix need anything more here though? This bug exists because of Windows-only JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE, I believe other platforms are fine. In other words, I don't want to wait for background task processes.

The main problem I see on Unix is when we use this and the parent process isn't already shutting down; when the child process exits, it becomes a zombie process until the parent uses one of the wait* calls to collect it. So we'd either need something to monitor it and waitpid, or fork twice so the process is reparented to init (which will collect it). There are also the issues I mentioned if the child process inherits the browser's process group and/or controlling tty; that won't kill the process automatically when the rest of the app exits like what seems to be happening on Windows, but it could affect some users and it might not be the result we want.

(In reply to Kagami :saschanaz from comment #8)

if you're expecting a background thread to be able to reliably launch a process during shutdown, you need to register an async shutdown blocker that doesn't resolve until the process is launched.

I believe this is done elsewhere in the code I'm touching as 1) it's being launched in an existing shutdown cleanup function 2) we have seen shutdown crash there because of long shutdown, that means we absolutely are waiting for the cleanup function to finish. Thus, shutdown blocking shouldn't be something we should worry about. (And that means... XPCOM should always be alive at that point I guess?)

That's not necessarily true. As long as we don't fast shutdown before XPCOM thread shutdown, you can probably mostly rely on cleanup tasks on background threads to complete before shutdown, but we won't always wait until after thread shutdown to fast quit, at which point any tasks without a blocker that waits for them to complete will run or fail sporadically depending on timing.

(In reply to Kagami :saschanaz from comment #8)

That said, I still think the easiest solution for your use case is to dispatch a runnable to the main thread and use nsIProcess there.

I tried this and it turns out it fails with no debuggable reason at startup phase. I'd rather not spend more time to make a workaround work. The background task I'm trying to launch will run both at startup and shutdown, so it should work at both phase reliably.

OK.

(In reply to Kris Maglione [:kmag] from comment #10)

That's not necessarily true. As long as we don't fast shutdown before XPCOM thread shutdown, you can probably mostly rely on cleanup tasks on background threads to complete before shutdown, but we won't always wait until after thread shutdown to fast quit, at which point any tasks without a blocker that waits for them to complete will run or fail sporadically depending on timing.

Hmm, indeed I don't see any addBlocker call in netwerk/cache2 nor in toolkit/components/cleardata. But I'm not sure calling a process opening function takes longer than existing storage I/O, does it? In that case I think we can just file a bug as a potential existing bug for each component.

How would a synchronous C++ function use addBlocker, btw? The callback won't be called until the synchronous function finishes, right? Could you give me some examples?

(In reply to Kagami :saschanaz from comment #8)

That said, I still think the easiest solution for your use case is to dispatch a runnable to the main thread and use nsIProcess there.

I tried this and it turns out it fails with no debuggable reason at startup phase. I'd rather not spend more time to make a workaround work. The background task I'm trying to launch will run both at startup and shutdown, so it should work at both phase reliably.

OK.

(If anyone has any idea why that's failing or have seen similar situation, please help)

(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #9)

The main problem I see on Unix is when we use this and the parent process isn't already shutting down; when the child process exits, it becomes a zombie process until the parent uses one of the wait* calls to collect it. So we'd either need something to monitor it and waitpid, or fork twice so the process is reparented to init (which will collect it). There are also the issues I mentioned if the child process inherits the browser's process group and/or controlling tty; that won't kill the process automatically when the rest of the app exits like what seems to be happening on Windows, but it could affect some users and it might not be the result we want.

Interesting, does it apply to all processing opening functions e.g. PR_CreateProcess?

Summary: Add a way to opt out process monitoring from nsIProcess (for use outside main thread) → Add a way to open a background process outside the main thread
See Also: → 1797064

This for now only applies to Windows. macOS/Linux support may follow as a separate effort.

Assignee: nobody → krosylight
Status: NEW → ASSIGNED
Pushed by krosylight@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/367202604a94 Add base::LaunchOptions::start_independent for Windows use r=jld
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 108 Branch

I think that this change causes these Thunderbird mochitest failures. This line appears in the logs which wasn't there before:

[Parent 7620, Cache2 I/O] WARNING: CreateProcess Failed: 5: file /builds/worker/checkouts/gecko/ipc/chromium/src/base/process_util_win.cc:313

Any ideas what's going wrong here? (This is well outside my areas of expertise, I'm just investigating the test failures.)

Flags: needinfo?(krosylight)

Quite not sure what is happening there, might be more related to bug 1791675 than this one.

Can you check the value of argv in https://searchfox.org/comm-central/rev/d6d09ef935d8da3fc484514178638167b7460ac4/mozilla/toolkit/components/backgroundtasks/BackgroundTasksRunner.cpp#47 ? And perhaps what happens if you comment out line 33?

Flags: needinfo?(krosylight) → needinfo?(geoff)

It finally occurred to me I could use the Try server to answer the questions, as I have no Windows build to experiment with.

I commented out start_independent = true and printf'ed argv:

  • Z:\task_166915801652956\build\application\thunderbird\thunderbird.exe
  • --backgroundtask
  • removeDirectory
  • C:\Users\task_166915801652956\AppData\Local\Temp\tmpn0cb_uis.mozrunner
  • 0
  • .purge.bg_rm
  • (null)
Flags: needinfo?(geoff) → needinfo?(krosylight)

argv doesn't seem problematic. Did commenting out the flag fix the test failures?

Perhaps Thunderbird just doesn't support background tasks? Previously purgeHTTPCache caller didn't warn when the process launch fails, so it may be just that we detected an existing breakage. Thunderbird still seemingly has MOZ_BACKGROUNDTASKS so I'm not sure how that can happen, though. Nick, do you have any idea?

Flags: needinfo?(nalexander)
Flags: needinfo?(krosylight)
Flags: needinfo?(geoff)

Did commenting out the flag fix the test failures?

It did.

Flags: needinfo?(geoff)

Background tasks should generally work in Thunderbird (automated tests run successfully for them.)

Hmmmm. In that case the only thing I can think of is that somehow Thunderbird is not setting JOB_OBJECT_LIMIT_BREAKAWAY_OK when creating a job, per the flag description: https://learn.microsoft.com/en-us/windows/win32/procthread/process-creation-flags#flags

The child processes of a process associated with a job are not associated with the job.
If the calling process is not associated with a job, this constant has no effect. If the calling process is associated with a job, the job must set the JOB_OBJECT_LIMIT_BREAKAWAY_OK limit.

Could you try adding this function to mozilla/ipc/chromium/src/base/process_util_win.cc, call it from https://searchfox.org/comm-central/rev/d99c2ec689b941ae120776f423726f59875957a5/mozilla/ipc/chromium/src/base/process_util_win.cc#297 and check the result? printf for each branch will help.

I'll also try getting some time to investigate.

Flags: needinfo?(nalexander)
Flags: needinfo?(geoff)

NeedToBreakAwayFromJob returned false from the last branch, JOB_OBJECT_LIMIT_BREAKAWAY_OK isn't set.

Flags: needinfo?(geoff)

Interesting! Now it's clear Thunderbird uses jobs on Windows but for some reason without JOB_OBJECT_LIMIT_BREAKAWAY_OK. The task is to identify how it's launching the process and add that flag.

Is this something Thunderbird people can investigate?

Flags: needinfo?(geoff)
See Also: → 1802559

Rob, can you offer any assistance here? I think this is as far as I can reasonably get on this one.

Flags: needinfo?(geoff) → needinfo?(rob)

(In reply to Kagami :saschanaz from comment #19)

argv doesn't seem problematic. Did commenting out the flag fix the test failures?

Perhaps Thunderbird just doesn't support background tasks? Previously purgeHTTPCache caller didn't warn when the process launch fails, so it may be just that we detected an existing breakage. Thunderbird still seemingly has MOZ_BACKGROUNDTASKS so I'm not sure how that can happen, though. Nick, do you have any idea?

I have little insight here, but I will note that the error is CreateProcess failed: 5 (from here). Error 5 is "Access is denied"; I think that means that the relevant EXE either doesn't exist or can't be executed. Are we confident that the backslashes are correctly interpreted on Windows in this invocation?

Also, it rather looks like we have an empty string argument. Kagami, I recall you introduced some special handling/quoting for that case. Can you suggest how to dump the actual invocation command line to be sure it's being handled correctly?

Flags: needinfo?(krosylight)

(In reply to Nick Alexander :nalexander [he/him] from comment #26)

I have little insight here, but I will note that the error is CreateProcess failed: 5 (from here). Error 5 is "Access is denied"; I think that means that the relevant EXE either doesn't exist or can't be executed. Are we confident that the backslashes are correctly interpreted on Windows in this invocation?

Comment #20 and comment #23 suggest this is just a flag issue, IMO. Docs say it must have JOB_OBJECT_LIMIT_BREAKAWAY_OK, so that's probably this.

Also, it rather looks like we have an empty string argument. Kagami, I recall you introduced some special handling/quoting for that case. Can you suggest how to dump the actual invocation command line to be sure it's being handled correctly?

Printf-ing cmdline should do that.

Flags: needinfo?(krosylight)

Background tasks work just fine on my test win 11 VM. I ran a removeDirectory and a success task no problems. So running the command itself works.

I will try to get a debugger on this, but it might not be until tomorrow.

I wonder how the Thunderbird main process launches. Does it hit mozilla::LauncherMain? Maybe it's using a different path?

I knew this would come up...

https://searchfox.org/comm-central/rev/219c6396225f9c262cef4fe7766f5c31cefe2a05/mozilla/toolkit/xre/nsWindowsWMain.cpp#136

No, Thunderbird builds do not use LauncherMain. That header file lives at /browser/app/winlauncher, and (rightly) is not part of the build.

IIRC, when that whole directory was created a couple of years back, there was some initial Thunderbird build bustage. At the time I was told that Thunderbird didn't need all of that, and a fix was made somewhere in toolkit I think to fix the bustage.

Maybe that assessment is no longer true and Thunderbird needs the winlauncher code now? Maybe not? I never really understood why it was deemed unnecessary. If Thunderbird should be using winlauncher, there's likely some significant work to do to make it work. I can help with build config but C++ not so much.

Flags: needinfo?(rob)

I didn't notice at the time, but 24 hours ago the test tasks stopped failing on TaskCluster. I'm assuming that bug 1765732 covered up the problem but it still exists. Does that matter though?

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: