Add a way to open a background process outside the main thread
Categories
(Core :: XPCOM, enhancement)
Tracking
()
Tracking | Status | |
---|---|---|
firefox108 | --- | fixed |
People
(Reporter: saschanaz, Assigned: saschanaz)
References
Details
Attachments
(1 file)
Currently nsIProcess::RunProcess
cannot be called outside main process because it observes xpcom-shutdown
in nonblocking mode while nsObserverService
is only available on main thread.
The observe function touches mThread
, mObserver
, and mShutdown
:
mThread
: This is set in nonblocking mode to callMonitor()
in a separate thread.mObserver
: This is set only ifaObserver
is set.mShutdown
: This is used byMonitor()
.
I'd like to add a way to opt out monitoring the thread, and then it should be able to be used outside main thread as it won't need any shutdown cleanup.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 1•2 years ago
|
||
Looking at nsProcess::RunProcess further, I think the process monitoring makes sense only if 1) aBlocking
is true or 2) aObserver
is passed. Monitoring without them doesn't really make sense since nothing can observe the result.
So I'd like to skip calling Monitor()
if none of them are passed, what do you think?
Comment 2•2 years ago
|
||
IIRC this is for bug 1791675. Previously you'd create the process with PR_CreateProcessDetached
(https://searchfox.org/mozilla-central/rev/e94c6cb9649bfe4e6a3888460f41bcd4fe30a6ca/netwerk/cache2/CacheFileIOManager.cpp#4094), but that isn't setting CREATE_BREAKAWAY_FROM_JOB
on windows, which you want in order to make sure that the background task completes.
In general we probably want to move away from NSPR more, as was noted in bug 1406971 where we moved away from using NSPR in nsProcess
, so using some other mechanism here is probably a good idea. One possibility which was mentioned in the original patches is to use base::LaunchApp
, which is the same mechanism which we use to launch child processes. OTTOMH it might be reasonable to add a flag to base::LaunchOptions
which specifies that the process should be treated as a background task and detached + set as BREAKAWAY_FROM_JOB
, which would mean that we could continue to use the best maintained code-path we have for launching new processes for this.
Adding a needinfo? for :jld who is more familiar with LaunchApp
and the platform-specific peculiarities of launching processes than I am.
Assignee | ||
Comment 3•2 years ago
|
||
IIRC this is for bug 1791675. Previously you'd create the process with PR_CreateProcessDetached (https://searchfox.org/mozilla-central/rev/e94c6cb9649bfe4e6a3888460f41bcd4fe30a6ca/netwerk/cache2/CacheFileIOManager.cpp#4094), but that isn't setting CREATE_BREAKAWAY_FROM_JOB on windows, which you want in order to make sure that the background task completes.
Yes, exactly.
which is the same mechanism which we use to launch child processes.
Only for XP_UNIX. Is there some reason not to use it on Windows and macOS? Maybe because LaunchOptions is not rich enough for nsIProcess?
Comment 4•2 years ago
|
||
macOS is XP_UNIX
. I didn't change the nsIProcess
backend on Windows because it wasn't causing problems the way NSPR's eager wait
ing was on Unix, but as far as I know it could be changed.
I think we could add a flag to LaunchOptions
for creating a background task, as described in comment #2; on the Unix side I think we'd implement it by double-forking (so the process is inherited by init
and doesn't need to be wait
ed by us if it exits first) and calling setsid
(to remove it from its session, process group, and controlling tty; practically speaking, this means that it won't get SIGHUP
if the terminal it was run from is closed, nor SIGINT
/SIGQUIT
from terminal keys).
But, launching processes is also one case where macOS isn't exactly “unix”; we have a separate implementation using posix_spawn
instead of fork
/exec
(and there are reasons for this). macOS does have a nonstandard flag POSIX_SPAWN_SETSID
but we may need a fallback; it appears in the public xnu
repository as being added in a commit named "xnu-6153.11.26", which is apparently macOS 10.15? And we still support back to 10.12. There's another flag, POSIX_SPAWN_SETEXEC, which skips the built-in fork
, so we might be able to fork + fork + setsid + spawn-as-exec or something like that.
Assignee | ||
Comment 5•2 years ago
•
|
||
macOS is XP_UNIX.
Yes, but nsIProcess uses XP_MACOSX for separate impl as you described later.
I think we could add a flag to LaunchOptions for creating a background task, as described in comment #2;
👍, I'll try writing a patch.
on the Unix side I think we'd implement it by double-forking
Does Unix need anything more here though? This bug exists because of Windows-only JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE
, I believe other platforms are fine. In other words, I don't want to wait for background task processes.
Comment 6•2 years ago
|
||
I agree that, for the sake of launching things off main thread, adding the features we need to LaunchApp
makes much more sense than adapting nsIProcess
to work off main thread.
That said, I still think the easiest solution for your use case is to dispatch a runnable to the main thread and use nsIProcess
there. As I said elsewhere, if you're expecting a background thread to be able to reliably launch a process during shutdown, you need to register an async shutdown blocker that doesn't resolve until the process is launched. And if you do have that shutdown blocker, any runnables you dispatch before the shutdown phase end are guaranteed to run before we fast exit the app. You'll also ideally need to dispatch a runnable to the main thread to clear the shutdown blocker anyway.
Comment 7•2 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #6)
As I said elsewhere, if you're expecting a background thread to be able to reliably launch a process during shutdown, you need to register an async shutdown blocker that doesn't resolve until the process is launched.
For more context, which I also added elsewhere: Right now, fast shutdown always happens after XPCOM thread shutdown, so anything dispatched to an XPCOM thread event queue in an early shutdown phase can reasonably be expected to run to completion before shutdown. And anything it dispatches to the main thread event queue can also be expected to run, since we clear the event queue after we finish each shutdown phase, before we advance the phase and possibly trigger fast shutdown.
But the goal, as I understand it, and as the fast shutdown code code is currently written, is to move the fast shutdown phase to the start of XPCOMShutdownThreads
, i.e., before we dispatch the "xpcom-shutdown-threads" observer notification and explicitly shutdown non-main threads. Which means that threads will no longer be able to expect runnables dispatched from earlier shutdown phases to actually run before app shutdown unless they explicitly block shutdown until they have completed.
Assignee | ||
Comment 8•2 years ago
|
||
That said, I still think the easiest solution for your use case is to dispatch a runnable to the main thread and use nsIProcess there.
I tried this and it turns out it fails with no debuggable reason at startup phase. I'd rather not spend more time to make a workaround work. The background task I'm trying to launch will run both at startup and shutdown, so it should work at both phase reliably.
if you're expecting a background thread to be able to reliably launch a process during shutdown, you need to register an async shutdown blocker that doesn't resolve until the process is launched.
I believe this is done elsewhere in the code I'm touching as 1) it's being launched in an existing shutdown cleanup function 2) we have seen shutdown crash there because of long shutdown, that means we absolutely are waiting for the cleanup function to finish. Thus, shutdown blocking shouldn't be something we should worry about. (And that means... XPCOM should always be alive at that point I guess?)
Comment 9•2 years ago
|
||
(In reply to Kagami :saschanaz from comment #5)
macOS is XP_UNIX.
Yes, but nsIProcess uses XP_MACOSX for separate impl as you described later.
Thanks for the correction; I'd forgotten about that part of nsProcess
.
on the Unix side I think we'd implement it by double-forking
Does Unix need anything more here though? This bug exists because of Windows-only
JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE
, I believe other platforms are fine. In other words, I don't want to wait for background task processes.
The main problem I see on Unix is when we use this and the parent process isn't already shutting down; when the child process exits, it becomes a zombie process until the parent uses one of the wait*
calls to collect it. So we'd either need something to monitor it and waitpid
, or fork twice so the process is reparented to init
(which will collect it). There are also the issues I mentioned if the child process inherits the browser's process group and/or controlling tty; that won't kill the process automatically when the rest of the app exits like what seems to be happening on Windows, but it could affect some users and it might not be the result we want.
Comment 10•2 years ago
|
||
(In reply to Kagami :saschanaz from comment #8)
if you're expecting a background thread to be able to reliably launch a process during shutdown, you need to register an async shutdown blocker that doesn't resolve until the process is launched.
I believe this is done elsewhere in the code I'm touching as 1) it's being launched in an existing shutdown cleanup function 2) we have seen shutdown crash there because of long shutdown, that means we absolutely are waiting for the cleanup function to finish. Thus, shutdown blocking shouldn't be something we should worry about. (And that means... XPCOM should always be alive at that point I guess?)
That's not necessarily true. As long as we don't fast shutdown before XPCOM thread shutdown, you can probably mostly rely on cleanup tasks on background threads to complete before shutdown, but we won't always wait until after thread shutdown to fast quit, at which point any tasks without a blocker that waits for them to complete will run or fail sporadically depending on timing.
(In reply to Kagami :saschanaz from comment #8)
That said, I still think the easiest solution for your use case is to dispatch a runnable to the main thread and use nsIProcess there.
I tried this and it turns out it fails with no debuggable reason at startup phase. I'd rather not spend more time to make a workaround work. The background task I'm trying to launch will run both at startup and shutdown, so it should work at both phase reliably.
OK.
Assignee | ||
Comment 11•2 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #10)
That's not necessarily true. As long as we don't fast shutdown before XPCOM thread shutdown, you can probably mostly rely on cleanup tasks on background threads to complete before shutdown, but we won't always wait until after thread shutdown to fast quit, at which point any tasks without a blocker that waits for them to complete will run or fail sporadically depending on timing.
Hmm, indeed I don't see any addBlocker
call in netwerk/cache2
nor in toolkit/components/cleardata
. But I'm not sure calling a process opening function takes longer than existing storage I/O, does it? In that case I think we can just file a bug as a potential existing bug for each component.
How would a synchronous C++ function use addBlocker
, btw? The callback won't be called until the synchronous function finishes, right? Could you give me some examples?
(In reply to Kagami :saschanaz from comment #8)
That said, I still think the easiest solution for your use case is to dispatch a runnable to the main thread and use nsIProcess there.
I tried this and it turns out it fails with no debuggable reason at startup phase. I'd rather not spend more time to make a workaround work. The background task I'm trying to launch will run both at startup and shutdown, so it should work at both phase reliably.
OK.
(If anyone has any idea why that's failing or have seen similar situation, please help)
Assignee | ||
Comment 12•2 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #9)
The main problem I see on Unix is when we use this and the parent process isn't already shutting down; when the child process exits, it becomes a zombie process until the parent uses one of the
wait*
calls to collect it. So we'd either need something to monitor it andwaitpid
, or fork twice so the process is reparented toinit
(which will collect it). There are also the issues I mentioned if the child process inherits the browser's process group and/or controlling tty; that won't kill the process automatically when the rest of the app exits like what seems to be happening on Windows, but it could affect some users and it might not be the result we want.
Interesting, does it apply to all processing opening functions e.g. PR_CreateProcess
?
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 13•2 years ago
|
||
This for now only applies to Windows. macOS/Linux support may follow as a separate effort.
Updated•2 years ago
|
Comment 14•2 years ago
|
||
Comment 15•2 years ago
|
||
bugherder |
Comment 16•2 years ago
|
||
I think that this change causes these Thunderbird mochitest failures. This line appears in the logs which wasn't there before:
[Parent 7620, Cache2 I/O] WARNING: CreateProcess Failed: 5: file /builds/worker/checkouts/gecko/ipc/chromium/src/base/process_util_win.cc:313
Any ideas what's going wrong here? (This is well outside my areas of expertise, I'm just investigating the test failures.)
Assignee | ||
Comment 17•2 years ago
•
|
||
Quite not sure what is happening there, might be more related to bug 1791675 than this one.
Can you check the value of argv
in https://searchfox.org/comm-central/rev/d6d09ef935d8da3fc484514178638167b7460ac4/mozilla/toolkit/components/backgroundtasks/BackgroundTasksRunner.cpp#47 ? And perhaps what happens if you comment out line 33?
Comment 18•2 years ago
|
||
It finally occurred to me I could use the Try server to answer the questions, as I have no Windows build to experiment with.
I commented out start_independent = true
and printf'ed argv
:
- Z:\task_166915801652956\build\application\thunderbird\thunderbird.exe
- --backgroundtask
- removeDirectory
- C:\Users\task_166915801652956\AppData\Local\Temp\tmpn0cb_uis.mozrunner
- 0
- .purge.bg_rm
- (null)
Assignee | ||
Comment 19•2 years ago
|
||
argv doesn't seem problematic. Did commenting out the flag fix the test failures?
Perhaps Thunderbird just doesn't support background tasks? Previously purgeHTTPCache caller didn't warn when the process launch fails, so it may be just that we detected an existing breakage. Thunderbird still seemingly has MOZ_BACKGROUNDTASKS so I'm not sure how that can happen, though. Nick, do you have any idea?
Comment 20•2 years ago
|
||
Did commenting out the flag fix the test failures?
It did.
Comment 21•2 years ago
|
||
Background tasks should generally work in Thunderbird (automated tests run successfully for them.)
Assignee | ||
Comment 22•2 years ago
•
|
||
Hmmmm. In that case the only thing I can think of is that somehow Thunderbird is not setting JOB_OBJECT_LIMIT_BREAKAWAY_OK
when creating a job, per the flag description: https://learn.microsoft.com/en-us/windows/win32/procthread/process-creation-flags#flags
The child processes of a process associated with a job are not associated with the job.
If the calling process is not associated with a job, this constant has no effect. If the calling process is associated with a job, the job must set the JOB_OBJECT_LIMIT_BREAKAWAY_OK limit.
Could you try adding this function to mozilla/ipc/chromium/src/base/process_util_win.cc
, call it from https://searchfox.org/comm-central/rev/d99c2ec689b941ae120776f423726f59875957a5/mozilla/ipc/chromium/src/base/process_util_win.cc#297 and check the result? printf for each branch will help.
I'll also try getting some time to investigate.
Assignee | ||
Updated•2 years ago
|
Comment 23•2 years ago
|
||
NeedToBreakAwayFromJob
returned false from the last branch, JOB_OBJECT_LIMIT_BREAKAWAY_OK
isn't set.
Assignee | ||
Comment 24•2 years ago
|
||
Interesting! Now it's clear Thunderbird uses jobs on Windows but for some reason without JOB_OBJECT_LIMIT_BREAKAWAY_OK. The task is to identify how it's launching the process and add that flag.
Is this something Thunderbird people can investigate?
Comment 25•2 years ago
|
||
Rob, can you offer any assistance here? I think this is as far as I can reasonably get on this one.
Comment 26•2 years ago
|
||
(In reply to Kagami :saschanaz from comment #19)
argv doesn't seem problematic. Did commenting out the flag fix the test failures?
Perhaps Thunderbird just doesn't support background tasks? Previously purgeHTTPCache caller didn't warn when the process launch fails, so it may be just that we detected an existing breakage. Thunderbird still seemingly has MOZ_BACKGROUNDTASKS so I'm not sure how that can happen, though. Nick, do you have any idea?
I have little insight here, but I will note that the error is CreateProcess failed: 5
(from here). Error 5 is "Access is denied"; I think that means that the relevant EXE either doesn't exist or can't be executed. Are we confident that the backslashes are correctly interpreted on Windows in this invocation?
Also, it rather looks like we have an empty string argument. Kagami, I recall you introduced some special handling/quoting for that case. Can you suggest how to dump the actual invocation command line to be sure it's being handled correctly?
Assignee | ||
Comment 27•2 years ago
|
||
(In reply to Nick Alexander :nalexander [he/him] from comment #26)
I have little insight here, but I will note that the error is
CreateProcess failed: 5
(from here). Error 5 is "Access is denied"; I think that means that the relevant EXE either doesn't exist or can't be executed. Are we confident that the backslashes are correctly interpreted on Windows in this invocation?
Comment #20 and comment #23 suggest this is just a flag issue, IMO. Docs say it must have JOB_OBJECT_LIMIT_BREAKAWAY_OK
, so that's probably this.
Also, it rather looks like we have an empty string argument. Kagami, I recall you introduced some special handling/quoting for that case. Can you suggest how to dump the actual invocation command line to be sure it's being handled correctly?
Printf-ing cmdline
should do that.
Comment 28•2 years ago
|
||
Background tasks work just fine on my test win 11 VM. I ran a removeDirectory and a success task no problems. So running the command itself works.
I will try to get a debugger on this, but it might not be until tomorrow.
Assignee | ||
Comment 29•2 years ago
•
|
||
I wonder how the Thunderbird main process launches. Does it hit mozilla::LauncherMain? Maybe it's using a different path?
Comment 30•2 years ago
|
||
I knew this would come up...
No, Thunderbird builds do not use LauncherMain
. That header file lives at /browser/app/winlauncher
, and (rightly) is not part of the build.
IIRC, when that whole directory was created a couple of years back, there was some initial Thunderbird build bustage. At the time I was told that Thunderbird didn't need all of that, and a fix was made somewhere in toolkit I think to fix the bustage.
Maybe that assessment is no longer true and Thunderbird needs the winlauncher code now? Maybe not? I never really understood why it was deemed unnecessary. If Thunderbird should be using winlauncher, there's likely some significant work to do to make it work. I can help with build config but C++ not so much.
Comment 31•2 years ago
|
||
I didn't notice at the time, but 24 hours ago the test tasks stopped failing on TaskCluster. I'm assuming that bug 1765732 covered up the problem but it still exists. Does that matter though?
Description
•