Port bug 1908725: Provide a never-reused PID alternative for all Gecko processes
Categories
(Thunderbird :: Upstream Synchronization, task)
Tracking
(Not tracked)
People
(Reporter: babolivier, Assigned: babolivier)
References
Details
Attachments
(3 files)
https://phabricator.services.mozilla.com/D218471 removed content_process_main
, which Thunderbird currently uses
Updated•3 months ago
|
Assignee | ||
Comment 1•3 months ago
|
||
Pushed by brendan@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/2a91c425ea03
Port bug 1908725: Adapt to recent changes to application startup code. r=mkmelin
Assignee | ||
Comment 3•3 months ago
|
||
Assignee | ||
Comment 4•3 months ago
|
||
Reopening since we're still having issues with this upstream change.
Assignee | ||
Comment 5•3 months ago
•
|
||
Hi Nika, we're having a few issues porting bug 1908725 to Thunderbird. The builds work with the patches that are already listed here (which isn't to say they're entirely correct), but we're hitting some failures with tests and I was wondering if you could share your thoughts on what might be the cause, as the author of the aforementioned bug.
A number of our Linux mochitests are timing out with this output:
0:05.82 GECKO(980519) [980632, Main Thread] WARNING: XPCOM object ProfilerParentTracker destroyed from static ctor/dtor: file /home/babolivier/Documents/mozilla/mozilla-central/xpcom/base/nsTraceRefcnt.cpp:216
0:05.83 GECKO(980519) [Parent 980519, Main Thread] WARNING: IPC Connection Error: [Parent][PContentParent] Send(msgname=PWindowGlobal::Msg_RawMessage) Channel error: cannot send/recv: file /home/babolivier/Documents/mozilla/mozilla-central/ipc/glue/MessageChannel.cpp:1943
0:05.83 INFO Console message: [JavaScript Warning: "HTTPS-First Mode: Upgrading insecure speculative TCP connection “http://example.org/browser/comm/mail/base/test/widgets/files/links.html” to use “https”."]
0:05.83 GECKO(980519) [Parent 980519, Main Thread] WARNING: IPC message 'PBrowser::Msg_Destroy' discarded: actor cannot send: file /home/babolivier/Documents/mozilla/mozilla-central/ipc/glue/ProtocolUtils.cpp:557
0:05.83 GECKO(980519) [Parent 980519, ProcessHangMon] WARNING: IPC message 'PProcessHangMonitor::Msg_RequestContentJSInterrupt' discarded: actor cannot send: file /home/babolivier/Documents/mozilla/mozilla-central/ipc/glue/ProtocolUtils.cpp:557
0:05.83 GECKO(980519) [Parent 980519, Main Thread] WARNING: build ID mismatch false alarm: file /home/babolivier/Documents/mozilla/mozilla-central/dom/base/nsFrameLoader.cpp:3739
0:05.83 GECKO(980519) [Parent 980519, GMPThread] WARNING: Failed to delete GMP storage directory: file /home/babolivier/Documents/mozilla/mozilla-central/dom/media/gmp/GMPServiceParent.cpp:1909
0:05.83 GECKO(980519) console.error:
0:05.83 GECKO(980519) remote browser crashed while on
0:05.83 GECKO(980519) about:blank
0:05.83 GECKO(980519) [Parent 980519, Main Thread] WARNING: IPC message 'PContent::Msg_CommitBrowsingContextTransaction' discarded: actor cannot send: file /home/babolivier/Documents/mozilla/mozilla-central/ipc/glue/ProtocolUtils.cpp:557
0:05.83 GECKO(980519) [Parent 980519, Main Thread] WARNING: content process childid = 1 pid = 980632 crashed without leaving a minidump behind: file /home/babolivier/Documents/mozilla/mozilla-central/dom/ipc/ContentParent.cpp:4246
0:05.83 GECKO(980519) [Parent 980519, IPC I/O Parent] WARNING: process 980632 exited with status 1: file /home/babolivier/Documents/mozilla/mozilla-central/ipc/chromium/src/base/process_util_posix.cc:341
This specific output comes from https://searchfox.org/comm-central/source/mail/base/test/widgets/browser_linkHandler.js, when running the first test in there (testNoGroup
) - I have also seen it coming from other tests (most of the mochitests that are timing out on Linux here: https://treeherder.mozilla.org/jobs?repo=comm-central&revision=2a91c425ea0326dadc9953b9ae2125aaf2558d4e).
We're also seeing instances of bug 1824444 (except permanent) on Windows debug mochitest runs, and "Tests should have failed" on MSIX xpcshell-tests for Windows (both opt and debug). I haven't managed to make 100% sure these failures are related to bug 1908725 (or the patches in here), but they started appearing after its patches were pushed to mozilla-central, so I have a suspicion that might be the case. Here is a try run that exhibits these failures: https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=0d3a726e5d4285661b3291aba10e4e39d8e7b62c
As I'm a bit unfamiliar with Firefox's IPC, I'm struggling to figure out what's going on, and how to properly fix this - would you be able to have a quick look and share some thoughts and/or pointers, please?
Assignee | ||
Updated•3 months ago
|
Updated•3 months ago
|
Assignee | ||
Comment 6•3 months ago
|
||
Turns out I missed some bits when initially porting D218471, these errors seem to disappear with the latest version of the follow-up patch. Sorry for the ping, Nika!
Updated•3 months ago
|
Comment 7•3 months ago
|
||
It appears the bug is that the changes to nsMailApp.cpp
didn't fully cover the changes to nsBrowserApp.cpp
, so the childID and BrowserID arguments are not being removed, and are not being properly initialized, which is likely leading to cascading IPC failures.
You probably want int main
to look more like this (untested, I applied the changes I remember doing to nsBrowserApp.cpp
to nsMailApp.cpp
):
diff --git a/mail/app/nsMailApp.cpp b/mail/app/nsMailApp.cpp
index ceab93af3f4e0..beb1031c9d8e1 100644
--- a/mail/app/nsMailApp.cpp
+++ b/mail/app/nsMailApp.cpp
@@ -253,53 +253,55 @@ uint32_t gBlocklistInitFlags = eDllBlocklistInitFlagDefault;
#endif
int main(int argc, char* argv[], char* envp[]) {
-#if defined(MOZ_ENABLE_FORKSERVER)
- if (strcmp(argv[argc - 1], "forkserver") == 0) {
- nsresult rv = InitXPCOMGlue(LibLoadingStrategy::NoReadAhead);
- if (NS_FAILED(rv)) {
- return 255;
- }
+#ifdef MOZ_BROWSER_CAN_BE_CONTENTPROC
+ if (argc > 1 && IsArg(argv[1], "contentproc")) {
+ // Set the process type and gecko child id.
+ SetGeckoProcessType(argv[--argc]);
+ SetGeckoChildID(argv[--argc]);
+
+# if defined(MOZ_ENABLE_FORKSERVER)
+ if (GetGeckoProcessType() == GeckoProcessType_ForkServer) {
+ nsresult rv = InitXPCOMGlue(LibLoadingStrategy::NoReadAhead);
+ if (NS_FAILED(rv)) {
+ return 255;
+ }
- // Run a fork server in this process, single thread. When it
- // returns, it means the fork server have been stopped or a new
- // content process is created.
- //
- // For the later case, XRE_ForkServer() will return false, running
- // in a content process just forked from the fork server process.
- // argc & argv will be updated with the values passing from the
- // chrome process. With the new values, this function
- // continues the reset of the code acting as a content process.
- if (gBootstrap->XRE_ForkServer(&argc, &argv)) {
- // Return from the fork server in the fork server process.
- // Stop the fork server.
- gBootstrap->NS_LogTerm();
- return 0;
+ // Run a fork server in this process, single thread. When it returns, it
+ // means the fork server have been stopped or a new child process is
+ // created.
+ //
+ // For the latter case, XRE_ForkServer() will return false, running in a
+ // child process just forked from the fork server process. argc & argv
+ // will be updated with the values passing from the chrome process, as
+ // will GeckoProcessType and GeckoChildID. With the new values, this
+ // function continues the reset of the code acting as a child process.
+ if (gBootstrap->XRE_ForkServer(&argc, &argv)) {
+ // Return from the fork server in the fork server process.
+ // Stop the fork server.
+ // InitXPCOMGlue calls NS_LogInit, so we need to balance it here.
+ gBootstrap->NS_LogTerm();
+ return 0;
+ }
}
- // In a content process forked from the fork server.
- // Start acting as a content process.
+# endif
}
#endif
mozilla::TimeStamp start = mozilla::TimeStamp::Now();
+ // Register an external module to report on otherwise uncatchable exceptions.
+ // Note that in child processes this must be called after Gecko process type
+ // has been set.
+ CrashReporter::RegisterRuntimeExceptionModule();
+
// Make sure we unregister the runtime exception module before returning.
- // We do this here to cover both registers for child and main processes.
auto unregisterRuntimeExceptionModule =
MakeScopeExit([] { CrashReporter::UnregisterRuntimeExceptionModule(); });
#ifdef MOZ_BROWSER_CAN_BE_CONTENTPROC
// We are launching as a content process, delegate to the appropriate
// main
- if (argc > 1 && IsArg(argv[1], "contentproc")) {
- // Set the process type. We don't remove the arg here as that will be done
- // later in common code.
- SetGeckoProcessType(argv[argc - 1]);
-
- // Register an external module to report on otherwise uncatchable
- // exceptions. Note that in child processes this must be called after Gecko
- // process type has been set.
- CrashReporter::RegisterRuntimeExceptionModule();
-
+ if (GetGeckoProcessType() != GeckoProcessType_Default) {
# ifdef HAS_DLL_BLOCKLIST
DllBlocklist_Initialize(gBlocklistInitFlags |
eDllBlocklistInitFlagIsChildProcess);
@@ -361,9 +363,6 @@ int main(int argc, char* argv[], char* envp[]) {
}
#endif
- // Register an external module to report on otherwise uncatchable exceptions.
- CrashReporter::RegisterRuntimeExceptionModule();
-
#ifdef HAS_DLL_BLOCKLIST
DllBlocklist_Initialize(gBlocklistInitFlags);
#endif
The previous patch changed when and how we parse the early command line for process type and ID information, so that needed to be copied over, this also included some tweaks to the forkserver logic (though I'm unsure if thunderbird actually uses that code).
On a side note, it does seem that nsMailApp.cpp
is largely the same as nsBrowserApp.cpp
with only a few changes. Some of these appear to be intentional, while others appear to be changes which didn't cause Thunderbird to stop building, so were never ported from nsBrowserApp.cpp
.
As far as I can tell on first glance, the main relevant changes are:
- Thunderbird uses a different
appDataPath
-#define kDesktopFolder "browser" +#define kDesktopFolder ""
- Using the app name in error messageboxes.
- messageBoxW(nullptr, wide_msg, L"Firefox", + messageBoxW(nullptr, wide_msg, L"Thunderbird", MB_OK | MB_ICONERROR | MB_SETFOREGROUND);
- Different
EnsureCommandlineSafe
arguments because thunderbird doesn't use the launcher process.- EnsureBrowserCommandlineSafe(argc, argv); + // Note: FF needs to keep in sync with LauncherProcessWin, + // TB doesn't have that file. + const char* acceptableParams[] = {"compose", "mail", nullptr}; + EnsureCommandlineSafe(argc, argv, acceptableParams);
- Thunderbird probably doesn't want the pre-XUL skeleton UI, so didn't port it over.
- - // Once the browser process hits the main function, we no longer need - // a writable section handle because all dependent modules have been - // loaded. - mozilla::freestanding::gSharedSection.ConvertToReadOnly(); - - mozilla::CreateAndStorePreXULSkeletonUI(GetModuleHandle(nullptr), argc, argv);
Beyond that there are a number of small things missing from nsMailApp
, such as process-specific flags for the DLLBlocklist, win32k lockdown support, base profiler initialization, some fuzzer-driver stuff, and default file descriptor reservation on unix.
I wonder if it might make sense long-term for Thunderbird to try to use the same entry point as Firefox and apply some of these tweaks with #ifdef
s in nsBrowserApp.cpp
such that they don't desync more in the future.
Comment 8•3 months ago
|
||
Wrote that comment before you found the core of the issue, but figured I'd still post it. I've left a comment on your version of the patch with a thing you missed.
Comment 9•3 months ago
|
||
Thank you so much, Brendan is off already but I'll take care of those last few changes.
Updated•3 months ago
|
Comment 10•3 months ago
|
||
I added you as a reviewer for the updated patch, if you're still around to give it a final look
Updated•3 months ago
|
Comment 11•3 months ago
|
||
Pushed by alessandro@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/07c8c90d28eb
Follow-up: Correct include and initialization of the exception handler. r=tobyp
Comment 12•3 months ago
|
||
(In reply to Alessandro Castellani [:aleca] from comment #10)
I added you as a reviewer for the updated patch, if you're still around to give it a final look
You put the registration with the exception handler in the wrong place unfortunately. I left a comment in the patch with the place it should actually happen: https://phabricator.services.mozilla.com/D218844#inline-1216776, https://phabricator.services.mozilla.com/D218844#inline-1216778
Comment 13•3 months ago
|
||
Thanks for looking at it, I'll do a follow up and ping you for a review.
Updated•3 months ago
|
Comment 14•3 months ago
|
||
Updated•3 months ago
|
Comment 15•3 months ago
|
||
Pushed by alessandro@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/c7df29bada38
Register the crash reporter runtime in the correct place. r=mkmelin
Description
•