Closed Bug 1912236 Opened 3 months ago Closed 3 months ago

Port bug 1908725: Provide a never-reused PID alternative for all Gecko processes

Categories

(Thunderbird :: Upstream Synchronization, task)

Thunderbird 131

Tracking

(Not tracked)

RESOLVED FIXED
131 Branch

People

(Reporter: babolivier, Assigned: babolivier)

References

Details

Attachments

(3 files)

https://phabricator.services.mozilla.com/D218471 removed content_process_main, which Thunderbird currently uses

Type: enhancement → task

Pushed by brendan@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/2a91c425ea03
Port bug 1908725: Adapt to recent changes to application startup code. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED

Reopening since we're still having issues with this upstream change.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Hi Nika, we're having a few issues porting bug 1908725 to Thunderbird. The builds work with the patches that are already listed here (which isn't to say they're entirely correct), but we're hitting some failures with tests and I was wondering if you could share your thoughts on what might be the cause, as the author of the aforementioned bug.

A number of our Linux mochitests are timing out with this output:

 0:05.82 GECKO(980519) [980632, Main Thread] WARNING: XPCOM object ProfilerParentTracker destroyed from static ctor/dtor: file /home/babolivier/Documents/mozilla/mozilla-central/xpcom/base/nsTraceRefcnt.cpp:216
 0:05.83 GECKO(980519) [Parent 980519, Main Thread] WARNING: IPC Connection Error: [Parent][PContentParent] Send(msgname=PWindowGlobal::Msg_RawMessage) Channel error: cannot send/recv: file /home/babolivier/Documents/mozilla/mozilla-central/ipc/glue/MessageChannel.cpp:1943
 0:05.83 INFO Console message: [JavaScript Warning: "HTTPS-First Mode: Upgrading insecure speculative TCP connection “http://example.org/browser/comm/mail/base/test/widgets/files/links.html” to use “https”."]
 0:05.83 GECKO(980519) [Parent 980519, Main Thread] WARNING: IPC message 'PBrowser::Msg_Destroy' discarded: actor cannot send: file /home/babolivier/Documents/mozilla/mozilla-central/ipc/glue/ProtocolUtils.cpp:557
 0:05.83 GECKO(980519) [Parent 980519, ProcessHangMon] WARNING: IPC message 'PProcessHangMonitor::Msg_RequestContentJSInterrupt' discarded: actor cannot send: file /home/babolivier/Documents/mozilla/mozilla-central/ipc/glue/ProtocolUtils.cpp:557
 0:05.83 GECKO(980519) [Parent 980519, Main Thread] WARNING: build ID mismatch false alarm: file /home/babolivier/Documents/mozilla/mozilla-central/dom/base/nsFrameLoader.cpp:3739
 0:05.83 GECKO(980519) [Parent 980519, GMPThread] WARNING: Failed to delete GMP storage directory: file /home/babolivier/Documents/mozilla/mozilla-central/dom/media/gmp/GMPServiceParent.cpp:1909
 0:05.83 GECKO(980519) console.error:
 0:05.83 GECKO(980519)   remote browser crashed while on
 0:05.83 GECKO(980519)   about:blank
 0:05.83 GECKO(980519) [Parent 980519, Main Thread] WARNING: IPC message 'PContent::Msg_CommitBrowsingContextTransaction' discarded: actor cannot send: file /home/babolivier/Documents/mozilla/mozilla-central/ipc/glue/ProtocolUtils.cpp:557
 0:05.83 GECKO(980519) [Parent 980519, Main Thread] WARNING: content process childid = 1 pid = 980632 crashed without leaving a minidump behind: file /home/babolivier/Documents/mozilla/mozilla-central/dom/ipc/ContentParent.cpp:4246
 0:05.83 GECKO(980519) [Parent 980519, IPC I/O Parent] WARNING: process 980632 exited with status 1: file /home/babolivier/Documents/mozilla/mozilla-central/ipc/chromium/src/base/process_util_posix.cc:341

This specific output comes from https://searchfox.org/comm-central/source/mail/base/test/widgets/browser_linkHandler.js, when running the first test in there (testNoGroup) - I have also seen it coming from other tests (most of the mochitests that are timing out on Linux here: https://treeherder.mozilla.org/jobs?repo=comm-central&revision=2a91c425ea0326dadc9953b9ae2125aaf2558d4e).

We're also seeing instances of bug 1824444 (except permanent) on Windows debug mochitest runs, and "Tests should have failed" on MSIX xpcshell-tests for Windows (both opt and debug). I haven't managed to make 100% sure these failures are related to bug 1908725 (or the patches in here), but they started appearing after its patches were pushed to mozilla-central, so I have a suspicion that might be the case. Here is a try run that exhibits these failures: https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=0d3a726e5d4285661b3291aba10e4e39d8e7b62c

As I'm a bit unfamiliar with Firefox's IPC, I'm struggling to figure out what's going on, and how to properly fix this - would you be able to have a quick look and share some thoughts and/or pointers, please?

Flags: needinfo?(nika)
Keywords: leave-open
Attachment #9418325 - Attachment description: Bug 1912236 - Follow-up: move misplaced include. r=#thunderbird-reviewers → Bug 1912236 - Follow-up: Correct include and initialization of the exception handler. r=#thunderbird-reviewers

Turns out I missed some bits when initially porting D218471, these errors seem to disappear with the latest version of the follow-up patch. Sorry for the ping, Nika!

Flags: needinfo?(nika)
Target Milestone: --- → 131 Branch

It appears the bug is that the changes to nsMailApp.cpp didn't fully cover the changes to nsBrowserApp.cpp, so the childID and BrowserID arguments are not being removed, and are not being properly initialized, which is likely leading to cascading IPC failures.

You probably want int main to look more like this (untested, I applied the changes I remember doing to nsBrowserApp.cpp to nsMailApp.cpp):

diff --git a/mail/app/nsMailApp.cpp b/mail/app/nsMailApp.cpp
index ceab93af3f4e0..beb1031c9d8e1 100644
--- a/mail/app/nsMailApp.cpp
+++ b/mail/app/nsMailApp.cpp
@@ -253,53 +253,55 @@ uint32_t gBlocklistInitFlags = eDllBlocklistInitFlagDefault;
 #endif
 
 int main(int argc, char* argv[], char* envp[]) {
-#if defined(MOZ_ENABLE_FORKSERVER)
-  if (strcmp(argv[argc - 1], "forkserver") == 0) {
-    nsresult rv = InitXPCOMGlue(LibLoadingStrategy::NoReadAhead);
-    if (NS_FAILED(rv)) {
-      return 255;
-    }
+#ifdef MOZ_BROWSER_CAN_BE_CONTENTPROC
+  if (argc > 1 && IsArg(argv[1], "contentproc")) {
+    // Set the process type and gecko child id.
+    SetGeckoProcessType(argv[--argc]);
+    SetGeckoChildID(argv[--argc]);
+
+#  if defined(MOZ_ENABLE_FORKSERVER)
+    if (GetGeckoProcessType() == GeckoProcessType_ForkServer) {
+      nsresult rv = InitXPCOMGlue(LibLoadingStrategy::NoReadAhead);
+      if (NS_FAILED(rv)) {
+        return 255;
+      }
 
-    // Run a fork server in this process, single thread.  When it
-    // returns, it means the fork server have been stopped or a new
-    // content process is created.
-    //
-    // For the later case, XRE_ForkServer() will return false, running
-    // in a content process just forked from the fork server process.
-    // argc & argv will be updated with the values passing from the
-    // chrome process.  With the new values, this function
-    // continues the reset of the code acting as a content process.
-    if (gBootstrap->XRE_ForkServer(&argc, &argv)) {
-      // Return from the fork server in the fork server process.
-      // Stop the fork server.
-      gBootstrap->NS_LogTerm();
-      return 0;
+      // Run a fork server in this process, single thread. When it returns, it
+      // means the fork server have been stopped or a new child process is
+      // created.
+      //
+      // For the latter case, XRE_ForkServer() will return false, running in a
+      // child process just forked from the fork server process. argc & argv
+      // will be updated with the values passing from the chrome process, as
+      // will GeckoProcessType and GeckoChildID. With the new values, this
+      // function continues the reset of the code acting as a child process.
+      if (gBootstrap->XRE_ForkServer(&argc, &argv)) {
+        // Return from the fork server in the fork server process.
+        // Stop the fork server.
+        // InitXPCOMGlue calls NS_LogInit, so we need to balance it here.
+        gBootstrap->NS_LogTerm();
+        return 0;
+      }
     }
-    // In a content process forked from the fork server.
-    // Start acting as a content process.
+#  endif
   }
 #endif
 
   mozilla::TimeStamp start = mozilla::TimeStamp::Now();
 
+  // Register an external module to report on otherwise uncatchable exceptions.
+  // Note that in child processes this must be called after Gecko process type
+  // has been set.
+  CrashReporter::RegisterRuntimeExceptionModule();
+
   // Make sure we unregister the runtime exception module before returning.
-  // We do this here to cover both registers for child and main processes.
   auto unregisterRuntimeExceptionModule =
       MakeScopeExit([] { CrashReporter::UnregisterRuntimeExceptionModule(); });
 
 #ifdef MOZ_BROWSER_CAN_BE_CONTENTPROC
   // We are launching as a content process, delegate to the appropriate
   // main
-  if (argc > 1 && IsArg(argv[1], "contentproc")) {
-    // Set the process type. We don't remove the arg here as that will be done
-    // later in common code.
-    SetGeckoProcessType(argv[argc - 1]);
-
-    // Register an external module to report on otherwise uncatchable
-    // exceptions. Note that in child processes this must be called after Gecko
-    // process type has been set.
-    CrashReporter::RegisterRuntimeExceptionModule();
-
+  if (GetGeckoProcessType() != GeckoProcessType_Default) {
 #  ifdef HAS_DLL_BLOCKLIST
     DllBlocklist_Initialize(gBlocklistInitFlags |
                             eDllBlocklistInitFlagIsChildProcess);
@@ -361,9 +363,6 @@ int main(int argc, char* argv[], char* envp[]) {
   }
 #endif
 
-  // Register an external module to report on otherwise uncatchable exceptions.
-  CrashReporter::RegisterRuntimeExceptionModule();
-
 #ifdef HAS_DLL_BLOCKLIST
   DllBlocklist_Initialize(gBlocklistInitFlags);
 #endif

The previous patch changed when and how we parse the early command line for process type and ID information, so that needed to be copied over, this also included some tweaks to the forkserver logic (though I'm unsure if thunderbird actually uses that code).


On a side note, it does seem that nsMailApp.cpp is largely the same as nsBrowserApp.cpp with only a few changes. Some of these appear to be intentional, while others appear to be changes which didn't cause Thunderbird to stop building, so were never ported from nsBrowserApp.cpp.

As far as I can tell on first glance, the main relevant changes are:

  • Thunderbird uses a different appDataPath
    -#define kDesktopFolder "browser"
    +#define kDesktopFolder ""
    
  • Using the app name in error messageboxes.
    -      messageBoxW(nullptr, wide_msg, L"Firefox",
    +      messageBoxW(nullptr, wide_msg, L"Thunderbird",
                       MB_OK | MB_ICONERROR | MB_SETFOREGROUND);
    
  • Different EnsureCommandlineSafe arguments because thunderbird doesn't use the launcher process.
    -  EnsureBrowserCommandlineSafe(argc, argv);
    +  // Note: FF needs to keep in sync with LauncherProcessWin,
    +  //       TB doesn't have that file.
    +  const char* acceptableParams[] = {"compose", "mail", nullptr};
    +  EnsureCommandlineSafe(argc, argv, acceptableParams);
    
  • Thunderbird probably doesn't want the pre-XUL skeleton UI, so didn't port it over.
    -
    -  // Once the browser process hits the main function, we no longer need
    -  // a writable section handle because all dependent modules have been
    -  // loaded.
    -  mozilla::freestanding::gSharedSection.ConvertToReadOnly();
    -
    -  mozilla::CreateAndStorePreXULSkeletonUI(GetModuleHandle(nullptr), argc, argv);
    

Beyond that there are a number of small things missing from nsMailApp, such as process-specific flags for the DLLBlocklist, win32k lockdown support, base profiler initialization, some fuzzer-driver stuff, and default file descriptor reservation on unix.

I wonder if it might make sense long-term for Thunderbird to try to use the same entry point as Firefox and apply some of these tweaks with #ifdefs in nsBrowserApp.cpp such that they don't desync more in the future.

Wrote that comment before you found the core of the issue, but figured I'd still post it. I've left a comment on your version of the patch with a thing you missed.

Flags: needinfo?(brendan)

Thank you so much, Brendan is off already but I'll take care of those last few changes.

Flags: needinfo?(brendan)
Attachment #9418325 - Attachment description: Bug 1912236 - Follow-up: Correct include and initialization of the exception handler. r=#thunderbird-reviewers → Bug 1912236 - Follow-up: Correct include and initialization of the exception handler. r=#thunderbird-reviewers,nika

I added you as a reviewer for the updated patch, if you're still around to give it a final look

Flags: needinfo?(nika)

Pushed by alessandro@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/07c8c90d28eb
Follow-up: Correct include and initialization of the exception handler. r=tobyp

Status: REOPENED → RESOLVED
Closed: 3 months ago3 months ago
Resolution: --- → FIXED

(In reply to Alessandro Castellani [:aleca] from comment #10)

I added you as a reviewer for the updated patch, if you're still around to give it a final look

You put the registration with the exception handler in the wrong place unfortunately. I left a comment in the patch with the place it should actually happen: https://phabricator.services.mozilla.com/D218844#inline-1216776, https://phabricator.services.mozilla.com/D218844#inline-1216778

Flags: needinfo?(nika) → needinfo?(alessandro)

Thanks for looking at it, I'll do a follow up and ping you for a review.

Flags: needinfo?(alessandro)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Pushed by alessandro@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/c7df29bada38
Register the crash reporter runtime in the correct place. r=mkmelin

Status: REOPENED → RESOLVED
Closed: 3 months ago3 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: