Open Bug 1655793 Opened 3 months ago Updated 2 months ago

Crash in [@ mozilla::AppShutdown::DoImmediateExit]

Categories

(Core :: XPCOM, defect, P3)

74 Branch
All
Windows
defect

Tracking

()

Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- wontfix
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- fix-optional

People

(Reporter: philipp, Unassigned)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

This bug is for crash report bp-1eaa08b4-bce4-45a7-b754-599740200728.

Top 10 frames of crashing thread:

0 xul.dll static mozilla::AppShutdown::DoImmediateExit xpcom/base/AppShutdown.cpp:231
1 xul.dll static mozilla::AppShutdown::MaybeFastShutdown xpcom/base/AppShutdown.cpp:188
2 xul.dll mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:726
3 xul.dll ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:1286
4 xul.dll XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:4883
5 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp:4920
6 firefox.exe NS_internal_main browser/app/nsBrowserApp.cpp:331
7 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:131
8 firefox.exe __scrt_common_main_seh /builds/worker/workspace/obj-build/browser/app/f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:288
9 kernel32.dll BaseThreadInitThunk 

this crash signature is regressing since firefox 74.0 - judging by the pre-release crash volume and the early rollout-data from 79.0 on release, the crash seems to get more frequent with firefox 79 though.
crashes are happening in MOZ_CRASH("TerminateProcess failed.") that got added in bug 1606880 and curiously mostly seem to hit users on spanish builds.

Flags: needinfo?(dothayer)

I'm not sure if there's much we can or want to do about this. It's also not very critical at all. Crashing basically accomplishes exactly what we were hoping for: terminating the process, albeit with the side effect of creating a crash report. I'm not sure how this is failing: if somehow process doesn't have the TERMINATE_PROCESS access right for itself, then I suppose we would hit this.

Flags: needinfo?(dothayer)

any idea what might make this issue more common starting in firefox 79?

Yes - in 79 we started terminating the browser with TerminateProcess in the very late stages of shutdown. Previously we just used this to terminate content processes in some circumstances.

See Also: → 1606879

(In reply to [:philipp] from comment #0)

crashes are happening in MOZ_CRASH("TerminateProcess failed.") that got added in bug 1606880 and curiously mostly seem to hit users on spanish builds.

The spanish connection is kind of odd. Do we have telemetry environment info on whether the profile dir is non-ascii or something, perhaps we're mis-encoding a path we're passing somewhere or something? Any other correlations that shed light on this particular stat?

Flags: needinfo?(madperson)

in the meantime, dll correlations for this signature got generated and this one sticks out:
(88.76% in signature vs 00.33% overall) Module "Graphics64.dll" = true

...and it seems to be part of the "cyberclient" product by this company creating software for internet cafe providers: https://tenaxsoft.com/ (and thus probably explaining the "spanish connection")

Flags: needinfo?(madperson)

Nathan I'm triaging this bug as REO for 79. As triage owner, can you set a priority and severity for this bug?

Flags: needinfo?(nfroyd)

(In reply to Doug Thayer [:dthayer] (he/him) from comment #1)

I'm not sure if there's much we can or want to do about this. It's also not very critical at all. Crashing basically accomplishes exactly what we were hoping for: terminating the process, albeit with the side effect of creating a crash report. I'm not sure how this is failing: if somehow process doesn't have the TERMINATE_PROCESS access right for itself, then I suppose we would hit this.

I think before this we were just calling _exit(0);; can we go back to doing that? One would think from reading the documentation on TerminateProcess that terminating yourself should just work, but that is apparently not the case.

Alternatively, windows experts say that !gle in windbg will (ideally) tell us why TerminateProcess is failing, so maybe we could attempt to fix that instead.

Flags: needinfo?(dothayer)

(In reply to Nathan Froyd [:froydnj] from comment #7)

(In reply to Doug Thayer [:dthayer] (he/him) from comment #1)

I'm not sure if there's much we can or want to do about this. It's also not very critical at all. Crashing basically accomplishes exactly what we were hoping for: terminating the process, albeit with the side effect of creating a crash report. I'm not sure how this is failing: if somehow process doesn't have the TERMINATE_PROCESS access right for itself, then I suppose we would hit this.

I think before this we were just calling _exit(0);; can we go back to doing that? One would think from reading the documentation on TerminateProcess that terminating yourself should just work, but that is apparently not the case.

Alternatively, windows experts say that !gle in windbg will (ideally) tell us why TerminateProcess is failing, so maybe we could attempt to fix that instead.

There were two locations which were doing similar things, which we merged into one. One of them was calling _exit(0), and the other was calling TerminateProcess. The problem with calling _exit(0) is that it calls DllMain on any loaded assemblies with a DLL_PROCESS_DETACH argument. This can cause Bad Things to happen, because it immediately terminates any threads running, which could for instance be locking on a mutex which the DllMain for the loaded assembly may want to try to lock, thus inducing a deadlock. So if we're going to forcefully terminate things, we need to ensure we're forcefully terminating everything.

I suppose we could annotate the crash with the result of ::GetLastError() though? I don't see why we need to / how we could involve windbg without being able to reproduce this locally. I have a suspicion that it will return ERROR_ACCESS_DENIED, and it will turn out that some third-party software has for some reason revoked our PROCESS_TERMINATE access right.

Flags: needinfo?(dothayer)

(In reply to [:philipp] from comment #5)

in the meantime, dll correlations for this signature got generated and this one sticks out:
(88.76% in signature vs 00.33% overall) Module "Graphics64.dll" = true

...and it seems to be part of the "cyberclient" product by this company creating software for internet cafe providers: https://tenaxsoft.com/ (and thus probably explaining the "spanish connection")

Given that strong correlation I wonder if this software is hooking TerminateProcess.

FWIW, chromium has seen the same sort of weird behavior, also correlated with Graphics64.dll: https://bugs.chromium.org/p/chromium/issues/detail?id=820518

I'd like to try and look at a minidump later today, but I won't object if somebody beats me to it. :)

(In reply to Doug Thayer [:dthayer] (he/him) from comment #8)

I suppose we could annotate the crash with the result of ::GetLastError() though? I don't see why we need to / how we could involve windbg without being able to reproduce this locally.

I thought minidumps were supposed to contain this information, which WinDbg can display with the !gle extension, but after it failed on several reports I remembered that we only enable MiniDumpWithProcessThreadData for the nightly channel: https://searchfox.org/mozilla-central/rev/ab81b8552f4aa9696a2524f97fdfeb59d4dc31c1/toolkit/crashreporter/nsExceptionHandler.cpp#1789-1792

Fortunately I found a single report from nightly which has these symptoms (bp-cadb906d-96d3-4142-a142-0dd1c0200314) and the result is:

0:000> !gle
LastErrorValue: (Win32) 0x6 (6) - The handle is invalid.
LastStatusValue: (NTSTATUS) 0xc0000008 - An invalid HANDLE was specified.

(In reply to :dmajor from comment #11)

Fortunately I found a single report from nightly which has these symptoms (bp-cadb906d-96d3-4142-a142-0dd1c0200314) and the result is:

0:000> !gle
LastErrorValue: (Win32) 0x6 (6) - The handle is invalid.
LastStatusValue: (NTSTATUS) 0xc0000008 - An invalid HANDLE was specified.

How can the handle to the current process be invalid?

I suspect that TerminateProcess or one of its transitive callees is being redirected.

I don't think there's really anything we can do here.

Severity: -- → S2
Flags: needinfo?(nfroyd)
Priority: -- → P3
You need to log in before you can comment on or make changes to this bug.