1655793 - Crash in [@ mozilla::AppShutdown::DoImmediateExit]

Reporter

Description

•

5 years ago

This bug is for crash report bp-1eaa08b4-bce4-45a7-b754-599740200728.

Top 10 frames of crashing thread:

0 xul.dll static mozilla::AppShutdown::DoImmediateExit xpcom/base/AppShutdown.cpp:231
1 xul.dll static mozilla::AppShutdown::MaybeFastShutdown xpcom/base/AppShutdown.cpp:188
2 xul.dll mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:726
3 xul.dll ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:1286
4 xul.dll XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:4883
5 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp:4920
6 firefox.exe NS_internal_main browser/app/nsBrowserApp.cpp:331
7 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:131
8 firefox.exe __scrt_common_main_seh /builds/worker/workspace/obj-build/browser/app/f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:288
9 kernel32.dll BaseThreadInitThunk

this crash signature is regressing since firefox 74.0 - judging by the pre-release crash volume and the early rollout-data from 79.0 on release, the crash seems to get more frequent with firefox 79 though.
crashes are happening in MOZ_CRASH("TerminateProcess failed.") that got added in bug 1606880 and curiously mostly seem to hit users on spanish builds.

Flags: needinfo?(dothayer)

Alex Thayer [:alexical] (she/her)

Comment 1

•

5 years ago

I'm not sure if there's much we can or want to do about this. It's also not very critical at all. Crashing basically accomplishes exactly what we were hoping for: terminating the process, albeit with the side effect of creating a crash report. I'm not sure how this is failing: if somehow process doesn't have the TERMINATE_PROCESS access right for itself, then I suppose we would hit this.

Flags: needinfo?(dothayer)

[:philipp]

Reporter

Comment 2

•

5 years ago

any idea what might make this issue more common starting in firefox 79?

Alex Thayer [:alexical] (she/her)

Comment 3

•

5 years ago

Yes - in 79 we started terminating the browser with TerminateProcess in the very late stages of shutdown. Previously we just used this to terminate content processes in some circumstances.

[:philipp]

Reporter

Updated

•

5 years ago

Comment 4

•

5 years ago

(In reply to [:philipp] from comment #0)

crashes are happening in MOZ_CRASH("TerminateProcess failed.") that got added in bug 1606880 and curiously mostly seem to hit users on spanish builds.

The spanish connection is kind of odd. Do we have telemetry environment info on whether the profile dir is non-ascii or something, perhaps we're mis-encoding a path we're passing somewhere or something? Any other correlations that shed light on this particular stat?

Flags: needinfo?(madperson)

[:philipp]

Reporter

Comment 5

•

5 years ago

•

Edited

in the meantime, dll correlations for this signature got generated and this one sticks out:
(88.76% in signature vs 00.33% overall) Module "Graphics64.dll" = true

...and it seems to be part of the "cyberclient" product by this company creating software for internet cafe providers: https://tenaxsoft.com/ (and thus probably explaining the "spanish connection")

Flags: needinfo?(madperson)

Kim Moir [:kmoir] ET

Comment 6

•

5 years ago

Nathan I'm triaging this bug as REO for 79. As triage owner, can you set a priority and severity for this bug?

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 7

•

5 years ago

(In reply to Doug Thayer [:dthayer] (he/him) from comment #1)

I'm not sure if there's much we can or want to do about this. It's also not very critical at all. Crashing basically accomplishes exactly what we were hoping for: terminating the process, albeit with the side effect of creating a crash report. I'm not sure how this is failing: if somehow process doesn't have the TERMINATE_PROCESS access right for itself, then I suppose we would hit this.

I think before this we were just calling _exit(0);; can we go back to doing that? One would think from reading the documentation on TerminateProcess that terminating yourself should just work, but that is apparently not the case.

Alternatively, windows experts say that !gle in windbg will (ideally) tell us why TerminateProcess is failing, so maybe we could attempt to fix that instead.

Flags: needinfo?(dothayer)

Alex Thayer [:alexical] (she/her)

Comment 8

•

5 years ago

(In reply to Nathan Froyd [:froydnj] from comment #7)

(In reply to Doug Thayer [:dthayer] (he/him) from comment #1)

I'm not sure if there's much we can or want to do about this. It's also not very critical at all. Crashing basically accomplishes exactly what we were hoping for: terminating the process, albeit with the side effect of creating a crash report. I'm not sure how this is failing: if somehow process doesn't have the TERMINATE_PROCESS access right for itself, then I suppose we would hit this.

I think before this we were just calling _exit(0);; can we go back to doing that? One would think from reading the documentation on TerminateProcess that terminating yourself should just work, but that is apparently not the case.

Alternatively, windows experts say that !gle in windbg will (ideally) tell us why TerminateProcess is failing, so maybe we could attempt to fix that instead.

There were two locations which were doing similar things, which we merged into one. One of them was calling _exit(0), and the other was calling TerminateProcess. The problem with calling _exit(0) is that it calls DllMain on any loaded assemblies with a DLL_PROCESS_DETACH argument. This can cause Bad Things to happen, because it immediately terminates any threads running, which could for instance be locking on a mutex which the DllMain for the loaded assembly may want to try to lock, thus inducing a deadlock. So if we're going to forcefully terminate things, we need to ensure we're forcefully terminating everything.

I suppose we could annotate the crash with the result of ::GetLastError() though? I don't see why we need to / how we could involve windbg without being able to reproduce this locally. I have a suspicion that it will return ERROR_ACCESS_DENIED, and it will turn out that some third-party software has for some reason revoked our PROCESS_TERMINATE access right.

Flags: needinfo?(dothayer)

(Away)

Comment 9

•

5 years ago

(In reply to [:philipp] from comment #5)

in the meantime, dll correlations for this signature got generated and this one sticks out:
(88.76% in signature vs 00.33% overall) Module "Graphics64.dll" = true

...and it seems to be part of the "cyberclient" product by this company creating software for internet cafe providers: https://tenaxsoft.com/ (and thus probably explaining the "spanish connection")

Given that strong correlation I wonder if this software is hooking TerminateProcess.

Nathan Froyd [:froydnj]

Comment 10

•

5 years ago

FWIW, chromium has seen the same sort of weird behavior, also correlated with Graphics64.dll: https://bugs.chromium.org/p/chromium/issues/detail?id=820518

I'd like to try and look at a minidump later today, but I won't object if somebody beats me to it. :)

(Away)

Comment 11

•

5 years ago

(In reply to Doug Thayer [:dthayer] (he/him) from comment #8)

I suppose we could annotate the crash with the result of ::GetLastError() though? I don't see why we need to / how we could involve windbg without being able to reproduce this locally.

I thought minidumps were supposed to contain this information, which WinDbg can display with the !gle extension, but after it failed on several reports I remembered that we only enable MiniDumpWithProcessThreadData for the nightly channel: https://searchfox.org/mozilla-central/rev/ab81b8552f4aa9696a2524f97fdfeb59d4dc31c1/toolkit/crashreporter/nsExceptionHandler.cpp#1789-1792

Fortunately I found a single report from nightly which has these symptoms (bp-cadb906d-96d3-4142-a142-0dd1c0200314) and the result is:

0:000> !gle
LastErrorValue: (Win32) 0x6 (6) - The handle is invalid.
LastStatusValue: (NTSTATUS) 0xc0000008 - An invalid HANDLE was specified.

Nathan Froyd [:froydnj]

Comment 12

•

5 years ago

(In reply to :dmajor from comment #11)

Fortunately I found a single report from nightly which has these symptoms (bp-cadb906d-96d3-4142-a142-0dd1c0200314) and the result is:
0:000> !gle
LastErrorValue: (Win32) 0x6 (6) - The handle is invalid.
LastStatusValue: (NTSTATUS) 0xc0000008 - An invalid HANDLE was specified.

How can the handle to the current process be invalid?

(Away)

Comment 13

•

5 years ago

I suspect that TerminateProcess or one of its transitive callees is being redirected.

Kim Moir [:kmoir] ET

Updated

•

5 years ago

status-firefox79: affected → wontfix

Julien Cristau [:jcristau]

Updated

•

5 years ago

status-firefox80: affected → fix-optional

Nathan Froyd [:froydnj]

Comment 14

•

5 years ago

I don't think there's really anything we can do here.

Severity: -- → S2

Flags: needinfo?(nfroyd)

Priority: -- → P3

RaresB

Updated

•

4 years ago

QA Whiteboard: qa-not-actionable

BMO Automation

Updated

•

4 years ago

Has Regression Range: --- → yes

Dianna Smith [:diannaS]

Comment 15

•

3 years ago

Is this truly an S2? It seems it has been open for quite a while and is related to third party software..

status-firefox80: fix-optional → wontfix

Flags: needinfo?(nika)

Nika Layzell [:nika] (ni? for response)

Comment 16

•

3 years ago

I don't think there's anything reasonable we can do if some third-party dll has injected something which makes TerminateProcess return other than what we're already doing (crashing), so I think we can just mark this WONTFIX.

Status: NEW → RESOLVED

Closed: 3 years ago

Flags: needinfo?(nika)

Resolution: --- → WONTFIX

Gabriele Svelto [:gsvelto]

Comment 17

•

2 years ago

Just stumbled upon this because we had a few crashes on nightly. It looks like most of the users affected have odd DLLs injected - which might mean either TerminateProcess() or GetCurrentProcess() have been hooked. That being said it's interesting to note that in the nightly crashes the value of GetLastError() was ERROR_INVALID_HANDLE which suggests that GetCurrentProcess() returned a value that was bogus.

Bugzilla

Crash in [@ mozilla::AppShutdown::DoImmediateExit]

Categories

(Core :: XPCOM, defect, P3)

Tracking

()

People

(Reporter: philipp, Unassigned)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Updated

Updated

Comment 14

Updated

Updated

Comment 15

Comment 16

Comment 17