Closed Bug 1851889 Opened 9 months ago Closed 5 months ago

Intermittent application crashed [@ mozilla::ipc::UtilityProcessImpl::LoadLibraryOrCrash] | single tracking bug

Categories

(Core :: IPC, defect)

defect

Tracking

()

RESOLVED FIXED
123 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox119 --- wontfix
firefox121 --- wontfix
firefox122 --- wontfix
firefox123 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: yannis)

References

(Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure, test-verify-fail)

Crash Data

Attachments

(2 files, 1 obsolete file)

Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=428164755&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bk44jw-YR_-ojE7VhnT8dw/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bk44jw-YR_-ojE7VhnT8dw/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1


[task 2023-09-06T18:14:44.475Z] 18:14:44     INFO - PROCESS-CRASH | application crashed [@ mozilla::ipc::UtilityProcessImpl::LoadLibraryOrCrash] | browser\components\firefoxview\tests\browser\firefoxview-next\browser.ini 
[task 2023-09-06T18:14:44.475Z] 18:14:44     INFO - Process type: unknown
[task 2023-09-06T18:14:44.475Z] 18:14:44     INFO - Process pid: 10000
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO - Crash dump filename: C:\Users\task_169402224476647\AppData\Local\Temp\tmpfndnbhe7.mozrunner\minidumps\615b608b-4c74-47ce-9b44-9b12a7b34aa5.dmp
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO - Operating system: Windows NT
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -                   10.0.22621
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO - CPU: x86
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -      GenuineIntel family 6 model 106 stepping 6
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -      8 CPUs
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO - 
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO - Crash reason:  EXCEPTION_BREAKPOINT
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO - Crash address: 0x6688f6ed
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO - Process uptime: 49 seconds
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO - 
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO - Thread 0 MainThread (crashed)
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -  0  xul.dll!MOZ_Crash(char const*, int, char const*) [Assertions.h:58c832e34613172b88870ad71a0d9596987ed80f : 281]
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -     Found by: inlining
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -  1  xul.dll!mozilla::ipc::UtilityProcessImpl::LoadLibraryOrCrash(wchar_t const*) [UtilityProcessImpl.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 69 + 0x18]
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -      eip = 0x6688f6ed    esp = 0x0546f964    ebp = 0x0546f96c    ebx = 0x0546f998
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -      esi = 0x65035a10    edi = 0x6eadbcca    eax = 0x65035a0c    ecx = 0x03b8febf
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -      edx = 0x0546ea0d eflags = 0x00000202
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -     Found by: given as instruction pointer in context
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -  2  xul.dll!mozilla::ipc::UtilityAudioDecoderParent::GenericPreloadForSandbox() [UtilityAudioDecoderParent.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 74]
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -     Found by: inlining
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -  3  xul.dll!mozilla::ipc::UtilityProcessImpl::Init(int, char**) [UtilityProcessImpl.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 91 + 0x9]
[task 2023-09-06T18:14:44.480Z] 18:14:44     INFO -      eip = 0x66893603    esp = 0x0546f974    ebp = 0x0546f9ec    ebx = 0x0546f998
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -      esi = 0x0546f9f4    edi = 0x05a02dd0
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -     Found by: call frame info
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -  4  xul.dll!XRE_InitChildProcess(int, char**, XREChildData const*) [nsEmbedFunctions.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 623 + 0x17]
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -      eip = 0x6bee7359    esp = 0x0546f9f4    ebp = 0x0546fb58    ebx = 0x05a02dd0
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -      esi = 0x668934d0    edi = 0x05a049d0
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -     Found by: call frame info
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -  5  xul.dll!mozilla::BootstrapImpl::XRE_InitChildProcess(int, char**, XREChildData const*) [Bootstrap.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 67 + 0xd]
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -      eip = 0x6beee161    esp = 0x0546fb60    ebp = 0x0546fb6c    ebx = 0x00000016
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -      esi = 0x05a05068    edi = 0x6beee150
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -     Found by: call frame info
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -  6  firefox.exe!content_process_main(mozilla::Bootstrap*, int, char**) [plugin-container.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 57]
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -     Found by: inlining
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -  7  firefox.exe!NS_internal_main(int, char**, char**) [nsBrowserApp.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 375 + 0x8e]
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -      eip = 0x00ef14d4    esp = 0x0546fb74    ebp = 0x0546fccc    ebx = 0x00000016
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -      esi = 0x05a05068    edi = 0x6beee150
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -     Found by: call frame info
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -  8  firefox.exe!wmain(int, wchar_t**) [nsWindowsWMain.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 151 + 0x10]
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -      eip = 0x00ef11d5    esp = 0x0546fcd4    ebp = 0x0546fd00    ebx = 0x05643068
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -      esi = 0x05a02dd0    edi = 0x05a02430
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -     Found by: call frame info
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -  9  firefox.exe!invoke_main() [exe_common.inl : 90]
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -     Found by: inlining
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO - 10  firefox.exe!__scrt_common_main_seh() [exe_common.inl : 288 + 0x1b]
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -      eip = 0x00f72339    esp = 0x0546fd08    ebp = 0x0546fd48    ebx = 0x05180000
[task 2023-09-06T18:14:44.481Z] 18:14:44     INFO -      esi = 0x05643068    edi = 0x0564b098
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO -     Found by: call frame info
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO - 11  kernel32.dll!BaseThreadInitThunk + 0x18
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO -      eip = 0x768d7d49    esp = 0x0546fd50    ebp = 0x0546fd58    ebx = 0x05180000
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO -      esi = 0x05643068    edi = 0x0564b098
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO -     Found by: call frame info
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO - 12  ntdll.dll!__RtlUserThreadStart + 0x2a
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO -      eip = 0x7792b74b    esp = 0x0546fd60    ebp = 0x0546fdb0    ebx = 0x05180000
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO -      esi = 0x05643068    edi = 0x0564b098
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO -     Found by: call frame info
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO - 13  ntdll.dll!_RtlUserThreadStart + 0x1a
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO -      eip = 0x7792b6cf    esp = 0x0546fdb8    ebp = 0x0546fdc0    ebx = 0x05180000
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO -      esi = 0x05643068    edi = 0x0564b098
[task 2023-09-06T18:14:44.482Z] 18:14:44     INFO -     Found by: call frame info
Summary: Intermittent [TV] application crashed [@ mozilla::ipc::UtilityProcessImpl::LoadLibraryOrCrash] | browser\components\firefoxview\tests\browser\firefoxview-next\browser.ini → Intermittent [TV] application crashed [@ mozilla::ipc::UtilityProcessImpl::LoadLibraryOrCrash] | single tracking bug

bad state worker?

Summary: Intermittent [TV] application crashed [@ mozilla::ipc::UtilityProcessImpl::LoadLibraryOrCrash] | single tracking bug → Intermittent application crashed [@ mozilla::ipc::UtilityProcessImpl::LoadLibraryOrCrash] | single tracking bug
Severity: -- → S3

All of the errors reports 0x241, as already commented in comment #3. I'm not sure there is anything we can do there ?

Blocks: 1831236

All of the instances I can see linked on treeherder intermittent viewer shows the same 0x241 failure

Update

There have been 37 total failures within the last 7 days:

  • 3 failures on Windows 11 x86 22H2 MinGW WebRender debug
  • 17 failures on Windows 11 x86 22H2 WebRender debug
  • 3 failures on Windows 11 x64 22H2 MinGW WebRender debug
  • 14 failures on Windows 11 x64 22H2 WebRender debug

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=436772178&repo=autoland&lineNumber=3307

Whiteboard: [stockwell needswork:owner]
Depends on: 1865795

Update

There have been 64 total failures within the last 7 days:

  • 1 failure on Windows 11 x86 22H2 MinGW WebRender debug
  • 20 failures on Windows 11 x86 22H2 WebRender debug
  • 4 failures on Windows 11 x64 22H2 MinGW WebRender debug
  • 39 failures on Windows 11 x64 22H2 WebRender debug

Recent log: https://treeherder.mozilla.org/logviewer?job_id=437625582&repo=mozilla-central&lineNumber=2871

Yannis, is this something you are working on?
Thank you.

Flags: needinfo?(yjuglaret)

Yes, I am working in bug 1865795 on adding more debug information. It should land soon and it won't fix anything, but the failures that will occur after landing should help us understand better why they occur, hopefully enough to write a fix.

Flags: needinfo?(yjuglaret)
Depends on: 1869805

I have made progress in the investigation and I hope that with bug 1869805, these failures will now be reported as something like Intermittent application crashed [@ sandbox::TargetNtCreateSection] and contain useful diagnostics information. I'm not sure how to link these new failures to this bug though, such that they would appear here.

Depends on: 1871209

Removing the disable-recommanded tag because there are no failures since two weeks ago

Whiteboard: [stockwell disable-recommended]

Here is the root cause for these intermittent failures after investigation. On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird semi-dead state: sandbox IPC calls are still allowed and will work seemlessly as long as the broker process takes less than 1 second to answer them, but they will fail as soon as the broker process uses more than one second to answer any of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed. (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209. They have yet to be fixed.)

It is the failure of the underlying sandbox IPC call IpcTag::NTCREATESECTION that made LoadLibraryW ultimately fail with ERROR_INVALID_IMAGE_HASH in LoadLibraryOrCrash. This sandbox IPC call is required to load mozavcodec.dll and mozavutil.dll because prespawn CIG is active during tests, because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either, see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process was trying to create the section for the DLL on its own (see sandbox::TargetNtCreateSection), which results in ERROR_INVALID_IMAGE_HASH for any non-Microsoft DLL when CIG is active. So ERROR_INVALID_IMAGE_HASH failures occured when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the IpcTag::NTCREATESECTION call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in SignedPolicy::CreateSectionAction.

On Windows, we have a dedicated thread called IPC Launch (see GetIPCLauncher()). This is the thread on which sandboxed process launches occur (see BaseProcessLauncher::Launch). As such, it is also the thread on which the call to CreateMutexW that initializes g_alive_mutex occurs (see sandbox::SharedMemIPCServer::SharedMemIPCServer). When we reach xpcom-shutdown-threads, we let this thread die (see IPCLaunchThreadObserver::Observe).

However, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created g_alive_mutex leaves the mutex abandoned (see WAIT_ABANDONED). The chromium sandbox IPC code assumes that, for as long as IPC is still possible, this mutex can only be abandoned in case the broker process crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see SharedMemIPCClient::DoCall).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, by breaking the logic in SharedMemIPCClient::DoCall. As long as the broker process always takes less than kIPCWaitTimeOut1 (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seemlessly after XPCOM shutdown. However, the first call that takes more than 1 second will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.

This patch removes the temporary diagnostics code that was added
to the chromium sandbox in bug 1869805 to investigate bug 1851889.

The sandbox IPC client/server communication protocol implicitly requires
that the thread that does the first call to SpawnTarget lives as long as
the broker process will. When that thread dies, clients can start to
assume that the whole broker process is down.

This patch lets users of the sandbox call SpawnTarget from threads that
live shorter than the broker process. In order to do that, one must
provide TargetPolicy::SetAliveMutex with a handle to a mutex that is
currently owned. The owning thread is required to live as long as the
broker process will.

In other words, this patch moves the requirement that implicitly existed
on the calling thread of SpawnTarget, so that it is now a requirement on
the thread that owns the mutex passed to TargetPolicy::SetAliveMutex.

Depends on D197422

We currently call SpawnTarget from the IPC launcher thread, but that
thread dies during XPCOM shutdown. Once the thread dies, the sandbox
ends up in a weird state because we break an implicit assumption of the
sandbox IPC client/server protocol.

In order to fix this, we must provide a custom mutex through
TargetPolicy::SetAliveMutex. Because we provide a mutex which is owned
by the main thread, which lives as long as the broker process, the
sandbox IPC protocol will not be disturbed by the death of the IPC
launcher thread anymore.

Depends on D197423

Assignee: nobody → yjuglaret
Attachment #9370514 - Attachment description: WIP: Bug 1851889 - Undo temporary diagnostics code for prespawn CIG. r=bobowen → Bug 1851889 - Undo temporary diagnostics code for prespawn CIG. r=bobowen
Attachment #9370515 - Attachment description: WIP: Bug 1851889 - Allow setting a custom alive mutex through the sandbox policy. r=bobowen → Bug 1851889 - Allow setting a custom alive mutex through the sandbox policy. r=bobowen
Attachment #9370516 - Attachment description: WIP: Bug 1851889 - Fix calling SpawnTarget from the IPC launcher thread. r=bobowen → Bug 1851889 - Fix calling SpawnTarget from the IPC launcher thread. r=bobowen
Attachment #9370515 - Attachment description: Bug 1851889 - Allow setting a custom alive mutex through the sandbox policy. r=bobowen → Bug 1851889 - Create the alive mutex during sandbox initialization. r=bobowen
Attachment #9370515 - Attachment description: Bug 1851889 - Create the alive mutex during sandbox initialization. r=bobowen → Bug 1851889 - Create the broker alive mutex during sandbox initialization. r=bobowen
Attachment #9370516 - Attachment is obsolete: true
Pushed by yjuglaret@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/34aadf2b024b
Undo temporary diagnostics code for prespawn CIG. r=bobowen
https://hg.mozilla.org/integration/autoland/rev/42738fa75e0b
Create the broker alive mutex during sandbox initialization. r=bobowen
Blocks: 1871209
No longer depends on: 1871209
See Also: → 1872995

The issue could affect all sandbox client/server communication after XPCOM shutdown (see the bolded part of comment 29), but the only visible impact we have seen is with the prespawn CIG mitigation. This mitigation is only active for Nightly users (and actually a very small portion of them, see bug 1872995). Moreover the fix is on a critical path of the sandbox code. It seems more reasonable not to uplift this.

Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → 123 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: