Intermittent application crashed [@ mozilla::ipc::UtilityProcessImpl::LoadLibraryOrCrash] | single tracking bug
Categories
(Core :: IPC, defect)
Tracking
()
People
(Reporter: intermittent-bug-filer, Assigned: yannis)
References
(Blocks 1 open bug)
Details
(Keywords: crash, intermittent-failure, test-verify-fail)
Crash Data
Attachments
(2 files, 1 obsolete file)
Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=428164755&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bk44jw-YR_-ojE7VhnT8dw/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bk44jw-YR_-ojE7VhnT8dw/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
[task 2023-09-06T18:14:44.475Z] 18:14:44 INFO - PROCESS-CRASH | application crashed [@ mozilla::ipc::UtilityProcessImpl::LoadLibraryOrCrash] | browser\components\firefoxview\tests\browser\firefoxview-next\browser.ini
[task 2023-09-06T18:14:44.475Z] 18:14:44 INFO - Process type: unknown
[task 2023-09-06T18:14:44.475Z] 18:14:44 INFO - Process pid: 10000
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - Crash dump filename: C:\Users\task_169402224476647\AppData\Local\Temp\tmpfndnbhe7.mozrunner\minidumps\615b608b-4c74-47ce-9b44-9b12a7b34aa5.dmp
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - Operating system: Windows NT
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - 10.0.22621
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - CPU: x86
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - GenuineIntel family 6 model 106 stepping 6
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - 8 CPUs
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO -
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - Crash reason: EXCEPTION_BREAKPOINT
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - Crash address: 0x6688f6ed
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - Process uptime: 49 seconds
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO -
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - Thread 0 MainThread (crashed)
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - 0 xul.dll!MOZ_Crash(char const*, int, char const*) [Assertions.h:58c832e34613172b88870ad71a0d9596987ed80f : 281]
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - Found by: inlining
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - 1 xul.dll!mozilla::ipc::UtilityProcessImpl::LoadLibraryOrCrash(wchar_t const*) [UtilityProcessImpl.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 69 + 0x18]
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - eip = 0x6688f6ed esp = 0x0546f964 ebp = 0x0546f96c ebx = 0x0546f998
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - esi = 0x65035a10 edi = 0x6eadbcca eax = 0x65035a0c ecx = 0x03b8febf
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - edx = 0x0546ea0d eflags = 0x00000202
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - Found by: given as instruction pointer in context
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - 2 xul.dll!mozilla::ipc::UtilityAudioDecoderParent::GenericPreloadForSandbox() [UtilityAudioDecoderParent.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 74]
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - Found by: inlining
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - 3 xul.dll!mozilla::ipc::UtilityProcessImpl::Init(int, char**) [UtilityProcessImpl.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 91 + 0x9]
[task 2023-09-06T18:14:44.480Z] 18:14:44 INFO - eip = 0x66893603 esp = 0x0546f974 ebp = 0x0546f9ec ebx = 0x0546f998
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - esi = 0x0546f9f4 edi = 0x05a02dd0
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - Found by: call frame info
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - 4 xul.dll!XRE_InitChildProcess(int, char**, XREChildData const*) [nsEmbedFunctions.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 623 + 0x17]
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - eip = 0x6bee7359 esp = 0x0546f9f4 ebp = 0x0546fb58 ebx = 0x05a02dd0
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - esi = 0x668934d0 edi = 0x05a049d0
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - Found by: call frame info
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - 5 xul.dll!mozilla::BootstrapImpl::XRE_InitChildProcess(int, char**, XREChildData const*) [Bootstrap.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 67 + 0xd]
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - eip = 0x6beee161 esp = 0x0546fb60 ebp = 0x0546fb6c ebx = 0x00000016
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - esi = 0x05a05068 edi = 0x6beee150
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - Found by: call frame info
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - 6 firefox.exe!content_process_main(mozilla::Bootstrap*, int, char**) [plugin-container.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 57]
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - Found by: inlining
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - 7 firefox.exe!NS_internal_main(int, char**, char**) [nsBrowserApp.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 375 + 0x8e]
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - eip = 0x00ef14d4 esp = 0x0546fb74 ebp = 0x0546fccc ebx = 0x00000016
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - esi = 0x05a05068 edi = 0x6beee150
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - Found by: call frame info
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - 8 firefox.exe!wmain(int, wchar_t**) [nsWindowsWMain.cpp:58c832e34613172b88870ad71a0d9596987ed80f : 151 + 0x10]
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - eip = 0x00ef11d5 esp = 0x0546fcd4 ebp = 0x0546fd00 ebx = 0x05643068
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - esi = 0x05a02dd0 edi = 0x05a02430
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - Found by: call frame info
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - 9 firefox.exe!invoke_main() [exe_common.inl : 90]
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - Found by: inlining
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - 10 firefox.exe!__scrt_common_main_seh() [exe_common.inl : 288 + 0x1b]
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - eip = 0x00f72339 esp = 0x0546fd08 ebp = 0x0546fd48 ebx = 0x05180000
[task 2023-09-06T18:14:44.481Z] 18:14:44 INFO - esi = 0x05643068 edi = 0x0564b098
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - Found by: call frame info
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - 11 kernel32.dll!BaseThreadInitThunk + 0x18
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - eip = 0x768d7d49 esp = 0x0546fd50 ebp = 0x0546fd58 ebx = 0x05180000
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - esi = 0x05643068 edi = 0x0564b098
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - Found by: call frame info
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - 12 ntdll.dll!__RtlUserThreadStart + 0x2a
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - eip = 0x7792b74b esp = 0x0546fd60 ebp = 0x0546fdb0 ebx = 0x05180000
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - esi = 0x05643068 edi = 0x0564b098
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - Found by: call frame info
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - 13 ntdll.dll!_RtlUserThreadStart + 0x1a
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - eip = 0x7792b6cf esp = 0x0546fdb8 ebp = 0x0546fdc0 ebx = 0x05180000
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - esi = 0x05643068 edi = 0x0564b098
[task 2023-09-06T18:14:44.482Z] 18:14:44 INFO - Found by: call frame info
Comment 1•9 months ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment 3•9 months ago
|
||
(In reply to Serban Stanca [:SerbanS] from comment #1)
This specific instance crashed with: Hit MOZ_CRASH(Unable to preload module: 0x241)
, where 0x241
is error code for ERROR_INVALID_IMAGE_HASH
Comment 4•9 months ago
|
||
bad state worker?
Updated•9 months ago
|
Updated•8 months ago
|
Updated•8 months ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 8•8 months ago
|
||
All of the errors reports 0x241
, as already commented in comment #3. I'm not sure there is anything we can do there ?
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 14•7 months ago
|
||
All of the instances I can see linked on treeherder intermittent viewer shows the same 0x241 failure
Comment hidden (Intermittent Failures Robot) |
Comment 16•6 months ago
|
||
Update
There have been 37 total failures within the last 7 days:
- 3 failures on Windows 11 x86 22H2 MinGW WebRender debug
- 17 failures on Windows 11 x86 22H2 WebRender debug
- 3 failures on Windows 11 x64 22H2 MinGW WebRender debug
- 14 failures on Windows 11 x64 22H2 WebRender debug
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=436772178&repo=autoland&lineNumber=3307
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 21•6 months ago
|
||
Update
There have been 64 total failures within the last 7 days:
- 1 failure on Windows 11 x86 22H2 MinGW WebRender debug
- 20 failures on Windows 11 x86 22H2 WebRender debug
- 4 failures on Windows 11 x64 22H2 MinGW WebRender debug
- 39 failures on Windows 11 x64 22H2 WebRender debug
Recent log: https://treeherder.mozilla.org/logviewer?job_id=437625582&repo=mozilla-central&lineNumber=2871
Yannis, is this something you are working on?
Thank you.
Assignee | ||
Comment 22•6 months ago
•
|
||
Yes, I am working in bug 1865795 on adding more debug information. It should land soon and it won't fix anything, but the failures that will occur after landing should help us understand better why they occur, hopefully enough to write a fix.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 26•5 months ago
•
|
||
I have made progress in the investigation and I hope that with bug 1869805, these failures will now be reported as something like Intermittent application crashed [@ sandbox::TargetNtCreateSection]
and contain useful diagnostics information. I'm not sure how to link these new failures to this bug though, such that they would appear here.
Comment hidden (Intermittent Failures Robot) |
Comment 28•5 months ago
|
||
Removing the disable-recommanded tag because there are no failures since two weeks ago
Assignee | ||
Comment 29•5 months ago
•
|
||
Here is the root cause for these intermittent failures after investigation. On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird semi-dead state: sandbox IPC calls are still allowed and will work seemlessly as long as the broker process takes less than 1 second to answer them, but they will fail as soon as the broker process uses more than one second to answer any of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed. (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209. They have yet to be fixed.)
It is the failure of the underlying sandbox IPC call IpcTag::NTCREATESECTION
that made LoadLibraryW
ultimately fail with ERROR_INVALID_IMAGE_HASH
in LoadLibraryOrCrash
. This sandbox IPC call is required to load mozavcodec.dll
and mozavutil.dll
because prespawn CIG is active during tests, because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either, see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process was trying to create the section for the DLL on its own (see sandbox::TargetNtCreateSection
), which results in ERROR_INVALID_IMAGE_HASH
for any non-Microsoft DLL when CIG is active. So ERROR_INVALID_IMAGE_HASH
failures occured when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the IpcTag::NTCREATESECTION
call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in SignedPolicy::CreateSectionAction
.
On Windows, we have a dedicated thread called IPC Launch
(see GetIPCLauncher()
). This is the thread on which sandboxed process launches occur (see BaseProcessLauncher::Launch
). As such, it is also the thread on which the call to CreateMutexW
that initializes g_alive_mutex
occurs (see sandbox::SharedMemIPCServer::SharedMemIPCServer
). When we reach xpcom-shutdown-threads
, we let this thread die (see IPCLaunchThreadObserver::Observe
).
However, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created g_alive_mutex
leaves the mutex abandoned (see WAIT_ABANDONED
). The chromium sandbox IPC code assumes that, for as long as IPC is still possible, this mutex can only be abandoned in case the broker process crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see SharedMemIPCClient::DoCall
).
Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, by breaking the logic in SharedMemIPCClient::DoCall
. As long as the broker process always takes less than kIPCWaitTimeOut1
(1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seemlessly after XPCOM shutdown. However, the first call that takes more than 1 second will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.
Assignee | ||
Comment 30•5 months ago
|
||
This patch removes the temporary diagnostics code that was added
to the chromium sandbox in bug 1869805 to investigate bug 1851889.
Assignee | ||
Comment 31•5 months ago
|
||
The sandbox IPC client/server communication protocol implicitly requires
that the thread that does the first call to SpawnTarget lives as long as
the broker process will. When that thread dies, clients can start to
assume that the whole broker process is down.
This patch lets users of the sandbox call SpawnTarget from threads that
live shorter than the broker process. In order to do that, one must
provide TargetPolicy::SetAliveMutex with a handle to a mutex that is
currently owned. The owning thread is required to live as long as the
broker process will.
In other words, this patch moves the requirement that implicitly existed
on the calling thread of SpawnTarget, so that it is now a requirement on
the thread that owns the mutex passed to TargetPolicy::SetAliveMutex.
Depends on D197422
Assignee | ||
Comment 32•5 months ago
|
||
We currently call SpawnTarget from the IPC launcher thread, but that
thread dies during XPCOM shutdown. Once the thread dies, the sandbox
ends up in a weird state because we break an implicit assumption of the
sandbox IPC client/server protocol.
In order to fix this, we must provide a custom mutex through
TargetPolicy::SetAliveMutex. Because we provide a mutex which is owned
by the main thread, which lives as long as the broker process, the
sandbox IPC protocol will not be disturbed by the death of the IPC
launcher thread anymore.
Depends on D197423
Assignee | ||
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Updated•5 months ago
|
Comment 33•5 months ago
|
||
Pushed by yjuglaret@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/34aadf2b024b Undo temporary diagnostics code for prespawn CIG. r=bobowen https://hg.mozilla.org/integration/autoland/rev/42738fa75e0b Create the broker alive mutex during sandbox initialization. r=bobowen
Updated•5 months ago
|
Updated•5 months ago
|
Assignee | ||
Comment 34•5 months ago
•
|
||
The issue could affect all sandbox client/server communication after XPCOM shutdown (see the bolded part of comment 29), but the only visible impact we have seen is with the prespawn CIG mitigation. This mitigation is only active for Nightly users (and actually a very small portion of them, see bug 1872995). Moreover the fix is on a critical path of the sandbox code. It seems more reasonable not to uplift this.
Comment 35•5 months ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/34aadf2b024b
https://hg.mozilla.org/mozilla-central/rev/42738fa75e0b
Updated•5 months ago
|
Description
•