Bug 1851889 Comment 29 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than 1 second to answer, but they will fail as soon as the broker process uses more than one second to answer one call. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed or completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either; see bug 1869805 comment 0 for more details). After the IPC call failure, the sandboxed process tried to create the section for the DLL on its own, which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see: [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see: [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see: [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see: [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see: [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, as long as IPC is still possible, this mutex can only be abandoned if the broker process crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see: [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, as can be seen in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call for which the broker process takes longer than that will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.
Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than 1 second to answer them, but they will fail as soon as the broker process uses more than one second to answer one call. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed or completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either; see bug 1869805 comment 0 for more details). After the IPC call failure, the sandboxed process tried to create the section for the DLL on its own, which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see: [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see: [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see: [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see: [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see: [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, as long as IPC is still possible, this mutex can only be abandoned if the broker process crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see: [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, as can be seen in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call for which the broker process takes longer than that will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.
Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than 1 second to answer them, but they will fail as soon as the broker process uses more than one second to answer one of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed or completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either; see bug 1869805 comment 0 for more details). After the IPC call failure, the sandboxed process tried to create the section for the DLL on its own, which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see: [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see: [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see: [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see: [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see: [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, as long as IPC is still possible, this mutex can only be abandoned if the broker process crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see: [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, as can be seen in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call for which the broker process takes longer than that will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.
Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than 1 second to answer them, but they will fail as soon as the broker process uses more than one second to answer one of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either; see bug 1869805 comment 0 for more details). After the IPC call failure, the sandboxed process tried to create the section for the DLL on its own, which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see: [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see: [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see: [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see: [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see: [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, as long as IPC is still possible, this mutex can only be abandoned if the broker process crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see: [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, as can be seen in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call for which the broker process takes longer than that will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.
Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than 1 second to answer them, but they will fail as soon as the broker process uses more than one second to answer one of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either; see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process tried to create the section for the DLL on its own, which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see: [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see: [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see: [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see: [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see: [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, as long as IPC is still possible, this mutex can only be abandoned if the broker process crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see: [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, as can be seen in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call for which the broker process takes longer than that will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.
Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than 1 second to answer them, but they will fail as soon as the broker process uses more than one second to answer one of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either; see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process tried to create the section for the DLL on its own (see `sandbox::TargetNtCreateSection`), which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see: [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see: [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see: [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see: [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see: [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, as long as IPC is still possible, this mutex can only be abandoned if the broker process crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see: [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, as can be seen in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call for which the broker process takes longer than that will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.
Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than 1 second to answer them, but they will fail as soon as the broker process uses more than one second to answer one of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either; see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process tried to create the section for the DLL on its own (see `sandbox::TargetNtCreateSection`), which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, as long as IPC is still possible, this mutex can only be abandoned if the broker process crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, as can be seen in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call for which the broker process takes longer than that will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.
Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than 1 second to answer them, but they will fail as soon as the broker process uses more than one second to answer any of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either; see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process tried to create the section for the DLL on its own (see `sandbox::TargetNtCreateSection`), which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, as long as IPC is still possible, this mutex can only be abandoned if the broker process crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, as can be seen in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call for which the broker process takes longer than that will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.
Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than 1 second to answer them, but they will fail as soon as the broker process uses more than one second to answer any of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either; see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process tried to create the section for the DLL on its own (see `sandbox::TargetNtCreateSection`), which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, for as long as IPC is still possible, this mutex can only be abandoned if the broker process has crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, as can be seen in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call for which this is not the case will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.

Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than one second to answer them, but they will fail as soon as the broker process takes more than one second to answer any of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either, see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process tried to create the section for the DLL on its own (see `sandbox::TargetNtCreateSection`), which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, for as long as IPC is still possible, this mutex can only be abandoned if the broker process has crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, by breaking the logic in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call for which this is not the case will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.

Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than one second to answer them, but they will fail as soon as the broker process takes more than one second to answer any of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either, see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process tried to create the section for the DLL on its own (see `sandbox::TargetNtCreateSection`), which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, for as long as IPC is still possible, this mutex can only be abandoned if the broker process has crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, by breaking the logic in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call that takes more than 1 second will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.

Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than one second to answer them, but they will fail as soon as the broker process takes more than one second to answer any of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either, see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process tried to create the section for the DLL on its own (see `sandbox::TargetNtCreateSection`), which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, for as long as IPC is still possible, this mutex can only be abandoned if the broker process has crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, by breaking the logic in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call that takes more than 1 second will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.

Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than one second to answer them, but they will fail as soon as the broker process takes more than one second to answer any of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209. They have yet to be fixed.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either, see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process tried to create the section for the DLL on its own (see `sandbox::TargetNtCreateSection`), which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, as far as I can tell, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, for as long as IPC is still possible, this mutex can only be abandoned if the broker process has crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, by breaking the logic in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call that takes more than 1 second will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.

Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than one second to answer them, but they will fail as soon as the broker process takes more than one second to answer any of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209. They have yet to be fixed.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either, see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process tried to create the section for the DLL on its own (see `sandbox::TargetNtCreateSection`), which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, for as long as IPC is still possible, this mutex can only be abandoned if the broker process has crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, by breaking the logic in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call that takes more than 1 second will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.

Here is the root cause for these intermittent failures after investigation. **On Windows, XPCOM shutdown currently leaves sandbox IPC in a weird semi-dead state: sandbox IPC calls are still allowed and will work seamlessly as long as the broker process takes less than one second to answer them, but they will fail as soon as the broker process takes more than one second to answer any of them. XPCOM shutdown should instead guarantee that sandbox IPC calls are now either still completely allowed, or otherwise completely disallowed.** (Note that these intermittent failures still occur at the moment, with a different signature, in bug 1871209. They have yet to be fixed.)

It is the failure of the underlying sandbox IPC call `IpcTag::NTCREATESECTION` that made `LoadLibraryW` ultimately fail with `ERROR_INVALID_IMAGE_HASH` in `LoadLibraryOrCrash`. This sandbox IPC call is required to load `mozavcodec.dll` and `mozavutil.dll` because prespawn CIG is active during tests, which in turn is because the dynamic blocklist holds a test entry (this weird chain of consequences is probably not intentional either, see bug 1869805 comment 0 for more details). After the sandbox IPC call failure, the sandboxed process tried to create the section for the DLL on its own (see `sandbox::TargetNtCreateSection`), which results in `ERROR_INVALID_IMAGE_HASH` for any non-Microsoft DLL when CIG is active. So `ERROR_INVALID_IMAGE_HASH` failures occurred when, after XPCOM shutdown, the broker process somehow took more than one second to answer either the `IpcTag::NTCREATESECTION` call or a prior call. It is possible to reproduce these failures by adding an artificial delay of 2 seconds in `SignedPolicy::CreateSectionAction`.

On Windows, we have a dedicated thread called `IPC Launch` (see [`GetIPCLauncher()`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#979-999)). This is the thread on which sandboxed process launches occur (see [`BaseProcessLauncher::Launch`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#1811-1828)). As such, it is also the thread on which the call to `CreateMutexW` that initializes `g_alive_mutex` occurs (see [`sandbox::SharedMemIPCServer::SharedMemIPCServer`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_server.cc#31-53)). When we reach `xpcom-shutdown-threads`, we let this thread die (see [`IPCLaunchThreadObserver::Observe`](https://searchfox.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#964-977)).

However, we currently do nothing to ensure that all sandbox IPC is stopped before this thread dies. Yet, the death of the thread that created `g_alive_mutex` leaves the mutex abandoned (see [`WAIT_ABANDONED`](https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject)). The Chromium sandbox IPC code assumes that, for as long as IPC is still possible, this mutex can only be abandoned if the broker process has crashed. In fact, sandboxed processes rely on this to detect a broker process crash and stop waiting for a result (see [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153)).

Therefore, XPCOM shutdown currently leaves sandbox IPC in a semi-dead state, by breaking the logic in [`SharedMemIPCClient::DoCall`](https://searchfox.org/mozilla-central/source/security/sandbox/chromium/sandbox/win/src/sharedmem_ipc_client.cc#94-153). As long as the broker process always takes less than `kIPCWaitTimeOut1` (1 second) to answer every IPC call (by signalling the pong event), IPC can still occur seamlessly after XPCOM shutdown. However, the first call that takes more than 1 second will make the sandboxed process realize that the mutex is now abandoned, and this will cause this sandbox IPC call (and the next ones) to fail.
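
To make the semi-dead state easier to picture, here is a minimal sketch of the client-side decision described above. It is a simplified model rather than the actual Chromium sandbox code: `ClientModel`, `CallResult`, `kWaitTimeoutMs`, `mutex_abandoned` and `broker_latency_ms` are hypothetical names invented for this illustration, no Win32 API is called, and the real `SharedMemIPCClient::DoCall` handles more cases than this.

```cpp
#include <cstdint>

// Simplified, hypothetical model of the client-side wait logic. This is not
// the Chromium sandbox code and it calls no Win32 API.
constexpr std::uint32_t kWaitTimeoutMs = 1000;  // stands in for kIPCWaitTimeOut1

enum class CallResult { kOk, kChannelError };

struct ClientModel {
  // True once the thread that created the mutex has exited, i.e. a wait on
  // the mutex would report WAIT_ABANDONED.
  bool mutex_abandoned = false;
  // Cleared the first time the client concludes that the broker is gone.
  bool server_alive = true;

  // broker_latency_ms: time the broker needs to signal the pong event.
  CallResult DoCall(std::uint32_t broker_latency_ms) {
    if (!server_alive) {
      return CallResult::kChannelError;  // every later call fails immediately
    }
    if (broker_latency_ms < kWaitTimeoutMs) {
      return CallResult::kOk;  // fast path: the mutex is never inspected
    }
    // The first wait timed out, so the client now probes the mutex.
    if (mutex_abandoned) {
      server_alive = false;
      return CallResult::kChannelError;
    }
    // Mutex still owned: the client keeps waiting until the broker answers.
    return CallResult::kOk;
  }
};
```

In this model, a client whose `mutex_abandoned` flag is set keeps succeeding for as long as every call is answered in under one second, then fails on the first slow call and on every call after it, which matches the intermittent pattern described in this comment.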
