Bug 1871209 Comment 1 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs are evidence that intermittent failures to load  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process has initiated shutdown so the IPC server is unavailable**: the sandboxed process will try to load the DLL on its own and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I read them (note that I may be wrong):

```c++
// Main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// Main process finishes a previous proxy DLL load request for mozavcodec.dll, which started before shutdown; further requests will be rejected
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// Now that mozavcodec.dll is loaded, the utility process needs a new proxy DLL load for mozavutil.dll, but it gets denied because shutdown has started and the IPC server is unavailable
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

(I believe these logs are also evidence that loading `mozavcodec.dll` can take a long time.)

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer);` for which one reason in `SharedMemIPCClient::DoCall` is when the IPC server is down.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs are evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process has initiated shutdown so the IPC server is unavailable**: the sandboxed process will try to load the DLL on its own and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I read them (note that I may be wrong):

```c++
// Main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// Main process finishes a previous proxy DLL load request for mozavcodec.dll, which started before shutdown; further requests will be rejected
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// Now that mozavcodec.dll is loaded, the utility process needs a new proxy DLL load for mozavutil.dll, but it gets denied because shutdown has started and the IPC server is unavailable
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

(I believe these logs are also evidence that loading `mozavcodec.dll` can take a long time.)

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer);` for which one reason in `SharedMemIPCClient::DoCall` is when the IPC server is down.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs are evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process has initiated shutdown so the IPC server is unavailable**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I read them (note that I may be wrong):

```c++
// Main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// Main process finishes a previous proxy DLL load request for mozavcodec.dll, which started before shutdown; further requests will be rejected
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// Now that mozavcodec.dll is loaded, the utility process needs a new proxy DLL load for mozavutil.dll, but it gets denied because shutdown has started and the IPC server is unavailable
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

(I believe these logs are also evidence that loading `mozavcodec.dll` can take a long time.)

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer);` for which one reason in `SharedMemIPCClient::DoCall` is when the IPC server is down.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs are evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process has initiated shutdown so the IPC server is unavailable**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I read them (note that I may be wrong):

```c++
// Main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// Main process finishes a previous proxy DLL load request for mozavcodec.dll, which started before shutdown; further requests will be rejected
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// Now that mozavcodec.dll is loaded, the utility process needs a new proxy DLL load for mozavutil.dll, but it gets denied because shutdown has started and the IPC server is unavailable
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer);` for which one reason in `SharedMemIPCClient::DoCall` is when the IPC server is down.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs are evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process initiates shutdown during or before the IPC exchange, causing it to fail**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I read them (note that I may be wrong):

```c++
// Main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// Main process somehow processes the proxy DLL load request for mozavcodec.dll and successfully creates the section
// Maybe the request predates shutdown?
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// But on sandboxed process side, something is wrong with the IPC channel, likely because of shutdown:
// DoCall ends with `SBOX_ERROR_CHANNEL_ERROR`
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer);` with `SBOX_ERROR_CHANNEL_ERROR`. We could add extra logging in `SharedMemIPCClient::DoCall` to be 100% sure of the reason for the failure.

Edit: The first version of this comment was wrong on some points.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs could be evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process initiates shutdown during or before the IPC exchange, causing it to fail**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I read them (note that I may be wrong):

```c++
// Main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// Main process somehow processes the proxy DLL load request for mozavcodec.dll and successfully creates the section
// Maybe the request predates shutdown?
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// But on sandboxed process side, something is wrong with the IPC channel, likely because of shutdown:
// DoCall ends with `SBOX_ERROR_CHANNEL_ERROR`
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer);` with `SBOX_ERROR_CHANNEL_ERROR`. We could add extra logging in `SharedMemIPCClient::DoCall` to be 100% sure of the reason for the failure.

Edit: The first version of this comment was wrong on some points.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs could be evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process initiates shutdown during or before the IPC exchange, causing it to fail**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I read them (note that I may be wrong):

```c++
// The main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown.
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// The main process somehow processes the proxy DLL load request for mozavcodec.dll despite shutdown, and successfully creates the section.
// Maybe the request predates shutdown?
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// But on the sandboxed process side, the IPC call does not complete.
// CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer) ends with SBOX_ERROR_CHANNEL_ERROR.
// Something is wrong with the IPC channel, likely because of shutdown.
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer)` with `SBOX_ERROR_CHANNEL_ERROR`. We could add extra logging in `SharedMemIPCClient::DoCall` to be 100% sure of the reason for the failure.

Edit: The first version of this comment was wrong on some points.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs could be evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process initiates shutdown during or before the IPC exchange, therefore causing the proxy DLL load to fail**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I read them (note that I may be wrong):

```c++
// The main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown.
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// The main process somehow processes the proxy DLL load request for mozavcodec.dll despite shutdown, and successfully creates the section.
// Maybe the request predates shutdown?
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// But on the sandboxed process side, the IPC call does not complete.
// CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer) ends with SBOX_ERROR_CHANNEL_ERROR.
// Something is wrong with the IPC channel, likely because of shutdown.
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer)` with `SBOX_ERROR_CHANNEL_ERROR`. We could add extra logging in `SharedMemIPCClient::DoCall` to be 100% sure of the reason for the failure.

Edit: The first version of this comment was wrong on some points.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs could be evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process initiates shutdown during or before the IPC exchange, therefore causing the proxy DLL load to fail**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I understand them (note that I may be wrong):

```c++
// The main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown.
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// The main process somehow processes the proxy DLL load request for mozavcodec.dll despite shutdown, and successfully creates the section.
// Maybe the request predates shutdown?
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// But on the sandboxed process side, the IPC call does not complete.
// CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer) ends with SBOX_ERROR_CHANNEL_ERROR.
// Something is wrong with the IPC channel, likely because of shutdown.
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer)` with `SBOX_ERROR_CHANNEL_ERROR`. We could add extra logging in `SharedMemIPCClient::DoCall` to be 100% sure of the reason for the failure.

Edit: The first version of this comment was wrong on some points.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs could be evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process initiates shutdown during or before the IPC exchange, therefore causing the proxy DLL load to fail**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I understand them (note that I may be wrong):

```c++
// The main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown.
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// 18 seconds later, the main process somehow processes the proxy DLL load request for mozavcodec.dll despite shutdown.
// It successfully creates the section (this log indicates success, it is the same as when things go right).
// Maybe the request predates shutdown?
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// But on the sandboxed process side, the IPC call does not complete.
// CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer) ends with SBOX_ERROR_CHANNEL_ERROR.
// Something is wrong with the IPC channel, likely because of shutdown.
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer)` with `SBOX_ERROR_CHANNEL_ERROR`. We could add extra logging in `SharedMemIPCClient::DoCall` to be 100% sure of the reason for the failure.

Edit: The first version of this comment was wrong on some points.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs could be evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process initiates shutdown during or before the IPC exchange, therefore causing the proxy DLL load to fail**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I understand them (note that I may be wrong):

```c++
// The main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown.
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// 18 seconds later, the main process somehow processes the proxy DLL load request for mozavcodec.dll despite shutdown.
// Maybe the request predates shutdown?
// It successfully creates the section (this log indicates success, it is the same as when things go right).
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// But on the sandboxed process side, the IPC call does not complete.
// CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer) ends with SBOX_ERROR_CHANNEL_ERROR.
// Something is wrong with the IPC channel, likely because of shutdown.
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer)` with `SBOX_ERROR_CHANNEL_ERROR`. We could add extra logging in `SharedMemIPCClient::DoCall` to be 100% sure of the reason for the failure.

Edit: The first version of this comment was wrong on some points.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs could be evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process initiates shutdown during or before the IPC exchange, therefore causing the proxy DLL load to fail**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I understand them (note that I may be wrong):

```c++
// The main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown.
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// 18 seconds later, the main process somehow processes the proxy DLL load request for mozavcodec.dll despite shutdown.
// Maybe the request predates shutdown?
// It successfully creates the section (this log indicates success, it is the same as when things go right).
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// But on the sandboxed process side, the IPC call does not complete.
// CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer) ends with SBOX_ERROR_CHANNEL_ERROR.
// Something is wrong with the IPC channel, likely because of shutdown.
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, the for the proxy load `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer)` fails with `SBOX_ERROR_CHANNEL_ERROR`. We could add extra logging in `SharedMemIPCClient::DoCall` to be 100% sure of the reason for the failure.

Edit: The first version of this comment was wrong on some points.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs could be evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process initiates shutdown during or before the IPC exchange, therefore causing the proxy DLL load to fail**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I understand them (note that I may be wrong):

```c++
// The main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown.
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// 18 seconds later, the main process somehow processes the proxy DLL load request for mozavcodec.dll despite shutdown.
// Maybe the request predates shutdown? (But then, what happened during these 18 seconds?)
// It successfully creates the section (this log indicates success, it is the same as when things go right).
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// But on the sandboxed process side, the IPC call does not complete.
// CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer) ends with SBOX_ERROR_CHANNEL_ERROR.
// Something is wrong with the IPC channel, likely because of shutdown.
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, the for the proxy load `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer)` fails with `SBOX_ERROR_CHANNEL_ERROR`. We could add extra logging in `SharedMemIPCClient::DoCall` to be 100% sure of the reason for the failure.

Edit: The first version of this comment was wrong on some points.
These failures were expected as part of the ongoing investigation in bug 1851889 (see bug 1851889 comment 26). When we use prespawn CIG in a sandboxed process, this process will use proxy DLL loads for Mozilla DLLs. This means that instead of a normal DLL load, the sandboxed process will do some IPC with the parent process so that the parent process loads the DLL on behalf of the sandboxed process. This is because when prespawn CIG is active, the sandboxed process can only load Microsoft-signed DLLs on its own (loading other DLLs on its own fails with `ERROR_INVALID_IMAGE_HASH`).

I believe that these logs could be evidence that intermittent failures with  `ERROR_INVALID_IMAGE_HASH` occur when **a sandboxed process using prespawn CIG needs a proxy DLL load but the main process initiates shutdown during or before the IPC exchange, therefore causing the proxy DLL load to fail**: the sandboxed process will try to load the DLL on its own as a fallback, and this will fail with `ERROR_INVALID_IMAGE_HASH`. Therefore, in `LoadLibraryOrCrash`, we should just consider these failures as information that shutdown has started in the main process, and either ignore them or use them to speed up shutdown (or, we could stop using prespawn CIG).

The relevant parts of the logs are the following, with comments about how I understand them (note that I may be wrong):

```c++
// The main process has initiated shutdown and offers 20 seconds to the sandboxed utility process to shutdown.
03:39:21 [Parent 5040, IPC I/O Parent] WARNING: Process 5672 may be hanging at shutdown; will wait for up to 20000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_win.cc:129

// 18 seconds later, the main process somehow processes the proxy DLL load request for mozavcodec.dll despite shutdown.
// Maybe the request predates shutdown? (But then, what happened during these 18 seconds?)
// It successfully creates the section (this log indicates success, it is the same as when things go right).
03:39:39 Prespawn CIG: SignedDispatcher::CreateSection completed (pid=5672, path=\Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll, result=0x3, nt_status=0x0)

// But on the sandboxed process side, the IPC call does not complete.
// CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer) ends with SBOX_ERROR_CHANNEL_ERROR.
// Something is wrong with the IPC channel, likely because of shutdown.
03:39:39 Hit MOZ_CRASH(Prespawn CIG: STATUS_INVALID_IMAGE_HASH for \Device\HarddiskVolume7\task_170312591276099\build\application\firefox\mozavcodec.dll (break_reason: 9, faulty_value: 0xa)
03:39:39 ) at /builds/worker/checkouts/gecko/security/sandbox/chromium/sandbox/win/src/signed_interception.cc:193
```

`break_reason: 9, faulty_value: 0xa` means that we fail because in `TargetNtCreateSection`, the proxy load with `CrossCall(ipc, IpcTag::NTCREATESECTION, file_handle, &answer)` fails with `SBOX_ERROR_CHANNEL_ERROR`. We could add extra logging in `SharedMemIPCClient::DoCall` to be 100% sure of the reason for the failure.

Edit: The first version of this comment was wrong on some points.

Back to Bug 1871209 Comment 1