Bug 1850969 Comment 7 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

(In reply to Ray Kraesig [:rkraesig] from comment #4)
> `NtMapViewOfSection` itself probably never does that, but we're hooking it -- and bug 1733532 / [patch D182982](https://phabricator.services.mozilla.com/D182982), which was also added in 117 and uplifted to 115.0.2esr, added [a return path](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#574-577) to our hook which returns `ERROR_ACCESS_DENIED` without zeroing `*aBaseAddress`.

Good catch! This is definitely what is happening here, `patched_NtMapViewOfSection` intended to block the mapping, but `mView` was is still pointing the view that we just unmapped.

(In reply to Ray Kraesig [:rkraesig] from comment #5)
> (Addendum: in fairness, there are a number of existing code paths (_e.g._, [[1]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#417-420), [[2]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#541-542)) in our hook that do the same thing. That they seem never to have triggered this crash is possibly relevant?)

The patch you attached should prevent the crash but it is indeed concerning that we would be blocking our own intentional call to `NtMapViewOfSection` here. It would be nice to understand why that happens too. The new logic implemented in `patched_NtMapViewOfSection` with bug 1842088 may be too restrictive and need an adjustment.
(In reply to Ray Kraesig [:rkraesig] from comment #4)
> `NtMapViewOfSection` itself probably never does that, but we're hooking it -- and bug 1733532 / [patch D182982](https://phabricator.services.mozilla.com/D182982), which was also added in 117 and uplifted to 115.0.2esr, added [a return path](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#574-577) to our hook which returns `ERROR_ACCESS_DENIED` without zeroing `*aBaseAddress`.

Good catch! This is definitely what is happening here, `patched_NtMapViewOfSection` intended to block the mapping, but `mView` is still pointing the view that we just unmapped.

(In reply to Ray Kraesig [:rkraesig] from comment #5)
> (Addendum: in fairness, there are a number of existing code paths (_e.g._, [[1]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#417-420), [[2]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#541-542)) in our hook that do the same thing. That they seem never to have triggered this crash is possibly relevant?)

The patch you attached should prevent the crash but it is indeed concerning that we would be blocking our own intentional call to `NtMapViewOfSection` here. It would be nice to understand why that happens too. The new logic implemented in `patched_NtMapViewOfSection` with bug 1842088 may be too restrictive and need an adjustment.
(In reply to Ray Kraesig [:rkraesig] from comment #4)
> `NtMapViewOfSection` itself probably never does that, but we're hooking it -- and bug 1733532 / [patch D182982](https://phabricator.services.mozilla.com/D182982), which was also added in 117 and uplifted to 115.0.2esr, added [a return path](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#574-577) to our hook which returns `ERROR_ACCESS_DENIED` without zeroing `*aBaseAddress`.

Good catch! This is definitely what is happening here, `patched_NtMapViewOfSection` intended to block the mapping, but `mView` is still pointing the view that we just unmapped.

(In reply to Ray Kraesig [:rkraesig] from comment #5)
> (Addendum: in fairness, there are a number of existing code paths (_e.g._, [[1]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#417-420), [[2]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#541-542)) in our hook that do the same thing. That they seem never to have triggered this crash is possibly relevant?)

The patch you attached should prevent the crash but it is indeed concerning that we would be blocking our own intentional call to `NtMapViewOfSection` here and I'm not sure how we would react to that. It would be nice to understand why we would decide to block here. The new logic implemented in `patched_NtMapViewOfSection` with bug 1842088 may be too restrictive and need an adjustment.
(In reply to Ray Kraesig [:rkraesig] from comment #4)
> `NtMapViewOfSection` itself probably never does that, but we're hooking it -- and bug 1733532 / [patch D182982](https://phabricator.services.mozilla.com/D182982), which was also added in 117 and uplifted to 115.0.2esr, added [a return path](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#574-577) to our hook which returns `ERROR_ACCESS_DENIED` without zeroing `*aBaseAddress`.

Good catch! This is definitely what is happening here, `patched_NtMapViewOfSection` intended to block the mapping, but `mView` is still pointing the view that we just unmapped.

(In reply to Ray Kraesig [:rkraesig] from comment #5)
> (Addendum: in fairness, there are a number of existing code paths (_e.g._, [[1]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#417-420), [[2]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#541-542)) in our hook that do the same thing. That they seem never to have triggered this crash is possibly relevant?)

The patch you attached should prevent the crash but it is indeed concerning that we would be blocking our own intentional call to `NtMapViewOfSection` here and I'm not sure how we would react to that (would that lead to a launcher process failure?). It would be nice to understand why we would decide to block here. The new logic implemented in `patched_NtMapViewOfSection` with bug 1842088 may be too restrictive and need an adjustment.
(In reply to Ray Kraesig [:rkraesig] from comment #4)
> `NtMapViewOfSection` itself probably never does that, but we're hooking it -- and bug 1733532 / [patch D182982](https://phabricator.services.mozilla.com/D182982), which was also added in 117 and uplifted to 115.0.2esr, added [a return path](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#574-577) to our hook which returns `ERROR_ACCESS_DENIED` without zeroing `*aBaseAddress`.

Good catch! This is definitely what is happening here, `patched_NtMapViewOfSection` intended to block the mapping, but `mView` is still pointing the view that we just unmapped.

(In reply to Ray Kraesig [:rkraesig] from comment #5)
> (Addendum: in fairness, there are a number of existing code paths (_e.g._, [[1]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#417-420), [[2]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#541-542)) in our hook that do the same thing. That they seem never to have triggered this crash is possibly relevant?)

The patch you attached should prevent the crash but it is indeed concerning that we would be blocking our own intentional call to `NtMapViewOfSection` here and I'm not sure how we would react to that (would that lead to a launcher process failure?). It would be nice to understand why we would decide to block here. The new logic implemented in `patched_NtMapViewOfSection` with bug 1842088 may be too restrictive and need an adjustment.

Most likely, as you pointed out, it is the following new path that we take:

```
  PUBLIC_OBJECT_BASIC_INFORMATION obi;
  NTSTATUS ntStatus = ::NtQueryObject(aSection, ObjectBasicInformation, &obi,
                                      sizeof(obi), nullptr);
  if (!NT_SUCCESS(ntStatus)) {
    ::NtUnmapViewOfSection(aProcess, *aBaseAddress);
    return STATUS_ACCESS_DENIED;
  }
```

I wonder if we may be observing a race condition where, due to interaction with third-party products, we somehow end up having another thread close the section handle between the call to `NtMapViewOfSection` and the call to `NtQueryObject`? Could we sometimes have another thread reach `SharedSection::Reset` while we are in `SharedSection::EnsureWriteCopyViewOnce`?
(In reply to Ray Kraesig [:rkraesig] from comment #4)
> `NtMapViewOfSection` itself probably never does that, but we're hooking it -- and bug 1733532 / [patch D182982](https://phabricator.services.mozilla.com/D182982), which was also added in 117 and uplifted to 115.0.2esr, added [a return path](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#574-577) to our hook which returns `ERROR_ACCESS_DENIED` without zeroing `*aBaseAddress`.

Good catch! This is definitely what is happening here, `patched_NtMapViewOfSection` intended to block the mapping, but `mView` is still pointing the view that we just unmapped.

(In reply to Ray Kraesig [:rkraesig] from comment #5)
> (Addendum: in fairness, there are a number of existing code paths (_e.g._, [[1]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#417-420), [[2]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#541-542)) in our hook that do the same thing. That they seem never to have triggered this crash is possibly relevant?)

The patch you attached should prevent the crash but it is indeed concerning that we would be blocking our own intentional call to `NtMapViewOfSection` here and I'm not sure how we would react to that (would that lead to a launcher process failure?). It would be nice to understand why we would decide to block here. The new logic implemented in `patched_NtMapViewOfSection` with bug 1842088 may be too restrictive and need an adjustment.

Most likely, as you pointed out, it is the following new path that we take:

```c++
  PUBLIC_OBJECT_BASIC_INFORMATION obi;
  NTSTATUS ntStatus = ::NtQueryObject(aSection, ObjectBasicInformation, &obi,
                                      sizeof(obi), nullptr);
  if (!NT_SUCCESS(ntStatus)) {
    ::NtUnmapViewOfSection(aProcess, *aBaseAddress);
    return STATUS_ACCESS_DENIED;
  }
```

I wonder if we may be observing a race condition where, due to interaction with third-party products, we somehow end up having another thread close the section handle between the call to `NtMapViewOfSection` and the call to `NtQueryObject`? Could we sometimes have another thread reach `SharedSection::Reset` while we are in `SharedSection::EnsureWriteCopyViewOnce`?
(In reply to Ray Kraesig [:rkraesig] from comment #4)
> `NtMapViewOfSection` itself probably never does that, but we're hooking it -- and bug 1733532 / [patch D182982](https://phabricator.services.mozilla.com/D182982), which was also added in 117 and uplifted to 115.0.2esr, added [a return path](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#574-577) to our hook which returns `ERROR_ACCESS_DENIED` without zeroing `*aBaseAddress`.

Good catch! This is definitely what is happening here, `patched_NtMapViewOfSection` intended to block the mapping, but `mView` is still pointing the view that we just unmapped.

(In reply to Ray Kraesig [:rkraesig] from comment #5)
> (Addendum: in fairness, there are a number of existing code paths (_e.g._, [[1]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#417-420), [[2]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#541-542)) in our hook that do the same thing. That they seem never to have triggered this crash is possibly relevant?)

The patch you attached should prevent the crash but it is indeed concerning that we would be blocking our own intentional call to `NtMapViewOfSection` here and I'm not sure how we would react to that (would that lead to a launcher process failure?). It would be nice to understand why we would decide to block here. The new logic implemented in `patched_NtMapViewOfSection` with bug 1842088 may be too restrictive and need an adjustment.

Most likely, as you pointed out, we take the following new failure path:

```c++
  PUBLIC_OBJECT_BASIC_INFORMATION obi;
  NTSTATUS ntStatus = ::NtQueryObject(aSection, ObjectBasicInformation, &obi,
                                      sizeof(obi), nullptr);
  if (!NT_SUCCESS(ntStatus)) {
    ::NtUnmapViewOfSection(aProcess, *aBaseAddress);
    return STATUS_ACCESS_DENIED;
  }
```

I wonder if we may be observing a race condition where, due to interaction with third-party products, we somehow end up having another thread close the section handle between the call to `NtMapViewOfSection` and the call to `NtQueryObject`? Could we sometimes have another thread reach `SharedSection::Reset` while we are in `SharedSection::EnsureWriteCopyViewOnce`?
(In reply to Ray Kraesig [:rkraesig] from comment #4)
> `NtMapViewOfSection` itself probably never does that, but we're hooking it -- and bug 1733532 / [patch D182982](https://phabricator.services.mozilla.com/D182982), which was also added in 117 and uplifted to 115.0.2esr, added [a return path](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#574-577) to our hook which returns `ERROR_ACCESS_DENIED` without zeroing `*aBaseAddress`.

Good catch! This is definitely what is happening here, `patched_NtMapViewOfSection` intended to block the mapping, but `mView` is still pointing the view that we just unmapped.

(In reply to Ray Kraesig [:rkraesig] from comment #5)
> (Addendum: in fairness, there are a number of existing code paths (_e.g._, [[1]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#417-420), [[2]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#541-542)) in our hook that do the same thing. That they seem never to have triggered this crash is possibly relevant?)

The patch you attached should prevent the crash but it is indeed concerning that we would be blocking our own intentional call to `NtMapViewOfSection` here and I'm not sure how we would react to that. It would be nice to understand why we would decide to block here. The new logic implemented in `patched_NtMapViewOfSection` with bug 1842088 may be too restrictive and need an adjustment.

Most likely, as you pointed out, we take the following new failure path:

```c++
  PUBLIC_OBJECT_BASIC_INFORMATION obi;
  NTSTATUS ntStatus = ::NtQueryObject(aSection, ObjectBasicInformation, &obi,
                                      sizeof(obi), nullptr);
  if (!NT_SUCCESS(ntStatus)) {
    ::NtUnmapViewOfSection(aProcess, *aBaseAddress);
    return STATUS_ACCESS_DENIED;
  }
```

I wonder if we may be observing a race condition where, due to interaction with third-party products, we somehow end up having another thread close the section handle between the call to `NtMapViewOfSection` and the call to `NtQueryObject`? Could we sometimes have another thread reach `SharedSection::Reset` while we are in `SharedSection::EnsureWriteCopyViewOnce`?
(In reply to Ray Kraesig [:rkraesig] from comment #4)
> `NtMapViewOfSection` itself probably never does that, but we're hooking it -- and bug 1733532 / [patch D182982](https://phabricator.services.mozilla.com/D182982), which was also added in 117 and uplifted to 115.0.2esr, added [a return path](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#574-577) to our hook which returns `ERROR_ACCESS_DENIED` without zeroing `*aBaseAddress`.

Good catch! This is definitely what is happening here, `patched_NtMapViewOfSection` intended to block the mapping, but `mView` is still pointing the view that we just unmapped.

(In reply to Ray Kraesig [:rkraesig] from comment #5)
> (Addendum: in fairness, there are a number of existing code paths (_e.g._, [[1]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#417-420), [[2]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#541-542)) in our hook that do the same thing. That they seem never to have triggered this crash is possibly relevant?)

The patch you attached should prevent the crash but it is indeed concerning that we would be blocking our own intentional call to `NtMapViewOfSection` here and I'm not sure how we would react to that. It would be nice to understand why we would decide to block here. The new logic implemented in `patched_NtMapViewOfSection` with bug 1842088 may be too restrictive and need an adjustment.

Most likely, as you pointed out, we take the following new failure path:

```c++
  PUBLIC_OBJECT_BASIC_INFORMATION obi;
  NTSTATUS ntStatus = ::NtQueryObject(aSection, ObjectBasicInformation, &obi,
                                      sizeof(obi), nullptr);
  if (!NT_SUCCESS(ntStatus)) {
    ::NtUnmapViewOfSection(aProcess, *aBaseAddress);
    return STATUS_ACCESS_DENIED;
  }
```

I wonder if we may be observing a race condition where, due to interaction with third-party products, we somehow end up having another thread close the section handle between the call to `NtMapViewOfSection` and the call to `NtQueryObject`? Could we sometimes have another thread reach `SharedSection::Reset` while we are in `SharedSection::EnsureWriteCopyViewOnce`?

Edit: In most crashes the main thread is in `mozilla::CreateAndStorePreXULSkeletonUI`, [which is called right after `mozilla::freestanding::gSharedSection.ConvertToReadOnly();`](https://searchfox.org/mozilla-central/source/browser/app/nsBrowserApp.cpp#429-431) and takes some time to complete, so I would suspect the following. On most computers, we only have one thread when we reach the code that converts the shared section to read-only, so no risk of a race condition. But third-party products can start a thread in our process before we reach that point, in which case they make the race condition possible.
(In reply to Ray Kraesig [:rkraesig] from comment #4)
> `NtMapViewOfSection` itself probably never does that, but we're hooking it -- and bug 1733532 / [patch D182982](https://phabricator.services.mozilla.com/D182982), which was also added in 117 and uplifted to 115.0.2esr, added [a return path](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#574-577) to our hook which returns `ERROR_ACCESS_DENIED` without zeroing `*aBaseAddress`.

Good catch! This is definitely what is happening here, `patched_NtMapViewOfSection` intended to block the mapping, but `mView` is still pointing the view that we just unmapped.

(In reply to Ray Kraesig [:rkraesig] from comment #5)
> (Addendum: in fairness, there are a number of existing code paths (_e.g._, [[1]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#417-420), [[2]](https://searchfox.org/mozilla-central/source/browser/app/winlauncher/freestanding/DllBlocklist.cpp#541-542)) in our hook that do the same thing. That they seem never to have triggered this crash is possibly relevant?)

The patch you attached should prevent the crash but it is indeed concerning that we would be blocking our own intentional call to `NtMapViewOfSection` here and I'm not sure how we would react to that. It would be nice to understand why we would decide to block here. The new logic implemented in `patched_NtMapViewOfSection` with bug 1842088 may be too restrictive and need an adjustment.

Most likely, as you pointed out, we take the following new failure path:

```c++
  PUBLIC_OBJECT_BASIC_INFORMATION obi;
  NTSTATUS ntStatus = ::NtQueryObject(aSection, ObjectBasicInformation, &obi,
                                      sizeof(obi), nullptr);
  if (!NT_SUCCESS(ntStatus)) {
    ::NtUnmapViewOfSection(aProcess, *aBaseAddress);
    return STATUS_ACCESS_DENIED;
  }
```

I wonder if we may be observing a race condition where, due to interaction with third-party products, we somehow end up having another thread close the section handle between the call to `NtMapViewOfSection` and the call to `NtQueryObject`? Could we sometimes have another thread reach `SharedSection::Reset` while we are in `SharedSection::EnsureWriteCopyViewOnce`?

Edit: In most crashes the main thread is in `mozilla::CreateAndStorePreXULSkeletonUI`, [which is called right after `mozilla::freestanding::gSharedSection.ConvertToReadOnly();`](https://searchfox.org/mozilla-central/source/browser/app/nsBrowserApp.cpp#429-431) and takes some time to complete, so I would suspect the following. On most computers, we only have one thread when we reach the code that converts the shared section to read-only, so no risk of a race condition. But third-party products can start a thread in our process before we reach that point, in which case they make the race condition possible.

I will try to see if I can artificially force the race condition to occur with artificial delays, and see if I get the same crash by doing that.

Back to Bug 1850969 Comment 7