Closed Bug 1783223 Opened 2 years ago Closed 1 year ago

Enable Arbitrary Code Guard in RDD on Nightly

Categories: Core :: Security: Process Sandboxing (enhancement, P1)
Status: RESOLVED FIXED, 108 Branch
Tracking status: firefox108 --- fixed
People: Reporter: jrmuizel, Assigned: jrmuizel
References: Blocks 1 open bug
Attachments: 5 files, 1 obsolete file

This was previously disabled in bug 1673194 because of startup crashes.
However, it wasn't clear under which circumstances these crashes
happened. I'd like to investigate the cause and determine whether we can
enable ACG under some circumstances.

Blocks: 1381050
Severity: -- → S4
Type: defect → enhancement
Priority: -- → P1

I can reproduce the 32-bit x86 crash locally. We're crashing in a function that is called by a function that calls GetProcessMitigationPolicy(GetCurrentProcess(), ProcessDynamicCodePolicy, ...), which is interesting.

This function also appears to try to opt the current thread out of ACG with SetThreadInformation, if that is allowed by AllowThreadOptOut.

The 64-bit version of msmpeg2vdec.dll doesn't seem to contain similar code that calls GetProcessMitigationPolicy.

Pushed by jmuizelaar@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ef7acc434052
Enable Arbitratry Code Guard in RDD on Nightly. r=bobowen

It looks like the failures only happen on Windows ASAN builds. Is there a weird interaction between ASAN and ACG?

Flags: needinfo?(jmuizelaar)

There's an r+ patch which didn't land and no activity in this bug for 2 weeks.
:jrmuizel, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please see the auto_nag documentation.

Flags: needinfo?(jmuizelaar)
Flags: needinfo?(bobowencode)

The patch did land, but was backed out.
This was not totally unexpected, because we were trying to get a better idea about what issues this mitigation caused.

Flags: needinfo?(bobowencode)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)

It looks like the failures only happen on Windows ASAN builds. Is there a weird interaction between ASAN and ACG?

Hello, that is very likely. I'm very confident that ACG must be disabled for ASAN to work properly. Below is an explanation of why, with pointers on where I would suggest digging to confirm exactly why this could produce timeouts.

ASAN relies on an open-source runtime library called clang_rt.asan*.dll on Windows. This library contains code that prepares the environment in which Firefox executes when built with ASAN. As part of this initialization, it installs interceptors on various functions. Several strategies are tried for installing interceptors, but they should all fail with ACG enabled. Moreover, some of them could take a very long time to fail. Consider for example this function, which iterates over regions of memory looking for a suitable location where allocating RWX memory succeeds, something that will always fail under ACG. This may explain the timeouts, although I have not confirmed in practice that this is the exact cause.
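To illustrate why a search like this can be slow rather than failing fast, here is a toy model, not the clang_rt code. It assumes a hypothetical `tryAllocRwx` callback standing in for an RWX `VirtualAlloc` attempt, and a scan that steps through candidate addresses at the usual 64 KiB allocation granularity. Under ACG every attempt fails, so the loop visits every candidate in the window before giving up; in a real process each of those attempts would be a system call.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

// Windows allocation granularity is typically 64 KiB.
constexpr std::uint64_t kGranularity = 64 * 1024;

struct SearchResult {
  std::optional<std::uint64_t> address;  // where RWX memory was obtained, if anywhere
  std::uint64_t attempts = 0;            // number of candidates tried
};

// Hypothetical "find RWX memory near target" scan: tries every
// granularity-aligned address in [target - radius, target + radius).
// tryAllocRwx stands in for VirtualAlloc(..., PAGE_EXECUTE_READWRITE).
SearchResult FindNearbyRwx(std::uint64_t target, std::uint64_t radius,
                           const std::function<bool(std::uint64_t)>& tryAllocRwx) {
  assert(radius > 0);
  SearchResult result;
  std::uint64_t start = target > radius ? target - radius : 0;
  start -= start % kGranularity;  // align down to the allocation granularity
  for (std::uint64_t addr = start; addr < target + radius; addr += kGranularity) {
    ++result.attempts;
    if (tryAllocRwx(addr)) {  // under ACG this never succeeds
      result.address = addr;
      break;
    }
  }
  return result;
}
```

With a ±2 GiB window, a search where every attempt is refused makes 4 GiB / 64 KiB = 65,536 attempts before reporting failure, whereas one that succeeds immediately stops after a single attempt.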

After discussing with [:bobowen], we think it should be technically possible to make ASAN work with ACG if ACG were enabled dynamically by the child process itself (that is, after ASAN initialization), rather than as part of the startup information the parent sets for that process. Although we want ASAN builds to behave as closely as possible to release builds, it's unclear what we would really gain by doing that compared to simply disabling ACG for ASAN builds.

Pushed by jmuizelaar@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a389830fb63f
Enable Arbitratry Code Guard in RDD on Nightly. r=bobowen
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 106 Branch

Here are some notes for the future.

[:jrmuizel] pointed out that some other processes already use ACG in ASAN builds. They use SetDelayedProcessMitigations to achieve this, meaning that the mitigation is applied after initialization (in particular, after ASAN initialization). Since the code to do this already exists, we could consider using SetDelayedProcessMitigations to enable ACG in ASAN builds here too, mostly for the sake of consistent behavior between the two kinds of builds rather than for security reasons (so this is much lower priority than doing it for release builds).

Regarding release builds, however, if everything works with the non-delayed approach, I'd recommend keeping it non-delayed for them, since non-delayed ACG provides stronger security. The problem I see with applying the mitigation as delayed is that, I think, legitimate code running before the mitigation is applied is allowed to allocate RWX memory that may survive for the rest of the process's lifetime. I don't think delayed ACG would catch this if it happened.
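To make this trade-off concrete, here is a toy model, not Windows code: `ModelProcess`, `Allocate`, and `EnableAcg` are hypothetical names. It assumes, as suspected above, that turning the mitigation on later does not revisit mappings that already exist, so an RWX region allocated before a delayed ACG remains RWX. Only allocations are modeled; real ACG also restricts protection changes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Protection { ReadWrite, ReadExecute, ReadWriteExecute };

struct Region {
  std::uint64_t base;
  Protection protection;
};

// Hypothetical model of a process address space, used only to show ordering.
class ModelProcess {
 public:
  // Models VirtualAlloc: once ACG is on, writable+executable requests are refused
  // (on Windows the failure would be ERROR_DYNAMIC_CODE_BLOCKED).
  bool Allocate(std::uint64_t base, Protection protection) {
    if (acgEnabled_ && protection == Protection::ReadWriteExecute) return false;
    regions_.push_back({base, protection});
    return true;
  }

  // Models applying the mitigation (at startup or delayed). Assumption: it
  // does not touch regions that already exist.
  void EnableAcg() { acgEnabled_ = true; }

  // Counts regions that are simultaneously writable and executable.
  int CountRwxRegions() const {
    int count = 0;
    for (const Region& r : regions_) {
      if (r.protection == Protection::ReadWriteExecute) ++count;
    }
    return count;
  }

 private:
  bool acgEnabled_ = false;
  std::vector<Region> regions_;
};
```

With a delayed mitigation, an early RWX allocation succeeds and is still present after ACG is enabled; with the mitigation in place from the start, the same request is refused and no RWX region ever exists.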

After some discussion: although it's likely only a few lines of code, this would only catch cases where an ASAN Nightly user performs some action that provokes an ACG violation which no user on a regular Nightly triggers. That seems marginal enough that we probably don't need to bother.

Hello, this enhancement may have caused bug 1790713. Could you please have a look?
Thank you.

Regressions: 1790713
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 106 Branch → ---
Summary: Enable Arbitratry Code Guard in RDD on Nightly → Enable Arbitrary Code Guard in RDD on Nightly

(In reply to Bernard Alleysson from comment #17)

Hello, this enhancement may have caused bug 1790713. Could you please have a look?
Thank you.

I will take a look at this.

The code in DllBlocklist_Initialize relies on features very similar to the ones I described for ASAN, so it cannot work under non-delayed ACG. I suspect that, rather than failing gracefully when ACG is active, it currently doesn't check for it, tries to do its job anyway, and leaves the process in an inconsistent state that is prone to crashing. I will confirm this theory next week.

If that is indeed the problem, we should consider delaying ACG here, and we could additionally make sure that DllBlocklist_Initialize doesn't try to do its job if it detects non-delayed ACG. Delaying ACG would fix this because ApplyProcessMitigationsToCurrentProcess runs after DllBlocklist_Initialize, so ACG wouldn't be active yet. That is probably why other processes that already use delayed ACG don't exhibit the problem we see here.
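Both ideas (a guard in the initializer and delaying the mitigation) can be sketched as a small model. This is illustrative only: `DynamicCodePolicy` mirrors field names of `PROCESS_MITIGATION_DYNAMIC_CODE_POLICY`, while `ShouldInstallHooks`, `AcgTiming`, and `ModelStartup` are hypothetical names. On Windows, the policy would be read with `GetProcessMitigationPolicy(GetCurrentProcess(), ProcessDynamicCodePolicy, &policy, sizeof(policy))`.

```cpp
#include <cassert>

// Field names mirror PROCESS_MITIGATION_DYNAMIC_CODE_POLICY; here it is a plain model.
struct DynamicCodePolicy {
  bool prohibitDynamicCode = false;
  bool auditProhibitDynamicCode = false;
};

// Hypothetical check a hook-installing initializer (such as the blocklist)
// could perform: detours need executable trampolines, which ACG forbids.
// Audit mode alone only reports violations, so it does not prevent hooking.
bool ShouldInstallHooks(const DynamicCodePolicy& policy) {
  return !policy.prohibitDynamicCode;
}

enum class AcgTiming { AtProcessCreation, Delayed };

struct StartupOutcome {
  bool hooksInstalled;
  bool acgActiveAfterStartup;
};

// Models the ordering described above: blocklist initialization runs first,
// then delayed mitigations are applied (ApplyProcessMitigationsToCurrentProcess).
StartupOutcome ModelStartup(AcgTiming timing) {
  DynamicCodePolicy policy;
  if (timing == AcgTiming::AtProcessCreation) {
    policy.prohibitDynamicCode = true;  // set by the parent at process creation
  }
  const bool hooksInstalled = ShouldInstallHooks(policy);  // blocklist step
  if (timing == AcgTiming::Delayed) {
    policy.prohibitDynamicCode = true;  // delayed mitigation applied afterwards
  }
  return {hooksInstalled, policy.prohibitDynamicCode};
}
```

In both timings ACG ends up active, but only the delayed variant lets the hooks be installed first; with ACG at process creation, the guard makes the initializer skip hooking instead of failing halfway.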

Regarding 64-bit builds: the current code in DllBlocklist_Initialize doesn't explicitly check whether the mitigation is set and will try to do its job; however, it seems to fail gracefully after the first failing call to VirtualAlloc (see full stack trace below). As a result, I can load videos without problems on my machine and cannot reproduce the issue. The blocklist doesn't appear to be the root cause here. After discussing with [:bobowen] how to address this, I will offer a custom build to the Nightly users who reported the problem, which will hopefully help us go further.

For my experiments, I tried to catch failing calls to VirtualAlloc and VirtualProtect, as well as attempts to read or set the process policy ProcessDynamicCodePolicy or the thread information ThreadDynamicCodePolicy. In WinDbg, this translates to the following breakpoints, given the offsets of the ret instructions in my specific version of KERNELBASE:

```
bp KERNELBASE!VirtualAlloc+0x62 "j (rax=0) ''; 'gc'"
bp KERNELBASE!VirtualProtect+0x56 "j (rax=0) ''; 'gc'"
bp KERNELBASE!GetProcessMitigationPolicy "j (rdx=2) ''; 'gc'"
bp KERNELBASE!SetProcessMitigationPolicy "j (rdx=2) ''; 'gc'"
bp KERNELBASE!GetThreadInformation "j (rdx=2) ''; 'gc'"
bp KERNELBASE!SetThreadInformation "j (rdx=2) ''; 'gc'"
```

This resulted in catching the following failing call to VirtualAlloc:

```
00 000000fe`22ffed88 00007ffb`1e39682a     KERNELBASE!VirtualAlloc+0x62
01 000000fe`22ffed90 00007ffb`1e3966ac     mozglue!mozilla::interceptor::MMPolicyInProcess::MaybeCommitNextPage+0x6a [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/MMPolicies.h @ 594]
02 000000fe`22ffee10 00007ffb`1e3920e6     mozglue!mozilla::interceptor::VMSharingPolicyUnique<mozilla::interceptor::MMPolicyInProcess>::GetNextTrampoline+0x3c [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/VMSharingPolicies.h @ 157]
03 (Inline Function) --------`--------     mozglue!mozilla::interceptor::TrampolinePool<mozilla::interceptor::VMSharingPolicyUnique<mozilla::interceptor::MMPolicyInProcess>,std::nullptr_t>::GetNextTrampoline+0xd [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/VMSharingPolicies.h @ 80]
04 (Inline Function) --------`--------     mozglue!mozilla::interceptor::VMSharingPolicyShared::GetNextTrampoline+0x2b [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/VMSharingPolicies.h @ 263]
05 (Inline Function) --------`--------     mozglue!mozilla::interceptor::TrampolinePool<mozilla::interceptor::VMSharingPolicyShared,mozilla::interceptor::TrampolinePool<mozilla::interceptor::VMSharingPolicyUnique<mozilla::interceptor::MMPolicyInProcess>,std::nullptr_t> >::GetNextTrampoline+0x2b [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/VMSharingPolicies.h @ 48]
06 000000fe`22ffeea0 00007ffb`1e391f4f     mozglue!mozilla::interceptor::WindowsDllDetourPatcher<mozilla::interceptor::VMSharingPolicyShared>::AddHook+0x126 [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/PatcherDetour.h @ 451]
07 000000fe`22ffefd0 00007ffb`1e391ab9     mozglue!mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>::AddDetour+0x3ff [/builds/worker/workspace/obj-build/dist/include/nsWindowsDllInterceptor.h @ 522]
08 000000fe`22fff170 00007ffb`1e3b5ea0     mozglue!mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>::AddDetour+0x159 [/builds/worker/workspace/obj-build/dist/include/nsWindowsDllInterceptor.h @ 476]
09 (Inline Function) --------`--------     mozglue!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::ApplyDetour+0x5 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/nsWindowsDllInterceptor.h @ 186]
0a 000000fe`22fff240 00007ffb`8e9d643a     mozglue!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::InitOnceCallback+0x30 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/nsWindowsDllInterceptor.h @ 197]
0b 000000fe`22fff280 00007ffb`8bfe0b11     ntdll!RtlRunOnceExecuteOnce+0x9a
0c 000000fe`22fff2c0 00007ffb`1e3b4ab3     KERNELBASE!InitOnceExecuteOnce+0x21
0d (Inline Function) --------`--------     mozglue!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::SetDetour+0x54 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/nsWindowsDllInterceptor.h @ 141]
0e 000000fe`22fff300 00007ff6`81291b9e     mozglue!DllBlocklist_Initialize+0x213 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/WindowsDllBlocklist.cpp @ 622]
0f 000000fe`22fff490 00007ff6`8129184f     firefox!NS_internal_main+0x27e [/builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp @ 327]
10 000000fe`22fff680 00007ff6`812914c0     firefox!wmain+0x34f [/builds/worker/checkouts/gecko/toolkit/xre/nsWindowsWMain.cpp @ 167]
11 000000fe`22fff980 00007ff6`812913d7     firefox!main+0x50 [/builds/worker/checkouts/gecko/toolkit/xre/nsWindowsWMain.cpp @ 39]
12 000000fe`22fff9e0 00007ff6`81291436     firefox!WinMainCRTStartup+0x297
13 000000fe`22fffaa0 00007ffb`8cc254e0     firefox!mainCRTStartup+0x16
14 000000fe`22fffad0 00007ffb`8e9c485b     KERNEL32!BaseThreadInitThunk+0x10
15 000000fe`22fffb00 00000000`00000000     ntdll!RtlUserThreadStart+0x2b
```

BaseThreadInitThunk hook failed

And the following attempt at checking the process policy:

```
00 0000005a`687fd4b8 00007ffa`b3d75da8     KERNELBASE!GetProcessMitigationPolicy
01 0000005a`687fd4c0 00007ffa`b3d751aa     d3d11!D3D11CoreCreateDevice+0xff8
02 0000005a`687fd5c0 00007ffa`b3d73c42     d3d11!D3D11CoreCreateDevice+0x3fa
03 0000005a`687fd8d0 00007ffa`b3db24f7     d3d11+0x13c42
04 0000005a`687fdb60 00007ffa`b3db23ec     d3d11!D3D11CreateDeviceAndSwapChain+0xf7
05 0000005a`687fdc20 00007ffa`b3db235e     d3d11!D3D11CreateDevice+0x16c
06 0000005a`687fdc90 00007ff9`f931ba11     d3d11!D3D11CreateDevice+0xde
07 0000005a`687fdd40 00007ff9`f931c34d     xul!mozilla::gfx::DeviceManagerDx::CreateDevice+0xc1 [C:\mozilla-source\mozilla-unified\gfx\thebes\DeviceManagerDx.cpp @ 753]
08 0000005a`687fde50 00007ff9`f931c229     xul!mozilla::gfx::DeviceManagerDx::CreateContentDevice+0xad [C:\mozilla-source\mozilla-unified\gfx\thebes\DeviceManagerDx.cpp @ 862]
09 0000005a`687fdfd0 00007ff9`f931c1ee     xul!mozilla::gfx::DeviceManagerDx::CreateContentDevicesLocked+0x29 [C:\mozilla-source\mozilla-unified\gfx\thebes\DeviceManagerDx.cpp @ 509]
0a 0000005a`687fe020 00007ff9`fa6c1577     xul!mozilla::gfx::DeviceManagerDx::CreateContentDevices+0x1e [C:\mozilla-source\mozilla-unified\gfx\thebes\DeviceManagerDx.cpp @ 495]
0b 0000005a`687fe060 00007ff9`fa6e4210     xul!mozilla::RDDParent::RecvInitVideoBridge+0x67 [C:\mozilla-source\mozilla-unified\dom\media\ipc\RDDParent.cpp @ 219]
0c 0000005a`687fe0b0 00007ff9`f8e813e0     xul!mozilla::PRDDParent::OnMessageReceived+0xb70 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-mingw32\ipc\ipdl\PRDDParent.cpp @ 596]
0d 0000005a`687fe380 00007ff9`f8e80855     xul!mozilla::ipc::MessageChannel::DispatchAsyncMessage+0x70 [C:\mozilla-source\mozilla-unified\ipc\glue\MessageChannel.cpp @ 1756]
0e 0000005a`687fe3e0 00007ff9`f8e80c24     xul!mozilla::ipc::MessageChannel::DispatchMessage+0x155 [C:\mozilla-source\mozilla-unified\ipc\glue\MessageChannel.cpp @ 1685]
0f 0000005a`687fe4c0 00007ff9`f8e80eb1     xul!mozilla::ipc::MessageChannel::RunMessage+0x104 [C:\mozilla-source\mozilla-unified\ipc\glue\MessageChannel.cpp @ 1482]
10 0000005a`687fe510 00007ff9`f8855e87     xul!mozilla::ipc::MessageChannel::MessageTask::Run+0x71 [C:\mozilla-source\mozilla-unified\ipc\glue\MessageChannel.cpp @ 1588]
11 0000005a`687fe560 00007ff9`f883ed0d     xul!mozilla::RunnableTask::Run+0xb7 [C:\mozilla-source\mozilla-unified\xpcom\threads\TaskController.cpp @ 539]
12 0000005a`687fe9f0 00007ff9`f883dd48     xul!mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal+0x7dd [C:\mozilla-source\mozilla-unified\xpcom\threads\TaskController.cpp @ 851]
13 0000005a`687feca0 00007ff9`f883df89     xul!mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal+0x28 [C:\mozilla-source\mozilla-unified\xpcom\threads\TaskController.cpp @ 683]
14 0000005a`687fed30 00007ff9`f88584d2     xul!mozilla::TaskController::ProcessPendingMTTask+0x39 [C:\mozilla-source\mozilla-unified\xpcom\threads\TaskController.cpp @ 461]
15 (Inline Function) --------`--------     xul!mozilla::TaskController::InitializeInternal::<lambda_1>::operator()+0xe [C:\mozilla-source\mozilla-unified\xpcom\threads\TaskController.cpp @ 187]
16 0000005a`687fedb0 00007ff9`f884b5ae     xul!mozilla::detail::RunnableFunction<`lambda at C:/mozilla-source/mozilla-unified/xpcom/threads/TaskController.cpp:187:7'>::Run+0x12 [C:\mozilla-source\mozilla-unified\xpcom\threads\nsThreadUtils.h @ 532]
17 0000005a`687fede0 00007ff9`f884f688     xul!nsThread::ProcessNextEvent+0x63e [C:\mozilla-source\mozilla-unified\xpcom\threads\nsThread.cpp @ 1209]
18 0000005a`687fefa0 00007ff9`f8e83a28     xul!NS_ProcessNextEvent+0x68 [C:\mozilla-source\mozilla-unified\xpcom\threads\nsThreadUtils.cpp @ 465]
19 0000005a`687feff0 00007ff9`f8e446f0     xul!mozilla::ipc::MessagePump::Run+0xa8 [C:\mozilla-source\mozilla-unified\ipc\glue\MessagePump.cpp @ 86]
1a (Inline Function) --------`--------     xul!MessageLoop::RunInternal+0x16 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 381]
1b 0000005a`687ff050 00007ff9`f8e44668     xul!MessageLoop::RunHandler+0x50 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 375]
1c 0000005a`687ff0a0 00007ff9`fb00ccb8     xul!MessageLoop::Run+0x58 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 357]
1d 0000005a`687ff0f0 00007ff9`fb09c9ec     xul!nsBaseAppShell::Run+0x28 [C:\mozilla-source\mozilla-unified\widget\nsBaseAppShell.cpp @ 152]
1e 0000005a`687ff130 00007ff9`fc5fde1c     xul!nsAppShell::Run+0x1cc [C:\mozilla-source\mozilla-unified\widget\windows\nsAppShell.cpp @ 614]
1f 0000005a`687ff2a0 00007ff9`f8e446f0     xul!XRE_RunAppShell+0x4c [C:\mozilla-source\mozilla-unified\toolkit\xre\nsEmbedFunctions.cpp @ 880]
20 (Inline Function) --------`--------     xul!MessageLoop::RunInternal+0x16 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 381]
21 0000005a`687ff2e0 00007ff9`f8e44668     xul!MessageLoop::RunHandler+0x50 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 375]
22 0000005a`687ff330 00007ff9`fc5fdb5c     xul!MessageLoop::Run+0x58 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 357]
23 0000005a`687ff380 00007ff7`0691189b     xul!XRE_InitChildProcess+0x8ec [C:\mozilla-source\mozilla-unified\toolkit\xre\nsEmbedFunctions.cpp @ 743]
24 (Inline Function) --------`--------     firefox!content_process_main+0xa3 [C:\mozilla-source\mozilla-unified\ipc\contentproc\plugin-container.cpp @ 57]
25 0000005a`687ff620 00007ff7`06911340     firefox!NS_internal_main+0x4db [C:\mozilla-source\mozilla-unified\browser\app\nsBrowserApp.cpp @ 359]
26 0000005a`687ff7f0 00007ff7`069656d8     firefox!wmain+0x340 [C:\mozilla-source\mozilla-unified\toolkit\xre\nsWindowsWMain.cpp @ 167]
27 (Inline Function) --------`--------     firefox!invoke_main+0x22 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90]
28 0000005a`687ffaf0 00007ffa`bf9f54e0     firefox!__scrt_common_main_seh+0x10c [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
29 0000005a`687ffb30 00007ffa`c0a6485b     KERNEL32!BaseThreadInitThunk+0x10
2a 0000005a`687ffb60 00000000`00000000     ntdll!RtlUserThreadStart+0x2b
```
Attached file translation-with-assembly (obsolete)
Regarding 32-bit builds: I did a similar experiment with the isolated example from bug 1783223 comment 6 (not Firefox). To summarize, I agree that, given the current behavior of the 32-bit version of `msmpeg2vdec.dll`, the changes [:jrmuizel] introduced to allow threads to opt out are the best compromise we can make for 32-bit builds.

Here are more details, obtained with the following breakpoints:

```
bp KERNELBASE!VirtualProtect+0x39 "j (eax=0) ''; 'gc'"
bp KERNELBASE!VirtualAlloc+0x4c "j (eax=0) ''; 'gc'"
bp KERNELBASE!SetThreadInformation "j (poi(esp+0x8) = 2) ''; 'gc'"
bp KERNELBASE!GetProcessMitigationPolicy "j (poi(esp+0x8) = 2) ''; 'gc'"
bp KERNELBASE!SetProcessMitigationPolicy "j (poi(esp+0x8) = 2) ''; 'gc'"
```
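The conditions differ from the 64-bit ones above because of where the second argument lives when a `bp` on a function's first instruction hits: in `rdx` on x64, but on the stack at `esp+8` on x86, just above the return address and the first argument. The helper below, written only for this note (the function names are hypothetical), computes these locations for integer and pointer arguments under the Microsoft x64 convention and under the stack-based x86 conventions (`__stdcall`, used by these Win32 APIs, and `__cdecl`). It only applies at function entry; the `ret`-offset breakpoints instead inspect the return value in `eax`/`rax`.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formats a byte offset the way WinDbg expressions usually show it.
std::string Hex(unsigned value) {
  std::ostringstream out;
  out << "0x" << std::hex << value;
  return out.str();
}

// Location of the 0-based integer/pointer argument `index` at function entry
// under the Microsoft x64 convention: the first four are in registers; later
// ones sit above the return address ([rsp]) and the 32-byte home space.
std::string X64ArgLocation(unsigned index) {
  static const char* const kRegisters[] = {"rcx", "rdx", "r8", "r9"};
  if (index < 4) return kRegisters[index];
  return "poi(rsp+" + Hex(8 * (index + 1)) + ")";
}

// Same for stack-based 32-bit conventions (__stdcall, __cdecl): [esp] holds
// the return address and each argument occupies 4 bytes above it.
std::string X86ArgLocation(unsigned index) {
  return "poi(esp+" + Hex(4 * (index + 1)) + ")";
}
```

For example, `X86ArgLocation(1)` yields `poi(esp+0x8)`, the expression used in the 32-bit breakpoints above, while `X64ArgLocation(1)` yields `rdx`.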

This led me to a code path in `msmpeg2vdec` that looks as follows (the translation to C++ is my own):

```
  DWORD dwThreadDynamicCodePolicy = 0;
  BOOL bResult = GetThreadInformation(GetCurrentThread(), ThreadDynamicCodePolicy, &dwThreadDynamicCodePolicy,
    sizeof(dwThreadDynamicCodePolicy));
  if(!bResult) {
    goto opt_out;
  }
  if(dwThreadDynamicCodePolicy == THREAD_DYNAMIC_CODE_ALLOW) {
    goto opt_out_done;
  }
opt_out:
  dwThreadDynamicCodePolicy = THREAD_DYNAMIC_CODE_ALLOW;
  SetThreadInformation(GetCurrentThread(), ThreadDynamicCodePolicy, &dwThreadDynamicCodePolicy, sizeof(dwThreadDynamicCodePolicy));
  someObject->someField = 1;
opt_out_done:
  return someObject;
```

Here is the same with the original assembly:

```
  DWORD dwThreadDynamicCodePolicy = 0;
        656f5985 8364240c00      and     dword ptr [esp+0Ch],0

  BOOL bResult = GetThreadInformation(GetCurrentThread(), ThreadDynamicCodePolicy, &dwThreadDynamicCodePolicy,
    sizeof(dwThreadDynamicCodePolicy));
        656f598a 8d44240c        lea     eax,[esp+0Ch]
        656f598e 6a04            push    4
        656f5990 50              push    eax
        656f5991 6a02            push    2
        656f5993 ff1594918565    call    dword ptr [msmpeg2vdec!DllUnregisterServer+0x14dd34 (65859194)] // points to GetCurrentThread
        656f5999 8b35e81a8565    mov     esi,dword ptr [msmpeg2vdec!DllUnregisterServer+0x146688 (65851ae8)] // points to GetThreadInformation
        656f599f 8bce            mov     ecx,esi
        656f59a1 50              push    eax
        656f59a2 ff15a8948565    call    dword ptr [msmpeg2vdec!DllUnregisterServer+0x14e048 (658594a8)] // points to a ret instruction
        656f59a8 ffd6            call    esi

  if(!bResult) { goto opt_out; }
        656f59aa 85c0            test    eax,eax
        656f59ac 740b            je      msmpeg2vdec!DllRegisterServer+0x24519 (656f59b9)

  if(dwThreadDynamicCodePolicy == THREAD_DYNAMIC_CODE_ALLOW) { goto opt_out_done; }
        656f59ae 837c240c01      cmp     dword ptr [esp+0Ch],1
        656f59b3 0f849845fcff    je      msmpeg2vdec!DllGetClassObject+0x96b1 (656b9f51)

opt_out:
  dwThreadDynamicCodePolicy = THREAD_DYNAMIC_CODE_ALLOW;
        656f59b9 0fb6c3          movzx   eax,bl
        656f59bc 8944240c        mov     dword ptr [esp+0Ch],eax
  SetThreadInformation(GetCurrentThread(), ThreadDynamicCodePolicy, &dwThreadDynamicCodePolicy, sizeof(dwThreadDynamicCodePolicy));
        656f59c0 8d44240c        lea     eax,[esp+0Ch]
        656f59c4 6a04            push    4
        656f59c6 50              push    eax
        656f59c7 6a02            push    2
        656f59c9 ff1594918565    call    dword ptr [msmpeg2vdec!DllUnregisterServer+0x14dd34 (65859194)] // points to GetCurrentThread
        656f59cf 8b35ec1a8565    mov     esi,dword ptr [msmpeg2vdec!DllUnregisterServer+0x14668c (65851aec)]  // points to SetThreadInformation
        656f59d5 8bce            mov     ecx,esi
        656f59d7 50              push    eax
        656f59d8 ff15a8948565    call    dword ptr [msmpeg2vdec!DllUnregisterServer+0x14e048 (658594a8)] // points to a ret instruction
        656f59de ffd6            call    esi

  someObject->someField = 1;
        656f59e0 c6471001        mov     byte ptr [edi+10h],1
        656f59e4 e96845fcff      jmp     msmpeg2vdec!DllGetClassObject+0x96b1 (656b9f51)

opt_out_done:
  return someObject;
        656b9f51 8bc7            mov     eax,edi
        656b9f53 5f              pop     edi
        656b9f54 5e              pop     esi
        656b9f55 5b              pop     ebx
        656b9f56 8be5            mov     esp,ebp
        656b9f58 5d              pop     ebp
        656b9f59 c20400          ret     4
```

We reach this code after multiple code paths that query the current process's ACG policy, including one originating from `msmpeg2vdec` (unlike in 64-bit builds, where none originated from `msmpeg2vdec`). The important point about the code above is that the result of the call to `SetThreadInformation` is not checked and is thus lost. This code therefore seems to assume that opting out of ACG will always work, and it is unable to adapt to a strict ACG that doesn't allow opting out.

After this code gets executed, a failing call to `VirtualProtect` originating from `msmpeg2vdec` occurs:

```
00 03afd5f0 656b938e     KERNELBASE!VirtualProtect+0x39
01 03afd618 656b932e     msmpeg2vdec!DllGetClassObject+0x8aee
02 03afd630 656b4b90     msmpeg2vdec!DllGetClassObject+0x8a8e
03 03afd7e8 656aec59     msmpeg2vdec!DllGetClassObject+0x42f0
04 03afd8d8 656ae894     msmpeg2vdec+0x7ec59
05 03afd918 656ae3fa     msmpeg2vdec+0x7e894
06 03afd950 656ae373     msmpeg2vdec+0x7e3fa
07 03afd968 75a09d5c     msmpeg2vdec+0x7e373
08 03afda68 75a17385     combase!CServerContextActivator::CreateInstance+0x1ec [onecore\com\combase\objact\actvator.cxx @ 881]
   ...
```

Then the code path that tries to opt out is reached once again (and, again, fails silently at `SetThreadInformation`), and finally we crash by jumping into the area that `VirtualProtect` was trying to make executable. In my case, the failing call was `VirtualProtect(lpAddress=0x09160000, dwSize=0x00010000, flProtect=0x40=PAGE_EXECUTE_READWRITE)` and the crash occurred with `eip=0916da87`.
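A quick arithmetic check confirms that the crashing instruction pointer lies inside the region whose protection change was refused: the region spans 0x09160000 to 0x09160000 + 0x00010000 = 0x09170000, and 0x0916da87 is 0xda87 bytes into it. The constants are taken from the values above; `InRange` is just a helper for this check.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if addr lies in the half-open range [base, base + size).
constexpr bool InRange(std::uint32_t addr, std::uint32_t base, std::uint32_t size) {
  return addr >= base && addr - base < size;
}

// Values from the failing VirtualProtect call and the crash reported above.
constexpr std::uint32_t kLpAddress = 0x09160000;
constexpr std::uint32_t kDwSize = 0x00010000;
constexpr std::uint32_t kCrashEip = 0x0916da87;

static_assert(InRange(kCrashEip, kLpAddress, kDwSize),
              "the crashing eip is inside the region that could not be made executable");
```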
Attachment #9295527 - Attachment is obsolete: true

Here is a technical update regarding the progress made so far while trying to understand the problem in bug 1790713.

Summary

I have analyzed the paths taken in kernelbase.dll, ntdll.dll, and ntoskrnl.exe by APIs affected by ACG, in search of realistic ways to let reporters help us debug the problem. To summarize my findings:

  • MiArbitraryCodeBlocked is a valuable point of interest in kernel code to study ACG failures. Almost all code paths use this function to check the status of ACG for the current thread.
  • RtlSetLastWin32Error (a.k.a. SetLastError) is a valuable point of interest in userland code to study ACG failures. All APIs that fail because of ACG should call this function with ERROR_DYNAMIC_CODE_BLOCKED before they return.
  • There is an internal variable called g_dwLastErrorToBreakOn baked into ntdll.dll, which can be used to trigger a breakpoint when a specific error code is passed to RtlSetLastWin32Error.
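A toy model of that break-on-error mechanism, assuming the behavior described in the last bullet. This is not ntdll's actual code: `g_lastErrorToBreakOn`, `ModelSetLastWin32Error`, and the `g_breaks` log are hypothetical stand-ins, and the log replaces a real debugger break.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ERROR_DYNAMIC_CODE_BLOCKED from winerror.h (1655 decimal).
constexpr std::uint32_t kErrorDynamicCodeBlocked = 0x677;

// Hypothetical model: a global "break on this error" value that is consulted
// every time the last error is set. 0 means the feature is disabled.
std::uint32_t g_lastErrorToBreakOn = 0;
std::uint32_t g_lastError = 0;
std::vector<std::uint32_t> g_breaks;  // stands in for breaking into the debugger

void ModelSetLastWin32Error(std::uint32_t error) {
  if (g_lastErrorToBreakOn != 0 && error == g_lastErrorToBreakOn) {
    g_breaks.push_back(error);  // a debugger would stop here
  }
  g_lastError = error;
}
```

Setting the variable to ERROR_DYNAMIC_CODE_BLOCKED therefore stops exactly on ACG-related failures, whatever API produced them, while other error codes pass through unnoticed.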

What happens in the kernel

MiArbitraryCodeBlocked generates ETW events for two different providers: Microsoft-Windows-Kernel-Memory and Microsoft-Windows-Security-Mitigations.

Here is a rough pseudo-code equivalent of MiArbitraryCodeBlocked, which produces different ETW events depending on how ACG is configured (note: this was updated after it was first written):

```
EVENT_DESCRIPTOR MITIGATION_AUDIT_PROHIBIT_DYNAMIC_CODE{
  Id=1, Version=0, Channel=0x10, Level=0, Opcode=0, Task=1, Keyword=0x8000000000000000
};
EVENT_DESCRIPTOR MITIGATION_ENFORCE_PROHIBIT_DYNAMIC_CODE{
  Id=2, Version=0, Channel=0x10, Level=3, Opcode=0, Task=1, Keyword=0x8000000000000000
};

EVENT_DESCRIPTOR KERNEL_MEM_EVENT_ACG{
  Id=8, Version=0, Channel=0x10, Level=4, Opcode=0, Task=6, Keyword=0x8000000000000100
};

// Returns STATUS_DYNAMIC_CODE_BLOCKED if ACG is active for the current thread, STATUS_SUCCESS otherwise
NTSTATUS MiArbitraryCodeBlocked(CurrentProcess)
{
  if (IsDynamicCodeBlocked(CurrentProcess) && !HasOptedOut(GetCurrentThread())) {
    // The current operation is blocked by ACG
    EtwWriteEx(RegHandleFor("Microsoft-Windows-Kernel-Memory"), &KERNEL_MEM_EVENT_ACG, ..., Flags=1, ...); // ACGFlag = 0x80000000
    if (IsDynamicCodeAudited(CurrentProcess)) {
      // Report only one failure to Microsoft-Windows-Security-Mitigations
      EtwWriteEx(RegHandleFor("Microsoft-Windows-Security-Mitigations"), &MITIGATION_ENFORCE_PROHIBIT_DYNAMIC_CODE, ..., Flags=0, ...);
      SetDynamicCodeAudited(CurrentProcess, false);
    }
    return STATUS_DYNAMIC_CODE_BLOCKED;
  }
  if (IsDynamicCodeAudited(CurrentProcess) && !HasOptedOut(GetCurrentThread())) {
    // Using ACG in audit mode, meaning no actual ACG failures will occur, but events are reported
    // Report only one failure to Microsoft-Windows-Security-Mitigations
    EtwWriteEx(RegHandleFor("Microsoft-Windows-Security-Mitigations"), &MITIGATION_AUDIT_PROHIBIT_DYNAMIC_CODE, ..., Flags=0, ...);
    SetDynamicCodeAudited(CurrentProcess, false);
  }
  EtwWriteEx(RegHandleFor("Microsoft-Windows-Kernel-Memory"), &KERNEL_MEM_EVENT_ACG, ..., Flags=1, ...); // ACGFlag = 0
  return STATUS_SUCCESS;
}
```

(Edited) About IsDynamicCodeAudited in the pseudo-code above: this corresponds to the AuditProhibitDynamicCode bit. This bit can be set individually, but it is also automatically set to 1 when ProhibitDynamicCode is set to 1. The important point here is that at most one ETW event will be reported on Microsoft-Windows-Security-Mitigations unless SetProcessMitigationPolicy is called again!
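To check this "at most one Security-Mitigations event" property, here is a runnable model of the pseudo-code above. It is a simplification under the same assumptions as the pseudo-code: `AcgModel` and its members are hypothetical, ETW writes are replaced by counters, and `SetPolicy` models the rule that setting ProhibitDynamicCode also sets the audit bit.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kStatusSuccess = 0x00000000;
constexpr std::uint32_t kStatusDynamicCodeBlocked = 0xC0000604;

struct AcgModel {
  bool prohibitDynamicCode = false;       // ProhibitDynamicCode
  bool auditProhibitDynamicCode = false;  // AuditProhibitDynamicCode

  int kernelMemoryEvents = 0;        // Microsoft-Windows-Kernel-Memory
  int securityMitigationEvents = 0;  // Microsoft-Windows-Security-Mitigations

  // Models SetProcessMitigationPolicy: setting ProhibitDynamicCode also sets
  // the audit bit, which re-arms the one-shot Security-Mitigations event.
  void SetPolicy(bool prohibit, bool audit) {
    prohibitDynamicCode = prohibit;
    auditProhibitDynamicCode = audit || prohibit;
  }

  // Same decisions as MiArbitraryCodeBlocked in the pseudo-code.
  std::uint32_t ArbitraryCodeBlocked(bool threadOptedOut) {
    ++kernelMemoryEvents;  // a Kernel-Memory event is written on every path
    if (threadOptedOut) return kStatusSuccess;
    if (auditProhibitDynamicCode) {
      ++securityMitigationEvents;  // ENFORCE or AUDIT event, reported once
      auditProhibitDynamicCode = false;
    }
    return prohibitDynamicCode ? kStatusDynamicCodeBlocked : kStatusSuccess;
  }
};
```

Two blocked operations produce two Kernel-Memory events but a single Security-Mitigations event; only a new `SetPolicy` call re-arms the latter.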

What happens in userland

The attached diagram explains what happens when a system call fails due to ACG, using the VirtualProtect API as an example. As shown in the diagram, when a syscall fails under ACG, it returns STATUS_DYNAMIC_CODE_BLOCKED, and the higher-level API that called it then sets the last error to ERROR_DYNAMIC_CODE_BLOCKED accordingly. STATUS_DYNAMIC_CODE_BLOCKED is 0xc0000604 and ERROR_DYNAMIC_CODE_BLOCKED is 0x677. The translation is performed by BaseSetLastNTError.
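This chain can be modeled in a few lines. It is a sketch, not the real KERNELBASE code: `ModelNtProtect`, `ModelBaseSetLastNTError`, and `ModelVirtualProtect` are hypothetical names, and only the single status-to-error translation discussed here is modeled. `NtSuccess` follows the usual `NT_SUCCESS` rule that any negative `NTSTATUS` value counts as a failure.

```cpp
#include <cassert>
#include <cstdint>

using NTSTATUS = std::int32_t;

constexpr NTSTATUS kStatusSuccess = 0;
constexpr NTSTATUS kStatusDynamicCodeBlocked = static_cast<NTSTATUS>(0xC0000604u);
constexpr std::uint32_t kErrorDynamicCodeBlocked = 0x677;  // 1655 decimal

constexpr bool NtSuccess(NTSTATUS status) { return status >= 0; }

std::uint32_t g_lastError = 0;

// Model of BaseSetLastNTError: translate the NTSTATUS, then set the last error.
void ModelBaseSetLastNTError(NTSTATUS status) {
  g_lastError = (status == kStatusDynamicCodeBlocked)
                    ? kErrorDynamicCodeBlocked
                    : 0xFFFFFFFFu;  // placeholder: other translations are not modeled
}

// Hypothetical stand-in for the syscall (NtProtectVirtualMemory) under ACG.
NTSTATUS ModelNtProtect(bool acgBlocks) {
  return acgBlocks ? kStatusDynamicCodeBlocked : kStatusSuccess;
}

// Shape of the VirtualProtect wrapper: on failure, set the last error and return FALSE.
bool ModelVirtualProtect(bool acgBlocks) {
  const NTSTATUS status = ModelNtProtect(acgBlocks);
  if (!NtSuccess(status)) {
    ModelBaseSetLastNTError(status);
    return false;
  }
  return true;
}
```

A caller that sees `false` and reads the last error gets 0x677, which is the value that the hooks and breakpoints described below key on.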

What I tried

Based on this information, I produced several custom builds as my understanding grew, gradually catching more potential causes of failure and getting closer to the original Nightly build:

  • The first custom build requires a delayed ACG. It catches failing calls to VirtualAlloc, VirtualProtect, MapViewOfFile, and SetProcessValidCallTargets using Firefox’s own hooking functions.
  • The second custom build requires a delayed ACG. It catches all calls failing with ERROR_DYNAMIC_CODE_BLOCKED, by hooking RtlSetLastWin32Error with Firefox’s own hooking functions.
  • The third custom build (CI failures are expected) uses a non-delayed ACG like the original Nightly build. It relies on a new separate pdbtool.exe executable that downloads ntdll.dll PDB symbols. This build catches all calls failing with ERROR_DYNAMIC_CODE_BLOCKED, by setting the ntdll!g_dwLastErrorToBreakOn internal variable to ERROR_DYNAMIC_CODE_BLOCKED at process creation, while the process is still suspended.

I tested each custom build on a Windows 10 VM predating the Creators Update, with the version check introduced by [:jrmuizel] removed, a configuration we know should provoke an ACG failure. This resulted in a crash with each individual build.

Going further

In addition to the custom builds, I have tried to get some information from the reporters through the ETW events generated by MiArbitraryCodeBlocked. Unfortunately, these logs are not very verbose. So far I have not managed to get ETW to collect stack traces when the events are written, which would definitely help us understand what's going on. It may be possible to collect those stack traces, but my ETW knowledge is currently too limited.

ESET Endpoint Antivirus was installed on the machines of at least two of the three users who reproduced bug 1790713. ESET does seem to be compatible with delayed ACG, though.

The original Nightly build is expected to produce a non-fatal ACG failure in DllBlocklist_Initialize, whereas the third custom build I provided is expected to produce none. However, for the ESET user, both builds produced exactly one ACG failure. This indicates that the additional failure observed by the ESET user was fatal and prevented the original Nightly build's process from even reaching DllBlocklist_Initialize. Otherwise, two failures should have been reported with the original Nightly build: the one likely due to ESET and the one known to be caused by DllBlocklist_Initialize. (Edited: The reasoning here is false. See the updated pseudo-code in comment 24: by Microsoft's design, only a single failure will ever be reported on Microsoft-Windows-Security-Mitigations.)

Since this user also had unusual events in their ETW traces, I was curious whether I could reproduce them by adding code to Firefox that does what ESET probably does as well: inject code and run threads in processes at creation time. I used VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread from the parent process while the child process was still suspended, injecting a ret instruction and creating a thread that executes it. Here are some notes from these tests:

  • ACG does not affect allocations originating from an external process through VirtualAllocEx. From the parent, I was able to allocate remote RWX memory in an RDD process with non-delayed ACG.
  • This simple test produced the following event: Process '...\firefox.exe' (PID 23488) was blocked from loading the non-Microsoft-signed binary '...\mozglue.dll'.. The ESET user had a similar error but for mozavcodec.dll and mozavutil.dll.
  • I wasn't able to reproduce the events related to Win32k.

I would thus say that ESET is probably creating remote threads somewhere around process creation, though not using VirtualAllocEx. They are probably using another method which can result in an ACG failure, and not adapting to their operation being denied by the system. (Edited: Given the false reasoning above, it's a bit hasty to conclude this.)

I think it would also be worth investigating the root cause of the extra ETW events observed by ESET users, as they may be causing problems without us noticing. They are not related to adding ACG in RDD, though. Making progress on collecting stack traces with ETW could also help here.

In that regard, I was able to produce the stack traces associated with ACG-related ETW events programmatically by adapting an official example from krabsetw. It's only raw non-symbolized addresses at the moment, but at least it shows that this information can be retrieved.

Regarding the code used to opt out in 32-bit builds, which [:jrmuizel] identified in comment 3 and comment 4 and which I analyzed in comment 23, I found the corresponding source code in the open-source, though outdated, chakra-core repository. By matching the code, I was able to confirm that this is indeed what msmpeg2vdec.dll uses, and also what MSAudDecMFT.dll uses, as suggested in bug 1766432 comment 3.

Here's how it's used: AutoEnableDynamicCodeGen objects are added to the scope of code that would otherwise be rejected by ACG. The constructor of AutoEnableDynamicCodeGen opts the current thread out of ACG, and the destructor opts it back in. An example use is available in the source code. In release builds, AutoEnableDynamicCodeGen fails silently when executed within a process that doesn't allow threads to opt out, which leads to the later crashes that we observed.
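To illustrate the scope-based pattern and why its failure goes unnoticed, here is a minimal C++ sketch. It is not the chakra-core code: the Windows calls (GetProcessMitigationPolicy with ProcessDynamicCodePolicy, SetThreadInformation with ThreadDynamicCodePolicy) are replaced by a hypothetical in-memory simulation so the logic can run anywhere, and the names SimulatedProcess, SimulatedSetThreadOptOut, CanGenerateCode and ScopedDynamicCodeOptOut are illustrative.

```cpp
// Hypothetical stand-ins: SimulatedProcess, gThreadOptedOut and
// SimulatedSetThreadOptOut model the Windows state that the real code queries
// with GetProcessMitigationPolicy(ProcessDynamicCodePolicy) and changes with
// SetThreadInformation(ThreadDynamicCodePolicy).
struct SimulatedProcess {
  bool prohibitDynamicCode;  // ACG is enabled for the process
  bool allowThreadOptOut;    // the AllowThreadOptOut flag of the policy
};

// Whether the current thread has opted out of ACG.
thread_local bool gThreadOptedOut = false;

// Models the system refusing the opt-out when the process policy forbids it.
bool SimulatedSetThreadOptOut(const SimulatedProcess& process, bool optOut) {
  if (optOut && !process.allowThreadOptOut) {
    return false;
  }
  gThreadOptedOut = optOut;
  return true;
}

// Whether this thread may currently create or modify executable code.
bool CanGenerateCode(const SimulatedProcess& process) {
  return !process.prohibitDynamicCode || gThreadOptedOut;
}

// RAII guard in the style of AutoEnableDynamicCodeGen: opt out in the
// constructor, opt back in in the destructor.
class ScopedDynamicCodeOptOut {
 public:
  explicit ScopedDynamicCodeOptOut(const SimulatedProcess& process)
      : mProcess(process) {
    if (!process.prohibitDynamicCode) {
      return;  // ACG is not active, nothing to opt out of
    }
    // A refused opt-out is only recorded, never reported: this mirrors the
    // silent failure in release builds.
    mOptedOut = SimulatedSetThreadOptOut(process, true);
  }

  ~ScopedDynamicCodeOptOut() {
    if (mOptedOut) {
      SimulatedSetThreadOptOut(mProcess, false);
    }
  }

  ScopedDynamicCodeOptOut(const ScopedDynamicCodeOptOut&) = delete;
  ScopedDynamicCodeOptOut& operator=(const ScopedDynamicCodeOptOut&) = delete;

  bool optedOut() const { return mOptedOut; }

 private:
  const SimulatedProcess& mProcess;
  bool mOptedOut = false;
};
```

When the process policy does not allow opting out, the constructor returns normally and nothing signals the refusal, so code in the guarded scope proceeds as if dynamic code were allowed; in the real DLLs, that is where the later crash happens.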

In recent 32-bit builds, compared to the outdated open-source implementation, the code has evolved slightly to report errors when trying to opt out from within a process that doesn't allow opting out; however, the code itself still fails silently and neither throws an exception nor forces a crash. As mentioned in comment 5, starting with Windows 10 Creators' Update, the 64-bit versions of msmpeg2vdec.dll and MSAudDecMFT.dll have shifted to code that no longer requires opting out of ACG.

We should check exactly which versions of Windows have the opt-out code in these DLLs and use ACG with opt-out for those versions. The opt-out code should have appeared in them around the time it was introduced in chakra-core.

After analyzing these two DLLs on various versions of Windows, here is the shared compatibility chart for msmpeg2vdec.dll and MSAudDecMFT.dll:

Windows version                         32-bit   64-bit
8.1 and earlier                         INC      INC
10 1511 (November Update) or earlier    INC      INC
10 1607 (Anniversary Update)            OPT      OPT
10 1703 (Creators' Update) or later     OPT      ACG

where:

  • INC: Incompatible with any variant of ACG;
  • OPT: Compatible with ACG if opting out is enabled;
  • ACG: Fully compatible with any variant of ACG.
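Read as a decision procedure, the chart could look like the following sketch. The enum, the constants and ChooseAcgVariant are hypothetical and are not the utility function added by the patches in this bug. The sketch keys on the OS build number (14393 for 1607, 15063 for 1703) and on whether the process is 64-bit, since that determines which flavor of the DLLs gets loaded. Builds between 1607 and 1703 (Insider builds) are assumed to behave like 1607.

```cpp
#include <cstdint>

// Hypothetical names: this enum mirrors the INC/OPT/ACG chart above.
enum class AcgVariant {
  None,              // INC: leave ACG disabled
  WithThreadOptOut,  // OPT: enable ACG and allow threads to opt out
  Full,              // ACG: enable ACG without thread opt-out
};

// Windows 10 build numbers for the releases that appear in the chart.
constexpr uint32_t kWin10AnniversaryUpdateBuild = 14393;  // 1607
constexpr uint32_t kWin10CreatorsUpdateBuild = 15063;     // 1703

// `buildNumber` is the OS build number; Windows 8.1 (9600) and earlier, as
// well as Windows 10 1507/1511, are all below the 1607 threshold.
// `is64BitProcess` selects which flavor of msmpeg2vdec.dll / MSAudDecMFT.dll
// the process would load.
AcgVariant ChooseAcgVariant(uint32_t buildNumber, bool is64BitProcess) {
  if (buildNumber < kWin10AnniversaryUpdateBuild) {
    return AcgVariant::None;
  }
  if (buildNumber < kWin10CreatorsUpdateBuild || !is64BitProcess) {
    return AcgVariant::WithThreadOptOut;
  }
  return AcgVariant::Full;
}
```

A 32-bit process on 64-bit Windows loads the 32-bit DLLs, which is why the check uses the process bitness rather than the OS bitness.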

I have produced a first version of mitimon, a custom tool which traces ETW events related to security mitigations.

The good

  • Catches ACG violations reported on both Microsoft-Windows-Security-Mitigations and Microsoft-Windows-Kernel-Memory.
  • Produces symbolized kernel and userland stack traces for these events, at least on Windows 11.

Current problems

The tool can already be used to its full potential to catch ACG failures on Windows 11, and it provides partial results on Windows 10. It can also be used to track the first Win32k failure in a process, but not subsequent ones. Here is why:

  • On both Windows 11 and Windows 10, by Microsoft's design, at most one ACG violation event is reported per process on Microsoft-Windows-Security-Mitigations. See the updated pseudo-code for MiArbitraryCodeBlocked in comment 24. The same applies to Win32k violations. This is where Microsoft-Windows-Kernel-Memory saves us, though only for ACG, because all ACG violation events are reported on it.
  • However, on Windows 10, the stack traces for the ACG violations reported on Microsoft-Windows-Kernel-Memory seem to be missing their userland part. I don't have this problem on my Windows 11 machine, so this looks like a bug that Microsoft fixed in Windows 11 without backporting the fix to Windows 10. From my tests, the common root cause for events that don't get their user stack could be that they are reported using EtwWriteEx with the undocumented Flags parameter set to 1, compared to 0 for regular events that do get their user stack. This is kernel code, so there is not much we can do ourselves.
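The reporting behavior described above can be summarized with a small, simplified model. It is only a sketch built from the description in this comment, not Microsoft's kernel code: the per-process flag stands for the boolean consulted by MiArbitraryCodeBlocked, and OnSetProcessMitigationPolicy corresponds to the reset obtained by calling SetProcessMitigationPolicy again, as discussed under "Going further". AcgReportingModel and its members are hypothetical names.

```cpp
// Simplified, hypothetical model of how ACG violations surface in ETW for a
// single process, based on the behavior described in this comment.
struct AcgReportingModel {
  bool alreadyReported = false;       // per-process "first violation" flag
  int securityMitigationsEvents = 0;  // Microsoft-Windows-Security-Mitigations
  int kernelMemoryEvents = 0;         // Microsoft-Windows-Kernel-Memory

  void OnAcgViolation() {
    // Every violation is reported on Microsoft-Windows-Kernel-Memory...
    ++kernelMemoryEvents;
    // ...but only the first one on Microsoft-Windows-Security-Mitigations.
    if (!alreadyReported) {
      ++securityMitigationsEvents;
      alreadyReported = true;
    }
  }

  // Calling SetProcessMitigationPolicy again resets the flag, so the next
  // violation is reported on Security-Mitigations once more.
  void OnSetProcessMitigationPolicy() { alreadyReported = false; }
};
```

Under this model, any violations that happen before the reset takes effect show up only on Microsoft-Windows-Kernel-Memory, which matches the limitation of the first technique described below.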

Going further

I tried two potential ways to bypass the current problem for Windows 10 and ACG:

  1. When we detect an ACG failure, we can execute code within the process in which the failure occurred and make that process call SetProcessMitigationPolicy again. That resets the boolean the kernel uses to decide whether the next ACG failure should be reported on Microsoft-Windows-Security-Mitigations. Beyond the fact that this technique is intrusive, its main problem is that we will miss ACG failures that occur between the current failure and the effective call to SetProcessMitigationPolicy: ETW isn't real-time, and the audited process keeps running while we analyze its events. In my tests, I wasn't able to catch successive failures with this technique without adding intentional Sleep delays between them, despite using a dedicated high-priority thread to parse the events.
  2. We can use ETW to capture stack traces at system call entry, and try to match these stack traces to the ACG violation events reported on Microsoft-Windows-Kernel-Memory to guess their userland stack. Capturing stacks at system call entry requires additional code, but it seems to work. This should provide the best possible results for ACG violations once integrated into the tool.
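The matching step of the second technique could be sketched as follows. Assuming a thread is inside at most one system call at a time (nested user-mode callbacks aside), the most recent system call entry on the same thread at or before the violation is the best candidate for the call that ACG blocked. The event structures and the SyscallStackMatcher class are hypothetical; a real consumer (for example one built on krabsetw) would fill them from the actual ETW event fields and timestamps.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, simplified event records.
struct SyscallEntryStack {
  uint32_t threadId;
  uint64_t timestamp;
  std::vector<uint64_t> userFrames;  // user-mode return addresses
};

struct AcgViolationEvent {
  uint32_t threadId;
  uint64_t timestamp;
};

class SyscallStackMatcher {
 public:
  void AddSyscallEntry(SyscallEntryStack entry) {
    mByThread[entry.threadId][entry.timestamp] = std::move(entry.userFrames);
  }

  // Guesses the user stack of a violation reported without one: the stack of
  // the latest system call entry on the same thread at or before the event.
  std::optional<std::vector<uint64_t>> GuessUserStack(
      const AcgViolationEvent& event) const {
    auto thread = mByThread.find(event.threadId);
    if (thread == mByThread.end()) {
      return std::nullopt;
    }
    const auto& stacks = thread->second;
    auto after = stacks.upper_bound(event.timestamp);  // first entry > timestamp
    if (after == stacks.begin()) {
      return std::nullopt;  // no system call entry before the violation
    }
    return std::prev(after)->second;
  }

 private:
  // threadId -> (timestamp -> user stack), ordered by timestamp.
  std::map<uint32_t, std::map<uint64_t, std::vector<uint64_t>>> mByThread;
};
```

Keeping the entries ordered per thread makes each lookup logarithmic; a real tool would also need to evict old entries to bound memory.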

Regarding tracking Win32k violations, here is what is and isn't possible:

  1. The 1st technique described for ACG should work for Win32k with the same limitations as those described for ACG.
  2. We cannot use the 2nd technique because (a) there is no event that gets unconditionally reported upon Win32k failures (not just the first failure), and (b) system call entry/exit events are not reported for blocked Win32k system calls.
  3. Because system call entry/exit events are reported for non-blocked Win32k system calls, if we disable the mitigation in the target process, we can use ETW to track these events and detect Win32k system calls originating from the process. This doesn't test exactly what happens when the mitigation is active, but it can help identify which Win32k system calls the process still uses.
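For the third idea, the analysis could boil down to filtering system call entry events by process and keeping those whose target address lies inside a Win32k kernel module. This sketch assumes that each event record carries the process ID and the address of the kernel routine being called, and that the caller has already resolved the load ranges of the Win32k modules (for example win32kbase.sys and win32kfull.sys). SyscallEntryEvent, ModuleRange and CollectWin32kSyscalls are hypothetical names.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified inputs.
struct ModuleRange {
  uint64_t base;  // load address of a Win32k kernel module
  uint64_t size;  // image size in bytes
};

struct SyscallEntryEvent {
  uint32_t processId;
  uint64_t syscallAddress;  // kernel routine targeted by the system call
};

// Returns the distinct Win32k routines called by `targetPid`.
std::set<uint64_t> CollectWin32kSyscalls(
    const std::vector<SyscallEntryEvent>& events,
    const std::vector<ModuleRange>& win32kModules, uint32_t targetPid) {
  std::set<uint64_t> found;
  for (const auto& event : events) {
    if (event.processId != targetPid) {
      continue;
    }
    for (const auto& module : win32kModules) {
      // Written as a subtraction to avoid overflow in base + size.
      if (event.syscallAddress >= module.base &&
          event.syscallAddress - module.base < module.size) {
        found.insert(event.syscallAddress);
        break;
      }
    }
  }
  return found;
}
```

Resolving each collected address to a symbol would then tell which Win32k system calls the process still relies on.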

While using mitimon on my Windows 11 machine, I once caught the following ACG failures in the RDD process with non-delayed ACG (though not consistently):

0x00007ffc3302f774 ntdll+0x9f774 ntdll!NtProtectVirtualMemory+0x14
0x00007ffc2a83566b apphelp+0x566b apphelp!SepIatPatch+0x163
0x00007ffc2a83e4b6 apphelp+0xe4b6 apphelp!SepRouterHookImportedApi+0x2b6
0x00007ffc2a83de99 apphelp+0xde99 apphelp!SepRouterHookIAT+0x319
0x00007ffc2a83dac7 apphelp+0xdac7 apphelp!SE_DllLoaded+0x157
0x00007ffc33009997 ntdll+0x79997 ntdll!LdrpSendShimEngineInitialNotifications+0x6f
0x00007ffc330098cc ntdll+0x798cc ntdll!LdrpLoadShimEngine+0x124
0x00007ffc33009271 ntdll+0x79271 ntdll!LdrpInitShimEngine+0x159
0x00007ffc3306612c ntdll+0xd612c ntdll!LdrpInitializeProcess+0x1b54
0x00007ffc33059b1e ntdll+0xc9b1e ntdll!_LdrpInitialize+0x55bf2
0x00007ffc33003ef3 ntdll+0x73ef3 ntdll!LdrpInitializeInternal+0x6b
0x00007ffc33003e1e ntdll+0x73e1e ntdll!LdrInitializeThunk+0xe

0x00007ffc3302f774 ntdll+0x9f774 ntdll!NtProtectVirtualMemory+0x14
0x00007ffc2a83566b apphelp+0x566b apphelp!SepIatPatch+0x163
0x00007ffc2a83e4b6 apphelp+0xe4b6 apphelp!SepRouterHookImportedApi+0x2b6
0x00007ffc2a83de99 apphelp+0xde99 apphelp!SepRouterHookIAT+0x319
0x00007ffc2a83dac7 apphelp+0xdac7 apphelp!SE_DllLoaded+0x157
0x00007ffc32fbf04c ntdll+0x2f04c ntdll!LdrpSendPostSnapNotifications+0x12c
0x00007ffc32fbeeef ntdll+0x2eeef ntdll!LdrpNotifyLoadOfGraph+0x4f
0x00007ffc32fbef09 ntdll+0x2ef09 ntdll!LdrpNotifyLoadOfGraph+0x69
0x00007ffc32fbdce1 ntdll+0x2dce1 ntdll!LdrpPrepareModuleForExecution+0x79
0x00007ffc32fb9040 ntdll+0x29040 ntdll!LdrpLoadDllInternal+0x20c
0x00007ffc32fa932c ntdll+0x1932c ntdll!LdrpLoadDll+0xb0
0x00007ffc32fba95a ntdll+0x2a95a ntdll!LdrLoadDll+0xfa
0x00007ff7d8439b4e firefox+0x29b4e firefox!mozilla::freestanding::patched_LdrLoadDll+0x1ae /builds/worker/checkouts/gecko/browser/app/winlauncher/freestanding/DllBlocklist.cpp:365+0x19
0x00007ffc307aba62 KernelBase+0x2ba62 KernelBase!LoadLibraryExW+0x172
0x00007ffc307a7c41 KernelBase+0x27c41 KernelBase!LoadLibraryExA+0x31
0x00007ffc228712fb igd10iumd64+0x12fb
0x00007ffc22871526 igd10iumd64+0x1526
0x00007ffc295b7390 d3d11+0x17390 d3d11!NDXGI::CUMDAdapter::OpenAdapter10_2+0xa0
0x00007ffc295b658f d3d11+0x1658f d3d11!CCreateDeviceCache::CUMDAdapterCache::Load+0x257
0x00007ffc295b608c d3d11+0x1608c d3d11!CCreateDeviceCache::CAdapterCache::ResolveUMDAndVersion+0x180
0x00007ffc295b590b d3d11+0x1590b d3d11!D3D11CoreCreateDevice+0x45b
0x00007ffc295b41e0 d3d11+0x141e0 d3d11!D3D11CreateDeviceAndSwapChainImpl+0x3f0
0x00007ffc295df847 d3d11+0x3f847 d3d11!D3D11CreateDeviceAndSwapChain+0xf7
0x00007ffc295df62c d3d11+0x3f62c d3d11!?D3D11CreateDeviceImpl@@YAJPEAUIDXGIAdapter@@W4D3D_DRIVER_TYPE@@PEAUHINSTANCE__@@IPEBW4D3D_FEATURE_LEVEL@@IIPEAPEAUID3D11Device@@PEAW44@PEAPEAUID3D11DeviceContext@@@Z+0x5c
0x00007ffc295df71e d3d11+0x3f71e d3d11!D3D11CreateDevice+0xde
0x00007ffb7ab3039d xul+0x280039d xul!mozilla::gfx::DeviceManagerDx::CreateDevice+0xad /builds/worker/checkouts/gecko/gfx/thebes/DeviceManagerDx.cpp:753+0x71
0x00007ffb7ab30c26 xul+0x2800c26 xul!mozilla::gfx::DeviceManagerDx::CreateContentDevice+0xe6 /builds/worker/checkouts/gecko/gfx/thebes/DeviceManagerDx.cpp:862+0x25
0x00007ffb7ab30b1f xul+0x2800b1f xul!mozilla::gfx::DeviceManagerDx::CreateContentDevicesLocked+0x1f /builds/worker/checkouts/gecko/gfx/thebes/DeviceManagerDx.cpp:509+0x8
0x00007ffb7ab30aee xul+0x2800aee xul!mozilla::gfx::DeviceManagerDx::CreateContentDevices+0x1e /builds/worker/checkouts/gecko/gfx/thebes/DeviceManagerDx.cpp:495+0x8
0x00007ffb7b8e3701 xul+0x35b3701 xul!mozilla::RDDParent::RecvInitVideoBridge+0x91 /builds/worker/checkouts/gecko/dom/media/ipc/RDDParent.cpp:213+0x8
0x00007ffb79112055 xul+0xde2055 xul!mozilla::PRDDParent::OnMessageReceived+0x755 /builds/worker/workspace/obj-build/ipc/ipdl/PRDDParent.cpp:596+0x28
0x00007ffb79c235b6 xul+0x18f35b6 xul!mozilla::ipc::MessageChannel::MessageTask::Run+0x426 /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1579+0x38c
0x00007ffb79e41150 xul+0x1b11150 xul!mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal+0xf10 /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:851+0x2e2
0x00007ffb79b8c072 xul+0x185c072 xul!nsThread::ProcessNextEvent+0xeb2 /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1205+0x3d
0x00007ffb79e6ac94 xul+0x1b3ac94 xul!mozilla::ipc::MessagePump::Run+0xc4 /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:85+0x29
0x00007ffb78d5092f xul+0xa2092f xul!MessageLoop::RunHandler+0x2f /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:375+0x0
0x00007ffb783538fe xul+0x238fe xul!MessageLoop::Run+0x4e /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:357+0x0
0x00007ffb784c25f8 xul+0x1925f8 xul!nsBaseAppShell::Run+0x28 /builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp:152+0x0
0x00007ffb784c13a8 xul+0x1913a8 xul!nsAppShell::Run+0x38 /builds/worker/checkouts/gecko/widget/windows/nsAppShell.cpp:614+0x8
0x00007ffb7968ceeb xul+0x135ceeb xul!XRE_RunAppShell+0x4b /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:880+0xd
0x00007ffb78d5092f xul+0xa2092f xul!MessageLoop::RunHandler+0x2f /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:375+0x0
0x00007ffb783538fe xul+0x238fe xul!MessageLoop::Run+0x4e /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:357+0x0
0x00007ffb7968c836 xul+0x135c836 xul!XRE_InitChildProcess+0x5f6 /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:743+0x0
0x00007ff7d84309f6 firefox+0x209f6 firefox!wmain+0x5b6 /builds/worker/checkouts/gecko/toolkit/xre/nsWindowsWMain.cpp:167+0x31a
0x00007ff7d8440398 firefox+0x30398 firefox!__scrt_common_main_seh+0x10c d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288+0x22
0x00007ffc31dc244d kernel32+0x1244d kernel32!BaseThreadInitThunk+0x1d
0x00007ffc32fedf78 ntdll+0x5df78 ntdll!RtlUserThreadStart+0x28

These failures originate from Windows internal code that is part of the Shim Engine. The existence of the Shim Engine within Windows means that even loading a DLL can be incompatible with non-delayed ACG. This is further evidence that we should use the delayed variant of the ACG mitigation. The other reasons were:

  • full incompatibility with ASAN initialization;
  • partial incompatibility with our own DLL blocklist code;
  • likely full incompatibility with ESET antivirus.

The compatibility chart from comment 27 explains the remaining problems in bug 1766432 (x86) and bug 1773005 (versions before Creators' Update). The changes I propose in this changeset fix those as well, on Nightly.

We will wait a bit to push these changes as today is Soft Code Freeze.

Attachment #9298232 - Attachment description: Bug 1783223 - Enable best ACG variant in audio decoder on Nightly. r=bobowen → Bug 1783223 - Use ACG-with-opt-out for 32-bit builds and Windows 10 1607 in audio decoder on Nightly. r=bobowen
Attachment #9298233 - Attachment description: Bug 1783223 - Enable best ACG variant in RDD on Nightly. r=bobowen → Bug 1783223 - Enable best ACG variant compatible with system media libraries in RDD on Nightly. r=bobowen
Attachment #9298231 - Attachment description: Bug 1783223 - Define utility function for choosing the best ACG variant compatible with system media libraries. r=bobowen → Bug 1783223 - Define utility function for choosing an ACG variant compatible with system media libraries. r=bobowen

We should be able to land now.

Flags: needinfo?(jmuizelaar) → needinfo?(yjuglaret)
Pushed by yjuglaret@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0725a94f55ec
Define utility function for choosing an ACG variant compatible with system media libraries. r=bobowen
https://hg.mozilla.org/integration/autoland/rev/09170d6279c4
Use ACG-with-opt-out for 32-bit builds and Windows 10 1607 in audio decoder on Nightly. r=bobowen
https://hg.mozilla.org/integration/autoland/rev/084256b6f69e
Enable best ACG variant compatible with system media libraries in RDD on Nightly. r=bobowen

I have tested the changes with the critical Windows 10 versions from comment 27 (1511, 1607, 1703) and everything seems fine. I validated that I can play AAC, OGG, and YouTube videos, even with MinGW builds. I was worried about some test failures in browser_utility_audioDecodeCrash.js, browser_utility_audio_shutdown.js, and browser_utility_multipleAudio.js with MinGW builds, but they do not originate from these changes and were already failing on central. These tests are not normally associated with MinGW targets, and I added them myself, so it may be expected that they don't work.

Flags: needinfo?(yjuglaret)
Status: REOPENED → RESOLVED
Closed: 2 years ago → 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 108 Branch