Enable Arbitrary Code Guard in RDD on Nightly
Categories
(Core :: Security: Process Sandboxing, enhancement, P1)
Tracking
Tracking | Status | |
---|---|---|
firefox108 | --- | fixed |
People
(Reporter: jrmuizel, Assigned: jrmuizel)
References
(Blocks 1 open bug)
Details
Attachments
(5 files, 1 obsolete file)
Assignee | ||
Comment 1•2 years ago
|
||
This was previously disabled in bug 1673194 because of start up crashes.
However, it wasn't obvious under what circumstances these crashes
happen. I'd like to investigate the cause and determine if we can
enable ACG under some circumstances.
Assignee | ||
Comment 2•2 years ago
|
||
This crashes on try on x86. So that's a good place to start:
https://treeherder.mozilla.org/jobs?repo=try&revision=9160ef478a62416dd6135d12ec6450353fb099fe&selectedTaskRun=MRo8Bly1Tlq1b8wo6G1ZUg.0
Assignee | ||
Comment 3•2 years ago
|
||
So I can reproduce the x86 32 bit crash locally. We're crashing in a function that's called by a function that calls GetProcessMitigationPolicy(GetCurrentProcess(), ProcessDynamicCodePolicy, ...)
so that's pretty interesting.
Assignee | ||
Comment 4•2 years ago
|
||
This function also looks like it will try to opt the current thread out of ACG using SetThreadInformation, if that's allowed by AllowThreadOptOut.
Assignee | ||
Comment 5•2 years ago
|
||
The 64 bit version of msmpeg2vdec.dll doesn't seem to contain similar code to call GetProcessMitigationPolicy
Assignee | ||
Comment 6•2 years ago
|
||
Pushed by jmuizelaar@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ef7acc434052 Enable Arbitratry Code Guard in RDD on Nightly. r=bobowen
Comment 8•2 years ago
|
||
Backed out changeset for causing multiple mochitest failures.
Backout link: https://hg.mozilla.org/integration/autoland/rev/41da5c9b54f31928f5114d710a97ab1b78e79210
Failure log:
https://treeherder.mozilla.org/logviewer?job_id=387747280&repo=autoland&lineNumber=2207
https://treeherder.mozilla.org/logviewer?job_id=387752231&repo=autoland&lineNumber=3091
https://treeherder.mozilla.org/logviewer?job_id=387747527&repo=autoland&lineNumber=2229
Assignee | ||
Comment 9•2 years ago
|
||
It looks like the failures only happen on Windows ASAN builds. Is there a weird interaction between ASAN and ACG?
Comment 10•2 years ago
|
||
There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:jrmuizel, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.
Comment 11•2 years ago
|
||
The patch did land, but was backed out.
This was not totally unexpected, because we were trying to get a better idea about what issues this mitigation caused.
Comment 12•2 years ago
•
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)
It looks like the failures only happen on Windows ASAN builds. Is there a weird interaction between ASAN and ACG?
Hello, that is very likely indeed. I'm very confident that ACG should be disabled for ASAN to work properly. Below is an explanation, along with where I would suggest digging to confirm exactly why this could produce timeouts.
ASAN relies on an open-source run-time library called clang_rt.asan*.dll on Windows. This library contains code that prepares the environment that Firefox will execute in when built with ASAN. As part of this initialization, it puts interceptors on various functions. Various strategies are tried for putting interceptors, but they should all fail with ACG enabled. Moreover, some of them could potentially take a very long time to fail. Consider for example this function, which will iterate over regions of memory looking for a suitable location where asking to allocate RWX memory works, something that will always fail with ACG. This may explain the timeouts, although I have not confirmed in practice that this is the exact reason for them.
After discussing with [:bobowen], we think it should be technically possible to make ASAN work with ACG if ACG were enabled dynamically by the child process itself (so after ASAN initialization), and not as part of the startup info for that process set by the parent. Although we want ASAN builds to work as close as possible to release builds, it's unclear what we would really gain by doing that compared to just disabling ACG for ASAN builds.
Comment 13•2 years ago
|
||
Pushed by jmuizelaar@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a389830fb63f Enable Arbitratry Code Guard in RDD on Nightly. r=bobowen
Comment 14•2 years ago
|
||
bugherder |
Comment 15•2 years ago
•
|
||
Here are some notes for the future.
[:jrmuizel] pointed out that some other processes already have ACG with ASAN. They are using SetDelayedProcessMitigations to achieve that, meaning that the mitigation will be applied after initialization (in particular, after ASAN initialization). Since the code to do this already exists, we could consider using SetDelayedProcessMitigations to enable ACG in ASAN builds here too, mostly for the sake of having consistent behavior between the two kinds of builds and not really for a security reason (so, this is much lower priority compared to doing it for release builds).
Regarding release builds however, if everything works with the non-delayed approach, I'd recommend keeping it non-delayed for them. The security impact is better with a non-delayed ACG. The problem I see with enabling the mitigation as delayed is that I think the legitimate code that runs before the mitigation is applied is allowed to allocate RWX memory that may survive for the rest of the life of the process. I don't think delayed ACG would catch this if it came to happen.
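To make the concern about delayed ACG concrete, here is a toy model in portable C++. It is only a sketch of the reasoning above, not verified Windows behavior: ModelProcess, Allocate, ApplyAcg and CountRwx are hypothetical names, and the model simply assumes that applying the mitigation does not revoke pages that already exist.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model; none of these names are real Windows or Firefox APIs.
enum class Protection { ReadWrite, ReadWriteExecute };

struct ModelProcess {
  bool acgActive = false;
  std::vector<Protection> pages;

  // Models an allocation request: RWX is refused once ACG is active.
  bool Allocate(Protection prot) {
    if (acgActive && prot == Protection::ReadWriteExecute) {
      return false;
    }
    pages.push_back(prot);
    return true;
  }

  // Models applying the mitigation. Assumption: existing pages are left as they are.
  void ApplyAcg() { acgActive = true; }

  // Counts RWX pages that are still present in the process.
  std::size_t CountRwx() const {
    std::size_t count = 0;
    for (Protection p : pages) {
      if (p == Protection::ReadWriteExecute) {
        ++count;
      }
    }
    return count;
  }
};
```

In this model, a process that allocates RWX memory and only then calls ApplyAcg still reports one RWX page afterwards, while a process that starts with ACG active never gets one. That is the gap a delayed mitigation could leave open.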
Comment 16•2 years ago
|
||
After some discussion, although it's likely only a few lines of code, this would only catch cases where an ASAN Nightly user does some action to provoke an ACG violation that no user on a regular Nightly does. That seems borderline enough that we probably don't need to bother.
Comment 17•2 years ago
|
||
Hello, this enhancement may have caused bug 1790713, could you please have a look?
Thank you.
Comment 18•2 years ago
|
||
Backed out as requested by jrmuizel
Backout link: https://hg.mozilla.org/integration/autoland/rev/9a9bedd083ca0cb269509ef5323a40834658c3af
Comment 19•2 years ago
|
||
(In reply to Bernard Alleysson from comment #17)
Hello, this enhancement may have caused bug 1790713, could you please have a look?
Thank you.
I will take a look at this.
Comment 20•2 years ago
|
||
Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/9a9bedd083ca
Comment 21•2 years ago
•
|
||
The code in DllBlocklist_Initialize relies on features very similar to the ones I described for ASAN, so it shouldn't work under a non-delayed ACG. I suspect that instead of failing gracefully when ACG is active, it currently doesn't check for it, tries to do its job, and leaves the process in a weird ready-to-crash state. I will confirm this theory next week.
If that is indeed the problem, we should consider delaying ACG here, and we could additionally make sure that DllBlocklist_Initialize doesn't try to do its job if it detects non-delayed ACG. Delaying ACG would fix this because ApplyProcessMitigationsToCurrentProcess runs after DllBlocklist_Initialize, so ACG wouldn't be active yet. That is probably why other processes that already have delayed ACG don't yield the problem we see here.
Comment 22•2 years ago
•
|
||
Regarding 64-bit builds: the current code in DllBlocklist_Initialize doesn't explicitly check whether the mitigation is set and will try to do its job, however it seems to fail gracefully after the first failing call to VirtualAlloc (see full stack trace below). As a result I can load videos without problem on my machine and cannot reproduce the problem. The blocklist doesn't appear to be the root cause here. After discussing with [:bobowen] about how to address this, I will propose a custom build to nightly users who reported the problem, which will hopefully help us go further with this.
For my experiments I tried to catch failing calls to VirtualAlloc and VirtualProtect, as well as attempts to read or set the process policy ProcessDynamicCodePolicy or the thread information ThreadDynamicCodePolicy. With WinDbg that translates to the following, given the positions of the ret instructions in my specific version of KERNELBASE:
bp KERNELBASE!VirtualAlloc+0x62 "j (rax=0) ''; 'gc'"
bp KERNELBASE!VirtualProtect+0x56 "j (rax=0) ''; 'gc'"
bp KERNELBASE!GetProcessMitigationPolicy "j (rdx=2) ''; 'gc'"
bp KERNELBASE!SetProcessMitigationPolicy "j (rdx=2) ''; 'gc'"
bp KERNELBASE!GetThreadInformation "j (rdx=2) ''; 'gc'"
bp KERNELBASE!SetThreadInformation "j (rdx=2) ''; 'gc'"
This resulted in catching the following failing call to VirtualAlloc:
00 000000fe`22ffed88 00007ffb`1e39682a KERNELBASE!VirtualAlloc+0x62
01 000000fe`22ffed90 00007ffb`1e3966ac mozglue!mozilla::interceptor::MMPolicyInProcess::MaybeCommitNextPage+0x6a [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/MMPolicies.h @ 594]
02 000000fe`22ffee10 00007ffb`1e3920e6 mozglue!mozilla::interceptor::VMSharingPolicyUnique<mozilla::interceptor::MMPolicyInProcess>::GetNextTrampoline+0x3c [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/VMSharingPolicies.h @ 157]
03 (Inline Function) --------`-------- mozglue!mozilla::interceptor::TrampolinePool<mozilla::interceptor::VMSharingPolicyUnique<mozilla::interceptor::MMPolicyInProcess>,std::nullptr_t>::GetNextTrampoline+0xd [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/VMSharingPolicies.h @ 80]
04 (Inline Function) --------`-------- mozglue!mozilla::interceptor::VMSharingPolicyShared::GetNextTrampoline+0x2b [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/VMSharingPolicies.h @ 263]
05 (Inline Function) --------`-------- mozglue!mozilla::interceptor::TrampolinePool<mozilla::interceptor::VMSharingPolicyShared,mozilla::interceptor::TrampolinePool<mozilla::interceptor::VMSharingPolicyUnique<mozilla::interceptor::MMPolicyInProcess>,std::nullptr_t> >::GetNextTrampoline+0x2b [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/VMSharingPolicies.h @ 48]
06 000000fe`22ffeea0 00007ffb`1e391f4f mozglue!mozilla::interceptor::WindowsDllDetourPatcher<mozilla::interceptor::VMSharingPolicyShared>::AddHook+0x126 [/builds/worker/workspace/obj-build/dist/include/mozilla/interceptor/PatcherDetour.h @ 451]
07 000000fe`22ffefd0 00007ffb`1e391ab9 mozglue!mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>::AddDetour+0x3ff [/builds/worker/workspace/obj-build/dist/include/nsWindowsDllInterceptor.h @ 522]
08 000000fe`22fff170 00007ffb`1e3b5ea0 mozglue!mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>::AddDetour+0x159 [/builds/worker/workspace/obj-build/dist/include/nsWindowsDllInterceptor.h @ 476]
09 (Inline Function) --------`-------- mozglue!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::ApplyDetour+0x5 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/nsWindowsDllInterceptor.h @ 186]
0a 000000fe`22fff240 00007ffb`8e9d643a mozglue!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::InitOnceCallback+0x30 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/nsWindowsDllInterceptor.h @ 197]
0b 000000fe`22fff280 00007ffb`8bfe0b11 ntdll!RtlRunOnceExecuteOnce+0x9a
0c 000000fe`22fff2c0 00007ffb`1e3b4ab3 KERNELBASE!InitOnceExecuteOnce+0x21
0d (Inline Function) --------`-------- mozglue!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::SetDetour+0x54 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/nsWindowsDllInterceptor.h @ 141]
0e 000000fe`22fff300 00007ff6`81291b9e mozglue!DllBlocklist_Initialize+0x213 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/WindowsDllBlocklist.cpp @ 622]
0f 000000fe`22fff490 00007ff6`8129184f firefox!NS_internal_main+0x27e [/builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp @ 327]
10 000000fe`22fff680 00007ff6`812914c0 firefox!wmain+0x34f [/builds/worker/checkouts/gecko/toolkit/xre/nsWindowsWMain.cpp @ 167]
11 000000fe`22fff980 00007ff6`812913d7 firefox!main+0x50 [/builds/worker/checkouts/gecko/toolkit/xre/nsWindowsWMain.cpp @ 39]
12 000000fe`22fff9e0 00007ff6`81291436 firefox!WinMainCRTStartup+0x297
13 000000fe`22fffaa0 00007ffb`8cc254e0 firefox!mainCRTStartup+0x16
14 000000fe`22fffad0 00007ffb`8e9c485b KERNEL32!BaseThreadInitThunk+0x10
15 000000fe`22fffb00 00000000`00000000 ntdll!RtlUserThreadStart+0x2b
BaseThreadInitThunk hook failed
And the following attempt at checking the process policy:
00 0000005a`687fd4b8 00007ffa`b3d75da8 KERNELBASE!GetProcessMitigationPolicy
01 0000005a`687fd4c0 00007ffa`b3d751aa d3d11!D3D11CoreCreateDevice+0xff8
02 0000005a`687fd5c0 00007ffa`b3d73c42 d3d11!D3D11CoreCreateDevice+0x3fa
03 0000005a`687fd8d0 00007ffa`b3db24f7 d3d11+0x13c42
04 0000005a`687fdb60 00007ffa`b3db23ec d3d11!D3D11CreateDeviceAndSwapChain+0xf7
05 0000005a`687fdc20 00007ffa`b3db235e d3d11!D3D11CreateDevice+0x16c
06 0000005a`687fdc90 00007ff9`f931ba11 d3d11!D3D11CreateDevice+0xde
07 0000005a`687fdd40 00007ff9`f931c34d xul!mozilla::gfx::DeviceManagerDx::CreateDevice+0xc1 [C:\mozilla-source\mozilla-unified\gfx\thebes\DeviceManagerDx.cpp @ 753]
08 0000005a`687fde50 00007ff9`f931c229 xul!mozilla::gfx::DeviceManagerDx::CreateContentDevice+0xad [C:\mozilla-source\mozilla-unified\gfx\thebes\DeviceManagerDx.cpp @ 862]
09 0000005a`687fdfd0 00007ff9`f931c1ee xul!mozilla::gfx::DeviceManagerDx::CreateContentDevicesLocked+0x29 [C:\mozilla-source\mozilla-unified\gfx\thebes\DeviceManagerDx.cpp @ 509]
0a 0000005a`687fe020 00007ff9`fa6c1577 xul!mozilla::gfx::DeviceManagerDx::CreateContentDevices+0x1e [C:\mozilla-source\mozilla-unified\gfx\thebes\DeviceManagerDx.cpp @ 495]
0b 0000005a`687fe060 00007ff9`fa6e4210 xul!mozilla::RDDParent::RecvInitVideoBridge+0x67 [C:\mozilla-source\mozilla-unified\dom\media\ipc\RDDParent.cpp @ 219]
0c 0000005a`687fe0b0 00007ff9`f8e813e0 xul!mozilla::PRDDParent::OnMessageReceived+0xb70 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-mingw32\ipc\ipdl\PRDDParent.cpp @ 596]
0d 0000005a`687fe380 00007ff9`f8e80855 xul!mozilla::ipc::MessageChannel::DispatchAsyncMessage+0x70 [C:\mozilla-source\mozilla-unified\ipc\glue\MessageChannel.cpp @ 1756]
0e 0000005a`687fe3e0 00007ff9`f8e80c24 xul!mozilla::ipc::MessageChannel::DispatchMessage+0x155 [C:\mozilla-source\mozilla-unified\ipc\glue\MessageChannel.cpp @ 1685]
0f 0000005a`687fe4c0 00007ff9`f8e80eb1 xul!mozilla::ipc::MessageChannel::RunMessage+0x104 [C:\mozilla-source\mozilla-unified\ipc\glue\MessageChannel.cpp @ 1482]
10 0000005a`687fe510 00007ff9`f8855e87 xul!mozilla::ipc::MessageChannel::MessageTask::Run+0x71 [C:\mozilla-source\mozilla-unified\ipc\glue\MessageChannel.cpp @ 1588]
11 0000005a`687fe560 00007ff9`f883ed0d xul!mozilla::RunnableTask::Run+0xb7 [C:\mozilla-source\mozilla-unified\xpcom\threads\TaskController.cpp @ 539]
12 0000005a`687fe9f0 00007ff9`f883dd48 xul!mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal+0x7dd [C:\mozilla-source\mozilla-unified\xpcom\threads\TaskController.cpp @ 851]
13 0000005a`687feca0 00007ff9`f883df89 xul!mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal+0x28 [C:\mozilla-source\mozilla-unified\xpcom\threads\TaskController.cpp @ 683]
14 0000005a`687fed30 00007ff9`f88584d2 xul!mozilla::TaskController::ProcessPendingMTTask+0x39 [C:\mozilla-source\mozilla-unified\xpcom\threads\TaskController.cpp @ 461]
15 (Inline Function) --------`-------- xul!mozilla::TaskController::InitializeInternal::<lambda_1>::operator()+0xe [C:\mozilla-source\mozilla-unified\xpcom\threads\TaskController.cpp @ 187]
16 0000005a`687fedb0 00007ff9`f884b5ae xul!mozilla::detail::RunnableFunction<`lambda at C:/mozilla-source/mozilla-unified/xpcom/threads/TaskController.cpp:187:7'>::Run+0x12 [C:\mozilla-source\mozilla-unified\xpcom\threads\nsThreadUtils.h @ 532]
17 0000005a`687fede0 00007ff9`f884f688 xul!nsThread::ProcessNextEvent+0x63e [C:\mozilla-source\mozilla-unified\xpcom\threads\nsThread.cpp @ 1209]
18 0000005a`687fefa0 00007ff9`f8e83a28 xul!NS_ProcessNextEvent+0x68 [C:\mozilla-source\mozilla-unified\xpcom\threads\nsThreadUtils.cpp @ 465]
19 0000005a`687feff0 00007ff9`f8e446f0 xul!mozilla::ipc::MessagePump::Run+0xa8 [C:\mozilla-source\mozilla-unified\ipc\glue\MessagePump.cpp @ 86]
1a (Inline Function) --------`-------- xul!MessageLoop::RunInternal+0x16 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 381]
1b 0000005a`687ff050 00007ff9`f8e44668 xul!MessageLoop::RunHandler+0x50 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 375]
1c 0000005a`687ff0a0 00007ff9`fb00ccb8 xul!MessageLoop::Run+0x58 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 357]
1d 0000005a`687ff0f0 00007ff9`fb09c9ec xul!nsBaseAppShell::Run+0x28 [C:\mozilla-source\mozilla-unified\widget\nsBaseAppShell.cpp @ 152]
1e 0000005a`687ff130 00007ff9`fc5fde1c xul!nsAppShell::Run+0x1cc [C:\mozilla-source\mozilla-unified\widget\windows\nsAppShell.cpp @ 614]
1f 0000005a`687ff2a0 00007ff9`f8e446f0 xul!XRE_RunAppShell+0x4c [C:\mozilla-source\mozilla-unified\toolkit\xre\nsEmbedFunctions.cpp @ 880]
20 (Inline Function) --------`-------- xul!MessageLoop::RunInternal+0x16 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 381]
21 0000005a`687ff2e0 00007ff9`f8e44668 xul!MessageLoop::RunHandler+0x50 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 375]
22 0000005a`687ff330 00007ff9`fc5fdb5c xul!MessageLoop::Run+0x58 [C:\mozilla-source\mozilla-unified\ipc\chromium\src\base\message_loop.cc @ 357]
23 0000005a`687ff380 00007ff7`0691189b xul!XRE_InitChildProcess+0x8ec [C:\mozilla-source\mozilla-unified\toolkit\xre\nsEmbedFunctions.cpp @ 743]
24 (Inline Function) --------`-------- firefox!content_process_main+0xa3 [C:\mozilla-source\mozilla-unified\ipc\contentproc\plugin-container.cpp @ 57]
25 0000005a`687ff620 00007ff7`06911340 firefox!NS_internal_main+0x4db [C:\mozilla-source\mozilla-unified\browser\app\nsBrowserApp.cpp @ 359]
26 0000005a`687ff7f0 00007ff7`069656d8 firefox!wmain+0x340 [C:\mozilla-source\mozilla-unified\toolkit\xre\nsWindowsWMain.cpp @ 167]
27 (Inline Function) --------`-------- firefox!invoke_main+0x22 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90]
28 0000005a`687ffaf0 00007ffa`bf9f54e0 firefox!__scrt_common_main_seh+0x10c [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
29 0000005a`687ffb30 00007ffa`c0a6485b KERNEL32!BaseThreadInitThunk+0x10
2a 0000005a`687ffb60 00000000`00000000 ntdll!RtlUserThreadStart+0x2b
Comment 23•2 years ago
•
|
||
Regarding 32-bit builds: I did a similar experiment with the isolated example from bug 1783223 comment 6 (not Firefox). To summarize, I would agree that given the current behavior of the 32-bit version of `msmpeg2vdec.dll`, the changes that [:jrmuizel] introduced to allow threads to opt out are the best compromise we can make for 32-bit builds. Here are more details, obtained with the following breakpoints:
```
bp KERNELBASE!VirtualProtect+0x39 "j (eax=0) ''; 'gc'"
bp KERNELBASE!VirtualAlloc+0x4c "j (eax=0) ''; 'gc'"
bp KERNELBASE!SetThreadInformation "j (poi(esp+0x8) = 2) ''; 'gc'"
bp KERNELBASE!GetProcessMitigationPolicy "j (poi(esp+0x8) = 2) ''; 'gc'"
bp KERNELBASE!SetProcessMitigationPolicy "j (poi(esp+0x8) = 2) ''; 'gc'"
```
This led me to a code path in `msmpeg2vdec` which looks as follows, the translation to C++ being my own:
```
DWORD dwThreadDynamicCodePolicy = 0;
BOOL bResult = GetThreadInformation(GetCurrentThread(), ThreadDynamicCodePolicy, &dwThreadDynamicCodePolicy, sizeof(dwThreadDynamicCodePolicy));
if(!bResult) { goto opt_out; }
if(dwThreadDynamicCodePolicy == THREAD_DYNAMIC_CODE_ALLOW) { goto opt_out_done; }
opt_out:
dwThreadDynamicCodePolicy = THREAD_DYNAMIC_CODE_ALLOW;
SetThreadInformation(GetCurrentThread(), ThreadDynamicCodePolicy, &dwThreadDynamicCodePolicy, sizeof(dwThreadDynamicCodePolicy));
someObject->someField = 1;
opt_out_done:
return someObject;
```
Here is the same with the original assembly:
```
DWORD dwThreadDynamicCodePolicy = 0;
656f5985 8364240c00      and     dword ptr [esp+0Ch],0

BOOL bResult = GetThreadInformation(GetCurrentThread(), ThreadDynamicCodePolicy, &dwThreadDynamicCodePolicy, sizeof(dwThreadDynamicCodePolicy));
656f598a 8d44240c        lea     eax,[esp+0Ch]
656f598e 6a04            push    4
656f5990 50              push    eax
656f5991 6a02            push    2
656f5993 ff1594918565    call    dword ptr [msmpeg2vdec!DllUnregisterServer+0x14dd34 (65859194)] // points to GetCurrentThread
656f5999 8b35e81a8565    mov     esi,dword ptr [msmpeg2vdec!DllUnregisterServer+0x146688 (65851ae8)] // points to GetThreadInformation
656f599f 8bce            mov     ecx,esi
656f59a1 50              push    eax
656f59a2 ff15a8948565    call    dword ptr [msmpeg2vdec!DllUnregisterServer+0x14e048 (658594a8)] // points to a ret instruction
656f59a8 ffd6            call    esi

if(!bResult) { goto opt_out; }
656f59aa 85c0            test    eax,eax
656f59ac 740b            je      msmpeg2vdec!DllRegisterServer+0x24519 (656f59b9)

if(dwThreadDynamicCodePolicy == THREAD_DYNAMIC_CODE_ALLOW) { goto opt_out_done; }
656f59ae 837c240c01      cmp     dword ptr [esp+0Ch],1
656f59b3 0f849845fcff    je      msmpeg2vdec!DllGetClassObject+0x96b1 (656b9f51)

opt_out:
dwThreadDynamicCodePolicy = THREAD_DYNAMIC_CODE_ALLOW;
656f59b9 0fb6c3          movzx   eax,bl
656f59bc 8944240c        mov     dword ptr [esp+0Ch],eax

SetThreadInformation(GetCurrentThread(), ThreadDynamicCodePolicy, &dwThreadDynamicCodePolicy, sizeof(dwThreadDynamicCodePolicy));
656f59c0 8d44240c        lea     eax,[esp+0Ch]
656f59c4 6a04            push    4
656f59c6 50              push    eax
656f59c7 6a02            push    2
656f59c9 ff1594918565    call    dword ptr [msmpeg2vdec!DllUnregisterServer+0x14dd34 (65859194)] // points to GetCurrentThread
656f59cf 8b35ec1a8565    mov     esi,dword ptr [msmpeg2vdec!DllUnregisterServer+0x14668c (65851aec)] // points to SetThreadInformation
656f59d5 8bce            mov     ecx,esi
656f59d7 50              push    eax
656f59d8 ff15a8948565    call    dword ptr [msmpeg2vdec!DllUnregisterServer+0x14e048 (658594a8)] // points to a ret instruction
656f59de ffd6            call    esi

someObject->someField = 1;
656f59e0 c6471001        mov     byte ptr [edi+10h],1
656f59e4 e96845fcff      jmp     msmpeg2vdec!DllGetClassObject+0x96b1 (656b9f51)

opt_out_done:
return someObject;
656b9f51 8bc7            mov     eax,edi
656b9f53 5f              pop     edi
656b9f54 5e              pop     esi
656b9f55 5b              pop     ebx
656b9f56 8be5            mov     esp,ebp
656b9f58 5d              pop     ebp
656b9f59 c20400          ret     4
```
We reach this code after multiple code paths that get the current process' policy for ACG, including one originating from `msmpeg2vdec` (note that this was not the case with 64-bit builds, where none was originating from `msmpeg2vdec`).
The important point I'd like to share about the code above is that the result of the call to `SetThreadInformation` is not checked and thus lost. This code thus seems to assume that opting out of ACG will work and seems unable to adapt to a strict ACG without opt-out. After this code gets executed, a failing call to `VirtualProtect` originating from `msmpeg2vdec` occurs:
```
00 03afd5f0 656b938e KERNELBASE!VirtualProtect+0x39
01 03afd618 656b932e msmpeg2vdec!DllGetClassObject+0x8aee
02 03afd630 656b4b90 msmpeg2vdec!DllGetClassObject+0x8a8e
03 03afd7e8 656aec59 msmpeg2vdec!DllGetClassObject+0x42f0
04 03afd8d8 656ae894 msmpeg2vdec+0x7ec59
05 03afd918 656ae3fa msmpeg2vdec+0x7e894
06 03afd950 656ae373 msmpeg2vdec+0x7e3fa
07 03afd968 75a09d5c msmpeg2vdec+0x7e373
08 03afda68 75a17385 combase!CServerContextActivator::CreateInstance+0x1ec [onecore\com\combase\objact\actvator.cxx @ 881]
...
```
Then the code path that tries to opt out is reached once again (and will, again, fail without noticing at `SetThreadInformation`), and finally we crash by jumping to a portion of the area that `VirtualProtect` was trying to set as executable. In my case the failing call was `VirtualProtect(lpAddress=0x09160000, dwSize=0x00010000, flProtect=0x40=PAGE_EXECUTE_READWRITE)` and I was crashing with `eip=0916da87`.
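For contrast, here is a minimal sketch of what a checked opt-out could look like. It is not the code from `msmpeg2vdec.dll`: `CodegenMode` and `ChooseCodegenMode` are hypothetical names, and the callable stands in for a wrapper around the `SetThreadInformation` call shown above that returns true only when the call reports success. Keeping the Windows call behind a callable lets the decision logic compile anywhere.

```cpp
#include <functional>

// Hypothetical names throughout; this is an illustration, not the DLL's code.
enum class CodegenMode { Dynamic, NoDynamicCode };

// trySetThreadAllow is assumed to wrap the thread opt-out request
// (SetThreadInformation with ThreadDynamicCodePolicy on Windows) and to
// return true only if the system accepted it.
inline CodegenMode ChooseCodegenMode(
    const std::function<bool()>& trySetThreadAllow) {
  if (trySetThreadAllow()) {
    // Opt-out accepted: generating executable code on this thread may proceed.
    return CodegenMode::Dynamic;
  }
  // Opt-out refused (strict ACG): the caller should avoid any later attempt
  // to make pages executable and take a path that needs no dynamic code.
  return CodegenMode::NoDynamicCode;
}
```

The point is only that the failure is observed and acted upon. Whether a path without dynamic code exists in the decoder is something we cannot tell from the disassembly.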
Comment 24•2 years ago
•
|
||
Here is a technical update regarding the progress made so far while trying to understand the problem in bug 1790713.
Summary
I have analyzed the paths taken in kernelbase.dll, ntdll.dll, and ntoskrnl.exe by APIs impacted by ACG, in search of realistic ways to let reporters help us debug the problem. To summarize findings:
- MiArbitraryCodeBlocked is a valuable point of interest in kernel code to study ACG failures. Almost all code paths use this function to check the status of ACG for the current thread.
- RtlSetLastWin32Error (a.k.a. SetLastError) is a valuable point of interest in userland code to study ACG failures. All APIs that fail because of ACG should call this function with ERROR_DYNAMIC_CODE_BLOCKED before they return.
- There is an internal variable called g_dwLastErrorToBreakOn baked into ntdll.dll, which can be used to produce a breakpoint when a specific error code gets passed to RtlSetLastWin32Error.
What happens in the kernel
MiArbitraryCodeBlocked will generate ETW events for two different providers:
- Microsoft-Windows-Security-Mitigations with GUID {FAE10392-F0AF-4AC0-B8FF-9F4D920C3CDF};
- Microsoft-Windows-Kernel-Memory with GUID {D1D93EF7-E1F2-4F45-9943-03D245FE6C00}.
Here is a rough pseudo-code equivalent for MiArbitraryCodeBlocked, leading to different ETW events being produced depending on how ACG is configured (note: this has been updated since first write):
EVENT_DESCRIPTOR MITIGATION_AUDIT_PROHIBIT_DYNAMIC_CODE{
    Id=1, Version=0, Channel=0x10, Level=0, Opcode=0, Task=1, Keyword=0x8000000000000000
};
EVENT_DESCRIPTOR MITIGATION_ENFORCE_PROHIBIT_DYNAMIC_CODE{
    Id=2, Version=0, Channel=0x10, Level=3, Opcode=0, Task=1, Keyword=0x8000000000000000
};
EVENT_DESCRIPTOR KERNEL_MEM_EVENT_ACG{
    Id=8, Version=0, Channel=0x10, Level=4, Opcode=0, Task=6, Keyword=0x8000000000000100
};
// Returns STATUS_DYNAMIC_CODE_BLOCKED if ACG is active for the current thread, STATUS_SUCCESS otherwise
NTSTATUS MiArbitraryCodeBlocked(CurrentProcess)
{
    if (IsDynamicCodeBlocked(CurrentProcess) && !HasOptedOut(GetCurrentThread())) {
        // The current operation is blocked by ACG
        EtwWriteEx(RegHandleFor("Microsoft-Windows-Kernel-Memory"), &KERNEL_MEM_EVENT_ACG, ..., Flags=1, ...); // ACGFlag = 0x80000000
        if (IsDynamicCodeAudited(CurrentProcess)) {
            // Report only one failure to Microsoft-Windows-Security-Mitigations
            EtwWriteEx(RegHandleFor("Microsoft-Windows-Security-Mitigations"), &MITIGATION_ENFORCE_PROHIBIT_DYNAMIC_CODE, ..., Flags=0, ...);
            SetDynamicCodeAudited(CurrentProcess, false);
        }
        return STATUS_DYNAMIC_CODE_BLOCKED;
    }
    if (IsDynamicCodeAudited(CurrentProcess) && !HasOptedOut(GetCurrentThread())) {
        // Using ACG in audit mode, meaning no actual ACG failures will occur, but events are reported
        // Report only one failure to Microsoft-Windows-Security-Mitigations
        EtwWriteEx(RegHandleFor("Microsoft-Windows-Security-Mitigations"), &MITIGATION_AUDIT_PROHIBIT_DYNAMIC_CODE, ..., Flags=0, ...);
        SetDynamicCodeAudited(CurrentProcess, false);
    }
    EtwWriteEx(RegHandleFor("Microsoft-Windows-Kernel-Memory"), &KERNEL_MEM_EVENT_ACG, ..., Flags=1, ...); // ACGFlag = 0
    return STATUS_SUCCESS;
}
(Edited) About IsDynamicCodeAudited in the pseudo-code above: this corresponds to the AuditProhibitDynamicCode bit. This bit can be set individually but it is also automatically set to 1 when setting ProhibitDynamicCode to 1. The important information here is that there will be at most one ETW event reported on Microsoft-Windows-Security-Mitigations unless there is a new call to SetProcessMitigationPolicy!
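The "at most one event" behavior can be checked with a small user-mode model of the pseudo-code. This is a sketch under stated assumptions: ProcessModel, SetPolicy and ModelArbitraryCodeBlocked are hypothetical names that mirror the pseudo-code, not kernel or Windows APIs, and the event counters stand in for the two ETW providers.

```cpp
#include <cstdint>

// Hypothetical user-mode model of the pseudo-code; not real Windows APIs.
constexpr std::uint32_t kStatusSuccess = 0x00000000;
constexpr std::uint32_t kStatusDynamicCodeBlocked = 0xC0000604;

struct ProcessModel {
  bool prohibitDynamicCode = false;       // models the ProhibitDynamicCode bit
  bool auditProhibitDynamicCode = false;  // models the AuditProhibitDynamicCode bit
  int mitigationEvents = 0;    // events on Microsoft-Windows-Security-Mitigations
  int kernelMemoryEvents = 0;  // events on Microsoft-Windows-Kernel-Memory
};

// Models a policy change: enabling ProhibitDynamicCode also sets the audit bit.
inline void SetPolicy(ProcessModel& p, bool prohibit, bool audit) {
  p.prohibitDynamicCode = prohibit;
  p.auditProhibitDynamicCode = audit || prohibit;
}

inline std::uint32_t ModelArbitraryCodeBlocked(ProcessModel& p,
                                               bool threadOptedOut) {
  if (p.prohibitDynamicCode && !threadOptedOut) {
    ++p.kernelMemoryEvents;
    if (p.auditProhibitDynamicCode) {
      ++p.mitigationEvents;                // reported once...
      p.auditProhibitDynamicCode = false;  // ...then the bit is cleared
    }
    return kStatusDynamicCodeBlocked;
  }
  if (p.auditProhibitDynamicCode && !threadOptedOut) {
    ++p.mitigationEvents;  // audit mode: report once, do not block
    p.auditProhibitDynamicCode = false;
  }
  ++p.kernelMemoryEvents;
  return kStatusSuccess;
}
```

Calling ModelArbitraryCodeBlocked twice after a single SetPolicy(p, true, false) blocks both times but bumps mitigationEvents only once, whereas kernelMemoryEvents counts every call. A second SetPolicy re-arms the report.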
What happens in userland
The attached diagram explains what happens when a system call fails due to an ACG failure through the example of the VirtualProtect API. As shown in the diagram, if a syscall should fail under ACG, it will return STATUS_DYNAMIC_CODE_BLOCKED as its result, then the higher-level API that called it will set the last error to ERROR_DYNAMIC_CODE_BLOCKED accordingly. STATUS_DYNAMIC_CODE_BLOCKED is 0xc0000604 and ERROR_DYNAMIC_CODE_BLOCKED is 0x677. The translation is performed by BaseSetLastNTError.
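The propagation described above can be sketched as a portable model. Only the two constants come from this comment; gModelLastError, ModelTranslateStatus, ModelApiCall, FailedBecauseOfAcg and the kUnmodeledError placeholder are hypothetical names used for illustration, and the real translation table is of course much larger.

```cpp
#include <cstdint>

constexpr std::uint32_t kStatusDynamicCodeBlocked = 0xC0000604;
constexpr std::uint32_t kErrorDynamicCodeBlocked = 0x677;
// Placeholder for statuses this model does not cover (hypothetical value).
constexpr std::uint32_t kUnmodeledError = 0xFFFFFFFF;

// Hypothetical stand-in for the per-thread last-error slot.
thread_local std::uint32_t gModelLastError = 0;

// Models the status-to-error translation step for the one pair quoted above.
inline std::uint32_t ModelTranslateStatus(std::uint32_t status) {
  return status == kStatusDynamicCodeBlocked ? kErrorDynamicCodeBlocked
                                             : kUnmodeledError;
}

// Models a higher-level API: on a failing syscall status it records the
// translated error and returns false, otherwise it returns true.
inline bool ModelApiCall(std::uint32_t syscallStatus) {
  if (syscallStatus == 0) {
    return true;
  }
  gModelLastError = ModelTranslateStatus(syscallStatus);
  return false;
}

// What a caller (or a hook on the last-error setter) would test for.
inline bool FailedBecauseOfAcg(bool apiResult) {
  return !apiResult && gModelLastError == kErrorDynamicCodeBlocked;
}
```

This is why the last-error setter is a convenient single choke point: every API failing because of ACG ends up storing the same error value there.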
What I tried
Based on this information, I produced several custom builds as my understanding grew, gradually catching more potential causes of failure and getting closer to the original Nightly build:
- The first custom build requires a delayed ACG. It catches failing calls to VirtualAlloc, VirtualProtect, MapViewOfFile, and SetProcessValidCallTargets using Firefox’s own hooking functions.
- The second custom build requires a delayed ACG. It catches all calls failing with ERROR_DYNAMIC_CODE_BLOCKED, by hooking RtlSetLastWin32Error with Firefox’s own hooking functions.
- The third custom build (CI failures are expected) uses a non-delayed ACG like the original Nightly build. It relies on a new separate pdbtool.exe executable that downloads ntdll.dll PDB symbols. This build catches all calls failing with ERROR_DYNAMIC_CODE_BLOCKED, by setting the ntdll!g_dwLastErrorToBreakOn internal variable to ERROR_DYNAMIC_CODE_BLOCKED at process creation, while the process is still suspended.
I have tested each custom build with a Windows 10 VM from before Creators' Update, removing the version check introduced by [:jrmuizel], which we know should provoke an ACG failure. This resulted in crashes for each individual build.
Going further
In addition to the custom builds, I have tried to get some information from the reporters through the ETW events generated from MiArbitraryCodeBlocked. Unfortunately these logs are not very verbose. So far I have not succeeded in my attempts to have ETW collect the stack traces when the events are written, which would definitely help us understand what's going on. It may be possible to collect those stack traces, but my ETW knowledge is too limited currently.
Comment 25•2 years ago
•
|
||
ESET Endpoint Antivirus was installed on the machine of at least two out of three users who reproduced bug 1790713. ESET seems compatible with a delayed ACG though.
The original Nightly build is expected to produce a non-fatal ACG failure in DllBlocklist_Initialize, while the third custom build I provided is expected to produce no ACG failure. However for the ESET user, both builds produced exactly one ACG failure. This indicates that the additional failure observed by the ESET user was fatal and prevented the original Nightly build's process from even reaching DllBlocklist_Initialize. Otherwise, there should be two failures reported with the original Nightly build, the one likely due to ESET and the one known to be caused by DllBlocklist_Initialize. (Edited: The reasoning here is false. See the updated pseudo-code in comment 24: there will always be a single failure reported on Microsoft-Windows-Security-Mitigations by design choice from Microsoft.)
Since this user moreover had weird events in their ETW traces, I was curious to see if I could reproduce those by adding code in Firefox that does what ESET probably does as well - inject code and run threads in processes at their creation time. I used VirtualAllocEx
, WriteProcessMemory
, and CreateRemoteThread
from the parent process while the child process was still suspended. I injected a ret
instruction and created a thread that executes it. Here are some notes after these tests:
- ACG does not impact allocations originating from an external process through VirtualAllocEx. From the parent, I was able to allocate remote RWX memory in a RDD process with non-delayed ACG.
- This simple test produced the following event: Process '...\firefox.exe' (PID 23488) was blocked from loading the non-Microsoft-signed binary '...\mozglue.dll'. The ESET user had a similar error but for mozavcodec.dll and mozavutil.dll.
- I wasn't able to reproduce the events related to Win32k.
I would thus say that ESET is probably creating remote threads somewhere around process creation, though not using VirtualAllocEx. They are probably using another method which can result in an ACG failure, and not adapting to their operation being denied by the system. (Edited: Given the false reasoning above, it's a bit hasty to conclude this.)
I think it would be worth investigating the root cause of the extra ETW events observed by ESET users as well, since they may be causing problems without us noticing. They are not related to adding ACG in RDD though. Making more progress on getting stack traces with ETW reporting could also help here.
In that regard, I was able to produce the stack traces associated with ACG-related ETW events programmatically by adapting an official example from krabsetw. It's only raw non-symbolized addresses at the moment, but at least it shows that this information can be retrieved.
Comment 26•2 years ago
•
|
||
Regarding the code used to opt out in 32-bit builds, which [:jrmuizel] identified in comment 3 and comment 4 and I analyzed in comment 23, I found the corresponding source code in the open-source though outdated chakra-core repository. I was able to match the code to confirm that this is indeed what's used in msmpeg2vdec.dll and also in MSAudDecMFT.dll, as suggested in bug 1766432 comment 3. See:
- the class declaration for AutoEnableDynamicCodeGen;
- the implementation for AutoEnableDynamicCodeGen.
Here's how it's used: AutoEnableDynamicCodeGen objects are added to the scope of code that would otherwise get rejected by ACG. The constructor for AutoEnableDynamicCodeGen will opt out the current thread of ACG, and the destructor will opt in again. An example use in source code is available here. AutoEnableDynamicCodeGen will fail silently in release builds when executed within a process that doesn't allow threads to opt out, thus leading to the later crashes that we observed.
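The scoped opt-out pattern described above can be sketched with a small portable model. This is not the chakra-core code: FakeProcess, gThreadOptedOut, ScopedDynamicCodeOptOut and CodeGenSucceedsInScope are hypothetical names, and the OS state is simulated with plain booleans. As noted in comment 3 and comment 4, the real code queries GetProcessMitigationPolicy and then calls SetThreadInformation if AllowThreadOptOut permits it.

```cpp
// Toy stand-in for the process-wide dynamic code policy (assumption:
// two booleans are enough to model the cases discussed in this bug).
struct FakeProcess {
  bool acgEnabled;         // models ProhibitDynamicCode
  bool allowThreadOptOut;  // models AllowThreadOptOut
};

// Simulated per-thread opt-out state.
thread_local bool gThreadOptedOut = false;

// Whether the current thread could create executable memory right now.
bool CanGenerateCode(const FakeProcess& aProcess) {
  return !aProcess.acgEnabled || gThreadOptedOut;
}

// RAII guard: opts the current thread out of ACG for the guard's lifetime.
class ScopedDynamicCodeOptOut {
 public:
  explicit ScopedDynamicCodeOptOut(const FakeProcess& aProcess) {
    // Mirrors the silent failure: when opting out is not allowed, nothing
    // happens and no error is surfaced to the caller.
    if (aProcess.acgEnabled && aProcess.allowThreadOptOut &&
        !gThreadOptedOut) {
      gThreadOptedOut = true;
      mDidOptOut = true;
    }
  }
  ~ScopedDynamicCodeOptOut() {
    // Opt back in only if this guard is the one that opted out.
    if (mDidOptOut) {
      gThreadOptedOut = false;
    }
  }
  ScopedDynamicCodeOptOut(const ScopedDynamicCodeOptOut&) = delete;
  ScopedDynamicCodeOptOut& operator=(const ScopedDynamicCodeOptOut&) = delete;

 private:
  bool mDidOptOut = false;
};

// Illustration helper: would code generation succeed inside a guarded scope?
bool CodeGenSucceedsInScope(const FakeProcess& aProcess) {
  ScopedDynamicCodeOptOut optOut(aProcess);
  return CanGenerateCode(aProcess);
}
```

In this model, a process with ACG enabled and opt-out disallowed still enters the guarded scope, but the code generation that follows is rejected, which is the shape of the later crashes.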
In recent 32-bit builds, compared to the outdated open-source implementation, the code has slightly evolved to report an error when we try to opt out from the context of a process that doesn't allow opting out. However, the code itself will continue to fail silently and neither throw an exception nor force a crash. As was mentioned in comment 5, starting with Windows 10 Creators' Update, the 64-bit versions of msmpeg2vdec.dll and MSAudDecMFT.dll have shifted to code that no longer requires opting out from ACG.
We should check exactly which versions of Windows have the opt-out code in these DLLs and use ACG with opt out for these versions. That should have started around the time that the code was introduced in chakra-core.
Comment 27•2 years ago
|
||
After analyzing these 2 DLLs on various versions of Windows, here is the shared compatibility chart for msmpeg2vdec.dll and MSAudDecMFT.dll:
Windows Version | 32-bit | 64-bit |
---|---|---|
8.1 and before | INC | INC |
10 1511 (November Update) or earlier | INC | INC |
10 1607 (Anniversary Update) | OPT | OPT |
10 1703 (Creators' Update) or later | OPT | ACG |
where:
- INC: Incompatible with any variant of ACG;
- OPT: Compatible with ACG if opting out is enabled;
- ACG: Fully compatible with any variant of ACG.
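The chart can be turned into a small selection function. The following is only a sketch and not the utility function that landed: AcgVariant and ChooseAcgVariant are hypothetical names, and keying the decision on the Windows build number (14393 for 1607, 15063 for 1703) is an assumption made for illustration.

```cpp
#include <cstdint>

// The three outcomes from the chart legend.
enum class AcgVariant {
  None,        // INC: incompatible with any variant of ACG
  WithOptOut,  // OPT: compatible with ACG if opting out is enabled
  Full,        // ACG: fully compatible with any variant of ACG
};

// Windows 10 build numbers for the two cut-off releases in the chart.
const uint32_t kBuildWin10_1607 = 14393;  // Anniversary Update
const uint32_t kBuildWin10_1703 = 15063;  // Creators' Update

// Returns the strongest ACG variant that the system media libraries
// tolerate, per the chart. Builds older than 1607 (including Windows 8.1
// and earlier, whose build numbers are lower) map to None.
AcgVariant ChooseAcgVariant(uint32_t aBuildNumber, bool aIs64Bit) {
  if (aBuildNumber < kBuildWin10_1607) {
    return AcgVariant::None;
  }
  if (aBuildNumber < kBuildWin10_1703) {
    return AcgVariant::WithOptOut;
  }
  // 1703 or later: only the 64-bit DLLs dropped the need to opt out.
  return aIs64Bit ? AcgVariant::Full : AcgVariant::WithOptOut;
}
```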
Comment 28•2 years ago
•
|
||
I have produced a first version of mitimon, a custom tool which traces ETW events related to security mitigations.
The good
- Catches ACG violations reported on both Microsoft-Windows-Security-Mitigations and Microsoft-Windows-Kernel-Memory.
- Produces symbolized kernel and userland stack traces for these events, at least on Windows 11.
Current problems
The tool can already be used with its full potential to catch ACG failures on Windows 11, and provide some results on Windows 10. It can also be used to track the first Win32k failure in a process, but not the next ones. Here is why:
- On both Windows 11 and Windows 10, only one ACG violation event at most is reported per process on Microsoft-Windows-Security-Mitigations by design choice from Microsoft. See the updated pseudo-code for MiArbitraryCodeBlocked in comment 24. Things are similar for Win32k violations. This is where Microsoft-Windows-Kernel-Memory saves us, though only for ACG, because all ACG violation events are reported on it.
- However, on Windows 10, the stack traces seem to be missing the userland part for the ACG violations reported on Microsoft-Windows-Kernel-Memory. I don't have this problem on my Windows 11 machine, so this looks like a bug that Microsoft would have solved in Windows 11 without backporting the patch to Windows 10. From my tests, the common root cause for events that don't get their user stack could be that they report events using EtwWriteEx with the undocumented Flags parameter set to 1, compared to 0 for regular events that do get their user stack. This is kernel code so we cannot do much ourselves.
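The reporting behavior described in the first point can be illustrated with a toy model. This is not the kernel pseudo-code from comment 24: ProcessState, EventLog and ReportAcgViolation are hypothetical names, and the two providers are modeled as simple vectors of strings.

```cpp
#include <string>
#include <vector>

// Per-process flag deciding whether the next violation still gets reported
// on Microsoft-Windows-Security-Mitigations (modeled, not actual kernel state).
struct ProcessState {
  bool mitigationsEventAlreadyReported = false;
};

// Stand-ins for the two ETW providers discussed above.
struct EventLog {
  std::vector<std::string> securityMitigations;
  std::vector<std::string> kernelMemory;
};

// Every violation goes to Kernel-Memory; only the first one per process
// goes to Security-Mitigations.
void ReportAcgViolation(ProcessState& aProcess, EventLog& aLog,
                        const std::string& aDescription) {
  aLog.kernelMemory.push_back(aDescription);
  if (!aProcess.mitigationsEventAlreadyReported) {
    aProcess.mitigationsEventAlreadyReported = true;
    aLog.securityMitigations.push_back(aDescription);
  }
}
```

With two violations in the same process, the model records two Kernel-Memory events and a single Security-Mitigations event, which is why a second failure cannot be inferred from Security-Mitigations alone.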
Going further
I tried two potential ways to bypass the current problem for Windows 10 and ACG:
- When we detect an ACG failure, we can execute code within the process in which the failure occurred. We make that process call SetProcessMitigationPolicy again. That resets the boolean used in the kernel to decide whether the next ACG failure should be reported on Microsoft-Windows-Security-Mitigations. Beyond the fact that this technique is intrusive, the main problem with it is that we will miss ACG failures that may occur in the lapse of time between the current failure and the effective call to SetProcessMitigationPolicy. Indeed ETW isn't real-time, and the audited process keeps running while we are analyzing its events. In my tests, I wasn't able to catch successive failures with this technique without adding intentional Sleep delays between them, despite using a dedicated high-priority thread to parse the events.
- We can use ETW to capture stack traces at system call entry, and try to match these stack traces to the ACG violation events reported on Microsoft-Windows-Kernel-Memory to guess their userland stack. Capturing stacks at system call entry requires additional code, but it seems to work. This should provide the best possible results for ACG violations once integrated into the tool.
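One way the second technique could pair events is sketched below. The matching rule is an assumption and not necessarily what mitimon does: for each violation, take the most recent system call entry on the same thread that is not later than the violation. SyscallEntry, AcgViolation and FindMatchingSyscall are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// A system call entry event with the user stack captured at that point.
struct SyscallEntry {
  uint32_t threadId;
  uint64_t timestamp;
  std::vector<uint64_t> userStack;  // raw return addresses
};

// An ACG violation event that arrived without its user stack.
struct AcgViolation {
  uint32_t threadId;
  uint64_t timestamp;
};

// Returns the latest syscall entry on the violating thread that happened at
// or before the violation, or nullptr if there is none. Its userStack is the
// best guess for the violation's missing userland stack.
const SyscallEntry* FindMatchingSyscall(
    const std::vector<SyscallEntry>& aEntries, const AcgViolation& aViolation) {
  const SyscallEntry* best = nullptr;
  for (const SyscallEntry& entry : aEntries) {
    if (entry.threadId != aViolation.threadId ||
        entry.timestamp > aViolation.timestamp) {
      continue;
    }
    if (!best || entry.timestamp > best->timestamp) {
      best = &entry;
    }
  }
  return best;
}
```

The returned pointer refers into the input vector, so it is only valid while that vector is alive and unmodified.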
Regarding tracking Win32k violations, here are ideas regarding what's possible or not:
- The 1st technique described for ACG should work for Win32k with the same limitations as those described for ACG.
- We cannot use the 2nd technique because (a) there is no event that gets unconditionally reported upon Win32k failures (not just the first failure), and (b) system call entry/exit events are not reported for blocked Win32k system calls.
- Because system call entry/exit events are reported for non-blocked Win32k system calls, if we disable the mitigation in the target process, we can use ETW to track these events and detect Win32k system calls originating from the process. This doesn't exactly test what happens when the mitigation is active, but it can help identify what Win32k system calls are still being used by the process.
Comment 29•2 years ago
•
|
||
While using mitimon on my Windows 11 machine, I once caught the following ACG failures in the RDD process with non-delayed ACG (not always though?):
0x00007ffc3302f774 ntdll+0x9f774 ntdll!NtProtectVirtualMemory+0x14
0x00007ffc2a83566b apphelp+0x566b apphelp!SepIatPatch+0x163
0x00007ffc2a83e4b6 apphelp+0xe4b6 apphelp!SepRouterHookImportedApi+0x2b6
0x00007ffc2a83de99 apphelp+0xde99 apphelp!SepRouterHookIAT+0x319
0x00007ffc2a83dac7 apphelp+0xdac7 apphelp!SE_DllLoaded+0x157
0x00007ffc33009997 ntdll+0x79997 ntdll!LdrpSendShimEngineInitialNotifications+0x6f
0x00007ffc330098cc ntdll+0x798cc ntdll!LdrpLoadShimEngine+0x124
0x00007ffc33009271 ntdll+0x79271 ntdll!LdrpInitShimEngine+0x159
0x00007ffc3306612c ntdll+0xd612c ntdll!LdrpInitializeProcess+0x1b54
0x00007ffc33059b1e ntdll+0xc9b1e ntdll!_LdrpInitialize+0x55bf2
0x00007ffc33003ef3 ntdll+0x73ef3 ntdll!LdrpInitializeInternal+0x6b
0x00007ffc33003e1e ntdll+0x73e1e ntdll!LdrInitializeThunk+0xe
0x00007ffc3302f774 ntdll+0x9f774 ntdll!NtProtectVirtualMemory+0x14
0x00007ffc2a83566b apphelp+0x566b apphelp!SepIatPatch+0x163
0x00007ffc2a83e4b6 apphelp+0xe4b6 apphelp!SepRouterHookImportedApi+0x2b6
0x00007ffc2a83de99 apphelp+0xde99 apphelp!SepRouterHookIAT+0x319
0x00007ffc2a83dac7 apphelp+0xdac7 apphelp!SE_DllLoaded+0x157
0x00007ffc32fbf04c ntdll+0x2f04c ntdll!LdrpSendPostSnapNotifications+0x12c
0x00007ffc32fbeeef ntdll+0x2eeef ntdll!LdrpNotifyLoadOfGraph+0x4f
0x00007ffc32fbef09 ntdll+0x2ef09 ntdll!LdrpNotifyLoadOfGraph+0x69
0x00007ffc32fbdce1 ntdll+0x2dce1 ntdll!LdrpPrepareModuleForExecution+0x79
0x00007ffc32fb9040 ntdll+0x29040 ntdll!LdrpLoadDllInternal+0x20c
0x00007ffc32fa932c ntdll+0x1932c ntdll!LdrpLoadDll+0xb0
0x00007ffc32fba95a ntdll+0x2a95a ntdll!LdrLoadDll+0xfa
0x00007ff7d8439b4e firefox+0x29b4e firefox!mozilla::freestanding::patched_LdrLoadDll+0x1ae /builds/worker/checkouts/gecko/browser/app/winlauncher/freestanding/DllBlocklist.cpp:365+0x19
0x00007ffc307aba62 KernelBase+0x2ba62 KernelBase!LoadLibraryExW+0x172
0x00007ffc307a7c41 KernelBase+0x27c41 KernelBase!LoadLibraryExA+0x31
0x00007ffc228712fb igd10iumd64+0x12fb
0x00007ffc22871526 igd10iumd64+0x1526
0x00007ffc295b7390 d3d11+0x17390 d3d11!NDXGI::CUMDAdapter::OpenAdapter10_2+0xa0
0x00007ffc295b658f d3d11+0x1658f d3d11!CCreateDeviceCache::CUMDAdapterCache::Load+0x257
0x00007ffc295b608c d3d11+0x1608c d3d11!CCreateDeviceCache::CAdapterCache::ResolveUMDAndVersion+0x180
0x00007ffc295b590b d3d11+0x1590b d3d11!D3D11CoreCreateDevice+0x45b
0x00007ffc295b41e0 d3d11+0x141e0 d3d11!D3D11CreateDeviceAndSwapChainImpl+0x3f0
0x00007ffc295df847 d3d11+0x3f847 d3d11!D3D11CreateDeviceAndSwapChain+0xf7
0x00007ffc295df62c d3d11+0x3f62c d3d11!?D3D11CreateDeviceImpl@@YAJPEAUIDXGIAdapter@@W4D3D_DRIVER_TYPE@@PEAUHINSTANCE__@@IPEBW4D3D_FEATURE_LEVEL@@IIPEAPEAUID3D11Device@@PEAW44@PEAPEAUID3D11DeviceContext@@@Z+0x5c
0x00007ffc295df71e d3d11+0x3f71e d3d11!D3D11CreateDevice+0xde
0x00007ffb7ab3039d xul+0x280039d xul!mozilla::gfx::DeviceManagerDx::CreateDevice+0xad /builds/worker/checkouts/gecko/gfx/thebes/DeviceManagerDx.cpp:753+0x71
0x00007ffb7ab30c26 xul+0x2800c26 xul!mozilla::gfx::DeviceManagerDx::CreateContentDevice+0xe6 /builds/worker/checkouts/gecko/gfx/thebes/DeviceManagerDx.cpp:862+0x25
0x00007ffb7ab30b1f xul+0x2800b1f xul!mozilla::gfx::DeviceManagerDx::CreateContentDevicesLocked+0x1f /builds/worker/checkouts/gecko/gfx/thebes/DeviceManagerDx.cpp:509+0x8
0x00007ffb7ab30aee xul+0x2800aee xul!mozilla::gfx::DeviceManagerDx::CreateContentDevices+0x1e /builds/worker/checkouts/gecko/gfx/thebes/DeviceManagerDx.cpp:495+0x8
0x00007ffb7b8e3701 xul+0x35b3701 xul!mozilla::RDDParent::RecvInitVideoBridge+0x91 /builds/worker/checkouts/gecko/dom/media/ipc/RDDParent.cpp:213+0x8
0x00007ffb79112055 xul+0xde2055 xul!mozilla::PRDDParent::OnMessageReceived+0x755 /builds/worker/workspace/obj-build/ipc/ipdl/PRDDParent.cpp:596+0x28
0x00007ffb79c235b6 xul+0x18f35b6 xul!mozilla::ipc::MessageChannel::MessageTask::Run+0x426 /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1579+0x38c
0x00007ffb79e41150 xul+0x1b11150 xul!mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal+0xf10 /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:851+0x2e2
0x00007ffb79b8c072 xul+0x185c072 xul!nsThread::ProcessNextEvent+0xeb2 /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1205+0x3d
0x00007ffb79e6ac94 xul+0x1b3ac94 xul!mozilla::ipc::MessagePump::Run+0xc4 /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:85+0x29
0x00007ffb78d5092f xul+0xa2092f xul!MessageLoop::RunHandler+0x2f /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:375+0x0
0x00007ffb783538fe xul+0x238fe xul!MessageLoop::Run+0x4e /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:357+0x0
0x00007ffb784c25f8 xul+0x1925f8 xul!nsBaseAppShell::Run+0x28 /builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp:152+0x0
0x00007ffb784c13a8 xul+0x1913a8 xul!nsAppShell::Run+0x38 /builds/worker/checkouts/gecko/widget/windows/nsAppShell.cpp:614+0x8
0x00007ffb7968ceeb xul+0x135ceeb xul!XRE_RunAppShell+0x4b /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:880+0xd
0x00007ffb78d5092f xul+0xa2092f xul!MessageLoop::RunHandler+0x2f /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:375+0x0
0x00007ffb783538fe xul+0x238fe xul!MessageLoop::Run+0x4e /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:357+0x0
0x00007ffb7968c836 xul+0x135c836 xul!XRE_InitChildProcess+0x5f6 /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:743+0x0
0x00007ff7d84309f6 firefox+0x209f6 firefox!wmain+0x5b6 /builds/worker/checkouts/gecko/toolkit/xre/nsWindowsWMain.cpp:167+0x31a
0x00007ff7d8440398 firefox+0x30398 firefox!__scrt_common_main_seh+0x10c d:\agent\_work\2\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288+0x22
0x00007ffc31dc244d kernel32+0x1244d kernel32!BaseThreadInitThunk+0x1d
0x00007ffc32fedf78 ntdll+0x5df78 ntdll!RtlUserThreadStart+0x28
These failures originate from Windows internal code, part of the Shim Engine. The existence of the Shim Engine within Windows means that even loading a DLL can be incompatible with non-delayed ACG. This is even more evidence that we should use the ACG mitigation as delayed. The other reasons were:
- full incompatibility with ASAN initialization;
- partial incompatibility with our own DLL blocklist code;
- likely full incompatibility with ESET antivirus.
Comment 30•2 years ago
|
||
Comment 31•2 years ago
|
||
Depends on D159178
Comment 32•2 years ago
|
||
Depends on D159179
Comment 33•2 years ago
•
|
||
The compatibility chart from comment 27 explains the remaining problems in bug 1766432 (x86) and bug 1773005 (versions before Creators' Update). The changes I propose in this changeset fix those as well, on Nightly.
Comment 34•2 years ago
|
||
We will wait a bit to push these changes as today is Soft Code Freeze.
Updated•2 years ago
|
Assignee | ||
Comment 35•2 years ago
|
||
We should be able to land now.
Comment 36•1 year ago
|
||
Pushed by yjuglaret@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0725a94f55ec Define utility function for choosing an ACG variant compatible with system media libraries. r=bobowen https://hg.mozilla.org/integration/autoland/rev/09170d6279c4 Use ACG-with-opt-out for 32-bit builds and Windows 10 1607 in audio decoder on Nightly. r=bobowen https://hg.mozilla.org/integration/autoland/rev/084256b6f69e Enable best ACG variant compatible with system media libraries in RDD on Nightly. r=bobowen
Comment 37•1 year ago
•
|
||
I have tested the changes with the critical Windows 10 versions from comment 27 (1511, 1607, 1703) and all seems good. I validated that I can play AAC, OGG, and YouTube videos even with MinGW builds. I was worried about some test failures in browser_utility_audioDecodeCrash.js, browser_utility_audio_shutdown.js and browser_utility_multipleAudio.js with MinGW builds, but they do not originate from these changes and were already failing on central. These tests are not normally associated with MinGW targets and I added them myself, so it may be expected that they don't work.
Comment 38•1 year ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0725a94f55ec
https://hg.mozilla.org/mozilla-central/rev/09170d6279c4
https://hg.mozilla.org/mozilla-central/rev/084256b6f69e