Closed Bug 1389758 Opened 7 years ago Closed 3 years ago

Crash in mozilla::layers::MLGDeviceD3D11::MaybeLockTexture

Categories

(Core :: Graphics, defect, P3)

56 Branch
All
Windows
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- fix-optional
firefox57 --- fix-optional

People

(Reporter: philipp, Assigned: vliu)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

This bug was filed from the Socorro interface and is 
report bp-768636fd-4974-4fac-aa03-8932a0170811.
=============================================================
Crashing Thread (21), Name: Compositor
Frame 	Module 	Signature 	Source
0 	xul.dll 	CrashStatsLogForwarder::CrashAction(mozilla::gfx::LogReason) 	gfx/thebes/gfxPlatform.cpp:416
1 	xul.dll 	mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::WriteLog(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) 	obj-firefox/dist/include/mozilla/gfx/Logging.h:524
2 	xul.dll 	mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::Flush() 	obj-firefox/dist/include/mozilla/gfx/Logging.h:281
3 	xul.dll 	mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::~Log<1, mozilla::gfx::CriticalLogger>() 	obj-firefox/dist/include/mozilla/gfx/Logging.h:273
4 	xul.dll 	mozilla::layers::MLGDeviceD3D11::MaybeLockTexture(ID3D11Texture2D*) 	gfx/layers/d3d11/MLGDeviceD3D11.cpp:1713
5 	xul.dll 	mozilla::layers::MLGDeviceD3D11::ResolveTextureSourceForShader(mozilla::layers::TextureSource*) 	gfx/layers/d3d11/MLGDeviceD3D11.cpp:1680
6 	xul.dll 	mozilla::layers::MLGDeviceD3D11::SetPSTextures(unsigned int, unsigned int, mozilla::layers::TextureSource* const*) 	gfx/layers/d3d11/MLGDeviceD3D11.cpp:1660
7 	xul.dll 	mozilla::layers::SingleTexturePass::SetupPipeline() 	gfx/layers/mlgpu/RenderPassMLGPU.cpp:624
8 	xul.dll 	mozilla::layers::ShaderRenderPass::ExecuteRendering() 	gfx/layers/mlgpu/RenderPassMLGPU.cpp:314
9 	xul.dll 	mozilla::layers::RenderViewMLGPU::ExecutePass(mozilla::layers::RenderPassMLGPU*) 	gfx/layers/mlgpu/RenderViewMLGPU.cpp:474
10 	xul.dll 	mozilla::layers::RenderViewMLGPU::ExecuteRendering() 	gfx/layers/mlgpu/RenderViewMLGPU.cpp:427
11 	xul.dll 	mozilla::layers::FrameBuilder::Render() 	gfx/layers/mlgpu/FrameBuilder.cpp:108
12 	xul.dll 	mozilla::layers::LayerManagerMLGPU::RenderLayers() 	gfx/layers/mlgpu/LayerManagerMLGPU.cpp:378
13 	xul.dll 	mozilla::layers::LayerManagerMLGPU::Composite() 	gfx/layers/mlgpu/LayerManagerMLGPU.cpp:321
14 	xul.dll 	mozilla::layers::LayerManagerMLGPU::EndTransaction(mozilla::TimeStamp const&, mozilla::layers::LayerManager::EndTransactionFlags) 	gfx/layers/mlgpu/LayerManagerMLGPU.cpp:280
15 	xul.dll 	mozilla::layers::CompositorBridgeParent::CompositeToTarget(mozilla::gfx::DrawTarget*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const*) 	gfx/layers/ipc/CompositorBridgeParent.cpp:1041
16 	xul.dll 	mozilla::layers::CompositorVsyncScheduler::Composite(mozilla::TimeStamp) 	gfx/layers/ipc/CompositorVsyncScheduler.cpp:262
17 	xul.dll 	mozilla::detail::RunnableMethodImpl<mozilla::layers::CompositorVsyncScheduler* const, void ( mozilla::layers::CompositorVsyncScheduler::*)(mozilla::TimeStamp), 1, 1, mozilla::TimeStamp>::Run() 	obj-firefox/dist/include/nsThreadUtils.h:1172
18 	xul.dll 	MessageLoop::RunTask(already_AddRefed<nsIRunnable>) 	ipc/chromium/src/base/message_loop.cc:452
19 	xul.dll 	MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) 	ipc/chromium/src/base/message_loop.cc:460
20 	xul.dll 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc:535
21 	xul.dll 	base::MessagePumpForUI::DoRunLoop() 	ipc/chromium/src/base/message_pump_win.cc:210
22 	xul.dll 	base::MessagePumpWin::Run(base::MessagePump::Delegate*) 	ipc/chromium/src/base/message_pump_win.h:80
23 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:319
24 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:299
25 	xul.dll 	base::Thread::ThreadMain() 	ipc/chromium/src/base/thread.cc:181
26 	xul.dll 	`anonymous namespace'::ThreadFunc 	ipc/chromium/src/base/platform_thread_win.cc:28
27 	kernel32.dll 	BaseThreadInitThunk 	
28 	ntdll.dll 	RtlUserThreadStart 	
29 	kernel32.dll 	BasepReportFault 	
30 	kernel32.dll 	BasepReportFault

This is a low-volume crash, currently on nightly only, which looks related to Advanced Layers landing in 56.
vincent, could you have a look at this bug?
Flags: needinfo?(vliu)
The crash report contains some log entries related to a device reset.

|[4]CP+[GFX1-]: [D3D11] 2 CreateTexture2D failure Size: Size(914,593)texture11: false Code: 0x887a0005 (t=952.592) 
|[5]CP+[GFX1-]: (gfxWindowsPlatform) Detected device reset: 3 (t=952.819) 
|[6][GFX1-]: (nsWindow) Detected device reset: 7 (t=952.846) 
|[7][GFX1 2]: D3D lock mutex timeout (t=970.69)
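
For reference, the code 0x887a0005 in entry [4] is the HRESULT that the Windows SDK names DXGI_ERROR_DEVICE_REMOVED, which is consistent with the device reset messages that follow it. Below is a minimal sketch of decoding such codes. DescribeDxgiError is a made-up helper name for illustration, not a Gecko function, and the constants are copied from winerror.h so the snippet builds without Windows headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// HRESULT values as defined in the Windows SDK (winerror.h), repeated here
// so the sketch compiles on any platform.
constexpr std::uint32_t kDxgiErrorDeviceRemoved = 0x887A0005;
constexpr std::uint32_t kDxgiErrorDeviceHung = 0x887A0006;
constexpr std::uint32_t kDxgiErrorDeviceReset = 0x887A0007;
constexpr std::uint32_t kDxgiErrorDriverInternalError = 0x887A0020;

// Hypothetical helper (not part of Gecko): map a DXGI device error code
// to its SDK name, or "unknown" for anything not listed.
inline const char* DescribeDxgiError(std::uint32_t hr) {
  switch (hr) {
    case kDxgiErrorDeviceRemoved:
      return "DXGI_ERROR_DEVICE_REMOVED";
    case kDxgiErrorDeviceHung:
      return "DXGI_ERROR_DEVICE_HUNG";
    case kDxgiErrorDeviceReset:
      return "DXGI_ERROR_DEVICE_RESET";
    case kDxgiErrorDriverInternalError:
      return "DXGI_ERROR_DRIVER_INTERNAL_ERROR";
    default:
      return "unknown";
  }
}

// Usage: decode the code reported by the CreateTexture2D failure above.
inline void ExampleDecode() {
  assert(std::strcmp(DescribeDxgiError(0x887a0005),
                     "DXGI_ERROR_DEVICE_REMOVED") == 0);
}
```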
(In reply to Peter Chang[:pchang] from comment #1)
> vincent, could you have a look at this bug?

ok. I will take a look into it.
Assignee: nobody → vliu
Flags: needinfo?(vliu)
Below is what I have found so far.

1. There are 10 crash reports with this signature in total. 8 are on Windows 7 and the other 2 are on Windows 10. On Windows 10 the crashes were in the GPU process; on Windows 7 they were in the UI process. e10sEnabled was set to true in all 10 reports.
2. All reports hit [1]. Judging from the string "D3D lock mutex timeout", locking the mutex apparently failed with a timeout. Most AdapterDriverVersion values were 22.21.13.8233 or 22.21.13.8253, all on NVIDIA GeForce GT cards.
3. In the minidump, mNextWaitForPresentQuery in MLGDeviceD3D11 was NULL when the debugger stopped at [1]. I am not sure whether this is related to the timeout; I need to study it further.

    mNextWaitForPresentQuery	{mRawPtr=0x0000000000000000 <NULL> }	RefPtr<ID3D11Query>

4. I also found a similar bug, bug 1239188. At that time we didn't have the Advanced Layers implementation, so it crashed in LockD3DTexture(). One thing I noticed is that when Gecko checks the return value of mutex->AcquireSync(0, 10000), it calls gfxDevCrash on a timeout, but for other failures, such as an abandoned mutex, the current design uses gfxCriticalNote() to collect information. Do we actually need to crash at this point? Maybe we can add proper error handling here so that we can focus on the actual problem.

[1]: https://searchfox.org/mozilla-central/source/gfx/layers/d3d11/MLGDeviceD3D11.cpp#1722
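
To make the question in item 4 concrete, here is a rough sketch of treating a timeout like the other lock failures instead of crashing. It is not the Gecko code: LockStatus, LockAction, ClassifyAcquireSync and ActionFor are hypothetical names, and the "skip and report" policy is only one assumed option. The numeric values are the documented IDXGIKeyedMutex::AcquireSync results (S_OK, WAIT_ABANDONED, WAIT_TIMEOUT), repeated so the snippet builds without Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Return values of IDXGIKeyedMutex::AcquireSync, repeated from the Windows
// SDK so the sketch compiles without Windows headers.
constexpr std::int32_t kSOk = 0x0;             // S_OK
constexpr std::int32_t kWaitAbandoned = 0x80;  // WAIT_ABANDONED
constexpr std::int32_t kWaitTimeout = 0x102;   // WAIT_TIMEOUT

// Hypothetical types, for illustration only.
enum class LockStatus { Locked, Timeout, Abandoned, OtherFailure };
enum class LockAction { Draw, SkipAndReport };

// Turn the raw AcquireSync result into a status the caller can branch on.
inline LockStatus ClassifyAcquireSync(std::int32_t hr) {
  if (hr == kSOk) return LockStatus::Locked;
  if (hr == kWaitTimeout) return LockStatus::Timeout;
  if (hr == kWaitAbandoned) return LockStatus::Abandoned;
  return LockStatus::OtherFailure;
}

// Assumed policy: every failure, including a timeout, is reported and the
// texture is skipped, rather than a timeout alone triggering a crash.
inline LockAction ActionFor(LockStatus status) {
  switch (status) {
    case LockStatus::Locked:
      return LockAction::Draw;
    case LockStatus::Timeout:
    case LockStatus::Abandoned:
    case LockStatus::OtherFailure:
      return LockAction::SkipAndReport;
  }
  assert(false && "unhandled LockStatus");
  return LockAction::SkipAndReport;
}
```

With this shape the caller has to be able to cope with a texture that could not be locked, which is where the rest of the work would be.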
(In reply to Vincent Liu[:vliu] from comment #4)
> The below information are what I currently found.
> 
> 1. There are totally 10 crash signatures of this type. 8 are on Windows 7
> while the other 2 are on Windows 10. If the crash happens on Windows 10, the
> crashes were in GPU process. In the other side, if it happens on windows 7,
> the crashes were in UI process. The status of e10sEnabled were all set to
> true in these 10 signatures.
> 2. All signatures were hit in [1]. From the string of "D3D lock mutex
> timeout", it apparently lock mutex fail with timeout. Most
> AdapterDriverVersion were 22.21.13.8233 or 22.21.13.8253, which are all
> NVIDIA GeForce GT card.
> 3. From mini-dump, I saw mNextWaitForPresentQuery in MLGDeviceD3D11 became
> NULL when the break point stoped in [1]. For this, I am not sure if it is
> relatives to timeout happens. Maybe I need to have more study on this.
> 
>     mNextWaitForPresentQuery	{mRawPtr=0x0000000000000000 <NULL> }
> RefPtr<ID3D11Query>
> 
> 4. I also saw another similar bug 1239188. At that time we didn't have
> advanced layer implementation so it crashed in LockD3DTexture(). One thing
> it noticed me that when gecko checked the return value of
> mutex->AcquireSync(0, 10000), it called gfxDevCrash for timeout fail. But
> for other fails like abandoned or others, current design used
> gfxCriticalNote() to collect information. One question is do we actually
> need to crash at this point? Maybe we can put a proper error handling here
> to focus on actual problem.  
> 
Error handling for the locking failure would be great, but you would have to handle it starting from SetupPipeline() in ShaderRenderPass::ExecuteRendering().

[1]https://searchfox.org/mozilla-central/source/gfx/layers/mlgpu/RenderPassMLGPU.cpp#314

> [1]: https://searchfox.org/mozilla-central/source/gfx/layers/d3d11/MLGDeviceD3D11.cpp#1722
(In reply to Peter Chang[:pchang] from comment #5)

> The 
> error handling about locking failure would be great but you have to deal
> with this from SetupPipeline in ShaderRenderPass::ExecuteRendering().
> 
> [1]https://searchfox.org/mozilla-central/source/gfx/layers/mlgpu/RenderPassMLGPU.cpp#314
> 

I am not sure that SetupPipeline() is the proper place to start the error handling.
ShaderRenderPass is the base class of TexturedRenderPass and SolidColorPass. SolidColorPass does not seem to be a texture-based render pass, so it won't hit the texture lock problem [1]. Please correct me if I got anything wrong.

[1]: http://searchfox.org/mozilla-central/source/gfx/layers/mlgpu/RenderPassMLGPU.cpp#434
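
To illustrate the trade-off being discussed, here is a simplified sketch of what handling the failure from SetupPipeline() could look like. The class names mirror the ones mentioned above, but these are stand-ins, not the real Gecko classes: the bool return value of SetupPipeline(), the lockSucceeds flag and RenderAll() are assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for the render pass classes; NOT the Gecko code.
class ShaderRenderPass {
 public:
  virtual ~ShaderRenderPass() = default;

  // Returns true if the pass was drawn, false if it was skipped because
  // pipeline setup failed (for example, a texture could not be locked).
  bool ExecuteRendering() { return SetupPipeline(); }

 protected:
  // Assumed signature: reports success so the caller can skip the pass.
  virtual bool SetupPipeline() = 0;
};

// No textures are bound, so there is nothing to lock and setup cannot fail
// in this sketch.
class SolidColorPass final : public ShaderRenderPass {
 protected:
  bool SetupPipeline() override { return true; }
};

// Texture-based pass: setup fails when the texture lock fails.
class TexturedRenderPass final : public ShaderRenderPass {
 public:
  explicit TexturedRenderPass(bool lockSucceeds)
      : mLockSucceeds(lockSucceeds) {}

 protected:
  bool SetupPipeline() override { return mLockSucceeds; }

 private:
  bool mLockSucceeds;  // stands in for the result of locking the texture
};

// Hypothetical driver: run every pass and count how many were drawn.
inline int RenderAll(const std::vector<ShaderRenderPass*>& passes) {
  int drawn = 0;
  for (ShaderRenderPass* pass : passes) {
    assert(pass != nullptr);
    if (pass->ExecuteRendering()) ++drawn;
  }
  return drawn;
}
```

In this shape the check lives in the shared base class even though only the texture-based pass can actually fail, which is the concern raised above.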
Blocks: 1388995
Whiteboard: [gfx-noted]
I will lower the priority since the crash rate has been low for the past week. Please leave a comment if anyone has a different opinion. Thanks.
Severity: critical → normal
Priority: -- → P3

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME