Closed Bug 1408514 Opened 4 years ago Closed 2 years ago

Crash in mozilla::layers::CompositorManagerChild::ProcessingError

Categories

(Core :: Graphics: Layers, defect, P4)

56 Branch
defect

Tracking


RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- disabled
firefox57 --- disabled
firefox58 --- unaffected
firefox61 --- affected

People

(Reporter: philipp, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [wr-reserve])

Crash Data

This bug was filed from the Socorro interface and is based on
report bp-a65ab7a5-54af-4cb5-9ec1-a82c30171013.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	CrashStatsLogForwarder::CrashAction(mozilla::gfx::LogReason) 	gfx/thebes/gfxPlatform.cpp:416
1 	xul.dll 	mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::WriteLog(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) 	gfx/2d/Logging.h:525
2 	xul.dll 	mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::Flush() 	gfx/2d/Logging.h:282
3 	xul.dll 	mozilla::layers::CompositorManagerChild::ProcessingError(mozilla::ipc::HasResultCodes::Result, char const*) 	gfx/layers/ipc/CompositorManagerChild.cpp:255
4 	xul.dll 	mozilla::ipc::MessageChannel::MaybeHandleError(mozilla::ipc::HasResultCodes::Result, IPC::Message const&, char const*) 	ipc/glue/MessageChannel.cpp:2522
5 	xul.dll 	mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) 	ipc/glue/MessageChannel.cpp:2121
6 	xul.dll 	mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message&&) 	ipc/glue/MessageChannel.cpp:2049
7 	xul.dll 	mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) 	ipc/glue/MessageChannel.cpp:1895
8 	xul.dll 	mozilla::ipc::MessageChannel::MessageTask::Run() 	ipc/glue/MessageChannel.cpp:1928
9 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1037
10 	xul.dll 	NS_ProcessPendingEvents(nsIThread*, unsigned int) 	xpcom/threads/nsThreadUtils.cpp:466
11 	xul.dll 	nsWindow::DispatchPendingEvents() 	widget/windows/nsWindow.cpp:4256
12 	xul.dll 	nsWindow::ProcessMessage(unsigned int, unsigned int&, long&, long*) 	widget/windows/nsWindow.cpp:5857
13 	xul.dll 	nsWindow::WindowProcInternal(HWND__*, unsigned int, unsigned int, long) 	widget/windows/nsWindow.cpp:4955
14 	xul.dll 	CallWindowProcCrashProtected 	xpcom/base/nsCrashOnException.cpp:35
15 	xul.dll 	nsWindow::WindowProc(HWND__*, unsigned int, unsigned int, long) 	widget/windows/nsWindow.cpp:4907
16 	user32.dll 	InternalCallWinProc 	
17 	user32.dll 	UserCallWinProcCheckWow 	
18 	user32.dll 	DispatchMessageWorker 	
19 	user32.dll 	DispatchMessageW 	
20 	xul.dll 	nsAppShell::ProcessNextNativeEvent(bool) 	widget/windows/nsAppShell.cpp:352
21 	xul.dll 	nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool) 	widget/nsBaseAppShell.cpp:273
22 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:950
23 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:97
24 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:319
25 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:299
26 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp:158
27 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp:230
28 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp:288
29 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp:4694
30 	xul.dll 	XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4856
31 	xul.dll 	XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4951
32 	xul.dll 	mozilla::BootstrapImpl::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/Bootstrap.cpp:49
33 	firefox.exe 	do_main 	browser/app/nsBrowserApp.cpp:231
34 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:111
35 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
36 	kernel32.dll 	BaseThreadInitThunk 	
37 	ntdll.dll 	__RtlUserThreadStart 	
38 	ntdll.dll 	_RtlUserThreadStart

This cross-platform crash signature has been hanging around in Nightly since Firefox 56; all the reports show MOZ_CRASH(GFX_CRASH).
The GraphicsCriticalError shows a bunch of shader compilation failures:

|[0]CP+[GFX1]: Potential driver version mismatch ignored due to missing DLLs igd10umd32 v= and igd10iumd32 v= (t=7.44569)
|[121]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_rectangle", "")) (t=8.11849)
|[122]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_border_corner", "")) (t=8.11849)
|[123]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_border_edge", "")) (t=8.11849)
|[124]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_image", "")) (t=8.11849)
|[125]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_text_run", "")) (t=8.11849)
|[126]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_blend", "")) (t=8.11849)
|[127]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_text_run", "")) (t=8.11849)
|[128][GFX1 35]: Processing error in CompositorBridgeChild: 6 (t=8.11849)
|[114]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_gradient", "")) (t=8.11849)
|[115]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_rectangle", "")) (t=8.11849)
|[116]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_image", "")) (t=8.11849)
|[117]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_image", "")) (t=8.11849)
|[118]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_rectangle", "")) (t=8.11849)
|[119]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_image", "")) (t=8.11849)
|[120]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("ps_box_shadow", "")) (t=8.11849) 

Presumably we should try to repro on similar hardware?
The hardware in question (taken from the telemetry environment on the crash report) is:

"adapters": [
    {
        "description": "Intel(R) Q35 Express Chipset Family",
        "vendorID": "0x8086",
        "deviceID": "0x29b2",
        "subsysID": "2819103c",
        "RAM": null,
        "driver": "igdumdx32",
        "driverVersion": "8.15.10.1930",
        "driverDate": "9-23-2009",
        "GPUActive": true
    }
],
Whiteboard: [wr-mvp] [triage]
We need blocklisting for WebRender (bug 1409022).
Priority: -- → P3
Priority: P3 → P2
Whiteboard: [wr-mvp] [triage] → [wr-mvp]
Webrender error log was added by Bug 1390138.
See Also: → 1390138
(In reply to Sotaro Ikeda [:sotaro] from comment #4)
> Webrender error log was added by Bug 1390138.

WebRender error log output is not directly related to CompositorManagerChild::ProcessingError(). For example, when I set "gfx.webrender.force-angle" to false on one of my PCs, I saw shader-related error logs, but CompositorManagerChild::ProcessingError() was not called.
Priority: P2 → P3
Whiteboard: [wr-mvp] → [wr-reserve]
Been hitting this ~daily this week.
This is still happening but at a low frequency. The most recent one (bp-e8c2e834-690f-4468-98f1-2240e0180720) has this GraphicsCriticalError. Note that 0x8007000E is E_OUTOFMEMORY and the "6" in the final error is MsgRouteError.

====

|[0]GP+[GFX1-]: GFX: RenderThread detected a device reset in BeginFrame (t=4663.73)
|[1]GP+[GFX1-]: GFX: RenderThread detected a device reset in BeginFrame (t=4665.92)
|[2]GP+[GFX1-]: GFX: RenderThread detected a device reset in BeginFrame (t=4680.28)
|[3]GP+[GFX1-]: Failed to load a program object with a program binary: brush_blend renderer ANGLE (Intel(R) HD Graphics 620 Direct3D11 vs_5_0 ps_5_0) (t=4701.14)
|[4]GP+[GFX1-]: Failed program_binary (t=4701.14)
|[5]GP+[GFX1-]: Failed to link shader program: brush_blend
C:\fakepath(536,20-34): warning X3556: integer modulus may be much slower, try using uints if possible.
C:\fakepath(536,39-53): warning X3556: integer divides may be much slower, try using uints if possible.
Error allocating VertexShader. HRESULT: 0x8007000E (t=4701.14)
|[6]GP+[GFX1-]: Failed to load a program object with a program binary: brush_blend renderer ANGLE (Intel(R) HD Graphics 620 Direct3D11 vs_5_0 ps_5_0) (t=4701.14)
|[7]GP+[GFX1-]: Failed program_binary (t=4701.14)
|[8]GP+[GFX1-]: Failed to compile shader: brush_blend (t=4701.14)
|[9]GP+[GFX1-]: wr_renderer_render: Shader(Link("brush_blend", "C:\\fakepath(536,20-34): warning X3556: integer modulus may be much slower, try using uints if possible.\nC:\\fakepath(536,39-53): warning X3556: integer divides may be much slower, try using uints if possible.\n\n\nError allocating VertexShader. HRESULT: 0x8007000E\n")) (t=4701.14)
|[10]GP+[GFX1-]: wr_renderer_render: Shader(Compilation("brush_blend", "")) (t=4701.14)
|[11][GFX1 35]: Processing error in CompositorBridgeChild: 6 (t=4701.14)
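To make the HRESULT in this log easier to read, here is a minimal C++ sketch (an illustration, not Mozilla code) that splits an HRESULT into its severity, facility and code fields. It assumes the standard Windows HRESULT layout and repeats the winerror.h values for E_OUTOFMEMORY, FACILITY_WIN32 and ERROR_OUTOFMEMORY so it compiles without <windows.h>; the helper names DecodeHResult and IsWin32OutOfMemory are hypothetical.

```cpp
#include <cstdint>

// Well-known Windows SDK values (winerror.h), repeated here so this
// sketch compiles without <windows.h>.
constexpr std::uint32_t kEOutOfMemory = 0x8007000E;  // E_OUTOFMEMORY
constexpr std::uint32_t kFacilityWin32 = 7;          // FACILITY_WIN32
constexpr std::uint32_t kErrorOutOfMemory = 14;      // ERROR_OUTOFMEMORY

// The HRESULT fields that matter when reading crash logs.
struct DecodedHResult {
  bool failed;             // bit 31 (severity): 1 means failure
  std::uint32_t facility;  // bits 16-26: subsystem that produced the code
  std::uint32_t code;      // bits 0-15: subsystem-specific error code
};

// Hypothetical helper: pull the three fields out of a raw HRESULT value.
constexpr DecodedHResult DecodeHResult(std::uint32_t hr) {
  return DecodedHResult{(hr >> 31) != 0, (hr >> 16) & 0x7FF, hr & 0xFFFF};
}

// Hypothetical helper: true when `hr` is a failed HRESULT wrapping the
// Win32 ERROR_OUTOFMEMORY code, i.e. E_OUTOFMEMORY.
constexpr bool IsWin32OutOfMemory(std::uint32_t hr) {
  return DecodeHResult(hr).failed &&
         DecodeHResult(hr).facility == kFacilityWin32 &&
         DecodeHResult(hr).code == kErrorOutOfMemory;
}
```

For 0x8007000E this yields a failure with facility 7 (FACILITY_WIN32) and code 14 (ERROR_OUTOFMEMORY): a Win32 out-of-memory error wrapped as an HRESULT, which matches the E_OUTOFMEMORY reading of "Error allocating VertexShader".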
See Also: → 1479778
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7)
> Note that 0x8007000E is E_OUTOFMEMORY and the "6" in the final error is
> MsgRouteError.

Just to expand on this a bit: I suspect the OOM results in IPDL actors not being created properly, which then leads to the IPDL crash. So some IPDL robustification may be in order here.
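One possible shape for that robustification, sketched under assumptions: if an actor was never created (for example because an allocation failed), a message addressed to it has nowhere to go, and the routing layer can report that to its caller as a result value instead of crashing. The Router class, the DispatchResult enum and the integer route IDs below are hypothetical and do not reflect IPDL's actual implementation.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical result codes for this sketch only.
enum class DispatchResult { kProcessed, kRouteError };

// Hypothetical routing table mapping route IDs to live actor handlers.
class Router {
 public:
  using Handler = std::function<void(const std::string&)>;

  // Registration never happens if actor construction failed (e.g. under OOM).
  void Register(std::int32_t routeId, Handler handler) {
    handlers_[routeId] = std::move(handler);
  }

  void Unregister(std::int32_t routeId) { handlers_.erase(routeId); }

  // A missing route is surfaced as a value rather than an abort, so the
  // caller can decide whether to drop the message or close the channel.
  DispatchResult Dispatch(std::int32_t routeId, const std::string& payload) {
    auto it = handlers_.find(routeId);
    if (it == handlers_.end()) {
      return DispatchResult::kRouteError;
    }
    it->second(payload);
    return DispatchResult::kProcessed;
  }

 private:
  std::unordered_map<std::int32_t, Handler> handlers_;
};
```

Whether the caller should then drop the message, tear down the channel, or still crash deliberately is a policy decision; the sketch only shows that the missing-actor case can be reported instead of being fatal.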
We want to look at this in more detail to make sure it's not hardware specific.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #9)
> We want to look at this in more detail to make sure it's not hardware
> specific.

There should be corresponding crash reports generated for the GPU process, and those would have their own bugs. However, I looked at a few of the most recent reports, and the submitters don't appear to have any GPU process crashes nearby. I'm guessing that is because the parent process crashes before the GPU crash report is generated.

The reports do claim that the GPU process is running and that it is on its first launch. This suggests a race in which CompositorManagerChild is torn down before GPUChild while something tries to send a message via CompositorManagerChild (or one of its child actors). This case should be handled, but the code is kind of messy, so it is possible there is a mistake.
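The teardown race described above can be illustrated with a small, hypothetical C++ sketch; the class names are made up and are not Mozilla's CompositorManagerChild or GPUChild API. It assumes child actors share ownership of the underlying channel and that teardown closes the channel under a lock, so a send that loses the race returns false instead of touching a dead connection.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical channel shared by a manager actor and its child actors.
class Channel {
 public:
  // The closed check and the enqueue happen under one lock, so a send
  // cannot slip in between "is it open?" and "use it".
  bool Send(const std::string& msg) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (closed_) {
      return false;  // already torn down: drop the message, don't crash
    }
    sent_.push_back(msg);
    return true;
  }

  void Close() {
    std::lock_guard<std::mutex> lock(mutex_);
    closed_ = true;
  }

  std::size_t SentCount() {
    std::lock_guard<std::mutex> lock(mutex_);
    return sent_.size();
  }

 private:
  std::mutex mutex_;
  bool closed_ = false;
  std::vector<std::string> sent_;
};

// Child actors keep the channel object alive via shared ownership, so a late
// send after the manager's teardown hits a closed channel, not freed memory.
class ChildActor {
 public:
  explicit ChildActor(std::shared_ptr<Channel> channel)
      : channel_(std::move(channel)) {}
  bool SendPaint() { return channel_->Send("paint"); }

 private:
  std::shared_ptr<Channel> channel_;
};

// Hypothetical manager: teardown closes the shared channel first.
class ManagerActor {
 public:
  ManagerActor() : channel_(std::make_shared<Channel>()) {}
  ChildActor MakeChild() { return ChildActor(channel_); }
  void Teardown() { channel_->Close(); }
  Channel& channel() { return *channel_; }

 private:
  std::shared_ptr<Channel> channel_;
};
```

Holding the lock across both the closed check and the enqueue is the important design choice here; checking a flag and then sending without a lock would still leave a window between the check and the send.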
I talked with Andrew, and it sounds like this is probably a race caused by the GPU process going down. We should still be getting GPU process crashes with WebRender, so this will just cause some of those to show up in this bucket. I don't think we need to block Nightly on this.
Blocks: stage-wr-trains
No longer blocks: stage-wr-nightly
Hi Jeff, IIUC this isn't a WebRender crash. It's merely triggered by WR, so I think we should move this out of the WR component and not block on it. WDYT? We're already looking at memory issues in general for WR in other bugs.
Flags: needinfo?(jmuizelaar)
Priority: P3 → P4
Sure. That seems reasonable.
No longer blocks: stage-wr-trains
Component: Graphics: WebRender → Graphics: Layers
Flags: needinfo?(jmuizelaar)

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression