crash in mozilla::layers::CompositorD3D11::BeginFrame

RESOLVED FIXED in Firefox 50

Status

Product: Core
Component: Graphics: Layers
Priority: --
Severity: critical
Status: RESOLVED FIXED
Reported: 3 years ago
Modified: a year ago

People

(Reporter: dmajor, Assigned: jerry)

Tracking

Version: unspecified
Target: mozilla50
Platform: x86, Windows NT
Keywords: crash
Points: ---

Firefox Tracking Flags

(firefox50 fixed)

Details

(Whiteboard: [gfx-noted][tbird topcrash], crash signature)

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-4e47d175-93f9-478d-8775-0f5892150215.
=============================================================

This comes and goes from the top crash list on nightly 38.

0 	xul.dll 	mozilla::layers::CompositorD3D11::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*) 	gfx/layers/d3d11/CompositorD3D11.cpp
1 	xul.dll 	mozilla::layers::LayerManagerComposite::Render() 	gfx/layers/composite/LayerManagerComposite.cpp
2 	xul.dll 	mozilla::layers::LayerManagerComposite::EndTransaction(void (*)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) 	gfx/layers/composite/LayerManagerComposite.cpp
3 	xul.dll 	mozilla::layers::LayerManagerComposite::EndEmptyTransaction(mozilla::layers::LayerManager::EndTransactionFlags) 	gfx/layers/composite/LayerManagerComposite.cpp
4 	xul.dll 	mozilla::layers::CompositorParent::CompositeToTarget(mozilla::gfx::DrawTarget*, nsIntRect const*) 	gfx/layers/ipc/CompositorParent.cpp
5 	xul.dll 	mozilla::layers::CompositorParent::CompositeCallback(mozilla::TimeStamp) 	gfx/layers/ipc/CompositorParent.cpp
6 	xul.dll 	RunnableMethod<mozilla::layers::CompositorParent, void ( mozilla::layers::CompositorParent::*)(mozilla::TimeStamp), Tuple1<mozilla::TimeStamp> >::Run() 	ipc/chromium/src/base/task.h
7 	xul.dll 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc
8 	xul.dll 	base::MessagePumpForUI::DoRunLoop() 	ipc/chromium/src/base/message_pump_win.cc
9 	xul.dll 	base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate*, base::MessagePumpWin::Dispatcher*) 	ipc/chromium/src/base/message_pump_win.cc
10 	xul.dll 	base::MessagePumpWin::Run(base::MessagePump::Delegate*) 	ipc/chromium/src/base/message_pump_win.h
11 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
12 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
13 	xul.dll 	base::Thread::ThreadMain() 	ipc/chromium/src/base/thread.cc
14 	xul.dll 	`anonymous namespace'::ThreadFunc(void*) 	ipc/chromium/src/base/platform_thread_win.cc
15 	kernel32.dll 	BaseThreadInitThunk 	
16 	ntdll.dll 	__RtlUserThreadStart 	
17 	ntdll.dll 	_RtlUserThreadStart
(Reporter)

Comment 1

3 years ago
What is the action item if we see these mutexes time out? Can we do anything about it?
Blocks: 1119854
Flags: needinfo?(bas)

Comment 2

3 years ago
I get these crashes every time I view full-screen Flash video for more than a couple of minutes. They are followed a minute or two later by a Windows message that the graphics card has stopped responding, and then by a full-on blue screen crash and system restart.

In the last 24 hours there has been the added twist of Windows identifying plugin container as having stopped responding before the blue screen.

The one thing I don't think the crash data supplies is that I am running with dual monitors.

Updated

3 years ago
Whiteboard: gfx-noted
I can reproduce this by forcing Firefox onto the discrete GPU (on a W540 dual-GPU setup), then disabling the discrete GPU in Device Manager.

Updated

3 years ago
Whiteboard: gfx-noted → [gfx-noted][tbird crash]

Comment 4

3 years ago
This signature is 0.6% of crashes in 38.0b6, which puts it in the top 20.
I guess this is another of the few TDR signatures.
Assignee: nobody → bas

Comment 6

3 years ago
#2 crash for TB38.0b4
Keywords: topcrash-thunderbird
Whiteboard: [gfx-noted][tbird crash] → [gfx-noted][tbird topcrash]

Updated

3 years ago
Crash Signature: [@ mozilla::layers::CompositorD3D11::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*, mozill… → [@ mozilla::layers::CompositorD3D11::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*, mozill…

Comment 7

3 years ago
I had another of these, this time in Thunderbird Daily following the restart after an update.
See https://crash-stats.mozilla.com/report/index/8bde75d3-afc5-4722-90c7-c4ef12150614

The program then went on to produce a number of Windows "stopped functioning" crashes but worked fine in Safe mode.

Found my router had crashed and on resetting it all errors disappeared.

Comment 8

3 years ago
¡Hola!

Another data point that I hope is useful; if not, my apologies for the bug spam =)

On https://bugzilla.mozilla.org/show_bug.cgi?id=1127270#c48 I was instructed to update the driver for the Intel HD3000.

So I installed win64_152824.exe

I disobeyed the installer and left Nightly running during the update.

This resulted in the following crash:

Report ID: bp-4388b86f-bd1b-4444-a05c-be5682150625
Date Submitted: 25/06/2015 04:04 p.m.

That is seemingly this bug...
(Reporter)

Updated

3 years ago
Flags: needinfo?(bas)
This is no longer a top issue for Thunderbird but remains a significant issue for Firefox.

Thunderbird 38.3.0 has 1 reports
Thunderbird 38.2.0 has 0 reports
Thunderbird 38.1.0 has 11 reports

Firefox 41 has 1012 reports
Firefox 40 has 876 reports
Firefox 39 has 30 reports

Based on volume this would rank #30 for Firefox and doesn't rank at all for Thunderbird. I'm not sure if Firefox crashes are being investigated here or if that's a different bug report.
Keywords: topcrash-thunderbird

Comment 10

2 years ago
Changing the signature, since it doesn't seem to be properly mapped in crash-stats at the moment.
Crash Signature: [@ mozilla::layers::CompositorD3D11::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*, mozill… → [@ mozilla::layers::CompositorD3D11::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*, mozill…

Comment 11

2 years ago
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #9)
> This is no longer a top issue for Thunderbird but remains a significant
> issue for Firefox.
> 
> Thunderbird 38.3.0 has 1 reports
> Thunderbird 38.2.0 has 0 reports

Ah, but we disabled HWA in 38.2.0 https://www.mozilla.org/en-US/thunderbird/38.2.0/releasenotes/
Jerry, you've spent some time dealing with device resets now, can you dig a bit into this one?  I wonder if we get a device reset while we're waiting to AcquireSync (https://hg.mozilla.org/integration/mozilla-inbound/annotate/b0096c5c7277/gfx/layers/d3d11/CompositorD3D11.cpp#l1198) so we time out.

Here's a recent crash: https://crash-stats.mozilla.com/report/index/3b10f2c6-a6d3-4068-8ec9-301de2160529
Flags: needinfo?(hshih)
Assignee: bas → nobody
(Assignee)

Updated

2 years ago
Assignee: nobody → hshih
Flags: needinfo?(hshih)
(Assignee)

Updated

2 years ago
Status: NEW → ASSIGNED
Hi Bas,
If there is an IDXGIKeyedMutex from device context A and then A is device-removed, does that mutex still work?

From
https://msdn.microsoft.com/en-us/library/windows/desktop/ff471339%28v=vs.85%29.aspx

The return values are:
E_FAIL
WAIT_ABANDONED
WAIT_TIMEOUT

I'm not sure whether the AcquireSync() call still works when the device is removed.
Flags: needinfo?(bas)
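The three codes listed above, plus S_OK for success, can be told apart with a plain comparison on the returned value. The following is only a minimal sketch, assuming a portable stand-in for HRESULT so that it builds without the Windows SDK. `HResult`, the `k...` constants and `DescribeAcquireSyncResult` are hypothetical names that do not come from the Mozilla tree; the numeric values are the ones the Windows headers define for these codes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Portable stand-in for the Windows HRESULT type (assumption for this sketch).
using HResult = std::int32_t;

// Numeric values as defined by the Windows SDK headers.
constexpr HResult kOk = 0;                                    // S_OK
constexpr HResult kWaitAbandoned = 0x00000080;                // WAIT_ABANDONED
constexpr HResult kWaitTimeout = 0x00000102;                  // WAIT_TIMEOUT
constexpr HResult kFail = static_cast<HResult>(0x80004005u);  // E_FAIL

// Hypothetical helper: names the outcome of an AcquireSync() call.
std::string DescribeAcquireSyncResult(HResult hr) {
  switch (hr) {
    case kOk:
      return "acquired";
    case kWaitAbandoned:
      return "abandoned";
    case kWaitTimeout:
      return "timeout";
    case kFail:
      return "failed";
    default:
      return "unknown";
  }
}
```

Note that none of these codes says by itself whether the device was removed, which is why the question about querying the device separately comes up.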
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #13)
> Hi Bas,
> If there is an IDXGIKeyedMutex from device context A and then A is
> device-removed, does that mutex still work?
> 
> From
> https://msdn.microsoft.com/en-us/library/windows/desktop/ff471339%28v=vs.85%29.aspx
> 
> The return values are:
> E_FAIL
> WAIT_ABANDONED
> WAIT_TIMEOUT
> 
> I'm not sure whether the AcquireSync() call still works when the device is
> removed.

We've discussed this a couple of times before; I've always been in favor of not crashing in this situation, and instead checking whether the device has been reset.
Flags: needinfo?(bas)
See Also: → bug 1275798
If we handle it that way, let's make sure we do so in the other places that try to get the sync texture (e.g., bug 1275798 comment 11).
Created attachment 8760161 [details] [diff] [review]
check device-removed status when we have timeout. v1
Attachment #8760161 - Flags: review?(bas)
Created attachment 8760166 [details] [diff] [review]
check device-removed status when we have timeout. v2
Attachment #8760166 - Flags: review?(bas)
(Assignee)

Updated

2 years ago
Attachment #8760161 - Attachment is obsolete: true
Attachment #8760161 - Flags: review?(bas)
Comment on attachment 8760166 [details] [diff] [review]
check device-removed status when we have timeout. v2

Review of attachment 8760166 [details] [diff] [review]:
-----------------------------------------------------------------

This patch empties the render bounds when the device is removed during AcquireSync(), so that frame is skipped. That might prevent a lot of the timeout MOZ_ASSERT() hits in our textureHost code.

Should we also update all of the textureHost timeout calls, or only update BeginFrame() for now?

::: gfx/layers/d3d11/CompositorD3D11.cpp
@@ +1195,5 @@
>      MOZ_ASSERT(mutex);
>      HRESULT hr = mutex->AcquireSync(0, 10000);
>      if (hr == WAIT_TIMEOUT) {
> +      hr = mDevice->GetDeviceRemovedReason();
> +      if (hr == S_OK) {

If the device status is normal, we still crash on the timeout.

@@ +1203,5 @@
> +
> +      // Since the timeout is related to the driver-removed, clear the
> +      // render-bounding size to skip this frame.
> +      gfxCriticalNote << "GFX: D3D11 timeout with device-removed:" << gfx::hexa(hr);
> +      *aRenderBoundsOut = IntRect();

If the timeout is related to device removal, empty the render bounds to skip this frame.
Attachment #8760166 - Flags: review?(milan)
Attachment #8760166 - Flags: review?(bas) → review+
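The control flow of the hunks reviewed above can be sketched outside the Mozilla tree as follows. This is an illustration under stated assumptions, not the landed code: `FakeKeyedMutex`, `FakeDevice`, `IntRect`, `FrameDecision` and `BeginFrameSketch` are hypothetical stand-ins, a thrown exception stands in for MOZ_CRASH, and a write to std::cerr stands in for gfxCriticalNote.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <stdexcept>

using HResult = std::int32_t;  // stand-in for HRESULT (assumption)
constexpr HResult kOk = 0;                    // S_OK
constexpr HResult kWaitTimeout = 0x00000102;  // WAIT_TIMEOUT
// DXGI_ERROR_DEVICE_REMOVED, used here as an example removal reason.
constexpr HResult kDeviceRemoved = static_cast<HResult>(0x887A0005u);

// Minimal rectangle; a default-constructed one is empty.
struct IntRect {
  int x = 0, y = 0, width = 0, height = 0;
  bool IsEmpty() const { return width <= 0 || height <= 0; }
};

// Hypothetical fakes that return canned results.
struct FakeKeyedMutex {
  HResult acquireResult;
  HResult AcquireSync(std::uint64_t /*key*/, std::uint32_t /*ms*/) const {
    return acquireResult;
  }
};
struct FakeDevice {
  HResult removedReason;
  HResult GetDeviceRemovedReason() const { return removedReason; }
};

enum class FrameDecision { Render, SkipFrame };

FrameDecision BeginFrameSketch(const FakeKeyedMutex& mutex,
                               const FakeDevice& device,
                               IntRect* renderBoundsOut) {
  assert(renderBoundsOut);
  HResult hr = mutex.AcquireSync(0, 10000);
  if (hr == kWaitTimeout) {
    hr = device.GetDeviceRemovedReason();
    if (hr == kOk) {
      // No device-removed event: keep crashing on the timeout.
      throw std::runtime_error("GFX: D3D11 timeout");
    }
    // The timeout coincides with a removed device: note it, then clear
    // the render bounds so the caller skips this frame.
    std::cerr << "GFX: D3D11 timeout with device-removed:" << std::hex << hr
              << std::dec << "\n";
    *renderBoundsOut = IntRect();
    return FrameDecision::SkipFrame;
  }
  return FrameDecision::Render;
}
```

With a timed-out mutex and a removed device, the sketch returns `FrameDecision::SkipFrame` and leaves the bounds empty; with a timed-out mutex and a healthy device, it still fails hard.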
Comment on attachment 8760166 [details] [diff] [review]
check device-removed status when we have timeout. v2

Review of attachment 8760166 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/d3d11/CompositorD3D11.cpp
@@ +1197,5 @@
>      if (hr == WAIT_TIMEOUT) {
> +      hr = mDevice->GetDeviceRemovedReason();
> +      if (hr == S_OK) {
> +        // There is no driver-removed event. Crash with this timeout.
> +        MOZ_CRASH("GFX: D3D11 timeout");

I would change this message slightly:
MOZ_CRASH("GFX: D3D11 normal status timeout");
for example. That way, we can quickly search for the old type of crashes (we time out) vs. the new type of crashes (we time out without a device reset) and can more easily find out if the original problem was fixed.
Attachment #8760166 - Flags: review?(milan) → review+
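To illustrate why a distinct message helps, here is a rough sketch of bucketing crash reasons by their exact text. The names `TimeoutBucket`, `BucketCrashReason` and `CountBucket` are hypothetical, and it is an assumption that the reason strings reach crash reports unchanged; the two strings themselves are the ones quoted in the review above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class TimeoutBucket { OldTimeout, NormalStatusTimeout, Other };

// Hypothetical helper: maps a crash reason string to a bucket.
TimeoutBucket BucketCrashReason(const std::string& reason) {
  if (reason == "GFX: D3D11 normal status timeout") {
    return TimeoutBucket::NormalStatusTimeout;  // timed out, device not reset
  }
  if (reason == "GFX: D3D11 timeout") {
    return TimeoutBucket::OldTimeout;  // message before the rename
  }
  return TimeoutBucket::Other;
}

// Counts how many reasons fall into the given bucket.
std::size_t CountBucket(const std::vector<std::string>& reasons,
                        TimeoutBucket bucket) {
  std::size_t count = 0;
  for (const std::string& reason : reasons) {
    if (BucketCrashReason(reason) == bucket) {
      ++count;
    }
  }
  return count;
}
```

Because the two messages differ, an exact-match search separates them cleanly, whereas a shared message would lump both kinds of timeout together.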
Created attachment 8760543 [details] [diff] [review]
check device-removed status when we have timeout. v3. r=milan, r=bas

Update the MOZ_CRASH message.
(Assignee)

Updated

2 years ago
Attachment #8760166 - Attachment is obsolete: true
Please land attachment 8760543 [details] [diff] [review] on m-c.
Keywords: checkin-needed

Comment 23

2 years ago
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/afd3d8815462
check device-removed status when we have timeout. r=milan, r=bas
Keywords: checkin-needed

Comment 24

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/afd3d8815462
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox50: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
This is still reproducible on Fx50, based on the last 2 months of crash data.

  SIGNATURE   | mozilla::layers::CompositorD3D11::BeginFrame
  ----------------------------------------------------------
  CRASH STATS | http://tinyurl.com/hc7otrn
  ----------------------------------------------------------
  OVERVIEW    | 33 crashes on nightly 52
              | 127 crashes on nightly 51
              | 23 crashes on aurora 51
              | 2 crashes on nightly 50
              | 12 crashes on aurora 50
              | 4 crashes on beta 50
  ----------------------------------------------------------
  LAST CRASH  | 2016-09-26 (on 50.0b1, 52.0a1)
status-firefox51: --- → affected
status-firefox52: --- → affected
Bug 1306168 is tracking ongoing crashes with this signature.
status-firefox51: affected → ---
status-firefox52: affected → ---