Closed Bug 1159751 Opened 5 years ago Closed 4 years ago

shutdownhang in UMDevice::DestroyShaderResourceView

Categories

(Core :: Graphics, defect, critical)

x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
firefox39 + wontfix
firefox38.0.5 --- unaffected
firefox40 + verified
firefox41 + fixed
firefox42 --- fixed

People

(Reporter: tracy, Assigned: milan)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(2 files, 1 obsolete file)

Not sure where this one belongs.  Put in Untriaged.  Exclusively on Windows 7. Is a shutdown hang in the top 10 Nightly crash list.

This bug was filed from the Socorro interface and is 
report bp-48610c9b-409d-4eb5-958d-0102d2150422.
=============================================================

Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::`anonymous namespace'::RunWatchdog(void*) 	toolkit/components/terminator/nsTerminator.cpp
1 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c
2 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c
3 	msvcr120.dll 	_callthreadstartex 	f:\dd\vctools\crt\crtw32\startup\threadex.c:376
4 	msvcr120.dll 	msvcr120.dll@0x2c000 	
5 	kernel32.dll 	BaseThreadInitThunk 	
6 	ntdll.dll 	__RtlUserThreadStart 	
7 	ntdll.dll 	_RtlUserThreadStart
[Tracking Requested - why for this release]:

For shutdownhangs, it's not the crashing thread that's interesting, it's Thread 0.

In this case, its top frames are:
0 	d3d10warp.dll 	UMDevice::DestroyShaderResourceView(UMDevice*, UMShaderResourceViewContainer*) 	
1 	d3d11.dll 	CShaderResourceView::CLS::FinalRelease(CContext*) 	
2 	d3d11.dll 	TCLSWrappers<CShaderResourceView>::CLSDestroy(CShaderResourceView::CLS*, CContext*) 	
3 	d3d11.dll 	CLayeredObjectWithCLS<CShaderResourceView>::~CLayeredObjectWithCLS<CShaderResourceView>() 	
4 	d3d11.dll 	CLayeredObjectWithCLS<CShaderResourceView>::`vector deleting destructor'(unsigned int) 	
5 	d3d11.dll 	CLayeredObjectWithCLS<CUnorderedAccessView>::Release() 	
6 	d3d11.dll 	ATL::AtlComPtrAssign(IUnknown**, IUnknown*) 	
[...]

Those are all in D3D libraries, so it belongs into GFX.

This shutdownhang is new in 39 compared to 38, and it's 2.2% of all 39.0b1 data, we'll need investigation of this.
Component: Untriaged → Graphics
Tracked for 39, 40, and 41, to address crashes.
Noting, this looks like the #4 topcrash in 39 right now.
Keywords: topcrash
Comment on attachment 8620470 [details] [diff] [review]
Pure speculation.

Review of attachment 8620470 [details] [diff] [review]:
-----------------------------------------------------------------

Shouldn't help, but why not.
Attachment #8620470 - Flags: review?(bas) → review+
https://hg.mozilla.org/mozilla-central/rev/3f10cb55f1e0
Assignee: nobody → milan
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Let's leave this open and check the crash stats, given comment 6.
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Needinfo me to check the crash stats.
Flags: needinfo?(milan)
It's interesting that thread 0 is in "real code" rather than waiting on a mutex.

The top frame is running a linear search for some value, with an array length of over 100,000 in the first random minidump I opened. I see at least three layers of 'vector deleting destructor' on the stack. In other words, a bunch of nested for-loops with a long linear search in the innermost. No surprise that it's taking a long time!

This seems like something that's just plain slow, rather than the kind of deadlock that the hang detector was intended to prevent. Any idea what might be triggering it? Is there a way we can accumulate fewer of these objects? Or just drop them on the floor instead of this long cleanup routine?
Rank 	Platform version 	Count 	%
1 	6.1.7600 	2447 	99.88 %

This started in 20150328030209: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e046475a75cb&tochange=ad587ca628cf

Gfx changes from that range:
	0e0e159457ec	Nicolas Silva — Bug 1145981 - Backout, missing review
	a626cde31115	Nicolas Silva — Bug 1146912 - Finish replacing TextureHost::GetTextureSource by BindTextureSource. r=sotaro
	48a6591e10a2	Nicolas Silva — Bug 1145981 - Do not crash when a DIB texture is updated without a compositor. r=jrmuizel
	f1dfed74dc9a	Nicolas Silva — Bug 1147894 - Remove the redundant OpUpdateTexture IPDL message. r=sotaro
	f39fd4c98562	Nicolas Silva — Bug 1147894 - Only use non-null compositors with TextureHost::SetCompositor. r=sotaro
	96c8ca415e45	Bas Schouten — Bug 1147728: When using WARP, don't try to create a synchronization texture. This will fail on Windows 7. r=jrmuizel

Bug 1147728 looks pretty suspicious given the WARP and Win7 connection. What happens if we don't create a synchronization texture? Could it lead to the creation of this huge mass of ShaderResourceView's?
(In reply to David Major [:dmajor] from comment #12)
> Rank 	Platform version 	Count 	%
> 1 	6.1.7600 	2447 	99.88 %
> 
> This started in 20150328030209:
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=e046475a75cb&tochange=ad587ca628cf
> 
> Gfx changes from that range:

It'd be surprising if it was one of these:
> 	0e0e159457ec	Nicolas Silva — Bug 1145981 - Backout, missing review
> 	48a6591e10a2	Nicolas Silva — Bug 1145981 - Do not crash when a DIB texture
> is updated without a compositor. r=jrmuizel
> 	a626cde31115	Nicolas Silva — Bug 1146912 - Finish replacing
> TextureHost::GetTextureSource by BindTextureSource. r=sotaro
> 	f39fd4c98562	Nicolas Silva — Bug 1147894 - Only use non-null compositors
> with TextureHost::SetCompositor. r=sotaro

but these two could have done it, and I agree the last one clearly looks the most suspicious on correlation:


> 	f1dfed74dc9a	Nicolas Silva — Bug 1147894 - Remove the redundant
> OpUpdateTexture IPDL message. r=sotaro

> 	96c8ca415e45	Bas Schouten — Bug 1147728: When using WARP, don't try to
> create a synchronization texture. This will fail on Windows 7. r=jrmuizel
> 
> Bug 1147728 looks pretty suspicious given the WARP and Win7 connection. What
> happens if we don't create a synchronization texture? Could it lead to the
> creation of this huge mass of ShaderResourceView's?
Flags: needinfo?(milan)
Flags: needinfo?(bas)
(In reply to David Major [:dmajor] from comment #12)
> Rank 	Platform version 	Count 	%
> 1 	6.1.7600 	2447 	99.88 %
> 
> This started in 20150328030209:
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=e046475a75cb&tochange=ad587ca628cf
> 
> Gfx changes from that range:
> 	0e0e159457ec	Nicolas Silva — Bug 1145981 - Backout, missing review
> 	a626cde31115	Nicolas Silva — Bug 1146912 - Finish replacing
> TextureHost::GetTextureSource by BindTextureSource. r=sotaro
> 	48a6591e10a2	Nicolas Silva — Bug 1145981 - Do not crash when a DIB texture
> is updated without a compositor. r=jrmuizel
> 	f1dfed74dc9a	Nicolas Silva — Bug 1147894 - Remove the redundant
> OpUpdateTexture IPDL message. r=sotaro
> 	f39fd4c98562	Nicolas Silva — Bug 1147894 - Only use non-null compositors
> with TextureHost::SetCompositor. r=sotaro
> 	96c8ca415e45	Bas Schouten — Bug 1147728: When using WARP, don't try to
> create a synchronization texture. This will fail on Windows 7. r=jrmuizel
> 
> Bug 1147728 looks pretty suspicious given the WARP and Win7 connection. What
> happens if we don't create a synchronization texture? Could it lead to the
> creation of this huge mass of ShaderResourceView's?

Yeah.. that's when we started using WARP on Win7, so sadly it's not very informative :(.
Flags: needinfo?(bas)
We have enough crashes here to be able to do some experimenting - Bas, can we add something to the crash reports that would help us diagnose this further?
Flags: needinfo?(bas)
(In reply to Milan Sreckovic [:milan] from comment #15)
> We have enough crashes here to be able to do some experimenting - Bas, can
> we add something to the crash reports that would help us diagnose this
> further?

Not any diagnostics that I could think of, I could do an experimental patch to reduce the amount of ShaderResourceViews that we use.
Flags: needinfo?(bas)
In this case you (but not our users) are lucky we're already in the RC stage of 39, otherwise I would have called for a backout of the whole feature. I really do not want us to ship any features to release with regressions like this.
We will not remove it from 39 now obviously but if we can't solve this, I think we should rethink if we can leave it in for 40.
KaiRo what feature are you talking about?  bug 1159751 or warp in general?  

Is that something that rode with 39 that I wasn't aware of since if it was ever tracked, it was fixed by the time we began to look at tracked bugs?  My impression here is that we don't have a cause narrowed down.  I would like to understand what you're talking about.
Flags: needinfo?(kairo)
(In reply to Liz Henry (:lizzard) from comment #18)
> KaiRo what feature are you talking about?  bug 1159751 or warp in general?  

I meant WARP in general. IMHO, we can't ship features like that which cause a significant issue like the bug here if we want to follow the call for more quality.

> Is that something that rode with 39 that I wasn't aware of since if it was
> ever tracked, it was fixed by the time we began to look at tracked bugs?  My
> impression here is that we don't have a cause narrowed down.  I would like
> to understand what you're talking about.

We know that introducing WARP caused this, and that those are new shutdownhangs that we didn't have before turning on WARP. Together with the volume, that's enough for me to call for backout or even feature removal if this can't be fixed.
Flags: needinfo?(kairo)
Crash Signature: [@ shutdownhang | UMDevice::DestroyShaderResourceView(UMDevice*, UMShaderResourceViewContainer*)] → [@ shutdownhang | UMDevice::DestroyShaderResourceView(UMDevice*, UMShaderResourceViewContainer*)] [@ shutdownhang | UMDevice::DestroyShaderResourceView]
Summary: crash in shutdownhang | UMDevice::DestroyShaderResourceView(UMDevice*, UMShaderResourceViewContainer*) → shutdownhang in UMDevice::DestroyShaderResourceView
(In reply to Robert Kaiser (:kairo@mozilla.com) - on vacation or slow to reply until the end of June from comment #19)
> 
> We know that introducing WARP caused this, and that those are new
> shutdownhangs that we didn't have before turning on WARP. Together with the
> volume, that's enough for me to call for backout or even feature removal if
> this can't be fixed.

Agreed.  Enter bug 1179504, currently waiting for uplift approval.
Crash Signature: [@ shutdownhang | UMDevice::DestroyShaderResourceView(UMDevice*, UMShaderResourceViewContainer*)] [@ shutdownhang | UMDevice::DestroyShaderResourceView] → [@ shutdownhang | UMDevice::DestroyShaderResourceView(UMDevice*, UMShaderResourceViewContainer*)] [@ shutdownhang | UMDevice::DestroyShaderResourceView] [@ shutdownhang | UMResource::~UMResource()] [@ shutdownhang | UMResource::~UMResource]
Crash Signature: [@ shutdownhang | UMDevice::DestroyShaderResourceView(UMDevice*, UMShaderResourceViewContainer*)] [@ shutdownhang | UMDevice::DestroyShaderResourceView] [@ shutdownhang | UMResource::~UMResource()] [@ shutdownhang | UMResource::~UMResource] → [@ shutdownhang | UMDevice::DestroyShaderResourceView(UMDevice*, UMShaderResourceViewContainer*)] [@ shutdownhang | UMDevice::DestroyShaderResourceView] [@ shutdownhang | UMResource::~UMResource()] [@ shutdownhang | UMResource::~UMResource] [@ shutdow…
Bug 1179504 landed on Aurora, but the patch attached to it wasn't enough to actually, fully, truly, disable warp.
Comment on attachment 8632945 [details] [diff] [review]
Make sure we never use WARP on Windows 7

Review of attachment 8632945 [details] [diff] [review]:
-----------------------------------------------------------------

This is a good way to minimize the change, given that we want to uplift this patch.  To avoid confusion, and since we don't know that this fixed the problem in this bug, I would put this patch against bug 1179504, where the original one was.  Then we can verify this crash goes away once that lands.
Attachment #8632945 - Flags: review?(milan) → review+
We will coordinate this with bug 1179504 patches.
Relanded on aurora:
https://hg.mozilla.org/releases/mozilla-aurora/rev/59f92930d90f

Central got it back through inbound which never really lost it.
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch harderSplinter Review
Approval Request Comment
Same as before but now with fewer crashes. This is already on Aurora.
Attachment #8632945 - Attachment is obsolete: true
Attachment #8636601 - Flags: approval-mozilla-beta?
Comment on attachment 8636601 [details] [diff] [review]
harder

WARP needs to be disabled for win7. Discussed with Lawrence and we should be uplifting this to beta.
Attachment #8636601 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Those shutdownhangs are gone in 40.0b7 and higher, where bug 1179504 landed to disabled WARP on Win7.
As KaiRo notes in bug 1179504, OOMs also decreased a lot with the WARP-disabling. That would line up with the large accumulation of objects seen in comment 11; it's likely there was a leak in the system.
David, is this fix also in Beta41? In that case, could we set status-firefox41 to fixed? Thanks.
Flags: needinfo?(dmajor)
I'm setting these based on comment 29. Additionally confirmed by crash-stats.
Flags: needinfo?(dmajor)
I'm closing this bug as there have been zero reports against the current or previous release (Firefox 41 and 40, respectively). We also have zero reports against the current test branches.
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.