Closed Bug 768395 Opened 12 years ago Closed 10 years ago

crash in CDevice::DriverInternalErrorCB mainly with Intel GPUs

Categories

(Core :: Graphics, defect)

16 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla33
Tracking Status
firefox31 + verified
firefox32 --- verified
firefox33 --- verified

People

(Reporter: scoobidiver, Assigned: benjamin)

References

Details

(Keywords: crash, topcrash-win)

Crash Data

Attachments

(1 file, 1 obsolete file)

It's a bug similar to bug 630197, but for AMD GPUs.

Correlations per module in 14.0 show:
CDevice::DriverInternalErrorCB(long)|0x00000879 / 0x00000000 (20 crashes)
    100% (20/20) vs.   3% (1270/39773) atidxx32.dll
    100% (20/20) vs.   3% (1273/39773) atiuxpag.dll
    100% (20/20) vs.   3% (1380/39773) aticfx32.dll
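
The percentages in the listing above are plain ratios: the share of crashes with this signature that had the module loaded, against the share of all crashes that had it loaded. A minimal sketch of that arithmetic, using the counts from the listing. The function name is hypothetical, and rounding to whole percent is an assumption about how the report formats its numbers.

```python
def module_correlation(sig_hits, sig_total, all_hits, all_total):
    """Return (signature %, overall %) rounded to whole percent."""
    return (round(100 * sig_hits / sig_total),
            round(100 * all_hits / all_total))

# Counts copied from the 14.0 correlation listing above.
MODULES = {
    "atidxx32.dll": (20, 20, 1270, 39773),
    "atiuxpag.dll": (20, 20, 1273, 39773),
    "aticfx32.dll": (20, 20, 1380, 39773),
}

for name, counts in MODULES.items():
    sig_pct, all_pct = module_correlation(*counts)
    print(f"{sig_pct}% vs. {all_pct}% {name}")
```

A module that appears in every crash with the signature but in only a few percent of all crashes is a strong hint that the signature is tied to that vendor's driver.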

Signature 	CDevice::DriverInternalErrorCB(long)
UUID	99aa6da6-b44a-44ea-8caf-7d9662120624
Date Processed	2012-06-24 13:36:15
Uptime	147
Last Crash	2.5 minutes before submission
Install Age	1.3 hours since version was first installed.
Install Time	2012-06-24 12:20:26
Product	Firefox
Version	14.0
Build ID	20120624012213
Release Channel	beta
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 44 stepping 2
Crash Reason	0x00000879 / 0x00000000
Crash Address	0x76c9b9bc
App Notes 	
AdapterVendorID: 0x1002, AdapterDeviceID: 0x689c, AdapterSubsysID: 034a1043, AdapterDriverVersion: 9.0.0.0
D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
EMCheckCompatibility	True
Adapter Vendor ID	0x1002
Adapter Device ID	0x689c
Total Virtual Memory	4294836224
Available Virtual Memory	144334848
System Memory Use Percentage	41
Available Page File	18538545152
Available Physical Memory	7512104960

Frame 	Module 	Signature 	Source
0 	KERNELBASE.dll 	RaiseException 	
1 	d3d10_1core.dll 	CDevice::DriverInternalErrorCB 	
2 	d3d10_1core.dll 	CDevice::UMSetError_ 	
3 	aticfx32.dll 	aticfx32.dll@0x24f32 	
4 	atidxx32.dll 	atidxx32.dll@0xe631 	
5 	atidxx32.dll 	atidxx32.dll@0x4663 	
6 	atidxx32.dll 	atidxx32.dll@0x22a19 	
7 	atiuxpag.dll 	atiuxpag.dll@0x44e7 	
8 	aticfx32.dll 	aticfx32.dll@0x1300f 	
9 	aticfx32.dll 	aticfx32.dll@0x5dcca 	
10 	aticfx32.dll 	aticfx32.dll@0x5a5bb 	
11 	aticfx32.dll 	aticfx32.dll@0x6a5b7 	
12 	aticfx32.dll 	aticfx32.dll@0x253b 	
13 	d3d10_1core.dll 	CBaseResource<ID3D10Resource,2>::CopyResource 	
14 	d3d10_1core.dll 	NMultithread::CDevice::CopyResource 	
15 	xul.dll 	xul.dll@0x643e27 	
16 	xul.dll 	xul.dll@0x1e78d6

More reports at:
https://crash-stats.mozilla.com/report/list?signature=CDevice%3A%3ADriverInternalErrorCB%28long%29
I can confirm I'm experiencing this exact bug and the impact is very severe. I've experienced a total of 5 crashes in 2 hours of which 2 I've confirmed as the above. The remaining crashes either have a corrupt dump or are not yet available on Mozilla Crash Reports, however, considering the frequency of these crashes compared to the usual zero I think it's reasonable to assume they may be related.

Perhaps the most interesting thing to note however is that they seem to precisely coincide with an upgrade of the AMD Catalyst drivers from 10.6 to 10.8. I'll see if I can downgrade the drivers and witness the issue continuing to occur.
Apologies, above should read upgrade of AMD Catalyst drivers from 12.6 to 12.8.
Having disabled "Use hardware acceleration when available" under Options -> Advanced -> General -> Browsing, I have seen no further crashes. It seems fairly clear from this that the issue is related to accelerated graphics code paths and, in my case, is only seen with the AMD Catalyst 12.8 drivers. Stack traces of two recent relevant crashes indicate that the preceding function calls all reside in AMD modules, so it's plausible the fault lies in the drivers rather than in Firefox?

https://crash-stats.mozilla.com/report/index/bp-0ae510f9-7bcc-4a42-8cb4-f1f7b2120915
https://crash-stats.mozilla.com/report/index/bp-cd10bb06-d2cf-4ce0-a801-f57222120915
I just had a couple people come into #firefox on Freenode, both confirmed that they had AMD graphics cards, with a 12.8 driver. One said that disabling hardware acceleration fixed the problem. This is also present on Windows 8, with Firefox 16 beta.
OS: Windows 7 → All
Hardware: x86 → All
Version: 14 Branch → 16 Branch
When OS is set to All, it is meant for Windows, Mac and Linux.
When HW is set to All, it is meant for 32-bit and 64-bit builds.
OS: All → Windows 7
Hardware: All → x86
This is currently #5 in Firefox 31.0a2 but is possibly just a manifestation of the ongoing intermittent AMD issues. Nominating for tracking to keep an eye on it.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #6)
> This is currently #5 in Firefox 31.0a2 but is possibly just a manifestation
> of the ongoing intermittent AMD issues. Nominating for tracking to keep an
> eye on it.

I can't speak for the past history of this crash signature, but the current wave of crashes is all Intel CPUs with Intel graphics. Some also have ATI code in addition to the Intel -- these may be laptops with switchable graphics.

So the current crashes aren't the AMD CPU bug, and they're probably not the same issue as comment 0 either. CDevice::DriverInternalErrorCB sounds like a generic error handler that can be triggered by different causes.
David, do you think what we're seeing on Aurora warrants its own bug?
We can open a new bug or we can retitle this one to cover the Intel crashes. I don't feel particularly strongly either way.
Topcrash, tracking!
I don't think that this is a regression in Firefox. Looking at the reports and summary at https://crash-stats.mozilla.com/report/list?signature=CDevice%3A%3ADriverInternalErrorCB%28long%29#tab-sigsummary this happens in Firefox 29 as well.

Regression by build shows that this started in nightly and aurora at around the same time:

https://crash-analysis.mozilla.com/bsmedberg/bug768395-DriverInternalError-nightly.svg
https://crash-analysis.mozilla.com/bsmedberg/bug768395-DriverInternalError-aurora.svg

So I strongly suspect a new driver version. Kairo can you look into driver versions in more detail here?
Flags: needinfo?(kairo)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #11)
> So I strongly suspect a new driver version. Kairo can you look into driver
> versions in more detail here?

I don't really know how unless I go for a few hours of parsing app notes. Bug 918386 needs to be solved before we can make any driver version queries in a really useful way.

The app notes facet of https://crash-stats.mozilla.com/search/?signature=~CDevice%3A%3ADriverInternalErrorCB%28long%29&_facets=app_notes&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform looks somewhat fancy, but I just guess that the things looking like 4-field version numbers are graphics driver versions. That's all I can give you at this point.

Rank 	App notes value 	Count 	%
13 	10.18.10.3412 	723 	23.00 %
14 	10.18.10.3345 	630 	20.04 %
30 	10.18.10.3496 	277 	8.81 %
31 	10.18.10.3355 	225 	7.16 %
32 	10.18.10.3277 	213 	6.78 %
33 	8.15.10.1749 	172 	5.47 %
34 	10.18.10.3325 	157 	5.00 %
36 	10.18.10.3304 	147 	4.68 %
40 	10.18.10.3266 	97 	3.09 %
43 	10.18.10.3308 	81 	2.58 %
49 	10.18.10.3621 	55 	1.75 %

Those are all in the top 50 results of this fancy facet.
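
For anyone who does want to parse app notes by hand, a rough sketch of what that could look like. The first sample string is the app notes line from the report in comment 0. The second is a made-up string for illustration, and the helper names are hypothetical. The version regex simply encodes the guess above that anything shaped like a 4-field version number is a graphics driver version.

```python
import re
from collections import Counter

# App notes copied from the crash report in comment 0.
REPORT_NOTES = ("AdapterVendorID: 0x1002, AdapterDeviceID: 0x689c, "
                "AdapterSubsysID: 034a1043, AdapterDriverVersion: 9.0.0.0")
# Hypothetical second sample, only here to exercise the tally.
MADE_UP_NOTES = "AdapterVendorID: 0x8086, AdapterDriverVersion: 10.18.10.3412"

FIELD_RE = re.compile(r"(Adapter\w+):\s*([^,\s]+)")
VERSION_RE = re.compile(r"\b\d+\.\d+\.\d+\.\d+\b")

def parse_app_notes(notes):
    """Map each 'AdapterXxx: value' pair in an app notes string to a dict."""
    return dict(FIELD_RE.findall(notes))

def tally_driver_versions(all_notes):
    """Count every 4-field version number seen across many app notes."""
    return Counter(v for notes in all_notes for v in VERSION_RE.findall(notes))

print(parse_app_notes(REPORT_NOTES))
print(tally_driver_versions([REPORT_NOTES, MADE_UP_NOTES, MADE_UP_NOTES]))
```

This only groups by version string. It cannot tell which adapter a version belongs to on machines with switchable graphics.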
Flags: needinfo?(kairo)
Most of the crashing versions have been around for a while.

    Timestamp:        Tue Aug 20 11:05:12 2013 (5212A4A8)
    File version:     10.18.10.3277

    Timestamp:        Tue Oct 29 10:58:28 2013 (526EDE04)
    File version:     10.18.10.3345

    Timestamp:        Thu Jan 23 11:34:22 2014 (52E0476E)
    File version:     10.18.10.3412

    Timestamp:        Sat Mar 08 05:10:36 2014 (5319EF7C)
    File version:     10.18.10.3496

    Timestamp:        Sat May 17 16:20:39 2014 (5376E397)
    File version:     10.18.10.3621
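
The hex value in parentheses after each timestamp can be decoded independently, assuming it is the module's build time in seconds since the Unix epoch (which is how debugger module listings normally print it). A small sketch with a hypothetical helper name:

```python
from datetime import datetime, timezone

# Hex timestamps and file versions copied from the listing above.
DRIVER_BUILDS = {
    "10.18.10.3277": "5212A4A8",
    "10.18.10.3345": "526EDE04",
    "10.18.10.3412": "52E0476E",
    "10.18.10.3496": "5319EF7C",
    "10.18.10.3621": "5376E397",
}

def build_time_utc(hex_stamp):
    """Interpret a hex string as Unix seconds and return a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)

for version, stamp in DRIVER_BUILDS.items():
    print(version, build_time_utc(stamp).isoformat())
```

The wall-clock times in the listing appear to be in the local time zone of whoever ran the debugger, so the UTC values printed here can differ from them by a fixed offset.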
ok, let's approach this from the other side: what is the error, and why is it an exception instead of a normal D3D error result?

If I'm reading ID3D10Device correctly, there is a Get/SetExceptionMode pair which determines whether errors cause structured exceptions (crashes) or whether the method should return a failure.

The "long" passed to DriverInternalErrorCB is 0x8876017c, which is the HRESULT for D3DERR_OUTOFVIDEOMEMORY.
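
To make the HRESULT reading above easy to check, here is a small sketch that splits a 32-bit HRESULT into its failure bit, facility and code. The function name is hypothetical. The 0x1FFF facility mask follows the HRESULT_FACILITY macro in winerror.h, and the facility values 0x876 (Direct3D) and 0x87A (DXGI) are what the two codes in this bug decode to.

```python
def decode_hresult(hr):
    """Split a 32-bit HRESULT into (failed, facility, code)."""
    hr &= 0xFFFFFFFF           # treat negative signed longs as unsigned
    failed = bool(hr >> 31)    # top bit set means failure
    facility = (hr >> 16) & 0x1FFF
    code = hr & 0xFFFF
    return failed, facility, code

# 0x8876017C is the value passed to DriverInternalErrorCB in this crash;
# 0x887A0020 is DXGI_ERROR_DRIVER_INTERNAL_ERROR.
for value in (0x8876017C, 0x887A0020):
    failed, facility, code = decode_hresult(value)
    print(f"{value:#010x}: failed={failed} facility={facility:#x} code={code}")
```

Code 380 in the Direct3D facility is the D3DERR_OUTOFVIDEOMEMORY value mentioned above.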

This is the stack according to breakpad:

0 	KERNELBASE.dll 	RaiseException 	
1 	d3d10_1core.dll 	CDevice::DriverInternalErrorCB(long) 	
2 	d3d10_1core.dll 	CDevice::UMSetError_(D3D10DDI_HRTCORELAYER,long) 	
3 	igd10iumd32.dll 	igd10iumd32.dll@0x75bc9 	
4 	d3d10_1core.dll 	CBaseResource<ID3D10Texture1D,6>::FinalConstruct(D3D10DDIARG_CREATERESOURCE const &,SD3D10SharedResourceCreationArgs const *,D3D10DDI_HRTRESOURCE)

This is the stack in MSVC:

 	KERNELBASE.dll!_RaiseException@16()	Unknown
>	d3d10_1core.dll!CDevice::DriverInternalErrorCB(long)	Unknown
 	d3d10_1core.dll!CDevice::UMSetError_(struct D3D10DDI_HRTCORELAYER,long)	Unknown
 	d3d10_1core.dll!CBaseResource<struct ID3D10Texture1D,3>::FinalConstruct(struct D3D10DDIARG_CREATERESOURCE const &,struct SD3D10SharedResourceCreationArgs const *,struct D3D10DDI_HRTRESOURCE)	Unknown
 	d3d10_1core.dll!CTexture2D<6>::FinalConstruct(struct STexture2DConstructorArgs const &)	Unknown

The disassembly of DriverInternalErrorCB itself is:

CDevice::DriverInternalErrorCB:
6C94A7B5  mov         edi,edi  
6C94A7B7  push        ebp  
6C94A7B8  mov         ebp,esp  
6C94A7BA  push        ecx  
6C94A7BB  push        ecx  
6C94A7BC  inc         dword ptr ds:[6C9724ECh]  
6C94A7C2  push        esi  
6C94A7C3  mov         esi,ecx  
6C94A7C5  mov         eax,dword ptr [esi+11C0h]  
6C94A7CB  mov         ecx,dword ptr [eax]  
6C94A7CD  push        887A0020h   # This is DXGI_ERROR_DRIVER_INTERNAL_ERROR
6C94A7D2  push        eax  
6C94A7D3  call        dword ptr [ecx+28h]  # I'm guessing but this is probably an internal SetDeviceRemovedReason call
6C94A7D6  test        byte ptr [esi+13ECh],1  # Also guessing that this tests the internal exception-mode flag
6C94A7DD  pop         esi  
6C94A7DE  je          CDevice::DriverInternalErrorCB+56h (6C94A80Bh)  
6C94A7E0  mov         eax,dword ptr [ebp+8]  
6C94A7E3  push        6C941F78h  
6C94A7E8  mov         dword ptr [ebp-8],887A0020h  
6C94A7EF  mov         dword ptr [ebp-4],eax  
6C94A7F2  call        dword ptr ds:[6C94104Ch]  
6C94A7F8  lea         eax,[ebp-8]  
6C94A7FB  push        eax  
6C94A7FC  push        2  
6C94A7FE  push        0  
6C94A800  push        879h  
6C94A805  call        dword ptr ds:[6C94106Ch]  
6C94A80B  leave  

Is the default to throw an exception or not? I can't tell from the docs, but perhaps we should just test or set it explicitly when we create a device?
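
A rough Python model of the control flow the annotated disassembly seems to describe. This is not Direct3D code. The class and method names are hypothetical, and the meaning of the tested bit (the exception-mode flag) is the guess made in the annotations above, not something confirmed by documentation.

```python
DXGI_ERROR_DRIVER_INTERNAL_ERROR = 0x887A0020
EXCEPTION_CODE = 0x879  # matches the "Crash Reason" in the reports

class StructuredException(Exception):
    """Stand-in for the structured exception raised via RaiseException."""
    def __init__(self, code, flags, arguments):
        super().__init__(f"exception {code:#x}")
        self.code = code
        self.flags = flags
        self.arguments = arguments

class ModelDevice:
    def __init__(self, raise_flags):
        self.raise_flags = raise_flags   # guessed: bit 0 = raise on driver error
        self.removed_reason = None

    def set_exception_mode(self, raise_flags):
        self.raise_flags = raise_flags

    def driver_internal_error_cb(self, hr):
        # Always record the device-removed reason first.
        self.removed_reason = DXGI_ERROR_DRIVER_INTERNAL_ERROR
        # Only raise when the exception-mode bit is set.
        if self.raise_flags & 1:
            raise StructuredException(
                EXCEPTION_CODE, 0, (DXGI_ERROR_DRIVER_INTERNAL_ERROR, hr))

device = ModelDevice(raise_flags=1)
try:
    device.driver_internal_error_cb(0x8876017C)
except StructuredException as exc:
    print("raised", hex(exc.code), [hex(a) for a in exc.arguments])

device.set_exception_mode(0)
device.driver_internal_error_cb(0x8876017C)  # returns; device is marked removed
print("removed reason", hex(device.removed_reason))
```

If the model is right, clearing the flag when the device is created would turn the crash into a lost device that the caller has to detect and handle.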
Flags: needinfo?(jmuizelaar)
We could try setting SetExceptionMode(0) and hopefully it would just cause the device to be lost. It seems like perhaps we're leaking memory though? Can we get an about:memory dump from just before it happens?
Flags: needinfo?(jmuizelaar) → needinfo?(sdl)
To be clear we appear to be running out of video-card memory: Firefox process memory appears to be fine, at least in the sample of reports I looked at.
I believe we try to account for the video memory that we use on Windows in about:memory. The GPU drivers will page video memory into our address space. Could it be that we've run out of address space for those allocations?
Just an update: this is the #6 topcrash for Firefox 31.0b1 for the last week, with 1513/29222 crashes.
Updating the title since the current incarnation of this crash signature is related to Intel cards.

Rank 	Adapter vendor id 	Count 	%
1 	0x8086 	5496 	98.78 %
Summary: crash in CDevice::DriverInternalErrorCB mainly with AMD GPUs → crash in CDevice::DriverInternalErrorCB mainly with Intel GPUs
Attached patch assert-and-SetExceptionMode (obsolete) — Splinter Review
Attachment #8441514 - Flags: review?(jmuizelaar)
Comment on attachment 8441514 [details] [diff] [review]
assert-and-SetExceptionMode

Review of attachment 8441514 [details] [diff] [review]:
-----------------------------------------------------------------

Let's give it a try
Attachment #8441514 - Flags: review?(jmuizelaar) → review+
It appears that the default may be to crash, at least in some cases: a try run of this patch hits the assertion in Factory::SetDirect3D10Device, which indicates that the exception mode is nonzero at that point.
The failures were all on win7, not win8. Does this mean that win8 isn't hitting this codepath, or has a different default?
Assignee: nobody → benjamin
Attachment #8441514 - Attachment is obsolete: true
Attachment #8442074 - Flags: review?(jmuizelaar)
Attachment #8442074 - Flags: review?(jmuizelaar) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/f82b53676bd1
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
This was not seen on Nightly after build ID 20140619030203 yet, so I think it's good. Benjamin, can this please be uplifted to aurora and beta?
Flags: needinfo?(benjamin)
Comment on attachment 8442074 [details] [diff] [review]
assert-and-SetExceptionMode, v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): not sure, but probably a driver update
User impact if declined: More crashes
Testing completed (on m-c, etc.): landed, checked crash stats
Risk to taking this patch (and alternatives if risky): This is very low risk for release builds. Debug builds have an extra assertion which might be triggered, but that's good diagnostics.
String or IDL/UUID changes made by this patch: None
Attachment #8442074 - Flags: approval-mozilla-beta?
Attachment #8442074 - Flags: approval-mozilla-aurora?
Flags: needinfo?(benjamin)
Attachment #8442074 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8442074 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I don't see any crashes after the 20140626181429 build for Firefox 31. (Yet.)
(In reply to Liz Henry :lizzard from comment #31)
> I don't see any crashes after the 20140626181429 build for Firefox 31. (Yet.)

Because there is no newer build out there (yet).
(In reply to Liz Henry :lizzard from comment #31)
> I don't see any crashes after the 20140626181429 build for Firefox 31. (Yet.)

That said, the patch landed before this beta was created and should be in it. The crash signature also fell from #4 (4%) to #11 (1.2%) on the topcrash list.

Are the remaining cases a different cause?
We don't know what's causing the driver failures. This patch only fixed the default crash-on-error behavior of D3D; if some code is resetting the crash-on-error behavior after device initialization, we could still see the crash.

The patch here added an additional debug assert, so it would be good to get people who are still seeing this to run debug builds and report if they see any assertions.
Commenting due to needinfo request. Fortunately (for me at least!) I'm no longer experiencing this particular crash. At some point in the past I must have re-enabled the hardware acceleration option (assuming an update didn't do it automatically) but I've also updated my graphics drivers several times since my original comments; I tend to keep them reasonably up-to-date. I suspect I re-enabled it at some point after a graphics driver update to see if the issue was resolved, and seemingly, it must have been.

I can tell you that I don't see this crash using AMD Catalyst Software Suite v14.4 for Windows 7 x64. My particular graphics hardware consists of 2xRadeon HD 6970's in CrossFireX. Hope this is of some limited help.
Flags: needinfo?(sdl)
I just wanted to report that the volume of this crash has dropped on Beta. It is currently down 5 positions to #10, accounting for 1.85% of our crashes over the last week (down 1.43%).
7-day Stats
===========
Firefox 31 - 3355/3954 crashes are with builds after this landed (Ranked #9, down to 1.35%)
Firefox 32 - 26/32 crashes are with builds after this landed (Ranked #50, down to 0.04%)
Firefox 33 - only 1 crash reported, with 2014-06-16 build

Based on the stats and the report in comment 35 I think we can assume this is fixed, or at least mitigated. Volume is still kind of high on Beta but I'm not sure what can be done about that. We should still keep this on our radar as something to watch when we release Firefox 31 as it might require some user outreach but I'm marking this verified fixed.