Closed Bug 830531 Opened 11 years ago Closed 4 months ago

[Win8] crash in XPC_WN_Helper_NewResolve mainly with AMD Radeon HD 6290/6310/6320/7290/7310/7340 (Wrestler Asic)

Product: Core :: JavaScript Engine
Type: defect
Version: 19 Branch
Platform: x86, Windows 8
Status: RESOLVED INCOMPLETE
Tracking Status: firefox19 + affected, firefox20 + affected
Reporter: scoobidiver
Assignee: Unassigned
References: Blocks 1 open bug
Keywords: crash, regression
Whiteboard: [Win8][qa-not-actionable]
Attachments: 3 files

It first showed up in 20.0a2/20130113 and is currently #8 top browser crasher in Aurora (high for a crash specific to Windows 8). The regression range is:
http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=4f74542c3678&tochange=cf2ccc84268f

The stack trace usually looks like:
Frame 	Module 	Signature 	Source
0 	KERNELBASE.dll 	TlsGetValue 	
1 		@0x3757fe04

But there are a few crashes with a better one:
Signature 	TlsGetValue
UUID	aa5e3fda-7925-40d1-a3f1-bbe232130114
Date Processed	2013-01-14 22:14:41
Uptime	21449
Last Crash	7.8 hours before submission
Install Age	8.2 hours since version was first installed.
Install Time	2013-01-14 14:01:27
Product	Firefox
Version	20.0a2
Build ID	20130113042017
Release Channel	aurora
OS	Windows NT
OS Version	6.2.9200
Build Architecture	x86
Build Architecture Info	AuthenticAMD family 20 model 2 stepping 0
Crash Reason	EXCEPTION_BREAKPOINT
Crash Address	0x7501be04
App Notes 	
AdapterVendorID: 0x1002, AdapterDeviceID: 0x9809, AdapterSubsysID: 00000000, AdapterDriverVersion: 8.982.7.0
D3D10 Layers? D3D10 Layers- D3D9 Layers? D3D9 Layers+ 
EMCheckCompatibility	True
Adapter Vendor ID	0x1002
Adapter Device ID	0x9809
Total Virtual Memory	4294836224
Available Virtual Memory	3831201792
System Memory Use Percentage	51
Available Page File	2244403200
Available Physical Memory	1862819840

Frame 	Module 	Signature 	Source
0 	KERNELBASE.dll 	TlsGetValue 	
1 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:391
2 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:439
3 	mozjs.dll 	js::GetPropertyOperation 	js/src/jsinterpinlines.h:279
4 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:2235
5 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:348
6 	mozjs.dll 	UncachedInlineCall 	js/src/methodjit/InvokeHelpers.cpp:372
7 	mozjs.dll 	js::mjit::stubs::UncachedCallHelper 	js/src/methodjit/InvokeHelpers.cpp:460
8 	mozjs.dll 	js::mjit::CallCompiler::update 	js/src/methodjit/MonoIC.cpp:1236
9 	mozjs.dll 	js::mjit::ic::Call 	js/src/methodjit/MonoIC.cpp:1317
10 	mozjs.dll 	js::mjit::JaegerShot 	js/src/methodjit/MethodJIT.cpp:1117
11 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:2419
12 	mozjs.dll 	JS_DHashTableOperate 	js/src/jsdhash.cpp:581
13 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:406
14 	mozjs.dll 	js::Invoke 	js/src/jsinterp.cpp:439
15 	mozjs.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:5805
16 	xul.dll 	nsXPCWrappedJSClass::CallMethod 	js/xpconnect/src/XPCWrappedJSClass.cpp:1432
17 	xul.dll 	nsXPCWrappedJS::CallMethod 	js/xpconnect/src/XPCWrappedJS.cpp:580
18 	xul.dll 	PrepareAndDispatch 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:85
19 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:112
20 	xul.dll 	nsBrowserStatusFilter::OnStateChange 	toolkit/components/statusfilter/nsBrowserStatusFilter.cpp:150
21 	xul.dll 	nsDocLoader::DoFireOnStateChange 	uriloader/base/nsDocLoader.cpp:1305
22 	xul.dll 	nsDocLoader::doStopDocumentLoad 	uriloader/base/nsDocLoader.cpp:896
23 	xul.dll 	nsDocLoader::DocLoaderIsEmpty 	uriloader/base/nsDocLoader.cpp:775
24 	xul.dll 	nsDocLoader::OnStopRequest 	uriloader/base/nsDocLoader.cpp:659
25 	xul.dll 	nsLoadGroup::RemoveRequest 	netwerk/base/src/nsLoadGroup.cpp:676
26 	xul.dll 	nsLoadGroup::QueryInterface 	netwerk/base/src/nsLoadGroup.cpp:155
27 	xul.dll 	nsDocument::UnblockOnload 	content/base/src/nsDocument.cpp:7322
28 	xul.dll 	nsRunnableMethodImpl<void 	obj-firefox/dist/include/nsThreadUtils.h:367
29 	nspr4.dll 	_MD_CURRENT_THREAD 	nsprpub/pr/src/md/windows/w95thred.c:312
30 	nspr4.dll 	PR_Unlock 	nsprpub/pr/src/threads/combined/prulock.c:315
31 	xul.dll 	nsTimerImpl::Cancel 	xpcom/threads/nsTimerImpl.cpp:337
32 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:208
33 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:182
34 	xul.dll 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:163
35 	xul.dll 	nsAppShell::Run 	widget/windows/nsAppShell.cpp:232
36 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:288
37 	xul.dll 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3823
38 	xul.dll 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3890
39 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:4093

More reports at:
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A20.0a2&signature=TlsGetValue
URLs:
9 	https://www.facebook.com/
4 	https://www.facebook.com/messages/mozhgan.golzarian
3 	https://www.facebook.com/marissa.oneal2012/friends?ft_ref=mni
2 	about:blank
2 	https://www.facebook.com/cher0308?ref=tn_tnmn
2 	http://forum.paradoxplaza.com/forum/showthread.php?657612-A-Federation-of-quot-E
2 	http://forum.paradoxplaza.com/forum/showthread.php?657612-A-Federation-of-quot-E
...and a longer list of random pages with a single hit.

There doesn't seem to be any notable connection with URLs here.

Correlations:

Modules:
100% (51/51) vs.   9% (180/1920) bcryptPrimitives.dll
100% (51/51) vs.  10% (184/1920) WINMMBASE.dll
100% (51/51) vs.  10% (184/1920) combase.dll
100% (51/51) vs.  10% (184/1920) SHCore.dll
98% (50/51) vs.  11% (217/1920) winhttp.dll
98% (50/51) vs.  13% (246/1920) aticfx32.dll
92% (47/51) vs.   7% (141/1920) atiu9pag.dll
92% (47/51) vs.   8% (146/1920) atiumdva.dll
92% (47/51) vs.   8% (159/1920) atiumdag.dll

(nothing that interesting in add-on correlations, though)
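
To rank the module correlations above, here is a minimal sketch that parses lines in the "X% (a/b) vs. Y% (c/d) module" shape and computes how over-represented each module is among crashes with this signature. The helper names and the "lift" ratio are illustrative assumptions, not something crash-stats provides.

```python
import re

# Matches lines such as "98% (50/51) vs.  13% (246/1920) aticfx32.dll".
LINE_RE = re.compile(
    r"^\s*\d+%\s+\((\d+)/(\d+)\)\s+vs\.\s+\d+%\s+\((\d+)/(\d+)\)\s+(\S+)\s*$"
)

def parse_correlation(line):
    """Return (module, rate among crashes with the signature, rate among all crashes)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("not a correlation line: %r" % line)
    sig_hits, sig_total, all_hits, all_total = (int(g) for g in match.groups()[:4])
    return match.group(5), sig_hits / sig_total, all_hits / all_total

def lift(line):
    """Return (module, how many times more common it is with this signature)."""
    module, sig_rate, base_rate = parse_correlation(line)
    return module, sig_rate / base_rate

# Two rows copied from the list above.
rows = [
    "100% (51/51) vs.   9% (180/1920) bcryptPrimitives.dll",
    "98% (50/51) vs.  13% (246/1920) aticfx32.dll",
]
for row in rows:
    module, ratio = lift(row)
    print("%-22s %.1fx" % (module, ratio))
```

A high ratio only says the module is loaded far more often in these crashes than in crashes overall; it does not by itself say the module is at fault.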
Keywords: needURLs
There are no crashes in 20.0a2/20130114 and above, so it was likely caused by two related patches that landed in Aurora one day apart.
Let's wait a few days to confirm it's gone.
It was a one-day spike.
Status: NEW → RESOLVED
Closed: 11 years ago
Keywords: steps-wanted
Resolution: --- → WORKSFORME
Back in 20.0a2/20130117. It seems PGO related.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
David/Naveed - I know how much you love spikey PGO-related crashes :)

Is there any hardening or disabling of PGO around this code that we can perform?
It's currently #77 top browser crasher in 20.0a2.
Keywords: topcrash
It spiked again in 20.0a2/20130207.
is firefox 19 really unaffected?
(In reply to philipp from comment #9)
> is firefox 19 really unaffected?
It's indeed #1 top browser crasher in 19.0 while it was a very low volume crash in 19.0b6.
Based on comments, Firefox 19 is unusable on Windows 8.
Severity: critical → blocker
Crash Signature: [@ TlsGetValue] → [@ TlsGetValue] [@ InterlockedIncrement]
Summary: [Win8] crash in TlsGetValue → [Win8] crash in XPC_WN_Helper_NewResolve
Keywords: topcrash
Version: 20 Branch → 19 Branch
first support question is coming in - any useful information we can gather or troubleshooting to suggest? https://support.mozilla.org/en-US/questions/950825
(In reply to philipp from comment #14)
> any useful information we can gather or troubleshooting to suggest?
The workaround is to downgrade to 18.0.2.


With combined signatures, it accounts for 57% of all crashes.

More reports also at:
https://crash-stats.mozilla.com/report/list?signature=XPC_WN_Helper_NewResolve
Crash Signature: [@ TlsGetValue] [@ InterlockedIncrement] → [@ TlsGetValue] [@ InterlockedIncrement] [@ XPC_WN_Helper_NewResolve]
Crash Signature: [@ TlsGetValue] [@ InterlockedIncrement] [@ XPC_WN_Helper_NewResolve] → [@ TlsGetValue] [@ InterlockedIncrement] [@ XPC_WN_Helper_NewResolve] [@ @0x2b]
(In reply to Scoobidiver from comment #11)
> (In reply to philipp from comment #9)
> > is firefox 19 really unaffected?
> It's indeed #1 top browser crasher in 19.0 while it was a very low volume
> crash in 19.0b6.
> Based on comments, Firefox 19 is unusable on Windows 8.

Unusable on Win8 for affected users. Let's hold judgement until we get some data around # of unique affected users.

Scoobidiver/KaiRo - are there any high correlations for this crash signature? We'll ask QA to test Win8 until we have more leads
Flags: needinfo?(kairo)
It's hard to tell, but this looks like an xpconnect problem. The crash happens when constructing an XPCCallContext. My guess is that it calls nsXPConnect::GetXPConnect(), which calls NS_IsMainThread(), and that calls this:

http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsThreadUtils.cpp#129

However, it's a bit hard to know if that's where the crash is happening.

The code that's triggering this is a release-mode assertion. However, it looks like it's not the assertion itself that's crashing us--it's the act of checking the assertion condition. Also, the fact that it's sporadic across versions suggests that it's not an assertion firing.

To relieve the crash, we could try removing the assertion. It might just cause us to crash elsewhere, but it might fix the problem.
Taking QA Contact to help coordinate any testing necessary.
(In reply to Bill McCloskey (:billm) from comment #17)
> It's hard to tell, but this looks like an xpconnect problem. The crash
> happens when constructing an XPCCallContext. My guess is that it calls
> nsXPConnect::GetXPConnect(), which calls NS_IsMainThread(), and that calls
> this:
> 
> http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsThreadUtils.
> cpp#129
> 
> However, it's a bit hard to know if that's where the crash is happening.
> 
> The code that's triggering this is a release-mode assertion. However, it
> looks like it's not the assertion itself that's crashing us--it's the act of
> checking the assertion condition. Also, the fact that it's sporadic across
> versions suggests that it's not an assertion firing.
> 
> To relieve the crash, we could try removing the assertion. It might just
> cause us to crash elsewhere, but it might fix the problem.

Is it possible to glean anything about why TlsGetValue is crashing? If we could tell that the segment registers are bogus that would be big. If we can't, maybe the TLS is corrupt or something but that'd be harder to tell without a full crash dump.
So far I tested on two different Windows 8 machines and have not yet been able to reproduce the crash. If you need the machine specs I can provide them.
Something weird is going on, and I don't know what to make of it.

(1) If you look at the second thread in these crash reports - either on crash-stats or in Visual Studio - it's breakpad. I've never seen this before. Is that normal? Could it be that breakpad is either masking the fault with another one, or somehow misreporting or double-crashing?

To make things weirder, the crashing thread is totally unreadable to Visual Studio. WinDbg seems to be okay. 

(2) The crash as seen by WinDbg is:

> 0:000> .ecxr
> eax=00000022 ebx=05dc5400 ecx=00000000 edx=00000022 esi=768c5895 edi=2d6284e0
> eip=746abe04 esp=00cdc744 ebp=00cdc758 iopl=0         nv up ei pl nz na po nc
> cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200202
> KERNELBASE!TlsGetValue+0x4:
> 746abe04 ec              in      al,dx
> 
> 0:000> u 746abe00
> KERNELBASE!TlsGetValue:
> 746abe00 8bff            mov     edi,edi
> 746abe02 55              push    ebp
> 746abe03 8bec            mov     ebp,esp
> 746abe05 648b0d18000000  mov     ecx,dword ptr fs:[18h]

So, if the breakpad information is correct, the main thread jumped to some random address inside TlsGetValue.
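
For anyone following the register dump, here is a minimal sketch, using only the addresses and bytes from the listing above, of where eip=746abe04 falls relative to the instruction boundaries. The table and the helper name `locate` are illustrative.

```python
# Instruction start addresses and encodings copied from the `u 746abe00` listing.
LISTING = [
    (0x746ABE00, bytes.fromhex("8bff")),            # mov edi,edi
    (0x746ABE02, bytes.fromhex("55")),              # push ebp
    (0x746ABE03, bytes.fromhex("8bec")),            # mov ebp,esp
    (0x746ABE05, bytes.fromhex("648b0d18000000")),  # mov ecx,dword ptr fs:[18h]
]

def locate(eip):
    """Return (instruction start, offset into that instruction, byte at eip)."""
    for start, encoding in LISTING:
        if start <= eip < start + len(encoding):
            return start, eip - start, encoding[eip - start]
    raise ValueError("address is outside the listed instructions")

start, offset, byte = locate(0x746ABE04)
print(hex(start), offset, hex(byte))
```

This places eip one byte into `mov ebp,esp` (8b ec) rather than on an instruction boundary; the byte at that address, 0xec, is what WinDbg decodes as `in al,dx`.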

Again, I don't know what to make of this. Next steps might be (1) seeing what's going on with breakpad or (2) seeing if there's a correlation to binary addons.
CCing ted and bsmedberg. Maybe they can figure out what's going on here.
CCing Ioana so she can keep Softvision informed of what's going on here. Ioana, depending on the state of this bug when you get online later tonight, please see if your team can find steps to reproduce.
QA Contact: anthony.s.hughes
(In reply to Bill McCloskey (:billm) from comment #17)
> To relieve the crash, we could try removing the assertion. It might just
> cause us to crash elsewhere, but it might fix the problem.

The release-mode assertion that XPConnect is always used on the main thread is an important one in terms of keeping addons honest. I'd be pretty unhappy about removing it.
Top URLs for TlsGetValue:

127 	about:blank
105 	https://www.facebook.com/
57 	http://www.facebook.com/
38 	about:sessionrestore
21 	about:newtab
21 	about:home
18 	https://mail.google.com/mail/?shva=1#inbox
16 	https://mail.google.com/mail/u/0/?shva=1#inbox
16 	http://www.facebook.com/?ref=tn_tnmn

And here's a breakdown of installations:

breakpad=> SELECT version,
                  COUNT(*) AS crashes,
                  COUNT(DISTINCT client_crash_date - install_age * interval '1 second') AS installations
           FROM reports
           WHERE product = 'Firefox'
             AND signature = 'TlsGetValue'
             AND utc_day_is(date_processed, '2013-02-19')
           GROUP BY version;
  version   | crashes | installations 
------------+---------+---------------
 10.0.12esr |       1 |             1
 17.0.1     |       1 |             1
 18.0.1     |       1 |             1
 18.0.2     |      29 |            29
 19.0       |    2479 |          1792
 3.0b1      |       1 |             1
 3.6        |       2 |             2
 3.6.2      |       1 |             1
 4.0b4      |       1 |             1
 5.0        |       1 |             1
 9.0        |       1 |             1
(11 rows)
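
The "installations" column above is derived rather than stored: the query counts distinct install times, each computed as crash date minus install age. A minimal sketch of that idea follows; the sample rows are hypothetical and not taken from the crash database.

```python
from datetime import datetime, timedelta

def count_installations(reports):
    """Count distinct install times.

    `reports` is an iterable of (client_crash_date, install_age_seconds) pairs,
    mirroring the two columns used in the query above.
    """
    install_times = {
        crash_date - timedelta(seconds=install_age)
        for crash_date, install_age in reports
    }
    return len(install_times)

# Hypothetical rows for illustration only.
sample = [
    (datetime(2013, 2, 19, 10, 0, 0), 3600),   # installed at 09:00
    (datetime(2013, 2, 19, 11, 30, 0), 9000),  # also installed at 09:00
    (datetime(2013, 2, 19, 12, 0, 0), 600),    # installed at 11:50
]
print(count_installations(sample))

# Crashes per installation for 19.0, from the table above.
print(round(2479 / 1792, 2))
```

Two installations that happen to share the same install second would be counted once, so the figure is an approximation of unique installs.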
Flags: needinfo?(kairo)
(In reply to David Anderson [:dvander] from comment #21)
> So, if the breakpad information is correct, the main thread jumped to some
> random address inside TlsGetValue.

"jumped to some random address" reminds me of bug 839270 - do we know what graphics card this is happening on?
(In reply to David Anderson [:dvander] from comment #21)
> Something weird is going on, and I don't know what to make of it.
> 
> (1) If you look at the second thread in these crash reports - either on
> crash-stats or in Visual Studio - it's breakpad. I've never seen this
> before. Is that normal? Could it be that breakpad is either masking the
> fault with another one, or somehow misreporting or double-crashing?

Do you have an example report that shows this? I clicked through a few reports and didn't see what you were talking about. Note that the minidumps always include the Breakpad thread that does the dump writing (the minidump-writing code includes all threads), but Breakpad includes a special stream with the thread ID of that thread so it knows to skip it while printing stack traces. It's possible that information is simply missing. If you can show me an example I can tell for sure. 

> To make things weirder, the crashing thread is totally unreadable to Visual
> Studio. WinDbg seems to be okay. 
> 
> (2) The crash as seen by WinDbg is:
> 
> > 0:000> .ecxr
> > eax=00000022 ebx=05dc5400 ecx=00000000 edx=00000022 esi=768c5895 edi=2d6284e0
> > eip=746abe04 esp=00cdc744 ebp=00cdc758 iopl=0         nv up ei pl nz na po nc
> > cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200202
> > KERNELBASE!TlsGetValue+0x4:
> > 746abe04 ec              in      al,dx
> > 
> > 0:000> u 746abe00
> > KERNELBASE!TlsGetValue:
> > 746abe00 8bff            mov     edi,edi
> > 746abe02 55              push    ebp
> > 746abe03 8bec            mov     ebp,esp
> > 746abe05 648b0d18000000  mov     ecx,dword ptr fs:[18h]
> 
> So, if the breakpad information is correct, the main thread jumped to some
> random address inside TlsGetValue.

Generally the exception record is pretty reliable. This comes directly from Windows' EXCEPTION_POINTERS data, so I tend to believe it. However, that doesn't mean that anything following it is reliable--obviously the register state could be corrupted in myriad ways. Unfortunately it's really hard to figure out the root cause after-the-fact when something like that happens.

I've downloaded a few dumps, I'll take a look at them tomorrow.
Depends on: 842855
https://bugzilla.mozilla.org/show_bug.cgi?id=842855#c4 has the reasoning for temporarily disabling updates for Win8 users tomorrow (morning?) instead of today. I'd like us to get more data.
Most of the individual reports I looked at had AMD.

(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #26)
> (In reply to David Anderson [:dvander] from comment #21)
> > So, if the breakpad information is correct, the main thread jumped to some
> > random address inside TlsGetValue.
> 
> "jumped to some random address" reminds me of bug 839270 - do we know what
> graphics card this is happening on?
No longer depends on: 842855
Depends on: 842855
Cc:ing Andrew who might have some insight here. If anyone needs access to crash dumps here, I'm happy to help...
(In reply to Marcia Knous [:marcia] from comment #29)
> Most of the individual reports I looked at had AMD.
It's the right lead to follow. It's even restricted to the following device IDs: 0x9802, 0x9806, 0x9807, 0x9808, 0x9809, 0x980a (see http://developer.amd.com/resources/hardware-drivers/ati-catalyst-pc-vendor-id-1002-li/ for the matching GPUs).

It looks like a variant of bug 839270, but in the XPConnect component and with more device IDs.

D2D and D3D9 are disabled because of bug 840161.
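
The device-ID restriction noted above can be checked mechanically against the App Notes field of a crash report. This is a minimal sketch; the function names are made up for illustration, and the sample string is the App Notes line from the report quoted at the top of this bug.

```python
import re

AMD_VENDOR_ID = 0x1002
# Device IDs listed in this comment.
AFFECTED_DEVICE_IDS = {0x9802, 0x9806, 0x9807, 0x9808, 0x9809, 0x980A}

def adapter_ids(app_notes):
    """Extract (vendor_id, device_id) from an App Notes string, or None if absent."""
    vendor = re.search(r"AdapterVendorID:\s*(0x[0-9a-fA-F]+)", app_notes)
    device = re.search(r"AdapterDeviceID:\s*(0x[0-9a-fA-F]+)", app_notes)
    if vendor is None or device is None:
        return None
    return int(vendor.group(1), 16), int(device.group(1), 16)

def is_affected_adapter(app_notes):
    """True if the adapter is an AMD device with one of the listed device IDs."""
    ids = adapter_ids(app_notes)
    return ids is not None and ids[0] == AMD_VENDOR_ID and ids[1] in AFFECTED_DEVICE_IDS

notes = ("AdapterVendorID: 0x1002, AdapterDeviceID: 0x9809, "
         "AdapterSubsysID: 00000000, AdapterDriverVersion: 8.982.7.0")
print(is_affected_adapter(notes))
```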
Summary: [Win8] crash in XPC_WN_Helper_NewResolve → [Win8] crash in XPC_WN_Helper_NewResolve mainly with AMD Radeon HD 6290/6310/6320/7290/7310/7340
Crash Signature: [@ TlsGetValue] [@ InterlockedIncrement] [@ XPC_WN_Helper_NewResolve] [@ @0x2b] → [@ TlsGetValue] [@ InterlockedIncrement] [@ XPC_WN_Helper_NewResolve] [@ @0x2b] [@ nsXPConnect::GetXPConnect()]
FWIW, I dug hard into one of these crashes:

https://crash-stats.mozilla.com/report/index/d1805cc2-9606-4def-8ecb-64ec92130219

The crash is happening under this line:

http://hg.mozilla.org/releases/mozilla-release/file/20238b786063/js/xpconnect/src/nsXPConnect.cpp#l139

In the debugger, the inlined call to TlsGetValue looks like this:

   136:     // Do a release-mode assert that we're not doing anything significant in
   137:     // XPConnect off the main thread. If you're an extension developer hitting
   138:     // this, you need to change your code. See bug 716167.
   139:     if (!MOZ_LIKELY(NS_IsMainThread() || NS_IsCycleCollectorThread()))
03401857  mov         eax,dword ptr ds:[10E63428h]  
0340185C  push        esi  
0340185D  mov         esi,dword ptr ds:[109BF4E8h]  
03401863  push        eax  
03401864  call        esi  <-- crash under this call

The disassembled version of this looks like this:

  10171857: A1 28 34 0B 11     mov         eax,dword ptr [?gTLSThreadIDIndex@@3KA]
  1017185C: 56                 push        esi
  1017185D: 8B 35 E8 F4 C0 10  mov         esi,dword ptr [__imp__TlsGetValue@4]
  10171863: 50                 push        eax
  10171864: FF D6              call        esi

The correct relocated value of __imp__TlsGetValue@4 is 0x03e9f4e8.

So, just like the prior bugs, we appear to have an incorrect relocation or a 2-byte memory corruption. This is not a code bug.
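
The arithmetic behind that conclusion can be reproduced from the two listings alone. This is a minimal sketch, assuming the second listing shows addresses as linked (before relocation) and the first shows what was in memory in the dump; the variable names are illustrative.

```python
# Address of the `mov eax, ...` instruction in each listing.
RUNTIME_CODE_ADDR = 0x03401857  # debugger view
LINKED_CODE_ADDR = 0x10171857   # disassembly view

def operand(hex_bytes):
    """Decode a little-endian 32-bit operand from its instruction bytes."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")

# Operands as encoded in the disassembly (A1 28 34 0B 11 and 8B 35 E8 F4 C0 10).
linked = {
    "gTLSThreadIDIndex": operand("28340B11"),
    "__imp__TlsGetValue@4": operand("E8F4C010"),
}
# Operands as seen in the debugger.
observed = {
    "gTLSThreadIDIndex": 0x10E63428,
    "__imp__TlsGetValue@4": 0x109BF4E8,
}

# Relocation delta implied by where the code itself ended up.
delta = LINKED_CODE_ADDR - RUNTIME_CODE_ADDR
expected = {name: addr - delta for name, addr in linked.items()}
# Delta that would explain the operand values actually observed.
applied = {name: linked[name] - observed[name] for name in linked}

print("code delta:", hex(delta))
for name in linked:
    print(name, "expected", hex(expected[name]),
          "observed", hex(observed[name]),
          "applied delta", hex(applied[name]))
```

With these inputs the code delta is 0xcd70000 and the expected import address comes out as 0x3e9f4e8, matching the value quoted above. Both observed operands are instead offset by 0x250000 from their linked values, so they differ from the expected addresses only in the upper 16 bits.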

I would *love* to find a person who can reproduce this and set up some kind of debugging mechanism to actually watch the relocation/corruption happen.
Tried to hunt this bug looking at the comments from crash reports but with no success. Upgraded from 18.0.2 to 19.0, played with gmail, facebook, yahoo, youtube, pdf. 
Using Windows 8 x32, Firefox 19.0 RC, with AMD Radeon HD 6450 (this is the only GPU we have here that is close to the ones related to the summary of the bug).
I have this problem on Windows 8 x64 with AMD HD6320
Win 8 x86, AMD Radeon HD 7700 Series
I updated from Firefox 18.0.2 to 19.0 and used facebook, google services, about:newtab, session restore, etc with HWA both disabled and enabled, but I haven't encountered any crash.
So, here's the correlations:

Modules:
100% (1732/1732) vs.   9% (5527/61447) bcryptPrimitives.dll
100% (1732/1732) vs.   9% (5605/61447) SHCore.dll
100% (1732/1732) vs.   9% (5617/61447) WINMMBASE.dll
100% (1732/1732) vs.   9% (5617/61447) combase.dll
94% (1625/1732) vs.   9% (5433/61447) winhttp.dll
85% (1473/1732) vs.   7% (4567/61447) aticfx32.dll
80% (1383/1732) vs.   6% (3487/61447) atiu9pag.dll
80% (1383/1732) vs.   6% (3692/61447) atiumdva.dll
80% (1383/1732) vs.   6% (3757/61447) atiumdag.dll
96% (1670/1732) vs.  24% (14933/61447) explorerframe.dll
96% (1670/1732) vs.  24% (14943/61447) dui70.dll
99% (1709/1732) vs.  27% (16493/61447) NapiNSP.dll
99% (1708/1732) vs.  27% (16493/61447) pnrpnsp.dll
99% (1709/1732) vs.  27% (16567/61447) nlaapi.dll
100% (1729/1732) vs.  28% (17344/61447) DWrite.dll
100% (1728/1732) vs.  28% (17313/61447) cryptsp.dll
96% (1670/1732) vs.  25% (15179/61447) duser.dll
96% (1656/1732) vs.  26% (16102/61447) FWPUCLNT.DLL
100% (1732/1732) vs.  43% (26508/61447) sspicli.dll
80% (1383/1732) vs.  26% (16076/61447) d3d9.dll
98% (1698/1732) vs.  45% (27577/61447) ntmarta.dll

Nothing interesting in Add-ons.

Cores:
100% (1728/1732) vs.  58% (35899/61447) x86 with 2 cores


So, this is mostly ATI-graphics machines (as ati*.dll are the drivers for those), as we know, and it's all (!) dual-core machines - no single-core, no more-than-dual-core.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #37)
> ... and it's all (!) dual-core machines - no single-core, no
> more-than-dual-core.

Actually, almost all - the 4 other crashes with this signature are on single-core machines.
Here's a few module version correlations, first the Windows ones:

100% (1732/1732) vs.   9% (5527/61447) bcryptPrimitives.dll
0% (0/1732) vs.   0% (35/61447) 6.2.8400.0
100% (1732/1732) vs.   9% (5492/61447) 6.2.9200.16384
100% (1732/1732) vs.   9% (5605/61447) SHCore.dll
0% (0/1732) vs.   0% (53/61447) 6.2.8102.0
0% (0/1732) vs.   0% (37/61447) 6.2.8250.0
0% (0/1732) vs.   0% (35/61447) 6.2.8400.0
21% (356/1732) vs.   2% (1193/61447) 6.2.9200.16384
4% (70/1732) vs.   0% (258/61447) 6.2.9200.16420
75% (1306/1732) vs.   7% (4029/61447) 6.2.9200.16433
100% (1732/1732) vs.   9% (5617/61447) WINMMBASE.dll
0% (0/1732) vs.   0% (53/61447) 6.2.8102.0
0% (0/1732) vs.   0% (37/61447) 6.2.8250.0
0% (0/1732) vs.   0% (35/61447) 6.2.8400.0
100% (1732/1732) vs.   9% (5492/61447) 6.2.9200.16384
100% (1732/1732) vs.   9% (5617/61447) combase.dll
0% (0/1732) vs.   0% (53/61447) 6.2.8102.0
0% (0/1732) vs.   0% (37/61447) 6.2.8250.0
0% (0/1732) vs.   0% (35/61447) 6.2.8400.0
22% (374/1732) vs.   2% (1270/61447) 6.2.9200.16384
78% (1358/1732) vs.   7% (4222/61447) 6.2.9200.16420

I would guess that the 8000 versions are the previews/betas of Win8 and the 16000 ones the release(s), which would make this only happen with the latter.
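
As a quick check of that reading, here is a minimal sketch that sums the crashes with this signature per Windows build number, using the SHCore.dll rows above. The function name is illustrative.

```python
from collections import defaultdict

# (file version, crashes with this signature) copied from the SHCore.dll rows above.
SHCORE_ROWS = [
    ("6.2.8102.0", 0),
    ("6.2.8250.0", 0),
    ("6.2.8400.0", 0),
    ("6.2.9200.16384", 356),
    ("6.2.9200.16420", 70),
    ("6.2.9200.16433", 1306),
]

def crashes_by_build(rows):
    """Sum crash counts per build number (third component of the version)."""
    totals = defaultdict(int)
    for version, crashes in rows:
        build = int(version.split(".")[2])
        totals[build] += crashes
    return dict(totals)

print(crashes_by_build(SHCORE_ROWS))
```

All 1732 crashes land on build 9200, and none on the 8102, 8250 or 8400 builds.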

And now the ATI drivers - leaving out all the versions this is not happening with.

85% (1473/1732) vs.   7% (4567/61447) aticfx32.dll
80% (1388/1732) vs.   4% (2735/61447) 8.17.10.1140
3% (47/1732) vs.   0% (241/61447) 8.17.10.1151
2% (38/1732) vs.   0% (291/61447) 8.17.10.1172

80% (1383/1732) vs.   6% (3487/61447) atiu9pag.dll
80% (1380/1732) vs.   4% (2719/61447) 8.14.1.6268
0% (1/1732) vs.   0% (32/61447) 8.14.1.6278
0% (2/1732) vs.   0% (64/61447) 8.14.1.6290

80% (1383/1732) vs.   6% (3692/61447) atiumdva.dll
80% (1380/1732) vs.   4% (2711/61447) 8.14.10.363
0% (1/1732) vs.   0% (26/61447) 8.14.10.370
0% (2/1732) vs.   0% (57/61447) 8.14.10.381

80% (1383/1732) vs.   6% (3757/61447) atiumdag.dll
80% (1380/1732) vs.   4% (2713/61447) 9.14.10.924
0% (1/1732) vs.   0% (25/61447) 9.14.10.926
0% (2/1732) vs.   0% (53/61447) 9.14.10.945
FWIW the AMD Wrestler GPUs are part of AMD's low-cost, low-power Bobcat core (http://en.wikipedia.org/wiki/Bobcat_(microarchitecture)). These are all one- and two-core CPUs with on-chip GPUs.

Do we have any evidence of this problem on any other CPU/GPU combo?

If it is restricted to this subset are we able to selectively target FF to not update on those machines?
I have an Acer netbook with the AMD C70 dualcore CPU and an integrated Radeon 7290. Since installing a Windows update today I've been encountering intermittent blue screens of death. The last time this occurred was while scrolling an email in Gmail with Firefox 19. If someone can guide me to where those dumps are stored in Windows 8 I'm happy to provide them for debugging.
Check http://support.microsoft.com/kb/315263 and its last chapter.

This tool (http://www.nirsoft.net/utils/blue_screen_view.html) can help to open minidumps.
I noticed a bunch of Windows updates were installed today. I'm trying to uninstall them one at a time to see which one resolves my blue screens. Unfortunately this is going to be a long process. I'll comment back here when I find something.
Incredible number of crashes. 20 times. Windows 8 with AMD dual core. With the graphics card. Also blue screened three times. 6 GB RAM, x64. I think it's crashing too much to be usable. IE 10 for now. It happened today.
After uninstalling http://support.microsoft.com/?kbid=2805940 I no longer experience blue screens but I do experience consistent startup crashes with the signature @nsXPConnect::GetXPConnect(). Starting a new profile I no longer see the crashes.

Please advise how I can debug this further.
Here is one of my crash reports:
https://crash-stats.mozilla.com/report/index/bp-64154ba2-3f16-48c4-9812-82cef2130220

I did some research (thanks Loic for the tip) and the blue screen I was seeing was this:
http://msdn.microsoft.com/en-us/library/windows/hardware/ff558949%28v=vs.85%29.aspx

"This error has been linked to excessive paged pool usage and may occur due to user-mode graphics drivers crossing over and passing bad data to the kernel code."
I have an easily reproducible case where I can just be using GMail for a minute or so when it crashes. This happens on a new profile as well.

Benjamin is helping me debug this with WinDbg.
I installed Win8 on our trusty HP Pavilion dm1 with an AMD Radeon HD 6310, and I tried running Fx18.0.2 with several sites. I then installed Fx19 (pave over installation) and tried the same sites (netflix, facebook, facebook games, yelp, hotmail, yahoo), and I was not able to crash after maybe 15 minutes of user interaction.

Then I enabled the about:config pref layers.acceleration.force-enable, restarted and tried the same things, and after a little bit of closing and opening tabs I was able to crash with this related signature:

https://crash-stats.mozilla.com/report/index/bp-46c82cc6-e6f3-4e57-966b-371462130220

It wasn't easy to crash, and I had to force enable hardware acceleration. About 30 pending Windows system updates have not been installed on the machine. You can access it through the MV network at 10.250.6.86 with VNC.
I'm not sure if it's useful information but under the same circumstances as comment 47, Firefox 19.0-final crashes but 19.0b6 does not.
Dropping stepswanted and qawanted from this bug since we've made progress on it and I have a reproducible case. I will continue to assist Benjamin with investigation as required.
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #49)
> I'm not sure if it's useful information but under the same circumstances as
> comment 47, Firefox 19.0-final crashes but 19.0b6 does not.

Some of these other AMD-only crashes we've had seemed to be PGO related, and would appear or disappear from build to build.  See bug 772330 and the various blocking bugs.
(In reply to Scoobidiver from comment #31)
> D2D and D3D9 are disabled because of bug 840161.
In fact, Direct3D 9 is enabled in most crash reports I've checked because bug 840161 only applies to Windows 7.
Direct2D is disabled because we required 9.10.8.0 or above for AMD GPUs on Windows 8.
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #41)
> I have an Acer netbook with the AMD C70 dualcore CPU and an integrated
> Radeon 7290. Since installing a Windows update today I've been encountering
> intermittent blue screens of death. The last time this occurred was while
> scrolling an email in Gmail with Firefox 19. If someone can guide me to
> where those dumps are stored in Windows 8 I'm happy to provide them for
> debugging.

Mine is exactly the same netbook as yours.  BSOD is caused by Windows update 2778344 so you can just uninstall it to stop BSOD happening again.

However, Firefox 19 still crashes regardless but no BSOD.  I have switched to Chrome for a while until Firefox is fixed.
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #47)
> I have an easily reproducible case where I can just be using GMail for a
> minute or so when it crashes. This happens on a new profile as well.
> 
> Benjamin is helping me debug this with WinDbg.

Any updates around the success of debugging?

To help guide a final solution here, it's highly desirable to find a pref (or something similar) that we could flip instead of rolling a 19.0.1. The reasoning is:

* This is our only 19.0.1 driver right now
* We'd only want to push out updates to Win8 users, which is apparently difficult from the RelEng side of things
* Even if we could update only Win8 users, it's possible that we'd continue playing whack-a-mole with this crash (unless we find a true fix)
Crash Signature: [@ TlsGetValue] [@ InterlockedIncrement] [@ XPC_WN_Helper_NewResolve] [@ @0x2b] [@ nsXPConnect::GetXPConnect()] → [@ TlsGetValue] [@ InterlockedIncrement] [@ XPC_WN_Helper_NewResolve] [@ @0x0 | XPC_WN_Helper_NewResolve ] [@ @0x2b] [@ nsXPConnect::GetXPConnect()] [@ XPC_WN_NoHelper_Resolve ]
(In reply to Scoobidiver from comment #52)
> (In reply to Scoobidiver from comment #31)
> > D2D and D3D9 are disabled because of bug 840161.
> In fact, Direct3D 9 is enabled in most crash reports I've checked because
> bug 840161 only applies to Windows 7.
> Direct2D is disabled because we required 9.10.8.0 or above for AMD GPUs on
> Windows 8.

Let's see if blocklisting for Win8 would have the intended effect.

gfx team - what pref should ashughes set to emulate the blocklist w/o needing to stage one?
To emulate blocklisted graphics features, use the .disabled prefs, in particular:

layers.acceleration.disabled = true  to emulate blocklisting d3d9 / d3d10 layers

gfx.direct2d.disabled = true to emulate blocklisting direct2d
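
If flipping these by hand in about:config is inconvenient, the same prefs can be placed in a user.js file in the profile directory. Here is a minimal sketch that prints the lines to paste; the helper name is made up, and it only handles boolean prefs.

```python
def user_js_lines(prefs):
    """Render boolean prefs as user.js lines of the form user_pref("name", true);"""
    lines = []
    for name, value in prefs.items():
        if not isinstance(value, bool):
            raise TypeError("only boolean prefs are handled in this sketch")
        lines.append('user_pref("%s", %s);' % (name, "true" if value else "false"))
    return lines

# The two prefs named above.
blocklist_emulation = {
    "layers.acceleration.disabled": True,
    "gfx.direct2d.disabled": True,
}
print("\n".join(user_js_lines(blocklist_emulation)))
```

Firefox needs to be restarted after the file is changed.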
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #39)
> And now the ATI drivers - leaving out all the versions this is not happening
> with.
> 85% (1473/1732) vs.   7% (4567/61447) aticfx32.dll
> 80% (1388/1732) vs.   4% (2735/61447) 8.17.10.1140  -> Catalyst 12.08
> 3% (47/1732) vs.   0% (241/61447) 8.17.10.1151 -> Catalyst 12.10
> 2% (38/1732) vs.   0% (291/61447) 8.17.10.1172 -> Catalyst 13.1
Based on http://amddevcentral.com/Resources/hardware-drivers/ccc/Pages/default.aspx, I wrote the matching driver versions.
See bug 806991 for Direct2D-blocking on Windows 8.

(In reply to Alex Keybl [:akeybl] from comment #56)
> gfx team - what pref should ashughes set to emulate the blocklist w/o
> needing to stage one?
With Catalyst 12.10 or lower (Direct2D already disabled), set layers.acceleration.disabled to true. Compare with the default value. Based on comment 48, it should be OK.
Windows 8 x64 on Toshiba Satellite C660D (AMD E-450, AMD Radeon HD 6320)
layers.acceleration.disabled = true  
gfx.direct2d.disabled = true
With Catalyst 13.1 or Catalyst 12.10 I have crashes.
(In reply to sergantjohns from comment #59)
> layers.acceleration.disabled = true  
> gfx.direct2d.disabled = true
You need to restart Firefox to make those changes apply.
Setting layers.acceleration.disabled=TRUE and gfx.direct2d.disabled=TRUE and restarting Firefox did not make a difference. I'm still crashing.

https://crash-stats.mozilla.com/report/bp-8729f688-57dd-458f-ae62-883752130221
(In reply to Scoobidiver from comment #60)
> (In reply to sergantjohns from comment #59)
> > layers.acceleration.disabled = true  
> > gfx.direct2d.disabled = true
> You need to restart Firefox to make those changes apply.

I'm still crashing after restarting Firefox.
sergantjohns, Anthony, thanks for that testing - so we now know that the blocklisting would be ineffective. :(
Firefox 20.0b1 (installer) crashes, sometimes with a BSOD, but 20.0b1-candidates (build2 from zip) does not. Why?
(In reply to sergantjohns from comment #64)
> Firefox 20.0b1 (installer) crashes, sometimes with a BSOD, but
> 20.0b1-candidates (build2 from zip) does not. Why?
For the same reason that 19.0b6 doesn't crash but 19.0 does, despite identical code. It's a random bug that depends on how the compiler optimizes the code.
(In reply to Scoobidiver from comment #65)
> (In reply to sergantjohns from comment #64)
> > Firefox 20.0b1 (installer) crashes, sometimes with a BSOD, but
> > 20.0b1-candidates (build2 from zip) does not. Why?
> For the same reason that 19.0b6 doesn't crash but 19.0 does, despite
> identical code. It's a random bug that depends on how the compiler
> optimizes the code.

Actually, if both are build2 then they have been built and optimized exactly the same. If it's different builds, then this is possible.
Benjamin asked me via IRC to clarify something.

In comment 45 I mentioned uninstalling http://support.microsoft.com/?kbid=2805940 resolved the BSODs I was experiencing. This was not entirely factual.

Removing http://support.microsoft.com/?kbid=2805940 reduced the occurrence of BSODs. Gary advised removing http://support.microsoft.com/?kbid=2778344 in comment 54. Doing so resolved my BSODs completely.

To clarify, I had to remove both 2805940 and 2778344 to resolve my blue screens (it was not enough to remove one or the other). None of this resolves the Firefox crashes.
Summary: [Win8] crash in XPC_WN_Helper_NewResolve mainly with AMD Radeon HD 6290/6310/6320/7290/7310/7340 → [Win8] crash in XPC_WN_Helper_NewResolve mainly with AMD Radeon HD 6290/6310/6320/7290/7310/7340 (Wrestler Asic)
19.0.1-candidates/build1/ I don't have crashes!
(In reply to sergantjohns from comment #69)
> 19.0.1-candidates/build1/ I don't have crashes!

Thanks!

Let's leave status-firefox20 as affected so that the investigation here continues into the FF20 beta cycle (even if we call this resolved for FF19).
I confirm that I'm not getting crashes with the 19.0.1 candidate builds.
Depends on: 844156
Depends on: 772330
I can also confirm that there's no crash with the 19.0.1 build.

A Windows update for AMD processors has just come out: http://support.microsoft.com/kb/2818604

It's a small update for AMD processors, could this be it?
The update from comment 72 is very unlikely to be relevant. It's for the CPU, not the graphics, and it applies only to very specific stepping numbers.
Gary and ashughes, can you both please attach your dxdiag output from your affected netbooks to this bug?

https://help.ea.com/article/how-to-gather-dxdiag-information
Flags: needinfo?(enquiry)
Flags: needinfo?(anthony.s.hughes)
This is the DxDiag from the HP Pavilion from comment 48 which experiences the crash but only after significant usage.
Attached file DxDiag from ashughes
Here is my DxDiag for the Acer netbook used in comment 47.
Flags: needinfo?(anthony.s.hughes)
Attached file dxdiag
Flags: needinfo?(enquiry)
Since QA has been asked to spot-check release builds against this bug on known affected hardware until we're confident it is fixed, I'm reporting here that the 19.0.2#1 candidates do not appear to hit this bug on my netbook.

Gary, or anyone else who has seen this bug before, if you'd like to confirm for yourself, please test the following builds:
ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/19.0.2-candidates/build1/win32/

Thanks
Windows 8 x64 on a Toshiba C660D (AMD E-450, AMD Radeon HD 6320): with 19.0.2-candidates/build1/ I don't have crashes!
Dropping QAWANTED from this bug as there doesn't seem to be any specific assistance we can provide any longer. We'll continue to track and verify mitigated for releases until this is actually fixed. Please re-add QAWANTED if there's some specific way we can be of service.
Keywords: qawanted
It's no longer a top crasher.
Severity: blocker → critical
Keywords: topcrash
Should be resolved as fixed; it no longer crashes on the Radeon HD 7340 (E2-1800).
(In reply to Nick from comment #82)
> Should be resolved as fixed; it no longer crashes on the Radeon HD 7340 (E2-1800).

Unfortunately we can't call this categorically "fixed" unless we knowingly landed a patch which fixed this. We could resolve this as WORKSFORME if it's started to disappear for those who were previously affected.

When did you first notice this crash went away? Did you recently receive any Windows or AMD Driver updates? It would be good to know if this was fixed internally or "magically" by some other code we landed.
I just queried crash-stats and it looks like there are still people out there who are experiencing this crash. I'm seeing nearly 600 reports in the last week, most of which are in the latest release (Firefox 28).

Nick, based on these numbers I think this is very unlikely to be fixed. Perhaps something in recent days/months has changed on your system to make encountering this crash much more rare.
I had this issue last year; Firefox crashed about 20 times in one day. I quit using Firefox until an update came out (I have forgotten which one) and started using it again several months later. It was early 2013, I think. I have Windows 8.1 x64 now, so I think that is the reason. However, I still have WDDM 1.2, as AMD has not updated it yet, but I no longer have the issue. WDDM 1.2 was probably the issue, but I think those still crashing probably have old drivers. Remember that a lot of people don't update their graphics drivers often, but this is speculative. Maybe there is something different between the Radeon 7340 and the earlier Bobcat. The E2-1800 APU is kind of rare; not many have it. AMD did some very minor rework in it. The most likely thing is that they have old drivers. AMD had stability issues in its early Windows 8 drivers: when I got my HP computer in 2012, it would crash very often. I no longer had this when I went back to it in 2013. Regardless, they are probably using the drivers that would often crash. I don't know, but what I do know is that I used the Windows 8.1 beta from July 2013, and then Windows 8.1 when it came out. That was when the issues ended in Firefox. Maybe Microsoft and/or AMD fixed it. I'm probably no help for those still on Windows 8.
Assignee: general → nobody
Crash Signature: [@ TlsGetValue] [@ InterlockedIncrement] [@ XPC_WN_Helper_NewResolve] [@ @0x0 | XPC_WN_Helper_NewResolve ] [@ @0x2b] [@ nsXPConnect::GetXPConnect()] [@ XPC_WN_NoHelper_Resolve ] → [@ TlsGetValue] [@ InterlockedIncrement] [@ XPC_WN_Helper_NewResolve] [@ @0x0 | XPC_WN_Helper_NewResolve ] [@ @0x2b] [@ nsXPConnect::GetXPConnect()] [@ XPC_WN_NoHelper_Resolve ] [@ nsXPConnect::GetXPConnect]
Whiteboard: [Win8] → [Win8][qa-not-actionable]
Severity: critical → S2

Since the crash volume is low (less than 5 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: S2 → S3
Crash Signature: [@ TlsGetValue] [@ InterlockedIncrement] [@ XPC_WN_Helper_NewResolve] [@ @0x0 | XPC_WN_Helper_NewResolve ] [@ @0x2b] [@ nsXPConnect::GetXPConnect()] [@ XPC_WN_NoHelper_Resolve ] [@ nsXPConnect::GetXPConnect] → [@ TlsGetValue] [@ InterlockedIncrement] [@ XPC_WN_Helper_NewResolve] [@ @0x0 | XPC_WN_Helper_NewResolve ] [@ @0x2b] [@ nsXPConnect::GetXPConnect] [@ XPC_WN_NoHelper_Resolve ] [@ nsXPConnect::GetXPConnect]

Let's close this. Windows 8 is no longer supported and any remaining crashes are better tracked in a new bug.

Status: REOPENED → RESOLVED
Closed: 11 years ago → 4 months ago
Resolution: --- → INCOMPLETE