Closed Bug 635464 Opened 13 years ago Closed 2 years ago

Crash in TextStageManager::MapTextureTransferSurface while switching from ATI/AMD's to Intel's GPU with Intel driver versions 8.15.10.2141 and below

Categories

(Core :: Graphics, defect, P3)

x86
Windows 7
defect

RESOLVED INCOMPLETE
Tracking Status
firefox31 - affected

People

(Reporter: scoobidiver, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, Whiteboard: [platform-rel-Intel][platform-rel-AMD])

Crash Data

Attachments

(1 file)

This crash signature still occurs under one specific condition: switching from the ATI GPU to the Intel GPU.
It is the #236 crasher in 4.0b11 over the last week.

The stack traces vary.

Correlations by module give:
     95% (21/22) vs.   2% (1435/62097) igd10umd32.dll
         41% (9/22) vs.   0% (9/62097) 8.15.10.2125
         55% (12/22) vs.   0% (12/62097) 8.15.10.2141
The Intel driver blocklist is not applied because the vendor ID is ATI/AMD when Firefox is launched.


Comments say:
"ThinkPad W500 with "switchable graphics": crashed after completing switch from ATI Mobility FireGL V5700 to Intel Mobile Graphics. Web content area of Firefox window was black for a moment, then content displayed, then crash."
"ThinkPad W500 switch from discrete ATI Mobility FireGL V5700 to (chipset) Intel Accelerated graphics. Current web content tries to display, but crashes."
"after switching AMD graphics to Intel graphics O:-)"

More reports at:
https://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=_VEC_memzero&version=Firefox%3A4.0b11
Summary: Crash while switching from ATI/AMD's to Intel's GPU [@ _VEC_memzero ] → Crash while switching from ATI/AMD's to Intel's GPU with Intel driver versions 8.15.10.2141 and below [@ _VEC_memzero ]
FYI, those are driver versions Microsoft is blocking from getting Win7 SP1.
http://support.microsoft.com/kb/2498452
* Igdkmd32.sys (32-bit), versions 8.15.10.2104 through 8.15.10.2141
* Igdkmd64.sys (64-bit), versions 8.15.10.2104 through 8.15.10.2141
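
For illustration, the version range above can be checked with a small helper. This is a minimal sketch, not code from Firefox or Windows: `DriverVersion`, `ParseDriverVersion` and `IsInSp1BlockedRange` are hypothetical names, and the only data taken from this bug is the range 8.15.10.2104 through 8.15.10.2141.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical type: a four-part Windows driver version such as 8.15.10.2141.
// std::array compares lexicographically, which matches version ordering.
using DriverVersion = std::array<uint32_t, 4>;

// Parse "a.b.c.d" into four unsigned integers.
// Returns false if a part is missing, non-numeric, or followed by extra text.
bool ParseDriverVersion(const std::string& text, DriverVersion& out) {
  std::istringstream in(text);
  for (std::size_t i = 0; i < out.size(); ++i) {
    if (i > 0 && in.get() != '.') return false;     // parts are dot-separated
    if (!std::isdigit(in.peek())) return false;     // reject signs and letters
    if (!(in >> out[i])) return false;
  }
  return in.peek() == std::istringstream::traits_type::eof();
}

// True if the version lies in the range quoted from KB 2498452
// (8.15.10.2104 through 8.15.10.2141, inclusive).
bool IsInSp1BlockedRange(const std::string& text) {
  const DriverVersion lo{8, 15, 10, 2104};
  const DriverVersion hi{8, 15, 10, 2141};
  DriverVersion v{};
  return ParseDriverVersion(text, v) && lo <= v && v <= hi;
}
```

Both versions from the correlation data (8.15.10.2125 and 8.15.10.2141) fall inside this range, while the updated drivers mentioned in this bug (8.15.10.2281 and 8.15.10.2291) fall outside it.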
The only way to solve this at the moment is to ask G45 chipset(X4500HD/MHD IGP) users to update to latest drivers 8.15.10.2281 if possible.

32bit
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=2991&DwnldID=19788

64bit
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=2991&DwnldID=19784

And for HD Graphics(Clarkdale/Arrandale) users to update to latest drivers 8.15.10.2291 if possible.

32bit
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=3319&DwnldID=19807

64bit
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=3319&DwnldID=19809
> The only way to solve this at the moment is to ask G45 chipset(X4500HD/MHD IGP)
> users to update to latest drivers 8.15.10.2281 if possible.
No, because these users are running an ATI driver, not an Intel one.
Blocks: 601079
Crash Signature: [@ _VEC_memzero ]
Blocks: 605779, 605780
An unknown bug added _VEC_memzero to the skiplist (see https://github.com/mozilla/socorro/blob/master/scripts/config/processorconfig.py.dist).
Crash Signature: [@ _VEC_memzero ] → [@ _VEC_memzero ] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface]
(In reply to Scoobidiver from comment #4)
> An unknown bug added _VEC_memzero to the skiplist (see
> https://github.com/mozilla/socorro/blob/master/scripts/config/
> processorconfig.py.dist).

bug 715921 (as mentioned in the commit message for this line - https://github.com/mozilla/socorro/commit/ef1bfa72005612d560bd4feea7a1fe93bf8e8a88 ) as requested by :mats.

I'm also not sure if this updated skiplist has even been deployed to production yet.
Depends on: 715921
Crash Signature: [@ _VEC_memzero ] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface] → [@ _VEC_memzero ] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)]
Kairo, do you know when it's supposed to be added?
Note that these driver versions (8.15.10.2141 and below) are *already* blacklisted. On Win7 with Intel GMA X4500/HD, everything under 8.15.10.2202 is blacklisted; see
https://wiki.mozilla.org/Blocklisting/Blocked_Graphics_Drivers#Intel_cards

As Scoobidiver notes in comment 3, this is explained by the fact that we get confused by the fact that there are two GPUs.

CC'ing Ali. At some point there was a proposal to require, in case of dual GPUs, that we require both driver versions to be high enough, regardless of which GPU is currently in use. Is there a bug number for this? I would support that move now.
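
To make the proposal concrete, here is a rough sketch of what "require both driver versions to be high enough" could look like. It is not the Gecko blocklist code: `AdapterInfo`, `MinimumAllowedVersion` and `CanUseAcceleration` are hypothetical names, and the only threshold taken from this bug is the Intel one (everything under 8.15.10.2202 is blocked on Win7). Other vendors are treated as unrestricted purely for the sake of the example.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <string>
#include <vector>

// Four-part driver version; std::array compares lexicographically.
using DriverVersion = std::array<uint32_t, 4>;

// Hypothetical description of one GPU present in the system.
struct AdapterInfo {
  std::string vendor;    // e.g. "Intel" or "ATI"
  DriverVersion driver;  // installed driver version for this adapter
};

// Hypothetical minimum-version table. Only the Intel entry reflects a
// number from this bug; every other vendor is assumed unrestricted.
DriverVersion MinimumAllowedVersion(const std::string& vendor) {
  if (vendor == "Intel") return {8, 15, 10, 2202};
  return {0, 0, 0, 0};
}

bool IsAdapterAllowed(const AdapterInfo& adapter) {
  return adapter.driver >= MinimumAllowedVersion(adapter.vendor);
}

// The proposal: check every adapter, not just the one active at startup,
// so a later switch to the other GPU cannot land on a blocked driver.
bool CanUseAcceleration(const std::vector<AdapterInfo>& adapters) {
  return std::all_of(adapters.begin(), adapters.end(), IsAdapterAllowed);
}
```

With this check, a switchable-graphics laptop that starts on the ATI GPU but has Intel driver 8.15.10.2141 installed would be treated as blocked from the start, instead of passing the check at launch and crashing after the switch.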
This bug is about switching GPUs (12%) and bug 711656 is about startup crashes (88%).

(In reply to Benoit Jacob [:bjacob] from comment #7)
> As Scoobidiver notes in comment 3, this is explained by the fact that we get
> confused by the fact that there are two GPUs.
For startup crashes, the GPU is Intel and the driver version looks like 8.15.10.xxxx in almost all cases so there's no confusion.
(In reply to Benoit Jacob [:bjacob] from comment #7)
> As Scoobidiver notes in comment 3, this is explained by the fact that we get
> confused by the fact that there are two GPUs.
> 
> CC'ing Ali. At some point there was a proposal to require, in case of dual
> GPUs, that we require both driver versions to be high enough, regardless of
> which GPU is currently in use. Is there a bug number for this? I would
> support that move now.

Filed Bug 724874 for this.
(In reply to Scoobidiver from comment #8)
> This bug is about switching GPUs (12%) and bug 711656 is about startup
> crashes (88%).
> 
> (In reply to Benoit Jacob [:bjacob] from comment #7)
> > As Scoobidiver notes in comment 3, this is explained by the fact that we get
> > confused by the fact that there are two GPUs.
> For startup crashes, the GPU is Intel and the driver version looks like
> 8.15.10.xxxx in almost all cases so there's no confusion.

So, these users are already blacklisted, so I really don't see how they can crash?

Are volumes low enough to allow for the hypothesis that it's power-users who have set .force-enabled in about:config?
(In reply to Benoit Jacob [:bjacob] from comment #10)
> So, these users are already blacklisted, so I really don't see how they can
> crash?
These users are no longer blacklisted as of 11.0a1/20111215. See bug 711656 comment 16 (same crash signature but only for startup crashes).
There are more than 600 crashes in 22.0.

More reports at:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=_VEC_memzero+|+TextStageManager%3A%3AMapTextureTransferSurface%28D2D_RECT_U+const%26%2C+unsigned+char**%2C+unsigned+int*%29
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=memset+|+TextStageManager%3A%3AMapTextureTransferSurface%28D2D_RECT_U+const%26%2C+unsigned+char**%2C+unsigned+int*%29
Crash Signature: [@ _VEC_memzero ] [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] → [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] [@ memset | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*) ]
Summary: Crash while switching from ATI/AMD's to Intel's GPU with Intel driver versions 8.15.10.2141 and below [@ _VEC_memzero ] → Crash in TextStageManager::MapTextureTransferSurface while switching from ATI/AMD's to Intel's GPU with Intel driver versions 8.15.10.2141 and below
This has come back again, #7 on Aurora, but may just be the random AMD issue we've been having for over a year.
Topcrash, tracking!
It is no longer in the top 20. It could have been the AMD bug. Untracking.
Crash Signature: [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] [@ memset | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*) ] → [@ _VEC_memzero | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*)] [@ memset | TextStageManager::MapTextureTransferSurface(D2D_RECT_U const&, unsigned char**, unsigned int*) ] [@ _VEC_memzero | TextStageMan…
Whiteboard: [platform-rel-Intel]
platform-rel: --- → ?
Attachment #8772465 - Flags: review?(bas) → review+
Comment on attachment 8772465 [details]
Bug 635464: Diagnostic crash in nightly and aurora, to see if we are asking basic content client for alpha.

https://reviewboard.mozilla.org/r/65260/#review62424

::: gfx/layers/client/ContentClient.cpp:131
(Diff revision 1)
>                                   RefPtr<gfx::DrawTarget>* aBlackDT,
>                                   RefPtr<gfx::DrawTarget>* aWhiteDT)
>  {
>    MOZ_ASSERT(!(aFlags & BUFFER_COMPONENT_ALPHA));
> +  if (aFlags & BUFFER_COMPONENT_ALPHA) {
> +    gfxDevCrash(LogReason::AlphaWithBasicClient) << "Asking basic content client for alpha";

I'd prefer adding 'Component alpha' explicitly in the message here to avoid confusion. It's commonplace to ask it for regular alpha clients.
Comment on attachment 8772465 [details]
Bug 635464: Diagnostic crash in nightly and aurora, to see if we are asking basic content client for alpha.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/65260/diff/1-2/
Pushed by msreckovic@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/115209d4f543
Diagnostic crash in nightly and aurora, to see if we are asking basic content client for alpha. r=bas
No crashes with this extra info over the weekend.
I see this crash 100% of the time when I load netflix.com in current Win32 Nightly on my Lenovo W530 running Win7 x64.

For example:

https://crash-stats.mozilla.com/report/index/b371e24e-f2e9-4e6c-b2dc-da97d2160823

Is there any debug information I can collect to help fix this bug?
Flags: needinfo?(milan)
Flags: needinfo?(bas)
(In reply to Chris Pearce (:cpearce) from comment #22)
> I see this crash 100% of the time when I load netflix.com in current Win32
> Nightly on my Lenovo W530 running Win7 x64.
> 
> For example:
> 
> https://crash-stats.mozilla.com/report/index/b371e24e-f2e9-4e6c-b2dc-
> da97d2160823
> 
> Is there any debug information I can collect to help fix this bug?

Can you catch it in a debugger and get a full stack? The stack on that is pretty useless.
Flags: needinfo?(bas)
Flags: needinfo?(milan) → needinfo?(cpearce)
This is a much better stack: https://crash-stats.mozilla.com/report/index/a7f4d884-7033-44e5-a218-7ac122160822

Bas - thoughts?
Flags: needinfo?(cpearce) → needinfo?(bas)
(In reply to Milan Sreckovic [:milan] from comment #24)
> This is a much better stack:
> https://crash-stats.mozilla.com/report/index/a7f4d884-7033-44e5-a218-
> 7ac122160822
> 
> Bas - thoughts?

Note that that stack is an EXCEPTION_BREAKPOINT, it also appears from the metadata to be happening during a device reset. That's problematic in itself, but a little less worrying than the other one which seems to have happened in a clean situation and in a reproducible manner. Then again, they might simply be totally different bugs.
Flags: needinfo?(bas)
(In reply to Bas Schouten (:bas.schouten) from comment #25)
> (In reply to Milan Sreckovic [:milan] from comment #24)
> > This is a much better stack:
> > https://crash-stats.mozilla.com/report/index/a7f4d884-7033-44e5-a218-
> > 7ac122160822
> > 
> > Bas - thoughts?
> 
> Note that that stack is an EXCEPTION_BREAKPOINT, it also appears from the
> metadata to be happening during a device reset. That's problematic in
> itself, but a little less worrying than the other one which seems to have
> happened in a clean situation and in a reproducible manner. Then again, they
> might simply be totally different bugs.

Note the latter stack trace is more likely to occur on a GPU switch. But if in that situation something inside the driver decides to throw a random exception (i.e. int 3), there's not much we can do other than handle it? But that would require wrapping a lot of commands in exception handlers which I doubt we'll want to do. Contacting AMD may be the best option.
Whiteboard: [platform-rel-Intel] → [platform-rel-Intel][platform-rel-AMD]
Got this crash the 1st time a few minutes ago:
https://crash-stats.mozilla.com/report/index/fc45d14f-6b9c-4303-806a-46df82161025
(In reply to Loic from comment #27)
> Got this crash the 1st time a few minutes ago:
> https://crash-stats.mozilla.com/report/index/fc45d14f-6b9c-4303-806a-
> 46df82161025

The error reported in this one is D2DERR_RECREATE_TARGET
platform-rel: ? → ---
The leave-open keyword is there and there is no activity for 6 months.
:davidb, maybe it's time to close this bug?
Flags: needinfo?(dbolter)
Good question. I think this might be a good topic for the new GFX manager and team. I think the crashes are pretty low volume and this bug doesn't seem to have proven useful to keep open.
Flags: needinfo?(jbonisteel)
Flags: needinfo?(dbolter)
Flags: needinfo?(aosmond)
Severity: critical → normal
Flags: needinfo?(jbonisteel)
Priority: -- → P5
Flags: needinfo?(aosmond)

The leave-open keyword is there and there is no activity for 6 months.
:jbonisteel, maybe it's time to close this bug?

Flags: needinfo?(jbonisteel)
Flags: needinfo?(jbonisteel)

The leave-open keyword is there and there is no activity for 6 months.
:jbonisteel, maybe it's time to close this bug?

Flags: needinfo?(jbonisteel)
Flags: needinfo?(jbonisteel)

The leave-open keyword is there and there is no activity for 6 months.
:jimm, maybe it's time to close this bug?

Flags: needinfo?(jmathies)

(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #33)
> The leave-open keyword is there and there is no activity for 6 months.
> :jimm, maybe it's time to close this bug?

Valid bug with current crash reports. Crash volume is very low, which probably has to do with how it's triggered.

Severity: normal → S4
Flags: needinfo?(jmathies)
Priority: P5 → P3

The leave-open keyword is there and there is no activity for 6 months.
:jimm, maybe it's time to close this bug?

Flags: needinfo?(jmathies)
Flags: needinfo?(jmathies)

The leave-open keyword is there and there is no activity for 6 months.
:bhood, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(bhood)
Flags: needinfo?(bhood)

The leave-open keyword is there and there is no activity for 6 months.
:bhood, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(bhood)
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(bhood)
Resolution: --- → INCOMPLETE