crashes with WGL in ig4icd32.dll (We probably shouldn't be using WGL)

VERIFIED FIXED in Firefox 36

Status

()

defect
--
critical
VERIFIED FIXED
5 years ago
4 years ago

People

(Reporter: kairo, Assigned: milan)

Tracking

({crash})

Trunk
mozilla36
x86
Windows NT
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox36+ verified, firefox37 verified, firefox38+ verified, firefox39 verified, firefox40 verified)

Details

(Whiteboard: [gfx-noted], crash signature)

Attachments

(2 attachments)

This bug was filed from the Socorro interface and is 
report bp-9aff7fef-42d1-4503-8cc7-7dfc32150115.
=============================================================

Those crashes started spiking with the 36.0b1 release (they do not appear in 35, but in 36 and higher) and summed up they are >6% of our 36 beta crashes at least, so pretty significant - but we have a collection of different signatures so it's spread up in stats.

Most stacks are pretty useless but some at least give away that it's GL stuff, like bp-f8d20c25-88ce-4581-80cd-c48932150114 or bp-7d1551e6-1bfd-4380-a5ce-a04bc2150116 and I found this is a GL library of the Intel graphics drivers.

https://crash-stats.mozilla.com/search/?signature=~ig4icd32.dll&_facets=signature&_facets=address&_facets=adapter_device_id&_facets=adapter_driver_version&_facets=platform_version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature will give you a list of signatures and other facets as well as links to dig deeper. Unsurprisingly (and I left those facets out), it's all Intel chips on Windows, but across all Windows versions, lots of driver versions and so on.

This might very well be a collection of different issues, but I wonder why it's only starting to show up in 36 and was not present at all in 35 or before.
Jeff, are you the right owner for GL stuff?

Is there anything we can do about these crashes? Do you need any further information from crash reports?
Flags: needinfo?(jgilbert)
Milan, that is one of the top crash on 36, could you help? thanks
Flags: needinfo?(milan)
Whiteboard: [gfx-noted]
Right now, when I do searches for 36.0b4 data, ~ 10k of 65k total crashes have a ig4icd32.dll signature, so that's ~15% of all crashes and therefore by far our largest issue at this time.
Kaiser (:kairo@mozilla.com) from comment #0)
 
> This might very well be a collection of different issues, but I wonder why
> it's only starting to show up in 36 and was not present at all in 35 or
> before.

Perhaps we're only now just exposing issues in a driver from 2009 which hasn't seen an update since then.
(In reply to Arthur K. from comment #6)
> Perhaps we're only now just exposing issues in a driver from 2009 which
> hasn't seen an update since then.

Surely possible, but the fact is that those are people crashing that probably have not seen crashes in 35, and we should not ship 36 this way.
This crashes are showing us using WGL which we should not use unless specifically asked for.
Summary: crashes in ig4icd32.dll → crashes in ig4icd32.dll (maybe
Summary: crashes in ig4icd32.dll (maybe → crashes with WGL in ig4icd32.dll (We probably shouldn't be using WGL)
I don't think all the crashes are WGL+.
Flags: needinfo?(milan)
But the random 20+ reports that I looked at all seem to be OMTC basic?  Is that what D3D11 Layers- and D3D9 Layers- would mean?
(In reply to Milan Sreckovic [:milan] from comment #10)
> But the random 20+ reports that I looked at all seem to be OMTC basic?  Is
> that what D3D11 Layers- and D3D9 Layers- would mean?

It is.
Looking at some of the reports, it's likely this is the first time they try to switch to a different graphics card; one of the sites mentioned is pch.com, and that one triggers a switch to discrete graphics on os x.
Note, there is only one crash from beta breakers following https://etherpad.mozilla.org/QA-Tests-Gfx-Hw-Support approach.  I'll see if we can get a bug report/stack.
So WGL is hit when we fail (blacklist or disable) d2d usage. If you disable d2d, the first canvas fallback is skia. For some reason, USE_SKIA_GPU is defined on desktop, therefore skiagl is possible. I don't know exactly where skia chooses between software and gpu backends, but it's clearly choosing the gpu backend here.

Since the default provider for GL contexts is WGL on Windows, when skia asks for 'one of the default gl contexts', it gets WGL. (In WebGL we have to explicitly ask for an EGL-sourced context to try to get ANGLE)
Flags: needinfo?(jgilbert)
Specifically, it appears that canvas rendering begins with the software backend, but we have `CanvasDrawObserver` which in `CanvasDrawObserver::FrameEnd()` weighs a heuristic and can choose to try switching to GPU rendering. This happens through `CanvasRenderingContext2D::SwitchRenderingMode` and `CanvasRenderingContext2D::EnsureTarget`. There doesn't seem to be any guard preventing skiagl activation on Desktop.
I recommend having a pref that mediates skiagl activation, either disable-skiagl or allow-skiagl, and to set this by platform.
(In reply to Jeff Gilbert [:jgilbert] from comment #16)
> I recommend having a pref that mediates skiagl activation, either
> disable-skiagl or allow-skiagl, and to set this by platform.

That sounds reasonable.
Let's do it...
I don't have a beta checkout. But this patch should do the trick and probably be safe for beta since I don't think we will release any platform from there that we use SkiaGL on? But I might be wrong.
Don't we use SkiaGL for android canvas2d?
Comment on attachment 8556963 [details] [diff] [review]
Conditionally allow SkiaGL

Review of attachment 8556963 [details] [diff] [review]:
-----------------------------------------------------------------

The pref should default to true for mobile, I think, but this seems like a fine place for the logic. I do think we should have an `MOZ_ASSERT(pref set to allow)` in GetSkiaGLGlue().
Attachment #8556963 - Flags: feedback+
Comment on attachment 8556963 [details] [diff] [review]
Conditionally allow SkiaGL

Review of attachment 8556963 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/canvas/CanvasRenderingContext2D.cpp
@@ +1347,5 @@
>          nsContentUtils::PersistentLayerManagerForDocument(ownerDoc);
>      }
>  
>       if (layerManager) {
> +      if (mode == RenderingMode::OpenGLBackendMode && CheckSizeForSkiaGL(size) && gfxPrefs::SkiaGLAllow()) {

Wouldn't just adding && gfxPlatform::GetPlatform()->UseAcceleratedSkiaCanvas() do it?  Without the new preference?
Since we already have a accelerated canvas preference, though not specific to SkiaGL, maybe it's enough to just check that?  It is off by default on desktop and on by default on Android already.
Attachment #8557261 - Flags: review?(jmuizelaar)
(In reply to Milan Sreckovic [:milan] from comment #25)
> Created attachment 8557261 [details] [diff] [review]
> Alternative - Check if accelerated canvas is preferred before going to
> SkiaGL.
> 
> Since we already have a accelerated canvas preference, though not specific
> to SkiaGL, maybe it's enough to just check that?  It is off by default on
> desktop and on by default on Android already.

This seems like it would probably be ok.
Comment on attachment 8557261 [details] [diff] [review]
Alternative - Check if accelerated canvas is preferred before going to SkiaGL.

Review of attachment 8557261 [details] [diff] [review]:
-----------------------------------------------------------------

I tried reading through the code to make sure it would do things correctly and it wasn't obvious. But not adding a pref is cleaner.
Attachment #8557261 - Flags: review?(jmuizelaar) → review+
https://treeherder.mozilla.org/#/jobs?repo=try&revision=58ea4f2ca2f2

Bas, Jeff G., thoughts on being safe with the "no new pref change" patch?
Flags: needinfo?(jgilbert)
Flags: needinfo?(bas)
Sounds good.
Flags: needinfo?(jgilbert)
Comment on attachment 8557261 [details] [diff] [review]
Alternative - Check if accelerated canvas is preferred before going to SkiaGL.

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: 16+% of the beta crashes.  We want to see if this helps.
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: We shouldn't be going into SkiaGL on the desktop, so this just enforces it.
[String/UUID change made/needed]:
Flags: needinfo?(bas)
Attachment #8557261 - Flags: approval-mozilla-beta?
Attachment #8557261 - Flags: approval-mozilla-aurora?
Try looks reasonably green, and asking for the aurora (mostly beta, really) approval so that we can catch the Monday build if the re-triggered try failures clear up.
Attachment #8557261 - Flags: approval-mozilla-beta?
Attachment #8557261 - Flags: approval-mozilla-beta+
Attachment #8557261 - Flags: approval-mozilla-aurora?
Attachment #8557261 - Flags: approval-mozilla-aurora+
I can verify that those crashes are gone in 36.0b6.
Do we mark this bug as resolved, given that we only needed Aurora+Beta?
WFM
Assignee: nobody → milan
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Blocks: 1149954
[Tracking Requested - why for this release]:
This is back with violence in 38, comprising 13% of all 38.0b1 crashes.

39 Dev Edition and Nightly 40 are also affected.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #36)
> [Tracking Requested - why for this release]:
> This is back with violence in 38, comprising 13% of all 38.0b1 crashes.
> 
> 39 Dev Edition and Nightly 40 are also affected.

Yeah, it never went away in 38.  I'm unclear where we decided this is Aurora + Beta only at the time (seemy comment 34 for that conclusion), but we did, so it landed on 36 and 37, but never on 38.  Tracking and will land it in bug 1149954.
Flags: needinfo?(jmuizelaar)
Bug 1149954 is marked verified everywhere, and I can confirm the crashes are gone at least from 38.0b2
You need to log in before you can comment on or make changes to this bug.