Please don't disable WebGL if crashed on WebGL context

RESOLVED FIXED in Firefox 46

Status

()

Core
Canvas: WebGL
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: sakari, Assigned: milan)

Tracking

(Blocks: 1 bug)

44 Branch
mozilla47
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox46 fixed, firefox47 fixed)

Details

(Whiteboard: gfx-noted,feature)

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:44.0) Gecko/20100101 Firefox/44.0
Build ID: 20160205155049

Steps to reproduce:

I was wondering why my WebGL context was not being created.
For example at https://get.webgl.org/ the result would be 'Your browser seems to support WebGL, but it is disabled'.

I was wondering maybe this is a bug in my video driver or something, but with Chrome works just fine.

Then I found out Firefox disables WebGL if it has crashed when a WebGL context was being active, and I had to tinker with some obscure 'gfx.crash-guard.status.glcontext' setting and set it to 0, and now WebGL works again.

Please don't do this. I was thinking my drivers had some issues, or maybe ThreeJs wasn't doing things correctly.


Actual results:

WebGL was disabled


Expected results:

WebGL should work, or at least tell the user that it has been disabled automatically.

Updated

2 years ago
Component: Untriaged → Canvas: WebGL
Product: Firefox → Core
Do we perma disable webgl in this case?
Flags: needinfo?(jgilbert)
Whiteboard: gfx-noted,feature
90% sure this is 'yes'.
dvander knows.
Flags: needinfo?(jgilbert) → needinfo?(dvander)
We disable it until either drivers change or the Firefox version changes. I'm open to removing the guard for WebGL, since it's unlikely it will lead to a startup crash.

Milan, what do you think?
Assignee: nobody → dvander
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(dvander) → needinfo?(milan)
(In reply to David Anderson [:dvander] from comment #3)
> We disable it until either drivers change or the Firefox version changes.
> I'm open to removing the guard for WebGL, since it's unlikely it will lead
> to a startup crash.
> 
> Milan, what do you think?

Can we keep the the startup crash protection, and disable the crashguard activation if the request comes from an actual webpage?

We shouldn't be starting webgl during startup as far as I'm aware. If we are, we should figure out why.
Agreed - "crash-guard" are really "startup-crash-guard".  We don't need to perma disable WebGL unless it's somehow causing a startup crash.
Flags: needinfo?(milan)
Duplicate of this bug: 1248237
Created attachment 8719579 [details] [diff] [review]
Leave the catching of GL crashes, but don't disable because of it.

David, what should we do?  Just remove the crash notification on GL contexts completely, or leave them in, report them in about:support/crash reports, but otherwise don't disable WebGL?
Flags: needinfo?(dvander)
Attachment #8719579 - Flags: feedback?(dvander)
Comment on attachment 8719579 [details] [diff] [review]
Leave the catching of GL crashes, but don't disable because of it.

Seems fine. Alternately, we could consider only activating the guard if some condition is true, like (1) the originator of the GL context is chrome code or (2) it's within the first N seconds of uptime.

Just removing it seems fine though, with E10S the chances of GL causing a startup crash drop dramatically.
Flags: needinfo?(dvander)
Attachment #8719579 - Flags: feedback?(dvander) → review+
Duplicate of this bug: 1248940
Keywords: checkin-needed

Comment 12

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/173c17843fa6
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox47: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Comment on attachment 8719579 [details] [diff] [review]
Leave the catching of GL crashes, but don't disable because of it.

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Crash during WebGL initialization causes us to disable WebGL, leaving people in a state where they can't run it again, even if that crash was spurious. 
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: Low.  We're basically undoing an earlier change.
[String/UUID change made/needed]:

This is already in 47, let's uplift to 46 - I'm not clear if that's a beta or aurora uplift at this point.
Attachment #8719579 - Flags: approval-mozilla-beta?
Attachment #8719579 - Flags: approval-mozilla-aurora?
Comment on attachment 8719579 [details] [diff] [review]
Leave the catching of GL crashes, but don't disable because of it.

Clearing out the Aurora flag. Liz ni'ing you so this is reviewed for Beta46 uplift.
Flags: needinfo?(lhenry)
Attachment #8719579 - Flags: approval-mozilla-aurora?
Comment on attachment 8719579 [details] [diff] [review]
Leave the catching of GL crashes, but don't disable because of it.

We want users to be able to use WebGL, OK to uplift to beta
Flags: needinfo?(lhenry)
Attachment #8719579 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 16

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/c4bfa93e7480
status-firefox46: --- → fixed
Duplicate of this bug: 1257664
Milan, is this something manual QA should look at? If so, could you please give us a few specifics on how to simulate such a scenario?
Flags: qe-verify?
Flags: needinfo?(milan)

Comment 19

2 years ago
This just happened to me.  I have been working on a WebGL app for hours and at some point, Firebug reported that Firefox ran out of RAM (which is not supposed to be true, since not even a fourth of my 16GB were used according to Windows) but I had to "end task" on Firefox and re-launch it to be able to see the textures in the WebGLRenderer.  From that point on, webgl_detect() returned false and not a single WebGL demo or app worked.  I had to reboot my computer completely to solve this issue.  Reloading the page or cleaning the cache didn't change anything.

What is even more sad is that before I rebooted, out of curiosity, I launched Internet Explorer 11 and it ran my WebGL app without a single problem or lag whatsoever.

Comment 20

2 years ago
As a matter of fact, I just rebooted and it still doesn't work! WTF?
(In reply to mozilla from comment #20)
> As a matter of fact, I just rebooted and it still doesn't work! WTF?

If you think the cause is due to Firefox crashing during initializing WebGL, you can go to about:config and change the "gfx.crash-guard.status.glcontext" pref to 2.

Comment 22

2 years ago
Yep, I was pulling my hair out for a minute, I tried to reset everything listed under "webgl" but I had it wrong.  The culprit was "gfx.crash-guard.status.glcontext" that was set to 3, I reset it and it automatically set itself to 2 upon the first WebGL app I loaded in another tab and now it works correctly.  I think this Firefox WebGL context auto-disabling "feature" is a pain for a developper :P
The intent was to guard against startup crashes, but it ended up being too aggressive.

Comment 24

2 years ago
Well, if I may suggest: somebody could at least pull down a bar like it does (did?) when an extension is disabled for security reasons (Flash or Java for example).  You could simply have a dismiss X button and a RE-ENABLE button for quick context setting override.  People are already used to this kind of bar that pulls down on security blocking problems and such, so why not re-use this interface to overcome this issue and give people ease of use?

Comment 25

2 years ago
Thanks for the suggestion polycoda. Note that like was mentioned above, this was a bug that is fixed for Firefox 46 and newer, so there will be no need to show a notification bar for an activity that no longer can happen.
Noticed that it's fixed in Firefox 46, can we updated the target milestone as a result?
(In reply to Martin Best (:mbest) from comment #26)
> Noticed that it's fixed in Firefox 46, can we updated the target milestone
> as a result?

For better or worse, we set the milestone to the version where the fix hit the central, rather than how far it got uplifted.  We may want to rethink that (I know I'd like it), but until then, 47 is the correct setting.
Flags: needinfo?(milan)
The crashes due to removal of this guard are spiking in the latest 46 beta (see bug 1259811.)  We'll probably have to live with that.
Just to clarify what are we planning to do?  Re-enable the guard, or live with the crash rate?
We could consider something finer grained maybe. If we had evidence that the D3D11 ANGLE backend crashes but not D3D9, a crash guard for WebGL1 + D3D11 ANGLE might be better.

Process-per-tab is the real solution for general WebGL crashes though, so it might be better to just wait or blacklist.
In the case of bug 1259811, I'm hoping we fix the underlying issue :)

But there are some things we can try with the crash guard, though not as simple as what we had before.

David, is one of the items in telemetry dashboard (e.g., http://people.mozilla.org/~danderson/moz-gfx-telemetry/www/#view=blacklisting) reporting the "this is disabled because of a crash guard"?
(In reply to Milan Sreckovic [:milan] from comment #31)
> David, is one of the items in telemetry dashboard (e.g.,
> http://people.mozilla.org/~danderson/moz-gfx-telemetry/www/
> #view=blacklisting) reporting the "this is disabled because of a crash
> guard"?

It's a little hard to suss that out of that graph since the status code is "Blocked", which is also used by the nvdxgiwrap workaround. This graph might be better since it's reported by the crash guard directly:

http://people.mozilla.org/~danderson/moz-gfx-telemetry/www/#view=startup

Here, 0.2% of sessions disabled layers acceleration due to a crash in D3D11 initialization.

(I'm making a note to explain the status codes clearly on the blacklist page)
(In reply to David Anderson [:dvander] from comment #32)

Sorry meant to say: the DisplayLink workaround reports as "Blocked". And that is the vast majority of instances of the "Blocked" status.

Comment 34

2 years ago
So we have indication that the crash guard was shielding us against some statistical significant amount of WebGL crashes. That is interesting to know.

Do we have data on which crashes exactly were the causes of the crash guard to kick in before we took it down? If we do re-enable the crash guard, we should re-enable it in a way that we do get that data, and are able to say "we need crash guard against bugs X, Y and Z." so that we can understand why we are holding the guard in place.

I feel that bug(s) like 1259811 surfacing its head _after_ we disabled the crash guard is an indication that the crash guard had had a detrimental effect of hiding bugs from our eyes (or did we know about bug 1259811 before?). If we need to reintroduce the crash guard, we should make it in a way that we don't lose sight of the crashes that it guards us from, so that we can work on those bugs while the guard is up, and estimate the impact/effect of those bugs we are being shielded from.

In any case, I'd prefer we had the crash guard down, and instead applied fixes (or crash guards if we can't fix) to individual crashes we are seeing through statistics, since that seems to be more effective workflow (our reaction to bug 1259811 as an example).
The crashes still got reported, but it's definitely true that (1) we would stop reporting *additional* crashes which may be totally different and (2) the volume might be reduced enough such that no one notices.

Anyway, I agree - I think this approach was too aggressive. At the very least we weren't sending enough data back.
Also, WebGL is a special cookie that way, and could be treated differently.  The general idea of "notice that a particular crash will trigger a crash guard in the future" is a valid one, and yes, bug 1259811 does show up as #1 cause of "crashes that would be stopped by a crash guard".  In this case though, it wasn't that list that got us to notice it, because I didn't look at it recently:

https://crash-stats.mozilla.com/search/?product=Firefox&graphics_startup_test=!__null__&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

Comment 37

2 years ago
...and please don't make us dig trough the about:config to find that setting we need to reset in order to unlock the WebGL feature.  At least drop a bar at the top containing a crash message with a button in it that we can click that does reset that setting for us.  Just so you know, I can make the WebGL context in Firefox 46 crash on demand simply by reloading the page while a canvas is initializing.  It's not convenient that we have to go to about:config to reset this.  It could be easier.
Flags: qe-verify? → qe-verify-
(In reply to that-ben from comment #37)
> ...and please don't make us dig trough the about:config to find that setting
> we need to reset in order to unlock the WebGL feature.  At least drop a bar
> at the top containing a crash message with a button in it that we can click
> that does reset that setting for us.  Just so you know, I can make the WebGL
> context in Firefox 46 crash on demand simply by reloading the page while a
> canvas is initializing.  It's not convenient that we have to go to
> about:config to reset this.  It could be easier.

If I understand your suggestion correctly, bug 1270894 should take care of this.
You need to log in before you can comment on or make changes to this bug.