Use EGL_CONTEXT_LOST to detect driver resets in WebGL on Windows with ANGLE

RESOLVED FIXED in mozilla11

Status

()

Core
Canvas: WebGL
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: bjacob, Assigned: drs)

Tracking

(Blocks: 1 bug)

unspecified
mozilla11
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 4 obsolete attachments)

This is the low-hanging-fruit part of bug 656824. Windows is easiest to handle because:
 1) ANGLE has EGL_CONTEXT_LOST to tell us when a driver reset occurs
 2) On Windows, besides WebGL, we don't use GL/GLES
 3) For D3D contexts, this is already a solved problem (thanks Bas)

So it's just a matter of detecting EGL_CONTEXT_LOST in WebGL and then properly lose the WebGL context.
(Assignee)

Updated

6 years ago
Assignee: nobody → dsherk
(Assignee)

Comment 1

6 years ago
Created attachment 565425 [details]
Test case; should create an alert that says "oh" when the context is lost.
(Assignee)

Comment 2

6 years ago
Created attachment 565426 [details] [diff] [review]
Patch v1.0, handling EGL_CONTEXT_LOST on Windows.

This patch is pretty sub-optimal, but seems to be the only sane way to do it that I could think of. Before I explain, some notes:

* You can't test this in debug mode, or the whole browser just crashes whenever you use my test case (whereas it keeps going with WebGL disabled in release mode). There might be some way around this, but I wrote/tested this patch in release mode and just sort of guessed what was going on behind the scenes.
* This patch only applies to Windows, and of course only EGL.

So what this actually implements is informing a script through "webglcontextlost" that the context has been lost when the implementation receives an EGL_CONTEXT_LOST error. It uses some of the ARB_robustness code and is thus dependent on the ARB_robustness patch. 

Now, to actually generate this EGL_CONTEXT_LOST error, you have to try to switch the context and it has to be gone. The problem is, our EGL code only switches the context if the current one is different than the one we're trying to set (i.e. if they're the same we just stop there and pretend it succeeded). So if the user isn't switching pages or anything like that, they will never generate this error even if the context is indeed lost.

There were 2 workarounds I thought of:
1) Always force a context switch, even if we're already on that context. I tossed this out because it's terrible and slow.
2) I looked at the angle code and saw that the MakeCurrent() code there checks if the device is null or has been lost, so I figured I could test that myself in our code to predict what result MakeCurrent() will give. If we see the same conditions that would make MakeCurrent() error, then I run MakeCurrent() anyways (even if I normally wouldn't because the context is already current) and check for an error. 

I chose the second method, but the problem is that thebes has no concept of an actual EGLDisplay inside it, and instead it's typedef'd to a void pointer. So my (crappy) solution was to just copy the header for EGLDisplay over to thebes, and replace any tokens that we don't have a definition for with a void pointer within its class def. This works, but is incredibly hacky.
Attachment #565426 - Flags: review?(bjacob)
(Assignee)

Updated

6 years ago
No longer blocks: 656824
Depends on: 656824
(Assignee)

Comment 3

6 years ago
Created attachment 570182 [details] [diff] [review]
Patch v1.1, handling EGL_CONTEXT_LOST on Windows.

Cleared up bitrot.
Attachment #565426 - Attachment is obsolete: true
Attachment #565426 - Flags: review?(bjacob)
Attachment #570182 - Flags: review?(bjacob)
(Assignee)

Comment 4

6 years ago
Running on Try: https://tbpl.mozilla.org/?tree=Try&rev=20a50709f71c
(Assignee)

Comment 5

6 years ago
Created attachment 573724 [details] [diff] [review]
Patch v2.0, handling EGL_CONTEXT_LOST.

Catches the EGL_CONTEXT_LOST error which occurs after driver resets, and sends a
WebGL context the canvas event webglcontextlost when this occurs. Forces a
context switch during DrawArrays and DrawElements, which could potentially be a
performance regression.

Tested on Windows, may work on Android although untested. Will push to try once it re-opens for checkins.
Attachment #570182 - Attachment is obsolete: true
Attachment #570182 - Flags: review?(bjacob)
Attachment #573724 - Flags: review?(bjacob)
(Assignee)

Comment 6

6 years ago
Try push: https://tbpl.mozilla.org/?tree=Try&rev=f1f570b11131
(Assignee)

Comment 7

6 years ago
Created attachment 577829 [details] [diff] [review]
Patch v3.0, handling EGL_CONTEXT_LOST.

Rewrote for the second time, this time it abuses the robustness timer to do random forced MakeCurrent() if the context provider is EGL. I simulate a CONTEXT_GUILTY_CONTEXT_RESET_ARB, reason being that we don't want to allow the user to respawn the context since we have no information on whether it was guilty or not. Thus, we have to assume the worst case, which is that any context which receives this error is guilty of causing it.
Attachment #573724 - Attachment is obsolete: true
Attachment #573724 - Flags: review?(bjacob)
Attachment #577829 - Flags: review?(bjacob)
(Assignee)

Comment 8

6 years ago
Try push: https://tbpl.mozilla.org/?tree=Try&rev=f37d4d6f4a4b
(Assignee)

Comment 9

6 years ago
Created attachment 578081 [details] [diff] [review]
Patch v3.1, handling EGL_CONTEXT_LOST.

Small change to MakeCurrent() behavior.
Attachment #577829 - Attachment is obsolete: true
Attachment #577829 - Flags: review?(bjacob)
Attachment #578081 - Flags: review?(bjacob)
Comment on attachment 578081 [details] [diff] [review]
Patch v3.1, handling EGL_CONTEXT_LOST.

Review of attachment 578081 [details] [diff] [review]:
-----------------------------------------------------------------

Neat! Like this much better than previous versions.
Attachment #578081 - Flags: review?(bjacob) → review+
(Assignee)

Comment 11

6 years ago
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=66f61be58aa3
https://hg.mozilla.org/mozilla-central/rev/66f61be58aa3
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla11
(Assignee)

Updated

6 years ago
Blocks: 707860
Created attachment 583149 [details] [diff] [review]
Remove nsContentUtils.h include

There doesn't seem to be a reason to include nsContentUtils.h here; compiles fine without.
Attachment #583149 - Flags: review?(bugzilla)
(Assignee)

Comment 14

6 years ago
Was this patch meant for this ticket and is that the correct patch?
(Assignee)

Comment 15

4 years ago
Comment on attachment 583149 [details] [diff] [review]
Remove nsContentUtils.h include

Review of attachment 583149 [details] [diff] [review]:
-----------------------------------------------------------------

Clearing from review queue, doesn't seem relevant to this bug anyways.
Attachment #583149 - Flags: review?(bugzilla)
You need to log in before you can comment on or make changes to this bug.