crash in rx::Renderer11::testDeviceLost

RESOLVED FIXED in Firefox 50

Status

()

Core
Graphics
--
critical
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: mats, Assigned: eflores)

Tracking

(Depends on: 1 bug, {crash})

unspecified
mozilla52
x86
Windows NT
crash
Points:
---

Firefox Tracking Flags

(firefox47 wontfix, firefox48 wontfix, firefox49 wontfix, firefox-esr45 wontfix, firefox50 fixed, firefox51 fixed, firefox52 fixed)

Details

(Whiteboard: [gfx-noted], crash signature)

Attachments

(2 attachments, 2 obsolete attachments)

This bug was filed from the Socorro interface and is 
report bp-f5215f88-494f-4858-8ede-0fabe2160203.
=============================================================

Crash occurs in all FF versions at least back to v37.
In Firefox 44 Top Crash list it's currently at #45.

It only occurs on Windows, with Windows 10 standing out:
Operating System	Percentage	Number Of Crashes
Windows 10	91.16%	1629
Windows 7	7.05%	126
Windows 8.1	1.68%	30
Windows 8	0.11%	2


Stack:
rx::Renderer11::testDeviceLost()
egl::Display::createPbufferSurface(egl::Config const*, egl::AttributeMap const&, egl::Surface**)
egl::CreatePbufferSurface(void*, void*, int const*)
mozilla::gl::GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo(void*, unsigned int, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>&)
mozilla::gl::GLContextEGL::CreateEGLPBufferOffscreenContext(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gl::SurfaceCaps const&)
mozilla::gl::GLContextProviderEGL::CreateOffscreen(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gl::SurfaceCaps const&, mozilla::gl::CreateContextFlags)

More Reports:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=rx%3A%3ARenderer11%3A%3AtestDeviceLost
It looks like this might be happening during a device reset.
Depends on: 1245843
Whiteboard: [gfx-noted]
Assignee: nobody → edwin
It seems that when EGL_LOSE_CONTEXT_ON_RESET isn't set, ANGLE will automagically try to reset the device. It detects this in its testDeviceLost() function, and tries to reset the device by calling restoreLostDevice() (e.g. [1]).

But restoreLostDevice can fail, leaving mDevice null. The next time testDeviceLost() is called, we get a segfault since there's no check for mDevice or mDeviceLost. [2]

It seems we did use EGL_LOSE_CONTEXT_ON_RESET at some point (bug 660070) which avoids the ANGLE magic, but it got lost somewhere along the line.

[1] https://dxr.mozilla.org/mozilla-central/source/gfx/angle/src/libANGLE/Display.cpp#523
[2] https://dxr.mozilla.org/mozilla-central/source/gfx/angle/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp#2358
Created attachment 8753354 [details] [diff] [review]
1245747.patch

This patch adds back the EGL_LOSE_CONTEXT_ON_RESET attrib and re-initialises the display when we see a context loss.

Not sure if HandleDeviceLost() is correct here so just requesting f?. WFM, though.
Attachment #8753354 - Flags: feedback?(jmuizelaar)
(of course, just adding a null check in the ANGLE code would work as well, but this way means not having to faff about upstream and hopefully generalises to Android as well)
Comment on attachment 8753354 [details] [diff] [review]
1245747.patch

Review of attachment 8753354 [details] [diff] [review]:
-----------------------------------------------------------------

This seems reasonable to me. Jgilbert is probably the best person to ask for r+
Attachment #8753354 - Flags: feedback?(jmuizelaar) → review?(jgilbert)
Comment on attachment 8753354 [details] [diff] [review]
1245747.patch

Review of attachment 8753354 [details] [diff] [review]:
-----------------------------------------------------------------

Let's not destroy contexts unless we have a plan for all contexts.

::: gfx/gl/GLContextProviderEGL.cpp
@@ +332,5 @@
>      bool succeeded = true;
>  
> +    if (IsContextLost()) {
> +        return false;
> +    }

Don't need {} for simple return statements.

@@ +361,5 @@
>                                                mContext);
>          if (!succeeded) {
>              int eglError = sEGLLibrary.fGetError();
>              if (eglError == LOCAL_EGL_CONTEXT_LOST) {
> +                sEGLLibrary.fDestroyContext(EGL_DISPLAY(), mContext);

Don't destroy this context unless you also destroy all other existing contexts.

@@ +378,5 @@
>      } else {
>          MOZ_ASSERT(sEGLLibrary.CachedCurrentContextMatches());
> +
> +        if (sEGLLibrary.HasRobustness() && fGetGraphicsResetStatus() != NO_ERROR) {
> +            sEGLLibrary.fDestroyContext(EGL_DISPLAY(), mContext);

Also no reason to DestroyContext here.

::: gfx/gl/GLLibraryEGL.cpp
@@ +220,5 @@
>  
> +void
> +GLLibraryEGL::HandleDeviceLost()
> +{
> +  fTerminate(mEGLDisplay);

4-space not 2-space.
Attachment #8753354 - Flags: review?(jgilbert) → review-
Created attachment 8771936 [details] [diff] [review]
1245747.patch
Attachment #8753354 - Attachment is obsolete: true
Attachment #8771936 - Flags: review?(jgilbert)
Comment on attachment 8771936 [details] [diff] [review]
1245747.patch

Review of attachment 8771936 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContextProviderEGL.cpp
@@ +375,1 @@
>                  mContextLost = true;

Include `mContextLost = true` in HandleDeviceLost() and s/HandleDeviceLost/SetDeviceLost/.
Attachment #8771936 - Flags: review?(jgilbert) → review+

Comment 9

2 years ago
Pushed by eflores@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5b0b62fc39e
Terminate the EGL display on context loss - r=jgilbert

Comment 10

2 years ago
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/137df01eff02
Backed out changeset c5b0b62fc39e for bustage
(In reply to Edwin Flores [:eflores] [:edwin] from comment #4)
> (of course, just adding a null check in the ANGLE code would work as well,
> but this way means not having to faff about upstream and hopefully
> generalises to Android as well)

We should fix ANGLE.

After thinking about it, I'm not sure this bug's current approach is the best course of action. If it's a bug in ANGLE, we should fix that bug, leaving the current patch unnecessary.
Do I understand this right?
Flags: needinfo?(edwin)
Created attachment 8775106 [details] [diff] [review]
1245747-angle.patch
Attachment #8771936 - Attachment is obsolete: true
Flags: needinfo?(edwin)
Attachment #8775106 - Flags: review?(jgilbert)
Comment on attachment 8775106 [details] [diff] [review]
1245747-angle.patch

Review of attachment 8775106 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/angle/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp
@@ +2613,5 @@
>  
> +    if (!mDevice) {
> +        // We should only get here if resetDevice failed.
> +        ASSERT(mDeviceLost);
> +        return true;

false
Attachment #8775106 - Flags: review?(jgilbert) → review-
Comment on attachment 8775106 [details] [diff] [review]
1245747-angle.patch

Review of attachment 8775106 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/angle/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp
@@ +2613,5 @@
>  
> +    if (!mDevice) {
> +        // We should only get here if resetDevice failed.
> +        ASSERT(mDeviceLost);
> +        return true;

Nope, `true` is correct.
Attachment #8775106 - Flags: review- → review+

Comment 15

2 years ago
Pushed by eflores@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb6cb6888ca0
Handle ANGLE device reset failure - r=jgilbert

Comment 16

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/fb6cb6888ca0
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox50: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Crash volume for signature 'rx::Renderer11::testDeviceLost':
 - nightly(version 50):42 crashes from 2016-06-06.
 - aurora (version 49):98 crashes from 2016-06-07.
 - beta   (version 48):503 crashes from 2016-06-06.
 - release(version 47):3018 crashes from 2016-05-31.
 - esr    (version 45):172 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       3       4       9      12       4       3       4
 - aurora        8      11      21      31       9       9       9
 - beta         59      54     105     127      57      50      32
 - release     388     397     646     821     237     230     214
 - esr          27      27      24      18      15      15      14

Affected platform: Windows
status-firefox47: --- → affected
status-firefox48: --- → affected
status-firefox49: --- → affected
status-firefox-esr45: --- → affected
This is still reproducible in low volume on Fx50.

  SIGNATURE   | rx::Renderer11::testDeviceLost
  ---------------------------------------------------------------
  CRASH STATS | http://tinyurl.com/zabqlab
  ---------------------------------------------------------------
  OVERVIEW    | 2 crashes on nightly 52
	      | 1 crash on aurora 51
	      | 1 crash on aurora 50
	      | 18 crashes on beta 50 (1 on 50.0b2, 17 on 50.0b1)
status-firefox51: --- → affected
status-firefox52: --- → affected
The check was removed when we updated ANGLE in bug 1297924.

The patch wasn't upstreamed since apparently some new code "now includes this check"[1] -- which, now that I look at it, doesn't appear to be the case.

We should re-land and uplift this and I'll try to get it upstreamed again.

[1] https://chromium-review.googlesource.com/#/c/364120/
Comment on attachment 8775106 [details] [diff] [review]
1245747-angle.patch

Approval Request Comment
[Feature/regressing bug #]: bug 1297924
[User impact if declined]: crashes
[Describe test coverage new/current, TreeHerder]: was in 50 for a couple of months
[Risks and why]: none
[String/UUID change made/needed]: none
Attachment #8775106 - Flags: approval-mozilla-beta?
Attachment #8775106 - Flags: approval-mozilla-aurora?
Comment on attachment 8775106 [details] [diff] [review]
1245747-angle.patch

Actually, will add new patch without the assert since mDeviceLost no longer exists.
Attachment #8775106 - Flags: approval-mozilla-beta?
Attachment #8775106 - Flags: approval-mozilla-aurora?
Created attachment 8795701 [details] [diff] [review]
1245747.patch

Approval Request Comment
[Feature/regressing bug #]: bug 1297924
[User impact if declined]: crashes
[Describe test coverage new/current, TreeHerder]: was in 50 for a couple of months
[Risks and why]: none
[String/UUID change made/needed]: none
Attachment #8795701 - Flags: review+
Attachment #8795701 - Flags: approval-mozilla-beta?
Attachment #8795701 - Flags: approval-mozilla-aurora?
Comment on attachment 8795701 [details] [diff] [review]
1245747.patch

Crash fix, Aurora51+, Beta50+
Attachment #8795701 - Flags: approval-mozilla-beta?
Attachment #8795701 - Flags: approval-mozilla-beta+
Attachment #8795701 - Flags: approval-mozilla-aurora?
Attachment #8795701 - Flags: approval-mozilla-aurora+

Updated

2 years ago
status-firefox50: fixed → affected
status-firefox47: affected → wontfix
status-firefox48: affected → wontfix
status-firefox49: affected → wontfix
status-firefox-esr45: affected → wontfix
Target Milestone: mozilla50 → mozilla52
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla52 → ---

Comment 24

2 years ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2ae90c76179f
Handle ANGLE device reset failure. r=jgilbert

Comment 25

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/cc721212bace
status-firefox51: affected → fixed

Comment 26

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/b01845a42049
status-firefox50: affected → fixed

Comment 27

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/2ae90c76179f
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago2 years ago
status-firefox52: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
You need to log in before you can comment on or make changes to this bug.