Closed Bug 1245747 Opened 4 years ago Closed 3 years ago

crash in rx::Renderer11::testDeviceLost

Categories

(Core :: Graphics, defect, critical)

x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- wontfix
firefox-esr45 --- wontfix
firefox50 --- fixed
firefox51 --- fixed
firefox52 --- fixed

People

(Reporter: mats, Assigned: eflores)

References

(Depends on 1 open bug)

Details

(Keywords: crash, Whiteboard: [gfx-noted])

Crash Data

Attachments

(2 files, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-f5215f88-494f-4858-8ede-0fabe2160203.
=============================================================

Crash occurs in all FF versions at least back to v37.
In Firefox 44 Top Crash list it's currently at #45.

It only occurs on Windows, with Windows 10 standing out:
Operating System	Percentage	Number Of Crashes
Windows 10	91.16%	1629
Windows 7	7.05%	126
Windows 8.1	1.68%	30
Windows 8	0.11%	2


Stack:
rx::Renderer11::testDeviceLost()
egl::Display::createPbufferSurface(egl::Config const*, egl::AttributeMap const&, egl::Surface**)
egl::CreatePbufferSurface(void*, void*, int const*)
mozilla::gl::GLContextEGL::CreatePBufferSurfaceTryingPowerOfTwo(void*, unsigned int, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>&)
mozilla::gl::GLContextEGL::CreateEGLPBufferOffscreenContext(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gl::SurfaceCaps const&)
mozilla::gl::GLContextProviderEGL::CreateOffscreen(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gl::SurfaceCaps const&, mozilla::gl::CreateContextFlags)

More Reports:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=rx%3A%3ARenderer11%3A%3AtestDeviceLost
It looks like this might be happening during a device reset.
Depends on: 1245843
Whiteboard: [gfx-noted]
Assignee: nobody → edwin
It seems that when EGL_LOSE_CONTEXT_ON_RESET isn't set, ANGLE will automagically try to reset the device. It detects this in its testDeviceLost() function, and tries to reset the device by calling restoreLostDevice() (e.g. [1]).

But restoreLostDevice can fail, leaving mDevice null. The next time testDeviceLost() is called, we get a segfault since there's no check for mDevice or mDeviceLost. [2]

It seems we did use EGL_LOSE_CONTEXT_ON_RESET at some point (bug 660070) which avoids the ANGLE magic, but it got lost somewhere along the line.

[1] https://dxr.mozilla.org/mozilla-central/source/gfx/angle/src/libANGLE/Display.cpp#523
[2] https://dxr.mozilla.org/mozilla-central/source/gfx/angle/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp#2358
Attached patch 1245747.patch (obsolete) — Splinter Review
This patch adds back the EGL_LOSE_CONTEXT_ON_RESET attrib and re-initialises the display when we see a context loss.

Not sure if HandleDeviceLost() is correct here so just requesting f?. WFM, though.
Attachment #8753354 - Flags: feedback?(jmuizelaar)
(of course, just adding a null check in the ANGLE code would work as well, but this way means not having to faff about upstream and hopefully generalises to Android as well)
Comment on attachment 8753354 [details] [diff] [review]
1245747.patch

Review of attachment 8753354 [details] [diff] [review]:
-----------------------------------------------------------------

This seems reasonable to me. Jgilbert is probably the best person to ask for r+
Attachment #8753354 - Flags: feedback?(jmuizelaar) → review?(jgilbert)
Comment on attachment 8753354 [details] [diff] [review]
1245747.patch

Review of attachment 8753354 [details] [diff] [review]:
-----------------------------------------------------------------

Let's not destroy contexts unless we have a plan for all contexts.

::: gfx/gl/GLContextProviderEGL.cpp
@@ +332,5 @@
>      bool succeeded = true;
>  
> +    if (IsContextLost()) {
> +        return false;
> +    }

Don't need {} for simple return statements.

@@ +361,5 @@
>                                                mContext);
>          if (!succeeded) {
>              int eglError = sEGLLibrary.fGetError();
>              if (eglError == LOCAL_EGL_CONTEXT_LOST) {
> +                sEGLLibrary.fDestroyContext(EGL_DISPLAY(), mContext);

Don't destroy this context unless you also destroy all other existing contexts.

@@ +378,5 @@
>      } else {
>          MOZ_ASSERT(sEGLLibrary.CachedCurrentContextMatches());
> +
> +        if (sEGLLibrary.HasRobustness() && fGetGraphicsResetStatus() != NO_ERROR) {
> +            sEGLLibrary.fDestroyContext(EGL_DISPLAY(), mContext);

Also no reason to DestroyContext here.

::: gfx/gl/GLLibraryEGL.cpp
@@ +220,5 @@
>  
> +void
> +GLLibraryEGL::HandleDeviceLost()
> +{
> +  fTerminate(mEGLDisplay);

4-space not 2-space.
Attachment #8753354 - Flags: review?(jgilbert) → review-
Attached patch 1245747.patch (obsolete) — Splinter Review
Attachment #8753354 - Attachment is obsolete: true
Attachment #8771936 - Flags: review?(jgilbert)
Comment on attachment 8771936 [details] [diff] [review]
1245747.patch

Review of attachment 8771936 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/gl/GLContextProviderEGL.cpp
@@ +375,1 @@
>                  mContextLost = true;

Include `mContextLost = true` in HandleDeviceLost() and s/HandleDeviceLost/SetDeviceLost/.
Attachment #8771936 - Flags: review?(jgilbert) → review+
Pushed by eflores@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5b0b62fc39e
Terminate the EGL display on context loss - r=jgilbert
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/137df01eff02
Backed out changeset c5b0b62fc39e for bustage
(In reply to Edwin Flores [:eflores] [:edwin] from comment #4)
> (of course, just adding a null check in the ANGLE code would work as well,
> but this way means not having to faff about upstream and hopefully
> generalises to Android as well)

We should fix ANGLE.

After thinking about it, I'm not sure this bug's current approach is the best course of action. If it's a bug in ANGLE, we should fix that bug, leaving the current patch unnecessary.
Do I understand this right?
Flags: needinfo?(edwin)
Attachment #8771936 - Attachment is obsolete: true
Flags: needinfo?(edwin)
Attachment #8775106 - Flags: review?(jgilbert)
Comment on attachment 8775106 [details] [diff] [review]
1245747-angle.patch

Review of attachment 8775106 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/angle/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp
@@ +2613,5 @@
>  
> +    if (!mDevice) {
> +        // We should only get here if resetDevice failed.
> +        ASSERT(mDeviceLost);
> +        return true;

false
Attachment #8775106 - Flags: review?(jgilbert) → review-
Comment on attachment 8775106 [details] [diff] [review]
1245747-angle.patch

Review of attachment 8775106 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/angle/src/libANGLE/renderer/d3d/d3d11/Renderer11.cpp
@@ +2613,5 @@
>  
> +    if (!mDevice) {
> +        // We should only get here if resetDevice failed.
> +        ASSERT(mDeviceLost);
> +        return true;

Nope, `true` is correct.
Attachment #8775106 - Flags: review- → review+
Pushed by eflores@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb6cb6888ca0
Handle ANGLE device reset failure - r=jgilbert
https://hg.mozilla.org/mozilla-central/rev/fb6cb6888ca0
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Crash volume for signature 'rx::Renderer11::testDeviceLost':
 - nightly(version 50):42 crashes from 2016-06-06.
 - aurora (version 49):98 crashes from 2016-06-07.
 - beta   (version 48):503 crashes from 2016-06-06.
 - release(version 47):3018 crashes from 2016-05-31.
 - esr    (version 45):172 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       3       4       9      12       4       3       4
 - aurora        8      11      21      31       9       9       9
 - beta         59      54     105     127      57      50      32
 - release     388     397     646     821     237     230     214
 - esr          27      27      24      18      15      15      14

Affected platform: Windows
This is still reproducible in low volume on Fx50.

  SIGNATURE   | rx::Renderer11::testDeviceLost
  ---------------------------------------------------------------
  CRASH STATS | http://tinyurl.com/zabqlab
  ---------------------------------------------------------------
  OVERVIEW    | 2 crashes on nightly 52
	      | 1 crash on aurora 51
	      | 1 crash on aurora 50
	      | 18 crashes on beta 50 (1 on 50.0b2, 17 on 50.0b1)
The check was removed when we updated ANGLE in bug 1297924.

The patch wasn't upstreamed since apparently some new code "now includes this check"[1] -- which, now that I look at it, doesn't appear to be the case.

We should re-land and uplift this and I'll try to get it upstreamed again.

[1] https://chromium-review.googlesource.com/#/c/364120/
Comment on attachment 8775106 [details] [diff] [review]
1245747-angle.patch

Approval Request Comment
[Feature/regressing bug #]: bug 1297924
[User impact if declined]: crashes
[Describe test coverage new/current, TreeHerder]: was in 50 for a couple of months
[Risks and why]: none
[String/UUID change made/needed]: none
Attachment #8775106 - Flags: approval-mozilla-beta?
Attachment #8775106 - Flags: approval-mozilla-aurora?
Comment on attachment 8775106 [details] [diff] [review]
1245747-angle.patch

Actually, will add new patch without the assert since mDeviceLost no longer exists.
Attachment #8775106 - Flags: approval-mozilla-beta?
Attachment #8775106 - Flags: approval-mozilla-aurora?
Attached patch 1245747.patchSplinter Review
Approval Request Comment
[Feature/regressing bug #]: bug 1297924
[User impact if declined]: crashes
[Describe test coverage new/current, TreeHerder]: was in 50 for a couple of months
[Risks and why]: none
[String/UUID change made/needed]: none
Attachment #8795701 - Flags: review+
Attachment #8795701 - Flags: approval-mozilla-beta?
Attachment #8795701 - Flags: approval-mozilla-aurora?
Comment on attachment 8795701 [details] [diff] [review]
1245747.patch

Crash fix, Aurora51+, Beta50+
Attachment #8795701 - Flags: approval-mozilla-beta?
Attachment #8795701 - Flags: approval-mozilla-beta+
Attachment #8795701 - Flags: approval-mozilla-aurora?
Attachment #8795701 - Flags: approval-mozilla-aurora+
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla52 → ---
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2ae90c76179f
Handle ANGLE device reset failure. r=jgilbert
https://hg.mozilla.org/mozilla-central/rev/2ae90c76179f
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
You need to log in before you can comment on or make changes to this bug.