Closed Bug 1546397 Opened 7 months ago Closed 5 months ago

Crash in [@ mozilla::gl::GLLibraryEGL::fMakeCurrent]

Categories

(Core :: Canvas: WebGL, defect, P1, critical)

Unspecified
Android
defect

Tracking

RESOLVED FIXED
mozilla69
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- fixed
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- fixed

People

(Reporter: marcia, Assigned: sotaro)

Details

(Keywords: crash, regression, Whiteboard: [fennec68.1])

Attachments

(1 file)

This bug is for crash report bp-8a309548-c8ff-43c6-ad38-72fbe0190412.

Seen while looking at beta crash stats for 67. Although this has been present in other releases, it appears as #3 overall on b11: https://bit.ly/2IBSIkP

Correlations:
(97.85% in signature vs 05.85% overall) android_version = 17 (REL)
(54.84% in signature vs 02.37% overall) adapter_device_id = Adreno (TM) 305

Top 10 frames of crashing thread:

0 libc.so libc.so@0x19848 
1 libEGL.so libEGL.so@0x426a6 
2 libEGL.so libEGL.so@0xc909 
3 libEGL.so libEGL.so@0xc5d9 
4 libEGL.so libEGL.so@0x426ca 
5 libEGL.so libEGL.so@0x426a6 
6 libEGL.so libEGL.so@0xe8d5 
7 libEGL.so libEGL.so@0x426a6 
8 libEGL.so libEGL.so@0xe7e7 
9 libxul.so mozilla::gl::GLLibraryEGL::fMakeCurrent const gfx/gl/GLLibraryEGL.h:165

51 crashes/8 installs in 67b11. We will see how this pans out in b13. There are over 1000 crashes on 66 release, although there are also fewer installs than crashes.

Component: General → Graphics
Priority: P2 → --
Product: Firefox for Android → Core

This started rising when 66 went to beta, so presumably that is where the regression is. Given the low volume on nightly, we probably won't be able to identify the regression conclusively via crash reports alone. The vast majority of signatures are during WebGL initialization on Adreno 3xx devices with Android API level 17. I didn't really see any related changes go into 66 (bug 1514985, bug 1527534 seem innocent enough?). Thoughts Jeff?

Component: Graphics → Canvas: WebGL
Flags: needinfo?(jgilbert)
Priority: -- → P2

Can you repro?

Flags: needinfo?(jgilbert) → needinfo?(tdaede)
Assignee: nobody → jgilbert
Priority: P2 → P1

I could not reproduce on an Adreno 330 (Android 5.1.1)

Flags: needinfo?(tdaede)

I tested on a Nexus 4 (Android 4.2.2 [API 17], Adreno 320). I could not reproduce the crash, but WebGL was very unstable. When I opened some WebGL pages like [1], the WebGL context was lost within 3 seconds, with the following log.

I/Gecko ( 3770): [GFX1]: Unexpected glGetGraphicsResetStatus: 0x1
I/Gecko ( 3770): WebGL(0x80d55000)::ForceLoseContext

[1]

Android 4.2 (API 17) and Android 4.3 (API 18) are both "Jelly Bean", but their OpenGL support seems very different: Android 4.3 Jelly Bean (API 18) added support for OpenGL ES 3.0.
https://android.googleblog.com/2013/07/introducing-android-43-sweeter-jelly.html

(In reply to Sotaro Ikeda [:sotaro] from comment #6)

> I tested on a Nexus 4 (Android 4.2.2 [API 17], Adreno 320). I could not reproduce the crash, but WebGL was very unstable. When I opened some WebGL pages like [1], the WebGL context was lost within 3 seconds, with the following log.

On a Nexus 4 (Android 4.3 [API 18], Adreno 320), WebGL worked normally. It did not lose the WebGL context.

I updated the Nexus 4 with a factory image at the following.

(In reply to Sotaro Ikeda [:sotaro] from comment #6)

> I tested on a Nexus 4 (Android 4.2.2 [API 17], Adreno 320). I could not reproduce the crash, but WebGL was very unstable. When I opened some WebGL pages like [1], the WebGL context was lost within 3 seconds, with the following log.

I looked for the regression range of the WebGL context loss with "./mach mozregression --app fennec --good=2017-02-08 --bad=2018-11-10".

16:31.09 INFO: Last good revision: c40ca7a1bdd93632c6bdc5e23bd33d984d508b19 (2017-03-09)
16:31.09 INFO: First bad revision: a8d497b09753c91783b68c5805c64f34a2f39629 (2017-03-10)
16:31.09 INFO: Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c40ca7a1bdd93632c6bdc5e23bd33d984d508b19&tochange=a8d497b09753c91783b68c5805c64f34a2f39629

Bug 1339256 seems like a culprit.

Interesting! I'm curious if it's really getting a value of 0x1 here, since that's never an expected return value.

I checked how each function used by fGetGraphicsResetStatus behaves.
https://searchfox.org/mozilla-central/source/gfx/gl/GLContext.cpp#507

glGetGraphicsResetStatusEXT did not return an error. glGetGraphicsResetStatus, glGetGraphicsResetStatusARB and glGetGraphicsResetStatusKHR soon returned 0x1. This seems strange, since glGetGraphicsResetStatusEXT behaved differently from the others.

By using glGetGraphicsResetStatusEXT, I did not see an error from fGetGraphicsResetStatus. But when https://akirodic.com/p/jellyfish/ was visited, the WebGL context was lost in 20 seconds with the following log. The GPU seemed to run out of memory.

W/Adreno200-GSL(22710): <sharedmem_gpumem_alloc:991>: sharedmem_gpumem_alloc: mmap failed errno 12 Out of memory
W/Adreno200-GSL(22710): <sharedmem_gpumem_alloc:991>: sharedmem_gpumem_alloc: mmap failed errno 12 Out of memory
W/Adreno200-ES20(22710): <oglLinkEGLImage2GLTexture:1693>: GL_INVALID_OPERATION
W/GeckoConsole(22710): [JavaScript Warning: "Error: WebGL warning: <PresentScreenBuffer>: PublishFrame failed. Losing context."]
I/Gecko (22710): WebGL(0x8a9c0800)::ForceLoseContext

That sounds reasonable to me.

Do we get different pfn addresses for glGetGraphicsResetStatusEXT, glGetGraphicsResetStatus, glGetGraphicsResetStatusARB and glGetGraphicsResetStatusKHR?

Assignee: jgilbert → sotaro.ikeda.g

(In reply to Jeff Gilbert [:jgilbert] from comment #13)

> That sounds reasonable to me.
>
> Do we get different pfn addresses for glGetGraphicsResetStatusEXT, glGetGraphicsResetStatus, glGetGraphicsResetStatusARB and glGetGraphicsResetStatusKHR?

Yes, each function returned a different address.

Depends on: 1559758

> By using glGetGraphicsResetStatusEXT, I did not see an error from fGetGraphicsResetStatus.
I don't quite get what you mean. On context-loss, we're hoping for a non-zero value from glGetGraphicsResetStatus[,KHR,EXT,ARB].
Are you saying that the EXT version always gives zero? If so, that's the only one we /don't/ want.

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Jeff Gilbert [:jgilbert] from comment #15)

> > By using glGetGraphicsResetStatusEXT, I did not see an error from fGetGraphicsResetStatus.
> I don't quite get what you mean. On context-loss, we're hoping for a non-zero value from glGetGraphicsResetStatus[,KHR,EXT,ARB].
> Are you saying that the EXT version always gives zero? If so, that's the only one we /don't/ want.

I meant that glGetGraphicsResetStatus[,KHR,ARB] always returned 0x1 within a short time, even when robustness was disabled or EGL_NO_RESET_NOTIFICATION was set. I had not looked into the EXT version enough. The EXT version always returned 0x0 when I tested. I had mistakenly assumed that using the EXT version was a way to avoid the 0x1 result.

When the EXT version was used, the WebGL context was lost because of a GLScreenBuffer::PublishFrame() failure.
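
For illustration only, the comparison described above could be sketched roughly as follows: call each entry-point variant through a function pointer and record the raw value it returns. This is not Gecko code. `Variant`, `ProbeResult`, `ProbeResetStatus` and `VariantsDisagree` are hypothetical names, and the two stub functions merely mimic the values reported in this bug (0x1 from the core/KHR/ARB variants, 0x0 from EXT) rather than calling a real driver.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the GL entry-point type, GLenum (*)(void).
using ResetStatusFn = std::uint32_t (*)();

struct Variant {
  std::string name;
  ResetStatusFn fn;  // null when the driver does not export the symbol
};

struct ProbeResult {
  std::string name;
  bool available;
  std::uint32_t status;  // only meaningful when available is true
};

// Calls every available variant once and records the raw return value.
std::vector<ProbeResult> ProbeResetStatus(const std::vector<Variant>& variants) {
  std::vector<ProbeResult> results;
  for (const Variant& v : variants) {
    if (v.fn == nullptr) {
      results.push_back({v.name, false, 0});
    } else {
      results.push_back({v.name, true, v.fn()});
    }
  }
  return results;
}

// True if the available variants do not all report the same value.
bool VariantsDisagree(const std::vector<ProbeResult>& results) {
  bool haveFirst = false;
  std::uint32_t first = 0;
  for (const ProbeResult& r : results) {
    if (!r.available) continue;
    if (!haveFirst) {
      first = r.status;
      haveFirst = true;
    } else if (r.status != first) {
      return true;
    }
  }
  return false;
}

// Stubs that mimic the values reported in this bug; they are not driver code.
std::uint32_t StubAlwaysOne() { return 0x1; }
std::uint32_t StubAlwaysZero() { return 0x0; }
```

Recording the raw value per variant, instead of stopping at the first non-zero one, makes a disagreement like the one seen here (EXT returning 0x0 while the others return 0x1) visible in a single pass.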

Flags: needinfo?(sotaro.ikeda.g)

It seems that glGetGraphicsResetStatus does not work on API 17 and Adreno 3xx.

Chromium blacklists GPU raster on Adreno 3xx with ES2-only drivers.
https://codereview.chromium.org/1115313002

Is the device you're testing on an es2-only Adreno 3xx?

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Jeff Gilbert [:jgilbert] from comment #19)

> Is the device you're testing on an es2-only Adreno 3xx?

It does not support GLES 3.x, but it also supports GLES 1.x. Chromium also seems to block this case, per comment 18.

Flags: needinfo?(sotaro.ikeda.g)

Cool, let's consider rejecting WebGLContext init for Adreno 3xx with es2.
Unfortunately we can't blocklist it for Layers, because Layers on Android has a hard requirement of some GLContext.

"WebGL context was lost soon within 3 seconds" sounds like our context-loss timer triggering context-loss when it hits that non-zero value.
One last thing I'd like to try is on Adreno3xx && es2, just treat 0x1 as 0x0.
Can you try this locally?

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Jeff Gilbert [:jgilbert] from comment #21)

> "WebGL context was lost soon within 3 seconds" sounds like our context-loss timer triggering context-loss when it hits that non-zero value.
> One last thing I'd like to try is on Adreno3xx && es2, just treat 0x1 as 0x0.
> Can you try this locally?

I tried it. In this case, glGetGraphicsResetStatus() did not return any error other than 0x1, but 0x1 was returned very often. The WebGL context was then lost because of a GLScreenBuffer::PublishFrame() failure caused by OOM.
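
For illustration only, the local experiment discussed above (treating 0x1 as 0x0 on Adreno 3xx with ES2) might look roughly like the sketch below. It is not the code that was tested or landed. `DriverQuirks`, `NormalizeResetStatus` and `IsContextLost` are hypothetical names; the reset-status constants are the values defined by the GL robustness extensions, and 0x1 is not one of them.

```cpp
#include <cassert>
#include <cstdint>

// Values a conforming driver may return from glGetGraphicsResetStatus.
constexpr std::uint32_t kNoError = 0x0000;               // GL_NO_ERROR
constexpr std::uint32_t kGuiltyContextReset = 0x8253;    // GL_GUILTY_CONTEXT_RESET
constexpr std::uint32_t kInnocentContextReset = 0x8254;  // GL_INNOCENT_CONTEXT_RESET
constexpr std::uint32_t kUnknownContextReset = 0x8255;   // GL_UNKNOWN_CONTEXT_RESET

// Hypothetical per-device quirk flag, e.g. set for Adreno 3xx with ES2 only.
struct DriverQuirks {
  bool bogusResetStatusOne;
};

// Maps the unexpected 0x1 to "no error" on quirky drivers and passes every
// other value through unchanged.
inline std::uint32_t NormalizeResetStatus(std::uint32_t raw, DriverQuirks quirks) {
  if (quirks.bogusResetStatusOne && raw == 0x1) {
    return kNoError;
  }
  return raw;
}

// Any non-zero status is treated as a lost context.
inline bool IsContextLost(std::uint32_t status) { return status != kNoError; }
```

As reported in the comment above, suppressing 0x1 this way only removed the spurious context loss; the context was still lost later through the OOM-driven PublishFrame() failure.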

Flags: needinfo?(sotaro.ikeda.g)

Does it behave OK otherwise? At least for other WebGL content?

(In reply to Jeff Gilbert [:jgilbert] from comment #23)

> Does it behave OK otherwise?

Apart from the 0x1 error, it had an OOM problem. As described in bug 1559758 comment 2, avoiding it required disabling SurfaceFactory_EGLImage and SurfaceFactory_SurfaceTexture in the parent process.
https://phabricator.services.mozilla.com/D35167

> At least for other WebGL content?

Given the above problem, WebGL running in the parent process might contribute to OOM together with the compositor's GL context. On Fennec, content also runs in the parent process.

Once OOM happened, Fennec became very unstable and hit OOM again very often.

Ok, let's forbid these devices from using WebGL then.

OK. I am going to work on it.

Pushed by sikeda@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9af3c0c1ede0
Blacklist WebGL on some android devices r=jgilbert
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69

This seems like a low-risk patch to take in 68 still and the crash rate on release makes it look worthwhile. Please nominate this for Beta approval if you agree.

Flags: needinfo?(sotaro.ikeda.g)

Comment on attachment 9074425 [details]
Bug 1546397 - Blacklist WebGL on some android devices

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes might happen on some Android devices with Adreno 3xx and android_version == 17 while using WebGL.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: none
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The fix is relatively simple. It just blocks devices with Adreno 3xx and android_version == 17.
  • String changes made/needed: none
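
For illustration only, the blocking rule stated in this request (Adreno 3xx and android_version == 17) could be expressed as a predicate like the sketch below. This is not the landed patch. `IsAdreno3xx` and `ShouldBlockWebGL` are hypothetical helpers, and the assumption that the renderer string looks like "Adreno (TM) 305" is taken from the adapter_device_id value in the crash correlations.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if the renderer string names an Adreno GPU whose
// model number has exactly three digits and starts with '3'.
bool IsAdreno3xx(const std::string& renderer) {
  const std::string kPrefix = "Adreno";
  const std::size_t pos = renderer.find(kPrefix);
  if (pos == std::string::npos) return false;

  // Skip ahead to the first digit after "Adreno", e.g. past " (TM) ".
  std::size_t begin = pos + kPrefix.size();
  while (begin < renderer.size() &&
         !std::isdigit(static_cast<unsigned char>(renderer[begin]))) {
    ++begin;
  }
  // Measure the run of digits that forms the model number.
  std::size_t end = begin;
  while (end < renderer.size() &&
         std::isdigit(static_cast<unsigned char>(renderer[end]))) {
    ++end;
  }
  return end - begin == 3 && renderer[begin] == '3';
}

// Hypothetical policy check: block WebGL only for Adreno 3xx on API level 17.
bool ShouldBlockWebGL(const std::string& renderer, int androidApiLevel) {
  return androidApiLevel == 17 && IsAdreno3xx(renderer);
}
```

Keying on both the GPU family and the API level keeps the block narrow: the same Adreno 320 device was reported earlier in this bug to work normally on Android 4.3 (API 18).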
Flags: needinfo?(sotaro.ikeda.g)
Attachment #9074425 - Flags: approval-mozilla-beta?

Comment on attachment 9074425 [details]
Bug 1546397 - Blacklist WebGL on some android devices

I'm going to punt this to 68.1 as we already built 68.0.

Attachment #9074425 - Flags: approval-mozilla-beta? → approval-mozilla-esr68?

Comment on attachment 9074425 [details]
Bug 1546397 - Blacklist WebGL on some android devices

Avoids a WebGL crash on some Android devices. Approved for Fennec 68.1b1.

Attachment #9074425 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+
Whiteboard: [fennec68.1]