Crash in [@ mozilla::gl::GLLibraryEGL::fMakeCurrent]
Categories: Core :: Graphics: CanvasWebGL, defect, P1
People: Reporter: marcia, Assigned: sotaro
Details: Keywords: crash, regression, Whiteboard: [fennec68.1]
Attachments: 1 file (47 bytes, text/x-phabricator-request), RyanVM: approval-mozilla-esr68+
This bug is for crash report bp-8a309548-c8ff-43c6-ad38-72fbe0190412.
Seen while looking at beta crash stats for 67. Although this has been present in other releases, it appears as #3 overall on b11: https://bit.ly/2IBSIkP
Correlations:
(97.85% in signature vs 05.85% overall) android_version = 17 (REL)
(54.84% in signature vs 02.37% overall) adapter_device_id = Adreno (TM) 305
Top 10 frames of crashing thread:
0 libc.so libc.so@0x19848
1 libEGL.so libEGL.so@0x426a6
2 libEGL.so libEGL.so@0xc909
3 libEGL.so libEGL.so@0xc5d9
4 libEGL.so libEGL.so@0x426ca
5 libEGL.so libEGL.so@0x426a6
6 libEGL.so libEGL.so@0xe8d5
7 libEGL.so libEGL.so@0x426a6
8 libEGL.so libEGL.so@0xe7e7
9 libxul.so mozilla::gl::GLLibraryEGL::fMakeCurrent const gfx/gl/GLLibraryEGL.h:165
Comment 1 (Reporter) • 6 years ago
51 crashes/8 installs in 67b11. We will see how this pans out in b13. Over 1000 crashes on 66 release, although there are also fewer installs than crashes.
Comment 2 • 6 years ago
This started rising when 66 went to beta, so presumably that is where the regression is. Given the low volume on nightly, we probably won't be able to identify the regression conclusively via crash reports alone. The vast majority of crash reports with this signature occur during WebGL initialization on Adreno 3xx devices running Android API level 17. I didn't really see any related changes go into 66 (bug 1514985, bug 1527534 seem innocent enough?). Thoughts, Jeff?
Comment 4 • 6 years ago
I could not reproduce on an Adreno 330 (Android 5.1.1).
Comment 5 (Assignee) • 6 years ago
(In reply to Andrew Osmond [:aosmond] from comment #2)
> This started rising when 66 went to beta, so presumably that is where the regression is.
Very similar crashes happened on 64 and 65.
Comment 6 (Assignee) • 6 years ago
I tested on a Nexus 4 (Android 4.2.2 [API 17], Adreno 320). I could not reproduce the crash, but WebGL was very unstable. When I opened some WebGL pages such as [1], the WebGL context was lost within 3 seconds, with the following log:
I/Gecko ( 3770): [GFX1]: Unexpected glGetGraphicsResetStatus: 0x1
I/Gecko ( 3770): WebGL(0x80d55000)::ForceLoseContext
[1]
Comment 7 (Assignee) • 6 years ago
Android 4.2 (API 17) and Android 4.3 (API 18) are both named "Jelly Bean", but their OpenGL support seems very different: Android 4.3 (API 18) added support for OpenGL ES 3.0.
https://android.googleblog.com/2013/07/introducing-android-43-sweeter-jelly.html
Comment 8 (Assignee) • 6 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #6)
> I tested on a Nexus 4 (Android 4.2.2 [API 17], Adreno 320). I could not reproduce the crash, but WebGL was very unstable. When I opened some WebGL pages such as [1], the WebGL context was lost within 3 seconds, with the following log.
On a Nexus 4 with Android 4.3 (API 18) and Adreno 320, WebGL worked normally and the WebGL context was not lost.
I updated the Nexus 4 with a factory image from the following.
Comment 9 (Assignee) • 6 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #6)
> I tested on a Nexus 4 (Android 4.2.2 [API 17], Adreno 320). I could not reproduce the crash, but WebGL was very unstable. When I opened some WebGL pages such as [1], the WebGL context was lost within 3 seconds, with the following log.
I bisected the WebGL context-loss regression with "./mach mozregression --app fennec --good=2017-02-08 --bad=2018-11-10".
16:31.09 INFO: Last good revision: c40ca7a1bdd93632c6bdc5e23bd33d984d508b19 (2017-03-09)
16:31.09 INFO: First bad revision: a8d497b09753c91783b68c5805c64f34a2f39629 (2017-03-10)
16:31.09 INFO: Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c40ca7a1bdd93632c6bdc5e23bd33d984d508b19&tochange=a8d497b09753c91783b68c5805c64f34a2f39629
Bug 1339256 seems to be a likely culprit.
Comment 10 • 6 years ago
Interesting! I'm curious if it's really getting a value of 0x1 here, since that's never an expected return value.
Comment 11 (Assignee) • 6 years ago
I checked how each function variant behind fGetGraphicsResetStatus behaves.
https://searchfox.org/mozilla-central/source/gfx/gl/GLContext.cpp#507
glGetGraphicsResetStatusEXT did not return an error. glGetGraphicsResetStatus, glGetGraphicsResetStatusARB and glGetGraphicsResetStatusKHR returned 0x1 almost immediately. This seems strange, since glGetGraphicsResetStatusEXT behaved differently from the others.
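To illustrate why 0x1 is surprising here, the following is a minimal sketch (not Gecko code) of the values a reset-status query is expected to return under the GL robustness extensions. The helper name DescribeResetStatus and the constant names are hypothetical; the numeric values are the ones shared by the ARB, EXT and KHR variants of the extension.

```cpp
#include <cstdint>
#include <string>

// Reset-status values defined by the GL robustness extensions.
// The ARB/EXT/KHR enums share these numeric values.
constexpr uint32_t kNoError = 0x0000;               // GL_NO_ERROR
constexpr uint32_t kGuiltyContextReset = 0x8253;    // GL_GUILTY_CONTEXT_RESET
constexpr uint32_t kInnocentContextReset = 0x8254;  // GL_INNOCENT_CONTEXT_RESET
constexpr uint32_t kUnknownContextReset = 0x8255;   // GL_UNKNOWN_CONTEXT_RESET

// Hypothetical helper: describe a value returned by a
// glGetGraphicsResetStatus* call.
std::string DescribeResetStatus(uint32_t status) {
  switch (status) {
    case kNoError:
      return "no reset";
    case kGuiltyContextReset:
      return "guilty context reset";
    case kInnocentContextReset:
      return "innocent context reset";
    case kUnknownContextReset:
      return "unknown context reset";
    default:
      // Anything else, including the 0x1 seen in this bug, is not a
      // documented reset status.
      return "unexpected value";
  }
}
```

With this helper, DescribeResetStatus(0x1) falls through to the default case, which matches the "Unexpected glGetGraphicsResetStatus: 0x1" message in the log from comment 6.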
Comment 12 (Assignee) • 6 years ago
With glGetGraphicsResetStatusEXT, I did not see an error from fGetGraphicsResetStatus. But when I visited https://akirodic.com/p/jellyfish/, the WebGL context was lost within 20 seconds, with the following log. The GPU seemed to run out of memory.
W/Adreno200-GSL(22710): <sharedmem_gpumem_alloc:991>: sharedmem_gpumem_alloc: mmap failed errno 12 Out of memory
W/Adreno200-GSL(22710): <sharedmem_gpumem_alloc:991>: sharedmem_gpumem_alloc: mmap failed errno 12 Out of memory
W/Adreno200-ES20(22710): <oglLinkEGLImage2GLTexture:1693>: GL_INVALID_OPERATION
W/GeckoConsole(22710): [JavaScript Warning: "Error: WebGL warning: <PresentScreenBuffer>: PublishFrame failed. Losing context."]
I/Gecko (22710): WebGL(0x8a9c0800)::ForceLoseContext
Comment 13 • 6 years ago
That sounds reasonable to me.
Do we get different pfn addresses for glGetGraphicsResetStatusEXT, glGetGraphicsResetStatus, glGetGraphicsResetStatusARB and glGetGraphicsResetStatusKHR?
Comment 14 (Assignee) • 6 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #13)
> That sounds reasonable to me.
> Do we get different pfn addresses for glGetGraphicsResetStatusEXT, glGetGraphicsResetStatus, glGetGraphicsResetStatusARB and glGetGraphicsResetStatusKHR?
Yes, each function returned a different address.
Comment 15 • 6 years ago
> By using glGetGraphicsResetStatusEXT , I did not see an error of fGetGraphicsResetStatus.
I don't quite get what you mean. On context-loss, we're hoping for a non-zero value from glGetGraphicsResetStatus[,KHR,EXT,ARB].
Are you saying that the EXT version always gives zero? If so, that's the only one we /don't/ want.
Comment 16 (Assignee) • 6 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #15)
> By using glGetGraphicsResetStatusEXT , I did not see an error of fGetGraphicsResetStatus.
> I don't quite get what you mean. On context-loss, we're hoping for a non-zero value from glGetGraphicsResetStatus[,KHR,EXT,ARB].
> Are you saying that the EXT version always gives zero? If so, that's the only one we /don't/ want.
I meant that glGetGraphicsResetStatus[,KHR,ARB] always returned 0x1 almost immediately, even when robustness was disabled or EGL_NO_RESET_NOTIFICATION was set. I had not looked into the EXT version enough: it always returned 0x0 when I tested, and I mistakenly assumed that meant the EXT version could bypass the 0x1 problem.
When the EXT version was used, the WebGL context was lost due to a GLScreenBuffer::PublishFrame() failure.
Comment 17 (Assignee) • 6 years ago
It seems that glGetGraphicsResetStatus does not work on API 17 and Adreno 3xx.
Comment 18 (Assignee) • 6 years ago
Chromium blacklists GPU rasterization on Adreno 3xx with ES2-only drivers.
https://codereview.chromium.org/1115313002
Comment 19 • 6 years ago
Is the device you're testing on an es2-only Adreno 3xx?
Comment 20 (Assignee) • 6 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #19)
> Is the device you're testing on an es2-only Adreno 3xx?
It does not support GLES 3.x, but it does also support GLES 1.x. Chromium also seems to block this case, per comment 18.
Comment 21 • 6 years ago
Cool, let's consider rejecting WebGLContext init for Adreno 3xx with es2.
Unfortunately we can't blocklist it for Layers, because Layers on Android has a hard requirement of some GLContext.
"WebGL context was lost soon within 3 seconds" sounds like our context-loss timer triggering context-loss when it hits that non-zero value.
One last thing I'd like to try is on Adreno3xx && es2, just treat 0x1 as 0x0.
Can you try this locally?
Comment 22 (Assignee) • 6 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #21)
> "WebGL context was lost soon within 3 seconds" sounds like our context-loss timer triggering context-loss when it hits that non-zero value.
> One last thing I'd like to try is on Adreno3xx && es2, just treat 0x1 as 0x0.
> Can you try this locally?
I tried it. In this case, glGetGraphicsResetStatus() did not return any error other than 0x1, which was returned very often. The WebGL context was then lost due to a GLScreenBuffer::PublishFrame() failure caused by OOM.
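The local experiment discussed above (treating 0x1 as 0x0 only on Adreno 3xx with ES2) could be sketched roughly as follows. This is an illustration under stated assumptions, not the patch that was tested: DriverInfo and NormalizeResetStatus are hypothetical names, and how the driver is actually identified in Gecko is not shown in this bug.

```cpp
#include <cstdint>

// Hypothetical description of the active driver; not a Gecko type.
struct DriverInfo {
  bool isAdreno3xx;      // renderer reports an Adreno 3xx GPU
  int glesMajorVersion;  // highest GLES major version supported by the context
};

// Map the bogus 0x1 reset status to 0x0 (no reset) on the affected
// configuration only. Every other value is passed through unchanged, so
// genuine reset statuses still reach the context-loss handling.
inline uint32_t NormalizeResetStatus(uint32_t raw, const DriverInfo& info) {
  const bool affected = info.isAdreno3xx && info.glesMajorVersion == 2;
  if (affected && raw == 0x1) {
    return 0x0;
  }
  return raw;
}
```

As the comment above reports, masking the value this way only hid the 0x1 symptom; the context was still lost later through the PublishFrame() OOM path.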
Comment 23 • 6 years ago
Does it behave OK otherwise? At least for other WebGL content?
Comment 24 (Assignee) • 6 years ago
(In reply to Jeff Gilbert [:jgilbert] from comment #23)
> Does it behave OK otherwise?
Apart from the 0x1 error, it had an OOM problem. As in Bug 1559758 comment 2, avoiding it required disabling SurfaceFactory_EGLImage and SurfaceFactory_SurfaceTexture in the parent process.
https://phabricator.services.mozilla.com/D35167
> At least for other WebGL content?
Given the above problem, WebGL running in the parent process might contribute to OOM together with the compositor's GL context. On Fennec, content also runs in the parent process.
When OOM happened, Fennec became very unstable and hit OOM again very often.
Comment 25 • 6 years ago
Ok, let's forbid these devices from using WebGL then.
Comment 26 (Assignee) • 6 years ago
OK. I will work on it.
Comment 27 (Assignee) • 6 years ago
Comment 28 • 6 years ago
Comment 29 • 6 years ago (bugherder)
Comment 30 • 6 years ago
This seems like a low-risk patch to take in 68 still and the crash rate on release makes it look worthwhile. Please nominate this for Beta approval if you agree.
Comment 31 (Assignee) • 6 years ago
Comment on attachment 9074425 [details]
Bug 1546397 - Blacklist WebGL on some android devices
Beta/Release Uplift Approval Request
- User impact if declined: Crashes may occur on some Android devices with Adreno 3xx and android_version == 17 while using WebGL.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: none
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The fix is relatively simple. It just blocks WebGL on devices with Adreno 3xx and android_version == 17.
- String changes made/needed: none
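
The blocking criterion described in this request (Adreno 3xx and android_version == 17) could be expressed as a predicate along these lines. This is a hedged sketch, not the landed patch: IsAdreno3xx and ShouldBlockWebGL are hypothetical names, and matching on a renderer string of the form "Adreno (TM) 305" is an assumption based on the adapter_device_id value in the crash correlations.

```cpp
#include <cctype>
#include <string>

// Hypothetical check: does the renderer string name an Adreno 3xx GPU,
// e.g. "Adreno (TM) 305"? Requires exactly three digits starting with '3'.
bool IsAdreno3xx(const std::string& renderer) {
  const std::string prefix = "Adreno (TM) 3";
  const std::string::size_type pos = renderer.find(prefix);
  if (pos == std::string::npos) {
    return false;
  }
  const std::string::size_type i = pos + prefix.size();
  if (renderer.size() < i + 2) {
    return false;  // fewer than two digits after the leading '3'
  }
  auto isDigit = [](char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
  };
  const bool twoMoreDigits = isDigit(renderer[i]) && isDigit(renderer[i + 1]);
  const bool noFourthDigit =
      renderer.size() == i + 2 || !isDigit(renderer[i + 2]);
  return twoMoreDigits && noFourthDigit;
}

// Hypothetical predicate: block WebGL only for Adreno 3xx on API level 17.
bool ShouldBlockWebGL(const std::string& renderer, int androidApiLevel) {
  return androidApiLevel == 17 && IsAdreno3xx(renderer);
}
```

Under these assumptions, a Nexus 4 on Android 4.2.2 (Adreno 320, API 17) would be blocked, while the same device on Android 4.3 (API 18), where WebGL worked normally in comment 8, would not.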
Comment 32 • 6 years ago
Comment on attachment 9074425 [details]
Bug 1546397 - Blacklist WebGL on some android devices
I'm going to punt this to 68.1 as we already built 68.0.
Comment 33 • 6 years ago
Comment on attachment 9074425 [details]
Bug 1546397 - Blacklist WebGL on some android devices
Avoids a WebGL crash on some Android devices. Approved for Fennec 68.1b1.
Comment 34 • 6 years ago (bugherder uplift)