Open Bug 1609191 Opened 9 months ago Updated 2 months ago

Some Adreno 5xx devices crash during shader compilation

Categories

(Core :: Graphics: WebRender, defect, P2)

72 Branch
All
Android
defect

Tracking


ASSIGNED

People

(Reporter: ktaeleman, Assigned: jnicol, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: crash, leave-open, Whiteboard: wr-android)

Crash Data

Attachments

(1 file)

Moto G7 play (Adreno 506):
https://crash-stats.mozilla.org/report/index/47cd05e9-e6c5-4055-951e-87f970200114

Xiaomi Redmi 7A (Adreno 505):
https://crash-stats.mozilla.org/report/index/20c9ded7-0daf-4263-b96b-2b2190200113

On both devices the application had been running for under 30 seconds, pointing to on-demand shader compilation.

Fixing the crash signature so it shows up properly in Socorro.

Crash Signature: libllvm-glnext.so → [@ libllvm-glnext.so@0x732bb0 ]
No longer blocks: wr-74-android

This seems to be happening both on Fenix with WebRender and on Fennec, in both cases on Adreno 505 and 506.

Whiteboard: wr-android
Blocks: wr-adreno5xx6xx
No longer blocks: wr-75-android
Severity: normal → S3

Sotaro mentioned he tried to reproduce this crash with an Adreno 506 but did not see the crash in the example app + WebRender. Are there clear STR? Does it happen for you in the example app?

Flags: needinfo?(ktaeleman)

No, we haven't been able to reproduce this crash locally, but are seeing ~10 crashes per day on nightly.
Maybe it's a specific shader causing the issue.

@sotaro: Would it be possible to force-compile all shaders to see if that could be the issue? I don't know if we have all the combinations predefined, so this may not be possible, and I'm not sure how long it would take.

Flags: needinfo?(ktaeleman) → needinfo?(sotaro.ikeda.g)

If you want to find out which shader it is, you could try something like this:
https://searchfox.org/mozilla-central/rev/61fceb7c0729773f544a9656f474e36cd636e5ea/js/src/jit/x86-shared/Assembler-x86-shared.cpp#119-126
i.e. store the name of the shader on the stack, and we could read it out of the minidumps.
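
A minimal sketch of that breadcrumb trick, in the spirit of the linked Assembler code but written here as a hypothetical Rust helper (none of these names exist in the tree; the volatile writes and black_box are only there to keep the compiler from optimising the copy away):

    use std::hint::black_box;
    use std::ptr::write_volatile;

    /// Copy the shader name into a stack buffer with volatile writes just
    /// before the call that crashes, so the name can be read back out of the
    /// minidump's stack memory. `link` stands in for the glLinkProgram call.
    fn link_with_breadcrumb(shader_name: &str, link: impl FnOnce()) {
        let mut breadcrumb = [0u8; 64];
        for (dst, byte) in breadcrumb.iter_mut().zip(shader_name.bytes()) {
            // Volatile writes stop the compiler from eliding the copy.
            unsafe { write_volatile(dst, byte) };
        }
        link();
        // Keep the buffer alive across the call so it stays on the stack.
        black_box(&breadcrumb);
    }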

All crashes in Socorro have the following GraphicsCriticalError message. But the crashes in comment 0 did not have the message.

|[0][GFX1-]: Failed to create EGLContext!: 0x300c

0x300c error means EGL_BAD_PARAMETER.
https://searchfox.org/mozilla-central/rev/61fceb7c0729773f544a9656f474e36cd636e5ea/gfx/gl/GLContextProviderEGL.cpp#301

(In reply to Kris Taeleman (:ktaeleman) from comment #4)

> @sotaro: Would it be possible to force-compile all shaders to see if that could be the issue? I don't know if we have all the combinations predefined, so this may not be possible, and I'm not sure how long it would take.

The ShaderPrecacheFlags::FULL_COMPILE flag seems to make WebRender compile the majority of shaders at startup, though it might not cover all shader combinations.

https://searchfox.org/mozilla-central/rev/61fceb7c0729773f544a9656f474e36cd636e5ea/gfx/webrender_bindings/src/bindings.rs#3747
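
For reference, a rough sketch of what requesting that looks like on the Rust embedder side, assuming the RendererOptions / ShaderPrecacheFlags API at this revision (the field name is from memory, so treat it as an assumption rather than the exact code in bindings.rs):

    use webrender::{RendererOptions, ShaderPrecacheFlags};

    /// Build renderer options that ask WebRender to eagerly build (most of)
    /// the shaders at startup instead of compiling them on demand at first use.
    fn full_compile_options() -> RendererOptions {
        RendererOptions {
            precache_flags: ShaderPrecacheFlags::FULL_COMPILE,
            ..RendererOptions::default()
        }
    }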

:ktaeleman, how do you know the crashes were caused by on-demand shader compilation?

Flags: needinfo?(sotaro.ikeda.g)

> :ktaeleman, how do you know the crashes were caused by on-demand shader compilation?

I think we're just guessing, based on libllvm-glnext.so in the crash signature.

No longer blocks: wr-android

On a Moto g7 play, I can reproduce fairly reliably by enabling gfx.webrender.debug.show-overdraw. (I was just randomly toggling prefs to see if anything caused a crash!)

I'm sceptical that enough users are flipping this pref in the wild to give us these crash numbers, so maybe there are multiple ways to trigger it.

(In reply to Sotaro Ikeda [:sotaro] from comment #6)

> All crashes in Socorro have the following GraphicsCriticalError message. But the crashes in comment 0 did not have the message.
>
> |[0][GFX1-]: Failed to create EGLContext!: 0x300c
>
> 0x300c error means EGL_BAD_PARAMETER.
> https://searchfox.org/mozilla-central/rev/61fceb7c0729773f544a9656f474e36cd636e5ea/gfx/gl/GLContextProviderEGL.cpp#301

This is a red herring: since bug 1474281 we attempt to create an OpenGL context first, then fall back to GLES. The error message is from failing to create the GL context, but the GLES context is created successfully immediately afterwards.

I cannot reproduce this crash ever in GVE, but can reproduce in Fenix. Setting ShaderPrecacheFlags::FULL_COMPILE makes it crash at startup. Sometimes in a debug overdraw shader, but not always, so I don't think that is important. The specific shader which crashes seems to vary: sometimes it is the first one, sometimes a few compile successfully before the crash.

Figured a bit more of this out:

  • I can in fact reproduce from GVE fairly easily, but it is even easier in Sample Browser / Fenix. I think this is because SkiaGL (used for the Android UI) can either trigger the crash itself or help set up the required state for the crash to occur.
  • The crash seems to occur when calling glLinkProgram when one of the shader sources is identical to a shader source used for a previously linked program, perhaps due to a bug in some driver-internal code which attempts to cache shaders (see the repro sketch after this list).
  • This is a very common scenario when gfx.webrender.debug.show-overdraw is enabled, as long as gfx.webrender.use-optimized-shaders is also enabled. This is because the shader optimization pass makes it so that:
    a) The vertex source for a debug-overdraw variant is identical to the non-debug-overdraw variant (as debug overdraw only affects the fragment shader).
    b) Different shaders' debug overdraw variants have the exact same fragment source as each other (because it just outputs a fixed colour).
  • The specific shader which crashed kept changing because of webrender's shader cache. Say we have programs A, B, and C which all have identical fragment shader source. On the first run A will be successfully compiled and cached, then B will cause the crash. On the second run A will be loaded from the cache, B will be successfully compiled, then C will cause the crash. And so on.
  • Even with this knowledge, I have been unable to reproduce in wrench or a custom-written test app. And, as I said, it occurs less frequently in GVE than in Fenix, so there must be some required state that I haven't figured out yet.
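
A pseudo-repro of the suspected pattern, sketched against the gleam Gl trait that WebRender already uses (VS_A / VS_B and the surrounding GL setup are placeholders, and the exact gleam signatures should be double-checked; this is only meant to show the sharing pattern, not a confirmed standalone repro):

    use gleam::gl::{self, Gl};

    /// Compile a vertex/fragment pair and link it into a program. On the
    /// affected Adreno 505/506 devices the crash happens inside link_program()
    /// for the second program that reuses identical shader source.
    fn build_program(gl: &dyn Gl, vs_src: &str, fs_src: &str) -> gl::GLuint {
        let vs = gl.create_shader(gl::VERTEX_SHADER);
        gl.shader_source(vs, &[vs_src.as_bytes()]);
        gl.compile_shader(vs);

        let fs = gl.create_shader(gl::FRAGMENT_SHADER);
        gl.shader_source(fs, &[fs_src.as_bytes()]);
        gl.compile_shader(fs);

        let program = gl.create_program();
        gl.attach_shader(program, vs);
        gl.attach_shader(program, fs);
        gl.link_program(program);
        program
    }

    // Both programs get byte-identical fragment source, which is exactly what
    // the optimised debug-overdraw variants end up with:
    //   let p1 = build_program(gl, VS_A, SHARED_FS); // links fine
    //   let p2 = build_program(gl, VS_B, SHARED_FS); // second link crashes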

Now, debug overdraw will not be the reason users are hitting this crash: it's unlikely many users have enabled a debug option, it only causes crashes in conjunction with optimized shaders (which only landed a month ago), and some (most?) crashes appear to be on fenix stable rather than nightly, which means webrender probably isn't even enabled. My theory is therefore that webgl pages are causing this. A website could attempt to compile 2 different programs which share a common shader. Perhaps even visiting a webgl app multiple times could trigger this. I have, however, been unable to trigger this myself by writing a webgl app.

Fixing webrender's optimized debug-overdraw shaders is simple (we can add a unique comment to each shader's source). While I'm not certain webgl is the cause of these crashes, perhaps we should try to do the same there and see if the numbers go down. Jeff, do you think that'd be reasonable? Speculatively appending a unique comment to the end of each shader string webgl passes to glShaderSource()? (On Adreno 506 only)
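
For illustration, a minimal sketch of that kind of workaround (not the actual patch that landed; the names here are made up):

    use std::sync::atomic::{AtomicU64, Ordering};

    static NEXT_SHADER_ID: AtomicU64 = AtomicU64::new(0);

    /// Append a comment containing a unique token to the shader source so the
    /// driver never sees two byte-identical sources. GLSL comments are ignored
    /// by the compiler, but hopefully not by whatever source-keyed cache the
    /// driver maintains internally.
    fn make_source_unique(source: &str) -> String {
        let id = NEXT_SHADER_ID.fetch_add(1, Ordering::Relaxed);
        format!("{}\n// unique-id: {}\n", source, id)
    }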

Flags: needinfo?(jgilbert)

Wow, a bug in driver-level shader caching sounds awful. It's worth trying a comment, but it might cache with comments stripped. If that doesn't work, we can try adding a random unused variable or something.

Great dissection!

Flags: needinfo?(jgilbert)
Crash Signature: [@ libllvm-glnext.so@0x732bb0 ] → [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ]
See Also: → 1595821

On some Adreno 505 and 506 devices we are encountering driver crashes during
glLinkProgram(). The only circumstance in which we have been able to reproduce
locally is when the show-overdraw debug option is enabled. The reason appears to
be that, due to shader optimisation, the debug overdraw variants of many shaders
have identical source code. The crash seems to occur when linking a shader which
has identical source code to a previously linked shader.

This does not, however, explain the significant number of crashes in the
wild, because a) it's unlikely many users are enabling overdraw debugging, and
b) some crash reports predate the commit which enabled shader optimisation.
It is possible, though, that for a different reason we are compiling multiple
shaders with identical source code.

To attempt to work around this crash, this change adds a random comment to the
end of each shader source string on the affected devices.

Assignee: nobody → jnicol
Status: NEW → ASSIGNED
Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bbe5ed51273b
Ensure shader sources are always unique to workaround adreno crash. r=gw

I'm not very optimistic about this fixing the bug, so let's leave this open for now

Keywords: leave-open
Pushed by ccoroiu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e76ed046c4f9
Backed out changeset bbe5ed51273b for webrender failures. CLOSED TREE

Sorry, stupid mistake. I forgot to update webrender's Cargo.lock.

Flags: needinfo?(jnicol)
Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c08d4fe356e5
Ensure shader sources are always unique to workaround adreno crash. r=gw

@jnicol: Could you add a resolved callstack to this bug?

Flags: needinfo?(jnicol)