Closed Bug 1609191 Opened 4 years ago Closed 3 years ago

Some Adreno 5xx devices crash during shader compilation

Categories

(Core :: Graphics: WebRender, defect, P2)

72 Branch
All
Android
defect

Tracking


RESOLVED FIXED
90 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox87 + wontfix
firefox88 + wontfix
firefox89 + fixed
firefox90 + fixed

People

(Reporter: ktaeleman, Assigned: jnicol)

References

(Blocks 1 open bug, )

Details

(Keywords: crash, Whiteboard: wr-android)

Crash Data

Attachments

(4 files)

Moto G7 play (Adreno 506):
https://crash-stats.mozilla.org/report/index/47cd05e9-e6c5-4055-951e-87f970200114

Xiaomi Redmi 7A (Adreno 505):
https://crash-stats.mozilla.org/report/index/20c9ded7-0daf-4263-b96b-2b2190200113

On both devices the application had been running for under 30 seconds, pointing to on-demand shader compilation.

Fixing the crash signature so it shows up properly in Socorro.

Crash Signature: libllvm-glnext.so → [@ libllvm-glnext.so@0x732bb0 ]
No longer blocks: wr-74-android

This seems to be happening both on Fenix with WR and on Fennec, occurring on Adreno 505 and 506 in both cases.

Whiteboard: wr-android
Blocks: wr-adreno5xx6xx
No longer blocks: wr-75-android
Severity: normal → S3

Sotaro mentioned he tried to reproduce this crash with an Adreno 506 but did not see the crash in the example app + WebRender. Are there clear STR? Does it happen for you in the example app?

Flags: needinfo?(ktaeleman)

No, we haven't been able to reproduce this crash locally, but we are seeing ~10 crashes per day on Nightly.
Maybe it's a specific shader causing the issue.

@sotaro: Would it be possible to force-compile all shaders to see if that could be the issue? I don't know if we have all the combinations predefined, so this may not be possible, and I'm not sure how long it would take.

Flags: needinfo?(ktaeleman) → needinfo?(sotaro.ikeda.g)

If you want to find out which shader it is you could try something like this:
https://searchfox.org/mozilla-central/rev/61fceb7c0729773f544a9656f474e36cd636e5ea/js/src/jit/x86-shared/Assembler-x86-shared.cpp#119-126
and store the name of the shader on the stack and we could read it out of the minidumps

All crashes in Socorro have the following GraphicsCriticalError message. But the crashes in comment 0 did not have the message.

|[0][GFX1-]: Failed to create EGLContext!: 0x300c

0x300c error means EGL_BAD_PARAMETER.
https://searchfox.org/mozilla-central/rev/61fceb7c0729773f544a9656f474e36cd636e5ea/gfx/gl/GLContextProviderEGL.cpp#301

(In reply to Kris Taeleman (:ktaeleman) from comment #4)

@sotaro: Would it be possible to force-compile all shaders to see if that could be the issue? I don't know if we have all the combinations predefined, so this may not be possible, and I'm not sure how long it would take.

The ShaderPrecacheFlags::FULL_COMPILE flag seems to make WebRender compile the majority of shaders at startup, though it might not cover all shader combinations.

https://searchfox.org/mozilla-central/rev/61fceb7c0729773f544a9656f474e36cd636e5ea/gfx/webrender_bindings/src/bindings.rs#3747

:ktaeleman, how do you know the crashes happened by on demand shader compilation?

Flags: needinfo?(sotaro.ikeda.g)

:ktaeleman, how do you know the crashes happened by on demand shader compilation?

I think we're just guessing, based on libllvm-glnext.so in the crash signature.

No longer blocks: wr-android

On a Moto g7 play, I can reproduce fairly reliably by enabling gfx.webrender.debug.show-overdraw. (I was just randomly toggling prefs to see if anything caused a crash!)

I'm sceptical that enough users are flipping this pref in the wild to give us these crash numbers, so maybe there are multiple ways to trigger it.

(In reply to Sotaro Ikeda [:sotaro] from comment #6)

All crashes in Socorro have the following GraphicsCriticalError message. But the crashes in comment 0 did not have the message.

|[0][GFX1-]: Failed to create EGLContext!: 0x300c

0x300c error means EGL_BAD_PARAMETER.
https://searchfox.org/mozilla-central/rev/61fceb7c0729773f544a9656f474e36cd636e5ea/gfx/gl/GLContextProviderEGL.cpp#301

This is a red herring: since bug 1474281 we attempt to create an OpenGL context first, then fall back to GLES. The error message is from failing to create the GL context, but the GLES context is created successfully immediately afterwards.

I cannot reproduce this crash ever in GVE, but can reproduce in Fenix. Setting ShaderPrecacheFlags::FULL_COMPILE makes it crash at startup. Sometimes in a debug overdraw shader, but not always, so I don't think that is important. The specific shader which crashes seems to vary: sometimes it is the first one, sometimes a few compile successfully before the crash.

Figured a bit more of this out:

  • I can in fact reproduce from GVE fairly easily, but it is even easier in Sample Browser / Fenix. I think this is because SkiaGL (used for the Android UI) can either trigger the crash itself or help set up the required state for the crash to occur.
  • The crash seems to occur when calling glLinkProgram when one of the shader sources is identical to a shader source used for a previously linked program (see the sketch after this list). Perhaps there is a bug in some driver-internal code which attempts to cache shaders?
  • This is a very common scenario when gfx.webrender.debug.show-overdraw is enabled, as long as gfx.webrender.use-optimized-shaders is also enabled. This is because the shader optimization pass makes it so that:
    a) The vertex source for a debug-overdraw variant is identical to the non-debug-overdraw variant (as debug overdraw only affects the fragment shader).
    b) Different shaders' debug overdraw variants have the exact same fragment source as each other (because it just outputs a fixed colour).
  • The reason why the specific shader which crashed kept changing was because of webrender's shader cache. Say we have programs A, B, and C which all have identical fragment shader source. On the first run A will be successfully compiled and cached, then B will cause the crash. On the second run A will be loaded from the cache, B will be successfully compiled, then C will cause the crash. And so on.
  • Even with this knowledge, I have been unable to reproduce in wrench or a custom-written test app. And like I said it does occur less frequently in GVE than Fenix, so there must be some required state that I haven't figured out yet.
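For illustration only, here is a minimal Kotlin/GLES sketch of the pattern described in the second bullet above. The shader sources are placeholders and the real code path is webrender's Device::link_program (and Skia's GrGLProgramBuilder on the UI side); this just links two programs with byte-identical shader sources, which is the scenario that appears to upset the driver:

import android.opengl.GLES20

// Assumes a current GLES 2.0 context on this thread (e.g. inside a GLSurfaceView.Renderer).
const val VERT_SRC = """
    attribute vec4 aPosition;
    void main() { gl_Position = aPosition; }
"""

// Fixed output colour, analogous to an optimised debug-overdraw fragment shader.
const val FRAG_SRC = """
    precision mediump float;
    void main() { gl_FragColor = vec4(1.0, 0.0, 1.0, 1.0); }
"""

fun compile(type: Int, src: String): Int {
    val shader = GLES20.glCreateShader(type)
    GLES20.glShaderSource(shader, src)
    GLES20.glCompileShader(shader)
    return shader
}

fun linkProgram(vertSrc: String, fragSrc: String): Int {
    val program = GLES20.glCreateProgram()
    GLES20.glAttachShader(program, compile(GLES20.GL_VERTEX_SHADER, vertSrc))
    GLES20.glAttachShader(program, compile(GLES20.GL_FRAGMENT_SHADER, fragSrc))
    // The crash reports point inside the driver's implementation of this call.
    GLES20.glLinkProgram(program)
    return program
}

fun reproAttempt() {
    val first = linkProgram(VERT_SRC, FRAG_SRC)   // links fine
    val second = linkProgram(VERT_SRC, FRAG_SRC)  // identical sources; where the crash was observed
}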

Now, debug overdraw will not be the reason users are hitting this crash: it's unlikely many users have enabled a debug option, it only causes crashes in conjunction with optimized shaders (which only landed a month ago), and some (most?) crashes appear to be on fenix stable rather than nightly, which means webrender probably isn't even enabled. My theory is therefore that webgl pages are causing this. A website could attempt to compile 2 different programs which share a common shader. Perhaps even visiting a webgl app multiple times could trigger this. I have, however, been unable to trigger this myself by writing a webgl app.

Fixing webrender's optimized debug-overdraw shaders is simple (we can add a unique comment to each shader's source). While I'm not certain webgl is the cause of these crashes, perhaps we should try to do the same there and see if the numbers go down. Jeff, do you think that'd be reasonable? Speculatively appending a unique comment to the end of each shader string webgl passes to glShaderSource()? (On Adreno 506 only)

Flags: needinfo?(jgilbert)

Wow, a bug in driver-level shader caching sounds awful. It's worth trying a comment, but it might cache with comments stripped. If that doesn't work, we can try adding a random unused variable or something.

Great dissection!

Flags: needinfo?(jgilbert)
Crash Signature: [@ libllvm-glnext.so@0x732bb0 ] → [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ]
See Also: → 1595821

On some Adreno 505 and 506 devices we are encountering driver crashes during
glLinkProgram(). The only circumstance in which we have been able to reproduce
locally is when the show-overdraw debug option is enabled. The reason appears to
be that, due to shader optimisation, the debug overdraw variants of many shaders
have identical source code. The crash seems to occur when linking a shader which
has identical source code to a previously linked shader.

This does not, however, explain the significant number of crashes in the
wild, because a) it's unlikely many users are enabling overdraw debugging, and
b) some crash reports predate the commit which enabled shader optimisation.
However, it is possible that for a different reason we are compiling multiple
shaders with identical source code.

To attempt to work around this crash, this change adds a random comment to the
end of each shader source string on the affected devices.
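This is not the actual patch (that lives in webrender's Rust Device code), but a hedged Kotlin sketch of the same idea, shown against the Android GLES API for illustration: append a unique, semantically inert comment so that no two shader source strings are ever byte-identical, in case the driver keys an internal cache on the source text.

import android.opengl.GLES20
import java.util.UUID

// applyWorkaround would only be true on the affected Adreno devices.
fun shaderSourceUnique(shader: Int, source: String, applyWorkaround: Boolean) {
    val finalSource = if (applyWorkaround) {
        // A trailing comment changes the bytes of the source without changing its meaning.
        source + "\n// " + UUID.randomUUID().toString() + "\n"
    } else {
        source
    }
    GLES20.glShaderSource(shader, finalSource)
}

As noted above, the driver might cache with comments stripped, in which case an unused variable or similar would be the next thing to try.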

Assignee: nobody → jnicol
Status: NEW → ASSIGNED
Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bbe5ed51273b
Ensure shader sources are always unique to workaround adreno crash. r=gw

I'm not very optimistic about this fixing the bug, so let's leave this open for now

Keywords: leave-open
Pushed by ccoroiu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e76ed046c4f9
Backed changeset bbe5ed51273b for webrender failures. CLOSED TREE

Sorry, stupid mistake: I forgot to update webrender's Cargo.lock.

Flags: needinfo?(jnicol)
Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c08d4fe356e5
Ensure shader sources are always unique to workaround adreno crash. r=gw

@jnicol: Could you add a resolved callstack to this bug?

Flags: needinfo?(jnicol)
Flags: needinfo?(jnicol)

This popped up in b87 again with Adreno 505 devices, mostly the Moto G6. Did we revert this fix?

Flags: needinfo?(jnicol)

Nope, we never had a fix; the patch that landed earlier in this bug didn't do anything. WebRender was switched on in beta for these devices, so the increase is probably due to the larger number of users on beta.

The crash numbers have been so low that we've never been able to figure out any information about this bug. We're expecting the numbers to stay reasonably low, but hopefully large enough that we get some STR from a user.

Jim, fyi this has popped up.

Flags: needinfo?(jnicol) → needinfo?(jmathies)

This has unfortunately spiked quite significantly since reaching release. I can reproduce somewhat reliably on my Moto G6 if I enable gfx.webrender.precache-shaders to make us compile all shaders at startup, but only in a Fenix build, never in geckoview_example. This is the crash stack:

#00 pc 00732610  /vendor/lib/libllvm-glnext.so (ESXLinker::bcConstruct()+752)
#01 pc 00734e19  /vendor/lib/libllvm-glnext.so (SOLinker::linkShaders(QGLC_LINKPROGRAM_DATA*, QGLC_LINKPROGRAM_RESULT*)+88)
#02 pc 0072dafb  /vendor/lib/libllvm-glnext.so (CompilerContext::LinkProgram(unsigned int, QGLC_SRCSHADER_IRSHADER**, QGLC_LINKPROGRAM_DATA*, QGLC_LINKPROGRAM_RESULT*)+322)
#03 pc 007e0367  /vendor/lib/libllvm-glnext.so (QGLCLinkProgram(void*, unsigned int, QGLC_SRCSHADER_IRSHADER**, QGLC_LINKPROGRAM_DATA*, QGLC_LINKPROGRAM_RESULT*)+58)
#04 pc 0015091d  /vendor/lib/egl/libGLESv2_adreno.so (EsxShaderCompiler::CompileProgram(EsxContext*, EsxProgram const*, EsxLinkedList const*, EsxLinkedList const*, EsxInfoLog*)+1700)
#05 pc 00127b57  /vendor/lib/egl/libGLESv2_adreno.so (EsxProgram::Link(EsxContext*)+494)
#06 pc 000a881b  /vendor/lib/egl/libGLESv2_adreno.so (EsxContext::LinkProgram(EsxProgram*)+62)
#07 pc 0553d865  /data/app/org.mozilla.fenix.debug-1ifaNJWoJ7dxuSFtgz58tA==/lib/arm/libxul.so (offset 0x2688000) (webrender::device::gl::Device::reset_state::h1719f7991337b528+200)

Note the webrender function says gl::Device::reset_state, and sometimes it says compile_shader, but really it's the glLinkProgram call in Device::link_program that is crashing.

Sometimes, the crash occurs in android UI code:

#00 pc 00732610  /vendor/lib/libllvm-glnext.so (ESXLinker::bcConstruct()+752)
#01 pc 00734e19  /vendor/lib/libllvm-glnext.so (SOLinker::linkShaders(QGLC_LINKPROGRAM_DATA*, QGLC_LINKPROGRAM_RESULT*)+88)
#02 pc 0072dafb  /vendor/lib/libllvm-glnext.so (CompilerContext::LinkProgram(unsigned int, QGLC_SRCSHADER_IRSHADER**, QGLC_LINKPROGRAM_DATA*, QGLC_LINKPROGRAM_RESULT*)+322)
#03 pc 007e0367  /vendor/lib/libllvm-glnext.so (QGLCLinkProgram(void*, unsigned int, QGLC_SRCSHADER_IRSHADER**, QGLC_LINKPROGRAM_DATA*, QGLC_LINKPROGRAM_RESULT*)+58)
#04 pc 0015091d  /vendor/lib/egl/libGLESv2_adreno.so (EsxShaderCompiler::CompileProgram(EsxContext*, EsxProgram const*, EsxLinkedList const*, EsxLinkedList const*, EsxInfoLog*)+1700)
#05 pc 00127b57  /vendor/lib/egl/libGLESv2_adreno.so (EsxProgram::Link(EsxContext*)+494)
#06 pc 000a881b  /vendor/lib/egl/libGLESv2_adreno.so (EsxContext::LinkProgram(EsxProgram*)+62)
#07 pc 003c12bb  /system/lib/libhwui.so (GrGLProgramBuilder::CreateProgram(GrPipeline const&, GrPrimitiveProcessor const&, GrProgramDesc*, GrGLGpu*)+750)
#08 pc 003569bb  /system/lib/libhwui.so (GrGLGpu::ProgramCache::refProgram(GrGLGpu const*, GrPipeline const&, GrPrimitiveProcessor const&, bool)+658)
#09 pc 00354f97  /system/lib/libhwui.so (GrGLGpu::flushGLState(GrPipeline const&, GrPrimitiveProcessor const&, bool)+38)
#10 pc 00355a65  /system/lib/libhwui.so (GrGLGpu::draw(GrPipeline const&, GrPrimitiveProcessor const&, GrMesh const*, GrPipeline::DynamicState const*, int)+72)
#11 pc 0034d2bd  /system/lib/libhwui.so (GrOpFlushState::executeDrawsAndUploadsForMeshDrawOp(unsigned int, SkRect const&)+184)
#12 pc 0039ff91  /system/lib/libhwui.so (GrRenderTargetOpList::onExecute(GrOpFlushState*)+204)
#13 pc 00396df1  /system/lib/libhwui.so (GrDrawingManager::executeOpLists(int, int, GrOpFlushState*)+304)
#14 pc 00396a8b  /system/lib/libhwui.so (GrDrawingManager::internalFlush(GrSurfaceProxy*, GrResourceCache::FlushType, int, GrBackendSemaphore*)+966)
#15 pc 003970db  /system/lib/libhwui.so (GrDrawingManager::prepareSurfaceForExternalIO(GrSurfaceProxy*, int, GrBackendSemaphore*)+58)
#16 pc 0035f723  /system/lib/libhwui.so (android::uirenderer::skiapipeline::SkiaPipeline::renderFrame(android::uirenderer::LayerUpdateQueue const&, SkRect const&, std::__1::vector<android::sp<android::uirenderer::RenderNode>, std::__1::allocator<android::sp<android::uirenderer::RenderNode>>> const&, bool, bool, android::uirenderer::Rect const&, sk_sp<SkSurface>)+130)
#17 pc 0035ed8b  /system/lib/libhwui.so (android::uirenderer::skiapipeline::SkiaOpenGLPipeline::draw(android::uirenderer::renderthread::Frame const&, SkRect const&, SkRect const&, android::uirenderer::FrameBuilder::LightGeometry const&, android::uirenderer::LayerUpdateQueue*, android::uirenderer::Rect const&, bool, bool, android::uirenderer::BakedOpRenderer::LightInfo const&, std::__1::vector<android::sp<android::uirenderer::RenderNode>, std::__1::allocator<android::sp<android::uirenderer::RenderNode>>
#18 pc 00099b2b  /system/lib/libhwui.so (android::uirenderer::renderthread::CanvasContext::draw()+150)
#19 pc 003624b5  /system/lib/libhwui.so (_ZNSt3__110__function6__funcIZN7android10uirenderer12renderthread13DrawFrameTask11postAndWaitEvE3$_0NS_9allocatorIS6_EEFvvEEclEv$c303f2d2360db58ed70a2d0ac7ed911b+576)
#20 pc 0032afcf  /system/lib/libhwui.so (android::uirenderer::WorkQueue::process()+122)
#21 pc 000a256f  /system/lib/libhwui.so (android::uirenderer::renderthread::RenderThread::threadLoop()+178)
#22 pc 0000c08b  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+166)
#23 pc 000632b5  /system/lib/libc.so (__pthread_start(void*)+22)
#24 pc 0001de79  /system/lib/libc.so (__start_thread+24)

If I add android:hardwareAccelerated="false" to Fenix's AndroidManifest.xml then I can no longer reproduce the crash. My guess is that the driver's shader cache is not thread-safe, and a race between webrender and the Android UI compiling shaders causes the crash. That would be a really unfortunate fix to have to make.

Sotaro, can you think of any way we can synchronize webrender's render thread and Android UI renderer thread to ensure they do not run at the same time?

Flags: needinfo?(sotaro.ikeda.g)
Crash Signature: [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ] → [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ] [@ libllvm-glnext.so@0x732610 ]
Flags: needinfo?(jmathies)

I think we now have a reasonable plan here:

  • Disable UI hardware acceleration globally in Fenix's AndroidManifest.xml
  • We can then enable hardware acceleration on a per-window basis: https://developer.android.com/guide/topics/graphics/hardware-accel
  • We can use Application.registerActivityLifecycleCallbacks to do this in a single place for all of Fenix's Activities when they are created (Fenix already uses these callbacks for other purposes; see the sketch at the end of this comment)

The question is which devices to keep hardware acceleration disabled for. Here is a crash-stats search for libllvm-glnext.so. It seems that 505, 506, 510, and 530 are affected, and then there is a huge drop in numbers. Based on those numbers I would also guess that not all 530 devices are affected; otherwise I would expect the numbers to be higher than, say, for the 506. In any case, using the GL renderer string is trickier in Fenix than in Gecko, as it is not readily available. It's possible to create an offscreen EGL and GL context to query it, but that takes a non-zero number of milliseconds, so it doesn't seem ideal during startup.

So instead I think we should use the information available in the android.os.Build package. Using the model would be very precise, but there is a long list of models, so I think our best bet is to use the board. If we facet the crash-stats search on Android board, the top 5 results cover over 90% of the crashes: msm8953, msm8976, msm8937, msm8996, and sdm450_mh4x. There may be devices using these boards with non-broken drivers, but it's not the end of the world for them to have hardware acceleration disabled.

Beyond that it's down to <1.0% per board. Some of those will certainly be the same crash, but equally some may be random, unrelated crashes which also happen to occur in libllvm-glnext.so. I think it's okay if we don't cover 100% of them: as far as we're aware this doesn't crash repeatedly, just once or twice until all the shaders are compiled and cached, so the fix doesn't have to be perfect.
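A hedged Kotlin sketch of the plan above, combining the board check with per-window re-enabling of hardware acceleration. The helper and class names here are hypothetical (the real change is in the Fenix pull request), and the board list is taken from the crash-stats faceting described above:

import android.app.Activity
import android.app.Application
import android.os.Build
import android.os.Bundle
import android.view.WindowManager

// Boards covering >90% of the crashes, per the crash-stats facet above.
val BLOCKED_BOARDS = setOf("msm8953", "msm8976", "msm8937", "msm8996", "sdm450_mh4x")

fun uiHardwareAccelerationBlocked(): Boolean = Build.BOARD.lowercase() in BLOCKED_BOARDS

// With android:hardwareAccelerated="false" in the manifest, re-enable acceleration
// per-window for every Activity on devices that are not on the block list. This runs
// from Activity.onCreate's dispatch, typically before the content view is attached.
class HardwareAccelerationCallbacks : Application.ActivityLifecycleCallbacks {
    override fun onActivityCreated(activity: Activity, savedInstanceState: Bundle?) {
        if (!uiHardwareAccelerationBlocked()) {
            activity.window.setFlags(
                WindowManager.LayoutParams.FLAG_HARDWARE_ACCELERATED,
                WindowManager.LayoutParams.FLAG_HARDWARE_ACCELERATED
            )
        }
    }
    override fun onActivityStarted(activity: Activity) {}
    override fun onActivityResumed(activity: Activity) {}
    override fun onActivityPaused(activity: Activity) {}
    override fun onActivityStopped(activity: Activity) {}
    override fun onActivitySaveInstanceState(activity: Activity, outState: Bundle) {}
    override fun onActivityDestroyed(activity: Activity) {}
}

// Registered once from the Application subclass, e.g.:
//   registerActivityLifecycleCallbacks(HardwareAccelerationCallbacks())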

Crash Signature: [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ] [@ libllvm-glnext.so@0x732610 ] → [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ]

Whoops, my tab was stale, so I accidentally reverted Jim's signature update.

Crash Signature: [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ] → [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ] [@ libllvm-glnext.so@0x732610 ]

Created a Fenix pull request here: https://github.com/mozilla-mobile/fenix/pull/18817

No gecko changes should be required if that lands.

Per https://github.com/mozilla-mobile/fenix/pull/18817#issuecomment-814230892, it sounds like the upstream fix is intended to ride whatever release it lands on. Not sure if there's any kind of Gecko-side workaround we could implement for 88 in the mean time as at least a half-fix?

Other than disabling webrender, no, I can't think of one.

Are the numbers bad enough to warrant that? Yes, they spike badly, but each user should only crash once after each upgrade, so averaged over the course of the cycle they're not particularly high.

Flags: needinfo?(jnicol)

No, if we don't have a relatively low-risk option for mitigating this in 88, we should just live with it for another cycle.

Add more signatures

Crash Signature: [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ] [@ libllvm-glnext.so@0x732610 ] → [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x9984b8 ] [@ libGLESv2_adreno.s…
Crash Signature: [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x9984b8 ] [@ libGLESv2_adreno.s… → [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x9984b8 ]

I have found out some fascinating things about this crash, which I will document shortly, but I don't think I'm any closer to having a workaround, unfortunately.

I will keep digging, but in the meantime:

Searching the crash reports for all libllvm-glnext.so signatures, we can see that 85% of the crashes occur at address 0x54. This matches the crash I can reproduce locally. It's probably a safe assumption that all the crashes at 0x54 are the same bug, and the other crashes are random, unrelated ones.

Filtering by that address shows all the crashes occur on Android SDK version 28 (Android 9), with Adreno driver version V@331.0. This again matches what I can reproduce locally.

And the GPUs are almost all Adreno 505 and 506, as we originally suspected in this bug. In comment 29 I discussed e.g. the 530 being affected too, but I think those are crashing at a different address, so it's likely a different bug (and with much lower numbers).

So, until I'm able to find a proper solution to this, I think blocking webrender on (Adreno 505 OR 506) AND Android 9 would be reasonable. Android 9 is only 18% of our 505/506 users, and Adreno 505/506 is about 7% of our Android population. So this would be disabling webrender for ~1% of users to stop 85% of these libllvm-glnext.so crashes.
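A hedged sketch of the proposed blocking condition. The real check would live in Gecko's GPU blocklist rather than in Kotlin, and the renderer strings below are the usual Adreno form, assumed for illustration rather than confirmed from this bug's crash reports:

import android.os.Build

fun shouldBlockWebRender(glRenderer: String): Boolean {
    // Adreno renderer strings typically look like "Adreno (TM) 505".
    val affectedGpu = glRenderer.contains("Adreno (TM) 505") ||
        glRenderer.contains("Adreno (TM) 506")
    // All crashes at offset 0x54 are on Android SDK 28 (Android 9).
    val android9 = Build.VERSION.SDK_INT == Build.VERSION_CODES.P
    return affectedGpu && android9
}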

Crash Signature: [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x9984b8 ] → [@ libllvm-glnext.so@0x732bb0 ] [@ libllvm-glnext.so@0x732610 ] [@ libllvm-glnext.so@0x732ba0 ] [@ libllvm-glnext.so@0x732600 ] [@ libllvm-glnext.so@0x7acb6c ] [@ libllvm-glnext.so@0x732610 ]

On Adreno 505 and 506 devices we encounter frequent crashes when
compiling shaders. We previously attempted to work around this by
ensuring that the source strings were always unique, as we believed it
may be due to buggy caching internally in the driver. This did not
have any effect, however, so this patch reverts the attempted
workaround.

We encounter frequent crashes in glLinkProgram on some Adreno
devices. This is likely due to a driver bug, but we have been unable
to figure out the exact cause and work around it.

According to the crash data, this bug appears to affect Adreno 505 and
506 devices running Android 9 only. Disable webrender on these devices
in order to avoid the crash.

Depends on D114949

Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/baa304ec5f8a
Revert previous attempt to work around Adreno shader compilation crash. r=nical
https://hg.mozilla.org/integration/autoland/rev/3686cde7ffa5
Disable webrender on devices affected by Adreno shader compilation crash. r=nical
Attached patch: Patch for beta (Splinter Review)

Patch rebased on beta (slight conflict due to Adreno 3xx being enabled on nightly but not beta)

Comment on attachment 9221767 [details] [diff] [review]
Patch for beta

Beta/Release Uplift Approval Request

  • User impact if declined: Lots of crashes, spiking after each release
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Switches back to layers graphics backend for some devices.
  • String changes made/needed:
Attachment #9221767 - Flags: approval-mozilla-beta?

Comment on attachment 9221767 [details] [diff] [review]
Patch for beta

Approved for Fenix 89.0.0-beta.7.

Jamie, did you intend for this bug to still be open after the latest patches?

Flags: needinfo?(jnicol)
Attachment #9221767 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

No, good catch. The leave-open was from the patch many months ago.

Let's close this bug now. This should avoid the vast majority of the crashes. There will still be some, as the signature list isn't accurate: we have more or less just dumped any libllvm-glnext.so signature into it, and on recent investigation some of them are unrelated. Once the dust settles we can create another bug for the remaining signatures, and another bug for further investigation and a proper solution to this one.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(jnicol)
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch
Blocks: 1715746
