Android crash in [@ mozilla::WebGLContext::UniformData]
Categories
(Core :: Graphics: CanvasWebGL, defect)
Tracking
()
Release | Tracking | Status |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox105 | --- | unaffected |
firefox106 | --- | unaffected |
firefox107 | + | disabled |
firefox108 | --- | fixed |
People
(Reporter: cpeterson, Assigned: lsalzman)
References
(Regression)
Details
(Keywords: crash, regression, topcrash, Whiteboard: [geckoview:m108])
Crash Data
Attachments
(1 file, 1 obsolete file)
There was a recent spike of these crashes in Fenix Nightly 107.0a1, but there was also a spike in Fenix 99.1.1 and 99.2.0 for some reason.
Crash report: https://crash-stats.mozilla.org/report/index/0fddf748-6740-42db-98f5-758460221008
Reason: SIGBUS / BUS_ADRALN
Top 10 frames of crashing thread:
0 libxul.so mozilla::WebGLContext::UniformData const dom/canvas/WebGLContextGL.cpp:1358
1 libxul.so mozilla::HostWebGLContext::UniformData const dom/canvas/HostWebGLContext.h:598
2 libxul.so mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher, const dom/canvas/WebGLCommandQueue.h:246
3 libxul.so std::__ndk1::__invoke_constexpr<mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher, /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/type_traits:3507
4 libxul.so std::__ndk1::__apply_tuple_impl<mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher, /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/tuple:1390
5 libxul.so std::__ndk1::apply<mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher, /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/tuple:1399
6 libxul.so mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher, dom/canvas/WebGLCommandQueue.h:237
7 libxul.so mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher, dom/canvas/WebGLCommandQueue.h:251
8 libxul.so mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher, dom/canvas/WebGLCommandQueue.h:251
9 libxul.so mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher, dom/canvas/WebGLCommandQueue.h:251
Comment 1•2 years ago
All the crashes come from the GPU process and it seems to be specific to 32-bit builds. There's a wide range of hardware in the crashes (both different device and GPU vendors) so it looks like this isn't a device-specific issue.
Comment 2•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on nightly
For more information, please visit auto_nag documentation.
Reporter | ||
Comment 3•2 years ago
@ Kelsey and Chun-Min, do you think either of your changes could have caused this Android WebGLContext crash regression?
- Kelsey preffing on webgl.out-of-process on Android in bug 1793679
- Chun-Min implementing VideoFrame in bug 1774300 (which touches some canvas code)
This is an old crash signature, but it spiked in 107.0a1. The earliest 107.0a1 crash reports are from build ID 20221005094233. Here is the pushlog between 2022-10-04 and 20221005094233, which includes both bugs:
Curiously, 99% of these crash reports are from 32-bit ARM.
About 80% of the crash reports are from the GPU process, 20% from the parent process.
Reporter | ||
Comment 4•2 years ago
Two more crash signatures with similar stack traces:
[@ mozilla::gl::GLContext::fUniform1fv]
[@ mozilla::gl::GLContext::fUniform2fv]
Comment 5•2 years ago
The bug is marked as tracked for firefox107 (beta). However, the bug still isn't assigned.
:bhood, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit auto_nag documentation.
Comment 6•2 years ago
From the stack, it's hard to see how bug 1774300 could cause this crash. Keeping the NI for now.
Comment 7•2 years ago
Hey Bob, can we push the remote webgl change back to 108 to avoid shipping with this? (REO triage)
Comment 8•2 years ago
Jamie and I spoke about this yesterday, and he is hinting that OOP WebGL will likely have to be disabled because of the issues that are arising (like this). I'll emphasize this with him today.
Comment 9•2 years ago
I'll disable it for beta but keep OOP-webgl enabled on Nightly, in the hopes that a user will come forward with a crashing URL to help us diagnose the crash.
Assignee | ||
Comment 10•2 years ago
The UniformData call accepts a Range<uint8_t>. When serialized by ClientWebGLContext,
however, the data is only serialized to the alignment of the supplied uint8_t type.
The HostWebGLContext may then alias it to a Range<uint32_t> or similar, while the data
was only aligned to 1-byte alignment. On some platforms such as ARM, these unaligned
accesses can cause a SIGBUS.
As a temporary workaround, we will now pessimistically align such ranges as if they
might be accessed as any possible data type with kUniversalAlignment.
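To illustrate the idea, here is a minimal standalone sketch of padding a byte range to a worst-case alignment when it is written into a serialization buffer. It is not the Gecko patch: `kWorstCaseAlignment`, `AlignUp`, `WriteAlignedBytes`, and `ReadFloat` are hypothetical names invented for this example, and `alignof(std::max_align_t)` is only assumed to play the role that kUniversalAlignment plays in the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a worst-case alignment; not Gecko's definition.
constexpr std::size_t kWorstCaseAlignment = alignof(std::max_align_t);

// Round `offset` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t AlignUp(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Toy serializer: appends `size` bytes to `out`, inserting zero padding first
// so the payload starts at an offset that is a multiple of
// kWorstCaseAlignment. Returns the offset at which the payload begins.
std::size_t WriteAlignedBytes(std::vector<std::uint8_t>& out,
                              const std::uint8_t* data, std::size_t size) {
  assert(data != nullptr || size == 0);
  const std::size_t start = AlignUp(out.size(), kWorstCaseAlignment);
  out.resize(start + size);  // padding bytes are value-initialized to zero
  if (size != 0) {
    std::memcpy(out.data() + start, data, size);
  }
  return start;
}

// Alignment-agnostic read: copying through memcpy never performs a
// misaligned typed load, whatever address `bytes` has.
float ReadFloat(const std::uint8_t* bytes) {
  float value;
  std::memcpy(&value, bytes, sizeof(value));
  return value;
}
```

With padding like this, a reader that later views the payload as wider elements sees a suitably aligned offset, provided the buffer's base address is itself aligned at least that strictly. Reading through `memcpy`, as in `ReadFloat`, is the other common way to avoid the misaligned load entirely.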
Comment 11•2 years ago
Pushed by lsalzman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0cbb0a1bb813 Ensure worst case alignment of UniformData Range when serialized. r=jgilbert
Comment 12•2 years ago
Backed out for causing multiple WebGL related failures.
- Failure log when it fails with Assertion failure: [GFX1]: mBindFailureGuard failure: Generating error <enum 0x0500>: WebGL warning: bindBuffer: target: Invalid enum value <enum 0x0004>, at /builds/worker/checkouts/gecko/gfx/2d/Logging.h:754
- Failure log when it fails with TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_conformance__offscreencanvas__offscreencanvas-transfer-image-bitmap.html | OffscreenCanvas.webgl: This pixel should be [255, 255, 0, 255], but it is: [255, 0, 0, 255].
- Failure log when it fails with TEST-UNEXPECTED-FAIL | dom/canvas/test/test_offscreencanvas_toimagebitmap.html | [after gl.clear] gl.readPixels(0,0,1,1) was [0,0,0,0], expected [0,255,0,255]
- Failure log when it fails with REFTEST TEST-UNEXPECTED-FAIL | dom/canvas/test/reftest/webgl-clear-test.html == dom/canvas/test/reftest/wrapper.html?green.png | image comparison, max difference: 255, number of differing pixels: 65536
- Failure log when it fails with REFTEST TEST-UNEXPECTED-FAIL | dom/canvas/test/reftest/color_quads.html?e_context=webgl == dom/canvas/test/reftest/color_quads.html?= | image comparison, max difference: 255, number of differing pixels: 130000
- Failure log when it fails with REFTEST PROCESS-CRASH | Last test finished | application crashed [@ mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::WriteLog(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)]
- Failure log when it fails with PROCESS-CRASH | /html/canvas/element/manual/imagebitmap/imageBitmapRendering-transferFromImageBitmap-webgl.html | application crashed [@ mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::WriteLog(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)]
- Failure log when it fails with TEST-UNEXPECTED-FAIL | /html/canvas/offscreen/manual/the-offscreen-canvas/offscreencanvas.resize.html | Verify that writing to the width and height attributes of an OffscreenCanvas works when there is a webgl context attached. - assert_equals: expected 30 but got 1
Comment 13•2 years ago
Pushed by lsalzman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4bf6c23773ea Ensure worst case alignment of UniformData Range when serialized. r=jgilbert
Comment 14•2 years ago
bugherder |
Comment 15•2 years ago
We should disable webgl out-of-process on 107, since we turned it on "because why not".
Comment 16•2 years ago
Oops, that aspect was fixed in bug 1797347.
Comment 17•2 years ago
:lsalzman/:jnicol sorry for the double needinfo
Bug 1797347 supposedly disabled this in Beta, and made its way to Fenix 107.0b4.
107 is marked as disabled in Bug 1794237
The volume of crashes in the table for 107.0b4 is similar to what it was before OOP WebGL was disabled in Bug 1797347.
Setting 107 back to affected while this is investigated.
Assignee | ||
Comment 18•2 years ago
Jamie, it still looks like out-of-process is enabled in 107 somehow?
Comment 19•2 years ago
Oh, whoops. As Lee points out, XP_LINUX evaluates true on Android here, so we are still enabling the pref on Android by accident.
Comment 20•2 years ago
This was accidentally left enabled by the previous patch, since
XP_LINUX is true on Android. We must therefore use
defined(XP_LINUX) && !defined(ANDROID)
instead, as we did prior to enabling OOP webgl on Android in the first
place.
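For readers unfamiliar with the pitfall, the difference between the two conditions can be sketched as below. This is an illustration only, not the actual pref definition: `IsLinuxKernelBuild` and `IsDesktopLinuxBuild` are hypothetical helpers, and the sketch assumes, as the comment above says, that an Android build defines both XP_LINUX and ANDROID.

```cpp
// Hypothetical helper: true for any build that defines XP_LINUX. Per the
// comment above, that includes Android builds as well as desktop Linux.
constexpr bool IsLinuxKernelBuild() {
#if defined(XP_LINUX)
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: true only for desktop Linux, because the extra
// !defined(ANDROID) term excludes Android builds.
constexpr bool IsDesktopLinuxBuild() {
#if defined(XP_LINUX) && !defined(ANDROID)
  return true;
#else
  return false;
#endif
}

// The narrower condition can never hold where the broader one does not.
static_assert(!IsDesktopLinuxBuild() || IsLinuxKernelBuild(),
              "desktop Linux implies XP_LINUX");
```

A default guarded only by `defined(XP_LINUX)` therefore applies on Android too, which matches the accidental enablement described here.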
Comment 21•2 years ago
Comment on attachment 9301400 [details]
Bug 1794237 - Actually disable OOP webgl on Android on non-nightly. r?lsalzman
Revision D160951 was moved to bug 1797347. Setting attachment 9301400 [details] to obsolete.
Comment 22•2 years ago
The patch landed in nightly and beta is affected.
:lsalzman, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox107 to wontfix.
For more information, please visit auto_nag documentation.
Comment 23•2 years ago
Follow-up patch in Bug 1797347 uplifted to 107. It will be included in Fenix/Focus 107.0b6.
Setting 107 fixed on this bug, but will monitor Fenix/Focus 107.0b6 stability.
Comment 24•2 years ago
Slight correction: 107 should be marked disabled, not fixed.