Closed Bug 1794237 Opened 2 years ago Closed 2 years ago

Android crash in [@ mozilla::WebGLContext::UniformData]

Categories

(Core :: Graphics: CanvasWebGL, defect)

ARM
Android
defect

Tracking

()

RESOLVED FIXED
108 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox105 --- unaffected
firefox106 --- unaffected
firefox107 + disabled
firefox108 --- fixed

People

(Reporter: cpeterson, Assigned: lsalzman)

References

(Regression)

Details

(Keywords: crash, regression, topcrash, Whiteboard: [geckoview:m108])

Crash Data

Attachments

(1 file, 1 obsolete file)

There was a recent spike of these crashes in Fenix Nightly 107.0a1, but there was also a spike in Fenix 99.1.1 and 99.2.0 for some reason.

Crash report: https://crash-stats.mozilla.org/report/index/0fddf748-6740-42db-98f5-758460221008

Reason: SIGBUS / BUS_ADRALN

Top 10 frames of crashing thread:

0 libxul.so mozilla::WebGLContext::UniformData const dom/canvas/WebGLContextGL.cpp:1358
1 libxul.so mozilla::HostWebGLContext::UniformData const dom/canvas/HostWebGLContext.h:598
1 libxul.so mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher,  const dom/canvas/WebGLCommandQueue.h:246
1 libxul.so std::__ndk1::__invoke_constexpr<mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher,  /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/type_traits:3507
1 libxul.so std::__ndk1::__apply_tuple_impl<mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher,  /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/tuple:1390
1 libxul.so std::__ndk1::apply<mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher,  /builds/worker/fetches/android-ndk/sources/cxx-stl/llvm-libc++/include/tuple:1399
1 libxul.so mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher,  dom/canvas/WebGLCommandQueue.h:237
1 libxul.so mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher,  dom/canvas/WebGLCommandQueue.h:251
1 libxul.so mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher,  dom/canvas/WebGLCommandQueue.h:251
1 libxul.so mozilla::MethodDispatcher<mozilla::WebGLMethodDispatcher,  dom/canvas/WebGLCommandQueue.h:251

All the crashes come from the GPU process and it seems to be specific to 32-bit builds. There's a wide range of hardware in the crashes (both different device and GPU vendors) so it looks like this isn't a device-specific issue.

Hardware: Unspecified → ARM

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on nightly

For more information, please visit the auto_nag documentation.

Keywords: topcrash

@Kelsey and Chun-Min, do you think either of your changes could have caused this Android WebGLContext crash regression?

  • Kelsey preffing on webgl.out-of-process on Android in bug 1793679
  • Chun-Min implementing VideoFrame in bug 1774300 (which touches some canvas code)

This is an old crash signature, but it spiked in 107.0a1. The earliest 107.0a1 crash reports are from build ID 20221005094233. Here is the pushlog between 2022-10-04 and 20221005094233, which includes both bugs:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=73c16d284362ba24606a516cd454dd3fe395b9b6&tochange=c14f7934269f333be9e65958c7a012899b3123bd

Curiously, 99% of these crash reports are from 32-bit ARM.

About 80% of the crash reports are from the GPU process, 20% from the parent process.

Flags: needinfo?(jgilbert)
Flags: needinfo?(cchang)
Whiteboard: [geckoview:m108]

Two more crash signatures with similar stack traces:

[@ mozilla::gl::GLContext::fUniform1fv]
[@ mozilla::gl::GLContext::fUniform2fv]

Crash Signature: [@ mozilla::WebGLContext::UniformData] → [@ mozilla::gl::GLContext::fUniform1fv] [@ mozilla::gl::GLContext::fUniform2fv] [@ mozilla::WebGLContext::UniformData]

The bug is marked as tracked for firefox107 (beta). However, the bug still isn't assigned.

:bhood, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit the auto_nag documentation.

Flags: needinfo?(bhood)
Assignee: nobody → jnicol
Flags: needinfo?(bhood)

Based on the stack, it's hard to see how bug 1774300 could cause this crash. Keeping the NI for now.

Hey Bob, can we push the remote webgl change back to 108 to avoid shipping with this? (REO triage)

Flags: needinfo?(cchang) → needinfo?(bhood)

Jamie and I spoke about this yesterday, and he is hinting that OOP WebGL will likely have to be disabled because of the issues that are arising (like this one). I'll stress this point with him today.

Flags: needinfo?(bhood)

I'll disable it for beta but keep OOP-webgl enabled on Nightly, in the hopes that a user will come forward with a crashing URL to help us diagnose the crash.

Depends on: 1797347

The UniformData call accepts a Range<uint8_t>. When serialized by ClientWebGLContext,
however, the data is only serialized to the alignment of the supplied uint8_t type.
The HostWebGLContext may then alias it to a Range<uint32_t> or similar, while the data
was only aligned to 1-byte alignment. On some platforms such as ARM, these unaligned
accesses can cause a SIGBUS.

As a temporary workaround, we will now pessimistically align such ranges as if they
might be accessed as any possible data type with kUniversalAlignment.
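The failure mode and the workaround can be illustrated with a minimal C++ sketch. It is hypothetical throughout: `IsAlignedFor`, `AlignUp`, and `ByteWriter` are invented names, not Gecko's actual Range or command-queue types, and `kUniversalAlignment` is assumed here to equal `alignof(std::max_align_t)`; the real constant and serializer may differ. A padded offset also only becomes an aligned address if the buffer base is itself at least `kUniversalAlignment`-aligned, which this sketch assumes rather than enforces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: "worst case" alignment is modeled as alignof(std::max_align_t).
constexpr std::size_t kUniversalAlignment = alignof(std::max_align_t);
static_assert((kUniversalAlignment & (kUniversalAlignment - 1)) == 0,
              "alignment must be a power of two");

// True if address `p` satisfies alignof(T), i.e. reading through a T*
// at this address would be an aligned access.
template <typename T>
bool IsAlignedFor(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Rounds `n` up to the next multiple of `align` (a power of two).
constexpr std::size_t AlignUp(std::size_t n, std::size_t align) {
  return (n + align - 1) & ~(align - 1);
}

// Hypothetical byte-stream writer standing in for the client-side queue.
class ByteWriter {
 public:
  void WriteByte(std::uint8_t b) { mBuffer.push_back(b); }

  // Old behavior: the payload lands wherever the stream currently is, so it
  // is only guaranteed the 1-byte alignment of uint8_t.
  std::size_t WriteRangePacked(const std::uint8_t* data, std::size_t size) {
    return Append(mBuffer.size(), data, size);
  }

  // Workaround: pad first so the payload offset is a multiple of
  // kUniversalAlignment, whatever type the reader later views it as.
  std::size_t WriteRangeAligned(const std::uint8_t* data, std::size_t size) {
    return Append(AlignUp(mBuffer.size(), kUniversalAlignment), data, size);
  }

  const std::vector<std::uint8_t>& Buffer() const { return mBuffer; }

 private:
  // Copies `size` bytes to `offset` (>= current size) and returns `offset`.
  std::size_t Append(std::size_t offset, const std::uint8_t* data,
                     std::size_t size) {
    assert(data != nullptr || size == 0);
    mBuffer.resize(offset + size);  // any padding bytes are zero-filled
    if (size > 0) std::memcpy(mBuffer.data() + offset, data, size);
    return offset;
  }

  std::vector<std::uint8_t> mBuffer;
};
```

With one stray byte already in the stream, `WriteRangePacked` places the payload at offset 1, which is valid for a `uint8_t` view but not for a `uint32_t` view; `WriteRangeAligned` skips ahead to the next `kUniversalAlignment` boundary instead. Aligning for the worst case costs a few padding bytes per range, which is the "pessimistic" trade-off the commit message describes: the writer does not need to know which element type the host will eventually use.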

Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0cbb0a1bb813
Ensure worst case alignment of UniformData Range when serialized. r=jgilbert

Backed out for causing multiple WebGL-related failures.

Push with failures.

Backout link.

Flags: needinfo?(lsalzman)
Flags: needinfo?(lsalzman)
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4bf6c23773ea
Ensure worst case alignment of UniformData Range when serialized. r=jgilbert
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 108 Branch
Flags: needinfo?(jgilbert)
Assignee: jnicol → lsalzman

We should disable webgl out-of-process on 107, since we turned it on "because why not".

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Oops, that aspect was fixed in bug 1797347.

Status: REOPENED → RESOLVED
Closed: 2 years ago → 2 years ago
Resolution: --- → FIXED

:lsalzman/:jnicol, sorry for the double needinfo.
Bug 1797347 was supposed to disable this in Beta, and it made its way into Fenix 107.0b4.
107 is marked as disabled in Bug 1794237.
However, the crash volume for 107.0b4 in the table is similar to what it was before the change in Bug 1797347.
Setting 107 back to affected while this is investigated.

Flags: needinfo?(lsalzman)
Flags: needinfo?(jnicol)

Jamie, it still looks like out-of-process is enabled in 107 somehow?

Flags: needinfo?(lsalzman)

Oh, whoops. As Lee points out, XP_LINUX evaluates true on Android here, so we are still enabling the pref on Android by accident.

Flags: needinfo?(jnicol)

This was accidentally left enabled by the previous patch, since
XP_LINUX is true on Android. We must therefore use

defined(XP_LINUX) && !defined(ANDROID)

instead, as we did prior to enabling OOP webgl on Android in the first
place.
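The gating mistake can be modeled as plain boolean logic. This is a hypothetical sketch: `BuildConfig` and the two `OopWebglDefault*` functions are invented names, and the policy they encode (enabled on desktop Linux, enabled on Android only on Nightly) is inferred from this thread rather than copied from the actual webgl.out-of-process pref definition.

```cpp
// Hypothetical stand-in for the build configuration. In a real build these
// facts come from preprocessor macros (XP_LINUX, ANDROID) and the release
// channel, not from runtime values.
struct BuildConfig {
  bool xpLinux;  // XP_LINUX defined: true on desktop Linux AND on Android
  bool android;  // ANDROID defined
  bool nightly;  // Nightly channel build
};

// Buggy gate: meant as "desktop Linux", but XP_LINUX also holds on Android,
// so Android Beta/Release still get OOP WebGL enabled.
constexpr bool OopWebglDefaultBuggy(const BuildConfig& c) {
  return c.xpLinux || (c.android && c.nightly);
}

// Fixed gate, mirroring defined(XP_LINUX) && !defined(ANDROID) for desktop
// Linux, with Android enabled only on Nightly.
constexpr bool OopWebglDefaultFixed(const BuildConfig& c) {
  return (c.xpLinux && !c.android) || (c.android && c.nightly);
}

// Compile-time checks for the Android Beta case discussed in this bug.
static_assert(OopWebglDefaultBuggy(BuildConfig{true, true, false}),
              "buggy gate leaves OOP WebGL on for Android Beta");
static_assert(!OopWebglDefaultFixed(BuildConfig{true, true, false}),
              "fixed gate turns it off for Android Beta");
```

The static_asserts pin down the Android Beta case: the buggy gate is true there because XP_LINUX alone already satisfies it, while the fixed gate is false.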

Comment on attachment 9301400
Bug 1794237 - Actually disable OOP webgl on Android on non-nightly. r?lsalzman

Revision D160951 was moved to bug 1797347. Setting attachment 9301400 to obsolete.

Attachment #9301400 - Attachment is obsolete: true

The patch landed in nightly and beta is affected.
:lsalzman, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox107 to wontfix.

For more information, please visit the auto_nag documentation.

Flags: needinfo?(lsalzman)

The follow-up patch in Bug 1797347 was uplifted to 107. It will be included in Fenix/Focus 107.0b6.
Setting 107 to fixed on this bug, but will monitor Fenix/Focus 107.0b6 stability.

Flags: needinfo?(lsalzman)
Regressions: 1798703

Slight correction: 107 should be marked as disabled.

Regressed by: 1793679
See Also: → 1810623