Open Bug 1900134 Opened 1 year ago Updated 2 months ago

Crash in [@ IPCError-content | GPUProcessKill]

Categories

(Core :: Graphics, defect)

Unspecified
Android
defect

People

(Reporter: mccr8, Unassigned, NeedInfo)

Details

(Keywords: crash, topcrash)

Attachments

(3 files)

174.21 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document
59.79 KB, image/png
106.15 KB, image/jpeg

Crash report: https://crash-stats.mozilla.org/report/index/23d4dceb-418a-494b-9070-320c40240531

Reason: DUMP_REQUESTED

Top 10 frames:

0  libc.so  libc.so@0x1beac
1  libmozglue.so  mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&)  mozglue/misc/ConditionVariable_posix.cpp:106
2  libxul.so  mozilla::OffTheBooksCondVar::Wait()  xpcom/threads/CondVar.h:58
2  libxul.so  mozilla::Monitor::Wait()  xpcom/threads/Monitor.h:37
2  libxul.so  mozilla::detail::BaseMonitorAutoLock<mozilla::Monitor>::Wait()  xpcom/threads/Monitor.h:138
2  libxul.so  nsAppShell::Queue::Pop(bool)  widget/android/nsAppShell.h:196
2  libxul.so  nsAppShell::ProcessNextNativeEvent(bool)  widget/android/nsAppShell.cpp:637
3  libxul.so  nsBaseAppShell::DoProcessNextNativeEvent(bool)  widget/nsBaseAppShell.cpp:131
3  libxul.so  nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool)  widget/nsBaseAppShell.cpp:267
3  libxul.so  {virtual override thunk({offset(-8)}, nsBaseAppShell::OnProcessNextEvent(nsIT...  widget/nsBaseAppShell.h:0

I'm not sure how actionable this is, but there seems to be a solid volume of it on Nightly Android, so I figured I'd file. In the crashes I saw, we're waiting on a mutex for some native event, maybe? Could this be some kind of deadlock?
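For readers unfamiliar with the pattern in the frames above, here is a minimal sketch of a monitor-guarded event queue whose `Pop(true)` blocks on a condition variable until another thread posts an event. It is not the actual `nsAppShell::Queue`; the class name `EventQueue`, the `Post` method, and the `int` event type are assumptions made for illustration. A thread parked in this wait looks the same in a stack trace whether it is simply idle or waiting on an event that will never arrive, so the stack alone may not settle the deadlock question.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>

// Hypothetical stand-in for a monitor-guarded native event queue.
class EventQueue {
 public:
  // Append an event and wake one waiting consumer.
  void Post(int event) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mEvents.push_back(event);
    }
    mCondVar.notify_one();
  }

  // If mayWait is true, block until an event is available. Otherwise
  // return std::nullopt immediately when the queue is empty.
  std::optional<int> Pop(bool mayWait) {
    std::unique_lock<std::mutex> lock(mMutex);
    if (mayWait) {
      // The thread sleeps here; this is the frame a crash dump would show.
      mCondVar.wait(lock, [this] { return !mEvents.empty(); });
    } else if (mEvents.empty()) {
      return std::nullopt;
    }
    int event = mEvents.front();
    mEvents.pop_front();
    return event;
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCondVar;
  std::deque<int> mEvents;
};
```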

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on nightly

For more information, please visit BugBot documentation.

Keywords: topcrash

The severity field is not set for this bug.
:bhood, could you have a look please?

Flags: needinfo?(bhood)

This is spiking pretty heavily on Release 128 now too.

Flags: needinfo?(jnicol)

The crash is from bug 1880503: we've replaced what was previously either a deadlock or just an incredibly slow-to-respond GPU process with a GPU process crash. This will be a better experience for users, and additionally gives us more information about what's going on. Previously we knew these hangs were occurring from Sentry reports (bug 1855536), but the Sentry reports contained very little useful information compared to crash reports.

I need to go through the reports and determine what the root problem(s) are.
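The trade-off described in this comment, a bounded wait followed by a deliberate kill instead of an unbounded hang, can be sketched roughly as below. This is not the code from bug 1880503. The function name `WaitForReplyOrKill` and the `killPeer` hook (standing in for "terminate the unresponsive process and capture a crash report") are hypothetical, and `std::future` stands in for a sync IPC reply.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <optional>

// Waits up to `timeout` for `reply`. If the reply is not ready in time,
// invokes `killPeer` (hypothetical hook) and returns std::nullopt rather
// than blocking forever. A deferred future is treated like a timeout.
template <typename T>
std::optional<T> WaitForReplyOrKill(std::future<T>& reply,
                                    std::chrono::milliseconds timeout,
                                    const std::function<void()>& killPeer) {
  if (reply.wait_for(timeout) == std::future_status::ready) {
    return reply.get();
  }
  killPeer();
  return std::nullopt;
}
```

The caller gets either a value or a clear signal that the peer was killed, which is what turns a silent hang into a crash report that can be triaged.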

Severity: -- → S3
Flags: needinfo?(jnicol)
See Also: → 1908798

Looking through a few dozen reports, these mostly fall into a few categories:

  • By far the most common: we are attempting to pause the compositor and the renderer thread is stuck, usually in eglSwapBuffers and less frequently in eglSetDamageRegion.
  • Occasionally we are stuck while resuming the compositor instead.
  • Some stacks indicate we are handling memory pressure, but I'm not sure whether to believe them, as NS_DispatchMemoryPressure should only be called on the main thread.
  • A few are waiting for a remote texture.
  • Some occur when destroying a WebRender renderer while the renderer thread is stuck in glDeleteTextures. The circumstances are probably not all that different from the pause/resume cases, in that the GPU is probably overwhelmed.
Attached file logcat.docx

I was able to reproduce the crash by browsing this page: https://threadreaderapp.com/thread/1850535311694057598.html

https://crash-stats.mozilla.org/report/index/e7954f43-c3d3-495e-a06d-906450241028

Reproduced the crash on today's Nightly 133.0a1, with a OnePlus 5T (Android 10).

The November spike is bug 1929209: WebRender SVG filters were enabled in Fx132.

See Also: → 1929209

(In reply to miralobontiu from comment #7)

> Created attachment 9433631 [details]
> logcat.docx
>
> I was able to reproduce the crash by browsing this page: https://threadreaderapp.com/thread/1850535311694057598.html
>
> https://crash-stats.mozilla.org/report/index/e7954f43-c3d3-495e-a06d-906450241028
>
> Reproduced the crash on today's Nightly 133.0a1, with a OnePlus 5T (Android 10).

I'm looking into blocklisting the SVG filter acceleration on certain Adreno driver versions to resolve this for now; deeper investigation will follow.

Side note: I find that page interesting because I can't find any SVG filter usage in its page source (specifically the <filter> tag, or various CSS filters in combination with SVG paths), so I wonder whether filters are being triggered incorrectly on the SVG paths in the page. But the usage is probably coming from one of the other pieces it is loading.

This crash is showing up under this signature too, but with significantly less volume.

Crash Signature: [@ IPCError-content | GPUProcessKill] → [@ IPCError-content | GPUProcessKill] [@ nsAppShell::Queue::Pop]

@jnicol Do you have the OnePlus 5T mentioned in comment #8 to repro this? I'm happy to craft a blocklist for svgfe if it's responsible but I am having a hard time reconstructing the theory that pointed to it after all of this time, and there have been some fixes in the interim that might possibly affect it.

Flags: needinfo?(jnicol)

This signature occurs when we receive a timeout for sync IPC with the GPU process, i.e. the GPU process is not responsive. This spiked in bug 1929209 because the SVGFE shader took >30s to compile on some Adreno drivers, due to glsl-optimizer handling switch-case atrociously and the shader being just one giant switch-case. We restructured the shader to use a big if-else chain instead of switch-case, and that solved the problem. The spike went away and we're back to baseline numbers here.

So we don't need to blocklist anything.
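To illustrate the kind of restructuring described above, here is a small sketch of the same dispatch written once as a switch and once as an if-else chain. It is written in C++ rather than GLSL, and the filter kinds and arithmetic are made up; the real SVGFE shader is not reproduced here. The point is only that the two forms are behaviorally equivalent, so the rewrite changes what the shader compiler has to process without changing results.

```cpp
// Hypothetical filter kinds; the actual shader handles many more cases.
enum class FilterKind { Identity, Invert, HalfOpacity };

// Original shape: one large switch over the filter kind.
float ApplyWithSwitch(FilterKind kind, float v) {
  switch (kind) {
    case FilterKind::Identity:
      return v;
    case FilterKind::Invert:
      return 1.0f - v;
    case FilterKind::HalfOpacity:
      return v * 0.5f;
  }
  return v;  // Unreachable for valid kinds; keeps all paths returning.
}

// Restructured shape: the same logic as an if-else chain.
float ApplyWithIfElse(FilterKind kind, float v) {
  if (kind == FilterKind::Identity) {
    return v;
  } else if (kind == FilterKind::Invert) {
    return 1.0f - v;
  } else if (kind == FilterKind::HalfOpacity) {
    return v * 0.5f;
  }
  return v;
}
```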

Flags: needinfo?(jnicol)

I've started getting this crash error more and more lately.

(In reply to eclaudiu64 from comment #13)

> Created attachment 9500113 [details]
> Screenshot_20250711-201859-603.png
>
> I've started getting this crash error more and more lately.

I get it when accessing some sites, from what I see.

Any sites in particular? Could you share the socorro links from about:crashes here?

Flags: needinfo?(emanuellclaudiu)

(In reply to Jamie Nicol [:jnicol] from comment #15)

> Any sites in particular? Could you share the socorro links from about:crashes here?

This is one of the sites, for example: https://www.veed.io/convert/video-converter . After I add a larger file (over 900 MB, around 1 GB), the file loads, but this crash is also displayed.
Crash details: https://crash-stats.mozilla.org/report/index/59015fac-2fa5-4aac-82e6-294790250711

Flags: needinfo?(emanuellclaudiu) → needinfo?(jnicol)
Attached image IMG_20250722003412586.jpg

I'll add something else: it also gave me 2 Socorro reports, the link specified in the previous comment and this one: https://crash-stats.mozilla.org/report/index/f99339c8-f9ae-405e-8101-fb5880250711 . Both are from the same site mentioned above.
