Closed Bug 1758473 Opened 2 years ago Closed 2 years ago

[NVIDIA] Crash in [@ XDisplayString] from VAAPIIsSupported: Consider blocking vdpau_drv_video.so from being loaded

Categories

(Core :: Widget: Gtk, defect)

Firefox 99
x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
103 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- disabled
firefox98 --- unaffected
firefox99 --- disabled
firefox100 --- disabled
firefox101 --- disabled
firefox102 --- disabled
firefox103 --- fixed

People

(Reporter: gcp, Assigned: rmader)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/8078454a-bf79-45bb-be8c-0624e0220307

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0 libX11.so.6 XDisplayString 
1 vdpau_drv_video.so __vaDriverInit_1_13 
2 libva.so.2 va_infoMessage 
3 libva.so.2 vaInitialize 
4 libxul.so mozilla::widget::VAAPIIsSupported widget/gtk/VAAPIUtils.cpp:151
5 libxul.so gfxPlatformGtk::InitVAAPIConfig gfx/thebes/gfxPlatformGtk.cpp:260
6 libxul.so gfxPlatformGtk::gfxPlatformGtk gfx/thebes/gfxPlatformGtk.cpp:115
7 libxul.so gfxPlatform::Init gfx/thebes/gfxPlatform.cpp:903
8 libxul.so gfxPlatform::GetPlatform gfx/thebes/gfxPlatform.cpp:465
9 libxul.so gfxPlatform::InitializeCMS gfx/thebes/gfxPlatform.cpp:2083

Bit of a guess as to the regressor. Was looking for fallout from bug 1129492.

Regressed by: 1751987
Summary: Crash in [@ XDisplayString] → Crash in [@ XDisplayString] from VAAPIIsSupported
Has Regression Range: --- → yes

:stransky, since you are the author of the regressor, bug 1751987, could you take a look?
For more information, please visit auto_nag documentation.

Flags: needinfo?(stransky)

Set release status flags based on info from the regressing bug 1751987

This is a problem with NVIDIA proprietary drivers.

I see the same crashes from RDD process:
https://crash-stats.mozilla.org/report/index/8b6a51fe-5d6c-4e24-9b69-f8a360220313

Flags: needinfo?(stransky)
Summary: Crash in [@ XDisplayString] from VAAPIIsSupported → [NVIDIA] Crash in [@ XDisplayString] from VAAPIIsSupported

Yes, we check the VA-API status only when it's enabled by StaticPrefs::media_ffmpeg_vaapi_enabled() here so it's disabled by default.

btw. This may be reported to NVIDIA proprietary drivers bugtracker - no idea why it open X display connection from vaDriverInit, we use DRM VA-API device for decode, not X11 one. Maybe some bug in VA-API -> vdpau bridge.

I don't think a disabled feature should be an S2 bug. Reducing until we ship.

Severity: S2 → S4

It might be this one: https://salsa.debian.org/multimedia-team/attic/vdpau-video/-/blob/63450ffea86143d418c6e83cb8d2828d3a7beb25/src/vdpau_driver.c#L188

const char * const x11_dpy_name = XDisplayString(driver_data->x11_dpy);

https://bugs.archlinux.org/task/72241#comments

vaGetDisplayDRM() doesn't fill ->x11_dpy

VAAPI should be blocked for vdpau_drv_video.so.
vdpau_drv_video.so is deprecated and has been removed from Debian.
Debian Buster (oldstable) was the last release that had a package for it: https://packages.debian.org/oldstable/vdpau-va-driver
https://tracker.debian.org/pkg/vdpau-video
https://salsa.debian.org/multimedia-team/attic/vdpau-video

Summary: [NVIDIA] Crash in [@ XDisplayString] from VAAPIIsSupported → [NVIDIA] Crash in [@ XDisplayString] from VAAPIIsSupported: Consider blocking vdpau_drv_video.so from being loaded

This is a classic "driver so broken that touching it risks crashing everything". That historically the role for glxtest, so I think what we should do here is move VAAPIIsSupported() and its dependencies (VAAPIGetAcceleratedProfiles() and VAAPIGetAcceleratedProfilesInternal()) there. I only see them used for the initial test, so we don't seem to need them anywhere else.

That will still make FF fall back to software rendering, but at least the browser starts.

Some VAAPI drivers are so broken that trying to use them
crashes Firefox. This is nothing entirely new for GPU drivers
and which is why we have glxtest.

Thus move the test from VAAPIUtils there. This has the
additional benefit of only doing the test once.

Assignee: nobody → robert.mader
Status: NEW → ASSIGNED

IIUC, your patch would cause the crash (at a different place) even for Nvidia users who haven't force-enabled VAAPI. (?)

At the moment, VAAPIIsSupported() is only called if EGL is used and if the VAAPI feature is enabled by whitelist rule (Intel/AMD).
It's not enabled on devices that are recognized as Nvidia.
This crash can only occur

  • if an Nvidia user force-enables VAAPI while having deprecated vdpau_drv_video.so installed.
  • in case there is a so far unknown bug that Intel or AMD are detected as GPU but the DRM device is Nvidia.

Hm, that's indeed a valid point. however we'd at least be consistent: if a system has EGL drivers installed that crash, we'll also not wait for the GfxInfo blocklist. IMHO systems with crashing drivers should always be seen as broken and get software rendering, and we shouldn't try everything we can to make things work. Hard call though.

I don't think we should do this.

The crash here comes from RDD process which is designed for it - when it crashes we just restart it. It enables only approved and potentially working drivers which reduces crash probability. It also has strong sadbox which is different than glxtest environment so it's not guaranteed that glxtest pass will mean RDD pass (Bug 1771382).

I think the better solution is to track RDD crashes and disable VA-API entirely after some initial crashes (2-3?) - Bug 1762544.

(In reply to Martin Stránský [:stransky] (ni? me) from comment #14)

I don't think we should do this.

The crash here comes from RDD process which is designed for it - when it crashes we just restart it. It enables only approved and potentially working drivers which reduces crash probability. It also has strong sadbox which is different than glxtest environment so it's not guaranteed that glxtest pass will mean RDD pass (Bug 1771382).

I think the better solution is to track RDD crashes and disable VA-API entirely after some initial crashes (2-3?) - Bug 1762544.

Hm, the sandbox argument is invalid as we run VAAPIIsSupported() from parent process which is not sandboxed too.

Robert, is is possible to check VA-API in glxtest according to nsIGfxInfo::FEATURE_VAAPI status?

Flags: needinfo?(robert.mader)

(In reply to Martin Stránský [:stransky] (ni? me) from comment #16)

Robert, is is possible to check VA-API in glxtest according to nsIGfxInfo::FEATURE_VAAPI status?

Tough, given that glxtest is needed to determine on which hardware we're running. However, I'm looking into letting the child process fork again before doing the VA-API check so a crash there doesn't impact the rest.

Flags: needinfo?(robert.mader)

(In reply to Robert Mader [:rmader] from comment #17)

(In reply to Martin Stránský [:stransky] (ni? me) from comment #16)

Robert, is is possible to check VA-API in glxtest according to nsIGfxInfo::FEATURE_VAAPI status?

Tough, given that glxtest is needed to determine on which hardware we're running. However, I'm looking into letting the child process fork again before doing the VA-API check so a crash there doesn't impact the rest.

Yeah, AFAIK NVIDIA is supported now so we can't throw away WebRender on NVIDIA just because it crashes on VA-API init which is disabled by default anyway.

Attachment #9280813 - Attachment description: Bug 1758473 - Move VAAPI test into glxtest, r=stransky → Bug 1758473 - Move VA-API test into glxtest, r=stransky

Updated the patch to do the VA-API test in an own process, logging appropriate errors in case of crashes. Try: https://treeherder.mozilla.org/jobs?repo=try&revision=dc471f57e006cb55c1604a4b27b499db75292cfe

Pushed by robert.mader@posteo.de:
https://hg.mozilla.org/integration/autoland/rev/6d7485c81e90
Move VA-API test into glxtest, r=stransky
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 103 Branch
See Also: → 1777927
See Also: → 1787182
Regressions: 1787182
See Also: 1787182
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: