Don't fire glxtest process for remote instances (VAAPI test sometimes hangs Firefox on AMD. bug 1813500 hotfix lets users continue but with disabled hardware rendering.)
Categories
(Core :: Widget: Gtk, defect, P3)
Tracking
()
Version | Tracking | Status |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox104 | --- | wontfix |
firefox112 | --- | wontfix |
firefox113 | --- | wontfix |
firefox114 | --- | fixed |
People
(Reporter: o.freyermuth, Assigned: stransky)
References
(Blocks 3 open bugs, Regression)
Details
(Keywords: hang, regression)
Attachments
(10 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:104.0) Gecko/20100101 Firefox/104.0
Steps to reproduce:
Start Firefox. The following message is shown:
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: glxtest: VA-API test failed: process crashed. Please check your VA-API drivers. (t=0.447447) [GFX1-]: glxtest: VA-API test failed: process crashed. Please check your VA-API drivers.
A coredump is saved, but Firefox functions normally.
Each time I open any link on my system (which in turn calls Firefox and opens the page), a new coredump is saved, leading to many dozens of coredumps each day.
This issue is a side-effect of #1758473 which reruns the test on each Firefox invocation, causing a coredump on systems with affected drivers every time Firefox is started or any URL is opened (filling up journals / hard drives).
My affected system is a Gentoo Linux x86_64, other users on Debian seem affected, too:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1017414
Actual results:
Coredump due to failure of glxtest.
Trace found in coredumps:
#0 XDisplayString (dpy=0x0) at /usr/src/debug/x11-libs/libX11-1.8.1/libX11-1.8.1/src/Macros.c:119
#1 0x00007fccbf6a5fc5 in vdpau_common_Initialize (driver_data=0x7fccbd5df200)
at /usr/src/debug/x11-libs/libva-vdpau-driver-0.7.4-r5/libva-vdpau-driver-0.7.4/src/vdpau_driver.c:188
#2 vdpau_Initialize_Current (ctx=0x7fcccc12fc40) at /usr/src/debug/x11-libs/libva-vdpau-driver-0.7.4-r5/libva-vdpau-driver-0.7.4/src/vdpau_driver_template.h:561
#3 __vaDriverInit_1_15 (ctx=0x7fcccc12fc40) at /usr/src/debug/x11-libs/libva-vdpau-driver-0.7.4-r5/libva-vdpau-driver-0.7.4/src/vdpau_driver.c:317
#4 0x00007fccbf6bce4c in () at /usr/lib64/libva.so.2
#5 0x00007fccbf6bdfc6 in vaInitialize () at /usr/lib64/libva.so.2
#6 0x00007fccc78b1d6f in () at /usr/lib64/firefox/libxul.so
#7 0x00007fccc78b2988 in () at /usr/lib64/firefox/libxul.so
#8 0x00007fccc78b2a66 in () at /usr/lib64/firefox/libxul.so
#9 0x00007fccc78a8043 in () at /usr/lib64/firefox/libxul.so
#10 0x00007fccc78ae846 in () at /usr/lib64/firefox/libxul.so
#11 0x00007fccc78aecaa in () at /usr/lib64/firefox/libxul.so
#12 0x000056302fc4d473 in ()
#13 0x00007fcccc47534a in __libc_start_call_main (main=main@entry=0x56302fc4d0b0, argc=argc@entry=2, argv=argv@entry=0x7ffd51e764f8)
at ../sysdeps/nptl/libc_start_call_main.h:58
#14 0x00007fcccc4753fc in __libc_start_main_impl
(main=0x56302fc4d0b0, argc=2, argv=0x7ffd51e764f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd51e764e8)
at ../csu/libc-start.c:389
#15 0x000056302fc4cf51 in _start ()
Expected results:
Do not coredump on each start.
Comment 1•3 years ago
The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•3 years ago
Please uninstall deprecated libva-vdpau-driver: It doesn't support vaGetDisplayDRM and crashes.
(Darkspirit from bug 1758473 comment #9)
It might be this one: https://salsa.debian.org/multimedia-team/attic/vdpau-video/-/blob/63450ffea86143d418c6e83cb8d2828d3a7beb25/src/vdpau_driver.c#L188
const char * const x11_dpy_name = XDisplayString(driver_data->x11_dpy);
https://bugs.archlinux.org/task/72241#comments
vaGetDisplayDRM() doesn't fill ->x11_dpy
VAAPI should be blocked for vdpau_drv_video.so.
vdpau_drv_video.so is deprecated and has been removed from Debian.
Debian Buster (oldstable) was the last release that had a package for it: https://packages.debian.org/oldstable/vdpau-va-driver
https://tracker.debian.org/pkg/vdpau-video
https://salsa.debian.org/multimedia-team/attic/vdpau-video
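The failure mode described here (frame #0 of the trace shows XDisplayString called with dpy=0x0, because vaGetDisplayDRM() does not fill x11_dpy) can be illustrated with a small, self-contained sketch. It does not use the real X11 or libva types: `FakeDisplay`, `DriverData`, and `DisplayNameIfAny` are hypothetical stand-ins, and the guard shown is only one plausible way a driver could avoid the null dereference, not a patch taken from this bug.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for an X11 Display; the real type is opaque.
struct FakeDisplay {
  std::string name;
};

// Hypothetical stand-in for the driver's private data. x11_dpy stays null
// when the VA display was created from a DRM device instead of an X11 one.
struct DriverData {
  const FakeDisplay* x11_dpy = nullptr;
};

// Returns the display name, or nothing when no X11 display is available.
// Checking the pointer first avoids the null dereference seen in the trace.
std::optional<std::string> DisplayNameIfAny(const DriverData& data) {
  if (data.x11_dpy == nullptr) {
    return std::nullopt;
  }
  return data.x11_dpy->name;
}
```

With a check like this, initialization on a DRM-only display could fail cleanly and report an error instead of crashing the test process.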
Comment 3•3 years ago
(In reply to Oliver Freyermuth from comment #0)
other users on Debian seem affected, too:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1017414
That one has a different problem, it uses experimental nvidia_drv_video.so.
(Darkspirit from bug 1782408 comment 1)
https://github.com/elFarto/nvidia-vaapi-driver requires CUDA which doesn't save you power and is blocked by the media process sandbox. Allowing CUDA would require further media process sandbox exceptions/holes.
https://github.com/elFarto/nvidia-vaapi-driver/issues/74#issuecomment-1100918826
It's a driver issue. As soon as you initialise cuda, it forces the GPU into a higher power state (at least P2 IIRC), and the nvdec implementation forces you to use cuda to interact with it, so this always happens. It means that using nvdec will never save you power unless you were going to use cuda anyway.
Comment 4•3 years ago (Reporter)
(In reply to Darkspirit from comment #2)
Please uninstall deprecated libva-vdpau-driver: It doesn't support vaGetDisplayDRM and crashes.
Thanks, indeed that fixes it for me. Gentoo currently still has libva-vdpau-driver in stable and it's used as dependency of the latest Kodi release (unless VDPAU support is disabled there). I'll also raise a bug in the Gentoo bugtracker then so the issue with / deprecation of libva-vdpau-driver gets known and this can be reconsidered.
Taking into account that two users are affected by the segfaults: Do you consider this "ok" (since it's a driver bug causing the segfault, in the end), or should this bug be kept open to improve on existing glxtest behaviour (e.g. blacklist "harder" / "earlier" so the coredump / segmentation fault is not triggered each time)?
Comment 5•3 years ago (Reporter)
For reference, in case a Gentoo user ends up here, the downstream issue I reported is at:
https://bugs.gentoo.org/866557
Comment 6•3 years ago
The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.
Comment 7•3 years ago
(In reply to Oliver Freyermuth from comment #4)
Taking into account that two users are affected by the segfaults: Do you consider this "ok" (since it's a driver bug causing the segfault, in the end), or should this bug be kept open to improve on existing glxtest behaviour (e.g. blacklist "harder" / "earlier" so the coredump / segmentation fault is not triggered each time)?
IMHO the warning "Please check your VA-API drivers." should be enough to point affected users to the fact that they are using broken drivers :P Implementing further detection risks overblocking in case someone fixes the bug (which shouldn't be hard by the way).
Comment 8•3 years ago
For reference, in case a Gentoo user ends up here, the downstream issue I reported is at:
https://bugs.gentoo.org/866557
Thanks. For distros shipping this driver by default I think it's reasonable to expect them to
- remove it from the default packages, as it's unmaintained
- ship a downstream patch for the issue, i.e. maintain it themselves
Comment 9•3 years ago
Oliver: on second thought, I wonder if we could just do better with the warning. So instead of
Please check your VA-API drivers.
have something more verbose, for example
This was likely caused by outdated or unmaintained drivers installed on your system. You can check the driver by running vainfo - please consider updating or uninstalling the driver.
Do you think that would have helped you/would make things clear enough?
Comment 10•3 years ago (Reporter)
Thanks for reaching out, Robert!
In fact, I think the proposed more verbose warning would have helped to identify the problematic library faster.
Additionally, though, it took me more time than expected to even see the warning, since it is not always visible:
The coredumps are produced on each invocation of the firefox binary, e.g. handling URLs. However, if one Firefox session is running (e.g. started via GUI) and a URL is opened, while the coredump is still produced, the error message is not shown in the new invocation of firefox (since it is echoed by the original process, started via GUI). So I only really became aware of the message once I closed all Firefox processes and started from scratch.
So my proposal would be (if that can be implemented) to run the glxtests only when the main process starts, and not when handling URLs. This would have several benefits:
- This approach does not overblock in case the bug is fixed (a very valid point).
- It prevents filling up hard drives with coredumps. Regular users will not investigate issues in depth, but will blame Firefox for being unstable and coredumping all the time, even though Firefox itself is not at fault.
- If a user observes the coredump and wants to find the source, he/she will have to restart the main process — which is the process which will print the error message.
This combined with the clearer error message would certainly have made things clear enough for me (and hopefully also for others).
Further ideas to reach even more users:
- Show the error highlighted on top of about:support.
- Add some graphical alerting the first time an error with glxtests is encountered (and refer to about:support).
This is just a collection of ideas from a user point of view to improve the visibility of the issue and allow them to fix it more easily.
Comment 11•3 years ago
Thanks for the input!
So my proposal would be (if that can be implemented) to run the glxtests only when the main process starts, and not when handling URLs. This would have several benefits:
Urgh, I didn't know we do that. It doesn't make sense and we should definitely look into stopping it.
Comment 13•2 years ago (Assignee)
Downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=2147344
When the glxtest process hangs (rather than crashes), it blocks every Firefox start, even a remote one.
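To make the difference between a crashing and a hanging test process concrete, here is a minimal sketch of a probe that is run in a child process and bounded by a timeout. It is an illustration under stated assumptions, not Firefox's implementation: `ProbeResult` and `RunProbeWithTimeout` are hypothetical names, and the sketch uses fork() only for brevity (a later comment in this thread explains why forking is a problem once threads exist).

```cpp
#include <cerrno>
#include <chrono>
#include <csignal>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class ProbeResult { Ok, Failed, Crashed, TimedOut };

// Runs `probe` in a forked child and waits at most `timeout` for it.
// `probe` returns the child's exit code (0 means success).
template <typename Probe>
ProbeResult RunProbeWithTimeout(Probe probe, std::chrono::milliseconds timeout) {
  pid_t pid = fork();
  if (pid < 0) return ProbeResult::Failed;
  if (pid == 0) {
    _exit(probe());  // Child: never returns to the caller.
  }
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  for (;;) {
    int status = 0;
    pid_t done = waitpid(pid, &status, WNOHANG);
    if (done == pid) {
      if (WIFSIGNALED(status)) return ProbeResult::Crashed;
      return (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                 ? ProbeResult::Ok
                 : ProbeResult::Failed;
    }
    if (done < 0 && errno != EINTR) return ProbeResult::Failed;
    if (std::chrono::steady_clock::now() >= deadline) {
      kill(pid, SIGKILL);        // A hung probe must not block startup forever.
      waitpid(pid, &status, 0);  // Reap the child so it does not linger.
      return ProbeResult::TimedOut;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
}
```

A caller could treat `Crashed` and `TimedOut` the same way (disable the feature and continue), which is the kind of behavior the hotfix in the bug title describes for AMD users.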
Comment 15•2 years ago (Assignee)
Right now we fire glxtest on every Firefox start, even if we are going to update, restart, or ping a running remote instance.
When we're running on a system with broken or unstable gfx drivers (the driver or GLX freezes or crashes), every such action is delayed or generates a coredump.
In this patch we launch the glxtest process later, and only if we know we need it.
Depends on D168650
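The deferral described in this patch can be sketched roughly as follows. This is a hedged illustration, not the code from the patch: `StartupAction`, `NeedsGfxTest`, and `LazyGfxTest` are hypothetical names, and the assumption that only a brand-new browser instance needs the probe is mine.

```cpp
#include <functional>
#include <utility>

// Hypothetical classification of what this invocation of the binary will do.
enum class StartupAction {
  NewBrowserInstance,
  ForwardToRemoteInstance,
  ApplyUpdate,
  Restart,
};

// Assumption: only a process that will actually render needs the probe.
inline bool NeedsGfxTest(StartupAction action) {
  return action == StartupAction::NewBrowserInstance;
}

// Wraps the probe launcher so that it runs at most once, and only on demand.
class LazyGfxTest {
 public:
  explicit LazyGfxTest(std::function<void()> launch)
      : mLaunch(std::move(launch)) {}

  // Returns true if the probe was launched by this call.
  bool LaunchIfNeeded(StartupAction action) {
    if (mLaunched || !NeedsGfxTest(action)) {
      return false;
    }
    mLaunched = true;
    mLaunch();
    return true;
  }

 private:
  std::function<void()> mLaunch;
  bool mLaunched = false;
};
```

With this shape, an invocation that only hands a URL to a running instance never starts the probe, so a broken driver can neither delay it nor produce a coredump.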
Comment 17•2 years ago
Backed out 2 changesets (Bug 1787182) for valgrind-test bustages and wr and bc failures.
Backout link
Push with failures <--> V-swr
Failure Log
Also Wr11
Also Wr2
Also bc25
Also wpt19
Also wpt27
Also wpt31
Comment 20•2 years ago (Assignee)
We should not run VA-API testing as part of OpenGL test as we want to test VA-API on supported hardware only.
Depends on D168651
Comment 21•2 years ago (Assignee)
Depends on D171993
Comment 22•2 years ago (Assignee)
- Implement fire_vaapi_process(), which launches the VA-API test utility on a given DRM device.
- Implement GfxInfo::GetDataVAAPI(), which gets the VA-API test results.
- Run VA-API tests when FEATURE_HARDWARE_VIDEO_DECODING is probed, and only if it's enabled by GfxInfo.
Depends on D171994
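The "test only when the feature is probed, and only if it is enabled" behavior listed above can be sketched like this. Everything here is hypothetical and for illustration only: `VaapiTestResult`, `VaapiProbe`, and the runner callback are not the signatures of fire_vaapi_process() or GfxInfo::GetDataVAAPI(), and the device path in the usage note is just an example.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical result of the external VA-API test utility.
struct VaapiTestResult {
  bool supported = false;
  std::string detail;
};

class VaapiProbe {
 public:
  using Runner = std::function<VaapiTestResult(const std::string& drmDevice)>;

  explicit VaapiProbe(Runner runner) : mRunner(std::move(runner)) {}

  // Called when hardware video decoding is queried. The test is skipped
  // entirely when the feature is disabled, and otherwise runs at most once.
  // Simplification: the cached result is reused even if drmDevice changes.
  bool IsHardwareDecodingUsable(bool enabledByConfig,
                                const std::string& drmDevice) {
    if (!enabledByConfig) {
      return false;
    }
    if (!mCached) {
      mCached = mRunner(drmDevice);
    }
    return mCached->supported;
  }

 private:
  Runner mRunner;
  std::optional<VaapiTestResult> mCached;
};
```

A caller would construct the probe with a runner that starts the test utility for a device such as /dev/dri/renderD128 and call `IsHardwareDecodingUsable` whenever the feature status is requested.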
Comment 25•2 years ago (Assignee)
glxtest now runs later, when Firefox has already spawned threads. Currently glxtest runs in a forked process, which doesn't work correctly in a multi-threaded environment, so we need to move glxtest to a separate binary and launch it as stand-alone code.
Depends on D171995
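Launching the test as a separate program, rather than continuing to run code in a forked copy of a multi-threaded process, could look roughly like the sketch below. It uses POSIX posix_spawn() and a pipe to collect the helper's standard output. `RunHelperBinary` is a hypothetical name, and this is not the launcher Firefox uses; error handling is kept minimal.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>
#include <vector>

extern char** environ;

// Launches args[0] as a separate program and collects what it writes to
// stdout. Returns false if the program could not be started or did not
// exit with status 0.
bool RunHelperBinary(const std::vector<std::string>& args, std::string* output) {
  if (args.empty()) return false;
  int fds[2];
  if (pipe(fds) != 0) return false;

  // In the child: make the pipe's write end stdout, then close both ends.
  posix_spawn_file_actions_t actions;
  posix_spawn_file_actions_init(&actions);
  posix_spawn_file_actions_adddup2(&actions, fds[1], STDOUT_FILENO);
  posix_spawn_file_actions_addclose(&actions, fds[0]);
  posix_spawn_file_actions_addclose(&actions, fds[1]);

  std::vector<char*> argv;
  for (const auto& arg : args) argv.push_back(const_cast<char*>(arg.c_str()));
  argv.push_back(nullptr);

  pid_t pid = 0;
  int rc = posix_spawn(&pid, argv[0], &actions, nullptr, argv.data(), environ);
  posix_spawn_file_actions_destroy(&actions);
  close(fds[1]);  // The parent only reads.
  if (rc != 0) {
    close(fds[0]);
    return false;
  }

  output->clear();
  char buf[256];
  ssize_t n;
  while ((n = read(fds[0], buf, sizeof buf)) > 0) {
    output->append(buf, static_cast<size_t>(n));
  }
  close(fds[0]);

  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return false;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because the helper starts from a fresh program image, it does not inherit the half-copied thread state that makes running arbitrary code after fork() unsafe in a multi-threaded parent.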
Comment 27•2 years ago
Backed out 5 changesets (bug 1787182) for causing build bustage at pangofc-font.h and leaks
Backout: https://hg.mozilla.org/integration/autoland/rev/b32a6d33f443a9a5c76fced7c0fb4221ef285f21
Failure logs:
bustage: https://treeherder.mozilla.org/logviewer?job_id=410080200&repo=autoland&lineNumber=32412
leaks: https://treeherder.mozilla.org/logviewer?job_id=410088584&repo=autoland&lineNumber=2582
Comment 33•2 years ago
Backed out 5 changesets (Bug 1787182) for causing leaks on linux asan.
Backout: https://hg.mozilla.org/integration/autoland/rev/5b357a414a06e58cf5e776f6c407ddffce2f3cd0
Failure log: https://treeherder.mozilla.org/logviewer?job_id=410358088&repo=autoland&lineNumber=2582
Comment 36•2 years ago (Assignee)
Asan looks clean now:
https://treeherder.mozilla.org/jobs?repo=try&revision=0a2ec8301871d79d81e9b088883d8d83c53efa04
Comment 38•2 years ago
Backed out for causing bustage on glxtest.cpp and xpcshell failure on test_gfxBlacklist_Device.js
Comment 39•2 years ago
Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/57269e722be4
Comment 42•2 years ago
Backed out for bustages on vaapitest.cpp
Backout link: https://hg.mozilla.org/integration/autoland/rev/23d78b67eef8cb5ca84d73a62e3eef833c7545a7
Log link: https://treeherder.mozilla.org/logviewer?job_id=411700662&repo=autoland&lineNumber=33025
There were also xpcshell perma failures on test_gfxBlacklist_Version.js.
Log link: https://treeherder.mozilla.org/logviewer?job_id=411701341&repo=autoland&lineNumber=4434
Comment 43•2 years ago (Assignee)
Depends on D173486
Comment 44•2 years ago
Summary:
Before bug 1758473, VAAPI was only tested (in the main process) if VAAPI was enabled.
Since then it has been tested in glxtest unconditionally, which has caused
- a startup hang/freeze for some AMD users and
- a glxtest crash, with a coredump file each time, for Nvidia users with the deprecated/incompatible libva-vdpau-driver.
Hotfixes:
bug 1799747 disabled the VAAPI test on Nvidia.
bug 1813500 lets AMD users continue but with disabled hardware rendering. This bug would be the actual fix.
Comment 46•2 years ago (Assignee)
Depends on D174995
Comment 47•2 years ago (Assignee)
Change the status reported for a VA-API test failure from FEATURE_BLOCKED_PLATFORM_TEST to FEATURE_BLOCKED_DRIVER_VERSION, as we now support VA-API, but only on new Mesa.
Depends on D175235
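The status change in this patch amounts to a different mapping from test outcome to reported status. The sketch below uses local stand-ins only: the enumerators are named after the statuses mentioned in the comment, but their definitions, `VaapiOutcome`, and `StatusForVaapiOutcome` are hypothetical and do not reproduce the GfxInfo code.

```cpp
// Local stand-ins named after the statuses mentioned above.
enum class FeatureStatus {
  Ok,
  BlockedPlatformTest,   // Previously reported for a failed VA-API test.
  BlockedDriverVersion,  // Reported after this change.
};

// Hypothetical outcome of the VA-API test utility.
enum class VaapiOutcome { Passed, Failed };

// A failed test is attributed to the driver version, since VA-API itself
// is supported, but only with a sufficiently new Mesa.
inline FeatureStatus StatusForVaapiOutcome(VaapiOutcome outcome) {
  return outcome == VaapiOutcome::Passed ? FeatureStatus::Ok
                                         : FeatureStatus::BlockedDriverVersion;
}
```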
Comment 50•2 years ago (Assignee)
Depends on D175236
Comment 53•2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/ae30d1ad9c33
https://hg.mozilla.org/mozilla-central/rev/ea656d846a56
https://hg.mozilla.org/mozilla-central/rev/579f0b121b02
https://hg.mozilla.org/mozilla-central/rev/2a37788899e2
https://hg.mozilla.org/mozilla-central/rev/4b8a03e2bd4f
https://hg.mozilla.org/mozilla-central/rev/6e0d9d9e1a3d
https://hg.mozilla.org/mozilla-central/rev/561fef5949d5
https://hg.mozilla.org/mozilla-central/rev/0cb52a90a1fe
https://hg.mozilla.org/mozilla-central/rev/d87d273845cc
Comment 54•2 years ago
The patch landed in nightly and beta is affected.
:stransky, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval. Also, don't forget to request an uplift for the patches in the regressions caused by this fix.
- If no, please set status-firefox113 to wontfix.
For more information, please visit auto_nag documentation.