Closed Bug 1745172 Opened 2 years ago Closed 4 months ago

EGL/X11/hybrid Intel+NV (NV unused, but driver 495.44 installed): Transparent window (only borders) on Intel video (Can be fixed with `__EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_mesa.json`: libglvnd bug?)

Categories

(Core :: Graphics, defect)

Firefox 95
x86_64
Linux
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox95 --- wontfix
firefox96 --- wontfix
firefox97 --- wontfix
firefox98 --- wontfix
firefox101 --- wontfix
firefox102 --- affected
firefox103 --- affected

People

(Reporter: robert, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: correctness)

Attachments

(1 file)

Attached file glxinfo.log

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0

Steps to reproduce:

Start Firefox 95 on a Linux X11 session, Fedora 35 with mesa RPMs at 21.3.1-2.fc35.

Actual results:

The main application window is transparent; only the border is visible. After closing the window several times with Alt+F4, it sometimes starts normally.

From console it prints:

ATTENTION: default value of option mesa_glthread overridden by environment.

Setting gfx.x11-egl.force-disabled to true is a workaround. There is no problem on a Wayland session either.

Expected results:

Windows should display perfectly

The Bugbug bot thinks this bug should belong to the 'Core::Graphics' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Graphics
Product: Firefox → Core
Severity: -- → S3
OS: Unspecified → Linux

Thanks for the report!
Which X11 desktop environment do you have? Gnome? KDE?
Do you have a second Nvidia GPU?
Do you have the proprietary Nvidia driver installed? If yes, which version?
Can this bug be fixed by setting gfx.x11-egl.force-disabled back to false, but gfx.webrender.max-partial-present-rects to 0, gfx.webrender.allow-partial-present-buffer-age to false and restarting Firefox?

(In reply to Darkspirit from comment #2)

Thanks for the report!
Which X11 desktop environment do you have? Gnome? KDE?

GNOME on Xorg. I usually use the default Wayland session, but occasionally switch to Xorg in order to use legacy screen capture software. I noticed the Firefox problem on Xorg because I forgot to switch back. I'm not sure whether it happened before release 95.

Do you have a second Nvidia GPU?

Yes, the laptop is a hybrid GPU model. The NVIDIA GPU is disabled at startup by a systemd service that runs the following commands before plymouth-start.service and gdm.service:

echo 1 > /sys/bus/pci/devices/0000:01:00.1/remove
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove

This powers down the NVIDIA GPU, and it isn't even listed by lspci afterwards.
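As a rough way to confirm that the removal took effect, one could look for NVIDIA's PCI vendor ID (0x10de) in sysfs. This is only a sketch: has_nvidia_pci_device is a hypothetical helper name, and the optional directory argument exists only so the check can be pointed at a test tree.

```shell
# Sketch: report whether any PCI device with NVIDIA's vendor ID is still visible.
# has_nvidia_pci_device is a hypothetical helper, not part of any existing tool.
has_nvidia_pci_device() {
    pci_root="${1:-/sys/bus/pci/devices}"
    for vendor_file in "$pci_root"/*/vendor; do
        # Each device directory exposes its vendor ID as a hex string, e.g. 0x8086 for Intel.
        if [ -r "$vendor_file" ] && [ "$(cat "$vendor_file")" = "0x10de" ]; then
            return 0
        fi
    done
    return 1
}

if has_nvidia_pci_device; then
    echo "NVIDIA PCI device still present"
else
    echo "no NVIDIA PCI device visible"
fi
```

After the two remove writes above, the second message would be expected on a setup like this one.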

At the kernel level, the kernel cmdline includes:

rd.driver.blacklist=nouveau modprobe.blacklist=nouveau rd.driver.blacklist=nvidia modprobe.blacklist=nvidia rd.driver.blacklist=nvidia_drm modprobe.blacklist=nvidia_drm rd.driver.blacklist=nvidia_modeset modprobe.blacklist=nvidia_modeset rd.driver.blacklist=nvidia_uvm modprobe.blacklist=nvidia_uvm

This blacklists these modules (nouveau and nvidia*).

Do you have the proprietary Nvidia driver installed? If yes, which version?

Yes, the one provided by RPM Fusion, at version kmod-nvidia-5.15.6-200.fc35.x86_64-495.44-1.fc35.x86_64

Can this bug be fixed by setting gfx.x11-egl.force-disabled back to false, but gfx.webrender.max-partial-present-rects to 0, gfx.webrender.allow-partial-present-buffer-age to false and restarting Firefox?

No, this does not solve the problem, 3 out of 5 restarts open with a transparent window.

Note: In case you are wondering about the complexity of the Nvidia setup, I have another boot menu option that enables the NVIDIA GPU only when I need it (some non-professional Blender work).

Can this bug be prevented by starting Firefox with the following command?
$ __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_mesa.json firefox

Blocks: wr-nv-linux
Keywords: correctness
Hardware: Unspecified → x86_64
See Also: → 1737078
Summary: [X11][EGL] Transparent window (only borders) on Intel video → EGL/X11/hybrid Intel+NV (NV unused, but driver 495.44 installed): Transparent window (only borders) on Intel video

Yes, setting __EGL_VENDOR_LIBRARY_FILENAMES is a workaround for the problem. Interesting setting, I think I will add it to my environment when the NVIDIA GPU is disabled.

The referenced bug looks like this one too, with the exception that I permanently disable the NVIDIA GPU, not using Bumblebee.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: EGL/X11/hybrid Intel+NV (NV unused, but driver 495.44 installed): Transparent window (only borders) on Intel video → EGL/X11/hybrid Intel+NV (NV unused, but driver 495.44 installed): Transparent window (only borders) on Intel video (Can be fixed with `__EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_mesa.json`: libglvnd bug?)
Blocks: linux-egl

Hm, this is actually quite bad - should we disable WR for the moment if a secondary NV device is detected?

Eric: here's another issue that you should probably know about :)

Flags: needinfo?(ekurzinger)

This should be fixed by the following commit to our egl-wayland library https://github.com/NVIDIA/egl-wayland/commit/d4937adc5cd04ac7df98fc5616e40319fb52fdee

Sorry I haven't cut a new release yet since it was merged. I suppose I probably should so distros start picking it up. I will do so tomorrow.

Flags: needinfo?(ekurzinger)

Can a commit to egl-wayland really fix this EGL/X11 (not Xwayland) bug?

I should have read the description more carefully, apologies. A similar issue was previously reported to us on Wayland (with the same __EGL_VENDOR_LIBRARY_FILENAMES work-around) but we were not aware of anything like this affecting X11. You're right, though, in that case an egl-wayland update would not fix it.

Looking at https://bugzilla.mozilla.org/show_bug.cgi?id=1731480 it does seem like there might be an issue with libglvnd incorrectly dispatching some EGLDevice-related calls when the NVIDIA driver is present, maybe that's responsible?

I think my bug report might be related to this:
https://bugzilla.mozilla.org/show_bug.cgi?id=1744947

/sys/bus/pci/devices/0000:01:00.1/

See Also: → 1754074
See Also: → 1771382

(Kevin Locke from bug 1771382 comment 3)

If I understand correctly, the GLVND EGL dispatching tries libEGL_nvidia.so before libEGL_mesa.so because 10_nvidia.json precedes 50_mesa.json. Normally (based on strace of eglgears_wayland) libEGL_nvidia.so loads libnvidia-glsi.so which reads /proc/modules and each /sys/bus/pci/devices/*/config, then forks and execs /usr/bin/nvidia-modprobe (which also reads /proc/modules and each /sys/bus/pci/devices/*/config, then exits with code 1, without attempting mknod or init_module) and gives up after nvidia-modprobe exits, letting libEGL_mesa.so try (and succeed on my system).

Perhaps if the RDD process is unable to read or fork+exec those files, libnvidia-glsi.so falls back to attempting mknod itself and gets killed (for seccomp sandbox violation: syscall 259, mknodat)? That could explain why the issue only occurs on systems with the nvidia drivers installed and the nvidia module not loaded (because otherwise mknod would not be needed as the device files would already exist).

However, that doesn't explain why Firefox sometimes crashes when the RDD process dies (especially when particular videos are loaded at the same time in multiple tabs).
[...]

(Darkspirit from bug 1771382 comment 4)

[...]
Wrong EGL driver selection:

You seem to be right.
But it also seems to affect the main process which doesn't have a sandbox (bug 1745172).

That's what I found a few days ago:
https://www.reddit.com/r/pop_os/comments/rnergb/firefox_shows_blank_screen_after_upgrading_to_2110/

This only occurs with Integrated graphics mode, not Nvidia graphics mode;
This does not always occur. Around 1 in 3 attempts to run firefox will succeed.
Even if firefox starts successfully, opening Help -> About Firefox will still show a blank screen.

https://askubuntu.com/questions/1380600/firefox-occasionally-not-rendering-its-own-window-when-opened

Turns out it has something to do in the case when you disable the Nvidia card and use integrated graphics. Firefox seems to be trying to render through the disabled GPU. To fix add this to your .profile:

if ! grep -w -q nvidia <(lsmod) ; then
    export __EGL_VENDOR_LIBRARY_FILENAMES="/usr/share/glvnd/egl_vendor.d/50_mesa.json"
fi
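The snippet above relies on bash process substitution, which a plain POSIX sh reading .profile may not support. A hedged alternative is sketched below: it reads /proc/modules directly and only sets the variable when the Mesa JSON file is readable. nvidia_module_loaded is a hypothetical helper name, and the optional file argument is there only for testing.

```shell
# Sketch: POSIX sh variant of the check above.
# nvidia_module_loaded is a hypothetical helper, not part of any existing tool.
# It expects a file in /proc/modules format, where the first field is the module name.
nvidia_module_loaded() {
    modules_file="${1:-/proc/modules}"
    # Match only the exact module name "nvidia", not nvidia_drm or nvidia_modeset.
    awk '$1 == "nvidia" { found = 1 } END { exit !found }' "$modules_file" 2>/dev/null
}

mesa_json="/usr/share/glvnd/egl_vendor.d/50_mesa.json"
# Only force the Mesa vendor when the nvidia module is absent and the file exists.
if ! nvidia_module_loaded && [ -r "$mesa_json" ]; then
    export __EGL_VENDOR_LIBRARY_FILENAMES="$mesa_json"
fi
```

Unlike the grep -w version, this matches only on the module name column, so it treats the nvidia module as loaded only when it has its own entry.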

The user has 10_nvidia.json and 50_mesa.json.
glvnd just picks the json file with the lowest number.
The Nvidia driver doesn't reliably reject its responsibility when the GPU is disabled, the kernel module not loaded, etc.
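To see which vendor file would come first on a given system, the JSON files can be listed in file name order. The sketch below assumes the usual glvnd search directories (/etc/glvnd/egl_vendor.d and /usr/share/glvnd/egl_vendor.d) and relies on the shell's sorted glob expansion as an approximation of glvnd's ordering. list_egl_vendors is a hypothetical helper name.

```shell
# Sketch: print EGL vendor JSON files, directory by directory, in file name order.
# list_egl_vendors is a hypothetical helper; the default directories are an assumption.
list_egl_vendors() {
    [ "$#" -gt 0 ] || set -- /etc/glvnd/egl_vendor.d /usr/share/glvnd/egl_vendor.d
    for dir in "$@"; do
        # Pathname expansion returns matches sorted, so 10_* comes before 50_*.
        for f in "$dir"/*.json; do
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
    return 0
}

list_egl_vendors
```

On a system like the one described, 10_nvidia.json would be expected to appear before 50_mesa.json in the output.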

Hi Robert, is this still an issue or can we close the bug?

Flags: needinfo?(robert)

This bug can be closed as fixed. I tested again after unsetting the __EGL_VENDOR_LIBRARY_FILENAMES workaround, and it worked fine on an X11 session.

Flags: needinfo?(robert)

Perfect, thanks!

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → WORKSFORME