Closed Bug 1739611 Opened 3 years ago Closed 2 years ago

EGL gets blocklisted on nVidia in multi-gpu setups (symbol eglGetDisplayDriverName not defined)

Categories

(Core :: Graphics: WebRender, defect)

Firefox 94
x86_64
Linux
defect

RESOLVED FIXED
107 Branch
Tracking Status
firefox94 --- disabled
firefox95 --- disabled
firefox96 --- disabled
firefox107 --- fixed

People

(Reporter: arnolds, Assigned: rmader)

References

(Blocks 2 open bugs)

Attachments

(5 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:94.0) Gecko/20100101 Firefox/94.0

Steps to reproduce:

I’m using nVidia Geforce GT1030 with 470.82.00 driver and get the message “[GFX1-]: glxtest: libEGL missing eglGetDisplayDriverName” on starting firefox-94. I’m afraid this symbol is not declared in the nVidia 470.82 and 495.44 libEGL?

The Bugbug bot thinks this bug should belong to the 'Core::Graphics: WebRender' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Graphics: WebRender
Product: Firefox → Core

Thanks for the report!
eglGetDisplayDriverName (EGL_MESA_query_driver) does not seem to be supported by the proprietary Nvidia driver.
As long as there are no negative consequences, current behavior seems to be expected.
Please open about:support, click on "Copy text to clipboard" and paste it here.


x11_egltest is tried first:
If pci_count determined by get_pci_status is not exactly 1,

OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Summary: firefox-94 nVidia libEGL symbol eglGetDisplayDriverName nor defined (Core / Graphics::WebRender) → firefox-94 nVidia libEGL symbol eglGetDisplayDriverName not defined

Hi Darkspirit,

thanks for your explanation of x11_egltest. Knowing this, it's clear: I have two graphics adapters. The other one (Intel) is onboard, so I can't remove it. What about also checking the vendor (/sys/bus/pci/devices/0000:01:00.0/vendor) and not using eglGetDisplayDriverName if the vendor ID is nVidia (0x10de)? Or isn't there some other unique attribute of the nVidia libEGL?

thanks again and have a nice weekend,

Ado
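
A minimal sketch of the vendor check suggested above, assuming the sysfs vendor file contains a single hexadecimal ID such as "0x10de". The helper names (ParsePciVendorId, ReadPciVendorId, IsNvidiaDevice) are hypothetical and are not taken from glxtest.cpp; only the path and the 0x10de value come from the comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <fstream>
#include <optional>
#include <string>

// PCI vendor ID for NVIDIA, as quoted in the comment above.
constexpr std::uint16_t kNvidiaVendorId = 0x10de;

// Parses the text of a sysfs "vendor" file, e.g. "0x10de\n".
// Returns std::nullopt if the text is not a 16-bit hexadecimal number.
std::optional<std::uint16_t> ParsePciVendorId(const std::string& text) {
  try {
    std::size_t pos = 0;
    const unsigned long value = std::stoul(text, &pos, 16);
    // Only whitespace (typically a trailing newline) may follow the number.
    if (text.find_first_not_of(" \t\r\n", pos) != std::string::npos) {
      return std::nullopt;
    }
    if (value > 0xffff) {
      return std::nullopt;
    }
    return static_cast<std::uint16_t>(value);
  } catch (const std::exception&) {
    return std::nullopt;  // Empty or non-numeric content.
  }
}

// Hypothetical helper: reads and parses a vendor file such as
// /sys/bus/pci/devices/0000:01:00.0/vendor.
std::optional<std::uint16_t> ReadPciVendorId(const std::string& path) {
  std::ifstream file(path);
  std::string line;
  if (!file || !std::getline(file, line)) {
    return std::nullopt;
  }
  return ParsePciVendorId(line);
}

// True only if the file could be read and holds the NVIDIA vendor ID.
bool IsNvidiaDevice(const std::string& vendorFilePath) {
  return ReadPciVendorId(vendorFilePath) == kNvidiaVendorId;
}
```

A caller could pass "/sys/bus/pci/devices/0000:01:00.0/vendor" to IsNvidiaDevice; the PCI address differs per machine, so a real check would have to enumerate the devices rather than hard-code one.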

Attached file as requested

What about '/sys/bus/pci/devices/0000:01:00.0/driver/module/version' -> 470.82.00? Let me know if I can be of any help.

Blocks: 1737428

What does about:support of https://nightly.mozilla.org look like?

Thanks for your quick reply (fix?)! Unfortunately I no longer have access to the affected machine today. I will test your fix as soon as possible on Monday morning. Thanks and have a nice weekend, Ado

Good morning Darkspirit, just tested Nightly/96.0a1. It behaves like 94.0 with my problem: libEGL + nVidia (active) + Intel (inactive, onboard).
[GFX1-]: glxtest: libEGL missing eglGetDisplayDriverName
[GFX1-]: glxtest: libEGL missing eglGetDisplayDriverName
I'm afraid another heuristic for identifying the proprietary nVidia driver is needed.

Kind regards, Ado

See Also: → 1717328

The severity field is not set for this bug.
:jimm, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)
Severity: -- → S4
Flags: needinfo?(jmathies)

Dear Developers,

after the first very fast response from Darkspirit explaining why the problem exists, it's a bit disappointing to see the low severity "Small/Trivial" now. Without a fix, it's not possible to use Firefox with an Nvidia graphics device, Nvidia's drivers, and libEGL, just because Nvidia's libEGL doesn't provide the symbol eglGetDisplayDriverName. Full performance settings are not enabled although libEGL is available, so Firefox uses only a fraction of the computer's graphics capabilities. For example, Webex reports "Video is not currently available due to low bandwidth", while the same video is shown without any problem with the onboard Intel graphics interface or with Google Chrome.

It would be great to see a fix for this problem in the not too distant future. If I can be of any help, just let me know.

Cheers, Ado

Still a Nightly-only bug: You have Nightly, EGL should be enabled there, but the egl test does not succeed.
Default config works: GLX works according to your about:support and EGL isn't shipped yet to X11 on Nvidia.
This bug blocks shipping. It will be looked at.

There's already a TODO for this case in the code: https://searchfox.org/mozilla-central/source/toolkit/xre/glxtest.cpp#613-616

Note that this only affects multi-gpu setups.

Assignee: nobody → robert.mader
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Summary: firefox-94 nVidia libEGL symbol eglGetDisplayDriverName not defined → EGL gets blocklisted on nVidia in multi-gpu setups (symbol eglGetDisplayDriverName not defined)

Thanks for letting me know, Robert!

Just to clarify: "multi-gpu setup" in my case means the onboard Intel device is not used (but naturally can't be removed), because I needed an additional nVidia card to drive a screen with a higher resolution.

Have a nice day, Ado

While this disables EGL on some devices, this doesn't block bug 1737428.

No longer blocks: 1737428
See Also: → 1742994

Is there at least a temporary workaround for this bug, so we can force EGL when using Nvidia? Even an ugly hack (or a patch) would be sufficient, so we can use EGL while we wait for the fix. Thanks!

Does it work if you start Firefox with MOZ_ENABLE_WAYLAND=1 environment variable when using Wayland
or if you set gfx.x11-egl.force-enabled=true on about:config when using X11?

(In reply to Darkspirit from comment #18)

Does it work if you start Firefox with MOZ_ENABLE_WAYLAND=1 environment variable when using Wayland
or if you set gfx.x11-egl.force-enabled=true on about:config when using X11?

No, it doesn't work with gfx.x11-egl.force-enabled=true (same error "glxtest: libEGL missing eglGetDisplayDriverName").

I'm unable to test with Wayland, since the Wayland session is broken and I get only a black screen, so I'm stuck with X11.

IIUC, glxtest should only be relevant for decision making. Do you see "EGL_VENDOR" in WebGL info on about:support? Then you are using EGL.

(In reply to Darkspirit from comment #20)

IIUC, glxtest should only be relevant for decision making. Do you see "EGL_VENDOR" in WebGL info on about:support? Then you are using EGL.

No, I don't see "EGL_VENDOR" in WebGL. I see only "GLX_VENDOR":

GLX_VENDOR(client): NVIDIA Corporation
GLX_VENDOR(server): NVIDIA Corporation

(In reply to Darkspirit from comment #20)

IIUC, glxtest should only be relevant for decision making. Do you see "EGL_VENDOR" in WebGL info on about:support? Then you are using EGL.

IIRC we currently hard-block EGL if egltest in glxtest fails. We really should fix this one, will try to have a look soon.

Can you update to Nvidia driver 495 and use Ubuntu 22.04 with Wayland?

(In reply to Dan from comment #17)

Even an ugly hack (or a patch)

$ sudo apt purge *nvidia*

(In reply to Darkspirit from comment #20)

IIRC we currently hard-block EGL if egltest in glxtest fails. We really should fix this one, will try to have a look soon.

Ok, if you need testing for some patch or whatever, just ask. Thank you!

(In reply to Darkspirit from comment #23)

Can you update to Nvidia driver 495 and use Ubuntu 22.04 with Wayland?

Yes, I'm already using the latest NVIDIA 495.46 driver.

Regarding Ubuntu, I use my own installation, so it wouldn't help.

I was looking at the relevant code below:

  if (eglGetDisplayDriverName) {
    // TODO(aosmond): If the driver name is empty, we probably aren't using Mesa
    // and instead a proprietary GL, most likely NVIDIA's. The PCI device list
    // in combination with the vendor name is very likely sufficient to identify
    // the device.
    const char* driDriver = eglGetDisplayDriverName(dpy);
    if (driDriver) {
      record_value("DRI_DRIVER\n%s\n", driDriver);
    }
  } else if (require_driver) {
    record_warning("libEGL missing eglGetDisplayDriverName");

What exactly should eglGetDisplayDriverName(dpy) return in the Nvidia case? Just so I know how bad the situation is.

Or I could hard-code the expected value just as a quick workaround...

(In reply to Dan from comment #25)

(In reply to Darkspirit from comment #23)
...
What exactly should eglGetDisplayDriverName(dpy) return in the Nvidia case? Just so I know how bad the situation is.

Or I could hard-code the expected value just as a quick workaround...

Hi Dan and all, good to see that there is new life in this topic. If you look at my first report, you'll see that this if condition is never fulfilled with NVIDIA, since the symbol "eglGetDisplayDriverName" is not declared in the NVIDIA driver. I'm afraid that the PCI device list has to be scanned to verify that we are dealing with an NVIDIA device [e.g. for my desktop: 01:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)] if eglGetDisplayDriverName is not defined.

Cheers and all the best for 2022, Ado

Just for the record: the main issue here is that we can't rely on users using recent drivers. If everyone was on recent Mesa or Nvidia drivers it would be easy to solve. But we have to take e.g. Mesa versions without eglGetDisplayDriverName into account as well. What we could easily do is make gfx.x11-egl.force-enabled or at least MOZ_X11_EGL=1 force EGL even if the EGL test failed.
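
A rough sketch of the override described above, in which either the pref or the environment variable forces EGL even when the EGL test failed. The function names are hypothetical, and treating only the value "1" as "enabled" is an assumption based on the MOZ_X11_EGL=1 spelling in the comment; the real pref and environment handling in Firefox may differ.

```cpp
#include <cstring>

// Assumption: only the literal value "1" counts as a request for EGL.
bool EnvRequestsEgl(const char* value) {
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical decision helper: EGL is used if the test passed, or if
// the user forced it via gfx.x11-egl.force-enabled or MOZ_X11_EGL=1.
bool ShouldUseEgl(bool eglTestPassed, bool forceEnabledPref,
                  const char* mozX11EglEnv) {
  return eglTestPassed || forceEnabledPref || EnvRequestsEgl(mozX11EglEnv);
}
```

In practice the last argument would come from something like std::getenv("MOZ_X11_EGL"), which returns a null pointer when the variable is unset.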

(In reply to Robert Mader [:rmader] from comment #27)

gfx.x11-egl.force-enabled or at least MOZ_X11_EGL=1 force EGL even if the EGL test failed.

Would be great to have this!

If this helps in any way (from about:support):

Failure Log
(#0) Error: glxtest: libEGL no display
(#1) Error: glxtest: No visuals found
(#2) Error: glxtest: libEGL no display
(#3) Error: More than 1 GPU vendor detected via PCI, cannot deduce vendor
(#4) Error: PCI candidate 0x8086/0x5912 --> Intel onboard device
(#5) Error: PCI candidate 0x10de/0x1d01 --> NVIDIA
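
The TODO quoted earlier suggests that the PCI device list combined with the vendor name is likely enough to identify the device. A minimal sketch of that idea follows, using the two PCI candidates from the failure log. The type and function names are hypothetical, and the assumption that the proprietary driver's EGL vendor string contains "NVIDIA" is not confirmed anywhere in this bug.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct PciCandidate {
  std::uint16_t vendor;
  std::uint16_t device;
};

// Hypothetical helper: picks the GPU in use from the PCI candidates.
// With one candidate there is nothing to deduce. With several, the EGL
// vendor string is used to select the single NVIDIA device, if any.
std::optional<PciCandidate> DeduceDevice(
    const std::vector<PciCandidate>& candidates,
    const std::string& eglVendor) {
  if (candidates.size() == 1) {
    return candidates.front();
  }
  // Assumption: the proprietary driver reports a vendor string
  // containing "NVIDIA".
  if (eglVendor.find("NVIDIA") == std::string::npos) {
    return std::nullopt;
  }
  std::optional<PciCandidate> match;
  for (const PciCandidate& candidate : candidates) {
    if (candidate.vendor != 0x10de) {
      continue;
    }
    if (match) {
      return std::nullopt;  // Two NVIDIA devices: still ambiguous.
    }
    match = candidate;
  }
  return match;
}
```

With the candidates 0x8086/0x5912 and 0x10de/0x1d01 from the log and an NVIDIA vendor string, this would select 0x10de/0x1d01 instead of giving up with "cannot deduce vendor".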

Blocks: 1737428

After bug 1751252 and bug 1742994 this should be the only case where Nvidia users get HW-WR with GLX. Otherwise it should be all EGL or SW-WR, thus this is the last blocker to close all Nvidia+GLX-only bugs.

(In reply to Robert Mader [:rmader] from comment #30)

After bug 1751252 and bug 1742994 this should be the only case where Nvidia users get HW-WR with GLX. Otherwise it should be all EGL or SW-WR, thus this is the last blocker to close all Nvidia+GLX-only bugs.

Hello! Any idea when this bug will be fixed? It's been 5 months since it's opened... Thank you!

We have been running the EGL test before the GLX one for a while now.
Some reordering and ignoring the case of multi-GPU systems with
outdated Mesa, combined with the fact that the only non-Mesa driver
where we enable HW-WR is the Nvidia one (which in turn we only support
on driver versions with EGL support), allows us to do a bunch of cleanups.

  • Stop requiring EGL_MESA_query_driver support for EGL on multi-GPU
    systems.
  • Make use of the fact that we always run the EGL test first, stop
    doing it after the GLX one.
  • Lots of cleanups that become possible as the result.

Potential issues to have an eye on:

  • EGL on Nvidia-Prime should now get HW-WR on EGL (including dmabuf
    etc.). This was previously blocked and thus needs testing.
  • Multi-GPU systems with old Mesa versions between 17.0 and 19.0 may
    lose HW-WR.
  • Mesa users on Xorg using 30bit color depth now run the EGL GL test
    fully (no issues expected here).
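
For illustration, whether a display advertises EGL_MESA_query_driver can be decided from its space-separated extension string (as returned by eglQueryString with EGL_EXTENSIONS). The sketch below shows only that string check; the helper name is hypothetical and this is not the code from the patch. Under the first bullet above, a missing extension would simply mean "driver name unknown" rather than a failed test.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `name` appears as a whole token
// in a space-separated extension list. Matching whole tokens avoids
// false positives from extensions that merely share a prefix.
bool HasEglExtension(const std::string& extensions, const std::string& name) {
  std::istringstream stream(extensions);
  std::string token;
  while (stream >> token) {
    if (token == name) {
      return true;
    }
  }
  return false;
}
```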

Here is a try-build with the patch from above. Testing on an affected system would be highly appreciated, given that it was not possible to force-enable and test EGL on Prime on Nvidia so far.

https://treeherder.mozilla.org/jobs?repo=try&revision=5441652461a16ddc00921f5c52eb13ed86228e61

Edit: direct download link https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/OSvlUuJ0RsaR29SkJLHJpw/runs/0/artifacts/public/build/target.tar.bz2

(In reply to Robert Mader [:rmader] from comment #33)

https://treeherder.mozilla.org/jobs?repo=try&revision=5441652461a16ddc00921f5c52eb13ed86228e61

I tested your patch here and it works perfectly!

Thank you very much!

(In reply to Dan from comment #34)

I tested your patch here and it works perfectly!

Thank you very much!

Thanks! Mind attaching your about:support ("copy text to clipboard" -> paste in a comment here -> bz will ask to make it an attachment -> yes) here so I can have a quick check? :)

(In reply to Robert Mader [:rmader] from comment #35)
> Thanks! Mind attaching your `about:support` ("copy text to clipboard" -> paste in a comment here -> bz will ask to make it an attachment -> yes) here so I can have a quick check? :)

No problem ;-)

Version: 105.0

Hm, are you sure that's the build from above?

Edit: ah, is that your own build?

(In reply to Robert Mader [:rmader] from comment #37)

Version: 105.0

Hm, are you sure that's the build from above?

I applied your patch directly to my repository (since I always compile Firefox from scratch).

So it's your patch against the latest 105 release.

Right, makes sense :)
Great, looks good!

Hello, as requested by :rmader attached is a copy of the about:support page. Webrender fails to enable. The console output:

$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia ./firefox
[GFX1-]: No GPUs detected via PCI
[GFX1-]: glxtest: process failed (received signal 11)

(In reply to killercontact1.7.4.0 from comment #40)

Created attachment 9296084 [details]
about:support output with :rmader's patch enabled running driver 515.65.01

Hello, as requested by :rmader attached is a copy of the about:support page. Webrender fails to enable. The console output:

$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia ./firefox
[GFX1-]: No GPUs detected via PCI
[GFX1-]: glxtest: process failed (received signal 11)

Thanks! This looks like bug 1759315. Can you test again with MOZ_ENABLE_WAYLAND=0 or in an X11 session? Wayland is already enabled by default on nightly but not on release/beta.

See Also: → 1759315

(In reply to Robert Mader [:rmader] from comment #41)

(In reply to killercontact1.7.4.0 from comment #40)

Created attachment 9296084 [details]
about:support output with :rmader's patch enabled running driver 515.65.01

Hello, as requested by :rmader attached is a copy of the about:support page. Webrender fails to enable. The console output:

$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia ./firefox
[GFX1-]: No GPUs detected via PCI
[GFX1-]: glxtest: process failed (received signal 11)

Thanks! This looks like bug 1759315. Can you test again with MOZ_ENABLE_WAYLAND=0 or in an X11 session? Wayland is already enabled by default on nightly but not on release/beta.

I ran it exactly as requested. Running on Fedora 36, GNOME Xorg. Now:

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia MOZ_ENABLE_WAYLAND=0 ./firefox

      "[GFX1-]: glxtest: VA-API test failed: failed to initialise VAAPI connection.",
      "[GFX1-]: Failed to create EGLSurface!: 0x3009",
      "[GFX1-]: Failed to create EGLSurface. 1 renderers, 0 active.",
      "[GFX1-]: Handling webrender error 3",
      "[GFX1-]: Fallback WR to SW-WR"

(In reply to killercontact1.7.4.0 from comment #42)

...
I ran it exactly as requested. Running on Fedora 36, GNOME Xorg. Now:
...

Thanks! Can you briefly confirm that other EGL apps, such as glmark2-es2, do work with the same environment variables?

Flags: needinfo?(killercontact1.7.4.0)

Going forward with the patch despite the issue above - proprietary Nvidia in a multi-gpu setup on Wayland is just too niche to be a blocker here (and might be caused by driver issues). Let's continue with that in a follow-up.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 107 Branch
Regressions: 1792771

(In reply to Robert Mader [:rmader] from comment #46)

Going forward with the patch despite the issue above - proprietary Nvidia in a multi-gpu setup on Wayland is just too niche to be a blocker here (and might be caused by driver issues). Let's continue with that in a follow-up.

Apologies for the late reply. Running the benchmarks results in the same error; unfortunately, this means it is either a misconfiguration or a driver bug. This is Fedora 36 with up-to-date packages, and even reinstalling the driver changes nothing.

Even with the new patch the problem remains, but I will try again as updates continue to arrive.

Flags: needinfo?(killercontact1.7.4.0)