Closed Bug 1706452 Opened 3 years ago Closed 3 years ago

Firefox Nightly 90.0a1 (2021-4-20) on Wayland started on X with the error glxtest: Could not connect to wayland socket

Categories

(Core :: Widget: Gtk, defect, P2)

Firefox 90
defect

RESOLVED FIXED
90 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox88 --- unaffected
firefox89 --- unaffected
firefox90 --- fixed

People

(Reporter: matt.fagnani, Assigned: glandium)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0

Steps to reproduce:

I updated to Firefox Nightly 90.0a1 (2021-4-20) in a Fedora 34 KDE Plasma installation. I started it in Plasma 5.21.4 on Wayland with the following steps:

1. Start konsole.
2. Change to the directory Firefox Nightly 90.0a1 is in.
3. Run MOZ_ENABLE_WAYLAND=1 ./firefox

Actual results:

Firefox Nightly 90.0a1 (2021-4-20) started on X (through Xwayland) instead of on Wayland. The error "glxtest: Could not connect to wayland socket" was shown in konsole and in Troubleshooting Information > Graphics > Failure Log. The following errors were in Troubleshooting Information > Graphics:

WebGL 1 Driver Renderer
WebGL creation failed:

  • WebglAllowWindowsNativeGl:false restricts context creation on this system. ()
  • Exhausted GL driver options. (FEATURE_FAILURE_WEBGL_EXHAUSTED_DRIVERS)

WebGL 2 Driver Renderer
WebGL creation failed:

  • AllowWebgl2:false restricts context creation on this system. ()

Window Protocol xwayland
Desktop Environment kde

The compositor was OpenGL instead of the default WebRender. The WebGL Driver Renderer errors also happened when I started 90.0a1 (2021-4-20) on X with MOZ_ENABLE_WAYLAND=0 ./firefox

These errors didn't happen with 90.0a1 (2021-4-19) or earlier. I ran MOZ_ENABLE_WAYLAND=1 mozregression --bad 2021-4-20 --good 2021-4-19

5:55.96 INFO: Narrowed integration regression window from [cb69b22f, f8eb1926] (3 builds) to [30af8f80, f8eb1926] (2 builds) (~1 steps left)
5:55.96 INFO: No more integration revisions, bisection finished.
5:55.96 INFO: Last good revision: 30af8f80e2754783f1981485dd911a2e341d9afd
5:55.96 INFO: First bad revision: f8eb1926bfdbb630b4aa17e064c3472c4a32709e
5:55.96 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=30af8f80e2754783f1981485dd911a2e341d9afd&tochange=f8eb1926bfdbb630b4aa17e064c3472c4a32709e

The first bad revision f8eb1926bfdbb630b4aa17e064c3472c4a32709e had the following changes.
User: mh@glandium.org
Push date: Tue Apr 20 01:59:31 2021 +0000
Changesets (changeset, patch author, commit message):

  • f8eb1926bfdbb630b4aa17e064c3472c4a32709e, Mike Hommey: Bug 1377445 - Remove gtk+2 from docker images. r=firefox-build-system-reviewers,andi,mhentges
  • 0ce69d9f596b3910876f4838c5b04250a553b9c4, Mike Hommey: Bug 1377445 - Remove build dependencies on gtk+2. r=firefox-build-system-reviewers,mhentges
  • 44d676a26d1a8c6514eca25ef4310e7f2a9df873, Mike Hommey: Bug 1377445 - Use dlsym for atk_bridge_adaptor_init. r=eeejay

Expected results:

Firefox Nightly 90.0a1 on Wayland would start normally without errors.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Regressed by: 1377445
Has Regression Range: --- → yes
Assignee: nobody → mh+mozilla
Status: NEW → ASSIGNED

Set release status flags based on info from the regressing bug 1377445

Blocks: wayland

I guess making the fallback symbols weak doesn't help? Not even when they're inside libxul?

PS: Never mind, I see symbols can only be weak when statically linking.

(Although GLibC ld.so can also do strong-over-weak resolution when dynamically linking, this goes against the standard and is hidden behind LD_DYNAMIC_WEAK.)

I believe I'm reproducing this same error on native X11 on Nvidia. It also started with my Nightly update yesterday, and I was kicked back to legacy GL rendering, with both WebRender and WebGL reported as unqualified. The about:support failure log shows:

(#0) Error	No GPUs detected via PCI
(#1) Error	glxtest: process failed (received signal 11)

lspci | grep -i vga shows

08:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2080 Rev. A] (rev a1)

I can file another bug if necessary but this seems related.

Priority: -- → P2
Pushed by robert.mader@posteo.de:
https://hg.mozilla.org/integration/autoland/rev/1859720a3225
Reintroduce a mozgtk library after bug 1377445. r=firefox-build-system-reviewers,rmader,mhentges

Confirmed here in Sway 1.6 as well. I do have 2 GPUs but I don't have the nvidia driver installed at all and nouveau is blacklisted, so the system only really recognises the intel HD630.

~  lspci | grep -i vga
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 630 (rev 04)
~ lspci | grep -i nvid  
01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] (rev a1)

Firefox 89b2 is working fine.

@rmader now do we have to build it with either MOZ_WAYLAND=1 or MOZ_WAYLAND=gtk to enable wayland support? ac_add_options --enable-default-toolkit=cairo-gtk3-wayland still matters?

(In reply to Pedro Lara Campos from comment #10)

@rmader now do we have to build it with either MOZ_WAYLAND=1 or MOZ_WAYLAND=gtk to enable wayland support? ac_add_options --enable-default-toolkit=cairo-gtk3-wayland still matters?

I don't think anything changed. As soon as the patch above hits central, things should be as they were before bug 1377445.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch
Blocks: 1707124

90.0a1 (2021-4-22) with build id 20210422213157 started on Wayland without the error glxtest: Could not connect to wayland socket. Thanks. The WebGL Driver Renderer errors were still present in 90.0a1 (2021-4-22) 20210422213157, and OpenGL compositing was used instead of WebRender. I noticed preferences like gfx.blacklist.canvas2d.acceleration: 4 and gfx.blacklist.canvas2d.acceleration.failureid: FEATURE_FAILURE_GLXTEST_FAILED, which had started with 90.0a1 (2021-4-20). Only the gfx.blacklist.*.failureid preferences remained with 90.0a1 (2021-4-22) 20210422213157. I removed all of the 30 or so gfx.blacklist.*.failureid preferences in about:config and restarted Firefox. After that, the WebGL Driver Renderer errors were gone, and WebRender compositing was used again.
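For anyone with many stale entries, the cleanup described above could also be sketched as a script. This is only an illustration, not what was done in this comment (the preferences were removed by hand in about:config). It assumes the preferences are stored as user_pref("name", value); lines in the profile's prefs.js and that Firefox is not running while the file is edited. The function name strip_blacklist_failure_ids is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the start of a line such as:
#   user_pref("gfx.blacklist.canvas2d.acceleration.failureid", "FEATURE_FAILURE_GLXTEST_FAILED");
PREF_LINE = re.compile(r'^\s*user_pref\("(?P<name>[^"]+)"\s*,')


def strip_blacklist_failure_ids(prefs_text: str) -> Tuple[str, List[str]]:
    """Drop gfx.blacklist.*.failureid lines; return (cleaned text, removed names)."""
    kept: List[str] = []
    removed: List[str] = []
    for line in prefs_text.splitlines(keepends=True):
        match = PREF_LINE.match(line)
        name = match.group("name") if match else ""
        if name.startswith("gfx.blacklist.") and name.endswith(".failureid"):
            removed.append(name)
        else:
            # Comments, blank lines, and every other preference are kept as-is.
            kept.append(line)
    return "".join(kept), removed


if __name__ == "__main__":
    sample = (
        'user_pref("gfx.blacklist.canvas2d.acceleration.failureid", '
        '"FEATURE_FAILURE_GLXTEST_FAILED");\n'
        'user_pref("browser.startup.page", 3);\n'
    )
    cleaned, removed = strip_blacklist_failure_ids(sample)
    print(removed)
```

The function only transforms text; reading and writing the real prefs.js (and keeping a backup first) is left to the caller.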

I can confirm with build id 20210422213157 on my Nvidia+X11+Gnome system it is once again defaulting to WebRender and WebGL support has returned.

(In reply to Vash63 from comment #14)

I can confirm with build id 20210422213157 on my Nvidia+X11+Gnome system it is once again defaulting to WebRender and WebGL support has returned.

While this is somewhat surprising, it may make sense, see bug 1706762

See Also: → 1706762

I am at 8adfd0fb0253e6cbb34b5d66211c8a8ecc138932 and I still experience this regression.
AMD + Wayland. "glxtest: Could not connect to wayland socket" and Firefox is run under Xwayland.

I'm also on Arch, running sway on Intel graphics, latest commit as of writing. I'm not running the prebuilt nightly from Mozilla, but an Arch-specific build provided by a maintainer (https://pkgbuild.com/~heftig/repo/x86_64/).
Just like the above commenter, I'm still running into this issue. LD_PRELOAD=/usr/lib/libwayland-client.so does allow Firefox to use Wayland again, but WR falls back to software mode and WebGL is unavailable. I've cleared all the gfx.blacklist.* from my preferences. I see this in my terminal:

[GFX1-]: Failed to create EGLSurface
[GFX1-]: Fallback WR to SW-WR

I made a mistake while posting the code block above, here it is corrected:

[GFX1-]: window is null
[GFX1-]: Failed to create EGLSurface
[GFX1-]: Fallback WR to SW-WR

Well, I've already figured it out. Using LD_PRELOAD=/usr/lib/firefox-nightly/libmozgtk.so instead does the trick, with Wayland, WR and WGL working properly. Something might be wrong with the build I'm using, or the fix isn't entirely correct.
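For reference, the workaround above can be sketched as a small launcher helper. This is a minimal sketch, not part of the fix: the helper name with_preload is hypothetical, the library path is the one quoted in this comment, and the binary name in the commented-out launch line is an assumption that depends on the distribution.

```python
import os
from typing import Dict, Mapping


def with_preload(base_env: Mapping[str, str], library_path: str) -> Dict[str, str]:
    """Return a copy of base_env with library_path placed first in LD_PRELOAD."""
    env = dict(base_env)
    existing = env.get("LD_PRELOAD", "")
    # ld.so accepts a colon- or space-separated list, so existing entries are kept.
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


if __name__ == "__main__":
    env = with_preload(os.environ, "/usr/lib/firefox-nightly/libmozgtk.so")
    env["MOZ_ENABLE_WAYLAND"] = "1"
    print(env["LD_PRELOAD"])
    # To launch the browser with this environment (binary name is an assumption):
    # import subprocess; subprocess.run(["firefox-nightly"], env=env)
```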

It looks like libmozgtk.so doesn't appear in dependentlibs.list, I think that's the culprit. Any idea what could be causing that?
Sorry for the spam.

This fix also didn't work for me. I also need to use LD_PRELOAD=/usr/lib/firefox/libmozgtk.so in order to be able to open Firefox.

Working for me in Ubuntu 21.04 and Firefox 90.0a1hg20210424r577289-0ubuntu0.21.04.1~umd1 2021-04-24. No issues with libraries. Installed from https://launchpad.net/~ubuntu-mozilla-daily/+archive/ubuntu/ppa

(In reply to novenary from comment #17)

I'm also on Arch, running sway on Intel graphics, latest commit as of writing. I'm not running the prebuilt nightly from Mozilla, but an Arch-specific build provided by a maintainer (https://pkgbuild.com/~heftig/repo/x86_64/).

I never noticed this bug with my own builds so I'm not sure what's required to reproduce it.

(In reply to Jan Alexander Steffens [:heftig] from comment #23)

I never noticed this bug with my own builds so I'm not sure what's required to reproduce it.

The problem is easily spotted, libmozgtk.so is missing from dependentlibs.list, but the shared object file itself is where it belongs. It was present in older builds from your repo (need to check the date), and it's also present in Mozilla's latest build.
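A quick way to check a given build for this symptom could look like the sketch below. It assumes dependentlibs.list is a plain text file with one library file name per line (which matches the listings quoted in this bug); the function name missing_libraries is hypothetical.

```python
from typing import Iterable, List


def missing_libraries(dependentlibs_text: str, required: Iterable[str]) -> List[str]:
    """Return the required library names that do not appear in the list text."""
    listed = {line.strip() for line in dependentlibs_text.splitlines() if line.strip()}
    return [name for name in required if name not in listed]


if __name__ == "__main__":
    # Shortened example content; a real check would read the file from the
    # Firefox installation directory instead.
    affected_build = "libnspr4.so\nlibmozwayland.so\nlibxul.so\n"
    print(missing_libraries(affected_build, ["libmozgtk.so", "libmozwayland.so"]))
```

An empty result means every required library is listed; a result containing libmozgtk.so would match the behaviour reported here.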

(In reply to novenary from comment #24)

The problem is easily spotted, libmozgtk.so is missing from dependentlibs.list, but the shared object file itself is where it belongs. It was present in older builds from your repo (need to check the date), and it's also present in Mozilla's latest build.

Hmm, you're right. I think the issue might be that our system LDFLAGS contain --as-needed, which makes the linker (lld in our case) drop libmozgtk from libxul because it fulfills no dependencies.

That could be it then. The fix that was committed does handle this for GNU LD but not lld. I didn't figure out how dependentlibs.list is generated but as far as I understand, the launcher binary loads everything listed in there with dlopen before calling the actual entry point in order to ensure things are loaded in the right order. If the list is derived from what libxul actually links against, that's the culprit.

(In reply to novenary from comment #26)

That could be it then. The fix that was committed does handle this for GNU LD but not lld. I didn't figure out how dependentlibs.list is generated but as far as I understand, the launcher binary loads everything listed in there with dlopen before calling the actual entry point in order to ensure things are loaded in the right order. If the list is derived from what libxul actually links against, that's the culprit.

I don't think ld.bfd vs ld.lld is the problem here. GCC_USE_GNU_LD is defined for both, and both support --as-needed.

https://searchfox.org/mozilla-central/source/build/moz.configure/toolchain.configure#2225-2238

I've rebuilt it without --as-needed now, and besides the shared libs gaining lots of new deps, dependentlibs.list has changed:

           old                                      new
       libnspr4.so                              libnspr4.so
       libmozsandbox.so
       libplc4.so                               libplc4.so
       liblgpllibs.so
       libplds4.so                              libplds4.so
                                                libmozsandbox.so
                                                liblgpllibs.so
       libnssutil3.so                           libnssutil3.so
       libnss3.so                               libnss3.so
       libsmime3.so                             libsmime3.so
       libmozsqlite3.so                         libmozsqlite3.so
       libssl3.so                               libssl3.so
                                                libmozgtk.so
       libmozwayland.so                         libmozwayland.so
       libxul.so                                libxul.so
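The comparison above can also be made mechanically. The sketch below copies the two columns of the table into lists and reports which entries were added or removed and whether the shared entries changed order; the function name compare_lists is hypothetical.

```python
from typing import List, Tuple

# Built with --as-needed in LDFLAGS (the "old" column above).
OLD = [
    "libnspr4.so", "libmozsandbox.so", "libplc4.so", "liblgpllibs.so",
    "libplds4.so", "libnssutil3.so", "libnss3.so", "libsmime3.so",
    "libmozsqlite3.so", "libssl3.so", "libmozwayland.so", "libxul.so",
]

# Rebuilt without --as-needed (the "new" column above).
NEW = [
    "libnspr4.so", "libplc4.so", "libplds4.so", "libmozsandbox.so",
    "liblgpllibs.so", "libnssutil3.so", "libnss3.so", "libsmime3.so",
    "libmozsqlite3.so", "libssl3.so", "libmozgtk.so", "libmozwayland.so",
    "libxul.so",
]


def compare_lists(old: List[str], new: List[str]) -> Tuple[List[str], List[str], bool]:
    """Return (added, removed, reordered) for two dependentlibs.list contents."""
    added = [name for name in new if name not in old]
    removed = [name for name in old if name not in new]
    # Compare the relative order of the entries present in both lists.
    common_in_old_order = [name for name in old if name in new]
    common_in_new_order = [name for name in new if name in old]
    return added, removed, common_in_old_order != common_in_new_order


if __name__ == "__main__":
    added, removed, reordered = compare_lists(OLD, NEW)
    print("added:", added)
    print("removed:", removed)
    print("reordered:", reordered)
```

For these two lists the only added entry is libmozgtk.so, nothing is removed, and libmozsandbox.so and liblgpllibs.so move later in the load order.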

Mike, can you have another look? :)

Flags: needinfo?(mh+mozilla)
Crash Signature: [@ bp-ad54f33f-671f-4b31-8e3a-8ac9f0210420] [@ mozilla::detail::MutexImpl::~MutexImpl | __run_exit_handlers]

Please file a separate bug marked regressed by 1377445.

Crash Signature: [@ bp-ad54f33f-671f-4b31-8e3a-8ac9f0210420] [@ mozilla::detail::MutexImpl::~MutexImpl | __run_exit_handlers] → [@ bp-ad54f33f-671f-4b31-8e3a-8ac9f0210420] [@ mozilla::detail::MutexImpl::~MutexImpl | __run_exit_handlers]
Flags: needinfo?(mh+mozilla)

Filed bug 1707834.

No longer regressions: 1712969