Closed Bug 1689207 Opened 4 years ago Closed 4 years ago

glxtest fails in Firefox 86 if GLES >= 3.0 is not supported

Categories

(Core :: Graphics, defect, P2)

Firefox 86
defect

Tracking

()

RESOLVED FIXED
87 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox86 --- wontfix
firefox87 --- fixed

People

(Reporter: vd.Kraats, Assigned: rmader)

References

(Regression)

Details

(Keywords: regression, regressionwindow-wanted)

Attachments

(4 files, 2 obsolete files)

User Agent: Mozilla/5.0 (X11; Linux i686; rv:86.0) Gecko/20100101 Firefox/86.0

Steps to reproduce:

Normally I run Firefox with MOZ_ENABLE_WAYLAND=1 set in config/environment.d/60_firefox.conf on Linux debian 5.10.0-1-686-pae.
When I start Firefox it looks normal.

Actual results:

But a web page in the first tab cannot be scrolled down with the mouse using the scrollbar on the right. Also, the page in the second tab does not appear. If you close Firefox, it then asks permission to close 2 tabs.
Windows do not appear or do not disappear.
In fact it is completely unusable.

Expected results:

Firefox ESR with wayland and 86.0b2 without wayland are working correctly.

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Widget: Gtk
Product: Firefox → Core

Can you use mozregression tool to find the wrong commit? How-to is here:
https://fedoraproject.org/wiki/How_to_debug_Firefox_problems?rd=Bug_info_Firefox#Use_Mozregression_tool
Thanks.

Flags: needinfo?(vd.Kraats)
Attached file firefox_regression

I attached a regression file.
The broken Firefox sessions already get stuck at a popup from Google, where it asks about an advertisement policy.

Flags: needinfo?(vd.Kraats)

I'm afraid the push log is too big, so I can't identify the issue from it.
I wonder if it can be a duplicate of Bug 1687212.
By the way, do you see it with the latest nightly too?
Thanks.

Flags: needinfo?(vd.Kraats)

Please attach content of about:support page.
Thanks.

Blocks: wayland
Priority: -- → P2
Attached file firefox_support

I attach about:support.
Does not look good.

When starting firefox with wayland I see the warning:
[GFX1-]: glxtest: libGLESv2 glGetString returned null

Nightly has the same problem.

Flags: needinfo?(vd.Kraats)

If I only use the laptop-screen I get the errors:

(#0) Error: No GPUs detected via PCI
(#1) Error: glxtest: process failed (received signal 11)

QA Whiteboard: [qa-regression-triage]

Extra info.
At previous versions, Firefox with Wayland accepted OpenGL ES 2.0:
WebGL 1 Driver Version:
OpenGL ES 2.0 Mesa 20.3.3
Intel Open Source Technology Center -- Mesa DRI Intel(R) 945GM x86/MMX/SSE2

Looking at the source, I have the impression that Beta and Nightly now require a minimum version of ES 3.0; otherwise the error:
glxtest: libGLESv2 glGetString returned null
occurs.

This is strange because Wayland itself supports ES 2.0 and firefox previously did not give problems with ES 2.0.

This error disappears if LIBGL_ALWAYS_SOFTWARE=1 is used.
After clearing the cache WEBGL1 and WEBGL2 are supported again according to "about:support", but firefox with wayland still is not usable.

In that case, Nightly with and without Wayland still shows the same error messages:
[GFX1-]: More than 1 GPU detected via PCI, cannot deduce vendor
[GFX1-]: PCI candidate 0x8086/0x27a6
[GFX1-]: PCI candidate 0x8086/0x27a2

This can be caused by duplicate bus devices for the same VGA controller, which seems to be a known property of the 945GM:

lspci -nn
.....
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller [8086:27a2] (rev 03)
00:02.1 Display controller [0380]: Intel Corporation Mobile 945GM/GMS/GME, 943/940GML Express Integrated Graphics Controller [8086:27a6] (rev 03)
.....

Jan 29 16:51:29 debian kernel: [ 3.848451] [Firmware Bug]: Duplicate ACPI video bus devices for the same VGA controller, please try module parameter "video.allow_duplicates=1"if the current driver doesn't work.
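
To see which entries of the lspci listing above are display-class functions, a minimal shell sketch follows. The function name pci_display_candidates is hypothetical, and treating every class 03xx entry as a GPU candidate is an assumption, not necessarily the rule Firefox applies.

```shell
# Hypothetical helper: read `lspci -nn` output on stdin and print the
# vendor:device ID of every function whose PCI class is 03xx (display).
# Assumption: 03xx entries are the "GPU candidates"; Firefox may differ.
pci_display_candidates() {
  sed -n 's/^.*\[03[0-9a-f][0-9a-f]\]: .*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*$/\1/p'
}
```

Run as `lspci -nn | pci_display_candidates`. For the two lines listed above it prints 8086:27a2 and 8086:27a6, the same two IDs as the "PCI candidate" lines in the log.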

Robert, could that come from the multi-GPU patches?
Thanks.

Flags: needinfo?(robert.mader)

(In reply to vd.Kraats from comment #8)

...
Looking at the source, I have the impression that Beta and Nightly now require a minimum version of ES 3.0; otherwise the error:
glxtest: libGLESv2 glGetString returned null
occurs.
...

I poked at this a bit and got some weird results.

First of all, using MESA_GLES_VERSION_OVERRIDE=2.0 with the EGL version of glGetString makes it break - it always returns NULL, making us fail with the above message. The GLX version, however, works just fine. This part looks like a Mesa bug.

However, that should not make the whole browser break - it should simply fall back to software rendering. Your about:support indicates that this works fine.

For the record: this probably started after bug 1640053 landed, which makes us use EGL in glxtest instead of GLX. Now, while this definitely should get fixed in Mesa so setups like this have a chance of getting WebGL 1 support, what needs further investigation is why things fall apart - I assume some process crashes or so. Will dig deeper.

Flags: needinfo?(robert.mader)
See Also: → 1688873

Just set up an old laptop with similar hardware, showing the same behaviour in about:support. However it works well apart from WebGL 1 being broken. As this is a Mesa bug it should get fixed there IMO (will give it a try soon).

vd.Kraats: is latest nightly still broken for you, apart from WebGL?

Flags: needinfo?(vd.Kraats)

Both latest nightly and also latest beta 86.0b9 with wayland are not broken anymore, but still have the WEBGL-issue.

Flags: needinfo?(vd.Kraats)

As it would make us fail on GLES 2.0 hardware. We could do much
better here by properly checking GL and GLES context etc. but apparently
we are only really interested in whether we are on GL/GLES 1 hardware.
Therefore keep the test as simple as it is for now.

Also error out if glGetString returns empty values, in order to
fall back to GLX where applicable.
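
The check described here can be sketched in shell as a version-string classifier. This is an illustration only, not the C++ in glxtest.cpp; the function name gl_version_class and its exit codes are hypothetical.

```shell
# Hypothetical helper, not Firefox code: classify a GL_VERSION-style string.
# Exit status: 0 = major version 2 or newer, 1 = GL/GLES 1.x,
#              2 = empty or unparsable (models glGetString returning nothing).
gl_version_class() {
  version_string=$1
  # Error out on an empty value so a caller could fall back to another path.
  [ -n "$version_string" ] || return 2
  # Extract the major number of the first "major.minor" pair,
  # e.g. "2" from "OpenGL ES 2.0 Mesa 20.3.3".
  major=$(printf '%s\n' "$version_string" |
    sed -n 's/^[^0-9]*\([0-9][0-9]*\)\.[0-9].*/\1/p')
  [ -n "$major" ] || return 2
  # Only 1.x is rejected; no distinction is made between 2.0 and 3.0.
  [ "$major" -ge 2 ]
}
```

With this sketch, the string "OpenGL ES 2.0 Mesa 20.3.3" from the report yields status 0, an illustrative "1.4 Mesa 20.3.3" yields 1, and an empty string yields 2.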

Assignee: nobody → robert.mader

(In reply to vd.Kraats from comment #13)

Both latest nightly and also latest beta 86.0b9 with wayland are not broken anymore, but still have the WEBGL-issue.

Great - the WebGL issue should be fixed by the trivial patch above.

  • Do not require GLES >= 3.0 any more to succeed, fixing this bug.
  • Use GL instead of GLES, matching GLX behavior. This way we can avoid
    most regressions related to the EGL switch. One scenario here is older
    intel hardware supporting GL 1.4 and GLES 2.0 - we want to continue
    blocking this hardware altogether, as e.g. WebGL does not support a
    fallback to GLES in this case, resulting in crashes.
  • Add more error messages and early returns, making future debugging
    easier.
  • Remove eglCreatePbufferSurface - it always failed anyway, unnoticed!

After this patch we should always match GLX behavior when setting
different combinations of MESA_GL_VERSION_OVERRIDE and MESA_GLES_VERSION_OVERRIDE

Attachment #9202917 - Attachment is obsolete: true
See Also: → 1689707

Repurposing this bug for the glxtest issue.

Regressed by: 1640053
Summary: Firefox 86.0b1 and 86.0b2 not working with wayland → glxtest fails in Firefox 86 if GLES >= 3.0 is not supported
Has Regression Range: --- → yes
No longer blocks: wayland
Component: Widget: Gtk → Graphics
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch

Comment on attachment 9203044 [details]
Bug 1689207 - Some fixes for EGL in glxtest, r=aosmond,stransky

Beta/Release Uplift Approval Request

  • User impact if declined: All Linux users, X11 and Wayland, without GLES >= 3.0 support will have no WebGL. This includes many older devices and apparently Debian with proprietary Nvidia drivers.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Only applies to Linux, changes are only in glxtest - in the very worst case, i.e. a crash, certain users wouldn't have WebGL/WR. But this is exactly what the patch aims to avoid and what was tested on several devices.
  • String changes made/needed:
Attachment #9203044 - Flags: approval-mozilla-beta?
  • Do not require GLES >= 3.0 any more to succeed, fixing this bug.
  • Use GL instead of GLES, matching GLX behavior. This way we can avoid
    most regressions related to the EGL switch. One scenario here is older
    intel hardware supporting GL 1.4 and GLES 2.0 - we want to continue
    blocking this hardware altogether, as e.g. WebGL does not support a
    fallback to GLES in this case, resulting in crashes.
  • Add more error messages and early returns, making future debugging
    easier.
  • Downgrade record_error to record_warning in some more places where
    we don't want to fail hard (bug 1689707).
  • Remove eglCreatePbufferSurface - it always failed anyway, unnoticed!

Simplified backport of D105107

Attachment #9203044 - Flags: approval-mozilla-beta? → approval-mozilla-release?

I am keeping the uplift request open in case we have a dot release this cycle and we have some bake time in beta 87 with this patch.

I tested nightly on Debian Bullseye (32 bit).
The error "glxtest: libGLESv2 glGetString returned null" has indeed
disappeared.
The warning "More than 1 GPU detected via PCI, cannot deduce vendor"
is still present, but does no harm, and is possibly due to typical
properties of the 945GM Graphics Controller, which shows 2 bus devices
for one controller.

I do not quite understand how things are supposed to work now, but
WEBGL1 is not automatically activated (also not by clearing cache), but the
error:
"WebglAllowWindowsNativeGl:false restricts context creation on this system"
is shown at about:support.
Activating is possible by setting "webgl.force-enabled" true at about:config.
ES2.0 is used then.

At Firefox ESR 78.7 this is not needed and WEBGL1 is automatically
activated.

Testing nightly on Ubuntu 18.04.5 gave the same results.

WEBGL1 (and WEBGL2) are both activated if LIBGL_ALWAYS_SOFTWARE=1.
Remarkable but not caused by Firefox is the poor performance of the WEBGL
aquarium-demo at Debian, compared to Ubuntu at the same machine.
Ubuntu uses LLVM 10, Debian LLVM11, both using 3.1 Mesa with version
20.0.8 (Ubuntu) resp. 20.3.4 (Debian).
At Ubuntu LIBGL_ALWAYS_SOFTWARE is 2 to 3 times faster than ES 2.0, at Debian it has the same "speed", but uses double cpu.

(In reply to vd.Kraats from comment #24)

...
I do not quite understand how things are supposed to work now, but
WEBGL1 is not automatically activated (also not by clearing cache), but the
error:
"WebglAllowWindowsNativeGl:false restricts context creation on this system"
is shown at about:support.
Activating is possible by setting "webgl.force-enabled" true at about:config.
ES2.0 is used then.
...

IIUC the device falls into the GL 1.4/GLES 2.0 category, right? While it would be technically possible to allow automatic WebGL activation for these devices, I personally will not work in that direction. The GLX behaviour was always to block GL < 2 devices - allowing GLES 2 devices would need careful checking if fallback paths work across a wide range of old devices and the general rule of thumb has been that GL < 2 drivers tend to be very buggy. What could work is to force-enable GL 2.0 support - AFAIK that was somehow possible.

I'm confused that it should have worked in ESR 78.7. Could you share an about:support from there?

Attached file firefox_support_esr

I attached the requested about:support from Firefox ESR at Debian Bullseye

(In reply to vd.Kraats from comment #26)

Created attachment 9204169 [details]
firefox_support_esr

I attached the requested about:support from Firefox ESR at Debian Bullseye

Thanks! So that shows that for some reason mesa fell back to llvmpipe in glxtest, tricking it to pass, and then successfully creating a WebGL context via ES 2.0 on hardware. So that was more of an accident, not intended and not what you should have got on GLX.

I'd love to have this case working, but Jeff Gilbert previously told me that he'd not accept any patches allowing GL 1.x era hardware - too many issues in the past for too little gain. Sorry :/

Indeed GL 1.x should not be used anymore. I regret that a solution that used to work automatically no longer does, but I can understand your decision, because graphics and drivers are a terrible mess. I will force WEBGL1 as long as it works and software rendering is slow.
Thanks.

This issue still occurs for me when running Nightly with "MOZ_ENABLE_WAYLAND=1 MOZ_X11_EGL=1 ./firefox" on Xorg. It still works with 86b9. I've got both envs exported globally for convenience reasons and it'd be nice if this continued to work.

Edit: Forgot to mention: Primary GPU is RX 5700 XT (RadeonSI), whereas there's also the IGP of the 6700k available in the system (Iris OGL 4.6, recent mesa git-master).

(In reply to walmartguy from comment #29)

This issue still occurs for me when running Nightly with "MOZ_ENABLE_WAYLAND=1 MOZ_X11_EGL=1 ./firefox" on Xorg. It still works with 86b9. I've got both envs exported globally for convenience reasons and it'd be nice if this continued to work.

What exact issue do you mean? I suppose you're not running into the GLES < 3.0 scenario?

(In reply to Robert Mader [:rmader] from comment #31)

What exact issue do you mean? I suppose you're not running into the GLES < 3.0 scenario?

I probably should have mentioned that (oops): The issue is that Webrender doesn't work at all for me in Nightly with the aforementioned env vars specified. Since my GPUs are well supported by Mesa, I wouldn't think that it's anything about missing feature levels.
Webrender with EGL on Xorg still works normally as long as I don't specify MOZ_ENABLE_WAYLAND=1 at the same time (which worked with 86).

Yet it seems to be related to GPU selection, as I get the "[GFX1-]: More than 1 GPU detected via PCI, cannot deduce vendor" error message.

Ah I see. So on nightly we do not fall back to GLX any more if IsWaylandDisabled() is false (1), which is unconditionally the case when MOZ_ENABLE_WAYLAND=1 is set, regardless of the backend actually used. So the assumption is IsWaylandDisabled() == false -> we are using the Wayland backend.

We could fix that for glxtest, however there are several other places where we rely on the same assumption (2), making me wonder why this worked for you in the first place. It should have had a bunch of odd side effects - maybe their effects were small enough.

Looking at the Fedora /usr/bin/firefox script we have the following:

if ! [ $MOZ_DISABLE_WAYLAND ]; then
  if [ "$XDG_CURRENT_DESKTOP" == "GNOME" ]; then
    export MOZ_ENABLE_WAYLAND=1
  fi
  if false && [ "$XDG_SESSION_TYPE" = "wayland" ]; then
    export MOZ_ENABLE_WAYLAND=1
  fi
fi

I.e. MOZ_ENABLE_WAYLAND=1 will also get set unconditionally, without e.g. checking for $XDG_SESSION_TYPE. That will probably also break hard soon then. I suppose it would make sense to check in firefox IsWaylandDisabled() for $XDG_SESSION_TYPE and, if set, only enable it if its value is wayland.
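
The rule proposed here can be sketched as a small shell predicate. It only illustrates the logic; the real check would live in Firefox's C++ code. The function name wayland_enabled is hypothetical, and treating any non-empty MOZ_ENABLE_WAYLAND value as "enabled" is an assumption.

```shell
# Hypothetical sketch of the proposed rule, not Firefox code.
# Returns 0 (enabled) only if MOZ_ENABLE_WAYLAND is non-empty and
# XDG_SESSION_TYPE is either unset or exactly "wayland".
wayland_enabled() {
  [ -n "${MOZ_ENABLE_WAYLAND:-}" ] || return 1
  # ${VAR+x} expands to "x" only when VAR is set, even to an empty string.
  if [ -n "${XDG_SESSION_TYPE+x}" ] && [ "$XDG_SESSION_TYPE" != "wayland" ]; then
    return 1
  fi
  return 0
}
```

Under this sketch, an X11 session with MOZ_ENABLE_WAYLAND=1 exported globally would report Wayland as not enabled, while a session without XDG_SESSION_TYPE keeps the current behaviour.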

By the way: we should IMO rename IsWaylandDisabled() to IsWaylandEnabled() - inverted logic is usually not a good idea.

1: https://searchfox.org/mozilla-central/source/toolkit/xre/glxtest.cpp#1202-1206
2: https://searchfox.org/mozilla-central/search?q=IsWaylandDisabled&path=

Martin, this will probably be important for Fedora soon. What do you think about the points above?

Flags: needinfo?(stransky)

(In reply to Robert Mader [:rmader] from comment #34)

We could fix that for glxtest, however there are several other places where we rely on the same assumption (2), making me wonder why this worked for you in the first place. It should have had a bunch of odd side effects - maybe their effects were small enough.

Now you gave me an idea: With 86 beta, I've noticed that there was intermittent stutter (can take a minute or two to occur) on vsynctester.com, as opposed to 85 stable.
Well, it very much looks like this was caused by MOZ_ENABLE_WAYLAND=1. I've repeated the test several times and without the variable set, there never was stutter (not counting the first few seconds), while with it set, there always was after 10 - 180 seconds. Very odd indeed. :)

(In reply to Robert Mader [:rmader] from comment #34)

Looking at the Fedora /usr/bin/firefox script we have the following:

if ! [ $MOZ_DISABLE_WAYLAND ]; then
  if [ "$XDG_CURRENT_DESKTOP" == "GNOME" ]; then
    export MOZ_ENABLE_WAYLAND=1
  fi
  if false && [ "$XDG_SESSION_TYPE" = "wayland" ]; then
    export MOZ_ENABLE_WAYLAND=1
  fi
fi

That's a typo, it should be "$XDG_SESSION_TYPE" == "wayland". It was intended for https://bugzilla.redhat.com/show_bug.cgi?id=1922608 but given Kwin/Wayland state I'm going to remove it anyway so we'll use Wayland for Gnome only for now.

I.e. MOZ_ENABLE_WAYLAND=1 will also get set unconditionally, without e.g. checking for $XDG_SESSION_TYPE. That will probably also break hard soon then. I suppose it would make sense to check in firefox IsWaylandDisabled() for $XDG_SESSION_TYPE and, if set, only enable it if its value is wayland.

By the way: we should IMO rename IsWaylandDisabled() to IsWaylandEnabled() - inverted logic is usually not a good idea.

Yes, we can rename IsWaylandDisabled() to IsWaylandEnabled() and also check Wayland availability in IsWaylandEnabled() to make sure that when IsWaylandEnabled() returns true we're really using Wayland.

Flags: needinfo?(stransky)

Martin, following up on this: until we have a check in FF I think the Fedora launch script should get changed from

if [ "$XDG_CURRENT_DESKTOP" == "GNOME" ]; then
  export MOZ_ENABLE_WAYLAND=1
fi

to something like

if [ "$XDG_CURRENT_DESKTOP" == "GNOME" ] && [ "$XDG_SESSION_TYPE" == "wayland" ]; then
  export MOZ_ENABLE_WAYLAND=1
fi

Otherwise glxtest will fail in X11 sessions once this patch lands. And even without this patch there are likely some subtle differences that could have negative effects such as in https://searchfox.org/mozilla-central/source/dom/ipc/BrowserChild.cpp#2174

Flags: needinfo?(stransky)
See Also: → 1692024

I think it will be fixed by Bug 1695453, right?

Flags: needinfo?(stransky)

I'll update the launch script anyway, Thanks.

Attachment #9203044 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #9203604 - Attachment is obsolete: true
Regressions: 1730856
No longer regressions: 1730856