Closed Bug 1480755 Opened 6 years ago Closed 6 years ago

No accelerated WebGL rendering with current Mesa (git)

Categories

(Core :: Security: Process Sandboxing, defect, P1)

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox61 --- wontfix
firefox62 --- fixed
firefox63 --- fixed

People

(Reporter: haagch+ff, Assigned: gcp, NeedInfo)

Details

(Keywords: nightly-community, regression, Whiteboard: sb+)

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0
Build ID: 20180802100128

Steps to reproduce:

I'm currently on 63.0a1 (2018-08-02) and using latest mesa radeonsi git on an RX 480.

It's been an issue for a couple of days but I don't know when exactly it started. Mostly because it was masked by another issue which was just fixed with this patch:
https://lists.freedesktop.org/archives/mesa-dev/2018-July/201308.html

The problem I'm reporting here is the issue he describes *after* his fix, which looks like this:

libGL error: MESA-LOADER: failed to retrieve device information
libGL error: unable to load driver: amdgpu_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: amdgpu

This happens when opening a website that uses WebGL, such as get.webgl.org or maps.google.com.



Expected results:

I tracked it down to this call which fails with errno 13 "permission denied":
https://gitlab.freedesktop.org/mesa/drm/blob/87fdbfb62fb3de6759d465d07cc13f922084694e/xf86drm.c#L3007

The input path is /sys/dev/char/226:128/device which is a symlink:
/sys/dev/char/226:128/device -> ../../../0000:08:00.0
which realpath() is supposed to resolve to /sys/devices/pci0000:00/0000:00:03.1/0000:08:00.0
but it can't be resolved because of said permission denied error.
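
For illustration, the failing step can be mimicked outside of libdrm. The following is a minimal Python sketch, not libdrm's code: `resolve_device_link` is a hypothetical helper that resolves a sysfs-style symlink and returns the errno instead of raising, and `demo` builds a throwaway directory tree shaped like the paths above. Under the content sandbox, the same resolution of /sys/dev/char/226:128/device would be expected to come back with errno 13 (EACCES).

```python
import errno
import os
import tempfile
from pathlib import Path


def resolve_device_link(path):
    """Resolve a symlink; return (resolved_path, None) or (None, errno)."""
    try:
        return str(Path(path).resolve(strict=True)), None
    except OSError as exc:
        # EACCES (13) is what the sandboxed process reports; other errors
        # such as ENOENT are returned the same way.
        return None, exc.errno


def demo():
    """Build a fake sysfs-like tree (hypothetical layout) and resolve it."""
    with tempfile.TemporaryDirectory() as root:
        real = Path(root, "devices", "pci0000:00", "0000:08:00.0")
        real.mkdir(parents=True)
        node = Path(root, "dev", "char", "226:128")
        node.mkdir(parents=True)
        link = node / "device"
        # Relative symlink, like the one sysfs exposes for the device node.
        os.symlink(os.path.relpath(real, node), link)

        resolved, err = resolve_device_link(link)
        missing, missing_err = resolve_device_link(node / "nonexistent")
        return resolved == str(real.resolve()), err, missing, missing_err
```

Returning the errno rather than raising makes it easy for a caller to log a specific message, which is the part that the libGL output above does not do.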

Turns out this is not handled very gracefully in libdrm or mesa and you get the not very helpful libGL error message above.

The temporary fix is to go to about:config, search for security.sandbox.content.read_path_whitelist, and add /sys/. A more fine-grained whitelist would probably work too.
I isolated the same issue to the libdrm changes on an Intel Kaby Lake system, which hits the same permission denied error on /sys/.

Sandbox: Recording mapping /sys/bus/pci -> /sys/dev/char/226:128/device/subsystem
Sandbox: SandboxBroker: denied op=stat rflags=400000 perms=0 path=/sys for pid=9987
Sandbox: Failed errno -13 op stat flags 0400000 path /sys
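
When collecting these denials from several machines, it can help to pull the fields out mechanically. A small Python sketch, assuming the "SandboxBroker: denied" line has exactly the shape shown above once any wrapped line is joined; `DENIAL_RE` and `parse_denial` are hypothetical names, and the flag fields are kept as strings rather than interpreted.

```python
import re

# Pattern derived only from the single log line quoted in this bug.
DENIAL_RE = re.compile(
    r"SandboxBroker: denied op=(?P<op>\w+) rflags=(?P<rflags>\w+) "
    r"perms=(?P<perms>\w+) path=(?P<path>\S+) for pid=(?P<pid>\d+)"
)


def parse_denial(line):
    """Return the fields of a broker denial line as a dict, or None."""
    match = DENIAL_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields
```

For the line above this yields op "stat" and path "/sys", which is the piece of information that points at the missing whitelist entry.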

It comes from this commit.

https://cgit.freedesktop.org/mesa/drm/commit/?id=a02900133b32dd4a7d6da4966f455ab337e80dfc

Same issue, just a different Mesa driver.

The input path /sys/dev/char/226:128/device is a symlink:
/sys/dev/char/226:128/device -> ../../../0000:00:02.0/
which realpath() is supposed to resolve to /sys/devices/pci0000:00/0000:00:02.0/,
but that resolution fails as well.

Sean
The issue also affects Firefox 61.0.1, not just nightly.

It might be worth revisiting the changes from this commit:

https://hg.mozilla.org/releases/mozilla-beta/rev/4ef3cb563636

to resolve this issue:

https://bugzilla.mozilla.org/show_bug.cgi?id=1416016
Assignee: nobody → gpascutto
Status: UNCONFIRMED → NEW
Component: Untriaged → Security: Process Sandboxing
Ever confirmed: true
Product: Firefox → Core
Whiteboard: sb?
Version: 63 Branch → Trunk
Priority: -- → P1
Whiteboard: sb? → sb+
>The issue also affects Firefox 61.0.1, not just nightly.

It is caused by changes in how the Mesa driver accesses the hardware, so it will affect all Firefox versions with sandboxing.

I can reproduce this. What's notable is that Mesa (at least in my setup) gracefully falls back to swrast, which works well enough that we'll still get WebGL1+2 support. But of course, not accelerated by the GPU. Now, when we display the information in about:support, we are reporting what we detect in the parent. But that may not necessarily be what ends up happening in the content processes, especially with this Mesa fallback. So in my case about:support claims Radeon RX 560 DRM, etc, whereas a content process actually sees llvmpipe (LLVM 6.0, 256 bits).

I don't like the lies and silent breakage that are implicit here. There may be a chance of detecting when the GL stack in the client is llvmpipe and it is not in the parent, and using this to modify what about:support reports. But admittedly, it's a very specific case, though also one of the most common Linux setups. Probably not worth working around as it'll be obsoleted by remote WebGL.
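
A rough Python sketch of the detection idea, assuming both renderer strings are already available to whatever builds the about:support data. `SOFTWARE_RENDERERS`, `is_software_renderer`, and `silent_fallback` are hypothetical names, not existing Firefox code; the renderer names are Mesa's software rasterizers.

```python
# Substrings that identify Mesa's software rasterizers in a renderer string.
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe", "swrast")


def is_software_renderer(renderer):
    """True if the GL renderer string names a software rasterizer."""
    name = renderer.lower()
    return any(tag in name for tag in SOFTWARE_RENDERERS)


def silent_fallback(parent_renderer, content_renderer):
    """True if the parent sees hardware GL but the content process does not."""
    return (not is_software_renderer(parent_renderer)
            and is_software_renderer(content_renderer))
```

With the strings from my setup, a hardware renderer in the parent and "llvmpipe (LLVM 6.0, 256 bits)" in the content process, `silent_fallback` returns True, which is the case about:support would need to flag.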
Summary: sandbox breaks rendering with mesa → No accelerated WebGL rendering with current Mesa (git)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #5)
> I don't like the lies and silent breakage that are implicit here. There may
> be a chance of detecting when the GL stack in the client is llvmpipe and it
> is not in the parent, and using this to modify what about:support reports.

GLX is probed in a forked process (in case the driver crashes or hangs; search for "glxtest"), so it should be relatively easy to apply the content sandbox policy to it.
What I'm seeing is that Mesa probes /sys/dev/char/226:0/device/subsystem and readlinks it to /sys/bus/pci. Based on this it determines the type of GPU (https://gitlab.freedesktop.org/mesa/drm/blob/87fdbfb62fb3de6759d465d07cc13f922084694e/xf86drm.c#L2967). We correctly deal with this and remember that /sys/bus/pci maps to a subdir of the /sys/dev/char/226 node.

But then Mesa tries to stat /sys, which it has no permissions on. We might need to cover that with some AddAncestors incantation, but I need to find what Mesa is actually trying to probe.
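
To make the AddAncestors idea concrete, here is a rough Python sketch of an allow-list where a device subtree gets read access and its parent directories get stat-only access. This is not the SandboxBroker API: `PathPolicy`, `add_tree`, `add_ancestors`, and the `MAY_*` flags are hypothetical names, and the PCI path in the usage note is the one from the report.

```python
import posixpath

MAY_STAT = 1
MAY_READ = 2


class PathPolicy:
    """Toy allow-list: exact paths plus whole directory trees."""

    def __init__(self):
        self._exact = {}  # path -> permission bits
        self._trees = {}  # directory tree root -> permission bits

    def add_tree(self, perms, path):
        """Grant perms on path and everything below it."""
        root = posixpath.normpath(path)
        self._trees[root] = self._trees.get(root, 0) | perms

    def add_ancestors(self, path, perms=MAY_STAT):
        """Grant perms on every parent directory of path, excluding '/'."""
        current = posixpath.dirname(posixpath.normpath(path))
        while current not in ("/", ""):
            self._exact[current] = self._exact.get(current, 0) | perms
            current = posixpath.dirname(current)

    def allows(self, perms, path):
        """True if all requested permission bits are granted for path."""
        path = posixpath.normpath(path)
        granted = self._exact.get(path, 0)
        for root, tree_perms in self._trees.items():
            if path == root or path.startswith(root + "/"):
                granted |= tree_perms
        return (granted & perms) == perms
```

Calling `add_tree(MAY_STAT | MAY_READ, "/sys/devices/pci0000:00")` followed by `add_ancestors("/sys/devices/pci0000:00")` would let a stat of /sys succeed while still refusing reads of /sys itself or of unrelated subtrees, which is narrower than whitelisting all of /sys/.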
MozReview-Commit-ID: CD9ATGHUOZ1
Comment on attachment 9002850 [details]
Bug 1480755 - Add support for new Mesa device probing. r?jld

Jed Davis [:jld] (⏰UTC-6) has approved the revision.
Attachment #9002850 - Flags: review+
Pushed by gpascutto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8a34342120f1
Add support for new Mesa device probing. r=jld
https://hg.mozilla.org/mozilla-central/rev/8a34342120f1
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
IIUC, this affects ESR60+? Is this something we need to consider backporting? It grafts cleanly, so go ahead and nominate if the answer is yes.
Flags: qe-verify+
Flags: needinfo?(gpascutto)
>IIUC, this affects ESR60+? Is this something we need to consider backporting?

I would hope that the distros that won't take Firefox updates also aren't the ones that are going to be running very recent Mesa versions? That said, this is a very low-risk patch.

I would need to understand exactly what is affected, but I think it's only Arch/Manjaro (who run very recent stuff) and maybe Fedora 28 so far? And those won't use Firefox ESR.
Flags: needinfo?(gpascutto)
Comment on attachment 9002850 [details]
Bug 1480755 - Add support for new Mesa device probing. r?jld

Approval Request Comment
[Feature/Bug causing the regression]: Upgrade of Mesa on Linux distros. Affects all recent Firefoxes.
[User impact if declined]: No WebGL or not accelerated.
[Is this code covered by automated tests?]: No.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: Extends a whitelist.
[String changes made/needed]:
Attachment #9002850 - Flags: approval-mozilla-beta?
Comment on attachment 9002850 [details]
Bug 1480755 - Add support for new Mesa device probing. r?jld

Improves Firefox support for newer Mesa versions on Linux. Approved for 62.0b20.
Attachment #9002850 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Hello all,

I have been trying to reproduce this issue with the affected builds, both from when this issue was submitted (2018-08-03) and from when the duplicate bug 1483110 was filed.

I have tried it on Ubuntu 18.04 and Ubuntu 16.04, with Mesa drivers 18.04, 18.1.1, and 18.1.6, and still had no success.

Can you please give us a little more information on how to reproduce this issue?

Platform version, Mesa driver version, and even the graphics card on which this issue can be reproduced, if possible.

Thank you in advance!
Flags: needinfo?(gpascutto)
I believe this issue occurs on operating systems with a rolling release strategy.

In the duplicate bug 1483110 user "Stebs" mentions:

> FWIW, among the update yesterday was also glibc 2.27-3 -> 2.28-4 and libglvnd 1.0.0-1 -> 1.1.0-1

It may have something to do with that.

I included my about:support output in the duplicate bug as well. I believe this has the required driver version information.
I'm on Debian Testing and see the issue:

> ⋊> ~ apt-cache policy firefox                                           10:00:19
> firefox:
>   Installed: 61.0.1-1
>   Candidate: 61.0.1-1
>   Version table:
>  *** 61.0.1-1 100
>         100 http://ftp.us.debian.org/debian unstable/main amd64 Packages
>         100 /var/lib/dpkg/status
> ⋊> ~ apt-cache policy libgl1-mesa-glx                                   10:00:25
> libgl1-mesa-glx:
>   Installed: 18.1.6-1
>   Candidate: 18.1.6-1
>   Version table:
>  *** 18.1.6-1 500
>         500 http://ftp.us.debian.org/debian testing/main amd64 Packages
>         100 http://ftp.us.debian.org/debian unstable/main amd64 Packages
>         100 /var/lib/dpkg/status
> ⋊> ~ uname -a                                                           10:00:59
> Linux raven 4.17.0-3-amd64 #1 SMP Debian 4.17.17-1 (2018-08-18) x86_64 GNU/Linux
> ⋊> ~ lspci | grep -i VGA                                                                                                10:01:04
> 00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)

I have applied the workaround and it works:
> security.sandbox.content.read_path_whitelist;/sys/
I'm on Fedora 28 with the testing repository enabled and see this issue.
libglvnd-1.1.0-1
mesa 18.1.3

IIRC it was the libglvnd update that brought up the issue.

lspci | grep -i VGA 
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)

The workaround from the comment above works, though.
Hello,

I have installed Fedora 28 as a virtual machine and attempted to install Mesa 18.1.3 and activate the testing repository, without any success at all. Every command I enter in the terminal either returns an error or does nothing.

:haagch is there any way for you to check if this issue is still present in the latest Nightly and Beta builds?
Flags: needinfo?(haagch+ff)
:vlucaci Fedora uses a Wayland session by default, which is sometimes still tricky on virtual machines. You can try selecting an Xorg session in GDM.
AIUI this libdrm commit https://gitlab.freedesktop.org/mesa/drm/commit/bcb9d976cd91c018aa4eef13563813288984601f should address the issue from their side.
>I have installed fedora 28 as a virtual machine

Given that this bug is about GPU access, I'm not sure it can be tested unless you have some VM setup that can do full GPU pass-through.
Flags: needinfo?(gpascutto)
>Platform version, Mesa driver version, and even the graphics card on which this issue can be reproduced, if possible.

Affected cards would be AMD and Intel (integrated) graphics. I don't fully understand how the Mesa version factors in (even though libdrm seems to be part of that).
For reference:

This is the Mesa bug: https://bugs.freedesktop.org/show_bug.cgi?id=107516

Chromium just did the same fix: https://chromium.googlesource.com/chromium/src/+/8655d49f657d3878c937f1387b3d31fa66c8e76a%5E%21/content/gpu/gpu_sandbox_hook_linux.cc (but given that it didn't seem to affect their users, I'm not sure they ship with that sandbox enabled).
Hello,

So I tried reproducing this issue with both VMware Pro 14 and Oracle VirtualBox with full GPU pass-through.

Unfortunately, I was unable to install Fedora properly: at the end of the installation process, when I would normally reboot the machine, I was taken right back to the "Welcome to Fedora" installation method selector. Because of this I had no way to properly activate full GPU pass-through.

So I tried to reproduce this issue with Fedora running from live media.

First, I attempted to reproduce this issue without the Mesa drivers, with only 3D acceleration toggled on, on the affected 63.0a1 build, and did not manage to reproduce the issue.

After that, I attempted to reproduce this issue with the Mesa 18.1.3 drivers and 3D acceleration toggled on, on the affected 63.0a1 build, and did manage to partially reproduce the issue on http://webglreport.com/ and https://get.webgl.org/.

For the latter, the animation was displayed, but it was running very choppily and I got the following errors in the Web Console:

`libGL error: MESA-LOADER: failed to retrieve device information
libGL error: driver pointer missing`

With the same Mesa drivers and 3D acceleration turned on, I tried to reproduce the issue with the latest Nightly 63 and the 62 release on the same web pages.

This time the https://get.webgl.org/ animation was not choppy, and there were no errors in the Web Console on either build.

However, I am not sure how much my findings will actually help you, since I was unable to activate full GPU pass-through because Fedora would not install properly.

Hardware used to test this issue: AMD FX-8320 (2.5 GHz), ATI Radeon HD 5450 (1 GB), 16 GB RAM

On another note, looking at https://bugs.freedesktop.org/show_bug.cgi?id=107516, which you gave me, I see that the issue was reported there for Ubuntu 16.04.

So I turned my attention to that OS, but this time I no longer had the hardware described above and did not run the OS in a VM, but instead on a multi-boot PC with lesser GPU capabilities (AMD FX 8320 3.5GHz, 16GB RAM, ATI Radeon 3000 integrated).

On this machine, with the same Mesa drivers as above, I was able to reproduce the same behavior on https://get.webgl.org/ as in Fedora on the affected Firefox build (including the Web Console errors), and the same behavior on the latest Nightly 63 and the 62 release.

However, since I did not manage to create the exact same environment as described by the reporter, I am reluctant to confirm this issue as verified fixed.

Hope all this helps.
Flags: qe-verify+

Firefox 75.0, Linux 4.19.117-1-MANJARO, Mesa 20.0.4, A8-7600 (Kaveri) using amdgpu kernel module, this bug is still alive. Whitelist workaround works.

You're probably seeing bug 1623885 rather than this particular bug which was fixed 2 years ago.

(In reply to Julien Cristau [:jcristau] from comment #34)

> You're probably seeing bug 1623885 rather than this particular bug which was fixed 2 years ago.

Indeed, thanks for the clarification. The only difference is that, at least under Manjaro, WebGL still works, just via llvmpipe: webglreport shows that, while other things just work with worse performance. about:support shows the host's renderer, not llvmpipe. Launching from a terminal shows the errors, obviously, but regular users may have trouble tracking such issues down, since about:support basically isn't telling the whole truth.
