Open Bug 1748460 Opened 4 years ago Updated 1 year ago

Allow more syscalls for nvidia-vaapi-driver, possibly behind a pref

Categories

(Core :: Security: Process Sandboxing, enhancement, P2)

x86_64
Linux
enhancement

People

(Reporter: rmader, Unassigned)

References

(Blocks 2 open bugs)

Attachments

(1 file)

412.16 KB, text/x-log

There's a new VAAPI wrapper implementation for Nvidia: https://github.com/elFarto/nvidia-vaapi-driver#firefox

It says it needs syscalls 41, 49, 50 and 332 (socket, bind, listen, statx).
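
For reference, those numbers match the x86_64 Linux syscall table. A small lookup, written in Python purely as an illustration (the names `X86_64_SYSCALLS` and `describe` are made up for this sketch; `connect` and `sendmsg` are included only because they come up in the discussion of sockets in this bug):

```python
# x86_64 Linux syscall numbers relevant to this bug.
X86_64_SYSCALLS = {
    41: "socket",
    42: "connect",
    46: "sendmsg",
    49: "bind",
    50: "listen",
    332: "statx",
}


def describe(numbers):
    """Map syscall numbers to names, marking unknown ones."""
    return [X86_64_SYSCALLS.get(n, f"unknown({n})") for n in numbers]


print(describe([41, 49, 50, 332]))
```

The numbering is architecture-specific, so the same request on another architecture would use different numbers.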

Jed, does that look sensible to you? It also says: "This is not recommended for general use as it reduces security" and indeed sounds somewhat dangerous.

Flags: needinfo?(jld)

I'll need to find out more about what's going on here. I really don't want to allow sockets if there's any way to avoid it; I see that connect isn't in the list, so it's possible that this isn't trivially broken, but with datagram sockets sendmsg can send to any destination even with an unconnected socket and that's also something I've tried to avoid allowing where possible. If it's a case of Nvidia blobs that try to use sockets but degrade gracefully if socket returns an error, then that's fine; we already have issues like that in content processes.

We definitely won't allow statx with arbitrary arguments; currently we'll reject it with ENOSYS and ideally the caller would fall back to an older syscall which will be replaced with a file broker request, so I need to find out why that's not working. (Supporting statx directly in the broker would be possible, but it would be a nontrivial amount of complexity.)
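
The ENOSYS fallback described above can be sketched roughly as follows. This is only an illustration of the pattern, not code from Firefox, glibc or the Nvidia driver: `stat_with_fallback` and `sandboxed_statx` are hypothetical names, and Python's `os.stat` merely stands in for "the older stat-family call".

```python
import errno
import os


def stat_with_fallback(path, try_statx):
    """Call try_statx(path); if it fails with ENOSYS, fall back to os.stat."""
    try:
        return try_statx(path)
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise  # a real error, not "this syscall is unavailable"
    return os.stat(path)


def sandboxed_statx(path):
    # Stand-in for a statx call that the seccomp policy rejects with ENOSYS.
    raise OSError(errno.ENOSYS, os.strerror(errno.ENOSYS))


result = stat_with_fallback(".", sandboxed_statx)
print(result.st_ino == os.stat(".").st_ino)
```

A caller that treats ENOSYS as a hard failure instead of retrying would break under the sandbox, which is one possible explanation for what the driver's README reports.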

Background:
At the moment, the project's readme suggests setting security.sandbox.content.level to 0.

jrmuizel recommended setting MOZ_DISABLE_RDD_SANDBOX=1 rather than disabling the content process sandbox:

even running in the rdd with the sandbox completely disabled wouldn't be the worst option in the world

definitely better than disabling the sandbox in the content processes

So we recommended media.rdd-ffmpeg.enabled=true + MOZ_DISABLE_RDD_SANDBOX=1 in this issue:
https://github.com/elFarto/nvidia-vaapi-driver/issues/6#issuecomment-1005630454

If the required syscalls could be behind a security.sandbox.rdd.nvidia-highly-experimental-vaapi pref, it would allow users of that project to not disable any sandbox.
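
As a rough sketch of what "behind a pref" could mean: the extra syscalls would be accepted only while the pref is enabled. Everything here is hypothetical, including `syscall_allowed`, the shape of `prefs` and the base allowlist; it does not reflect how Firefox's seccomp policy is actually written, and a real policy would also have to restrict syscall arguments, which this sketch does not model.

```python
# Hypothetical sketch: none of these names come from Firefox's sandbox code.
NVIDIA_VAAPI_PREF = "security.sandbox.rdd.nvidia-highly-experimental-vaapi"

# Syscalls the driver README asks for (x86_64 numbers quoted in this bug).
NVIDIA_EXTRA_SYSCALLS = {41, 49, 50, 332}  # socket, bind, listen, statx


def syscall_allowed(nr, base_allowlist, prefs):
    """Allow nr if the base policy has it, or if the pref is on and nr is an extra."""
    if nr in base_allowlist:
        return True
    return bool(prefs.get(NVIDIA_VAAPI_PREF, False)) and nr in NVIDIA_EXTRA_SYSCALLS


base = {0, 1}  # read, write: a toy base policy for the example
print(syscall_allowed(41, base, {}))
print(syscall_allowed(41, base, {NVIDIA_VAAPI_PREF: True}))
```

With the pref off, behaviour is identical to the base policy, so users who never touch the pref would see no change.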

Type: defect → enhancement
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Summary: Allow more syscalls for nvidia-vaapi-driver → Allow more syscalls for nvidia-vaapi-driver, possibly behind a pref
Depends on: 1749324
Severity: -- → S3
Priority: -- → P2
Depends on: 1771382
Depends on: 1787714
No longer depends on: 1787714
See Also: → 1830300
Flags: needinfo?(jld)

MOZ_DISABLE_RDD_SANDBOX=1 (plus setting media.hardware-video-decoding.force-enabled, because bug 1752494 blocked every single Nvidia device across the board) is all that is required today.
This is the driver log when you adjust those variables:

376322.287508944 [1100815-1100878] ../src/vabackend.c:2187       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 31
376322.287518143 [1100815-1100878] ../src/vabackend.c:2196       __vaDriverInit_1_0 Now have 0 (0 max) instances
376322.287523143 [1100815-1100878] ../src/vabackend.c:2222       __vaDriverInit_1_0 Selecting Direct backend
376322.295299379 [1100815-1100878] ../src/direct/nv-driver.c: 267            init_nvdriver Initing nvdriver...
376322.295330388 [1100815-1100878] ../src/direct/nv-driver.c: 285            init_nvdriver NVIDIA kernel driver version: 550.90.07, major version: 550, minor version: 90
376322.295338142 [1100815-1100878] ../src/direct/nv-driver.c: 292            init_nvdriver Got dev info: 100 1 2 6
376322.449570794 [1100815-1100878] ../src/vabackend.c:1445      nvQueryImageFormats In nvQueryImageFormats
376322.594721510 [1100815-1100878] ../src/vabackend.c: 674           nvCreateConfig got profile: 6 with 0 attributes
376322.594778409 [1100815-1100878] ../src/vabackend.c:1801 nvQuerySurfaceAttributes with 4 (8) (nil) 0

And this is the log out of the box:

377533.729002022 [1107094-1107164] ../src/vabackend.c: 155                     init CUDA ERROR 'OS call failed or operation not supported on this OS' (304)
377533.729025017 [1107094-1107164] ../src/vabackend.c:2174       __vaDriverInit_1_0 Initialising NVIDIA VA-API Driver: 31
377533.729027517 [1107094-1107164] ../src/vabackend.c:2183       __vaDriverInit_1_0 Now have 0 (0 max) instances
377533.729029675 [1107094-1107164] ../src/vabackend.c:2209       __vaDriverInit_1_0 Selecting Direct backend
377533.742221673 [1107094-1107164] ../src/direct/nv-driver.c: 267            init_nvdriver Initing nvdriver...
377533.742372975 [1107094-1107164] ../src/direct/nv-driver.c: 189          nv_get_versions nv_check_version failed: -1 25
[GFX1-]: VideoBridgeParent receives IPC close with reason=AbnormalShutdown
[Child 1106989, MediaDecoderStateMachine #1] WARNING: Decoder=74378b6f5c00 Decode error: NS_ERROR_DOM_MEDIA_FATAL_ERR (0x806e0005) - auto mozilla::MediaChangeMonitor::CreateDecoderAndInit(MediaRawData *)::(anonymous class)::operator()(const MediaResult &) const: Unable to create decoder: file /usr/src/debug/firefox/firefox-128.0/dom/media/MediaDecoderStateMachineBase.cpp:167

https://github.com/elFarto/nvidia-vaapi-driver/blob/v0.0.12/src/vabackend.c#L168
Long story short, CUDA cannot be initialized (I wonder whether the "move ffmpeg to gpu process" idea proposed in bug 1683808 could help).
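
To compare the two driver logs mechanically, the lines can be split into fields. The layout assumed below (timestamp, `[pid-tid]`, `file:line`, function, message) is inferred from the pasted output only, not from the driver's documentation, and the helper names are made up for this sketch.

```python
import re

# Assumed layout: timestamp [pid-tid] file:line function message
LOG_LINE = re.compile(
    r"^(?P<ts>\d+\.\d+) \[(?P<pid>\d+)-(?P<tid>\d+)\] "
    r"(?P<file>\S+):\s*(?P<line>\d+)\s+(?P<func>\S+)\s+(?P<msg>.*)$"
)


def parse_driver_log_line(text):
    """Return a dict of fields, or None for lines in another format."""
    match = LOG_LINE.match(text)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("pid", "tid", "line"):
        fields[key] = int(fields[key])
    return fields


def failures(lines):
    """Yield parsed lines whose message mentions an error or a failure."""
    for text in lines:
        fields = parse_driver_log_line(text)
        if fields and re.search(r"ERROR|failed", fields["msg"]):
            yield fields
```

Run over the out-of-the-box log, `failures` would pick out the `init` line with the CUDA error and the `nv_get_versions` line, while skipping the Firefox lines that use a different format.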

(In reply to mirh from comment #4)

(In reply to mirh from comment #104)

See also bug 1748460 probably

Yes, I pinged the people in charge of that over Matrix but got no answer.

If you have the hardware, you can now run with the profiler to collect sandbox information; this landed in Nightly a few weeks ago: https://firefox-source-docs.mozilla.org/tools/profiler/sandbox.html#recoding-sandbox-violations

With up-to-date information from the profiler, we can likely work on the sandbox holes more easily.

Flags: needinfo?(mirh)

After countless struggles... (the documentation doesn't tell you that the settings pre-filled in the button/toolbar don't actually work with MOZ_PROFILER_STARTUP, which requires MOZ_PROFILER_STARTUP_FEATURES to be set with the *perftools-presets-debug details)

The great majority of threads only have this to report in the marker table (tens of thousands of times in just half a minute):

SandboxBrokerClient — SandboxBrokerClient  id 20409  op open  rflags 0  path /proc/13360/statm  path2 (empty)  pid 13360
SandboxBrokerClient — SandboxBrokerClient  id 20410  op open  rflags 591872  path /proc/13360/task  path2 (empty)  pid 13360
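
The value 591872 in the second marker can be decoded if one assumes the field carries open(2) flags unchanged (an assumption; the marker schema is not spelled out in this bug). The constants below are the x86_64 Linux values, hard-coded so the sketch does not depend on the machine it runs on, and `decode_open_flags` is a made-up helper name.

```python
# Open-flag values for x86_64 Linux, hard-coded for portability of the sketch.
OPEN_FLAGS = {
    "O_NONBLOCK": 0o4000,
    "O_DIRECTORY": 0o200000,
    "O_CLOEXEC": 0o2000000,
}


def decode_open_flags(value):
    """Return the names of known flag bits set in value, plus any leftover bits."""
    names = [name for name, bit in OPEN_FLAGS.items() if value & bit]
    leftover = value & ~sum(OPEN_FLAGS.values())
    return names, leftover


print(decode_open_flags(591872))
# -> (['O_NONBLOCK', 'O_DIRECTORY', 'O_CLOEXEC'], 0)
```

A leftover of 0 means the access mode is O_RDONLY, so under that assumption the request looks like a read-only directory open of /proc/13360/task, while the value 0 in the first marker would be a plain read-only open.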

Then, in one of the YouTube ones, you'll sometimes see this:

SandboxBrokerClient — SandboxBrokerClient  id 20413  op readlink  rflags 0  path /proc/self/exe  path2 (empty)  pid 13360

This is eventually followed by an incredible number of library read attempts (with the same parameters as the statm one): linux-vdso.so.1, ./libmozsandbox.so, /usr/lib/libdl.so.2 and /usr/lib/libstdc++.so.6 are just the first few, and the list goes on and on.

Flags: needinfo?(mirh)

You don't need to do it using MOZ_PROFILER_STARTUP, and please share the generated profile.

So can you generate a profile and share it? There's no need for profiler startup: as long as you start the profiler on a fresh profile instance, loading e.g. a YouTube page and playing a few seconds of video should be enough to kick off the RDD process and capture the required information. Once you have it, share it here, and we can iterate like we did on bug 1903688.

Flags: needinfo?(mirh)
Flags: needinfo?(mirh)

Unfortunately, I don't see any RDD process here. Was it present in about:processes? Its name should be "Remote Data Decoder" (it would be localized if you use a non-English build).

No it's not there.
Though maybe this has something to do with its absence
https://crash-stats.mozilla.org/report/index/ee582eb5-392d-43a0-8c69-250a50240715

(In reply to mirh from comment #11)

No it's not there.
Though maybe this has something to do with its absence
https://crash-stats.mozilla.org/report/index/ee582eb5-392d-43a0-8c69-250a50240715

Unfortunately, that's hard to act on. Could sandboxing be triggering a bug in Nvidia's code? We might be misled because we lose profiler info from the crashing process.

Unfortunately, we need to fall back to MOZ_SANDBOX_LOGGING=1 MOZ_SANDBOX_RDD_LOGGING=1 firefox 2>&1 | tee sandbox.log to investigate.

Flags: needinfo?(mirh)
Attached file sandbox.log

And this is from coredumpctl

>>> bt
#0  0x00007ff9a4b75e0b in __GI_____strtol_l_internal (nptr=nptr@entry=0x0, endptr=endptr@entry=0x0, base=base@entry=10, group=group@entry=0, 
    bin_cst=bin_cst@entry=true, loc=0x7ff9a4d0c3c0 <_nl_global_locale>) at ../stdlib/strtol_l.c:304
#1  0x00007ff9a4b75dbc in __GI___isoc23_strtol (nptr=nptr@entry=0x0, endptr=endptr@entry=0x0, base=base@entry=10) at ../stdlib/strtol.c:126
#2  0x00007ff98f76dda2 in atoi (__nptr=0x0) at /usr/include/stdlib.h:483
#3  init_nvdriver (context=context@entry=0x7ff99409fc00, drmFd=27) at ../src/direct/nv-driver.c:283
#4  0x00007ff98f76d551 in direct_initExporter (drv=0x7ff99409fb30) at ../src/direct/direct-export-buf.c:100
#5  0x00007ff98f77423c in __vaDriverInit_1_0 (ctx=0x7ff9a49303e0) at ../src/vabackend.c:2246
#6  0x00007ff99d3521a3 in vaInitialize () from /usr/lib/libva.so.2
#7  0x00007ff999d4ecb2 in mozilla::FFmpegVideoDecoder<46465650>::CreateVAAPIDeviceContext() () from /nightly-root/firefox/libxul.so
#8  0x00007ff999d4e4af in mozilla::FFmpegVideoDecoder<46465650>::InitVAAPIDecoder() () from /nightly-root/firefox/libxul.so
#9  0x00007ff999d4868c in mozilla::FFmpegVideoDecoder<46465650>::Init() () from /nightly-root/firefox/libxul.so
#10 0x00007ff999d20360 in mozilla::detail::ProxyFunctionRunnable<mozilla::MediaDataDecoderProxy::Init()::$_0, mozilla::MozPromise<mozilla::TrackInfo::TrackType, mozilla::MediaResult, true> >::Run() () from /nightly-root/firefox/libxul.so
#11 0x00007ff996a8c015 in mozilla::TaskQueue::Runner::Run() () from /nightly-root/firefox/libxul.so
Flags: needinfo?(mirh)
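
Frames #2 and #3 of the trace show atoi being handed a NULL pointer inside init_nvdriver. One plausible reading, which the trace alone does not prove, is that a piece of the driver version string was unavailable under the sandbox. The sketch below shows the kind of defensive parse that would avoid such a crash; it is written in Python for illustration, is not the driver's C code, and `parse_driver_version` is a hypothetical name. The sample string "550.90.07" is taken from the working log earlier in this bug.

```python
def parse_driver_version(version):
    """Return (major, minor) from a string like '550.90.07', or None if unusable."""
    if not version:
        return None  # the equivalent of the NULL pointer that reached atoi
    parts = version.split(".")
    if len(parts) < 2:
        return None
    try:
        return int(parts[0]), int(parts[1])
    except ValueError:
        return None


print(parse_driver_version("550.90.07"))
print(parse_driver_version(None))
```

Returning a "no version" result lets the caller fail initialization cleanly instead of crashing the whole process.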

(This is one of the reasons the profiler is more comfortable.) Can you confirm PID 119874 was the RDD process?

So far I don't see any https://searchfox.org/mozilla-central/rev/8c6edfe25c094e032a27722ef30f69555f556bf8/security/sandbox/linux/Sandbox.cpp#156-161 but there are several file denials close to attempts to load CUDA in, e.g., 119874 and a few others.

That's the one that returned the above gdb trace, yes.
