RLBox - Safely load dynamic library dealing with content sandbox policies
Categories
(Core :: Security: Process Sandboxing, enhancement, P3)
Tracking
()
People
(Reporter: shravanrn, Assigned: froydnj)
References
Details
Attachments
(5 files)
Project description - See Bug 1554268
RLBox modifies Firefox to use wasm sandboxed versions of third party or otherwise risky libraries. The first target is libGraphite (font library), which is used in libThebes.
Currently, wasm sandboxed libraries must be compiled and used as shared object binaries, that must be dynamically loaded and thus cannot be statically or dynamically linked.
To this end, the code in libThebes must be able to dynamically load (dlopen) the shared object. However, since libThebes is in a content process, the content sandbox prevents this.
I'm hoping to get some info on the best options to get past this. Is there a way to allow read only to a set of some safe files from the content process?
Perhaps the content sandbox already permits access to some files in the Firefox binary folder?
Any information or examples of how to get past the content sandbox for this specific use case would be great.
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Comment 1•5 years ago
|
||
Have you encountered this problem on specific platforms? (I added :haik and :jld for OSX/Linux.)
On Windows, I don't believe we have any current restriction that prevents this from occurring in the Content Process; but we want to in the future so it may make sense to future-proof it. Presently, our best approach for doing so is to preload the library; that is, LoadLibrary() it before we lower the sandbox. Later on, a subsequent call to LoadLibrary will be a no-op, but will successfully return the handle of the already-loaded library.
Comment 2•5 years ago
|
||
On Linux, there are no restrictions on filesystem access before the sandbox start point (e.g., ContentChild::RecvSetProcessSandbox
). After the sandbox is started, content processes can continue to load libraries; GMP and RDD processes cannot.
In general, the advice for this is to load the library earlier (assuming the library itself is from a trusted source); see the Windows GMP code for an example.
Note that the Mac sandbox is started relatively early, so that might be causing problems there.
Comment 3•5 years ago
|
||
On Mac, once the content sandbox has started, libraries from within the main Firefox.app should still be able to be loaded.
Is there a way to allow read only to a set of some safe files from the content process?
Yes, there is a way to do this if the libraries can't be delivered in the .app bundle.
If you could provide some more details about where you're loading from now and what failures you're seeing, that would help.
Reporter | ||
Comment 4•5 years ago
•
|
||
Thanks a bunch for the info! So from your comments, I believe one option is that we should perform an early load of the library, prior to sandbox creation. I think this makes sense, and I will investigate this option.
I think the above solution makes sense for the current example of sandboxing the font library. In the future when we move to sandboxing rarely used libraries, I think we would not necessarily want to preload sandboxed libraries. For instance, if we sandbox a library that handles a rarely used image format, we want to defer loading until we hit a page that has an image in this format.
On Linux ... content processes can continue to load libraries
On Mac ... libraries from within the main Firefox.app should still be able to be loaded
On Windows, our best approach... is to preload the library
From your comments, above, it seems that deferred loading is possible at least on Linux/Mac. So it sounds like dlopen
would work.
Question(Linux/Mac) - Does this mean that syscalls like open
would also continue to work?
On Windows, restriction ... in the Content Process ... in the future
Question(Win) - Once this is enabled, would a LoadLibrary of a non pre-loaded dll be permitted, at least for dlls in the Firefox binary folder? We can definitely punt on this question if the details of the Windows content sandbox are unclear at this time.
Have you encountered this problem on specific platforms?
Sorry, I should have mentioned this earlier. The initial plan for wasm sandboxing is Linux 64. So my testing and this bug description is for Linux 64. However the goal is to keep this general, so that we can expand to other platforms. Mac 64 is a relatively easy next step in the roadmap. Win64 is a longer term goal as Lucet does not currently support it.
Yes, there is a way to do this if the libraries can't be delivered in the .app bundle. If you could provide some more details about where you're loading from now and what failures you're seeing, that would help.
Ok, this makes sense now. The current prototype is a proof of concept and is a bit hacky - it would load from a path outside the Firefox binary folder/app bundle. After cleanup, I am hoping to set it up so that the load is from the same folder (app bundle?) as libxul.
Question(Mac) - From your comments, it seems that this deferred library load would be permitted?
Caveat - I am not very familiar with the app bundles on Mac, so I apologize if the above description is flawed.
Follow up investigation from my side - The code that dynamically loads the wasm sandbox (RLBox_Lucet + a customized Lucet compiler runtime), uses 2 file related system calls prior to loading
- dlopen/LoadLibrary
- File open, read, close syscalls - to parse some additional metadata from the shared library file that is not included in the dlopen above
I will check to see if we can eliminate #2 above, in case that helps simplify working with the content sandbox.
Reporter | ||
Updated•5 years ago
|
Comment 5•5 years ago
|
||
(In reply to Shravan Narayan from comment #4)
In the future when we move to sandboxing rarely used libraries, I think we would not necessarily want to preload sandboxed libraries. For instance, if we sandbox a library that handles a rarely used image format, we want to defer loading until we hit a page that has an image in this format.
Question(Win) - Once this is enabled, would a LoadLibrary of a non pre-loaded dll be permitted, at least for dlls in the Firefox binary folder? We can definitely punt on this question if the details of the Windows content sandbox are unclear at this time.
No; there are several restrictions in place here that would prevent it. The sandboxed process can't open the file to load it - because it can't open any files. Additionally, any dll not signed by Microsoft would be prevented from loading.
Preloading the library may be disadvantageous for performance reasons; but I'm not worried about it otherwise. It's possible that by the time we got the Windows sandbox restricted in this way we could have a solution that maps libraries into the process' address space from the parent.
(Also, not a big deal; but tom@ritter.vg is an old/person bugzilla account; better to use tom@mozilla)
Comment 6•5 years ago
|
||
Question(Linux/Mac) - Does this mean that syscalls like open would also continue to work?
On Mac, open syscalls are still permitted, but it depends on the path. We have SandboxPolicyContent.h in the tree which uses a scheme program to specify the sandbox rules and you can get an idea of what is allowed from that.
After cleanup, I am hoping to set it up so that the load is from the same folder (app bundle?) as libxul.
Question(Mac) - From your comments, it seems that this deferred library load would be permitted?
Caveat - I am not very familiar with the app bundles on Mac, so I apologize if the above description is flawed.
For Mac, that sounds fine and there are several other .dylibs in the same directory (Firefox.app/Contents/MacOS). This library would be like other libraries we ship and be signed by Mozilla.
Comment 7•5 years ago
|
||
(In reply to Shravan Narayan from comment #4)
Question(Linux/Mac) - Does this mean that syscalls like
open
would also continue to work?
Strictly speaking, in the Linux content sandbox the open
syscall doesn't work: it raises SIGSYS
, the signal handler communicates with an unsandboxed broker, and the register state is modified to reflect the simulated syscall return value; the real open
syscall can't be invoked.
But from the caller's point of view, it's similar to the Mac answer above: open
appears to work, but restricted by a policy. The policy is constructed in the awkwardly named SandboxBrokerPolicyFactory.cpp
; note the parser for ld.so.conf
, so that we can allow loading system libraries, but also the section with NS_GRE_DIR
which should cover anything that's shipped as part of Firefox.
Setting MOZ_SANDBOX_LOGGING
in the environment may be helpful for understanding what's going on.
Reporter | ||
Comment 8•5 years ago
|
||
Thanks a bunch to everyone for the info! Will investigate the above options and keep this thread updated.
Reporter | ||
Comment 9•5 years ago
|
||
Hmm... Seeing some super odd behaviors here. Hoping to get some help with below
I am currently testing Linux 64. The dynamic library being loaded is libGraphite. libGraphite.so
is now output in the same directory as libxul.so
as part of the build. libGraphite.so
is also mentioned in package-manifest.in, so that it looks like a normal binary shipped with Firefox.
I start by calling dlopen on libGraphite.so
prior to enabling of the content sandbox, and then trying to dlopen libGraphite.so
at the relevant location. For preloading, I dlopen'd the library here.
Question - is this a good location for the preload. I'm trying to make sure there are no races, or this isn't too late etc.
This seems to work when running Firefox say using ./mach run layout/reftests/text/graphite-01.html
. However, the second dlopen seems to be failing on mach tests... For instance running ./mach test layout/reftests/text/graphite-01.html
crashes with content sandbox violations.
To pin down what is happening, I enabled export MOZ_SANDBOX_LOGGING=1
. Both mach run
and mach test
produce
0:00.97 GECKO(2480) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/: 1 -> 35
0:00.97 GECKO(2480) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin: 1 -> 3
0:08.80 GECKO(2480) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/: 1 -> 35
0:08.80 GECKO(2480) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin: 1 -> 3
0:47.68 GECKO(2838) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/: 1 -> 35
0:47.68 GECKO(2838) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin: 1 -> 3
0:55.63 GECKO(2838) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/: 1 -> 35
0:55.63 GECKO(2838) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin: 1 -> 3
After this mach run
outputs
Sandbox: Recording mapping /home/USER/mozilla-central/libGraphite.so -> /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libGraphite.so
while mach test
outputs
1:23.59 GECKO(2838) Sandbox: SandboxBroker: denied op=stat rflags=400000 perms=0 path=/home for pid=3001
1:23.59 GECKO(2838) Sandbox: Failed errno -13 op stat flags 0400000 path /home
Question - Do mach tests have different policies on content sandbox? Is there any reason the above works for mach run
but doesn't work for mach test
?
Syscalls
I also wanted to be more complete about the set of syscalls used. I have removed the need for the open syscall from the code. So we just use dlopen. However, there are also some calls to the RUST API cannonicalize to standardize file paths... According to the docs this is equivalent to realpath
which I imagine has some syscalls to resolve symlinks...
Updated•5 years ago
|
Comment 10•5 years ago
|
||
I start by calling dlopen on libGraphite.so prior to enabling of the content sandbox, and then trying to dlopen libGraphite.so at the relevant location. For preloading, I dlopen'd the library here.
Note that ContentParent runs in the parent/chrome process. This is probably not what you want?
Do mach tests have different policies on content sandbox?
No. There are some differences between running a normal (packaged) Firefox and running the build output. If you're running the build output, the entire source repo gets explicitly whitelisted, because some JS stuff doesn't correctly copy its files into the objdir but leaves them in random places in the filesystem. See for example https://searchfox.org/mozilla-central/source/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp#427
I'm not sure if this is different for mach run vs mach test. Looking at MOZ_SANDBOX_LOGGING output will tell you if the expected paths were whitelisted.
Sandbox: Recording mapping
FYI what's happening here is that the sandbox broker notices that this library isn't actually in $OBJDIR/dist/bin but that it's a symlink to another location. It will remember this so it knows that opening the library in the original path is actually OK.
Is there any reason the above works for mach run but doesn't work for mach test?
There's nothing relevant in the excerpts you posted related to opening libGraphite.so. Or maybe there is, but we can't tell from so little info.
However, there are also some calls to the RUST API cannonicalize to standardize file paths... According to the docs this is equivalent to realpath which I imagine has some syscalls to resolve symlinks...
Yes, it will stat/lstat() paths along the way. These are brokered, and the broker will refuse access to some paths because they are outside the allowed paths. This is likely what you saw in the log snippet you posted, i.e. the stat() call on /home is refused.
Comment 11•5 years ago
|
||
FWIW, uses of nsIFile::Normalize(), which is the platform-independent way of calling realpath() and likely the same as canonicalize(), typically have their return values ignored in the content process: https://searchfox.org/mozilla-central/search?q=symbol:_ZN11nsLocalFile9NormalizeEv%2C_ZN7nsIFile9NormalizeEv&redirect=false
Note for example this code: https://searchfox.org/mozilla-central/source/dom/media/gmp/GMPChild.cpp#147 which shows that it's best effort but continues even if it fails.
Reporter | ||
Comment 12•5 years ago
•
|
||
Note that ContentParent runs in the parent/chrome process. This is probably not what you want?
Oh you're right... that definitely seems like the wrong place. Would you have any thoughts on the appropriate location in the code to do this preloading? I'm hoping to find the place in source which runs in the content process prior to the content sandbox enforcement.
There are some differences between running a normal (packaged) Firefox and running the build output.
This is good to know! For the moment, we are still in the phase where we are testing on build output only. I will look definitely into this when we start testing the packaged builds.
I'm not sure if this is different for mach run vs mach test. Looking at MOZ_SANDBOX_LOGGING output will tell you if the expected paths were whitelisted.
Thanks, I will try this out.
typically have their return values ignored in the content process
Unfortunately the call to canonicalize is in third party rust code, which does abort on failure... From your description, it seems possible that this call to canonicalize may be one of the sources of problems for mach test
... I will investigate and check if this needs to be removed.
Comment 13•5 years ago
|
||
(In reply to Shravan Narayan from comment #12)
Note that ContentParent runs in the parent/chrome process. This is probably not what you want?
Oh you're right... that definitely seems like the wrong place. Would you have any thoughts on the appropriate location in the code to do this preloading? I'm hoping to find the place in source which runs in the content process prior to the content sandbox enforcement.
Trace your way backwards from https://searchfox.org/mozilla-central/rev/e5327b05c822cdac24e233afa37d72c0552dbbaf/security/sandbox/linux/Sandbox.cpp#521 which turns the sandbox on and you'll find the code that happens right before it
Updated•5 years ago
|
Reporter | ||
Comment 14•5 years ago
•
|
||
Hmm... I'm still having a lot of trouble with this. In short mach run
works, but mach test
does not...
Description
- I am preloading the dynamic library in ContentChild.cpp
- The only call is to dlopen, there are no calls to APIs like canonicalize (realpath)
Would anyone have any approaches to debugging this?
What I have tried
- I have already looked through the logs of
mach run
andmach test
with MOZ_SANDBOX_LOGGING enabled but this is not illuminating...mach run
runs successfully, butmach test
fails and produces the following log statements just prior to the crash
1:26.13 GECKO(24069) Sandbox: SandboxBroker: denied op=open rflags=2000000 perms=0 path=/dev/null for pid=24240
1:26.13 GECKO(24069) Sandbox: Failed errno -13 op open flags 02000000 path /dev/null
1:26.13 GECKO(24069) Hit MOZ_CRASH(failed to open /dev/null: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }) at src/libcore/result.rs:999
There is no mention of '/dev/null' in the logs of mach run
- I tried attaching the debugger to
mach test
(both gdb and rr), but this doesn't seem to be supported
Questions
- froydnj: Is there a way to hook the rr debugger on to
mach test
... I can try to trace the source of the code that needs/dev/null
- gcp, tjr: Is there any other way to investigate this other than the above? Any other logging mechanism? Alternately rather than preloading, should we be considering explicitly whitelisting the path to the dynamic library? Also, if it helps I can attach the logs of
mach run
andmach test
Thoughts welcome!
Comment 15•5 years ago
|
||
Alternately rather than preloading, should we be considering explicitly whitelisting the path to the dynamic library?
No, we already established that the build process puts it in a whitelisted directory, and if you preload it then the broker doesn't even come into play wrt to that. So I don't see what doing that would achieve.
You should figure out what is trying to open /dev/null. If that's code that is hard to modify, we can whitelist /dev/null. But removing the access would be even better, because there's no sane reason anything would need it and the less access, the better...
Whatever code is doing that access seems to be Rust code...
Comment 16•5 years ago
|
||
I'm going to take a wild guess that this is the offending code:
https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/process/process_common.rs#L347
Updated•5 years ago
|
Reporter | ||
Comment 17•5 years ago
|
||
gcp: oh nice find! That looks plausible. Will investigate.
Reporter | ||
Comment 18•5 years ago
|
||
So, it wasn't that particular piece of code, but I did identify where we rely on /dev/null
. This comes from an upstream dependency "lucet" we rely on for library sandboxing. We do maintain a fork of lucet for our purposes, so I can remove the offending code.
That said, the inconsistency of mach run
allowing /dev/null
and mach test
not doing so is a bit strange. I'm wondering if this behavior is a separate content sandbox bug
Further
- when I look at the logs of
mach run
/dev/null is never mentioned - when I look at SandboxBrokerPolicyFactory.cpp,
/dev/null
is not mentioned - when I look at list of open files in Firefox (to check if
/dev/null
was opened before the sandbox was enabled),/dev/null
is not mentioned
However, the same code that uses /dev/null
runs without issue here... So, not super sure what's going on here... The only other explanations I can think of seem unlikely...
- somehow /dev/null is symlinked to something that is allowed, which was opened
- The source repo whitelist on when running from the build folder, somehow allow /dev/null
Any thoughts on whether I should open a new content sandbox bug to track this?
Comment 19•5 years ago
|
||
That said, the inconsistency of mach run allowing /dev/null and mach test not doing so is a bit strange. I'm wondering if this behavior is a separate content sandbox bug
Are you sure that this is actually what happens? As far as I can tell we don't allow /dev/null anywhere.
Any thoughts on whether I should open a new content sandbox bug to track this?
I see no evidence there is any issue with the content sandbox.
The only other explanations I can think of seem unlikely...
I can think of at least one other one: "lucet" or some other code is trying to open /dev/null when running tests but not when doing a normal run. For some reason.
Reporter | ||
Comment 20•5 years ago
|
||
"lucet" or some other code is trying to open /dev/null when running tests but not when doing a normal run. For some reason.
I was wondering about this as well and that was my initial theory. So I did some printf debugging to make sure the same code is executing... While it is the same code that's running, the results of the code here are confusing with how it could interact with the content sandbox. Given that you believe this is not a content sandbox issue, I will investigate the code in question (described below) more carefully.
As far as I can tell we don't allow /dev/null anywhere.
Good to know. This definitely seems to indicate its not a content sandbox issue. Let me investigate further from my side.
Also, here is a quick description of the code I'm looking at. I have some theories about what is going wrong. (Sharing here in case its something obvious that I'm missing :) )
Code that's executing
The lucet code executes these 2 functions
Briefly, these 2 functions
- Create 3 new file descriptors
fds[0]
,fds[1]
,fds[2]
- Open file
/dev/null
, duplicates the descriptor and assigns to all 3fds
- Reassign
fds
by duplicatingstdin
,stdout
andstderr
Possible execution paths for this code
- If the firefox process'
stdin
,stdout
andstderr
are left as default, the code would attempt to open/dev/null
briefly, but never actually read or write to it. - If
stdin
,stdout
andstderr
are/dev/null
during launch, this code would attempt to open/dev/null
, duplicate the file descriptor pointing to open/dev/null
and read from/write to it.
Investigations from my side
I am planning to check if one or both execution paths described above work. If only one works while the other fails, I will see if mach run
and mach test
cause different execution paths
Comment 21•5 years ago
|
||
Have you watched procfs and see where they point to during execution?
Comment 22•5 years ago
|
||
It would actually make a lot of sense that during tests the stdout/stderr/stdin are redirected, but not during a normal run.
Reporter | ||
Comment 23•5 years ago
|
||
I'm still investigating, but thought I'd post my progress
I did some checks on the procfs, by calling readlink
on file descriptors 0, 1 and 2 in the code
- For mach run,
stdin
,stdout
andstderr
point to the terminal. - For mach test,
stdin
points to the terminal,stdout
andstderr
are piped somewhere (readlink
returns strings that look like "pipe:[255774]" - I haven't yet figured out a way to decode what that is, as the id in the string is not a PID)
It seems like this could be the issue... Here are some of the next tests I tried
To see if "/dev/null" was the culprit, I did confirm that adding /dev/null
to SandboxBrokerPolicyFactory.cpp allows mach test
to run, so there is light at the end of the tunnel :)
To see if I can get mach run to crash with a similar message, I executed mach run with input and output pointing to "/dev/null" directly, with the below command. However, this worked fine without a crash.
2>/dev/null 1>/dev/null /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/firefox layout/reftests/text/graphite-01.html -no-remote -profile /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/tmp/profile-default
Did the same with the output and error pointing to a log file on my Desktop (which is a path not permitted by the sandbox broker), this too works fine.
I am doing some follow up to see of calling dup
on the pipe is somehow the issue... AFAICT, the linux manpages don't explicitly disallow this, so not sure if this is the issue...
Reporter | ||
Comment 24•5 years ago
|
||
Confirmed that dup is not the issue - dup followed by write works correctly on the stdin, stdout and stderr file descriptors even in mach test
... At this point unless there is a way to attach rr on to mach test
, I think I have exhausted most other avenues/approaches to explore this...
To this end, I think it may be time to consider if we can allow read write access to /dev/null in the linux sandbox
gcp, tjr: Do you think this is an option? Since I'm unfamiliar with the process, I also wanted to ask if there are additional considerations i.e. does this require additional sign-offs, approvals or security reviews?
Comment 25•5 years ago
|
||
Not my call; probably gcp's or Jed's.
Comment 26•5 years ago
|
||
(In reply to Shravan Narayan from comment #23)
(
readlink
returns strings that look like "pipe:[255774]" - I haven't yet figured out a way to decode what that is, as the id in the string is not a PID)
That's an inode number. You can try to find the other end with something like ls -l /proc/*/fd/* 2>/dev/null | grep -w 255774
. (Both ends of a pipe use the same inode; contrast socketpairs, which have an adjacent pair of inode numbers. Credit for this trick goes to Chromium.)
But in this case, the question seems to be whether the WASI virtual fds 0-2 point to the same file as the real fds 0-2.
Comment 27•5 years ago
|
||
I don't see any problem with adding /dev/null to the policy. Right now we for sure allow much riskier stuff.
But I still think it is unneeded - may just not be worth the effort to work around/fix whatever funnies the rust stdlib does.
Given that it's only needed for mach test, how about using the whitelist preference (security.sandbox.content.read_path_whitelist) to inject it during tests?
Updated•5 years ago
|
Comment 28•5 years ago
|
||
Adding /dev/null
shouldn't be a problem, given that we already allow /dev/urandom
(and allow inheriting tty fds).
But it would be good to know why this happens only with mach test
and not mach run
. According to comment #20 this isn't the Rust stdlib; it's this line, and there doesn't seem to be anything in that module that would construct a WasiCtx
without calling it (although that could be changed easily enough). If the sandbox is allowing access to a file and we don't know why, that's kind of a problem.
Instrumenting the broker to log when accesses are allowed (instead of just denials) might help, or using strace
, or adding a sandbox policy rule with the CRASH_INSTEAD
flag and backtracing in a debugger.
As for rr
, it should be possible to record the entire mach test
run, select the process of interest in rr ps
, and replay that process.
Reporter | ||
Comment 29•5 years ago
•
|
||
But in this case, the question seems to be whether the WASI virtual fds 0-2 point to the same file as the real fds 0-2.
This is the question I am currently investigating, although I should point out that I'm not fully sure of what is going on here yet, so this may stiil be a red herring...
may just not be worth the effort to work around/fix whatever funnies the rust stdlib does.
Yeah this is sort of the stage I'm approaching, given the debugger issues I am seeing (described below)
Given that it's only needed for mach test, how about using the whitelist preference
I am sort of worried about doing this... Clearly there is some sort of configuration that can get Firefox to run that causes this code to require /dev/null which is somehow triggerred in mach test. Given this, there is probably some way to get mach run
or stock firefox to also operate in this configuration... If some user does actually do this, this would cause a crash, so adding an exception to mach test
only feels like a bug waiting to happen
Fwiw, I have unsuccessfully tried to get firefox to crash with the same error in mach run
when
- by piping output and err to another program
- by redirecting output and err to /dev/null
- by redirecting output in C, with
dup2
+exec(firefox)
(in case C was needed to trigger the pipe) - by setting up pipes and executing firefox from a C program (in case C + pipes was needed to trigger the pipe)
All of the above cases work without crashing.
Re using rr
Unfortunately have run into a couple of blocking bugs when trying to use rr
- unlike mach run,
mach test
doesn't seem to use the "--debugger" flag (it allows it, but then ignores it) to set the debugger only on the Firefox process - if i try to record all of
mach test
, I run into an rr bug, where the replay just causes rr to crash with a backtrace (attached below if you are curious)...
I spoke to @froydnj offline, and he suggested I open bugs for both of these, but this does unfortunately make it harder to understand why mach run and mach test differ
Using gdb directly on mach test is also infeasibile as there are about 50 fork/exec's here before the content process is launched... Following the correct forks and execs to get to the right one would be too tedious
Next steps
Instrumenting the broker to log when accesses are allowed (instead of just denials) might help, or using strace, or adding a sandbox policy rule with the CRASH_INSTEAD flag and backtracing in a debugger.
- These are good suggestions... Will investigate!
That's an inode number. You can try to find the other end with ...
- Oh nice! will investigate the inode numbers to see if this sheds some light on why this is happening.
rr backtrace
[ERROR /build/rr-ViolGa/rr-5.1.0/src/Registers.cc:256:maybe_print_reg_mismatch()] r9 0x7ffcc7be38c0 != 0x7ffcc7be3ac0 (replaying vs. recorded)
[FATAL /build/rr-ViolGa/rr-5.1.0/src/Registers.cc:366:compare_register_files()]
(task 15563 (rec:13231) at time 24770)
-> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:216752702/216752713)
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x41)[0x561ed216ddd1]
rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0x5c5)[0x561ed20a0135]
rr(_ZN2rr21EmergencyDebugOstreamD1Ev+0x11e)[0x561ed20ad6be]
rr(_ZN2rr9Registers22compare_register_filesEPNS_10ReplayTaskEPKcRKS0_S4_S6_NS_16MismatchBehaviorE+0xe3)[0x561ed2110a93]
rr(_ZN2rr10ReplayTask13validate_regsEj+0x26c)[0x561ed212a46c]
rr(_ZN2rr13ReplaySession13enter_syscallEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0x575)[0x561ed2117d05]
rr(_ZN2rr13ReplaySession18try_one_trace_stepEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0xdb)[0x561ed211a73b]
rr(_ZN2rr13ReplaySession11replay_stepERKNS0_15StepConstraintsE+0x128)[0x561ed211b3b8]
rr(_ZN2rr14ReplayTimeline19replay_step_forwardENS_10RunCommandEl+0xdf)[0x561ed2132a5f]
rr(_ZN2rr9GdbServer12serve_replayERKNS0_15ConnectionFlagsE+0x81)[0x561ed209f281]
rr(+0xfdbcf)[0x561ed2111bcf]
rr(_ZN2rr13ReplayCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x570)[0x561ed2112d40]
rr(main+0x1da)[0x561ed2051ffa]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f9019caab97]
rr(_start+0x2a)[0x561ed20521ca]
=== End rr backtrace
Reporter | ||
Comment 30•5 years ago
•
|
||
_
Reporter | ||
Comment 31•5 years ago
•
|
||
(Ignore the last attachment - hit send too early)
Reporter | ||
Comment 32•5 years ago
•
|
||
Aha, managed to figure out how to get gdb attached to the right process in mach run
and mach test
!
I can confirm that the thing causing the crash is indeed the lucet rust code... Further the crash has nothing to do with the calls to dup
or pipes. The crash definitely occurs here which is simply a call in rust to open("/dev/null"). When this line of code is executed in mach run
, the SYSSIG handler in SandboxBrokerClient.cpp is able to successfully request opening of /dev/null here (returns a descriptor that is not -1), while the same request is rejected for mach test (returns -1).
This also leads to a better framing of the issue I'm seeing. The question is not why mach test fails, but rather why does mach run pass? For the purposes of the code I am working on, the "correct" behavior should require explicitly allowing open and write access to /dev/null in the broker.
Barring any objections, I believe adding the exception seems to be right way forward. In the interest of scoping the work I'm doing - I do believe the fact that mach run allows this without the explicit rule feels like an orthogonal bug that may need separate investigation.
Here are a couple of other tests I did to see if I can get more info on why mach run
passes.
- Test: Added a policy of CRASH_INSTEAD on /dev/null
Result: this causes both mach run and mach test to fail on this line
Conclusion: apparently, without CRASH_INSTEAD,mach run
seems to permit /dev/null - Test: Tried opening /dev/null in C++ code
Result: this fails in both mach run and mach test
Conclusion: This result is very confusing. Somehow rust code can open /dev/null, but C++ code cannot?
For completeness, here are the results of the next steps from my previous post
- strace - did this. Wasn't too illuminating as both have calls to open dev/null... one succeeds the other one deosn't
- CRASH_INSTEAD flag - mach run and mach test fails on the rust code listed above, if this policy is included.
- inode number investigation - Wasn't the issue, as the crash is on the open, not on dup or the pipes...
gcp, jld: Please let me know if you have objections to the above conclusion i.e. adding the /dev/null exception + tracking any other part of this as a separate bug with different owners.
Comment 33•5 years ago
|
||
Added a policy of CRASH_INSTEAD on /dev/null
Result: this causes both mach run and mach test to fail on this line
Conclusion: apparently, without CRASH_INSTEAD, mach run seems to permit /dev/null
Do you have a backtrace here? The log you give above is for another file.
This result is very confusing. Somehow rust code can open /dev/null, but C++ code cannot?
This isn't really possible as far as we understand. For this to work the calls have to be either in a different process (i.e. a non-sandboxed one for rust) or before the sandbox is initialized.
gcp, jld: Please let me know if you have objections to the above conclusion i.e. adding the /dev/null exception + tracking any other part of this as a separate bug with different owners.
You can split the bug to do whatever you originally wanted to do here, and to deal with figuring out why /dev/null is being accessed (and to get rid of it again, potentially) in another bug. I have no idea why the second bug would have to have another owner, though :-)
When this line of code is executed in mach run, the SYSSIG handler in SandboxBrokerClient.cpp is able to successfully request opening of /dev/null here (returns a descriptor that is not -1), while the same request is rejected for mach test (returns -1).
A MOZ_SANDBOX_LOGGING=1 output of a run where that happens (i.e. it being allowed) might reveal a lot.
Comment 34•5 years ago
|
||
Is there some way for us to reproduce the testing that you are doing?
If necessary dump the WIP onto a github repo or something...
Reporter | ||
Comment 35•5 years ago
|
||
Reporter | ||
Comment 36•5 years ago
|
||
Reporter | ||
Comment 37•5 years ago
•
|
||
Do you have a backtrace here? The log you give above is for another file.
Have attached the 2 logs
- mach run with the current policy unchanged (i.e. no crash instead) - this allows the open of /dev/null
- mach run with the crash instead on /dev/null policy - this crashed on the open of /dev/null
In both logs, you can look for the string "!!!!!!!!!About to call the Graphite setup" which is printed just prior to the call to open "/dev/null". If the call to open /dev/null is successful in rust, the rust code prints "!!!!!!!!!!!!!!!!!!!!/dev/null opennig was ok" - this is present in the first log.
/dev/null is being accessed (and to get rid of it again, potentially) in another bug
Actually, we can't get rid of this for 2 reasons...
- My original confusion was actually because I thought I did not need a policy for open... I had thought that I needed a policy only on read and write. However, given that I need a policy for open as well, the only solution here is to add /dev/null
- We also do need write access for the library sandboxing use case. While we initially were Ok with redirecting output from the sandboxed library to the terminal, we subsequently found this is actually something that needs be context dependent or configurable... If we choose to disallow output to the terminal based on the config, this would mean redirecting all writes to /dev/null...
Comment 38•5 years ago
|
||
Thanks to the logs, I think I know what's going on here: the test is being loaded from a file:
URL in the mach run
case. File URLs are loaded in a dedicated content process which has read access to the entire filesystem (see FILE_REMOTE_TYPE
in the source, and the policy for /: 1 -> 35
line in the logs), and File::open
is a read-only open (despite being used for virtual stdout/stderr, which might be a bug).
Reporter | ||
Comment 39•5 years ago
|
||
File URLs are loaded in a dedicated content process which has read access to the entire filesystem
Oh, nice find! That is definitely it! The whole thing makes sense.
mach test
loads these files from a local test server, which is why it was failing- the upstream lucet rust code is opening /dev/null read only (which as you pointed out looks like a bug given that they are using this for writing)
- my C++ test of open /dev/null crashed even in mach run as in my test, I was opening with write access (which is what the upstream code should have been doing anyway)
Thanks so much for your help! This definitely clarifies everything.
I think just to summarize the long thread, I think the conclusions are
- default policy is that /dev/null is inaccessible by default (except when handling file urls, where the entire file system is read only by design)
- since we need read/write access to /dev/null, we need a sandbox broker policy here
- upstream lucet has a bug where they are opening /dev/null read only for some reason
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 40•5 years ago
|
||
Updated•5 years ago
|
Reporter | ||
Comment 41•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 42•5 years ago
|
||
Pushed by tritter@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eeaa7ecf70e3 part 2 - Allow RW access to /dev/null in content sandbox r=gcp
Comment 43•5 years ago
|
||
bugherder |
Comment 44•5 years ago
|
||
Sorry there's still Part 1 outstanding.
Updated•5 years ago
|
Comment 45•4 years ago
|
||
Pushed by nfroyd@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0309a3ff69a0 part 1 - Preload RLBox dynamic library before content sandbox to allow subsequent loads r=froydnj
Comment 46•4 years ago
|
||
bugherder |
Comment 47•4 years ago
|
||
Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.
Updated•4 years ago
|
Description
•