1575985 - RLBox - Safely load dynamic library dealing with content sandbox policies

Reporter

Description

•

6 years ago

Project description - See Bug 1554268

RLBox modifies Firefox to use wasm sandboxed versions of third party or otherwise risky libraries. The first target is libGraphite (font library), which is used in libThebes.

Currently, wasm sandboxed libraries must be compiled and used as shared object binaries, that must be dynamically loaded and thus cannot be statically or dynamically linked.

To this end, the code in libThebes must be able to dynamically load (dlopen) the shared object. However, since libThebes is in a content process, the content sandbox prevents this.

I'm hoping to get some info on the best options to get past this. Is there a way to allow read only to a set of some safe files from the content process?

Perhaps the content sandbox already permits access to some files in the Firefox binary folder?

Any information or examples of how to get past the content sandbox for this specific use case would be great.

Shravan Narayan

Reporter

Updated

•

6 years ago

Flags: needinfo?(nfroyd)

Shravan Narayan

Reporter

Updated

•

6 years ago

Blocks: 1569370
No longer blocks: 1566238

Tom Ritter [:tjr] (OOTO until April)

Comment 1

•

6 years ago

Have you encountered this problem on specific platforms? (I added :haik and :jld for OSX/Linux.)

On Windows, I don't believe we have any current restriction that prevents this from occurring in the Content Process; but we want to in the future so it may make sense to future-proof it. Presently, our best approach for doing so is to preload the library; that is, LoadLibrary() it before we lower the sandbox. Later on, a subsequent call to LoadLibrary will be a no-op, but will successfully return the handle of the already-loaded library.

Component: General → Security: Process Sandboxing

Flags: needinfo?(nfroyd)

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 2

•

6 years ago

On Linux, there are no restrictions on filesystem access before the sandbox start point (e.g., ContentChild::RecvSetProcessSandbox). After the sandbox is started, content processes can continue to load libraries; GMP and RDD processes cannot.

In general, the advice for this is to load the library earlier (assuming the library itself is from a trusted source); see the Windows GMP code for an example.

Note that the Mac sandbox is started relatively early, so that might be causing problems there.

Haik Aftandilian [:haik]

Comment 3

•

6 years ago

On Mac, once the content sandbox has started, libraries from within the main Firefox.app should still be able to be loaded.

Is there a way to allow read only to a set of some safe files from the content process?

Yes, there is a way to do this if the libraries can't be delivered in the .app bundle.

If you could provide some more details about where you're loading from now and what failures you're seeing, that would help.

Shravan Narayan

Reporter

Comment 4

•

6 years ago

•

Edited

Thanks a bunch for the info! So from your comments, I believe one option is that we should perform an early load of the library, prior to sandbox creation. I think this makes sense, and I will investigate this option.

I think the above solution makes sense for the current example of sandboxing the font library. In the future when we move to sandboxing rarely used libraries, I think we would not necessarily want to preload sandboxed libraries. For instance, if we sandbox a library that handles a rarely used image format, we want to defer loading until we hit a page that has an image in this format.

On Linux ... content processes can continue to load libraries
On Mac ... libraries from within the main Firefox.app should still be able to be loaded
On Windows, our best approach... is to preload the library

From your comments, above, it seems that deferred loading is possible at least on Linux/Mac. So it sounds like dlopen would work.
Question(Linux/Mac) - Does this mean that syscalls like open would also continue to work?

On Windows, restriction ... in the Content Process ... in the future

Question(Win) - Once this is enabled, would a LoadLibrary of a non pre-loaded dll be permitted, at least for dlls in the Firefox binary folder? We can definitely punt on this question if the details of the Windows content sandbox are unclear at this time.

Have you encountered this problem on specific platforms?

Sorry, I should have mentioned this earlier. The initial plan for wasm sandboxing is Linux 64. So my testing and this bug description is for Linux 64. However the goal is to keep this general, so that we can expand to other platforms. Mac 64 is a relatively easy next step in the roadmap. Win64 is a longer term goal as Lucet does not currently support it.

Yes, there is a way to do this if the libraries can't be delivered in the .app bundle. If you could provide some more details about where you're loading from now and what failures you're seeing, that would help.

Ok, this makes sense now. The current prototype is a proof of concept and is a bit hacky - it would load from a path outside the Firefox binary folder/app bundle. After cleanup, I am hoping to set it up so that the load is from the same folder (app bundle?) as libxul.
Question(Mac) - From your comments, it seems that this deferred library load would be permitted?
Caveat - I am not very familiar with the app bundles on Mac, so I apologize if the above description is flawed.

Follow up investigation from my side - The code that dynamically loads the wasm sandbox (RLBox_Lucet + a customized Lucet compiler runtime), uses 2 file related system calls prior to loading

dlopen/LoadLibrary
File open, read, close syscalls - to parse some additional metadata from the shared library file that is not included in the dlopen above

I will check to see if we can eliminate #2 above, in case that helps simplify working with the content sandbox.

Shravan Narayan

Reporter

Updated

•

6 years ago

Flags: needinfo?(tom)

Flags: needinfo?(jld)

Flags: needinfo?(haftandilian)

Tom Ritter [:tjr] (OOTO until April)

Comment 5

•

6 years ago

(In reply to Shravan Narayan from comment #4)

In the future when we move to sandboxing rarely used libraries, I think we would not necessarily want to preload sandboxed libraries. For instance, if we sandbox a library that handles a rarely used image format, we want to defer loading until we hit a page that has an image in this format.

Question(Win) - Once this is enabled, would a LoadLibrary of a non pre-loaded dll be permitted, at least for dlls in the Firefox binary folder? We can definitely punt on this question if the details of the Windows content sandbox are unclear at this time.

No; there are several restrictions in place here that would prevent it. The sandboxed process can't open the file to load it - because it can't open any files. Additionally, any dll not signed by Microsoft would be prevented from loading.

Preloading the library may be disadvantageous for performance reasons; but I'm not worried about it otherwise. It's possible that by the time we got the Windows sandbox restricted in this way we could have a solution that maps libraries into the process' address space from the parent.

(Also, not a big deal; but tom@ritter.vg is an old/person bugzilla account; better to use tom@mozilla)

Flags: needinfo?(tom)

Haik Aftandilian [:haik]

Comment 6

•

6 years ago

Question(Linux/Mac) - Does this mean that syscalls like open would also continue to work?

On Mac, open syscalls are still permitted, but it depends on the path. We have SandboxPolicyContent.h in the tree which uses a scheme program to specify the sandbox rules and you can get an idea of what is allowed from that.

After cleanup, I am hoping to set it up so that the load is from the same folder (app bundle?) as libxul.
Question(Mac) - From your comments, it seems that this deferred library load would be permitted?
Caveat - I am not very familiar with the app bundles on Mac, so I apologize if the above description is flawed.

For Mac, that sounds fine and there are several other .dylibs in the same directory (Firefox.app/Contents/MacOS). This library would be like other libraries we ship and be signed by Mozilla.

Flags: needinfo?(haftandilian)

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 7

•

6 years ago

(In reply to Shravan Narayan from comment #4)

Question(Linux/Mac) - Does this mean that syscalls like open would also continue to work?

Strictly speaking, in the Linux content sandbox the open syscall doesn't work: it raises SIGSYS, the signal handler communicates with an unsandboxed broker, and the register state is modified to reflect the simulated syscall return value; the real open syscall can't be invoked.

But from the caller's point of view, it's similar to the Mac answer above: open appears to work, but restricted by a policy. The policy is constructed in the awkwardly named SandboxBrokerPolicyFactory.cpp; note the parser for ld.so.conf, so that we can allow loading system libraries, but also the section with NS_GRE_DIR which should cover anything that's shipped as part of Firefox.

Setting MOZ_SANDBOX_LOGGING in the environment may be helpful for understanding what's going on.

Flags: needinfo?(jld)

Shravan Narayan

Reporter

Comment 8

•

6 years ago

Thanks a bunch to everyone for the info! Will investigate the above options and keep this thread updated.

Shravan Narayan

Reporter

Comment 9

•

6 years ago

Hmm... Seeing some super odd behaviors here. Hoping to get some help with below

I am currently testing Linux 64. The dynamic library being loaded is libGraphite. libGraphite.so is now output in the same directory as libxul.so as part of the build. libGraphite.so is also mentioned in package-manifest.in, so that it looks like a normal binary shipped with Firefox.

I start by calling dlopen on libGraphite.so prior to enabling of the content sandbox, and then trying to dlopen libGraphite.so at the relevant location. For preloading, I dlopen'd the library here.

Question - is this a good location for the preload. I'm trying to make sure there are no races, or this isn't too late etc.

This seems to work when running Firefox say using ./mach run layout/reftests/text/graphite-01.html. However, the second dlopen seems to be failing on mach tests... For instance running ./mach test layout/reftests/text/graphite-01.html crashes with content sandbox violations.

To pin down what is happening, I enabled export MOZ_SANDBOX_LOGGING=1. Both mach run and mach test produce

0:00.97 GECKO(2480) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/: 1 -> 35
0:00.97 GECKO(2480) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin: 1 -> 3
0:08.80 GECKO(2480) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/: 1 -> 35
0:08.80 GECKO(2480) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin: 1 -> 3
0:47.68 GECKO(2838) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/: 1 -> 35
0:47.68 GECKO(2838) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin: 1 -> 3
0:55.63 GECKO(2838) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/: 1 -> 35
0:55.63 GECKO(2838) Sandbox: policy for /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin: 1 -> 3

After this mach run outputs

Sandbox: Recording mapping /home/USER/mozilla-central/libGraphite.so -> /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/libGraphite.so

while mach test outputs

1:23.59 GECKO(2838) Sandbox: SandboxBroker: denied op=stat rflags=400000 perms=0 path=/home for pid=3001
1:23.59 GECKO(2838) Sandbox: Failed errno -13 op stat flags 0400000 path /home

Question - Do mach tests have different policies on content sandbox? Is there any reason the above works for mach run but doesn't work for mach test?

Syscalls
I also wanted to be more complete about the set of syscalls used. I have removed the need for the open syscall from the code. So we just use dlopen. However, there are also some calls to the RUST API cannonicalize to standardize file paths... According to the docs this is equivalent to realpath which I imagine has some syscalls to resolve symlinks...

Flags: needinfo?(jld)

Gian-Carlo Pascutto [:gcp]

Updated

•

6 years ago

Flags: needinfo?(jld) → needinfo?(gpascutto)

Gian-Carlo Pascutto [:gcp]

Comment 10

•

6 years ago

I start by calling dlopen on libGraphite.so prior to enabling of the content sandbox, and then trying to dlopen libGraphite.so at the relevant location. For preloading, I dlopen'd the library here.

Note that ContentParent runs in the parent/chrome process. This is probably not what you want?

Do mach tests have different policies on content sandbox?

No. There are some differences between running a normal (packaged) Firefox and running the build output. If you're running the build output, the entire source repo gets explicitly whitelisted, because some JS stuff doesn't correctly copy its files into the objdir but leaves them in random places in the filesystem. See for example https://searchfox.org/mozilla-central/source/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp#427

I'm not sure if this is different for mach run vs mach test. Looking at MOZ_SANDBOX_LOGGING output will tell you if the expected paths were whitelisted.

Sandbox: Recording mapping

FYI what's happening here is that the sandbox broker notices that this library isn't actually in $OBJDIR/dist/bin but that it's a symlink to another location. It will remember this so it knows that opening the library in the original path is actually OK.

Is there any reason the above works for mach run but doesn't work for mach test?

There's nothing relevant in the excerpts you posted related to opening libGraphite.so. Or maybe there is, but we can't tell from so little info.

However, there are also some calls to the RUST API cannonicalize to standardize file paths... According to the docs this is equivalent to realpath which I imagine has some syscalls to resolve symlinks...

Yes, it will stat/lstat() paths along the way. These are brokered, and the broker will refuse access to some paths because they are outside the allowed paths. This is likely what you saw in the log snippet you posted, i.e. the stat() call on /home is refused.

Flags: needinfo?(gpascutto)

Gian-Carlo Pascutto [:gcp]

Comment 11

•

6 years ago

FWIW, uses of nsIFile::Normalize(), which is the platform-independent way of calling realpath() and likely the same as canonicalize(), typically have their return values ignored in the content process: https://searchfox.org/mozilla-central/search?q=symbol:_ZN11nsLocalFile9NormalizeEv%2C_ZN7nsIFile9NormalizeEv&redirect=false

Note for example this code: https://searchfox.org/mozilla-central/source/dom/media/gmp/GMPChild.cpp#147 which shows that it's best effort but continues even if it fails.

Shravan Narayan

Reporter

Comment 12

•

6 years ago

•

Edited

Note that ContentParent runs in the parent/chrome process. This is probably not what you want?

Oh you're right... that definitely seems like the wrong place. Would you have any thoughts on the appropriate location in the code to do this preloading? I'm hoping to find the place in source which runs in the content process prior to the content sandbox enforcement.

There are some differences between running a normal (packaged) Firefox and running the build output.

This is good to know! For the moment, we are still in the phase where we are testing on build output only. I will look definitely into this when we start testing the packaged builds.

I'm not sure if this is different for mach run vs mach test. Looking at MOZ_SANDBOX_LOGGING output will tell you if the expected paths were whitelisted.

Thanks, I will try this out.

typically have their return values ignored in the content process

Unfortunately the call to canonicalize is in third party rust code, which does abort on failure... From your description, it seems possible that this call to canonicalize may be one of the sources of problems for mach test... I will investigate and check if this needs to be removed.

Flags: needinfo?(gpascutto)

Tom Ritter [:tjr] (OOTO until April)

Comment 13

•

6 years ago

(In reply to Shravan Narayan from comment #12)

Note that ContentParent runs in the parent/chrome process. This is probably not what you want?

Oh you're right... that definitely seems like the wrong place. Would you have any thoughts on the appropriate location in the code to do this preloading? I'm hoping to find the place in source which runs in the content process prior to the content sandbox enforcement.

Trace your way backwards from https://searchfox.org/mozilla-central/rev/e5327b05c822cdac24e233afa37d72c0552dbbaf/security/sandbox/linux/Sandbox.cpp#521 which turns the sandbox on and you'll find the code that happens right before it

Tom Ritter [:tjr] (OOTO until April)

Updated

•

6 years ago

Flags: needinfo?(gpascutto)

Shravan Narayan

Reporter

Comment 14

•

6 years ago

•

Edited

Hmm... I'm still having a lot of trouble with this. In short mach run works, but mach test does not...

Description

I am preloading the dynamic library in ContentChild.cpp
The only call is to dlopen, there are no calls to APIs like canonicalize (realpath)

Would anyone have any approaches to debugging this?

What I have tried

I have already looked through the logs of mach run and mach test with MOZ_SANDBOX_LOGGING enabled but this is not illuminating... mach run runs successfully, but mach test fails and produces the following log statements just prior to the crash

 1:26.13 GECKO(24069) Sandbox: SandboxBroker: denied op=open rflags=2000000 perms=0 path=/dev/null for pid=24240
 1:26.13 GECKO(24069) Sandbox: Failed errno -13 op open flags 02000000 path /dev/null
 1:26.13 GECKO(24069) Hit MOZ_CRASH(failed to open /dev/null: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }) at src/libcore/result.rs:999

There is no mention of '/dev/null' in the logs of mach run

I tried attaching the debugger to mach test (both gdb and rr), but this doesn't seem to be supported

Questions

froydnj: Is there a way to hook the rr debugger on to mach test... I can try to trace the source of the code that needs /dev/null
gcp, tjr: Is there any other way to investigate this other than the above? Any other logging mechanism? Alternately rather than preloading, should we be considering explicitly whitelisting the path to the dynamic library? Also, if it helps I can attach the logs of mach run and mach test

Thoughts welcome!

Flags: needinfo?(tom)

Flags: needinfo?(nfroyd)

Flags: needinfo?(gpascutto)

Gian-Carlo Pascutto [:gcp]

Comment 15

•

6 years ago

Alternately rather than preloading, should we be considering explicitly whitelisting the path to the dynamic library?

No, we already established that the build process puts it in a whitelisted directory, and if you preload it then the broker doesn't even come into play wrt to that. So I don't see what doing that would achieve.

You should figure out what is trying to open /dev/null. If that's code that is hard to modify, we can whitelist /dev/null. But removing the access would be even better, because there's no sane reason anything would need it and the less access, the better...

Whatever code is doing that access seems to be Rust code...

Flags: needinfo?(gpascutto)

Gian-Carlo Pascutto [:gcp]

Comment 16

•

6 years ago

I'm going to take a wild guess that this is the offending code:
https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/process/process_common.rs#L347

Gian-Carlo Pascutto [:gcp]

Updated

•

6 years ago

Flags: needinfo?(tom)

Shravan Narayan

Reporter

Comment 17

•

6 years ago

gcp: oh nice find! That looks plausible. Will investigate.

Shravan Narayan

Reporter

Comment 18

•

6 years ago

So, it wasn't that particular piece of code, but I did identify where we rely on /dev/null. This comes from an upstream dependency "lucet" we rely on for library sandboxing. We do maintain a fork of lucet for our purposes, so I can remove the offending code.

That said, the inconsistency of mach run allowing /dev/null and mach test not doing so is a bit strange. I'm wondering if this behavior is a separate content sandbox bug

Further

when I look at the logs of mach run /dev/null is never mentioned
when I look at SandboxBrokerPolicyFactory.cpp, /dev/null is not mentioned
when I look at list of open files in Firefox (to check if /dev/null was opened before the sandbox was enabled), /dev/null is not mentioned

However, the same code that uses /dev/null runs without issue here... So, not super sure what's going on here... The only other explanations I can think of seem unlikely...

somehow /dev/null is symlinked to something that is allowed, which was opened
The source repo whitelist on when running from the build folder, somehow allow /dev/null

Any thoughts on whether I should open a new content sandbox bug to track this?

Gian-Carlo Pascutto [:gcp]

Comment 19

•

6 years ago

That said, the inconsistency of mach run allowing /dev/null and mach test not doing so is a bit strange. I'm wondering if this behavior is a separate content sandbox bug

Are you sure that this is actually what happens? As far as I can tell we don't allow /dev/null anywhere.

Any thoughts on whether I should open a new content sandbox bug to track this?

I see no evidence there is any issue with the content sandbox.

The only other explanations I can think of seem unlikely...

I can think of at least one other one: "lucet" or some other code is trying to open /dev/null when running tests but not when doing a normal run. For some reason.

Shravan Narayan

Reporter

Comment 20

•

6 years ago

"lucet" or some other code is trying to open /dev/null when running tests but not when doing a normal run. For some reason.

I was wondering about this as well and that was my initial theory. So I did some printf debugging to make sure the same code is executing... While it is the same code that's running, the results of the code here are confusing with how it could interact with the content sandbox. Given that you believe this is not a content sandbox issue, I will investigate the code in question (described below) more carefully.

As far as I can tell we don't allow /dev/null anywhere.

Good to know. This definitely seems to indicate its not a content sandbox issue. Let me investigate further from my side.

Also, here is a quick description of the code I'm looking at. I have some theories about what is going wrong. (Sharing here in case its something obvious that I'm missing :) )

Code that's executing
The lucet code executes these 2 functions

First.
Second.

Briefly, these 2 functions

Create 3 new file descriptors fds[0], fds[1], fds[2]
Open file /dev/null, duplicates the descriptor and assigns to all 3 fds
Reassign fds by duplicating stdin, stdout and stderr

Possible execution paths for this code

If the firefox process' stdin, stdout and stderr are left as default, the code would attempt to open /dev/null briefly, but never actually read or write to it.
If stdin, stdout and stderr are /dev/null during launch, this code would attempt to open /dev/null, duplicate the file descriptor pointing to open /dev/null and read from/write to it.

Investigations from my side
I am planning to check if one or both execution paths described above work. If only one works while the other fails, I will see if mach run and mach test cause different execution paths

Tom Ritter [:tjr] (OOTO until April)

Comment 21

•

6 years ago

Have you watched procfs and see where they point to during execution?

Gian-Carlo Pascutto [:gcp]

Comment 22

•

6 years ago

It would actually make a lot of sense that during tests the stdout/stderr/stdin are redirected, but not during a normal run.

Shravan Narayan

Reporter

Comment 23

•

6 years ago

I'm still investigating, but thought I'd post my progress

I did some checks on the procfs, by calling readlink on file descriptors 0, 1 and 2 in the code

For mach run, stdin, stdout and stderr point to the terminal.
For mach test, stdin points to the terminal, stdout and stderr are piped somewhere (readlink returns strings that look like "pipe:[255774]" - I haven't yet figured out a way to decode what that is, as the id in the string is not a PID)

It seems like this could be the issue... Here are some of the next tests I tried

To see if "/dev/null" was the culprit, I did confirm that adding /dev/null to SandboxBrokerPolicyFactory.cpp allows mach test to run, so there is light at the end of the tunnel :)

To see if I can get mach run to crash with a similar message, I executed mach run with input and output pointing to "/dev/null" directly, with the below command. However, this worked fine without a crash.

2>/dev/null 1>/dev/null /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/dist/bin/firefox layout/reftests/text/graphite-01.html -no-remote -profile /home/USER/mozilla-central/obj-x86_64-pc-linux-gnu/tmp/profile-default

Did the same with the output and error pointing to a log file on my Desktop (which is a path not permitted by the sandbox broker), this too works fine.

I am doing some follow up to see of calling dup on the pipe is somehow the issue... AFAICT, the linux manpages don't explicitly disallow this, so not sure if this is the issue...

Shravan Narayan

Reporter

Comment 24

•

6 years ago

Confirmed that dup is not the issue - dup followed by write works correctly on the stdin, stdout and stderr file descriptors even in mach test... At this point unless there is a way to attach rr on to mach test, I think I have exhausted most other avenues/approaches to explore this...

To this end, I think it may be time to consider if we can allow read write access to /dev/null in the linux sandbox

gcp, tjr: Do you think this is an option? Since I'm unfamiliar with the process, I also wanted to ask if there are additional considerations i.e. does this require additional sign-offs, approvals or security reviews?

Flags: needinfo?(tom)

Flags: needinfo?(gpascutto)

Tom Ritter [:tjr] (OOTO until April)

Comment 25

•

6 years ago

Not my call; probably gcp's or Jed's.

Flags: needinfo?(tom) → needinfo?(jld)

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 26

•

6 years ago

(In reply to Shravan Narayan from comment #23)

(readlink returns strings that look like "pipe:[255774]" - I haven't yet figured out a way to decode what that is, as the id in the string is not a PID)

That's an inode number. You can try to find the other end with something like ls -l /proc/*/fd/* 2>/dev/null | grep -w 255774. (Both ends of a pipe use the same inode; contrast socketpairs, which have an adjacent pair of inode numbers. Credit for this trick goes to Chromium.)

But in this case, the question seems to be whether the WASI virtual fds 0-2 point to the same file as the real fds 0-2.

Gian-Carlo Pascutto [:gcp]

Comment 27

•

6 years ago

I don't see any problem with adding /dev/null to the policy. Right now we for sure allow much riskier stuff.

But I still think it is unneeded - may just not be worth the effort to work around/fix whatever funnies the rust stdlib does.

Given that it's only needed for mach test, how about using the whitelist preference (security.sandbox.content.read_path_whitelist) to inject it during tests?

Gian-Carlo Pascutto [:gcp]

Updated

•

6 years ago

Flags: needinfo?(gpascutto)

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 28

•

6 years ago

Adding /dev/null shouldn't be a problem, given that we already allow /dev/urandom (and allow inheriting tty fds).

But it would be good to know why this happens only with mach test and not mach run. According to comment #20 this isn't the Rust stdlib; it's this line, and there doesn't seem to be anything in that module that would construct a WasiCtx without calling it (although that could be changed easily enough). If the sandbox is allowing access to a file and we don't know why, that's kind of a problem.

Instrumenting the broker to log when accesses are allowed (instead of just denials) might help, or using strace, or adding a sandbox policy rule with the CRASH_INSTEAD flag and backtracing in a debugger.

As for rr, it should be possible to record the entire mach test run, select the process of interest in rr ps, and replay that process.

Flags: needinfo?(jld)

Shravan Narayan

Reporter

Comment 29

•

6 years ago

•

Edited

But in this case, the question seems to be whether the WASI virtual fds 0-2 point to the same file as the real fds 0-2.

This is the question I am currently investigating, although I should point out that I'm not fully sure of what is going on here yet, so this may stiil be a red herring...

may just not be worth the effort to work around/fix whatever funnies the rust stdlib does.

Yeah this is sort of the stage I'm approaching, given the debugger issues I am seeing (described below)

Given that it's only needed for mach test, how about using the whitelist preference

I am sort of worried about doing this... Clearly there is some sort of configuration that can get Firefox to run that causes this code to require /dev/null which is somehow triggerred in mach test. Given this, there is probably some way to get mach run or stock firefox to also operate in this configuration... If some user does actually do this, this would cause a crash, so adding an exception to mach test only feels like a bug waiting to happen

Fwiw, I have unsuccessfully tried to get firefox to crash with the same error in mach run when

by piping output and err to another program
by redirecting output and err to /dev/null
by redirecting output in C, with dup2 + exec(firefox) (in case C was needed to trigger the pipe)
by setting up pipes and executing firefox from a C program (in case C + pipes was needed to trigger the pipe)

All of the above cases work without crashing.

Re using rr

Unfortunately have run into a couple of blocking bugs when trying to use rr

unlike mach run, mach test doesn't seem to use the "--debugger" flag (it allows it, but then ignores it) to set the debugger only on the Firefox process
if i try to record all of mach test, I run into an rr bug, where the replay just causes rr to crash with a backtrace (attached below if you are curious)...

I spoke to @froydnj offline, and he suggested I open bugs for both of these, but this does unfortunately make it harder to understand why mach run and mach test differ

Using gdb directly on mach test is also infeasibile as there are about 50 fork/exec's here before the content process is launched... Following the correct forks and execs to get to the right one would be too tedious

Next steps

Instrumenting the broker to log when accesses are allowed (instead of just denials) might help, or using strace, or adding a sandbox policy rule with the CRASH_INSTEAD flag and backtracing in a debugger.

These are good suggestions... Will investigate!

That's an inode number. You can try to find the other end with ...

Oh nice! will investigate the inode numbers to see if this sheds some light on why this is happening.

rr backtrace

[ERROR /build/rr-ViolGa/rr-5.1.0/src/Registers.cc:256:maybe_print_reg_mismatch()] r9 0x7ffcc7be38c0 != 0x7ffcc7be3ac0 (replaying vs. recorded)
[FATAL /build/rr-ViolGa/rr-5.1.0/src/Registers.cc:366:compare_register_files()] 
 (task 15563 (rec:13231) at time 24770)
 -> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:216752702/216752713)
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x41)[0x561ed216ddd1]
rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0x5c5)[0x561ed20a0135]
rr(_ZN2rr21EmergencyDebugOstreamD1Ev+0x11e)[0x561ed20ad6be]
rr(_ZN2rr9Registers22compare_register_filesEPNS_10ReplayTaskEPKcRKS0_S4_S6_NS_16MismatchBehaviorE+0xe3)[0x561ed2110a93]
rr(_ZN2rr10ReplayTask13validate_regsEj+0x26c)[0x561ed212a46c]
rr(_ZN2rr13ReplaySession13enter_syscallEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0x575)[0x561ed2117d05]
rr(_ZN2rr13ReplaySession18try_one_trace_stepEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0xdb)[0x561ed211a73b]
rr(_ZN2rr13ReplaySession11replay_stepERKNS0_15StepConstraintsE+0x128)[0x561ed211b3b8]
rr(_ZN2rr14ReplayTimeline19replay_step_forwardENS_10RunCommandEl+0xdf)[0x561ed2132a5f]
rr(_ZN2rr9GdbServer12serve_replayERKNS0_15ConnectionFlagsE+0x81)[0x561ed209f281]
rr(+0xfdbcf)[0x561ed2111bcf]
rr(_ZN2rr13ReplayCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x570)[0x561ed2112d40]
rr(main+0x1da)[0x561ed2051ffa]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f9019caab97]
rr(_start+0x2a)[0x561ed20521ca]
=== End rr backtrace

Flags: needinfo?(nfroyd)

Shravan Narayan

Reporter

Comment 30

•

6 years ago

•

Edited

Attached file Trace from crash in mach run from using CRASH_INSTEAD policy on /dev/null — Details

_

Shravan Narayan

Reporter

Comment 31

•

6 years ago

•

Edited

(Ignore the last attachment - hit send too early)

Shravan Narayan

Reporter

Comment 32

•

6 years ago

•

Edited

Aha, managed to figure out how to get gdb attached to the right process in mach run and mach test!

I can confirm that the thing causing the crash is indeed the lucet rust code... Further the crash has nothing to do with the calls to dup or pipes. The crash definitely occurs here which is simply a call in rust to open("/dev/null"). When this line of code is executed in mach run, the SYSSIG handler in SandboxBrokerClient.cpp is able to successfully request opening of /dev/null here (returns a descriptor that is not -1), while the same request is rejected for mach test (returns -1).

This also leads to a better framing of the issue I'm seeing. The question is not why mach test fails, but rather why does mach run pass? For the purposes of the code I am working on, the "correct" behavior should require explicitly allowing open and write access to /dev/null in the broker.

Barring any objections, I believe adding the exception seems to be right way forward. In the interest of scoping the work I'm doing - I do believe the fact that mach run allows this without the explicit rule feels like an orthogonal bug that may need separate investigation.

Here are a couple of other tests I did to see if I can get more info on why mach run passes.

Test: Added a policy of CRASH_INSTEAD on /dev/null
Result: this causes both mach run and mach test to fail on this line
Conclusion: apparently, without CRASH_INSTEAD, mach run seems to permit /dev/null
Test: Tried opening /dev/null in C++ code
Result: this fails in both mach run and mach test
Conclusion: This result is very confusing. Somehow rust code can open /dev/null, but C++ code cannot?

For completeness, here are the results of the next steps from my previous post

strace - did this. Wasn't too illuminating as both have calls to open dev/null... one succeeds the other one deosn't
CRASH_INSTEAD flag - mach run and mach test fails on the rust code listed above, if this policy is included.
inode number investigation - Wasn't the issue, as the crash is on the open, not on dup or the pipes...

gcp, jld: Please let me know if you have objections to the above conclusion i.e. adding the /dev/null exception + tracking any other part of this as a separate bug with different owners.

Flags: needinfo?(jld)

Flags: needinfo?(gpascutto)

Gian-Carlo Pascutto [:gcp]

Comment 33

•

6 years ago

Added a policy of CRASH_INSTEAD on /dev/null
Result: this causes both mach run and mach test to fail on this line
Conclusion: apparently, without CRASH_INSTEAD, mach run seems to permit /dev/null

Do you have a backtrace here? The log you give above is for another file.

This result is very confusing. Somehow rust code can open /dev/null, but C++ code cannot?

This isn't really possible as far as we understand. For this to work the calls have to be either in a different process (i.e. a non-sandboxed one for rust) or before the sandbox is initialized.

gcp, jld: Please let me know if you have objections to the above conclusion i.e. adding the /dev/null exception + tracking any other part of this as a separate bug with different owners.

You can split the bug to do whatever you originally wanted to do here, and to deal with figuring out why /dev/null is being accessed (and to get rid of it again, potentially) in another bug. I have no idea why the second bug would have to have another owner, though :-)

When this line of code is executed in mach run, the SYSSIG handler in SandboxBrokerClient.cpp is able to successfully request opening of /dev/null here (returns a descriptor that is not -1), while the same request is rejected for mach test (returns -1).

A MOZ_SANDBOX_LOGGING=1 output of a run where that happens (i.e. it being allowed) might reveal a lot.

Flags: needinfo?(gpascutto)

Gian-Carlo Pascutto [:gcp]

Comment 34

•

6 years ago

Is there some way for us to reproduce the testing that you are doing?

If necessary dump the WIP onto a github repo or something...

Shravan Narayan

Reporter

Comment 35

•

6 years ago

Attached file Logs of mach run WITHOUT crash instead — Details

Shravan Narayan

Reporter

Comment 36

•

6 years ago

Attached file Logs of mach run WITH crash instead — Details

Shravan Narayan

Reporter

Comment 37

•

6 years ago

•

Edited

Do you have a backtrace here? The log you give above is for another file.

Have attached the 2 logs

mach run with the current policy unchanged (i.e. no crash instead) - this allows the open of /dev/null
mach run with the crash instead on /dev/null policy - this crashed on the open of /dev/null

In both logs, you can look for the string "!!!!!!!!!About to call the Graphite setup" which is printed just prior to the call to open "/dev/null". If the call to open /dev/null is successful in rust, the rust code prints "!!!!!!!!!!!!!!!!!!!!/dev/null opennig was ok" - this is present in the first log.

/dev/null is being accessed (and to get rid of it again, potentially) in another bug

Actually, we can't get rid of this for 2 reasons...

My original confusion was actually because I thought I did not need a policy for open... I had thought that I needed a policy only on read and write. However, given that I need a policy for open as well, the only solution here is to add /dev/null
We also do need write access for the library sandboxing use case. While we initially were Ok with redirecting output from the sandboxed library to the terminal, we subsequently found this is actually something that needs be context dependent or configurable... If we choose to disallow output to the terminal based on the config, this would mean redirecting all writes to /dev/null...

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Comment 38

•

6 years ago

Thanks to the logs, I think I know what's going on here: the test is being loaded from a file: URL in the mach run case. File URLs are loaded in a dedicated content process which has read access to the entire filesystem (see FILE_REMOTE_TYPE in the source, and the policy for /: 1 -> 35 line in the logs), and File::open is a read-only open (despite being used for virtual stdout/stderr, which might be a bug).

Flags: needinfo?(jld)

Shravan Narayan

Reporter

Comment 39

•

6 years ago

File URLs are loaded in a dedicated content process which has read access to the entire filesystem

Oh, nice find! That is definitely it! The whole thing makes sense.

mach test loads these files from a local test server, which is why it was failing
the upstream lucet rust code is opening /dev/null read only (which as you pointed out looks like a bug given that they are using this for writing)
my C++ test of open /dev/null crashed even in mach run as in my test, I was opening with write access (which is what the upstream code should have been doing anyway)

Thanks so much for your help! This definitely clarifies everything.

I think just to summarize the long thread, I think the conclusions are

default policy is that /dev/null is inaccessible by default (except when handling file urls, where the entire file system is read only by design)
since we need read/write access to /dev/null, we need a sandbox broker policy here
upstream lucet has a bug where they are opening /dev/null read only for some reason

Shravan Narayan

Reporter

Updated

•

6 years ago

Type: task → enhancement

Summary: RLBox - Require info on Content sandbox policies → RLBox - Safely load dynamic library dealing with content sandbox policies

Shravan Narayan

Reporter

Comment 40

•

6 years ago

Attached file Bug 1575985 part 1 - Preload RLBox dynamic library before content sandbox to allow subsequent loads r=froydnj — Details

Phabricator Automation

Updated

•

6 years ago

Attachment #9092869 - Attachment description: Bug 1575985 - Safely load RLBox dynamic library dealing with content sandbox policies r=froydnj → Bug 1575985 part 1 - Preload RLBox dynamic library before content sandbox to allow subsequent loads r=froydnj

Shravan Narayan

Reporter

Comment 41

•

6 years ago

Attached file Bug 1575985 part 2 - Allow RW access to /dev/null in content sandbox r=froydnj,jld,gcp — Details

Phabricator Automation

Updated

•

6 years ago

Attachment #9093198 - Attachment description: Bug 1575985 part2 - Allow RW access to /dev/null in content sandbox, needed by lucet to run WASM sandboxed libraries r=froydnj,jld,gcp → Bug 1575985 part2 - Allow RW access to /dev/null in content sandbox r=froydnj,jld,gcp

Phabricator Automation

Updated

•

6 years ago

Attachment #9093198 - Attachment description: Bug 1575985 part2 - Allow RW access to /dev/null in content sandbox r=froydnj,jld,gcp → Bug 1575985 part 2 - Allow RW access to /dev/null in content sandbox r=froydnj,jld,gcp

Pulsebot

Comment 42

•

6 years ago

Pushed by tritter@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/eeaa7ecf70e3 part 2 - Allow RW access to /dev/null in content sandbox r=gcp

Raul Gurzau (:RaulG)

Comment 43

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/eeaa7ecf70e3

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox71: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla71

Tom Ritter [:tjr] (OOTO until April)

Comment 44

•

6 years ago

Sorry there's still Part 1 outstanding.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

BugBot [:suhaib / :marco/ :calixte]

Updated

•

6 years ago

status-firefox71: fixed → affected

Pulsebot

Comment 45

•

6 years ago

Pushed by nfroyd@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0309a3ff69a0 part 1 - Preload RLBox dynamic library before content sandbox to allow subsequent loads r=froydnj

Bogdan Tara[:bogdan_tara | bogdant]

Comment 46

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0309a3ff69a0

Status: REOPENED → RESOLVED

Closed: 6 years ago → 6 years ago

status-firefox73: --- → fixed

Resolution: --- → FIXED

BugBot [:suhaib / :marco/ :calixte]

Comment 47

•

6 years ago

Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.

status-firefox72: --- → ?

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox71: affected → wontfix

status-firefox72: ? → wontfix

Jed Davis [:jld] ⟨⏰|UTC-8⟩ ⟦he/him⟧

Updated

•

5 years ago

Regressions: 1644159

Trace from crash in mach run from using CRASH_INSTEAD policy on /dev/null 6 years ago Shravan Narayan 8.99 KB, text/plain		Details
Logs of mach run WITHOUT crash instead 6 years ago Shravan Narayan 160.66 KB, text/plain		Details
Logs of mach run WITH crash instead 6 years ago Shravan Narayan 168.42 KB, text/plain		Details
Bug 1575985 part 1 - Preload RLBox dynamic library before content sandbox to allow subsequent loads r=froydnj 6 years ago Shravan Narayan 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1575985 part 2 - Allow RW access to /dev/null in content sandbox r=froydnj,jld,gcp 6 years ago Shravan Narayan 47 bytes, text/x-phabricator-request		Details \| Review