Linux sandbox code does mainthread IO to check if all the allowed directories are directories
Categories
(Core :: Security: Process Sandboxing, defect, P5)
Tracking
()
Performance Impact | low |
People
(Reporter: Gijs, Unassigned)
References
(Blocks 2 open bugs)
Details
(Keywords: main-thread-io, perf, perf:startup)
(phase selection is just a guess - the profiler doesn't notice these stats, but strace does)
Cf. https://searchfox.org/mozilla-central/rev/a777ff11b6d700a698c61e5bd17e73b044304494/security/sandbox/linux/broker/SandboxBroker.cpp#267 , https://searchfox.org/mozilla-central/rev/a777ff11b6d700a698c61e5bd17e73b044304494/security/sandbox/linux/broker/SandboxBroker.cpp#199,203
This affects most/all of https://searchfox.org/mozilla-central/rev/a777ff11b6d700a698c61e5bd17e73b044304494/security/sandbox/linux/broker/SandboxBrokerPolicyFactory.cpp#270-291 so we end up stat'ing a bunch of stuff.
I know very little about our sandboxing code, but I'm unsure I understand the potential for negative consequences if we don't stat the directory. That is, presumably from an adversarial perspective it doesn't matter (ie if it doesn't exist or isn't a directory and then an attacker can write to it in order to allow the content process read access, it could probably have done so before launching firefox). Do the sandbox APIs balk if handed directories that don't exist, or can we just get rid of this?
Comment 1•6 years ago
•
|
||
AddDynamic changes behavior depending on the result of the stat(), but it should be a rare call - it's actually only set during tests, but not in normal use.
The advantage of doing the stat() is that we avoid populating the broker mapping with permissions for directories that don't exist anyway. The directories we stat are all devices (so in memory) or things we expect to access. So the gain here is likely to be small.
I can't think of an attack scenario whereby being able to read a path prefix (we force append a / during construction) for a file that didn't exist yet on startup allows you more leverage (read restrictions are after all mostly there to prevent info leaks or exfiltrating sensitive information), but note that in some instances we do allow write access with AddDir too. You could conceivably write out fake devices or library dirs and then rely on bugs in other apps that read those. But then the user would have to be able write there to begin with, which isn't likely if the dir didn't already exist.
Updated•6 years ago
|
Reporter | ||
Comment 2•6 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #1)
Thanks for the clarifications!
The advantage of doing the stat() is that we avoid populating the broker mapping with permissions for directories that don't exist anyway.
Can you expand a bit - ie is there runtime overhead for this, or are there other negative consequences to having directories that don't exist in the mapper? Do we have data on how many of the items in the list "normally" get filtered out this way?
The directories we stat are all devices (so in memory)
How many of those devices are guaranteed to exist?
or things we expect to access.
Maybe, but you're accessing them on the mainthread during startup. In general, especially during startup, the list of places we access from our own instrumented code on the mainthread is actually quite limited (cf. https://searchfox.org/mozilla-central/source/browser/base/content/test/performance/browser_startup_mainthreadio.js - unfortunately, it seems some *nix stat
calls aren't instrumented which is probably why the calls from the sandbox code don't show up in that list). I'd expect future accesses to be non-mainthread (and from the content process anyway, which won't slow down the main browser process).
So the gain here is likely to be small.
Sure, but equally, a patch to just remove the stat calls would also be straightforward.
Comment 3•6 years ago
•
|
||
Can you expand a bit - ie is there runtime overhead for this, or are there other negative consequences to having directories that don't exist in the mapper? Do we have data on how many of the items in the list "normally" get filtered out this way?
It's runtime overhead. Given that it's likely backed by a hashmap, it'd only be some small memory overhead, not performance.
How many of those devices are guaranteed to exist?
All of them except the GPU related keys I think (of which only 1 will exist). What could be missing is font dirs and lib dirs, whose layout depends on the distro.
In general, especially during startup, the list of places we access from our own instrumented code on the mainthread is actually quite limited
Yeah, uh, I don't see the system library dirs in there, but I'm quite sure that you won't be able to launch the browser without them. (One thing I'm not so sure about is fonts, which I also don't see - maybe exactly because the access isn't in our own code)
Note that: https://searchfox.org/mozilla-central/source/browser/base/content/test/performance/browser_startup_mainthreadio.js#876 makes the point that accessing device files etc isn't really the IO you're looking for.
Sure, but equally, a patch to just remove the stat calls would also be straightforward.
Yes, if there's no impact on security. I don't think there is here, but I'd lie if I said I was 100% sure. I understand that it helps performance to not do mainthread IO (on startup), but due to the nature of these calls, which are all stats() on directory entries of core paths, it's just not the kind of hit-the-disk IO I suspect you're looking for.
Comment 4•6 years ago
|
||
The SandboxBrokerPolicyFactory
ctor is used in two places:
- per content process, as part of the hacks that were added to deal with font access, and that whole thing is in fact a cause of significant main-thread jank but the sandbox policy stat calls are the least of our concerns; see bug 1439412 comment #15
- once at browser startup (the results are cached), and it's unclear that this is even noticeable compared to all the other file access happening on startup (let alone other theoretically avoidable startup jank, like sync launch for the GPU process and WebExtensions content process)
So this seems very low priority, and I'd question whether the fission
tag in the whiteboard is accurate.
Reporter | ||
Comment 5•6 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #4)
The
SandboxBrokerPolicyFactory
ctor is used in two places:
- per content process, as part of the [hacks that were added to deal with font access][font-hack], and that whole thing is in fact a cause of significant main-thread jank but the sandbox policy stat calls are the least of our concerns; see bug 1439412 comment #15
- once [at browser startup][startup] (the results are cached), and it's unclear that this is even noticeable compared to all the other file access happening on startup (let alone other theoretically avoidable startup jank, like sync launch for the GPU process and WebExtensions content process)
The problem for me is: afaict the fix here is literally just "remove 3 lines of stat() and IsDir calls", whereas "fix fontconfig usage in the content process" is pretty difficult. Also, given that we do "lots" of IO on startup, "this isn't even noticeable compared to all the other file access" is an argument for not doing work on every one of those accesses taken individually, but that would just reduce us to never fixing any of them. As it is, this code is causing us to stat the newly created temp dir and some stuff inside the profile for every content process we start, on the mainthread - and those are unlikely to be cached (certainly for the first process). And sure, the impact may be small, but in aggregate all these removals will add up - if nothing else, it'll clear up strace/profiler marker logs and make it more obvious what the "real" culprits are.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #3)
Yes, if there's no impact on security. I don't think there is here, but I'd lie if I said I was 100% sure.
How could we be sure? :-)
Comment 6•6 years ago
|
||
As it is, this code is causing us to stat the newly created temp dir and some stuff inside the profile for every content process we start, on the mainthread - and those are unlikely to be cached (certainly for the first process)
The directory entry for the new temp dir is certain to be cached if there's enough memory to launch Firefox.
Likewise, the "stuff inside the profile" sits under ".mozilla", so unless I misunderstand how dentry caching works, it's directory entry is guaranteed to be cached. I have no idea where your "unlikely" comes from.
I think the one case where this might make sense is if the homedir is on a remote filesystem, because then dentry caching is nonfunctional.
Reporter | ||
Comment 7•6 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #6)
As it is, this code is causing us to stat the newly created temp dir and some stuff inside the profile for every content process we start, on the mainthread - and those are unlikely to be cached (certainly for the first process)
The directory entry for the new temp dir is certain to be cached if there's enough memory to launch Firefox.
Likewise, the "stuff inside the profile" sits under ".mozilla", so unless I misunderstand how dentry caching works,
I mean, on Windows there is no reliable caching, and we routinely see individual stats taking 100s of ms in profiles if the machine is "busy" for some meaning of the word. Despite repeated searches, I can't find information about what exactly Linux does - my point was mostly that, unlike the system directories, it is unlikely that other apps have used $HOME/.mozilla
, so I don't see why it or its subfolders would necessarily be cached when first accessed by us.
Anyway, before we remove this, we should be sure about the security impact... how can we organize that?
Comment 8•6 years ago
|
||
I mean, on Windows there is no reliable caching
This code is Linux specific.
Despite repeated searches, I can't find information about what exactly Linux does
There's a specific cache for directory entries (dcache), which has a child cache that holds the inode entries referenced from it (inode cache). These are implemented at the VFS layer so they're filesystem-agnostic, and together they are (AFAIK) enough to resolve stat() calls. Because of the high access locality of the involved directory entries, it sounds somewhat unlikely that we could be in a situation where that happens and we somehow still have enough RAM to launch Firefox.
As said above, this likely doesn't apply for networked file systems, but if you're using one of these for vital Firefox files, you have other performance problems...
Now, I think what isn't guaranteed is that if you access a file in a directory, that it necessarily pulls in all dentries (and thus inodes) of other files in the same directory. It's likely to happen with dentries for traditional linear mappings, and for inodes because they are stored in block groups, but I guess it's at least theoretically possible to do a hash/b-tree based directory lookup and manage to fetch the original inode while other directory entries have inodes that were in another group, thus forcing you to hit the medium when stat() on another file is called. So yes, if all planets align, you might have an extra IO for a file you're about to access soon anyway.
Now, another thing of course is that those stat() are syscalls. Those have overhead too. Especially on current Intel systems.
my point was mostly that, unlike the system directories, it is unlikely that other apps have used $HOME/.mozilla, so I don't see why it or its subfolders would necessarily be cached when first accessed by us.
The code that sets up the sandbox broker policy inspects preferences. So, the settings from the profile have to have been loaded. Which means that the directory entry must just have been accessed to find the preference files and that's the same thing that contains (or doesn't contain...) things like systemextensionsdev.
Anyway, it's clear that there's a "benefit" to removing the stat(), but it's on the order of "remove a syscall", which isn't quite the same as "remove an IO that must hit disk". And the reason why we're arguing is that we don't have good data on what the latter is, even though the impact on performance is several orders of magnitude.
This isn't a criticism, I'm verbalizing to myself why we got into this situation where we're arguing back and forth over a few stat() calls with me and jed screaming that these aren't the droids you're looking for, whereas you are wondering if the one over there doesn't vaguely look like the one that turned out to be 3CPO anyway.
Anyway, before we remove this, we should be sure about the security impact... how can we organize that?
I'm afraid there's falsifiability problems here. So we think really hard if we can think of a way to break Firefox or the sandbox if we make this change. Note that this is easier the more understandable the code and logic is, and having to worry about mappings for files that don't exist doesn't help. If we can't think of any attack, we can consider doing it and hope for the best.
I can't come up with something, so you could make this chance, and avoid a few (overwhelmingly likely to be cached) stat() calls at some small extra memory cost.
Comment 9•6 years ago
|
||
On the other hand, I have a Linux box (Intel Celeron with a fast SSD) where firstPaint
takes over 3.2 seconds and firstLoadURI
over 9.9 seconds. selectProfile
takes 240 ms. Are these times normal?
I think the Linux startup sequence is less understood. The sandbox code may or may not be part of the problem.
Updated•6 years ago
|
Comment 10•6 years ago
•
|
||
On the other hand...The sandbox code may or may not be part of the problem.
I'm not sure how this relates to the discussion of this patch. Is there any indication stat()-ing the specific files the broker checks is the actual cause? If not this should be in a separate performance bug.
The influence of the sandbox can be established by doing multiple runs and flipping it on or off via prefs. It is likely to have at least some overhead (which may or may not be measurable), but again, that doesn't necessarily have anything whatsoever to do with this bug.
Comment 11•6 years ago
|
||
I'm not sure. I can add up the times reported by strace
, but I don't think that's too reliable. Maybe using ftrace
would have lower overhead?
Alternatively, can the sandbox be disabled?
Comment 12•6 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #4)
So this seems very low priority, and I'd question whether the
fission
tag in the whiteboard is accurate.
Removing fission
whiteboard tag because this bug doesn't block Fission MVP and is not Fission-specific.
Updated•3 years ago
|
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Updated•2 years ago
|
Description
•