Open Bug 1452461 Opened 7 years ago Updated 3 years ago

Native messaging clients on non-Windows platforms sometimes inherit file descriptors for network connections

Categories

(Toolkit :: Async Tooling, defect, P2)

59 Branch
defect

Tracking

()

People

(Reporter: sami.vanttinen, Unassigned)

References

Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:59.0) Gecko/20100101 Firefox/59.0 Build ID: 20180323154952 Steps to reproduce: I'm using KeePassXC with KeePassXC-Browser. The browser extension uses native messaging to connect with keepassxc-proxy binary that transfers encrypted traffic between KeePassXC and delivers it back to KeePassXC-Browser. Surfing a while, using the Firefox preferences page etc. Then running the following command: sudo netstat -apn | grep keepassxc-proxy And for comparison looking the same for firefox process. Actual results: Netstat command shows that keepassxc-proxy has opened a TCP connection: tcp 0 0 10.0.2.15:44354 217.212.252.102:80 ESTABLISHED 3733/keepassxc-proxy keepassxc-proxy doesn't handle any TCP sockets. It only creates one Unix Domain socket to the localhost. This IP in the example line belongs to Akamai, so I find this a little strange. Sometimes netstat shows multiple connections to various hosts, port 80 or 443. Connections are sometimes made to Amazon hosts too. The same connection is not visible when showing netstat results from firefox process. Expected results: Native messaging shouldn't create any TCP sockets that look that these are belonging to the native messaging host application. What are these connections? Some kind of telemetry or statistics? This doesn't happen with non-Firefox browsers and native messaging.
OS: Unspecified → All
Hardware: Unspecified → All
For me the opened connection was a HTTP GET request to http://detectportal.firefox.com/success.txt so this is related to the Captive Portals. It still doesn't explain why the native messaging host opens those TCP sockets. Even more TCP sockets are opened if Firefox has telemetry or usage statistics enabled.
Component: Untriaged → Networking
Product: Firefox → Core
So I'm no expert on the native messaging stuff, but it seems to me that another full gecko process is being started, which then spins up its own socket thread, and does all the usual stuff gecko does on startup (check for captive portal, telemetry stuff, etc). Necko isn't responsible for this directly, it's just doing as it's told. Seems more likely to me that this would be on the extension native messaging code.
Component: Networking → WebExtensions: Untriaged
Product: Core → Toolkit
Summary: Native messaging opens TCP sockets to various locations → Native messaging clients on non-Windows platforms sometimes inherit file descriptors for network connections
Priority: -- → P2
This is easier said than done, for non-Windows systems, since they don't have a standard API that allows closing inheritable file descriptors by default. We do have some helper code in the IPC module to handle this, but changing Subprocess.jsm to use them is easier said than done. In the mean time, it probably makes sense to set CLOEXEC on our socket FDs. That isn't exactly foolproof, since there's a possible race between opening the FD, setting CLOEXEC on it, and spawning a child process, where we may fork the child process between the open and the fcntl. But it's better than nothing, I guess.
Blocks: 1269501
Component: WebExtensions: Untriaged → Async Tooling
Depends on: 1455446
I encountered the same issue on Linux/KDE Plasma browser integration: https://bugs.kde.org/show_bug.cgi?id=399449. There is some work around code for that extension, but it's a guessing game, because only the parent (Firefox) knows what file descriptors should be closed before exec(). The more I think about this issue the more I'm convinced that leaking sockets to a fork()'d process is actually a security issue. I agree that the currently mentioned native extensions don't do anything with the leaked information, but IMHO this can be exploited by a malicious extension, at least to collect meta information about the network connections opened by Firefox. As for the CLOEXEC: is there a specific place in the Firefox code that exec()'s the native extension code? Couldn't that place be extended to simply close() all unnecessary file descriptors in the child process after fork() but before calling exec()? Is there a reason why O_CLOEXEC isn't used during the open() of sockets? Using that flag avoids the race condition problems of setting FD_CLOEXEC.

Every time I do an lsof -i on my laptop, I always see some unexpected Python process with socket connections to random Internet addresses and I start freaking out before I remember that this bug is still a thing.

I first found out about this bug when I discovered that deleting large files downloaded by Firefox doesn't always free up the space they were using because a native messaging subprocess could still be holding handles to them. So I occasionally have to go and disable/enable a random extension if I'm running out of disk space.

Neither of these are exactly showstopper issues, but they're quite annoying relative to the (possibly) simple fix proposed in comment #5. Is there a reason not to use O_CLOEXEC?

Just wondering if this issue is ever going to be addressed? It was reported over 3 years ago, yet still seems to be causing many connections to random places via keepassxc-proxy.

Is there a reason why O_CLOEXEC isn't used during the open() of sockets? Using that flag avoids the race condition problems of setting FD_CLOEXEC.

Would you know more about that, Dragana?

Flags: needinfo?(dd.mozilla)

(In reply to David Teller [:Yoric] (please use "needinfo") from comment #8)

Is there a reason why O_CLOEXEC isn't used during the open() of sockets? Using that flag avoids the race condition problems of setting FD_CLOEXEC.

Would you know more about that, Dragana?

I do no know why it is not used. Do you wan to open a separate bug to investigate this?

Flags: needinfo?(dd.mozilla)

Sockets aren't opened; we'll need SOCK_CLOEXEC, which is planned for the next version of POSIX. Linux has it (see also bug 1363378), but macOS doesn't appear to support it as of 10.13. (However, macOS has the POSIX_SPAWN_CLOEXEC_DEFAULT flag.)

It seems that this problems is causing considerable paranoia - in this case wrt keepassxc, an external password manager for firefox appearing to making and/or having connections to "random" IPs: https://github.com/keepassxreboot/keepassxc-browser/issues/100, which would be catastrophic.

So it would be very, very welcome, if FF would not inherit random TCP connections to children.

I am not informed enough to know whether the children inherit open connections or only CLOSE_WAIT connections, which I am currently seeing with FF 78 ESR. If it would be the former, then that would be a security issue, although I am not informed either on how the security model for extensions looks like, how far they are intended to be isolated from whatever parts of the browser - the principle of letting every part and component, tab, JS executor, extension etc. have access to every other components context makes me cringe hard - that should not be. In this case, extensions should absolutely not inherit random TCP sockets.

It'd be very nice if this ticket could get resolved and inheriting random TCP sockets to extensions could be eliminated.

If it would be the former, then that would be a security issue, although I am not informed either on how the security model for extensions looks like, how far they are intended to be isolated from whatever parts of the browser

Native messaging extensions contain a native code part that runs with the same permissions as the main process of the browser itself. It's hence not clear what the security issue would be, given that you're already starting from such a privileged state, i.e. you already have the same powers as Firefox itself.

Similarly:

IMHO this can be exploited by a malicious extension, at least to collect meta information about the network connections opened by Firefox.

If you have installed and are running native code (required for WebExtensions' native messaging) and it is malicious, then it doesn't need this inheritance behavior to snoop on Firefox's network connections. It can just replace the real Firefox binary by a fake one that sends all the data to the attacker directly.

So although this is a bit ugly, I don't see any indications there's a real security issue here. The starting point for any exploit of this behavior ("I installed native code that is malicious") already amounts to a full system compromise, at which point any browser security feature is meaningless.

I first found out about this bug when I discovered that deleting large files downloaded by Firefox doesn't always free up the space they were using because a native messaging subprocess could still be holding handles to them.

Downloads aren't sockets, so this should be easier to address. It may be worth filing this separately.

Status: UNCONFIRMED → NEW
Ever confirmed: true

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #10)

Sockets aren't opened; we'll need SOCK_CLOEXEC, which is [planned for the next version of POSIX][ag411]. Linux has it (see also bug 1363378), but macOS doesn't appear to support it as of 10.13.

It might be a good idea to use it where it's supported. In the mean time, fcntl with FD_CLOEXEC, but there is of course a race with that method.

There is also, these days, a necko process, and sockets opened in that process will not be inherited.

(However, macOS has [the POSIX_SPAWN_CLOEXEC_DEFAULT flag][cld].)

We should probably add that to subprocess_unix when we're on macos...

but there is of course a race with that method.

If we don't see a real security issue with the current behavior, then the small probability of the race is probably not something we actually care about? It would likely be enough to stop people from yelling at extension authors.

(In reply to Gian-Carlo Pascutto [:gcp] from comment #15)

but there is of course a race with that method.

If we don't see a real security issue with the current behavior, then the small probability of the race is probably not something we actually care about? It would likely be enough to stop people from yelling at extension authors.

No, I think it should be fine. And it would at least reduce the occurrence of resource leaks. But if we can use SOCK_CLOEXEC on Linux and POSIX_SPAWN_CLOEXEC_DEFAULT on macOS, we should be pretty much covered everywhere anyway.

(In reply to Kris Maglione [:kmag] from comment #16)

No, I think it should be fine. And it would at least reduce the occurrence of resource leaks. But if we can use SOCK_CLOEXEC on Linux and POSIX_SPAWN_CLOEXEC_DEFAULT on macOS, we should be pretty much covered everywhere anyway.

SOCK_CLOEXEC in Necko (and IPC; we use EOF on the channel to detect that the other process died, so leaks should be avoided), MSG_CMSG_CLOEXEC for fd-passing in IPC, O_CLOEXEC with downloads and whatever is opening all of these .xpi/omni.ja files that I can see being leaked, and there are some pipes whose creators will need to be hunted down to use pipe2. (I think the Necko change would have to be in NSPR if we want to use SOCK_CLOEXEC instead of just setting the flag non-atomically, so we'd need to get an NSPR release rolled out and bump Gecko's required version.) I'm also seeing a netlink socket and a DRI device. There are a lot of problems here.

Alternately, IPC LaunchApp needs two minor features (chdir and not inheriting the environment; also disclaim on Mac but that looks relatively simple) and then it can be exposed to JS and used to replace the posix_spawn backend of Subprocess. (As an added bonus, we can stop changing the entire parent process's cwd.) I threw together a proof-of-concept today and it seems to work and pass the existing Subprocess unit tests on Linux, so I'll probably end up assigning myself bug 1276388.

It's probably also worth fixing the missing cloexec bits where we can, because there are going to be libraries using system/popen/etc.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #17)

SOCK_CLOEXEC in Necko (and IPC; we use EOF on the channel to detect that the other process died, so leaks should be avoided), MSG_CMSG_CLOEXEC for fd-passing in IPC, O_CLOEXEC with downloads and whatever is opening all of these .xpi/omni.ja files that I can see being leaked, and there are some pipes whose creators will need to be hunted down to use pipe2. (I think the Necko change would have to be in NSPR if we want to use SOCK_CLOEXEC instead of just setting the flag non-atomically, so we'd need to get an NSPR release rolled out and bump Gecko's required version.) I'm also seeing a netlink socket and a DRI device. There are a lot of problems here.

Fair enough. I think the necko ones are the ones I'm most worried about, since most of them are ephemeral, and we open a lot of them, so they have more potential to leak system resources. The XPI/jar file descriptors at least generally stay open for the lifetime of the browser process.

Alternately, IPC LaunchApp needs two minor features (chdir and not inheriting the environment; also disclaim on Mac but that looks relatively simple) and then it can be [exposed to JS][ioutils] and used to replace the posix_spawn backend of Subprocess.

That would be ideal. r=me if you write the patch.

(As an added bonus, we can stop [changing the entire parent process's cwd][cwd-oops].)

Yes, I was never happy about that.

Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.