1620045 - Enforce that rust code does not perform networking calls

Reporter

Description

•

5 years ago

To aid Tor in ensuring rust code in Firefox was not causing proxy bypasses, in Bug 1376621 we landed a simple check for Linux that looks at the imports of the gkrust object file and alerts if any networking functions appear there. In Bug 1566938 we noted that we opened that allowlist up to accommodate new WebRTC stuff, and in doing so, made the check ineffectual.

We want to get back to a position of supporting Tor in ensuring proxy-safety of Rust code. (We also want to avoid doing networking in Rust for Firefox also, but for Tor is represents a systemic threat.) For Tor's case, we would be happy to immediately abort/crash if Rust tries to do networking.

After debating several approaches (a clippy/linter approach, mucking with the shared rust object, creating separate rust libraries, forking rust's stdlib, auditing all network calls) we think we settled on one that is workable.

We will take the gkrust generated object file, and rewrite it, to redirect all imports of networking functions to stubs of functions that will abort when called.

The problem with this approach is that when LTO is enabled; it is not feasible to do this rewriting, as we would be rewriting bitcode that is subject to change as rust/clang evolve. The work-around is to disable cross-language LTO, so that while Firefox's C++ code can benefit from LTO, the Rust code cannot.

Dan Minor [:dminor]

Comment 1

•

5 years ago

What does this mean for the future of the Rust networking code I added to support mDNS queries for WebRTC?

Flags: needinfo?(tom)

Tom Ritter [:tjr] (OOTO until 9/2)

Reporter

Comment 2

•

5 years ago

(In reply to Dan Minor [:dminor] from comment #1)

What does this mean for the future of the Rust networking code I added to support mDNS queries for WebRTC?

We'll be putting this behind a configure flag so it means it will work fine, except in Tor Browser where it will crash :)

Flags: needinfo?(tom)

Dan Minor [:dminor]

Comment 3

•

5 years ago

Thanks for the clarification!

Nathan Froyd [:froydnj]

Comment 4

•

5 years ago

(In reply to Tom Ritter [:tjr] (ni for response to sec-[approval|rating|advisories|cve]) from comment #0)

We will take the gkrust generated object file, and rewrite it, to redirect all imports of networking functions to stubs of functions that will abort when called.

I think this is roughly how you would do it on Linux; I don't know all the particulars for Mach-O or PE/COFF, but I assume they'd be sort of similar:

Have a C file with all the abort()'ing networking stubs: connect_abort, etc. You may want to make these default visibility (i.e. exported from libxul), because that will probably make your life easier later.
Look through the symbol table of all object files in libgkrust.a for connect et al and rewrite those entries to point at connect_abort etc.
I don't think you need to muck with relocations at this point, because the relocations should already point at the symbols that you rewrote in place in step 2? If they don't point at those symbols, you need to figure out how to rewrite the relocations for text sections (ideally there are no pointers in data-related sections).

Everything should Just Work at this point, because the linker will try to resolve the symbols from libgkrust.a to the file you added in step 1.

The problem with this approach is that when LTO is enabled; it is not feasible to do this rewriting, as we would be rewriting bitcode that is subject to change as rust/clang evolve. The work-around is to disable cross-language LTO, so that while Firefox's C++ code can benefit from LTO, the Rust code cannot.

Presumably whatever configure option gets added to turn this on will also do the appropriate checking that cross-language LTO is not turned on.

Bobby Holley (:bholley)

Comment 5

•

5 years ago

It seems a bit unfortunate to both (1) remove Firefox's existing checking against Necko-bypassing network calls, and (2) require Tor Browser to disable LTO (something we plan to rely on increasingly in the future).

Am I correct that the original check was working fine, but we needed to turn it off because we were building a separate executable that ships with Firefox, links against libXUL, and uses OS-level networking APIs? Are there other ways we could deal with this problem?

Bobby Holley (:bholley)

Comment 6

•

5 years ago

In particular, it seems to me that things that link against libXUL should be able to do networking via Necko, and things that don't link against libXUL shouldn't trip the linter.

Mike Hommey [:glandium]

Comment 7

•

5 years ago

The original check actually works in exactly one setup, and that is not the one we ship with (it only works with rust LTO enabled but with cross LTO disabled).

Bobby Holley (:bholley)

Comment 8

•

5 years ago

(In reply to Mike Hommey [:glandium] from comment #7)

The original check actually works in exactly one setup, and that is not the one we ship with (it only works with rust LTO enabled but with cross LTO disabled).

Sure, but IIUC the original check was effectively a static check on the control flow graph, and that checking would remain valid across different LTO configurations because LTO doesn't modify the CFG. The new check, on the other hand, is a dynamic check, which means that the instrumentation actually needs to exist in the bits we ship.

Bobby Holley (:bholley)

Comment 9

•

5 years ago

(In reply to Bobby Holley (:bholley) from comment #5)

Am I correct that the original check was working fine, but we needed to turn it off because we were building a separate executable that ships with Firefox, links against libXUL, and uses OS-level networking APIs? Are there other ways we could deal with this problem?

Just chatted with Nils, and this wasn't quite correct. the mDNS stuff is not a separate binary (it's just part of Firefox). The issue is that WebRTC has historically handled its own networking, and this was a new piece of Rust code in WebRTC which happened to run afoul of the linter against doing OS-level networking from Rust code. So the linter was neutered (8 months ago), and this bug is about addressing that.

Stepping back a bit, I'm aware of two technical reasons why we generally want to ensure that networking stuff goes through Necko:
(1) Most (but not all) networking stuff involves crypto, and we want to make sure our crypto stack is used consistently (and not e.g. openssl).
(2) We want to avoid accidental proxy bypasses.

One can of course use OS-level networking APIs from C++, but Rust's package ecosystem makes it particularly easy to accidentally do so. And since all of our core networking technology is already in C++, a kludgy-but-simple way to reduce the risk is to ensure that libGkRust does not contain any calls into such APIs. The solution proposed in comment 0 also kinda works, but is less ideal for the reasons identified in comment 5.

Neither (1) nor (2) apply to the mDNS case, so that code itself isn't a problem. But it crosses the rubicon of importing networking symbols into libGkRust, which prevents us from trivially linting against other potentially-problematic cases that might appear over time.

So I can think of a few approaches forward here:
(A) The approach in comment 0.
(B) Proxy the mDNS networing calls into C++ by FFI.
(C) Compile out mDNS in tor builds, and run the static checker on those builds rather than regular Firefox builds.

Nils thought it might generally be a good idea to unify the WebRTC networking stack with Necko, in which case (B) might be the most appropriate path forward. But I don't have a good sense of how invasive that would be to the mDNS code.

Nils Ohlmeier [:drno]

Comment 10

•

5 years ago

(In reply to Bobby Holley (:bholley) from comment #9)

So I can think of a few approaches forward here:
(A) The approach in comment 0.
(B) Proxy the mDNS networing calls into C++ by FFI.
(C) Compile out mDNS in tor builds, and run the static checker on those builds rather than regular Firefox builds.

AFAIK the TOR browser disables WebRTC right now via setting a pref, which effectively disables the mDNS functionality as well. But it is obviously only a run time configuration.

If it is correct that the TOR browser right now is not interested in any of the WebRTC functionality I was actually thinking more of
(D) Compile the TOR browser with WebRTRC disabled at compile time

That should enable running the static checker on TOR builds.

Nils thought it might generally be a good idea to unify the WebRTC networking stack with Necko, in which case (B) might be the most appropriate path forward. But I don't have a good sense of how invasive that would be to the mDNS code.

In that case I was not only thinking about the mDNS feature, but all of our WebRTC networking code. But that would be quite a project.

So in terms of getting back to the point where we can run the static checker again I think C or D would get us the fastest there. And then we could start looking into B to re-enable running the static checker on all Firefox builds again.

Dan Minor [:dminor]

Comment 11

•

5 years ago

I think that TOR builds with --disable-webrtc, which is a build time flag, so option D should just work, but maybe that has changed.

I had thought about using Necko through a C++ FFI when this first came up (some context in https://bugzilla.mozilla.org/show_bug.cgi?id=1376621#c29) but it seemed like a lot of extra work and additional complexity at the time. As a longer term solution, that might be a good way ahead. For what it is worth, I did discuss using the Rust networking functions with the Necko team when I was first implementing this, and they didn't raise any concerns about it at the time or later during code review.

Tom Ritter [:tjr] (OOTO until 9/2)

Reporter

Comment 12

•

5 years ago

(In reply to Bobby Holley (:bholley) from comment #5)

It seems a bit unfortunate to both (1) remove Firefox's existing checking against Necko-bypassing network calls, and (2) require Tor Browser to disable LTO (something we plan to rely on increasingly in the future).

(2) is only partly true: Tor Browser has to disable cross-language LTO, but can otherwise leave LTO on. This is already the case with the existing (neutered) check, but it's true that you could build once with the check enabled and then build a shippable build with the check disabled. Tor doesn't do this presently AFAIK.
(1) The approach in Comment #0 doesn't remove the check any more than the new solution in Comment 10. In both cases the 'Tor Build' would have the networking check and the 'Mozilla Build' wouldn't. The two builds are very similar with the exception of --disable-webrtc which is something we would like to get to the point of being able to remove and we could disable WebRTC via prefs instead.

Futhermore, as we try to help Tor transition off ESR and onto the regular train (this year!) we will hopefully bring more Tor Browser builds into TC, so we will get a better signal about breakage earlier on.

Am I correct that the original check was working fine

As glandium mentioned, it was only working on one platform (which was concerning) and already out-of-the-gate had to allowlist a few networking calls that were manually verified. So it was far from 'fine'.

(In reply to Bobby Holley (:bholley) from comment #9)

Stepping back a bit, I'm aware of two technical reasons why we generally want to ensure that networking stuff goes through Necko:
(1) Most (but not all) networking stuff involves crypto, and we want to make sure our crypto stack is used consistently (and not e.g. openssl).
(2) We want to avoid accidental proxy bypasses.

One can of course use OS-level networking APIs from C++, but Rust's package ecosystem makes it particularly easy to accidentally do so. And since all of our core networking technology is already in C++, a kludgy-but-simple way to reduce the risk is to ensure that libGkRust does not contain any calls into such APIs.

Also Rust code is harder to audit than C++ because fewer people know it.

Neither (1) nor (2) apply to the mDNS case, so that code itself isn't a problem.

Are you suggesting (2) doesn't apply because mDNS is local-network? I do not think Tor would agree on this point.

But it crosses the rubicon of importing networking symbols into libGkRust, which prevents us from trivially linting against other potentially-problematic cases that might appear over time.

Well, we already crossed that rubicon a little bit, but yes this brought us all the way across.

So I can think of a few approaches forward here:
(A) The approach in comment 0.
(B) Proxy the mDNS networing calls into C++ by FFI.
(C) Compile out mDNS in tor builds, and run the static checker on those builds rather than regular Firefox builds.

(In reply to Dan Minor [:dminor] from comment #11)

I think that TOR builds with --disable-webrtc, which is a build time flag, so option D should just work, but maybe that has changed.

Yes, Tor builds with --disable-webrtc. The only --disable-webrtc build we have in TC though, is the Windows MinGW build which the existing (neutered) linting check is not implemented on. (But again, see above about bringing more Tor Browser builds into TC to help with their ESR->regular transition. And then also see again about eventually wanting to --enable-webrtc. If we wanted to keep that same approach we would need some new --disable-udp-webrtc -like flag...)

Bobby Holley (:bholley)

Comment 13

•

5 years ago

(In reply to Tom Ritter [:tjr] (ni for response to sec-[approval|rating|advisories|cve]) from comment #12)

(In reply to Bobby Holley (:bholley) from comment #5)

It seems a bit unfortunate to both (1) remove Firefox's existing checking against Necko-bypassing network calls, and (2) require Tor Browser to disable LTO (something we plan to rely on increasingly in the future).

(2) is only partly true: Tor Browser has to disable cross-language LTO, but can otherwise leave LTO on.

I meant cross-language LTO, sorry.

(1) The approach in Comment #0 doesn't remove the check any more than the new solution in Comment 10. In both cases the 'Tor Build' would have the networking check and the 'Mozilla Build' wouldn't.

The approach in comment 0 is a runtime check for tor builds. The approach in comment 10 is a compile-time.

I agree that the present state of affairs is that we have no checking at all. But the question at hand is how best to reintroduce the checking.

which is something we would like to get to the point of being able to remove and we could disable WebRTC via prefs instead.

If this is a strong long-term goal, then approach (D) won't work long-term.

Are you suggesting (2) doesn't apply because mDNS is local-network? I do not think Tor would agree on this point.

No, I'm suggesting it's not a problem because Tor disables WebRTC, and because the impact of the mDNS thing on non-Tor proxies is presumably considered acceptable.

My lean, at least for now, is to go with approach (D) - run the non-neutered linter on --disable-webrtc linux builds. We can eventually do this directly on Tor builds once they move into TC, but for now we can just add a tier-2 build configuration to mozilla-central.

Nathan Froyd [:froydnj]

Comment 14

•

5 years ago

(In reply to Nathan Froyd [:froydnj] from comment #4)

Look through the symbol table of all object files in libgkrust.a for connect et al and rewrite those entries to point at connect_abort etc.

I learned today that objcopy has a --redefine-sym/--redefine-syms flag that accomplishes this exact sort of rewriting. llvm-objcopy purports to support the same flag, and llvm-objcopy might actually work on LLVM bitcode as well (?).

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Simon Friedberger [:simonf]

Updated

•

1 year ago

Comment 15

•

1 year ago

For context: Tor developers are doing manual reviews for certain keywords: https://gitlab.torproject.org/tpo/applications/tor-browser-spec/-/blob/main/audits/code_audit.sh

Simon Friedberger [:simonf]

Updated

•

1 year ago

Bugzilla

Enforce that rust code does not perform networking calls

Categories

(Firefox Build System :: General, enhancement)

Tracking

(Not tracked)

People

(Reporter: tjr, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tor])

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Comment 14

Updated

Updated

Comment 15

Updated