Closed Bug 1376621 Opened 7 years ago Closed 5 years ago

Enforce that Rust code is proxy-safe (doesn't call directly into libc networking functions)

Categories

(Firefox Build System :: General, enhancement, P2)

enhancement

Tracking

(firefox57 wontfix, firefox67 fixed)

RESOLVED FIXED
mozilla67
Tracking Status
firefox57 --- wontfix
firefox67 --- fixed

People

(Reporter: tjr, Assigned: tjr)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [tor 21862])

Attachments

(1 file, 4 obsolete files)

The rust code in 52 might have been proxy unsafe, so Tor turned it off. We should make sure it is proxy safe.

https://trac.torproject.org/projects/tor/ticket/21862
Somehow related: bug 1367932 comment 1.
Blocks: meta_tor
Priority: -- → P2
The getaddrinfo call is from the ToSocketAddrs impl, which is a trait impl in the url crate not used by the Rust integration.

As far as I can tell, none of the ongoing Rust projects should involve any getaddrinfo calls.
Is there any way we can reliably enforce that?
What do we do for C++ code right now?

I suspect the correct linker script / link args would work. I tried compiling an empty Rust library with LTO (rustc test.rs -Clto --crate-type=staticlib), however the library still had an unlinked getaddrinfo symbol. There should be some combo of link args and nm/objdump/readelf calls that let us differentiate between a staticlib that uses a symbol and one that doesn't. I'm not sure what it is.
I don't believe we have anything for C++ code, because historically we haven't imported arbitrary C++ code that can do networking. :) I guess the best thing to do here would be to add stub functions for libc networking functions that we don't want Rust code calling directly, and have them proxy everything to Necko?

It sounds like as-filed this bug is INVALID, because the way we're using the rust-url crate does not cause it to do DNS lookups with getaddrinfo, but we don't have any backstops that would actually prevent that, so maybe we can just morph this bug to cover that.

glandium: does my vague proposal sound plausible?
Summary: Rust Code may not obey proxy safety → Enforce that Rust code is proxy-safe (doesn't call directly into libc networking functions)
I would rather have stub functions that just abort.

We'd have to force Rust to link to those in such a way that the final XUL link step isn't polluted. I'm not sure how that can be done, perhaps a linker script?
I assume that if we linked in a Rust crate that provided the right symbols it might work, but we'd have to ensure that they did not wind up as public symbols to which the other stuff in XUL would work. I'm also not sure what extern linkage actually does in Rust, so it's possible this would not work.
We create a regular C-like staticlib; we could link those symbols in first and create a different staticlib, and then link it into XUL?
This bug is showing up in some Firefox 57 tracking dashboard because it was changed recently. I don't think it has anything to do with Firefox 57, so I'm setting the tracking flag so it doesn't show up. If anyone disagrees, please change the flag!
Hey Georg, I'm flagging you in this to help us figure out what/how we should proceed.
Flags: needinfo?(gk)
Not sure what exactly I should help with but comments 5-8 had some ideas about how to solve this bug. I am not familiar enough with Rust/linking internals to evaluate that one but, yes, we are very interested in a solution that avoids us removing random network related Rust APIs just because we can't be sure that there is no way they can bypass the proxy settings in the browser.
Flags: needinfo?(gk)
Whiteboard: [tor] → [tor 21862]
Product: Core → Firefox Build System

We talked about this a bit today and established at least the next step forward:

<@ted> we could definitely use the same techniques we use to replace malloc and abort to replace stuff like socket / connect, but that'd apply to all of libxul
<@ted> so we'd have to ensure that we made the existing networking code in Firefox work with it first
<@ted> i'm not sure there's going to be a sensible way to enforce this for just rust code
<froydnj> could we just lint the rust staticlib(s) for things we don't want to see? won't solve dlopen issues, but would be a good first step?
<tjr> So let me speculate wildly about things that I don't really know how they work..... When we're compiling the rust code; we're making object files that will eventually get merged (linked?) into the resulting library (xul).
<tjr> Those object files must have symbols they need from other libraries. Can we look for particular symbols that are scary and abort or something?
<@ted> sure
<tjr> (This kind of sounds like what froydnj just said...)
<@ted> yeah :)
<@ted> i'm not sure if that wouldn't produce false positives right now, though, since we might be compiling crates that have networking code that we're not using?
<froydnj> tjr: great minds think alike! ;)
<tjr> Hm. And modifying third party crates is quite a maintenance burden.... Would it be crazy to suggest modifying the object file (or preprocess the crate) to swap those functions out to call stub_socket or whatever; and provide implementations that runtime abort?
<@ted> those both sound fragile
<tjr> Well; as a starting point I guess we can see how many occurances of the third party crate problem we have....
<tjr> would you be able to point me where to start putting some object file investigation work?
<froydnj> my gut feeling is that there shouldn't be too many false positives and if there are false positives, that probably means the crate functionality needs to be split up
<@ted> maybe!
<froydnj> tjr: you probably want to put a check into the rule for rust target libraries: https://searchfox.org/mozilla-central/source/config/rules.mk#1038-1040
<froydnj> tjr: and the check should look something like $(TOOLCHAIN_PREFIX)readelf -sW | awk '$7 == "UND" { print $8 }' |egrep -q 'socket|connect|accept|...' 2>/dev/null && echo 'TEST-UNEXPECTED-FAIL | check_networking' || true
<froydnj> readelf -sW $some_magic_rust_library
<tjr> Thanks!
<@ted> we've got mio and net2 vendored, i don't know if either of those actually wind up in gkrust
<froydnj> mio does on linux, I think
<froydnj> for remoting audio

So at this point; I have a patch that will search through the built rust static libraries[0] for exported symbols we care about by running

readelf -sW /home/tom/Documents/moz/mozilla-unified-2/obj-x86_64-pc-linux-gnu/x86_64-unknown-linux-gnu/release/libgkrust.a | awk '$7 == "UND" { print $8 }' | egrep 'socket|connect|accept'

When built without --enable-release, readelf -sW will produce a gigantic file (85MB) with many of the 3 initial symbols.

When built with --enable-release readelf -sW output is 3.2M and does not contain any of those symbols.

From a previous discussion on #build[1], I was thinking I might need to perform dead code elimination on the rust static libraries by completing some sort of partial link process with libxul to remove what was needed; but now it looks like that is not necessary; and performing this symbol search when enable-release is enabled is a path forward...

[0] obj-x86_64-pc-linux-gnu/x86_64-unknown-linux-gnu/[debug,release]/[libgkrust.a, libjsrust.a]
[1] https://mozilla.logbot.info/build/20190116#c15838788

As a checkpoint, I got this working (and passing on Linux) with the attached patch. After talking with glandium; I'm going to refactor this to a check_binary check. And then I'll start trying to be more exhaustive with the networking functions we look for.

Assignee: nobody → tom

Alright; here's a first pass to get some feedback on the changes to check_binary.py

Attachment #9037824 - Attachment is obsolete: true
Attachment #9038023 - Flags: review?(mh+mozilla)

Hm. This does not work for anything but Linux. I am investigating finding the correct readelf programs/command for OSX, Windows, and mingw-clang...

Comment on attachment 9038023 [details] [diff] [review]
Bug 1376621 - Search for networking symbols in rust code and fail if they are found

Review of attachment 9038023 [details] [diff] [review]:
-----------------------------------------------------------------

::: python/mozbuild/mozbuild/action/check_binary.py
@@ +311,5 @@
>  
>  
> +def check_networking(target, binary):
> +    retcode = 0
> +    networking_functions = ["socket", "connct", "accept"]

Missing a e in connect.

@@ +319,5 @@
> +    try:
> +        for sym in at_least_one(iter_readelf_symbols(target, binary)):
> +            if sym['index'] == 'UND' and sym['name'] in networking_functions:
> +                bad_occurances_count += 1
> +                bad_occurances_names.add(sym['name'])

Keeping a separate count doesn't sound too useful. len(bad_occurances_names) will have the same value in most cases, and even if it doesn't have the same value, it's not all that important.

@@ +326,5 @@
> +
> +    basename = os.path.basename(binary)
> +    if bad_occurances_count:
> +        s = 'TEST-UNEXPECTED-FAIL | check_networking | {} | Identified {} ' + \
> +            'networking function(s) being imported in the rust static libraries ({})'

library

@@ +328,5 @@
> +    if bad_occurances_count:
> +        s = 'TEST-UNEXPECTED-FAIL | check_networking | {} | Identified {} ' + \
> +            'networking function(s) being imported in the rust static libraries ({})'
> +        print(s.format(basename, bad_occurances_count,
> +            ",".join(x for x in bad_occurances_names)),

join(bad_occurances_names) does the same, but sorted() would be even better.

Also, it's occurrences :)

@@ +399,5 @@
>          return checks(HOST, options.binary)
>      elif options.target:
>          return checks(TARGET, options.binary)
> +    elif options.networking:
> +        return check_networking(TARGET, options.binary)

Considering there _are_ host and target rust libraries, and that this only works for target rust libraries, I'd rather have one of --host/--target still be required, and --networking --host be an explicit error.
Attachment #9038023 - Flags: review?(mh+mozilla)

Here's a version that is restricted to Linux and works for just a few functions. How do you feel about landing this as incremental work?

Attachment #9038023 - Attachment is obsolete: true
Attachment #9038324 - Flags: review?(mh+mozilla)
Comment on attachment 9038324 [details] [diff] [review]
Bug 1376621 - Search for networking symbols in rust code and fail if they are found

Review of attachment 9038324 [details] [diff] [review]:
-----------------------------------------------------------------

::: python/mozbuild/mozbuild/action/check_binary.py
@@ +388,5 @@
>  
>      options = parser.parse_args(args)
>  
>      if options.host == options.target:
> +       print('Exactly one of --host or --target must be given',

These lines are not indented correctly.

@@ +395,3 @@
>  
> +    if options.networking and options.host:
> +       print('--networking is only valid with --target',

likewise
Attachment #9038324 - Flags: review?(mh+mozilla) → review+

Note: now that I'm rebasing my stuff on top of this and hitting some problems related to the fact that my code doesn't handle static libraries just yet, that made me realize that this is not going to work along cross-language LTO because in that case static libraries contain llvm ir, not objects.

So I'm not sure anymore whether this should land or not. I don't think we're that far off with cross-language LTO :(

If bug 1516228 lands before this, please note it's bitrotting the patch here, and requires some trivial changes:

  • Replace iter_readelf_symbols with iter_symbols, removing the target argument.
  • Replace the sym['index'] == 'UND' check with sym['addr'] == 0
  • Replace readelf with llvm-objdump in the RuntimeError message.

That should be all.

Essentially, that's the same set of changes as the last patch in that bug applied to check_dep_versions.

(In reply to Mike Hommey [:glandium] from comment #20)

Note: now that I'm rebasing my stuff on top of this and hitting some problems related to the fact that my code doesn't handle static libraries just yet, that made me realize that this is not going to work along cross-language LTO because in that case static libraries contain llvm ir, not objects.

So I'm not sure anymore whether this should land or not. I don't think we're that far off with cross-language LTO :(

Here's what my investigation figured out (much of this from glandium):

  • The output of building with rust cross-lang-lto (Bug 1512723) makes libgkrust.a an ar archive
  • We can use ar x to extract gkrust-b8785d1bfe9e4b58.gkrust.369bom9g-cgu.0.rcgu.o from the archive. This file is ~196 MB
  • We can use llvm-dis to disassemble this to a file. This takes about 20 seconds on my 2-year-old xeon tower and makes a 1.1GB file
  • Inside the output .ll file; we can find symbol imports beginning with 'declare'
  • We can parse it of course; but shell-wise we can get a list with a command like grep -o -e "^declare .* @[^(]*" gkrust-b8785d1bfe9e4b58.gkrust.369bom9g-cgu.0.rcgu.o.ll | rev | cut -f 1 -d " " | rev
    • The |rev | cut | rev technique is to get the last field.
  • The llvm-dis version we use needs to match up with the clang compiler used by rust; in my case I was using rust nightly and needed to get get (build) a version of llvm-dis to hit clang 8.0 trunk.
    • We could require our major clang version to go in lockstep with the clang version used by rust...
    • We could build a llvm-dis as a toolchain artifact
    • :mw says that building rust will produce a set of llvm tools most of which are not packaged (including llvm-dis). I'm pretty sure we build our own rust; so we can do this for automation. I don't think it'd work for local builds though.
      • In automation, we'd need to package llvm-dis and put it in the right place so we can use it in check_binary without making it the default llvm-dis otherwise llvm-dis wouldn't match the clang on the PATH
    • We could bring it down somehow with mach bootstrap.

So it seems like we have a path forward even with cross-lang-LTO.

Here is the list of networking functions I investigated:

  1. socket
  2. socketpair
  3. connect
  4. accept
  5. bind
  6. listen
  7. recv
  8. recvfrom
  9. recvmsg
  10. send
  11. sendto
  12. sendmsg
  13. gethostbyname
  14. gethostbyaddr
  15. gethostent
  16. sethostent
  17. endhostent
  18. gethostent_r
  19. gethostbyname2
  20. gethostbyaddr
  21. gethostbyaddr_r
  22. gethostbyname_r
  23. gethostbyname2_r
  24. getaddrinfo
  25. getservent
  26. getservbyname
  27. getservbyport
  28. setservent
  29. getprotoent
  30. getprotobyname
  31. getprotobynumber
  32. setprotoent
  33. endprotoent
  34. getsockname
  35. getsockopt
  36. setsockop

I got three hits: socketpair, recvmsg, and sendmsg.

socketpair is used here in mio-uds: https://searchfox.org/mozilla-central/rev/465dbfe030dfec7756b9b523029e90d48dd5ecce/third_party/rust/mio-uds/src/socket.rs#59,68

sendmsg is used in mio and audioipc here: https://searchfox.org/mozilla-central/rev/465dbfe030dfec7756b9b523029e90d48dd5ecce/third_party/rust/mio/src/sys/unix/uds.rs#202 https://searchfox.org/mozilla-central/rev/465dbfe030dfec7756b9b523029e90d48dd5ecce/media/audioipc/audioipc/src/msg.rs#117

recvmsg is used in the same: https://searchfox.org/mozilla-central/rev/465dbfe030dfec7756b9b523029e90d48dd5ecce/media/audioipc/audioipc/src/msg.rs#88 https://searchfox.org/mozilla-central/source/third_party/rust/mio/src/sys/unix/uds.rs#168

recvmsg and sendmsg require a socket; and I didn't find socket in the import. socketpair creates a pair of sockets but is restricted to making AF_UNIX sockets. My conclusion is therefore that mio is making a socketpair and sending and receiving messages on them; and that audioipc is not the source of the sendmsg/recvmsg calls.

I can allow calls to socketpair, recvmsg, and sendmsg then. If anything wants to use recvmsg/sendmsg on a socket that goes to the internet; looking for a call to the socket function would raise a red flag. I don't know of any other way to create a socket besides a call to socket. Nonetheless I am also trigger on calls to related functons like connect, accept, bind, listen, recv, send.

I am also triggering on all the DNS resolution functions I could find: gethostbyname and related.

Finally I am also triggering on port/protocol lookup functions getprotoent, getservent, and related: these functions are suspicious do not necessarily mean networking is occurring.

This is rebased on top of your iter_symbols change; but still linux-only for now.

Attachment #9038324 - Attachment is obsolete: true
Attachment #9039171 - Flags: review?(mh+mozilla)
Comment on attachment 9039171 [details] [diff] [review]
Bug 1376621 - Search for networking symbols in rust code and fail if they are found r?glandium

Review of attachment 9039171 [details] [diff] [review]:
-----------------------------------------------------------------

::: python/mozbuild/mozbuild/action/check_binary.py
@@ +316,5 @@
>      except Empty:
>          raise RuntimeError('Could not parse readelf output?')
>  
>  
> +def check_networking(target, binary):

You're not using the target argument ; you can remove it.

@@ +318,5 @@
>  
>  
> +def check_networking(target, binary):
> +    retcode = 0
> +    networking_functions = [

you may want a set here.

@@ +322,5 @@
> +    networking_functions = [
> +        # socketpair is not concerning; it is restricted to AF_UNIX
> +        "socket", "connect", "accept", "bind", "listen",
> +        "getsockname", "getsockopt", "setsockopt",
> +        "recv", "recvfrom", #"recvmsg",

Please don't include commented values, that's just confusing. If you want to document that we explicitly don't look for those, please write a comment in the form of sentences.
Attachment #9039171 - Flags: review?(mh+mozilla)
Attachment #9039375 - Flags: review?(mh+mozilla) → review+
Keywords: checkin-needed

Pushed by opoprus@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/435183826647
Search for networking symbols in rust code and fail if they are found. r=glandium

Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Blocks: 1524408

Over in Bug 1554976 I'm attempting to get multicast DNS support working to obfuscate local ip addresses for WebRTC. As part of this, I need to stand up a server to respond to mDNS queries.

We sort of have an implementation in tree [1] however that is focused on service discovery, not responding to simple hostname queries. I added support to the JavaScript fallback library to generate A and AAAA answers, but as it turns out, the JavaScript version is not used on OS X because it fails to work there, and the C++ version that is used in OS X is not working properly anymore either. The code itself seems semi-abandoned and I'm not sure how much use it ever received.

Rather than try to fix up the C++ version and maintain two separate libraries, it seemed like it might be better to create a small Rust library to handle queries. I'm adding this as part of the mtransport code [2] that we use for WebRTC signaling. I'm running the mDNS code inside the StunAddrsRequest IPC code that has been explicitly set up to do networking in its own process. I'm not involved in the work, but we're in the process of moving all of the WebRTC signaling code into its own process anyway.

At any rate, it seems that as a result of this test added by this bug, I'm stuck using C++ to write new networking code rather than using Rust, which seems like a step backwards from a safety and security point of view. If the intention is to audit third party code, could we restrict the checks to just things that have been vendored in tree? Do I need to wire up a bunch of FFI bindings to the necko code? Is there any other way ahead for me here? Thanks!

[1] https://searchfox.org/mozilla-central/source/netwerk/dns/mdns/libmdns
[2] https://searchfox.org/mozilla-central/source/media/mtransport
[3] https://searchfox.org/mozilla-central/source/media/mtransport/ipc/StunAddrsRequestParent.cpp

Flags: needinfo?(tom)

(In reply to Dan Minor [:dminor] from comment #29)

At any rate, it seems that as a result of this test added by this bug, I'm stuck using C++ to write new networking code rather than using Rust, which seems like a step backwards from a safety and security point of view.

We need to document the test better; a that is definitely not what we want to encourage.

If the intention is to audit third party code, could we restrict the checks to just things that have been vendored in tree?

Unfortunately; no. All the rust code gets compiled into one big object file; there's no way to separate it out AFAIK.

Do I need to wire up a bunch of FFI bindings to the necko code? Is there any other way ahead for me here? Thanks!

Subtract the function calls you need from the test to make it pass. File a new bug, dependent on this one, explaining the problem (doesn't need much) and reference your bug and the function calls you needed to subtract. Hopefully that will unstick you in the least problemtic way and give us something to think about for the next iteration of this test.

Flags: needinfo?(tom)

Thank you! I'm also checking with the necko folks about what they would prefer me to do so I may not end up using the Rust version anyway, but it's good to know this test won't block me if that is the approach we end up going with.

Blocks: 1566938
Blocks: 1620045
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: