add "trust checking" to |mach vendor rust|

NEW
Unassigned

Status

2 years ago
3 months ago

People

(Reporter: froydnj, Unassigned, NeedInfo)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [stylo])

(Reporter)

Description

2 years ago
All the Rust code we currently (transitively) depend on was either written by Mozilla employees, Servo core contributors, or Rust core contributors, all of whom we implicitly trust to write code that goes into Firefox.  (e.g. crates written by them or containing code reviewed by them are not going to have nefarious code)  The same cannot necessarily be said of code that gets pulled in via |mach vendor rust|.

We should add some kind of checking that the crates we add (or update) are controlled by people we trust, either via some sort of whitelist on Cargo.toml authors, a whitelist of people who own the crates on crates.io, a whitelist of known-good crates, or some other mechanism.
I think gps mentioned this as well today.

Comment 2

2 years ago
I was basically saying that we need to start thinking about how to vet "third party" code. I think Nathan's ideas in comment #0 sound like a great start! Literally anything is better than nothing.
(Reporter)

Comment 3

2 years ago
(In reply to Gregory Szorc [:gps] from comment #2)
> I was basically saying that we need to start thinking about how to vet
> "third party" code. I think Nathan's ideas in comment #0 sound like a great
> start! Literally anything is better than nothing.

I should have been clear that all of those were Manish's ideas!
A few thoughts, as I've been working through a new commit access model:

* Trusting an individual cannot mean trusting anything committed with their credentials.  Especially if we don't control the repo access/security, and can't be proactive about detecting compromises.  Whatever solution we put into place should keep this in mind.

* AMO has a model where external deps are tied to a whitelist of versions with known hashes.  It might be a little early with that for Rust, but pinned versions are much easier to work with (and validate) in the long term.

* A related, more tinfoil-y tweak would be to maintain a set of staging repos that |mach vendor rust| pulls from, and have designated individuals responsible for pulling in those changes from upstream.  Yes, this is a bit of extra overhead, but it'd put the onus on anyone taking an external dependency to ensure that dependency remains sound.

* Treating all external dependencies the same probably isn't the right policy.  Not all libs are the same.
(In reply to Mike Connor [:mconnor] from comment #4)
> * AMO has a model where external deps are tied to a whitelist of versions
> with known hashes.  It might be a little early with that for Rust, but
> pinned versions are much easier to work with (and validate) in the long term.

Rust does use pinned hashes in Cargo.lock already, thankfully:
https://dxr.mozilla.org/mozilla-central/source/toolkit/library/rust/Cargo.lock

The problem here is when someone adds a *new* dependency, or updates to a newer version of an existing dependency we want some way to ensure that we're not pulling in something unexpected.

> * A related, more tinfoil-y tweak would be to maintain a set of staging repos that |mach vendor rust| pulls from, and have designated individuals responsible for pulling in those changes from upstream.  Yes, this is a bit of extra overhead, but it'd put the onus on anyone taking an external dependency to ensure that dependency remains sound.

This adds an extra step in the process, but I'm not sure it adds any extra security. Either way someone has to manually verify the changes in the code we're importing.

I don't think we've ever really faced this question with our vendored C++ dependencies because they are few in number and the update process is manual. When someone wants to land a new version of ICU they run a script and ask it to import whatever version number and it updates the in-tree code accordingly. I doubt that we're regularly auditing the full set of changes there, we have generally trusted that our upstream repositories are sensible (which is maybe not the best thing). Using crates from crates.io changes this equation because it's now super-easy to find a crate that does something you need there, add a dependency on it, and have it all Just Work, without having any real sense of who wrote that code.

One useful feature of the way we had Alex make cargo-vendor work is that the crates are vendored as source directories into `third_party/rust`, without version numbers (unless we wind up with multiple versions of a single crate, in which case additional versions will have version numbers):
https://dxr.mozilla.org/mozilla-central/source/third_party/rust

That means that after you run `mach vendor rust` you can usefully inspect the diff, at least.

Does Servo have any sort of policy here, or is anything on crates.io fair game for inclusion?
> The problem here is when someone adds a *new* dependency, or updates to a newer version of an existing dependency we want some way to ensure that we're not pulling in something unexpected.

A worse problem is that this step might end up being automatic for stylo? Bumping dependencies on Servo happens often. Though I guess in that case we can force folks who make the PR to make a revendoring commit on m-c.

> Does Servo have any sort of policy here, or is anything on crates.io fair game for inclusion?

We enforce licensing issues (no (L)GPL). Our policy is to also ensure that dependency bumps have been reviewed but in practice this doesn't happen. We avoid zombie crates -- a crate must be maintained or we must fork and maintain it ourselves.

No trust checking. We do full code reviews when moving external (e.g. unmaintained) crates into the servo org but not otherwise.


We ... should have a better policy here :)

>  A related, more tinfoil-y tweak would be to maintain a set of staging repos that |mach vendor rust| pulls from, and have designated individuals responsible for pulling in those changes from upstream.

I'm not sure why the extra staging repo is necessary here? We could just restrict changes to the third-party/rust folder directly, and have designated individuals allowed to run 
mach vendor rust|.

The staging repo gives us the ability to be proactive about this. I don't really see that happening, and I agree with ted that this doesn't add any layers of additional security. For updates that don't need code changes we can be proactive anyway by running cargo update + mach vendor rust, reviewing it, and landing it in m-c's vendoring folder before we need it. The only thing this affects is our ability to be proactive about updates which force us to change code within m-c. That's a minor use case and if we want to be proactive there the review can be done informally.


-------------

Note that even with my conservative "controlled by people we trust" proposal (whitelist authors to Mozilla employees, "The Servo Project Developers" and "The Rust Project Developers") there are multiple layers of trust involved, because these projects have their own sets of reviewers (not all of them employees/Firefox committers, however they are transitively trusted by employees), who will also review contributions from people who are not trusted at all.



-------------

Do we care about build scripts (and in the future, plugins)? Build scripts let you run arbitrary code at build time (think Makefile). The metadata about them exists in the Cargo.toml file, so we can set up tooling that targets these specifically and is more strict about upgrading these. However, I'm not sure if we should be worried about build-time code in particular. The threat model seems to mostly be the same, the only difference is if you were building (but not running/testing) Firefox.

Comment 7

2 years ago
pcwalton and I were talking at LAX on Saturday about additional foo Cargo could do. e.g. it might be possible to flag changes to only unsafe code. Additionally, there was talk of detecting usage of "bad" patterns from rlib metadata. It was really late and I can't remember many details...
Oh, yeah, this was brought up during the work week too (perhaps by Patrick himself? Can't remember who)

Theoretically, we can binary-analysis sandbox all the things. If they don't link to OS primitives for files (or w/e), they can't cause any harm aside from lying in their API contracts (which can transitively cause harm, but it's significantly trickier to inject malicious code that way).

It's an involved analysis since crates may contain generics which get instantiated later. You need some kind of taint system or something. http://github.com/llogiq/metacollect would be helpful here if it was finished.
mconnor, should this trust checking block the landing of Servo's vendored dependencies in mozilla-central? We will soon have bidirectional synchronization of Servo code that lands in mozilla-central and Servo's GitHub repo. New Servo dependencies committed in GitHub would automatically show up in mozilla-central.

Do we also need Mozilla-compatible license checks for vendored Rust dependencies?
Flags: needinfo?(mconnor)
(In reply to Chris Peterson [:cpeterson] from comment #9)
> Do we also need Mozilla-compatible license checks for vendored Rust
> dependencies?

Rust license checking is bug 1316990 and bug 1314071.
Here's a strawman proposal:
1) Any new crates vendored into third_party/rust must have had an explicit code review in a standalone bug. Reviewing multiple new crates in one bug is acceptable, it just needs to be separate from the bug that's making use of the crates. This is simply to ensure that someone has looked at the code instead of treating it as a black box that's coming along with a Gecko patch. We should file a bug to have one or more people post-hoc review the crates that have already been vendored (I count 11 in my tree right now).
2) Updated versions of crates should receive at least a perfunctory code review. We don't enforce that this receives its own bug, but it should at least be in a separate patch so that someone is r+ing the actual crate updates. I would not expect this to be a full code review like we'd do for Gecko code, but a once-over to make sure that nothing unexpected has been added.
3) We add a repository hook to enforce the above, possibly requiring a special annotation like "rust crate foo addition: r=someone". We could also enforce that changes to third_party/rust land in standalone commits, where the only changes are to code in that directory, Cargo.{toml,lock} files, and possibly adding "extern crate" lines to the requisite lib.rs files.

It strikes me that if we implement a policy like this we might want to have a set of reviewers who are fluent in Rust to do the crate reviews, so that we're getting people with the right expertise to look at it. Perhaps that warrants a new code module.

We could also have some standalone record of "blessed crate versions", which would be a list of crate name + version + hash along with a sign off from a person and a crypto signature of the crate data serving as record that that person has reviewed that version of the crate and is OK with it. This could be useful if we wanted to share this trust data with Servo or other Mozilla Rust projects.
I like this proposal.

Does this include crates that have the Servo team (or the Rust team) as reviewers? Not all Servo reviewers are Mozilla employees and/or have L3 on m-c. Servo does not have a commit access policy requirement. However, we are already going to have to trust this set of reviewers with Stylo since that will be autosynced, unless we have a way to make that sync model include this. I guess for these crates we would have to do the crate addition commit anyway, but I'm wondering if we actually need to do the check. (This matters more for the crate version bumps which will probably be more common than crate additions)

A module for a set of rust-specific reviewers makes sense to me. I really like the idea of the standalone record, though I'm not sure where we would keep it. IIRC the lockfile metadata already contains a crate hash, so we can just copy that info over, assuming that the hashing algorithm will never change when cargo is updated.

Don't we already have an sr=foo or a=foo mechanism that we use for some sections of the tree (I forget which it is)? Can this be used and enforced for third_party/rust?
Over in bug 1335525 I plan on adding a big pile of rust code to third_party/rust as part of getting quantum render landed in m-c. Do I need to get that reviewed by somebody (i.e. have we decided on a suitable plan to deal with this trust checking issue)?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #11)
> 2) Updated versions of crates should receive at least a perfunctory code
> review. We don't enforce that this receives its own bug, but it should at
> least be in a separate patch so that someone is r+ing the actual crate
> updates. I would not expect this to be a full code review like we'd do for
> Gecko code, but a once-over to make sure that nothing unexpected has been
> added.

When updating a dependency to a version that includes breaking change, the servo/gecko code using that dependency often needs to be updated as well. Separating the two means that there is an intermediate commit that doesn’t build, which could break bisect for someone investigating an unrelated issue.
Out of curiosity, do we require similar reviews for vendoring third-party code that doesn’t happen to be written in Rust? https://wiki.mozilla.org/ThirdPartyCode
I believe so, but those will get revendored less often, so it's probably a per-case thing.
(In reply to Simon Sapin (:SimonSapin) from comment #14)
> When updating a dependency to a version that includes breaking change, the
> servo/gecko code using that dependency often needs to be updated as well.
> Separating the two means that there is an intermediate commit that doesn’t
> build, which could break bisect for someone investigating an unrelated issue.

This makes sense. In these situations landing them as a single changeset is fine.

(In reply to Simon Sapin (:SimonSapin) from comment #15)
> Out of curiosity, do we require similar reviews for vendoring third-party
> code that doesn’t happen to be written in Rust?
> https://wiki.mozilla.org/ThirdPartyCode

We don't have a consistent policy there, but a lot of the third-party projects we use are well-maintained upstream and require code review before landing, so we're not as worried about those. Generally the concern here is the ease with which you can add crates.io dependencies, which means that someone can very easily add code of unknown origin to the build. With C++ dependencies it's an entirely manual process, so it's never historically been a problem.
(In reply to Simon Sapin (:SimonSapin) from comment #15)
> Out of curiosity, do we require similar reviews for vendoring third-party
> code that doesn’t happen to be written in Rust?

Yes. For example, here is a list of bugs updating Gecko's copy of libpng:

https://bugzilla.mozilla.org/showdependencytree.cgi?id=1328354
In general I think the review is going to need to be ex-post-facto. We should set up a process to make sure it gets done but I think it's going to be too much friction to try to gate mechanical landings on it.
Reading more bugmail (bug 1340945 comment 1) suggests that the current plan is to block landings on such updates. I'm certainly fine to try this if everyone involved thinks it's workable. But we should keep the alternative in mind.
I hadn't considered the fact that automated servo vcs syncing would require vendored updates when I made my proposal. It definitely makes that trickier. I'm worried about anything that *doesn't* block landing, though, because anything that doesn't block progress is likely to fall by the wayside due to accidents or schedule pressure or whatever.

Manish floated a hand-wavy proposal where if we maintained a central source of trusted crate versions, we could have the sign-off step involve updating that, and then force Servo PRs to block landing if they update dependency versions and they're not signed-off-on. That proposal does depend on us setting up some sort of trust repository, which makes it more complicated than my mostly-manual proposal, but it does offer a solution to the Servo sync problem.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #19)
> In general I think the review is going to need to be ex-post-facto. We
> should set up a process to make sure it gets done but I think it's going to
> be too much friction to try to gate mechanical landings on it.

I'm certainly game for this; I thought that Firefox had stricter requirements on what code ships, but if that's not the case ex-post-facto works for me. Bit concerned that it might end up with stuff piling up, since it no longer becomes urgent (basically, I share ted's concern from comment 21).


Then again, doing coordinated version bumps is already hard. Lars also didn't want folks to have to go through more process to do this: "I'm fine with mirrored landing and some increased "rigor" bars, but I get less excited about things that require github contributors to touch firefox-specific tools" (referring to having to muck around bugzilla to land PRs in particular -- the final process doesn't need to involve this).
No longer blocks: 1340945
Blocks: 1345321
Whiteboard: [stylo]
Related: I just saw a similar conversation happening in #rust-internals. The rust compiler now allows pulling in crates.io crates to use in the compiler itself, and they're trying to figure out what that means in practice.

Updated

7 months ago
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.