Closed Bug 1316990 Opened 7 years ago Closed 7 years ago

Figure out license story for vendored rust libraries

Categories

(Core :: Graphics: WebRender, defect)

52 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: kats, Assigned: froydnj)

References

Details

Attachments

(3 files)

Webrender depends on a whole pile of other libraries, which have various licenses. We need to make sure those licenses are compatible with the terms under which we ship to users, and ensure those licenses show up in about:license as necessary.

There is a tool called cargo-license [1] that can show the set of licenses. I don't know if the other rust projects (stylo, etc.) already have figured out how to deal with this.

[1] https://github.com/onur/cargo-license
So AFAICT the license story hasn't been dealt with yet. For example, right now in m-c there is the third_party/rust/matches crate, which specifies an MIT license [1], but I don't see anything corresponding to that in toolkit/content/license.html.

Gerv, do you know what the plan is for updating about:license for rust libraries? This is a little more tricky than usual, because rust libraries can be pulled and discarded almost automatically when people run |mach vendor rust|. I don't think it makes sense to have a required manual update step for about:license.

[1] https://hg.mozilla.org/mozilla-central/file/tip/third_party/rust/matches/Cargo.toml#l5
Flags: needinfo?(gerv)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #1)
> Gerv, do you know what the plan is for updating about:license for rust
> libraries? This is a little more tricky than usual, because rust libraries
> can be pulled and discarded almost automatically when people run |mach
> vendor rust|. 

That sounds concerning. Unless there is some sort of (SPDX?) machine-readable license information where we can check that the license is known and accounted for, we can't have people just pulling in random code from Cargo and shipping it without license review.

> I don't think it makes sense to have a required manual update
> step for about:license.

Well, if you want to engineer something to make it less manual, I'm not going to stop you :-) But the idea of random bits of code with random licenses ending up in Firefox without review is not a winner.

Gerv
Flags: needinfo?(gerv)
(In reply to Gervase Markham [:gerv] from comment #2)
> That sounds concerning. Unless there is some sort of (SPDX?)
> machine-readable license information where we can check that the license is
> known and accounted for, we can't have people just pulling in random code
> from Cargo and shipping it without license review.

The Cargo.toml files in rust libraries generally do specify their licenses using SPDX identifiers [1] but may also specify non-SPDX licenses by specifying a license-file.

[1] http://doc.crates.io/manifest.html#package-metadata
A system which passed through known-good or known-included licenses but flagged new ones or unapproved ones for manual review would be fine...

Gerv
Manish/Valentin/Ted - have you guys figured out how to deal with licenses for rust libraries getting pulled in? needinfo'ing you as the authors/reviewers of the patch that pulled in the |matches| crate as a dependency for rust-url.
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(ted)
Flags: needinfo?(manishearth)
(In reply to Gervase Markham [:gerv] from comment #4)
> A system which passed through known-good or known-included licenses but
> flagged new ones or unapproved ones for manual review would be fine...

I've never touched about:license before so I'm a bit unclear on what exactly needs to go in. Say we have two rust libraries that both specify MIT license. Is it sufficient to just have a single copy of the standard MIT license in about:license? Do we need to specify which libraries are included that have this license?

I think it's probably feasible to break about:license into a folder with individual licenses, some of which can be autogenerated during the vendoring process. about:license could then use JS to pull in the licenses from the folder and display them.
I haven't, but presumably this is a thing we could validate during `mach vendor rust`. We should get this sorted before we wind up vendoring something that's not license-compatible.
Flags: needinfo?(ted)
Comment 6 sounds like a reasonable plan, if you are willing to do the hacking and the perf is OK. 

> Is it sufficient to just have a single copy of the standard MIT license in about:license? Do we need to specify which 
> libraries are included that have this license?

Unfortunately, not all copies of the MIT license are the same, and we have to include a copy of each individual variant. Becaus the "copyright" line is included in the text, that means we often have to include many, many copies of what is essentially the same licence :-((

But a folder of license files, where the first line is the title, then the next has the applicable directories/files (if needed), then a blank line, then the license text, would be fine. It would make it easier to add to about:license as well.

Gerv
Yeah, we should automate copying the license. Cargo.toml allows you to specify the license name, but the license text could be something entirely different in the repository.
Flags: needinfo?(manishearth)
I can work on the license automation part, but probably not until after Hawaii, since I'm going on PTO in a couple of days. I'm not sure when it'll get done either - in the meantime we should probably manually add the missing rust licenses to about:license, specially if the code is not #ifdef'd away and is going to ship in an upcoming release.
Flags: needinfo?(valentin.gosu)
Nathan said he would look into this. Assigning to him for now.
Assignee: nobody → nfroyd
Blocks: 1322769
We're going to be adding some more code to the vendor function, and
splitting this function out helps make vendor a little more manageable.
Attachment #8847106 - Flags: review?(giles)
This commit adds the core support to `mach vendor rust` to verify that
the licenses of vendored packages are OK for distribution in Firefox.

Following the comments for the license whitelists, the licenses themselves will
be added in a separate commit for review.
Attachment #8847107 - Flags: review?(giles)
This commit adds the initial whitelist of (mostly) SPDX license identifiers and
the initial whitelist of separately-specified license files for those packages
that require it.  For gamma-lut, the license that the hash identifies is:

https://hg.mozilla.org/mozilla-central/file/47611a305c44/third_party/rust/gamma-lut/LICENSE

For deque, the license that the hash identifies is:

https://hg.mozilla.org/mozilla-central/file/47611a305c44/third_party/rust/deque/LICENSE-MIT

Integrating any new licenses into about:license is left as separate work
for the time being.

The comments in LICENSE_WHITELIST are mostly there for my own convenience; I'm
happy to remove them.  I didn't know if bindgen's license matters given that we
use it as a build-time thing, or whether licenses of build-time only things
could be left out of future about:license changes.
Attachment #8847109 - Flags: review?(gerv)
Attachment #8847106 - Flags: review?(giles) → review+
Comment on attachment 8847107 [details] [diff] [review]
part 2a - check licenses of vendored Cargo packages

Review of attachment 8847107 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. r=me with nits addressed.

::: python/mozbuild/mozbuild/vendor_rust.py
@@ +126,5 @@
> +    def _check_licenses(self, vendor_dir):
> +        # A whitelist of acceptable license identifiers for the packages.license
> +        # field from https://spdx.org/licenses/.  Cargo documentation claims that
> +        # values are checked against the above list and that multiple entries can
> +        # be separated by '/'.  We choose to list all combination instead for the

'all combinations' (should be plural)

@@ +132,5 @@
> +        # conform to the format prescribed in the documentation.
> +        #
> +        # It is insufficient to have additions to this whitelist reviewed
> +        # solely by a build peer; any additions must be checked by somebody
> +        # competent to review licensing minutiae.

It would be nice to give more direct guidance here. In my experience this means r?gerv. Do we have something more formal?

@@ +148,5 @@
> +        LICENSE_FILE_PACKAGE_WHITELIST = {
> +        }
> +
> +        LICENSE_LINE_RE = re.compile(r'license\s*=\s*"([^"]+)"')
> +        LICENSE_FILE_LINE_RE = re.compile(r'license[-_]file\s*=\s*"([^"]+)"')

The regexps look correct except TOML allows (space or tab) indentation. Maybe put a '\s*' at the beginning of each pattern.

FWIW, `license_file` stopped working somewhere between cargo 1.16 and today's 1.18-nightly. Many of those changes have been reverted though, so probably best to keep it for now. https://github.com/rust-lang/cargo/commit/a5a298f1fd5b5ccf03ccb71c0cb6b97867e26d18#diff-f4550ee862e63b97b52a09a7f644ac7bR333

@@ +158,5 @@
> +            toml_file = os.path.join(vendor_dir, package, 'Cargo.toml')
> +
> +            # pytoml is not sophisticated enough to parse Cargo.toml files
> +            # with [target.'cfg(...)'.dependencies sections, so we resort
> +            # to scanning individual lines.

At some point we should vendor a capable python toml library (toml4?) but I'm fine with this approach for now.

@@ +160,5 @@
> +            # pytoml is not sophisticated enough to parse Cargo.toml files
> +            # with [target.'cfg(...)'.dependencies sections, so we resort
> +            # to scanning individual lines.
> +            with open(toml_file, 'r') as f:
> +                license_lines = [l for l in f if l.startswith(b'license')]

You'll need `l.strip().startswith()` here if you want to handle indents. Or just run the regexp on every line.

@@ +258,5 @@
> +
> +        if not self._check_licenses(vendor_dir):
> +            self.log(logging.ERROR, 'license_check_failed', {},
> +                     '''The changes from `mach vendor rust` will NOT be added to version control.''')
> +            sys.exit(1)

Hmm. It's a little nasty to leave things half-imported in the working tree here, but I guess it simplifies examining the rejected crates.
Attachment #8847107 - Flags: review?(giles) → review+
Thanks for the fast review!

(In reply to Ralph Giles (:rillian) | needinfo me from comment #15)
> @@ +132,5 @@
> > +        # conform to the format prescribed in the documentation.
> > +        #
> > +        # It is insufficient to have additions to this whitelist reviewed
> > +        # solely by a build peer; any additions must be checked by somebody
> > +        # competent to review licensing minutiae.
> 
> It would be nice to give more direct guidance here. In my experience this
> means r?gerv. Do we have something more formal?

I don't think we have anything more formal.  I was attempting to avoid sticking people's names into the comments so the comments don't get outdated.  Admittedly, this is probably more opaque than your suggestion.

> @@ +148,5 @@
> > +        LICENSE_FILE_PACKAGE_WHITELIST = {
> > +        }
> > +
> > +        LICENSE_LINE_RE = re.compile(r'license\s*=\s*"([^"]+)"')
> > +        LICENSE_FILE_LINE_RE = re.compile(r'license[-_]file\s*=\s*"([^"]+)"')
> 
> The regexps look correct except TOML allows (space or tab) indentation.
> Maybe put a '\s*' at the beginning of each pattern.

Good point, will fix.

> FWIW, `license_file` stopped working somewhere between cargo 1.16 and
> today's 1.18-nightly. Many of those changes have been reverted though, so
> probably best to keep it for now.
> https://github.com/rust-lang/cargo/commit/
> a5a298f1fd5b5ccf03ccb71c0cb6b97867e26d18#diff-
> f4550ee862e63b97b52a09a7f644ac7bR333

Is this commit the commit that broke license_file, or the commit that reverted the changes?

> @@ +158,5 @@
> > +            toml_file = os.path.join(vendor_dir, package, 'Cargo.toml')
> > +
> > +            # pytoml is not sophisticated enough to parse Cargo.toml files
> > +            # with [target.'cfg(...)'.dependencies sections, so we resort
> > +            # to scanning individual lines.
> 
> At some point we should vendor a capable python toml library (toml4?) but
> I'm fine with this approach for now.

When I did the initial cargo work, `toml` wasn't even good enough to parse basic Cargo.toml files; is toml4 a fork or an improvement of some sort?

> @@ +258,5 @@
> > +
> > +        if not self._check_licenses(vendor_dir):
> > +            self.log(logging.ERROR, 'license_check_failed', {},
> > +                     '''The changes from `mach vendor rust` will NOT be added to version control.''')
> > +            sys.exit(1)
> 
> Hmm. It's a little nasty to leave things half-imported in the working tree
> here, but I guess it simplifies examining the rejected crates.

I agree that it's not great.  The actual `$VCS add` command happens after this bit, though, so at least the files aren't in a going-to-be committed state.
(In reply to Nathan Froyd [:froydnj] from comment #16)

> > https://github.com/rust-lang/cargo/commit/
> > a5a298f1fd5b5ccf03ccb71c0cb6b97867e26d18#diff-
> > f4550ee862e63b97b52a09a7f644ac7bR333
> 
> Is this commit the commit that broke license_file, or the commit that
> reverted the changes?

It was may guess for the commit that broke `license_file` as a key. I didn't do a build to check.

> When I did the initial cargo work, `toml` wasn't even good enough to parse
> basic Cargo.toml files; is toml4 a fork or an improvement of some sort?

I thought toml4 was a fork of toml, with 0.4.0 spec support similar to pytoml, but now I can't find anything about it, so nevermind. I probably misremembered out conversation in bug 1231764.
froydnj: what type of review are you looking for from me? Agreement that the list of licenses are all fine for Mozilla product code?

Build-time-only stuff needs to be open source but doesn't need to go in about:license.

Gerv
(In reply to Gervase Markham [:gerv] from comment #18)
> froydnj: what type of review are you looking for from me? Agreement that the
> list of licenses are all fine for Mozilla product code?

Yes; this is my understanding of what comment 4 suggested.  If certain licenses are not OK, then it would also be useful to know what the problem with them is, from our perspective, so we can try to provide context for upstream maintainers if we have to enter a licensing discussion with them.
Comment on attachment 8847109 [details] [diff] [review]
part 2b - add initial set of approved licenses for Rust packages

Review of attachment 8847109 [details] [diff] [review]:
-----------------------------------------------------------------

r=gerv - all of these are OK.

Gerv
Attachment #8847109 - Flags: review?(gerv) → review+
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6bac54c8627a
part 1 - split out an _ensure_cargo function; r=rillian
https://hg.mozilla.org/integration/mozilla-inbound/rev/ce7407d9b18b
part 2a - check licenses of vendored Cargo packages; r=rillian
https://hg.mozilla.org/integration/mozilla-inbound/rev/5bcb7b097c66
part 2b - add initial set of approved licenses for Rust packages; r=gerv
You need to log in before you can comment on or make changes to this bug.