Closed Bug 1306078 Opened 5 years ago Closed 4 years ago

`mach vendor rust` should print vendored crate size stats, warn or error on adding large files

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox54 fixed)

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: ted, Assigned: froydnj)

References

Details

Attachments

(3 files, 3 obsolete files)

The `mach vendor rust` command that I'm adding in bug 1294565 will run `cargo vendor`, which will vendor the contents of a bunch of Rust crates into `third_party/rust`. gps suggested that it would be nice if it printed stats afterwards, and I agree, but I'm pushing it off to this followup bug. Maybe something like:

Old size: N kiB
New size: M kiB
[Added|Removed] X kiB

gps also suggested we warn or error when adding lots of data because it's not great for version control. That's a little trickier to define, but maybe we could just set a threshold for individual files, like "no files larger than 1MB"? We can always add a loophole to work around it or adjust it in the future.
This command is useful for users who wish to perform something like the
following:

1. Add/remove a bunch of files;
2. Examine the additions/removals/modifications for interesting changes;
3. Reject the add/remove if it doesn't meet some set of conditions.

Bikeshedding welcome on the name, or whether I need the --config additions for
the hg version, as HgRepository.add_remove_files implements.  Part of me thinks
that some sort of context manager is a little more appropriate, but that would
require a '.approve()' operation or similar on the context manager, which I'm
not sure would be a win.
Attachment #8831250 - Flags: review?(gps)
This is a strawman proposal that would add least implement the large files
check discussed in this bug.  I left out the stats printing because I wasn't
clear on whether that would be useful or not, since it'd only be seen by the
person doing the vendoring, and they'd probably already be aware that they're
vendoring a bunch of stuff?

This doesn't add support for detecting size changes for modified files.
Thoughts welcome on how to handle this case; I didn't see an obvious way to get
the previous size in git, and I assume it's difficult to do in hg as well.
Attachment #8831259 - Flags: feedback?(gps)
I think one reason gps wanted the stats printed is that it's pretty easy to pull in a huge dependency tree from crates.io. A developer may have just added `foo = "1.0"`, but foo may have a zillion dependencies that we wind up vendoring. It would be nice for that to be called out somehow, even if it's just informational.
Figuring out the files that have been added is also something that you
want to do with a source code repository.
Attachment #8831315 - Flags: review?(gps)
I'm on PTO'ish the first half of next week and may not get to this until Wednesday or Thursday.

Ted is right that I'm concerned about blowing up repo size. Once something is checked in, it is always there :/

There's a few bugs for server-side hooks to restrict incoming large files. Will probably be implemented as checks against both number of files, individual file size, and total size of modifications. We would have a special syntax to override the check most likely.

Also, for file sizes we care more about compressed / on-disk size in the store than raw size, since the compressed version is what is shipped around and networks are slow. You can get an estimate of that by running `hg bundle -r <revs> --base 'parents(roots(<revs>))'` That will produce a bundle containing only <revs> but not the data for the base revisions. With Git, you would generate a "thin pack" somehow (I can't recall the arguments off the top of my head).
Attachment #8831250 - Flags: review?(gps) → review+
Comment on attachment 8831315 [details] [diff] [review]
part 1a - add Repository.get_added_files

Review of attachment 8831315 [details] [diff] [review]:
-----------------------------------------------------------------

::: python/mozversioncontrol/mozversioncontrol/__init__.py
@@ +85,5 @@
>      def get_modified_files(self):
>          return [line.strip().split()[1] for line in self._run('status', '--modified').splitlines()]
>  
> +    def get_added_files(self):
> +        return [line.strip().split()[1] for line in self._run('status', '--added').splitlines()]

Add -n to hide the status prefix and you can avoid the .split()[1].

I insist on this change because .split() will split on all whitespace and if a path has whitespace, bad things will happen. Strictly speaking, you could use .split(1) so it splits at most once. But if you are going to change this, might as well avoid the issue entirely.
Attachment #8831315 - Flags: review?(gps) → review-
Comment on attachment 8831259 [details] [diff] [review]
part 2 - make `mach vendor rust` check for overly-large files

Review of attachment 8831259 [details] [diff] [review]:
-----------------------------------------------------------------

This is a good start and I'd be happy to have something rather than nothing.

A simple feature to add would be the cumulative size of added files and possibly a warning if that were too large (say 5 or 10 MB). But that could be a follow-up.

To get the on-disk size of files, on Mercurial, `hg cat -r . <path0> <path1>...` and count the bytes. For Git, you could `git ls-tree -r HEAD` and pipe relevant object SHA-1s into `git cat-file --batch-check` and parse out the blob size.

::: python/mozbuild/mozbuild/vendor_rust.py
@@ +144,5 @@
> +        for f in self.repository.get_added_files():
> +            path = mozpath.join(self.topsrcdir, f)
> +            if os.stat(path).st_size > FILESIZE_LIMIT \
> +               and f not in whitelist:
> +                large_files.add(f)

large_files is a dict, which doesn't have an add(). You need to `large_files = set()`.

@@ +153,5 @@
> +
> +{files}
> +
> +Please find a way to reduce the sizes of these files or file a bug to have them
> +added to the whitelist of large files.

This should say which component to file the bug in. Presumably Core :: Build Config.
Attachment #8831259 - Flags: feedback?(gps) → feedback+
Applied review feedback, along with side-arming changes to
HgRepository.get_modified_files to receive the same --no-status treatment.

Asking Ted for review since gps is blocking requests.
Attachment #8835571 - Flags: review?(ted)
Attachment #8831315 - Attachment is obsolete: true
Applied feedback, and included a warning for overly-large vendorings.  Didn't
block on this for reasons given in the patch.

Asking Ted for review since gps is blocking requests.
Attachment #8835572 - Flags: review?(ted)
Attachment #8831259 - Attachment is obsolete: true
Attachment #8835571 - Flags: review?(ted) → review+
Comment on attachment 8835572 [details] [diff] [review]
part 3 - make `mach vendor rust` check for overly-large files

Review of attachment 8835572 [details] [diff] [review]:
-----------------------------------------------------------------

This looks generally fine, I just have one question.

::: python/mozbuild/mozbuild/vendor_rust.py
@@ +122,5 @@
> +        # we couldn't vendor those crates, and if we set it higher, we'd risk
> +        # all sorts of large files creeping in.  So we set a relatively low
> +        # limit with a whitelist of file names that are permitted to exceed
> +        # the limit.  Additional files can be whitelisted here with proper
> +        # build peer review.

You're only looking at added files below, so why do we need a whitelist of files that are already vendored?
Attachment #8835572 - Flags: review?(ted) → review+
Assignee: nobody → nfroyd
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #10)
> This looks generally fine, I just have one question.
> 
> ::: python/mozbuild/mozbuild/vendor_rust.py
> @@ +122,5 @@
> > +        # we couldn't vendor those crates, and if we set it higher, we'd risk
> > +        # all sorts of large files creeping in.  So we set a relatively low
> > +        # limit with a whitelist of file names that are permitted to exceed
> > +        # the limit.  Additional files can be whitelisted here with proper
> > +        # build peer review.
> 
> You're only looking at added files below, so why do we need a whitelist of
> files that are already vendored?

Ah...that's a good point.  I think I was assuming that the `rm -rf third_party/rust` call:

https://dxr.mozilla.org/mozilla-central/source/python/mozbuild/mozbuild/vendor_rust.py#104

meant that we would re-add files or something, but that doesn't make any sense in retrospect.

That's unpleasant, then, because the expected workflow with this check was:

1. `mach vendor rust`
2. Notice files are getting rejected
3. Modify whitelist
4. `mach vendor rust`
5. Submit patch for build peer approval with whitelist changes.

But additions to the whitelist never get checked again, since those files are vendored/checked-in, not added, so the whitelist is really more like "a history of large files that we have added in the past".  That doesn't seem like the correct state of affairs.

Maybe we should ditch the whitelist concept altogether and just complain about large files?  But then there's no way to ensure that somebody actually considers whether we need the large files...unless we have a server-side hook for that check.  WDYT?
Flags: needinfo?(ted)
Until we get a hook of some kind enforcing things everyone is effectively on the honor system anyway, right? What if you kept the error but added an `--gps-said-i-could-add-large-files` flag to override it? If we get a hook in place we can do some validation there, but it'd still be nice to get early warning when vendoring that you're doing something potentially bad.
Flags: needinfo?(ted)
This version removes the whitelist, instead just opting for a message about any
overly-large files being added and a --build-peers-said-large-imports-were-ok
flag on `mach vendor rust` to override that check, per comment 12.
Attachment #8837310 - Flags: review+
Attachment #8835572 - Attachment is obsolete: true
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f944183330cc
part 1 - add forget_add_remove_files to mozversioncontrol Repository objects; r=gps
https://hg.mozilla.org/integration/mozilla-inbound/rev/08891e1a5977
part 2 - add Repository.get_added_files; r=ted.mielczarek
https://hg.mozilla.org/integration/mozilla-inbound/rev/8b9b3653e416
part 3 - make `mach vendor rust` check for overly-large files; r=ted.mielczarek
Depends on: 1378443
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.