Closed
Bug 1306078
Opened 8 years ago
Closed 7 years ago
`mach vendor rust` should print vendored crate size stats, warn or error on adding large files
Categories
(Firefox Build System :: General, defect)
Firefox Build System
General
Tracking
(firefox54 fixed)
RESOLVED
FIXED
mozilla54
Tracking | Status | |
---|---|---|
firefox54 | --- | fixed |
People
(Reporter: ted, Assigned: froydnj)
References
Details
Attachments
(3 files, 3 obsolete files)
2.56 KB,
patch
|
gps
:
review+
|
Details | Diff | Splinter Review |
2.92 KB,
patch
|
ted
:
review+
|
Details | Diff | Splinter Review |
6.17 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
The `mach vendor rust` command that I'm adding in bug 1294565 will run `cargo vendor`, which will vendor the contents of a bunch of Rust crates into `third_party/rust`. gps suggested that it would be nice if it printed stats afterwards, and I agree, but I'm pushing it off to this followup bug. Maybe something like: Old size: N kiB New size: M kiB [Added|Removed] X kiB gps also suggested we warn or error when adding lots of data because it's not great for version control. That's a little trickier to define, but maybe we could just set a threshold for individual files, like "no files larger than 1MB"? We can always add a loophole to work around it or adjust it in the future.
Assignee | ||
Comment 1•7 years ago
|
||
This command is useful for users who wish to perform something like the following: 1. Add/remove a bunch of files; 2. Examine the additions/removals/modifications for interesting changes; 3. Reject the add/remove if it doesn't meet some set of conditions. Bikeshedding welcome on the name, or whether I need the --config additions for the hg version, as HgRepository.add_remove_files implements. Part of me thinks that some sort of context manager is a little more appropriate, but that would require a '.approve()' operation or similar on the context manager, which I'm not sure would be a win.
Attachment #8831250 -
Flags: review?(gps)
Assignee | ||
Comment 2•7 years ago
|
||
This is a strawman proposal that would add least implement the large files check discussed in this bug. I left out the stats printing because I wasn't clear on whether that would be useful or not, since it'd only be seen by the person doing the vendoring, and they'd probably already be aware that they're vendoring a bunch of stuff? This doesn't add support for detecting size changes for modified files. Thoughts welcome on how to handle this case; I didn't see an obvious way to get the previous size in git, and I assume it's difficult to do in hg as well.
Attachment #8831259 -
Flags: feedback?(gps)
Reporter | ||
Comment 3•7 years ago
|
||
I think one reason gps wanted the stats printed is that it's pretty easy to pull in a huge dependency tree from crates.io. A developer may have just added `foo = "1.0"`, but foo may have a zillion dependencies that we wind up vendoring. It would be nice for that to be called out somehow, even if it's just informational.
Assignee | ||
Comment 4•7 years ago
|
||
Figuring out the files that have been added is also something that you want to do with a source code repository.
Attachment #8831315 -
Flags: review?(gps)
Comment 5•7 years ago
|
||
I'm on PTO'ish the first half of next week and may not get to this until Wednesday or Thursday. Ted is right that I'm concerned about blowing up repo size. Once something is checked in, it is always there :/ There's a few bugs for server-side hooks to restrict incoming large files. Will probably be implemented as checks against both number of files, individual file size, and total size of modifications. We would have a special syntax to override the check most likely. Also, for file sizes we care more about compressed / on-disk size in the store than raw size, since the compressed version is what is shipped around and networks are slow. You can get an estimate of that by running `hg bundle -r <revs> --base 'parents(roots(<revs>))'` That will produce a bundle containing only <revs> but not the data for the base revisions. With Git, you would generate a "thin pack" somehow (I can't recall the arguments off the top of my head).
Updated•7 years ago
|
Attachment #8831250 -
Flags: review?(gps) → review+
Comment 6•7 years ago
|
||
Comment on attachment 8831315 [details] [diff] [review] part 1a - add Repository.get_added_files Review of attachment 8831315 [details] [diff] [review]: ----------------------------------------------------------------- ::: python/mozversioncontrol/mozversioncontrol/__init__.py @@ +85,5 @@ > def get_modified_files(self): > return [line.strip().split()[1] for line in self._run('status', '--modified').splitlines()] > > + def get_added_files(self): > + return [line.strip().split()[1] for line in self._run('status', '--added').splitlines()] Add -n to hide the status prefix and you can avoid the .split()[1]. I insist on this change because .split() will split on all whitespace and if a path has whitespace, bad things will happen. Strictly speaking, you could use .split(1) so it splits at most once. But if you are going to change this, might as well avoid the issue entirely.
Attachment #8831315 -
Flags: review?(gps) → review-
Comment 7•7 years ago
|
||
Comment on attachment 8831259 [details] [diff] [review] part 2 - make `mach vendor rust` check for overly-large files Review of attachment 8831259 [details] [diff] [review]: ----------------------------------------------------------------- This is a good start and I'd be happy to have something rather than nothing. A simple feature to add would be the cumulative size of added files and possibly a warning if that were too large (say 5 or 10 MB). But that could be a follow-up. To get the on-disk size of files, on Mercurial, `hg cat -r . <path0> <path1>...` and count the bytes. For Git, you could `git ls-tree -r HEAD` and pipe relevant object SHA-1s into `git cat-file --batch-check` and parse out the blob size. ::: python/mozbuild/mozbuild/vendor_rust.py @@ +144,5 @@ > + for f in self.repository.get_added_files(): > + path = mozpath.join(self.topsrcdir, f) > + if os.stat(path).st_size > FILESIZE_LIMIT \ > + and f not in whitelist: > + large_files.add(f) large_files is a dict, which doesn't have an add(). You need to `large_files = set()`. @@ +153,5 @@ > + > +{files} > + > +Please find a way to reduce the sizes of these files or file a bug to have them > +added to the whitelist of large files. This should say which component to file the bug in. Presumably Core :: Build Config.
Attachment #8831259 -
Flags: feedback?(gps) → feedback+
Assignee | ||
Comment 8•7 years ago
|
||
Applied review feedback, along with side-arming changes to HgRepository.get_modified_files to receive the same --no-status treatment. Asking Ted for review since gps is blocking requests.
Attachment #8835571 -
Flags: review?(ted)
Assignee | ||
Updated•7 years ago
|
Attachment #8831315 -
Attachment is obsolete: true
Assignee | ||
Comment 9•7 years ago
|
||
Applied feedback, and included a warning for overly-large vendorings. Didn't block on this for reasons given in the patch. Asking Ted for review since gps is blocking requests.
Attachment #8835572 -
Flags: review?(ted)
Assignee | ||
Updated•7 years ago
|
Attachment #8831259 -
Attachment is obsolete: true
Reporter | ||
Updated•7 years ago
|
Attachment #8835571 -
Flags: review?(ted) → review+
Reporter | ||
Comment 10•7 years ago
|
||
Comment on attachment 8835572 [details] [diff] [review] part 3 - make `mach vendor rust` check for overly-large files Review of attachment 8835572 [details] [diff] [review]: ----------------------------------------------------------------- This looks generally fine, I just have one question. ::: python/mozbuild/mozbuild/vendor_rust.py @@ +122,5 @@ > + # we couldn't vendor those crates, and if we set it higher, we'd risk > + # all sorts of large files creeping in. So we set a relatively low > + # limit with a whitelist of file names that are permitted to exceed > + # the limit. Additional files can be whitelisted here with proper > + # build peer review. You're only looking at added files below, so why do we need a whitelist of files that are already vendored?
Attachment #8835572 -
Flags: review?(ted) → review+
Reporter | ||
Updated•7 years ago
|
Assignee: nobody → nfroyd
Assignee | ||
Comment 11•7 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #10) > This looks generally fine, I just have one question. > > ::: python/mozbuild/mozbuild/vendor_rust.py > @@ +122,5 @@ > > + # we couldn't vendor those crates, and if we set it higher, we'd risk > > + # all sorts of large files creeping in. So we set a relatively low > > + # limit with a whitelist of file names that are permitted to exceed > > + # the limit. Additional files can be whitelisted here with proper > > + # build peer review. > > You're only looking at added files below, so why do we need a whitelist of > files that are already vendored? Ah...that's a good point. I think I was assuming that the `rm -rf third_party/rust` call: https://dxr.mozilla.org/mozilla-central/source/python/mozbuild/mozbuild/vendor_rust.py#104 meant that we would re-add files or something, but that doesn't make any sense in retrospect. That's unpleasant, then, because the expected workflow with this check was: 1. `mach vendor rust` 2. Notice files are getting rejected 3. Modify whitelist 4. `mach vendor rust` 5. Submit patch for build peer approval with whitelist changes. But additions to the whitelist never get checked again, since those files are vendored/checked-in, not added, so the whitelist is really more like "a history of large files that we have added in the past". That doesn't seem like the correct state of affairs. Maybe we should ditch the whitelist concept altogether and just complain about large files? But then there's no way to ensure that somebody actually considers whether we need the large files...unless we have a server-side hook for that check. WDYT?
Flags: needinfo?(ted)
Reporter | ||
Comment 12•7 years ago
|
||
Until we get a hook of some kind enforcing things everyone is effectively on the honor system anyway, right? What if you kept the error but added an `--gps-said-i-could-add-large-files` flag to override it? If we get a hook in place we can do some validation there, but it'd still be nice to get early warning when vendoring that you're doing something potentially bad.
Flags: needinfo?(ted)
Assignee | ||
Comment 13•7 years ago
|
||
This version removes the whitelist, instead just opting for a message about any overly-large files being added and a --build-peers-said-large-imports-were-ok flag on `mach vendor rust` to override that check, per comment 12.
Attachment #8837310 -
Flags: review+
Assignee | ||
Updated•7 years ago
|
Attachment #8835572 -
Attachment is obsolete: true
Comment 14•7 years ago
|
||
Pushed by nfroyd@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/f944183330cc part 1 - add forget_add_remove_files to mozversioncontrol Repository objects; r=gps https://hg.mozilla.org/integration/mozilla-inbound/rev/08891e1a5977 part 2 - add Repository.get_added_files; r=ted.mielczarek https://hg.mozilla.org/integration/mozilla-inbound/rev/8b9b3653e416 part 3 - make `mach vendor rust` check for overly-large files; r=ted.mielczarek
Comment 15•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f944183330cc https://hg.mozilla.org/mozilla-central/rev/08891e1a5977 https://hg.mozilla.org/mozilla-central/rev/8b9b3653e416
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Updated•6 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•