Closed Bug 1346026 Opened 3 years ago Closed 2 years ago

Add ability to vendor python modules to |mach vendor|

Categories

(Firefox Build System :: General, enhancement)


Tracking

(firefox62 fixed)

RESOLVED FIXED
mozilla62

People

(Reporter: ahal, Assigned: davehunt)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 2 obsolete files)

There's a mach command that assists with vendoring in rust crates, called with |mach vendor rust|. We should add |mach vendor python| to make adding python modules easier as well.

Maybe to start, it might only download wheel packages and error out if none are found.
In case anyone decides to tackle this, the preferred way to vendor is to only check in source files (not docs, tests, changelogs, setup.py, etc). Probably the easiest way to do this is to unzip a built wheel. A recent example from bug 1381915, which was updating voluptuous:

    $ cd third_party/python
    $ rm -rf voluptuous
    $ hg rm --after voluptuous/*
    $ pip install wheel
    $ pip wheel voluptuous
    $ unzip voluptuous-0.10.5-py2-none-any.whl
    $ hg add voluptuous/*
    $ hg commit -m "Update voluptuous"

I haven't verified those steps, so I'm not sure they're exact, but a |mach vendor python| command should do something along those lines.
Just so I'm on the same page here: we need to vendor the Python modules to create a local copy of, for example, private repositories which cannot be accessed by everyone. This local copy can then be distributed to users more easily.

In this task, we are required to create a command that mimics |mach vendor rust|, but for Python modules instead.
This would involve writing code for unzipping the wheels of each of the required modules, finding the source files among them (and installing/updating to the latest version?).
Flags: needinfo?(ahalberstadt)
You have it mostly right with a few changes:

1. This isn't for private repositories, it's for vendoring [1] public repositories. For now we can focus only on projects that have wheels uploaded to PyPI (and error out if a project doesn't)
2. Don't need to find source files in a .whl; any files in there are already relevant (things like docs and tests are already stripped out).
3. Don't need to install anything, this tool just needs to extract the files to third_party/python/<module>
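
For illustration, a rough wheel-based flow along those lines (unverified; the package name is just an example) might be:

    $ pip download voluptuous --no-deps --only-binary :all: --dest /tmp/wheels
    $ unzip /tmp/wheels/voluptuous-*.whl -d third_party/python   # also extracts a *.dist-info directory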

Thanks for looking into this! Let me know if you need any more guidance.

[1] https://stackoverflow.com/a/39643873
Flags: needinfo?(ahalberstadt)
I stumbled upon https://github.com/oconnor663/peru recently. Looks interesting.
I was referring to the VendorRust class in https://dxr.mozilla.org/mozilla-central/source/python/mozbuild/mozbuild/vendor_rust.py 
To start, the code should list all the packages that are present in third_party/python/<module> (os.listdir).
But how do I check if the wheel exists or not?
I assume I unpack it with a library like zipfile.
I need some more help with this. I am really confused as to what I am to do.
Can someone link the relevant files (how the functionality was implemented in rust) and other helpful resources?
Thank you.
Flags: needinfo?(ahalberstadt)
dyex719 and I talked offline; we're going to focus on another bug instead, as this one still has a few open questions.
Flags: needinfo?(ahalberstadt)
So it turns out this bug is slightly underspecified! Let me spell out a few things and see if we can settle on a design so someone can actually implement it.

`mach vendor rust` works by running a tool called `cargo-vendor` internally. `cargo-vendor` reads the Cargo.lock files in our tree, which are generated by `cargo` from our Cargo.toml files:
https://dxr.mozilla.org/mozilla-central/source/toolkit/library/rust/Cargo.lock
https://dxr.mozilla.org/mozilla-central/source/toolkit/library/rust/Cargo.toml

Cargo.toml files specify the crates that a Rust crate depends on, and Cargo.lock files contain the full set of transitive dependencies for those crates--everything your crate needs to build.

I don't know if there are precise equivalents here in Python land. Cargo.toml maps pretty well to requirements.txt, but I don't know that there's an equivalent to Cargo.lock. (I'm not really up-to-speed on the state of the art in Python dependency management.)

After a little poking around, I did find pip-tools, which has a `pip-compile` command which takes a list of dependencies and outputs a frozen list of the complete set of dependencies needed to satisfy them, which sounds useful:
https://github.com/jazzband/pip-tools
http://nvie.com/posts/better-package-management/

pip also has a `pip download` command nowadays, which might be usable for doing the actual vendoring given a full set of input requirements:
https://pip.pypa.io/en/stable/reference/pip_download/
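
For reference, a rough sketch of those two steps (file names and the destination path are illustrative):

    $ pip install pip-tools
    $ pip-compile requirements.in                               # writes a fully pinned requirements.txt
    $ pip download -r requirements.txt --no-deps --dest /tmp/vendor-downloads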

Given these, I could propose something like this:
1) We have a `third_party/python/requirements.txt.in` file. This file lists Python packages that we need for in-tree use. When you want to vendor a new Python package, you add it to this file.
2) `mach vendor python` first runs `pip-compile` to generate `third_party/python/requirements.txt` from the .in file (hopefully there's some mode of operation to ask it to do a minimal set of updates like cargo does, and not just choose the newest version of every package) then runs `pip download` or something like that to fetch every package listed in requirements.txt and unpack them into directories under `third_party/python`.
3) `mach vendor python` should also implement some of the same features as `mach vendor rust`--compatible license checking, checks that we're not vendoring enormous files, VCS commands to check for a dirty working tree and add/remove files. We can probably be a little more lenient with license checking than we are with Rust crates, since we aren't going to ship any of this Python code with Firefox. We will still need to avoid GPL-licensed Python packages, though.

CCing a few of our Python experts to see if there are things I'm missing about modern Python packaging.
I like this approach! Looks like there's a project to support this in pip itself:
https://github.com/pypa/pipfile

Though the README suggests it isn't ready for production use and development seems fairly stalled out. Afaict, pip-compile still looks like the best bet.
pip-compile looks interesting. What I generally do is a "pip install" followed by a "pip freeze" and then weed out the "wheel" and "setuptools" packages by hand. But pip-compile isn't just a wrapper around that; it seems to implement its own dependency resolver (which pip doesn't have a proper one of: https://github.com/pypa/pip/issues/988). So it might not do what pip does, but that might be an asset.

When you're pulling down packages to check into the tree, be aware that architecture-specific packages (like ones with a C build step) may have multiple archives to download, and that can get big.

I'm not sure how much security you're shooting for. Once you get a package vendored, you're obviously safe from it changing out from under you. But is there ever an instance in which you'd want to download a new copy? If so, I recommend adding hashes to your requirements file (https://pip.readthedocs.io/en/stable/reference/pip_install/#hash-checking-mode), support for which I added to pip 8. Peter then wrote https://pypi.python.org/pypi/hashin/, which you run against a version-pinned requirements.txt to add the hashes. Those are the tools we used to distribute the stock Let's Encrypt client: we didn't want to vendor everything, because that would have led to unnecessarily large downloads and a lot of VCS churn when updating things. An example of what we ended up with is https://github.com/certbot/certbot/blob/ae0be73b5386b18a45dc60eefdba4c26bfeb393e/letsencrypt-auto-source/pieces/dependency-requirements.txt#L1. Move outward into the surrounding dirs to see the source to the "certbot-auto" tool which actually installs everything (and self-upgrades and does other whizzy things).
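
For illustration, the hashin/hash-checking flow is roughly (package and version are just examples):

    $ pip install hashin
    $ hashin voluptuous==0.10.5                       # appends a pinned, hashed entry to requirements.txt
    $ pip install --require-hashes --no-deps -r requirements.txt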

Whatever you do, be sure to pass --no-deps to pip when you do the actual install, so it doesn't go out into the world to download things you somehow missed. "Somehow" isn't terribly unlikely: there is a seldom-used feature called setup_requires which pip is not at a sufficiently low level to control. See https://pip.readthedocs.io/en/stable/reference/pip_install/#controlling-setup-requires for the annoying way in which you have to control it. Few packages use it, and it's better to avoid them outright.

Happy to answer more questions. It's a jungle out there. I'm particularly interested to understand what you're chasing: are you trying to avoid depending on PyPI being up? Guard against people changing or removing packages upstream?
We want building Firefox to be deterministic and dependable. We aggressively vendor packages to avoid packages from changing out from under us, URLs turning into 404s, and 3rd party services being offline. e.g. we don't want a DoS against PyPI or a random package maintainer removing an old tarball from PyPI to delay a Firefox chemspill.
Then our choices will be...

1. Stick to pure-Python packages or
2. Vendor the architecture-specific package's source and let pip build it on demand (at a speed penalty) or
3. Vendor the built wheel for every arch we're interested in (at a space penalty)
We don't want to vendor wheels because they aren't transparent. We want to vendor the package's source (either from a tar.gz distribution, source wheel, or the canonical source repo) along with a setup.py that we can invoke [via pip] (if necessary).
That sounds good. Just watch out for any dependencies hidden by setup_requires.
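
For what it's worth, installing such a vendored source tree without touching the network might look something like this (the path is illustrative):

    $ pip install --no-index --no-deps third_party/python/voluptuous
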
I stumbled upon pipfile today, which sounds like the thing we actually want (but is in flux):
https://github.com/pypa/pipfile

It's essentially cargo-style dependency specification for pip. The goal looks to be to eventually integrate the functionality into pip itself. Currently there's a usable implementation called pipenv:
http://docs.pipenv.org/en/latest/
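
For context, pipenv's basic flow looks roughly like this (package and version are illustrative):

    $ pip install pipenv
    $ pipenv install voluptuous==0.10.5   # records the pin in Pipfile and a hashed Pipfile.lock
    $ pipenv lock                         # regenerate Pipfile.lock from the Pipfile
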
pipfile and its ilk won't do anything to let us build Firefox if PyPI is down, of course. It just stores pointers; it doesn't vendor files. For the most part, pipfile and pipenv are just syntactic sugar. The main features are that you can multiplex multiple requirements.txt files into one (which we don't care about) and that everything comes in one command (rather than having to use hashin (https://pypi.python.org/pypi/hashin/) and pip separately).
Correct. They just both solve the first half of things, which is "specify our Python package requirements in an easy way, but also pin exact versions". We'd have to write a vendoring tool on top of that (that might just be `pip download`?)
Depends on: 1437593
Product: Core → Firefox Build System
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

This is the approach I've been using when vendoring Python packages. It's incomplete, but I thought I'd put it up to get some feedback.
Attachment #8970117 - Flags: feedback?(ahalberstadt)
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review244568

::: python/mozbuild/mozbuild/vendor_python.py:23
(Diff revision 1)
> +        self.log_manager.enable_unstructured()
> +        vendor_dir = os.path.join(self.topsrcdir, 'third_party', 'python')
> +
> +        self._activate_virtualenv()
> +        with TemporaryDirectory() as tmp:
> +            args = ['download', package, '--dest', tmp, '--no-binary', ':all:', '--no-deps']

Is the --no-binary for diff-ability? Or maybe so things don't break if the author of an sdist-only package later adds a wheel without bumping the version number? I ask not because I necessarily disagree but because I think the tradeoff against the slower install speed of non-binary packages should be explicit.
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review246762

::: python/mozbuild/mozbuild/vendor_python.py:30
(Diff revision 1)
> +        for root, _, files in os.walk(path):
> +            for name in files:
> +                return os.path.join(root, name)

You'll want to use a `mozpack.files.FileFinder` object for this, it's optimized across all platforms and for various edge cases (do a dxr search for some example usage).

::: python/mozbuild/mozbuild/vendor_python.py:35
(Diff revision 1)
> +        path = self._find(src)
> +        if path.endswith('tar.gz'):
> +            tar = tarfile.open(path, 'r:gz')
> +            for member in tar.getmembers():
> +                if member.isreg():
> +                    member.name = member.name.partition('/')[-1]
> +                    tar.extract(member, dest)

You should use `mozfile.extract` here.
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

I think this looks good, though it's suspiciously simple (hopefully that's a good thing)! I think :erikrose or :gps are better suited to do this review than myself though.
Attachment #8970117 - Flags: feedback?(ahalberstadt) → feedback+
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review244568

> Is the --no-binary for diff-ability? Or maybe so things don't break if an authors of an sdist-only package later adds a wheel without bumping the version number? I ask not because I necessarily disagree but because I think the tradeoff against the slower install speed of non-binary packages should be explicit.

I think diff-ability is a good reason on its own, but mainly the build system virtualenv doesn't have the ability to deal with vendored wheels across platforms yet. It uses a non-standard manifest to list packages at the moment:
https://dxr.mozilla.org/mozilla-central/source/build/virtualenv_packages.txt
Given that there is ongoing work [1][2] (bug 1454867) to standardize vendoring, it would probably be worthwhile to coordinate this work with that.


[1] https://groups.google.com/forum/?pli=1#!searchin/mozilla.dev.platform/vendor%7Csort:date/mozilla.dev.platform/-B8MECCeJnM/APHPh-E8BQAJ
[2] https://goo.gl/QZyz4x
See Also: → moz.yaml
We touched on this a little bit in the design document for the vendoring work, and for right now I don't think it fits in well. That work is currently focused on vendoring individual projects from upstream repositories, where the Python / Rust vendoring vendors an entire dependency graph. When glob gets his initial version working we should definitely look at whether we can build better integration atop what he's built.
(In reply to Andrew Halberstadt [:ahal] from comment #22)
> I think this looks good, though it's suspiciously simple (hopefully that's a
> good thing)! I think :erikrose or :gps are better suited to do this review
> than myself though.

I was kind of hoping that we'd wind up with something that worked like `mach vendor rust`--that is, we'd have a Pipfile + Pipfile.lock in the tree that would contain the full set of our vendored modules, and `mach vendor python` would simply ensure that third_party/python matched the contents of Pipfile.lock by fetching missing packages.
Assignee: nobody → dave.hunt
Status: NEW → ASSIGNED
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review244568

> I think diff-ability is a good reason on its own, but mainly the build system virtualenv doesn't have the ability to deal with vendored wheels across platforms yet. It uses a non-standard manifest to list packages at the moment:
> https://dxr.mozilla.org/mozilla-central/source/build/virtualenv_packages.txt

It's my understanding that we want to vendor the source of the packages rather than the built wheels. This is why I download the source archives and extract them to the third_party/python directory. If there's some other way to use the binaries that I'm missing please let me know.
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review246762

> You'll want to use a `mozpack.files.FileFinder` object for this, it's optimized across all platforms and for various edge cases (do a dxr search for some example usage).

Sorry, I missed this comment. I'll take a look into using mozpack now.

> You should use `mozfile.extract` here.

I suspect I won't be able to change the destination using `mozfile.extract` to avoid the version number being in the directory name, but I should be able to rename the directory after extracting the files.
Attachment #8974440 - Attachment is obsolete: true
Attachment #8974440 - Flags: review?(ahalberstadt)
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review248826

::: python/mozbuild/mozbuild/vendor_python.py:56
(Diff revision 3)
> +            self._cleanup(target)
> +
> +    def _cleanup(self, path):
> +        exclude = [
> +            '.gitignore', '.travis.yml',
> +            '/examples/', '/doc/', '/docs/',

Excluding docs appears to be causing issues with some package installations. For example, virtualenv's setup.py refers to docs/index.rst and docs/changes.rst and raises an exception if they're not present. Perhaps we should preserve doc/docs?
Comment on attachment 8974636 [details]
Bug 1346026 - Update all vendored python dependencies;

https://reviewboard.mozilla.org/r/243020/#review248828

I'm anticipating some further patches will be necessary, such as pinning to older versions of packages if later versions cause issues, and changes to paths now that all packages are using source distributions.
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review248852

Thanks, this looks great!

::: Pipfile:7
(Diff revision 3)
> +url = "https://pypi.org/simple"
> +verify_ssl = true
> +name = "pypi"
> +
> +[packages]
> +attrs = "*"

Just to clarify, when running |mach vendor python|, having a \* here won't cause the package to get updated, right?

Is there any benefit to making the minimum version the ones that already exist in-tree?

::: python/docs/index.rst:34
(Diff revision 3)
> +
> +Adding a Python package
> +~~~~~~~~~~~~~~~~~~~~~~~
> +
> +To vendor a Python package you must first add the package name along with any
> +version constraints to the ``Pipfile`` in the top source directory. You can

Should this `Pipfile` live in topsrcdir, or in `third_party/python`? I'm not actually sure, could see arguments going either way.

::: python/mozbuild/mozbuild/vendor_python.py:63
(Diff revision 3)
> +            '/test/', '/tests/', '/testing/',
> +            '.egg-info', 'MANIFEST.in', 'PKG-INFO']
> +        finder = FileFinder(path, find_dotfiles=True)
> +        for path, _ in finder.find('*'):
> +            path = os.path.join(finder.base, path)
> +            if any(map(lambda v: v in path, exclude)):

You can pass these exclude paths into the FileFinder like:
    
    finder = FileFinder(..., ignore=exclude)
    
Though, just double check it doesn't need any wildcards or anything like that.
Attachment #8970117 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8974636 [details]
Bug 1346026 - Update all vendored python dependencies;

https://reviewboard.mozilla.org/r/243020/#review248856

I think this is going to cause all sorts of havoc. My preference is to only update a package if there is a need to do so, and only do one per bug (or at least one group of related packages per bug). I don't think using the latest versions of a package just for the sake of it should be a goal.
Attachment #8974636 - Flags: review?(ahalberstadt) → review-
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review248852

> Just to clarify, when running |mach vendor python|, having a \* here won't cause the package to get updated right?
> 
> Is there any benefit to making the minimum version the ones that already exist in-tree?

Judging by the next commit, it looks like these wildcards will cause the package to be updated? If so, I'd consider that an issue. I think the workflow should be:

1. Bump the package(s) you care about in the Pipfile
2. Run |mach python vendor| so *only* those packages you touched get updated
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review250350

::: python/mozbuild/mozbuild/vendor_python.py:37
(Diff revision 3)
> +                stdout=requirements)
> +
> +            with TemporaryDirectory() as tmp:
> +                self.virtualenv_manager._run_pip([
> +                    'download',
> +                    '-r', requirements.name,

It wouldn't hurt to add --require-hashes here as a redundant check in case "pipenv lock" can be tricked into not including hashes somehow.

::: python/mozbuild/mozbuild/vendor_python.py:48
(Diff revision 3)
> +
> +    def _extract(self, src, dest):
> +        finder = FileFinder(src)
> +        for path, _ in finder.find('*'):
> +            tld = mozfile.extract(os.path.join(finder.base, path), dest)[0]
> +            target = os.path.join(dest, tld.rpartition('-')[0].lower())

Why lower()? It's not to match the module name. (Just `pip download sphinx-js --no-binary :all:` to see for yourself.) Aesthetics?

::: python/mozbuild/mozbuild/vendor_python.py:56
(Diff revision 3)
> +            self._cleanup(target)
> +
> +    def _cleanup(self, path):
> +        exclude = [
> +            '.gitignore', '.travis.yml',
> +            '/examples/', '/doc/', '/docs/',

We're also going to want to preserve setup.cfg, as it sometimes contains info that's important when running setup.py to do the installation. (In general, the trend is to move things that can be expressed declaratively out of setup.py into setup.cfg.)

More broadly, I'm uneasy about excluding things by name whose semantics are only guessed at rather than enforced by some tool. Doc, docs, examples, and testing are the scariest. Ironically, one of the things we're deleting, MANIFEST.in, actually defines which things need to be in the built package, though it would be nontrivial to reimplement the logic setuptools uses to obey it, and there are other things (like setup.py kwargs) which factor in. We probably do save quite a bit of weight by excluding tests, but I'd like to see numbers on it. Someday somebody's going to have to debug another failure (like davehunt already found) due to this excision, and they might not know it exists. If there were a way to surface it in the error message that begins the chase, I would be completely satisfied. How much, accounting for compression, do we actually save through excision?

[Edit: Now that I read the source, I see that we're downloading the released sdists from PyPI, not from their source repos or anything crazy like that. So we're actually already getting the advantage of having MANIFEST.in's exclusions obeyed. There are probably fairly few packages that actually ship their docs or examples (or, less surely, tests) with their sdists, making excision even less necessary.]
Attachment #8970117 - Flags: review-
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review250350

> It wouldn't hurt to add --require-hashes here as a redundant check in case "pipenv lock" can be tricked into not including hashes somehow.

Sadly, `pipenv lock -r` currently generates a `requirements.txt` file *without* hashes... I've raised this with one of the core contributors, and intend to open an issue. At some point in the future we should be able to use the `Pipfile.lock` directly, but nobody seems confident that this will be any time soon.

> Why lower()? It's not to match the module name. (Just `pip download sphinx-js --no-binary :all:` to see for yourself.) Aesthetics?

This is mostly to match the existing directories in `third_party/python`, but I'm happy with changing this. Based on other comments, I'm thinking my initial patch will just include a single dependency and any transient dependencies, so we can then tackle the remaining packages separately, including cleanup such as renaming `hglib` to `python-hglib` to match the actual package name.

> We're also going to want to preserve setup.cfg, as it sometimes contains info that's important when running setup.py to do the installation. (In general, the trend is to move things that can be expressed declaratively out of setup.py into setup.cfg.)
> 
> More broadly, I'm uneasy about excluding things by name that whose semantics are only guessed at rather than enforced by some tool. Doc, docs, examples, and testing are the scariest. Ironically, one of the things we're deleting, MANIFEST.in, actually defines which things need to be in the built package, though it would be nontrivial to reimplement the logic setuptools uses to obey it, and there are other things (like setup.py kwargs) which factor in. We probably do save quite a bit of weight by excluding tests, but I'd like to see numbers on it. Someday somebody's going to have to debug another failure (like davehunt already found) due to this excision, and they might not know it exists. If there were a way to surface it in the error message that begins the chase, I would be completely satisfied. How much, accounting for compression, do we actually save through excision?
> 
> [Edit: Now that I read the source, I see that we're downloading the released sdists from PyPI, not from their source repos or anything crazy like that. So we're actually already getting the advantage of having MANIFEST.in's exclusions obeyed. There are probably fairly few packages that actually ship their docs or examples (or, less surely, tests) with their sdists, making excision even less necessary.]

My original patch did not exclude these directories and paths, but I was considering it, so it didn't take much for :ahal to persuade me to do so. Based on the issue with the docs and :erik's comment I'm now flipping back. I wonder if we could open issues against the worst offending packages to suggest they might want to exclude their tests and CI configurations from their distributions.
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review248852

> Judging by the next commit, it looks like these wildcards will cause the package to be updated? If so, I'd consider that an issue. I think the workflow should be:
> 
> 1. Bump the package(s) you care about in the Pipfile
> 2. Run |mach python vendor| so *only* those packages you touched get updated

I agree. I think I'm going to change this to `mach vendor python [PACKAGE]`. This will run `pipenv install [PACKAGE]` and regenerate a `requirements.txt` file to be checked into the tree. The `requirements.txt` file will then be used for vendoring the new package. Running `mach vendor python` without a package will just re-vendor based on the existing `requirements.txt` file. Eventually we should be able to use `Pipfile.lock` instead of `requirements.txt`.
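
As a concrete sketch of that intended workflow (the package pin is the example used in the docs):

    $ ./mach vendor python pytest==3.5.1   # add the pin and vendor the package plus its dependencies
    $ ./mach vendor python                 # re-vendor everything from the existing pins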

> Should this `Pipfile` live in topsrcdir, or in `third_party/python`? I'm not actually sure, could see arguments going either way.

I'm okay with either, though if we're generating a `requirements.txt` then pipenv will pick this up if it's in the same directory as `Pipfile` and use it as the source dependencies. Also, `pipenv install [PACKAGE]` from `third_party/python` will pick up a directory name matching the package if it exists rather than using PyPI.
Attachment #8974636 - Attachment is obsolete: true
Attachment #8970117 - Flags: review?(ahal)
Attachment #8970117 - Flags: review+
Attachment #8970117 - Flags: feedback?(erik)
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review250350

> This is mostly to match the existing directories in `third_party/python`, but I'm happy with changing this. Based on other comments, I'm thinking my initial patch will just include a single dependency and any transient dependencies, so we can then tackle the remaining packages separately, including cleanup such as renaming `hglib` to `python-hglib` to match the actual package name.

I've raised this as https://github.com/pypa/pipenv/issues/2218
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review251022

::: python/mozbuild/mozbuild/mach_commands.py:2120
(Diff revision 4)
>          vendor_command = self._spawn(VendorAOM)
>          vendor_command.vendor(**kwargs)
>  
> +    @SubCommand('vendor', 'python',
> +                description='Vendor Python packages from pypi.org into third_party/python')
> +    @CommandArgument('packages', default=None, nargs='*', help='Packages to vendor')

What does running |mach vendor python| without any packages do? Does it update unpinned packages? Might be good to mention this in the help text.

::: python/mozbuild/mozbuild/vendor_python.py:23
(Diff revision 4)
> +
> +    def vendor(self, packages=None):
> +        self.populate_logger()
> +        self.log_manager.enable_unstructured()
> +
> +        vendor_dir = mozpath.join(self.topsrcdir, 'third_party/python')

nit: os.path.join('third_party', 'python')
Attachment #8970117 - Flags: review?(ahal) → review+
Comment on attachment 8976686 [details]
Bug 1346026 - Vendor pipenv 2018.5.18 and transient dependencies;

https://reviewboard.mozilla.org/r/244766/#review251030

::: third_party/python/virtualenv/PKG-INFO:3
(Diff revision 1)
> -Metadata-Version: 1.1
> +Metadata-Version: 1.2
>  Name: virtualenv
> -Version: 15.2.0
> +Version: 16.0.0

Just fyi, this upgrade will affect many of our tasks in CI. I don't anticipate any problems, but you might want to do a thorough try run before pushing just in case.

Also, nice that we'll be using pip 10.0.1 and won't see that annoying warning anymore! :)
Attachment #8976686 - Flags: review?(ahal) → review+
Comment on attachment 8976686 [details]
Bug 1346026 - Vendor pipenv 2018.5.18 and transient dependencies;

https://reviewboard.mozilla.org/r/244766/#review251030

> Just fyi, this upgrade will affect many of our tasks in CI. I don't anticipate any problems, but you might want to do a thorough try run before pushing just in case.
> 
> Also, nice that we'll be using pip 10.0.1 and won't see that annoying warning anymore! :)

Last time I tried updating to pip 10, there were quite a few failures, so I would be wary of landing this without running the full test suite.
Comment on attachment 8976686 [details]
Bug 1346026 - Vendor pipenv 2018.5.18 and transient dependencies;

https://reviewboard.mozilla.org/r/244766/#review251030

> Last time I tried updating to pip 10, there were quite a few failures, so I would be wary of landing this without running the full test suite.

Also https://reviewboard.mozilla.org/r/244686/ is going to make mozharness use this copy of virtualenv, which will impact basically all the tests.
Comment on attachment 8976686 [details]
Bug 1346026 - Vendor pipenv 2018.5.18 and transient dependencies;

https://reviewboard.mozilla.org/r/244766/#review251030

> Also https://reviewboard.mozilla.org/r/244686/ is going to make mozharness use this copy of virtualenv, which will impact basically all the tests.

Yeah, this wasn't entirely intentional as I just vendored the latest pipenv and virtualenv is a transient dependency. If it's going to cause issues I can simply pin the version of virtualenv to the same one we currently vendor.
Comment on attachment 8976687 [details]
Bug 1346026 - Unvendor pathlib2 as it is no longer a dependency for pipenv;

https://reviewboard.mozilla.org/r/244768/#review251482
Attachment #8976687 - Flags: review?(ted) → review+
> My original patch did not exclude these directories and paths, but I was considering it, so it didn't take much for :ahal to persuade me to do so. Based on the issue with the docs and :erik's comment I'm now flipping back. I wonder if we could open issues against the worst offending packages to suggest they might want to exclude their tests and CI configurations from their distributions.

Having dealt with the same situation vendoring Rust crates I think this is the best way to go. Vendor the packages as-is, and if there are packages including extraneous junk then file upstream issues to get them to fix that.

Related--it'd probably be good to implement similar file size checking logic for Python vendoring like we have for Rust vendoring (perhaps we could figure out a way to reuse the same code, even). That would be fine to do in a followup.
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review251486

This looks good, I just have a few things I'd like to see fixed before landing.

::: python/mozbuild/mozbuild/vendor_python.py:28
(Diff revision 4)
> +        vendor_dir = mozpath.join(self.topsrcdir, 'third_party/python')
> +
> +        packages = packages or []
> +        pipenv = self.ensure_pipenv()
> +
> +        for package in packages:

You mention in the docs that people should use specific versions here. I think you should validate that any packages passed here contain a version specifier of some sort. I'm not sure if pip version specifiers are simple to parse, but maybe a check that the package contains a valid operator, and that splitting on the operator produces two non-empty halves would be sufficient?

::: python/mozbuild/mozbuild/vendor_python.py:33
(Diff revision 4)
> +        for package in packages:
> +            subprocess.check_call(
> +                [pipenv, 'install', package],
> +                cwd=self.topsrcdir)
> +
> +        requirements = mozpath.join(vendor_dir, 'requirements.txt')

It doesn't look like the requirements.txt file gets cleaned up afterwards. Perhaps a `finally:` block to do that would be good?

::: python/mozbuild/mozbuild/vendor_python.py:48
(Diff revision 4)
> +                'download',
> +                '-r', requirements,
> +                '--no-deps',
> +                '--dest', tmp,
> +                '--no-binary', ':all:'])
> +            self._extract(tmp, vendor_dir)

This could stand a short explanatory comment. I gather that `pip download` downloads .tgz or .whl packages direct from PyPi, and we want to vendor their contents, right?
Attachment #8970117 - Flags: review?(ted) → review+
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review250894

::: python/docs/index.rst:35
(Diff revision 4)
> +Adding a Python package
> +~~~~~~~~~~~~~~~~~~~~~~~
> +
> +To vendor a Python package, run ``mach vendor python [PACKAGE]``, where
> +``[PACKAGE]`` is one or more package names along with a version number in the
> +format ``pytest==3.5.1``. If you do not specify the version number then there

It would be trivial to solve this for most practical scenarios by saying something like "if not all(('==' in package) for package in packages): blow up" down in vendor().

::: python/docs/index.rst:46
(Diff revision 4)
> +``third_party/python`` directory.
> +
> +If you're familiar with ``Pipfile`` you can also directly modify this in the in
> +the top source directory and then run ``mach vendor python`` for your changes
> +to take effect. This allows advanced options such as specifying alternative
> +package indexed (see below), and

"indexes"

::: python/mozbuild/mozbuild/vendor_python.py:3
(Diff revision 4)
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, # You can obtain one at http://mozilla.org/MPL/2.0/.

Extra #

::: python/mozbuild/mozbuild/vendor_python.py:47
(Diff revision 4)
> +            self.virtualenv_manager._run_pip([
> +                'download',
> +                '-r', requirements,
> +                '--no-deps',
> +                '--dest', tmp,
> +                '--no-binary', ':all:'])

We might as well add --disable-pip-version-check to avoid an annoying message and a little network activity.

::: python/mozbuild/mozbuild/vendor_python.py:47
(Diff revision 4)
> +            self.virtualenv_manager._run_pip([
> +                'download',
> +                '-r', requirements,
> +                '--no-deps',
> +                '--dest', tmp,
> +                '--no-binary', ':all:'])

We might as well add --disable-pip-version-check to avoid an annoying message and a little network activity.

::: python/mozbuild/mozbuild/vendor_python.py:50
(Diff revision 4)
> +                '--no-deps',
> +                '--dest', tmp,
> +                '--no-binary', ':all:'])
> +            self._extract(tmp, vendor_dir)
> +
> +        self.repository.add_remove_files(vendor_dir)

I've finally wrapped my head around what this is doing; sorry for the delay. I'd like to understand the intended purpose of pipenv here. Is it mostly just a convenient way to capture hashes?

* If so, what's the point of keeping requirements.txt around and checking it into version control? It's a generated artifact and can be reproduced at will from Pipfile.

* If not, why not just stick with vanilla pip?
Attachment #8970117 - Flags: review-
(Erik suddenly realizes that you actually have to Publish a MozReview review.)
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review251510
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review251814

::: python/docs/index.rst:35
(Diff revision 4)
> +Adding a Python package
> +~~~~~~~~~~~~~~~~~~~~~~~
> +
> +To vendor a Python package, run ``mach vendor python [PACKAGE]``, where
> +``[PACKAGE]`` is one or more package names along with a version number in the
> +format ``pytest==3.5.1``. If you do not specify the version number then there

I was also considering taking two positional arguments, one for the package name and another for the version. This would mean only one package could be vendored at a time, but I suspect that's not an issue. What do you think?

::: python/mozbuild/mozbuild/mach_commands.py:2120
(Diff revision 4)
>          vendor_command = self._spawn(VendorAOM)
>          vendor_command.vendor(**kwargs)
>  
> +    @SubCommand('vendor', 'python',
> +                description='Vendor Python packages from pypi.org into third_party/python')
> +    @CommandArgument('packages', default=None, nargs='*', help='Packages to vendor')

It determines the dependency graph from the Pipfile, generates a requirements.txt file, and replaces/adds any vendored packages. If there are any packages with loose or no version specifiers then they may be updated. Any transient dependencies may also be updated.

::: python/mozbuild/mozbuild/vendor_python.py:28
(Diff revision 4)
> +        vendor_dir = mozpath.join(self.topsrcdir, 'third_party/python')
> +
> +        packages = packages or []
> +        pipenv = self.ensure_pipenv()
> +
> +        for package in packages:

Would you be okay with me changing the command to the syntax `mach vendor python [package] [version]`? This wouldn't allow us to vendor multiple packages in a single command, but would ensure that we're pinning the dependencies to a specific version.

::: python/mozbuild/mozbuild/vendor_python.py:33
(Diff revision 4)
> +        for package in packages:
> +            subprocess.check_call(
> +                [pipenv, 'install', package],
> +                cwd=self.topsrcdir)
> +
> +        requirements = mozpath.join(vendor_dir, 'requirements.txt')

I'll revisit this; originally the `requirements.txt` was a temporary file. I switched to having the file in-tree because I was thinking of making the command two steps (lock/vendor). I ultimately made it unnecessary to commit the requirements file though.

::: python/mozbuild/mozbuild/vendor_python.py:47
(Diff revision 4)
> +            self.virtualenv_manager._run_pip([
> +                'download',
> +                '-r', requirements,
> +                '--no-deps',
> +                '--dest', tmp,
> +                '--no-binary', ':all:'])

+1

::: python/mozbuild/mozbuild/vendor_python.py:48
(Diff revision 4)
> +                'download',
> +                '-r', requirements,
> +                '--no-deps',
> +                '--dest', tmp,
> +                '--no-binary', ':all:'])
> +            self._extract(tmp, vendor_dir)

+1

::: python/mozbuild/mozbuild/vendor_python.py:50
(Diff revision 4)
> +                '--no-deps',
> +                '--dest', tmp,
> +                '--no-binary', ':all:'])
> +            self._extract(tmp, vendor_dir)
> +
> +        self.repository.add_remove_files(vendor_dir)

pipenv is reading the direct dependencies in the Pipfile, which is what we _want_, and determining the full dependency graph, which is what we _need_. It's creating a `Pipfile.lock` with all of the dependencies, including hashes. For now, we can't use pipenv to download the dependencies, but we can generate a `requirements.txt` from the lock file. With this, we can use `pip download` to download the source distributions for all packages and vendor them in the tree.

You are correct that we don't need the `requirements.txt` to hang around. It's there because at one point I was thinking this would be two separate commands (lock, vendor) but it turned out it was easy enough to do it in one.
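
In other words, the vendoring pass boils down to roughly this sequence (package and paths are illustrative):

    $ pipenv install voluptuous==0.10.5    # update Pipfile and Pipfile.lock
    $ pipenv lock -r > requirements.txt    # export pinned requirements (currently without hashes, see above)
    $ pip download -r requirements.txt --no-deps --no-binary :all: --dest /tmp/pkgs
    # each downloaded sdist is then extracted under third_party/python/<package>
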
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review251978

::: python/mozbuild/mozbuild/vendor_python.py:3
(Diff revision 4)
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, # You can obtain one at http://mozilla.org/MPL/2.0/.

Well spotted. It seems this is in a few other files, including the one I copied it from.
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

:ted & :erik would you mind taking another look over this?
Attachment #8970117 - Flags: review?(ted)
Attachment #8970117 - Flags: review?(erik)
Attachment #8970117 - Flags: review+
Comment on attachment 8976686 [details]
Bug 1346026 - Vendor pipenv 2018.5.18 and transient dependencies;

:ahal I've now pinned virtualenv to reduce the number of things changing here. Would you mind taking another look? Also, are there any tasks that should trigger whenever the vendored packages are modified? If so, we could have them watch the Pipfile.lock file.

Finally, can you suggest what I should run on try before landing these changes? Is it worth running a broad range of jobs?
Attachment #8976686 - Flags: review+ → review?(ahal)
Comment on attachment 8976686 [details]
Bug 1346026 - Vendor pipenv 2018.5.18 and transient dependencies;

https://reviewboard.mozilla.org/r/244766/#review252164
Attachment #8976686 - Flags: review?(ahal) → review+
Comment on attachment 8970117 [details]
Bug 1346026 - Add ability to vendor Python modules using mach;

https://reviewboard.mozilla.org/r/238916/#review252198

Looks good to me! Nice work!

::: python/mozbuild/mozbuild/vendor_python.py:59
(Diff revisions 4 - 5)
> -            self._extract(tmp, vendor_dir)
> +                self._extract(tmp, vendor_dir)
>  
>          self.repository.add_remove_files(vendor_dir)
>  
>      def _extract(self, src, dest):
> +        # extract source distribution into vendor directory

This would make a fine docstring (rather than comment).
Attachment #8970117 - Flags: review?(erik) → review+
(In reply to Dave Hunt [:davehunt] ⌚️UTC+0 from comment #64)
> Comment on attachment 8976686 [details]
> Bug 1346026 - Vendor pipenv 2018.5.18 and transient dependencies;
> 
> :ahal I've ow pinned virtualenv to reduce the number of things changing
> here. Would you mind taking another look? Also, are there any tasks that
> should trigger whenever the vendored packages are modified? If so, we could
> have them watch the Pipfile.lock file.

There likely are tasks like these, though they can watch directories so I'd prefer they just watch whatever packages they depend on rather than the Pipfile.lock. Watching the latter would make them run anytime *any* package is updated. Since they could (and should) have been doing this all along, I'd suggest not worrying about it in this bug.
 
> Finally, can you suggest what I should run on try before landing these
> changes? Is it worth running a broad range of jobs?

Since this isn't modifying virtualenv anymore, I don't think it needs a very extensive try push. Probably even just the jobs that show up with |mach try empty| is good enough (as long as some python-tests get scheduled). Maybe also the doc task.
Pushed by dhunt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f293475f462d
Add ability to vendor Python modules using mach; r=ahal,erik,ted
https://hg.mozilla.org/integration/autoland/rev/705e8cb185d4
Vendor pipenv 2018.5.18 and transient dependencies; r=ahal
https://hg.mozilla.org/integration/autoland/rev/95410b5cdecc
Unvendor pathlib2 as it is no longer a dependency for pipenv; r=ted
Blocks: 1463793
Backed out 3 changesets (bug 1346026) for Bugzilla linting

Log:
https://treeherder.mozilla.org/logviewer.html#?job_id=179842079&repo=autoland

Missing Bugzilla component: Pipfile
Missing Bugzilla component: Pipfile.lock

Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=95410b5cdecc643367cf7cfb6cd9017730dcc34d

Backout:
https://hg.mozilla.org/integration/autoland/rev/a3e88c28ba04228a1b501f963b19d18704e392d0

:davehunt: Hi, please add the component in Bugzilla for the 2 files above.
Flags: needinfo?(dave.hunt)
Pushed by dhunt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3c117c97e17c
Add ability to vendor Python modules using mach; r=ahal,erik,ted
https://hg.mozilla.org/integration/autoland/rev/f7c1d82b1c12
Vendor pipenv 2018.5.18 and transient dependencies; r=ahal
https://hg.mozilla.org/integration/autoland/rev/1020ace9849b
Unvendor pathlib2 as it is no longer a dependency for pipenv; r=ted
Flags: needinfo?(dave.hunt)
https://hg.mozilla.org/mozilla-central/rev/3c117c97e17c
https://hg.mozilla.org/mozilla-central/rev/f7c1d82b1c12
https://hg.mozilla.org/mozilla-central/rev/1020ace9849b
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62