Closed Bug 1447057 Opened 2 years ago Closed 2 years ago

Document policies for using Python packages

Categories

(Firefox Build System :: Generated Documentation, enhancement)

3 Branch
enhancement
Not set

Tracking

(firefox61 fixed)

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: davehunt, Assigned: davehunt)

Details

Attachments

(1 file)

We currently have no documentation covering our use of Python packages in mozilla-central. This should cover when to vendor packages, and any expectations for vendoring or installation from a package index.
I've uploaded an initial patch for discussion. In addition, I had the following questions over our vendoring policies:

1. Should we limit the size of vendored packages?
2. Should we allow vendoring of multiple versions?
3. What should our policy say with regard to package licenses?

Note that the proposed policies were partly inspired by conversations in bug 1346026 and bug 1437593, in addition to the vendoring policies in use by the pip project: https://github.com/pypa/pip/blob/277bd6b00b9265d8578dc865b63626b7f5eafbad/src/pip/_vendor/README.rst
Flags: needinfo?(gps)
Adding a few of our Python experts to provoke some feedback.
(In reply to Dave Hunt [:davehunt] ⌚️UTC+0 (PTO until 03-Mar-2018) from comment #2)
> I've uploaded an initial patch for discussion. In addition, I had the
> following questions over our vendoring policies:

This looks generally sensible. It might be worthwhile to clarify the bit about the internal package mirror. Specifically, anything that's going to run in CI should not be hitting external package repositories, but as long as it's not part of the build itself it's OK for it to use packages that are not vendored.

> 1. Should we limit the size of vendored packages?

I think having checks similar to what we have in `mach vendor rust` would be good to prevent us adding huge files without conscious acknowledgement:
https://dxr.mozilla.org/mozilla-central/rev/6ff60a083701d08c52702daf50f28e8f46ae3a1c/python/mozbuild/mozbuild/vendor_rust.py#281

> 2. Should we allow vendoring of multiple versions?

I don't know what the right answer is here. We allow this for Rust crates, but Rust has explicit support for this. This might be OK if we have separate commands that need different major versions of packages as long as we don't wind up trying to install two different versions of the same package in the same virtualenv.

> 3. What should our policy say with regard to package licenses?

This is a little different from the Rust code we use, in that we aren't going to ship any of this Python code as part of Firefox. However, given that most (all?) of the Python code we have written in mozilla-central is MPL licensed we likely need to set the bar to require vendored Python packages to be MPL-compatible (which means no GPL or LGPL). This might wind up being a PITA in practice, but I don't see any way around it.
Comment on attachment 8960267 [details]
Bug 1447057 - Document policies for using third-party Python packages;

https://reviewboard.mozilla.org/r/229032/#review234810

::: commit-message-4f101:1
(Diff revision 1)
> +Bug 1447057 - Document policies for using third-party Python packages; r=gps

You can add this docs dir to the codespell linter for spell checking:
https://searchfox.org/mozilla-central/source/tools/lint/codespell.yml#4
Comment on attachment 8960267 [details]
Bug 1447057 - Document policies for using third-party Python packages;

https://reviewboard.mozilla.org/r/229032/#review235230

Otherwise, this LGTM!

::: python/docs/index.rst:35
(Diff revision 1)
> +Using a Python package index
> +============================
> +
> +If the Python package is not used in the building of Firefox then it can be
> +installed from a package index. Some tasks are not permitted to use external
> +resources, and for those we can publish packages to an internal PyPI mirror.

I don't know much about the history of our PyPI mirror, but my intent with introducing hash-checking into pip was to reduce the cases in which one would be needed (along with the concomitant administration, audit-trail tracking, etc.). Specifically, an internal mirror should now be useful only for added availability, not because of trust issues. Toward making sure our policy matches the new reality, I'm curious which tasks are not permitted to use external resources: are they all availability-motivated?

::: python/docs/index.rst:43
(Diff revision 1)
> +or another package index.
> +
> +The following policy applies to **ALL** installations from a package index:
> +
> +* Packages **MUST** be pinned to a specific version.
> +* Package hashes **MUST** be specified.

The first bullet is really redundant with the second, since specifying a hash completely locks down the accepted content. Also, pip will say "Hey, you need to pin the version" if you specify a hash without doing so. (I'm the author of pip's hash-checking machinery.)

::: python/docs/index.rst:46
(Diff revision 1)
> +
> +* Packages **MUST** be pinned to a specific version.
> +* Package hashes **MUST** be specified.
> +
> +Note that when using a Python package index there is a risk that the service
> +could be unavailable, or packages may be updated or even pulled without notice.

Since we're using hash-checking, the package can't be updated out from under us anymore.
Attachment #8960267 - Flags: review-
Comment on attachment 8960267 [details]
Bug 1447057 - Document policies for using third-party Python packages;

https://reviewboard.mozilla.org/r/229032/#review235230

> I don't know much about the history of our PyPI mirror, but my intent with introducing hash-checking into pip was to reduce the cases in which one would be needed (along with the concomitant administration, audit-trail tracking, etc.). Specifically, an internal mirror should now be useful only for added availability, not because of trust issues. Toward making sure our policy matches the new reality, I'm curious which tasks are not permitted to use external resources: are they all availability-motivated?

This is mainly motivated by availability: we can't have a Firefox chemspill delayed because a 3rd party service is under DoS or something. Whereas we can operate multi-datacenter services off the public Internet that are reasonably prone to outages, including DoS attacks.

For services where we don't have a good content integrity story, trust/security is another major motivation.

Of course, not all our services may have the desired level of availability and fault tolerance. But the trend is certainly in that direction.
Comment on attachment 8960267 [details]
Bug 1447057 - Document policies for using third-party Python packages;

https://reviewboard.mozilla.org/r/229032/#review235266

This is mostly reasonable and reflects our not-yet-formally-documented policy. But since there are a few small problems, it gets r-.

Thanks for writing this up!

::: python/docs/index.rst:24
(Diff revision 1)
> +* Vendored libraries **MUST** not be modified except as required to
> +  successfully vendor them.
> +* Vendored libraries **MUST** be released copies of libraries available on PyPI.
> +* The versions of libraries vendored in pip **MUST** be reflected in
> +  third_party/python/vendor.txt.

I think these are too strong. Periodically we've historically needed to make one-off changes to vendored code. Sometimes it is more convenient to make a one-off change to fix a high-profile bug than it is to wait on upstream to perform a release with the needed fix.

Generally speaking, vendored libraries don't vary from upstream releases and we should strive for that property. But bit-for-bit parity shouldn't be set in stone.

::: python/docs/index.rst:43
(Diff revision 1)
> +or another package index.
> +
> +The following policy applies to **ALL** installations from a package index:
> +
> +* Packages **MUST** be pinned to a specific version.
> +* Package hashes **MUST** be specified.

In addition to what Erik says, we may want to give a technical example. e.g. *all package installs from a package index SHOULD be performed with pip and be using the --require-hashes flag*.
Attachment #8960267 - Flags: review?(gps) → review-
(In reply to Dave Hunt [:davehunt] ⌚️UTC+0 (PTO until 03-Mar-2018) from comment #2)
> I've uploaded an initial patch for discussion. In addition, I had the
> following questions over our vendoring policies:
> 
> 1. Should we limit the size of vendored packages?

We can and do have mechanisms to check for bloat in vendored Rust packages. I think it is reasonable to extend this to Python as well.

> 2. Should we allow vendoring of multiple versions?

Ideally, no. In reality, there will need to be a reason to do this. We probably want the newest version of a package to live in a directory with no version component. Consumers of a specific, older version can point at a directory with the version number in it.

> 3. What should our policy say with regard to package licenses?

We should maintain an allow list of licenses somewhere and the `mach vendor` (or equivalent) command should enforce this. We already do something similar for Rust. Although unlike Rust, I don't believe Python's packaging tools support screening licenses for compliance. At least that's not something built into pip. Maybe someone has written an extension for pip?

FWIW, glob is working on a more formal proposal for all things vendoring. End goal is everything that is vendored into mozilla-central has its vendoring definition defined in a machine readable form and all vendoring is reproducible and turnkey. I've CCd him here in case he wants to chime in.
Flags: needinfo?(gps)
(In reply to Gregory Szorc [:gps] from comment #9)
> > 1. Should we limit the size of vendored packages?
> 
> We can and do have mechanisms to check for bloat in vendored Rust packages.
> I think it is reasonable to extend this to Python as well.

+1

vendor-rust blocks vendoring of individual files > 100kb, and warns if the whole library is > 5mb.
(see python/mozbuild/mozbuild/vendor_rust.py)
Comment on attachment 8960267 [details]
Bug 1447057 - Document policies for using third-party Python packages;

https://reviewboard.mozilla.org/r/229032/#review241030

Looks great!

::: python/docs/index.rst:18
(Diff revision 2)
> +
> +If the Python package is to be used in the building of Firefox itself, then we
> +**MUST** use a vendored version. This ensures that to build Firefox we only
> +require a checkout of the source, and do not depend on a package index. This
> +ensures that building Firefox is deterministic and dependable, avoids packages
> +from changing our from under us, and means we’re not affected when 3rd party

Nit: typo in "our"
Attachment #8960267 - Flags: review?(gps) → review+
Pushed by dhunt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/53109cc01f60
Document policies for using third-party Python packages; r=gps
Comment on attachment 8960267 [details]
Bug 1447057 - Document policies for using third-party Python packages;

https://reviewboard.mozilla.org/r/229032/#review241070


Code analysis found 1 defect in this patch:
 - 1 defect found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: python/docs/index.rst:40
(Diff revision 3)
> +See `how to upload to internal PyPI <https://wiki.mozilla.org/ReleaseEngineering/How_To/Upload_to_internal_Pypi>`_
> +for more details. If you are not restricted, you can install packages from PyPI
> +or another package index.
> +
> +All packages installed from a package index **MUST** specify hashes to ensure
> +compatibiliy and protect against remote tampering. Hash-checking mode can be

Warning: Compatibiliy  ==> compatibility [codespell]
https://hg.mozilla.org/mozilla-central/rev/53109cc01f60
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Version: Version 3 → 3 Branch
You need to log in before you can comment on or make changes to this bug.