Overhaul our Python package management



4 years ago
3 years ago


(Reporter: emorley, Assigned: emorley)




(1 attachment, 5 obsolete attachments)

On th-prod-admin:

[emorley@treeherderadm.private.scl3 treeherder.mozilla.org]$ ls -l
total 20
-rw-r--r--  1 treeherder treeherder 1574 Jun  3 15:56 admin-requirements.txt
-rw-r--r--  1 treeherder treeherder 1574 Jun  5 13:24 processor-requirements.txt
drwxr-xr-x 13 treeherder treeherder 4096 Sep 19 18:07 treeherder-service
drwxr-xr-x  6 root       root       4096 Sep 19 18:07 treeherder-ui
-rw-r--r--  1 treeherder treeherder 1574 Jun  5 13:24 web-requirements.txt

The contents of each of these files are the combined requirements from a selection of the files at:

I'm really against not having files checked into the repo, since it makes it easy to miss stale dependencies, as well as hard for people working on treeherder to see the full picture unless they have both SSH access and the knowledge that these additional requirements files exist.

We should:
1) At the least check these files into the repo under https://github.com/mozilla/treeherder-service/tree/master/requirements
2) Ideally try and combine these with the existing requirements files, so there is as little duplication as possible.
Blocks: 1072681
Blocks: 1080757
No longer blocks: 1072681
No longer blocks: 1080757
Component: Treeherder → Treeherder: Infrastructure
Priority: P2 → P3
Blocks: 1125838
Created attachment 8563358 [details]

Current file contents
Created attachment 8563359 [details]

Where admin really means rabbitmq1
Created attachment 8563360 [details]
Created attachment 8563366 [details]

Attach the correct file this time.
Attachment #8563360 - Attachment is obsolete: true
All three files are identical, apart from admin-requirements.txt also has:

Comparing to the in-repo files, the diff is as follows:

--- production-requirements
+++ in-repo-requirements
Created attachment 8563384 [details]
List of in-repo vs prod requirements file vs actually installed versions
Summary: Check {admin,processor,web}-requirements.txt into the repo → Make prod vs stage vs in-repo Python lib requirements files consistent
I suggest:

1) Using a single file for all node types, given there is only one dependency difference, and it seems cleaner to just have the same libs on all nodes.

2) Making this single file refer to the {pure,dev,...}.txt files using includes, rather than copy-pasting. ie:
-r dev.txt
-r pure.txt
...or else doing away with the {pure,dev,...}.txt files entirely.

3) Removing any no-unused libs (eg celerymon).

4) Adding missing libraries to the requirements files, that were installed manually (see the attachment for libs where there is no corresponding "*in-repo*" line.

5) Updating all stage/prod nodes to use the updated requirements file - since we currently have discrepancies:

Django	1.5.10	treeherder-etl1
Django	1.5.10	treeherder-etl2
Django	1.5.10	treeherder-processor3
Django	1.5.10	treeherder3.webapp
Django	1.5.11	treeherder-etl1.stage
Django	1.5.11	treeherder-etl2.stage
Django	1.5.11	treeherder-processor1.stage
Django	1.5.11	treeherder-processor2.stage
Django	1.5.11	treeherder-processor3.stage
Django	1.5.11	treeherder-rabbitmq1
Django	1.5.11	treeherder-rabbitmq1.stage
Django	1.5.11	treeherder1.stage.webapp
Django	1.5.11	treeherder2.stage.webapp
Django	1.5.11	treeherder3.stage.webapp
Django	1.5.8	treeherder-processor1
Django	1.5.8	treeherder-processor2
Django	1.5.8	treeherder1.webapp
Django	1.5.8	treeherder2.webapp

In the future we could consider using virtualenv+peep as suggested by Mauro, but that's fodder for another bug :-)
Priority: P3 → P2
Summary: Make prod vs stage vs in-repo Python lib requirements files consistent → Make prod vs stage vs in-repo Python lib versions consistent & more maintainable
Actually, since everything in pure.txt gets checked into the vendor directory in treeherder-service, surely we don't need the things in pure.txt in the system (or venv) environment anyway, since we _should_ always be using the copy in the repo?
Your assumption is correct. Including your use of _should_.
Since we use:
sys.path.append(path('..', 'vendor'))

Rather than:
sys.path.insert(1, path('..', 'vendor'))

The site-packages version will be seen first - so we're potentially using older versions of packages on prod/stage than are contained in vendor/.

We're also pip installing pure.txt unnecessarily on Travis runs too (only adds another 15 seconds, but still).
Morphing this to be just about reorganising/consolidating the current files to they're easier to keep in sync. Later potential switches to {virtualenvs, peep, making stage/prod puppet use the files straight from the repo} can happen in other bugs, and after the email I sent out with some initial questions. 

Current in-repo files:

Files on treeherderadm:


1) Remove all references to pure.txt from anywhere other than bin/generate-vendor-lib.py - since we should not be pip installing these files (they are already in the vendor directory in the repo). This includes Travis and wherever on prod it is referenced.

2) Rename 'pure.txt' to 'checked-in.txt', or similar (to make it clearer that these relate to the contents of vendor/ and should not be pip installed).

3) Making sure the current in-repo files reflect what we're actually using in production (ie add anything missing, tweak versions etc).

4) Copy treeherder-service/requirements/* to:

(so they are outside the repo, to alleviate the security concerns that were raised - but are still super easy to sync)

5) Add two new files:

...which have contents (pip allows referencing other files, we shouldn't need to copy and paste like at present):
-r requirements/compiled.txt
-r requirements/prod.txt

6) Point prod/stage puppet at those new files.

7) Remove the old {web,admin,...}-requirements.txt files.

8) Next time we need to update packages, we update first in the repo, then just get fubar to repeat step #4. -> Profit \o/

9) As part of update.py we could even run a diff of the in-repo requirements directory against the prod one, so at a glance from the deploy logs we can see if there is any drift and we need to repeat step #4 above.

And then in the future (and other bugs) we can consider using the in-repo files directory, using virtualenvs & peep.

Does that sound like a viable plan?
Flags: needinfo?(mdoglio)
Flags: needinfo?(klibby)
Summary: Make prod vs stage vs in-repo Python lib versions consistent & more maintainable → Consolidate the in-repo Python package requirements files & those on stage/prod
I guess there's also:

10) Remove old packages from stage/prod, including the ones we shouldn't have pip installed from pure.txt, since they are in the vendor directory. We'll obviously need to check every consumer of them is able to use them from vendor before removing.
Blocks: 1134140
No longer blocks: 1134140
Thanks Ed, it sounds like a good plan.
Flags: needinfo?(mdoglio)
Depends on: 1134740
Assignee: nobody → emorley
Flags: needinfo?(klibby)
Priority: P2 → P3
Comment on attachment 8563384 [details]
List of in-repo vs prod requirements file vs actually installed versions

This list is now obsolete - for the latest see:
Attachment #8563384 - Attachment is obsolete: true
Created attachment 8576633 [details] [review]
Clean up the in-repo requirements files

This PR:
* Gives the requirements files clearer filenames
* Specifies exact versions throughout rather than leaving packages unpinned or using version ranges
* Lists all inferred dependencies, rather than letting pip install whatever version of them it chooses
* Makes versions reflect what we're actually using in production, so as to make the transition to using these new requirements easier (production currently doesn't use the files in the repo)
+ misc other cleanup
Attachment #8563358 - Attachment is obsolete: true
Attachment #8563359 - Attachment is obsolete: true
Attachment #8563366 - Attachment is obsolete: true
Attachment #8576633 - Flags: review?(mdoglio)
Before we switch to using a virtualenv, we need to be sure we're not missing any requirements in common.txt that were installed directly into production & aren't somehow causing errors during our Travis run. I've vetted the requirements files (with the changes in this PR) against the spreadsheet in comment 14, and the following deviations remain:

* Production has the contents of checked-in.txt, docs.txt & dev.txt installed, when they are not required. We can clean these up at some point (likely after we switch to a virtualenv, since we can do everything at once), but no rush.

* We stopped using the following packages, so they are present in production but not our requirements files (again, no rush to clean up):
argparse (in this bug's PR; it's included in Python 2.7)
django-celery (bug 1016117)
celerymon (bug 1016117)
tornado (was required by celerymon)
backports.ssl-match-hostname (was required by tornado)
certifi (was required by tornado)
South (bug 1119479)

* The admin node needs the Chief requirements installed globally (and it makes sense not to add these to our requirements files):

* A few other packages of unknown origin:

...apart from that all packages installed globally in production are accounted for.

Does anyone know what those last 3 were used for and if we need them in our virtualenv?
Component: Treeherder: Infrastructure → Treeherder
Summary: Consolidate the in-repo Python package requirements files & those on stage/prod → Overhaul our Python package management
Depends on: 1142498
Depends on: 1142508
Attachment #8576633 - Flags: review?(mdoglio) → review+

Comment 17

4 years ago
Commits pushed to master at https://github.com/mozilla/treeherder-service

Bug 1070470 - Rename pure.txt to checked-in.txt

The packages in this file are those that have been checked in to
vendor/, and the new name makes this more obvious.

Bug 1070470 - Set explicit versions for all packages in requirements

For changes in compiled.txt, the version matches that installed globally
in production. For those in checked-in.txt I've used the version in
vendor/ which is actually different from that in production's global
site-packages, but is what we're actually using, since vendor/ is
earlier in the Python path.

Bug 1070470 - Remove argparse package since it's built into Python 2.7


Bug 1070470 - Remove duplicate jsonschema requirement

|jsonschema==2.4.0| is already listed higher up in the file.

Bug 1070470 - Fix capitalisation of MozillaPulse to match PyPI

Bug 1070470 - Use source archives rather than git for faster pip install

Using git+git means cloning the repo, trying to determine if the ref
specified is a tag or revision or branch etc. Instead Github provides
direct archive zips that are much faster when using pip install.

Bug 1070470 - Specify #egg= for dependencies from zip archives

Since peep doesn't like it if it's not set, plus it apparently helps pip
figure out sooner in the process whether a dependency is covered.

Bug 1070470 - Add inferred dependencies to requirements files

There were many packages that end up being installed via dependency
chains, that were not themselves listed in the requirements files. To
ensure determinism with pip (and to prevent errors with peep, since it
uses --no-deps by default), all packages must be listed explicitly.

I've avoided adding any more packages to checked-in.txt since we will
soon be deleting the vendor directory, so it seems silly to pollute it
further. compiled.txt is now rather unfortunately named, since it lists
packages that are pure and could have been in vendor/.

Versions have been set to match those currently used in production, or
in the case of blessings (which is not installed in production, we're
just lucky our use of mozlog does not hit the import & so haven't seen
the error), I've just set it to the latest available version.

Bug 1070470 - Rename compiled.txt to common.txt

The packages in this file are already a mixture of pure and compiled
packages. It's not worth moving the pure packages to checked-in.txt,
since we'll eventually be removing checked-in.txt and the associated
vendor/ and moving everything in there to this file. As such, common.txt
more accurately reflects the purpose of this file.

Bug 1070470 - Update newrelic version to that used in production

Bug 1070470 - Add newrelic-plugin-agent to prod.txt

It is already installed in production; prod.txt now reflects reality.

Bug 1070470 - Explicitly list all packages and versions in docs.txt

Pins Sphinx to an exact version & explicitly lists all required
packages. Also updates sphinxcontrib-httpdomain from 1.1.7 to 1.3.0.

Bug 1070470 - Move six to dev.txt since it's not used in production

Bug 1070470 - Remove django-webtest from dev packages since it's unused

We use WebTest but not django-webtest.
Depends on: 1076710
Attachment #8576633 - Flags: checkin+
Depends on: 1143033
The rough plan I have in mind is:

1) [Done] Clean up the in-repo requirements files, so everything is pinned to an exact version, no dependencies are missing and thus get sneak installed at a random version, and all versions match current prod (we can update them later, but easier to transition if the versions match for now).
2) [In progress] Add hashes for peep and get our Travis run using those to verify.
3) Change our deploy script to do something similar to Kitsune's([1]):
 (a) Set up a virtualenv if one doesn't exist
 (b) Use peep to install our deps into that, using the new shiny in-repo requirements files
 (c) Make the virtualenv relocatable
 (d) Rsync that virtualenv to all the other nodes
4) Update our various scripts to use this new virtualenv instead of the globally installed python/packages
5) Remove now unnecessary packages from the global site-packages (just leaving requirements for Chief + of course virtualenv/pip/peep/setuptools).

This would mean that deploying would ryncing the code + matching virtualenv at the same time, making it super easy to keep them in sync.

I just need to know whether people:
* think the above sounds ok (vs say not copying the virtualenv and instead getting each node to peep install itself by getting the Chief script to run the command remotely on each node)?
* think we should use the peep installed globally on the admin node, or else check it into the repo and use that. The peep docs suggest the latter (for ultimate chain of trust), and that is was Kitsune does. I'm learning towards checking it in (we can then update it ourselves and even just uninstall it from the global site-packages).

[1] https://github.com/mozilla/kitsune/blob/master/scripts/update/deploy.py
Flags: needinfo?(mdoglio)
Flags: needinfo?(klibby)
(In reply to Ed Morley [:edmorley] from comment #18)
> * think we should use the peep installed globally on the admin node, or else
> check it into the repo and use that. The peep docs suggest the latter (for
> ultimate chain of trust), and that is was Kitsune does. I'm learning towards
> checking it in (we can then update it ourselves and even just uninstall it
> from the global site-packages).

For context, peep is a single 11KB file - we could just put it under bin/
Depends on: 1143350
The plan looks good to me. Also, good idea to check what kitsune is doing!
Flags: needinfo?(mdoglio)
I would recommend rsyncing over having chief run peep install everywhere. Checking peep into the repo sounds like a fine idea, as well.
Flags: needinfo?(klibby)
Great - thank you for the feedback :-)
Blocks: 1145712
Depends on: 1146184
Blocks: 1145606
Priority: P3 → P2

Comment 23

3 years ago
Commit pushed to master at https://github.com/mozilla/treeherder-service

Bug 1070470 - Correct "Required by" comment for fancy-tag

fancy-tag is only required by 'django-browserid', not also 'responses'.
Depends on: 1156731
Depends on: 1156838
All done here.

Some things to note:
1) We use peep from the repo, not from pypi (otherwise it kind of defeats the point). So don't expect |peep| to work, you need to use |./bin/peep.py| from the root of the treeherder(-service) repo.
2) You'll need to one-off destroy and re-create your VM for this and the repo rename. Or else delete the 'venv' directory and then run a |vagrant provision|.
3) To add/change packages from now on:
 * Update a package version in the requirements file (or add a new package)
 * Either push to Travis, or in the Vagrant environment, run |treeherder$ ./bin/peep.py install -r requirements/...| and then when it errors out (since the hash has changed or is missing) - just copy and paste the new value and use that in the requirements file.
 * For stage/prod: when that commit is deployed, the package gets installed updated.
 * For dev: Run a manual |/bin/peep.py install|
 * For Vagrant: run a |vagrant provision|
4) If you want to run manage.py or other commands on a stage/prod node, you'll need to make sure you use the virtualenv python. The 'bin/activate' script doesn't work after relocation, so either use /data/www/treeherder.{mozilla,allizom}.org/venv/bin/python or add that bin directory to the front of your PATH.
5) If we ever need to purge the contents of the virtualenv (eg to remove unused packages or because of an issue), we can just delete the venv directory for stage and/or prod on the admin node, and it will be created by the deploy script on next deploy (nothing will break in the interim, since other than when the deploy script is running, nothing uses the venv directory on the admin node).
6) To update peep, just copy the peep.py file verbatim from the release archives on https://github.com/erikrose/peep/releases
Last Resolved: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.