Closed Bug 958077 Opened 10 years ago Closed 9 years ago

Drop playdoh-lib in favor of peep and a requirements file

Categories

(Webtools Graveyard :: Elmo, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Pike, Assigned: adrian)

References

Details

Attachments

(2 files, 3 obsolete files)

We should probably move our playdoh-lib reference from the 1.4 branch over to master.

That's a pretty hairy update, as the whole transaction stuff changed, and I bet some of our use of the decorators etc was not really right to begin with.

So this will be more like "making what we tried do what it's intended to do" than a straight-forward search-and-replace.
Priority: -- → P2
For the record we're not on the django-1.4 branch [0]. We're on the master branch but stuck at 32967d8 [1] which means that if we... `cd vendor; git pull origin master` we're going to be on Django 1.5.6 (or Django 1.5.8 once [2] lands).

Both 1.4 and 1.5 had security releases. We could sweep our technical debt and switch to the django-1.4 branch of playdoh but that's likely not going to be sustainable. My advice would be to head straight for Django 1.5 and fix what breaks. 

The a10n side of the elmo code scares me and I didn't write the complex transaction stuff you speak of :)

[0] https://github.com/mozilla/playdoh-lib/tree/django-1.4
[1] https://github.com/mozilla/playdoh-lib/tree/32967d88be03f4fac91f046fc04bb442b194589b
[2] https://github.com/mozilla/playdoh-lib/pull/63
Seems the transactions are on 1.6. We don't have 1.6, right?

a10n uses its own django through requirements.txt. We'll need somewhat sync that with elmo, as it's calling elmo-hosted django code, notably

life.models.Forest.objects.filter
life.models.Locale.objects.get_or_create
life.models.Repository.objects.create
life.models.Repository.objects.filter
life.models.Repository.objects.get
pushes.utils.PushJS
pushes.utils.handlePushes

More to the point. What really bothers me about reviewing vendor updates is the baggage of non-django stuff we're changing.

Do we have a list of stuff we're using?

I think it's wrong to take all those as part of a django security update. I see two ways out:
- have a calendar item to update vendor, so that if we're caught in a firedrill, the list of changes is small
- drop using django-lib, and just pull in the things we need directly.
(In reply to Axel Hecht [:Pike] from comment #2)
> Seems the transactions are on 1.6. We don't have 1.6, right?
> 
Correct.

> a10n uses its own django through requirements.txt. We'll need somewhat sync
> that with elmo, as it's calling elmo-hosted django code, notably
> 
> life.models.Forest.objects.filter
> life.models.Locale.objects.get_or_create
> life.models.Repository.objects.create
> life.models.Repository.objects.filter
> life.models.Repository.objects.get
> pushes.utils.PushJS
> pushes.utils.handlePushes
> 
> More to the point. What really bothers me about reviewing vendor updates is
> the baggage of non-django stuff we're changing.
> 
You and me both brother! It really irks me. However, to this day it hasn't been a cause of major pain. 

> Do we have a list of stuff we're using?
> 
> I think it's wrong to take all those as part of a django security update. I
> see two ways out:
> - have a calendar item to update vendor, so that if we're caught in a
> firedrill, the list of changes is small
That's almost in general the approach I take on my other django projects. I often just upgrade playdoh-lib when a new django comes out even if I'm 99% certain I don't need to. Usually the advantage is that the steps are smaller and more manageable that way. 

> - drop using django-lib, and just pull in the things we need directly.

I have a better idea. Something I've been brewing for months but constantly struggle to find a time slot for. Basically to drop playdoh-lib entirely and replace all of it with a big requirements.txt file E.g. https://bugzilla.mozilla.org/show_bug.cgi?id=980351
How would a update process/script look for an all-requirements-based setup? i.e., in scripts/update_site.py, what would we do?
(In reply to Axel Hecht [:Pike] from comment #4)
> How would a update process/script look for an all-requirements-based setup?
> i.e., in scripts/update_site.py, what would we do?

The way I envisioned it is just one requirements.txt and one dev-requirements.txt (only used by developers) and something like::

peep install --no-dependencies --install-option="--home=`pwd`/vendor" -r requirements.txt

And additionally `echo vendor/ >> .gitignore` and keep the funfactory hacks in manage.py so it reads dependencies from a local directory instead of the site-packages. By doing that we don't "pollute" site-packages which can be really useful if you're NOT using virtualenv.

I think there might be more flags to pip for good sanity but I haven't researched the exact details of that yet. 

Oh, and use of peep [0] is important since it'll prevent man-in-the-middle attacks on pypi's domains. 

[0] https://github.com/erikrose/peep/
I'm not opposed, doesn't seem to work, though.

For one, peep merrily ignores --install-option, though it does work with pip.

On the other hand, pip doesn't make any effort to actually check if stuff is already installed. Though peep apparently does.
Looks like we need a little more logic. There's a TODO in the code: https://github.com/erikrose/peep/blob/1381ec1feac0d16a08a7c74ae87c189b4f659946/peep.py#L138.
Erik, can you take a look at the PR I sent for the peep issue, https://github.com/erikrose/peep/pull/43 ?
Flags: needinfo?(erik)
Depends on: 1020373
Rephrasing to be more precise, and assigning to Adrian, as we just talked about it.
Assignee: nobody → adrian
Summary: Update vendor to take update to django 1.5.x, from django 1.4.x → Drop playdoh-lib in favor of peep and a requirements file
Status: NEW → ASSIGNED
Attached patch bug-958077-requirements.diff (obsolete) — Splinter Review
This patch: 

 * removes the vendor submodule and its content
 * adds a requirements.txt file containing all dependencies needed for  production
 * adds a requirements_dev.txt file containing dependencies needed for development
 * does not yet include peep.

This is how I set up my virtualenv: 

> virtualenv env
> source env/bin/activate
> pip install peep
> peep install -r requirements.txt
> peep install -r requirements_dev.txt

It works for me on Ubuntu 14.04. I have no way to test it on Mac though, so I would greatly appreciate your help. Notably, I removed the --no-sites-package option from the documentation because it wasn't needed for me. If it's a Mac thing, please let me know. The wiki will need to get updated of course, I will take care of that once we have merged this. 

Regarding peep, I don't know how to proceed. The way I'm trying to go is to find a way to have an executable of peep downloaded somewhere in the repo, and to make the instructions use that executable directly. I can't get there however so far, but I'm working on it. If you think it's a bad idea or have a better way to do it, please let me know!
Attachment #8505977 - Flags: feedback?(l10n)
Comment on attachment 8505977 [details] [diff] [review]
bug-958077-requirements.diff

Review of attachment 8505977 [details] [diff] [review]:
-----------------------------------------------------------------

I think we should stick to the requirements/* files.

Also, the stuff in prod.txt doesn't need to be in the main requirements, because it's in vendor-local. I'd not throw those out yet.

I'm wondering, can we drop some requirements yet? Say Jinja? (oh gosh, it's imported in apps/commons/helper.py, which we don't use.) Maybe not.

Also, we need to install into the current elmo install, maybe create a new directory like vendor-peep, with a vendor.pth in there?

Reason for that is that we need to be able to run different versions of elmo within one apache wsgi server, i.e., with only one version of python. And thus no virtual env.
Attachment #8505977 - Flags: feedback?(l10n) → feedback-
Adrian, you have the right idea regarding keeping a copy of peep in your source tree: look under "Embedding" at https://pypi.python.org/pypi/peep/.
Attached patch bug-958077-requirements.diff (obsolete) — Splinter Review
New version, with peep included (thanks Erik! ) and all dependencies in the requirements/*.txt files. I have removed the dependencies that currently exist in our vendor-local/ directory because they are useless. 

Axel, I am not sure how to test for our production systems. I can't find a script that shows how we build it, is that documented somewhere? 

This new way of doing things doesn't seem too different from the previous one to me though, but maybe I'm lacking some information. At the moment we need to install the compiled requirements anyway, so I suppose that knowing how we do that will help me understand more. And I also suppose that installing compiled.txt + prod.txt shouldn't too different than just installing the former. And same for replacing the `pip` part with `./vendor-local/lib/python/peep.py`.
Attachment #8505977 - Attachment is obsolete: true
Attachment #8509285 - Flags: review?(l10n)
Comment on attachment 8509285 [details] [diff] [review]
bug-958077-requirements.diff

Review of attachment 8509285 [details] [diff] [review]:
-----------------------------------------------------------------

This is closer.

The deployment script is at https://github.com/mozilla/elmo/blob/master/scripts/update_site.py, and for anything that doesn't need to be compiled, that does the right thing. That's our deployment automation script, basically.

The idea is that with one python binary, you can run both dev and stage, so peep needs to be called from update_site.py in a way that it installs into a known local location.

I don't think I'd mind if the compiled dependencies end up in there, too.

I like that prod.txt gave us an idea what's installed in vendor-local/lib/python. I'm not bound to the file name, but the concept is nice.
Attachment #8509285 - Flags: review?(l10n) → review-
For reference, here's DXR's deploy script. It uses peep and installs into a new virtualenv each time. https://github.com/mozilla/dxr/blob/master/deploy.py
Pike, here's what I came up with regarding the update script and having all deps in one folder. One problem though, when using --target with pip, the second time you do it it will error out. The solution would be to remove the 'vendor-peep' folder every time we run the update script, is that acceptable? Or even wanted, since that ensures that all deps are correctly updated?
Flags: needinfo?(l10n)
I've been toying around with things locally a bit, and I can't help but think that we're hitting an aweful lot of crappy codepaths.

I'm running

 pip install -r reqs.txt --target=./foo --exists-action s

twice in a row, and I end up with

(peep)kochbuch:peep ahecht$ find foo/ -name \*-info
foo//feedparser-5.1-py2.7.egg-info
foo//feedparser-5.1-py2.7.egg-info/feedparser-5.1-py2.7.egg-info
foo//pytz-2012d-py2.7.egg-info
foo//pytz-2012d-py2.7.egg-info/pytz-2012d-py2.7.egg-info
foo//raven-3.1.16-py2.7.egg-info
foo//raven-3.1.16-py2.7.egg-info/raven-3.1.16-py2.7.egg-info
foo//South-0.8.1-py2.7.egg-info
foo//South-0.8.1-py2.7.egg-info/South-0.8.1-py2.7.egg-info

To quote Sheldon, in which universe is this a good thing? Note, pip 1.5.6 does this, peep is off the hook on this one.

I'm not sure I know a good way to hack around this. A possible solution is to just make the compiled.txt requirements a hard requirement of system upgrade, and move the pure-python deps into vendor-local.

I really think that re-installing everything everytime is a bad(TM) idea.
Flags: needinfo?(l10n)
From our discussion in the elmo meeting:

 * include all the pure Python dependencies into the code base, in ./vendor-local/
 * do nothing regarding dependencies updating in upgrade script
 * keep the requirements in text files for people using virtualenvs?
FWIW, I consider reinstalling from scratch each time a good pattern. There is, at present, no other way to make sure your deployed configuration matches only your stated requirements--obsolete dependencies can be left over, having unintended effects. Using a --download-cache reduces the network traffic to PyPI. (peep 2.0 doesn't support --download-cache at the moment, but there's a PR open.)
Here's a new proposal. I really don't like the idea of having all the dependencies in the code base on github. It makes update commits a mess, especially if you upgrade as part of a wider commit. So instead, I kept the previous behavior with a few differences: 

 * deps are put in the ./vendor folder (which funfactory already adds to the path, so no more work needed there)
 * the ./manage.py hack to get funfactory is reworked a little
 * whenever we do an upgrade, all dependencies are removed and reinstalled. As Erik stated, this is the best way to make sure no artifact was left in place, and all things are clean. And since we use peep, we know that the installed sources are good. 

Axel, what do you think? The only downside of this is that upgrading will get a bit longer, but that's an action that we only perform once in a while.
Attachment #8509285 - Attachment is obsolete: true
Attachment #8535524 - Attachment is obsolete: true
Attachment #8545267 - Flags: review?(l10n)
Looks good to me! I'm curious, though: what are all the embedded libs for?
Comment on attachment 8545267 [details] [diff] [review]
bug-958077-requirements-proposal.diff

Review of attachment 8545267 [details] [diff] [review]:
-----------------------------------------------------------------

Let's do this, but there are a few technical things we talked about on the phone.

I've put them into writing below.

::: requirements/embedded.txt
@@ +4,5 @@
> +# in ./vendor-local/lib/python/. There is no need to install them.
> +
> +elasticsearch==1.0.0
> +feedparser==5.1
> +ftp://ftp.tummy.com/pub/python-memcached/old-releases/python-memcached-1.47.tar.gz

IIRC, ftp doesn't work with pip anymore or so, I had a local change to switch this to http. Wanna do that as ride-along?

::: requirements/prod.txt
@@ +3,5 @@
> +# ---------------------------------------------------
> +# Loaded from PyPi
> +
> +# sha256: nwLQNXGE3h8JPBABK1LnRUoQCL5qXBhat6Mwes6x0S4
> +Babel>=0.9.4

I'm pretty sure this needs to be a strict version.

@@ +30,5 @@
> +# sha256: pucH2csXyL8eVTcTrRSzEnSoHVwM4PziGwKTbQ79fbs
> +html5lib==0.95
> +
> +# sha256: TmPMMyXedc-Y_61AvnL6aXU96CRpUXMXj3TANP5PUmA
> +nose==1.3.0

Can you check if nose and friends are needed in a wsgi environment?

I can see how ./manage.py serve would need them, but then they're good enough to be in dev.txt.

::: scripts/update_site.py
@@ +38,5 @@
> +    "./vendor-local/lib/python/peep.py install "
> +    "-r requirements/compiled.txt "
> +    "-r requirements/prod.txt "
> +    "--target=%s" % VENDOR_DIR
> +)

Let's install into a stage, and them remove vendor and move the stage to vendor.

That way, we have a smaller timewindow where there's no code on disk at all.

Can you file a follow-up bug to add caching once https://github.com/erikrose/peep/pull/61 is in? I'm asking now because it'd make a nice comment with a reference to the bug number.
Attachment #8545267 - Flags: review?(l10n) → feedback+
Blocks: 1121459
Blocks: 1121496
@Pike I have addressed your comments, except for the removal of unneeded dependencies from prod.txt. I have instead filed a new bug, bug 1121496, to do both that and the cleaning of useless deps. I hope that's fine.
Attachment #8548927 - Flags: review?(l10n)
Comment on attachment 8548927 [details] [diff] [review]
bug-958077-requirements-proposal-comments.diff

Review of attachment 8548927 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with two follow ups.

::: scripts/update_site.py
@@ -32,5 @@
>  
>  GIT_PULL = "git pull -q origin %(branch)s"
>  GIT_SUBMODULE = "git submodule update --init --recursive"
>  GIT_REVISION = "git rev-parse HEAD > collected/static/revision"
> -PEEP_CLEANUP = "rm -rf %s/*" % VENDOR_DIR

I'd keep this for TMP_VENDOR_DIR just to make sure. Defense in depth against previously failed attempts spilling into next installs etc.

@@ +42,5 @@
>      "-r requirements/prod.txt "
> +    "--target=%s" % TMP_VENDOR_DIR
> +)
> +PEEP_REPLACE_VENDOR = "rm -rf %s/* && mv %s %s" % (
> +    VENDOR_DIR, TMP_VENDOR_DIR, VENDOR_DIR

This should remove vendor completely, cause this way, you end up with vendor/vendor-tmp instead of the contents moved over.
Attachment #8548927 - Flags: review?(l10n) → review+
Commit pushed to develop at https://github.com/mozilla/elmo

https://github.com/mozilla/elmo/commit/dfd502324057b375f488e650734b89d857d2d5c9
Fixes bug 958077 - Moved all playdoh things into requirements text files, use peep to install deps. r=Pike
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Commit pushed to develop at https://github.com/mozilla/elmo

https://github.com/mozilla/elmo/commit/80a9c989983a1e1a0f7957b68ec7f56bcfde2fd5
bug 958077, let's remove vendor stage first, then install, then move to vendor, rs=foopy
Reopening. I can't get this to deploy, sadly.

The VM refuses to install a pip that's current enough, and I'm loosing my patience right now, it just doesn't want to compute. There's various pips and stuff, but the peep installer picks up a bad one, and I broke one, and well, it's basically a pile of crap right now.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'm surprised. Peep works with any pip>=0.6.2. That's really, really old.

When you install peep, pip does not automatically get installed. (How would its hash get checked, after all?) Rather, peep uses whatever pip an "import" delivers. Get a Python prompt in whatever virtualenv (or not) you're using, and see what version of pip you have:

> import pip
> pip.__file__

...and then go track down that file and see what the version is. There's a pip.__version__ on modern pips.
It finds 0.3.1, which is the package in the old ubuntu version we're running on. i.e., updating that === cans of worms.

There's a pip in /usr/local, 1.1, but require() didn't find that.

And then there's a ton of stuff in the package manager that just frustrates me tonight.
Peep actually imports and calls pip routines; it doesn't shell out. That's why it's not using the copy in /usr/local. Likely the pip in /usr/local is just a stub. If you examine it, you'll find a reference back to a packaged pip somewhere. If you add its containing folder to the PYTHONPATH before calling peep, everything should work out.
Commit pushed to develop at https://github.com/mozilla/elmo

https://github.com/mozilla/elmo/commit/dfb184440ac28b73344708730d310ec6eb241293
bug 958077, disable a few packages in compiled.txt, rs=foopy
The current prod VM can't get MySQL-python nor python-ldap compiled,
but has local packages for that.
All environments have been successfully upgraded by Pike!
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
No longer depends on: 1020373
Blocks: 1144664
Product: Webtools → Webtools Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: