Switch local development to use Docker
Categories
(Tree Management :: Treeherder, enhancement, P2)
Tracking
(Not tracked)
People
(Reporter: emorley, Assigned: armenzg)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
Comment 1 • 9 years ago (Reporter)
Docker is also now (slowly) getting less painful on Windows - eg the docker client now supports it (you couldn't run docker commands outside the boot2docker VM previously): http://azure.microsoft.com/blog/2014/11/18/docker-cli-for-windows-clients/ Also boot2docker-cli is being replaced by Docker Machine (https://docs.docker.com/machine/) so the Windows story is looking hopeful. Microsoft are contributing quite a bit towards Windows compatibility. As such, my concerns over using docker (from the painfulness of both using it personally and for contributors on Windows) are diminishing by the day :-)
Comment 2 • 7 years ago (Reporter)
I've filed issues for the various quirks encountered so far:
https://github.com/docker/toolbox/issues/656
https://github.com/docker/machine/issues/4115
https://github.com/docker/hub-feedback/issues/1066
https://github.com/docker/hub-feedback/issues/1070
https://github.com/travis-ci/docs-travis-ci-com/issues/1258
Comment 3 • 7 years ago (Reporter)
Some more issues filed / PRs opened:
https://github.com/travis-ci/docs-travis-ci-com/issues/1267
https://github.com/travis-ci/travis-build/pull/1102
https://github.com/moby/moby/issues/12641#issuecomment-314428616
https://github.com/docker/docker.github.io/issues/3854
https://github.com/docker/docker.github.io/issues/3857
https://github.com/travis-ci/travis-ci/issues/6418#issuecomment-314280366
Comment 4 • 7 years ago (Reporter)
Dustin couldn't get the current Vagrant environment working. Between that and a few other issues people have had, it would be good to make a Docker solution available. Docker seems to have finally overtaken Vagrant in terms of project activity/reliability/...
Comment 5 • 6 years ago (Reporter)
Another instance where a Docker/Docker Compose based solution would be preferable:
<igoldan> I think there's a bug on the celery workers now
<igoldan> https://bugzilla.mozilla.org/show_bug.cgi?id=1395356#c54
<igoldan> I checked out an old revision and didn't reproduced it; so it's not my local env
<emorley> you need to run provision to install the new dependencies
<igoldan> for vagrant, you mean?
<emorley> Yeah - from the host run `vagrant provision`
<emorley> the list of python and JS packages is regularly changing, so it's good to run provision semi-regularly (or at least whenever something isn't working) :-)
<emorley> If we were to switch to a Docker (and Docker Compose) based workflow, this would happen automatically
Comment 6 • 6 years ago (Reporter)
<Aryx> has anybody recently set up vagrant? was the download of the vbox from vagrant slow (20-60k/s)?
<emorley> I seem to remember it not maxing out my connection, but not quite that slow (more like 200-500kb/s)
<emorley> A docker image download from hub.docker.com would no doubt be faster hehe :-)
Comment 7 • 5 years ago (Reporter)
I'm unable to get Vagrant working on my new laptop (it hangs during provision; I'm presuming Virtualbox 6's new "work with Hyper-V" feature is not as complete as implied), so until we do this bug I don't have a way of working on anything backend related.
Comment 8 • 5 years ago
I would like to have Docker instead of Vagrant: the latter requires BIOS changes to enable some VirtualBox feature, and requires a physical network cable because of reasons. https://github.com/klahnakoski/treeherder-for-windows
Docker on Windows will get around the problems of Vagrant. The Docker script is effectively a set of instructions for how to set up Treeherder on a new machine; important instructions that I would have preferred over Vagrant. Finally, with Docker, we can overlay other images for development, specifically PyCharm's remote debugger, so debugging is easier.
Comment 9 • 5 years ago (Reporter)
Some thoughts/ideas from when I looked into this previously:
- Docker and docker-compose are much more reliable (especially on Windows) than they were a few years ago, whereas the number of issues seen with Vagrant seems to have increased. Between that, the fact that Taskcluster/other Mozilla projects extensively use Docker, and that it's generally more popular in the open source community, switching definitely makes sense from an ease of contribution point of view.
- Since Treeherder uses multiple external services (MySQL, Redis, RabbitMQ), a docker-compose based development environment that uses those projects' native Docker Hub images seems best.
- It would be great to use the same docker-compose project on Travis as works locally, since it would (a) avoid having to keep two environments in sync, and (b) mean that there is now actually test coverage for the development environment (currently the Vagrant scripts aren't tested).
- Heroku could eventually use a Docker (rather than buildpack) based solution too (bug 1506909), however the production and development Dockerfiles would likely need to be quite different (additional Python packages for the latter, as well as things like installing Firefox for Selenium), and unfortunately Docker doesn't yet support "including" one Dockerfile from another. As such, it's probably best to ignore the production case and write the Dockerfile with only development in mind for now (eg only `COPY` the bare minimum source into the image and `.dockerignore` everything else to avoid cache churn, and rely on docker-compose source directory bind mounts).
- To reduce the time taken for the initial Docker build, as well as how often the cache gets busted, careful thought will need to be given to how the main `Dockerfile` is written, particularly since Treeherder uses both Python and Node.js. Possible approaches might be:
  - Use Docker's new multi-stage builds feature, ie have an initial stage that uses `FROM node:...`, followed by the `FROM python:...` stage that copies across the Node binaries.
  - (Probably preferred) Separate out the Python and Node.js parts into separate docker-compose services, eg have the main service be the Python app, then another Node.js image-using service that just runs the `yarn start`.
- An additional complication (particularly if the separate Python and Node.js services approach is taken above) is Treeherder's Selenium tests. These require: Firefox, Geckodriver, the Python dev environment (since they are run using pytest), and a built UI (which requires Node for the `yarn build`). To avoid adding too much complexity to the main Python docker image, it might be worth seeing if https://github.com/SeleniumHQ/docker-selenium can be used alongside it. However those images are pretty heavyweight, so if used are probably best kept out of the main docker-compose.yml (running Selenium tests locally is not something we do as often), perhaps by having a separate docker compose file that `extends` the main one.
- As part of initial setup (and in fact any time new migrations are added), Django migrations need to be run (amongst other things). As such it will likely be necessary to have the main Python image use a custom entrypoint script that runs these prior to `exec`-ing the passed command (see official docker hub images for examples of this pattern), to replace what happens at the end of `vagrant/setup.sh`.
- Django's development server, `runserver`, currently doesn't gracefully handle the DB not being available at startup (see https://groups.google.com/forum/#!topic/django-developers/gNRC4IzInms). This means that if the MySQL docker container isn't ready (which is particularly likely at first run, since the mysql docker image performs a few additional tasks such as creating the empty database), then it will error. To avoid this, the custom entrypoint script mentioned above should also check that MySQL is ready, for example by using `! nc -z` in a while loop or similar.
- The Vagrant environment currently uses an `iptables` hack so that incoming requests get sent to runserver/webpack-dev-server even if they aren't listening specifically on 0.0.0.0 (this was performed this way to save people from having to manually pass CLI arguments to runserver every time). This can perhaps be replaced by having the Python app container's `CMD` be runserver with the appropriate network adapter binding arguments.
- Our Travis setup script currently moves the MySQL data directory onto a ramdisk (tmpfs) since it reduces the pytest suite runtime by 30%. As part of running tests in CI using the new docker environment, it might be worth experimenting with the docker tmpfs storage volume type to see if it similarly improves test runtime (given data loss isn't an issue in CI). Incidentally, I think the reason we see such a speedup is that we use Django's `transactional_db` fixture in places where we should really be using `db` (see bug 1348947), which iirc means Django doesn't use a simpler ~faked in-memory approach.
- It appears that Django's runserver doesn't respond to the default stop signal, SIGTERM, so docker-compose must send SIGINT instead to avoid waiting 10 seconds for the timeout. This can be achieved by using `stop_signal: SIGINT` in the compose config for the Python image.
- One way to speed up Travis (which if we're not careful will be slower than at present, since it has no native docker caching support) might be to have Docker Hub (now called Docker Cloud) automatically build the image from `master`, and then reference that image via the Docker `cache_from` feature, which means if there have been no docker-related changes, existing image layers can be re-used. This would also help speed up the time it takes for initial setup locally too. The downside appears to be that it then causes cache misses in other cases, unless workarounds are applied (see https://github.com/moby/moby/issues/32612), plus doesn't yet play nicely with multi-stage builds (see https://github.com/moby/moby/issues/34715).
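As a rough illustration of the layer re-use idea in the last point, the sketch below pulls a previously published image and passes it to `docker build --cache-from`. The image name is a made-up placeholder, the `run` dry-run helper is only there so the commands can be inspected without Docker, and how well the cache is honoured depends on the Docker version and builder in use, so treat it as a starting point rather than a recipe.

```shell
#!/bin/sh
# Hypothetical image name; nothing in this thread says where the image would be published.
IMAGE="${IMAGE:-example/treeherder-backend:latest}"

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_with_cache() {
    # The pull can fail if no image has been published yet; the build then runs uncached.
    run docker pull "$IMAGE" || echo "No published image found; building without a cache" >&2
    # Layers of the pulled image can be re-used when the Dockerfile steps are unchanged.
    run docker build --cache-from "$IMAGE" -t "$IMAGE" .
}
```

Calling `DRY_RUN=1; build_with_cache` prints the two commands that would be executed.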
I'll also see if I can tidy up my local experiment branch to push up to the Treeherder repo as a starting point.
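The entrypoint idea from the list above (wait for MySQL, run migrations, then `exec` the passed command) could look roughly like the sketch below. It is not the script from the experiment branch: the `DB_HOST`/`DB_PORT` variable names, the `mysql` host name and the 60-attempt limit are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical defaults: assumes the MySQL compose service is reachable as "mysql".
DB_HOST="${DB_HOST:-mysql}"
DB_PORT="${DB_PORT:-3306}"

# Succeeds once something accepts TCP connections on host:port.
probe() {
    nc -z "$1" "$2" 2>/dev/null
}

# Retry the probe once per second, giving up after the given number of attempts.
wait_for_port() {
    host="$1"
    port="$2"
    attempts="${3:-60}"
    while ! probe "$host" "$port"; do
        attempts=$((attempts - 1))
        if [ "$attempts" -le 0 ]; then
            echo "Gave up waiting for $host:$port" >&2
            return 1
        fi
        sleep 1
    done
}

# Only act when a command was passed, e.g. ./manage.py runserver 0.0.0.0:8000
if [ "$#" -gt 0 ]; then
    wait_for_port "$DB_HOST" "$DB_PORT" || exit 1
    ./manage.py migrate --noinput || exit 1
    # exec replaces the shell, so the command receives stop signals directly.
    exec "$@"
fi
```

Using `exec` for the final step matters for the stop-signal point in the list: without it the shell, not runserver, would be the process that docker-compose signals.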
Comment 10 • 5 years ago (Reporter)
Meant to say that a few recent changes have handily reduced the complexity for this bug:
- Treeherder's recent switch from Karma to Jest now means that a browser environment is no longer needed to run the JS tests, so the stock node image can be used for them (more commonly run locally than the selenium tests)
- The vagrant/Travis elasticsearch pieces were recently removed, since we decided to disable that feature locally/in CI, for improved prod parity (bug 1527868). As such it's no longer necessary to set up an elasticsearch compose service here, which sidesteps a few annoyances that I'd seen using their (bloated) docker images.
Comment 11 • 5 years ago
Comment 12 • 5 years ago
Comment 13 • 5 years ago (Assignee)
As part of moving from Vagrant to Docker I'm looking to run a couple of test groups within Docker and the other two outside of it. Running inside Docker can add 2-4 minutes to each run. I'm considering leaving js-test and python-linters outside Docker to get those results back faster while running python-tests-main and python-tests-selenium inside it.
Here's the increase for python-tests-main [1][2] 4 min 24 sec --> 7 min 25 sec.
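For reference, the two durations quoted above work out as follows; this is just the arithmetic on the figures in this comment, not a new measurement.

```shell
#!/bin/sh
# Convert "minutes seconds" to a number of seconds.
to_seconds() {
    echo $(( $1 * 60 + $2 ))
}

before=$(to_seconds 4 24)   # python-tests-main before the Docker change
after=$(to_seconds 7 25)    # python-tests-main when run inside Docker
delta=$(( after - before ))
pct=$(( delta * 100 / before ))   # integer division, rounds down

echo "+${delta}s (${pct}% longer)"
# prints "+181s (68% longer)"
```

So the increase is just over three minutes, which sits within the 2-4 minute range mentioned.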
In the future we should be able to speed up the Docker/Travis set up:
http://atodorov.org/blog/2017/08/07/faster-travis-ci-tests-with-docker-cache/
In the future we will also need to publish the built images.
You can already test the Docker work by following these steps:
git fetch origin
git checkout -b docker origin/docker
git merge master
docker-compose up
[1] https://travis-ci.org/mozilla/treeherder/builds/522371686
[2] https://travis-ci.org/mozilla/treeherder/builds/523251453
Comment 14 • 5 years ago (Assignee)
When running the Python tests for slow tests or Selenium, do we want the Django migrations to be applied or not?
Comment 15 • 5 years ago (Assignee)
I have Travis running green:
https://travis-ci.org/mozilla/treeherder/builds/523558055
Comment 16 • 5 years ago (Assignee)
sclements: igoldan: This is ready to be tested.
If any of you could try celery tasks it would be great as I don't know what is supposed to happen with them.
I also don't know what the `Debug toolbar` in the settings is about:
https://github.com/mozilla/treeherder/pull/4901/files#diff-26d4413823b415e0e1902bf845ff7067L351
I've also asked for review in here:
https://github.com/mozilla/treeherder/pull/4901
Steps for testing (you should see these in the changes to the docs):
- Install Docker for Desktop if not installed (https://www.docker.com/products/docker-desktop)
- Checkout the branch from the treeherder repo (not my user repo) [1]
- `docker-compose build`
  - The first time this will take a long time since it will download a lot of images and build the Django app
  - In the future we can speed this up
- `docker-compose up`
  - This will start up all services (redis, mysql & rabbitmq), the frontend and the Django app. You can do code changes and it will reload the UI or the backend. You can load http://localhost:8000/docs/ for the backend and http://localhost:5000 for the UI. The backend will automatically apply the migrations upon first run. When you're done you can Ctrl + C
- `docker-compose run -p 8000:8000 backend bash`
  - This will start all services except the frontend. Your prompt will be within the Django app container. No migrations will be applied by default. You can apply them with `./initialize_data.sh`. You can start the backend like so `./manage.py runserver 0.0.0.0:8000`.
  - If you need to start the backend with an environment variable preset you can do so with `-e` (e.g. `docker-compose run -e KEY='hi!' backend bash`)
- `docker-compose run -p 5000:5000 frontend sh -c "yarn && yarn start -env.BACKEND=https://treeherder.allizom.org --host 0.0.0.0"`
  - This will only start the frontend. You can load it and I think it is pulling data from staging.
NOTES:
At any moment when you're done and want to shut containers down you can do so like this:
- Use `docker-compose stop` which will stop the containers w/o deleting them
- Use `docker-compose down` which will destroy the containers and the created network (add `-v` to also remove the volumes)
Starting the backend and the frontend with `0.0.0.0` is important for Docker to route things properly. Check in the commands above to see where it's used.
Troubleshoot:
If you can't load http://localhost:8000 then check the PORT mapping for the backend command. Run `docker container ls` and check that the container `treeherder-backend` has ports `0.0.0.0:8000->8000/tcp` rather than `8000/tcp`. Try again with `-p 8000:8000`. If that doesn't work and you're on Mac try downgrading Docker to https://download.docker.com/mac/stable/26764/Docker.dmg and see https://github.com/docker/for-mac/issues/3350#issuecomment-472141881 for more details.
[1]
git fetch origin
git checkout -b docker origin/docker
git merge master
Comment 17 • 5 years ago (Assignee)
Tasks post this bug:
- See what changes are needed for the Debug Toolbar in the settings.py file
- Try using tmpfs for the MySQL data directory which reduces pytest runtime by 30% (see deleted travis-setup.sh file)
- Automatically publish image to Docker Hub
- Faster Travis builds (http://atodorov.org/blog/2017/08/07/faster-travis-ci-tests-with-docker-cache/)
Comment 18 • 5 years ago
The Django debug toolbar provides an overlay when accessing APIs on localhost. It shows the SQL statement and how long the query took to execute, among other details. It's quite useful. The celery tasks are executed by Heroku via the Procfile so I'm not sure if they need to be tested. I can however test out one of the Django commands for the intermittents commenter (in test mode) - it's a celery task but also has a separate Django command so it can be manually run.
Edit: I just remembered the Commenter isn't a celery task anymore. It was originally, but we changed it so it's executed by the Heroku Scheduler instead. But, it'll still be worth making sure running Django commands work as expected. I'll test this out later today.
Comment 20 • 5 years ago (Assignee)
Assuming you're all happy about the PR, do you want it to land tomorrow (I'm PTO Friday) and back out if there are issues?
Or wait until I come back on May 20th to land it?
It should not bitrot easily while I'm away.
Comment 21 • 5 years ago
For anyone else who tries running the backend with an environment variable preset, the command is: `docker-compose run -p 8000:8000 -e DATABASE_URL=<read-only-replica> backend bash`
Comment 22 • 5 years ago (Assignee)
A note from IRC, for development, we can make the backend image also have node and yarn to only have a single dev environment. We could name it the dev image instead of the backend image.
On production we will use two different images since we want very slim images.
Comment 23 • 5 years ago
I've been experimenting with this. It seems to work pretty well. I've hit a couple hiccups, but I think they're pilot error on my part. I'll play with this more Monday.
Comment 24 • 5 years ago (Assignee)
This landed a few weeks ago.