Bug 1396154 (Open) · Opened 7 years ago · Updated 2 years ago

Make it easier to build optimal Docker images

Categories

(Firefox Build System :: Task Configuration, task)

Tracking

(Not tracked)

People

(Reporter: gps, Unassigned)

Details

Attachments

(3 files, 1 obsolete file)

Using Dockerfiles for building Docker images results in a lot of copy pasta. This results in complex Dockerfiles, images diverging from one another, best practices not being followed, etc.

I have some patches to introduce a new mechanism to build Docker images that will make things more consistent and easier to reason about. It will also provide an easier path forward for more deterministic Docker images.
Comment on attachment 8903855 [details]
Bug 1396154 - Extract finding archive files to own function;

https://reviewboard.mozilla.org/r/175608/#review181430
Attachment #8903855 - Flags: review?(dustin) → review+
Comment on attachment 8903856 [details]
Bug 1396154 - Docker image to build a standalone Python tarball;

https://reviewboard.mozilla.org/r/175610/#review181534

::: commit-message-06239:1
(Diff revision 1)
> +Bug 1396154 - Docker image to build a standalone Python tarball; r?dustin

I don't understand why this is a Docker image.  Really, it's a shell script that runs in the ubuntu:16.04 docker image.  It just happens to create a final image that has a useful tarball in it.

This would probably be better represented as a script (perhaps the first of many) that *uses* Docker to build a Python binary.  Alternately, it could be a task definition that will create a Python binary (with `image: ubuntu:16.04` and `command: 'curl raw/url/of/run.sh > run.sh && bash run.sh'`)

The latter would be pretty cool, actually, since it suggests the possibility of eventually generating all custom-built packages within taskcluster *and* doing so in a way where dependencies would cause them to be regenerated as necessary. Not something that needs to happen immediately, but the sense of anticipation it creates would be nice.
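
For concreteness, a minimal sketch of such a task definition. The payload shape follows docker-worker conventions; the artifact name and path are placeholders, not something from an actual patch:

```
task:
  payload:
    image: ubuntu:16.04
    command:
      - /bin/bash
      - '-c'
      - 'curl raw/url/of/run.sh > run.sh && bash run.sh'
    # Assumption: run.sh leaves the tarball at a known path so it can
    # be exported as a task artifact.
    artifacts:
      public/python.tar.gz:
        type: file
        path: /image-build/python.tar.gz
```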
Attachment #8903856 - Flags: review?(dustin) → review-
Attachment #8903877 - Attachment is obsolete: true
Comment on attachment 8903857 [details]
Bug 1396154 - Support building Docker images without Dockerfile;

https://reviewboard.mozilla.org/r/175612/#review181548

This is definitely interesting!

It's starting to feel like AnsibleAgain 0.1.  I think we should either go with something built of simple composable parts and not too many magic special cases; or go with something like Ansible or Chef or Puppet.  This already feels like it's inventing a language, and I'm sure it will quickly get a few notches more complex once it lands.  The special cases I'm thinking of are specifically things like the "tooltool-manifests" key or the shell-helpers file.

Looked at another way, I think we need to make a choice: is this the domain of a few people who understand system configuration deeply and have built/adopted tools to do it effectively, or is this an area where we expect and invite any dev to be able to make changes easily?  Or, what sort of changes do we want to make "easy"?

The problem with Ansible and Puppet is, they are not terribly composable.  Ansible's idea of variables is about as well-thought-out as PHP5, and roles are not the reusable components you want them to be.  Puppet is a little better for composability, but of course then you need a Ruby runtime.  We have both Puppet and Ansible knowledge within releng / relops.  I don't know much about Chef.

Considerations for other platforms apply, too -- the more easily this can evolve into supporting various OSes and hardware, the better.  It would be wonderful to have one common way of describing the build/test environments we use, even if deployment of changes still takes some specialized knowledge and/or access.

I'd like to mark this f+, but it's not something I'm comfortable landing as-is.

::: taskcluster/docker/lint/image.yml:20
(Diff revision 3)
> +extra-files:
> +  - tools/lint/python/flake8_requirements.txt
> +  - tools/lint/tox/tox_requirements.txt
> +
> +local-recipes:
> +  - system-setup.sh

Could all of these be recipes, with the recipes taking arguments?

```
recipes:
  - install-fzf.sh
  - install-node.sh
  - python-packages.sh:
    - requirements: tools/lint/python/flake8_requirements.txt
    - requirements: tools/lint/tox/tox_requirements.txt
  - node-packages.sh:
    - tooltool-manifest: tools/lint/eslint/manifest.tt
    - tooltool-manifest: tools/lint/eslint/eslint-plugin-mozilla/manifest.tt
```

As a random additional thought, what if this generated a shell script and then ran it with `set -x`?  That might make debugging a bit easier (and diffing `before.sh` and `after.sh` would be useful during development).

This approach might additionally let us generate scripts tailored to the specific applications of the image.yml -- run on startup of a new image for OS X, run in a new image for Docker, maybe embedded in a PowerShell script for Windows...
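
For illustration, a generated script under that scheme might look roughly like this (the recipe names come from the example above; the /setup paths and the argument-passing convention are hypothetical):

```
#!/bin/bash
# Hypothetical generated driver: echo each command (-x) and stop on failure (-e).
set -xe
bash /setup/install-fzf.sh
bash /setup/install-node.sh
# Recipe arguments from image.yml passed via the environment (assumption).
REQUIREMENTS=tools/lint/python/flake8_requirements.txt bash /setup/python-packages.sh
```

Diffing two such generated scripts would show exactly how a change to image.yml affects the build.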

::: taskcluster/docs/docker-images.rst:132
(Diff revision 3)
> +Recipes should not:
> +
> +* Download anything via ``curl``, ``wget``, etc. All downloads should be
> +  mediated through supported and secure download channels (namely tooltool
> +  and the system package manager).
> +* Invoke the system package directory. Use the wrapper exported from the

directly
Attachment #8903857 - Flags: review?(dustin) → review-
Comment on attachment 8903857 [details]
Bug 1396154 - Support building Docker images without Dockerfile;

https://reviewboard.mozilla.org/r/175612/#review181548

Thanks for the review and the thoughts.

Repeating what I said in our Vidyo chat earlier, I think configuration management tools like Ansible, Chef, and Puppet - while they purport to solve this problem space - are heavyweight solutions that actually address a slightly different space: system "normalization," i.e. given a system in undefined state X, make it have state Y. For one-time operations starting from a known state, most of the complexity disappears and I don't think the use of tools like Ansible and Puppet is justified. If we weren't talking about immutable Docker images, my opinion would be different.

Regarding considerations for other platforms, I'm going to call YAGNI. Until we actually have in-tree management of Windows and Mac system configs, I'm not too interested in contemplating a unified solution that addresses those platforms. Even if we were there, Windows is so different from Linux that I'm not sure we should use the same tool even if we could. This opinion is driven by what I've perceived to be a lackluster reputation of Puppet and similar tools on Windows.

My primary goal for this "framework" is to make the common things simple and consistent while not infringing upon flexibility to do complicated things. We shouldn't require boilerplate in Dockerfiles and integration in per-image scripts to achieve optimal results. I want the barrier to creating and managing images to be low.

Because I want to focus on simplicity, I don't want to invent a mini language or reinvent many wheels. I want a very focused set of primitives that are obvious when you see them - you shouldn't need to consult the docs when you see a YAML file to know what each line does if you know a thing or two about how Docker+Linux works.

When I step back and look at what existing images do, I see a few core activities. These include:

* Install system packages
* Build/compile custom things
* Download and install things from tooltool
* Install non-system packages (Python, Node, etc)
* Copy files into well-defined locations

Squinting really hard, you could likely reduce all these to "install a set of packages." I'd very much like our end state to have standalone tasks for producing system packages and then for the Docker images to install these packages. e.g. instead of downloading Mercurial from tooltool, copying robustcheckout.py, and setting up a custom hgrc, we'd have a mozilla-mercurial .deb package that did this for us. With some Taskgraph magic (or a custom Apt server/repo) we could do this. It is out of scope for now. But it is definitely something I'm thinking about.

My dream end state is for every file in the final image [used to build Firefox] to be built from source in a manner that is verifiably reproducible and deterministic. Once we start pivoting more towards Debian for images (which has a strong reproducible builds effort), I'd like to connect with some Debian packagers and see what we'd need to do to make our custom packages and Docker image content compatible with their world order. But that's for another day...
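
To make that end state concrete, a hypothetical image.yml in that world might reduce to little more than a package list (the key names here are illustrative only, not part of any patch):

```
# Hypothetical: each package below would be produced by its own task
# and installed via the system package manager, not fetched ad hoc.
flavor: debian9
system-packages:
  - mozilla-mercurial    # bundles hg, robustcheckout.py, and the custom hgrc
  - mozilla-python2.7
```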

> Could all of these be recipes, with the recipes taking arguments?
> 
> ```
> recipes:
>   - install-fzf.sh
>   - install-node.sh
>   - python-packages.sh:
>     - requirements: tools/lint/python/flake8_requirements.txt
>     - requirements: tools/lint/tox/tox_requirements.txt
>   - node-packages.sh:
>     - tooltool-manifest: tools/lint/eslint/manifest.tt
>     - tooltool-manifest: tools/lint/eslint/eslint-plugin-mozilla/manifest.tt
> ```
> 
> As a random additional thought, what if this generated a shell script and then ran it with `set -x`?  That might make debugging a bit easier (and diffing `before.sh` and `after.sh` would be useful during development).
> 
> This approach might additionally let us generate scripts tailored to the specific applications of the image.yml -- run on startup of a new image for OS X, run in a new image for Docker, maybe embedded in a Powershell script for Windows...

Using a generated shell script to wrap with `set -x` is a good idea - especially if all recipes are shell scripts (which they are at the moment). That said, the Python driver already validates the exit code, so wrapping for wrapping's sake doesn't add any benefit.

Regarding adding arguments, this is more complexity and something I am squeamish about. Yes, primitives like creating Python virtualenvs and installing Node packages are likely a good idea because we do these things a lot. I think those should be exposed as top-level primitives in the YAML instead of exposed via recipes. I like the idea of a recipe as a standalone executable/script that does one thing and does it well. If you need to customize, hoist the concept into the YAML or define variables in the YAML that can be exposed to a recipe via a written JSON file or a sourced autogenerated shell script or something. I think autogenerated shell scripts start to infringe on the Puppet/Chef/Ansible domain space, and if we go that route we should use one of those tools.
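
As a sketch of that "variables in the YAML" idea (key names and paths entirely hypothetical): the driver would write the declared variables to a file that recipes source, keeping the recipes themselves argument-free:

```
# image.yml (hypothetical):
#   variables:
#     NODE_VERSION: 8.4.0
#
# /setup/variables.sh, generated by the driver and sourced by each recipe:
export NODE_VERSION='8.4.0'
```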
I think we differ on the meaning of "simple".  I want something that says, "to build this image, do A, B, and C in that sequence", possibly with some detail as to what A, B, and C are.  Whether to use

  - install-python-packages:
    - requirements: tools/lint/python/flake8_requirements.txt

or

  - install-flake8-packages

is a matter of taste. Either one "reads" correctly, as an imperative sentence suggesting the action that will be performed.  On the other hand, 

  tooltool-manifests:
    - tools/lint/eslint/manifest.tt

isn't a verb.  What does it do?  Does it include that manifest somewhere similar to extra-files?  Does it download the tooltool packages?  Where, to a cache?  To a special location?  Does it unpack them?  Again, where?  Does this run before or after the recipes?  Does order in the YAML file matter?

Put another way, I think we should have one primitive: run a recipe.  And those primitives should be combined in one way: put them in a list and expect them to be performed in that order, starting with an image named by the flavor.
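
A sketch of that model (names illustrative):

```
flavor: ubuntu1604          # base image to start from
recipes:                    # the only primitive; run strictly in list order
  - install-fzf.sh
  - install-node.sh
  - install-flake8-packages.sh
```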
Comment on attachment 8903856 [details]
Bug 1396154 - Docker image to build a standalone Python tarball;

https://reviewboard.mozilla.org/r/175610/#review181834

::: taskcluster/docker/python-ubuntu1604/run.sh:31
(Diff revisions 1 - 2)
> +rm -rf Lib/test
>  make -j$(grep -c ^processor /proc/cpuinfo) install
> +# Don't need the static library.
> +find /image-build/python-bootstrap -name libpython${SHORT}.a -exec rm {} \;
> +# Don't need .pyo files.
> +find /image-build/python-bootstrap -type f -name '*.pyo' -exec rm {} \;

I like this!
Attachment #8903856 - Flags: review?(dustin) → review-
Thanks :pmoore for adding me on CC (I just came back from parental leave).

In mozilla-releng/services[1] we are building Docker images using Nix[2] (a minimal sketch follows the list below). The approach is:
 - declarative (compared to Dockerfiles)
 - reproducible (currently only build reproducibility, but when needed we can start checking for binary reproducibility)
 - minimal/optimized (you only include the packages you need)
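
The sketch, using nixpkgs' dockerTools (see [2]); the image name and package set are illustrative:

```
{ pkgs ? import <nixpkgs> {} }:
# Builds a Docker image tarball that can be loaded with `docker load`.
pkgs.dockerTools.buildImage {
  name = "lint-image";
  contents = [ pkgs.python27 pkgs.nodejs ];  # only what you list ends up in the image
  config.Cmd = [ "/bin/sh" ];
}
```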

I discussed with :rail some time ago making a recipe to install Nix using Puppet, but time never allowed us to work on it[3].
Anyway, in my view it makes sense to use Nix, since it also opens up other usage options, especially in the area of reproducibility. Of course, unfamiliarity with the tool is always going to be a problem, whatever we choose :)

Would a solution in this direction be of any interest?


[1] https://github.com/mozilla-releng/services
[2] https://nixos.org/nixpkgs/manual/#sec-pkgs-dockerTools
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1344413
Flags: needinfo?(dustin)
I don't think I'm high on the list of deciders for that question, but since you asked: yes, that sounds appealing, especially in that it could be used both on hardware and in Docker images, ensuring we have the same image in both places.

One issue I see is that we are currently assuming an Ubuntu environment, and the tests are very sensitive to the window manager and other bits of desktop in a way that may take a lot of fiddling if we switch to a more generic wm/desktop.
Flags: needinfo?(dustin)
:dustin thanks, I noticed :gps is on vacation so I pinged you :)

- Apart from running builds/tests on hardware/Docker, Nix could also be used in local environments, e.g. ./mach bootstrap --nix. This could also be a nice setup to build/test m-c against different sets of compilers (or other parameters); we already have this in place in mozilla/nixpkgs-mozilla[1].

- I might be wrong, and I'm definitely not familiar with the details, but I see this tight coupling to an Ubuntu environment as tech debt which needs to be fixed at some point. A few times I got suggestions like: don't touch this, it is working and we don't know why :P

- Nix could also make bootstrapping on OS X easier, but that is another wormhole which I don't think we should step into on day one.


Definitely, Nix might not be a solution for everything, especially initially, but it wouldn't be smart to ignore it, since if it works it might bring us a lot of flexibility you cannot gain with other tools. Anyway, let me know if this Nix stuff would be of any interest.


[1] https://github.com/mozilla/nixpkgs-mozilla/blob/master/release.nix#L91
I agree, it's a promising tool for this purpose.

And I agree that the close ties to the vagaries of Ubuntu kind of stink, but (a) it's intentional, in that Ubuntu is the most common user environment, and (b) fixing it requires a lot of fiddly changes to tests that generally 0 or 1 people are familiar with -- a process we're going through painfully with Windows 10 right now.  So changing that is a question that needs some careful thought at higher levels of the Firefox org.

Maybe we could start with builds as a proof of concept?  The build environment is pretty simple, and most of the variable stuff we include in builds comes from toolchain tasks.  If you could put together a nix-based docker image under taskcluster/docker that can produce passing linux builds, that would be pretty persuasive.
(In reply to Dustin J. Mitchell [:dustin] from comment #18)
> I agree, it's a promising tool for this purpose.
> 
> And I agree that the close ties to the vagaries of Ubuntu kind of stink but
> (a) it's intentional, in that Ubuntu is the most common user environment,
> and (b) fixing it requires a lot of fiddly changes to tests that generally 0
> or 1 people are familiar with -- a process we're going through painfully
> with windows 10 right now.  So changing that is a question that needs some
> careful thought at higher levels of the Firefox org.
> 
> Maybe we could start with builds as a proof of concept?  The build
> environment is pretty simple, and most of the variable stuff we include in
> builds comes from toolchain tasks.  If you could put together a nix-based
> docker image under taskcluster/docker that can produce passing linux builds,
> that would be pretty persuasive.

We have a good start for this with the android-build image, which moved the Android builds from CentOS 6 (!) to Debian (using snapshots).  I don't think it would be much work to extend android-build to a desktop-build replacement image.

I don't think that nix is an _obvious_ fit for the Android build images, but that's not the same as a _poor_ fit.  Just that the set of people who care about Android are very unlikely to know about nix.
:nalexander I understand that there are already existing better Dockerfiles which would only need a little tweaking to be repurposed. When I have my drama moments I usually say that "Docker is fine. Dockerfiles are the hidden evil." :) Without going into too many details, I think we can all see that the imperative nature of Dockerfiles and the linear composition of image layers are far from optimal, especially in bigger setups. And sadly the results only show up after some time of using Docker. Of course, there are ways to fix these problems, but I didn't find any (until Nix) that actually prevent them from happening.

I expect nobody to know about Nix. But if we can get enough benefits from Nix (which might be in other areas as well), we might actually seriously consider it. I understand that this is a bit far-fetched, but I find building Docker images a small and isolated enough task to get the initial feeling of how things could work with Nix.


:dustin As a proof of concept I'm just going to add a Docker "output" to the already existing geckoDev[1] Nix expression (read: Nix recipe) and then paste the commands to reproduce the build process here.


[1] https://github.com/mozilla/nixpkgs-mozilla/blob/master/pkgs/gecko/default.nix
The problem with Nix is that if you build Firefox in a Nix environment, you get a Firefox that only runs in a Nix environment. But what we want is a Firefox that runs on CentOS, Red Hat, SUSE, Debian, Ubuntu, etc. And Nix doesn't give you that, except if you run patchelf to, ironically, do the opposite of what it was written for (it was written to change the ELF interpreter from /lib/ld-linux.so.2 to whatever it is in Nix; here it would be needed to do the exact opposite). Having to run patchelf is not really appealing.
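
For reference, the patchelf invocation being described would look something like this (the interpreter path is the standard x86-64 glibc one; the binary name is illustrative):

```
# Rewrite a Nix-built binary to use the standard FHS ELF interpreter.
patchelf --set-interpreter /lib64/ld-linux-x86-64.so.2 ./firefox
# Verify the change.
patchelf --print-interpreter ./firefox
```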

FWIW, I started actively looking at bug 1399679.
:glandium

Nix was never designed to use patchelf, nor was it designed to avoid it :P In Nix we tend to build everything from source; when that is not possible or when we are lazy, we resort to patchelf. Many - including me - thought the patchelf approach would be hard to maintain. But that was until I tried it :) Funnily, when you have a (build-)reproducible environment, using patchelf is as stable as building from source. I didn't believe it until I tried it.

Going the other way, running patchelf on Nix-built binaries also works - something I usually do with Rust and Go binaries to make them work on other distros. I assume you won't believe it works reliably until you try it; I know I would be skeptical.

As I said, Nix wasn't designed for or against the use of patchelf; patchelf just works more reliably with Nix due to the (build-)reproducible environment.

(Different approach) There is also another way to build binaries, which you also proposed in a comment on bug 1399679: use a chroot environment. Nix can also build an FHS-like chroot environment. :nbp already played with this idea[1] for his use case. Using an FHS-like structure would also produce FHS-like binaries. We already use this approach to run static analysis on the code, by providing a "gecko-env" tool which runs its arguments inside the gecko environment, e.g. "gecko-env ./mach configure"[2].

It is hard to provide a fully working replacement in my free time, but with mozilla-releng/services[3] and nixpkgs-mozilla[4] I think we already proved that there is huge potential which we should explore more. I'm sure that not everybody is as "high on Nix" as I am :) - which is fine - but the potential benefits are too high to just ignore Nix.

Maybe an introductory session/meeting at the Austin all-hands would be a nice step toward giving Nix a closer look. Building Docker images is where Nix shines, and it is a small enough surface to give Nix a try.


[1] https://github.com/mozilla/nixpkgs-mozilla/blob/master/pkgs/gecko/default.nix#L90
[2] https://github.com/mozilla-releng/services/blob/master/src/shipit_static_analysis/shipit_static_analysis/workflow.py#L139
[3] https://github.com/mozilla-releng/services
[4] https://github.com/mozilla/nixpkgs-mozilla

:dustin

I couldn't find time to put together an example Docker image; my current workload is too high. I would like to look at it during the Austin all-hands.
Product: TaskCluster → Firefox Build System
Assignee: gps → nobody
Status: ASSIGNED → NEW
Severity: normal → S3