Closed Bug 896023 Opened 11 years ago Closed 9 years ago

Publish archives of Linux mock/chroot build environments

Categories

(Release Engineering :: General, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Assigned: mrrrgn)

References

(Blocks 1 open bug)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1990] )

Attachments

(1 file)

I'm requesting that Release Engineering publish archives of the mock/chroot build environment constructed by the Linux builders. The purpose of this request is to allow developers to reconstruct the official Mozilla build environment without having access to a Linux build slave (access that is neither scalable nor available to all Mozilla contributors).

I'm thinking that Release Engineering could create a new buildbot job, probably daily - possibly as part of the Nightly job. This job (or mozharness script) would create the mock_mozilla environment just like build jobs/scripts create it today. Once the vanilla environment setup is complete, it creates an archive (.tar.bz2 or similar) of the content and uploads it to a public server (probably ftp.mozilla.org). Any developer could then fetch the archive, uncompress it, and chroot/mock into it, and build mozilla-central.
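A minimal sketch of the archive/restore round trip described above, assuming the mock environment already exists as a directory on disk. The function names are hypothetical and the upload step is left out; this is not the actual buildbot or mozharness code.

```python
import tarfile
from pathlib import Path


def archive_chroot(chroot_dir: Path, archive_path: Path) -> Path:
    """Pack the contents of chroot_dir into a .tar.bz2 archive."""
    with tarfile.open(archive_path, "w:bz2") as tar:
        # arcname="." stores paths relative to the chroot root, so the
        # archive unpacks into whatever directory the developer chooses.
        tar.add(chroot_dir, arcname=".")
    return archive_path


def unpack_chroot(archive_path: Path, target_dir: Path) -> None:
    """Unpack a fetched archive into target_dir, ready for chroot/mock."""
    target_dir.mkdir(parents=True, exist_ok=True)
    # Newer Python versions support extraction filters; "tar" keeps
    # ownership-independent tar semantics while blocking path escapes.
    kwargs = {"filter": "tar"} if hasattr(tarfile, "tar_filter") else {}
    with tarfile.open(archive_path, "r:bz2") as tar:
        tar.extractall(target_dir, **kwargs)
```

A real chroot contains device nodes and root-owned files, so both steps would likely need to run as root to preserve them faithfully.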

Bug 886226 tracks a similar effort. However, that would likely require RelEng to fully support things like the Yum packages repo. Since the mock_mozilla environment is the important piece for reproducing the Linux build environment (not the underlying operating system), I think publishing just the chroot/mock environment is sufficient for solving this problem of reproducible build environments. I certainly think it is easier than the alternatives.

Publishing a pre-built mock environment should also enable faster builds and more reproducible builds over time. For example, bug 851294 tracks the issue of excessive time spent populating mock environments during builds. I'm highly confident that decompressing a pre-built archive will be much faster than reconstructing a new mock environment on every build. We could even get smart and use things like filesystem snapshots and/or aufs to eliminate all but the initial decompress operation, shaving yet more time off build setup.

Since mock environments will be published somewhere central and will be strongly versioned (each archive would be its own version by definition), it would be possible for the tree to define which environment archive to use for building. For example, the change process for modifying the Linux build environment would involve publishing a new mock/chroot archive then telling the tree/build configs to use that new version. It would be easy to "go back in time" and build with an old environment.

Anyway, this is currently just a feature request without any real justification or priority behind it. I consider it a nice-to-have. I think a lot of developers would appreciate building in the same environment as the official automation - especially casual contributors who don't want to install dozens of packages just to build Firefox. I also think this would lead to optimizations in RelEng land and improvements in build reproduction over time.
Doing this would also help developers who are trying to reproduce test failures that occur on build slaves but aren't reproducible on developer machines, like bug 689291.  Even a one-off or on-demand publishing of such environments would be handy!
I think that this would be awesome.
I also think we'll hit a lot of non-distributable packages, which will definitely complicate things.
(In reply to Aki Sasaki [:aki] from comment #2)
> I think that this would be awesome.
> I also think we'll hit a lot of non-distributable packages, which will
> definitely complicate things.

Are you referring to Mozilla proprietary bits (such as signing keys) or legal issues around distribution e.g. GPL packages?
Flags: needinfo?(aki)
(In reply to Gregory Szorc [:gps] from comment #3)
> (In reply to Aki Sasaki [:aki] from comment #2)
> > I think that this would be awesome.
> > I also think we'll hit a lot of non-distributable packages, which will
> > definitely complicate things.
> 
> Are you referring to Mozilla proprietary bits (such as signing keys) or
> legal issues around distribution e.g. GPL packages?

More the latter, e.g. libraries/ndks/sdks.  The former would need to be excluded as well.
Flags: needinfo?(aki)
So, I guess we have a question for legal, then.

Gerv: We essentially want to distribute an archive of a minimal Linux operating system (likely by putting it on a public FTP/HTTP server somewhere) that reproduces Mozilla's official build environment so developers can use it. Presumably there's a cornucopia of licenses in there - everything from friendly BSD to GPLv3 to custom. There will be source files, compiled executables and shared libraries - everything.

What must we do to satisfy legal requirements around distribution of such a magnificent collection of software?
Flags: needinfo?(gerv)
gps: it depends what's in it! :-)

aki is right. If the build environment contains non-redistributable code, then, er, we can't redistribute it. 

The problem remains unscoped until someone can give me a list of the names of the software packages involved, and ideally a technical overview of how the environment is constructed, and what's included and not included.

Can we instead distribute the scripts for creating such an environment?

Gerv
Flags: needinfo?(gerv)
(In reply to Gervase Markham [:gerv] from comment #6)
> Can we instead distribute the scripts for creating such an environment?

This would be a nice thing to do, at least for the linux env, using something like vagrant and puppet.
We'd essentially be archiving/distributing a number of pre-built CentOS packages. I could give you the full list - but it will likely be several hundred packages long!

I /think/ what we're talking about is essentially the same as creating a derived Linux distribution. I naively think that if Linux distributions can exist, this proposed archive could exist.

Is it fair to assume that if a package exists in the official CentOS package repositories (and thus is being distributed today) that we only need to be concerned with packages that don't come from official CentOS repositories?

And, yes, we can definitely distribute the scripts for creating such an environment. This is essentially what is proposed in bug 886226 and it met some resistance. However, it deprives users of speed and convenience and is likely more work for Release Engineering, since it involves supporting a public package repository, etc. I believe this bug is preferred.
My preference would be to maintain a proper public repository of our packages rather than try and publish the chroots used by various build types.
(In reply to Chris AtLee [:catlee] from comment #9)
> My preference would be to maintain a proper public repository of our
> packages rather than try and publish the chroots used by various build types.

Even if we do that, we don't necessarily obtain other benefits described in the initial comment, such as time idempotency. I like publishing the raw chroots because they can be archived and restored without fear of them changing. As long as I have the archive of environment X, I can build with environment X. That's harder to guarantee when environment X depends on a hosted service (which can change over time).
A lot of the packages in such a distribution would be copylefted (e.g. the Linux kernel), and that would mean we'd need to provide the source as well as the binaries (even if the user didn't download the source). Source needs to be provided from the "same place" in GPL language. We can't just wave our hand at the CentOS repos and say "it's all over there somewhere".

Whereas if we publish a script/vagrantfile, the packages come from CentOS servers, and source is available from the same place. No problem.

If you say that the environment may change over time, then what this means (AFAICS) is that each time we created an environment, we'd need to assemble the relevant source packages into a big tarball and save it for download. I guess that would be possible, but it would be more work (in writing a script to assemble such a thing) and more disk space.

The "always reproducible" argument only works fully if all the code concerned is redistributable. (Otherwise, the environment we distribute will necessarily be incomplete and will need to be added to by the user.) I don't know if that's true; what does the environment need that is not under standard open source terms?

Gerv
Product: mozilla.org → Release Engineering
The mock environments are not built from repos "over there" - they're entirely built from repos we already mirror, publicly, at http://puppetagain.pub.build.mozilla.org/data/repos/yum.  So I don't think that the copyleft problem is an issue.

In fact, if there's some post-processing of the chroot (to remove the other files aki referred to), then it'd be trivial to fix up the yum repos list to use that public mirror URL instead of the internal URLs releng uses.  The internal URLs host exactly the same content, just in a more resilient and load-balanced fashion.
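The repo-list fix-up mentioned above could look roughly like the sketch below. It assumes the chroot's yum `.repo` files use plain `baseurl=` lines. The internal URL prefix is not given in this bug, so the caller has to supply it, and the function name is hypothetical.

```python
PUBLIC_MIRROR = "http://puppetagain.pub.build.mozilla.org/data/repos/yum"


def rewrite_baseurls(repo_text: str, internal_prefix: str,
                     public_prefix: str = PUBLIC_MIRROR) -> str:
    """Point baseurl entries at the public mirror instead of internal hosts."""
    out = []
    for line in repo_text.splitlines(keepends=True):
        key, sep, value = line.partition("=")
        if sep and key.strip() == "baseurl":
            # Replace only the first occurrence of the internal prefix;
            # everything after it (the repo path) is kept unchanged.
            line = key + sep + value.replace(internal_prefix, public_prefix, 1)
        out.append(line)
    return "".join(out)
```

Lines other than `baseurl` pass through untouched, so section headers and settings such as `enabled` or `gpgcheck` are preserved.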
Taras, Coop: https://brendaneich.com/2014/01/trust-but-verify/ generated renewed interest in this bug. Is this on anyone's roadmap? Should it be?
Flags: needinfo?(taras.mozilla)
Flags: needinfo?(coop)
It's on our longer-term radar for sure. Possibly Q2? We have a bunch of interns coming in May, and this seems like a big enough chunk of work to keep one of them busy, especially if we add in the "make releng infra use the published archives" part.

gps: curious how/whether this relates to our Docker discussions?
Flags: needinfo?(coop)
(In reply to Chris Cooper [:coop] from comment #15)
> gps: curious how/whether this relates to our Docker discussions?

This would most likely be a prerequisite to Docker.

When you create Docker images, you have 2 options:

1) Start from scratch and populate the image by running commands, possibly by using a Dockerfile
2) Import an existing archive's contents

The most flexible for us is #2 (this bug), as it doesn't tie us and our users to Docker.
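For option 2, a sketch of how a published archive might be turned into a Docker image with the `docker import` command. The helper names and the image name are hypothetical; the only Docker-specific piece is the `docker import <archive> <repository[:tag]>` invocation.

```python
import subprocess
from typing import List


def docker_import_command(archive: str, image: str) -> List[str]:
    """Build the argv for importing a chroot tarball as an image."""
    # `archive` may be a local file path or a URL, both of which
    # `docker import` accepts as its first argument.
    return ["docker", "import", archive, image]


def import_chroot(archive: str, image: str) -> None:
    """Run the import; raises CalledProcessError if docker fails."""
    subprocess.run(docker_import_command(archive, image), check=True)
```

Because the archive is a plain tarball, the same file can still be unpacked and used with chroot/mock by anyone who doesn't run Docker.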
As coop said, Q2 is a possibility. We are not going to be extending our buildbot system with non-critical features, since the CI world will be changing in Q2 with TaskCluster. Reproducible builds are in the post/on-top-of TaskCluster bucket atm.
Flags: needinfo?(taras.mozilla)
At least as far as bare-metal provisioning, we can probably provide machine images, or a simple process to create them, once we're in the post-TaskCluster world.  For example, we can provide public AMIs of our linux systems.  The main limitation is licensing for OSX/Windows, obviously.
Took a quick stab at this this weekend. Right now I have 14 distinct environments to publish, each around 250MB compressed. These represent the different mock environments for desktop, mobile and b2g desktop builds for various branches.

I'd like to upload these somewhere, but am having trouble figuring out a good naming scheme. Many of the environments are shared between different branches or build types. What's a reasonable way to name them, or link to them given branch/build type?
Can you list the 14 distinct environments? Perhaps we should identify the important ones, only publish them, and plan to consolidate down to that set in the future? Hard to know without seeing what the variances are.

To facilitate 3rd party verification of Firefox releases, we care about the environments used to create binaries published on the official release channels. That's Nightly, Aurora, Beta, Release, and ESR for desktop and mobile. Not sure about b2g. Need to see the list.

As for naming, it doesn't matter too much, IMO. Long term, I imagine we'll store the filename of the environment in the tree and have automation look up the filename at build time so each build is tied to a specific chroot environment (like how tooltool works). Otherwise, builds vary over time as the chroot changes and that's not good for consistency or verifiability.
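A sketch of the in-tree pinning idea, assuming a hypothetical JSON manifest that maps a build type to an archive filename and a SHA-256 digest. Neither the manifest layout nor the function names come from tooltool or from this bug.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pinned_environment(manifest_path: Path, build_type: str) -> dict:
    """Look up the archive pinned for build_type in the in-tree manifest."""
    manifest = json.loads(manifest_path.read_text())
    return manifest[build_type]  # e.g. {"filename": ..., "sha256": ...}


def check_archive(archive_path: Path, pin: dict) -> None:
    """Refuse to build with an archive that doesn't match the pinned digest."""
    actual = sha256_of(archive_path)
    if actual != pin["sha256"]:
        raise ValueError(f"{archive_path}: expected {pin['sha256']}, got {actual}")
```

Changing the build environment then becomes an ordinary tree change: publish a new archive and update the manifest entry in the same commit.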
this is the list of mock environments for firefox, fennec and b2g desktop builds. doesn't include b2g device builds ATM.
I decided not to name them at all.

I've published the set of environments to S3. There's a json file here:
https://s3.amazonaws.com/mozilla-releng-mock-archive/mock_archives.json

which describes each of the environments. You can use the 'name' parameter of each environment as the S3 key name of the tarball of the chroot. e.g. 

https://s3.amazonaws.com/mozilla-releng-mock-archive/b69e5d970bab401bd098ad732fcdd03f5dd665f8

these are .tar.xz files.
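A sketch of turning the JSON description into download URLs. It assumes the file is a top-level list of objects; only the 'name' field is described in this comment, so everything else about the layout is a guess, and the helper names are hypothetical.

```python
import json
from typing import List
from urllib.parse import quote

BUCKET_URL = "https://s3.amazonaws.com/mozilla-releng-mock-archive"


def archive_url(name: str) -> str:
    """The environment's 'name' doubles as the S3 key of its tarball."""
    return f"{BUCKET_URL}/{quote(name, safe='')}"


def archive_urls(manifest_text: str) -> List[str]:
    """Map each environment entry in the JSON text to its tarball URL."""
    environments = json.loads(manifest_text)  # assumed: a list of objects
    return [archive_url(env["name"]) for env in environments]
```

Since the keys carry no file extension, a downloader would need to save each file with a `.tar.xz` suffix itself.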

Please take a look and let me know if there are any tweaks I should make.
Assignee: nobody → catlee
I should mention that these environments do NOT contain packages I don't think we're allowed to redistribute, such as the android SDK and NDKs.
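One way such exclusions could be applied when packing a chroot is a `tarfile` filter, sketched below. The excluded paths are hypothetical placeholders, since the real locations of the SDK/NDK inside the mock environments aren't listed in this bug, and this is not necessarily how the published archives were produced.

```python
import posixpath
import tarfile
from pathlib import Path

# Hypothetical paths, relative to the chroot root.
EXCLUDED_PREFIXES = ("opt/android-sdk", "opt/android-ndk")


def make_exclude_filter(prefixes):
    """Return a tarfile filter that drops the given paths and their contents."""
    def exclude(tarinfo):
        name = posixpath.normpath(tarinfo.name)  # "./opt/x" -> "opt/x"
        for prefix in prefixes:
            if name == prefix or name.startswith(prefix + "/"):
                return None  # skipping a directory also skips its children
        return tarinfo
    return exclude


def archive_redistributable(chroot_dir: Path, archive_path: Path,
                            prefixes=EXCLUDED_PREFIXES) -> Path:
    """Pack chroot_dir as .tar.xz, leaving out the excluded paths."""
    with tarfile.open(archive_path, "w:xz") as tar:
        tar.add(chroot_dir, arcname=".", filter=make_exclude_filter(prefixes))
    return archive_path
```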
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1983]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1983] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1990]
Assignee: catlee → winter2718
Depends on: 1129789
We now have base docker images for Linux: https://github.com/mrrrgn/build-mozilla-build-environments

I'm using these as a foundation for publishing build specific containers.
12 environments published to: https://github.com/mozilla/build-environments
So, I'm going to call this one finished, though I'm going to continue tweaking this repository in other bugs (bug 1134637).
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: General Automation → General