Bug 1289812 (Open) -- Opened 8 years ago -- Updated 2 years ago

updated desktop-test images cause tests to fail

Categories

(Firefox Build System :: Task Configuration, task)


Tracking

(Not tracked)

People

(Reporter: gps, Unassigned)

References

(Blocks 1 open bug)

Details

From bug 1272629 comment 40, gps wrote:

> (In reply to Mike Hommey [:glandium] from comment #39)
> > Created attachment 8774935 [details] [diff] [review]
> > Packages differences between the build that succeeded with 0.1.6 and the
> > build that failed with 0.1.7
> >
> > As can be seen, there's a lot more than bison that has been changed... the
> > joys of docker.
>
> This makes me sad.
>
> The base image builds "FROM centos:6." So when we build that image, it will
> download from Docker Hub whatever the latest version of the centos:6 image
> is (and it gets updated every few days). That is not deterministic.
>
> Then when we do a `yum` operation as part of building the image, we sync the
> latest version of the yum package database from a 3rd party server and
> install the latest versions of packages.
>
> The reason why so many packages changed and the reason Valgrind broke is
> because our image building process isn't deterministic over time.
>
> So any time someone changes the base image, we run the risk of pulling in
> unwanted package version bumps that could break things. It is a ticking time
> bomb. And it went off in this bug.
>
> You can work around the issue by using the existing centos6-build and
> centos6-build-upd image tags and add binutils in the desktop-build image.
> Unfortunately, my work in bug 1289249 to overhaul the desktop-build Docker
> image won't be so fortunate. Furthermore, my work will make rebuilding the
> desktop-build Docker image more frequent, which means higher chances of
> random breakage due to unwanted package updates. For better or worse, I
> guess I'll need to figure out package pinning. Gah, scope bloat.

From bug 1272629 comment 41, Dustin wrote:

> There's no practical way to make builds of full linux installs like this
> deterministic (CentOS is bad, Ubuntu is worse!), because of those 3rd party
> package databases. Freezing the repos is, IMHO, impractical (we tried that
> with PuppetAgain and it's been a nightmare).
>
> We also need to balance the stability of a deterministic build against the
> need to keep the images up-to-date. Aside from the (admittedly minor)
> security issues with running out-of-date images, it can be very difficult to
> install new packages on a system that is using a years-old base image and
> repositories. Again, PuppetAgain has demonstrated this pretty clearly. For
> example, this bison install may have required either rebuilding bison
> against the frozen repository, or importing a lot more packages than just
> bison into a custom repository; and some of those additional packages may
> have been the cause of the valgrind issues. At any rate, with a frozen
> image there's a rapidly increasing disincentive to touch it as it becomes
> more and more out-of-date and the likelihood of bustage from an innocent
> modification grows without bound.
>
> I think the right place to get determinism is in using the same image for
> multiple builds.
>
> We can get the updates, as well as minimizing the number of extraneous
> changes pulled in when making changes like this, by rebuilding the docker
> image periodically (I had suggested weekly), in a distinct cset that can be
> backed out and investigated if it causes failures. We're not doing this
> yet, due to a shortage of round tuits, but I think it is the best way to
> balance the competing concerns here.
As you can see, we don't have determinism around system package management when generating Docker images used by TaskCluster automation and it is making upgrading some images difficult. It also undermines progress towards bug 885777. I'll add my own thoughts/proposal in a few minutes...
Depends on: 1272629
No longer depends on: 127629
I'm going to argue that ensuring base image stability over time is important. We had 2 examples in the past week where unwanted changes from rebuilding the base image slowed down projects. I don't want unwanted changes slowing down the development velocity of Firefox. Put another way, we shouldn't be scared to update base images. So on these grounds I'm not a fan of layering all images on top of a "golden" base image. (A "golden" base image also prevents others from reproducing our environment, which is a goal of bug 885777.)

As Dustin states, there's no practical way to make builds of Linux distros deterministic - at least not when playing by the normal rules, which are to run `yum update` or `apt-get update` to synchronize the package database with a remote server. That synchronization is inherently non-deterministic. So any deterministic process requires a package database that is frozen or never synchronizes.

I see the following general solutions:

1) Never update the package database on a central server that Mozilla controls. This isn't practical because you can never upload new versions of packages, including packages with security fixes.

2) Snapshot the package database used by image building, i.e. seed the image with a snapshot of /var/lib/rpm, /var/lib/yum, etc. I /think/ this should work. As long as package files don't disappear from servers (we'd have to run our own package server), in theory things will be deterministic over time. There might be issues from e.g. expiring GPG keys years from now. But there should be mitigations for that, especially if we run our own package server.

3a) Pin all package versions; don't allow unpinned packages to be installed.

3b) Bypass apt/yum and install packages via `rpm -i` and/or `dpkg -i`.

In 3a & 3b, we probably have "manifests" listing packages and their versions. Each Docker image logically has a single manifest/list (although it could be assembled by combining various recipes together). When we build images, we set package pinning for apt/yum (3a) or we download and install .rpm/.deb files manually (3b).

I kinda like 3b. One of the reasons I like it is because it acknowledges a reality that distro packaging for containers hasn't yet: containers aren't operating systems; they are run-time environments. apt/yum manage operating systems and all the bells and whistles that go along with them. A container is supposed to be a minimal environment to do a specific task. Running package post-install actions to e.g. configure system services is not needed in containers. I just want the files from the package, thank you very much. Yes, in rare cases you need those post-install actions (like ca-certificates generating a cert bundle), but I think that's the exception, not the rule.

Maintaining an explicit list of packages and their pinned versions gives us *complete* control over an image. I can guarantee you 100% that our current build image has unnecessary packages in it - packages pulled in via some stated dependency. These extra packages consume size (probably hundreds of megabytes), which means images take longer to create, upload, and download, and that slows down automation (not to mention poor souls downloading the images for local testing). If we bypassed the package dependency graph by explicitly enumerating packages, we could eliminate packages and make our images as small (and fast) as possible. I concede we wouldn't start there on day 0 (we would likely seed the packages list with whatever the final state of `yum install X` did).
But it would be possible to optimize going forward.

Maintaining an explicit list of packages could be annoying because you'd have to keep things updated. But like all problems, tooling could come to the rescue. We could build a tool that compared the latest versions of packages available from distros, bulk update all references, push to Try, and have someone sign off on bulk package version updates if automation was happy with the new versions. This is similar to periodically pulling down a new upstream release of a vendored component, like WebRTC. Most packages within a distro release maintain ABI compatibility, so cherry-picking packages for updates should be doable. We could even query the apt/yum databases to verify any minimum package version dependencies are met.

I actually have experience using something like 3b in production. 10 years ago, I worked at a company that maintained its own central server holding .tar.gz packages. Each archive contained the equivalent of the result of a `make install DESTDIR=/some/root`. Each machine type configuration maintained a manifest of package versions. When you went to configure a machine, it would download the corresponding .tar.gz files from the manifest, decompress them to e.g. /opt/packages, then use GNU stow to create a symlink farm under e.g. /usr/local. You could also have specially named files in the archives that would get executed when the environment was being created. (This whole approach is not unlike how the Nix packager works.) You then had a fully self-contained environment under e.g. /usr/local that was mostly deterministic and reproducible. It was amazing and years ahead of its time (you can argue that nobody has really figured out deterministic system packaging, because if they had we wouldn't be discussing it in this bug!).

OK, I think that's enough brain dump for now.
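[Editor's note: purely as illustration of option 3b, not anyone's actual proposal, a manifest-driven install could look roughly like the sketch below. The packages.manifest file of "URL sha256" pairs and the server hosting it are hypothetical.]

# Hypothetical: install pinned packages from a manifest, bypassing apt's resolver.
# Each line of packages.manifest is: <url-to-.deb> <sha256>
while read -r url sha256; do
  deb="$(basename "$url")"
  curl -fsSL -o "$deb" "$url"
  echo "$sha256  $deb" | sha256sum -c -
  # Unpack files only, skipping maintainer scripts; use `dpkg -i` where scripts are needed.
  dpkg-deb -x "$deb" /
done < packages.manifest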
> As Dustin states, there's no practical way to make builds of Linux distros
> deterministic - at least not when playing by the normal rules, which are to run
> `yum update` or `apt-get update` to synchronize the package database with a
> remote server.

At least for Debian, this is not true: you can use snapshot.debian.org as an apt archive stuck at a determined date.

> Running package post-install actions to e.g. configure system services is not needed in
> containers.

And you don't have to run them. In Debian, set up a policy-rc.d that doesn't start services, e.g. https://jpetazzo.github.io/2013/10/06/policy-rc-d-do-not-start-services-automatically/
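[Editor's note: to make both suggestions concrete, a minimal sketch; the snapshot timestamp below is made up.]

# Point apt at a snapshot.debian.org archive frozen at a specific timestamp.
echo 'deb http://snapshot.debian.org/archive/debian/20160801T000000Z/ jessie main' > /etc/apt/sources.list
# Older snapshots may need: apt-get -o Acquire::Check-Valid-Until=false update
apt-get update

# Tell invoke-rc.d not to start any services during package installation.
printf '#!/bin/sh\nexit 101\n' > /usr/sbin/policy-rc.d
chmod +x /usr/sbin/policy-rc.d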
> We could build a tool that compared the latest versions of packages available
> from distros, bulk update all references, push to Try, and have someone sign
> off on bulk package version updates if automation was happy with the new
> versions.

This is basically what I'm suggesting with building new docker images frequently.

I think we need to make a binary decision: frozen forever, or updated automatically. Any of the in-between options, including what we have now, and including manually installing updates and resolving dependencies, are *worse* than either extreme, as they leave anyone wanting to touch this stuff trying to learn to be a distro maintainer overnight, just to install an upgraded version of libfoo.

Also, this is a relatively simple, academic exercise for build images -- they have very little "stuff" installed, now that toolchains are installed from tooltool. The test images are *much* worse, as they install the full desktop environment, and many of the test suites are sensitive to the configuration and setup of that desktop environment. 3a or 3b would involve carefully coordinating versions of hundreds and hundreds of packages (1631 right now for Ubuntu 12.04, probably more for 16.04), and the tarball method would require a *lot* of configuration scripts cribbed from Ubuntu.

> Maintaining an explicit list of packages and their pinned versions gives us
> *complete* control over an image.

I don't think we want this -- complete control brings complete responsibility, and at least for testers that's complete responsibility to duplicate the Ubuntu desktop environment that our tests require. Nobody wants to own that, and I don't think anyone should.

I'm strongly in favor of "updated automatically" because, for testers at least, that represents what our users are using. We don't want to ship a version of Firefox that bombs out on every Ubuntu desktop because it's incompatible with a sec update to libfoo that our users have installed, but we have not. Furthermore, I think that tests should be written to test the browser, not the OS, and test suites that go perma-orange on every weekly update are, in fact, perma-orange -- holding the test environment stable and fiddling with timeouts until they turn green is just papering over the issue.

We could certainly choose to diverge in our approaches for building and testing. Nix may, indeed, be a great choice for builders.

And of course, none of this applies to a platform for which there are more than a tiny number of users!

So that's my braindump :)
I didn't know about http://snapshot.debian.org/. That looks *really* nice. Unfortunately, I don't get a warm and fuzzy feeling from that page that their service can handle our load. Plus, we'd have to switch to Debian (which I'm not opposed to, but it is a bit of work).

I'm tempted to say we should borrow the ideas behind snapshot.debian.org and run our own snapshotted server. Here's how I could see it working.

We'd maintain a list of distros/versions to "mirror." For each of those, we'd maintain a list of packages to mirror (because mirroring every package would be overkill). Periodically (say once a month or so), we'd perform a partial mirror of the upstream package repo to a date-stamped endpoint, e.g. /ubuntu/xenial/20160809/... This endpoint would contain the package metadata *and* the .deb/.rpm files for the packages we've chosen to mirror.

We treat the package metadata as immutable and read-only. So once we've taken a snapshot of the available packages, it is frozen. However, we can upload new packages to a date-stamped "bucket". This way, if a new consumer comes along and wants to install a package we aren't mirroring, we can make it available to them without having to create a whole new snapshot.

Once we have the HTTP server with snapshots in place, we change our Docker image building to replace all apt/yum sources with our custom mirror. We define a variable in the Dockerfile with the date-stamped snapshot we're using. When we create a new snapshot, we go through the Dockerfiles and update the snapshot. If we encounter failures due to new package versions, we try to fix those. But there is no rush to do so, because as long as we're pinned to a snapshot, we should be deterministic.

The bulk of this work is the server pieces. However, there are a lot of tools for maintaining package mirrors. I'm sure one could be abused to work for our needs. For the server, we could probably use S3. Set up a bucket somewhere. Configure replication. Then write some glue code to create new snapshots and upload new packages to existing snapshots.
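[Editor's note: not a design, just a sketch of the kind of glue this might involve; the mirroring tool, bucket name, and date are placeholders.]

# Build a date-stamped partial mirror and push it to S3 as an immutable snapshot.
SNAPSHOT=$(date +%Y%m%d)
debmirror --method=http --host=archive.ubuntu.com --root=ubuntu \
  --dist=xenial --section=main --arch=amd64 --nosource xenial-mirror/
aws s3 sync xenial-mirror/ "s3://mozilla-packages/ubuntu/xenial/${SNAPSHOT}/"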
(In reply to Gregory Szorc [:gps] from comment #4)
> I didn't know about http://snapshot.debian.org/. That looks *really* nice.
>
> Unfortunately, I don't get a warm and fuzzy feeling from that page that
> their service can handle our load.

What load? That would only be used to refresh images, right?
We did almost exactly this with PuppetAgain, and it did not work out very well.

The biggest problem is that, especially with Ubuntu, you can't "pick and choose" packages -- the repo metadata gives a set of packages which work together, and mixing packages from different revisions of that repo metadata leads to apt-get failures and turns simple "upgrade libmesa" bugs into weeks-long slogs -- exactly the situation we're trying to avoid.

Mirroring only the packages you need is not really practical -- there are 1000's, and the set changes frequently as packages are split or joined upstream. Discovering the set would require building a new image against the full repo, then taking its package list as the basis for the mirror. And adding a subset of packages in a secondary bucket, as suggested, typically requires lots more packages in that bucket than you'd think, due to narrow version requirements (so most of those secondary buckets will have their own libstdc++ or the like).

Managing repos is kind of a nightmare. There are a lot of tools, and they are all broken in one way or another. Dependency resolution doesn't work the way you want it to. Apt doesn't really support mirrors, except by using multiple A RR's, which is not an option with S3, so you're left either using a single region or special-casing sources.list based on your region.. the list goes on. I don't know who would be responsible for doing all of this stuff, but it's not me -- I've been in that pit of despair before.

Just to throw out a few other options to consider:

* Use Gentoo with binary builds enabled
  - version dependencies are very loose
  - lots of stuff already has ebuilds, and ebuilds are easy to modify or create
  - USE flags could let us cut out a lot of cruft like desktop games

* Treat the docker image as a snapshot
  - automatically update the image frequently, using a bumper-bot of some sort
  - provide package lists from the build process to allow easy diffing (what else changed when I added libfoo?)

* Make the docker image process depend on an automatically-created snapshot
  - create snapshots using an in-tree process based on building an image, sampling its package list, and building a repo containing those packages
  - update snapshots frequently, using a bumper-bot of some sort
  - package additions/upgrades might require a new snapshot, but at least that's controllable
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> * Use nix :)

Yup, this should just work (TM) :)
Using nix would add other problems, like the fact that AIUI you can't run non-nix binaries in a nix environment because ld.so is not e.g. /lib64/ld-linux-x86-64.so.2 (it's /something/HASH-glibc-version/lib/ld-linux-x86-64.so.2), so that makes our tooltool stuff not runnable in nix out of the box. And using nix toolchains instead of what we have in tooltool would generate binaries that can only run in nix, which is not what we want to produce either.
(In reply to Dustin J. Mitchell [:dustin] from comment #6)
> We did almost exactly this with PuppetAgain, and it did not work out very
> well.
>
> The biggest problem is that, especially with Ubuntu, you can't "pick and
> choose" packages -- the repo metadata gives a set of packages which work
> together, and mixing packages from different revisions of that repo metadata
> leads to apt-get failures and turns simple "upgrade libmesa" bugs into
> weeks-long slogs -- exactly the situation we're trying to avoid.

Perhaps you didn't grok my proposal: I'm proposing to take snapshots of the package metadata and store that in datestamped URLs. e.g.

deb https://mozilla-packages.org/ubuntu/20160810/ xenial main

Would download:

https://mozilla-packages.org/ubuntu/20160810/xenial/main/binary-amd64/Packages.gz

These Packages.gz files contain the metadata and all dependencies. As long as this metadata is complete and immutable, we've got determinism.

> Mirroring only the packages you need is not really practical -- there are
> 1000's, and the set changes frequently as packages are split or joined
> upstream. Discovering the set would require building a new image against
> the full repo, then taking its package list as the basis for the mirror.
> And adding a subset of packages in a secondary bucket, as suggested,
> typically requires lots more packages in that bucket than you'd think, due
> to narrow version requirements (so most of those secondary buckets will have
> their own libstdc++ or the like).

I agree maintaining a partial list would be annoying. We could build it into image generation: configure the build to go through an HTTP proxy that records which packages are accessed and have a mechanism to feed the missing packages into an uploader. Obviously not something we could use in production. But certainly we could use it as part of the development cycle.

Also, I now realize Apt allows you to put the packages at any URL relative to the URL after the "deb" in the sources file. So there isn't a problem of duplicating the .deb files for various snapshots. We could probably run a full "mirror" of an official repo. If we're concerned about files changing, the package URL could contain the hash of its content. It's easy enough to rewrite the "Filename" metadata in Packages files.

> Managing repos is kind of a nightmare. There are a lot of tools, and they
> are all broken in one way or another. Dependency resolution doesn't work
> the way you want it to. Apt doesn't really support mirrors, except by using
> multiple A RR's, which is not an option with S3, so you're left either using
> a single region or special-casing sources.list based on your region.. the
> list goes on. I don't know who would be responsible for doing all of this
> stuff, but it's not me -- I've been in that pit of despair before.

We would be replacing sources.list at image build time. The Docker base images don't have any package metadata. So as long as we replace sources.list before doing an `apt-get update` we should be fine. We could probably grab the appropriate package URL (with the local S3 region) from a TC "secret."
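[Editor's note: the image-build step being described might boil down to something like the sketch below. The URL scheme is the example given above; the snapshot date and the installed package are placeholders.]

# Replace the stock sources.list with the frozen snapshot before any apt operation.
SNAPSHOT=20160810
echo "deb https://mozilla-packages.org/ubuntu/${SNAPSHOT}/ xenial main" > /etc/apt/sources.list
apt-get update
apt-get install -y --no-install-recommends valgrind   # example package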
> Just to throw out a few other options to consider:
>
> * Use Gentoo with binary builds enabled
>   - version dependencies are very loose
>   - lots of stuff already has ebuilds, and ebuilds are easy to modify or
>     create
>   - USE flags could let us cut out a lot of cruft like desktop games
>
> * Treat the docker image as a snapshot
>   - automatically update the image frequently, using a bumper-bot of some
>     sort

We could do this. But it sacrifices reproducibility, since a 3rd party won't be able to reproduce the image because it isn't deterministic over time. Plus, if we lose the original images, we're out of luck reproducing them. That's not good for disaster recovery.

> - provide package lists from the build process to allow easy diffing
>   (what else changed when I added libfoo?)
>
> * Make the docker image process depend on an automatically-created snapshot
>   - create snapshots using an in-tree process based on building an image,
>     sampling its package list, and building a repo containing those packages
>   - update snapshots frequently, using a bumper-bot of some sort
>   - package additions/upgrades might require a new snapshot, but at least
>     that's controllable

More reproducibility and determinism concerns.
As much as I like nix, alpine, gentoo, and other "minimal" packaging approaches and distros, we have to consider trust of the upstream package provider. Nothing against these great projects, but when it comes to trusting Linux distros to provide good packages and timely security updates, I trust Debian and RedHat the most. It would be difficult to convince me that we should ship bits to users built from any other Linux package provider. Of course, we could go the Tor approach and build all our build dependencies from source. I'm not opposed to that. But it feels out of scope at this juncture.
https://www.aptly.info/doc/overview/ looks like exactly what we need.
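[Editor's note: if aptly pans out, the workflow could look roughly like this; mirror/snapshot names and the date are illustrative, and this hasn't been tested against our setup.]

# Mirror an upstream repo, freeze it as a named snapshot, then publish the snapshot.
aptly mirror create -architectures=amd64 precise-main http://archive.ubuntu.com/ubuntu/ precise main
aptly mirror update precise-main
aptly snapshot create precise-main-20160810 from mirror precise-main
aptly publish snapshot -distribution=precise precise-main-20160810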
For Nix and Gentoo, I was suggesting building the world from source -- the binary packages are a *local* cache of binaries we've built before, which makes `emerge world` on a stage-2 system run a *lot* faster! But yes, those distros are a little more "loose" than others in many ways, and it wasn't an entirely serious suggestion.

I do grok your proposal. It's easy enough to take a snapshot and that's what you use forever. What's hard is taking a snapshot but still allowing cherry-picked updates of packages for which devs want a specific version (libmesa's one that comes to mind). At that point you run into package version conflicts, either when you install the cherry-picked version, or the next time you try to bump the base mirror version. Looked at another way, determinism is easy. Determinism given the sort of changes we expect is hard, especially if we want this to be self-serve without requiring the intervention of someone skilled in the dark arts of Debian packaging.

But I can boil my argument down to "that won't work and/or will be hell on wheels to maintain", which isn't constructive. So I'll stop making noise. Let's see what a rough draft of this looks like and how it works over time!

I'm not that familiar with aptly -- it was substantially less powerful last time I looked at it -- so that looks like a good option for Ubuntu. I have some scripts used within PuppetAgain (sadly, no longer public) to rebuild apt repos from a set of .deb's, and similarly for yum repos, that I'd be happy to share.
# Rebuild the Packages indexes for a local pool of .debs:
for arch in i386 amd64; do
  for dist in precise trusty; do
    mkdir -p dists/${dist}-updates/all/binary-$arch
    dpkg-scanpackages --multiversion --arch $arch pool > dists/${dist}-updates/all/binary-$arch/Packages
    bzip2 < dists/${dist}-updates/all/binary-$arch/Packages > dists/${dist}-updates/all/binary-$arch/Packages.bz2
  done
done

Note that this doesn't sign the repository. If I recall, some issue with apt-get prevented it from recognizing non-Canonical signatures on updates repositories.
I agree that cherry-picking packages can be problematic. But I think that's a problem in general, regardless of whether we're using deterministic package repos or not. Just earlier this week I ran into problems with libmesa and libgl where we asserted we installed version X, Ubuntu bumped to version Y, and we failed to build the image. Attempting to pin the package to version X resulted in tons of package conflicts and apt uninstalling hundreds of packages!

Lemme try to come up with a more concrete proposal on package hosting...
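[Editor's note: for reference, the kind of pin being attempted looks roughly like this; the package names come from the example above, and the exact version string is made up.]

# /etc/apt/preferences.d/mesa -- ask apt to hold these packages at a specific version.
cat > /etc/apt/preferences.d/mesa <<'EOF'
Package: libgl1-mesa-glx libgl1-mesa-dri
Pin: version 8.0.4-0ubuntu0.7*
Pin-Priority: 1001
EOF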
Something I don't get here. If you go ahead with something like aptly, why not simply use snapshot.debian.org? Determinism is nice, but using a "private" repo only moves the reproducibility pole. That is, you'd move from "download this image that you trust we built the way we say we built it, or you can try very hard reproducing it", to "create this image from this repo that you trust we built the way we say we built it, or you can try very hard reproducing it".
My push at https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=21fdf73bbb17e34cfe00e372695c4f21e4ba3e6a (and the subsequent partial backout of that push) resulted in spurious test failures on Linux64 (like https://treeherder.mozilla.org/logviewer.html#?job_id=3573922&repo=autoland#L10745). My changes should not have introduced regressions this way. The only thing I can come up with is that by triggering Docker image regeneration, we picked up a new version of an upstream package and that somehow broke tests.

So now we have a timebomb on both the Linux build and test images: the next person to touch them breaks things. This basically blocks my work on running tests from a source checkout. I don't like this rabbit hole.
Here are the package upgrades available to the last known good desktop-test image:

cpp-4.6
g++-4.6
g++-4.6-multilib
gcc-4.6
gcc-4.6-base
gcc-4.6-base:i386
gcc-4.6-multilib
lib32gcc1
lib32gomp1
lib32quadmath0
lib32stdc++6
libstdc++6-4.6-dev
python-imaging

And here is the `apt-get upgrade` log showing impacted versions:

Preparing to replace g++-4.6-multilib 4.6.3-1ubuntu5 (using .../g++-4.6-multilib_4.6.4-1ubuntu1~12.04_amd64.deb) ...
Preparing to replace gcc-4.6-multilib 4.6.3-1ubuntu5 (using .../gcc-4.6-multilib_4.6.4-1ubuntu1~12.04_amd64.deb) ...
Preparing to replace libstdc++6-4.6-dev 4.6.3-1ubuntu5 (using .../libstdc++6-4.6-dev_4.6.4-1ubuntu1~12.04_amd64.deb) ...
Preparing to replace g++-4.6 4.6.3-1ubuntu5 (using .../g++-4.6_4.6.4-1ubuntu1~12.04_amd64.deb) ...
Preparing to replace gcc-4.6 4.6.3-1ubuntu5 (using .../gcc-4.6_4.6.4-1ubuntu1~12.04_amd64.deb) ...
Preparing to replace cpp-4.6 4.6.3-1ubuntu5 (using .../cpp-4.6_4.6.4-1ubuntu1~12.04_amd64.deb) ...
Preparing to replace lib32gcc1 1:4.6.3-1ubuntu5 (using .../lib32gcc1_1%3a6.2.0-3ubuntu11~12.04_amd64.deb) ...
Preparing to replace lib32gomp1 4.6.3-1ubuntu5 (using .../lib32gomp1_6.2.0-3ubuntu11~12.04_amd64.deb) ...
Preparing to replace lib32quadmath0 4.6.3-1ubuntu5 (using .../lib32quadmath0_6.2.0-3ubuntu11~12.04_amd64.deb) ...
Preparing to replace lib32stdc++6 4.6.3-1ubuntu5 (using .../lib32stdc++6_6.2.0-3ubuntu11~12.04_amd64.deb) ...
Preparing to replace gcc-4.6-base:i386 4.6.3-1ubuntu5 (using .../gcc-4.6-base_4.6.4-1ubuntu1~12.04_i386.deb) ...
Preparing to replace gcc-4.6-base 4.6.3-1ubuntu5 (using .../gcc-4.6-base_4.6.4-1ubuntu1~12.04_amd64.deb) ...
Preparing to replace python-imaging 1.1.7-4ubuntu0.12.04.1 (using .../python-imaging_1.1.7-4ubuntu0.12.04.2_amd64.deb) ...
The gcc packages shouldn't matter since we're not building anything there. The only thing that I could plausibly suspect would be the libstdc++ package? (That feels like it'd be pretty weird to cause random failures though.) Can you run the known-good image and the image that caused failures and diff the filesystems?
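[Editor's note: one way to do that comparison, sketched with placeholder image tags.]

# Export both images' filesystems and diff the file lists.
docker create --name good desktop-test:known-good true
docker create --name bad  desktop-test:failing true
docker export good | tar -t | sort > good-files.txt
docker export bad  | tar -t | sort > bad-files.txt
diff -u good-files.txt bad-files.txt
docker rm good bad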
Blocks: 1303843
No longer blocks: 1289249
Summary: Make system packages in Docker images more deterministic → updated desktop-test images cause tests to fail
Bug 1324492 shows that it's not only desktop-test that's important, updating desktop-build can also break tests (in this case the valgrind tests.)
Summary: updated desktop-test images cause tests to fail → updated desktop-test or desktop-build images cause tests to fail
(In reply to :Ehsan Akhgari from comment #20)
> Bug 1324492 shows that it's not only desktop-test that's important, updating
> desktop-build can also break tests (in this case the valgrind tests.)

And comment 0 shows this shouldn't have been a surprise to anyone here... :/ Sorry for the noise!
A few fixes for this are being discussed, notably bug 1396154. I believe this is still an issue with this docker image.
Product: TaskCluster → Firefox Build System
glandium fixed this for the build images in bug 1399679 -- they're using snapshot.debian.org. We still need a solution for test images. Bug 1503756 is one current issue that's blocking work from landing due to failures with an updated test image.
See Also: → 1399679
Summary: updated desktop-test or desktop-build images cause tests to fail → updated desktop-test images cause tests to fail
Severity: normal → S3