Open Bug 885777 (fx-reproducible-build) Opened 7 years ago Updated 1 month ago

[meta] Deterministic, reproducible, bit-identical and/or verifiable Linux builds

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

People

(Reporter: gps, Unassigned)

References

(Depends on 12 open bugs, Blocks 2 open bugs)

Details

(Keywords: meta, Whiteboard: [tor])

Attachments

(3 obsolete files)

Currently, it's not possible for a local developer to reproduce Mozilla's official build environment. This is mostly because the operating system config and packages used on official infrastructure are not publicly available.

While there are good reasons for the tree to be developed and tested on as many OS/package/environment combinations as possible, there are also times when developers want to ensure their environment is as close to the official one as possible. For example, when developing C++, matching the official environment gives you confidence that your local compiler warnings will be exactly what's encountered on the official build.

I'm opening this bug to track the ability for local developers to reproduce RelEng's build environment so produced builds are deterministic and bit-identical to what they would get on RelEng hardware (at least to a reasonable degree, such as using the same toolchain/packages). While I'd like to see us do this on all 3 of the tier 1 platforms (Windows, OS X, and Linux), I'm limiting this bug to just Linux because I feel that's the only platform where this is realistically attainable at the moment.

This effort is currently low priority. This bug is mostly for tracking purposes.
Depends on: 768879
There are some useful packages at http://puppetagain.pub.build.mozilla.org/data/repos/yum/releng/public/CentOS/6/x86_64/.

We *might* have enough content there to attempt automated reproduction of the official CentOS build environment...
Note that it's currently impossible to get idempotent builds even on the same machine. bug 742169 is one example.
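One way to check whether two builds of the same tree are bit-identical (on the same machine or not) is to compare per-file hash manifests rather than whole trees. A minimal sketch, assuming a POSIX shell with GNU coreutils/findutils (`find -print0`, `sort -z`, `xargs -0 -r`); `make_manifest` and `diff_builds` are hypothetical helper names, not part of the Firefox tree:

```shell
#!/bin/sh
# Hypothetical helpers (not in mozilla-central): compare two build output trees
# by the SHA-256 of every regular file, ignoring timestamps and ownership.

# make_manifest DIR: print "<sha256>  ./relative/path" for each regular file,
# sorted by path so the manifest does not depend on directory read order.
make_manifest() {
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum)
}

# diff_builds DIR_A DIR_B: print, sorted, the relative paths whose contents
# differ or that exist in only one of the two trees.
diff_builds() {
  local tmp
  tmp=$(mktemp -d)
  make_manifest "$1" > "$tmp/a"
  make_manifest "$2" > "$tmp/b"
  # sha256sum lines are 64 hex chars + 2 spaces + path, so the path starts at column 67.
  awk '
    FILENAME == ARGV[1] { h[substr($0, 67)] = $1; next }   # remember hashes of tree A
    {
      p = substr($0, 67)
      if (!(p in h) || h[p] != $1) print p                 # new in B, or changed
      delete h[p]
    }
    END { for (p in h) print p }                           # present only in A
  ' "$tmp/a" "$tmp/b" | LC_ALL=C sort
  rm -rf "$tmp"
}
```

For example, `diff_builds obj-a/dist/bin obj-b/dist/bin` after two consecutive builds; empty output means every file matched byte for byte.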
Depends on: 886226
And here is my first stab at it. Once I publish the CentOS 6.2 image I
created (will need to do it tomorrow once I have a faster internet
connection), you should be able to:

  cd config/bootstrap-linux64
  vagrant up
  # Wait a few minutes for the VM to initialize, download package
  # updates, populate mock_mozilla, etc.
  vagrant ssh
  mock_mozilla -r mozilla-centos6-x86_64 --cwd /builds/slave/me/build --unpriv --shell /bin/bash
  ./mach build

I currently mount the cloned source directory as /builds/slave/me/build,
which is fine as the regular ssh user. However, it gets unmounted in the
mock environment. I need to do some homework.

For this to get checked into the tree, I'd like to have the Vagrant .box
file published somewhere somewhat official. Unless people counts, I may
start asking around for hosting. It should be a one-time thing,
methinks.
Assignee: nobody → gps
Comment on attachment 766556 [details] [diff] [review]
Add a Vagrantfile for creating the Linux64 build environment

And wrong bug. Derp. See bug 886226.
Attachment #766556 - Attachment is obsolete: true
Depends on: 896023
Assignee: gps → nobody
Summary: [meta] Deterministic, bit-identical Linux builds → [meta] Deterministic, bit-identical and/or verifiable Linux builds
Depends on: 935637
Attached patch Disable library timestamping (obsolete) — Splinter Review
libfaketime is unable to spoof these timestamps because they use sub-millisecond precision.
Miscellaneous reproducibility fixes to the Firefox build system for Tor Browser.
I've attached a couple of fixes we had to make to the build system (beyond using libfaketime and Gitian to provide a controlled environment, which probably gives us quite a bit in terms of build paths, hostname, package versions, etc., as described in the technical details post that Gregory Szorc linked in Comment 5).

On Linux, we also had to remove the FIPS140 .chk library signatures for NSS, as these were created using a throwaway private key during the build process and are not reproducible.

For Windows and Mac, we use MinGW-W64 and a fork of 'toolchain4', respectively, to cross-compile for those platforms in Gitian in a reproducible way.
Mike: Could you please open new bugs for each attachment so we can discuss each issue separately? This is intended to be a tracking bug with only high-level discussion and status updates. Feel free to ask for assistance in #build on irc.mozilla.org.
Flags: needinfo?(mikeperry)
Also rebase on mozilla-central, because I'm sure some of the things in the second patch are already fixed.
Whiteboard: [tor]
Depends on: 981558
Depends on: 942091
Depends on: 943331
Depends on: 982055
Depends on: 982075
I just discovered Gentoo Prefix (https://www.gentoo.org/proj/en/gentoo-alt/prefix/). It makes it possible to install Gentoo (a Linux distro distinguished by how much it relies on compiling from source) on pretty much any UNIX-like operating system, including OS X.

It got me thinking that we could potentially use Gentoo Prefix to produce our build chroot/container archives/images. Not sure if it has any compelling advantages over Gitian (which I would assume to be the frontrunner at this stage). Just throwing the idea out there.
Duplicate of this bug: 1036178
There is also an ongoing project for Debian, if it helps. https://wiki.debian.org/ReproducibleBuilds
Did you consider creating builder with Docker?
(In reply to Adam Stankiewicz from comment #14)
> Did you consider creating builder with Docker?

Efforts (not tracked by this bug) are underway to modernize Firefox's building infrastructure. This includes support for building in containers / Docker.
(In reply to Gregory Szorc [:gps] (away Sep 10 through 27) from comment #15)
> (In reply to Adam Stankiewicz from comment #14)
> > Did you consider creating builder with Docker?
> 
> Efforts (not tracked by this bug) are underway to modernize Firefox's
> building infrastructure. This includes support for building in containers /
> Docker.

@Gregory, I realize there's been a lot of noise about deterministic builds (e.g. bug 1036178). However, I am curious: is there any status update on this? You said "Efforts are underway"; are those efforts detailed anywhere? Can the community do anything to help move the efforts forward?
catlee: Could you please provide a status update?
Flags: needinfo?(catlee)
From the infrastructure side, we've started work on building Docker containers for FirefoxOS. This work should be a good basis for Firefox builds as well. I can't comment on any changes required by the build system.
Flags: needinfo?(catlee)
Assignee: nobody → winter2718
Depends on: 1115874
Duplicate of this bug: 1166201
Duplicate of this bug: 1083277
Summary: [meta] Deterministic, bit-identical and/or verifiable Linux builds → [meta] Deterministic, reproducible, bit-identical and/or verifiable Linux builds
Depends on: 1166243
See Also: → tb-reproducible-build
Depends on: 1166538
Depends on: 1166547
Depends on: 1166550
Depends on: 1166554
Depends on: 1168231
Depends on: 1168316
Depends on: 1169158
Depends on: 1169174
Alias: fx-reproducible-build
Assignee: winter2718 → nobody
I was asked to look at the current state of this bug since it's been a while.

For Linux, we currently have nearly bit-identical builds when specifying the buildid by exporting MOZ_BUILD_DATE, except for the following:

 1) The NSS .chk files are always different (see #c8).
 2) Timestamps of the files inside the .tar.bz2 package will differ, but untarring them and using a recursive diff will reveal no differences (except for the aforementioned .chk files).
 3) PGO
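To pin the buildid to that of a particular official build, the value can be read from the `BuildID=` key in the `[App]` section of the package's `application.ini`. A minimal sketch, assuming that file layout and assuming `MOZ_BUILD_DATE` takes the same 14-digit YYYYMMDDhhmmss value; `buildid_from_ini` and `pin_build_date` are hypothetical helpers:

```shell
#!/bin/sh
# Hypothetical helpers: read BuildID from an application.ini and export it as
# MOZ_BUILD_DATE (assumed to accept the same 14-digit value).

# buildid_from_ini FILE: print the BuildID value from the [App] section.
buildid_from_ini() {
  awk -F= '
    /^\[/ { in_app = ($0 == "[App]"); next }   # track which section we are in
    in_app && $1 == "BuildID" { print $2; exit }
  ' "$1"
}

# pin_build_date FILE: export MOZ_BUILD_DATE, refusing anything that is not 14 digits.
pin_build_date() {
  local id
  id=$(buildid_from_ini "$1")
  if ! printf '%s\n' "$id" | grep -qE '^[0-9]{14}$'; then
    echo "unexpected BuildID: '$id'" >&2
    return 1
  fi
  MOZ_BUILD_DATE=$id
  export MOZ_BUILD_DATE
}
```

For example: `tar -xjf firefox-68.0.tar.bz2 firefox/application.ini && pin_build_date firefox/application.ini && ./mach build`.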

For 1), it's possible we can just remove them from the built package. We already don't ship them on OSX (see the thread starting here: https://bugzilla.mozilla.org/show_bug.cgi?id=1096494#c7). It's not clear if we still want to support FIPS mode, or if it is even possible given that we ship with NSS from our tree rather than something that's been FIPS-certified. This is probably worth looking into so we can remove that packaging code and make diffing packages easier. We might just need someone senior enough to sign off on removing it.

For 2) I'm not sure how much we care about generating two tarballs with the same hash. I think it's pretty straightforward to untar and then diff to determine equality, personally. However, it would be helpful if we had a clear idea of the end-goal for this bug to know whether this is something we should support or not.
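The untar-and-diff check described above could look like this minimal sketch. It assumes GNU tar (which detects the compression of a `.tar.bz2` on extraction, provided `bzip2` is installed) and GNU diff (`-x` to exclude the NSS `.chk` files); `compare_tarballs` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: extract two packages into scratch directories and diff
# their contents recursively. diff compares file contents only, so differing
# member timestamps inside the archives are ignored.
# compare_tarballs A.tar[.bz2] B.tar[.bz2]; returns diff's status (0 = same).
compare_tarballs() {
  local work rc=0
  work=$(mktemp -d)
  mkdir "$work/a" "$work/b"
  tar -xf "$1" -C "$work/a"
  tar -xf "$2" -C "$work/b"
  # NSS FIPS .chk signatures are known to differ between builds; skip them.
  diff -r -q -x '*.chk' "$work/a" "$work/b" || rc=$?
  rm -rf "$work"
  return "$rc"
}
```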

I believe this is part of what the "Disable library timestamping" patch is trying to fix, along with the use of libfaketime. The "Misc reproducibility fixes" patch has already been absorbed by other bugs (bug 982075, bug 1168316, and bug 943331).
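If identical tarball hashes did turn out to be a goal, the usual generic approach (the one documented by the reproducible-builds.org project; not something this thread says the Firefox packager does) is to normalize member order, timestamps and ownership when creating the archive. A minimal sketch, assuming GNU tar 1.28 or later for `--sort=name`; `make_reproducible_tar` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: create an uncompressed tarball whose bytes do not depend
# on when, in which order, or by which user the files were written.
# make_reproducible_tar SRC_DIR OUT.tar EPOCH_SECONDS
make_reproducible_tar() {
  # --sort=name       fixed member order (readdir order varies between filesystems)
  # --mtime=@EPOCH    every member gets the same timestamp
  # --owner/--group   drop the builder's uid/gid and user/group names
  # --format=gnu      fixed header format, no pax atime/ctime records
  LC_ALL=C tar --sort=name \
    --mtime="@$3" \
    --owner=0 --group=0 --numeric-owner \
    --format=gnu \
    -cf "$2" -C "$1" .
}
```

Compressing the result afterwards keeps it deterministic as long as the compressor does not embed a timestamp (`gzip -n` omits it; the bzip2 format has none).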

Part 3) is likely the bulk of the remaining effort, since our releases use PGO, and PGO is not currently reproducible. For this we need bug 935637, so that anyone who wants to verify a build can use the same PGO profile that we used to build a release. In addition to publishing PGO profiles, this may involve some build system work so that a PGO build can be done using an existing profile rather than first generating one.

Note that all this only applies to Linux builds - I haven't investigated other platforms yet.
While we may have (mostly) deterministic builds, there is still the issue of reproducibility. That requires a build environment that is itself reproducible. There is still a lot of work to do here.

While TaskCluster is using Docker images on Linux now, Docker images are notoriously not very reproducible (because `yum update/install` installs the latest version of packages advertised on servers and that can change over time). We /could/ publish the Docker images Mozilla uses (they are probably already public for all I know). However, paranoid people will want to reproduce those independently. It's turtles all the way down of course. The question is how far do we want to go.
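Pinning package versions is the harder part, but one small, checkable piece is the base image: referencing it by digest rather than by a tag that can move. A minimal sketch of a lint for that; `unpinned_from` is a hypothetical helper, and it does not understand multi-stage `FROM <stage>` references or build arguments:

```shell
#!/bin/sh
# Hypothetical lint: print FROM lines in a Dockerfile whose image is not pinned
# by a sha256 digest (a tag like centos:6 can point at different images over time).
# "FROM scratch" is the empty image and needs no pin.
unpinned_from() {
  grep -iE '^[[:space:]]*FROM[[:space:]]' "$1" \
    | grep -vE '@sha256:[0-9a-f]{64}([[:space:]]|$)' \
    | grep -viE '^[[:space:]]*FROM[[:space:]]+scratch([[:space:]]|$)' \
    || true
}
```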

That's why we need an end-goal for this bug :)
I started a discussion of what a good end-goal could be https://groups.google.com/forum/#!topic/mozilla.dev.platform/dFCNZC_pnq4
Depends on: 1288610
Depends on: 1289812
Depends on: 1330608
Depends on: 1341674
Blocks: 1325617
Depends on: 1380458
Depends on: 1399679
Depends on: 1409583
Depends on: 1428989
Clearing a 4 year old needinfo request.
Flags: needinfo?(mikeperry)
Product: Core → Firefox Build System
Duplicate of this bug: 1155354
Attachment #8363336 - Attachment is obsolete: true
Attachment #8363337 - Attachment is obsolete: true
Duplicate of this bug: 1563434

Note that with 3-tier PGO, Linux builds should now be reproducible, provided one does an --enable-profile-use build using the same profile data as the Firefox build (the profile data is now available as an artifact) and uses the same build environment. This also assumes clang makes reproducible decisions based on the profile data.
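The relevant part of such a build configuration is telling configure to consume existing profile data instead of generating it. A minimal sketch that writes a mozconfig fragment; the option names `--with-pgo-profile-path` and `--with-pgo-jarlog` are assumed from the in-tree `build/moz.configure/lto-pgo.configure` and should be checked against the checkout being built, the paths are placeholders, and the exact options (including any value passed to `--enable-profile-use`) should be copied from the official task's build log:

```shell
#!/bin/sh
# Hypothetical helper: write a mozconfig fragment that consumes published
# profile data (option names are assumptions, see above; verify them first).
# write_pgo_mozconfig PROFDATA JARLOG OUT
write_pgo_mozconfig() {
  cat > "$3" <<EOF
# Use existing profile data rather than running an instrumented build first.
ac_add_options --enable-profile-use
ac_add_options --with-pgo-profile-path=$1
ac_add_options --with-pgo-jarlog=$2
EOF
}
```

The build would then be pointed at it with `MOZCONFIG=/path/to/mozconfig ./mach build`.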

shouldn't this be easy, at least for a linux build? why not make "the operating system config and packages used on official infrastructure" publicly available?

They are publicly available. Not necessarily in a very convenient or easily discoverable way, but they are. This is why this bug is not closed while everything is theoretically in place to allow for it.

(In reply to Mike Hommey [:glandium] from comment #31)

> They are publicly available. Not necessarily in a very convenient or easily discoverable way, but they are. This is why this bug is not closed while everything is theoretically in place to allow for it.

Excellent. Give me the link and the hash that will match a release Firefox build. I'll try it out.

This is the task that built the linux64 68.0 release (http://ftp.mozilla.org/pub/firefox/releases/68.0/linux-x86_64/en-US/firefox-68.0.tar.bz2):
https://tools.taskcluster.net/groups/HtPXIhzvRNa6DfzEOMSb2A/tasks/ed97SEKlQzm2wWedt11XIQ/details

You'll find there the links for the docker image and toolchains used for that build, as well as the profile data, and the script run in the docker image, along with all the environment variables. Differences you'd "obviously" get:

  • signature files would either be missing (*.sig) or different (*.chk)
  • the precomplete file would be different because of the .sig files.
  • some API keys will be missing (this affects one file contained in omni.ja)

You can find, recursively, how each of the toolchains and docker images have been built.
Running the full script outside of taskcluster may fail. You may need to cherry-pick commands from the build log.
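When comparing a local build against the release package, the expected differences listed above can be filtered out so that only unexpected ones remain. A minimal sketch over two already-extracted trees; `unexpected_diffs` is a hypothetical helper, and it skips `omni.ja` wholesale, which is coarser than necessary since only one file inside it is expected to differ:

```shell
#!/bin/sh
# Hypothetical helper: list relative paths that differ between two extracted
# packages (or exist on one side only), skipping the known-expected differences:
# *.sig (missing locally), *.chk (throwaway key), precomplete, omni.ja (API keys).
# unexpected_diffs DIR_A DIR_B
unexpected_diffs() {
  local a=$1 b=$2
  # Union of relative file paths from both trees.
  { (cd "$a" && find . -type f); (cd "$b" && find . -type f); } | LC_ALL=C sort -u |
  while IFS= read -r rel; do
    case "$rel" in
      *.sig|*.chk|*/precomplete|*/omni.ja) continue ;;
    esac
    # cmp -s fails both on differing contents and on a missing file.
    if ! cmp -s "$a/$rel" "$b/$rel"; then
      printf '%s\n' "$rel"
    fi
  done
}
```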

(In reply to Mike Hommey [:glandium] from comment #33)

> This is the task that built the linux64 68.0 release (http://ftp.mozilla.org/pub/firefox/releases/68.0/linux-x86_64/en-US/firefox-68.0.tar.bz2):
> https://tools.taskcluster.net/groups/HtPXIhzvRNa6DfzEOMSb2A/tasks/ed97SEKlQzm2wWedt11XIQ/details

> You'll find there the links for the docker image and toolchains used for that build, as well as the profile data, and the script run in the docker image, along with all the environment variables. Differences you'd "obviously" get:

>   • signature files would either be missing (*.sig) or different (*.chk)
>   • the precomplete file would be different because of the .sig files.
>   • some API keys will be missing (this affects one file contained in omni.ja)

> You can find, recursively, how each of the toolchains and docker images have been built.
> Running the full script outside of taskcluster may fail. You may need to cherry-pick commands from the build log.

these conditions seem pretty odd. why isn't it easier? why can't I just build this from a shared cache almost instantly?

Which specific conditions?

(In reply to Mike Hommey [:glandium] from comment #35)

> Which specific conditions?

Is this the default build for a release copy of Firefox on any Linux distro? Furthermore, what needs to happen to reproduce this build in similar ways on Windows and macOS?

Even if I accept the argument about some narrow Linux build... this bug is still open after six years. Why is that?

(In reply to Robert Sayre from comment #36)

> (In reply to Mike Hommey [:glandium] from comment #35)

> > Which specific conditions?

> Is this the default build for a release copy of Firefox on any Linux distro?

Distros build Firefox the way they want to build it. The simple fact of building Firefox from a source tarball rather than mercurial will introduce small differences. Building from a different directory with a different version of the compiler (or a different compiler altogether) will introduce large differences. So no, there is not one release copy of Firefox on any Linux distro. There's the one that Mozilla ships, and then there are the ones each distro builds. Welcome to the Linux world.

> Furthermore, what needs to happen to reproduce this build in similar ways on Windows and macOS?

The same kind of thing, except they require SDKs that are not redistributable, so you'd have to procure them on your own. You can find the tasks used for Windows and macOS builds on https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&selectedJob=255019294&revision=353628fec415324ca6aa333ab6c47d447ecc128e&searchStr=shippable%2Cbuild

> Even if I accept the argument about some narrow Linux build... this bug is still open after six years. Why is that?

Because it's not as simple as you seem to think it is, and because resources are scarce.

(In reply to Mike Hommey [:glandium] from comment #37)

> (In reply to Robert Sayre from comment #36)

> > (In reply to Mike Hommey [:glandium] from comment #35)

> > > Which specific conditions?

> > Is this the default build for a release copy of Firefox on any Linux distro?

> Distros build Firefox the way they want to build it.

So, no.

> > Furthermore, what needs to happen to reproduce this build in similar ways on Windows and macOS?

> The same kind of thing, except they require SDKs that are not redistributable,

So, no.

> > Even if I accept the argument about some narrow Linux build... this bug is still open after six years. Why is that?

> Because it's not as simple as you seem to think it is, and because resources are scarce.

So, no.

(In reply to Mike Hommey [:glandium] from comment #37)

> The same kind of thing, except they require SDKs that are not redistributable, so you'd have to procure them on your own. You can find the tasks used for Windows and macOS builds on https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&selectedJob=255019294&revision=353628fec415324ca6aa333ab6c47d447ecc128e&searchStr=shippable%2Cbuild

Although, the Windows builds have not switched to 3-tier PGO yet, so they are, in fact, not reproducible because their profile data is not published. This will be different for 69. MacOS builds don't do PGO at all yet, so they should be reproducible.

(In reply to Mike Hommey [:glandium] from comment #39)

> (In reply to Mike Hommey [:glandium] from comment #37)

> > The same kind of thing, except they require SDKs that are not redistributable, so you'd have to procure them on your own. You can find the tasks used for Windows and macOS builds on https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&selectedJob=255019294&revision=353628fec415324ca6aa333ab6c47d447ecc128e&searchStr=shippable%2Cbuild

> Although, the Windows builds have not switched to 3-tier PGO yet, so they are, in fact, not reproducible because their profile data is not published. This will be different for 69. MacOS builds don't do PGO at all yet, so they should be reproducible.

Maybe the easiest way to make some progress on this bug: let's set up other builds that mirror the Firefox release. I'm happy to pay for the Firefox Linux x64 (Ubuntu) release reproducibility check.

If you want to reproduce the Firefox Linux x64 release from Ubuntu, you need to talk to Ubuntu.

(In reply to Robert Sayre from comment #40)

> (In reply to Mike Hommey [:glandium] from comment #39)

> > (In reply to Mike Hommey [:glandium] from comment #37)

> > > The same kind of thing, except they require SDKs that are not redistributable, so you'd have to procure them on your own. You can find the tasks used for Windows and macOS builds on https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&selectedJob=255019294&revision=353628fec415324ca6aa333ab6c47d447ecc128e&searchStr=shippable%2Cbuild

> > Although, the Windows builds have not switched to 3-tier PGO yet, so they are, in fact, not reproducible because their profile data is not published. This will be different for 69. MacOS builds don't do PGO at all yet, so they should be reproducible.

> Maybe the easiest way to make some progress on this bug: let's set up other builds that mirror the Firefox release. I'm happy to pay for the Firefox Linux x64 (Ubuntu) release reproducibility check.

(In reply to Mike Hommey [:glandium] from comment #41)

> If you want to reproduce the Firefox Linux x64 release from Ubuntu, you need to talk to Ubuntu.

Just to check understanding: you're saying that Windows and macOS contain binaries that might change, and you can't reproducibly build any "Firefox" build that might appear in a common Linux distro.

No, I'm saying the Windows builds for 68 are not reproducible because the profile data is not published, but it will be for 69, and that Windows and macOS builds should be reproducible, but if you want to reproduce them, you must download the necessary SDKs (MacOS SDK/Windows 10 SDK) on your own, and not from Mozilla servers, because they can't be redistributed by Mozilla due to their license.

As for common Linux distros, Mozilla doesn't control how they build Firefox, so reproducibility of those builds depends on them making them reproducible. The Debian builds used to be reproducible but apparently aren't anymore... which could be caused by a reproducibility issue in the rust compiler.

(In reply to Mike Hommey [:glandium] from comment #43)

> No, I'm saying the Windows builds for 68 are not reproducible because the profile data is not published, but it will be for 69, and that Windows and macOS builds should be reproducible, but if you want to reproduce them, you must download the necessary SDKs (MacOS SDK/Windows 10 SDK) on your own, and not from Mozilla servers, because they can't be redistributed by Mozilla due to their license.

> As for common Linux distros, Mozilla doesn't control how they build Firefox, so reproducibility of those builds depends on them making them reproducible. The Debian builds used to be reproducible but apparently aren't anymore... which could be caused by a reproducibility issue in the rust compiler.

So, just to check understanding: the Debian builds used to be reproducible, but no longer are, and that might be due to an issue in the Rust compiler?

Looking a little bit closer, the status on Debian is that they changed what reproducible builds test, and they now test building from different build paths, so that's why the Debian builds are marked as non-reproducible on https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/firefox.html. They are otherwise reproducible.
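Building from a different path matters because compilers record absolute source paths (in debug info and `__FILE__`-style strings). A common generic mitigation, not something this bug says the Firefox build does, is to map the source directory to a fixed prefix. A minimal sketch, assuming GCC 8+ or a recent Clang for `-ffile-prefix-map`, a rustc with `--remap-path-prefix`, and a build that picks these up from `CFLAGS`/`CXXFLAGS`/`RUSTFLAGS`; `set_prefix_maps` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: append prefix-map flags so the absolute path of the
# checkout is replaced by a fixed prefix in compiler output.
# set_prefix_maps SRCDIR [PREFIX]   (PREFIX defaults to /build)
set_prefix_maps() {
  local src dst
  src=$(cd "$1" && pwd -P) || return 1   # canonical absolute path of the checkout
  dst=${2:-/build}
  CFLAGS="${CFLAGS:+$CFLAGS }-ffile-prefix-map=$src=$dst"
  CXXFLAGS="${CXXFLAGS:+$CXXFLAGS }-ffile-prefix-map=$src=$dst"
  RUSTFLAGS="${RUSTFLAGS:+$RUSTFLAGS }--remap-path-prefix=$src=$dst"
  export CFLAGS CXXFLAGS RUSTFLAGS
}
```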

Turned out to be less problematic than I thought it would be for Linux: https://glandium.org/blog/?p=3923
I also confirmed that the Mac toolchain has some determinism issue that manifests itself in mach-o headers. I didn't check Windows, but now that I think of it, ISTR last time I tried there were linker-induced non-determinism. But that may have been when we were still using MSVC.

(In reply to Mike Hommey [:glandium] from comment #46)

> Turned out to be less problematic than I thought it would be for Linux: https://glandium.org/blog/?p=3923
> I also confirmed that the Mac toolchain has some determinism issue that manifests itself in mach-o headers. I didn't check Windows, but now that I think of it, ISTR last time I tried there were linker-induced non-determinism. But that may have been when we were still using MSVC.

I'm not sure what you're saying.

"the Mac toolchain has some determinism issue that manifests itself in mach-o headers."

what do you mean, exactly?

and you "didn't check Windows, but now that I think of it, ISTR last time I tried there were linker-induced non-determinism"

are you working on this bug?

(In reply to Robert Sayre from comment #47)

> "the Mac toolchain has some determinism issue that manifests itself in mach-o headers."

> what do you mean, exactly?

That building Firefox for Mac with the same environment, same tools, etc. generates executables and libraries that differ in their headers. Although there seem to be other differences in the XUL library, even with LTO and signing disabled. See https://taskcluster-artifacts.net/GQvevOGtS0yOcB1OGb7H_w/0/public/diff.html

> and you "didn't check Windows, but now that I think of it, ISTR last time I tried there were linker-induced non-determinism"

> are you working on this bug?

Not actively. But also note that this bug is about Linux.

(In reply to Mike Hommey [:glandium] from comment #48)

> > are you working on this bug?

> Not actively. But also note that this bug is about Linux.

so, although you are arguing, you're not working on this bug, this bug is only about Linux, and this bug is assigned to no one.

this all sounds mozilla af, but I want to check the facts

Robert, are you volunteering to help? Contributions are welcome.

(btw, Mike is explaining what the state of this bug is - not arguing)

(In reply to Sylvestre Ledru [:sylvestre] from comment #50)

> Robert, are you volunteering to help? Contributions are welcome.

> (btw, Mike is explaining what the state of this bug is - not arguing)

Who should this bug be assigned to? I don't think I'm the right choice.

Well, why is this issue unassigned? I mean, I suspect it's assigned to no one because it's been deemed impossible. Not in a technical sense, that's actually easy. Anybody can build something with Bazel and Nix and get good reproducibility.

So, like, what's the real reason?

First, please start contributing for real to this bug instead of questioning or suggesting that we have a hidden agenda.
Otherwise, I will have to moderate this thread.

The real reason is what Mike said. We cannot do everything. Lately, we decided to focus on performance, which involved PGO (among many other things), and that makes the reproducibility effort much harder.
On top of that, Firefox is one of the most complex pieces of software; we depend on many tools (compilers, linkers, source-to-source tools, libraries, etc.), which makes this work much harder than for most software or libraries.

As you can see in Mike's blog post, it is possible but not trivial. Otherwise, we would have done it already.

(In reply to Sylvestre Ledru [:sylvestre] from comment #53)

> First, please start contributing for real to this bug instead of questioning or suggesting that we have a hidden agenda.
> Otherwise, I will have to moderate this thread.

Go for it. No one commented on this thread for years. One action that would make sense: assign the bug to a person.

I'll add this, and then I'm done "arguing". While this bug doesn't show progress and is not assigned, countless bugs over the years have contributed to being in a state where it is, in fact, now possible, albeit not entirely trivial, to do it. It was not possible at all a few days ago (if you don't count betas and nightlies for which it's only been possible for a few weeks). Like, all the dependencies of this bug that are closed, and also a bunch of others.

(In reply to Mike Hommey [:glandium] from comment #55)

> I'll add this, and then I'm done "arguing". While this bug doesn't show progress and is not assigned, countless bugs over the years have contributed to being in a state where it is, in fact, now possible, albeit not entirely trivial, to do it. It was not possible at all a few days ago (if you don't count betas and nightlies for which it's only been possible for a few weeks). Like, all the dependencies of this bug that are closed, and also a bunch of others.

I'm confused. are you saying this bug should be assigned to you? Maybe Sylvestre?

(In reply to Mike Hommey [:glandium] from comment #46)

> Turned out to be less problematic than I thought it would be for Linux: https://glandium.org/blog/?p=3923

Whoa! That's an amazing milestone reached on this topic!
Thank you for the blogpost with the steps!
Thank you very much for all the work on this to everyone involved over the years!

Depends on: 1596025
Depends on: 1596283
Depends on: 1596341
Depends on: 1596350
Depends on: 1626188