Closed Bug 683975 Opened 13 years ago Closed 12 years ago

Need infra for developer contributed compilers

Categories

(Release Engineering :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: espindola)

Details

(Whiteboard: [leave open])

Attachments

(1 file, 26 obsolete files)

4.34 KB, patch
rail: review+
espindola: checked-in+
Some use cases:

* dev is doing clang work, he wants to do a try build with a new version of clang
* dev patches gcc to fix a compiler bug, he wants to do a try build with it and then update the 'real' compilers once the try build is good

These are currently impossible or very difficult without a lot of back and forth with releng.

One idea was to have a pair of svn repos, one for try and one for production. Compilers would be checked in to e.g.
/linux32/gcc45/...
/macosx64/clangXX/...

The mozconfigs for builds would specify the path to the repo, as well as a revision.

The build machines would then look for e.g. /path/to/local/compilers/gcc45-r1 and, if it doesn't exist, check it out from svn.

This avoids having to update a checkout from one revision to another with svn.
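A minimal sketch of the lookup described above, in Python. The `<tool>-r<revision>` directory naming follows the gcc45-r1 example. The `ensure_tool` name and the `fetch` callback are hypothetical; `fetch` stands in for whatever retrieves the tool, such as an svn checkout of the repo at the requested revision.

```python
import os
import shutil
from typing import Callable

# Hypothetical stand-in for the real retrieval step, e.g. an
# `svn checkout -r <revision>` of the repo URL named in the mozconfig.
Fetcher = Callable[[str, int, str], None]


def ensure_tool(cache_root: str, tool: str, revision: int, fetch: Fetcher) -> str:
    """Return <cache_root>/<tool>-r<revision>, fetching it only if it is missing.

    Every revision gets its own directory, so an existing checkout is never
    updated in place to a different revision.
    """
    path = os.path.join(cache_root, f"{tool}-r{revision}")
    if not os.path.isdir(path):
        tmp = path + ".partial"
        shutil.rmtree(tmp, ignore_errors=True)  # leftovers from an interrupted fetch
        fetch(tool, revision, tmp)
        os.rename(tmp, path)  # publish only a complete checkout
    return path
```

The fetch goes into a temporary directory first. That way an interrupted download is never mistaken for a finished checkout on the next build.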
From earlier conversations, a key point here is that this needs to be hands-free for releng -- a developer with try access should be able to test a custom compiler with no human interaction.

A few questions:
 - are any of these compilers secret, or can they be publicly accessible?
 - how large are the compilers?
 - how often do you expect new compilers to be uploaded?  Are we looking at some percentage of the try runs, or one or two per week?
 - do the compilers need to be kept around after the try run(s)?

I'd like to avoid adding additional systems or moving parts to support this, if possible.

Other implementation options:

 - drop files somewhere on FTP -- we'd have to figure out the auth for that, and data lifetime

 - user hg repos -- auth for free, and the dev controls the data lifetime (by deleting the repo)

Either of these options could be combined with changes to the build system, and no changes to the buildbot configs.  Also, they both follow existing network flows.
Is this bug the same as bug 678088?

I like the idea of using mozconfigs to specify both repo and revision. Once they are in m-c they will be very convenient. This should be used for all compilers and other tools, including the ones we build m-c with. Advantages:

*) releng doesn't need to be involved in testing or deploying a new compiler or tool (let's say we start depending on bison for some crazy reason).
*) Test compiler and production compilers use the same infrastructure (test what you ship)
*) Tests can be done with the regular try servers.
*) Production changes (switch to clang on mac, gcc 4.6 on linux, etc) show up as a regular push on tbpl and can be monitored and reverted by regular devs.
>  - how often do you expect new compilers to be uploaded?  Are we looking at
> some percentage of the try runs, or one or two per week?

Even less than that. Since I joined, I think all the pushes would have been

* Two attempts to switch to gcc 4.5
* Two try versions of clang
* some pushes (less than 5) trying to fix gcc bugs

Do note that while there are not many of these, they do have a tendency to cluster, so there can be a week with 3 or 4 try pushes, which is a lot more than what existing infrastructure is convenient for.
Still looking for answers to the other questions, but it seems like this could be done with almost no change to releng infra:

First, add support for selecting/downloading tools in mozconfigs, using an arbitrary URL.  Any downloads should be part of the objdir, and not re-downloaded unnecessarily[1].

When you want to *test* a new tool, add it to a user repo and pull directly from that repo, since build machines don't have access to arbitrary internet URLs.  However, once the tool is in production this would put quite a load on hg, so at that point file a bug with releng to get the tool installed poolwide, and adjust the mozconfigs to point to the installed version.

Thoughts?

[1] Actually, this may be a lot of complexity for little benefit, particularly since try always clobbers.
Assignee: server-ops-releng → dustin
So from an IRC conversation in #build, the above proposal has the downside of requiring releng intervention in the deployment phase.  It also makes it hard to keep old tools around until their branches die, and age them out at that time.  Here's a new proposal, as discussed:

Build a system for downloading tools during the build process, but download them somewhere *outside* the objdir, so that they will persist between builds even across clobbers.  Store them using hashes of some sort to prevent collisions, and build a system to age out unused tools.

For the let's-try-a-new-tool phase, this system would be pointed at tools in a user hg repo.  This is a place that devs can access easily and iterate quickly.  Since it's content-addressable, parallel builds can even use different versions of the tools.

For the deploy-a-new-tool phase, the tool would be uploaded to a well-known directory on ftp.m.o, and downloaded from there.  Different branches can still use different versions of tools -- they'll just download the required version.  So this upload to FTP could happen as soon as the new tool is deemed likely to be deployed -- there's no need to coordinate it with mozconfig changes in m-c or m-i or the like.
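A rough Python sketch of the hash-keyed cache proposed in this comment. The choice of SHA-256, the `ToolCache` class, and the use of file mtime as the "last used" marker for aging out are all assumptions made for illustration. The `download` callback is a hypothetical stand-in for fetching from a user repo or from FTP.

```python
import hashlib
import os
import time
from typing import Callable, List, Optional


class ToolCache:
    """Content-addressed store for downloaded tool archives, kept outside the objdir.

    Each archive is saved under its SHA-256 hex digest, so different tool
    versions never collide and parallel builds can use different tools. A
    file's mtime is refreshed on every use and doubles as a "last used"
    timestamp for aging out.
    """

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def get(self, digest: str, download: Callable[[], bytes]) -> str:
        """Return the cached path for `digest`, downloading and verifying it if absent."""
        path = os.path.join(self.root, digest)
        if not os.path.exists(path):
            data = download()
            actual = hashlib.sha256(data).hexdigest()
            if actual != digest:
                raise ValueError(f"checksum mismatch: wanted {digest}, got {actual}")
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(data)
            os.replace(tmp, path)  # readers never see a partially written file
        os.utime(path)  # record this use
        return path

    def purge(self, max_age: float, now: Optional[float] = None) -> List[str]:
        """Delete archives unused for more than `max_age` seconds; return their digests."""
        now = time.time() if now is None else now
        removed = []
        for name in sorted(os.listdir(self.root)):
            path = os.path.join(self.root, name)
            if now - os.path.getmtime(path) > max_age:
                os.remove(path)
                removed.append(name)
        return removed
```

A branch's mozconfig would only need to name the digest. The same archive can then come from a user repo during testing and from FTP once deployed, and the cache treats both identically.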

The FTP upload is still a question.  Options:

 * releng bug: "Bug XXXXXX: Please add hg:/path/to/some/tool/1.2.3 to /tools/some/tool/1.2.3 on FTP".  This is a simple scp, so the bug would be quick to take care of.

 * SSH key access to the directory for a wider audience: Maybe release-drivers? select devs working on new tools?

 * Automation: a web-based uploader, or some other automatic tool (some security concerns there)

For the record, my opposition to SVN is based on two things:
 1. We don't actively use SVN in build right now, so starting to use it would add another moving piece to break
 2. SVN, while better than a DVCS, is not the right tool for handling large binary files

P.S. We briefly discussed using a CAS for this purpose.  Theoretically it's a good solution, but we have no such infrastructure set up at this point, and no concrete plans to do so.
Since the proposed system can point to a different repo, it should be possible to do try runs by uploading compilers to people.mozilla.com or a user repo (they also support http anyway).

With try jobs being so easy, I think it is reasonable to ask releng to copy a yet-unused package to an FTP server once it is ready to be used in production.

I will try to build something/copy it from rust.
Rafael, access from the build network is pretty severely restricted (well, will be soon), so this would have to be in a fairly limited number of places.  I suspect that will *not* include people, but will include user repos.
That should be fine. A user repo can include a 100MB tar.bz2, no?
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #8)
> That should be fine. A user repo can include a 100MB tar.bz2, no?

Yes.
(In reply to Dustin J. Mitchell [:dustin] from comment #5)
> The FTP upload is still a question.  Options:
> 
>  * releng bug: "Bug XXXXXX: Please add hg:/path/to/some/tool/1.2.3 to
> /tools/some/tool/1.2.3 on FTP".  This is a simple scp, so the bug would be
> quick to take care of.

RelEng already has access to FTP through cltbld & ffxbld accounts on surf. Would it be enough to create, eg, /pub/mozilla.org/buildtools owned by cltbld?
That's my thinking, yes.
I think this is a releng task from here on out, then?
Assignee: dustin → nobody
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
I can do the first bits: converting the spec file into a shell script to build a relocatable tar.bz2.
Assignee: nobody → respindola
Attached file gcc build script (obsolete)
I have had this bug as an idle task for some time. I have started by creating a build script for gcc. The final objective is that running the script on any linux machine should produce

* a .tar.bz2 file with a 32 bit compiler
* a .tar.bz2 file with a 64 bit compiler
* a pair of rpm files with the same contents as the .tar.bz2 files

In addition, the content of those files should be reproducible.

What I have so far is a script that works both on a new fedora 16 and in the very old centos 5 that we use. So far it only produces a 64 bit toolchain and the results are partially deterministic. Running the script twice will produce the exact same contents, but running it on two different distros will not.

I would like to check this in somewhere in m-c while I work on finishing it up if possible.
Attachment #589030 - Flags: review?(rail)
Comment on attachment 589030 [details]
gcc build script

Looks good to me.
Attachment #589030 - Attachment mime type: text/x-python → text/plain
Attachment #589030 - Flags: review?(rail) → review+
Attachment #589031 - Attachment mime type: application/x-shellscript → text/plain
Attachment #589031 - Flags: review?(rail) → review+
https://hg.mozilla.org/mozilla-central/rev/a363fe2b2d3f
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch bootstrap the linker too. (obsolete)
Attachment #589030 - Attachment is obsolete: true
Attachment #589031 - Attachment is obsolete: true
Attachment #590032 - Flags: review?(rail)
Attachment #590032 - Flags: review?(rail) → review+
https://hg.mozilla.org/mozilla-central/rev/8c71c2afb684
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch build glibc (obsolete)
With this we should be producing identical binaries everywhere, but I haven't tested that yet.
Attachment #590032 - Attachment is obsolete: true
Attachment #590936 - Flags: review?(rail)
Comment on attachment 590936 [details] [diff] [review]
build glibc

Review of attachment 590936 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/unix/build-toolchain/build-gcc.py
@@ +103,5 @@
>  def build_source_dir(prefix, version):
>      return source_dir + '/' + prefix + version
>  
>  binutils_version = "2.21.1"
> +glibc_version = "2.13" #FIXME: should probably use 2.5.1

Yup. 2.13 sounds too old. The slaves have glibc-2.5-12 installed. r+ with this fixed unless you explicitly want to use 2.13
Attachment #590936 - Flags: review?(rail) → review+
You mean too *new*, right?
My hope was to push this first since it works and then backport any patches needed to build 2.5 with currentish gcc.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #24)
> You mean too *new*, right?
> My hope was to push this first since it works and then backport any patches
> needed to build 2.5 with currentish gcc.

I mean I prefer 2.5, because it is used by our current infrastructure. Using 2.13 shouldn't be a problem though.
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ca66b666f85

Yes, we will probably have to switch to 2.5. This is just a step in the right direction.
A particular issue in making reproducible builds is that the source and build directories show up in the debug info.

While it might be possible to have a more elegant solution, this patch implements the simple one: always use the same directory.
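A small Python sketch of the "always use the same directory" approach. The `prepare_dirs` helper and the `src`/`build`/`inst` subdirectory names are hypothetical, and the fixed base directory is left to the caller. The idea is to wipe and recreate one known absolute location before every build, so the paths recorded in the debug info match from run to run.

```python
import os
import shutil
from typing import Dict


def prepare_dirs(base_dir: str) -> Dict[str, str]:
    """Recreate base_dir from scratch and return fixed subdirectory paths.

    Absolute source and build paths end up in the debug info, so every
    build must run in exactly the same directories for the output to match.
    """
    shutil.rmtree(base_dir, ignore_errors=True)  # drop any previous build
    dirs = {name: os.path.join(base_dir, name) for name in ("src", "build", "inst")}
    for path in dirs.values():
        os.makedirs(path)
    return dirs
```

The trade-off is that two toolchain builds cannot run concurrently on one machine, because both would claim the same directory.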
Attachment #590936 - Attachment is obsolete: true
Attachment #591353 - Flags: review?(rail)
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #26)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/3ca66b666f85
> 
> Yes, we will probably have to switch to 2.5. This is just a step in the
> right direction.

https://hg.mozilla.org/mozilla-central/rev/3ca66b666f85

Leaving open for remaining work.
Comment on attachment 591353 [details] [diff] [review]
Always extract source and build in the same directory

Review of attachment 591353 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the following fix:

::: build/unix/build-toolchain/build-gcc.py
@@ +99,5 @@
>  ##############################################
>  
> +# The directories end up in the debug info, so the easy way of getting
> +# a reproducible build is to run it in a know absolute directory.
> +base_dir = "/tmp/moz-toolchain"

Hmmm, this may be a problem because /tmp is not a separate mount.
Can you use /builds/slave/moz-toolschain so it will be purged by automation?
Attachment #591353 - Flags: review?(rail) → review+
Attached patch Build glibc with the new gcc (obsolete)
This patch has two small changes:
*) Don't build the c++ compiler on the first stage, saving some time
*) Build glibc with the compiler we just built (i.e. same stage). This is so that
  *) We only have to worry about glibc building with one gcc version
  *) The last glibc is built with a full compiler. In a future patch I will have to strip down the stage 1 gcc a bit more.
Attachment #591353 - Attachment is obsolete: true
Attachment #591699 - Flags: review?(rail)
Attachment #591699 - Flags: review?(rail) → review+
https://hg.mozilla.org/mozilla-central/rev/a90aec7e5250

Not sure if this is done yet, leaving open.
Attached patch Use the built libc when linking (obsolete)
This patch adds a patch that was missing from hg (sorry about that) and changes the glibc build to be installed in $prefix/lib64. This is where gcc looks first, and so it makes sure we use our glibc when linking, not just for headers.
Attachment #591699 - Attachment is obsolete: true
Attachment #591830 - Flags: review?(rail)
Attachment #591830 - Flags: review?(rail) → review+
https://hg.mozilla.org/mozilla-central/rev/7a0a7c36def8
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
libc.so is a linker script, and without this patch it points to the build directory. With this patch it uses a relative path so that the correct objects are used after the build directory is deleted.
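One way to make such a linker script relocatable, sketched in Python. This is not necessarily what the patch does, and `relativize_ld_script` is a hypothetical helper. It replaces every absolute path under the install prefix with the bare file name. GNU ld looks up a non-absolute name in a linker script first in the current directory and then along the library search path. Those bare names therefore resolve as long as the toolchain's lib64 directory is searched first, as noted above for $prefix/lib64.

```python
import os
import re


def relativize_ld_script(script: str, install_prefix: str) -> str:
    """Replace absolute paths under install_prefix with bare file names.

    Paths outside install_prefix are left untouched.
    """
    prefix = re.escape(install_prefix.rstrip("/") + "/")
    # A path token ends at whitespace or a parenthesis, as in GROUP ( ... ).
    pattern = re.compile(prefix + r"[^\s()]+")
    return pattern.sub(lambda m: os.path.basename(m.group(0)), script)
```

Applied to a GROUP line such as `GROUP ( /tmp/b/inst/lib64/libc.so.6 /tmp/b/inst/lib64/libc_nonshared.a ... )` with prefix `/tmp/b/inst`, it yields `GROUP ( libc.so.6 libc_nonshared.a ... )`.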
Attachment #591830 - Attachment is obsolete: true
Attachment #592729 - Flags: review?(rail)
Attachment #592729 - Flags: review?(rail) → review+
https://hg.mozilla.org/mozilla-central/rev/e42f47918faf

Not sure if this needs to be left open; but doing so seeing how many times it's been resolved and then reverted. If there are any more landings on inbound, adding "[leave open]" to the whiteboard will prevent people merging from resolving - if that helps? :-)
Whiteboard: [leave open]
Attached patch Start downgrading glibc (obsolete)
For glibc 2.12.2 all that was needed was manually running autoconf.
Attachment #592729 - Attachment is obsolete: true
Attachment #596967 - Flags: review?(rail)
Attachment #596967 - Flags: review?(rail) → review+
Attached patch downgrade glibc to 2.11.1 (obsolete)
The patch is a bit big because glibc 2.11.1 doesn't build with make 3.82, so we have to build make 3.81 first.
Attachment #596967 - Attachment is obsolete: true
Attachment #597546 - Flags: review?(rail)
Attachment #597546 - Flags: review?(rail) → review+
Attached patch Downgrade glibc to 2.10.1. (obsolete)
Downgrade clang to 2.10.1.

This time all that was needed was removing the broken version check that thinks 2.21 < 2.13.
Attachment #597546 - Attachment is obsolete: true
Attachment #597825 - Flags: review?(rail)
Attachment #597825 - Attachment description: Downgrade clang to 2.10.1. → Downgrade glibc to 2.10.1.
Attachment #597825 - Flags: review?(rail) → review+
Attached patch Downgrade clang to 2.9 (obsolete)
This required backporting patches that avoid the extremely brittle linker script patching that old glibcs used to do.
Attachment #597825 - Attachment is obsolete: true
Attachment #597875 - Flags: review?(rail)
Attachment #597875 - Flags: review?(rail) → review+
Attached patch downgrade glibc to 2.7 (obsolete)
Surprisingly all that was needed this time was rebasing the patch.
Attachment #597875 - Attachment is obsolete: true
Attachment #597921 - Flags: review?(rail)
Attachment #597921 - Flags: review?(rail) → review+
Attached patch downgrade glibc to 2.6.1 (obsolete)
getting close :-)
Attachment #597921 - Attachment is obsolete: true
Attachment #598213 - Flags: review?(rail) → review+
With this patch we are finally at glibc 2.5.1. The patch also includes a fix (adding -p to gzip) to make the build reproducible again.
Attachment #598213 - Attachment is obsolete: true
Attachment #598257 - Flags: review?(rail)
Attachment #598257 - Flags: review?(rail) → review+
3 small fixes to make the results more reproducible:

*    Build our own linux headers.
*    Disable multilib as we don't build a 32 bit glibc.
*    Build c++ on stage1 as binutils' configure wants a c++ preprocessor.
Attachment #598257 - Attachment is obsolete: true
Attachment #598420 - Flags: review?(rail)
Attachment #598420 - Flags: review?(rail) → review+
Attached patch build unifdef (obsolete)
We need to build unifdef since the version in centos and the one in fedora 16 produce different results.
Attachment #598420 - Attachment is obsolete: true
Attachment #600242 - Flags: review?(rail)
Attachment #600242 - Flags: review?(rail) → review+
This avoids libstdc++ being built with the system glibc headers.
Attachment #600242 - Attachment is obsolete: true
Attachment #600952 - Flags: review?(rail)
Attachment #600952 - Flags: review?(rail) → review+
This patch does two things:
* disable gcc's fixinc. It was copying system headers into the toolchain.tar file
* point FLAGS_FOR_TARGET to the correct header install dir. Stage2 was still using the glibc headers from /usr/include

With these fixes the last header difference is auto-host.h.
Attachment #600952 - Attachment is obsolete: true
Attachment #601256 - Flags: review?(rail)
Attachment #601256 - Flags: review?(rail) → review+
Attached patch Disable lto (obsolete)
Without --enable-lto or --disable-lto, configure will enable LTO if it finds libelf-dev installed, which makes the build harder to reproduce.
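The general pattern here is to pin every feature whose default depends on what the build host has installed. A hedged Python sketch: `configure_command` and `PINNED_GCC_FLAGS` are hypothetical names, and the list only carries the two flags this bug mentions (`--disable-lto` here, `--disable-multilib` in an earlier patch). It is not the script's actual configure line.

```python
from typing import List, Sequence

# Features that configure would otherwise auto-detect from the host
# (e.g. LTO when libelf-dev is present). Illustrative, not exhaustive.
PINNED_GCC_FLAGS = ["--disable-lto", "--disable-multilib"]


def configure_command(source_dir: str, prefix: str, extra: Sequence[str] = ()) -> List[str]:
    """Build a gcc configure invocation with host-dependent features pinned."""
    return [f"{source_dir}/configure", f"--prefix={prefix}", *PINNED_GCC_FLAGS, *extra]
```

Keeping the pinned flags in one list makes it easy to add the next host-dependent feature once a CentOS/Fedora comparison turns one up.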
Attachment #601256 - Attachment is obsolete: true
Attachment #608834 - Flags: review?(rail)
Attachment #608834 - Flags: review?(rail) → review+
Attached patch Set PATH when building stage2 (obsolete)
gcc's configure runs ldd to find glibc's version. If we don't set the PATH then it finds the host glibc version.
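A minimal Python sketch of the PATH fix. The `stage_env` helper is hypothetical. It puts the previous stage's `bin` directory first on PATH, so the `ldd` installed there by glibc shadows the host's `/usr/bin/ldd` when gcc's configure looks it up.

```python
import os
from typing import Dict, Mapping, Optional


def stage_env(stage_inst_dir: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with <stage_inst_dir>/bin prepended to PATH."""
    env = dict(os.environ if base is None else base)
    bin_dir = os.path.join(stage_inst_dir, "bin")
    old = env.get("PATH")
    env["PATH"] = bin_dir + (os.pathsep + old if old else "")
    return env
```

The result would be passed as `env=` to the `subprocess` call that runs configure for the next stage. That changes the child's environment only, not the script's own.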
Attachment #608834 - Attachment is obsolete: true
Attachment #608866 - Flags: review?(rail)
Attachment #608866 - Flags: review?(rail) → review+
It is only used for compressed debug info which we don't use and is normally enabled iff /usr/include/zlib.h is found.
Attachment #608866 - Attachment is obsolete: true
Attachment #611533 - Flags: review?(rail)
Attachment #611533 - Flags: review?(rail) → review+
Attached patch make lib a symlink to lib64 (obsolete)
gcc (or its build system) has a bug in that xgcc will look at inst/lib but not inst/lib64, so libraries built by gcc during its build process use the system libc.

The easy way to fix this is to just create a link. With this libiberty.a built on centos5 becomes identical to the one built on fedora16.
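The link itself is a one-liner. This hedged Python sketch (`link_lib_to_lib64` is a hypothetical name) uses a relative target so the installed tree stays relocatable. It also refuses to clobber a real `lib` directory.

```python
import os


def link_lib_to_lib64(inst_dir: str) -> None:
    """Make <inst_dir>/lib a relative symlink to lib64 (idempotent)."""
    lib = os.path.join(inst_dir, "lib")
    if os.path.islink(lib):
        return  # already linked
    if os.path.exists(lib):
        raise FileExistsError(f"{lib} exists and is not a symlink")
    os.makedirs(os.path.join(inst_dir, "lib64"), exist_ok=True)
    os.symlink("lib64", lib)  # relative target: survives moving inst_dir
```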
Attachment #611533 - Attachment is obsolete: true
Attachment #620724 - Flags: review?(rail)
Attachment #620724 - Flags: review?(rail) → review+
This patch backports part of libtool's 74c8993c178a1386ea5e2363a01d919738402f30 commit to the copy in gcc 4.5. With this we produce identical libstdc++.a in centos and fedora.
Attachment #620724 - Attachment is obsolete: true
Attachment #621743 - Flags: review?
Attachment #621743 - Flags: review? → review?(rail)
Attachment #621743 - Flags: review?(rail) → review+
Attached patch do a 3 step bootstrap (obsolete)
In the two step bootstrap we got all the .a files identical. Unfortunately, given the order we have to build gcc and glibc, stage1 .a files end up being linked in stage2 .so files.

By doing a 3 step bootstrap we get a lot more files with identical md5s. For example, simple utilities like sln are now bit by bit identical.
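Checks like "identical md5s" across the CentOS 5 and Fedora 16 builds can be automated with a small tree comparison. A Python sketch with hypothetical helper names. It hashes every regular file under two install trees and reports the relative paths that differ or exist on only one side. Symlinks are skipped.

```python
import hashlib
import os
from typing import Dict, List


def tree_md5s(root: str) -> Dict[str, str]:
    """Map each regular file's path (relative to root) to its MD5 hex digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks are not hashed
            h = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 16), b""):
                    h.update(chunk)
            sums[os.path.relpath(path, root)] = h.hexdigest()
    return sums


def differing_files(a: str, b: str) -> List[str]:
    """Files present in only one tree, or present in both with different contents."""
    sa, sb = tree_md5s(a), tree_md5s(b)
    return sorted(p for p in sa.keys() | sb.keys() if sa.get(p) != sb.get(p))
```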
Attachment #621743 - Attachment is obsolete: true
Attachment #623915 - Flags: review?(rail)
Attachment #623915 - Flags: review?(rail) → review+
Attached patch build gawk (obsolete)
glibc's build uses gawk and the versions in centos 5 and fedora 16 produce different results. With this we now get an identical libc-2.5.1.so.
Attachment #623915 - Attachment is obsolete: true
Attachment #624268 - Flags: review?(rail)
Attachment #624268 - Flags: review?(rail) → review+
https://hg.mozilla.org/mozilla-central/rev/71f88d947796
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [leave open]
Status: REOPENED → ASSIGNED
Attached patch build zlib
Binutils' configure tries to add -lz to the link line if it can. That fails on fedora 16 because the system one depends on a newer libc than the one we are using.

By building zlib ourselves, the same -lz gets used in both the centos and fedora builds.
Attachment #624268 - Attachment is obsolete: true
Attachment #624740 - Flags: review?(rail)
Attachment #624740 - Flags: review?(rail) → review+
https://hg.mozilla.org/mozilla-central/rev/bc8c8f0afdb7
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
Whiteboard: [leave open]
Let's call this fixed by tooltool. We still have to move the gcc packages to use it, but it is less confusing to have a separate bug for that.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering