Docker containers used for TaskCluster Linux builds cannot run clang

RESOLVED FIXED

Status

defect
RESOLVED FIXED
4 years ago
Last year

People

(Reporter: Ehsan, Assigned: Ehsan)

Tracking

unspecified

Firefox Tracking Flags

(firefox43 fixed)

Details

I'm trying to run Linux64 static analysis builds using TaskCluster (<https://treeherder.mozilla.org/#/jobs?repo=try&revision=83df91361e4f>), and I'm now hitting the following error: <https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/yVeuqpCLSDChXeFb9Blwyw/0/public/logs/live_backing.log>

03:07:09     INFO -  configure:1184: checking host system type
03:07:09     INFO -  configure:1205: checking target system type
03:07:09     INFO -  configure:1223: checking build system type
03:07:09     INFO -  configure:1302: checking for gawk
03:07:09     INFO -  configure:1387: checking for python2.7
03:07:09     INFO -  configure:1497: checking Python environment is Mozilla virtualenv
03:07:09     INFO -  configure:1718: checking for perl5
03:07:09     INFO -  configure:1718: checking for perl
03:07:09     INFO -  configure:2196: checking for objcopy
03:07:09     INFO -  configure:3449: checking for gcc
03:07:09     INFO -  configure:3562: checking whether the C compiler (/home/worker/workspace/build/src/clang/bin/clang  -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) works
03:07:09     INFO -  configure:3578: /home/worker/workspace/build/src/clang/bin/clang -o conftest   -L/home/worker/workspace/build/src/gtk3/usr/local/lib  conftest.c  1>&5
03:07:09     INFO -  Warning: -Wimplicit-int in configure: type specifier missing, defaults to 'int'
03:07:09     INFO -  configure:3575:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
03:07:09     INFO -  main(){return(0);}
03:07:09     INFO -  ^
03:07:09     INFO -  1 warning generated.
03:07:09     INFO -  /usr/bin/ld: crtbegin.o: No such file: No such file or directory
03:07:09     INFO -  clang: error: linker command failed with exit code 1 (use -v to see invocation)
03:07:09     INFO -  configure: failed program was:
03:07:09     INFO -  #line 3573 "configure"
03:07:09     INFO -  #include "confdefs.h"
03:07:09     INFO -  main(){return(0);}
03:07:09     INFO -  configure: error: installation or configuration problem: C compiler cannot create executables.
03:07:09     INFO -  *** Fix above errors and then restart with\
03:07:09     INFO -                 "/usr/bin/gmake -f client.mk build"
03:07:09     INFO -  gmake[2]: *** [configure] Error 1

Note this part: "/usr/bin/ld: crtbegin.o: No such file: No such file or directory"

Morgan, is there a correct way to run clang on these docker containers?  Is there any way to investigate what's going on here?

Thanks!
Flags: needinfo?(winter2718)
(Note that this version of clang has been built on a CentOS VM, and works on the debug configuration of these builds that are run through buildbot.)
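The linker error above suggests re-running with `-v`. One way to investigate is to ask the clang driver which GCC installation it detected. On Linux, `clang -v` prints `Found candidate GCC installation:` and `Selected GCC installation:` lines to stderr. A bare `crtbegin.o` with no directory in the `ld` error is often a sign that no GCC installation was selected, but treat that as an assumption to verify, not a diagnosis. The helper names below (`parse_gcc_detection`, `probe_clang`) are hypothetical and only sketch the idea:

```python
import re
import subprocess
from typing import List, Optional, Tuple

# Lines the clang driver prints to stderr when run with -v on Linux.
_CANDIDATE = re.compile(r"^Found candidate GCC installation: (.+)$", re.M)
_SELECTED = re.compile(r"^Selected GCC installation: (.+)$", re.M)


def parse_gcc_detection(verbose_output: str) -> Tuple[List[str], Optional[str]]:
    """Return (candidate GCC dirs, selected GCC dir or None) from `clang -v` output."""
    candidates = [c.strip() for c in _CANDIDATE.findall(verbose_output)]
    selected = _SELECTED.search(verbose_output)
    return candidates, selected.group(1).strip() if selected else None


def probe_clang(clang_path: str) -> Tuple[List[str], Optional[str]]:
    """Run `<clang> -v` and report which GCC installation it picked up."""
    proc = subprocess.run([clang_path, "-v"], capture_output=True, text=True)
    # Version and toolchain-detection info go to stderr, not stdout.
    return parse_gcc_detection(proc.stderr)


if __name__ == "__main__":
    import sys

    cands, sel = probe_clang(sys.argv[1] if len(sys.argv) > 1 else "clang")
    for c in cands:
        print("candidate:", c)
    print("selected:", sel or "NONE (the linker may not find crtbegin.o)")
```

For example, running it against `/home/worker/workspace/build/src/clang/bin/clang` inside the container would show whether the tooltool clang sees `/usr/lib/gcc/x86_64-redhat-linux/4.4.4` at all.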
It's likely a problem that can be solved by modifying the search path. I'm tagging Dustin here, since he's the most up to date on the state of these containers.
Flags: needinfo?(winter2718) → needinfo?(dustin)
Cool!  For the record, this is the new (CentOS 6) image.  And it's using clang and gtk3 from tooltool.

[root@taskcluster-worker ~]# find / -name crtbegin.o
/usr/lib/gcc/x86_64-redhat-linux/4.4.4/32/crtbegin.o
/usr/lib/gcc/x86_64-redhat-linux/4.4.4/crtbegin.o

I'm fuzzy on the boundaries between glibc and gcc, but my impression is that crtbegin.o should be included with the compiler package.  Glandium will likely know more.
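The `find / -name crtbegin.o` check above can be narrowed to the GCC library tree and made to report which target triple and GCC version each `crtbegin.o` belongs to. This is a minimal sketch that assumes the `<root>/<triple>/<version>[/<multilib>]/crtbegin.o` layout seen in the output above; `find_crtbegin` is a hypothetical helper, not part of any Mozilla tooling:

```python
from pathlib import Path
from typing import List, Tuple


def find_crtbegin(root: str = "/usr/lib/gcc") -> List[Tuple[str, str, str]]:
    """Return (triple, version, path) for every crtbegin.o under root.

    Assumes the <root>/<triple>/<version>[/<multilib>]/crtbegin.o layout,
    e.g. x86_64-redhat-linux/4.4.4/crtbegin.o and .../4.4.4/32/crtbegin.o.
    """
    results: List[Tuple[str, str, str]] = []
    base = Path(root)
    if not base.is_dir():
        return results
    for path in sorted(base.rglob("crtbegin.o")):
        parts = path.relative_to(base).parts
        # Need at least triple / version / crtbegin.o to classify the file.
        if len(parts) >= 3:
            results.append((parts[0], parts[1], str(path)))
    return results


if __name__ == "__main__":
    for triple, version, path in find_crtbegin():
        print(triple, version, path)
```

Comparing this list with the GCC installation clang reports as selected shows quickly whether the two agree.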

That said, I see

/home/worker/workspace/build/src/testing/taskcluster/scripts/builder/build-linux.sh: line 64: Xvfb: command not found
/home/worker/workspace/build/src/testing/taskcluster/scripts/builder/build-linux.sh: line 75: xvinfo: command not found

which suggests this is using a fairly old version of the tag (the workflow for testing docker images is still pretty sketchy, sorry).  I just re-pushed the latest and greatest, c3c0587087ed.  Try rebuilding?
Flags: needinfo?(dustin) → needinfo?(mh+mozilla)
Oops, no needinfo for glandium yet.
Flags: needinfo?(mh+mozilla)
In fact, Ehsan, if you change testing/docker/desktop-build/REGISTRY to 'taskcluster' in your try push, you'll guarantee getting the right image.  I just added bug 1201864 to log the image ID so we could at least verify which one a task ran.  Sorry about that mess!
I tried that, and now it seems like the build doesn't even pick up the clang binary.  Not sure why that is...

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2aaa5126c2dc
Good thing I didn't offer terms with that guarantee. That was still running an image without Xvfb, which means it's from pretty early in my process on bug 1189892, definitely before things were working.

I've bumped the version number to 0.1.2, of which There Is Only One, so hopefully this time you get the right image if you make a similar bump (see the patch I just put up for review).  I can confirm this is the most up-to-date:

dustin@euclid ~/code/moz/t/m-c $ docker run -ti --rm a4cef8b82f74 which Xvfb
/usr/bin/Xvfb

Sorry about all this.  Improving the docker-build process is definitely on my (and by extension our) radar -- I'm just hustling to get some build images that work at all, first.
Note the output of configure:

15:36:57     INFO -  loading cache ./config.cache
15:36:57     INFO -  checking host system type... x86_64-unknown-linux-gnu
15:36:57     INFO -  checking target system type... x86_64-unknown-linux-gnu
15:36:57     INFO -  checking build system type... x86_64-unknown-linux-gnu
15:36:57     INFO -  checking for gawk... (cached) gawk
15:36:57     INFO -  checking for python2.7... (cached) /usr/bin/python2.7
15:36:57     INFO -  Creating Python environment
15:36:57     INFO -  checking Python environment is Mozilla virtualenv... yes
15:36:58     INFO -  checking for perl5... (cached) /usr/bin/perl
15:36:58     INFO -  checking for objcopy... (cached) /home/worker/workspace/build/src/gcc/bin/objcopy
15:36:58     INFO -  checking for gcc... (cached) /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/gcc
15:36:58     INFO -  checking whether the C compiler (/usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/gcc  -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) works... yes
15:36:58     INFO -  checking whether the C compiler (/usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/gcc  -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) is a cross-compiler... no
15:36:58     INFO -  checking whether we are using GNU C... (cached) yes
15:36:58     INFO -  checking whether /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/gcc accepts -g... (cached) yes
15:36:58     INFO -  checking for c++... (cached) /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/g++
15:36:58     INFO -  checking whether the C++ compiler (/usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/g++  -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) works... yes
15:36:58     INFO -  checking whether the C++ compiler (/usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/g++  -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) is a cross-compiler... no
15:36:58     INFO -  checking whether we are using GNU C++... (cached) yes
15:36:58     INFO -  checking whether /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/g++ accepts -g... (cached) yes
15:36:58     INFO -  checking for ranlib... (cached) ranlib
15:36:58     INFO -  checking for as... (cached) /home/worker/workspace/build/src/gcc/bin/as
15:36:58     INFO -  checking for ar... (cached) ar
15:36:58     INFO -  checking for ld... (cached) ld
15:36:58     INFO -  checking for strip... (cached) strip
15:36:58     INFO -  checking for windres... no
15:36:58     INFO -  checking for otool... no
15:36:58     INFO -  checking for ccache... (cached) /usr/bin/ccache
15:36:58     INFO -  checking for rustc... no
15:36:58     INFO -  checking how to run the C preprocessor... (cached) /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/gcc -E
15:36:58     INFO -  checking how to run the C++ preprocessor... (cached) /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/g++ -E
15:36:58     INFO -  checking for a BSD compatible install... (cached) /usr/bin/install -c
15:36:58     INFO -  checking whether ln -s works... (cached) yes

It's picking cached build configs.  Aren't try jobs supposed to be clobbers?
Thanks!  I guess I need to wait for that bug before retrying...
Depends on: 1201920
Thanks for debugging this with me :)
btw, you can work around this by changing `c64' to something else (both times it appears) in testing/taskcluster/tasks/builds/opt_linux64.yml
Do you mean c6?

I'm not quite sure what this means.  :-)
OK, now the clobber issue is fixed, and we're back to the missing crtbegin.o error:

<https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/fG2c0tnOS9mJh-otyA4_oA/0/public/logs/live_backing.log>
Yes, c6.  I'm having an ESTACKOVERFLOW day!

That still has

/home/worker/workspace/build/src/testing/taskcluster/scripts/builder/build-linux.sh: line 64: Xvfb: command not found
/home/worker/workspace/build/src/testing/taskcluster/scripts/builder/build-linux.sh: line 75: xvinfo: command not found

Given that inbound was closed last I checked, I wasn't able to check in all of my fixes yet.  I don't want to have you keep spinning on my mistakes.  Let's sit on this until Monday, when hopefully you can just push from the tip of inbound and things will "just work".
I'll take this bug and get a try run running for you as soon as this stuff is landed.  LMK if I should base it on something more than https://hg.mozilla.org/try/rev/85691d6755a1
Assignee: nobody → dustin
Thanks!  You probably want https://hg.mozilla.org/try/rev/399c4b47889e which fixes the clobber issue.  Note that the parent of that commit is my import of your patch in bug 1201920, so you may want to rebase on top of that once that bug lands.
If this looks good, then I'll hand the bug back over to you:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3ba8fa9753e&exclusion_profile=false
fingers crossed
OK!  The bits that were failing last week are no longer failing.

From the opt linux64 build:
  15:09:29     INFO -  checking whether the C compiler (/home/worker/workspace/build/src/gcc/bin/gcc  -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) works... yes

but from the static analysis build:

  15:06:50     INFO -  checking whether the C compiler (/home/worker/workspace/build/src/clang/bin/clang  -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) works... no
  ...
  15:06:50     INFO -  /usr/bin/ld: crtbegin.o: No such file: No such file or directory

I assume that the different compiler is part of the static analysis process.

On the Buildbot hosts, we have several system-level compilers installed:

  https://github.com/mozilla/build-puppet/blob/master/modules/runner/files/mockbuild-config-templates/mozilla-centos6-x86_64.cfg#L11
  ... gcc  ...
  https://github.com/mozilla/gecko-dev/blob/3b0d95ee9e777c021324df85e9ae90aff0e9cd7f/testing/mozharness/configs/builds/releng_base_linux_64_builds.py#L114
  'gcc45_0moz3', 'gcc454_0moz1', 'gcc472_0moz1', 'gcc473_0moz1',

So perhaps one of those is providing the necessary crtbegin.o in that case.  That said, we have a system-level compiler in the TaskCluster docker image, too:

  dustin@euclid ~/code/moz/t/m-c $ docker run -ti --rm taskcluster/desktop-build:0.1.2
  [root@taskcluster-worker ~]# find / -name crtbegin.o
  /usr/lib/gcc/x86_64-redhat-linux/4.4.4/32/crtbegin.o
  /usr/lib/gcc/x86_64-redhat-linux/4.4.4/crtbegin.o
  [root@taskcluster-worker ~]# rpm -qf /usr/lib/gcc/x86_64-redhat-linux/4.4.4/crtbegin.o
  gcc-4.4.7-16.el6.x86_64

I'm really not sure what to suggest.
Assignee: dustin → ehsan
Note to self: this command gives me a docker image I can build clang in:

docker run -ti taskcluster/centos6-build:0.0.1
Flags: needinfo?(ehsan)
This CentOS ships with gcc 4.4.7 which is way too old to be able to build clang.  Is there a gcc 4.7 installation somewhere in the docker image that I can use?  I can't figure out how to install a new gcc myself.  :/
How can I bootstrap myself if my boots don't have any straps?!

I assume you can use the gcc473_0moz1 compiler at
  http://mockbuild-repos.pub.build.mozilla.org/releng/public/CentOS/6/x86_64/gcc473_0moz1-4.7.3-0moz1.x86_64.rpm

If we want to automate things, we should probably toss that into tooltool, but just 'yum install <that url>' should work for now.
Thanks, that seems to work.  I don't need any of this automated, at least not for now!
(In reply to Dustin J. Mitchell [:dustin] from comment #24)
> How can I bootstrap myself if my boots don't have any straps?!
> 
> I assume you can use the gcc473_0moz1 compiler at
>   http://mockbuild-repos.pub.build.mozilla.org/releng/public/CentOS/6/x86_64/gcc473_0moz1-4.7.3-0moz1.x86_64.rpm
> 
> If we want to automate things, we should probably toss that into tooltool,
> but just 'yum install <that url>' should work for now.

There are gcc packages in tooltool already, just not rpms.
https://hg.mozilla.org/mozilla-central/rev/262632c896db
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
So I built a new clang in the CentOS 6 docker container, but even with that I am still getting the crtbegin.o error on the try server: https://treeherder.mozilla.org/#/jobs?repo=try&revision=12766c63178e

Is there a way to run the build command in a docker container locally so that I can investigate why the compiler fails?  It seems to work just fine in the container I built it on...
Flags: needinfo?(ehsan) → needinfo?(dustin)
Yep, if you click through to the task inspector
  https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/0
and click on the "Task" tab
  https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/
there's a "Run Locally" that you can -- more or less -- copy/paste.  It will take a little while since there are no caches on your local machine.

However, note that you're running against taskcluster/desktop-build:0.1.1, while the latest image, including that against which my try jobs ran, was 0.1.2 (https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/ and https://hg.mozilla.org/integration/mozilla-inbound/file/8480dd03b9c1/testing/docker/desktop-build/VERSION).  If you rebase on top of central, you should see better behavior.
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] from comment #29)
> Yep, if you click through to the task inspector
>   https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/0
> and click on the "Task" tab
>   https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/
> there's a "Run Locally" that you can -- more or less -- copy/paste.  It will
> take a little while since there are no caches on your local machine.

Great, that is handy to know!

> However, note that you're running against taskcluster/desktop-build:0.1.1,
> while the latest image, including that against which my try jobs ran, was
> 0.1.2 (https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/
> and
> https://hg.mozilla.org/integration/mozilla-inbound/file/8480dd03b9c1/testing/docker/desktop-build/VERSION).  If you rebase on top of central, you should
> see better behavior.

Yeah.  I actually already figured out the issue in the compiler and will soon have a patch ready for review.  My builds are almost working on try now, and I did rebase on top of central.  Thanks for your help!
Component: General Automation → General