Closed Bug 1201776 Opened 10 years ago Closed 10 years ago

Docker containers used for TaskCluster Linux builds cannot run clang

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(firefox43 fixed)

RESOLVED FIXED

People

(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)

I'm trying <https://treeherder.mozilla.org/#/jobs?repo=try&revision=83df91361e4f> to run Linux64 static analysis builds using Task Cluster. I'm hitting the following error now: <https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/yVeuqpCLSDChXeFb9Blwyw/0/public/logs/live_backing.log>

03:07:09 INFO - configure:1184: checking host system type
03:07:09 INFO - configure:1205: checking target system type
03:07:09 INFO - configure:1223: checking build system type
03:07:09 INFO - configure:1302: checking for gawk
03:07:09 INFO - configure:1387: checking for python2.7
03:07:09 INFO - configure:1497: checking Python environment is Mozilla virtualenv
03:07:09 INFO - configure:1718: checking for perl5
03:07:09 INFO - configure:1718: checking for perl
03:07:09 INFO - configure:2196: checking for objcopy
03:07:09 INFO - configure:3449: checking for gcc
03:07:09 INFO - configure:3562: checking whether the C compiler (/home/worker/workspace/build/src/clang/bin/clang -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) works
03:07:09 INFO - configure:3578: /home/worker/workspace/build/src/clang/bin/clang -o conftest -L/home/worker/workspace/build/src/gtk3/usr/local/lib conftest.c 1>&5
03:07:09 INFO - Warning: -Wimplicit-int in configure: type specifier missing, defaults to 'int'
03:07:09 INFO - configure:3575:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
03:07:09 INFO - main(){return(0);}
03:07:09 INFO - ^
03:07:09 INFO - 1 warning generated.
03:07:09 INFO - /usr/bin/ld: crtbegin.o: No such file: No such file or directory
03:07:09 INFO - clang: error: linker command failed with exit code 1 (use -v to see invocation)
03:07:09 INFO - configure: failed program was:
03:07:09 INFO - #line 3573 "configure"
03:07:09 INFO - #include "confdefs.h"
03:07:09 INFO - main(){return(0);}
03:07:09 INFO - configure: error: installation or configuration problem: C compiler cannot create executables.
03:07:09 INFO - *** Fix above errors and then restart with\
03:07:09 INFO - "/usr/bin/gmake -f client.mk build"
03:07:09 INFO - gmake[2]: *** [configure] Error 1

Note this part: "/usr/bin/ld: crtbegin.o: No such file: No such file or directory"

Morgan, is there a correct way to run clang on these docker containers? Is there any way to investigate what's going on here? Thanks!
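For anyone triaging a similar log, the key line can be pulled out mechanically. The following is a minimal sketch, not part of the original report: the function name `find_missing_link_inputs` and the regular expression are illustrative assumptions, and the pattern only covers the message shape seen in this log ("ld: <file>: No such file").

```python
import re

# Assumption: GNU ld reports a missing input as "<ld path>: <file>: No such file".
# Only the message shape seen in the log above is handled.
MISSING_FILE = re.compile(r"ld: (?P<name>[^\s:]+): No such file")


def find_missing_link_inputs(log_text):
    """Return the file names ld reported as missing, in log order, without repeats."""
    missing = []
    for line in log_text.splitlines():
        match = MISSING_FILE.search(line)
        if match and match.group("name") not in missing:
            missing.append(match.group("name"))
    return missing


# Three lines copied from the log in the report.
sample = """\
03:07:09 INFO - 1 warning generated.
03:07:09 INFO - /usr/bin/ld: crtbegin.o: No such file: No such file or directory
03:07:09 INFO - clang: error: linker command failed with exit code 1 (use -v to see invocation)
"""

print(find_missing_link_inputs(sample))
```

Knowing the exact file name (here crtbegin.o) is what makes the later `find / -name crtbegin.o` check in this thread possible.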
Flags: needinfo?(winter2718)
(Note that this version of clang has been built on a CentOS VM, and works on the debug configuration of these builds that are run through buildbot.)
It's likely a problem that can be solved by modifying the search path. I'm tagging Dustin here, since he's the most up to date on the state of these containers.
Flags: needinfo?(winter2718) → needinfo?(dustin)
Cool! For the record, this is the new (CentOS 6) image. And it's using clang and gtk3 from tooltool.

[root@taskcluster-worker ~]# find / -name crtbegin.o
/usr/lib/gcc/x86_64-redhat-linux/4.4.4/32/crtbegin.o
/usr/lib/gcc/x86_64-redhat-linux/4.4.4/crtbegin.o

I'm fuzzy on the boundaries between glibc and gcc, but my impression is that crtbegin.o should be included with the compiler package. Glandium will likely know more.

That said, I see

/home/worker/workspace/build/src/testing/taskcluster/scripts/builder/build-linux.sh: line 64: Xvfb: command not found
/home/worker/workspace/build/src/testing/taskcluster/scripts/builder/build-linux.sh: line 75: xvinfo: command not found

which suggests this is using a fairly old version of the tag (the workflow for testing docker images is still pretty sketchy, sorry). I just re-pushed the latest and greatest, c3c0587087ed. Try rebuilding?
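The `find / -name crtbegin.o` check above can also be scripted, which is handy when comparing several images or directories. This is a rough Python equivalent under stated assumptions: the helper name `find_by_name` is hypothetical, and the `/usr/lib/gcc` starting point is just the directory that appeared in the output above, not a path clang is known to search.

```python
import os


def find_by_name(root, filename):
    """Yield every path under root whose basename equals filename."""
    # os.walk silently yields nothing if root does not exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            yield os.path.join(dirpath, filename)


if __name__ == "__main__":
    # Assumption: /usr/lib/gcc is where the system gcc keeps its startup objects,
    # as in the find output quoted in this comment.
    for path in sorted(find_by_name("/usr/lib/gcc", "crtbegin.o")):
        print(path)
```

Restricting the walk to one directory is much faster than scanning `/`, at the cost of missing copies that live elsewhere (for example inside an unpacked tooltool toolchain).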
Flags: needinfo?(dustin) → needinfo?(mh+mozilla)
oops, no needinfo for glandium yet
Flags: needinfo?(mh+mozilla)
In fact, Ehsan, if you change testing/docker/desktop-build/REGISTRY to 'taskcluster' in your try push, you'll guarantee getting the right image. I just added bug 1201864 to log the image ID so we could at least verify which one a task ran. Sorry about that mess!
I tried that, and now it seems like the build doesn't even pick up the clang binary. Not sure why that is... https://treeherder.mozilla.org/#/jobs?repo=try&revision=2aaa5126c2dc
Good thing I didn't offer terms with that guarantee.. that was still running an image without Xvfb, which means it's from pretty early in my process on bug 1189892 - definitely before things were working.

I've bumped the version number to 0.1.2, of which There Is Only One, so hopefully this time you get the right image if you make a similar bump (see the patch I just put up for review). I can confirm this is the most up-to-date:

dustin@euclid ~/code/moz/t/m-c $ docker run -ti --rm a4cef8b82f74 which Xvfb
/usr/bin/Xvfb

Sorry about all this. Improving the docker-build process is definitely on my (and by extension our) radar -- I'm just hustling to get some build images that work at all, first.
Note the output of configure:

15:36:57 INFO - loading cache ./config.cache
15:36:57 INFO - checking host system type... x86_64-unknown-linux-gnu
15:36:57 INFO - checking target system type... x86_64-unknown-linux-gnu
15:36:57 INFO - checking build system type... x86_64-unknown-linux-gnu
15:36:57 INFO - checking for gawk... (cached) gawk
15:36:57 INFO - checking for python2.7... (cached) /usr/bin/python2.7
15:36:57 INFO - Creating Python environment
15:36:57 INFO - checking Python environment is Mozilla virtualenv... yes
15:36:58 INFO - checking for perl5... (cached) /usr/bin/perl
15:36:58 INFO - checking for objcopy... (cached) /home/worker/workspace/build/src/gcc/bin/objcopy
15:36:58 INFO - checking for gcc... (cached) /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/gcc
15:36:58 INFO - checking whether the C compiler (/usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/gcc -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) works... yes
15:36:58 INFO - checking whether the C compiler (/usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/gcc -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) is a cross-compiler... no
15:36:58 INFO - checking whether we are using GNU C... (cached) yes
15:36:58 INFO - checking whether /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/gcc accepts -g... (cached) yes
15:36:58 INFO - checking for c++... (cached) /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/g++
15:36:58 INFO - checking whether the C++ compiler (/usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/g++ -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) works... yes
15:36:58 INFO - checking whether the C++ compiler (/usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/g++ -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) is a cross-compiler... no
15:36:58 INFO - checking whether we are using GNU C++... (cached) yes
15:36:58 INFO - checking whether /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/g++ accepts -g... (cached) yes
15:36:58 INFO - checking for ranlib... (cached) ranlib
15:36:58 INFO - checking for as... (cached) /home/worker/workspace/build/src/gcc/bin/as
15:36:58 INFO - checking for ar... (cached) ar
15:36:58 INFO - checking for ld... (cached) ld
15:36:58 INFO - checking for strip... (cached) strip
15:36:58 INFO - checking for windres... no
15:36:58 INFO - checking for otool... no
15:36:58 INFO - checking for ccache... (cached) /usr/bin/ccache
15:36:58 INFO - checking for rustc... no
15:36:58 INFO - checking how to run the C preprocessor... (cached) /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/gcc -E
15:36:58 INFO - checking how to run the C++ preprocessor... (cached) /usr/bin/ccache /home/worker/workspace/build/src/gcc/bin/g++ -E
15:36:58 INFO - checking for a BSD compatible install... (cached) /usr/bin/install -c
15:36:58 INFO - checking whether ln -s works... (cached) yes

It's picking cached build configs. Aren't try jobs supposed to be clobbers?
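The symptom described here (a "loading cache" line plus "(cached)" answers) can be detected automatically. Below is a small sketch under assumptions: the function names `cached_checks` and `looks_clobbered` are made up for illustration, and the heuristic relies only on the two markers visible in the configure output above.

```python
def cached_checks(configure_output):
    """Return the descriptions of configure checks answered from config.cache."""
    names = []
    for line in configure_output.splitlines():
        # A cached result looks like "checking for gawk... (cached) gawk".
        head, sep, _tail = line.partition("... (cached)")
        if sep and "checking " in head:
            names.append(head.split("checking ", 1)[1])
    return names


def looks_clobbered(configure_output):
    """Heuristic (assumption): a clobber build neither loads a cache
    nor reports any cached check results."""
    return ("loading cache" not in configure_output
            and not cached_checks(configure_output))


# Lines copied from the configure output in this comment.
sample = """\
15:36:57 INFO - loading cache ./config.cache
15:36:57 INFO - checking for gawk... (cached) gawk
15:36:57 INFO - checking Python environment is Mozilla virtualenv... yes
"""

print(cached_checks(sample))
print(looks_clobbered(sample))
```

In this thread the stale cache mattered because the cached gcc paths hid the fact that the job was supposed to use clang.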
Thanks! I guess I need to wait for that bug before retrying...
Depends on: 1201920
Thanks for debugging this with me :)
btw, you can work around this by changing `c64' to something else (both times it appears) in testing/taskcluster/tasks/builds/opt_linux64.yml
Do you mean c6? I'm not quite sure what this means. :-)
OK, now the clobber issue is fixed, and we're back to the missing crtbegin.o error: <https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/fG2c0tnOS9mJh-otyA4_oA/0/public/logs/live_backing.log>
Yes, c6. I'm having an ESTACKOVERFLOW day!

That still has

/home/worker/workspace/build/src/testing/taskcluster/scripts/builder/build-linux.sh: line 64: Xvfb: command not found
/home/worker/workspace/build/src/testing/taskcluster/scripts/builder/build-linux.sh: line 75: xvinfo: command not found

Given that inbound was closed last I checked, I wasn't able to check in all of my fixes yet. I don't want to have you keep spinning on my mistakes. Let's sit on this until Monday, when hopefully you can just push from the tip of inbound and things will "just work".
I'll take this bug and get a try run running for you as soon as this stuff is landed. LMK if I should base it on something more than https://hg.mozilla.org/try/rev/85691d6755a1
Assignee: nobody → dustin
Thanks! You probably want https://hg.mozilla.org/try/rev/399c4b47889e which fixes the clobber issue. Note that the parent of that commit is my import of your patch in bug 1201920, so you may want to rebase on top of that once that bug lands.
If this looks good, then I'll hand the bug back over to you: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3ba8fa9753e&exclusion_profile=false fingers crossed
OK! The bits that were failing last week are no longer failing. From the opt linux64 build:

15:09:29 INFO - checking whether the C compiler (/home/worker/workspace/build/src/gcc/bin/gcc -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) works... yes

but from the static analysis build:

15:06:50 INFO - checking whether the C compiler (/home/worker/workspace/build/src/clang/bin/clang -L/home/worker/workspace/build/src/gtk3/usr/local/lib ) works... no
...
15:06:50 INFO - /usr/bin/ld: crtbegin.o: No such file: No such file or directory

I assume that the different compiler is part of the static analysis process. On the Buildbot hosts, we have several system-level compilers installed:

https://github.com/mozilla/build-puppet/blob/master/modules/runner/files/mockbuild-config-templates/mozilla-centos6-x86_64.cfg#L11
...
gcc
...

https://github.com/mozilla/gecko-dev/blob/3b0d95ee9e777c021324df85e9ae90aff0e9cd7f/testing/mozharness/configs/builds/releng_base_linux_64_builds.py#L114
'gcc45_0moz3', 'gcc454_0moz1', 'gcc472_0moz1', 'gcc473_0moz1',

So perhaps one of those is providing the necessary crtbegin.o in that case. That said, we have a system-level compiler in the TaskCluster docker image, too:

dustin@euclid ~/code/moz/t/m-c $ docker run -ti --rm taskcluster/desktop-build:0.1.2
[root@taskcluster-worker ~]# find / -name crtbegin.o
/usr/lib/gcc/x86_64-redhat-linux/4.4.4/32/crtbegin.o
/usr/lib/gcc/x86_64-redhat-linux/4.4.4/crtbegin.o
[root@taskcluster-worker ~]# rpm -qf /usr/lib/gcc/x86_64-redhat-linux/4.4.4/crtbegin.o
gcc-4.4.7-16.el6.x86_64

I'm really not sure what to suggest.
Assignee: dustin → ehsan
Note to self: this command gives me a docker image I can build clang in: docker run -ti taskcluster/centos6-build:0.0.1
Flags: needinfo?(ehsan)
This CentOS image ships with gcc 4.4.7, which is way too old to build clang. Is there a gcc 4.7 installation somewhere in the docker image that I can use? I can't figure out how to install a new gcc myself. :/
How can I bootstrap myself if my boots don't have any straps?!

I assume you can use the gcc473_0moz1 compiler at

http://mockbuild-repos.pub.build.mozilla.org/releng/public/CentOS/6/x86_64/gcc473_0moz1-4.7.3-0moz1.x86_64.rpm

If we want to automate things, we should probably toss that into tooltool, but just 'yum install <that url>' should work for now.
Thanks, that seems to work. I don't need any of this automated, at least not for now!
(In reply to Dustin J. Mitchell [:dustin] from comment #24)
> How can I bootstrap myself if my boots don't have any straps?!
>
> I assume you can use the gcc473_0moz1 compiler at
>
> http://mockbuild-repos.pub.build.mozilla.org/releng/public/CentOS/6/x86_64/gcc473_0moz1-4.7.3-0moz1.x86_64.rpm
>
> If we want to automate things, we should probably toss that into tooltool,
> but just 'yum install <that url>' should work for now.

There are gcc packages in tooltool already, just not rpms.
So I built a new clang in the CentOS 6 docker container, but even with that I am still getting the crtbegin.o error on the try server: https://treeherder.mozilla.org/#/jobs?repo=try&revision=12766c63178e Is there a way to run the build command in a docker container locally so that I can investigate why the compiler fails? It seems to work just fine in the container I built it on...
Flags: needinfo?(ehsan) → needinfo?(dustin)
Yep, if you click through to the task inspector https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/0 and click on the "Task" tab https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/ there's a "Run Locally" that you can -- more or less -- copy/paste. It will take a little while since there are no caches on your local machine. However, note that you're running against taskcluster/desktop-build:0.1.1, while the latest image, including that against which my try jobs ran, was 0.1.2 (https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/ and https://hg.mozilla.org/integration/mozilla-inbound/file/8480dd03b9c1/testing/docker/desktop-build/VERSION). If you rebase on top of central, you should see better behavior.
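The mismatch described here (a task running desktop-build:0.1.1 while the VERSION file says 0.1.2) is easy to check for before pushing. This is only a sketch: `split_image_ref` and `image_is_current` are hypothetical helper names, and it assumes the VERSION file holds nothing but the version string, which this thread does not confirm.

```python
def split_image_ref(image_ref):
    """Split 'name:tag' into (name, tag); tag is None when no tag is present."""
    name, sep, tag = image_ref.rpartition(":")
    # A '/' after the last ':' means the colon belonged to a registry port,
    # e.g. 'localhost:5000/image', so there is no tag.
    if not sep or "/" in tag:
        return image_ref, None
    return name, tag


def image_is_current(image_ref, version_file_text):
    """True when the image tag equals the version recorded in the VERSION file.

    Assumption: the VERSION file contains only the version string.
    """
    _name, tag = split_image_ref(image_ref)
    return tag == version_file_text.strip()


print(image_is_current("taskcluster/desktop-build:0.1.1", "0.1.2\n"))
print(image_is_current("taskcluster/desktop-build:0.1.2", "0.1.2\n"))
```

A check like this would only catch a stale tag. It would not catch the earlier problem in this thread, where the same tag pointed at different image contents.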
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] from comment #29)
> Yep, if you click through to the task inspector
> https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/0
> and click on the "Task" tab
> https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/
> there's a "Run Locally" that you can -- more or less -- copy/paste. It will
> take a little while since there are no caches on your local machine.

Great, that is handy to know!

> However, note that you're running against taskcluster/desktop-build:0.1.1,
> while the latest image, including that against which my try jobs ran, was
> 0.1.2 (https://tools.taskcluster.net/task-inspector/#OQG3WzmQQHa1h3EhtzbH1g/
> and
> https://hg.mozilla.org/integration/mozilla-inbound/file/8480dd03b9c1/testing/docker/desktop-build/VERSION).
> If you rebase on top of central, you should see better behavior.

Yeah. I actually already figured out the issue in the compiler and will soon have a patch ready for review. My builds are almost working on try now, and I did rebase on top of central. Thanks for your help!
Component: General Automation → General