Closed Bug 1164641 Opened 4 years ago Closed 4 years ago

ld failing for docker ff64 builds: "this linker was not configured to use sysroots"

Categories

(Release Engineering :: General, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mrrrgn, Assigned: mrrrgn)

The build completes and the tests pass despite this error, but it seems to cause mozharness to return a failing exit status. Beyond that, leaving a linker error unchecked is a bad idea.

-g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/psutil/_psutil_linux.o -o build/lib.linux-x86_64-2.7/_psutil_linux.so
12:55:21     INFO -  /home/worker/workspace/build/src/gcc/bin/ld: this linker was not configured to use sysroots
12:55:21     INFO -  collect2: error: ld returned 1 exit status
12:55:21     INFO -  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Blocks: 1151508
Assignee: nobody → winter2718
To put this in context, it's part of the ./configure run, so it's quite possible that this is entirely normal: ./configure tries lots of things experimentally, then sets the configuration so that the Makefile will use what works. So presumably here configure did not include --sysroot in the linker flags. Most checks in ./configure redirect these errors to config.log and just print "ok" or something like that after the check is complete. This case may be an exception to that rule (and fixing that would be a good change).

I have a hard time seeing how a difference in a ./configure check could, many minutes later, alter mozharness's exit status.  There must be something here we're not seeing.
So, with the build working, and all the make check tests passing, this is the only difference I see between my failed TC job logs and a successful BB job:

Succeeding job:

06:15:00     INFO -  Installing setuptools, pip...done.
06:15:01     INFO -  running build_ext
06:15:01     INFO -  building '_psutil_linux' extension
06:15:01     INFO -  creating build
06:15:01     INFO -  creating build/temp.linux-x86_64-2.7
06:15:01     INFO -  creating build/temp.linux-x86_64-2.7/psutil
06:15:01     INFO -  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -DNDEBUG -O3 -Wall -Wstrict-prototypes -fPIC -I/tools/python27/include/python2.7 -c psutil/_psutil_linux.c -o build/temp.linux-x86_64-2.7/psutil/_psutil_linux.o
06:15:01     INFO -  creating build/lib.linux-x86_64-2.7
06:15:01     INFO -  gcc -pthread -shared -Wl,-rpath=/tools/python27/lib build/temp.linux-x86_64-2.7/psutil/_psutil_linux.o -L/tools/python27/lib -lpython2.7 -o build/lib.linux-x86_64-2.7/_psutil_linux.so
06:15:01     INFO -  building '_psutil_posix' extension
06:15:01     INFO -  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -DNDEBUG -O3 -Wall -Wstrict-prototypes -fPIC -I/tools/python27/include/python2.7 -c psutil/_psutil_posix.c -o build/temp.linux-x86_64-2.7/psutil/_psutil_posix.o
06:15:01     INFO -  gcc -pthread -shared -Wl,-rpath=/tools/python27/lib build/temp.linux-x86_64-2.7/psutil/_psutil_posix.o -L/tools/python27/lib -lpython2.7 -o build/lib.linux-x86_64-2.7/_psutil_posix.so
06:15:01     INFO -  copying build/lib.linux-x86_64-2.7/_psutil_linux.so ->
06:15:01     INFO -  copying build/lib.linux-x86_64-2.7/_psutil_posix.so ->
06:15:01     INFO -  checking Python environment is Mozilla virtualenv... yes

My failed job:

22:22:46     INFO -  Installing setuptools, pip...done.
22:22:47     INFO -  running build_ext
22:22:47     INFO -  building '_psutil_linux' extension
22:22:47     INFO -  creating build
22:22:47     INFO -  creating build/temp.linux-x86_64-2.7
22:22:47     INFO -  creating build/temp.linux-x86_64-2.7/psutil
22:22:47     INFO -  x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c psutil/_psutil_linux.c -o build/temp.linux-x86_64-2.7/psutil/_psutil_linux.o
22:22:47     INFO -  creating build/lib.linux-x86_64-2.7
22:22:47     INFO -  x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/psutil/_psutil_linux.o -o build/lib.linux-x86_64-2.7/_psutil_linux.so
22:22:47     INFO -  /home/worker/workspace/build/src/gcc/bin/ld: this linker was not configured to use sysroots
22:22:47     INFO -  collect2: error: ld returned 1 exit status
22:22:47     INFO -  error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
22:22:47     INFO -  Error processing command. Ignoring because optional. (optional:setup.py:python/psutil:build_ext:--inplace)
22:22:47     INFO -  checking Python environment is Mozilla virtualenv... yes
22:22:47     INFO -  checking for perl5... no
This is happening as part of trying to build the binary bits of the psutil Python module. I don't know why it's using `/home/worker/workspace/build/src/gcc/bin/ld` as the linker. Maybe there's something in your environment causing it to pick that up? (It might be as simple as having that directory in $PATH.)

psutil is not needed for the build, but it's nice to have (it gives us build timing information for build steps), and we should certainly try to minimize differences between our existing builders and the Docker setup.

You can try `x86_64-linux-gnu-gcc -print-prog-name=ld` to see what gcc wants to use for ld.
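For comparing several drivers at once, the same check can be scripted. This is only a sketch: `linker_used_by` is a hypothetical helper name, and it relies on nothing beyond gcc's `-print-prog-name=ld` option mentioned above. If I remember right, gcc prints a bare `ld` when it finds no linker in its own search directories, which would mean the lookup falls through to $PATH.

```python
import subprocess


def linker_used_by(compiler="x86_64-linux-gnu-gcc"):
    """Return what the given gcc driver reports for `ld`.

    Returns None if the driver is missing or exits with an error.
    """
    try:
        out = subprocess.check_output([compiler, "-print-prog-name=ld"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.decode().strip()


if __name__ == "__main__":
    # Compare the system driver with whatever plain `gcc` resolves to.
    for cc in ("gcc", "x86_64-linux-gnu-gcc"):
        print(cc, "->", linker_used_by(cc))
```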
Mach changes PATH to the value below, where "/builds/slave/m-in-l64-000000000000000000000/build" can be anything ($topsrcdir). This is set by the mozbuild.linux config.

PATH=/builds/slave/m-in-l64-000000000000000000000/build/src/gcc/bin:/builds/slave/m-in-l64-000000000000000000000/build/src/gcc/bin:/tools/buildbot/bin:/usr/local/bin:/usr/lib64/ccache:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/tools/git/bin:/tools/python27/bin:/tools/python27-mercurial/bin:/home/cltbld/bin
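Since the tooltool gcc/bin directory comes first in that PATH, any bare `ld` lookup lands there. A small sketch of checking which copy wins for a given PATH string (`first_on_path` is a made-up helper; `shutil.which` is standard library in Python 3):

```python
import shutil


def first_on_path(program, path):
    """Return the first executable named `program` on the given PATH string.

    `path` uses the usual os.pathsep-separated format; the result is None
    when no directory on it contains an executable with that name.
    """
    return shutil.which(program, path=path)
```

Calling `first_on_path("ld", os.environ["PATH"])` inside the build environment should show whether the tooltool `ld` shadows the system one.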
We worked through this on IRC--turns out Python's distutils chooses the default compiler from /usr/lib/python2.7/config-x86_64-linux-gnu/Makefile:
>>> import distutils.sysconfig
>>> distutils.sysconfig.get_makefile_filename()
'/usr/lib/python2.7/config-x86_64-linux-gnu/Makefile'

And the default CC winds up being x86_64-linux-gnu-gcc on Ubuntu:
$ python -c "import sysconfig; print sysconfig.get_config_vars('CC')"
['x86_64-linux-gnu-gcc -pthread']

However! distutils honors CC/CXX from the environment:
http://svn.python.org/view/python/branches/release27-maint/Lib/distutils/ccompiler.py?revision=86238&view=markup#l35

but! It doesn't use those for linking, it uses LDSHARED (WTF):
$ python -c "import sysconfig; print sysconfig.get_config_vars('LDSHARED')"
['x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security  ']

So the problem here is that we wind up using our tooltool gcc for compiling, and the system gcc to invoke linking, but it winds up calling ld from our tooltool package (because that's first in $PATH) and they're not compatible. On CentOS LDSHARED just specifies "gcc" as the compiler name, apparently, so we get lucky there.
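To spot this kind of mismatch on another image, one could compare the program named by CC with the one named by LDSHARED. The sketch below is Python 3 and uses the top-level `sysconfig` module; `driver` and `link_driver_differs` are hypothetical names. I believe distutils' `customize_compiler` also reads LDSHARED from the environment, but treat that as an assumption to verify before relying on it.

```python
import os
import shlex
import sysconfig


def driver(command):
    """Return the program name (first word) of a compiler or linker command."""
    words = shlex.split(command or "")
    return words[0] if words else None


def link_driver_differs(cc, ldshared):
    """True when CC and LDSHARED name different driver programs."""
    return driver(cc) != driver(ldshared)


if __name__ == "__main__":
    # CC from the environment wins over the Makefile default, as noted above.
    cc = os.environ.get("CC") or sysconfig.get_config_var("CC")
    ldshared = sysconfig.get_config_var("LDSHARED")
    print("compile driver:", driver(cc))
    print("link driver:   ", driver(ldshared))
    print("mismatch:      ", link_driver_differs(cc, ldshared))
```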

The simplest fix I could think of was to just stick an x86_64-linux-gnu-gcc -> gcc symlink in the tooltool gcc bin dir, and Morgan confirmed that fixes the problem.
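For reference, the symlink workaround can be sketched as below. The function name and its defaults are illustrative only, and the actual change may well have been a one-line `ln -s` when packaging the toolchain.

```python
import os


def add_triplet_alias(bin_dir, triplet="x86_64-linux-gnu", tool="gcc"):
    """Create <bin_dir>/<triplet>-<tool> as a relative symlink to <tool>.

    Does nothing if the alias already exists; raises if <tool> is missing.
    """
    target = os.path.join(bin_dir, tool)
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    alias = os.path.join(bin_dir, "%s-%s" % (triplet, tool))
    if not os.path.lexists(alias):
        # Relative target, so the link survives unpacking under any $topsrcdir.
        os.symlink(tool, alias)
    return alias
```

With the alias in place, the `x86_64-linux-gnu-gcc` named in LDSHARED resolves to the tooltool gcc (first in $PATH), so the compiler driver and `ld` come from the same package.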
Just to summarize from our voice chat:

 * this is a good fix; specifically we should build a new toolchain and upload it to tooltool, and update the releng.manifest files to point to it

 * installing the toolchain in place of the "system" compiler on Ubuntu would be interesting, but complicated and not a big benefit over the fix ted has proposed

 * but long-term, we'd like to do this, so that experimentation with new compilers would take place by altering the (in-tree) Dockerfile and having a push to try automatically generate a new image from that Dockerfile and build against it.

 * ted's bet for the reason the Mozharness job fails despite mach returning success is that there's a Mozharness OutputParser spotting one of those error lines and converting the build result to error.
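To illustrate the last point, here is a toy log scanner. It is not Mozharness's OutputParser, and the pattern is invented for this example; it only shows how a line-based error matcher could mark a job as failed even though the failing command was optional and mach exited successfully.

```python
import re

# Hypothetical pattern; not taken from Mozharness's real error lists.
ERROR_LINE = re.compile(r"\berror:|ld returned \d+ exit status")


def worst_level(log_lines):
    """Return 'error' if any line matches ERROR_LINE, otherwise 'info'."""
    for line in log_lines:
        if ERROR_LINE.search(line):
            return "error"
    return "info"
```

Fed the psutil lines from the failed job, this returns 'error' on the `collect2: error:` line, regardless of the later "Ignoring because optional" message.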
This has been "fixed", but the hacky approach we're using (the symlink) needs to be improved upon. There's already a ticket for that, so I'm just going to reference this bug there. See: https://bugzilla.mozilla.org/show_bug.cgi?id=1164617
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Component: Tools → General