Closed Bug 1342503 Opened 3 years ago Closed 3 years ago
Buildbot OS X builder uses system clang/ld during cargo build, fails with internal linker assertion failure
It looks like buildbot OS X builds are using clang 3.8.0, while taskcluster OS X builds are using clang 3.9.0. I'm seeing a linker error on the buildbot builds [1] which isn't showing up in the TC builds. Can we upgrade buildbot to also use 3.9.0?
[1] https://email@example.com&fromchange=7a09ffb2e287e6270ddec04e41e6c5c0cab757c4&group_state=expanded&selectedJob=80032101
Looks like the relevant tooltool manifests are in-tree, at browser/config/tooltool-manifests/macosx64/releng.manifest and cross-releng.manifest. I'll try updating it and see what happens.
I tried the naive fix of copying the clang and cctools bits of clang.manifest into releng.manifest [1] and pushed to try, but it failed [2] while trying to run /usr/bin/ranlib. Before I go any further down this road, Ehsan, do you have any suggestions? I assumed that the clang-3.9.0 build you put together for the static analysis build should also work here.
[1] https://hg.mozilla.org/try/rev/fc202eb466cf246e43382f4eaa0c9aa7df9cfc3b
[2] https://treeherder.mozilla.org/logviewer.html#?job_id=80058724&repo=try&lineNumber=5750
Hey kats, Thanks for taking this on! See bug 1331957 which is where I made this clang work for us for the first time for static analysis builds. :-) I totally know the ranlib error you are seeing. See bug 1331957 comment 16 for where I hit it also and further comments for the fix. This commit is the fix for that issue: https://hg.mozilla.org/mozilla-central/rev/13413bf54f41. Basically you want to upgrade cctools through the tooltool manifest in order to have a ranlib that is able to deal with the binaries clang 3.9 generates, and you need the mozconfig hackery to make things point to the toolchain in the new cctools. Let me know how that works out. If you hit any other issues, you may wanna search that bug to see if I also hit similar issues. Many of the issues I was running into had to do with our clang plugin so I'm hoping your road doesn't end up being so long. I still have most of the details relatively fresh in my memory, so if you need help with anything don't hesitate to ask!
So it looks like even with clang 3.9.0 I get the same linker error that I was seeing with clang 3.8.0 in comment 1. Here's the try push which is using clang 3.9.0 for the buildbot builds: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d73e273a6c057d93f3b1cdd854d3f5ae81822bd3&selectedJob=80074052

The error text is:

13:30:29 INFO - Compiling mp4parse v0.6.0 (file:///builds/slave/try-m64-0000000000000000000000/build/src/media/libstagefright/binding/mp4parse)
13:30:29 INFO - error: linking with `cc` failed: exit code: 1
13:30:29 INFO - |
13:30:29 INFO - = note: "cc" "-m64" "-L" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib" "/builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/toolkit/library/release/build/core-foundation-sys-c08dadd2afb524ac/build_script_build-c08dadd2afb524ac.0.o" "-o" "/builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/toolkit/library/release/build/core-foundation-sys-c08dadd2afb524ac/build_script_build-c08dadd2afb524ac" "-Wl,-dead_strip" "-nodefaultlibs" "-L" "/builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/toolkit/library/release/deps" "-L" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libstd-f1544d51c14ee547.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/librand-6ce8560490ee791c.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libcollections-77c40ab2fac1172e.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libstd_unicode-a98ebaa82aaee358.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libpanic_unwind-0973ad751bdffbae.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libunwind-30637a1739b412eb.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/liballoc-40208fb59386bff5.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/liballoc_jemalloc-4b56f5c0b7251555.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/liblibc-cba64299ce12485f.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libcore-cfc94a4f91ad8df0.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libcompiler_builtins-51cf9867f46a760f.rlib" "-l" "System" "-l" "pthread" "-l" "c" "-l" "m"
13:30:29 INFO - = note: Assertion failed: (_mode == modeFinalAddress), function finalAddress, file /SourceCache/ld64/ld64-123.2.1/src/ld/ld.hpp, line 573.
13:30:29 INFO - 0 0x10f28371c __assert_rtn + 76
13:30:29 INFO - 1 0x10f2fc01c ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) + 172
13:30:29 INFO - 2 0x10f2fea25 ld::tool::OutputFile::applyFixUps(ld::Internal&, unsigned long long, ld::Atom const*, unsigned char*) + 3909
13:30:29 INFO - 3 0x10f2faf70 ld::tool::OutputFile::writeOutputFile(ld::Internal&) + 816
13:30:29 INFO - 4 0x10f2f3ab9 ld::tool::OutputFile::write(ld::Internal&) + 153
13:30:29 INFO - 5 0x10f283caa main + 1178
13:30:29 INFO - collect2: ld returned 1 exit status
13:30:29 INFO - error: aborting due to previous error

Searching for the error message led me to a number of bugs (bug 1165528, bug 1188030, bug 1301001 and bug 1269808) which also showed this assertion. So it seems unlikely that this will be solved by a clang version update.
More likely there's a difference in the flags/configuration/environment between buildbot and taskcluster that is causing the build to work in taskcluster but not in buildbot. I'll poke around some more.
Or even more likely, the fact that on TaskCluster we build on Linux and on Buildbot we build on OS X may matter. :-) Note that you can run the taskcluster builds locally under docker which should make debugging super simple.
Since the problem is in the buildbot version rather than the TC version, I'm not sure debugging the TC one will help. I've requested a loaner buildbot slave to debug it there. I tried to reproduce it locally but was unable to, even after using the LDFLAGS that is applied on the "local" OS X builds [1] (and which isn't used on the TC cross-compiled OS X builds).
[1] http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/build/macosx/local-mozconfig.common#32
(In reply to Kartikaya Gupta (email:firstname.lastname@example.org) from comment #6)
> Since the problem is in the buildbot version rather than the TC version, I'm
> not sure debugging the TC one will help.

Oh, sorry I misunderstood the issue.

> I've requested a loaner buildbot
> slave to debug it there. I tried to reproduce it locally but was unable to,
> even after using the LDFLAGS that is applied on the "local" OS X builds
> (and which isn't used on the TC cross-compiled OS X builds).

I think that's a great idea. Our OS X builders are ancient and it's totally possible that we are using an old version of some tool in the toolchain (which is actually the issue in comment 2, for example). I have done some toolchain work and it's never possible to repro our OS X builds locally unless you have a 10.7 machine lying around.
I reproduced the problem on the loaner, and I think the root cause is that when we are running the cargo build, it invokes clang/ld but doesn't use the clang/ld we specified in the mozconfig. Instead it falls back to the system default of /usr/bin/../llvm-gcc-4.2/bin/i686-apple-darwin11-llvm-gcc-4.2 and /usr/llvm-gcc-4.2/bin/../libexec/gcc/i686-apple-darwin11/4.2.1/ld, which is what dies with the assertion above. I tried manually using the ld from cctools/bin/ld to do the link step that was crashing, and it completed successfully. So I think we need to make the cargo build step smarter in terms of what compiler/linker it delegates to.
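One way to check which binaries a bare tool name picks up is to ask the shell what it resolves on the current PATH. This is a minimal sketch, not the procedure used on the loaner; the helper name resolve_tool is made up for illustration. It only shows PATH lookup: a compiler driver can still choose its own linker internally, as the llvm-gcc driver did here.

```shell
# resolve_tool NAME: print the path that a bare tool name resolves to on the
# current PATH (hypothetical helper, for illustration only).
resolve_tool() {
    command -v "$1"
}

# Report what the names used during the link step would resolve to.
for tool in cc ld ranlib; do
    printf '%s -> %s\n' "$tool" "$(resolve_tool "$tool" || echo 'not found')"
done
```

Running this once with the build's environment and once with the tooltool directories prepended to PATH shows whether the system tools or the tooltool ones would win the lookup.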
Component: General Automation → Build Config
Product: Release Engineering → Core
QA Contact: catlee
Summary: Upgrade buildbot OS X builders to use clang 3.9.0 instead of 3.8.0 → Buildbot OS X builder uses system clang/ld during cargo build, fails with internal linker assertion failure
Whiteboard: [ignore stuff before comment 8]
On IRC :acrichto said we need to set additional environment variables to get cargo/rustc to use the right compiler/linker. Specifically on this machine I need to set the envvar CARGO_TARGET_X86_64_APPLE_DARWIN_LINKER to point to the clang we want to use [1]. That gets me a little further, in that it stops using the cc compiler from /usr/bin and instead uses the clang we want it to use. However, it's still crashing on link because it's using ld from /usr/bin and I *believe* that we want it to use the ld in cctools/bin. It's not clear to me at all how the -B flag at [2] works, and I can't find any documentation on it. Nathan (or :mshal, since Nathan might be away), do you know how this flag works?
[1] http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/build/macosx/local-mozconfig.common#14
[2] http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/build/macosx/local-mozconfig.common#19
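For reference, cargo's per-target linker variable has the form CARGO_TARGET_<TRIPLE>_LINKER, with the target triple upper-cased and dashes turned into underscores. The sketch below derives that name and exports it. The helper cargo_linker_var and the TOOLCHAIN_CLANG placeholder path are assumptions for illustration; the real clang path comes from the tooltool checkout on the builder.

```shell
# cargo_linker_var TRIPLE: print the name of the environment variable cargo
# reads to pick the linker for that target (hypothetical helper name).
cargo_linker_var() {
    printf 'CARGO_TARGET_%s_LINKER\n' \
        "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '-' '_')"
}

# Placeholder path (an assumption); substitute the clang from the tooltool checkout.
TOOLCHAIN_CLANG="${TOOLCHAIN_CLANG:-/path/to/clang/bin/clang}"

var_name=$(cargo_linker_var x86_64-apple-darwin)
export "$var_name=$TOOLCHAIN_CLANG"
echo "$var_name=$TOOLCHAIN_CLANG"
```

This only changes which compiler driver rustc calls for linking. Which ld that driver then runs is a separate question, which is the remaining problem described in the comment.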
I think this is the same issue as bug 1329737.
It appears to be, yes. Thanks!
Depends on: 1329737
I verified that the build completes successfully if I hack it up a bit. I added this wrapper script at /builds/slave/try-m64-0000000000000000000000/build/src/clang/bin/clang-wrapper:
------
#!/usr/bin/env bash
MYDIR=$(cd -P -- "$(dirname -- "$0")" && printf '%s\n' "$(pwd -P)/")
$MYDIR/clang -B$MYDIR/../../cctools/bin $*
------
and then at [1] I inserted the following lines:
------
CARGO_TARGET_X86_64_APPLE_DARWIN_LINKER="$(topsrcdir)/clang/bin/clang-wrapper" \
CC_x86_64_apple_darwin="$(topsrcdir)/clang/bin/clang-wrapper" \
------
That tells cargo/rustc to use the wrapper script, which in turn uses the custom linker from tooltool sitting in cctools/bin. In retrospect I might also have gotten it to work by just putting cctools/bin as the first thing on the PATH. Anyhow, the proper fix here seems to be to get rustc to properly use the linker we want it to (or pass options such as -B to the compiler it uses). I tried setting RUSTFLAGS like Alex suggested in bug 1329737 comment 6 but that didn't work.
[1] http://searchfox.org/mozilla-central/rev/4039fb4c5833706f6880763de216974e00ba096c/config/rules.mk#944
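A possible variant of the wrapper, sketched on the assumption that arguments should be forwarded with "$@" rather than $* so that any argument containing spaces reaches clang as a single argument. The generator function write_wrapper and its three parameters are hypothetical names, not something in the tree. The -B option is used the same way as in the wrapper above, to tell the driver where to look for helper programs such as ld.

```shell
# write_wrapper CLANG CCTOOLS_BIN OUT
# Hypothetical helper: writes a wrapper script to OUT that runs CLANG with
# -B pointing at CCTOOLS_BIN and forwards all arguments unchanged.
write_wrapper() {
    cat > "$3" <<EOF
#!/bin/sh
# "\$@" keeps each original argument intact, including ones with spaces.
exec "$1" -B"$2" "\$@"
EOF
    chmod +x "$3"
}

# Example use (paths are placeholders, not taken from the builder):
#   write_wrapper /path/to/clang/bin/clang /path/to/cctools/bin ./clang-wrapper
#   export CARGO_TARGET_X86_64_APPLE_DARWIN_LINKER="$PWD/clang-wrapper"
```

Writing absolute paths into the generated script avoids the dirname/pwd dance of the original, at the cost of having to regenerate the wrapper if the checkout moves.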
Sadly, my patches for bug 1329737 in their current form don't address this, and the error messages I'm getting out of rustc/cargo are unhelpful: https://treeherder.mozilla.org/#/jobs?repo=try&revision=199195f35c6c74417c96cef2c9f6a590576e6a7d It's possible there are bugs in my patches, though.
Bug 1365993 seems to have taken care of this; webrender-enabled buildbot OS X builds work now. Here's a try push with a green buildbot build that I verified runs locally: https://treeherder.mozilla.org/#/jobs?repo=try&revision=621455d6977726df9276ae89977d7f125d7395e3&selectedJob=102597114
Comment on attachment 8872087 [details] Bug 1342503 - Build webrender by default on OS X buildbot builds. https://reviewboard.mozilla.org/r/143588/#review147328 Excellent!
Attachment #8872087 - Flags: review?(nfroyd) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/0593f5f447c8 Build webrender by default on OS X buildbot builds. r=froydnj