Buildbot OS X builder uses system clang/ld during cargo build, fails with internal linker assertion failure

RESOLVED FIXED in Firefox 55

Status

Firefox Build System
General
RESOLVED FIXED
a year ago
3 months ago

People

(Reporter: kats, Assigned: kats)

Tracking

unspecified
mozilla55

Firefox Tracking Flags

(firefox55 fixed)

Details

(Whiteboard: [ignore stuff before comment 8])

Attachments

(1 attachment)

It looks like buildbot OS X builds are using clang 3.8.0, while taskcluster OS X builds are using clang 3.9.0. I'm seeing a linker error on the buildbot builds [1] which isn't showing up in the TC builds. Can we upgrade buildbot to also use 3.9.0?

[1] https://treeherder.mozilla.org/#/jobs?repo=try&author=kgupta@mozilla.com&fromchange=7a09ffb2e287e6270ddec04e41e6c5c0cab757c4&group_state=expanded&selectedJob=80032101
Looks like the relevant tooltool manifests are in-tree, at browser/config/tooltool-manifests/macosx64/releng.manifest and cross-releng.manifest. I'll try updating them and see what happens.
I tried the naive fix of copying the clang and cctools bits of clang.manifest into releng.manifest [1] and pushed to try, but it failed [2] while trying to run /usr/bin/ranlib. Before I go any further down this road, Ehsan, do you have any suggestions? I assumed that the clang-3.9.0 build you put together for the static analysis build should also work here.

[1] https://hg.mozilla.org/try/rev/fc202eb466cf246e43382f4eaa0c9aa7df9cfc3b
[2] https://treeherder.mozilla.org/logviewer.html#?job_id=80058724&repo=try&lineNumber=5750
Flags: needinfo?(ehsan)

Comment 3

a year ago
Hey kats,

Thanks for taking this on!  See bug 1331957 which is where I made this clang work for us for the first time for static analysis builds.  :-)

I totally know the ranlib error you are seeing.  See bug 1331957 comment 16 for where I hit it also and further comments for the fix.  This commit is the fix for that issue: https://hg.mozilla.org/mozilla-central/rev/13413bf54f41.  Basically you want to upgrade cctools through the tooltool manifest in order to have a ranlib that is able to deal with the binaries clang 3.9 generates, and you need the mozconfig hackery to make things point to the toolchain in the new cctools.

Let me know how that works out.  If you hit any other issues, you may wanna search that bug to see if I also hit similar issues.  Many of the issues I was running into had to do with our clang plugin so I'm hoping your road doesn't end up being so long.  I still have most of the details relatively fresh in my memory, so if you need help with anything don't hesitate to ask!
Flags: needinfo?(ehsan)
So it looks like even with clang 3.9.0 I get the same linker error that I was seeing with clang 3.8.0 in comment 1. Here's the try push which is using clang 3.9.0 for the buildbot builds:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d73e273a6c057d93f3b1cdd854d3f5ae81822bd3&selectedJob=80074052

The error text is:

13:30:29     INFO -     Compiling mp4parse v0.6.0 (file:///builds/slave/try-m64-0000000000000000000000/build/src/media/libstagefright/binding/mp4parse)
13:30:29     INFO -  error: linking with `cc` failed: exit code: 1
13:30:29     INFO -    |
13:30:29     INFO -    = note: "cc" "-m64" "-L" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib" "/builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/toolkit/library/release/build/core-foundation-sys-c08dadd2afb524ac/build_script_build-c08dadd2afb524ac.0.o" "-o" "/builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/toolkit/library/release/build/core-foundation-sys-c08dadd2afb524ac/build_script_build-c08dadd2afb524ac" "-Wl,-dead_strip" "-nodefaultlibs" "-L" "/builds/slave/try-m64-0000000000000000000000/build/src/obj-firefox/toolkit/library/release/deps" "-L" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libstd-f1544d51c14ee547.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/librand-6ce8560490ee791c.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libcollections-77c40ab2fac1172e.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libstd_unicode-a98ebaa82aaee358.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libpanic_unwind-0973ad751bdffbae.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libunwind-30637a1739b412eb.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/liballoc-40208fb59386bff5.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/liballoc_jemalloc-4b56f5c0b7251555.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/liblibc-cba64299ce12485f.rlib" 
"/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libcore-cfc94a4f91ad8df0.rlib" "/builds/slave/try-m64-0000000000000000000000/build/src/rustc/lib/rustlib/x86_64-apple-darwin/lib/libcompiler_builtins-51cf9867f46a760f.rlib" "-l" "System" "-l" "pthread" "-l" "c" "-l" "m"
13:30:29     INFO -    = note: Assertion failed: (_mode == modeFinalAddress), function finalAddress, file /SourceCache/ld64/ld64-123.2.1/src/ld/ld.hpp, line 573.
13:30:29     INFO -  0  0x10f28371c  __assert_rtn + 76
13:30:29     INFO -  1  0x10f2fc01c  ld::tool::OutputFile::addressOf(ld::Internal const&, ld::Fixup const*, ld::Atom const**) + 172
13:30:29     INFO -  2  0x10f2fea25  ld::tool::OutputFile::applyFixUps(ld::Internal&, unsigned long long, ld::Atom const*, unsigned char*) + 3909
13:30:29     INFO -  3  0x10f2faf70  ld::tool::OutputFile::writeOutputFile(ld::Internal&) + 816
13:30:29     INFO -  4  0x10f2f3ab9  ld::tool::OutputFile::write(ld::Internal&) + 153
13:30:29     INFO -  5  0x10f283caa  main + 1178
13:30:29     INFO -  collect2: ld returned 1 exit status
13:30:29     INFO -  error: aborting due to previous error

Searching for the error message led me to a number of bugs (bug 1165528 and bug 1188030 and bug 1301001 and bug 1269808) which also showed this assertion. So it seems unlikely that this will be solved by a clang version update. More likely there's a difference in the flags/configuration/environment between buildbot and taskcluster that is causing the build to work in taskcluster but not in buildbot. I'll poke around some more.

Comment 5

a year ago
Or even more likely, the fact that on TaskCluster we build on Linux and on Buildbot we build on OS X may matter.  :-)

Note that you can run the taskcluster builds locally under docker which should make debugging super simple.
Since the problem is in the buildbot version rather than the TC version, I'm not sure debugging the TC one will help. I've requested a loaner buildbot slave to debug it there. I tried to reproduce it locally but was unable to, even after using the LDFLAGS that is applied on the "local" OS X builds [1] (and which isn't used on the TC cross-compiled OS X builds).

[1] http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/build/macosx/local-mozconfig.common#32

Comment 7

a year ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> Since the problem is in the buildbot version rather than the TC version, I'm
> not sure debugging the TC one will help.

Oh, sorry I misunderstood the issue.

> I've requested a loaner buildbot
> slave to debug it there. I tried to reproduce it locally but was unable to,
> even after using the LDFLAGS that is applied on the "local" OS X builds [1]
> (and which isn't used on the TC cross-compiled OS X builds).

I think that's a great idea.  Our OSX builders are ancient and it's totally possible that we are using an old version of some tool in the toolchain (which is actually the issue in comment 2, for example.)

I have done some toolchain work, and it's never possible to repro our OSX builds locally unless you have a 10.7 machine lying around.
I reproduced the problem on the loaner, and I think the root cause is that when we are running the cargo build, it invokes clang/ld but doesn't use the clang/ld we specified in the mozconfig. Instead it falls back to the system default of /usr/bin/../llvm-gcc-4.2/bin/i686-apple-darwin11-llvm-gcc-4.2 and /usr/llvm-gcc-4.2/bin/../libexec/gcc/i686-apple-darwin11/4.2.1/ld, which is what dies with the assertion above. I tried manually using the ld from cctools/bin/ld to do the link step that was crashing, and it completed successfully. So I think we need to make the cargo build step smarter in terms of what compiler/linker it delegates to.
Component: General Automation → Build Config
Product: Release Engineering → Core
QA Contact: catlee
Summary: Upgrade buildbot OS X builders to use clang 3.9.0 instead of 3.8.0 → Buildbot OS X builder uses system clang/ld during cargo build, fails with internal linker assertion failure
Whiteboard: [ignore stuff before comment 8]
On IRC :acrichto said we need to set additional environment variables to get cargo/rustc to use the right compiler/linker. Specifically on this machine I need to set the envvar CARGO_TARGET_X86_64_APPLE_DARWIN_LINKER to point to the clang we want to use [1]. That gets me a little further, in that it stops using the cc compiler from /usr/bin and instead uses the clang we want it to use.

However, it's still crashing on link because it's using ld from /usr/bin and I *believe* that we want it to use the ld in cctools/bin. It's not clear to me at all how the -B flag at [2] works, and I can't find any documentation on it. Nathan (or :mshal, since Nathan might be away), do you know how this flag works?

[1] http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/build/macosx/local-mozconfig.common#14
[2] http://searchfox.org/mozilla-central/rev/90d1cbb4fd3dc249cdc11fe5c3e0394d22d9c680/build/macosx/local-mozconfig.common#19
Flags: needinfo?(nfroyd)
Flags: needinfo?(mshal)
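
For reference, the variable name above follows Cargo's documented pattern for overriding the `target.<triple>.linker` config key from the environment: the triple is upper-cased and its dashes become underscores. The CC_x86_64_apple_darwin form used later in this bug looks like the per-target convention of the `cc` crate (CC_ plus the triple with dashes replaced by underscores). A minimal sketch that derives both names; the function names are hypothetical helpers, not part of any tool:

```shell
#!/usr/bin/env bash

# Hypothetical helper: name of the Cargo env var that overrides
# target.<triple>.linker for the given target triple.
cargo_linker_var() {
    local upper
    # Replace '-' with '_', then upper-case the result.
    upper=$(printf '%s' "${1//-/_}" | tr '[:lower:]' '[:upper:]')
    printf 'CARGO_TARGET_%s_LINKER\n' "$upper"
}

# Hypothetical helper: per-target compiler variable in the style used
# in this bug (assumed to be the cc crate's CC_<triple> convention).
cc_crate_var() {
    printf 'CC_%s\n' "${1//-/_}"
}

cargo_linker_var x86_64-apple-darwin
cc_crate_var x86_64-apple-darwin
```

Both variables take a path to the compiler driver, as in the config/rules.mk lines quoted further down.
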
I think this is the same issue as bug 1329737.
It appears to be, yes. Thanks!
Depends on: 1329737
Flags: needinfo?(nfroyd)
Flags: needinfo?(mshal)
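
On the -B question above: per the clang and GCC driver documentation, -B<prefix> adds a prefix that the driver searches first when it looks for subprograms such as ld (and for some libraries and data files), before falling back to its default locations. One way to check what the driver would pick is -print-prog-name, which both drivers accept. A sketch, where the cctools path is a placeholder and the function name is hypothetical:

```shell
#!/usr/bin/env bash

# Hypothetical helper: ask a compiler driver which `ld` it would run
# when given an extra -B search prefix.
#   $1 = compiler driver (e.g. path to clang)
#   $2 = directory the driver should search first
print_linker_with_prefix() {
    "$1" -B"$2" -print-prog-name=ld
}

# Example with a placeholder path; only runs if a clang is on PATH.
if command -v clang >/dev/null 2>&1; then
    print_linker_with_prefix clang "$PWD/cctools/bin"
fi
```

If the printed path is still the system ld, the prefix is not being honoured for that driver and version; that would need checking on the builder itself.
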
I verified that the build completes successfully if I hack it up a bit. I added this wrapper script at /builds/slave/try-m64-0000000000000000000000/build/src/clang/bin/clang-wrapper:

------
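#!/usr/bin/env bash

# Resolve the directory containing this script (with a trailing slash).
MYDIR=$(cd -P -- "$(dirname -- "$0")" && printf '%s\n' "$(pwd -P)/")
# Forward all arguments unchanged to the real clang, adding the cctools -B prefix.
exec "$MYDIR/clang" -B"$MYDIR/../../cctools/bin" "$@"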
------

and then at [1] I inserted the following lines:

------
	CARGO_TARGET_X86_64_APPLE_DARWIN_LINKER="$(topsrcdir)/clang/bin/clang-wrapper" \
	CC_x86_64_apple_darwin="$(topsrcdir)/clang/bin/clang-wrapper" \
------

That tells cargo/rustc to use the wrapper script, which in turn uses the custom linker from tooltool sitting in cctools/bin. In retrospect I might also have gotten it to work by just putting cctools/bin as the first thing on the PATH.

Anyhow, the proper fix here seems to be to get rustc to properly use the linker we want it to (or pass options such as -B to the compiler it uses). I tried setting RUSTFLAGS like Alex suggested in bug 1329737 comment 6 but that didn't work.

[1] http://searchfox.org/mozilla-central/rev/4039fb4c5833706f6880763de216974e00ba096c/config/rules.mk#944
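
The PATH alternative mentioned above relies on command lookup taking the first match in PATH order. A small sketch of that behaviour using stand-in directories (not the real toolchain layout; the function name is hypothetical). Whether this would have been enough here depends on how the compiler driver locates ld, which this thread does not establish:

```shell
#!/usr/bin/env bash

# Hypothetical helper: report which `ld` a given PATH value resolves to.
# The body runs in a subshell so the caller's PATH is left untouched.
first_ld_on_path() (
    PATH="$1"
    command -v ld
)

# Demo with two stand-in directories, each holding a dummy `ld`.
demo=$(mktemp -d)
mkdir -p "$demo/cctools/bin" "$demo/usr/bin"
for dir in "$demo/cctools/bin" "$demo/usr/bin"; do
    printf '#!/bin/sh\n' > "$dir/ld"
    chmod +x "$dir/ld"
done

# cctools/bin is listed first, so its ld is the one that is found.
first_ld_on_path "$demo/cctools/bin:$demo/usr/bin"
rm -rf "$demo"
```
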
Sadly, my patches for bug 1329737 in their current form don't address this, and the error messages I'm getting out of rustc/cargo are unhelpful:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=199195f35c6c74417c96cef2c9f6a590576e6a7d

It's possible there are bugs in my patches, though.
See Also: → bug 1365993
Bug 1365993 seems to have taken care of this; webrender-enabled buildbot OS X builds work now. Here's a try push with a green buildbot build that I verified runs locally: https://treeherder.mozilla.org/#/jobs?repo=try&revision=621455d6977726df9276ae89977d7f125d7395e3&selectedJob=102597114
Depends on: 1365993
See Also: bug 1365993
Assignee: nobody → bugmail
Comment hidden (mozreview-request)

Comment 16

a year ago
mozreview-review
Comment on attachment 8872087 [details]
Bug 1342503 - Build webrender by default on OS X buildbot builds.

https://reviewboard.mozilla.org/r/143588/#review147328

Excellent!
Attachment #8872087 - Flags: review?(nfroyd) → review+

Comment 17

a year ago
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0593f5f447c8
Build webrender by default on OS X buildbot builds. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/0593f5f447c8
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55

Updated

3 months ago
Product: Core → Firefox Build System