Closed Bug 1209937 Opened 7 years ago Closed 5 years ago

Stand up Mac ASAN builds in Taskcluster

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ted, Unassigned)

Attachments

(1 file)

This should be pretty straightforward on top of the work in bug 921040.
What's the progress here?
Flags: needinfo?(sdeckelmann)
Hey Coop -- could we get this prioritized for Q4? We're building one-offs to support ongoing fuzzing work, and having it not be automated is a productivity bummer.
Flags: needinfo?(sdeckelmann) → needinfo?(coop)
Who owns the ASAN builds? 

Does releng even need to be involved? We have task definitions in-tree for ASAN builds and Mac-on-linux already. Releng could assist here by consulting and providing reviews.
Flags: needinfo?(coop)
I thought release engineering owned official builds. RelEng owns the ASAN on Linux builds after all.
The fuzzing team shouldn't own producing ASAN builds for all of Mozilla. We don't know the details of taskcluster and build generation. It isn't our specialty.
(In reply to Al Billings [:abillings] from comment #4)
> I thought release engineering owned official builds. RelEng owns the ASAN on
> Linux builds after all.

https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/builds/releng_sub_linux_configs/64_asan_tc.py
https://dxr.mozilla.org/mozilla-central/source/browser/config/mozconfigs/linux64/nightly-asan
https://dxr.mozilla.org/mozilla-central/source/build/unix/mozconfig.asan
https://dxr.mozilla.org/mozilla-central/source/browser/config/mozconfigs/macosx64/nightly
https://dxr.mozilla.org/mozilla-central/source/browser/config/mozconfigs/macosx64/debug-asan

We've put the configs in-tree so that releng wouldn't need to be a bottleneck. Anyone with try access can iterate on the above set of configs to get a working build.

Releng doesn't have time this quarter to bootstrap this, but as I said above, given a working build, we can get it scheduled.
We've been asking for this for over a year now. Can we get it added to a quarter's goals?
I agree with coop. RelEng can't commit to getting this done in the next few quarters; we have too many projects with higher priority.

The good news is that unlike a year ago, you're no longer blocked by us. Taskcluster should allow you, your team, or anybody else who's interested to self-serve this. We're happy to help give pointers if the above links and the taskcluster documentation aren't sufficient.
(In reply to Al Billings [:abillings] from comment #4)
> I thought release engineering owned official builds. RelEng owns the ASAN on
> Linux builds after all.

It's...a bit fuzzy. Historically RelEng owned standing up new builds because the only way to do so was to grovel around in RelEng-specific Buildbot configuration. With Taskcluster, it's possible to stand up new builds without RelEng involvement at all. It doesn't make much sense to ask RelEng to own and support a bunch of build variants that aren't in the set of things we ship to users.

There's a little bit of a learning curve to get the hang of Taskcluster, but it means that future changes are all self-serve instead of trying to get time from RelEng. The flip side of this is that for RelEng to stand up a new build type, someone in RelEng has to figure out the specifics of how that build works, which doesn't seem like any less effort than someone from another team figuring out Taskcluster.

We've had a lot of success with this model so far. Ehsan stood up OS X static analysis builds on his own, ttaubert stood up a whole NSS CI infra (with their own nss-try repo). I think it's worth having someone on the security team spend the time to try standing the ASAN build up themselves, even if they need some hand-holding to get it going.
I think we need to clarify what needs to be done here, in multiple steps:

1) We need to get ASan builds to run at all using the Linux cross-build toolchain. From what I heard this is not working, :truber will document the failures. Getting this to work might require expertise from multiple teams.

2) Getting a build running in Taskcluster. This is probably the easiest job, from what I understand.

3) Making these builds a Tier 1 build. This is an important step to ensure these builds don't break and from what I know, this is where RelEng needs to step in.
(In reply to Christian Holler (:decoder) from comment #10)
> 3) Making these builds a Tier 1 build. This is an important step to ensure
> these builds don't break and from what I know, this is where RelEng needs to
> step in.

Doesn't require RelEng at all. If you get the job working in Taskcluster and it satisfies the requirements laid out below, nothing additional will need to be done.
https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy
My mozconfig is simply:
. $topsrcdir/browser/config/mozconfigs/macosx64/debug-asan
. $topsrcdir/build/macosx/cross-mozconfig.common

This mozconfig and toolchain work if I use the macosx64/debug config instead of debug-asan.

With debug-asan I get the following:
 0:06.04 checking for llvm-symbolizer... /home/truber/src/m/unified/clang/bin/llvm-symbolizer
 0:06.06 configure: error: compiler is incompatible with sanitize options
 0:06.06 DEBUG: <truncated - see config.log for full output>
 0:06.06 DEBUG: 1 error generated.
 0:06.06 DEBUG: configure: failed program was:
 0:06.06 DEBUG: #line 4544 "configure"
 0:06.07 DEBUG: #include "confdefs.h"
 0:06.07 DEBUG:
 0:06.07 DEBUG: int main() {
 0:06.07 DEBUG: return sizeof(__thumb2__);
 0:06.07 DEBUG: ; return 0; }
 0:06.07 DEBUG: configure:4983: checking for llvm-symbolizer
 0:06.07 DEBUG: configure:5215: /home/truber/src/m/unified/clang/bin/clang -target x86_64-apple-darwin10 -mlinker-version=136 -B /home/truber/src/m/unified/cctools/bin -isysroot /home/truber/src/m/unified/MacOSX10.7.sdk -std=gnu99 -o conftest -fsanitize=address  -Qunused-arguments  -fsanitize=address -Wl,-syslibroot,/home/truber/src/m/unified/MacOSX10.7.sdk -Wl,-dead_strip conftest.c  1>&5
 0:06.07 DEBUG: ld: file not found: /home/truber/src/m/unified/clang/bin/../lib/clang/3.8.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib
 0:06.07 DEBUG: clang-3.8: error: linker command failed with exit code 1 (use -v to see invocation)
 0:06.07 DEBUG: configure: failed program was:
 0:06.07 DEBUG: #line 5208 "configure"
 0:06.07 DEBUG: #include "confdefs.h"
 0:06.07 DEBUG:
 0:06.07 DEBUG: int main() {
 0:06.07 DEBUG:
 0:06.07 DEBUG: ; return 0; }
 0:06.07 DEBUG: configure: error: compiler is incompatible with sanitize options
 0:06.08 ERROR: old-configure failed
 0:06.10 *** Fix above errors and then restart with\
 0:06.10                "/usr/bin/make -f client.mk build"
 0:06.11 make: *** [client.mk:379: configure] Error 1

The config.log doesn't give any extra detail.

So the asan dylib is not generated. There is a (linux) libasan.so in the clang tar which is not very useful. I'm guessing compiler-rt would have to be cross-compiled using the SDK + cctools? I tried this and was not successful, but I need to go back and document what went wrong.
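A quick way to tell whether a given clang package can link Darwin ASan binaries is to look for the runtime at the path the linker complained about in the log above. The following is only a minimal sketch: the helper names are hypothetical, and the directory layout is assumed from the `ld: file not found` line rather than taken from any build system code.

```python
from pathlib import Path

# File name taken from the "ld: file not found" line in the configure log.
ASAN_DYLIB = "libclang_rt.asan_osx_dynamic.dylib"


def asan_dylib_path(clang_root, clang_version):
    """Return where clang expects the Darwin ASan runtime to live.

    Assumes the <root>/lib/clang/<version>/lib/darwin/ layout seen in the log.
    """
    return (
        Path(clang_root) / "lib" / "clang" / clang_version
        / "lib" / "darwin" / ASAN_DYLIB
    )


def has_darwin_asan_runtime(clang_root, clang_version):
    """True if the Darwin ASan runtime is present in this clang package."""
    return asan_dylib_path(clang_root, clang_version).is_file()
```

For the toolchain in the log this would be called as `has_darwin_asan_runtime("/home/truber/src/m/unified/clang", "3.8.0")`, which would be expected to return False until the dylib is added.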
I don't think you have to cross-compile compiler-rt, you just need an OS X version of it. Does it exist in the clang packages we use for native OS X builds in automation?
https://dxr.mozilla.org/mozilla-central/rev/dc89484d4b45abf442162e5ea2dd46f9de40197d/browser/config/tooltool-manifests/macosx64/releng.manifest#3

If so we can either repack the Linux clang package to include it or package it up separately and include it in the cross-releng.manifest for the cross-compile builds to use.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #13)
> I don't think you have to cross-compile compiler-rt, you just need an OS X
> version of it. Does it exist in the clang packages we use for native OS X
> builds in automation?

Thanks, that's a great idea. The asan dylib is in the native OS X clang package.

I unpacked that on linux, and then the cross clang over it. I am able to use this toolchain to cross-compile a simple hello world with -fsanitize=address and it runs on my mac.

Now I am hitting the same error as the native OS X asan build: https://bugzilla.mozilla.org/show_bug.cgi?id=1311129

I added CFLAGS=-v & LDFLAGS += -v in cross-mozconfig.common but they don't make it to the nss rule that's failing.
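For anyone wanting to repeat the hello-world check mentioned above, the cross-compile command can be sketched roughly as follows. The flags are copied from the clang invocation in the configure log in comment 12; the function name is hypothetical, and the assumption that `clang`, `cctools`, and `MacOSX10.7.sdk` sit side by side under one toolchain directory comes from the paths in that log, not from any documented layout.

```python
def cross_asan_compile_argv(toolchain, source, output):
    """Build the argv for cross-compiling one C file for OS X with ASan.

    `toolchain` is assumed to contain clang/, cctools/ and MacOSX10.7.sdk/,
    as in the paths shown in the configure log.
    """
    sdk = f"{toolchain}/MacOSX10.7.sdk"
    return [
        f"{toolchain}/clang/bin/clang",
        "-target", "x86_64-apple-darwin10",
        "-mlinker-version=136",
        "-B", f"{toolchain}/cctools/bin",   # use the cross linker from cctools
        "-isysroot", sdk,
        "-fsanitize=address",
        f"-Wl,-syslibroot,{sdk}",
        "-o", output,
        source,
    ]
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; the output binary then has to be copied to a Mac to run it.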
Note that I'm rewriting how we build NSS in bug 1295937, so if building NSS is the issue you can try on top of those patches to see if it helps.
Note that we don't have the capacity to test TC-built builds yet -- we have a limited hardware pool, so the plan is to run all builds at tier 2, verify in try that tests can run correctly against those builds via buildbot-bridge (BBB), and then "flip" everything TC to tier 1 at the same time, disabling buildbot scheduling of tests in the process.

So for the moment, this bug should target tier-2, without tests enabled.
I managed to get this to build on inbound tip, and confirmed that the result works on macos and crashes are handled by ASan.

Roughly what I did:
* same config as comment #12 but removed --enable-clang-plugin in build/macosx/cross-mozconfig.common
* attached patch to rewrite_asan_dylib.py
* fetch from browser/config/tooltool-manifests/macosx64/asan.manifest
* fetch from browser/config/tooltool-manifests/macosx64/cross-releng.manifest (overlaying clang from previous, except bin/llvm-symbolizer which should be the mac version)
* rebuild dmg & hfsplus for ubuntu (tooltool version failed with a solib version mismatch)
* ./mach build
* cp clang/bin/llvm-symbolizer obj-x86_64-apple-darwin/dist/NightlyDebug.app/Contents/MacOS
* ./mach package

I also did this with an opt build and custom mozconfig and confirmed both worked. I didn't run tests because I'm not sure how that's done for cross builds.
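The overlay step in the list above (unpack the native OS X clang, then lay the cross clang over it while keeping the Mac `bin/llvm-symbolizer`) can be sketched like this. It is an illustration only: `overlay_tree` and its `keep` parameter are hypothetical names, and symlinks are not handled specially.

```python
import os
import shutil
from pathlib import Path


def overlay_tree(src, dst, keep=()):
    """Copy every file from src over dst.

    Files whose path relative to the tree root is listed in `keep` are left
    untouched if they already exist in dst. Symlinked directories are not
    descended into; symlinked files are copied by content.
    """
    src, dst = Path(src), Path(dst)
    keep = {Path(k) for k in keep}
    for dirpath, _dirnames, filenames in os.walk(src):
        rel_dir = Path(dirpath).relative_to(src)
        (dst / rel_dir).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            rel = rel_dir / name
            target = dst / rel
            if rel in keep and target.exists():
                continue  # keep the version already in dst (e.g. the Mac binary)
            shutil.copy2(Path(dirpath) / name, target)
```

With the native OS X clang unpacked into `clang/` and the cross clang unpacked elsewhere, a call such as `overlay_tree("cross-clang", "clang", keep=["bin/llvm-symbolizer"])` would reproduce the layout described in the list (the directory names here are placeholders).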

For the patch, I'm not sure why otool is giving @rpath/asan.dylib instead of the absolute path. Is this caused by the LD_LIBRARY_PATH set for ld? If @rpath has to be resolved manually here, I'm not sure how to do it cleanly.

I'm also not sure why LLVM_SYMBOLIZER isn't copied over.
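To make the @rpath question concrete, here is a rough sketch of how the ASan reference could be picked out of `otool -L` output whether it is recorded as an absolute path or as `@rpath/...`, and how an `install_name_tool -change` command could then be built from it. This is not the code in rewrite_asan_dylib.py or the attached patch; the function names are hypothetical, and rewriting to `@executable_path/` is an assumption about the desired result.

```python
ASAN_DYLIB = "libclang_rt.asan_osx_dynamic.dylib"


def find_asan_reference(otool_output, dylib_name=ASAN_DYLIB):
    """Return the install name of the ASan dylib as listed by `otool -L`.

    Each dependency line looks like "\t<install name> (compatibility ...)".
    Matching on the base name covers both absolute and @rpath entries.
    """
    for line in otool_output.splitlines():
        entry = line.strip().split(" (", 1)[0]
        if entry.rsplit("/", 1)[-1] == dylib_name:
            return entry
    return None


def rewrite_command(binary, old_ref, dylib_name=ASAN_DYLIB):
    """Build an install_name_tool argv that points the binary at a copy of
    the dylib next to the executable (assumed target location)."""
    new_ref = "@executable_path/" + dylib_name
    return ["install_name_tool", "-change", old_ref, new_ref, binary]
```

Because the old install name is passed through unchanged, the `@rpath/` form does not need to be resolved to an absolute path just to rewrite the reference; resolving it would only matter for locating the dylib file to copy.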
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #13)
> If so we can either repack the Linux clang package to include it or package
> it up separately and include it in the cross-releng.manifest for the
> cross-compile builds to use.

This is the next step. I had to rebuild the native OS X clang at 3.9 to match the cross-compile version after bug 1337233. build-clang.py worked for this with only minimal changes to clang-static-analysis-macosx64.json.

The sanitizer dylibs and mach-o llvm-symbolizer need to be added to the cross-releng toolchain for the build to work.

Ted, is updating tooltool something I can do? Where should these dylibs live?
Flags: needinfo?(ted)
You need permissions added to your account to upload to tooltool, but it's not onerous. If you have questions about updating the clang tooltool packages specifically, I'd recommend you ask ehsan or froydnj.
Flags: needinfo?(ted)
Done in bug 1421728.
Status: NEW → RESOLVED
Closed: 5 years ago
Depends on: 1421728
Resolution: --- → FIXED
Component: General Automation → General