
stylo build-script-build hung on mac

Status: NEW
Product/Component: Core :: Build Config
Priority: P1, Severity: normal
Filed 2 months ago, last modified 3 days ago
People: Reporter rillian, Assigned rillian
Tracking: Depends on 1 bug, Blocks 1 bug
Firefox tracking flags: not tracked
Attachments: 2

A couple of times recently, while building with --enable-stylo, I've had bindgen's build script hang. It sits there without making progress for up to an hour. The previous time I had to `kill -9` the process.

This time I attached with lldb, but didn't get much from a stack trace:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001054b2f46 libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x000000010560d6e5 libsystem_pthread.dylib`pthread_join + 425
    frame #2: 0x0000000104dada08 build-script-build`build_script_build::main::hc1b8957df3d0a5f7 + 5448
    frame #3: 0x0000000105063c56 build-script-build`std::panicking::try::do_call<fn(),()> at panicking.rs:454 [opt]
    frame #4: 0x0000000105064ecb build-script-build`panic_unwind::__rust_maybe_catch_panic at lib.rs:98 [opt]
    frame #5: 0x0000000105064397 build-script-build`std::rt::lang_start [inlined] std::panicking::try<(),fn()> at panicking.rs:433 [opt]
    frame #6: 0x000000010506436b build-script-build`std::rt::lang_start [inlined] std::panic::catch_unwind<fn(),()> at panic.rs:361 [opt]
    frame #7: 0x000000010506436b build-script-build`std::rt::lang_start at rt.rs:57 [opt]
    frame #8: 0x0000000104d908b4 build-script-build`start + 52

Does __rust_maybe_catch_panic mean it panicked and the unwind got stuck, or is that just the normal handler? This is $objdir/toolkit/library/release/build/style-db4933fa280ec189/build-script-build

Other threads look like libclang failed, so I wonder if this is a different manifestation of bug 1368083.

* thread #1: tid = 0x5b77d, 0x00000001054b2f46 libsystem_kernel.dylib`__semwait_signal + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  thread #2: tid = 0x5b793, 0x000000010724e7de libclang.dylib`llvm::CrashRecoveryContext::~CrashRecoveryContext() + 158, stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
  thread #3: tid = 0x5b794, 0x000000010724e937 libclang.dylib`llvm::CrashRecoveryContext::unregisterCleanup(llvm::CrashRecoveryContextCleanup*) + 23, stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
  thread #4: tid = 0x5b795, 0x000000010724e937 libclang.dylib`llvm::CrashRecoveryContext::unregisterCleanup(llvm::CrashRecoveryContextCleanup*) + 23, stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
I've seen this locally on my MacBook Pro running 10.12.5, but it may also have occurred on the integration build machines in https://treeherder.mozilla.org/#/jobs?repo=try&revision=a92424ea8e26bb0da4230f66bacc9467d68e2b1a&selectedJob=102119358
I should also say this is intermittent, i.e. a recoverable failure as far as the build system state goes: restarting the build can continue to success.
Another backtrace, this time from the macOS 10.10 loaner.

(lldb) bt
* thread #1: tid = 0x12d87, 0x00007fff89e5e48a libsystem_kernel.dylib`__semwait_signal + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff89e5e48a libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff92d3e01d libsystem_pthread.dylib`pthread_join + 445
    frame #2: 0x0000000104bac048 build-script-build`build_script_build::main [inlined] std::thread::{{impl}}::join<()> + 65 at mod.rs:902
    frame #3: 0x0000000104bac007 build-script-build`build_script_build::main [inlined] std::thread::{{impl}}::join<()> + 42 at mod.rs:996
    frame #4: 0x0000000104babfdd build-script-build`build_script_build::main + 3360 at build_gecko.rs:564
    frame #5: 0x0000000104bab2bd build-script-build`build_script_build::main [inlined] build_script_build::build_gecko::generate + 195 at build_gecko.rs:600
    frame #6: 0x0000000104bab1fa build-script-build`build_script_build::main + 1786 at build.rs:90
    frame #7: 0x0000000104e62d36 build-script-build`std::panicking::try::do_call<fn(),()> + 6 at panicking.rs:454
    frame #8: 0x0000000104e63fab build-script-build`panic_unwind::__rust_maybe_catch_panic + 27 at lib.rs:98
    frame #9: 0x0000000104e63477 build-script-build`std::rt::lang_start [inlined] std::panicking::try<(),fn()> + 44 at panicking.rs:433
    frame #10: 0x0000000104e6344b build-script-build`std::rt::lang_start [inlined] std::panic::catch_unwind<fn(),()> at panic.rs:361
    frame #11: 0x0000000104e6344b build-script-build`std::rt::lang_start + 347 at rt.rs:57
    frame #12: 0x0000000104b8eef4 build-script-build`start + 52

(lldb) thread list
Process 51934 stopped
* thread #1: tid = 0x12d87, 0x00007fff89e5e48a libsystem_kernel.dylib`__semwait_signal + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  thread #2: tid = 0x12dcc, 0x000000010784e937 libclang.dylib`llvm::CrashRecoveryContext::unregisterCleanup(llvm::CrashRecoveryContextCleanup*) + 23, stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
  thread #3: tid = 0x12dcd, 0x000000010784e937 libclang.dylib`llvm::CrashRecoveryContext::unregisterCleanup(llvm::CrashRecoveryContextCleanup*) + 23, stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
I've seen build-script processes lingering in the background after a Stylo build on my local machine.
Blocks: 1356991
Priority: -- → P1
So these seem to be caused by clang crashes... We run clang in parallel for the different tasks (https://github.com/servo/servo/blob/master/components/style/build_gecko.rs#L557). I bet disabling that bit and making it run sequentially "fixes" it... though it seems like an upstream bug.
Yes, it might be clang crashing, or our triggering clang crashes by feeding it bogus source... but we do need a workaround or a fix to unblock stylo. Just failing promptly rather than leaving a process in busy-wait would be an improvement.

I will take a look in a day or two if no one else can.
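The backtraces show the main thread blocked in `pthread_join` (via `std::thread::JoinHandle::join`) while the worker threads sit in libclang with EXC_BAD_ACCESS. One way to "fail promptly" is to stop joining unconditionally and instead wait for results on a channel with a timeout. A minimal sketch, assuming the jobs can report a `Result` back; `generate_job`, the job names and the 10-minute limit are hypothetical, not the real build_gecko.rs code:

```rust
use std::process;
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for one bindgen/libclang job. The real job
/// functions in build_gecko.rs are not reproduced here.
fn generate_job(_name: &str) -> Result<(), String> {
    Ok(())
}

/// Spawns one thread per job, then waits for results on a channel instead of
/// calling `JoinHandle::join`. Each wait is bounded by `per_job_timeout`, so a
/// thread that never reports back becomes an error rather than a hang.
fn run_jobs_with_deadline(jobs: &[&'static str], per_job_timeout: Duration) -> Result<(), String> {
    let (tx, rx) = mpsc::channel();
    for &job in jobs {
        let tx = tx.clone();
        thread::spawn(move || {
            // Ignore send errors: the receiver may already have given up.
            let _ = tx.send((job, generate_job(job)));
        });
    }
    // Drop the original sender so the channel disconnects once all threads exit.
    drop(tx);

    for _ in 0..jobs.len() {
        match rx.recv_timeout(per_job_timeout) {
            Ok((_, Ok(()))) => {}
            Ok((job, Err(e))) => return Err(format!("bindgen job `{}` failed: {}", job, e)),
            Err(RecvTimeoutError::Timeout) => {
                return Err("timed out waiting for a bindgen job".to_string())
            }
            Err(RecvTimeoutError::Disconnected) => {
                return Err("a bindgen job thread exited without reporting".to_string())
            }
        }
    }
    Ok(())
}

fn main() {
    // Job names and the 10-minute limit are illustrative assumptions.
    let jobs = ["structs", "bindings"];
    if let Err(e) = run_jobs_with_deadline(&jobs, Duration::from_secs(600)) {
        eprintln!("error: {}", e);
        // Exit with a failure status even if a worker thread is still stuck.
        process::exit(1);
    }
}
```

This only helps when the stuck thread is merely blocked. `process::exit` still runs C `atexit` handlers, so if libclang's state is badly corrupted, `std::process::abort` is the blunter alternative.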
(In reply to Ralph Giles (:rillian) | needinfo me from comment #6)
> Yes, it might be clang crashing, or our triggering clang crashes by feeding
> it bogus source... but we do need a workaround or a fix to unblock stylo.
> Just failing promptly rather than leaving a process in busy-wait would be an
> improvement.
> 
> I will take a look in a day or two if no one else can.

If it's only on one platform, we can disable the parallelism there. I don't have a Mac though, so if someone could test that works, that'd be great.
FWIW, when we don't hang, stylo can build ok. https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c8e7ec248dec45d9cf7a2a3ea68a5a2970bcfdf&selectedJob=102916206

This suggests the hang is different from bug 1368083.
I guess we can add another environment variable to control whether bindgen is run in parallel. I sometimes saw weird clang crashes on Windows as well. If libclang cannot reliably run in parallel, we probably need to either accept that and run it sequentially, or try spawning independent processes for it.
I guess for the gecko build on CI, we should probably just run it sequentially, because running it in parallel may not really make anything faster. Parallelism is only useful when we are doing an incremental rebuild after changes to some headers.
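A sketch of what such a switch could look like, assuming the variable is named STYLO_SERIAL_CLANG (as in the patch below) and is read at run time of the build script. `run_job` and the job names are hypothetical placeholders; the point is that the sequential branch never calls `thread::spawn`:

```rust
use std::env;
use std::thread;

/// Hypothetical bindgen job; stands in for the real generate functions.
fn run_job(name: &str) -> String {
    format!("generated {}", name)
}

/// True when STYLO_SERIAL_CLANG is set to something other than "" or "0".
/// The accepted values are an assumption of this sketch.
fn serial_requested() -> bool {
    match env::var("STYLO_SERIAL_CLANG") {
        Ok(v) => !v.is_empty() && v != "0",
        Err(_) => false,
    }
}

fn run_all(jobs: &[&'static str], serial: bool) -> Vec<String> {
    if serial {
        // Sequential branch: every job runs on the calling thread, no spawn().
        jobs.iter().map(|job| run_job(job)).collect()
    } else {
        // Parallel branch: one thread per job, joined in submission order.
        let handles: Vec<thread::JoinHandle<String>> = jobs
            .iter()
            .map(|&job| thread::spawn(move || run_job(job)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("bindgen job panicked"))
            .collect()
    }
}

fn main() {
    for line in run_all(&["structs", "bindings"], serial_requested()) {
        eprintln!("{}", line);
    }
}
```

If the serial setting is honoured, a debugger attached to a hung run should show no worker threads at all, which makes it easy to tell whether the switch actually took effect.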
Created attachment 8873130 [details] [diff] [review]
Add STYLO_SERIAL_CLANG

Something like this?
Assignee: nobody → giles
Attachment #8873130 - Flags: review?(xidorn+moz)
Hmm, adding the env to my mozconfig with this patch

>  export STYLO_SERIAL_CLANG=1

I still get the hang with libclang in CrashRecoveryContextCleanup.
That probably means the parallelism isn't to blame, and we probably don't need to add this env var...
Glandium suggested:

> mk_add_options STYLO_SERIAL_CLANG=1

Which may have worked, because now lldb says there's only one stalled clang thread:

>  thread #1: tid = 0x5f6d34, 0x000000010f952f46 libsystem_kernel.dylib`__semwait_signal + 10, queue = 'com.apple.main-thread'
> * thread #2: tid = 0x5f6d62, 0x0000000111a4e937 libclang.dylib`llvm::CrashRecoveryContext::unregisterCleanup(llvm::CrashRecoveryContextCleanup*) + 23, stop reason = EXC_BAD_ACCESS (code=1, address=0x20)

OTOH, that was the only spawn() I found in the build scripts. I thought maybe the thread encapsulation is inside libclang or the binding, but no:

Thread #1 is in pthread_join called from build-script-build::main.
Thread #2 is clearly a pthread running a closure which calls into build_gecko::bindings::generate_structs

I must be missing something.
I don't think the build script should spawn any thread if we take the sequential branch.

Also, by comment 13 I mean that clang probably crashes not because it is run in parallel, but because of some other issue.

BTW what version of clang are you using? It seems mach bootstrap nowadays installs 5.0.0.
Comment on attachment 8873130 [details] [diff] [review]
Add STYLO_SERIAL_CLANG

Review of attachment 8873130 [details] [diff] [review]:
-----------------------------------------------------------------

If making it run sequentially doesn't fix this issue... then we probably don't need this.
Attachment #8873130 - Flags: review?(xidorn+moz)
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #15)

> BTW what version of clang are you using? It seems mach bootstrap nowadays
> installs 5.0.0.

Really? I ran bootstrap yesterday and

> $ ~/.mozbuild/clang/bin/llvm-config --version
> 3.9.0
I see. mach bootstrap downloads the version listed in browser/config/tooltool-manifests/*/clang.manifest, which is 5.0.0 on Windows, and 3.9.0 on macOS and Linux.

> browser/config/tooltool-manifests/linux64/clang.manifest:    "version": "clang 3.9.0",
> browser/config/tooltool-manifests/macosx64/clang.manifest:    "version": "clang 3.9.0",
> browser/config/tooltool-manifests/win32/clang.manifest:    "version": "clang 5.0pre/r293859",
> browser/config/tooltool-manifests/win64/clang.manifest:    "version": "clang 5.0pre/r293859",
Xidorn, do you remember if you've seen the clang crash on Windows since we switched to llvm 5?

I'm still confused why the attached patch didn't work, but after moving the conditional outside the macro, I get proper unwinding. I'm wondering if we just need to update libclang.

> 8:25.69    Compiling style v0.0.1 (file:///Users/giles/firefox/servo/components/style)
> 8:39.27 error: failed to run custom build command for `style v0.0.1 (file:///Users/giles/firefox/servo/components/style)`
> 8:39.27 process didn't exit successfully: `/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/toolkit/library/release/build/style-db4933fa280ec189/build-script-build` (signal: 11, SIGSEGV: invalid memory reference)
> [...]
> 8:39.46 libc++abi.dylib: Pure virtual function called!
> 8:39.46 libclang: crash detected during parsing: {
> 8:39.46   'source_filename' : ''
> 8:39.46   'command_line_args' : ['clang', '-I', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include', '-I', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/nspr', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla-config.h', '-DDEBUG=1', '-DJS_DEBUG=1', '-x', 'c++', '-std=c++14', '-DTRACING=1', '-DIMPL_LIBXUL', '-DMOZ_STYLO_BINDINGS=1', '-DMOZILLA_INTERNAL_API', '-DRUST_BINDGEN', '-DMOZ_STYLO', '-DOS_POSIX=1', '-DOS_MACOSX=1', '-stdlib=libc++', '--target=x86_64-apple-darwin', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/nsStyleStruct.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/ServoPropPrefList.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/StyleAnimationValue.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/gfxFontConstants.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/nsThemeConstants.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/dom/AnimationEffectReadOnlyBinding.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/dom/KeyframeEffectBinding.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/AnimationPropertySegment.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/ComputedTiming.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/ComputedTimingFunction.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/Keyframe.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/ServoElementSnapshot.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/ServoElementSnapshotTable.h', '-include', 
'/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/dom/Element.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/dom/NameSpaceConstants.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/LookAndFeel.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/mozilla/ServoBindings.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/nsCSSCounterStyleRule.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/nsCSSFontFaceRule.h', '-include', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/nsMediaFeatures.h', '/Users/giles/firefox/obj-x86_64-apple-darwin16.6.0/dist/include/nsMediaList.h'],
> 8:39.46   'unsaved_files' : [],
> 8:39.46   'options' : 1,
> 8:39.46 }
(In reply to Ralph Giles (:rillian) | needinfo me from comment #19)
> Xidorn, do you remember if you've seen the clang crash on Windows since we
> switched to llvm 5?

Since 5.0, I don't think so, but note that I just switched to 5.0 several days ago, and the crash wasn't that frequent before that.

Before using 5.0, I had also been using 4.0 for quite a while, and IIRC I didn't see any crashes with it either, so you could probably try 4.0 first.

> I'm still confused why the attached patch didn't work, but after moving the
> conditional outside the macro, I get proper unwinding. I'm wondering if we
> just need to update libclang.

Yeah, probably try updating to 4.0 first.
I've not been able to reproduce the libclang issue with LLVM_CONFIG pointing at a local build of today's clang 5 master branch. I bet that won't run on the macOS 10.7 builders though...

I'll experiment with building 4.0 and pushing to try.
llvm/clang 4.0.0 complains about missing libatomic with the macOS 10.7 target. CMake succeeds for 10.9, so it looks like we're up against the age of the mac builders again. Adding bug 1368144 as a blocker.

I'll try 3.9.1 in the meantime.
Depends on: 1368144
3.9.1 and 4.0.0 both report the libclang crash instead of hanging, so the bug is only half fixed there. Like 4.0.0, the 3.9.1 release doesn't build against the macOS 10.7 sdk, so we'd again need to update the build machines or get cross-compilation working.
I don't have macOS 10.7 handy, but if this crash is reproducible and I were given ssh access to a macOS 10.7 build machine, I could try running C-Reduce to create a minimal, isolated test case that crashes. This would allow us to pinpoint the problematic interactions and decide whether we can add a workaround to bindgen.
A C-Reduce case would be excellent!

I've been able to reproduce on macOS 10.12.5. Run `./mach bootstrap`, say '1' to downloading clang for servo, and add the suggested snippet to your mozconfig.

Then do a clobber build, and attempt reproduction by touching $topsrcdir/servo/components/style/build_gecko.rs and rebuilding. Look for a hung `build-script-build` process.

I've just reproduced with a few test patches on top of git commit b1eb11c773f58a8af38574e4791fef2d601d7ae0 which looks like hg rev 57ed35189e19.
(In reply to Ralph Giles (:rillian) | needinfo me from comment #25)
> Then do a clobber build, and attempt reproduction by touching
> $topsrcdir/servo/components/style/build_gecko.rs and rebuilding. Look for a
> hung `build-script-build` process.

If this is a clang crash, you don't need to touch build_gecko.rs. Touching layout/style/ServoBindings.toml would be enough. That would just trigger the build script to rerun without building anything else.
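Cargo reruns a build script when a file it declared with `cargo:rerun-if-changed` has changed, which would explain why touching ServoBindings.toml is enough. A minimal sketch, assuming the style build script declares the file this way; the TOPSRCDIR environment variable is a hypothetical way of locating the source root, since how the real script finds it is not shown in this bug:

```rust
use std::path::Path;

/// Builds the cargo directive that makes the build script rerun when the
/// bindgen configuration file changes.
fn rerun_directive(topsrcdir: &Path) -> String {
    let config = topsrcdir.join("layout/style/ServoBindings.toml");
    format!("cargo:rerun-if-changed={}", config.display())
}

fn main() {
    // Hypothetical: TOPSRCDIR is assumed to hold the source root,
    // falling back to the current directory.
    let topsrcdir = std::env::var("TOPSRCDIR").unwrap_or_else(|_| ".".to_string());
    println!("{}", rerun_directive(Path::new(&topsrcdir)));
}
```

Once a build script declares any such file, Cargo no longer reruns it on every change in the package, so every input it depends on has to be listed.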
I've not been able to reproduce with clang 4.0.1rc2, which is good news; there should be a stable release we can upgrade to soon, and there's a smaller section of changes (4.0.0 to 4.0.1) to search for the fix.
I've not been able to reproduce on 10.11.6 El Capitan. Does `./mach build` always use the libclang that `./mach bootstrap` installs? I also have a source build of clang's 3.9.X branch on my system, which is what I normally use with bindgen.
`./mach build` will use libclang from whatever llvm-config is in-path. You can force the one `./mach bootstrap` installs by adding

> export LLVM_CONFIG=/Users/fitzgen/.mozbuild/clang/bin/llvm-config

to your mozconfig.

I can reproduce 50-80% of the time, and I don't understand why the attached patch doesn't work either. For a while I thought just turning on logging fixed it, but it seems not.
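Since the libclang version seems to matter (3.9 hangs, some 4.0 builds crash), it helps to confirm which llvm-config a build will pick up. A minimal sketch of the lookup rule described above (an explicit LLVM_CONFIG wins, otherwise `llvm-config` from PATH), followed by a `--version` query like the one earlier in this bug. This illustrates the rule as stated; it is not mach's or clang-sys's actual code:

```rust
use std::env;
use std::process::Command;

/// An explicit, non-empty LLVM_CONFIG wins; otherwise fall back to whatever
/// `llvm-config` is first on PATH.
fn choose_llvm_config(explicit: Option<String>) -> String {
    match explicit {
        Some(path) if !path.is_empty() => path,
        _ => "llvm-config".to_string(),
    }
}

/// Parses the leading `major.minor.patch` of `llvm-config --version` output,
/// e.g. "3.9.0". Trailing non-digits in a component ("0svn") are ignored and
/// missing components default to 0.
fn parse_llvm_version(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.trim().split('.').map(|p| {
        let digits: String = p.chars().take_while(|c| c.is_ascii_digit()).collect();
        digits.parse::<u32>().ok()
    });
    let major = parts.next()??;
    let minor = parts.next().flatten().unwrap_or(0);
    let patch = parts.next().flatten().unwrap_or(0);
    Some((major, minor, patch))
}

fn main() {
    let llvm_config = choose_llvm_config(env::var("LLVM_CONFIG").ok());
    match Command::new(&llvm_config).arg("--version").output() {
        Ok(out) if out.status.success() => {
            let text = String::from_utf8_lossy(&out.stdout);
            match parse_llvm_version(&text) {
                Some((major, minor, patch)) => {
                    println!("{} is LLVM {}.{}.{}", llvm_config, major, minor, patch)
                }
                None => eprintln!("{}: unrecognised version {:?}", llvm_config, text.trim()),
            }
        }
        Ok(out) => eprintln!("{} exited with {}", llvm_config, out.status),
        Err(e) => eprintln!("could not run {}: {}", llvm_config, e),
    }
}
```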
Ok! I think I've reproduced it: https://gist.github.com/fitzgen/13df5818f732ecc12e52f23ef9657a6c
Created attachment 8876275 [details]
stylo-preprocessed.hpp

This is the post-preprocessing of the C++ code that stylo is invoking bindgen on.

However, I'm not yet able to reproduce the hang nor crash by invoking bindgen directly, outside of `mach build`. Still working on it.
I haven't been able to reproduce the last couple of days either. It's even worked on a try push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=26416942350c88dc75aba23b71fb24f36385867f&selectedJob=105905469

Maybe we rearranged the code enough this week to not trigger the bug?
If I use the homebrew libclang 4, it doesn't reproduce. If I use the libclang 4 that you shared with me, the crash does reproduce. If I use the libclang 3.9 I built from source, it hangs without crashing.
Duplicate of this bug: 1373325
(In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #33)
> If I use the homebrew libclang 4, it doesn't reproduce. If I use the
> libclang 4 that you shared with me, the crash does reproduce. If I use the
> libclang 3.9 I built from source, it hangs without crashing.

It apparently hangs with the mac clang that we install from `mach bootstrap`, too.  If you were able to reproduce the hang, do you have a plan for fixing it?
Flags: needinfo?(nfitzgerald)
Nick, were you able to find a minimal bindgen test case using c-reduce that hangs clang?

Now that RelEng is moving the Mac builds from the 10.7 builders to Linux cross-compiled builds, we will no longer be stuck with libclang 3.9. If we can avoid these hangs and crashes by upgrading to whatever version of libclang 4 that homebrew is using, we can do that.
My plan was to

1. Use `clang -save-temps` with the same flags that bindgen is ultimately passing to libclang to get the preprocessed header file that bindgen is being run on
2. Reproduce the hang/crash with this giant preprocessed header file when invoking bindgen directly
3. Run C-Reduce to create a reduced test case that reproduces the hang/crash, and would hopefully make creating a fix/workaround easy

However, I was never able to complete step 2: bindgen would always complete successfully when invoked on the preprocessed header file and outside of m-c's build.

Further explorations might try and disentangle invoking bindgen outside of m-c's build and invoking bindgen on the preprocessed header and seeing which (or both) of those things is problematic here. If invoking bindgen on a preprocessed header is enough to make the bug go away, then C-Reduce is pretty much off the table, which would be a big bummer.

I don't have any intentions of personally digging deeper right now; I'd need to talk to my manager about my priorities and where my time should go.

Happy to answer questions / have ideas bounced off me, though.
Flags: needinfo?(nfitzgerald)
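Step 3 needs an interestingness test: C-Reduce runs it on each candidate file and keeps the candidate only if it exits with status 0. Because this bug shows up either as a hang or as a crash, the test has to treat a timeout as interesting too. A minimal Rust sketch (such tests are more commonly shell scripts); the `bindgen` command line, with the header name taken from the attachment above and the clang arguments after `--`, and the 120-second limit are assumptions:

```rust
use std::io;
use std::process::{self, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Exited normally; `true` if the status was success.
    Finished(bool),
    /// No exit code; on Unix this means a signal such as SIGSEGV.
    Crashed,
    /// Still running at the deadline, so it was killed.
    Hung,
}

fn classify(status: ExitStatus) -> Outcome {
    match status.code() {
        Some(_) => Outcome::Finished(status.success()),
        None => Outcome::Crashed,
    }
}

/// Polls the child until it exits or `deadline` elapses.
fn run_with_deadline(cmd: &mut Command, deadline: Duration) -> io::Result<Outcome> {
    let mut child = cmd.spawn()?;
    let start = Instant::now();
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(classify(status));
        }
        if start.elapsed() >= deadline {
            // Ignore errors: the child may have exited just after try_wait.
            let _ = child.kill();
            child.wait()?; // reap it so no zombie is left behind
            return Ok(Outcome::Hung);
        }
        thread::sleep(Duration::from_millis(50));
    }
}

fn main() {
    // Hypothetical invocation: a `bindgen` binary on PATH, run on the
    // preprocessed header attached to this bug, parsed as C++14.
    let mut cmd = Command::new("bindgen");
    cmd.args(["stylo-preprocessed.hpp", "-o", "/dev/null", "--", "-x", "c++", "-std=c++14"]);
    let interesting = matches!(
        run_with_deadline(&mut cmd, Duration::from_secs(120)),
        Ok(Outcome::Hung) | Ok(Outcome::Crashed)
    );
    // C-Reduce keeps a candidate only when the test exits with status 0.
    process::exit(if interesting { 0 } else { 1 });
}
```

If libclang's own crash recovery turns the fault into an ordinary error exit, the candidate is classified as `Finished(false)`. Whether that should also count as interesting is a judgment call for whoever runs the reduction.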
(In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #37)
> Further explorations might try and disentangle invoking bindgen outside of
> m-c's build and invoking bindgen on the preprocessed header and seeing which
> (or both) of those things is problematic here. If invoking bindgen on a
> preprocessed header is enough to make the bug go away, then C-Reduce is
> pretty much off the table, which would be a big bummer.

Interesting.

Ralph, what do you recommend we do now? Could this be a problem with the way our build system or clang-sys invokes libclang, and thus not hit when Nick runs bindgen directly? Why might the libclang 4 build you gave Nick hang, when homebrew's libclang 4 doesn't?

The Mac cross builds should be running next week (bug 1368144), so we should be free to update from clang 3.9 to 4.0.x then.
Flags: needinfo?(giles)
I think we should resolve this bug by updating the cross-builds to clang 4.0.1, or, if that causes problems, by packaging clang 4.0.1 separately for bindgen's use.

I was waiting for the releng transition before verifying that solution. Then I was going to move on to bug 1368083 which may also block us on the cross builds.

It would be nice to know what's going on here, and I don't understand why the homebrew build would work while my build and the one in tooltool don't. Different MACOSX_DEPLOYMENT_TARGET maybe? But it's intermittent and I suspect it's not worth investing further resources once it's off the critical path for stylo.
Flags: needinfo?(giles)
(In reply to Ralph Giles (:rillian) | needinfo me from comment #39)
> I think we should resolve this bug by updating the cross-builds to clang
> 4.0.1, or if that causes problems packaging clang 4.0.1 separately for
> bindgen's use.

This hang also affects native Mac users using the clang we give them from tooltool.  So we have to come up with a solution there as well.
(In reply to Nathan Froyd [:froydnj] from comment #40)
> (In reply to Ralph Giles (:rillian) | needinfo me from comment #39)
> > I think we should resolve this bug by updating the cross-builds to clang
> > 4.0.1, or if that causes problems packaging clang 4.0.1 separately for
> > bindgen's use.
> 
> This hang also affects native Mac users using the clang we give them from
> tooltool.  So we have to come up with a solution there as well.

Should we update native Mac builds to clang 4.0.1rc now, in anticipation that 4.0.1 will fix these hangs for both native and cross builds?
This is easiest to do after the releng transition. Right now updating the mac manifest `mach bootstrap` uses would break the native macOS 10.7 integration builds.

We could also switch to homebrew's llvm, apparently.
(In reply to Chris Peterson [:cpeterson] from comment #41)
> (In reply to Nathan Froyd [:froydnj] from comment #40)
> > (In reply to Ralph Giles (:rillian) | needinfo me from comment #39)
> > > I think we should resolve this bug by updating the cross-builds to clang
> > > 4.0.1, or if that causes problems packaging clang 4.0.1 separately for
> > > bindgen's use.
> > 
> > This hang also affects native Mac users using the clang we give them from
> > tooltool.  So we have to come up with a solution there as well.
> 
> Should we update native Mac builds to clang 4.0.1rc now, in anticipation
> 4.0.1 will fix these hangs for both native and cross builds?

That would be ideal.  As Ralph noted in comment 22 and comment 23, though, clang 4.0.x doesn't build against the Mac 10.7 SDK.  I guess we *could* try hacking the source to make the libatomic dependency go away, but it's possible that is an exercise in yak hairdressing.

If we wanted to avoid anything to do with yaks, we'd have to wait for 10.10 build machines to use clang 4.0.x in Mac native builds.

If cross builds are happening and the change sticks, we can upgrade cross builders there.  Our scripts to compile clang for Mac native also use a 10.10 SDK, so we *could* compile clang 4.0.1 on our infrastructure, upload the resulting package to tooltool, and then have mach bootstrap install that package.  (Note that we'd need a separate manifest to avoid the problem in comment 42.)  That would ideally solve the issues that led to bug 1373325.

Alternatively, since 3.9.1 crashes outright, we might be able to work around the crash in bindgen somehow?
(In reply to Nathan Froyd [:froydnj] from comment #43)
> (In reply to Chris Peterson [:cpeterson] from comment #41)
> > (In reply to Nathan Froyd [:froydnj] from comment #40)
> > > (In reply to Ralph Giles (:rillian) | needinfo me from comment #39)
> > > > I think we should resolve this bug by updating the cross-builds to clang
> > > > 4.0.1, or if that causes problems packaging clang 4.0.1 separately for
> > > > bindgen's use.
> > > 
> > > This hang also affects native Mac users using the clang we give them from
> > > tooltool.  So we have to come up with a solution there as well.
> > 
> > Should we update native Mac builds to clang 4.0.1rc now, in anticipation
> > 4.0.1 will fix these hangs for both native and cross builds?
> 
> That would be ideal.  As Ralph noted in comment 22 and comment 23, though,
> clang 4.0.x doesn't build against the Mac 10.7 SDK.  I guess we *could* try
> hacking the source to make the libatomic dependency go away, but it's
> possible that is an exercise in yak hairdressing.

Sorry, I forgot that clang 4.0.x doesn't run on the 10.7 builders. No need for us to hack anything up here when the proper cross builds should arrive next week!
Blocks: 1375774
No longer blocks: 1356991
Depends on: 1377214
I can still see this hang with Mac clang 4.0.1 builds, built in automation, running on my local machine. :(
Depends on: 1379341
clang 4.0 is installed on Linux by ./mach bootstrap, and it segfaults in the linux32 build that I'm trying to use. I'm having a ridiculous time trying to get a clang 4.0.1 build working, since it's not a packaged release for Ubuntu yet.
Does the tooltool build from https://reviewboard.mozilla.org/r/153388/diff/2/ work?
No, since those are linux64 binaries.
Ah, you're actually building on linux32. OK.