Closed Bug 1301751 Opened 8 years ago Closed 8 years ago

llvm-dsymutil crashing on XUL

Categories

(Firefox Build System :: General, defect, P1)

Tracking

(firefox50- fixed, firefox51- fixed, firefox52 fixed)

RESOLVED FIXED
mozilla52

People

(Reporter: ted, Assigned: ted)

Attachments

(2 files)

Yesterday's Mac nightly failed to run llvm-dsymutil on XUL, because llvm-dsymutil crashed:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=nightly&selectedJob=4904533

05:04:39     INFO -  0  llvm-dsymutil     0x0000000105dc768b llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 43
05:04:39     INFO -  1  llvm-dsymutil     0x0000000105dc6dd6 llvm::sys::RunSignalHandlers() + 70
05:04:39     INFO -  2  llvm-dsymutil     0x0000000105dc7e37 _ZL13SignalHandleri + 519
05:04:39     INFO -  3  libsystem_c.dylib 0x00007fff96953cfa _sigtramp + 26
05:04:39     INFO -  4  libsystem_c.dylib 0x00007fff969185c3 szone_free_definite_size + 1815
05:04:39     INFO -  5  llvm-dsymutil     0x0000000105a3f32a llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 2858
05:04:39     INFO -  6  llvm-dsymutil     0x0000000105a3f41f llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 3103
05:04:39     INFO -  7  llvm-dsymutil     0x0000000105a3f41f llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 3103
05:04:39     INFO -  8  llvm-dsymutil     0x0000000105a3f32a llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 2858
05:04:39     INFO -  9  llvm-dsymutil     0x0000000105a3f41f llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 3103
05:04:39     INFO -  10 llvm-dsymutil     0x0000000105a3f32a llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 2858
05:04:39     INFO -  11 llvm-dsymutil     0x0000000105a3f41f llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 3103
05:04:39     INFO -  12 llvm-dsymutil     0x0000000105a3f32a llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 2858
05:04:39     INFO -  13 llvm-dsymutil     0x0000000105a3f41f llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 3103
05:04:39     INFO -  14 llvm-dsymutil     0x0000000105a3f41f llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 3103
05:04:39     INFO -  15 llvm-dsymutil     0x0000000105a3f41f llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 3103
05:04:39     INFO -  16 llvm-dsymutil     0x0000000105a3ac1d llvm::dsymutil::(anonymous namespace)::DwarfLinker::link(llvm::dsymutil::DebugMap const&) + 8573
05:04:39     INFO -  17 llvm-dsymutil     0x0000000105a384be llvm::dsymutil::linkDwarf(llvm::StringRef, llvm::dsymutil::DebugMap const&, llvm::dsymutil::LinkOptions const&) + 1086
05:04:39     INFO -  18 llvm-dsymutil     0x0000000105a2d70e main + 1982
05:04:39     INFO -  19 llvm-dsymutil     0x0000000105a2ce64 start + 52
05:04:39     INFO -  Stack dump:
05:04:39     INFO -  0.	Program arguments: /builds/slave/m-cen-m64-ntly-000000000000000/build/src/clang/bin/llvm-dsymutil --arch=i386 --arch=x86_64 dist/universal/firefox/FirefoxNightly.app/Contents/MacOS/XUL
05:04:39     INFO -  56337: Error running dsymutil: Command '['/builds/slave/m-cen-m64-ntly-000000000000000/build/src/clang/bin/llvm-dsymutil', '--arch=i386', '--arch=x86_64', 'dist/universal/firefox/FirefoxNightly.app/Contents/MacOS/XUL']' returned non-zero exit status -11
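A note on the `-11` above: the error text matches the format of Python's `subprocess.CalledProcessError`, where a negative exit status means the child was terminated by that signal number. A minimal sketch of decoding it, assuming a POSIX host (the helper name `describe_exit_status` is made up for illustration and is not part of the build system):

```python
import signal


def describe_exit_status(returncode):
    """Turn a subprocess-style return code into a readable description."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as -N.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    if returncode == 0:
        return "exited successfully"
    return f"exited with status {returncode}"


print(describe_exit_status(-11))
```

On Linux and OS X signal 11 is SIGSEGV, so the status here points at a segmentation fault inside llvm-dsymutil rather than an ordinary error exit.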

This is probably from my patch in bug 1300152, which added a tiny bit more Rust code to the library. The crash made us fail to upload symbols for libxul, which is bad for crash reporting:
https://crash-stats.mozilla.com/report/index/e5f4134f-28b7-4504-9b52-61ee82160909
I believe bholley ran into this recently and "fixed" it by either upgrading or downgrading Xcode, although I don't know the specifics, unfortunately :(

Some more info may be here: https://github.com/rust-lang/rust/issues/36185
That patch got backed out, so this won't be an issue for tomorrow's nightly.
We're building with llvm-dsymutil from the clang we use, which is pretty recent:
https://dxr.mozilla.org/mozilla-central/rev/176aff980979bf588baed78c2824571a6a7f2b96/browser/config/tooltool-manifests/macosx64/releng.manifest#3

Unfortunately it's tricky to do diagnostics here because we'd basically need to grab a build machine loaner and run commands on it.
bholley: do you remember what you did to fix your lldb issues?
Flags: needinfo?(bobbyholley)
So this is unfortunate. njn noted that our crash reports for the past few days were missing symbols on OS X, and it looks like we're getting this same crash during buildsymbols. I'll fix up a patch to make this failure turn the build red, but I think what's going to happen is that our mac builds are going to turn perma-red.
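Roughly what "turn the build red" means here: the symbol-dumping step has to propagate a dsymutil failure instead of swallowing it. A hedged sketch in Python, not the actual patch; `run_or_fail` is a hypothetical helper and the real buildsymbols code may be structured differently:

```python
import subprocess
import sys


def run_or_fail(cmd):
    """Run cmd; return 0 on success, otherwise report the failure and return 1."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # A negative return code means the child was killed by a signal.
        print(
            f"Error running {cmd[0]}: exit status {result.returncode}",
            file=sys.stderr,
        )
        return 1
    return 0


# Usage (not run here): a build step would exit with the helper's result, e.g.
#   sys.exit(run_or_fail(["llvm-dsymutil", "--arch=x86_64", "path/to/XUL"]))
```

The point of returning a non-zero status is that the calling build step fails visibly, so missing symbols show up as a red build instead of days later in crash reports.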

This log snippet is from the log linked here:
https://tools.taskcluster.net/index/artifacts/#gecko.v2.mozilla-central.nightly.2016.09.15.revision.29af101880db7ce7f5f87f58e1ff20988c1c5fc3.firefox/gecko.v2.mozilla-central.nightly.2016.09.15.revision.29af101880db7ce7f5f87f58e1ff20988c1c5fc3.firefox.macosx64-opt
njn said this broke again in the 2016-09-15 nightly, so I looked at the regression range from the previous nightly to that:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=82d0a583a9a39bf0b0000bccbf6d5c9ec2596bcc&tochange=29af101880db7ce7f5f87f58e1ff20988c1c5fc3

froydnj noted that bug 1301065 landed in that range. It is totally plausible that that is causing this bustage.
Very small change to have triggered the buildsymbols crash! FWIW I can't reproduce running dump_syms on the test executable in https://github.com/mozilla/mp4parse-rust/tree/master/mp4parse_capi/examples
Depends on: 1304042
The bug I filed was https://github.com/rust-lang/rust/issues/36185
Flags: needinfo?(bobbyholley)
(In reply to Ralph Giles (:rillian) needinfo me from comment #7)
> Very small change to have triggered the buildsymbols crash! FWIW I can't
> reproduce running dump_syms on the test executable in
> https://github.com/mozilla/mp4parse-rust/tree/master/mp4parse_capi/examples

The error is when we run llvm-dsymutil on the XUL library:
05:04:39     INFO -  56337: Error running dsymutil: Command '['/builds/slave/m-cen-m64-ntly-000000000000000/build/src/clang/bin/llvm-dsymutil', '--arch=i386', '--arch=x86_64', 'dist/universal/firefox/FirefoxNightly.app/Contents/MacOS/XUL']' returned non-zero exit status -11

Presumably there's something about the generated DWARF that it's choking on.

I don't think it's specific to your Rust code, I think it was just a latent bug waiting to be triggered. My tiny little patch to add `intentional_panic` also tripped this. We still might have to back your patch out because right now we don't have symbols for libxul in mac nightlies, which is pretty bad.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #9)
> We still might have to back your patch out because right
> now we don't have symbols for libxul in mac nightlies, which is pretty bad.

That sounds prudent!
For someone who has no experience building Firefox, what is the easiest way to reproduce this issue? I've downloaded and successfully built Firefox with Xcode 8, according to these instructions:
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Mac_OS_X_Prerequisites

Does my clone of the repository contain the revision that exposes the bug? If so, where do I find the change-set ID of the revision?
It's happening on current mozilla-central, but I haven't reproduced it locally yet. I was trying to get a working build using the same toolchain we use in CI but I could not get it to work. I'm going to try rebuilding with my local toolchain and the Rust toolchain from CI and see if I can reproduce that with the llvm-dsymutil from CI...
OK, I was able to reproduce this locally. I used the rustc we use in CI, which is just stock Rust 1.11, and the clang that came with my Xcode. I'm on OS X 10.11.6. If you do a Firefox build, you'll need to include `ac_add_options --enable-rust` in your mozconfig, since it's not enabled by default. With that, once you have a successful build you should be able to run `llvm-dsymutil --arch=x86_64 $objdir/dist/bin/XUL` to reproduce the failure.

If you haven't explicitly set MOZ_OBJDIR in your mozconfig it'll be `$topsrcdir/obj-$target`.
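To make the repro command concrete, here is a small sketch that builds the same argument list for a given objdir. The helper name `dsymutil_command` and the example objdir are assumptions; the flags and the `dist/bin/XUL` path are the ones quoted in this bug.

```python
import posixpath


def dsymutil_command(objdir, archs=("x86_64",), tool="llvm-dsymutil"):
    """Build the llvm-dsymutil argument list for the XUL library in objdir."""
    cmd = [tool]
    # One --arch flag per architecture, as in the CI invocation.
    cmd.extend(f"--arch={arch}" for arch in archs)
    cmd.append(posixpath.join(objdir, "dist", "bin", "XUL"))
    return cmd


# Hypothetical objdir, for illustration only.
print(" ".join(dsymutil_command("/path/to/objdir")))
```

Passing `archs=("i386", "x86_64")` gives the two-architecture form that the nightly build used against the universal binary.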
The llvm-dsymutil we're using in CI is:
LLVM (http://llvm.org/):
  LLVM version 3.8.0
  Optimized build.
  Built Apr 11 2016 (18:23:44).
  Default target: x86_64-apple-darwin15.6.0
  Host CPU: haswell

The plain `dsymutil` from Xcode on my machine also crashes when run against that XUL binary, with a less verbose "Segmentation fault: 11".
mw, let me know if the fix you come up with here seems likely to fix https://github.com/rust-lang/rust/issues/36185 . If not, I'll go ahead and create a repro case for you.
Hm, I've not been able to reproduce this so far. I think I'm using all the same program versions as you, Ted:

- OSX 10.11.6
- llvm-dsymutil from LLVM 3.8.0
- clang-800.0.38 (comes with Xcode 8)
- rustc 1.11.0 (stable)

Running llvm-dsymutil on XUL finishes without crashing. Same for system dsymutil.

I've tried this with the following configurations in .mozconfig:

(1) ac_add_options --enable-debug

(2) ac_add_options --enable-debug-symbols

(3) ac_add_options --enable-debug-symbols
    ac_add_options --enable-optimize

Is there anything that I'm missing? Maybe I'm on the wrong branch or something?
I'm building off of changeset 62f79d676e0e, which is from yesterday. My clang is "Apple LLVM version 7.3.0 (clang-703.0.31)". The only thing I did that was slightly odd was to copy an older SDK and use that: I used a 10.7 SDK because that's what our official builds are done with. I'll try without that and see if I can still reproduce.
OK, I can still reproduce when building with the default (10.11) SDK. My mozconfig is just:
```
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../automation-firefox
. "$topsrcdir/build/mozconfig.rust"
```

The second line just sets --enable-rust and uses the rust/cargo we use in CI, but you should be able to just `ac_add_options --enable-rust` if you have a system rustc.

(In reply to michaelwoerister from comment #16)
> I've tried this with the following configurations in .mozconfig:
> 
> (1) ac_add_options --enable-debug

Don't use this, we haven't seen this problem on debug builds! I suspect it's because we use this to select the cargo profile.

> (2) ac_add_options --enable-debug-symbols
> 
> (3) ac_add_options --enable-debug-symbols
>     ac_add_options --enable-optimize

These options are both on-by-default.
Priority: -- → P1
mw said he was able to reproduce this by using clang 3.8.0 (it didn't reproduce with the clang from Xcode 8). However, he wasn't able to reproduce when using beta rust. I tried beta rust locally and it made the problem go away. Since beta becomes stable next week (1.12), I think we should just update to 1.12 on OS X to fix this.
Depends on: 1304815
[Tracking Requested - why for this release]: This bug is causing us to not have symbols for libxul on OS X, which makes our crash reports basically useless. bug 1304815 should be the simple fix, but it got backed out on inbound for other bustage. If we can't get a simple fix on central that we can uplift we'll have to figure out some other way to fix Aurora so that we can fix crash reporting before we ship.
Ted: Have you tried whether compiling the Rust code with -Cdebuginfo=1 instead of -g sidesteps the issue? You would lose debuginfo for types, local variables, and function arguments, but you should be able to get proper backtraces.

That being said, updating the Rust compiler version would still be a better fix.
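For context, `-g` is shorthand for `-C debuginfo=2` (full debug info), while `-C debuginfo=1` emits line tables only. One way to try the suggestion locally is to pass the flag to cargo through the `RUSTFLAGS` environment variable. A sketch, assuming cargo is on PATH; `cargo_env` is a hypothetical helper, not something in the tree:

```python
import os


def cargo_env(base_env=None, debuginfo_level=1):
    """Return a copy of the environment with RUSTFLAGS requesting reduced debuginfo."""
    env = dict(os.environ if base_env is None else base_env)
    flag = f"-C debuginfo={debuginfo_level}"
    # Append to any flags that are already set instead of overwriting them.
    existing = env.get("RUSTFLAGS", "").strip()
    env["RUSTFLAGS"] = f"{existing} {flag}".strip()
    return env


# Usage (not run here):
#   subprocess.check_call(["cargo", "build"], env=cargo_env())
```

Returning a copy keeps the change scoped to the one cargo invocation rather than the whole build environment.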
That seems to work in a local build. I'll upload a patch.
Comment on attachment 8795237 [details]
bug 1301751 - work around llvm-dsymutil crash by building rust with debuginfo=1 instead of -g.

https://reviewboard.mozilla.org/r/81358/#review79966

You gotta do what you gotta do.

::: config/rules.mk:939
(Diff revision 1)
>  # We need to run cargo unconditionally, because cargo is the only thing that
>  # has full visibility into how changes in Rust sources might affect the final
>  # build.
>  force-cargo-build:
>  	$(REPORT_BUILD)
> -	env CARGO_TARGET_DIR=. RUSTC=$(RUSTC) $(CARGO) build $(cargo_build_flags) --
> +	env CARGO_TARGET_DIR=. RUSTC=$(RUSTC) RUSTFLAGS='-C debuginfo=1' $(CARGO) build $(cargo_build_flags) --

Can you add an XXX comment here about why we're passing RUSTFLAGS?

::: toolkit/library/gtest/rust/Cargo.toml:25
(Diff revision 1)
>  harness = false
>  
>  # Explicitly specify what our profiles use.
>  [profile.dev]
>  opt-level = 1
> -debug = true
> +debug = false

Are all of these set to `false` now because otherwise rustc would complain about `-g` in combination with `-C debuginfo=1`?  Worth adding a comment why we're doing this.

::: toolkit/library/rust/Cargo.toml
(Diff revision 1)
> -debug = true
>  rpath = false
> +debug = false

Uber-nit: let's keep these in the same order as the dev profile.
Attachment #8795237 - Flags: review?(nfroyd) → review+
Comment on attachment 8795237 [details]
bug 1301751 - work around llvm-dsymutil crash by building rust with debuginfo=1 instead of -g.

https://reviewboard.mozilla.org/r/81358/#review79966

> Are all of these set to `false` now because otherwise rustc would complain about `-g` in combination with `-C debuginfo=1`?  Worth adding a comment why we're doing this.

It's worse than that, it actually errors in that configuration. I figured we'd explicitly set them to `false` for now just to avoid that situation. Hopefully we'll get the issue with Rust 1.12 sorted out quickly and we can back this patch out when that lands.

> Uber-nit: let's keep these in the same order as the dev profile.

Oops, I think I deleted this line and put it back in the wrong place.
Blocks: 1305731
I filed bug 1305731 to revert these changes once we get our mac builds updated to Rust 1.12.
Blocks: 1304042
No longer depends on: 1304042, 1304815
Pushed by tmielczarek@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/441dc90a1f8f
work around llvm-dsymutil crash by building rust with debuginfo=1 instead of -g. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/441dc90a1f8f
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
I looked at the build log from today's Mac nightly:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=nightly&selectedJob=5107977

...and it successfully dumped symbols for libxul. The patch in bug 1304042 got backed out because llvm-dsymutil is *still* crashing on taskcluster builds, which are trying to dump symbols for the gtest XUL library. That sucks, but it is not impacting crash reporting and those are tier-2 builds, so we can live with it until we get the Rust 1.12 situation sorted out.

I'm not super happy with not having that patch because it means we can easily backslide here into silently failing again, but hopefully we'll get things sorted out with Rust 1.12 soon.
Comment on attachment 8795237 [details]
bug 1301751 - work around llvm-dsymutil crash by building rust with debuginfo=1 instead of -g.

Approval Request Comment
[Feature/regressing bug #]: I believe bug 1301065 caused this to break, but it doesn't currently fail the build so we don't notice.
[User impact if declined]: No symbols for OS X crash reports (this is bad)
[Describe test coverage new/current, TreeHerder]: N/A
[Risks and why]: This patch changes the way the Rust compiler generates debug symbols. It's possible this could cause other build failures, but it's probably not worse than the thing it fixes.
[String/UUID change made/needed]: N/A
Attachment #8795237 - Flags: approval-mozilla-aurora?
Assignee: nobody → ted
Comment on attachment 8795237 [details]
bug 1301751 - work around llvm-dsymutil crash by building rust with debuginfo=1 instead of -g.

This patch fixes a build issue related to symbols. Take it in 51 aurora.
Attachment #8795237 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Un-track for 51 as it's fixed.
[Tracking Requested - why for this release]: 50.0b8 Mac signatures are hitting this issue: http://bit.ly/2eFhoe9
(In reply to Marcia Knous [:marcia - use ni] from comment #35)
> [Tracking Requested - why for this release]: 50.0b8 Mac signatures are
> hitting this issue: http://bit.ly/2eFhoe9

Let me know if I should file a new bug for this issue or reopen this bug - thanks!
I looked at the 50.0b8 OS X build log:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&selectedJob=1815769

and I didn't see any evidence of this bug happening. There are no llvm-dsymutil crash stacks.

There are a bunch of warnings like:
09:29:55     INFO -  warning: no debug symbols in executable (-arch x86_64)

...which seems bad. File a new bug on this, please!
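The two log signatures in this bug are easy to tell apart mechanically. A rough sketch that counts each in a build log; the marker strings are copied from the logs quoted in this bug, while `classify_log` and the category names are made up for illustration:

```python
def classify_log(lines):
    """Count dsymutil crash reports and missing-debug-symbol warnings in log lines."""
    counts = {"dsymutil_crash": 0, "no_debug_symbols": 0}
    for line in lines:
        if "Error running dsymutil" in line:
            counts["dsymutil_crash"] += 1
        elif "warning: no debug symbols in executable" in line:
            counts["no_debug_symbols"] += 1
    return counts


sample = [
    "09:29:55     INFO -  warning: no debug symbols in executable (-arch x86_64)",
]
print(classify_log(sample))
```

A log with crashes counted under `dsymutil_crash` would be this bug; one with only `no_debug_symbols` warnings is the separate problem described here.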
Filed Bug 1311462 for the new issue.
If the root cause of this issue is the lack of OS X symbols, we have fixed that in Beta50. Please let me know if I misunderstand.
Product: Core → Firefox Build System