Closed Bug 1429056 Opened 3 years ago Closed 3 years ago

Make it easier to diagnose and fix situations when llvm-dsymutil crashes process mac debug info

Categories

(Firefox Build System :: General, enhancement)

enhancement
Not set
normal

Tracking

(firefox59 fixed)

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: ted, Assigned: glandium)

References

Details

Attachments

(3 files)

We've hit issues many times with llvm-dsymutil crashing on our mac builds, usually handling debug info from Rust code: bug 1397382, bug 1381043, bug 1410148 are the bugs I have at hand.

We usually wallpaper over it by making mac builds use `-C debuginfo=1` which often makes the problem go away, but isn't a great solution. Historically the problem persists until glandium performs heroic feats and diagnoses the LLVM bug. We should figure out how to make this easier in the future.

Here's a specific proposal, feel free to suggest something better:
- Make the build system copy all the object files and the binary or library involved to be uploaded as an artifact when llvm-dsymutil fails. Assuming that docker-worker will actually upload artifacts if the build fails, this should not be very hard: the binary/libary whose symbols are being processed is known to the symbolstore script, and the build system has knowledge of all the linker inputs, so we could simply package them all up in one gigantic artifact.

We could also write up a short document detailing how to proceed once you have this artifact, which glandium is best-equipped to write but I suspect goes something like:
1. Ensure you can reproduce the crash with the version of llvm-dsymutil in use in the builds that are failing.
2. Attempt to reproduce with llvm-dsymutil from llvm trunk.
2a. If it reproduces with llvm trunk, attempt to minimize a testcase and file an llvm bug. From past bugs this seems to mostly have involved glandium removing all the object files except for the one generated by the Rust compiler, since that's inevitably the root cause of the issues.
2b. If the issue does not reproduce, figure out what llvm patch fixed it and cherry-pick it into the in-tree patch stack for our clang toolchain build. This effectively means bisecting over llvm. I don't know if there's a simple way to handle this or not.
For the documentation part, I can write up something, we "just" need to decide where to put it.

For the artifacts part... I'm not entirely sure this is useful/necessary. Downloading an objdir and trying to replicate locally can be time consuming. I've actually had great success using one-click loaners for this. The only glitch is that one-click loaners currently don't work for jobs that are not level-1 (try) out of the box, they require some editing of the task before creating them (bug 1421077).
Bug 1430315 addressed the "make it easier to fix" part: we can now update llvm-dsymutil without touching clang, and that can even be used to test llvm trunk on try.

I'm however going U-turn on comment 1. Describing my process to generate a reduced testcase from a one-click-loaner is not particularly difficult, but so is writing a script that generates an artifact automatically, and it's better if a human doesn't actually need to open a one-click-loaner to manually run the procedure when a script can do all that automatically.
Assignee: nobody → mh+mozilla
Depends on: 1430315
Demonstration in this try build, that has these patches + a change that makes it build an unpatched llvm-dsymutil 3.9, which crashes:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7429457ce42545ff17b95242acf0f0eebe43da35
(In reply to Mike Hommey [:glandium] from comment #6)
> Demonstration in this try build, that has these patches + a change that
> makes it build an unpatched llvm-dsymutil 3.9, which crashes:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=7429457ce42545ff17b95242acf0f0eebe43da35

Oh wow, this is great! We might want to tweak the output somehow so we get something in the error summary that's more useful for sheriffs. If we could get the error summary to be like "llvm-dsymutil crash [@ ... ]" I think that'd help.
Attachment #8943895 - Flags: review?(core-build-config-reviews) → review?(ted)
Comment on attachment 8943895 [details]
Bug 1429056 - Don't strip llvm-dsymutil.

https://reviewboard.mozilla.org/r/214254/#review219984

Is there any way to convince the llvm build to keep `.debug_line` info as well? I note that the stack traces from your try push don't have line info:
[task 2018-01-19T10:29:59.781Z] 10:29:59     INFO -  #4 0x000000000045f4b7 llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) (/builds/worker/workspace/build/src/llvm-dsymutil/bin/llvm-dsymutil+0x45f4b7)
Attachment #8943895 - Flags: review?(ted) → review+
Comment on attachment 8943896 [details]
Bug 1429056 - Add llvm-symbolizer to the llvm-dsymutil toolchain.

https://reviewboard.mozilla.org/r/214256/#review219986
Attachment #8943896 - Flags: review+
Attachment #8943896 - Flags: review?(core-build-config-reviews)
Attachment #8943897 - Flags: review?(core-build-config-reviews) → review?(ted)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8)
> Comment on attachment 8943895 [details]
> Bug 1429056 - Don't strip llvm-dsymutil.
> 
> https://reviewboard.mozilla.org/r/214254/#review219984
> 
> Is there any way to convince the llvm build to keep `.debug_line` info as
> well? I note that the stack traces from your try push don't have line info:
> [task 2018-01-19T10:29:59.781Z] 10:29:59     INFO -  #4 0x000000000045f4b7
> llvm::dsymutil::(anonymous
> namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous
> namespace)::DwarfLinker::RelocationManager&,
> llvm::DWARFDebugInfoEntryMinimal const&, llvm::dsymutil::DebugMapObject
> const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int)
> (/builds/worker/workspace/build/src/llvm-dsymutil/bin/llvm-dsymutil+0x45f4b7)

I don't think we ever had line numbers in those crash reports. We'd need to add -g to the build, but I think that would make llvm-dsymutil size blow up again... That's followup fodder, anyways.
Comment on attachment 8943897 [details]
Bug 1429056 - Wrap llvm-dsymutil calls on automation.

https://reviewboard.mozilla.org/r/214258/#review219988

I don't love that it's a shell script, but having this is way better than not having it. Thanks!

::: build/macosx/cross-mozconfig.common:35
(Diff revision 1)
>  export LLVMCONFIG=$topsrcdir/clang/bin/llvm-config
>  export LDFLAGS="-Wl,-syslibroot,$CROSS_SYSROOT -Wl,-dead_strip"
>  export BINDGEN_CFLAGS="$FLAGS"
>  export TOOLCHAIN_PREFIX=$CROSS_CCTOOLS_PATH/bin/x86_64-apple-darwin11-
> -export DSYMUTIL=$topsrcdir/llvm-dsymutil/bin/llvm-dsymutil
> +export DSYMUTIL=$topsrcdir/build/macosx/llvm-dsymutil
> +mk_add_options "export REAL_DSYMUTIL=$topsrcdir/llvm-dsymutil/bin/llvm-dsymutil"

Is this worthwhile, or should we just hardcode this path into the script? (It's hardcoded in a file in the tree either way, right?)
Attachment #8943897 - Flags: review?(ted) → review+
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #11)
> Comment on attachment 8943897 [details]
> Bug 1429056 - Wrap llvm-dsymutil calls on automation.
> 
> https://reviewboard.mozilla.org/r/214258/#review219988
> 
> I don't love that it's a shell script, but having this is way better than
> not having it. Thanks!
> 
> ::: build/macosx/cross-mozconfig.common:35
> (Diff revision 1)
> >  export LLVMCONFIG=$topsrcdir/clang/bin/llvm-config
> >  export LDFLAGS="-Wl,-syslibroot,$CROSS_SYSROOT -Wl,-dead_strip"
> >  export BINDGEN_CFLAGS="$FLAGS"
> >  export TOOLCHAIN_PREFIX=$CROSS_CCTOOLS_PATH/bin/x86_64-apple-darwin11-
> > -export DSYMUTIL=$topsrcdir/llvm-dsymutil/bin/llvm-dsymutil
> > +export DSYMUTIL=$topsrcdir/build/macosx/llvm-dsymutil
> > +mk_add_options "export REAL_DSYMUTIL=$topsrcdir/llvm-dsymutil/bin/llvm-dsymutil"
> 
> Is this worthwhile, or should we just hardcode this path into the script?
> (It's hardcoded in a file in the tree either way, right?)

I think it's somehow better this way because if we somehow change things back to use llvm-dsymutil from clang, or whatever, then we don't have to remember/figure out that the path is hardcoded in the wrapper script.
I'll land as is, we can bikeshed in followups.
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/558cb0501f8e
Don't strip llvm-dsymutil. r=ted
https://hg.mozilla.org/integration/autoland/rev/afec89310b68
Add llvm-symbolizer to the llvm-dsymutil toolchain. r=ted
https://hg.mozilla.org/integration/autoland/rev/2082870a97c2
Wrap llvm-dsymutil calls on automation. r=ted
Product: Core → Firefox Build System
See Also: → 1420336
You need to log in before you can comment on or make changes to this bug.