Closed Bug 1268328 Opened 8 years ago Closed 7 years ago

Crash reports from Rust code are inadequate

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: n.nethercote, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [stylo])

Bug 1266309 is about a Rust crash (well, abort). The crash report is here:

https://crash-stats.mozilla.com/report/index/fe3a0851-557e-4ef8-be95-4c1782160417

The crash report is missing some vital data.

- The function name is not demangled.

- The source file is not listed, which means there is also no link to the source.

Also, Rust symbols end with 'h' followed by a 16 digit hex number. If this number changes frequently it will mess with Socorro's signature-based grouping. Perhaps we could strip it?
I notice that the Rust frames are unwound via the frame pointer tactic, rather than CFI.  Could it just be that we're not building Rust code with -g?
Ted, is this something you've been looking at?
Flags: needinfo?(ted)
Thanks for the needinfo, dbaron.

The crash report in the description is on windows. There's some upstream discussion of the issue on mac in https://github.com/rust-lang/rust/issues/24346

It...kinda looks like we don't pass -g to rustc. I'll fix that.
Depends on: 1268617
No, I haven't looked at this at all. I seem to recall having a discussion with jrmuizel about llvm's debug info support in the past, this may be partially an issue with that.

rillian: you can test this in local builds by doing a build, then either running `./mach buildsymbols`, or just running $objdir/dist/host/bin/dump_syms $objdir/toolkit/library/$libxul, and looking for those functions in the output.
Flags: needinfo?(ted)
Looking at the symbol file from the crash in comment 0 (http://symbols.mozilla.org/xul.pdb/59C3D44296B248DFBCBCD486778345432/xul.sym, 20MB), it looks like we don't have source line info for these functions:
PUBLIC 3c560 0 mp4parse_read

If you look at the PDB dumping code in Breakpad:
https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/toolkit/crashreporter/google-breakpad/src/common/windows/pdb_source_line_writer.cc#379

You can see that we ask DIA for `SymTagFunction`, which results in printing `FUNC` lines, and fall back to `SymTagPublicSymbol`, which prints `PUBLIC` lines if we can't find a function.

Also, re: demangling, I'm not surprised. We get the mangled names from DIA and call `get_undecoratedNameEx` to demangle it, which isn't likely to support Rust function names any time soon:
https://dxr.mozilla.org/mozilla-central/rev/86730d0a82093d705e44f33a34973d28b269f1ea/toolkit/crashreporter/google-breakpad/src/common/windows/pdb_source_line_writer.cc#972

If there's a way we can detect Rust symbols and demangle them we could fix that in Breakpad.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #4)

> rillian: you can test this in local builds by doing a build, then either
> running `./mach buildsymbols`, or just running
> $objdir/dist/host/bin/dump_syms $objdir/toolkit/library/$libxul, and looking
> for those functions in the output.

Thanks. Using `rustc -g` on linux I see FILE entries for the source, but still only two PUBLIC function entries:

PUBLIC 34f70d0 0 mp4parse_get_track_audio_info
PUBLIC 34f7140 0 mp4parse_get_track_video_info

These are defined in

FILE 6935 $(topsrcdir)/media/libstagefright/binding/mp4parse/capi.rs

but are the only symbols to show up there either way.
Another problem with Rust crash reports is discussed in bug 1271529 -- currently lots of crashes get conflated unhelpfully into a single "rust_panic" signature.
So bug 1268617 should have changed the state of affairs here significantly. Looking at the generated symbol files we seem to have sensible function names and source line info.
Although unfortunately bug 1272278 (manifesting as bug 1272738) means we're probably not seeing the benefit in actual crash reports yet, since all the Windows crash reports are probably broken.
Depends on: 1275424
Depends on: 1275780
So I only just now noticed bug 1291650, which is about crashes on 32-bit Linux on CPUs without SSE2, like:
https://crash-stats.mozilla.com/report/index/28af2d67-07ad-4d75-bba5-921c02160801

That crash report looks mostly usable, with the caveat that frame 0 is in a source file in libcore, so we don't have a useful link to the source. If we fixed that (partially covered by bug 1275424), that crash report would be no worse than equivalent C++ crash reports.

I poked around crash-stats to see if we had other reported crashes in mp4parse:
https://crash-stats.mozilla.com/search/?product=Firefox&signature=~mp4parse&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

There aren't very many, but there are a few. The most frequent (all of 11 crashes when I ran that query) are in mp4parse_read, all on Windows. These crash reports fall squarely into the "not very helpful" category, they all look something like:
https://crash-stats.mozilla.com/report/index/aec4c22b-f836-46b6-bb4d-22c762160830

We don't have any source line info for any of the frames, and any frames that aren't using C linkage are not even demangled. That's pretty poor.

Moving down the list we see a few OOM crashes in mp4parse_new, which all look roughly like:
https://crash-stats.mozilla.com/report/index/e80213b5-3c79-4069-8483-017042160831

These are also not a stellar example of a crash report, as there's no source line info for the mp4parse_new frame, and we appear to have mucked up the unwind there somehow. The "raw dump" tab tells me that the stackwalker actually found frame 3 (the mp4parse_new frame) by scanning, which would indicate that the frame *before* it was missing unwind info, which is odd.

After that we have a couple of signatures with only a single report each:
mp4parse_get_track_info: https://crash-stats.mozilla.com/report/index/b90e3581-156b-4ed7-89de-c55f22160826

This is on OS X, and has a useful stack but no source info in the top frame, so it'd be pretty hard to diagnose.

mp4parse_new: https://crash-stats.mozilla.com/report/index/083d0c90-b410-4a95-a9db-1a71f2160827

This is on Windows and actually looks nicer than any of the other crash reports here. Maybe because the top frame is actually in Rust code from m-c, and not something inlined or monomorphized from the standard library? In any event, it's got source info that links to VCS, so that's pretty great.
It's annoying to have to rely on bugs in our Rust code to track improvements here, so I wrote some patches on bug 1300152.
Related: https://github.com/rust-lang/rust/issues/36452

Rust's libstd crate (that ships with the compiler) doesn't include debug symbols, which means that any non-generic methods that we link from it won't have function names or source line info when linked into libxul.
(In reply to Jed Davis [:jld] {⏰UTC-6} from comment #1)
> I notice that the Rust frames are unwound via the frame pointer tactic,
> rather than CFI.  Could it just be that we're not building Rust code with -g?

We are now building with -g, which is good, but per above libstd is not, so that's a problem for unwinding. Rust apparently omits frame pointers on most archs:
https://github.com/rust-lang/rust/blob/c87ba3f1222ba20d491e8ed76a04977283280742/src/librustc_back/target/mod.rs#L267-L268
I think bug 1311458 should fix most of what ails us here. Looking at some random Win32/Win64 nightly symbols it looks like we have decent-looking function names for Gecko Rust code as well as source line info:
FUNC 3e820 1ad 0 read<&mut mp4parse::BMFFBox<mp4parse_capi::mp4parse_io>>
3e820 9 1491 1387

On Mac/Linux the symbols demangle slightly differently:
FUNC 4055d00 269 0 {{inlined-root}}::read<mp4parse::BMFFBox<mp4parse::BMFFBox<mp4parse_capi::mp4parse_io>>>
4055d00 17 397 7075

FUNC 3621670 210 0 {{inlined-root}}::read<mp4parse::BMFFBox<mp4parse::BMFFBox<mp4parse_capi::mp4parse_io>>>
3621670 17 397 6675

This is because rustc uses GCC-style C++ name mangling, and dump_syms uses `cxx_demangle` if it finds a mangled name, and so we get sort-of-C++-looking symbol names back. We could fix dump_syms to not do this (upstream Breakpad took a patch to handle Swift symbols that way: https://codereview.chromium.org/2147523005/ ) and then do the Rust demangling ourselves in symbolstore.py in order to get function names that match across platforms.

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #13)
> (In reply to Jed Davis [:jld] {⏰UTC-6} from comment #1)
> > I notice that the Rust frames are unwound via the frame pointer tactic,
> > rather than CFI.  Could it just be that we're not building Rust code with -g?
> 
> We are now building with -g, which is good, but per above libstd is not, so
> that's a problem for unwinding. Rust apparently omits frame pointers on most
> archs:
> https://github.com/rust-lang/rust/blob/
> c87ba3f1222ba20d491e8ed76a04977283280742/src/librustc_back/target/mod.
> rs#L267-L268

To follow up on that--apparently enabling debuginfo currently forces the use of frame pointers in rustc: https://github.com/rust-lang/rust/issues/36452#issuecomment-254678281 . If that ever gets fixed we'll be unhappy on Win32 unless someone fixes LLVM to support writing FPO data on Win32. Other platforms should be fine, as we'll use CFI for non-Windows and Win64 has its own unwind tables that LLVM does have support for.
Depends on: 1319298
I reproduced the crash from bug 1318943 with the latest nightly, which was built with Rust 1.14 (which has debuginfo in libstd), and the crash report looks a lot better:
https://crash-stats.mozilla.com/report/index/95603eb6-4f4d-49a1-9752-3ff1c2161129
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #13)
> (In reply to Jed Davis [:jld] {⏰UTC-6} from comment #1)
> > I notice that the Rust frames are unwound via the frame pointer tactic,
> > rather than CFI.  Could it just be that we're not building Rust code with -g?
> 
> We are now building with -g, which is good, but per above libstd is not, so
> that's a problem for unwinding. Rust apparently omits frame pointers on most
> archs:
> https://github.com/rust-lang/rust/blob/
> c87ba3f1222ba20d491e8ed76a04977283280742/src/librustc_back/target/mod.
> rs#L267-L268

To follow up on this point: if you look at the "raw dump" tab of the crash from comment 15, you can see that since bug 1311458 landed we have debug info to unwind through Rust's libstd.
Whiteboard: [stylo]
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #16)
> To follow up on this point: if you look at the "raw dump" tab of the crash
> from comment 15, you can see that since bug 1311458 landed we have debug
> info to unwind through Rust's libstd.

Looks adequate to me. Can this bug be marked as fixed?
Flags: needinfo?(ted)
Yes, I think this is sufficiently fixed as filed. We can move the still-open dependencies to a followup bug for improvements to what we currently have.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(ted)
Resolution: --- → FIXED
Filed bug 1348896 to track those.
You need to log in before you can comment on or make changes to this bug.