1474871 - Rust crash signatures are broken: Symbols aren't getting demangled

Reporter

Description

•

6 years ago

Nightly 63 x64 20180710222524 de_DE @ Debian Testing (KDE, Xorg, GTX 1060 using https://packages.debian.org/buster/nvidia-driver (390.67-2)) 

Please correct the bug title if needed as I have no insight in the magic happening here.

(Jan Andre Ikenmeyer [:darkspirit] from bug 1423311 comment 9)
> (In reply to Jesse Schwartzentruber (:truber) from bug 1423311 comment 0)
> > Created attachment 8934650 [details]
> > testcase.html
> 
> Crash reason changed:
> bp-7f92be4a-9163-4583-a990-3ab210180706
> > Moz2D replay problem
> 
> And it looks like rust crash signatures are broken. Now there's one signature for multiple crash reasons:
> https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozalloc_abort%20%7C%20abort%20%7C%20_ZN11panic_abort18__rust_start_panic5abort17hfb98714efe360e0fE%20%7C%20__rust_start_panic%20%7C%20_ZN3std9panicking20rust_panic_with_hook17h608586f043d70222E&date=%3E%3D2018-06-29T15%3A24%3A56.000Z&date=%3C2018-07-06T15%3A24%3A56.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports

(Kartikaya Gupta (email:kats@mozilla.com) from bug 1423311 comment 10)
> Looks like the symbols aren't getting demangled, and so the step that's supposed to filter out the rust panic boilerplate stack frames out of the signature isn't getting triggered.

(not currently active) Ted Mielczarek

Comment 1

•

6 years ago

This could be a regression from bug 1447116, which merged to central 10 days ago, or bug 1309172, which merged to central 15 days ago.

Darkspirit

Reporter

Comment 2

•

6 years ago

RUST_BACKTRACE=1 mozregression --launch 2018-06-25 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
> 0:38.62 INFO: thread 'WRRenderBackend#1' panicked at 'called `Option::unwrap()` on a `None` value', /checkout/src/libcore/option.rs:335:21
> 0:39.17 INFO: stack backtrace:
> 0:39.17 INFO:    0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
> 0:39.17 INFO:    1: std::sys_common::backtrace::print
> 0:39.17 INFO:    2: std::panicking::default_hook::{{closure}}
> 0:39.17 INFO:    3: std::panicking::default_hook
> 0:39.17 INFO:    4: std::panicking::rust_panic_with_hook
> 0:39.17 INFO:    5: std::panicking::begin_panic
> 0:39.17 INFO:    6: std::panicking::begin_panic_fmt
> 0:39.17 INFO:    7: rust_begin_unwind
> 0:39.17 INFO:    8: core::panicking::panic_fmt
> 0:39.17 INFO:    9: core::panicking::panic
> 0:39.17 INFO:   10: <unknown>
> 0:39.17 INFO: Redirecting call to abort() to mozalloc_abort

RUST_BACKTRACE=1 mozregression --launch 2018-07-10 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
> 0:42.76 INFO: thread 'WRWorker#3' panicked at 'Moz2D replay problem', gfx/webrender_bindings/src/moz2d_renderer.rs:453:21
> 0:42.96 INFO: stack backtrace:
> 0:42.96 INFO:    0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
> 0:42.96 INFO:    1: std::sys_common::backtrace::print
> 0:42.96 INFO:    2: std::panicking::default_hook::{{closure}}
> 0:42.96 INFO:    3: std::panicking::default_hook
> 0:42.96 INFO:    4: std::panicking::rust_panic_with_hook
> 0:42.96 INFO:    5: <unknown>
> 0:42.96 INFO: Redirecting call to abort() to mozalloc_abort

I assume the now missing core::panicking::panic_fmt is the relevant difference? (I have really no idea what I'm talking about.)

RUST_BACKTRACE=1 mozregression --good 2018-06-25 --bad 2018-07-10 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
> 12:05.55 INFO: Last good revision: 27f60c02583dca519ae203bcf34ac8133483f0e7
> 12:05.55 INFO: First bad revision: ba2b15dbe7d87b170a8f89bcdef6c739583b322c
> 12:05.55 INFO: Pushlog:
> https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=27f60c02583dca519ae203bcf34ac8133483f0e7&tochange=ba2b15dbe7d87b170a8f89bcdef6c739583b322c

> ba2b15dbe7d8	Martin Robinson — Bug 1470125 - Update WR bindings for changes in WR PR 2848. r=Gankro
> 720f0dd717a1	Kartikaya Gupta — Bug 1470125 - Update bindings for changes in WR PR 2849. r=Gankro
> afe2313f83ed	Kartikaya Gupta — Bug 1470125 - Update webrender to commit cdfaaeb5f74e416f39af1081c9a676c752d23896. r=Gankro

> WR @ 866a83f86ae129052c38d009da5587dfed90c144
RUST_BACKTRACE=1 mozregression --repo try --launch a66923a8f067835f563a0a80ff4aea48275a3acd -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
good (has panic_fmt; 'called `Option::unwrap()` on a `None` value')

> WR @ 27443178dae2596536224b34445fb480c3cbc079
RUST_BACKTRACE=1 mozregression --repo try --launch 92f0c1a966c975f4822d2fd01570c339a2e8e022 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
bad (no panic_fmt; 'Moz2D replay problem')

"Regression range": https://github.com/servo/webrender/compare/866a83f86ae129052c38d009da5587dfed90c144...27443178dae2596536224b34445fb480c3cbc079

There was no longer a core::panicking::panic_fmt when the crash reason changed.
I don't know if this finding is related to this bug. Please mark this comment as offtopic if it is not relevant.
It would be easier if I knew how to enable the crash reporter within mozregression or how to get the crash signature as it would be displayed on Socorro.

Mike Hommey [:glandium]

Assignee

Comment 3

•

6 years ago

rust 1.28 changed how panic is hooked up at the lower level, and the panic_fmt function doesn't exist anymore. If soccoro relies on that, well, that's now obviously broken. Symbols not being demangled is ... unexpected. For instance, _ZN11panic_abort18__rust_start_panic5abort17hfb98714efe360e0fE demangles perfectly fine to panic_abort::__rust_start_panic::abort with c++filt, so I don't know why those would not be demangled... except if the DWARF changed in a way that breakpad doesn't like, kind of like bug 1464537.

Mike Hommey [:glandium]

Assignee

Comment 4

•

6 years ago

The good news is that the demangling issue only affects linux builds (I checked symbols for android, windows and mac, they're fine)

Mike Hommey [:glandium]

Assignee

Comment 5

•

6 years ago

It's a regression from bug 1309172.

Assignee: nobody → mh+mozilla

Blocks: 1309172

Mike Hommey [:glandium]

Assignee

Comment 6

•

6 years ago

Hahaha the new version of breakpad added this: https://dxr.mozilla.org/mozilla-central/rev/085cdfb90903d4985f0de1dc7786522d9fb45596/toolkit/crashreporter/google-breakpad/src/common/language.cc#159-166

and the fallback is to use the mangled version, instead of trying to C++-demangle it, like it used to.

Mike Hommey [:glandium]

Assignee

Comment 7

•

6 years ago

Ironically, the change was from Ted: https://chromium-review.googlesource.com/c/402308/

Mike Hommey [:glandium]

Assignee

Comment 8

•

6 years ago

rust-demangle-capi fails to build. It's using rusty-cheddar to generate a header, which depends on syntex_syntax, which fails to build:
 0:23.23 error: to use a constant of type `codemap::Span` in a pattern, `codemap::Span` must be annotated with `#[derive(PartialEq, Eq)]`
 0:23.23    --> third_party/rust/syntex_syntax/src/errors/emitter.rs:100:18
 0:23.23     |
 0:23.23 100 |             Some(COMMAND_LINE_SP) => self.emit_(FileLine(COMMAND_LINE_SP), msg, code, lvl),
 0:23.23     |                  ^^^^^^^^^^^^^^^

rusty-cheddar is deprecated in favor of cbindgen, OTOH, that seems overkill to generate a header for two functions.

Mike Hommey [:glandium]

Assignee

Comment 9

•

6 years ago

Note, it bugs me that the problem doesn't happen on mac and android. Does that mean the dwarf info is not annotated as being rust there?

Darkspirit

Reporter

Comment 10

•

6 years ago

Don't know if related: WebRender crash signatures from Windows and Mac often, but not always, looked a bit different than from Linux. Examples: bug 1455597, bug 1455743.

Mike Hommey [:glandium]

Assignee

Comment 11

•

6 years ago

Actually, mac was affected to some degree. I guess Android probably too.

Mike Hommey [:glandium]

Assignee

Comment 12

•

6 years ago

(In reply to Mike Hommey [:glandium] from comment #3)
> rust 1.28 changed how panic is hooked up at the lower level, and the
> panic_fmt function doesn't exist anymore.

Actually, it does, but it's mangled: _ZN4core9panicking9panic_fmt17h545cb08e3b473cc5E

So there might actually be a separate problem in that unwinding can't unwind through it... if that's the case, that's bad.

Mike Hommey [:glandium]

Assignee

Comment 13

•

6 years ago

Crash reports like
https://crash-stats.mozilla.com/report/index/743446b3-396d-4adc-8628-442030180711

suggest we do unwind properly, but there are probably no frames left with panic_fmt for optimization reasons.

Comment hidden (mozreview-request)

The new version of breakpad imported in bug 1309172 doesn't demangle
rust symbols at all, contrary to before, where it tried to C++ demangle
them, which worked for many, although far from all. It however has
rust-demangle support as long as it's linked against a copy of the
rust-demangle-capi crate from https://github.com/luser/rust-demangle-capi/

This imports the code from the rust-demangle-capi crate but because of
some build system complications it's not taken as-is:
- it uses rusty-cheddar, which is deprecated, to generate a C header.
- rusty-cheddar depends on syntex_syntax, which now fails to build.
- rust-demangle-capi has crate-type staticlib, which can't be used
  as a dependency in a Cargo.toml. For that reason, we can't create
  a fake crate that depends on it to have it vendored.

Overall, it's only a few lines of rust, and the C header can be written
manually, so this is what we do here. The created crate is named in a way
specific to dump_syms.

The build system doesn't know how to figure out what system libraries
are required to link rust static libraries, although the rust compiler
has /some/ support to get the information, so we handle that manually.

Review commit: https://reviewboard.mozilla.org/r/256416/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/256416/

(not currently active) Ted Mielczarek

Comment 16

•

6 years ago

mozreview-review

Comment on attachment 8991516 [details]
Bug 1474871 - Link dump_syms against rustc-demangle.

https://reviewboard.mozilla.org/r/256416/#review263372

Thanks for finishing this! When I originally implemented that upstream I didn't get this part done becuase we didn't have host Rust compilation wired up, and we needed to update Breakpad to pick up those changes as well. Your reworking of the -capi crate I wrote is totally sensible, doing all that work just to generate two C function declarations doesn't make much sense.

Attachment #8991516 - Flags: review?(ted) → review+

Pulsebot

Comment 17

•

6 years ago

Pushed by tmielczarek@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7bf77a18b13e
Link dump_syms against rustc-demangle. r=ted

(not currently active) Ted Mielczarek

Updated

•

6 years ago

Depends on: 1475278

Andreea Pavel [:apavel]

Comment 18

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/7bf77a18b13e

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox63: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla63

Darkspirit

Reporter

Comment 19

•

6 years ago

Thank you! Now I get bp-0ca773e7-2b08-4d1a-8fd7-67b4e0180713.
Could the crash signature be made as meaningful as in the past again? (e.g. bp-76972dfe-f63c-4d5a-8d71-1a6a60180425)

Darkspirit

Reporter

Comment 20

•

6 years ago

Or is it correct like it is now?

Mike Hommey [:glandium]

Assignee

Comment 21

•

6 years ago

(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #19)
> Thank you! Now I get bp-0ca773e7-2b08-4d1a-8fd7-67b4e0180713.
> Could the crash signature be made as meaningful as in the past again? (e.g.
> bp-76972dfe-f63c-4d5a-8d71-1a6a60180425)

File a Soccoro bug for some filtering on the panic symbols, but I'll note that the signature *does* contain the relevant bit (<T>::execute), although the specific rayon type is masked.

Mike Hommey [:glandium]

Assignee

Comment 22

•

6 years ago

(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #20)
> Or is it correct like it is now?

It doesn't seem to be the same crash at all, so you can't really compare them.

Darkspirit

Reporter

Comment 23

•

6 years ago

Okay, thank you!

Darkspirit

Reporter

Updated

•

6 years ago

Updated

•

6 years ago

status-firefox61: --- → unaffected

status-firefox62: --- → unaffected

status-firefox-esr52: --- → unaffected

status-firefox-esr60: --- → unaffected