Closed Bug 1474871 Opened 2 years ago Closed 2 years ago

Rust crash signatures are broken: Symbols aren't getting demangled

Categories

(Toolkit :: Crash Reporting, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox61 --- unaffected
firefox62 --- unaffected
firefox63 --- fixed

People

(Reporter: darkspirit, Assigned: glandium)

References

Details

(Keywords: nightly-community, regression)

Attachments

(1 file)

Nightly 63 x64 20180710222524 de_DE @ Debian Testing (KDE, Xorg, GTX 1060 using https://packages.debian.org/buster/nvidia-driver (390.67-2)) 

Please correct the bug title if needed as I have no insight in the magic happening here.

(Jan Andre Ikenmeyer [:darkspirit] from bug 1423311 comment 9)
> (In reply to Jesse Schwartzentruber (:truber) from bug 1423311 comment 0)
> > Created attachment 8934650 [details]
> > testcase.html
> 
> Crash reason changed:
> bp-7f92be4a-9163-4583-a990-3ab210180706
> > Moz2D replay problem
> 
> And it looks like rust crash signatures are broken. Now there's one signature for multiple crash reasons:
> https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozalloc_abort%20%7C%20abort%20%7C%20_ZN11panic_abort18__rust_start_panic5abort17hfb98714efe360e0fE%20%7C%20__rust_start_panic%20%7C%20_ZN3std9panicking20rust_panic_with_hook17h608586f043d70222E&date=%3E%3D2018-06-29T15%3A24%3A56.000Z&date=%3C2018-07-06T15%3A24%3A56.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports

(Kartikaya Gupta (email:kats@mozilla.com) from bug 1423311 comment 10)
> Looks like the symbols aren't getting demangled, and so the step that's supposed to filter out the rust panic boilerplate stack frames out of the signature isn't getting triggered.
This could be a regression from bug 1447116, which merged to central 10 days ago, or bug 1309172, which merged to central 15 days ago.
RUST_BACKTRACE=1 mozregression --launch 2018-06-25 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
> 0:38.62 INFO: thread 'WRRenderBackend#1' panicked at 'called `Option::unwrap()` on a `None` value', /checkout/src/libcore/option.rs:335:21
> 0:39.17 INFO: stack backtrace:
> 0:39.17 INFO:    0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
> 0:39.17 INFO:    1: std::sys_common::backtrace::print
> 0:39.17 INFO:    2: std::panicking::default_hook::{{closure}}
> 0:39.17 INFO:    3: std::panicking::default_hook
> 0:39.17 INFO:    4: std::panicking::rust_panic_with_hook
> 0:39.17 INFO:    5: std::panicking::begin_panic
> 0:39.17 INFO:    6: std::panicking::begin_panic_fmt
> 0:39.17 INFO:    7: rust_begin_unwind
> 0:39.17 INFO:    8: core::panicking::panic_fmt
> 0:39.17 INFO:    9: core::panicking::panic
> 0:39.17 INFO:   10: <unknown>
> 0:39.17 INFO: Redirecting call to abort() to mozalloc_abort

RUST_BACKTRACE=1 mozregression --launch 2018-07-10 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
> 0:42.76 INFO: thread 'WRWorker#3' panicked at 'Moz2D replay problem', gfx/webrender_bindings/src/moz2d_renderer.rs:453:21
> 0:42.96 INFO: stack backtrace:
> 0:42.96 INFO:    0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
> 0:42.96 INFO:    1: std::sys_common::backtrace::print
> 0:42.96 INFO:    2: std::panicking::default_hook::{{closure}}
> 0:42.96 INFO:    3: std::panicking::default_hook
> 0:42.96 INFO:    4: std::panicking::rust_panic_with_hook
> 0:42.96 INFO:    5: <unknown>
> 0:42.96 INFO: Redirecting call to abort() to mozalloc_abort

I assume the now missing core::panicking::panic_fmt is the relevant difference? (I have really no idea what I'm talking about.)

RUST_BACKTRACE=1 mozregression --good 2018-06-25 --bad 2018-07-10 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
> 12:05.55 INFO: Last good revision: 27f60c02583dca519ae203bcf34ac8133483f0e7
> 12:05.55 INFO: First bad revision: ba2b15dbe7d87b170a8f89bcdef6c739583b322c
> 12:05.55 INFO: Pushlog:
> https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=27f60c02583dca519ae203bcf34ac8133483f0e7&tochange=ba2b15dbe7d87b170a8f89bcdef6c739583b322c

> ba2b15dbe7d8	Martin Robinson — Bug 1470125 - Update WR bindings for changes in WR PR 2848. r=Gankro
> 720f0dd717a1	Kartikaya Gupta — Bug 1470125 - Update bindings for changes in WR PR 2849. r=Gankro
> afe2313f83ed	Kartikaya Gupta — Bug 1470125 - Update webrender to commit cdfaaeb5f74e416f39af1081c9a676c752d23896. r=Gankro

> WR @ 866a83f86ae129052c38d009da5587dfed90c144
RUST_BACKTRACE=1 mozregression --repo try --launch a66923a8f067835f563a0a80ff4aea48275a3acd -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
good (has panic_fmt; 'called `Option::unwrap()` on a `None` value')

> WR @ 27443178dae2596536224b34445fb480c3cbc079
RUST_BACKTRACE=1 mozregression --repo try --launch 92f0c1a966c975f4822d2fd01570c339a2e8e022 -B debug --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://bugzilla.mozilla.org/attachment.cgi?id=8934650'
bad (no panic_fmt; 'Moz2D replay problem')

"Regression range": https://github.com/servo/webrender/compare/866a83f86ae129052c38d009da5587dfed90c144...27443178dae2596536224b34445fb480c3cbc079

There was no longer a core::panicking::panic_fmt when the crash reason changed.
I don't know if this finding is related to this bug. Please mark this comment as offtopic if it is not relevant.
It would be easier if I knew how to enable the crash reporter within mozregression or how to get the crash signature as it would be displayed on Socorro.
rust 1.28 changed how panic is hooked up at the lower level, and the panic_fmt function doesn't exist anymore. If soccoro relies on that, well, that's now obviously broken. Symbols not being demangled is ... unexpected. For instance, _ZN11panic_abort18__rust_start_panic5abort17hfb98714efe360e0fE demangles perfectly fine to panic_abort::__rust_start_panic::abort with c++filt, so I don't know why those would not be demangled... except if the DWARF changed in a way that breakpad doesn't like, kind of like bug 1464537.
The good news is that the demangling issue only affects linux builds (I checked symbols for android, windows and mac, they're fine)
It's a regression from bug 1309172.
Assignee: nobody → mh+mozilla
Blocks: 1309172
Hahaha the new version of breakpad added this: https://dxr.mozilla.org/mozilla-central/rev/085cdfb90903d4985f0de1dc7786522d9fb45596/toolkit/crashreporter/google-breakpad/src/common/language.cc#159-166

and the fallback is to use the mangled version, instead of trying to C++-demangle it, like it used to.
Ironically, the change was from Ted: https://chromium-review.googlesource.com/c/402308/
rust-demangle-capi fails to build. It's using rusty-cheddar to generate a header, which depends on syntex_syntax, which fails to build:
 0:23.23 error: to use a constant of type `codemap::Span` in a pattern, `codemap::Span` must be annotated with `#[derive(PartialEq, Eq)]`
 0:23.23    --> third_party/rust/syntex_syntax/src/errors/emitter.rs:100:18
 0:23.23     |
 0:23.23 100 |             Some(COMMAND_LINE_SP) => self.emit_(FileLine(COMMAND_LINE_SP), msg, code, lvl),
 0:23.23     |                  ^^^^^^^^^^^^^^^

rusty-cheddar is deprecated in favor of cbindgen, OTOH, that seems overkill to generate a header for two functions.
Note, it bugs me that the problem doesn't happen on mac and android. Does that mean the dwarf info is not annotated as being rust there?
Don't know if related: WebRender crash signatures from Windows and Mac often, but not always, looked a bit different than from Linux. Examples: bug 1455597, bug 1455743.
Actually, mac was affected to some degree. I guess Android probably too.
(In reply to Mike Hommey [:glandium] from comment #3)
> rust 1.28 changed how panic is hooked up at the lower level, and the
> panic_fmt function doesn't exist anymore.

Actually, it does, but it's mangled: _ZN4core9panicking9panic_fmt17h545cb08e3b473cc5E

So there might actually be a separate problem in that unwinding can't unwind through it... if that's the case, that's bad.
Crash reports like
https://crash-stats.mozilla.com/report/index/743446b3-396d-4adc-8628-442030180711

suggest we do unwind properly, but there are probably no frames left with panic_fmt for optimization reasons.
Duplicate of this bug: 1311498
Comment on attachment 8991516 [details]
Bug 1474871 - Link dump_syms against rustc-demangle.

https://reviewboard.mozilla.org/r/256416/#review263372

Thanks for finishing this! When I originally implemented that upstream I didn't get this part done becuase we didn't have host Rust compilation wired up, and we needed to update Breakpad to pick up those changes as well. Your reworking of the -capi crate I wrote is totally sensible, doing all that work just to generate two C function declarations doesn't make much sense.
Attachment #8991516 - Flags: review?(ted) → review+
Pushed by tmielczarek@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7bf77a18b13e
Link dump_syms against rustc-demangle. r=ted
https://hg.mozilla.org/mozilla-central/rev/7bf77a18b13e
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Thank you! Now I get bp-0ca773e7-2b08-4d1a-8fd7-67b4e0180713.
Could the crash signature be made as meaningful as in the past again? (e.g. bp-76972dfe-f63c-4d5a-8d71-1a6a60180425)
Or is it correct like it is now?
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #19)
> Thank you! Now I get bp-0ca773e7-2b08-4d1a-8fd7-67b4e0180713.
> Could the crash signature be made as meaningful as in the past again? (e.g.
> bp-76972dfe-f63c-4d5a-8d71-1a6a60180425)

File a Soccoro bug for some filtering on the panic symbols, but I'll note that the signature *does* contain the relevant bit (<T>::execute), although the specific rayon type is masked.
(In reply to Jan Andre Ikenmeyer [:darkspirit] from comment #20)
> Or is it correct like it is now?

It doesn't seem to be the same crash at all, so you can't really compare them.
Okay, thank you!
See Also: → 1475820
You need to log in before you can comment on or make changes to this bug.