panic symbol is missing module name
Categories
(Toolkit :: Crash Reporting, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox-esr68 | --- | fixed |
firefox68 | --- | fixed |
firefox69 | --- | fixed |
firefox70 | --- | fixed |
People
(Reporter: willkg, Unassigned)
References
(Regression)
Details
(Keywords: regression)
I was looking into bug #1544246 and it's got these frames:
0 XUL GeckoCrash toolkit/xre/nsAppRunner.cpp:5093 context
1 XUL gkrust_shared::panic_hook toolkit/library/rust/shared/lib.rs:240 frame_pointer
2 XUL core::ops::function::Fn::call src/libcore/ops/function.rs:69 cfi
3 XUL rust_panic_with_hook src/libstd/panicking.rs:482 cfi
4 XUL continue_panic_fmt src/libstd/panicking.rs:385 cfi
5 XUL rust_begin_panic src/libstd/panicking.rs:312 cfi
6 XUL panic_fmt src/libcore/panicking.rs:85 cfi
7 XUL panic src/libcore/panicking.rs:49 cfi
...
The symbols from panicking.rs lack the crate/module/type bits. In other crashes, they were "core::panicking::rust_panic_with_hook", etc.
What changed here and how can we get back to symbols with crate/module/type bits?
Reporter | ||
Comment 1•5 years ago
|
||
The crash in bug #1544246 is from OSX. Looking at the signature report, there are a handful from OSX and a handful from Linux.
Examples:
OSX: https://crash-stats.mozilla.org/report/index/c0351dcd-a07b-4e84-a964-7bfe00190413#tab-details
Linux: https://crash-stats.mozilla.org/report/index/2b9f80f0-e8d6-43ee-a22c-3320b0190412
Here's the same problem on Windows from a crash from yesterday:
https://crash-stats.mozilla.org/report/index/b93c3882-eead-4ac8-b923-539410190414
That has the right signature because signature generation is using "core::panicking::panic" as a signature sentinel. Here's the top 7 frames:
0 xul.dll GeckoCrash toolkit/xre/nsAppRunner.cpp:5092 context
1 xul.dll static void gkrust_shared::panic_hook(struct core::panic::PanicInfo*) toolkit/library/rust/shared/lib.rs:240 cfi
2 xul.dll static void core::ops::function::Fn::call<fn(core::panic::PanicInfo*), (core::panic::PanicInfo*)>(**, struct core::panic::PanicInfo*) src/libcore/ops/function.rs:69 cfi
3 xul.dll static void std::panicking::rust_panic_with_hook() src/libstd/panicking.rs:482 cfi
4 xul.dll static void std::panicking::continue_panic_fmt() src/libstd/panicking.rs:385 cfi
5 xul.dll static void std::panicking::rust_begin_panic() src/libstd/panicking.rs:312 cfi
6 xul.dll static void core::panicking::panic_fmt() src/libcore/panicking.rs:85 cfi
7 xul.dll void core::panicking::panic() src/libcore/panicking.rs:49 cfi
Comment 2•5 years ago
|
||
OK, for reasons I do not understand yet, the Rust 1.34 update changed how the panic symbols are represented in symbol files. Before, we had:
4661480:FUNC 57380f0 35 0 std::panicking::begin_panic
4661513:FUNC 5738450 4e 0 std::panicking::begin_panic_fmt
4661518:FUNC 57384a0 6a 0 std::panicking::continue_panic_fmt
4661525:FUNC 5738510 6b2 0 std::panicking::rust_panic_with_hook
...
4663002:FUNC 5744970 17 0 std::panic::resume_unwind
4663005:FUNC 5744990 59 0 std::panicking::update_count_then_panic
4663317:FUNC 5747610 17 0 __rust_start_panic
4663320:FUNC 5747630 17 0 panic_abort::__rust_start_panic::abort
4663882:FUNC 574c4e0 75 0 core::panicking::panic_bounds_check
4663946:FUNC 574ca30 4e 0 core::panicking::panic_fmt
4664144:FUNC 574ded0 75 0 core::panicking::panic
and after, we have:
4734967:FUNC 573bed0 4e 0 begin_panic_fmt
4734972:FUNC 573bf20 6a 0 continue_panic_fmt
4734979:FUNC 573bf90 6ae 0 rust_panic_with_hook
...
4736318:FUNC 5748340 59 0 update_count_then_panic
4737158:FUNC 574fd40 6b 0 panic_bounds_check
4737188:FUNC 5750280 4e 0 panic_fmt
4737357:FUNC 5751510 75 0 panic
4746497:PUBLIC 573bb90 0 std::panicking::begin_panic::h4112fdbb402e7b9e
4746498:PUBLIC 573c700 0 rust_panic
4746535:PUBLIC 5748320 0 std::panic::resume_unwind::hae3c8b50f45750f6
4746541:PUBLIC 574ae70 0 __rust_start_panic
4746542:PUBLIC 574ae90 0 panic_abort::__rust_start_panic::abort::h02f918ce7ddf92c3
which is not quite a 1-to-1 match in terms of ordering, but you can see that, e.g. panic_bounds_check
results in a different name.
I'm going to see about filing a Rust bug, and maybe somebody will have bright ideas about what could have regressed this. I looked at the obvious src/libcore sources, and they haven't changed in any way that would trigger this.
Comment 3•5 years ago
|
||
Rust 1.33 gives this debug information:
<3><21579134>: Abbrev Number: 300 (DW_TAG_subprogram)
<21579136> DW_AT_low_pc : 0x574c4e0
<2157913a> DW_AT_high_pc : 0x75
<2157913e> DW_AT_frame_base : 1 byte block: 54 (DW_OP_reg4 (esp))
<21579140> DW_AT_linkage_name: (indirect string, offset: 0x9d83e5d): _ZN4core9panicking18panic_bounds_check17hea2fcbc4d6413f35E
<21579144> DW_AT_name : (indirect string, offset: 0x9d83e98): panic_bounds_check
<21579148> DW_AT_decl_file : 37
<21579149> DW_AT_decl_line : 55
<2157914a> DW_AT_external : 1
<2157914a> DW_AT_noreturn : 1
<4><2157914a>: Abbrev Number: 109 (DW_TAG_inlined_subroutine)
<2157914b> DW_AT_abstract_origin: <0x215757ca>
<2157914f> DW_AT_ranges : 0x37aa380
<21579153> DW_AT_call_file : 37
<21579154> DW_AT_call_line : 61
<4><21579155>: Abbrev Number: 0
Note the DW_AT_linkage_name
. Rust 1.34, in contrast:
<1><216f5dd5>: Abbrev Number: 293 (DW_TAG_subprogram)
<216f5dd7> DW_AT_low_pc : 0x5748740
<216f5ddb> DW_AT_high_pc : 0x6b
<216f5ddf> DW_AT_name : (indirect string, offset: 0x9db194a): panic_bounds_check
<2><216f5de3>: Abbrev Number: 107 (DW_TAG_inlined_subroutine)
<216f5de4> DW_AT_abstract_origin: <0x216f5db4>
<216f5de8> DW_AT_ranges : 0x38f1140
<216f5dec> DW_AT_call_file : 10
<216f5ded> DW_AT_call_line : 61
<2><216f5dee>: Abbrev Number: 0
Note the lack of DW_AT_linkage_name
. We use that hereabouts:
and then the result of that:
dmajor noticed that 1.34 transitioned libstd to Rust 2018, which changed how a bunch of things were imported into libstd:
https://github.com/rust-lang/rust/commit/206988751a040522cd7d5da50ca4082a2e65c265
Comment 4•5 years ago
|
||
I filed https://github.com/rust-lang/rust/issues/60020 for the Rust regression. I don't know what policy on handling something like this looks like.
Reporter | ||
Comment 5•5 years ago
|
||
I scry with my little eye that this issue is definitely rooted in the Rust update and isn't going to be fixed any time soon and even if it did, we're probably stuck with it for a while and will have to live with it.
I wonder if this is a problem with all rust symbols or just the panicky ones. I skimmed the proto signatures and stacks for crash reports in nightly channel with buildid >= 20190413214623 and only see the problem manifest with things from panicking.rs:
Is it possible we could band-aid-fix creation of the symbols files?
If not, should we add some kind of "add the module back" rule such that signature generation works?
If not, should we add rust symbols without the module parts to the various signature-generation lists? This could have serious consequences because we won't be able to tell xyz in one module from xyz in another.
Other ideas?
Is the problem that we're on longer matching this rule? https://github.com/mozilla-services/socorro/blob/master/socorro/signature/siglists/irrelevant_signature_re.txt#L91
If so then a band-aid could be to just add some known panic functions by name.
Reporter | ||
Comment 7•5 years ago
|
||
In comment #5, the last idea I had was to add the functions in question to the sentinels list without their module names which is what you're wondering about. However, that could have big consequences since it assumes that we'll never see an "xyz" without a module name that we wouldn't want to be a sentinel. I don't know what the likelihood of that would be or how to estimate it. I suspect it's unlikely for "panic", "panic_bounds_check", "panic_fmt", and "begin_panic", but I have no idea. I'd love to know about the viability of other options first.
Do you have the ability to scan the current db for collisions on these names? Perhaps that could give us some level of reassurance.
Reporter | ||
Comment 9•5 years ago
|
||
It's tricky enough that I'm not sure I would feel fully confident.
The data is in Elasticsearch. I think we can query the proto_signature to see if it has "panic" in it and then look at the resulting crash reports.
However, the situation only started happening 10 days ago and even if we could guarantee we haven't seen a collision of those names, I don't know what we can say about the future.
Socorro has no ability to do something like "use this symbol as a sentinel, but only if the build id is between numbers X and Y".
In Rust culture, is there a rule or requirement or cultural ick that crate authors can't reuse the names of panic functions?
Reporter | ||
Comment 10•5 years ago
•
|
||
I looked at crash reports with panic related functions in them but missing the module and there were 22 in the last month:
(Pretty sure that search gets me what I want. SuperSearch coupled with Elasticsearch regexes is tough.)
So maybe collisions with panicky functions aren't going to be a big deal.
Then I looked at 1000 crash reports from today that have a build id >= 20190413214623 which have a frame that's obviously from a Rust file that has a symbol that doesn't have a :: (if you have a better heuristic for honing in on the problem. I'm all ears).
That 1000 is just a sampling and probably not a representative sample. We get a lot of unhelpful crash reports. I didn't check to see if signatures were the same or not. I can do another pass if we think it's worth the time/effort.
Anyhow, 83 of the 1000 crash reports I looked at have symbols that are missing modules:
git:github.com/rust-lang/rust:src/libstd/panicking.rs:91856ed52c58aa5ba66a015354d1cc69e9779bdf rust_begin_panic
git:github.com/rust-lang/rust:src/libstd/sys/unix/thread.rs:91856ed52c58aa5ba66a015354d1cc69e9779bdf thread_start
git:github.com/rust-lang/rust:src/libcore/ptr.rs:91856ed52c58aa5ba66a015354d1cc69e9779bdf <name omitted>
hg:hg.mozilla.org/mozilla-central:xpcom/rust/gkrust_utils/src/lib.rs:b783cd5203ea589bb7505852e5108ed142d2d37a GkRustUtils_GenerateUUID
hg:hg.mozilla.org/mozilla-central:gfx/webrender_bindings/src/bindings.rs:b783cd5203ea589bb7505852e5108ed142d2d37a wr_renderer_render
git:github.com/rust-lang/rust:src/libcore/fmt/mod.rs:91856ed52c58aa5ba66a015354d1cc69e9779bdf write
hg:hg.mozilla.org/mozilla-central:servo/components/style/gecko/arc_types.rs:02b89c29412b6c1444fe32a4847e5261e2bb3d00 Servo_ComputedStyle_Release
git:github.com/rust-lang/rust:src/libstd/panicking.rs:91856ed52c58aa5ba66a015354d1cc69e9779bdf continue_panic_fmt
hg:hg.mozilla.org/mozilla-central:media/libcubeb/cubeb-pulse-rs/src/capi.rs:b783cd5203ea589bb7505852e5108ed142d2d37a pulse_rust_init
hg:hg.mozilla.org/mozilla-central:gfx/webrender_bindings/src/bindings.rs:0cf7ad6c1a23ea598cf72c00d06daeb2607ae039 wr_api_hit_test
git:github.com/rust-lang/rust:src/libcore/panicking.rs:91856ed52c58aa5ba66a015354d1cc69e9779bdf panic_fmt
git:github.com/rust-lang/rust:src/libstd/panicking.rs:91856ed52c58aa5ba66a015354d1cc69e9779bdf rust_panic_with_hook
git:github.com/rust-lang/rust:src/libstd/thread/mod.rs:91856ed52c58aa5ba66a015354d1cc69e9779bdf park
git:github.com/rust-lang/rust:src/libcore/panicking.rs:91856ed52c58aa5ba66a015354d1cc69e9779bdf panic
hg:hg.mozilla.org/mozilla-central:gfx/webrender_bindings/src/bindings.rs:2049f94666461e2eff7e67f026730a4932c98788 wr_renderer_flush_pipeline_info
git:github.com/rust-lang/rust:src/libstd/sys/unix/thread.rs:91856ed52c58aa5ba66a015354d1cc69e9779bdf new
hg:hg.mozilla.org/mozilla-central:servo/components/style/stylist.rs:72be82c6809e2cc187e5cffdff0c3e686c564a57 Servo_StyleSet_FlushStyleSheets
Pretty sure those symbols have all changed since we upgraded to Rust 1.34. Many of them are from rust-lang, but seems like some of them are from Firefox code.
Comment 11•5 years ago
|
||
All those from Firefox code are actually named that way, they are boundary functions between rust and C, and are extern "C", so they're not mangled or anything, and aren't meant to be.
Comment 12•5 years ago
|
||
I've just run dump_syms
on my local build (using rust 1.35) and the symbols look sane again, e.g.:
FUNC a08cbd0 2e 0 _ZN3std9panicking11begin_panic17h9faa7e103fb56e56E
PUBLIC a08cbd0 0 std::panicking::begin_panic::h9faa7e103fb56e56
Will are you still seeing this?
Updated•5 years ago
|
Updated•5 years ago
|
Reporter | ||
Comment 13•5 years ago
|
||
I looked at recent nightlies and I think it looks right now. For example bp-84fea11f-6b30-4f0b-b3e7-74a340190719 which is a crash for Firefox nightly with buildid 20190717215119. It has this in the stack:
{
"file": "git:github.com/rust-lang/rust:src/libstd/panicking.rs:a53f9df32fbb0b5f4382caaad8f1a46f36ea887c",
"frame": 4,
"function": "std::panicking::continue_panic_fmt",
"function_offset": "0x4d",
"line": 381,
"module": "libxul.so",
"module_offset": "0x61a187d",
"offset": "0x7f761225487d",
"trust": "cfi"
},
So, assuming I'm looking at an appropriate crash, I think this looks right now.
Comment 14•5 years ago
|
||
Thanks Will, let's mark this fixed.
Updated•5 years ago
|
Updated•3 years ago
|
Description
•