Breakpad Linux Dumper: support inlined functions
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox68 | --- | fixed |
People
(Reporter: jimb, Assigned: froydnj)
References
Details
Attachments
(6 files)
| Reporter | ||
Updated•16 years ago
|
Comment 1•16 years ago
|
||
| Reporter | ||
Comment 2•16 years ago
|
||
| Reporter | ||
Comment 3•16 years ago
|
||
Comment 4•16 years ago
|
||
Comment 5•15 years ago
|
||
| Reporter | ||
Comment 7•13 years ago
|
||
Comment 8•8 years ago
|
||
| Reporter | ||
Comment 9•8 years ago
|
||
| Reporter | ||
Comment 10•8 years ago
|
||
Comment 11•8 years ago
|
||
Comment 12•8 years ago
|
||
Comment 13•8 years ago
|
||
| Reporter | ||
Comment 14•8 years ago
|
||
| Reporter | ||
Comment 15•8 years ago
|
||
| Reporter | ||
Comment 16•8 years ago
|
||
| Assignee | ||
Comment 17•6 years ago
|
||
I am going to try and tackle this work this quarter. I don't know if we ever settled on a design for how to represent this information in the breakpad symbol file, though? Does it really matter that much if we keep compatibility with upstream at this point?
| Assignee | ||
Comment 18•6 years ago
|
||
Gabriele, does either part of this work (dump_syms + minidump_stackwalker) live in the part of breakpad that we have officially forked? If it doesn't, do we need to complete this work upstream? Does it matter how much we change the symbol format to accommodate inline frames?
Comment 19•6 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #18)
Gabriele, does either part of this work (dump_syms + minidump_stackwalker) live in the part of breakpad that we have officially forked? If it doesn't, do we need to complete this work upstream? Does it matter how much we change the symbol format to accommodate inline frames?
The changes would be in the part of breakpad that we haven't forked. That doesn't really matter in the sense that we can always fork the rest too. I would still propose the changes here for integration upstream because others might find them useful; however I wouldn't wait on upstream to accept them before doing them in mozilla-central.
| Assignee | ||
Comment 20•6 years ago
|
||
Changing the breakpad symbol format seems fraught with all kinda of problems. I think the simpler idea is to do the following:
- Parse all
DW_TAG_inlined_subroutinedirect children ofDW_TAG_subprogramDIEs, recording the extents for each and theDW_AT_call_{file,line}attributes. This step gives you the original (subject to compiler constraints) source line for inlined functions and their inlinees (?). If you squint at it correctly, this information is essentially.debug_lineinformation represented in a slightly different form. - We now have two sets of line information: the "precise" information from
.debug_line, telling you every detail of all inlined functions, and the "coarse" information for the first level of inlining performed by the compiler. - Before we process the precise line information, we're going to determine, for each "coarse" line, which bits of "precise" line information the coarse line overlaps. We're then going to use the coarse line information instead of the overlapped bits. For the example in comment 8, we previously had:
> [...]
> FILE 602 hg:hg.mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 b 690 9655
> f2829e 3 269 9655
> f282a1 9 757 9655
> f282aa 3 853 602
> [...]
and we'll now have, just focusing on the FUNC bits:
> [...]
> FILE 602 hg:hg.mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 17 852 602
> f282aa 3 853 602
> [...]
which should be slightly nicer? There's a diff -y of NSS here:
https://gist.github.com/froydnj/9405314a5492c40353e916c5cbfbbfcb
You can search for ParseRFC1485AVA or CERT_CreateAVAFromSECItem to get an idea of how it looks in practice. The symbol file is ~4% smaller, which is a nice side-effect.
I'm not sure how this looks on actual crashes yet; that step is getting investigated tomorrow. This scheme does throw away information, which is never great. For particularly complicated lines, you're just faced with determining which bits of the line actually triggered the crash, instead of knowing that the crash started from a call to some inlined function; I think this is an improvement, but I'm not entirely sure. It would be nice to have the equivalent of llvm-symbolizer -inlines, which prints inlined "frames", but I'm not sure how to do that without reworking Breakpad's symbol format.
Also, this work was done only for DWARF; I don't know if the idea would translate well to Windows.
| Assignee | ||
Comment 21•6 years ago
|
||
The first experiment has not gone well. I looked at https://crash-stats.mozilla.com/report/index/f8e84a6c-079a-410c-8bf0-3588f0190325, which is connected to bug 1521095. Collapsing line information where appropriate reduces the breakpad symbol file size by ~25% for libxul, which is a nice size win. Unfortunately, we're stymied by rustc not giving us enough debug information; we still get:
0 libxul.so!MOZ_Crash(char const*, int, char const*) [Assertions.h : 314 + 0x0]
1 libxul.so!GeckoCrash [nsAppRunner.cpp : 5093 + 0xa]
2 libxul.so!gkrust_shared::panic_hook [lib.rs : 240 + 0x9]
3 libxul.so!core::ops::function::Fn::call [function.rs : 78 + 0x5]
4 libxul.so!std::panicking::rust_panic_with_hook [panicking.rs : 495 + 0x6]
5 libxul.so!std::panicking::continue_panic_fmt [panicking.rs : 398 + 0x14]
6 libxul.so!rust_begin_unwind [panicking.rs : 325 + 0x5]
7 libxul.so!core::panicking::panic_fmt [panicking.rs : 95 + 0x5]
8 libxul.so!core::option::expect_failed [option.rs : 1008 + 0x11]
9 libxul.so!webrender::resource_cache::ResourceCache::get_cached_image + 0x211
...
Note that we were not able to get file and line number information for the get_cached_image frame, similar to what's happening today. That's because there's simply no line information for the crashing address. The lines in the symbol file look like:
FUNC 5697ba0 2ff 0 webrender::resource_cache::ResourceCache::get_cached_image
5697ba0 13 1520 4571
5697bb3 e8 1522 4571
5697c9b c 1522 4571
5697ca7 100 1522 4571 # covers to 5697da7
# crash happens here, but we have a hole from [5697da7,5697db3)
5697db3 14 1522 4571
5697dc7 83 1523 4571
5697e4a 29 1523 4571
5697e73 f 1524 4571
5697e82 1d 1523 4571
The closest line information that we get is:
which would be better than nothing, but it's not obvious how we get from that line to crashing in option::expect_failed.
To be clear, even if we implemented the more complete proposals described in the bug initially, we'd have the exact same problem. We'd need compiler changes and/or heuristics for extending line information to cover holes to fix cases like the ones in bug 1521095.
| Assignee | ||
Comment 22•6 years ago
|
||
The second experiment is more encouraging. For bug 1508714 and the crash https://crash-stats.mozilla.com/report/index/32351114-a0c4-4d40-a587-f7ef10181120, we now get:
...
7 libxul.so!core::panicking::panic_fmt [panicking.rs : 77 + 0x5]
8 libxul.so!core::option::expect_failed [option.rs : 1000 + 0x11]
9 libxul.so!<core::iter::Map<I, F> as core::iter::iterator::Iterator>::next [mod.rs : 1394 + 0xa6]
10 libxul.so!<webrender_bindings::moz2d_renderer::Moz2dBlobRasterizer as webrender_api::image::AsyncBlobImageRasterizer>::rasterize [moz2d_renderer.rs : 479 + 0x2f]
11 libxul.so!webrender::frame_builder::FrameBuilder::build [resource_cache.rs : 1471 + 0x6]
The moz2d_renderer.rs line in frame 10 points at:
The mod.rs for frame 9 is libcore/iter/mod.rs.
Frames 9 and 10 were previously symbolicated as:
9 libxul.so <core::iter::Map<I, F> as core::iter::iterator::Iterator>::next libcore/option.rs:312 cfi
10 libxul.so <webrender_bindings::moz2d_renderer::Moz2dBlobRasterizer as webrender_api::image::AsyncBlobImageRasterizer>::rasterize liballoc/vec.rs:1903 cfi
Which is nonsense: iterator functionality isn't defined in option.rs, and webrender_bindings:: code isn't defined in liballoc/vec.rs. So at least in this case, we'd improve on the status quo. Whether that's enough of an improvement to make crash debugging easier, I'm not sure.
| Assignee | ||
Comment 23•6 years ago
|
||
The DW_AT_call_file attributes that we eventually want to parse from
DW_TAG_inlined_subroutine DIEs refer to the file name table stored in
the .debug_line section. To resolve those DW_AT_call_file attributes,
we need access to that table after parsing of the appropriate
.debug_line bits is done. This patch adds support for extracting that
information from the .debug_line parsing process.
| Assignee | ||
Comment 24•6 years ago
|
||
We record the file and line that these subroutines were inlined from.
We'll use that information to provide more coarse-grained line
information in the next patch.
Depends on D25469
| Assignee | ||
Comment 25•6 years ago
|
||
Depends on D25470
| Assignee | ||
Comment 26•6 years ago
|
||
DW_TAG_subprogram DIEs sometimes have child DW_TAG_lexical_block DIEs
which in turn contain child DW_TAG_inlined_subroutine DIEs that we woud
like to look at. If we skip the DW_TAG_inlined_subroutine DIEs, we miss
important information. We therefore need to look through the
DW_TAG_lexical_block DIEs to find the DIEs that we are interested in.
Depends on D25471
| Assignee | ||
Comment 27•6 years ago
|
||
After replacing precise line information from .debug_line with coarse
line information from DW_AT_call_{file,line}, it's very likely that
adjacent line records actually refer to identical file and line
numbers. Such adjacent records are not really useful and take up more
space than they should in the symbol file. We might as well merge them
and save ourselves some space.
Depends on D25472
Comment 28•6 years ago
|
||
This is great stuff Nathan, thanks for dealing with this. From my POV this is all looking good but I don't know if we'll be able to upstream it mostly because upstream might have its own opinion on how to do this. Still this is going to be a huge improvement in the quality of the stacks we get.
| Assignee | ||
Comment 29•6 years ago
|
||
emilio and kats were kind enough to provide some other examples of crash reports with bad stacks. Consider https://crash-stats.mozilla.com/report/index/8f912656-156f-4c1b-8bd7-571850190329 where we start with:
0 libxul.so selectors::matching::matches_complex_selector_internal servo/components/style/gecko_bindings/sugar/ns_t_array.rs:17 context
1 libxul.so style::dom_apis::query_selector_slow servo/components/selectors/matching.rs:342 cfi
The first frame is a little weird: we're said to be in the selectors crate, but the source file is bits for nsTArray? The second line is strange for much the same reason.
Reprocessing the crash with new symbol information gives:
0 libxul.so!selectors::matching::matches_complex_selector_internal [matching.rs : 488 + 0x292]
1 libxul.so!style::dom_apis::query_selector_slow [dom_apis.rs : 573 + 0x170]
which corresponds to
https://hg.mozilla.org/releases/mozilla-beta/annotate/a89261364728dbfe37351e0de812e906c436a5d4/servo/components/selectors/matching.rs#l488
https://hg.mozilla.org/releases/mozilla-beta/annotate/a89261364728dbfe37351e0de812e906c436a5d4/servo/components/style/dom_apis.rs#l573
Which gives us at least better pointers to where we were on the call stack, even if it's not immediately obvious how we get into nsTArray bits from that matches_complex_selector_internal line.
Or https://crash-stats.mozilla.com/report/index/4145fdb1-7a34-42d1-a7c9-8b3df0190330, where we have some nonsense frames in the middle of the call stack:
9 libxul.so webrender::renderer::TextureResolver::bind src/libcore/option.rs:322 cfi
10 libxul.so webrender::renderer::Renderer::draw_instanced_batch gfx/wr/webrender/src/renderer.rs:3398 cfi
11 libxul.so webrender::renderer::Renderer::draw_color_target::{{closure}} gfx/wr/webrender/src/renderer.rs:3900 cfi
12 libxul.so webrender::renderer::Renderer::draw_color_target gfx/wr/webrender/src/renderer.rs:3734 cfi
TextureResolver::bind doesn't live in libcore.
Reprocessing the crash with new symbols, we get:
9 libxul.so!webrender::renderer::TextureResolver::bind [renderer.rs : 1053 + 0x8a]
10 libxul.so!webrender::renderer::Renderer::draw_instanced_batch [renderer.rs : 3398 + 0x10]
11 libxul.so!webrender::renderer::Renderer::draw_color_target::{{closure}} [renderer.rs : 3900 + 0x5]
12 libxul.so!webrender::renderer::Renderer::draw_color_target [renderer.rs : 3734 + 0x15]
and that tells us we're crashing inside TextureResolver::bind at:
which makes a lot more sense.
Comment 30•6 years ago
|
||
Comment 31•6 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/578c94538897
https://hg.mozilla.org/mozilla-central/rev/589e276c75fa
https://hg.mozilla.org/mozilla-central/rev/b3e5b74ed19f
https://hg.mozilla.org/mozilla-central/rev/d048bcf083e5
https://hg.mozilla.org/mozilla-central/rev/17040bb20e25
Updated•6 years ago
|
Description
•