Breakpad Linux Dumper: support inlined functions

RESOLVED FIXED in Firefox 68

Status

()

defect
RESOLVED FIXED
10 years ago
2 months ago

People

(Reporter: jimb, Assigned: froydnj)

Tracking

(Blocks 1 bug)

unspecified
mozilla68
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox68 fixed)

Details

Attachments

(6 attachments)

Reporter

Description

10 years ago
At the moment, Breakpad's support for dumping DWARF on Linux does not support inlined functions.

This may require a slight modification to the Breakpad symbol file format, which has no special support for inlined functions. In particular, the processor assumes that all FUNC records cover disjoint address ranges, so if f has been inlined into g, you can't say:

    FUNC 1000 50 0 g
    FUNC 1020 10 0 f

to say that g covers the 0x50 bytes starting at 0x1000, but within that range the 0x10 bytes at 0x1020 are an inlined instance of f. Breakpad instead requires you to say:

    FUNC 1000 20 0 g
    FUNC 1020 10 0 f
    FUNC 1030 20 0 g

That is, you have to split g into two "functions".

Which is baloney, of course. Nested FUNC records like the above should be permitted, and the processor should choose the innermost FUNC covering the address in question.
Reporter

Updated

10 years ago
Assignee: nobody → jim
I think on other platforms you just get data that looks like:
FILE 1 foo.c
FILE 2 foo.h
FUNC 1000 50 0 g
1000 4 1 1
1004 4 1 2
1008 4 3 1

It's not the worst thing ever, since you generally get the right source line, but it can be confusing.

Implementing this as described would require some changes to the processor's data structures, because it currently uses a RangeMap to store the function info:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/basic_source_line_resolver.cc#171
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/range_map.h
and RangeMap doesn't permit overlapping address ranges.
You could switch to ContainedRangeMap, which allows for fully contained child address ranges:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/contained_range_map.h
That'd probably have a negative perf impact, but I guess there's not much you can do about that.
Reporter

Comment 2

10 years ago
Yeah, I just noticed that, even though functioninfo.cc does recognize DW_TAG_inlined_subroutine, it doesn't seem to actually output an entry for the inlined function.

I'm really glad to hear about ContainedRangeMap.  It seems like, in the absence of inlined functions, that degenerates into an ordinary map, so at least on first principles I wouldn't expect the performance hit would be too much.

The thing is, if we just list the inlined function *instead* of the function it was inlined into, that's pretty confusing, because the caller is something unconnected.
Reporter

Comment 3

10 years ago
However, since I see now that other platforms don't support inlined functions, I don't think this should block 517824 any more.
No longer blocks: 517824
(In reply to comment #2)
> The thing is, if we just list the inlined function *instead* of the function it
> was inlined into, that's pretty confusing, because the caller is something
> unconnected.

No, it's slightly more confusing than that, because the function name is the containing function, but the source file+line are that of the inlined function.
On Windows, we appear to completely skip inlined code. See bug 492675 comment 39 for an example. AddImportantRules is inlined into nsStyleSet::FileRules, and the breakpad output for nsStyleSet::FileRules just has a line like:
101831 88 573 42076

which encompasses the entire inlined function, but represents it as a single line in the containing function.
Reporter

Updated

9 years ago
Duplicate of this bug: 563776
Reporter

Comment 7

7 years ago
I'm not actively working on this, and I don't have plans to in the near future, so I'm de-assigning myself to it.
Assignee: jimb → nobody
I'd like to propose a different format change to handle this. I don't think we should have overlapping FUNC ranges for inlined functions, because that would probably lead to a lot of repetition of the function names of inlined functions. I'd keep the FUNC lines the way they are and only have them for the outer functions.

We can have a table at the beginning of the symbol dump that lists the function names of all functions that get inlined anywhere, similar to the FILE table:
INLINE 0 SomeFunction
INLINE 1 SomeOtherFunction

And then we can refer to those inlined functions using the pattern
<address> <line_size> <line_number> <file_number> <inline_number>
which is very similar to the existing pattern
<address> <line_size> <line_number> <file_number>
, just with an extra field at the end.

And then we can allow these ranges to overlap / nest.

Here's an example:

dom/base/nsAttrAndChildArray.cpp:
> 846 bool
> 847 nsAttrAndChildArray::GrowBy(uint32_t aGrowSize)
> 848 {
> 849   CheckedUint32 size = 0;
> 850   if (mImpl) {
> 851     size += mImpl->mBufferSize;
> 852     size += NS_IMPL_EXTRA_SIZE;
> 853     if (!size.isValid()) {
> 854       return false;
> 855     }
> 856   }

mfbt/CheckedInt.h:
> 256 IsAddValid(T aX, T aY)
> 257 {
> [...]
> 267  return IsSigned<T>::value
> 268          ? HasSignBit(BinaryComplement(T((result ^ aX) & (result ^ aY))))
> 269          : BinaryComplement(aX) >= aY;
> 270 }
> [...]
> 689
> 690 MOZ_CHECKEDINT_BASIC_BINARY_OPERATOR(Add, +)
> 691 MOZ_CHECKEDINT_BASIC_BINARY_OPERATOR(Sub, -)
> [...]
> 756
> 757 MOZ_CHECKEDINT_CONVENIENCE_BINARY_OPERATORS(+, +=)
> 758 MOZ_CHECKEDINT_CONVENIENCE_BINARY_OPERATORS(*, *=)
> 759 MOZ_CHECKEDINT_CONVENIENCE_BINARY_OPERATORS(-, -=)

Old:
> [...]
> FILE 602 hg:hg.mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 b 690 9655
> f2829e 3 269 9655
> f282a1 9 757 9655
> f282aa 3 853 602
> [...]

New:
> [...]
> FILE 602 mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> INLINE 482 CheckedUint32::operator+=
> INLINE 483 CheckedUint32::operator+
> INLINE 484 IsAddValid
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 17 852 602
> f28293 17 757 9655 482
> f28293 e 690 9655 483
> f2829e 3 269 9655 484
> f282aa 3 853 602
> [...]

or as ascii art:

>                   |---3---|         IsAddValid
>           |-------e-------|         CheckedUint32::operator+
>           |----------17-----------| CheckedUint32::operator+=
>   |---4---|----------17-----------| nsAttrAndChildArray::GrowBy
>   |---4---|---b---|---3---|---9---|
> f2828f  f28293  f2829e  f282a1  f282aa

So we still have the existing
<address> <line_size> <line_number> <file_number>
pattern, which always refers to the outer function (as defined by the current FUNC section), and ranges of this type are non-overlapping.

Only the new
<address> <line_size> <line_number> <file_number> <inline_number>
ranges overlap, with each other and with the existing ranges.

This gives us easy backwards compatibility for consumers that ignore the new pattern. They will simply assign all addresses within an outer function to the file and lines of that outer function.

And for every "stack frame", we now have all three pieces of information: function name, file, and line number.
Reporter

Comment 9

2 years ago
I like the general idea very much: commoning out the names in their own table, and then citing that table from records whose ranges can overlap.

From looking at our breakpad reader code[1], it seems that the parser will complain about a source line record with an extra field just as much as it would complain about an unrecognized record type. So I don't think we get backwards compatibility. So I think we might as well just invent a new record type ("INLINED"?) for the records showing inlined function calls.

[1]: http://searchfox.org/mozilla-central/rev/3a3af33f513071ea829debdfbc628caebcdf6996/toolkit/crashreporter/google-breakpad/src/processor/basic_source_line_resolver.cc#534-536
Reporter

Comment 10

2 years ago
So the parser would manage the stack by first considering the start address of the new line record, and popping inlined frames from the stack until it finds a frame that still covers the new start address; and then pushing an inlined frame when it sees one of these new records.

If we have to adapt consumers anyway, then we could have plain line records implicitly belong to the function of the youngest frame, and avoid repeating the function number on every line record within an inlined call.
(In reply to Jim Blandy :jimb from comment #10)
> So the parser would manage the stack by first considering the start address
> of the new line record, and popping inlined frames from the stack until it
> finds a frame that still covers the new start address; and then pushing an
> inlined frame when it sees one of these new records.

That's exactly correct.

> If we have to adapt consumers anyway, then we could have plain line records
> implicitly belong to the function of the youngest frame, and avoid repeating
> the function number on every line record within an inlined call.

That's an interesting idea. This would help if several consecutive lines belong to the same inlined function. So effectively you'd only have to mention the <inline_number> if the current line in the dump describes a different function than the previous line.
But you'd also have to have inline_numbers for the outer FUNC section itself, so that you have a number you can mention once you've left all inlined functions and are back at the outer function. So the INLINE table at the start of the dump might grow a little. It would be nice to find out if this suggestion results in an overall size saving.

In the example I gave in comment 8, every <inline_number> line refers to a different function than the previous line, so it wouldn't save us anything there. And we have an additional annotation in the line that takes us back to the outer function.
But this example is probably not representative.

> [...]
> FILE 602 mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> INLINE 481 nsAttrAndChildArray::GrowBy(unsigned int)
(this line is new ^)
> INLINE 482 CheckedUint32::operator+=
> INLINE 483 CheckedUint32::operator+
> INLINE 484 IsAddValid
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 17 852 602
> f28293 17 757 9655 482
> f28293 e 690 9655 483
> f2829e 3 269 9655 484
> f282aa 3 853 602 481
(and this line has ^^^)
> [...]
What does the DWARF look like for an inlined function? (I know I've seen it before but I can't recall offhand.)
It uses TAG_inlined_subroutine. I'm attaching the output of
$ dwarfdump --lookup 0xdbb5a8 XUL.dSYM
Reporter

Comment 14

2 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #12)
> What does the DWARF look like for an inlined function? (I know I've seen it
> before but I can't recall offhand.)

I have a truly beautiful explanation of this, which this multi-gigabyte bug database is unfortunately too small to contain.

(It's very hairy; see the DWARF spec)
Reporter

Comment 15

2 years ago
> But you'd also have to have inline_numbers for the outer FUNC section itself, so that you have a number you can mention once you've left all inlined functions and are back at the outer function. So the INLINE table at the start of the dump might grow a little. It would be nice to find out if this suggestion results in an overall size saving.

I'm not sure I understand this. If we're managing the inlined instance stack as described in comment 10, it seems to me that we always know when a given line entry is not covered by any inlined instance. The line entries covered by the outer FUNC should just be those for which the stack is empty. No?
Reporter

Comment 16

2 years ago
(In reply to Jim Blandy :jimb from comment #14)
> I have a truly beautiful explanation of this, which this multi-gigabyte bug
> database is unfortunately too small to contain.

Less flippantly:

A DW_TAG_subprogram DIE can have a DW_AT_inline attribute that says whether it was declared inline and whether it was actually inlined. If it was inlined, the DW_TAG_subprogram and its children are called an "abstract instance tree", and aren't supposed to mention any attributes that vary from one inlined instance to another, like code addresses.

When you have an inlined callee, the calling function should have a DW_TAG_inlined_subroutine child whose DW_AT_abstract_origin attribute points to the DW_TAG_subroutine at the root of the abstract instance tree. The DW_TAG_inlined_subroutine DIE then supplies all the information specific to this inlined instance that was missing from the abstract instance tree. Each DW_TAG_inlined_subroutine DIE is called a "concrete inlined instance".

When you have a non-inlined instance of a function with a DW_AT_inline attribute, that's called a "concrete out-of-line instance". It's like a concrete inlined instance - it has a DW_AT_abstract_origin attribute, and supplements that with information specific to this out-of-line instance - but it uses DW_TAG_subroutine tag instead of DW_TAG_inlined_subroutine. These are supposed to appear as siblings of the abstract instance, but could appear elsewhere.

For example:

$ cat inline.c
cat: inline.c: No such file or directory
$ cat inline.cc
inline int f(int x) { return x*x; }

int g(int x) { return f(x) * f(x); }
$ g++ -g -O2 -c inline.cc
$ objdump -dr inline.o

inline.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_Z1gi>:
   0:	89 f8                	mov    %edi,%eax
   2:	0f af c7             	imul   %edi,%eax
   5:	0f af c0             	imul   %eax,%eax
   8:	c3                   	retq   
$ readelf -wi inline.o
Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length:        0x99 (32-bit)
   Version:       4
   Abbrev Offset: 0x0
   Pointer Size:  8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0x26): GNU C++14 6.4.1 20170727 (Red Hat 6.4.1-1) -mtune=generic -march=x86-64 -g -O2
    <10>   DW_AT_language    : 4	(C++)
    <11>   DW_AT_name        : (indirect string, offset: 0x16): inline.cc
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x6): /home/jimb/play
    <19>   DW_AT_low_pc      : 0x0
    <21>   DW_AT_high_pc     : 0x9
    <29>   DW_AT_stmt_list   : 0x0
 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2e>   DW_AT_external    : 1
    <2e>   DW_AT_name        : g
    <30>   DW_AT_decl_file   : 1
    <31>   DW_AT_decl_line   : 3
    <32>   DW_AT_linkage_name: (indirect string, offset: 0x0): _Z1gi
    <36>   DW_AT_type        : <0x7d>
    <3a>   DW_AT_low_pc      : 0x0
    <42>   DW_AT_high_pc     : 0x9
    <4a>   DW_AT_frame_base  : 1 byte block: 9c 	(DW_OP_call_frame_cfa)
    <4c>   DW_AT_GNU_all_call_sites: 1
    <4c>   DW_AT_sibling     : <0x7d>
 <2><50>: Abbrev Number: 3 (DW_TAG_formal_parameter)
    <51>   DW_AT_name        : x
    <53>   DW_AT_decl_file   : 1
    <54>   DW_AT_decl_line   : 3
    <55>   DW_AT_type        : <0x7d>
    <59>   DW_AT_location    : 1 byte block: 55 	(DW_OP_reg5 (rdi))
 <2><5b>: Abbrev Number: 4 (DW_TAG_inlined_subroutine)
    <5c>   DW_AT_abstract_origin: <0x84>
    <60>   DW_AT_low_pc      : 0x2
    <68>   DW_AT_high_pc     : 0x3
    <70>   DW_AT_call_file   : 1
    <71>   DW_AT_call_line   : 3
 <3><72>: Abbrev Number: 5 (DW_TAG_formal_parameter)
    <73>   DW_AT_abstract_origin: <0x92>
    <77>   DW_AT_location    : 0x0 (location list)
 <3><7b>: Abbrev Number: 0
 <2><7c>: Abbrev Number: 0
 <1><7d>: Abbrev Number: 6 (DW_TAG_base_type)
    <7e>   DW_AT_byte_size   : 4
    <7f>   DW_AT_encoding    : 5	(signed)
    <80>   DW_AT_name        : int
 <1><84>: Abbrev Number: 7 (DW_TAG_subprogram)
    <85>   DW_AT_external    : 1
    <85>   DW_AT_name        : f
    <87>   DW_AT_decl_file   : 1
    <88>   DW_AT_decl_line   : 1
    <89>   DW_AT_linkage_name: (indirect string, offset: 0x20): _Z1fi
    <8d>   DW_AT_type        : <0x7d>
    <91>   DW_AT_inline      : 3	(declared as inline and inlined)
 <2><92>: Abbrev Number: 8 (DW_TAG_formal_parameter)
    <93>   DW_AT_name        : x
    <95>   DW_AT_decl_file   : 1
    <96>   DW_AT_decl_line   : 1
    <97>   DW_AT_type        : <0x7d>
 <2><9b>: Abbrev Number: 0
 <1><9c>: Abbrev Number: 0

$
Assignee

Comment 17

4 months ago

I am going to try and tackle this work this quarter. I don't know if we ever settled on a design for how to represent this information in the breakpad symbol file, though? Does it really matter that much if we keep compatibility with upstream at this point?

Assignee

Comment 18

4 months ago

Gabriele, does either part of this work (dump_syms + minidump_stackwalker) live in the part of breakpad that we have officially forked? If it doesn't, do we need to complete this work upstream? Does it matter how much we change the symbol format to accommodate inline frames?

Flags: needinfo?(gsvelto)

(In reply to Nathan Froyd [:froydnj] from comment #18)

Gabriele, does either part of this work (dump_syms + minidump_stackwalker) live in the part of breakpad that we have officially forked? If it doesn't, do we need to complete this work upstream? Does it matter how much we change the symbol format to accommodate inline frames?

The changes would be in the part of breakpad that we haven't forked. That doesn't really matter in the sense that we can always fork the rest too. I would still propose the changes here for integration upstream because others might find them useful; however I wouldn't wait on upstream to accept them before doing them in mozilla-central.

Flags: needinfo?(gsvelto)
Assignee

Comment 20

3 months ago

Changing the breakpad symbol format seems fraught with all kinda of problems. I think the simpler idea is to do the following:

  1. Parse all DW_TAG_inlined_subroutine direct children of DW_TAG_subprogram DIEs, recording the extents for each and the DW_AT_call_{file,line} attributes. This step gives you the original (subject to compiler constraints) source line for inlined functions and their inlinees (?). If you squint at it correctly, this information is essentially .debug_line information represented in a slightly different form.
  2. We now have two sets of line information: the "precise" information from .debug_line, telling you every detail of all inlined functions, and the "coarse" information for the first level of inlining performed by the compiler.
  3. Before we process the precise line information, we're going to determine, for each "coarse" line, which bits of "precise" line information the coarse line overlaps. We're then going to use the coarse line information instead of the overlapped bits. For the example in comment 8, we previously had:
> [...]
> FILE 602 hg:hg.mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 b 690 9655
> f2829e 3 269 9655
> f282a1 9 757 9655
> f282aa 3 853 602
> [...]

and we'll now have, just focusing on the FUNC bits:

> [...]
> FILE 602 hg:hg.mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 17 852 602
> f282aa 3 853 602
> [...]

which should be slightly nicer? There's a diff -y of NSS here:

https://gist.github.com/froydnj/9405314a5492c40353e916c5cbfbbfcb

You can search for ParseRFC1485AVA or CERT_CreateAVAFromSECItem to get an idea of how it looks in practice. The symbol file is ~4% smaller, which is a nice side-effect.

I'm not sure how this looks on actual crashes yet; that step is getting investigated tomorrow. This scheme does throw away information, which is never great. For particularly complicated lines, you're just faced with determining which bits of the line actually triggered the crash, instead of knowing that the crash started from a call to some inlined function; I think this is an improvement, but I'm not entirely sure. It would be nice to have the equivalent of llvm-symbolizer -inlines, which prints inlined "frames", but I'm not sure how to do that without reworking Breakpad's symbol format.

Also, this work was done only for DWARF; I don't know if the idea would translate well to Windows.

Assignee

Comment 21

3 months ago

The first experiment has not gone well. I looked at https://crash-stats.mozilla.com/report/index/f8e84a6c-079a-410c-8bf0-3588f0190325, which is connected to bug 1521095. Collapsing line information where appropriate reduces the breakpad symbol file size by ~25% for libxul, which is a nice size win. Unfortunately, we're stymied by rustc not giving us enough debug information; we still get:

 0  libxul.so!MOZ_Crash(char const*, int, char const*) [Assertions.h : 314 + 0x0]
 1  libxul.so!GeckoCrash [nsAppRunner.cpp : 5093 + 0xa]
 2  libxul.so!gkrust_shared::panic_hook [lib.rs : 240 + 0x9]
 3  libxul.so!core::ops::function::Fn::call [function.rs : 78 + 0x5]
 4  libxul.so!std::panicking::rust_panic_with_hook [panicking.rs : 495 + 0x6]
 5  libxul.so!std::panicking::continue_panic_fmt [panicking.rs : 398 + 0x14]
 6  libxul.so!rust_begin_unwind [panicking.rs : 325 + 0x5]
 7  libxul.so!core::panicking::panic_fmt [panicking.rs : 95 + 0x5]
 8  libxul.so!core::option::expect_failed [option.rs : 1008 + 0x11]
 9  libxul.so!webrender::resource_cache::ResourceCache::get_cached_image + 0x211
...

Note that we were not able to get file and line number information for the get_cached_image frame, similar to what's happening today. That's because there's simply no line information for the crashing address. The lines in the symbol file look like:

FUNC 5697ba0 2ff 0 webrender::resource_cache::ResourceCache::get_cached_image
5697ba0 13 1520 4571
5697bb3 e8 1522 4571
5697c9b c 1522 4571
5697ca7 100 1522 4571 # covers to 5697da7
# crash happens here, but we have a hole from [5697da7,5697db3)
5697db3 14 1522 4571
5697dc7 83 1523 4571
5697e4a 29 1523 4571
5697e73 f 1524 4571
5697e82 1d 1523 4571

The closest line information that we get is:

https://hg.mozilla.org/mozilla-central/annotate/4a692c812a3fe2f893d2a6e25b9490b38415c907/gfx/wr/webrender/src/resource_cache.rs#l1522

which would be better than nothing, but it's not obvious how we get from that line to crashing in option::expect_failed.

To be clear, even if we implemented the more complete proposals described in the bug initially, we'd have the exact same problem. We'd need compiler changes and/or heuristics for extending line information to cover holes to fix cases like the ones in bug 1521095.

Assignee

Comment 22

3 months ago

The second experiment is more encouraging. For bug 1508714 and the crash https://crash-stats.mozilla.com/report/index/32351114-a0c4-4d40-a587-f7ef10181120, we now get:

...
 7  libxul.so!core::panicking::panic_fmt [panicking.rs : 77 + 0x5]
 8  libxul.so!core::option::expect_failed [option.rs : 1000 + 0x11]
 9  libxul.so!<core::iter::Map<I, F> as core::iter::iterator::Iterator>::next [mod.rs : 1394 + 0xa6]
10  libxul.so!<webrender_bindings::moz2d_renderer::Moz2dBlobRasterizer as webrender_api::image::AsyncBlobImageRasterizer>::rasterize [moz2d_renderer.rs : 479 + 0x2f]
11  libxul.so!webrender::frame_builder::FrameBuilder::build [resource_cache.rs : 1471 + 0x6]

The moz2d_renderer.rs line in frame 10 points at:

https://hg.mozilla.org/mozilla-central/file/eeddcefcdad847bf8a5737153079e9619ee5aa66/gfx/webrender_bindings/src/moz2d_renderer.rs#l479

The mod.rs for frame 9 is libcore/iter/mod.rs.

Frames 9 and 10 were previously symbolicated as:

9 	libxul.so 	<core::iter::Map<I, F> as core::iter::iterator::Iterator>::next 	libcore/option.rs:312 	cfi
10 	libxul.so 	<webrender_bindings::moz2d_renderer::Moz2dBlobRasterizer as webrender_api::image::AsyncBlobImageRasterizer>::rasterize 	liballoc/vec.rs:1903 	cfi

Which is nonsense: iterator functionality isn't defined in option.rs, and webrender_bindings:: code isn't defined in liballoc/vec.rs. So at least in this case, we'd improve on the status quo. Whether that's enough of an improvement to make crash debugging easier, I'm not sure.

Assignee

Comment 23

3 months ago

The DW_AT_call_file attributes that we eventually want to parse from
DW_TAG_inlined_subroutine DIEs refer to the file name table stored in
the .debug_line section. To resolve those DW_AT_call_file attributes,
we need access to that table after parsing of the appropriate
.debug_line bits is done. This patch adds support for extracting that
information from the .debug_line parsing process.

Assignee

Comment 24

3 months ago

We record the file and line that these subroutines were inlined from.
We'll use that information to provide more coarse-grained line
information in the next patch.

Depends on D25469

Assignee

Comment 26

3 months ago

DW_TAG_subprogram DIEs sometimes have child DW_TAG_lexical_block DIEs
which in turn contain child DW_TAG_inlined_subroutine DIEs that we woud
like to look at. If we skip the DW_TAG_inlined_subroutine DIEs, we miss
important information. We therefore need to look through the
DW_TAG_lexical_block DIEs to find the DIEs that we are interested in.

Depends on D25471

Assignee

Comment 27

3 months ago

After replacing precise line information from .debug_line with coarse
line information from DW_AT_call_{file,line}, it's very likely that
adjacent line records actually refer to identical file and line
numbers. Such adjacent records are not really useful and take up more
space than they should in the symbol file. We might as well merge them
and save ourselves some space.

Depends on D25472

This is great stuff Nathan, thanks for dealing with this. From my POV this is all looking good but I don't know if we'll be able to upstream it mostly because upstream might have its own opinion on how to do this. Still this is going to be a huge improvement in the quality of the stacks we get.

Assignee

Comment 29

3 months ago

emilio and kats were kind enough to provide some other examples of crash reports with bad stacks. Consider https://crash-stats.mozilla.com/report/index/8f912656-156f-4c1b-8bd7-571850190329 where we start with:

0 	libxul.so 	selectors::matching::matches_complex_selector_internal 	servo/components/style/gecko_bindings/sugar/ns_t_array.rs:17 	context
1 	libxul.so 	style::dom_apis::query_selector_slow 	servo/components/selectors/matching.rs:342 	cfi

The first frame is a little weird: we're said to be in the selectors crate, but the source file is bits for nsTArray? The second line is strange for much the same reason.

Reprocessing the crash with new symbol information gives:

 0  libxul.so!selectors::matching::matches_complex_selector_internal [matching.rs : 488 + 0x292]
 1  libxul.so!style::dom_apis::query_selector_slow [dom_apis.rs : 573 + 0x170]

which corresponds to

https://hg.mozilla.org/releases/mozilla-beta/annotate/a89261364728dbfe37351e0de812e906c436a5d4/servo/components/selectors/matching.rs#l488
https://hg.mozilla.org/releases/mozilla-beta/annotate/a89261364728dbfe37351e0de812e906c436a5d4/servo/components/style/dom_apis.rs#l573

Which gives us at least better pointers to where we were on the call stack, even if it's not immediately obvious how we get into nsTArray bits from that matches_complex_selector_internal line.

Or https://crash-stats.mozilla.com/report/index/4145fdb1-7a34-42d1-a7c9-8b3df0190330, where we have some nonsense frames in the middle of the call stack:

9 	libxul.so 	webrender::renderer::TextureResolver::bind 	src/libcore/option.rs:322 	cfi
10 	libxul.so 	webrender::renderer::Renderer::draw_instanced_batch 	gfx/wr/webrender/src/renderer.rs:3398 	cfi
11 	libxul.so 	webrender::renderer::Renderer::draw_color_target::{{closure}} 	gfx/wr/webrender/src/renderer.rs:3900 	cfi
12 	libxul.so 	webrender::renderer::Renderer::draw_color_target 	gfx/wr/webrender/src/renderer.rs:3734 	cfi

TextureResolver::bind doesn't live in libcore.

Reprocessing the crash with new symbols, we get:

 9  libxul.so!webrender::renderer::TextureResolver::bind [renderer.rs : 1053 + 0x8a]
10  libxul.so!webrender::renderer::Renderer::draw_instanced_batch [renderer.rs : 3398 + 0x10]
11  libxul.so!webrender::renderer::Renderer::draw_color_target::{{closure}} [renderer.rs : 3900 + 0x5]
12  libxul.so!webrender::renderer::Renderer::draw_color_target [renderer.rs : 3734 + 0x15]

and that tells us we're crashing inside TextureResolver::bind at:

https://hg.mozilla.org/mozilla-central/annotate/69e9ee0ef3dd97a412d90be9c8377d660d08cbc2/gfx/wr/webrender/src/renderer.rs#l1053

which makes a lot more sense.

Comment 30

2 months ago
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/578c94538897
part 1 - extract file information out of .debug_line parsing; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/589e276c75fa
part 2 - parse DW_TAG_inlined_subroutine DIEs; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/b3e5b74ed19f
part 3 - replace line information for inlined functions; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/d048bcf083e5
part 4 - look through lexical block DIEs where appropriate; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/17040bb20e25
part 5 - merge adjacent line records where possible; r=gsvelto
Assignee: nobody → nfroyd
You need to log in before you can comment on or make changes to this bug.