524410 - Breakpad Linux Dumper: support inlined functions

Reporter

Description

•

15 years ago

At the moment, Breakpad's support for dumping DWARF on Linux does not support inlined functions.

This may require a slight modification to the Breakpad symbol file format, which has no special support for inlined functions. In particular, the processor assumes that all FUNC records cover disjoint address ranges, so if f has been inlined into g, you can't say:

    FUNC 1000 50 0 g
    FUNC 1020 10 0 f

to say that g covers the 0x50 bytes starting at 0x1000, but within that range the 0x10 bytes at 0x1020 are an inlined instance of f. Breakpad instead requires you to say:

    FUNC 1000 20 0 g
    FUNC 1020 10 0 f
    FUNC 1030 20 0 g

That is, you have to split g into two "functions".

Which is baloney, of course. Nested FUNC records like the above should be permitted, and the processor should choose the innermost FUNC covering the address in question.

Jim Blandy :jimb

Reporter

Updated

•

15 years ago

Assignee: nobody → jim

(not currently active) Ted Mielczarek

Comment 1

•

15 years ago

I think on other platforms you just get data that looks like:
FILE 1 foo.c
FILE 2 foo.h
FUNC 1000 50 0 g
1000 4 1 1
1004 4 1 2
1008 4 3 1

It's not the worst thing ever, since you generally get the right source line, but it can be confusing.

Implementing this as described would require some changes to the processor's data structures, because it currently uses a RangeMap to store the function info:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/basic_source_line_resolver.cc#171
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/range_map.h
and RangeMap doesn't permit overlapping address ranges.
You could switch to ContainedRangeMap, which allows for fully contained child address ranges:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/contained_range_map.h
That'd probably have a negative perf impact, but I guess there's not much you can do about that.

Jim Blandy :jimb

Reporter

Comment 2

•

15 years ago

Yeah, I just noticed that, even though functioninfo.cc does recognize DW_TAG_inlined_subroutine, it doesn't seem to actually output an entry for the inlined function.

I'm really glad to hear about ContainedRangeMap.  It seems like, in the absence of inlined functions, that degenerates into an ordinary map, so at least on first principles I wouldn't expect the performance hit would be too much.

The thing is, if we just list the inlined function *instead* of the function it was inlined into, that's pretty confusing, because the caller is something unconnected.

Jim Blandy :jimb

Reporter

Comment 3

•

15 years ago

However, since I see now that other platforms don't support inlined functions, I don't think this should block 517824 any more.

No longer blocks: 517824

(not currently active) Ted Mielczarek

Comment 4

•

15 years ago

(In reply to comment #2)
> The thing is, if we just list the inlined function *instead* of the function it
> was inlined into, that's pretty confusing, because the caller is something
> unconnected.

No, it's slightly more confusing than that, because the function name is the containing function, but the source file+line are that of the inlined function.

(not currently active) Ted Mielczarek

Comment 5

•

15 years ago

On Windows, we appear to completely skip inlined code. See bug 492675 comment 39 for an example. AddImportantRules is inlined into nsStyleSet::FileRules, and the breakpad output for nsStyleSet::FileRules just has a line like:
101831 88 573 42076

which encompasses the entire inlined function, but represents it as a single line in the containing function.

Jim Blandy :jimb

Reporter

Comment 7

•

12 years ago

I'm not actively working on this, and I don't have plans to in the near future, so I'm de-assigning myself to it.

Assignee: jimb → nobody

Karl Tomlinson (:karlt)

Updated

•

12 years ago

Blocks: 793416

Jeff Muizelaar [:jrmuizel]

Updated

•

9 years ago

Blocks: 1148641

Markus Stange [:mstange]

Comment 8

•

7 years ago

I'd like to propose a different format change to handle this. I don't think we should have overlapping FUNC ranges for inlined functions, because that would probably lead to a lot of repetition of the function names of inlined functions. I'd keep the FUNC lines the way they are and only have them for the outer functions.

We can have a table at the beginning of the symbol dump that lists the function names of all functions that get inlined anywhere, similar to the FILE table:
INLINE 0 SomeFunction
INLINE 1 SomeOtherFunction

And then we can refer to those inlined functions using the pattern
<address> <line_size> <line_number> <file_number> <inline_number>
which is very similar to the existing pattern
<address> <line_size> <line_number> <file_number>
, just with an extra field at the end.

And then we can allow these ranges to overlap / nest.

Here's an example:

dom/base/nsAttrAndChildArray.cpp:
> 846 bool
> 847 nsAttrAndChildArray::GrowBy(uint32_t aGrowSize)
> 848 {
> 849   CheckedUint32 size = 0;
> 850   if (mImpl) {
> 851     size += mImpl->mBufferSize;
> 852     size += NS_IMPL_EXTRA_SIZE;
> 853     if (!size.isValid()) {
> 854       return false;
> 855     }
> 856   }

mfbt/CheckedInt.h:
> 256 IsAddValid(T aX, T aY)
> 257 {
> [...]
> 267  return IsSigned<T>::value
> 268          ? HasSignBit(BinaryComplement(T((result ^ aX) & (result ^ aY))))
> 269          : BinaryComplement(aX) >= aY;
> 270 }
> [...]
> 689
> 690 MOZ_CHECKEDINT_BASIC_BINARY_OPERATOR(Add, +)
> 691 MOZ_CHECKEDINT_BASIC_BINARY_OPERATOR(Sub, -)
> [...]
> 756
> 757 MOZ_CHECKEDINT_CONVENIENCE_BINARY_OPERATORS(+, +=)
> 758 MOZ_CHECKEDINT_CONVENIENCE_BINARY_OPERATORS(*, *=)
> 759 MOZ_CHECKEDINT_CONVENIENCE_BINARY_OPERATORS(-, -=)

Old:
> [...]
> FILE 602 hg:hg.mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 b 690 9655
> f2829e 3 269 9655
> f282a1 9 757 9655
> f282aa 3 853 602
> [...]

New:
> [...]
> FILE 602 mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> INLINE 482 CheckedUint32::operator+=
> INLINE 483 CheckedUint32::operator+
> INLINE 484 IsAddValid
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 17 852 602
> f28293 17 757 9655 482
> f28293 e 690 9655 483
> f2829e 3 269 9655 484
> f282aa 3 853 602
> [...]

or as ascii art:

>                   |---3---|         IsAddValid
>           |-------e-------|         CheckedUint32::operator+
>           |----------17-----------| CheckedUint32::operator+=
>   |---4---|----------17-----------| nsAttrAndChildArray::GrowBy
>   |---4---|---b---|---3---|---9---|
> f2828f  f28293  f2829e  f282a1  f282aa

So we still have the existing
<address> <line_size> <line_number> <file_number>
pattern, which always refers to the outer function (as defined by the current FUNC section), and ranges of this type are non-overlapping.

Only the new
<address> <line_size> <line_number> <file_number> <inline_number>
ranges overlap, with each other and with the existing ranges.

This gives us easy backwards compatibility for consumers that ignore the new pattern. They will simply assign all addresses within an outer function to the file and lines of that outer function.

And for every "stack frame", we now have all three pieces of information: function name, file, and line number.

Jim Blandy :jimb

Reporter

Comment 9

•

7 years ago

I like the general idea very much: commoning out the names in their own table, and then citing that table from records whose ranges can overlap.

From looking at our breakpad reader code[1], it seems that the parser will complain about a source line record with an extra field just as much as it would complain about an unrecognized record type. So I don't think we get backwards compatibility. So I think we might as well just invent a new record type ("INLINED"?) for the records showing inlined function calls.

[1]: http://searchfox.org/mozilla-central/rev/3a3af33f513071ea829debdfbc628caebcdf6996/toolkit/crashreporter/google-breakpad/src/processor/basic_source_line_resolver.cc#534-536

Jim Blandy :jimb

Reporter

Comment 10

•

7 years ago

So the parser would manage the stack by first considering the start address of the new line record, and popping inlined frames from the stack until it finds a frame that still covers the new start address; and then pushing an inlined frame when it sees one of these new records.

If we have to adapt consumers anyway, then we could have plain line records implicitly belong to the function of the youngest frame, and avoid repeating the function number on every line record within an inlined call.

Markus Stange [:mstange]

Comment 11

•

7 years ago

(In reply to Jim Blandy :jimb from comment #10)
> So the parser would manage the stack by first considering the start address
> of the new line record, and popping inlined frames from the stack until it
> finds a frame that still covers the new start address; and then pushing an
> inlined frame when it sees one of these new records.

That's exactly correct.

> If we have to adapt consumers anyway, then we could have plain line records
> implicitly belong to the function of the youngest frame, and avoid repeating
> the function number on every line record within an inlined call.

That's an interesting idea. This would help if several consecutive lines belong to the same inlined function. So effectively you'd only have to mention the <inline_number> if the current line in the dump describes a different function than the previous line.
But you'd also have to have inline_numbers for the outer FUNC section itself, so that you have a number you can mention once you've left all inlined functions and are back at the outer function. So the INLINE table at the start of the dump might grow a little. It would be nice to find out if this suggestion results in an overall size saving.

In the example I gave in comment 8, every <inline_number> line refers to a different function than the previous line, so it wouldn't save us anything there. And we have an additional annotation in the line that takes us back to the outer function.
But this example is probably not representative.

> [...]
> FILE 602 mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> INLINE 481 nsAttrAndChildArray::GrowBy(unsigned int)
(this line is new ^)
> INLINE 482 CheckedUint32::operator+=
> INLINE 483 CheckedUint32::operator+
> INLINE 484 IsAddValid
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 17 852 602
> f28293 17 757 9655 482
> f28293 e 690 9655 483
> f2829e 3 269 9655 484
> f282aa 3 853 602 481
(and this line has ^^^)
> [...]

(not currently active) Ted Mielczarek

Comment 12

•

7 years ago

What does the DWARF look like for an inlined function? (I know I've seen it before but I can't recall offhand.)

Markus Stange [:mstange]

Comment 13

•

7 years ago

Attached file "dwarfdump --lookup" output with TAG_inlined_subroutine — Details

It uses TAG_inlined_subroutine. I'm attaching the output of
$ dwarfdump --lookup 0xdbb5a8 XUL.dSYM

Jim Blandy :jimb

Reporter

Comment 14

•

7 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #12)
> What does the DWARF look like for an inlined function? (I know I've seen it
> before but I can't recall offhand.)

I have a truly beautiful explanation of this, which this multi-gigabyte bug database is unfortunately too small to contain.

(It's very hairy; see the DWARF spec)

Jim Blandy :jimb

Reporter

Comment 15

•

7 years ago

> But you'd also have to have inline_numbers for the outer FUNC section itself, so that you have a number you can mention once you've left all inlined functions and are back at the outer function. So the INLINE table at the start of the dump might grow a little. It would be nice to find out if this suggestion results in an overall size saving.

I'm not sure I understand this. If we're managing the inlined instance stack as described in comment 10, it seems to me that we always know when a given line entry is not covered by any inlined instance. The line entries covered by the outer FUNC should just be those for which the stack is empty. No?

Jim Blandy :jimb

Reporter

Comment 16

•

7 years ago

(In reply to Jim Blandy :jimb from comment #14)
> I have a truly beautiful explanation of this, which this multi-gigabyte bug
> database is unfortunately too small to contain.

Less flippantly:

A DW_TAG_subprogram DIE can have a DW_AT_inline attribute that says whether it was declared inline and whether it was actually inlined. If it was inlined, the DW_TAG_subprogram and its children are called an "abstract instance tree", and aren't supposed to mention any attributes that vary from one inlined instance to another, like code addresses.

When you have an inlined callee, the calling function should have a DW_TAG_inlined_subroutine child whose DW_AT_abstract_origin attribute points to the DW_TAG_subroutine at the root of the abstract instance tree. The DW_TAG_inlined_subroutine DIE then supplies all the information specific to this inlined instance that was missing from the abstract instance tree. Each DW_TAG_inlined_subroutine DIE is called a "concrete inlined instance".

When you have a non-inlined instance of a function with a DW_AT_inline attribute, that's called a "concrete out-of-line instance". It's like a concrete inlined instance - it has a DW_AT_abstract_origin attribute, and supplements that with information specific to this out-of-line instance - but it uses DW_TAG_subroutine tag instead of DW_TAG_inlined_subroutine. These are supposed to appear as siblings of the abstract instance, but could appear elsewhere.

For example:

$ cat inline.c
cat: inline.c: No such file or directory
$ cat inline.cc
inline int f(int x) { return x*x; }

int g(int x) { return f(x) * f(x); }
$ g++ -g -O2 -c inline.cc
$ objdump -dr inline.o

inline.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_Z1gi>:
   0:	89 f8                	mov    %edi,%eax
   2:	0f af c7             	imul   %edi,%eax
   5:	0f af c0             	imul   %eax,%eax
   8:	c3                   	retq   
$ readelf -wi inline.o
Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length:        0x99 (32-bit)
   Version:       4
   Abbrev Offset: 0x0
   Pointer Size:  8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0x26): GNU C++14 6.4.1 20170727 (Red Hat 6.4.1-1) -mtune=generic -march=x86-64 -g -O2
    <10>   DW_AT_language    : 4	(C++)
    <11>   DW_AT_name        : (indirect string, offset: 0x16): inline.cc
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x6): /home/jimb/play
    <19>   DW_AT_low_pc      : 0x0
    <21>   DW_AT_high_pc     : 0x9
    <29>   DW_AT_stmt_list   : 0x0
 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2e>   DW_AT_external    : 1
    <2e>   DW_AT_name        : g
    <30>   DW_AT_decl_file   : 1
    <31>   DW_AT_decl_line   : 3
    <32>   DW_AT_linkage_name: (indirect string, offset: 0x0): _Z1gi
    <36>   DW_AT_type        : <0x7d>
    <3a>   DW_AT_low_pc      : 0x0
    <42>   DW_AT_high_pc     : 0x9
    <4a>   DW_AT_frame_base  : 1 byte block: 9c 	(DW_OP_call_frame_cfa)
    <4c>   DW_AT_GNU_all_call_sites: 1
    <4c>   DW_AT_sibling     : <0x7d>
 <2><50>: Abbrev Number: 3 (DW_TAG_formal_parameter)
    <51>   DW_AT_name        : x
    <53>   DW_AT_decl_file   : 1
    <54>   DW_AT_decl_line   : 3
    <55>   DW_AT_type        : <0x7d>
    <59>   DW_AT_location    : 1 byte block: 55 	(DW_OP_reg5 (rdi))
 <2><5b>: Abbrev Number: 4 (DW_TAG_inlined_subroutine)
    <5c>   DW_AT_abstract_origin: <0x84>
    <60>   DW_AT_low_pc      : 0x2
    <68>   DW_AT_high_pc     : 0x3
    <70>   DW_AT_call_file   : 1
    <71>   DW_AT_call_line   : 3
 <3><72>: Abbrev Number: 5 (DW_TAG_formal_parameter)
    <73>   DW_AT_abstract_origin: <0x92>
    <77>   DW_AT_location    : 0x0 (location list)
 <3><7b>: Abbrev Number: 0
 <2><7c>: Abbrev Number: 0
 <1><7d>: Abbrev Number: 6 (DW_TAG_base_type)
    <7e>   DW_AT_byte_size   : 4
    <7f>   DW_AT_encoding    : 5	(signed)
    <80>   DW_AT_name        : int
 <1><84>: Abbrev Number: 7 (DW_TAG_subprogram)
    <85>   DW_AT_external    : 1
    <85>   DW_AT_name        : f
    <87>   DW_AT_decl_file   : 1
    <88>   DW_AT_decl_line   : 1
    <89>   DW_AT_linkage_name: (indirect string, offset: 0x20): _Z1fi
    <8d>   DW_AT_type        : <0x7d>
    <91>   DW_AT_inline      : 3	(declared as inline and inlined)
 <2><92>: Abbrev Number: 8 (DW_TAG_formal_parameter)
    <93>   DW_AT_name        : x
    <95>   DW_AT_decl_file   : 1
    <96>   DW_AT_decl_line   : 1
    <97>   DW_AT_type        : <0x7d>
 <2><9b>: Abbrev Number: 0
 <1><9c>: Abbrev Number: 0

$

Jeff Muizelaar [:jrmuizel]

Updated

•

7 years ago

Blocks: 1398533

Nathan Froyd [:froydnj]

Assignee

Comment 17

•

5 years ago

I am going to try and tackle this work this quarter. I don't know if we ever settled on a design for how to represent this information in the breakpad symbol file, though? Does it really matter that much if we keep compatibility with upstream at this point?

Nathan Froyd [:froydnj]

Assignee

Comment 18

•

5 years ago

Gabriele, does either part of this work (dump_syms + minidump_stackwalker) live in the part of breakpad that we have officially forked? If it doesn't, do we need to complete this work upstream? Does it matter how much we change the symbol format to accommodate inline frames?

Flags: needinfo?(gsvelto)

Gabriele Svelto [:gsvelto] (PTO)

Comment 19

•

5 years ago

(In reply to Nathan Froyd [:froydnj] from comment #18)

Gabriele, does either part of this work (dump_syms + minidump_stackwalker) live in the part of breakpad that we have officially forked? If it doesn't, do we need to complete this work upstream? Does it matter how much we change the symbol format to accommodate inline frames?

The changes would be in the part of breakpad that we haven't forked. That doesn't really matter in the sense that we can always fork the rest too. I would still propose the changes here for integration upstream because others might find them useful; however I wouldn't wait on upstream to accept them before doing them in mozilla-central.

Flags: needinfo?(gsvelto)

Nathan Froyd [:froydnj]

Assignee

Comment 20

•

5 years ago

Changing the breakpad symbol format seems fraught with all kinda of problems. I think the simpler idea is to do the following:

Parse all DW_TAG_inlined_subroutine direct children of DW_TAG_subprogram DIEs, recording the extents for each and the DW_AT_call_{file,line} attributes. This step gives you the original (subject to compiler constraints) source line for inlined functions and their inlinees (?). If you squint at it correctly, this information is essentially .debug_line information represented in a slightly different form.
We now have two sets of line information: the "precise" information from .debug_line, telling you every detail of all inlined functions, and the "coarse" information for the first level of inlining performed by the compiler.
Before we process the precise line information, we're going to determine, for each "coarse" line, which bits of "precise" line information the coarse line overlaps. We're then going to use the coarse line information instead of the overlapped bits. For the example in comment 8, we previously had:

> [...]
> FILE 602 hg:hg.mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 b 690 9655
> f2829e 3 269 9655
> f282a1 9 757 9655
> f282aa 3 853 602
> [...]

and we'll now have, just focusing on the FUNC bits:

> [...]
> FILE 602 hg:hg.mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 17 852 602
> f282aa 3 853 602
> [...]

which should be slightly nicer? There's a diff -y of NSS here:

https://gist.github.com/froydnj/9405314a5492c40353e916c5cbfbbfcb

You can search for ParseRFC1485AVA or CERT_CreateAVAFromSECItem to get an idea of how it looks in practice. The symbol file is ~4% smaller, which is a nice side-effect.

I'm not sure how this looks on actual crashes yet; that step is getting investigated tomorrow. This scheme does throw away information, which is never great. For particularly complicated lines, you're just faced with determining which bits of the line actually triggered the crash, instead of knowing that the crash started from a call to some inlined function; I think this is an improvement, but I'm not entirely sure. It would be nice to have the equivalent of llvm-symbolizer -inlines, which prints inlined "frames", but I'm not sure how to do that without reworking Breakpad's symbol format.

Also, this work was done only for DWARF; I don't know if the idea would translate well to Windows.

Nathan Froyd [:froydnj]

Assignee

Comment 21

•

5 years ago

The first experiment has not gone well. I looked at https://crash-stats.mozilla.com/report/index/f8e84a6c-079a-410c-8bf0-3588f0190325, which is connected to bug 1521095. Collapsing line information where appropriate reduces the breakpad symbol file size by ~25% for libxul, which is a nice size win. Unfortunately, we're stymied by rustc not giving us enough debug information; we still get:

 0  libxul.so!MOZ_Crash(char const*, int, char const*) [Assertions.h : 314 + 0x0]
 1  libxul.so!GeckoCrash [nsAppRunner.cpp : 5093 + 0xa]
 2  libxul.so!gkrust_shared::panic_hook [lib.rs : 240 + 0x9]
 3  libxul.so!core::ops::function::Fn::call [function.rs : 78 + 0x5]
 4  libxul.so!std::panicking::rust_panic_with_hook [panicking.rs : 495 + 0x6]
 5  libxul.so!std::panicking::continue_panic_fmt [panicking.rs : 398 + 0x14]
 6  libxul.so!rust_begin_unwind [panicking.rs : 325 + 0x5]
 7  libxul.so!core::panicking::panic_fmt [panicking.rs : 95 + 0x5]
 8  libxul.so!core::option::expect_failed [option.rs : 1008 + 0x11]
 9  libxul.so!webrender::resource_cache::ResourceCache::get_cached_image + 0x211
...

Note that we were not able to get file and line number information for the get_cached_image frame, similar to what's happening today. That's because there's simply no line information for the crashing address. The lines in the symbol file look like:

FUNC 5697ba0 2ff 0 webrender::resource_cache::ResourceCache::get_cached_image
5697ba0 13 1520 4571
5697bb3 e8 1522 4571
5697c9b c 1522 4571
5697ca7 100 1522 4571 # covers to 5697da7
# crash happens here, but we have a hole from [5697da7,5697db3)
5697db3 14 1522 4571
5697dc7 83 1523 4571
5697e4a 29 1523 4571
5697e73 f 1524 4571
5697e82 1d 1523 4571

The closest line information that we get is:

https://hg.mozilla.org/mozilla-central/annotate/4a692c812a3fe2f893d2a6e25b9490b38415c907/gfx/wr/webrender/src/resource_cache.rs#l1522

which would be better than nothing, but it's not obvious how we get from that line to crashing in option::expect_failed.

To be clear, even if we implemented the more complete proposals described in the bug initially, we'd have the exact same problem. We'd need compiler changes and/or heuristics for extending line information to cover holes to fix cases like the ones in bug 1521095.

Nathan Froyd [:froydnj]

Assignee

Comment 22

•

5 years ago

The second experiment is more encouraging. For bug 1508714 and the crash https://crash-stats.mozilla.com/report/index/32351114-a0c4-4d40-a587-f7ef10181120, we now get:

...
 7  libxul.so!core::panicking::panic_fmt [panicking.rs : 77 + 0x5]
 8  libxul.so!core::option::expect_failed [option.rs : 1000 + 0x11]
 9  libxul.so!<core::iter::Map<I, F> as core::iter::iterator::Iterator>::next [mod.rs : 1394 + 0xa6]
10  libxul.so!<webrender_bindings::moz2d_renderer::Moz2dBlobRasterizer as webrender_api::image::AsyncBlobImageRasterizer>::rasterize [moz2d_renderer.rs : 479 + 0x2f]
11  libxul.so!webrender::frame_builder::FrameBuilder::build [resource_cache.rs : 1471 + 0x6]

The moz2d_renderer.rs line in frame 10 points at:

https://hg.mozilla.org/mozilla-central/file/eeddcefcdad847bf8a5737153079e9619ee5aa66/gfx/webrender_bindings/src/moz2d_renderer.rs#l479

The mod.rs for frame 9 is libcore/iter/mod.rs.

Frames 9 and 10 were previously symbolicated as:

9 	libxul.so 	<core::iter::Map<I, F> as core::iter::iterator::Iterator>::next 	libcore/option.rs:312 	cfi
10 	libxul.so 	<webrender_bindings::moz2d_renderer::Moz2dBlobRasterizer as webrender_api::image::AsyncBlobImageRasterizer>::rasterize 	liballoc/vec.rs:1903 	cfi

Which is nonsense: iterator functionality isn't defined in option.rs, and webrender_bindings:: code isn't defined in liballoc/vec.rs. So at least in this case, we'd improve on the status quo. Whether that's enough of an improvement to make crash debugging easier, I'm not sure.

Nathan Froyd [:froydnj]

Assignee

Comment 23

•

5 years ago

Attached file Bug 524410 - part 1 - extract file information out of .debug_line parsing; r=gsvelto — Details

The DW_AT_call_file attributes that we eventually want to parse from
DW_TAG_inlined_subroutine DIEs refer to the file name table stored in
the .debug_line section. To resolve those DW_AT_call_file attributes,
we need access to that table after parsing of the appropriate
.debug_line bits is done. This patch adds support for extracting that
information from the .debug_line parsing process.

Nathan Froyd [:froydnj]

Assignee

Comment 24

•

5 years ago

Attached file Bug 524410 - part 2 - parse DW_TAG_inlined_subroutine DIEs; r=gsvelto — Details

We record the file and line that these subroutines were inlined from.
We'll use that information to provide more coarse-grained line
information in the next patch.

Depends on D25469

Nathan Froyd [:froydnj]

Assignee

Comment 25

•

5 years ago

Attached file Bug 524410 - part 3 - replace line information for inlined functions; r=gsvelto — Details

Depends on D25470

Nathan Froyd [:froydnj]

Assignee

Comment 26

•

5 years ago

Attached file Bug 524410 - part 4 - look through lexical block DIEs where appropriate; r=gsvelto — Details

DW_TAG_subprogram DIEs sometimes have child DW_TAG_lexical_block DIEs
which in turn contain child DW_TAG_inlined_subroutine DIEs that we woud
like to look at. If we skip the DW_TAG_inlined_subroutine DIEs, we miss
important information. We therefore need to look through the
DW_TAG_lexical_block DIEs to find the DIEs that we are interested in.

Depends on D25471

Nathan Froyd [:froydnj]

Assignee

Comment 27

•

5 years ago

Attached file Bug 524410 - part 5 - merge adjacent line records where possible; r=gsvelto — Details

After replacing precise line information from .debug_line with coarse
line information from DW_AT_call_{file,line}, it's very likely that
adjacent line records actually refer to identical file and line
numbers. Such adjacent records are not really useful and take up more
space than they should in the symbol file. We might as well merge them
and save ourselves some space.

Depends on D25472

Gabriele Svelto [:gsvelto] (PTO)

Comment 28

•

5 years ago

This is great stuff Nathan, thanks for dealing with this. From my POV this is all looking good but I don't know if we'll be able to upstream it mostly because upstream might have its own opinion on how to do this. Still this is going to be a huge improvement in the quality of the stacks we get.

Nathan Froyd [:froydnj]

Assignee

Comment 29

•

5 years ago

emilio and kats were kind enough to provide some other examples of crash reports with bad stacks. Consider https://crash-stats.mozilla.com/report/index/8f912656-156f-4c1b-8bd7-571850190329 where we start with:

0 	libxul.so 	selectors::matching::matches_complex_selector_internal 	servo/components/style/gecko_bindings/sugar/ns_t_array.rs:17 	context
1 	libxul.so 	style::dom_apis::query_selector_slow 	servo/components/selectors/matching.rs:342 	cfi

The first frame is a little weird: we're said to be in the selectors crate, but the source file is bits for nsTArray? The second line is strange for much the same reason.

Reprocessing the crash with new symbol information gives:

 0  libxul.so!selectors::matching::matches_complex_selector_internal [matching.rs : 488 + 0x292]
 1  libxul.so!style::dom_apis::query_selector_slow [dom_apis.rs : 573 + 0x170]

which corresponds to

https://hg.mozilla.org/releases/mozilla-beta/annotate/a89261364728dbfe37351e0de812e906c436a5d4/servo/components/selectors/matching.rs#l488
https://hg.mozilla.org/releases/mozilla-beta/annotate/a89261364728dbfe37351e0de812e906c436a5d4/servo/components/style/dom_apis.rs#l573

Which gives us at least better pointers to where we were on the call stack, even if it's not immediately obvious how we get into nsTArray bits from that matches_complex_selector_internal line.

Or https://crash-stats.mozilla.com/report/index/4145fdb1-7a34-42d1-a7c9-8b3df0190330, where we have some nonsense frames in the middle of the call stack:

9 	libxul.so 	webrender::renderer::TextureResolver::bind 	src/libcore/option.rs:322 	cfi
10 	libxul.so 	webrender::renderer::Renderer::draw_instanced_batch 	gfx/wr/webrender/src/renderer.rs:3398 	cfi
11 	libxul.so 	webrender::renderer::Renderer::draw_color_target::{{closure}} 	gfx/wr/webrender/src/renderer.rs:3900 	cfi
12 	libxul.so 	webrender::renderer::Renderer::draw_color_target 	gfx/wr/webrender/src/renderer.rs:3734 	cfi

TextureResolver::bind doesn't live in libcore.

Reprocessing the crash with new symbols, we get:

 9  libxul.so!webrender::renderer::TextureResolver::bind [renderer.rs : 1053 + 0x8a]
10  libxul.so!webrender::renderer::Renderer::draw_instanced_batch [renderer.rs : 3398 + 0x10]
11  libxul.so!webrender::renderer::Renderer::draw_color_target::{{closure}} [renderer.rs : 3900 + 0x5]
12  libxul.so!webrender::renderer::Renderer::draw_color_target [renderer.rs : 3734 + 0x15]

and that tells us we're crashing inside TextureResolver::bind at:

https://hg.mozilla.org/mozilla-central/annotate/69e9ee0ef3dd97a412d90be9c8377d660d08cbc2/gfx/wr/webrender/src/renderer.rs#l1053

which makes a lot more sense.

Pulsebot

Comment 30

•

5 years ago

Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/578c94538897
part 1 - extract file information out of .debug_line parsing; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/589e276c75fa
part 2 - parse DW_TAG_inlined_subroutine DIEs; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/b3e5b74ed19f
part 3 - replace line information for inlined functions; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/d048bcf083e5
part 4 - look through lexical block DIEs where appropriate; r=gsvelto
https://hg.mozilla.org/integration/autoland/rev/17040bb20e25
part 5 - merge adjacent line records where possible; r=gsvelto

Narcis Beleuzu [:NarcisB]

Comment 31

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/578c94538897
https://hg.mozilla.org/mozilla-central/rev/589e276c75fa
https://hg.mozilla.org/mozilla-central/rev/b3e5b74ed19f
https://hg.mozilla.org/mozilla-central/rev/d048bcf083e5
https://hg.mozilla.org/mozilla-central/rev/17040bb20e25

Status: NEW → RESOLVED

Closed: 5 years ago

status-firefox68: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla68

BugBot [:suhaib / :marco/ :calixte]

Updated

•

5 years ago

Assignee: nobody → nfroyd

Ionuț Goldan [:igoldan]

Updated

•

5 years ago

Regressions: 1546882

Karl Tomlinson (:karlt)

Updated

•

5 years ago

"dwarfdump --lookup" output with TAG_inlined_subroutine 7 years ago Markus Stange [:mstange] 5.77 KB, text/plain		Details
Bug 524410 - part 1 - extract file information out of .debug_line parsing; r=gsvelto 5 years ago Nathan Froyd [:froydnj] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 524410 - part 2 - parse DW_TAG_inlined_subroutine DIEs; r=gsvelto 5 years ago Nathan Froyd [:froydnj] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 524410 - part 3 - replace line information for inlined functions; r=gsvelto 5 years ago Nathan Froyd [:froydnj] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 524410 - part 4 - look through lexical block DIEs where appropriate; r=gsvelto 5 years ago Nathan Froyd [:froydnj] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 524410 - part 5 - merge adjacent line records where possible; r=gsvelto 5 years ago Nathan Froyd [:froydnj] 47 bytes, text/x-phabricator-request		Details \| Review