Breakpad Linux Dumper: support inlined functions

NEW
Unassigned

Status

()

Toolkit
Crash Reporting
9 years ago
4 months ago

People

(Reporter: jimb, Unassigned)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

9 years ago
At the moment, Breakpad's support for dumping DWARF on Linux does not support inlined functions.

This may require a slight modification to the Breakpad symbol file format, which has no special support for inlined functions. In particular, the processor assumes that all FUNC records cover disjoint address ranges, so if f has been inlined into g, you can't say:

    FUNC 1000 50 0 g
    FUNC 1020 10 0 f

to say that g covers the 0x50 bytes starting at 0x1000, but within that range the 0x10 bytes at 0x1020 are an inlined instance of f. Breakpad instead requires you to say:

    FUNC 1000 20 0 g
    FUNC 1020 10 0 f
    FUNC 1030 20 0 g

That is, you have to split g into two "functions".

Which is baloney, of course. Nested FUNC records like the above should be permitted, and the processor should choose the innermost FUNC covering the address in question.
(Reporter)

Updated

9 years ago
Assignee: nobody → jim
I think on other platforms you just get data that looks like:
FILE 1 foo.c
FILE 2 foo.h
FUNC 1000 50 0 g
1000 4 1 1
1004 4 1 2
1008 4 3 1

It's not the worst thing ever, since you generally get the right source line, but it can be confusing.

Implementing this as described would require some changes to the processor's data structures, because it currently uses a RangeMap to store the function info:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/basic_source_line_resolver.cc#171
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/range_map.h
and RangeMap doesn't permit overlapping address ranges.
You could switch to ContainedRangeMap, which allows for fully contained child address ranges:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/processor/contained_range_map.h
That'd probably have a negative perf impact, but I guess there's not much you can do about that.
(Reporter)

Comment 2

9 years ago
Yeah, I just noticed that, even though functioninfo.cc does recognize DW_TAG_inlined_subroutine, it doesn't seem to actually output an entry for the inlined function.

I'm really glad to hear about ContainedRangeMap.  It seems like, in the absence of inlined functions, that degenerates into an ordinary map, so at least on first principles I wouldn't expect the performance hit would be too much.

The thing is, if we just list the inlined function *instead* of the function it was inlined into, that's pretty confusing, because the caller is something unconnected.
(Reporter)

Comment 3

9 years ago
However, since I see now that other platforms don't support inlined functions, I don't think this should block 517824 any more.
No longer blocks: 517824
(In reply to comment #2)
> The thing is, if we just list the inlined function *instead* of the function it
> was inlined into, that's pretty confusing, because the caller is something
> unconnected.

No, it's slightly more confusing than that, because the function name is the containing function, but the source file+line are that of the inlined function.
On Windows, we appear to completely skip inlined code. See bug 492675 comment 39 for an example. AddImportantRules is inlined into nsStyleSet::FileRules, and the breakpad output for nsStyleSet::FileRules just has a line like:
101831 88 573 42076

which encompasses the entire inlined function, but represents it as a single line in the containing function.
(Reporter)

Updated

8 years ago
Duplicate of this bug: 563776
(Reporter)

Comment 7

6 years ago
I'm not actively working on this, and I don't have plans to in the near future, so I'm de-assigning myself to it.
Assignee: jimb → nobody
Blocks: 793416
I'd like to propose a different format change to handle this. I don't think we should have overlapping FUNC ranges for inlined functions, because that would probably lead to a lot of repetition of the function names of inlined functions. I'd keep the FUNC lines the way they are and only have them for the outer functions.

We can have a table at the beginning of the symbol dump that lists the function names of all functions that get inlined anywhere, similar to the FILE table:
INLINE 0 SomeFunction
INLINE 1 SomeOtherFunction

And then we can refer to those inlined functions using the pattern
<address> <line_size> <line_number> <file_number> <inline_number>
which is very similar to the existing pattern
<address> <line_size> <line_number> <file_number>
, just with an extra field at the end.

And then we can allow these ranges to overlap / nest.

Here's an example:

dom/base/nsAttrAndChildArray.cpp:
> 846 bool
> 847 nsAttrAndChildArray::GrowBy(uint32_t aGrowSize)
> 848 {
> 849   CheckedUint32 size = 0;
> 850   if (mImpl) {
> 851     size += mImpl->mBufferSize;
> 852     size += NS_IMPL_EXTRA_SIZE;
> 853     if (!size.isValid()) {
> 854       return false;
> 855     }
> 856   }

mfbt/CheckedInt.h:
> 256 IsAddValid(T aX, T aY)
> 257 {
> [...]
> 267  return IsSigned<T>::value
> 268          ? HasSignBit(BinaryComplement(T((result ^ aX) & (result ^ aY))))
> 269          : BinaryComplement(aX) >= aY;
> 270 }
> [...]
> 689
> 690 MOZ_CHECKEDINT_BASIC_BINARY_OPERATOR(Add, +)
> 691 MOZ_CHECKEDINT_BASIC_BINARY_OPERATOR(Sub, -)
> [...]
> 756
> 757 MOZ_CHECKEDINT_CONVENIENCE_BINARY_OPERATORS(+, +=)
> 758 MOZ_CHECKEDINT_CONVENIENCE_BINARY_OPERATORS(*, *=)
> 759 MOZ_CHECKEDINT_CONVENIENCE_BINARY_OPERATORS(-, -=)

Old:
> [...]
> FILE 602 hg:hg.mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 b 690 9655
> f2829e 3 269 9655
> f282a1 9 757 9655
> f282aa 3 853 602
> [...]

New:
> [...]
> FILE 602 mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> INLINE 482 CheckedUint32::operator+=
> INLINE 483 CheckedUint32::operator+
> INLINE 484 IsAddValid
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 17 852 602
> f28293 17 757 9655 482
> f28293 e 690 9655 483
> f2829e 3 269 9655 484
> f282aa 3 853 602
> [...]

or as ascii art:

>                   |---3---|         IsAddValid
>           |-------e-------|         CheckedUint32::operator+
>           |----------17-----------| CheckedUint32::operator+=
>   |---4---|----------17-----------| nsAttrAndChildArray::GrowBy
>   |---4---|---b---|---3---|---9---|
> f2828f  f28293  f2829e  f282a1  f282aa

So we still have the existing
<address> <line_size> <line_number> <file_number>
pattern, which always refers to the outer function (as defined by the current FUNC section), and ranges of this type are non-overlapping.

Only the new
<address> <line_size> <line_number> <file_number> <inline_number>
ranges overlap, with each other and with the existing ranges.

This gives us easy backwards compatibility for consumers that ignore the new pattern. They will simply assign all addresses within an outer function to the file and lines of that outer function.

And for every "stack frame", we now have all three pieces of information: function name, file, and line number.
(Reporter)

Comment 9

10 months ago
I like the general idea very much: commoning out the names in their own table, and then citing that table from records whose ranges can overlap.

From looking at our breakpad reader code[1], it seems that the parser will complain about a source line record with an extra field just as much as it would complain about an unrecognized record type. So I don't think we get backwards compatibility. So I think we might as well just invent a new record type ("INLINED"?) for the records showing inlined function calls.

[1]: http://searchfox.org/mozilla-central/rev/3a3af33f513071ea829debdfbc628caebcdf6996/toolkit/crashreporter/google-breakpad/src/processor/basic_source_line_resolver.cc#534-536
(Reporter)

Comment 10

10 months ago
So the parser would manage the stack by first considering the start address of the new line record, and popping inlined frames from the stack until it finds a frame that still covers the new start address; and then pushing an inlined frame when it sees one of these new records.

If we have to adapt consumers anyway, then we could have plain line records implicitly belong to the function of the youngest frame, and avoid repeating the function number on every line record within an inlined call.
(In reply to Jim Blandy :jimb from comment #10)
> So the parser would manage the stack by first considering the start address
> of the new line record, and popping inlined frames from the stack until it
> finds a frame that still covers the new start address; and then pushing an
> inlined frame when it sees one of these new records.

That's exactly correct.

> If we have to adapt consumers anyway, then we could have plain line records
> implicitly belong to the function of the youngest frame, and avoid repeating
> the function number on every line record within an inlined call.

That's an interesting idea. This would help if several consecutive lines belong to the same inlined function. So effectively you'd only have to mention the <inline_number> if the current line in the dump describes a different function than the previous line.
But you'd also have to have inline_numbers for the outer FUNC section itself, so that you have a number you can mention once you've left all inlined functions and are back at the outer function. So the INLINE table at the start of the dump might grow a little. It would be nice to find out if this suggestion results in an overall size saving.

In the example I gave in comment 8, every <inline_number> line refers to a different function than the previous line, so it wouldn't save us anything there. And we have an additional annotation in the line that takes us back to the outer function.
But this example is probably not representative.

> [...]
> FILE 602 mozilla.org/mozilla-central:dom/base/nsAttrAndChildArray.cpp:5928d905c0bc
> [...]
> FILE 9655 hg:hg.mozilla.org/mozilla-central:mfbt/CheckedInt.h:5928d905c0bc
> [...]
> INLINE 481 nsAttrAndChildArray::GrowBy(unsigned int)
(this line is new ^)
> INLINE 482 CheckedUint32::operator+=
> INLINE 483 CheckedUint32::operator+
> INLINE 484 IsAddValid
> [...]
> FUNC f28270 14d 0 nsAttrAndChildArray::GrowBy(unsigned int)
> f28270 17 848 602
> f28287 3 881 602
> f2828a 5 850 602
> f2828f 4 851 602
> f28293 17 852 602
> f28293 17 757 9655 482
> f28293 e 690 9655 483
> f2829e 3 269 9655 484
> f282aa 3 853 602 481
(and this line has ^^^)
> [...]
What does the DWARF look like for an inlined function? (I know I've seen it before but I can't recall offhand.)
Created attachment 8893527 [details]
"dwarfdump --lookup" output with TAG_inlined_subroutine

It uses TAG_inlined_subroutine. I'm attaching the output of
$ dwarfdump --lookup 0xdbb5a8 XUL.dSYM
(Reporter)

Comment 14

10 months ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #12)
> What does the DWARF look like for an inlined function? (I know I've seen it
> before but I can't recall offhand.)

I have a truly beautiful explanation of this, which this multi-gigabyte bug database is unfortunately too small to contain.

(It's very hairy; see the DWARF spec)
(Reporter)

Comment 15

10 months ago
> But you'd also have to have inline_numbers for the outer FUNC section itself, so that you have a number you can mention once you've left all inlined functions and are back at the outer function. So the INLINE table at the start of the dump might grow a little. It would be nice to find out if this suggestion results in an overall size saving.

I'm not sure I understand this. If we're managing the inlined instance stack as described in comment 10, it seems to me that we always know when a given line entry is not covered by any inlined instance. The line entries covered by the outer FUNC should just be those for which the stack is empty. No?
(Reporter)

Comment 16

10 months ago
(In reply to Jim Blandy :jimb from comment #14)
> I have a truly beautiful explanation of this, which this multi-gigabyte bug
> database is unfortunately too small to contain.

Less flippantly:

A DW_TAG_subprogram DIE can have a DW_AT_inline attribute that says whether it was declared inline and whether it was actually inlined. If it was inlined, the DW_TAG_subprogram and its children are called an "abstract instance tree", and aren't supposed to mention any attributes that vary from one inlined instance to another, like code addresses.

When you have an inlined callee, the calling function should have a DW_TAG_inlined_subroutine child whose DW_AT_abstract_origin attribute points to the DW_TAG_subroutine at the root of the abstract instance tree. The DW_TAG_inlined_subroutine DIE then supplies all the information specific to this inlined instance that was missing from the abstract instance tree. Each DW_TAG_inlined_subroutine DIE is called a "concrete inlined instance".

When you have a non-inlined instance of a function with a DW_AT_inline attribute, that's called a "concrete out-of-line instance". It's like a concrete inlined instance - it has a DW_AT_abstract_origin attribute, and supplements that with information specific to this out-of-line instance - but it uses DW_TAG_subroutine tag instead of DW_TAG_inlined_subroutine. These are supposed to appear as siblings of the abstract instance, but could appear elsewhere.

For example:

$ cat inline.c
cat: inline.c: No such file or directory
$ cat inline.cc
inline int f(int x) { return x*x; }

int g(int x) { return f(x) * f(x); }
$ g++ -g -O2 -c inline.cc
$ objdump -dr inline.o

inline.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_Z1gi>:
   0:	89 f8                	mov    %edi,%eax
   2:	0f af c7             	imul   %edi,%eax
   5:	0f af c0             	imul   %eax,%eax
   8:	c3                   	retq   
$ readelf -wi inline.o
Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length:        0x99 (32-bit)
   Version:       4
   Abbrev Offset: 0x0
   Pointer Size:  8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0x26): GNU C++14 6.4.1 20170727 (Red Hat 6.4.1-1) -mtune=generic -march=x86-64 -g -O2
    <10>   DW_AT_language    : 4	(C++)
    <11>   DW_AT_name        : (indirect string, offset: 0x16): inline.cc
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x6): /home/jimb/play
    <19>   DW_AT_low_pc      : 0x0
    <21>   DW_AT_high_pc     : 0x9
    <29>   DW_AT_stmt_list   : 0x0
 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2e>   DW_AT_external    : 1
    <2e>   DW_AT_name        : g
    <30>   DW_AT_decl_file   : 1
    <31>   DW_AT_decl_line   : 3
    <32>   DW_AT_linkage_name: (indirect string, offset: 0x0): _Z1gi
    <36>   DW_AT_type        : <0x7d>
    <3a>   DW_AT_low_pc      : 0x0
    <42>   DW_AT_high_pc     : 0x9
    <4a>   DW_AT_frame_base  : 1 byte block: 9c 	(DW_OP_call_frame_cfa)
    <4c>   DW_AT_GNU_all_call_sites: 1
    <4c>   DW_AT_sibling     : <0x7d>
 <2><50>: Abbrev Number: 3 (DW_TAG_formal_parameter)
    <51>   DW_AT_name        : x
    <53>   DW_AT_decl_file   : 1
    <54>   DW_AT_decl_line   : 3
    <55>   DW_AT_type        : <0x7d>
    <59>   DW_AT_location    : 1 byte block: 55 	(DW_OP_reg5 (rdi))
 <2><5b>: Abbrev Number: 4 (DW_TAG_inlined_subroutine)
    <5c>   DW_AT_abstract_origin: <0x84>
    <60>   DW_AT_low_pc      : 0x2
    <68>   DW_AT_high_pc     : 0x3
    <70>   DW_AT_call_file   : 1
    <71>   DW_AT_call_line   : 3
 <3><72>: Abbrev Number: 5 (DW_TAG_formal_parameter)
    <73>   DW_AT_abstract_origin: <0x92>
    <77>   DW_AT_location    : 0x0 (location list)
 <3><7b>: Abbrev Number: 0
 <2><7c>: Abbrev Number: 0
 <1><7d>: Abbrev Number: 6 (DW_TAG_base_type)
    <7e>   DW_AT_byte_size   : 4
    <7f>   DW_AT_encoding    : 5	(signed)
    <80>   DW_AT_name        : int
 <1><84>: Abbrev Number: 7 (DW_TAG_subprogram)
    <85>   DW_AT_external    : 1
    <85>   DW_AT_name        : f
    <87>   DW_AT_decl_file   : 1
    <88>   DW_AT_decl_line   : 1
    <89>   DW_AT_linkage_name: (indirect string, offset: 0x20): _Z1fi
    <8d>   DW_AT_type        : <0x7d>
    <91>   DW_AT_inline      : 3	(declared as inline and inlined)
 <2><92>: Abbrev Number: 8 (DW_TAG_formal_parameter)
    <93>   DW_AT_name        : x
    <95>   DW_AT_decl_file   : 1
    <96>   DW_AT_decl_line   : 1
    <97>   DW_AT_type        : <0x7d>
 <2><9b>: Abbrev Number: 0
 <1><9c>: Abbrev Number: 0

$
You need to log in before you can comment on or make changes to this bug.