Last Comment Bug 526484 - symbol dumper behavior fails to include certain code blocks produced by PGO (some sections of functions missing in output)
: symbol dumper behavior fails to include certain code blocks produced by PGO (...
Product: Toolkit
Classification: Components
Component: Breakpad Integration (show other bugs)
: Trunk
: x86 Windows XP
-- normal (vote)
: ---
Assigned To: Ted Mielczarek [:ted.mielczarek]
: Ted Mielczarek [:ted.mielczarek]
Depends on:
  Show dependency treegraph
Reported: 2009-11-04 07:00 PST by Ted Mielczarek [:ted.mielczarek]
Modified: 2010-02-19 13:37 PST (History)
6 users (show)
mbeltzner: blocking1.9.2-
mbeltzner: wanted1.9.2+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

update dump_syms.exe to one built from SVN r430 (60.51 KB, patch)
2009-11-24 06:20 PST, Ted Mielczarek [:ted.mielczarek]
benjamin: review+
mbeltzner: approval1.9.2+
mbeltzner: approval1.9.1.8+
Details | Diff | Splinter Review

Description User image Ted Mielczarek [:ted.mielczarek] 2009-11-04 07:00:12 PST
In bug 525326 comment 18, Jeff notes that the top of the stack here:
is in known code, despite showing up as "xul.dll@0x34311e". If you look at the source of the frame above it, it's clearly calling "GifWrite()", and I suspect GifWrite has been inlined, since that's the only call site for it. I think maybe the Windows symbol dumper isn't handling inlined functions correctly in this case.

We've also seen other instances that could be the same thing, for example:
We clearly have mozjs.dll symbols there, just not for that top frame.

jimb noted elsewhere while working on DWARF CFI work that the Breakpad symbol format doesn't actually handle inlined methods cleanly. IIRC, on OSX/Linux we currently just stick the file/line info of the inlined method inside the function data of the containing method, so in crash stacks you get the name of the containing method, and the file/line info of the inlined method.
Comment 1 User image Ted Mielczarek [:ted.mielczarek] 2009-11-17 07:37:09 PST
The same crash on 3.5.4, since the crash reports linked in comment 0 are nightlies, and as such their symbols have already been deleted from the symbol server (oops):

The libxul symbols for 3.5.4 on Windows:
(there's also the compressed xul.pd_ in that same directory)

There appears to be an entry for GifWrite in the symbol file:
FUNC 157f10 4c4 8 nsGIFDecoder2::GifWrite(unsigned char const *,unsigned int)

but that's clearly not the address in the stack trace (xul.dll@0x348864).

Also, the entry for nsGIFDecoder2::ProcessData looks sane:
FUNC 188635 2e c nsGIFDecoder2::ProcessData(unsigned char *,unsigned int,unsigne
d int *)
188635 0 247 193060
188635 e 250 193060
188643 9 251 193060
18864c d 254 193060
188659 7 260 193060
188660 3 263 193060

(that's the entirety of it). So it doesn't appear to actually have been inlined.

I'm going to poke at the PDB file with the DIA APIs to see what it thinks is at that address.
Comment 2 User image Ted Mielczarek [:ted.mielczarek] 2009-11-17 09:04:18 PST
using the DIA API findSymbolByRVA, I get the following info for that address:

Found something at 0x348864
Symbol type: SymTagBlock
No name present
No source file present
Is code: no
Is function: no
Has debug info: no
Is compiler generated: no
Found lexical parent
Symbol type: SymTagFunction
Name: nsGIFDecoder2::GifWrite
Source File: nsGIFDecoder2::GifWrite
Is code: no
Is function: no
Has debug info: no
Is compiler generated: no

So it's a code block that's apparently a child of GifWrite, but it doesn't have its own name.
Comment 3 User image Ted Mielczarek [:ted.mielczarek] 2009-11-17 09:15:44 PST
I guess part of the problem here is that Breakpad's symbol format doesn't know how to represent noncontiguous functions, which is what this apparently is. The function printer in the symbol dumping code just asks for a list of functions by RVA, and then prints them:

Since this isn't actually a function, it probably doesn't get enumerated there. If I had to guess, I would think that we could call IDiaSymbol::findChildren on each function we get, then for any blocks we find that aren't contained in the function's address space, we could print out an extra FUNC line with the block's rva and size, and its line info.
Comment 4 User image Ted Mielczarek [:ted.mielczarek] 2009-11-18 04:40:14 PST
That guess doesn't seem to pan out. Using DIA, I can get this block if I call findSymbolByRVA with its RVA, and then get the parent function using get_lexicalParent, but if I have the parent function, this block isn't anywhere I can get it using findChildren (it's not a descendant node at all in that tree).

There must be some way to collect these orphaned blocks and associate them with the function they came from, I'll keep looking. Currently dump_syms uses getSymbolsByAddr to enumerate over all functions in the PDB, but this doesn't find these code blocks.
Comment 5 User image Ted Mielczarek [:ted.mielczarek] 2009-11-19 11:21:06 PST
Ok, after much fiddling, I was able to hack dump_syms such that it can locate these missing blocks of code. I can make it output extra FUNC lines, so that we should get function names for these blocks of code, but there doesn't appear to be source line data available at all in the PDB files. This means that we won't get source file + line numbers for them. I'll have to test with VC9 and see if it's any better in this regard. If not, I'll file a bug on Microsoft.
Comment 6 User image Ted Mielczarek [:ted.mielczarek] 2009-11-20 06:17:52 PST
Since I said I thought there was a compiler bug (or at least deficiency) here, Jeff asked what would happen if a Microsoft debugger were to look at these blocks of code. I used the testcase from bug 525326 to crash a Firefox 3.5.4 with WinDBG attached. The top of the stack looks like:
ChildEBP RetAddr  
0012f96c 10188643 xul!nsGIFDecoder2::GifWrite+0x1f0a35
0012f978 1018867b xul!nsGIFDecoder2::ProcessData(unsigned char * data = 0x025ed608 "D???", unsigned int count = 0x4e0b020, unsigned int * _retval = 0x04cb20a0)+0xe [e:\builds\moz2_slave\win32_build\build\modules\libpr0n\decoders\gif\nsgifdecoder2.cpp @ 251]

I don't get source info for the top frame. I have source server debugging enabled, and I get source if I click on the second frame. It seems pretty clear that the compiler just isn't emitting full debug info for these blocks of code that were rearranged by the optimizer. I'm going to try producing a reduced testcase so I can test on VC9 (and the VC10 beta). I'll also get my patch cleaned up and submitted upstream, since having function names is better than what we have now (and appears to be the best we can do with the compiler limitations).
Comment 7 User image Ted Mielczarek [:ted.mielczarek] 2009-11-20 09:32:16 PST
Patch up for upstream review:
Comment 8 User image Ted Mielczarek [:ted.mielczarek] 2009-11-20 13:21:13 PST
I landed this upstream. We should just take it locally (and on branches) because merging with upstream is a pain due to the Linux client rewrite right now. (I will merge that in bug 514188 eventually.)
Comment 9 User image Ted Mielczarek [:ted.mielczarek] 2009-11-23 06:30:03 PST
Actually, I forgot, we don't build dump_syms on Windows (because it doesn't build on MSVC Express, I think), so I just need to update the checked-in binary.
Comment 10 User image Ted Mielczarek [:ted.mielczarek] 2009-11-23 06:30:44 PST
I have another patch awaiting review upstream that I'd like to land first:
Comment 11 User image Ted Mielczarek [:ted.mielczarek] 2009-11-24 06:20:33 PST
Created attachment 414265 [details] [diff] [review]
update dump_syms.exe to one built from SVN r430

Not much to review here, really. The upstream patches that this picks up are listed in this bug. I built this using VC8, and the dump_syms.vcproj in the Breakpad tree. The only change I made was to switch it to static CRT linkage.

We should take this on all of our branches, ideally, since it's nothing but good.

I can even regenerate our existing symbols for the latest release on each branch using this, I'll file a separate bug for that.
Comment 12 User image Ted Mielczarek [:ted.mielczarek] 2009-11-24 09:09:19 PST
I filed bug 530783 on that.
Comment 13 User image Ted Mielczarek [:ted.mielczarek] 2009-12-14 04:41:47 PST
Pushed to trunk:
Comment 14 User image Ted Mielczarek [:ted.mielczarek] 2009-12-14 04:42:52 PST
Comment on attachment 414265 [details] [diff] [review]
update dump_syms.exe to one built from SVN r430

We should take this on branches to get better symbols on Windows builds.
Comment 15 User image Daniel Veditz [:dveditz] 2009-12-21 15:40:53 PST
Maybe this is 3.6.1 instead of 3.6, but nominating anyway. Would like this landed on 1.9.2 before 1.9.1
Comment 16 User image Mike Beltzner [:beltzner, not reading bugmail] 2009-12-21 20:20:13 PST
Comment on attachment 414265 [details] [diff] [review]
update dump_syms.exe to one built from SVN r430

a192=beltzner, let's get this in, but if it bounces, we'll pull it, so not blocking.
Comment 17 User image Ted Mielczarek [:ted.mielczarek] 2009-12-22 04:12:13 PST
Pushed to 1.9.2:

Kind of tough to verify, but I ssh'ed to a place with access to the symbol store, and checked the count of functions reported in each symbol file:
$ grep xul.sym firefox-3.7a1pre-WINNT-20091214045914-symbols.txt
$ grep ^FUNC xul.pdb/F6409D49C333416297B8388C2C8F5F592/xul.sym | wc -l

$ grep xul.sym firefox-3.7a1pre-WINNT-20091215045531-symbols.txt
$ grep ^FUNC xul.pdb/52C531559AD943F8A6FB24BB407EE5292/xul.sym | wc -l

You can see that we're reporting about 11k more function names, which are the bits that were missing.
Comment 18 User image Mike Beltzner [:beltzner, not reading bugmail] 2010-01-19 10:17:00 PST
Comment on attachment 414265 [details] [diff] [review]
update dump_syms.exe to one built from SVN r430

a=beltzner for 191 branch
Comment 19 User image Ted Mielczarek [:ted.mielczarek] 2010-01-19 10:35:07 PST
Verified on 1.9.2, while I was at it:
$ grep xul.sym firefox-3.6b6pre-WINNT-20091221053555-symbols.txt
$ grep ^FUNC xul.pdb/E3681D36E1A74C349CF24A8FDB19B2232/xul.sym | wc -l

$ grep xul.sym firefox-3.6b6pre-WINNT-20091222062748-symbols.txt
$ grep ^FUNC xul.pdb/263DF4266ABA49A4A9BF4929F17BF4D12/xul.sym | wc -l
Comment 20 User image Ted Mielczarek [:ted.mielczarek] 2010-01-19 10:40:32 PST
Pushed to 1.9.1:

Note You need to log in before you can comment on or make changes to this bug.