Last Comment Bug 526484 - symbol dumper behavior fails to include certain code blocks produced by PGO (some sections of functions missing in output)
: symbol dumper behavior fails to include certain code blocks produced by PGO (...
Status: RESOLVED FIXED
:
Product: Toolkit
Classification: Components
Component: Breakpad Integration (show other bugs)
: Trunk
: x86 Windows XP
: -- normal (vote)
: ---
Assigned To: Ted Mielczarek [:ted.mielczarek]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2009-11-04 07:00 PST by Ted Mielczarek [:ted.mielczarek]
Modified: 2010-02-19 13:37 PST (History)
6 users (show)
mbeltzner: blocking1.9.2-
mbeltzner: wanted1.9.2+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
final-fixed
.8-fixed


Attachments
update dump_syms.exe to one built from SVN r430 (60.51 KB, patch)
2009-11-24 06:20 PST, Ted Mielczarek [:ted.mielczarek]
benjamin: review+
mbeltzner: approval1.9.2+
mbeltzner: approval1.9.1.8+
Details | Diff | Review

Description Ted Mielczarek [:ted.mielczarek] 2009-11-04 07:00:12 PST
In bug 525326 comment 18, Jeff notes that the top of the stack here:
http://crash-stats.mozilla.com/report/index/7c622a06-1d57-4d34-afd8-540af2091016
is in known code, despite showing up as "xul.dll@0x34311e". If you look at the source of the frame above it, it's clearly calling "GifWrite()", and I suspect GifWrite has been inlined, since that's the only call site for it. I think maybe the Windows symbol dumper isn't handling inlined functions correctly in this case.

We've also seen other instances that could be the same thing, for example:
http://crash-stats.mozilla.com/report/index/1c3fde12-d276-4c7b-9d2c-917a82091104?p=1
We clearly have mozjs.dll symbols there, just not for that top frame.

jimb noted elsewhere while working on DWARF CFI work that the Breakpad symbol format doesn't actually handle inlined methods cleanly. IIRC, on OSX/Linux we currently just stick the file/line info of the inlined method inside the function data of the containing method, so in crash stacks you get the name of the containing method, and the file/line info of the inlined method.
Comment 1 Ted Mielczarek [:ted.mielczarek] 2009-11-17 07:37:09 PST
The same crash on 3.5.4, since the crash reports linked in comment 0 are nightlies, and as such their symbols have already been deleted from the symbol server (oops):
http://crash-stats.mozilla.com/report/index/5878c51c-c28f-4bf4-afff-030c52091112

The libxul symbols for 3.5.4 on Windows:
http://symbols.mozilla.org/firefox/xul.pdb/438A50233B37496A8B73EA977D68E98F2/xul.sym
(there's also the compressed xul.pd_ in that same directory)

There appears to be an entry for GifWrite in the symbol file:
FUNC 157f10 4c4 8 nsGIFDecoder2::GifWrite(unsigned char const *,unsigned int)

but that's clearly not the address in the stack trace (xul.dll@0x348864).

Also, the entry for nsGIFDecoder2::ProcessData looks sane:
FUNC 188635 2e c nsGIFDecoder2::ProcessData(unsigned char *,unsigned int,unsigne
d int *)
188635 0 247 193060
188635 e 250 193060
188643 9 251 193060
18864c d 254 193060
188659 7 260 193060
188660 3 263 193060

(that's the entirety of it). So it doesn't appear to actually have been inlined.

I'm going to poke at the PDB file with the DIA APIs to see what it thinks is at that address.
Comment 2 Ted Mielczarek [:ted.mielczarek] 2009-11-17 09:04:18 PST
using the DIA API findSymbolByRVA, I get the following info for that address:

Found something at 0x348864
==================
Symbol type: SymTagBlock
No name present
No source file present
Is code: no
Is function: no
Has debug info: no
Is compiler generated: no
Found lexical parent
==================
Symbol type: SymTagFunction
Name: nsGIFDecoder2::GifWrite
Source File: nsGIFDecoder2::GifWrite
Is code: no
Is function: no
Has debug info: no
Is compiler generated: no

So it's a code block that's apparently a child of GifWrite, but it doesn't have its own name.
Comment 3 Ted Mielczarek [:ted.mielczarek] 2009-11-17 09:15:44 PST
I guess part of the problem here is that Breakpad's symbol format doesn't know how to represent noncontiguous functions, which is what this apparently is. The function printer in the symbol dumping code just asks for a list of functions by RVA, and then prints them:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/common/windows/pdb_source_line_writer.cc#226

Since this isn't actually a function, it probably doesn't get enumerated there. If I had to guess, I would think that we could call IDiaSymbol::findChildren on each function we get, then for any blocks we find that aren't contained in the function's address space, we could print out an extra FUNC line with the block's rva and size, and its line info.
Comment 4 Ted Mielczarek [:ted.mielczarek] 2009-11-18 04:40:14 PST
That guess doesn't seem to pan out. Using DIA, I can get this block if I call findSymbolByRVA with its RVA, and then get the parent function using get_lexicalParent, but if I have the parent function, this block isn't anywhere I can get it using findChildren (it's not a descendant node at all in that tree).

There must be some way to collect these orphaned blocks and associate them with the function they came from, I'll keep looking. Currently dump_syms uses getSymbolsByAddr to enumerate over all functions in the PDB, but this doesn't find these code blocks.
Comment 5 Ted Mielczarek [:ted.mielczarek] 2009-11-19 11:21:06 PST
Ok, after much fiddling, I was able to hack dump_syms such that it can locate these missing blocks of code. I can make it output extra FUNC lines, so that we should get function names for these blocks of code, but there doesn't appear to be source line data available at all in the PDB files. This means that we won't get source file + line numbers for them. I'll have to test with VC9 and see if it's any better in this regard. If not, I'll file a bug on Microsoft.
Comment 6 Ted Mielczarek [:ted.mielczarek] 2009-11-20 06:17:52 PST
Since I said I thought there was a compiler bug (or at least deficiency) here, Jeff asked what would happen if a Microsoft debugger were to look at these blocks of code. I used the testcase from bug 525326 to crash a Firefox 3.5.4 with WinDBG attached. The top of the stack looks like:
ChildEBP RetAddr  
0012f96c 10188643 xul!nsGIFDecoder2::GifWrite+0x1f0a35
0012f978 1018867b xul!nsGIFDecoder2::ProcessData(unsigned char * data = 0x025ed608 "D???", unsigned int count = 0x4e0b020, unsigned int * _retval = 0x04cb20a0)+0xe [e:\builds\moz2_slave\win32_build\build\modules\libpr0n\decoders\gif\nsgifdecoder2.cpp @ 251]

I don't get source info for the top frame. I have source server debugging enabled, and I get source if I click on the second frame. It seems pretty clear that the compiler just isn't emitting full debug info for these blocks of code that were rearranged by the optimizer. I'm going to try producing a reduced testcase so I can test on VC9 (and the VC10 beta). I'll also get my patch cleaned up and submitted upstream, since having function names is better than what we have now (and appears to be the best we can do with the compiler limitations).
Comment 7 Ted Mielczarek [:ted.mielczarek] 2009-11-20 09:32:16 PST
Patch up for upstream review:
http://breakpad.appspot.com/40007
Comment 8 Ted Mielczarek [:ted.mielczarek] 2009-11-20 13:21:13 PST
I landed this upstream. We should just take it locally (and on branches) because merging with upstream is a pain due to the Linux client rewrite right now. (I will merge that in bug 514188 eventually.)
Comment 9 Ted Mielczarek [:ted.mielczarek] 2009-11-23 06:30:03 PST
Actually, I forgot, we don't build dump_syms on Windows (because it doesn't build on MSVC Express, I think), so I just need to update the checked-in binary.
Comment 10 Ted Mielczarek [:ted.mielczarek] 2009-11-23 06:30:44 PST
I have another patch awaiting review upstream that I'd like to land first:
http://breakpad.appspot.com/43001/show
Comment 11 Ted Mielczarek [:ted.mielczarek] 2009-11-24 06:20:33 PST
Created attachment 414265 [details] [diff] [review]
update dump_syms.exe to one built from SVN r430

Not much to review here, really. The upstream patches that this picks up are listed in this bug. I built this using VC8, and the dump_syms.vcproj in the Breakpad tree. The only change I made was to switch it to static CRT linkage.

We should take this on all of our branches, ideally, since it's nothing but good.

I can even regenerate our existing symbols for the latest release on each branch using this, I'll file a separate bug for that.
Comment 12 Ted Mielczarek [:ted.mielczarek] 2009-11-24 09:09:19 PST
I filed bug 530783 on that.
Comment 13 Ted Mielczarek [:ted.mielczarek] 2009-12-14 04:41:47 PST
Pushed to trunk:
http://hg.mozilla.org/mozilla-central/rev/246999503836
Comment 14 Ted Mielczarek [:ted.mielczarek] 2009-12-14 04:42:52 PST
Comment on attachment 414265 [details] [diff] [review]
update dump_syms.exe to one built from SVN r430

We should take this on branches to get better symbols on Windows builds.
Comment 15 Daniel Veditz [:dveditz] 2009-12-21 15:40:53 PST
Maybe this is 3.6.1 instead of 3.6, but nominating anyway. Would like this landed on 1.9.2 before 1.9.1
Comment 16 Mike Beltzner [:beltzner, not reading bugmail] 2009-12-21 20:20:13 PST
Comment on attachment 414265 [details] [diff] [review]
update dump_syms.exe to one built from SVN r430

a192=beltzner, let's get this in, but if it bounces, we'll pull it, so not blocking.
Comment 17 Ted Mielczarek [:ted.mielczarek] 2009-12-22 04:12:13 PST
Pushed to 1.9.2:
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/c576c15e01a9

Kind of tough to verify, but I ssh'ed to a place with access to the symbol store, and checked the count of functions reported in each symbol file:
$ grep xul.sym firefox-3.7a1pre-WINNT-20091214045914-symbols.txt
xul.pdb/F6409D49C333416297B8388C2C8F5F592/xul.sym
$ grep ^FUNC xul.pdb/F6409D49C333416297B8388C2C8F5F592/xul.sym | wc -l
63971

$ grep xul.sym firefox-3.7a1pre-WINNT-20091215045531-symbols.txt
xul.pdb/52C531559AD943F8A6FB24BB407EE5292/xul.sym
$ grep ^FUNC xul.pdb/52C531559AD943F8A6FB24BB407EE5292/xul.sym | wc -l
74972

You can see that we're reporting about 11k more function names, which are the bits that were missing.
Comment 18 Mike Beltzner [:beltzner, not reading bugmail] 2010-01-19 10:17:00 PST
Comment on attachment 414265 [details] [diff] [review]
update dump_syms.exe to one built from SVN r430

a=beltzner for 191 branch
Comment 19 Ted Mielczarek [:ted.mielczarek] 2010-01-19 10:35:07 PST
Verified on 1.9.2, while I was at it:
$ grep xul.sym firefox-3.6b6pre-WINNT-20091221053555-symbols.txt
xul.pdb/E3681D36E1A74C349CF24A8FDB19B2232/xul.sym
$ grep ^FUNC xul.pdb/E3681D36E1A74C349CF24A8FDB19B2232/xul.sym | wc -l
63017

$ grep xul.sym firefox-3.6b6pre-WINNT-20091222062748-symbols.txt
xul.pdb/263DF4266ABA49A4A9BF4929F17BF4D12/xul.sym
$ grep ^FUNC xul.pdb/263DF4266ABA49A4A9BF4929F17BF4D12/xul.sym | wc -l
71349
Comment 20 Ted Mielczarek [:ted.mielczarek] 2010-01-19 10:40:32 PST
Pushed to 1.9.1:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/0b051f59e0ab

Note You need to log in before you can comment on or make changes to this bug.