Closed
Bug 526484
Opened 15 years ago
Closed 15 years ago
symbol dumper behavior fails to include certain code blocks produced by PGO (some sections of functions missing in output)
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
status1.9.2 | --- | final-fixed |
status1.9.1 | --- | .8-fixed |
People
(Reporter: ted, Assigned: ted)
Details
Attachments
(1 file)
60.51 KB,
patch
|
benjamin
:
review+
beltzner
:
approval1.9.2+
beltzner
:
approval1.9.1.8+
|
Details | Diff | Splinter Review |
In bug 525326 comment 18, Jeff notes that the top of the stack here:
http://crash-stats.mozilla.com/report/index/7c622a06-1d57-4d34-afd8-540af2091016
is in known code, despite showing up as "xul.dll@0x34311e". If you look at the source of the frame above it, it's clearly calling "GifWrite()", and I suspect GifWrite has been inlined, since that's the only call site for it. I think maybe the Windows symbol dumper isn't handling inlined functions correctly in this case.
We've also seen other instances that could be the same thing, for example:
http://crash-stats.mozilla.com/report/index/1c3fde12-d276-4c7b-9d2c-917a82091104?p=1
We clearly have mozjs.dll symbols there, just not for that top frame.
jimb noted elsewhere while working on DWARF CFI work that the Breakpad symbol format doesn't actually handle inlined methods cleanly. IIRC, on OSX/Linux we currently just stick the file/line info of the inlined method inside the function data of the containing method, so in crash stacks you get the name of the containing method, and the file/line info of the inlined method.
Assignee | ||
Comment 1•15 years ago
|
||
The same crash on 3.5.4, since the crash reports linked in comment 0 are nightlies, and as such their symbols have already been deleted from the symbol server (oops):
http://crash-stats.mozilla.com/report/index/5878c51c-c28f-4bf4-afff-030c52091112
The libxul symbols for 3.5.4 on Windows:
http://symbols.mozilla.org/firefox/xul.pdb/438A50233B37496A8B73EA977D68E98F2/xul.sym
(there's also the compressed xul.pd_ in that same directory)
There appears to be an entry for GifWrite in the symbol file:
FUNC 157f10 4c4 8 nsGIFDecoder2::GifWrite(unsigned char const *,unsigned int)
but that's clearly not the address in the stack trace (xul.dll@0x348864).
Also, the entry for nsGIFDecoder2::ProcessData looks sane:
FUNC 188635 2e c nsGIFDecoder2::ProcessData(unsigned char *,unsigned int,unsigne
d int *)
188635 0 247 193060
188635 e 250 193060
188643 9 251 193060
18864c d 254 193060
188659 7 260 193060
188660 3 263 193060
(that's the entirety of it). So it doesn't appear to actually have been inlined.
I'm going to poke at the PDB file with the DIA APIs to see what it thinks is at that address.
Summary: investigate symbol dumper behavior with inlined functions on windows → investigate symbol dumper behavior on windows
Assignee | ||
Comment 2•15 years ago
|
||
using the DIA API findSymbolByRVA, I get the following info for that address:
Found something at 0x348864
==================
Symbol type: SymTagBlock
No name present
No source file present
Is code: no
Is function: no
Has debug info: no
Is compiler generated: no
Found lexical parent
==================
Symbol type: SymTagFunction
Name: nsGIFDecoder2::GifWrite
Source File: nsGIFDecoder2::GifWrite
Is code: no
Is function: no
Has debug info: no
Is compiler generated: no
So it's a code block that's apparently a child of GifWrite, but it doesn't have its own name.
Assignee | ||
Comment 3•15 years ago
|
||
I guess part of the problem here is that Breakpad's symbol format doesn't know how to represent noncontiguous functions, which is what this apparently is. The function printer in the symbol dumping code just asks for a list of functions by RVA, and then prints them:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/common/windows/pdb_source_line_writer.cc#226
Since this isn't actually a function, it probably doesn't get enumerated there. If I had to guess, I would think that we could call IDiaSymbol::findChildren on each function we get, then for any blocks we find that aren't contained in the function's address space, we could print out an extra FUNC line with the block's rva and size, and its line info.
Assignee | ||
Comment 4•15 years ago
|
||
That guess doesn't seem to pan out. Using DIA, I can get this block if I call findSymbolByRVA with its RVA, and then get the parent function using get_lexicalParent, but if I have the parent function, this block isn't anywhere I can get it using findChildren (it's not a descendant node at all in that tree).
There must be some way to collect these orphaned blocks and associate them with the function they came from, I'll keep looking. Currently dump_syms uses getSymbolsByAddr to enumerate over all functions in the PDB, but this doesn't find these code blocks.
Assignee | ||
Updated•15 years ago
|
Summary: investigate symbol dumper behavior on windows → symbol dumper behavior fails to include certain code blocks produced by PGO (some sections of functions missing in output)
Assignee | ||
Comment 5•15 years ago
|
||
Ok, after much fiddling, I was able to hack dump_syms such that it can locate these missing blocks of code. I can make it output extra FUNC lines, so that we should get function names for these blocks of code, but there doesn't appear to be source line data available at all in the PDB files. This means that we won't get source file + line numbers for them. I'll have to test with VC9 and see if it's any better in this regard. If not, I'll file a bug on Microsoft.
Assignee | ||
Comment 6•15 years ago
|
||
Since I said I thought there was a compiler bug (or at least deficiency) here, Jeff asked what would happen if a Microsoft debugger were to look at these blocks of code. I used the testcase from bug 525326 to crash a Firefox 3.5.4 with WinDBG attached. The top of the stack looks like:
ChildEBP RetAddr
0012f96c 10188643 xul!nsGIFDecoder2::GifWrite+0x1f0a35
0012f978 1018867b xul!nsGIFDecoder2::ProcessData(unsigned char * data = 0x025ed608 "D???", unsigned int count = 0x4e0b020, unsigned int * _retval = 0x04cb20a0)+0xe [e:\builds\moz2_slave\win32_build\build\modules\libpr0n\decoders\gif\nsgifdecoder2.cpp @ 251]
I don't get source info for the top frame. I have source server debugging enabled, and I get source if I click on the second frame. It seems pretty clear that the compiler just isn't emitting full debug info for these blocks of code that were rearranged by the optimizer. I'm going to try producing a reduced testcase so I can test on VC9 (and the VC10 beta). I'll also get my patch cleaned up and submitted upstream, since having function names is better than what we have now (and appears to be the best we can do with the compiler limitations).
Assignee | ||
Updated•15 years ago
|
Assignee: nobody → ted.mielczarek
Status: NEW → ASSIGNED
Assignee | ||
Comment 7•15 years ago
|
||
Patch up for upstream review:
http://breakpad.appspot.com/40007
Assignee | ||
Comment 8•15 years ago
|
||
I landed this upstream. We should just take it locally (and on branches) because merging with upstream is a pain due to the Linux client rewrite right now. (I will merge that in bug 514188 eventually.)
Assignee | ||
Comment 9•15 years ago
|
||
Actually, I forgot, we don't build dump_syms on Windows (because it doesn't build on MSVC Express, I think), so I just need to update the checked-in binary.
Assignee | ||
Comment 10•15 years ago
|
||
I have another patch awaiting review upstream that I'd like to land first:
http://breakpad.appspot.com/43001/show
Assignee | ||
Comment 11•15 years ago
|
||
Not much to review here, really. The upstream patches that this picks up are listed in this bug. I built this using VC8, and the dump_syms.vcproj in the Breakpad tree. The only change I made was to switch it to static CRT linkage.
We should take this on all of our branches, ideally, since it's nothing but good.
I can even regenerate our existing symbols for the latest release on each branch using this, I'll file a separate bug for that.
Attachment #414265 -
Flags: review?(benjamin)
Updated•15 years ago
|
Attachment #414265 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 12•15 years ago
|
||
I filed bug 530783 on that.
Assignee | ||
Comment 13•15 years ago
|
||
Pushed to trunk:
http://hg.mozilla.org/mozilla-central/rev/246999503836
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 14•15 years ago
|
||
Comment on attachment 414265 [details] [diff] [review]
update dump_syms.exe to one built from SVN r430
We should take this on branches to get better symbols on Windows builds.
Attachment #414265 -
Flags: approval1.9.2?
Attachment #414265 -
Flags: approval1.9.1.7?
Updated•15 years ago
|
Whiteboard: [landed 12-14, want one day bake before 1.9.2]
Comment 15•15 years ago
|
||
Maybe this is 3.6.1 instead of 3.6, but nominating anyway. Would like this landed on 1.9.2 before 1.9.1
Flags: blocking1.9.2?
Updated•15 years ago
|
blocking1.9.1: --- → needed
status1.9.1:
--- → wanted
Comment 16•15 years ago
|
||
Comment on attachment 414265 [details] [diff] [review]
update dump_syms.exe to one built from SVN r430
a192=beltzner, let's get this in, but if it bounces, we'll pull it, so not blocking.
Attachment #414265 -
Flags: approval1.9.2? → approval1.9.2+
Updated•15 years ago
|
Flags: wanted1.9.2+
Flags: blocking1.9.2?
Flags: blocking1.9.2-
Assignee | ||
Comment 17•15 years ago
|
||
Pushed to 1.9.2:
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/c576c15e01a9
Kind of tough to verify, but I ssh'ed to a place with access to the symbol store, and checked the count of functions reported in each symbol file:
$ grep xul.sym firefox-3.7a1pre-WINNT-20091214045914-symbols.txt
xul.pdb/F6409D49C333416297B8388C2C8F5F592/xul.sym
$ grep ^FUNC xul.pdb/F6409D49C333416297B8388C2C8F5F592/xul.sym | wc -l
63971
$ grep xul.sym firefox-3.7a1pre-WINNT-20091215045531-symbols.txt
xul.pdb/52C531559AD943F8A6FB24BB407EE5292/xul.sym
$ grep ^FUNC xul.pdb/52C531559AD943F8A6FB24BB407EE5292/xul.sym | wc -l
74972
You can see that we're reporting about 11k more function names, which are the bits that were missing.
status1.9.2:
--- → final-fixed
Whiteboard: [landed 12-14, want one day bake before 1.9.2]
Updated•15 years ago
|
Whiteboard: [waiting on 1.9.2rc1 bake]
Comment 18•15 years ago
|
||
Comment on attachment 414265 [details] [diff] [review]
update dump_syms.exe to one built from SVN r430
a=beltzner for 191 branch
Attachment #414265 -
Flags: approval1.9.1.8? → approval1.9.1.8+
Assignee | ||
Comment 19•15 years ago
|
||
Verified on 1.9.2, while I was at it:
$ grep xul.sym firefox-3.6b6pre-WINNT-20091221053555-symbols.txt
xul.pdb/E3681D36E1A74C349CF24A8FDB19B2232/xul.sym
$ grep ^FUNC xul.pdb/E3681D36E1A74C349CF24A8FDB19B2232/xul.sym | wc -l
63017
$ grep xul.sym firefox-3.6b6pre-WINNT-20091222062748-symbols.txt
xul.pdb/263DF4266ABA49A4A9BF4929F17BF4D12/xul.sym
$ grep ^FUNC xul.pdb/263DF4266ABA49A4A9BF4929F17BF4D12/xul.sym | wc -l
71349
Whiteboard: [waiting on 1.9.2rc1 bake]
Assignee | ||
Comment 20•15 years ago
|
||
Pushed to 1.9.1:
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/0b051f59e0ab
Updated•15 years ago
|
blocking1.9.1: needed → ---
You need to log in
before you can comment on or make changes to this bug.
Description
•