Closed Bug 654595 Opened 9 years ago Closed 8 years ago
Enabling PGO broke Linux crash reports (no symbols)
STEPS TO REPRODUCE:
1. Install http://code.google.com/p/crashme/ in a fresh profile.
2. Start up a nightly with that profile, and do Tools | CrashMe | Null Pointer Deref
3. Visit about:crashes & load the generated crash report

ACTUAL RESULTS: No symbols in crash report
EXPECTED RESULTS: Symbols

I've confirmed that this started in the 2011-04-29 nightly.
Crash report from 2011-04-28 nightly has symbols: bp-9dc722db-b310-4a7a-a248-085f32110503
Crash report from 2011-04-29 nightly has NO symbols: bp-c8442888-5cf2-43aa-8a3c-072b42110503
...and this is still a problem in today's nightly. Likely to be from one of the GCC config changes on 4/28.
Severity: normal → critical
Regression pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=557165a66267&tochange=88d3c5bde0ba

In particular, this cset looks suspicious:
> bc80c46f185d Justin Lebar - Bug 590181 part 2 - Switch default gcc optimize options to -O3. r=ted. a=philor CLOSED TREE

This could also be from the GCC 4.5 switch / PGO enabling (not shown in the pushlog, but shown in this comment: https://bugzilla.mozilla.org/show_bug.cgi?id=559964#c159).
Severity: critical → normal
Severity: normal → critical
So far, here is what I know:
- symbols are present in .so.dbg files and gdb is happy with them.
- CFI information seems complete in .sym files.
- only a few symbols are present in .sym files.
- what seems common to all the symbols that do appear in .sym files is that they come from object files that weren't built with PGO.
And I confirm that disabling PGO makes the .sym files contain all we need :( http://email@example.com/try-linux64/
Assignee: mh+mozilla → nobody
Component: Build Config → Breakpad Integration
Product: Core → Toolkit
QA Contact: build-config → breakpad.integration
From the looks of it, it is a dump_syms problem.
The dump_syms code tries to load DWARF from .debug_info:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/linux/dump_symbols.cc#516

which uses the dwarf2reader::CompilationUnit class to kick off the actual DWARF parsing:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf/dwarf2reader.cc#260

which calls back into DwarfCUToModule to handle function names and stuff them into a google_breakpad::Module:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf_cu_to_module.cc
This is critical, to the point of turning off PGO in nightly builds until it is resolved: who wants to own it?
So, it looks like the blame lies with gcc, but considering gdb is able to deal with it, I guess we could do something about it. It all seems to boil down to the PGO'ed functions not having low_pc and high_pc defined, which makes this http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf_cu_to_module.cc#422 do nothing. I'll attach two dwarfdump outputs, one from a normal libmozalloc.so, and one from a PGO'ed libmozalloc.so.
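To illustrate the failure mode described above, here is a minimal sketch in Python. It is a simplified model and not Breakpad's actual code: the dict-based DIE representation, the function name `functions_from_dies`, and the addresses are all made up for illustration. It only shows why a converter that requires both DW_AT_low_pc and DW_AT_high_pc silently emits nothing for a function entry that lacks them.

```python
# Simplified model (an assumption, not Breakpad's real data structures):
# each DIE is a plain dict of DWARF attribute names to values.
def functions_from_dies(dies):
    """Return (name, address, size) for each subprogram DIE with a usable range."""
    functions = []
    for die in dies:
        if die.get("tag") != "DW_TAG_subprogram":
            continue
        low_pc = die.get("DW_AT_low_pc", 0)
        high_pc = die.get("DW_AT_high_pc", 0)
        # Without both attributes the range is empty, so the DIE is skipped
        # and no function record is produced for it.
        if low_pc < high_pc:
            functions.append((die["DW_AT_name"], low_pc, high_pc - low_pc))
    return functions


# Hypothetical entries: one as a normal build might describe moz_free,
# one as the PGO build appears to (no low_pc/high_pc).
normal = [{"tag": "DW_TAG_subprogram", "DW_AT_name": "moz_free",
           "DW_AT_low_pc": 0x1000, "DW_AT_high_pc": 0x1020}]
pgo = [{"tag": "DW_TAG_subprogram", "DW_AT_name": "moz_free"}]
```

In this model the normal entry yields one function record and the PGO entry yields none, which matches the symptom of nearly empty .sym files.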
(In reply to comment #6)
> This is critical, to the point of turning off PGO in nightly builds until it is
> resolved: who wants to own it?

Nightlies only?
Attachment #529981 - Attachment mime type: text/html → text/plain
Weirdly, the builds from the buildbot have more dwarf information than mine, and do contain low_pc and high_pc, but in a different form.
Attachment #529981 - Attachment is obsolete: true
Attachment #529986 - Attachment mime type: text/html → text/plain
(In reply to comment #11)
> Created attachment 529986 [details]
> dwarfdump output for PGO libmozalloc.so
>
> Weirdly, the builds from the buildbot have more dwarf information than mine,
> and do contain low_pc and high_pc, but in a different form.

That's actually frame info.
And now I don't know what I was doing earlier, but I can't even get gdb to like the .so.dbg files anymore:

warning: the debug information found in "./tmp/firefox/libxpcom.so.dbg" does not match "./libxpcom.so" (CRC mismatch).
warning: the debug information found in "./tmp/firefox/libmozalloc.so.dbg" does not match "./libmozalloc.so" (CRC mismatch).
etc.
Summary: Some change on 4/28 broke Linux crash reports (no symbols) → Enabling PGO broke Linux crash reports (no symbols)
This is another instance of a PGO build, from http://firstname.lastname@example.org/try-linux64/ (a non-stripped build, which explains the .tar.bz2 size). This one doesn't have DW_AT_frame_base as loclists in all DW_TAG_subprograms. gdb is able to get functions and line numbers from this debug info, so it obviously finds something useful in there... (It does look wrong, per spec, that there's no DW_AT_low_pc/DW_AT_high_pc, though.)
Attachment #529986 - Attachment is obsolete: true
Mike, if you could attach a PGO binary file for which dump_syms fails to write symbols, I can debug this.
Here you are.
I'd especially like to see one where GDB can get function names (line number info is separate, and self-contained) but dump_syms can't.
(In reply to comment #16)
> Here you are.

Taking a look...
(btw, on this file, dump_syms outputs each CFI info twice, as in bug 637665)
(In reply to comment #17)
> I'd especially like to see one where GDB can get function names (line number
> info is separate, and self-contained) but dump_syms can't.

I could break in moz_free and get the corresponding line number. But for function names, gdb might just be using the symbol table.
(Once we fix this, could we add some sort of automated test for this? I seem to recall at least one other instance in the past where we had basically the same issue (no usable symbols), and it'd be great if we could automatically catch this the next time it happens.)
(In reply to comment #21)
> (Once we fix this, could we add some sort of automated test for this? I seem to
> recall at least one other instance in the past where we had basically the same
> issue (no usable symbols), and it'd be great if we could automatically catch
> this the next time it happens.)

Unfortunately, these are two completely different problems that can't be caught at the same level. This one can only be caught by using some tool other than dump_syms to check that everything we'd expect to be in the symbol files is actually there.
(these being this one and the previous one you mention, when we had a breakage due to elfhack)
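As a rough idea of what such a check could look like, here is a minimal Python sketch. It assumes the textual Breakpad symbol file layout in which function records read `FUNC address size parameter_size name`; the helper names, the sample file contents and the module ID are invented for illustration, and the list of expected functions would have to come from some independent source (for example, the ELF symbol table).

```python
def func_names(sym_text):
    """Collect function names from the FUNC records of a Breakpad symbol file."""
    names = set()
    for line in sym_text.splitlines():
        # Assumed record layout: FUNC <address> <size> <param_size> <name>
        # The name is last and may itself contain spaces, hence maxsplit=4.
        parts = line.split(" ", 4)
        if len(parts) == 5 and parts[0] == "FUNC":
            names.add(parts[4])
    return names


def missing_functions(sym_text, expected):
    """Return the expected function names that have no FUNC record, sorted."""
    return sorted(set(expected) - func_names(sym_text))


# Made-up sample: moz_free has a FUNC record, moz_malloc only a PUBLIC one.
SAMPLE_SYM = """MODULE Linux x86_64 0123456789ABCDEF0123456789ABCDEF0 libmozalloc.so
FILE 0 mozalloc.cpp
FUNC 1000 20 0 moz_free
1000 20 80 0
PUBLIC 2000 0 moz_malloc
"""
```

A test harness could fail the build whenever `missing_functions` returns a non-empty list for a library known to define those functions.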
Hum. If we can't upgrade GCC to get around this, we may need to teach dump_syms to read the ELF symbol table and demangle the names. ELF symbols have a type that tells you whether they're a function or not, and a size that gives you the extent of the function.
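For reference, a minimal Python sketch of the idea: walk 64-bit little-endian ELF symbol entries and keep those whose type is STT_FUNC, using st_value and st_size as the function's extent. The field order follows the Elf64_Sym layout; the function name `function_symbols` and the hand-built sample tables are assumptions made for illustration, and C++ names would still need demangling afterwards.

```python
import struct

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size
ELF64_SYM = struct.Struct("<IBBHQQ")
STT_FUNC = 2  # symbol type stored in the low 4 bits of st_info


def function_symbols(symtab, strtab):
    """Return (name, address, size) for each STT_FUNC entry with a non-zero size."""
    result = []
    for offset in range(0, len(symtab) - ELF64_SYM.size + 1, ELF64_SYM.size):
        st_name, st_info, _other, _shndx, st_value, st_size = \
            ELF64_SYM.unpack_from(symtab, offset)
        if st_info & 0xF != STT_FUNC or st_size == 0:
            continue
        # Names are NUL-terminated strings at offset st_name in the string table.
        end = strtab.index(b"\0", st_name)
        result.append((strtab[st_name:end].decode("ascii"), st_value, st_size))
    return result


# Hand-built sample tables (hypothetical addresses and sizes).
strtab = b"\0moz_free\0moz_data\0"
symtab = b"".join([
    ELF64_SYM.pack(0, 0, 0, 0, 0, 0),                      # mandatory null symbol
    ELF64_SYM.pack(1, (1 << 4) | 2, 0, 1, 0x1000, 0x20),   # global function
    ELF64_SYM.pack(10, (1 << 4) | 1, 0, 2, 0x3000, 8),     # global data object
])
```

Each tuple returned carries what a FUNC-style record needs: a name, a start address and a length.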
Like this? http://breakpad.appspot.com/275001/show Doesn't that only get us publicly visible symbols, though?
(In reply to comment #25)
> Like this? http://breakpad.appspot.com/275001/show Doesn't that only get us
> publicly visible symbols, though?

All symbols are in .symtab. Publicly visible symbols are in .dynsym.
Ah, okay. In that case, it'd be fairly easy to modify that patch to work on .symtab and produce FUNC records.
(In reply to comment #25)
> Like this? http://breakpad.appspot.com/275001/show

Note that you shouldn't be looking up the .dynstr section by name, but following sh_link from the .dynsym section (which still works if you replace .dynsym with .symtab, since sh_link for .symtab points to .strtab).

Also, according to the summary, the patch doesn't generate the same kind of information as what you'd get from DWARF, even though that should be possible.
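A minimal Python sketch of the sh_link approach, assuming 64-bit little-endian section headers laid out as Elf64_Shdr. The helper names and the hand-built section header table (offsets, sizes, section order) are invented for illustration; only the field order and the SHT_* constants come from the ELF format.

```python
import struct

# Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
#             sh_link, sh_info, sh_addralign, sh_entsize
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")
SHT_SYMTAB, SHT_STRTAB, SHT_DYNSYM = 2, 3, 11
FIELDS = ("name", "type", "flags", "addr", "offset", "size",
          "link", "info", "addralign", "entsize")


def parse_section_headers(data, count):
    """Decode `count` consecutive section headers into dicts."""
    return [dict(zip(FIELDS, ELF64_SHDR.unpack_from(data, i * ELF64_SHDR.size)))
            for i in range(count)]


def string_table_index(sections, symbol_section_type):
    """Return the index of the string table that the symbol section links to."""
    for section in sections:
        if section["type"] == symbol_section_type:
            # sh_link of a symbol table section is the index of its string table.
            return section["link"]
    return None


# Hand-built header table (hypothetical layout).
raw = b"".join([
    ELF64_SHDR.pack(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),                 # null section
    ELF64_SHDR.pack(0, SHT_DYNSYM, 0, 0, 0x100, 48, 2, 1, 8, 24),  # .dynsym -> 2
    ELF64_SHDR.pack(0, SHT_STRTAB, 0, 0, 0x200, 16, 0, 0, 1, 0),   # .dynstr
    ELF64_SHDR.pack(0, SHT_SYMTAB, 0, 0, 0x300, 72, 4, 1, 8, 24),  # .symtab -> 4
    ELF64_SHDR.pack(0, SHT_STRTAB, 0, 0, 0x400, 32, 0, 0, 1, 0),   # .strtab
])
sections = parse_section_headers(raw, 5)
```

The same lookup works for either symbol table, so switching from .dynsym to .symtab only changes the section type passed in, not the way the string table is found.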
Right, hence the "modify that patch" bit. I don't think it'd be a lot of work on top of that if we decided we needed it.
For the record, .symtab information wouldn't have saved us. I'll post a detailed analysis of the whole problem on my blog.
Do we need to track this for Firefox 6, given that the two dependencies are resolved and we do have crash stats for Linux?
(In reply to comment #31)
> Do we need to track this for Firefox 6 given the two dependencies are
> resolved and we do have crash stats for linux?

We don't. I need to update this bug, btw.
Symbols are missing on Try too, is that the same problem as this bug? https://tbpl.mozilla.org/php/getParsedLog.php?id=11710315&full=1&branch=try#error0
That's some other completely different issue. Note that that's a debug build, and this bug was about PGO builds.
We have been using PGO for some time. Has this been fixed?
I believe this got fixed in bug 654975.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED