Closed
Bug 654595
Opened 14 years ago
Closed 12 years ago
Enabling PGO broke Linux crash reports (no symbols)
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: dholbert, Unassigned)
References
Details
(Keywords: regression)
Attachments
(3 files, 2 obsolete files)
STEPS TO REPRODUCE:
1. Install http://code.google.com/p/crashme/ in a fresh profile.
2. Start up a nightly with that profile, and do
Tools | CrashMe | Null Pointer Deref
3. Visit about:crashes & load the generated crash report
ACTUAL RESULTS: No symbols in crash report
EXPECTED RESULTS: Symbols
I've confirmed that this started in the 2011-04-29 nightly.
Crash report from 2011-04-28 nightly has symbols:
bp-9dc722db-b310-4a7a-a248-085f32110503
Crash report from 2011-04-29 nightly has NO symbols:
bp-c8442888-5cf2-43aa-8a3c-072b42110503
...and this is still a problem in today's nightly.
Likely to be from one of the GCC config changes on 4/28.
Severity: normal → critical
Reporter | ||
Comment 1•14 years ago
Regression pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=557165a66267&tochange=88d3c5bde0ba
In particular, this cset looks suspicious:
> bc80c46f185d Justin Lebar — Bug 590181 part 2 - Switch default gcc optimize options to -O3. r=ted. a=philor CLOSED TREE
This could also be from the GCC 4.5 switch / PGO enabling (not shown in the pushlog, but mentioned in this comment):
https://bugzilla.mozilla.org/show_bug.cgi?id=559964#c159
Severity: critical → normal
Updated•14 years ago
Assignee: nobody → mh+mozilla
Reporter | ||
Updated•14 years ago
Severity: normal → critical
Comment 2•14 years ago
So far, here is what I know:
- symbols are present in .so.dbg files and gdb is happy with them.
- CFI information seems complete in .sym files.
- only a few symbols are present in .sym files.
- what seems common to all these symbols that appear in .sym files is that they come from object files that weren't built with PGO.
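A quick way to see how sparse a .sym file is would be to tally its records by type. The sketch below assumes the usual Breakpad text layout (MODULE, FILE, FUNC, PUBLIC and STACK records, with line records starting with a hex address); the helper name `count_sym_records` and the sample contents are made up for illustration.

```python
from collections import Counter

KNOWN_KEYWORDS = ("MODULE", "INFO", "FILE", "FUNC", "PUBLIC", "STACK")

def count_sym_records(sym_text: str) -> Counter:
    """Tally Breakpad symbol-file records by their leading keyword."""
    counts = Counter()
    for line in sym_text.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword = line.split(" ", 1)[0]
        if keyword in KNOWN_KEYWORDS:
            counts[keyword] += 1
        else:
            # Line records start with a hex address, not a keyword.
            counts["LINE"] += 1
    return counts

# Made-up sample, not taken from a real build.
sample = """MODULE Linux x86_64 0123456789ABCDEF0123456789ABCDEF0 libmozalloc.so
FILE 0 mozalloc.cpp
FUNC 1130 2a 0 moz_free
1130 2a 70 0
PUBLIC 1200 0 moz_malloc
STACK CFI INIT 1130 2a .cfa: $rsp 8 + .ra: .cfa -8 + ^
"""
counts = count_sym_records(sample)
```

A file with plenty of STACK records but only a handful of FUNC records would match the symptoms listed above.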
Comment 3•14 years ago
And I confirm that disabling PGO makes the .sym files contain all we need :(
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-0a73ffe87f78/try-linux64/
Updated•14 years ago
Assignee: mh+mozilla → nobody
Component: Build Config → Breakpad Integration
Product: Core → Toolkit
QA Contact: build-config → breakpad.integration
Comment 4•14 years ago
From the looks of it, it is a dump_syms problem.
Updated•14 years ago
Hardware: x86_64 → All
Comment 5•14 years ago
The dump_syms code tries to load DWARF from .debug_info:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/linux/dump_symbols.cc#516
Which uses the dwarf2reader::CompilationUnit class to kick off the actual DWARF parsing:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf/dwarf2reader.cc#260
which calls back into DwarfCUToModule to handle function names and stuff them into a google_breakpad::Module:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf_cu_to_module.cc
Comment 6•14 years ago
This is critical, to the point of turning off PGO in nightly builds until it is resolved: who wants to own it?
tracking-firefox6:
--- → +
Comment 7•14 years ago
So, it looks like the blame lies with gcc, but considering gdb is able to deal with it, I guess we could do something about it.
It looks like it all boils down to functions not having low_pc and high_pc defined for the PGO'ed functions, which makes this
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf_cu_to_module.cc#422
do nothing.
I'll attach two dwarfdump outputs, one from a normal libmozalloc.so, and one from a PGO'ed libmozalloc.so.
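To illustrate the failure mode, here is a simplified model, not Breakpad's actual code: a converter that only emits a function when the DIE carries both DW_AT_low_pc and DW_AT_high_pc will silently drop everything else. The dict-based DIE representation, the helper name and the sample values are all hypothetical, and DW_AT_high_pc is assumed to be an address rather than an offset.

```python
def functions_with_pc_range(dies):
    """Return (name, low_pc, size) for subprogram DIEs that carry both
    DW_AT_low_pc and DW_AT_high_pc; all other DIEs are skipped."""
    kept = []
    for die in dies:
        if die.get("tag") != "DW_TAG_subprogram":
            continue
        low = die.get("DW_AT_low_pc")
        high = die.get("DW_AT_high_pc")
        if low is None or high is None:
            # No address range on the DIE: nothing is emitted for it.
            continue
        kept.append((die.get("DW_AT_name", "<unnamed>"), low, high - low))
    return kept

# Hypothetical DIEs: one with an address range, one without.
dies = [
    {"tag": "DW_TAG_subprogram", "DW_AT_name": "moz_malloc",
     "DW_AT_low_pc": 0x1200, "DW_AT_high_pc": 0x1220},
    {"tag": "DW_TAG_subprogram", "DW_AT_name": "moz_free"},
]
result = functions_with_pc_range(dies)
```

In this model `moz_free` vanishes from the output without any warning, which is the kind of silent loss described above.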
Comment 8•14 years ago
(In reply to comment #6)
> This is critical, to the point of turning off PGO in nightly builds until it is
> resolved: who wants to own it?
Nightlies only?
Comment 9•14 years ago
Comment 10•14 years ago
Updated•14 years ago
Attachment #529980 -
Attachment description: dwarddump output for non-PGO libmozalloc.so → dwarfdump output for non-PGO libmozalloc.so
Attachment #529980 -
Attachment mime type: text/html → text/plain
Updated•14 years ago
Attachment #529981 -
Attachment mime type: text/html → text/plain
Comment 11•14 years ago
Weirdly, the builds from the buildbot have more dwarf information than mine, and do contain low_pc and high_pc, but in a different form.
Attachment #529981 -
Attachment is obsolete: true
Updated•14 years ago
Attachment #529986 -
Attachment mime type: text/html → text/plain
Comment 12•14 years ago
(In reply to comment #11)
> Created attachment 529986 [details]
> dwarfdump output for PGO libmozalloc.so
>
> Weirdly, the builds from the buildbot have more dwarf information than mine,
> and do contain low_pc and high_pc, but in a different form.
That's actually frame info.
Comment 13•14 years ago
And now I don't know what I've been doing earlier, but I can't even get gdb to like the .so.dbg files anymore:
warning: the debug information found in "./tmp/firefox/libxpcom.so.dbg" does not match "./libxpcom.so" (CRC mismatch).
warning: the debug information found in "./tmp/firefox/libmozalloc.so.dbg" does not match "./libmozalloc.so" (CRC mismatch).
etc.
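For context on what that warning checks: gdb compares a CRC stored in the binary's .gnu_debuglink section against one computed over the separate debug file. The sketch below assumes the usual section layout (NUL-terminated file name, padding to a 4-byte boundary, then a 4-byte CRC) and assumes the checksum matches the standard CRC-32 that `zlib.crc32` computes. The helper names and sample bytes are made up.

```python
import struct
import zlib

def parse_debuglink(section: bytes, little_endian: bool = True):
    """Split a .gnu_debuglink section into (file name, stored CRC)."""
    name_end = section.index(b"\0")
    name = section[:name_end].decode("ascii")
    # The CRC follows the NUL-terminated name, padded to 4 bytes.
    crc_offset = (name_end + 1 + 3) & ~3
    fmt = "<I" if little_endian else ">I"
    (crc,) = struct.unpack_from(fmt, section, crc_offset)
    return name, crc

def debug_file_matches(section: bytes, debug_file: bytes) -> bool:
    """True when the stored CRC equals the CRC-32 of the debug file."""
    _, stored = parse_debuglink(section)
    return stored == (zlib.crc32(debug_file) & 0xFFFFFFFF)

# Made-up debug file contents and a matching section built from them.
debug_bytes = b"pretend this is libmozalloc.so.dbg"
raw_name = b"libmozalloc.so.dbg\0"
padding = b"\0" * (-len(raw_name) % 4)
section = raw_name + padding + struct.pack("<I", zlib.crc32(debug_bytes))
name, stored_crc = parse_debuglink(section)
```

If the .so.dbg files and the binaries come from different builds, a check like `debug_file_matches` would fail in the same way as the warnings above.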
Updated•14 years ago
Summary: Some change on 4/28 broke Linux crash reports (no symbols) → Enabling PGO broke Linux crash reports (no symbols)
Comment 14•14 years ago
This is another instance of a pgo build, from http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-f2c7bdb0a32a/try-linux64/ (non-stripped build, which explains the .tar.bz2 size)
This one doesn't have DW_AT_frame_base as loclists in all DW_TAG_subprograms.
gdb is able to get functions and line numbers with this debug info, so it obviously finds something useful in there...
(It however looks wrong, per spec, that there's no DW_AT_low_pc/DW_AT_high_pc)
Attachment #529986 -
Attachment is obsolete: true
Comment 15•14 years ago
Mike, if you could attach a PGO binary file for which dump_syms fails to write symbols, I can debug this.
Comment 16•14 years ago
Here you are.
Comment 17•14 years ago
I'd especially like to see one where GDB can get function names (line number info is separate, and self-contained) but dump_syms can't.
Comment 18•14 years ago
(In reply to comment #16)
> Here you are.
Taking a look...
Comment 19•14 years ago
(btw, on this file, dump_syms outputs each CFI info twice, as in bug 637665)
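Duplicate CFI output like this can be spotted mechanically. A minimal sketch, assuming each CFI entry starts with a `STACK CFI INIT` line in the .sym file; the helper name and the sample text are made up.

```python
from collections import Counter

def duplicated_cfi_init(sym_text: str):
    """Return the 'STACK CFI INIT' lines that occur more than once."""
    init_lines = [
        line.strip()
        for line in sym_text.splitlines()
        if line.strip().startswith("STACK CFI INIT ")
    ]
    return [line for line, n in Counter(init_lines).items() if n > 1]

# Made-up sample with one entry repeated.
sample = """STACK CFI INIT 1130 2a .cfa: $rsp 8 + .ra: .cfa -8 + ^
STACK CFI INIT 1160 10 .cfa: $rsp 8 + .ra: .cfa -8 + ^
STACK CFI INIT 1130 2a .cfa: $rsp 8 + .ra: .cfa -8 + ^
"""
dupes = duplicated_cfi_init(sample)
```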
Comment 20•14 years ago
(In reply to comment #17)
> I'd especially like to see one where GDB can get function names (line number
> info is separate, and self-contained) but dump_syms can't.
I could break in moz_free and have the corresponding line number. But for function names, gdb might just be using the symbol table.
Reporter | ||
Comment 21•14 years ago
(Once we fix this, could we add some sort of automated test for this? I seem to recall at least one other instance in the past where we had basically the same issue (no usable symbols), and it'd be great if we could automatically catch this the next time it happens.)
Flags: in-testsuite?
Comment 22•14 years ago
(In reply to comment #21)
> (Once we fix this, could we add some sort of automated test for this? I seem to
> recall at least one other instance in the past where we had basically the same
> issue (no usable symbols), and it'd be great if we could automatically catch
> this the next time it happens.)
Unfortunately, these are two completely different problems that can't be caught at the same level. This one can only be caught by actually checking, with some tool other than dump_syms, that everything we'd expect to be in the symbol files is there.
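Such a cross-check could look roughly like the sketch below: take a list of function names obtained independently of dump_syms and report the ones with no FUNC record. It assumes FUNC lines of the form `FUNC <address> <size> <param_size> <name>`; the helper names and sample data are hypothetical, and how the expected names are gathered is left open.

```python
def func_names(sym_text: str) -> set:
    """Collect the names from FUNC records in a Breakpad .sym file."""
    names = set()
    for line in sym_text.splitlines():
        # FUNC <address> <size> <param_size> <name>
        parts = line.strip().split(" ", 4)
        if len(parts) == 5 and parts[0] == "FUNC":
            names.add(parts[4])
    return names

def missing_from_sym(expected_names, sym_text: str):
    """Names we expected to find that have no FUNC record."""
    return sorted(set(expected_names) - func_names(sym_text))

# Made-up data: the expected list would come from some other tool.
sym_text = "FUNC 1130 2a 0 moz_free\nPUBLIC 1200 0 moz_malloc\n"
missing = missing_from_sym(["moz_free", "moz_malloc", "moz_xmalloc"], sym_text)
```

Note that a PUBLIC record does not count as a FUNC record here, so `moz_malloc` is reported as missing in this sample.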
Comment 23•14 years ago
(these being this one and the previous one you mention, when we had a breakage due to elfhack)
Comment 24•14 years ago
Hum. If we can't upgrade GCC to get around this, we may need to teach dump_syms to read the ELF symbol table and demangle the names. ELF symbols have a type that tells you whether they're a function or not, and a size that gives you the extent of the function.
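As a rough sketch of that idea, assuming a little-endian ELF64 file whose symbol table and string table bytes have already been extracted: each 24-byte symbol entry carries a type in the low four bits of `st_info` and a size in `st_size`. The helper name and sample bytes are made up, and demangling is not shown.

```python
import struct

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size
ELF64_SYM = struct.Struct("<IBBHQQ")
STT_FUNC = 2
SHN_UNDEF = 0

def function_symbols(symtab: bytes, strtab: bytes):
    """Return (name, address, size) for defined function symbols."""
    funcs = []
    for name_off, info, _other, shndx, value, size in ELF64_SYM.iter_unpack(symtab):
        if (info & 0xF) != STT_FUNC or shndx == SHN_UNDEF:
            continue  # not a function, or not defined in this file
        end = strtab.index(b"\0", name_off)
        funcs.append((strtab[name_off:end].decode("ascii"), value, size))
    return funcs

# Made-up tables: one function symbol and one data object.
strtab = b"\0moz_free\0some_data\0"
symtab = (
    ELF64_SYM.pack(1, 0x12, 0, 1, 0x1130, 0x2A)   # GLOBAL FUNC
    + ELF64_SYM.pack(10, 0x11, 0, 2, 0x2000, 8)   # GLOBAL OBJECT
)
funcs = function_symbols(symtab, strtab)
```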
Comment 25•14 years ago
Like this? http://breakpad.appspot.com/275001/show Doesn't that only get us publicly visible symbols, though?
Comment 26•14 years ago
(In reply to comment #25)
> Like this? http://breakpad.appspot.com/275001/show Doesn't that only get us
> publicly visible symbols, though?
All symbols are in .symtab. Publicly visible symbols are in .dynsym.
Comment 27•14 years ago
Ah, okay. In that case, it'd be fairly easy to modify that patch to work on .symtab and produce FUNC records.
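Producing FUNC records from symbol-table entries could be as simple as the sketch below. It assumes the Breakpad record form `FUNC <address> <size> <param_size> <name>` with hex fields; the parameter size is not available from the symbol table, so 0 is used as a placeholder. The helper name and input are hypothetical.

```python
def func_records(symbols):
    """Format (name, address, size) tuples as Breakpad FUNC lines."""
    lines = []
    for name, address, size in sorted(symbols, key=lambda s: s[1]):
        # Parameter size is unknown here; 0 is only a placeholder.
        lines.append(f"FUNC {address:x} {size:x} 0 {name}")
    return lines

# Made-up symbols, deliberately out of address order.
records = func_records([("moz_malloc", 0x1200, 0x20), ("moz_free", 0x1130, 0x2A)])
```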
Comment 28•14 years ago
(In reply to comment #25)
> Like this? http://breakpad.appspot.com/275001/show
Note that you shouldn't look up the .dynstr section by name; take it from the sh_link of the .dynsym section. (If you replace .dynsym with .symtab, this still works, since sh_link for .symtab is .strtab.)
Also, according to the summary, the patch doesn't generate the same kind of information as what you'd get from DWARF, although that should be possible.
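The sh_link lookup can be sketched as follows, assuming a little-endian ELF64 section header table has already been read into memory. The helper names and the sample headers are made up; only the fields needed here are filled in.

```python
import struct

# Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
#             sh_link, sh_info, sh_addralign, sh_entsize
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")
SHT_SYMTAB, SHT_STRTAB, SHT_DYNSYM = 2, 3, 11

def linked_string_tables(section_headers: bytes):
    """Map each symbol-table section index to its string-table index."""
    links = {}
    for index, fields in enumerate(ELF64_SHDR.iter_unpack(section_headers)):
        sh_type, sh_link = fields[1], fields[6]
        if sh_type in (SHT_SYMTAB, SHT_DYNSYM):
            links[index] = sh_link  # follow sh_link, not the section name
    return links

def make_shdr(sh_type: int, sh_link: int = 0) -> bytes:
    """Build a header with only sh_type and sh_link set (test helper)."""
    return ELF64_SHDR.pack(0, sh_type, 0, 0, 0, 0, sh_link, 0, 0, 0)

# Made-up table: null section, .symtab linked to section 2, a string table.
headers = make_shdr(0) + make_shdr(SHT_SYMTAB, sh_link=2) + make_shdr(SHT_STRTAB)
links = linked_string_tables(headers)
```

The same function handles .dynsym and .symtab alike, since each names its own string table through sh_link.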
Comment 29•14 years ago
Right, hence the "modify that patch" bit. I don't think it'd be a lot of work on top of that if we decided we needed it.
Comment 30•14 years ago
For the record, .symtab information wouldn't have saved us. I'll post a detailed analysis of the whole problem on my blog.
Comment 31•14 years ago
Do we need to track this for Firefox 6 given the two dependencies are resolved and we do have crash stats for linux?
Comment 32•14 years ago
(In reply to comment #31)
> Do we need to track this for Firefox 6 given the two dependencies are
> resolved and we do have crash stats for linux?
We don't. I need to update this bug, btw.
Updated•14 years ago
tracking-firefox6:
+ → ---
Comment 33•13 years ago
Symbols are missing on Try too, is that the same problem as this bug?
https://tbpl.mozilla.org/php/getParsedLog.php?id=11710315&full=1&branch=try#error0
Comment 34•13 years ago
That's some other completely different issue. Note that that's a debug build, and this bug was about PGO builds.
We have been using PGO for some time, has this been fixed?
Comment 36•12 years ago
I believe this got fixed in bug 654975.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED