Enabling PGO broke Linux crash reports (no symbols)

RESOLVED FIXED

Status

Toolkit
Crash Reporting
--
critical
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: dholbert, Unassigned)

Tracking

({regression})

Trunk
All
Linux
regression
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 2 obsolete attachments)

(Reporter)

Description

6 years ago
STEPS TO REPRODUCE:
 1. Install http://code.google.com/p/crashme/ in a fresh profile.
 2. Start up a nightly with that profile, and do
       Tools | CrashMe | Null Pointer Deref
 3. Visit about:crashes & load the generated crash report

ACTUAL RESULTS: No symbols in crash report
EXPECTED RESULTS: Symbols

I've confirmed that this started in the 2011-04-29 nightly.

Crash report from 2011-04-28 nightly has symbols:
bp-9dc722db-b310-4a7a-a248-085f32110503

Crash report from 2011-04-29 nightly has NO symbols:
bp-c8442888-5cf2-43aa-8a3c-072b42110503

...and this is still a problem in today's nightly.

Likely to be from one of the GCC config changes on 4/28.
Severity: normal → critical
(Reporter)

Comment 1

6 years ago
Regression pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=557165a66267&tochange=88d3c5bde0ba

In particular, this cset looks suspicious:
> bc80c46f185d	Justin Lebar — Bug 590181 part 2 - Switch default gcc optimize options to -O3. r=ted. a=philor CLOSED TREE

This could also be from the GCC 4.5 switch / PGO enabling (not shown in the pushlog, but shown in this comment):
https://bugzilla.mozilla.org/show_bug.cgi?id=559964#c159
Severity: critical → normal
Assignee: nobody → mh+mozilla
(Reporter)

Updated

6 years ago
Severity: normal → critical
So far, here is what I know:
- symbols are present in .so.dbg files and gdb is happy with them.
- CFI information seems complete in .sym files.
- only a few symbols are present in .sym files.
- what seems common to all these symbols that appear in .sym files is that they come from object files that weren't built with PGO.
And I confirm that disabling PGO makes the .sym files contain all we need :(

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-0a73ffe87f78/try-linux64/
Assignee: mh+mozilla → nobody
Component: Build Config → Breakpad Integration
Product: Core → Toolkit
QA Contact: build-config → breakpad.integration
From the looks of it, it is a dump_syms problem.
Hardware: x86_64 → All
The dump_syms code tries to load DWARF from .debug_info:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/linux/dump_symbols.cc#516

Which uses the dwarf2reader::CompilationUnit class to kick off the actual DWARF parsing:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf/dwarf2reader.cc#260

which calls back into DwarfCUToModule to handle function names and stuff them into a google_breakpad::Module:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf_cu_to_module.cc
This is critical, to the point of turning off PGO in nightly builds until it is resolved: who wants to own it?
tracking-firefox6: --- → +
So, it looks like the blame lies with gcc, but considering gdb is able to deal with it, I guess we could do something about it.

It all seems to boil down to the PGO'ed functions not having low_pc and high_pc defined, which makes this
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf_cu_to_module.cc#422
do nothing.

I'll attach two dwarfdump outputs, one from a normal libmozalloc.so, and one from a PGO'ed libmozalloc.so.
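
To illustrate the failure mode described above, here is a toy model (not Breakpad's actual code) of a converter that only emits a function record when a subprogram DIE carries both DW_AT_low_pc and DW_AT_high_pc. The dict layout, the `collect_functions` name and the addresses are assumptions made up for the example.

```python
# Toy model only: each dict stands in for one DW_TAG_subprogram DIE.
def collect_functions(dies):
    """Return (address, size, name) for DIEs that have a usable address range."""
    functions = []
    for die in dies:
        name = die.get("DW_AT_name")
        low = die.get("DW_AT_low_pc")
        high = die.get("DW_AT_high_pc")
        if name is None or low is None or high is None:
            continue  # no address range: nothing can be emitted for this DIE
        # Assumes DW_AT_high_pc is an address (as in DWARF 2/3), not an offset.
        functions.append((low, high - low, name))
    return functions


# Hypothetical inputs: the same function with and without an address range.
normal_dies = [{"DW_AT_name": "moz_free", "DW_AT_low_pc": 0x1000, "DW_AT_high_pc": 0x1020}]
pgo_dies = [{"DW_AT_name": "moz_free"}]

print(collect_functions(normal_dies))
print(collect_functions(pgo_dies))
```

With the second input the function is silently dropped, which would match the observation that only a few symbols reach the .sym files.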
(In reply to comment #6)
> This is critical, to the point of turning off PGO in nightly builds until it is
> resolved: who wants to own it?

Nightlies only?
Created attachment 529980 [details]
dwarfdump output for non-PGO libmozalloc.so
Created attachment 529981 [details]
dwarfdump output for PGO libmozalloc.so
Attachment #529980 - Attachment description: dwarddump output for non-PGO libmozalloc.so → dwarfdump output for non-PGO libmozalloc.so
Attachment #529980 - Attachment mime type: text/html → text/plain
Attachment #529981 - Attachment mime type: text/html → text/plain
Created attachment 529986 [details]
dwarfdump output for PGO libmozalloc.so

Weirdly, the builds from the buildbot have more dwarf information than mine, and do contain low_pc and high_pc, but in a different form.
Attachment #529981 - Attachment is obsolete: true
Attachment #529986 - Attachment mime type: text/html → text/plain
(In reply to comment #11)
> Created attachment 529986 [details]
> dwarfdump output for PGO libmozalloc.so
> 
> Weirdly, the builds from the buildbot have more dwarf information than mine,
> and do contain low_pc and high_pc, but in a different form.

That's actually frame info.
Depends on: 654700
And now I don't know what I was doing earlier, but I can't even get gdb to like the .so.dbg files anymore:

warning: the debug information found in "./tmp/firefox/libxpcom.so.dbg" does not match "./libxpcom.so" (CRC mismatch).

warning: the debug information found in "./tmp/firefox/libmozalloc.so.dbg" does not match "./libmozalloc.so" (CRC mismatch).

etc.
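
For context on the warning above, a minimal sketch of the check behind a "CRC mismatch", assuming the usual .gnu_debuglink layout (NUL-terminated file name, padding to a 4-byte boundary, then a 32-bit CRC) and assuming the checksum is the standard CRC-32 that `zlib.crc32` computes. The function names and the sample bytes are hypothetical.

```python
import struct
import zlib


def parse_debuglink(section, little_endian=True):
    """Split raw .gnu_debuglink contents into (file name, stored CRC)."""
    end = section.index(b"\0")
    name = section[:end].decode()
    crc_offset = (end + 1 + 3) & ~3  # CRC starts at the next 4-byte boundary
    fmt = "<I" if little_endian else ">I"
    (crc,) = struct.unpack_from(fmt, section, crc_offset)
    return name, crc


def debug_file_matches(section, debug_file_bytes):
    """True if the separate debug file has the CRC recorded in the binary."""
    _name, expected = parse_debuglink(section)
    return (zlib.crc32(debug_file_bytes) & 0xFFFFFFFF) == expected


# Hypothetical data: a fake debug file and a matching section built from it.
debug_bytes = b"fake debug data"
raw_name = b"libmozalloc.so.dbg\0"
section = raw_name + b"\0" * (-len(raw_name) % 4) + struct.pack("<I", zlib.crc32(debug_bytes))

print(parse_debuglink(section)[0], debug_file_matches(section, debug_bytes))
```

Any change to the .dbg file after the link was recorded, or pairing a binary with a .dbg file from a different build, would make this comparison fail.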

Updated

6 years ago
Summary: Some change on 4/28 broke Linux crash reports (no symbols) → Enabling PGO broke Linux crash reports (no symbols)
Created attachment 530060 [details]
dwarfdump output for PGO libmozalloc.so

This is another instance of a pgo build, from http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-f2c7bdb0a32a/try-linux64/ (non-stripped build, which explains the .tar.bz2 size)

This one doesn't have DW_AT_frame_base as loclists in all DW_TAG_subprograms.
gdb is able to get functions and line numbers with this debug info, so it obviously finds something useful in there...

(It however looks wrong, per spec, that there's no DW_AT_low_pc/DW_AT_high_pc)
Attachment #529986 - Attachment is obsolete: true

Comment 15

6 years ago
Mike, if you could attach a PGO binary file for which dump_syms fails to write symbols, I can debug this.
Created attachment 530061 [details]
PGO'ed libmozalloc.so

Here you are.

Comment 17

6 years ago
I'd especially like to see one where GDB can get function names (line number info is separate, and self-contained) but dump_syms can't.

Comment 18

6 years ago
(In reply to comment #16)
> Here you are.

Taking a look...
(btw, on this file, dump_syms outputs each CFI info twice, as in bug 637665)
(In reply to comment #17)
> I'd especially like to see one where GDB can get function names (line number
> info is separate, and self-contained) but dump_syms can't.

I could break in moz_free and have the corresponding line number. But for function names, gdb might just be using the symbol table.
(Reporter)

Comment 21

6 years ago
(Once we fix this, could we add some sort of automated test for this? I seem to recall at least one other instance in the past where we had basically the same issue (no usable symbols), and it'd be great if we could automatically catch this the next time it happens.)
Flags: in-testsuite?
(In reply to comment #21)
> (Once we fix this, could we add some sort of automated test for this? I seem to
> recall at least one other instance in the past where we had basically the same
> issue (no usable symbols), and it'd be great if we could automatically catch
> this the next time it happens.)

Unfortunately, these are two completely different problems that can't be caught at the same level. This one can only be caught by actually checking, with some tool other than dump_syms, that everything we'd expect to be in the symbol files is there.
(these being this one and the previous one you mention, when we had a breakage due to elfhack)
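
A minimal sketch of such a check, assuming the function addresses come from an independent tool and are passed in as a plain dict. It reads the FUNC records of a Breakpad .sym file (`FUNC address size parameter_size name`, with hex numbers) and reports expected functions that no record covers. The helper names, the sample file and the addresses are hypothetical.

```python
def parse_func_records(sym_text):
    """Return (address, size, name) for every FUNC record in a .sym file."""
    records = []
    for line in sym_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "FUNC":
            continue
        if len(fields) > 1 and fields[1] == "m":
            del fields[1]  # optional marker in newer symbol files (assumption)
        if len(fields) < 5:
            continue  # malformed record, ignore
        records.append((int(fields[1], 16), int(fields[2], 16), " ".join(fields[4:])))
    return records


def uncovered(sym_text, expected):
    """Names from expected ({address: name}) that no FUNC record covers."""
    ranges = [(addr, addr + size) for addr, size, _name in parse_func_records(sym_text)]
    return sorted(
        name
        for addr, name in expected.items()
        if not any(start <= addr < end for start, end in ranges)
    )


# Hypothetical symbol file and expectations.
SAMPLE_SYM = """MODULE Linux x86_64 000000000000000000000000000000000 libmozalloc.so
FUNC 1000 20 0 moz_free
FUNC 1020 34 0 moz_malloc
"""
EXPECTED = {0x1000: "moz_free", 0x1020: "moz_malloc", 0x2000: "moz_xmalloc"}

print(uncovered(SAMPLE_SYM, EXPECTED))
```

Comparing by address rather than by name sidesteps differences in how the two tools demangle names.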

Comment 24

6 years ago
Hum. If we can't upgrade GCC to get around this, we may need to teach dump_syms to read the ELF symbol table and demangle the names. ELF symbols have a type that tells you whether they're a function or not, and a size that gives you the extent of the function.
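
A minimal sketch of the idea, assuming a little-endian 64-bit ELF and that the raw bytes of the symbol table and its string table are already in hand. It keeps only defined symbols whose type is STT_FUNC and returns their address, size and (still mangled) name. The `function_symbols` name and the sample data are hypothetical; demangling is left out.

```python
import struct

ELF64_SYM = struct.Struct("<IBBHQQ")  # st_name, st_info, st_other, st_shndx, st_value, st_size
STT_FUNC = 2   # symbol type: function
SHN_UNDEF = 0  # section index of undefined (imported) symbols


def function_symbols(symtab, strtab):
    """Return (address, size, name) for each defined function symbol."""
    funcs = []
    for offset in range(0, len(symtab) - ELF64_SYM.size + 1, ELF64_SYM.size):
        st_name, st_info, _other, st_shndx, st_value, st_size = ELF64_SYM.unpack_from(symtab, offset)
        if (st_info & 0xF) != STT_FUNC or st_shndx == SHN_UNDEF:
            continue  # not a function, or not defined in this file
        end = strtab.index(b"\0", st_name)
        funcs.append((st_value, st_size, strtab[st_name:end].decode()))
    return funcs


# Hypothetical tables: a null entry, one function and one data object.
STRTAB = b"\0moz_free\0some_data\0"
SYMTAB = (
    ELF64_SYM.pack(0, 0, 0, 0, 0, 0)
    + ELF64_SYM.pack(1, (1 << 4) | 2, 0, 1, 0x1000, 0x20)   # GLOBAL FUNC
    + ELF64_SYM.pack(10, (1 << 4) | 1, 0, 2, 0x3000, 8)     # GLOBAL OBJECT
)

print(function_symbols(SYMTAB, STRTAB))
```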
Like this? http://breakpad.appspot.com/275001/show Doesn't that only get us publicly visible symbols, though?
(In reply to comment #25)
> Like this? http://breakpad.appspot.com/275001/show Doesn't that only get us
> publicly visible symbols, though?

All symbols are in .symtab. Publicly visible symbols are in .dynsym.
Ah, okay. In that case, it'd be fairly easy to modify that patch to work on .symtab and produce FUNC records.
(In reply to comment #25)
> Like this? http://breakpad.appspot.com/275001/show

Note that you shouldn't be taking the .dynstr section by name, but from the sh_link from the .dynsym section. (which, if you replace with .symtab, still works, since sh_link for .symtab is .strtab)

Also, according to the summary, the patch doesn't generate the same kind of information as what you'd get from DWARF, though that should be possible.
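
A minimal sketch of the sh_link point, assuming a little-endian 64-bit ELF held in memory: the symbol table is found by section type, and its string table is taken from that section's sh_link field instead of being looked up by name. The function names are hypothetical, and this is not the code from the patch under discussion.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr
SHDR = struct.Struct("<IIQQQQIIQQ")        # Elf64_Shdr
SHT_SYMTAB = 2
SHT_DYNSYM = 11


def read_section_headers(elf):
    """Return every Elf64_Shdr as a tuple of its ten fields."""
    fields = EHDR.unpack_from(elf, 0)
    ident = fields[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian 64-bit ELF file")
    e_shoff, e_shentsize, e_shnum = fields[6], fields[11], fields[12]
    return [SHDR.unpack_from(elf, e_shoff + i * e_shentsize) for i in range(e_shnum)]


def symtab_and_strtab(elf, symtab_type=SHT_SYMTAB):
    """Return (symbol table bytes, linked string table bytes), or None."""
    sections = read_section_headers(elf)
    for header in sections:
        if header[1] != symtab_type:   # sh_type
            continue
        linked = sections[header[6]]   # sh_link: index of the string table
        symtab = elf[header[4]:header[4] + header[5]]   # sh_offset, sh_size
        strtab = elf[linked[4]:linked[4] + linked[5]]
        return symtab, strtab
    return None
```

Passing `SHT_DYNSYM` instead of the default would pick .dynsym and, through the same sh_link field, its own string table.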
Right, hence the "modify that patch" bit. I don't think it'd be a lot of work on top of that if we decided we needed it.
Depends on: 654975
For the record, .symtab information wouldn't have saved us. I'll post a detailed analysis of the whole problem on my blog.

Comment 31

6 years ago
Do we need to track this for Firefox 6 given the two dependencies are resolved and we do have crash stats for linux?
(In reply to comment #31)
> Do we need to track this for Firefox 6 given the two dependencies are
> resolved and we do have crash stats for linux?

We don't. I need to update this bug, btw.
tracking-firefox6: + → ---
Symbols are missing on Try too, is that the same problem as this bug?
https://tbpl.mozilla.org/php/getParsedLog.php?id=11710315&full=1&branch=try#error0
That's some other completely different issue. Note that that's a debug build, and this bug was about PGO builds.
We have been using PGO for some time, has this been fixed?
I believe this got fixed in bug 654975.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED