Bug 654595 - Enabling PGO broke Linux crash reports (no symbols)
Status: RESOLVED FIXED
Keywords: regression
Product: Toolkit
Classification: Components
Component: Breakpad Integration
Version: Trunk
Platform: All Linux
Importance: -- critical
Assigned To: Nobody; OK to take it and work on it
Depends on: 654700 654975
Reported: 2011-05-03 16:02 PDT by Daniel Holbert [:dholbert]
Modified: 2013-12-27 14:24 PST
dholbert: in‑testsuite?

Attachments
dwarfdump output for non-PGO libmozalloc.so (110.35 KB, text/plain)
2011-05-04 06:11 PDT, Mike Hommey [:glandium]
dwarfdump output for PGO libmozalloc.so (107.94 KB, text/plain)
2011-05-04 06:12 PDT, Mike Hommey [:glandium]
dwarfdump output for PGO libmozalloc.so (88.80 KB, text/plain)
2011-05-04 06:37 PDT, Mike Hommey [:glandium]
dwarfdump output for PGO libmozalloc.so (98.26 KB, text/html)
2011-05-04 10:11 PDT, Mike Hommey [:glandium]
PGO'ed libmozalloc.so (21.73 KB, application/x-sharedlib)
2011-05-04 10:14 PDT, Mike Hommey [:glandium]

Description Daniel Holbert [:dholbert] 2011-05-03 16:02:51 PDT
STEPS TO REPRODUCE:
 1. Install http://code.google.com/p/crashme/ in a fresh profile.
 2. Start up a nightly with that profile, and do
       Tools | CrashMe | Null Pointer Deref
 3. Visit about:crashes & load the generated crash report

ACTUAL RESULTS: No symbols in crash report
EXPECTED RESULTS: Symbols

I've confirmed that this started in the 2011-04-29 nightly.

Crash report from 2011-04-28 nightly has symbols:
bp-9dc722db-b310-4a7a-a248-085f32110503

Crash report from 2011-04-29 nightly has NO symbols:
bp-c8442888-5cf2-43aa-8a3c-072b42110503

...and this is still a problem in today's nightly.

Likely to be from one of the GCC config changes on 4/28.
Comment 1 Daniel Holbert [:dholbert] 2011-05-03 16:06:59 PDT
Regression pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=557165a66267&tochange=88d3c5bde0ba

In particular, this cset looks suspicious:
> bc80c46f185d	Justin Lebar — Bug 590181 part 2 - Switch default gcc optimize options to -O3. r=ted. a=philor CLOSED TREE

This could also be from the GCC 4.5 switch / PGO enabling (not shown in the pushlog, but shown in this comment):
https://bugzilla.mozilla.org/show_bug.cgi?id=559964#c159
Comment 2 Mike Hommey [:glandium] 2011-05-04 00:07:00 PDT
So far, here is what I know:
- symbols are present in .so.dbg files and gdb is happy with them.
- CFI information seems complete in .sym files.
- only a few symbols are present in .sym files.
- what seems common to all these symbols that appear in .sym files is that they come from object files that weren't built with PGO.
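
One quick way to check the observation above (CFI present, but only a few symbols) is to tally the record types in a .sym file. This is a minimal sketch, assuming the Breakpad text symbol format in which each record starts with a keyword (MODULE, FILE, FUNC, PUBLIC, STACK) and line records start with a hex address. The name `count_sym_records` and the sample data are made up for illustration.

```python
from collections import Counter


def count_sym_records(sym_text):
    """Tally Breakpad .sym records by their leading keyword.

    Line records (which start with a hex address rather than a keyword)
    are grouped under "LINE"; STACK records are keyed by their first two
    words, e.g. "STACK CFI".
    """
    counts = Counter()
    for line in sym_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        keyword = fields[0]
        if keyword == "STACK" and len(fields) > 1:
            keyword = "STACK " + fields[1]
        elif keyword not in ("MODULE", "FILE", "FUNC", "PUBLIC", "INFO"):
            keyword = "LINE"
        counts[keyword] += 1
    return counts


# Hypothetical sample: two CFI records, but only one FUNC record.
SAMPLE = """\
MODULE Linux x86_64 0123456789ABCDEF0123456789ABCDEF0 libmozalloc.so
FILE 0 mozalloc.cpp
FUNC a10 20 0 moz_free
a10 8 120 0
a18 18 121 0
STACK CFI INIT a10 20 .cfa: $rsp 8 + .ra: .cfa -8 + ^
STACK CFI INIT a30 40 .cfa: $rsp 8 + .ra: .cfa -8 + ^
"""

if __name__ == "__main__":
    for keyword, n in sorted(count_sym_records(SAMPLE).items()):
        print(keyword, n)
```

A FUNC count far below the STACK CFI count would match the symptom described here.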
Comment 3 Mike Hommey [:glandium] 2011-05-04 01:13:06 PDT
And I confirm that disabling PGO makes the .sym files contain all we need :(

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-0a73ffe87f78/try-linux64/
Comment 4 Mike Hommey [:glandium] 2011-05-04 02:47:10 PDT
From the looks of it, it is a dump_syms problem.
Comment 5 Ted Mielczarek [:ted.mielczarek] 2011-05-04 05:15:44 PDT
The dump_syms code tries to load DWARF from .debug_info:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/linux/dump_symbols.cc#516

Which uses the dwarf2reader::CompilationUnit class to kick off the actual DWARF parsing:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf/dwarf2reader.cc#260

which calls back into DwarfCUToModule to handle function names and stuff them into a google_breakpad::Module:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf_cu_to_module.cc
Comment 6 Benjamin Smedberg [:bsmedberg] 2011-05-04 05:50:27 PDT
This is critical, to the point of turning off PGO in nightly builds until it is resolved: who wants to own it?
Comment 7 Mike Hommey [:glandium] 2011-05-04 06:08:16 PDT
So, it looks like gcc is to blame, but considering gdb is able to deal with it, I guess we could do something about it.

It looks like it all boils down to the PGO'ed functions not having low_pc and high_pc defined, which makes this code do nothing:
http://code.google.com/p/google-breakpad/source/browse/trunk/src/common/dwarf_cu_to_module.cc#422

I'll attach two dwarfdump outputs, one from a normal libmozalloc.so, and one from a PGO'ed libmozalloc.so.
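
To illustrate the failure mode described in this comment, here is a simplified Python model of a converter that only records a function when its DIE carries both DW_AT_low_pc and DW_AT_high_pc. It is not Breakpad's actual code: the dictionary-based DIE representation, the name `functions_from_dies`, and the sample DIEs are assumptions for illustration.

```python
def functions_from_dies(dies):
    """Return (name, address, size) for subprogram DIEs with a PC range.

    Each DIE is modelled as a dict of attribute name -> value, with
    DW_AT_high_pc assumed to be an address (not an offset). A DIE that
    lacks DW_AT_low_pc or DW_AT_high_pc is silently skipped, which is
    the behaviour that would leave the symbol file nearly empty.
    """
    functions = []
    for die in dies:
        if die.get("tag") != "DW_TAG_subprogram":
            continue
        low = die.get("DW_AT_low_pc")
        high = die.get("DW_AT_high_pc")
        if low is None or high is None:
            continue  # no address range: nothing is emitted
        functions.append((die["DW_AT_name"], low, high - low))
    return functions


# Hypothetical DIEs: one with a PC range, one without.
DIES = [
    {"tag": "DW_TAG_subprogram", "DW_AT_name": "func_a",
     "DW_AT_low_pc": 0xA40, "DW_AT_high_pc": 0xA60},
    {"tag": "DW_TAG_subprogram", "DW_AT_name": "func_b"},
]

if __name__ == "__main__":
    print(functions_from_dies(DIES))
```

In this model, `func_b` never reaches the output even though its name is present in the debug info.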
Comment 8 Mike Hommey [:glandium] 2011-05-04 06:10:51 PDT
(In reply to comment #6)
> This is critical, to the point of turning off PGO in nightly builds until it is
> resolved: who wants to own it?

Nightlies only?
Comment 9 Mike Hommey [:glandium] 2011-05-04 06:11:48 PDT
Created attachment 529980 [details]
dwarfdump output for non-PGO libmozalloc.so
Comment 10 Mike Hommey [:glandium] 2011-05-04 06:12:20 PDT
Created attachment 529981 [details]
dwarfdump output for PGO libmozalloc.so
Comment 11 Mike Hommey [:glandium] 2011-05-04 06:37:28 PDT
Created attachment 529986 [details]
dwarfdump output for PGO libmozalloc.so

Weirdly, the builds from the buildbot have more dwarf information than mine, and do contain low_pc and high_pc, but in a different form.
Comment 12 Mike Hommey [:glandium] 2011-05-04 07:00:49 PDT
(In reply to comment #11)
> Created attachment 529986 [details]
> dwarfdump output for PGO libmozalloc.so
> 
> Weirdly, the builds from the buildbot have more dwarf information than mine,
> and do contain low_pc and high_pc, but in a different form.

That's actually frame info.
Comment 13 Mike Hommey [:glandium] 2011-05-04 07:59:28 PDT
And now I don't know what I was doing earlier, but I can't even get gdb to like the .so.dbg files anymore:

warning: the debug information found in "./tmp/firefox/libxpcom.so.dbg" does not match "./libxpcom.so" (CRC mismatch).

warning: the debug information found in "./tmp/firefox/libmozalloc.so.dbg" does not match "./libmozalloc.so" (CRC mismatch).

etc.
Comment 14 Mike Hommey [:glandium] 2011-05-04 10:11:30 PDT
Created attachment 530060 [details]
dwarfdump output for PGO libmozalloc.so

This is another instance of a pgo build, from http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mh@glandium.org-f2c7bdb0a32a/try-linux64/ (non-stripped build, which explains the .tar.bz2 size)

This one doesn't have DW_AT_frame_base as loclists in all DW_TAG_subprograms.
gdb is able to get functions and line numbers with this debug info, so it obviously finds something useful in there...

(It however looks wrong, per spec, that there's no DW_AT_low_pc/DW_AT_high_pc)
Comment 15 Jim Blandy :jimb 2011-05-04 10:12:07 PDT
Mike, if you could attach a PGO binary file for which dump_syms fails to write symbols, I can debug this.
Comment 16 Mike Hommey [:glandium] 2011-05-04 10:14:59 PDT
Created attachment 530061 [details]
PGO'ed libmozalloc.so

Here you are.
Comment 17 Jim Blandy :jimb 2011-05-04 10:17:53 PDT
I'd especially like to see one where GDB can get function names (line number info is separate, and self-contained) but dump_syms can't.
Comment 18 Jim Blandy :jimb 2011-05-04 10:18:11 PDT
(In reply to comment #16)
> Here you are.

Taking a look...
Comment 19 Mike Hommey [:glandium] 2011-05-04 10:18:21 PDT
(btw, on this file, dump_syms outputs each CFI info twice, as in bug 637665)
Comment 20 Mike Hommey [:glandium] 2011-05-04 10:19:27 PDT
(In reply to comment #17)
> I'd especially like to see one where GDB can get function names (line number
> info is separate, and self-contained) but dump_syms can't.

I could break in moz_free and have the corresponding line number. But for function names, gdb might just be using the symbols table.
Comment 21 Daniel Holbert [:dholbert] 2011-05-04 10:41:22 PDT
(Once we fix this, could we add some sort of automated test for this? I seem to recall at least one other instance in the past where we had basically the same issue (no usable symbols), and it'd be great if we could automatically catch this the next time it happens.)
Comment 22 Mike Hommey [:glandium] 2011-05-04 10:51:47 PDT
(In reply to comment #21)
> (Once we fix this, could we add some sort of automated test for this? I seem to
> recall at least one other instance in the past where we had basically the same
> issue (no usable symbols), and it'd be great if we could automatically catch
> this the next time it happens.)

Unfortunately, these are two completely different problems that can't be caught at the same level. This one can only be caught by actually checking, with some tool other than dump_syms, that everything we'd expect to be in the symbols files is there.
Comment 23 Mike Hommey [:glandium] 2011-05-04 10:52:20 PDT
(these being this one and the previous one you mention, when we had a breakage due to elfhack)
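
One way such a cross-check could look is to compare the function addresses reported by an independent tool against the FUNC records in the .sym file. This is a rough sketch, not an existing test. It assumes `nm`-style text output (address, type letter, name) as the independent source. It also assumes the nm addresses and FUNC addresses are directly comparable, which holds when the library's base address is zero. The name `missing_functions` and the sample data are illustrative.

```python
TEXT_TYPES = {"t", "T", "w", "W"}  # nm type letters treated as code here


def func_addresses(sym_text):
    """Collect the start addresses of all FUNC records in a .sym file."""
    addresses = set()
    for line in sym_text.splitlines():
        fields = line.split()
        # FUNC <address> <size> <parameter size> <name...>
        if len(fields) >= 5 and fields[0] == "FUNC":
            addresses.add(int(fields[1], 16))
    return addresses


def missing_functions(nm_text, sym_text):
    """Return names of code symbols from nm output with no FUNC record."""
    known = func_addresses(sym_text)
    missing = []
    for line in nm_text.splitlines():
        fields = line.split()
        # Defined symbols look like: <hex address> <type letter> <name>
        if len(fields) != 3 or fields[1] not in TEXT_TYPES:
            continue
        if int(fields[0], 16) not in known:
            missing.append(fields[2])
    return sorted(missing)


# Hypothetical inputs: nm sees two functions, the .sym file only one.
NM_OUTPUT = """\
0000000000000a10 T moz_free
0000000000000a40 T moz_malloc
                 U malloc
0000000000201010 B some_data
"""
SYM_FILE = """\
MODULE Linux x86_64 0123456789ABCDEF0123456789ABCDEF0 libmozalloc.so
FUNC a10 20 0 moz_free
"""

if __name__ == "__main__":
    print(missing_functions(NM_OUTPUT, SYM_FILE))
```

Comparing by address rather than by name sidesteps the difference between mangled names in the ELF symbol table and demangled names in the .sym file.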
Comment 24 Jim Blandy :jimb 2011-05-04 11:06:14 PDT
Hum. If we can't upgrade GCC to get around this, we may need to teach dump_syms to read the ELF symbol table and demangle the names. ELF symbols have a type that tells you whether they're a function or not, and a size that gives you the extent of the function.
Comment 25 Ted Mielczarek [:ted.mielczarek] 2011-05-04 15:23:40 PDT
Like this? http://breakpad.appspot.com/275001/show Doesn't that only get us publicly visible symbols, though?
Comment 26 Mike Hommey [:glandium] 2011-05-04 15:33:47 PDT
(In reply to comment #25)
> Like this? http://breakpad.appspot.com/275001/show Doesn't that only get us
> publicly visible symbols, though?

All symbols are in .symtab. Publicly visible symbols are in .dynsym.
Comment 27 Ted Mielczarek [:ted.mielczarek] 2011-05-04 15:37:45 PDT
Ah, okay. In that case, it'd be fairly easy to modify that patch to work on .symtab and produce FUNC records.
Comment 28 Mike Hommey [:glandium] 2011-05-04 15:39:22 PDT
(In reply to comment #25)
> Like this? http://breakpad.appspot.com/275001/show

Note that you shouldn't be taking the .dynstr section by name, but from the sh_link from the .dynsym section. (which, if you replace with .symtab, still works, since sh_link for .symtab is .strtab)

Also, according to the summary, the patch doesn't generate the same kind of information as what you'd get from dwarf, though it should be possible.
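
A rough sketch of the approach discussed in this exchange: walk the symbol table entries and keep the function symbols with their sizes. It assumes a little-endian ELF64 file, and that the caller has already extracted the raw bytes of `.symtab` and of the string table that its `sh_link` points to. `function_symbols` is a made-up name, the sample bytes are synthetic, and demangling is left out.

```python
import struct

STT_FUNC = 2  # ELF symbol type for functions
SHN_UNDEF = 0  # section index of undefined symbols
# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size
SYM_FORMAT = "<IBBHQQ"
SYM_SIZE = struct.calcsize(SYM_FORMAT)  # 24 bytes


def function_symbols(symtab, strtab):
    """Yield (name, address, size) for each defined function symbol."""
    for offset in range(0, len(symtab) - SYM_SIZE + 1, SYM_SIZE):
        name_off, info, _other, shndx, value, size = struct.unpack_from(
            SYM_FORMAT, symtab, offset)
        # The low four bits of st_info hold the symbol type.
        if info & 0xF != STT_FUNC or shndx == SHN_UNDEF:
            continue
        end = strtab.index(b"\0", name_off)
        yield strtab[name_off:end].decode("ascii", "replace"), value, size


if __name__ == "__main__":
    # Synthetic tables: one global function and one global data object.
    strtab = b"\0moz_free\0some_data\0"
    symtab = (
        struct.pack(SYM_FORMAT, 1, 0x12, 0, 7, 0xA10, 0x20)
        + struct.pack(SYM_FORMAT, 10, 0x11, 0, 20, 0x201010, 8)
    )
    for name, addr, size in function_symbols(symtab, strtab):
        print("FUNC %x %x 0 %s" % (addr, size, name))
```

Taking the string table from `sh_link` instead of looking it up by name means the same loop would work for either `.dynsym` or `.symtab`.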
Comment 29 Ted Mielczarek [:ted.mielczarek] 2011-05-04 17:13:30 PDT
Right, hence the "modify that patch" bit. I don't think it'd be a lot of work on top of that if we decided we needed it.
Comment 30 Mike Hommey [:glandium] 2011-05-10 08:44:37 PDT
For the record, .symtab information wouldn't have saved us. I'll post a detailed analysis of the whole problem on my blog.
Comment 31 Asa Dotzler [:asa] 2011-05-31 15:22:55 PDT
Do we need to track this for Firefox 6 given the two dependencies are resolved and we do have crash stats for linux?
Comment 32 Mike Hommey [:glandium] 2011-05-31 15:55:19 PDT
(In reply to comment #31)
> Do we need to track this for Firefox 6 given the two dependencies are
> resolved and we do have crash stats for linux?

We don't. I need to update this bug, btw.
Comment 33 Mats Palmgren (:mats) 2012-05-13 11:37:57 PDT
Symbols are missing on Try too, is that the same problem as this bug?
https://tbpl.mozilla.org/php/getParsedLog.php?id=11710315&full=1&branch=try#error0
Comment 34 Ted Mielczarek [:ted.mielczarek] 2012-05-14 05:41:44 PDT
That's some other completely different issue. Note that that's a debug build, and this bug was about PGO builds.
Comment 35 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2012-08-09 07:12:16 PDT
We have been using PGO for some time. Has this been fixed?
Comment 36 Ted Mielczarek [:ted.mielczarek] 2012-08-09 07:40:52 PDT
I believe this got fixed in bug 654975.
