Closed Bug 528231 Opened 10 years ago Closed 4 years ago

get debug symbols for refplatform system libraries in Breakpad format in a location usable in unittests/Talos

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ted, Assigned: ted)

Details

(Whiteboard: [unittest][talos][oldbugs])

Currently we can get stack traces for crashes in unittest suites and Talos, which is nice. However, we don't have symbols for OS libraries, which is bad (see bug 528138 for an example). Since our refplatforms are generally pretty fixed with regard to library versions, it probably wouldn't be that hard to dump symbols in Breakpad format for all the libraries on each platform and then commit them to the build/tools repo. We could then use these symbols to get better stack traces on our unit test and Talos runs.

For Linux, we'd need to get the debug info packages for all the installed system libraries, then run dump_syms on each one.
For OS X, we simply need to run dump_syms on all the system libraries.
For Windows, we can run symchk.exe (comes with WinDBG) to download PDBs for everything in the Windows directory, then run dump_syms on the resulting PDB files.
(When I say run dump_syms, I probably mean run toolkit/crashreporter/tools/symbolstore.py, since that actually generates the right directory structure for the output.)
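The "right directory structure" is Breakpad's symbol store layout: `<debug file>/<debug id>/<sym file>`, with both values taken from the `MODULE` line at the top of each .sym file. A minimal sketch of computing that path, assuming the standard `MODULE <os> <arch> <debug id> <debug file>` header; `sym_store_path` is a hypothetical helper, not symbolstore.py's own code.

```python
import posixpath


def sym_store_path(sym_text: str) -> str:
    """Return the relative store path for a Breakpad .sym file (hypothetical helper).

    Assumes the first line is 'MODULE <os> <arch> <debug_id> <debug_file>'.
    """
    first_line = sym_text.splitlines()[0]
    # maxsplit=4 keeps a debug file name containing spaces in one piece
    parts = first_line.split(" ", 4)
    if len(parts) != 5 or parts[0] != "MODULE":
        raise ValueError("not a Breakpad symbol file: %r" % first_line)
    _, _os_name, _arch, debug_id, debug_file = parts
    # Windows debug files are PDBs (xul.pdb -> xul.sym); others get '.sym' appended.
    if debug_file.lower().endswith(".pdb"):
        sym_file = debug_file[:-4] + ".sym"
    else:
        sym_file = debug_file + ".sym"
    return posixpath.join(debug_file, debug_id, sym_file)
```

For example, a .sym file starting with `MODULE Linux x86 ABC123 libc.so.6` (a made-up id) would be stored at `libc.so.6/ABC123/libc.so.6.sym`, while a Windows `ntdll.pdb` module gets `ntdll.sym` as its file name.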
Any idea how big these would be? We clone build/tools for every single build, so this could end up having an impact on build times.
I don't really have an estimate. We have a bunch of OS symbols on the symbol server, but they're from a bunch of different OS versions, so I don't know how large the set from one single version is.
The zip files are about 8.1 MB; the raw symbol files look like about 47 MB per OS version (Universal 32-bit). They compress and expand fairly quickly, so checking in the zip and then having the build process expand it before starting shouldn't be *too* much of a hit, I'd hope.
(Comment 3 refers to the Mac OS X symbols, of course.)
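The figures above (about 47 MB of raw .sym files against an 8.1 MB zip, roughly a 5.8:1 ratio) are for one Mac OS X version. To get the same numbers for another platform's symbol dump, a minimal sketch that totals the .sym files under a directory and zips them in memory; `measure_symbol_dir` is a hypothetical name, and default DEFLATE compression may not match whatever tool produced the original zips.

```python
import io
import os
import zipfile


def measure_symbol_dir(root: str) -> tuple[int, int]:
    """Return (raw_bytes, zipped_bytes) for all .sym files under root."""
    raw_total = 0
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.endswith(".sym"):
                    continue  # ignore anything that is not a Breakpad symbol file
                path = os.path.join(dirpath, name)
                raw_total += os.path.getsize(path)
                # Archive names are relative to root so the zip mirrors the store layout.
                zf.write(path, os.path.relpath(path, root))
    # The archive is finalized (central directory written) once the with-block exits.
    return raw_total, len(buf.getvalue())
```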
Ok, if we're looking at tens of megabytes we'll definitely need to find somewhere else to put these.
Summary: get debug symbols for refplatform system libraries in Breakpad format in build/tools repo, use in unittests/Talos → get debug symbols for refplatform system libraries in Breakpad format in a location usable in unittests/Talos
Component: Release Engineering → Release Engineering: Future
So we noticed this again, and I had a better idea. How about we just put them on a network share, and mount it on all the slaves? It's only necessary that it be read-only. In addition, we already have most of the symbols we need on the Socorro symbol store, so if we could just mount that then we're already done. If not, we could rsync from the symbols_os dir, or just zip up the contents and stick them somewhere else. We'll need to manually add the symbols matching the Linux refplatform, since we don't have any Linux symbols on the symbol server.
(The symbol server mount I'm referring to is 10.253.0.11:/vol/socorro/symbols, which is mounted on dm-symbolpush01 as well as the Socorro processor machines.)
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Priority: -- → P3
Depends on: 561754
Whiteboard: [unittest][talos]
Assignee: nobody → nrthomas
If we fix bug 561754, we only need them mounted on the server that does the processing.
(In reply to comment #0)
> For Linux, we'd need to get the debug info packages for all the installed
> system libraries, then run dump_syms on each one.
> For OS X, we simply need to run dump_syms on all the system libraries.

Could you define 'system libraries' here?
It's probably sufficient to:
On Linux, run `ldd` on firefox-bin and every *.so in the app dir. Any referenced library that is not in the app dir counts as a system library.

On OS X, use `otool -L` on firefox-bin and every *.dylib in the app dir.

Seem reasonable?
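A minimal sketch of that classification, assuming the usual output formats of GNU `ldd` (`name => /path (0x...)`) and `otool -L` (an unindented header line naming the binary, followed by indented install names). It parses captured text rather than invoking the tools, and the function names are hypothetical. Install names starting with `@` (such as `@executable_path`) are assumed to be app-local.

```python
import os


def parse_ldd(output: str) -> set[str]:
    """Collect resolved library paths from GNU ldd output."""
    paths = set()
    for line in output.splitlines():
        line = line.strip()
        if "=>" in line:
            # 'libfoo.so.1 => /usr/lib/libfoo.so.1 (0x...)' or 'libfoo.so => not found'
            target = line.split("=>", 1)[1].strip()
        else:
            # '/lib64/ld-linux-x86-64.so.2 (0x...)' or 'linux-vdso.so.1 (0x...)'
            target = line
        path = target.split(" (", 1)[0].strip()
        if path.startswith("/"):  # skips vdso, 'not found', 'statically linked'
            paths.add(path)
    return paths


def parse_otool_l(output: str) -> set[str]:
    """Collect install names from `otool -L` output."""
    paths = set()
    for line in output.splitlines():
        if not line[:1].isspace():
            continue  # header lines ('binary:' or 'binary (architecture x):')
        path = line.strip().split(" (", 1)[0]
        if path and not path.startswith("@"):
            paths.add(path)
    return paths


def system_libraries(paths: set[str], app_dir: str) -> set[str]:
    """Keep only libraries that live outside the application directory."""
    app_dir = os.path.abspath(app_dir)
    # commonpath (not startswith) so '/opt/firefox-beta' is not mistaken for '/opt/firefox'
    return {
        p for p in paths
        if os.path.commonpath([os.path.abspath(p), app_dir]) != app_dir
    }
```

Recursing into the dependencies of the libraries found this way would just mean running the same parser on each of them until no new paths appear.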
Oh, there were some upstream Breakpad changes that would slightly change the mechanics of this. On Linux, you need to install the -dbg packages for every system library package, then you run:
dump_syms /path/to/libwhatever.so /path/to/debug
where /path/to/debug is wherever the -dbg package installed things.

On OS X, dumping system libraries is currently broken. mento has a patch to fix it, so it should be unbroken in the near future.
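Scripting the Linux invocation above: a minimal sketch that builds the `dump_syms <library> <debug dir>` command and saves its output. It assumes dump_syms prints the symbol file to stdout, as Breakpad's Linux dump_syms does, and that the -dbg packages installed their files under `/usr/lib/debug` (the usual Debian/Ubuntu location); the function names are hypothetical.

```python
import subprocess


def dump_syms_command(dump_syms: str, library: str, debug_dirs: list[str]) -> list[str]:
    """argv for dumping one library, looking for split debug info in debug_dirs."""
    return [dump_syms, library, *debug_dirs]


def dump_library(dump_syms: str, library: str, debug_dirs: list[str], out_path: str) -> None:
    """Run dump_syms and write its stdout (the .sym text) to out_path."""
    result = subprocess.run(
        dump_syms_command(dump_syms, library, debug_dirs),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if dump_syms fails
    )
    with open(out_path, "wb") as f:
        f.write(result.stdout)
```

For example, `dump_library("./dump_syms", "/usr/lib/libgtk-x11-2.0.so.0", ["/usr/lib/debug"], "libgtk.sym")` would dump one library; the output still has to be filed into the store layout, which symbolstore.py handles.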
OK, I'll probably recurse through the deps of those libs to cover all the bases. We just want the .sym files?
Yeah, just the .sym files. You can probably use toolkit/crashreporter/tools/symbolstore.py to do this for you: pass it the path to a dump_syms binary, a path to store symbols in, and a list of files, and it will dump them all and put them in the right directory structure.
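Driving symbolstore.py that way from another script might look like this. A sketch only: the positional order (dump_syms binary, symbol store directory, then the files) comes from the comment above, symbolstore.py's other options are not shown, and the helper names are hypothetical.

```python
import subprocess
import sys


def symbolstore_command(script: str, dump_syms: str, store_dir: str, files: list[str]) -> list[str]:
    """argv in the order described: dump_syms path, symbol store path, files to dump."""
    return [sys.executable, script, dump_syms, store_dir, *files]


def run_symbolstore(script: str, dump_syms: str, store_dir: str, files: list[str]) -> None:
    """Dump every file into store_dir; raises CalledProcessError on failure."""
    if not files:
        raise ValueError("no files to dump")
    subprocess.run(symbolstore_command(script, dump_syms, store_dir, files), check=True)
```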
(In reply to comment #14)
> Yeah, just the .sym files. You can probably use
> toolkit/crashreporter/tools/symbolstore.py to do this for you, you can pass it
> the path to a dump_syms binary, a path to store symbols in, and a list of files
> and it will dump them all and put them in the right directory structure.

You can also use the infrastructure that we use to provide OS libraries for user machines: http://wiki.caminobrowser.org/QA:Crash_Reporting_and_Analysis#Generating_Symbols_for_Mac_OS_X_Libraries

IIRC there are still bugs in toolkit's symbolstore.py that prevent its use in driving the OS symbol generation.

As ted notes, though, you'll have to use a dump_syms from an older pull of Breakpad that predates the breakage, or wait for a fixed one.
Whiteboard: [unittest][talos] → [unittest][talos][oldbugs][triagefollowup]
Is this a blocker for bug 495464, or perhaps even a dupe?
No, it's unrelated. This is simply about having Breakpad-format debug symbols for our OS libraries, that one is about getting full-memory minidumps out of crashes.
Whiteboard: [unittest][talos][oldbugs][triagefollowup] → [unittest][talos][oldbugs]
We can put these on stackwalker.pvt.build.mozilla.org, since we'll be doing all our minidump processing there (bug 561754). I think we'll have to modify the CGI slightly, since it currently only looks in one directory (where the application symbols are). It'd be trivial to add a system symbol path to that list.
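Breakpad's `minidump_stackwalk` already accepts any number of symbol paths after the minidump (`minidump_stackwalk [-m] <minidump-file> [symbol-path ...]`), so adding a system symbol directory could be as small as one extra argument. A sketch, assuming the CGI shells out to `minidump_stackwalk`; the helper names and directory names are hypothetical, and the paths are assumed to be tried in the order given.

```python
import subprocess
from typing import Sequence


def stackwalk_command(stackwalk: str, minidump: str, symbol_paths: Sequence[str],
                      machine_readable: bool = False) -> list[str]:
    """Build a minidump_stackwalk argv with one or more symbol directories."""
    cmd = [stackwalk]
    if machine_readable:
        cmd.append("-m")  # machine-readable (pipe-delimited) output
    cmd.append(minidump)
    cmd.extend(symbol_paths)
    return cmd


def walk(stackwalk: str, minidump: str, app_symbols: str, system_symbols: str) -> str:
    """Application symbols first, then the extra directory of OS library symbols."""
    cmd = stackwalk_command(stackwalk, minidump, [app_symbols, system_symbols])
    return subprocess.run(cmd, stdout=subprocess.PIPE, check=True, text=True).stdout
```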
No longer depends on: 561754
How does this bug fit into our current plans? The background situation has changed a bit since this was first filed. Can we pass more than one symbol URL to the test automation?
tl;dr - ted would still like to do this, using a static snapshot of OS symbols stored on the slave disk

<nthomas>	I was going to ask you about bug 528231, and what that means today
<ted2>	nthomas: ah
<ted2>	we're not any closer to that than we ever have been :-/
<ted2>	if the stackwalk-cgi thing would have worked out that would have been easy
<nthomas>	that's one of our old bugs, so nominally I supposed to fix it this quarter
<ted2>	heh
<ted2>	okay, so do the slaves have a few tens of MBs free that we could use?
<nthomas>	yes
<ted2>	okay
<nthomas>	static dump of the symbols ?
<ted2>	then we can probably just grab one slave of each type, dump the symbols for all the system libs, and then push that out to all slaves in a fixed location
<ted2>	and then we'd have to fix the automation to know to use it
<ted2>	which probably isn't super hard
<ted2>	i have a script in a cron job on my mac that scrapes system symbols and sends them to the symbol server
<ted2>	http://hg.mozilla.org/users/tmielczarek_mozilla.com/mac-breakpad-symbol-gather/file/tip/gathersymbols.py
<ted2>	a modified version of that would work for mac
<ted2>	for linux, if you install all the -dbg packages, dump_syms knows how to use them
<ted2>	for windows we can probably do one of two things: 1) use the symsrv_convert.exe I use for fetching win32 symbols for crash-stats: http://hg.mozilla.org/users/tmielczarek_mozilla.com/fetch-win32-symbols/file/9ef96c720a1b (which requires us to know the GUID/pdb filename for each library)
<ted2>	or 2) use symchk.exe (comes with windbg), which can just grab PDBs for a set of binaries, and then run dump_syms on each of those
<ted2>	2 is probably easier
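Option 1 needs the GUID and PDB filename because Microsoft-style symbol servers index each PDB by name plus a signature: the GUID as 32 uppercase hex digits (no dashes) followed by the PDB age in hex. A sketch of that path computation under those assumptions; it is not how symsrv_convert.exe is implemented, the helper names are hypothetical, and the GUID used in the tests is made up.

```python
def symsrv_path(pdb_name: str, guid: str, age: int) -> str:
    """Relative path of a PDB on a Microsoft-style symbol server.

    guid is the usual 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' form, optionally in braces.
    """
    hex_guid = guid.strip("{}").replace("-", "").upper()
    if len(hex_guid) != 32 or any(c not in "0123456789ABCDEF" for c in hex_guid):
        raise ValueError("expected a 128-bit GUID, got %r" % guid)
    signature = "%s%X" % (hex_guid, age)  # age in hex, no padding
    return "/".join([pdb_name, signature, pdb_name])


def compressed_name(filename: str) -> str:
    """Servers may also offer a CAB-compressed copy whose last character is '_'."""
    return filename[:-1] + "_"
```

Option 2 (symchk.exe) sidesteps this, since it reads the GUID and age out of each binary itself.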
Note that you can also download a symbol package for Windows from [1]. These are big (> 1GiB uncompressed) so I hope whatever dump_syms dumps is a lot smaller (or you can select only the used files).

[1] http://msdn.microsoft.com/en-us/windows/hardware/gg463028.aspx
Product: mozilla.org → Release Engineering
Clearly I've not been working on this.
Assignee: nthomas → nobody
Component: Other → General Automation
QA Contact: catlee
Socorro stores all its symbols in S3 now, so a fairly easy way to fix this would be to:
1) Scrape OS symbols from all our test machines and stuff them into Socorro (easy, just requires running a script on each type of test machine and then uploading the results)
2) Change the minidump_stackwalk we're using to process test crashes to use the HTTP symbol fetching code that Socorro's stackwalker uses so that it can fetch symbols from S3 as well as local.
3) ???
4) Profit
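Step 2 boils down to trying local symbol directories first, then falling back to fetching `<base url>/<debug file>/<debug id>/<sym file>` over HTTP. The real change would live in minidump_stackwalk itself; this Python sketch only illustrates the lookup order, and `fetch_symbols`, `symbol_url` and any base URL passed in are hypothetical. Downloads are cached, so each module is fetched at most once per cache directory.

```python
import os
import urllib.parse
import urllib.request
from typing import Optional, Sequence


def sym_name_for(debug_file: str) -> str:
    """xul.pdb -> xul.sym; libc.so.6 -> libc.so.6.sym"""
    if debug_file.lower().endswith(".pdb"):
        return debug_file[:-4] + ".sym"
    return debug_file + ".sym"


def symbol_url(base_url: str, debug_file: str, debug_id: str) -> str:
    """Join a symbol server base URL with the Breakpad store layout."""
    parts = (debug_file, debug_id, sym_name_for(debug_file))
    return base_url.rstrip("/") + "/" + "/".join(urllib.parse.quote(p) for p in parts)


def fetch_symbols(local_dirs: Sequence[str], base_urls: Sequence[str], cache_dir: str,
                  debug_file: str, debug_id: str) -> Optional[str]:
    """Return a local path to the .sym file, downloading it if necessary."""
    rel_path = os.path.join(debug_file, debug_id, sym_name_for(debug_file))
    for root in [*local_dirs, cache_dir]:
        path = os.path.join(root, rel_path)
        if os.path.isfile(path):
            return path
    for base in base_urls:
        try:
            with urllib.request.urlopen(symbol_url(base, debug_file, debug_id), timeout=30) as resp:
                data = resp.read()
        except OSError:
            # URLError, HTTPError (e.g. 404 for an unknown module) and timeouts are OSErrors.
            continue
        dest = os.path.join(cache_dir, rel_path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as f:
            f.write(data)
        return dest
    return None
```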
See Also: → 1181610
Duplicate of this bug: 1187388
Per ted on IRC, perhaps this is easier now. This would help with a large batch of gdl/glib/gobject/cairo/libc/pixman intermittents in widget/gtk (perhaps related to gtk3, at least some of them) that have been showing up.
Flags: needinfo?(coop)
If Ted can point me at the script, I'm happy to run it everywhere we need to and post the symbols.
Flags: needinfo?(coop)
Flags: needinfo?(ted)
For Linux this needs the debug symbols installed first, that's bug 1181610. Once that happens I'll post a script that we can use to scrape the symbols and get them into Socorro's symbol store. Once we have the symbols we'll need to update our minidump_stackwalk binaries to make them able to fetch the symbols over HTTP.
Depends on: 1181610
Flags: needinfo?(ted)
OK, let's try this. coop: can you run this script on a linux32 test machine (the minidump I have is from tst-linux32-spot-577)?
https://github.com/luser/breakpad-scrape-system-symbols/blob/master/gathersymbols.py

You'll need to `pip install requests` somewhere; a virtualenv is fine. You'll also need a dump_syms binary next to the script. You can use this one:
https://people.mozilla.com/~tmielczarek/linux32/dump_syms

It should spit out a symbols.zip when it's done, just copy that somewhere and I'll handle the rest.
Flags: needinfo?(coop)
Oh, you should probably pass --all to that script.
I got loaner Linux 32 and 64 test machines and scraped symbols out of them following these steps:
1) Run Firefox
2) wget https://github.com/luser/breakpad-scrape-system-symbols/raw/master/list-dbgsym-packages-v2.1.sh
3) bash list-dbgsym-packages-v2.1.sh -t -p $(pidof firefox) > debug-packages
4) xargs apt-get install -y < debug-packages
5) apt-get install python-requests
6) wget https://github.com/luser/breakpad-scrape-system-symbols/raw/master/gathersymbols.py
7) wget https://people.mozilla.org/~tmielczarek/linux-`uname -p`/dump_syms; chmod +x dump_syms
8) grep -v ^/home/cltbld /tmp/libs.list | xargs python gathersymbols.py -v
9) Do something with symbols.zip
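For reference, step 8's filter in Python. This assumes /tmp/libs.list holds one library path per line, as the `grep` implies; `system_libs_from_list` is a hypothetical name, and unlike `grep -v` it also strips whitespace and drops blank and duplicate lines.

```python
from typing import Iterable, List


def system_libs_from_list(lines: Iterable[str], excluded_prefix: str = "/home/cltbld") -> List[str]:
    """Roughly `grep -v ^/home/cltbld /tmp/libs.list`, keeping first-seen order."""
    seen = set()
    result = []
    for raw in lines:
        path = raw.strip()
        if not path or path.startswith(excluded_prefix) or path in seen:
            continue  # blank, part of the test build under the home dir, or a repeat
        seen.add(path)
        result.append(path)
    return result
```

The resulting list is what `xargs` hands to gathersymbols.py in step 8.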
Flags: needinfo?(coop)
Depends on: 1201012
In bug 1201012 I just landed a patch to use an HTTP-aware minidump_stackwalk that fetches symbols from the S3 symbol store that crash-stats uses. I also uploaded some symbols from our test machines to the symbol store; you can see the resulting stacks in my try push from that bug:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bd5e41049ea1

I think that's enough to call this bug fixed.
Assignee: nobody → ted
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Comment 30's steps don't work out of the box now because I fiddled with the repo in comment 6, so you need:
6) git clone https://github.com/luser/breakpad-scrape-system-symbols.git; cd breakpad-scrape-system-symbols; python setup.py install
..
8) grep -v ^/home/ /tmp/libs.list | xargs gathersymbols -v ./dump_syms
Component: General Automation → General