607831 - switch to binary format symbols with a minidump_stackwalk replacement

Description

15 years ago

The current stackwalk_server + source_daemon combination is proving inadequate for our needs in Socorro (see bug 592467). At the same time, Google has implemented and landed in the Breakpad repository an alternate symbol implementation that uses binary symbol files instead of the current text-format files. I am currently investigating using the new binary symbol files, and I think they'll solve some of our problems. There's still the question of how we fit them into our process, since we'd like to scale up to a large number of processor nodes, and Aravind is not happy with the idea of having the symbol NFS mount mounted on a large number of systems.

My current testing involved writing a little command-line app to convert text-format symbols into binary-format symbols using the conversion classes available in Breakpad, and also writing a symbol supplier implementation that mimics SimpleSymbolSupplier, but which simply mmaps the binary symbol files. I then built a modified version of minidump_stackwalk which used that symbol supplier to do processing with the binary symbols.

Initial results are very promising. I downloaded the symbol files for the Firefox 4.0b6 builds: Linux x86, Linux x86-64, Windows x86, and Mac ppc/x86 Universal.

In terms of file size, the binary symbols are about 1.6x larger on disk.

In terms of processing speed, the binary symbols + mmap supplier is much faster:

real 0m1.751s (stock minidump_stackwalk + text symbols)
real 0m0.235s (modified mdsw + mmapped binary symbols)

Both times are with a fully primed fs cache, so no actual IO overhead.
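To illustrate why an mmap-based supplier can be cheap: mapping a file does not read it, and pages are only faulted in when they are touched. Below is a minimal Python sketch of that idea. It is not Breakpad's binary symbol format and not the SimpleSymbolSupplier API; the record layout (fixed-size address/value pairs), `write_table`, `lookup`, and `demo` are all hypothetical names made up for this example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout for illustration only: (address, value),
# two little-endian unsigned 32-bit integers. Not Breakpad's format.
RECORD = struct.Struct("<II")


def write_table(path, records):
    """Write fixed-size records to a file, sorted by address."""
    with open(path, "wb") as f:
        for addr, value in sorted(records):
            f.write(RECORD.pack(addr, value))


def lookup(mapped, address):
    """Binary-search the mapped table for the last record with addr <= address.

    Only the pages holding the probed records are touched, so most of the
    file never has to be read.
    """
    lo, hi = 0, len(mapped) // RECORD.size
    found = None
    while lo < hi:
        mid = (lo + hi) // 2
        addr, value = RECORD.unpack_from(mapped, mid * RECORD.size)
        if addr <= address:
            found = (addr, value)
            lo = mid + 1
        else:
            hi = mid
    return found


def demo():
    """Build a throwaway table, map it read-only, and look up one address."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_table(path, [(i * 16, i) for i in range(100000)])
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                return lookup(mapped, 16 * 1234 + 5)
    finally:
        os.remove(path)
```

A text-format file would have to be parsed from start to end before any lookup; here the lookup probes roughly log2(n) records in a file that is mapped but otherwise unread.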
Reading from a cold fs the difference is still huge:

real 0m3.654s (text)
real 0m0.943s (binary)

I also wanted to see the difference in number of bytes read from disk, since with the text-format symbols, the entire file needs to be read in order to do anything with it, whereas with the binary-format symbols, only part of the file needs to be read (although I haven't read the entire implementation, so I'm not sure if it's as smart as I think it should be). I wrote a little utility to exec a program and cat /proc/<pid>/io after it finished, since I was having no luck with systemtap and other utilities:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/procio/

Using this utility I could examine how much data was actually read by both implementations while processing the same minidump:

------------
text symbols:
rchar: 37916068
wchar: 103144
syscr: 4666
syscw: 3076
read_bytes: 34646016
write_bytes: 0
cancelled_write_bytes: 0
------------
binary symbols:
rchar: 3319886
wchar: 103072
syscr: 438
syscw: 3076
read_bytes: 13722624
write_bytes: 0
cancelled_write_bytes: 0
------------

I believe the important number here is read_bytes, which is the number of bytes of actual disk I/O that the program caused. (This is only non-zero when the fs cache is empty; I've been testing with the symbols on a USB drive that I can unmount and remount to flush the cache.) With text symbols for this minidump, we wind up reading about 34MB of data from disk. With binary symbols, we only read about 13MB of data, which is a pretty nice improvement!

All that being said, storing the symbol files on disk like we do now is just one option. We could also investigate storing the binary files wholesale in HBase or HDFS.

The tools I used for testing are here:
http://hg.mozilla.org/users/tmielczarek_mozilla.com/symboltests/
(except for procio, in the previously mentioned repository)
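For anyone wanting to repeat the read_bytes measurement, here is a rough Python sketch of the "exec a program, then read /proc/<pid>/io" approach. It is not the procio source, and whether procio works this way is an assumption. `parse_proc_io` and `run_and_read_io` are hypothetical names. The sketch is Linux-only and assumes the kernel still exposes /proc/<pid>/io for a child that has exited but has not yet been reaped.

```python
import os
import sys


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def run_and_read_io(argv):
    """Run argv as a child and return its I/O counters (Linux only)."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # WNOWAIT waits for the child to exit but leaves it unreaped; this sketch
    # assumes /proc/<pid>/io is still readable at that point.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    with open(f"/proc/{pid}/io") as f:
        counters = parse_proc_io(f.read())
    os.waitpid(pid, 0)  # now actually reap the child
    return counters


if __name__ == "__main__":
    for name, value in run_and_read_io(sys.argv[1:]).items():
        print(f"{name}: {value}")
```

Feeding the two read_bytes values reported above into a simple division (34646016 / 13722624) gives a ratio of roughly 2.5x less disk I/O for the binary symbols on this minidump.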