Closed Bug 570287 Opened 9 years ago Closed 9 years ago

New stack fixer that uses breakpad symbol files

Categories

(Core :: XPCOM, enhancement)

x86
macOS
enhancement
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: jruderman, Assigned: jruderman)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 4 obsolete files)

Attached patch patch v1 (obsolete) — Splinter Review
This new script is intended to be a replacement for fix_macosx_stack.py and fix-linux-stack.pl, at least for Tinderbox.  Those older scripts don't work on Tinderbox (other than Bd) because the native debugging symbols are stripped before they get to unit test boxes (bug 547646 / bug 547873).

It's based on lookupsym.py (from bug 547646 comment 17) and fix_macosx_stack.py.

Caveats:

* This script requires breakpad symbols, so it's inconvenient to use on developer machines.  Let's keep fix_macosx_stack.py and fix-linux-stack.pl around for use on developer machines.

* If we stop downloading breakpad symbols on unit test Tinderboxen, this script will have to be modified so it can run partially on the unit test boxes and partially on a server that has symbols.  This script would become the basis of the hypothetical "symbolicate-assertion-stack" that bug 561754 talks about.  (Or, we could leave the raw stacks on Tinderbox, and let developers download the symbol files and run the script manually when they're interested.)

* This script doesn't know how to combine symbols from multiple sources.  This is good enough for most layout crashes, and it doesn't make sense to fix this until we fix bug 561754.

* This script doesn't have an easy way to compute the UUID used by breakpad and minidump_stackwalk.  This is ok as long as there is only one symbol file per filename, but when we start doing universal builds on Mac (bug 411588) there might be two.  Then we'll need a fix for bug 570286 or something.
When it has a symbol file:
nsLineList_iterator::operator==(nsLineList_iterator) [layout/generic/nsLineBox.h:696]

When it doesn't have a symbol file:
AppKit + 0x108bf8

It seems to be getting the same function names as minidump_stackwalk, which is a good sign :)
Attached patch patch v2 (obsolete) — Splinter Review
Attachment #449406 - Attachment is obsolete: true
Attached patch patch v3 (obsolete) — Splinter Review
Attachment #449426 - Attachment is obsolete: true
Attached patch patch v4 (obsolete) — Splinter Review
Attachment #449471 - Attachment is obsolete: true
Works great on Try.

Reading symbols for libxul takes about 4 seconds on Mac and 10 seconds on Linux (larger symbol file). For comparison, fix_macosx_stack.py had about a minute of overhead when it worked, but minidump_stackwalk only takes 1 or 2 seconds.
Attachment #449494 - Flags: review?(ted.mielczarek)
Comment on attachment 449494 [details] [diff] [review]
patch v4

Would be good to have Jim's review of the "readSymbolFile" class, since Ted wrote about half of that part.
Attachment #449494 - Flags: review?(jim)
Assignee: nobody → jruderman
Status: NEW → ASSIGNED
Blocks: 571443
Comment on attachment 449494 [details] [diff] [review]
patch v4

>diff --git a/build/automation.py.in b/build/automation.py.in
>--- a/build/automation.py.in
>+++ b/build/automation.py.in
>@@ -618,43 +618,51 @@ user_pref("camino.use_system_proxy_setti
>         # We should have a "crashinject" program in our utility path
>         crashinject = os.path.normpath(os.path.join(utilityPath, "crashinject.exe"))
>         if os.path.exists(crashinject) and subprocess.Popen([crashinject, str(proc.pid)]).wait() == 0:
>           return
>       #TODO: kill the process such that it triggers Breakpad on OS X (bug 525296)
>     self.log.info("Can't trigger Breakpad, just killing process")
>     proc.kill()
> 
>-  def waitForFinish(self, proc, utilityPath, timeout, maxTime, startTime, debuggerInfo):
>+  def waitForFinish(self, proc, utilityPath, timeout, maxTime, startTime, debuggerInfo, symbolsPath):
>     """ Look for timeout or crashes and return the status after the process terminates """
>     stackFixerProcess = None
>     stackFixerModule = None

You don't really need stackFixerModule anymore, you don't use it anywhere.

>+      if self.IS_DEBUG_BUILD and (self.IS_MAC or self.IS_LINUX) and symbolsPath and os.path.exists(symbolsPath):
>+        # Run each line through a function in fix_stack_using_bpsyms.py (uses breakpad symbol files)
>+        sys.path.insert(0, utilityPath)
>+        import fix_stack_using_bpsyms as stackFixerModule
>+        stackFixerFunction = lambda line: stackFixerModule.fixSymbols(line, symbolsPath)
>+        del sys.path[0]
>+      elif self.IS_DEBUG_BUILD and self.IS_MAC:
>+        # Run each line through a function in fix_macosx_stack.py (uses atos)
>+        sys.path.insert(0, utilityPath)
>+        import fix_macosx_stack as stackFixerModule
>+        stackFixerFunction = lambda line: stackFixerModule.fixSymbols(line)
>+        del sys.path[0]

This is just a fallback for if you don't specify --symbols-path? Is it worth carrying this code around? (All the tinderboxes will have symbols, presumably.)

>diff --git a/testing/mochitest/Makefile.in b/testing/mochitest/Makefile.in
>--- a/testing/mochitest/Makefile.in
>+++ b/testing/mochitest/Makefile.in
>@@ -106,16 +106,18 @@ TEST_HARNESS_BINS := \
> ifeq ($(OS_ARCH),WINNT)
> TEST_HARNESS_BINS += \
>   crashinject$(BIN_SUFFIX) \
>   crashinjectdll$(DLL_SUFFIX) \
>   vmwarerecordinghelper$(DLL_SUFFIX) \
>   $(NULL)
> endif
> 
>+TEST_HARNESS_BINS += fix_stack_using_bpsyms.py

Just stick this in the first definition of TEST_HARNESS_BINS (up with pk12util etc).

>diff --git a/tools/rb/fix_stack_using_bpsyms.py b/tools/rb/fix_stack_using_bpsyms.py
>new file mode 100755
>--- /dev/null
>+++ b/tools/rb/fix_stack_using_bpsyms.py
>+from __future__ import with_statement

I guess we should just bump our minimum Python requirement to 2.5 (this doesn't exist in 2.4).

>+
>+import sys
>+import os
>+import re
>+import bisect
>+
>+def prettyFileName(name):
>+  if name.startswith("../"):
>+    return name.split("/")[-1]

What's this supposed to handle? Also, you should probably just use os.path.basename(name).

>+  elif name.startswith("hg:"):
>+    (junk, repo, path, rev) = name.split(":")
>+    # We could construct an hgweb URL with /file/ or /annotate/, like this:
>+    # url = "http://" + repo + "/file/" + rev + "/" + path

More precisely, like: http://code.google.com/p/socorro/source/browse/trunk/webapp-php/application/config/codebases.php#8 (although that hardcodes the repo bit, you could just subst it as well).

>+        elif line[0] in "0123456789abcdef":
>+          # This is one of the "line records" corresponding to the last FUNC record
>+          # <address> <size> <line> <filenum>
>+          (rva, size, line, filenum) = line.split(None)
>+          rva = int(rva,16)
>+          file = files[filenum]
>+          name = lastFuncName + " [" + file + ":" + line + "]"
>+          funcs[rva] = name

It feels like it'd be smarter to handle this separately, rather than adding lines into one huge funcs array. Like, keep the funcs array to just hold function names, then have a separate dict that does funcname -> [line data]. So you'd do two lookups: one in the rvas list for the function, then a separate lookup in the per-function data. Seems like it might even be faster, since you only have to search over all the functions and not all the line data for every lookup. Something like:
funcs[rva] = {'name': name, 'addrs':[], 'lines':{}]

then in the line handler,
funcs[lastFuncName]['addrs'].append(rva)
funcs[lastFuncName]['lines'][rva] = (file, line)

Does that make sense?
>+  def addrToSymbol(self, address):
>+    i = bisect.bisect(self.addrs, address) - 1
>+    if i > 0:
>+      #offset = address - self.addrs[i]
>+      return self.funcs[self.addrs[i]]
>+    else:
>+      return ""

Then this would be something like:
i = bisect.bisect(self.addrs, address) - 1
if i > 0:
  func = self.funcs[self.addrs[i]]
  j = bisect.bisct(func.addrs, address) - 1
  if j > 0:
    line = func["lines"][func["addrs"][j]
    return "%s [%s:%s]" % (func["name"], line[0], line[1])


>+
>+line_re = re.compile("^(.*) ?\[([^ ]*) \+(0x[0-9A-F]{1,8})\](.*)$")
>+balance_tree_re = re.compile("^([ \|0-9-]*)")
>+
>+def fixSymbols(line, symbolsDir):
>+  result = line_re.match(line)
>+  if result is not None:
>+    # before allows preservation of balance trees
>+    # after allows preservation of counts
>+    (before, file, address, after) = result.groups()
>+    address = int(address, 16)
>+    # throw away the bad symbol, but keep balance tree structure
>+    before = balance_tree_re.match(before).groups()[0]
>+    symbol = addressToSymbol(file, address, symbolsDir)
>+    if not symbol:
>+      symbol = os.path.basename(file) + " + " + hex(address)

I generally prefer string interpolation, so like:
symbol = "%s + 0x%x" % (os.path.basename(file), address)

r=me with those fixes.

Jim's still on vacation till next week, IIRC.
Attachment #449494 - Flags: review?(ted.mielczarek) → review+
> You don't really need stackFixerModule anymore, you don't use it anywhere.

Fixed.

> This is just a fallback for if you don't specify --symbols-path? Is it worth
> carrying this code around? (All the tinderboxes will have symbols, presumably.)

I think it's worth it for developer boxes, since they usually don't strip native symbols and don't run make buildsymbols.  I added  a comment.

> Just stick this in the first definition of TEST_HARNESS_BINS

Done.

> I guess we should just bump our minimum Python requirement to 2.5

Do enough things break that we shouldn't even let people *build* with 2.4?  Anyway, this should be a separate bug.

> What's [if name.startswith("../")] supposed to handle?

dom_quickstubs.cpp and many .h files show up with relative paths that are useless and/or don't correspond to the layout of the source tree.

> Also, you should probably just use os.path.basename(name).

Done.

>>+    # url = "http://" + repo + "/file/" + rev + "/" + path
>More precisely...

Changed to use % and allow a line number.  Still commented out, though, because it's really verbose for log files.

> It feels like it'd be smarter to handle [lines] separately, rather than adding
> lines into one huge funcs array

Say there are M functions and each has N lines. A lookup will either do a binary search over MN entries, or a binary search over M entries followed by a binary search over N entries. Either way it's O(M+N).

Separating it might make the sorting a little faster, but it would also result in creating more objects and extra complexity, so I don't think it's worth changing.

> I generally prefer string interpolation, so like:
> symbol = "%s + 0x%x" % (os.path.basename(file), address)

Fixed.

Also changed initial contributor from MoCo to MoFo, and rebased for bug 573263.
Attached patch patch v5Splinter Review
Attachment #449494 - Attachment is obsolete: true
Attachment #454134 - Flags: review+
Attachment #449494 - Flags: review?(jim)
http://hg.mozilla.org/mozilla-central/rev/94ffc25de876

Works great on Mac32, Linux32, and Linux64.  Should work on Mac64 when bug 558947 is fixed.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.