Closed Bug 813650 Opened 7 years ago Closed 7 years ago

checkForCrashes() should output the top frame of a crash in a TBPL-parsable format

Categories

(Testing :: General, defect)

defect
Not set

Tracking

(firefox18 fixed, firefox19 fixed, firefox-esr17 fixed, b2g18 fixed)

RESOLVED FIXED
mozilla20
Tracking Status
firefox18 --- fixed
firefox19 --- fixed
firefox-esr17 --- fixed
b2g18 --- fixed

People

(Reporter: emorley, Assigned: emorley)

References

(Blocks 1 open bug)

Details

(Keywords: sheriffing-P1)

Attachments

(2 files, 2 obsolete files)

Currently crash stacks are of the form:

{
getting files in '/mnt/sdcard/tests/reftest/profile/minidumps/'
Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android-armv6/1353355545/fennec-20.0a1.en-US.android-arm-armv6.crashreporter-symbols.zip
PROCESS-CRASH | Shutdown | application crashed (minidump found)
Crash dump filename: /tmp/tmpEIdmpN/4c89b590-7ae7-4d61-565a1c7b-19bde431.dmp
Operating system: Android
                  0.0.0 Linux 2.6.32.9-00002-gd8084dc-dirty #1 SMP PREEMPT Wed Feb 2 11:32:06 PST 2011 armv7l nvidia/harmony/harmony/harmony:2.2/FRF91/20110202.102810:eng/test-keys
CPU: arm
     0 CPUs

Crash reason:  SIGSEGV
Crash address: 0x0

Thread 4 (crashed)
 0  libxul.so!js::TraceChildren(JSTracer*, void*, JSGCTraceKind) [jsinferinlines.h : 1595 + 0x0]
     r4 = 0x4e2fb92c    r5 = 0x5097a03f    r6 = 0x00000000    r7 = 0x00000000
     r8 = 0x00000200    r9 = 0x548a2c0c   r10 = 0x00000000    fp = 0xffffffff
     sp = 0x4e2fb748    lr = 0x53a81aa8    pc = 0x544ecc60
    Found by: given as instruction pointer in context
 1  libxul.so!NoteJSChild [nsXPConnect.cpp : 719 + 0xe]
     r4 = 0x4e2fb92c    r5 = 0x5097a03f    r6 = 0x00000007    r7 = 0x00000000
     r8 = 0x4e2fb92c    r9 = 0x00000001   r10 = 0x00000000    fp = 0x00242f08
     sp = 0x4e2fb788    pc = 0x53a81aa8
    Found by: call frame info
...
...
}

Whilst many crashes only occur during a small subset of tests (and as such can be starred by matching against the test filename), this still leaves:
1) Shutdown crashes (of which there are too many open bugs, so we explicitly blacklist generating bug suggestions for them)
2) Crashes that occur in so many different tests (eg GC related crashers)

...both of which would benefit from being able to see the top frame in the annotated summary. (It would also improve our ability to confirm that crashes we are starring are actually the same; rather than just assuming so according to test name - if people don't open the full log).

To output the top frame we need to parse the minidump stackwalk output at:
http://mxr.mozilla.org/mozilla-central/source/build/automationutils.py#168

...and find the top frame by matching the line after the "Process \d+ (crashed)", then output the PROCESS-CRASH in a form like:

PROCESS-CRASH | testname | application crashed [@ top_frame]

As a follow-on bug we can even add a search link for the top frame to the annotated summary (likely added by the TBPL UI).

For reference for matching the top frame, common forms are:

 0  libc.so + 0xa888

 0  libnss3.so!nssCertificate_Destroy [certificate.c : 102 + 0x0]

 0  mozjs.dll!js::GlobalObject::getDebuggers() [GlobalObject.cpp:89df18f9b6da : 580 + 0x0]

 0  libxul.so!void js::gc::MarkInternal<JSObject>(JSTracer*, JSObject**) [Marking.cpp : 92 + 0x28]

 0  crashinjectdll.dll!CrashingThread(void *) [crashinjectdll.cpp:2ee20348ae59 : 17 + 0x0]
Attached patch Part 1: Prep v1Splinter Review
In order to output the PROCESS-CRASH line with the top frame (if found) appended, we need to hold off doing that log.info until after we've called minidump stackwalk.

However, we still want the PROCESS-CRASH line to be above the stacktrace, since otherwise when viewing the logs in TBPL, the anchor links to the PROCESS-CRASH line will unhelpfully link to the less relevant end of the stacktrace.

As such, this patch just queues up the minidump stackwalk related output until the end.
Attachment #683998 - Flags: review?(ted)
* Whilst the regex used is a bit wtf, the alternative was iterating through every line, setting a flag once the "Thread N (crashed)" line found, scraping the line after that (still needing some regex) and then breaking after that. As such, I decided to just go with the horrific regex and give examples as to what we're trying to catch.

* The "universal_newlines=True" is due to the minidump stackwalk output on Windows having CRLFs, which would otherwise break out regex (or at least require making it more complex).

* We'll also need to change runpeptests.py (but not currently being used and will require adjusting the patch slightly + I guess we may end up using mozcrash there before we start using peptest again) and mozcrash (different repo). I'll file bugs for these two after this one; I'm just keen to get this part landed asap to help sheriffs.

Try run at:
https://tbpl.mozilla.org/?tree=Try&rev=7cfcf4044c6e

Examples of crashes with top frame on that try run:
https://tbpl.mozilla.org/php/getParsedLog.php?id=17224313&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=17225063&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=17225055&tree=Try
Attachment #684004 - Flags: review?(ted)
Attachment #684004 - Attachment description: Part 2: Find out output the top frame → Part 2: Determine & output the top frame
Attachment #683998 - Flags: review?(ted) → review?(wlachance)
Attachment #684004 - Flags: review?(ted) → review?(wlachance)
Comment on attachment 683998 [details] [diff] [review]
Part 1: Prep v1

Review of attachment 683998 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine.
Attachment #683998 - Flags: review?(wlachance) → review+
Comment on attachment 684004 [details] [diff] [review]
Part 2: Determine & output the top frame

Review of attachment 684004 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/automationutils.py
@@ +183,5 @@
> +          #  0  mozjs.dll!js::GlobalObject::getDebuggers() [GlobalObject.cpp:89df18f9b6da : 580 + 0x0]
> +          #  0  libxul.so!void js::gc::MarkInternal<JSObject>(JSTracer*, JSObject**) [Marking.cpp : 92 + 0x28]
> +          match = re.search(r"Thread \d+ \(crashed\)\n 0  (?:.*!)?(?:void )?([^\[\n]+)", out)
> +          if match:
> +            topFrame = "@ %s" % match.group(1).strip()

Yeah, I'm not digging this horrible regex. I think the simpler solution of iterating over the lines would be better. Something like:
lines = out.splitlines()
for i, line in enumerate(lines):
  if "(crashed)" in line:
    topFrame = re.search("^0 ([^\[]+)", lines[out+1]).group(1)

(regex needs more fiddling if you want to strip out library names, but whatever)

@@ +198,5 @@
>          elif stackwalkPath and not os.path.exists(stackwalkPath):
>            stackwalkOutput.append("MINIDUMP_STACKWALK binary not found: %s" % stackwalkPath)
> +      if not topFrame:
> +        topFrame = "Unknown top frame"
> +      log.info("PROCESS-CRASH | %s | application crashed [%s]", testName, topFrame)

nit: it'd probably be clearer to put the @ in this string and have topFrame just be the bit from the stack trace.
Attachment #684004 - Flags: review?(wlachance) → review-
Using a loop this time :-)

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #5)
> Something like:
> lines = out.splitlines()
> for i, line in enumerate(lines):
>   if "(crashed)" in line:
>     topFrame = re.search("^0 ([^\[]+)", lines[out+1]).group(1)
> 
> (regex needs more fiddling if you want to strip out library names, but
> whatever)

This causes exceptions if we don't match for whatever reason & results in having to shoehorn too much into one line, so I've kept the |if match: topFrame = foo| of patch v1.

The strip() is also necessary otherwise we get the training space from examples 2-5 in comment 0. The alternative regex to avoid the strip is:
"^ 0  (?:.*!)?(?:void )?([^\[]+[^ \[]+)"
...which is even more of a mess, so I'd rather stick with the strip().

> @@ +198,5 @@
> >          elif stackwalkPath and not os.path.exists(stackwalkPath):
> >            stackwalkOutput.append("MINIDUMP_STACKWALK binary not found: %s" % stackwalkPath)
> > +      if not topFrame:
> > +        topFrame = "Unknown top frame"
> > +      log.info("PROCESS-CRASH | %s | application crashed [%s]", testName, topFrame)
> 
> nit: it'd probably be clearer to put the @ in this string and have topFrame
> just be the bit from the stack trace.

This would prefix "@" to the "Unknown top frame" case, which seems counter-intuitive; which is why I kept these separate.

Testing the changed lines in a standalone file seems to work fine & pyflakes doesn't complain. Will send to Try again before pushing as usual etc.
Attachment #684004 - Attachment is obsolete: true
Attachment #686262 - Flags: review?(ted)
Bah, qref.
Attachment #686262 - Attachment is obsolete: true
Attachment #686262 - Flags: review?(ted)
Comment on attachment 686264 [details] [diff] [review]
Part 2: Determine & output the top frame v2

Oops forgot to set r?
(See comment 6 for comment when attached)
Attachment #686264 - Flags: review?(ted)
Blocks: 816952
Attachment #686264 - Flags: review?(ted) → review+
Depends on: 817349
Blocks: 817545
Blocks: 570723
Blocks: 813132
No longer blocks: 830046
Depends on: 855697
You need to log in before you can comment on or make changes to this bug.