Closed Bug 929128 (opened 11 years ago, closed 6 years ago)

Gecko assertions don't include full stack trace

Categories

(Firefox OS Graveyard :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jgriffin, Unassigned)

References

(Blocks 1 open bug)

Details

When a debug B2G mochitest fails because of an assertion, the assertion is logged without a stack trace. The log contains only:

14:47:44     INFO -  [Child 709] ###!!! ASSERTION: TabChild::SetFocus not supported in TabChild: 'Not Reached', file ../../../gecko/dom/ipc/TabChild.cpp, line 904

In Firefox, the same kind of assertion is logged with a full stack trace, which can make debugging easier.
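The assertion line quoted above has a fixed layout: a process tag, the message, the failed condition, the source file, and the line number. The C++17 sketch below uses `std::regex` to extract those fields from such a line. It only illustrates the format and is not the test harness's actual parser. `AssertionInfo` and `ParseAssertion` are hypothetical names, and the pattern assumes every assertion line looks exactly like the one in this report.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Fields of one "###!!! ASSERTION" log line (hypothetical struct).
struct AssertionInfo {
  std::string process;    // e.g. "Child 709"
  std::string message;    // human-readable assertion message
  std::string condition;  // the failed condition, e.g. "Not Reached"
  std::string file;       // source path as printed in the log
  int line = 0;           // source line number
};

// Hypothetical parser. Assumes the layout seen in this report:
//   [<process>] ###!!! ASSERTION: <message>: '<condition>', file <path>, line <n>
// Returns std::nullopt for lines that are not assertions.
std::optional<AssertionInfo> ParseAssertion(const std::string& logLine) {
  static const std::regex kPattern(
      R"(\[([^\]]+)\] ###!!! ASSERTION: (.*): '(.*)', file (.*), line (\d+))");
  std::smatch m;
  if (!std::regex_search(logLine, m, kPattern)) {
    return std::nullopt;
  }
  AssertionInfo info;
  info.process = m[1].str();
  info.message = m[2].str();
  info.condition = m[3].str();
  info.file = m[4].str();
  info.line = std::stoi(m[5].str());
  return info;
}
```

The line records only where the assertion fired (TabChild.cpp, line 904). It contains nothing about the callers, and the callers are exactly the information this bug is asking for.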

Sample log: https://tbpl.mozilla.org/php/getParsedLog.php?id=28953771&tree=Cedar&full=1#error0

Is this just us not setting environment variables correctly, or is the stack-walking code for assertions really not hooked up on B2G?

XPCOM_DEBUG_BREAK controls the printing of stacks:
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#630

Although, looking at nsStackWalk.cpp, it seems plausible that it has never worked on ARM:
http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsStackWalk.cpp#34
I guess we'd need to know whether unwind_backtrace exists in our B2G builds:
http://mxr.mozilla.org/mozilla-central/source/configure.in#7371

Here's a B2G crash where the crash reporter is working: https://tbpl.mozilla.org/php/getParsedLog.php?id=30980530&tree=Cedar&full=1#error0

A minidump is generated. The test automation retrieves it and stack-walks it with the Breakpad tools, and the result goes into the TBPL log.

Here's a B2G crash (caused by MOZ_CRASH) where this does not work: https://tbpl.mozilla.org/php/getParsedLog.php?id=30610612&tree=Cedar&full=1#error5

We see the system's SIGSEGV handling in action, and the automation doesn't appear to mention minidumps at all. (From what I've seen, the system's handling is suppressed when the crash reporter is enabled. That may or may not be due to bug 942407, so don't rely on it.)

I think we should figure out the reason for this disparity in crash handling. If we can fix that, we'll have assertion stacks in TBPL.

As for the issues raised in comment #1 and comment #2: having the crashing process unwind its own stack could also be useful. There's an unwinder in libgcc, used for exception handling. We should make sure it won't do anything particularly bad, such as crashing recursively, when it encounters the not-quite-sorted .ARM.exidx sections sometimes seen on ICS because of a linker bug. If that doesn't work, the ARM-specific unwinder I wrote for the profiler could be extracted for use with NS_StackWalk if need be (note that it depends on tools/profiler/shared-libraries*). There's also a new profiling unwinder that :sewardj is working on (bug 938157). However, it's still in development, and in particular it needs more memory-usage optimization before it's usable on B2G.
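For reference, libgcc's unwinder exposes `_Unwind_Backtrace` and `_Unwind_GetIP` through `<unwind.h>`. The C++ sketch below shows what a minimal in-process stack walk built on them might look like. `FrameCollector`, `WalkOwnStack`, and `PrintOwnStack` are hypothetical names. The code records raw program counters only and does no symbolization. It also does nothing to guard against the unsorted `.ARM.exidx` problem described above, which is exactly what would need testing on ICS devices before anyone relied on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>
#include <unwind.h>  // libgcc's unwinder interface (GCC/Clang toolchains)

// State shared with the per-frame callback (hypothetical helper type).
struct FrameCollector {
  std::vector<std::uintptr_t> pcs;  // return addresses, innermost first
  std::size_t maxFrames;            // stop after this many frames
};

// Invoked by _Unwind_Backtrace once per frame, innermost first.
static _Unwind_Reason_Code CollectFrame(struct _Unwind_Context* ctx, void* arg) {
  auto* collector = static_cast<FrameCollector*>(arg);
  if (collector->pcs.size() >= collector->maxFrames) {
    // Any value other than _URC_NO_REASON stops the walk.
    return _URC_END_OF_STACK;
  }
  std::uintptr_t pc = _Unwind_GetIP(ctx);
  if (pc != 0) {
    collector->pcs.push_back(pc);
  }
  return _URC_NO_REASON;  // continue to the caller's frame
}

// Hypothetical helper: walks the calling thread's own stack, in process.
std::vector<std::uintptr_t> WalkOwnStack(std::size_t maxFrames) {
  FrameCollector collector{{}, maxFrames};
  _Unwind_Backtrace(CollectFrame, &collector);
  return collector.pcs;
}

// Prints raw program counters; turning them into symbols is a separate step.
void PrintOwnStack(std::FILE* out, std::size_t maxFrames) {
  const std::vector<std::uintptr_t> pcs = WalkOwnStack(maxFrames);
  for (std::size_t i = 0; i < pcs.size(); ++i) {
    std::fprintf(out, "#%02zu: pc=0x%jx\n", i, static_cast<std::uintmax_t>(pcs[i]));
  }
}
```

The sketch collects addresses first and formats them afterwards, which keeps the callback trivial while the unwinder is running.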

Historically we've had two different cases here:
1) Some test suites are run with non-fatal assertions, using XPCOM_DEBUG_BREAK=stack. Mochitest variants and Reftest/Crashtest are run this way. We use the in-process NS_StackWalk code to print a stack every time an assertion is hit, and we keep track of them in the test harness.

2) Some test suites are run with fatal assertions, using XPCOM_DEBUG_BREAK=abort (or stack-and-abort). xpcshell tests are run this way. Assertions terminate the process so that Breakpad catches the exception and writes a minidump, from which we then produce a stack.
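The two cases amount to a small decision table. The C++ sketch below is a hypothetical model built only from the values named in this comment (`stack`, `abort`, `stack-and-abort`). It is not Gecko's real parsing code, which may accept other values. `AssertionAction`, `ParseDebugBreak`, `IsFatal`, `UsesInProcessStackWalk`, and `ActionFromEnvironment` are invented names. The model makes the reasoning explicit: with `stack`, the only possible source of a stack trace is the in-process walker, because the process never dies and Breakpad never writes a minidump.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// What happens when an assertion fires, for the XPCOM_DEBUG_BREAK values
// named in this comment. Hypothetical model, not Gecko's implementation.
enum class AssertionAction {
  kStackAndContinue,  // "stack": non-fatal; in-process stack, harness counts it
  kAbort,             // "abort": fatal; Breakpad writes a minidump
  kStackAndAbort,     // "stack-and-abort": in-process stack, then fatal abort
  kOther              // unset or any other value: not modeled here
};

AssertionAction ParseDebugBreak(const char* value) {
  if (value == nullptr) return AssertionAction::kOther;
  if (std::strcmp(value, "stack") == 0) return AssertionAction::kStackAndContinue;
  if (std::strcmp(value, "abort") == 0) return AssertionAction::kAbort;
  if (std::strcmp(value, "stack-and-abort") == 0) return AssertionAction::kStackAndAbort;
  return AssertionAction::kOther;
}

// Fatal assertions end the process, so the minidump path (case 2) applies.
bool IsFatal(AssertionAction action) {
  return action == AssertionAction::kAbort ||
         action == AssertionAction::kStackAndAbort;
}

// Only these modes depend on the in-process stack walker (case 1).
bool UsesInProcessStackWalk(AssertionAction action) {
  return action == AssertionAction::kStackAndContinue ||
         action == AssertionAction::kStackAndAbort;
}

// Reads the setting the test harness exported for this process.
AssertionAction ActionFromEnvironment() {
  return ParseDebugBreak(std::getenv("XPCOM_DEBUG_BREAK"));
}
```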

Given that comment 0 is about a Mochitest, this seems to be related to case #1. That means the in-process stack walking isn't working (or isn't configured properly), and the problem has nothing to do with minidumps or Breakpad.

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #4)
> Historically we've had two different cases here:
> 1) Some test suites are run with non-fatal assertions, using XPCOM_DEBUG_BREAK=stack. Mochitest variants and Reftest/Crashtest are run this way.

Oh, I see.  The mochitest crash I found was a conditional NS_RUNTIMEABORT, not an assertion (for any of the several different meanings of "assertion"), so that isn't relevant for this bug… meaning that I need to file a separate bug for it.
Sorry, assertions are hard. :-|

Do we have any plan to actually fix this?

Firefox OS is not being worked on.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX