Last Comment Bug 718999 - make "You can't dereference a NULL nsCOMPtr" assertions fatal
: make "You can't dereference a NULL nsCOMPtr" assertions fatal
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: 9 Branch
: All All
: -- normal (vote)
: mozilla12
Assigned To: Marco Bonardo [::mak]
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-01-18 05:42 PST by Marco Bonardo [::mak]
Modified: 2012-01-30 13:21 PST (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch (1.91 KB, patch)
2012-01-18 08:11 PST, Marco Bonardo [::mak]
benjamin: review+
Details | Diff | Splinter Review

Description Marco Bonardo [::mak] 2012-01-18 05:42:37 PST
While it's true this is most likely followed by a crash, in some cases we don't get a decent stack from it, for example if this happens on tinderbox and causes a segfault there is no way to get a stack.

Thus, I suggest we convert these assertions to MOZ_ASSERT so we abort and get a stack in all cases.

An example of a tinderbox failure:

###!!! ASSERTION: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../../dist/include/nsCOMPtr.h, line 809
NS_DebugBreak+0x00000026 [/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/dist/bin/libxpcom.so +0x0000203C]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x000025AA]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x00003A3B]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x00004F7F]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x0000211C]
UNKNOWN [/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/dist/bin/libxul.so +0x010F7C37]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x00008B2C]
UNKNOWN [../../../../../dist/bin/test_IHistory +0x00006211]
__libc_start_main+0x000000DC [/lib/libc.so.6 +0x00015DEC]
###!!! ASSERTION: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../../dist/include/nsCOMPtr.h, line 809
/bin/sh: line 1: 32579 Segmentation fault      XPCOM_DEBUG_BREAK=stack-and-abort /builds/slave/m-aurora-lnx-dbg/build/obj-firefox/dist/bin/run-mozilla.sh ../../../../../dist/bin/$f
make[5]: *** [check] Error 139
make[5]: Leaving directory `/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/toolkit/components/places/tests/cpp'
make[4]: *** [check] Error 2
make[4]: Leaving directory `/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/toolkit/components/places/tests'
make[3]: *** [check] Error 2
make[3]: Leaving directory `/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/toolkit/components/places'
make[2]: *** [check] Error 2
make[2]: Leaving directory `/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/toolkit/components'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/builds/slave/m-aurora-lnx-dbg/build/obj-firefox/toolkit'
make: *** [check] Error 2
program finished with exit code 2
elapsedTime=782.101013
TinderboxPrint: check<br/>16834/0
Unknown Error: command finished with exit code: 2
Comment 1 Marco Bonardo [::mak] 2012-01-18 06:28:42 PST
As a side note, we can't use MOZ_ASSERT here, due to bug 717540.
So either I use NS_ABORT_IF_FALSE or I have to wait for that bug.
Comment 2 Benjamin Smedberg [:bsmedberg] 2012-01-18 07:24:27 PST
I don't understand how using MOZ_ASSERT will help: this test is already using stack-and-abort, which called mozalloc_abort. MOZ_ASSERT forwards to CrashInJS, which uses the same basic technique, so I don't see how we'd get any better stack...

I think this is just an issue where binary test programs aren't hooked up to breakpad while running tests, so we don't get the good crashreport stacks, just whatever stacks get printed by the program itself.
Comment 3 Marco Bonardo [::mak] 2012-01-18 07:54:30 PST
well, we'd get a stack on the xpcom deref rather than later, plus the benefit of having an aborting assertion rather than a weak one, the error is pretty important to catch.

Even if, according to what you say, looks like even aborting at the deref we'd not get a stack due to how binary tests are built? Would that be hard to fix?
Comment 4 Benjamin Smedberg [:bsmedberg] 2012-01-18 07:56:49 PST
I don't oppose the change, but it won't make this particular case any better.

Each of the binary tests would have to enable the crash reporter and we'd have to run them in a test environment which is aware of and processes the resulting minidumps. For things that don't go through XRE_main that's a real PITA, and it's a pretty large chunk of work.
Comment 5 Marco Bonardo [::mak] 2012-01-18 08:11:07 PST
Created attachment 589512 [details] [diff] [review]
patch

I see, it's a pity we can't figure these failures, being unable to reproduce locally :(
Btw, in case we want to take an abort_if_false patch, here it is.
Comment 6 Ted Mielczarek [:ted.mielczarek] 2012-01-18 08:15:14 PST
Why aren't we getting good stacks out of these crashes? They're running on the build machines as part of "make check", so if they're hitting assertions they should get good stacks on debug builds. (The binaries should be unstripped, so NS_Stackwalk should work just fine.) Are we not hitting these on debug builds, only release builds, or what?
Comment 7 Marco Bonardo [::mak] 2012-01-18 08:18:14 PST
Sorry, I don't know the details of how our stackwalk works, though I can say that these failures happen really late in the shutdown process, much after xpcom-shutdown, so aborting earlier may help, or not.
Comment 8 Marco Bonardo [::mak] 2012-01-18 08:24:34 PST
just as a reference, the above has been copied from this log https://tbpl.mozilla.org/php/getParsedLog.php?id=8628442&tree=Mozilla-Aurora
Comment 9 Ted Mielczarek [:ted.mielczarek] 2012-01-18 09:27:07 PST
Okay, after talking to glandium on IRC, I think the problem here is that NS_StackWalk just hasn't ever given us good symbols on Linux (because we're just using dladdr, and it doesn't read debug info). fix-linux-stack.pl exists to fix this very problem, so we should just fix our C++ unit tests to pipe their output through that. Filed bug 719120 on that.
Comment 10 Marco Bonardo [::mak] 2012-01-18 09:32:53 PST
Thank you, fixing that will be really useful.
Comment 12 :Ms2ger (⌚ UTC+1/+2) 2012-01-25 07:26:10 PST
https://hg.mozilla.org/mozilla-central/rev/0a116f325333
Comment 13 Jesse Ruderman 2012-01-25 17:50:18 PST
This change makes my life harder :/  I'll have to add some fragile code to http://www.squarefree.com/2010/11/21/how-my-dom-fuzzer-ignores-known-bugs/ in order to ignore specific assertions with specific stack traces.
Comment 14 Marco Bonardo [::mak] 2012-01-26 03:31:30 PST
The scope was to make easier for people to detect null dereferences and to get their stacks from automated tests (once bug 719120 is fixed), though, if this makes things harder for the fuzzer we use to debug, we can backout.
My assumption was that this kind of failure should be reported and fixed immediately.
Comment 15 Jesse Ruderman 2012-01-27 14:22:24 PST
Fuzzer bugs often don't get fixed immediately. The DOM fuzzer's ignore list currently has signatures for 32 crash bugs.
Comment 16 Benjamin Smedberg [:bsmedberg] 2012-01-30 10:26:56 PST
I don't understand the problem, Jesse: isn't this assertion *always* immediately followed by a crash? There's really no continuing from a nullptr deref.
Comment 17 Jesse Ruderman 2012-01-30 12:22:06 PST
The problem is that when the assertion is changed to an abort, the crash doesn't happen.  My fuzzer scripts use the entire stack to decide whether a crash is a known bug, but only the assertion message to decide whether an assertion failure is a known bug.
Comment 18 Jeff Walden [:Waldo] (remove +bmo to email) 2012-01-30 13:21:10 PST
That doesn't seem like the right way to do it, given that we have JS_ASSERTs at highly centralized pinch points, and a failure at such a pinch point may well alias more than one issue...

Note You need to log in before you can comment on or make changes to this bug.