Closed Bug 689291 Opened 13 years ago Closed 11 years ago

###!!! ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../dist/include/nsCOMPtr.h, line 783 resulting in test-tabs.test window focus changes active tab | Test output exceeded timeout (60s)

Categories

(Add-on SDK Graveyard :: General, defect, P1)

x86
Linux
defect

Tracking

(firefox24 wontfix, firefox25 fixed, firefox26 fixed)

RESOLVED FIXED
mozilla26
Tracking Status
firefox24 --- wontfix
firefox25 --- fixed
firefox26 --- fixed

People

(Reporter: BenWa, Assigned: evold)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Moving out of the labs product to the SDK.
Component: Jetpack Prototype → General
Product: Mozilla Labs → Add-on SDK
QA Contact: jetpack → general
Gabor, can you reproduce this?
Blocks: 629263
Priority: -- → P1
I think we need some additional info here to be sure of prioritization. Is this failing on every test run? Which tests is it happening in? Presumably in release builds this would be a crash, do we see that?
Whiteboard: [triage:followup]
I had seen it on the single run above on one of my pushes on tbpl for Jetpack.
I haven't seen this since.  And I'm not sure how to make forward progress on a bug like this, given that it doesn't even seem to be intermittent but rather was just a one-time failure.  Does anyone have any ideas?
So far I could not reproduce this. Maybe we should just close it for now and reopen it if it appears again?
I have only seen this once. There isn't much information on the problem and we cannot reproduce it. I don't object to closing this bug if this isn't reproducible.
Ok, sounds good, closing the bug.  But please do reopen it if you start seeing the problem again and/or come up with ideas about how to reproduce it!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
Whiteboard: [triage:followup]
We saw this again recently on a Linux 64 debug build <http://tinderbox.mozilla.org/showlog.cgi?tree=Jetpack&errorparser=unix&logfile=1320095820.1320096138.28401.gz&buildtime=1320095820&buildname=jetpack-mozilla-central-fedora64-debug&fulltext=1>:

Testing addon-kit...
Using binary at '/home/cltbld/talos-slave/test/addonsdk-poller/firefox/firefox'.
...
info: executing 'test-tabs.test window focus changes active tab'
...
info: pass: activate was called on windows focus change.
...
###!!! ASSERTION: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../dist/include/nsCOMPtr.h, line 849

ejpbruel has been trying to reproduce, although without success yet.

BenWa: did you see this on Mac OS X, as implied by the "Platform" setting for this bug, or was it on another OS?  The log for that original failure is no longer available.
Assignee: nobody → ejpbruel
Blocks: 697775
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
(In reply to Myk Melez [:myk] from comment #10)
> BenWa: did you see this on Mac OS X, as implied by the "Platform" setting
> for this bug, or was it on another OS?  The log for that original failure is
> no longer available.

It was just the platform I was using when filing the bug. I don't recall what the platform was. Changing to Linux until we get a report from another platform.
OS: Mac OS X → Linux
I think we need to set XPCOM_DEBUG_BREAK in the test harness to give us a stacktrace to help track this down: https://developer.mozilla.org/en/XPCOM_DEBUG_BREAK
Depends on: 699532
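A minimal sketch of the suggestion above, assuming the harness launches Firefox as a child process and can pass it a custom environment. The helper name `withDebugBreak` is hypothetical and not part of the SDK; `stack` is one of the documented XPCOM_DEBUG_BREAK modes and asks debug builds to print a stack when an assertion fires.

```javascript
// Hypothetical helper: returns a copy of the given environment with
// XPCOM_DEBUG_BREAK set, leaving the caller's object untouched.
function withDebugBreak(baseEnv, mode = "stack") {
  return Object.assign({}, baseEnv, { XPCOM_DEBUG_BREAK: mode });
}
```

A harness written in Node could then pass `withDebugBreak(process.env)` as the `env` option when spawning the browser. The actual change landed in bug 699532 and may look different.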
(In reply to Dave Townsend (:Mossop) from comment #12)
> I think we need to set XPCOM_DEBUG_BREAK in the test harness to give us a
> stacktrace to help track this down:
> https://developer.mozilla.org/en/XPCOM_DEBUG_BREAK

Roger that.  Filed and fixed in bug 699532.
No longer depends on: 699532
nsCOMPtr.h line 849 [1] calls NS_PRECONDITION [2], which calls NS_DebugBreak [3] with severity NS_DEBUG_ASSERTION, which should cause the stack to be printed the next time this happens now that we've landed the fix for bug 699532.

[1] http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsCOMPtr.h#849
[2] http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsDebug.h#97
[3] http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsDebugImpl.cpp#262
Depends on: 699532
This is the only orange on the most recent push to the SDK:
http://tinderbox.mozilla.org/showlog.cgi?log=Jetpack/1330038511.1330038982.7890.gz
On Linux64Debug, with a build from mozilla-beta.
It failed in the 'test-tabs.test window focus changes active tab' after "pass: activate was called on windows focus change."
(In reply to Wes Kocher (:KWierso) (Jetpack Bugmaster) from comment #16)
> It failed in the 'test-tabs.test window focus changes active tab' after
> "pass: activate was called on windows focus change."

Again: https://tbpl.mozilla.org/php/getParsedLog.php?id=10322192&tree=Jetpack&full=1
One of the top 3 intermittent failures:
https://tbpl.mozilla.org/php/getParsedLog.php?id=12027987&tree=Jetpack

Eddy: You may request access to a test slave.
If you are not able to reproduce the issue locally, a test slave is an easy way to reproduce it yourself. I've done this once for a JS issue. See bug 733825 for how to request such access. There you should be able to launch a Jetpack test with gdb attached, and then hopefully the assertion will fire easily.
I don't think we've seen this for a while, closing for now.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 11 years ago
Resolution: --- → FIXED
https://tbpl.mozilla.org/php/getParsedLog.php?id=19590122&tree=Jetpack&full=1
This time with feeling!
(jetpack-mozilla-release-fedora64-debug)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I don't see that assertion in those logs. What am I doing wrong?
They're down near the bottom of the logs, but they don't quite match the bug's title.
Summary: ###!!! ASSERTION: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../dist/include/nsCOMPtr.h, line 809 → ###!!! ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../dist/include/nsCOMPtr.h, line 783
This is a low-frequency intermittent failure, it shouldn't block us from unhiding the Jetpack tests.
No longer blocks: 629263
(In reply to Dave Townsend (:Mossop) from comment #33)
> This is a low-frequency intermittent failure, it shouldn't block us from
> unhiding the Jetpack tests.

Agreed. Actually, I would even say that this is a reason to unhide the Jetpack tests sooner.
https://tbpl.mozilla.org/php/getParsedLog.php?id=21687552&tree=Fx-Team
Summary: ###!!! ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../dist/include/nsCOMPtr.h, line 783 → ###!!! ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../dist/include/nsCOMPtr.h, line 783 resulting in test-tabs.test window focus changes active tab | Test output exceeded timeout (60s)
This has become incredibly frequent in the last few days, bumping priority to p1.
Priority: P2 → P1
This is a very generic error that is nearly impossible to track down without a call stack at least. This is a crash right? Why don't we get a call stack? Are we running jetpack tests against a release build? Or is there another reason? It would be essential to have proper C++ call stack in case of a crash.
Flags: needinfo?(kwierso)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #154)
> This is a very generic error that is nearly impossible to track down without
> a call stack at least. This is a crash right? Why don't we get a call stack?

Unsure; comment 13 suggests we should, but here's the extent of it in the log of the latest episode:

###!!! ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../dist/include/nsCOMPtr.h, line 819
UNKNOWN [/builds/slave/test/build/firefox/libxul.so +0x003441E4]
UNKNOWN [/builds/slave/test/build/firefox/libxul.so +0x01181B9C]
UNKNOWN [/builds/slave/test/build/firefox/libxul.so +0x01667571]
UNKNOWN [/builds/slave/test/build/firefox/libxul.so +0x01843DBC]
UNKNOWN [/builds/slave/test/build/firefox/libxul.so +0x01843E7D]
UNKNOWN [/builds/slave/test/build/firefox/libxul.so +0x01843EFE]
UNKNOWN [/usr/lib/i386-linux-gnu/libgtk-x11-2.0.so.0 +0x001388A2]
###!!! ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../dist/include/nsCOMPtr.h, line 819
Hit MOZ_CRASH() at ../../../memory/mozalloc/mozalloc_abort.cpp:30


> Are we running jetpack tests against a release build? Or is there another
> reason? It would be essential to have proper C++ call stack in case of a
> crash.

See comment 20 for a suggestion about getting access to a test slave where you can reproduce and hook up gdb to the crashed process.
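Until a symbolicated stack is available, the `UNKNOWN [library +0xOFFSET]` frames in the excerpt above can at least be extracted mechanically. The following is a rough sketch, assuming the frame format stays exactly as shown in that log; the function name `parseFrames` is made up for illustration.

```javascript
// Hypothetical helper: pulls { library, offset } pairs out of log lines of
// the form "UNKNOWN [/path/to/lib.so +0x003441E4]" and ignores other lines.
function parseFrames(logText) {
  const framePattern = /^UNKNOWN \[(.+) \+0x([0-9A-Fa-f]+)\]$/;
  const frames = [];
  for (const line of logText.split("\n")) {
    const match = framePattern.exec(line.trim());
    if (match) {
      frames.push({ library: match[1], offset: parseInt(match[2], 16) });
    }
  }
  return frames;
}
```

Given a matching build of libxul.so with debug symbols, the offsets could then be fed to a tool such as `addr2line -e libxul.so <offset>`. Whether the offsets in these logs map directly onto that binary is an assumption.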
So here is my theory: 

In sdk/tabs/utils.js we don't handle the case where getTabContainer(window) returns null, so we throw. The test hangs, and then the cuddlefish runner kills it after the timeout (60s?). I think the null pointer dereference crash we see here is just the not-so-nice way of killing the process when it hangs (on Windows we load a crashinject.dll that does a null pointer dereference; I'm not sure how this works on Linux, but very likely this is the output of it...).

So in short:
1. getTabContainer can return null in some cases, and sdk/tabs/utils.js must handle it.
2. Once we have fixed that, it would be interesting to understand why it returns null. Do we access it too early, maybe?
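The first point could look roughly like the sketch below. Both functions are stand-ins written for illustration: the real getTabContainer lives in sdk/tabs/utils.js and its body is not shown in this bug, and `getTabsSafely` is a hypothetical caller, not an SDK function.

```javascript
// Stand-in for the SDK helper (assumption): returns the tab container of a
// browser window, or null when the window has no gBrowser yet.
function getTabContainer(window) {
  const browser = window && window.gBrowser;
  return browser ? browser.tabContainer : null;
}

// Hypothetical caller that tolerates a missing container instead of
// throwing on a null dereference.
function getTabsSafely(window) {
  const container = getTabContainer(window);
  if (!container) {
    return [];
  }
  return Array.from(container.children);
}
```

Returning an empty list is only one possible policy; the caller could equally wait for the window to finish loading and retry, which would also help answer the second point.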
Depends on: 897004
Blocks: 904063
Attachment #790026 - Flags: review?(dtownsend+bugmail)
Attachment #790026 - Flags: review?(dtownsend+bugmail) → review+
Commits pushed to master at https://github.com/mozilla/addon-sdk

https://github.com/mozilla/addon-sdk/commit/88e5551ce4b0fe5bb44d7d0ebc78c1fb26439169
Bug 689291 test-tabs.test window focus changes active tab | Test output exceeded timeout (60s)

* Using a safer method to close the window
* adding some more test asserts so that we have more information if there continue to be failures

https://github.com/mozilla/addon-sdk/commit/b67f61dd359aa55695c914f57e62fd980b596238
Merge pull request #1169 from erikvold/689291

Bug 689291 test-tabs.test window focus changes active tab | Test output exceeded timeout (60s). r=Mossop
It looks like my patch didn't help :(
Assignee: ejpbruel → evold
It looks like since the landing this has changed to the slightly less serious bug 905472. Nice work Erik.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Flags: needinfo?(kwierso)
Target Milestone: --- → mozilla26
Calling this fixed on Aurora by bug 907522.