###!!! ASSERTION: 1 dynamic atom(s) with non-zero refcount: [System Principal]: 'nonZeroRefcountAtomsCount == 0', file /Users/kats/zspace/gecko-wr/xpcom/ds/nsAtomTable.cpp, line 445
Categories
(Core :: XPCOM, enhancement)
Tracking
()
People
(Reporter: kats, Assigned: mccr8)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
8.45 KB,
text/plain
|
Details |
I get this assertion every time I try to run reftests on a local debug macOS build.
###!!! ASSERTION: 1 dynamic atom(s) with non-zero refcount: [System Principal]: 'nonZeroRefcountAtomsCount == 0', file /Users/kats/zspace/gecko-wr/xpcom/ds/nsAtomTable.cpp, line 445
It causes the browser to crash on startup and I have to comment out the assertion to actually run the reftests.
Assignee | ||
Comment 1•7 years ago
|
||
This is an assertion related to a leak. It is strange that it is happening on startup. Maybe the test harness is running a XPCShell that is running then shutting down? I also don't understand why an NS_ASSERTION is fatal.
Reporter | ||
Comment 2•7 years ago
|
||
"on startup" was poorly worded. I really meant "before the reftests actually run". But the reftest harness does a bunch of stuff during that time including open and closing some windows so really it could be triggered by some of that.
Is there a way to debug this?
Also of note is that if I comment out the assertion, I hit the assertion in bug 1523944 which I have to comment out as well. Maybe they are related?
Assignee | ||
Comment 3•7 years ago
|
||
Do you have XPCOM_DEBUG_BREAK set to something in your environment? That could make the assertion fatal. So that part is weird.
But yeah, it is probably the same thing. I wouldn't be surprised if XPCWJS being held alive a long time would hit both assertions.
Assignee | ||
Comment 4•7 years ago
|
||
I'm going to mark this as a variant of bug 1523944, with some weird additional twist that XPCOM_DEBUG_BREAK is being set to fatal. But maybe that's normal for the reftest harness.
Reporter | ||
Comment 5•7 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #3)
Do you have XPCOM_DEBUG_BREAK set to something in your environment? That could make the assertion fatal. So that part is weird.
I think the fatal-ness of the problem might be related to lldb (I've been having issues with lldb which are interfering). When I run without lldb it just takes a really long time (many minutes) to emit a stack for the NS_ASSERTION. But yeah I'm fine with duping this.
Assignee | ||
Comment 6•7 years ago
|
||
erahm pointed out that runxpcshelltests.py includes this line which would explain the crash:
self.env["XPCOM_DEBUG_BREAK"] = "stack-and-abort"
Reporter | ||
Comment 7•7 years ago
|
||
This is still happening for me, even with the fix for bug 1523944. What can I do to debug this?
Reporter | ||
Comment 8•7 years ago
|
||
To be clear, I don't think this is a fatal error, but it just takes so long (many minutes) to dump the stack that it might as well be.
Assignee | ||
Comment 9•7 years ago
|
||
Taking multiple minutes to dump a stack seems like a bug by itself. Is it taking that long to dump a single stack, or are there a ton of stacks being dumped?
Assignee | ||
Comment 10•7 years ago
|
||
Hmm I guess "stack and abort" implies it only dumps a single stack.
Reporter | ||
Comment 11•7 years ago
|
||
It takes about 6 minutes to dump this single stack:
[Parent 91369, Main Thread] ###!!! ASSERTION: 1 dynamic atom(s) with non-zero refcount: [System Principal]: 'nonZeroRefcountAtomsCount == 0', file /Users/kats/zspace/gecko-wr/xpcom/ds/nsAtomTable.cpp, line 445
#01: nsAtomTable::GC(GCKind) (Mutex.h:168, in XUL)
#02: NS_ShutdownAtomTable() (nsAtomTable.cpp:484, in XUL)
#03: mozilla::ShutdownXPCOM(nsIServiceManager*) (XPCOMInit.cpp:779, in XUL)
#04: ScopedXPCOMStartup::~ScopedXPCOMStartup() (nsAppRunner.cpp:1314, in XUL)
#05: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) (mozalloc.h:151, in XUL)
#06: XRE_main(int, char**, mozilla::BootstrapConfig const&) (nsAppRunner.cpp:4926, in XUL)
#07: main (nsBrowserApp.cpp:214, in firefox)
Running ps in a separate terminal shows that most of the time is spent in:
/Applications/Xcode.app/Contents/Developer/usr/bin/atos -arch x86_64 -o /Users/kats/zspace/gecko-wr/obj-host-debug/dist/NightlyDebug.app/Contents/MacOS/XUL
Assignee | ||
Comment 12•7 years ago
|
||
Glandium, do you have time to look at this? It seems bad that a single call to MozStackWalk could take 6 minutes.
Assignee | ||
Updated•7 years ago
|
![]() |
||
Comment 13•7 years ago
|
||
atos
is just slow (at least on the first invocation; subsequent invocations--on the same binary?--seem to be much faster). Now that we've gone more all-in on clang, we could just use whatever the LLVM equivalent of atos
is, assuming it has the same functionality...
Assignee | ||
Comment 14•7 years ago
|
||
There's llvm-symbolizer, which we use for ASan.
Reporter | ||
Comment 15•7 years ago
|
||
llvm-symbolizer does appear to be much faster. I hacked fix_macosx_stack.py to call that instead of atos and it returns results in seconds instead of minutes. Although the output format is different and I haven't done the work to parse that properly so it's giving me mostly garbage at the moment. But it does so quickly!
I can spin off another bug to track this switch. It won't actually fix the assertion I originally filed this bug about, although it will make it less painful for me.
![]() |
||
Comment 16•7 years ago
|
||
fix_linux_stack.py
uses addr2line
, which is also horribly slow. (In fact, my experience with DMD stacks that addr2line
is much slower than atos
!) So if a faster clang alternative would work there too, that would be great.
Updated•7 years ago
|
Comment 17•7 years ago
|
||
This is now occurring for me on a newly setup macOS build machine.
Comment 18•7 years ago
•
|
||
Trying to find the relevant place where XPCOM_DEBUG_BREAK is set to "stack-and-abort"...
Reporter | ||
Comment 19•7 years ago
|
||
Ignoring the distraction of the slow stack dump, how is one supposed to actually debug and fix the actual assertion? The [System Principal]
string only appears a handful of places in the code, but it's not clear to me how it ends up in the dynamic atom table or why it has a non-zero refcount on shutdown.
Assignee | ||
Comment 20•7 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #19)
Ignoring the distraction of the slow stack dump, how is one supposed to actually debug and fix the actual assertion? The
[System Principal]
string only appears a handful of places in the code, but it's not clear to me how it ends up in the dynamic atom table or why it has a non-zero refcount on shutdown.
Odds are this is just the symptom of a leak in a test suite where we ignore leaks.
You can see the rest of the stuff that is leaking if you run with:
XPCOM_MEM_BLOAT_LOG=1
I'm not a fan of the various "assert on leaks" that we have in the code base, because they cause issues like this.
Reporter | ||
Comment 21•7 years ago
|
||
I get the attached output at the end of the test run, which points to a bunch of XPCOM stuff getting leaked.
Assignee | ||
Comment 22•7 years ago
|
||
That's actually a lot less bad than I'd have thought. It could just be something like XPCShell keeping alive a backstage pass on the stack or something.
Comment 23•7 years ago
|
||
I also have this problem when running ./mach reftest layout/reftests/xxxx.html
every time on mac debug build for some weeks ago. This is really annoying.
![]() |
||
Comment 24•7 years ago
|
||
Also affected by this:
./mach mochitest dom/manifest/test/browser_ManifestObtainer_obtain.js
Assignee | ||
Comment 25•7 years ago
|
||
I'll try to figure out something here.
Assignee | ||
Comment 26•7 years ago
|
||
Bug 1542944 and bug 1542854 should both fix the shutdown hangs, by avoiding the assertion. I tried the STR in comment 24 and it fixed that. I'm going to see if I can do something about stack walking being so slow.
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Comment 27•7 years ago
|
||
The immediate issue with not being able to run reftests or mochitests at all without a hang is fixed, so I'll close this and file another bug about the slow stacks.
Description
•