Closed Bug 1334771 Opened 3 years ago Closed 3 years ago

Debug builds fail with [2084] ###!!! ASSERTION: dynamic atom with non-zero refcount ... nsAtomTable.cpp, line 397

Categories

(MailNews Core :: Database, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 54.0

People

(Reporter: jorgk, Assigned: aleth)

References

Details

Attachments

(2 files, 2 obsolete files)

+++ This bug was initially created as a clone of Bug #1334558 +++

After "fixing" bug 1334558 we're left with failing debug builds and heaps of complaints on the console.

Repeating from bug 1334558 comment #14, repeating the question for Nathan here:

Now, the TB profile manager suffers the same problem as the FF profile manager, in that it gets an error at the end when exiting, see bug 1303637 (and formerly bug 1223574). No one was interested. Looks like this is coming back to bite us. Have you ever run the profile manager in debug mode after landing bug 1276669?

Repeating from bug 1334558 comment #16:
NOTE: The atoms we get complaints about are none of the ones mailnews/ now registers dynamically, for example below:
stringBundleBindings and key_previousMsg (which is defined in XUL!!)

Application starts and runs. However, exiting the profile manager gives, as already known:
WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive
inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
[2472] WARNING: '!compMgr', file c:/mozilla-source/comm-central/mozilla/xpcom/co
mponents/nsComponentManagerUtils.cpp, line 63
[2472] ###!!! ASSERTION: dynamic atom with non-zero refcount stringBundleBinding
s!: 'false', file c:/mozilla-source/comm-central/mozilla/xpcom/ds/nsAtomTable.cp
p, line 397
followed by a million more.

And exiting TB gives the same:
[3760] WARNING: unable to Flush() dirty datasource during XPCOM shutdown: file c
:/mozilla-source/comm-central/mozilla/rdf/base/nsRDFXMLDataSource.cpp, line 745

WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive
inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
[3760] ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt
 == 0', file c:/mozilla-source/comm-central/mozilla/xpcom/build/XPCOMInit.cpp, l
ine 1075
[3760] WARNING: '!compMgr', file c:/mozilla-source/comm-central/mozilla/xpcom/co
mponents/nsComponentManagerUtils.cpp, line 63
[3760] ###!!! ASSERTION: dynamic atom with non-zero refcount key_previousMsg!: '
false', file c:/mozilla-source/comm-central/mozilla/xpcom/ds/nsAtomTable.cpp, li
ne 397

Repeating from bug 1334558 comment #17:

So the debug builds fail since they seem to actually run the application and that fails with:
NOTE: This time the complaint is about a dynamic atom we registered: FolderFlag.

Executing c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/objdir-tb/dist\bin\xpcshell.exe -g c:\builds\moz2_slave\tb-try-c-cen-w32-d-00000000000\build\objdir-tb\dist\bin/ -a c:\builds\moz2_slave\tb-try-c-cen-w32-d-00000000000\build\objdir-tb\dist\bin/ -f c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mozilla/toolkit/mozapps/installer\precompile_cache.js -e precompile_startupcache("resource://gre/");
resource://gre/components/BrowserElementParent.js
resource://gre/components/CSSUnprefixingService.js
resource://gre/components/ColorAnalyzer.js

[stuff deleted]

resource://gre/modules/xmpp-xml.jsm
resource://gre/modules/xmpp.jsm
[2084] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mozilla/xpcom/base/nsCycleCollector.cpp, line 3595
[2084] WARNING: '!compMgr', file c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mozilla/xpcom/components/nsComponentManagerUtils.cpp, line 63
[2084] ###!!! ASSERTION: dynamic atom with non-zero refcount FolderFlag!: 'false', file c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mozilla/xpcom/ds/nsAtomTable.cpp, line 397
#01: NS_ShutdownAtomTable (c:\builds\moz2_slave\tb-try-c-cen-w32-d-00000000000\build\mozilla\xpcom\ds\nsatomtable.cpp:539)
#02: mozilla::ShutdownXPCOM (c:\builds\moz2_slave\tb-try-c-cen-w32-d-00000000000\build\mozilla\xpcom\build\xpcominit.cpp:1082)
#03: XRE_XPCShellMain (c:\builds\moz2_slave\tb-try-c-cen-w32-d-00000000000\build\mozilla\js\xpconnect\src\xpcshellimpl.cpp:1631)
#04: mozilla::BootstrapImpl::XRE_XPCShellMain (c:\builds\moz2_slave\tb-try-c-cen-w32-d-00000000000\build\mozilla\toolkit\xre\bootstrap.cpp:53)
#05: NS_internal_main (c:\builds\moz2_slave\tb-try-c-cen-w32-d-00000000000\build\mozilla\js\xpconnect\shell\xpcshell.cpp:74)
#06: wmain (c:\builds\moz2_slave\tb-try-c-cen-w32-d-00000000000\build\mozilla\toolkit\xre\nswindowswmain.cpp:118)
#07: __scrt_common_main_seh (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:253)
#08: BaseThreadInitThunk[C:\Windows\syswow64\kernel32.dll +0x1338a]
#09: RtlInitializeExceptionChain[C:\Windows\SysWOW64\ntdll.dll +0x397f2]
#10: RtlInitializeExceptionChain[C:\Windows\SysWOW64\ntdll.dll +0x397c5]
[2084] ###!!! ASSERTION: dynamic atom with non-zero refcount FolderFlag!: 'false', file c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mozilla/xpcom/ds/nsAtomTable.cpp, line 397
Hit MOZ_CRASH() at c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mozilla/memory/mozalloc/mozalloc_abort.cpp:33
c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mozilla/toolkit/mozapps/installer/packager.mk:41: recipe for target 'stage-package' failed
mozmake.exe[3]: Leaving directory 'c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/objdir-tb/mail/installer'
c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mozilla/toolkit/mozapps/installer/packager.mk:97: recipe for target 'make-package' failed

So basically that runs into the same problem that we see on debug builds with the profile manager and the application itself.
Flags: needinfo?(nfroyd)
Aleth, can you please explain why we get these errors during the packaging stage.
Looks like we run
  xpcshell.exe ... -g precompile_cache.js -e ...
which runs some of the application and suffers from the same shutdown problems as the profile manager and Thunderbird itself.
Flags: needinfo?(aleth)
This is the precompile cache creation step, i.e. it runs toolkit/mozapps/installer/precompile_cache.js. If this minimal code already has shutdown issues, that's not good.
Flags: needinfo?(aleth)
If there really is no issue with the c-c code (I haven't been following the discussion) it's possible the m-c assertion that was added is a bit too aggressive. For an example of where this has happened before, see http://searchfox.org/comm-central/source/mozilla/dom/system/OSFileConstants.cpp#882
I guess I can look what's happening with the
  ASSERTION: dynamic atom with non-zero refcount FolderFlag!

But given that we've already got
  WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS'
before, I don't know how successful this will be.

And the profile manager complaining in Firefox only (bug 1303637) don't give me all that much confidence. I should build Firefox and see what it does these days.
re comment 0: It's not clear to me whether bug 1303637 and bug 1223574 are really related, as there is no "YOU ARE LEAKING THE WORLD" warning in the startup cache compilation case. And the following unpleasant warning also looks like a different bug: 
WARNING: unable to Flush() dirty datasource during XPCOM shutdown: file c:/mozilla-source/comm-central/mozilla/rdf/base/nsRDFXMLDataSource.cpp, line 745
(In reply to aleth [:aleth] from comment #5)
> re comment 0: It's not clear to me whether bug 1303637 and bug 1223574 are
> really related, ...
Indeed. That's why I wanted to check in to FF. I typed "mach clobber" before going out to lunch and forgot the "mach build", grr.

Maybe the build error from |xpcshell.exe ... -g precompile_cache.js -e ...| is unrelated. But Nathan said in bug 1334558 comment #11 (quote):
> I don't know who is holding those atoms alive, given that they're
> referenced from CSS.  Do you still have live documents at XPCOM shutdown (?!) ?
So if there were documents open, that would account for both problems. I'm not even sure how "document" is define here. Surely, the profile manager displays a panel, and that already is a XUL document in itself. Let me try with FF when my build is done.
You can build FX in about 5 mins with setting ac_add_options --enable-artifact-builds.
No longer depends on: 1334558
(In reply to aleth [:aleth] from comment #5)
> re comment 0: It's not clear to me whether bug 1303637 and bug 1223574 are
> really related, ...
Most certainly not all is well in Firefox either, see bug 1303637 comment #2.
(In reply to Jorg K (GMT+1) from comment #8)
> (In reply to aleth [:aleth] from comment #5)
> > re comment 0: It's not clear to me whether bug 1303637 and bug 1223574 are
> > really related, ...
> Most certainly not all is well in Firefox either, see bug 1303637 comment #2.

Hmm. I'm surprised it's not breaking the precompile_cache step on m-c. I looked at the Linux debug log and there the GRE part ends with

19:24:41     INFO -  resource://gre/modules/workers/require.js
19:24:42     INFO -  Couldn't convert chrome URL: chrome://branding/locale/brand.properties
19:24:42     INFO -  Couldn't convert chrome URL: chrome://branding/locale/brand.properties
19:24:42     INFO -  [28871] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /builds/slave/m-cen-l64-st-an-d-000000000000/build/src/netwerk/base/nsIOService.cpp, line 794
19:24:42     INFO -  [28871] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file /builds/slave/m-cen-l64-st-an-d-000000000000/build/src/netwerk/base/nsNetUtilInlines.h, line 180
19:24:42     INFO -  [28871] WARNING: Could not get the program name for a cubeb stream.: 'NS_SUCCEEDED(rv)', file /builds/slave/m-cen-l64-st-an-d-000000000000/build/src/dom/media/CubebUtils.cpp, line 218
19:24:42     INFO -  [28871] WARNING: '!compMgr', file /builds/slave/m-cen-l64-st-an-d-000000000000/build/src/xpcom/components/nsComponentManagerUtils.cpp, line 63
19:24:42     INFO -  [28871] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-cen-l64-st-an-d-000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
19:24:42     INFO -  [28871] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/slave/m-cen-l64-st-an-d-000000000000/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172

Also some issues, but no dynamic atom asserts and therefore no bustage.
(In reply to aleth [:aleth] from comment #9)
> Also some issues, but no dynamic atom asserts and therefore no bustage.

On the other hand, given that the problematic atom is a mailnews MSGDBFOLDER_ATOM, maybe not surprising after all ;)
I'll do some debugging in a while, like adding 'printf' where we allocate and release those atoms. If the release doesn't happen, we have the culprit. Note that the release is now here:
https://hg.mozilla.org/comm-central/rev/f469e0a1b35e13268d0a8ed06240cc364258a5f0#l1.67
in nsMsgDBFolder::~nsMsgDBFolder and controlled by |if (--mInstanceCount == 0) {|.
Sadly DXR hasn't refreshed yet.

I'm surprised that the minimal |xpcshell.exe ... -g precompile_cache.js -e ...| even instantiates a nsMsgDBFolder object in whose constructor those atoms are created.
OK, here we go: This is what I type on my local system:

$ obj-x86_64-pc-mingw32/dist/bin/xpcshell.exe -g obj-x86_64-pc-mingw32/dist/bin/ -a obj-x86_64-pc-mingw32/dist/bin/ -f mozilla/toolkit/mozapps/installer/precompile_cache.js -e 'precompile_startupcache("resource://gre/");'

and this is what I get:

[5916] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file c:/mozilla-source/comm-central/mozilla/xpcom/base/nsCycleCollector.cpp, line 3595
[5916] WARNING: '!compMgr', file c:/mozilla-source/comm-central/mozilla/xpcom/components/nsComponentManagerUtils.cpp, line 63
[5916] ###!!! ASSERTION: dynamic atom with non-zero refcount FolderFlag!: 'false', file c:/mozilla-source/comm-central/mozilla/xpcom/ds/nsAtomTable.cpp, line 397
[5916] ###!!! ASSERTION: dynamic atom with non-zero refcount Status!: 'false', file c:/mozilla-source/comm-central/mozilla/xpcom/ds/nsAtomTable.cpp, line 397
[5916] ###!!! ASSERTION: dynamic atom with non-zero refcount Keywords!: 'false', file c:/mozilla-source/comm-central/mozilla/xpcom/ds/nsAtomTable.cpp, line 397
[5916] ###!!! ASSERTION: dynamic atom with non-zero refcount Flagged!: 'false', file c:/mozilla-source/comm-central/mozilla/xpcom/ds/nsAtomTable.cpp, line 397
[5916] ###!!! ASSERTION: dynamic atom with non-zero refcount FolderLoaded!: 'false', file c:/mozilla-source/comm-central/mozilla/xpcom/ds/nsAtomTable.cpp, line 397
[5916] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file c:/mozilla-source/comm-central/mozilla/xpcom/base/nsTraceRefcnt.cpp, line 172

Sherlock Holmes continues.
I added some minimal logging to TB and with a simple profile, no mail account, and an opt build I see up to 10 nsMsgDBFolders created, and while some are released and recreated, at shutdown 8 remain unreleased. It's pretty clear something is wrong here (probably missing shutdown blockers).
OK, this is a regression from bug 1334558 then. Since I'm forced to allocate those atoms dynamically now, I have to rely on proper deallocation at shutdown. If the shutdown was never clean, it shows ;-(

Let me add some debug and I'll see.
On the other hand, if I run precompile_cache in the same configuration, I see no Folders at all. So there is something specific to debug builds going on here as well. Could be a race condition (debug builds are slower).
(In reply to aleth [:aleth] from comment #15)
> On the other hand, if I run precompile_cache in the same configuration, I
> see no Folders at all. So there is something specific to debug builds going
> on here as well. Could be a race condition (debug builds are slower).

The complaining about the leaking atoms is for debug builds only.  So they're likely still leaking in opt builds, there's just no message about them.
(In reply to Nathan Froyd [:froydnj] from comment #16)
> The complaining about the leaking atoms is for debug builds only.  So
> they're likely still leaking in opt builds, there's just no message about
> them.

Right, I was referring to the debug printfs logging nsMsgDBFolder creation/release I added, not the assert warnings.
Note that the assert for leaking atoms is new; it got added in bug 1276669.

So chances are this has been leaking all along, including various atoms from whatever stuff is leaking, and has nothing to do with your new atoms per se.  It's just that we didn't use to assert when you leaked atoms, and now we do...
OK, Sherlock Holmes has found the culprit. During xpcshell.exe no nsMsgDBFolder gets instantiated.

However, Gloda has a desperate need to get exactly those atoms which leak here:
https://dxr.mozilla.org/comm-central/rev/aa9ec692549119aac1abf473af2056adb97ad7cb/mailnews/db/gloda/modules/index_msg.js#2774

So I'm clearing NI for Nathan, he's still NI'ed in bug 1303637.
So in particular, the atom being listed as leaking in comment 0 is for the string "stringBundleBindings", which as you noted is not one of your new things.  But it _is_ the id of an element (the <bindings> element in stringbundle.xml) and we atomize id values.  So if you leak that bindings document, you will leak that atom...
Flags: needinfo?(nfroyd)
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #18)
> So chances are this has been leaking all along, including various atoms from
> whatever stuff is leaking, and has nothing to do with your new atoms per se.
> It's just that we didn't use to assert when you leaked atoms, and now we
> do...

Yes, that's my impression too... there's more that needs fixing here than the specific leak that affects the cache compilation.
Boris and Nathan, thanks for you're responses. I think we have a bunch of clean-up to do. The worst one however is that our builds fail.

I'm sure we have a lot of "legacy" problems, but it appears that FF also isn't squeaky clean, see bug 1303637. So perhaps you'll look at the issues there where no ghastly TB developer is interfering ;-)

I'll send a patch for the Gloda problem next.
(In reply to Jorg K (GMT+1) from comment #19)
> However, Gloda has a desperate need to get exactly those atoms which leak
> here:
> https://dxr.mozilla.org/comm-central/rev/
> aa9ec692549119aac1abf473af2056adb97ad7cb/mailnews/db/gloda/modules/index_msg.
> js#2774

Nice, that should allow getting rid of the immediate bustage :-)
(In reply to aleth [:aleth] from comment #23)
> Nice, that should allow getting rid of the immediate bustage :-)
Well, I found the culprit, but now I need help to re-educate him. These atoms are set up in some _init method, but there is no corresponding shutdown method.

So unless someone with better JS skills than I have enlightens me, I'll have to withdraw the "I'll send a patch for the Gloda problem next". Any takers?

BTW, it was funny that only some of the nsMsgDBFolder-allocated atoms should have leaked.
(In reply to Jorg K (GMT+1) from comment #24)
> So unless someone with better JS skills than I have enlightens me, I'll have
> to withdraw the "I'll send a patch for the Gloda problem next". Any takers?

OK, I'll take a look.

> BTW, it was funny that only some of the nsMsgDBFolder-allocated atoms should
> have leaked.

Yes. It seems TB does leak whole folders, but not during precache compilation...
As an aside of of my debugging in nsMsgDBFolder I can now see how many of these objects are created. In my test environment I have 109, *ONE* of which gets destroyed at shutdown:

=== DESTRUCTOR: nsMsgDBFolder::~nsMsgDBFolder: mInstanceCount=109

So we can be very sure that the atoms allocated there are leaking.

Altogether, I'm not convinced that the MOZ_ASSERT() on leaked atoms is really warranted. TB is just a big ugly Firefox add-on and within one hour or so I found three cases where those atoms are leaked (Gloda, nsMsgDBFolder and the profile manager). What about other add-ons? How do we know that like the Gloda code they don't leak? Isn't this atom stuff just a giant string cache? So why complain to much if there are leftovers at the end?

The other day there was some discussion of IRC (#maildev) that likened FF to a VM, so programs run inside it. Well, any good operating system cleans up after ugly applications, so whatever the application allocates, the operating system will free when the application is shut down. It would be very hard to educate *all* applications to *always* to the right thing on *all* their code paths.

So maybe the M-C developers might consider turning down the volume of the noise they make about leaked atoms just a little. How about making this a neat little warning:
BTW, you leaked these atoms: ... (with a pretty list of the first ten or so).
Attached patch glodaAtom.patchSplinter Review
So why not just stop declaring strings as atoms, and use equals() to compare an nsIAtom argument with a string.
Yes, that would work. With my humble knowledge I don't really know what the atom concept is about. I understand that they are some sort of optimisation in order not to carry those strings around through all the interfaces.

I guess an atom comparison might be slightly more efficient then a string comparison, but maybe that's negligible since we're talking about folder events here and how many do we expect to fire per second? Not many, I'd say.

In fact, the first approach to fixing bug 1334558 was to rip out the atoms from nsIFolderListener.idl and nsIMsgFolder.idl. I even prepared a working patch in attachment 8831342 [details] [diff] [review]. BTW, that patch is missing the changes in the Gloda JS file. If we were to use this patch, 'aProperty' would already be a string.
Not tested on try yet, but here's an alternative that should also work, retaining the atoms.
Attachment #8831453 - Flags: review?(jorgk)
Assignee: nobody → aleth
Status: NEW → ASSIGNED
(In reply to aleth [:aleth] from comment #29)
> Not tested on try yet, ...
I'll test it locally now then.
Can you please enlighten me what the second try is about. There you have a _destroy() method. Is the idea to destroy the atoms in there instead of using the "lazy getter"?
(In reply to Jorg K (GMT+1) from comment #32)
> Can you please enlighten me what the second try is about. There you have a
> _destroy() method. Is the idea to destroy the atoms in there instead of
> using the "lazy getter"?

No, that's just an additional patch with some logging, pushed out of interest to see if the behaviour on other OS is the same as what I was seeing locally (i.e. as run from precompile_cache, there's only a call to init but not destroy).
(In reply to Jorg K (GMT+1) from comment #28)
> Yes, that would work. With my humble knowledge I don't really know what the
> atom concept is about. I understand that they are some sort of optimisation
> in order not to carry those strings around through all the interfaces.
> 
> I guess an atom comparison might be slightly more efficient then a string
> comparison, but maybe that's negligible since we're talking about folder
> events here and how many do we expect to fire per second? Not many, I'd say.
> 
> In fact, the first approach to fixing bug 1334558 was to rip out the atoms
> from nsIFolderListener.idl and nsIMsgFolder.idl. I even prepared a working
> patch in attachment 8831342 [details] [diff] [review]. BTW, that patch is
> missing the changes in the Gloda JS file. If we were to use this patch,
> 'aProperty' would already be a string.



nsITreeView properties used to be atoms and that was ripped out in favor of strings, and that is a much more cycles intensive interface. Whether creating an object (once) to equality compare to an existing object is 'faster' than using the existing object's equals() function is an interesting question.

But the purpose here is to do a fast, minimally invasive fix for bustage.  Reconciling atoms should be a broader goal/different bug, done without bustage fix pressure.

My driveby on the second approach is that it adds xpcomtamination, and even in little things it should be a goal to do the opposite.

jorg, if you want to assume/obsolete the patch it's fine, I'm away for the day.
Hey, thanks for that comment.

(In reply to alta88 from comment #34)
> nsITreeView properties used to be atoms and that was ripped out in favor of
> strings, ...
Interesting. Do you have a bug number?

> ... done without bustage fix pressure.
Nice that you understand how I feel ;-)

> My driveby on the second approach is that it adds xpcomtamination, and even
> in little things it should be a goal to do the opposite.
Indeed. So Aleth, what do you make of this?

> jorg, if you want to assume/obsolete the patch it's fine, I'm away for the
> day.
Thanks!
(In reply to alta88 from comment #34)
> nsITreeView properties used to be atoms and that was ripped out in favor of
> strings, and that is a much more cycles intensive interface. Whether
> creating an object (once) to equality compare to an existing object is
> 'faster' than using the existing object's equals() function is an
> interesting question.

String comparisons are probably slower than object comparisons (which will be done via some hash table). But yes, you'd have to ask more knowledgeable persons than me to properly understand the performance implications.

> My driveby on the second approach is that it adds xpcomtamination, and even
> in little things it should be a goal to do the opposite.

I don't really understand this point, despite the XPCOM in the module name, defineLazyGetter is merely a standard JS helper function.
(In reply to aleth [:aleth] from comment #36)
> String comparisons are probably slower than object comparisons (which will
> be done via some hash table). But yes, you'd have to ask more knowledgeable
> persons than me to properly understand the performance implications.

The idl agrees: "At any given time there will always be one atom representing a given string. Atoms are intended to make string comparison cheaper by simplifying it to pointer equality."
http://searchfox.org/mozilla-central/source/xpcom/ds/nsIAtom.idl#94

So my guess is each equals() call is a string comparison instead of an object equality, for whatever difference that makes after various JS runtime optimizations.
Here's another bit of interesting ancient history - why permanent atoms were added: bug 92141
Explicit lazy getters instead of using the helper, in case that is more readable.
Attachment #8831457 - Flags: review?(jorgk)
Attachment #8831453 - Attachment is obsolete: true
Attachment #8831453 - Flags: review?(jorgk)
Ooops, that added some missing underscore typos.
Attachment #8831459 - Flags: review?(jorgk)
Attachment #8831457 - Attachment is obsolete: true
Attachment #8831457 - Flags: review?(jorgk)
(In reply to Jorg K (GMT+1) from comment #35)
> Hey, thanks for that comment.
> 
> (In reply to alta88 from comment #34)
> > nsITreeView properties used to be atoms and that was ripped out in favor of
> > strings, ...
> Interesting. Do you have a bug number?

I also remember that time. We had to adapt our trees to the change. But 407956.
(In reply to aleth [:aleth] from comment #38)
> Here's another bit of interesting ancient history - why permanent atoms were
> added: bug 92141
Thanks, interesting reading.

Re. the efficiency of atoms: I invite you to click here:
https://dxr.mozilla.org/comm-central/search?q=MsgGetAtom&redirect=false
There you will see all the atoms that are created/retrieved on the fly. Note that MsgGetAtom==MsgNewAtom==NS_Atomize.

So surely passing a string to a function that will do a table lookup, that by definition must do string comparisons, cannot possibly be faster than passing the string and comparing it for equality.

I can only hope - and glancing at it, it seems that way - that those calls aren't causing any performance problems.
(In reply to Jorg K (GMT+1) from comment #42)
> Re. the efficiency of atoms: I invite you to click here:
> https://dxr.mozilla.org/comm-central/search?q=MsgGetAtom&redirect=false
> There you will see all the atoms that are created/retrieved on the fly. Note
> that MsgGetAtom==MsgNewAtom==NS_Atomize.

Just in case: that's not the same as atomService.getAtom().

> So surely passing a string to a function that will do a table lookup, that
> by definition must do string comparisons, cannot possibly be faster than
> passing the string and comparing it for equality.
> 
> I can only hope - and glancing at it, it seems that way - that those calls
> aren't causing any performance problems.

Yes, it all depends on how it's used... if those calls are frequent or even inside loops, then likely nothing much is gained.
(In reply to aleth [:aleth] from comment #43)
> (In reply to Jorg K (GMT+1) from comment #42)
> > Re. the efficiency of atoms: I invite you to click here:
> > https://dxr.mozilla.org/comm-central/search?q=MsgGetAtom&redirect=false
> > There you will see all the atoms that are created/retrieved on the fly. Note
> > that MsgGetAtom==MsgNewAtom==NS_Atomize.
> 
> Just in case: that's not the same as atomService.getAtom().

Wrong, it is! :-D
Comment on attachment 8831459 [details] [diff] [review]
Use lazy getters to fetch atoms in Gloda to avoid a shutdown leak in precompile_cache

Review of attachment 8831459 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/db/gloda/modules/index_msg.js
@@ +2778,5 @@
> +    // We explicitly know about these things rather than bothering with some
> +    // form of registration scheme because these aren't going to change much.
> +    get _kFolderLoadedAtom() {
> +      delete this._kFolderLoadedAtom;
> +      return this._kFolderLoadedAtom = atomService.getAtom("FolderLoaded");

You really picked the wrong reviewer who has NO idea about this stuff. Care to explain what's happening here, so I learn something ;-)

How often is that called? To the untrained eye is looks pretty inefficient since it seems that the existing thing is deleted and re-fetched.

Please excuse the ignorance.
Comment on attachment 8831459 [details] [diff] [review]
Use lazy getters to fetch atoms in Gloda to avoid a shutdown leak in precompile_cache

Review of attachment 8831459 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/db/gloda/modules/index_msg.js
@@ +2778,5 @@
> +    // We explicitly know about these things rather than bothering with some
> +    // form of registration scheme because these aren't going to change much.
> +    get _kFolderLoadedAtom() {
> +      delete this._kFolderLoadedAtom;
> +      return this._kFolderLoadedAtom = atomService.getAtom("FolderLoaded");

A lazy getter like this is called exactly once, as the first time it is called, it is overwritten with the computed result. The benefit is that if it is never called, the value never has to be computed in the first place.
So why the 'delete'?

And what did the other version do:
+      XPCOMUtils.defineLazyGetter(this, "_kFolderLoadedAtom",
+        () => atomService.getAtom("FolderLoaded"));

Now, are we saying that if these atoms are ever used, they will leak? The only change here is that we defer getting them, and during the xpcshell.exe run they are not requested, and therefore they aren't fetched and don't leak.

That would mean that we don't really fix the leakage. Or do I misunderstand it all?
So if you run a full-blown session with Gloda, they will leak at shutdown?

Looking at the spontaneous allocations in C++ code, like
  nsCOMPtr <nsIAtom> isSecureAtom = MsgGetAtom("isSecure");
they die with the nsCOMPtr.
(In reply to Jorg K (GMT+1) from comment #47)
> So why the 'delete'?

You're not able to set a getter, so you have to delete it first. (In more complicated circumstances, it would also avoid any possible recursion, so it's a good thing to do.)

> And what did the other version do:
> +      XPCOMUtils.defineLazyGetter(this, "_kFolderLoadedAtom",
> +        () => atomService.getAtom("FolderLoaded"));

You can look it up in the XPCOMUtils module, it'll be basically the same thing.

> Now, are we saying that if these atoms are ever used, they will leak? The
> only change here is that we defer getting them, and during the xpcshell.exe
> run they are not requested, and therefore they aren't fetched and don't leak.

There is no explicit releasing from JS. Any atom (or anything else) that is fetched from JS will be held in memory until there are no further references to the JS object, at which point the JS GC will clean it up and release it. I.e. it will die with the  _folderListener instance.

> That would mean that we don't really fix the leakage. Or do I misunderstand
> it all?

It *should* be OK. A JS leak is if you keep a reference around that you never get rid of. Gloda does have code to remove the reference to the listener, and that gets called during a normal shutdown. The xpcshell precache compilation is all about startup optimization though and so there is no normal shutdown.

> So if you run a full-blown session with Gloda, they will leak at shutdown?

I haven't seen any issues in local testing.
Note the version of the patch in the previous try pushes had some missing underscores, so I would expect some gloda test failures there.
New try: https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=10da34cc963b0daf9d9ded2dc1e099193ef3eb5f
Thanks for the answers.

Now, looking at early try results, I'm not sure debug builds are worth having ;-( Look:

INFO -  PROCESS | 6420 | [6420] ###!!! ASSERTION: dynamic atom with non-zero refcount TotalUnreadMessages!: 'false', file c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mozilla/xpcom/ds/nsAtomTable.cpp, line 397
INFO -  PROCESS | 6420 | Hit MOZ_CRASH() at c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mozilla/memory/mozalloc/mozalloc_abort.cpp:33
WARNING -  PROCESS-CRASH | mailnews/imap/test/unit/test_imapFilterActions.js | application crashed [@ mozalloc_abort(char const * const)]

Currently those builds just fail, so we don't need to see the sorry state that follows.

Maybe we should focus on ripping out the atoms instead. I could do a new bug for that and refresh my attachment 8831342 [details] [diff] [review].
Running the gloda xpcshell tests locally with a debug build, I see two atom-ASSERT-induced crashes:

mailnews/db/gloda/test/unit/test_index_junk_imap_online.js
mailnews/db/gloda/test/unit/test_index_junk_imap_offline.js

Both of these are due to the TotalUnreadMessages atom, i.e. unrelated to the gloda issue in this bug.
(In reply to Jorg K (GMT+1) from comment #50)
> Maybe we should focus on ripping out the atoms instead. I could do a new bug
> for that and refresh my attachment 8831342 [details] [diff] [review].

I'm not sure ripping out the atoms altogether is a minimal intervention either, you may end up with unexpected side effects. In a way, this is just a forced leak cleanup. There may in the end not be that many, and seeing which tests fail may help to identify them (e.g. all the other gloda tests pass).
Damn, I pasted the wrong thing:

INFO -  PROCESS | 6420 | [6420] ###!!! ASSERTION: dynamic atom with non-zero refcount TotalUnreadMessages!: 'false', file c:/builds/moz2_slave/tb-try-c-cen-w32-d-00000000000/build/mozilla/xpcom/ds/nsAtomTable.cpp, line 397
INFO -  PROCESS | 6420 | #01: NS_ShutdownAtomTable() [xpcom/ds/nsAtomTable.cpp:539]
INFO -  PROCESS | 6420 | #02: mozilla::ShutdownXPCOM(nsIServiceManager *) [xpcom/build/XPCOMInit.cpp:1082]
INFO -  PROCESS | 6420 | #03: XRE_XPCShellMain(int,char * *,char * *,XREShellData const *) [js/xpconnect/src/XPCShellImpl.cpp:1631]
(In reply to aleth [:aleth] from comment #52)
> I'm not sure ripping out the atoms altogether is a minimal intervention
> either, you may end up with unexpected side effects. In a way, this is just
> a forced leak cleanup. There may in the end not be that many, and seeing
> which tests fail may help to identify them (e.g. all the other gloda tests
> pass).

i.e. assuming all the MsgGetAtom()s are COMPtrs and so unproblematic, the question is why/where the nsMsgFolder destructor is not being called when it should be.
(In reply to aleth [:aleth] from comment #52)
> I'm not sure ripping out the atoms altogether is a minimal intervention
> either, ...
Right, it's maximal intervention. See private message I've just sent.

> In a way, this is just a forced leak cleanup.
I love those, particularly on a Friday night or weekend with no reviewer in sight.

> There may in the end not be that many, and seeing
> which tests fail may help to identify them
I don't agree. We already know that we never clean-up any nsMsgDBFolder objects, see comment #26.

I mentioned bug 1223574 and it's little brother, bug 1303637. While the M-C people will possibly clean up the processing in the profile manager, that one little process that shows one panel and then exits, we will have a very bad time fixing bug 1223574. Firefox (apart from the profile manager) doesn't leak anything when you close it down, but TB does in a big way.

Anyway, I hope I'm wrong (gloomy German vs. cheery Italian) and it's all better than it seems.
(In reply to Jorg K (GMT+1) from comment #55)
> > In a way, this is just a forced leak cleanup.
> I love those, particularly on a Friday night or weekend with no reviewer in
> sight.

Yeah, not fun.

> > There may in the end not be that many, and seeing
> > which tests fail may help to identify them
> I don't agree. We already know that we never clean-up any nsMsgDBFolder
> objects, see comment #26.

I did see some destructor calls locally. There may not be that many places that have to be touched to fix this. But of course you are right that there's no way to know in advance.

> I mentioned bug 1223574 and it's little brother, bug 1303637. While the M-C
> people will possibly clean up the processing in the profile manager, that
> one little process that shows one panel and then exits, we will have a very
> bad time fixing bug 1223574. Firefox (apart from the profile manager)
> doesn't leak anything when you close it down, but TB does in a big way.

re bug 1223574: Any assertions due to atoms leaked because TB is "leaking the world" (i.e. some document or other) will not be fixed by your mailnews atom->string conversion, as these will include binding id atoms.
(In reply to aleth [:aleth] from comment #54)
> i.e. assuming all the MsgGetAtom()s are COMPtrs and so unproblematic, the
> question is why/where the nsMsgFolder destructor is not being called when it
> should be.
Right, all the "ad hoc" ones go via nsCOMPtr and are therefore unproblematic.

The problematic ones are the ones where we take over the pointer and therefore have to do the release ourselves:
https://dxr.mozilla.org/comm-central/search?q=regexp%3AMsgNewAtom.*take&redirect=false

Oh, DXR has refreshed now, so one known culprit is here:
https://dxr.mozilla.org/comm-central/rev/a891516b976bd58497b2f4d6a6f5cce6d074013c/mailnews/base/util/nsMsgDBFolder.cpp#135
You've got to love those tricky macros, basically this expands to something like
  kIsSecureAtom = MsgNewAtom("isSecure").take();
but for a whole list.
(In reply to aleth [:aleth] from comment #56)
> re bug 1223574: Any assertions due to atoms leaked because TB is "leaking
> the world" (i.e. some document or other) will not be fixed by your mailnews
> atom->string conversion, as these will include binding id atoms.

(see comment 20) Sorry if that is gloomy news...
(In reply to aleth [:aleth] from comment #56)
> re bug 1223574: Any assertions due to atoms leaked because TB is "leaking
> the world" (i.e. some document or other) will not be fixed by your mailnews
> atom->string conversion, as these will include binding id atoms.
Absolutely right. That's why I said
  So going to strings would at least eliminate *that* source of atom leakage.
in my private message discussing the atom->string conversion in nsIFolderListener.idl and nsIMsgFolder.idl.
Comment on attachment 8831459 [details] [diff] [review]
Use lazy getters to fetch atoms in Gloda to avoid a shutdown leak in precompile_cache

Given that I'm obviously the wrong reviewer here, feel free to land this with rs=jorgk (or even rs=bustage-fix).

I'm still not sure how badly debug builds will be broken once this lands.
Attachment #8831459 - Flags: review?(jorgk) → review+
XPCshell tests may indeed be helpful here: e.g. running mailnews/local/test/unit/test_folderLoaded.js, 5 nsMsgDBFolders are created, and 4 of them are destroyed properly. So that's "almost" a pass ;)
(In reply to Jorg K (GMT+1) from comment #60)
> Comment on attachment 8831459 [details] [diff] [review]
> Use lazy getters to fetch atoms in Gloda to avoid a shutdown leak in
> precompile_cache
> 
> Given that I'm obviously the wrong reviewer here, feel free to land this
> with rs=jorgk (or even rs=bustage-fix).
> 
> I'm still not sure how badly debug builds will be broken once this lands.

You can decide after the try run in comment 49 finishes. All builds should be green and opt tests will be unaffected. Debug tests should be ... interesting.
No, you can decide ;-) You're more Mr. Purist than I am.

Anyway, you wanted me to read about XPCOMUtils, so I did (always happy to be educated, if it's not too hard).

Now there I'm reading:

  defineLazyGetter: function XPCU_defineLazyGetter(aObject, aName, aLambda)
  {
    Object.defineProperty(aObject, aName, {
      get: function () {
        // Redefine this accessor property as a data property.
        // Delete it first, to rule out "too much recursion" in case aObject is
        // a proxy whose defineProperty handler might unwittingly trigger this
        // getter again.
        delete aObject[aName];  <===
        let value = aLambda.apply(aObject);

So you're sure that we need our own delete?
(In reply to aleth [:aleth] from comment #52)
> In a way, this is just a forced leak cleanup.
Maybe it's a good thing, the thing Chiaki has always dreamt of: To clean up what's in the box. He's been a constant "voice crying in the desert" (he was quick to jump in at bug 1334558 comment #13).

My pet hate is this message:
WARNING: some msg dbs left open: '!m_dbCache.Length()', file ... /mailnews/db/msgdb/src/nsMsgDatabase.cpp, line 85
OK, looks like there are a small number of xpcshell debug failures (due to the folder leaks), which one can live with until fixed imho, and a big mozmill problem (due to leaking the world).
(In reply to Jorg K (GMT+1) from comment #64)
> My pet hate is this message:
> WARNING: some msg dbs left open: '!m_dbCache.Length()', file ...
> /mailnews/db/msgdb/src/nsMsgDatabase.cpp, line 85

If we don't close the folders, we may also not close their databases. That message may indicate that. I'd be also happy if we could fix that one.
(In reply to Jorg K (GMT+1) from comment #63)
> So you're sure that we need our own delete?

Try it, you'll get "TypeError: setting a property that has only a getter"

Note the defineLazyGetter code doesn't actually involve a getter in the same way, so the role of the delete there is slightly different.
https://hg.mozilla.org/comm-central/rev/e1b5ba1424d2360fa40a427f7bc82c5af1a4af1b
Bug 1334771 - Use lazy getters to fetch atoms in Gloda to avoid a shutdown leak in precompile_cache. rs=jorgk bustage-fix CLOSED TREE
Resolving as the build failure is fixed.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 54.0
Comment on attachment 8831459 [details] [diff] [review]
Use lazy getters to fetch atoms in Gloda to avoid a shutdown leak in precompile_cache

Looks like Aceman is around, so let's get someone competent to review this ;-)
Attachment #8831459 - Flags: review?(acelists)
Comment on attachment 8831459 [details] [diff] [review]
Use lazy getters to fetch atoms in Gloda to avoid a shutdown leak in precompile_cache

Oops, too late, never mind.

So who's filing follow up bugs for the test failures? Also, we can all head over to bug 1334834.
Attachment #8831459 - Flags: review?(acelists)
(In reply to Jorg K (GMT+1) from comment #26)
> The other day there was some discussion of IRC (#maildev) that likened FF to
> a VM, so programs run inside it. Well, any good operating system cleans up
> after ugly applications, so whatever the application allocates, the
> operating system will free when the application is shut down. It would be
> very hard to educate *all* applications to *always* to the right thing on
> *all* their code paths.

That's an excellent analogy.  We've certainly been thinking about FF as a VM/OS; I hear we even tried to make the OS label official, but it didn't stick very well. ;)

I think the place of TB in the FF VM/OS is mistaken, though.  The "programs" that FF is tuned to run are JS webpages, and like any good OS, we do clean up after web pages after you close the tab(s) hosted them (after they "exit").  The memory leak checings we have built into the system, such as MOZ_COUNT_CTOR/DTOR and the refcounting leak checking that any NS_IMPL_ISUPPORTS-using class receives, are there to ensure that we are releasing all the resources that "programs" consume.  Otherwise, we'd build slow leaks over time, and the stability of the "OS" would be compromised.

TB is more like a kernel module, or a device driver if you like that terminology better: any mishandling of "kernel" datastructures in TB can compromise the integrity or stability of the entire system.  And writing kernel modules is *hard*: it's one of the reasons we took away binary add-ons and components, because the rules aren't always obvious and they do change quite a bit...and if you don't get things right, the browser will probably crash (and often did with binary add-ons).  So we are asking difficult things of TB, but that's partially because it's not operating as a "userspace" program.

> So maybe the M-C developers might consider turning down the volume of the
> noise they make about leaked atoms just a little. How about making this a
> neat little warning:
> BTW, you leaked these atoms: ... (with a pretty list of the first ten or so).

We could certainly do that.  We could probably even arrange for this message to go away if you weren't compiling Firefox.  But the message is there for a reason: it tells you that you're holding on to certain things *way* past the point you're supposed to, and whatever is holding on to those things should have been cleaned up already, and whatever is holding onto...you get the idea.

I know it's frustrating to have all these messages explode on some random merge.  The messages are telling you that there is a problem, and that there's an opportunity to make TB better.
Nathan, thanks for your comment. As you said "it's frustrating to have all these messages explode on some random merge [of M-I to M-C]".

You need to understand our situation. TB has been a somewhat unorganised bunch of unpaid volunteers, a pack of cats, as Kent refers to them. Slowly the project is organising itself and putting on staff, I'm the first one in charge of keeping TB running (or limping along).

On Friday we had four bustages: (1) First someone decided that ";" is no longer a valid comment character in .ini files, (2) next someone changes some constant (WINVER) so our Windows builds blow up, and then you harden Firefox and (3) disallow the "kernel module" TB the access to a kernel interface NS_RegisterStaticAtoms() (I think you haven't decided whether TB is really that "kernel module" or an add-on) and when we scramble to fix that at midnight, (4) it turns out that the alternative interface "dynamic atoms" also causes problems.

All this without any notice or warning whatsoever. And funnily it turns out that Firefox also doesn't have its house clean, since in the profile manager is also leaking a fair lot of things.

So while we appreciate the opportunity to make things better, it really doesn't have to happen on the Friday night at midnight and the following weekend, or as Alta88 pointed out in comment #34: "done without bustage fix pressure".

Let's put things into perspective: I know that {some|most} M-C developers are not friends of Thunderbird. However, the TB team, despite being a pack of cats, have an excellent skill set and we are the secondary M-C testing brigade (bug 1328847 comment #38 "more examples of c-c usefully expanding the test coverage of m-c ;)"), because, to use the analogy, we are the kernel module developers and tread where no one else sets their foot. I could give you so many examples of TB people yelling when something wasn't right in FF. So I think Don Quixote should treat its (perhaps somewhat unwanted) squire Sancho Panza just a little more kindly ;-)
(In reply to aleth [:aleth] from comment #39)
> Explicit lazy getters instead of using the helper, in case that is more
> readable.
Hmm, I don't think it's more readable and I prefer the first version which was:

+      XPCOMUtils.defineLazyGetter(this, "_kFolderLoadedAtom",
+        () => atomService.getAtom("FolderLoaded"));

That function does:

  defineLazyGetter: function XPCU_defineLazyGetter(aObject, aName, aLambda)
  {
    Object.defineProperty(aObject, aName, {
      get: function () {
        // Redefine this accessor property as a data property.
        // Delete it first, to rule out "too much recursion" in case aObject is
        // a proxy whose defineProperty handler might unwittingly trigger this
        // getter again.
        delete aObject[aName];
        let value = aLambda.apply(aObject); (1)
        Object.defineProperty(aObject, aName, { (2)
          value,
          writable: true,
          configurable: true,
          enumerable: true
        });
        return value; (3)
      },
      configurable: true,
      enumerable: true
    });
  },

which you have now explicitly compacted to ...

+    get _kFolderLoadedAtom() {
+      delete this._kFolderLoadedAtom;
+      return this._kFolderLoadedAtom = atomService.getAtom("FolderLoaded");
+    },

so the 'delete' is still there and the second line gets the value (1), I suppose the assignment creates/defines the property (2) and then returns it (3).

Ah well, working, but extremely tricky and it carries the encapsulated set of wizardry to elsewhere.
You need to log in before you can comment on or make changes to this bug.