Open Bug 1223574 Opened 9 years ago Updated 2 years ago

Thunderbirds is leaking JS runtimes on various occasions

Categories

(Thunderbird :: General, defect)

defect

Tracking

(Not tracked)

People

(Reporter: jorgk-bmo, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: memory-leak)

Start a daily debug version, I used one compiled on 10th Nov. 2015.
Use the -p switch. Create a new profile or select an existing one. Once you dismiss the panel, you get the first leakage:

WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
Leaked URLs:
  chrome://mozapps/content/profile/profileSelection.xul
  resource://gre-resources/counterstyles.css
  chrome://global/content/minimal-xul.css
  resource://gre-resources/quirk.css
  resource://gre/res/svg.css
(the list goes on).

Let's assume you selected an existing but new profile with just one account configured so you can compose a message. Create a new message, don't add any recipient, don't type anything into the body, just close the composition window. Then close the main window.

You get the next leakage:

WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
Leaked URLs:
  chrome://gloda/content/thunderbirdOverlay.xul
  resource://gre-resources/counterstyles.css
  chrome://global/content/minimal-xul.css
  resource://gre-resources/quirk.css
  resource://gre/res/svg.css
  chrome://global/content/xul.css
  chrome://global/skin/scrollbars.css
(the list goes on).

Other times I got this:
WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
Leaked URLs:
  chrome://editor/content/editorMailOverlay.xul
  chrome://gloda/content/thunderbirdOverlay.xul
  chrome://messenger/content/search/searchOverlay.xul
  chrome://messenger/content/about-support/overlay.xul
  chrome://messenger-smime/content/msgReadSMIMEOverlay.xul
  chrome://messenger/content/chat/chat-messenger-overlay.xul
(the list goes on).
Running a local Firefox build with -p I get this when exiting the profile manager and starting FF:

WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive
inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
Leaked URLs:
  chrome://mozapps/content/profile/profileSelection.xul
  chrome://pluginproblem/content/pluginFinderBinding.css
  chrome://pluginproblem/content/pluginProblemBinding.css
  resource://gre-resources/counterstyles.css
  resource://gre-resources/html.css
  chrome://global/content/minimal-xul.css
  resource://gre-resources/quirk.css
(the list goes on).

So the first part is *not* a TB specific problem.
OK, I've noticed the following: If I start Daily and don't open/view any messages and don't open the options window, then there is no leak reported at application close time. If I open a message in a tab or in a new window, then I get the leak at shutdown time. Also, starting a message composition will cause it.

I have no idea where to start. Perhaps I can get Chiaki-san interested (is this the first name?).
Flags: needinfo?(ishikawa)
(In reply to Jorg K (GMT+1) from comment #3)
> OK, I've noticed the following: If I start Daily and don't open/view any
> messages and don't open the options window, then there is no leak reported
> at application close time. If I open a message in a tab or in a new window,
> then I get the leak at shutdown time. Also, starting a message composition
> will cause it.
> 
> I have no idea where to start. Perhaps I can get Chiaki-san interested (is
> this the first name?).

Yes, Chiaki is my first name. Good guess :-)

Actually, I have noticed these shutdown issues for some time.

A local run (under Debian GNU/linux 64-bit) of |make mozmill| test suite produces many warnings. This is DEBUG build of C-C thunderbird.

Here is something I produce from a local run of |make mozmill| on Feb 10, for example.

 ========================================
 shutdown order issue, etc
 ========================================

     66 [10659] WARNING: unable to post continuation event: file /NREF-COMM-CENTRAL/comm-central/mozilla/xpcom/io/nsStreamUtils.cpp, line 479
     66 [10659] WARNING: An event was posted to a thread that will never run it (rejected): file /NREF-COMM-CENTRAL/comm-central/mozilla/xpcom/threads/nsThread.cpp, line 601
     33 [10659] WARNING: unable to Flush() dirty datasource during XPCOM shutdown: file /NREF-COMM-CENTRAL/comm-central/mozilla/rdf/base/nsRDFXMLDataSource.cpp, line 749
     32 WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
     32 [10659] ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file /NREF-COMM-CENTRAL/comm-central/mozilla/xpcom/build/XPCOMInit.cpp, line 1008
     16 [14171] WARNING: some msg dbs left open: '!m_dbCache.Length()', file /NREF-COMM-CENTRAL/comm-central/mailnews/db/msgdb/src/nsMsgDatabase.cpp, line 84
     11 [14171] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /NREF-COMM-CENTRAL/comm-central/mozilla/xpcom/base/nsCycleCollector.cpp, line 3563
      5 [14238] WARNING: A runnable was posted to a worker that is already shutting down!: file /NREF-COMM-CENTRAL/comm-central/mozilla/dom/workers/WorkerPrivate.cpp, line 2303
      3	  "/REF-COMM-CENTRAL/comm-central/mozilla/toolkit/components/asyncshutdown/AsyncShutdown.jsm",
      2 db left open /NREF-COMM-CENTRAL/objdir-tb3/_tests/mozmill/mozmillprofile/Mail/tinderbox/Inbox.msf
      1 export XPCOM_CC_RUN_DURING_SHUTDOWN=yes
      1 db left open /NREF-COMM-CENTRAL/objdir-tb3/_tests/mozmill/mozmillprofile/Mail/Test Local Folders/Test Folder.msf
      1 db left open /NREF-COMM-CENTRAL/objdir-tb3/_tests/mozmill/mozmillprofile/Mail/Test Local Folders/Another Folder.msf
      1 db left open /NREF-COMM-CENTRAL/objdir-tb3/_tests/mozmill/mozmillprofile/Mail/Reply Identity Testing/Replies.msf

      ... reported for different folders at shutdown time...


The number shown at the beginning of the line is the frequency of occurrences of the warning (disregarding the process id printed at the beginning of such warning/error log lines.)

32 seems to be a few numbers less than the number of test directories mozmill runs (35) and presumably the number of times a thunderbird daemon is started for each directory.
 
My take on the warning lines:

     66 [10659] WARNING: unable to post continuation event: file /NREF-COMM-CENTRAL/comm-central/mozilla/xpcom/io/nsStreamUtils.cpp, line 479

This message above obviously suggests that a remaining thread posts an event which is not handled by event system and/or receiving thread.

     66 [10659] WARNING: An event was posted to a thread that will never run it (rejected): file /NREF-COMM-CENTRAL/comm-central/mozilla/xpcom/threads/nsThread.cpp, line 601

The message above suggests that the thread on the receiving end of the event somehow refused to process it (presumably it is shutting down).

     33 [10659] WARNING: unable to Flush() dirty datasource during XPCOM shutdown: file /NREF-COMM-CENTRAL/comm-central/mozilla/rdf/base/nsRDFXMLDataSource.cpp, line 749

XPCOM is shutting down, but presumably the I/O subsystem necessary to Flush the datasource is no longer working (!?)

     32 WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!

Yes, this is the bug message you saw. Given the seriousness other messages, the impact of this pales somewhat :-)

     32 [10659] ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file /NREF-COMM-CENTRAL/comm-central/mozilla/xpcom/build/XPCOMInit.cpp, line 1008

I think this clearly shows an incorrect order of thread shutdown at the termination of theunderbird.
We really need to figure out the proper shutdown order of threads, and make sure(
maybe using a proper lock, etc. with timeout) so that the order is observed at shutdown time!

     16 [14171] WARNING: some msg dbs left open: '!m_dbCache.Length()', file /NREF-COMM-CENTRAL/comm-central/mailnews/db/msgdb/src/nsMsgDatabase.cpp, line 84

I suspect that if Flush() doesn't work, Close() may not work either. This does not give much confidence in thunderbird as the keeper of our messages, right?

If someone can give me an idea of 
 - how to enumerate running threads at the time of shutdown,
 - where to insert such function call to enumerate the thread at shutdown time,
 - a brief description of each thread,
 - more importantly, the dependency of the threads at shutdown time
   (in the sense that thread A needs the service of threads B, C, ...
   to process its shutdown procedure: above warning/error lines suggest
   there is such a dependency between threads),
 - how, if the dependency is unknown, to find it out by
   dumping events, etc. AFTER the top-level shutdown procedure/event is 
   invoked/triggered (should be doable by looking out some global status indicator,
   right?)
 - what mechanism to use to force the ordering: high-level lock, etc. offered
   by mozilla libraries may not work at shutdown for obvious reasons,

then maybe I, or someone else, can look into it.

This has been left ignored far too long, I am afraid.

Because of the seriousness of the warning/error messages I see in |make mozmill| log locally, and the fact that I did experience crashes to thunderbird termination time (not of my own invocation, but due to the upgrade notice. I invoked update process and when thunderbird tried to restart after shutting down, it crashed.), I no longer finish thunderbird one one PC, I simply keep it running inside VirtualBox VM. It is relatively easy to keep it running in this manner.
I think I saw less crashes this manner :-)
But this should not be the normal mode of usage for a supposedly reliable mail client.

TIA
Flags: needinfo?(ishikawa)
Oh, I forgot. One thing those who implement JS interpreter can do is to
leave a name (or address of invoking code) te caller of JS interpreter creation in a simple-minded table (with duplicate entry possible, or simply increment a count if the same caller creates JS interpreter multiple times, and take care of the deletion appropriately), so that we can know WHO the offending creator is.

Right now, I don't think there is such information available.
The above should not add that much overhead, but immensely helps the
debugging and remove this serious offender easily.
I wonder why efforts on such instrumentation are not spent when we have abundant CPU performance now.
I am not calling for a full-featured DTRACE although that would be wonderful, but
a simple trick here and there will go a long way to debug firefox and thunderbird IMHO.

just my two cents worth.

TIA
(In reply to ISHIKAWA, Chiaki from comment #4)

I think actually I need more info:

> If someone can give me an idea of 
>  - how to enumerate running threads at the time of shutdown,
>  - where to insert such function call to enumerate the thread at shutdown
> time,
>  - a brief description of each thread,
>  - more importantly, the dependency of the threads at shutdown time
>    (in the sense that thread A needs the service of threads B, C, ...
>    to process its shutdown procedure: above warning/error lines suggest
>    there is such a dependency between threads),
>  - how, if the dependency is unknown, to find it out by
>    dumping events, etc. AFTER the top-level shutdown procedure/event is 
>    invoked/triggered (should be doable by looking out some global status
> indicator,
>    right?)

When and where the following messages get printed, we need to know
  - which process/thread is posting the event (symbolic name is a plus), and
  - what kind of an event (in symbolic names with possibly a few arguments dumps?)
  - and which thread is the receiving event (again the symbolic name is good)
    
     66 [10659] WARNING: unable to post continuation event: file /NREF-COMM-CENTRAL/comm-central/mozilla/xpcom/io/nsStreamUtils.cpp, line 479
     66 [10659] WARNING: An event was posted to a thread that will never run it (rejected): file /NREF-COMM-CENTRAL/comm-central/mozilla/xpcom/threads/nsThread.cpp, line 601
      5 [14238] WARNING: A runnable was posted to a worker that is already shutting down!: file /NREF-COMM-CENTRAL/comm-central/mozilla/dom/workers/WorkerPrivate.cpp, line 2303
 
For the following, which process/thread called Flush(), and possibly print the state of XPCOM shutdown (I have no idea how it is described: normal, received shutdown signal, but not starting shutdown yet, started shutdown but not completed, completed shutdown, etc?)


     33 [10659] WARNING: unable to Flush() dirty datasource during XPCOM shutdown: file /NREF-COMM-CENTRAL/comm-central/mozilla/rdf/base/nsRDFXMLDataSource.cpp, line 749


There may be implementation details, but I don't think they are insurmountable IMHO.
But maybe I am too optimistic.
Kent, I guess Chiaki has always been interested in the various warnings that go past in the debug console an are mainly ignored.

"WARNING: YOU ARE LEAKING THE WORLD" is particularly annoying and visible, since it prints out a whole lot of text. As Chiaki rightly points out in his comment #4, there are many other warnings, the most disconcerting one perhaps is the "WARNING: some msg dbs left open".

Do you have an opinion on all of this? Looks like were shipping a product which internally tells us that there are some things wrong under the covers. We should maybe dedicate some time to this, don't you think?
Flags: needinfo?(rkent)
Yes there are problems at shutdown. Mozilla awhile back made a hang at shutdown a crash after a certain taimeout, and that crash is our #1 crash.

I'm just not sure how important it really is. There are also lots of warning that show up whenever you run a debug version of Thunderbird, many in obscure parts of the code like layout. But my Windows error log also spews lots of warning and errors every day that I choose to ignore (and so apparently do the Windows programmers).

If I were to pick priorities, I would start with the errors that occur regularly in the Error Console for TB45. I get a few on every startup.

"Leaking the world" is obviously meant to be  serious warning. But you have to decide if where it occurs is important. If you were leaking the world every time that you viewed a message, that would be bad. If you leak the world only when you are creating a new profile, that is really not so serious.

So there are no easy answers to priority. Leaks can also be really challenging to find, particularly the JS/C++ interaction leaks that are most of what we see these days.
Flags: needinfo?(rkent)
As rkent says in comment 8, we have to prioritize the targets.
That is why I summarized the warnings with the frequency of occurrences.
I tried to reduce the number of most often seen javascript warnings/errors before.
A few years back, JavaScript warning came down to about half a dozen at one time(!).
Unfortunately, Debian GNU/Linux which I have been using a GCC issue that made TB build
impossible for more than half a year. When GCC issue was resolved, the JavaScript warnings were more than two dozens :-(

So eliminating the warnings/errors must be constant process.

The timeout mechanism that rkent mentions is a small improvement, but
I still feel very uneasy about warnings regarding databases being not closed.
That is another way of assigning priority: one can focus on a particular message that disturbs him/her as well as paying attention to the frequency of occurrences.

As for warnings/errors in the error console. In the last half a year so, I think
I see more error messages. It is irregular, but sometimes I see global tab variable becomes undefined (!). I have no idea where that happens, but when that happens, gasp,
I found that the search dialog is corrupted. The condition column box, etc. becomes a horizontally squeezed (not much height) and cannot be read: if I choose a tab explicitly
most of the time, the problem disappears, but not always (that is why I hesitated to report it.) But I digress. I should report this to another bugzilla.

Anyway, I think what one can do is to analyze the debug log with a few statistical analysis approach (like mine: counting the frequency) and prioritize the issues.

One problem is that we simply don't have the man-power to tackle the prioritized list reasonably well. I have occasionally raised issues and solve the most visible warnings/errors, but the number of bugs/warnings remain more or less constant (or gradually increasing.)
Depends on: 1303637
On the issue of leaked JS runtimes, Aleth posted this to bug 1303637 comment #6:
===
There are previous shutdown leak fixes even in c-c if you are looking for potentially useful examples, e.g. bug 765074, bug 764742, bug 955563, bug 955236.
===

We'll come back to this after fixing bug 1334874 (leaked nsMsgDBFolder objects).

Aceman, you fixed some of the bugs listed above, perhaps I can get you interested in some leak hunting.
Flags: needinfo?(acelists)
I have seen now that this assertion is intermittent on mozmill tests.
I run a single test (folder-widgets/test-message-filters.js) and the assertion triggered, as it does on most mozmill tests (where TB UI is up).

Then I debugged the test and downclocked my CPU+GPU to ~25%. The test of course ran awfully slow. But I noticed in the log that the assertion was suddenly not there.
The mozmill run exited cleanly in this aspect.

May be important fact when debugging this one. It may be timing related, not that we permanently keep some hanging references, which would never go away. It seems they do.

AND, the "some msg dbs left open:" warning WAS still there (it is the DB of the last viewed folder).

I could reproduce this consistently so far.
Flags: needinfo?(acelists)
Keywords: mlk
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.