Open Bug 1240722 Opened 8 years ago Updated 2 months ago

Thunderbird on Linux leaks file descriptors (mostly .msf files) causing high memory. 5 imap accounts. no virtual folders. "Unable to open the summary file for Draft" and "too many files open"

Categories

(MailNews Core :: Database, defect)

Unspecified
Linux
defect
Not set
critical

Tracking

(Not tracked)

People

(Reporter: mozilla, Unassigned)

References

(Blocks 1 open bug, )

Details

Attachments

(1 obsolete file)

For the past couple of days, Thunderbird has frequently popped up weird and confusing error messages (could not connect to host, could not save draft, unknown host "localhost", malformed .msf file, ...), and I was mystified as to what was happening, and even wrongly suspected my DavMail gateway...
Until I accidentally clicked the attach button, and got a message about "too many files open".

At least that last message was clear (unlike the zillions before...). An lsof showed that Thunderbird had hundreds of .msf files open... and ended up hitting the open file limit of 1024.
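The lsof check described above can be approximated on Linux by reading /proc directly. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names `open_targets` and `count_by_suffix` are illustrative and not part of any Thunderbird tooling.

```python
import os
from collections import Counter


def open_targets(pid):
    """Return the targets of all open file descriptors of a process (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def count_by_suffix(targets):
    """Count descriptor targets by file extension.

    Entries without an extension (sockets, pipes, ...) are counted under ''.
    """
    return Counter(os.path.splitext(target)[1] for target in targets)
```

For example, `count_by_suffix(open_targets(pid))[".msf"]` would give the number of open summary files, where `pid` is the Thunderbird process id (for instance from `pidof thunderbird`).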

The problem here is twofold:
1. Thunderbird should not keep that many files open, and close files once it is done reading them

2. And yet again, I'm not getting tired of repeating it, error messages should be clear and non-misleading. The problem had me mystified during days until finally I fat-fingered that god-sent attach button. Why is it so difficult to just take the damn message returned by the system and show it to the user? Or is this a deliberate political decision born out of a misguided desire to pander to Outlook switchovers or whatever? Is it really thunderbird's idea of userfriendliness that we have to run "strace -p `pidof thunderbird` 2>&1 | fgrep --line-buf '= -1 E' | fgrep --line-buf -v EAGAIN " along with thunderbird just to understand what's wrong with it? (and even that wouldn't help if the error was raised by a library such as ssl...)
clarity please ...
on which OS does this occur?
all pop accounts?  all imap?  or a mix?   how many?
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #1)
> clarity please ...
> on which OS does this occur?

Kubuntu 14.04.3

> all pop accounts?

No POP accounts

>  all imap?

5 Imap accounts

>  or a mix?

And one "Local Folders"

>   how any?

5 Imap + 1 Local

And the wallpaper of my office is white.

We are on the 10th floor.

And it's snowing outside :-)


In the meantime, I upped the permitted number of open files to 8192 per application (which makes Thunderbird usable again), but I continued to monitor it using lsof from time to time. Interestingly enough, I see that the maximum file descriptor id still stays around (but barely above) 1024. Yesterday, it oscillated between 1025-1027, today I see 1031.
But I have 1424 .msf files.
To me, this means that there must be some mechanism within Thunderbird that attempts (but fails to...) regularly close file descriptors in order to keep their total number below 1024. Neat, but counterproductive if it doesn't get the count *quite* right. Wouldn't it be preferable to close those descriptors for which it is known that they are no longer needed right away? As far as I understood, Thunderbird only needs to read those .msf files *once*, in order to read back the state where it left when last closing. Those are files that are private to Thunderbird, so it's not as if it needs to continuously monitor them for change by another application. So why not close them after initialization?
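The limit of 1024 mentioned above is the soft per-process limit on open file descriptors (RLIMIT_NOFILE). As a small sketch, a process can inspect its own limit with Python's standard `resource` module (Unix only). This reads the limit of the Python process itself, not of Thunderbird, and the helper names are illustrative.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def headroom(open_count, soft_limit):
    """Descriptors that can still be opened before the soft limit is reached.

    Returns None when the limit is unlimited.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return max(soft_limit - open_count, 0)
```

With the numbers from this report, `headroom(1015, 1024)` leaves only 9 descriptors, which is why almost any further operation can fail with "too many open files".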
Thanks for the details. Those are very helpful

In fewer words, this is simply a bug :)

1. What is the value of mail.db.max_open in tools | options | advanced | general | config editor?  (the default is 30)

2. what software is used for your imap server?  

3. how much memory is thunderbird process using when it gets to keeping so many folders open?

4. please a) disable automatic compact at tools | options | advanced | disk   b) disable "Enable Global Search ..." at tools | options | advanced | general

If still a problem after #4...
5. does the problem persist if you start Thunderbird in safe mode?

TIA
Severity: normal → major
Flags: needinfo?(mozilla)
OS: All → Linux
I'll start with replying to the first 2 questions, and will get around to the others once they occur.
I've now lowered the ulimit on file descriptors to the system default value and waiting for the problem to re-occur.

(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #3)
> Thanks for the details. Those are very helpful
> 
> In fewer words, this is simply a bug :)

Thanks for acknowledging this :-)

> 
> 1. What is the value of mail.db.max_open in tools | options | advanced |
> general | config editor?  (the default is 30)

It is indeed 30. But right now, it's already got files open up to 779, and still climbing.

> 
> 2. what software is used for your imap server?  

4 of them are dovecot, and one davmail.

> 
> 3. how much memory is thunderbird process using when it gets to keeping so
> many folders open?

So far top shows:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
 4524 alain     20   0 1723144 476588  55424 S   0.0  1.5   0:46.65 thunderbird 

.... but the problem hasn't re-occurred yet. I'll gather this again once it happens.

> 
> 4. please a) disable automatic compact at tools | options | advanced | disk 
> b) disable "Enable Global Search ..." at tools | options | advanced | general

I'll do this after it first happens, and then I'll get back to you.

> 
> If still a problem after #4...
> 5. does the problem persist if you start Thunderbird in safe mode?
> 
> TIA

I'll do this after it happens again after doing #4 :-)
Flags: needinfo?(mozilla)
Ok, it ran out of file descriptors.

Here's the top output for memory usage:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 4524 alain     20   0 1875680 543272  55996 S  12.4  1.7   1:14.26 thunderbird

... and here is the error message: "Unable to open the summary file for Drafts. Perhaps there was an error on disk, or the full path is too long." (But tens of other messages occur as well, depending on where exactly it gets the "too many open files" from the system.)
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #3)
[...]
> 4. please a) disable automatic compact at tools | options | advanced | disk 
> b) disable "Enable Global Search ..." at tools | options | advanced | general

Still fails (this time with "Failed to read 'formatDateLong' from chrome://calendar/locale/calendar.properties.")

Now trying last item...:

> 
> If still a problem after #4...
> 5. does the problem persist if you start Thunderbird in safe mode?
> 
> TIA
(In reply to Alain Knaff from comment #2)
> But I have 1424 .msf files.
> To me, this means that there must be some mechanism within Thunderbird that
> attempts (but fails to...) regularly close file descriptors in order to keep
> their total number below 1024. Neat, but counterproductive if it doesn't get
> the count *quite* right.

There is no mechanism to watch the number 1024. We do not detect how many descriptors are allowed on your system. As said, we observe by default 30 open IDLE msfs. But if you have many folders that are not idle, the number can grow up to the number of all your folders (msfs).

> Wouldn't it be preferable to close those
> descriptors for which it is known that they are no longer needed right away?

We do not know if the msf will be no longer needed in the near future. So the files are open for a while for performance reasons. Only after some time of inactivity, they get closed.

> As far as I understood, Thunderbird only needs to read those .msf files
> *once*, in order to read back the state where it left when last closing.
> Those are files that are private to Thunderbird, so it's not as if it needs
> to continuously monitor them for change by another application. So why not
> close them after initialization?

The files are also needed to save new changes of the state (new msgs, new tags, etc).


What you can do now is set the pref mailnews.database.dbcache.logging.console (in options->advanced->config editor) to the value "Debug". Then restart TB. Open the error console and watch the messages. Report here if you find anything suspicious.

Also check the value of the pref mail.db.idle_limit, the default is 300000 (in milliseconds).
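The caching behaviour described in this comment (a bounded number of idle databases, closed after a period of inactivity) can be illustrated with a toy model. This is only a sketch of the policy as described here, not Thunderbird's actual code: the function name and the order of the two steps are assumptions, and only the default values 30 (mail.db.max_open) and 300000 ms (mail.db.idle_limit) come from the thread.

```python
def select_dbs_to_close(last_used_ms, now_ms, idle_limit_ms=300_000, max_open=30):
    """Toy model: pick cached folder DBs to close.

    last_used_ms maps a folder name to the time (in ms) it was last used.
    """
    # Step 1: everything idle for at least idle_limit_ms is closed.
    expired = [name for name, used in last_used_ms.items()
               if now_ms - used >= idle_limit_ms]
    # Step 2: if more than max_open remain, close the least recently used ones.
    remaining = {name: used for name, used in last_used_ms.items()
                 if name not in expired}
    excess = max(len(remaining) - max_open, 0)
    oldest_first = sorted(remaining, key=remaining.get)
    return expired + oldest_first[:excess]
```

In this model a database that is "closed" releases its file; the bug discussed here is precisely that the real descriptors can stay open after the cache considers the database closed.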
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #3)
[...]
> If still a problem after #4...
> 5. does the problem persist if you start Thunderbird in safe mode?
> 
> TIA

It still occurs, even in safe mode (even though now it takes somewhat longer till it happens...)

Message this time: "Failed to connect to server<email@address>"

Observing the consumed file descriptors, it seems the number of file descriptors increases in jumps. And for some operations they are actually freed. For instance, rightClick->searchMessages seems to add approx 100 open file descriptors, but those are closed again a short while after the search is completed. However, just letting firefox sit there seems to add approx 500 open file descriptors, which then *are*not* closed. On reaching just short of 1024, the situation stays stable for a while, and then it attempts to go beyond (this time, triggered by a search).
(In reply to :aceman from comment #7)
> (In reply to Alain Knaff from comment #2)
> > But I have 1424 .msf files.
> > To me, this means that there must be some mechanism within Thunderbird that
> > attempts (but fails to...) regularly close file descriptors in order to keep
> > their total number below 1024. Neat, but counterproductive if it doesn't get
> > the count *quite* right.
> 
> There is no mechanism to watch the number 1024. We do not detect how many
> descriptors are allowed on your system. As said, we observe by default 30
> open IDLE msfs. But if you have many folders that are not idle, the number
> can grow up to the number of all your folders (msfs).

As said, the number of my msfs is 1425, however the max file id hovers around 1024 (sometimes slightly less, sometimes slightly more) even if max number of open files is set to much higher.

> 
> > Wouldn't it be preferable to close those
> > descriptors for which it is known that they are no longer needed right away?
> 
> We do not know if the msf will be no longer needed in the near future. So
> the files are open for a while for performance reasons. Only after some time
> of inactivity, they get closed.

ok

> 
> > As far as I understood, Thunderbird only needs to read those .msf files
> > *once*, in order to read back the state where it left when last closing.
> > Those are files that are private to Thunderbird, so it's not as if it needs
> > to continuously monitor them for change by another application. So why not
> > close them after initialization?
> 
> The files are also needed to save new changes of the state (new msgs, new
> tags, etc).
> 
> 
> What you can do now, is to set the pref
> mailnews.database.dbcache.logging.console (in options->advanced->config
> editor) to the value "Debug". Then restart TB. Open the error console and
> watch the messages. Report here if you find anything suspicious.

Right at the time when the number of open files jumped from around 100 to 500, I saw lots of messages such as the following being logged:

2016-01-21 13:25:08	mailnews.database.dbcache	DEBUG	skipping cachedDB not open in folder: Hubert

... and then finally
2016-01-21 13:26:08	mailnews.database.dbcache	INFO	open db count 7


> 
> Also check the value of the pref mail.db.idle_limit, the default is 300000
> (in milliseconds).

Initially, I left this as is...

Now I set it lower (3000), however even after 3 seconds, it didn't free up those msf file descriptors
Eventually I get:

2016-01-21 13:35:09	mailnews.database.dbcache	DEBUG	closing expired msgDatabase for folder: Inbox
2016-01-21 13:35:09	mailnews.database.dbcache	INFO	open db count 1

... but the number of open file descriptors still is 1015...
In order to work around this bug, I had upped the number of maximally permitted open file descriptors to 1500.

This has "fixed" the issue for a while, until today it happened again (max fd number = 1499)

However, this time something has changed: many of the .msf files have been open multiple times.

Strange fact: for 64 of them, both file descriptors are _exactly_ 1000 units apart.

thunderbi 5786 alain  116u   REG               8,34     92025 1049835 /home/alain/.thunderbird/p4u2gnrd.default/ImapMail/mail.lll.lu/Smile.msf
thunderbi 5786 alain 1116u   REG               8,34     92025 1049835 /home/alain/.thunderbird/p4u2gnrd.default/ImapMail/mail.lll.lu/Smile.msf
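Duplicates like the pair above can be found mechanically from lsof output. The sketch below assumes the nine-column lsof layout shown here (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME); output that includes an extra thread-id column would need a different split. The function name is illustrative.

```python
import re
from collections import defaultdict


def duplicate_fds(lsof_lines):
    """Map each path to its sorted numeric fds, keeping only paths open more than once."""
    fds_by_path = defaultdict(set)
    for line in lsof_lines:
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue  # header, truncated or otherwise unexpected line
        match = re.match(r"(\d+)", fields[3])
        if match is None:
            continue  # non-numeric FD entries such as cwd, txt or mem
        fds_by_path[fields[8].strip()].add(int(match.group(1)))
    return {path: sorted(fds) for path, fds in fds_by_path.items() if len(fds) > 1}
```

For each returned path, the difference between the listed descriptor numbers shows whether the odd "exactly 1000 apart" pattern holds.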
Alain, can you send an example of the full list of open fd to aceman?
Aceman, what do you make of comment 9, comment 11?
Flags: needinfo?(mozilla)
Flags: needinfo?(acelists)
It may be possible that even if we close/not cache a DB in the upper layers, we still keep the file referenced (with a descriptor) in some low layer.

I myself observed that e.g. if you open a new window in TB the cached DBs are actually purged from cache. We still do not know why that is. So there are unknowns in the backend msf handling.
Flags: needinfo?(acelists)
> Alain, can you send an example of the full list of open fd to aceman?

When I attempted to gather the list now, I noticed that now (version 38.6.0) there are "only" 628 file descriptors open (with a max of 646). That's still a huge amount, but well below the limit of 1024.

Given that there are now far fewer descriptors, are you still interested in the exhaustive list? (Just asking, as I'm somewhat hesitant to send such personal data out...) Or is it ok if I replace the actual folder names with a one-way hash?
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #12)
Flags: needinfo?(mozilla)
> is it ok if I replace the actual folder names with a one-way hash?
sure. if you're OK, send to me also
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #15)
> > is it ok if I replace the actual folder names with a one-way hash?
> sure. if you're OK, send to me also

Is on its way via e-mail :-)
A hash is fine (we do not need real folder names), however a human readable encoding would be better so we can see easily if there are the same folders opened multiple times. Maybe just hash it into 10 characters?

Also, are you using Virtual folders (Saved searches) in Thunderbird? The ones where you specify a search term so that only messages matching the term are shown in this folder and this stored as a folder and you also set there over which folders it should search. Created via File->New->Saved search.
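The anonymization suggested above (replacing each folder name with a short, stable token so that repeated folders remain recognizable) could look roughly like this. It is a sketch only: the function name `pseudonym` and the optional `salt` parameter are illustrative choices, and SHA-256 is just one possible hash.

```python
import hashlib


def pseudonym(folder_name, salt="", length=10):
    """Return a short hexadecimal token that is stable for a given folder name."""
    digest = hashlib.sha256((salt + folder_name).encode("utf-8")).hexdigest()
    return digest[:length]
```

The same folder always maps to the same 10-character token, so duplicates stay visible in the list. Without a secret salt, common names such as "Inbox" can still be guessed by hashing candidate names, so a private salt is advisable if the names are sensitive.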
(In reply to :aceman from comment #17)
> A hash is fine (we do not need real folder names), however a human readable
> encoding would be better so we can see easily if there are the same folders
> opened multiple times. Maybe just hash it into 10 characters?

errrmmm.... just cut off everything after the 10 first characters? :-)

> 
> Also, are you using Virtual folders (Saved searches) in Thunderbird? The
> ones where you specify a search term so that only messages matching the term
> are shown in this folder and this stored as a folder and you also set there
> over which folders it should search. Created via File->New->Saved search.

I do not have any saved searches on that Thunderbird instance.
Other potential matches https://mzl.la/2d4Yzw0
See Also: → 855836
See Also: → 1322409
Just upgraded my home machine from jessie to stretch, and now the problem appeared on this machine (my earlier reports were about my Kubuntu box at work).

Firefox version is 45.8.0
(In reply to Alain Knaff from comment #20)
> Just upgraded my home machine from jessie to stretch, and now the problem
> appeared on this machine (my earlier reports were about my Kubuntu box at
> work).
Are you seeing this with version 52?

> Firefox version is 45.8.0

Do you mean Thunderbird?
Flags: needinfo?(mozilla)
Blocks: 1330872
Yes, Thunderbird.

Still occurs with 52.7.0

1757 file descriptors right now... Only works at all because I upped number of open file descriptors (ulimit -n 4096)
Flags: needinfo?(mozilla)
Please add "memory-leak" key word.
Alain, because you have a very reproducible case, can you run the beta from http://www.mozilla.org/en-US/thunderbird/channel/ with MSGDB logging enabled per https://wiki.mozilla.org/MailNews:Logging

Aceman, what level of logging would you want to see, "1" or "5"?
Flags: needinfo?(mozilla)
Flags: needinfo?(acelists)
See Also: → 1396655
I wanted to backport the logging change so that he does not need to experiment with beta yet.

I think we can try level of 1 for msgdb. But we haven't used this type of logging here, I am also interested in logging from comment 7.
Flags: needinfo?(acelists)
Aceman...

(In reply to :aceman from comment #25)
> I wanted to backport the logging change so that he does not need to
> experiment with beta yet.

Which version do you prefer he test with, 60 or 52?  or nightly?


> I think we can try level of 1 for msgdb. But we haven't used this type of
> logging here, I am also interested in logging from comment 7.

So you want both comment 7 and log level 1?
Flags: needinfo?(acelists)
Summary: Thunderbird leaks file descriptors like a sieve (mostly .msf files) → Thunderbird leaks file descriptors (mostly .msf files) like a sieve causing high memory
(In reply to Wayne Mery (:wsmwk) from comment #26)
> (In reply to :aceman from comment #25)
> > I wanted to backport the logging change so that he does not need to
> > experiment with beta yet.
> Which version do you prefer he test with, 60 or 52?  or nightly?

60.x should be fine.

> > I think we can try level of 1 for msgdb. But we haven't used this type of
> > logging here, I am also interested in logging from comment 7.
> 
> So you want both comment 7 and log level 1?

Yes.
Flags: needinfo?(acelists)
Both on Debian 9.5 at home, and on Kubuntu 18.04 at work, I have thunderbird 52.9.1. That's what's easily available in both distributions.

I can confirm that on Debian, thunderbird still does go beyond 1024 with its file descriptors.

Here at work I'm at 518 right now, but it's still rising. I'm keeping an eye on the error console. Is there anything particular that I should watch for in there?

How do I set log level 1?
Flags: needinfo?(mozilla)
It's now at 1445

Lots of "2018-09-17 09:17:59	mailnews.database.dbcache	DEBUG	Skipping, DB not open for folder: xxyyzz" entries, and also "2018-09-17 09:18:59	mailnews.database.dbcache	DEBUG	Skipping, DB not open for folder: aabbcc"


and:
"2018-09-17 09:18:59	mailnews.database.dbcache	INFO	DBs open in a window: 1, DBs open: 12, DBs already closing: 1341"

Could it be that the file descriptors for the "DBs already closing" are still open, adding to the count?
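To follow these counters over time, the INFO line quoted above can be parsed from a saved copy of the error console. A minimal sketch, assuming the message format stays exactly as shown in this comment; the names are illustrative.

```python
import re

DBCACHE_RE = re.compile(
    r"DBs open in a window: (\d+), DBs open: (\d+), DBs already closing: (\d+)"
)


def parse_dbcache_counts(line):
    """Extract the three dbcache counters from a log line, or return None if absent."""
    match = DBCACHE_RE.search(line)
    if match is None:
        return None
    in_window, open_dbs, closing = (int(value) for value in match.groups())
    return {"in_window": in_window, "open": open_dbs, "closing": closing}
```

Comparing the "closing" counter with the number of open .msf descriptors at the same moment would show how closely the two track each other.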
Yes, the "DBs already closing:" will still have open file descriptors. When you see this, try to open some new TB window (like Account settings) or a compose window. The "DBs already closing:" should drop to 0 at that point. Do you observe that? Are the file descriptors released at that point?
I just tried opening a compose window, DBs already closing didn't drop to zero (still is 1633), and only one filedescriptor was closed (a while after).

Not better with Account settings either :-(
Strange. Do you use any addons/extensions?
keyconfig
Lightning
Quick Folder Move
Spellchecker.lu
Tidybird
Toggle Word Wrap
DOM Inspector
Can you try disabling "Tidybird" and "Quick Folder Move" temporarily? Those seem to handle folders and may be holding onto them.
Indeed, with both of them off, the number of file descriptors is now rising much slower (still below 100!)... and error console now often shows DBs already closing = 0:

2018-09-17 11:49:25	mailnews.database.dbcache	INFO	DBs open in a window: 1, DBs open: 11, DBs already closing: 0
I spoke too soon. Just after posting this, number of open file descriptors jumped up to 566, and doesn't significantly drop even when I open a window to compose a new message.

Error console shows the following, even after opening the compose window:

2018-09-17 11:52:25	mailnews.database.dbcache	INFO	DBs open in a window: 1, DBs open: 7, DBs already closing: 473
This at least shows that our caching works and does not hold 500 folders in the active list.

The question is what in your workflow or in TB causes those 500 folders to get opened and then closed after a while.
The other bug is already known, that even after closing the folders aren't properly released and hang in the "DBs already closing" list. Something must still hold references to them.
Now, it has jumped up to 1844 file descriptors, none are released when I open a compose or Account settings window.

I am not doing anything special here, didn't even send or refile a mail. Basically just opened thunderbird, opened the error console, and let it sit there, and occasionally opened Compose or Account settings to see whether it impacts file descriptors.
Summary: Thunderbird leaks file descriptors (mostly .msf files) like a sieve causing high memory → Thunderbird leaks file descriptors (mostly .msf files) like a sieve causing high memory. (no virtual folders)
I did have a virtual folder (displaying the result of a predefined search).

I removed it, restarted thunderbird, waited a while, and it's again up to 1865 file descriptors.

I do have multiple accounts configured (6, plus local folders)

Could that be a factor?
6 accounts should be fine (I have more), this problem is a function of number of folders. Do you really have those ~2000 folders in those accounts in total? Or are the file descriptors duplicated (pointing to the same msf multiple times) ?

You also have all of them on IMAP which I am not familiar with, e.g. whether we need to open each folder just to check if there are new messages in it on the server.
I do have around folders in those accounts (estimate, as the second biggest is an exchange account for which it is not easy to count the folders)

And indeed, among those descriptors that are indeed for folders, there are no files that are open multiple times.

However, other files (which are not folders) are open multiple times:
      2 /home/alain/.cache/thunderbird/p4u2gnrd.default/startupCache/startupCache.8.little   
      2 /home/alain/.thunderbird/p4u2gnrd.default/calendar-data/cache.sqlite-shm   
      2 /home/alain/.thunderbird/p4u2gnrd.default/cookies.sqlite   
      2 /home/alain/.thunderbird/p4u2gnrd.default/cookies.sqlite-shm   
      2 /home/alain/.thunderbird/p4u2gnrd.default/extensions/{cafe3945-058e-47e3-87f8-75bc120b9638}.xpi   
      2 /home/alain/.thunderbird/p4u2gnrd.default/extensions/FolderPaneSwitcher@kamens.us.xpi   
      2 /home/alain/.thunderbird/p4u2gnrd.default/extensions/keyconfig@dorando.xpi   
      2 /home/alain/.thunderbird/p4u2gnrd.default/places.sqlite-shm   
      2 /home/alain/.thunderbird/p4u2gnrd.default/webappsstore.sqlite   
      2 /home/alain/.thunderbird/p4u2gnrd.default/webappsstore.sqlite-shm   
      2 /home/alain/.thunderbird/p4u2gnrd.default/webappsstore.sqlite-wal   
      2 /home/alain/.xsession-errors   
      2 /usr/lib/thunderbird-addons/extensions/langpack-en-GB@thunderbird.mozilla.org.xpi   
      2 /usr/lib/thunderbird/omni.ja   
      3 /home/alain/.thunderbird/p4u2gnrd.default/places.sqlite   
      3 /home/alain/.thunderbird/p4u2gnrd.default/places.sqlite-wal
In comment 11 you said some msfs are open multiple times. Is that no longer the case?

The other non-msf files you listed are not under our control (but under the common base shared with Firefox), so it is probably OK if they are open twice.
No, this is no longer the case now.
With the "tidybird" extension switched back on, there was now indeed a folder opened twice. I opened and closed a compose window, and then the duplicate was gone.
After using the tidybird extension (to refile a message into a different folder), that target folder is now a duplicate opened folder, and this time it stays open, even after opening a compose or account settings window
Now, after waiting a while (with tidybird still enabled), there are loads of duplicates (537), mostly for folders which aren't even defined as targets in tidybird.

These stay around when opening composer or account settings.
(I wonder how long tidybird has been causing these issues.  I guess we can be thankful there are only 200 some users according to website)
> Still fails (this time with "Failed to read 'formatDateLong' from chrome://calendar/locale/calendar.properties.")

I also experience this problem (and this was how I've landed here).

I'm 60.3.0 on Buster (Debian testing) vanilla (from repository).

I do not use any extension except the calendar ("Lightning" ?), and also don't use any IMAP accounts (just two POP ones).

I suspected this be a different issue, but then found this and checked; surely enough, after restarting Thunderbird, the issue went away.

(My workflow/habits involve desktop uptimes in the weeks range. No shutdown/restart if not necessary.)

Given the above; that I did not have any issue before starting using Lightning; and also that I have no issues at all with the same configuration at home (with 10something POP accounts), I have the suspicion that this may have to do with Lightning somehow.
(In reply to Alain Knaff from comment #46)
> Now, after waiting a while (with tidybird still enabled), there are now
> loads of duplicates (537), for the most folders which aren't even defined as
> targets in tidybird.
> 
> These stay around when opening composer or account settings.

What does the author of tidybird have to say?
Flags: needinfo?(mozilla)
Summary: Thunderbird leaks file descriptors (mostly .msf files) like a sieve causing high memory. (no virtual folders) → Thunderbird leaks file descriptors (mostly .msf files) like a sieve causing high memory. 5 imap accounts. no virtual folders.
(In reply to Victor from comment #48)
>...
> Given the above; that I did not have any issue before starting using Lightning;
> ... I have the suspicion that this may have to do with Lightning somehow.

In which case you are in the wrong bug :)

https://mzl.la/2GgMHtr lists some of the calendar memory issues
(In reply to Wayne Mery (:wsmwk) from comment #50)
> In which case you are in the wrong bug :)
> 
> https://mzl.la/2GgMHtr lists some of the calendar memory issues

Maybe yes, maybe not.
I am experiencing specifically the file descriptor issue shown here, and I could not find anything like it among the bugs tagged as calendar (as far as I could tell from both the bug search and Google).

But it is not that I came for help.
What I wanted to ask is whether the OP (or, indeed, anybody affected) could try to reproduce the issues with Lightning disabled and report back.

(My daily/habitual workload is "unfortunately" not massive enough to reproduce the issue consistently in a matter of hours/days.)
Okay, I am guilty of misleading.
It turned out that here at work I *do* use IMAP, not POP3. :)
(Still no tidybird or anything else.)

I have now 280 .msf files open. Thunderbird uses 4895 file/stream descriptors currently, of which 1998 contains or refers to "sqlite" (.sqlite, .sqlite-wal, .sqlite-shm).

Is this normal?

(It may well be that the problem did exist beforehand and I only noticed with the calendar being unable to load month names, otherwise being unrelated to Lightning at all.)
This bug makes it impossible to use TB for more than five minutes. This same problem has plagued me on Windows 8 for years, through many different versions of TB. Right now using 64.0.0.6897 aka 64.0b3 32-bit. Output from console:
[
2018-12-14 14:56:37	mailnews.database.dbcache	INFO	Periodic check of cached folder databases (DBs), count=350

2018-12-14 14:56:37	mailnews.database.dbcache	DEBUG	Skipping, DB not open for folder: Junk
2018-12-14 14:56:37	mailnews.database.dbcache	DEBUG	Skipping, DB not open for folder: BitsDuJour
2018-12-14 14:56:37	mailnews.database.dbcache	DEBUG	Skipping, DB not open for folder: Inbox
-- 300+ Lines removed --
-- 322  Lines total   --
2018-12-14 14:56:37	mailnews.database.dbcache	DEBUG	Skipping, DB not open for folder: 09 Retail
2018-12-14 14:56:37	mailnews.database.dbcache	DEBUG	Skipping, DB not open for folder: 14 T.....
2018-12-14 14:56:37	mailnews.database.dbcache	DEBUG	Skipping, DB not open for folder: 20 C....

2018-12-14 14:56:37	mailnews.database.dbcache	INFO	DBs open in a window: 0, DBs open: 30, DBs already closing: 320
]
Of course however NirSoft's OpenedFilesView shows that Windows still counts those files as open.
Yes, if those 350 databases are listed, we may still keep the msf file open, even if it claims "DB not open for folder".
Does opening e.g. tools->Account settings window purge all those databases away?
No, opening Account Settings seems to do nothing to actually close the files, at least not within another five minutes.
The message "DB not open for folder" tells me the data structure that tracks the open/close state of the DB file has already lost its connection with reality, even before the periodic purge is attempted. As such, forcing another purge, say by opening a dialog or whatever, would still have no effect.

The annoying part of this is, if TB were to open MSFs only for accounts that have checked for new mail, then we could limit our activity to one account and then re-open TB to work in another account. However there is some periodic process that opens MSFs for other accounts. This happens even when all accounts have "Check for new messages at startup" and "Check for new messages every [X] minutes" UnChecked / Off.

For the next test, "Allow immediate server notifications when new messages arrive" is now UnChecked Off for all accounts.

Is there a config setting to delay this other background process forever?
> there is some periodic process that opens MSFs for other accounts.

It might be Thunderbird, or it might not. To eliminate the latter, start your OS (Windows?) in safe mode, and start Thunderbird in safe mode https://support.mozilla.org/en-US/kb/safe-mode-thunderbird   Any change in behavior?
Screenshots showing MSFs opened by TB.
[url=https://postimg.cc/KKgB7SKc][img]https://i.postimg.cc/KKgB7SKc/Opened-Files-1.png[/img][/url]
[url=https://postimg.cc/9R5TJ8Td][img]https://i.postimg.cc/9R5TJ8Td/Opened-Files-2.png[/img][/url]
There are four more screens to list the 538 files opened by TB (safe mode made no difference).

>> some periodic process that opens MSFs for other accounts.
>It might be Thunderbird, or it might not.

MSFs are never opened without TB running.
When the OpenedFilesCount jumps up, TB still has only five TCP connections open to the IMAP server. When TB closes, the files and connections are also closed. This situation of non-closed files is not a problem when TB has fewer IMAP folders to track, because Windows never hits its limit of file handles. Closing and re-opening TB in order to work with a different account might be bearable. But the MSFs are often corrupted, or at least un-linked from the account profile.

The next time TB goes to use the index and column information in the MSF, the previous information cannot be loaded. If User were to select that folder, then the message headers must be re-populated from the server, and User must re-configure the columns. Even if that folder's messages are never displayed (because User never looks at that account), the MSF are still opened and loaded by that background process.

In case of either the background process kicking off, or in response to User selecting that folder ... If there is any problem with loading the MSF, or TB thinks it does not exist, then TB goes ahead and (re-)creates the MSF, except the previous file *does* already exist. So the new file is created with an incremented counter in the filename. (That is silly. If we're going to create the index from scratch because the previous file is lost from the profile, then just go ahead and overwrite the orphan file. It's not like TB has any way to re-connect that MSF data to the folder displayed in the UI.)

Once a folder+MSF loses its mind like this, possibly every launch of TB re-creates the MSF *again*. The directory gets polluted with FolderName-9.MSF, FolderName-10.MSF, etc. Each file contains only initialization data, because message headers are never actually read from the IMAP server, because the user never looks at that email account.

Whew. That analysis has been simmering and developing for literally years.

Using blue-sky imagination, how could files be kept open by code outside TB, while Windows still thinks the file handles belong to TB? That would have to be some kind of shim, that steals control of the file behind TB's back after TB opens it, without Windows' noticing. Extremely unlikely.

Update - Looking in the Mork code, it seems to have some functionality that supports a kind of MSF-stealing as a use case omg.

Do you guys use the Move/Copy To item on the context menu of a message?
Does that increase number of open database (in the TB console) or msf files permanently?

No, I seldom use those actions on the context menu.
BUT I do have filters for the 5 folders I have set up that run automatically.
And now that you mention it, the start of the issues possibly correlates with my setting up those folders some months ago.
I guess moving by the filters is equivalent from this aspect to moving them manually.

Yes, if a filter moves a message into a folder that is the same as viewing the folder manually or moving a message to it.
It changes the state/contents of the folder, so its msf file must be opened for a while.
But it should be closed after the timeout if it is not used any more.

I have played with this now and I could also get 1000 file descriptors for msf files open.
But those were only about 10 unique folders but they were reported many times, each set for each thread of thunderbird (e.g. rendering threads, storage threads, etc.).
Do those count towards the 'open files limit' in linux?

I used 'lsof | grep thunderbird'. Is this the right way to check this bug? It is mentioned this way in comment 0.

For message / context menu / Move/Copy To, TB runs out of file handles on Windows and I have to quit, before I get around to that level of detail of working with messages. If the 'unused file timeout' is set to 'days' then exhausting the handles happens later. If "Check for new messages" is unchecked/off then the crash happens later. If a filter on a viewed folder causes a message to move to a non-viewed folder in another account, then that other account's/folder's filters are also run, which updates still more folders. This causes a large and immediate jump in the number of open MSF/folder index files.

All the MAB / address book files get held open as well, which I don't understand.

"[msf files] were reported many times, each set for each thread [rendering, storage, etc]" nirsoft.net/utils/opened_files_view.html counts each file only once. But docs.microsoft.com/en-us/sysinternals/downloads/process-explorer reports 850 handles owned by TB, even when OpenedFilesView shows only 200 files open. This, plus TB's multi-threading, gives me a hint that the reference counts for the over-held files are somehow not reaching zero among the various threads on Windows and Linux. I bet the lazy cleanup thread either thinks some other thread is still using the files, or it fails to actually close the files.

Consensus seems to be, however, that shared file descriptors (shown as "FD" in lsof output) are counted as a single entry.

grep has no option to keep the header row when filtering, but you can use sed for that:

lsof | sed '1p;/thunderbird/!d'

To get unique entries, try

ls -l /proc/$(pidof thunderbird)/fd
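To separate "how many descriptors point at .msf files" from "how many distinct .msf files are open", the link targets under /proc can be piped through a small filter. This is only a sketch, assuming Linux with /proc mounted; the function name count_msf is made up for illustration, and pidof may print several PIDs if more than one Thunderbird process is running.

```shell
# Reads one path per line on stdin and prints "<descriptors> <unique files>"
# for the paths that end in .msf. (count_msf is a hypothetical helper name.)
count_msf() {
    awk '/\.msf$/ { total++; if (!seen[$0]++) unique++ }
         END { printf "%d %d\n", total, unique }'
}

# Possible usage on Linux, assuming a single Thunderbird process:
#   for fd in /proc/"$(pidof thunderbird)"/fd/*; do readlink "$fd"; done | count_msf
```

If the two numbers differ, some .msf files are open more than once within that process; if they match, every open .msf descriptor refers to a different file.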

Severity: major → critical
Component: General → Database
Product: Thunderbird → MailNews Core
Version: 38 Branch → 38

Wallace, thank you for staying engaged.

(In reply to Wallace from comment #62)

For message / context menu / Move/Copy To, TB runs out of file handles on Windows

So using this is what tends to induce the problem?
How many accounts and message folders do you have?

All the MAB / address book files get held open as well, which I don't understand.

They stay open for performance reasons

"[msf files] were reported many times, each set for each thread [rendering, storage, etc]" nirsoft.net/utils/opened_files_view.html counts each file only once. But docs.microsoft.com/en-us/sysinternals/downloads/process-explorer reports 850 handles owned by TB, even when OpenedFilesView shows only 200 files open. This, plus TB's multi-threading, gives me a hint that the reference counts for the over-held files are somehow not reaching zero among the various threads on Windows and Linux. I bet the lazy cleanup thread either thinks some other thread is still using the files, or it fails to actually close the files.

Are you sure they are all disk files? IIRC handles are also used for some graphics and other issues.

This same problem has plagued me on Windows 8 for years, through many different versions of TB

Longer than the 4 years this bug has been open?

Flags: needinfo?(mozilla) → needinfo?(MzPrsna)

(In reply to :aceman from comment #13)

It may be possible that even if we close/not cache a DB in the upper layers,
we still keep the file referenced (with a descriptor) in some low layer.

What structures do you consider to be upper and which lower?

I myself observed that e.g. if you open a new window in TB the cached DBs
are actually purged from cache. We still do not know why that is. So there
are unknowns in the backend msf handling.

Four years hence, do we have any further insight into this?

Flags: needinfo?(acelists)

On Thunderbird 60.9.0 (at work), 2370 file descriptors open right now.

Just to chime in, 60.9.0, Linux, 3481 FDs, of which 112 pertains to .msf files.

But those 112 descriptors are shared between exactly 2 .msf files (that is 66 mappings per file).

I'd find it quite fishy that most threads need to open the index file for themselves. I am unsure if this is the best approach.

The others are mostly for .sqlite files.
Most .sqlite files are open similarly at least 60 times.

Except for "places.sqlite" and "webappsstore.sqlite" that take ~300 on their own, with "-wal", "-shm" and the basic extension ~100-100 each.

In fact I am not even sure if I ever opted for anything called webappsstore or places at all, nor if I couldn't live without any of them.

Same thunderbird as earlier this morning (not restarted since):

  • 2376 file descriptors open
  • of which 2271 regular files (the rest are network sockets to the imap servers, pipes, inotify, ...)
  • of which 1947 are .msf files (the rest are mostly sqlite related)
  • all .msf files are unique

... the fact that there are only 6 more file descriptors than a couple of hours ago tells me that the phenomenon is not without bound, and firefox opens most of these descriptors within minutes of launch, and then only adds very few later on. This was different in earlier times.

places.sqlite (and places.sqlite-wal) are opened 3 times each, and then there are 2 more places.sqlite-shm

There are also 2 webappsstore.sqlite, 2 webappsstore.sqlite-wal and 2 webappsstore.sqlite-shm

There are 37 .sqlite files in total, of which only 22 are unique. Of the dupes, most are only open twice, and 4 of them are open 3 times and none more than 3 times.

There are also dupes among the TCP connections:
1 ->ceres.aev.etat.lu:2443 (ESTABLISHED)
1 ->ceres.aev.etat.lu:2993 (ESTABLISHED)
2 ->localhost:imap2 (ESTABLISHED)
3 ->mail.ens.lu:imap2 (ESTABLISHED)
2 ->mail.lilux.lu:imap2 (ESTABLISHED)
5 ->mail.lll.lu:imap2 (ESTABLISHED) (mail.lll.lu has 2 accounts)

It may be worthwhile to mention that my uptime is 21 days 23:50, with suspending (not hibernating) between "days".
I also never close Thunderbird unless there is an update or I restart anyway.

Aceman,

Also, is msgdb logging giving you the info needed to show /why/ this is happening? And if not, what next to get the needed info?

(In reply to Alain Knaff from comment #68)

Same thunderbird as earlier this morning (not restarted since):

  • 2376 file descriptors open
  • of which 2271 regular files (the rest are network sockets to the imap servers, pipes, inotify, ...)
  • of which 1947 are .msf files (the rest are mostly sqlite related)
  • all .msf files are unique

... the fact that there are only 6 more file descriptors than a couple of hours ago tells me that the phenomenon is not without bound, and firefox opens most of these descriptors within minutes of launch, and then only adds very few later on. This was different in earlier times.

places.sqlite (and places.sqlite-wal) are opened 3 times each, and then there are 2 more places.sqlite-shm

There are also 2 webappsstore.sqlite, 2 webappsstore.sqlite-wal and 2 webappsstore.sqlite-shm

This is getting confusing for me. Can you summarize:

  • Are you seeing similar numerous open files issue for both Thunderbird and Firefox?
  • What sqlite files, and number for each, are open only in Thunderbird?
  • And can you list detailed numbers for the top 10 .msf files?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Thunderbird leaks file descriptors (mostly .msf files) like a sieve causing high memory. 5 imap accounts. no virtual folders. → Thunderbird on Linux leaks file descriptors (mostly .msf files) causing high memory. 5 imap accounts. no virtual folders.

anonymized as needed

(In reply to Wallace from comment #57)

Screenshots showing MSFs opened by TB.
[url=https://postimg.cc/KKgB7SKc][img]https://i.postimg.cc/KKgB7SKc/Opened-
Files-1.png[/img][/url]

Your screen shots have expired. Can you share them privately or attach to the bug report?

Once a folder+MSF loses its mind like this, possibly every launch of TB
re-creates the MSF again. The directory gets polluted with
FolderName-9.MSF, FolderName-10.MSF, etc. Each file contains only
initialization data, because message headers are never actually read from
the IMAP server, because the user never looks at that email account.

In my experience this (-N filenames) tends to happen for accounts which are not logged in at startup time

(In reply to Wayne Mery (:wsmwk) from comment #74)

(In reply to Alain Knaff from comment #68)
[...]

... the fact that there are only 6 more file descriptors than a couple of hours ago tells me that the phenomenon is not without bound, and firefox

Sorry, I actually meant thunderbird here

[...]

This is getting confusing for me. Can you summarize:

  • Are you seeing similar numerous open files issue for both Thunderbird and Firefox?

Currently

  • 2356 file descriptors in thunderbird
  • 3065 file descriptors in firefox (spread across 11 instances; the instance with the most has 629)
  • What sqlite files, and number for each, are open only in Thunderbird?
  1 /home/alain/.thunderbird/p4u2gnrd.default/blist.sqlite
  1 /home/alain/.thunderbird/p4u2gnrd.default/calendar-data/cache.sqlite
  2 /home/alain/.thunderbird/p4u2gnrd.default/calendar-data/cache.sqlite-shm
  1 /home/alain/.thunderbird/p4u2gnrd.default/calendar-data/cache.sqlite-wal
  1 /home/alain/.thunderbird/p4u2gnrd.default/calendar-data/deleted.sqlite
  1 /home/alain/.thunderbird/p4u2gnrd.default/calendar-data/local.sqlite
  1 /home/alain/.thunderbird/p4u2gnrd.default/content-prefs.sqlite
  1 /home/alain/.thunderbird/p4u2gnrd.default/cookies.sqlite
  2 /home/alain/.thunderbird/p4u2gnrd.default/cookies.sqlite-shm
  1 /home/alain/.thunderbird/p4u2gnrd.default/cookies.sqlite-wal
  3 /home/alain/.thunderbird/p4u2gnrd.default/favicons.sqlite
  2 /home/alain/.thunderbird/p4u2gnrd.default/favicons.sqlite-shm
  3 /home/alain/.thunderbird/p4u2gnrd.default/favicons.sqlite-wal
  1 /home/alain/.thunderbird/p4u2gnrd.default/global-messages-db.sqlite
  1 /home/alain/.thunderbird/p4u2gnrd.default/permissions.sqlite
  3 /home/alain/.thunderbird/p4u2gnrd.default/places.sqlite
  2 /home/alain/.thunderbird/p4u2gnrd.default/places.sqlite-shm
  3 /home/alain/.thunderbird/p4u2gnrd.default/places.sqlite-wal
  2 /home/alain/.thunderbird/p4u2gnrd.default/webappsstore.sqlite
  2 /home/alain/.thunderbird/p4u2gnrd.default/webappsstore.sqlite-shm
  2 /home/alain/.thunderbird/p4u2gnrd.default/webappsstore.sqlite-wal
  • And can you list detailed numbers for the top 10 .msf files?

What do you mean by "top"? Currently, each of the 2007 .msf files is only open once (initially, when first reported, some .msf files were indeed open multiple times, but this is no longer the case with current thunderbird)

(In reply to Alain Knaff from comment #77)

(In reply to Wayne Mery (:wsmwk) from comment #74)

(In reply to Alain Knaff from comment #68)
[...]

... the fact that there are only 6 more file descriptors than a couple of hours ago tells me that the phenomenon is not without bound, and firefox

Sorry, I actually meant thunderbird here

[...]

This is getting confusing for me. Can you summarize:

  • Are you seeing similar numerous open files issue for both Thunderbird and Firefox?

Currently

  • 2356 file descriptors in thunderbird
  • 3065 file descriptors in firefox (spread across 11 instances; the instance with the most has 629)
    ...
  • And can you list detailed numbers for the top 10 .msf files?

What do you mean by "top"? Currently, each of the 2007 .msf files is only open once (initially, when first reported, some .msf files were indeed open multiple times, but this is no longer the case with current thunderbird)

I forgot that you said .msf are now open only once - so I withdraw the question. But you say "1947 are .msf files". So you have that many folders? And surely they are not all "active", i.e. you either click on them, or they get email via a filter, or they aren't part of a frequently used or updated virtual folder? Do you have any virtual folders that enumerate all or most of your folders?

Regarding .sqlite files being open more than once - it seems to me this would be a core issue, not a Thunderbird bug. So for purposes of this bug report we can ignore sqlite issues.

Flags: needinfo?(mozilla)

(In reply to Victor from comment #67)

Just to chime in, 60.9.0, Linux, 3481 FDs, of which 112 pertains to .msf files.

Are you able to test version 68?

But those 112 descriptors are shared between exactly 2 .msf files (that is 66 mappings per file).

These two folders, they are folders you created and are not special folders like Inbox, Draft, Template, Spam, Trash ?
Anything interesting about these two folders or the number of times they are open?
Any addons installed and enabled? (other than calendar)
Are they targets of filters, or part of a virtual folder?
If a target of a filter, does the open count reduce significantly if the filter for that folder is disabled?

I'd find it quite fishy that most threads need to open the index file for themselves.

What threads are you talking about?

The others are mostly for .sqlite files.
Most .sqlite files are open similarly at least 60 times.

Except for "places.sqlite" and "webappsstore.sqlite" that take ~300 on their own, with "-wal", "-shm" and the basic extension ~100-100 each.

In fact I am not even sure if I ever opted for anything called webappsstore or places at all, nor if I couldn't live without any of them.

These files are created because Thunderbird is a gecko-based application, and can be ignored.

Flags: needinfo?(vcsiky)

(In reply to Wayne Mery (:wsmwk) from comment #79)

(In reply to Victor from comment #67)

Just to chime in, 60.9.0, Linux, 3481 FDs, of which 112 pertains to .msf files.

Are you able to test version 68?

Yes, 68.2.2 currently, 15 minutes after starting it, and without opening a single mail for reading, using up 3537 descriptors.
Might be related to the new issue #1597927 that I've just reported; unsure though.

But those 112 descriptors are shared between exactly 2 .msf files (that is 66 mappings per file).

These two folders, they are folders you created and are not special folders like Inbox, Draft, Template, Spam, Trash ?

These two are presumably created by me (names like "f9b2231e-1" and "c57cfb06-1").
But the system folders (again, in 68.2.2) are close followers.

Anything interesting about these two folders or the number of times they are open?

Nothing. This is after startup.

Any addons installed and enabled? (other than calendar)

Nada.

Are they targets of filters, or part of a virtual folder?

No.

If a target of a filter, does the open count reduce significantly if the filter for that folder is disabled?

I guess it would, since without filter, nothing gets inside them. All self-made folders have filters to populate them and without them, they would be empty.
But I cannot really verify this currently due to #1597927 (everything gets reopened and redownloaded anyway).
And again, they just lead by like 100 to 60 handles each (compared to the "system folders").

I'd find it quite fishy that most threads need to open the index file for themselves.

What threads are you talking about?

I have entries like "DNS Resolver", "QuotaManager", "StyleThread#0" to "StyleThread#2", "ImgDecoder #", "Cache I/O", "URL Classifier", "localStorage", "Classif~ Upd", "ImageBridgeChild", "ImageIO", "Compositor", "Softwar~cThread", "dconf worker", "Worker Launc", "GMPThread", "DataStorage", "Timer", "gmain", "AudioIPC Service", "AudioIPC Cal..", "JS Watchdog", "JS Helper", and "Gecko_IOThread", aside the main "thunderbird", certainly.

Surely enough, for some strange reason, every one of these holds a handle to all the system and custom MSF files.
Most of them also hold their own handles for the .sqlite databases.

To me, as an outsider, this seems like a certain coding pattern of a "singleton I/O manager" stuff of some sort being injected into the constructor of all the threads "to be available should the need arise". But it may happen that this couldn't be farther from the truth.

In fact I am not even sure if I ever opted for anything called webappsstore or places at all, nor if I couldn't live without any of them.

These files are created because Thunderbird is a gecko-based application, and can be ignored.

Thank God it is not Webkit/Blink based.
Still, I insist that this sort of stuff is probably completely useless and suggest this being removed from Gecko altogether (or at least made toggleable by some sort of runtime option).
But yeah, even though I do think this has to do with the current issue, I understand that this is probably the least likely path the devs would take to begin with.

Flags: needinfo?(vcsiky)

(In reply to Wayne Mery (:wsmwk) from comment #78)
[...]

I forgot that you said .msf are now open only once - so I withdraw the question. But you say "1947 are .msf files". So you have that many folders?

I do have many mail folders: 1327 in my main account, 356 in another one, and then a few more where they are not easy to count.

And surely they are not all "active", i.e. you either click on them, or they get email via a filter, or they aren't part of a frequently used or updated virtual folder?

Nope, I don't click on all these folders :-)

I do have filters, but they are supposed to only operate on INBOX, not on all the other folders. They are supposed to run "before Junk Classification".

Do you have any virtual folders that enumerate all or most of your folders?

I do not have any virtual folders set up on this box.

Regarding .sqlite files being open more than once - it seems to me this would be a core issue, not a Thunderbird bug. So for purposes of this bug report we can ignore sqlite issues.

Flags: needinfo?(mozilla)

(In reply to Alain Knaff from comment #81)
[...]

I do have filters, but they are supposed to only operate on INBOX, not on all the other folders. They are supposed to run "before Junk Classification".

For good measure, I disabled all filters (by unchecking the "Enabled" checkbox next to them), restarted Thunderbird, and let it run a while. By now, there are again 1805 .msf files open.

So I think we can rule out filters.

Walt, Ben, can you reproduce, see, and offer any explanation for the following? And if it exists, is it something we need to prevent/fix?

(In reply to :aceman from comment #61)

...
I have played with this now and I could also get 1000 file descriptors for msf files open.
But those were only about 10 unique folders but they were reported many times, each set for each thread of thunderbird (e.g. rendering threads, storage threads, etc.).

If Thunderbird is controlling the access to each folder, there should be no way for a) a DIFFERENT thread to have an open FD for the same folder, and b) the same file to be open more than once in the SAME thread - otherwise, what good is the Thunderbird data structure?

Do those count towards the 'open files limit' in linux?

If multiple file opens are permitted to happen against the same file within TB, sure.

Flags: needinfo?(wls220spring)
Flags: needinfo?(benc)

https://mzl.la/2Dwzxo9 lists other possible memory reports - but perhaps none match this bug report

I learned how to use a new Linux command today! That being the lsof command.

Using lsof +D (path to my production profile with 2 Gmail accounts), I see some sqlite files duplicated, but the FD values are not duplicates, and I see these descriptors for msf files:

TB68rel/ImapMail/imap.gmail-1.com/[Gmail].sbd/Drafts.msf
TB68rel/Mail/smart mailboxes/Sent.msf
TB68rel/Mail/smart mailboxes/Inbox.msf
TB68rel/Mail/smart mailboxes/Junk.msf
TB68rel/Mail/smart mailboxes/Trash.msf
TB68rel/Mail/smart mailboxes/Drafts.msf
TB68rel/Mail/mail.comcast.net/Inbox.msf
TB68rel/ImapMail/imap.gmail-1.com/INBOX.msf
TB68rel/ImapMail/imap.gmail.com/INBOX.msf
TB68rel/Mail/Local Folders/Unsent Messages.msf
TB68rel/News/news.us.Usenet-News.net/comp.infosystems.www.authoring.stylesheets.msf

Using the Ubuntu-provided version of Thunderbird and lsof +D (path to my ubuntu thunderbird profile), I again see sqlite files duplicated, but the FD values are not duplicates, and I have these file descriptors for msf files:

ubuntu/Mail/smart mailboxes/Sent.msf
ubuntu/Mail/smart mailboxes/Inbox.ms
ubuntu/Mail/smart mailboxes/Junk.msf
ubuntu/Mail/smart mailboxes/Trash.msf
ubuntu/Mail/smart mailboxes/Drafts.msf
ubuntu/Mail/mail.comcast.net-maildir/Inbox.msf
ubuntu/ImapMail/imap.gmail.com/[Gmail].sbd/Bugzilla.msf
ubuntu/ImapMail/imap.gmail.com/INBOX.msf

I expect the Ubuntu profile to have fewer msf files than my production profile because I rarely use it.

HTH

Flags: needinfo?(wls220spring)

(In reply to Victor from comment #80)

(In reply to Wayne Mery (:wsmwk) from comment #79)

What threads are you talking about?

I have entries like "DNS Resolver", "QuotaManager", "StyleThread#0" to "StyleThread#2", "ImgDecoder #", "Cache I/O", "URL Classifier", "localStorage", "Classif~ Upd", "ImageBridgeChild", "ImageIO", "Compositor", "Softwar~cThread", "dconf worker", "Worker Launc", "GMPThread", "DataStorage", "Timer", "gmain", "AudioIPC Service", "AudioIPC Cal..", "JS Watchdog", "JS Helper", and "Gecko_IOThread", aside the main "thunderbird", certainly.

Surely enough, for some strange reason, every one of these holds a handle to all the system and custom MSF files.
Most of them also hold their own handles for the .sqlite databases.

Under Linux, don't threads tend to share the same file descriptor table?
So if one thread has an open MSF file, all the threads will show it... (but I'd hope that it'd still only count as one file open toward the per-process upper limit).
There's obviously real problems being caused for some people here, but I'm just wondering if this could be confusing the count.
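One way to check this on a running system is to compare the descriptor directories of the individual threads. The following is a rough sketch, assuming Linux with /proc mounted; per_thread_fd_counts is a made-up name. If the threads share one descriptor table, every line should report the same count, which would mean a per-thread listing repeats the same files rather than showing extra ones.

```shell
# Prints "<thread id> <number of descriptors>" for every thread of one process.
# (per_thread_fd_counts is a hypothetical helper; Linux /proc layout assumed.)
per_thread_fd_counts() {
    pid=$1
    for task in /proc/"$pid"/task/*; do
        [ -d "$task/fd" ] || continue
        printf '%s %s\n' "${task##*/}" "$(ls "$task/fd" | wc -l)"
    done
}

# Possible usage, assuming a single Thunderbird process:
#   per_thread_fd_counts "$(pidof thunderbird)"
```

Small differences between lines can still appear if files are opened or closed while the loop is running.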

(In reply to :aceman from comment #61)

I have played with this now and I could also get 1000 file descriptors for msf files open.
But those were only about 10 unique folders but they were reported many times, each set for each thread of thunderbird (e.g. rendering threads, storage threads, etc.).

Ahh, snap! I didn't read this closely enough before I made my previous comment!

Do those count towards the 'open files limit' in linux?

I'd really hope not! (will investigate)

I used 'lsof | grep thunderbird'. Is this the right way to check this bug? It is mentioned this way in comment 0.

Ahh, I think this will count each thread separately. I seem to recall that there's no real distinction between threads and processes in linux, other than what they choose to share between each other. So in that lsof command, all the TB threads will be mirroring the same files I guess.
I'd probably go for:

ls -l /proc/$(pidof thunderbird)/fd | wc -l

or:

lsof -p $(pidof thunderbird)

...to just catch the root thread.

All looks fine when I run either of those on my mostly-empty test setup (a small IMAP account on gmail, and the default local folder). I see 4 .msf files, all open exactly once. (and most of them closing after some minutes idleness).

But if you've got thousands of folders and all the .msf files are open simultaneously, I could see how that'd hit open-files-per-process limits.

Flags: needinfo?(benc)

(In reply to Ben Campbell from comment #87)

(In reply to :aceman from comment #61)

I used 'lsof | grep thunderbird'. Is this the right way to check this bug? It is mentioned this way in comment 0.

Ahh, I think this will count each thread separately. I seem to recall that there's no real distinction between threads and processes in linux, other than what they choose to share between each other. So in that lsof command, all the TB threads will be mirroring the same files I guess.
I'd probably go for:

ls -l /proc/$(pidof thunderbird)/fd | wc -l

or:

lsof -p $(pidof thunderbird)

...to just catch the root thread.

Wallace, et al,
Do things look better using the above syntax?

Flags: needinfo?(vcsiky)
Flags: needinfo?(mozilla)
Flags: needinfo?(MzPrsna)

Well, this way I only have 110 descriptors.
I guess this is okay.

(In reply to Victor from comment #52)

(It may well be that the problem did exist beforehand and I only noticed
with the calendar being unable to load month names, otherwise being
unrelated to Lightning at all.)

Perhaps most important, however, is that I have not experienced this symptom for over 10 months now.

Flags: needinfo?(vcsiky)

Currently, 483 file descriptors open under the .thunderbird directory, of which 436 are msf files (none of these duplicate)

Some of the other files (not ending in .msf) are duplicate:
2 /home/alain/.thunderbird/p4u2gnrd.default/extensions/{847b3a00-7ab1-11d4-8f02-006008948af5}.xpi
2 /home/alain/.thunderbird/p4u2gnrd.default/extensions/{e2fda1a4-762b-4020-b5ad-a41df1933103}.xpi
2 /home/alain/.thunderbird/p4u2gnrd.default/extensions/quickfolders@curious.be.xpi
2 /home/alain/.thunderbird/p4u2gnrd.default/extensions/togglewordwrap@kiszka.org.xpi
2 /home/alain/.thunderbird/p4u2gnrd.default/webappsstore.sqlite
2 /home/alain/.thunderbird/p4u2gnrd.default/webappsstore.sqlite-wal
3 /home/alain/.thunderbird/p4u2gnrd.default/favicons.sqlite
3 /home/alain/.thunderbird/p4u2gnrd.default/favicons.sqlite-wal
3 /home/alain/.thunderbird/p4u2gnrd.default/places.sqlite
3 /home/alain/.thunderbird/p4u2gnrd.default/places.sqlite-wal

With only 483 file descriptors, we are now well below the default number of open files limit.

$ ls -l /proc/$(pidof thunderbird)/fd | wc -l
548

(The number is higher because it includes sockets, pipes, and files not under ~/.thunderbird.)

Btw, I was rather astonished to see the following:
lr-x------ 1 alain alain 64 Jan 31 10:42 50 -> /dev/shm/org.chromium.L5gd00 (deleted)
lr-x------ 1 alain alain 64 Jan 31 10:42 51 -> /dev/shm/org.chromium.eQ5PB2 (deleted)
lr-x------ 1 alain alain 64 Jan 31 10:42 52 -> /dev/shm/org.chromium.glSud4 (deleted)
lr-x------ 1 alain alain 64 Jan 31 10:42 53 -> /dev/shm/org.chromium.avvbP5 (deleted)
lr-x------ 1 alain alain 64 Jan 31 10:42 54 -> /dev/shm/org.chromium.YMW2q7 (deleted)
lr-x------ 1 alain alain 64 Jan 31 10:42 55 -> /dev/shm/org.chromium.pHU228 (deleted)
lr-x------ 1 alain alain 64 Jan 31 10:42 56 -> /dev/shm/org.chromium.nyR5Ea (deleted)
lr-x------ 1 alain alain 64 Jan 31 10:42 57 -> /dev/shm/org.chromium.I29ahc (deleted)
lr-x------ 1 alain alain 64 Jan 31 10:42 58 -> /dev/shm/org.chromium.WXYiTd (deleted)
lr-x------ 1 alain alain 64 Jan 31 10:42 59 -> /dev/shm/org.chromium.fX0uvf (deleted)
lr-x------ 1 alain alain 64 Jan 31 10:42 60 -> /dev/shm/org.chromium.wDHJ7g (deleted)

Why is thunderbird mapping Chromium shared memory (Chromium is not even installed...)?

Flags: needinfo?(mozilla)

Chiaki, any thoughts on what is happening and how to capture?

Flags: needinfo?(acelists) → needinfo?(ishikawa)
Summary: Thunderbird on Linux leaks file descriptors (mostly .msf files) causing high memory. 5 imap accounts. no virtual folders. → Thunderbird on Linux leaks file descriptors (mostly .msf files) causing high memory. 5 imap accounts. no virtual folders. "Unable to open the summary file for Draft" and "too many files open"

(In reply to Wayne Mery (:wsmwk) from comment #91)

Chiaki, any thoughts on what is happening and how to capture?

I read the posts above.
I think I don't hit the limit probably because I use pop3 instead of imap and
because I have only about a couple of hundred folders.
Here is my prlimit output under linux (this is a home linux image I use inside virtualbox to develop TB patches).
I believe my office PC has the same setting.

ishikawa@ip030:/home/ishikawa$ prlimit
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited bytes
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited bytes
LOCKS      max number of file locks held       unlimited  unlimited locks
MEMLOCK    max locked-in-memory address space 2099136000 2099136000 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024    1048576 files
NPROC      max number of processes                 63890      63890 processes
RSS        max resident set size               unlimited  unlimited bytes
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals           63890      63890 signals
STACK      max stack size                        8388608  unlimited bytes
ishikawa@ip030:/home/ishikawa$ 

So I have not hit the file descriptor limit before, I think.
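To see the limit that applies to an already running process (rather than to the current shell), the soft value can be read from that process's /proc entry. A minimal sketch, assuming Linux; soft_nofile is a made-up helper name and PID is a placeholder.

```shell
# Reads the contents of /proc/PID/limits on stdin and prints the soft
# "Max open files" value, i.e. the SOFT column of the NOFILE row above.
# (soft_nofile is a hypothetical helper name.)
soft_nofile() {
    awk '/^Max open files/ { print $4 }'
}

# Possible usage (PID is a placeholder for the Thunderbird process id):
#   soft_nofile < /proc/PID/limits      # the limit
#   ls /proc/PID/fd | wc -l             # descriptors currently in use
```

Comparing the two numbers shows how much headroom is left before "too many files open" errors would be expected.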

On my home PC, I have used TB under Windows10 for some time, and I don't see the issue either.
(Actually, I counted the folders on my home PC: it has a couple of hundred. My office PC has a similar count or fewer folders.)

In addition to knowing the current open file descriptors using lsof as noted above, we can trace the open and close system calls to see what files are opened and then closed dynamically by using strace. I used this to track the subtle file open/close timing issue which plagued windows port of my patches a few times.
Under linux, it is OK to rename/delete files while there are open descriptors to these files. However, under Windows, renaming/deleting fails while there are open file descriptors. Thus, we need to close the file descriptors before attempting to do so under Windows.

strace has an option to print the pathname when close() is called, which is handy. I did not know this when I attempted the above, so I had to keep track of the association between FD and file path with an awk script while tracing file open/close to track down the file-closing timing issue. Very educational.

       -y          Print paths associated with file descriptor arguments.

So one of the following would be handy to monitor what goes on:

strace -ff -y --trace=/open,/close,/dup thunderbird/thunderbird

strace -ff -y --trace=/open,/close,/dup -p PID-OF-TB-PROCESS

-ff is necessary because TB spawns a few processes during startup.
-y prints the pathname for each file descriptor, as noted above.
--trace=/open,/close,/dup limits the capture to system calls that match {open|close|dup}.
There are openat() system calls in addition to open(), and dup is used to duplicate a file descriptor. All of these need to be captured.
That is why a regular-expression match is used instead of an exact "open", etc.
(I thought I had to re-check my memory here to make sure I list all the relevant system calls.
I checked. There are dup, dup2, dup3, etc. I think the above is a good enough first cut.)
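
To make the captured trace easier to read, here is a minimal sketch of the FD-to-path bookkeeping in Python. It assumes the usual strace -y line shapes, i.e. a successful open or dup ends in "= FD</path>" and a successful close looks like "close(FD</path>) = 0". It is a simplification: it ignores "<unfinished ...>" lines and descriptors duplicated via fcntl, and it treats all lines as one descriptor table. The name find_unclosed is made up for this illustration.

```python
import re

# A successful open/openat/creat/dup* call ends with "= FD</path>" under strace -y.
OPENED = re.compile(
    r'\b(?:open|openat|openat2|creat|dup|dup2|dup3)\(.*\)\s+=\s+(\d+)<([^>]*)>'
)
# A successful close looks like "close(FD</path>) = 0".
CLOSED = re.compile(r'\bclose\((\d+)(?:<[^>]*>)?\)\s+=\s+0')

def find_unclosed(lines):
    """Return {fd: path} for descriptors that were opened but never closed."""
    open_fds = {}
    for line in lines:
        m = CLOSED.search(line)
        if m:
            open_fds.pop(int(m.group(1)), None)
            continue
        m = OPENED.search(line)
        if m:
            # A reused FD number simply overwrites the older entry.
            open_fds[int(m.group(1))] = m.group(2)
    return open_fds
```

Feeding it the lines of a saved trace and filtering the result for paths ending in ".msf" would list the summary files that were still open when the trace ended.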

I have a theory about a possible leak.
I can't recall in which bugzilla entry I mentioned the following issue.
But during shutdown, I saw a strange sequence of events.
An object was destroyed, and as part of the destroy/delete operation its destructor called
the destructor of another class. Unfortunately, the object being referenced was already gone, and if my memory is correct, it had something to do with file I/O.
So obviously, there are lazy I/O operations that invoke the file operation at the last minute (i.e., the file close operation is done at object destruction).
The reference to the object that manages the file descriptor may not be refcounted very well (?)
But if so, even my office PC, which has only a couple of hundred folders, would have run out of the 1024 FD limit already. I don't shut down TB on my office PC.
So if this refcount issue exists at all, I think it is related to IMAP handling only.

But anyway, we can trace the handling of open/close/dup, etc. to see if there are any obvious anomalies.

Flags: needinfo?(ishikawa)
See Also: → 1708842
See Also: → 1593039

Alain, Victor,
Are you still seeing this issue with version 102, or with beta?

Flags: needinfo?(vcsiky)
Flags: needinfo?(mozilla)

(In reply to Wayne Mery (:wsmwk) from comment #93)

Alain, Victor,
Are you still seeing this issue with version 102, or with beta?

Unfortunately, I cannot say, as I recently reverted to 78.14.0 to get a usable calendar back.

Flags: needinfo?(mozilla)

(In reply to Wayne Mery (:wsmwk) from comment #93)

Alain, Victor,
Are you still seeing this issue with version 102, or with beta?

As I wrote above, I had already not experienced the issue for quite a few months even 3 years ago.
I do have calendar issues just like Alain (like bug 1806800), but I am still using 102, and I am pretty sure I do not have this issue any more.

So this may be resolved for me.

Flags: needinfo?(vcsiky)
Attachment #9385835 - Attachment is obsolete: true