Closed Bug 1726319 Opened 3 years ago Closed 2 years ago

tb continuously downloading messages, stuck indexing large gmail account, cannot read offline

Categories

(MailNews Core :: Networking: IMAP, defect, P1)

Thunderbird 93
Unspecified
All

Tracking

(thunderbird_esr91 unaffected, thunderbird99 wontfix, thunderbird100+ wontfix, thunderbird101+ wontfix)

RESOLVED FIXED
102 Branch
Tracking Status
thunderbird_esr91 --- unaffected
thunderbird99 --- wontfix
thunderbird100 + wontfix
thunderbird101 + wontfix

People

(Reporter: nicolopiazzalunga, Assigned: gds)

References

(Regression)

Details

(Keywords: perf, regression)

Attachments

(10 files, 5 obsolete files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0

Steps to reproduce:

I have a gmail account setup with imap.
It has around 60k messages.
Although I set the option to syncronise all msgs locally, if I work offline not all messages are there.
Moreover, in activity manager, tb is stuck indexing messages at 0%.
I tried to both repair forlder and force syncronization, but this does not solve my issue.

Although I set the option to syncronise all msgs locally, if I work offline not all messages are there.

Might be a bad message.

But I would start by deleting global-messages-db.sqlite from your profile. Does that help?

Flags: needinfo?(nicolopiazzalunga)

Thank you, that helps, as it solves the indexing problem.
Now it seems I can read messages offline.
However, TB is stuck dowloading messages from my gmail account:
even though my gmail is only around 7GB, TB keeps downloading stuff, and its size reaches dozens of GB, at some point even 100+ GB.
Do you know what could be the issue?
I tried compacting folders, but then at some point it resumes dowloading stuff..

Flags: needinfo?(nicolopiazzalunga)

Thoughts?

We get other reports of such issues. Somewhat rare, super difficult to diagnose.
I'm pressed for time, so this query is only a rough approximation of such reports. https://mzl.la/3Aj4YOb

Severity: -- → S2
Component: Untriaged → Networking: IMAP
Flags: needinfo?(unicorn.consulting)
Flags: needinfo?(gds)
Product: Thunderbird → MailNews Core
Summary: tb stuck indexing large gmail account, cannot read offline → tb continuously downloading messages, stuck indexing large gmail account, cannot read offline

p.s. the ultimate situation of course is to get us enough information that we can fix the code, not just get the user running.

This requires some pain on the part of the user to gather reproducible data and information for us, before destroying the situation that causes the problem.

I can provide info, just ask what you need, when you have time.
I'm running TB 78.13.0, as provided by my distro, void lixux, on x86_64-musl, you can see how it is built and patched here
https://github.com/void-linux/void-packages/tree/master/srcpkgs/thunderbird

I guess the first thing is to record an IMAP:5 log for just the problem gmail account. If you have other accounts it would be good to "disable" them like this before recording the log. This can be accomplished by un-selecting 3 item in the accounts Server settings for each non-problem account:

  • Check for new messages at startup
  • Check for messages every X minutes
  • Allow immediate notifications when new messages arrive

Then shutdown TB and prepare to record the IMAP:5 log as described here: https://wiki.mozilla.org/MailNews:Logging

Note: On linux from a bash cmd line, I run TB and record the log like this:
MOZ_LOG=IMAP:5,timestamp,sync MOZ_LOG_FILE=~/tblog thunderbird
rather than following the exactly method described in the wiki. This produces a log at ~/tblog.moz_log and assumes thunderbird is in your path (if not, just add it in front of "thunderbird").

When you run tb, only click into folders on the problem account.

Let it record the log for a while and let me know which folders you select or other activities you do within TB while recording the log. Not sure how long to ask you to record but the file can get really big depending on what is really happening. Anyhow, shutdown tb and attach the log above using the "Attach New File" button.

Other info that will be helpful is the names of folders within the server and how many messages in them. Are you subscribed to the "[Gmail]/All Mail" folder? It can get really big and sometimes causes problems. If you want to subscribe to it, it might be better to just let TB store its headers and leave the message bodies on the server since everything there is already in other folders.

Anyhow, this is enough to get started...

Flags: needinfo?(gds)

I had some errors in my command line for recording the log. I've fixed them above. But just to be sure, here's the correct command line:
MOZ_LOG=IMAP:5,timestamp,sync MOZ_LOG_FILE=~/tblog thunderbird
P/S: You can watch log file, e.g., "tail -f ~/tblog.moz_log", to make sure something is getting recorded before letting tb run a long time.

Thanks.
In settings->copies and folders->message archives, I deselected "keep message archives in All Mail".
In settings->syncronization and storage->message syncronization, I deselected "keep messages in all folders for this account on this computer", and under advanced I only subscribed for offline use to inbox, drafts, and sent mail.

The gmail folders are as follows:

  • Inbox: 44524 messages
  • gmail->Drafts: empty
  • gmail->Sent Mail: 9091
  • gmail->All Mail: 53130
  • gmail->Spam: empty
  • gmail->Trash: 1407
  • gmail->Important: 34668
  • gmail->Starred: empty

If I play around the gmail folder for around 10 mins, the log file gets already pretty big, like 250+ MB, and I'm unable to upload it here.
Let me try for an even shorter time, for a start.

I think, when I'm hitting the problem of TB trying to download (again) messages, suddenly the log file jumps from dozens of kB to dozens on MB, let me try again.

btw, this is printed on terminal, not sure if relevant:

$ MOZ_LOG=IMAP:5,timestamp,sync MOZ_LOG_FILE=~/tblog thunderbird
[calBackendLoader] Using Thunderbird's libical backend
console.debug: "Successfully loaded OpenPGP library librnp.so version 0.14+git20210121.7c8492b4.MZLA from /usr/lib/thunderbird/librnp.so"
console.warn: "[l10nregistry] Attempting to synchronously load file\n resource:///chrome/en-US/locale/en-US/calendar/messenger/openpgp/msgReadStatus.ftl while it's being loaded asynchronously."
JavaScript error: resource://gre/modules/ActorManagerChild.jsm, line 297: TypeError: singletons is null
JavaScript error: resource://gre/modules/ActorManagerChild.jsm, line 297: TypeError: singletons is null
JavaScript error: resource://gre/modules/ActorManagerChild.jsm, line 297: TypeError: singletons is null
Extension error: Error while loading 'jar:file:///usr/lib/thunderbird/omni.ja!/chrome/messenger/search-extensions/twitter/manifest.json' (NS_ERROR_FILE_NOT_FOUND) resource://gre/modules/Extension.jsm:570 :: readJSON/</<@resource://gre/modules/Extension.jsm:570:20
onStopRequest@resource://gre/modules/NetUtil.jsm:128:18

console.debug: "Trying to load /usr/lib/thunderbird/libotr.so.5"
console.debug: "Trying to load libotr.so.5 from system's standard library locations"
console.debug: "Trying to load libotr.so from system's standard library locations"
console.log: (new Error("Cannot load required OTR library", "resource:///modules/OTRLib.jsm", 91))
console.debug: "Found 0 public keys and 0 secret keys (0 protected, 0 unprotected)"
2021-09-17 10:38:06 autosyncActivities ERROR OnDownloadError: Inbox of redacted@gmail.com

2021-09-17 10:38:28 autosyncActivities ERROR OnDownloadError: Inbox of redacted@gmail.com

2021-09-17 10:38:59 autosyncActivities ERROR OnDownloadError: Inbox of redacted@gmail.com

Is there another way to share with you the large log file?

I think, the minimal length of log file to capture the issue is
$ du -sh ~/tblog.moz_log
17M /home/nicolo/tblog.moz_log
which is moments after it starts downloading the messages from inbox, and after it tried to open inbox folder for a few minutes, which is more than the 10 MB allowed here.

Perhaps upload it to one of the file sharing sites and post a link might be the simplest. Something as simple as one drive would work.

Flags: needinfo?(unicorn.consulting)

Is this https://paste.c-net.org/CorpusRubbish visible to you?
does it provide any useful hint?

(In reply to nicolopiazzalunga from comment #8)

Thanks.
In settings->copies and folders->message archives, I deselected "keep message archives in All Mail".

That doesn't do an unsubscribe of All Mail. That only disables the archiving feature (another right-click option and a button above the message details). To actually unsubscribe from a folder, you right click the account and go into the "subscribe" dialog and un-check the folder. However, the log you have provided so far doesn't try to do anything with All Mail, so maybe not (yet) an issue.

In settings->syncronization and storage->message syncronization, I deselected "keep messages in all folders for this account on this computer", and under advanced I only subscribed for offline use to inbox, drafts, and sent mail.

That sounds good but it won't remove existing messages that are stored. You have to either go into the profile and delete the corresponding "mbox" file or right-click the folder and do a "repair". (I would recommend just deleting the mbox file if you you have a lot of message in folders you don't want to store locally, like "Important" folder below. (The only real reason to store message locally, especially for gmail, is if you need to read messages while offline or you are afraid gmail server will somehow lose or delete your messages and you want a local backup.)

Ok, I see in your comment 0 you do want to read messages offline.

The gmail folders are as follows:

  • Inbox: 44524 messages
  • gmail->Drafts: empty
  • gmail->Sent Mail: 9091
  • gmail->All Mail: 53130
  • gmail->Spam: empty
  • gmail->Trash: 1407
  • gmail->Important: 34668
  • gmail->Starred: empty

In the log you attached it looks like tb was starting to download (fetch) messages it did not yet have the headers for and didn't have the bodies stored locally and I only see fetches into Inbox. It looks perfectly normal then you shutdown tb. On restart, it should continue from where it left off and fetch and store any new messages.
I assume while recording the log you were clicked on gmail Inbox and didn't do anything else?

I'm also not sure from reading your comment 0 and later if the issues you see are for a new gmail account setup in tb or if you've had the account on tb for a while and it just started having problems.

I only subscribed for offline use to inbox, drafts, and sent mail.

I would recommend doing this: With tb shutdown, go into your profile and delete the mbox and msf files for the "offline" folders you want. (You can make a backup of the files if you are worried.) The file are in a location like this:
~/.thunderbird/kgmivl5o.default-release/ImapMail/imap.gmail.com/INBOX.msf
~/.thunderbird/kgmivl5o.default-release/ImapMail/imap.gmail.com/INBOX
~/.thunderbird/kgmivl5o.default-release/ImapMail/imap.gmail.com/[Gmail].sbd/Sent Mail
~/.thunderbird/kgmivl5o.default-release/ImapMail/imap.gmail.com/[Gmail].sbd/Sent Mail.msf
~/.thunderbird/kgmivl5o.default-release/ImapMail/imap.gmail.com/[Gmail].sbd/Drafts
~/.thunderbird/kgmivl5o.default-release/ImapMail/imap.gmail.com/[Gmail].sbd/Drafts.msf

The names without the .msf are the mbox files. (I'm assuming you are setup to use the default mbox storage and not maildir storage.)

On tb startup and with Inbox clicked, this will force a complete re-download of everything in Inbox from gmail. It will also do a clean rebuild of the .msf indexes which is better than the "repair". It will take a while to do the download and the tblog file will get big if you record it. You can monitor the file with "tail" if you want to make sure tb is still doing stuff.

You can then move to Sent Mail and Drafts and do the same thing after Inbox finishes.

This will also force a re-index for global search that Wayne suggested a fix for above. So it might be good to also delete the global-messages-db.sqlite along with the mbox and .msf files so that starts fresh too.

Since you have so many messages and the log file is so big and takes a long time, I'm not sure what else to do. The only alternative would be to let me setup your gmail account with your credentials and test it from here. But I doubt if that would be acceptable.
I also have a archive of 17k messages (not as many as your 60k) but I'll try testing my gmail account with it and see what happens.

btw, this is printed on terminal, not sure if relevant:
<snip>
$ MOZ_LOG=IMAP:5,timestamp,sync MOZ_LOG_FILE=~/tblog thunderbird
2021-09-17 10:38:06 autosyncActivities ERROR OnDownloadError: Inbox of redacted@gmail.com>
2021-09-17 10:38:28 autosyncActivities ERROR OnDownloadError: Inbox of redacted@gmail.com
2021-09-17 10:38:59 autosyncActivities ERROR OnDownloadError: Inbox of redacted@gmail.com

These might be. But probably occurred when you shutdown tb. The times don't match the log.

You might also disable "Global Search and Indexer" in the General preferences while you are letting tb re-download your stored messages. To take effect, a tb restart is needed. Then once everything is re-fetched, stable and working (hopefully) as you expect, you can turn it back on. FWIW, my 17k messages (most with very large headers and some large attachments) took 5 hours to finish just the indexing. But this occurs in the background and doesn't really slow down tb a lot. However, it did make my laptop battery discharge faster than usual.

(In reply to gene smith from comment #16)

I assume while recording the log you were clicked on gmail Inbox and didn't do anything else?

Correct, I only clicked on gmail folders, possibly inbox and sent.

I'm also not sure from reading your comment 0 and later if the issues you see are for a new gmail account setup in tb or if you've had the account on tb for a while and it just started having problems.

It has been stuck indexing for many months, not downloading messages for offile use.
Deleting just the sqlite database solved that, but then it went downloading in cycles, with TB folder getting to 100+ GB.

I went ahead and deleted imap.gmail.com* and global-messages-db.sqlite
With your suggestions applied, I will let it download for a while and report back.
Thanks.

These might be. But probably occurred when you shutdown tb. The times don't match the log.

Yes, it was from another session.

I'm populating my gmail Inbox with, right now, 47141 fairly big messages from my test archive and so far looks OK and still adding more. Hope I don't go over my gmail quota :). Right now its only "appending" messages from my local imap server to gmail server. Once that finishes I'll click on gmail Inbox so the messages are downloaded to gmail Inbox file in tb profile.
Did your "Inbox" file go to 100+ GB? What size was your Inbox.msf? (With my test message archive they have some unusually large headers which cause the .msf to also get pretty large.)

Ok, got the messages copied into the gmail Inbox. Seems to be working OK. Can also see the messages when offline. Size of the INBOX profile files seem about as expected since I have almost 50k messages and many are huge:

gene@xps17lt:~/.thunderbird$ ll ./kgmivl5o.default-release/ImapMail/imap.gmail.com/INBOX*
-rw------- 1 gene gene 1467102934 Sep 17 22:28 ./kgmivl5o.default-release/ImapMail/imap.gmail.com/INBOX
-rw-rw-r-- 1 gene gene   27659036 Sep 17 22:29 ./kgmivl5o.default-release/ImapMail/imap.gmail.com/INBOX.msf

I haven't tried to populate other gmail folders since INBOX should be representative of your issue.
I still have the "global indexing" switched off since that's a separate issue.

(In reply to gene smith from comment #19)

Did your "Inbox" file go to 100+ GB? What size was your Inbox.msf?

Cannot recall that, sorry.

It only took a 3 hours to download headers and messages; right now I have

$ ll
total 4731176
-rw------- 1 nicolo nicolo 4812831326 Sep 18 15:55 INBOX
-rw-r--r-- 1 nicolo nicolo 31867216 Sep 18 16:26 INBOX.msf
-rw-r--r-- 1 nicolo nicolo 0 Sep 18 10:29 Templates.msf
-rw-r--r-- 1 nicolo nicolo 1194 Sep 18 10:29 '[Gmail].msf'
drwx------ 2 nicolo nicolo 4096 Sep 18 11:08 '[Gmail].sbd'
-rw-r--r-- 1 nicolo nicolo 198 Sep 18 10:30 filterlog.html
-rw-r--r-- 1 nicolo nicolo 25 Sep 18 10:29 msgFilterRules.dat

I stilll have indexing turned off, as suggested.
I will monitor it for a few days, for disk and cpu usage, and report back.

Thanks.

For some time TB was behaving fine.
Just now, it started again downloading stuff:

Bringing folder Inbox up to date
...Downloading message 1 of 645 in Inbox...

Disk usage seems fine, but cpu usage is a bit higher than idle.

Also, is any of these relevant?

09:59:00.485
NS_ERROR_NOT_AVAILABLE: 4 ActivityManager.jsm:127
getActivity resource://gre/modules/ActivityManager.jsm:127
removeActivity resource://gre/modules/ActivityManager.jsm:80
onFolderRemovedFromQ resource:///modules/activity/autosync.jsm:244
_timerCallback resource:///modules/AppIdleManager.jsm:30
10:30:21.785
TypeError: this._idleService is undefined
imCore.jsm:152:20
_checkIdle resource:///modules/imCore.jsm:152
observe resource:///modules/imCore.jsm:122
toggleOfflineStatus chrome://messenger/content/mail-offline.js:65
oncommand chrome://messenger/content/messenger.xhtml:1
10:31:27.527
NS_ERROR_NOT_AVAILABLE: ActivityManager.jsm:127
getActivity resource://gre/modules/ActivityManager.jsm:127
removeActivity resource://gre/modules/ActivityManager.jsm:80
onFolderRemovedFromQ resource:///modules/activity/autosync.jsm:244
_timerCallback resource:///modules/AppIdleManager.jsm:30

Bringing folder Inbox up to date
...Downloading message 1 of 645 in Inbox...

Did you see this in the "Status bar" at the bottom or in the "Activity Manager"?
Were there really 645 new message coming in or was it a false alarm?
What were you doing in tb when you saw this happen?
Were you recording the imap log when this happened?
Did tb ever go back to normal when finished downloading the 645 messages?
Are you doing any filtering of messages in tb (i.e., moves messages received from inbox to another gmail folder)?
Are you using "server side filters" (i.e., gmail itself delivers mail to non-Inbox folders).
How often are you having tb check gmail for new messages? (default is 10 minutes).
Are you using imap idle. This is called "immediate notification" in Server settings. (I think you are per the log you provided.)
Do you have other devices or other tb instances or gmail website accessing your gmail account?
Are your gmail setting for imap at gmail site at the default?

Disk usage seems fine, but cpu usage is a bit higher than idle.

I typically see about 3 or 4% cpu usage in "top" for tb. I'm actually running two tb instances for debug/test purposes.
When indexing (i.e., after I've copied in a lot messages to Inbox) I'll see 70 to 100% cpu usage on tb but goes back to 3 or 4% when done but can take many hours if 50k messages.

Also, is any of these relevant?

I assume you are see this on the console? I think these will also appear in the Error Log founded under developer tools. I don't see these with my two tb instances running even when new messages come in. It looks like they are about "Activity Manger" and autosync and the "issues" are seen in .js files. Have you made any changes that might affect autosync? That would be in the Config Editor and the is controlled by
mail.server.default.autosync_offline_stores which defaults to true. (The config editor is accessed with a button near the bottom of "General Settings".)
FYI, autosync causes messages you haven't yet accessed to be automatically downloaded to the selected folder and stored locally (a.k.a, for offline use). The download of 645 messages sounds like that is what is happening. But if you don't really have 645 new unread messages, it shouldn't be downloading or saying it's downloading.
You might disable autosync (set autosync_offline_stores to false) and restart tb and see if that helps.

Another remote possibility is that this is a packaging problem from your distro. You can just download a pre-built tb tarball and try running tb directly from there after unpacking. E.g,,
https://ftp.mozilla.org/pub/thunderbird/releases/91.1.1/linux-x86_64/en-US/thunderbird-91.1.1.tar.bz2
If this doesn't help or won't run at all, you can just delete the tar.bz2 file and the unpacked thunderbird directory. It won't affect your already installed from distro package tb and they will both use your current tb profile (but only one can run at a time with your single profile).

Let me first say that the issue went away after restarting TB, and it hasn't come back for the whole day yet.

(In reply to gene smith from comment #24)

Did you see this in the "Status bar" at the bottom or in the "Activity Manager"?

It was in the activity manager.

Were there really 645 new message coming in or was it a false alarm?

No, there was no new message.

What were you doing in tb when you saw this happen?

Nothing, just let it idle.

Were you recording the imap log when this happened?

Unfortunately not.

Are you doing any filtering of messages in tb (i.e., moves messages received from inbox to another gmail folder)?
Are you using "server side filters" (i.e., gmail itself delivers mail to non-Inbox folders).

No.

How often are you having tb check gmail for new messages? (default is 10 minutes).

10 mins.

Are you using imap idle. This is called "immediate notification" in Server settings. (I think you are per the log you provided.)

Yes. Should I disable it?

Do you have other devices or other tb instances or gmail website accessing your gmail account?

No.

Are your gmail setting for imap at gmail site at the default?

Yes.

I assume you are see this on the console?

They were on the developer tools->error console.

I have default value (true) mail.server.default.autosync_offline_stores.

But if you don't really have 645 new unread messages, it shouldn't be downloading or saying it's downloading.

Indeed, I don't.

You might disable autosync (set autosync_offline_stores to false) and restart tb and see if that helps.

Then I'll set this to false should the issue arise again, ok?

Another remote possibility is that this is a packaging problem from your distro.

The thing is, I'm using musl libc (as opposed to glibc), so I'm not sure I can use that tarball.
Do you see anything relevant in how my distro builds TB?
https://github.com/void-linux/void-packages/tree/master/srcpkgs/thunderbird
Look in particular at patches.

Thanks.

Let me first say that the issue went away after restarting TB, and it hasn't come back for the whole day yet.

What was it doing before you restarted? Was it attempting to download the 645 messages and stuck trying to for a long time? Or did you see it start the download and then restarted tb?

Did the mbox file get a lot bigger even with no new messages?

Then I'll set this [autosync_offline_stores] to false should the issue arise again, ok?

I've never seen autosync cause a problem. Something is causing tb to think there are "new" messages when there isn't. This occurs when tb checks for new mail on the timed interval and gmail reports (or tb thinks it reports) a change in the number of message in the mailbox. Rather than disabling autosync, maybe you could try one other thing first. Try enabling something called "condstore" which is turned off by default. It somewhat changes the dynamics of checking for new mail:
mail.server.default.use_condstore
I've been doing some test here with it turned on and it generally seems to work better, although I haven't seen anything like what you see. Again, this requires a tb restart to take effect.

The thing is, I'm using musl libc (as opposed to glibc), so I'm not sure I can use that tarball.
Do you see anything relevant in how my distro builds TB?
https://github.com/void-linux/void-packages/tree/master/srcpkgs/thunderbird
Look in particular at patches.

Yes, tb definitely requires glibc 2.17 or higher.
I looked at the some of the many patches but don't really understands what they do. Looks like it just builds tb and if it doesn't outright crash, it should probably work. None of the patches directly affect tb imap functionality, AFAICT.

Nicolo,
One more thing regarding condstore. It adds some more info to the imap log but you have to enable it to appear. So you need to include IMAP_CS in the MOZ_LOG environment var, e.g.,
MOZ_LOG=IMAP:5,IMAP_CS:5,timestamp,sync

(In reply to gene smith from comment #26)

What was it doing before you restarted? Was it attempting to download the 645 messages and stuck trying to for a long time? Or did you see it start the download and then restarted tb?

iirc, it was just idle.
It was attempting to dowload the (putative) first new message, then I just restarted it, and the issue hasn't come back yet.

Did the mbox file get a lot bigger even with no new messages?

I don't think so.
$ ls
imap.gmail.com imap.gmail.com.sbd
and the first
$ ll
4819309344 Sep 21 10:08 INBOX
31965390 Sep 21 10:08 INBOX.msf
0 Sep 18 10:29 Templates.msf
1194 Sep 18 10:29 '[Gmail].msf'
4096 Sep 18 17:38 '[Gmail].sbd'
198 Sep 18 10:30 filterlog.html
25 Sep 18 10:29 msgFilterRules.dat
which seeems pretty constant
$ du -sh ~/.thunderbird/9xd11snr.default/ImapMail/imap.gmail.com/
5.9G /home/nicolo/.thunderbird/9xd11snr.default/ImapMail/imap.gmail.com/

Try enabling something called "condstore" which is turned off by default. It somewhat changes the dynamics of checking for new mail:

Ok, so, even if im not seing the issue right now, I'll turn on mail.server.default.use_condstore and monitor for a while with MOZ_LOG=IMAP:5,IMAP_CS:5,timestamp,sync

None of the patches directly affect tb imap functionality, AFAICT.

Ok, I'll just assume musl does not interfere with imap for the moment.

I'll let it run for a while and post the file here again.
Thanks.

After running for a few hours, no major issue.
However, I still think there's something wrong.
I noticed TB has issues with the last received message, as it keeps downloading it every time I click on it.
This also shows up in activity manager as 'total nr of msg downloaded: 1' every few minutes, and the last message is not available offline.

I attach a smaller log that shows this issue of re-downloading.
I also have a larger log for the several hour run, but it is full of personal data, like message contents etc, so I'd have to clean it first.

Attached file tblog.moz_log

(In reply to gene smith from comment #26)

Something is causing tb to think there are "new" messages when there isn't. This occurs when tb checks for new mail on the timed interval and gmail reports (or tb thinks it reports) a change in the number of message in the mailbox.

I think this is precisely what is going on, even with condstore on.

If I turn on indexing again, the phenomenon is emphasized, and it starts downloading 'new' stuff again, this time imap.gmail.com folder is growing (from 5.9 to 6.4G).

Attached image 1632230069.png

Sorry for the delay in responding.
So it seems that when you open your newest email (UID 56393) tb doesn't think it's stored locally in the mbox file so it gets fetched (downloaded) again. Is that true also of any email or just your newest email in INBOX?

Also, now with indexing turned back on, you see the mbox file grow. Is tb also re-fetching the messages when you look at the log while indexing is happening with no other user activity? For me, when I turn on indexing I don't see any fetching from IMAP server going on (except for normal periodic checks for new mail).

My mbox file (INBOX) has about the same number of messages as yours but it is "only" about 1.5G compared to about 5G for yours. I thought my messages were pretty big with lots of attachments (.jpg's). Do you have a lot of large attachments in your INBOX mbox file?
At one time there was a 4G limit on mbox file sizes but was removed by incorporating a 64 bit index into the database (.msf file) per this: tb limits.

You can add a column to sort by size. You can also add a column "order received" that will list the message UID which is useful when looking at the imap log since most fetching is done by UID.

Anyhow, don't know what to suggest right now. I don't think the "condstore" really makes a difference so you can turn that off for now if you want. However, leaving the IMAP_CS in the MOZ_LOG var is still somewhat helpful even with "CS" switched off.
Maybe I need to somehow bump my INBOX size so it goes more the 4G somehow (actually, I don't receive much email :)).

(In reply to nicolopiazzalunga from comment #33)

If I turn on indexing again, the phenomenon is emphasized, and it starts downloading 'new' stuff again, this time imap.gmail.com folder is growing (from 5.9 to 6.4G).

Are you also seeing the message count and/or the unread count also increase for Inbox?

I now have over 5G folder size and 85k messages and don't see a problem yet. But you said it takes a while to start happening.
About how many new emails are added to your Inbox (received) per day?

With indexing on I don't see anything bad happening yet.

(In reply to gene smith from comment #35)

Sorry for the delay in responding.

No worries.

So it seems that when you open your newest email (UID 56393) tb doesn't think it's stored locally in the mbox file so it gets fetched (downloaded) again. Is that true also of any email or just your newest email in INBOX?

I think without indexing that was happening for newest one only, while with indexing it happened for more than one msg.

Also, now with indexing turned back on, you see the mbox file grow. Is tb also re-fetching the messages when you look at the log while indexing is happening with no other user activity? For me, when I turn on indexing I don't see any fetching from IMAP server going on (except for normal periodic checks for new mail).

Yes, somehow it was also refetching lots of messages.
The message count was staying constant.
Around 10-15 new messages to that inbox per day.

My mbox file (INBOX) has about the same number of messages as yours but it is "only" about 1.5G compared to about 5G for yours. I thought my messages were pretty big with lots of attachments (.jpg's). Do you have a lot of large attachments in your INBOX mbox file?

Yes, as a matter of fact yesterday I spent a few hours deleting hundreds of messages with large attachments, and now the situation is back to normal: no activity, indexing on, and size of folder of 4.3G.

Maybe those large attachments were creating issues or triggering some bug in either tb or gmail.
Now I'm back with everything as before, except I'm not subscribed to AllMail.
Will monitor again for a few days and report.

Thanks.

Maybe those large attachments were creating issues or triggering some bug in either tb or gmail.

If you have them in trash or have moved them to another folder and if they are not too personal, maybe you could zip the mbox file and send it to me to test with. I can't seem to see a problem here with my email set.

I was thinking maybe a folder larger than can be accessed with 32 bit offset was causing the problem. But mine is now >5G and you are still >4G and it's working OK for you, at least for now.

The email set I have isn't more than 4G (only about 1.5G). So the only way I can get my folder size above 4G is to copy the messages several times so I have dups. Unfortunately, a special feature of gmail imap is that it doesn't allow duplicate messages in a folder. You can copy them to the gmail folder and no error is reported and gmail just throws any dups away and doesn't show dups. So the only way I could reach >4G was to copy my email set multiple times to a dovecot imap server, which doesn't care if you copy in duplicates. (The basic handling of imap messages is mostly server independent.)
Anyhow, I still haven't seen a problem with either server, with or without global indexing.

Thanks, but I realised trash was taking up over 2GB, so I emptied it.
Now I'm also at around 2GB total, after deleting more messages.
I'm no longer experiencing any issues.

If you're trying hard to reproduce my previous 'issue',
I suggest generating many messages with large attachments (around 10MB say),
at the same time trying to fill a large trash folder.

My email set that I'm testing with has about 60 messages of size 10G to 22G 10M to 22M and I still don't see a problem and have many more with smaller attachments or no attachments (total 85291 msgs).
Also, my Trash folder has about 1.3Gbytes.
Not sure why Trash would be special or cause problems since it's really just another folder and not really anything unique. But I'll keep it in mind.
Might be good to disable indexing for Trash if you don't care to search messages there (if you've switched global indexing back on).

Just to point out, Google recommends the use of the IMAP mark as deleted feature, not move trash to another folder. It may be they are aware of issues with their implementation that leads to this recommendation in the first place.

Indexing is disabled for Trash.
I've now selected 'When I delete a message: Just mark it as deleted' in server settings, as suggested.
(This way deleted messages do not appear in Trash folder.)

Could it be that the issue I was experiencing before was just TB downloading some of the messages with large attachements, even though it previously thought it had already syncronized everything?

Could it be that the issue I was experiencing before was just TB downloading some of the messages with large attachements, even though it previously thought it had already syncronized everything?

Theoretically large attachments shouldn't cause a problem when you store messages locally in mbox file and for offline use. The only "problem" I've seen is if you don't store messages locally for offline and your attachment is too large (maybe >25M, don't remember the exact figure), there may not be enough ram allocated to hold the message (or the attachment part of the message) and you see something like a 27 byte file saved when you try to save the attachment. But that's not your issue.

I've now selected 'When I delete a message: Just mark it as deleted' in server settings, as suggested.
(This way deleted messages do not appear in Trash folder.)

With gmail imap at default setting at gmail.com, when you delete a message in tb which just sets the \deleted flag on the message if you are set to "just mark as deleted", this causes gmail to immediately expunge the message from the folder. If you have one or more folders with the same message, when you delete it from the last folder it still remains in "All Mail". So if you are not subscribed to "All Mail" and you want to get back a deleted message, you will have to go to gmail.com and copy the message back from "All Mail" to where you want it.

With all other non-gmail servers, when you mark a message \deleted it just leaves it where it is (doesn't expunge it automatically) and tb continues to show it but with an X in front and with a strike-through line. With gmail you may see this too but either temporary or the message won't be readable, not sure, since the message was auto-expunged.

You can change default gmail.com setting so that it doesn't auto-expunge when you mark the message as \deleted so the deleted messages remain present in the folder but just crossed out and are only expunged from the folder when you do a "compact". However, changing this at gmail.com may cause other issues, not sure (I'm not a big gmail user, but have a couple accounts mostly for testing).

I am seeing this issue with TB98b3 (and was with b2 as well)... it seems that anytime I re-open TB, or my laptop wakes up from sleep / hibernate then it re-downloads a LOT of messages, really slowing down TB & the computer. It's been reliably bad since I switched from the stable TB (91.something?) to test for another issue.
Please let me know what I can do to help debug... I saw this issue years ago, then no more, now it's back.
I am running on a win11 machine, the IMAP Server is dovecot running on FreeBSD. I have a lot of folders & mails.

Hard to say what the problem might be. The first step might be to produce and attach a log as described in comment 6 above. Let me know exactly what you were doing when recording the log -- that always is very useful. If it only occurs after a wake-up, you can manually sleep windows, let it sit for a few minutes and wake it up while recording the log. Seems like that should catch the problem.

When you say LOT of messages, in which folders? How many messages in the different folders and their structure are some of the details that might also be helpful.

This morning (after waking the computer) it redownloaded about 2700 messages (67MB, largest message in that folder is 200kb, most around 10-40kb) from a folder (there was one new message in it... it only seems to redownload folders where there is at least one new message), in parallel it is indexing other folders. The folder is in a top-level folder called "folders", then 2-3 levels under that. I was doing nothing (computer just woke up, went to TB, and it started). It really slows down when it starts re-indexing the folder it is downloading to (looking at the activity manager).

It also seems to have done a partial re-indexing, indexing about 200 messages fairly early in the process. Just now it finished indexing 1100 more, while still downloading. Maybe indexing and syncing should not happen to the same folder at the same time? That may speed things up a lot. I had noticed before that the downloading sometimes starts very quickly (a few dozen or so messages per second), and then slows to a crawl sometime later. Right now it is only doing about one message per second (!!) while indexing the same folder, I'm guessing this is due to the conflicting folder/file access.

It also seems that it forgets that that folder was sorted latest-message-at-the-top ... I have some folders sorted one way and some the other, this one has the latest ones at the top and I'm pretty sure I changed the order on this one. I had seen this before as well.

Are you saying that after it wakes up and re-downloads and re-indexes the folders that it eventually stabilizes and all is OK? If so, how long does this take?

When there are a lots of new messages (or what TB thinks are new msgs) the indexing will occur. You might consider turning off indexing temporarily until the the re-downloading problem is resolved: General setting, near the bottom, turn off "Enable Global Search and Indexer" and then restart TB to be sure it takes effect.

Is it re-downloading all the messages in the top-level "folders" and folders under it? By default, new messages only appear in Inbox so I guess you must be using a TB filter or server side filter to move messages into "Folder" and others below it?

A reason that a re-download may occur is that the "UIDVALIDITY" value that tb has doesn't match what the server reports. But I've never seen a dovecot/tb combination have this problem, so probably doubtful. The only way to tell is to produce a log file. It might be good to set the MOZ_LOG variable to include IMAP_CS:5 also, e.g., MOZ_LOG = IMAP:5,IMAP_CS:5,timestamp,sync, since that will show definitely if UIDVALIDITY has changed.

I don't know if you have tried to "repair" the problem folders (right click, properties, repair folder), sometimes that helps but, personally, with TB shutdown, I prefer to just delete the problem files and directories in the Profile and let tb rebuild them. If you look in the profile area (under ImapMail/<your account name>) you will find files corresponding to the imap folder structure with names like folder.msf, folder, and directories with names like folders.sdb which are containers for subfolders. You may also see folders with a suffix like folder-1, folder-2 etc. This indicates there was a problem so a new directory was created due to tb finding an inconsistency in a folder. So the higher suffix is the currently active directory and the others are no longer in use. AFAIK, compact doesn't get rid of these so OK to just delete them.

If you go this route and delete the possibly corrupted profiles files and directories you can first back them up if you are concerned, but since they are all on the server, I never bother with a backup.

When you restart TB and select the problem folders, of course tb will re-download them. With indexing disabled it might be a bit faster. Hopefully this will fix the problem you are having with the beta.

Yes, it stabilizes... after it has redownloaded the affected folders. This can take minutes or an hour or two, depending on how many folders. It is only the low(est)-level folders, and yes, I use server side filtering with imapfilter on the server where dovecot is running. It happened with only one folder this morning, but with several folders a few days ago when i had not used the computer for a few days.

I would argue that TB needs to be more robust here - if a folder is syncing, indexing should be postponed. This will alleviate the problem somewhat without causing additional headaches.

How do I set the logging variables on windows? Do as in https://wiki.mozilla.org/MailNews:Logging#Windows ? The problem is that I can't reliably reproduce it and would need to let the logging run for a few days.

I have not tried to repair (or delete) the mail file as it stabilizes. I see no folder-1 etc. in the tree. Closing and reopening TB shows a few mailboxes getting parts re-indexed (seems normal). It seems to happen when a) TB has not been opened for a while and b) there are new messages in some folders (social network folders and the like where you can see a dozen or so mails a day/ over night).

(In reply to Nils from comment #47)

I had noticed before that the downloading sometimes starts very quickly (a few dozen or so messages per second), and then slows to a crawl sometime later. Right now it is only doing about one message per second (!!)

All of this sounds verbatim to what I saw in https://bugzilla.mozilla.org/show_bug.cgi?id=1740486#c3 when I would do a folder repair and it would eventually fail. But it was when TB was downloading mail and it got to the last 90-95% that downloading messages came down at a rate of one mail every 4-5 seconds for me. In the end, I did wind up backing up my profile and starting over from scratch and it fixed folder repairing, message download slowness and all other problems I was having.

It's possible you might have partially gotten bitten by the bug that showed up somewhere around 94 beta where messages were getting corrupted or showing headers from preceding and succeeding messages and things were all mangled. Of course, doing a folder repair only seemed to have made the issue worse.

If you have Windows, have you tried to replicate the issue with Thunderbird Portable? Would be a good indicator of if it is a profile corruption or a genuine bug of some other kind.

I would argue that TB needs to be more robust here - if a folder is syncing, indexing should be postponed. This will alleviate the problem somewhat without causing additional headaches.

That seem like a good idea but at this time there is little motivation to make the current imap code more robust since the whole c++ based imap subsystem is getting re-written in javascript, from what I've heard. I haven't been asked to be involved with it even though I have maintained the c++ imap code for over 5 years (unpaid volunteer), so I don't know the exact schedule other than it will be available at a release later than version 100. I guess the theory is that a re-write will fix all the obscure issues present in the huge c++ code base.

It's possible you might have partially gotten bitten by the bug that showed up somewhere around 94 beta...

That's is what it sounds like but Nils indicated he went from release 91 to 98beta in comment 45. So not sure about that.

How do I set the logging variables on windows? Do as in https://wiki.mozilla.org/MailNews:Logging#Windows ? The problem is that I can't reliably reproduce it and would need to let the logging run for a few days.

Yes, it's just a matter of setting the MOZ_LOG and MOZ_LOG_FILE env vars and running tb with them in effect. However, it might be more useful to first try to delete the profile files/folders as I described in comment 48 and Arthur also did as he described here: bug 1740486 comment 19

If you have Windows, have you tried to replicate the issue with Thunderbird Portable?

I'm not really familiar with TB Portable: https://portableapps.com/apps/internet/thunderbird_portable. I assume this is what Arthur is referring to.
I was also wondering if going back to an official release version (91 or 97) might also "fix" the problem. However, probably still would need to clean out the profile.

Had it happen again this morning.

  • I had TB closed overnight
  • opened it this morning, and it updated all folders normally
  • clicked on a few folders I don't often use, and it refreshed those normally
  • the activity manager paused forever on a "indexing 1 of 2" operation in a folder
  • closed TB
  • reopened TB
  • one folder that I had previously visited and left one email unread in showed as "newly unread" (blue instead of black bold in the folder pane)
  • clicked on the folder, downloading 2700 mails veeery slowly (1/sec or so, most mails in the 10-40kb range, imap server on the local network...)
  • also, the sorting of that folder is newest-at-the-bottom again instead of newest-at-the-top ... making me think that there is some folder index / config corruption happening somewhere
  • activity manager is showing that it is indexing 1 of 20 (not showing in which folder though, that would be helpful!!) and downloading 200 of 2700
  • ...2 mins later it is now indexing 200-some messages in that folder while downloading. This is annoying and slowing things down.

Should I file a separate bug for the indexing-and-downloading conflict / slowness?

Some additional comments:

  • the "size" column that I added to that folder was also gone, so there is definitely something happening with the folder config
  • when I clicked on the folder it was empty but filled up with headers immediately, and then started downloading. It seems that TB "forgets" everything about that folder, and that leads to the redownload / reindexing
  • when I closed TB in the middle of downloading / indexing, and the reopened it, it picked up where it had left off, and the folder config was still correct

Is there a logging to show that the config for a folder (along with the index not being there) is being recreated or cannot be loaded? I think that is the thing to track down.

(In reply to gene smith from comment #51)

That seem like a good idea but at this time there is little motivation to make the current imap code more robust since the whole c++ based imap subsystem is getting re-written in javascript, from what I've heard. I haven't been asked to be involved with it even though I have maintained the c++ imap code for over 5 years (unpaid volunteer), so I don't know the exact schedule other than it will be available at a release later than version 100. I guess the theory is that a re-write will fix all the obscure issues present in the huge c++ code base.

Your IMAP expertise will be very useful, so I hope you'll be involved! The imap-js work will get started in the next month(s) or so. xref bug 1707547.
Indeed threading related bugs in particular should go away.

It just happened with another folder:

  • clicked on the folder as there was a new mail indicated
  • lots of headers downloaded
  • download of messages
  • config was again gone, no size column, sorting was the default instead of how I set it

Would it be possible to log when a configuration is read / not found / found but invalid? That may help show when it is being stepped on and how.
Also, maybe log when ALL mails in a folder are being downloaded instead of just an incremental download (maybe for folders with >1000 messages or so)? I'm trying to think of how to narrow this problem down. Once the folder is OK again, then it is OK for a while. I've only seen the issue in large folders (1000+ messages or so maybe?).

For me, the problem was solved in the discussion above.
Thanks again.

(In reply to nicolopiazzalunga from comment #56)

For me, the problem was solved in the discussion above.
Thanks again.

nicolopiazzalunga,
Not sure exactly what you did to fix the problem or even if your problem was the same as Nils is seeing. If the same or similar, could you summarized exactly what you did? Thanks.

Nils,
I don't know of any special logging feature to catch what you ask about.
Have you tried switching off "global indexing" temporarily as I suggested?
Have you tried deleting the problem folder files and directories in the profile?
Have you recorded an imap log?

You are on a local dovecot server which I also have here and I don't see problems. I think nicolo was using gmail.

Another option would be for you to provide me with a temporary test account on your imap server and you might populate the folders in similar way so I might be able to duplicate the problem from my location. I realize this may not be possible.

Gene,
This morning I deleted the .msf file but left the mail file in place. That folder just re-synced all headers and started downloading mails again. now I will delete both the mail folder and .msf, and see what happens tomorrow. Next step would be to turn off the global indexing if it happens again.

Let me know if you have other suggestions, but I want to try and narrow this down one step at a time. I can't provide a test account, this is behind a NAT on my home network.

(In reply to Nils from comment #58)

Gene,
This morning I deleted the .msf file but left the mail file in place. That folder just re-synced all headers and started downloading mails again. now I will delete both the mail folder and .msf, and see what happens tomorrow. Next step would be to turn off the global indexing if it happens again.

Let me know if you have other suggestions, but I want to try and narrow this down one step at a time. I can't provide a test account, this is behind a NAT on my home network.

Yeah, that's what I figured. No problem.

Yes, with tb shutdown, I would deleted both .msf file and the corresponding mbox file (mail file) and let tb re-download and rebuild the file pair. I assume this is a folder (called a "mailbox" in imap parlance) that your server is delivering into and you are having problems with.

OK, next step done, no improvement. I deleted the MSF files, I deleted the mailbox files, I even deleted the whole directory with a bunch of mailboxes (msf files and individual mailbox files) in it and let it recreate, but the problem still happens.

Result:

  • the mailbox is "forgotten" or corrupted somehow, and completely redownloaded and indexed
  • when I select the mailbox I can see the mails filling up and scrolling past (activity manager indicates that it is downloading headers, then the mail downloading starts)
  • the configured view of the mailbox is gone... it is sorted in the opposite (default) order of how I sort it (newest at the bottom), and the "size" col which I always add is gone
  • basically: the mailbox is "reset" both in terms of content and in terms of configuration, and deleting MSF and mailbox file from disk does not help

Setup:

  • email arrives at ISP
  • getmail pulls it to the inbox of the dovecot server (pulling from several mail accounts)
  • imapfilter puts it into the correct folder
  • TB is just connected to the dovecot server

Next I will turn off the global indexing and see if that helps.

(In reply to Nils from comment #60)

OK, next step done, no improvement. I deleted the MSF files, I deleted the mailbox files, I even deleted the whole directory with a bunch of mailboxes (msf files and individual mailbox files) in it and let it recreate, but the problem still happens.

Result:

  • the mailbox is "forgotten" or corrupted somehow, and completely redownloaded and indexed
  • when I select the mailbox I can see the mails filling up and scrolling past (activity manager indicates that it is downloading headers, then the mail downloading starts)
  • the configured view of the mailbox is gone... it is sorted in the opposite (default) order of how I sort it (newest at the bottom), and the "size" col which I always add is gone
  • basically: the mailbox is "reset" both in terms of content and in terms of configuration, and deleting MSF and mailbox file from disk does not help

By "Result:" do you mean what you see on TB restart, i.e., the folders are still getting re-downloaded on each restart? Or is "Result:" what you saw after deleting the files and directories and startup up TB after all was re-downloaded? I'm pretty sure you mean the latter former (since you saw "no improvement" but just want to be sure.
I think if you delete the .msf (database) files they contain the setting for the folder columns so when re-created you will have default column setups.

Setup:

  • email arrives at ISP
  • getmail pulls it to the inbox of the dovecot server (pulling from several mail accounts)
  • imapfilter puts it into the correct folder
  • TB is just connected to the dovecot server

TB shouldn't care how new messages get delivered to the your folders/mailboxes, so this sounds OK and shouldn't be a problem. However, to be sure maybe you could temporarily disable the "imapfilter" step?

Next I will turn off the global indexing and see if that helps.

[Note: wrote this as your comment came in, so partially responding to it...]
Next step:

  • delete global-messages-db.sqlite (~350MB file) as suggested in comment #1
  • delete the entire folder with the mailboxes that are most often getting redownloaded
  • waited for reindexing to run on all folders (~2-3h, ~20GB mailfolders total; still going at present)
  • reset the sorting & size column on the "usual suspect" folders
  • wait for it to happen again (will be my next post)

By Result above I mean that I a) closed TB, b) deleted the whole folder with a bunch of mailfiles and MSFs, c) restarted TB and let it DL and index, and reconfigured the folders (yes, if the MSFs are deleted the folders are reset obviously), but then d) occasionally the folders are still getting reset and cleared while TB is running, redownloading and reindexing with the folder view settings disappearing. This is sometimes on restart / wake, sometimes in the middle of a session, leading me to think that the root cause is in the next to paragraphs somewhere.

Disabling imapfilter is possible, but that would mean everything going into my inbox only. I'd be hesitant to do that. I would also assume that new mails arriving during downloading / indexing / new mail checking could be part of the issue. Now that you mention it, it could be that it only happens when new mails are injected into the Mailfolder (on the dovecot side) via imapfilter. Will keep an eye out for that...

I have "mail.server.default.check_all_folders_for_new" set to TRUE as per https://kb.mozillazine.org/Checking_for_new_messages_in_other_folders_%28Thunderbird%29 ... maybe this is a part of it?

Yours came in while writing this (mid-air collision):

I still think we need to rule out a UIDVALIDITY problem which always causes a re-download of the mailbox. I don't know if this is relevant but dovecot wiki talks about it some here: https://wiki.dovecot.org/MboxProblems. You might check your dovecot logs for any mention of UIDVALIDITY.
Also need to make sure TB is not seeing a UIDVALIDITY problem. If you could record a log with just these env vars we might see something:

MOZ_LOG=IMAP_CS:5,timestamp
MOZ_LOG_FILE=<path to where you want the log to be>\tblog

Then in file tblog.moz_log see if a mention of UIDVALIDITY appears. If there is nothing to log the file will remain empty so you can keep it running for a long time if needed.

FYI, here's where tb checks UIDVALIDITY and logs a failure:
https://searchfox.org/comm-central/rev/6817947db03d3e903838dd1985da53a526d53742/mailnews/imap/src/nsImapMailFolder.cpp#2480
Also, the CS means CONDSTORE, but you don't have to enable condstore for this log message to appear.

Re comment 62, Ok I see what you mean. Sounds like it should work to "fix" the problem, but I guess not.

I don't have any experience with the sieve, pigeonhole or imapfilter features of dovecot. It's hard to believe they they would cause a UIDVALITY problem but who knows.

I don't think that the "check all folders for new mail" would cause a uidvalidity error but I would disable that and just set the specific non-Inbox folders you want checked using the right-click/properties/"always check this folder" setting instead. Of course, this will get blown away if the folder re-downloads again.

There is nothing in the dovecot logs (/var/log/maillog*) ... no UID / UIDVALIDITY strings, no Panic's.

I completed the steps in comment #62 and am now running with debugging.

tbdebug.bat:
set MOZ_LOG=IMAP_CS:5,timestamp
rem set MOZ_LOG_FILE=%USERPROFILE%\Desktop\imap.log
set MOZ_LOG_FILE=C:\tmp\imap.log
rem "%ProgramFiles(86)%\Mozilla Thunderbird\thunderbird.exe"
"%ProgramFiles%\Mozilla Thunderbird\thunderbird.exe"

This is after folder and global-messages-db.sqlite cleaning, and setting up the usual suspect folders with size and newest-at-the-top again. I'll have a look at the logs and folders tomorrow morning, lets see what happens now.

(In reply to Nils from comment #65)

There is nothing in the dovecot logs (/var/log/maillog*) ... no UID / UIDVALIDITY strings, no Panic's.

I completed the steps in comment #62 and am now running with debugging.

tbdebug.bat:
set MOZ_LOG=IMAP_CS:5,timestamp
rem set MOZ_LOG_FILE=%USERPROFILE%\Desktop\imap.log
set MOZ_LOG_FILE=C:\tmp\imap.log
rem "%ProgramFiles(86)%\Mozilla Thunderbird\thunderbird.exe"
"%ProgramFiles%\Mozilla Thunderbird\thunderbird.exe"

This is after folder and global-messages-db.sqlite cleaning, and setting up the usual suspect folders with size and newest-at-the-top again. I'll have a look at the logs and folders tomorrow morning, lets see what happens now.

And not to complicate matters any more for the devs but you could also test with 99.0b1 now to see if there's any change with respect to this issue. Back up your profile of course so you can do a clean revert back to 98.0b3 if needed.

Arthur, can you explain the versions to me? Is 98 a "beta" and 99 an "alpha", or how are they using the numbering? On the main TB page in the bottom right corner, the Beta channel has 98, the Daily channel 100... where can I get the 99?

Let me wait and see what happens with 98 tomorrow, if I don't see the problem all day then my comment 62 actions worked. If that didn't work, I can try 99.

(In reply to Nils from comment #67)

Arthur, can you explain the versions to me? Is 98 a "beta" and 99 an "alpha", or how are they using the numbering? On the main TB page in the bottom right corner, the Beta channel has 98, the Daily channel 100... where can I get the 99?

Well in comment 45 you state you're running 98.0b3. 99.0b1 is about to release (or is possibly available if you do a Help > About Thunderbird).
https://www.thunderbird.net/en-US/thunderbird/99.0beta/releasenotes/

Let me wait and see what happens with 98 tomorrow, if I don't see the problem all day then my comment 62 actions worked. If that didn't work, I can try 99.

Sounds good.

set MOZ_LOG_FILE=C:\tmp\imap.log

Nils, you may already know this but the actual log file produced will be C:\tmp\imap.log.moz_log

OK, no need to wait that long. Some folders were reset with new mails in them, others with no new mails (but unread ones). TB99 isn't offered yet, I can upgrade tomorrow when it is officially out on the beta channel. Nothing in the log files so far, I'll see if anything shows up over night.

Next I can turn off the global indexing, or delete and re-add the account and set it to "File per message (maildir)" instead of the current "File per folder (mbox)" ... suggestions?

There is content in the logfile...anything in particular to look for? There is no UIDVALIDITY there, but the redownloading and indexing seems to be logged. Is there a way to sanitize the mailbox names? I'd rather not publish my entire mail tree here : )

I am getting a lot of needFullFolderSync=true ... see the logs below. It would be good if individual lines in the log indicated which folder they were referring to. That would help understand whats going on better. I assume that TB is not expecting to look at this many folders in parallel...

Grepping for needFullFolderSync:

2022-03-10 07:02:09.553000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=17054, added=0, useCS=false
2022-03-10 07:02:09.553000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false
2022-03-10 07:02:09.584000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=137418, added=0, useCS=false
2022-03-10 07:02:09.584000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false
2022-03-10 07:02:09.613000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=39642 for folder=folders.00_inboxes.something00
2022-03-10 07:02:09.629000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=475860 for folder=INBOX
2022-03-10 07:02:10.543000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=8782, added=0, useCS=false
2022-03-10 07:02:10.543000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false
2022-03-10 07:02:10.631000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=0 for folder=INBOX
2022-03-10 07:02:25.325000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=3388, added=0, useCS=false
2022-03-10 07:02:25.325000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false
2022-03-10 07:02:25.338000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=10217 for folder=folders.work.fulltime.something11
2022-03-10 07:02:37.875000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=4613, added=0, useCS=false
2022-03-10 07:02:37.875000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false
2022-03-10 07:02:37.883000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=12460 for folder=folders.work.networks.something22
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=137418, added=64, useCS=false
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 1. Do a sync?: added=64, deleted=0, useCS=false
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 2. Do a sync?: ShowDeletedMsgs=false, exists=65, mFolderTotalMsgCount=64
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 3. Do a sync?: fHighestModSeq=475860, mFolderLastModSeq=475860
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=false, needFolderSync=false
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Check for new messages above UID=136974
2022-03-10 07:03:19.578000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=475860 for folder=INBOX
2022-03-10 07:03:19.594000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS NormalEndHeaderParseStream(): Store new highest UID=137424 for folder=INBOX
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=137424, added=64, useCS=false
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 1. Do a sync?: added=64, deleted=0, useCS=false
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 2. Do a sync?: ShowDeletedMsgs=false, exists=65, mFolderTotalMsgCount=65
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 3. Do a sync?: fHighestModSeq=475860, mFolderLastModSeq=475860
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=false, needFolderSync=false
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Check for new messages above UID=137424
2022-03-10 07:03:21.694000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=475860 for folder=INBOX
2022-03-10 07:03:36.374000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=1 for folder=folders.work
2022-03-10 07:03:44.091000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=0, added=0, useCS=false
2022-03-10 07:03:44.091000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false

Beginning of the logfile:

2022-03-10 07:02:09.553000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=17054, added=0, useCS=false
2022-03-10 07:02:09.553000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false
2022-03-10 07:02:09.584000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=137418, added=0, useCS=false
2022-03-10 07:02:09.584000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false
2022-03-10 07:02:09.613000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=39642 for folder=folders.00_inboxes.something00
2022-03-10 07:02:09.629000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=475860 for folder=INBOX
2022-03-10 07:02:10.543000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=8782, added=0, useCS=false
2022-03-10 07:02:10.543000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false
2022-03-10 07:02:10.631000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=0 for folder=INBOX
2022-03-10 07:02:25.325000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=3388, added=0, useCS=false
2022-03-10 07:02:25.325000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false
2022-03-10 07:02:25.338000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=10217 for folder=folders.work.fulltime.something11
2022-03-10 07:02:37.875000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=4613, added=0, useCS=false
2022-03-10 07:02:37.875000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false
2022-03-10 07:02:37.883000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=12460 for folder=folders.work.networks.something22
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=137418, added=64, useCS=false
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 1. Do a sync?: added=64, deleted=0, useCS=false
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 2. Do a sync?: ShowDeletedMsgs=false, exists=65, mFolderTotalMsgCount=64
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 3. Do a sync?: fHighestModSeq=475860, mFolderLastModSeq=475860
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=false, needFolderSync=false
2022-03-10 07:03:19.568000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Check for new messages above UID=136974
2022-03-10 07:03:19.578000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=475860 for folder=INBOX
2022-03-10 07:03:19.594000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS NormalEndHeaderParseStream(): Store new highest UID=137424 for folder=INBOX
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=137424, added=64, useCS=false
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 1. Do a sync?: added=64, deleted=0, useCS=false
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 2. Do a sync?: ShowDeletedMsgs=false, exists=65, mFolderTotalMsgCount=65
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS 3. Do a sync?: fHighestModSeq=475860, mFolderLastModSeq=475860
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=false, needFolderSync=false
2022-03-10 07:03:21.694000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Check for new messages above UID=137424
2022-03-10 07:03:21.694000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=475860 for folder=INBOX
2022-03-10 07:03:36.374000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): Store highest MODSEQ=1 for folder=folders.work
2022-03-10 07:03:44.091000 UTC - [Parent 12096: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=0, added=0, useCS=false
2022-03-10 07:03:44.091000 UTC - [Parent 12096: IMAP]: D/IMAP_CS needFullFolderSync=true, needFolderSync=false

Also, I sometimes see the following dialog when I go to the folder:

Error <folder> on <account>
Unable to open the summary file for <folder> on <account>. Perhaps there was an error on the disk or the full path is too long.

Going to it again right after usually works (or the dialog may come back one more time before it works). Will keep an eye out if this is a precursor to a "reset folder" ... there is definitely not a disk issue and the path never changes.

...should have waited another minute before posting. Just got that dialog again, subsequently the three folders where I saw it were all reset and are now redownloading. Note that grepping also yielded a few UIDVALIDIDY messages. The timestamp of the last 3 messages matches the issue.

2022-03-10 07:03:44.113000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): UIDVALIDITY changed, reset highest MODSEQ and UID for folder=folders.work.networks.folder1
2022-03-10 07:19:42.927000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): UIDVALIDITY changed, reset highest MODSEQ and UID for folder=folders.work.networks.folder2
2022-03-10 07:33:47.199000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): UIDVALIDITY changed, reset highest MODSEQ and UID for folder=folders.work.networks.folder3
2022-03-10 07:57:26.206000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): UIDVALIDITY changed, reset highest MODSEQ and UID for folder=folders.work.networks.folder3
2022-03-10 07:57:28.088000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): UIDVALIDITY changed, reset highest MODSEQ and UID for folder=folders.work.networks.folder1
2022-03-10 07:57:29.522000 UTC - [Parent 12096: Main Thread]: D/IMAP_CS UpdateImapMailboxInfo(): UIDVALIDITY changed, reset highest MODSEQ and UID for folder=folders.work.networks.folder2

Could someone set this bug to "confirmed" possibly?

Can someone try reproducing?

  • set up an imap server (dovecot)
  • set up an account with a few folders (I have a high-level folder called "folders" and the contents in a tree under that (subfolder1, subfolder2, etc. with maybe a few levels), not under Inbox). I anonymized the folder names above but the tree is as I have it (eg. folders.work.networks.folder2)
  • put 1000-2000 mails into subfolder1-subfolder5 (my mails are 10-40kb on average in these folders)
  • set "mail.server.default.check_all_folders_for_new" to TRUE as per https://kb.mozillazine.org/Checking_for_new_messages_in_other_folders_%28Thunderbird%29
  • deliver a mail a minute into each of them (I am putting them in there with imapfilter)
  • click into them every few minutes and see what happens

For completeness, the path on disk to the mail directories / mbox files (slightly anonymized, but the length is 1:1 the same):

  • C:/Users/user5/AppData/Roaming/Thunderbird/Profiles/p123abcd.default-release/ImapMail/srvmail.domain.net/folders.sbd/work.sbd/networks.sbd/mailmbox.msf

...happened again with the "Global Search and Indexer" disabled.

I'm open to any more suggestions at this point. Try going to mailfiles instead of mbox with that mail account maybe? Else only logging with the folder names in each line of the IMAP log would help, I can't figure out which messages follow what in many places.

Well, this is interesting... with the global search and indexing disabled, it seems to happen MORE often than before. And it happens without any new mails arriving in the folder. Could it be something with local index generation / MSF writing (not atomic?) / ...?

Open to any ideas... if someone could try reproducing as in comment 73 that would be great.

Could someone set this bug to "confirmed" possibly?

Hi Nils, so it looks like the issue is UIDVALIDITY related. A while back someone also saw this error on a Exchange server, if I remember correctly, and was not a tb issue.
Not sure that this "confirms" a tb bug or if the problem is in the server. I'll try to simulate your setup and see what happens.

Not sure if what you show in comment 72 might also be the cause. Have you been seeing this pop-up error all along? If TB can't open the summary file (.msf) it's not going to no be able to see the stored UIDVALIDITY so UIDVALIDITY log errors may be a "red herring".

The comment 72 error is what I see if I click on the folder at just the "right" moment is my guess, when the index file is not where / what it is supposed to be. As I've been clicking the folders more often to check in the last day or two I have seen it more often. I also see the error on the parent folder (not sure if always, just tried it a few times today and often if the child folder where I see the problem hat the error, the parent folder ("networks" which contains zero mails!!) did as well). I think that the UIDVALIDITY is a red herring, and the MSF corruption is caused by something stepping on something or something ; )

Logging in more detail what leads to that error dialog (file missing? file corrupted? invalid data? already open for writing? ...) would help. Or a dedicated MSF interaction log... opening, writing, closing, ... and what is triggering it. is there something that logs that?

(In reply to Nils from comment #76)

Well, this is interesting... with the global search and indexing disabled, it seems to happen MORE often than before. And it happens without any new mails arriving in the folder. Could it be something with local index generation / MSF writing (not atomic?) / ...?

Open to any ideas... if someone could try reproducing as in comment 73 that would be great.

So, the folders in questions, these reside in the C:\Users\<your_user_name>\AppData\Roaming\Thunderbird profile folder, yes? I was thinking folder permission issues if they happened to be on a removable or network drive.

Nope, local SSD on a very recent laptop... path is C:/Users/user5/AppData/Roaming/Thunderbird/Profiles/p123abcd.default-release/ImapMail/srvmail.domain.net/folders.sbd/work.sbd/networks.sbd/mailmbox.msf (see comment 74). Whats weird is that I also see the problem with the comment 72 error dialog on the "networks" folder (a directory on disk), which contains no mails.

I don't know if the official beta release you are using has "debug" enable. If so, in the console where your are running tb from you might see various assertion errors printed that might tell sometime about file access problems. For example, when you see the uidvalidity error you should also see in the console this assertion error: https://searchfox.org/comm-central/rev/3dcdf593b06918782b53eb90aa76143af17dfeed/mailnews/imap/src/nsImapMailFolder.cpp#2423. So there may be other errors printed that are relevant.
Not sure, but you might also see the errors in tb's built-in console with ctrl-shift-j.
Otherwise, I'm don't know of any specific MOZ_LOG facility to capture what you ask about.

(In reply to Nils from comment #80)

Nope, local SSD on a very recent laptop... path is C:/Users/user5/AppData/Roaming/Thunderbird/Profiles/p123abcd.default-release/ImapMail/srvmail.domain.net/folders.sbd/work.sbd/networks.sbd/mailmbox.msf (see comment 74). Whats weird is that I also see the problem with the comment 72 error dialog on the "networks" folder (a directory on disk), which contains no mails.

So, working backwards here: How does your machine talk to the server? Meaning, is your machine and the server on a switch and sharing traffic? Is the server behind any hard/soft firewall? What about disabling any Antivirus in case it's the type that acts as a proxy for incoming / outgoing mail scanning? Just thinking out loud about any potential software or network topology issues that could be at play.

Machine talking to the server should not be an issue, this worked fine with TB91 on this laptop before (Win11), and also with TB91 on an Ubuntu computer connected the same way. There is no antivirus except the windows defender.

No console output, but ctrl-shift-j is good... these "uncaught exceptions" seem to coincide in number and timestamp with the failures:

 Uncaught Exception { name: "NS_ERROR_FILE_TARGET_DOES_NOT_EXIST", message: "Component returned failure code: 0x80520006 (NS_ERROR_FILE_TARGET_DOES_NOT_EXIST) [nsIMsgFolder.updateFolder]", result: 2152857606, filename: "resource:///modules/DBViewWrapper.jsm", lineNumber: 88, columnNumber: 0, data: null, stack: "updateFolderAndNotifyOnLoad@resource:///modules/DBViewWrapper.jsm:88:15\nopen@resource:///modules/DBViewWrapper.jsm:810:32\nshow@chrome://messenger/content/folderDisplay.js:849:19\nFolderPaneSelectionChange@chrome://messenger/content/commandglue.js:368:18\nonselect@chrome://messenger/content/messenger.xhtml:1:1\nMozTreeChildren/<@chrome://global/content/elements/tree.js:69:28\n", location: XPCWrappedNative_NoHelper }
​columnNumber: 0
​data: null
​filename: "resource:///modules/DBViewWrapper.jsm"
​lineNumber: 88
​location: XPCWrappedNative_NoHelper { QueryInterface: QueryInterface(), filename: Getter, name: Getter, … }
​message: "Component returned failure code: 0x80520006 (NS_ERROR_FILE_TARGET_DOES_NOT_EXIST) [nsIMsgFolder.updateFolder]"
​name: "NS_ERROR_FILE_TARGET_DOES_NOT_EXIST"
​result: 2152857606
​stack: "updateFolderAndNotifyOnLoad@resource:///modules/DBViewWrapper.jsm:88:15\nopen@resource:///modules/DBViewWrapper.jsm:810:32\nshow@chrome://messenger/content/folderDisplay.js:849:19\nFolderPaneSelectionChange@chrome://messenger/content/commandglue.js:368:18\nonselect@chrome://messenger/content/messenger.xhtml:1:1\nMozTreeChildren/<@chrome://global/content/elements/tree.js:69:28\n"
​
<prototype>: ExceptionPrototype { toString: toString(), name: Getter, message: Getter, … }
DBViewWrapper.jsm:88
    updateFolderAndNotifyOnLoad resource:///modules/DBViewWrapper.jsm:88
    open resource:///modules/DBViewWrapper.jsm:810
    show chrome://messenger/content/folderDisplay.js:849
    FolderPaneSelectionChange chrome://messenger/content/commandglue.js:368
    onselect chrome://messenger/content/messenger.xhtml:1
    MozTreeChildren chrome://global/content/elements/tree.js:69

So the "NS_ERROR_FILE_TARGET_DOES_NOT_EXIST" popped out at me and Ye Ol' Google brought this up: http://forums.mozillazine.org/viewtopic.php?f=39&t=2197337

Ancient but maybe offers something of use? They fixed it by "I finally managed to fix this issue. Everything was caused by security permissions of the mail folder and what's in it. I had to change the ownership, remove all entries and add my username and SYSTEM user with full control. That did the trick."

Well, thats Win7 in 2011, this is Win11 in 2022 ; ) ... this is so old I would rather narrow the current setup issues down than chase that trail. I am the only user on the laptop, and this is in my user directory, so there should not be a problem there. It could be some non-TB file access issue, granted, but I would not assume its permissions. My hunch is still that something (one thread? Indexer? ...) is recreating the file when the other wants to access it, or something like that.

Also, TB 3 vs TB 98, and my TB created the files / folders itself, he had copied them from another machine.

I had to change the ownership ...

Seems like the error Nils sees would be consistent and not just when new mail arrives? But maybe worth checking.

I copied 5000 message to a deep imap folder on dovecot server. It tooks some times and messages were being continuously "delivered" via imap append but I see no problems at all. I could access any messages with messages coming in and getting download from the server with no problem.
P/S: I have global indexing disabled (mainly to save battery).

Do you have the "mail.server.default.check_all_folders_for_new" set to TRUE ? I am seeing it from time to time even when no mails are delivered into those folders. Try leaving 2-3 mails unread, I often have some unread to handle later in the folders. Maybe drip-feed a new mail into the folders every 1, 5, or 10 minutes to see if that does anything? Sometimes nothing happens for hours, sometimes I see the issue twice an hour.

I'm still on 98b3, 99b1 isn't in the beta channel yet. BTW, addons... I just disabled nostalgy++ (v3.2.16, updated the XPI to make it work with TB98 as per a comment on github from the author), maybe that was doing something. Let me see what happens tomorrow.

Looking at the MSF files in the networks.sbd directory, only one is currently indexed, the others are again reset. Something is clearing the MSF files out it seems... contents for one example:

// <!-- <mdb:mork:z v="1.4"/> -->
< <(a=c)> // (f=iso-8859-1)
  (80=ns:msg:db:row:scope:msgs:all)(81=subject)(82=sender)(83=message-id)
  (84=references)(85=recipients)(86=date)(87=size)(88=flags)(89=priority)
  (8A=label)(8B=statusOfset)(8C=numLines)(8D=ccList)(8E=bccList)
  (8F=msgThreadId)(90=threadId)(91=threadFlags)(92=threadNewestMsgDate)
  (93=children)(94=unreadChildren)(95=threadSubject)(96=msgCharSet)
  (97=ns:msg:db:table:kind:msgs)(98=ns:msg:db:table:kind:thread)
  (99=ns:msg:db:table:kind:allthreads)
  (9A=ns:msg:db:row:scope:threads:all)(9B=threadParent)(9C=threadRoot)
  (9D=msgOffset)(9E=offlineMsgSize)
  (9F=ns:msg:db:row:scope:dbfolderinfo:all)
  (A0=ns:msg:db:table:kind:dbfolderinfo)(A1=numMsgs)(A2=numNewMsgs)
  (A3=folderSize)(A4=expungedBytes)(A5=folderDate)(A6=highWaterKey)
  (A7=mailboxName)(A8=UIDValidity)(A9=totPendingMsgs)
  (AA=unreadPendingMsgs)(AB=expiredMark)(AC=version)(AD=forceReparse)
  (AE=fixedBadRefThreading)(AF=onlineName)(B0=folderName)
  (B1=highestModSeq)(B2=highestRecordedUID)>
{1:^80 {(k^97:c)(s=9)} }
{FFFFFFFD:^9A {(k^99:c)(s=9)} }

<(80=1)(81=0)(82=folders.work.networks.folder22)>
{1:^9F {(k^A0:c)(s=9u)}
  [1(^AC=1)(^AD=0)(^AE=1)(^AF^82)]}

Woah. The MSF files LITERALLY disappeared under my nose. I captured the screenshot above, went up a few directories to do a find . -name "*msf" -exec ls -al {} \; and now they are gone. Along with the mbox mailfiles themselves. The one that hat the large size is still there (both MSF and mailfile), but all the "clean" / reset ones are gone (both MSF and mbox mailfile). In this situation, clicking on the folder in TB gives the comment 72 error dialog (which makes sense) along with the "Uncaught" error in the error console. Clicking on it again shows the folder with mails, and the files on disk are then there (redownloaded and MSF regenerated).

Now, 2 mins later, all the MSF files are back in their "clean" states. I'm guessing that this will repeat again sometime. It's too targeted to be random / a Windows issue. This is towards the end of my directory tree... I have around 600 mail folders in total, and these are (when sorting the tree) towards the end. To reproduce, maybe try creating a large folder tree with 3 levels deep and a total of 1000 folders?

Ideas??!?

(In reply to Nils from comment #90)

Woah. The MSF files LITERALLY disappeared under my nose. I captured the screenshot above, went up a few directories to do a find . -name "*msf" -exec ls -al {} \; and now they are gone. Along with the mbox mailfiles themselves. The one that hat the large size is still there (both MSF and mailfile), but all the "clean" / reset ones are gone (both MSF and mbox mailfile). In this situation, clicking on the folder in TB gives the comment 72 error dialog (which makes sense) along with the "Uncaught" error in the error console. Clicking on it again shows the folder with mails, and the files on disk are then there (redownloaded and MSF regenerated).

Now, 2 mins later, all the MSF files are back in their "clean" states. I'm guessing that this will repeat again sometime. It's too targeted to be random / a Windows issue. This is towards the end of my directory tree... I have around 600 mail folders in total, and these are (when sorting the tree) towards the end. To reproduce, maybe try creating a large folder tree with 3 levels deep and a total of 1000 folders?

Ideas??!?

Any filters set up?

I was testing with a heavily modified 97 daily. Tried it with 98b3 just downloaded. Still works OK. But, didn't set "check all folders for new". So I'll try that next.

I looked up nostlgy++on github but no idea if it would affect this. Guess I should have asked the "standard" support question "have you tried running in safe-mode ?".

I'll also try your other suggestions and see what I see.

Nope, no filters. This was a clean install with TB91, and then a move to TB98 at some point.
Good idea to try safe-mode, haven't done that yet. Will let this go overnight and see tomorrow.

Re comment 89 Are you accessing the files using freebsd or linux or maybe something like cygwin on windows? Just curious since you said you were on win11 and the screenshot looks bash-like.
Also, if you have 600 folder and are checking all of them for new mail every X minutes, I would think that would really hit performance.

The server would definitely take a hit

(In reply to Wayne Mery (:wsmwk) from comment #95)

The server would definitely take a hit

Nils's imap server is on his local network and dedicated so probably not a big concern for stressing it.

Nils, this is just a suggestion and probably doesn't affect this issue: If you are backing up your freeBsd server content, you can probably get by with just setting TB so that no offline store is used and messages are just fetched from the server when accessed. Another tb user I've worked with has almost more folders than I have individual messages and runs like this with good results. Also, AFAIK, that user doesn't check all folders for new mail. He did have a complaint that initial folder discovery was taking too long but that has been resolved: bug 163964.

I made the setting so all folders are checked for new mail. Still don't see a problem when delivering mails to a deeply nested folder with 98b3.

P/S: If you want to keep the "check all folders for new mail" setting I would recommend keeping this setting at the default of true: mail.imap.use_status_for_biff. Otherwise, an imap select will occur to check each folder for new mail which is a more "expensive" operation.

(In reply to Nils from comment #88)

I'm still on 98b3, 99b1 isn't in the beta channel yet.

I just noticed the second build of 99.0b1 was done this evening. I also noticed the release notes mention the new POP JavaScript client---not that it applies to you. I'd guess that should hit beta channel any time now. Keep an eye out.

The screenshot was taken using WSL2... I have Ubuntu there. I started using that when it became available and it's not bad, I used to use cygwin. The server is a freenas box, where the dovecot is in its own jail. As this is only for my home & family mails, the load is not a problem... the server is a little overpowered for a home NAS. Also, I do need offline because this is my work & travel laptop.

Will do with the mail.imap.use_status_for_biff , thanks for the suggestion. I'm in safe mode right now, lets see how that works out.

I'm also running a while true; do echo -ndate; echo -n " " $(find . -name "*" |wc -l) ; echo " " $(find ./work.sbd/networks.sbd/ -name "*" | wc -l) ; sleep 10 ; done loop to monitor the number of files in the entire tree and in the part that is affected. The 1187 is the total file count, the 43 in the "networks" folder which is most affected. I would expect a few more than the 43 files to disappear since I have seen the effect in the folder above and a parallel folder to "networks" as well.

Fri Mar 11 07:00:12 CET 2022    1187    43
Fri Mar 11 07:00:22 CET 2022    1187    43
Fri Mar 11 07:00:33 CET 2022    1187    43
Fri Mar 11 07:00:43 CET 2022    1187    43

Well, that didn't take long. This is in safe mode with 98b3:

Fri Mar 11 07:03:46 CET 2022    1187    43
Fri Mar 11 07:03:56 CET 2022    1187    43
Fri Mar 11 07:04:06 CET 2022    1123    20
Fri Mar 11 07:04:16 CET 2022    1123    20

99b1 is available, so I am now upgrading. Lets see if that changes anything.

Nope, same result with 99b1 also in safe mode... "fulltime" and "networks" are sibling folders, both under "work"

Fri Mar 11 07:40:34 CET 2022 folders 1187 work 310 fulltime 9 networks 43
Fri Mar 11 07:40:54 CET 2022 folders 1187 work 310 fulltime 9 networks 43
Fri Mar 11 07:41:14 CET 2022 folders 1123 work 246 fulltime 6 networks 17
Fri Mar 11 07:41:34 CET 2022 folders 1187 work 310 fulltime 9 networks 43
Fri Mar 11 07:41:55 CET 2022 folders 1185 work 308 fulltime 9 networks 42
Fri Mar 11 07:42:15 CET 2022 folders 1184 work 307 fulltime 9 networks 41

Something in TB is deleting and recreating the files. Lets see if it happens in 10 minute intervals like the mailcheck...
Well, what a surprise... 10 minutes later 7:51 the same thing:

Fri Mar 11 07:50:50 CET 2022 folders 1185 work 308 fulltime 9 networks 42
Fri Mar 11 07:51:00 CET 2022 folders 1185 work 308 fulltime 9 networks 42
Fri Mar 11 07:51:10 CET 2022 folders 1124 work 247 fulltime 6 networks 19
Fri Mar 11 07:51:21 CET 2022 folders 1124 work 247 fulltime 6 networks 19

...10mins later 8:01 one disappearance (most are already / still gone at this point)...

Fri Mar 11 08:00:51 CET 2022 folders 1125 work 248 fulltime 6 networks 19
Fri Mar 11 08:01:01 CET 2022 folders 1125 work 248 fulltime 6 networks 19
Fri Mar 11 08:01:12 CET 2022 folders 1124 work 247 fulltime 6 networks 18
Fri Mar 11 08:01:22 CET 2022 folders 1124 work 247 fulltime 6 networks 18

...another 10 mins later 8:11 something I didn't expect: then the files are recreated. Seems one cycle they go, the next they are recreated?

Fri Mar 11 08:11:23 CET 2022 folders 1124 work 247 fulltime 6 networks 18
Fri Mar 11 08:11:33 CET 2022 folders 1124 work 247 fulltime 6 networks 18
Fri Mar 11 08:11:44 CET 2022 folders 1187 work 310 fulltime 9 networks 43
Fri Mar 11 08:11:54 CET 2022 folders 1187 work 310 fulltime 9 networks 43

...nope, timing doesn't quite hold up here, this one is only 5 mins later:

Fri Mar 11 08:15:49 CET 2022 folders 1187 work 310 fulltime 9 networks 43
Fri Mar 11 08:15:59 CET 2022 folders 1187 work 310 fulltime 9 networks 43
Fri Mar 11 08:16:10 CET 2022 folders 1123 work 246 fulltime 5 networks 17
Fri Mar 11 08:16:20 CET 2022 folders 1123 work 246 fulltime 5 networks 17

...but at the next 10 minute mark 8:21 they're recreated. It seems that of the 64 files total that go missing (1187-1123=64), they are all in the "work" folder, and that is the folder alphabetically at the end of the alphabetically sorted mailtree.

Fri Mar 11 08:21:26 CET 2022 folders 1123 work 246 fulltime 5 networks 17
Fri Mar 11 08:21:36 CET 2022 folders 1187 work 310 fulltime 9 networks 43

...and the next one 8:31

Fri Mar 11 08:31:07 CET 2022 folders 1187 work 310 fulltime 9 networks 43
Fri Mar 11 08:31:17 CET 2022 folders 1123 work 246 fulltime 5 networks 17

...at the next 10 min mark 8:41 nothing happened because it was downloading one of the mailfolders. It happened a bit later 8:43 just after the downloading got paused, then the files appeared.

Fri Mar 11 08:43:41 CET 2022 folders 1122 work 245 fulltime 6 networks 17
Fri Mar 11 08:43:51 CET 2022 folders 1187 work 310 fulltime 9 networks 43

...and at the next 10 min mark 8:51, gone again

Fri Mar 11 08:51:10 CET 2022 folders 1187 work 310 fulltime 9 networks 43
Fri Mar 11 08:51:20 CET 2022 folders 1123 work 246 fulltime 6 networks 17

I'm out of ideas. My mail.server.default.check_time is set to 10 minutes. Could it be something like that the server polling is taking too long, and so one polling is not finished when the next one starts and that's stepping on files? This would explain the interval, and also why it is happening to the "last" folders in the tree. Something like that.

I'd even go to a daily if necessary to get more full debugging info on the file operations. I get the feeling that there is something low-level wrong here that only rears its head in some particular circumstances, and fixing it could provide a fair bit of stability in other places as well.

Any ideas welcome... if I were on Linux I would at this point try to recompile TB and add more logging around mailcheck and MSF file ops. If any dev can do that I'd be VERY happy to test.

My ideas at this point:

  • nuke the entire "work" folder and let it redownload and reindex (assuming that this won't help since I tried that with the "networks" subfolder already)
  • change the profile name to something shorter just in case it is something with path length (the full path with C:\ IS just over 128 chars long... maybe some datastruct is dimensioned like that?)

The screenshot was taken using WSL2...

Ok, I see. I have WSL2 too on my win10 dual-boot partition and it works ok, but I seldom boot into windows so never sure it will keep working between windows updates when I occasionally run it.

I created the 1000 folders in a deep subfolder and don't see a problem yet when I deliver to one of them.
From folder property: imap://gene@wally.dbnet.lan/4/rubish/baskette/truck/big%20truck/container0/gds10
The other folders are gds0 to gds999 that are empty (except for gds10). Also, have the check all folders for new mail enabled.

I wrote the above before I saw comment 99 and it's late here so will look at your results tomorrow. But just to be sure
Are you still using mbox format?
Can you show the exact folder structure you are using and approx. how many messages are in each folder?
Not sure what you mean when you say, I think, "problem is at end of the tree".
Have you tried going back to release 91 since this problem started?

Next data point... after the folders stayed "lost" (networks at 17) for some time and didn't show up in my Unread view, I went to the All view and clicked on the folder that I knew had unread mail in it. I got the comment 72 error dialog, and then the "<account> Looking for folders..." message in the status at the bottom, then it started downloading that folder. When I click on other folders under work/networking the file count (as expected) gradually increases as they are downloaded and get their MSF files. These are much smaller folders with only dozens of messages.

"At the end of the tree" I mean that if you sort the folder tree alphabetically, the problem occurs at the end of it. All my folders are under "folders" , and "folders/work" the work folder is the last of 10 subfolders under folders. Under work I have 20-30 folders, with a few (4-5) large ones with over 1k messages. Not sure how to dump the tree with message counts.

INBOX
folders
...<10 or so top level>
...work
......<10 or so under work, two of those are fulltime and networks>
......fulltime
......networks

My mail.imap.use_status_for_biff is TRUE (default), I had never touched that. Yes, still using mbox format. I have not tried TB91 again since the problem started.

Are you testing on Windows? Can you make your folder paths over 128 char long?

Attached file filelist-tb99b1_post_reinstall.log (obsolete) —

The first file was running tb99b1, the second was a clean re-install into a new directory of 99b1.

Whats weird is that in both cases when files get lost, it drops by 64. Could be coincidence, but ...

Are you testing on Windows? Can you make your folder paths over 128 char long?

I haven't been testing with windows (yet). But from what I can tell the folder paths can be up to 260 chars by default. They can be longer (32k) with a special path prefix \\?\ but I don't know that tb/mozilla uses that.

Still looking at your result from last night...

I just deleted 5 folders (so 10 less files, 5 msf, 5 mbox) ... lets see if the count still drops by 64 when it happens or by a different number now. Paths are definitely less than 256 chars.

(In reply to Nils from comment #99)

Something in TB is deleting and recreating the files. Lets see if it happens in 10 minute intervals like the mailcheck...

Ok so it's basically the same in 99.0b1. Any chance you reached out to the Dovecot folks for some insights? I know you said it was fine in 91 but maybe they have some insights or there's some debugging / logging they might suggest on the server side? It seems odd that TB would just willy nilly delete files / folders without user interaction to do so.

I'm on win10 now with beta 99b3. I didn't see anything at first but did see my column setting reset after copying in some new messages to my deeply nested gds10 folder. (I show the "order received" column and it went away.) All I saw in the error log was something about a gloda error since I still had the default indexing on, I suppose. I wasn't recording an imap log. However, I over 5000 messages in gds10 and I didn't see a re-download.
FWIW, my imap server is also dovecot running on a fedora box. Other than that Nils is using some plugins that I don't use, I don't see how that could be a problem if he's using a fairly up-to-date dovecot version.

(In reply to gene smith from comment #107)

I'm on win10 now with beta 99b3. I didn't see anything at first but did see my column setting reset after copying in some new messages to my deeply nested gds10 folder. (I show the "order received" column and it went away.) All I saw in the error log was something about a gloda error since I still had the default indexing on, I suppose. I wasn't recording an imap log. However, I over 5000 messages in gds10 and I didn't see a re-download.
FWIW, my imap server is also dovecot running on a fedora box. Other than that Nils is using some plugins that I don't use, I don't see how that could be a problem if he's using a fairly up-to-date dovecot version.

For what it's worth, I just sent a message to the Dovecot guys and pointed them to this bug. Maybe they'll reply. Maybe they won't. Wouldn't hurt to have some more eyes here.

Nils,

Is your Dovecot setup polling your email accounts and just acting as a redundant IMAP setup or are you running your own homebrew setup for your own dedicated MX? Just trying to get an understanding of your setup.

(In reply to Arthur K. [He/Him] from comment #109)

Nils,

Is your Dovecot setup polling your email accounts and just acting as a redundant IMAP setup or are you running your own homebrew setup for your own dedicated MX? Just trying to get an understanding of your setup.

I think he explained it in comment 60. Not a dedicated MX.

So after deleting 10 a few things as stated in comment 105 it still drops 64 files:

Fri Mar 11 20:23:07 CET 2022 folders 1155 work 301 fulltime 9 networks 34
Fri Mar 11 20:23:17 CET 2022 folders 1155 work 301 fulltime 9 networks 34
Fri Mar 11 20:23:27 CET 2022 folders 1099 work 245 fulltime 5 networks 18
Fri Mar 11 20:23:37 CET 2022 folders 1099 work 245 fulltime 5 networks 18

My mail setup:

  • mail goes to my ISP
  • getmail retrieves to dovecot inbox (mail does not remain on ISP server)
  • imapfilter files into the folders
  • TB gets mails from dovecot

Arthur, I did not reach out to the dovecot folks because I don't see what dovecot can possibly have to do with deleting mbox and MSF files. I would love to have some TB devs here who are willing to add some more debugging printing to the MSF relevant code.

Gene, if you see the column settings changed that could be part of the MSF file disappearing. Maybe set up a while loop like mine in comment 98 to see if something is happening to the files? Sometimes it goes hours without a problem, other times the problem happens every 10 minutes as seen in the logs above.

I am not using any plugins at present, plus it was the same thing in safe mode. Only plugins were nostalgy++, automatic dictionary, copy sent to current and editEmailSubject MX.

...and at some point all 64 files get recreated again...

Fri Mar 11 20:59:11 CET 2022 folders 1099 work 245 fulltime 5 networks 18
Fri Mar 11 20:59:22 CET 2022 folders 1099 work 245 fulltime 5 networks 18
Fri Mar 11 20:59:32 CET 2022 folders 1155 work 301 fulltime 9 networks 34
Fri Mar 11 20:59:42 CET 2022 folders 1155 work 301 fulltime 9 networks 34

Apologies, 56 deleted & recreated, can't count. So I guess its limited to within the "work" folder at least for whatever reason

I used your script in WSL2 and I am seeing the file count where I thought I created 1000 folders gds0 - gds999 jump around. I also only see about 500 folders in tb instead of 1000 but it varies some. Something weird is going on (unless it's user error). I'll look again with linux and see if I really have 1000 folders (.msf files). My 1000 folders are under container0.sdb.

#!/bin/bash

while true; do
echo -n `date` ;
echo -n " " $(find . -name "*" |wc -l) ;
echo " " $(find ./container0.sbd/ -name "*" | wc -l) ;
sleep 10 ;
done
Attached file script to monitor file in mail folders (obsolete) —

Great, so I'm not going crazy at least : ) Here is an improved script, it will count the files in a few dirs, then diff to the last check and only log if something changed. Makes the log smaller & a bit easier to read.

Re comment 114

I'll look again with linux and see if I really have 1000 folders (.msf files). My 1000 folders are under container0.sdb.

The folders were missing in linux. I re-created the 1000 folders and repeated the test with linux and never saw folders go away in linx (per Nils's script) and never saw spurious re-download. This is with beta 98b3.

Then I moved back to win10 with beta 98b3 and initially all 1000 folders were there and the expected messages were in the folders I had populated, one with 5000 message and other with about 21 messages. The remaining 998 folder are empty. I then appended some messages from another account to the 5000 message folder and when I selected it, it immediately re-downloaded. Per the script, some of the folders went away but then came back:

Fri Mar 11 23:27:55 EST 2022  1007  1003
Fri Mar 11 23:28:06 EST 2022  1007  1003
Fri Mar 11 23:28:16 EST 2022  1007  1003
Fri Mar 11 23:28:26 EST 2022  606  603     <----- ???
Fri Mar 11 23:28:36 EST 2022  1007  1003
Fri Mar 11 23:28:46 EST 2022  1007  1003

and the populated folder content looked ok.

Then after a while the file count went low for a longer time:

Fri Mar 11 23:38:08 EST 2022  1007  1003
Fri Mar 11 23:38:18 EST 2022  1007  1003
Fri Mar 11 23:38:28 EST 2022  449  445
Fri Mar 11 23:38:38 EST 2022  449  445
Fri Mar 11 23:38:48 EST 2022  449  445
Fri Mar 11 23:38:58 EST 2022  449  445

But eventually came back:

Fri Mar 11 23:50:09 EST 2022  449  445
Fri Mar 11 23:50:19 EST 2022  449  445
Fri Mar 11 23:50:29 EST 2022  1007  1003
Fri Mar 11 23:50:39 EST 2022  1007  1003
Fri Mar 11 23:50:49 EST 2022  1007  1003

But tb is now using "Very High" power per win Task Mgr and the laptop fan is running hard. After shutting down tb, the fan stopped.

Anyhow, there seems to be something very window specific wrong in the beta since I see no issue when running under linux. But I haven't tested the tb windows release to see if it has the same problem.

Great, glad you see it too! THanks for helping reproduce. With the new script I posted in comment 115 only logging changes I see the following:

Fri Mar 11 22:19:41 CET 2022 folders 1099 work 245 fulltime 5 networks 18
Fri Mar 11 22:20:12 CET 2022 folders 1102 work 248 fulltime 6 networks 18
Fri Mar 11 22:30:05 CET 2022 folders 1155 work 301 fulltime 9 networks 34
Fri Mar 11 22:44:44 CET 2022 folders 1099 work 245 fulltime 6 networks 16
Fri Mar 11 22:47:58 CET 2022 folders 1100 work 246 fulltime 6 networks 17
Fri Mar 11 22:50:01 CET 2022 folders 1143 work 298 fulltime 9 networks 34
Fri Mar 11 22:50:11 CET 2022 folders 1155 work 301 fulltime 9 networks 34
Fri Mar 11 22:54:36 CET 2022 folders 1099 work 245 fulltime 6 networks 16
Sat Mar 12 03:23:15 CET 2022 folders 1139 work 298 fulltime 9 networks 34
Sat Mar 12 03:23:25 CET 2022 folders 1155 work 301 fulltime 9 networks 34
Sat Mar 12 06:16:26 CET 2022 folders 1099 work 245 fulltime 6 networks 17

The small changes are me clicking on a folder when it is gone, causing it to be reindexed and / or redownloaded, the large changes are the problem changes. The "very high" task manager is consistent with what I'm seeing, thats the downloading and indexing at work fixing things.

and btw, I was chatting in the TB dev matrix channel yestreday, and wsmwk said that he had also seen msf resetting occasionally, with the view settings reset. So this is a real issue, and if this can be tracked down can hopefully add quite a bit of stability to (maybe just windows?) TB considering how many "keeps downloading" issues have cropped up over the years.

So far, no problems seen when running release 91.7.0 (64-bit) on win10 -- no re-downloads and Nils's script shows no file count changes. (As expected, I did have to use --allow-downgrade with ESR 91 to use the same profile after running 98b3.)

Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Unspecified → Windows
Version: 78 → Thunderbird 98

This sounds almost exactly like what was described here: bug 1726319 but I don't think it is really EFS (Encrypted Files System) specific. Someone also made a script to monitor the number of files and saw them change.

Bug 1551173 also looks like a possible dupe, even though it it referencing POP instead of IMAP. Bug 1551173 comment 21 is interesting.

Bug 1093217 could be related, esp. see bug 1093217 comment 6. This is referencing bug 131589 which is showing a lot of blocks / depends at the bottom. I looked for panacea.dat but did not see that file, should it exist in the profile directory, or where? Several panacea.dat issues might tie to this... https://bugzilla.mozilla.org/buglist.cgi?quicksearch=panacea

Should we rename this bug to "MSF files disappearing" or something MSF related for easier finding? Looking for MSF bugs https://bugzilla.mozilla.org/buglist.cgi?quicksearch=msf shows a few interesting ones like bug 561272 (I see those symptoms) and a few other candidates.

Ah, panacea.dat became msgFolderCache.sqlite per bug 418551 comment 17 ... was backed out in bug 418551 comment 42. It was resolved by Bug 418551 comment 71 - Convert nsFolderCache from mork (panacea.dat) to JSON. r=mkmelin ... but I can't find msgFolderCache.sqlite either ... is the folder cache also in global-messages-db.sqlite or where should it be?

I think it's this from the top level of the profile:
~/.thunderbird/kgmivl5o.default-release/folderCache.json
It's a huge file that describes all the .msf's and replaces panacea.dat, I think.

Ben Campbell has been working on the panacea replacement and has commented on some of the referenced bugs, so I suspect he would be the best person to take a look at the issue.

I had added BenC to the bug earlier because I saw him on the other one.

Thanks, I see it. There is also folderTree.json, but that only written when TB closes apparently. Correct me if I'm wrong, but that seems to only have the on disk folders listed, but not the mailfiles. cat folderTree.json | tr ',' '\n' to list.

I'll do a status check on folderTree.json and folderCache.json to see if I can see size changes, and correlate them to dis-/appearances of MSF files. Do you know if I can delete folderCache.json and let it rebuild without tons of redownloading? It has info from a few old profiles in it.

Also, @BenC ... probably the panacea.dat bugs can be closed https://bugzilla.mozilla.org/buglist.cgi?quicksearch=panacea since that file's now gone.

  • Increases / decreases by 64 or 56 files (MSF and mboxes) are the bug
  • small decreases (1-2 files) are folder deletions
  • de-/increasees changing from 64 to 56 was me deleting a few old folders to see if the number 64 was significant. It seems it wasn't.
  • smaller increases are caused by clicking on a folder and getting it reindexed
Attachment #9267401 - Attachment is obsolete: true
Attachment #9267402 - Attachment is obsolete: true

Filed Bug 1759355 ... I see varying message counts in folderCache.json consistent in part with what is happening here. Not sure if that is a result of this bug, or the other way around. I figured I'd keep it separate, if BenC wants thinks they are the same we can continue here.

Quick info... in bug 1759355 I'm seeing the counts bouncing around on another account that is coming from an outlook.office365.com server, so likely this issue is pure TB and not server / dovecot related at all. That other account has small folders... weirdly enough it resyncs, but the message sorting and column settings don't disappear.

Attached file 1000folders.diff

To duplicate the problem and see it easily, I created 1000 folders under a container folder. The diff shows a quick temporary change I did to CreateMailbox() imap function so when I create folder X, I get folders X0 to X999. I still have to manually subscribe them but you can select all of them in the subscribe dialog and subscribe them all together. I also copied 5000 messages to folder X10 and much fewer to a couple more but not sure that is necessary to duplicate the problem.
Again I only see the problem (number of folders varies over time) with 98b3 on win10. Also I have the pref mail.server.default.check_all_folders_for_new = true and haven't checked the behavior with this false.

Gene, it might be reproducible with less. I just started looking at the outlook.office365.com account I have... that has an Inbox (300 mails, 200MB) plus the usual (drafts, junk, trash, archive, etc.) folders, the a folder called "folders" with 4 subfolders, they have 48/6mb 104/55mb 19/2mb 430/224mb). Here I don't deliver to them at all, the mails in those folders are all manually filed, leading me to believe that it has nothing to do with mail delivery / new mails in folders. With things often happening at 5 or 10 minute intervals i'm guessing its something with indexing, searching, or mailcheck.

Sun Mar 13 23:12:26 CET 2022 folders 1112 work 256 fulltime 6 networks 17 o365 15
Sun Mar 13 23:13:23 CET 2022 folders 1112 work 256 fulltime 6 networks 17 o365 24
NOTE: o365 is a different server from the other folders listed

This happened just after starting TB... it reloaded the folders from the server (I saw them filling up in the folder pane even) and the MSFs showed up (see the 15 -> 24 filecount change).

I didn't see any difference with 99b1, so you might want to move to that to be on the latest beta.

This is the diff log of changes to folderCache.json as the o365 folders mentioned in the previous comment got refreshed after TB startup, and when the MSF files got created. It seems as if TB had forgotten a lot about this account at the last shutdown.

Also I have the pref mail.server.default.check_all_folders_for_new = true and haven't checked the behavior with this false.

I see the problem on win10 with it set false at about the same frequency.

I've tested a few more time under linux with the beta 98b3 and still don't see a problem there.
Yes, I would say it's not server dependent and probably not dependent on number of folders and/or messages.
I'll try some earlier betas and may see where it starts occurring.

I did some bisecting and determined that the problem we see here started at 93b1. With 92b5, all is OK. With the next beta 93b1 I see the count of folders go down.
It appears this is just a continuation or duplicate of bug 1734843 and I had previously determined something was bad here: bug 1734843 comment 95 and is definitely related to the elimination of panacea.dat. I vaguely recall there was a way to duplicate the problem on linux but I can't remember or find what that was. I'll check again tomorrow.
I think there was a follow on patch in a later beta that attempted to fix the problem in bug 1734843 but apparently it didn't completely work.
Note: Also, other than bug 1734843 there were several other bugs around the same time (5 months ago) that were very similar and caused by the panacea change.

I vaguely recall there was a way to duplicate the problem on linux but I can't remember or find what that was. I'll check again tomorrow.

I think I had to set this to see a problem in linux: bug 1734843 comment 54
Nils, do you have that anti-virus quarantine thing set?

Gene, please have a quick look at bug 1551173 ... some different info that may point to the same thing or something else. Its from TB60 and 3 years old but someone just added lengthy reproducing comments.

I guess the quarantine bug was fixed. Apparently I at least duplicated it on linux a while back: bug 1733966 comment 113. (There were several reports of the the same or similar issues back then and was hard to keep them straight.)
FWIW, linux with 98b3 is still solid based on your script but still quickly see a drop-off in file count when running 98b3 with win10. (I haven't yet updated to 99b.)
Not sure if Ben is looking at this since he is probably busy with other stuff. (I've never dealt with and tried to decode how .msf/mork files are handled. I suspect they nearing "end of life" so fixing bugs in them is probably a low priority.)

Updated script, this one also diffs the folderCache.json file. This might be something to try on Linux... maybe the files aren't disappearing there, but the total message / unread message counts may still be bouncing around. Might be a good data point.

Attachment #9267467 - Attachment is obsolete: true

Thanks! Been running it since last night. Still haven't seen any problems yet on linux.

Another potential data point. I had stated before that the problems appear to be "further down in the tree" ... I have a folder thats high up in the tree alphabetically, with 6500 mails / 950MB (essentially my main collective Inbox-type folder, also under the folders/ tree though) and that one has never had a problem. Maybe the problem is something timing related between folder / new mail searching, compacting, indexing? Two things trying to do something to a mailbox (at 10 minute intervals which is the mailcheck interval) and in the process stepping on a data structure?

As one of the original filers of this bug, I can confirm that my symptoms never occur in the folders that TB directly manipulates as part of mail processing. These files, of course, are continuously touched by TB infrastructure and their MSF files are being constantly regenerated.

Related observations for my symptoms (disappearing MSF files):

  • The disappearance is not wholesale, it's more like incremental rot over weeks.
  • The affected files can be any point in the file tree (I only go down about 3 levels).
  • The affected files are rarely touched by me or TB -- mainly, they're historical in nature -- and they are only added to/modified manually
  • The problem seems to be worse within encrypted directories (I'm on Windows, using EFS)
  • A looooong time ago, I turned off the glodia (?) indexer and that seemed to make the problem disappear until I turned it back on
  • Windows has two continuously-running processes that may be conflicting (via invisible/ignored locks) with glodia: the EFS itself and the windows indexer.
  • I don't know to what extent TB is locking MSF files, but it might be a good idea to do it always when running. No other app should be interested in or touching MSF files

I am seeing some changes detected in folderCache.json file by the script. Most look innocuous but some seem strange like changes in the nextUID count. It should just gradually go up and not make big jumps like this:

-		"nextUID" : 1139,
+		"nextUID" : 10738,

or

-		"nextUID" : 2,
+		"nextUID" : 5239,

I tried to find something in imap log but after several days the log got corrupted (lots of @ leading chars which is known bug in releases and beta that I have fixed in my daily build).
I'm also doing a "check for new mail on all folders" and checking every 1 minute so that's a lot of activity with the 1000+ folders.
The last nextUID change above was on my gds999 test folder containing 42 messages but no offline store. When I opened it I at first only saw 30 some odd messages. Then a bit later all 42 were there so not sure what is going on. The script didn't detect any file count deltas and I didn't see any downloading in the status bar.
Also, FWIW, I've had indexing turned off at General | Enable Global Search and Indexer during my tests.
Another thing I see is that folderStatus.json contains info on folders that have long been deleted.

So no changes in totalMsgs as in Bug 1759355 ? Guess there is a win/linux difference then. I'm still leaning towards timing corruption... some file access are slow / not atomic (or maybe even some data structure updates?!?) and that is for some reason more present on Win. Also, the folderCache.json for me contains old deleted accounts, I guess I should delete the file and let it rebuild... is that possible (and is it somewhat quick or does it take hours)? I don't want to test on my live system...

(In reply to Nils from comment #144)

I guess I should delete the file and let it rebuild... is that possible (and is it somewhat quick or does it take hours)? I don't want to test on my live system...

At one time I deleted the panacea.dat before the new folderCache.json format was developed. I think it basically forced a download on everything again in all accounts. I assume that would be true of folderCache.json too but not sure.

That will have to wait a week, I have a pretty busy week ahead and will try it afterwards. Will be interesting to see if it changes anything, or if everything stays the same. Any other ideas on what to test? At this point I'm hoping someone will look at the code and make suggestions a to what else to try or add additional debugging code.

There's a logging facility for folder cache, e.g.,
MOZ_LOG=MsgFolderCache:5,timestamp,sync
I just found it looking at Ben's patch. So far, don't see anything interesting on linux. I'll try it on windows where I actually see problems.

I get almost nothing in that log file except

2022-03-20 16:13:56.595000 UTC - [Parent 6264: Main Thread]: D/MsgFolderCache Loading C:\Users\nilsb\AppData\Roaming\Thunderbird\Profiles\gmlraczj.default-beta\folderCache.json
2022-03-20 16:13:56.620000 UTC - [Parent 6264: Main Thread]: D/MsgFolderCache AutoSave in 60s
2022-03-20 16:14:56.628000 UTC - [Parent 6264: Main Thread]: D/MsgFolderCache AutoSave

That's all I've seen too on linux.
I attempted to produce a daily build for win64 using the thunderbird "try" server but my credentials had expired (Re bug 1760466). So I've now done a local build on and for windows and am running that. So now maybe I can see something happening since I only saw the problem on windows.
The version I'm running now is 100a1 as built from comm-central latest. I'll do some testing now on windows and see if the problem still occurs.

Update -- failed again after a longer time. Adding more printf's to debug.
Found another maybe useful logging facility: MOZ_LOG=MsgDB:5 which is only used in mailnews/db/msgdb/src/nsMsgDatabase.cpp. And that's where summary files (.msf) get "blown away" if there's a problem with the .msf file. summaryFile->remove() does the deletion (called in 2 places but so far I've only seen the first one occur).
However, with MsgDB:5 set, I see no .msf deletions after a long time. Without it set and just IMAP:5 set, the error occurs after about 4 or 5 minutes usually (according to printf's I added to the tb code).
So kind of looking like the problem is somehow timing related. With MOZ_LOG=MsgDB:5,timestamp,sync set, windows is acting like what I see on linux, so far.
See https://searchfox.org/comm-central/rev/20d0dc6c8e767c9059ce80ac154b7df81a8f120e/mailnews/db/msgdb/src/nsMsgDatabase.cpp#1216

It appears with windows you maybe can't have more than 512 msf files open. TB appears to be trying to open them all with windows and runs into trouble due to too many open. I typically see problem with files gds500.msf to gds999.msf with win10 (as you say "top of the tree?). Files get deleted since they can't be opened and then they can't be recreated, per existing MsgDB logging and some extra logging I added.
With linux it appears that each of the >1000 .msf files is opened and then closed so you don't have a huge number open at the same time and no "invalid" database files (.msf) are detected and then deleted.
This based on logging with MsgDB:5 on windows and then on linux.
I also turned off "check all folders for new mail" on win10 and still saw .msf files going away, typically above gds500.

Nils, if you want to test with the additional logging for MsgDB:5 I now have a windows-64 "try" server build here:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=12e750d8aabf305154375969f87cf0bab5aa9bc1&selectedTaskRun=ZX8U7_xBTsqc0josXu5AEg.0
You can access the window installer by clicking on the "B" next to "x64 opt" and down below you click "Archive and Debugging" and then select target.installer.exe or you can just use this direct link to target.installer.exe:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ZX8U7_xBTsqc0josXu5AEg/runs/0/artifacts/public/build/install/sea/target.installer.exe
This just produces a daily (100a) version with my modifications to one file that you can see by clicking on the number that starts with 12e7 at the first link above. I added more detailed logging that are strings beginning with "gds: " in some added MOZ_LOG calls. Originally I just made these printf's but easier to tell what and when stuff happens when in the MsgDB log.

Ben, I don't know if you have been following this. Do you have any ideas on what the problem is?

Flags: needinfo?(benc)

FWIW in linux I just saw what looked like all my msf's open (more an 1200) simultaneously. They stayed open for a while but eventually they all suddenly closed (per the log) back to just a few open (~3) and no msf's were removed.

(In reply to gene smith from comment #151)

It appears with windows you maybe can't have more than 512 msf files open. TB appears to be trying to open them all with windows and runs into trouble due to too many open.

With linux it appears that each of the >1000 .msf files is opened and then closed so you don't have a huge number open at the same time and no "invalid" database files (.msf) are detected and then deleted.

eek... I don't have any instant insight on this except to note that it seems very odd that there's a difference between platforms here...

An aside:

.msf filehandle management seems to be serious voodoo. The code is absolutely riddled with little hacks to indirectly close/delete/reparse database files, and it's not always clear what the original intention of the hacks are. Lots of VCS archeology.
As you mention in Comment #138, the plan is to replace the Mork DB with something else, so there's not been a lot of effort expended in really understanding all those layers :-( I actually think that replacing mork with a different database in will be relatively easy - the hard part is going through all the places which have open/close/delete/reparse hacks and work out what they are trying to achieve... then doing it in a more explicit and obvious way.
I just wrote up Bug 1760925 as a place to figure out the filehandle management. Not an immediate help on this bug, but hey.

regarding the number of files open in windows...does this help?

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=msvc-170

(In reply to David Taber from comment #154)

regarding the number of files open in windows...does this help?

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=msvc-170

Thanks. Yes I saw that too. I'm not 100% sure that 512 is the default open file limit for windows 10/11 but, if so, looks like _setmaxstdio function would have to be added to window specific file handling code for mozilla. Per searchfox, it's not there now. So probably a hard sell. Also, sounds like _setmaxstdio increases the limit for the whole system and not just a single process.

On linux ulimit -Sn shows 1024 as the open file "soft" limit. But sometimes I see 1218 msf files open with no problem observed. So maybe the "soft" limit has some +/- tolerance?

See Also: → 1760925

Ben, Thanks for the info. I need to retest on window but it looks like I only see the problem starting at 93b1 from what I found in comment 134. This was when panacea.dat was changed to folderCache.json. Do you have any idea why that would affect this?

Richard, I know you have a lot of folders (probably way more than 1000). Do you see re-downloading or folders being inaccessible (error string: "Unable to open the summary file for <folder> on <account>. Perhaps there was an error on the disk or the full path is too long.",
especially with releases higher then 91 (the recent betas or dailies)? Thanks!

Flags: needinfo?(richard.leger)

Thunderbird 99 has 1,713 handles for me on win10.

I know Mac folks hit a limit as well - Bug 855751 and friends - TB forgetting folder view and display prefs after running out of file handles on Mac. (also on linux with similar config) "The file XXXX could not be opened. ..."

(In reply to Wayne Mery (:wsmwk) from comment #157)

Thunderbird 99 has 1,713 handles for me on win10.

I think the 512 default limit for windows is actually per process and only applies to files opened via "C Runtime Library". So you can have a lot more than 512 handles but spread over all the many processes. I don't know if moz/tb opens mork files via C Runtime or through some other library.

I still need to look at windows again, but apparently Mr. Bienvenu didn't expect more than about 50 msf files to be open at a time:
https://searchfox.org/comm-central/rev/9bcc0f011f2f38c2ad1104daaf5b6b7b2ac2aeca/mailnews/db/msgdb/src/nsMsgDatabase.cpp#901
I typically see less than 50 open but sometime all msf file (about 1257) become open for a while, based on my modified logging.

I know Mac folks hit a limit as well - Bug 855751 and friends - TB forgetting folder view and display prefs after running out of file handles on Mac. (also on linux with similar config) "The file XXXX could not be opened. ..."

I found something similar for mac where they run tb from a bash command and set the "ulimit" value to increase handles: bug 800279

(In reply to gene smith from comment #156)

Richard, I know you have a lot of folders (probably way more than 1000). Do you see re-downloading or folders being inaccessible (error string: "Unable to open the summary file for <folder> on <account>. Perhaps there was an error on the disk or the full path is too long.",
especially with releases higher then 91 (the recent betas or dailies)? Thanks!

Yes, in recent beta I do encounter inability sometime to time to get a list of message in folders.

This is what I get in TB 98.0b3 (64-bit), as reported recently: Bug 1593039 Comment 20

Same issue occasionally in TB 99.0b2 (64-bit) on Windows 10 Pro x64 with IMAP mailbox working online (Keep messages... option unticked).
If I click on other folder around the one I get the error with either parents or children I get the same error...

I usually work around the problem by clicking on the Inbox folder... then it works again... when I click back on any folder with which I had the issue... I am wondering if it could also be due to authentication...
Maybe something that is triggered when pressing Inbox but not a sub-sub-sub folder? For now I have really looked into the issue because I have not got too much time to do so and I can fix quickly... hoping it would be fixed at some point... while keeping up-to-date... :-)

But if you need something specific, let me know. Next it happens, I'll try gather console errors at the least... if that can help.

Flags: needinfo?(richard.leger)

(In reply to gene smith from comment #159)

(In reply to Wayne Mery (:wsmwk) from comment #157)

Thunderbird 99 has 1,713 handles for me on win10.

I think the 512 default limit for windows is actually per process and only applies to files opened via "C Runtime Library". So you can have a lot more than 512 handles but spread over all the many processes. I don't know if moz/tb opens mork files via C Runtime or through some other library.

per process, yes. but most of TB is under a single process.

I still need to look at windows again, but apparently Mr. Bienvenu didn't expect more than about 50 msf files to be open at a time:
https://searchfox.org/comm-central/rev/9bcc0f011f2f38c2ad1104daaf5b6b7b2ac2aeca/mailnews/db/msgdb/src/nsMsgDatabase.cpp#901
I typically see less than 50 open but sometime all msf file (about 1257) become open for a while, based on my modified logging.

I am familiar with this. You will want to look at mail.db.max_open, mail.db.idle_limit and mail.db.timestamp_leeway as related items.
afaik yes, it is possible to blow past the limits - I don't remember all the details - but startup is one of those times IIRC. And this is were the folder cache (panacea) comes into play.

virtual folders (and thus also unified folder) can have an impact in this area. (see bug 855751 comment 53, bug Bug 717574 , etc)

lastly, FWIW, the current limits are I think unrealisticly low, given today's mail environment with proliferation of accounts and folders. This harkens back to bug 589310 comment 36. As a result, some users experience poor performance, and must twiddle with mail.db.max_open, and mail.db.idle_limit. Long gone are the days of computers with 1-2gb of memory (when this framework was coded), when few users had more than 2-3 accounts and less than 50 folders.

And, it wouldn't suprise me if there were not a bug in the overall logic of the implementation of these limits.

Thanks Richard. Sounds like you are seeing the same thing.
I make a quick change in the "netwerk" win area to increase the number of open files:

diff --git a/netwerk/system/win32/nsNotifyAddrListener.cpp b/netwerk/system/win32/nsNotifyAddrListener.cpp
--- a/netwerk/system/win32/nsNotifyAddrListener.cpp
+++ b/netwerk/system/win32/nsNotifyAddrListener.cpp
@@ -679,17 +679,18 @@ nsNotifyAddrListener::CheckAdaptersAddre
  * address, sets mLinkUp to true.  Sets mStatusKnown to true if the link status
  * is definitive.
  */
 void nsNotifyAddrListener::CheckLinkStatus(void) {
   DWORD ret;
   const char* event;
   bool prevLinkUp = mLinkUp;
   ULONG prevCsum = mIPInterfaceChecksum;
-
+  printf("gds: call _setmaxstdio\n");
+_setmaxstdio(1500);
   LOG(("check status of all network adapters\n"));

Since this only runs once at tb startup only for windows, seems like an OK place to call the function for testing.
With my detailed logging I now don't see problems when TB attempts to hold open multiple .msf files. The count of the number of files in the .sdb container directory remains constant at 1000 for the sub-folder I created having 1000 folders. I'm capturing and logging the max number of open folders and now see over 1000 open in windows with no problems. Without the function call to _setmaxstdio I saw failures when the max open number reached about 470.

I've only been testing with normal folder setups, i.e., no virtual folders or search folders. I also think reporter Nils is also just using the basic folder setup.

Wayne, I don't completely understand mail.db.max_open and friends. It doesn't appear to be a "hard" limit but just an indication or suggestion that TB should "think about" closing some DBs when the limit is exceeded. Maybe that's what you mean by "blow past the limit" above. What I see is the max number of simultaneously open DBs gradually increases over time but eventually comes back down. Right now the max is 1088 but a few minutes ago the open number was just 4 (now it's 819). This is after running for over 2 hrs in windows with the handles increased. I have no idea if what I see is really related to panacea/folderCache changes (lot of other changes at 93b1 per the pushlog). Also, right now I'm running with mail.server.default.check_all_folders_for_new = true and check for new mail set to 1 minute so this causes a lot of DB/msf activity and logging of MsgDB:5 stuff.

Great that the debugging is turning up a few causes! The one thing that bothers me about Wayne's comment is that the issue is showing up throughout the usage, not just at startup. So maybe later the indexing / compacting / mailcheck / whatever are also opening a lot of handles at once?

(In reply to Nils from comment #163)

Great that the debugging is turning up a few causes! The one thing that bothers me about Wayne's comment is that the issue is showing up throughout the usage, not just at startup. So maybe later the indexing / compacting / mailcheck / whatever are also opening a lot of handles at once?

I can confirm issue I have observed may haapen randomely at any point in time (startup or much much later).
My subfoldrrs contains a small number of emails which I don't think are re-indexed each time I click on them. Though it is likely doing a mail check to sync changes with the server. As for compacting could not say because it is transparently done in the background. Though as I don't receive email directly in sub-folder nor content changes very often and only manually, I would be very surprised it may be the case on the folder I click. Unless you refer to a processing happening within the entire mailbox. But also why Inbox is not affected? Why clichking on Inbox and back on the folder helps work around the issue?

Anyway Gene I am glad you found a way to debug and thanks for attempting to fix, with a bit of luck your changes will do the trick ;-)

Based on my improved MsgDB:5 logging, it appears that the number of open DBs is always low at first but always gradually increases over time. So once the open file limit (default 512 for windows) is reached and TB tries to open another DB it gets an error. The error causes the un-openable DB (msf file) to appear bad so TB deletes the file and then tries to create and re-open it which also fails. This happens for all the msf files above 512, so they end up deleted.

However, sometimes most of the DBs get closed which releases the handles and then you sometimes see the situation where each file is opened (with the "create" option if non-existent), operated upon and then closed so the number of open files doesn't exceed 512 and then all missing MSFs exists again.

This appears, just looking at the log, as a totally random process. Turning off "check all folders for new mail", and even autosync doesn't really affect it. I still see over 1000 open DBs when I have the _setmaxstdio set to 1500 on window (but no Open DB errors occur with that set). Then I saw with autosync turned off over 1000 DBs open and are never closed.

Anyway Gene I am glad you found a way to debug and thanks for attempting to fix, with a bit of luck your changes will do the trick ;-)

I'm not sure that increasing the limit on the number of open files is really a solution. I would like to know why so many DBs end up opened at the same time. I think that's the real problems. I'm not sure how much this is caused or influenced by the panacea/folderCache transition or some other change at 93b1. All I know is that with 92b5 I didn't see msf files disappear with windows but I didn't run it with MsgDB:5 default logging or with my "improved" logging (re comment 134).

(In reply to gene smith from comment #162)

...
Wayne, I don't completely understand mail.db.max_open and friends. It doesn't appear to be a "hard" limit but just an indication or suggestion that TB should "think about" closing some DBs when the limit is exceeded. Maybe that's what you mean by "blow past the limit" above. What I see is the max number of simultaneously open DBs gradually increases over time but eventually comes back down. Right now the max is 1088 but a few minutes ago the open number was just 4 (now it's 819). This is after running for over 2 hrs in windows with the handles increased.

Yeah, bad wording on my part. max_open is there mainly to control memory usage IIRC, not for minimizing the number of file opens for its own sake.

That said, in my experience with users' problems (not my personal experience), high memory usage in Thunderbird creates performance problems in non-paging environments that I have never understood. Perhaps a high burden from gecko garbage collection ensues? But I really have no idea - never investigated it.

I have no idea if what I see is really related to panacea/folderCache changes (lot of other changes at 93b1 per the pushlog). Also, right now I'm running with mail.server.default.check_all_folders_for_new = true and check for new mail set to 1 minute so this causes a lot of DB/msf activity and logging of MsgDB:5 stuff.

indeed. check_all is not a performance option :) or at least that has been the party line.

bug 723248 implemented max_open and https://mzl.la/3JG65w8 hits many of the bugs since, including bug 939462 regression (but I think there are more) which shows some fagility and perhaps lack of tests.

Further reading in bug 347837 comment 2 and bug 787557 comment 2.

I bit the bullet and went ahead an un-patched/backed out the conversion from panacea.dat to folderCache.json : https://hg.mozilla.org/comm-central/rev/5891b788e8bd (except for unit tests) on my daily build a 97.0a. After rebuild, It causes a lot of header re-downloads but no a big deal for me since I mostly use no offline store. Anyhow, so far I only see a max simultaneous open of 29 DBs, even after lots of updates due to switch from folderCache.json back to using panacea.dat.
I'm testing this on linux but I see the same potential problem on linux as on windows but apparently linux allows over 1000 open files by default. But I'm not not getting anywhere near that with panacea.dat (29). With folderCache.json in effect, on linux (and windows with _setmaxstdio) I eventually see over 1000 max simultaneously open DBs and sometimes they all remain open.
I've been letting it run with panacea.dat now for about 2 hours so maybe I'll see more then 29 open at same time eventually. Right now it is showing only 2 simultaneously open.

If caused by Bug 418551 - Convert nsFolderCache from mork (panacea.dat) to JSON-cpp - then for beta the regression started 93.0b1

Is there anything to test at this point, or is the issue narrowed down enough that a dev needs to look at it? I was busy last week, but can look at things again this week if needed. On the other hand, I could also use a working TB (91) that doesn't keep slowing my machine down...

(In reply to Wayne Mery (:wsmwk) from comment #168)

If caused by Bug 418551 - Convert nsFolderCache from mork (panacea.dat) to JSON-cpp - then for beta the regression started 93.0b1

From what I see with my test setup to simulate Nils' setup (1000 mostly empty folders on top of my existing ??? folders) I don't see the problem with panacea.dat in use by backing out patch for bug 418551 from my 97a daily.
I have no idea why folderCache.json holds open many more DB file handles than panacea.dat, however even with panacea.dat there can be a too many open handles on all platforms, as has been reported in previous bugs.

A partial work-around is to call the _setstdiomax() with a bigger value for windows. Maybe a new pref could be added to allow this to be manually set above the OS default limit and could apply to all platforms (call ulimit() for linux and mac, I think). However, don't know where this should occur in the code base and I suspect it might need to be somewhere in platform specific moz-central code.

(In reply to Nils from comment #169)

Is there anything to test at this point, or is the issue narrowed down enough that a dev needs to look at it? I was busy last week, but can look at things again this week if needed. On the other hand, I could also use a working TB (91) that doesn't keep slowing my machine down...

Since I can pretty much duplicate the problem you see, I don't know how useful it would be for you to do more testing. I could maybe provide you with a "try" build of the latest daily that increases the number of open handles so you don't see the bug. I guess it might be good to know if that definitely fixes the problem for you. Let me know and I'll produces the build. Otherwise, if you don't want to do that, I guess going back to 91, for now, would be best.
Thanks for all your work on this!

If you can do that, sure, I'll test. I am running 99b2 right now... would you patch that, or would I have to downgrade to 97? I would rather stay with a 99 build since it fixes some teams calendaring bugs that I was seeing earlier.
Thanks to you for bisecting and nailing it down! I would suggest closing Bug 1759355 as a dupe of this since it seems pretty sure to be the same thing if you agree.

Here's a try build with today's daily 100 something I think. So it should have the fixes you want in 99b2.
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=c62b744001d30ce670550e5ecd498fe87545199e
Comment 151 describes how to download this build. All this does is update the logging some from the build of comment 151. Just set MOZ_LOG to MsgDB:5,sync,timestamp and when you see a log line start with "gds:" it's an error of some sort. Mostly you should see database files getting Open() and Close() and in the brackets you will see number of currently open files and, in parens, the maximum number of simultaneously open files. So once you start seeing the number of currently open files exceed about 500 you should start seeing errors and DBs will be removed and attempted to be re-created and that fails too, until files get closed, then they come back (usually).

I also want to add in the call to _setstdio() in the next try build, but I need to figure out how to do that since the "try" server is for the comm-central (tb) code and I need to "push" the mozilla code. (I was told a while back how to do this but I don't remember. Should have made notes...)

See below
Here's how you make a "try" build with a modification of mozilla code:
bug 1657068 comment 12. So basically you have to do a push to try server for mozilla first with the mozilla mod in place. Then you tell the tb build (in comm/.gecko_rev.yml -- a hidden file) which try server and try commit to use to build tb. FYI, for the record, here's the push and build of the mozilla code (you don't need this):
https://treeherder.mozilla.org/jobs?repo=try&revision=051be40a6bf0077b26b6405543d8ddfd2b93e405

Here's the TB build with the improved logging AND with the file handle increased to 1500:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=56f7de3264cce847fdecc0c0fcff00fb2e44c6c0

I'm running this right now (daily shows version "100.0a"). This still running folderCache.json and NOT panacea.dat and I am seeing max count of open DBs as 915 and don't see any .MSFs getting removed. I'm running with "check all folder for new mail" and checking new mail every 1 min.

In the log when a DB is opened you will see, for example, this:
nsMsgDatabase::Open(numOpen(max)=703(1085),... followed by the msf path and other stuff.
This means there are currently 703 simultaneously open DBs and since startup, the max number of simultaneously open DBs is 1085. (Gone up some from above but still no MSF removals errors since haven't exceeded 1500 open.
:
:
Well, doing some more testing I see a problem. Connection is being dropped from imap servers and I don't know what is causing it. So you should probably hold off on running this for now.

The problem was with my "self-signed" certificate I use to access my local dovecot. I think dovecot was updated since I last accessed it and so I had to click "get mail" and confirm the exception. So should be OK to test using the link above. (I saw the same problem even with release 91, so not a "daily" regression.)

I've notice when I access a folder that hasn't been accessed in a while I will see an attempt to first access the folder from a "temp" directory like this: nsMsgDatabase::Open(numOpen(max)=804(804), C:\Users\gene\AppData\Local\Temp\MozillaMailnews\gds987.msf and this always fails. This appears to produce a "gds:" error but it's just trying to removing a non-existent file in the temp directory. Then it goes ahead and opens the proper file: nsMsgDatabase::Open(numOpen(max)=805(805), C:\Users\gene\AppData\Roaming\Thunderbird\Profiles\ew29pz1v.default-release\ImapMail\wally.dbnet.lan\4.sbd\rubish.sbd\baskette.sbd\truck.sbd\big-truck.sbd\gds987.msf.

You may also see a few "gds:" errors at first if there are any corrupt or maybe orphan DBs files that get re-created. You may also see any folder that were not present when you last shutdown tb (due to this bug) be re-created (you'll see Open() parameter create=true in the log).

Gene,

Why folderCache.json holds open many more DB file handles than panacea.dat, and why they are not closed, remain a mystary, I could not find out either.

Does an nsMsgDatabase::Close exists (similarly to a fclose for fopen), that is either not used to close an opened DB file when no longer needed or not properly closing the file without raising an error? Maybe is such function is used it, it is not applied to the righ file pointer, or run via a different process or thread that do not have access to the opened file pointer?
If such function exists it could be interesting to look at what it closes, and if it succeed doing so...

Or no such function exists and the Garbage Collector (or else mechanism) is relied on for this task of closing opened file no longer needed and it never does it (thinking the file keep being used), do it via the wrong process and fails silently or di it but not fast enough?
Are DB files are opened faster than they are closed, if closed at all?

I think with your tests to reproduce and logging process you are on the path to discover why DB files are opened and not close before reaching a limit, which seems the root cause of the issue.
While your fix maybe a nice workaround to temporarily fix the issue due to reaching a limit set, by extending the limit, a follow-up bug may need to be raised to investigate why the limit is reached in the first place.
I suppose it may be your plan unless it is figured out altogether here.

Richard, It looks like the close of unneeded DB occurs at the javascript level here: https://searchfox.org/comm-central/rev/5d5a8435a8f71f844985fd3048f6e6c61f90fd36/mailnews/base/src/MsgDBCacheManager.jsm#115

I think the idea is if more then 30, mail.db.max_open, are open they will be closed. But it will allow more than 30 to remain open if they are associated with a "window". My guess, without proof, is that with folderCache.json the open DBs are incorrectly associated with a window (even though there are definitely not sometimes >500 folders open in a window). But with panacea.dat the false association with a window does not occur as much.

I suppose it may be your plan unless it is figured out altogether here.

I don't have a lot of plans :).

Just updated to TB101b1, issue is still present.

I really hope someone can fix this it is really annoying and slows the system to a crawl periodically.

Nils, Did you ever try running the "try" build pointed to by the 2nd "treeherder" link in comment 173 above? If so, did it make a difference?
Anyhow, it now appears that the "try" build outputs have been deleted from the try server due to time limits on storage.

I haven't looked at this in over a month and it's really not something I completely understand or have worked with in the past. Maybe the developer who did the conversion from panacea to foldercache should look at this?

Flags: needinfo?(vseerror)

No, real life / work got in the way for a few weeks. Not sure how to get that developer to look at this... I added a comment to bug 418551 @BenC seems to be the one who did the work, and he is CCed on this bug. and commented https://bugzilla.mozilla.org/show_bug.cgi?id=1726319#c153 and is tagged in the needinfo

Keywords: regression
Regressed by: 418551
Flags: needinfo?(vseerror)
Flags: needinfo?(benc)

(In reply to gene smith from comment #178)

Nils, Did you ever try running the "try" build pointed to by the 2nd "treeherder" link in comment 173 above? If so, did it make a difference?
Anyhow, it now appears that the "try" build outputs have been deleted from the try server due to time limits on storage.

Gene, can you do the try build again, assuming this is a good time for Nils to quickly pick it up?

I haven't looked at this in over a month and it's really not something I completely understand or have worked with in the past. Maybe the developer who did the conversion from panacea to foldercache should look at this?

That's still Benc, as Nils pointed out

Flags: needinfo?(lohner)
Flags: needinfo?(benc)
OS: Windows → All
See Also: → 1762327

Yes, I would have time to install / run it

Flags: needinfo?(lohner)

(In reply to Nils from comment #181)

Yes, I would have time to install / run it

Ok, sorry my usual late start but I'll re-do the "try" build right now and let you know when it's ready. Thanks!

The build is running here: https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=8ff8a53012ec0661f1182ff1bc2df90f66538591
Not sure it will succeed and I haven't retested it since last time. If it doesn't work, I'll try again but now some storms are moving through the area so thought I'd go ahead and post the link in case power goes off here.

Ok, the build finished and seem to work with preliminary testing. Click the green "B" and under "Artifacts and Debugging Tools" download target.installer.exe and install custom so it goes to the Daily install directory (or where ever you want).
Probably need to define env var MOZ_LOG with "MsgDB:5" to to record the results.
This should now allow up to 1500 open files in windows instead of default of about 500.

OK, downloaded and installed. What is the easiest way to point it at my current profile? Also, can I go back to 101b1 if something else does not work here?

(In reply to Nils from comment #185)

OK, downloaded and installed. What is the easiest way to point it at my current profile?

https://support.mozilla.org/en-US/kb/using-multiple-profiles

Also, can I go back to 101b1 if something else does not work here?

You should be OK probably for a couple weeks (As long as beta hasn't diverged too much and hasn't migrated some data to a different format. I'm not aware of any migrations coming up)

Flags: needinfo?(nicolopiazzalunga)
Flags: needinfo?(nicolopiazzalunga) → needinfo?(lohner)

(In reply to Nils from comment #185)

OK, downloaded and installed. What is the easiest way to point it at my current profile? Also, can I go back to 101b1 if something else does not work here?

FWIW, I run with a command line like this: MOZ_LOG=MsgDB:5,timestamp,sync MOZ_LOG_FILE=~/tblog ../obj-x86_64-pc-linux-gnu/dist/bin/thunderbird --allow-downgrade -p
so I can move to an different TB version without creating a new profile (--allow-downgrade) and show the profile selection/creation prompt on startup (-p) and to log the message DB commands (MsgDB:5). Probably no need to log with IMAP:5 for your testing purposes of this bug.

I'm running it (102.0a1 (2022-05-06) (64-bit)) now, for now its slow because its downloading mails from previous corruptions. I'll let it catch up and then see if re-corrupts.

One other thing... sometimes it will re-download one of the folders blazing fast, other times crawling all the way, still other times it will start fast and then slow down after a few hundred messages. I wonder if that is part of this bug, or if that is because an indexer, folder compacter, mailcheck, whatever is happening during the redownload leading to multiple parallel accesses of the .msf ... I wonder if it should be locked if there is a complete redownload.

Please let me know what you think, I'm tempted to file a new bug about this since I've seen other people comment on the fast-then-slow downloading of mails in one folder before.

Do you have 100s or 1000s of folders to re-download? If so, are you selecting them to trigger the download or just staying at one folder and letting TB's "autosync" trigger the download to the others? I'm not sure if this makes a difference but with many folders re-syncing in parallel some may be faster than others.

I have about 700 folders total. Sometimes I retrigger the download by clicking on the folder, sometimes it happens automatically (autosync). The one yesterday was autosyncing. I haven't found a correlation to when it is fast / slow, but a "single folder resync and nothing else" option could drastically speed things up, esp. given the mailserver is local.

To answer the second part of the question, small folder seem to download faster, the large ones I see a lot in the status bar at the bottom autosyncing.

So after a little over 24h of running it seems to be OK. Initially there were redownloads to fix the previous corruption / deletion, but since then everything has been stable. I would say that Gene's solution qualifies as a workaround, but obviously a real fix is still needed.

Flags: needinfo?(lohner)

Hi Nils, that's good to hear. Have you been running with MsgDB:5 logging? If so, I think my logging additions should record the number of currently open files and the maximum number of simultaneously open files since startup. I'd be curious to know what that shows with your 700 folders.

Oops... I set the debugging options but forgot to enable the logfile itself : ) Just restarted with logging.
From looking just now, it went up to the 60s within the first minute, peaking just under 600. I'll post a log with the folder names cleaned ( sed 's/C:.*[^ ]/FOLDER/' FC_logging.log.moz_log )

Attached file MsgDB_cleaned.log (obsolete) —

sed 's/C:.*msf/FOLDER/' FC_logging.log.moz_log > MsgDB_cleaned.log

After looking again today, 594 seems to be the "max" number I get. What surprises me is the open / close sequence... in the logfile you sometimes have a mix of open/ close where the open number stays stable, and you have a ton of opens where the closes happen a lot later, these are what leads to the problem. Can you in any way log what leads to the openings / closings? Identifying that could be a quick /dirty but fully adequate fix for this.

Flags: needinfo?(gds)

Looking at the past few days, it never goes above the 594 I mentioned list comment. It does hit that number again every now and then though. Is there a way of figuring out what is driving the open a ton of folders / keep them open a while / close a ton of folders / then resume normal open-a-few, close-a-few operations. If you want the full log let me know.

Flags: needinfo?(gds)
See Also: → 1396655

max open remains at 594 even over the course of several days... question is why there are sometimes "open" burst with a lot of folders being opened, that then remain for quite a while and then are later closed in bulk. If they would open and close, or not open as many at once the problem would not occur.

Attachment #9275940 - Attachment is obsolete: true

adding :aceman to CC ... maybe he has ideas on how to debug further

Nils, sorry for the delay in getting back to you on this. I have been working on it and I've found where the many "opens" are occurring. It's in an area I've dealt with very little in the past -- the code that "purges" out old messages if the user has configured it. Every 5 minutes the code looks at each folder to see if the message retention settings have been changed by the user. The information is in the FolderCache.json file so the code looks there first. Specifically it's looking for a folder string attribute called "LastPurgeTime" here: https://searchfox.org/comm-central/rev/48468049ddceab57193e15c3ef92ec1b59fef59e/mailnews/base/src/nsMsgPurgeService.cpp#169. Almost every place "LastPurgeTime" is found in FolderCache.json it has the value: "LastPurgeTime" : null,. GetStringProperty() here https://searchfox.org/comm-central/rev/48468049ddceab57193e15c3ef92ec1b59fef59e/mailnews/base/src/nsMsgDBFolder.cpp#2003 fails when it tries to read from FolderCache.json so it then opens the database to try to find the value for LastPurgeTime. It doesn't find it there either and the database file remains open.

I don't know why the "null" value for LastPurgeTime is even present in the FolderCache file. Some folders don't have it at all and they don't cause a the database file to get opened, only the ones with "LastPurgeTime" : null, cause the database file to become opened.

I found maybe 3 ways to "fix" or work-around the problem:

  1. Edit each "LastPurgeTime" : null, line to have real date some time in the past, e.g., "LastPurgeTime" : "Mon May 16 00:51:23 2022",
  2. Remove each LastPurgeTime : null, line. Not sure if they don't eventually come back, however.
  3. In the code, just ignore the failure to read from cache due to the "null" value when propertyName is "LastPurgeTime" like this diff:
diff --git a/mailnews/base/src/nsMsgDBFolder.cpp b/mailnews/base/src/nsMsgDBFolder.cpp
--- a/mailnews/base/src/nsMsgDBFolder.cpp
+++ b/mailnews/base/src/nsMsgDBFolder.cpp
@@ -2006,18 +2006,21 @@ nsMsgDBFolder::GetStringProperty(const c
   nsCOMPtr<nsIFile> dbPath;
   nsresult rv = GetFolderCacheKey(getter_AddRefs(dbPath));
   if (dbPath) {
     nsCOMPtr<nsIMsgFolderCacheElement> cacheElement;
     rv = GetFolderCacheElemFromFile(dbPath, getter_AddRefs(cacheElement));
     if (cacheElement)  // try to get from cache
       rv = cacheElement->GetCachedString(propertyName, propertyValue);
     if (NS_FAILED(rv))  // if failed, then try to get from db
     {
+      // temp test gds: Ignore error and don't open DB. propertyValue stays blank.
+      if (!strcmp(propertyName, "LastPurgeTime"))
+        return NS_OK;
       nsCOMPtr<nsIDBFolderInfo> folderInfo;
       nsCOMPtr<nsIMsgDatabase> db;
       bool exists;
       rv = dbPath->Exists(&exists);
       if (NS_FAILED(rv) || !exists) return NS_MSG_ERROR_FOLDER_MISSING;
       rv = GetDBFolderInfoAndDB(getter_AddRefs(folderInfo), getter_AddRefs(db));
       if (NS_SUCCEEDED(rv))
         rv = folderInfo->GetCharProperty(propertyName, propertyValue);
     }

I'm sure 1 and 2 are not acceptable fixes. I'm unsure about the patch and think there is certainly a better way to fix this problem. But don't know what it is.

Another way to work around the problem is to "disable" the purge timer. It appears that setting this pref does it:
mail.purge.timer_interval = -1 and also maybe mail.purge.min_delay = -1. Not sure if this disables the timers or just makes them very long.

Thanks for jumping down the rabbit hole!
I'm going to just think out loud here...

  1. Why would this problem start occurring after the move from panacea.dat to folderCache.json? Maybe the LastPurgeTime is not set / read / written to the file?
  2. If the folder has not been purged, then why should it remain open? I would either check if it needs to be purged and run the purge, or just close it again (which should set the LastPurgeTime)
  3. Googling a little I found https://www-archive.mozilla.org/mailnews/arch/spam/purge from 2003 (...) so maybe a look at the purge service (is it doing something? if not, that would explain why everything remains open so long...) might give a hint (more rabbit hole?)
  4. Frankly, considering where you are applying the patch I think it's in the completely wrong location. This feels more like a "hopeful band-aid" than a fix
  5. Looking at the calling code, I would return NS_NOT_OK (or whatever the negative variation of NS_OK is) to trigger line 171 if (NS_FAILED(rv)) in the caller
  6. Set the last_purge_time to something older ... maybe a year ago so that it is not NULL? ... that might require more understanding of the purging process etc. though to be sure it's OK

That's all I can think of for now

See Also: → 637352

Nice find! Is this a regression on that, or a case that wasn't covered?

I don't think that is a regression. Anyhow, still can't figure out where "LastPurgeTime" gets set to null in the FolderCache.json.

See comment 200 for more details.
I don't know that this is the best fix for the problem but it does work. The problem is that with default "retention setting", where all messages are retained and none are purged on the default 8 hr interval, there appears in folderCache.json (a.k.a. cache file) for each folder an element "LastPurgeTime" : null,. When this is read on the purge polling interval (default 5 min.) an error occurs when read from folderCache.json and then an attempt is made to read it from the folder's database file. This ends up possibly opening many folders when the user has a lot of folders -- sometimes more than the OS limit, e.g., 512 for windows. This causes the database file (e.g., folder.msf) to be deleted and then causes re-downloads.

So rather than trying to open the DB file when LastPurgeTime fails to read from cache, the patch just goes ahead and allows the purge process to continue and attempt the purge (if folder not using default retention settings) which sets a valid time string into cache file for LastPurgeTime so on subsequent intervals LastPurgeTime is successfully read from cache and the DB does not need to be opened to read LastPurgeTime.

I think maybe a better fix would be to set the initial value of LastPurgeTime in the cache to the current time. However, I haven't been able to find where or how the initial value of LastPurgeTime is set to null in FolderCache.json. Also, FWIW, on initial creation of FolderCache.json, LastPurgeTime is not present at all for all folders and only appears a few minutes later with a null value. It seems to somehow originate from the DB file (folder.msf) but I also can't find how that happens.

Assignee: nobody → gds
Attachment #9277044 - Flags: review?(benc)

From your explanation this makes sense, and I think this is a good solution as it eliminates potential future problems if the initialization ever changes, fails, or whatever. This makes it more robust, and if the purge happens and the current time is set upon a LastPurgeTime=NULL then it fixes a potential error state.

That said, maybe an additional bug is hiding in the (non-) initialization somewhere...

There was an error in the comment. I said it returns an error but the patch actually returns NS_OK. (Returning NS_OK allows the check for need to purge and actual purging to occur even when "LastPurgeTime" is not present or null.)

I also changed other wording in the comment to maybe be clearer. Otherwise, there are no functional changes from the original patch.

Note: LastPurgeTime will get set to a valid time string only if and when the folder is purged, depending on user's retention settings. Also, it appears the LastPurgeTime is only put in the cache file, with an initial null value, if it has been attempted to be read from cache but only after "useServerRetention" has been placed in the cache file.

Attachment #9277044 - Attachment is obsolete: true
Attachment #9277044 - Flags: review?(benc)
Attachment #9277690 - Flags: review?(benc)

Comment on attachment 9277690 [details] [diff] [review]
1726319-fix-too-many-open-files.patch --v2

LGTM (even if we might be scratching our heads over such a specific exemption 5 years from now :- )

I wonder if the null LastPurgeTime is an artifact of the mork -> json migration? (i.e. my fault!)
Or maybe just some oddness that snuck into the original mork db, pre-migration.
Or maybe even a subtle difference in how the json-based GetCachedString() works, versus the old mork version. Hmm.

In any case, it's in the wild so we need to handle it, and this seems like a good work-around. The comment explains it all nicely.

Flags: needinfo?(benc)
Attachment #9277690 - Flags: review?(benc) → review+

I wonder if the null LastPurgeTime is an artifact of the mork -> json migration? (i.e. my fault!)

Folders that are enabled for purge (either by overriding the server specific setting or enabled for all folders for a server via server specific settings) seem to import from panacea OK (LastPurgeTime has a last time string and useServerRetention is 0 if overridden or 1 if following server settings). It's the folder that are not enabled for purge that have the null value LastPurgeTime in folderCache.json that cause the issue where the DB is opened for each of them and not immediately closed(*). But for these not enabled folders, the original panacea didn't have an entry for LastPurgeTime or it it also had an empty value. (I'm not good at reading mork files.) So not sure what else the import could do except maybe set the would be null LastPurgeTime to something, maybe the current time or just any string like "not-yet".

Anyhow, I tested again with import of an old panacea and it worked OK. After running with folderCache.json a while and seeing the proper purging going on for enabled folders, I then shut down tb and deleted folderCache.json and restarted. Now with panacea gone, folderCache.json was rebuilt and purge happened on enabled folders. The only minor issue I see is that "useSeverRetention" remains null for purge-enabled folders causing the db to be opened and then quickly closed but it's not a big problem. Ideally useServerRetention should get written to 1 or 0 in folderCache to provide faster access than opening the DB file to get the value.

(*) As an alternate fix I tried to do a close of the DB after trying to read LastPurgeTime from the DB. But with over 1000 folders in my test, the 500ms time was exceeded resulting in some folders never getting checked and possibly purged: https://searchfox.org/comm-central/rev/4705f5b39fd03ad28930ed62570a595f50c4dc5c/mailnews/base/src/nsMsgPurgeService.cpp#204

Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/aba71e533211
Fix for too many summary file DBs open simultaneously r=benc

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch

(In reply to Ben Campbell from comment #207)

I wonder if the null LastPurgeTime is an artifact of the mork -> json migration? (i.e. my fault!)

If you look at bug 1773822 comment #9 you can see that retrieving an attribute for all folders in the system just kills you.

Keywords: perf
Version: Thunderbird 98 → Thunderbird 93
See Also: → 672835
See Also: → 1839063
You need to log in before you can comment on or make changes to this bug.