Closed Bug 1912195 Opened 1 year ago Closed 1 month ago

Message headers are re-downloaded at every startup [regression from 115.13.0 to 128.1.0]

Categories

(MailNews Core :: General, defect)

Thunderbird 128
defect

Tracking

(thunderbird_esr140 wontfix, thunderbird144 wontfix)

RESOLVED FIXED
145 Branch
Tracking Status
thunderbird_esr140 --- wontfix
thunderbird144 --- wontfix

People

(Reporter: pmenzel+bugzilla.mozilla.org, Assigned: mkmelin)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, regressionwindow-wanted)

Attachments

(1 file, 2 obsolete files)

Steps to reproduce:

On a Kaby Lake Dell XPS 13 9360 with Debian sid/unstable, the package thunderbird was upgraded from 115.13.0 to 128.1.0:

upgrade thunderbird:amd64 1:115.13.0-1 1:128.1.0esr-1

Start the new version of Thunderbird.

Actual results:

The fan starts spinning and all message headers are re-downloaded. I am unable to see the latest message in my inbox.

Expected results:

All messages should be accessible in the inbox right away, and the fan should not start spinning.

Closing Thunderbird 128.1.0 and starting it again, it downloaded the message headers again, the fan started spinning, and I was unable to access all my messages during this time.

Nachrichten-Kopfzeile X von 38733 in Posteingang wird heruntergeladen…

Keywords: regression
Summary: Upgrading from 115.13.0 to 128.1.0esr results in all inbox files to be redownloaded → Message headers are re-downloaded at every start [regression from 115.13.0 to 128.1.0]
Blocks: tb128found
See Also: → 1912581

See bug 1912581 comment 2 and bug 1912581 comment 3 for initial discussion about this bug report.

Reporter Paul, So the re-download of header occurs on every start? How many imap accounts and type of imap server(s) do you have. Are most of the message in Inbox or do you have messages distributed across multiple folders? If so, about how many folders? Do all the folders require re-download when accessed? Do you let the download(s) finish before restarting TB?

An IMAP:4 log may help in determining what the problem is. You can find info on how to record the log here: https://wiki.mozilla.org/MailNews:Logging
FYI, on linux, to run TB and record the log I just run this in a terminal (assumes thunderbird executable is on the $PATH):
MOZ_LOG=IMAP:4,timestamp,sync MOZ_LOG_FILE=~/tblog thunderbird
The log goes into ~/tblog.moz_log.

(In reply to gene smith from comment #2)

Reporter Paul, So the re-download of header occurs on every start?

Yes, it does.

How many imap accounts and type of imap server(s) do you have.

Four IMAP accounts. What do you mean by “type”?

Are most of the message in Inbox or do you have messages distributed across multiple folders?

Around 428.000, the most, are in my inbox with the problem, and several thousand in other folders.

If so, about how many folders?

Over fifty folders in total in all three accounts?

Do all the folders require re-download when accessed?

It does not look like it. Only for the inbox.

Do you let the download(s) finish before restarting TB?

Yes, I do let it finish.

An IMAP:4 log may help in determining what the problem is. You can find info on how to record the log here: https://wiki.mozilla.org/MailNews:Logging
FYI, on linux, to run TB and record the log I just run this in a terminal (assumes thunderbird executable is on the $PATH):
MOZ_LOG=IMAP:4,timestamp,sync MOZ_LOG_FILE=~/tblog thunderbird
The log goes into ~/tblog.moz_log.

I am going to send you the URL per email.

Hi Paul,
Thanks for the log. I don't see anything obvious as to why all the headers get downloaded on each startup. I was hoping to see something like "uid validity" bad, but I don't.

Could you look at your files in your profile and see if any are greater than 4 GB? Since you don't use offline store there should be no mbox files (e.g., Inbox) but there should be a files called Inbox.msf (or <whatever>.msf) containing the headers/database for each folder. By any chance are these greater than 4 GB? Also, by any chance are you running 32-bit TB? I don't think that files greater than 4 GB are a problems with 64-bit TB but possibly are for 32-bit TB starting with 128esr (Re: bug 1911076, now fixed at least for mbox, not sure about .msf).

If no files are >4 GB or you are running 64-bit TB, can I ask you to run another log? It would probably be better to first disable all but the account containing the 400000+ messages in Inbox. You can do this in server setting by temporarily un-selecting/un-checking two items on all but the one account: "Check for new messages at startup" and "Check for new messages every X minutes". This will allow the log to contain just the relevant info for the "worst case" account having the largest number of headers to download. Do this before starting to record a new log.

When recording the new log, I want to ask you to add another parameter to the MOZ_LOG var, IMAP_CS:5. This is for imap CONDSTORE. Even though you don't have it enabled in TB preferences, this might provide more info in the log as to why headers are getting re-downloaded. In my example above, you can include IMAP_CS:5 like this:
MOZ_LOG=IMAP:4,IMAP_CS:5,timestamp,sync MOZ_LOG_FILE=~/tblog thunderbird
Thanks again!

(In reply to Paul Menzel from comment #0)

Steps to reproduce:

On a Kaby Lake Dell XPS 13 9360 with Debian sid/unstable, the package thunderbird was upgraded from 115.13.0 to 128.1.0:

upgrade thunderbird:amd64 1:115.13.0-1 1:128.1.0esr-1

I suppose this means the upgrade installed 64-bit TB. As I'm sure you know, you can check this with "Help | About" menu.

I suppose this means the upgrade installed 64-bit TB. As I'm sure you know, you can check this with "Help | About" menu.

Yes, it’s 64-bit: 128.1.0esr (64-Bit)

$ file /usr/lib/thunderbird/thunderbird
/usr/lib/thunderbird/thunderbird: symbolic link to thunderbird-bin
$ file /usr/lib/thunderbird/thunderbird-bin
/usr/lib/thunderbird/thunderbird-bin: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=c33364db892bc5a585b7fe3d8c6a4f823d3b2cf2, for GNU/Linux 3.2.0, stripped

PS: I am going to read and try to do the things in your other comment tomorrow.

Yes, that's what I thought (I should have checked comment 0 first). Anyhow, may still be good to know the size of the .msf for the largest folder containing the 480000 emails. Also, are you seeing "new" .msf files getting created on each startup indicated by the name, e.g., Inbox-X.msf, where X is 1, 2, 3.... etc?

(In reply to gene smith from comment #7)

Anyhow, may still be good to know the size of the .msf for the largest folder containing the 480000 emails.

It’s 853 MB.

Also, are you seeing "new" .msf files getting created on each startup indicated by the name, e.g., Inbox-X.msf, where X is 1, 2, 3.... etc?

lsof -pX shows no such files to be created/opened.

Ok, that sounds like a reasonable size for 400k messages, I think.
I guess it is still re-downloading on each TB start? If so, how long does it take to finish the download? Even if server is on localhost it will take, just guessing, 15 minutes for 400k headers (maybe more).
:
I looked again at the log you sent the link for. It doesn't actually show all of the 400k headers getting re-downloaded. It actually shows about 39854 headers downloaded and it takes about 3 minutes per timestamps. But I still can't tell why they are downloaded. It looks like they are only getting downloaded for the 400k msgs account, not the others which, per "select/exists" responses have 33, 2283, 1116 and 19145 messages in each account inbox.
What I'm seeing is that TB is fetching the headers for message with UID >= 484229. You can select to show a new column heading "Order Received" to see the UID of each message and then sort the messages list on that column. Do you see anything unusual about the messages starting with UID 484229 and higher which are the "newer" messages in Inbox?
TB is supposed to keep track of the highest UID seen and only fetch newer ones, and not re-fetch headers it already has.

@Benc, Ben, do know of any possible problems with storage in .msf files (no offline store) when a huge number of messages (e.g., > 400,000) are in a mailbox? (The most I have in any mailbox is around 40k and just for testing purposes.)

@Paul, If you can provide another log as requested in comment 4 (with added IMAP_CS:5) it might provide some info.

Flags: needinfo?(benc)

(In reply to gene smith from comment #9)

[…]

I guess it is still re-downloading on each TB start?

Yes, it does.

If so, how long does it take to finish the download? Even if server is on localhost it will take, just guessing, 15 minutes for 400k headers (maybe more).

Actually, it’s faster than I expected. Some days ago the counter increased by 399 messages, today by 405 messages, so maybe IMAP supports bulk downloading.

I looked again at the log you sent the link for. It doesn't actually show all of the 400k headers getting re-downloaded. It actually shows about 39854 headers downloaded and it takes about 3 minutes per timestamps.

Indeed, not all are downloaded. Today 40509.

But I still can't tell why they are downloaded. It looks like they are only getting downloaded for the 400k msgs account, not the others which, per "select/exists" responses have 33, 2283, 1116 and 19145 messages in each account inbox.

Yes, only for that one from what I can see.

What I'm seeing is that TB is fetching the headers for message with UID >= 484229.

Good find.

You can select to show a new column heading "Order Received" to see the UID of each message and then sort the messages list on that column.

That’s my default to not miss messages that took longer to be delivered, and would not show up at the top with date sorting.

Do you see anything unusual about the messages starting with UID 484229 and higher which are the "newer" messages in Inbox?

484228 is an calendar acceptance message with an ICS file attached:

Content-type: multipart/mixed; boundary="Boundary_(ID_qyG4ZdjoAsiZ+Jo19dCbWQ)"


--Boundary_(ID_qyG4ZdjoAsiZ+Jo19dCbWQ)
Content-type: multipart/alternative;
 boundary="Boundary_(ID_ryU4ZdJoASiZ+Jo21dCbwA)"


--Boundary_(ID_ryU4ZdJoASiZ+Jo21dCbwA)
Content-type: text/plain; charset=UTF-8
Content-transfer-encoding: 8BIT

X Y <x.y@mpcdf.mpg.de> has accepted your event invitation.


--Boundary_(ID_ryU4ZdJoASiZ+Jo21dCbwA)
Content-type: text/calendar; method=REPLY; charset=UTF-8
Content-transfer-encoding: 8BIT
[…]

TB is supposed to keep track of the highest UID seen and only fetch newer ones, and not re-fetch headers it already has.

@Benc, Ben, do know of any possible problems with storage in .msf files (no offline store) when a huge number of messages (e.g., > 400,000) are in a mailbox? (The most I have in any mailbox is around 40k and just for testing purposes.)

@Paul, If you can provide another log as requested in comment 4 (with added IMAP_CS:5) it might provide some info.

I am going to send you the URL per direct email.

484228 is an calendar acceptance message with an ICS file attached:

I put this message in an imap mailbox and added a few more messages after it. It doesn't cause any headers to get re-fetched. But the mailbox only has a few messages in it.

I am going to send you the URL per direct email.

This log looks pretty much like the the first one you sent. I didn't see any lines with "IMAP_CS" in them that might show the highest UID number TB knows about before fetching headers. Did you include "IMAP_CS:5" in the MOZ_LOG env. var. like this: MOZ_LOG=IMAP:4,IMAP_CS:5,timestamp,sync and also described in comment 4?

If you include IMAP_CS:5 you will lines like this example:
2024-08-18 20:29:43.508720 UTC - [Parent 774942: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=33604, added=0, useCS=false
This just indicates the the highest UID for the mailbox is 33604 and no new messages have been detected and "condstore" is disabled or not supported by server. Therefore no headers need to be fetched.

If you include IMAP_CS:5 and you still don't see IMAP_CS in the log, I'm not sure why. Make sure you are clicked on the Inbox with the 400k messages. If still don't see it you might try enabling imap CONDSTORE in TB config editor by setting mail.server.default.use_condstore to true and restarting.
Condstore is supported by the cyrus servers you are using (per capability response). In Tb, condstore just sometimes reduces the need to download all the message flags at startup. It may help some since you have so many message flags to fetch on each startup. But, by itself, it doesn't affect the need to download headers.

I probably should have done this sooner, but I went ahead and created 430,000 simple emails (a bit more than you have) using Ben's (benc) program "fakemail". This tests that 430000 emails can reside in an imap mailbox and on restart if any headers get downloaded like you are seeing. Right now they are being copied to a cyrus imap folder (cyrus running local on localhost). It takes a while and the copy count is up to about 60,000 so may need to let it run all night to finish.

(In reply to gene smith from comment #11)

[…]

I am going to send you the URL per direct email.

This log looks pretty much like the the first one you sent. I didn't see any lines with "IMAP_CS" in them that might show the highest UID number TB knows about before fetching headers. Did you include "IMAP_CS:5" in the MOZ_LOG env. var. like this: MOZ_LOG=IMAP:4,IMAP_CS:5,timestamp,sync and also described in comment 4?

I used:

$ MOZ_LOG=IMAP:4,IMAP_C:5,timestamp,sync MOZ_LOG_FILE=/dev/shm/tblog thunderbird

Now, I see, I missed the S in IMAP_CS. Embarrassing. Sorry for wasting your time. I am going to sent you the URL of a the log you asked for in a minute.

If you include IMAP_CS:5 you will lines like this example:
2024-08-18 20:29:43.508720 UTC - [Parent 774942: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=33604, added=0, useCS=false
This just indicates the the highest UID for the mailbox is 33604 and no new messages have been detected and "condstore" is disabled or not supported by server. Therefore no headers need to be fetched.

Yes, such lines are included now.

If you include IMAP_CS:5 and you still don't see IMAP_CS in the log, I'm not sure why. Make sure you are clicked on the Inbox with the 400k messages. If still don't see it you might try enabling imap CONDSTORE in TB config editor by setting mail.server.default.use_condstore to true and restarting.
Condstore is supported by the cyrus servers you are using (per capability response). In Tb, condstore just sometimes reduces the need to download all the message flags at startup. It may help some since you have so many message flags to fetch on each startup. But, by itself, it doesn't affect the need to download headers.

AS IMAP_CS:5 messages are logged now, I am not enabling it for now, until you tell me to do it again.

In the new log I see this:

2024-08-19 05:58:27.483607 UTC - [Parent 7919: IMAP]: D/IMAP_CS Do full sync?: mFolderHighestUID=484228, added=0, useCS=false

which tells TB to fetch and new header higher than 484228, which it does.
Then when fetching those headers, log shows this:

2024-08-19 06:01:08.060998 UTC - [Parent 7919: Main Thread]: D/IMAP_CS NormalEndHeaderParseStream(): store new highest UID=532092 for folder=INBOX

So when it starts again it should have the highest UID as 532092 stored, but apparently not since headers above 484228 are always fetched, from what you are seeing and from what your logs say.

Also, not sure why but my attempt to put 430k messages into cyrus only resulted in only about 135k messages. The "fakemail" program to generate messages may be producing dupes. Not sure if cyrus kicks out dupes or not. Need to look at this closer ASAP.

I don't see how, but by any chance are you running up against a quota limit? I don't see anything in the log about quota limits. But with my localhost based cyrus, I have the messages limit at 10,000,000 and I see that in my imap quotaroot responses.

Thank you for the analysis.

So when it starts again it should have the highest UID as 532092 stored, but apparently not since headers above 484228 are always fetched, from what you are seeing and from what your logs say.

Where should that be stored? Some database? Can I check that?

I don't see how, but by any chance are you running up against a quota limit? I don't see anything in the log about quota limits. But with my localhost based cyrus, I have the messages limit at 10,000,000 and I see that in my imap quotaroot responses.

How can I check? There is nothing related to quota in etc/imapd.conf, so no idea if there is a default.

Where should that be stored? Some database? Can I check that?

It would be encoded in the Inbox.msf file. But those files are not easy to read and I don't know how to find the highest UID in there.

How can I check? There is nothing related to quota in etc/imapd.conf, so no idea if there is a default.

I don't remember how I set it. I probably had to run the cyrus config program/script/shell (don't remember what it's called). Anyhow, you can also check the quota by right-clicking a folder in TB and look at "properties". There is a "quota" tab there that will show something if your cyrus has quotas enabled. If not, the quota tab will be empty. (Again, this is just a "long shot" that this is the cause of the problem.)

No quotas shown by Thunderbird for the inbox (Dieser Ordner stellt keine Informationen zum Speicherplatzkontingent bereit.).

If you tell me, what function to step through with GDB (or some other debugging facility), I could try that.

(In reply to Paul Menzel from comment #18)

If you tell me, what function to step through with GDB (or some other debugging facility), I could try that.

You would probably need to step into here: https://searchfox.org/comm-central/rev/cbc2cfb3485bd3d976a36fc090e21b507820c863/mailnews/imap/src/nsImapMailFolder.cpp#2875

However, the TB you have may be optimized and not have a symbol table which would make it difficult (or impossible?) to debug. There are other builds that may not be optimized at the TB archive site or there may be corresponding symbol files you can get for a given build. But I don't know much about how to do this since when I run gdb it is on a self built daily debug build with symbols produced automatically.
The archive site is here: https://archive.mozilla.org/pub/thunderbird/

I did finally manage to load my local cyrus mailbox with more messages than you have in your inbox, 452115. But I'm not seeing any header downloads at startup. I was thinking maybe there is some kind of limit on the number of messages, but I'm not seeing it here with my daily build or 128.1.0esr from archive site with 452115 messages in a single mailbox.

Unless you want to try to debug first, here's two ways to hopefully fix the problem -- but without really knowing why it broke:

  • You can shutdown TB and delete the Inbox.msf file. Then restart TB and click on the problem Inbox. This will cause all the header to re-fetch from the server and create a new Inbox.msf file, or
  • Right click the Inbox and choose "properties" and then click the "Repair" button. This will also re-fetch headers and over-write the Inbox.msf file. I'm not 100% sure it deletes the Inbox.msf file first, but I think it does.

Of course, whichever way you choose, if you do this, it will take a while to re-download the headers. But since you don't use mbox/maildir offline storage it is much faster than downloading all the messages and attachments of your 400k+ Inbox emails.

Flags: needinfo?(benc)
See Also: → 1912582

Just for reference, that this still happens with thunderbird 1:128.1.1esr-1. I am still undecided if I should move that calendar message out of the inbox to see if that helps. That would probably destroy the use case, and no debugging is possible.

I suggested deleting Inbox.msf and letting it re-download or maybe do a folder repair. Of course, this destroys the "evidence" too. But before deleting or repair, you could first backup the current Inbox.msf so the evidence is preserved and the file could be put back.

Another thing to try might be to copy all the messages from some before the "calendar" message (lower UIDs) to the highest UID in your inbox to a temporary IMAP folder on the same account (copy them, don't move them). Then see if headers get re-downloaded unexpected for the temporary folder too.

Another possibility is to create a new TB temporary profile for the account, which will setup everything fresh in the new profile and not affect your original profile, and see if it also has the same problem with re-downloading on each startup.

Paul, can you also get a performance profile of the message DL process? https://support.mozilla.org/en-US/kb/profiling-thunderbird-performance

If possible, please use 128.2.0 - now available as a candidate but in a day or two should be on the website.

Blocks: 1912582
Component: Untriaged → General
Flags: needinfo?(pmenzel+bugzilla.mozilla.org)
Keywords: perf
Product: Thunderbird → MailNews Core
See Also: 1912582
Summary: Message headers are re-downloaded at every start [regression from 115.13.0 to 128.1.0] → Message headers are re-downloaded at every startup [regression from 115.13.0 to 128.1.0]

Also, turn off automatic compact in Settings.

Just a note, that it’s still happening with thunderbird 1:128.2.0esr-1.

Please provide a performance profile while using 128.2.1

Whiteboard: [closeme 2024-10-01][needs performance profile]

It’s not happening anymore with thunderbird 1:128.2.1esr-1. No idea if the always added messages changed something or if there was a code fix.

Status: UNCONFIRMED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(pmenzel+bugzilla.mozilla.org)
Resolution: --- → WORKSFORME

It’s started happening again with thunderbird 1:128.13.0esr-1. All messages from a certain point on are downloaded.

Status: RESOLVED → VERIFIED

Hmm, I forgot to change the Resolution field, and now I am unable to do it. :(

"VERIFIED" is for resolved bugs, you apparently wanted to reopen the bug, reversing the "UNCONFIRMED → RESOLVED" transition.

Status: VERIFIED → UNCONFIRMED
Resolution: WORKSFORME → ---

It’s started happening again with thunderbird 1:128.13.0esr-1. All messages from a certain point on are downloaded.

Is that calendar message still triggering it?
Downloads happen now every time you start TB?
About how many messages are getting re-downloaded?
Is it still just happening for your largest inbox, i.e., I think about 400k messages?

From comment 25:

Please provide a performance profile while using 128.2.1

Doesn't sound like a "performance" problem so don't think we need this.

I'll see if I can find the last imap:4 log you sent and look at it again.
I run self-built fairly recent dailies and have never seen this when I open inbox for cyrus account with lots of messages. No recent fixes have occurred that would affect this behavior.

(In reply to gene smith from comment #30)

It’s started happening again with thunderbird 1:128.13.0esr-1. All messages from a certain point on are downloaded.

Is that calendar message still triggering it?

No it’s a different message. It’s a Wireguard list message.

Downloads happen now every time you start TB?

Indeed, and always from the same message on like last time.

About how many messages are getting re-downloaded?

Over four thousand now. (Just to avoid misunderstandings, only the headers are re-downloaded, as I have configured Thunderbird that way.)

Is it still just happening for your largest inbox, i.e., I think about 400k messages?

Indeed, only that one with around 500,000 messages now.

I sent you the URL to the logs directly.

I sent you the URL to the logs directly.

Thanks for info and link. I downloaded it and have it open right now. Will take a look in detail shortly.

Hi Paul,
I looked again at your log and nothing jumps out. Looks about the same as what you sent me maybe a year ago.
The log shows all the startup activity for all you accounts. If you are consistently seeing this on just one of the accounts that starts with the server name "sally" having the 500k msgs, maybe you can try temporarily "disabling" all but that one account and see if there might be some interference or cross-talk going on between the accounts causing this. I've never seen any issue like this, but first time for everything.

To disable the other accounts, just go to server settings for each of the other accounts and do this:

  • Uncheck "Check for new messages at startup"
  • Uncheck "Check for new messages every X minutes"

Now select Inbox on the problem account and shutdown tb.
Then restart TB while recording IMAP:4 log again like you have been and stay selected only on Inbox of the problem account without going to any other folder on this or other accounts. Also, don't open any new messages that may arrive in Inbox.

This shouldn't fix the problem. It should just make the IMAP:4 log only contain the activity for the problem account, so just shutdown TB and send me the resulting log.
If it does fix the problem (no headers downloading on every TB startup) I'm not sure what to think.

Of course, after recording the log again, you can go back to the "disabled" accounts server settings and reverse the steps above and restart TB to ensure all the accounts are now "enabled".

Ok, thanks for the new IMAP:4 log with all the other accounts disabled. Still see the same result (which is probably good since account cross-talk would be really bad).
I think we need to look directly inside the INBOX.msf file and see exactly what value is stored for highestRecordedUID. It's in the text of the file somewhere but I don't know how to decode it. However there is a perl program you can run on INBOX.msf file. I've used it a few times and it's referenced here: https://bugzilla.mozilla.org/show_bug.cgi?id=1923355#c7. The direct URL to the program mork.pl is https://www.ggbs.de/extensions/mork.pl.

If you are willing to run it directly on your INBOX.msf for your 500k account, cd to the directory in your tb profile containing your INBOX.msf, e.g.,
cd ~/.thunderbird/kgmivl5o.default-release/ImapMail/localhost
then with tb shutdown, run the mork.pl program:
perl ~/Downloads/mork.pl INBOX.msf | less
It may take a while to parse the file before it displays anything.
Once it prints the output, search for highest and it should find info like this:

fixedBadRefThreading = 1
           flags = 83014 ( Elided Mail Inbox ImapPersonal ImapBox )
    forceReparse = 0
   highestModSeq = 43595 (275861)
highestRecordedUID = 8e76 (36470)     <------------- most important value!
    highWaterKey = 8e76 (36470)            <-------------  interesting but not as important.
       imapFlags = e000 (57344)
     mailboxName = INBOX
         MRMTime = 1754417379 (Tue Aug  5 14:09:39 2025)
         MRUTime = 1754677797 (Fri Aug  8 14:29:57 2025)
         numMsgs = 89a1 (35233)
      numNewMsgs = 47ec (18412)
      onlineName = INBOX
     sortColumns =  byId (ascending)
       sortOrder = 1 ( ascending )
        sortType = 15 ( byId )
  totPendingMsgs = 0
     UIDValidity = 611de794 (1629349780)
unreadPendingMsgs = 0
         version = 1
       viewFlags = 0 ( )
        viewType = 0 ( eShowAllThreads )

Note the value for highestRecordedUID and highWaterKey, in hex followed by decimal in parens.
Then run TB again and after any folders finish downloading, shutdown TB and run mork.pl on INBOX.msf again and see if the values have changed.
My guess is that highestRecordedUID does not change and that causes the messages with that UID and higher to be downloaded again on each startup. (Of course, if it doesn't change, "why not" is the big question. Or if it does, why are the same message headers getting re-downloaded -- also a big question.)

Note1: In the IMAP:4 log you just sent, the first occurrence of a line that says got stored highest UID=605618 for folder=INBOX is the initial value read from INBOX.msf and the last line that says store new highest UID=610904 for folder=INBOX is the last value written to INBOX.msf. So after you shutdown this TB run, you should see 610904 as the highestRecordedUID in the INBOX.msf file.
Note2: I haven't seen any problem running mork.pl. It seems just to be "read only" on the target .msf file and doesn't write to it. But if you don't want to run it in place directly on your INBOX.msf in your profile, you can copy your INBOX.msf somewhere and run mork.pl on it there. Be sure to do any copy of Inbox.msf with tb shut down.
Note3: I don't know the details but the mork DB is getting replaced soon. I think work is in progress. So please ignore the "editorial" comments in the mork.pl file. :).

It occurred to me that it's possible that the INBOX.msf is not getting written at all. So you might check the modification time/date and size on it before running tb. Then run tb and let the headers download then shutdown tb. Then check the time/date and file size again. If nothing changes, somehow the properties are getting written (per the log) but never get flushed to disk for some buggy reason. If that's the case, probably no reason to check the file with mork.pl.

Thank you for these pointers. The runs for over half an hour and fails with a parse error:

[…]
  (BC929=35f99)(F86F0=Mirror display gone)(F86F1
    =5980963b-4ae4-5b4d-cf36-d031e1c665df@htt-consult.com)(F86F5
    =SoftEtherVPN/SoftEtherVPN <reply+ACHDDCCFFN3VPNETAEDGVSF7EZVA3EVBNHHDQPNYWA@reply.github.com>)(F86F6
    =SoftEtherVPN/SoftEtherVPN/issues/1430/899617239@github.com)(F86F7
    =611a890d)(F86F9
    =CAA7TbcsaM_nFeA1q=Eq6zsW1gH_73HXSCmtsc=3O0hi2s8QGfA@mail.gmail.com)
  (F86FA
    =<CAA7TbcveKrAFSODfKKJ_q1yaPEZgvn06mBO2tXe7h=Xr6jZHLw@mail.gmail.com>)
  (F86FB=611a8b06)(F86FD
    =SoftEtherVPN/SoftEtherVPN <reply+ACHDDCFBZA4TIPMS5I6RKMF7EZWQFEVBNHHDQPNYWA@reply.github.com>)(F86FE
    =SoftEtherVPN/SoftEtherVPN/issues/1430/899626530@github.com)(F86FF
    =611a8c02)(F8701
    =[PATCH] auto-t: hostapd.py: remove _hostapd_restarted concept)
  (F8702=20210816160450.11566-1-prestwoj@gmail.com)(F8703=611a8ca2)
  (F8705
    ==?UTF-8?B?0JTQvNC40YLRgNC40Lkg0J/QvtC90LDQvNC+0YDQtdCy?= <dponamorev@gmail.com>, Coreboot <coreboot@coreboot.org>)(F8706
    =CAGfcU4RU97YMpyO9GhZfeqpH6aTFgGbqSRX7g9wTHyVv9+Lx8A@mail.gmail.com)
  (F8707
    =<160209417080.14.4380890715724377308@b63950293ec5> <160209884181.14.13434303532990824164@b63950293ec5> <b5ed581a-1700-de07-2f8f-f16bf76bd5af@gmx.de>)
  (F8708=611a8f15)(C318E=37be1)(F870A
    =linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, linux-kselftest@vger.kernel.org)(F870B
    =cong.wang@bytedance.com, arnd@arndb.de, gustavoars@kernel.org, richardcochran@gmail.com, nikolay@nvidia.com, colin.king@canonical.com, kuba@kernel.org, shuah@kernel.org, davem@davemloft.net)(F870C
    =[Intel-wired-lan] [RFC net-next 0/7] Add basic SyncE interfaces)
  (F870D=20210816160717.31285-1-arkadiusz.kubalewski@intel.com)(F870E
    =611a8d2e)(F8710
    =[Intel-wired-lan] [RFC net-next 2/7] selftests/ptp: Add usage of PTP_DPLL_GETSTATE ioctl in testptp)(F8711
    =20210816160717.31285-3-arkadiusz.kubalewski@intel.com)(F8712
    =<20210816160717.31285-1-arkadiusz.kubalewski@intel.com>)(F8713
    =611a8d30)(F8715
    =[Intel-wired-lan] [RFC net-next 1/7] ptp: Add interface for acquiring DPLL state)(F8716=20210816160717.31285-2-arkadiusz.kubalewski@intel.com)
  (F8717=611a8d2f)(F8719
    =[Intel-wired-lan] [RFC net-next 3/7] ice: add get_dpll_state ptp interface usage)(F871A=20210816160717.31285-4-arkadiusz.kubalewski@intel.com)
  (F871B=611a8d31)(F871D
    =[Intel-wired-lan] [RFC net-next 6/7] ice: add SIOC{S|G}' not parsed
15: Cannot parse

The numbers you asked for are not shown unfortunately.

I checked INBOX.msf actually gets updated after closing Thunderbird, and diff shows that something is appended at the end:

$ diff -u INBOX.msf.old INBOX.msf
[…]
+@$${12FA0{@
+[1:^9E(^88^1EEB12)]
+{-1:^C6 {(k^C7:c)(s=9)} 
+  [-95278(^CA=95278)(^CC=1)]}
+@$$}12FA0}@
+
+@$${12FA1{@
+[-FFFFFFE6:^80(^88=1)(^82^1F4)(^85^1F5)(^81^221232)(^83^221233)(^84^221234)
+    (^BA^221235)(^86^221235)(^89=1)(^95^B6)(^87=f4e)(^9A^94)(^8E=95278)
+    (^BB=0)(^BD=)(^C1^220544)]
+[-95278:^C6(^CA=95278)(^CC=1)(^CD^1EEB13)(^C9=2)(^CE=f4e)]
+@$$}12FA1}@
+
+@$${12FA2{@
+[-243D3:m(^9B=95278)(^8F=95278)(^91=0)(^90=0)(^92=0)(^93=0)]
+{1:^80 {(k^96:c)(s=9)} -
+  [-95278]}
+{-95278:^80 {(k^97:c)(s=9)243D3:m } }
+{FFFFFFFD:^99 {(k^98:c)(s=9)} -95278 }[1:^9E(^A0=7888f)]
+@$$}12FA2}@
+
+@$${12FA3{@
+[1:^9E(^88^1EEB14)]
+{-1:^C6 {(k^C7:c)(s=9)} }
+[-95278:^C6]
+@$$}12FA3}@
+
+@$${12FA4{@
+@$$}12FA4}@

Not sure if mork.pl is finding an actual corruption or if it has problems itself with a huge .msf file. Based on what you told me before I suspect your INBOX.msf is now over a gig. I tried it again on a larger file than I tried yesterday and it seemed to be taking forever, so I gave up.

I then looked inside the INBOX.msf (with gvim editor) and see that highestStoredUID has a code "B8". I see several places in the file with B8=<a hex number> but the only one that seems to be correct is if I go to the bottom of the file and search backwards for ^B8= and the hex value I find there matches the highest UID I see in my "Order Received" column and what I see in the log for the final store new highest UID=....

I don't see a "B8" in your diff of stuff added but not sure you included the whole diff in your comment 36.

Anyhow, if you open your INBOX.msf in an editor you should first make sure you see something like this. I see this near the top of the file:
(B8=highestRecordedUID)(B9=ns:msg:db:row:scope:pending:all)
Then go to bottom and search backwards for ^B8= and convert the hex value to decimal. That should be the saved "highestRecordedUID" we are looking for. So if you see this, what is the value before and after running TB? (This is the same thing I was hoping mork.pl would show us.)

INBOX.msf has a size of 335 MB. mork.pl took over 30 minutes on an Intel 14th generation desktop CPU.

The files starts with:

// <!-- <mdb:mork:z v="1.4"/> -->
< <(a=c)> // (f=iso-8859-1)
  (B8=ns:msg:db:row:scope:pending:all)(B9=ns:msg:db:table:kind:pending)
  (BA=dateReceived)(BB=ProtoThreadFlags)(BC=replyTo)(BD=keywords)
  (BE=imageSize)(BF=junkscore)(C0=recipient_names)(C1=sender_name)
  (C2=junkscoreorigin)(C3=folderName)(C4=columnStates)(C5=customSortCol)
  (80=ns:msg:db:row:scope:msgs:all)(81=subject)(82=sender)(83=message-id)
  (84=references)(85=recipients)(86=date)(87=size)(88=flags)(89=priority)
  (8A=label)(8B=numLines)(8C=ccList)(8D=bccList)(8E=msgThreadId)
  (8F=threadId)(90=threadFlags)(91=threadNewestMsgDate)(92=children)
  (93=unreadChildren)(94=threadSubject)(95=msgCharSet)
  (96=ns:msg:db:table:kind:msgs)(97=ns:msg:db:table:kind:thread)
  (98=ns:msg:db:table:kind:allthreads)
  (99=ns:msg:db:row:scope:threads:all)(9A=threadParent)(9B=threadRoot)
  (9C=msgOffset)(9D=offlineMsgSize)
  (9E=ns:msg:db:row:scope:dbfolderinfo:all)
  (9F=ns:msg:db:table:kind:dbfolderinfo)(A0=numMsgs)(A1=numNewMsgs)
  (A2=folderSize)(A3=expungedBytes)(A4=folderDate)(A5=highWaterKey)
  (A6=mailboxName)(A7=UIDValidity)(A8=totPendingMsgs)
  (A9=unreadPendingMsgs)(AA=expiredMark)(AB=version)(AC=forceReparse)
  (AD=fixedBadRefThreading)(AE=onlineName)(AF=sortColumns)(B0=viewFlags)
  (B1=viewType)(B2=sortType)(B3=imapFlags)(B4=highestModSeq)(B5=MRUTime)
  (B6=sortOrder)(B7=highestRecordedUID)>
<(80=1)(95=fffffffe)(9F=574e9ce4)(81=0)(9C=2)>
[…]

So it’s ^B7= for me:

    (^A7^88)(^B7=9532c)(^A0=7891b)(^A5=9532c)(^C4^9F704)(^C5=)(^A1=3ae0)

With Python:

>>> print("%i", 0x9532c)
%i 611116

That is actually the highest UID when I closed Thunderbird, so it gets updated correctly. But when I open Thunderbird, it only shows up to 605618, as verified by the logs above, and re-downloads the rest.

>>> hex(605618)
'0x93db2'

93db2 is in INBOX.msf quite often, and also with ^B7=:

$ grep -na -e 9532c -e 93db2 INBOX.msf 
422474:    =<cba9dec1d1e44137b9532c348ad9a71b@gsi-bonn.de>)(D15B=461d)(54B09
482746:    =osresearch/heads <reply+008e3188704c6793db250279fa0a0167872b72df053700879\
493237:.com)(674E7=79532ca5-deb4-e3b1-022f-1243d080013c@neuf.fr)(674E8=5c862983)
494542:    =<13b79ff3-a99f-ea76-d25c-ae509f829eb7@neuf.fr> <79532ca5-deb4-e3b1-022f-1\
2084395:    =0101017e4b608924-631967ea-4c42-4815-813a-03baa93db2f3-000000@us-west-2.am\
2487484:e> <c9532c5aa511463492daa2fda7ea6a6e@ipp.mpg.de> <f894fc70-373b-4d77-b1a3-9074\
2488457:e> <c9532c5aa511463492daa2fda7ea6a6e@ipp.mpg.de> <f894fc70-373b-4d77-b1a3-9074\
2491618:    =c9532c5aa511463492daa2fda7ea6a6e@ipp.mpg.de)(13F94C
2491866:e> <c9532c5aa511463492daa2fda7ea6a6e@ipp.mpg.de>)(13FA44=678e4c02)(13FA45
2494164:<(13FFF8=89e40)(13FFF7=67893db2)>
3457866:    =649532cf)(1B142D=A66709F1-118D-4E08-969D-5FD6C08E326B@slavino.sk)
3580338:    =0101018c43d9532c-b8428637-55f4-498d-9e70-68c2ce6dba9f-000000@us-west-2.am\
3652626:    =ad97517c-fb5b-c5fd-7425-2b9493db27df@gwdg.de)(1E5DD2
4884135:    =2cc137478349400086a3f094693db224@mbx19-sub-01.um.gwdg.de)(21171E
5136669:<(21CA3B=93db2)(21CA3A=68753a90)>
5136670:[-23DD2:m(^9B=93db2)(^8F=93db2)(^91^21CA3A)(^90=0)(^92=1)(^93=1)]
5136680:    (^8E=93db2)(^BB=0)]}
5136689:    (^A7^88)(^B7=93db2)(^A0=7786e)(^A5=93db2)(^C4^9F704)(^C5=)(^A1=39f3)
5138838:    (^86^21CA3A)(^89=1)(^95^1CC)(^87=1750)(^9A^94)(^8E=93db2)(^BB=0)
5165563:    (^8E=93db2)(^BB=0)(^C1^21CA4A)(^BD=)]
5198219:    (^8E=93db2)(^BB=0)(^C1^21CA4F)(^BD=)]
5212068:    (^8E=93db2)(^BB=0)(^BD=)]
5235673:    (^8E=93db2)(^BB=0)(^BD=)(^C1^21D433)]
5269234:    (^8E=93db2)(^BB=0)(^C1^21CA4A)(^BD=)]
5292359:    (^8E=93db2)(^BB=0)(^C1^21CA49)(^BD=)]
5310936:    (^8E=93db2)(^BB=0)(^BD=)]
5341779:    (^8E=93db2)(^BB=0)(^BD=)(^C1^21CA8C)]
5361227:    (^8E=93db2)(^BB=0)(^C1^21CA55)(^BD=)]
5371866:    (^8E=93db2)(^BB=0)(^C1^21CA55)(^BD=)]
5403435:    (^8E=93db2)(^BB=0)(^BD=)]
5403438:    (^B7=93db2)(^A0=7786e)(^A5=93db2)(^C4^9F704)(^C5=)(^A1=39ec)(^88^1EEB14)
5432306:    (^8E=93db2)(^BB=0)(^C1^21CA4A)(^BD=)]
5432309:    (^B7=93db2)(^A0=7786e)(^A5=93db2)(^C4^9F704)(^C5=)(^A1=39ec)(^88^1EEB14)
5472857:    (^8E=93db2)(^BB=0)(^C1^21CA4F)(^BD=)]
5518130:    (^8E=93db2)(^BB=0)(^C1^21CA4A)(^BD=)]
5560945:    (^8E=93db2)(^BB=0)(^C1^21CA4F)(^BD=)]
5605475:    (^8E=93db2)(^BB=0)(^BD=)]
5634126:    (^8E=93db2)(^BB=0)(^BD=)]
5695189:    (^8E=93db2)(^BB=0)(^C1^21CA51)(^BD=)]
5728340:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
5761538:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
5796531:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
5805465:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
5837257:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
5881467:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
5951961:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
5994440:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
6033101:    (^8E=93db2)(^BB=0)(^C1^21CA51)(^BD=)]
6039538:    (^8E=93db2)(^BB=0)(^BD=)]
6111609:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
6151257:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
6192156:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
6231989:    (^8E=93db2)(^BB=0)(^C1^21CA51)(^BD=)]
6271785:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
6307530:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
6309157:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
6350664:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
6419691:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
6434105:    (^B7=93db2)(^A0=7786d)(^A5=93db2)(^C4^9F704)(^C5=)(^A1=39f2)(^88^1EEB14)
6437599:<(2219E9=9532c)(2219E8=689a544b)>
6437600:[-243EC:m(^9B=9532c)(^8F=9532c)(^91^2219E8)(^90=0)(^92=1)(^93=1)]
6467636:    (^BA^2219E8)(^86^2219E8)(^89=1)(^95^B6)(^87=1a47)(^9A^94)(^8E=9532c)
6469635:    (^8E=93db2)(^BB=0)(^C1^21CA46)(^BD=)]
6471201:    (^A7^88)(^B7=9532c)(^A0=7891b)(^A5=9532c)(^C4^9F704)(^C5=)(^A1=3ae0)
6471438:    (^BA^2219E8)(^86^2219E8)(^89=1)(^95^B6)(^87=1a47)(^9A^94)(^8E=9532c)

So, it’s corrupted as the highestRecordedUID is recorded several times?

Paul, thanks for the info and sorry for the delay in getting back to you on this.
I see what you mean about the values for highestRecordedUID occurring more than once in the files. It appears that the last value is the correct value but occurrences earlier are older/"historical" values and, for you, the older values are getting used to request the first header to download, causing re-downloads of headers.
I don't know of a way to get rid of the old out of date historical values other than "repairing" the folder. I do see that just doing a "compact" operation has no effect on the .msf file. Of course, as you know, repairing (unlike compact) will force a complete re-download of ALL the headers which will take some time (other details in comment 19).
At comment 21 you said the re-downloads stopped happening (and apparently was OK for about 11 months). By any chance did you do a folder repair on INBOX at that time?
I've been trying to re-populate my cyrus server Inbox with 500k messages like you have. However, my INBOX.msf is not nearly as large and is 176 Mb (probably not as much out-of-date info and maybe smaller headers), and so far, haven't seen a problem, except one time, on tb restart, a newly added headers set (don't remember, maybe about 16k headers) got re-downloaded, but on subsequent restarts they didn't.

Maybe Wayne knows who the current mork database expert is and understands what's going on internally in the .msf / mork files. But there might not be a lot interest in fixing this since "mork" db file format is going away in tb, from what I understand. (I don't know if the new DB will even continue to be inside .msf files.)

Paul, I don't think I asked you this, but are you using mbox or maildir? I doubt if this really matters since you are not using offline store. But it may change something unexpected and I've only tested with my cyrus server set to the default mbox. This is under "Server settings" and then, near the bottom, "Message store type", typically grayed.

Flags: needinfo?(vseerror)
Status: UNCONFIRMED → NEW
Ever confirmed: true

(In reply to Paul Menzel from comment #38)

So, it’s corrupted as the highestRecordedUID is recorded several times?

I wouldn't necessarily say it's corrupted. Mork has a tendency to want to just keep appending values to the file to act as a transaction log. Most of the rows that appear to be repeated here look like they're just updating a single row (=info on a single message). The default way we do that in Mork is to delete all values from that row and then spill the entire row, something like this:

[-67E6F:^80(^88=80)(^C0^1E155)(^82^23FA2)(^85^1E275)(^81^24490)(^83^24491)
    (^84=)(^BB^24492)(^86^24492)(^89=1)(^87^24493)(^9A^92)(^8E=67e6f)
    (^BC=0)(^BD^24497)(^BE^24497)(^BF=)(^C4^240F4)(^C7^2453B)(^9D^2453C)
    (^8B=113a4)]

(the - at the beginning means "forget everything in this row, and replace all the columns with these values).

I've decided to write yet another mork parser in Rust this time to see if it might be useful for your needs, but it's not entirely clear what the best path forward would be for you.

My initial guess is that mork is just refusing to compress the database. So I'd like to know a few things:

  • Which line number first begins with @$${ -- and what is its contents?
  • What is the contents and line number of the final line? (should be something like @$$}2B29}@
  • What's the highest atom ID? That is, the last line that contains )>, what is the beginning of the cell (like (24543=189dc)>)?

(In reply to Joshua Cranmer [:jcranmer] from comment #41)

(In reply to Paul Menzel from comment #38)

[…]

Thank you for looking into this and your explanations.

My initial guess is that mork is just refusing to compress the database. So I'd like to know a few things:

  • Which line number first begins with @$${ -- and what is its contents?

1728526 @$${14{@

  • What is the contents and line number of the final line? (should be something like @$$}2B29}@

6777362 @$$}12F63}@

  • What's the highest atom ID? That is, the last line that contains )>, what is the beginning of the cell (like (24543=189dc)>)?

6777342 <(22201F=1755060995)(22201D=954b5)(22201E=78a64)>

(In reply to gene smith from comment #39)

Paul, thanks for the info and sorry for the delay in getting back to you on this.

Compared to other projects, I didn’t even notice the delay. So no need to be sorry, and thank you for helping me.

At comment 21 you said the re-downloads stopped happening (and apparently was OK for about 11 months). By any chance did you do a folder repair on INBOX at that time?

I am pretty sure, I did not do a repair.

I've been trying to re-populate my cyrus server Inbox with 500k messages like you have. However, my INBOX.msf is not nearly as large and is 176 Mb (probably not as much out-of-date info and maybe smaller headers), and so far, haven't seen a problem, except one time, on tb restart, a newly added headers set (don't remember, maybe about 16k headers) got re-downloaded, but on subsequent restarts they didn't.

What I am also often doing is to select a bunch of newly received messages, and select to download them for offline read. No idea, if that has anything to do with the database.

[…]

Paul, I don't think I asked you this, but are you using mbox or maildir? I doubt if this really matters since you are not using offline store. But it may change something unexpected and I've only tested with my cyrus server set to the default mbox. This is under "Server settings" and then, near the bottom, "Message store type", typically grayed.

Yes, it’s the default single file (mbox).

See Also: → 1982926

(In reply to Joshua Cranmer [:jcranmer] from comment #41)

(In reply to Paul Menzel from comment #38)

So, it’s corrupted as the highestRecordedUID is recorded several times?

I wouldn't necessarily say it's corrupted. Mork has a tendency to want to just keep appending values to the file to act as a transaction log. Most of the rows that appear to be repeated here look like they're just updating a single row (=info on a single message). The default way we do that in Mork is to delete all values from that row and then spill the entire row, something like this:

[-67E6F:^80(^88=80)(^C0^1E155)(^82^23FA2)(^85^1E275)(^81^24490)(^83^24491)
    (^84=)(^BB^24492)(^86^24492)(^89=1)(^87^24493)(^9A^92)(^8E=67e6f)
    (^BC=0)(^BD^24497)(^BE^24497)(^BF=)(^C4^240F4)(^C7^2453B)(^9D^2453C)
    (^8B=113a4)]

(the - at the beginning means "forget everything in this row, and replace all the columns with these values).

I've decided to write yet another mork parser in Rust this time to see if it might be useful for your needs, but it's not entirely clear what the best path forward would be for you.

My initial guess is that mork is just refusing to compress the database.

commit/compress issues have been known to happen https://mzl.la/4lwcQmu

Maybe Wayne knows who the current mork database expert is and understands what's going on internally in the .msf / mork files. But there might not be a lot interest in fixing this since "mork" db file format is going away in tb, from what I understand.

Joshua, Geoff, and Ben are as good as it gets.

Gene, are comment 9, comment 34, comment 39 a good summary of where we are at?

Flags: needinfo?(vseerror)

Thanks for the info Wayne. I'm not sure what a "good" summary of this is. To me it looks like the "highestRecordedUID" is getting recorded in the INBOX.msf but on restart when it is read out, an old, previously valid value is getting read out and used. This causes a re-fetch of a lot of headers.
But I haven't been able to consistently duplicate the problem here, even with > 500k message in my inbox.

Regarding the compression issue, it does appear that reporter Paul's inbox.msf file has grown a lot bigger than it needs to be. I was hoping that just doing a folder compact from right-click menu would affect the inbox.msf and purge all the old out-of-date records and make it smaller, but apparently not. I wonder if maybe one also has to have offline store (mbox) present for the folder compact to do a DB "compress"? Paul runs, I think, with no offline store, at least for the problem Inbox.msf that is so huge and is now returning bad highest UID value.

Anyhow, hopefully Joshua (or Ben and Geoff also CC'd on this bug) has some ideas about why reporter Paul is seeing a "legacy" value being read back from his Inbox.msf for highest UID.
I do see that each time a new header is imap fetched, the database "property" value (highestRecordedUID) is written. I'm pretty sure that each write doesn't immediately commit the new highest uid value to disk.

This sounds a bit like the behaviour I described in bug 1833345, but for the highestRecordedUID field rather than numNewMsgs. I'm certain that's a bug in the Mork code, so it could be the culprit here.

We could try forcing a compress commit from the error console, then restarting, and see if that helps:

// With the folder in question open:
tabmail.currentAbout3Pane.gFolder.msgDatabase.commit(Ci.nsMsgDBCommitType.kCompressCommit);

(In reply to Geoff Lankow (:darktrojan) from comment #46)

[…]

We could try forcing a compress commit from the error console, then restarting, and see if that helps:

// With the folder in question open:
tabmail.currentAbout3Pane.gFolder.msgDatabase.commit(Ci.nsMsgDBCommitType.kCompressCommit);

That helped!

> tabmail.currentAbout3Pane.gFolder.msgDatabase.commit(Ci.nsMsgDBCommitType.kCompressCommit);
undefined

Closing and opening Thunderbird, no messages are re-downloaded.

PS: Ctrl + Shift + j opens the error console. Developer tools does not work for me, as described in bug 1912424 1.

That helped!
Closing and opening Thunderbird, no messages are re-downloaded.

That's good.
I'm curious if that actually made your Inbox.msf file smaller? I.e, now less than 335 MB per comment 38?
Of course, the question is, why doesn't this occur automatically to keep the .msf files working correctly?

Off topic: I see that folderCache.json (formerly panacea.dat) seems to also grow without bounds. Even after a folder or an account has long been deleted, the folders entries are still there. But it doesn't cache highestStoredUID so n/a for this bug.

It would happen reasonably regularly if folder compaction was happening, but since they don't have an offline store, there's no need for folder compaction.

(In reply to gene smith from comment #48)

I'm curious if that actually made your Inbox.msf file smaller? I.e, now less than 335 MB per comment 38?

Yes, the size was reduced to 238 MB.

This doesn't really fix the bug. It just allows a user with a possibly over-sized or
corrupt .msf summary file and no offline store to force compression of the .msf file.
Without something like this, an imap folder without offline store never gets its
.msf summary file compressed and cleaned up.

An alternate way to do the same thing might be to add a new button below the "repair folder"
button on the folder properties context menu. That button could trigger similar actions
as I propose here. A button to "compress summary file" would be much less drastic
then "repair folder" since all headers (and possibly all messages) would not need to be
re-downloaded.

I wouldn't add a separate action, but

  • add it as part of the repair folder and
  • make it happen through some other regular check as well (perhaps where we regularly check if there's anything to compact? if there's not we could still commit)

Also does commit/compress when individual folder or all folders on a selected server compacted.
Only does commit/compress for not-compactable folders and folders with no offline store.
Have not yet added anything to "Folder Repair" to do commit/compress. Not sure it's needed
since commit/compress will occur, if needed, when an individual folder is compacted.

Attachment #9507399 - Attachment is obsolete: true
See Also: → 1833345

Commit

  • on folder repair
  • when checking for need to compact, and there was no need
Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Attachment #9509110 - Attachment is obsolete: true
Attachment #9512564 - Attachment description: Bug 1912195 - Ensure database is commited regularly. r=darktrojan → Bug 1912195 - Ensure database is commited regularly. r=gds
Target Milestone: --- → 145 Branch

Was this code always broken, but surfaced by a change elsewhere?

Flags: needinfo?(mkmelin+mozilla)
Whiteboard: [closeme 2024-10-01][needs performance profile]

Can't say for sure, but sounds like it surfaced from changes elsewhere.

Flags: needinfo?(mkmelin+mozilla)

(In reply to Wayne Mery (:wsmwk) from comment #55)

Was this code always broken, but surfaced by a change elsewhere?

This doesn't really "fix" the reported bug which is a corrupted and huge .msf file for reporter's inbox. What it does is cause folders that are accessed and updated often to occasionally have their .msf file "compressed" to get rid of unused/obsolete records and to hopefully avoid corruption. The root problem is probably deep inside the mork code somewhere so the fix that Magnus came up with may not be needed or it's a no-op with a another DB.
So my opinion is that it's an issue that's been in mork a long time (unless someone has tweaked the mork code recently).

Pushed by edicharry@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/2045d8dc7614
Ensure database is commited regularly. r=gds

Status: ASSIGNED → RESOLVED
Closed: 1 year ago1 month ago
Resolution: --- → FIXED
Regressions: 1992351
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: