Open Bug 1551173 Opened 5 years ago Updated 2 years ago

MSF files disappear spontaneously (pop) (mostly after restart)

Categories

(MailNews Core :: Database, defect)

x86_64
Windows 8.1
defect
Not set
critical

Tracking

(Not tracked)

People

(Reporter: francis_, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: dataloss)

Please see bug 881829
This is a bit of a "me too" post but it could be a different bug altogether, or shed light on the one I linked. In a nutshell, the author said three years ago the problem had resolved itself. For me, it is re-emerging, and encryption does not seem to be the culprit.

I use Thunderbird 60.6.1 on a W8.1 laptop and a W10 desktop, but it occurred with other versions of TB and with W7 too.

Incidentally, I am also a consultant who has to be extremely careful about client data. TB is installed on both machines, but the profile and mail files are stored on an encrypted file on a USB key. This means I do not have to synchronise TB between two machines; I do not have to use IMAP (some of my e-mail hosts---notably TalkTalk---have been hacked before, so now I exclusively use POP and delete all server e-mails); and if I lose my USB key the encryption protects its content. For encryption, I use a file container initially with TrueCrypt, now with VeraCrypt. I have about 8GB of mail.

A few years (~6 years) and versions of TB and Windows ago, I noticed that some filters failed to work. It turned out the MSF file of the folder where the filter was going to move the message was missing. Clicking on the offending folder was not always the solution: I would get an hourglass and the message list would not show at all (I believe this exonerates indexing and gloda, even though gloda was wrong as a result).

TB was very temperamental as to which folder I had to open before opening the faulty ones. I never discovered what the rule was, but proximity seemed to play a part. Usually, the faulty folders were at the bottom of the tree, but not always; past activity, particularly filters, seems to play a part too. Once I found the right sequence of folders to open and TB agreed to rebuild the MSF file, the filter could be run. But not for long. At the next session, or sometimes even within the same session, some MSF files would disappear again. Some of those files were the same as before, some were different; randomness seemed to be inescapable. There were also several error messages about missing address books.

About two? years ago, I solved the problem by deleting all MSF files, gloda and possibly other files (panacea.dat possibly). This also fixed the address-books problems. Unfortunately, I did not write down exactly what I did... No comment.

This week, the problem resurfaced. But this time, there was no encryption. I had made a copy of the entire TB profile and mail files on my C: drive and pointed TB to use that. Promptly, TB went haywire and could not run searches properly. Deleting all MSF files led to only some of them being rebuilt. Deleting gloda led some folders to be indexed but not all of them (at least that is what the Activity Manager told me). The MSF files could forcibly be rebuilt by clicking on the folder, but the fix was ephemeral: as soon as I clicked away from a problem folder (on a saved search), I could see the corresponding MSF file disappear from Windows Explorer, and the mail count disappear from the folder pane.

Is the saved search part of the problem? It looks at the entire folder tree (with a few exceptions) for messages within a certain year; it rarely gets the same results from one session to the next; and it always counts fewer than there really are. If I break the search down into two searches, one on each half of the folder tree, both searches work without problem, the counts are correct, and the MSF files remain in place. So volume is also a factor. However, this does not explain why some MSF files would disappear from one session to the next, or within one session when I do not use the search facility. Is there a process similar to a search taking place when TB starts or receives mail, which accidentally vaporises MSF files?

By contrast, the profile and mail files on the encrypted USB key still seem fine. This is all the more puzzling because, since making the backup on my C: drive, I have increased the number of folders (from the high 500s to around 800), though not the number of messages, by archiving some mail for the first time (hence the temporary backup on my hard drive and the need for searches to make sure I could reconcile the message counts).

All this is very woolly, random and thus difficult to fix. I wish I had some advice and guidance to bring a bit of science and usefulness to my observations so that the disappearing MSF problem can be addressed.

Missing MSF files are a mystery. I'm a heavy POP user and don't usually lose MSF files. I will lose the lot if I remove panacea.dat, bug 1093217, comment #5. Encryption/EFS rings a bell, bug 1279344. Also bug 1322476.

Moving this bug to where similar bugs are being categorized.

Component: Untriaged → Database
Product: Thunderbird → MailNews Core

You don't really say much about saved search. Please provide more facts/steps and theory. And also how this might relate to the non-saved search case.

Severity: major → critical
Flags: needinfo?(francis_)

Saved searches were very simple: locate all messages received within a year (say 2018) in all local folders. They had two criteria:
Date "is before" 2019-01-01
Date "is after" 2017-12-31
The result count was inconsistent (between 8000 and 9000) and always well below what it should have been (around 10000). Sorry for the round numbers, I am doing this from memory.
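An independent count can help judge how far off the saved search is. The following is a minimal sketch, not part of Thunderbird, that counts messages per mbox file by reading the Date header directly. It assumes the profile uses mbox storage and that mail files are the files without an extension (as noted later in this bug); the function names are made up for illustration. Folder names containing a dot would be skipped, messages deleted but not yet compacted are still present in the mbox file and would be counted, and the year is taken from the Date header as written by the sender, so the result may differ slightly from what Thunderbird reports. It is best run against a copy of the profile, or with Thunderbird closed.

```python
import mailbox
from email.utils import parsedate_to_datetime
from pathlib import Path


def is_mbox_candidate(path: Path) -> bool:
    # Assumption: in an mbox-based profile, mail files have no extension
    # (e.g. "Inbox"), while summaries are "Inbox.msf".
    return path.is_file() and path.suffix == ""


def count_messages_in_year(mail_root: str, year: int) -> dict:
    """Return {mail file path: number of messages whose Date header is in `year`}."""
    counts = {}
    for path in sorted(Path(mail_root).rglob("*")):
        if not is_mbox_candidate(path):
            continue
        matched = 0
        box = mailbox.mbox(str(path), create=False)
        try:
            for msg in box:
                raw_date = msg["Date"]
                if raw_date is None:
                    continue  # no Date header: cannot be classified
                try:
                    when = parsedate_to_datetime(str(raw_date))
                except (TypeError, ValueError):
                    continue  # unparseable Date header
                if when.year == year:
                    matched += 1
        finally:
            box.close()
        counts[str(path)] = matched
    return counts
```

Summing the values of `count_messages_in_year(r"C:\path\to\copy\of\Mail", 2018)` (the path is a placeholder) gives a total to compare against the saved-search count.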

I made these searches to archive mail in annual batches, and now that mail has been archived, I can no longer reproduce the original issue. However, let me know if you need further info. I may have the possibility to restore from an earlier back-up to check things in more detail.

Flags: needinfo?(francis_)

Saved searches were very simple

WFM per comment 4. Let us know if the problem returns.

Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME

The problem is still happening. I lost a few MSF files a few weeks ago, and noticed this because some filters, which move messages into affected folders, stopped working.

Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---

Right, filters will expose this problem, but wouldn't cause it.

Francis, have you learned any more in the past year about why this is happening to you?

See meta Bug 498274 - [Meta] Issues around simultaneous "view mail, delete/append of mail, update of mail data, read by other" and "compact folder" of local mail folder and IMAP offline-store

One of these is Bug 498185 - MailDB(.msf) is corrupted(Rebuild-Index is invoked), if "Compact Folder" is invoked on "copy target folder" while "copy to folder by message filter" is running. "Compact Folder" itself silently fails.

Blocks: 498274
Flags: needinfo?(francis_)
Summary: MSF files disappear spontaneously → MSF files disappear spontaneously (pop)

Wayne,

Sorry, I am none the wiser about the source of the problem. I have not seen this problem recently (that doesn't mean it has gone away for good). The absence of the problem seems to be confirmed by two checks:

  • my mail count for 2019 is the same whether doing one global search or searching within first-level folders individually.
  • the number of .msf files is the same as the number of mail files (without extensions).
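The second check above can be automated. This is a minimal sketch under the same assumption the check itself makes: every mail file is a file without an extension and should have a sibling named `<name>.msf`. The function name `find_missing_msf` is hypothetical, and folder names containing a dot would not be recognised as mail files.

```python
from pathlib import Path


def find_missing_msf(mail_root: str) -> list:
    """Return mail files (no extension) that have no sibling '<name>.msf'."""
    missing = []
    for path in sorted(Path(mail_root).rglob("*")):
        # Assumption: mail files are exactly the files without an extension.
        if path.is_file() and path.suffix == "":
            summary = path.with_name(path.name + ".msf")
            if not summary.exists():
                missing.append(str(path))
    return missing
```

An empty result means the two counts match; otherwise the list names the folders whose summary file is currently absent.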

If I understand the two bugs you referred to correctly, some message/folder actions during compacting might cause the problem. That might be the case, but it does not explain why .msf files repeatedly disappear (from storage) within a few seconds of leaving the associated folder (within TB), i.e., when reading mail, not when writing/moving/deleting it. So I think my symptoms are different.

I know this is not hugely helpful, but the intermittent nature of the problem, my lack of technical know-how, and the fact I don't know how to trigger it, are not helping much. Thanks for looking at this problem nonetheless.

Francis

Flags: needinfo?(francis_)

I can confirm this issue. This morning a few filters failed; when I looked for .msf files using the Windows "Everything" utility (https://www.voidtools.com/), I noticed about 1200 zero-byte .msf files, all created, maybe by a ghost, at 0:01 (one minute after midnight). I'm pretty sure that I arrived at home and started the laptop to open TB. This is the second time this has happened in the span of about a year; the last time was possibly when upgrading to TB 78, which included changes to how folder (tree) states/colours are cached. Regarding comment #2, I didn't delete panacea.dat.
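For anyone without the "Everything" utility, the same lookup can be sketched in a few lines of Python. This is an illustration only; the function name is made up, and it reports the last-modified time rather than the creation time mentioned above.

```python
from datetime import datetime
from pathlib import Path


def find_empty_msf(profile_root: str) -> list:
    """Return (path, last-modified time) for every zero-byte .msf file, oldest first."""
    empty = []
    for path in Path(profile_root).rglob("*.msf"):
        info = path.stat()
        if info.st_size == 0:
            # st_mtime is the modification time; creation time is platform-specific.
            empty.append((str(path), datetime.fromtimestamp(info.st_mtime)))
    return sorted(empty, key=lambda item: item[1])
```

If the timestamps cluster around one moment, as in the 0:01 case described here, that points to a single event rather than gradual loss.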

I'll duplicate bug 1322476 over here in a minute, check there for additional information.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: dataloss
Summary: MSF files disappear spontaneously (pop) → MSF files disappear spontaneously (pop) (mostly after restart)
See Also: → 1279344

This happened to me for a couple years.

I even wrote a script to check the total count of MSF files in my entire profile,
once per second, beeping if the number changed. So I got to be VERY familiar
with the timing of the disappearances:

  • They always disappeared when Thunderbird was running, never when it was
    closed, so I'm pretty sure it was Thunderbird that was deleting them.
  • They'd be stable for several minutes or hours, then suddenly start being
    deleted. Each second a few more would be gone. My script showed me the
    current count each time it beeped, so I could watch the count dropping in
    near real-time.
  • If I then closed Thunderbird, they'd stop being deleted.
  • When I re-opened Thunderbird, they'd be stable again for a while.
  • After a few more minutes or hours, they'd start vanishing again.
  • When the MSF file for a folder was gone, searches would fail to find
    messages in that folder. Also, the counts of unread and total messages
    and the size of the folder would be missing from the folder list pane.
    And of course the properties of the folder would be lost (columns
    displayed, sort order, threading or not, etc.).
  • Moving to each folder would recreate the MSF file for that folder, so I'd
    use the arrow keys to quickly move to each folder. But not too quickly.
    If I arrowed past any folder too fast, it would not recreate that MSF file.
    So, I had to hit the up or down arrow repeatedly, not allow it to
    auto-repeat. But I did NOT have to wait for the message counts and
    folder size to be fully updated before going on to the next folder.
    Once they started getting updated, they would continue to increase till
    they hit their true value, even if I moved on to another folder. So, I
    could sometimes see those numbers increasing for multiple folders at
    the same time.
  • Doing a search for a string in the body of all messages may have also
    caused all the deleted MSF files to be re-created. This worked
    sometimes, but I'm not sure it always worked.
  • The deletions had nothing to do with compacting folders. I have
    Thunderbird configured to always ask me before compacting, and to
    only do so when it can save over 200 MB. So, I know when compacting
    was occurring, and it had no correlation with the deletions.
  • The deletions were not triggered by any user interactions. More likely
    by some problem accumulating because I'd had Thunderbird open for
    too long. Sometimes I could go several hours without any deletions.
    Sometimes only a couple of minutes. I NEVER left Thunderbird running
    overnight. Always shut it down before running my backup script. But
    occasionally, I'd walk away from my Mac for a few minutes and
    leave Thunderbird running. I'd often hear the beeping start up, and
    rush back to close Thunderbird to get the deletions to stop.
  • Sometimes, once they started being deleted, they would stop after
    a few dozen deletions, even if I did nothing. Other times, they'd keep
    getting deleted till hundreds of them were gone and I closed Thunderbird.
    I don't know if it ever happened that all 600-700 of my MSF files were
    deleted in one episode.
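The kind of monitoring described at the top of this comment (count the MSF files once per second, beep when the number changes) could be sketched roughly as follows. This is not the script referred to in this comment, only an illustration of the idea; the function names and parameters are hypothetical, and the `sleep` parameter exists only so the loop can be exercised without waiting.

```python
import sys
import time
from pathlib import Path


def count_msf(profile_root: str) -> int:
    """Count all .msf files below profile_root."""
    return sum(1 for _ in Path(profile_root).rglob("*.msf"))


def watch_msf_count(profile_root, interval=1.0, max_polls=None,
                    on_change=None, sleep=time.sleep):
    """Poll the .msf count and report every change.

    max_polls=None polls forever; on_change(old, new) replaces the default
    beep-and-print behaviour.
    """
    previous = count_msf(profile_root)
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        current = count_msf(profile_root)
        if current != previous:
            if on_change is not None:
                on_change(previous, current)
            else:
                sys.stdout.write("\a")  # terminal bell
                print(f"{time.strftime('%H:%M:%S')} .msf count: {previous} -> {current}")
            previous = current
        polls += 1
```

Calling `watch_msf_count("/path/to/profile")` (placeholder path) runs until interrupted. Walking a large profile every second puts a noticeable load on the file system, so a longer interval may be preferable.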

See my script at:

And my comment about how it stopped happening when I did a significant
cleanup of my 25-year old Netscape/Thunderbird profile:

Hope this helps!
--Fred

Thanks for your comment, let's talk in 2022. I have the problem of vanishing .msf files once or twice a year, so if you started again in 2021 you can't exclude the possibility for it to happen again. You also need: http://bristle.com/Tips/Mac/Unix/tb_msf_count, and that's a heavy load on the file system to count all those files.

BTW, as I said in comment #10, I get zero-length .msf files. So is that the next step of them being deleted? The next start creates new empty ones for all the folders that don't have a corresponding .msf file?

I'll debug it in a local build, but apart from deleting panacea.dat, bug 1093217, comment #5, I have no way to trigger it.

I did a bit of investigation on local folders only stored as mbox. The first question, why I see zero-size MSF files, has the easiest answer. When MSF files are removed, they are recreated empty at the next start of TB.

I couldn't reproduce what was reported in comment #1, that is, that removing panacea.dat will cause a mass removal of MSF files. Removing panacea.dat will however cause a massive re-check of all MSF files via nsMsgDatabase::CheckForErrors(). That function will also remove any MSF file it deems invalid.

BTW, here are the call sites where MSF files are removed:
https://searchfox.org/comm-central/search?q=sum.*-%3ERemove%5C%28&path=&case=false&regexp=true

TB is known for not always properly closing databases (MSF files), see:
https://searchfox.org/comm-central/search?q=left+open&path=nsMsgDatabase.cpp&case=false&regexp=false

So my working theory is this:
For a mass exodus to happen, two events need to come together: First panacea needs to get damaged causing a mass recheck, and then that recheck in nsMsgBrkMBoxStore::IsSummaryFileValid() must also return a wrong answer.
It would also explain the observed behaviour from comment #12, nsMsgBrkMBoxStore::IsSummaryFileValid() returning a wrong result will cause MSF deletion.

To be continued.

Good info, Klaus! Thanks!

Yeah, my MSF file deletions may have been related to a corrupted
panacea.dat file.

As I said, mine had probably evolved for 25 years or so, as I migrated from
Netscape Messenger to Thunderbird, from Windows to Mac, etc.

I also appreciate the links to searches of the Thunderbird source code.
I didn't know that was available.

Thanks!
--Fred

(In reply to Klaus B. from comment #14)

I couldn't reproduce what was reported in comment #1, that is, that removing panacea.dat will cause a mass removal of MSF files.

To clarify, in comment #1, I said removing panacea.dat has possibly fixed the problem, albeit temporarily, not caused it.

Hi Francis, you didn't write comment #1, Jorg K did. He/she referenced bug 1093217, comment #5, by the same person. Maybe you meant the original description in comment #0.

In general, panacea.dat is a Mork database that stores properties for all folders in the system. Open it with a text editor to see it. It contains references to all MSF files in the system. So it's likely that any action around panacea.dat will also affect MSF files. If panacea.dat is damaged or deleted, the system will rebuild it, which causes some delay at startup. During the rebuild, all MSF files are also checked, and in my theory that may lead to undesired results. I got side-tracked into bug 1093217 which has a patch attached. The interesting connection is that this patch revolves around so-called "folder info" which is the very thing that is used for MSF checking here:
https://searchfox.org/comm-central/rev/136bf46cea8bf29a481eeb53734f1c1544f8ab9f/mailnews/local/src/nsMsgBrkMBoxStore.cpp#203
So in case that "folder info" is incorrect at the time of the check, perfectly valid MSF files would be classified "invalid" and subsequently removed here: https://searchfox.org/comm-central/search?q=sum.*-%3ERemove%5C%28&path=nsMsgDatabase.cpp&case=false&regexp=true

It's a real pity there is not a reproducible case here. Something that only happens to a person who could debug it every year is almost impossible to track down. That's where reading the code comes in.

My apologies to you and Jorg for getting the comment wrong. Thanks for the explanation on panacea.dat. I did not know it was human-readable, so it gives me something to look into when the issue next arises. It did occur a few weeks ago but seems to have sorted itself out.

Out of interest, is there a connection between panacea.dat and address books? They are proliferating and 19 of them have no name at all. I remember seeing messages (sorry, no specifics!) about address books at the height of the vanishing MSF files issue.

Until TB 68 address books were stored in Mork databases, in TB 78 they've been migrated to SQLite files. MSF files and panacea.dat are still Mork files. There is no connection between address books and panacea.dat, the latter stores folder properties.

(In reply to Klaus B. from comment #14)

The first question why I see zero size MSF files has the easiest answer.
When MSF files are removed, they are recreated empty at the next start of TB.

Yes, and I found out where that happens, seems like a total hack to paper over some other issue:
https://searchfox.org/comm-central/rev/e7c25ffdd125799ce4a6c4be4405b6c1d9e8739a/mailnews/base/src/nsMsgDBFolder.cpp#1206
An empty MSF file is created, far removed from the Mork code which administers those databases :-( - The call stack at this point is:

xul.dll!nsMsgDBFolder::GetFolderCacheKey(nsIFile * * aFile, bool createDBIfMissing) Line 1211	C++
xul.dll!nsMsgDBFolder::ReadDBFolderInfo(bool force) Line 566	C++
xul.dll!nsMsgDBFolder::GetFlags(unsigned int * _retval) Line 1159	C++
xul.dll!nsMsgDBFolder::AddSubfolder(const nsTSubstring<char16_t> & name, nsIMsgFolder * * child) Line 3364	C++
xul.dll!nsMsgBrkMBoxStore::AddSubFolders(nsIMsgFolder * parent, nsCOMPtr<nsIFile> & path, bool deep) Line 951	C++
xul.dll!nsMsgBrkMBoxStore::DiscoverSubFolders(nsIMsgFolder * aParentFolder, bool aDeep) Line 65	C++
xul.dll!nsMsgLocalMailFolder::GetSubFolders(nsTArray<RefPtr<nsIMsgFolder> > & folders) Line 185	C++

That's called from JS in discoverFolders() here:
https://searchfox.org/comm-central/rev/e7c25ffdd125799ce4a6c4be4405b6c1d9e8739a/mail/base/modules/MailUtils.jsm#62

Creating empty MSF files for missing MSF files appears counterproductive to what we're attempting here: To figure out how the MSF files get lost in the first place.

(In reply to Klaus B. from comment #21)

(In reply to Klaus B. from comment #14)

The first question why I see zero size MSF files has the easiest answer.

Yes, and I found out where that happens, seems like a total hack to paper over some other issue:
https://searchfox.org/comm-central/rev/e7c25ffdd125799ce4a6c4be4405b6c1d9e8739a/mailnews/base/src/nsMsgDBFolder.cpp#1206
An empty MSF file is created, far removed from the Mork code which administers those databases :-(

Over in Bug 1724122 I moved that hack slightly but left it in because I assumed it fixed something, even if I couldn't tell what :-)
But I agree with your assessment - nothing should be creating mork files except mork (and Bug 11050 has some strong opinions about that too).

I propose removing the hack, and will prepare a patch to do that. It predates the Mercurial history and if there is anything still relying on it (doubtful, I'd say), I'd rather find out what that is, and fix it properly.

See Also: → 1724849

I've moved the hack-removal out into Bug 1724849 - it seems there is something relying on creating those empty .msf files, as I get a couple of unit test failures.

Bug 1726319 is showing this behavior (with IMAP instead of POP, but since it's the mail storage it's probably the same thing). I've taken the liberty of adding Klausb and Ben Campbell to that bug, hope that's OK.

Quick update: a saved search might trigger the problem, and a "ghost" folder might be part of it too.

MSF files have been stable (i.e., they have not shown sign of disappearing) for some months, until I ran a saved search. That search looks for messages within individual folders (ticked by hand). I had not used it for a year or more and sensing trouble, I created a complete backup. I just changed the date parameters on the query and ran it. Immediately, I realised the message count was wrong, perhaps 30-40% of what it should be.

MSF files had disappeared "after"* one particular folder I am calling a "ghost" folder. In Windows Explorer I can see a folder called "<something>.sbd" and another one just beneath that called "<something>b88cc2d6.sbd". Only the first folder has associated mail and index files (respectively <something>. and <something>.msf), and it is also the only one to show in the Thunderbird folder hierarchy.

*After means anything underneath in the Thunderbird folder list, whether at the same level or at a level above too. For instance in the following hierarchy:

<Folder 1>
-- <Folder 1.2>
-- <something>
-- <Folder 1.3>
<Folder 2>
-- <Folder 2.1>
-- <folder 2.2>

the folders <Folder 1.3>, <Folder 2*> could all be affected by the disappearing MSF problem, though not all were. But <Folder 1> and <Folder 1.2> were not.

However, this did not last long and after a few minutes, most folders, anywhere, were affected, except of course those where messages were counted properly by the query. I removed the "ghost" folder, restarted Thunderbird, and still had the same issue. It is notable that even though the MSF files were recreated when I got into the associated folder, they still disappeared shortly after I left it, even though I was no longer running any query.

Restoring from the backup made just before, I recreated my query from scratch, without using manual folder selection, and kept it as a saved search. It worked fine (as in it returned the correct number of messages). Then I manually selected the folders for that search and again, it worked fine.

So search is a factor, but it looks like a trigger rather than the cause: it has an effect that lasts even after TB is shut down and restarted and the search is not used.

I'll be happy to run both searches with comparative logging/debugging options turned on if that can help anyone. Let me know (and point me to a how-to).

Edit: I cannot remember if, when I ran the query from the back up (when it worked), I had removed the ghost file. I most likely did.
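To check whether other "ghost" folders like the one described above exist, one could list every `.sbd` directory that has no mail file of the same base name next to it. This is a minimal sketch; the function name is hypothetical, and whether Thunderbird actually mishandles such directories is an open question in this bug, not something the code establishes.

```python
from pathlib import Path


def find_orphan_sbd(mail_root: str) -> list:
    """Return '.sbd' directories that lack a sibling mail file with the same base name."""
    orphans = []
    for path in sorted(Path(mail_root).rglob("*.sbd")):
        if not path.is_dir():
            continue
        # "<name>.sbd" is expected to sit next to a mail file called "<name>".
        mail_file = path.with_name(path.name[: -len(".sbd")])
        if not mail_file.is_file():
            orphans.append(str(path))
    return orphans
```

Running it on a backup copy before and after a problematic saved search would show whether such directories appear together with the MSF losses.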

Francis, what version are you running now? Have you tried with the beta 99.1? We are working on disappearing MSF files as well in bug 1726319.

(In reply to Nils from comment #27)

Francis, what version are you running now? Have you tried with the beta 99.1? We are working on disappearing MSF files as well in bug 1726319.

Also, please verify that your setup is exclusively POP and no IMAP is involved as per comment 0.
Thanks!

Nils, I'm still on 78.9.0. Anything more recent crashes on start.
gene, I'm all POP, no IMAP. I was working in offline mode as well.

(In reply to Francis Corvin from comment #29)

I'm still on 78.9.0. Anything more recent crashes on start.

Perhaps working out what the crash is would be beneficial here, V78 is no longer supported and observations on how it acts are really of little use.

It may also be that there is a relationship between your observed issues and the crashes. I suggest you start a support topic in the support forum
https://support.mozilla.org/en-US/questions/new/thunderbird/form
and quote your submitted crash IDs for the current release version, so someone can assist you with your crashing issue.

If you do not know how to obtain the submitted crash IDs, please see here https://support.mozilla.org/en-US/kb/mozilla-crash-reporter-tb

Francis, can you try with --safe-mode maybe? Or with a new profile and re-add the accounts (if POP maybe with leave-on-server enabled so your "real" TB profile still gets the mails too)?

(In reply to Francis Corvin from comment #25)

Quick update: a saved search might trigger the problem, and a "ghost" folder might be part of it too.

For search folders, see bug 1554188. How many folders are in your profile? We've recently detected that hitting the "max open files" limitation in Windows has led to massive MSF losses in various situations.

@Matt, I am now on 102.3.0. It runs without crashing. I suspect (have not investigated) that the change of path from "...\Program Files (x86)" to "...\Program Files" had something to do with it. I also spotted significant .msf file loss after launching the new version, but this has not happened again.

@Nils, not sure I want to risk running two different versions of TB, especially as the current one seems to work.

@b5, Interesting hypothesis, and some symptoms look very similar to mine. I have 1,000 mailboxes (give or take a dozen).

(In reply to Francis Corvin from comment #33)

@Matt, I am now on 102.3.0. It runs without crashing. I suspect (have not investigated) that the change of path from "...\Program Files (x86)" to "...\Program Files" had something to do with it. I also spotted significant .msf file loss after launching the new version, but this has not happened again.

Probably bug 1787609 as you have over 1000 accounts, hitting the file open limit of the operating system should be fairly common for many activities around managing and searching mail.

(In reply to Matt from comment #34)

Probably bug 1787609 as you have over 1000 accounts, hitting the file open limit of the operating system should be fairly common for many activities around managing and searching mail.

Just to clarify, I don't have that many e-mail accounts, even including aliases. I do have ~1000 MSF files. Is this what you meant?

Forgot to ask: how do I check the open file limit on Windows 10, and how many handles are open at any one time?

(In reply to Francis Corvin from comment #35)

Just to clarify, I don't have that many e-mail accounts, even including aliases. I do have ~1000 MSF files. Is this what you meant?

I based my statement on your comment on 1,000 mailboxes (give or take a dozen) to mean accounts. However, in this instance the issue is folders not accounts, and having 1,000 folders leads to a similar expectation from me.

I suggest you have a look at bug 1554188#c31. Fundamentally, the limit for open files in C libraries is 512.

Bug 1726319#c151 suggests the limit is built into Windows. Not being a developer, I really do not know how Gecko opens files, but if the limit is in Windows it is a moot point. It is possible for C to open more than the traditional 512 files https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=msvc-170#remarks but that does not help if the issue is Windows itself.
