Open Bug 384017 Opened 19 years ago Updated 2 years ago

Thunderbird updates File Modified timestamp on MSF files even when the mail folder is not accessed or changed. (not good for backups)

Categories

(MailNews Core :: Backend, defect)

x86
Windows Vista
defect

Tracking

(Not tracked)

People

(Reporter: stephan, Unassigned)

References

Details

It appears that Thunderbird changes (or at least changes the file modified date of) all (or at least most) msf files on a regular and frequent basis, even if the mail folder associated with the msf file is not accessed. This wreaks havoc with backup systems, which keep thinking these files have been updated and decide they need to back them up. I have mail folders I haven't visited in months, but the msf files are marked by the OS as "modified" every single day, and I can only think that Thunderbird is doing this.
.msf file use is not limited to "mail folder open" (clicking the mail folder). An example is "Retention Policy" in the properties of a folder. Also, the ".msf" file keeps the state of a folder, so a physical file update may occur even if no mail is read. When and how frequently does the ".msf" update occur? (Process Monitor from Microsoft's Sysinternals is useful here; watch the accesses to a ".msf" file.) ( http://www.microsoft.com/technet/sysinternals/utilitiesindex.mspx ) I guess that "modified every single day" indicates an update by "Retention Policy".
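For anyone without Process Monitor, the same question (which .msf files get touched, and when) can be approximated by comparing two snapshots of the profile directory. A minimal Python sketch, assuming the profile path is supplied by the caller; `snapshot_msf` and `changed_msf` are hypothetical helper names, not part of Thunderbird or of any tool mentioned here:

```python
from pathlib import Path

def snapshot_msf(profile_dir):
    """Map every .msf path under profile_dir to its (mtime in ns, size in bytes)."""
    snap = {}
    for path in Path(profile_dir).rglob("*.msf"):
        st = path.stat()
        snap[str(path)] = (st.st_mtime_ns, st.st_size)
    return snap

def changed_msf(before, after):
    """Return the sorted paths that are new, or whose mtime or size differs."""
    return sorted(p for p, meta in after.items() if before.get(p) != meta)
```

Take one snapshot before starting Thunderbird and another after quitting; whatever `changed_msf` returns was touched (or resized) during the session.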
I'm not 100% sure I see the bug here. I would think that most internal files would be touched, at least to check for information consistency, somewhat regularly. (apologies if I'm wildly confused or off-base, first time I'm getting into the Mozilla internals)
If there are no changes to a folder's content I think there should be no changes to the .msf file. It may be an option to have a mark for the folder ("frozen/static/archived") to avoid any change to it.
Devoti Paolo writes in comment #4: > If there are no changes to a folder's content I think there should be no > changes to the .msf file. Exactly. Stephan Golux writes in comment #0: > This wreaks havoc with backup systems, which keep thinking these files > have been updated and decide they need to back them up. I was burnt by this recently. A while back I noticed that *all* .msf files were being touched on startup, so I added a rule to my backup scripts to exclude them, and thus ceased backing them up. Recently some bug caused a bunch of .msf files to get zeroed out, and away went the associated message state information. Not the biggest loss, but still an annoyance, as I become increasingly reliant on message metadata (like tags). (I'm currently looking through Bugzilla to see if this is a known problem, and to see if there is a way to permanently store the state information in the mailbox file. Although I know there are some status headers maintained by Mozilla, I don't think compacting the folder writes all of the metadata into the mailbox file itself.)
On my system (WinXP Professional, Thunderbird 2.0.0.6) the "last changed" timestamp seen in the properties dialog of my Profiles/default/*/Mail/pop3.web.de/Inbox file did not change in the last months, even though I received lots of mail in that time. Only the "last access" timestamp changed. This prevented my backup software from doing incremental backups of my mailboxes correctly. Thunderbird should really set this timestamp accordingly. Couldn't someone write an add-on that does this, until this gets resolved? I'm not a programmer or coder, so don't point at me :)
(In reply to comment #6) Excuse me. I looked at the wrong file. Please ignore my rantings. All the praise to the people creating thunderbird.
This also occurs on Macintosh (OS X 10.x). I found this problem because OS X's "Time Machine" was backing up 500 MB a day when I was just reading my email (I have years of old email stored in Thunderbird folders). I guess one alternate solution (on Mac OS X) would be to archive the whole Thunderbird /Users/MyLogin/Library/Thunderbird directory tree into another account's dir tree on the box. You could then log in to that other account and run Thunderbird there to see the old archived files. And also, you could delete old folders out of your current tree to prevent them from getting backed up. But this is a pretty brain damaged solution. Re: Comment 1 by WADA Why does the modification time of an .msf file change if the state and retention policy hasn't changed?
(In reply to comment #8) > Re: Comment 1 by WADA > Why does the modification time of an .msf file change if the state and > retention policy hasn't changed? The ".msf" file is a database (Mork DB), and I believe that it is always opened in write mode (opening it in read-only mode would make no sense, because ".msf" is a database that keeps track of actions on the mail folder). Also, the history of activity (for example, the timestamp of the last check by the retention policy) is held in ".msf". To Ben Slade: What do you think?
In my experience, on a system with many hundreds of *.msf files, all of them are opened every time you open Thunderbird. I don't understand the justification for this. What purpose is there in opening a file which is not involved in a given session's activities? There is also a lost-index problem (the index needs to be rebuilt) which may be related to this one. But my main problem is that there is unnecessary overhead in backing up and synchronizing Thunderbird because of all the files that are forced into play (essentially all of Thunderbird's data files). A daily backup or a desktop-laptop sync should be a minute operation at today's router speeds or using something like the Laplink cable. But my backups/syncs take 5-10 minutes and a huge part of that, like 9/10ths, is due to the file-handling system Thunderbird has created. No other software I've encountered, including three other email programs, behaves similarly.
Summary: Thunderbird updates File Modified timestamp on MSF files even when the mail folder is not accessed or changed. → Thunderbird updates File Modified timestamp on MSF files even when the mail folder is not accessed or changed. (not good for backup)
Summary: Thunderbird updates File Modified timestamp on MSF files even when the mail folder is not accessed or changed. (not good for backup) → Thunderbird updates File Modified timestamp on MSF files even when the mail folder is not accessed or changed. (not good for backups)
Some random thoughts after reading Comment 9. I think that the real problem is that .msf files (if I understand the comment correctly) are used for two completely different purposes (please correct me if I'm wrong). The first one is to index the mail file, and keep track of the status of each and every message. The second one is to keep track of the global status of that particular mail file (that is seen as a folder). This status includes information that can change without explicit actions from the user (e.g. when the folder was compressed, when it was viewed, etc.). So Thunderbird often writes something into the .msf file (it is not sufficient to open a file in read-write mode to change its date). I think that these two different kinds of information should go into different files, so that one can back up only the .msf files, losing almost nothing by not backing up the second type of information.
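The parenthetical claim above (opening a file read-write does not by itself change its modification date) can be checked with a small Python sketch. It uses a throwaway temporary file, not a real .msf, and `PAST` and `mtime_after_open` are names invented for illustration:

```python
import os
import tempfile

PAST = 1_000_000_000  # arbitrary fixed baseline timestamp (an assumption for the demo)

def mtime_after_open(path, write):
    """Reset mtime to PAST, open the file read-write, optionally append, return the new mtime."""
    os.utime(path, (PAST, PAST))
    with open(path, "r+b") as f:
        if write:
            f.seek(0, os.SEEK_END)
            f.write(b"x")
    return int(os.stat(path).st_mtime)

# Throwaway file standing in for an .msf; its content is not real Mork data.
fd, demo_path = tempfile.mkstemp(suffix=".msf")
os.write(fd, b"placeholder")
os.close(fd)

opened_only = mtime_after_open(demo_path, write=False)
opened_and_written = mtime_after_open(demo_path, write=True)
os.remove(demo_path)

print("mtime changed by open alone:", opened_only != PAST)
print("mtime changed by a write:   ", opened_and_written != PAST)
```

If the timestamp moves only in the second case, then a daily change in an .msf file's date points to an actual write, not merely to the file having been opened.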
In response to https://bugzilla.mozilla.org/show_bug.cgi?id=384017#c12 I'd reframe this a bit. One purpose of the .msf file is to cache the headers, so that certain actions can take place without a trip out to the server. The cached information shouldn't be kept in the backup-able profile at all. On a Macintosh, ~/Library/Caches/ would be the right place to put a cache. That would reduce the size of the file substantially, and make the read/write distinction a small enough issue that it could be ignored.
These files seem to contain more than just cached information. For instance, if one gets deleted, I lose the reverse date sort order setting for this folder. Such settings must be saved in the backup!
Component: General → Backend
Product: Thunderbird → MailNews Core
QA Contact: general → backend
Version: 2.0 → Trunk
Stadelmann's point is well-taken, but we're talking here about the need to back up files that have not changed during a session, indeed the wisdom of doing so. The problem - and this is one of those issues that has been sitting out here for years unaddressed - is that Thunderbird changes the date-time stamp whether a file has been accessed or not during a session, causing the backup program to copy that file back over an identical file byte for byte on the backup drive. There is no purpose to this; it increases the amount of writing, which invites errors, and extends the time for backups considerably. In my case it lengthens a backup to as much as 20x what it needs to be.
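Where the backup tool allows a custom check, one way to avoid recopying a file whose bytes are identical is to compare contents instead of timestamps. A minimal Python sketch of that idea, assuming the source and backup copies are both readable as ordinary files; `file_digest` and `needs_backup` are hypothetical names, not options of any particular backup program:

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def needs_backup(source, backup):
    """True if the backup copy is missing or its bytes differ from the source."""
    source, backup = Path(source), Path(backup)
    if not backup.exists():
        return True
    if source.stat().st_size != backup.stat().st_size:
        return True  # different sizes: no need to hash
    return file_digest(source) != file_digest(backup)
```

This only helps when the bytes really are unchanged, as described above. It costs a full read of both files, and if Thunderbird actually rewrites the .msf contents the check will still report a difference.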
Thunderbird's file structure is very badly designed when it comes to backing up one's mail. I'm running Thunderbird 3.0.3 on a Mac on OS X 10.6.2. I switched from Apple Mail to Thunderbird because Thunderbird offers some more functionality. But Apple Mail has FAR the better file structure, i.e. there every mail is kept as a single file, and my backup program can then just back up this single new file. In Thunderbird THE WHOLE mailbox, for example the inbox, has to be backed up every time?! Very uncool. Best would be a simple file structure, where mails 1-n are stored in 1-n files named after the mail's header. If this doesn't work with the password security that Thunderbird provides, let the user choose whether he wants a password but then can't easily back up his mail, or whether he wants to go without a password but with a much more backup-friendly file structure.
(In reply to comment #17) > Thunderbird's file structure is very badly designed when it comes > to backing up one's mail. Tb's ".msf" file is a kind of database (Mork DB). It is not like a text file, a Word document, or a Photoshop image, which are updated in a batch fashion (such a file is modified only when you change its content in a text editor/Word/Photoshop and save it). I believe that backing up database files based merely on the file's archive bit on MS Windows, or its equivalent on other OSes, is one of the most stupid kinds of file backup system. A backup system should be designed together with the procedure for recovering from the backup. Why should an application's file usage be designed/implemented around such a stupid backup system, which blindly relies only on the archive bit on MS Windows or its equivalent on other OSes?
(In reply to comment #17) > But Apple Mail has FAR the better file structure, > i.e. there every mail is kept as a single file (snip) Please note that this bug is about Tb updating the last modified timestamp and archive bit of the ".msf" file (Mork DB) more frequently than users expect, and about the .msf file not being small when the mail folder is large. One of the requests by the bug opener and others is: split the .msf file into a "large but relatively static" part (index of mail) and a "frequently updated but small" part (folder management). This bug is not about updates of the mail folder file (the file in Unix mbox format). steikeldrout@gmail.com, please note that "one problem per bug" is the rule at B.M.O. As for the mail folder file of Apple Mail, you are right. Apple changed from "unix mbox" (multiple mails in a file) in Apple Mail 1 to "a file per mail" (emlx) in Apple Mail 2. But that was NEVER done for the sake of a file backup system that relies merely on the archive bit or modification timestamp of a file. Please see bug 58308 for the request for qmail's maildir structure ("one file per mail" is part of it).
(In reply to comment #16) > The problem - and this is one of those issues that has been sitting out here > for years unaddressed - is that Thunderbird changes the date-time stamp whether > a file has been accessed or not during a session, causing the backup program to > copy that file back over an identical file byte for byte on the backup drive. it's even worse. At least in SeaMonkey (actual 2.0.7pre) on Linux the .msf files for rather old and definitely not touched folders are altered. I'm using a syncing Program that compares file contents (unison) and even then every day a mass of files gets updated.
(In reply to comment #20) > it's even worse. At least in SeaMonkey (actual 2.0.7pre) on Linux the .msf > files for rather old and definitely not touched folders are altered. I'm using > a syncing Program that compares file contents (unison) and even then every day > a mass of files gets updated. The files for a mail folder (.msf file + Unix mbox file for mail data) are a kind of database. Backing up a database (and recovering it from backup files) based only on a file's archive bit or update timestamp is one of the stupidest backup/recovery procedures for a database. Susanne Jaeger, why should an application that uses a database change how it uses its database files for the sake of such a stupid backup system?
I don't understand the question. I'm using one profile (with a lot of mail folders) on 2 systems (not at the same time) and synchronize these profiles every day. When I exclude all msf files from synchronizing it's nearly impossible to keep track of folders with new/unread mails, because these are (or seem to be) extracted from the corresponding msf. Rebuilding the index for all folders in a large mail archive with (at least) hundreds of subfolders is nothing I want to do regularly.
(In reply to comment #22) > I don't understand the question. Oh, I've understood why many complaints are posted to this bug. I am leaving this bug, because I don't want to add comments or questions to such complaints any more. Bye.
FYI. If no retention policy is set, the update frequency of .msf may be reduced by the fix for bug 637352.
I'm still on TB 3.1.11. I have NO retention policies set. But when I only LOOK at emails in my Local Folders the MSF file date & timestamp gets updated. This is undesirable. The above thread seems to peter out. Was there ever a fix for this behaviour, or has it been fixed in a later release of TB please?
(In reply to Kevin from comment #25) > I'm still on TB 3.1.11. I have NO retention policies set. But when I only > LOOK at emails in my Local Folders the MSF file date & timestamp gets > updated. > > This is undesirable. The above thread seems to peter out. > > Was there ever a fix for this behaviour, or has it been fixed in a later > release of TB please? It's fixed in TB 5, not TB 3.1.x
That's fantastic thank you. My laptop - desktop sync might work now! TB version numbers seem to have gone through the roof. Looks like it's up to version 10 now. Hope I don't have to upgrade hardware and relearn how to use it! Many thanks.
My hopes have been dashed. I installed TB 10 over my TB 3 installation. It picked up my profile & settings ok. I went in to one of my Local Folders, nothing in the preview pane so no action like marking the message as read. Came out of TB and the corresponding MSF file timestamp had been updated. Grrrr! Has this bug slipped back in since TB 5?
MSF file seems to be growing every time I go in to TB. 1st time by 1525 bytes. 2nd time by 192 bytes. 3rd time by 108 bytes.
Removing myself from all the bugs I'm CCed on. Please NI me if you need something on MailNews Core bugs from me.
Severity: normal → S3
See Also: → 1848494