[meta] finish "maildir" message storage
Categories
(MailNews Core :: Database, enhancement)
Tracking
(Not tracked)
People
(Reporter: wsmwk, Assigned: benc)
References
(Depends on 25 open bugs, Blocks 2 open bugs)
Details
(Keywords: meta, user-doc-needed, Whiteboard: [maildir])
This meta bug is to organize the work to finish "maildir" message storage. Blocking are bugs from [1] and from bug 58308 (which is qmail and won't block this). I include fixed bug 402392 and bug 738651.
The end goal is to enable maildir as default, but the meta bug could certainly live on to finish straggling issues after maildir is enabled by default.
To do list:
- Someone to file a bug for "enable maildir as default message store"
- check to see if I have missed anything in blockers
- remove anything in blockers that doesn't exactly belong
[1] https://bugzilla.mozilla.org/buglist.cgi?list_id=5785777;field0-0-0=short_desc;type0-0-1=substring;field0-0-1=status_whiteboard;resolution=---;query_format=advanced;value0-0-1=maildir;type0-0-0=anywordssubstr;value0-0-0=maildir%20mail-dir;product=MailNews%20Core;product=Thunderbird
Add to the to-do another task: - Tool to optimize mail storage (convert mbox to maildir in existing installations)
Comment 2•10 years ago
"maildir" in the summary of Bug 534135 refers to pure "Maildir" as used by an IMAP server, not Tb's maildir-like store (Maildir Lite, ...). Removing from dependency list.
One day (after the >4GB mbox work) I'd like to try out the maildir-lite support and see if I can polish some of the bugs reported here.
Comment 4•10 years ago
FYI, a quick summary of phenomena observed in some basic tests with a recent trunk nightly (Tb 22.0a0). The current maildir store behaves as follows:
- RETR to LocalMbox/tmp/nnnn, then move to LocalMbox/cur/nnnn: works.
- fetch body[] to IMAPMbox/tmp/nnnn, then move to IMAPMbox/cur/nnnn: works.
- Copy/Move of mail(s) is always done the same way as Copy/Move from LocalSource/cur/nnnn to LocalTargetMbox/cur/nnnn.
- No care is taken when the Copy/Move source or target folder is a berkeleystore folder: there is no Mbox/cur/nnnn, so it misbehaves.
- No care is taken when the Copy/Move source or target folder is maildirstore/IMAP:
  - Offline-Use=Off: there is no offline-store file (Mbox/cur/nnnn), so it misbehaves.
  - Offline-Use=On: Copy/Move from/to IMAP_maildir is the same as Copy/Move from/to Local_maildir, plus re-synchronization with the server (because of IMAP) in the case of a Move. So, on a move from an IMAP folder, IMAP_maildir/cur/nnnn is simply moved to maildir_Target_folder/cur/nnnn, as is done for a move from maildir_local_mbox to maildir_local_mbox. Then, when the IMAP Mbox is opened after the move, the mail is fetched again.
- It looks like return codes, status, etc. are not checked in many places. So an empty Mbox/cur/nnnn directory, or an Mbox/cur/nnnn file of size 0, is easily created in many cases.
- Unique filename generation has holes, so the same cur/nnnn file can be used by multiple mails.
- If the directory for an Mbox is suffixed: Mbox names are case sensitive on the IMAP server, but file names in the client file system are case insensitive, so abc/cur for abc, Abc-1/cur for Abc, and ABC-2/cur for ABC are used. In this case, the association between Mbox name and directory name is easily broken by a folder rename or by unsubscribe/subscribe (due to a known bug).
A funny phenomenon was also observed:
- If auto-sync of the IMAP account is disabled, <server_name>.sbd is created, <server_name>.sbd\INBOX and INBOX.msf are created and used, and <server_name>.sbd\INBOX/cur is created and used. Directories/files for subfolders under INBOX are created in <server_name>.sbd\INBOX.sbd.
  Directories/files for all other folders are created normally under the <server_name> directory (and <server_name>.msf is created, as usual). This may be caused by the /INBOX/Inbox folder that was intentionally created for bug testing.
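For reference, the classic way to close the "same cur/nnnn file used by multiple mails" hole is to claim the name atomically instead of checking and then creating. The following is a minimal sketch in Python, not Thunderbird's actual code: the name scheme, `unique_name` and `deliver` are hypothetical, and it assumes a file system that supports hard links.

```python
import itertools
import os
import secrets
import time

_counter = itertools.count()  # per-process sequence number


def unique_name() -> str:
    """Hypothetical name scheme: time, pid, counter and random bits."""
    return "{}.P{}Q{}R{}".format(
        int(time.time()), os.getpid(), next(_counter), secrets.token_hex(4)
    )


def deliver(maildir: str, data: bytes, max_tries: int = 10) -> str:
    """Write data to tmp/, then publish it in cur/ without overwriting."""
    tmp_dir = os.path.join(maildir, "tmp")
    cur_dir = os.path.join(maildir, "cur")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(cur_dir, exist_ok=True)
    for _ in range(max_tries):
        name = unique_name()
        tmp_path = os.path.join(tmp_dir, name)
        cur_path = os.path.join(cur_dir, name)
        try:
            # O_EXCL makes creation fail if the name is already taken in tmp/.
            fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            # link() fails if cur_path exists, so an existing mail is never clobbered.
            os.link(tmp_path, cur_path)
            return cur_path
        except FileExistsError:
            continue
        finally:
            os.unlink(tmp_path)
    raise RuntimeError("could not allocate a unique maildir filename")
```

Every failure path raises or retries, so a zero-length or half-written file is never left behind in cur/.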
Comment 5•10 years ago
I've opened meta bug 859011 for many currently known problems around "Copy/Move mails with MaildirStore". So moving such bugs from this bug's dependency tree to that bug, to keep this bug as root meta bug for "followups after MaildirStore implementation".
Comment 6•8 years ago
Is it possible to have the maildir files have the same filename [or at least the date/time part], sans the flags as they do on the server?
Comment 7•8 years ago
Deleting a folder with Thunderbird will delete the .msf file, but not the actual folder that messages are stored in.
Comment 8•8 years ago
My inbox folder has about 156k emails in it, but the corresponding file system folder has 200k files. How can this be reconciled without downloading all the emails again? It's taken close to 40 hours to download my whole mail store so far and it isn't quite finished yet. The sluggishness and lock ups during this time are painful too. I've had crashes and had to restart Thunderbird, this may be part of the problem. I haven't deleted 40k emails either (from the inbox). The inbox is only one of my folders, some other folders have huge numbers too. All up I should have about 2M emails (not quite), which includes mailing list data amongst other emails going back over 10 years. It would be much better to be able to rsync the Maildir folder and have a process adjust file names and remove tag information from the file name to the lines at the top of each email. Then build the .msf files from the actual content ... so, basically an offline build.
Comment 9•8 years ago
Did you guys even test this? I've initiated a repair on my Inbox and now I have 16K extra files in the OS with what looks like another 120K to go. Total emails in the Inbox (reported by TB) is identical to the server's OS directory, that is around 156K; right now the client's OS directory has 217K of email files. Of course the problems are amplified with large mail storage, but even small test folders exhibit problems.
Reporter | ||
Comment 10•8 years ago
Andrew, AFAIK no one using maildir has experienced this problem. Please file a separate bug report describing your issue https://bugzilla.mozilla.org/enter_bug.cgi?product=Thunderbird and answering the questions posed here and make it block this bug - because this bug is a meta for overall tracking, not for fixing specific bugs. Thanks
(In reply to Andrew McGlashan from comment #8)
> My inbox folder has about 156k emails in it, but the corresponding file
> system folder has 200k files. How can this be reconciled without
> downloading all the emails again? It's taken close to 40 hours to download
> my whole mail store so far and it isn't quite finished yet. The
> sluggishness and lock ups during this time are painful too.
> I've had crashes and had to restart Thunderbird, this may be part of the
> problem. I haven't deleted 40k emails either (from the inbox).
Please post your crash IDs in the new bug report https://support.mozilla.org/en-US/kb/mozilla-crash-reporter#w_viewing-crash-reports
> The inbox is only one of my folders, some other folders have huge numbers
> too. All up I should have about 2M emails (not quite), which includes
> mailing list data amongst other emails going back over 10 years.
>
> It would be much better to be able to rsync the Maildir folder and have a
> process adjust file names and remove tag information from the file name to
> the lines at the top of each email. Then build the .msf files from the
> actual content ... so, basically an offline build.
I believe imap would not like that.
Please direct all future comments to the new bug(s) you file.
Comment 12•6 years ago
Is there any progress on this issue? Thomas
Reporter | ||
Comment 13•5 years ago
Updating dependency list, from bugs found in the query https://bugzilla.mozilla.org/buglist.cgi?v4=maildir&bug_id=476239%2C%201306254%20845952&bug_id_type=nowords&f1=short_desc&o3=substring&list_id=14305682&f8=blocked&v3=maildir&j2=OR&o1=nowordssubstr&resolution=---&classification=Client%20Software&classification=Components&f4=status_whiteboard&query_format=advanced&f3=short_desc&o4=substring&f2=OP&v8=845952%20859011%20&f7=CP&product=MailNews%20Core&product=Thunderbird&o8=nowordssubstr And adding selected "non-blocking" maildir bugs to "related" list
Comment 15•3 years ago
Comment 16•3 years ago
Guys, are you aware that in TB68 maildir implementation caused imap folders to have messages with unreadable attachments? For example many users reported that they can't open pdf files from attachments. EML files on disk were fine. Repairing folder helps same as deleting msf and allowing program to recreate it. So the problem was in improper msf files.
Another problem concerning maildir local folders was connected to msf files, too. When moving files from folder to folder (in local folders) (mostly more than 50 at a time), they were moved on disk, but not in TB (stayed in msf). TB showed them in src folder (but clicking them said msg unavailable). It doesn't happen "sometimes". It happened all the time. Generally after moving files (when manually sorting them year by year) we had to delete msf in order to get proper message list and view what was moved and what not.
My question is - are you aware of these problems and did you fix them? Is msf thoroughly tested? 78+ roadmap says that maildir is decent in 78, but i didn't find anything mentioned in changelog about bugs i stumbled upon. To be clear it wasn't on just my machine. With my coworker we deployed many migrations from mbox to maildir and had these problems on almost every machine.
Currently we use mbox for imap and maildir for local folders, but prefer to move emails around in mbox because it's more stable. We convert to maildir in the end after all work is done.
Last but not least, I would name maildir emails by date then msg id because it allows sorting them yearly and totally simplifies archiving. Trying to work on one big maildir local folder consisting of mails from many years is in tb68 almost impossible without converting to mbox first.
Comment 17•3 years ago
To explain the last sentence, "I would name maildir emails by date then msg id": I mean the files on the disk. That would allow moving them into folders by hand rather than in TB. Moving many emails in maildir local folders is very unstable, as I said before. I did write a script that extracts the date from each email message and renames the files, but it would be super cool to not have to do that in the first place. This functionality is dictated by the fact that we archive mails mostly by year. People often say: delete/archive emails older than x years.
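To illustrate the kind of renaming described above, here is a minimal Python sketch. It is not the commenter's script: the `dated_name` helper and the "date_messageid" output layout are assumptions chosen for the example, and the date is normalized to UTC.

```python
import re
from datetime import timezone
from email.parser import BytesHeaderParser
from email.utils import parsedate_to_datetime


def dated_name(raw: bytes) -> str:
    """Build a '<UTC date>_<message id>.eml' filename from raw message bytes."""
    headers = BytesHeaderParser().parsebytes(raw)

    # Message-ID without angle brackets, reduced to filename-safe characters.
    msg_id = str(headers.get("Message-ID", "")).strip().strip("<>")
    safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", msg_id) or "unknown"

    try:
        when = parsedate_to_datetime(str(headers["Date"]))
    except (TypeError, ValueError):
        # Missing or unparsable Date: header, fall back to the message id alone.
        return "nodate_{}.eml".format(safe_id)

    if when.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        when = when.replace(tzinfo=timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return "{}_{}.eml".format(stamp, safe_id)
```

Names built this way sort chronologically in a plain directory listing, which is what makes by-hand yearly archiving practical.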
Another bug.
Reproduce: create account, set custom folder to C:\foo, close TB, delete manually C:\foo. Open TB. This time not only C:\foo is created also c:\foo.sbd is created and TB starts syncing emails and write files into c:\foo.sbd instead of c:\foo.
To get around you have to close TB, delete c:\foo.sbd and run TB again.
This time TB sees existing c:\foo and starts to write files to it.
Assignee | ||
Comment 18•3 years ago
First off, thanks for taking the time to write all that up - very useful and much appreciated!
(In reply to Zbigniew Gralewski from comment #16)
> My question is - are you aware of these problems and did you fix them? Is msf thoroughly tested? 78+ roadmap says that maildir is decent in 78, but i didn't find anything mentioned in changelog about bugs i stumbled upon. To be clear it wasn't on just my machine. With my coworker we deployed many migrations from mbox to maildir and had these problems on almost every machine.
This bug is the overview one to track all the maildir related issues - see the "Depends on" list at the top to see all the unresolved maildir bugs.
There are definitely still enough rough edges that I'd be wary of using maildir in production.
A big maildir push is high on my TODO list.
If there are maildir issues not linked to this meta bug, then I recommend creating a new bug and adding it to the "depends on" list.
I don't see any existing bugs that obviously cover the imap-folders-have-messages-with-unreadable-attachments issue you mention - want to write it up? No problem if not - I'll go through your comments in more detail and write whatever we don't already have.
> Last but not least, I would name maildir emails by date then msg id because it allows sorting them yearly and totally simplifies archiving. Trying to work on one big maildir local folder consisting of mails from many years is in tb68 almost impossible without converting to mbox first.
I think that's an interesting point - definitely something to look into.
There's a bigger question here: Is there any benefit to adhering to the maildir spec (all emails as files in a single flat directory), rather than, say, automatically stashing emails into subfolders. For example, "<YYYY>-<MM>/<msgid>.eml" would probably be manageable and rather useful to the user. You could get a more even distribution by, say, using subdirs based on hashing the messageid. But at the expense of making it an arse for the user to find emails in the filesystem (seems like a bad tradeoff).
Probably a discussion to break out into another bug or on the mailing list.
In any case, plain maildir is a good first step. There's still a bunch of places in the code that kind-of-sort-of assume mbox. So getting vanilla maildir solid and reliable makes it waaaaay simpler to add other potential storage schemes (either minor variants on maildir or stuff that's completely different in approach).
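The two subfolder schemes floated above can be compared with a small sketch. This is purely illustrative Python under assumed names (`month_bucket`, `hash_bucket`); neither scheme exists in Thunderbird.

```python
import hashlib
from datetime import datetime


def month_bucket(when: datetime) -> str:
    """Human-friendly bucket, e.g. '2021-01'. Busy months produce big folders."""
    return when.strftime("%Y-%m")


def hash_bucket(message_id: str, width: int = 2) -> str:
    """Evenly spread bucket: the first hex digits of a hash of the Message-ID.

    width=2 gives 256 buckets, but the bucket name tells the user nothing.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    return digest[:width]


print(month_bucket(datetime(2021, 1, 5)))   # prints "2021-01"
print(hash_bucket("<abc@example.com>"))     # two hex characters
```

The month scheme keeps files findable by a person browsing the disk; the hash scheme only helps the file system.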
Comment 19•3 years ago
> To explain last sentence "I would name maildir emails by date then msg id" - i mean files on the disk. That would allow to move them to folders by
> hand and not in TB. Moving many emails in maildir local folders is very unstable just as I said before. I did write a script that extracts email date
> from email message and renames files, but it would be super cool to not have to do that in the first place
I fully agree with this suggestion from Zbigniew Gralewski
@Zbigniew Gralewski :
You wrote about a script. I am interested.
Could you pass it to me?
thoste at email dot com
Thank you
Reporter | ||
Comment 20•3 years ago
(In reply to Ben Campbell from comment #18)
...
> Last but not least, I would name maildir emails by date then msg id because it allows sorting them yearly and totally simplifies archiving. Trying to work on one big maildir local folder consisting of mails from many years is in tb68 almost impossible without converting to mbox first.
> I think that's an interesting point - definitely something to look into.
> There's a bigger question here: Is there any benefit to adhering to the maildir spec (all emails as files in a single flat directory), rather than, say, automatically stashing emails into subfolders. For example, "<YYYY>-<MM>/<msgid>.eml" would probably be manageable and rather useful to the user. You could get a more even distribution by, say, using subdirs based on hashing the messageid. But at the expense of making it an arse for the user to find emails in the filesystem (seems like a bad tradeoff).
There is indeed a "breaking point" on folder size where it takes forever to enumerate folder contents in the MS Windows environment.
Comment 21•3 years ago
Script that renames eml files massively: https://github.com/VerisZG/ahk_eml_rename_by_date/blob/master/__eml-rename-by-year.ahk
Comment 22•3 years ago
(In reply to Wayne Mery (:wsmwk) from comment #20)
> I think that's an interesting point - definitely something to look into.
> There's a bigger question here: Is there any benefit to adhering to the maildir spec (all emails as files in a single flat directory), rather than, say, automatically stashing emails into subfolders. For example, "<YYYY>-<MM>/<msgid>.eml" would probably be manageable and rather useful to the user. You could get a more even distribution by, say, using subdirs based on hashing the messageid. But at the expense of making it an arse for the user to find emails in the filesystem (seems like a bad tradeoff).
> There is indeed a "breaking point" on folder size where it takes forever to enumerate folder contents in the MS Windows environment.
Wayne, internally I would leave them as they are, following the maildir spec as it is. The cur and tmp folders are fine: one dir in TB, two dirs on the disk (cur and tmp). In other words, it is useful to have the location of files on disk consistent with the structure of folders in Thunderbird. Look at it this way: we have GDPR, we teach people how to archive and delete emails, and TB can be configured to move them into yearly subfolders when archiving. The problem is a user who does nothing and just holds thousands of emails in one big inbox. An admin has to be able to quickly move them into local folders, sort them by year into subfolders, and finally make the user's mailbox smaller so the user is forced to sort and archive in real time or once a week. The admin can put maildir local folders under sync by Google Drive, Synology Drive, Dropbox etc. (excluding MSF files), which gives real-time protection of local folders. I use that with success. So I would only use the date extracted from the email as the filename, because it helps a lot with manual admin work. Renamed EML files reindex in TB just fine. Maybe use the messageid for emails that don't have a proper "Date:" field in the headers, or use "date_messageid". Look at my ahk script attached in a recent post. We rename all files using it and sort them into subfolders by date manually, then delete the msf files and run TB, and the job of sorting thousands of files is done. Admin work is quick and business rules apply. Maybe a bit off-topic, but I wish TB to be friendly to admins and to the implementation of business rules.
Comment 23•3 years ago
(In reply to Ben Campbell from comment #18)
> There's a bigger question here: Is there any benefit to adhering to the maildir spec (all emails as files in a single flat directory), rather than, say, automatically stashing emails into subfolders. For example, "<YYYY>-<MM>/<msgid>.eml" would probably be manageable and rather useful to the user. You could get a more even distribution by, say, using subdirs based on hashing the messageid. But at the expense of making it an arse for the user to find emails in the filesystem (seems like a bad tradeoff).
> Probably a discussion to break out into another bug or on the mailing list.
One issue with the YYYY/MM folder scheme: how would you determine which timezone decides when the month changes to the next one, local or UTC?
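A small Python sketch makes the ambiguity concrete. The `month_bucket` helper and the sample timestamp are made up for illustration: a message sent late on 31 January at UTC-5 lands in a different month folder depending on which clock is used.

```python
from datetime import datetime, timedelta, timezone


def month_bucket(when: datetime, tz: timezone) -> str:
    """Return the 'YYYY-MM' folder name for a timestamp as seen in tz."""
    return when.astimezone(tz).strftime("%Y-%m")


sender_tz = timezone(timedelta(hours=-5))
sent = datetime(2021, 1, 31, 23, 30, tzinfo=sender_tz)

local_bucket = month_bucket(sent, sender_tz)
utc_bucket = month_bucket(sent, timezone.utc)
print(local_bucket, utc_bucket)  # prints "2021-01 2021-02"
```

Whichever rule is chosen, it has to be fixed once and applied consistently, otherwise a message can no longer be located from its date alone.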
Comment 24•2 years ago
All folders of my IMAP accounts are set for offline use, so I always have a backup of all my e-mails. However, with maildir, after compressing folders, thunderbird tends to re-download a lot of those mails, and while everything looks fine in the UI, the on-disk folders contain lots of duplicates. Thunderbird just adds an ever increasing number in front of the ".eml" extension and downloads all the same e-mails again and again. I already noticed this years ago, when maildir was still in beta. Now I set up a new profile and am very disappointed to see it still doing the same shit.
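When filing a bug for this kind of report, it helps to quantify the duplicates. The following Python sketch is an assumption-laden illustration, not a Thunderbird tool: `find_duplicates` is a hypothetical helper that groups the files of one cur directory by their Message-ID header, falling back to a content hash when the header is missing.

```python
import hashlib
import os
from collections import defaultdict
from email.parser import BytesHeaderParser
from typing import Dict, List


def find_duplicates(cur_dir: str) -> Dict[str, List[str]]:
    """Map each Message-ID (or content hash) to the files sharing it."""
    groups = defaultdict(list)
    parser = BytesHeaderParser()
    for name in sorted(os.listdir(cur_dir)):
        path = os.path.join(cur_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            raw = f.read()
        msg_id = parser.parsebytes(raw).get("Message-ID")
        if msg_id:
            key = str(msg_id).strip()
        else:
            # No Message-ID header: identical bytes still count as duplicates.
            key = "sha256:" + hashlib.sha256(raw).hexdigest()
        groups[key].append(name)
    # Keep only the keys that occur more than once.
    return {key: names for key, names in groups.items() if len(names) > 1}
```

The sketch only reports. Deleting files behind Thunderbird's back would leave the .msf index out of step with the directory.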
Comment 25•2 years ago
(In reply to Bachsau from comment #24)
> All folders of my IMAP accounts are set for offline use, so I always have a backup of all my e-mails.
No, you have a local cache copy for speed. Should you lose the emails on the server, or should Thunderbird lose the ability to connect to the server, you will see everything deleted. That is not a backup.
> However, with maildir, after compressing folders,
Compact has no function under Maildir lite and should be disabled in IMAP accounts.
> thunderbird tends to re-download a lot of those mails,
A response to the reindex that the compact process carries, just as a repair folder will see all the message headers downloaded again, but I have no access to compacting imap accounts, only repairing folders in Thunderbird 78
> and while everything looks fine in the UI, the on-disk folders contain lots of duplicates.
> Thunderbird just adds an ever increasing number in front of the ".eml" extension and downloads all the same e-mails again and again. I already noticed this years ago, when maildir was still in beta. Now I set up a new profile and am very disappointed to see it still doing the same shit.
Really maildir lite is still in beta as it has never been enabled by default. See https://support.mozilla.org/en-US/kb/maildir-thunderbird
I have one account I use with maildir and I find it has its issues, but I do not see multiple email copies being downloaded from Gmail.
Perhaps if you have identified a bug that can be reproduced (you offer no steps) you might consider filing a bug for that issue.
This bug is a meta bug to monitor the bugs that are outstanding with regard to the implementation of the maildir lite feature so your comment is most unlikely to see anything happen with regard to the implementation. Filing a bug report for identified bugs is the appropriate approach. If you want to discuss the issue I would suggest you could perhaps use the Discourse forum for beta releases https://discourse.mozilla.org/c/thunderbird/beta/257 While this is considered experimental I would think the beta forum would be the appropriate place for discussion and support as the feature is unfortunately certainly not release quality
Assignee | ||
Comment 26•2 years ago
Breadcrumbs! I just wrote a big screed of mailstore plans over at https://bugzilla.mozilla.org/show_bug.cgi?id=1308335#c9 (which would probably have been more useful here :- )
Comment 27•1 year ago
Comment 28•1 year ago
Possibly already mentioned -
IMAP account, Maildir problem: if you use 'Shift+Del' to bypass Trash, the email is not deleted from the server, so other IMAP clients accessing the account still see it.
Need to exit Thunderbird to force an expunge (as per Account Settings expunge settings) or have to run a full compact on all folders after deleting to update the server. Not practical.
No ability to compact on a folder - no right click compact option.
Can Customise toolbar to add 'compact' button to toolbar, so able to select folder and click on compact.
Request auto expunge and synch to update server if using Shift+DEL.
Support Forum : https://support.mozilla.org/en-US/questions/1377121
Comment 29•9 months ago
i see that Maildir-type storage doesn't seem to handle messages with multiple labels in Gmail as expected... it will create separate (but otherwise identical) EML files, one for each Gmail label / IMAP folder.
this is covered in bugzilla bug #1554529 - Redundant copies of multi-labeled messages stored for GMail (maildir profile)
https://bugzilla.mozilla.org/show_bug.cgi?id=1554529
... but i see that this bug is not mentioned above in the "Depends on" references? Is it fixed in current Thunderbird versions?
Comment 30•9 months ago
Fair enough. Based on Bug 1554529 comment 12 this does appear to be a maildir only issue