Open Bug 845952 (maildirblockers) Opened 11 years ago Updated 13 days ago

[meta] finish "maildir" message storage

Categories

(MailNews Core :: Database, defect)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: wsmwk, Assigned: benc)

References

(Depends on 25 open bugs, Blocks 2 open bugs)

Details

(Keywords: meta, user-doc-needed, Whiteboard: [maildir])

This meta bug organizes the work to finish "maildir" message storage. The blockers are bugs from [1] and from bug 58308 (which is qmail and won't block this). I include fixed bug 402392 and bug 738651. The end goal is to enable maildir as the default, but the meta bug could certainly live on to track straggling issues after maildir is enabled by default.

To do list:
- Someone to file a bug for "enable maildir as default message store"
- check to see if I have missed anything in blockers
- remove anything in blockers that doesn't exactly belong

[1] https://bugzilla.mozilla.org/buglist.cgi?list_id=5785777;field0-0-0=short_desc;type0-0-1=substring;field0-0-1=status_whiteboard;resolution=---;query_format=advanced;value0-0-1=maildir;type0-0-0=anywordssubstr;value0-0-0=maildir%20mail-dir;product=MailNews%20Core;product=Thunderbird
Add to the to-do another task:

- Tool to optimize mail storage (convert mbox to maildir in existing installations)
No longer depends on: 537626
No longer depends on: 584695
Depends on: 58308, 852080, 855950, 855954
Whiteboard: [maildir]
No longer depends on: 789679
Depends on: 856087
The "maildir" in the summary of bug 534135 refers to the pure "Maildir" used by the IMAP server, not Tb's maildir-like store (Maildir Lite, ...). Removing it from the dependencies.
No longer depends on: 534135
One day (after the >4GB mbox work) I'd like to try out the maildir-lite support and see if I can polish some of the bugs reported here.
FYI.

Quick summary of phenomena observed in some basic tests with a recent trunk nightly (Tb 22.0a0).

The current maildir store behaves as follows:
- RETR to LocalMbox/tmp/nnnn, then move to LocalMbox/cur/nnnn: works.
- fetch body[] to IMAPMbox/tmp/nnnn, then move to IMAPMbox/cur/nnnn: works.
- Copy/Move of mail(s) is always treated as
  Copy/Move from LocalSource/cur/nnnn to LocalTargetMbox/cur/nnnn.
- No handling for Copy/Move where the Source/Target folder == berkeleystore:
    there is no Mbox/cur/nnnn, so it misbehaves.
- No handling for Copy/Move where the Source/Target folder == maildirstore/IMAP:
  - Offline-Use=Off :
    No offline-store file (Mbox/cur/nnnn) => misbehaves.
  - Offline-Use=On :
    Copy/Move from/to IMAP_maildir is
       the same Copy/Move as Copy/Move from/to Local_maildir
     + re-synchronization with the server because of IMAP, if Move.
    So, on a move from an IMAP folder, IMAP_maildir/cur/nnnn is
    simply moved to maildir_Target_folder/cur/nnnn,
    as is done in a move from maildir_local_mbox to maildir_local_mbox.
    Then, when the IMAP Mbox is opened after the move, the mail is fetched again.
- It looks like return codes, status etc. are not checked in many places.
  So an empty Mbox/cur/nnnn directory, or an Mbox/cur/nnnn file of size 0,
  is easily created in many cases.
- Unique filename generation has holes, so the same cur/nnnn file can be
  used by multiple mails.
- If the directory for an Mbox is suffixed:
    Mbox names are case sensitive on the IMAP server.
    File names in the client file system are case insensitive.
  abc/cur for abc, Abc-1/cur for Abc, ABC-2/cur for ABC, are used.
  In this case, the association between Mbox name and directory name
  is easily broken by a folder rename or by unsubscribe/subscribe (due to a
  known bug).
A funny phenomenon was also observed.
- If auto-sync of the IMAP account is disabled,
  <server_name>.sbd is created, <server_name>.sbd\INBOX and INBOX.msf
  are created and used, and <server_name>.sbd\INBOX/cur is created
  and used. Directories/files for sub folders under INBOX are created
  in <server_name>.sbd\INBOX.sbd.
  Directories/files for all other folders are created normally under the
  <server_name> directory (and <server_name>.msf is created, as usual).
  This may be caused by the /INBOX/Inbox folder that was intentionally
  created for bug testing.
 .
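The two store-level items above (unchecked results leaving empty files, and filename collisions in cur/) are what the usual maildir delivery sequence is meant to rule out. The following is a minimal Python sketch of that sequence, not Thunderbird's actual code: `unique_name` and `deliver` are hypothetical helpers, and the time/pid/counter/hostname naming is only the common maildir convention, assumed here for illustration.

```python
import itertools
import os
import socket
import time

# Per-process counter so two deliveries in the same second get different names.
_counter = itertools.count()


def unique_name() -> str:
    """Build a name from time, pid, a per-process counter and the hostname."""
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    return f"{int(time.time())}.P{os.getpid()}Q{next(_counter)}.{host}"


def deliver(maildir: str, data: bytes) -> str:
    """Write data into tmp/ with exclusive creation, then move it into cur/."""
    for sub in ("tmp", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    while True:
        name = unique_name()
        tmp_path = os.path.join(maildir, "tmp", name)
        try:
            # O_EXCL makes creation fail instead of reusing an existing file.
            fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # name already taken, generate another one
        break
    # Any write error raises here, so a failed write is never moved into cur/.
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    cur_path = os.path.join(maildir, "cur", name)
    # Note: on POSIX os.rename replaces an existing target, so uniqueness
    # in cur/ still rests on the generated name being unique.
    os.rename(tmp_path, cur_path)
    return cur_path
```

The point of the sketch is that every step either succeeds or raises, and that the message only appears in cur/ after it has been written completely.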
Component: General → Database
Product: Thunderbird → MailNews Core
Hardware: x86 → All
Version: unspecified → Trunk
I've opened meta bug 859011 for the many currently known problems around "Copy/Move mails with MaildirStore".
So I'm moving such bugs from this bug's dependency tree to that bug, to keep this bug as the root meta bug for "followups after MaildirStore implementation".
Depends on: 797710
Depends on: 890742
Depends on: 906469
Blocks: 476239
Depends on: 816304
Depends on: 1011399
Alias: maildirblockers
Keywords: feature
Depends on: 1078367
No longer depends on: 58308
No longer depends on: 753147
No longer blocks: 1135309
Depends on: 1135309
No longer depends on: 1124948, 1135309
Is it possible to have the maildir files have the same filename [or at least the date/time part], sans the flags as they do on the server?
Deleting a folder with Thunderbird will delete the .msf file, but not the actual folder that messages are stored in.
My inbox folder has about 156k emails in it, but the corresponding file system folder has 200k files.  How can this be reconciled without downloading all the emails again?  It's taken close to 40 hours to download my whole mail store so far and it isn't quite finished yet.  The sluggishness and lock ups during this time are painful too.

I've had crashes and had to restart Thunderbird, this may be part of the problem.  I haven't deleted 40k emails either (from the inbox).

The inbox is only one of my folders, some other folders have huge numbers too.  All up I should have about 2M emails (not quite), which includes mailing list data amongst other emails going back over 10 years.

It would be much better to be able to rsync the Maildir folder and have a process adjust file names and remove tag information from the file name to the lines at the top of each email.  Then build the .msf files from the actual content ... so, basically an offline build.
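As an illustration of the "remove tag information from the file name" step, here is a small Python sketch. It assumes the server files follow the standard Maildir naming `unique:2,FLAGS`; `split_maildir_name` is a hypothetical helper, and what Thunderbird would do with the extracted flags is not covered here.

```python
# Standard Maildir flag letters from the ":2," info suffix.
MAILDIR_FLAGS = {
    "D": "draft",
    "F": "flagged",
    "P": "passed",
    "R": "replied",
    "S": "seen",
    "T": "trashed",
}


def split_maildir_name(filename: str) -> tuple[str, list[str]]:
    """Return (name without the flag suffix, list of flag names)."""
    base, sep, info = filename.partition(":2,")
    if not sep:
        return filename, []  # no flag suffix present
    flags = [MAILDIR_FLAGS[c] for c in info if c in MAILDIR_FLAGS]
    return base, flags


print(split_maildir_name("1355543030.15049.host:2,RS"))
# ('1355543030.15049.host', ['replied', 'seen'])
```

The base name could then be used as the local filename, with the flags stored wherever the client keeps message state.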
Did you guys even test this?

I've initiated a repair on my Inbox and now I have 16K extra files in the OS with what looks like another 120K to go.

Total emails in the Inbox (reported by TB) is identical to the server's OS directory, that is around 156K; right now the client's OS directory has 217K of email files.

Of course the problems are amplified with large mail storage, but even small test folders exhibit problems.
Andrew, AFAIK no one using maildir has experienced this problem. Please file a separate bug report describing your issue https://bugzilla.mozilla.org/enter_bug.cgi?product=Thunderbird and answering the questions posed here and make it block this bug - because this bug is a meta for overall tracking, not for fixing specific bugs. Thanks

(In reply to Andrew McGlashan from comment #8)
> My inbox folder has about 156k emails in it, but the corresponding file
> system folder has 200k files.  How can this be reconciled without
> downloading all the emails again?  It's taken close to 40 hours to download
> my whole mail store so far and it isn't quite finished yet.  The
> sluggishness and lock ups during this time are painful too.


> I've had crashes and had to restart Thunderbird, this may be part of the
> problem.  I haven't deleted 40k emails either (from the inbox).

Please post your crash IDs in the new bug report
https://support.mozilla.org/en-US/kb/mozilla-crash-reporter#w_viewing-crash-reports

> The inbox is only one of my folders, some other folders have huge numbers
> too.  All up I should have about 2M emails (not quite), which includes
> mailing list data amongst other emails going back over 10 years.
> 
> It would be much better to be able to rsync the Maildir folder and have a
> process adjust file names and remove tag information from the file name to
> the lines at the top of each email.  Then build the .msf files from the
> actual content ... so, basically an offline build.

I believe imap would not like that.
Please direct all future comments to the new bug(s) you file.
Depends on: 1176675
Depends on: 1182686
Depends on: 1044456
Depends on: 1259035
Depends on: 1259040
Depends on: 1261633
Blocks: 1306254
Depends on: 1275948, 1264673
Depends on: 1307017
Depends on: 1317066
Depends on: 1293770
Depends on: 1203570
Depends on: 1317117
Is there any progress on this issue?

Thomas
Depends on: 1457409
Depends on: 1214407
Depends on: 1333342
Depends on: 1472524
Depends on: 1215807
Depends on: 1486491
Depends on: 1491228
Depends on: 1498532
Depends on: 1504465
Depends on: 1519364
Depends on: 1529929
Depends on: 1515254
Depends on: 1526289
Depends on: 1519045
Depends on: 1586653
Depends on: 856396
Depends on: 1593455
Depends on: 1607021
No longer depends on: 1607021
Assignee: nobody → benc
Depends on: 1533624
Depends on: 1611897
Depends on: 1643901
See Also: → 533792
Depends on: 1617518

Guys, are you aware that in TB68 the maildir implementation caused IMAP folders to have messages with unreadable attachments? For example, many users reported that they can't open PDF files from attachments. The EML files on disk were fine. Repairing the folder helps, as does deleting the msf and letting the program recreate it. So the problem was in improper msf files.

Another problem concerning maildir local folders was connected to msf files, too. When moving files from folder to folder (in local folders) (mostly more than 50 at a time), they were moved on disk, but not in TB (stayed in msf). TB showed them in src folder (but clicking them said msg unavailable). It doesn't happen "sometimes". It happened all the time. Generally after moving files (when manually sorting them year by year) we had to delete msf in order to get proper message list and view what was moved and what not.

My question is - are you aware of these problems and did you fix them? Is msf thoroughly tested? The 78+ roadmap says that maildir is decent in 78, but I didn't find anything mentioned in the changelog about the bugs I stumbled upon. To be clear, it wasn't on just my machine. With my coworker we deployed many migrations from mbox to maildir and had these problems on almost every machine.

Currently we use mbox for imap and maildir for local folders, but prefer to move emails around in mbox because it's more stable. We convert to maildir in the end after all work is done.

Last but not least, I would name maildir emails by date then msg id because it allows sorting them yearly and totally simplifies archiving. Trying to work on one big maildir local folder consisting of mails from many years is in tb68 almost impossible without converting to mbox first.

To explain last sentence "I would name maildir emails by date then msg id" - I mean files on the disk. That would allow to move them to folders by hand and not in TB. Moving many emails in maildir local folders is very unstable just as I said before. I did write a script that extracts email date from email message and renames files, but it would be super cool to not have to do that in the first place. This functionality is dictated by the fact that we archive mails mostly by year. People often say - delete/archive emails older than x years.
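The renaming idea can be sketched in Python roughly as follows. This is not the script mentioned above, only an illustration under stated assumptions: `dated_name` is a hypothetical helper, the `date_messageid.eml` pattern and the UTC timestamp format are choices made for this example, and messages without a usable "Date:" header fall back to the message id alone.

```python
import re
from datetime import timezone
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def dated_name(raw: bytes) -> str:
    """Build a 'date_messageid.eml' filename from a raw message."""
    msg = message_from_bytes(raw)
    msgid = str(msg.get("Message-ID") or "no-id").strip("<> \t\r\n")
    # Keep only characters that are safe in filenames on common systems.
    msgid = re.sub(r"[^A-Za-z0-9._@-]", "_", msgid)
    date_hdr = msg.get("Date")
    if date_hdr is None:
        return f"{msgid}.eml"
    try:
        when = parsedate_to_datetime(str(date_hdr))
    except (TypeError, ValueError):
        return f"{msgid}.eml"  # unparseable Date header
    if when.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC here.
        when = when.replace(tzinfo=timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}_{msgid}.eml"


raw = (b"Date: Tue, 01 Sep 2020 10:30:00 +0200\r\n"
       b"Message-ID: <abc@example.com>\r\n\r\nbody\r\n")
print(dated_name(raw))  # 20200901T083000Z_abc@example.com.eml
```

Names built this way sort chronologically in a plain directory listing, which is what makes moving files into yearly folders by hand practical.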

Another bug.
Reproduce: create account, set custom folder to C:\foo, close TB, manually delete C:\foo. Open TB. This time not only is C:\foo created, but c:\foo.sbd is also created, and TB starts syncing emails and writes files into c:\foo.sbd instead of c:\foo.
To work around this, you have to close TB, delete c:\foo.sbd and run TB again.
This time TB sees the existing c:\foo and starts to write files to it.

First off, thanks for taking the time to write all that up - very useful and much appreciated!

(In reply to Zbigniew Gralewski from comment #16)

> My question is - are you aware of these problems and did you fix them? Is msf thoroughly tested? The 78+ roadmap says that maildir is decent in 78, but I didn't find anything mentioned in the changelog about the bugs I stumbled upon. To be clear, it wasn't on just my machine. With my coworker we deployed many migrations from mbox to maildir and had these problems on almost every machine.

This bug is the overview one to track all the maildir related issues - see the "Depends on" list at the top to see all the unresolved maildir bugs.
There are definitely still enough rough edges that I'd be wary of using maildir in production.
A big maildir push is high on my TODO list.
If there are maildir issues not linked to this meta bug, then I recommend creating a new bug and adding it to the "depends on" list.

I don't see any existing bugs that obviously cover the imap-folders-have-messages-with-unreadable-attachments issue you mention - want to write it up? No problem if not - I'll go through your comments in more detail and write whatever we don't already have.

> Last but not least, I would name maildir emails by date then msg id because it allows sorting them yearly and totally simplifies archiving. Trying to work on one big maildir local folder consisting of mails from many years is in tb68 almost impossible without converting to mbox first.

I think that's an interesting point - definitely something to look into.
There's a bigger question here: Is there any benefit to adhering to the maildir spec (all emails as files in a single flat directory), rather than, say, automatically stashing emails into subfolders. For example, "<YYYY>-<MM>/<msgid>.eml" would probably be manageable and rather useful to the user. You could get a more even distribution by, say, using subdirs based on hashing the messageid. But at the expense of making it an arse for the user to find emails in the filesystem (seems like a bad tradeoff).
Probably a discussion to break out into another bug or on the mailing list.
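To make the two layouts in that tradeoff concrete, here is a small Python sketch. Both functions are hypothetical illustrations, not a proposed implementation; the two-character SHA-256 prefix is an arbitrary choice for this example.

```python
import hashlib
from datetime import datetime, timezone


def by_month(msgid: str, when: datetime) -> str:
    """Layout 1: '<YYYY>-<MM>/<msgid>.eml', easy for a user to browse."""
    return f"{when:%Y-%m}/{msgid}.eml"


def by_hash(msgid: str) -> str:
    """Layout 2: a subdir from a hash prefix, evenly spread but opaque."""
    prefix = hashlib.sha256(msgid.encode("utf-8")).hexdigest()[:2]
    return f"{prefix}/{msgid}.eml"


when = datetime(2020, 9, 1, 8, 30, tzinfo=timezone.utc)
print(by_month("abc@example.com", when))  # 2020-09/abc@example.com.eml
print(by_hash("abc@example.com"))
```

With a two-character hex prefix, the hashed layout spreads messages over up to 256 subdirectories regardless of date, which is the even distribution mentioned above, but the directory name tells the user nothing.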

In any case, plain maildir is a good first step. There's still a bunch of places in the code that kind-of-sort-of assume mbox. So getting vanilla maildir solid and reliable makes it waaaaay simpler to add other potential storage schemes (either minor variants on maildir or stuff that's completely different in approach).

> To explain last sentence "I would name maildir emails by date then msg id" - I mean files on the disk. That would allow to move them to folders by hand and not in TB. Moving many emails in maildir local folders is very unstable just as I said before. I did write a script that extracts email date from email message and renames files, but it would be super cool to not have to do that in the first place

I fully agree with this suggestion from Zbigniew Gralewski

@Zbigniew Gralewski :

You wrote about a script. I am interested.
Could you pass it to me?
thoste at email dot com

Thank you

(In reply to Ben Campbell from comment #18)

...

> > Last but not least, I would name maildir emails by date then msg id because it allows sorting them yearly and totally simplifies archiving. Trying to work on one big maildir local folder consisting of mails from many years is in tb68 almost impossible without converting to mbox first.
>
> I think that's an interesting point - definitely something to look into.
> There's a bigger question here: Is there any benefit to adhering to the maildir spec (all emails as files in a single flat directory), rather than, say, automatically stashing emails into subfolders. For example, "<YYYY>-<MM>/<msgid>.eml" would probably be manageable and rather useful to the user. You could get a more even distribution by, say, using subdirs based on hashing the messageid. But at the expense of making it an arse for the user to find emails in the filesystem (seems like a bad tradeoff).

There is indeed a "breaking point" on folder size where it takes forever to enumerate folder contents in the MS Windows environment.

(In reply to Wayne Mery (:wsmwk) from comment #20)

> > I think that's an interesting point - definitely something to look into.
> > There's a bigger question here: Is there any benefit to adhering to the maildir spec (all emails as files in a single flat directory), rather than, say, automatically stashing emails into subfolders. For example, "<YYYY>-<MM>/<msgid>.eml" would probably be manageable and rather useful to the user. You could get a more even distribution by, say, using subdirs based on hashing the messageid. But at the expense of making it an arse for the user to find emails in the filesystem (seems like a bad tradeoff).
>
> There is indeed a "breaking point" on folder size where it takes forever to enumerate folder contents in the MS Windows environment.

Wayne, internally I would leave them as they are and in maildir spec as it is. Cur and tmp folders are fine. One dir in TB, two dirs on the disk (cur and tmp). In other words it is useful to have location of files on disk consistent with structure of folders in Thunderbird. Look at it this way, we have GDPR, we teach people how to archive and delete emails and TB can be configured to move them into yearly subfolders when archiving. Problem is when you have a user that does nothing just holds thousands of emails in big inbox. Admin has to be able to quickly move them into local folders, sort by year into subfolders and finally make user mailbox smaller so the user is forced to sort and archive in realtime or once a week. Admin can put maildir local folders into sync by google drive, synology drive, dropbox etc (excluding MSF files) and you have realtime protection of local folders then. I use that with success. So I would only use the date extracted from email as filename because it helps a lot with manual admin work. Renamed EML files reindex in TB just fine. Maybe use messageid for emails that don't have proper "Date:" field in headers or use "date_messageid". Look at my ahk script attached in recent post. We rename all files using it and sort into subfolders by date manually, then delete msf files, run TB and the job of sorting thousands of files is done. Admin work is quick and business rules apply. Maybe a bit offtopic but I wish TB to be admin and business rules implementation friendly.

(In reply to Ben Campbell from comment #18)

> There's a bigger question here: Is there any benefit to adhering to the maildir spec (all emails as files in a single flat directory), rather than, say, automatically stashing emails into subfolders. For example, "<YYYY>-<MM>/<msgid>.eml" would probably be manageable and rather useful to the user. You could get a more even distribution by, say, using subdirs based on hashing the messageid. But at the expense of making it an arse for the user to find emails in the filesystem (seems like a bad tradeoff).
> Probably a discussion to break out into another bug or on the mailing list.

One issue with the YYYY/MM folders: how would you determine which timezone decides when the month changes to the next one - local or UTC?
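A short Python sketch of the ambiguity, using a made-up message timestamp; `month_bucket` is a hypothetical helper, and the UTC-5 zone is only an example of a local timezone.

```python
from datetime import datetime, timedelta, timezone


def month_bucket(when: datetime, tz: timezone) -> str:
    """Return the 'YYYY/MM' folder for a timestamp as seen in timezone tz."""
    return when.astimezone(tz).strftime("%Y/%m")


local = timezone(timedelta(hours=-5))
sent = datetime(2021, 1, 31, 23, 30, tzinfo=local)

print(month_bucket(sent, local))         # 2021/01
print(month_bucket(sent, timezone.utc))  # 2021/02
```

The same message lands in different folders depending on the choice, so whichever timezone is used would have to be fixed once and applied consistently, otherwise the layout changes when the user travels or the system timezone changes.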

See Also: → 286888
Summary: finish "maildir" message storage [meta] → [meta] finish "maildir" message storage
Depends on: 1694942

All folders of my IMAP accounts are set for offline use, so I always have a backup of all my e-mails. However, with maildir, after compressing folders, thunderbird tends to re-download a lot of those mails, and while everything looks fine in the UI, the on-disk folders contain lots of duplicates. Thunderbird just adds an ever increasing number in front of the ".eml" extension and downloads all the same e-mails again and again. I already noticed this years ago, when maildir was still in beta. Now I set up a new profile and am very disappointed to see it still doing the same shit.
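For anyone wanting to check their own profile for this, a rough Python sketch that groups the files in one on-disk folder by their Message-ID header. `find_duplicates` is a hypothetical helper; it assumes one message per file and treats identical Message-IDs as duplicates, which may not hold for every mailbox.

```python
import os
from collections import defaultdict
from email.parser import BytesParser


def find_duplicates(folder: str) -> dict[str, list[str]]:
    """Map each Message-ID seen more than once to the files that carry it."""
    by_id: dict[str, list[str]] = defaultdict(list)
    parser = BytesParser()
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            # Only the headers are needed, so skip parsing the body.
            msg = parser.parse(f, headersonly=True)
        msgid = msg.get("Message-ID")
        if msgid:
            by_id[str(msgid).strip()].append(name)
    return {mid: names for mid, names in by_id.items() if len(names) > 1}
```

Running it against a folder's cur directory and attaching the counts to a new bug report would give a reproducible measure of how many copies exist.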

(In reply to Bachsau from comment #24)

> All folders of my IMAP accounts are set for offline use, so I always have a backup of all my e-mails.

No, you have a local cache copy for speed. Should you lose the emails on the server, or should Thunderbird lose the ability to connect to the server, you will see everything deleted. That is not a backup.

> However, with maildir, after compressing folders,

Compact has no function under Maildir lite and should be disabled in IMAP accounts.

> thunderbird tends to re-download a lot of those mails,

That is a response to the reindex that the compact process carries out, just as a folder repair will see all the message headers downloaded again. But I have no access to compacting IMAP accounts, only to repairing folders, in Thunderbird 78.

> and while everything looks fine in the UI, the on-disk folders contain lots of duplicates.
>
> Thunderbird just adds an ever increasing number in front of the ".eml" extension and downloads all the same e-mails again and again. I already noticed this years ago, when maildir was still in beta. Now I set up a new profile and am very disappointed to see it still doing the same shit.

Really maildir lite is still in beta as it has never been enabled by default. See https://support.mozilla.org/en-US/kb/maildir-thunderbird

I have one account I use with maildir and I find it has its issues, but I do not see multiple email copies being downloaded from Gmail.

Perhaps if you have identified a bug that can be reproduced (you offer no steps) you might consider filing a bug for that issue.

This bug is a meta bug to monitor the bugs that are outstanding with regard to the implementation of the maildir lite feature, so your comment is most unlikely to see anything happen with regard to the implementation. Filing a bug report for identified bugs is the appropriate approach. If you want to discuss the issue, I would suggest you could perhaps use the Discourse forum for beta releases: https://discourse.mozilla.org/c/thunderbird/beta/257. While this is considered experimental, I would think the beta forum would be the appropriate place for discussion and support, as the feature is unfortunately certainly not release quality.

Depends on: 1686852
Depends on: 1717137
Depends on: 1683714
Depends on: 1719996

Breadcrumbs! I just wrote a big screed of mailstore plans over at https://bugzilla.mozilla.org/show_bug.cgi?id=1308335#c9 (which would probably have been more useful here :- )

Depends on: 1736320
Status: NEW → ASSIGNED
Depends on: 1763263
Depends on: 1764857
No longer depends on: 1764857
Depends on: 1767190

Possibly already mentioned -
IMAP account - Maildir - Problem: if you use 'Shift+Del' to bypass Trash, the email is not deleted off the server, so other IMAP access to the account still sees the email.
You need to exit Thunderbird to force an expunge (as per the Account Settings expunge settings) or run a full compact on all folders after deleting to update the server. Not practical.
There is no ability to compact a single folder - no right-click compact option.
You can customise the toolbar to add a 'compact' button, so you are able to select a folder and click on compact.

Request auto expunge and synch to update server if using Shift+DEL.
Support Forum : https://support.mozilla.org/en-US/questions/1377121

I see that Maildir-type storage doesn't seem to handle messages with multiple labels in Gmail as expected... it will create separate (but otherwise identical) EML files, one for each Gmail label / IMAP folder.

This is covered in bugzilla bug #1554529 - Redundant copies of multi-labeled messages stored for GMail (maildir profile)
https://bugzilla.mozilla.org/show_bug.cgi?id=1554529

... but I see that this bug is not mentioned above in the "Depends on" references? Is it fixed in current Thunderbird versions?

Depends on: 1554529

Fair enough. Based on Bug 1554529 comment 12 this does appear to be a maildir only issue

Severity: normal → S3
Depends on: 1827973
Depends on: 1835556
Depends on: 1716651

Most of the blocking reports are "bugs" not enhancements, so let's categorize this meta as a "bug"

Type: enhancement → defect
Depends on: 1888585
Depends on: 1898635