Open Bug 361807 Opened 18 years ago Updated 1 year ago

Support a SQLite Database for Storing Mails

Categories

(MailNews Core :: Database, enhancement)

Tracking

(Not tracked)

People

(Reporter: kamikazow, Unassigned)

User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en; rv:1.9a1) Gecko/20061029 Camino/1.2+
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en; rv:1.9a1) Gecko/20061029 Camino/1.2+

Currently Thunderbird can lose lots of mails when crashing during a write access in the mail database.
Since SQLite will be part of Mozilla 1.9 anyway (for Firefox's Places system) Thunderbird should use this as well for storing mails.
Thunderbird's mail database would become "crash proof".
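As a rough illustration of the property being asked for, the sketch below uses Python's standard `sqlite3` module to write a batch of mails inside one transaction. The `mail` table, the `store_messages` helper, and the simulated failure are all hypothetical; a raised exception only stands in for a crash, and real crash safety also depends on SQLite's journaling and on the disk honoring syncs.

```python
import sqlite3


def store_messages(conn, messages, fail_after=None):
    """Insert all messages in one transaction; undo the whole batch on error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for i, (subject, body) in enumerate(messages):
                if fail_after is not None and i == fail_after:
                    # Hypothetical stand-in for a crash in the middle of a write.
                    raise RuntimeError("simulated crash mid-write")
                conn.execute(
                    "INSERT INTO mail (subject, body) VALUES (?, ?)",
                    (subject, body),
                )
    except RuntimeError:
        pass  # the partially written batch has been rolled back


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (id INTEGER PRIMARY KEY, subject TEXT, body TEXT)")

# First batch succeeds; second batch fails after one insert and is undone.
store_messages(conn, [("hello", "first"), ("again", "second")])
store_messages(conn, [("lost?", "third"), ("never", "fourth")], fail_after=1)

count = conn.execute("SELECT COUNT(*) FROM mail").fetchone()[0]
```

Here the previously committed mails survive the failed batch, and no half-written batch is left behind.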

Reproducible: Always
But then you can't use Spotlight search (bug 290057). Maildir (a separate file per message) might be better: see bug 58308.
I support Thunderbird moving to a different mail storage back-end.  There are obvious advantages and disadvantages to using either SQLite or Maildir.  I would propose that the Thunderbird developers make it easier for the community to create their own mail storage backends.  Having a 'pluggable' mail storage system would eliminate any arguments over what system is better.  The Mac users could create their Spotlight search capable mail backend, and the Windows users could have an alternative to the default mbox file.

Based on the feedback generated by both bugs - 290057 and 58308, it seems to me that creating a pluggable framework for email storage would be a good move because it would also eliminate this same discussion in the future when mail storage format X becomes everyone's favorite way of storing emails.  Originally we implemented mail storage using mbox.  A while back we were introduced to the License Free Maildir format.  Recently we were introduced to Public Domain SQLite.  What will it be next?
It would probably move to Unified Storage: http://wiki.mozilla.org/Mozilla2:Unified_Storage
Unfortunately I couldn't find any bugs for tracking. (I'm getting fed up with having to delete the .msf-files on a regular basis.)
Oops, I posted this to the Penelope version of the bug (bug 364808) by mistake. Sorry for the cross-posting:

> Currently Thunderbird can lose lots of mails when crashing during a write
> access in the mail database.

I understand the theory, but does it happen in practice? In other words, is
this a solution looking for a problem? 

I haven't encountered the problem in supporting several small businesses on
Mozilla/TB mail clients (and Eudora), in my own usage, or providing end-user
support in Mozillazine forums. In fact, they've been exceptionally stable --
one reason I use these mail clients -- and I've had more problem with db-based
clients like Outlook.

There are other headaches, like the time it takes to reindex large mailboxes
and mailbox size limits. Something that fixes those bugs, and has other
attractions, might be useful.

I prefer the text format for compatibility, both now and in the future. In 10+
years, will I be able to read a SQLite db created today? Where can I find 10
year old db apps today? I can read any mbox file ever created, though, even 15
years old.

As they say KISS, or more precisely, let's keep it 'as simple as possible, and
no simpler'.

(BTW, the maildir bug is Bug 58308)
Flags: blocking-thunderbird3?
Confirming as valid request for consideration.
Lowering severity to enh.
Address book is Bug 382876
Severity: major → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Mac OS X → All
Hardware: Macintosh → All
Version: unspecified → Trunk
 To quote someone else "In this modern world, the only sensible thing for someone implementing a mail reader to do is use BSD `mbox' files to store mail messages, because that is the de-facto standard used by almost every popular mail reader, on Unix, Windows, and Mac." I happen to agree with this for present and future compatibility reasons.

KAMiKAZOW: A database is not crash proof, nothing is. The closest you'll get is a backup.
Blocks: 402392
In http://www.jwz.org/doc/mailsum.html, Jamie seemed to be against DB summaries.  He only proposed a better method for storing the information about messages, not the messages themselves.  The problem at hand concerns the instability of the mbox format while writing the file out.  jwq - Perhaps your statement is a call for better stability while writing the mbox file out.

In the short span of nine years since Mr. Zawinski wrote his statements, database engines have progressed from very unstable and fragile to the very stable and robust engines we have now.

Nothing is crash proof - the point of this bug is not to fix the inevitable truth that all things human are imperfect.  The point of this bug seems to me very simply put that the mbox implementation by Thunderbird is prone to data loss and a better solution to storage should be considered.

In my opinion, the comments above by guanxi, jwq, and Jo Hermans only add to the case that Thunderbird should allow the user to choose which storage format they would rather use.
(In reply to comment #7)

 Jamie was against Mork /replacing/ the existing, better, summary files. Alas, we ended up with Mork.

 I can't accept assertions that mbox format is particularly vulnerable to crashes, because I haven't seen any hard evidence to support them. But that's beside the point. As I read it, this bug calls for the replacement of mbox storage with SQLite storage - no choice involved. I wouldn't like that, please leave my mbox alone.
 
> In my opinion, the comments above by guanxi, jwq, and Jo Hermans only add to
> the case that Thunderbird should allow the user to choose which storage format
> they would rather use.

 That would be Bug 402392. Feel free to vote for it.
The argument has been brought up that mbox is the first choice when it comes to reading the mailstore in 15 years. While this may be true (though, from today's POV, I don't believe SQLite in particular will be so short-lived), shouldn't a mailstore work efficiently in the present rather than offer some questionable advantage in the unforeseeable future?

I have seen corrupt mbox files in Thunderbird quite often. Too often for my liking. I've had totally messed up folders of a POP3 account (yes, half of the folders were garbage, see Bug 367774). I really don't have the feeling of safety when using Thunderbird, especially with larger messages, say more than 10 MB. Time and again, downloading those messages over IMAP fails and they arrive empty. I then have to log in to the server and fetch the file from the server's Maildir through SSH. But still, it's the best programme out there, so I stick with it. Using a tested and stable database system like SQLite would make me feel much safer. Not to mention the incomparable speedup for searching or mail operations.
I object to using SQLite for message storage because this would make the message storage mostly inaccessible to any third-party tools. Instead, I would like to promote (again) to finally fix bug #58308.

Implementing an SQLite-based message store seems to be just replacing one problem with another, e.g. improving robustness a little bit (maybe), but losing interoperability with other programs at the same time. Please also consider all that has been said about the backup properties of mailboxen: With mbox or SQLite, you have to back up a single, large/huge file, whereas with Maildir, you can back up (or lose) only one message at a time.
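To make the on-disk difference concrete, here is a minimal sketch using Python's standard `mailbox` module. The paths, the `make_message` helper, and the example addresses are made up for illustration; it only shows the layout of the two formats, not how Thunderbird itself writes them.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def make_message(n):
    """Build a small test message (hypothetical content)."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "reader@example.com"
    msg["Subject"] = f"Message {n}"
    msg.set_content(f"Body of message {n}\n")
    return msg


root = tempfile.mkdtemp()
mbox_path = os.path.join(root, "Inbox.mbox")
maildir_path = os.path.join(root, "Inbox.maildir")

box = mailbox.mbox(mbox_path)                       # one file holds every message
mdir = mailbox.Maildir(maildir_path, create=True)   # tmp/, new/, cur/ directories
for n in range(3):
    box.add(make_message(n))
    mdir.add(make_message(n))
box.flush()
box.close()
mdir.close()

mbox_is_single_file = os.path.isfile(mbox_path)
# Newly delivered Maildir messages land in new/, one file each.
maildir_files = os.listdir(os.path.join(maildir_path, "new"))
```

With three messages, the mbox is still one file while the Maildir holds three, which is what lets a file-level backup tool pick up or restore individual messages.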

With respect to comment #7, the author recommended using mbox format Mailboxen, but that was in 2000. It is now 2008, and there has been some progress, notably widespread support for Maildir format almost everywhere, except for Mozilla.

With respect to comment #2: SQLite may be fashionable these days, but moving away from mbox is a technical issue. There are no big technical objections to Maildir these days (ie, after the "FAT" age).

With respect to comment #3: Please see the comment of guanxi, regarding the usefulness of SQLite across a network. Maildir doesn't have problems there, it's explicitly designed to work well over the network, and please also think about when you want to see this feature implemented. Reading about "Unified Storage" reminds me about the classical question: "Do you want 80% now, or 100% never?". I also didn't see in that page that they wanted the messages themselves in that Unified Storage, only the indices. Esp. this: http://www.sqlite.org/whentouse.html (Quote: "SQLite will work over a network filesystem, but because of the latency associated with most network filesystems, performance will not be great. Also, the file locking logic of many network filesystems implementation contains bugs (on both Unix and windows). If file locking does not work like it should, it might be possible for two or more client programs to modify the same part of the same database at the same time, resulting in database corruption. Because this problem results from bugs in the underlying filesystem implementation, there is nothing SQLite can do to prevent it."). Maildir is immune to this kind of problem.

With respect to comment #9: Speed of operation is usually achieved using additional index files in Maildir settings (eg. hcache.db in some versions of Mutt). This provides for very robust operation (Maildir), very fast operation (using said extra index), and it's much better tested than SQLite, too. 


Can I somehow vote _against_ this bug?

Why should anyone want to access mailbox files over a network? I don't think that this is a strong argument.
I opened the bug, because I'm sick of losing mails. If maildir helps, it's fine with me. BTW: SQLite can be used as maildir index file as well.
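A minimal sketch of that last idea, using Python's standard `mailbox` and `sqlite3` modules: the messages stay in a Maildir, and a small SQLite table is only a rebuildable index over them. The `idx` table, `build_index`, and `find_by_subject` are hypothetical names chosen for this example.

```python
import mailbox
import os
import sqlite3
import tempfile
from email.message import EmailMessage


def build_index(maildir, conn):
    """(Re)build a throwaway index; the Maildir stays the source of truth."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS idx (key TEXT PRIMARY KEY, subject TEXT, sender TEXT)"
    )
    with conn:  # one transaction for the whole rebuild
        for key, msg in maildir.iteritems():
            conn.execute(
                "INSERT OR REPLACE INTO idx VALUES (?, ?, ?)",
                (key, str(msg["Subject"]), str(msg["From"])),
            )


def find_by_subject(conn, fragment):
    """Return Maildir keys whose subject contains the fragment."""
    rows = conn.execute(
        "SELECT key FROM idx WHERE subject LIKE ?", (f"%{fragment}%",)
    )
    return [row[0] for row in rows]


maildir = mailbox.Maildir(os.path.join(tempfile.mkdtemp(), "Inbox"), create=True)
for subject in ("Bug 361807 update", "Lunch"):
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["Subject"] = subject
    msg.set_content("example body\n")
    maildir.add(msg)

conn = sqlite3.connect(":memory:")
build_index(maildir, conn)
hits = find_by_subject(conn, "Bug")
found_subject = maildir[hits[0]]["Subject"]
```

If the index is ever damaged it can be deleted and rebuilt from the message files, so losing it does not lose mail.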
1. I don't see how storing mails in SQLite would have any advantage in the "dataloss on crash" situation.
2. Even if we're going to use some obscure binary database format for storing mail, it'd be completely optional, it won't replace the default. Mbox as such is quite simple, reliable and well-accepted in the mail world.
3. Both maildir and sqlite do require some internal changes to make mailnews capable of handling them in a way transparent to the rest of mail code.
Re 3, Karsten, do you mean mailnews requires internal changes to use maildir or sqlite, or that maildir/sqlite would need to change in order to be used by mailnews?
FWIW, Maildir doesn't actually scale. I had a couple of gigabytes of mail from this Bugzilla, and the result was an IMAP daemon that ran out of memory trying to build a message index to feed to whichever client spoke to it. Note: that was *one* folder; I didn't believe in doing folder organization. As it happens, in my nice pathological use case (getting most mail from Bugzilla [messages are <8k each, so >130000 messages/gigabyte]), Maildir is actually considerably worse than mbox (in terms of file system space overhead, stat operations, the risk if you're stupid enough to use a shell glob, ...).
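The message count quoted above can be checked with a few lines of arithmetic. The sketch below assumes a gigabyte of 2**30 bytes and, for the overhead part, a file system with 4096-byte blocks where each Maildir file occupies whole blocks; both the block size and the 5000-byte sample message are assumptions for illustration, not figures from this bug.

```python
import math

GIB = 2**30
MAX_MESSAGE_BYTES = 8 * 1024  # "<8k each", taken from the comment above


def messages_per_gib(message_bytes):
    """How many messages of this size fit in one GiB of message data."""
    return GIB // message_bytes


def allocated_bytes(message_bytes, block_bytes=4096):
    """Disk space used by one file if it occupies whole blocks (assumption)."""
    return math.ceil(message_bytes / block_bytes) * block_bytes


# Messages at the 8k upper bound give the minimum count per GiB.
lower_bound = messages_per_gib(MAX_MESSAGE_BYTES)

# Space wasted by one hypothetical 5000-byte message stored as its own file.
slack = allocated_bytes(5000) - 5000
```

At the 8k upper bound a GiB holds 131072 messages, so smaller messages give even more files, each carrying its own partial-block slack and its own stat call.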

For a while, I had assumed it was a great panacea. I'd still like to see Maildir support added because lots of people think it's the greatest thing since sliced bread.

But practically speaking, for my use, the only thing that does work is some sort of database driven system (* this requires that the mail engine *not* try to retain all information about all messages for a single folder in memory at once, doing so loses for reasons which should be obvious).

note: at the present time, the sqlite engine has been involved in dataloss (100%) of downloads.sqlite. So I wouldn't just jump and say "this is perfect, and dataloss free" if I were you.

That said, I don't really see anyone preventing the reporter or someone else from trying to implement a sqlite backend as an alternate to mbox.
Summary: Use a SQLite Database for Storing Mails → Support a SQLite Database for Storing Mails
(In reply to comment #13)
> Re 3, Karsten, do you mean mailnews requires internal changes to use maildir
> or sqlite, or that maildir/sqlite would need to change in order to be used by
> mailnews?

You can always hack MailNews to make it use sqlite or maildir or whatever, but that'd be just ... hacking. The main problem is that MailNews too often relies upon assumptions which only hold true for the (smtp+)mbox+mork case (dot doubling in offline news reading anyone?).

We need a kind of blackbox interface for reading/storing mails.
The backend could then be (mbox|maildir|sqlite|whatever)+(mork|sqlite|whatever) and no frontend piece would have to care.
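One way to picture such a blackbox interface is the sketch below, written in Python with the standard `mailbox` and `sqlite3` modules. `MessageStore`, `MailboxStore`, `SqliteStore`, and `file_message` are hypothetical names for this example and say nothing about how MailNews or mozstorage would actually structure it.

```python
import mailbox
import os
import sqlite3
import tempfile
from abc import ABC, abstractmethod


class MessageStore(ABC):
    """Hypothetical black-box interface; keys are opaque to the caller."""

    @abstractmethod
    def add(self, raw: bytes): ...

    @abstractmethod
    def get(self, key) -> bytes: ...

    @abstractmethod
    def count(self) -> int: ...


class MailboxStore(MessageStore):
    """Backend over any stdlib mailbox.Mailbox (mbox, Maildir, ...)."""

    def __init__(self, box: mailbox.Mailbox):
        self._box = box

    def add(self, raw):
        key = self._box.add(raw)
        self._box.flush()
        return key

    def get(self, key):
        return self._box.get_bytes(key)

    def count(self):
        return len(self._box)


class SqliteStore(MessageStore):
    """Backend that keeps each raw message as a BLOB row."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS messages (id INTEGER PRIMARY KEY, raw BLOB)"
        )

    def add(self, raw):
        with self._conn:  # commit or roll back this single insert
            cur = self._conn.execute("INSERT INTO messages (raw) VALUES (?)", (raw,))
        return cur.lastrowid

    def get(self, key):
        row = self._conn.execute(
            "SELECT raw FROM messages WHERE id = ?", (key,)
        ).fetchone()
        return bytes(row[0])

    def count(self):
        return self._conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]


def file_message(store: MessageStore, raw: bytes) -> bytes:
    """Front-end code: it sees only the interface, never the backend."""
    return store.get(store.add(raw))


root = tempfile.mkdtemp()
stores = {
    "mbox": MailboxStore(mailbox.mbox(os.path.join(root, "Inbox.mbox"))),
    "maildir": MailboxStore(
        mailbox.Maildir(os.path.join(root, "Inbox.maildir"), create=True)
    ),
    "sqlite": SqliteStore(os.path.join(root, "Inbox.sqlite")),
}
RAW = b"From: a@example.com\nSubject: hello\n\nbody text\n"
round_trips = {name: file_message(store, RAW) for name, store in stores.items()}
```

The same `file_message` call works unchanged against all three backends, which is the point: the front end would not need to know which format sits underneath.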

This blackbox interface might just be (a derivation of) mozstorage, although I'm not quite sure how that relates to our needs.

Do we actually need to know about something like indexes in the front end at all?
Yes, we need a pluggable message store - that's a given.
(In reply to comment #14)
> note: at the present time, the sqlite engine has been involved in dataloss
> (100%) of downloads.sqlite. So I wouldn't just jump and say "this is perfect,
> and dataloss free" if I were you.

What does that mean? Once I was wondering whether SQLite would be good for e-mail storage and asked on the SQLite-users mailing list about data safety. They said it wasn't a problem unless your hard disk went for good (or so), and Apple Mail also used SQLite as its mail storage for some time. Are you saying SQLite is particularly error-prone, or did I get it wrong?
It means what I said. When the downloads.sqlite file gets corrupted, our only recourse is to destroy it, which means that instead of losing one download entry, you lose all of them. If that was your mailbox, I am fairly confident you would not be happy. I'm not saying it's necessarily the database's fault; I just don't want people to say "implementing a brand new untested sqlite based storage system for mailnews will mean no dataloss ever". New code will be buggy even if it uses some module which people think is reliable.
Assignee: mscott → bienvenu
Component: General → MailNews: Database
Product: Thunderbird → Core
QA Contact: general → database
I don't think we're going to switch to sqlite for mailbox storage in tb3 at least. Having a pluggable message store would be the right next step to move this bug forward, I suspect.  Marking blockingtb3-
Flags: blocking-thunderbird3? → blocking-thunderbird3-
Product: Core → MailNews Core
No longer blocks: 402392
Depends on: 402392
Assignee: mozilla → nobody
As a user with a 22GB inbox, I can safely say that Thunderbird is groaning under the pressure and really unable, certainly in its default configuration, to safely or efficiently deal with my (IMAP-based) email. Thunderbird thrashes away at my email constantly, overheats my laptop, pauses for many seconds in the middle of my typing an email, hangs up, and occasionally throws away my entire mbox file and starts a fresh 22GB download.

I am sure that there must be better solutions, e.g. Maildir or SQLite or some other database backend that supports more efficient and more robust storage. Part of the problem might be my encrypted home directory; part might be to do with IMAP or the search indexing, but the fact is that in the default configuration, Thunderbird is just not working efficiently, and has approached for me the point of unusability.

I have tested TBird a number of times with fresh profiles and on different machines; this problem re-emerges fairly quickly, and it's not a one-off glitch. I support the idea mentioned above of adding an API layer to allow different storage backends for Thunderbird, with a view to allowing developers to play around with different storage concepts that might improve robustness.
(In reply to John Pye from comment #21)
> As a user with a 22GB inbox, I can safely say that Thunderbird is groaning
> under the pressure and really unable, certainly in its default
> configuration, to safely or efficiently deal with my (IMAP-based) email.

Doubtful that this has anything to do with the size of the mail store, and much more likely it has to do with the number of messages in the store, i.e., the .msf file. You'd be much better off moving some of the messages in your inbox into an archive folder.
Severity: normal → S3