i.e. doesn't require all data to be in memory, by using some buffered disk-based implementation
I thought this was a request for outside help; why is it a bug assigned to me? I think the milestone should be a much bigger number than m15, since I don't plan to ever fix this bug in the foreseeable future. Is that what m15 means? This is kind of like assigning some layout engineer a bug that says "make browser smaller and faster than smallest and fastest known browser client". Kind of a dozen milestones' effort.
The bug is assigned to you because you are the module owner for database, according to bugzilla. m15 is a holding place for "way far out there" bugs, for now. I created this bug from the old http://www.mozilla.org/mailnews/jobs.html It's good to have you, Mr. Mork, associated with this bug, because if some external db guru finds it (by searching for all HELP WANTED bugs) and decides to tackle it, they'll be able to find you easily.
Sorry for the whining and complaining; I was just hit by a thrill of fear when I read this bug. :-)
Bulk-resolving requests for enhancement as "later" to get them off the Seamonkey bug tracking radar. Even though these bugs are not "open" in bugzilla, we welcome fixes and improvements in these areas at any time. Mail/news RFEs continue to be tracked on http://www.mozilla.org/mailnews/jobs.html
Reopen mail/news HELP WANTED bugs and reassign to firstname.lastname@example.org
cc'ing some interested parties. see http://www.jwz.org/doc/mailsum.html for some ideas on how jwz did it in 3.0
Seth, jwz repeated several times that he doesn't like the idea of using a universal db package for the message DB. Do you disagree with him? Is there something wrong with the way it was done in 3.0? Also, is dbm already loaded by mozilla (I guess you mean 'dbm' by 'open source database')? If not, an ad hoc solution may be better than the overhead of loading universal db code, isn't it? And what about interfaces? Should they remain unchanged, or are some changes allowed?
sleepycat db has come up from time to time. we should also discuss what we need for folder summaries, and what we need for addressbooks (local, and replicated online ldap) way in the future, but cc some people who know about the issues of large, replicated dbs.
Last time I thought about this, I took a quick look at <http://www.sqlite.org/> and it seemed to have potential. Its on-disk footprint is within a small number of K of mork; it appears mature, claims to have a small memory footprint and be speedy, and is in the public domain (!).
another issue that has come up (query for bugs about replication of ldap databases) is using a db that doesn't require us to read the entire db into memory. it's a big deal for ldap replication (think about the size of phonebook.mab with 100,000 entries. dmose has numbers on this.) I think bienvenu owns a bug about fixing mork to allow for this, but when we get to this, we might decide to just move the addressbook over to something like sleepycat.
SleepyCat has a big licensing issue: people who ship non open-source applications using it have to pay SleepyCat money. People who want to re-brand Mozilla currently do not have to deal with any code that requires this. Including code with a licensing requirement of this sort is likely to be an issue. See <http://www.sleepycat.com/licensing.html> for details on the licensing.
Update to license link: http://www.sleepycat.com/download/oslicense.html Since the creation of the Mozilla Foundation, don't all "redistributions" of mozilla require access to the source?
FYI: for sql style query: sqlite is Public Domain and seems well commented http://www.hwaci.com/sw/sqlite/
In the context of application interoperability and desktop integration, maybe data like news articles, mail messages, address books, bookmarks and so on should be stored application-independently by proposing a suitable standard. The sqlite idea is a nice idea. How about creating a new standard where modern desktop environments like KDE and GNOME provide a general SQL data management daemon for storing application-independent user data, e.g. by running a user-selected SQL server in user-space? It would be pretty cool. Applications could even access these resources concurrently and register as listeners to them. It would provide efficient locking, transactions, ...

I basically see two solutions to that problem:
1.) Provide an SQL user-space backend and "teach" Mozilla to use it. It could be integrated with the regular Mozilla distribution package, and Mozilla could fire it up if no such service is already running on the user's machine. That would eventually allow the data source to be shared between Mozilla and other desktop apps -- a feature long requested by many users who don't want, for example, to be limited to specific browsers (bookmark sharing while the apps are running!) or mail clients.
2.) Drop the goal of developing a general and efficient data backend by implementing task-specific solutions. I think that this would be somewhat backwards.
someone @mozilla.org should probably contact freedesktop.org to see if there's a way to create something like this. I guess that could help quite a bunch of people...
hmm, just saw that my comment #16 was slightly off-topic as this is only about the backend to make such things work, sorry. I meant to answer what comment #15 said about collaborating with e.g. Gnome and KDE on having a standard for storing app independent user data. Ignore my comment for the heart of this bug report. Sorry again for kinda spamming you (and doing it again now).
(In reply to comment #10)
> Last time I thought about this, I took a quick look at <http://www.sqlite.org/>
> and it seemed to have potential. Its on-disk footprint is within a small
> number of K of mork; it appears mature, claims to have small memory footprint
> and be speedy, and is in the public domain (!).

and sqlite is now headed to bigger and better things in Firefox
Is there a schedule for implementing this for Mail/News? (sqlite, MozStorage)
no one has volunteered to do all the heavy lifting required to do this.
Conversation from IRC on this topic:

<Mnyromyr> it would help if MailNews could forget its Mork knowledge first, ie. just use some interfaces without assuming anything about the inner workings
<Standard8> yeah but you'll probably find there's other parts of the code that assume a mork style database or something.
<Standard8> Address book is very bad on that for example
<KaiRo> I think doing bug 11050 with mozStorage would probably be good
<Standard8> you could probably fairly easily get an ab running on mozstorage, you'd then have to figure out how to rework all the uses like import, address collection to not assume its a mork database.
<Mnyromyr> just hacking mailnews to use X instead of Mork doesn't help a bit
<Mnyromyr> the most important task imo is a clean interface between any DB and its users
<KaiRo> I think for the stuff that is in .msf files, it might make sense to rework things a bit and make this either a global storage for all of mailnews or at least for a whole account, going away from per-directory summary files
<KaiRo> at least for message IDs it would be really good to know them at an app level
<KaiRo> well, doing things like marking cross-posts read in all affected newsgroups, saved search folders, and much other stuff would be easier to implement when this is not per-folder
<Joshua> I would think that per-account is probably the best method
* NeilAway notes that IMAP claims it deletes the .msf when your uidvalidity changes
<Joshua> this is how I think the database should be handled
<Joshua> 1. Make the switch in addrbook
<Joshua> 2. Switch nsMsgDatabase and friends in mailnews/db/msgdb/
<Joshua> 3. Combine *.msf to per-account levels
<Joshua> 4. How to handle profile migration ???
<KaiRo> I think 2+3 need to be one step, as you probably need to change APIs/calls for both steps in the same places anyways, so you can do them at once
<KaiRo> and 4 should be a non-issue as we need auto-import even in existing profiles
<KaiRo> I don't think that auto-re-import is a good idea. do it one time when no sqlite is present but msf(s) is/are, and ignore everything else when a sqlite is there. when someone wants us to force re-importing, he should kill his sqlite
And now you finally get someone willing to work on this. ;)

My starting point is going to be in addrbook. The planned schema for the addressbook:

CREATE TABLE Cards (
  CardKey INT NOT NULL PRIMARY KEY AUTOINCREMENT,
  [ xxx CHAR for all defined names in nsAddressBook.idl ]
);
CREATE TABLE OtherProperties (
  CardKey INT,
  Property CHAR,
  Value CHAR
);

Most likely some of the values should be INTs and not CHARs, but there's no harm in declaring all CHARs, so...

More tangential: SQLite does permit you to search across databases.

Questions/Comments/Concerns/Thoughts?
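One wrinkle worth noting: SQLite only accepts AUTOINCREMENT on a column declared exactly `INTEGER PRIMARY KEY`, so `CardKey INT NOT NULL PRIMARY KEY AUTOINCREMENT` as written would be rejected. A minimal runnable sketch of the proposed two-table layout (the Cards columns beyond CardKey are hypothetical placeholders for the nsAddressBook.idl names, not the real ones):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cards (
    CardKey INTEGER PRIMARY KEY AUTOINCREMENT,  -- must be INTEGER, not INT, for AUTOINCREMENT
    DisplayName TEXT,     -- placeholder column; real ones would mirror nsAddressBook.idl
    PrimaryEmail TEXT     -- placeholder column
);
CREATE TABLE OtherProperties (
    CardKey INTEGER,      -- references Cards.CardKey
    Property TEXT,
    Value TEXT
);
""")

# Insert a card and attach an extra property to it via the EAV side table.
conn.execute("INSERT INTO Cards (DisplayName, PrimaryEmail) VALUES (?, ?)",
             ("Ada Lovelace", "ada@example.org"))
key = conn.execute("SELECT CardKey FROM Cards").fetchone()[0]
conn.execute("INSERT INTO OtherProperties VALUES (?, ?, ?)", (key, "Nickname", "Ada"))

print(conn.execute("SELECT Property, Value FROM OtherProperties WHERE CardKey = ?",
                   (key,)).fetchall())  # → [('Nickname', 'Ada')]
```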
It feels too simple.
If this is morphed into "Make address book use mozStorage" (which sounds sensible as a first step towards eliminating Mork from MailNews), then it probably should move into the "MailNews: Address Book" component and get the QA contact changed to email@example.com so that address book people will get bugmail.
Joshua, thank you for being willing to pick up the conversion. My main comments at the moment are these:

Firstly, the bug for the address book conversion is bug 382876, so I'd like any address book specific stuff to be covered there.

Secondly, whilst the move to a sqlite database should be fairly easy, we need to make sure it is implemented in a way that is easy for us to use and doesn't hold back future dev - the database will be at the heart of our address book, so it needs to be flexible and manageable. So I'd like to ask that you put some basic requirements and a design (e.g. schema, which classes) together on the wiki pages (http://wiki.mozilla.org/MailNews:Address_Book e.g. MailNews:Address_Book_Sqlite_Design) so that we have something that can be used for a discussion to ensure we get this right. One requirement, for example, is mailing lists; it is hard to see from your schema where they would fit in. I'm not stopping you playing around with the implementation whilst we agree this; I'd just like to be sure that what we end up with will be the right thing.

(In reply to comment #22)
> More tangential: SQLite does permit you to search across databases.

If you're thinking that this will solve the search all address books at once bug, it won't unfortunately - the way the address books are separated is the problem here, not the fact that you can't search across multiple files.
(In reply to comment #22) > More tangential: SQLite does permit you to search across databases. Obviously Mark answered the question above, but in general, yes sqlite allows you to query across different sqlite databases.
(In reply to comment #22)
> CREATE TABLE OtherProperties (
>   CardKey INT,
>   Property CHAR,
>   Value CHAR
> );
>
> Most likely some of the values should be INTs and not CHARs, but there's no
> harm in declaring all CHARs, so...

On the off chance that the app (or an extension) will store additional data in the OtherProperties table besides just the information that the user entered into the "Custom" fields of the address card, it would be worth making the "Value" column a BLOB. Because SQLite uses manifest typing instead of static typing <http://www.sqlite.org/datatype3.html>, consumers would still be able to store strings in the column, but SQLite wouldn't do any type conversion on the values, so consumers would be able to store other values in it too whose types are preserved.

See the content pref service's "prefs" table <http://lxr.mozilla.org/mozilla/source/toolkit/components/contentprefs/src/nsContentPrefService.js> for an example of this kind of EAV schema implemented in an SQLite database with the "value" column given the BLOB type designation so that the database engine assigns it NONE affinity (i.e. does no type conversion). But see the Places annotation service's "moz_annos" table <http://lxr.mozilla.org/mozilla/source/toolkit/components/places/src/nsAnnotationService.cpp#251> for a counter-example of an EAV schema that gives the "content" column the LONGVARCHAR type designation, which has TEXT affinity (i.e. the engine tries to coerce values to "text form" before storing them).
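The affinity difference described above is easy to demonstrate; a small sketch (table and column names invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# BLOB declared type → NONE affinity: values stored as-is, no conversion.
db.execute("CREATE TABLE eav_blob (prop TEXT, value BLOB)")
# LONGVARCHAR contains "CHAR" → TEXT affinity: values coerced to text form.
db.execute("CREATE TABLE eav_text (prop TEXT, value LONGVARCHAR)")

for table in ("eav_blob", "eav_text"):
    db.execute("INSERT INTO %s VALUES ('answer', 42)" % table)

blob_val = db.execute("SELECT value FROM eav_blob").fetchone()[0]
text_val = db.execute("SELECT value FROM eav_text").fetchone()[0]

# The BLOB-typed column preserves the integer; the TEXT-typed one coerces it.
print(type(blob_val), blob_val)  # <class 'int'> 42
print(type(text_val), text_val)  # <class 'str'> 42
```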
Not a primary focus for thunderbird3.
Created attachment 348353 [details] [diff] [review]
Experimental common mork/mozstorage interface

This patch represents a first look at one possible approach to dealing with databases in mailnews, namely to define a set of common interfaces, nsIGdb, that can be used to implement a variety of stores. I wanted to see how hard that would be, and what one possible implementation would look like. As I understand it, the original mdb interface was conceived to be such a general interface, yet the actual usage seems to have become a non-extendable front-end for Mork.

This patch compiles and works (barely), and has replaced the Mork calls in db/msgdb and in nsFolderCache.cpp with their IGdb equivalents. The program will switch between Mork and Mozstorage backends by swapping the component name between NS_MOZSTORAGEGDB_CONTRACTID and NS_MORKGDB_CONTRACTID in 2 places. Unit test test_mozstoragegdb.js works for the mozstorage version; the mork version was done earlier and has not been updated, so it currently does not work.

As to how hard, it took about one week to do the initial Gdb interface implementation in mailnews and its mork implementation, and another week and a half to do a mozstorage Gdb implementation, including tweaking the Gdb design to work a little better with the SQL model.

My real goals are not to replace mork with mozstorage (as requested in this bug), but rather to investigate methods of adding a common database technology to mailnews that could be used as the basis for certain classes of collaborative applications, such as contact or project management. The mail code here is really not the best demonstration of this - address book would be much better. So I think that the next direction I would like to go, rather than to optimize this for the msgdb case, would be to adapt it to address book. For AB, the obvious immediate benefit could be a shared network address database if I did, for example, a MySQL implementation of IGdb.
As to the way forward in the TB3 timeframe, one possible approach might be to replace the Mork calls in msgdb and foldercache with their Gdb equivalents, but leave Mork as the default. This could probably be done without the performance regressions that plagued bug 418551, yet would open up possible experiments with extensions that are currently not possible.
We've discussed this over IRC, but I'll still summarize my comments here for those who weren't there.

> As I
> understand it, the original mdb interface was conceived to be such a general
> interface, yet the actual usage seems to have become a non-extendable front-end
> for Mork.

This is not something I profess to know in detail (you'd have to ask bienvenu), but my understanding is that the lineage went like this: When Mozilla was made open-source back in late 1998, Netscape had some proprietary components, one of these being the database implementation. The nsIMdb* interfaces were constructed mostly (IIRC) based off of this old database implementation. Trying to say much more is difficult, as it has now been almost 10 years since David McCusker first started work (his first post to n.m.p.mail-news on this topic was on December 2, 1998). According to some of the documents, one of the requirements boiled down to "don't use a real database," which SQLite is and which is where I think we truly need to go. Some of McCusker's later posts seem to indicate that he considered other implementations... an implementation in XML (!), LISP (!!), and RDF (!!!), all of which would essentially be trivial ports of the mork implementation. So I don't think the idea of plugging other DBs into the framework was ever seriously considered.

The database APIs that I think we want need much more than mork can provide. An API trying to support mork and real databases at the same time will quickly discover that the advantages offered by the latter have to be neutered in order to support mork.

> My real goals are not to replace mork with mozstorage (as requested in this
> bug), but rather to investigate methods of adding a common database technology
> to mailnews that could be used as the basis for certain classes of
> collaborative applications, such as contact or project management.
I still hold the belief that it would be easier to get mozStorage to work with other backends than to create our own glue interfaces.

> As to the way forward in the TB3 timeframe, one possible approach might be to
> replace the Mork calls in msgdb and foldercache with their Gdb equivalents, but
> leave Mork as the default.

In terms of pluggability, I think the message database and the folder cache rank near the bottom (note that I differentiate the message backend from the message database). It would be far better in the short- and even mid-term to focus on one area of extensibility at a time. I think being able to say, "XXX is easily pluggable" (for example) would be a far better goal than to say, "well, you can sort of plug something in XXX, and sort of plug something in YYY, and sort of..." The folder cache serves only a minor purpose, internally, in mailnews (you can delete your folder cache before starting up and the only effect would be a slower startup), while the message database is subordinate to the local message storage backend, so I think trying to be able to change what the msf file is stored as is more useless than you would think. Both are more or less internal to mailnews code. Something that should (in theory) not be internal is the address book, where being able to satisfy our usages with LDAP or various system address books is probably a better (certainly more easily attainable) goal than trying to change what our *.mabs are stored as.

> This could probably be done without the performance
> regressions that plagued bug 418551, yet would open up possible experiments
> with extensions that are currently not possible.

Bug 418551's performance regressions stem pretty much entirely from my inexperience with database design. Judging from your implementation of the SQL interfaces of nsIGdb, I think trying to use the SQL backend would be much less performant than my current implementation.
What is the status on this? Mork is causing me great pains with Thunderbird as it routinely hangs occupying 100% of my CPU, and all the backtraces lead to thunderbird being "stuck" on an msf file in the mork code. One specific msf happened to be 192MB today (the mail folder itself houses around 300K messages on an imap server). This has been causing issues since T-bird 3 and has not relented. Related "Fedora" bugs about the issue: https://bugzilla.redhat.com/show_bug.cgi?id=493000 https://bugzilla.redhat.com/show_bug.cgi?id=597388
This issue is also causing me pain. I am trying to wean a number of users across several organisations off Outlook and onto something open source and more secure/usable/reliable. However, they all have some monster mailboxes in their systems that end up freezing their mailers solid because of this issue. As in many cases this was their first step into open source, it is not a good advert for the cause.
Does this problem relate to the use of mbox format? If maildir format was used then huge volumes of mail would not be a problem - or am I being naive here?
Not related at all - mbox/maildir is for storing the messages themselves; mork is used to store summary information about the messages.
Um, are we doing this or not? This bug started in 1998, and a lot of things have happened since then. Is there any progress regarding this bug?
I still get periodic hangs associated with high CPU activity, the same as I always have.
(In reply to Rika Pi from comment #35) > Um, Are we doing this or not? This Bug started in 1998 a lot of things have > happened since 1998. > Is there any progress regarding this bug? This is still desirable. It just so turns out that changing the database backend requires rewriting most of our database APIs. And that happens to be something that doesn't get done easily.
Okay, is there anything we can help with? (I'm mainly a bug tester.) Also, sorry for posting on this bug. Have we made bugs for rewriting those APIs? As I hear from you it's hard, but we must start somewhere. Thank you btw for answering my question!
How much work is needed to deal with this issue?
(In reply to Yonggang Luo from comment #41)
> How much work need to do to deal with this issue?

You need to:
1. Design a database schema
2. Profile the new database logic to optimize it
3. Rewrite the entire database API to be synchronous instead of asynchronous
4. Write a schema migrator
5. Test the above to make sure that dataloss is difficult or impossible.

Steps 1 and 4 are easy. Step 3 is insanely difficult.
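For step 4, SQLite's `PRAGMA user_version` is the conventional hook for versioned schema migration. A generic sketch of the pattern (the table and migration steps here are invented examples, not the actual mailnews schema):

```python
import sqlite3

# Ordered migration steps: MIGRATIONS[i] upgrades a database at
# user_version i to user_version i + 1.
MIGRATIONS = [
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT)",
    "ALTER TABLE messages ADD COLUMN flags INTEGER DEFAULT 0",
]

def migrate(conn):
    """Apply any outstanding migrations; return the resulting schema version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for step in MIGRATIONS[version:]:
        conn.execute(step)
        version += 1
        # PRAGMA values can't be bound as parameters, hence the format string.
        conn.execute("PRAGMA user_version = %d" % version)
    return version

conn = sqlite3.connect(":memory:")
print(migrate(conn))  # fresh DB: applies both steps → 2
print(migrate(conn))  # already up to date: no-op → 2
```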
Taking an asynchronous implementation and using it synchronously is typically quite easy as long as you don't have re-entrance concerns. (If you do then you drop the result in a work queue instead of calling right away.) It doesn't usually require API change, though it might make the code clearer to do so later. What's the hard case for the Mork APIs? (I can't believe I asked that in 2015.)
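The work-queue approach described above can be sketched generically; a hypothetical illustration (not the actual Mork or mozStorage API), assuming the async backend delivers its result via a callback on another thread:

```python
import queue
import threading

def async_lookup(key, callback):
    # Stand-in for an asynchronous backend: invokes the callback with the
    # result on a separate thread.
    threading.Thread(target=lambda: callback("value-for-%s" % key)).start()

def sync_lookup(key, timeout=5.0):
    # Synchronous facade: park the callback's result in a queue and block
    # on it, sidestepping re-entrancy by never calling back into the caller.
    result = queue.Queue(maxsize=1)
    async_lookup(key, result.put)
    return result.get(timeout=timeout)

print(sync_lookup("msgHdr:42"))  # → value-for-msgHdr:42
```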
One of the challenging aspects of adapting the existing usages of Mork to another DB is that the uses have relied on particular features of Mork that do not map easily to SQL. In the mail summary database, it is how threading is organized. In the address book, mailing lists. The direction that Joshua has promoted, IIUC, is to try to use a DB that could work with js worker threads. Practically that means IndexedDB. So our hope is to have a fairly radical rewrite of interactions with the message DB rather than try to map the existing APIs onto mozStorage. The next step forward with that would probably be some experiments with js worker threads and IndexedDB, simulating some typical requirements of the message summary database, to get some ideas of how that might work.
(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #43)
> Taking an asynchronous implementation and using it synchronously is
> typically quite easy as long as you don't have re-entrance concerns. (If you
> do then you drop the result in a work queue instead of calling right away.)
> It doesn't usually require API change, though it might make the code clearer
> to do so later.
>
> What's the hard case for the Mork APIs? (I can't believe I asked that in
> 2015.)

We have synchronous read operations expected quite literally everywhere in our codebase (as well as in almost literally every add-on). We also have weird expectations about how the objects involved work that are anathema to transactional database design. Since mork is a completely in-memory database store, that design leaks through to our database APIs, since most individual CRUD operations are O(very fast). Since it's a ubiquitous interface, it's virtually certain to be used somewhere where we can't actually be re-entrant. The idiosyncrasies in the design make it difficult to bolt the synchronous API onto an underlying asynchronous implementation by changing a "few" things to be async. I have design notes for how to build a parallel asynchronous API that could be built on the current database implementation, but they need wider discussion and I don't think there's a compelling, urgent reason to discuss them now.

Something else I realized while writing this up is performance--Mork is in-memory, so operations like "get every header in the database" are rather fast (~building a list of pointers) at the cost of making "open this database" slow and using high memory consumption. More traditional databases instead make the tradeoff of making open fast, making search and filter efficient, and making individual CRUD operations relatively slow.
Oh, so when you said "Rewrite the entire database API to be synchronous instead of asynchronous" you meant "wrap the indexedDB API in a synchronous one", not "rewrite the mozStorage API to be synchronous".
(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #46) > Oh, so when you said > > "Rewrite the entire database API to be synchronous instead of asynchronous" Gah, I meant to say "rewrite the entire [message] database API to be async instead of sync." >_>
I want to know: can IndexedDB be stored in the local filesystem? If not, I would be worried about that.
(In reply to Joshua Cranmer [:jcranmer] from comment #45)
> Something else I realized while writing this up is performance--Mork is in-memory,
> so operations like "get every header in the database" are rather fast (~building a list of pointers)
> at the cost of making "open this database" slow and using high memory consumption.
> More traditional databases instead make the tradeoff of making open fast, making search and filter efficient,
> and making individual CRUD operations relatively slow.

If the high performance of Mork-DB, which is an in-memory DB, is needed, I don't think there is a reason to stop using Mork-DB. I think the following is possible as a first step of improvements.

(1) Stop using Mork-DB for data which is relatively static, such as Thread Column choice/order/width.
(2) Reduce Mork-DB size as much as possible: push data which is not needed for folder open out of Mork-DB, such as StringProperty-like data. I think splitting the current msgDBHdr into "msgDBHdrCore + pointer to msgDBHdrOthers" + "msgDBHdrOthers" is possible. And if the data for msgDBHdrOthers can be held in a SQLite DB without a performance penalty, it's helpful for improvements in search etc.
(3) Change the .msf file to a SQLite DB. Mork-DB is a collection of "Key=Value pairs". This can pretty easily be mapped to a simple/flat DB, and it's already available as localStorage in Fx/Tb. One of the biggest problems with Mork-DB being an in-memory DB is that periodically saving the data to a physical file is needed; that physical file has been the .msf file for a long time. If this is changed to a localStorage-like one, "periodically saving data to a physical file" can be replaced by localStorage.setItem(Key,Value), and thanks to the journaling of SQLite, there is no need to worry about file loss due to power failure, system crash, file deletion by the user, and so on. "Insert data or replace data + Commit" is sufficient to keep a backup of the in-memory DB data in a physical file. For improvements in search, filtering etc.
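The localStorage-style replacement for periodic .msf flushes sketched in (3) could look like this; a minimal illustration (class and key names are invented, not a proposed mailnews API):

```python
import sqlite3

class KVStore:
    """localStorage-like setItem/getItem over SQLite. Each committed write is
    made durable by SQLite's journal, replacing Mork's periodic whole-file
    flush with per-operation "insert or replace + commit"."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")

    def set_item(self, key, value):
        # "Insert data or replace data + Commit", as the comment describes.
        with self.db:  # commits on successful exit of the block
            self.db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                            (key, value))

    def get_item(self, key):
        row = self.db.execute("SELECT value FROM kv WHERE key = ?",
                              (key,)).fetchone()
        return row[0] if row else None

store = KVStore()
store.set_item("msgHdr:1:subject", "Re: replace Mork")
print(store.get_item("msgHdr:1:subject"))  # → Re: replace Mork
```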
A database outside of Mork-DB, a "DB constructed over the current Mork-DB", is possible: see the Gloda DB, which is a special SQLite DB with a special tokenizer. Search is usually batch type or bulk type. "Select Where" is pretty powerful in search. There is no need to search msgDBHdr. Many indexes for mail data can be created in addition to Mork-DB, independently from Mork-DB.
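An index maintained outside the summary data and queried with "Select Where" is straightforward to build in SQLite; a toy sketch (invented table, not Gloda's actual schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE msg_index (msg_id INTEGER, author TEXT, subject TEXT)")
# An extra index created independently of the summary DB, as suggested above.
db.execute("CREATE INDEX idx_author ON msg_index(author)")

db.executemany("INSERT INTO msg_index VALUES (?, ?, ?)", [
    (1, "alice@example.org", "status report"),
    (2, "bob@example.org", "lunch?"),
    (3, "alice@example.org", "re: status report"),
])

# A bulk search hits the index instead of walking every msgDBHdr.
rows = db.execute(
    "SELECT msg_id FROM msg_index WHERE author = ? ORDER BY msg_id",
    ("alice@example.org",)).fetchall()
print(rows)  # → [(1,), (3,)]
```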
I am working on it and have ideas about it.
Will this fix #1165583 and #1100940? If yes, those should be marked as dupe of this one (but it's not a big deal). If no, then they are incorrectly marked as incomplete.