Database structure for folder hierarchy
Categories
(MailNews Core :: Database, task)
Tracking
(Not tracked)
People
(Reporter: darktrojan, Assigned: darktrojan)
References
(Blocks 2 open bugs)
Details
(Keywords: leave-open)
Attachments
(8 files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
In this bug I'll be posting my patches relating to storing folders in the global database. At this point it's experimental, does nothing with actual mail folders, and isn't connected to anything. Nothing about this is finalised, especially the names of various things, but we are pretty happy with the structure so far.
Assignee | ||
Comment 1•1 month ago
|
||
Assignee | ||
Comment 2•1 month ago
|
||
Comment 3•1 month ago
|
||
I've been thinking about this a bit, and here are some more thoughts:
- Folders are mostly reflection of structure on a remote server or (in the case of local folders) the filesystem.
- We need the concept of folders in a global database so we can associate messages with them (messages are inside one or more folders).
- We get folder names and hierarchy from the server (or filesystem). But network traffic is slooooooow, so it makes sense to cache names/hierarchy locally once we have them.
- The server always holds the Authoritative Truth (tm).
- We have local, not-on-the-server data associated with folders too, eg ordering, view settings, localised names for special folders etc.
- In the current database-per-folder system, there is a folderInfo struct which holds such stuff (although I don't think that captures hierarchy - I think the code uses the filesystem structure for hierarchy, updated with info from server when available).
- For folders, we are synchronising three things: the server state, the in-memory state and the database state. Keeping the latter two in step isn't so bad as long as you're ruthless in guarding access to the mechanisms of change. It'll suck if you can update one representation without updating the other :-)
I would be trying to think of terms of the folders in the database being cached representations of server-side objects, plus some extra non-server app-side data stuck onto them (and presumably add-ons would like the ability to slap extra custom data onto folders too).
So to me, the interesting kinds of questions are:
- how do we discover folders (from remote servers and local folder filesystem)
- how do we deal with server-side folder changes? eg think of a yahoo IMAP account where the user renames a folder through the webmail interface while not using TB. What happens the next time TB runs?
- I don't think a client can catch this with vanilla IMAP... It'll just be one folder disappearing (deleted!) a new one showing up, containing messages to (re)download.
- I'm pretty sure ews and jmap would have some kind of immutable server-assigned id for folders so they can be moved around and reparented without fuss. So we should support that.
- I'd bet there are IMAP extensions which support that too (objectID?).
- How do we resolve local "speculative" operations with eventual server success/failure - eg a user reparents a folder locally:
- we want to update our local representation immediately (we're a native app - everything should be instant, dammit!)
- we kick off a network operation to do the same thing on the server
- if the server succeeds, great, all done here.
- if the server operation fails... then we need to roll things back and make sure our local folder data (and the messages therein) are still in sync with the "true" state on the server.
- There's the ugly case where we don't know if it succeeded or failed! Start operation, network goes down... what to do in such cases? (we need general sync-with-server code to sort it out - such a sync mechanism should probably also be used by the initial connection to the server) What does the IMAP code handle this? No idea - I'm sure it mostly all just works, but I bet there's lots of duplication and ugly corner cases in there, with lots of potential to go horribly wrong. It'd be nice to think about this in a more protocol-neutral way.
Assignee | ||
Comment 4•1 month ago
|
||
Assignee | ||
Comment 5•1 month ago
|
||
Each server will be represented as a separate root in the folders table.
Comment 6•1 month ago
|
||
I'm still very nervous about the use of the nested set model when we have recursion at our disposal. Please read up on the risks of the large updates necessary when new folders are created on accounts with many folders. The overhead and risk involved in so many write operations just doesn't seem to be worth it when read performance is the same when using recursive CTEs.
https://lars.yencken.org/hierarchy-in-sql
https://en.wikipedia.org/wiki/Nested_set_model#Performance
Assignee | ||
Comment 7•1 month ago
|
||
I'm not concerned about the performance. This table will have a small number of rows in it – probably less than 10,000 in the worst cases (we know some users have 100 accounts, but have they got 100 folders in each?) and less than 100 for the vast majority of users. Even if there were 10,000 rows, SQLite can update that many in milliseconds.
As the quote in that Wiki article says, "It’s best used when you need to query a tree more frequently than you need to modify the tree." which is what we'll be doing.
That said, most of what I'm working on at the moment doesn't depend on which database model we use. We'd still want all of the functionality I'm building out around the database, and all of the test cases are still valid. We could change out the database model without a lot of work.
Assignee | ||
Comment 8•1 month ago
|
||
A folder in the database can either have an ordinal value or null. Folders with ordinals are sorted
first, in ascending order, then folders without ordinals in alphabetical order. The order of those
with null will be further refined in a later patch, as it needs to account for non-ASCII characters
and the ordering of special folder types.
Assignee | ||
Comment 9•1 month ago
|
||
Assignee | ||
Comment 10•1 month ago
|
||
I did end up moving away from the nested set model, but not for performance reasons. I realised that to have a folder's children be manually ordered (we store the order as specified by the user), automatically ordered (based on the folder type and name), or some combination of both. In the nested set model the children have to be stored in some order and that just made it too complicated. I'd rather store no order where necessary and deal with it as the database is read.
Updated•1 month ago
|
Updated•1 month ago
|
Updated•1 month ago
|
Updated•1 month ago
|
Updated•1 month ago
|
Updated•1 month ago
|
Assignee | ||
Updated•7 days ago
|
Comment 11•7 days ago
|
||
Assignee | ||
Comment 12•7 days ago
|
||
PIDL will do most of the work for us, but not the string properties.
Assignee | ||
Comment 13•6 days ago
|
||
This does the database read off the main thread. Subsequent patches will add the initial reconciliation of on-disk folders to the task thread.
Assignee | ||
Updated•17 hours ago
|
Comment 14•13 hours ago
|
||
Pushed by john@thunderbird.net:
https://hg.mozilla.org/comm-central/rev/46abd41e1cc1
Make C++ shortcuts to infallible Folder properties. r=BenC
https://hg.mozilla.org/comm-central/rev/31f948947954
Load folders from the database off the main thread. r=BenC, mkmelin
Description
•