Closed Bug 1101159 Opened 10 years ago Closed 9 years ago

TB generally unusable with big amount of messages in folder

Categories

(Thunderbird :: Untriaged, defect)

31 Branch
x86_64
Linux
defect
Not set
major

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 11050

People

(Reporter: teo8976, Unassigned)

Details

(Keywords: perf)

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36

Steps to reproduce:

Use Thunderbird for years, hence having a few folders with tens to hundreds of thousands of messages (inbox NOT being one of them)


Actual results:

TB is painfully slow in doing trivial tasks.
For example:
- merely "entering" (i.e. clicking on) a folder that contains many thousands of messages takes ages. Just to show the list of messages on the right and the first (or selected) message on the bottom right panel.
- Every time new messages are downloaded, if a filter causes them to be moved into a folder that has thousands of messages, just saving a dozen downloaded messages is slow and freezes the UI at times
- Moving a large number of messages from one folder to another takes ages. Even just selecting them is slow. Even just starting to drag them is slow (several seconds). And actually moving them takes hours. (I've reported this particular one as a separate bug)
- All time consuming operations such as those described above (most of which should not be time consuming in the first place) freeze the UI while they are performed, rendering the program unresponsive until they complete

And the list may go on.

In short, TB is not designed in a scalable way. It becomes UNUSABLE when you have folders with tons of messages.


Expected results:

None of the above should happen. Every part of TB should be designed with scalability and efficiency in mind. There are a lot of potentially O(1) operations which clearly take O(N) if not even more.

Reporting a bug for every single thing that is inefficient in TB would be painful. Perhaps the developers may be able to narrow it down to a small number of critical efficiency issues, but from a user's standpoint, describing all the situations where the behavior is unacceptably slow rendering the application unusable with a big number of messages, would imply writing hundreds of bug reports.

I feel like I reported this before, but I've done a search and I can't find the report, neither mine nor of anybody else.

Perhaps TB could use such things as databases, instead of sticking with monolithic plain text file formats designed decades ago, when the amount of data that had to be handled was small enough that it could make sense (efficient database systems also have existed for decades, by the way).
(In reply to teo8976 from comment #0)
> User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like
> Gecko) Chrome/38.0.2125.122 Safari/537.36
> I feel like I reported this before, but I've done a search and I can't find
> the report, neither mine nor of anybody else.

Yes, you have. And you get an email for every one of these bug reports and every bug comment.
If you aren't finding them, check your spam folder.
 
all your bug reports are https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=Core&product=Firefox&product=MailNews%20Core&product=NSPR&product=Thunderbird&product=Toolkit&emailreporter1=1&emailtype1=exact&bugidtype=include&email1=teo8976%40gmail.com&list_id=11589796

Please choose which of these bug reports you'd like this new bug duplicated to.

> Perhaps TB could use such things as databases, instead of sticking with
> monolithic plain text file formats designed decades ago, when the amount of
> data that had to be handled was small enough that it could make sense
> (efficient database systems also have existed for decades, by the way).

maildir is coming, but it's at least 6 months away.
> all your bug reports are 
> Please choose which of these bug reports you'd like this new bug duplicated to.

None of them is.

This one is related, but it is just one specific issue:
https://bugzilla.mozilla.org/show_bug.cgi?id=1100940

That might be marked as *blocking* this one, rather than this one a duplicate of that one.


> maildir is coming, but it's at least 6 months away.

No matter which format is used for storage, a lot of things are much more inefficient than they could and should be. Starting from the fact that all processing operations should be done in the background instead of blocking the ui. Also, many of the inefficiencies are not even related to how the messages are stored because they are (or should be) purely UI. For example, when you start dragging there's no reason for doing any processing (not even reading) on the actual messages, that can be deferred until dropping.

And finally, whether the user chooses to use Maildir or mbox (if that is the name of the classic format currently in use), if that imposes any intrinsic burden on efficiency, then other efficient data structures should be used "in parallel" in order to speed things up, and then the data could be synced to and from [actual mail format] in the background.
(In reply to teo8976 from comment #0)
> There are a lot of potentially O(1) operations which clearly take O(N) if not even more.

You are right.
For MsgDatabase, Tb currently uses MorkDB. It is saved in the xxx.msf file in Tb.
It is a flat table of Key=Data pairs, with a "messageKey -> metadata of mail" structure.
So the size of MsgDatabase is O(number of mails), and at least "read the entire xxx.msf file content and reconstruct MsgDatabase in memory" is currently needed upon each "Mail Folder Open" in Tb.
This is why "keep mail folders small" is always pretty good practice for a clever Tb user :-)

Other O(N) or O(N^2) cases.
   "Selection of N mails at Thread Pane" is usually an O(N) job.
   However, there are known issues in Tb such as the following:
       "some jobs relevant to selection of N mails at Thread Pane" are O(N * log N), O(N^1/2), O(N^2), O(N!), ...
       Even when O(N), it's O(K * N) where K is large or pretty large.
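
As a generic illustration of how a selection-related job can drift from O(N) to O(N^2), here is a minimal Python sketch. It is not Thunderbird code; the function names and the idea that selected keys are held in a list are assumptions made only for the example. Testing each key against a list scans the list every time, while building a set once makes each test O(1) on average.

```python
def selected_rows_quadratic(all_keys, selected_keys):
    # selected_keys is a list: each "in" test scans it, so the loop is O(N * M).
    return [i for i, key in enumerate(all_keys) if key in selected_keys]


def selected_rows_linear(all_keys, selected_keys):
    # Building a set once makes each membership test O(1) on average,
    # so the whole pass is O(N + M).
    selected = set(selected_keys)
    return [i for i, key in enumerate(all_keys) if key in selected]


all_keys = list(range(1000))              # hypothetical message keys in a folder
selected_keys = list(range(0, 1000, 2))   # every second message is selected
rows_a = selected_rows_quadratic(all_keys, selected_keys)
rows_b = selected_rows_linear(all_keys, selected_keys)
```

Both functions return the same row indices; only the cost of getting there differs, and the gap widens quickly as the folder grows.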

A cause of "short UI freeze" in some kinds of jobs.
    Many tasks are executed under the main task for UI.
    For example, IMAP code is executed under the main UI task. Many jobs are synchronous, not asynchronous.
    Developers are already aware of this kind of issue.
    However, fixing it requires a redesign/reconstruction of the task structure/multi-tasking in the Mozilla family and Thunderbird.
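
The general idea of keeping long jobs off the UI task can be sketched as follows. This is a Python illustration only: Thunderbird's own threading model differs, and `slow_folder_job` and `run_with_responsive_ui` are hypothetical names. The slow job is handed to a worker thread while a stand-in "UI loop" keeps ticking.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_folder_job(message_count):
    # Stand-in for a long-running operation (hypothetical); it sleeps
    # instead of doing real I/O.
    time.sleep(0.2)
    return message_count


def run_with_responsive_ui(message_count):
    ui_ticks = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_folder_job, message_count)
        # The "UI loop" keeps running while the job is in flight.
        while not future.done():
            ui_ticks += 1
            time.sleep(0.01)
        return future.result(), ui_ticks


result, ui_ticks = run_with_responsive_ui(100_000)
```

Here `ui_ticks` counts how often the loop got to run while the job was still pending; in a synchronous design that count would be zero.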

> Perhaps TB could use such things as databases, instead of sticking with monolithic plain text file formats designed decades ago, 
> when the amount of data that had to be handled was small enough that it could make sense
> (efficient database systems also have existed for decades, by the way).

You are absolutely right on MsgDatabase.
You are partially wrong on the MsgStore file, because Tb developers have already implemented msgstore/maildirstore (one file per mail), although it's still "under construction".

I also think "read the entire database upon folder open" is inefficient if MsgDatabase is large.
There is no need to read/know "messageKey and associated metadata of all messages in a mail folder" upon mail folder open.
"MessageKey and associated metadata of the mails shown at Thread Pane" is sufficient for Thread Pane display upon mail folder open.
Tb developers tried to use SQLite DB as MsgDatabase in the past.
However, for performance reasons (I believe mainly response time), MorkDB is still used for MsgDatabase.
"Code by you with a sophisticated database system and sophisticated design/implementation" would be appreciated very much.
Hint for your code implementation.
   msgstore/berkeleystore and msgstore/maildirstore were implemented based on the new "Pluggable MsgStore" feature.
   IIRC, David had a plan for something like a "Pluggable MsgDatabase" feature, although I'm not sure.
   If you implement a feature like "Pluggable MsgDatabase", you could perhaps implement msgdatabase/MorkDB (the current one),
   msgdatabase/SQLiteDB, msgdatabase/JavaScriptObject_In_JSON_File, msgdatabase/OracleDB, msgdatabase/DB2DB, ...
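
A "pluggable" database in this sense might look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the `MsgDatabase` interface, its `put`/`get` methods, and both backends are hypothetical and do not correspond to Thunderbird's actual interfaces. The point is only that calling code depends on the interface, so backends can be swapped.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class MsgDatabase(ABC):
    """Hypothetical minimal interface: message key -> metadata dict."""

    @abstractmethod
    def put(self, key, meta): ...

    @abstractmethod
    def get(self, key): ...


class InMemoryDatabase(MsgDatabase):
    def __init__(self):
        self._rows = {}

    def put(self, key, meta):
        self._rows[key] = meta

    def get(self, key):
        return self._rows[key]


class JsonFileDatabase(MsgDatabase):
    # Rewrites the whole file on every put: simple, but O(N) per write.
    def __init__(self, path):
        self._path = path
        if not os.path.exists(path):
            self._save({})

    def _load(self):
        with open(self._path, encoding="utf-8") as f:
            return json.load(f)

    def _save(self, rows):
        with open(self._path, "w", encoding="utf-8") as f:
            json.dump(rows, f)

    def put(self, key, meta):
        rows = self._load()
        rows[str(key)] = meta  # JSON object keys must be strings
        self._save(rows)

    def get(self, key):
        return self._load()[str(key)]


def store_and_fetch(db):
    # Caller code depends only on the interface, not on the backend.
    db.put(1, {"subject": "hello"})
    return db.get(1)


tmp_dir = tempfile.mkdtemp()
in_memory_result = store_and_fetch(InMemoryDatabase())
json_result = store_and_fetch(JsonFileDatabase(os.path.join(tmp_dir, "folder.json")))
```

The JSON backend is deliberately naive; it shows that a backend can satisfy the interface and still scale badly, which is why the choice of backend matters as much as the plug point.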
Yes, maildir will probably not solve this. I also think the slowness of opening a folder is caused by reading the .msf database. We would need to implement some partial (and on demand) loading of the msf or something.
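
To make "partial (and on demand) loading" concrete, here is a minimal Python sketch using the standard `sqlite3` module. It is an assumption-laden illustration, not a proposal for the msf format: the `headers` table, its columns, and the helper names are hypothetical, and it does not claim SQLite would outperform Mork. It only shows fetching the rows a list view is about to display instead of the whole index.

```python
import sqlite3


def build_index(message_count):
    # In-memory SQLite table standing in for a folder's metadata index.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE headers (msg_key INTEGER PRIMARY KEY, subject TEXT)")
    conn.executemany(
        "INSERT INTO headers (msg_key, subject) VALUES (?, ?)",
        ((k, f"message {k}") for k in range(message_count)),
    )
    conn.commit()
    return conn


def load_visible_rows(conn, first_row, row_count):
    # Fetch only the rows the list view is about to paint.
    cur = conn.execute(
        "SELECT msg_key, subject FROM headers ORDER BY msg_key LIMIT ? OFFSET ?",
        (row_count, first_row),
    )
    return cur.fetchall()


conn = build_index(10_000)
visible = load_visible_rows(conn, first_row=5_000, row_count=3)
```

Note that `OFFSET` still makes SQLite step over the skipped rows, so for very deep scrolling a key-range query (`WHERE msg_key >= ?`) would be the cheaper variant.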

The slowness of some UI operations should really be filed in individual bugs. Bug 1100940 is a good start, even though all of the 3 stages should have been filed separately. I have already split it.

I have a test folder of 4GB totaling 1 million messages. I can confirm e.g. the slowness of just opening the folder after TB startup. Just the msf file of the folder is *200MB* so mork has some work to do. (We do not count that file into the folder size which is already filed.)

Wayne, do we have a metabug for slowness on large folders, or is it this one?
Status: UNCONFIRMED → NEW
Ever confirmed: true
(please add perf keyword to performance related bugs. Also, better bug summary helps, eg adding "Folder" helps)

(In reply to :aceman from comment #5)
> Yes, maildir will probably not solve this. I also think the slowness of
> opening a folder is caused by reading the .msf database. We would need to
> implement some partial (and on demand) loading of the msf or something.


> The slowness of some UI operations should really be filed in individual
> bugs. Bug 1100940 is a good start, even though all of the 3 stages should
> have been filed separately. I have already split it.

Bug 1100940 itself may be a duplicate IMO
 
> I have a test folder of 4GB totaling 1 million messages. I can confirm e.g.
> the slowness of just opening the folder after TB startup. Just the msf file
> of the folder is *200MB* so mork has some work to do. 

For a modern desktop PC, I hope 200MB isn't too huge. 


> we do not count that file into the folder size which is already filed.
correct


> Wayne, do we have a metabug for slowness on large folders, or is it this one?

I don't think a meta bug will be helpful for this general issue. ...

For performance issues it's almost always better to dig deep to find the real issues for each precise use case, not lump all use cases into one bucket (despite the reporter's wish).  And we have plenty of precise bug reports [1].  So I suggest more precise detail is needed about the reporter's issues.  For example, we do not know the size of his .msf files.  

[1] https://bugzilla.mozilla.org/buglist.cgi?keywords=perf%2C%20&keywords_type=allwords&order=Last%20Changed&list_id=11621074&short_desc=folder&resolution=---&query_format=advanced&short_desc_type=allwordssubstr&product=MailNews%20Core&product=Thunderbird
Keywords: perf
Summary: TB generally unusable with big amount of messages → TB generally unusable with big amount of messages in folder
(In reply to Wayne Mery (:wsmwk) from comment #6)
> > I have a test folder of 4GB totaling 1 million messages. I can confirm e.g.
> > the slowness of just opening the folder after TB startup. Just the msf file
> > of the folder is *200MB* so mork has some work to do. 
> For a modern desktop PC, I hope 200MB isn't too huge. 
Just reading that file from disk may take some seconds. In my tests, even if the file is in the OS memory cache, parsing it still takes several seconds (there are several million small records in the DB). Rebuilding the file (reparsing the folder) takes minutes to an hour, I think. I had to disable gloda indexing on it as it wouldn't finish in reasonable time (would probably take DAYS). Not sure if the fact that many of the msgs were dupes had an effect on that.
While the problems reported are completely valid, we wouldn't normally confirm a conflated set of issues in a single bug report unless we were going to make it a meta bug, which I suggest we are not. At least, not yet.


(In reply to teo8976 from comment #2)
> > all your bug reports are 
> > Please choose which of these bug reports you'd like this new bug duplicated to.
> 
> None of them is.
> 
> This one is related, but it is just one specific issue:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1100940

quite right. I was a bit hasty.  (so many bugs, they sometimes run together)

 
> > maildir is coming, but it's at least 6 months away.
> 
> No matter which format is used for storage, a lot of things are much more
> inefficient than they could and should be. Starting from the fact that all
> processing operations should be done in the background instead of blocking
> the ui. Also, many of the inefficiencies are not even related to how the
> messages are stored because they are (or should be) purely UI. For example,
> when you start dragging there's no reason for doing any processing (not even
> reading) on the actual messages, that can be deferred until dropping.

Unfortunately, unless you are contributing a patch, you don't get to choose the potential solutions. 


(In reply to teo8976 from comment #0)
> User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like
> Gecko) Chrome/38.0.2125.122 Safari/537.36
> 
> Steps to reproduce:
> 
> Use thunderbird for years, hence having a few folders with tens to hundreds
> of thousands of messages (inbox NOT being one of them)
>
> Actual results:
> 
> TB is painfully slow in doing trivial tasks.
> For example:
> 1. merely "entering" (i.e. clicking on) a folder that contains many thousands
> of messages takes ages. Just to show the list of messages on the right and
> the first (or selected) message on the bottom right panel.
> 2. Every time new messages are downloaded, if a filter causes them to be
> moved into a folder that has thousands of messages, just saving a dozen of
> downloaded messages is slow and freezes the UI at times
> 3. Moving big amount of messages from one folder to another takes ages. Even
> just selecting them is slow. Even just the action of start dragging them is
> slow (several seconds). And actually moving them takes hours. (I've reported
> this particular one as a separate bug)
> 4. All time consuming operations such as those described above (most of which
> should not be time consuming in the first place) freeze the UI while they
> are performed, rendering the program unresponsive until they complete
> 
> And the list may go on.
> 
> In short, TB is not designed in a scalable way. It becomes UNUSABLE when you
> have folders with tons of messages.

So let's get down to specifics. First, what type of machine are you running?

Next, I've numbered your list to make it possible to discuss them individually

1. Please give some examples of what number of messages in folders is showing this problem for you.  And what are the file sizes on disk for the folder and .msf file?
2. This is probably a combination of issues that are covered in multiple bug reports of the query in comment 6.
3. again, probably a duplicate.

More generally, you mention you have used Thunderbird for years. Have you had ALL the problems you stated for that time?  Or has slowness started relatively recently, eg in the past year or two?

 
> Expected results:
> 
> None of the above should  happen. Every part of TB should be designed taking
> scalability and efficiency in mind. There are a lot of potentially O(1)
> operations which clearly take O(N) if not even more.
> 
> Reporting a bug for every single thing that is inefficient in TB would be
> painful. Perhaps the developers may be able to narrow it down to a small
> number of critical efficiency issues, but from a user's standpoint,
> describing all the situations where the behavior is unacceptably slow
> rendering the application unusable with a big number of messages, would
> imply writing hundreds of bug reports.

This sounds like frustration  - I think most people will agree that hundreds of bug reports is overstating the case.  The query in comment 6 covers a great many issues in about 50 bug reports. 

 
> I feel like I reported this before, but I've done a search and I can't find
> the report, neither mine nor of anybody else.
> 
> Perhaps TB could use such things as databases, instead of sticking with
> monolithic plain text file formats designed decades ago, when the amount of
> data that had to be handled was small enough that it could make sense
> (efficient database systems also have existed for decades, by the way).

Unfortunately databases are not a magic bullet here.  If they were, all these problems would have been fixed ages ago.
Status: NEW → UNCONFIRMED
Ever confirmed: false
I forgot to ask some other definitely relevant questions:

1. how many very large, active (accessed several times a day by filter or manual touches) folders do you have?
2. how much memory is thunderbird process using?
3. how much memory in your PC?  Or, is it a laptop?
4. do you have all pop accounts?  Or a mix of pop and imap?  How many of each?
5. files are on local disk, not network disk?
Flags: needinfo?(teo8976)
teo seems to have disengaged.

The majority of this is related to bug 11050, so I'm duping to there. No further comments are needed there - but what we do need there is coding skills, so please direct anyone interested to that bug.
Severity: normal → major
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Flags: needinfo?(teo8976)
Resolution: --- → DUPLICATE