performance consideration with huge IMAP folder

RESOLVED INVALID

Status

Thunderbird
Folder and Message Lists
--
enhancement
RESOLVED INVALID
9 years ago
7 years ago

People

(Reporter: Roberto Colmegna, Unassigned)

Tracking

x86
Windows XP

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

9 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7
Build Identifier: 2.0.0.19

Using TB on a folder with a lot of msgs (200,000) I noted that
the client use a lot of CPU for processing the reply of 
a "UID FETCH (FLAGS UID)" command (which is launched every time
I click on the folder).

Looking at the TB source code I note some code like this (which
is NOT a real TB code):

function searchMsg(long msgId) {
  for(int i=0; i < msgs.nr; i++)
    if(msgId == msgs.cache[i].id)
      return i;
      
  return -1;
}

but this code require a medium of NR*NR/2 comparisons (which use
a lot of CPU).

Using this code could require NR/2 comparisons (and only 1 compare
in IMAP reply processing while matching with cached msgs, I suppose):


int m_iIdxLastMatch = 0;

function searchMsg(long msgId) {
  int k=m_iIdxLastMatch;
  
  for(int i=0; i< msgs.nr; i++) {
    if(msgId == msgs.cache[k].id) {
      m_iIdxLastMatch = k;
      return k;
    }
    
    k=(k+1)%msgs.nr;
  }
  
  return -1;
}


Reproducible: Always

Actual Results:  
NA

Expected Results:  
NA

NA

Comment 1

9 years ago
There lot changes in IMAP code for TB3, could you give a try TB3 beta2 ?
Version: unspecified → 2.0
(In reply to comment #0)
> processing the reply of a "UID FETCH (FLAGS UID)" command
> (which is launched every time I click on the folder)

This part of your problem sounds for me Bug 479450. (See Bug 479450 Comment #13)
(In reply to comment #0)
> but this code require a medium of NR*NR/2 comparisons (which use a lot of CPU).

There are at least three kinds of comparison which consumes CPU power for threading.
 - Comparison of Message-ID
   => If many mails have reference to Message-Id, rebuild-index takes long. 
 - Comparison of Subject ("comparison with excluding Re:" uses more CPU power)
   => If many mails have same subject, rebuild-index takes very long.
      If many mails have same subject, change to Thread View takes long.
 - Searching for many child mails of a mail which is deleted,
   and changing of parent of many child mails. (See Bug 452221)
Further, Tb has problem of "UI lock" during the long CPU bound process.
I think "binary search" or "indexing" is usual solution of this kind of issue.
(Reporter)

Comment 4

9 years ago
binary or index are the best choice, but the proposed solution could be more
simple to implement and require less debug effort ... but is also less efficiency!

It could be a "bridge" solution from the current implementation to a index/binary search ones.
Roberto, we are very happy if people can find performance optimizations in Thunderbird, however there are many, many algorithms all over the mailnews code base (and even more in core), and there's a lot that could affect what we're doing when we select a folder.

Telling us you're using the wrong algorithm here's what it looks like, is like telling us to look for a needle in a haystack. If you really want to help us, then please tell us at least which function(s) it is in, you can even point to the code here (this contains the current, latest code-base): http://mxr.mozilla.org/comm-central/

You can even write us a patch if you want to:

https://developer.mozilla.org/en/Build_Documentation
https://developer.mozilla.org/En/Getting_your_patch_in_the_tree

We don't mind if you don't do a patch, but please at least tell us where exactly you think this performance issue is.
(Reporter)

Comment 6

9 years ago
I think that the method: 

int nsMsgDatabase::FindInCache(nsMsgDatabase* pMessageDB)

in the file nsMsgDatabase.cpp, could be a good candidate.

BUT I haven't a compile environment which have allowed me 
to test the performance of the hypothetic patch (and, I think,
the knowledge: I'm a Java programmer!).
(In reply to comment #6)
> I think that the method: 
> 
> int nsMsgDatabase::FindInCache(nsMsgDatabase* pMessageDB)
> 
> in the file nsMsgDatabase.cpp, could be a good candidate.

Well that is just getting the message database from the cache. There's one per folder, and you're unlikely to need to get them from the cache in any sort of order, making your suggested change less optimal.

There's probably a couple of things we could do to speed up that function, but they aren't going to have any visible perf impact.

Although I'd love to find some perf improvements in this bug, I don't think we are going to - we need to do a much more details perf analysis using tools to properly focus on the real problem/slow areas. Therefore I'm resolving this bug as invalid as the suggestion doesn't apply easily nor would it make a significant perf improvement.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.