Closed Bug 870556 Opened 11 years ago Closed 10 years ago

O(n^2) performance freezes UI for several minutes fetching new mail from IMAP server for very large folder

Categories

(MailNews Core :: Networking: IMAP, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 37.0

People

(Reporter: derrick_moser, Assigned: neil)

References

(Blocks 1 open bug)

Details

(Keywords: hang, perf)

Attachments

(1 file, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:20.0) Gecko/20100101 Firefox/20.0
Build ID: 20130401133302

Steps to reproduce:

I added my mail account to Thunderbird (version 17.0.5).  The mail account uses IMAP.  One of the folders on the IMAP server has over 350000 email messages.


Actual results:

Periodically Thunderbird attempts to download messages.  However, it freezes for several minutes in the nsAutoSyncState::PlaceIntoDownloadQ() method (see mailnews/imap/src/nsAutoSyncState.cpp) spending most of its time on the mDownloadQ.Contains(aMsgKeyList[idx]) statement.  The PlaceIntoDownloadQ method loops over all message to put in the queue but mDownloadQ performs a linear search on its items to avoid duplicates.  This leads to O(n^2) performance and user frustration.


Expected results:

The download queue should have used an acceleration structure to quickly determine if it contains an item.
FYI.
It's similar O(n^2) issue to bug 812923 comment #21, which is common in Tb due to "lnear search of entire array for duplicate" per "each element in other array or same array".
In that bug, "per each element in other array or same array" part is not madatory yet fortunately, currently proposed solution is "linear search of entire array for first element only".

This is similar to difference among Bubble Sort, QuickSort, and Enterprize Data Base system with expensive Sort/Merge Program, and our searching is near Bubble Sort. 
Indexing by Data Base, by JavaScript interpreter for JS Object's property name, indexing or hashing by Tb himself for search such as binary search etc. may be needed for efficiency.
FYI.
"Geting MsgDBHdr of all message in a msgFolder" is not so expensive job even though not so light weight job, and it can be easily executed by JavaScript. So, "holding all key of msgDBHdr data and pointer to msgDBHdr in JavaScript Object" is possible, and if data is needed after restart of Tb, saving in JSON file is possible, like foldeTree.json, sessions.json.
If all messageKey is held in JS object like following,
  var JS_MsgDB = {};
  JS_MsgDB[FolderURI] = {};
  JS_MsgDB[FolderURI][messageKey of a message] = value_object;
  where value_object ={};
    value_object["deleted"]  = false ;
    value_object["msgDBHdr"] = msgDBHdr of this message ;
    value_object["xxx"]      = required data as property xxx;
existence check is simply done by;
  var exists;
  if( !JS_MsgDB[FolderURI][a_messageKey] ||
       true==JS_MsgDB[FolderURI][a_messageKey]["deleted"] )
     exists=false; else exists=true;
I can't think JavaScript interpreter does do linear search for property name of a JavaScript object. So I believe duplication check like above is sufficiently faster than current linear search by Tb.
And, if JSON file is saved as BLOB of SQLite DB, integrity of file is kept easily.
Note: "JSON data as Database data" is used by "mongodb" who is a friend of server side "node.js" what is based on JavaScript.
Component: Untriaged → Networking: IMAP
Keywords: hang, perf
Product: Thunderbird → MailNews Core
This is a rough attempt to fix two O(n^2) performance bugs in Thunderbird.  It needs to be reviewed by someone more familiar with the Thunderbird source code as I am far from being qualified.
I have attached my attempt at fixing the bug.  The changes to the nsAutoSyncState class appear to address the initial O(n^2) problem I encountered.  There is probably a more elegant fix but I'm not familiar with the Thunderbird source code.

Once past this problem, Thunderbird immediately encountered another O(n^2) performance bug (in NS_QuickSort!) when determining the download strategy.  The DownloadQ entries fetched from mthe IMAP server appear to be already sorted.  Unfortunately, NS_QuickSort chooses to use an insertion sort when given a large pre-sorted array as input!  My patch removes some clearly offending code in NS_QuickSort but someone should go through the method to make sure there are no other subtle bugs.

With both of these changes, I no longer experience the periodic freezes from Thunderbird.
Hi Derrick. Thanks for the contribution. Here is a link on the patch process: https://developer.mozilla.org/en-US/docs/Developer_Guide/How_to_Submit_a_Patch

You can request a review from one of the the folks listed here: https://wiki.mozilla.org/Modules/MailNews_Core

:bienvenu is the most knowledgeable on IMAP and AutoSync, but he has limited availability. You might try him first. If you don't hear anything for a week or two try :irving or :standard8.
Assignee: nobody → derrick_moser
(In reply to Derrick Moser from comment #3)
> Fix O(n^2) performance bug in DownloadQ and NS_QuickSort

(A) Change from mDownloadQ.Contains to mDownloadQItems.Get which is nsDataHashtable.

Good catch.
Thanks for providing C++ example. Similar method can be applied to other bugs such as bug 846123, if cause of that bug is "linear search of entire array per an element".

(B) NS_QuickSort.

Should we continue to use NS_QuickSort?

See bug 846123 comment #46 and some comments relivant to Quick Sort. Used NS_QuickSort is code written in 20-th century to bypass critical bug in qsort() of Solaris around year 2000 or before...

I believe "back to standard qsort()" is sufficient.
Can you measure difference between NS_QuickSort() and C++ sort()/stable_sort() for array of 30000 or larger?
No severe performance issue in standard qsort() when many duplicate data exist?
(In reply to Derrick Moser from comment #3)
> Fix O(n^2) performance bug in DownloadQ

What happens when followin case?
(1) All messageKey of msgDBHdr is held in mDownloadQ.
(2) Some of them is downloaded by uto-sync, but others are not.
(3) Some mails which is not auto-synched yet are marked as \Deleted
    by other client. Call UID-X.
(4) Tb knows about \Deleted flag of mail of UID-X.
    "Remove immediately" or "Move to trash" model is used,
    so msgDBHdr of the messages is remved in Tb.
(5) Undelete of UID-X is executed by other client.
    For Tb these are similar to "new mail".
    So new msgDBHdr is created with same messageKey of UID-X,
    because messageKey==UID of mail when IMAP.
(6) In this case, following may occur.
    - UID-X is still held in mDownloadQ.
    - UID-X is added to mDownloadQItems.
    - UID-X is unique in mDownloadQItems,
      so UID-X is added to mDownloadQ   
    => multiple UID-X isgeneraed in mDownloadQ.

No ptoblem because UID-X is immediately removed when Tb detects \Deleted flag of mail of UID-X?

If not, "load of messageKey of all mails in mDownloadQ to newly created mDownloadQItems" is needed before adding messages in aMsgKeyList to mDownloadQ, unless mDownloadQ is empty.
> Should we continue to use NS_QuickSort?
> 
> I believe "back to standard qsort()" is sufficient.
> Can you measure difference between NS_QuickSort() and C++
> sort()/stable_sort() for array of 30000 or larger?
> No severe performance issue in standard qsort() when many duplicate data
> exist?

qsort() does not provide support for a data pointer which is used is some situations.  std::sort() looks like a good replacement.  I just ran a few sort methods on 10000000 random integers.  On my test system (yes, it is old and slow):

NS_QuickSort                        4.860s
NS_QuickSort (without insert sort)  5.605s
std::sort                           4.105s
std::stable_sort                    6.196s
qsort                               6.274s

When given all duplicate items:

NS_QuickSort                        0.334s
NS_QuickSort (without insert sort)  0.327s
std::sort                           2.345s
std::stable_sort                    3.072s
qsort                               2.770s

When given pre-sorted items:

NS_QuickSort                        0.587s
NS_QuickSort (without insert sort)  2.947s
std::sort                           1.900s
std::stable_sort                    3.087s
qsort                               3.975s

NS_QuickSort() uses insertion sort if it thinks the data is nearly sorted.  Although this gives O(n) behaviour when working on pre-sorted or duplicate data, it can produce O(n^2) in some cases (which Thunderbird encountered trying to determine a download strategy for my mailbox) and these O(n^2) performance problems are difficult for others to reproduce as they are very dependent on the input data.
> No ptoblem because UID-X is immediately removed when Tb detects \Deleted
> flag of mail of UID-X?

The code always keeps the two structures in sync.  Only non-duplicate entries are inserted and inserts and deletes are always applied to both structures.  This would be more obvious to readers if the list and hashtable were packaged into a separate ListWithNoDuplicates class without exposing the implementation details.  I believe this would require changes to the function signatures of a lot more code.
(In reply to Derrick Moser from comment #8)
> std::sort() looks like a good replacement.

I thought so too. I can't imagine reason why "Standard one" is not used by Mozilla's C++ code... "Standard sort" is usually not so bad as seen by Array.sort() of JavaScript which is stable sort. I thought, as seen in std::sort() of C++, or in sort of Java or Script language like JavaScript, or in many sort utility programs, "sort algorithm choice by language" is usually hidden for general purpose sorting of general data. I thought biggest advantage of qsort() or variants of Quick Sort like NS_QuickSort() to "standard one" is "low resource consumption" rather than "Quickness".
 
"Standard sort" in C is "Bubble Sort"?
"Standard sort" in C++ is pretty new feature?

FYI.

(mozilla-central)
std::sort use in mozilla-central
> http://mxr.mozilla.org/mozilla-central/search?string=std::sort
QuickSort use in mozilla-central
> http://mxr.mozilla.org/mozilla-central/search?string=QuickSort&case=on
(comm-central)
std::sort use in comm-central
> http://mxr.mozilla.org/comm-central/search?string=std::sort
qsort() use in comm-central
> http://mxr.mozilla.org/comm-central/search?string=qsort&case=on
QuickSort use in comm-central
> http://mxr.mozilla.org/comm-central/search?string=QuickSort&case=on

Bug 738533 is request for "going back to qsort() in standard library from NS_QuickSort()". That bug refers to morkQuickSort() which is duplicate of NS_QuickSort().
Despite that only 16 modules were updated, that bug was closed as FIXED...

> Although this gives O(n) behaviour when working on pre-sorted or duplicate data, it can produce O(n^2) in some cases
> (which Thunderbird encountered trying to determine a download strategy for my mailbox)
> and these O(n^2) performance problems are difficult for others to reproduce as they are very dependent on the input data.

No such problem in std::sort of C++?
No such problem if qsort() of C++?
It's NS_QuickSort()/morkQuickSort() only problem?

How slow with NS_QuickSort() in your case? Can problem of bug 846123 be produced by such O(2^n) issue in NS_QuickSort()?
Attachment #748237 - Flags: review?(mozilla)
Attachment #748237 - Flags: feedback?(mbanner)
Someone mentioned QuickSort. If the download queue is sorted, then we can use IndexOfFirstElementGt to see whether the element is in the list (it also tells us where to insert the element to keep the list sorted, in case that helps).
(In reply to neil@parkwaycc.co.uk from comment #11)
> Someone mentioned QuickSort.

Mentioned NS_QuickSort() instead of qsort() of C or C++?
If so, I can imagine reason why NS_QuickSort() is used in many places of Mail&News code.
- Mail&news code was written in era of "qsort() of Solaris is broken",
  or name of "QuickSort()" sounds "far quicker and far better than any
  other sort function".
- NS_QuickSort() is a pretty simple and good example of Quicksort
  implementation, and was/is actually quick sufficiently, and worked
  pretty well in many cases, and problem in it is only what
  Derrick Moser experienced.
  Note:
  "Worst case == O(2^n)" itself is attribute of Quicksort algorithm,
  and Average == O( n * log(n) ).
  Weakness in pre-sorted data case and duplicate data case
  is well known. (perhaps depends on pivot choice logic) 

Are we better to open bug for "Use std:sort() or qsort() instead of NS_QuickSort()/morkQuickSort() in Mail&News"?
The last time we had a slowly performing sort it was because we were hitting the worst case all of the time because of a poorly written insertion algorithm (it was using AppendElement and Sort instead of InsertElementSorted) and I was concerned that we were doing something similar here.
> +++ comm-esr17/mozilla/xpcom/glue/nsQuickSort.cpp

If you want to change nsQuickSort.cpp, this would need to be a separate patch, maybe even in a separate bug. While the mozilla/ directory is contained within the Thunderbird and SeaMonkey code tree, it's a separate repository. Also, different reviewers would be needed for that part as it's a different component.

https://wiki.mozilla.org/Modules/Core#XPCOM
> How slow with NS_QuickSort() in your case? Can problem of bug 846123 be
> produced by such O(2^n) issue in NS_QuickSort()?

Below is an example of the slow behaviour I saw happening in NS_QuickSort.  Increase "n" a bit more to see even worse behaviour from NS_QuickSort.

int my_cmp(const void *a, const void *b, void *data)
{
    int aa = *(int *)a;
    int bb = *(int *)b;
    return (aa < bb) ? -1 : (aa == bb ? 0 : 1);
}

int main(int argc, const char *argv[])
{
    int n = 100000;
    int *a = new int[n];

    // create some partially sorted data
    srand(1);
    int mid = n / 2;
    for(int i = 0; i < mid; ++i)
        a[i] = -(rand() % n) - 1;
    a[mid] = 0;
    for(int i = mid + 1; i < n; ++i)
        a[i] = rand() % n + 1;

    printf("Sorting %d items...\n", n);
    NS_QuickSort(a, n, sizeof(int), my_cmp, 0);
    printf("Done!\n");
    delete [] a;
    return 0;
}
(In reply to Derrick Moser from comment #15)
>         a[i] = -(rand() % n) - 1;
>         a[i] = rand() % n + 1;

Whole numbers generated from rand() usually produces duplicates.
In my test of Math.random() of JavaScript, with large seed, unique whole numbers in generated array was 95% to 99%, but if seed is not sufficiently large, unique whole numbers in generated array was 70% to 90%.

Code of NS_QuickSort() doesn't care duplicate data case in pivot selection. And, upon call of swap(especially when swap_cnt=1 is set), duplicate data case is not cared. 
So, if many duplicate data exists, excess swap execution and loop execution may occur before entering required insertion sort or required recursive call of NS_QuickSort(). 

Do you see severe performance problem with unique number only random data?

FYI.
JavaScript example to generate random and unique whole numbers.
    function generate_Unique(cnt,max,array_to_sort)
    {
        var tbl={};
        for(var xx=0;xx<cnt;xx++)
        { 
            var dat = Math.floor(1+Math.random()*max) ;
            if( tbl[dat] ){ continue; } // ignore duplicate number
            tbl[dat] = true ;
            array_to_sort[array_to_sort.length] = dat ;
        }
    }

Do you actually see severe performance issue on pre-sorted and unique data in NS_QuickSort()?

In my test of my JavaScript porting of the NS_QuickSort() which doesn't do swap if pivot data is same, behaviour on pre-sorted and unique data was;
  Ascending order : no swap occurs.
                    do insertion sort only one time on all array. 
  Descending order: one swap occurs, and array is split to two part.
                    insertion sort is executed two times,
                    one insertion sort on first half of entire array,
                    one insertion sort on second half of aentire array.
Although "insertion sort on eintire array" or two times of "insertion sort on half of eintire array" is executed and the "insertion sort" is O(2^n) process, excess(or required) swap/loop/recursive call etc. never happens when already sorted data. So, I think total executin time of sort by current NS_QuickSort() is far smaller than sort of random data.
I have opened bug 872497 for the NS_QuickSort problem so this bug can focus on the original problem of O(n^2) performance manipulating mDownloadQ.  My concern is further issues may be present due to using an array for mDownloadQ.  This looks like another suspicious manipulation of mDownloadQ in nsAutoSyncState.cpp:

for (; idx < msgCount; idx++)
{
    bool containsKey = false;
    database->ContainsKey(mDownloadQ[idx], &containsKey);
    if (!containsKey)
    {
        mDownloadQ.RemoveElementAt(idx--);
        msgCount--;
        continue;
    }
    // ...
}

I suspect that will lead to O(n^2) performance if another process deletes a large number of emails from the IMAP server.
Comment on attachment 748237 [details] [diff] [review]
Fix O(n^2) performance bug in DownloadQ and NS_QuickSort

Neil's already been looking at this, so I think its best if he reviews it.
Attachment #748237 - Flags: review?(neil)
Attachment #748237 - Flags: review?(mozilla)
Attachment #748237 - Flags: feedback?(mbanner)
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
OS: Linux → All
Hardware: x86_64 → All
Comment on attachment 748237 [details] [diff] [review]
Fix O(n^2) performance bug in DownloadQ and NS_QuickSort

>-      if (NS_SUCCEEDED(rv) && !mDownloadQ.Contains(aMsgKeyList[idx]) && doesFit)
>+      if (NS_SUCCEEDED(rv) && !mDownloadQItems.Get(aMsgKeyList[idx], 0) && doesFit)
What was wrong with using Contains on the hash?
Since you never actually need the value, you should use use an nsTHashtable directly. See the XPCOM hashtable guide:
https://developer.mozilla.org/en-US/docs/XPCOM_hashtable_guide
Nit: Since you're using it as a hash set, probably the best variable name would be mDownloadSet.

>diff -Naur comm-esr17.orig/mozilla/xpcom/glue/nsQuickSort.cpp comm-esr17/mozilla/xpcom/glue/nsQuickSort.cpp
>--- comm-esr17.orig/mozilla/xpcom/glue/nsQuickSort.cpp	2013-05-10 13:12:44.754802485 -0400
>+++ comm-esr17/mozilla/xpcom/glue/nsQuickSort.cpp	2013-05-10 13:16:49.763739901 -0400
(You can't patch the mozilla folder from comm-central.)
Attachment #748237 - Flags: review?(neil) → review-
(In reply to comment #11)
> Someone mentioned QuickSort. If the download queue is sorted, then we can
> use IndexOfFirstElementGt to see whether the element is in the list

Sadly it's only partly sorted, so we can't readily do that.
FYI.

> 10000000 random integers:
> 
> NS_QuickSort                        4.860s
> NS_QuickSort (without insert sort)  5.605s
> std::sort                           4.105s
> std::stable_sort                    6.196s
> qsort                               6.274s
> 
> When given pre-sorted items:
> 
> NS_QuickSort                        0.587s
> NS_QuickSort (without insert sort)  2.947s
> std::sort                           1.900s
> std::stable_sort                    3.087s
> qsort                               3.975s

Myth "NS_QuickSort() used by Tb is pretty quick" was gone, because bug 872497 was fixed and "insertion sort in NS_QuickSort()" is already removed. "Average == O( n * log(n) )" is now applicable to "fully pre-sorted in ascending order" case too.

(In reply to Derrick Moser from comment #8)
> NS_QuickSort() uses insertion sort if it thinks the data is nearly sorted. 
> Although this gives O(n) behaviour when working on pre-sorted or duplicate
> data, it can produce O(n^2) in some cases (which Thunderbird encountered
> trying to determine a download strategy for my mailbox) and these O(n^2)
> performance problems are difficult for others to reproduce as they are very
> dependent on the input data.

Thanks for resolving critical performance problem of bug 872497.
I should have read and understand your comment well before posting comments in bug 872497...
(In reply to neil@parkwaycc.co.uk from comment #19)
> Comment on attachment 748237 [details] [diff] [review]
> Fix O(n^2) performance bug in DownloadQ and NS_QuickSort
> 
> >-      if (NS_SUCCEEDED(rv) && !mDownloadQ.Contains(aMsgKeyList[idx]) && doesFit)
> >+      if (NS_SUCCEEDED(rv) && !mDownloadQItems.Get(aMsgKeyList[idx], 0) && doesFit)
> What was wrong with using Contains on the hash?
> Since you never actually need the value, you should use use an nsTHashtable
> directly. See the XPCOM hashtable guide:
> https://developer.mozilla.org/en-US/docs/XPCOM_hashtable_guide
> Nit: Since you're using it as a hash set, probably the best variable name
> would be mDownloadSet.

Derrick, is it possible to update the patch?
Severity: normal → major
Flags: needinfo?(derrick_moser)
Summary: O(n^2) performance problem fetching from IMAP server → O(n^2) performance freezes UI for several minutes fetching new mail from IMAP server for very large folder
Whiteboard: [patchlove]
I'm hitting this issue as well.  Please let me know if I can help with further testing.

Derrick, let me know if you're not able to make these updates (and if that's what's blocking this fix).  I think I could; they look like simple changes; but I'm still learning / wrestling with Thunderbird's build system.
Adam, I will be unable to make these updates in the foreseeable future due to a lack of spare time.  You are welcome to take over the patch.  You may want to take a peek at bug #872869 as well.
Flags: needinfo?(derrick_moser)
See Also: → 872869
Adam isn't able to take this, so time to pass the potatoe to the next dev. Neil, would you want to take a stab and have someone review it?
Flags: needinfo?(neil)
I'll update the comm-central part of the patch if you find a reviewer. (I won't touch the mozilla-central part of the patch.)
Flags: needinfo?(neil)
Blocks: tbbigfolder
(In reply to neil@parkwaycc.co.uk from comment #26)
> I'll update the comm-central part of the patch if you find a reviewer. 

irving has agreed to review


> (I > won't touch the mozilla-central part of the patch.)

rkent, can you handle the m-c part?  potential reviewers bsmedberg, dougt, froydnj
Flags: needinfo?(neil)
Flags: needinfo?(kent)
(In reply to Wayne Mery (:wsmwk) from comment #27)
> 
> > (I > won't touch the mozilla-central part of the patch.)
> 
> rkent, can you handle the m-c part?  potential reviewers bsmedberg, dougt,
> froydnj

I thought that the m-c part of this was already split out, and landed in bug 872497
Flags: needinfo?(kent)
(In reply to Kent James (:rkent) from comment #28)
> (In reply to Wayne Mery (:wsmwk) from comment #27)
> > 
> > > (I > won't touch the mozilla-central part of the patch.)
> > 
> > rkent, can you handle the m-c part?  potential reviewers bsmedberg, dougt,
> > froydnj
> 
> I thought that the m-c part of this was already split out, and landed in bug
> 872497

quite right. Neil was not cc: on bug 872497 so would not have seen the action
2013-05-19 https://hg.mozilla.org/mozilla-central/rev/aad29aa89237

over to Neil
Assignee: derrick_moser → neil
See Also: → 872497
Whiteboard: [patchlove]
Attached patch Updated patchSplinter Review
Attachment #748237 - Attachment is obsolete: true
Attachment #8481253 - Flags: review?(irving)
Flags: needinfo?(neil)
Attachment #8481253 - Flags: review?(irving) → review+
> Attachment #8481253 [details] [diff] - Flags: review?(irving@mozilla.com) → review+

ready to checkin?
Flags: needinfo?(neil)
(In reply to Wayne Mery from comment #31)
> ready to checkin?

Tree Thunderbird-Trunk is CLOSED!
Flags: needinfo?(neil)
But feel free to ping me when it reopens.

(Keyboard navigation fail, was trying to tab to comment but missed.)
Keywords: checkin-needed
I haven't checked this in because I'm unsure who the patch author is. Feel free to check it in (c-c is pretty green at the moment) or I can do it next time around.
(In reply to Patrick Cloke [:clokep] from comment #34)
> I haven't checked this in because I'm unsure who the patch author is. Feel
> free to check it in (c-c is pretty green at the moment) or I can do it next
> time around.

Neil took over
Flags: needinfo?(clokep)
Flags: needinfo?(clokep)
Pushed comm-central changeset 2e3eb4f336d0.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 37.0
See Also: → 698093
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: