Closed Bug 764306 Opened 12 years ago Closed 8 months ago

When trying to scroll large folders or switch some folders the application beachballs with high CPU (sort)

Categories

(MailNews Core :: Networking: IMAP, defect)

x86
All
defect

Tracking

(Not tracked)

RESOLVED INVALID
Thunderbird 16.0

People

(Reporter: Usul, Assigned: neil)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(5 files, 3 obsolete files)

Str :

1) Open Daily
2) Open my big bugs folder
3) try to scroll

Beachball

4) Switch to other folder
5) Try to scroll

Beachball

I've build with --enable-profiling and got the coming up samples (with symbols).

At first I thought it was gloda related but it's not. I'm now thinking that I've reached out a folder limit size or something.

Samples coming up.
Attached file First sample
Attached file Second sample
Attached file Third sample
Moving to MailNews Core:Imap based on two of the samples.

David would a synched sample + imap.log be useful ?
Component: General → Networking: IMAP
Product: Thunderbird → MailNews Core
QA Contact: general → networking.imap
how big are these folders? The log shows us trying to sort all the uids, which should be quick, unless you have gigantic folders.
(In reply to David :Bienvenu from comment #5)
> how big are these folders? The log shows us trying to sort all the uids,
> which should be quick, unless you have gigantic folders.

Oulan:mail.mozilla.com ludo$ ls -lh B*
-rw-------  1 ludo  staff   230M Jun 13 16:36 Bugs
-rw-r--r--  1 ludo  staff    39M Jun 13 16:38 Bugs.msf

Not that big.
Sorry, I meant how many messages. One thing is that quicksort performs worst when the array is already sorted correctly, and it's quite likely that the array is already sorted. It's possible that it's always already sorted; I'd have to figure out if Mork guarantees that or not.
77000
How long does a "beachball" take?
Mork does not guarantee oid ordering, but these tables tend to be highly highly ordered (undo of a delete is the main way I can see these tables getting unordered). bubble sort is actually a reasonable way of sorting this these keys.
(In reply to :aceman from comment #9)
> How long does a "beachball" take?

A few seconds to 10s of seconds.


(In reply to David :Bienvenu from comment #10)
> Mork does not guarantee oid ordering, but these tables tend to be highly
> highly ordered (undo of a delete is the main way I can see these tables
> getting unordered). bubble sort is actually a reasonable way of sorting this
> these keys.

Hum I don't delete/undo often , I guess that might have triggered it. So does rebuilding the .msf would fix the issue ?
(In reply to Ludovic Hirlimann [:Usul] from comment #11)
> (In reply to :aceman from comment #9)

> 
> Hum I don't delete/undo often , I guess that might have triggered it. So
> does rebuilding the .msf would fix the issue ?

No, it would probably make it worse for you. Sorry, I was not being clear. If the uids were guaranteed to be sorted already, then we wouldn't need to sort but the code today always sorts, and quicksort performs badly on already sorted array.
Ludo, I've requested try server builds which use a bubble sort to sort the key array, if it's not already sorted. http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/bienvenu@nventure.com-ddcef9c6f19e

I'm a bit skeptical that this is going to help, but yours is not the first perf log I've seen with this on the stack, so let's see if this helps.
Attached patch use bubble sort (obsolete) — Splinter Review
it's a bit embarrassing to propose a quick sort but what the heck...
Assignee: nobody → dbienvenu
(In reply to David :Bienvenu from comment #14)

> it's a bit embarrassing to propose a quick sort but what the heck...

er, bubble sort.
(In reply to David :Bienvenu from comment #13)

> I'm a bit skeptical that this is going to help, but yours is not the first
> perf log I've seen with this on the stack, so let's see if this helps.

I'm still beach balling but it's way better. Without the patch sample tells me the hangs last more than 5 seconds while with the patch less than 3 when they are longer. SO this definitively helps.
Usul, is this IMAP specific or can you see the same perf on a local folder? I have a local test folder of 600000 messages that takes some seconds to open. I could try the patch on it if it makes sense. The patch touches database so maybe it is server type neutral...

Also, what is your CPU?
(In reply to :aceman from comment #17)
> Usul, is this IMAP specific or can you see the same perf on a local folder?
> I have a local test folder of 600000 messages that takes some seconds to

I only see it on Imap haven't tried on local folders.

> open. I could try the patch on it if it makes sense. The patch touches
> database so maybe it is server type neutral...
> 
> Also, what is your CPU?

2.4 Ghz Core i7
(In reply to Ludovic Hirlimann [:Usul] from comment #16)

> 
> I'm still beach balling but it's way better. Without the patch sample tells
> me the hangs last more than 5 seconds while with the patch less than 3 when
> they are longer. SO this definitively helps.

with the patch, what does the sample tell you about where the hang is?
(In reply to David :Bienvenu from comment #19)

> 
> with the patch, what does the sample tell you about where the hang is?

in bubblesort
(In reply to Ludovic Hirlimann [:Usul] from comment #20)

> > with the patch, what does the sample tell you about where the hang is?
> 
> in bubblesort

with this patch, now repairing the folder should make the bubble sort fast (or not get called at all). Can you try that with one of your folders? It will make us redownload the headers and sync the offline store, if it's configured for offline use.
After a repair I don't see the beachball anymore.
david shall we set review flags to get this on trunk ?
Attached patch proposed fix for review (obsolete) — Splinter Review
slightly scary to use bubble sort, but it seems to be helping ludo.
Attachment #632895 - Attachment is obsolete: true
Attachment #633644 - Flags: superreview?(mbanner)
Attachment #633644 - Flags: review?(neil)
I've done some research and it seems that the optimal algorithm for sorting a stream of nearly sorted numbers is a hybrid binary insertion sort: http://www.pathcom.com/~vadco/binary.html

In particular while the numbers are sorted it simply appends them to the list, but if one arrives out of order then a binary search is used to insert it into the correct position. This is more efficient than a bubble sort because we can memmove the elements instead of swapping them.

I'll attach a patch, but I don't know whether it makes sense to create a try build with the changes now that the folder has been rebuilt.
Attached patch Alternative approach (obsolete) — Splinter Review
Attachment #633841 - Flags: superreview?(mbanner)
Attachment #633841 - Flags: review?(dbienvenu)
Comment on attachment 633841 [details] [diff] [review]
Alternative approach

this patch doesn't apply (bad hunk 1 in nsIMsgKeyArray.idl)

But, yeah, I like this approach quite a bit more.

+    mdb_id lastId = 0;

lastId should probably be called largestId.
Comment on attachment 633841 [details] [diff] [review]
Alternative approach

I like this, but I'm getting this when trying to apply the patch:

patch: **** malformed patch at line 49: @@ -43,16 +43,22

Hence, cancelling review for now.
Attachment #633841 - Flags: superreview?(mbanner)
Comment on attachment 633644 [details] [diff] [review]
proposed fix for review

I think as you commented, we'll go with Neil's suggestion.
Attachment #633644 - Flags: superreview?(mbanner) → superreview-
Attached patch Fixed patchSplinter Review
Oops, I must have accidentally hit D while looking at the patch in vi.
I've also renamed the variable in the patch (without recompiling...)
Assignee: dbienvenu → neil
Attachment #633841 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #633841 - Flags: review?(dbienvenu)
Attachment #634373 - Flags: superreview?(mbanner)
Attachment #634373 - Flags: review?(dbienvenu)
Attachment #634373 - Flags: review?(dbienvenu) → review+
If this is still an issue, it might be possible to "repair" db's by doing a move row in the table, in essence, keeping the table itself sorted.
Attachment #634373 - Flags: superreview?(mbanner) → superreview+
Attachment #633644 - Attachment is obsolete: true
Attachment #633644 - Flags: review?(neil)
Pushed comm-central changeset 28b099d53998.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 16.0
I know this is resolved now, but I wanted to bring something up regarding sorting algorithms.

On an already-sorted array, insertion sort and bubble sort are equally good (one pass through the data). Quicksort, as pointed out, is generally poor at this... however it can be improved with good choices of the "pivot" element. It will never be as good as insertion sort if the data is completely sorted, but the damage can be mitigated.

On a mostly-sorted array, insertion sort is better than bubble sort. They're both O(n^2), but in practice insertion sort will require far fewer operations.


There is another algorithm with mentioning: Merge Sort. http://en.wikipedia.org/wiki/Merge_sort

Merge sort is O(n*log(n)), like quicksort. However, it does not have the same terrible worst-case performance of quicksort... it is O(n*log(n)) in the best and worst cases. This makes it a fairly predictable algorithm, whereas quicksort can vary wildly.

In some implementations, Merge sort can be O(n) on an already-sorted list.

A good trick for merge and quicksort is to interrupt the algorithm when the individual partitions are smaller than a given size, and switch to an insertion sort... which is commonly faster for small data sets.


Anyway, just thought that worth pointing out... it might be possible to get back to a single algorithm. A bubble sort makes me nervous... it makes me wonder how we might accidentally get into that state and call it on a not-well-sorted array. :)
In bug 872869 Neil comments ...

(In reply to neil@parkwaycc.co.uk from bug 872869 comment #12)
> (In reply to Derrick Moser from bug 872869 comment #10)
> > I had been encountering poor performance problems with Thunderbird for a
> > long time while I was stuck on Fedora 14.  After upgrading to Fedora 18 and
> > discovering the problems still persisted, I started investigating using a
> > debugger and profiler.  This issue, bug 870556, and bug 872497 are the main
> > culprits I found.
> 
> Would bug 872497 have been the underlying cause of bug 764306? If so, then
> we can revert that change and go back to quicksorting the array every time.
Blocks: tbbigfolder
See Also: → 872497
Summary: When trying to scroll some folders or switch some folders the application beachballs → When trying to scroll large folders or switch some folders the application beachballs with high CPU
Patch of this bug will be backed by bug 872869, so re-open this bug to find solution of this bug after backout.

(1) Patch of this bug :
      Sort ListAllKeys() result in messageKey order using Insertion sort technique.
(2) Although Insertion sort had problem of O(N^2) in worst case, following is sure.
      (2-1) Problem of this bug was resolved by merely "sort ListAllKey() result in messageKey order".
      (2-2) Even if worst case of Insertion sort is O(N^2), Insertion sort is fast if almost array is already sorted in ascending order.
      =>
     .Penalty/loss by Insertion sort <<< gain in this bug's problem

If this bug is for "worst case in Insertion sort"(for example, array in descending order), penalty/loss by Insertion sort should be big like bug 872869 comment #33.
However, penalty/loss by Insertion sort was not so big in this bug. This indicates that "array in this bug" is already near to ascending order although it's never absolute ascending order.
Even though "near to ascending order", why problem of this bug was resolved only by "sort of ListAllKeys result"?
It indicates this bug is also an O(N ^ 2) problem, deosn't it?
   Example of O(N ^ 2) issues : Bug 872869, Bug 872497, Bug 870556

Because "sorted ListAllKeys result" resolved this bug's problem, "sort before using ListAllKeys result" is a simple solution of this bug.
Where should sort be called? Sort by which key? messageKey? "Sort by messageKe only" is sufficient in Virtuaal Folder case?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Phenomenon of this bug :
   (a) Beachball when switch big imap folders.
   (b) Beachball when try to scroll aat thread pane.
(a) = msgDB open + message display in View of Thread pane
(b) = Displayed row(msgDBHdr) change in View of Thread pane.

Displayed order at Thread pane depends on sort column of Thread pane(secondary sort is already supported, but is not stable sort yet).
Is "sort of ListAllKeys result in messageKey value" proper solution?
"Construction of messageKey value array in Thread pane sort order" is better, isn't it?
(In reply to Ludovic Hirlimann [:Usul] from comment #1)
> Created attachment 632608 [details]
> First sample

Following is seen.
Why NS_QuickSort() is called  so many times?
O(N ^ 2) problem in some component(similar issue to Bug 870556).
If it was same problem as Bug 872497 in NS_QuickSort(), 
"why this bug was resolved by Sort ListAllKeys result uusing Insertion sort" may be explained.
   1 * "Sort ListAllKeys result using Insertion sort"  +  N * "NS_QuickSort() on already sorted array in ascending order"
      is far faster than
   N * "NS_QuickSort() on not already sorted array in ascending order"
i.e.
If array is already sorted in ascending order, NS_QuickSort() did do "insertion sort on entire array == Do nothing.
So, "calling NS_QuickSort() many many times" didn't produce big performance issue,
if array is sorted in ascending order before array is passed to NS_QuickSort() many times.

  
 "" /
 us s



> nsTArrayDefaultAllocator>::Compare<nsDefaultComparator<unsigned int, unsigned int> >(void const*, void const*, void*)  (in XUL) + 0  [0x101b1e2f0]  nsTArray.h:1109
>     +                         !                     :     146 NS_QuickSort  (in XUL) + 2589,2581,...  [0x1020b6b0d,0x1020b6b05,...]  nsQuickSort.cpp:78
>     +                         !                     :     39 NS_QuickSort  (in XUL) + 2541,2529,...  [0x1020b6add,0x1020b6ad1,...]  nsQuickSort.cpp:75
>     +                         !                     :     22 NS_QuickSort  (in XUL) + 2516,2505,...  [0x1020b6ac4,0x1020b6ab9,...]  nsQuickSort.cpp:161
>     +                         !                     :     14 NS_QuickSort  (in XUL) + 2490,2485  [0x1020b6aaa,0x1020b6aa5]  nsQuickSort.cpp:163
>     +                         !                     2 nsImapMailFolder::UpdateImapMailboxInfo(nsIImapProtocol*, nsIMailboxSpec*)  (in XUL) + 1958  [0x101e5abd6]  nsImapMailFolder.cpp:2803
>     +                         !                     : 2 nsImapMailFolder::FindKeysToAdd(nsTArray<unsigned int, nsTArrayDefaultAllocator>
Essentially same as or similar to Bug 870556?
   During analysis of Bug 870556, he found Bug 872497 in NS_QuickSort() and resolved NS_QuickSort() issue,
   then, he found Bug 872869 which was produced by patch for this bug...
Sorry, wrong understanding. Multiple NS_QuickSort() in that place is perhaps recursive call by NS_QuickSort() for half block of array.
(In reply to Ludovic Hirlimann [:Usul] from comment #1)
> Created attachment 632608 [details]
> First sample

Following is data for ListAllKeys.
>     +                         !                     599 nsImapMailFolder::UpdateImapMailboxInfo(nsIImapProtocol*, nsIMailboxSpec*)  (in XUL) + 494  [0x101e5a61e]  nsImapMailFolder.cpp:2686
>     +                         !                     : 599 nsMsgDatabase::ListAllKeys(nsIMsgKeyArray*)  (in XUL) + 246  [0x101e2f086]  nsMsgDatabase.cpp:3090
>     +                         !                     :   599 nsMsgKeyArray::Sort()  (in XUL) + 39  [0x101d09e37]  nsTArray.h:1121
>     +                         !                     :     336 NS_QuickSort  (in XUL) + 2512  [0x1020b6ac0]  nsQuickSort.cpp:161
nsMsgKeyArray::Sort() is called, and NS_QuickSort() is used.
It looks for me sorting is done.
However, this patch for this bug was "sort in ascending order in nsMsgDatabase::ListAllKeys using Insertion sort technique".
Why the soltion of  "sort in ascending order in nsMsgDatabase::ListAllKeys" was chosen for this bug?
Oh, following comments were found.
comment #5.
> how big are these folders? The log shows us trying to sort all the uids, which should be quick, unless you have gigantic folders.
comment #7.
> Sorry, I meant how many messages. One thing is that quicksort performs worst when the array is already sorted correctly,
> and it's quite likely that the array is already sorted.
> It's possible that it's always already sorted; I'd have to figure out if Mork guarantees that or not.

This is same as  Bug 872497.
   If swap at a Pivot doesn't occur, our NS_QuickSort() did insertion sort on entire array before fix of Bug 872497.
   If "left half" < Pivot < "right half", swap doesn't occur, so insertion sort is executed on entire array.
   This occurs by "delete messageKey of right half(small messageKey) then undete messageKey of right half".
   If "left half" is sorted in descending order, and if "right half" is sorted in descending order, worst case happens.
   If "left half" is not sorted in ascending order, and if "right half" is not sorted in ascending order, pretty bad case happens even if not worst case.
If standard qsort() was used, this kind of issue couldn't occur.
If entire array is already sorted in ascending order, our NS_QuickSort() was pretty fast as bubble sort is pretty fast, because it does do nothng.
i.e. Our NS_QuickSort() was very torelant with needless/blind NS_QuickSort() call on array of "already sorted in absolute ascending order".
(In reply to neil@parkwaycc.co.uk from comment #30)
> Created attachment 634373 [details] [diff] [review]
> Fixed patch

This kind of job can be achieved by pretty simple code.
   DataArray.Sort() using std::sort().
   Append a few big elements which are not biggest but sufficiently big in the array in random order.
   DataArray.Qsort() using NS_QuickSort() before change by Bug 872497(original NS_QuickSort() ) 
Because of "left half < Pivot < right half", swap doesn't happen, then insertion sort is executed on entire array.
99.9 % of array is already sorted in ascending order, so actual insertion sort is done on biggest 0.1 % part only.
So, InsertAt() like logic is not needed. "Append elements to already sorted array, then call original NS_QuickSort()". is sufficient.

What's bad in Mozilla MailNews/Thunderbird was : 
   Multiple blind calls of NS_QuickSort() without understanding benefit/cost of original NS_QuickSort().
   Even though other standard sort programs are available, called NS_QuickSort() because progrm is named QuickSort.
It's never fault in original NS_QuickSort(), if problem was caused by worst case of NS_QuickSort().
OS: Mac OS X → All
Summary: When trying to scroll large folders or switch some folders the application beachballs with high CPU → When trying to scroll large folders or switch some folders the application beachballs with high CPU (sort)
Severity: normal → S3

Obsolete with version 115

Status: REOPENED → RESOLVED
Closed: 12 years ago8 months ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: