Last Comment Bug 764306 - When trying to scroll large folders or switch some folders the application beachballs with high CPU
: When trying to scroll large folders or switch some folders the application be...
Status: REOPENED
: perf
Product: MailNews Core
Classification: Components
Component: Networking: IMAP (show other bugs)
: Trunk
: x86 Mac OS X
: -- normal (vote)
: Thunderbird 16.0
Assigned To: neil@parkwaycc.co.uk
:
:
Mentors:
Depends on: 872869
Blocks: 1058026
  Show dependency treegraph
 
Reported: 2012-06-13 02:04 PDT by Ludovic Hirlimann [:Usul]
Modified: 2016-11-21 11:08 PST (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
First sample (59.69 KB, text/plain)
2012-06-13 02:07 PDT, Ludovic Hirlimann [:Usul]
no flags Details
Second sample (51.05 KB, text/plain)
2012-06-13 02:08 PDT, Ludovic Hirlimann [:Usul]
no flags Details
Third sample (32.97 KB, text/plain)
2012-06-13 02:08 PDT, Ludovic Hirlimann [:Usul]
no flags Details
use bubble sort (3.04 KB, patch)
2012-06-13 14:55 PDT, David :Bienvenu
no flags Details | Diff | Splinter Review
Sample with patch applied (36.25 KB, text/plain)
2012-06-14 09:07 PDT, Ludovic Hirlimann [:Usul]
no flags Details
proposed fix for review (3.61 KB, patch)
2012-06-15 13:20 PDT, David :Bienvenu
standard8: superreview-
Details | Diff | Splinter Review
Alternative approach (3.51 KB, patch)
2012-06-16 12:45 PDT, neil@parkwaycc.co.uk
no flags Details | Diff | Splinter Review
Fixed patch (3.52 KB, patch)
2012-06-19 06:29 PDT, neil@parkwaycc.co.uk
mozilla: review+
standard8: superreview+
Details | Diff | Splinter Review

Description Ludovic Hirlimann [:Usul] 2012-06-13 02:04:21 PDT
Str :

1) Open Daily
2) Open my big bugs folder
3) try to scroll

Beachball

4) Switch to other folder
5) Try to scroll

Beachball

I've build with --enable-profiling and got the coming up samples (with symbols).

At first I thought it was gloda related but it's not. I'm now thinking that I've reached out a folder limit size or something.

Samples coming up.
Comment 1 Ludovic Hirlimann [:Usul] 2012-06-13 02:07:39 PDT
Created attachment 632608 [details]
First sample
Comment 2 Ludovic Hirlimann [:Usul] 2012-06-13 02:08:00 PDT
Created attachment 632609 [details]
Second sample
Comment 3 Ludovic Hirlimann [:Usul] 2012-06-13 02:08:24 PDT
Created attachment 632610 [details]
Third sample
Comment 4 Ludovic Hirlimann [:Usul] 2012-06-13 02:09:35 PDT
Moving to MailNews Core:Imap based on two of the samples.

David would a synched sample + imap.log be useful ?
Comment 5 David :Bienvenu 2012-06-13 06:53:18 PDT
how big are these folders? The log shows us trying to sort all the uids, which should be quick, unless you have gigantic folders.
Comment 6 Ludovic Hirlimann [:Usul] 2012-06-13 07:39:52 PDT
(In reply to David :Bienvenu from comment #5)
> how big are these folders? The log shows us trying to sort all the uids,
> which should be quick, unless you have gigantic folders.

Oulan:mail.mozilla.com ludo$ ls -lh B*
-rw-------  1 ludo  staff   230M Jun 13 16:36 Bugs
-rw-r--r--  1 ludo  staff    39M Jun 13 16:38 Bugs.msf

Not that big.
Comment 7 David :Bienvenu 2012-06-13 07:41:32 PDT
Sorry, I meant how many messages. One thing is that quicksort performs worst when the array is already sorted correctly, and it's quite likely that the array is already sorted. It's possible that it's always already sorted; I'd have to figure out if Mork guarantees that or not.
Comment 8 Ludovic Hirlimann [:Usul] 2012-06-13 07:45:58 PDT
77000
Comment 9 :aceman 2012-06-13 08:20:49 PDT
How long does a "beachball" take?
Comment 10 David :Bienvenu 2012-06-13 08:55:47 PDT
Mork does not guarantee oid ordering, but these tables tend to be highly highly ordered (undo of a delete is the main way I can see these tables getting unordered). bubble sort is actually a reasonable way of sorting this these keys.
Comment 11 Ludovic Hirlimann [:Usul] 2012-06-13 10:10:56 PDT
(In reply to :aceman from comment #9)
> How long does a "beachball" take?

A few seconds to 10s of seconds.


(In reply to David :Bienvenu from comment #10)
> Mork does not guarantee oid ordering, but these tables tend to be highly
> highly ordered (undo of a delete is the main way I can see these tables
> getting unordered). bubble sort is actually a reasonable way of sorting this
> these keys.

Hum I don't delete/undo often , I guess that might have triggered it. So does rebuilding the .msf would fix the issue ?
Comment 12 David :Bienvenu 2012-06-13 12:01:58 PDT
(In reply to Ludovic Hirlimann [:Usul] from comment #11)
> (In reply to :aceman from comment #9)

> 
> Hum I don't delete/undo often , I guess that might have triggered it. So
> does rebuilding the .msf would fix the issue ?

No, it would probably make it worse for you. Sorry, I was not being clear. If the uids were guaranteed to be sorted already, then we wouldn't need to sort but the code today always sorts, and quicksort performs badly on already sorted array.
Comment 13 David :Bienvenu 2012-06-13 14:52:01 PDT
Ludo, I've requested try server builds which use a bubble sort to sort the key array, if it's not already sorted. http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/bienvenu@nventure.com-ddcef9c6f19e

I'm a bit skeptical that this is going to help, but yours is not the first perf log I've seen with this on the stack, so let's see if this helps.
Comment 14 David :Bienvenu 2012-06-13 14:55:48 PDT
Created attachment 632895 [details] [diff] [review]
use bubble sort

it's a bit embarrassing to propose a quick sort but what the heck...
Comment 15 David :Bienvenu 2012-06-13 14:56:07 PDT
(In reply to David :Bienvenu from comment #14)

> it's a bit embarrassing to propose a quick sort but what the heck...

er, bubble sort.
Comment 16 Ludovic Hirlimann [:Usul] 2012-06-14 00:04:15 PDT
(In reply to David :Bienvenu from comment #13)

> I'm a bit skeptical that this is going to help, but yours is not the first
> perf log I've seen with this on the stack, so let's see if this helps.

I'm still beach balling but it's way better. Without the patch sample tells me the hangs last more than 5 seconds while with the patch less than 3 when they are longer. SO this definitively helps.
Comment 17 :aceman 2012-06-14 00:10:07 PDT
Usul, is this IMAP specific or can you see the same perf on a local folder? I have a local test folder of 600000 messages that takes some seconds to open. I could try the patch on it if it makes sense. The patch touches database so maybe it is server type neutral...

Also, what is your CPU?
Comment 18 Ludovic Hirlimann [:Usul] 2012-06-14 06:40:49 PDT
(In reply to :aceman from comment #17)
> Usul, is this IMAP specific or can you see the same perf on a local folder?
> I have a local test folder of 600000 messages that takes some seconds to

I only see it on Imap haven't tried on local folders.

> open. I could try the patch on it if it makes sense. The patch touches
> database so maybe it is server type neutral...
> 
> Also, what is your CPU?

2.4 Ghz Core i7
Comment 19 David :Bienvenu 2012-06-14 07:46:44 PDT
(In reply to Ludovic Hirlimann [:Usul] from comment #16)

> 
> I'm still beach balling but it's way better. Without the patch sample tells
> me the hangs last more than 5 seconds while with the patch less than 3 when
> they are longer. SO this definitively helps.

with the patch, what does the sample tell you about where the hang is?
Comment 20 Ludovic Hirlimann [:Usul] 2012-06-14 09:07:29 PDT
Created attachment 633158 [details]
Sample with patch applied

(In reply to David :Bienvenu from comment #19)

> 
> with the patch, what does the sample tell you about where the hang is?

in bubblesort
Comment 21 David :Bienvenu 2012-06-14 09:34:11 PDT
(In reply to Ludovic Hirlimann [:Usul] from comment #20)

> > with the patch, what does the sample tell you about where the hang is?
> 
> in bubblesort

with this patch, now repairing the folder should make the bubble sort fast (or not get called at all). Can you try that with one of your folders? It will make us redownload the headers and sync the offline store, if it's configured for offline use.
Comment 22 Ludovic Hirlimann [:Usul] 2012-06-15 01:47:58 PDT
After a repair I don't see the beachball anymore.
Comment 23 Ludovic Hirlimann [:Usul] 2012-06-15 13:11:21 PDT
david shall we set review flags to get this on trunk ?
Comment 24 David :Bienvenu 2012-06-15 13:20:19 PDT
Created attachment 633644 [details] [diff] [review]
proposed fix for review

slightly scary to use bubble sort, but it seems to be helping ludo.
Comment 25 neil@parkwaycc.co.uk 2012-06-16 12:35:52 PDT
I've done some research and it seems that the optimal algorithm for sorting a stream of nearly sorted numbers is a hybrid binary insertion sort: http://www.pathcom.com/~vadco/binary.html

In particular while the numbers are sorted it simply appends them to the list, but if one arrives out of order then a binary search is used to insert it into the correct position. This is more efficient than a bubble sort because we can memmove the elements instead of swapping them.

I'll attach a patch, but I don't know whether it makes sense to create a try build with the changes now that the folder has been rebuilt.
Comment 26 neil@parkwaycc.co.uk 2012-06-16 12:45:45 PDT
Created attachment 633841 [details] [diff] [review]
Alternative approach
Comment 27 David :Bienvenu 2012-06-18 08:30:27 PDT
Comment on attachment 633841 [details] [diff] [review]
Alternative approach

this patch doesn't apply (bad hunk 1 in nsIMsgKeyArray.idl)

But, yeah, I like this approach quite a bit more.

+    mdb_id lastId = 0;

lastId should probably be called largestId.
Comment 28 Mark Banner (:standard8, afk until Dec) 2012-06-19 06:08:22 PDT
Comment on attachment 633841 [details] [diff] [review]
Alternative approach

I like this, but I'm getting this when trying to apply the patch:

patch: **** malformed patch at line 49: @@ -43,16 +43,22

Hence, cancelling review for now.
Comment 29 Mark Banner (:standard8, afk until Dec) 2012-06-19 06:08:49 PDT
Comment on attachment 633644 [details] [diff] [review]
proposed fix for review

I think as you commented, we'll go with Neil's suggestion.
Comment 30 neil@parkwaycc.co.uk 2012-06-19 06:29:36 PDT
Created attachment 634373 [details] [diff] [review]
Fixed patch

Oops, I must have accidentally hit D while looking at the patch in vi.
I've also renamed the variable in the patch (without recompiling...)
Comment 31 David :Bienvenu 2012-06-19 10:28:35 PDT
If this is still an issue, it might be possible to "repair" db's by doing a move row in the table, in essence, keeping the table itself sorted.
Comment 32 neil@parkwaycc.co.uk 2012-06-21 12:22:50 PDT
Pushed comm-central changeset 28b099d53998.
Comment 33 Jake Maul [:jakem] 2012-08-30 15:54:21 PDT
I know this is resolved now, but I wanted to bring something up regarding sorting algorithms.

On an already-sorted array, insertion sort and bubble sort are equally good (one pass through the data). Quicksort, as pointed out, is generally poor at this... however it can be improved with good choices of the "pivot" element. It will never be as good as insertion sort if the data is completely sorted, but the damage can be mitigated.

On a mostly-sorted array, insertion sort is better than bubble sort. They're both O(n^2), but in practice insertion sort will require far fewer operations.


There is another algorithm with mentioning: Merge Sort. http://en.wikipedia.org/wiki/Merge_sort

Merge sort is O(n*log(n)), like quicksort. However, it does not have the same terrible worst-case performance of quicksort... it is O(n*log(n)) in the best and worst cases. This makes it a fairly predictable algorithm, whereas quicksort can vary wildly.

In some implementations, Merge sort can be O(n) on an already-sorted list.

A good trick for merge and quicksort is to interrupt the algorithm when the individual partitions are smaller than a given size, and switch to an insertion sort... which is commonly faster for small data sets.


Anyway, just thought that worth pointing out... it might be possible to get back to a single algorithm. A bubble sort makes me nervous... it makes me wonder how we might accidentally get into that state and call it on a not-well-sorted array. :)
Comment 34 Wayne Mery (:wsmwk, NI for questions) 2014-08-27 04:25:44 PDT
In bug 872869 Neil comments ...

(In reply to neil@parkwaycc.co.uk from bug 872869 comment #12)
> (In reply to Derrick Moser from bug 872869 comment #10)
> > I had been encountering poor performance problems with Thunderbird for a
> > long time while I was stuck on Fedora 14.  After upgrading to Fedora 18 and
> > discovering the problems still persisted, I started investigating using a
> > debugger and profiler.  This issue, bug 870556, and bug 872497 are the main
> > culprits I found.
> 
> Would bug 872497 have been the underlying cause of bug 764306? If so, then
> we can revert that change and go back to quicksorting the array every time.
Comment 35 WADA 2015-03-11 19:04:38 PDT
Patch of this bug will be backed by bug 872869, so re-open this bug to find solution of this bug after backout.

(1) Patch of this bug :
      Sort ListAllKeys() result in messageKey order using Insertion sort technique.
(2) Although Insertion sort had problem of O(N^2) in worst case, following is sure.
      (2-1) Problem of this bug was resolved by merely "sort ListAllKey() result in messageKey order".
      (2-2) Even if worst case of Insertion sort is O(N^2), Insertion sort is fast if almost array is already sorted in ascending order.
      =>
     .Penalty/loss by Insertion sort <<< gain in this bug's problem

If this bug is for "worst case in Insertion sort"(for example, array in descending order), penalty/loss by Insertion sort should be big like bug 872869 comment #33.
However, penalty/loss by Insertion sort was not so big in this bug. This indicates that "array in this bug" is already near to ascending order although it's never absolute ascending order.
Even though "near to ascending order", why problem of this bug was resolved only by "sort of ListAllKeys result"?
It indicates this bug is also an O(N ^ 2) problem, deosn't it?
   Example of O(N ^ 2) issues : Bug 872869, Bug 872497, Bug 870556

Because "sorted ListAllKeys result" resolved this bug's problem, "sort before using ListAllKeys result" is a simple solution of this bug.
Where should sort be called? Sort by which key? messageKey? "Sort by messageKe only" is sufficient in Virtuaal Folder case?
Comment 36 WADA 2015-03-11 19:24:32 PDT
Phenomenon of this bug :
   (a) Beachball when switch big imap folders.
   (b) Beachball when try to scroll aat thread pane.
(a) = msgDB open + message display in View of Thread pane
(b) = Displayed row(msgDBHdr) change in View of Thread pane.

Displayed order at Thread pane depends on sort column of Thread pane(secondary sort is already supported, but is not stable sort yet).
Is "sort of ListAllKeys result in messageKey value" proper solution?
"Construction of messageKey value array in Thread pane sort order" is better, isn't it?
Comment 37 WADA 2015-03-12 01:30:51 PDT
(In reply to Ludovic Hirlimann [:Usul] from comment #1)
> Created attachment 632608 [details]
> First sample

Following is seen.
Why NS_QuickSort() is called  so many times?
O(N ^ 2) problem in some component(similar issue to Bug 870556).
If it was same problem as Bug 872497 in NS_QuickSort(), 
"why this bug was resolved by Sort ListAllKeys result uusing Insertion sort" may be explained.
   1 * "Sort ListAllKeys result using Insertion sort"  +  N * "NS_QuickSort() on already sorted array in ascending order"
      is far faster than
   N * "NS_QuickSort() on not already sorted array in ascending order"
i.e.
If array is already sorted in ascending order, NS_QuickSort() did do "insertion sort on entire array == Do nothing.
So, "calling NS_QuickSort() many many times" didn't produce big performance issue,
if array is sorted in ascending order before array is passed to NS_QuickSort() many times.

  
 "" /
 us s



> nsTArrayDefaultAllocator>::Compare<nsDefaultComparator<unsigned int, unsigned int> >(void const*, void const*, void*)  (in XUL) + 0  [0x101b1e2f0]  nsTArray.h:1109
>     +                         !                     :     146 NS_QuickSort  (in XUL) + 2589,2581,...  [0x1020b6b0d,0x1020b6b05,...]  nsQuickSort.cpp:78
>     +                         !                     :     39 NS_QuickSort  (in XUL) + 2541,2529,...  [0x1020b6add,0x1020b6ad1,...]  nsQuickSort.cpp:75
>     +                         !                     :     22 NS_QuickSort  (in XUL) + 2516,2505,...  [0x1020b6ac4,0x1020b6ab9,...]  nsQuickSort.cpp:161
>     +                         !                     :     14 NS_QuickSort  (in XUL) + 2490,2485  [0x1020b6aaa,0x1020b6aa5]  nsQuickSort.cpp:163
>     +                         !                     2 nsImapMailFolder::UpdateImapMailboxInfo(nsIImapProtocol*, nsIMailboxSpec*)  (in XUL) + 1958  [0x101e5abd6]  nsImapMailFolder.cpp:2803
>     +                         !                     : 2 nsImapMailFolder::FindKeysToAdd(nsTArray<unsigned int, nsTArrayDefaultAllocator>
Comment 38 WADA 2015-03-12 02:00:02 PDT
Essentially same as or similar to Bug 870556?
   During analysis of Bug 870556, he found Bug 872497 in NS_QuickSort() and resolved NS_QuickSort() issue,
   then, he found Bug 872869 which was produced by patch for this bug...
Comment 39 WADA 2015-03-12 02:10:48 PDT
Sorry, wrong understanding. Multiple NS_QuickSort() in that place is perhaps recursive call by NS_QuickSort() for half block of array.
Comment 40 WADA 2015-03-12 02:22:51 PDT
(In reply to Ludovic Hirlimann [:Usul] from comment #1)
> Created attachment 632608 [details]
> First sample

Following is data for ListAllKeys.
>     +                         !                     599 nsImapMailFolder::UpdateImapMailboxInfo(nsIImapProtocol*, nsIMailboxSpec*)  (in XUL) + 494  [0x101e5a61e]  nsImapMailFolder.cpp:2686
>     +                         !                     : 599 nsMsgDatabase::ListAllKeys(nsIMsgKeyArray*)  (in XUL) + 246  [0x101e2f086]  nsMsgDatabase.cpp:3090
>     +                         !                     :   599 nsMsgKeyArray::Sort()  (in XUL) + 39  [0x101d09e37]  nsTArray.h:1121
>     +                         !                     :     336 NS_QuickSort  (in XUL) + 2512  [0x1020b6ac0]  nsQuickSort.cpp:161
nsMsgKeyArray::Sort() is called, and NS_QuickSort() is used.
It looks for me sorting is done.
However, this patch for this bug was "sort in ascending order in nsMsgDatabase::ListAllKeys using Insertion sort technique".
Why the soltion of  "sort in ascending order in nsMsgDatabase::ListAllKeys" was chosen for this bug?
Comment 41 WADA 2015-03-13 01:12:00 PDT
Oh, following comments were found.
comment #5.
> how big are these folders? The log shows us trying to sort all the uids, which should be quick, unless you have gigantic folders.
comment #7.
> Sorry, I meant how many messages. One thing is that quicksort performs worst when the array is already sorted correctly,
> and it's quite likely that the array is already sorted.
> It's possible that it's always already sorted; I'd have to figure out if Mork guarantees that or not.

This is same as  Bug 872497.
   If swap at a Pivot doesn't occur, our NS_QuickSort() did insertion sort on entire array before fix of Bug 872497.
   If "left half" < Pivot < "right half", swap doesn't occur, so insertion sort is executed on entire array.
   This occurs by "delete messageKey of right half(small messageKey) then undete messageKey of right half".
   If "left half" is sorted in descending order, and if "right half" is sorted in descending order, worst case happens.
   If "left half" is not sorted in ascending order, and if "right half" is not sorted in ascending order, pretty bad case happens even if not worst case.
If standard qsort() was used, this kind of issue couldn't occur.
If entire array is already sorted in ascending order, our NS_QuickSort() was pretty fast as bubble sort is pretty fast, because it does do nothng.
i.e. Our NS_QuickSort() was very torelant with needless/blind NS_QuickSort() call on array of "already sorted in absolute ascending order".
Comment 42 WADA 2015-03-13 21:43:22 PDT
(In reply to neil@parkwaycc.co.uk from comment #30)
> Created attachment 634373 [details] [diff] [review]
> Fixed patch

This kind of job can be achieved by pretty simple code.
   DataArray.Sort() using std::sort().
   Append a few big elements which are not biggest but sufficiently big in the array in random order.
   DataArray.Qsort() using NS_QuickSort() before change by Bug 872497(original NS_QuickSort() ) 
Because of "left half < Pivot < right half", swap doesn't happen, then insertion sort is executed on entire array.
99.9 % of array is already sorted in ascending order, so actual insertion sort is done on biggest 0.1 % part only.
So, InsertAt() like logic is not needed. "Append elements to already sorted array, then call original NS_QuickSort()". is sufficient.

What's bad in Mozilla MailNews/Thunderbird was : 
   Multiple blind calls of NS_QuickSort() without understanding benefit/cost of original NS_QuickSort().
   Even though other standard sort programs are available, called NS_QuickSort() because progrm is named QuickSort.
It's never fault in original NS_QuickSort(), if problem was caused by worst case of NS_QuickSort().

Note You need to log in before you can comment on or make changes to this bug.