Closed Bug 872869 Opened 7 years ago Closed 5 years ago

nsMsgDatabase::ListAllKeys() very slow with high CPU downloading messages into large folder. Backout of patch for Bug 764306 is mandatory.

Categories

(MailNews Core :: Database, defect, P1, major)

x86
All

Tracking

(thunderbird38+ fixed, thunderbird_esr31+ affected)

RESOLVED FIXED
Thunderbird 39.0
Tracking Status
thunderbird38 + fixed
thunderbird_esr31 + affected

People

(Reporter: derrick_moser, Assigned: aceman)

References

(Blocks 2 open bugs)

Details

(Keywords: perf, regression, Whiteboard: [patchlove][regression by Bug 764306])

Attachments

(1 file, 3 obsolete files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:20.0) Gecko/20100101 Firefox/20.0
Build ID: 20130329043827

Steps to reproduce:

I left Thunderbird open for a while.  It periodically downloaded emails from an IMAP server.  One of my folders on the IMAP server have over 350000 messages.  While slowly fetching emails, it consumed a lot of CPU time (around 90% for long periods of time).  Although this does not freeze up the system, it is an enormous drain on CPU time available for other processes.


Actual results:

I noticed Thunderbird spent a lot of time on the InsertElementSorted() statement in nsMsgDatabase::ListAllKeys().  See: mailnews/db/msgdb/nsMsgDatabase.cpp.  The method appears to degrade to an O(n^2) insertion sort when building a list of all keys.


Expected results:

ListAllKeys() should have operated in O(n) time.

A couple of things are going wrong here:

1) nsMsgDatabase::ListAllKeys() should not bother to sort the keys.  All of the callers appear to pass in empty lists to be filled and don't care if the results are sorted.  The name of the method doesn't even suggest to potential callers that the results will be sorted.  If the caller requires the results to be sorted, they should explicitly do so as it would make the code more readable and easier to refactor.
2) InsertElementSorted() should not exist.  Code should never require a changing array to remain sorted.  This will just lead to O(n^2) performance problems.  A more appropriate data structure should be chosen for existing code that attempts this.
Component: Untriaged → Database
Keywords: perf
Product: Thunderbird → MailNews Core
Maybe InsertElementSorted means that it is inserting into a sorted array. In that case the insert should actually be O(log n).
So then ListAllKeys should be O(n log n) in he worst case. In case the keys happen to be fetches from lowest to highest it should be O(n). Are you sure you see O(n^2) ?

I think having the keys sorted is often needed, e.g. for optimizing the key ranges sent to the server.
> Maybe InsertElementSorted means that it is inserting into a sorted array.

Yes, that is also how I interpret what it means.

> In that case the insert should actually be O(log n).

Consider inserting an item that belongs at the beginning of the list: all existing items must be moved to accommodate the new item.  The code could be fortunate to receive an item that can simply be appended to the list but on average with arbitrary input half of the items will need to be moved.  This produces O(n) behaviour.  Other data structures like a balanced tree could achieve O(log n) performance but not a sorted list.

> So then ListAllKeys should be O(n log n) in he worst case.

With the above observation, our analysis shows O(n^2) worst case performance.  Placing InsertElementSorted inside of a loop essentially creates a potentially O(n^2) performing insertion sort.  These bugs can be hard to spot as the loop might be several function calls away wrapping the code that actually calls InsertElementSorted.  Eliminating InsertElementSorted would be a nice improvement to the code.

> Are you sure you see O(n^2) ?

Absolutely.

> I think having the keys sorted is often needed, e.g. for optimizing the key
> ranges sent to the server.

My two recommendations to avoid O(n^2) behaviour are:
1) Do not sort the list until all items have been added (and don't attempt to add more after it has been sorted).
2) Use a different data structure to hold the items (perhaps a balanced heap) and generate a sorted list while visiting (or removing) items from the data structure.
Attached is my attempt of addressing the issue.  nsMsgDatabase::ListAllKeys() no longer attempts to keep the list sorted as it inserts new keys (this led to poor performance).  Instead, it appends all keys without sorting.  Callers can sort the result if needed.
(In reply to Derrick Moser from comment #3)
> Created attachment 752430 [details] [diff] [review]
> Improve performance of nsMsgDatabase::ListAllKeys()
> 
> Attached is my attempt of addressing the issue. 
> nsMsgDatabase::ListAllKeys() no longer attempts to keep the list sorted as
> it inserts new keys (this led to poor performance).  Instead, it appends all
> keys without sorting.  Callers can sort the result if needed.

I'm highly skeptical that this patch actually compiles. :-)
> I'm highly skeptical that this patch actually compiles. :-)

Oops.  It looks like I didn't have DEBUG defined while compiling.
Attachment #752430 - Attachment is obsolete: true
Comment on attachment 752464 [details] [diff] [review]
Improve performance of nsMsgDatabase::ListAllKeys()

This will need a database and IMAP guru to say where we need the list to be sorted.

Have you actually run with the patch to see if it really improves the performance? What are the numbers now?
Attachment #752464 - Flags: feedback?(neil)
Attachment #752464 - Flags: feedback?(mozilla)
Comment on attachment 752464 [details] [diff] [review]
Improve performance of nsMsgDatabase::ListAllKeys()

Sorry, I can't really speculate on whether the keys need to be sorted.
Attachment #752464 - Flags: feedback?(neil)
> Have you actually run with the patch to see if it really improves the
> performance? What are the numbers now?

I discovered the main source of the poor performance by running Thunderbird under Callgrind (a profiler for Linux) for about 6 hours.  nsMsgDatabase::ListAllKeys() was responsible for 55.96% of the total CPU time used by Thunderbird.  I did the same thing after applying my changes and found nsMsgDatabase::ListAllKeys() was now responsible for only 0.13% of the total CPU time.  The relative proportion of time spend in other functions between the two measurements appears pretty much unchanged.
Derrick, was this behavior not happening before and something changed?  Or, was it always happening and you decided it had just become unbearable? :)
Flags: needinfo?(derrick_moser)
OS: Linux → All
Summary: nsMsgDatabase::ListAllKeys() very slow → nsMsgDatabase::ListAllKeys() very slow with high CPU downloading messages into large folder
I had been encountering poor performance problems with Thunderbird for a long time while I was stuck on Fedora 14.  After upgrading to Fedora 18 and discovering the problems still persisted, I started investigating using a debugger and profiler.  This issue, bug 870556, and bug 872497 are the main culprits I found.
Flags: needinfo?(derrick_moser)
bienvenu, feedback thoughts?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(mozilla)
(In reply to Derrick Moser from comment #10)
> I had been encountering poor performance problems with Thunderbird for a
> long time while I was stuck on Fedora 14.  After upgrading to Fedora 18 and
> discovering the problems still persisted, I started investigating using a
> debugger and profiler.  This issue, bug 870556, and bug 872497 are the main
> culprits I found.

Would bug 872497 have been the underlying cause of bug 764306? If so, then we can revert that change and go back to quicksorting the array every time.
I just wonder whether this doesn't shift the work to some other place. You save a binary search (O(log(n)) at each insert (m times), but then you need to sort the array (even if it is sorted, but you don't know it) at some uses, which is O(n.log(n)) I think. In your case n is much larger than m so the new code may be actually slower, just at different places. Derrick, do you have any arguments why I would be wrong?
Flags: needinfo?(derrick_moser)
Status: NEW → ASSIGNED
If derrick is correct, this probably has major perf implications.

(In reply to :aceman from comment #13)
> I just wonder whether this doesn't shift the work to some other place.... 
> Derrick, do you have any arguments why I would be wrong?

We need someone (perhaps Adam?) to take over the patch, as Derrick recently commented in Bug 870556 "I will be unable to make these updates in the foreseeable future due to a lack of spare time.  You are welcome to take over the patch."

But perhaps Derrick will still chime in??
Severity: normal → major
Status: ASSIGNED → NEW
See Also: → 870556
Whiteboard: [patchlove]
Blocks: 1058026
(In reply to :aceman from comment #13)
> I just wonder whether this doesn't shift the work to some other place. You
> save a binary search (O(log(n)) at each insert (m times), but then you need
> to sort the array (even if it is sorted, but you don't know it) at some
> uses, which is O(n.log(n)) I think. In your case n is much larger than m so
> the new code may be actually slower, just at different places. Derrick, do
> you have any arguments why I would be wrong?

Unclear if Derrick will be replying. Perhaps Neil?
Flags: needinfo?(neil)
Flags: needinfo?(mozilla)
Flags: needinfo?(derrick_moser)
(In reply to aceman from comment #13)
> I just wonder whether this doesn't shift the work to some other place. You
> save a binary search (O(log(n)) at each insert (m times), but then you need
> to sort the array (even if it is sorted, but you don't know it) at some
> uses, which is O(n.log(n)) I think. In your case n is much larger than m so
> the new code may be actually slower, just at different places. Derrick, do
> you have any arguments why I would be wrong?

It depends on the distribution of the incoming keys. I switched from sort() to InsertElementSorted() because for (what I believed was) nearly sorted data this is where InsertElementSorted() shines while sort() was (presumably due to the now fixed bug 872497) exhibiting O(n^2) behaviour.

If it transpires that the data isn't actually anywhere near sorted then InsertElementSorted() quickly becomes really slow because of all the memory move operations and indeed it is now the case exhibiting O(n^2) behaviour.
Flags: needinfo?(neil)
Reverting this seems reasonable since the worst case behavior is bad, especially with big databases.
Attachment #752464 - Flags: feedback?(mozilla) → feedback+
(In reply to David :Bienvenu from comment #17)
> Reverting this seems reasonable since the worst case behavior is bad,
> especially with big databases.

Neil, can you take over the patch, since Derrick won't be back?
Flags: needinfo?(neil)
(In reply to Wayne Mery from comment #18)
> Neil, can you take over the patch, since Derrick won't be back?

Given comment #7, I don't know what you expect me to do here, sorry.
Flags: needinfo?(neil)
> (In reply to Wayne Mery from comment #18)
> > Neil, can you take over the patch, since Derrick won't be back?
> 
> Given comment #7, I don't know what you expect me to do here, sorry.

If comment 2 and comment 17 (reverting) don't answer that, then must we ask bienvenu again, or is this something we can decide ourselves?
Flags: needinfo?(irving)
Flags: needinfo?(acelists)
I don't have a deeper understanding of this than just reading the comment logs on all the bugs referenced in comment 12. Based on that, I'd guess that backing out bug 764306 is the next thing to try - we'd want to get Ludo to test that build on his gigantic bugmail folder to make sure it doesn't bring back the beachballs he was suffering in that bug.
Flags: needinfo?(irving)
I have no experience with IMAP.
Maybe the guy from bug 846123 could see something here.
Flags: needinfo?(acelists)
(In reply to :aceman from comment #22)
> I have no experience with IMAP.
> Maybe the guy from bug 846123 could see something here.

Ah, Alfred.  See comment 21. Do you agree?
Flags: needinfo?(infofrommozilla)
(In reply to Wayne Mery (:wsmwk) from comment #23)
> (In reply to :aceman from comment #22)
> > Maybe the guy from bug 846123 could see something here.

First I can confirm the Bug. I measured some timings: (10k messages: 34ms / 100k: 5900ms / 400k: 266s!)
I noticed, that on all cases exactly 200 messages got appended and the rest got sorted. This leads to Bug 464126.

We fetch blocks of 200 messages starting with high ID numbers and store them in the same order in the DB. For the insertionsort algorithm that is the worst case. For each message, the full array must be moved.

With "mail.imap.hdr_chunk_size"=0 (after reloading data) the timings are much better: (10k messages: 16ms / 100k: 16ms / 400k: 109ms)

> Ah, Alfred.  See comment 21. Do you agree?

Yes, Quicksort has acceptable times. With sorted data: 10k: <1ms / 100k: 63ms / 400k: 265ms
And with "mail.imap.hdr_chunk_size"=200:               10k: <1ms / 100k: 32ms / 400k: 125ms
Flags: needinfo?(infofrommozilla)
How about introducing ListAllKeysUnsorted?
    ListAllKeysUnsorted() is simple enumerator of  m_mdbAllMsgHeadersTable without any sorting/indexing.
   If user needs sorted result, simply calls C++ std::sort sort/std::stable_sort for extracted data, or use Array.sort() in JavaScript.
I don't think sorting by messageKey is not always mandatory.
Forcing sort upon any "getting all messageKeys in a MorkDB" is not mandatory. 

Question about current msgDatabase.ListAllKeys.
   Why InsertElementSorted(outOid.mOid_Id) call is needed for each element?
   Why it's not "get all elements from table, then simply sort" like approach?

FYI.
(a) O(n^2) problem in Mozilla's NS_QuickSort is already resolved by  Derrick Moser.
      Bug 872497 - O(n^2) performance in NS_QuickSort
(b) JavaScript's Array.sort() utilizes QuickSort algorithm, but it's never NS_QuickSort of Mozilla which was initially used for temporarily bypass critical sort() problem in Solaris around 1990, which is not used any more in many modules(already changed back to .original sort() or new std::sort() /std::stable_sort() at many places).
"InsertElementSorted for an array" is designed for small array or for "simply add an element to existing array". It's simply same as "array copy" if already sorted data.
"What is bad" is "use of InsertElementSorted for each data in big array in msgDatabse.ListAllKeys", isn't it?
I've confirmed that the current patch works and compiles with current trunk. The consensus seems to be to backout bug 764396, which this patch does (with some other tweaks).

Ludo, are you able to apply and test this? If not, would a try server run help?
(In reply to Kent James (:rkent) from comment #27)
> The consensus seems to be to backout bug 764396, (snip)

bug 764306, isn't it?
(In reply to WADA from comment #28)
> (In reply to Kent James (:rkent) from comment #27)
> > The consensus seems to be to backout bug 764396, (snip)
> 
> bug 764306, isn't it?

Right, sorry for the typo.
What can we do with this bug? It seems the patch is ready, just needs to be tested by somebody?
We need someone with a large email folder to confirm that this seems to solve whatever the original problem is.
(In reply to Kent James (:rkent) from comment #31)
> We need someone with a large email folder to confirm that this seems to solve whatever the original problem is.

I can do such test, because I wrote Objext Rexx Script to generate BerkleyStore file which contains "N mails of X B/KB/MB per a mail".
I used "N=1000, X=4MB" for 4GB Mbox test. If "N=100,000, X=1KB", "100MB folder with 100,000 mails" can be created.
Needless to say, it takes long and wastes HDD space and it takes pretty long to upload to Gmail IMAP server, and it requires prepartion works.

If you need my Rexx Script, I'm ready to provide it to you any time. 
   Edit script, Write directry name for file geretion in a string variable,
   Type "GEN-Big-Thread.REX AAA 1 100000 1 KB 0 1 1 0 ..." at Command Prompt => "AAA-...-..." file is generated.
(In reply to :aceman from comment #30)
> What can we do with this bug? It seems the patch is ready, just needs to be
> tested by somebody?

I had determined the values from comment #24 by removing attachment 634373 [details] [diff] [review] (Bug 764306)
> First I can confirm the Bug. I measured some timings: (10k messages: 34ms / 100k: 5900ms / 400k: 266s!)

> Yes, Quicksort has acceptable times. With sorted data: 10k: <1ms / 100k: 63ms / 400k: 265ms
(In fact, it's bubble sort)

Attachment 752464 [details] [diff] is pretty much the reverse version of that patch

So a reduction from 266 seconds to 265 milliseconds at 400k messages is quite acceptable.
(In reply to Alfred Peters from comment #33)
> So a reduction from 266 seconds to 265 milliseconds at 400k messages is quite acceptable.

Not "266 milliseconds to 265 milliseconds" or "266 seconds to 265 seconds"?
What a big performance gain!

> > Yes, Quicksort has acceptable times. With sorted data: 10k: <1ms / 100k: 63ms / 400k: 265ms
> (In fact, it's bubble sort)

QuickSort(name of algorithm, or qsort() of C) is never bubble sort. It's O( N * logN ) sort, and it's not so bad although is not fastest. Major objective of QuickSort(algorithm) is less memory requirement etc.
What's bad was in our NS_QuickSort() untill change by Bug 872497 was made.
    Bug 872497 - O(n^2) performance in NS_QuickSort
1. NS_QuickSort() was introduced and used around 1990, to avoid critical issue in qsort() of Solaris.
2. But as stated in Bug 72196 comment #3, it was turned out to be wrong
    -- we were calling qsort with an incorrect comparison function which only broke the Solaris implementation of qsort.
    (Bug 72196 - eliminate |NS_QuickSort|, use C's standard |qsort| instead => WONTFIX, because claim was "return to qsort()")
3. However, using standard sort() of C, using std:sort()/std::stable_sort() of C++, was somehow not done.
    Peoples looks to had been loving the NS_QuickSort() pretty deeply.
4. However, O(N^2) issue in our NS_QuickSort() was exposed by Bug 872497.
    If swap of left side element and right side element doesn't occur, NS_QuickSort() did insertion sort for entire block of left side + right side.
    It's pretty fast if entire array is already sorted in ascending order, as bubble sort is fast, because nothing is moved.
    However, if following, worst case happens.
          If all elements in left side < Pivot < all right side elements, swap doesn't occur, so insertion sort is done on entire array.
          If left side of pivot is in descending order, and if right side of pivot is descending order, worst case occurs.
              Insertion sort is executed on array which is sorted in descending order.
5. Fix of Bug 872497.removed this "bubble sort like behavior".
    So, our current NS_QuickSort() is O( N logN ) even when entire array is already sorted in ascending order,
    and O( N logN ) even when worst case, and average is O( N logN ).

I can't understand reason why NS_QuickSort() is still so deeply loved by Mozilla developers.
"Insertion sort of entire array when entire array is already sorted in ascending order" is already removed from NS_QuickSort().
Why code which relies on "Insertion sort of entire array when entire array is already sorted in ascending order" of NS_QuickSort() is used in many places?
What is real advantage of NS_QuickSort() over qsort(), sort(), std::sort(), std::stable_sort()? 
What is actual advantage of NS_QuickSort() over C++ std::sort() and std::stable_sort()?
(In reply to WADA from comment #34)
> What a big performance gain!

Yes, the improvement is from seconds to milliseconds.
Interestingly, the duration for 100k messages is /only/ ~6 seconds.
I think at around 100k messages the array no longer fits into the CPU cache.

> > > Yes, Quicksort has acceptable times. With sorted data: 10k: <1ms / 100k: 63ms / 400k: 265ms
> > (In fact, it's bubble sort)
> 
>  [sort algorithm]

I have not verified which algorithm is actually used. In Bug 764306 was first spoken of quick sort and then of bubble sort. Any algorithm is better than the worst case of insertion sort, we are using right now.
Comment on attachment 752464 [details] [diff] [review]
Improve performance of nsMsgDatabase::ListAllKeys()

Review of attachment 752464 [details] [diff] [review]:
-----------------------------------------------------------------

> nsMsgDatabase::ListAllKeys() no longer attempts to keep the list sorted as it
> inserts new keys (this led to poor performance).  Instead, it appends all keys
> without sorting.  Callers can sort the result if needed.

Callers of ListAllKeys().
http://mxr.mozilla.org/comm-central/search?string=listallkeys%28
I think sort() is better explicitly called in some cases.
  http://mxr.mozilla.org/comm-central/source/mailnews/imap/src/nsAutoSyncState.cpp#343
  Matching/merge of existent UIDs and known UIDs to find "not auto-sync'ed yet" UIDs.

"Backout of patch for Bug 764306" means that problem of Bug 764306 comes back.
What is new solution of Bug 764306? What was actual cause of problem?
Bug 764306 was problem of (a) "beachballs with high CPU when scroll at thread pane of large folder" (b) "beachballs with high CPU when switch some big folders".

This is "Display at thread pane according to order in Thread Pane View". If so, "sort by key=UID if imap" is useless unless ordered in "Order Received" or "Date(approximately UID order)" at Thread pane, isn't it? "Sort by sort order of Thread Pane" is nedded, isn't it?
Summary: nsMsgDatabase::ListAllKeys() very slow with high CPU downloading messages into large folder → nsMsgDatabase::ListAllKeys() very slow with high CPU downloading messages into large folder. Backout of patch for Bug 764306 is mandatory.
Question on Bug 764306 and patch of Bug 764306 and "closing Bug 764306 as FIXED."
   Was patch for Bug 764306 actually resolved all poblems of Bug 764306?
   If yes, how did you verify that the patch of Bug 764306 resolved all problems of Bug 764306?
(In reply to WADA from comment #36)

> "Backout of patch for Bug 764306" means that problem of Bug 764306 comes
> back.
> What is new solution of Bug 764306? What was actual cause of problem?

The question was answered by Neil in comment 16 - right?
(In reply to Alfred Peters from comment #39)
> (In reply to WADA from comment #36)
> > "Backout of patch for Bug 764306" means that problem of Bug 764306 comes back.
> > What is new solution of Bug 764306? What was actual cause of problem?
> The question was answered by Neil in comment 16 - right?

I think so.
  Cause was perhaps "result of ListAllKeys is not sorted".
  Bug 872497 of NS_QuickSort() was perhaps known(note: it's never problem of sort() nor qsort()).
  He believed that array data is already sorted in ascending order at almost all places in data array.
  So, used InsertElementSorted() instead of sort().
If so, solution is obvious : call std:sort() or std:stable_sort() at caller side of ListAllKeys.

FYI #1.
If mail in local mail folder is uploaded from Thraed Pane View with ascening order in Order Received(or Date) is uploaded  to imap server,
newest one(mail at bottom of thread pane) is uploaded first, aand oldest one (mail at top of thread pane).
So, "smallest UID=newest mail, largest UID=oldest mail" happens by upload  to imap server from local mail folder.
A known technique to get "smallest UID=oldest mail, largest UID=newest mail" :
    Sort by Date in descending order at local mail folder, select mails, then upload to imap folder.

FYI #2.
Sort at thread pane already supports secondary sort(sorted by two columns), but it's not stable sort, because sort() or qsort() or NS_QuickSort() is used in sorting.
Simplest/easiest solution of it exists on earth : use C++'s std:stable_sort().
is 

as
ListAllKeys(keys) is called by nsMsgDatabase::DumpContents() upon msgDB open.
In DumpContents(), it doesn't look that sort is needed, because it simply extracts data of each msgDBHdr and  gathers threadid.
Cause of  Bug 764306 was actually "result of ListAllKeys is not sorted in messageKey(==UID if imap) order"?
Cause is that user of msgDBHdr array(who shows mails in Thread Pane) doesn't sort msgDBHdr array as he needs before accessing entire msgDBHdr array, isn't it? "Sort of result of ListAllKeys in messageKey(==UID if imap) order" is sufficient for Bug 764306?
Or Bug 764306 is another O(N^2) issue in worst case(or not imaginary case, for example, not-sorted-in-ascending-order)?
Why patch for this bug is still not landed yet? What's wrong in the patch?
I believe that "gain by (A) is far greater than loss by (B)" is apparently obvious.
   (A) By Backout of patch for Bug 764306, this bug is automatically resolved.
   (B) By backout of patch for Bug 764306, Bug 764306 comes back.
(In reply to WADA from comment #43)
> Why patch for this bug is still not landed yet? What's wrong in the patch?
> I believe that "gain by (A) is far greater than loss by (B)" is apparently
> obvious.
>    (A) By Backout of patch for Bug 764306, this bug is automatically
> resolved.
>    (B) By backout of patch for Bug 764306, Bug 764306 comes back.

Well technically nobody has actually formally requested this in a review request. I've been watching the discussion to see if there is a consensus. I'm not seeing any objections to landing the backout. WADA has stated a clear preference. Any objections to moving forward with the backout?
About "Sort of ListAllKeys result".
If Virtual Folder(Search Folder, Unified Folder), I think "result of ListAllKeys of multiple folders of multiple accounts" is merged.
I believe std:stable_sort() is better if "sort by messageKey" is needed, because duplicate messageKey occurs.
  when virtual folder, logically, unique-key = msgFolder(==account + Mbox) + messageKey .
  order of account, mbox, is better kept for proper threading in Virtual Folder.
Target Milestone: --- → Thunderbird 38.0
Keywords: regression
Whiteboard: [patchlove] → [patchlove][regression by Bug 764306]
Priority: -- → P1
Comment on attachment 752464 [details] [diff] [review]
Improve performance of nsMsgDatabase::ListAllKeys()

[Approval Request Comment]
Regression caused by (bug #): 
User impact if declined: 
Testing completed (on c-c, etc.): 
Risk to taking this patch (and alternatives if risky):

[Approval Request Comment]
Regression caused by (bug #): 
User impact if declined: 
Testing completed (on c-c, etc.): 
Risk to taking this patch (and alternatives if risky):

[Approval Request Comment]
Regression caused by (bug #): 
User impact if declined: 
Testing completed (on c-c, etc.): 
Risk to taking this patch (and alternatives if risky):

[Approval Request Comment]
Regression caused by (bug #): 
User impact if declined: 
Testing completed (on c-c, etc.): 
Risk to taking this patch (and alternatives if risky):

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]:
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: 
[String/UUID change made/needed]:
Attachment #752464 - Flags: approval-comm-release?
Attachment #752464 - Flags: approval-comm-esr31?
Attachment #752464 - Flags: approval-comm-beta?
Attachment #752464 - Flags: approval-comm-aurora?
(In reply to Kent James (:rkent) from comment #44)
> Well technically nobody has actually formally requested this in a review
> request. I've been watching the discussion to see if there is a consensus.
> I'm not seeing any objections to landing the backout. WADA has stated a
> clear preference. Any objections to moving forward with the backout?

Maybe it should be asked in bug 764306 too?
Has anybody tried to repair/compact the folder? See bug 764306 comment 22. It is necessary for patch in bug 764306 to work right. Only after that the initial arrays of IDs is sorted and the insertion works right (actually changes to appends which are cheap).

(In reply to Alfred Peters from comment #24)
> We fetch blocks of 200 messages starting with high ID numbers and store them
> in the same order in the DB. For the insertionsort algorithm that is the
> worst case. For each message, the full array must be moved.
So in that case, maybe we could iterate the database in nsMsgDatabase::ListAllKeys in reverse order, but I am not sure it is possible and/or performant.

Does this mean even if the folder is repaired (list of keys is sorted in DB), then subsequent fetches from server break the ordering because they fetch from max UID to minimum?
Suppose you have UIDs in the DB as 0-1000, and there are 1000 new msgs on the server, does the result after downloading become 0-1000,2000-1801,1800-1601,1600-1401,1400-1201,1200-1001 (thus 0-1000,2000-1001), which is simply not sorted lowest to highest?

If we do not revert bug 764306 verbatim, but actually push the patch here (which is only partial revert of 764306), it is quite possible that bug 764306 will not return back. Because it was caused by a sort that was then optimized by the patch there. But it is not sure if that sort is actually needed for it. So the patch here removes any sort (whether quicksort or insertElementsorted). We "just" need to see all callers of ListAllKeys whether they depend on the array being sorted (maybe that is the hardest part:)).
Comment on attachment 752464 [details] [diff] [review]
Improve performance of nsMsgDatabase::ListAllKeys()

Review of attachment 752464 [details] [diff] [review]:
-----------------------------------------------------------------

So I have went through all the callers and here are my opinions:

    mailnews/base/src/nsMsgFolderCompactor.cpp
        line 106 -- nsresult rv = db->ListAllKeys(m_keyArray); 

In a POP3 mbox file, the keys are often the same as offset of the message in the file or (previous key+1). Thus they are an increasing counter as messages come in. They never decrease, only at compact time (the file here). So they should be ordered in the database (in increasing order). So we probably do not need to sort here. If, then better would be the msgHdr.offset so that we read the mbox file sequentially, without seeking.
IF we compact the mbox backend for IMAP or News the keys can be unsorted in the DB, but we can't fix that. Again, sorting by offsets would be useful.
(This file bypasses the methods of nsMsgKeyArray and directly accesses m_key. I should look at that later.)

    mailnews/imap/src/nsAutoSyncState.cpp
        line 343 -- rv = database->ListAllKeys(keys); 

I am not sure here. There is some iteration over the array (mExistingHeadersQ) with the lastIdx pointer which is not atomic (goes in batches across the calls to the function). So for safety we should sort the array to keep current state.

    mailnews/imap/src/nsImapMailFolder.cpp
        line 2677 -- rv = mDatabase->ListAllKeys(keys);

Covered in the patch already.
 
    mailnews/local/src/nsPop3IncomingServer.cpp
        line 169 -- rv = subFolderDB->ListAllKeys(keys); 

Tries to copy some messages to a deferred to account. For POP3, keys should be ordered already (they represent the order of receiving or coping).

    mailnews/news/src/nsNNTPArticleList.cpp
        line 49 -- rv = m_newsDB->ListAllKeys(keys); 

There is code further down the file where the keys (as m_idsInDB) are iterated over, individual indexes are fetched in a contiguous manner (with m_dbIndex) and it looks like AddArticleKey() runs through the array while the keys are less than the new added "key" and deletes them. I think that requires sorted elements in m_idsInDB.

I'll update the patch if the original author is away.

::: mailnews/db/msgdb/src/nsMsgDatabase.cpp
@@ +5074,5 @@
>    if (!keys)
>      return NS_ERROR_OUT_OF_MEMORY;
>    nsresult rv = ListAllKeys(keys);
> +  // Sort to make debugging easier
> +  keys->Sort();

I think we should not sort here for easier debugging of bugs like this one:) Let us see the real order of the elements.

::: mailnews/imap/src/nsImapMailFolder.cpp
@@ +2752,5 @@
>    {
>      uint32_t boxFlags;
>      aSpec->GetBox_flags(&boxFlags);
> +    // FindKeysToDelete and FindKeysToAdd require sorted lists
> +    existingKeys.Sort();

Yes, this seems correct, we sorted existingKeys before (but relying on ListAllKeys to do it if ListAllOfflineDeletes didn't change the existingKeys array).
Comment on attachment 752464 [details] [diff] [review]
Improve performance of nsMsgDatabase::ListAllKeys()

I think you also said this patch is not correct yet, so we can't land it anywhere. I'll prepare a new patch.
Attachment #752464 - Flags: approval-comm-release?
Attachment #752464 - Flags: approval-comm-esr31?
Attachment #752464 - Flags: approval-comm-beta?
Attachment #752464 - Flags: approval-comm-aurora?
FYI.
When new msgDBHdr for newly arrived mail is added, messageKey is increased.
However, if IMAP and IMAP delete model="move to trash" or "remove it imeediately"(i.e. not "Just mark it as deleted"), msgDBHdr is removed when \Deleted flag is stored. So, if "undelete of mail which was flagged as \Deleted" happens, "order of msgHdr in msgDatabase" != "order of messageKey==UID if IMAP" happens.
(In reply to :aceman from comment #48)
> Has anybody tried to repair/compact the folder? See bug 764306 comment 22.
> It is necessary for patch in bug 764306 to work right. 

I can not tell if that changes anything. But I don't think so. With 'Repair Folder' at last the Headers are downloaded again. And that is performed in the Bug 464126 manner.

> (In reply to Alfred Peters from comment #24)
> > We fetch blocks of 200 messages starting with high ID numbers and store them
> > in the same order in the DB. For the insertionsort algorithm that is the
> > worst case. For each message, the full array must be moved.
> So in that case, maybe we could iterate the database in
> nsMsgDatabase::ListAllKeys in reverse order, but I am not sure it is
> possible and/or performant.

If a user has accessed their e-mails continuously, they are sorted in ascending order.
So you can't predict a sort order.

I think it would be useful to store and hold the database already sorted.

> Does this mean even if the folder is repaired (list of keys is sorted in
> DB), then subsequent fetches from server break the ordering because they
> fetch from max UID to minimum?

Yes.

> Suppose you have UIDs in the DB as 0-1000, and there are 1000 new msgs on
> the server, does the result after downloading become
> 0-1000,2000-1801,1800-1601,1600-1401,1400-1201,1200-1001 (thus

Not even that. There is no rule, but in most cases, the servers deliver the messages in ascending order. In that case the order would be:

0-1000,1801-2000,1601-1800,1401-1600,1201-1400,1001-1200
 
>   So the patch here removes any sort (whether quicksort or
> insertElementsorted). We "just" need to see all callers of ListAllKeys
> whether they depend on the array being sorted (maybe that is the hardest

So you want to sort the array only when it is needed? But that could ultimately mean that it will be sorted multiple times.
(In reply to Alfred Peters from comment #52)
> >   So the patch here removes any sort (whether quicksort or
> > insertElementsorted). We "just" need to see all callers of ListAllKeys
> > whether they depend on the array being sorted (maybe that is the hardest
> 
> So you want to sort the array only when it is needed? But that could
> ultimately mean that it will be sorted multiple times.

No, it should not mean that. Usually the pattern is like this:
ListAllKeys(array1) // returns sorted list
array2 = array1->m_keys
...
do all the logic on array2
...

The patch here unrolls that to this:
ListAllKeys(array1) // returns unsorted list
array1.Sort();
array2 = array1->m_keys
...
do all the logic on array2
...

But only for places that actually need the array to be sorted. In those cases the number of sorts should not increase whatever we do. In case the caller does not need a sorted list, we save one sort.
Attached patch WIP patch v2 (obsolete) — Splinter Review
OK, let's try this.
Aryx, please push to try server, all tests.
Attachment #752464 - Attachment is obsolete: true
Flags: needinfo?(archaeopteryx)
(In reply to Alfred Peters from comment #52)
> Not even that. There is no rule, but in most cases, the servers deliver the messages in ascending order.

But, it's not guranteed.

> So you want to sort the array only when it is needed? But that could ultimately mean that it will be sorted multiple times.

If so, "Sorted attribute" may be used.
If local folder, "order of msgDBHdr" == "order of messageKey in ascending order" is guaranteed by Tb's desugn/implementation/code.
   This was reason why O(n^2) problem was not exposed for long time, even if bad logic, even if many/needless NS_QuickSort() calls.
If imap folder, it's not guaranteed by server nor by Tb(Tb removes msgDBHdr when \Deleted is stored, and adds when \Deleted is removed).
When "uid fetch 1:* Flags", order of uid is not gurnteed, even if all known servers returns in ascending order of UID.
When "uid fetch x:y body[ ... ]" is issued, order of UID in response is not guaranteed. At least Gmail IMAP doesn't always keep order.
  ListAllKeys(keyArray,"Dump" mode or "Sorted" mode) :
    If local  folder, do Dump mode, then keyArray.sorted=true.
    If imap folder, do Dump mode. if Dump option, keyArray.sorted=false. if Sorted option, do std::sort(), then keyArray.sorted=true.
  Caller side :
    If keyArray.sorted=false but he wants sorted result, do std::sort(), keyArray.sorted=true.
    If he adds element, if order is kept, keep keyArray.sorted=true. if order is not guranteed, keyArray.sorted=false.
(In reply to WADA from comment #56)
>   ListAllKeys(keyArray,"Dump" mode or "Sorted" mode) :
>     If local  folder, do Dump mode, then keyArray.sorted=true.
>     If imap folder, do Dump mode. if Dump option, keyArray.sorted=false. if
> Sorted option, do std::sort(), then keyArray.sorted=true.
>   Caller side :
>     If keyArray.sorted=false but he wants sorted result, do std::sort(),
> keyArray.sorted=true.
>     If he adds element, if order is kept, keep keyArray.sorted=true. if
> order is not guranteed, keyArray.sorted=false.

Yes, this would be a nice feature and I have actually partially implemented it in the patch (see the m_sorted member). It detects if .sort() was called on the array and the an .appendElement() was called on it with some new key. If m_sorted is true it could automatically convert to insertElementSorted and guarantee that the array is still sorted. But I think I have not found any user of such functionality.
As I said, many of the callers do:
ListAllKeys(array1) // returns unsorted list
array1.Sort();
array2 = array1->m_keys

And then we can't track what happens to array2 as it no longer is a nsMsgKeyArray.
Comment on attachment 8576940 [details] [diff] [review]
WIP patch v2

So I see no obvious problems in the try run. The test_nntpPasswordFailure.js failure is on TB-trunk too.
Attachment #8576940 - Flags: review?(neil)
Attachment #8576940 - Flags: review?(Pidgeot18)
(In reply to :aceman from comment #58)
> WIP patch v2

> void sort();
Is this C or C++ standard sort? If so, I think std::sort() of C++ is better.
(In reply to :aceman from comment #57)
> But I think I have not found any user of such functionality.
> As I said, many of the callers do:
> ListAllKeys(array1) // returns unsorted list
> array1.Sort();
> array2 = array1->m_keys

If local mail folder(already sorted in absolute ascending order), and if NS_QuickSort() before fix of Bug 872497, array1.Sort() did do nothing because action by our NS_QuickSort() was same as bubble sort.
However, when array is already sorted in absolute ascending order, our NS_QuickSort() after fix of Bug 872497 does do ordinal QuickSort as all other standard sort program does do his own algorithm. So, it takes O( N logN ) time as standard sort programs takes.
I believe sort call on array of "already sorted in absolute ascending order" is better avoided if "already sorted in absolute ascending order" is already known.

Or, change array1.sort() to :
    Check order of elements in array.
    If array is already sorted in absolute ascending order, call NS_QuickSort() before fix of Bug 872497, else call std::sort().
    If array is already sorted in absolute ascending order, our NS_QuickSort() before fix of Bug 872497 is fastest sort program on earth :-)
Simple JavaScript code to check run time of ListAllKeys of folder selected at folder pane.
How to use : Install "Custom Buttons" addon, add ToolBar button, paste Script code.
Note: If Virtual Folder, retuns 0 keys, because ListAllKeys of Virtual Folder is not  supported.
> var MyConsole = Components.classes["@mozilla.org/consoleservice;1"]
>                 .getService(Components.interfaces.nsIConsoleService);
> var gFolderDisplay   = window["gFolderDisplay"] ;
> var msgFolder        = gFolderDisplay["displayedFolder"] ;
> var selectedMessages = gFolderDisplay["selectedMessages"] ;
> var keyArray = Components.classes["@mozilla.org/messenger/msgkeyarray;1"]
>                .createInstance( Components.interfaces.nsIMsgKeyArray ) ;
> 
> var msg=new Array();
> var NumRun = 3;
> for(var rr=0;rr<NumRun;rr++)
> {
>      var Data = {};
>      Data[ "C_0" ] = new Date(); msgFolder.msgDatabase.ListAllKeys(keyArray) ;
>      Data[ "C_1" ] = new Date();
>      msg[ msg.length ] = "Run = " + rr + ", NumItem = " + keyArray.length
>           + ", ListAllKeys = " + ( Data[ "C_1" ] - Data[ "C_0" ] ) ; 
> }
> MyConsole.logStringMessage( msg.join("\n") );
> return;
FYI.
mozilla.general of news.mozill.org has 93420 articles.
After subscribe, download all headers, ListAllKeys() of Tb 31.5.0 took 10 milliseconds to 15 milliseconds.
i.e. 
When array is already sorted in absolute ascending order, "Insertion sort technique used by patch for Bug 764306" works pretty well.
(note: there is no need to do insertion sort in this case, because array is already sorted in absolute ascending order.)
If only a few elements are not placed in sorted order, it works sufficiently fast too because it's insertion sort.

Problem of O(N^2) is exposed when reversed order exists in array and array size is not small, in both "NS_QuickSort) before patch for Bug 872497" and "current patch for for Bug 764306".
I think "Why the current patch for Bug 764306 resolved problem of Bug 764306" is:
   Time required for 1 * "insertion sort by patch for entire array  which contains partial or big reversed order"
                           + N * "NS_QuickSort() of array which is already sorted in ascending order"
   <<<
   Time required for Several * "insertion sort by NS_QuickSort() for some part of array which contains partial or big reversed order"
This bug is for that cost of 1 * "insertion sort of entire array by current patch of Bug 764306" is too high when array is not sorted yet.

After this bug is fixed, array1.Sort(); of following is needed for problem of Bug 764306.
> ListAllKeys(array1) // returns unsorted list
> array1.Sort();
> array2 = array1->m_keys
aceman, where will the "array1.Sort()" be executed for problem of Bug 764306?
If mozilla.general of news.mozill.org(93420 articles) is subscribed and all headers are download, following can be observed.
    Changing Threaded View mode/Non threaaded view mode take long. (This is known issue)
    And, (a) time required for "threaded view -> non threaded view" is greater than (b) time required for "non threaded view -> threaded view".
(a) is similar to "Beachball when folder is opened and shown at thread pane" of Bug 764306, although critical slowness of Bug 764306 is not observed.
Why (a) is slower than (b)? Is this bug relevant to it? Is there any O(N^2) issue in it?
With mozilla.general of news.mozill.org(93420 articles, subscribed and all headers are download), "Beachball when folder is opened and shown at thread pane, with unthreaded view mode" of Bug 764306 is always observed upon each news group open even though Tb 31.5.0 in which Bug 764306 is already resolved.
This is different problem from Bug 764306? Or due to this bug?
Or issue of "heavy workload in retrieving id of many news articles"?(NNTP unique issue, known issue)
Comment on attachment 8576940 [details] [diff] [review]
WIP patch v2

Review of attachment 8576940 [details] [diff] [review]:
-----------------------------------------------------------------

I've not done a thorough review, and I don't really have the time to do so in the near future, so I'll bounce the review to rkent.

::: mailnews/base/util/nsMsgKeyArray.cpp
@@ +51,5 @@
>  NS_IMETHODIMP nsMsgKeyArray::AppendElement(nsMsgKey aKey)
>  {
> +#ifdef DEBUG
> +  if (m_sorted && m_keys.Length() > 0 && aKey <= m_keys[m_keys.Length()-1])
> +    NS_WARNING("Inserting a new key at wrong position in a sorted key list!");

I would upgrade this from a warning to an assertion, personally.

::: mailnews/db/msgdb/src/nsMsgDatabase.cpp
@@ +3158,5 @@
>    return NS_OK;
>  }
>  
> +// The returned keys are returned in the order as stored in the database.
> +// The caller should sort them if it needs to.

This should be listed in the IDL documentation for nsIMsgDatabase::ListAllKeys, not a header comment in the implementation function.
Attachment #8576940 - Flags: review?(Pidgeot18) → review?(rkent)
(In reply to Joshua Cranmer [:jcranmer] from comment #65)
> Comment on attachment 8576940 [details] [diff] [review]
> WIP patch v2
> 
> Review of attachment 8576940 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I've not done a thorough review, and I don't really have the time to do so
> in the near future, so I'll bounce the review to rkent.
We already have Neil. But I chose you for the one place that touches News.

> ::: mailnews/base/util/nsMsgKeyArray.cpp
> @@ +51,5 @@
> >  NS_IMETHODIMP nsMsgKeyArray::AppendElement(nsMsgKey aKey)
> >  {
> > +#ifdef DEBUG
> > +  if (m_sorted && m_keys.Length() > 0 && aKey <= m_keys[m_keys.Length()-1])
> > +    NS_WARNING("Inserting a new key at wrong position in a sorted key list!");
> 
> I would upgrade this from a warning to an assertion, personally.
Yes, that would be possible (I even run tests with an assertion there to spot any problem).
But it is not sure if the whole #ifdef DEBUG blocks actually stay in the patch.

> ::: mailnews/db/msgdb/src/nsMsgDatabase.cpp
> @@ +3158,5 @@
> >    return NS_OK;
> >  }
> >  
> > +// The returned keys are returned in the order as stored in the database.
> > +// The caller should sort them if it needs to.
> 
> This should be listed in the IDL documentation for
> nsIMsgDatabase::ListAllKeys, not a header comment in the implementation
> function.
Yes, thanks.
(In reply to :aceman from comment #66)
> (In reply to Joshua Cranmer [:jcranmer] from comment #65)
> > Comment on attachment 8576940 [details] [diff] [review]
> > WIP patch v2
> > 
> > Review of attachment 8576940 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > I've not done a thorough review, and I don't really have the time to do so
> > in the near future, so I'll bounce the review to rkent.
> We already have Neil. But I chose you for the one place that touches News.

For what it's worth, the news portion was the only part I looked at in depth, mostly to assure myself that it really did need to be sorted and that sorting the key array instead of db list array was correct.
Comment on attachment 8576940 [details] [diff] [review]
WIP patch v2

Review of attachment 8576940 [details] [diff] [review]:
-----------------------------------------------------------------

I focused my review on trying to get a patch that we could land in TB 38. Time is getting short, but I think we can move forward. Better to land and test a patch on TB 39 that can also land in TB 38 rather than try to have a separate patch for TB 38.

I would prefer if you correct this patch ASAP rather than wait for more reviews. Include my comments as well as jcranmer's.

::: mailnews/base/public/nsIMsgKeyArray.idl
@@ +11,5 @@
>  
>  #include "nsISupports.idl"
>  #include "MailNewsTypes2.idl"
>  
> +[scriptable, uuid(e8292cf9-fc1f-4318-879c-27ec8addaca4)]

This is tracking-tb38 which means that we don't want an interface change. So leave the interface alone.

@@ -36,5 @@
> -   * @param key to insert into the array.
> -   */
> -  void insertElementSorted(in nsMsgKey aMsgKey);
> -
> -  /**

Don't remove the insertElementSorted, instead just return NS_ERROR_NOT_IMPLEMENTED in the implementation. We need to leave this for the interface freeze.

::: mailnews/base/util/nsMsgKeyArray.cpp
@@ -50,5 @@
>  
> -NS_IMETHODIMP nsMsgKeyArray::InsertElementSorted(nsMsgKey aKey)
> -{
> -  m_keys.InsertElementSorted(aKey);
> -  return NS_OK;

replace this with return NS_ERROR_NOT_IMPLEMENTED so that we can land on tb-38.

::: mailnews/imap/src/nsImapMailFolder.cpp
@@ +2779,4 @@
>  
>    }
>    else if (!flagState /*&& !NET_IsOffline() */) // if there are no messages on the server
>      keysToDelete = existingKeys;

Why do you believe that keysToDelete does not need to be sorted?
Attachment #8576940 - Flags: review?(rkent) → review-
Assignee: nobody → acelists
Status: NEW → ASSIGNED
Target Milestone: Thunderbird 38.0 → ---
(In reply to Kent James (:rkent) from comment #68)
> >    }
> >    else if (!flagState /*&& !NET_IsOffline() */) // if there are no messages on the server
> >      keysToDelete = existingKeys;
> 
> Why do you believe that keysToDelete does not need to be sorted?
The places it is used in (like MsgGetHeadersFromKeys() and mDatabase->DeleteMessages()) do not seem to require any particular ordering.
Attached patch patch v3Splinter Review
Attachment #8576940 - Attachment is obsolete: true
Attachment #8576940 - Flags: review?(neil)
Attachment #8584122 - Flags: review?(rkent)
Comment on attachment 8584122 [details] [diff] [review]
patch v3

Review of attachment 8584122 [details] [diff] [review]:
-----------------------------------------------------------------

OK looks good to me. I'm going to land this, and try to push soon into aurora. There's a bit of risk to this, so I'd like to see it on aurora for a few days before we do a beta with it.
Attachment #8584122 - Flags: review?(rkent)
Attachment #8584122 - Flags: review+
Attachment #8584122 - Flags: approval-comm-aurora+
https://hg.mozilla.org/comm-central/rev/2806c04400e7
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 39.0
You need to log in before you can comment on or make changes to this bug.