720677 - Use timsort algorithm to implement array_sort

Santiago Gimeno

Reporter

Description

•

14 years ago

User Agent: Mozilla/5.0 (X11; Linux i686; rv:9.0.1) Gecko/20100101 Firefox/9.0.1 Build ID: 20111220165912

Santiago Gimeno

Reporter

Comment 1

•

14 years ago

As suggested by Luke Wagner in https://bugzilla.mozilla.org/show_bug.cgi?id=715265#c25 it would be nice to implement array_sort using the timsort algorithm (http://en.wikipedia.org/wiki/Timsort) and check if there's any performance boost.

Santiago Gimeno

Reporter

Comment 2

•

14 years ago

Using a radix sort for the case of an all integers array would also be nice. See: https://bugzilla.mozilla.org/show_bug.cgi?id=715265#c27

Santiago Gimeno

Reporter

Comment 3

•

13 years ago

Attached patch TimSort implementation (obsolete) — Details — Splinter Review

Timsort implementation based mainly on the python impl with some ideas taken from the OpenJDK impl. It passes the tests but I'm not sure if there's any performance improvement: I've tried the Sunspider tests and from my interpretation the results didn't look conclusive

Assignee: general → santiago.gimeno

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Attachment #598916 - Flags: review+

Santiago Gimeno

Reporter

Updated

•

13 years ago

Attachment #598916 - Flags: review+

Santiago Gimeno

Reporter

Updated

•

13 years ago

Attachment #598916 - Flags: review?(luke)

Luke Wagner [:luke]

Comment 4

•

13 years ago

Whoa, cool! This is really exciting. CC'ing igor because I think he has the most history with the current sort impl and I think would ultimately be the best reviewer. However, before getting to the technical code review, I think we'll need some careful measurement. In particular: - how does the patch compare for the obvious synthetic generated sets (ordered ascending/descending, mostly ordered ascending/descending, random) at 1K, 10K, 100K, 1M, 10M elems? - how does it do for a user-specified comparator (calling a comparator is really slow, so does the algo increase/decrease the total number of calls)? timsort seems to have proven itself several times over so I expect these will all be positive, but something as important and multi-facted as sorting deserves our due diligence :) Lastly, if you are feeling up to it, I think it would be really interesting to instrument sort() and then browse around to collect a GB or so of *actual* sort() inputs :)

Santiago Gimeno

Reporter

Comment 5

•

13 years ago

Attached patch Moved timsort helper functions to detail namespace — Details — Splinter Review

Attachment #598916 - Attachment is obsolete: true

Attachment #598916 - Flags: review?(luke)

Attachment #612903 - Flags: review?(luke)

Santiago Gimeno

Reporter

Comment 6

•

13 years ago

Attached patch Add functionality to instrument the sorting algorithms — Details — Splinter Review

Added a variable to count the number of comparisons made in every sort and the duration of the sort. It should be applied after the previous patch

Attachment #612904 - Flags: review?(luke)

Santiago Gimeno

Reporter

Comment 7

•

13 years ago

Attached file Javascript functions to generate arrays of integers — Details

It can generate random, mostly ascending and mostly descending arrays of integers

Santiago Gimeno

Reporter

Comment 8

•

13 years ago

Attached file Bash script to generate multiple test files — Details

It generates files to test sorting of arrays of integers of size (100, 1K, 10K, 100K, 1M and 10M) and with different distribution (random, mostly ascending and mostly descending)

Santiago Gimeno

Reporter

Comment 9

•

13 years ago

Attached file Bash script to perform the tests generated by generator.sh — Details

It generates files with the result of every test with the format: filetest|n_comparisons|duration(usecs).

Luke Wagner [:luke]

Comment 10

•

13 years ago

Ah, great. Could you post the results in the bug?

Santiago Gimeno

Reporter

Comment 11

•

13 years ago

I have attached several files to test the performance of MergeSort and TimSort - I have tried to modify the code to instrument both MergeSort and Timsort. For that I have added a variable to store the number of comparisons made in every sort and I have also calculated the duration of every sort. Then I have printed the result to stdout. (see patch.720677.instrument). To calculate the duration I've used gettimeofday() function, but I don't know how portable it is. Three files to generate test files and perform the tests. This files should be placed in the js/src directory: - One javascript (generate_random_integers.js) file to generate different types of arrays of integers (random, mostly ascending and mostly descending) of different sizes (100, 1K, 10K, 100K, 1M and 10M). - The generator.sh script generates 100 test files for every kind of test (random: 100, 1K, 10K, 100K, 1M, 10M; mostly_ascending: 100, 1K, 10K, 100K; mostly_descending: 100, 1K, 10K, 100K). These files are stored in their corresponding directories: random/100, random/1K, ... mostly_ascending/100, mostly_ascending/1K, ..., mostly_descending/100, mostly_descending/1K - The sort.sh script performs all the tests and stores the results in files with the format: test_file|n_comparisons|duration(usecs). One file for every kind of test: random/100-result, random/1K-result, ... , mostly_ascending/100-result, ..., mostly_descending/100-result, ... I have performed the tests in my machine and the results are: Random integer array - #Comparisons ┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓ ┃ #Samples ┃ MergeSort ┃ TimSort ┃ %improvement ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100 ┃ 631.06 ┃ 506.73 ┃ + 19.7017716223 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 1K ┃ 9319.41 ┃ 7625.16 ┃ + 18.1797989358 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 10K ┃ 125538.06 ┃ 84237.32 ┃ + 32.8989790029 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100K ┃ 1655529.13 ┃ 1074568.85 ┃ + 35.0921206684 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 1M ┃ 19376353.81 ┃ 13610808.82 ┃ + 29.7555724185 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 10M ┃ 225139408.09 ┃ 136201648.16 ┃ + 39.5034173202 ┃ ┗━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┛ Random integer array - Duration(usec) ┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓ ┃ #Samples ┃ MergeSort ┃ TimSort ┃ %improvement ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100 ┃ 15 ┃ 20.44 ┃ - 36.2666666667 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 1K ┃ 165.5 ┃ 170.97 ┃ - 3.3051359517 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 10K ┃ 2091.5 ┃ 1600.53 ┃ + 23.474539804 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100K ┃ 27346 ┃ 18808.95 ┃ + 31.2186425803 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 1M ┃ 329822.5 ┃ 231724.13 ┃ + 29.7427767966 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 10M ┃ 3791602 ┃ 2123680.14 ┃ + 43.9898982013 ┃ ┗━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┛ Mostly ordered ascending(90%) integer array - #Comparisons ┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓ ┃ #Samples ┃ MergeSort ┃ TimSort ┃ %improvement ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100 ┃ 390.98 ┃ 385.71 ┃ + 1.347895033 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 1K ┃ 6633.55 ┃ 5716.4 ┃ + 13.8259303088 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 10K ┃ 98219.84 ┃ 64701.44 ┃ + 34.1258955421 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100K ┃ 1381956.4 ┃ 882863.48 ┃ + 36.1149541331 ┃ ┗━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┛ Mostly ordered ascending(90%) integer array - Duration(usec) ┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓ ┃ #Samples ┃ MergeSort ┃ TimSort ┃ %improvement ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100 ┃ 10 ┃ 15.4 ┃ - 54 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 1K ┃ 105 ┃ 108.01 ┃ - 2.8666666667 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 10K ┃ 1344 ┃ 987.65 ┃ + 26.5141369048 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100K ┃ 18635 ┃ 12413.72 ┃ + 33.3849208479 ┃ ┗━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┛ Mostly ordered descending(90%) integer array - #Comparisons ┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓ ┃ #Samples ┃ MergeSort ┃ TimSort ┃ %improvement ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100 ┃ 458.55 ┃ 489.51 ┃ - 6.75171737 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 1K ┃ 7951.3 ┃ 5725.15 ┃ + 27.9973086162 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 10K ┃ 113109.02 ┃ 57286.85 ┃ + 49.3525361638 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100K ┃ 1535278.93 ┃ 804653.18 ┃ + 47.5891211508 ┃ ┗━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┛ Mostly ordered descending(90%) integer array - Duration(usec) ┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓ ┃ #Samples ┃ MergeSort ┃ TimSort ┃ %improvement ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100 ┃ 10.5 ┃ 19.3 ┃ - 83.8095238095 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 1K ┃ 123 ┃ 139.56 ┃ - 13.4634146341 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 10K ┃ 1551.5 ┃ 1166.47 ┃ + 24.8166290686 ┃ ┣━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━━┫ ┃ 100K ┃ 20648 ┃ 14468.71 ┃ + 29.9268209996 ┃ ┗━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━━┛ * Note 1: I have not created test sets for mostly_ascending and mostly_descending arrays with sizes of 1M and 10M because it took a VERY long time to create them with my method. * Note 2: I have the test files I've generated and used to get these results but the data is huge. I can try to post it somewhere if you want.

Santiago Gimeno

Reporter

Comment 12

•

13 years ago

BTW, I think one improvement could be made to the patch. The merge vector used by TimSort does not need to be of size 2*n but I think one of size n is enough. This could be used for all code paths except for the case where ValueToStringBuffer is used that needs the vector to be of 2*n size

Igor Bukanov

Updated

•

13 years ago

Attachment #612903 - Flags: review?(luke) → review?(igor)

Igor Bukanov

Updated

•

13 years ago

Attachment #612904 - Flags: review?(luke) → review?(igor)

Igor Bukanov

Updated

•

13 years ago

Attachment #612907 - Attachment mime type: application/x-shellscript → text/plain

Igor Bukanov

Updated

•

13 years ago

Attachment #612905 - Attachment mime type: application/javascript → text/plain

Igor Bukanov

Updated

•

13 years ago

Attachment #612908 - Attachment mime type: application/x-shellscript → text/plain

Igor Bukanov

Comment 13

•

13 years ago

Calling array.sort() on integer array typically is a bug as it converts integers to strings and then uses lexicographical string compare and [1 2 10] is sorted as [1 10 2]. So could you redo the tests using array.sort(function(a, b) { return a - b; }) and separately test arrays of non-numeric strings using array.sort() As for the time function in tests use PRMJ_Now() (grep for it calls under js/src for examples of its use). Also, instead of attaching separated test scripts, please attach a patch that includes all of the files as a new files. Then one can use the review machinery to get a nice UI to look at them. Looking at performance numbers I see time regressions for arrays with up to 1000 elements even if the number of compare calls drops. Does it comes from the complexity of the implementation or the code is simply not optimized yet?

David Mandelin [:dmandelin]

Updated

•

13 years ago

Attachment #612903 - Flags: review?(igor)

David Mandelin [:dmandelin]

Updated

•

13 years ago

Attachment #612904 - Flags: review?(igor)

Sankha Narayan Guria [:sankha]

Comment 14

•

12 years ago

This is now being done as self-hosted method in bug 715181.

Till Schneidereit [:till]

Comment 15

•

12 years ago

(In reply to Sankha Narayan Guria [:sankha93] from comment #14) > This is now being done as self-hosted method in bug 715181. Right. We might, at some point, want to change our native implementation, too, however. @Santiago, I unassigned you, but if you decide to pick this up again, feel free to do so. In that case, be sure to coordinate with Sankha, however, to avoid redundant efforts.

Assignee: santiago.gimeno → general

Status: ASSIGNED → NEW

Santiago Gimeno

Reporter

Comment 16

•

12 years ago

Thanks for the notice. Sorry I couldn't continue implementing this, life got in the way. @Sankha, if you have any questions about the patch, I'll do my best to answer them

Till Schneidereit [:till]

Comment 17

•

12 years ago

(In reply to Santiago Gimeno from comment #16) > Thanks for the notice. Sorry I couldn't continue implementing this, life got > in the way. No worries, that's how it goes, sometimes. Thanks for the efforts you put into this, and feel free to take stabs at other things that pique your interest.

Nobody; OK to take it and work on it

Assignee

Updated

•

11 years ago

Assignee: general → nobody

Terrence Cole [:terrence]

Comment 18

•

10 years ago

As per bug 715181 comment 91, if we ever move forward with this, please verify that we have the fixes described in http://envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/#sec3

peter

Comment 19

•

9 years ago

I looking for Timsort for my test, but not find little easy code. If you can, test to your table list-merge-sort. I hope, its easier, stable and faster. JS code on page: http://mlich.zam.slu.cz/x/sorting.htm function sortXSlevani_2Arr(comp) I testing on my machine about 1.000.000 or 10.000.000 Typical time is n * log(n) to n * log(n) n to n * log(n) (detect sorted) Principial is easy. Array have lots of sorted arrays. You concat it. You need detect positions (begin-end) or start from length 1. Both time is near equal. With detecting is faster for sorted array. 2461350 -> detect -> 246 135 0 -> concat (3+3) 123456 0 -> concat (6+1) 0123456 END 2461350 -> 2 4 6 1 3 5 0 -> concat (1+1) 24 16 35 0 -> concat (2+2) 1246 035 -> concat (4+4) 0123456 END

peter

Comment 20

•

9 years ago

FF 46.01 1.000.000 random sorted 181 ms sortXSlevani_2Arr 244 ms native JS 397 ms Timsort 280 ms sortXSlevani_3Arr

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Will Medina [:willyelm]

Updated

•

1 year ago

Status: NEW → RESOLVED

Closed: 1 year ago

Resolution: --- → INACTIVE

TimSort implementation 13 years ago Santiago Gimeno 25.26 KB, patch		Details \| Diff \| Splinter Review
Moved timsort helper functions to detail namespace 13 years ago Santiago Gimeno 26.91 KB, patch		Details \| Diff \| Splinter Review
Add functionality to instrument the sorting algorithms 13 years ago Santiago Gimeno 13.47 KB, patch		Details \| Diff \| Splinter Review
Javascript functions to generate arrays of integers 13 years ago Santiago Gimeno 1.84 KB, text/plain		Details
Bash script to generate multiple test files 13 years ago Santiago Gimeno 1.24 KB, text/plain		Details
Bash script to perform the tests generated by generator.sh 13 years ago Santiago Gimeno 996 bytes, text/plain		Details