1502882 - Horrible performance of Array.prototype.sort(cmp_fn) compared to other JS engines

Reporter

Description

•

6 years ago

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0

Steps to reproduce:

Run the following fiddle in FF, IE Edge and Chrome:
https://jsfiddle.net/c42vd3m9/2171/


Actual results:

6-9x slower sort in FF than IE:
FF: 18.57ms
Chrome: 2.69ms
IE: 2.34ms

Comparison counts:
FF: always ~120k
Chrome: nearly always 64k
IE: usually in the range of 50-60k (goes as low as 45k and as high as 65k)

André Bargull [:anba]

Comment 1

•

6 years ago

I guess most of the time is spent in VM calls for relational string comparison. Changing the comparison function to:
---
var str_cmp_first_charcode = function (a, b) {
    return (a.charCodeAt(0) < b.charCodeAt(0) ? -1 : a.charCodeAt(0) > b.charCodeAt(0) ? 1 : 0);
}
---
makes it much faster.

TheQwertiest

Reporter

Comment 2

•

6 years ago

Note: even after removing `toString` + and using `str_cmp_full` (e.g. comparing numbers instead of strings) tests show quite a slowdown in FF:
Edge: 1.07ms
FF: 2.2ms

TheQwertiest

Reporter

Comment 3

•

6 years ago

Comparison count difference is caused by a difference in sort algorithm: https://v8.dev/blog/array-sort

PS: Timsort is epic on expensive comparisons and doubly epic with partially sorted data.

Emanuel Hoogeveen [:ehoogeveen]

Comment 4

•

6 years ago

I don't know if my knowledge is still current (it's been years), but IIRC our merge sort uses an iterative algorithm that merges larger and larger power-of-2 chunks together, copying them back and forth between two arrays. This avoids the overhead of the recursive version, but depending on the size of the array to be sorted there might be an additional copy at the end (if an in-place sort is required).

Timsort at its core actually sounds very similar - IIUC it's just smarter about taking advantage of partially sorted data. If we still use the implementation of merge sort that I remember, Timsort might be a reasonably good fit for the existing code. It's been a long time though, so maybe that's a big if.

André Bargull [:anba]

Comment 5

•

6 years ago

Chromium 70 needs 3.5s when comparing numbers with `str_cmp_full` whereas Firefox 63 needs 2.7s (on Ubuntu running in a VM).

And when running on Win10 (also in a VM), Edge is indeed twice as fast for number comparison with `str_cmp_full`. But I guess that's only a side-effect of using QuickSort in Edge and only sorting 21 unique numbers. After changing the test case to use `Math.random() * 20000000` instead of `Math.random() * 20`, Firefox was faster for me when compared to Edge.


Btw. simple number comparison can be further speed up in Firefox by changing the comparison function to:
---
function num_cmp_full_fast(a, b) {
    return a - b;
}
---

because that form is special-cased, so it doesn't actually call the function: https://searchfox.org/mozilla-central/rev/8848b9741fc4ee4e9bc3ae83ea0fc048da39979f/js/src/builtin/Array.cpp#2023-2077  :-)

TheQwertiest

Reporter

Comment 6

•

6 years ago

Well, my primary scenario was sorting with non-trivial string comparison, which was simplified as `str_cmp_full`.

It seems that the performance hit when using custom non-trivial comparator has several causes:
1. Extra comparisons (merge-sort vs tim-sort).
2. Slow string comparison in general (e.g. `str_cmp_first_charcode` vs `str_cmp_first`).

Alice0775 White

Updated

•

6 years ago

Blocks: 1503116

Kannan Vijayan [:djvj]

Updated

•

5 years ago

Priority: -- → P3

André Bargull [:anba]

Updated

•

5 years ago

Depends on: 1538690

André Bargull [:anba]

Updated

•

5 years ago

Depends on: 1538692

André Bargull [:anba]

Updated

•

5 years ago

Depends on: 1291463

André Bargull [:anba]

Comment 7

•

5 years ago

Posted some patches which should improve this case:

bug 1538690 should make str_cmp_first as fast as str_cmp_first_charcode.
bug 1538692 and bug 1291463 should improve str_cmp_full. In my dev-environment, from 13.8ms to 5.5ms; 4.5ms when --spectre-mitigations=off is used. (For comparison: JSC needs 3.2ms and V8 needs 3.3ms in the same environment.)

TheQwertiest

Reporter

Comment 8

•

5 years ago

That's awesome! Thanks for all the work!

It seems the only remaining problem is the sorting algorithm: merge-sort is not really suitable for expensive comparisons - my example used rather primitive comparison function (which happened to be expensive, because of other bugs that are fixed with the patches above), but it was more complex (and expensive) in the real scenario.

André Bargull [:anba]

Comment 9

•

5 years ago

(In reply to TheQwertiest from comment #8)

That's awesome! Thanks for all the work!

\o/

It seems the only remaining problem is the sorting algorithm: merge-sort is not really suitable for expensive comparisons - my example used rather primitive comparison function (which happened to be expensive, because of other bugs that are fixed with the patches above), but it was more complex (and expensive) in the real scenario.

There have been attempts to use tim-sort (bug 715181), but in the end a simpler solution (= merge-sort) was chosen. Maybe it makes sense to revisit this decision. For the test case from comment #0, tim-sort seems to be better than merge-sort simply because the input contains many duplicates (~8M comparisons for tim-sort and ~12M comparisons in merge-sort). When fewer duplicates are present (comment #5), tim-sort will also call the comparator function about 12M times. (Which shouldn't be too surprising, because the worst case for both algorithms is O(n * log n). :-)

Emanuel Hoogeveen [:ehoogeveen]

Comment 10

•

5 years ago

It seems the only remaining problem is the sorting algorithm: merge-sort is not really suitable for expensive comparisons - my example used rather primitive comparison function (which happened to be expensive, because of other bugs that are fixed with the patches above), but it was more complex (and expensive) in the real scenario.

In defense of merge sort, it is very efficient in terms of comparisons in the general case. Indeed, it approaches the theoretical limit for larger arrays (the best known sorting function in terms of comparison efficiency is a hybrid of Ford-Johnson sort and merge sort with some special case logic).

With that being said, I'm not denying that there are real world examples where you can do much better - e.g. arrays that start out nearly sorted (or sorted in reverse). Detecting such cases adds more overhead to the general case, of course, but I'm sure they're common enough that it's worth the cost. I just wanted to push back a little against the claim that "merge-sort is not really suitable for expensive comparisons" :)

TheQwertiest

Reporter

Comment 11

•

5 years ago

Fair enough, my statement was way too categorical =)
And I didn't test it in SM, so it might be not as useful in this case (e.g. bug 715181).

PS: a really good MIT-licensed tim-sort implementation is available here - https://github.com/tvanslyke/timsort-cpp

TheQwertiest

Reporter

Comment 12

•

5 years ago

It seems to be fixed in ESR68: it actually outperforms Chrome and IE Edge most of the time now!

TheQwertiest

Reporter

Comment 13

•

5 years ago

Updated JS fiddle with working console.log (the repro code is the same): https://jsfiddle.net/3k0jn6Ly/

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Mayank Bansal

Comment 14

•

3 months ago

Attached file Reporters Testcase.html — Details

Mayank Bansal

Comment 15

•

3 months ago

Numbers from latest Nightly. And I have increased the number of iterations by 10X :

Nightly: https://share.firefox.dev/3TUlKA1 (2s)
Chrome: https://share.firefox.dev/3ShwYxd (1.4s)