Open Bug 1502882 Opened 6 years ago Updated 3 months ago

Horrible performance of Array.prototype.sort(cmp_fn) compared to other JS engines

Categories

(Core :: JavaScript Engine, defect, P3)

60 Branch
defect

Tracking

()

UNCONFIRMED

People

(Reporter: qwertiest, Unassigned)

References

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0

Steps to reproduce:

Run the following fiddle in FF, IE Edge and Chrome:
https://jsfiddle.net/c42vd3m9/2171/


Actual results:

6-9x slower sort in FF than IE:
FF: 18.57ms
Chrome: 2.69ms
IE: 2.34ms

Comparison counts:
FF: always ~120k
Chrome: nearly always 64k
IE: usually in the range of 50-60k (goes as low as 45k and as high as 65k)
I guess most of the time is spent in VM calls for relational string comparison. Changing the comparison function to:
---
var str_cmp_first_charcode = function (a, b) {
    return (a.charCodeAt(0) < b.charCodeAt(0) ? -1 : a.charCodeAt(0) > b.charCodeAt(0) ? 1 : 0);
}
---
makes it much faster.
Note: even after removing `toString` + and using `str_cmp_full` (e.g. comparing numbers instead of strings) tests show quite a slowdown in FF:
Edge: 1.07ms
FF: 2.2ms
Comparison count difference is caused by a difference in sort algorithm: https://v8.dev/blog/array-sort

PS: Timsort is epic on expensive comparisons and doubly epic with partially sorted data.
I don't know if my knowledge is still current (it's been years), but IIRC our merge sort uses an iterative algorithm that merges larger and larger power-of-2 chunks together, copying them back and forth between two arrays. This avoids the overhead of the recursive version, but depending on the size of the array to be sorted there might be an additional copy at the end (if an in-place sort is required).

Timsort at its core actually sounds very similar - IIUC it's just smarter about taking advantage of partially sorted data. If we still use the implementation of merge sort that I remember, Timsort might be a reasonably good fit for the existing code. It's been a long time though, so maybe that's a big if.
Chromium 70 needs 3.5s when comparing numbers with `str_cmp_full` whereas Firefox 63 needs 2.7s (on Ubuntu running in a VM).

And when running on Win10 (also in a VM), Edge is indeed twice as fast for number comparison with `str_cmp_full`. But I guess that's only a side-effect of using QuickSort in Edge and only sorting 21 unique numbers. After changing the test case to use `Math.random() * 20000000` instead of `Math.random() * 20`, Firefox was faster for me when compared to Edge.


Btw. simple number comparison can be further speed up in Firefox by changing the comparison function to:
---
function num_cmp_full_fast(a, b) {
    return a - b;
}
---

because that form is special-cased, so it doesn't actually call the function: https://searchfox.org/mozilla-central/rev/8848b9741fc4ee4e9bc3ae83ea0fc048da39979f/js/src/builtin/Array.cpp#2023-2077  :-)
Well, my primary scenario was sorting with non-trivial string comparison, which was simplified as `str_cmp_full`.

It seems that the performance hit when using custom non-trivial comparator has several causes:
1. Extra comparisons (merge-sort vs tim-sort).
2. Slow string comparison in general (e.g. `str_cmp_first_charcode` vs `str_cmp_first`).
Blocks: 1503116
Priority: -- → P3
Depends on: 1538690
Depends on: 1538692
Depends on: 1291463

Posted some patches which should improve this case:

  • bug 1538690 should make str_cmp_first as fast as str_cmp_first_charcode.
  • bug 1538692 and bug 1291463 should improve str_cmp_full. In my dev-environment, from 13.8ms to 5.5ms; 4.5ms when --spectre-mitigations=off is used. (For comparison: JSC needs 3.2ms and V8 needs 3.3ms in the same environment.)

That's awesome! Thanks for all the work!

It seems the only remaining problem is the sorting algorithm: merge-sort is not really suitable for expensive comparisons - my example used rather primitive comparison function (which happened to be expensive, because of other bugs that are fixed with the patches above), but it was more complex (and expensive) in the real scenario.

(In reply to TheQwertiest from comment #8)

That's awesome! Thanks for all the work!

\o/

It seems the only remaining problem is the sorting algorithm: merge-sort is not really suitable for expensive comparisons - my example used rather primitive comparison function (which happened to be expensive, because of other bugs that are fixed with the patches above), but it was more complex (and expensive) in the real scenario.

There have been attempts to use tim-sort (bug 715181), but in the end a simpler solution (= merge-sort) was chosen. Maybe it makes sense to revisit this decision. For the test case from comment #0, tim-sort seems to be better than merge-sort simply because the input contains many duplicates (~8M comparisons for tim-sort and ~12M comparisons in merge-sort). When fewer duplicates are present (comment #5), tim-sort will also call the comparator function about 12M times. (Which shouldn't be too surprising, because the worst case for both algorithms is O(n * log n). :-)

It seems the only remaining problem is the sorting algorithm: merge-sort is not really suitable for expensive comparisons - my example used rather primitive comparison function (which happened to be expensive, because of other bugs that are fixed with the patches above), but it was more complex (and expensive) in the real scenario.

In defense of merge sort, it is very efficient in terms of comparisons in the general case. Indeed, it approaches the theoretical limit for larger arrays (the best known sorting function in terms of comparison efficiency is a hybrid of Ford-Johnson sort and merge sort with some special case logic).

With that being said, I'm not denying that there are real world examples where you can do much better - e.g. arrays that start out nearly sorted (or sorted in reverse). Detecting such cases adds more overhead to the general case, of course, but I'm sure they're common enough that it's worth the cost. I just wanted to push back a little against the claim that "merge-sort is not really suitable for expensive comparisons" :)

Fair enough, my statement was way too categorical =)
And I didn't test it in SM, so it might be not as useful in this case (e.g. bug 715181).

PS: a really good MIT-licensed tim-sort implementation is available here - https://github.com/tvanslyke/timsort-cpp

It seems to be fixed in ESR68: it actually outperforms Chrome and IE Edge most of the time now!

Updated JS fiddle with working console.log (the repro code is the same): https://jsfiddle.net/3k0jn6Ly/

Severity: normal → S3

Numbers from latest Nightly. And I have increased the number of iterations by 10X :

Nightly: https://share.firefox.dev/3TUlKA1 (2s)
Chrome: https://share.firefox.dev/3ShwYxd (1.4s)

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: