Closed Bug 1242196 Opened 4 years ago Closed 4 years ago

Specialize %TypedArray%.prototype.sort for default comparators

Categories

(Core :: JavaScript: Standard Library, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: mrrrgn, Assigned: dannas, Mentored)

References

Details

Attachments

(6 files, 6 obsolete files)

The current TypedArray.sort is fast, but probably less so than an in place Radix sort would be. We were prevented from using Radix because it does not work with comparators, however, now that we have something in place we could simply use Radix by default and fall back to the current sort whenever a custom comparator is supplied.

The trick here will be implementing Radix so that the comparator logic outlined in the specification: https://tc39.github.io/ecma262/#sec-typedarraysort is baked in.
Depends on: 1121937
Mentor: winter2718
Hi Morgan, can you add me as assignee please? I've already been working on bug 1239710, and this seems like the logical continuation of that.

Starting to hack on a new project is way more fun, when I have a mentor to turn to  with questions. I'll try to not over-exploit that privilege. Thanks for the great help I've received from you so far! :)
BAM.  (Since I happened to be awake and saw this first.  :-) )
Assignee: nobody → dannas
Attached file radix_sort.js
A test script for prototyping the sort algorithm outside spidermonkey, as always filled with lots of directed-to-myself comments.

I can write a radix sort that is 2x faster than our quicksort. BUT, I had to use a typed array for the |aux| variable or it got 20x slower.

I suspect that the nested assignment expression aux[counts[topmost]++] = val; fools the optimizer into believing it's not a continous int array.

Will be interesting to see what happens when I write put the function inside builtin/Sorting.js. There I can't use typed arrays for sure. Will be fun to investigate.
(In reply to Daniel Näslund from comment #3)
> I can write a radix sort that is 2x faster than our quicksort. BUT, I had to
> use a typed array for the |aux| variable or it got 20x slower.

Would you mind filing a bug for this (with a small testcase demonstrating the problem in the shell), and requesting needinfo from me? :)
Note that you should be able to use Uint8Array in self-hosted code. It won't have any methods on its prototype, but that shouldn't matter for your use case, right? I see that you're using a Uint16Array - you can make that available to self-hosted code by modifying the code here: http://mxr.mozilla.org/mozilla-central/source/js/src/vm/GlobalObject.cpp?rev=56c5187a2c64#434
(In reply to Jan de Mooij [:jandem] from comment #4)
> (In reply to Daniel Näslund from comment #3)
> > I can write a radix sort that is 2x faster than our quicksort. BUT, I had to
> > use a typed array for the |aux| variable or it got 20x slower.
> 
> Would you mind filing a bug for this (with a small testcase demonstrating
> the problem in the shell), and requesting needinfo from me? :)

Sorry for the noise. This loop appended a NaN to the end of my counts array (which was of size R+1). When I fixed the loop condition to be r < R, the performance with a regular Array matched the typed array.

        // Transform counts to indices.
        for (var r = 0; r < R + 1; r++) {
            counts[r+1] += counts[r];
        }

So nothing needs to be done here. Thanks for the quick feedback and sorry for taking up your time. 

I'll add this event to my ongoing log of JavaScript newbie gotchas. :) I wonder if I can get SpiderMonkey to report to me when it fails to treat an array as an continous array with int keys. Will have to read up on that.
(In reply to Till Schneidereit [:till] from comment #5)
> Note that you should be able to use Uint8Array in self-hosted code. It won't
> have any methods on its prototype, but that shouldn't matter for your use
> case, right? I see that you're using a Uint16Array - you can make that
> available to self-hosted code by modifying the code here:
> http://mxr.mozilla.org/mozilla-central/source/js/src/vm/GlobalObject.
> cpp?rev=56c5187a2c64#434

See previous comment. My bad. But thanks for a pointer into how the self-hosted code is initialized. I'll study that to try and get a grip on how intrinistics, self-hosted code and regular types are connected. At the moment, My actions has had sort of a cargo cult aura over them; use this function and that will happen.
(In reply to Daniel Näslund from comment #6)
> (In reply to Jan de Mooij [:jandem] from comment #4)
> > (In reply to Daniel Näslund from comment #3)
> > > I can write a radix sort that is 2x faster than our quicksort. BUT, I had to
> > > use a typed array for the |aux| variable or it got 20x slower.
> > 
> > Would you mind filing a bug for this (with a small testcase demonstrating
> > the problem in the shell), and requesting needinfo from me? :)
> 
> Sorry for the noise. This loop appended a NaN to the end of my counts array
> (which was of size R+1). When I fixed the loop condition to be r < R, the
> performance with a regular Array matched the typed array.

Doh, it did not. Must have forgot to save the script when switching between Array and Uint16Array. 

I've raised bug 1244472 and attached a reproduction script. If it turns out to be a real problem, I'd be glad to take a stab at it.
Attached patch WIP hook up radixsort for u16 (obsolete) — Splinter Review
WIP patch that uses radix sort for the Uint16Arrays that don't have a custom compare function.

I'm seeing performance improvements that don't make sense; too good to be true. First the baseline:

    === Uint16 quicksort===
    N=100000         47,42,41,42,40
    N=200000         85,85,88,87,85
    N=300000         135,132,131,131,134
    N=400000         178,174,178,175,174
    N=500000         222,222,220,224,226

When using my radix sort with the buffer set to var aux = new List() I get a performance degration by ~6x:

    === Uint16 radixsort aux=List ===
    N=100000         217,229,254,255,254
    N=200000         510,509,511,510,511
    N=300000         841,802,791,807,788
    N=400000         1025,1023,1039,1023,1027
    N=500000         1268,1284,1273,1269,1282

When I use an Uint16Array for aux, the performance improves by as much as 20x:

    === Uint16 radixsort aux=Uint16Array ===
    N=100000         6,4,2,2,2
    N=200000         4,5,4,5,4
    N=300000         7,7,7,7,7
    N=400000         9,9,9,10,9
    N=500000         12,14,12,12,12

Will have to investigate.
Attached file benchmark_sort.js (obsolete) —
Script for benchmarking my radixsort.

Relies on the fact that QuickSort is called for sort() with a custom comparator.
Thanks for doing all these measurements!

It might be useful to also test smaller arrays, though. Most arrays aren't that big, and some algorithms may do well on large arrays but are (relatively) worse on smaller arrays (because initialization is slower, for instance).
Attached file doubling_experiment.js (obsolete) —
A benchmark for doing doubling experiments. Output from a testrun on my computer at the top of the file. The RadixSort uses a Uint16Array internally.

jandem made a comment on the performance differences I've been seeing when switching between Array and typed arrays for the radixsort; https://bugzilla.mozilla.org/show_bug.cgi?id=1244472#c1

QuickSort runs surprisingly linear and RadixSort is surprisingly fast. Will continue investigating.
Meant as a baseline for comparing against the javascript sorting algorithms.
Attached file doubling_experiment.js
The javascript RadixSort is 3x faster than the same algorithm coded in C++.

The javascript QuickSort is 6x slower than std::sort form C++ standard library.
Attachment #8714403 - Attachment is obsolete: true
Comment on attachment 8714115 [details] [diff] [review]
WIP hook up radixsort for u16

Review of attachment 8714115 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Sorting.js
@@ +44,5 @@
>  }
>  
> +// Helper for RadixSort
> +function Digit(x, pos) {
> +    return (x >> pos * 4) & 0x0F;

// flipping the signed bit during the last iteration will properly
// sort negative values.
let val = (x >> pos * 4) & 0xF;
if isSigned && pos == max_pos - 1 { // max_pos in this example would be 4
  val ^= 0xF; 
}

@@ +77,5 @@
> +    var aux = new Uint16Array(len);
> +
> +    // ### I wonder what is the optimal step size?
> +    // ### Right now I'm stepping over four times, once per nibble in Uint16.
> +    for (var d = 0; d < 4; d++) {

Why not move in one byte chunks using a base of 256?

@@ +86,5 @@
> +        // Compute frequency counts
> +        for (var i = 0; i < len; i++) {
> +            var val = array[i];
> +            var topmost  = Digit(val, d);
> +            counts[topmost + 1]++;

I like this. To make it work with floats I think we'll just want to add buckets for NaN, -Infinity, -0, and +Infinity
I think we should use a base of 256. In that situation, we'd want to xor by 0x80 on the last iteration to handle signed bits. For handling floating point values we'll need to create buckets for NaN, -0, -Infinity, and +Infinity however that shouldn't be too much extra work. We need to do this to make sure we obey all the rules laid out in the default comparator. Here's our current implementation => https://dxr.mozilla.org/mozilla-central/source/js/src/builtin/TypedArray.js#946

The comment above points out the section in the spec: https://tc39.github.io/ecma262/#sec-typedarraysort

You're kicking butt on these sorts! ^.^
*The comment above our comparator implementation I meant to say, in any case, you'll find it in section 22.2.3.26.
Herp derp, I was wrong about the xor value. We want to use 0x10 for your current implementation.

// flipping the signed bit during the last iteration will properly
// sort negative values.
let val = (x >> pos * 4) & 0xF;
if isSigned && pos == max_pos - 1 { // max_pos in this example would be 4
  val ^= 0x10; 
}
Wrong again::: s/0x10/0x8/

Apologies, I can't binary math tonight. :'(
Comment on attachment 8714115 [details] [diff] [review]
WIP hook up radixsort for u16

Review of attachment 8714115 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Sorting.js
@@ +44,5 @@
>  }
>  
> +// Helper for RadixSort
> +function Digit(x, pos) {
> +    return (x >> pos * 4) & 0x0F;

s/^= 0xF/^= 0x8/
Let's also keep in mind that for NaNs we need to actually store them instead of just iterating a counter. Reason being we don't want to blow away any NaN patterns etc...
(In reply to Morgan Phillips [:mrrrgn] from comment #21)
> we don't want to blow away any NaN patterns etc...

No spec requirement for this -- SortCompare returns +0 when both arguments are NaN to consider them equal, and the implementation-defined series of calls to [[Get]] and [[Set]] discards bit patterns.  Let's not make extra work for ourselves here!
Attached patch wip.diff (obsolete) — Splinter Review
The ecma_6 tests passes. This WIP patch uses radixsort for uint16, int16, uint32, int32 and float32 types.

Had great problems with sorting float32 values. A radixsort on floats work out of the box if all values are positive, but fails if there are negative values. Two problems then: 
1. The sign bit causes the to be placed after the positive values and 
2. The exponent and mantissa is has larger values for smaller values.

Number 1 is easy to fix but number for number 2 I tried first to rearrange the order of the content of the count variable; swap the lower and upper half to fix reordering of signs. And reverse the order of the negative values. But somehow I goofed up and had to abandon that approach.

Then I tried to do the bit twiddling in place during the sort passes, but messed up.

So I settled on doing a prepass where I inverted and reversed and then a postpass that restored the original content of each floating point number.

That's bound to be slower, than the other suggestions. Will measure to see if it still is significantly faster.

Probably need to do a cutoff to insertionsort as well. Need to benchmark smaller array sizes.
Attachment #8714115 - Attachment is obsolete: true
Comment on attachment 8718059 [details] [diff] [review]
wip.diff

Review of attachment 8718059 [details] [diff] [review]:
-----------------------------------------------------------------

Really digging this so far, I'm interested to see if the logic for flipping signed bits can be made more compact.

::: js/src/builtin/Sorting.js
@@ +58,5 @@
> +    for (var i = 0; i < R + 1; i++) {
> +        counts[i] = 0;
> +    }
> +
> +    var invmask = 0x00;

Style nit, why not use ternary assignment: var invmask = invert ? 0x80 : 0x00;

@@ +90,5 @@
> +
> +// ### Should I fall back to insertion sort for small integers?
> +//
> +// ### All this parameters looks ugly.
> +function RadixSort(array, len, nbytes, signed, floating) {

The parameters aren't so bad, but let's make sure to leave a comment describing their functions.

@@ +103,5 @@
> +        aux[i] = 0;
> +    }
> +
> +    // ### Ugly. Factor out a function.
> +    // ### Measure performance. Is this simpler approach enough or should I fiddle

If this doesn't impact performance much and we can reuse the logic for signed integers as well I believe it may actually be easier to follow than an in situ approach. Definitely agree with factoring it out into a helper function.
Attached file benchmark_results.txt
Benchmark results. Added a cut-off to insertion sort for N < 64 elements.

A few data points to show best/worse differences between radixsort and the quicksort. See the file for more data.

uint16:  N<100: 1x          N=5e5: 10x faster
uint32:  N<100: 2x slower   N=5e5:  5x faster
int32:   N<100: 4x slower   N=5e5:  5x faster
float32: N<100: 4x slower   N=5e5: 10x faster

The differences decreases when the measurements are repeated, again see the txt-file.

Questions to ponder on:
Should I consider using QuickSort for arrays where N<100?
Why do we see a difference for N=10 between QuickSort and RadixSort? Both should just call InsertionSort.
Attached file benchmark_sort.js
Attach the benchmarkscript that I used to create the data in the previous data.
Attachment #8714116 - Attachment is obsolete: true
Attached patch wip.diff (obsolete) — Splinter Review
Added a cutoff to insertion sort.
Attachment #8718059 - Attachment is obsolete: true
Attached patch 1242196.patch (obsolete) — Splinter Review
Attachment #8718527 - Attachment is obsolete: true
Attachment #8719944 - Flags: review?(winter2718)
Comment on attachment 8719944 [details] [diff] [review]
1242196.patch

Review of attachment 8719944 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good! My only serious issue here is that the counts/indices logic takes a bit of time to decipher for someone encountering it for the first time. In situations like that a long comment can save enough time/energy to be worthwhile. r=me with a few more comments, and a fleshed out commit message. \o/ PS: Holy moly, this is so fast! :)

::: js/src/builtin/Sorting.js
@@ +58,5 @@
> +    // Compute frequency counts
> +    for (let i = 0; i < len; i++) {
> +        let val = array[i];
> +        let b = ByteAtCol(val, col);
> +        counts[b + 1]++;

This is great, but it would be good to create a short comment explaining why a space is left at the beginning of the "counts" array. Something like:

"By ensuring that counts[0] == 0 we can use simple addition to transform counts into index ranges. For example, if our count at b == 0 is 1, and our count at b == 1 is 2, making counts == [0, 1, 2], we can add each entry to its successor and derive counts == [0, 1, 3], where the range (0, 1) corresponds to b == 0, and the range (1, 3) corresponds to b==2. This transformation from counts into index ranges is used for value distribution later in the sort.
Attachment #8719944 - Flags: review?(winter2718)
Attached patch 1242196.patchSplinter Review
Attachment #8719944 - Attachment is obsolete: true
Attachment #8720437 - Flags: review?(winter2718)
Comment on attachment 8720437 [details] [diff] [review]
1242196.patch

Review of attachment 8720437 [details] [diff] [review]:
-----------------------------------------------------------------

http://www.quickmeme.com/img/0f/0ff49113f737f6725161e579f345863f0a8b6d82d260aca7a3726c18b760b024.jpg
Attachment #8720437 - Flags: review?(winter2718) → review+
Sorry for the delay, landing today assuming that this try push looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=daff843db9cd

I believe that it's coming time to give Daniel higher privileges so that he can grab his own bugs and submit try jobs.
https://hg.mozilla.org/mozilla-central/rev/08a0ab0d5c4a
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
You need to log in before you can comment on or make changes to this bug.