Bug 1244472 (Open) - Opened 4 years ago - Updated 4 years ago

RadixSort is slow with Array, but fast with typed arrays

Categories

(Core :: JavaScript: Standard Library, defect)

People

(Reporter: dannas, Unassigned)

Attachments

(2 files, 1 obsolete file)

2.49 KB, application/javascript
2.19 KB, application/javascript
Attached file reproduce.js (obsolete) —
When working on bug 1242196, I noticed that the performance of my radix sort implementation improved 10x when I switched to a Uint16Array instead of a regular Array for an internal aux buffer.

See the attachment for a reproduction script and numbers from my test runs.
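
In rough outline, the pattern is the one below (a minimal sketch, not the actual attachment; the 16-bit values and byte-per-pass digits are just the setup I assume here):

    // Minimal sketch of the aux-buffer pattern (hypothetical, not the attached
    // reproduce.js). LSD radix sort of 16-bit values, one byte per pass.
    function radixSortSketch(list, len) {
        // Swapping this Uint16Array for `new Array(len)` is where the ~10x
        // slowdown showed up.
        var aux = new Uint16Array(len);

        for (var shift = 0; shift <= 8; shift += 8) {
            var counts = new Uint32Array(257);

            // Count occurrences of each byte value.
            for (var i = 0; i < len; i++)
                counts[((list[i] >> shift) & 0xFF) + 1]++;

            // Prefix sums give each bucket's starting offset in aux.
            for (var b = 1; b < counts.length; b++)
                counts[b] += counts[b - 1];

            // Scatter into aux; the writes land at large, scattered indexes.
            for (var i = 0; i < len; i++)
                aux[counts[(list[i] >> shift) & 0xFF]++] = list[i];

            // Copy back for the next pass.
            for (var i = 0; i < len; i++)
                list[i] = aux[i];
        }
        return list;
    }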

I tried reducing the testcase to something smaller than the whole sort algorithm, but then the speed difference disappeared.

Keep in mind that I am a JavaScript and SpiderMonkey newcomer; it's very likely that I've messed up something elementary here.

If that's not the case and I really have stumbled upon a case where the optimizer fails, then I'd very much like to take a stab at solving it, if it's within my reach.
Flags: needinfo?(jdemooij)
We create an Array with length 5000000. Arrays (well, native objects in general) can have either sparse or dense elements. Because 5000000 is large and we start with large indexes, we don't allocate dense elements. 

So we have a sparse array and setting sparse elements is really slow; it's basically like we're adding a new property for each element. You'll notice that it's much faster if you add this loop:

    for (var i = 0; i < len; i++)
        aux[i] = 0;

That's because this forces the array to have dense elements.

Uint16Array doesn't have this problem, because typed array elements are entirely different. They are always allocated eagerly and are always dense. Using typed arrays where you can makes sense; it avoids these heuristics and also gives the JIT better type and range information.
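
To make the heuristics above concrete, here is a rough illustration (assumed behaviour, not code from the bug; the index choices are mine):

    var len = 5000000;

    var sparse = new Array(len);   // large length, no dense elements allocated
    sparse[4999999] = 1;           // first store at a huge index stays on the
                                   // slow sparse path (like adding a property)

    var dense = new Array(len);
    for (var i = 0; i < len; i++)
        dense[i] = 0;              // sequential fill from 0 forces dense elements
    dense[4999999] = 1;            // later stores hit the fast dense path

    var typed = new Uint16Array(len); // always dense, zero-filled eagerly
    typed[4999999] = 1;               // always the fast path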

We can't simply make all elements dense - people use JS objects/arrays as hash maps:

  var o = {};
  var id = 123456789;
  o[id] = value;

If we gave |o| dense elements, we'd use a ton of memory. We need heuristics somewhere.
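For scale: dense elements up to index 123456789 would mean roughly 123 million element slots; at 8 bytes per Value that is on the order of a gigabyte for a single stored property.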

So, back to this case. It's a hard problem because we add (random) elements within the range [0, 5000000). Eventually, our heuristics will notice that we should use dense elements, but for the initial X elements, using sparse elements is not unreasonable.

Maybe we can convert to dense elements faster; I can check what other engines do.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attached file reproduce.js
I built d8, the V8 JavaScript shell, and compared the time to execute reproduce.js.

With Array
     spidermonkey N=5000000        7663,7137,6990,6994,6958
     v8           N=5000000        512,484,492,496,602
With Uint16Array
     spidermonkey N=5000000        261,292,284,286,287
     v8           N=5000000        462,439,438,440,447

Are these timings consistent with what others are seeing?

I tried to infer something from the output of IONFLAGS=range js, but there were too many node names. Is there some way to map that information back to the source in a more graphical form? I also wonder whether IONFLAGS=range is even the right way to learn about optimizer decisions such as dense vs. sparse arrays. I might be better off looking through the code than trying to understand the documentation for IONFLAGS.

Is there something actionable here for someone like me (a newcomer to SpiderMonkey), or should I just move on?
Attachment #8714020 - Attachment is obsolete: true
reproduced_reproduction.js mimics the access pattern of radix sort. 
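
For concreteness, the access pattern it exercises is roughly the following (a hypothetical sketch, not the actual attachment; the hash constant and timing harness are mine):

    var N = 5000000;

    function scatter(Ctor) {
        var aux = new Ctor(N);
        var start = Date.now();
        for (var i = 0; i < N; i++) {
            // Pseudo-random index in [0, N), standing in for the bucket offsets.
            var idx = (Math.imul(i, 2654435761) >>> 0) % N;
            aux[idx] = i & 0xFFFF;
        }
        return Date.now() - start;
    }

    print("Array:       " + scatter(Array) + " ms");
    print("Uint16Array: " + scatter(Uint16Array) + " ms");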

I wonder if this testcase exercises the same path in SpiderMonkey, i.e. is it really a reduced case of reproduce.js?

With Array
      spidermonkey   N=5000000        1632, 1622, 1620, 1624, 1631
      v8             N=5000000        58, 82, 88, 88, 86
With Uint16Array
      spidermonkey   N=5000000        59, 56, 56, 61, 62
      v8             N=5000000        44, 40, 40, 44, 45
I think your reduced testcase is valid.

If we increase SPARSE_DENSITY_RATIO from 8 to 128 or so, the test is faster because we will "densify" the sparse elements sooner. That's not something we want to do, though, because it risks wasting memory.

I'd really like to have a special SparseObjectElements structure that uses a HashMap to store sparse elements instead of allocating shapes for them. Shapes are great for normal (named) properties, but for sparse elements they add a lot of overhead, both in memory and in performance.

Anyway, I added that to my TODO list; maybe I can spend some time on it later this year, but I've a number of other (performance) bugs to fix first.
Flags: needinfo?(jdemooij)