Optimize native DataView or self host it.

NEW
Unassigned

Status

()

defect
5 years ago
2 months ago

People

(Reporter: mbx, Unassigned)

Tracking

({perf})

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

Reporter

Description

5 years ago
A simple JS implementation (see attached JS file) of DataViews is nearly 10x faster than the native implementation. Although I didn't take care of all the corner cases in the polyfill, it still feels that we ought to do better than 10x slower.

DataView All, Iterations: 64, Elapsed: 9239.0000ms (144.3594ms / Iteration), Result: 0
DataViewPolyfill All, Iterations: 64, Elapsed: 1183.0000ms (18.4844ms / Iteration), Result: 0

DataView.getInt8, Iterations: 64, Elapsed: 1011.0000ms (15.7969ms / Iteration), Result: 0
DataViewPolyfill.getInt8, Iterations: 64, Elapsed: 40.0000ms (0.6250ms / Iteration), Result: 0

DataView.getUint8, Iterations: 64, Elapsed: 993.0000ms (15.5156ms / Iteration), Result: 0
DataViewPolyfill.getUint8, Iterations: 64, Elapsed: 42.0000ms (0.6563ms / Iteration), Result: 0

DataView.getInt16, Iterations: 64, Elapsed: 1076.0000ms (16.8125ms / Iteration), Result: 0
DataViewPolyfill.getInt16, Iterations: 64, Elapsed: 60.0000ms (0.9375ms / Iteration), Result: 0

DataView.getUint16, Iterations: 64, Elapsed: 1035.0000ms (16.1719ms / Iteration), Result: 0
DataViewPolyfill.getUint16, Iterations: 64, Elapsed: 40.0000ms (0.6250ms / Iteration), Result: 0

DataView.getInt32, Iterations: 64, Elapsed: 1042.0000ms (16.2813ms / Iteration), Result: 0
DataViewPolyfill.getInt32, Iterations: 64, Elapsed: 107.0000ms (1.6719ms / Iteration), Result: 0

DataView.getUint32, Iterations: 64, Elapsed: 1130.0000ms (17.6563ms / Iteration), Result: 0
DataViewPolyfill.getUint32, Iterations: 64, Elapsed: 119.0000ms (1.8594ms / Iteration), Result: 0

DataView.getFloat32, Iterations: 64, Elapsed: 1495.0000ms (23.3594ms / Iteration), Result: 0
DataViewPolyfill.getFloat32, Iterations: 64, Elapsed: 496.0000ms (7.7500ms / Iteration), Result: 0

DataView.getFloat64, Iterations: 64, Elapsed: 1211.0000ms (18.9219ms / Iteration), Result: 0
DataViewPolyfill.getFloat64, Iterations: 64, Elapsed: 599.0000ms (9.3594ms / Iteration), Result: 0
Reporter

Updated

5 years ago
Blocks: shumway-m3
Brian, will DataViews be affected by your TypedArray work at all? If not, I can take a stab at self-hosting parts of them to speed things up.

FWIW, it seems like no engine[1] is really optimizing DataView performance all that much. V8 is about half as fast on all the tests, and JSC slightly slower.


[1]: Well, I don't know about Chakra
Flags: needinfo?(bhackett1024)
Note that DataView as currently implemented is almost designed to be slow.  You get stubcalls from the JIT, you get the overhead of CallNonGenericMethod...
(In reply to Till Schneidereit [:till] from comment #1)
> Brian, will DataViews be affected by your TypedArray work at all? If not, I
> can take a stab at self-hosting parts of them to speed things up.
> 
> FWIW, it seems like no engine[1] is really optimizing DataView performance
> all that much. V8 is about half as fast on all the tests, and JSC slightly
> slower.
> 
> 
> [1]: Well, I don't know about Chakra

No, the stuff I'm working on will only remove the next view slot in data views (bug 1061404), which will shrink the size of them but not change their behavior otherwise.

For at least some of these methods though, self hosting seems like a bad idea.  E.g. the above file's implementation for getInt32:

DataViewPolyfill.prototype.getInt32 = function(offset, littleEndian) {
  var u8 = this._u8;
  var a = u8[offset + 0];
  var b = u8[offset + 1];
  var c = u8[offset + 2];
  var d = u8[offset + 3];
  return littleEndian ?
    (d << 24) | (c << 16) | (b << 8) | a :
    (a << 24) | (b << 16) | (c << 8) | d;
}

This is really inefficient compared to what we could do by properly handling getInt32 directly during Ion compilation.  Most platforms support unaligned memory accesses, and a single 32 bit read will be much faster than what this stuff Ion compiles down to.  If we self host this method we'll just have to rip out that self hosted version later when we want to make this truly fast.

If there's a need to make DataViews fast I don't think it would be very hard to do in Ion, it's just that so far as I know no one has asked for it before.
Flags: needinfo?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #3)
> No, the stuff I'm working on will only remove the next view slot in data
> views (bug 1061404), which will shrink the size of them but not change their
> behavior otherwise.

Ok.

> 
> For at least some of these methods though, self hosting seems like a bad
> idea.  E.g. the above file's implementation for getInt32:
> 
> DataViewPolyfill.prototype.getInt32 = function(offset, littleEndian) {
>   var u8 = this._u8;
>   var a = u8[offset + 0];
>   var b = u8[offset + 1];
>   var c = u8[offset + 2];
>   var d = u8[offset + 3];
>   return littleEndian ?
>     (d << 24) | (c << 16) | (b << 8) | a :
>     (a << 24) | (b << 16) | (c << 8) | d;
> }
> 
> This is really inefficient compared to what we could do by properly handling
> getInt32 directly during Ion compilation.  Most platforms support unaligned
> memory accesses, and a single 32 bit read will be much faster than what this
> stuff Ion compiles down to.  If we self host this method we'll just have to
> rip out that self hosted version later when we want to make this truly fast.

That makes sense, yes.

> 
> If there's a need to make DataViews fast I don't think it would be very hard
> to do in Ion, it's just that so far as I know no one has asked for it before.

I think DataViews aren't really used all that much. We need to use it, or a custom equivalent, in Shumway to implement certain functionality, which is why we're looking into performance issues.

I'll look into inlining these, as that doesn't seem all that hard.

Comment 5

5 years ago
(In reply to Till Schneidereit [:till] from comment #4)
> I think DataViews aren't really used all that much.

They get used in other addons too. Optimizing them a bit would be welcome.
This optimization does not need to block Shumway's M3 milestone.
Blocks: shumway-1.0
No longer blocks: shumway-m3
Keywords: perf

Comment 7

4 years ago
Huh. I find myself searching Bugzilla to see if anyone has reported the fact that DataViews seem to be really slow in comparison to native-endian typed-arrays, even if manually swapping bytes to deal with endianness. Apparently I already read this...

On the plus side, I should be able to optimise the thing I was trying to do that was too slow by implementing a faster version cribbed from the attachment.

If properly inlining is not currently the plan, self-hosting in the meantime would be a very welcome improvement, as the performance is kinda awful.
OS: Mac OS X → All
Hardware: x86 → All
Also see bug 1246597.

For reasons that are sometimes beyond me, people want to use DataView.  Sometimes this is for reasons of confusion (they think a TypedArray copies the ArrayBuffer, and does not view it, maybe because the word "Int8Array" does not contain the word "View"), sometimes it just seems they feel it is significantly more convenient (parsing stream data).
Reporter

Comment 9

3 years ago
I thought they were used primarily for unaligned reads/writes.
"Alignment" really only makes sense as an idea if you've done low-levelish programming on non-x86 systems, though.

I get the sense that DataView is often seen as the default interface to ArrayBuffer, with TypedArrays being reserved for special cases.  I can't really back that up with data, but given how flexible DataView is -- and how well it fits in with a dynamic-typing world view -- I can see the point.

Comment 11

3 years ago
DataView can read little or big endian; the type-specific arrays are native-endian only, and thus not usable for portable data that could be read on either. DataView is sadly the only way to not have JavaScript be ~hardware~ dependent here, short of writing your own replacement or detecting endianess and reversing when needed. DataView cleanly deals with this botched part of JS typed array design; it's just really too slow to use.

Also, the type-specific arrays choke if the data isn't aligned, without any real indication as to the problem. That was a confusing one to encounter, as that's not actually properly specified or documented. DataView is really the only fully fleshed-out typed array; the others break in cases that nobody should have to worry about in a high-level scripting language.

Comment 12

3 years ago
(In reply to Lars T Hansen [:lth] from comment #8)
> Also see bug 1246597.
> 
> For reasons that are sometimes beyond me, people want to use DataView. 
> Sometimes this is for reasons of confusion (they think a TypedArray copies
> the ArrayBuffer, and does not view it, maybe because the word "Int8Array"
> does not contain the word "View"), sometimes it just seems they feel it is
> significantly more convenient (parsing stream data).

let buffer = new ArrayBuffer(5);
let view = new Float32Array(buffer, 1, 1); // Whoops!

And while this will generally never happen in one's own code, it certainly does happen with some binary formats out there that weren't designed with byte alignment in mind.
The only solution is DataView, whether native or not.
(In reply to spectralcanine from comment #12)
> (In reply to Lars T Hansen [:lth] from comment #8)
> > Also see bug 1246597.
> > 
> > For reasons that are sometimes beyond me, people want to use DataView. 
> > Sometimes this is for reasons of confusion (they think a TypedArray copies
> > the ArrayBuffer, and does not view it, maybe because the word "Int8Array"
> > does not contain the word "View"), sometimes it just seems they feel it is
> > significantly more convenient (parsing stream data).
> 
> let buffer = new ArrayBuffer(5);
> let view = new Float32Array(buffer, 1, 1); // Whoops!
> 
> And while this will generally never happen in one's own code, it certainly
> does happen with some binary formats out there that weren't designed with
> byte alignment in mind.
> The only solution is DataView, whether native or not.

Yes, I know and understand this use case.  DataView is not the only solution but of course it is by far the most convenient one, and very clearly legitimate and appropriate.  (Another solution is to copy the bytes individually into a separate ArrayBuffer using Uint8Array views and then reading a value out of the second buffer using a Float32Array view.  I'm not advocating that, though at present I would not be surprised if it were faster than DataView, especially if the auxiliary buffer and views on it could be reused.)

(Didn't mean to upset any users of DataView with my earlier remark.)

Comment 14

3 years ago
I did in fact do just what you described at some point, with copying chunks into new arrays to get proper byte alignment.
In the end, it was far slower than just using DataView (but to be upfront, I don't quite remember if I tested in Firefox too).

But disregarding performance for a second, it's just far more logical and intuitive to use DataView. It's a native object who's purpose is to quickly handle arbitrary binary data......except for in reality :)
v8 has optimized their DataView performance and we should start to see it become more performance-sensitive on the web:
https://v8project.blogspot.com/2018/09/dataview.html
Flags: needinfo?(jdemooij)

Comment 16

8 months ago
Two years later, and I still use weird in-house type casting functions to read/write [unaligned] binary data, as since this issue was reported (and in fact I started a similar one on Chromium's issue tracker), DataView is a magnitude slower than my far more complex code.

Thanks for looking at this again!

Comment 17

8 months ago
According to the comment log above, I've commented on this once every year since it was filed, 4 years ago, but I neglected to do so last year. Shucks.

Seriously, though: just throw together a self-hosting patch based on the already provided code and dump it in. This should be standard-practice for absolutely everything that could have its performance improved with it, and not doing it for things that get an order of magnitude (at time of filing) speed up is just stupid. Worry about fancy better optimized versions later.

If Google doing it is what it takes for Mozilla to finally fix this, so be it. :sigh:

/end +1
> and not doing it for things that get an order of magnitude (at time of filing) speed up

Do they?  The proof of concept is not spec-compliant, as comment 0 notes. In the past, fixing that sort of  problem has significantly reduced, and in some cases eliminated, the speedup.

That said, I agree that it's worth at least trying and seeing what the performance ends up looking like.
I looked into this a bit. We could probably inline DataView.prototype.get* with a single MIR instruction, and same for set*.

The annoying part is the optional littleEndian argument. Who decided it's a good idea to have that default to big endian when almost all CPUs are little endian and typed arrays store their data as little endian? Sigh.
Surely that argument will be a constant in most cases and will just boil away in MCallOptimize?
(In reply to Lars T Hansen [:lth] from comment #20)
> Surely that argument will be a constant in most cases and will just boil
> away in MCallOptimize?

That's true, but if we want to inline both little endian and big endian we need the code to get/set both cases anyway, and ideally we'd also optimize the unknown-bool case...

Thinking a bit more about this, Ion could also optimize the known-little-endian case just like typed array get/set. When we have a detached buffer, the bounds check would always fail so we would handle that case in Baseline.
Depends on: 1496378
Bug 1496378 landed, now in the JIT we should be able to rename s/TypedArray/ArrayBufferView/ in a number of places and reuse the code also for DataView objects.

After discussion with Ted we decided to try to implement this in the CacheIR CallIC first (and maybe see if we can use that IC in Ion).
Flags: needinfo?(jdemooij)
You need to log in before you can comment on or make changes to this bug.