Closed Bug 1065894 Opened 10 years ago Closed 4 years ago

Optimize native DataView or self host it.

Categories

(Core :: JavaScript Engine, enhancement)


Tracking


RESOLVED FIXED
mozilla78
Tracking Status
firefox78 --- fixed

People

(Reporter: mbx, Assigned: anba)

References

Details

(Keywords: perf)

Attachments

(9 files)

Attached file dataViewBenchmark.js
A simple JS implementation (see attached JS file) of DataViews is nearly 10x faster than the native implementation. Although I didn't take care of all the corner cases in the polyfill, it still feels that we ought to do better than 10x slower.

DataView All,                Iterations: 64, Elapsed: 9239.0000ms (144.3594ms / Iteration), Result: 0
DataViewPolyfill All,        Iterations: 64, Elapsed: 1183.0000ms (18.4844ms / Iteration), Result: 0
DataView.getInt8,            Iterations: 64, Elapsed: 1011.0000ms (15.7969ms / Iteration), Result: 0
DataViewPolyfill.getInt8,    Iterations: 64, Elapsed: 40.0000ms (0.6250ms / Iteration), Result: 0
DataView.getUint8,           Iterations: 64, Elapsed: 993.0000ms (15.5156ms / Iteration), Result: 0
DataViewPolyfill.getUint8,   Iterations: 64, Elapsed: 42.0000ms (0.6563ms / Iteration), Result: 0
DataView.getInt16,           Iterations: 64, Elapsed: 1076.0000ms (16.8125ms / Iteration), Result: 0
DataViewPolyfill.getInt16,   Iterations: 64, Elapsed: 60.0000ms (0.9375ms / Iteration), Result: 0
DataView.getUint16,          Iterations: 64, Elapsed: 1035.0000ms (16.1719ms / Iteration), Result: 0
DataViewPolyfill.getUint16,  Iterations: 64, Elapsed: 40.0000ms (0.6250ms / Iteration), Result: 0
DataView.getInt32,           Iterations: 64, Elapsed: 1042.0000ms (16.2813ms / Iteration), Result: 0
DataViewPolyfill.getInt32,   Iterations: 64, Elapsed: 107.0000ms (1.6719ms / Iteration), Result: 0
DataView.getUint32,          Iterations: 64, Elapsed: 1130.0000ms (17.6563ms / Iteration), Result: 0
DataViewPolyfill.getUint32,  Iterations: 64, Elapsed: 119.0000ms (1.8594ms / Iteration), Result: 0
DataView.getFloat32,         Iterations: 64, Elapsed: 1495.0000ms (23.3594ms / Iteration), Result: 0
DataViewPolyfill.getFloat32, Iterations: 64, Elapsed: 496.0000ms (7.7500ms / Iteration), Result: 0
DataView.getFloat64,         Iterations: 64, Elapsed: 1211.0000ms (18.9219ms / Iteration), Result: 0
DataViewPolyfill.getFloat64, Iterations: 64, Elapsed: 599.0000ms (9.3594ms / Iteration), Result: 0
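For context, here is a minimal sketch of the kind of timing loop such a benchmark typically uses; this is hypothetical and the attached file may well differ.

```javascript
// Hypothetical harness, not the attached file: time `iterations` calls of a
// workload and report per-iteration cost in the same shape as the numbers above.
function bench(name, fn, iterations = 64) {
  const start = Date.now();
  let result = 0;
  for (let i = 0; i < iterations; i++) {
    result |= fn(); // accumulate the result to defeat dead-code elimination
  }
  const elapsed = Date.now() - start;
  console.log(`${name}, Iterations: ${iterations}, ` +
              `Elapsed: ${elapsed}ms (${(elapsed / iterations).toFixed(4)}ms / Iteration), ` +
              `Result: ${result}`);
  return result;
}
```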
Blocks: shumway-m3
Brian, will DataViews be affected by your TypedArray work at all? If not, I can take a stab at self-hosting parts of them to speed things up.

FWIW, it seems like no engine[1] is really optimizing DataView performance all that much. V8 is about half as fast on all the tests, and JSC slightly slower.

[1]: Well, I don't know about Chakra
Flags: needinfo?(bhackett1024)
Note that DataView as currently implemented is almost designed to be slow. You get stubcalls from the JIT, you get the overhead of CallNonGenericMethod...
(In reply to Till Schneidereit [:till] from comment #1)
> Brian, will DataViews be affected by your TypedArray work at all? If not, I
> can take a stab at self-hosting parts of them to speed things up.
>
> FWIW, it seems like no engine[1] is really optimizing DataView performance
> all that much. V8 is about half as fast on all the tests, and JSC slightly
> slower.
>
> [1]: Well, I don't know about Chakra

No, the stuff I'm working on will only remove the next view slot in data views (bug 1061404), which will shrink the size of them but not change their behavior otherwise.

For at least some of these methods though, self hosting seems like a bad idea. E.g. the above file's implementation for getInt32:

  DataViewPolyfill.prototype.getInt32 = function(offset, littleEndian) {
    var u8 = this._u8;
    var a = u8[offset + 0];
    var b = u8[offset + 1];
    var c = u8[offset + 2];
    var d = u8[offset + 3];
    return littleEndian ?
      (d << 24) | (c << 16) | (b << 8) | a :
      (a << 24) | (b << 16) | (c << 8) | d;
  }

This is really inefficient compared to what we could do by properly handling getInt32 directly during Ion compilation. Most platforms support unaligned memory accesses, and a single 32-bit read will be much faster than what this stuff Ion compiles down to. If we self-host this method we'll just have to rip out that self-hosted version later when we want to make this truly fast.

If there's a need to make DataViews fast I don't think it would be very hard to do in Ion; it's just that, so far as I know, no one has asked for it before.
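Brian's point can be illustrated in plain JS. This is a conceptual sketch only — Ion would emit this as machine code, and real DataView offsets can be unaligned — but the shape is the same: one full-width load plus an optional byte swap, instead of four byte loads and shifts.

```javascript
// Detect the host's byte order (assumption: an aligned 4-byte read below).
const hostLittle = new Uint8Array(new Uint32Array([1]).buffer)[0] === 1;

// Swap the four bytes of a 32-bit value (what x86's bswap computes).
function bswap32(x) {
  return (((x & 0xff) << 24) | ((x & 0xff00) << 8) |
          ((x >>> 8) & 0xff00) | (x >>> 24)) >>> 0;
}

// Conceptual fast getUint32: a single 32-bit load, then a swap only when the
// requested endianness differs from the host's.
function getUint32Fast(u32, index, littleEndian) {
  const raw = u32[index]; // one load instead of four
  return littleEndian === hostLittle ? raw : bswap32(raw);
}
```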
Flags: needinfo?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #3)
> No, the stuff I'm working on will only remove the next view slot in data
> views (bug 1061404), which will shrink the size of them but not change their
> behavior otherwise.

Ok.

> For at least some of these methods though, self hosting seems like a bad
> idea. E.g. the above file's implementation for getInt32:
>
>   DataViewPolyfill.prototype.getInt32 = function(offset, littleEndian) {
>     var u8 = this._u8;
>     var a = u8[offset + 0];
>     var b = u8[offset + 1];
>     var c = u8[offset + 2];
>     var d = u8[offset + 3];
>     return littleEndian ?
>       (d << 24) | (c << 16) | (b << 8) | a :
>       (a << 24) | (b << 16) | (c << 8) | d;
>   }
>
> This is really inefficient compared to what we could do by properly handling
> getInt32 directly during Ion compilation. Most platforms support unaligned
> memory accesses, and a single 32 bit read will be much faster than what this
> stuff Ion compiles down to. If we self host this method we'll just have to
> rip out that self hosted version later when we want to make this truly fast.

That makes sense, yes.

> If there's a need to make DataViews fast I don't think it would be very hard
> to do in Ion, it's just that so far as I know no one has asked for it before.

I think DataViews aren't really used all that much. We need to use it, or a custom equivalent, in Shumway to implement certain functionality, which is why we're looking into performance issues.

I'll look into inlining these, as that doesn't seem all that hard.
(In reply to Till Schneidereit [:till] from comment #4)
> I think DataViews aren't really used all that much.

They get used in other addons too. Optimizing them a bit would be welcome.
This optimization does not need to block Shumway's M3 milestone.
Blocks: shumway-1.0
No longer blocks: shumway-m3
Keywords: perf
Huh. I find myself searching Bugzilla to see if anyone has reported the fact that DataViews seem to be really slow in comparison to native-endian typed-arrays, even if manually swapping bytes to deal with endianness. Apparently I already read this... On the plus side, I should be able to optimise the thing I was trying to do that was too slow by implementing a faster version cribbed from the attachment. If properly inlining is not currently the plan, self-hosting in the meantime would be a very welcome improvement, as the performance is kinda awful.
OS: Mac OS X → All
Hardware: x86 → All
Also see bug 1246597. For reasons that are sometimes beyond me, people want to use DataView. Sometimes this is for reasons of confusion (they think a TypedArray copies the ArrayBuffer, and does not view it, maybe because the word "Int8Array" does not contain the word "View"), sometimes it just seems they feel it is significantly more convenient (parsing stream data).
I thought they were used primarily for unaligned reads/writes.
"Alignment" really only makes sense as an idea if you've done low-levelish programming on non-x86 systems, though. I get the sense that DataView is often seen as the default interface to ArrayBuffer, with TypedArrays being reserved for special cases. I can't really back that up with data, but given how flexible DataView is -- and how well it fits in with a dynamic-typing world view -- I can see the point.
DataView can read little or big endian; the type-specific arrays are native-endian only, and thus not usable for portable data that could be read on either. DataView is sadly the only way to not have JavaScript be hardware-dependent here, short of writing your own replacement or detecting endianness and reversing when needed. DataView cleanly deals with this botched part of JS typed array design; it's just really too slow to use.

Also, the type-specific arrays choke if the data isn't aligned, without any real indication as to the problem. That was a confusing one to encounter, as that's not actually properly specified or documented. DataView is really the only fully fleshed-out typed array; the others break in cases that nobody should have to worry about in a high-level scripting language.
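The detect-and-reverse workaround mentioned above can be sketched like this (helper names are hypothetical):

```javascript
// Is this host little endian? A typed-array view exposes native byte order.
const hostIsLittleEndian =
    new Uint8Array(new Uint32Array([1]).buffer)[0] === 1;

// Portable little-endian read without DataView: assemble the value from
// individual bytes, so the host's byte order never matters.
function readUint32LE(u8, offset) {
  return (u8[offset] |
          (u8[offset + 1] << 8) |
          (u8[offset + 2] << 16) |
          (u8[offset + 3] << 24)) >>> 0;
}
```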
(In reply to Lars T Hansen [:lth] from comment #8) > Also see bug 1246597. > > For reasons that are sometimes beyond me, people want to use DataView. > Sometimes this is for reasons of confusion (they think a TypedArray copies > the ArrayBuffer, and does not view it, maybe because the word "Int8Array" > does not contain the word "View"), sometimes it just seems they feel it is > significantly more convenient (parsing stream data). let buffer = new ArrayBuffer(5); let view = new Float32Array(buffer, 1, 1); // Whoops! And while this will generally never happen in one's own code, it certainly does happen with some binary formats out there that weren't designed with byte alignment in mind. The only solution is DataView, whether native or not.
(In reply to spectralcanine from comment #12)
> (In reply to Lars T Hansen [:lth] from comment #8)
> > Also see bug 1246597.
> >
> > For reasons that are sometimes beyond me, people want to use DataView.
> > Sometimes this is for reasons of confusion (they think a TypedArray copies
> > the ArrayBuffer, and does not view it, maybe because the word "Int8Array"
> > does not contain the word "View"), sometimes it just seems they feel it is
> > significantly more convenient (parsing stream data).
>
> let buffer = new ArrayBuffer(5);
> let view = new Float32Array(buffer, 1, 1); // Whoops!
>
> And while this will generally never happen in one's own code, it certainly
> does happen with some binary formats out there that weren't designed with
> byte alignment in mind.
> The only solution is DataView, whether native or not.

Yes, I know and understand this use case. DataView is not the only solution, but of course it is by far the most convenient one, and very clearly legitimate and appropriate.

(Another solution is to copy the bytes individually into a separate ArrayBuffer using Uint8Array views and then read a value out of the second buffer using a Float32Array view. I'm not advocating that, though at present I would not be surprised if it were faster than DataView, especially if the auxiliary buffer and views on it could be reused.)

(Didn't mean to upset any users of DataView with my earlier remark.)
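A sketch of the workaround described above, under the assumption that the scratch buffer and its views are allocated once and reused (names are hypothetical):

```javascript
// One reusable, aligned scratch buffer with two views over the same bytes.
const scratch = new ArrayBuffer(8);
const scratchU8 = new Uint8Array(scratch);
const scratchF32 = new Float32Array(scratch);

// Read a float32 from an arbitrary (possibly unaligned) byte offset by
// copying the four bytes into the aligned scratch buffer first.
function readFloat32ViaScratch(srcU8, offset) {
  scratchU8.set(srcU8.subarray(offset, offset + 4));
  return scratchF32[0]; // native-endian interpretation of the copied bytes
}
```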
I did in fact do just what you described at some point, copying chunks into new arrays to get proper byte alignment. In the end, it was far slower than just using DataView (though to be upfront, I don't quite remember if I tested in Firefox too).

But disregarding performance for a second, it's just far more logical and intuitive to use DataView. It's a native object whose purpose is to quickly handle arbitrary binary data... except in reality :)
v8 has optimized their DataView performance and we should start to see it become more performance-sensitive on the web: https://v8project.blogspot.com/2018/09/dataview.html
Flags: needinfo?(jdemooij)
Two years later, and I still use weird in-house type casting functions to read/write [unaligned] binary data, because ever since this issue was reported (and in fact I started a similar one on Chromium's issue tracker), DataView has been an order of magnitude slower than my far more complex code. Thanks for looking at this again!
According to the comment log above, I've commented on this once every year since it was filed, 4 years ago, but I neglected to do so last year. Shucks.

Seriously, though: just throw together a self-hosting patch based on the already provided code and dump it in. This should be standard practice for absolutely everything whose performance it could improve, and not doing it for things that get an order of magnitude (at time of filing) speed up is just stupid. Worry about fancy, better-optimized versions later.

If Google doing it is what it takes for Mozilla to finally fix this, so be it. :sigh:

/end

+1
> and not doing it for things that get an order of magnitude (at time of
> filing) speed up

Do they? The proof of concept is not spec-compliant, as comment 0 notes. In the past, fixing that sort of problem has significantly reduced, and in some cases eliminated, the speedup.

That said, I agree that it's worth at least trying and seeing what the performance ends up looking like.
I looked into this a bit. We could probably inline DataView.prototype.get* with a single MIR instruction, and same for set*. The annoying part is the optional littleEndian argument. Who decided it's a good idea to have that default to big endian when almost all CPUs are little endian and typed arrays store their data as little endian? Sigh.
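For the record, the default in question: with the littleEndian argument omitted, DataView reads and writes big endian regardless of the host's byte order.

```javascript
const dvBE = new DataView(new ArrayBuffer(4));
dvBE.setUint32(0, 0x12345678);          // no littleEndian argument → big endian
const beBytes = new Uint8Array(dvBE.buffer);
// beBytes is [0x12, 0x34, 0x56, 0x78]: most significant byte first, so a
// little-endian reread sees the bytes reversed.
dvBE.getUint32(0, true);                // 0x78563412
```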
Surely that argument will be a constant in most cases and will just boil away in MCallOptimize?
(In reply to Lars T Hansen [:lth] from comment #20)
> Surely that argument will be a constant in most cases and will just boil
> away in MCallOptimize?

That's true, but if we want to inline both little endian and big endian we need the code to get/set both cases anyway, and ideally we'd also optimize the unknown-bool case...

Thinking a bit more about this, Ion could also optimize the known-little-endian case just like typed array get/set. When we have a detached buffer, the bounds check would always fail, so we would handle that case in Baseline.
Depends on: 1496378
Bug 1496378 landed, now in the JIT we should be able to rename s/TypedArray/ArrayBufferView/ in a number of places and reuse the code also for DataView objects. After discussion with Ted we decided to try to implement this in the CacheIR CallIC first (and maybe see if we can use that IC in Ion).
Flags: needinfo?(jdemooij)

A user reported[1] Firefox being 40x slower than Chrome on a DataView benchmark[2]. I collected a profile[3] while running DataView part of the test. Is this the right bug to use for addressing the performance of DataView?

  1. https://news.ycombinator.com/item?id=22943131
  2. https://jsperf.com/dataview-float-int
  3. https://perfht.ml/3cFtVY3

> Is this the right bug to use for addressing the performance of DataView?

Yes, it is.

Assignee: nobody → andrebargull
Status: NEW → ASSIGNED
Type: defect → enhancement

Following bug 1496378, update MIR nodes to mention ArrayBufferView instead of
TypedArray. Later patches in this stack will make use of them for DataView
objects.

x86/x64:

  • Implement the 2-byte swap instructions using rol, matching the output of
    Clang for the __builtin_bswap16 built-in.
  • 4-byte and 8-byte swaps are implemented through bswap.
  • The actual rol and bswap functions are derived from upstream.

arm32:

  • Implement 2-byte swap with sign extend using revsh.
  • Implement 2-byte swap with zero extend using rev16 followed by uxth.
  • 4-byte and 8-byte swaps are implemented through rev.

arm64:

  • Implement 2-byte swap using rev16 followed by either sxth or uxth.
  • 4-byte and 8-byte swaps are implemented through rev.
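The two 2-byte swap variants above (swap-and-sign-extend vs. swap-and-zero-extend) compute the following, sketched in JS as a model of the instruction semantics — not the actual assembler code:

```javascript
// Model of revsh (arm32) / rev16+sxth (arm64): swap the two low bytes, then
// sign-extend the 16-bit result to 32 bits.
function bswap16SignExtend(x) {
  return ((((x & 0xff) << 8) | ((x >> 8) & 0xff)) << 16) >> 16;
}

// Model of rev16+uxth: swap the two low bytes, then zero-extend.
function bswap16ZeroExtend(x) {
  return ((x & 0xff) << 8) | ((x >> 8) & 0xff);
}
```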

Depends on D74619

Change LInt64Definition to a proper class and add a BogusTemp method to it.

This will be used in the next patches.

Depends on D74621

The implementation of LoadDataViewElement follows LoadUnboxedScalar,
therefore most LoadDataViewElement functions were placed next to their
counterparts from LoadUnboxedScalar.

MCallOptimize:

  • The two single-byte getters getInt8 and getUint8 are directly inlined
    through LoadUnboxedScalar.
  • This lets us avoid duplicating the single-byte extra cases in Lowering.
  • And MLoadDataViewElement doesn't need to save the otherwise unused
    littleEndian parameter.

Lowering:

  • Optimises the case when littleEndian is a constant value, so we can avoid
    branching around the byte swap instructions in most cases.
  • As usual, x86 needs special-casing because there are too few registers
    available.

I didn't bother to port the following two LoadUnboxedScalar optimisations
to LoadDataViewElement:

  • AnalyzeLoadUnboxedScalar from EffectiveAddressAnalysis
  • Constant index optimisation in lowering/codegen

Both optimisations are currently unavailable due to Spectre index masking,
so it didn't seem worthwhile to spend time to support them for
LoadDataViewElement.

Depends on D74622

Similar to the previous part, inline the setter functions based on
the current StoreUnboxedScalar support.

Depends on D74623

Depends on D74625

Pushed by cbrindusan@mozilla.com:

https://hg.mozilla.org/integration/autoland/rev/420e72fb4f83
Part 1: Rename TypedArray mir nodes to mention ArrayBufferView. r=jandem
https://hg.mozilla.org/integration/autoland/rev/2b5bf053877a
Part 2: Add byte swap instructions to the assemblers. r=jandem
https://hg.mozilla.org/integration/autoland/rev/abb6d449fd6a
Part 3: Add unaligned load and store functions to the assemblers. r=jandem
https://hg.mozilla.org/integration/autoland/rev/0822e88eb009
Part 4: Add LInt64Definition::BogusTemp. r=jandem
https://hg.mozilla.org/integration/autoland/rev/d36328fd387d
Part 5: Inline DataView getter functions. r=jandem
https://hg.mozilla.org/integration/autoland/rev/5bd1d18fe1c2
Part 6: Inline DataView setter functions. r=jandem
https://hg.mozilla.org/integration/autoland/rev/e0d13e5b07a6
Part 7: Inline 'byteLength' and 'byteOffset' getters for DataViews. r=jandem
https://hg.mozilla.org/integration/autoland/rev/9ba7a0d69b96
Part 8: Add test cases. r=jandem

Thank you! I can definitely see the speed difference.

Unfortunately my typecasting code is still somehow faster (on Chrome as well), but if this had landed a couple of years ago I'd probably be using it :P

...and then I realized I was forgetting this will only be in 78, so I wasn't even testing it.
Currently on 76, reading with DataView seems to be about 4 times slower than Chrome, while writing is faster, not sure why.
Would be interesting to see how this optimization works.
Right now it is still faster on both browsers to explicitly call, for example, a function that takes 4 numbers, sets 4 indices of a Uint8Array, and returns the resulting number from a Float32Array that views the same buffer, all preallocated at script load time.
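For reference, the preallocated type-punning approach described above looks roughly like this (names are hypothetical; this reads in the host's native byte order):

```javascript
// One buffer allocated at script load, viewed as both bytes and a float.
const punBuf = new ArrayBuffer(4);
const punU8 = new Uint8Array(punBuf);
const punF32 = new Float32Array(punBuf);

// Take 4 byte values, store them, and reinterpret the result as a float32.
function bytesToFloat32(b0, b1, b2, b3) {
  punU8[0] = b0; punU8[1] = b1; punU8[2] = b2; punU8[3] = b3;
  return punF32[0];
}
```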

Any plans in backporting to the latest ESR?

(In reply to Marco Trevisan (Treviño) from comment #38)
> Any plans in backporting to the latest ESR?

That's unlikely to happen, since 78 ESR is close -- and also because this bug has been languishing for 6 years, not exactly making it a poster child for something that is backport-worthy.

Just updated this example [1] to DataView and tried it in Firefox 76, Firefox Nightly and Chrome 81. That worker now runs twice as fast as before, and performance is on par with Chrome.

Great update, thanks!

[1] https://github.com/potree/potree/blob/develop/src/workers/BinaryDecoderWorker.js

Regressions: 1642883
Blocks: 1642915
See Also: → 1653246
No longer blocks: 1642915
Depends on: 1642915
Blocks: 1642915
No longer depends on: 1642915