Optimize native DataView or self host it.
Categories
(Core :: JavaScript Engine, enhancement)
Tracking
()
Tracking | Status | |
---|---|---|
firefox78 | --- | fixed |
People
(Reporter: mbx, Assigned: anba)
References
Details
(Keywords: perf)
Attachments
(9 files)
8.94 KB,
application/x-javascript
|
Details | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review |
Reporter | ||
Updated•10 years ago
|
Comment 1•10 years ago
|
||
Comment 2•10 years ago
|
||
Comment 3•10 years ago
|
||
Comment 4•10 years ago
|
||
Comment 5•10 years ago
|
||
Comment 6•10 years ago
|
||
Comment 7•9 years ago
|
||
Comment 8•8 years ago
|
||
Reporter | ||
Comment 9•8 years ago
|
||
Comment 10•8 years ago
|
||
Comment 11•8 years ago
|
||
Comment 12•8 years ago
|
||
Comment 13•8 years ago
|
||
Comment 14•8 years ago
|
||
Comment 15•6 years ago
|
||
Comment 16•6 years ago
|
||
Comment 17•6 years ago
|
||
Comment 18•6 years ago
|
||
Comment 19•6 years ago
|
||
Comment 20•6 years ago
|
||
Comment 21•6 years ago
|
||
Comment 22•6 years ago
|
||
Comment 23•5 years ago
|
||
A user reported[1] Firefox being 40x slower than Chrome on a DataView benchmark[2]. I collected a profile[3] while running DataView part of the test. Is this the right bug to use for addressing the performance of DataView?
Comment 24•5 years ago
|
||
Is this the right bug to use for addressing the performance of DataView?
Yes, it is.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 26•4 years ago
|
||
Following bug 1496378, update MIR nodes to mention ArrayBufferView instead of
TypedArray. Later patches in this stack will make use of them for DataView
objects.
Assignee | ||
Comment 27•4 years ago
|
||
x86/x64:
- Implement the 2-byte swap instructions using
rol
, matching the output of
Clang for the__builtin_bswap16
built-in. - 4-byte and 8-byte swaps are implemented through
bswap
. - The actual
rol
andbswap
functions are derived from upstream.
arm32:
- Implement 2-byte swap with sign extend using
revsh
. - Implement 2-byte swap with zero extend using
rev16
followed byuxth
. - 4-byte and 8-byte swaps are implemented through
rev
.
arm64:
- Implement 2-byte swap using
rev16
followed by eithersxth
oruxth
. - 4-byte and 8-byte swaps are implemented through
rev
.
Depends on D74619
Assignee | ||
Comment 28•4 years ago
|
||
Depends on D74620
Assignee | ||
Comment 29•4 years ago
|
||
Change LInt64Definition to a proper class and add a BogusTemp
method to it.
This will be used in the next patches.
Depends on D74621
Assignee | ||
Comment 30•4 years ago
|
||
The implementation of LoadDataViewElement
follows LoadUnboxedScalar
,
therefore most LoadDataViewElement
functions were placed next to their
counterparts from LoadUnboxedScalar
.
MCallOptimize:
- The two single-byte getters
getInt8
andgetUint8
are directly inlined
throughLoadUnboxedScalar
. - This let's us avoid duplicating the single byte extra cases in Lowering.
- And
MLoadDataViewElement
doesn't need to save the otherwise unused
littleEndian
parameter.
Lowering:
- Optimises the case when
littleEndian
is a constant value, so we can avoid
branching around the byte swap instructions in most cases. - As usual, x86 needs special-casing because there are too few registers
available.
I didn't bother to implement the following two optimisations for
LoadUnboxedScalar
to LoadDataViewElement
:
AnalyzeLoadUnboxedScalar
from EffectiveAddressAnalysis- Constant index optimisation in lowering/codegen
Both optimisations are currently unavailable due to Spectre index masking,
so it didn't seem worthwhile to spend time to support them for
LoadDataViewElement
.
Depends on D74622
Assignee | ||
Comment 31•4 years ago
|
||
Similar to the previous part, inline the setter functions based on
the current StoreUnboxedScalar
support.
Depends on D74623
Assignee | ||
Comment 32•4 years ago
|
||
Depends on D74624
Assignee | ||
Comment 33•4 years ago
|
||
Depends on D74625
Comment 34•4 years ago
|
||
Comment 35•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/420e72fb4f83
https://hg.mozilla.org/mozilla-central/rev/2b5bf053877a
https://hg.mozilla.org/mozilla-central/rev/abb6d449fd6a
https://hg.mozilla.org/mozilla-central/rev/0822e88eb009
https://hg.mozilla.org/mozilla-central/rev/d36328fd387d
https://hg.mozilla.org/mozilla-central/rev/5bd1d18fe1c2
https://hg.mozilla.org/mozilla-central/rev/e0d13e5b07a6
https://hg.mozilla.org/mozilla-central/rev/9ba7a0d69b96
Comment 36•4 years ago
|
||
Thank you! I can definitely see the speed difference.
Unfortunately my typecasting code is still somehow faster (on Chrome as well), but if this landed a couple of years ago I'd probably be using it :P
Comment 37•4 years ago
|
||
...and then I realized that I am forgetting this will only be in 78 and I wasn't even testing it.
Currently on 76, reading with DataView seems to be about 4 times slower than Chrome, while writing is faster, not sure why.
Would be interesting to see how this optimization works.
Right now it is still faster on both browsers to explicitly call for example a function that takes 4 numbers, sets 4 indices of a Uint8Array, and returns the resulting number from a Float32Array that looks at the same buffer, all preallocated at script load time.
Comment 38•4 years ago
|
||
Any plans in backporting to the latest ESR?
Comment 39•4 years ago
|
||
(In reply to Marco Trevisan (Treviño) from comment #38)
Any plans in backporting to the latest ESR?
That's unlikely to happen, since 78 ESR is close -- and also because this bug has been languishing for 6 years, not exactly making it a poster child for something that is backport-worthy.
Comment 40•4 years ago
|
||
Just updated this example [1] to DataView and tried it in Firefox 76, Firefox Nightly and Chrome 81. That worker now runs double as fast as previously and performance is on par with that of Chrome.
Great update, thanks!
[1] https://github.com/potree/potree/blob/develop/src/workers/BinaryDecoderWorker.js
Updated•4 years ago
|
Updated•4 years ago
|
Description
•