Code generated by Simple Allocator is ~2x slower on many tests of Bug 1884572 (https://gorhill.github.io/lz4-wasm/test/index.html) for both JS and WASM
Categories
(Core :: JavaScript Engine, task, P5)
Tracking
()
People
(Reporter: mayankleoboy1, Unassigned)
Attachments
(1 file, 1 obsolete file)
775.99 KB,
application/x-zip-compressed
Go to https://gorhill.github.io/lz4-wasm/test/index.html
Unrar the attached sample text and use it as the input there.
Compare the results with the Simple Allocator vs. the Backtracking Allocator.
Backtracking (BT) Allocator:
Compress:
- zhipeng-jia/snappyjs x 8.55 ops/sec ±0.54% (26 runs sampled): 692 MB/s
- pierrec/node-lz4 x 6.99 ops/sec ±4.29% (21 runs sampled): 566 MB/s
- gorhill/lz4-block.js x 15.63 ops/sec ±1.32% (30 runs sampled): 1266 MB/s
- gorhill/lz4-block.wasm x 11.96 ops/sec ±0.30% (34 runs sampled): 968 MB/s
Done.
Uncompress:
- zhipeng-jia/snappyjs x 10.25 ops/sec ±0.99% (29 runs sampled): 830 MB/s
- pierrec/node-lz4 x 7.84 ops/sec ±3.92% (24 runs sampled): 634 MB/s
- gorhill/lz4-block.js x 16.39 ops/sec ±2.51% (32 runs sampled): 1327 MB/s
- gorhill/lz4-block.wasm x 43.62 ops/sec ±0.44% (47 runs sampled): 3532 MB/s
Done.
Simple Allocator:
Compress:
- zhipeng-jia/snappyjs x 5.80 ops/sec ±0.47% (19 runs sampled): 469 MB/s
- pierrec/node-lz4 x 4.49 ops/sec ±6.79% (16 runs sampled): 363 MB/s
- gorhill/lz4-block.js x 9.56 ops/sec ±0.83% (28 runs sampled): 774 MB/s
- gorhill/lz4-block.wasm x 10.37 ops/sec ±0.24% (30 runs sampled): 839 MB/s
Done.
Uncompress:
- zhipeng-jia/snappyjs x 5.78 ops/sec ±1.47% (19 runs sampled): 468 MB/s
- pierrec/node-lz4 x 4.74 ops/sec ±1.31% (16 runs sampled): 383 MB/s
- gorhill/lz4-block.js x 7.92 ops/sec ±0.35% (24 runs sampled): 641 MB/s
- gorhill/lz4-block.wasm x 26.69 ops/sec ±0.22% (37 runs sampled): 2161 MB/s
Done.
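To put a number on each slowdown, the per-test ratios can be computed from the ops/sec figures above. This is only a minimal Python sketch: the `RESULTS` layout and the `slowdown` helper are illustrative names, not part of the benchmark page.

```python
# ops/sec copied from the results above, as (Backtracking, Simple) pairs.
RESULTS = {
    ("compress", "zhipeng-jia/snappyjs"): (8.55, 5.80),
    ("compress", "pierrec/node-lz4"): (6.99, 4.49),
    ("compress", "gorhill/lz4-block.js"): (15.63, 9.56),
    ("compress", "gorhill/lz4-block.wasm"): (11.96, 10.37),
    ("uncompress", "zhipeng-jia/snappyjs"): (10.25, 5.78),
    ("uncompress", "pierrec/node-lz4"): (7.84, 4.74),
    ("uncompress", "gorhill/lz4-block.js"): (16.39, 7.92),
    ("uncompress", "gorhill/lz4-block.wasm"): (43.62, 26.69),
}


def slowdown(bt_ops, simple_ops):
    """Return how many times slower the Simple Allocator run is."""
    return bt_ops / simple_ops


for (phase, name), (bt, simple) in RESULTS.items():
    print(f"{phase:10s} {name:24s} {slowdown(bt, simple):.2f}x")
```

On these numbers the ratios range from about 1.15x (gorhill/lz4-block.wasm compress) to about 2.07x (gorhill/lz4-block.js uncompress).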
This benchmark is known to be regalloc heavy (see analysis comments in bug 1884572). But maybe the JS tests shouldn't be slower with the Simple Allocator?
I don't think this is a regression, so I will mark it as blocking bug 1958280.
Comment 2 (Reporter) • 14 days ago
Samply profiles:
Simple allocator: https://share.firefox.dev/42ZsGii (52s)
BT Allocator: https://share.firefox.dev/4iKiRdC (50s)
So despite all the ~2x slowdowns that the benchmark reports, the total time difference is only 2 seconds on a base of 50 seconds (i.e. 4%).
Maybe the title of the bug should be "Code generated by Simple Allocator is ~2x slower than code generated by BT Allocator"
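The 4% figure follows directly from the two profile durations quoted above. A small Python sketch of the arithmetic, where `relative_increase` is a hypothetical helper name used only for illustration:

```python
def relative_increase(base_s, other_s):
    """Percentage by which other_s exceeds base_s."""
    return (other_s - base_s) / base_s * 100.0


bt_total_s = 50.0      # BT Allocator profile duration, from the comment above
simple_total_s = 52.0  # Simple allocator profile duration, from the comment above

print(f"{relative_increase(bt_total_s, simple_total_s):.1f}%")
```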
Comment 3 • 12 days ago
I can't unrar the attachment on Linux from the file browser ("Declared dictionary size is not supported").
The profile shows we're spending most of the time in some very tight loops, and that's where the simple allocator performs most poorly, so these results are not entirely unexpected. We could check if there's low-hanging fruit, but this isn't high priority for now.