Use a better representation for liveIn BitSets in register allocator
Categories
(Core :: JavaScript Engine: JIT, task, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox133 | --- | fixed |
People
(Reporter: jandem, Assigned: jandem)
References
Details
(Keywords: perf-alert)
Attachments
(1 file)
The register allocator allocates a BitSet for each basic block, with a bit for each virtual register. For very large graphs this is both slow and wasteful because these bit sets will be very sparse.
I have patches that change this to a different bit set representation based on an InlineMap
, storing 32 bits for each entry. Most maps end up with a relatively small number (<= 8) of entries.
This improves Ion compilation time for the Wasm module in bug 1916442 from ~7.8 seconds to ~5.4 seconds in the JS shell on Linux x64.
Assignee | ||
Comment 1•24 days ago
|
||
The register allocator currently allocates a BitSet for each basic block, with a bit
for each virtual register. For very large graphs this is both slow and wasteful because
these bit sets will be very sparse.
This patch adds a SparseBitSet
class that uses an InlineMap
, storing 32 bits per entry.
Most maps end up with a relatively small number of entries and these can be stored
inline.
This improves Ion compilation time for the Wasm module in bug 1916442 from ~7.8 seconds
to ~5.4 seconds in the JS shell on Linux x64.
Cranelift made a similar change to their regalloc2 allocator.
Updated•22 days ago
|
Comment 3•16 days ago
|
||
bugherder |
Comment 5•8 days ago
|
||
(In reply to Pulsebot from comment #2)
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/20d0b0e92cd3
Add jit::SparseBitSet and use it for virtual register bit sets. r=jseward
Perfherder has detected a talos performance change from push 20d0b0e92cd39555b05b12b7b31659b0466f536f.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|
5% | pdfpaint issue16782.pdf | macosx1015-64-shippable-qr | e10s fission stylo webrender-sw | 400.50 -> 378.85 |
5% | pdfpaint issue3666.pdf | macosx1015-64-shippable-qr | e10s fission stylo webrender-sw | 675.60 -> 642.67 |
4% | pdfpaint issue16782.pdf | macosx1015-64-shippable-qr | e10s fission stylo webrender | 397.81 -> 381.26 |
3% | pdfpaint issue3591.pdf | macosx1015-64-shippable-qr | e10s fission stylo webrender-sw | 581.38 -> 562.29 |
3% | pdfpaint issue5481.pdf | macosx1015-64-shippable-qr | e10s fission stylo webrender-sw | 487.56 -> 475.07 |
... | ... | ... | ... | ... |
2% | pdfpaint issue5549.pdf | linux1804-64-shippable-qr | e10s fission stylo webrender | 533.64 -> 522.94 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 2301
For more information on performance sheriffing please see our FAQ.
Description
•