Closed Bug 1920430 Opened 24 days ago Closed 16 days ago

Use a better representation for liveIn BitSets in register allocator

Categories

(Core :: JavaScript Engine: JIT, task, P2)

task

Tracking

()

RESOLVED FIXED
133 Branch
Tracking Status
firefox133 --- fixed

People

(Reporter: jandem, Assigned: jandem)

References

Details

(Keywords: perf-alert)

Attachments

(1 file)

The register allocator allocates a BitSet for each basic block, with a bit for each virtual register. For very large graphs this is both slow and wasteful because these bit sets will be very sparse.

I have patches that change this to a different bit set representation based on an InlineMap, storing 32 bits for each entry. Most maps end up with a relatively small number (<= 8) of entries.

This improves Ion compilation time for the Wasm module in bug 1916442 from ~7.8 seconds to ~5.4 seconds in the JS shell on Linux x64.

Depends on: 1920433

The register allocator currently allocates a BitSet for each basic block, with a bit
for each virtual register. For very large graphs this is both slow and wasteful because
these bit sets will be very sparse.

This patch adds a SparseBitSet class that uses an InlineMap, storing 32 bits per entry.
Most maps end up with a relatively small number of entries and these can be stored
inline.

This improves Ion compilation time for the Wasm module in bug 1916442 from ~7.8 seconds
to ~5.4 seconds in the JS shell on Linux x64.

Cranelift made a similar change to their regalloc2 allocator.

Severity: -- → N/A
Priority: -- → P2
Pushed by jdemooij@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/20d0b0e92cd3 Add jit::SparseBitSet and use it for virtual register bit sets. r=jseward
Status: ASSIGNED → RESOLVED
Closed: 16 days ago
Resolution: --- → FIXED
Target Milestone: --- → 133 Branch

2.2% improvement on AWFY-webassembly-embenchen-box2d

(In reply to Pulsebot from comment #2)

Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/20d0b0e92cd3
Add jit::SparseBitSet and use it for virtual register bit sets. r=jseward

Perfherder has detected a talos performance change from push 20d0b0e92cd39555b05b12b7b31659b0466f536f.

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
5% pdfpaint issue16782.pdf macosx1015-64-shippable-qr e10s fission stylo webrender-sw 400.50 -> 378.85
5% pdfpaint issue3666.pdf macosx1015-64-shippable-qr e10s fission stylo webrender-sw 675.60 -> 642.67
4% pdfpaint issue16782.pdf macosx1015-64-shippable-qr e10s fission stylo webrender 397.81 -> 381.26
3% pdfpaint issue3591.pdf macosx1015-64-shippable-qr e10s fission stylo webrender-sw 581.38 -> 562.29
3% pdfpaint issue5481.pdf macosx1015-64-shippable-qr e10s fission stylo webrender-sw 487.56 -> 475.07
... ... ... ... ...
2% pdfpaint issue5549.pdf linux1804-64-shippable-qr e10s fission stylo webrender 533.64 -> 522.94

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 2301

For more information on performance sheriffing please see our FAQ.

Keywords: perf-alert
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: