Bug 1913842 (Open) - Opened 5 months ago, updated 4 months ago

Register allocation and Ion compile times are too slow on Android during Speedometer3

Component: Core :: JavaScript Engine: JIT
Type: defect
Priority: P2
Performance Impact: medium
Reporter: denispal
Assignee: Unassigned
References: depends on 1 open bug, blocks 2 open bugs
Whiteboard: [sp3]
Attachments: 1 file

Ion compile times can be very slow on Android. A significant problem when running Speedometer3 is that we do not spend enough time in Ion code during the short subtest windows; making compilation faster should, in theory, let more functions reach the Ion tier sooner.

I've attached some examples from a Perfetto trace of the NewsSite-Next subtest, where individual Ion compilations take up to 24 ms while the subtest itself is only 83 ms long. Because warm-up thresholds make us start compiling so late, we usually get very little execution time in this tier at all.

A significant portion of the Ion compile time is spent in register allocation: often around 50% of the compile time, and sometimes over 80%. It might be useful to experiment with a linear scan allocator to minimize this time.
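For reference, the classic linear-scan algorithm (Poletto and Sarkar) is simple enough to sketch in a few dozen lines. This is a generic illustration with invented types (`Interval`, `linearScan`), not SpiderMonkey code, and it ignores fixed registers, register classes, and split points that a production allocator must handle:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Each live interval [start, end) for a virtual register either gets one of
// `numRegs` physical registers or is spilled (reg stays -1).
struct Interval {
    int start, end;  // live range, half-open
    int reg = -1;    // assigned physical register, or -1 if spilled
};

void linearScan(std::vector<Interval>& intervals, int numRegs) {
    std::vector<Interval*> order;
    for (auto& iv : intervals) order.push_back(&iv);
    std::sort(order.begin(), order.end(),
              [](Interval* a, Interval* b) { return a->start < b->start; });

    std::vector<int> freeRegs;
    for (int r = numRegs - 1; r >= 0; --r) freeRegs.push_back(r);

    // Currently-live intervals, ordered by increasing end point.
    auto byEnd = [](Interval* a, Interval* b) {
        return a->end < b->end || (a->end == b->end && a < b);
    };
    std::set<Interval*, decltype(byEnd)> active(byEnd);

    for (Interval* cur : order) {
        // Expire intervals that ended before the current one starts,
        // returning their registers to the free pool.
        while (!active.empty() && (*active.begin())->end <= cur->start) {
            freeRegs.push_back((*active.begin())->reg);
            active.erase(active.begin());
        }
        if (!freeRegs.empty()) {
            cur->reg = freeRegs.back();
            freeRegs.pop_back();
            active.insert(cur);
        } else {
            // All registers taken: spill whichever live interval ends last.
            Interval* last = *std::prev(active.end());
            if (last->end > cur->end) {
                cur->reg = last->reg;  // steal its register
                last->reg = -1;        // spill `last`
                active.erase(last);
                active.insert(cur);
            }                          // else spill `cur` (reg stays -1)
        }
    }
}
```

The appeal is the single O(n log n) pass: there is no backtracking, which is exactly the trade-off being discussed here, since the resulting allocations are generally worse than what a backtracking allocator produces.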

A simpleperf profile of the compile times during the NewsSite-Next subtest: https://share.firefox.dev/4fOV4cq. Roughly 40% of the time is spent in regalloc.

Yulia mentioned a research paper last week in which they select which virtual registers get a chance to be in a register by measuring the density of each virtual register's uses in a window around the instruction being considered. A different approach like this might be more efficient for a JIT.

Blocks: sm-opt-jits
Severity: -- → S4
Priority: -- → P2

> It might be useful to experiment with a linear scan allocator to minimize
> this time.

Building a new allocator and getting it production-ready is a big undertaking.
There are a couple of things we could try to make the existing allocator
modestly faster:

  • on mobile, skip the spill-bundle allocation loop
    (tryAllocatingRegistersForSpillBundles). Per comments at [1], this chews
    up a bunch of time but almost never improves the allocation.

  • we know that when allocating large functions, the RA causes a large number of
    cache misses because it repeatedly traverses large AVL trees (of register
    commitments). We could try to reduce the footprint of the trees by replacing
    the inter-node pointers with 32-bit array indices -- a relatively easy
    change. Or we could replace the trees with B-trees, which are claimed to be
    more cache-friendly.
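To make the second idea concrete, here is a toy sketch of index-based tree links (the names `IndexNode`, `IndexTree`, and the unbalanced insert are invented for illustration; the real register-commitment trees are balanced AVL trees in BacktrackingAllocator.cpp):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kNone = UINT32_MAX;  // sentinel "null" index

// 32-bit indices into a contiguous pool instead of 64-bit child pointers:
// each node shrinks from 24 bytes (with padding) to 12, and all nodes sit
// in one allocation, which should reduce cache misses during traversal.
struct IndexNode {
    int32_t key;
    uint32_t left = kNone;   // index into the pool, not a pointer
    uint32_t right = kNone;
};

class IndexTree {
    std::vector<IndexNode> pool_;  // all nodes live in one contiguous block
    uint32_t root_ = kNone;

  public:
    // Plain BST insert for illustration; a real allocator would rebalance.
    void insert(int32_t key) {
        uint32_t idx = uint32_t(pool_.size());
        pool_.push_back(IndexNode{key});
        if (root_ == kNone) { root_ = idx; return; }
        uint32_t cur = root_;
        for (;;) {
            uint32_t& next = key < pool_[cur].key ? pool_[cur].left
                                                  : pool_[cur].right;
            if (next == kNone) { next = idx; return; }
            cur = next;
        }
    }

    bool contains(int32_t key) const {
        uint32_t cur = root_;
        while (cur != kNone) {
            if (pool_[cur].key == key) return true;
            cur = key < pool_[cur].key ? pool_[cur].left : pool_[cur].right;
        }
        return false;
    }
};
```

A side benefit of the pool is that freeing the whole tree is one deallocation, which also matters for compile time.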

> Yulia mentioned a research paper last week in which they select which
> virtual registers get a chance to be in a register by measuring the density
> of the virtual register uses

The paper is an interesting read. If we do want to try out a new allocator
design, I think this might be worth trying instead of a linear-scan allocator.
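As I understand the idea from the comment above (I haven't reproduced the paper's actual algorithm; `useDensity` and `pickDensest` are invented names for a rough sketch), the heuristic at each program point would be something like:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct VReg {
    std::vector<int> usePositions;  // sorted instruction indices of uses
};

// Number of uses of `v` inside [pos - window, pos + window].
size_t useDensity(const VReg& v, int pos, int window) {
    size_t n = 0;
    for (int u : v.usePositions)
        if (u >= pos - window && u <= pos + window) ++n;
    return n;
}

// Indices of the `numRegs` densest vregs at `pos`. A full sort per program
// point is for clarity; a real allocator would maintain densities
// incrementally as the window slides.
std::vector<size_t> pickDensest(const std::vector<VReg>& vregs, int pos,
                                int window, size_t numRegs) {
    std::vector<std::pair<size_t, size_t>> scored;  // (density, vreg index)
    for (size_t i = 0; i < vregs.size(); ++i)
        scored.push_back({useDensity(vregs[i], pos, window), i});
    std::sort(scored.begin(), scored.end(),
              [](auto& a, auto& b) { return a.first > b.first; });
    std::vector<size_t> out;
    for (size_t i = 0; i < scored.size() && i < numRegs; ++i)
        out.push_back(scored[i].second);
    return out;
}
```

The attraction for a JIT is that the decision is purely local: there is no global live-range analysis or backtracking, so compile time scales with the number of uses rather than with interference structure.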

[1] https://searchfox.org/mozilla-central/source/js/src/jit/BacktrackingAllocator.cpp#4626

I am going to try Julian's suggestion above of skipping the call to tryAllocatingRegistersForSpillBundles. I'm curious to see how much that changes compilation time and how that impacts the overall performance.

I tried this out and here is the comparison report (on Linux and Windows 11 at least - the A51 jobs never ran for some reason). On those platforms, at least, it does not seem to make much difference.

I think that I also ran the comparison on Pixel 6 - let me see if I can find that.

Whiteboard: [sp3]
Depends on: 1922073
