Register allocation and Ion compile times are too slow on Android during Speedometer3
Categories: Core :: JavaScript Engine: JIT, defect, P2
Performance Impact: medium
People: (Reporter: denispal, Unassigned)
References: (Depends on 1 open bug, Blocks 2 open bugs)
Whiteboard: [sp3]
Attachments: 1 file (112.09 KB, image/png)
Ion compile times can be very slow on Android. A significant problem we have running Speedometer3 is that we do not spend enough time in Ion during the short subtest windows, and making compiles faster should theoretically help us reach Ion sooner for more functions.
I've attached some examples from perfetto of the NewsSite-Next subtest, where Ion compilations are taking up to 24ms but the subtest itself is only 83ms long. Since we start compiling so late due to thresholds, we don't usually have much time in this tier during execution at all.
A significant portion of the Ion compile time is spent during register allocation. It is often around 50% of the compile time, and can sometimes be as high as 80%+. It might be useful to experiment with a linear scan allocator to minimize this time.
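For context, a minimal sketch of what a linear scan allocator does (in the classic Poletto & Sarkar style, not SpiderMonkey code): live intervals are walked in order of start position, expired intervals free their registers, and when no register is free the interval ending furthest away is spilled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// One live range per virtual register, half-open [start, end).
struct Interval {
    uint32_t vreg;
    uint32_t start, end;
    int assigned = -1;  // physical register, or -1 if spilled
};

void linearScan(std::vector<Interval>& intervals, int numRegs) {
    std::sort(intervals.begin(), intervals.end(),
              [](const Interval& a, const Interval& b) { return a.start < b.start; });

    // Active intervals, ordered by increasing end point.
    auto byEnd = [](Interval* a, Interval* b) {
        return a->end != b->end ? a->end < b->end : a->vreg < b->vreg;
    };
    std::set<Interval*, decltype(byEnd)> active(byEnd);

    std::vector<int> freeRegs;
    for (int r = numRegs - 1; r >= 0; r--) freeRegs.push_back(r);

    for (auto& cur : intervals) {
        // Expire intervals that ended before the current one starts.
        while (!active.empty() && (*active.begin())->end <= cur.start) {
            freeRegs.push_back((*active.begin())->assigned);
            active.erase(active.begin());
        }
        if (!freeRegs.empty()) {
            cur.assigned = freeRegs.back();
            freeRegs.pop_back();
            active.insert(&cur);
        } else {
            // No register free: spill whichever interval ends last.
            Interval* victim = *active.rbegin();
            if (victim->end > cur.end) {
                cur.assigned = victim->assigned;
                victim->assigned = -1;
                active.erase(std::prev(active.end()));
                active.insert(&cur);
            }  // else cur itself stays spilled (assigned == -1)
        }
    }
}
```

The appeal for a JIT is the single O(n log n) pass over intervals, versus the repeated eviction/retry rounds a backtracking allocator performs; the cost is worse allocation quality on code with many overlapping ranges.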
Comment 1 • 5 months ago (Reporter)
A simpleperf profile of the compile times during the NewsSite-Next subtest: https://share.firefox.dev/4fOV4cq. Roughly 40% of the time is spent in regalloc.
Comment 2 • 5 months ago
Yulia mentioned a research paper last week in which they select which virtual registers get a chance to be in a register by measuring the density of each virtual register's uses in a window around the instruction being studied. Maybe a different approach like this one could be more efficient for a JIT.
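A rough sketch of that density idea, assuming a very simplified model (the function name, data layout, and tie-breaking are mine, not the paper's): for the instruction at index `at`, count each virtual register's uses within `window` instructions on either side, then hand the available registers to the densest vregs.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// usesAt[i] lists the virtual registers used by instruction i.
// Returns up to numRegs vreg ids, densest first (ties broken by lower id).
std::vector<uint32_t> densestVregs(const std::vector<std::vector<uint32_t>>& usesAt,
                                   size_t at, size_t window, size_t numRegs) {
    size_t lo = at > window ? at - window : 0;
    size_t hi = std::min(usesAt.size(), at + window + 1);

    // Count uses of each vreg inside the window.
    std::unordered_map<uint32_t, uint32_t> density;
    for (size_t i = lo; i < hi; i++)
        for (uint32_t vreg : usesAt[i])
            density[vreg]++;

    // Rank vregs by descending use count.
    std::vector<std::pair<uint32_t, uint32_t>> ranked(density.begin(), density.end());
    std::sort(ranked.begin(), ranked.end(), [](auto& a, auto& b) {
        return a.second != b.second ? a.second > b.second : a.first < b.first;
    });

    std::vector<uint32_t> winners;
    for (size_t i = 0; i < ranked.size() && i < numRegs; i++)
        winners.push_back(ranked[i].first);
    return winners;
}
```

The attraction for a JIT is that this is a local, single-pass heuristic: no global live-range analysis or backtracking, just a sliding-window count.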
Comment 3 • 5 months ago
> It might be useful to experiment with a linear scan allocator to minimize
> this time.
Building a new allocator and getting it production-ready is a big undertaking.
There are a couple of things we could try to make the existing allocator
modestly faster:
- On mobile, skip the spill-bundle allocation loop (tryAllocatingRegistersForSpillBundles). Per comments at [1], this chews up a bunch of time but almost never improves the allocation.
- We know that when allocating large functions, the RA causes a large number of cache misses because it repeatedly traverses large AVL trees (of register commitments). We could try to reduce the footprint of the trees by replacing the inter-node pointers with 32-bit array indices -- a relatively easy change. Or we could replace the trees with B-trees, which are claimed to be more cache-friendly.
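The second suggestion can be sketched as follows: keep all nodes in one contiguous pool and link them with 32-bit indices instead of 64-bit pointers, so nodes shrink and traversals touch fewer cache lines. For brevity this uses an unbalanced BST where the real allocator uses AVL trees, and all names are illustrative, not SpiderMonkey's.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kNil = 0xFFFFFFFF;  // "null" child index

struct Node {
    uint32_t left = kNil, right = kNil;  // 4-byte links vs 8-byte pointers
    int32_t key;
};

struct IndexTree {
    std::vector<Node> pool;  // all nodes live contiguously here
    uint32_t root = kNil;

    void insert(int32_t key) {
        uint32_t n = uint32_t(pool.size());
        pool.push_back(Node{kNil, kNil, key});
        // Walk down from the root, following the link we will overwrite.
        uint32_t* link = &root;
        while (*link != kNil) {
            Node& cur = pool[*link];
            link = key < cur.key ? &cur.left : &cur.right;
        }
        *link = n;
    }

    bool contains(int32_t key) const {
        uint32_t i = root;
        while (i != kNil) {
            if (key == pool[i].key) return true;
            i = key < pool[i].key ? pool[i].left : pool[i].right;
        }
        return false;
    }
};
```

Besides halving link size, the pool layout means successive insertions are adjacent in memory, which is where the cache-miss reduction for large trees would come from.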
> Yulia was mentioning a research paper last week where they select which
> virtual registers gets a chance to be in a register by measuring the density
> of the virtual register uses
The paper is an interesting read. If we do want to try out a new allocator
design, I think this might be worth trying instead of a linear-scan allocator.
[1] https://searchfox.org/mozilla-central/source/js/src/jit/BacktrackingAllocator.cpp#4626
Comment 4 • 5 months ago
I am going to try Julian's suggestion above of skipping the call to tryAllocatingRegistersForSpillBundles. I'm curious to see how much that changes compilation time and how it impacts overall performance.
Comment 5 • 5 months ago
I tried this out and here is the comparison report (on Linux and Windows 11 at least - the A51 jobs never ran for some reason). At least on those platforms, it seems to not make much difference.
I think that I also ran the comparison on Pixel 6 - let me see if I can find that.