js::jit::GenerateCode spends a lot of time in malloc lock contention
Categories
(Core :: JavaScript Engine: JIT, defect, P3)
Performance Impact: medium
People
(Reporter: denispal, Assigned: nbp)
References
(Depends on 1 open bug, Blocks 3 open bugs)
Details
(Whiteboard: [sp3])
You can see a lot of time spent trying to grow Vectors during code generation, which leads to a lot of malloc lock contention. This was measured during the NewsSite-Next test of Speedometer3.
Comment 1•4 months ago
The code generator has vectors for OsiPoint and safepoint indexes. I think we should be able to pre-allocate these vectors to the right size if we track how many of these there are before codegen.
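The pre-sizing idea above can be sketched as follows. This is a hedged illustration using `std::vector` as a stand-in for `mozilla::Vector` (which has an analogous `reserve()`); `OsiIndexEntry`, `buildOsiIndexes`, and the offsets are hypothetical names for the example, not the real `js::jit::OsiIndex` layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical entry; stands in for js::jit::OsiIndex.
struct OsiIndexEntry {
  size_t callPointOffset;
  size_t snapshotOffset;
};

// Sketch: count the entries in a pre-pass before codegen, then make a
// single allocation instead of repeatedly growing the vector (and taking
// the malloc lock) while generating code.
std::vector<OsiIndexEntry> buildOsiIndexes(size_t osiPointCount) {
  std::vector<OsiIndexEntry> indexes;
  indexes.reserve(osiPointCount);  // one allocation up front
  for (size_t i = 0; i < osiPointCount; i++) {
    // push_back never reallocates here because capacity is reserved.
    indexes.push_back({i * 8, i * 16});  // offsets are placeholder values
  }
  return indexes;
}
```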
Comment 2•4 months ago
Also maybe giving the vectors some inline capacity would help. We should collect some data to see how large they get for typical functions.
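A minimal sketch of the inline-capacity idea: `mozilla::Vector<T, N>` keeps its first N elements inside the object itself, so typical small functions never touch malloc. This toy `InlineVec` only illustrates the technique and is not the real `mozilla::Vector`; N would be chosen from the size data the comment proposes collecting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy small-buffer vector: the first N elements live in inline storage,
// and only pushes beyond N fall back to heap allocation.
template <typename T, size_t N>
class InlineVec {
  T inline_[N];            // inline storage, no allocation needed
  std::vector<T> heap_;    // overflow storage once N is exceeded
  size_t inlineLen_ = 0;

 public:
  void push(const T& v) {
    if (inlineLen_ < N && heap_.empty()) {
      inline_[inlineLen_++] = v;  // stays in the object, malloc-free
    } else {
      heap_.push_back(v);         // spills to the heap
    }
  }
  size_t size() const { return inlineLen_ + heap_.size(); }
  bool spilledToHeap() const { return !heap_.empty(); }
  const T& operator[](size_t i) const {
    return i < inlineLen_ ? inline_[i] : heap_[i - inlineLen_];
  }
};
```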
Updated•4 months ago (Assignee)
Comment 3•4 months ago
Looking at the profile, I see that:
| # samples | Time under | Frame |
|---|---|---|
| 308 | 104ms | js::jit::CompactBufferWriter::writeByte |
| 226 | 72ms | js_pod_arena_realloc<js::jit::OutOfLineCode*> |
| 153 | 43ms | mozilla::Vector<js::jit::OsiIndex, …>::growStorageBy |
| 148 | 42ms | mozilla::Vector<js::jit::CodegenSafepointIndex, …>::growStorageBy |
For the OutOfLineCode* entries, I do not see any reason to use a vector; they can probably be allocated in the LifoAlloc.

For the OsiIndex and CodegenSafepointIndex vectors, we can count the number of OsiPoints inserted and make a single allocation.

The CompactBufferWriter encapsulates the SafepointWriter and SnapshotWriter. These can probably be pre-allocated using some heuristics based on the number of OsiPoints and the number of ResumePoint operands.
Comment 4•4 months ago
Nicolas, maybe you can look into fixing this.
Updated•4 months ago (Assignee)
Updated•4 months ago
Comment 5•4 months ago
One of the dependent patches (or all of them combined) had a relatively large improvement on Linux/Mac compared to Windows:
WebAssembly Embenchen-compile: 2.4% on Linux / 1.2% on macOS
Updated•4 months ago
Comment 6•3 months ago
With the three dependencies fixed, is it worth rechecking this now?
Reporter
Comment 7•3 months ago
I have not done a direct before & after comparison, but I did recently measure Ion compile time as a whole and compared to my previous measurements in August we are much faster in RegAlloc and Codegen times.
August:
- RegAlloc: 11x worse than Maglev
- Codegen: 1.6x worse than Maglev

October:
- RegAlloc: 8x worse than Maglev
- Codegen: parity with Maglev
So, these are really great improvements!
I collected a new profile that contains these fixes (https://share.firefox.dev/3YOVBVo), and thanks to this work I see most of the previous hot spots eliminated. There is still some malloc lock contention, but it is all coming from js::jit::CompactBufferWriter::writeByte, which presumably will be much harder to eliminate. Perhaps we can pre-allocate this vector based on the bytecode size of the function?
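Pre-allocating from bytecode size, as suggested above, could look roughly like this. `ByteWriter` is a hypothetical stand-in for `js::jit::CompactBufferWriter`, and the one-output-byte-per-bytecode-byte ratio is a guessed heuristic, not a measured constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy byte writer that sizes its buffer from the function's bytecode
// length up front, so writeByte rarely has to grow the storage (and
// contend on the malloc lock) during code generation.
class ByteWriter {
  std::vector<uint8_t> buf_;

 public:
  explicit ByteWriter(size_t bytecodeSize) {
    // Assumed heuristic: roughly one output byte per bytecode byte.
    buf_.reserve(bytecodeSize);
  }
  void writeByte(uint8_t b) { buf_.push_back(b); }
  size_t length() const { return buf_.size(); }
};
```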
Comment 8•3 months ago
Glancing at the profile, it looks like the main uses of CompactBufferWriter are in these three per-CodeGenerator streams, so pre-allocation doesn't seem crazy.
Denis: Do we have any estimate of how much improvement we saw from reducing regalloc contention / how much we were allocating there vs here? In other words: how valuable do we think this would be?
Reporter
Comment 9•3 months ago
(In reply to Iain Ireland [:iain] from comment #8)
> Denis: Do we have any estimate of how much improvement we saw from reducing regalloc contention / how much we were allocating there vs here? In other words: how valuable do we think this would be?
It's hard to accurately estimate the impact since this work is done off-thread and subject to a lot of varying conditions. However, for SP3 I would estimate it will be very minor. Most of the time there is spent in Baseline, and improving Ion codegen any further would probably only be a minor improvement to overall Ion compile time. We're still spending most of that time in regalloc. This might be more impactful if we decide to lower Ion thresholds to get more functions into Ion sooner, but there are bigger issues at play.
Comment 10•3 months ago
Sounds reasonable. I'm going to bump this down in priority, since we've already extracted most of the benefit.
In the future, I do think that improving how we encode snapshots / safepoints / recover instructions is one of the more promising opportunities for reducing off-thread Ion compile time.
Assignee
Comment 11•3 months ago
I collected a bunch of stats from our test suite in order to predict the size of the compact buffers, so far without success. I am planning to give it more CPU time to see whether we can find an upper bound.
Updated•3 months ago