Open Bug 1922073 Opened 4 months ago Updated 1 month ago

js::jit::GenerateCode spends a lot of time in malloc lock contention

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

ARM64
Android
defect

Tracking

ASSIGNED
Performance Impact medium

People

(Reporter: denispal, Assigned: nbp)

References

(Depends on 1 open bug, Blocks 3 open bugs)

Details

(Whiteboard: [sp3])

You can see a lot of time spent trying to grow Vectors during code generation which leads to a lot of lock contention. This was measured during the NewsSite-Next test of Speedometer3.

https://share.firefox.dev/3NakO5Y

The code generator has vectors for OsiPoint and safepoint indexes. I think we should be able to pre-allocate these vectors to the right size if we track how many of these there are before codegen.

Also maybe giving the vectors some inline capacity would help. We should collect some data to see how large they get for typical functions.

Flags: needinfo?(jdemooij)

Looking at the profile, I see that:

# samples   Time    Under frame
      308   104ms   js::jit::CompactBufferWriter::writeByte
      226    72ms   js_pod_arena_realloc<js::jit::OutOfLineCode*>
      153    43ms   mozilla::Vector<js::jit::OsiIndex, …>::growStorageBy
      148    42ms   mozilla::Vector<js::jit::CodegenSafepointIndex, …>::growStorageBy

For the OutOfLineCode*, I do not see any reason to use a vector; these entries can probably be allocated in the LifoAlloc.

For the OsiIndex and CodegenSafepointIndex, we can count the number of OsiPoints inserted and make a single allocation.

For the CompactBufferWriter, which backs the SafepointWriter and SnapshotWriter, we can probably pre-allocate using heuristics based on the number of OsiPoints and the number of ResumePoint operands.
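The OutOfLineCode suggestion above amounts to replacing `Vector<OutOfLineCode*>` with an intrusive list whose nodes live in the arena, so appending never reallocates at all. A minimal sketch, with a tiny bump allocator standing in for SpiderMonkey's LifoAlloc and a stripped-down `OutOfLineCode` (names and fields are illustrative, not the real jit classes):

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Tiny bump allocator standing in for LifoAlloc: allocations are carved
// out of one chunk and freed all at once. No overflow handling in this
// sketch; the real LifoAlloc chains chunks.
class Arena {
    alignas(std::max_align_t) char storage_[1 << 16];
    size_t used_ = 0;
public:
    void* alloc(size_t n) {
        n = (n + alignof(std::max_align_t) - 1) &
            ~(alignof(std::max_align_t) - 1);
        if (used_ + n > sizeof(storage_)) return nullptr;
        void* p = storage_ + used_;
        used_ += n;
        return p;
    }
};

// OutOfLineCode with an intrusive `next` link: the list lives entirely
// in the arena, so appending never touches malloc, unlike a growing
// Vector<OutOfLineCode*>.
struct OutOfLineCode {
    int id;
    OutOfLineCode* next = nullptr;
};

struct OutOfLineList {
    OutOfLineCode* head = nullptr;
    OutOfLineCode** tail = &head;
    void append(OutOfLineCode* ool) { *tail = ool; tail = &ool->next; }
};

OutOfLineCode* newOutOfLineCode(Arena& arena, int id) {
    return new (arena.alloc(sizeof(OutOfLineCode))) OutOfLineCode{id};
}
```

The trade-off is losing random access, but codegen only ever appends and then iterates the out-of-line paths in order, so a list suffices.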

Nicolas, maybe you can look into fixing this.

Flags: needinfo?(jdemooij) → needinfo?(nicolas.b.pierron)
Depends on: 1922259
Blocks: sm-runtime
Severity: -- → S3
Priority: -- → P2
Whiteboard: [js-perf-next]
Depends on: 1922829
Assignee: nobody → nicolas.b.pierron
Status: NEW → ASSIGNED
Flags: needinfo?(nicolas.b.pierron)
Depends on: 1923090

Conventional media will not tell you this:
One of the dependent patches (or all of them combined) has had a _relatively large_ improvement on Linux/Mac compared to Windows:

WebAssembly Embenchen-compile: 2.4% on Linux / 1.2% on macOS

Whiteboard: [js-perf-next] → [js-perf-next][sp3]

With the three dependencies fixed, is it worth rechecking this now?

Flags: needinfo?(dpalmeiro)

I have not done a direct before & after comparison, but I did recently measure Ion compile time as a whole; compared to my previous measurements from August, we are much faster in both RegAlloc and Codegen.

August:
RegAlloc 11x worse than Maglev
Codegen 1.6x worse than Maglev

October:
RegAlloc 8x worse than Maglev
Codegen has parity with Maglev

So, these are really great improvements!

I collected a new profile that contains these fixes (https://share.firefox.dev/3YOVBVo), and thanks to this work I see most of the previous hot spots eliminated. There is still some malloc lock contention, but it is all coming from js::jit::CompactBufferWriter::writeByte, which will presumably be much harder to eliminate. Perhaps we can pre-allocate this vector based on the bytecode size of the function?

Flags: needinfo?(dpalmeiro)

Glancing at the profile, it looks like the main uses of CompactBufferWriter are in these three per-CodeGenerator streams, so pre-allocation doesn't seem crazy.

Denis: Do we have any estimate of how much improvement we saw from reducing regalloc contention / how much we were allocating there vs here? In other words: how valuable do we think this would be?

(In reply to Iain Ireland [:iain] from comment #8)

Denis: Do we have any estimate of how much improvement we saw from reducing regalloc contention / how much we were allocating there vs here? In other words: how valuable do we think this would be?

It's hard to accurately estimate the impact since this work is done off-thread and subject to a lot of varying conditions. However, for SP3 I would estimate it will be very minor. Most of the time there is spent in Baseline, and improving Ion codegen any further would probably only be a minor improvement to overall Ion compile time. We're still spending most of that time in regalloc. This might be more impactful if we decide to lower Ion thresholds to get more functions into Ion sooner, but there are bigger issues at play.

Sounds reasonable. I'm going to bump this down in priority, since we've already extracted most of the benefit.

In the future, I do think that improvements to how we encode snapshots / safepoints / recover instructions are one of the more promising opportunities for reducing off-thread Ion compile time.

Priority: P2 → P3

I collected a bunch of stats from our test suite in order to predict the size of the compact buffers, so far without success.
I am planning to give it more CPU time to see if we can find an upper bound.
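The kind of offline analysis described here could be sketched as follows: given (bytecodeLength, finalBufferSize) pairs sampled from a test-suite run, compute the per-function ratio and take a high percentile as a candidate pre-allocation factor. All numbers and names are illustrative, not actual SpiderMonkey data:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Given (bytecodeLength, finalBufferSize) samples, return the ratio at
// the requested percentile (pct in [0, 1]). Using a percentile rather
// than the max keeps one pathological function from inflating the
// reservation for everyone.
double percentileRatio(const std::vector<std::pair<size_t, size_t>>& samples,
                       double pct) {
    std::vector<double> ratios;
    ratios.reserve(samples.size());
    for (const auto& [bytecode, buffer] : samples) {
        ratios.push_back(double(buffer) / double(bytecode));
    }
    std::sort(ratios.begin(), ratios.end());
    size_t idx = size_t(pct * (ratios.size() - 1));
    return ratios[idx];
}
```

If no percentile gives a usable linear bound, that would explain the lack of success so far: the buffer size may simply not be a function of bytecode length alone.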

Whiteboard: [js-perf-next][sp3] → [sp3]
Depends on: 1940069