Minor tuning of the wasm baseline compiler
Categories
(Core :: JavaScript: WebAssembly, enhancement, P3)
Tracking
Release | Tracking | Status
---|---|---
firefox104 | --- | fixed
People
(Reporter: jseward, Assigned: jseward)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
Measured on x86_64-linux, running as part of the browser, as compiled by
clang-13.0.0 at -O2, while compiling a LibreOffice (LO) nightly build from
https://wasm-test.libreoffice.org. There are many calls to four tiny
functions, which could profitably be either marked inline or, failing that,
split into an always-inline fast path and a never-inline slow path.
BI = billion instructions

Total cost of compilation:
  js::wasm::BaselineCompileFunctions  21.22 BI

of which

  js::jit::X86Encoding::BaseAssembler::X86InstructionFormatter::memoryModRM
    1.235 BI, 39.09 million calls
    31.6 insns per call

  js::wasm::IsHugeMemoryEnabled
    0.582 BI, 28.04 million calls
    20.8 insns per call

  js::wasm::CheckIsSubtypeOf
    1.156 BI, 26.54 million calls
    43.6 insns per call

  js::wasm::TypeContext::isSubtypeOf
    0.176 BI, 26.28 million calls
    6.7 insns per call (!)
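
The fast-path/slow-path split proposed above can be sketched roughly as follows. This is a minimal illustration only, not the SpiderMonkey code: `TypeTable`, `isSubtypeOf`, and `isSubtypeOfSlow` are hypothetical names, the macros are local stand-ins for compiler attributes, and the "identical types" shortcut is assumed to be the common case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins for "always inline" / "never inline" attributes (GCC/Clang).
#if defined(__GNUC__) || defined(__clang__)
#  define ALWAYS_INLINE inline __attribute__((always_inline))
#  define NEVER_INLINE __attribute__((noinline))
#else
#  define ALWAYS_INLINE inline
#  define NEVER_INLINE
#endif

// Hypothetical type table: parent[i] is the supertype index of type i,
// or -1 if type i has no supertype.
struct TypeTable {
  std::vector<int32_t> parent;
};

// Slow path: walk the supertype chain. Kept out of line so that every call
// site of isSubtypeOf stays small.
NEVER_INLINE bool isSubtypeOfSlow(const TypeTable& t, int32_t sub,
                                  int32_t super) {
  assert(sub >= 0 && static_cast<size_t>(sub) < t.parent.size());
  for (int32_t cur = t.parent[sub]; cur >= 0; cur = t.parent[cur]) {
    if (cur == super) {
      return true;
    }
  }
  return false;
}

// Fast path: a single comparison, inlined into the caller. Only the
// (assumed rarer) non-identical case pays for a real call.
ALWAYS_INLINE bool isSubtypeOf(const TypeTable& t, int32_t sub,
                               int32_t super) {
  if (sub == super) {
    return true;
  }
  return isSubtypeOfSlow(t, sub, super);
}
```

With tens of millions of calls at only a handful of instructions each, call overhead can be a large share of the cost, which is why moving the trivial check into the caller may pay off.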
Comment 1•3 years ago
As a result of profiling the baseline compiler compiling Photoshop, here are
two minor bits of tuning:
- For CatchInfoVector, use an inline size of one rather than zero. Using a
  zero inline size produces a heap allocation if a struct Control is called
  on to deal with even a single try-catch construction. Setting it to one
  removes a few allocations when compiling Photoshop. Setting it higher does
  not appear to have any effect.
- wasm::IsHugeMemoryEnabled is improved. This is more complex than it looks
  because (I think) it deals with atomics.
  - Don't query both IsHugeMemoryEnabledHelper32 and
    IsHugeMemoryEnabledHelper64 on each call.
  - Mark the latter two no-inline, so that the complex case (enabled32 or
    enabled64 has not been set) is not inlined into
    wasm::IsHugeMemoryEnabled.
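
To illustrate why an inline size of one avoids a heap allocation for the single try-catch case, here is a minimal sketch of a vector with compile-time inline capacity. `SmallVec` is a hypothetical toy class written for this example, not the vector type used in the patch; it handles only trivially copyable element types and counts its own heap allocations so the effect can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Toy vector with N elements of inline storage (hypothetical, for
// illustration only). Elements spill to the heap once N is exceeded.
template <typename T, size_t N>
class SmallVec {
  static_assert(std::is_trivially_copyable<T>::value,
                "this sketch only supports trivially copyable types");

  alignas(T) unsigned char inline_[(N > 0 ? N : 1) * sizeof(T)];
  T* data_;
  size_t len_ = 0;
  size_t cap_ = N;
  size_t heapAllocs_ = 0;

  bool usingInline() const {
    return data_ == reinterpret_cast<const T*>(inline_);
  }

  // Move to a heap buffer of (at least) double the current capacity.
  void grow() {
    size_t newCap = cap_ ? cap_ * 2 : 1;
    T* p = static_cast<T*>(std::malloc(newCap * sizeof(T)));
    if (!p) {
      std::abort();
    }
    std::memcpy(p, data_, len_ * sizeof(T));
    if (!usingInline()) {
      std::free(data_);
    }
    data_ = p;
    cap_ = newCap;
    ++heapAllocs_;
  }

 public:
  SmallVec() : data_(reinterpret_cast<T*>(inline_)) {}
  SmallVec(const SmallVec&) = delete;
  SmallVec& operator=(const SmallVec&) = delete;
  ~SmallVec() {
    if (!usingInline()) {
      std::free(data_);
    }
  }

  void push_back(const T& v) {
    if (len_ == cap_) {
      grow();
    }
    std::memcpy(&data_[len_], &v, sizeof(T));
    ++len_;
  }

  const T& operator[](size_t i) const {
    assert(i < len_);
    return data_[i];
  }
  size_t size() const { return len_; }
  size_t heapAllocs() const { return heapAllocs_; }
};
```

In this sketch, pushing one element into a `SmallVec<int, 0>` triggers a heap allocation, while a `SmallVec<int, 1>` stores it inline and allocates only when a second element arrives. The trade-off is a slightly larger containing object, which may be why larger inline sizes showed no further benefit here.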
Results are:
- 1.5% reduction in user time compiling Photoshop
  (Core i5 1135G7 @ 4.2 GHz, best of 40 runs)
- 1.35% reduction in instruction count
- 0.59% reduction in memory accesses
Comment 2•3 years ago
Some further performance numbers with the comment 1 patch:
filename | Million insns (before) | Million insns (after) | User time, s (before) | User time, s (after) | wasm size (kbytes) | Insns per byte (after) | IPC @ 4.2 GHz (after)
---|---|---|---|---|---|---|---
photoshop.wasm | 17,409 | 17,156 | 1.698 | 1.669 | 83235 | 206.1 | 2.45
autocad.wasm | 11,948 | 11,884 | 1.111 | 1.107 | 53999 | 220.1 | 2.56
clang.wasm | 6,141 | 6,107 | 0.577 | 0.577 | 46715 | 130.7 | 2.52
earth.wasm | 4,129 | 4,104 | 0.381 | 0.383 | 21285 | 192.8 | 2.55
IPC measured on an Intel Core i5 1135G7. User times are the best of 40 runs.
So the patch is reliably a win on an instruction-count basis, and looks like a
tiny improvement or neutral on a user-time basis.
I took the opportunity to compute instructions per wasm byte and also the
achieved IPC on my machine. Interestingly, when compiling photoshop.wasm with
Ion, the machine runs at about 0.85 IPC, which means that Ion is interacting
very badly with the microarchitecture. For that case it compiles about 50
times more slowly than baseline (84.0 seconds vs 1.669 seconds).
Comment 4•3 years ago
bugherder |