14% regression on AWFY-Jetstream2-HashSet-wasm-Runtime around 1Aug2023
Categories
(Core :: JavaScript: WebAssembly, defect, P2)
Tracking
()
Version | Tracking | Status
---|---|---
firefox-esr102 | --- | unaffected
firefox-esr115 | --- | unaffected
firefox116 | --- | unaffected
firefox117 | --- | unaffected
firefox118 | --- | disabled
firefox119 | --- | wontfix
firefox120 | --- | wontfix
People
(Reporter: mayankleoboy1, Unassigned)
References
(Regression)
Details
(Keywords: regression)
Attachments
(2 files, 1 obsolete file)
Suspect: bug 1571998
Comment 1•1 year ago
Set release status flags based on info from the regressing bug 1571998
:yury, since you are the author of the regressor, bug 1571998, could you take a look? Also, could you set the severity field?
For more information, please visit BugBot documentation.
Comment 2•1 year ago
I suspect the numbers were affected by the https://phabricator.services.mozilla.com/D170824 change. Running this test before and after that patch, the time increases from about 0.374000 to 0.388000 (+0.014) sec. The JS2 score is computed as 5000 / time, which gives a bigger swing on short-running tests.
A minor regression from bug 1571998 landing was expected, but not 14%. I will investigate this particular test further locally. I cannot easily tell from the Perfherder UI whether more tests regressed from this bug.
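To illustrate the scoring sensitivity described above, here is a small sketch. The 5000 / time formula and the two timings are taken from this comment; the function name is illustrative.

```python
# Sketch: why a small absolute slowdown produces a noticeable score swing.
# Per the comment above, the JS2 subscore is computed as 5000 / time.
def js2_score(time_sec):
    return 5000 / time_sec

before, after = 0.374, 0.388  # timings quoted above, in seconds
score_drop = 1 - js2_score(after) / js2_score(before)
print(f"{score_drop:.1%}")  # ~3.6% score drop from a 0.014 s slowdown
```

The shorter the test runs, the larger the fraction of its total time a fixed per-call overhead represents, so the score drop grows accordingly.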
Comment 3•1 year ago
Comment 4•1 year ago
> this test about 0.374000 -> 0.388000 (+0.014) sec.

Those numbers were for the test I compiled from the HashSet.c sources with a new Emscripten, but apparently Jetstream2 was using an old version. After running it "as is", the numbers match CI: the runtime is 1.123000 sec before and 1.221000 sec after (which corresponds to scores of 4.45 and 4.09). Attaching a shell test.
Comment 5•1 year ago
Comment 6•1 year ago
The following was tested locally on AArch64, running in the shell. What affected the scores:
- The machine code before and after the patches differs only by the instructions added in wasmMarkSlowCall() and CodeGenerator::visitWasmCall(), which accounts for about a 10% regression.
- If I disable the code emitted by wasmMarkSlowCall() (`mov x28, x28`), I see a 1-2% change.
- If I disable the code emitted by masm.freeStackTo() (`sub x28, x29, #0xNN; mov sp, x28`) in CodeGenerator::visitWasmCall(), I see an 8-9% change.

Both are needed when a module calls a function that can perform a tail call.
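As a rough consistency check, a sketch using only the percentages quoted above: the two disabled code paths together roughly account for the whole regression.

```python
# The two per-site costs measured above, as fractional ranges.
mark_slow_call = (0.01, 0.02)  # disabling wasmMarkSlowCall(): 1-2% change
free_stack_to = (0.08, 0.09)   # disabling masm.freeStackTo(): 8-9% change

low = mark_slow_call[0] + free_stack_to[0]
high = mark_slow_call[1] + free_stack_to[1]
print(f"{low:.0%} to {high:.0%}")  # 9% to 11%, bracketing the observed ~10%
```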
Comment 7•1 year ago
Additional analysis using `perf stat` on x64, with freeStackTo disabled in CodeGenerator.cpp:
- the instruction count dropped from 11,039,908,613 to 10,803,949,176
- that is an extra ~200,000 instructions per call to benchmark()
- there are ~11,000 operations in benchmark(); the operations differ and may call other functions
- that gives ~20 extra instructions per sloc/operation

So it looks like all the overhead is coming from the newly introduced masm.freeStackTo().
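The arithmetic in the list above can be replayed directly; only the perf-stat counts and the per-call/per-operation figures quoted there are used.

```python
# Instruction counts from perf stat, with and without masm.freeStackTo().
with_free_stack_to = 11_039_908_613
without_free_stack_to = 10_803_949_176
extra_total = with_free_stack_to - without_free_stack_to
print(extra_total)  # ~236 million extra instructions overall

extra_per_call = 200_000  # stated extra instructions per benchmark() call
ops_per_call = 11_000     # operations inside benchmark()
print(extra_per_call / ops_per_call)  # ~18, i.e. on the order of 20 per operation
```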
Comment 9•1 year ago
We're not certain this benchmark is representative of common usage, but we're going to see if we can introduce some lightweight whole-module analysis to remove the overhead. This overhead only affects SM builds with tail call support enabled in the binary. Right now that is only Nightly builds; we're not riding the trains to release yet.
I'm going to mark 118 as disabled then, as we won't be shipping in that release.
Updated•1 year ago
Comment 10•1 year ago
Here are the top 20 hottest basic blocks in the generated code, for Ion/x86_64.
These together are about 50% of the blocks executed. I've reordered them by
function and address for easier reading.
The hottest blocks (numbers 0, 1 and 2) are complete functions (!). The benchmark is clearly very call-intensive, and so the extra call overhead -- the `leaq -smallish_number(%rbp), %rsp` generated by masm.freeStackTo -- is relatively large.
No realistic workload would behave like this -- it would be extensively inlined
by the front-end (wasm-generating) compiler. If we really wanted to fix this,
one possibility is to roll our own lightweight inliner for Ion. But IMO,
certainly not before we can measure the overhead when running realistic
(AOT-inlined) workloads.
Comment 13•1 year ago
At this moment, we agree to accept this regression. Bug 1849759 might help to reduce the effect.
I'm not sure whether we want to close it as WONTFIX, or keep it open and reduce tracking for upcoming versions.
Comment 14•1 year ago
I will make this an S3 bug since we are accepting the regression.
Updated•1 year ago
Updated•1 year ago
Comment 16•11 months ago
Yes, let's close it. We will track it somewhere else.