Closed Bug 1846911 Opened 1 year ago Closed 11 months ago

14% regression on AWFY-Jetstream2-HashSet-wasm-Runtime around 1Aug2023

Categories

(Core :: JavaScript: WebAssembly, defect, P2)

defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox-esr102 --- unaffected
firefox-esr115 --- unaffected
firefox116 --- unaffected
firefox117 --- unaffected
firefox118 --- disabled
firefox119 --- wontfix
firefox120 --- wontfix

People

(Reporter: mayankleoboy1, Unassigned)

References

(Regression, )

Details

(Keywords: regression)

Attachments

(2 files, 1 obsolete file)

Set release status flags based on info from the regressing bug 1571998

:yury, since you are the author of the regressor, bug 1571998, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

I suspect the numbers were affected by the https://phabricator.services.mozilla.com/D170824 change. Running the test before and after this patch, its time increases from about 0.374000 to 0.388000 sec (+0.014). The JS2 score is computed as 5000 / time, which gives a bigger swing on short-running tests.
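
As a quick illustration of that scoring effect, here is a minimal C++ sketch. It assumes the score divides 5000 by the runtime in milliseconds (which matches the 4.45 / 4.09 scores quoted later in this bug); the "4x longer test" is hypothetical, just to show the swing difference.

    // JS2-style scoring applied to the locally measured runtimes above.
    // The same +14 ms absolute overhead costs more score points on a short
    // test, because d(score)/dt = -5000 / t^2 grows in magnitude as t shrinks.
    #include <cstdio>

    static double score(double timeMs) { return 5000.0 / timeMs; }

    int main() {
      const double before = 374.0, after = 388.0;  // 0.374 s -> 0.388 s
      printf("short test: %.2f -> %.2f (delta %.2f)\n",
             score(before), score(after), score(before) - score(after));
      // Hypothetical 4x longer test with the same +14 ms absolute overhead:
      printf("long test:  %.2f -> %.2f (delta %.2f)\n",
             score(4 * before), score(4 * before + 14.0),
             score(4 * before) - score(4 * before + 14.0));
      return 0;
    }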

A minor regression from bug 1571998 landing was expected, but not 14%. I will investigate this particular test further locally; I cannot easily tell from the Perfherder UI whether more tests regressed from this bug.

Flags: needinfo?(ydelendik)
Attached file shell version of the test (obsolete) —

> this test about 0.374000 -> 0.388000 (+0.014) sec.

Those numbers were for a test I compiled from the HashSet.c sources with a newer Emscripten, but apparently Jetstream2 was using an older build. After running it "as is", the numbers match CI: the before runtime is 1.123000 sec and after 1.221000 sec (which corresponds to scores of 4.45 and 4.09). Attaching the shell test.
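
For reference, the quoted scores follow directly from those runtimes under the same 5000 / time_ms scoring assumed above (minimal C++ check; the printed percentage is just the arithmetic on these two local runs, not the CI figure):

    // Reproduce the 4.45 / 4.09 scores from the 1.123 s / 1.221 s runtimes.
    #include <cstdio>

    int main() {
      const double beforeMs = 1123.0, afterMs = 1221.0;
      const double sBefore = 5000.0 / beforeMs;  // ~4.45
      const double sAfter  = 5000.0 / afterMs;   // ~4.09
      printf("scores: %.2f -> %.2f (%.1f%% lower)\n",
             sBefore, sAfter, 100.0 * (sBefore - sAfter) / sBefore);
      return 0;
    }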

Attachment #9347208 - Attachment is obsolete: true

The following was tested locally on AArch64. What affected the scores, running in the shell:

  • Comparing the machine code before and after the patches, the only added instructions come from wasmMarkSlowCall() and CodeGenerator::visitWasmCall(); together this accounts for about a 10% regression.
  • If I disable the code emitted by wasmMarkSlowCall() (mov x28, x28), I see a 1-2% change.
  • If I disable the code emitted by masm.freeStackTo() (sub x28, x29, #0xNN | mov sp, x28) in CodeGenerator::visitWasmCall(), I see an 8-9% change.

Both sequences are needed when a call inside the module targets a function that can perform a tail call.

See Also: → 1847480

Additional analysis from perf stat on x64, with freeStackTo disabled in CodeGenerator.cpp:

  • the instruction count drops from 11,039,908,613 to 10,803,949,176
  • that is roughly an extra 200,000 instructions per call to benchmark()
  • benchmark() performs about 11,000 operations; the operations differ and may call other functions
  • that gives ~20 extra instructions per operation

So it looks like all of the overhead comes from the newly introduced masm.freeStackTo().
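
A quick arithmetic check of those figures (C++ sketch; the per-call and per-operation numbers are the rough estimates from the list above, not precise measurements):

    // Check the perf-stat arithmetic quoted above.
    #include <cstdio>

    int main() {
      const long long withFreeStackTo    = 11039908613LL;
      const long long withoutFreeStackTo = 10803949176LL;
      const long long delta = withFreeStackTo - withoutFreeStackTo;  // ~236M

      const double extraPerCall = 200000.0;  // estimated extra instructions per benchmark() call
      const double opsPerCall   = 11000.0;   // estimated operations per benchmark() call

      printf("total extra instructions: %lld\n", delta);
      printf("implied benchmark() calls: ~%.0f\n", delta / extraPerCall);
      printf("extra instructions per operation: ~%.0f\n", extraPerCall / opsPerCall);
      return 0;
    }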

:rhunt could this be triaged for priority/severity?

Flags: needinfo?(rhunt)

We're not certain this benchmark is representative of common usage, but we're going to see if we can introduce some lightweight whole-module analysis to remove the overhead. The overhead only affects SM builds with tail-call support enabled in the binary; right now that is only Nightly builds, and we're not riding the trains to release yet.

I'm going to mark 118 as disabled then, as we won't be shipping in that release.

Severity: -- → S2
Flags: needinfo?(rhunt)
Priority: -- → P2

Here are the top 20 hottest basic blocks in the generated code, for Ion/x86_64.
These together are about 50% of the blocks executed. I've reordered them by
function and address for easier reading.

The hottest blocks (numbers 0, 1 and 2) are complete functions (!). The
benchmark is clearly very call-intensive and so the extra call overhead -- the
leaq -smallish_number(%rbp), %rsp, as generated by masm.freeStackTo -- is
relatively large.

No realistic workload would behave like this -- it would be extensively inlined
by the front-end (wasm-generating) compiler. If we really wanted to fix this,
one possibility is to roll our own lightweight inliner for Ion. But IMO,
certainly not before we can measure the overhead when running realistic
(AOT-inlined) workloads.
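
As a concrete, purely hypothetical C++ analogue (not taken from the benchmark sources) of such a tiny hot function: a front-end compiler like clang/Emscripten at -O2 would normally inline it into the loop, so no call, and hence no per-call freeStackTo-style stack restore, appears in the wasm it emits.

    // Hypothetical example, not from HashSet.c. At -O2 the front end inlines
    // hashKey() into the loop, so there is no call left at the call site.
    #include <cstdint>
    #include <cstdio>

    static inline uint32_t hashKey(uint32_t k) { return k * 2654435761u; }

    int main() {
      uint32_t acc = 0;
      for (uint32_t i = 0; i < 1000000; i++) {
        acc ^= hashKey(i);  // without inlining, each iteration would pay call overhead
      }
      printf("%u\n", (unsigned)acc);
      return 0;
    }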

Duplicate of this bug: 1847478
Duplicate of this bug: 1847480

At this moment, we agree to accept this regression. Bug 1849759 might help reduce the effect.

I'm not sure whether we want to close this as WONTFIX, or keep it open and reduce the tracking for upcoming versions.

I will make this an S3 bug since we are accepting the regression.

Severity: S2 → S3

Should we resolve this as WONTFIX?

Flags: needinfo?(ydelendik)

Yes, let's close it. We will track it somewhere else.

Status: NEW → RESOLVED
Closed: 11 months ago
Flags: needinfo?(ydelendik)
Resolution: --- → WONTFIX
See Also: → 1849759