Open Bug 1708419 Opened 3 years ago Updated 4 months ago

TeaVM wasm benchmark 2x slower than Chrome

Categories

(Core :: JavaScript: WebAssembly, defect, P3)

Firefox 90
x86_64
Unspecified
defect

Tracking

()

People

(Reporter: linuxhippy, Unassigned)

References

(Blocks 3 open bugs)

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0

Steps to reproduce:

I loaded the TeaVM wasm benchmark located here: http://www.teavm.org/live-examples/jbox2d-benchmark/teavm-wasm.html

Actual results:

Running on Firefox, one benchmark round takes ~220ms on my Ryzen laptop, while running on Chrome one round takes only 120-130ms.

The Bugbug bot thinks this bug should belong to the 'Core::Javascript: WebAssembly' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Javascript: WebAssembly
Product: Firefox → Core

I can verify the performance discrepancy but I have no analysis yet of where that comes from. Will keep it in wasm for now.

Severity: -- → S4
Status: UNCONFIRMED → NEW
Type: enhancement → defect
Ever confirmed: true
Priority: -- → P3
Hardware: Unspecified → x86_64

MacBookPro15,2 (ca late 2018) 13" 2.7GHz Quad i7, macOS 11.3, idle with no other applications running, compute times as recorded by benchmark. Likely this is the same system I used to report comment 2.

Current Firefox Nightly with fresh profile and no other tabs, 33 runs: average 202ms, median 197ms, min 173, max 237, std dev 15.9
Current Chrome release, no other tabs, 33 runs: average 137ms, median 127ms, min 127, max 159, std dev 6.5

So, the Firefox/Chrome ratio is a little better than reported on the Ryzen (202/137 vs 220/125) but there's a clear difference.

Lenovo ThinkStation P710 Signature Edition, 2xQuad Xeon E5-2637 v4 3.50GHz, Win10 Pro 20H2 / 19042.928.

FF88 release, 33 runs: avg 202, median 196, min 112, max 266, stddev 36
Chrome 90 release, 33 runs: avg 159, median 160, min 133, max 189, stddev 13

I would say that the performance difference is real and not unique to the Ryzen, to a specific chip model, or to a specific OS. Still no clue about whether this is actually a wasm problem; for one thing, we will need to dig in and look at how the timings are performed.
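For reference, the averages reported so far can be put side by side as Firefox/Chrome ratios. This is only a rough sketch: the Ryzen Chrome figure is taken as 125ms (the value used in the ratio quoted earlier), the machine labels are shorthand, and the browser versions differ between reports, so the numbers are indicative rather than a controlled comparison.

```python
# Average per-round times in ms, copied from the comments in this bug.
# Assumption: the Ryzen Chrome time is 125 ms, the midpoint of the reported range.
averages_ms = {
    "Ryzen laptop (Linux)": (220, 125),
    "MacBookPro15,2 (macOS 11.3)": (202, 137),
    "ThinkStation P710 (Win10)": (202, 159),
}


def slowdown(firefox_ms, chrome_ms):
    """Return how many times longer a round takes in Firefox than in Chrome."""
    return firefox_ms / chrome_ms


ratios = {machine: slowdown(ff, ch) for machine, (ff, ch) in averages_ms.items()}

for machine, ratio in ratios.items():
    print(f"{machine}: Firefox/Chrome = {ratio:.2f}")
```

On these averages the gap ranges from roughly 1.3x to 1.8x, which fits the reading that the slowdown is not specific to one machine.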

Some obvious next steps to investigate, when we get around to it:

  • Julian has a hot-blocks profiler which, unfortunately, is Linux-only, I think, but it can be run on Firefox to see what the inner loop looks like; maybe we'll see something obvious (e.g. indirect function calls, where we know we're worse)
  • I haven't tried disabling baseline to see if there's some problem with, e.g., stubs always going via baseline even after tier-up
  • It may be possible to run this program in headless mode, because the JS looks both hand-written and very clean; that way we can do things more simply in the shell

Hot blocks when compiled with wasm baseline. Not particularly interesting
but added for completeness.

Hot blocks when compiled by Ion. This is a bit more interesting. Two things
I noticed:

(1) 7.8% of all insns go into a tiny function, at rank 0. It's a shame the upstream
compiler didn't inline it; at least I assume it exists in the incoming wasm. I think
I saw the equivalent in the baseline hotblocks.

(1 and a half) said block contains a redundant movl 11688(%r15),%ecx which
surely doesn't help. But since we're not in the game of store-to-load-forwarding
on linear memory accesses, I'm not sure we can easily get rid of it.

(2) Generally more concerning are the long blocks full of FP arithmetic that
follow. It looks like they use xmm0 through xmm5, yet contain quite some
spilling of the xmm registers. Indeed, in total 222 of the insns have the string
"xmm" and of those, 54 also have the string "rsp". So I'm wondering why that
FP spilling exists, given that there doesn't seem to be a shortage of xmm registers
at this point. Did the allocator make a poor choice of values to allocate to
xmm registers?
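The xmm/rsp tally above is a plain substring count, and could be reproduced along these lines. This is a minimal sketch, assuming the hot-block listing has one instruction per line in AT&T syntax. The sample instructions below are hypothetical and are not taken from the attachment, and matching on "rsp" is only a proxy for spill traffic.

```python
def count_xmm_spills(lines):
    """Count instructions mentioning an xmm register, and how many of
    those also mention rsp (a rough proxy for stack spills/reloads)."""
    xmm_insns = [ln for ln in lines if "xmm" in ln]
    spill_insns = [ln for ln in xmm_insns if "rsp" in ln]
    return len(xmm_insns), len(spill_insns)


# Hypothetical sample lines for illustration only.
sample = [
    "movss 16(%rsp),%xmm3",    # reload from the stack
    "mulss %xmm1,%xmm0",
    "movss %xmm2,24(%rsp)",    # spill to the stack
    "movl 11688(%r15),%ecx",   # integer load, not counted
    "addss %xmm4,%xmm5",
]

n_xmm, n_spill = count_xmm_spills(sample)
print(f"sample: {n_spill} of {n_xmm} xmm instructions touch rsp")

# Fraction for the counts reported in the comment above (54 of 222).
reported_fraction = 54 / 222
print(f"reported: {reported_fraction:.1%} of xmm instructions touch rsp")
```

With the reported counts, about a quarter of the xmm instructions reference the stack.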

Possibly related to bug 1714280.

See Also: → 1714280
Blocks: 1755624
Blocks: sm-regalloc