Open Bug 1869758 Opened 10 months ago Updated 7 months ago

WASM performance gap of 40% between Firefox and Safari

Categories

(Core :: JavaScript: WebAssembly, defect, P3)

Firefox 121
defect

UNCONFIRMED

People

(Reporter: christian.speckner, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Attached file repro.img.gz

Steps to reproduce:

  1. Go to https://cloudpilot-emu.github.io/uarm-preview/
  2. gunzip the attached SD image and select it as SD
  3. Download https://palmdb.net/content/files/archive-rom/palm-roms-complete/Palm-Tungsten-E2-nand.bin and select it as NAND
  4. Download https://palmdb.net/content/files/archive-rom/palm-roms-complete/Palm-Tungsten-E-nor.bin and select it as NOR
  5. Wait for the device to boot, then click "home" (the house at the bottom), choose "CARD" from the picker at the top right and start "Bike or Die!"
  6. Click "Continue in trial mode", wait for the game to load, then watch the "limit MIPS" value at the right while the animation is playing; this value is a performance indicator

Actual results:

Safari is faster by about 40%

Expected results:

Safari and FF should be roughly on par.

I am developing an emulator for PalmOS 5 PDAs that runs in the browser via WASM (compiled from C/C++ via Emscripten). It emulates an ARMv5 CPU, which is very CPU-intensive. In my tests, FF performs almost 40% worse than Safari on my MacBook Pro M1. I know that it is hard to compare different architectures and CPUs, but benchmarks from other people running FF on Windows on comparable x64 machines point to similar numbers, which indicates that the issue is not particular to macOS / ARM64, but a general performance gap between SpiderMonkey and JavaScriptCore on my code. I also have one benchmark from macOS on a 2019 Core i7 which shows the same performance gap.

The "limit MIPS" is calculated from the time required for the host to emulate 1/50 second worth of ARM instructions, so it is an accurate speed indicator. The emulator does not run while the virtual ARM is sleeping (and does other tricks to avoid ARM execution), so you need to run an app like Bike or Die that is busy executing ARM code to get serious numbers.
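The calculation described above can be sketched roughly as follows. This is an illustration only, not code from the emulator: the function name `limitMips`, its parameters and the example numbers are assumptions. The only thing taken from the report is the idea of dividing the instructions in one 1/50 second slice of emulated time by the host time needed to execute them.

```javascript
// Hypothetical helper, not taken from the cp-uarm source.
// instructionsInSlice: ARM instructions that make up one 1/50 s slice of emulated time
// hostMillis: wall-clock time the host needed to emulate that slice
function limitMips(instructionsInSlice, hostMillis) {
    if (!(hostMillis > 0)) {
        throw new RangeError("hostMillis must be positive");
    }

    // instructions per second = instructions / (hostMillis / 1000)
    // MIPS = instructions per second / 1e6 = instructions / (hostMillis * 1000)
    return instructionsInSlice / (hostMillis * 1000);
}

// Made-up example: a slice of 2,000,000 instructions that takes 25 ms on the host
const exampleMips = limitMips(2_000_000, 25); // 80
```

A higher value means the host could sustain a faster virtual CPU, which is why the number only makes sense while the virtual ARM is actually busy.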

Component: Untriaged → JavaScript: WebAssembly
Product: Firefox → Core

Can you create a profile using https://profiler.firefox.com/ and share it (top right)?

Here are two profiles of the Bike or Die intro running, one of the public WASM binary and one of a build that includes symbols:

You can find the source code that goes with the symbols here: https://github.com/cloudpilot-emu/cp-uarm/tree/9e2af01cef90ebd36eeaa81d7ab5ace4d1f225c7

The WASM code runs on a web worker.

Fwiw, while generating the profile I found a code path that was reading from stdin periodically, amounting to about 5% of the total time. With this removed the gap between Firefox and Safari is now more like 30%.

Sorry, I just reread my report and noticed that I linked the wrong NOR file. The correct link is https://palmdb.net/content/files/archive-rom/palm-roms-complete/Palm-Tungsten-E2-nor.bin

The NOR that I originally linked will crash the emulator. As the page stores the NOR in IndexedDB and automatically reloads it, this may render the page unable to start correctly. In this case you can reload the page with https://cloudpilot-emu.github.io/uarm-preview/?noload to skip autoload. Sorry for the mess.

I wonder if your code is running in our baseline tier and not having an opportunity to tier up to an optimized version. We only check whether to tier up to optimized code on function entry. So if you have a long-running loop of performance-sensitive code (such as an interpreter), you will not be able to run the optimized code unless the function returns and is re-entered. The profiles you linked show a long-running function with a lot of self-time, which would be some evidence for that.

Can you go to about:config and set javascript.options.wasm_baselinejit = false and then re-run and compare to Safari? The result won't be exactly fair, but would help indicate if this was the issue.

Flags: needinfo?(christian.speckner)

Thanks for your response!

I just checked: disabling the baseline JIT indeed gives a slight performance increase, but the gap is still significant. To give numbers, I get about 61-62 MIPS with the default settings, and about 63-65 MIPS with javascript.options.wasm_baselinejit = false. As a reference, Safari currently runs at about 80 MIPS.

While the emulator runs an endless loop, it does not loop inside WebAssembly, but rather calls into WASM from JS to run the emulator for a timeslice of approx. 200 msec worth of ARM instructions. Once WASM returns, the code yields to the event loop either via setTimeout (to sleep the remaining time) or via setImmediate (if the remaining time is below a threshold). I use https://github.com/yuzujs/setImmediate to shim setImmediate, which uses a MessageChannel. The next timeslice is then executed when the scheduled JS callback is called.
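A minimal sketch of the scheduling pattern described above, for readers who want to see its shape. Everything here is an assumption made for illustration: `planYield`, `startLoop`, `runTimeslice`, `yieldNow` and the 4 ms threshold are hypothetical names and values, not taken from the emulator source. Only the 200 msec timeslice and the choice between setTimeout and a setImmediate-style yield come from the comment.

```javascript
// Illustrative only: names, threshold and loop shape are assumptions,
// not taken from the cp-uarm source.
const TIMESLICE_MS = 200;      // emulated time per call into WASM (from the comment)
const SLEEP_THRESHOLD_MS = 4;  // hypothetical cutoff below which sleeping is skipped

// Decide how to yield after a timeslice that took `hostMs` of real time.
function planYield(hostMs, timesliceMs = TIMESLICE_MS, thresholdMs = SLEEP_THRESHOLD_MS) {
    const remainingMs = timesliceMs - hostMs;

    return remainingMs >= thresholdMs
        ? { kind: "timeout", delayMs: remainingMs } // host is ahead: sleep the rest
        : { kind: "immediate", delayMs: 0 };        // little or no time left: just yield
}

// Drive the emulator. `runTimeslice` stands in for the call into WASM and
// `yieldNow` for a setImmediate shim (for example one based on MessageChannel).
function startLoop(runTimeslice, yieldNow, now = () => performance.now()) {
    let stopped = false;

    const step = () => {
        if (stopped) return;

        const start = now();
        runTimeslice(TIMESLICE_MS); // control returns to JS after every slice
        const plan = planYield(now() - start);

        if (plan.kind === "timeout") setTimeout(step, plan.delayMs);
        else yieldNow(step);
    };

    step();

    // The returned function stops the loop before the next slice runs.
    return () => { stopped = true; };
}
```

Because control returns to JS after every slice, the WASM entry point is re-entered on each iteration instead of spinning in one long-running call.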

Flags: needinfo?(christian.speckner)

The severity field is not set for this bug.
:rhunt, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(rhunt)
Severity: -- → S3
Flags: needinfo?(rhunt)
Priority: -- → P3