Open Bug 1927640 Opened 4 months ago Updated 1 month ago

Loading/starting a small game takes 10s doing regalloc and Module loading on TaskController threads (and lots of time in WASM-Baseline on the main thread)

Categories: Core :: JavaScript: WebAssembly, task, P2
People: Reporter: mayankleoboy1, Unassigned
References: Blocks 3 open bugs
Attachments: 1 file

Go to https://www.crazygames.com/game/cell-to-singularity-mesozoic-valley
Click on "Play Now"
Let it download and load
Once it loads, click on "Play". A spinning geode will appear.
Let the geode do its stuff

AR:
First Run: https://share.firefox.dev/3Ut4tgC
Second Run: https://share.firefox.dev/3Untb2c

Attached file about:support
Severity: -- → S3
Priority: -- → P2

It looks like the wasm file [1] is about 50MiB uncompressed. The game might be small, but the wasm file definitely is not :)

[1] files.crazygames.com/cell-to-singularity-mesozoic-valley/2/One Save Build/Build/c6ec891b745073b6515391bb7a4fae8b.wasm.br

How much time does Chrome take here? I couldn't find that out accurately using their built-in profiler.

I am not sure about Chrome; I've never used their profiling tools before. V8 now defaults to fully lazy compilation, so it won't compile a function until it's called for the first time, and it won't optimize the function until it has been called many times. So it's difficult to compare our compile time vs. theirs.
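The lazy-compile-then-tier-up behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `LazyModule`, `TIER_UP_THRESHOLD`, and the call-count heuristic are hypothetical names invented for illustration, not V8's or SpiderMonkey's actual implementation or heuristics.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical threshold; real engines use their own (more complex) heuristics.
TIER_UP_THRESHOLD = 3


@dataclass
class LazyModule:
    """Toy model: compile a function on first call, optimize it once it is hot."""
    bodies: Dict[str, Callable[[int], int]]
    baseline: Dict[str, Callable[[int], int]] = field(default_factory=dict)
    optimized: Dict[str, Callable[[int], int]] = field(default_factory=dict)
    call_counts: Dict[str, int] = field(default_factory=dict)

    def call(self, name: str, arg: int) -> int:
        count = self.call_counts.get(name, 0) + 1
        self.call_counts[name] = count
        if name in self.optimized:
            return self.optimized[name](arg)
        if name not in self.baseline:
            # Lazy compilation: nothing is "compiled" until the first call.
            self.baseline[name] = self.bodies[name]
        if count >= TIER_UP_THRESHOLD:
            # Tier-up: only functions called often enough get "optimized".
            self.optimized[name] = self.bodies[name]
        return self.baseline[name](arg)


module = LazyModule(bodies={
    "hot": lambda x: x + 1,
    "cold": lambda x: x * 2,
    "never": lambda x: -x,
})
for i in range(5):
    module.call("hot", i)
module.call("cold", 7)

print(sorted(module.baseline))   # ['cold', 'hot']
print(sorted(module.optimized))  # ['hot']
```

In this model a function that is never called costs no compile time at all, which is why an eager whole-module compile is hard to compare against a fully lazy one.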

When we ship lazy tiering, that will help get rid of the massive background compile.

Is the concern in this bug just about the background compile or something else like frame rate/startup time?

(In reply to Ryan Hunt [:rhunt] from comment #4)

> Is the concern in this bug just about the background compile or something else like frame rate/startup time?

Concerns are only about the startup time and/or the overall resource use during startup (relative to Chrome, and whether there is anything obvious we can improve).

One thing I forgot to add in comment #5: the profiler suggests that almost all the time on the main thread of the content process appears to be spent in "Wasm (Baseline)". Is there anything to investigate there?

@Mayank, can you try out the new wasm compilation pipeline and see if that
makes startup faster?
in about:config, set javascript.options.wasm_lazy_tiering to true.

I just tried this, on a slow machine. The time between end of loading and game
startup was about 1 second; and I can see from MOZ_LOG=wasmPerf:3 output that
immediately after the game starts there is a burst of functions tiering up
(on-demand optimised Ion compilation), but it is very brief.
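For anyone wanting to capture the same logging, a minimal sketch of starting Firefox with the `MOZ_LOG=wasmPerf:3` setting mentioned above might look like this. The helper names and the assumption that a `firefox` binary is on `PATH` are hypothetical; the `javascript.options.wasm_lazy_tiering` pref still has to be set separately in about:config.

```python
import os
import subprocess


def wasm_perf_env(base_env=None):
    """Return a copy of the environment with wasmPerf logging enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_LOG"] = "wasmPerf:3"  # value taken from the comment above
    return env


def launch_firefox(binary="firefox"):
    """Start Firefox with the logging environment.

    Assumes `binary` resolves to a Firefox executable on this machine.
    """
    return subprocess.Popen([binary], env=wasm_perf_env())
```

Calling `launch_firefox()` starts the browser; the function is deliberately not invoked here so the snippet has no side effects when imported.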

Ryan writes (comment 2):

> It looks like the wasm file [1] is about 50MiB uncompressed. The game might
> be small, but the wasm file definitely is not :)

FTR, below are the stats I get when I leave the spinning geode to do its thing
for a minute or so, after which I clicked on "Ready". We see that indeed there
is a lot of code in the module (131620 functions, 46479011 bytecode bytes) but
less than 1% (966) of the functions got Ion-compiled.

[Child 41832: Main Thread]: I/wasmPerf ------ Heuristic Settings ------
[Child 41832: Main Thread]: I/wasmPerf w_lazy_tiering_level (1..9) = 5
[Child 41832: Main Thread]: I/wasmPerf w_inlining_level (1..9) = 5
[Child 41832: Main Thread]: I/wasmPerf w_direct_inlining = true
[Child 41832: Main Thread]: I/wasmPerf w_call_ref_inlining = true
[Child 41832: Main Thread]: I/wasmPerf w_call_ref_inlining_percent (10..100) = 50
[Child 41832: Main Thread]: I/wasmPerf ------ Complete Tier ------
[Child 41832: Main Thread]: I/wasmPerf 131620 functions in module
[Child 41832: Main Thread]: I/wasmPerf 46479011 bytecode bytes in module
[Child 41832: Main Thread]: I/wasmPerf 0 CallRefMetrics in module (0 bytes)
[Child 41832: Main Thread]: I/wasmPerf ------ Partial Tier ------
[Child 41832: Main Thread]: I/wasmPerf 966 functions tiered up
[Child 41832: Main Thread]: I/wasmPerf 532770 bytecode bytes tiered up
[Child 41832: Main Thread]: I/wasmPerf 5740 direct-calls inlined
[Child 41832: Main Thread]: I/wasmPerf 0 call_ref-calls inlined
[Child 41832: Main Thread]: I/wasmPerf 341909 direct-call bytecodes inlined
[Child 41832: Main Thread]: I/wasmPerf 0 call_ref-call bytecodes inlined
[Child 41832: Main Thread]: I/wasmPerf 17 functions overran inlining budget
[Child 41832: Main Thread]: I/wasmPerf 1622304 bytes mmap'd for p-t code storage
[Child 41832: Main Thread]: I/wasmPerf 1612609 bytes actually used for p-t code storage
[Child 41832: Main Thread]: I/wasmPerf ------ Derived Values ------
[Child 41832: Main Thread]: I/wasmPerf 64.2% p-t bytecode expansion caused by inlining
[Child 41832: Main Thread]: I/wasmPerf 99.4% of partial tier mapped code space used
[Child 41832: Main Thread]: I/wasmPerf ------
[Child 41832: Main Thread]: I/wasmPerf >>>>
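The percentages quoted for this log can be rechecked with a short script. This is a sketch, not part of Firefox: the parsing regex and the ratio formulas are assumptions, chosen because they reproduce the "Derived Values" lines in the log (64.2% and 99.4%) and the "less than 1%" figure in the comment.

```python
import re

# Count lines copied from the log above.
SAMPLE_LOG = """\
[Child 41832: Main Thread]: I/wasmPerf 131620 functions in module
[Child 41832: Main Thread]: I/wasmPerf 46479011 bytecode bytes in module
[Child 41832: Main Thread]: I/wasmPerf 966 functions tiered up
[Child 41832: Main Thread]: I/wasmPerf 532770 bytecode bytes tiered up
[Child 41832: Main Thread]: I/wasmPerf 341909 direct-call bytecodes inlined
[Child 41832: Main Thread]: I/wasmPerf 1622304 bytes mmap'd for p-t code storage
[Child 41832: Main Thread]: I/wasmPerf 1612609 bytes actually used for p-t code storage
"""

# Matches "<integer> <description>" after the I/wasmPerf tag.
LINE_RE = re.compile(r"I/wasmPerf\s+(\d+)\s+(.+)$")


def parse_counts(log_text):
    """Map each description string to its integer count."""
    counts = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            counts[match.group(2).strip()] = int(match.group(1))
    return counts


counts = parse_counts(SAMPLE_LOG)

pct_functions_tiered = 100 * counts["functions tiered up"] / counts["functions in module"]
pct_bytecode_tiered = 100 * counts["bytecode bytes tiered up"] / counts["bytecode bytes in module"]
pct_inlining_expansion = 100 * counts["direct-call bytecodes inlined"] / counts["bytecode bytes tiered up"]
pct_code_space_used = (100 * counts["bytes actually used for p-t code storage"]
                       / counts["bytes mmap'd for p-t code storage"])

print(f"{pct_functions_tiered:.1f}% of functions tiered up")        # 0.7%
print(f"{pct_bytecode_tiered:.1f}% of bytecode tiered up")
print(f"{pct_inlining_expansion:.1f}% expansion caused by inlining")  # 64.2%
print(f"{pct_code_space_used:.1f}% of mapped code space used")       # 99.4%
```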

(In reply to Julian Seward [:jseward] from comment #7)

> @Mayank, can you try out the new wasm compilation pipeline and see if that
> makes startup faster?
> in about:config, set javascript.options.wasm_lazy_tiering to true.

Profile: https://share.firefox.dev/40HBFVU (time is reduced to 1.1s on taskcontroller0, and 600ms on the other TaskController threads). So lazy tiering does improve things by roughly a factor of 10!

Type: defect → task