Closed Bug 1916442 Opened 5 months ago Closed 4 months ago

js::wasm::IonCompileFunctions slow and resource consuming on onnx 1.20.x wasm

Categories

(Core :: JavaScript: WebAssembly, defect, P1)


Tracking


RESOLVED FIXED

People

(Reporter: tarek, Assigned: jandem)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Whiteboard: [genai])

Attachments

(1 file)

I am working on integrating the new Transformers.js lib v3. It works, but I noticed huge RSS memory usage in the inference worker.

I profiled the code and found that js::wasm::IonCompileFunctions spins trying to compile some Wasm. I could not take a memory profile because it locks up the report in about:memory.

This is the profile I got from the profiler: https://share.firefox.dev/4g6ZrQh
To reproduce, follow these steps:

  • apply the patch https://phabricator.services.mozilla.com/D220519
  • in about:config, set browser.ml.enable to true
  • go to about:inference and select the NER preset
  • run the inference and wait for the models to download and the inference to finish (you will see results in the console on that page)

That will put the inference process into the state described above. You can check its activity in about:processes.

Blocks: 1913071

Notice that if I set javascript.options.wasm_optimizingjit to false, the problem is gone.

Component: Machine Learning → JavaScript: WebAssembly

Interesting: things seem to be faster for that lib when that pref is off.

The Wasm files used can be fetched here: https://cdn.jsdelivr.net/npm/onnxruntime-web@1.20.0-dev.20240829-be76e1e1b8/dist/

ort-wasm-simd-threaded.wasm and ort-wasm-simd-threaded.jsep.wasm

Assignee: nobody → jseward

On x86_64-linux, I managed to Ion-compile both ort-wasm-simd-threaded.wasm and
ort-wasm-simd-threaded.jsep.wasm to completion. The latter is larger and took
about 10 minutes and around 4GB of memory.

It contains some very large functions, the largest of which is 1233871 wasm
bytecode bytes, producing 388722 LIRs in 132855 basic blocks. This takes the
allocator a long time to process (several minutes), but it doesn't loop. I
imagine it will take about twice as long on an ARM64 platform, since ARM64 has
about twice as many integer registers to search through.

(In reply to Tarek Ziadé (:tarek) from comment #2)

Interesting: things seem to be faster for that lib when that pref is off.

If you mean the lib appears to run faster when
javascript.options.wasm_optimizingjit is set to false, I think that is
expected, because the baseline compiled code isn't competing
against Ion's register allocator for compute resources. We know that
-- especially for large inputs -- Ion's register allocator has very poor
memory locality, and it could be that the resulting avalanche of
traffic to shared parts of the memory hierarchy -- the L3 cache and
DRAM -- slows down the baseline code.

Are you in control of the building of ort-wasm-simd-threaded.jsep.wasm?
From our point of view, the simplest "fix" would be to reduce the aggressiveness
of inlining, or whatever is causing the formation of such a huge wasm function,
so as to keep the register allocator away from such pathological behaviour.

I should add: there are several very large functions in that file, not just one.
In order of increasing size, the top 9 sizes (in wasm bytecode bytes) are
101299, 103011, 111524, 133122, 146727, 149709, 404850, 651603, 1233871.
ort-wasm-simd-threaded.wasm also has very large functions, although
somewhat smaller than these.

Flags: needinfo?(tziade)

Are you in control of the building of ort-wasm-simd-threaded.jsep.wasm?

No, but the cmake script is here:
https://github.com/microsoft/onnxruntime/blob/main/cmake/onnxruntime_webassembly.cmake

If there are obvious changes we could make there, we could build our own artifacts.

The latter is larger and took about 10 minutes and around 4GB of memory.

10 minutes and 4GB is a no-go for our users. Is it possible to provide already-compiled versions so they skip that step?
If not, would it be possible for now to run that specific Wasm with Ion deactivated, in case the fix takes a long time since it's upstream?

Flags: needinfo?(tziade)

(In reply to Tarek Ziadé (:tarek) from comment #7)

10 minutes and 4GB is a no-go for our users.

Oh, indeed, 10 mins / 4GB is unreasonable in any scenario.

Is it possible to provide already-compiled versions so they skip that step?

We don't have any (simple) way to do that, but…

If not, would it be possible for now to run that specific Wasm with Ion deactivated, in case the fix takes a long time since it's upstream?

…yeah, something like that would be easy to do. The downside is that
you would get 60%–70% of the performance of Ion-compiled code.
Is that acceptable? Are these .wasms performance-critical?

Is that acceptable? Are these .wasms performance-critical?

It is performance-critical for sure.

This is the cmake file for building them https://github.com/microsoft/onnxruntime/blob/main/cmake/onnxruntime_webassembly.cmake

I'll reach out to the project to see if we can get some help.

Julian, could you provide the steps you used to manually compile the WASMs? I would also be curious to run the same thing on Chrome/Chromium to compare the time it takes.

Flags: needinfo?(jseward)

(In reply to Tarek Ziadé (:tarek) from comment #9)

I'll reach out to the project to see if we can get some help.

That might be worth doing, also because other wasm implementations
might be equally unhappy at having to do optimised compilation
of such huge functions.

(In reply to Tarek Ziadé (:tarek) from comment #10)

Julian, could you provide the steps you used to manually compile the WASMs?

Put this in a file (e.g. testCompileWasm.js):

if (scriptArgs.length != 1) {
    print("usage: testCompileWasm /path/to/file.wasm");
    quit(0);
}
print("testCompileWasm: reading");
let b2 = os.file.readFile(scriptArgs[0], "binary");
print("testCompileWasm: compiling");
let m2 = new WebAssembly.Module(b2);
print("testCompileWasm: done " + m2);

Then run (e.g.):

 /path/to/dist/bin/js --no-ion --no-threads --wasm-compiler=ion \
   -P wasm_experimental_inline_depth_limit=0 \
   -P wasm_experimental_inline_size_limit=0 \
   -P wasm_experimental_inline_call_ref_threshold=0 \
   testCompileWasm.js /path/to/ort-wasm-simd-threaded.jsep.wasm

If it doesn't like the -P bits, remove them.
Change --wasm-compiler=ion to --wasm-compiler=baseline as needed.
--no-ion applies only to JS; it has no effect on wasm.
--no-threads makes it a lot easier to profile/benchmark/debug.

Flags: needinfo?(jseward)

To build the shell there are various ways; here is what I use
on x86_64-linux. Pretty old-fashioned, I suspect, but it works.

Check out mozilla-central; then:

cd <mozilla-central>/js
mkdir BUILDX64OPT
cd BUILDX64OPT
CC="ccache clang" CXX="ccache clang++" ../src/configure --disable-debug --enable-optimize="-g -O2"
make -j8

The resulting binary should be BUILDX64OPT/dist/bin/js.

FYI, we want to land that new lib in central ASAP because it unlocks a lot of features/improvements we need; in an ideal world, before the end of September.

If we can't resolve this issue before then, it would be great to be able to disable that compilation for those libs and use baseline compilation for now, until we resolve it.

Severity: -- → S3
Priority: -- → P1
See Also: → 1887312

I tried the V8 engine (version 12.7.224.16) through d8 with:

let wasm = "ort-wasm-simd-threaded.jsep.wasm";
const wasmCode = read(wasm, "binary");
const wasmModule = new WebAssembly.Module(wasmCode);
print(wasmModule);

and:

➜  compwasm /usr/bin/time -l d8 --liftoff --wasm-tier-up testCompileWasm.js
[object WebAssembly.Module]
        0,04 real         0,11 user         0,01 sys
            75153408  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                5038  page reclaims
                   3  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   0  voluntary context switches
                 314  involuntary context switches
           956823949  instructions retired
           286924688  cycles elapsed
            60901696  peak memory footprint

It returns the module instantly, using 70MB of RSS. Maybe I am not calling it the right way?
My understanding is that --wasm-tier-up compiles it with TurboFan and --liftoff is the baseline compiler.

The main problem in this case with the register allocator is that for each virtual register, it maintains a linked list of ranges, sorted by start position. Inserting and removing ranges is O(n) so this blows up when splitting large ranges.

If I change this data structure from a linked list to an AvlTree, it improves ort-wasm-simd-threaded.jsep.wasm from 292 seconds to 41 seconds locally. This can likely be optimized more as my patch is a naive implementation. I'm also not sure if this list really needs to be sorted; there are a few places where we depend on it but there might be better ways to handle that. I'll look into that some more.
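
The linked-list vs. balanced-tree cost difference can be sketched with a toy operation count (illustrative only, not SpiderMonkey code; the function names and the worst-case assumption are mine):

```python
import math

def linked_list_total_ops(n):
    """Node visits for n inserts into a sorted linked list, assuming each
    insert scans to the end of the current list (the worst case that
    range splitting can hit): O(n) per insert, O(n^2) total."""
    return sum(range(n))  # 0 + 1 + ... + (n - 1)

def balanced_tree_total_ops(n):
    """Comparisons for n inserts into a balanced search tree such as an
    AVL tree: O(log i) for the i-th insert, O(n log n) total."""
    return sum(max(1, math.ceil(math.log2(i + 1))) for i in range(1, n + 1))
```

For n = 10,000 ranges that is about 50 million node visits versus roughly 120 thousand comparisons, the kind of asymptotic gap behind the 292 s → 41 s improvement reported above.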

Summary: js::wasm::IonCompileFunctions spins and eat RAM → js::wasm::IonCompileFunctions slow and resource consuming on onnx 1.20.x wasm

Now that we are tiering progressively, maybe a solution would be to never tier up extremely large functions where we know the compilation cost would outweigh the performance/battery-life benefit of the Ion-compiled code.

Now that we are tiering progressively, maybe a solution would be to never tier up extremely large functions where we know the compilation cost would outweigh the performance/battery-life benefit of the Ion-compiled code.

Not yet. The release timeline for the experimental compilation pipeline is not clear yet, and the reporter is expecting a solution by "the end of September".

This runs in 14 seconds for me locally with some more changes. The next issue is MIR dominator-tree building. The algorithm we have doesn't scale well to large MIR graphs, and I want to see if we can change it to Semi-NCA, a more recent algorithm that's also used by LLVM.

(In reply to Jan de Mooij [:jandem] from comment #20)

This runs in 14 seconds for me locally with some more changes. The next issue is MIR dominator-tree building. The algorithm we have doesn't scale well to large MIR graphs, and I want to see if we can change it to Semi-NCA, a more recent algorithm that's also used by LLVM.

I've reimplemented ComputeImmediateDominators using Semi-NCA and it's a large improvement: 14 seconds to 7.9 seconds for ort-wasm-simd-threaded.jsep.wasm. I also verified it produces exactly the same immediate dominators for all jit-tests and this Wasm file.
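
For illustration, here is a minimal immediate-dominator computation in the classic Cooper-Harvey-Kennedy iterative style (my own sketch, not the SpiderMonkey code); Semi-NCA computes exactly the same idom tree, just with much better scaling on huge graphs:

```python
def immediate_dominators(succs, entry):
    """Iterative idom computation (Cooper-Harvey-Kennedy style).
    `succs` maps each node to its successor list. Returns {node: idom},
    with the entry node mapped to itself."""
    preds = {n: [] for n in succs}
    for n, ss in succs.items():
        for s in ss:
            preds[s].append(n)

    # Depth-first search to get a reverse-postorder numbering.
    post, seen = [], set()
    def dfs(n):
        seen.add(n)
        for s in succs[n]:
            if s not in seen:
                dfs(s)
        post.append(n)
    dfs(entry)
    rpo = post[::-1]
    num = {n: i for i, n in enumerate(rpo)}

    idom = {entry: entry}
    def intersect(a, b):
        # Walk both nodes up the (partial) dominator tree until they meet.
        while a != b:
            while num[a] > num[b]:
                a = idom[a]
            while num[b] > num[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for n in rpo:
            if n == entry:
                continue
            new = None
            for p in preds[n]:
                if p in idom:
                    new = p if new is None else intersect(p, new)
            if idom.get(n) != new:
                idom[n] = new
                changed = True
    return idom
```

On a diamond CFG (A → B, C; B, C → D) this yields A as the immediate dominator of B, C, and D, matching what any correct dominator pass must produce.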

After that, the next problem is that the register allocator allocates a bitmap for each basic block with a bit for each virtual register. This scales poorly for huge graphs (both time and memory usage). Replacing this with a HashSet per block for very large graphs improves it to 5.6 seconds.
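
A minimal sketch of the two representations (simplified; class names are mine, the real allocator tracks liveness per LIR virtual register):

```python
class DenseLiveSet:
    """One bit per virtual register, allocated up front: size is
    ceil(num_vregs / 8) bytes per block no matter how few are live."""
    def __init__(self, num_vregs):
        self.bits = bytearray((num_vregs + 7) // 8)
    def add(self, vreg):
        self.bits[vreg // 8] |= 1 << (vreg % 8)
    def contains(self, vreg):
        return bool(self.bits[vreg // 8] & (1 << (vreg % 8)))

class SparseLiveSet:
    """Hash set: memory proportional to the registers actually live,
    which is what pays off on huge graphs."""
    def __init__(self):
        self.live = set()
    def add(self, vreg):
        self.live.add(vreg)
    def contains(self, vreg):
        return vreg in self.live
```

With ~200k virtual registers, the dense form costs ~24 KB per basic block even when only a handful of registers are live in it.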

These changes make Ion compilation >50x faster for this module.

A very large OpenOffice Wasm file of 200 MB improves only a little (6530 to 6450 ms) because these changes mostly fix pathological cases we don't see with other Wasm modules. I also have to measure how each of these changes performs on normal-size JS/Wasm MIR graphs.

Depends on: 1917766
Depends on: 1917817
Blocks: sm-regalloc
No longer blocks: 1913071
Blocks: 1913071
No longer blocks: sm-jits
Depends on: 1918357
Depends on: 1918970
Depends on: 1919025

Tarek, this should be much better on the latest mozilla-central.

I'll land a few more performance improvements. After that I also want to look into the memory usage - not sure how much we can do there but there's probably some low-hanging fruit.

Thanks so much for all the work you've done here, much appreciated! I've landed the lib, as it's now much more efficient. Looking forward to more improvements. I have also reached out to the onnx devs; they have an inference test suite they run on their side, and we could try it against Ion to see how things are looking.

Attaching a zipped copy of ort-wasm-simd-threaded.jsep.wasm to make sure we don't lose it. It has been useful to find perf cliffs in the compiler backend.

Note that onnx pushes its builds to https://cdn.jsdelivr.net/npm/onnxruntime-web@1.20.0-dev.20240917-afd642a194/dist/ so they can be grabbed from there at any time.

Depends on: 1920430
Depends on: 1920951

With the patches in the open bugs, this is down to 3.9 seconds on my machine, 75x faster than before. That's good enough until we enable lazy tiering.

I still want to look into the memory usage a little to see if there's low-hanging fruit there.

(In reply to Jan de Mooij [:jandem] from comment #26)

I still want to look into the memory usage a little to see if there's low-hanging fruit there.

The high memory usage was mostly caused by the dense bit sets: 132856 basic blocks with a bit set for 199477 virtual registers is at least 3.1 GB total. Bug 1920430 is making this a sparse bit set.
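
The 3.1 GB figure follows directly from storing one bit per (block, virtual register) pair; a quick check of the arithmetic:

```python
blocks = 132856  # basic blocks in the largest function
vregs = 199477   # virtual registers
gib = blocks * vregs / 8 / 2**30  # one bit per pair, converted to GiB
print(f"{gib:.1f} GiB")  # about 3.1 GiB
```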

Comparing a JS shell build with these changes to a shell build before these changes:

before: Maximum resident set size (kbytes): 4349284
after:  Maximum resident set size (kbytes): 1278296

Without parallel compilation it's 720 MB max. Most of that is memory allocated by Ion compilation of the largest function. We could optimize this more but I didn't see low-hanging fruit that'd reduce this a lot.

Thanks for the report. Quick question: on some hardware, where the inference engine will take around 200MB of RSS, we would like to avoid using more RAM.

Could we have a mode where we bypass Ion for this Wasm? I know it'll come at the cost of speed, but if we don't have that option, we won't be able to offer inference at all on lower-end devices where memory is scarce.

Flags: needinfo?(jdemooij)

Please file a separate bug for that in the WebAssembly component :) It's possible we should wait for the lazy tiering work to land first. That would also make it easier for us to not tier up very large functions (everywhere, or only on low-end devices).

Flags: needinfo?(jdemooij)
Depends on: 1922227
Depends on: 1922499
Assignee: jseward → jdemooij
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
No longer depends on: 1918357
Depends on: 1918357