[WASM] Demo at https://kubohiroya.github.io/vite-react-comlink-worker-assemblyscript-webgpu-boilerplate/ is 3x slower in Nightly
Categories: Core :: JavaScript: WebAssembly, task, P3
People: Reporter: mayankleoboy1, Assigned: jseward
References: Blocks 2 open bugs
Attachments: 1 file (203.35 KB, application/octet-stream)
Steps to reproduce:
1. Go to https://kubohiroya.github.io/vite-react-comlink-worker-assemblyscript-webgpu-boilerplate/
2. Set the slider to 500.
3. Click "AssemblyScript Start".
Nightly: https://share.firefox.dev/4iVXNlJ (16s, 8s)
Chrome: https://share.firefox.dev/3DxWr0l (5s)
Chrome, different run: https://share.firefox.dev/3W11ehr (15s, 6s)
Comment 1 • 1 month ago
It's spending almost all of its time in a single Wasm function. We'd have to look at the generated code to see why we're slower.
Comment 2 • 29 days ago (Reporter)
I retested on an older build (with and without lazy tiering) and consistently get 8.5s with Wasm. I don't know how or why I got 15s in the original comment.
Comment 3 • 22 days ago
I am seeing:
- Firefox with lazy tiering off: 4 seconds, consistently.
- Firefox with lazy tiering on: 10 seconds on the first run, 4 seconds on the second run (within a single page load).
- Chrome: 7.8 seconds, consistently.
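It looks like with lazy tiering we're getting stuck running baseline code for too long. This may be a place where we need OSR (on-stack replacement), so that a call already running in baseline code can switch to optimized code without waiting for the next call.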
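A toy cost model may help show why lazy tiering without OSR can leave one long-running call stuck in baseline code, producing the slow-first-run, fast-second-run pattern reported above. It is a sketch under loudly hypothetical assumptions: hotness is counted per loop iteration, optimized code compiles instantly once a threshold is hit, and without OSR only calls entered after that point use it. The costs, threshold, and names (`WasmFunc`, `run_call`) are made up, and this is not SpiderMonkey's actual tiering policy; the 2.5x baseline/optimized ratio was picked only to echo the 10s vs 4s figures, not measured.

```python
from dataclasses import dataclass

# Hypothetical numbers, chosen only to illustrate the mechanism.
BASELINE_COST = 2.5        # cost per loop iteration in baseline code (arbitrary units)
OPTIMIZED_COST = 1.0       # cost per loop iteration in optimized code
TIER_UP_THRESHOLD = 1_000  # iterations before optimized code becomes available


@dataclass
class WasmFunc:
    hotness: int = 0
    optimized_ready: bool = False  # optimized code exists for later entries


def run_call(f: WasmFunc, iterations: int, osr: bool) -> float:
    """Return the cost of one call whose body runs a loop `iterations` times."""
    # A call starts in whichever tier is available when it is entered.
    in_optimized = f.optimized_ready
    cost = 0.0
    for _ in range(iterations):
        if in_optimized:
            cost += OPTIMIZED_COST
            continue
        cost += BASELINE_COST
        f.hotness += 1
        if f.hotness >= TIER_UP_THRESHOLD:
            f.optimized_ready = True  # assume the optimized compile finishes instantly
        if osr and f.optimized_ready:
            in_optimized = True  # OSR: switch the running frame mid-loop


    return cost


if __name__ == "__main__":
    no_osr = WasmFunc()
    print(run_call(no_osr, 100_000, osr=False))  # first call: entirely baseline
    print(run_call(no_osr, 100_000, osr=False))  # second call: optimized from entry
    with_osr = WasmFunc()
    print(run_call(with_osr, 100_000, osr=True))  # switches once the threshold is hit
```

With these made-up numbers the first call without OSR costs 250,000 units and the second 100,000, while the OSR variant's first call costs 101,500 because it leaves baseline code after 1,000 iterations.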
Comment 4 • 20 days ago (Assignee)
To confirm: with lazy tiering enabled, once Wasm code is running, we spend
99.84% of all instructions in baseline-generated code, all in the same
function (index 24). Indeed, 74% of all instructions disappear into just one
hot loop of 4 basic blocks.
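Numbers are instruction counts. Columns: rank, (cumulative count, cumulative %), self count, self %, code address, location. A location of the form wasmBL:fI=24:+N refers to baseline-compiled Wasm function index 24 at offset N.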
rank ---cumulative--- -----self-----
0: (2995807213 63.22%) 2995807213 63.22% 0x36efd24e1afb wasmBL:fI=24:+1755
1: (3626130756 76.52%) 630323543 13.30% 0x36efd24e1bba wasmBL:fI=24:+1946
2: (3881093073 81.90%) 254962317 5.38% 0x36efd24e1aeb wasmBL:fI=24:+1739
3: (4051067951 85.49%) 169974878 3.59% 0x36efd24e1add wasmBL:fI=24:+1725
4: (4178549111 88.18%) 127481160 2.69% 0x36efd24e1ab8 wasmBL:fI=24:+1688
5: (4306030269 90.87%) 127481158 2.69% 0x36efd24e1ad0 wasmBL:fI=24:+1712
6: (4398100000 92.81%) 92069731 1.94% 0x36efd24e1a6e wasmBL:fI=24:+1614
7: (4483087441 94.61%) 84987441 1.79% 0x36efd24e1aab wasmBL:fI=24:+1675
8: (4568074881 96.40%) 84987440 1.79% 0x36efd24e1baa wasmBL:fI=24:+1930
9: (4624733175 97.60%) 56658294 1.20% 0x36efd24e1a9d wasmBL:fI=24:+1661
10: (4667226895 98.49%) 42493720 0.90% 0x36efd24e1a90 wasmBL:fI=24:+1648
11: (4702672980 99.24%) 35446085 0.75% 0x36efd24e1a5b wasmBL:fI=24:+1595
12: (4716851414 99.54%) 14178434 0.30% 0x36efd24e1a4d wasmBL:fI=24:+1581
13: (4731015988 99.84%) 14164574 0.30% 0x36efd24e1a40 wasmBL:fI=24:+1568
14: (4735451188 99.93%) 4435200 0.09% 0x4c14185 __memcpy_sse2_unaligned_erms+965
15: (4735522938 99.93%) 71750 0.00% 0x4c14180 __memcpy_sse2_unaligned_erms+960
16: (4735567794 99.93%) 44856 0.00% 0x4c070f0 _int_free
17: (4735609738 99.94%) 41944 0.00% 0x4bfd39d pthread_mutex_lock@@GLIBC_2.2.5+93
18: (4735644438 99.94%) 34700 0.00% 0x36efd24e16cb wasmBL:fI=24:+683
19: (4735679088 99.94%) 34650 0.00% 0x36efd24e1a36 wasmBL:fI=24:+1558
20: (4735709098 99.94%) 30010 0.00% 0x4bfec20 __pthread_mutex_unlock_usercnt
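To make the relationship between the columns concrete, here is a small Python sketch that parses rows in the format above, checks that each row's cumulative count equals the previous cumulative count plus that row's self count, and totals self counts per function by dropping the trailing offset. The helper names (`parse_rows`, `function_of`, and so on) are hypothetical and not part of any Mozilla tool; the sample input is ranks 13 to 16 copied verbatim from the table.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Ranks 13-16, copied verbatim from the profile above (instruction counts).
SAMPLE = """\
13: (4731015988 99.84%) 14164574 0.30% 0x36efd24e1a40 wasmBL:fI=24:+1568
14: (4735451188 99.93%) 4435200 0.09% 0x4c14185 __memcpy_sse2_unaligned_erms+965
15: (4735522938 99.93%) 71750 0.00% 0x4c14180 __memcpy_sse2_unaligned_erms+960
16: (4735567794 99.93%) 44856 0.00% 0x4c070f0 _int_free
"""

ROW_RE = re.compile(r"\s*(\d+): \((\d+) ([\d.]+)%\) (\d+) ([\d.]+)% (0x[0-9a-f]+) (\S+)")


@dataclass
class Row:
    rank: int
    cum_count: int   # running total of self counts up to and including this row
    cum_pct: float
    self_count: int  # instructions attributed to this address alone
    self_pct: float
    addr: int
    location: str


def parse_rows(text: str) -> list[Row]:
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m is None:
            continue  # skip headers and blank lines
        rank, cc, cp, sc, sp, addr, loc = m.groups()
        rows.append(Row(int(rank), int(cc), float(cp), int(sc), float(sp), int(addr, 16), loc))
    return rows


def cumulative_is_consistent(rows: list[Row]) -> bool:
    """Each cumulative count should be the previous one plus this row's self count."""
    return all(b.cum_count == a.cum_count + b.self_count for a, b in zip(rows, rows[1:]))


def function_of(location: str) -> str:
    """Strip the trailing offset: 'wasmBL:fI=24:+1568' -> 'wasmBL:fI=24'."""
    return re.sub(r":?\+\d+$", "", location)


def self_count_by_function(rows: list[Row]) -> dict[str, int]:
    totals: dict[str, int] = defaultdict(int)
    for r in rows:
        totals[function_of(r.location)] += r.self_count
    return dict(totals)


if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    print(cumulative_is_consistent(rows))
    print(self_count_by_function(rows))
```

Grouping by function like this is how the per-address rows for wasmBL:fI=24 add up to the single-function share quoted in the comment.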
Comment 5 • 20 days ago (Assignee)
Complete hotblock-snapshot profile, including identification of the
hottest loop.