Closed Bug 1747265 Opened 4 years ago Closed 4 years ago

Ion instruction reordering very slow on Emscripten 3 code (was: Wasm built with Emscripten >2.0 is 8 times slower to compile than WASM built with Emscripten 1.40.1 on Android phones RAM <=6GB)

Status:

RESOLVED FIXED

Milestone:

97 Branch

Tracking Flags:

firefox97: tracking ---, status fixed

People

(Reporter: kbrosnan, Assigned: jandem)

Details

Attachments

(4 files)

wasm.zip, 4 years ago, Lars T Hansen [:lth], 2.06 MB, application/zip
em3.0.1.wasm.js, 4 years ago, Lars T Hansen [:lth], 154 bytes, application/x-javascript
em1.40.1.wasm.js, 4 years ago, Lars T Hansen [:lth], 155 bytes, application/x-javascript
Bug 1747265 - Fix quadratic behavior when moving constants as part of instruction reordering. r?iain!, 4 years ago, Jan de Mooij [:jandem], 48 bytes, text/x-phabricator-request

Kevin Brosnan [Ex-Mozilla]

Reporter

Description

•

4 years ago

From github: https://github.com/mozilla-mobile/fenix/issues/22947.

With the default Firefox profile, WebAssembly.compile takes 8.9 s for WASM built with Emscripten >2.0 (em3.0.1.wasm in the attachment), while WASM built with Emscripten 1.40.1 takes 1.2 s (em1.40.1.wasm in the attachment). Tested on a Xiaomi K30 (6 GB RAM, 8 cores) with Firefox 96.0.0 beta.
wasm.zip
I then observed that CPU usage during compilation of the Emscripten >2.0 wasm is 48%, so I guess that this compilation may be using only one core.
When I set javascript.options.wasm_optimizingjit to false, the compile time for both wasms is about 140 ms, compared to 100 ms for Chromium 94. I also saw a "wasm_verbose" switch; is there a link to documentation for it?

Refer to this source comment: "When wasm bytecode arrives, we choose the compilation strategy based on switches and on aspects of the code and the hardware. If switches allow tiered compilation to happen (the normal case), the following logic applies. If the code is sufficiently large that tiered compilation would be beneficial but not so large that it might blow our compiled code budget and make compilation fail, we choose tiered compilation. Otherwise we go straight to optimized code."
It seems that with the default profile, compilation goes straight to "optimized code". Is this reasonable?
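
The rule in the quoted comment can be sketched as a small decision function. This is only an illustration of the quoted logic for the case where switches allow tiering: the function name, the use of bytes as the size unit, and both thresholds are hypothetical and are not taken from the engine.

```python
def choose_strategy(code_size, tier_min, tier_max):
    """Pick a strategy when switches allow tiering (the 'normal case').

    tier_min: size above which tiering is assumed to pay off (hypothetical).
    tier_max: size above which tiering might exceed the code budget (hypothetical).
    """
    if tier_min <= code_size <= tier_max:
        return "tiered"
    # Too small to benefit, or too large for the budget.
    return "optimized"


# Made-up thresholds for illustration only.
TIER_MIN = 250_000
TIER_MAX = 50_000_000

small = choose_strategy(100_000, TIER_MIN, TIER_MAX)
medium = choose_strategy(2_000_000, TIER_MIN, TIER_MAX)
print(small, medium)
```

Under these made-up thresholds the small module goes straight to optimized code and the mid-sized one is tiered, so a module that lands just below the lower limit skips the baseline tier entirely.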

Android device: SAMSUNG A51 (RAM 4G),Xiaomi K30(Ram= 6GB),etc

Fenix version: firefox android 96.0.0 beta

Change performed by the Move to Bugzilla add-on.

Yury Delendik (:yury)

Updated

•

4 years ago

Component: JavaScript Engine → Javascript: WebAssembly

Lars T Hansen [:lth]

Comment 1

•

4 years ago

This could indicate:

- tiering heuristics shunt 2.0 content over to Ion directly instead of via baseline, which could happen if the 2.0 content is smaller and falls just below the limit while the 1.40 content was just over the limit
- background compiler gets only one compile thread, which could happen because I believe we're now using the gecko scheduler
- tiering logic could make its decision based on a faulty assumption about how many threads it is going to get

We should also try to figure out if the ram size has anything to do with it; it could, if the scheduler takes memory size into account.

"More data needed". We should prioritize this.

Severity: -- → S2

Priority: -- → P2

Lars T Hansen [:lth]

Comment 2

•

4 years ago

Attached file wasm.zip — Details

Lars T Hansen [:lth]

Comment 3

•

4 years ago

I think the problem may be something else. Ion takes >3x longer to compile the 3.0 content than the 1.40 content on the foreground thread and >10x longer on background threads. On my x64 desktop system with this morning's mozilla-central,

Running on foreground thread only:

$ ~/m-u/obj-release/dist/bin/js --wasm-compiler=ion --no-threads em1.40.1.wasm.js
1451.291015625ms

$ ~/m-u/obj-release/dist/bin/js --wasm-compiler=ion --no-threads em3.0.1.wasm.js
5109.98388671875ms

Running multithreaded:

$ ~/m-u/obj-release/dist/bin/js --wasm-compiler=ion em1.40.1.wasm.js
349.660888671875ms

$ ~/m-u/obj-release/dist/bin/js --wasm-compiler=ion em3.0.1.wasm.js
4049.534912109375ms

Obviously this will have a profound impact on everything. The fact that multithreaded compilation slows down so much suggests that Ion gets stuck in one function somewhere and so the slowest background thread determines the runtime of the whole thing. I think focussing on the foreground compilation case first makes sense.
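
To illustrate why one slow function can dominate multithreaded compilation, here is a minimal sketch of greedy task scheduling. The per-function costs and the worker count are invented for illustration, they are not measurements from the attached modules, and the scheduler is a generic model rather than the one the engine uses.

```python
import heapq


def makespan(costs, workers):
    """Greedy scheduling: each task goes to the worker that frees up first.

    Returns the wall-clock time at which the last worker finishes.
    """
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)


# Hypothetical per-function compile costs in ms (not measured data).
even = [10.0] * 100              # 1000 ms of work, evenly spread
skewed = [900.0] + [1.0] * 100   # 1000 ms of work, one huge function

even_time = makespan(even, 8)
skewed_time = makespan(skewed, 8)
print(even_time, skewed_time)
```

Both workloads contain the same total amount of work, but in the skewed case the wall-clock time cannot drop below the cost of the single largest function, however many threads are available.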

Lars T Hansen [:lth]

Comment 4

•

4 years ago

Attached file em3.0.1.wasm.js — Details

Lars T Hansen [:lth]

Comment 5

•

4 years ago

Attached file em1.40.1.wasm.js — Details

Lars T Hansen [:lth]

Comment 6

•

4 years ago

Perf says 65% of the time is spent in ReorderInstructions, mostly in MoveBefore, and indeed when disabling that pass the running time drops back down to what it should be.

Nothing interesting seems to have been done with that pass in quite some time; Jan added an early exit check this summer, but nothing since. We should try to bisect for the sake of having done it but I expect that what happens here is that emscripten 3 emits code that triggers nonlinear behavior in code that uses MoveBefore - MoveBefore has an embedded loop and itself is called from a loop, so maybe that's it, or maybe it's more complex.
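
The suspected pattern, a loop inside a helper that is itself called from a loop, can be sketched with a toy model. This is not the real MoveBefore: the block is a plain list of names, constants are recognised by a name prefix, and the function only counts how many instructions would need a new number on each move. All names here are hypothetical.

```python
def make_block(pairs):
    """Build a toy block that alternates ordinary ops and constants."""
    return [name for k in range(pairs) for name in (f"op{k}", f"const{k}")]


def move_before(block, at, src):
    """Move block[src] to index `at` (at <= src).

    Returns how many instructions sit between the new and old position,
    i.e. how many ids a renumbering loop would have to bump.
    """
    block.insert(at, block.pop(src))
    return src - at


def hoist_constants_naive(block):
    """Move every constant to the top of the block, one move per constant."""
    block = list(block)   # work on a copy
    renumbered = 0
    top = 0               # next free slot at the start of the block
    for i in range(len(block)):
        if block[i].startswith("const"):
            renumbered += move_before(block, top, i)
            top += 1
    return renumbered


work_100 = hoist_constants_naive(make_block(100))
work_200 = hoist_constants_naive(make_block(200))
print(work_100, work_200)
```

In this model, doubling the block size roughly quadruples the renumbering work, which is the kind of growth that would stay unnoticed on small blocks and become very slow on large ones.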

Lars T Hansen [:lth]

Updated

•

4 years ago

Summary: Wasm built with Emscripten >2.0 is 8 times slower to compile than WASM built with Emscripten 1.40.1 on Android phones RAM <=6GB → Ion instruction reordering very slow on Emscripten 3 code (was: Wasm built with Emscripten >2.0 is 8 times slower to compile than WASM built with Emscripten 1.40.1 on Android phones RAM <=6GB)

Jan de Mooij [:jandem]

Assignee

Comment 7

•

4 years ago

•

Edited

It seems to come mostly from the code we have for moving constants: this can be O(n^2), which is very slow for large blocks. I think we can rewrite this to be linear.

Flags: needinfo?(jdemooij)

Jan de Mooij [:jandem]

Assignee

Comment 8

•

4 years ago

Attached file Bug 1747265 - Fix quadratic behavior when moving constants as part of instruction reordering. r?iain! — Details

Instruction reordering has code to move constants with a single use to the start of
the block, to allow more instruction reordering. This uses MoveBefore, which renumbers
instructions appropriately but unfortunately is also quadratic.

We can fix this by moving these constants before we do the initial renumbering.
On certain Wasm workloads this pass moves the same number of constants (and other
instructions) as before, but is now much faster.

Moving constants up this eagerly is a bit questionable and at some point we might want
to revisit the reordering pass completely, but for now this is a pretty simple rewrite
that fixes a bad perf cliff.
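
The idea of hoisting constants before numbering, rather than moving them one by one afterwards, can be sketched as follows. This is a toy model under stated assumptions, not the actual patch: instructions are strings, the name prefix "const" stands in for "constant with a single use", and the work counters are simplified to id assignments only.

```python
def is_hoistable_const(ins):
    # Assumption: the name prefix stands in for "constant with a single use".
    return ins.startswith("const")


def reorder_with_moves(block):
    """Old shape: number first, then move constants up one at a time."""
    block = list(block)
    renumbered = 0
    top = 0
    for i in range(len(block)):
        if is_hoistable_const(block[i]):
            block.insert(top, block.pop(i))
            renumbered += i - top   # ids bumped between new and old slot
            top += 1
    return block, renumbered


def reorder_hoist_first(block):
    """New shape: hoist constants in one pass, then number everything once."""
    consts = [ins for ins in block if is_hoistable_const(ins)]
    others = [ins for ins in block if not is_hoistable_const(ins)]
    ordered = consts + others
    return ordered, len(ordered)    # one id assignment per instruction


block = [name for k in range(1000) for name in (f"op{k}", f"const{k}")]
old_order, old_work = reorder_with_moves(block)
new_order, new_work = reorder_hoist_first(block)
print(old_order == new_order, old_work, new_work)
```

Both functions produce the same instruction order in this model, which mirrors the statement that the pass moves the same constants as before; only the amount of renumbering work differs.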

Phabricator Automation

Updated

•

4 years ago

Assignee: nobody → jdemooij

Status: NEW → ASSIGNED

Jan de Mooij [:jandem]

Assignee

Updated

•

4 years ago

Flags: needinfo?(jdemooij)

Pulsebot

Comment 9

•

4 years ago

Pushed by jdemooij@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fd0ef0d3b8d3 Fix quadratic behavior when moving constants as part of instruction reordering. r=iain

Sandor Molnar[:smolnar]

Comment 10

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/fd0ef0d3b8d3

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox97: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 97 Branch

Categories

(Core :: JavaScript: WebAssembly, defect, P2)
