Some devices run arm64 JS+wasm x10 slower than arm firefox or arm64 webview
Categories
(Core :: JavaScript: WebAssembly, defect, P3)
Tracking
()
People
(Reporter: colormatch, Unassigned)
References
Details
(Keywords: perf:responsiveness, Whiteboard: [arm64:m4] [geckoview:fenix:p2])
Attachments
(1 file: image/png, 620.28 KB)
User Agent: Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0
Steps to reproduce:
On some devices, Fenix runs JavaScript calculations about 10x slower.
This is probably related to aarch64 build optimizations.
An app built with the arm AAR has no performance issue on the same device, but it is slow as hell when built with the aarch64 AAR.
(The aarch64 AAR was tested from the latest beta channel and from nightly up to v67, as v68 was not loading properly.)
On a device with Android 8, Snapdragon 625, 2 GHz, 4 GB RAM:
- Fenix takes ~5.0 sec
- App with the built-in aarch64 AAR takes ~5.0 sec
- Firefox 66 (stable) takes less than 0.5 sec
- App with the built-in arm AAR takes less than 0.5 sec
Meanwhile, Fenix on a device with Android 9, Snapdragon 632, 1.8 GHz, 3 GB RAM performs OK (and the app with the built-in aarch64 AAR runs fast).
However, Fenix on a device running Android 9 on a Snapdragon 835 with 8 GB RAM takes ~4.0 seconds.
To reproduce:
Website: https://kinoseed.com/
Steps (see attached image):
- Load any image in "Load Image"
- Move a slider to modify image
Actual results:
Calculations run for about 5 seconds
Expected results:
Calculations should run in less than 0.5 seconds
Comment 1•6 years ago
Lars, is there a bug for using the ARM64 Ion backend for wasm? Or are we waiting for Cranelift?
Comment 2•6 years ago
We are waiting for Cranelift; there are no Ion plans. Ping me privately re scheduling of that project.
10x is a lot; most Wasm benchmarks I've run do not slow down by that much when run in the baseline compiler relative to Ion. So we might also be seeing other things here that are worth investigating. This could be JS slowdown (arm64 relative to arm; both use Ion) and other system effects. That said, it's not impossible that the 10x slowdown is all wasm.
Really what we need here are two profiles of the same program in the two configurations on some of these devices.
Note the reporter has one device where the arm64 code seems to be performing well; this is a little worrisome because it means the slowdowns are not uniform across all devices. That should be investigated too.
Comment 3•6 years ago (Reporter)
(In reply to Lars T Hansen [:lth] from comment #2)
> We are waiting for Cranelift; there are no Ion plans. Ping me privately re scheduling of that project.
> 10x is a lot; most Wasm benchmarks I've run do not slow down by that much when run in the baseline compiler relative to Ion. So we might also be seeing other things here that are worth investigating. This could be JS slowdown (arm64 relative to arm; both use Ion) and other system effects. That said, it's not impossible that the 10x slowdown is all wasm.
I'm afraid it is not all wasm.
On the example page, only the tab with tools like "brightness" uses some wasm, and only for part of the function.
On the problem devices, Firefox 66 and the WebView arm AAR build perform well (almost instantly), but Fenix and the aarch64 AAR build take ~7 seconds (which is more than a 10x performance decrease).
The 10x slowdown mentioned earlier came from comparing a non-wasm function (like moving the movie-style slider).
> Really what we need here are two profiles of the same program in the two configurations on some of these devices.
> Note the reporter has one device where the arm64 code seems to be performing well; this is a little worrisome because it means the slowdowns are not uniform across all devices. That should be investigated too.
If it helps, here are the apps built with the aarch64 and arm AARs, using the beta channel, version 67.0.20190506235559.
https://kinoseed.com/app-arm-release.apk
https://kinoseed.com/app-aarch64-release.apk
The app's GeckoView (GV) loads this URL: https://kinoseed.com/
Comment 4•6 years ago
Thanks for the quick feedback. I'll update the bug title to more clearly reflect what we're talking about.
Comment 5•6 years ago (Reporter)
After the latest Fenix update, a quick test showed both devices (which previously showed different performance) performing on par (at ~3 seconds per operation).
I don't know whether the previous discrepancy was due to an A/B test, to one device ending up with the 32-bit version for some reason, or to a testing mistake on my part, but as of now there is no discrepancy between the devices' performance.
Comment 6•6 years ago
(In reply to colormatch from comment #5)
> After the latest Fenix update, a quick test showed both devices (which previously showed different performance) performing on par (at ~3 seconds per operation).
> I don't know whether the previous discrepancy was due to an A/B test, to one device ending up with the 32-bit version for some reason, or to a testing mistake on my part, but as of now there is no discrepancy between the devices' performance.
If you see this problem again, feel free to re-open this bug.
Comment 7•6 years ago (Reporter)
Using Firefox Preview Version 1.0.1922 build: arm64-v8a
After loading an image on kinoseed.com:
- Turn the phone to landscape
- Press [ ] to enter full-screen (btw, there's a separate new bug, which cuts the view up to the notification for "full-screen")
- Adjust "Brightness"
Time for the brightness adjustment calculation:
- Android 8, Snapdragon 625, 2 GHz, 4 GB RAM: ~2 seconds
- Android 9, Snapdragon 632, 1.8 GHz, 3 GB RAM: ~1 second
For comparison:
- Firefox 67.0 on Android 8, Snapdragon 625, 2 GHz, 4 GB RAM: ~0.2 seconds
Comment 9•6 years ago
Noted as something we can look at, later on, when we're seeking Cranelift benchmarks.
In the meantime, if it's JS, it might be worth letting nbp & sstangl know about it.
Comment 10•6 years ago
We should try to get a profile on this, though if it's in JIT code it might be rather opaque. NI to nbp and sstangl for awareness.
Comment 11•6 years ago (Reporter)
FYI:
Some of the corrections, like "adjusting brightness", use wasm, but those change only the small LUT.
The computational cost comes from the trilinear interpolation (of LUT values) and from updating ImageData.data.buffer, and that part is in JS.
Turning "Sharpen" off and then comparing the "SAVE" (image export) time will let you compare just the JS part.
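For readers unfamiliar with the technique named above, here is a minimal sketch of trilinear interpolation over a 3D LUT applied to RGBA pixel data. It is not the site's code: the function names (makeIdentityLut, applyLut), the LUT memory layout, and the 0..255 value range are all assumptions made for illustration.

```javascript
// Hypothetical helper: builds an identity LUT with n samples per axis.
// Assumed layout: lut[((r * n + g) * n + b) * 3 + channel], values in 0..255.
function makeIdentityLut(n) {
  const lut = new Float32Array(n * n * n * 3);
  const step = 255 / (n - 1);
  for (let r = 0; r < n; r++) {
    for (let g = 0; g < n; g++) {
      for (let b = 0; b < n; b++) {
        const o = ((r * n + g) * n + b) * 3;
        lut[o] = r * step;
        lut[o + 1] = g * step;
        lut[o + 2] = b * step;
      }
    }
  }
  return lut;
}

// Applies the LUT in place to RGBA pixels (e.g. ImageData.data), leaving alpha untouched.
function applyLut(pixels, lut, n) {
  const scale = (n - 1) / 255;
  const at = (r, g, b, c) => lut[((r * n + g) * n + b) * 3 + c];
  const lerp = (a, b, t) => a * (1 - t) + b * t;
  for (let i = 0; i < pixels.length; i += 4) {
    // Position of the pixel inside the LUT grid.
    const r = pixels[i] * scale, g = pixels[i + 1] * scale, b = pixels[i + 2] * scale;
    const r0 = Math.floor(r), g0 = Math.floor(g), b0 = Math.floor(b);
    const r1 = Math.min(r0 + 1, n - 1), g1 = Math.min(g0 + 1, n - 1), b1 = Math.min(b0 + 1, n - 1);
    const fr = r - r0, fg = g - g0, fb = b - b0;
    for (let c = 0; c < 3; c++) {
      // Interpolate along blue, then green, then red (8 LUT reads per channel).
      const c00 = lerp(at(r0, g0, b0, c), at(r0, g0, b1, c), fb);
      const c01 = lerp(at(r0, g1, b0, c), at(r0, g1, b1, c), fb);
      const c10 = lerp(at(r1, g0, b0, c), at(r1, g0, b1, c), fb);
      const c11 = lerp(at(r1, g1, b0, c), at(r1, g1, b1, c), fb);
      pixels[i + c] = lerp(lerp(c00, c01, fg), lerp(c10, c11, fg), fr);
    }
  }
  return pixels;
}

// Usage: an identity LUT should leave the pixel values unchanged.
const samplePixels = new Uint8ClampedArray([10, 128, 250, 255]);
applyLut(samplePixels, makeIdentityLut(2), 2);
console.log(Array.from(samplePixels));
```

The inner loop does 24 LUT reads and a few dozen floating-point operations per pixel, which is why this step, rather than the small wasm LUT update, would dominate the run time on a multi-megapixel image.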
Comment 12•6 years ago
Adding the [geckoview:fenix:p2] whiteboard tag because the Fenix team is tracking this issue: https://github.com/mozilla-mobile/fenix/issues/2566
Comment 13•6 years ago (Reporter)
The latest nightly builds seem to perform fast, and just in time, as the Google Play 64-bit requirement starts today.
Comment 14•6 years ago (Reporter)
Unfortunately, I spoke too soon.
The problem persists; however, it doesn't seem to be device-dependent as first thought!
With two almost identical images from the camera roll (same settings, taken seconds apart), loading one produces a 10x lag, while loading the other works well.
Without restarting the app, loading the different images produces consistent (undesirable) results: the code runs fast when one of them is loaded and lags when the other is loaded, which is really weird.
Again, this only happens with the GV native platform arm64-v8a, and only some images trigger the problem.
I'll post a new bug report when I narrow the problem down.
Comment 15•6 years ago (Reporter)
JS example code that severely degrades performance on aarch64 GV:
// SLOW
data32 = new Uint32Array(idata2.data.buffer)
bm = 0xFF0000
sz = la.length-1
for (var yy = 0; yy < data32.length; ++yy) {
px = data32[yy]
blue = sz*((px & bm) >> 16)/255
}
// FAST
data32 = new Uint32Array(idata2.data.buffer)
for (var yy = 0; yy < data32.length; ++yy) {
px = data32[yy]
blue = (px & 0xFF0000) >> 16
blue *= la.length-1
blue /= 255
}
There was another issue when the computation runs in nested loops, with something like:
blue = var1*j/255
Again, breaking the computation apart, or not using more than one variable in the multiplication (not fully tested), significantly helps performance.
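To compare the two loop shapes from this comment outside the site, a small timing harness along these lines could be used. It is only a sketch under assumptions: the function names, the synthetic pixel data, and the value 32 for `la.length-1` are made up, and the variables here are locals, whereas the snippet above assigns to undeclared variables (implicit globals), which by itself may change what the JIT does. Each loop accumulates a sum so the work cannot be optimized away.

```javascript
// Shaped like the "SLOW" snippet: mask, shift, multiply, and divide in one expression.
function blueCombined(data32, sz) {
  const bm = 0xFF0000;
  let sum = 0;
  for (let yy = 0; yy < data32.length; ++yy) {
    const px = data32[yy];
    const blue = sz * ((px & bm) >> 16) / 255;
    sum += blue;
  }
  return sum;
}

// Shaped like the "FAST" snippet: the same arithmetic split into separate statements.
function blueSplit(data32, sz) {
  let sum = 0;
  for (let yy = 0; yy < data32.length; ++yy) {
    const px = data32[yy];
    let blue = (px & 0xFF0000) >> 16;
    blue *= sz;
    blue /= 255;
    sum += blue;
  }
  return sum;
}

// Runs fn several times and reports the best wall-clock time in milliseconds.
function timeIt(fn, data32, sz, runs = 5) {
  let best = Infinity;
  let result = 0;
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    result = fn(data32, sz);
    best = Math.min(best, performance.now() - t0);
  }
  return { best, result };
}

// Synthetic stand-in for new Uint32Array(idata2.data.buffer): about one million pixels.
const benchData = new Uint32Array(1 << 20);
for (let i = 0; i < benchData.length; i++) benchData[i] = (i * 2654435761) >>> 0;

const benchSz = 32; // assumed value of la.length-1
console.log("combined:", timeIt(blueCombined, benchData, benchSz).best.toFixed(2), "ms");
console.log("split:   ", timeIt(blueSplit, benchData, benchSz).best.toFixed(2), "ms");
```

Running the same page in the arm and arm64 builds and comparing the two printed times would show whether the expression shape alone reproduces the gap.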
Comment 16•6 years ago (Reporter)
(In reply to Chris Peterson [:cpeterson] from comment #12)
> Adding the [geckoview:fenix:p2] whiteboard tag because the Fenix team is tracking this issue: https://github.com/mozilla-mobile/fenix/issues/2566
Should a new bug report be filed?
It seems this is not a device-specific issue but an optimization problem affecting all devices
(unless the device-specific issue was fixed and this is something new).
Comment 19•5 years ago
re-NI'ing (removing sstangl)
Comment 20•5 years ago
Chris, can you keep an eye on this as we look at turning on cranelift for arm64?
Comment 21•5 years ago
:tcampbell, will do. Reading above, it's unclear to me how much of this is Wasm and how much is JS but we will certainly take a look once we enable in Nightly!
Comment 23•4 years ago
Oh interesting. I'll take a look tomorrow. Ion for arm64 is live (and will likely ship in FF90), so wasm perf ought no longer to be a concern here.
Comment 24•4 years ago
This is super fast on my M1 MBP, so I will assume this is fixed by the new wasm optimizing backend for arm64.
Comment 25•4 years ago
That said, disabling the optimizing JIT doesn't make that much of a difference in perceived perf on the M1, so there may have been other updates that matter more, or even updates to the site. It does use Wasm, but the code I'm looking at doesn't even have a loop; it's just a couple of 10-parameter f32 and i32 math functions that are both straight-line code, presumably to be called from JS, and they don't call each other. For something like this, the overhead of calling from JS is going to matter.
(Shallow investigation only.)
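The point about JS-to-wasm call overhead can be illustrated with a rough sketch like the one below. The hand-encoded module is a stand-in, not the site's code: it exports a single loop-free function `add(i32, i32) -> i32`, and the names sumViaWasm and sumViaJs are hypothetical. When the callee is this small, the per-call boundary cost is a large share of the total.

```javascript
// Minimal hand-encoded wasm module exporting add(a, b) = a + b (i32).
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section, one body
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);
const { add } = new WebAssembly.Instance(new WebAssembly.Module(wasmBytes)).exports;

// One wasm call per iteration, as when JS calls a tiny wasm helper per value.
function sumViaWasm(n) {
  let acc = 0;
  for (let i = 0; i < n; i++) acc = add(acc, i & 0xff);
  return acc;
}

// The same arithmetic written inline in JS.
function sumViaJs(n) {
  let acc = 0;
  for (let i = 0; i < n; i++) acc = (acc + (i & 0xff)) | 0;
  return acc;
}

for (const [label, fn] of [["wasm calls", sumViaWasm], ["inline JS", sumViaJs]]) {
  const t0 = performance.now();
  fn(5000000);
  console.log(label + ":", (performance.now() - t0).toFixed(2), "ms");
}
```

How large the gap is depends on the engine and on which compiler tier is active, so the numbers are only meaningful when compared between builds on the same device.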