Compute true values of arm64BaselineBytesPerBytecode and related definitions
Categories
(Core :: JavaScript: WebAssembly, defect, P2)
Tracking
()
People
(Reporter: lth, Assigned: lth)
References
Details
(Whiteboard: [arm64:m3])
Attachments
(2 files)
6.21 KB,
patch
|
Details | Diff | Splinter Review | |
4.65 KB,
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
In wasm/WasmCompile.cpp we have some constants that we use to estimate the size of the output of the compiler from the size of the bytecode we're compiling, so that we can pre-size the output buffers and not incur a lot of copying overhead. For ARM64 the current values are guesses that are not even used until the ARM64 baseline compiler lands. We need to compute the true values here. Also in that file are guesses about how fast compilation is. We need to provide actual measurements for ARM64 here as well. This is tricky because it depends to some extent on the system class, but anything's better than what we have now. While we're in there we should document the exact methodology used for obtaining the measurements.
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Comment 1•6 years ago
|
||
Lars, what is the status of this issue? It would not be backported to 61 now, should we track firefox 62?
Assignee | ||
Comment 2•5 years ago
|
||
We need this for 63, but it is not a requirement for 62. Updating flags.
Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 3•5 years ago
|
||
Lars, this bug will probably not make it for 63, do you know when you would be able to look at this bug?
Assignee | ||
Comment 4•5 years ago
|
||
Not until after TPAC, for sure... I'm lowering to P2 because ARM64 is not tremendously important to us yet, and having the "right" value here is not important for correctness. Also, there's some mumbling about maybe using telemtry to gather data here, so it'll happen, but not right now.
Updated•5 years ago
|
Comment 5•5 years ago
|
||
[arm64:m3] because correctly sizing the compiler's output buffer doesn't need to block publishing ARM64 Fennec Nightly builds. It probably should block ARM64 Fennec riding to Beta.
Assignee | ||
Comment 6•5 years ago
|
||
Agreed. Hope to do this before long as part of bug 1496325.
Assignee | ||
Comment 7•5 years ago
|
||
Also, in ClassifySystem() in WasmCompile.cpp we should worry about what kind of system Windows-on-ARM64 is. Right now it's classified as a mobile system because it's ARM64.
Assignee | ||
Comment 8•5 years ago
|
||
WIP: Compute, retain, and log (when possible) compilation resource use. This is OK as far as it goes, but does not handle second-tier compilation when we tier up, it only reports on the first tier. For the purposes of this particular bug that's probably good enough but we ought to do better.
Assignee | ||
Comment 9•5 years ago
•
|
||
SoftIron Overdrive 1000 (4xCortex-A57, pretty spiffy), representative runs, this is baseline only since we have no ion:
bytecode machinecode machine/byte ms bytecodes/ms
zlib 66551 209840 3.15 3 22183
raybench 44504 173952 3.91 4 11126
box2d 97060 323200 3.33 6 16176
lua_binarytrees 215718 705952 3.27 12 17976
bullet 382678 1114016 2.91 17 22510
angrybots 10413501 29747744 2.86 414 25153
tanks 11039867 31967296 2.90 445 24808
zengarden 23891164 72693744 3.04 1032 23150
zlib highlights some cache effects. The benchmark appears to compile the same wasm program twice (ie create two new modules from the same bytecode array). The second compilation is about twice as fast as the first, presumably because both bytecode and compiler are in-cache. (What's reported here is the second compilation.)
As programs get larger, we see machine code bytes per bytecode byte approach 3.0. Since this is baseline, compile times are really not that interesting, but if we have a plausible ion/baseline compilation time factor from another platform we can use that as a placeholder. On x64, for the larger programs, that factor is about 4 on my Xeon system, let's say 5 to use a round number.
In addition, the ARM64 system I'm using for this is pretty fast, so let's assume that current consumer hardware will take twice as long to compile. (Bug 1496325 is about monitoring these values continually across our user population.) High-end phones now have 4 fast cores so the fact that my dev system is quad-core should not be a factor.
Thusly, we get the following initial values for ARM64:
arm64BaselineBytesPerBytecode = 3.0
arm64IonBytesPerBytecode = 2.75 (this is a guess, we have no data)
arm64BytecodesPerMs = 24000/5/2 = 2400
Remarkably, 2400 bytecodes/ms is faster than the value we're using for x64 (2100) but the reference hardware for x64 is fairly old, it might be worth re-measuring. On my Xeon system I see stable rates of 15K bytecodes/ms with Ion, and 60K-100K/ms with baseline.
Assignee | ||
Comment 10•5 years ago
|
||
Ah, but the perf data we want is per core, so now the world is right again: this gives an ARM64 bytecode per ms count (for Ion) around 600, which sounds just right. This is poorly documented, so I'll include better documentation in the patch.
Assignee | ||
Comment 11•5 years ago
|
||
As above but with --no-threads then:
raybench 7316 bytecodes/ms
bullet 7972 bytecodes/ms
angrybots 7390 bytecodes/ms
tanks 7291 bytecodes/ms
zengarden 6928 bytecodes/ms
This gives us pretty much 730 bytecodes/ms/core which is a good number.
Incidentally, the "guess" values currently in the code are very close to the computed values.
Assignee | ||
Comment 12•5 years ago
|
||
Very minor adjustments from our initial guesses, actually. Updated some comments and clarified some names.
The code for recording & computing these values will move to the telemetry bug; the present bug is only about the static values themselves.
![]() |
||
Comment 13•5 years ago
|
||
Comment on attachment 9035113 [details] [diff] [review]
bug1442540-arm64-tier-parameters.patch
Review of attachment 9035113 [details] [diff] [review]:
Nice job with the thorough evaluation!
Comment 14•5 years ago
|
||
Pushed by lhansen@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/16819308720e Empirical ARM64 policy values for wasm tiering. r=luke
Comment 15•5 years ago
|
||
bugherder |
Description
•