Closed Bug 1442540 Opened 2 years ago Closed 9 months ago

Compute true values of arm64BaselineBytesPerBytecode and related definitions


(Core :: Javascript: WebAssembly, defect, P2)




Tracking Status
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- fixed


(Reporter: lth, Assigned: lth)


(Blocks 1 open bug)


(Whiteboard: [arm64:m3])


(2 files)

In wasm/WasmCompile.cpp we have some constants that we use to estimate the size of the output of the compiler from the size of the bytecode we're compiling, so that we can pre-size the output buffers and not incur a lot of copying overhead.

For ARM64 the current values are guesses that are not even used until the ARM64 baseline compiler lands.  We need to compute the true values here.

Also in that file are guesses about how fast compilation is.  We need to provide actual measurements for ARM64 here as well.  This is tricky because it depends to some extent on the system class, but anything's better than what we have now.

While we're in there we should document the exact methodology used for obtaining the measurements.
Summary: Compute true value of arm64BaselineBytesPerBytecode and related definitions → Compute true values of arm64BaselineBytesPerBytecode and related definitions
Blocks: Fennec-ARM64
Priority: P3 → P1
Target Milestone: --- → mozilla61
Lars, what is the status of this issue?
It would not be backported to 61 now, should we track firefox 62?
Flags: needinfo?(lhansen)
Target Milestone: mozilla61 → ---
We need this for 63, but it is not a requirement for 62.  Updating flags.
Flags: needinfo?(lhansen)
Target Milestone: --- → mozilla63
Component: JavaScript Engine: JIT → Javascript: Web Assembly
Lars, this bug will probably not make it for 63, do you know when you would be able to look at this bug?
Flags: needinfo?(lhansen)
Not until after TPAC, for sure...  I'm lowering to P2 because ARM64 is not tremendously important to us yet, and having the "right" value here is not important for correctness.  Also, there's some mumbling about maybe using telemtry to gather data here, so it'll happen, but not right now.
Flags: needinfo?(lhansen)
Priority: P1 → P2
Whiteboard: [arm64:m1]
[arm64:m3] because correctly sizing the compiler's output buffer doesn't need to block publishing ARM64 Fennec Nightly builds. It probably should block ARM64 Fennec riding to Beta.
Whiteboard: [arm64:m1] → [arm64:m3]
Agreed.  Hope to do this before long as part of bug 1496325.
Also, in ClassifySystem() in WasmCompile.cpp we should worry about what kind of system Windows-on-ARM64 is.  Right now it's classified as a mobile system because it's ARM64.
WIP: Compute, retain, and log (when possible) compilation resource use.  This is OK as far as it goes, but does not handle second-tier compilation when we tier up, it only reports on the first tier.  For the purposes of this particular bug that's probably good enough but we ought to do better.

SoftIron Overdrive 1000 (4xCortex-A57, pretty spiffy), representative runs, this is baseline only since we have no ion:

                bytecode  machinecode   machine/byte   ms   bytecodes/ms
zlib               66551       209840       3.15        3          22183
raybench           44504       173952       3.91        4          11126
box2d              97060       323200       3.33        6          16176
lua_binarytrees   215718       705952       3.27       12          17976
bullet            382678      1114016       2.91       17          22510
angrybots       10413501     29747744       2.86      414          25153
tanks           11039867     31967296       2.90      445          24808
zengarden       23891164     72693744       3.04     1032          23150

zlib highlights some cache effects. The benchmark appears to compile the same wasm program twice (ie create two new modules from the same bytecode array). The second compilation is about twice as fast as the first, presumably because both bytecode and compiler are in-cache. (What's reported here is the second compilation.)

As programs get larger, we see machine code bytes per bytecode byte approach 3.0. Since this is baseline, compile times are really not that interesting, but if we have a plausible ion/baseline compilation time factor from another platform we can use that as a placeholder. On x64, for the larger programs, that factor is about 4 on my Xeon system, let's say 5 to use a round number.

In addition, the ARM64 system I'm using for this is pretty fast, so let's assume that current consumer hardware will take twice as long to compile. (Bug 1496325 is about monitoring these values continually across our user population.) High-end phones now have 4 fast cores so the fact that my dev system is quad-core should not be a factor.

Thusly, we get the following initial values for ARM64:

arm64BaselineBytesPerBytecode = 3.0
arm64IonBytesPerBytecode = 2.75 (this is a guess, we have no data)
arm64BytecodesPerMs = 24000/5/2 = 2400

Remarkably, 2400 bytecodes/ms is faster than the value we're using for x64 (2100) but the reference hardware for x64 is fairly old, it might be worth re-measuring. On my Xeon system I see stable rates of 15K bytecodes/ms with Ion, and 60K-100K/ms with baseline.

Ah, but the perf data we want is per core, so now the world is right again: this gives an ARM64 bytecode per ms count (for Ion) around 600, which sounds just right. This is poorly documented, so I'll include better documentation in the patch.

As above but with --no-threads then:

raybench  7316 bytecodes/ms
bullet    7972 bytecodes/ms
angrybots 7390 bytecodes/ms
tanks     7291 bytecodes/ms
zengarden 6928 bytecodes/ms

This gives us pretty much 730 bytecodes/ms/core which is a good number.

Incidentally, the "guess" values currently in the code are very close to the computed values.

Very minor adjustments from our initial guesses, actually. Updated some comments and clarified some names.

The code for recording & computing these values will move to the telemetry bug; the present bug is only about the static values themselves.

Attachment #9035113 - Flags: review?(luke)

Comment on attachment 9035113 [details] [diff] [review]

Review of attachment 9035113 [details] [diff] [review]:

Nice job with the thorough evaluation!

Attachment #9035113 - Flags: review?(luke) → review+
Pushed by
Empirical ARM64 policy values for wasm tiering.  r=luke
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: mozilla63 → mozilla66
You need to log in before you can comment on or make changes to this bug.