Open
Bug 1345476
Opened 7 years ago
Updated 2 years ago
[exploration] Do size-profiling of EpicZenGarden codegen to look for size-reduction opportunities
Categories
(Core :: JavaScript: WebAssembly, task, P3)
NEW
People
(Reporter: luke, Unassigned)
References
(Blocks 1 open bug, )
Details
Attachments
(1 file, 4 obsolete files)
518.85 KB,
application/pdf
...especially on 32-bit, where code-memory size is more limited (bug 1345205). For a 44mb .wasm file, we're currently getting about 105mb of executable code. Bug 1334504 should help here by removing the separate "profiling" prologues/epilogues. I think a good start might be measuring the aggregate size of the code emitted by generateOutOfLineCode() and doing per-op-type profiling if that's high. It'd also be good to sanity-check that the stubs/glue code added between functions (by ModuleGenerator) is only a small fraction of the total.
Updated•7 years ago (Reporter)
Summary: Baldr: do size-profiling of EpicZenGarden codegen to look for → Baldr: do size-profiling of EpicZenGarden codegen to look for size-reduction opportunities
Comment 1•7 years ago
mbebenita observed at one point that the baseline code was 2x the size of the Ion code and with the removal of patching that can only have gotten worse. Tiering is likely to make the code management problem worse.
Comment 2•7 years ago (Reporter)
Ugh, that's a good point; like 3x worse. A mitigation could be to not do baseline compilation for wasm modules bigger than a certain threshold on 32-bit. Really, a 44mb .wasm module (14mb gzipped) is a bit ridiculous no matter how you slice it; Unity's done a lot more work on code size and is only 12mb (3.6mb gzipped).
Comment 3•7 years ago
And yet it's presumably these large modules that would benefit the most from baseline compilation... And we'll need an answer for debugging needs, probably? (Could be as ugly and easy as not limiting code size in Dev Ed, I suppose, but that's scant help for a dev trying to diagnose a customer problem in a release browser.)
Comment 4•7 years ago (Reporter)
Right, that could be a reason to increase the max-code-bytes quota further on 32-bit. For debugging, a 64-bit browser would work fine.
Comment 5•7 years ago (Reporter)
Another idea would be to switch baseline, in these gigantor cases, to do per-function JIT compilation. If all baseline calls went through a table (which could be the same mechanism for tiering into Ion), then that table could be pre-populated with stubs that compile when called. I still think we'd have a smoother experience (and be able to take advantage of streaming+parallel compilation) with AOT baseline, though, so probably we'd want to prefer this when not prohibited by memory.
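The table-of-stubs idea above can be sketched roughly as follows. This is a toy model in Python, not SpiderMonkey code: `LazyFunctionTable`, `toy_compile`, and the use of Python callables standing in for machine code are all assumptions made for illustration.

```python
from typing import Callable, List


class LazyFunctionTable:
    """Toy call table whose entries start out as compile-on-first-call stubs."""

    def __init__(self, bodies: List[str], compile_fn: Callable[[str], Callable[..., int]]):
        self._bodies = bodies
        self._compile_fn = compile_fn
        self.compiled_count = 0
        # Every slot initially holds a stub; nothing is compiled up front.
        self._table: List[Callable[..., int]] = [
            self._make_stub(i) for i in range(len(bodies))
        ]

    def _make_stub(self, index: int) -> Callable[..., int]:
        def stub(*args: int) -> int:
            # First call: compile, patch the table slot, then run the real code.
            code = self._compile_fn(self._bodies[index])
            self._table[index] = code
            self.compiled_count += 1
            return code(*args)
        return stub

    def call(self, index: int, *args: int) -> int:
        # All calls go through the table, so a patched slot is picked up automatically.
        return self._table[index](*args)


def toy_compile(body: str) -> Callable[..., int]:
    # Hypothetical "compiler": turns an expression over `x` into a Python function.
    return eval("lambda x: " + body)


table = LazyFunctionTable(["x + 1", "x * 2", "x - 3"], toy_compile)
result = table.call(1, 21)  # compiles only function 1, then runs it
```

Only functions that are actually called ever get compiled, which is the memory benefit; the cost is the indirection on every call and losing the up-front streaming/parallel compilation mentioned above.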
Updated•7 years ago
Priority: -- → P3
Comment 6•7 years ago (Reporter)
As an update: due to size-reductions in the .wasm file itself (bug 1341633), and improvements in codegen (bug 1338217 and bug 1334504), a 32-bit Ion build is 76mb and a 32-bit --wasm-always-baseline build is 113mb. So one of these should fit in the now-140mb quota (as increased by bug 1345205).
Comment 7•7 years ago
So I took a look at the EpicZenGarden .wasm file and it looks like the name section is left in there which takes up 8,602,868 bytes. Not super important, but useful to know when doing size comparisons. Also, I'm not quite sure which .wasm file you are looking at, mine is 39,510,398 bytes.

Here's a breakdown of all the instructions we emit and the total number of bytes used to encode them:

Instruction    Bytes
cdq                2
cqo                4
div               10
idiv              12
sqrtsd            24
bsr               45
cvttss2si         45
andpd             55
bsf              141
jo               174
cvttsd2si        230
movq             245
xchg             258
roundsd          270
js               330
setbe            359
setge           1605
cvtsi2sd        2138
setle           2489
divsd           2557
movapd          2873
cvtsd2ss        2892
ucomisd         3002
movsxd          3009
addsd           3605
roundss         4141
subsd           4282
jnp             4872
xorpd           5533
mulsd           5706
setb            6571
sqrtss          7473
ja              9818
andps          10832
setg           12000
sar            13151
jb             13182
jp             13866
jg             15780
cvtss2sd       16047
setae          16141
pcmpeqw        18960
setl           19922
psllq          22752
cvtsi2ss       23316
divss          25684
movd           28364
seta           36983
setne          44202
jbe            45066
movsd          46027
shr            46723
sete           47028
xorps          52630
jl             54798
ucomiss        61401
jge            79826
lea            82510
or             88758
imul           88899
jle            92686
cmove         111055
ret           112293
subss         127990
movaps        153815
nop           224586
movsx         230824
movabs        233030
shl           263491
movzx         302992
addss         313032
jne           370436
xor           376958
and           448103
sub           497382
mulss         507326
test          543892
jmp           786174
cmp           836107
je            890184
jae           896890
movss        2074553
call         2777591
add          2802569
mov         22145761
Total       39219338

I didn't use any command line args, so I'm guessing it's a 64-bit Ion build on OSX. So many moves :|
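To put the largest entries of the breakdown above in proportion, here is a small Python sketch that parses a few rows in the "mnemonic bytes" form and reports each as a percentage of the stated total. Only the five largest rows are copied here; the helper names `parse_pairs` and `percent_of_total` are made up for this example.

```python
# The five largest rows from the comment above, as "mnemonic bytes" pairs.
raw = "jae 896890 movss 2074553 call 2777591 add 2802569 mov 22145761"
TOTAL_BYTES = 39219338  # the "Total" reported in the comment


def parse_pairs(text):
    """Turn 'name size name size ...' into a {name: size} dict."""
    tokens = text.split()
    return {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens), 2)}


def percent_of_total(sizes, total):
    """Map each name to its share of `total`, in percent."""
    return {name: 100.0 * size / total for name, size in sizes.items()}


sizes = parse_pairs(raw)
shares = percent_of_total(sizes, TOTAL_BYTES)
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:6s} {sizes[name]:>10d} {share:5.1f}%")
```

By this measure `mov` alone accounts for a little over half of the reported bytes, which is what "So many moves" is pointing at.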
Comment 8•7 years ago
Some more data related to the emitted code in ion is attached. Both the chart and the table are ordered by the average byte size per op.
Comment 9•7 years ago (Reporter)
Thanks! (cc'ing a few people who might be interested in this raw data) After scanning the list a few times, I can't really see anything unexpected here: the "flabby" ops (with more than ~9 bytes/op) all use an insignificant % of total bytes. (Speaking of, what was the sum total code size? It'd be nice to have a column next to totalBytes which has % of total.) The comparisons (*AndBranch) are all a bit bigger than expected, but I expect this is due to immediates. One interesting thing is that the single biggest op (at 11.2mb) is MoveGroup, which is inserted by the register allocator when spilling is necessary; given all the existing work on this, I doubt there is any low-hanging fruit here. There are quite a lot of calls here (578k), so the lack of non-volatile registers could be a significant contributing factor.
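The "% of total" column requested above, together with the bytes/op figure the attachment is ordered by, could be computed along these lines. This is only a sketch: the `OpStat` record, the `summarize` helper, the 2% significance cutoff, and all of the numbers in `example` are assumptions for illustration and are not taken from the attachment.

```python
from dataclasses import dataclass


@dataclass
class OpStat:
    name: str
    count: int        # number of times the op was emitted
    total_bytes: int  # bytes of machine code emitted for it


def summarize(stats, flabby_bytes_per_op=9.0, significant_percent=2.0):
    """Return rows of (name, bytes/op, % of total, worth_a_look), biggest bytes/op first."""
    grand_total = sum(s.total_bytes for s in stats)
    rows = []
    for s in stats:
        per_op = s.total_bytes / s.count
        percent = 100.0 * s.total_bytes / grand_total
        # Only ops that are both large per instance and a real share of the total
        # are interesting as size-reduction targets.
        worth_a_look = per_op > flabby_bytes_per_op and percent >= significant_percent
        rows.append((s.name, per_op, percent, worth_a_look))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Made-up numbers, for illustration only.
example = [
    OpStat("HypotheticalBigRareOp", count=100, total_bytes=4_000),
    OpStat("HypotheticalSmallHotOp", count=200_000, total_bytes=900_000),
    OpStat("HypotheticalBigHotOp", count=10_000, total_bytes=120_000),
]
rows = summarize(example)
for name, per_op, percent, flag in rows:
    print(f"{name:24s} {per_op:6.1f} B/op {percent:5.1f}% {'<-- look here' if flag else ''}")
```

With these invented inputs only the op that is both above the bytes/op threshold and above the percentage cutoff is flagged; a rare-but-large op and a hot-but-compact op are both filtered out.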
Comment 10•7 years ago
A few more ideas, at a quick look:

WrapInt64ToInt32 is currently a `movl`; with care this could be optimized to a no-op in many cases, since 32-bit instructions only read the low 32 bits of their inputs.

NegD/NegF would be smaller with a constant-pool load instead of materializing the constant manually.

Branch immediates: the macroassembler currently always uses 32-bit immediates for forward branches. For branches within individual LIR opcodes, the code generator may be able to declare that the destination is within range for an 8-bit immediate.
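As a rough illustration of what the branch-immediate idea could save, the sketch below uses the standard x86 encodings: a short branch (`jmp rel8` or `jcc rel8`) is 2 bytes, while the near forms are 5 bytes for `jmp rel32` and 6 bytes for `jcc rel32`. The functions `branch_size` and `bytes_saved` are hypothetical helpers, not part of the macroassembler, and the displacement is taken as already known (measured from the end of the branch instruction).

```python
def branch_size(displacement: int, conditional: bool) -> int:
    """Size in bytes of an x86 branch, using the rel8 form when the displacement fits."""
    if -128 <= displacement <= 127:
        return 2                      # one opcode byte + 8-bit displacement
    return 6 if conditional else 5    # 0F 8x rel32 (jcc) or E9 rel32 (jmp)


def bytes_saved(displacements, conditional: bool = True) -> int:
    """Bytes saved compared with always emitting the rel32 form."""
    always_rel32 = 6 if conditional else 5
    return sum(always_rel32 - branch_size(d, conditional) for d in displacements)


# Five conditional forward branches; the first three are within rel8 range.
saved = bytes_saved([10, 100, 127, 128, 5000])
```

Each conditional branch that fits in 8 bits saves 4 bytes (3 for an unconditional jump), so the total benefit depends on how many intra-opcode branches there are.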
Comment 11•7 years ago
Sheets updated with explicit percentages.
Attachment #8884488 -
Attachment is obsolete: true
Comment 12•7 years ago (Reporter)
Thanks! So scanning the whole list again, it seems like all ops are in one of 3 categories: (1) insignificant % of total (<2%), (2) already optimized and thus not a likely source of low-hanging fruit (call, load, store, movegroup), (3) just super-hot and not flabby (comparisons, add). So unfortunately no clear action here, just "good job Ion!". Next, it'd be useful to profile the size of the OutOfLineCodes emitted by generateOutOfLineCode().
Comment 13•7 years ago
Updated data with OOL operations.
Attachment #8885005 -
Attachment is obsolete: true
Comment 14•7 years ago (Reporter)
Wow, so out-of-line is pretty light then, mostly just out-of-line switch jump tables. So the total size here is 37.7mb whereas about:memory reports 56.7mb for the total code allocation. The two significant remaining buckets are wasm trap out-of-line paths (emitted by masm.wasmEmitTrapOutOfLineCode()) and prologue/epilogue code (emitted by GenerateFunction(Prologue|Epilogue)). Just to make sure we've covered the whole 56.7mb, could you measure and include these as well?
Comment 15•7 years ago
Attachment #8886737 -
Attachment is obsolete: true
Comment 16•7 years ago
(In reply to Luke Wagner [:luke] from comment #14)
> GenerateFunction(Prologue|Epilogue)). Just to make sure we've covered the
> whole 56.7mb, could you measure and include these as well?

The grand total is around 49MB, so I'm wondering: waaaat else might be taking the remaining ~7MB??
Comment 17•7 years ago (Reporter)
Yeah, that sounds interesting. Other buckets: padding (functions are aligned to 16 byte boundaries, iirc), and the stubs at the end (emitted by ModuleGenerator::finishCodegen). (Note: I _think_ about:memory is measuring MB not MiB.)
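The padding bucket mentioned above can be estimated from a list of function sizes. The sketch below assumes functions are laid out back to back with each start rounded up to a 16-byte boundary (the comment itself hedges this with "iirc"); `align_up` and `total_padding` are hypothetical helper names, and the sizes are invented.

```python
def align_up(offset: int, alignment: int = 16) -> int:
    """Round `offset` up to the next multiple of `alignment`."""
    return (offset + alignment - 1) // alignment * alignment


def total_padding(function_sizes, alignment: int = 16) -> int:
    """Bytes of padding inserted so that every function starts on an aligned offset."""
    offset = 0
    padding = 0
    for size in function_sizes:
        start = align_up(offset, alignment)
        padding += start - offset   # gap between the previous function and this one
        offset = start + size
    return padding


# Invented function sizes, in bytes.
pad = total_padding([10, 16, 33, 5])
```

Padding is at most 15 bytes per function under this assumption, so a module with very many small functions can accumulate a noticeable amount of it.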
Comment 18•7 years ago
> (Note: I _think_ about:memory is measuring MB not MiB.) Hm, looks like MiB: https://hg.mozilla.org/mozilla-central/annotate/e0b0865639cebc1b5afa0268a4b073fcdde0e69c/toolkit/components/aboutmemory/content/aboutMemory.js#l1600 (Aside: TIL verbose mode gives you the exact bytes!)
Comment 19•7 years ago (Reporter)
D'oh! Sorry about that; knowing njn, I had assumed MiB would've been used if MiB was meant.
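Since the MB/MiB distinction matters when comparing about:memory figures against byte counts, here is a small conversion sketch. It assumes the 56.7 figure quoted earlier in the thread is in MiB, as the about:memory source linked above suggests; `mib_to_mb` is a made-up helper name.

```python
MIB = 1024 * 1024   # bytes per mebibyte
MB = 1000 * 1000    # bytes per (decimal) megabyte


def mib_to_mb(mib: float) -> float:
    """Convert a size in MiB to decimal megabytes."""
    return mib * MIB / MB


reported_mib = 56.7            # figure quoted from about:memory earlier in the thread
as_mb = mib_to_mb(reported_mib)
print(f"{reported_mib} MiB is about {as_mb:.2f} MB")
```

The two units differ by just under 5%, which at this scale is a few megabytes, the same order of magnitude as the unaccounted-for remainder being chased in this bug.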
Comment 20•7 years ago
(In reply to Luke Wagner [:luke] from comment #17)
> Yeah, that sounds interesting. Other buckets: padding (functions are
> aligned to 16 byte boundaries, iirc), and the stubs at the end (emitted by
> ModuleGenerator::finishCodegen). (Note: I _think_ about:memory is measuring
> MB not MiB.)

That's interesting: I've left my Linux laptop at the office today and I ran the data-gathering patch on my personal Mac (including the stubs). We're getting 3 generated stubs that add up to roughly 10MB. I'm seeing pretty bizarre numbers here, though; I have to recheck everything and re-run the numbers with the same baseline tomorrow.
Comment 21•7 years ago
Alright: I've found the origin of those bizarre numbers I was getting for some emitted ops (RTTI dark magic, fwiw :)). There are only two stubs and they add up to around 5MB (5122589 bytes). Those, along with the 16-byte alignment padding, should account for everything, at least... I guess. :)
Comment 22•7 years ago
Here's the updated table, including the stubs and the relevant byte-size measurements of the compiled-code merges. All in all we're almost there in accounting for everything (the grand total is "almost" 56MB). We're still missing a couple of megabytes, as the compiled-code merge ("asmMergeWith") calls add up to exactly the 56MB we're seeing in about:memory. Luke, do you think this may be due to the alignment padding?
Attachment #8887280 -
Attachment is obsolete: true
Comment 23•7 years ago (Reporter)
(In reply to Michelangelo De Simone [:mds] from comment #22)
Ah, ok, so there's still 59.3MB - 56.4MB = 2.9MB hiding somewhere. It should be possible to track this down by slicing up the remainder of CodeGenerator::generateWasm() (e.g., I see a masm.flush() call).
Updated•5 years ago
Component: JavaScript Engine: JIT → Javascript: WebAssembly
Comment 24•2 years ago
This is probably still worth doing, though not for x86-32. The latest measurement for wasm baseline (bug 1715459 comment 4) on x86-64 is 75MB of code (reducible to 71MB by pinning the TLS), but we also care about Ion. And these days there are much bigger test cases than Zen Garden.
Type: enhancement → task
Summary: Baldr: do size-profiling of EpicZenGarden codegen to look for size-reduction opportunities → [exploration] Do size-profiling of EpicZenGarden codegen to look for size-reduction opportunities
Updated•2 years ago
Severity: normal → S3