Too many FULL_WASM_ANYREF_BUFFER minor GCs
Categories
(Core :: JavaScript: WebAssembly, enhancement, P1)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox138 | --- | fixed |
People
(Reporter: yury, Assigned: yury)
References
(Blocks 1 open bug)
Attachments
(2 files)
When running Dart's benchmark (dart-wasm-gc-benchmarks), SpiderMonkey triggers too many minor GCs with the reason FULL_WASM_ANYREF_BUFFER. This looks like legitimate behavior. The minor GC and object promotion operations are the top calls in the profiler: https://share.firefox.dev/4gHWMMF
Increasing the AnyRef store buffer did not improve the benchmark scores.
Per conversation with sfink, changing highPromotionRate to 0.4 at https://searchfox.org/mozilla-central/source/js/src/gc/Pretenuring.cpp#325 improved performance by 23% (the average frame time went from 8 ms to 6.5 ms). The number of FULL_WASM_ANYREF_BUFFER minor GCs decreased 11x. Additionally, applying 0.3 or 0.5 did not improve the situation much.
Do we want to change the highPromotionRate for Wasm GC objects?
Attaching a log captured with JS_GC_REPORT_PRETENURE=1, which has some promotion rate numbers for a benchmark run without any changes applied.
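For readers unfamiliar with the heuristic, here is a minimal Python sketch of a promotion-rate threshold check. It is not the code in Pretenuring.cpp: the function names are hypothetical, and the 0.9 baseline is a placeholder assumption rather than a value stated in this bug. Only the 0.4 candidate comes from the experiment described above. The sample counts (11299 allocations, 5663 promotions) match one site in the pretenuring log for this benchmark.

```python
def promotion_rate(nallocs: int, promotes: int) -> float:
    """Fraction of a site's nursery allocations that survived a minor GC."""
    return promotes / nallocs if nallocs else 0.0


def is_high_promotion(nallocs: int, promotes: int, threshold: float) -> bool:
    """True when the site's promotion rate reaches the given threshold."""
    return promotion_rate(nallocs, promotes) >= threshold


BASELINE_THRESHOLD = 0.9   # assumed placeholder, not taken from this bug
CANDIDATE_THRESHOLD = 0.4  # value tried in this bug

nallocs, promotes = 11299, 5663
print(f"rate = {promotion_rate(nallocs, promotes):.1%}")
print("high at baseline: ", is_high_promotion(nallocs, promotes, BASELINE_THRESHOLD))
print("high at candidate:", is_high_promotion(nallocs, promotes, CANDIDATE_THRESHOLD))
```

Under these assumptions, a site promoting about half of its objects only counts as "high promotion" once the threshold is lowered, which is one way a lower threshold could lead to more pretenuring and fewer minor GCs.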
Comment 1•1 year ago
(In reply to Yury Delendik (:yury) from comment #0)
Currently the benchmark spends 22% of its time collecting the nursery, which does seem high.
> Per conversation with sfink, changing highPromotionRate to 0.4 at https://searchfox.org/mozilla-central/source/js/src/gc/Pretenuring.cpp#325 improved the performance by 23%
Changing this may well make other benchmarks slower. And the fact that you would need to pick a very specific value to improve this one indicates that this is not a great approach.
Some other possibilities that may improve this:
- more fine-grained tracking of allocation sites for Wasm GC objects
- some analysis in Wasm to reduce nursery allocation of objects that will always end up tenured
- changes to the nursery to reduce the cost of allocating long-lived objects there (e.g. partial nursery eviction)
Comment 2•1 year ago
To be clear, it's not that 40% is the magic number. 50% had some improvement, just not as much as 40%. 30% and 40% behaved the same. So the improvement requires 40% or less, not exactly 40%.
Also, here's JS_GC_REPORT_PRETENURE output from yury:
Pretenuring info after minor GC 20 for FULL_WASM_ANYREF_BUFFER reason with promotion rate 15.4%:
Site Zone Location BytecodeOp SiteKind TraceKind NAllocs Promotes PRate State
0x1041a9930 0x102b8d600 normal JS Object 309 0 0.0% ShortLived
0x1041a9980 0x102b8d600 normal JS Object 309 0 0.0% ShortLived
0x10419c140 0x102b8d600 normal JS Object 254 0 0.0% ShortLived
0x1041cc3e0 0x102b8d600 normal JS Object 294 0 0.0% ShortLived
0x1041d0d00 0x102b8d600 normal JS Object 636 0 0.0% ShortLived
0x1041c26b0 0x102b8d600 normal JS Object 652 0 0.0% ShortLived
0x1041b09b0 0x102b8d600 normal JS Object 255 136 53.3% ShortLived
0x1041b0690 0x102b8d600 normal JS Object 255 136 53.3% ShortLived
0x1041cde70 0x102b8d600 normal JS Object 302 128 42.4% ShortLived
0x1041cd510 0x102b8d600 normal JS Object 302 128 42.4% ShortLived
0x1041c3e70 0x102b8d600 normal JS Object 388 0 0.0% ShortLived
0x1041b6860 0x102b8d600 normal JS Object 430 0 0.0% ShortLived
0x1041af1a0 0x102b8d600 normal JS Object 320 41 12.8% ShortLived
0x1041ad3f0 0x102b8d600 normal JS Object 252 0 0.0% ShortLived
0x1041944e0 0x102b8d600 normal JS Object 288 0 0.0% ShortLived
0x1041949e0 0x102b8d600 normal JS Object 457 0 0.0% ShortLived
0x1041cd4c0 0x102b8d600 normal JS Object 1003 248 24.7% ShortLived
0x1041cd740 0x102b8d600 normal JS Object 1003 248 24.7% ShortLived
0x1041cdce0 0x102b8d600 normal JS Object 819 136 16.6% ShortLived
0x104194a30 0x102b8d600 normal JS Object 457 0 0.0% ShortLived
0x104192460 0x102b8d600 normal JS Object 336 0 0.0% ShortLived
0x104192320 0x102b8d600 normal JS Object 384 0 0.0% ShortLived
0x1041950c0 0x102b8d600 normal JS Object 932 0 0.0% ShortLived
0x1041c55e0 0x102b8d600 normal JS Object 3410 0 0.0% ShortLived
0x1041ccde0 0x102b8d600 normal JS Object 1232 0 0.0% ShortLived
0x1041ccf70 0x102b8d600 normal JS Object 1232 0 0.0% ShortLived
0x1041c5540 0x102b8d600 normal JS Object 3443 0 0.0% ShortLived
0x1041ce000 0x102b8d600 normal JS Object 1894 304 16.1% ShortLived
0x1041ce370 0x102b8d600 normal JS Object 2220 304 13.7% ShortLived
0x1041cd6f0 0x102b8d600 normal JS Object 2220 304 13.7% ShortLived
0x104192370 0x102b8d600 normal JS Object 384 0 0.0% ShortLived
0x1041924b0 0x102b8d600 normal JS Object 384 0 0.0% ShortLived
0x1041b4830 0x102b8d600 normal JS Object 390 0 0.0% ShortLived
0x1041ae340 0x102b8d600 normal JS Object 234 117 50.0% ShortLived
0x104191830 0x102b8d600 normal JS Object 315 0 0.0% ShortLived
0x1041a5560 0x102b8d600 normal JS Object 390 0 0.0% ShortLived
0x1041a55b0 0x102b8d600 normal JS Object 390 0 0.0% ShortLived
0x1041d0760 0x102b8d600 normal JS Object 349 0 0.0% ShortLived
0x1041cc2a0 0x102b8d600 normal JS Object 4352 1 0.0% ShortLived
0x1041bf460 0x102b8d600 normal JS Object 1414 1 0.1% ShortLived
0x1041d15c0 0x102b8d600 normal JS Object 2546 394 15.5% ShortLived
0x1041d02b0 0x102b8d600 normal JS Object 7514 2638 35.1% ShortLived
0x1041d1520 0x102b8d600 normal JS Object 4048 1584 39.1% ShortLived
0x104190160 0x102b8d600 normal JS Object 5169 1848 35.8% ShortLived
0x1041abc30 0x102b8d600 normal JS Object 273 39 14.3% ShortLived
0x1041abb90 0x102b8d600 normal JS Object 273 39 14.3% ShortLived
0x1041d0f80 0x102b8d600 normal JS Object 11850 2720 23.0% ShortLived
0x1041d14d0 0x102b8d600 normal JS Object 10752 1127 10.5% ShortLived
0x1041cdab0 0x102b8d600 normal JS Object 324 0 0.0% ShortLived
0x104191e20 0x102b8d600 normal JS Object 324 16 4.9% ShortLived
0x1041d12a0 0x102b8d600 normal JS Object 11299 5663 50.1% ShortLived
0x1041d1390 0x102b8d600 normal JS Object 25296 394 1.6% ShortLived
0x1041d1340 0x102b8d600 normal JS Object 24046 757 3.1% ShortLived
0x102b8de08 0x102b8d600 unknown JS Object 4 5 Unknown
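To compare thresholds against a log like this one, a small script can count how many sites sit at or above each candidate rate. The following Python sketch assumes the row layout seen above (empty Location and BytecodeOp columns, TraceKind always "JS Object"); the regular expression and the helper names are hypothetical, and the sample is a handful of rows copied from the log.

```python
import re
from typing import NamedTuple


class SiteRow(NamedTuple):
    site: str
    zone: str
    kind: str
    nallocs: int
    promotes: int
    state: str


# Assumed layout: site, zone, SiteKind, "JS Object", NAllocs, Promotes,
# an optional PRate column, then State.
ROW = re.compile(
    r"^(0x[0-9a-f]+)\s+(0x[0-9a-f]+)\s+(\w+)\s+JS\s+Object\s+(\d+)\s+(\d+)"
    r"(?:\s+([\d.]+)%)?\s+(\w+)$"
)


def parse_rows(text: str) -> list[SiteRow]:
    """Parse data rows; the header and other non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            site, zone, kind, nallocs, promotes, _prate, state = m.groups()
            rows.append(SiteRow(site, zone, kind, int(nallocs), int(promotes), state))
    return rows


def count_at_or_above(rows: list[SiteRow], threshold: float) -> int:
    """Count 'normal' sites whose promotion rate reaches the threshold."""
    return sum(
        1
        for r in rows
        if r.kind == "normal" and r.nallocs and r.promotes / r.nallocs >= threshold
    )


SAMPLE = """\
Site Zone Location BytecodeOp SiteKind TraceKind NAllocs Promotes PRate State
0x1041b09b0 0x102b8d600 normal JS Object 255 136 53.3% ShortLived
0x1041cde70 0x102b8d600 normal JS Object 302 128 42.4% ShortLived
0x1041d02b0 0x102b8d600 normal JS Object 7514 2638 35.1% ShortLived
0x1041d12a0 0x102b8d600 normal JS Object 11299 5663 50.1% ShortLived
0x1041d1390 0x102b8d600 normal JS Object 25296 394 1.6% ShortLived
0x102b8de08 0x102b8d600 unknown JS Object 4 5 Unknown
"""

rows = parse_rows(SAMPLE)
for threshold in (0.3, 0.4, 0.5):
    print(f"sites at or above {threshold:.0%}: {count_at_or_above(rows, threshold)}")
```

The "unknown" catch-all row is excluded from the counts because it has no PRate value and does not describe a single allocation site.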
Comment 3•1 year ago
After instrumenting SpiderMonkey, I can confirm that the benchmark has some types/allocSites with multiple call sites: about 10-20% of the types in the Dart benchmark. It is worth trying to assign an individual allocSite to each call site/bytecode.
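A rough illustration of why a shared allocSite can hide a long-lived call site, written as a Python sketch rather than engine code. The type name, call site names, counts, and the 0.9 threshold are all hypothetical values chosen for the example.

```python
from collections import defaultdict

# (type_name, call_site, nallocs, promotes): hypothetical counts
SAMPLES = [
    ("Node", "callsite_a", 1000, 950),  # almost everything survives
    ("Node", "callsite_b", 1000, 0),    # nothing survives
]

ASSUMED_THRESHOLD = 0.9  # placeholder value, not taken from this bug


def rates(samples, key):
    """Aggregate allocations and promotions per key, then return rates."""
    totals = defaultdict(lambda: [0, 0])
    for type_name, call_site, nallocs, promotes in samples:
        bucket = totals[key(type_name, call_site)]
        bucket[0] += nallocs
        bucket[1] += promotes
    return {k: p / n for k, (n, p) in totals.items()}


# One allocSite per type: both call sites are blended together.
per_type = rates(SAMPLES, key=lambda t, c: t)
# One allocSite per call site: each call site is tracked on its own.
per_call_site = rates(SAMPLES, key=lambda t, c: (t, c))

for name, table in (("per type", per_type), ("per call site", per_call_site)):
    hot = [k for k, rate in table.items() if rate >= ASSUMED_THRESHOLD]
    print(f"{name}: rates={table} above threshold={hot}")
```

With a single site per type, the blended rate is 47.5% and nothing crosses the assumed threshold. With one site per call site, the first call site shows up at 95% and could be pretenured on its own.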
Comment 4•1 year ago
Comment 5•1 year ago
Some numbers for WIP runs:
| AverageDraw | Before | After | Change |
|---|---|---|---|
| baseline | 11074 us | 10401 us | -7% |
| ion | 8074 us | 7484 us | -7% |
The number of minor GCs dropped from 1083 to 580, roughly half (about 46% fewer).
There is an issue in tiering mode that I need to investigate, and the WIP needs further optimization.
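For anyone re-running the benchmark, the relative changes can be recomputed from the raw before/after numbers. This is a small Python sketch; the helper name is hypothetical, and values computed this way from the rounded measurements may differ slightly from the percentages quoted in this comment.

```python
def relative_change(before: float, after: float) -> float:
    """Signed change relative to the 'before' value (negative means a drop)."""
    return (after - before) / before


# Before/after pairs as reported in this comment.
measurements = {
    "baseline AverageDraw (us)": (11074, 10401),
    "ion AverageDraw (us)": (8074, 7484),
    "minor GC count": (1083, 580),
}

for name, (before, after) in measurements.items():
    print(f"{name}: {relative_change(before, after):+.1%}")
```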
Comment 7•1 year ago
| bugherder | ||