Closed Bug 864502 Opened 8 years ago Closed 8 years ago
Analysis to separate Ion+BC-relevant data from JM-relevant data
56.44 KB, patch
47.18 KB, patch
45.12 KB, patch
Once Ion no longer depends on analyzeTypes, we should be able to discard most of the heavyweight analysis info associated with script bytecodes. However, as long as JM is still in the tree, the code to collect this information cannot be fully removed. Currently, bytecode analysis results (the single-bit fields in js::analyze::Bytecode), SSA analysis results from bytecode analysis, and type results from inference analysis are all stored inside js::analyze::Bytecode. An array of pointers to these bytecode objects is allocated, with size equal to the compiled script's actual bytecode size (i.e. a word for every byte of bytecode). For offsets corresponding to actual bytecode ops, these pointers are initialized to a js::analyze::Bytecode allocation, which is itself about 9 machine words in size. After analyzeTypes is removed, most of this info will not be used by either Ion or Baseline. As a first step to reducing memory usage, we can reorganize these data into two separately rooted structures: the first holding just the data relevant for Ion and Baseline, and the second holding the data relevant only for JM. In situations where JM is not enabled, the latter data can simply never be allocated.
Could we just wait for JM to be removed? That is happening at the beginning of the next cycle, right? (Why can't that happen now, for that matter?)
Doing this improves our memory story right now without introducing much in the way of unknowns. This approach is good for b2g, which is looking for near-term memory wins. We still need some variant of analyzeSSA for things like |NewScriptProperties|. This work can be organized in such a way as to bring in immediate nominal memory wins, as well as making it easier to strip this functionality down the line. For the first iteration, I'm keeping the codeArray as-is, but changing Bytecode to store a single pointer to all the heavyweight data, which is direct-allocated in a separate array whose size is exactly the same as the number of bytecode ops, and linked into from js::analyze::Bytecode. It's a simple low-risk change that should reduce Bytecode overhead by about 5 words.
These patches are finally looking to be in somewhat OK shape. Try build for the patches here: https://tbpl.mozilla.org/?tree=Try&rev=eed38c57e708

The remaining issues seem to be timeouts with --ion-eager only, because in the worst case, analysis gets done 2x for scripts, and --ion-eager triggers that situation more often, causing much higher run times for scripts. Otherwise, try seems green.

Before I post the patches, an explanation of the changes:

1. The Bytecode struct is split into two structs: Bytecode and BytecodeExtra. Bytecode gains a 10-bit "smallStackDepth" field and a 6-bit "smallLoopDepth" field. With a "light analysis", ops whose stack depth and loop depth fit in these fields, and which are not monitored ops, will have their "extra" pointer set to NULL. When running a "heavy analysis", all ops will have a valid "extra" pointer.

2. The analysis control flow is split into two paths - one for heavy analysis and one for light analysis. In light analysis, allocation of the "extra" Bytecode objects, SSA values, and other objects attached to the "extra" field is done from a separate "heavyAlloc" LifoAlloc, which is freed at the end of analysis. (Before it is freed, the Bytecode array is scanned, and any ops which need their BytecodeExtra - such as ops with large stackDepths or loopDepths, or monitored ops - have their BytecodeExtra copied into the main analysis lifo alloc space.)

3. Ion and Baseline are changed to invoke the light analysis instead of the heavy analysis.
I did an AWSY run of the patch here: https://areweslimyet.com/?series=lightanalysis

Then I downloaded the AWSY analysis json files for that run and for a reference AWSY run from the base build it used:

Reference json: https://areweslimyet.com/data/7ab7562be0e6807e7397a6ef94fbd6ade001d0e8.json.gz
With-patch json: https://areweslimyet.com/data/23621d4b9c30820a90e137c529f32caa10b08041.json.gz

Then I wrote a python script to sum up all the 'analysis-pool' numbers from both JSON files over all desktop runs.

Reference: 129.24M
With patch: 94.31M

That's about a 27% reduction in analysis-pool data. I was hoping for more, but if this holds up on b2g, it should be good. From the about:memory json files I downloaded from jlebar's mailing list thread, I noticed about 1.8/5.8MB (31%) being used for analysis-pool data. This reduction should then equate to about an 8% overall reduction in JS memory overhead. Building b2g for unagi right now so we can get hard numbers.
Looking at the complexity of these patches and the limited gains, I don't think this is the right approach. We should be looking to nearly eliminate the analysis-pool data, and after some more thinking it *is* possible to do this even with JM still in the tree. The idea is to leave the ScriptAnalysis logic alone but only fill script->analysis when compiling with JM or when analyzing 'arguments' usage or 'new' script properties. The latter two would be the only use cases when baseline/ion are active and should be very limited in scope compared to what we do now. How that can be done:

1. Do bug 865059. This should require a lot fewer changes than this patch, mostly just changing the interpreter so that it doesn't require any scripts to be analyzed and only monitors opcode results if the script actually has been analyzed. The main sticking point is what to do if JM+TI wants to do OSR (where incomplete interpreter types have correctness problems), but that should be easy to resolve --- keep a bit on JSScript indicating whether the script has ever executed without being analyzed, abort JM compilation if that bit is set, and analyze scripts eagerly if JM is enabled. This should be robust against cx->methodJitEnabled changing in value. Per bug 678037 comment 33 we will only baseline compile ~20% of code that executes at all in normal web workloads, so I would hope for a reduction in analysis pool memory of up to 80%.

2. Integrate the logic/data needed by baseline/ion into baseline compilation, the baseline script, and the baseline ICs themselves. This is more complex and less critical than bug 865059 but would allow script->analysis and the baseline/ion stack to be entirely decoupled. Information like the pc->bytecodeTypes map and monitoring info (arrayWriteHole, etc.) can be efficiently stored on the ICStubs themselves, and other data like knowing which ops are jump targets does not need to live beyond baseline compilation.
Ok, sounds good. I'll get started on point 2 as discussed.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX