Closed Bug 613221 Opened 9 years ago Closed 9 years ago

TypeInference: make inference structures GC safe

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: bhackett, Assigned: bhackett)

References

(Blocks 1 open bug)

Details

(Whiteboard: fixed-in-jaegermonkey)

Attachments

(2 files, 2 obsolete files)

Currently, all inference data is allocated out of per-script arenas and points freely to other arenas, with no regard for the GC (--enable-type-inference disables the GC).  This can be fixed in two steps.

1. Mark all atoms and objects which inference data points to, but don't collect the inference data or scripts themselves.  This allows GC of everything but the scripts themselves to happen.

2. Collect inference data when collecting a script.  The type constraints in a script can influence the types in other scripts, but only so long as the code they describe is still capable of running.  The pointers between constraints in different scripts are thus weak references and should be removed during sweep.
Attached patch part 1 patchSplinter Review
This takes care of part 1 above, marking atoms the analysis uses but not collecting the scripts or analysis data until the compartment itself is destroyed.

http://hg.mozilla.org/projects/jaegermonkey/rev/f098b0cf0761
Assignee: general → bhackett1024
Depends on: 621301
General thoughts on how to proceed with this, for reference when I get back or if someone picks this up in the meantime (would be hard).

Right now type constraints form a web where constraints generated by analysis of one script can use and reference type sets generated during analysis of other scripts.  We should instead impose some order on this, recording the external inputs used during analysis of each script and deriving all type constraints generated during analysis from these inputs.  These inputs include the script's own argument type sets, return type sets of callees, type sets of upvars, type sets associated with object properties, and any dynamic overflows or undefined reads which occur.

The behavior of the analysis on a script (the result types it generates, etc.) is functionally determined from these inputs; if we record and remember on the TypeScript which inputs the script's analysis depends on, then after analysis we can at any time discard the constraints themselves and the type sets associated with stack values and non-closed locals.  If any of the input type sets subsequently changes, the analysis must be redone, but this should get rid of most of the long-term memory use of type inference (in a manner similar to bug 617656, aggressively discard from unused compartments on GC).  Remaining long-term data is the type sets for type object properties, script arguments, retvals and closed locals, and the coarse dependency graph of particular scripts on this data.

The type constraints on a script can still propagate types into arbitrary places (this is recording inputs, not outputs).  If a GC collects a TypeObject or JSScript with a TypeScript (once generational GC is in place, this should only happen on a major collection), then discard all constraints for all scripts in the compartment per the above, remembering the inputs (except inputs from collected TypeObjects/TypeScripts) so that the analysis can be redone later.  Remaining type sets (long term arguments etc. listed above) also must be scanned to remove any destroyed TypeObjects.
Attached patch WIP (obsolete) — Splinter Review
This handles inference memory management by throwing away all intermediate analysis information (types of stack operands, generated type constraints) on every GC, compressing them down to constraints on property/var/etc. type sets describing which scripts to reanalyze/recompile should that type set change.  Still to do:

- Don't discard analysis information or collect scripts or type objects if the GC was triggered during analysis itself, during compilation or any other time the engine is directly using the intermediate type information.

- Handle eval right.  These are discarded up at the start of a GC rather than at the end as with other scripts, which is problematic if we've constructed constraints reasoning about the behavior of the eval script.

- Understand the cycle collector better, and whether any more treatment is needed for it.  In particular, it can traverse the JS heap, but can it collect JS objects and scripts or will it just break the cycles and let the JS GC (and COM pointers) clean things up.
(In reply to comment #3)
> - Understand the cycle collector better, and whether any more treatment is
> needed for it.  In particular, it can traverse the JS heap, but can it collect
> JS objects and scripts or will it just break the cycles and let the JS GC (and
> COM pointers) clean things up.

The latter.

/be
Attached patch WIP 2 (obsolete) — Splinter Review
Fixes most remaining issues, one trace-test failure.  Still issues related to dynamic scoping --- 'with', 'let', and dynamic binding from 'var' and 'function'.  Inference handles these by marking the relevant type sets as unknown, but if there is a 'function foo() { function bar() {} }' we want to analyze/compile bar without necessarily having analyzed foo (goes double for 'eval(function bar() {})').  This probably requires moving the book-keeping inference does to handle dynamic scoping into the parser itself.
Attachment #515425 - Attachment is obsolete: true
Comment 5 threw me -- what does "dynamic scoping" applied to 'let'?

/be
> Comment 5 threw me -- what does "dynamic scoping" applied to 'let'?

Likewise for 'var' (except at global scope) and almost always for 'function' (the only case of dynamic 'function' binding is a non-standard SpiderMonkey extension where local function declarations only extend the lexical environment if they are dynamically executed). But 'let' *never* does dynamic binding.

Dave
'dynamic' only in the sense that the variable names bound by a script depend on the current program point, and are not invariant across the whole script.  This is rooted in a desire for inference to understand JSOP_NAME opcodes, but since name resolution depends on all containing scripts which we now have no guarantee of having analyzed, I think now this is a lost cause and we'll just punt on NAME and fall back to PICs (shouldn't affect SS/V8 much).
Blocks: 637674
(In reply to comment #8)
> 'dynamic' only in the sense that the variable names bound by a script depend on
> the current program point, and are not invariant across the whole script.

Strongly suggest different term than 'dynamic', even as local jargon.

In particular let is lexically scoped to enclosing braced block or implicit block in the case of for (;;) loop (and implicit body block in the case of for-in loops; see bug 449811 for how this is a change from the current implicit block around the entire for-in loop). So using "dynamic" to mean something that has static and in Harmony lexical scope is like mixing vinegar and baking soda.

/be
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: fixed-in-jaegermonkey
Restore some code ripped out above which looks up GETGNAME/CALLGNAME properties during inference that might be lazily loaded (Math, etc.).  This code generally isn't necessary if the script was interpreted some before analyzing (the interpreter would do the lookup), but is useful for -m -a.

http://hg.mozilla.org/projects/jaegermonkey/rev/ae7853f80529
You need to log in before you can comment on or make changes to this bug.