Closed Bug 557407 Opened 14 years ago Closed 11 years ago

Introduce JavaScript type inference


(Core :: JavaScript Engine, enhancement)






(Reporter: bhackett1024, Assigned: bhackett1024)




(4 files, 35 obsolete files)

This is a bug for discussing and improving a proposed type inference engine for SpiderMonkey.  The attached patch is a prototype which analyzes the first executed script and dumps types inferred for that script to a file infer.txt.

Input: bytecode for the script and functions it defines.
Output: possible types for each stack value, variable, property, etc.

The goal of this inference is to compute sound type sets for each value, modulo future evals and integer overflows which might occur.  When those evals or overflows occur, the type sets can be updated dynamically to reflect any new possible types.

This prototype does not yet handle eval or overflow, nor several other features:

- dynamic name resolution (e.g. with); NAME/CALLNAME are treated as accessing globals
- variables accessed by closures
- iteration, accessing arbitrary properties of an object
- the arguments array, the global object

This inference takes about 8 ms on my machine to run all of the SunSpider benchmarks (several of these use one of the above features, so the inferred types are not complete with respect to those features).

The inference algorithm used is similar to an Andersen-style pointer analysis, constructing and solving set constraints describing the possible types of each value.
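As a minimal sketch of what Andersen-style set-constraint solving looks like (the class and method names here are hypothetical, not the patch's), each program point owns a type set and a subset edge means types flow from one set into another; a worklist drives propagation to a fixed point:

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical sketch: one type set per node, subset edges between nodes.
// For this sketch, add all edges before seeding any types.
struct TypeGraph {
    std::vector<std::set<int>> sets;        // one type set per program point
    std::map<int, std::vector<int>> edges;  // subset constraints: from -> to

    int newNode() { sets.emplace_back(); return (int)sets.size() - 1; }
    void addSubset(int from, int to) { edges[from].push_back(to); }

    // Seed a type and propagate it along subset edges with a worklist,
    // so the solution reaches a least fixed point.
    void addType(int node, int type) {
        std::vector<std::pair<int, int>> work{{node, type}};
        while (!work.empty()) {
            auto [n, t] = work.back(); work.pop_back();
            if (!sets[n].insert(t).second)
                continue;                   // already known; nothing to re-propagate
            for (int succ : edges[n])
                work.emplace_back(succ, t);
        }
    }
};
```

For example, an assignment `x = y` would add a subset edge from y's set to x's set, so any type later seen in y automatically shows up in x.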
Attached patch Patch for inference prototype (obsolete) — Splinter Review
Assignee: general → bhackett1024
Given namespace ti it's ok with me if you use just "Script" not "TiScript", etc. as the typename.

The copyright date range came from jspropertytree.h but you should cite just 2010 I think ;-).

Looks great otherwise! More in a bit.

Attached patch a few fixes(?) (obsolete) — Splinter Review
Fix copyright year range to be just 2010.

Separate #include "js*inlines.h" (implementation, not interface) to come after all interface #includes.

Use enum not #define for TYPE_ flags. Add TYPE_VOID (undefined's type) and reorder from bottom to top or narrow to wide. TYPE_VOID is needed for cases such as this:

function make(a,b,c) {
  var o = {a: a, b: b, c: c};
  if (typeof a == "string")
    return {a: undefined, b: null, c: 4};
  return o;
}
o1 = make(1,2,3);
o2 = make("one", true, 3);
print(o1.a, o1.b, o1.c);
print(o2.a, o2.b, o2.c);
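The suggested enum might look something like the following sketch (the flag names, order, and values here are assumptions for illustration, not taken from the patch):

```cpp
#include <cstdint>

// Hypothetical sketch of the suggested TYPE_ enum: ordered narrow to wide,
// with TYPE_VOID for undefined's type.
enum TypeFlag : uint32_t {
    TYPE_VOID    = 1u << 0,   // undefined
    TYPE_NULL    = 1u << 1,
    TYPE_BOOLEAN = 1u << 2,
    TYPE_INT32   = 1u << 3,
    TYPE_DOUBLE  = 1u << 4,
    TYPE_STRING  = 1u << 5,
    TYPE_OBJECT  = 1u << 6,
    TYPE_GLOBAL  = 1u << 7    // must stay below 1u << 31 so loops over flags work
};
```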

Use size_t consistently to feed into sizeof-based computations, and to avoid signed vs. unsigned warnings.

Roll flag test-based if-then actions into loops. Is this ok for now? It assumes TYPE_GLOBAL is highest bit that can be rolled, and that it is not uint32(1)<<31.

Handle JSOP_DEFVAR in global code (still has the same TODO as JSOP_NAME, etc. in function and eval code).

Recognize the global undefined constant (again assuming global code only).


Attachment #437243 - Flags: review?(bhackett1024)
ti doesn't mean anything to me.  I have some difficulty believing it could mean much of anything to most prospective SpiderMonkey hackers, sight unseen.  Strawmen that provide better (or worse) guidance:


Names boil away pretty easily with namespace and using blocks, so I wouldn't be too concerned about terseness.
'namespace types' looks good to me.

I kept the Ti* names to make it clearer in the inference code itself when it's dealing with type information rather than a JS construct, e.g. TiScript vs. JSScript rather than Script vs. JSScript.  Not attached to these though.

For TYPE_VOID, the main issue is making sure the type sets contain VOID when the value can actually be void, e.g. the following:

var a = 3;
var b;
var c = a + b;

var x = new Object();
x.a = 3;
var c = x.a + x.b;

I'd thought of the type sets as all implicitly containing void, but now realize that a lot of the time we can correctly identify values which are definitely defined, even with a flow-insensitive analysis like this.  Still, as it stands anything read out of a variable or property should be marked as maybe-void.  Definitely need to look into keeping track of defined variables during the inference's flow sensitive stack analysis to improve this.
Attached patch latest hacks, mainly fyi (obsolete) — Splinter Review
interdiff -w against your patch next.

The main changes here are

* An explicit loop in SetupByteCode to avoid unoptimized tail calling in gcc -g.

* Code to inspect the next op after NAME/GETGVAR/GETLOCAL to check for a POP, which must mean a "var x;" produced the sequence and that the type of the default undefined value (TYPE_VOID) should not be imputed to x.

We talked today since I hacked this about avoiding unwanted void-type unioning generally, but I thought this inspection hack might be worth keeping.
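The inspection hack amounts to a one-byte peek ahead in the bytecode stream. A minimal sketch (the opcode names and the one-byte layout here are illustrative, not SpiderMonkey's actual bytecode format):

```cpp
#include <cstdint>

// Illustrative opcodes, not SpiderMonkey's. After a name/local read, an
// immediately following POP means the value came from a bare "var x;" and
// its default undefined type should not be imputed to x.
enum Op : uint8_t { OP_GETLOCAL, OP_POP, OP_ADD };

static bool readIsDiscarded(const uint8_t *pc, const uint8_t *end) {
    const uint8_t *next = pc + 1;          // assume 1-byte ops for the sketch
    return next < end && *next == OP_POP;  // the pushed value is popped unused
}
```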

Attachment #437243 - Attachment is obsolete: true
Attachment #437368 - Flags: review?(bhackett1024)
Attachment #437243 - Flags: review?(bhackett1024)
Attachment #437244 - Attachment is obsolete: true
This patch adds a flow sensitive analysis for inferring defined variables at each point, and adds TYPE_VOID when a read of a possibly-undefined variable occurs.  For the sunspider benchmarks, the total runtime for the defined analysis + type analysis is about 10ms on my machine (an increase of 2ms over the type analysis alone).

No analysis of properties is performed, but whenever a property is read it is assumed to be defined (no TYPE_VOID).  This case is analogous to integer overflow, where if someone *does* read an undefined property (hopefully a rare event), the inferred information would need to be fixed up dynamically.
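The core of a defined-variables analysis like this can be sketched very compactly (the bitset representation below is a hypothetical illustration, not the patch's data structure): a variable is only definitely defined if it is assigned on every path, so sets merge by intersection at join points.

```cpp
#include <cstdint>

// Hypothetical sketch: one bit per local slot, set when the slot is
// definitely assigned on the path reaching this point.
static uint64_t mergeAtJoin(uint64_t definedA, uint64_t definedB) {
    return definedA & definedB;           // intersection: defined on all paths
}

// A read of a slot not proven defined must add TYPE_VOID to its type set.
static bool readNeedsVoid(uint64_t defined, unsigned slot) {
    return (defined & (uint64_t(1) << slot)) == 0;
}
```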
Attachment #437190 - Attachment is obsolete: true
Attachment #437368 - Attachment is obsolete: true
Attachment #437370 - Attachment is obsolete: true
Attachment #437368 - Flags: review?(bhackett1024)
This patch adds NAME resolution and proper lookup of the scope for variables within 'with', accessed from a closure, and elsewhere (let vars are not handled yet, should be otherwise complete).
Attachment #437583 - Attachment is obsolete: true
This patch adds support for accesses to arbitrary elements of an object (which can be distinguished from accesses to integer indexes), the arguments object, proper handling of prototypes, and some other fixes.  There are still some things that aren't handled, e.g. exceptions, but the static side of the inference is mostly feature complete (famous last words).
Attachment #438238 - Attachment is obsolete: true
This patch adds handling for exceptions and for the call and apply properties.
Attachment #439152 - Attachment is obsolete: true
This patch runs the inference incrementally whenever eval or Function are called.  The defined variables analysis is revamped to compute information for closure vars (and all other vars), and handle calls to functions via eval or arbitrary property lookups.
Attachment #439371 - Attachment is obsolete: true
This patch adds runtime checks on GETPROP/GETELEM to watch for undefined values and report them to the inference.  Overflows to floats for numeric operations are also reported, and the information is used to incrementally update the inference results (as done for eval).  This covers most of the runtime checks required by the inference; the other (minor) ones are:

- watch for GETELEM accesses on builtin properties like toString.
- watch for exception handlers catching runtime errors like ReferenceError.

This patch also includes a mechanism for measuring the number of times a method would need to be re-JIT'ed if the inference information were used in the method JIT.  This mechanism assumes methods are JIT'ed in the following fashion (based on what JaegerMonkey is doing):

1. Scripts are initially JIT'ed when they first execute.  Changes to the inference information before this initial execution do not require re-JIT'ing.
2. When the inference information changes (due to eval, undefined or overflow), those changes are propagated everywhere possible and any JIT'ed scripts affected by the changes are immediately re-JIT'ed.

Under these measurements, running the sunspider benchmarks would require 15 re-JITs of a script (this could be reduced somewhat).  Summary of those 15 coming in a few minutes.
Attachment #440394 - Attachment is obsolete: true
Here are the re-JITs required by the inference for sunspider, separated by category.

eval: 5 jits

#1: date-format-tofte: eval() result can be an integer, assigned to ia[...].
#2: date-format-tofte: above propagated to the second argument of arrayExists
#3/#4: date-format-xparb: find new targets for this[func](), introduced by eval()
#5: string-tagcloud: new values for 'this' in parseJSON introduced by eval() on giant JSON blob.  'this' should only be a string, but gets conflated with everything else due to polymorphic behavior of walk().

undefined: 9 jits

#6: 3d-raytrace: test on closest.shader which might be void.
#7: 3d-raytrace: comparison on colour.reflection which might be void.
#8: 3d-raytrace: self array passed to invertMatrix contains undefined elements.
#9: crypto-md5: str2binl does |= on undefined array elements.
#10: crypto-sha1: ditto for str2binb
#11: date-format-tofte: test on ia[ij] eventually returns void
#12: date-format-tofte: above propagated to second argument of arrayExists
#13: date-format-xparb: test on Date.formatFunctions[format] returns void
#14: string-tagcloud: test on Object.prototype.toJSONString returns void

overflow: 1 jit

#15: math-partialsums: k2*k eventually overflows

This is an overapproximation of the number of re-JITs actually required; some of these changes would not affect the generated code.  #3 and #4 are finding new callees for a call site that already has multiple callees.  #6 and #11-#14 may or may not be required; the inference could add void regardless if it sees a GETPROP/GETELEM whose result will be tested for existence.
This patch reduces the number of re-JITs required on sunspider to 9, fixing #3, #4, #6, #11, #13, and #14 above.
Attachment #440767 - Attachment is obsolete: true
It might be good to try other benchmarks, since we know SunSpider traces pretty well by now. Some bugs that we know don't always trace well:

 * JPEG encoder (bug 531185)
 * v8 suite
 * emulators (bug 514765 - there's an NES one too but trunk crashes on it atm)
 * instability or weirdness (bug 470779, bug 535925).
Great stuff.  By and large the inference handles these well, though there are (mostly fixable) issues with several.

JPEG encoder:

No JITs needed and accurate types found, but the inference is super wasteful in handling the input image data, a giant array of integers.  Each INITELEM needs several heap allocations and indirect calls to mark something as an int which is already known to be an int.  Will fix this to use low overhead and no allocation (except one baseline sizeof(void*) per bytecode) when handling initializers building arrays of constants, arrays of arrays of constants, arrays of objects containing constants, etc.

V8 suite:

No JITs needed and handled fine except for earley-boyer and raytrace.

earley-boyer: performance problem in that each of the ~1500 sc_Pair (cons cells) creations makes a new allocation site, and these all point to each other through their cdr entries.  Would be good to fold together uses of 'new' on the same non-builtin function, as the inference isn't likely to meaningfully distinguish them anyways.

raytrace: uses Object.extend from the Prototype library which does arbitrary assigns to properties of most objects in the program.  This library is intensely polymorphic and uses arbitrary accesses extensively; the inference is going to do a very poor job on web sites which use it a lot.  Need a way to quickly and soundly more or less bail out on JS doing this sort of stuff, and still provide type information for code that is separable.


Emulators:

Same issue as the JPEG encoder: there's a giant array of integers to process.  6 JITs needed for types changing after 'load' calls.


Instability/weirdness (bug 470779, bug 535925):

No problem with these, though for 535925 the non-reduced testcase gave a syntax error.
This patch has some performance fixes and other changes.


- No bytecode, stack or constraint structures are constructed when processing constant values used in initializers
- Instead of using an unsorted array, sets of objects/scripts are represented as singletons (<= 1 value, no allocation needed), arrays (2..8 values), or hashtables (9+ values).
- Objects created by using 'new' on the same (non-Object/Array) function in different places are considered the same. 
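The three-tier set representation described in the second bullet can be sketched as follows (the class shape and names are hypothetical; the thresholds match the comment: one value inline, 2..8 in a small array, 9+ in a real set):

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical sketch of a singleton / small-array / large-set container.
class ObjectSet {
    void *single = nullptr;       // tier 1: <= 1 value, no allocation needed
    std::vector<void*> small;     // tier 2: unsorted array of 2..8 values
    std::set<void*> large;        // tier 3: 9+ values
public:
    void insert(void *v) {
        if (!large.empty()) { large.insert(v); return; }
        if (!small.empty()) {
            for (void *p : small)
                if (p == v) return;                    // linear scan is fine at <= 8
            if (small.size() < 8) { small.push_back(v); return; }
            large.insert(small.begin(), small.end());  // promote to tier 3
            small.clear();
            large.insert(v);
            return;
        }
        if (!single) { single = v; return; }
        if (single == v) return;
        small.push_back(single);                       // promote to tier 2
        small.push_back(v);
        single = nullptr;
    }
    size_t size() const {
        if (!large.empty()) return large.size();
        if (!small.empty()) return small.size();
        return single ? 1 : 0;
    }
};
```

The design point is that the overwhelmingly common singleton case costs one word and zero allocations.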


Sunspider improved from 29ms on my laptop to 20ms.

crypto.js:                      4.498 ms  
deltablue.js:                   0.716 ms  
earley-boyer.js:                12.25 ms  
raytrace.js:                    108.11 ms  (still needs fixing)
regexp.js:                      0.362 ms  
richards.js:                    0.476 ms  
splay.js:                       0.362 ms  

JPEG encoder:  35ms
Emulator:      17ms
Attachment #440832 - Attachment is obsolete: true
(In reply to comment #19)
> - Objects created by using 'new' on the same (non-Object/Array) function in
> different places are considered the same. 

Does this handle the case where the constructor is redefined?

var obj = new Thing();
Thing = OtherThing;
var otherObj = new Thing();

"Handle" here might include throwing up its type-inferring hands in disgust.
Yep, 'function' here is the (immutable) script used to initialize Thing/OtherThing, not the name itself.  When 'new Thing()' is called then the result could be the new object for either of the scripts used to initialize Thing/OtherThing.

Thing =
  function () { this.x = 1; }  // Function #1
OtherThing =
  function () { this.x = 2; }  // Function #2

// possible types for Thing are the union of Function #1
// and the possible types for OtherThing, so both Functions #1 and #2.
Thing = OtherThing;

// this can be calling #1 or #2, so the result object may be #1:new or #2:new
var obj = new Thing();
This patch uses an arena for all allocation, freeing everything when the context is finished (instead of letting everything leak, as before).  There is some waste (not leaks) in that when arrays of objects/scripts are reallocated the old arrays are no longer used.  Sunspider time reduced by 2ms.
Attachment #441575 - Attachment is obsolete: true
(In reply to comment #22)

Heads up: your patch is going to be affected when the patch in bug 559408 commits.
This patch collects statistics about the precision of generated type sets.  It also uses a worklist algorithm to avoid blowing the stack when propagating constraints, uses type set unification in a few places, and reworks the handling for builtin properties.

Statistics summary:

We're interested in how big the type sets are for values popped by the various bytecodes.  Ignoring stack values whose types are trivial to determine (fixed singleton type, e.g. the value pushed by the '2' in 'x + 2'), on the sunspider/v8/jpeg/emulator benchmarks the inference identifies a single type for 52% of popped values, and two types for an additional 38% (includes things like 'particular object or null', 'int or float', and 'particular type or void').  The remaining 10% can be large but there is still information on what they *can't* be.
Attachment #441664 - Attachment is obsolete: true
Attached patch new patch with unknown type (obsolete) — Splinter Review
This patch adds a TYPE_UNKNOWN, which might be any type.  If an object or script winds up in a set containing UNKNOWN, the object's properties and script's arguments have unknown type.

This is designed as a way for the inference to finish fast yet still provide some useful information if it finds the generated type sets to be getting too large.  The threshold for adding UNKNOWN is only hit on the raytrace.js V8 benchmark, and lowers the time for this one to 5ms with only a small loss in the precision of the result.
Attachment #442316 - Attachment is obsolete: true
Catching up - good news re comments 18/19. What's the heuristic for deciding that the inference won't finish quickly?
The time the inference takes is more or less proportional to the total sizes of the type sets when it finishes.  For each object and script we estimate its effective contribution to these sizes as the sum over all type sets it is added to of the size of those type sets when it is added.  When this contribution exceeds some threshold (how to compute this threshold is still up in the air, right now it's a kludge of 5000) the object/script is marked as unknown, and will not be added to any further type sets.  This measure of contribution is nice because:

- It is easy to compute
- It does not fire when there are occasionally huge type sets, which can legitimately occur.
- It does not fire when a single object/script is used in many places, but those places all have small sets.
- It picks up on bad behavior quickly, which is when there are many large and similar type sets.
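As a sketch (names hypothetical, threshold the 5000 kludge from the comment), the contribution bookkeeping amounts to charging each object the size of every set it joins:

```cpp
#include <cstddef>

// Hypothetical per-object bookkeeping for the contribution heuristic.
struct Contribution {
    size_t total = 0;
    bool unknown = false;
    static constexpr size_t THRESHOLD = 5000;  // the "kludge" value

    // Charge this object for being added to a type set of the given size.
    // Returns true if the object may still participate in propagation.
    bool chargeForAdd(size_t setSizeBeforeAdd) {
        if (unknown)
            return false;
        total += setSizeBeforeAdd;
        if (total > THRESHOLD)
            unknown = true;   // mark unknown; skip all further set additions
        return true;
    }
};
```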
This patch adds two main features:

- Ability to precisely map a jsval to its type inference representation. JSObject and JSFunction have pointers to their type representation (this could probably be done more compactly).  This enables three things:

1. Cross check inference information against the values that actually appear in the interpreter.  This is done if JS_TYPES_CHECK_STACK is defined; the inference generates correct types for all the sunspider/V8/jpeg/emulator benchmarks.
2. Record types seen by the interpreter when recording executions (TODO).
3. Opens door to using more runtime checks to handle polymorphism well (TODO, speculative).

- Generate type information for builtin properties/functions at the same time as the corresponding JSObjects and JSFunctions are generated.  Makes it much easier to ensure the builtin information used by the inference is complete and correct.

(This patch is fairly invasive, changing signatures of JSAPI methods to take inference objects and handlers.  This isn't really necessary and could be changed, but is easier for development to make sure all objects/functions have an inference representation).
Attachment #442795 - Attachment is obsolete: true
Brian, could you fold this in? Mostly style policing, and C++ namespace reduction.

I'll look at the new patch tonight. First thing we want to avoid is major API change. A parallel API might be too bloaty. Something has to give, will noodle on it and comment soon.

Sure.  Also, I've gotten rid of the handlerPrivate stuff from TiFunction and all the places it is passed around (can post that patch if you want).
This patch adds an execution recorder which serializes and writes to disk all information necessary to run the inference offline (not the interpreter itself).  This includes the contents of all scripts, all type objects and functions created, all builtin properties added, and the full trace through the scripts including the types of all values popped by each bytecode.  For all the benchmarks, recording and then replaying the execution does the same thing as running the type inference straight up with the interpreter.

This also removes most dependencies the inference has on the base JS information, and cleans up the inference interface in JSAPI some.
Attachment #444342 - Attachment is obsolete: true
Attachment #444541 - Attachment is obsolete: true
This patch adds handling for accesses and calls on XPCOM objects, so that correct types are generated by the inference for all the code that runs when the browser starts up.  This works by:

A) Assigning consistent names to XPCOM objects based on their interface name, e.g. XPC.nsISupports, XPC.nsIScriptableInterfaces.
B) Whenever a property access or call occurs on one of these objects, the result is supplied to the inference which does incremental analysis (probably requiring one or more JITs).

This design is nice in that XPCOM or other JS-external clients don't have to supply a full description of the objects they create ahead of time.  The problems:

A) More runtime checks.  These checks are only needed when using an object which has properties the inference doesn't know about; there isn't yet a clear line drawn here, so the checks are for now done on all property accesses and all calls (yuck).  Once this is fixed the overhead should be low. 
B) More JITs.  The inference doesn't know about the types of accesses until they happen, which cripples its ahead-of-time analysis.  The plan is to supply the inference incomplete, unsound hints about the possible types of particular accesses (e.g. the 'classes' property of a XPC.nsIXPCComponents might be a XPC.nsIXPCComponents_Classes).
Attachment #445479 - Attachment is obsolete: true
This patch adds a (fairly kludgey) table of guesses for the types of a few hundred properties and functions that show up in the XPCOM-accessing code that runs at browser startup and in the quick stubs for the DOM.  Not intended to be permanent, but largely reduces the need for re-JIT'ing; currently 23 re-JITs (in 11 different scripts) are needed for the startup code.
Attachment #446541 - Attachment is obsolete: true
Re-JIT confuses people. Suggest "Re-Whip", an apt metaphor given these lyrics.


"crack that whip
give the past the slip
step on a crack
break your momma's back
when a problem comes along
you must whip it
before the cream sits out too long
you must whip it
when something's going wrong
you must whip it

now whip it
into shape
shape it up
get straight
go forward
move ahead
try to detect it
it's not too late
to whip it
whip it good

when a good time turns around
you must whip it
you will never live it down
unless you whip it
no one gets away
until they whip it

i say whip it
whip it good
i say whip it
whip it good"
This patch adds an instrumentation mechanism allowing the inference to mark certain bytecodes as needing monitoring, e.g. accesses to XPCOM objects, calls to functions with unknown return types, and (new) accesses to arbitrary string properties.  There is other cleanup to treat TiFunction as a subclass of TiObject (as for JSObject/JSFunction), avoiding separate treatment of the two, and to represent single types as a word-sized value jstype, which uses the 3 lower bits as tag bits for primitive types and is otherwise a pointer to a TiObject.

The change for arbitrary accesses allows the inference to analyze V8/raytrace.js precisely, and greatly improves precision on the Firefox startup code (which needs some more work though, as the new precision exposed missing handling for constructs like getters/setters).
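The jstype representation described above could be sketched like this (the specific tag assignments are hypothetical, not the patch's; the scheme only requires TiObjects to be at least 8-byte aligned so real pointers have clear low bits):

```cpp
#include <cstdint>

// Hypothetical sketch of a word-sized jstype: low 3 bits tag primitives,
// a word with clear low bits is a pointer to an 8-byte-aligned TiObject.
typedef uintptr_t jstype;

enum PrimitiveTag : uintptr_t {
    TAG_VOID = 1, TAG_NULL = 2, TAG_BOOLEAN = 3,
    TAG_INT = 4, TAG_DOUBLE = 5, TAG_STRING = 6
};

static inline jstype makePrimitive(PrimitiveTag t) { return jstype(t); }
static inline bool isPrimitive(jstype t) { return (t & 7) != 0; }
static inline bool isObject(jstype t) { return t != 0 && (t & 7) == 0; }
```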
Attachment #446816 - Attachment is obsolete: true
This patch fixes issues with the browser startup JS exposed by the increased precision of the last patch, so that correct types are found throughout the execution.  This includes handling getters/setters and miscellany like watching for overflows on Math.floor.

This also adds an option for the inference to be lazy, to only analyze a script when it is first executed, keeping it more in line with the method JIT and avoiding doing work on code which is never executed but is not statically dead (which may be substantial; twice as much code is analyzed for the startup using an eager vs. lazy analysis).  Recompilations for this option are measured based on a JIT which does per-bytecode compilation (even for the initial compilation); this will change to get closer to what JM is actually doing.

Lazy analysis of the startup code takes 57ms, with type sets of one or two types found for 90% of stack values, and recompilation of 150 bytecodes required.  This code is about twice as large as sunspider; performance is not too far off, but there is room for improvement (though focus is going to shift to live websites).
Attachment #448046 - Attachment is obsolete: true
This patch adds a configuration option --enable-type-inference, tied to a JS_TYPE_INFERENCE define which turns the type inference and the types associated with objects and functions on or off.  If the flag is not enabled there shouldn't be any runtime change other than the extra ignored type argument when objects/functions are created.  The JS API changes have been reworked to be backwards compatible.  This also updates the patch to mozilla-central as of yesterday.

Using this patch I've built the browser with inference in debug and release modes and without inference in release mode, and things seem to be working.  What additional testing needs to happen for this to get checked in?
Attachment #448607 - Attachment is obsolete: true
Attachment #449112 - Flags: review?(gal)
Brian, do you have tryserver access? Without --enable-type-inference it might be a waste of tryserver resources; running js/src/tests/ and js/src/trace-test/ (note plural vs. singular) should be enough if you've ifdef'ed carefully. But a tryserver run with --enable-type-inference could be worth it.

I don't have commit access, so don't think I can use the tryserver (will file a bug).  It would be good to run the tryserver anyways, the patch touches various things in XPCOM and the DOM code and I can't guarantee it doesn't affect the browser's behavior (though it shouldn't).
(In reply to comment #38)
> Brian, do you have tryserver access? Without --enable-type-inference it might
> be a waste of tryserver resources; running js/src/tests/ and
> js/src/trace-test/ (note plural vs. singular) should be enough if
> you've ifdef'ed carefully. But a tryserver run with --enable-type-inference
> could be worth it.
> /be

Hi, I ran these two regression suites and got failures in both (pasted below).  I get the same failures if I build JS from an unmodified mozilla-central tree.  Is this expected?  Is there some platform I should be running the tests on?  (this is on a Mac powerbook).



This patch fixes things so that the inference can run on simple JS-enabled webpages and generate correct results, e.g. knowing about properties of the window/global object.  Scope lookups are stripped down so that we don't try to keep track of parent chains on objects (error-prone and pointless), and just monitor name bytecodes appearing in a script which is executed on a non-global object.

For a simple website 'alert("Hello world!");' 80ms analysis time was needed for all loaded JS (about twice as large as sunspider) and accurate types found for 93% of stack values.  For the startup code and simple websites, about 15% of executed instructions need to be monitored in some way (e.g. results of calls/gets reported to the inference), mostly from uses of DOM/XPCOM objects whose properties aren't known.  This compares with 0.4% for sunspider/v8.

Regression status: for both a stock mozilla-central and with this patch (inference disabled), I get failures in the following JS regressions:


After enabling type inference, I get the above failures plus the following ones:


The first is due to the extra word added to JSObject; by itself this is enough to force an out of memory condition.  The latter four are due to creating objects/scripts with 2^16 variables, which the inference does not cope with well (it uses plain linked lists to manage sets of variables/properties, needs fixing).
Attachment #449112 - Attachment is obsolete: true
Attachment #450017 - Flags: review?(gal)
Attachment #449112 - Flags: review?(gal)
I started reviewing this. Will take a bit.
Alright, I skimmed the patch. Let's start breaking it up a bit. You need certain changes to data structures and APIs; we should start with those. Can you make an independent patch that changes jsobj.h and other engine-internal structures, and another one that contains the API changes (jsapi.h, anything JS_bla)? Your changes to JSObject totally make the GC lose its little mind. You are bumping the size of JSObject from 32 to 36 (rounded up to 40), which means instead of a cheap shift, calculating the mark bitmap bit becomes a division/multiplication operation, which probably makes GCing them a misery. We will have to figure out how to do this stuff efficiently.

We should go through this subset and try to make the necessary hooks for you and land that, before we land the actual analysis. We should be careful about breaking APIs (anything JS_ and anything in jsapi.h and jspubtd.h). We can break them if it makes sense, but it should be clear that there is no other way to do it.
Hmm, one way to keep the size of an object the same is to replace classword with typeword, and hang the JSClass off the type object.  The class of an object is immutable, right?  The patch should already maintain the invariant that all objects with the same type object have the same class (need to test), except with the (fixable) default object in release mode.  Downside is the extra dereference to get to the class; maybe use the (unused) third bit in the word to store JSCLASS_IS_PRIVATE?

The changes to jsapi.h should be fully backwards compatible (is JS_TN part of the API?), though several functions that used to be externs are now inlines.
OK, it looks like the class of an object can change, but only for arrays (dense -> slow) and typed arrays (slow -> fast).  I'm going to work on a patch to use the (mutable) third bit in the classword to store whether an array is dense/fast, so that the class itself does not have to change after creation.  Is this reasonable?
> Downside is the extra dereference to get to the class

Yeah, we'd have to be careful of the cache effects...

Watch out for typed arrays too?
Class can change for other objects too, e.g. for proxies or classes the embedding implements. The jit generates code to test the class word and then treat the object specially (e.g. dense array -> reach into dslots directly by index). Will that test stay the same?
Where in the code do the proxy/embedding class changes happen?  No hits in mxr on classword outside js/src, and jsproxy.cpp doesn't mention it.  Is init (hard to grep for!) called on objects that are already initialized with a different class?

Guarding on classes is slightly different (not slower): instead of masking out the low 2 bits and testing equality against a constant JSClass*, mask out the low 2 bits and test equality against a constant classword.  E.g. guardClassHelper has a new parameter 'bool dense', and generates the following:

jsuword classword = jsuword(clasp) | (dense ? jsuword(4) : jsuword(0));
guard(cond, addName(lir->ins2(LIR_eqp, class_ins, INS_CONSTWORD(classword)), namebuf), exit);
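Untangled from the nanojit details, the word-fusing scheme in that snippet can be sketched with hypothetical helpers. Note it assumes JSClass instances are 8-byte aligned so bit 2 is free, which comment 49 below points out is not actually guaranteed:

```cpp
#include <cstdint>

typedef uintptr_t jsuword;
struct JSClass { const char *name; };   // stand-in for the real struct

// Sketch: fuse the JSClass pointer and a dense-array bit into one word, so a
// class guard is still a single compare against a constant.
static inline jsuword makeClassword(JSClass *clasp, bool dense) {
    return jsuword(clasp) | (dense ? jsuword(4) : jsuword(0));
}
static inline JSClass *claspOf(jsuword w) { return (JSClass *)(w & ~jsuword(7)); }
static inline bool isDense(jsuword w) { return (w & 4) != 0; }
```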
Brian, there is no guaranteed 3rd bit free in classword. I would not mess with the dense vs. slow array class business right now, it will conflict with other work.

Speaking of which, and don't let this slow you down: See also bug 558451, which I'll be finishing this week (I'm traveling, in London now, speaking tomorrow early, not hacking till after the talk). The patch for bug 558451 separates classword into an untagged JSClass *clasp member and a jsuword flags member (type to be finalized so it packs with jsvals).

This patch temporarily inflates JSObject, which indeed hurts performance, so before I'm done we need to deflate again. Yeah, I'm talking big here ;-).

The final JSObject should look *something* like {ops, props, dslots, fslots[5]} with no title-lock, and with ops and clasp fused. This suggests fusing type, ops, and clasp somehow. The idea is to copy the JSClass provided by API users when creating new objects. This breaks JS_InstanceOf callers, so we'd have to save the original (typically static, but not necessarily) address of the user's JSClass instance in the copy.

The advantage is that we fold ops and clasp together and minimize hooks, at the same time removing one degree of indirection (dependent load) to get to the ops (currently accessed via obj->map->ops).

Hope this helps,

(In reply to comment #49)
> Brian, there is no guaranteed 3rd bit free in classword.

Forgot to say why: JSClass instances are usually static singletons per native or "host" built-in constructor, and can't be assumed to be aligned on 0 mod 8 grain. Most we can assume is 0 mod 4.

Looks like we can bump JSObject size after all. We have a mark bit per 8-bytes now, not per sizeof(JSObject).
(In reply to comment #51)
> Looks like we can bump JSObject size after all. We have a mark bit per 8-bytes
> now, not per sizeof(JSObject).

We should bump, or have several sizes, as a separate exercise from this bug, IMO. In any event small objects matter, from Gregor's charts at bug 502736, so if we can fuse ops/clasp/type and keep only one pointer, we save more space for the real stuff. How many type-like members does it take to screw in a JSObject light bulb?

Currently JSObject stores parent/proto as jsvals. Yet they are objects, so we can get at least 6 bits if we replace those proto/parent slots with separate words. And if one takes into account that objects are 32-byte aligned, then there are 2*5 bits available. The parent slot bits are particularly cheap, as parent is rarely used.
(In reply to comment #52)
> if we can fuse ops/clasp/type and keep only one pointer, we save more space for
> the real stuff.

It would be nice to fuse JSClass and JSObjectOps. And if the type information can be stored there as well, even better. We really should not need more than one vtable per object.
Let's not bit-tag more, we don't need to and it just slows us down compared to the competition, AFAICS.

Brian, you're looking for a whole typeword or equivalent, right?

Alright. So we can extend JSObject if we need to. Let's ask the next logical question. Why do we have to store that word in JSObject? When is that word accessed, from where, and how frequently? Would a hash map work? Increasing the size of every object in the system should be well justified.
What the inference needs is to be able to access the type information for a given JSObject.  That could be done as a separate pointer, but it would be great, yeah, to fold together the class, ops, and type information.  For this patch it's probably best to defer and leave it as a separate pointer (which only exists when type inference is enabled).

We need to access the type information for an object so that we can determine, at runtime, whether a given jsval is contained in a given type set, and if not then add the jsval's type to that set and run the inference incrementally.  This is for two main cases:

Accesses on client objects with unknown properties.  If there is an access to such a property bar, the inference doesn't know the type of bar with certainty, so it waits until the statement is executed and propagates based on the result.  If the statement is executed multiple times then the check is repeated, but if the same type keeps coming out then no more work is done.  This could (and for the DOM should) be fixed so that we know the possible types of interface values.

Dynamic checks for understanding polymorphic code.  If there is an assignment x[y] = z; where y can be a string and z could be many types, the inference will wait for the statement to execute and update the type sets on x precisely.  There are some other cases too (maybe more in the future): handling arguments at calls to many possible functions, and accesses to properties of values with many possible types.
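The x[y] = z case can be sketched with a hypothetical example (the store function and its arguments are invented for illustration, not taken from the patch):

```javascript
// Hypothetical sketch of the polymorphic assignment described above:
// y may be an integer index or a string property name, and z may have
// several types, so the inference cannot model x's properties statically
// and must wait for the statement to execute.
function store(x, y, z) { x[y] = z; }

var obj = {};
store(obj, 0, 1);        // integer index, int value
store(obj, "name", "a"); // string key, string value
// After execution the type sets on obj record both an int-valued
// indexed property and a string-valued "name" property.
```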

Taken together, on the startup code and simple websites checks in one of these categories are needed for 15% of executed instructions.  If the first category is fixed (or when running code that doesn't do heavy XPCOM/DOM manipulation) this should come down to 2-3% (guess).  If the type information is a member (or similar) on JSObject, these checks will be quick; if the type information needs to be fetched from a hashtable, less so.
I don't want to give you a runaround here but I think we should measure the cost of doing this with a hash table, mostly to separate your code a bit from the rest of the engine.

Also, is there a less invasive way to add type information than JS_APIFunctionWithType? This is a big API change that is complex to get right, and it makes using JSAPI even more complicated than it already is. Also, we make new global objects all the time, so this information is built repeatedly. That seems wasteful. How about some sort of internal database that represents all known types for standard classes and their properties? You can query that instead of re-creating the information every time.
(In reply to comment #58)
> I don't want to give you a runaround here but I think we should measure the
> cost of doing this with a hash table, mostly to separate your code a bit from
> the rest of the engine.

How would using a hashtable affect this?  The places where the inference code gets entwined with the engine are where it looks up or stores the type information for an object, and that code is needed whether it's a pointer access or a hashtable lookup.  I agree that using a hashtable would be good for measuring perf (especially if the class/ops/type fusion turns out unworkable), but that needs to wait until the checks are being added to a tracing/method JIT so we can see the overall effect.

> Also, is there a less invasive way to add type information than
> JS_APIFunctionWithType? This is a big and complex to get right change to the
> API, and makes using JSAPI even more complicated than it already is.

There are two hazards with making setting type information a separate function: the type information may not get set at all (hard to debug), or it may get set multiple times (breaks correctness).  Plus it doubles the number of calls needed when making a value and setting its type information.  I wish this patch didn't touch jsapi.h so much, but at least it is backwards compatible: all the old functions still exist and work the same way.

> Also, we make new global objects all the time, so this information is built repeatedly.
> That seems wasteful. How about some sort of internal database that represents
> all known types for standard classes and their properties? You can query that
> instead of re-creating the information every time.

The inference actually uses a single representation of Object/Function/Array/String/etc. per runtime, so when new global objects get created in that runtime new inference structures are not created, but the new JS values are checked against the types that are already there.  This design is simple but problematic in that we will confound information from different websites, e.g. a global variable in one site can affect types of an identically named variable in another.  A finer granularity would be nice (one type context per domain/protocol?), but as long as it's at least as coarse as JSContext this change can be made in the future without being too invasive.
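The confounding can be sketched with a toy model (the typeSetOfCounter set and recordAssignment function are invented stand-ins; the real inference structures are far richer):

```javascript
// Hypothetical sketch: simulate two global objects whose identically
// named variable shares one runtime-wide type set, as described above.
var typeSetOfCounter = new Set();

function recordAssignment(value) {
  // One set per runtime, not per global object / per site.
  typeSetOfCounter.add(typeof value);
}

recordAssignment(0);      // site A assigns an int to `counter`
recordAssignment("zero"); // site B assigns a string to its own `counter`

// Reads of `counter` at either site must now assume {number, string},
// even though each site only ever sees one of the two types.
```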
Just to clarify, in parallel we should also definitely clean house in JSObject
and unify class/objectops. No doubt. I am volunteering to do most of that.
Well, it's backwards compatible, except that if you accidentally use any of the old functions, the information isn't set, which is hard to debug.
Ah, but jsapi.h plays a trick: with #if JS_TYPE_INFERENCE && DEBUG, objects and functions created without an explicit type are given types naming the file:line where they were created.  The old functions are still fine to use so long as the objects/functions they make don't escape somewhere a script can directly access them (there seem to be a fair number of such objects).
(In reply to comment #49)
> I would not mess with
> the dense vs. slow array class business right now, it will conflict with other
> work.

In particular bug 566767 which will introduce dense-array-of-int and dense-array-of-double.
Attached patch new patch with regression fixes (obsolete) — Splinter Review
This patch fixes the inference output to be correct for all of tests/.  There are still 11 failures (3 baseline, 3 for memory pressure, 5 for lots of created variables).
Attachment #450017 - Attachment is obsolete: true
Attachment #452631 - Flags: review?(gal)
Attachment #450017 - Flags: review?(gal)
This patch rebases the inference to the Jaegermonkey branch.  Fixes handling for 64 bit values, GLOBAL opcodes.  Adds tables of dynamically inserted types for Sunspider so that inference-based optimizations can be tested without needing a recompiler or dynamically inserted checks.
This patch adds analyses to perform cross-branch/join constant/copy propagation and register allocation.  Propagation is a dataflow pass folded into the is-defined dataflow already present; register allocation is a backwards pass picking out, for each bytecode, the loop variables and the variables/slots which are live and likely to be used shortly, and which should be in particular registers at the start of the bytecode.

Total time for definitions + propagation + regalloc is 5ms on sunspider.

Register allocation does not account for temporaries needed in the middle of an opcode, nor for accesses on type tags (82% of values popped off the stack in sunspider have statically known type tags, 12% are either ints or doubles, 6% are unknown).

With this in mind, the accesses required in sunspider for reading/writing payloads of stack and argument/local slots number 53 million with 4 registers allocated, 41 million with 5, and 28 million with 6, out of a total of 290 million payload accesses.
Attachment #453153 - Attachment is obsolete: true
This patch modifies JM to use the precomputed register allocation for each bytecode.  This is a heavyweight approach (needs fixing) that forces a sync of the FrameState with the register info at each bytecode, keeping modifications to other parts of the FrameState or the Compiler as a whole to a minimum --- mainly in doing syncs / avoiding spills at branch points, and in using preferred registers for storing the output of variable loads and operations.

Works on at least a few regressions (tried nsieve-bits, crypto-aes), not tuned.
Attachment #454932 - Attachment is obsolete: true
Attached patch WIP rebase + cleanup (obsolete) — Splinter Review
Rebase on current TM, clean up a fair amount of cruft.  Compiles, doesn't work.
Attachment #452631 - Attachment is obsolete: true
Attachment #452631 - Flags: review?(gal)
Attached patch more WIP (obsolete) — Splinter Review
doesn't crash on SS, though warnings on missing types in several tests
Attachment #481951 - Attachment is obsolete: true
Attached patch working on SS (obsolete) — Splinter Review
This patch generates correct results on SS, analysis time is 15ms.
Attachment #482441 - Attachment is obsolete: true
Depends on: 604035
Depends on: 604045
Attached patch patch (obsolete) — Splinter Review
This generates correct types for all but 78 jstests.  Many of these are due either to the inference disabling GC (as it doesn't yet pin any GC objects it refers to) or to bug 584603.
Attachment #482709 - Attachment is obsolete: true
Depends on: 584603
Attached patch patch with better allocation (obsolete) — Splinter Review
This allocates inference objects in per-script arenas, rather than one big compartment-wide arena.  First step towards making inference data GC-safe; plan is to collect the inference data for a script when the script is collected, removing references to the script from type constraints on other scripts (i.e. constraints are weak references).
Attachment #484821 - Attachment is obsolete: true
Attached patch rebase on bug 604426, cleanup (obsolete) — Splinter Review
This is almost ready to land on the Jaegermonkey branch, so that work on integration into JM can proceed.  The plan for that integration (from discussion with dvander) is to use per-method recompilation, at least initially.  I still need to retune / measure the inference for per-method recompilation, and if numbers look good will land.

I'm not sure this recompilation strategy will work best long term when dealing with polymorphic code, but for monomorphic code it will work fine and the implementation complexity is strictly less than that needed for bytecode or basic block recompilation.
Attachment #485760 - Attachment is obsolete: true
Patch landed on the Jaegermonkey branch:

Sunspider analysis time is 17ms (including jsanalyze definitions/metadata pass), with 94% of stack values monomorphic.  8 recompilations required, which is a couple more than previous iterations, due to no longer tracking which globals are defined (regexp-dna has some use-before-defs of globals).
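A use-before-def of a global, of the kind regexp-dna exhibits, can be sketched like this (the names f, g, r1, r2 are invented for illustration):

```javascript
// Hypothetical sketch: a global is read before any definition runs, so
// its initial type set is {undefined}; the later assignment widens the
// set to {undefined, int}, forcing a recompilation of code that read it.
function f() { return g; }  // reads g before it is defined

var r1 = f();               // g is still undefined here
var g = 3;                  // now g's type set must grow to include int
var r2 = f();
```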

V8 analysis time is 28ms, with 92% of stack values monomorphic.  Excluding raytrace and earley-boyer, 9 recompilations are required.  raytrace requires 16 recompilations because the inference doesn't statically model calls to Function.prototype.apply, and takes too long to figure out the possible types of the Vector class (should get fixed).  earley-boyer requires 88 recompilations because of worse handling of polymorphism vs. earlier versions of the patch.  This benchmark uses polymorphic sc_Pair lists extensively, and most of the recompiles are for specializing the types of functions which access these lists.  This could be fixed either by a more fine-grained initial compilation, by interpreting functions for a while before compiling, or (easiest) by marking reads of these lists as having unknown type.
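The earley-boyer pattern can be sketched with a hypothetical cons-pair list (the Pair constructor here is a stand-in for the benchmark's sc_Pair):

```javascript
// Hypothetical sketch: a cons-pair list whose car field holds many
// different types, so every function reading `car` is polymorphic and
// each newly observed type can trigger a recompilation.
function Pair(car, cdr) { this.car = car; this.cdr = cdr; }

var list = new Pair(1, new Pair("two", new Pair(true, null)));

// The type set of p.car here is {int, string, bool}.
function firstCar(p) { return p.car; }
```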
Attachment #486713 - Attachment is obsolete: true
No longer depends on: 604035, 604045
Attached patch followup fixesSplinter Review
Followup fixes to get jit-tests mostly working when used with -m and inference.  36 failures left, which are mostly due to tests not being idempotent and breaking when run under the interpreter and then the method JIT.
Attached patch logging patchSplinter Review
This cleans up inference logging to behave similar to but separate from TM and JM logging, using env variable INFERFLAGS.
Attachment #456068 - Attachment is obsolete: true
Overhaul of handling of prototypes in type inference.  Prep for bugs 604035 and 612204.  Previously, prototypes were handled by pushing all properties of a type object to the other type objects which could possibly have it as their prototype.  Now this propagation only happens for properties which are explicitly accessed on instances, and each type object has a single other type object as its prototype (changing __proto__ on an object causes all its properties to become unknown).  Type properties distinguish between types which came from assignments to own properties vs. types which came from a prototype.
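The __proto__ hazard can be sketched as follows (the objects and the property name kind are invented for illustration):

```javascript
// Hypothetical sketch: once an object's prototype is swapped, inherited
// property types can no longer be trusted, so the inference marks the
// object's properties unknown rather than tracking them precisely.
var protoA = { kind: 1 };        // kind inferred as int here
var protoB = { kind: "other" };  // kind inferred as string here

var obj = Object.create(protoA);
// While the prototype is stable, obj.kind has type set {int}...
Object.setPrototypeOf(obj, protoB);
// ...but after the swap obj.kind is a string, so any static assumption
// based on the old prototype is invalid.
```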
Performance for my emulator would depend completely on this, since I emulate a couple of 8-bit CPU registers (I wrap the overflow by &'ing with 0xFF) and two 16-bit CPU registers (program counter and stack pointer; &'ed by 0xFFFF). If Firefox could tell I'm working with these specific types, that would probably increase speed dramatically.
(In reply to comment #78)
> Performance for my emulator would depend completely on this, since I
> emulate a couple of 8-bit CPU registers (I wrap the overflow by &'ing
> with 0xFF) and two 16-bit CPU registers (program counter and stack
> pointer; &'ed by 0xFFFF). If Firefox could tell I'm working with these
> specific types, that would probably increase speed dramatically.

Use typed arrays,
I do, I'm talking about the individual CPU registers accessed as properties (A, B, C, D, E, F, H, L, PC, SP).
The RAM and ROM are done in typed arrays, but more type inference for the opcode workbench would be great.
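As a sketch of the suggestion, the individual registers could themselves live in typed arrays, which makes the wrapping implicit and gives the engine known element types (the register layout here is hypothetical):

```javascript
// Hypothetical sketch: 8-bit and 16-bit registers stored in typed
// arrays.  Element stores wrap modulo 2^8 / 2^16 automatically, so no
// explicit & 0xFF / & 0xFFFF masking is needed, and the element type
// is statically known to the engine.
var r8  = new Uint8Array(8);   // e.g. A, B, C, D, E, F, H, L
var r16 = new Uint16Array(2);  // e.g. PC, SP

r8[0] = 0xFF;
r8[0] += 1;      // wraps to 0
r16[0] = 0xFFFF;
r16[0] += 1;     // wraps to 0
```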
Blocks: 628209
TI is well and truly introduced, by now.
Closed: 11 years ago
Resolution: --- → FIXED
I'm not sure, we haven't been properly introduced to each other.