As it stands, TI copes badly with polymorphic code. If a piece of the analyzed program can manipulate multiple types, TI will tend to propagate those types everywhere in the system and lose a tremendous amount of precision. We have known about this problem since the project started, and have known the fix for almost as long. Trying to improve the static portion of the analysis to deal with polymorphism better is hopeless --- polymorphism can crop up in so many and such complicated ways that we can't hope to understand it statically. Instead, we just need to detect within TI where the boundaries of the polymorphism are, and stick in dynamic barriers to keep the imprecise types from leaking into the rest of the program.
Conceptually, this is pretty simple. Fundamentally, TI takes each assignment in the program and constructs subset constraints from it. For an assignment 'x = y', where x and y have type sets X and Y, there is a subset constraint 'Y <= X' saying that all types in Y should also be in X.
With dynamic analysis/monitoring available, we can relax this and say that Y is allowed to contain types not in X, provided that there is a read barrier at this assignment checking that the bonus types do not actually show up.
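The relaxation above can be sketched as follows. This is an illustrative model in Python, not SpiderMonkey code: `TypeSet`, `propagate`, and `propagate_with_barrier` are made-up names standing in for the inference machinery.

```python
# Model of TI's subset constraints and the barrier relaxation.
# A TypeSet holds the types a value may have at some program point.

class TypeSet:
    def __init__(self, types=()):
        self.types = set(types)

def propagate(src, dst):
    """Plain subset constraint 'src <= dst': every type in src
    must also appear in dst."""
    dst.types |= src.types

def propagate_with_barrier(src, dst, observed):
    """Relaxed constraint guarded by a read barrier: only types
    actually observed at the barrier flow into dst; any bonus
    types in src are held back (the barrier would trigger
    recompilation if one ever showed up at runtime)."""
    dst.types |= src.types & observed.types

# For 'x = y': inference says y could be int or string, but only
# ints have ever been seen at this read.
Y = TypeSet({"int", "string"})
X = TypeSet()
observed = TypeSet({"int"})
propagate_with_barrier(Y, X, observed)  # X gets only "int"
```

Without the barrier, `propagate(Y, X)` would pollute X with "string" and that imprecision would spread through every later constraint on X.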
There are two issues in getting these polymorphism barriers in: where they are inserted, and when they are inserted. For the first, inserting barriers at property reads and at call argument binding should be sufficient for now, I think. We may want to add barriers at property writes and call return binding in the future (we could add them everywhere, but that would require a lot more dynamic monitoring by the interpreter and slow convergence to the final code, for not much payoff I think). For the second, we could add barriers whenever the inferred types for Y don't match the observed types for X, or we could use heuristics. Will play it by ear.
Add JSScript type sets for use in all read property accesses. The interpreter always stores the types it reads in these sets, and inference has the option of inserting barriers when propagating types from the sets the read could possibly produce into this new persistent set of observed types. These barriers are never actually inserted yet, and the Compiler doesn't yet know how to cope with them.
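A rough model of what the interpreter does at each property read, assuming one persistent observed-type set per read site (the `Script` class and its members here are hypothetical, not the real JSScript layout):

```python
# Model of per-script observed type sets: every property read the
# interpreter executes records the type it actually produced,
# keyed by the bytecode offset of the read.

class Script:
    def __init__(self):
        # bytecode offset -> persistent set of observed type names
        self.observed = {}

    def record_read(self, pc, value):
        """Called by the interpreter after every property read;
        the set only ever grows, so it records everything this
        site has produced over the whole run."""
        self.observed.setdefault(pc, set()).add(type(value).__name__)

script = Script()
script.record_read(12, 3)        # the read at offset 12 produced an int
script.record_read(12, "hello")  # later it produced a str too
```

Inference can then compare these observed sets against the types the read could possibly produce, and put a barrier on the difference rather than propagating it.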
For calls, barriers can now be inserted (though none are yet), and things should work. The Compiler doesn't distinguish between call sites with type barriers and call sites where not all callees were determined (in both cases, not all possible argument bindings were considered by inference), and uses an entry point which performs type checks. This functionality was already there (used in the arity check entry point), but there is now a separate entry point in the JIT which skips the arity check and is used when we are generating an inline path or closure stub at a call site with barriers.
This dings SS by about 2% (other benchmarks are unaffected), due to the extra cost of always checking type sets at property accesses in the interpreter. The other opportunity for slowdown with this patch is if we get aggressive about introducing these type barriers and spend more time recompiling (and interpreting) as a result.
I accidentally removed the line of code marking which ICs are monitored, so we were never doing type checks on inline call paths. Fixed as part of the rev below.
That rev also fixed a separate issue where we never actually marked the known types for arguments in the Compiler at script entry (fixing this exposed the bug above). This gives 2% back on SS.
At property accesses with read barriers, the Compiler reads the property into two registers and then type checks the result against the pushed type set. In inference, read barriers can optionally be added at such property accesses and at call argument bindings. I experimented with heuristics a little here and went to the dynamic end of the spectrum (better type information, but slower convergence on the final types and more recompilation). Type barriers are always added when a property access or argument binding could produce a given type but has never been observed to, except for megamorphic accesses on objects (>= 10 observed objects; a cheesy hack). Testing locally, this adds 100 recompilations to SS (60 -> 160) but barely dings perf (~3ms). Kraken improves by 12%, in astar, crypto-aes and crypto-pbkdf2. v8-crypto improves a little; the others are unaffected (they still need more tuning).
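The barrier-insertion heuristic above can be sketched like this. All the names are made up for illustration; only the policy (barrier on any possible-but-unobserved type, skip megamorphic sites) comes from the text.

```python
# Sketch of the "always barrier, except megamorphic" heuristic:
# add a type barrier whenever a site could produce a type it has
# never been observed to produce, unless the site has already seen
# too many distinct objects to be worth guarding.

MEGAMORPHIC_LIMIT = 10  # >= 10 observed objects: give up (cheesy hack)

def needs_barrier(possible_types, observed_types, observed_objects):
    if observed_objects >= MEGAMORPHIC_LIMIT:
        return False  # megamorphic access: barrier would churn forever
    # Barrier iff inference allows some type the site has never produced.
    return bool(possible_types - observed_types)

# A site that could produce a string but has only ever produced ints:
needs_barrier({"int", "string"}, {"int"}, observed_objects=2)  # True
```

Sitting at the dynamic end of the spectrum like this means barriers fire and trigger recompilation more often (the 60 -> 160 jump on SS), in exchange for tighter type sets everywhere downstream.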