Closed Bug 518487 (TM:PIC) Opened 15 years ago Closed 13 years ago

TM: richards.js (and many others) need polymorphic inline caching

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED WONTFIX
mozilla1.9.2

People

(Reporter: brendan, Assigned: gal)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

So richards.js is a PIC use-case. Branching on shape guards works but we still suffer from the cost of recording secondary traces, and we also guard on too many shapes compared to an optimized PIC.

You can see the structural types in richards via grep:

$ grep '\.run = ' v8/richards.js
TaskControlBlock.prototype.run = function () {
IdleTask.prototype.run = function (packet) {
DeviceTask.prototype.run = function (packet) {
WorkerTask.prototype.run = function (packet) {
HandlerTask.prototype.run = function (packet) {

The four Task subtypes have distinct run methods, which we avoid cloning thanks to the fix for bug 471214, but in any event the property cache memoizes called function values here, not slots, so we get distinct shapes for each prototype. This matters because the scheduler calls TaskControlBlock's run method, which calls one of the four concrete run(packet) methods.

You might wonder if despecializing from joined method (or unjoined but branded function value if we couldn't join due to LAMBDA | {SET,INIT}METHOD) to slot would win, but it would be slower than burning the method in on-trace. It would also not be clear when to despecialize -- presumably for megamorphic callsites. More on that later.

But note that fixing bug 497789 properly means giving each of the four Task subtypes its own shape anyway, each based on the shape of its prototype. So we really do have four distinct shapes at play here, no matter how you slice it, if we are optimizing hardest -- each of which has a distinct run method to call.

This is a prime case for PIC, here of size 4, but we can probably go to 8; I'll instrument degree of polymorphism by callsite as part of this bug's patch). We can avoid shape-guarding on the prototype by linking PICs together in the code cache and invalidating PICs compiled for a given proto-shape when the prototype is reshaped.

It should be possible to use kshape as cache index provided we seed an object's initial shape from its prototype's shape. This is the first order of business.

/be
(In reply to comment #0)
> So richards.js is a PIC use-case. Branching on shape guards works but we still
> suffer from the cost of recording secondary traces,

Forgot to mention the lack of LRU-caching. A PIC can self-organize based on hits to put the hottest result first. We don't have that capability (yet) with tree and branch traces.

/be
We have an approximation of #1: we randomly pick a trace to record after some initial waiting. That should be statistically the hottest one. Then we wait some and randomly pick another one. That again should be often the hottest one of the remaining ones. Its not dynamic though, as #1 properly points out so if there is a phase change it doesn't adapt.
(In reply to comment #2)
> We have an approximation of #1: we randomly pick a trace to record after some
> initial waiting. That should be statistically the hottest one. 

Do we have any empirical backing for that? E.g., if there are 81 traces, and one occurs 20% of the time, and 80 that occur 1% of the time, we only have a 20% chance of picking the hot one the first time, and only about a 16% chance of picking it the second time.
Alias: TM:PIC
Stealing. I would like to add an on-trace 1-entry pic per property access and see what the cost is for the indirect shape and slot storage. Then I will add a miss-and-fill path and see what the overhead of that is. Based on that we can decide whether we want the first trace to be a fast path, or always use a pic for property access (measure first, then worry about it being too slow).
Assignee: brendan → gal
Attached patch patchSplinter Review
Beginnings of a PIC implementation.
Blocks: 460050
Would it be wise first to try and estimate how much of a reduction in
trace duplication (or whatever it is we're trying to avoid) a PIC
implementation would give?  IOW, what can we measure right now that
would give an upper bound on the expected benefit(s)?
Obsolete with the removal of tracejit.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.