Created attachment 8532045 [details] Shell testcase Real-world (framework/library) JS often has very polymorphic property accesses. We've always been bad at those: our ICs attach up to 16 stubs and then give up. This results in really bad performance cliffs, I recently ran into this again with Shumway (getprop, bug 1103441) and Ember (setprop, bug 1097376). I want to see how effective a megamorphic IC stub is that inlines enough of JSObject::getGeneric to handle the most common cases. The attached testcase is 10x faster in V8.
A nice alternative that'd help in some cases is the "reverse shape" map we've been discussing for a while. Unfortunately it won't handle the missing property case in bug 1103441 (Shumway). It seems like a very generic stub is a good first start, later we can do the reverse shape thing to speed up the existing-property case even more.
FWIW, what V8 does here is pretty slick. They have a global hash table that basically maps shape + id => IC stub code. Then their megamorphic stub just inlines a hash lookup to get the stub JIT code and jumps to it... It's potentially much faster than the generic stub I described in comment 0, but you need more memory for stubs. The hash table could have a fixed size though and could be purged on GC...
(In reply to Jan de Mooij [:jandem] from comment #2) > FWIW, what V8 does here is pretty slick. They have a global hash table that > basically maps shape + id => IC stub code. Then their megamorphic stub just > inlines a hash lookup to get the stub JIT code and jumps to it... I have a quick-and-dirty prototype patch for this. The HashMap lookup in JIT code is a bit complicated but works. Some numbers: js before: 872 ms js after : 69 ms d8 : 52 ms Still a bit slower than V8, their hash table setup might be smarter (they have a primary and secondary table). Still, it's just the megamorphic case and we're 12x faster than before so it's promising. If I use 8 instead of 64 different shapes, we get 42 ms so the megamorphic case is less than 2x slower than polymorphic.
I'd like to get back to this soon. It's not a significant problem for Shumway anymore due to bug 1125505, but it still affects Ember and other real-world JS. I'm not too happy with my initial prototype. Although it worked pretty well and was fast, especially setprops were a bit annoying due to the type guards we need. Also, doing the hashtable lookup in JIT code is efficient but also requires a lot of pretty complicated code, especially x86/x64 are annoying with their weird register constraints (shift instructions with variable shift amount require rhs to be in ecx, and hash functions love those shifts). I want to prototype another approach that does more in C++ and see how fast we can get that.
Created attachment 8568593 [details] [diff] [review] WIP Handles own property gets/sets. Needs more work and cleanup, but I'm pretty happy with this approach. Instead of using js::HashTable, we now use a fixed-size table of 1024 entries. This patch also no longer uses JitCode* stubs, we just store the slot number. This seems simpler and less heavy-weight. Micro-benchmark performance is similar to the numbers I posted before. The Ember "Render List" benchmark in bug 1097376 doesn't improve much, even though we do spend ~25% under SetPropertyIC::update without the patch, and almost none with the patch. It's possible it helps the very slow cleanup part of the test, will investigate more... About 15 million hits on Octane-TypeScript, locally I see a ~1000 points win but it's pretty noisy. Will get more numbers later.
Sigh. Ember still wants this. It adds an __ember_meta__ property to tons of different objects, and places where it's used all go megamorphic. I wish I had finished this a year ago, but at least it should be nicer now after the Ion IC refactorings.
I'm working on implementing a variant of this right now. Here's what I have so far: To start with, I've only implemented megamorphic caching for native slot reads in GetPropertyIC. If a cache generates too many of these stubs, it throws them away and replaces them with a megamorphic stub. This stub takes the object shape and property jsid and, as in your patch, looks up the slot information in a global, fixed-size hashtable and performs the slot read if it's found. When applicable, slot information is written to the global hashtable as part of GetPropertyIC::update. Depending on how things go, I might switch to storing generated stubs in the hashtable and jumping to them from the megamorphic stub instead of performing the read in-place. As part of this work, I've extended IonCache to support keeping track of one or more "chains" of stubs. Most ICs have just a single, "main" chain and see no change in behavior, but GetPropertyIC has two chains. Native slot read stubs are placed in their own dedicated chain, and all other stubs go into the main chain. The initial jump into the IC points to either the first stub in the main chain or the first stub in the second chain, if the main chain is empty. When both chains contain stubs, the last stub in the main chain is patched to jump into the first stub of the second chain on failure. Then when we go megamorphic, all the stubs in the second chain (the native slot read stubs) can be cheaply thrown away without affecting any of the other, non-megamorphic stubs. Each chain has its own independent stub count quota, which I think makes sense when the secondary chains are replaced with a single megamorphic stub on overflow. Current status: chains work, and I'm debugging my megamorphic cache implementation right now.
(In reply to Michael Smith [:mismith] from comment #7) > To start with, I've only implemented megamorphic caching for native slot > reads in GetPropertyIC. If a cache generates too many of these stubs, it > throws them away and replaces them with a megamorphic stub. This stub takes > the object shape and property jsid and, as in your patch, looks up the slot > information in a global, fixed-size hashtable and performs the slot read if > it's found. When applicable, slot information is written to the global > hashtable as part of GetPropertyIC::update. Excellent! > Depending on how things go, I > might switch to storing generated stubs in the hashtable and jumping to them > from the megamorphic stub instead of performing the read in-place. Yeah, that's what I did at first (not sure I uploaded my patch to bugzilla, it was very hackish). IIRC it was more complicated than the latest patch and it wasn't that much faster. It might be worth remeasuring that though. One (more recent) problem I see with generating individual stubs is that JitCode generation is more expensive now because we mprotect JIT code twice (W^X protection). So if we do want this, I think we should write these stubs in a way that makes it possible to share stub code (Baseline does this). > Then > when we go megamorphic, all the stubs in the second chain (the native slot > read stubs) can be cheaply thrown away without affecting any of the other, > non-megamorphic stubs. I'm not sure how I feel about two different chains. I think the best way to fix this problem is to use a model more like what Baseline has, so we can unlink individual native slot stubs and keep the other ones. That's a lot more involved, but until then it might be simpler to just unlink everything when we go megamorphic? I'd be interested in the following measurement: how common is it for the first chain to be non-empty when we go megamorphic? Maybe you can collect some data for the React benchmark, Octane, and some popular websites like Gmail, Facebook, Twitter. > Current status: chains work, and I'm debugging my megamorphic cache > implementation right now. Exciting to see work on this!
Oh, I think it makes sense to land an initial implementation that supports only 'own native slot reads'. That's the most common kind. For Ember, the missing-property case also matters -- `if (object.someEmberProperty)` -- but that's a bit harder to implement because we need to shape guard all objects on the prototype. Octane's TypeScript benchmark also hits the megamorphic case. It'd be good to measure that one as well (you can run js/src/octane/run-typescript.js in the shell).
2 years ago
platform-rel: --- → ?
Michael are you still interested in working on this or could you attach your partial patch? Thanks!
I think this is sort of important for some real world code, setting P3.
Priority: -- → P3
(In reply to Tom Schuster [:evilpie] from comment #10) > Michael are you still interested in working on this or could you attach your > partial patch? Thanks! I'm hoping to land this soon, once bug 1288881 goes in. Will upload the partial patch tomorrow.
Hi, Michael. Where you able to make any headway here? I am unable to view bug 1288881 so I'm not sure if that has landed. Alternatively — is there anyone else we feel may be able to pick this up, Tom?
The other bug is probably not going to make any significant progress in the near future. I think Jan is working on something similar in bug 1328140 that should help here as well.
I'm working on getting bug 1288881 landed right now and then plan on resurrecting my patch set for this once the dust has settled.
The work in this bug is subsumed by 1328140. These patches have been made mostly irrelevant as a result.
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
Resolution: --- → WONTFIX
Isn't this mostly fixed, not wontfix? As in, we added megamorphic stubs...
That is more accurate. Fixed.
Resolution: WONTFIX → FIXED
You need to log in before you can comment on or make changes to this bug.