Closed
Bug 596175
Opened 15 years ago
Closed 2 years ago
IonMonkey: Experiment with alternative PIC code-path.
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: jbramley, Unassigned)
Details
(Whiteboard: [js:t])
Currently, PICs can chain up multiple stubs. Each one (and the inline path) has a shape check which falls back to the next stub, or to a slow-case C++ handler. Each stub gets its own lump of memory allocated. The memory will probably not be sequential, so the stubs for a particular PIC will probably be sparsely located throughout memory.
I haven't taken any measurements, but this kind of memory access pattern is often bad for cache utilization. Each PIC is essentially a sparse linked list. The branch predictors might hide this, but I don't know exactly how much of it they will hide.
----
I'd like to experiment with an alternative design:
• The fast (inline) path gets a shape check as it currently does, because PICs that mostly behave like MICs can get a decent benefit from the inline path.
• If the inline shape check fails, a table look-up is performed (out-of-line) using the shape as a key. The value returned is a pointer (or offset) to the stub which implements the PIC for that shape. If no value is returned, the slow path is invoked.
Advantages:
• The slow path can easily add elements to the table in order to add PIC implementations. Patching becomes much simpler and more robust as the majority of it simply involves updating the table in an entirely platform-generic manner.
• Stubs for shapes that are rarely hit after creation will drop out of cache, rather than being kept in cache by the shape guard.
• The maximum number of chained branches on any PIC is greatly reduced. The average will probably be improved too. This should help branch predictors.
• The stubs themselves never need to be patched as the shape guard is elsewhere.
• The table look-up itself will be identical for every PIC, and can be implemented as a single shared code segment.
Disadvantages:
• It is slightly more complicated to jump to the first stub than in the existing implementation.
• x86 and x64 will have to load the shape guard value from memory rather than from an in-line constant in the instruction stream. I don't know what effect this has on performance on x86.
Another variant which might be worth experimenting with puts the table look-up in-line, but that might harm performance if the in-line path is very common (as I expect is the case).
----
I don't have much time to play with this at the moment, but I'm posting a bug so I don't forget about it and to start discussion. (You may have already considered this approach and rejected it.)
On the other hand, if this idea receives a lot of enthusiasm I will prioritize it over getting PICs on ARM because it will make it far simpler to port PICs to any platform. (The majority of the awkward patching reduces down to simple table edits.)
Reporter | ||
Comment 1•15 years ago
|
||
Logging every write to "stubsGenerated" on x64:
SunSpider:
1 stub : 1660
2 stubs : 280
3 stubs : 260
4 stubs : 100
5 stubs : 80
6+ stubs: 490
V8:
1 stub : 4580
2 stubs : 156
3 stubs : 83
4 stubs : 55
5 stubs : 40
6+ stubs: 176
(I didn't measure how many PICs never got stubs because it's slightly trickier to do and because it won't be affected by my proposal as the inline code is unchanged.)
I didn't measure actual stub usage, so I don't know which path is most common. Also, I don't take into account any resetting that may occur, so if PICs are reset periodically as they are in V8, the results will be skewed towards the lower stub counts.
Comment 2•15 years ago
|
||
Interestingly, the design you describe above is almost how PICs were first explained to me, and how I thought they were implemented, before I actually looked at the code. Specifically, they were explained to me as a small region of memory in the inline buffer initially laid out like this:
[jmp]
[nop]
[nop]
...
[nop]
Adding a path would be done by removing the jmp, inserting a guard that on success jumps to the correct handler code, and adding a final terminal jmp to avoid the nop-sled.
How would the table lookup work?
Reporter | ||
Comment 3•15 years ago
|
||
(In reply to comment #2)
> How would the table lookup work?
I imagined nothing more sophisticated than a simple table look-up. Pseudo-code:
for (i=0; i < MAX_PIC_STUBS; i++) {
if (shapes[i] == shape) {
goto stub[i]();
}
}
This would be out-of-line, though, to keep the fast inline case.
(In reply to comment #0)
Keep in mind that if the property is on a prototype, we also have to guard on the shape of the prototype, but the second check could be part of the generated stub.
The cross-platform ease is a nice advantage, but loading the shape guard value out of memory will be more expensive on x86. It might be worth measuring just how much the stub locality costs. Right now it's not clear if re-ordering or GCing only selective portions of the PIC is worthwhile.
This sounds like a good path to handling megamorphic PICs. There are cases in both v8 and SS where we hit the max stubs for a site. For cases like these we could experiment with falling back to some kind of table lookup that's faster than obj->getProperty.
Comment 5•15 years ago
|
||
Hmmmm. You're the expert on ARM, so if it sounds like a good idea to you there, it is probably worth trying. I have my doubts about whether it will help Intel--I would think the branch predictor can do better with a separate branch to check for each shape, and it probably gets a pretty high hit rate. I don't have detailed measurements on that, though.
Also, ISTR that when I was messing around with regexp code before, it seems that immediate comparisons and straight-linish code tend to beat loops and loads most of the time. But again, ARM might be different there.
Reporter | ||
Comment 6•15 years ago
|
||
Interesting.
It makes no sense to maintain a special design for ARM (or any other architecture), and there is some doubt about whether or not this is beneficial, so maybe this is best looked at after PICs are ported for ARM.
> (In reply to comment #0)
> [...] loading the shape guard value
> out of memory will be more expensive on x86.
The look-up could be implemented in-line:
cmp %shape, 0(%stubshapelist)
bne lookup
<first stub> @ Relatively common, so make this in-line.
lookup:
cmp %shape, 4(%stubshapelist)
beq stub1
cmp %shape, 8(%stubshapelist)
beq stub2
...
<slow case> @ Might as well be in-line.
It makes things less portable, but at least all shape patch sites are in one place and easily accessible using a base address and a few offsets (which can be reliably small).
Thinking about it, that pattern is actually a fairly reasonable compromise. The first stub is probably as fast as in the current solution, and subsequent stubs aren't too expensive. For platforms that can benefit from it, a loop could be used as the shape list will be best loaded from a literal pool anyway (for easy patching). This could all be transparent to the PIC code itself as it'd all be hidden behind some platform-specific utility function.
Comment 8•12 years ago
|
||
It is an alternate method of generating pics. It sounds a bit like megamorphic IC's, which have been used in the wild before (iirc, smalltalk supports them). There is no good reason to not have them, particularly since BC does basically everything via ICs.
Flags: needinfo?(mrosenberg)
Comment 9•12 years ago
|
||
Ok, thanks for the info. Updating description and resetting assignee. @jbramley, please don't let that keep you from working on this, of course. ;)
Assignee: Jacob.Bramley → general
Status: ASSIGNED → NEW
OS: Linux → All
Summary: JM: Experiment with alternative PIC code-path. → IonMonkey: Experiment with alternative PIC code-path.
Whiteboard: [js:t]
Assignee | ||
Updated•11 years ago
|
Assignee: general → nobody
Updated•3 years ago
|
Severity: normal → S3
Updated•2 years ago
|
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•