Closed
Bug 559899
Opened 15 years ago
Closed 12 years ago
JM: Investigate PIC performance
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: dmandelin, Unassigned)
References
Details
Attachments
(1 file)
195 bytes,
application/x-javascript
|
Details |
I investigated our PIC perf with the attached microbenchmark. Keeping the property access line inside the loop uncommented instead of the other line adds a minimal GETPROP--just the inline PIC path.
I measured the cost of the basic GETPROP:
SM (TM branch) 12.9 ns = 28.4 cycles
TM (TM branch) 6.6 14.5
JM+TM (JM branch) 7.4 16.3
JM (JM branch) (the PIC) 4.7 10.3
jsc 1.2 2.6
v8 0.9 2.0
So the PIC helps a lot, but it's not helping as much as it could. There are about 3M PIC ops saved in SS, and 17M in v8. If we assume that we are 3.7ns slower than we need to be on all of them, then there is another 10ms to be gained on SS and 60ms to be gained on v8. Those are fairly small gains, but the real gains might be bigger, and I would still like to know what's going on.
Reporter | ||
Comment 1•15 years ago
|
||
Here is the code for this PIC inline path initially:
0x64b19e: mov 0x5c(%ebx),%eax // load o to eax
*0x64b1a1: test %eax,%eax // guard that |o != NULL|
*0x64b1a3: je 0x64b35c
0x64b1a9: test $0x7,%al // guard that o is an object
0x64b1ac: jne 0x64b35c
0x64b1b2: mov (%eax),%edx // load obj->map
*0x64b1b4: mov 0x4(%edx),%edx // load obj->map->shape
0x64b1b7: cmp $0x8fffffff,%edx // guard on shape (UNPATCHED)
0x64b1bd: jne 0x64b35c
0x64b1c3: mov 0x1c(%eax),%eax // load dslots (UNPATCHED)
0x64b1c6: mov 0x4000000(%eax),%eax // load slot value (UNPATCHED)
0x64b1cc: mov %eax,0x5c(%ebx) // store slot value to expr stack
The first and last instructions are set off because they are not part of the PIC proper; they would be present for the code |z = o| as well.
Lines marked with * at the beginning are present in our PIC but not jsc's. Everything else in the PIC proper is pretty much the same.
After patching, we have:
0x64b19e: mov 0x5c(%ebx),%eax // load o to eax
*0x64b1a1: test %eax,%eax // guard that |o != NULL|
*0x64b1a3: je 0x64b35c
0x64b1a9: test $0x7,%al // guard that o is an object
0x64b1ac: jne 0x64b35c
0x64b1b2: mov (%eax),%edx // load obj->map
*0x64b1b4: mov 0x4(%edx),%edx // load obj->map->shape
0x64b1b7: cmp $60,%edx // guard on shape (UNPATCHED)
0x64b1bd: jne 0x64b35c
0x64b1c3: lea 0x1c(%eax),%eax // compute &obj->fslots
0x64b1c6: mov -12(%eax),%eax // load slot value (UNPATCHED)
0x64b1cc: mov %eax,0x5c(%ebx) // store slot value to expr stack
Comment 2•15 years ago
|
||
Maybe we want to represent null with a NullObject instead of 0x0? That would collapse the branching a bit.
Reporter | ||
Comment 3•15 years ago
|
||
(In reply to comment #2)
> Maybe we want to represent null with a NullObject instead of 0x0? That would
> collapse the branching a bit.
We do want to do that. But I tried removing that guard and it only gave a 0.2 cycle/iteration speedup. I also tried guarding on |obj->map| instead of |obj->map->shape|, and that gave 0.7 cycles/iter by itself or 1.2 cycles/iter combined with removing the null guard.
I think the issue might be related to the stack load/stores surrounding this path, but I don't know.
Reporter | ||
Comment 4•15 years ago
|
||
While talking about this with Chris Leary over lunch I realized that we aren't optimizing jsop_getlocalprop in the compiler. Instead we just do a GETLOCAL and then a GETPROP, so the value gets stored to and then reloaded right away. I could fix this in the getprop op cases, but it seems that the compiler optimization work will probably pick this up naturally.
![]() |
||
Updated•12 years ago
|
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•