Closed Bug 559899 Opened 14 years ago Closed 11 years ago

JM: Investigate PIC performance

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

RESOLVED FIXED

People

(Reporter: dmandelin, Unassigned)

Attachments

(1 file)

Attached file PIC microbenchmark
I investigated our PIC performance with the attached microbenchmark. The loop body contains two alternative lines; enabling the property access line instead of the other one adds a minimal GETPROP, which exercises just the inline PIC path.
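The attachment itself is not reproduced in this thread, so the following is only a sketch of what such a microbenchmark might look like. The function names and the timing helper are hypothetical. It assumes the baseline line is a plain local copy (`z = o`) and the measured line is a property read on a single object, so the access stays monomorphic.

```javascript
// Hypothetical reconstruction; the real attachment may differ.
function baselineLoop(n) {
  var o = { x: 1 };
  var z;
  for (var i = 0; i < n; ++i) {
    z = o;      // baseline line: copy the local, no property access
  }
  return z;
}

function getPropLoop(n) {
  var o = { x: 1 };
  var z;
  for (var i = 0; i < n; ++i) {
    z = o.x;    // measured line: one GETPROP on an object of a single shape
  }
  return z;
}

// Wall-clock time per iteration in nanoseconds (Date.now() has ms resolution).
function nsPerIteration(fn, n) {
  var start = Date.now();
  fn(n);
  return (Date.now() - start) * 1e6 / n;
}

// Estimated GETPROP cost: measured loop minus baseline loop.
function getPropCostNs(n) {
  return nsPerIteration(getPropLoop, n) - nsPerIteration(baselineLoop, n);
}
```

A call such as `getPropCostNs(1e8)` gives a rough per-access figure. An optimizing compiler may hoist or eliminate either loop body, so the result is only meaningful on an engine that actually executes the access on every iteration.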

I measured the cost of the basic GETPROP:

  Engine (branch)            Time (ns)   Cycles
  SM (TM branch)             12.9        28.4
  TM (TM branch)              6.6        14.5
  JM+TM (JM branch)           7.4        16.3
  JM (JM branch) (the PIC)    4.7        10.3
  jsc                         1.2         2.6
  v8                          0.9         2.0
  
So the PIC helps a lot, but not as much as it could. About 3M PIC ops are saved in SS and 17M in v8. If we assume we are 3.7 ns slower than we need to be on each of them, there is roughly another 10 ms to be gained on SS and 60 ms on v8. Those are fairly small gains, but the real gains might be bigger, and I would still like to know what's going on.
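The arithmetic behind these estimates can be checked with a short sketch. The helper names are made up for illustration. The clock rate is inferred from the first table row (28.4 cycles / 12.9 ns), not stated anywhere in the bug.

```javascript
// Implied clock rate from the first table row: about 2.2 cycles per ns (2.2 GHz).
var CYCLES_PER_NS = 28.4 / 12.9;

// Convert a per-op time in nanoseconds to cycles at the implied clock rate.
function toCycles(ns) {
  return ns * CYCLES_PER_NS;
}

// Total time that could be recovered, in milliseconds, if each of `ops`
// operations is `excessNs` nanoseconds slower than necessary.
function potentialSavingMs(ops, excessNs) {
  return ops * excessNs / 1e6;
}

var ssSavingMs = potentialSavingMs(3e6, 3.7);   // SS: about 11 ms
var v8SavingMs = potentialSavingMs(17e6, 3.7);  // v8: about 63 ms
var jmCycles = toCycles(4.7);                   // about 10.3 cycles
```

The results land near 11 ms and 63 ms, consistent with the rounded 10 ms and 60 ms quoted above. The 3.7 ns excess sits between JM's gap to jsc (3.5 ns) and its gap to v8 (3.8 ns).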
Here is the initial (unpatched) code for this PIC inline path:

0x64b19e:	mov    0x5c(%ebx),%eax      // load o to eax

*0x64b1a1:	test   %eax,%eax            // guard that |o != NULL|
*0x64b1a3:	je     0x64b35c
0x64b1a9:	test   $0x7,%al             // guard that o is an object
0x64b1ac:	jne    0x64b35c
0x64b1b2:	mov    (%eax),%edx          // load obj->map
*0x64b1b4:	mov    0x4(%edx),%edx       // load obj->map->shape
0x64b1b7:	cmp    $0x8fffffff,%edx     // guard on shape (UNPATCHED)
0x64b1bd:	jne    0x64b35c
0x64b1c3:	mov    0x1c(%eax),%eax      // load dslots (UNPATCHED)
0x64b1c6:	mov    0x4000000(%eax),%eax // load slot value (UNPATCHED)

0x64b1cc:	mov    %eax,0x5c(%ebx)      // store slot value to expr stack

The first and last instructions are set off because they are not part of the PIC proper; they would be present for the code |z = o| as well.

Lines marked with * at the beginning are present in our PIC but not jsc's. Everything else in the PIC proper is pretty much the same.

After patching, we have:

0x64b19e:	mov    0x5c(%ebx),%eax      // load o to eax

*0x64b1a1:	test   %eax,%eax            // guard that |o != NULL|
*0x64b1a3:	je     0x64b35c
0x64b1a9:	test   $0x7,%al             // guard that o is an object
0x64b1ac:	jne    0x64b35c
0x64b1b2:	mov    (%eax),%edx          // load obj->map
*0x64b1b4:	mov    0x4(%edx),%edx       // load obj->map->shape
0x64b1b7:	cmp    $60,%edx             // guard on shape (PATCHED)
0x64b1bd:	jne    0x64b35c
0x64b1c3:	lea    0x1c(%eax),%eax      // compute &obj->fslots (PATCHED)
0x64b1c6:	mov    -12(%eax),%eax       // load slot value (PATCHED)

0x64b1cc:	mov    %eax,0x5c(%ebx)      // store slot value to expr stack
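For readers less used to the assembly, the guard sequence in the two listings can be modeled in JavaScript. This is a toy sketch, not the engine's implementation: the `{ shape, slots }` object layout, `shapeTable`, and `makeGetPropCache` are all hypothetical, and the slow path here just looks the slot up in a table and "patches" the cached shape and slot.

```javascript
// Toy model of a monomorphic inline cache for one property access site.
// shapeTable maps a shape id to { propertyName: slotIndex } (hypothetical layout).
function makeGetPropCache(propName, shapeTable) {
  var cachedShape = -1;  // never matches a real shape, like the unpatched immediate
  var cachedSlot = -1;
  var stats = { hits: 0, misses: 0 };

  // Out-of-line path: do the full lookup, then patch the inline guard.
  function slowPath(o) {
    stats.misses++;
    if (o === null || typeof o !== "object") return undefined;
    var layout = shapeTable[o.shape];
    if (!layout || !Object.prototype.hasOwnProperty.call(layout, propName)) {
      return undefined;
    }
    cachedShape = o.shape;      // patch the shape immediate
    cachedSlot = layout[propName];  // patch the slot offset
    return o.slots[cachedSlot];
  }

  // Inline path: three guards, then a single slot load.
  function get(o) {
    if (o === null) return slowPath(o);               // null guard
    if (typeof o !== "object") return slowPath(o);    // object-tag guard
    if (o.shape !== cachedShape) return slowPath(o);  // shape guard
    stats.hits++;
    return o.slots[cachedSlot];                       // slot load
  }

  return { get: get, stats: stats };
}

// Usage: the first access misses and patches; later accesses hit.
var cache = makeGetPropCache("x", { 60: { x: 0 } });
var obj = { shape: 60, slots: [42] };
var first = cache.get(obj);
var second = cache.get(obj);
```

In this model the null guard has to come before the object check because `typeof null` is `"object"` in JavaScript, so the second guard alone would let null through.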
Maybe we want to represent null with a NullObject instead of 0x0? That would collapse the branching a bit.
(In reply to comment #2)
> Maybe we want to represent null with a NullObject instead of 0x0? That would
> collapse the branching a bit.

We do want to do that. But I tried removing that guard and it only gave a 0.2 cycles/iter speedup. I also tried guarding on |obj->map| instead of |obj->map->shape|, and that saved 0.7 cycles/iter by itself, or 1.2 cycles/iter combined with removing the null guard.

I think the issue might be related to the stack load/stores surrounding this path, but I don't know.
While talking about this with Chris Leary over lunch I realized that we aren't optimizing jsop_getlocalprop in the compiler. Instead we just do a GETLOCAL and then a GETPROP, so the value gets stored to and then reloaded right away. I could fix this in the getprop op cases, but it seems that the compiler optimization work will probably pick this up naturally.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED