Closed Bug 502736 Opened 11 years ago Closed 2 years ago

SunSpider survey: opcode count, type stability, cycles per opcode...


(Core :: JavaScript Engine, defect)






(Reporter: wagnerg, Assigned: gwagner)



(9 files)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5
Build Identifier: 

Not a bug, just some facts about the SunSpider benchmarks:

There are very few GVars and they are very stable. The type of a GVar changes in only one test case.
Locals are very stable and mostly Integers.
Divisions are very rare, mostly on Integer types, and the divisor is 2 or 8.
By far the most executed opcode is getLocal.

Specializing subtraction for integers instead of doubles reduces the cycles per subtraction from about 40 to 20.
The same specialization for addition reduces the cost per addition from about 48 to 28.

Reproducible: Always
Integer meaning 31-bit INT_FITS_IN_JSVAL int?

Could you attach the instrumentation patch? Doesn't matter if hacky, just want to see all the details. Thanks,

For example, I changed the JSOP_SUB case in the interpreter from BINARY_OP(-) to the following:

  rval = FETCH_OPND(-1);
  lval = FETCH_OPND(-2);
  if ((lval & rval) & JSVAL_INT) {
    /* Both operands carry the int tag: subtract without going through doubles. */
    i = JSVAL_TO_INT(lval);
    i -= JSVAL_TO_INT(rval);
    STORE_INT(cx, -1, i);
  } else {
    /* Generic path: convert both operands to doubles first. */
    VALUE_TO_NUMBER(cx, -2, lval, d);
    VALUE_TO_NUMBER(cx, -1, rval, d2);
    d -= d2;
    STORE_NUMBER(cx, -1, d);
  }

I am looking at the controlflow-recursive test case from SunSpider. With the BINARY_OP(-) code I get about 40 cycles per sub. With this code I get about 16 cycles per sub if the type is integer.

The instrumentation details will follow.
We could try some asm("") magic to check for overflow, or even adding longs and checking for x >> 32 == 0 might do the trick. Worth a try. 16 vs 40 is probably due to the int -> double and then double -> int conversions. Long math is a bit more expensive, but staying in the integer domain saves a lot of i2f/f2i business.
Actually scrap that. asm("") magic doesn't work since we want 31-bit overflow. So just add the numbers as longs and mask out invalid results.
If we do the calculation with long integers and perform the proper overflow check we kill the performance win again. We win about 2-3 cycles per sub but I guess that's not really worth it.
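A minimal sketch of the "do the math in a wider type, then check the 31-bit range" idea discussed above. It is not the interpreter code: the helper name sub_fits_int31 and the INT31_MIN/INT31_MAX constants are hypothetical, and the range [-2^30, 2^30 - 1] is an assumption about what fits in a 31-bit tagged int.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed range of a 31-bit tagged integer: [-2^30, 2^30 - 1]. */
#define INT31_MIN (-(INT64_C(1) << 30))
#define INT31_MAX ((INT64_C(1) << 30) - 1)

/* Hypothetical helper: subtract in 64-bit so the intermediate result
 * cannot overflow, then report whether it still fits in 31 bits. */
static bool sub_fits_int31(int32_t a, int32_t b, int32_t *out)
{
    int64_t wide = (int64_t)a - (int64_t)b;
    if (wide < INT31_MIN || wide > INT31_MAX)
        return false;           /* caller would fall back to the double path */
    *out = (int32_t)wide;
    return true;
}
```

The two comparisons are the extra work that, per the measurement above, eats most of the win from staying in the integer domain.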
Attached file instrumentation code
Short description: 
One big data structure -> instrumentationStruct
Cycle count for opcodes happens mostly in DO_OP() and BEGIN_CASE() in js_interpret().
Output and calculation in printInstrumentation()
Instrumentation add, remove in jscntxt.h
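For readers without the attachment, per-opcode cycle accounting of this kind could look roughly like the sketch below. It is an illustration only: OpStats, op_stats_record, op_stats_mean and NUM_OPCODES are hypothetical names, not the contents of instrumentationStruct, and the start/end timestamps are assumed to come from whatever cycle counter the platform offers.

```c
#include <stdint.h>

#define NUM_OPCODES 256   /* hypothetical upper bound on opcode numbers */

/* Hypothetical per-opcode accumulator: execution count and total cycles. */
typedef struct {
    uint64_t count[NUM_OPCODES];
    uint64_t cycles[NUM_OPCODES];
} OpStats;

/* Record one execution of `op` that ran from timestamp `start` to `end`. */
static void op_stats_record(OpStats *s, unsigned op, uint64_t start, uint64_t end)
{
    if (op >= NUM_OPCODES)
        return;
    s->count[op]++;
    s->cycles[op] += end - start;
}

/* Mean cycles per execution of `op`, or 0.0 if it never ran. */
static double op_stats_mean(const OpStats *s, unsigned op)
{
    if (op >= NUM_OPCODES || s->count[op] == 0)
        return 0.0;
    return (double)s->cycles[op] / (double)s->count[op];
}
```

In an interpreter loop, the record call would sit where one opcode ends and the next is dispatched, which matches the description of counting in DO_OP() and BEGIN_CASE().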
Lifetime analysis and dslots size for objects in SunSpider.
Execution time starts at JS_NewRuntime() and ends at JS_DestroyRuntime().
I also use JS_GC_ZEAL to GC at each allocation.
The size for each object is defined as         

   totalSize = (((uint32)obj->dslots[-1])*sizeof(obj->dslots[0]));
and is stored during FinalizeObject().
Attached file improved version
improved version where dslots = 0 is separate.
Attached file All SunSpider
all benchmarks are executed as a single file.
X-Axis shows lifetime in cycles for all objects
Y-Axis shows dslots size for each object.
Note the logarithmic scale.
Hi Gregor, great to have this -- any way to estimate what objects are stack-like, become garbage no later than return of C/C++ frame in which the allocating JS API call was made?

For "objects" read gc-things -- strings and doubles definitely included. Thanks,

Waste for all SS benchmarks if Objectsize is increased to 64.
Maybe we could guess the number of needed dslots.
For all SS benchmarks, 76% of the objects have the same dslots size as the previously allocated object.

Total Object count: 225637
Object has same dslots size as earlier allocated object: 172843
Relative: 76.6%
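A sketch of how such a "same size as the previous allocation" ratio could be collected. It assumes "earlier allocated object" means the immediately preceding allocation; SizeRepeatStats and its functions are hypothetical names, not part of the attached instrumentation.

```c
#include <stdint.h>

/* Hypothetical tracker for how often an allocation repeats the
 * dslots size of the allocation right before it. */
typedef struct {
    uint32_t prev_size;     /* size of the previous allocation */
    int      have_prev;     /* 0 until the first allocation is seen */
    uint64_t total;         /* all allocations seen */
    uint64_t same_as_prev;  /* allocations matching the previous size */
} SizeRepeatStats;

static void size_repeat_record(SizeRepeatStats *s, uint32_t size)
{
    if (s->have_prev && s->prev_size == size)
        s->same_as_prev++;
    s->prev_size = size;
    s->have_prev = 1;
    s->total++;
}

/* Percentage of allocations that repeated the previous size. */
static double size_repeat_percent(const SizeRepeatStats *s)
{
    if (s->total == 0)
        return 0.0;
    return 100.0 * (double)s->same_as_prev / (double)s->total;
}
```

With the counts reported above, 172843 of 225637 gives 76.6%, so a "guess the last size" predictor would be right roughly three times out of four on these benchmarks.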
Attachment #388613 - Attachment description: Waste for SS benachmarks. → Waste for SS benchmarks.
Corrected calculation for dslots size:
if (obj->dslots) {
    Size = (((uint32)obj->dslots[-1] - JS_INITIAL_NSLOTS + 1)*sizeof(obj->dslots[0]));
}

New numbers with JS_INITIAL_NSLOTS corrections:
Size = 0 :155027
Size<= 8 :     8
Size<= 16: 64544
Size<= 24:     0
Size<= 32:  2248
Size<= 64:    46
Size > 64:  3764
dslots size without the length slot:
if (obj->dslots) {
    Size = (((uint32)obj->dslots[-1] - JS_INITIAL_NSLOTS)*sizeof(obj->dslots[0]));
}

Size = 0 :155027
Size<= 8 : 64534
Size<= 16:    18
Size<= 24:    10
Size<= 32:  2238
Size<= 64:    46
Size > 64:  3764
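The bucketing used in these tables can be sketched as follows. The bucket boundaries are taken from the rows above; the names size_bucket, bucket_limit and NUM_BUCKETS are hypothetical and only illustrate how a computed Size could be mapped to a histogram row.

```c
#include <stdint.h>

/* Rows: =0, <=8, <=16, <=24, <=32, <=64, >64 */
enum { NUM_BUCKETS = 7 };

/* Inclusive upper limits of all rows except the last (">64"). */
static const uint32_t bucket_limit[NUM_BUCKETS - 1] = {0, 8, 16, 24, 32, 64};

/* Return the index of the histogram row that `size` falls into. */
static int size_bucket(uint32_t size)
{
    for (int i = 0; i < NUM_BUCKETS - 1; i++) {
        if (size <= bucket_limit[i])
            return i;
    }
    return NUM_BUCKETS - 1;     /* larger than 64 */
}
```

A finalizer hook would then do something like histogram[size_bucket(Size)]++ for each object, where histogram is a plain array of NUM_BUCKETS counters.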
YChart for dslots size without length slot.
Size = (((uint32)obj->dslots[-1] - JS_INITIAL_NSLOTS)*sizeof(obj->dslots[0]));
The y-axis is limited to 200.
Assignee: general → anygregor
Ever confirmed: true
(In reply to comment #12)
> any way to estimate what objects are
> stack-like, become garbage no later than return of C/C++ frame in which the
> allocating JS API call was made?
> For "objects" read gc-things -- strings and doubles definitely included.

I have an estimation now for objects and strings. Doubles will follow.
These are the results for opening a spreadsheet on google docs. Startup of Firefox is included. 
I monitored the following API function calls. Other functions like JS_EvaluateScript are included since they call one of the following functions:


JS_CallFunctionValue :5465
JS_CallFunctionName  :0
JS_CallFunction      :1
JS_ExecuteScript     :105
JS_EvaluateUCScriptForPrinc: 234

Objects that become garbage before API return: 35882 out of 153069 allocated Objects.
Strings that become garbage before API return: 18553 out of 71230 allocated Strings.

I don't have any meaningful numbers for the SunSpider benchmarks since there is just a single JS_ExecuteScript call involved and almost all objects and strings become garbage before returning. Do you want more fine-grained measurements for them?
Resolving as INCOMPLETE, because SunSpider benchmark is no longer of interest in 2018.
Closed: 2 years ago
Resolution: --- → INCOMPLETE