Closed
Bug 433382
Opened 17 years ago
Closed 17 years ago
More efficient interpreter switch when computed goto is not available
Categories
(Core :: JavaScript Engine, enhancement)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: igor, Assigned: igor)
References
Details
(Keywords: perf)
Attachments
(3 files, 11 obsolete files)
Currently with MSVC compiler, which does not support computed goto extension of GCC, the interpreter loops looks effectively as:
for (;;) {
op = (JSOp) *regs.pc;
len = js_CodeSpec[op].length;
if (interruptHandler) {
....
}
switch (op) {
....
}
regs.pc += len;
}
This structure is inefficient for 2 reasons:
1. The if check for interruptHandler even with branch prediction does not come for free. Using the idea from a patch for bug 430293 it should be possible to replace the if check with branch-less bit operation.
2. It accesses len way to early initializing it from a global table. This forces the compiler to save its value on the native stack whenever a switch case calls a function. Moreover, when a jump operation performs a goto, this assignment to len is completely an necessary. Initializing len the end of each case in the same way as the indirect threading code does should avoid this.
Comment 1•17 years ago
|
||
Are you sure 2 is a real problem in our optimized, PGO'ed builds?
Improving Windows perf incrementally is great, but even better would be computed gotos somehow -- esp. since we can count on VS8 now. Is there no way?
/be
Assignee | ||
Comment 2•17 years ago
|
||
Implementation of the ideas from the comment 1. The biggest part of the patch is introduction of that jsoplen.h headers that defines JSOP_XXX_LENGTH as macros. This way it is possible in the interpreter to specialize END_CASE expansion for different bytecode lengths. Ideally it would be nice to autogenerate the patch but for now at least it is statically checked.
Attachment #320728 -
Flags: review?(brendan)
Assignee | ||
Comment 3•17 years ago
|
||
To get the results with GCC I passed the following args to the make:
BUILD_OPT=1 INTERP_OPTIMIZER="-falign-functions=4096 -O3 -fstrict-aliasing" XCFLAGS="-DJS_THREADED_INTERP=0"
where JS_THREADED_INTERP=0 disables indirect threading in the interpreter with GCC.
The end result is 16% speedup with the patch. Now if somebody can check the patch with VC8, that would be nice.
Assignee | ||
Comment 4•17 years ago
|
||
For references, here is the results of the switched interpreter loop *with* the patch applied versus the trunk with the indirect threading loop. The end result is that with the patch the switch is 12% slower. Without the patch the switched loop is slower by 30%.
Assignee | ||
Comment 5•17 years ago
|
||
(In reply to comment #1)
> Are you sure 2 is a real problem in our optimized, PGO'ed builds?
Given that the patch shows speedups up to 80% for particular benchmarks with GCC 4.3.0 at -O3, that must be a problem.
Assignee | ||
Comment 6•17 years ago
|
||
(In reply to comment #5)
> (In reply to comment #1)
> > Are you sure 2 is a real problem in our optimized, PGO'ed builds?
>
> Given that the patch shows speedups up to 80% for particular benchmarks with
> GCC 4.3.0 at -O3, that must be a problem.
For references, here the tests that benefits the most:
bitops: 1.31x as fast 1705.6ms +/- 0.3% 1299.2ms +/- 0.7% significant
3bit-bits-in-byte: 1.48x as fast 266.2ms +/- 0.1% 180.2ms +/- 0.3% significant
bits-in-byte: 1.79x as fast 365.4ms +/- 0.8% 204.6ms +/- 0.8% significant
bitwise-and: 1.078x as fast 589.4ms +/- 0.8% 546.8ms +/- 1.5% significant
nsieve-bits: 1.32x as fast 484.6ms +/- 0.2% 367.6ms +/- 0.5% significant
Comment 7•17 years ago
|
||
Amazing -- that hoisted len load was ~13 years old!
Can you attach a patch including jsoplen.h? I'm still interested in PGO'ed build results too -- could be even better.
/be
Assignee | ||
Comment 8•17 years ago
|
||
The new version of the patch includes the missing jsoplen.h
Attachment #320728 -
Attachment is obsolete: true
Attachment #320739 -
Flags: review?(brendan)
Attachment #320728 -
Flags: review?(brendan)
Comment 9•17 years ago
|
||
Can you automate generation of jsoplen.h from jsopcode.tbl?
/be
Comment 10•17 years ago
|
||
I'll get PGO results. It'll take some time though. Should have results tomorrow morning.
Assignee | ||
Comment 11•17 years ago
|
||
The new version auto-generates jsautooplen.h during the compilation and adjust the build files to account for that. Since xpconnect includes jsopcode.h, in the new version jsopcode.h does not include the generated file. Instead the C sources that uses JSOP_<name>_LENGTH constants includes the file directly. It avoids the need to adjust the makefile for xpconnect to include the build dir for js into the include path.
Attachment #320739 -
Attachment is obsolete: true
Attachment #320798 -
Flags: review?(brendan)
Attachment #320739 -
Flags: review?(brendan)
Assignee | ||
Updated•17 years ago
|
Attachment #320798 -
Attachment is patch: true
Attachment #320798 -
Attachment mime type: application/octet-stream → text/plain
Assignee | ||
Comment 12•17 years ago
|
||
The previous version of the patch does not compile when liveconnect is on as the code there currently uses JSOP_STOP_LENGTH. To fix that I replaced the symbolic constant by one, its numeric value, together with the assert that the last pc is JSOP_STOP. This way there is no need to patch the makefile for liveconnect to add the build directory of js/src to the include path of liveconnect.
Attachment #320798 -
Attachment is obsolete: true
Attachment #320905 -
Flags: review?(brendan)
Attachment #320798 -
Flags: review?(brendan)
Comment 13•17 years ago
|
||
I tried v1 and v3 of this, and neither compile for me.
Assignee | ||
Comment 14•17 years ago
|
||
In the previous patch I forgot to include jsoplengen.c file.
Attachment #320905 -
Attachment is obsolete: true
Attachment #320972 -
Flags: review?(brendan)
Attachment #320905 -
Flags: review?(brendan)
Comment 15•17 years ago
|
||
With a PGO windows build, I get something about 5% faster than the nightly on the same hardware, with some tests showing 10% improvement, and no regressions to speak of.
Comment 16•17 years ago
|
||
Where should we start landing these patches? We need an hg repo for 3.next/1.9.next work. We can't use ActionMonkey because XPCOMGC is not done and won't be for a while, but we need short-term release-ability *and* no-regressions test-ability including in-browser tests and macro-benchmarks.
/be
Comment 17•17 years ago
|
||
Comment on attachment 320972 [details] [diff] [review]
v3 with all the files
Need to get jimb's autoconf build system to track this bug's changes.
>+#include <jsautooplen.h>
We use "" not <> for local headers including generated ones. <> means system include directories only, or used to. It may not matter given -I overuse, but stylistically it would be best to stick with "".
> # define DO_OP() JS_EXTENSION_(goto *jumpTable[op])
>-# define DO_NEXT_OP(n) do { METER_OP_PAIR(op, regs.pc[n]); \
>- op = (JSOp) *(regs.pc += (n)); \
>- DO_OP(); } while (0)
>+# define DO_NEXT_OP(n) \
>+ JS_BEGIN_MACRO \
>+ METER_OP_PAIR(op, regs.pc[n]); \
>+ op = (JSOp) *(regs.pc += (n)); \
>+ DO_OP(); \
>+ JS_END_MACRO
>+
> # define BEGIN_CASE(OP) L_##OP:
> # define END_CASE(OP) DO_NEXT_OP(OP##_LENGTH);
> # define END_VARLEN_CASE DO_NEXT_OP(len);
>-# define EMPTY_CASE(OP) BEGIN_CASE(OP) op = (JSOp) *++regs.pc; DO_OP();
>-#else
>+# define ADD_EMPTY_CASE(OP) BEGIN_CASE(OP) \
>+ JS_ASSERT(js_CodeSpec[OP].length == 1); \
>+ op = (JSOp) *++regs.pc; \
>+ DO_OP();
Style nit: the DO_NEXT_OP body should probably try to align with other macro definitions, including ADD_EMPTY_CASE. That macro might indent by 4 instead of 2, but that's a nit on top of a nit.
>+# define END_EMPTY_CASES
>+
>+#else /* !JS_THREADED_INTERP */
>+
> # define DO_OP() goto do_op
>-# define DO_NEXT_OP(n) goto advance_pc
>+# define DO_NEXT_OP(n) \
>+ JS_BEGIN_MACRO \
>+ JS_ASSERT((n) == len); \
>+ goto advance_pc; \
>+ JS_END_MACRO
Same here.
>+/*
>+ * To share the code for all len == 1 cases we use the specialized label with
>+ * code that fall through to advance_pc: .
s/fall/falls/
>- /* Check for too deep a C stack. */
>+ /* Check for a too deep C stack. */
English nit: previous version could have said "too deep of " but new version would want "too-deep" or "overdeep". Either is fine in my book, probably "too deep of " is more colloquial.
>-#if JS_THREADED_INTERP
>+ /*
>+ * It is important that "op" be initialized before calling DO_OP because
>+ * it is possible for "op" to be specially assigned during the normally
>+ * processing of an opcode while looping. We rely on DO_NEXT_OP to manage
>+ * "op" correctly in all other cases.
s/normally/normal/
>+OPDEF(JSOP_NOP, 230, "nop", NULL, 1, 0, 0, 0, JOF_BYTE)
Suggest eliminating JSOP_SWAP and using its opcode. This touches a few places by removing undetectably-dead code.
/be
Assignee | ||
Comment 18•17 years ago
|
||
The new version of the patch addresses the nits and eliminated JSOP_SWAP as suggested. Here is a delta from the previous version:
--- /home/igor/m/ff/mozilla/quilt.ANS1mk/js/src/jsinterp.c 2008-05-22 16:12:30.000000000 +0200
+++ js/src/jsinterp.c 2008-05-22 16:03:47.000000000 +0200
@@ -79,3 +79,3 @@
-#include <jsautooplen.h>
+#include "jsautooplen.h"
@@ -2595,8 +2595,7 @@ js_Interpret(JSContext *cx)
# define DO_OP() JS_EXTENSION_(goto *jumpTable[op])
-# define DO_NEXT_OP(n) \
- JS_BEGIN_MACRO \
- METER_OP_PAIR(op, regs.pc[n]); \
- op = (JSOp) *(regs.pc += (n)); \
- DO_OP(); \
- JS_END_MACRO
+# define DO_NEXT_OP(n) JS_BEGIN_MACRO \
+ METER_OP_PAIR(op, regs.pc[n]); \
+ op = (JSOp) *(regs.pc += (n)); \
+ DO_OP(); \
+ JS_END_MACRO
@@ -2606,5 +2605,6 @@ js_Interpret(JSContext *cx)
# define ADD_EMPTY_CASE(OP) BEGIN_CASE(OP) \
- JS_ASSERT(js_CodeSpec[OP].length == 1); \
- op = (JSOp) *++regs.pc; \
- DO_OP();
+ JS_ASSERT(js_CodeSpec[OP].length == 1); \
+ op = (JSOp) *++regs.pc; \
+ DO_OP();
+
# define END_EMPTY_CASES
@@ -2614,7 +2614,6 @@ js_Interpret(JSContext *cx)
# define DO_OP() goto do_op
-# define DO_NEXT_OP(n) \
- JS_BEGIN_MACRO \
- JS_ASSERT((n) == len); \
- goto advance_pc; \
- JS_END_MACRO
+# define DO_NEXT_OP(n) JS_BEGIN_MACRO \
+ JS_ASSERT((n) == len); \
+ goto advance_pc; \
+ JS_END_MACRO
@@ -2627,3 +2626,3 @@ js_Interpret(JSContext *cx)
* To share the code for all len == 1 cases we use the specialized label with
- * code that fall through to advance_pc: .
+ * code that falls through to advance_pc: .
*/
@@ -2640,3 +2639,3 @@ js_Interpret(JSContext *cx)
- /* Check for a too deep C stack. */
+ /* Check for too deep of C stack. */
JS_CHECK_RECURSION(cx, return JS_FALSE);
@@ -2777,3 +2776,3 @@ js_Interpret(JSContext *cx)
* It is important that "op" be initialized before calling DO_OP because
- * it is possible for "op" to be specially assigned during the normally
+ * it is possible for "op" to be specially assigned during the normal
* processing of an opcode while looping. We rely on DO_NEXT_OP to manage
@@ -2786,6 +2785,8 @@ js_Interpret(JSContext *cx)
/*
- * This is a loop, but it does not look like a loop. The loop-closing
- * jump is distributed throughout interruptJumpTable, and comes back to
- * the interrupt label. The dispatch on op is through normalJumpTable.
- * The trick is LOAD_INTERRUPT_HANDLER setting jumpTable appropriately.
+ * This is a loop, but it does not look like a loop. The loop-closing
+ * jump is distributed throughout goto *jumpTable[op] implementing DO_OP.
+ * When interrupts are enabled, jumpTable is set to interruptJumpTable
+ * where all jumps point to the JSOP_INTERRUPT case. The latter, after
+ * calling the interrupt handler, dispatches through normalJumpTable to
+ * continue the normal bytecode processing.
*/
@@ -2890,8 +2891,2 @@ js_Interpret(JSContext *cx)
- BEGIN_CASE(JSOP_SWAP)
- rtmp = regs.sp[-1];
- regs.sp[-1] = regs.sp[-2];
- regs.sp[-2] = rtmp;
- END_CASE(JSOP_SWAP)
-
BEGIN_CASE(JSOP_SETRVAL)
--- /home/igor/m/ff/mozilla/quilt.ANS1mk/js/src/jsopcode.tbl 2008-05-22 16:12:30.000000000 +0200
+++ js/src/jsopcode.tbl 2008-05-22 11:07:32.000000000 +0200
@@ -333,5 +333,5 @@ OPDEF(JSOP_FINALLY, 134,"finally",
/*
- * Swap the top two stack elements.
+ * Generic nop for the decompiler.
*/
-OPDEF(JSOP_SWAP, 135,"swap", NULL, 1, 2, 2, 0, JOF_BYTE)
+OPDEF(JSOP_NOP, 135,"nop", NULL, 1, 0, 0, 0, JOF_BYTE)
@@ -523,2 +523 @@ OPDEF(JSOP_INT32, 228, "int32",
OPDEF(JSOP_LENGTH, 229, "length", NULL, 1, 1, 1, 18, JOF_BYTE|JOF_PROP)
-OPDEF(JSOP_NOP, 230, "nop", NULL, 1, 0, 0, 0, JOF_BYTE)
--- /home/igor/m/ff/mozilla/quilt.ANS1mk/js/src/jsopcode.c 2008-05-22 16:12:30.000000000 +0200
+++ js/src/jsopcode.c 2008-05-22 11:12:15.000000000 +0200
@@ -77,3 +77,3 @@
-#include <jsautooplen.h>
+#include "jsautooplen.h"
@@ -2158,10 +2158,2 @@ Decompile(SprintStack *ss, jsbytecode *p
- case JSOP_SWAP:
- /*
- * We don't generate this opcode currently, and previously we
- * did not need to decompile it. If old, serialized bytecode
- * uses it still, we should fall through and set todo = -2.
- */
- /* FALL THROUGH */
-
case JSOP_GOSUB:
@@ -4946,4 +4938,3 @@ DecompileExpression(JSContext *cx, JSScr
JS_ASSERT(op != JSOP_CASE && op != JSOP_CASEX &&
- op != JSOP_DUP && op != JSOP_DUP2 &&
- op != JSOP_SWAP);
+ op != JSOP_DUP && op != JSOP_DUP2);
@@ -5219,9 +5210,2 @@ ReconstructPCStack(JSContext *cx, JSScri
- case JSOP_SWAP:
- JS_ASSERT(ndefs == 2);
- pc2 = pcstack[pcdepth];
- pcstack[pcdepth] = pcstack[pcdepth + 1];
- pcstack[pcdepth + 1] = pc2;
- break;
-
case JSOP_LEAVEBLOCKEXPR:
--- /home/igor/m/ff/mozilla/quilt.ANS1mk/js/src/jsdbgapi.c 2008-05-22 16:12:30.000000000 +0200
+++ js/src/jsdbgapi.c 2008-05-22 11:12:21.000000000 +0200
@@ -64,3 +64,3 @@
-#include <jsautooplen.h>
+#include "jsautooplen.h"
--- /home/igor/m/ff/mozilla/quilt.ANS1mk/js/src/jsemit.c 2008-05-22 16:12:30.000000000 +0200
+++ js/src/jsemit.c 2008-05-22 11:11:56.000000000 +0200
@@ -68,3 +68,3 @@
-#include <jsautooplen.h>
+#include "jsautooplen.h"
--- /home/igor/m/ff/mozilla/quilt.ANS1mk/js/src/jsobj.c 2008-05-22 16:12:30.000000000 +0200
+++ js/src/jsobj.c 2008-05-22 11:12:21.000000000 +0200
@@ -90,3 +90,3 @@
-#include <jsautooplen.h>
+#include "jsautooplen.h"
Attachment #320972 -
Attachment is obsolete: true
Attachment #322098 -
Flags: review?(brendan)
Attachment #320972 -
Flags: review?(brendan)
Assignee | ||
Comment 19•17 years ago
|
||
I forgot to include the newly introduced jsoplengen.c in the previous patch.
Attachment #322098 -
Attachment is obsolete: true
Attachment #322104 -
Flags: review?(brendan)
Attachment #322098 -
Flags: review?(brendan)
Comment 20•17 years ago
|
||
Comment on attachment 322104 [details] [diff] [review]
v4 with all files
>- /* Check for too deep a C stack. */
>+ /* Check for too deep of C stack. */
Sorry, my last comment should have suggested "of a" -- need the indefinite article before "C stack". "thread stack" would be even better in view of API name, and of the language-independent nature of the thread stack.
>+#if JS_THREADED_INTERP
>+ /*
>+ * This is a loop, but it does not look like a loop. The loop-closing
>+ * jump is distributed throughout goto *jumpTable[op] implementing DO_OP.
s/implementing/inside of/
r=me with these final nits picked.
/be
Attachment #322104 -
Flags: review?(brendan) → review+
Assignee | ||
Comment 21•17 years ago
|
||
The new version addresses the nits from the comment 20:
+++ js/src/jsinterp.c 2008-05-22 19:42:16.000000000 +0200
@@ -2639,3 +2639,3 @@ js_Interpret(JSContext *cx)
- /* Check for too deep of C stack. */
+ /* Check for too deep of a native thread stack. */
JS_CHECK_RECURSION(cx, return JS_FALSE);
@@ -2786,3 +2786,3 @@ js_Interpret(JSContext *cx)
* This is a loop, but it does not look like a loop. The loop-closing
- * jump is distributed throughout goto *jumpTable[op] implementing DO_OP.
+ * jump is distributed throughout goto *jumpTable[op] inside of DO_OP.
* When interrupts are enabled, jumpTable is set to interruptJumpTable
Attachment #322104 -
Attachment is obsolete: true
Attachment #322135 -
Flags: review+
Comment 22•17 years ago
|
||
This is ready to land, right?
Assignee | ||
Comment 23•17 years ago
|
||
(In reply to comment #22)
> This is ready to land, right?
Yes.
Assignee | ||
Comment 24•17 years ago
|
||
The patch is a version of v4b generated against mozilla-central.
Attachment #322135 -
Attachment is obsolete: true
Attachment #325536 -
Flags: review+
Assignee | ||
Comment 25•17 years ago
|
||
The new version synchronizes the patch with mozilla-central changes.
Attachment #325536 -
Attachment is obsolete: true
Attachment #325937 -
Flags: review+
Assignee | ||
Comment 26•17 years ago
|
||
mozilla-central push:
changeset: 15457:21527193c49b
tag: tip
parent: 15455:f7aaa37e926a
user: Igor Bukanov <igor@mir2.org>
date: Fri Jun 20 09:36:56 2008 +0200
summary: [Bug 433382] More efficient interpreter switch when computed goto is not available. r=brendan
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 27•17 years ago
|
||
I backed out the pushed changes: in the Makefile.in I fotrgot to change CSRCS to CPPSRCS for the dependency rule for the new autogenerated header.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 28•17 years ago
|
||
Attachment #325943 -
Flags: review+
Assignee | ||
Updated•17 years ago
|
Attachment #325943 -
Attachment is patch: true
Attachment #325943 -
Attachment mime type: application/octet-stream → text/plain
Assignee | ||
Updated•17 years ago
|
Attachment #325937 -
Attachment is obsolete: true
Assignee | ||
Comment 29•17 years ago
|
||
mozilla-central push:
changeset: 15461:97977f224aff
tag: tip
parent: 15458:404e6281b360
user: Igor Bukanov <igor@mir2.org>
date: Fri Jun 20 10:06:45 2008 +0200
summary: [Bug 433382] More efficient interpreter switch when computed goto
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 30•17 years ago
|
||
I backed out again due to more breakage. Some dependencies are wrong...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 31•17 years ago
|
||
The new patch includes the missing jsoplengen.cpp.
Attachment #325943 -
Attachment is obsolete: true
Attachment #325955 -
Flags: review+
Assignee | ||
Comment 32•17 years ago
|
||
Here is yet another push to mozilla-central:
changeset: 15463:5a770d1e8bb7
tag: tip
user: Igor Bukanov <igor@mir2.org>
date: Fri Jun 20 11:55:49 2008 +0200
summary: [Bug 433382] More efficient interpreter switch when computed goto is not available. r=brendan
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Updated•17 years ago
|
Flags: in-testsuite-
Flags: in-litmus-
You need to log in
before you can comment on or make changes to this bug.
Description
•