Closed Bug 365608 Opened 13 years ago Closed 13 years ago

Alternative bytecode format for atom indexes beyond 0xFFFF

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: igor, Assigned: igor)

References

Details

Attachments

(4 files, 7 obsolete files)

Currently to support an index into the atom table that does not fit into 2 bytes SpiderMonkey uses 3 different strategy depending on the bytecode format. See JSOP_LITERAL, JSOP_FINDNAME and JSOP_LITOPX cases in jsemi.c.

I suggest to try an alternative uniform approach based on 2 new bytecodes, atombase and clearbase. Whenever the atom index exceeds 0xFFFF, the corresponding bytecode is prefixed with atombase recording the upper bits of the index. In addition clearbase suffix is generated. 

With the new bytecode the interpreter needs to maintain a new register "atoms" that records the current atom table offset. Normally this register is initialized to script->atomMap.vector. All atom-based operations to access their atoms simply index the register with 2-byte offset. When atombase is executed the register is increased by the upper bits of the atom index and clearatom resets the register back to script->atomMap.vector.

The drawback of the new schema is that the big index bytecode increases by the single byte representing  clearatom suffix. It also adds a performance hit due to the extra dispatch. On the other hand this should be compensated by a significantly smaller code corresponding to the current JSOP_LITOPX and in case of  JSOP_LITERAL avoiding pushing/poping the stack. 

Moreover, due to the presence of clearbase it is always possible to tell if a particular atom bytecode has huge index. For that one just need to check if the bytecode is followed by clearbase and then read atombase index if so. Thus the various bytecode walking code fragments in the decompiler/disassembler do not need to maintain the atom base and use a helper function for a slower extraction of index.  Thus there all the code to deal with JSOP_LITERAL and friends will be removed.
Attached patch Implementation v0.1 (obsolete) — Splinter Review
This does not work (although compiles) and just represents the backup of the refactoring efforts.
Attached patch Implementation v0.2 (obsolete) — Splinter Review
Here is yet another backup with JSOP_GETTER/JSOP_SETTER prefix fixes.
Attachment #250131 - Attachment is obsolete: true
I thought of a "segment register" approach when fixing bug 155081, but decided against it because of the complexity of optimizing to group atoms and avoid resetting the base register after each non-initial-segment immediate (btw, reset seems better than clear as a metaphor, fwiw).

Without optimization to avoid constant non-initial-segment register setting and resetting, this patch will penalize all immediates not in the initial segment.  Is that bad?  Not sure, big scripts we've seen sometimes consist of straight line code that sets array[i] for constant i from 0 to HUGE, then runs a loop or does something more complicated (defines functions, whose names would be atoms with huge indexes, which are then called).

Shifting the economics this dramatically seemed undesirable to me, hence the three instructions (JSOP_LITERAL, JSOP_FINDNAME, JSOP_LITOPX).  This cost is now sunk, and unless it is a bad sunk cost (one that's incurring further costs; the classic bad-sunk-cost fallacy is when you own a stock that's doomed and you keep buying based on the rationale that you've already bought a lot), I don't see why we need to "fix" it.

This patch will save code size, OTOH.  How much?  Unless it's large, the skewed bytecode economics for large scripts is an incommensurate risk I'd rather not take just to win some bytes of footprint back.

Given all the other work, including for better GC in 1.9, and for aligning SpiderMonkey and Tamarin in Mozilla 2, is this a bug to patch?

/be
The current code (LITERAL/FINDNAME/LITOPX) imposes cost on non-initial-segment immediates, it's true -- just less than the patch here imposes.

Here's the thought I had when considering a segment register approach: have the code generator try to model the runtime register and optimize away register changes. This can't be done correctly without flow analysis.

The simplest kind of analysis would just notice in each basic block that you'd done a JSOP_ATOMBASE to set the register to segment N, and not clear after each opcode. We don't even model of basic blocks at present. To do so we would need to know when a backward branch splits what seemed like a basic block in two. Since JS lacks goto, this is more easily done prospectively, when processing loop heads (for, while, do).

Once the code generator model had ensured the register was set correctly, it still might need to change the register to a different segment for a bytecode in the same basic block, because the block used two widely separated literals.  But at least it could trivially track the current segment and avoid resetting after each huge index.

This scheme would eliminate the "reset atom base" instruction.

/be

 
Remembering more: I wanted a simple jsemit.c patch; I did not want to build a control flow graph. Invalidating a model segment register in JSCodeGenerator at each loop head is easy; the harder part is invalidating when reaching a forward branch target. But it's not that hard, since the emitter knows to backpatch at precisely those bytecodes that are forward branch targets: all BackPatch() calls with CG_NEXT(cg) passed as the |target| formal.

Assuming locality of atom references, with just a JSOP_ATOMBASE bytecode, this could indeed win over current code in both static footprint and runtime.

/be
Attached patch Implementation v1 (obsolete) — Splinter Review
I have found a problem with the previous versions of the patch: I forgot to reset the atoms register before/after inlined calls.

This version fixes it and bring the following changes:

1. I renamed clearbase into resetbase.
2. I added atombase1..atombase3 so the size of the bytecode is not increased unless there are more then 256K atoms in the script.
3. That forced to add resetbase0 as I do not want to require to maintain the atom register in disassembler/decompiler.
4. To skip atombase/resetbase I added JOF_HIGHBITS that is testes in jsopcode.c.

I am pretty much finished with this patch. If there are better ideas, I welcome reassigns.
Attachment #250134 - Attachment is obsolete: true
(In reply to comment #3)
> Not sure, big scripts we've seen sometimes consist of straight
> line code that sets array[i] for constant i from 0 to HUGE, then runs a loop or
> does something more complicated (defines functions, whose names would be atoms
> with huge indexes, which are then called).

For those huge scripts atom table is waste if atom is used only once. Perhaps the idea would be use the atom table for atoms referenced more then once in the script and embed the rest directly into the bytecode? Alternatively a separated table for string literals should also eliminate the huge atom indexes.
Attached patch Implementation v2 (obsolete) — Splinter Review
Fixing bytecode disassembler bug.
Attachment #250216 - Attachment is obsolete: true
Comment on attachment 250226 [details] [diff] [review]
Implementation v2

Asking for a review since this patch simplifies the implementation of the extra  call bytecodes in bug 363530.
Attachment #250226 - Flags: review?(brendan)
Attached patch Fixes and nit-picks to v2 (obsolete) — Splinter Review
Interdiff should show what's what -- main fix was to move js_GetAtomFromBytecode out of #ifdef DEBUG.

/be
Assignee: igor.bukanov → brendan
Attachment #250226 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #250614 - Flags: review?(igor.bukanov)
Attachment #250226 - Flags: review?(brendan)
(In reply to comment #10)
> Created an attachment (id=250614) [details]
> Fixes and nit-picks to v2
> 
> Interdiff should show what's what -- main fix was to move
> js_GetAtomFromBytecode out of #ifdef DEBUG.

Of course, bugzilla's interdiff fails again.  Is there a bz bug to improve it?

/be
I'd like to eliminate RESETBASE0 and minimize atom segment switching, which will require saving atoms in JSInlineFrame, and force bytecode inspectors (disassembler and decompiler notably) to maintain state.  This is not the end of the world, since the decompiler must do so already for other reasons; the disassembler is an #ifdef DEBUG creature that can evolve, too.  I would do this in a new bug.

Comments?

/be
Comment on attachment 250614 [details] [diff] [review]
Fixes and nit-picks to v2

>Index: jsobj.c
>@@ -3026,51 +3026,61 @@ Detecting(JSContext *cx, jsbytecode *pc)
>+            /*
>+             * At this point, anything but grouping means we're not detecting
>+             * unless we see high bits suffix/prefix.
>+             */
>+            if ((js_CodeSpec[op].format & JOF_TYPEMASK) != JOF_2BYTE)
>+                return JS_FALSE;
>             break;
>+        }

Here is a bug: the must be for any atom prefix, not only for 2-bytecode version.


>Index: jsopcode.h
...
>+ * immediate operand index.  A script with more than 64K literals must wrap the
>+ * bytecode into JSOP_ATOMBASE and JSOP_RESETBASE pair.
>  */

2 spaces before A script: a leftover from my version.
Attachment #250614 - Flags: review?(igor.bukanov)
(In reply to comment #13)
> (From update of attachment 250614 [details] [diff] [review])
> >Index: jsobj.c
> >@@ -3026,51 +3026,61 @@ Detecting(JSContext *cx, jsbytecode *pc)
> >+            /*
> >+             * At this point, anything but grouping means we're not detecting
> >+             * unless we see high bits suffix/prefix.
> >+             */
> >+            if ((js_CodeSpec[op].format & JOF_TYPEMASK) != JOF_2BYTE)
> >+                return JS_FALSE;
> >             break;
> >+        }
> 
> Here is a bug: the must be for any atom prefix, not only for 2-bytecode
> version.

Ah, ok.  I was trying to make a new type, usable for other ops; this kind of test might better be done with a flag.  I'll go back to HIGHBITS for now (or should it be called something more precise?).

> >Index: jsopcode.h
> ...
> >+ * immediate operand index.  A script with more than 64K literals must wrap the
> >+ * bytecode into JSOP_ATOMBASE and JSOP_RESETBASE pair.
> >  */
> 
> 2 spaces before A script: a leftover from my version.

Previous sentence in comment massively predates, and is separated by "  " from the first sentence, so I'd rather leave this matching.

I'm old-school when it comes to punctation ;-).

/be
(In reply to comment #14)
> Ah, ok.  I was trying to make a new type, usable for other ops; this kind of
> test might better be done with a flag.  I'll go back to HIGHBITS for now (or
> should it be called something more precise?).

An alternative is to add a macro to check for this.
(In reply to comment #14)
> (In reply to comment #13)
> > (From update of attachment 250614 [details] [diff] [review] [details])
> > >Index: jsobj.c
> > >@@ -3026,51 +3026,61 @@ Detecting(JSContext *cx, jsbytecode *pc)
> > >+            /*
> > >+             * At this point, anything but grouping means we're not detecting
> > >+             * unless we see high bits suffix/prefix.
> > >+             */
> > >+            if ((js_CodeSpec[op].format & JOF_TYPEMASK) != JOF_2BYTE)
> > >+                return JS_FALSE;
> > >             break;
> > >+        }
> > 
> > Here is a bug: the must be for any atom prefix, not only for 2-bytecode
> > version.
> 
> Ah, ok.  I was trying to make a new type, usable for other ops; this kind of
> test might better be done with a flag.  I'll go back to HIGHBITS for now (or
> should it be called something more precise?).

What's more, I changed the length-1 opcodes (JSOP_ATOMBASE[123] and JSOP_RESETBASE{,0}) to have type JOF_BYTE, which is correct (length 2 is novel and length 1 bytecodes all have type JOF_BYTE).  The format encoding has a type field in the low bits, but it's not meant to represent arbitrary facts about opcodes of different formats -- it is a "format type" (type of format) field.

/be
JOF_2BYTE is the format-type, so the disassembler and other such code can use format-type by itself to decode immediate operands.  JOF_ATOMBASE is a flag used to distinguish JSOP_ATOMBASE{,1,2,3} and JSOP_RESETBASE{,0}).

Some DUMP_CALL_TABLE-#ifdef'ed fixes thrown in.

/be
Attachment #250614 - Attachment is obsolete: true
Attachment #250649 - Flags: review?(igor.bukanov)
Attachment #250649 - Flags: review?(igor.bukanov) → review+
Comment on attachment 250649 [details] [diff] [review]
Fixes and nit-picks to v2, revised

> /*
>  * Emit a bytecode and its 2-byte constant (atom) index immediate operand.
>  * If the atomIndex requires more than 2 bytes, emit a prefix op whose 24-bit
>  * immediate operand indexes the atom in script->atomMap.
>- *
>- * If op has JOF_NAME mode, emit JSOP_FINDNAME to find and push the object in

I forgot to fix the comments. It should write about the format.
The last patch saved over 4K from the optimized and stripped shell executable:

Before the patch:

608524 Jan  6 00:58 ./Linux_All_OPT.OBJ/js
68820  Jan  6 00:58 ./Linux_All_OPT.OBJ/jsinterp.o

After the patch:

604236 Jan  6 01:01 ./Linux_All_OPT.OBJ/js
 66280 Jan  6 01:01 ./Linux_All_OPT.OBJ/jsinterp.o

Clearly, many interpreter savings comes from the fact that GET_ATOM no longer calls js_GetAtom but rather accesses the atom table directly, but removal of the extra bytecode to handle extended literals should also help.  
GET_ATOM was a wrapper on js_GetAtom for paranoia, in the old days.  Occasionally we still see bugs where a bad script or cx is used, and you die in js_GetAtom (please add information here if you remember more, timeless).

Final patch I'm checking in attached next.

/be
Attached patch final patch for checkin (obsolete) — Splinter Review
Attachment #250649 - Attachment is obsolete: true
Attachment #250655 - Flags: review+
That final patch's revised comment still lied.  Here's the interdiff to get the final^2 patch:

diff -u jsemit.c jsemit.c
--- jsemit.c    6 Jan 2007 00:42:41 -0000
+++ jsemit.c    6 Jan 2007 01:16:34 -0000
@@ -1755,6 +1755,8 @@
  * If the atomIndex requires more than 2 bytes, emit a prefix op whose 8-bit
  * immediate operand effectively extends the 16-bit immediate of the prefixed
  * opcode, by changing atom "segment" within script->atomMap (see jsinterp.c).
+ * We optimize segments 1-3 with single-byte JSOP_ATOMBASE[123] codes.
+ *
  * Such prefixing currently requires a suffix to restore the "zero segment"
  * register setting, but this could be optimized further.
  */

/be
Here is another last minute suggestion to the patch. To reduce the code slightly js_PCToLineNumber can use JOF_ATOMBASE and  js_GetAtomFromBytecode as well: 

+++ js/src/jsscript.c	2007-01-06 02:02:51.000000000 +0100
@@ -1495,47 +1495,35 @@ js_GetSrcNoteCached(JSContext *cx, JSScr
     }
 
     return result;
 }
 
 uintN
 js_PCToLineNumber(JSContext *cx, JSScript *script, jsbytecode *pc)
 {
-    uintN atomBase;
     JSAtom *atom;
     JSFunction *fun;
     uintN lineno;
     ptrdiff_t offset, target;
     jssrcnote *sn;
     JSSrcNoteType type;
 
     /* Cope with JSStackFrame.pc value prior to entering js_Interpret. */
     if (!pc)
         return 0;
 
     /*
      * Special case: function definition needs no line number note because
      * the function's script contains its starting line number.
      */
-    atomBase = 0;
-  restart:
-    switch (*pc) {
-      case JSOP_ATOMBASE:
-        atomBase = GET_ATOMBASE(pc);
-        pc += JSOP_ATOMBASE_LENGTH;
-        goto restart;
-      case JSOP_ATOMBASE1:
-      case JSOP_ATOMBASE2:
-      case JSOP_ATOMBASE3:
-        atomBase = (*pc - JSOP_ATOMBASE1 + 1) << 16;
-        pc += 1;
-        goto restart;
-      case JSOP_DEFFUN:
-        atom = GET_ATOM(cx, script->atomMap.vector + atomBase, pc);
+    if (js_CodeSpec[*pc].format & JOF_ATOMBASE)
+        pc += js_CodeSpec[*pc].length;
+    if (*pc == JSOP_DEFFUN) {
+        atom = js_GetAtomFromBytecode(cx, script, pc, 0);
Best of three times with the patch, on the JS code extracted from the testcase for bug 155081:

$ time ./Darwin_OPT.OBJ/js /tmp/biglit.js 

real    0m0.462s
user    0m0.431s
sys     0m0.029s

Best of three without the patch, using JSOP_LITOPX et al. (actually this is my 1.8 branch optimized js shell):

$ time ../srcmoz180/Darwin_OPT.OBJ/js /tmp/biglit.js

real    0m0.277s
user    0m0.252s
sys     0m0.023s

/be
For the following test case:

function g()
{
  return 10;
}

var N = 500*1000;
var src = 'var x = ["';
var array = Array(N);
for (var i = 0; i != N; ++i)
  array[i] = i;
src += array.join('","')+'"]; return x;';
var f = Function(src);
var x = f();
if (x.length != N)
  throw "Unexpected array length: "+x.length;


I see a very slight speed-up with the patch:

Before (best of 3 runs):
real	0m6.659s
user	0m6.372s
sys	0m0.236s

After:
real	0m6.627s
user	0m6.304s
sys	0m0.308s

This is for optimized build of the shell with -O2, I will try the same with -O.
The test case with the standard -O with the patch gives very slight reduction in the speed:

Before:
real	0m6.790s
user	0m6.412s
sys	0m0.308s

After:
real	0m6.812s
user	0m6.500s
sys	0m0.276s
Here is the number for the test case from bug 155081 adopted for jsshell use:

Before:
real	0m0.485s
user	0m0.472s
sys	0m0.012s

After:
real	0m0.493s
user	0m0.484s
sys	0m0.004s

I do not see any significant difference again. The shell was compiled using the standard -O option.
Sorry, false alarm: I left some instrumentation turned on that should not have been on. I'm checking in now.

/be
Blocks: 366122
No longer flag JSOP_RESETBASE* with JOF_ATOMBASE, which should be ok as the decompiler does nothing for unknown ops.

/be
Attachment #250706 - Flags: review?(igor.bukanov)
Attachment #250655 - Attachment is obsolete: true
Attachment #250706 - Flags: review?(igor.bukanov) → review+
Fixed on trunk:

js/src/js.c rev 3.128
js/src/jsemit.c rev 3.232
js/src/jsinterp.c rev 3.317
js/src/jsobj.c rev 3.314
js/src/jsopcode.c rev 3.200
js/src/jsopcode.h rev 3.38
js/src/jsopcode.tbl rev 3.84
js/src/jsscript.c rev 3.126

/be
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Blocks: 366236
Blocks: 365692
I'd like to see this and any followup changes go into js1.7src, even though this bug's patch is not going into 1.8.1.x.  Comments?

/be
Blocks: js1.7src
Flags: in-testsuite-
Flags: blocking1.8.1.2?
Flags: blocking1.8.0.10?
Not blocking and not sure we want such a huge patch going into the branches this late in the game.  Let's reevaluate this for next time... maybe find a simpler patch for the branches?
Flags: blocking1.8.1.2?
Flags: blocking1.8.1.2-
Flags: blocking1.8.0.10?
Flags: blocking1.8.0.10-
Flags: wanted1.8.1.x+
Flags: wanted1.8.0.x+
A hand-merged patch for 1.8.1 branch with the fix for the regression 366312 included.
Assignee: brendan → igor
Status: RESOLVED → ASSIGNED
Resolution: FIXED → ---
Comment on attachment 254175 [details] [diff] [review]
Patch for 1.8.1 branch

The patch has the same failures in the test suite as 1.8.1 shell.
Attachment #254175 - Flags: review?(brendan)
Attachment #254182 - Attachment is patch: true
Attachment #254182 - Attachment mime type: text/x-patch → text/plain
Comment on attachment 254175 [details] [diff] [review]
Patch for 1.8.1 branch

Sigh. Can we try to sync more trunk to branch code? I read the diff of diffs (it would be cool if empty context could be trimmed -- anyone know an option or tool for that?) and this looks good, *except* that JSXDR_BYTECODE_VERSION must not match between trunk and branch now.

SO either we should try to sync jsopcode.tbl b/t trunk and 1.8 branch, or this patch needs to alter jsxdrapi.h too. I am very much in favor of the former.

/be
Comment on attachment 254175 [details] [diff] [review]
Patch for 1.8.1 branch

See comment 43.

/be
Attachment #254175 - Flags: review?(brendan) → review-
This was fixed on 2007-01-06.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.