Closed Bug 1322019 Opened 3 years ago Closed 3 years ago

Print stack transition in dis() function output.

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox53 --- affected
firefox54 --- fixed

People

(Reporter: arai, Assigned: arai)

References

Details

Attachments

(6 files, 6 obsolete files)

3.32 KB, patch
nbp
: review+
Details | Diff | Splinter Review
6.36 KB, patch
nbp
: review+
Details | Diff | Splinter Review
3.92 KB, patch
nbp
: review+
Details | Diff | Splinter Review
3.50 KB, patch
nbp
: review+
Details | Diff | Splinter Review
45.52 KB, patch
nbp
: review+
Details | Diff | Splinter Review
5.63 KB, patch
nbp
: review+
Details | Diff | Splinter Review
Currently we have stack convention data in Opcodes.h, as comment, like this:
>      * Swaps the top two values on the stack. This is useful for things like
>      * post-increment/decrement.
>      *   Category: Operators
>      *   Type: Stack Operations
>      *   Operands:
>      *   Stack: v1, v2 => v2, v1

We could use similar data to print dummy stack transition, next to bytecode in dis() output.

something like this:

00000:  getgname "foo"          # val
00005:  dup                     # val val
00006:  callprop "hello"        # val obj[name]
00011:  swap                    # obj[name] val
00012:  getgname "a"            # obj[name] val val
00017:  getgname "b"            # obj[name] val val val
00022:  add                     # obj[name] val (lval+rval)
00023:  call 1                  # rval
00026:  return                  #
00027:  retrval                 #

(we could do some more formatting tho)

I'm doing it manually everytime modifying bytecode, and it should be nice if it's done automatically.
or, perhaps we could create separated python script to annotate dis() output,
if introducing that to Opcodes.h makes it complicated.
I'll go with python script as a first step.
Summary: Print stack transition in dis() function output. → Add a script to print stack transition in dis() function output.
Attached file sample output (obsolete) —
and output of the following command:

  cat dump.txt | ./js/src/vm/show_stack.py `pwd`

where dump.txt contains the bytecode section of the output of the following:

  dis(async function*() { for await (x of y) { if (x) break; } })

with bug 1331092 patch applied.
currently it cannot handle lambda/lambda_arrow, since it can print multiple-line operand.
Updated to handle more operands
also add comments for jump targets.
Attachment #8836326 - Attachment is obsolete: true
Attached file sample output (obsolete) —
output for dis(async function*() { for await (x of y) { if (x) break; } })
Attachment #8836327 - Attachment is obsolete: true
now it ignores all non-disassembly line, so it can be used like

$ ./obj-dir/dist/bin/js | ./js/src/vm/show_stack.py `pwd`
dis(async function*() { for await (x of y) { if (x) break; } })
Ctrl+D
forgot to update the patch
Attachment #8836450 - Attachment is obsolete: true
[Opcodes.h]
  JSOP_SYMBOL
    removed "," that means the separator between operands, while JSOP_SYMBOL
    has only one operand
  JSOP_STRING
    renamed "string" to "atom", to associate it with "atomIndex" operand


[show_stack.py]

It receives the output of dis() function from stdin, ignoring all not-parsible lines.
It prints the calculated output when it find empty line or EOF.


The output is in the following format:
  pc: opcode operands              # stack
like the following for for `10 + x`:
  00000:  int8 10                  # 10
  00002:  getgname "x"             # 10 x
  00007:  add                      # (10 + x)


It also prints the information about jump target,
like the following for for `y =  x ? 10 : 20;`:
  00000:  bindgname "y"            # global
  00005:  getgname "x"             # global x
  00010:  ifeq 23 (+13)            # global
  00015:  jumptarget               # global
  00016:  int8 10                  # global 10
  00018:  goto 26 (+8)             # global 10

  # from ifeq @ 00010
  00023:  jumptarget               # global
  00024:  int8 20                  # global 20

  # from goto @ 00018
  00026:  jumptarget               # global 10
  00027:  setgname "y"             # 10
  00032:  pop                      #


It calculates the stack for each opcode, from PC 0, by following the execution (jump, branch, try-catch-finally).
Stack is calculated like following ways (maybe too simplified):
  1. let next_list be an list with (PC=0, stack=empty)
  2. while next_list is not empty:
    a. pick an item from next_list
    b. apply Stack uses/defs to stack, for opcode pointed by PC
    c. remember the stack for PC
    d. for each possible jump target of current opcode:
      i. if the jump target is not yet checked:
        1. append (PC=jump target, stack=clone of current stack) to next_list

When applying Stack uses/defs, it tries to keep name, like:
  int8 10    # 10
  int8 20    # 10 20
  add        # (10 + 20)
for the following Stack uses/defs for JSOP_ADD:
>     *   Stack: lval, rval => (lval + rval)

This is done by consuming stack 10, 20 by assigning them to lval, rval, in operands_to_vars function,
and pushing "(lval + rval)" with lval and rval replaced with the value, in expand_expr function.

Also, it uses the atom from operands, like:
  00005:  string "b"               # x "b"
for the following stack uses/defs for JSOP_STRING:
>     *   Operands: uint32_t atomIndex
>     *   Stack: => atom
by associating "SOMETHINGIndex" to "SOMETHING"

when the Stack uses/defs are variadic, or needs special handling for readability,
it's expanded to simple uses/defs in expand_stack function.


[make_opcode_doc.py]

separated code that parses Opcodes.h into opcode.py,
and also made it parse more fields (display_name, image),
and convert operands, stack_uses, stack_defs into an array (operands_array, stack_uses_array, stack_defs_array),
and also it now returns the list of opcodes.
Assignee: nobody → arai.unmht
Attachment #8836453 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8838642 - Flags: review?(luke)
Do you suppose you could get the person who reviewed the original Opcodes.h docs to review this patch as well?  I don't feel I have quite enough context.
Comment on attachment 8838642 [details] [diff] [review]
Add js/src/vm/show_stack.py to show stack transition for given bytecode.

forwarding to Waldo.
feel free to redirect.
Attachment #8838642 - Flags: review?(luke) → review?(jwalden+bmo)
Currently we have the BytecodeParser, which for at least generating error messages for JS expressions.  The bytecode parser reconstruct the stacks with the offset from which the expression comes from, and has special tags [1] for merging branches.  The benefit of re-using the BytecodeParser is to spot error messages issues at the same time.

Do you think it would be possible to output this stack-data as part of the dis() output?

In which case you could convert the current script in a style-check script which validates the consistency between the Macro and the comment.

[1] http://searchfox.org/mozilla-central/source/js/src/jsopcode.cpp#318-319
Flags: needinfo?(arai.unmht)
Comment on attachment 8838642 [details] [diff] [review]
Add js/src/vm/show_stack.py to show stack transition for given bytecode.

(In reply to Nicolas B. Pierron [:nbp] from comment #13)
> Currently we have the BytecodeParser, which for at least generating error
> messages for JS expressions.  The bytecode parser reconstruct the stacks
> with the offset from which the expression comes from, and has special tags
> [1] for merging branches.  The benefit of re-using the BytecodeParser is to
> spot error messages issues at the same time.
> 
> Do you think it would be possible to output this stack-data as part of the
> dis() output?
> 
> In which case you could convert the current script in a style-check script
> which validates the consistency between the Macro and the comment.
> 
> [1] http://searchfox.org/mozilla-central/source/js/src/jsopcode.cpp#318-319

thanks, looks like I can build this on BytecodeParser+ExpressionDecompiler.
I'll rewrite it.
Flags: needinfo?(arai.unmht)
Attachment #8838642 - Flags: review?(jwalden+bmo)
Going back to original plan.
it's almost working.
now it's time to optimize some more.
Summary: Add a script to print stack transition in dis() function output. → Print stack transition in dis() function output.
when opcode defined 2 values, BytecodeParser doesn't trace which value it was.
so, the stack transition would have to say "# (A or B) (A or B)" or something like that.
or we need to modify BytecodeParser to track the index of ndefs.
by using struct for Bytecode.offsetStack array element type,
we can add the index for ndefs to track when an opcode defines 2 values.

then, the struct will become 48-64 bits (iiuc).
so we have another bits available.

now I'm thinking about making UnknownOffset and IgnoreOffset an annotation, for
given offset/index,
and display the possible value in stack dump, like following:

js> dis(() => (x && y)())
flags: LAMBDA EXPRESSION_CLOSURE ARROW
loc     op
-----   --
main:
00000:  getgname "x"                    # x
00005:  and 17 (+12)                    # x
00010:  jumptarget                      # x
00011:  pop                             # 
00012:  getgname "y"                    # y

# from and @ 00005
00017:  jumptarget                      # merged<x>
00018:  undefined                       # merged<x> undefined
00019:  call 0                          # merged<x>(...)
00022:  return                          # 
00023:  retrval                         # !!! UNREACHABLE !!!

for UnknownOffset (the merge case), we don't have enough space to store 2 or more offsets/indices,
(unless storing them in separated space and pointing the index there or something similar)
so the above example shows the 1st item of the merge.
it would be better than just showing "<merged>" in stack dump.
maybe displaying all possibilities for the merge case is just overkill.
since the stack dump will become too long (or spans into multiple lines)
enumerating all possibilities will result in recursion while decompiling expression (if we want to test expression decompiler as well there)
I'll go with just using the first item, not overwriting with joining newer value.
wow, just noticed that we're emitting JSOP_RETRVAL for arrow function with expression body, that's always unreachable :P
See Also: → 1341943
Before applying the main change, I'd like to improve the decompiler output.

Currently unary operator is added outside of parens, like,
!(foo)
+ (foo)

but the operator should be inside the parens, like
(!foo)
(+ foo) // maybe (+foo) ?

also, "void" had no parens.
so also added enclosing parens.
Attachment #8836451 - Attachment is obsolete: true
Attachment #8838642 - Attachment is obsolete: true
Attachment #8840324 - Flags: review?(nicolas.b.pierron)
Also, supported several more opcodes.

  * JSOP_DELNAME
  * JSOP_DELPROP
  * JSOP_DELELEM
  * JSOP_STRICTDELPROP
  * JSOP_STRICTDELELEM
    (delete EXPR)

  * JSOP_GETXPROP
    OBJ.PROP or OBJ[PROP]

  * JSOP_FUNAPPLY
    FUNC(...)    // FUNC here contains ".apply"

  * JSOP_SUPERCALL
  * JSOP_SPREADSUPERCALL
    super(...)

  * JSOP_SUPERFUN
    super

  * JSOP_EVAL
  * JSOP_SPREADEVAL
  * JSOP_STRICTEVAL
  * JSOP_STRICTSPREADEVAL
    eval(...)

  * JSOP_NEW
  * JSOP_SPREADNEW
    new FUNC(...)

  * JSOP_TYPEOF
  * JSOP_TYPEOFEXPR
    (typeof EXPR)
Attachment #8840330 - Flags: review?(nicolas.b.pierron)
there are several cases that opcode keeps the value on the stack, with or without removing other values.
those cases should be reflected more to BytecodeParser::simulateOp.

JSOP_CHECKISOBJ was handled in ExpressionDecompiler::decompilePC but it should be handled in BytecodeParser::simulateOp.
Attachment #8840331 - Flags: review?(nicolas.b.pierron)
offsetStack of BytecodeParser::simulateOp should always be non-null.
so the `if (offsetStack)` is not necessary.
Attachment #8840333 - Flags: review?(nicolas.b.pierron)
and here's the main change.

  * in dis() output display the following 2 information
    * stack dump for each opcode, that shows the stack state after the opcode
      just like stack comment in BytecodeEmitter
      each stack value is shown as expression decompiler output
      (with dedicated mode that displays more internal state)
      if the value comes from 2 or more paths, it's displayed like
        merged<EXPR>
      where EXPR is the first item that reaches the opcode

      also, when the value is marked as "ignore", it's displayed like
        ignored<EXPR>

    * if the opcode is jumped from different place, the opcode, offset, and
      the reason of the jump

  * changed the element type of BytecodeParser::Bytecode.offsetStack to `struct OffsetAndDefIndex`.
    * track the index inside `ndefs` (defIndex), to track the case the opcode defines 2 values
    * store ignored case and merged case separately from offset, to use the offset in stack dump

  * Added some properties to Bytecode class, only in DEBUG build that supports dis()
    * stackDepthAfter/offsetStackAfter
      captures the stack after the opcode,
      (just like stackDepth/offsetStack that captures the stack before the opcode)
      stack dump should show the state after the opcode
    * jumpOrigins
      holds the offsets of opcode that jumps to this opcode, with the reason of the jump
      to annotate the opcode in dis() output
      since this is variable length array, I used Vector,
      and Bytecode ctor should receive LifoAlloc in DEBUG mode

  * added dedicated stack dump mode to BytecodeParser and ExpressionDecompiler
    that calculates extra information and also exposes internal state
      BytecodeParser
        * simulate more opcodes (BytecodeParser::simulateOp) to make stack transition clearer
        * records jumps to bytecode (Bytecode.jumpOrigins)
        * capture the stack after each opcode (Bytecode.offsetStackAfter)

      ExpressionDecompiler
        * do not bother with self-hosted code
          (that tries to display argument name in normal mode)
        * display more clearer name for some opcodes, like
          JSOP_NEWARRAY returns "ARRAY" instead of "[]"
        * handle merged/ignored cases and generate merged<EXPR> and ignored<EXPR> output


there's one place that I don't understand.
in FindStartPC, the else branch for the following code looks like not used by any caller.
I added MOZ_CRASH there and triggered try run, but no test hit it.

>         // If the current PC has fewer values on the stack than the index we are
>         // looking for, the blamed value must be one pushed by the current
>         // bytecode, so restore *valuepc.
>         if (index < size_t(parser.stackDepthAtPC(current)))
>             *valuepc = parser.pcForStackOperand(current, index);
>         else
>              *valuepc = current;

can we just remove the else branch and assert the condition?
Attachment #8840342 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8840324 [details] [diff] [review]
Part 1: Put unary operator inside parens in expression decompilation.

Review of attachment 8840324 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/basic/expression-autopsy.js
@@ +89,5 @@
>  
> +check("o[(!o)]");
> +check("o[(~o)]");
> +check("o[(+ o)]");
> +check("o[(- o)]");

nit: for sanity reasons, can you add:

  check("o[(- (o + 1))]");
Attachment #8840324 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8840330 [details] [diff] [review]
Part 2: Decompile more opcodes.

Review of attachment 8840330 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is correct r=me with test cases for these new values, when possible.

If you think the following nits are worth following feel free to apply them as well:

::: js/src/jsopcode.cpp
@@ +1257,5 @@
>        case JSOP_GETGNAME:
>        case JSOP_GETNAME:
>        case JSOP_GETINTRINSIC:
> +        if (op == JSOP_DELNAME) {
> +            if (!write("(delete "))

nit: I would prefer if this logic is easier to follow, such as:

  case JSOP_DELNAME:
    return write("( delete") &&
           write(loadAtom(pc) &&
           write(")");

@@ +1334,3 @@
>          }
> +        if (op == JSOP_DELPROP || op == JSOP_STRICTDELPROP) {
> +            if (!write(")"))

nit: What do you think of:

   bool hasDelete = op == JSOP_DELPROP || op == JSOP_STRICTDELPROP;
   return (hasDelete ? write("(delete") : true) &&
          decompilePCForStackOperand(pc, -1) &&
          (IsIdentified(prop)
           ? write(".") && quote(prop, '\0')
           : write("[") && quote(prop, '\'') && write("]")) &&
          (hasDelete ? write(")") : true);

@@ +1370,5 @@
> +        {
> +            return false;
> +        }
> +        if (op == JSOP_DELELEM || op == JSOP_STRICTDELELEM) {
> +            if (!write(")"))

nit: same here.
Attachment #8840330 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 8840331 [details] [diff] [review]
Part 3: Reflect the case that stack values are kept instead of newly pushed in BytecodeParser::simulateOp.

Review of attachment 8840331 [details] [diff] [review]:
-----------------------------------------------------------------

Nice catch!

Do you think we should remove the default case, and force any new JSOP_Opcode to be defined here as well?
Attachment #8840331 - Flags: review?(nicolas.b.pierron) → review+
Attachment #8840333 - Flags: review?(nicolas.b.pierron) → review+
(In reply to Nicolas B. Pierron [:nbp] from comment #28)
> Do you think we should remove the default case, and force any new
> JSOP_Opcode to be defined here as well?

To answer my-self, no because some of these instruction are mutating the objects which are given to them, and this could change the way these are display in error messages.
Comment on attachment 8840342 [details] [diff] [review]
Part 5: Print stack transition in dis() function output.

Review of attachment 8840342 [details] [diff] [review]:
-----------------------------------------------------------------

Really nice work!
This would help everybody working on the front-end for sure, while helping maintaining good error messages.

::: js/src/jsopcode.cpp
@@ +269,5 @@
> +
> +    enum : uint8_t {
> +        Normal = 0,
> +
> +        // Ignored this value in while expression decompilation.

nit: in while -> in the

@@ +364,4 @@
>      class Bytecode
>      {
>        public:
> +        Bytecode(BYTECODE_CTOR_PARAMETER)

nit: Adding an unused argument should be fine, especially since it is likely to be inlined anyway.  Add an "explicit" keyword.

@@ +1406,5 @@
> +            uint32_t depth = parser->stackDepthAfterPC(pc);
> +
> +            for (uint32_t i = 0; i < depth; i++) {
> +                if (i) {
> +                    if (sp->put(" ") < 0)

nit: Spaces are already part of some de-compiled expressions, would it makes sense to use ", ", or "; " instead ?

@@ +1414,5 @@
> +                const OffsetAndDefIndex& offsetAndDefIndex
> +                    = parser->offsetForStackOperandAfterPC(script->pcToOffset(pc), i);
> +                // This will decompile the stack for the same PC many times.
> +                // We'll avoid optimizing it since this is a testing function
> +                // and it won't worth managing cached expression here.

nit: it won't *be* worth

@@ +1782,5 @@
> +#ifdef DEBUG
> +            // For stack dump, argument name is not necessary.
> +            && !isStackDump
> +#endif /* DEBUG */
> +            ) {

nit: put the opening brace on a new line.

@@ +1929,5 @@
>                 write("(...)");
>        case JSOP_NEWARRAY:
> +#ifdef DEBUG
> +        if (isStackDump)
> +            return write("ARRAY");

nit: write("[...]"); ?
Also add a reference to the simulateOp function.

@@ +1983,5 @@
> +    if (isStackDump) {
> +        // Special decompilation for stack dump.
> +        switch (op) {
> +          case JSOP_ARGUMENTS:
> +            return write("ARGUMENTS");

nit: arguments

@@ +2091,5 @@
> +          case JSOP_RESUME:
> +            return write("RVAL");
> +
> +          case JSOP_SPREADCALLARRAY:
> +            return write("ARRAY");

nit: [...] as well?
Attachment #8840342 - Flags: review?(nicolas.b.pierron) → review+
(In reply to Tooru Fujisawa [:arai] from comment #25)
> >         // If the current PC has fewer values on the stack than the index we are
> >         // looking for, the blamed value must be one pushed by the current
> >         // bytecode, so restore *valuepc.
> >         if (index < size_t(parser.stackDepthAtPC(current)))
> >             *valuepc = parser.pcForStackOperand(current, index);
> >         else
> >              *valuepc = current;
> 
> can we just remove the else branch and assert the condition?

Tracking this back through history highlight the following comment:
http://searchfox.org/mozilla-central/rev/5eef25c0c7e49baccf1aed118d5553e99e6b80f7/js/src/jsopcode.cpp#5811

I do not recall seeing temporary stack slots for JSOP_Opcode for a while, so maybe an assertion would be fine.
Thank you for reviewing :D

(In reply to Nicolas B. Pierron [:nbp] from comment #30)
> @@ +1406,5 @@
> > +            uint32_t depth = parser->stackDepthAfterPC(pc);
> > +
> > +            for (uint32_t i = 0; i < depth; i++) {
> > +                if (i) {
> > +                    if (sp->put(" ") < 0)
> 
> nit: Spaces are already part of some de-compiled expressions, would it makes
> sense to use ", ", or "; " instead ?

all spaces are now inside parens (otherwise the expression doesn't interact well with enclosing expression, and it's a bug in decompiler), so it should be clear enough.
and also I'd like to use the same format as stack comment in BytecodeEmitter.
changed JSOP_INITELEM_ARRAY and JSOP_INITELEM_INC to generate "[...]".
Attachment #8840611 - Flags: review?(nicolas.b.pierron)
Attachment #8840611 - Flags: review?(nicolas.b.pierron) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/bcb3bb41c374eaa64fd64d651148b59b70acf377
Bug 1322019 - Part 1: Put unary operator inside parens in expression decompilation. r=nbp

https://hg.mozilla.org/integration/mozilla-inbound/rev/20cd9a2ede17439c9a8c87fc7c6542ff93945a24
Bug 1322019 - Part 2: Decompile more opcodes. r=nbp

https://hg.mozilla.org/integration/mozilla-inbound/rev/c905752969fb007959730ae170449dc1e97cd62d
Bug 1322019 - Part 3: Reflect the case that stack values are kept instead of newly pushed in BytecodeParser::simulateOp. r=nbp

https://hg.mozilla.org/integration/mozilla-inbound/rev/445ad080f55cd1f6a3dd99a6656645e97f45b748
Bug 1322019 - Part 4: Remove unnecessary if in BytecodeParser::simulateOp. r=nbp

https://hg.mozilla.org/integration/mozilla-inbound/rev/2c034824894414f8c69f124588b6d9e4b655c291
Bug 1322019 - Part 5: Print stack transition in dis() function output. r=nbp

https://hg.mozilla.org/integration/mozilla-inbound/rev/5e464ee0fb565d7aa21536fac6a880a8c3264db0
Bug 1322019 - Part 6: Decompole NEWARRAY+INITELEM_ARRAY/INITELEM_INC to [...]. r=nbp
https://hg.mozilla.org/integration/mozilla-inbound/rev/6f3c80a94aabb00c6dbe71490af990b7b0c28d9c
Bug 1322019 - followup: Change a testcase for decompilation to follow the change. r=bustage
Depends on: 1343245
Depends on: 1375446
You need to log in before you can comment on or make changes to this bug.