Last Comment Bug 767274 - write a very simple expression-only decompiler for js_DecompileValueGenerator
: write a very simple expression-only decompiler for js_DecompileValueGenerator
Status: RESOLVED FIXED
[js:t]
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- enhancement (vote)
: mozilla17
Assigned To: :Benjamin Peterson
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on: 777181 781071
Blocks: 718969
  Show dependency treegraph
 
Reported: 2012-06-21 23:10 PDT by :Benjamin Peterson
Modified: 2012-08-31 18:47 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
first pass (18.48 KB, patch)
2012-06-25 19:17 PDT, :Benjamin Peterson
no flags Details | Diff | Splinter Review
add "expression autopsy" (25.63 KB, patch)
2012-07-10 10:59 PDT, :Benjamin Peterson
no flags Details | Diff | Splinter Review
add "expression autopsy" (26.75 KB, patch)
2012-07-13 11:28 PDT, :Benjamin Peterson
no flags Details | Diff | Splinter Review
add "expression autopsy" (26.73 KB, patch)
2012-07-18 12:11 PDT, :Benjamin Peterson
no flags Details | Diff | Splinter Review
error decompiler (38.10 KB, patch)
2012-07-23 21:41 PDT, :Benjamin Peterson
no flags Details | Diff | Splinter Review
error decompiler (38.09 KB, patch)
2012-07-23 21:47 PDT, :Benjamin Peterson
gary: feedback-
Details | Diff | Splinter Review
error decompiler (38.37 KB, patch)
2012-07-24 00:37 PDT, :Benjamin Peterson
luke: review+
choller: feedback-
Details | Diff | Splinter Review
error decompiler (38.66 KB, patch)
2012-07-24 14:14 PDT, :Benjamin Peterson
choller: feedback-
Details | Diff | Splinter Review

Description :Benjamin Peterson 2012-06-21 23:10:27 PDT
This will replace the last usage of the decompiler. (For Function.prototype.toString(), see bug 761723).
Comment 1 :Benjamin Peterson 2012-06-25 19:17:08 PDT
Created attachment 636573 [details] [diff] [review]
first pass

Here's an initial implementation. It supports
- variables (there are lots of cases to handle here...)
- various atom things: true/false/undefined/null
- strings and numbers
- property and element access

I don't know how much more needs to be added for this to be sufficient replacement for the decompiler. Thoughts are appreciated.

I would also like guidance on what to do when the analyzer doesn't know what to do with an expression. At the moment "(intermediate value)" is returned. So if you have an expression like "(a + b).prop", the analyzer gives you "(intermediate value).prop". Maybe we should just fallback if we can't handle any part of the expression.
Comment 2 :Benjamin Peterson 2012-07-10 10:59:06 PDT
Created attachment 640684 [details] [diff] [review]
add "expression autopsy"

A few fixes over the last one. I also "implemented" calls. They show up with a "..." in the parameter list.
Comment 3 Luke Wagner [:luke] 2012-07-12 13:02:45 PDT
Comment on attachment 640684 [details] [diff] [review]
add "expression autopsy"

Review of attachment 640684 [details] [diff] [review]:
-----------------------------------------------------------------

This is looking great, nice work!  I had some high-level code questions (interleaved with small nits) to ask about first:

::: js/src/jit-test/tests/basic/expression-autopsy.js
@@ +34,5 @@
> +            Function("o", "undef", "function myfunc() { return o + undef; }\n" + statement),
> +            // Let block
> +            Function("let (o, undef) { " + statement + " }"),
> +            // Aliased let block
> +            Function("o", "undef", "let (o, undef) { " + statement + " }"),

This should produce shadowing, but not aliasing.  To alias, you'll probably want a new test-case that touches o and undef from a nested closure.

::: js/src/jsopcode.cpp
@@ +5729,5 @@
> +    bool quote(JSString *s, uint32_t quote);
> +};
> +
> +ptrdiff_t
> +ExpressionAutopsy::considerPC(jsbytecode *pc)

I may be missing something, but could the return value be a bool?  That is, it seems like the outermost considerPC call would return a pointer to the beginning of the sprint buffer and all nested calls are just being used to test for failure.

@@ +5760,5 @@
> +        unsigned slot = GET_ARGNO(pc);
> +        JSAtom *atom = getArgOrVar(slot);
> +        if (!atom)
> +            // Destructuring
> +            break;

multi-line branch needs { }, so either brace yourself or, my preference, put the comment on the same line as break.

@@ +5773,5 @@
> +            if (!atom) {
> +                // Destructuring
> +                i -= script->nfixed;
> +                JS_ASSERT(i <= unsigned(pcdepth));
> +                return considerPC(pcstack[i]);

Does this case really need to be handled (instead of just returning '(intermediate value)')?  (How would the value be consumed?)

@@ +5791,5 @@
> +      case JSOP_LENGTH:
> +      case JSOP_GETPROP:
> +      case JSOP_CALLPROP: {
> +        JSAtom *prop = (op == JSOP_LENGTH) ?
> +            cx->runtime->atomState.lengthAtom : loadAtom(pc);

I think this fits on 1 line.  If not, the style is:
blah = condition
       ? then
       : else

@@ +5966,5 @@
> +        JS_ASSERT(spindex < 0);
> +        pcdepth += spindex;
> +        if (pcdepth < 0)
> +            return false;
> +        valuepc = pcstack[pcdepth];

IIUC, valuepc is initialized with the current pc and then maybe gets fixed up here.  Instead of this implicit in/out parameter, could you change the signature of initPC to be:

  bool initPC(int spindex, Value v, jsbytecode *pcCurrent, jsbytecode **pcToExamine)

Also, since 'valuepc' isn't accessed in considerPC at all, it seems like 'valuepc' shouldn't be a member variable at all.

More generally, it seems that examineExpressions/initPC could be non-member functions.  I think this would be good b/c, as is, ExpressionAutopsy is in one of two states: initializing and examining.  It would be good if ExpressionAutopsy only represented the latter and the former used purely function-local state.

@@ +6024,5 @@
>  char *
>  js_DecompileValueGenerator(JSContext *cx, int spindex, jsval v,
>                             JSString *fallback)
>  {
> +    ExpressionAutopsy eas(cx);

s/eas/ea/ ?

@@ +6026,5 @@
>                             JSString *fallback)
>  {
> +    ExpressionAutopsy eas(cx);
> +    char *result = eas.examineExpression(spindex, v);
> +    if (result) {

How about having examineExpression take a JSAutoByteString* as an outparam and giving it a bool success return value?  'false' would mean "oom, so, an error" and if JSAutoByteString::ptr() == NULL would mean "wasn't able to decompile".  This would also simplify the error-handling logic in examineExpressions so that it could just return immediately on error w/o having to set 'oom'.

@@ +6028,5 @@
> +    ExpressionAutopsy eas(cx);
> +    char *result = eas.examineExpression(spindex, v);
> +    if (result) {
> +        if (strlen(result) != strlen("(intermediate value)") ||
> +            strcmp(result, "(intermediate value)"))

Could you remove the strlen-comparing disjunct?
Comment 4 :Benjamin Peterson 2012-07-13 11:27:58 PDT
(In reply to Luke Wagner [:luke] from comment #3)
> Comment on attachment 640684 [details] [diff] [review]
> add "expression autopsy"
> 
> Review of attachment 640684 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jsopcode.cpp
> @@ +5729,5 @@
> > +    bool quote(JSString *s, uint32_t quote);
> > +};
> > +
> > +ptrdiff_t
> > +ExpressionAutopsy::considerPC(jsbytecode *pc)
> 
> I may be missing something, but could the return value be a bool?  That is,
> it seems like the outermost considerPC call would return a pointer to the
> beginning of the sprint buffer and all nested calls are just being used to
> test for failure.

Yeah, I originally made it complicated with this offset stuff because I thought I might need to do that icky decompiler thing of passing around offsets. Apparently, not! :)

> @@ +5773,5 @@
> > +            if (!atom) {
> > +                // Destructuring
> > +                i -= script->nfixed;
> > +                JS_ASSERT(i <= unsigned(pcdepth));
> > +                return considerPC(pcstack[i]);
> 
> Does this case really need to be handled (instead of just returning
> '(intermediate value)')?  (How would the value be consumed?)

There was a jsreftest that triggered a need for it. Specifically this beauty:

var [a, b, [c0, c1]] = [x, x, x];

where x is null.
Comment 5 :Benjamin Peterson 2012-07-13 11:28:51 PDT
Created attachment 641973 [details] [diff] [review]
add "expression autopsy"

Address review comments. Also support for binary and unary operations.
Comment 6 Luke Wagner [:luke] 2012-07-16 09:32:34 PDT
(In reply to Benjamin Peterson from comment #4)
> > Does this case really need to be handled (instead of just returning
> > '(intermediate value)')?  (How would the value be consumed?)
> 
> There was a jsreftest that triggered a need for it. Specifically this beauty:

I wasn't asking if it was *possible*, just whether it was worth handling :)
Comment 7 Luke Wagner [:luke] 2012-07-16 09:33:25 PDT
(and by "handling", I mean "doing something better than (intermediate value)" not "not crashing" :)
Comment 8 :Benjamin Peterson 2012-07-16 09:39:05 PDT
(In reply to Luke Wagner [:luke] from comment #6)
> (In reply to Benjamin Peterson from comment #4)
> > > Does this case really need to be handled (instead of just returning
> > > '(intermediate value)')?  (How would the value be consumed?)
> > 
> > There was a jsreftest that triggered a need for it. Specifically this beauty:
> 
> I wasn't asking if it was *possible*, just whether it was worth handling :)

It's 3 lines of code to do so, so why not?
Comment 9 :Benjamin Peterson 2012-07-18 12:11:20 PDT
Created attachment 643495 [details] [diff] [review]
add "expression autopsy"

Rebased on BindingNames -> BindingVector change.
Comment 10 Luke Wagner [:luke] 2012-07-20 15:28:49 PDT
Comment on attachment 643495 [details] [diff] [review]
add "expression autopsy"

Review of attachment 643495 [details] [diff] [review]:
-----------------------------------------------------------------

Looking great, almost done!

::: js/src/jsopcode.cpp
@@ +5704,5 @@
>  
>      return JS_TRUE;
>  }
>  
> +struct ExpressionAutopsy

Now that you are A+ #1 expert on decompilation, could you put a little paragraph describing the algorithm?  Like, the basic strategy of:
 - ReconstructPCStack
 - switch on op, recurse on lhs, sprint op, recurse on rhs
and how this essentially gives us an inorder dfs traversal of the expression tree on an otherwise linear bytecode.

@@ +5708,5 @@
> +struct ExpressionAutopsy
> +{
> +    JSContext *cx;
> +    StackFrame *fp;
> +    jsbytecode *valuepc;

rm valuepc

@@ +5746,5 @@
> +    
> +    AutoReleasePtr af(cx, pcstack);
> +    int pcdepth = ReconstructPCStack(cx, script, pc, pcstack);
> +    if (pcdepth < 0)
> +        return write("(intermediate value)");

this ReconstructPCStack bit is duplicated here and in FindStartPC; could you factor out some RAII guard that lets you write:

  ReconstructedPCStack pcstack;
  if (!pcstack.init(cx, script))
      return false;

(with a fancy operator[] that bounds checks 'n everything!)

but what about the pcdepth<0 case?  I was curious how this arose and tbh, I don't see how it does: ReconstructPCStack and SimulateOp both return unsigned values casted to ints.  I'm going to tentatively blame this on imacros.  If I haven't overlooked something, could you make ReconstructPCStack infallible?

@@ +5754,5 @@
> +    // None of these stack-writing ops generates novel values.
> +    JS_ASSERT(op != JSOP_CASE && op != JSOP_DUP && op != JSOP_DUP2);
> +
> +    const char *token = CodeToken[op];
> +    if (token) {

if (const char *token = ...)

Also, could you add a comment to the effect "Handle generic unary and binary expressions"?

@@ +5761,5 @@
> +            jssrcnote *sn = js_GetSrcNote(script, pc);
> +            if (!sn || SN_TYPE(sn) != SRC_ASSIGNOP)
> +                return write("(") && considerPC(pcstack[pcdepth - 2]) &&
> +                    write(" ") && write(token) && write(" ") &&
> +                    considerPC(pcstack[pcdepth - 1]) && write(")");

could you write this as a vertical aligned column:

  return write("(") &&
         decompilePC(...) &&
         write(" ") &&
         etc

?

@@ +5794,5 @@
> +            atom = findLetVar(i);
> +            if (!atom) {
> +                // Destructuring
> +                i -= script->nfixed;
> +                JS_ASSERT(i <= unsigned(pcdepth));

won't need assert with fancy operator[]-having pcstack

@@ +5798,5 @@
> +                JS_ASSERT(i <= unsigned(pcdepth));
> +                return considerPC(pcstack[i]);
> +            }
> +        } else {
> +            atom = getArgOrVar(fun->nargs + i);

Could you instead have getArg() and getVar() so that the nargs+i is inside getVar (w/ a comment explaining the layout of localnames).

@@ +5824,5 @@
> +      }
> +      case JSOP_GETELEM:
> +      case JSOP_CALLELEM:
> +        return considerPC(pcstack[pcdepth - 2]) && write("[") &&
> +            considerPC(pcstack[pcdepth - 1]) && write("]");

vertical aligned column please

@@ +5927,5 @@
> +JSAtom *
> +ExpressionAutopsy::findLetVar(unsigned i)
> +{
> +    if (script->hasObjects()) {
> +        for (jsatomid j = 0, n = script->objects()->length; j != n; j++) {

It's unfortunate that we have to do this for let vars.  I guess the reason it works in the old decompiler is that we simulate block push/pop (at enormous complexity).  This has come up a few other times (that's why aliased var embeds the innermost blockid as an immediate).  One of these days we should just write a reusable function GetBlockChainAtPC.  Short of that, we could enumerate all blocks and concatenate the results with " or maybe ", so if you have:
  let (x) { x.foo() }
  let (y) { y.bar() }
and y.bar() has the error we'd print "(x or maybe y).bar()".  Maybe that's silly.  We'll see if anyone notices first :)

@@ +5932,5 @@
> +            JSObject *obj = script->getObject(j);
> +            if (obj->isBlock()) {
> +                uint32_t depth = obj->asBlock().stackDepth();
> +                uint32_t count = obj->asBlock().slotCount();
> +                if (uint32_t(i - depth) < uint32_t(count)) {

i is an index off of fp->slots() (so it includes script->nfixed) whereas obj->asBlock().stackDepth() is an index off of fp->base() (so it does not include script->nfixed).  (The i -= nfixed bit in the old decompiler is hidden in IsVarSlot.)

Could you add a test (presumably with nfixed > 0) which hits this bug?

@@ +5993,5 @@
> +
> +    if (spindex == JSDVG_SEARCH_STACK) {
> +        // We search from fp->sp to base to find the most recently calculated
> +        // value matching v under assumption that it is it that caused
> +        // exception, see bug 328664.

I'd leave out the "see bug x" since the comment seems to stand on its own w/o it.

@@ +6041,5 @@
> +
> +    ExpressionAutopsy ea(cx, script, fun);
> +    if (!ea.init())
> +        return false;
> +    if (!ea.considerPC(valuepc))

Lastly and lamely, I'm afraid I think it would be best for this patch to stick to some more direct names:
  s/ExamineExpression/DecompileExpression/
  s/ExpressionAutopsy/ExpressionDecompiler/
  s/considerPC/decompilePC/
Comment 11 :Benjamin Peterson 2012-07-23 21:19:04 PDT
(In reply to Luke Wagner [:luke] from comment #10)
> but what about the pcdepth<0 case?  I was curious how this arose and tbh, I
> don't see how it does: ReconstructPCStack and SimulateOp both return
> unsigned values casted to ints.  I'm going to tentatively blame this on
> imacros.  If I haven't overlooked something, could you make
> ReconstructPCStack infallible?

I think the idea was some LOCAL_ASSERT in ReconstructPCStack could cause it to fail.

> Lastly and lamely, I'm afraid I think it would be best for this patch to
> stick to some more direct names:
>   s/ExamineExpression/DecompileExpression/
>   s/ExpressionAutopsy/ExpressionDecompiler/
>   s/considerPC/decompilePC/

Pity. I was hoping to someday do

$ grep -iR js/src decompile
$
Comment 12 :Benjamin Peterson 2012-07-23 21:41:22 PDT
Created attachment 645187 [details] [diff] [review]
error decompiler

Here's a version addressing review comments.

I added PCStack. Tell me if you think its indexing is too weird.

I dredged GetBlockChainAtPC out of the trash heap to implement let variables correctly. I kept it separate from the rest of decompiler because Luke said it might be useful elsewhere.

Fuzzers: Can you break this a bit?
Comment 13 :Benjamin Peterson 2012-07-23 21:47:45 PDT
Created attachment 645188 [details] [diff] [review]
error decompiler

Small fix.
Comment 14 Gary Kwong [:gkw] [:nth10sd] 2012-07-23 21:50:20 PDT
Comment on attachment 645188 [details] [diff] [review]
error decompiler

Setting feedback? from decoder for fuzzer help. I'll also be helping out but heading to BH / Defcon tomorrow.
Comment 15 Gary Kwong [:gkw] [:nth10sd] 2012-07-23 23:00:52 PDT
Comment on attachment 645188 [details] [diff] [review]
error decompiler

try {
    let h
    {
        let w
        t()
    }
} catch (e) {}
let(w) {
    w.s
}

causes Assertion failure: blockChain, in a debug shell (pass testcase in as a CLI argument) and a likely null crash [@ ExpressionDecompiler::findLetVar] in an opt shell.

feedback- till a new patch.
Comment 16 :Benjamin Peterson 2012-07-24 00:37:26 PDT
Created attachment 645216 [details] [diff] [review]
error decompiler

Fix fuzz bug.
Comment 17 Christian Holler (:decoder) 2012-07-24 08:29:11 PDT
I'm on this now, expecting first results in a few hours :)
Comment 18 Luke Wagner [:luke] 2012-07-24 09:12:57 PDT
Comment on attachment 645216 [details] [diff] [review]
error decompiler

Review of attachment 645216 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!  r+ with small nits (mostly in comments :)

::: js/src/jsopcode.cpp
@@ +5791,5 @@
> +{
> +    if (i < 0) {
> +        i += depth_;
> +        JS_ASSERT(i >= 0);
> +    }

hah, I like it

@@ +5805,5 @@
> + *
> + * Here's the basic algorithm:
> + *
> + * 1. Find the stack location of the value whose expression we wish to
> + * decompile. The error handler can explicitly give this. Otherwise, we search

s/error handler/caller/, s/give this/pass this as an argument/.

@@ +5806,5 @@
> + * Here's the basic algorithm:
> + *
> + * 1. Find the stack location of the value whose expression we wish to
> + * decompile. The error handler can explicitly give this. Otherwise, we search
> + * backwards down the stack for it.

s/it/the offending value./

@@ +5816,5 @@
> + * opcode responsible for pushing the value we want to decompile.
> + *
> + * 3. Pass the opcode to decompilePC. decompilePC is the main decompiler
> + * routine, responsible for a string representation of the expression that
> + * generated a certain stack location. It looks at the opcode at the pc passed

s/at the pc/of the pc/

@@ +5817,5 @@
> + *
> + * 3. Pass the opcode to decompilePC. decompilePC is the main decompiler
> + * routine, responsible for a string representation of the expression that
> + * generated a certain stack location. It looks at the opcode at the pc passed
> + * to it and returns the JS source equivalent of it.

to much "it" in this sentence

@@ +5821,5 @@
> + * to it and returns the JS source equivalent of it.
> + *
> + * 4. Expressions can, of course, contain subexpressions. For example, the
> + * literals "4" and "5" are subexpressions of the addition operator in "4 +
> + * 5". If we need to decompile a subexpression, we call decompilePC

instead of "we call decompilePC" perhaps "we recursively execute step 2 for each of the operands' pcs"?

@@ +5824,5 @@
> + * literals "4" and "5" are subexpressions of the addition operator in "4 +
> + * 5". If we need to decompile a subexpression, we call decompilePC
> + * recursively. The result is a depth-first traversal of the expression tree.
> + *
> + *

Nice comment!  (nix the extra " *\n")

@@ +6052,5 @@
> +}
> +
> +JSAtom *
> +ExpressionDecompiler::findLetVar(jsbytecode *pc, unsigned i)
> +{

Could you s/i/depth/ and JS_ASSERT(depth < StackDepth(script))?

@@ +6059,5 @@
> +        JS_ASSERT(chain);
> +        JS_ASSERT(chain->isBlock());
> +        do {
> +            BlockObject &block = chain->asBlock();
> +            uint32_t depth = block.stackDepth();

Guess that means renaming this to 'blockDepth' (and 'count' to 'blockCount', for symmetry).

@@ +6067,5 @@
> +                    const Shape &shape = r.front();
> +                    if (shape.shortid() == int(i - depth) &&
> +                        // This happens for empty destructuring variables in
> +                        // lets. They can be safely ignored.
> +                        JSID_IS_ATOM(shape.propid()))

Multi-line conditions require braces (and kinda look gross).  Perhaps instead you could give the JSID_IS_ATOM check and comment their own statement inside "if (shape.shortid() == ...) {" ?

@@ +6071,5 @@
> +                        JSID_IS_ATOM(shape.propid()))
> +                        return JSID_TO_ATOM(shape.propid());
> +                }
> +            }
> +        } while ((chain = chain->enclosingScope())->isBlock());

We generally (in new code) try to avoid using the result of an assignment unless it really helps.  For this case, could you just put the assignment on the line before the while?

@@ +6190,3 @@
>      if (!fallback) {
> +        if (v.isUndefined())
> +            // Prevent users from seeing "(void 0)"

Comment before 'if' to avoid multi-line then-branch-requiring-curlies.
Comment 19 Christian Holler (:decoder) 2012-07-24 10:57:01 PDT
Comment on attachment 645216 [details] [diff] [review]
error decompiler

Following test asserts:

var [{ x }] = [null, {}];

with

Assertion failure: chain, at jsopcode.cpp:6059
Comment 20 :Benjamin Peterson 2012-07-24 14:14:39 PDT
Created attachment 645489 [details] [diff] [review]
error decompiler

Fix fuzz bug and review comments.
Comment 21 Christian Holler (:decoder) 2012-07-24 17:04:35 PDT
Comment on attachment 645489 [details] [diff] [review]
error decompiler

Following test asserts:

function gen() {  }
label: {
  for (let i in gen())
        break label;
  function gen2() {
      yield 2;
  }
  for (let i in gen2())
    actual = i(0);
}


Assertion failure: blockChain, at jsopcode.cpp:5740
Comment 22 :Benjamin Peterson 2012-07-24 17:32:38 PDT
(In reply to Christian Holler (:decoder) from comment #21)
> Assertion failure: blockChain, at jsopcode.cpp:5740

This was a preexisting bug that is exposed by this. Bug 777181.
Comment 24 Ryan VanderMeulen [:RyanVM] 2012-08-02 19:08:55 PDT
https://hg.mozilla.org/mozilla-central/rev/7852a8f73313
Comment 25 Ryan VanderMeulen [:RyanVM] 2012-08-31 18:47:19 PDT
https://hg.mozilla.org/mozilla-central/rev/db57e4b747b7

Note You need to log in before you can comment on or make changes to this bug.