Closed Bug 767274 Opened 12 years ago Closed 12 years ago

write a very simple expression-only decompiler for js_DecompileValueGenerator

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla17

People

(Reporter: Benjamin, Assigned: Benjamin)

References

Details

(Whiteboard: [js:t])

Attachments

(1 file, 7 obsolete files)

This will replace the last usage of the decompiler. (For Function.prototype.toString(), see bug 761723).
Whiteboard: [js:t]
Blocks: 718969
Attached patch first pass (obsolete) — Splinter Review
Here's an initial implementation. It supports
- variables (there are lots of cases to handle here...)
- various atom things: true/false/undefined/null
- strings and numbers
- property and element access

I don't know how much more needs to be added for this to be sufficient replacement for the decompiler. Thoughts are appreciated.

I would also like guidance on what to do when the analyzer doesn't know what to do with an expression. At the moment "(intermediate value)" is returned. So if you have an expression like "(a + b).prop", the analyzer gives you "(intermediate value).prop". Maybe we should just fallback if we can't handle any part of the expression.
Attached patch add "expression autopsy" (obsolete) — Splinter Review
A few fixes over the last one. I also "implemented" calls. They show up with a "..." in the parameter list.
Attachment #636573 - Attachment is obsolete: true
Attachment #640684 - Flags: review?(luke)
Comment on attachment 640684 [details] [diff] [review]
add "expression autopsy"

Review of attachment 640684 [details] [diff] [review]:
-----------------------------------------------------------------

This is looking great, nice work!  I had some high-level code questions (interleaved with small nits) to ask about first:

::: js/src/jit-test/tests/basic/expression-autopsy.js
@@ +34,5 @@
> +            Function("o", "undef", "function myfunc() { return o + undef; }\n" + statement),
> +            // Let block
> +            Function("let (o, undef) { " + statement + " }"),
> +            // Aliased let block
> +            Function("o", "undef", "let (o, undef) { " + statement + " }"),

This should produce shadowing, but not aliasing.  To alias, you'll probably want a new test-case that touches o and undef from a nested closure.

::: js/src/jsopcode.cpp
@@ +5729,5 @@
> +    bool quote(JSString *s, uint32_t quote);
> +};
> +
> +ptrdiff_t
> +ExpressionAutopsy::considerPC(jsbytecode *pc)

I may be missing something, but could the return value be a bool?  That is, it seems like the outermost considerPC call would return a pointer to the beginning of the sprint buffer and all nested calls are just being used to test for failure.

@@ +5760,5 @@
> +        unsigned slot = GET_ARGNO(pc);
> +        JSAtom *atom = getArgOrVar(slot);
> +        if (!atom)
> +            // Destructuring
> +            break;

multi-line branch needs { }, so either brace yourself or, my preference, put the comment on the same line as break.

@@ +5773,5 @@
> +            if (!atom) {
> +                // Destructuring
> +                i -= script->nfixed;
> +                JS_ASSERT(i <= unsigned(pcdepth));
> +                return considerPC(pcstack[i]);

Does this case really need to be handled (instead of just returning '(intermediate value)')?  (How would the value be consumed?)

@@ +5791,5 @@
> +      case JSOP_LENGTH:
> +      case JSOP_GETPROP:
> +      case JSOP_CALLPROP: {
> +        JSAtom *prop = (op == JSOP_LENGTH) ?
> +            cx->runtime->atomState.lengthAtom : loadAtom(pc);

I think this fits on 1 line.  If not, the style is:
blah = condition
       ? then
       : else

@@ +5966,5 @@
> +        JS_ASSERT(spindex < 0);
> +        pcdepth += spindex;
> +        if (pcdepth < 0)
> +            return false;
> +        valuepc = pcstack[pcdepth];

IIUC, valuepc is initialized with the current pc and then maybe gets fixed up here.  Instead of this implicit in/out parameter, could you change the signature of initPC to be:

  bool initPC(int spindex, Value v, jsbytecode *pcCurrent, jsbytecode **pcToExamine)

Also, since 'valuepc' isn't accessed in considerPC at all, it seems like 'valuepc' shouldn't be a member variable at all.

More generally, it seems that examineExpressions/initPC could be non-member functions.  I think this would be good b/c, as is, ExpressionAutopsy is in one of two states: initializing and examining.  It would be good if ExpressionAutopsy only represented the latter and the former used purely function-local state.

@@ +6024,5 @@
>  char *
>  js_DecompileValueGenerator(JSContext *cx, int spindex, jsval v,
>                             JSString *fallback)
>  {
> +    ExpressionAutopsy eas(cx);

s/eas/ea/ ?

@@ +6026,5 @@
>                             JSString *fallback)
>  {
> +    ExpressionAutopsy eas(cx);
> +    char *result = eas.examineExpression(spindex, v);
> +    if (result) {

How about having examineExpression take a JSAutoByteString* as an outparam and giving it a bool success return value?  'false' would mean "oom, so, an error" and if JSAutoByteString::ptr() == NULL would mean "wasn't able to decompile".  This would also simplify the error-handling logic in examineExpressions so that it could just return immediately on error w/o having to set 'oom'.

@@ +6028,5 @@
> +    ExpressionAutopsy eas(cx);
> +    char *result = eas.examineExpression(spindex, v);
> +    if (result) {
> +        if (strlen(result) != strlen("(intermediate value)") ||
> +            strcmp(result, "(intermediate value)"))

Could you remove the strlen-comparing disjunct?
Attachment #640684 - Flags: review?(luke)
(In reply to Luke Wagner [:luke] from comment #3)
> Comment on attachment 640684 [details] [diff] [review]
> add "expression autopsy"
> 
> Review of attachment 640684 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jsopcode.cpp
> @@ +5729,5 @@
> > +    bool quote(JSString *s, uint32_t quote);
> > +};
> > +
> > +ptrdiff_t
> > +ExpressionAutopsy::considerPC(jsbytecode *pc)
> 
> I may be missing something, but could the return value be a bool?  That is,
> it seems like the outermost considerPC call would return a pointer to the
> beginning of the sprint buffer and all nested calls are just being used to
> test for failure.

Yeah, I originally made it complicated with this offset stuff because I thought I might need to do that icky decompiler thing of passing around offsets. Apparently, not! :)

> @@ +5773,5 @@
> > +            if (!atom) {
> > +                // Destructuring
> > +                i -= script->nfixed;
> > +                JS_ASSERT(i <= unsigned(pcdepth));
> > +                return considerPC(pcstack[i]);
> 
> Does this case really need to be handled (instead of just returning
> '(intermediate value)')?  (How would the value be consumed?)

There was a jsreftest that triggered a need for it. Specifically this beauty:

var [a, b, [c0, c1]] = [x, x, x];

where x is null.
Attached patch add "expression autopsy" (obsolete) — Splinter Review
Address review comments. Also support for binary and unary operations.
Attachment #640684 - Attachment is obsolete: true
Attachment #641973 - Flags: review?(luke)
(In reply to Benjamin Peterson from comment #4)
> > Does this case really need to be handled (instead of just returning
> > '(intermediate value)')?  (How would the value be consumed?)
> 
> There was a jsreftest that triggered a need for it. Specifically this beauty:

I wasn't asking if it was *possible*, just whether it was worth handling :)
(and by "handling", I mean "doing something better than (intermediate value)" not "not crashing" :)
(In reply to Luke Wagner [:luke] from comment #6)
> (In reply to Benjamin Peterson from comment #4)
> > > Does this case really need to be handled (instead of just returning
> > > '(intermediate value)')?  (How would the value be consumed?)
> > 
> > There was a jsreftest that triggered a need for it. Specifically this beauty:
> 
> I wasn't asking if it was *possible*, just whether it was worth handling :)

It's 3 lines of code to do so, so why not?
OS: Linux → All
Hardware: x86_64 → All
Attached patch add "expression autopsy" (obsolete) — Splinter Review
Rebased on BindingNames -> BindingVector change.
Attachment #641973 - Attachment is obsolete: true
Attachment #641973 - Flags: review?(luke)
Attachment #643495 - Flags: review?(luke)
Comment on attachment 643495 [details] [diff] [review]
add "expression autopsy"

Review of attachment 643495 [details] [diff] [review]:
-----------------------------------------------------------------

Looking great, almost done!

::: js/src/jsopcode.cpp
@@ +5704,5 @@
>  
>      return JS_TRUE;
>  }
>  
> +struct ExpressionAutopsy

Now that you are A+ #1 expert on decompilation, could you put a little paragraph describing the algorithm?  Like, the basic strategy of:
 - ReconstructPCStack
 - switch on op, recurse on lhs, sprint op, recurse on rhs
and how this essentially gives us an inorder dfs traversal of the expression tree on an otherwise linear bytecode.

@@ +5708,5 @@
> +struct ExpressionAutopsy
> +{
> +    JSContext *cx;
> +    StackFrame *fp;
> +    jsbytecode *valuepc;

rm valuepc

@@ +5746,5 @@
> +    
> +    AutoReleasePtr af(cx, pcstack);
> +    int pcdepth = ReconstructPCStack(cx, script, pc, pcstack);
> +    if (pcdepth < 0)
> +        return write("(intermediate value)");

this ReconstructPCStack bit is duplicated here and in FindStartPC; could you factor out some RAII guard that lets you write:

  ReconstructedPCStack pcstack;
  if (!pcstack.init(cx, script))
      return false;

(with a fancy operator[] that bounds checks 'n everything!)

but what about the pcdepth<0 case?  I was curious how this arose and tbh, I don't see how it does: ReconstructPCStack and SimulateOp both return unsigned values casted to ints.  I'm going to tentatively blame this on imacros.  If I haven't overlooked something, could you make ReconstructPCStack infallible?

@@ +5754,5 @@
> +    // None of these stack-writing ops generates novel values.
> +    JS_ASSERT(op != JSOP_CASE && op != JSOP_DUP && op != JSOP_DUP2);
> +
> +    const char *token = CodeToken[op];
> +    if (token) {

if (const char *token = ...)

Also, could you add a comment to the effect "Handle generic unary and binary expressions"?

@@ +5761,5 @@
> +            jssrcnote *sn = js_GetSrcNote(script, pc);
> +            if (!sn || SN_TYPE(sn) != SRC_ASSIGNOP)
> +                return write("(") && considerPC(pcstack[pcdepth - 2]) &&
> +                    write(" ") && write(token) && write(" ") &&
> +                    considerPC(pcstack[pcdepth - 1]) && write(")");

could you write this as a vertical aligned column:

  return write("(") &&
         decompilePC(...) &&
         write(" ") &&
         etc

?

@@ +5794,5 @@
> +            atom = findLetVar(i);
> +            if (!atom) {
> +                // Destructuring
> +                i -= script->nfixed;
> +                JS_ASSERT(i <= unsigned(pcdepth));

won't need assert with fancy operator[]-having pcstack

@@ +5798,5 @@
> +                JS_ASSERT(i <= unsigned(pcdepth));
> +                return considerPC(pcstack[i]);
> +            }
> +        } else {
> +            atom = getArgOrVar(fun->nargs + i);

Could you instead have getArg() and getVar() so that the nargs+i is inside getVar (w/ a comment explaining the layout of localnames).

@@ +5824,5 @@
> +      }
> +      case JSOP_GETELEM:
> +      case JSOP_CALLELEM:
> +        return considerPC(pcstack[pcdepth - 2]) && write("[") &&
> +            considerPC(pcstack[pcdepth - 1]) && write("]");

vertical aligned column please

@@ +5927,5 @@
> +JSAtom *
> +ExpressionAutopsy::findLetVar(unsigned i)
> +{
> +    if (script->hasObjects()) {
> +        for (jsatomid j = 0, n = script->objects()->length; j != n; j++) {

It's unfortunate that we have to do this for let vars.  I guess the reason it works in the old decompiler is that we simulate block push/pop (at enormous complexity).  This has come up a few other times (that's why aliased var embeds the innermost blockid as an immediate).  One of these days we should just write a reusable function GetBlockChainAtPC.  Short of that, we could enumerate all blocks and concatenate the results with " or maybe ", so if you have:
  let (x) { x.foo() }
  let (y) { y.bar() }
and y.bar() has the error we'd print "(x or maybe y).bar()".  Maybe that's silly.  We'll see if anyone notices first :)

@@ +5932,5 @@
> +            JSObject *obj = script->getObject(j);
> +            if (obj->isBlock()) {
> +                uint32_t depth = obj->asBlock().stackDepth();
> +                uint32_t count = obj->asBlock().slotCount();
> +                if (uint32_t(i - depth) < uint32_t(count)) {

i is an index off of fp->slots() (so it includes script->nfixed) whereas obj->asBlock().stackDepth() is an index off of fp->base() (so it does not include script->nfixed).  (The i -= nfixed bit in the old decompiler is hidden in IsVarSlot.)

Could you add a test (presumably with nfixed > 0) which hits this bug?

@@ +5993,5 @@
> +
> +    if (spindex == JSDVG_SEARCH_STACK) {
> +        // We search from fp->sp to base to find the most recently calculated
> +        // value matching v under assumption that it is it that caused
> +        // exception, see bug 328664.

I'd leave out the "see bug x" since the comment seems to stand on its own w/o it.

@@ +6041,5 @@
> +
> +    ExpressionAutopsy ea(cx, script, fun);
> +    if (!ea.init())
> +        return false;
> +    if (!ea.considerPC(valuepc))

Lastly and lamely, I'm afraid I think it would be best for this patch to stick to some more direct names:
  s/ExamineExpression/DecompileExpression/
  s/ExpressionAutopsy/ExpressionDecompiler/
  s/considerPC/decompilePC/
Attachment #643495 - Flags: review?(luke)
(In reply to Luke Wagner [:luke] from comment #10)
> but what about the pcdepth<0 case?  I was curious how this arose and tbh, I
> don't see how it does: ReconstructPCStack and SimulateOp both return
> unsigned values casted to ints.  I'm going to tentatively blame this on
> imacros.  If I haven't overlooked something, could you make
> ReconstructPCStack infallible?

I think the idea was some LOCAL_ASSERT in ReconstructPCStack could cause it to fail.

> Lastly and lamely, I'm afraid I think it would be best for this patch to
> stick to some more direct names:
>   s/ExamineExpression/DecompileExpression/
>   s/ExpressionAutopsy/ExpressionDecompiler/
>   s/considerPC/decompilePC/

Pity. I was hoping to someday do

$ grep -iR js/src decompile
$
Attached patch error decompiler (obsolete) — Splinter Review
Here's a version addressing review comments.

I added PCStack. Tell me if you think its indexing is too weird.

I dredged GetBlockChainAtPC out of the trash heap to implement let variables correctly. I kept it separate from the rest of decompiler because Luke said it might be useful elsewhere.

Fuzzers: Can you break this a bit?
Attachment #643495 - Attachment is obsolete: true
Attachment #645187 - Flags: review?(luke)
Attached patch error decompiler (obsolete) — Splinter Review
Small fix.
Attachment #645187 - Attachment is obsolete: true
Attachment #645187 - Flags: review?(luke)
Attachment #645188 - Flags: review?(luke)
Comment on attachment 645188 [details] [diff] [review]
error decompiler

Setting feedback? from decoder for fuzzer help. I'll also be helping out but heading to BH / Defcon tomorrow.
Attachment #645188 - Flags: feedback?(gary)
Attachment #645188 - Flags: feedback?(choller)
Comment on attachment 645188 [details] [diff] [review]
error decompiler

try {
    let h
    {
        let w
        t()
    }
} catch (e) {}
let(w) {
    w.s
}

causes Assertion failure: blockChain, in a debug shell (pass testcase in as a CLI argument) and a likely null crash [@ ExpressionDecompiler::findLetVar] in an opt shell.

feedback- till a new patch.
Attachment #645188 - Flags: feedback?(gary) → feedback-
Attached patch error decompiler (obsolete) — Splinter Review
Fix fuzz bug.
Attachment #645188 - Attachment is obsolete: true
Attachment #645188 - Flags: review?(luke)
Attachment #645188 - Flags: feedback?(choller)
Attachment #645216 - Flags: review?(luke)
I'm on this now, expecting first results in a few hours :)
Comment on attachment 645216 [details] [diff] [review]
error decompiler

Review of attachment 645216 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!  r+ with small nits (mostly in comments :)

::: js/src/jsopcode.cpp
@@ +5791,5 @@
> +{
> +    if (i < 0) {
> +        i += depth_;
> +        JS_ASSERT(i >= 0);
> +    }

hah, I like it

@@ +5805,5 @@
> + *
> + * Here's the basic algorithm:
> + *
> + * 1. Find the stack location of the value whose expression we wish to
> + * decompile. The error handler can explicitly give this. Otherwise, we search

s/error handler/caller/, s/give this/pass this as an argument/.

@@ +5806,5 @@
> + * Here's the basic algorithm:
> + *
> + * 1. Find the stack location of the value whose expression we wish to
> + * decompile. The error handler can explicitly give this. Otherwise, we search
> + * backwards down the stack for it.

s/it/the offending value./

@@ +5816,5 @@
> + * opcode responsible for pushing the value we want to decompile.
> + *
> + * 3. Pass the opcode to decompilePC. decompilePC is the main decompiler
> + * routine, responsible for a string representation of the expression that
> + * generated a certain stack location. It looks at the opcode at the pc passed

s/at the pc/of the pc/

@@ +5817,5 @@
> + *
> + * 3. Pass the opcode to decompilePC. decompilePC is the main decompiler
> + * routine, responsible for a string representation of the expression that
> + * generated a certain stack location. It looks at the opcode at the pc passed
> + * to it and returns the JS source equivalent of it.

to much "it" in this sentence

@@ +5821,5 @@
> + * to it and returns the JS source equivalent of it.
> + *
> + * 4. Expressions can, of course, contain subexpressions. For example, the
> + * literals "4" and "5" are subexpressions of the addition operator in "4 +
> + * 5". If we need to decompile a subexpression, we call decompilePC

instead of "we call decompilePC" perhaps "we recursively execute step 2 for each of the operands' pcs"?

@@ +5824,5 @@
> + * literals "4" and "5" are subexpressions of the addition operator in "4 +
> + * 5". If we need to decompile a subexpression, we call decompilePC
> + * recursively. The result is a depth-first traversal of the expression tree.
> + *
> + *

Nice comment!  (nix the extra " *\n")

@@ +6052,5 @@
> +}
> +
> +JSAtom *
> +ExpressionDecompiler::findLetVar(jsbytecode *pc, unsigned i)
> +{

Could you s/i/depth/ and JS_ASSERT(depth < StackDepth(script))?

@@ +6059,5 @@
> +        JS_ASSERT(chain);
> +        JS_ASSERT(chain->isBlock());
> +        do {
> +            BlockObject &block = chain->asBlock();
> +            uint32_t depth = block.stackDepth();

Guess that means renaming this to 'blockDepth' (and 'count' to 'blockCount', for symmetry).

@@ +6067,5 @@
> +                    const Shape &shape = r.front();
> +                    if (shape.shortid() == int(i - depth) &&
> +                        // This happens for empty destructuring variables in
> +                        // lets. They can be safely ignored.
> +                        JSID_IS_ATOM(shape.propid()))

Multi-line conditions require braces (and kinda look gross).  Perhaps instead you could give the JSID_IS_ATOM check and comment their own statement inside "if (shape.shortid() == ...) {" ?

@@ +6071,5 @@
> +                        JSID_IS_ATOM(shape.propid()))
> +                        return JSID_TO_ATOM(shape.propid());
> +                }
> +            }
> +        } while ((chain = chain->enclosingScope())->isBlock());

We generally (in new code) try to avoid using the result of an assignment unless it really helps.  For this case, could you just put the assignment on the line before the while?

@@ +6190,3 @@
>      if (!fallback) {
> +        if (v.isUndefined())
> +            // Prevent users from seeing "(void 0)"

Comment before 'if' to avoid multi-line then-branch-requiring-curlies.
Attachment #645216 - Flags: review?(luke) → review+
Comment on attachment 645216 [details] [diff] [review]
error decompiler

Following test asserts:

var [{ x }] = [null, {}];

with

Assertion failure: chain, at jsopcode.cpp:6059
Attachment #645216 - Flags: feedback-
Attached patch error decompilerSplinter Review
Fix fuzz bug and review comments.
Attachment #645216 - Attachment is obsolete: true
Comment on attachment 645489 [details] [diff] [review]
error decompiler

Following test asserts:

function gen() {  }
label: {
  for (let i in gen())
        break label;
  function gen2() {
      yield 2;
  }
  for (let i in gen2())
    actual = i(0);
}


Assertion failure: blockChain, at jsopcode.cpp:5740
Attachment #645489 - Flags: feedback-
Depends on: 777181
(In reply to Christian Holler (:decoder) from comment #21)
> Assertion failure: blockChain, at jsopcode.cpp:5740

This was a preexisting bug that is exposed by this. Bug 777181.
https://hg.mozilla.org/mozilla-central/rev/7852a8f73313
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: