485949 - rewriting decompiler in JS

Reporter

Description

•

15 years ago

As it is well known, the decompiler is a constant source of bugs. Despite the defensive measures that the code takes, occasionally such bugs turn into critical ones. Hence, to mitigate that, I suggest to rewrite the decompiler using JS. 

In additional to the memory safety, due to regexps and closures available in JS, such implementation can be more maintainable. With the JIT available such port should not be significantly slower than the native implementation.

Blake Kaplan (:mrbkap) (inactive)

Comment 1

•

15 years ago

(In reply to comment #0)
> In additional to the memory safety, due to regexps and closures available in
> JS, such implementation can be more maintainable. With the JIT available such
> port should not be significantly slower than the native implementation.

Won't this run into the "interpreters have too many branches, blow the tracer's mind" bug?

Jason Orendorff [:jorendorff]

Comment 2

•

15 years ago

(In reply to comment #1)
> (In reply to comment #0)
> > In additional to the memory safety, due to regexps and closures available in
> > JS, such implementation can be more maintainable. With the JIT available
> > such port should not be significantly slower than the native implementation.
> 
> Won't this run into the "interpreters have too many branches, blow the
> tracer's mind" bug?

I guess it depends on how you interpret the words "significantly slower" wrt decompiler performance.

I'm fine with making the decompiler 50x slower if doing so reduces the PV of its total future cost by half (the cost of patching fuzzbugs against it, but also the tax on new features and other work--see e.g. bug 437521).

(I am obliged to repeat, at times like these, that I think the best plan is to delete the decompiler and do what Webkit does.)

Brendan Eich [:brendan]

Comment 3

•

15 years ago

I hate the decompiler more than any of you, it has cost me serious time.

But we have an API that supports bytecode serialization and decompilation from saved bytecode. Breaking it involves incommensurate costs and benefits from the ones we see in maintaining it.

I like saved ASTs and maybe we should do those, but they seem likely to use strictly more memory than source notes plus bytecode, at a guess. Anyone got numbers showing a closer call? SRC_NEWLINE could easily tilt the balance for vertically spacy code, but most code is dense, yo.

Really, someone should run an experiment if not do a more detailed estimate.

/be

Igor Bukanov

Reporter

Comment 4

•

15 years ago

(In reply to comment #3)
> But we have an API that supports bytecode serialization and decompilation from
> saved bytecode.

What about using Narcissus for parsing and serializing into canonical form saved JS source code saved? This way we can completely remove the native decompiler and any need for source notes from the bytecode, save memory via integrating the JS source storage with browser's cache and stay compatible with users that needs the decompilation using pure JS code.

> 
> I like saved ASTs and maybe we should do those

Last time I have worked with Rhino it used serialized AST for decompilation. The serializer was integrated with the parser so after the parsing phase the user would get both the final tree and the serialized source. It had its own sets of bugs mostly coming from changes in the parser and forgotten updates in the tree serializer or deserializer missing the updates in the tree format. So do we want to replace one native code component with another that would be just little bit less buggy?

Brendan Eich [:brendan]

Comment 5

•

15 years ago

Then we'd have narcissus-out-of-sync bugs. No free lunch, but the API is old and I wouldn't want to mess with it lightly. Is there a better jsopcode.tbl with some kind of DSL that can aid decompilation? A lot of the decompiler is driven solely by the token for the op and its stack uses and def, for the simple stuff (arith. ops, e.g.).

/be

Igor Bukanov

Reporter

Comment 6

•

15 years ago

(In reply to comment #5)
> Then we'd have narcissus-out-of-sync bugs.

We would have bugs in any case and those bugs have to be fixed. The big difference is that with narcissus-based implementation not a single one would be critical while any native code requires spending time to analyze the consequences of the bug. Plus having up-to-date JS-in-JS parser would benefit ES.next work.

> No free lunch, but the API is old
> and I wouldn't want to mess with it lightly.

TM changed a lot of SM internals in a very significant way yet jsapi.h interfaces survived intact that mostly intact. What makes the simple decompilation API so special that their imlementation cannot be replaced?

Brendan Eich [:brendan]

Comment 7

•

15 years ago

(In reply to comment #6)
> TM changed a lot of SM internals in a very significant way yet jsapi.h
> interfaces survived intact that mostly intact.

Tons of work keeping the API going. See bug 463153 and all the greater work done by jorendorff to bail off trace when an API needs the interpreter stack to be there.

> What makes the simple
> decompilation API so special that their imlementation cannot be replaced?

It's just more work. I'm sure we can do it, but if there is a shorter path that is safe enough...

/be

Igor Bukanov

Reporter

Comment 8

•

15 years ago

(In reply to comment #7)
> It's just more work. I'm sure we can do it, but if there is a shorter path that
> is safe enough...

The shorter path is in the comment 0. The idea is to have a simple native object that allows to query bytecode properties, something like an interface to jsopcode.h. Then JS code would use it for decompilation etc.

David Mandelin [:dmandelin]

Comment 9

•

15 years ago

On the saved AST idea, what about saving source start and end locations for expression bytecodes? That could be pretty compact, probably just a word for most cases, and I don't know if all bytecodes need it or just some. We could also enable the saving only if some pref or debug mode or whatever is turned on, so most users would see zero cost.

Igor Bukanov

Reporter

Comment 10

•

15 years ago

(In reply to comment #9)
> On the saved AST idea, what about saving source start and end locations for
> expression bytecodes? 

Is the idea to decompile only statements and use the source as-is for expressions? I think it is a non-starter. For example, this will leave the comments embedded in expression and functions with closures would look really ugly.

David Mandelin [:dmandelin]

Comment 11

•

15 years ago

(In reply to comment #10)
> (In reply to comment #9)
> > On the saved AST idea, what about saving source start and end locations for
> > expression bytecodes? 
> 
> Is the idea to decompile only statements and use the source as-is for
> expressions? I think it is a non-starter. For example, this will leave the
> comments embedded in expression and functions with closures would look really
> ugly.

I meant treating everything that way. I wouldn't expect many expressions, especially of the sort the decompiler prints, to have comments. But if that is an issue, could it be fixed by a simple comment-stripping/JS-pretty-printing step, or would there be too many formatting fixes to be applying?

Igor Bukanov

Reporter

Updated

•

15 years ago

Assignee: general → igor

Igor Bukanov

Reporter

Comment 12

•

15 years ago

I am working on this - besides translating the decompiler into JS the bug requires to expose a native object that gives access to script's bytecode and some constants for the decompiler.

The problem with the first is how to provide universal access to that object that always work no matter what exists in the global scope. I am considering using something like var::bytecode abusing the namespace support for accessing internal objects.

The problem with the second is that currently the emitter uses condswitch, not tableswitch or lookupswitch, when dealing with code like:

const JSOP = var::bytecode;
...
switch (op) {
  case JSOP.NAME:
 ...
}

Here NAME and other symbols are const properties of JSOP, but the emitter does not optimize this case resulting in rather bad performance. For now I plan to use a preprocessor script that embeds const JSOP_BYTECODE  = value; into the decompiler script.

Igor Bukanov

Reporter

Updated

•

15 years ago

Depends on: 510926

Jason Orendorff [:jorendorff]

Comment 13

•

15 years ago

(In reply to comment #12)
> The problem with the first is how to provide universal access to that object
> that always work no matter what exists in the global scope. I am considering
> using something like var::bytecode abusing the namespace support for accessing
> internal objects.

Ooh. Let's get this right the first time -- I don't think core stuff should depend on E4X (which after all is conditionally compiled in), and we want to use this for a lot more than the decompiler.

I think totally nonstandard, ugly, magical syntax is warranted, e.g. 

    var bytes = #builtins#.getFunctionBytecode(fn);

Because this is basically an intentional back door into the engine, I think it should be as isolated from and orthogonal to other parser/emitter/interpreter features as we can possibly make it.

Brendan Eich [:brendan]

Comment 14

•

15 years ago

Why not do what I did in imacro_asm.js.in?

/be

Jason Orendorff [:jorendorff]

Comment 15

•

15 years ago

In JS_TRACER builds, imacros already have access to engine-private internal
functions. See JSOP_CALLBUILTIN and js_GetBuiltinFunction. That might or might
not be the right thing to generalize.

This ability to access engine-private internals from JS would help bug 460904
and probably bug 462300 (after we get past easy cases like
Array.prototype.join).

Luke Wagner [:luke]

Comment 16

•

15 years ago

Coming from bug 460904, yes, I'd really like what comment 13 is talking about.  Igor, if you like the idea too, perhaps we should coordinate our writing of it?  I'm happy to start but I won't be super fast.

Jeff Walden [:Waldo]

Comment 17

•

15 years ago

(In reply to comment #12)
> Here NAME and other symbols are const properties of JSOP, but the emitter does
> not optimize this case resulting in rather bad performance.

Sucks to be them.  We've never really cared about decompilation performance, nor do I think we should, and we should push back on anyone who complains about it.

Comment 13 has it right, but I would go one further and say that $magicalSyntax must not be available to web scripts or anything of that nature.  Bytecode examination of function objects -- can you say information-hiding hazard?  I want no part of any idea that isn't so internal that even XPConnect can't touch it.

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 18

•

15 years ago

(In reply to comment #17)
> Sucks to be them.  We've never really cared about decompilation performance,
> nor do I think we should, and we should push back on anyone who complains about
> it.

Decompilation performance is perhaps relevant for some web scenarios (eBay's monkey patching is one case), and very much for our debugging tools.  If we're going to ignore decompilation performance completely (10x loss, etc.) then we need to be very sure that we have alternate means of finding things like argument names and so forth in Firebug.

Luke Wagner [:luke]

Comment 19

•

15 years ago

(In reply to comment #17)
> I want no part of any idea that isn't so internal that even XPConnect can't
> touch it.

100% agreed.  That was in from the original #jsapi discussion.

Andreas Gal :gal

Comment 20

•

15 years ago

I would suggest an experiment where we store the original source string via strdup in the JSScript and then see what the impact on our memory use is. If it doesn't ding talos numbers, I would suggest getting rid of the decompiler this way.

Brendan Eich [:brendan]

Comment 21

•

15 years ago

People use the decompiler as a pretty-printer too. It's useful in ways that source recovery is not.

Agree with shaver, let's not be naive about perf hit. I know Igor won't be. OTOH I'm still thinking we don't need const propagation including for FOO.BAR in any code of the form switch (x) { case FOO.BAR:...} to skin this cat!

If we want #magic# stuff, we should also turn off free variables, as in make them errors. Otherwise capability leaks will happen. How about a JSOPTION_SELF_HOSTING compile-time option?

/be

Jeff Walden [:Waldo]

Comment 22

•

15 years ago

Option seems susceptible to being mistakenly set by a buggy JSAPI user -- even that much exposure worries me.  I'd rather see an extra argument to the (potentially hypothetical) js_CompileScript that you have to be engine-internal to provide -- or something of that nature, not passed by a value JSAPI users could provide.

I wasn't suggesting being naive about perf hit, but I do recall a couple O(n**2) algorithms in decompilation of array literals that had been there for years before I touched it (and to be honest, I don't even remember if I fixed that or not).  If we don't feel pressure from those, I don't think we have to worry about constant-propagation of const properties on const objects (for example).

Brendan Eich [:brendan]

Comment 23

•

15 years ago

The switch loop approach may be painfully slow with condswitch, no matter how slow the current C++ code is on some array literal corner case. The cost Igor was concerned with is pervasive.

Your point about JSOPTION_WHATEVER is well taken. We can use the internal API and pass a new TCF_* flag (if out of bits, let me know -- probably can scrounge).

/be

Luke Wagner [:luke]

Comment 24

•

15 years ago

(In reply to comment #23)
> Your point about JSOPTION_WHATEVER is well taken. We can use the internal API
> and pass a new TCF_* flag (if out of bits, let me know -- probably can
> scrounge).

I can't see any obvious way to reuse an existing TCF_ flag.  It seems all 16 bits of tcflags are used.  Can I steal a bit from JSFB_LEVEL_BITS (and shrink FREE_STATIC_LEVEL accordingly)?

Brendan Eich [:brendan]

Comment 25

•

15 years ago

(In reply to comment #24)
> (In reply to comment #23)
> > Your point about JSOPTION_WHATEVER is well taken. We can use the internal API
> > and pass a new TCF_* flag (if out of bits, let me know -- probably can
> > scrounge).
> 
> I can't see any obvious way to reuse an existing TCF_ flag.  It seems all 16
> bits of tcflags are used.  Can I steal a bit from JSFB_LEVEL_BITS (and shrink
> FREE_STATIC_LEVEL accordingly)?

Could, but then we talked about adding another trailing parameter, a bool, to JSCompiler::compileScript, and just calling that a friend API.

/be

Igor Bukanov

Reporter

Comment 26

•

15 years ago

(In reply to comment #22)
> I wasn't suggesting being naive about perf hit, but I do recall a couple
> O(n**2) algorithms in decompilation of array literals that had been there for
> years before I touched it (and to be honest, I don't even remember if I fixed
> that or not).  If we don't feel pressure from those, I don't think we have to
> worry about constant-propagation of const properties on const objects (for
> example).

The issues is that without const-propagation a switch over bytecode names would be on average 100*x times slower than an optimized switch where x is a slowdown factor of doing a property lookup versus reading a const from a bytecode. So we are talking about a slowdown on the scale of 1000 times.

As regarding a particular syntax to access internal names I am open for any proposals as long as they provide a simple and short way to access and implement them. On the other hand that may not even be necessary if all the necessary functions would be compiled with a special global object.

Igor Bukanov

Reporter

Updated

•

12 years ago

Assignee: igor → general

:Benjamin Peterson

Comment 27

•

11 years ago

The decompiler is gone now. There's the expression decompiler, but it's far simpler than the old one, so needs a safe language less than the old one.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → WONTFIX