Closed Bug 485791 Opened 15 years ago Closed 12 years ago

Self-host the E4X runtime in JS using proxies

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

RESOLVED WONTFIX

People

(Reporter: igor, Assigned: brendan)

The inherent complexity of E4X inevitably translates into code complexity and bugs. Because the standard is implemented in C/C++, many of those bugs are exploitable. Together with the very low usability of E4X, that leads to a disproportionate amount of time spent on analyzing and patching that code.

One way to stop this waste of time is to disable E4X in the browser, or at least limit it to chrome scripts. But if that is not an option, then we should consider rewriting E4X in JS with the minimal amount of hooks or parser changes necessary to support such an implementation.

This would not only minimize the damage from E4X bugs, but would also validate using JS to implement more features that currently require C++ code.
E4X in JS would seem to need catchalls and some kind of QName-like compound qualifier::name support at a low level (including the :: operator).

We could parse the syntax but leave the trees uninterpreted by C++ code, calling built-in JS to interpret the trees at runtime. This would get rid of the E4X bytecode ops (see bug 441416 -- make this bug block that one if appropriate).
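A rough sketch of the "leave the trees uninterpreted" idea, in present-day JavaScript. The node shapes (`XMLElement`, `Expr`, `Text`) and the `evalXML` function are hypothetical illustrations, not SpiderMonkey's parse-node format; the assumption is only that the parser hands a plain tree to scripted code, which resolves the `{expr}` holes at runtime instead of dedicated E4X bytecode ops doing so.

```javascript
// Hypothetical tree for the literal <a href={url}>{text}</a>.
// Real SpiderMonkey parse nodes look different; this is an assumption.
const tree = {
  type: "XMLElement",
  name: "a",
  attributes: [{ name: "href", value: { type: "Expr", id: "url" } }],
  children: [{ type: "Expr", id: "text" }],
};

// Scripted runtime: walks the uninterpreted tree and resolves {expr}
// holes against a scope object, producing a plain element record.
function evalXML(node, scope) {
  switch (node.type) {
    case "Text":
      return node.value;
    case "Expr":
      return scope[node.id];
    case "XMLElement":
      return {
        name: node.name,
        attributes: Object.fromEntries(
          node.attributes.map(a => [a.name, evalXML(a.value, scope)])
        ),
        children: node.children.map(c => evalXML(c, scope)),
      };
    default:
      throw new TypeError("unknown node type: " + node.type);
  }
}

const result = evalXML(tree, { url: "https://example.org/", text: "home" });
```

Here the C++ side would only need to build `tree`; everything semantic lives in `evalXML`.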

Another idea is to make the parser extensible by privileged JS. Whoa!

I'm in favor of doing something like what this bug suggests.

/be
(In reply to comment #1)
> E4X in JS would seem to need catchalls and some kind of QName-like compound
> qualifier::name support at a low level (including the :: operator).

I do not worry about the parser support. It is relatively simple and straightforward code delegating almost all the jobs to the runtime.

QName is tricky as it requires special runtime support (and a lot of it) when it is used as a property id. Still, it could be made much safer if its id form were not implemented as an object. Instead, an implementation could, for example, make it a special fat string that stores the namespace part after the JSString data. Then any code that deals with ids and misses the check for the special string kind would still see a valid id.
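The fat-string idea can be illustrated with a JS analogy. This is only an analogy: the proposal concerns SpiderMonkey's internal C++ id representation, and the `QNameId` class and `describe` helper below are hypothetical names. The point it shows is that code unaware of the namespace part still sees a valid string id, while namespace-aware code can check for the special kind.

```javascript
// Analogy only: a string that carries an extra namespace part.
class QNameId extends String {
  constructor(uri, localName) {
    super(localName); // the string payload is the local name
    this.uri = uri;   // the namespace part rides along with it
  }
}

const id = new QNameId("http://www.w3.org/1999/xhtml", "title");

// Namespace-unaware code: uses the id as an ordinary property key.
// The key coerces to the local name, so the lookup stays valid.
const table = { title: "plain lookup still works" };
const plain = table[id];

// Namespace-aware code: checks for the special kind first.
function describe(key) {
  return key instanceof QNameId ? key.uri + "::" + key : String(key);
}
```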
ECMA-357 suggests that it's an object, though. But that spec has too many errata and indeed design flaws.

What's more, ES4's namespaces and :: operator seem out for good from ES-Harmony, which probably will have unforgeable generated names (useful for class-private instance variables), along with closures (with even better integrity via let bindings) to avoid leaking any names that should be module- or instance-private variables.

All of which strongly suggests to me that we should rip out all C++ to do with :: and QName from SpiderMonkey. But to fulfill this bug's summary, that means making the parser extensible and delegating to scripted parsing extensions the job of parsing and interpreting any :: expression -- including lexical qname lookup (up the scope chain). Yikes.

Before we cross that bridge, we could consider retrenching to minimal and safer C++ as you suggest. Would it help if QName were a separate GC-thing, a pair of pointers?

struct JSQName {
    JSNamespace *qualifier;
    JSString    *identifier;
};

/be
(In reply to comment #3)
> ECMA-357 suggests that it's an object, though.

Most of the extra checks and code regarding QName support come when converting it to the id form, which is completely hidden from ECMA. Hence the idea of using pseudo-strings to represent such ids, to avoid any impact on the code that assumes that an id is either a string or an int.
Blocks: 441416
With the proxies API we could very nicely do almost all of e4x in script. If we follow shaver's idea and force code that uses e4x objects to declare itself (maybe "use e4x" like "use strict"?), we can maintain the semantic oddities of e4x completely without any trace of e4x in the actual VM, just by using source-to-source translation. Brendan suggested keeping the e4x parser. The rest of the code can probably go.
Depends on: harmony:proxies
Let's not make up pragmas. The parser and lexer e4x code is the least of our worries at this stage. Even the AST rewriting work (bug 33874) won't allow us to rewrite this E4X lexing/parsing code in JS. It's an idle thought to say "source to source" at this point -- what about eval?

Relevant bugs:
bug 561785 (Make E4X XML objects non-native)
bug 441416 (banish E4X ops to a sub-interpreter, or desugar to function calls)
possibly another bug I can't find about self-hosting e4x.

Let's focus on where we can self-host for memory safety and real simplicity, even better perf via trace-JITted inlining. I'm refocusing this bug's summary on that.

/be
Summary: rewriting E4X implementation in JS → rewriting E4X runtime in JS using proxies
Mine!

Homer, to moldy week-old sandwich that made him sick: "I can't stay mad at you!"

/be
Assignee: general → brendan
Blocks: 561785
> I do not worry about the parser support. It is relatively simple and
> straightforward code delegating almost all the jobs to the runtime.

> The parser and lexer e4x code is the least of our worries at this stage.

Oh.  I've been hitting my head on the e4x-handling code in the scanner quite a bit recently.  Eg. bug 635235.
Nick: what else? I wrote that comment a while ago, bug 635235 is even older (to the beginning of e4x), and I don't know of other such bugs. Please cite some or file them if you can.

Anyway, syntax extension machinery is not this bug.

/be
That's the only bug in the lexer that I'm aware of.  It's mostly just annoying to have to skim over all that XML-specific code in getTokenInternal().  Maybe just hoisting most of it out into separate functions would help a lot.
Pulling inline code out (to static inline helper or inline method) is a good idea -- want to file/take that one?

/be
(In reply to comment #11)
> Pulling inline code out (to static inline helper or inline method) is a good
> idea -- want to file/take that one?

Bug 636654.
Summary: rewriting E4X runtime in JS using proxies → Self-host the E4X runtime in JS using proxies
Blocks: 613142
Per a meeting today with dherman, gal, and jodyer, we will build the E5X runtime on top of harmony proxies. This will steer the design toward library methods and functions over special syntax. We will keep the XML literals and some operators and primary expressions such as *.

Since proxies have no invoke trap, we can't compatibly expose methods of XML.prototype as properties of XML instances. As with ES5's Object.create, etc., the prototype methods will move to be functions of the XML constructor: XML.name(x), XML.parent(x), etc.

/be
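A minimal sketch of why the methods would move to the constructor, assuming the standardized ES2015 `Proxy` rather than the Harmony proxy proposal under discussion at the time. `makeXML`, `XMLLib`, and the `[name, children]` input format are hypothetical. Because every property get on the proxy is a child query, `x.name` has to mean "children named name", which leaves no room for an `x.name()` method; the functions therefore live on a separate object.

```javascript
// Per-node private state, keyed by proxy so that ordinary property
// access on the proxy can never reach it.
const state = new WeakMap();

// children is an array of [name, children] pairs (assumed format).
function makeXML(name, children = [], parent = null) {
  const node = { name, parent, children: [] };
  const proxy = new Proxy({}, {
    // Every string-keyed get is a child query.
    get(_target, key) {
      if (typeof key !== "string") return undefined;
      return node.children.filter(c => state.get(c).name === key);
    },
    has(_target, key) {
      return node.children.some(c => state.get(c).name === key);
    },
  });
  state.set(proxy, node);
  for (const [childName, grandchildren] of children) {
    node.children.push(makeXML(childName, grandchildren, proxy));
  }
  return proxy;
}

// Stand-in for functions of the XML constructor: XML.name(x), XML.parent(x).
const XMLLib = {
  name: x => state.get(x).name,
  parent: x => state.get(x).parent,
};

const doc = makeXML("person", [["name", []], ["email", []], ["email", []]]);
const emails = doc.email;      // child elements named <email>
const nameChild = doc.name[0]; // a child element named <name>, not a method
```

With this shape, `XMLLib.name(doc)` answers what `doc.name()` answered in E4X, and `doc.name` stays free for child lookup.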
(In reply to comment #13)
> We will keep the XML literals and
> some operators and primary expressions such
> as *.

Will some of the existing XML token kinds be able to be removed?
(In reply to comment #14)
> (In reply to comment #13)
> > We will keep the XML literals and
> > some operators and primary expressions such
> > as *.
> 
> Will some of the existing XML token kinds be able to be removed?

Not under this bug's aegis (see "runtime" in summary) and not without breaking the "XML" part. Unless you see something that could be subsumed by existing token types?

/be
(In reply to comment #13)
> Since proxies have no invoke trap, we can't compatibly expose methods of
> XML.prototype as properties of XML instances. As with ES5's Object.create,
> etc., the prototype methods will move to be functions of the XML constructor:
> XML.name(x), XML.parent(x), etc.

With function proxies the syntax could be x().name, x().parent etc. That is x() could return a function view of the XML object.
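The function-proxy variant might look roughly like this, again assuming ES2015 `Proxy` (a function target plus an `apply` trap) rather than the Harmony function proxies of the time. `makeCallableXML` and `appendChild` are hypothetical helpers. Property gets remain child queries, while calling the object returns a view holding the element's own name and parent.

```javascript
function makeCallableXML(name, parent = null) {
  const children = [];
  // The "function view" returned by x().
  const view = {
    name,
    parent,
    // Hypothetical helper, only here to build example data.
    appendChild(childName) {
      const child = makeCallableXML(childName, proxy);
      children.push(child);
      return child;
    },
  };
  // A function target makes the proxy callable; apply handles x().
  const proxy = new Proxy(function () {}, {
    apply() {
      return view;
    },
    get(_target, key) {
      if (typeof key !== "string") return undefined;
      return children.filter(c => c().name === key);
    },
  });
  return proxy;
}

const x = makeCallableXML("person");
const child = x().appendChild("name");
const viaProperty = x.name;               // child elements named <name>
const viaCall = x().name;                 // the element's own name
const isFunction = x instanceof Function; // true
```

The last line shows the cost of this design: every XML instance is also an instance of Function.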
(In reply to comment #16)
> (In reply to comment #13)
> > Since proxies have no invoke trap, we can't compatibly expose methods of
> > XML.prototype as properties of XML instances. As with ES5's Object.create,
> > etc., the prototype methods will move to be functions of the XML constructor:
> > XML.name(x), XML.parent(x), etc.
> 
> With function proxies the syntax could be x().name, x().parent etc. That is x()
> could return a function view of the XML object.

We discussed ideas like this. It means XML instances would be instances of Function. It requires a proposed extension to function proxies, to keep instanceof XML true as well. It feels wrong to me, but it would be good to hear from others.

/be
No more E4X.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
..... E4X!!!!!

The more I look at it, the more I look at it!!!!! It's beautiful!!!!! 

What if you transform the e4x object into a JSON object? 

Parse the XML DOM, using the tag name between the angle brackets as a field name and the text between the tags as its value, with the attributes under an Attribute field:

<e4x isItALive="Yes!">live!</e4x>

Xml2Json(xml);
//is now:
{
e4x:{value:"live!",Attribute:{isItALive:"Yes!"}}
};

// NO, IT'S NOT E4X!!!!! I DON'T UNDERSTAND! Mozilla is the hub for developers! Why isn't there anyone interested in E4X!?!?!?! If we can have this /regExp/ in ECMAScript, why not this <xml>!!! There must be someone in this community who wants to see e4x up and running! Even if it's just a server-side extension! Like for Node!

I have been searching the net for anything to help me bring attention back to e4x.

I have been looking for the old Rhino engine to use in my argument. Does anyone know where I can download it?

I believe stupid Adobe even has e4x in its ActionScript.

If you are just going to send me an email saying this and that, don't bother. I don't see myself reaching back out to this group any time soon, a group that doesn't even see the beauty of e4x in the future.  If anyone feels the same as I do, then please send an email.

Thank you for reading.