As I've been wading through the E4X bugs and testing them against the nearly-complete non-XMLBeans E4X support, I've been able to catalog those bugs that can't be fixed without altering the underlying parser code. I'm creating this issue to address the parser situation (and marking it as part of Core, although we could revisit that) and setting it up as a blocker for those other issues. See the "Blocks" list for details. There has been some discussion on the newsgroup as of late regarding the parser; see
Apologies, submitted too early. See: http://groups.google.com/group/mozilla.dev.tech.js-engine/browse_thread/thread/773c5cc56bfad5f3/# for the discussion referenced above.
I am generalizing this from E4X issues to a general parser issue, as there are non-E4X bugs that might want to mark this as a blocker.
Summary: Rhino parser cannot handle many E4X constructs → Rhino parser inspires fear, awe
Changing to "enhancement" since of course the parser works. It just inspires a sense of impotence and unworthiness in some of those who might seek to master it.
Severity: normal → enhancement
Since I've been making changes to the parser for months now, I don't feel either fear or awe at it. I think it's kind of clean, as parsers tend to go. Maybe I'm just strange, but I'm not concerned about the state of the parser. In any event, I don't see anything actionable about this bug. What exactly could we do to make it less fearsome?
Well, I created this as a blocker for the E4X stuff based on Attila's comments (and those of others) in this thread: http://groups.google.com/group/mozilla.dev.tech.js-engine/browse_thread/thread/773c5cc56bfad5f3/a6fa53f5870f1029?lnk=gst&q=parser&rnum=2#a6fa53f5870f1029 Perhaps I scared too easily. But anyway, it seemed at the time that some of the approaches described in the thread might be the "actionable" stuff here. I thought if that discussion resulted in a solution then when the solution was implemented this bug would be closed. I am not so good with parser generators, etc. -- do you have any thoughts about the discussion in the thread above?
My comment on the above thread is that whenever anobody offers you a parser generator run away. I have never known a self-respecting production compiler writer who has used one. If you think the existing parser is scary, then parser generators are far, far, far worse. I have used them and I abandoned them 27 years ago in favor of the recursive descent parsers that I have written ever since. If I were writing a parser from scratch the most significant difference with what is there is that I would use a precedence table for binary expression parsing. The result is a single recursive function that handles all binary operators instead of the cascade of very similar functions we have today (that are modeled on the grammar). Otherwise, I think the Rhino parser is a pretty clean recursive descent parser that has few problems. I found it effortless to patch in changes (adding the get/set syntax took less than a day, const took about the same). You haven't seen my type checker changes that involve parsing JSDoc comments and attaching type annotations to the parse tree. Those were just as easy to plumb through.
Regarding comment 7: We tokenize the XML, though, as well as tokenizing some of the ordinary text nodes (e.g., if they contain curly braces), so it's a little bit more complex than your ordinary XML case, because within our text nodes we can have XML inside curly braces, etc. So there are more possible states than in a vanilla XML parser. As for the parser generator debate (or more generally the debate on whether the parser could be simplified) ... I am mostly a bystander (at least for now, unless I dive in a little more). But if we decide that doing nothing is the best choice, then I have no problem closing this bug.
Of course I'm biased, but I'd agree with Bob. Staying with the existing recursive descent parser is the way to go. The Rhino parser was heavily informed by the SpiderMonkey parser; starting over with a different approach is likely to lead to subtle incompatibilities with the de facto standard on tricky things like semicolon insertion.
In that case, it seems like there's a consensus sufficient for closing this. If I don't hear an objection in the next few days, I'll do that.
Cleaning up the bug list, as it seems that there is widespread agreement here.
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.