Last Comment Bug 745678 - Reflect.parse: range-based location info
: Reflect.parse: range-based location info
Status: UNCONFIRMED
reflect-parse
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on: 568142
Blocks: 790257
  Show dependency treegraph
 
Reported: 2012-04-15 22:57 PDT by Ariya Hidayat
Modified: 2014-07-24 11:07 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments

Description Ariya Hidayat 2012-04-15 22:57:41 PDT
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.162 Safari/535.19

Steps to reproduce:

The optional location info for each AST node is in the form of column and line. While this is sufficient for user information, such information does not permit easy reference to the original source.

I propose that an optional range-based information is also made available for each node. Esprima (http://esprima.org/) implements this already, this can be easily tried with the live parser demo: http://esprima.org/demo/parse.html (make sure to check "Index-based range").
Comment 1 Ariya Hidayat 2012-04-15 23:02:03 PDT
An example output:

{
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "Literal",
                "value": 42,
                "range": [
                    0,
                    1
                ]
            },
            "range": [
                0,
                1
            ]
        }
    ],
    "range": [
        0,
        1
    ]
}
Comment 2 Ariya Hidayat 2012-04-15 23:04:01 PDT
Kris Kowal suggested that the range end should not include.

For Esprima, I arbitrarily picked the including second number because mathematically it resembles a real interval: [a, b] includes both a and b.
Comment 3 kris 2012-04-15 23:10:04 PDT
My reasoning is that an inclusive end maps well to String..slice arguments, notwithstanding [a, b) also being valid interval notation :P
Comment 4 Ariya Hidayat 2012-04-16 19:39:32 PDT
It will be fun if array literal is permitted to use non-symmetric parentheses like that :) I'm sure it's more exciting than just semicolon discussion.

Either inclusive range end or not, I don't have a strong preference.

Another alternative would be [start, length] pair. This may be confusing for the first few nodes (since the numbers are not too different), but it would be obvious for the rest.
Comment 5 Dave Herman [:dherman] 2012-04-17 14:54:38 PDT
I'm with Kris. It's cute to imagine the array notation doubling as range notation, but half-open intervals are widely recognized as The Right Thing, and as Kris says it matches existing practice and libraries in JS.

I'm not sure how likely we are to be able to add range information to Reflect.parse. I'll ask the SpiderMonkey group, but if it's not information we need for SpiderMonkey, they're not going to want to take any performance hit. That said, I'd be happy to add it to the API docs and just say that it's not implemented by Reflect.parse.

Dave
Comment 6 Ariya Hidayat 2012-04-20 14:55:31 PDT
I'm fine with using half-open intervals, i.e. non-inclusive range end. I can definitely change Esprima rather easily.

Before I go on, do we agree on the property name and value for the information, i.e. 'range' and two-element array, respectively?

As for SpiderMonkey, since the line and colum-based node information is already available, theoretically one can add the range information as a post-processing step, most likely after a second (quick) pass to obtain the starting index of each line, for the mapping purpose. Thus, it can be marked either as "not implemented", post-processed in pure JavaScript (outside SM code), or documented as post-processable. The first choice is always a harmless start :)
Comment 7 Ariya Hidayat 2012-04-20 15:00:22 PDT
Related Esprima issue:

http://code.google.com/p/esprima/issues/detail?id=247
Comment 8 Dave Herman [:dherman] 2012-04-22 00:04:02 PDT
> Before I go on, do we agree on the property name and value for the
> information, i.e. 'range' and two-element array, respectively?

Does 'range' replace 'loc'? Does this suggest a three-valued option for location information? I don't have an opinion at midnight, but maybe tomorrow with a clearer head I'll have some thoughts.

Dave
Comment 9 kris 2012-04-22 11:46:07 PDT
In Esprima, "loc" and "range" are independent parser options and create separate properties on each syntax node. "loc" is useful for users whereas "range" is more useful for programs, so it makes sense to support any combination of the options. I think this is sound.
Comment 10 Ariya Hidayat 2012-05-01 18:38:05 PDT
dherman: Any further thought on this? I'm ready to change Esprima to output half-open interval for the range.
Comment 11 Ariya Hidayat 2012-06-03 08:24:31 PDT
JFYI, Esprima is now using half-open interval: https://github.com/ariya/esprima/commit/ab42d9fe.
Comment 12 Dave Herman [:dherman] 2012-06-07 09:31:17 PDT
Sorry I didn't comment sooner; this all sounds fine to me.

Dave
Comment 13 John J. Barton 2012-10-30 15:18:58 PDT
(In reply to kris from comment #9)
> In Esprima, "loc" and "range" are independent parser options and create
> separate properties on each syntax node. "loc" is useful for users whereas
> "range" is more useful for programs, so it makes sense to support any
> combination of the options. I think this is sound.

Why have two different places for location information? ie. why:
                        "range": [
                            38,
                            44
                        ],
                        "loc": {
                            "start": {
                                "line": 2,
                                "column": 4
                            },
                            "end": {
                                "line": 2,
                                "column": 10
                            }
                        }
instead of one place:
                        "loc": {
                            "start": {
                                "line": 2,
                                "column": 4,
                                "offset": 38
                            },
                            "end": {
                                "line": 2,
                                "column": 10,
                                "offset": 44
                            }
                        }
(Or even better, just elide "loc" node altogether, just have start/end with either line/column or offset or both).
Comment 14 Alistair Braidwood 2013-01-29 23:28:15 PST
Given that this is inspired by work on esprima, would it be better to use a simpler type rather than an array as that would be easier for the engine to optimise?  I find loc.start.offset more natural and I think it'll perform better.

Is the there a difficulty with removing the "loc:{start:{},end:{},source}" structure to a flatter "start:{},end:{},source" because loc may be null if location tracking is not needed and start, end and source take up more space if promoted to a higher level?

Note You need to log in before you can comment on or make changes to this bug.