Reflect.parse: range-based location info

UNCONFIRMED
Unassigned

Status

()

Core
JavaScript Engine
UNCONFIRMED
5 years ago
3 years ago

People

(Reporter: Ariya Hidayat, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: reflect-parse)

(Reporter)

Description

5 years ago
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.162 Safari/535.19

Steps to reproduce:

The optional location info for each AST node is in the form of column and line. While this is sufficient for user information, such information does not permit easy reference to the original source.

I propose that an optional range-based information is also made available for each node. Esprima (http://esprima.org/) implements this already, this can be easily tried with the live parser demo: http://esprima.org/demo/parse.html (make sure to check "Index-based range").
(Reporter)

Updated

5 years ago
Whiteboard: reflect-parse
(Reporter)

Comment 1

5 years ago
An example output:

{
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "Literal",
                "value": 42,
                "range": [
                    0,
                    1
                ]
            },
            "range": [
                0,
                1
            ]
        }
    ],
    "range": [
        0,
        1
    ]
}
(Reporter)

Comment 2

5 years ago
Kris Kowal suggested that the range end should not include.

For Esprima, I arbitrarily picked the including second number because mathematically it resembles a real interval: [a, b] includes both a and b.

Comment 3

5 years ago
My reasoning is that an inclusive end maps well to String..slice arguments, notwithstanding [a, b) also being valid interval notation :P
(Reporter)

Comment 4

5 years ago
It will be fun if array literal is permitted to use non-symmetric parentheses like that :) I'm sure it's more exciting than just semicolon discussion.

Either inclusive range end or not, I don't have a strong preference.

Another alternative would be [start, length] pair. This may be confusing for the first few nodes (since the numbers are not too different), but it would be obvious for the rest.
I'm with Kris. It's cute to imagine the array notation doubling as range notation, but half-open intervals are widely recognized as The Right Thing, and as Kris says it matches existing practice and libraries in JS.

I'm not sure how likely we are to be able to add range information to Reflect.parse. I'll ask the SpiderMonkey group, but if it's not information we need for SpiderMonkey, they're not going to want to take any performance hit. That said, I'd be happy to add it to the API docs and just say that it's not implemented by Reflect.parse.

Dave
(Reporter)

Comment 6

5 years ago
I'm fine with using half-open intervals, i.e. non-inclusive range end. I can definitely change Esprima rather easily.

Before I go on, do we agree on the property name and value for the information, i.e. 'range' and two-element array, respectively?

As for SpiderMonkey, since the line and colum-based node information is already available, theoretically one can add the range information as a post-processing step, most likely after a second (quick) pass to obtain the starting index of each line, for the mapping purpose. Thus, it can be marked either as "not implemented", post-processed in pure JavaScript (outside SM code), or documented as post-processable. The first choice is always a harmless start :)
(Reporter)

Comment 7

5 years ago
Related Esprima issue:

http://code.google.com/p/esprima/issues/detail?id=247
> Before I go on, do we agree on the property name and value for the
> information, i.e. 'range' and two-element array, respectively?

Does 'range' replace 'loc'? Does this suggest a three-valued option for location information? I don't have an opinion at midnight, but maybe tomorrow with a clearer head I'll have some thoughts.

Dave

Comment 9

5 years ago
In Esprima, "loc" and "range" are independent parser options and create separate properties on each syntax node. "loc" is useful for users whereas "range" is more useful for programs, so it makes sense to support any combination of the options. I think this is sound.
(Reporter)

Comment 10

5 years ago
dherman: Any further thought on this? I'm ready to change Esprima to output half-open interval for the range.
(Reporter)

Comment 11

5 years ago
JFYI, Esprima is now using half-open interval: https://github.com/ariya/esprima/commit/ab42d9fe.
Sorry I didn't comment sooner; this all sounds fine to me.

Dave

Updated

5 years ago
Depends on: 568142
Blocks: 790257

Comment 13

5 years ago
(In reply to kris from comment #9)
> In Esprima, "loc" and "range" are independent parser options and create
> separate properties on each syntax node. "loc" is useful for users whereas
> "range" is more useful for programs, so it makes sense to support any
> combination of the options. I think this is sound.

Why have two different places for location information? ie. why:
                        "range": [
                            38,
                            44
                        ],
                        "loc": {
                            "start": {
                                "line": 2,
                                "column": 4
                            },
                            "end": {
                                "line": 2,
                                "column": 10
                            }
                        }
instead of one place:
                        "loc": {
                            "start": {
                                "line": 2,
                                "column": 4,
                                "offset": 38
                            },
                            "end": {
                                "line": 2,
                                "column": 10,
                                "offset": 44
                            }
                        }
(Or even better, just elide "loc" node altogether, just have start/end with either line/column or offset or both).

Comment 14

4 years ago
Given that this is inspired by work on esprima, would it be better to use a simpler type rather than an array as that would be easier for the engine to optimise?  I find loc.start.offset more natural and I think it'll perform better.

Is the there a difficulty with removing the "loc:{start:{},end:{},source}" structure to a flatter "start:{},end:{},source" because loc may be null if location tracking is not needed and start, end and source take up more space if promoted to a higher level?
(Assignee)

Updated

3 years ago
Assignee: general → nobody
You need to log in before you can comment on or make changes to this bug.