Open Bug 745678 Opened 12 years ago Updated 2 years ago

Reflect.parse: range-based location info

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect

Tracking

()

UNCONFIRMED

People

(Reporter: ariya.hidayat, Unassigned)

References

Details

(Whiteboard: reflect-parse)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.162 Safari/535.19

Steps to reproduce:

The optional location info for each AST node is in the form of column and line. While this is sufficient for user information, such information does not permit easy reference to the original source.

I propose that an optional range-based information is also made available for each node. Esprima (http://esprima.org/) implements this already, this can be easily tried with the live parser demo: http://esprima.org/demo/parse.html (make sure to check "Index-based range").
Whiteboard: reflect-parse
An example output:

{
    "type": "Program",
    "body": [
        {
            "type": "ExpressionStatement",
            "expression": {
                "type": "Literal",
                "value": 42,
                "range": [
                    0,
                    1
                ]
            },
            "range": [
                0,
                1
            ]
        }
    ],
    "range": [
        0,
        1
    ]
}
Kris Kowal suggested that the range end should not include.

For Esprima, I arbitrarily picked the including second number because mathematically it resembles a real interval: [a, b] includes both a and b.
My reasoning is that an inclusive end maps well to String..slice arguments, notwithstanding [a, b) also being valid interval notation :P
It will be fun if array literal is permitted to use non-symmetric parentheses like that :) I'm sure it's more exciting than just semicolon discussion.

Either inclusive range end or not, I don't have a strong preference.

Another alternative would be [start, length] pair. This may be confusing for the first few nodes (since the numbers are not too different), but it would be obvious for the rest.
I'm with Kris. It's cute to imagine the array notation doubling as range notation, but half-open intervals are widely recognized as The Right Thing, and as Kris says it matches existing practice and libraries in JS.

I'm not sure how likely we are to be able to add range information to Reflect.parse. I'll ask the SpiderMonkey group, but if it's not information we need for SpiderMonkey, they're not going to want to take any performance hit. That said, I'd be happy to add it to the API docs and just say that it's not implemented by Reflect.parse.

Dave
I'm fine with using half-open intervals, i.e. non-inclusive range end. I can definitely change Esprima rather easily.

Before I go on, do we agree on the property name and value for the information, i.e. 'range' and two-element array, respectively?

As for SpiderMonkey, since the line and colum-based node information is already available, theoretically one can add the range information as a post-processing step, most likely after a second (quick) pass to obtain the starting index of each line, for the mapping purpose. Thus, it can be marked either as "not implemented", post-processed in pure JavaScript (outside SM code), or documented as post-processable. The first choice is always a harmless start :)
Related Esprima issue:

http://code.google.com/p/esprima/issues/detail?id=247
> Before I go on, do we agree on the property name and value for the
> information, i.e. 'range' and two-element array, respectively?

Does 'range' replace 'loc'? Does this suggest a three-valued option for location information? I don't have an opinion at midnight, but maybe tomorrow with a clearer head I'll have some thoughts.

Dave
In Esprima, "loc" and "range" are independent parser options and create separate properties on each syntax node. "loc" is useful for users whereas "range" is more useful for programs, so it makes sense to support any combination of the options. I think this is sound.
dherman: Any further thought on this? I'm ready to change Esprima to output half-open interval for the range.
JFYI, Esprima is now using half-open interval: https://github.com/ariya/esprima/commit/ab42d9fe.
Sorry I didn't comment sooner; this all sounds fine to me.

Dave
Depends on: 568142
(In reply to kris from comment #9)
> In Esprima, "loc" and "range" are independent parser options and create
> separate properties on each syntax node. "loc" is useful for users whereas
> "range" is more useful for programs, so it makes sense to support any
> combination of the options. I think this is sound.

Why have two different places for location information? ie. why:
                        "range": [
                            38,
                            44
                        ],
                        "loc": {
                            "start": {
                                "line": 2,
                                "column": 4
                            },
                            "end": {
                                "line": 2,
                                "column": 10
                            }
                        }
instead of one place:
                        "loc": {
                            "start": {
                                "line": 2,
                                "column": 4,
                                "offset": 38
                            },
                            "end": {
                                "line": 2,
                                "column": 10,
                                "offset": 44
                            }
                        }
(Or even better, just elide "loc" node altogether, just have start/end with either line/column or offset or both).
Given that this is inspired by work on esprima, would it be better to use a simpler type rather than an array as that would be easier for the engine to optimise?  I find loc.start.offset more natural and I think it'll perform better.

Is the there a difficulty with removing the "loc:{start:{},end:{},source}" structure to a flatter "start:{},end:{},source" because loc may be null if location tracking is not needed and start, end and source take up more space if promoted to a higher level?
Assignee: general → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.