Closed Bug 336376 Opened 16 years ago Closed 16 years ago

Don't reserve identifiers in property identifier contexts

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.8.1alpha1

People

(Reporter: brendan, Assigned: igor)

References

Details

(Keywords: verified1.8.1)

Attachments

(3 files, 7 obsolete files)

This is a proposed change for ECMA-262 Edition 4.  The suggestion comes from Douglas Crockford.  Instead of reserving identifiers unconditionally, the lexer would reserve outside of contexts that must be property names, specifically

* on the right of . and .. operators
* before the : in a property initialiser within an object initialiser

This would have resolved longstanding complaints about the need to write f['delete']() to delete the filesystem node represented by a Java file object denoted by f.  It will enable ES4/JS2 to follow PEP-342 by supporting g.throw(e) for a generator denoted by g.

Lars Hansen of Opera pointed out something to avoid at the last ECMA TG1 face to face meeting: a missing identifier error such as

  x = o.
  if (c)

should not be parsed as x = o.if(c).  I propose we avoid this by including same-line feedback already needed for automatic semicolon insertion.

/be
Blocks: geniter
(In reply to comment #0)
> Lars Hansen of Opera pointed out something to avoid at the last ECMA TG1 face
> to face meeting: a missing identifier error such as
> 
>   x = o.
>   if (c)
> 
> should not be parsed as x = o.if(c).

Why is this more problemetic then the following?

  x = o.
  y = p;

> I propose we avoid this by including
> same-line feedback already needed for automatic semicolon insertion.

BTW, I saw quite a few JS lines like:

var x = document.getElementById("SomeLongName").
         someDomApiToLookForChild().
           css.someProperty;

Hei, I even wrote such constructions on my own. So the question is how one can split such lines if "." would require the property name to be on the same line?
(In reply to comment #1)
> (In reply to comment #0)
> > Lars Hansen of Opera pointed out something to avoid at the last ECMA TG1 face
> > to face meeting: a missing identifier error such as
> > 
> >   x = o.
> >   if (c)
> > 
> > should not be parsed as x = o.if(c).
> 
> Why is this more problemetic then the following?
> 
>   x = o.
>   y = p;

Since y is not a reserved identifier, but if is currently, the above will work thanks to automatic semicolon insertion, but if you change y to if, it will fail. This bug could be interpreted to mean: make the above work with y changed to if. That's not what we want -- we want an error.

> > I propose we avoid this by including
> > same-line feedback already needed for automatic semicolon insertion.
> 
> BTW, I saw quite a few JS lines like:
> 
> var x = document.getElementById("SomeLongName").
>          someDomApiToLookForChild().
>            css.someProperty;
> 
> Hei, I even wrote such constructions on my own. So the question is how one can
> split such lines if "." would require the property name to be on the same line?

Oh, for sure -- the proposal is not to require the right operand of . to be on the same line -- it's only to unreserve identifiers if it is.  Or to do something better than that, so long as it doesn't leave ambiguities, degrade error messages for real errors, or require more than 1 or 2 tokens of lookahead.

/be
(In reply to comment #2)
> (In reply to comment #1)
> > (In reply to comment #0)
> > > Lars Hansen of Opera pointed out something to avoid at the last ECMA TG1 face
> > > to face meeting: a missing identifier error such as
> > > 
> > >   x = o.
> > >   if (c)
> > > 
> > > should not be parsed as x = o.if(c).
> > 
> > Why is this more problemetic then the following?
> > 
> >   x = o.
> >   y = p;
> 
> Since y is not a reserved identifier, but if is currently, the above will work
> thanks to automatic semicolon insertion,

IOW, it will be parsed as 'x = o.y = p;', a valid nested assignment expression statement.

> but if you change y to if, it will fail.

Viz,

js> x = o.
if (c)
typein:2: SyntaxError: syntax error:
typein:2: if (c)
typein:2: ^

> Or to do
> something better than that, so long as it doesn't leave ambiguities, degrade
> error messages for real errors, or require more than 1 or 2 tokens of
> lookahead.

Igor, counter-proposals?  Now is the time to propose and experiment.  If you would like to take this bug, please do.

/be
(In reply to comment #3)
> > >   x = o.
> > >   y = p;
> > 
> > Since y is not a reserved identifier, but if is currently, the above will work
> > thanks to automatic semicolon insertion,
> 
> IOW, it will be parsed as 'x = o.y = p;', a valid nested assignment expression
> statement.

I just wanted to point out that mistypes like writing '.' instead of ';' in the present language can already produce bogus while syntactically correct code. So why making efforts to protect against

  x = o.
  if (c)

while ignoring

  x = o.
  y = p;

?

> > Or to do
> > something better than that, so long as it doesn't leave ambiguities, degrade
> > error messages for real errors, or require more than 1 or 2 tokens of
> > lookahead.
> 
> Igor, counter-proposals?

Something like a warning mode that looks at the identation level and notifies about strange usage of '.' at the end of line? I just do no see a reason to prevent one to write:

longLineToGetFileName.
  delete();


> If you would like to take this bug, please do.

Right, lets have some code.
Assignee: general → igor.bukanov
(In reply to comment #4)
> I just wanted to point out that mistypes like writing '.' instead of ';' in the
> present language can already produce bogus while syntactically correct code. So
> why making efforts to protect against
> 
>   x = o.
>   if (c)
> 
> while ignoring
> 
>   x = o.
>   y = p;
> 
> ?

Mainly to preserve quality of diagnostics, and at the limit, backward compatibility.  Imagine someone had such a typo as

x = o.
if (c)
 . . .

and for some reason depended on the SyntaxError.  If JS2 accepts this all of a sudden, rockets could crash.

> Something like a warning mode that looks at the identation level and notifies
> about strange usage of '.' at the end of line? I just do no see a reason to
> prevent one to write:
> 
> longLineToGetFileName.
>   delete();

Hmm, indentation heuristics in the scanner, I'm scared.  ECMA TG1 members may vomit this one back with "rejected" stamped to it.  But you are right that we may end up wanting to avoid same-line heuristics too.  The end-point of simplifying is to unreserve in syntactic context, ignoring lexical feedback, and just have to risk incompatibility -- or else require an explicit version selection to enable the feature.  We are trying to avoid that.

/be
(In reply to comment #0)
> This is a proposed change for ECMA-262 Edition 4.  The suggestion comes from
> Douglas Crockford.  Instead of reserving identifiers unconditionally, the lexer
> would reserve outside of contexts that must be property names, specifically
> 
> * on the right of . and .. operators
> * before the : in a property initialiser within an object initialiser

  * on the right of the :: operator where the right operand is an identifier

/be
Attached patch First implementation attempt (obsolete) — Splinter Review
Here is the first attempt to implement context-sensitive keywords. 

The patch introduces 2 new token stream flags, TSF_PROPERTY and TSF_AFTER_DOT. The first is used to mark context that follows '::', '..' or is a property name in object literal. The second marks that token follows '.'. 

These 2 flags are necessary to allow to use function:: extension as prefix for method names after '.'. That is, the patch assumes that in cases like:

xml..function
ns::function
{ function: 0 }

function should be treated as property name while

xml.function::children

should refer to the 'children' method of xml, and not to the element with local name 'children' in the namespace referred by 'function' variable.

The patch does not have JS version checks and does not deal with the issue like:

a = o.
if (condition)
(In reply to comment #5)
> Hmm, indentation heuristics in the scanner, I'm scared.  ECMA TG1 members may
> vomit this one back with "rejected" stamped to it.

I do not suggest indentation-sensitive-parser! In fact, this is my biggest dislike of Python. What exactly is indentation level in presence of various control/formatting characters from the full Unicode zoo?

The suggestion is to have a tool that would allow to report potentially bogus code assuming a sane indentation of the source using ASCII blnaks and 8-char horizontal tabs. 
In the view of the current proposal any suggestions how to deal with

xml.function::children  ?

The patch above makes 'function' keyword exceptional as the single keyword that is not allowed after '.'.  But that is just one possibility. Perhaps it is better not to allow keywords as namespace prefixes unless keyword:: has a defined meaning? Then one can still write:

xml.function and xml.try but not xml.try::name 
Status: NEW → ASSIGNED
(In reply to comment #9)
> In the view of the current proposal any suggestions how to deal with
> 
> xml.function::children  ?
> 
> The patch above makes 'function' keyword exceptional as the single keyword that
> is not allowed after '.'.  But that is just one possibility. Perhaps it is
> better not to allow keywords as namespace prefixes unless keyword:: has a
> defined meaning? Then one can still write:
> 
> xml.function and xml.try but not xml.try::name 

This does sound better to me -- then we just need a bit in jskeyword.tbl to say which keywords have defined meaning as namespace identifiers.  ES4 may add more: public, private, internal, and possibly intrinsic.

Thanks for the patch, it looks good at a glance -- could you furnish a diff -w version?  I'll review over the weekend in greater detail.

/be
Attached patch Implementation v2 (obsolete) — Splinter Review
Here is a version that allow to use keywords as property names in object literals, after '::', '.', '..' unless the keyword is followed by '::' forming namespace prefix. That is, the following is valid:

// Construct object with 2 properties, 'function' and 'try'
var obj = { function: 1, try: 2 }  

// Refer to a xml child with name given by QName(ns, 'function') 
xml.ns::function  

// Refer to the properties 'function' and 'try' 
obj.function
obj.try

// Refer to method children of xml object
xml.function::children

While the following generates a syntax error:

xml.try::name

The patch does not check for JS language version in any way. Should it?
Attachment #220919 - Attachment is obsolete: true
Attachment #221104 - Flags: review?(brendan)
Attached patch diff -w of Implementation v2 (obsolete) — Splinter Review
Attached patch Implementation v3 (obsolete) — Splinter Review
This is the previous patch + fixes to allow to compile with JS_VERSION=150. Note the patch allows to use keywords for property names with this compilation define as well.
Attachment #221104 - Attachment is obsolete: true
Attachment #221105 - Attachment is obsolete: true
Attachment #221159 - Flags: review?(brendan)
Attachment #221104 - Flags: review?(brendan)
Attached patch diff -U8 -w of Implementation v3 (obsolete) — Splinter Review
Comment on attachment 221159 [details] [diff] [review]
Implementation v3

Great!  Nits only:

1. Avoid a jsfun.c patch via #define js_IsKeyword...?
2. Might want to comment on how firstTokenFlags affects PrimaryExpr only in cases where the caller has not peeked or done a get/unget -- which are the relevant cases for TSF_KEYWORD_IS_NAME, but not for all other TSF_ flags.
3. jsproto.tbl really needs to #include "jsconfig.h" and use even more #if tests, and therefore include a code member for the enumerator value, so it does not vary and break XDR compatibility.

My patch in bug 326466 fixes item 3, so if you are game, please incorporate the subset of that patch that makes sense, or else just leave jsproto.tbl unchanged for now and I'll get it.

Thanks for doing this.  I'll specify it for ECMA TG1 and see what they say.

/be
Attachment #221159 - Flags: review?(brendan) → review+
Attached patch Implementation v4 (obsolete) — Splinter Review
Here is a version to address the issues from comment 15:

1. PrimaryExpr delegates to its caller the responsibility of getting the token. The previous version that relied on flags was too fuzzy even with comments.

2. jssscan.h defines js_IsKeyword as macro to avoid touching jsfun.c

3. The patch does not touch jsprtoto.tbl
Attachment #221159 - Attachment is obsolete: true
Attachment #221160 - Attachment is obsolete: true
Attachment #221287 - Flags: review?(brendan)
Attached patch diff -U8 -w of Implementation v4 (obsolete) — Splinter Review
Comment on attachment 221287 [details] [diff] [review]
Implementation v4

Nits only:

>@@ -1282,23 +1283,20 @@
>         }
>         UngetChar(ts, c);
> 
>+        /*
>+         * Check for keywords unless we saw Unicode escape or parser asks
>+         * to ignore keywords.
>+         */
>         if (!hadUnicodeEscape &&
>+            (ts->flags & TSF_KEYWORD_IS_NAME) == 0 &&

!(ts->flags & TSF_KEYWORD_IS_NAME) is more typical style, seems to read better to me as a boolean expression.

>+++ src/jsscan.h	2006-05-08 09:54:19.000000000 +0200
>@@ -272,6 +272,9 @@
>  */
> #define TSF_IN_HTML_COMMENT 0x2000
> 
>+/* Ignore keywords and return TOK_NAME instead to the parser. */
>+#define TSF_KEYWORD_IS_NAME  0x4000

Line up 0x4000 with 0x2000?

>+#define js_IsKeyword(chars, length) (js_CheckKeyword(chars, length) != TOK_EOF)

Columns 80 and above are off-limits ;-).

r=me with nits picked. Thanks again,

/be
Attachment #221287 - Flags: review?(brendan) → review+
Attached patch Patch to commitSplinter Review
Here is the patch to commit with addressed
Attachment #221287 - Attachment is obsolete: true
Attachment #221288 - Attachment is obsolete: true
I committed the patch from comment 19 but my curiosity prevents from marking bug as fixed and queries about function names. 

AFAICS keywords in function names can be ignored unless one wants to implement that function foo.bar() {} extension from JScript. I.e. any reasons not to allow:

function delete() { } 

given that one can refer to the function as function::delete?
I rverted the commit as the tree was closed :(
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
I did not mean to mark as fixed!
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #20)
> I committed the patch from comment 19 but my curiosity prevents from marking
> bug as fixed and queries about function names. 
> 
> AFAICS keywords in function names can be ignored unless one wants to implement
> that function foo.bar() {} extension from JScript. I.e. any reasons not to
> allow:
> 
> function delete() { } 
> 
> given that one can refer to the function as function::delete?

Is that better than this['delete'] or whatever object works, if this does not?

The use of function as a built-in namespace is not ECMA-sanctioned, and it if were, the use of a namespace qualifier to avoid keyword ambiguity seems strained, compared to this bug's unambiguous (excluding errors involving missing tokens) desire to use keywords after dot and similar operators.

So I'm not as motivated.  How about a separate bug?

/be

Igor, could you please land the patch to commit on JS_1_7_ALPHA_BRANCH?  Thanks.

/be
The branch has changes that moved declaration from the top-level of PrimaryExpr into local blocks. Is this a policy change? In the updated version I tried to follow it.
(In reply to comment #24)
> Igor, could you please land the patch to commit on JS_1_7_ALPHA_BRANCH? 

Done.
I committed this to the js1.7a branch.

/be
Forgot to note that toolkit/mozapps/extensions/src/nsExtensionManager.js uses sharp variables.

/be
Marking this bug as fixed (as the patch is checked into the trunk), we should continue the discussion about |function delete()| in bug 343675, which I just filed.
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
Keywords: fixed1.8.1
Need some valid and invalid examples to test. 
Blocks: 344005
Flags: in-testsuite+
forgot to mention:    Repository revision: 1.1     /cvsroot/mozilla/js/tests/js1_7/lexical/regress-336376-01.js,v

verified fixed 1.8.1, trunk 20060723 win/linux/mac(ppc|tel)
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.