Mistreatment of end-of-line and semicolon



14 years ago
14 years ago


(Reporter: Igor Bukanov, Assigned: Igor Bukanov)




(2 attachments)



14 years ago
Rhino parser mistreats end-of-line and semicolon in several places:

1. It consumes ';' after '}' as a part of the statement. For example, it allows
the following code:

if (condition) 
	while (condition2) { };

while in fact it should be rejected since while (condition2) { }; constitutes 2
statements, not one.

2. It does not allow for postfix ++ and -- to follow multi-line expression. For
example, the following valid code is rejected:


3. It does not generate SyntaxError if a statement that requires terminating
semicolon spans several lines does not have one and is followed by another
statement on the same line and auto-insert semicolon before the second
statement. I.e. while it correctly detects invalid code like:

1 + 2 3

it does not generate SyntaxError for
1 +
2 3

Comment 1

14 years ago
Created attachment 156779 [details]
Test case for shell

Comment 2

14 years ago
Created attachment 156784 [details] [diff] [review]
Fix: refactoring of parser/tokenizer iteration

To fix the bug I decided to refactor the parsing and tokenizing interaction in
the following way:

1. TokenStream is no longer responsible for storing pushed back token and
always read a token and return it. In addition TokenStream always return

To compensate for this Parser gets few method like peekToken, consume token
etc. In addition Parser merge Token.EOL* token sequence into single
token|TI_AFTER_EOL where TI_AFTER_EOL flag provides sensitivity for EOL. In
this way it became easier to track all eol issues and the end result is simpler
code in the parser that handles the test case correctly.

2. The patch follows the SpiderMonkey practice and changes the parser to check
for well terminated statement in the single place at the end of
statementHelper. All statements that do not require the terminating semicolon
simply return from the function while those that do fall through to the
semicolon checks. In this way semicolon is consumed as the part of the
statement only if it is required.

In addition the code always autoinsert the semicolon after do-while loop to
follow SpiderMonkey (see bug 238945).

3. The patch fixes mishandling of labeled blocks for decompiler. Now they are
correctly preserved in the output source. 

4. The patch makes TokenStream package private and moves isJSLineTerminator to
ScriptRuntime as public method that regexp parser can use.

5. The patch adds explicit ASSIGN_LSH, ASSIGN_ADD etc. tokens replacing
ASSIGNOP+op pair as it reduces code complexity.

Comment 3

14 years ago
Test case run before the patch:

FAILED: Postfix ++ and -- should be applicable to multiline operand like       (i
FAILED: Check for missed semicolon after the statement should be done even if
the statemen itself spans several lines and the following should produce
        2 3
FAILED: Semicolon after } should not be treated as a part of statement and the
following should produce SyntaxError:
if (1)                                                     
        while (0) { };                                     
else ;                                                     

After the patch:

Comment 4

14 years ago
I committed the patch
Last Resolved: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.5R6
You need to log in before you can comment on or make changes to this bug.