Closed Bug 173067 Opened 22 years ago Closed 19 years ago

Error using '/' unescaped in RegExp literal in character class, unlike IE

Categories

(Core :: JavaScript Engine, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: douglas, Assigned: mrbkap)

References

()

Details

(Whiteboard: [Patch for bug 85721 breaks error reporting])

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2a) Gecko/20020910
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2a) Gecko/20020910

JSLINT (http://www.crockford.com/javascript/jslint.html) uses a long /regex/ to
tokenize JavaScript sources. The main regex is very long. The browser fails to
load the file containing the regex, and gives no error.

Reproducible: Always

Steps to Reproduce:
1. Paste any JavaScript source into the window.
2. Click the JSLINT button.

Actual Results:  
Nothing.

Expected Results:  
Present a report on the conformance of the source.

JSLINT works correctly on IE.
Have to mark this one invalid.

The problem is not the size of the regexp literal, but the use of the "/"
character in the regexp literal. This works in IE, but does not work in
Perl nor in Mozilla/Netscape, and is expressly forbidden by the ECMA-262
specification for the language.

Note that "/" is permitted as a pattern character by ECMA, and you can
create regexp patterns containing this if you use the RegExp constructor.
However, ECMA specifically forbids the use of "/" in RegExp literals.

For details on this, see bug 98306, particularly Comments 31-34.
Also note the Evangelism bug 101070 filed against Microsoft on this.
The ECMA-262 Edition 3 spec is at http://www.mozilla.org/js/language/.
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → INVALID
> The browser fails to load the file containing the regex, and gives no error.

I do see an error. When I load the jslint site in Mozilla 2002093008,
I immediately see this error in Tools > Web Development > JavaScript Console:

Error: unterminated character class [
Source File: http://www.crockford.com/javascript/jslint.js
Line: 4, Column: 45

Source Code:
var i=0,line=0,character=0,s=lines[0];var 
tx=/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00
-\x1f\x27\\])*\x27|[(){}[\],;?:~^]|==?=?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[&
=]?|\|[\|=]?|>>?>?=?|<([/=]|!--|<=?)?|\^=?|\!=?=?|[a-



Here is what I get when I try to define the same regexp literal |tx|
in the standalone JavaScript shell:


js> var 
tx=/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00
-\x1f\x27\\])*\x27|[(){}[\],;?:~^]|==
?=?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[&=]?|\|[\|=]?|>>?>?=?|<([/=]|!--|<=?)?
|\^=?|\!=?=?|[a-zA-Z_$][a-zA-Z0-9_$]*|[0-9]+(
[xX][0-9a-fA-F]+|\.[0-9]*)?([eE][+-]?[0-9]+)?|\.([0-9]+([eE][+-]?[0-9]+))?)/,rx=
/^(\\[^\x00-\x20]|\[(\\[^\x00-\x20]|[^\x00-\x
20\\])+\]|[^\x00-\x20\\\/\[])+\/[gim]*/,lx=/\*\/|\/\*/;

58: SyntaxError: unterminated character class [:
58: var 
tx=/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00
-\x1f\x27\\])*\x27|[(){}[\],;?:~^]|==
?=?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[&=]?|\|[\|=]?|>>?>?=?|<([/=]|!--|<=?)?
|\^=?|\!=?=?|[a-zA-Z_$][a-zA-Z0-9_$]*|[0-9]+(
[xX][0-9a
58: .......^



Here is what I get with the RegExp patch for bug 85721 applied:


js> var 
tx=/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00
-\x1f\x27\\])*\x27|[(){}[\],;?:~^]|==?
=?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[&=]?|\|[\|=]?|>>?>?=?|<([/=]|!--|<=?)?|
\^=?|\!=?=?|[a-zA-Z_$][a-zA-Z0-9_$]*|[0-9]+([x
X][0-9a-fA-F]+|\.[0-9]*)?([eE][+-]?[0-9]+)?|\.([0-9]+([eE][+-]?[0-9]+))?)/,rx=/^
(\\[^\x00-\x20]|\[(\\[^\x00-\x20]|[^\x00-\x20\
\])+\]|[^\x00-\x20\\\/\[])+\/[gim]*/,lx=/\*\/|\/\*/;

7: SyntaxError: unterminated character class *:
7: var 
tx=/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00
-\x1f\x27\\])*\x27|[(){}[\],;?:~^]|==?=
?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[&=]?|\|[\|=]?|>>?>?=?|<([/=]|!--|<=?)?|\
^=?|\!=?=?|[a-zA-Z_$][a-zA-Z0-9_$]*|[0-9]+([xX
][0-9a
7: .......^



Here is what Perl does:

$ perl -e '"abc" =~ 
/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00-\x
1f\x27\\])*\x27|[(){}[\]
,;?:~^]|==?=?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[&=]?|\|[\|=]?|>>?>?=?|<([/=]
|!--|<=?)?|\^=?|\!=?=?|[a-zA-Z_$][a-zA-Z0-9_
$]*|[0-9]+([xX][0-9a-fA-F]+|\.[0-9]*)?([eE][+-]?[0-9]+)?|\.([0-9]+([eE][+-]?[0-9
]+))?)/,rx=/^(\\[^\x00-\x20]|\[(\\[^\x00-\x2
0]|[^\x00-\x20\\])+\]|[^\x00-\x20\\\/\[])+\/[gim]*/,lx=/\*\/|\/\*/;'

Unmatched [ before HERE mark in regex 
m/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00-\
x1f\x27\
\])*\x27|[(){}[\],;?:~^]|==?=?|/[ << HERE *=/ at -e line 1.




It looks like only Perl is recording the correct position of the error.
The contents of the character class following "HERE" is:

                           [*=/]

That's the problem: the use of "/" in the regexp literal.
Thanks for spotting the /... [*=/] .../.

Three observations: 

First, if this is in fact a syntax error, then the browser should report it 
correctly.

Second, I disagree with the interpretation of 15.10.1. It appears to me to 
allow unescaped / in ClassRanges. Only \ ] - expressly require escapement. 
7.8.5 seems the problem here, offering comfort to lazy compiler writers.

Third, any time IE interprets the spec in a more useful way, Mozilla suffers by 
comparison and by de facto incompatibility.
Douglas: thank you for this report. I'm going to reopen this bug as
you suggest, to correct the inaccuracies in our error messages above.

In addition, the patch for bug 85721 is incorrectly stating 

                 unterminated character class *
instead of       unterminated character class [



In order for the site to work in Mozilla, all one has to do is use the
RegExp constructor instead of a regexp literal whenever the character "/"
has to be matched;  e.g. 

replace   var tx =        /^\s*(\x22   etc.  etc.   lx=/\*\/|\/\*/ ;
   with   var tx = RegExp('^\s*(\x22   etc.  etc.   lx=/\*\/|\/\*');




> Second, I disagree with the interpretation of 15.10.1. It appears to me
> to allow unescaped / in ClassRanges. Only \ ] - expressly require escapement. 

Correct. "/" is allowed by ECMA 15.10.1 as a regexp pattern character.
That's why the definition of |tx| via the RegExp constructor works.

But ECMA 7.8.5 specifically prohibits "/" from regexp literals:

7.8.5 Regular Expression Literals 

A regular expression literal is an input element that is converted
to a RegExp object (section 15.10) when it is scanned. 

                   etc. 

RegularExpressionLiteral :: 
            / RegularExpressionBody / RegularExpressionFlags 

RegularExpressionBody :: 
            RegularExpressionFirstChar RegularExpressionChars 

RegularExpressionChars :: 
            [empty] 
            RegularExpressionChars RegularExpressionChar 

RegularExpressionFirstChar :: 
            NonTerminator but not * or \ or / 
            BackslashSequence 

RegularExpressionChar :: 
            NonTerminator but not \ or / 
            BackslashSequence 
  
                   etc.


I will cc Waldemar to consider whether this part of the spec might or
should be changed -
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Status: UNCONFIRMED → NEW
Ever confirmed: true
The ECMA spec is as was intended.  Slashes must be escaped within regular
expression literals but not within strings parsed as regular expressions.  The
same thing applies to other characters such as line breaks.
fwiw, I tested cvs tip w/
attachment 97782 [details] as jsregexp.h and
attachment 113430 [details] as jsregexp.c
and the error reporting is still broken.
(error reporting works correctly w/ cvs tip)
Blocks: RegExpPerf
Whiteboard: [Patch for bug 85721 breaks error reporting]
timeless is right:


----------------------  CURRENT SPIDERMONKEY TIP  -------------------------
js> var tx = /[*=/]/;
10: SyntaxError: unterminated character class [:
10: var tx = /[*=/]/;
10: .........^


----------------  SPIDERMONKEY WITH LATEST 85721 PATCH  -------------------
js> var tx = /[*=/]/;
10: SyntaxError: unterminated character class *:
10: var tx = /[*=/]/;
10: .........^


---------------------------  CURRENT RHINO  -------------------------------
js> var tx = /[*=/]/;
js: "<stdin>", line 10: uncaught JavaScript exception: SyntaxError:
missing ; before statement (<stdin>; line 10)
js: var tx = /[*=/]/;
js: ..............^
Updating Summary to make it more searchable:

from: Long Regular Expressions are handled badly
  to: Using "/" character unescaped in RegExp literals causes an error
Summary: Long Regular Expressions are handled badly → Using "/" character unescaped in RegExp literals causes an error
This seems like two bugs in one:

1.  A bug about the error report not saying exactly what was wrong.  That's
minor and if you look for the character class closing bracket, see it, and
wonder why the engine didn't, you'll probably see the /.  Unless the regexp is
very long! Something to fix, for sure.

2.  An IE parity bug.  This may matter more than 1.

/be
Summary: Using "/" character unescaped in RegExp literals causes an error → Error using '/' unescaped in RegExp literal in character class, unlike IE
A Dutch site for ordering school books fails in version 1.7 on

match1 = /^[0-9][\.][0-9][0-9][0-9][/][0-9][0-9][0-9][0-9][0-9]$/;

(http://www.vandijk.nl/VDSStudents/AddSaleOrder)

The KDE 3.2.3 Konqueror browser is less strict in the interpretation
of the regexp syntax and handles this regexp as intended by the programmer.

In this case, I think compatibility is better than strict obying the syntax rules.

Thanks for the development of this excellent browser!



for reference, we currently treat the ie parity bug as bug 224988 (tech evang)
Assignee: rogerl → general
*** Bug 294145 has been marked as a duplicate of this bug. ***
I'm not entirely certain what everybody is counting as bugs here, but there is
certainly a bug in the error reporting for unclosed character classes since
we're passing a jschar* as a char* (so the output is always one character
long). This patch fixes that problem. I guess we still need to decide whether
or not actually accept this (invalid) regexp the way IE and Konqueror do.

(for the record, I'm against accepting it and all for sticking to this part of
the spec. which seems to be well-specified, if slightly less user-friendly, but
I'll leave it up to Brendan and Shaver to decide on that).
Assignee: general → mrbkap
Status: NEW → ASSIGNED
Attachment #186864 - Flags: review?(brendan)
Comment on attachment 186864 [details] [diff] [review]
fix error reporting

r+a=me, and agreed: this should remain an error.  Thanks,

/be
Attachment #186864 - Flags: review?(brendan)
Attachment #186864 - Flags: review+
Attachment #186864 - Flags: approval1.8b3+
Fix checked in (for the error reporting).
Status: ASSIGNED → RESOLVED
Closed: 22 years ago19 years ago
Resolution: --- → FIXED
Flags: testcase?
Checking in regress-173067.js;
/cvsroot/mozilla/js/tests/js1_5/Regress/regress-173067.js,v  <--  regress-173067.js
initial revision: 1.1
Flags: testcase? → testcase+
#
# behavior changed to match MSIE/Opera/etc
# see bug 309840
#
js1_5/Regress/regress-173067.js is now excluded in spidermonkey-n.tests
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: