Closed
Bug 173067
Opened 22 years ago
Closed 20 years ago
Error using '/' unescaped in RegExp literal in character class, unlike IE
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
VERIFIED
FIXED
People
(Reporter: douglas, Assigned: mrbkap)
References
()
Details
(Whiteboard: [Patch for bug 85721 breaks error reporting])
Attachments
(2 files)
755 bytes,
text/html
|
Details | |
2.04 KB,
patch
|
brendan
:
review+
brendan
:
approval1.8b3+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2a) Gecko/20020910 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.2a) Gecko/20020910 JSLINT (http://www.crockford.com/javascript/jslint.html) uses a long /regex/ to tokenize JavaScript sources. The main regex is very long. The browser fails to load the file containing the regex, and gives no error. Reproducible: Always Steps to Reproduce: 1. Paste any JavaScript source into the window. 2. Click the JSLINT button. Actual Results: Nothing. Expected Results: Present a report on the conformance of the source. JSLINT works correctly on IE.
Comment 1•22 years ago
|
||
Have to mark this one invalid. The problem is not the size of the regexp literal, but the use of the "/" character in the regexp literal. This works in IE, but does not work in Perl nor in Mozilla/Netscape, and is expressly forbidden by the ECMA-262 specification for the language. Note that "/" is permitted as a pattern character by ECMA, and you can create regexp patterns containing this if you use the RegExp constructor. However, ECMA specifically forbids the use of "/" in RegExp literals. For details on this, see bug 98306, particularly Comments 31-34. Also note the Evangelism bug 101070 filed against Microsoft on this. The ECMA-262 Edition 3 spec is at http://www.mozilla.org/js/language/.
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → INVALID
Comment 2•22 years ago
|
||
> The browser fails to load the file containing the regex, and gives no error. I do see an error. When I load the jslint site in Mozilla 2002093008, I immediately see this error in Tools > Web Development > JavaScript Console: Error: unterminated character class [ Source File: http://www.crockford.com/javascript/jslint.js Line: 4, Column: 45 Source Code: var i=0,line=0,character=0,s=lines[0];var tx=/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00 -\x1f\x27\\])*\x27|[(){}[\],;?:~^]|==?=?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[& =]?|\|[\|=]?|>>?>?=?|<([/=]|!--|<=?)?|\^=?|\!=?=?|[a- Here is what I get when I try to define the same regexp literal |tx| in the standalone JavaScript shell: js> var tx=/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00 -\x1f\x27\\])*\x27|[(){}[\],;?:~^]|== ?=?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[&=]?|\|[\|=]?|>>?>?=?|<([/=]|!--|<=?)? |\^=?|\!=?=?|[a-zA-Z_$][a-zA-Z0-9_$]*|[0-9]+( [xX][0-9a-fA-F]+|\.[0-9]*)?([eE][+-]?[0-9]+)?|\.([0-9]+([eE][+-]?[0-9]+))?)/,rx= /^(\\[^\x00-\x20]|\[(\\[^\x00-\x20]|[^\x00-\x 20\\])+\]|[^\x00-\x20\\\/\[])+\/[gim]*/,lx=/\*\/|\/\*/; 58: SyntaxError: unterminated character class [: 58: var tx=/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00 -\x1f\x27\\])*\x27|[(){}[\],;?:~^]|== ?=?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[&=]?|\|[\|=]?|>>?>?=?|<([/=]|!--|<=?)? |\^=?|\!=?=?|[a-zA-Z_$][a-zA-Z0-9_$]*|[0-9]+( [xX][0-9a 58: .......^ Here is what I get with the RegExp patch for bug 85721 applied: js> var tx=/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00 -\x1f\x27\\])*\x27|[(){}[\],;?:~^]|==? =?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[&=]?|\|[\|=]?|>>?>?=?|<([/=]|!--|<=?)?| \^=?|\!=?=?|[a-zA-Z_$][a-zA-Z0-9_$]*|[0-9]+([x X][0-9a-fA-F]+|\.[0-9]*)?([eE][+-]?[0-9]+)?|\.([0-9]+([eE][+-]?[0-9]+))?)/,rx=/^ (\\[^\x00-\x20]|\[(\\[^\x00-\x20]|[^\x00-\x20\ \])+\]|[^\x00-\x20\\\/\[])+\/[gim]*/,lx=/\*\/|\/\*/; 7: SyntaxError: unterminated character class *: 7: var tx=/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00 -\x1f\x27\\])*\x27|[(){}[\],;?:~^]|==?= ?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[&=]?|\|[\|=]?|>>?>?=?|<([/=]|!--|<=?)?|\ ^=?|\!=?=?|[a-zA-Z_$][a-zA-Z0-9_$]*|[0-9]+([xX ][0-9a 7: .......^ Here is what Perl does: $ perl -e '"abc" =~ /^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00-\x 1f\x27\\])*\x27|[(){}[\] ,;?:~^]|==?=?|\/[*=/]?|\*[/=]?|\+[+=]?|-[-=]?|%=?|&[&=]?|\|[\|=]?|>>?>?=?|<([/=] |!--|<=?)?|\^=?|\!=?=?|[a-zA-Z_$][a-zA-Z0-9_ $]*|[0-9]+([xX][0-9a-fA-F]+|\.[0-9]*)?([eE][+-]?[0-9]+)?|\.([0-9]+([eE][+-]?[0-9 ]+))?)/,rx=/^(\\[^\x00-\x20]|\[(\\[^\x00-\x2 0]|[^\x00-\x20\\])+\]|[^\x00-\x20\\\/\[])+\/[gim]*/,lx=/\*\/|\/\*/;' Unmatched [ before HERE mark in regex m/^\s*(\x22(\\[^\x00-\x1f]|[^\x00-\x1f\x22\\])*\x22|\x27(\\[^\x00-\x1f]|[^\x00-\ x1f\x27\ \])*\x27|[(){}[\],;?:~^]|==?=?|/[ << HERE *=/ at -e line 1. It looks like only Perl is recording the correct position of the error. The contents of the character class following "HERE" is: [*=/] That's the problem: the use of "/" in the regexp literal.
Comment 3•22 years ago
|
||
Reporter | ||
Comment 4•22 years ago
|
||
Thanks for spotting the /... [*=/] .../. Three observations: First, if this is in fact a syntax error, then the browser should report it correctly. Second, I disagree with the interpretation of 15.10.1. It appears to me to allow unescaped / in ClassRanges. Only \ ] - expressly require escapement. 7.8.5 seems the problem here, offering comfort to lazy compiler writers. Third, any time IE interprets the spec in a more useful way, Mozilla suffers by comparison and by de facto incompatibility.
Comment 5•22 years ago
|
||
Douglas: thank you for this report. I'm going to reopen this bug as you suggest, to correct the inaccuracies in our error messages above. In addition, the patch for bug 85721 is incorrectly stating unterminated character class * instead of unterminated character class [ In order for the site to work in Mozilla, all one has to do is use the RegExp constructor instead of a regexp literal whenever the character "/" has to be matched; e.g. replace var tx = /^\s*(\x22 etc. etc. lx=/\*\/|\/\*/ ; with var tx = RegExp('^\s*(\x22 etc. etc. lx=/\*\/|\/\*'); > Second, I disagree with the interpretation of 15.10.1. It appears to me > to allow unescaped / in ClassRanges. Only \ ] - expressly require escapement. Correct. "/" is allowed by ECMA 15.10.1 as a regexp pattern character. That's why the definition of |tx| via the RegExp constructor works. But ECMA 7.8.5 specifically prohibits "/" from regexp literals: 7.8.5 Regular Expression Literals A regular expression literal is an input element that is converted to a RegExp object (section 15.10) when it is scanned. etc. RegularExpressionLiteral :: / RegularExpressionBody / RegularExpressionFlags RegularExpressionBody :: RegularExpressionFirstChar RegularExpressionChars RegularExpressionChars :: [empty] RegularExpressionChars RegularExpressionChar RegularExpressionFirstChar :: NonTerminator but not * or \ or / BackslashSequence RegularExpressionChar :: NonTerminator but not \ or / BackslashSequence etc. I will cc Waldemar to consider whether this part of the spec might or should be changed -
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Updated•22 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 6•22 years ago
|
||
The ECMA spec is as was intended. Slashes must be escaped within regular expression literals but not within strings parsed as regular expressions. The same thing applies to other characters such as line breaks.
fwiw, I tested cvs tip w/ attachment 97782 [details] as jsregexp.h and attachment 113430 [details] as jsregexp.c and the error reporting is still broken. (error reporting works correctly w/ cvs tip)
Blocks: RegExpPerf
Whiteboard: [Patch for bug 85721 breaks error reporting]
Comment 8•22 years ago
|
||
timeless is right: ---------------------- CURRENT SPIDERMONKEY TIP ------------------------- js> var tx = /[*=/]/; 10: SyntaxError: unterminated character class [: 10: var tx = /[*=/]/; 10: .........^ ---------------- SPIDERMONKEY WITH LATEST 85721 PATCH ------------------- js> var tx = /[*=/]/; 10: SyntaxError: unterminated character class *: 10: var tx = /[*=/]/; 10: .........^ --------------------------- CURRENT RHINO ------------------------------- js> var tx = /[*=/]/; js: "<stdin>", line 10: uncaught JavaScript exception: SyntaxError: missing ; before statement (<stdin>; line 10) js: var tx = /[*=/]/; js: ..............^
Comment 9•21 years ago
|
||
Updating Summary to make it more searchable: from: Long Regular Expressions are handled badly to: Using "/" character unescaped in RegExp literals causes an error
Summary: Long Regular Expressions are handled badly → Using "/" character unescaped in RegExp literals causes an error
Comment 10•21 years ago
|
||
This seems like two bugs in one: 1. A bug about the error report not saying exactly what was wrong. That's minor and if you look for the character class closing bracket, see it, and wonder why the engine didn't, you'll probably see the /. Unless the regexp is very long! Something to fix, for sure. 2. An IE parity bug. This may matter more than 1. /be
Summary: Using "/" character unescaped in RegExp literals causes an error → Error using '/' unescaped in RegExp literal in character class, unlike IE
Comment 11•21 years ago
|
||
A Dutch site for ordering school books fails in version 1.7 on match1 = /^[0-9][\.][0-9][0-9][0-9][/][0-9][0-9][0-9][0-9][0-9]$/; (http://www.vandijk.nl/VDSStudents/AddSaleOrder) The KDE 3.2.3 Konqueror browser is less strict in the interpretation of the regexp syntax and handles this regexp as intended by the programmer. In this case, I think compatibility is better than strict obying the syntax rules. Thanks for the development of this excellent browser!
Comment 12•20 years ago
|
||
for reference, we currently treat the ie parity bug as bug 224988 (tech evang)
Assignee: rogerl → general
Comment 13•20 years ago
|
||
*** Bug 294145 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 14•20 years ago
|
||
I'm not entirely certain what everybody is counting as bugs here, but there is certainly a bug in the error reporting for unclosed character classes since we're passing a jschar* as a char* (so the output is always one character long). This patch fixes that problem. I guess we still need to decide whether or not actually accept this (invalid) regexp the way IE and Konqueror do. (for the record, I'm against accepting it and all for sticking to this part of the spec. which seems to be well-specified, if slightly less user-friendly, but I'll leave it up to Brendan and Shaver to decide on that).
Comment 15•20 years ago
|
||
Comment on attachment 186864 [details] [diff] [review] fix error reporting r+a=me, and agreed: this should remain an error. Thanks, /be
Attachment #186864 -
Flags: review?(brendan)
Attachment #186864 -
Flags: review+
Attachment #186864 -
Flags: approval1.8b3+
Assignee | ||
Comment 16•20 years ago
|
||
Fix checked in (for the error reporting).
Status: ASSIGNED → RESOLVED
Closed: 22 years ago → 20 years ago
Resolution: --- → FIXED
Updated•19 years ago
|
Flags: testcase?
Comment 17•19 years ago
|
||
Checking in regress-173067.js; /cvsroot/mozilla/js/tests/js1_5/Regress/regress-173067.js,v <-- regress-173067.js initial revision: 1.1
Flags: testcase? → testcase+
Comment 18•18 years ago
|
||
# # behavior changed to match MSIE/Opera/etc # see bug 309840 # js1_5/Regress/regress-173067.js is now excluded in spidermonkey-n.tests
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•