Bug 100199: [], [^] are valid RegExp conditions
Opened 23 years ago; closed 23 years ago
Status: VERIFIED FIXED
Categories: Core :: JavaScript Engine, defect
People: Reporter: pschwartau, Assigned: rogerl
Details: Keywords: js1.5, Whiteboard: [Needs fix in Rhino, too]
Attachments (2 files)
369 bytes, patch | brendan: superreview+
1.03 KB, patch | rginda: review+, brendan: superreview+
Getting syntax errors when I try the regexp /[]/ in SpiderMonkey or Rhino:

IN SPIDERMONKEY:
js> 'xyz'.match(/[]/)
9: unterminated character class [:
9: 'xyz'.match(/[]/)
9: ............^

IN RHINO:
js> 'xyz'.match(/[]/)
js: uncaught JavaScript exception: SyntaxError: Unterminated parenthetical []

The empty character class [] is a valid RegExp construct: it is the condition that a character belong to the set containing no characters. As such, no character can match this condition, and any RegExp containing [] should produce a null match. But it should not produce a syntax error.
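A minimal sketch of the behavior the report asks for, assuming a current ECMAScript engine (Node.js, for example) in which /[]/ parses. The helper name `matchesAnywhere` is hypothetical and exists only for this illustration:

```javascript
// Hypothetical helper: true if the pattern matches anywhere in the input.
function matchesAnywhere(pattern, input) {
  return pattern.exec(input) !== null;
}

// [] denotes the empty set of characters, so no character can satisfy it.
const emptyClass = /[]/;

console.log('xyz'.match(emptyClass));            // null
console.log(matchesAnywhere(emptyClass, 'xyz')); // false
console.log(matchesAnywhere(emptyClass, ''));    // false
```

The point is that the literal compiles without a SyntaxError and simply never matches.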
Comment 1 • 23 years ago (Reporter)
Similarly, the condition [^] is a valid RegExp construct. ANY character should match this condition. Works properly in SpiderMonkey, but not Rhino:

IN SPIDERMONKEY:
js> 'abc'.match(/[^]/)
a

IN RHINO:
js> 'abc'.match(/[^]/)
uncaught JavaScript exception: SyntaxError: Unterminated parenthetical [^]
Keywords: js1.5
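As a hedged illustration of the [^] case, again assuming a current ECMAScript engine: [^] is the complement of the empty set, so it accepts any single character, including line terminators that `.` rejects when the `s` flag is absent. The variable name `anyChar` is an assumption of this sketch:

```javascript
// [^] is "any character not in the empty set", i.e. any single character.
const anyChar = /[^]/;

console.log('abc'.match(anyChar)[0]);          // "a"
console.log('\n'.match(anyChar)[0] === '\n');  // true: line terminators match too
console.log(/./.test('\n'));                   // false: "." skips line terminators
console.log(''.match(anyChar));                // null: there is no character to test
```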
Updated • 23 years ago (Reporter)
Whiteboard: [Needs fix in Rhino, too]
Comment 2 • 23 years ago (Reporter)
Updating summary -
Summary: The empty character class [] is a valid RegExp condition → [], [^] are valid RegExp conditions
Comment 3 • 23 years ago (Reporter)
Testcase added to JS test suite: mozilla/js/tests/ecma_3/RegExp/regress-100199.js
Comment 4 • 23 years ago (Assignee)
Comment 5 • 23 years ago (Reporter)
cc'ing reviewers -
Updated • 23 years ago (Assignee)
Status: NEW → ASSIGNED
Comment 6 • 23 years ago
r=khanson
Comment 7 • 23 years ago
Comment on attachment 53631: re-adjust pointer increment to allow empty bracket
cvs diff -u (use -10 or more for more context) next time, please.
sr=brendan@mozilla.org
/be
Attachment #53631 - Flags: superreview+
Comment 8 • 23 years ago (Assignee)
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 9 • 23 years ago (Reporter)
Verified FIXED. The standalone JS testcase now passes in both the debug and optimized JS shells on WinNT, Linux, and Mac9.1.
Status: RESOLVED → VERIFIED
Comment 10 • 23 years ago
This patch seems to have introduced a regression. The character "]" now matches every character class...

js> "]".match(/[a]/);
]
js> "]".match(/a/);
null

Both expressions should return null. See also bug 113921.
Comment 11 • 23 years ago (Reporter)
Have added more cases to the existing testcase, to cover strings like '[', ']', etc. Note that Rob's example can be reduced even further:

js> "]".match(/[]/);
]       <-- NO! No character can satisfy [] condition
js> "]".match(/[^]/);
null    <-- NO! Every character can satisfy [^] condition
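The cases discussed in the two comments above can be collected into a small table-driven check. This is a sketch under the assumption of a current ECMAScript engine; it is not the content of regress-100199.js, and the names `cases` and `firstMatch` are invented here:

```javascript
// Hypothetical helper: text of the first match, or null when nothing matches.
function firstMatch(re, input) {
  const m = re.exec(input);
  return m === null ? null : m[0];
}

// Expected results follow the reasoning in the thread: "]" and "[" are
// ordinary characters as far as [], [^] and [a] are concerned.
const cases = [
  { re: /[a]/, input: ']', expected: null },
  { re: /a/,   input: ']', expected: null },
  { re: /[]/,  input: ']', expected: null },
  { re: /[^]/, input: ']', expected: ']' },
  { re: /[]/,  input: '[', expected: null },
  { re: /[^]/, input: '[', expected: '[' },
];

const failures = cases.filter(c => firstMatch(c.re, c.input) !== c.expected);
console.log(failures.length === 0 ? 'all cases pass' : failures);
```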
Comment 12 • 23 years ago
"d:\Program Files\Perl\bin\Perl.exe" -e"print 'a' if 'a'=~m/[^]/"
/[^]/: unmatched [] in regexp at -e line 1.

I know Perl doesn't count for anything, I'm off to find a better regexp manual.
Comment 13 • 23 years ago
Perl *is* my regexp manual :-)
Comment 14 • 23 years ago
the following tests are worth nothing:

nc4 also claims that /[^]/ is not a valid regexp:
JavaScript Error: unterminated character class [ input

ie5.5 claims that /[^]/ is not a valid regexp:
Error: Expected ']' in regular expression

the language spec trumps everything because someone had to write a standard to trump everything, it's here: http://www.mozilla.org/js/language/E262-3.pdf (thanks brendan)
Comment 15 • 23 years ago (Reporter)
From the ECMA-262 Edition 3 Final, here is the relevant part of the spec:

15.10.1 Patterns

CharacterClass ::
    [ [lookahead ∉ {^}] ClassRanges ]
    [ ^ ClassRanges ]

ClassRanges ::
    [empty]
    NonemptyClassRanges

This shows that according to ECMA, [] and [^] are both valid.
Comment 16 • 23 years ago (Assignee)
Nuts. The previous fix moved the increment, but I forgot that the end pointer (in kid2) is used to limit the eventual parse of the class, so it was always including the ']' in the set.
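The kind of mistake described here, an end bound that sits one position too far so the closing ']' leaks into the set, can be illustrated with a toy scanner. This is a hypothetical sketch in JavaScript, not the SpiderMonkey C source; `scanClass` and its return shape are invented for illustration:

```javascript
// Toy character-class scanner (illustrative only). Given a pattern string and
// the index of '[', return the class body, whether it is negated, and the
// index just past the closing ']'.
function scanClass(pattern, open) {
  let i = open + 1;
  let negated = false;
  if (pattern[i] === '^') {
    negated = true;
    i++;
  }
  const bodyStart = i;
  // Scan to the closing bracket, stepping over escaped characters.
  while (i < pattern.length && pattern[i] !== ']') {
    if (pattern[i] === '\\') i++;
    i++;
  }
  if (i >= pattern.length) {
    throw new SyntaxError('unterminated character class');
  }
  // The end bound is exclusive: it is the index of ']' itself. Using i + 1
  // here would pull ']' into the body, which is the regression in comment 10.
  const bodyEnd = i;
  return { negated, body: pattern.slice(bodyStart, bodyEnd), next: bodyEnd + 1 };
}

console.log(scanClass('[]', 0));   // { negated: false, body: '', next: 2 }
console.log(scanClass('[^]', 0));  // { negated: true, body: '', next: 3 }
console.log(scanClass('[a]', 0));  // { negated: false, body: 'a', next: 3 }
```

Keeping the body bound exclusive while still advancing `next` past the bracket lets an empty body fall out naturally for both [] and [^].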
Comment 17 • 23 years ago
Comment on attachment 61362: Fixes over grabbing pointer
r=rginda
Attachment #61362 - Flags: review+
Comment 18 • 23 years ago
Comment on attachment 61362: Fixes over grabbing pointer
Yoiks, should've caught that in review. I'm checking in for 0.9.7.
/be
Attachment #61362 - Flags: superreview+
Comment 19 • 23 years ago
Checked in -- hope this conforms with ECMA-262 Edition 3:

js> r=/[]/
/[]/
js> r('hi')
null
js> r2=/[^]/
/[^]/
js> r2('bye')
b
js> r2('')
null
js> r('')
null

/be
Status: REOPENED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 20 • 23 years ago (Reporter)
I checked with Waldemar on this. All the examples are ECMA-conforming. The first two clearly are, from the explanation at the top of this bug:

js> r('hi')
null
js> r2('bye')
b

What about the latter two?

js> r('')
null
js> r2('')
null

Both are also ECMA-conforming. We always expect r(anything) to be null. The character class [] is a condition that can never be satisfied, and always produces a null match. On top of that, the empty string contains no characters to test. Hence the match is null.

Why is r2('') === null? Because [^] is also a character class. Although it is satisfied by any character, the empty string contains no characters, and is not itself a character. The match is again null.

Contrast this with the following example:

js> r3 = /a*/;
js> r3('hi') == '';
true
js> r3('') == '';
true

The regexp pattern is to "match 'a' 0 or more times". It is not a character class in this case. As such, it is not looking for characters, but strings. It matches 0 times in each example, finding the empty string at the beginning of 'hi' and '', respectively.

Moral: the empty string is present in every string as a substring, but not as a character. Regexp character classes will never match it.
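The contrast drawn in this comment can be reproduced as a short sketch. The shell sessions in this bug call the regexp object directly (r('hi')); the sketch assumes a current ECMAScript engine where regexps are not callable and uses exec instead. The helper name `matchText` is hypothetical:

```javascript
// Hypothetical helper: text of the first match, or null when nothing matches.
function matchText(re, input) {
  const m = re.exec(input);
  return m === null ? null : m[0];
}

// A character class always needs one character to consume.
console.log(matchText(/[]/, 'hi'));    // null: no character satisfies []
console.log(matchText(/[^]/, 'bye'));  // "b"
console.log(matchText(/[]/, ''));      // null
console.log(matchText(/[^]/, ''));     // null: no character available to test

// A starred atom may match zero times and so yields the empty string.
console.log(matchText(/a*/, 'hi') === '');  // true
console.log(matchText(/a*/, '') === '');    // true
```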
Comment 21 • 23 years ago (Reporter)
Verified FIXED. Using JS source pulled at 5PM PST today, the above testcase passes in the debug, optimized JS shells built on WinNT, Linux, and Mac9.1. The testcase includes Rob's examples above.
Status: RESOLVED → VERIFIED
Comment 22 • 23 years ago (Reporter)
I've filed bug 114969 as the Rhino version of this bug.
Updated • 19 years ago
Flags: testcase+