Closed Bug 100199 Opened 24 years ago Closed 24 years ago

[], [^] are valid RegExp conditions

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
major

Tracking

()

VERIFIED FIXED

People

(Reporter: pschwartau, Assigned: rogerl)

References

()

Details

(Keywords: js1.5, Whiteboard: [Needs fix in Rhino, too])

Attachments

(2 files)

Getting syntax errors when I try the regexp /[]/ in SpiderMonkey or Rhino:

IN SPIDERMONKEY:
js> 'xyz'.match(/[]/)
9: unterminated character class [:
9: 'xyz'.match(/[]/)
9: ............^

IN RHINO:
js> 'xyz'.match(/[]/)
js: uncaught JavaScript exception: SyntaxError: Unterminated parenthetical []

The empty character class [] is a valid RegExp construct: it is the condition that a character belong to the set containing no characters. As such, no character can match this condition, and any RegExp containing [] should produce a null match. But it should not produce a syntax error.
Similarly, the condition [^] is a valid RegExp construct. ANY character should match this condition. Works properly in SpiderMonkey, but not Rhino:

IN SPIDERMONKEY:
js> 'abc'.match(/[^]/)
a

IN RHINO:
'abc'.match(/[^]/)
uncaught JavaScript exception: SyntaxError: Unterminated parenthetical [^]
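The expected behavior described above can be checked with a short script. This is a minimal sketch that assumes an ECMA-262-conforming engine (for example a current Node.js); the variable names are illustrative and not taken from the bug's testcase.

```javascript
// Assumption: the engine follows ECMA-262, so both literals parse without error.
const emptyClass = /[]/;   // the set containing no characters: can never match
const anyClass = /[^]/;    // complement of the empty set: matches any one character

// No character of 'xyz' belongs to the empty set, so the result is null
// (a failed match, not a SyntaxError).
const emptyResult = 'xyz'.match(emptyClass);

// The first character of 'abc' satisfies [^], so the match is 'a'.
const anyResult = 'abc'.match(anyClass);

console.log(emptyResult);                  // null
console.log(anyResult && anyResult[0]);    // a
```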
Keywords: js1.5
Whiteboard: [Needs fix in Rhino, too]
Updating summary -
Summary: The empty character class [] is a valid RegExp condition → [], [^] are valid RegExp conditions
Testcase added to JS test suite: mozilla/js/tests/ecma_3/RegExp/regress-100199.js
cc'ing reviewers -
Status: NEW → ASSIGNED
r=khanson
Comment on attachment 53631: re-adjust pointer increment to allow empty bracket

cvs diff -u (use -10 or more for more context) next time, please. sr=brendan@mozilla.org

/be
Attachment #53631 - Flags: superreview+
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Verified FIXED. The standalone JS testcase now passes in both the debug and optimized JS shells on WinNT, Linux, and Mac9.1.
Status: RESOLVED → VERIFIED
This patch seems to have introduced a regression. The character "]" now matches every character class...

js> "]".match(/[a]/);
]
js> "]".match(/a/);
null

Both expressions should return null. See also bug 113921.
Blocks: 113921
Severity: normal → major
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Have added more cases to the existing testcase, to cover strings like '[', ']', etc. Note that Rob's example can be reduced even further:

js> "]".match(/[]/);
]      <-------------------------- NO! No character can satisfy [] condition
js> "]".match(/[^]/);
null   <-------------------------- NO! Every character can satisfy [^] condition
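The bracket cases mentioned in this comment can be written as a small table-driven check. This is only a sketch of the idea, assuming a conforming engine; it is not the contents of regress-100199.js, and the helper name matchedText is hypothetical.

```javascript
// Hypothetical helper: returns the matched text, or null when the match fails.
function matchedText(re, input) {
  const m = input.match(re);
  return m === null ? null : m[0];
}

// Expected results per ECMA-262: ']' and '[' are ordinary characters here,
// so [] rejects them and [^] accepts them.
const cases = [
  { re: /[a]/, input: ']', expected: null },
  { re: /a/,   input: ']', expected: null },
  { re: /[]/,  input: ']', expected: null },
  { re: /[]/,  input: '[', expected: null },
  { re: /[^]/, input: ']', expected: ']' },
  { re: /[^]/, input: '[', expected: '[' },
];

// Collect every case whose actual result differs from the expectation.
const failures = cases.filter(c => matchedText(c.re, c.input) !== c.expected);

console.log(failures.length === 0 ? 'all cases pass' : failures);
```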
"d:\Program Files\Perl\bin\Perl.exe" -e"print 'a' if 'a'=~m/[^]/"
/[^]/: unmatched [] in regexp at -e line 1.

I know perl doesn't count for anything, I'm off to find a better regexp manual.
Perl *is* my regexp manual :-)
The following tests are worth nothing:

nc4 also claims that /[^]/ is not a valid regexp:
JavaScript Error: unterminated character class [ input

ie5.5 claims that /[^]/ is not a valid regexp:
Error: Expected ']' in regular expression

The language spec trumps everything, because someone had to write a standard to trump everything. It's here: http://www.mozilla.org/js/language/E262-3.pdf (thanks brendan)
From the ECMA-262 Edition 3 Final, here is the relevant part of the spec:

15.10.1 Patterns

CharacterClass ::
    [ [lookahead ∉ {^}] ClassRanges ]
    [ ^ ClassRanges ]

ClassRanges ::
    [empty]
    NonemptyClassRanges

This shows that according to ECMA, [] and [^] are both valid.
Nuts. The previous fix moved the increment, but I forgot that the end pointer (in kid2) is used to limit the eventual parse of the class, so the parse was always including the ']' as a member of the set.
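To illustrate the boundary issue described here, the following is a hypothetical sketch of a class parser in JavaScript. It is not SpiderMonkey's code (the engine's parser is written in C and is not quoted in this bug); parseClass and its fields are invented for illustration, and ranges such as a-z are deliberately left out. It accepts an empty ClassRanges, as the ECMA grammar allows, and keeps the closing ']' outside the span that is scanned for members.

```javascript
// Hypothetical sketch, NOT the engine's implementation.
// Parses a simple class ("[abc]", "[^ab]", "[]", "[^]") starting at `start`.
// Ranges like "a-z" are not handled; a backslash escapes the next character.
function parseClass(pattern, start) {
  if (pattern[start] !== '[') throw new SyntaxError('expected [');
  let pos = start + 1;

  // Optional negation marker.
  let negated = false;
  if (pattern[pos] === '^') {
    negated = true;
    pos++;
  }

  // ClassRanges may be empty, so a ']' found right here already ends the class.
  const bodyStart = pos;
  while (pos < pattern.length && pattern[pos] !== ']') {
    pos += pattern[pos] === '\\' ? 2 : 1;
  }
  if (pos >= pattern.length) {
    throw new SyntaxError('unterminated character class');
  }

  // Exclusive end: the closing ']' itself must not become a member.
  const bodyEnd = pos;
  const members = new Set();
  for (let i = bodyStart; i < bodyEnd; i++) {
    let ch = pattern[i];
    if (ch === '\\') ch = pattern[++i];
    members.add(ch);
  }

  return {
    negated,
    members,
    end: pos + 1, // index just past the closing ']'
    test(ch) {
      return members.has(ch) !== negated;
    },
  };
}

console.log(parseClass('[a]', 0).test(']'));  // false
console.log(parseClass('[]', 0).test(']'));   // false
console.log(parseClass('[^]', 0).test(']'));  // true
```

If bodyEnd were taken as pos + 1 instead, ']' would land in the member set, which would reproduce the symptom reported above where "]" matched /[a]/.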
Comment on attachment 61362: Fixes over grabbing pointer

r=rginda
Attachment #61362 - Flags: review+
Comment on attachment 61362: Fixes over grabbing pointer

Yoiks, should've caught that in review. I'm checking in for 0.9.7.

/be
Attachment #61362 - Flags: superreview+
Checked in -- hope this conforms with ECMA-262 Edition 3:

js> r=/[]/
/[]/
js> r('hi')
null
js> r2=/[^]/
/[^]/
js> r2('bye')
b
js> r2('')
null
js> r('')
null

/be
Status: REOPENED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
I checked with Waldemar on this. All the examples are ECMA-conforming. The first two clearly are, from the explanation at the top of this bug:

js> r('hi')
null
js> r2('bye')
b

What about the latter two?

js> r('')
null
js> r2('')
null

Both are also ECMA-conforming. We always expect r(anything) to be null. The character class [] is a condition that can never be satisfied, and always produces a null match. On top of that, the empty string contains no characters to test. Hence the match is null.

Why is r2('') === null? Because [^] is also a character class. Although it is satisfied by any character, the empty string contains no characters, and is not itself a character. The match is again null.

Contrast this with the following example:

js> r3 = /a*/;
js> r3('hi') == '';
true
js> r3('') == '';
true

The regexp pattern is to "match 'a' 0 or more times". It is not a character class in this case. As such, it is not looking for characters, but strings. It matches 0 times in each example, finding the empty string at the beginning of 'hi' and '', respectively.

Moral: the empty string is present in every string as a substring, but not as a character. Regexp character classes will never match it.
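The contrast between a character class and a zero-width match can be reproduced in a current engine. Calling a regexp as a function, as in r('hi'), is an engine-specific extension and not part of the standard, so this sketch substitutes the standard exec method; the helper name firstMatch is hypothetical.

```javascript
const r = /[]/;    // can never be satisfied
const r2 = /[^]/;  // satisfied by any single character
const r3 = /a*/;   // zero or more 'a', so it may match the empty string

// Hypothetical helper: matched text, or null when there is no match.
const firstMatch = (re, s) => {
  const m = re.exec(s);
  return m === null ? null : m[0];
};

const observed = {
  emptyClassOnHi: firstMatch(r, 'hi'),     // null
  anyClassOnBye: firstMatch(r2, 'bye'),    // 'b'
  emptyClassOnEmpty: firstMatch(r, ''),    // null
  anyClassOnEmpty: firstMatch(r2, ''),     // null: no character to consume
  starOnHi: firstMatch(r3, 'hi'),          // '': zero repetitions at index 0
  starOnEmpty: firstMatch(r3, ''),         // '': zero repetitions
};

console.log(observed);
```

A class always has to consume exactly one character, which is why [^] fails on the empty string while a* succeeds with a zero-length match.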
Verified FIXED. Using JS source pulled at 5PM PST today, the above testcase passes in the debug, optimized JS shells built on WinNT, Linux, and Mac9.1. The testcase includes Rob's examples above.
Status: RESOLVED → VERIFIED
I've filed bug 114969 as the Rhino version of this bug -
Flags: testcase+