Closed Bug 100199 Opened 23 years ago Closed 23 years ago

[], [^] are valid RegExp conditions

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
major

Tracking

VERIFIED FIXED

People

(Reporter: pschwartau, Assigned: rogerl)

Details

(Keywords: js1.5, Whiteboard: [Needs fix in Rhino, too])

Attachments

(2 files)

Getting syntax errors when I try the regexp /[]/ in SpiderMonkey or Rhino:

IN SPIDERMONKEY:
js> 'xyz'.match(/[]/)
9: unterminated character class [:
9: 'xyz'.match(/[]/)
9: ............^

IN RHINO:
js> 'xyz'.match(/[]/)
js: uncaught JavaScript exception: SyntaxError: Unterminated parenthetical []


The empty character class [] is a valid RegExp construct: it is the
condition that a character belong to the set containing no characters.
As such, no character can match this condition, and any RegExp containing
[] should produce a null match. But it should not produce a syntax error.
Similarly, the condition [^] is a valid RegExp construct. ANY character
should match this condition. Works properly in SpiderMonkey, but not Rhino:

IN SPIDERMONKEY:
js>  'abc'.match(/[^]/)
a

IN RHINO:
'abc'.match(/[^]/)
uncaught JavaScript exception: SyntaxError: Unterminated parenthetical [^]
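For comparison, the expected behaviour described above can be written as a short script. This is only a sketch, assuming a current engine that follows ECMA-262 on this point (for example Node.js) rather than the 2001 shells quoted above; the variable names are illustrative.

```javascript
// [] is the empty set: no character can match it, so the match is null.
const emptyClass = 'xyz'.match(/[]/);

// [^] is the complement of the empty set: any single character matches it,
// including a line terminator (which '.' without the s flag does not match).
const anyChar = 'abc'.match(/[^]/);
const newline = '\n'.match(/[^]/);

console.log(emptyClass);        // null
console.log(anyChar[0]);        // a
console.log(newline !== null);  // true
```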
Keywords: js1.5
Whiteboard: [Needs fix in Rhino, too]
Updating summary -
Summary: The empty character class [] is a valid RegExp condition → [], [^] are valid RegExp conditions
Testcase added to JS test suite:

         mozilla/js/tests/ecma_3/RegExp/regress-100199.js
cc'ing reviewers -
Status: NEW → ASSIGNED
r=khanson
Comment on attachment 53631 [details] [diff] [review]
re-adjust pointer increment to allow empty bracket

cvs diff -u (use -10 or more for more context) next time, please.

sr=brendan@mozilla.org

/be
Attachment #53631 - Flags: superreview+
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Verified FIXED.

The standalone JS testcase now passes in both the debug and optimized
JS shells on WinNT, Linux, and Mac9.1. 
Status: RESOLVED → VERIFIED
This patch seems to have introduced a regression.  The character "]" now matches
every character class...

js> "]".match(/[a]/);
]
js> "]".match(/a/);
null

Both expressions should return null.

See also bug 113921
Blocks: 113921
Severity: normal → major
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Have added more cases to the existing testcase, to cover strings like
'[', ']', etc. Note that Rob's example can be reduced even further:


js> "]".match(/[]/);
]    <-------------------------- NO! No character can satisfy [] condition


js> "]".match(/[^]/);
null <-------------------------- NO! Every character can satisfy [^] condition
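The expected results for the bracket cases above can be collected as table-driven checks. This is a hedged sketch, not the contents of regress-100199.js; the names `cases`, `firstMatch` and `failures` are made up for illustration, and a conforming engine is assumed.

```javascript
// Each entry: [regexp, input, expected first match or null].
const cases = [
  [/[a]/, ']', null],  // ']' is not in the set {a}
  [/a/,   ']', null],
  [/[]/,  ']', null],  // nothing satisfies the empty class
  [/[^]/, ']', ']'],   // everything satisfies the negated empty class
  [/[^]/, '[', '['],
];

// Return the matched text, or null when there is no match.
function firstMatch(re, s) {
  const m = re.exec(s);
  return m === null ? null : m[0];
}

const failures = cases.filter(([re, s, want]) => firstMatch(re, s) !== want);
console.log(failures.length === 0 ? 'all cases pass' : failures);
```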
"d:\Program Files\Perl\bin\Perl.exe" -e"print 'a' if 'a'=~m/[^]/"
/[^]/: unmatched [] in regexp at -e line 1.

I know perl doesn't count for anything, i'm off to find a better regexp manual.
Perl *is* my regexp manual :-)
the following tests are worth nothing:
nc4 also claims that /[^]/ is not a valid regexp:
JavaScript Error: unterminated character class [ 
input

ie5.5 claims that /[^]/ is not a valid regexp:
Error: Expected ']' in regular expression

the language spec trumps everything because someone had to write a standard to
trump everything, it's here: http://www.mozilla.org/js/language/E262-3.pdf
(thanks brendan)
From the ECMA-262 Edition 3 Final, here is the relevant part of the spec:

15.10.1 Patterns

CharacterClass ::
    [ [lookahead ∉ {^}] ClassRanges ]
    [ ^ ClassRanges ]

ClassRanges ::
    [empty]
    NonemptyClassRanges


This shows that according to ECMA, [] and [^] are both valid.
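To make the grammar concrete, here is a simplified scanner for one CharacterClass. It is a hypothetical illustration written for this comment, not the SpiderMonkey or Rhino parser: `scanCharacterClass` and its return shape are assumptions, and it only locates the class body without interpreting ranges. The point is that ClassRanges may be empty, so a ']' directly after '[' or '[^' closes the class, and the ']' itself is never part of the body.

```javascript
// Hypothetical helper: scan one CharacterClass starting at pattern[start] === '['.
// Returns the negation flag, the raw ClassRanges text, and the index just
// past the closing ']'.
function scanCharacterClass(pattern, start) {
  if (pattern[start] !== '[') throw new SyntaxError('expected [');
  let i = start + 1;
  let negated = false;
  if (pattern[i] === '^') {
    negated = true;
    i++;
  }
  const bodyStart = i;
  // ClassRanges may be [empty], so ']' is allowed to appear immediately.
  while (i < pattern.length && pattern[i] !== ']') {
    if (pattern[i] === '\\') i++; // skip the escaped character
    i++;
  }
  if (i >= pattern.length) throw new SyntaxError('unterminated character class');
  // The body stops before ']'; the closing bracket is not a member of the set.
  return { negated, ranges: pattern.slice(bodyStart, i), end: i + 1 };
}

console.log(scanCharacterClass('[]', 0));  // negated false, empty ranges, end 2
console.log(scanCharacterClass('[^]', 0)); // negated true, empty ranges, end 3
```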
Nuts. The previous fix moved the increment, but I forgot that the end pointer
(in kid2) is used to limit the eventual parse of the class, so it was always
including the ']' in the set.
Comment on attachment 61362 [details] [diff] [review]
Fixes over grabbing pointer

r=rginda
Attachment #61362 - Flags: review+
Comment on attachment 61362 [details] [diff] [review]
Fixes over grabbing pointer

Yoiks, should've caught that in review.  I'm checking in for 0.9.7.

/be
Attachment #61362 - Flags: superreview+
Checked in -- hope this conforms with ECMA-262 Edition 3:

js> r=/[]/
/[]/
js> r('hi')
null
js> r2=/[^]/
/[^]/
js> r2('bye')
b
js> r2('')
null
js> r('')
null

/be
Status: REOPENED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
I checked with Waldemar on this. All the examples are ECMA-conforming.
The first two clearly are, from the explanation at the top of this bug:

js> r('hi')
null
js> r2('bye')
b

What about the latter two? 

js> r('')
null
js> r2('')
null

Both are also ECMA-conforming. We always expect r(anything) to be null.
The character class [] is a condition that can never be satisfied, and
always produces a null match. On top of that, the empty string contains
no characters to test. Hence the match is null.

Why is r2('') === null?  Because [^] is also a character class.
Although it is satisfied by any character, the empty string contains
no characters, and is not itself a character. The match is again null.

Contrast this with the following example:

js> r3 = /a*/;
js> r3('hi') == '';
true
js> r3('') == '';
true

The regexp pattern is to "match 'a' 0 or more times". It is not a
character class in this case. As such, it is not looking for characters,
but strings. It matches 0 times in each example, finding the empty string
at the beginning of 'hi' and '', respectively.

Moral: the empty string is present in every string as a substring,
but not as a character. Regexp character classes will never match it.
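The same comparison can be run in a current engine. This is a sketch under the assumption that the engine no longer allows calling a regexp as a function, so `exec` stands in for the `r('hi')` shell syntax used above.

```javascript
const r = /[]/;    // can never be satisfied
const r2 = /[^]/;  // satisfied by any single character
const r3 = /a*/;   // zero or more 'a'; may match the empty string

console.log(r.exec('hi'));       // null
console.log(r2.exec('bye')[0]);  // b
console.log(r.exec(''));         // null
console.log(r2.exec(''));        // null: no character is available to consume

// A quantified pattern can succeed without consuming any character.
console.log(r3.exec('hi')[0] === '');  // true
console.log(r3.exec('')[0] === '');    // true
```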
Verified FIXED.

Using JS source pulled at 5PM PST today, the above testcase passes
in the debug, optimized JS shells built on WinNT, Linux, and Mac9.1.

The testcase includes Rob's examples above.
Status: RESOLVED → VERIFIED
I've filed bug 114969 as the Rhino version of this bug -
Flags: testcase+