[], [^] are valid RegExp conditions

Status: VERIFIED FIXED

Product: Core
Component: JavaScript Engine
Severity: major
Opened 16 years ago; last updated 13 years ago

Reporter: Phil Schwartau
Assignee: rogerl (gone)

Keywords: js1.5
Tracking: Trunk, x86, All, js1.5
Bug Flags: in-testsuite +
Firefox Tracking Flags: (Not tracked)

Whiteboard: [Needs fix in Rhino, too]

Attachments: 2

(Reporter)

Description

16 years ago
Getting syntax errors when I try the regexp /[]/ in SpiderMonkey or Rhino:

IN SPIDERMONKEY:
js> 'xyz'.match(/[]/)
9: unterminated character class [:
9: 'xyz'.match(/[]/)
9: ............^

IN RHINO: 
js> 'xyz'.match(/[]/)
js: uncaught JavaScript exception: SyntaxError: Unterminated parenthetical []


The empty character class [] is a valid RegExp construct: it is the
condition that a character belong to the set containing no characters.
As such, no character can match this condition, and a RegExp that requires
[] to match (such as /[]/ itself) should always return null. But it should
not produce a syntax error.
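For anyone checking this against a current engine, here is a minimal sketch. It assumes a modern, spec-conforming engine such as Node.js. It checks that /[]/ compiles without a SyntaxError, whether written as a literal or built from a string, and that it never matches. neverMatches is a hypothetical helper, not part of any test suite.

```javascript
// Sketch, assuming a modern spec-conforming engine (e.g. Node.js).
// The empty class [] matches no character, so exec() on any input returns null.
const emptyClass = /[]/;

// Hypothetical helper: true when re finds no match in every input string.
function neverMatches(re, inputs) {
  return inputs.every((s) => re.exec(s) === null);
}

// Building the same pattern from a string must not throw a SyntaxError.
const fromString = new RegExp("[]");

console.log(neverMatches(emptyClass, ["xyz", "[", "]", "", "\n"])); // true
console.log(neverMatches(fromString, ["abc"]));                      // true
console.log("xyz".match(/[]/));                                      // null
```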
(Reporter)

Comment 1

16 years ago
Similarly, the condition [^] is a valid RegExp construct. ANY character
should match this condition. Works properly in SpiderMonkey, but not Rhino:

IN SPIDERMONKEY:
js>  'abc'.match(/[^]/)
a

IN RHINO:
'abc'.match(/[^]/)
uncaught JavaScript exception: SyntaxError: Unterminated parenthetical [^]
Keywords: js1.5
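Here is a small sketch of the [^] behavior, assuming a modern engine such as Node.js. The negated empty class accepts every character, including line terminators, which `.` rejects when used without the s flag. compare is a hypothetical helper used only for illustration.

```javascript
// Sketch, assuming a modern engine (e.g. Node.js).
// [^] is a negated empty class: every character satisfies it.
const anyChar = /[^]/;
const dot = /./; // no s flag, so it rejects line terminators

// Hypothetical helper: [matchedBy[^], matchedByDot] for a one-character string.
function compare(ch) {
  const m = anyChar.exec(ch);
  return [m !== null && m[0] === ch, dot.test(ch)];
}

console.log(compare("a"));  // [ true, true ]
console.log(compare("]"));  // [ true, true ]
console.log(compare("\n")); // [ true, false ]
// The first character is returned, as in the SpiderMonkey session above.
console.log("abc".match(anyChar)[0]); // a
```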
(Reporter)

Updated

16 years ago
Whiteboard: [Needs fix in Rhino, too]
(Reporter)

Comment 2

16 years ago
Updating summary -
Summary: The empty character class [] is a valid RegExp condition → [], [^] are valid RegExp conditions
(Reporter)

Comment 3

16 years ago
Testcase added to JS test suite:

         mozilla/js/tests/ecma_3/RegExp/regress-100199.js
(Assignee)

Comment 4

16 years ago
Created attachment 53631
re-adjust pointer increment to allow empty bracket
(Reporter)

Comment 5

16 years ago
cc'ing reviewers -
(Assignee)

Updated

16 years ago
Status: NEW → ASSIGNED

Comment 6

16 years ago
r=khanson

Comment 7

Comment on attachment 53631
re-adjust pointer increment to allow empty bracket

cvs diff -u (use -10 or more for more context) next time, please.

sr=brendan@mozilla.org

/be
Attachment #53631 - Flags: superreview+
(Assignee)

Comment 8

16 years ago
Fix checked in.
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
(Reporter)

Comment 9

16 years ago
Verified FIXED.

The standalone JS testcase now passes in both the debug and optimized
JS shells on WinNT, Linux, and Mac9.1. 
Status: RESOLVED → VERIFIED

Comment 10

16 years ago
This patch seems to have introduced a regression.  The character "]" now matches
every character class...

js> "]".match(/[a]/);
]
js> "]".match(/a/);
null

Both expressions should return null.

See also bug 113921
Blocks: 113921
Severity: normal → major
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
(Reporter)

Comment 11

16 years ago
Have added more cases to the existing testcase, to cover strings like
'[', ']', etc. Note that Rob's example can be reduced even further:


js> "]".match(/[]/);
]    <-------------------------- NO! No character can satisfy [] condition


js> "]".match(/[^]/);
null <-------------------------- NO! Every character can satisfy [^] condition
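The expected results from Comments 10 and 11 can be written as a small table of regression checks. This is a hypothetical sketch for a modern engine such as Node.js. It is not the contents of regress-100199.js. The names cases and failures are illustrative.

```javascript
// Hypothetical regression table, assuming a conforming engine (e.g. Node.js).
// Each case pairs a pattern and input with the expected first match (or null).
const cases = [
  { re: /[a]/, input: "]", expected: null }, // Rob's example: "]" is not in [a]
  { re: /a/,   input: "]", expected: null }, // control case
  { re: /[]/,  input: "]", expected: null }, // nothing satisfies []
  { re: /[^]/, input: "]", expected: "]" },  // everything satisfies [^]
  { re: /[]/,  input: "[", expected: null },
  { re: /[^]/, input: "[", expected: "[" },
];

// Returns the cases whose actual result differs from the expected one.
function failures(list) {
  return list.filter(({ re, input, expected }) => {
    const m = re.exec(input);
    const actual = m === null ? null : m[0];
    return actual !== expected;
  });
}

console.log(failures(cases).length); // 0 on a conforming engine
```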

Comment 12

16 years ago
"d:\Program Files\Perl\bin\Perl.exe" -e"print 'a' if 'a'=~m/[^]/"
/[^]/: unmatched [] in regexp at -e line 1.

I know perl doesn't count for anything; I'm off to find a better regexp manual.

Comment 13

16 years ago
Perl *is* my regexp manual :-)

Comment 14

16 years ago
the following tests are worth nothing:
nc4 also claims that /[^]/ is not a valid regexp:
JavaScript Error: unterminated character class [ 
input

ie5.5 claims that /[^]/ is not a valid regexp:
Error: Expected ']' in regular expression

the language spec trumps everything because someone had to write a standard to
trump everything, it's here: http://www.mozilla.org/js/language/E262-3.pdf
(thanks brendan)
(Reporter)

Comment 15

16 years ago
From the ECMA-262 Edition 3 Final, here is the relevant part of the spec:

15.10.1 Patterns

CharacterClass ::
    [ [lookahead ∉ {^}] ClassRanges ]
    [ ^ ClassRanges ]

ClassRanges ::
    [empty]
    NonemptyClassRanges


This shows that according to ECMA, [] and [^] are both valid.
(Assignee)

Comment 16

16 years ago
Created attachment 61362
Fixes over grabbing pointer

Nuts. The previous fix moved the increment, but I forgot that the end pointer
(in kid2) is used to limit the eventual parse of the class, so it was always
including the ']' in the set.
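The following sketch makes the grammar from Comment 15 and the pointer bug described above concrete. It is a hypothetical, heavily simplified parser for CharacterClass that handles plain characters and a-b ranges only, with no escapes. It is not SpiderMonkey's code. It shows only two things: ClassRanges may be empty, and the closing ] ends the class without becoming a member of it. The first patch got the second point wrong.

```javascript
// Hypothetical, simplified parser for the ECMA-262 Ed. 3 CharacterClass
// production: plain characters and a-b ranges only, no escapes.
function parseCharacterClass(src, pos) {
  if (src[pos] !== "[") throw new SyntaxError("expected [ at " + pos);
  let i = pos + 1;
  let negated = false;
  if (src[i] === "^") { // second alternative: [ ^ ClassRanges ]
    negated = true;
    i++;
  }
  const ranges = []; // [lo, hi] pairs of UTF-16 code units
  // ClassRanges may be [empty]: the loop can run zero times.
  while (i < src.length && src[i] !== "]") {
    const lo = src.charCodeAt(i);
    if (src[i + 1] === "-" && i + 2 < src.length && src[i + 2] !== "]") {
      const hi = src.charCodeAt(i + 2);
      if (hi < lo) throw new SyntaxError("range out of order at " + i);
      ranges.push([lo, hi]);
      i += 3;
    } else {
      ranges.push([lo, lo]); // single character (a trailing "-" is literal)
      i += 1;
    }
  }
  if (i >= src.length) throw new SyntaxError("unterminated character class");
  // i indexes the closing "]": it is consumed but never added to ranges.
  return { negated, ranges, end: i + 1 };
}

// Tests one character against a parsed class.
function classMatches(cls, ch) {
  const c = ch.charCodeAt(0);
  const inSet = cls.ranges.some(([lo, hi]) => lo <= c && c <= hi);
  return cls.negated ? !inSet : inSet;
}

const empty = parseCharacterClass("[]", 0); // ClassRanges is [empty]
const all = parseCharacterClass("[^]", 0);
console.log(empty.ranges.length, empty.end);                   // 0 2
console.log(classMatches(empty, "]"), classMatches(all, "]")); // false true
console.log(classMatches(parseCharacterClass("[a]", 0), "]")); // false
```

Perl and POSIX-style regexps treat a ] immediately after [ or [^ as a literal member of the class. That is why Perl reports /[^]/ as unmatched in Comment 12. ECMA-262 has no such rule, so the first ] always closes the class.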

Comment 17

16 years ago
Comment on attachment 61362
Fixes over grabbing pointer

r=rginda
Attachment #61362 - Flags: review+
Comment 18

Comment on attachment 61362
Fixes over grabbing pointer

Yoiks, should've caught that in review.  I'm checking in for 0.9.7.

/be
Attachment #61362 - Flags: superreview+
Comment 19

Checked in -- hope this conforms with ECMA-262 Edition 3:

js> r=/[]/
/[]/
js> r('hi')
null
js> r2=/[^]/
/[^]/
js> r2('bye')
b
js> r2('')
null
js> r('')
null

/be
Status: REOPENED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
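Calling a RegExp object as a function, as in r('hi') above, worked in the SpiderMonkey shell of the time but is not standard ECMAScript. Below is a minimal equivalent of that session for a current engine such as Node.js. It uses exec instead and expects the same four results.

```javascript
// Sketch, assuming a current engine (e.g. Node.js) where RegExp objects
// are not callable, so r('hi') is written r.exec('hi').
const r = /[]/;
const r2 = /[^]/;

console.log(r.exec("hi"));       // null
console.log(r2.exec("bye")[0]);  // b
console.log(r2.exec(""));        // null
console.log(r.exec(""));         // null
```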
(Reporter)

Comment 20

16 years ago
I checked with Waldemar on this. All the examples are ECMA-conforming.
The first two clearly are, from the explanation at the top of this bug:

js> r('hi')
null
js> r2('bye')
b

What about the latter two? 

js> r('')
null
js> r2('')
null

Both are also ECMA-conforming. We always expect r(anything) to be null.
The character class [] is a condition that can never be satisfied, and
always produces a null match. On top of that, the empty string contains
no characters to test. Hence the match is null.

Why is r2('') === null?  Because [^] is also a character class.
Although it is satisfied by any character, the empty string contains
no characters, and is not itself a character. The match is again null.

Contrast this with the following example:

js> r3 = /a*/;
js> r3('hi') == '';
true
js> r3('') == '';
true

The regexp pattern is to "match 'a' 0 or more times". It is not a
character class in this case. As such, it is not looking for characters,
but strings. It matches 0 times in each example, finding the empty string
at the beginning of 'hi' and '', respectively.

Moral: the empty string is present in every string as a substring,
but not as a character. Regexp character classes will never match it.
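The following sketch illustrates the contrast drawn above, assuming a current engine such as Node.js. An unquantified character class has to consume exactly one character, so it fails on ''. A quantified pattern such as /a*/, or even /[^]*/, can succeed by matching zero characters. firstMatch is a hypothetical helper.

```javascript
// Sketch, assuming a current engine (e.g. Node.js).
const r3 = /a*/;

// Hypothetical helper: the first match's text and position, or null.
function firstMatch(re, s) {
  const m = re.exec(s);
  return m === null ? null : { text: m[0], index: m.index };
}

console.log(firstMatch(r3, "hi"));   // { text: '', index: 0 }
console.log(firstMatch(r3, ""));     // { text: '', index: 0 }
console.log(firstMatch(/[^]/, ""));  // null: no character to consume
console.log(firstMatch(/[^]*/, "")); // { text: '', index: 0 }: zero repetitions
```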
(Reporter)

Comment 21

16 years ago
Verified FIXED.

Using JS source pulled at 5PM PST today, the above testcase passes
in the debug and optimized JS shells built on WinNT, Linux, and Mac9.1.

The testcase includes Rob's examples above.
Status: RESOLVED → VERIFIED
(Reporter)

Comment 22

16 years ago
I've filed bug 114969 as the Rhino version of this bug -

Updated

13 years ago
Flags: testcase+