Closed Bug 272395 Opened 20 years ago Closed 20 years ago

JavaScript regex incorrect handling of unescaped literal ] in character class : [^]] or []]

Categories

(Core :: JavaScript Engine, defect)

Not set
normal

VERIFIED INVALID

People

(Reporter: bugzilla, Unassigned)

Details

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

The ECMAScript specification (I think), or at least most regex implementations, allow an unescaped literal ] in a character class if it immediately follows the opening [ or the negating caret [^ , as an alternative to escaping it as \]

Minimum example: [^]] and [^\]] should both match any character except a ] but only the latter works. **In the former, a regex test fails to match any other character**. Additionally, something like [^]]+ would actually match the literal string ]]

The regex behaviour is as expected if the literal ] is escaped.

Reproducible: Always

Steps to Reproduce:
1. Go to http://www.regular-expressions.info/javascriptexample.html
2. Enter [^]] as the regexp and any non-] string as the subject string
3. Click the Test Match button

Actual Results: No match

Expected Results: Successful match
Cite ECMA-262 Edition 3 before filing INVALID bugs. The spec clearly prohibits ] in a character class without a backslash escaping it:

15.10.2.18 ClassAtomNoDash

The production ClassAtomNoDash :: SourceCharacter but not one of \ ] - evaluates by returning a one-element CharSet containing the character represented by SourceCharacter.

The production ClassAtomNoDash :: \ ClassEscape evaluates by evaluating ClassEscape to obtain a CharSet and returning that CharSet.

/be
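For anyone landing here later, the following is a minimal sketch of how the patterns from the report behave, assuming a current engine such as Node.js and no u or v flag on the regexes. The variable names are only illustrative. Since ] cannot be a class member unless escaped, the first ] closes the class: [^] becomes a complete class matching any character, [] becomes an empty class that never matches, and the trailing ] is read as a literal outside the class.

```javascript
// Patterns from the report, without the u or v flag.
const negated = /[^]]/;          // read as [^] (any character) followed by a literal ]
const plain = /[]]/;             // read as [] (matches nothing) followed by a literal ]
const negatedEscaped = /[^\]]/;  // any character except ]
const plainEscaped = /[\]]/;     // exactly the character ]

const results = {
  // Unescaped forms: not what the reporter expected.
  negatedOnA: negated.test("a"),            // false: nothing follows "a", so no ] to match
  negatedOnAThenBracket: negated.test("a]"), // true: any character, then ]
  plainOnBracket: plain.test("]"),          // false: the empty class can never match
  // The reporter's observation that [^]]+ matches the literal string ]]
  negatedPlusOnPair: /[^]]+/.exec("]]")[0],
  // Escaped forms: behave as the reporter wanted.
  negatedEscapedOnA: negatedEscaped.test("a"),
  negatedEscapedOnBracket: negatedEscaped.test("]"),
  plainEscapedOnBracket: plainEscaped.test("]"),
};

console.log(results);
```

This matches the "Actual Results" in the report: [^]] fails against a lone non-] character because the pattern still requires a literal ] after it.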
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID
v
Status: RESOLVED → VERIFIED
*** Bug 322129 has been marked as a duplicate of this bug. ***
The ECMA standard is clearly wrong. I doubt it was their intention to diverge from historical and previously established standards, such as POSIX and the Single Unix Specification, which predate ECMAScript. And since JavaScript RE has its origins in Perl RE, which in turn is based on POSIX ERE, this bug should be fixed.

For example:

$ perl -n -e 'print $_ if /[]]/' <<EOT
> some text
> more [text] <---
> not this
> and ] blah <---
> foobar
> EOT
more [text] <---
and ] blah <---
$

The above example clearly shows that /[]]/ WORKS.

Also performing the above with the escaped right bracket:

$ perl -n -e 'print $_ if /[\]]/' <<EOT
> some text
> more [text] <---
> not this
> and ] blah <---
> foobar
> EOT
more [text] <---
and ] blah <---
$

Also works.

Obviously the correct solution for this bug is to support BOTH forms:

/[]]/ historical ERE behaviour
/[\]]/ more recent behaviour

By supporting historical behaviour, ERE become portable across ALL applications and usages where ERE are supported, which was the intent of POSIX and SUS in the first place.
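For comparison, here is a rough JavaScript counterpart of the Perl test above, run over the same five input lines. It is only a sketch, assuming a current engine such as Node.js and no u or v flag; the names inputLines, escapedHits and unescapedHits are made up for illustration. It suggests that the escaped form is the one that gives the same result in both languages.

```javascript
// Same input lines as the Perl heredoc above.
const inputLines = [
  "some text",
  "more [text] <---",
  "not this",
  "and ] blah <---",
  "foobar",
];

// Escaped form: selects the two lines containing ], as Perl does.
const escapedHits = inputLines.filter((line) => /[\]]/.test(line));

// Unescaped form: in JavaScript this is an empty class followed by a
// literal ], so no line is selected, unlike in Perl.
const unescapedHits = inputLines.filter((line) => /[]]/.test(line));

console.log(escapedHits);
console.log(unescapedHits);
```

So a pattern meant to be shared between Perl and JavaScript would need to use /[\]]/ rather than /[]]/.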