regex [^]] format support




JavaScript Engine
5 years ago
5 years ago


(Reporter: Chris Klopfenstein, Unassigned)


23 Branch
Windows 7

Firefox Tracking Flags

(Not tracked)




5 years ago
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0 (Beta/Release)
Build ID: 20130814063812

Steps to reproduce:

In JavaScript regex, a negated character class (a list of characters NOT to match) is enclosed in [^ ].

To include the ] character itself in the list, you either
1. make it 1st on the list (after the ^)
2. put it anywhere else in the list with a \ escape prefix.

The 1st case does not work in Firefox or Chrome. It does in IE. Unfortunately, the 2nd case does not work in a C-compiled system that double-checks the input on the server side (and is restricted to using the exact same RE), so I need case #1 to work in FF.
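
For illustration, here is a minimal sketch of how the three variants behave in an ECMAScript engine that follows the grammar discussed in this bug. The variable names are made up for this example; the patterns are the ones from the test page in this report.

```javascript
// Escaped ']' as the first member of the negated class.
const escapedFirst = /^[^\]\[{}]+$/;
// Escaped ']' elsewhere in the negated class.
const escapedLater = /^[^\[\]{}]+$/;
// Unescaped leading ']' (the POSIX-style form this bug asks for).
const posixStyle = /^[^]\[{}]+$/;

const sample = "test,test,test";

console.log(escapedFirst.test(sample)); // true
console.log(escapedLater.test(sample)); // true
console.log(posixStyle.test(sample));   // false

// The escaped forms reject input containing any of [ ] { }.
console.log(escapedFirst.test("test]test")); // false
console.log(escapedLater.test("test{test")); // false
```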

Reference information on this aspect of regex: 

Actual results:

A test page to illustrate:

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    <title>Regex Test</title>
    <script type="text/javascript">
        function regexTest() {
            var s = "test,test,test";
            // WORKS
            // var regex = /^[^\]\[{}]+$/;
            // WORKS
            // var regex = /^[^\[\]{}]+$/;
            // FAILS...
            var regex = /^[^]\[{}]+$/;
            // match() returns null when nothing matches, so guard before calling toString()
            var a = s.match(regex);
            document.getElementById("results").innerHTML = a ? a.toString() : "no match";
        }
    </script>
</head>
<body>
    <button onclick="regexTest()">Regex Test</button>
    <p id="results">results</p>
</body>
</html>

Expected results:

The regexes shown are supposed to match on anything but the following characters: []{}
Hoping to get support for the 3rd format, where the ] character doesn't need a \ escape if it is 1st in the charclass list after ^

Comment 1

> 1. make it 1st on the list (after the ^)

Why do you think that?

Looking at the spec, the syntax seems to be:

CharacterClass ::
  [ [lookahead ∉ {^}] ClassRanges ]
  [ ^ ClassRanges ]

ClassRanges ::
  [empty]
  NonemptyClassRanges

NonemptyClassRanges ::
  ClassAtom
  ClassAtom NonemptyClassRangesNoDash
  ClassAtom - ClassAtom ClassRanges

NonemptyClassRangesNoDash ::
  ClassAtom
  ClassAtomNoDash NonemptyClassRangesNoDash
  ClassAtomNoDash - ClassAtom ClassRanges

ClassAtom ::
  -
  ClassAtomNoDash
ClassAtomNoDash ::
  SourceCharacter but not one of \ or ] or -
  \ ClassEscape

So "[^]]" is not a valid CharacterClass production as far as I can see.  I'm not quite sure what we end up doing with it (e.g. why it doesn't throw).

Am I missing something that defines that "[^]]" should match "a", say?

Comment 2

5 years ago
The reference cited in comment 0 is wrong regarding "placing [']'] in a position where [it does] not take on [its] special meaning".  The right square bracket character must be escaped to be present in a character class in ECMAScript/JavaScript (other languages may differ, of course).

I see what bz sees, in ES5 and ES6 both.  Unless and until TC39 decides to add this to ECMAScript, this is WONTFIX.
Last Resolved: 5 years ago
Resolution: --- → WONTFIX

Comment 3

> So "[^]]" is not a valid CharacterClass production as far as I can see. 

Ah, I'm wrong.  It's a valid CharacterClass production with '^' and then [empty], followed by a ']': so it's a character class that negates the empty character class.  So /[^]]/ is essentially the same thing as /.]/ (except that [^] also matches line terminators, which . does not).
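
A small sketch of that interpretation, for anyone who wants to check it in a console. The variable name is made up for the example; the behaviour shown follows from reading [^] as "any character" followed by a literal ].

```javascript
// Parsed as: [^] (negated empty class, i.e. any character) then a literal ']'.
const negatedEmptyThenBracket = /[^]]/;

console.log(negatedEmptyThenBracket.test("a"));   // false: no ']' follows the character
console.log(negatedEmptyThenBracket.test("]"));   // false: needs some character before the ']'
console.log(negatedEmptyThenBracket.test("a]"));  // true
console.log(negatedEmptyThenBracket.test("]]"));  // true

// Unlike '.', the negated empty class also matches a newline.
console.log(negatedEmptyThenBracket.test("\n]")); // true
console.log(/.]/.test("\n]"));                    // false

// The match is two characters long: one arbitrary character plus ']'.
console.log(negatedEmptyThenBracket.exec("xa]y")[0]); // "a]"
```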

Comment 4

5 years ago
Thanks for the quick response.
I think the interpretation in comment 3 is not the functionality I was hoping for.
I appreciate that it may not be specified in the standard to which the JavaScript core is built.
I'm left with the fact that IE 7, 8, 9, and 10 support all three options, while the server backend, which applies the same regex, works ONLY with the 3rd option, on both HP-UX and RHEL using their native C libraries, so there appears to be no solution compatible with all these environments.
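
One possible workaround, sketched here under assumptions that may not hold for this setup: keep the POSIX-style pattern as the single shared source string and rewrite it on the client before compiling it. The helper posixBracketToJs below is hypothetical (it is not part of any library), and it assumes the pattern uses no POSIX constructs such as [:alpha:], [.x.] or [=x=] inside bracket expressions.

```javascript
// Hypothetical helper: rewrites POSIX-style bracket expressions so that
// a leading ']' (and any literal '\' or '[' member) is escaped for JavaScript.
// Assumption: no [:class:], [.x.] or [=x=] constructs appear in the pattern.
function posixBracketToJs(pattern) {
  let out = "";
  let i = 0;
  while (i < pattern.length) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      // Escape outside a bracket expression: copy both characters unchanged.
      out += ch + pattern[i + 1];
      i += 2;
    } else if (ch === "[") {
      out += "[";
      i += 1;
      if (pattern[i] === "^") { out += "^"; i += 1; }
      // In POSIX, a ']' in first position is a literal member of the list.
      if (pattern[i] === "]") { out += "\\]"; i += 1; }
      // Copy the remaining members up to the closing ']'.
      while (i < pattern.length && pattern[i] !== "]") {
        const member = pattern[i];
        // '\' and '[' are ordinary members in POSIX; escape them for JavaScript.
        out += (member === "\\" || member === "[") ? "\\" + member : member;
        i += 1;
      }
      if (i < pattern.length) { out += "]"; i += 1; }
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

// The shared, server-side form of the pattern from this report.
const jsSource = posixBracketToJs("^[^][{}]+$");
const converted = new RegExp(jsSource);

console.log(jsSource);                         // ^[^\]\[{}]+$
console.log(converted.test("test,test,test")); // true
console.log(converted.test("a]b"));            // false
```

Whether this is acceptable depends on how strictly "the same exact RE" has to be applied on both sides; the sketch only shows that the translation is mechanical for simple bracket expressions.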