Closed
Bug 418099
Opened 18 years ago
Closed 17 years ago
RegExp classrange (case-insensitive) does not follow ECMA spec.
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
DUPLICATE
of bug 416933
People
(Reporter: mozilla, Unassigned)
Details
ECMA-spec 15.10.2.16 (page 142) gives an example for classranges when case-insensitive is activated:
/[E-f]/i should match the symbols [, \, ], ^, _ and `.
FF2.0.0.12 and FF3.0beta3 do not match these symbols (neither does Rhino 1.6rel7, btw).
fwiw, MSIE seems to get this right.
I'd suggest the test be:
/[E-c]/i.test(...) instead of /[E-f]/i.
/[E-c]/i currently yields: Error: invalid range in character class
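A quick way to check a given engine against the two cases above, as a rough sketch. The names `symbols`, `results` and `compiles` are only illustrative; the patterns are built with the RegExp constructor so that an engine which rejects the range throws at run time and can be caught.

```javascript
// The six characters between 'Z' (90) and 'a' (97) named in the spec example.
const symbols = ['[', '\\', ']', '^', '_', '`'];

// Test each symbol against /[E-f]/i.
const eToF = new RegExp('[E-f]', 'i');
const results = symbols.map((ch) => [ch, eToF.test(ch)]);
for (const [ch, matched] of results) {
  console.log(JSON.stringify(ch), matched);
}

// The suggested test: E is 69 and c is 99, so the range is valid by code point.
// It only looks inverted if the endpoints are uppercased first (C is 67).
let compiles;
try {
  new RegExp('[E-c]', 'i');
  compiles = true;
} catch (e) {
  compiles = false;
  console.log(String(e));
}
console.log('/[E-c]/i compiles:', compiles);
```

An engine that follows the spec example should report a match for every symbol and should compile /[E-c]/i without an error.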
Comment 3•18 years ago
The spec is internally inconsistent here, I think. For example:
The internal helper function CharacterRange takes two CharSet parameters A and B and performs the following:
1. If A does not contain exactly one character or B does not contain exactly one character then throw a SyntaxError exception.
2. Let a be the one character in CharSet A.
3. Let b be the one character in CharSet B.
4. Let i be the code point value of character a.
5. Let j be the code point value of character b.
6. If I > j then throw a SyntaxError exception.
7. Return the set containing all characters numbered i through j, inclusive.
In other words, the example that it later provides: /[E-f]/i should yield a syntax error (according to item #6 above, assuming the sudden change in capitalization is an accident and not referring to some other variable "I").
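For reference, the quoted steps can be sketched directly in JavaScript. This is only an illustration: `characterRange` is a hypothetical helper, CharSets are modelled as Set objects of single-character strings, and "code point value" is taken to mean the UTF-16 code unit. Read this way, step 6 compares the raw values of E (69) and f (102), with no case folding applied to the endpoints.

```javascript
// Hypothetical helper mirroring the seven CharacterRange steps quoted above.
function characterRange(A, B) {
  // Step 1: both sets must contain exactly one character.
  if (A.size !== 1 || B.size !== 1) {
    throw new SyntaxError('range endpoints must be single characters');
  }
  // Steps 2-5: take the characters and their code point values.
  const [a] = A;
  const [b] = B;
  const i = a.charCodeAt(0);
  const j = b.charCodeAt(0);
  // Step 6: reject a range whose start lies after its end.
  if (i > j) {
    throw new SyntaxError('invalid range in character class');
  }
  // Step 7: collect every character numbered i through j, inclusive.
  const out = new Set();
  for (let code = i; code <= j; code++) {
    out.add(String.fromCharCode(code));
  }
  return out;
}

const range = characterRange(new Set(['E']), new Set(['f']));
console.log(range.size);     // 34 (code points 69 through 102)
console.log(range.has('[')); // true
```

With the endpoints swapped, `characterRange(new Set(['f']), new Set(['E']))` throws, which is the case step 6 exists to catch.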
Comment 4•18 years ago
I think the pseudocode here should be considered more authoritative than the example. This means that we're accepting bogus ranges that we shouldn't be accepting. I'll check it out in a bit...
The spec seems fine to me: the code point of E is 69 and that of f is 102, so the class range is 69-102. Case sensitivity (or insensitivity) comes into play elsewhere, in 15.10.2.8 (3rd production):
Atom::CharacterClass:
1. Evaluate CharacterClass to obtain a CharSet A and a boolean invert.
2. Call CharacterSetMatcher(A, invert) and return its Matcher result.
It's in the CharacterSetMatcher that the chars are then canonicalized (on the next page):
CharacterSetMatcher step 6/8:
If there [does not exist/exists] a member a of set A such that Canonicalize(a) == cc, then return [true/failure].
So each char of the CharSet needs to be uppercased and then compared to the uppercased input char (semantically, at least).
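A minimal sketch of that comparison, under some assumptions: `canonicalize`, `charSet` and `matchesIgnoreCase` are illustrative names, Canonicalize is approximated with `toUpperCase`, and the set is built from raw code points with no case folding at range-construction time.

```javascript
// Approximation of Canonicalize for the case-insensitive flag.
function canonicalize(ch) {
  const upper = ch.toUpperCase();
  // Keep the original if uppercasing does not yield exactly one character.
  if (upper.length !== 1) return ch;
  // Keep the original if a non-ASCII character would map into ASCII.
  if (ch.charCodeAt(0) >= 128 && upper.charCodeAt(0) < 128) return ch;
  return upper;
}

// Build the CharSet for a range from its code point values.
function charSet(from, to) {
  const set = new Set();
  for (let c = from.charCodeAt(0); c <= to.charCodeAt(0); c++) {
    set.add(String.fromCharCode(c));
  }
  return set;
}

// True if some member of the set canonicalizes to the same character as the input.
function matchesIgnoreCase(set, input) {
  const cc = canonicalize(input);
  for (const member of set) {
    if (canonicalize(member) === cc) return true;
  }
  return false;
}

const set = charSet('E', 'f');
console.log(matchesIgnoreCase(set, '_')); // true: '_' is in the set as-is
console.log(matchesIgnoreCase(set, 'D')); // true: 'd' is in the set and uppercases to 'D'
console.log(matchesIgnoreCase(set, '9')); // false
```

A real engine would not need to iterate the set for every input character; the loop is only meant to make the semantics visible.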
Updated•17 years ago
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → DUPLICATE