Closed Bug 418099 (opened 18 years ago, closed 17 years ago)

RegExp classrange (case-insensitive) does not follow ECMA spec.

Categories

Component: Core :: JavaScript Engine
Type: defect
Hardware: Other
OS: Linux
Priority: Not set
Severity: minor

RESOLVED DUPLICATE of bug 416933

People

Reporter: mozilla; Assignee: Unassigned

Details

The ECMA spec, section 15.10.2.16 (page 142), gives an example for class ranges when case-insensitive matching is enabled: /[E-f]/i should match the symbols [, \, ], ^, _, and `. FF 2.0.0.12 and FF 3.0 beta 3 do not match these symbols (neither does Rhino 1.6rel7, by the way).
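A quick way to see where those six symbols come from: under the semantics the report cites, the range is built from raw code points 69 (E) through 102 (f) before any case folding, so it also spans the characters between Z and a. This is an illustrative sketch only; the final check assumes a modern, spec-conforming engine (for example a current Node.js), not the browsers named in the report.

```javascript
// Assumption: the range E-f is built from raw code points first;
// case-insensitivity is applied only later, at match time.
const lo = "E".codePointAt(0); // 69
const hi = "f".codePointAt(0); // 102

// Collect the characters in 69..102 that are not ASCII letters.
const nonLetters = [];
for (let cp = lo; cp <= hi; cp++) {
  const ch = String.fromCodePoint(cp);
  if (!/[A-Za-z]/.test(ch)) nonLetters.push(ch);
}
console.log(nonLetters.join(" ")); // [ \ ] ^ _ `

// In a spec-conforming engine, /[E-f]/i matches every one of them.
const re = /[E-f]/i;
const allMatch = nonLetters.every((ch) => re.test(ch));
console.log(allMatch);
```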
FWIW, MSIE seems to get this right. I'd suggest the test be /[E-c]/i.test(...) instead of /[E-f]/i, since /[E-c]/i currently yields "Error: invalid range in character class".
/[E-e]/i is also interesting
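To contrast the spec's range check with one that could produce "invalid range" for /[E-c]/i, here is a small sketch. The spec-style check compares raw code points; the second function is purely hypothetical (not taken from SpiderMonkey or Rhino) and uppercases both endpoints before comparing, which would turn E-c into E-C and reject it, and would shrink E-e to E-E.

```javascript
// Spec-style range check (as in CharacterRange): compare raw code points.
function specRangeIsValid(a, b) {
  return a.codePointAt(0) <= b.codePointAt(0);
}

// HYPOTHETICAL: canonicalize (uppercase) the endpoints first, then compare.
// Only an illustration of how the reported error could arise.
function canonicalFirstRangeIsValid(a, b) {
  return a.toUpperCase().codePointAt(0) <= b.toUpperCase().codePointAt(0);
}

for (const [a, b] of [["E", "f"], ["E", "c"], ["E", "e"]]) {
  console.log(`${a}-${b}`, specRangeIsValid(a, b), canonicalFirstRangeIsValid(a, b));
}
// E-f true true
// E-c true false
// E-e true true

// A spec-conforming engine should compile both patterns without throwing.
const compiled = ["[E-c]", "[E-e]"].map((src) => new RegExp(src, "i"));
```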
The spec is internally inconsistent here, I think. For example, the internal helper function CharacterRange takes two CharSet parameters A and B and performs the following:

1. If A does not contain exactly one character or B does not contain exactly one character then throw a SyntaxError exception.
2. Let a be the one character in CharSet A.
3. Let b be the one character in CharSet B.
4. Let i be the code point value of character a.
5. Let j be the code point value of character b.
6. If I > j then throw a SyntaxError exception.
7. Return the set containing all characters numbered i through j, inclusive.

In other words, the example that it later provides, /[E-f]/i, should yield a syntax error (according to item #6 above, assuming the sudden change in capitalization is an accident and not referring to some other variable "I").
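A sketch of the CharacterRange steps quoted in this comment, with CharSets modeled as JavaScript Sets of one-character strings (a representation chosen for illustration, not taken from the spec or any engine). Step 6 is read with i in place of the capital I. Running it on E and f gives i = 69 and j = 102, so step 6 does not throw.

```javascript
// CharSets are modeled as Sets of one-character strings (assumption).
function characterRange(A, B) {
  // Step 1: both operands must contain exactly one character.
  if (A.size !== 1 || B.size !== 1) {
    throw new SyntaxError("CharacterRange: operand is not a single character");
  }
  // Steps 2-5: extract the characters and their code point values.
  const [a] = A;
  const [b] = B;
  const i = a.codePointAt(0);
  const j = b.codePointAt(0);
  // Step 6: reject reversed ranges.
  if (i > j) {
    throw new SyntaxError("CharacterRange: range out of order");
  }
  // Step 7: every character numbered i through j, inclusive.
  const result = new Set();
  for (let cp = i; cp <= j; cp++) result.add(String.fromCodePoint(cp));
  return result;
}

const ef = characterRange(new Set(["E"]), new Set(["f"]));
console.log(ef.size); // 34 (69 through 102)
```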
I think the pseudocode here should be considered more authoritative than the example. This means that we're accepting bogus ranges that we shouldn't be accepting. I'll check it out in a bit...
The spec seems fine to me: the code point of E is 69 and that of f is 102, so CharacterRange returns 69 through 102. Case sensitivity (or insensitivity) comes into play elsewhere, in 15.10.2.8 (third production), Atom :: CharacterClass:

1. Evaluate CharacterClass to obtain a CharSet A and a boolean invert.
2. Call CharacterSetMatcher(A, invert) and return its Matcher result.

It is in CharacterSetMatcher (on the next page) that the characters are canonicalized, steps 6/8: If there [does not exist/exists] a member a of set A such that Canonicalize(a) == cc, then return [true/failure].

So, semantically at least, each character of the CharSet is uppercased and then compared with the uppercased input character.
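A sketch of the matching side described in this comment: Canonicalize with IgnoreCase set, following the ES3 definition as we understand it and restricted to single UTF-16 code units, plus a simplified class test that compares canonicalized set members with the canonicalized input character. The names canonicalize and classMatches are hypothetical; captures, end-of-input checks and continuations are left out.

```javascript
// Canonicalize(ch) with IgnoreCase = true (simplified sketch).
function canonicalize(ch) {
  const u = ch.toUpperCase();
  // Keep ch if uppercasing does not yield exactly one character (e.g. "ß" -> "SS").
  if (u.length !== 1) return ch;
  // Keep ch if a non-ASCII character would map into ASCII (e.g. dotless i -> "I").
  if (ch.charCodeAt(0) >= 128 && u.charCodeAt(0) < 128) return ch;
  return u;
}

// Hypothetical helper: does input character c match the class (A, invert)?
function classMatches(A, invert, c) {
  const cc = canonicalize(c);
  let found = false;
  for (const a of A) {
    if (canonicalize(a) === cc) { found = true; break; }
  }
  return invert ? !found : found;
}

// The E-f CharSet, built from raw code points 69..102.
const A = new Set();
for (let cp = 69; cp <= 102; cp++) A.add(String.fromCharCode(cp));

console.log(["_", "`", "a", "z", "Q", "{"].map((ch) => classMatches(A, false, ch)));
// [ true, true, true, true, true, false ]
```

Note that "z" matches even though only E through Z and a through f are in the set: "z" canonicalizes to "Z", which is a member.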
Re: comment 5 -- ooops, thanks, my bad.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → DUPLICATE