Closed Bug 488666 Opened 16 years ago Closed 7 years ago

RegExp support for Unicode is inadequate

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID
Future

People

(Reporter: lhansen, Unassigned)

Details

Standard character class escapes like \w and \s follow ES3 and support ASCII only. This is no good now that there are many more internet users outside the English-speaking world than in it. PCRE already supports (we think) the \p and \P sets proposed for ES4, but they are disabled in the code. Supporting those is one idea. Another is to examine whether we want to follow POSIX, Perl, or Java.
Is ES5, and/or the TraceMonkey RegExp engine, doing anything regarding enhanced Unicode support? (If so, we should consider those too.)
Nothing appears to have happened in ES5 regarding Unicode; I checked the latest draft. I have not asked about or checked Harmony or TraceMonkey. cc'ing Brendan.
Peter Hall makes the case that character class abbreviations like \p and \P require a lot of work to combine into useful sets corresponding to e.g. \w, and that operators like \b are intractable (impossible? don't know what we can do with lookahead / negative lookahead) without built-in support.
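To make the point above concrete, here is a minimal sketch of what composing a Unicode-aware stand-in for \w out of property escapes might look like, with \b emulated through lookbehind and lookahead. It uses ECMAScript 2018 RegExp syntax from TypeScript, not the PCRE build embedded in Tamarin, so it only illustrates the amount of hand assembly involved. The names `WORD`, `WORD_BOUNDARY`, `wordRe`, and `findWords` are hypothetical, and the choice of categories (letters, marks, decimal digits, connector punctuation) is an assumption; other definitions of "word character" are equally defensible.

```typescript
// Hypothetical stand-in for \w: letters, marks, decimal digits, connector punctuation.
const WORD = "[\\p{L}\\p{M}\\p{Nd}\\p{Pc}]";

// Emulated \b: a position with a word character on exactly one side.
const WORD_BOUNDARY =
  `(?:(?<=${WORD})(?!${WORD})|(?<!${WORD})(?=${WORD}))`;

// Built with the RegExp constructor so the pattern is parsed at run time.
const wordRe = new RegExp(`${WORD_BOUNDARY}${WORD}+${WORD_BOUNDARY}`, "gu");

// Returns every maximal run of word characters in the text.
function findWords(text: string): string[] {
  return text.match(wordRe) ?? [];
}

// "naïve café", written with escapes to avoid source-encoding ambiguity.
const sampleText = "na\u00EFve caf\u00E9";

// The built-in \w is ASCII-only, so it splits at the accented letters.
const asciiWords: string[] = sampleText.match(/\w+/g) ?? [];

// The composed class keeps each word whole.
const unicodeWords: string[] = findWords(sampleText);
```

Even in this small case the boundary expression has to repeat the word class four times, which is roughly the burden being described; a built-in Unicode-aware \b would avoid it.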
Target Milestone: --- → Future
Flags: flashplayer-qrb+
There was a proposal to make `\w`, `\d`, `\b`, and `\B` Unicode-aware when the ES6 `u` flag is set, but it was rejected: https://github.com/mathiasbynens/es-regexp-unicode-character-class-escapes/blob/master/d-w-b.md Adding support for Unicode property escapes to regular expressions (bug 1361876) should take away some of the pain.
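For anyone checking the current behaviour, a small sketch follows. It assumes an engine that implements ES2018 property escapes, and the variable names are made up for illustration. With the `u` flag, `\w` and `\d` still match ASCII only, while `\p{L}` and `\p{Nd}` match the non-ASCII letter and digit.

```typescript
// Patterns are built with the RegExp constructor so they are parsed at run time.
const accentedLetter = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE
const arabicIndicDigit = "\u0663"; // ARABIC-INDIC DIGIT THREE

// \w and \d remain ASCII-only even when the u flag is set.
const wordEscapeMatches = new RegExp("^\\w$", "u").test(accentedLetter);
const digitEscapeMatches = new RegExp("^\\d$", "u").test(arabicIndicDigit);

// Property escapes cover the same characters by Unicode general category.
const letterPropertyMatches = new RegExp("^\\p{L}$", "u").test(accentedLetter);
const digitPropertyMatches = new RegExp("^\\p{Nd}$", "u").test(arabicIndicDigit);
```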
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID