Bug 488666
Opened 16 years ago
Closed 7 years ago
RegExp support for Unicode is inadequate
Categories: (Tamarin Graveyard :: Virtual Machine, defect)
Tracking: (Not tracked)
Status: RESOLVED INVALID
Target Milestone: Future
People: (Reporter: lhansen, Unassigned)
Standard character set names like \w and \s follow ES3 and support ASCII only. This is no good now that there are many more internet users outside the English-speaking world than in it.
PCRE already supports (we think) the \p and \P sets proposed for ES4, but they are disabled in the code. Supporting those is one idea. Another is to examine whether we want to follow POSIX, Perl, or Java.
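To make the ASCII-only limitation concrete, here is a minimal sketch in plain JavaScript rather than ActionScript. It assumes Tamarin's RegExp behaves like a standard ECMAScript engine for these inputs, which has not been verified here. The helper names `isAsciiWord` and `wordRuns` are hypothetical and exist only for illustration.

```javascript
// Without extra flags, \w is equivalent to [A-Za-z0-9_] and \d to [0-9].

// Hypothetical helper: true if the whole string consists of \w characters.
function isAsciiWord(s) {
  return /^\w+$/.test(s);
}

// Hypothetical helper: the runs of \w characters found in a string.
function wordRuns(s) {
  return s.match(/\w+/g) || [];
}

console.log(isAsciiWord("hello"));   // true
console.log(isAsciiWord("héllo"));   // false: "é" is not in [A-Za-z0-9_]
console.log(/^\d+$/.test("١٢٣"));    // false: Arabic-Indic digits are not [0-9]
console.log(wordRuns("Ærøskøbing")); // [ 'r', 'sk', 'bing' ]
```

The last line shows the practical damage: a single non-English word is split into fragments because "Æ" and "ø" are treated as non-word characters.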
Comment 1•16 years ago
Is ES5, and/or the TraceMonkey RegExp engine, doing anything regarding enhanced Unicode support? (If so, we should consider those too.)
Comment 2•16 years ago (Reporter)
Nothing appears to have happened in ES5 regarding Unicode; I checked the latest draft. I have not asked about or checked Harmony or TraceMonkey. cc'ing Brendan.
Comment 3•16 years ago (Reporter)
Peter Hall makes the case that character class abbreviations like \p and \P require a lot of work to combine into useful sets corresponding to e.g. \w, and that operators like \b are intractable (impossible? don't know what we can do with lookahead / negative lookahead) without built-in support.
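To illustrate the amount of assembly involved, here is a rough sketch in modern JavaScript. It relies on ES2018 property escapes and lookbehind, which are assumptions about the engine and were not available to Tamarin. The property list chosen for `WORD` is itself an assumption, loosely following the UTS #18 suggestion for \w, and `unicodeWords` is a hypothetical helper name.

```javascript
// Hand-built stand-in for a Unicode-aware \w (assumed property list).
const WORD = "[\\p{Alphabetic}\\p{M}\\p{Nd}\\p{Pc}]";

// Hand-built stand-in for \b: a position with a word character on exactly
// one side, expressed with lookbehind and lookahead.
const BOUNDARY =
  `(?:(?<=${WORD})(?!${WORD})|(?<!${WORD})(?=${WORD}))`;

// Hypothetical helper: extract whole words using the emulated \w and \b.
function unicodeWords(text) {
  const re = new RegExp(`${BOUNDARY}${WORD}+${BOUNDARY}`, "gu");
  return text.match(re) || [];
}

// The built-in \b treats "é" as a non-word character, so it reports a
// boundary in the middle of "café".
console.log(/\bcaf\b/.test("café"));              // true
console.log(unicodeWords("naïve café, Москва 東京"));
// [ 'naïve', 'café', 'Москва', '東京' ]
```

Every pattern that wants \w or \b semantics has to splice in these long fragments, which is the burden being described.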
Updated•15 years ago (Reporter)
Blocks: regex-upgrade
Target Milestone: --- → Future
Comment 4•7 years ago
There was a proposal to make `\w`, `\d`, `\b`, and `\B` Unicode-aware when the ES6 `u` flag is set, but it was rejected. https://github.com/mathiasbynens/es-regexp-unicode-character-class-escapes/blob/master/d-w-b.md
Adding support for Unicode property escapes to regular expressions (bug 1361876) should take away some of the pain.
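A short sketch of what that means in practice, assuming a JavaScript engine that implements ES2018 Unicode property escapes. The helper names `isLetters` and `isCyrillic` are hypothetical.

```javascript
// Hypothetical helper: true if the string consists only of Unicode letters.
function isLetters(s) {
  return /^\p{L}+$/u.test(s);
}

// Hypothetical helper: true if the string is written entirely in Cyrillic.
function isCyrillic(s) {
  return /^\p{Script=Cyrillic}+$/u.test(s);
}

// The u flag alone does not widen \w; it stays [A-Za-z0-9_].
console.log(/^\w+$/u.test("Москва")); // false

// Property escapes match by Unicode property, but only with the u flag.
console.log(isLetters("Москва"));     // true
console.log(isCyrillic("Москва"));    // true
console.log(isCyrillic("Moscow"));    // false

// Without the u flag, \p is just an escaped literal "p".
console.log(/\p{L}/.test("p{L}"));    // true
```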
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID