Closed Bug 488666 Opened 16 years ago Closed 7 years ago

RegExp support for Unicode is inadequate

Tracking

(Not tracked)

Status:

RESOLVED INVALID

Milestone:

Future

People

(Reporter: lhansen, Unassigned)

References

Details

Lars T Hansen

Reporter

Description

•

16 years ago

Standard character class escapes like \w and \s follow ES3 and support ASCII only. This is no good now that there are many more internet users outside the English-speaking world than in it. PCRE already supports (we think) the \p and \P sets proposed for ES4, but they are disabled in the code. Supporting those is one idea. Another is to examine whether we want to follow POSIX, Perl, or Java.
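To make the complaint concrete, here is a minimal sketch of the ASCII-only behaviour of `\w`. It is written in plain ECMAScript as an illustration and is an assumption about how the same patterns behave in Tamarin's PCRE-based engine, which is not exercised here. The sample strings are made up for the example.

```javascript
// Illustrative only: plain ECMAScript, not Tamarin's PCRE-based engine.
const asciiWord = /^\w+$/;

console.log(asciiWord.test("hello"));        // true
console.log(asciiWord.test("caf\u00E9"));    // false: U+00E9 is outside [A-Za-z0-9_]
console.log(asciiWord.test("\u65E5\u672C")); // false: CJK ideographs are not matched either

// Without the i and u flags, \w is equivalent to this explicit ASCII class.
const explicit = /^[A-Za-z0-9_]+$/;
const samples = ["hello", "caf\u00E9", "snake_case", "\u0416"];
const agree = samples.every((s) => asciiWord.test(s) === explicit.test(s));
console.log(agree); // true
```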

Steven Johnson

Comment 1

•

16 years ago

Is ES5, and/or the TraceMonkey RegExp engine, doing anything regarding enhanced Unicode support? (If so, we should consider those too.)

Lars T Hansen

Reporter

Comment 2

•

16 years ago

Nothing appears to have happened in ES5 regarding Unicode; I checked the latest draft. I have not asked about or checked Harmony or TraceMonkey. Cc'ing Brendan.

Lars T Hansen

Reporter

Comment 3

•

16 years ago

Peter Hall makes the case that character class abbreviations like \p and \P require a lot of work to combine into useful sets corresponding to e.g. \w, and that operators like \b are intractable without built-in support (impossible? I don't know what we can do with lookahead / negative lookahead).
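For a sense of the work involved, the following is a rough sketch of a hand-built Unicode word class and word boundary. It assumes an ECMAScript engine with the `u` flag, Unicode property escapes, and lookbehind (ES2018 or later), none of which existed when this comment was written. The names `WORD`, `unicodeWord`, and `unicodeBoundary` are hypothetical, and the property list loosely follows the word-character recommendation in UTS #18 rather than anything decided in this bug.

```javascript
// Hypothetical helper names; requires the `u` flag, property escapes,
// and lookbehind (ES2018 or later).
const WORD = "[\\p{Alphabetic}\\p{M}\\p{Nd}\\p{Pc}\\p{Join_Control}]";

// Hand-built replacement for \w+ .
const unicodeWord = new RegExp(`${WORD}+`, "gu");

// Hand-built replacement for \b: a position with a word character on
// exactly one side, expressed with lookbehind and lookahead.
const unicodeBoundary = new RegExp(
  `(?<=${WORD})(?!${WORD})|(?<!${WORD})(?=${WORD})`,
  "gu"
);

const text = "na\u00EFve \u0416\u0443\u043A caf\u00E9";

console.log(text.match(unicodeWord)); // three whole words
console.log(text.match(/\w+/g));      // ["na", "ve", "caf"]: ASCII fragments only

const boundaries = [...text.matchAll(unicodeBoundary)].map((m) => m.index);
console.log(boundaries);              // [0, 5, 6, 9, 10, 14]
```

The boundary pattern repeats the word class four times, which illustrates the point about `\b` being awkward to emulate even where lookaround is available.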

Lars T Hansen

Reporter

Updated

•

15 years ago

Blocks: regex-upgrade

Target Milestone: --- → Future

Dan Smith

Updated

•

13 years ago

Flags: flashplayer-qrb+

Mathias Bynens

Comment 4

•

7 years ago

There was a proposal to make `\w`, `\d`, and `\b` & `\B` Unicode-aware when the ES6 `u` flag is set, but it was rejected: https://github.com/mathiasbynens/es-regexp-unicode-character-class-escapes/blob/master/d-w-b.md

Adding support for Unicode property escapes to regular expressions (bug 1361876) should take away some of the pain.
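As a small illustration of both points, the sketch below checks that the `u` flag on its own leaves `\d`, `\w`, and `\b` ASCII-based, and that property escapes can be used explicitly instead. It assumes an engine that implements Unicode property escapes; the sample strings and variable names are made up for the example.

```javascript
// Assumes an engine with Unicode property escapes (ES2018 or later).
const arabicDigit = "\u0663"; // ARABIC-INDIC DIGIT THREE
const accented = "na\u00EFve";

// The `u` flag alone does not widen \d, \w, or \b.
const dPlain = /\d/.test(arabicDigit);       // false
const dUnicode = /\d/u.test(arabicDigit);    // false
const wUnicode = /^\w+$/u.test(accented);    // false: U+00EF is not in \w
const bUnicode = /\bve\b/u.test(accented);   // true: \b sees a boundary after U+00EF

// Property escapes state the intended set explicitly.
const nd = /\p{Nd}/u.test(arabicDigit);                 // true
const letters = /^\p{L}+$/u.test(accented);             // true
const cyrillic = /\p{Script=Cyrillic}/u.test("\u0416"); // true

console.log({ dPlain, dUnicode, wUnicode, bUnicode, nd, letters, cyrillic });
```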

Mathias Bynens

Updated

•

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → INVALID

Categories

(Tamarin Graveyard :: Virtual Machine, defect)
