Implement the RegExp Match Indices proposal
Categories
(Core :: JavaScript: Standard Library, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox88 | --- | fixed |
People
(Reporter: alex.fdm, Assigned: iain)
References
(Blocks 1 open bug, )
Details
(Keywords: dev-doc-complete)
Attachments
(4 files)
The proposal is currently in Stage 2.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 3•3 years ago
|
||
This proposal should be straightforward for us to implement. The regexp engine already returns matches in this form (an array of start/end indices), which we post-process to turn into match strings. All we have to do is expose them. We can probably do this while implementing named captures, which should be happening soon (either in 77 or in 78).
I will point out that we are using the same engine as V8 under the covers, so from the point of view of multiple implementations for TC39, it would be good to hear from JSC.
Reporter | ||
Updated•3 years ago
|
Assignee | ||
Comment 4•3 years ago
|
||
While working on named captures, I started looking at match indices. It turns out to be more work to implement efficiently than I had hoped.
Match indices add an additional property to every match result, which is an array containing other arrays. Eagerly allocating those objects is likely to add a noticeable performance penalty to regexp matches. Unlike named captures, which opt in to creating new objects using new syntax, match indices are required for every match, which has the potential to slow down the 99%+ of matches that don't use capture indices in favour of the tiny fraction that do.
It's probably necessary to generate these objects on-demand when the indices
property is accessed. There's some tricky caching required to get this right, and reading through the comments on the bug tracking V8's implementation, it seems they've had problems with both performance and correctness.
Making this fast might involve a significant modification to the way we represent regexp results.
(Note: all of the interesting work for match indices happens outside the regexp engine proper.)
Assignee | ||
Comment 5•2 years ago
|
||
The January 2021 TC39 meeting changed the specification of match indices to only generate the indices
property when a new /d
flag is used. This fixes the performance concerns. We can go ahead and implement this.
Assignee | ||
Comment 6•2 years ago
|
||
Updated•2 years ago
|
Assignee | ||
Comment 7•2 years ago
|
||
Before match indices, all match results had the same shape. (If there are no named capture groups, .groups
exists but is undefined.) Match results with .indices
have a distinct shape, so we need a distinct template object. We also need a template object for .indices
itself.
Depends on D107135
Assignee | ||
Comment 8•2 years ago
|
||
The spec text is here: https://tc39.es/proposal-regexp-match-indices
Unfortunately, the diff algorithm used to generate the spec text for proposals assigns a number to deleted steps, so the step numbers in RegExpBuiltinExec will be different in the final spec. In an attempt to make the step numbers meaningful, I manually renumbered them. Here's the gist that I worked from: https://gist.github.com/iainireland/8e7541a95a7ec6900ce70fc44f9a3b76
match-indices-warp.js
ensures that we have at least one testcase that calls a /d
regexp from Warp. match-indices-dictionary
ensures we have coverage of dictionary mode; it is based on regexp/bug1640473.js
.
Depends on D107136
Assignee | ||
Comment 9•2 years ago
|
||
Depends on D107137
Comment 10•2 years ago
|
||
(In reply to Iain Ireland [:iain] from comment #8)
Unfortunately, the diff algorithm used to generate the spec text for proposals assigns a number to deleted steps, so the step numbers in RegExpBuiltinExec will be different in the final spec. In an attempt to make the step numbers meaningful, I manually renumbered them.
You can view the diff for the actual ECMA262 Pull Request for this feature here: https://arai-a.github.io/ecma262-compare/?pr=1713. This tool generates correct algorithm step numbers as part of the diff.
I've also updated the proposal repository to link to the official PR diff rather than the generated proposal text.
Comment 11•2 years ago
|
||
Pushed by iireland@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9b26c3d0fe80 Add hasIndices to RegExpFlags r=mgaudet https://hg.mozilla.org/integration/autoland/rev/9b48010357a1 Support .indices in match result template objects r=mgaudet https://hg.mozilla.org/integration/autoland/rev/34ba1b4f06ca Create match indices r=mgaudet https://hg.mozilla.org/integration/autoland/rev/054f16f96be2 Enable test262 tests r=mgaudet
Comment 12•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9b26c3d0fe80
https://hg.mozilla.org/mozilla-central/rev/9b48010357a1
https://hg.mozilla.org/mozilla-central/rev/34ba1b4f06ca
https://hg.mozilla.org/mozilla-central/rev/054f16f96be2
Reporter | ||
Updated•2 years ago
|
Comment 13•7 months ago
|
||
What is needed for us to remove dev-doc-needed on this? From what I can see the main implication is documentation and compatibility data for hasIndices()
- and that seems to have been already done.
Assignee | ||
Comment 14•7 months ago
|
||
It looks good to me!
The only thing I would consider adding is a more prominent description (maybe on the hasIndices
page?) of the structure of the output, similar to the examples here. In particular, it might be nice to document the interaction between /d
and named capture groups.
Comment 15•7 months ago
|
||
Thanks very much Iain. Updates as suggested are in https://github.com/mdn/content/pull/21966
Description
•