Closed Bug 1362154 Opened 4 years ago Closed 1 year ago

Implement RegExp named groups

Categories

(Core :: JavaScript Engine, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla78
Webcompat Priority P2
Tracking Status
firefox60 --- wontfix
firefox78 --- fixed

People

(Reporter: leonardo.balter, Assigned: iain)

References

Details

(Keywords: dev-doc-complete, parity-chrome, parity-safari)

Attachments

(12 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36

Steps to reproduce:

The TC39's proposal is currently on stage 3:

https://github.com/tc39/proposal-regexp-named-groups

Test262 files are using the `regexp-named-groups` features flag:

test262/built-ins/RegExp/named-groups/functional-replace-global.js
test262/built-ins/RegExp/named-groups/functional-replace-non-global.js
test262/built-ins/RegExp/named-groups/groups-properties.js
test262/built-ins/RegExp/named-groups/lookbehind.js
test262/built-ins/RegExp/named-groups/non-unicode-match.js
test262/built-ins/RegExp/named-groups/non-unicode-malformed.js
test262/built-ins/RegExp/named-groups/non-unicode-property-names.js
test262/built-ins/RegExp/named-groups/non-unicode-references.js
test262/built-ins/RegExp/named-groups/string-replace-get.js
test262/built-ins/RegExp/named-groups/string-replace-numbered.js
test262/built-ins/RegExp/named-groups/string-replace-missing.js
test262/built-ins/RegExp/named-groups/string-replace-unclosed.js
test262/built-ins/RegExp/named-groups/string-replace-undefined.js
test262/built-ins/RegExp/named-groups/unicode-match.js
test262/built-ins/RegExp/named-groups/unicode-property-names.js
test262/built-ins/RegExp/named-groups/unicode-references.js
The skip list is here: https://bugzilla.mozilla.org/show_bug.cgi?id=1362169
Depends on: 1367105
This proposal was approved by TC39 and is now part of the ES2018 standard.
It is already supported by current Chrome and by Safari TP.
This proposal was approved by TC39 and is now part of the ES2018 standard.
It is already supported by current Chrome.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
This was supported in Chrome 64 and now on Safari 11.1.  Maybe some "parity" tags should be added here.
I've added the parity flags. Hopefully this and other new regexp features like lookbehinds are prioritized at some point!

Sebastian

(In reply to Jacob Pratt from comment #7)

Is there any implementation happening on this?

As you can see on the previous comments, obviously not, unfortunately.

If not, where would I need to look to get started?

If you mean by that, you want to contribute a patch that implements the feature, you should set up a development environment, read the specification describing the algorithm (maybe also see the proposal for further info), and ask a peer for help where the code lies and how to get started by entering their email address in the "Need more information from" field below the comment box.

Having said that, this feature is presumably rather complex and not something for beginners, but I am not a peer, just another one who can't await that this feature to be implemented.

Sebastian

Type: defect → enhancement

(In reply to Jacob Pratt from comment #7)

Is there any implementation happening on this? If not, where would I need to look to get started?

This feature is tied to its dependency https://bugzilla.mozilla.org/show_bug.cgi?id=1367105. If we can get that one done, we should be able to get all the new RegExp features, not just this one.

I don't think we'll have to implement this from scratch as part of the work is already done, but it's not a simple copy-paste fix either.

(In reply to Sebastian Zartner [:sebo] from comment #8)

read the specification describing the algorithm (maybe also see the proposal for further info) …

I just realized the links I provided are for the wrong feature (lookbehinds, see bug 1225665). Named groups were proposed here.

(In reply to Porama Ruengrairatanaroj from comment #9)

This feature is tied to its dependency https://bugzilla.mozilla.org/show_bug.cgi?id=1367105. If we can get that one done, we should be able to get all the new RegExp features, not just this one.

I don't think we'll have to implement this from scratch as part of the work is already done, but it's not a simple copy-paste fix either.

Thanks for the info, Porama! So, hopefully that one will be picked up again soon!

Sebastian

Media are seeing bustages with https://developer.apple.com/videos/ videos due to scripts used in video playback requiring named capturing groups, per bug 1580895.

Blocks: 1580895

@sdetar, could you check that this has the appropriate priority? It's causing web breakage in at least one known case (bug 1580895).

Flags: needinfo?(sdetar)

Haik, thank you for pointing this out. It seems RegExp features are becoming more of a priority for us to address and we are revisiting/discussing now where this fits into our priority list and how we could resource it.

Flags: needinfo?(sdetar)

I am actively working on updating our regexp support, but it is still too soon to be able to say when it will land.

Webcompat Priority: --- → ?

Emma, this is the right one.
The issue in https://webcompat.com/issues/39461
was also reported on Bug 1580895
but the origin of the failure is here. :)
And it's why I had set the seeAlso and the webcompat priority.

Flags: needinfo?(kdubost) → needinfo?(ehumphries)

Oh, I see it now. I saw the notes about A/V error and stopped reading. I should had read further.

Flags: needinfo?(ehumphries)

(setting as a webcompat p2 because bustage on apple.com isn't great)

Webcompat Priority: ? → P2

We are in the process of syncing our copy of irregexp with V8 upstream. When this work is complete, those features will be available in Firefox. The meta bug tracking this work is bug 1367105.

According to https://github.com/webcompat/web-bugs/issues/51194, https://flutter.dev/ relies on this without transpiling (so certain apps will break in Firefox).

My initial shim of FlatStringReader used AutoCheckCannotGC, but the API we need to support doesn't actually require that. We only need to be able to get characters by index, which at most requires a HandleLinearString.

Getting rid of this AutoCheckCannotGC helps pave the way for allocating GC things while parsing regexps, which is a huge simplification for the FixedArray that irregexp uses to return named capture information.

Assignee: nobody → iireland
Status: NEW → ASSIGNED

In debug, CheckPatternSyntax holds an unrooted RegExpShared across a call to the regexp parser. This is the only remaining obstacle to allocating GC things while parsing regexps.

Depends on D76033

The irregexp parser uses FixedArray internally to return information about named captures. Now that we can allocate GC things during regexp parsing, it's trivial to implement FixedArray using the elements of a regular dense ArrayObject.

Depends on D76034

If a regexp contains any named captures, the irregexp parser returns information about those captures in the form of an ArrayObject, where even elements store the capture name and odd elements store the corresponding capture index. We split this information into two parts. The names are used to create a template object with a property for each capture name. The capture indices are stored as a heap-allocated array. Both of these are stored on the RegExpShared.

In the next patch, we use the template object to create a groups object for the regexp match result, and use the array of capture indices to populate that object.

Depends on D76035

When a regexp has named captures, the result object has an extra groups property, which is an object that in turn has a string property per named capture. This patch creates the groups object and attaches it to the result object.

The changes here correspond to this section of the spec proposal, which conveniently highlights the changes: https://tc39.es/proposal-regexp-named-groups/#sec-regexpbuiltinexec

Depends on D76036

Named captures add a groups property to the match result. The shape of that group depends on the regexp. Allocating an object in masm requires monomorphizing on a particular template object, but all of our existing Ion regexp code is designed to handle arbitrary regexps. This means we can't allocate the groups object in jitcode without a lot of work.

Given that we have to call into C++ to allocate, we might as well just use the existing OOL fallback. This performs a VM call to RegExpMatcherRaw, which will simply call CreateRegExpMatchResult.

Depends on D76037

This patch implements the changes to RegExp.prototype[@@replace] to support replacing named captures.

See: https://tc39.es/proposal-regexp-named-groups/#sec-regexp.prototype-@@replace

(I'm not completely sure what to do with out-of-date step numbers. I updated them in code that I modified, and left them alone elsewhere.)

Depends on D76038

This patch implements the changes to GetSubstitution to support named captures.

See https://tc39.es/proposal-regexp-named-groups/#sec-getsubstitution

Depends on D76039

We have a number of tests that expect a particular set of properties on regexp results, which need to be updated now that they have a (generally undefined) groups property.

Depends on D76045

Depends on D76046

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7f7a204fa024
Part 1: Rewrite FlatStringReader r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/e7d230029514
Part 2: Avoid rooting hazard in CheckPatternSyntax r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/7c315167d74a
Part 3: Implement FixedArray r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/1322042730b6
Part 4: Store named capture information in RegExpShared r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/00d2e220a575
Part 5: Create groups property on result object r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/9bad8ab13f06
Part 6: Fall back to OOL call for named captures in Ion r=jandem
https://hg.mozilla.org/integration/autoland/rev/c296b98a4e22
Part 7: Named capture replacements in self-hosted code r=arai
https://hg.mozilla.org/integration/autoland/rev/a468786b4570
Part 8: Named capture replacements in RegExpGetSubstitution r=arai
https://hg.mozilla.org/integration/autoland/rev/e08aed53ff8f
Part 9: Support named captures in test262-update.py r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/0335b0ac6b9f
Part 10: Enable newly supported tests r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/4ac2a27804b4
Part 11: Update expectations for existing tests r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/bd037efc2398
Part 12: Enable named captures r=mgaudet
Depends on: 1640473
Depends on: 1640475
Depends on: 1640479
Depends on: 1640487
Regressions: 1640473
Regressions: 1640487
Regressions: 1640592
See Also: → 1519483
Regressions: 1718842
You need to log in before you can comment on or make changes to this bug.