Closed Bug 1225665 Opened 5 years ago Closed 2 months ago

Implement RegExp lookbehind

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Webcompat Priority P1

People

(Reporter: evilpie, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: dev-doc-complete, parity-chrome, Whiteboard: [DocArea=JS])

Attachments

(1 file)

Attached patch WIPSplinter Review
I started porting v8's new code for RegExp lookbehind matches:
https://github.com/v8/v8/commit/37632606bbce1418238b13fd90cb6ef6705871cd

I already got it working in the RegEXp interpreter. Sadly porting the v8 code is not straightforward and I had to do it almost by hand.
Assignee: nobody → evilpies
Attachment #8688704 - Attachment is patch: true
"abcdef".match(/(?<=(c))def/) currently produces ["def", undefined] and not ["def", "c"].
Keywords: dev-doc-needed
Whiteboard: [DocArea=JS]
Please note that the V8 implementation does not fully reflect the linked proposal. The proposal requires the lookbehind to be of fixed size (so that the implementation can jump back by the fixed size and perform a lookahead). This excludes variable quantifiers and back references. V8's experimental implementation reads backwards and therefore does not have such limitations.
Yangguo thanks for commenting about that! Do you maybe have an idea what could cause the issue in comment 1?
Comment on attachment 8688704 [details] [diff] [review]
WIP

Can you take a look? I just can't find the problem. You need to execute the test with --no-native-regexp.
Attachment #8688704 - Flags: feedback?(bhackett1024)
--no-native-regexp does not actually seem to have an effect. I had to comment out the do-while-false block around irregexp::ExecuteCode in vm/RegExpObject.cpp

The actual bug is in Trace::PerformDeferredActions. Note how in the original V8 patch the sentinel value changed from -1 to kMinInt [0]. The reason for this is that with backward read direction, the cp offset can actually be negative, so -1 as sentinel value is misleading. In your test case, the capture length is 1, read backward, the cp offset to be stored is coincidentally -1, so we do not store at all.

With that fixed, the capture works correctly.

Aside from that, I found it kind of inconvenient that SpiderMonkey's irregexp port does not include the diagnostics found in V8, for example tracing regexp bytecode execution. I put some printf into the bytecode interpreter to find the difference.

[0] https://github.com/v8/v8/commit/37632606bbce1418238b13fd90cb6ef6705871cd#diff-a8917623761138efd70f2d5c2055d1c1R1231
Thank you so much. --no-native-regexp prevents us from using the JIT for regexps, which wasn't implemented yet.
Attachment #8688704 - Flags: feedback?(bhackett1024)
I understand what --no-native-regexp is supposed to do. But using it didn't actually take me on the code path for the interpreter, i.e. irregexp::InterpretCode instead of irregexp::ExecuteCode. Maybe I was doing something wrong, but I didn't care enough to figure that one out.
Assignee: evilpies → nobody
Even after fixing this, there is an other issue with zaacbbbcac.match(/(z)((a+)?(b+)?(c))*/). I don't think I will finish this in the near future.
Just curious: /(z)((a+)?(b+)?(c))*/ does not actually contain a lookbehind assertion. Do you mean applying the patch breaks regexps that do not contain lookbehind assertions?
The TC39's proposal is currently on stage 3:

https://github.com/tc39/proposal-regexp-lookbehind

Test262 files are using the `regexp-lookbehind` features flag:

test262/built-ins/RegExp/lookBehind/alternations.js
test262/built-ins/RegExp/lookBehind/back-references-to-captures.js
test262/built-ins/RegExp/lookBehind/back-references.js
test262/built-ins/RegExp/lookBehind/captures.js
test262/built-ins/RegExp/lookBehind/captures-negative.js
test262/built-ins/RegExp/lookBehind/do-not-backtrack.js
test262/built-ins/RegExp/lookBehind/greedy-loop.js
test262/built-ins/RegExp/lookBehind/mutual-recursive.js
test262/built-ins/RegExp/lookBehind/misc.js
test262/built-ins/RegExp/lookBehind/nested-lookaround.js
test262/built-ins/RegExp/lookBehind/negative.js
test262/built-ins/RegExp/lookBehind/simple-fixed-length.js
test262/built-ins/RegExp/lookBehind/sliced-strings.js
test262/built-ins/RegExp/lookBehind/sticky.js
test262/built-ins/RegExp/lookBehind/start-of-line.js
test262/built-ins/RegExp/lookBehind/variable-length.js
test262/built-ins/RegExp/lookBehind/word-boundary.js
Depends on: 1367105
Blocks: 1425763
This proposal was approved by TC39 and is now part of the ES2018 standard.
It is already supported by current Chrome.
The TC39's proposal is on stage 4 now and should be supported. Chrome already do. You can test it with this link: https://jsfiddle.net/sho8dqaa/
Whiteboard: [DocArea=JS] → [DocArea=JS][parity-chrome]
No longer blocks: es-2018
Mass bug change to replace various 'parity' whiteboard flags with the new canonical keywords. (See bug 1443764 comment 13.)
Keywords: parity-chrome
Whiteboard: [DocArea=JS][parity-chrome] → [DocArea=JS]
Blocks: es-2018
No longer blocks: es-proposals-stage-4
The proposal, which is now at stage 4, does now require that implementations read backwards in the same way V8, the experimental implementation in this bug and many other RegExp implementations implement lookbehind assertions.

This means that the hacky and error prone way to go back a fixed amount and do a lookahead assertion instead was rejected and fixed during the TC39 process.
Duplicate of this bug: 1544999
Type: defect → enhancement
Webcompat Priority: --- → ?

Let me change this. The issue for https://webcompat.com/issues/39461 was with Bug 1362154 (named groups).

Webcompat Priority: ? → ---
Duplicate of this bug: 1584910

:iain, does your work on named groups implement this?

Flags: needinfo?(iireland)

Yes. My work is to re-sync our irregexp implementation with the one that is used in V8. When I'm done, our regexps should be at feature parity with Chrome.

Flags: needinfo?(iireland)

:iain are you still working on this? Your post 4 months ago gave us some hope, but then it went quiet again...

Lookbehind assertions are one of the few ES2018 features that cannot be polyfilled.

It's been almost 3 years since this was added to Node and V8...

At least Firefox has a bug task for it that shows some signs of life; Safari does not even have that. And because of Apple's lock-down on browser engines on iOS, users cannot even switch to Chrome/Opera/Edge to get it.

Flags: needinfo?(iireland)

I've been working away on this behind the scenes. There's still work to be done to clean up some of the code, but all the tests are passing. I'll be starting to put some of the initial patches up for review today.

Flags: needinfo?(iireland)

Any progress on it?

Flags: needinfo?(iireland)

Please track Bug 1367105 to see progress.

Flags: needinfo?(iireland)
Depends on: 1629092
Webcompat Priority: --- → P1
Blocks: 1629092
No longer depends on: 1629092
Duplicate of this bug: 1632725
Duplicate of this bug: 1635724

Fixed by the landing of bug 1634135.

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED

Congratulations! This must have been super hard to accomplish. I really appreciate it.

You need to log in before you can comment on or make changes to this bug.