Status

()

enhancement
4 years ago
Last month

People

(Reporter: evilpie, Unassigned)

Tracking

(Depends on 1 bug, Blocks 2 bugs, {dev-doc-needed, parity-chrome})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [DocArea=JS], )

Attachments

(1 attachment)

Reporter

Description

4 years ago
Posted patch WIPSplinter Review
I started porting v8's new code for RegExp lookbehind matches:
https://github.com/v8/v8/commit/37632606bbce1418238b13fd90cb6ef6705871cd

I already got it working in the RegEXp interpreter. Sadly porting the v8 code is not straightforward and I had to do it almost by hand.
Reporter

Updated

4 years ago
Assignee: nobody → evilpies
Reporter

Updated

4 years ago
Attachment #8688704 - Attachment is patch: true
Reporter

Comment 1

4 years ago
"abcdef".match(/(?<=(c))def/) currently produces ["def", undefined] and not ["def", "c"].
Keywords: dev-doc-needed
Whiteboard: [DocArea=JS]

Comment 2

4 years ago
Please note that the V8 implementation does not fully reflect the linked proposal. The proposal requires the lookbehind to be of fixed size (so that the implementation can jump back by the fixed size and perform a lookahead). This excludes variable quantifiers and back references. V8's experimental implementation reads backwards and therefore does not have such limitations.
Reporter

Comment 3

4 years ago
Yangguo thanks for commenting about that! Do you maybe have an idea what could cause the issue in comment 1?
Reporter

Comment 4

4 years ago
Comment on attachment 8688704 [details] [diff] [review]
WIP

Can you take a look? I just can't find the problem. You need to execute the test with --no-native-regexp.
Attachment #8688704 - Flags: feedback?(bhackett1024)

Comment 5

4 years ago
--no-native-regexp does not actually seem to have an effect. I had to comment out the do-while-false block around irregexp::ExecuteCode in vm/RegExpObject.cpp

The actual bug is in Trace::PerformDeferredActions. Note how in the original V8 patch the sentinel value changed from -1 to kMinInt [0]. The reason for this is that with backward read direction, the cp offset can actually be negative, so -1 as sentinel value is misleading. In your test case, the capture length is 1, read backward, the cp offset to be stored is coincidentally -1, so we do not store at all.

With that fixed, the capture works correctly.

Aside from that, I found it kind of inconvenient that SpiderMonkey's irregexp port does not include the diagnostics found in V8, for example tracing regexp bytecode execution. I put some printf into the bytecode interpreter to find the difference.

[0] https://github.com/v8/v8/commit/37632606bbce1418238b13fd90cb6ef6705871cd#diff-a8917623761138efd70f2d5c2055d1c1R1231
Reporter

Comment 6

4 years ago
Thank you so much. --no-native-regexp prevents us from using the JIT for regexps, which wasn't implemented yet.
Reporter

Updated

4 years ago
Attachment #8688704 - Flags: feedback?(bhackett1024)

Comment 7

4 years ago
I understand what --no-native-regexp is supposed to do. But using it didn't actually take me on the code path for the interpreter, i.e. irregexp::InterpretCode instead of irregexp::ExecuteCode. Maybe I was doing something wrong, but I didn't care enough to figure that one out.
Reporter

Updated

4 years ago
Assignee: evilpies → nobody
Reporter

Comment 8

4 years ago
Even after fixing this, there is an other issue with zaacbbbcac.match(/(z)((a+)?(b+)?(c))*/). I don't think I will finish this in the near future.

Comment 9

4 years ago
Just curious: /(z)((a+)?(b+)?(c))*/ does not actually contain a lookbehind assertion. Do you mean applying the patch breaks regexps that do not contain lookbehind assertions?

Comment 10

3 years ago
http://v8project.blogspot.com/2016/02/regexp-lookbehind-assertions.html

Please It's really needed feature, plus V8 seems to add it soon.

Comment 11

2 years ago
The TC39's proposal is currently on stage 3:

https://github.com/tc39/proposal-regexp-lookbehind

Test262 files are using the `regexp-lookbehind` features flag:

test262/built-ins/RegExp/lookBehind/alternations.js
test262/built-ins/RegExp/lookBehind/back-references-to-captures.js
test262/built-ins/RegExp/lookBehind/back-references.js
test262/built-ins/RegExp/lookBehind/captures.js
test262/built-ins/RegExp/lookBehind/captures-negative.js
test262/built-ins/RegExp/lookBehind/do-not-backtrack.js
test262/built-ins/RegExp/lookBehind/greedy-loop.js
test262/built-ins/RegExp/lookBehind/mutual-recursive.js
test262/built-ins/RegExp/lookBehind/misc.js
test262/built-ins/RegExp/lookBehind/nested-lookaround.js
test262/built-ins/RegExp/lookBehind/negative.js
test262/built-ins/RegExp/lookBehind/simple-fixed-length.js
test262/built-ins/RegExp/lookBehind/sliced-strings.js
test262/built-ins/RegExp/lookBehind/sticky.js
test262/built-ins/RegExp/lookBehind/start-of-line.js
test262/built-ins/RegExp/lookBehind/variable-length.js
test262/built-ins/RegExp/lookBehind/word-boundary.js
Depends on: 1367105
Blocks: 1425763
This proposal was approved by TC39 and is now part of the ES2018 standard.
It is already supported by current Chrome.

Comment 13

Last year
The TC39's proposal is on stage 4 now and should be supported. Chrome already do. You can test it with this link: https://jsfiddle.net/sho8dqaa/
Whiteboard: [DocArea=JS] → [DocArea=JS][parity-chrome]
No longer blocks: es-2018
Mass bug change to replace various 'parity' whiteboard flags with the new canonical keywords. (See bug 1443764 comment 13.)
Keywords: parity-chrome
Whiteboard: [DocArea=JS][parity-chrome] → [DocArea=JS]
Blocks: es-2018
No longer blocks: es-proposals-stage-4

Comment 15

11 months ago
The proposal, which is now at stage 4, does now require that implementations read backwards in the same way V8, the experimental implementation in this bug and many other RegExp implementations implement lookbehind assertions.

This means that the hacky and error prone way to go back a fixed amount and do a lookahead assertion instead was rejected and fixed during the TC39 process.
Comment hidden (advocacy)
Comment hidden (advocacy)

Updated

2 months ago
Duplicate of this bug: 1544999
Type: defect → enhancement
You need to log in before you can comment on or make changes to this bug.