Closed Bug 1225665 Opened 6 years ago Closed 1 year ago
I started porting v8's new code for RegExp lookbehind matches: https://github.com/v8/v8/commit/37632606bbce1418238b13fd90cb6ef6705871cd I already got it working in the RegEXp interpreter. Sadly porting the v8 code is not straightforward and I had to do it almost by hand.
Attachment #8688704 - Attachment is patch: true
"abcdef".match(/(?<=(c))def/) currently produces ["def", undefined] and not ["def", "c"].
Please note that the V8 implementation does not fully reflect the linked proposal. The proposal requires the lookbehind to be of fixed size (so that the implementation can jump back by the fixed size and perform a lookahead). This excludes variable quantifiers and back references. V8's experimental implementation reads backwards and therefore does not have such limitations.
Yangguo thanks for commenting about that! Do you maybe have an idea what could cause the issue in comment 1?
Comment on attachment 8688704 [details] [diff] [review] WIP Can you take a look? I just can't find the problem. You need to execute the test with --no-native-regexp.
--no-native-regexp does not actually seem to have an effect. I had to comment out the do-while-false block around irregexp::ExecuteCode in vm/RegExpObject.cpp The actual bug is in Trace::PerformDeferredActions. Note how in the original V8 patch the sentinel value changed from -1 to kMinInt . The reason for this is that with backward read direction, the cp offset can actually be negative, so -1 as sentinel value is misleading. In your test case, the capture length is 1, read backward, the cp offset to be stored is coincidentally -1, so we do not store at all. With that fixed, the capture works correctly. Aside from that, I found it kind of inconvenient that SpiderMonkey's irregexp port does not include the diagnostics found in V8, for example tracing regexp bytecode execution. I put some printf into the bytecode interpreter to find the difference.  https://github.com/v8/v8/commit/37632606bbce1418238b13fd90cb6ef6705871cd#diff-a8917623761138efd70f2d5c2055d1c1R1231
Thank you so much. --no-native-regexp prevents us from using the JIT for regexps, which wasn't implemented yet.
I understand what --no-native-regexp is supposed to do. But using it didn't actually take me on the code path for the interpreter, i.e. irregexp::InterpretCode instead of irregexp::ExecuteCode. Maybe I was doing something wrong, but I didn't care enough to figure that one out.
Even after fixing this, there is an other issue with zaacbbbcac.match(/(z)((a+)?(b+)?(c))*/). I don't think I will finish this in the near future.
Just curious: /(z)((a+)?(b+)?(c))*/ does not actually contain a lookbehind assertion. Do you mean applying the patch breaks regexps that do not contain lookbehind assertions?
The TC39's proposal is currently on stage 3: https://github.com/tc39/proposal-regexp-lookbehind Test262 files are using the `regexp-lookbehind` features flag: test262/built-ins/RegExp/lookBehind/alternations.js test262/built-ins/RegExp/lookBehind/back-references-to-captures.js test262/built-ins/RegExp/lookBehind/back-references.js test262/built-ins/RegExp/lookBehind/captures.js test262/built-ins/RegExp/lookBehind/captures-negative.js test262/built-ins/RegExp/lookBehind/do-not-backtrack.js test262/built-ins/RegExp/lookBehind/greedy-loop.js test262/built-ins/RegExp/lookBehind/mutual-recursive.js test262/built-ins/RegExp/lookBehind/misc.js test262/built-ins/RegExp/lookBehind/nested-lookaround.js test262/built-ins/RegExp/lookBehind/negative.js test262/built-ins/RegExp/lookBehind/simple-fixed-length.js test262/built-ins/RegExp/lookBehind/sliced-strings.js test262/built-ins/RegExp/lookBehind/sticky.js test262/built-ins/RegExp/lookBehind/start-of-line.js test262/built-ins/RegExp/lookBehind/variable-length.js test262/built-ins/RegExp/lookBehind/word-boundary.js
This proposal was approved by TC39 and is now part of the ES2018 standard. It is already supported by current Chrome.
The TC39's proposal is on stage 4 now and should be supported. Chrome already do. You can test it with this link: https://jsfiddle.net/sho8dqaa/
Whiteboard: [DocArea=JS] → [DocArea=JS][parity-chrome]
Mass bug change to replace various 'parity' whiteboard flags with the new canonical keywords. (See bug 1443764 comment 13.)
Whiteboard: [DocArea=JS][parity-chrome] → [DocArea=JS]
The proposal, which is now at stage 4, does now require that implementations read backwards in the same way V8, the experimental implementation in this bug and many other RegExp implementations implement lookbehind assertions. This means that the hacky and error prone way to go back a fixed amount and do a lookahead assertion instead was rejected and fixed during the TC39 process.
Webcompat Priority: ? → ---
See Also: https://webcompat.com/issues/39461 →
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.