Implement RegExp lookbehind
Categories
(Core :: JavaScript Engine, enhancement)
Tracking
()
Webcompat Priority | P1 |
People
(Reporter: evilpie, Unassigned)
References
(Blocks 1 open bug, )
Details
(Keywords: dev-doc-complete, parity-chrome, Whiteboard: [DocArea=JS])
Attachments
(1 file)
72.24 KB,
patch
|
Details | Diff | Splinter Review |
I started porting v8's new code for RegExp lookbehind matches: https://github.com/v8/v8/commit/37632606bbce1418238b13fd90cb6ef6705871cd I already got it working in the RegEXp interpreter. Sadly porting the v8 code is not straightforward and I had to do it almost by hand.
Reporter | ||
Updated•9 years ago
|
Reporter | ||
Updated•9 years ago
|
Reporter | ||
Comment 1•9 years ago
|
||
"abcdef".match(/(?<=(c))def/) currently produces ["def", undefined] and not ["def", "c"].
Updated•9 years ago
|
Updated•9 years ago
|
Please note that the V8 implementation does not fully reflect the linked proposal. The proposal requires the lookbehind to be of fixed size (so that the implementation can jump back by the fixed size and perform a lookahead). This excludes variable quantifiers and back references. V8's experimental implementation reads backwards and therefore does not have such limitations.
Reporter | ||
Comment 3•9 years ago
|
||
Yangguo thanks for commenting about that! Do you maybe have an idea what could cause the issue in comment 1?
Reporter | ||
Comment 4•9 years ago
|
||
Comment on attachment 8688704 [details] [diff] [review] WIP Can you take a look? I just can't find the problem. You need to execute the test with --no-native-regexp.
--no-native-regexp does not actually seem to have an effect. I had to comment out the do-while-false block around irregexp::ExecuteCode in vm/RegExpObject.cpp The actual bug is in Trace::PerformDeferredActions. Note how in the original V8 patch the sentinel value changed from -1 to kMinInt [0]. The reason for this is that with backward read direction, the cp offset can actually be negative, so -1 as sentinel value is misleading. In your test case, the capture length is 1, read backward, the cp offset to be stored is coincidentally -1, so we do not store at all. With that fixed, the capture works correctly. Aside from that, I found it kind of inconvenient that SpiderMonkey's irregexp port does not include the diagnostics found in V8, for example tracing regexp bytecode execution. I put some printf into the bytecode interpreter to find the difference. [0] https://github.com/v8/v8/commit/37632606bbce1418238b13fd90cb6ef6705871cd#diff-a8917623761138efd70f2d5c2055d1c1R1231
Reporter | ||
Comment 6•9 years ago
|
||
Thank you so much. --no-native-regexp prevents us from using the JIT for regexps, which wasn't implemented yet.
Reporter | ||
Updated•9 years ago
|
I understand what --no-native-regexp is supposed to do. But using it didn't actually take me on the code path for the interpreter, i.e. irregexp::InterpretCode instead of irregexp::ExecuteCode. Maybe I was doing something wrong, but I didn't care enough to figure that one out.
Reporter | ||
Updated•9 years ago
|
Reporter | ||
Comment 8•9 years ago
|
||
Even after fixing this, there is an other issue with zaacbbbcac.match(/(z)((a+)?(b+)?(c))*/). I don't think I will finish this in the near future.
Just curious: /(z)((a+)?(b+)?(c))*/ does not actually contain a lookbehind assertion. Do you mean applying the patch breaks regexps that do not contain lookbehind assertions?
Comment 11•8 years ago
|
||
The TC39's proposal is currently on stage 3: https://github.com/tc39/proposal-regexp-lookbehind Test262 files are using the `regexp-lookbehind` features flag: test262/built-ins/RegExp/lookBehind/alternations.js test262/built-ins/RegExp/lookBehind/back-references-to-captures.js test262/built-ins/RegExp/lookBehind/back-references.js test262/built-ins/RegExp/lookBehind/captures.js test262/built-ins/RegExp/lookBehind/captures-negative.js test262/built-ins/RegExp/lookBehind/do-not-backtrack.js test262/built-ins/RegExp/lookBehind/greedy-loop.js test262/built-ins/RegExp/lookBehind/mutual-recursive.js test262/built-ins/RegExp/lookBehind/misc.js test262/built-ins/RegExp/lookBehind/nested-lookaround.js test262/built-ins/RegExp/lookBehind/negative.js test262/built-ins/RegExp/lookBehind/simple-fixed-length.js test262/built-ins/RegExp/lookBehind/sliced-strings.js test262/built-ins/RegExp/lookBehind/sticky.js test262/built-ins/RegExp/lookBehind/start-of-line.js test262/built-ins/RegExp/lookBehind/variable-length.js test262/built-ins/RegExp/lookBehind/word-boundary.js
Comment 12•7 years ago
|
||
This proposal was approved by TC39 and is now part of the ES2018 standard. It is already supported by current Chrome.
Comment 13•7 years ago
|
||
The TC39's proposal is on stage 4 now and should be supported. Chrome already do. You can test it with this link: https://jsfiddle.net/sho8dqaa/
Updated•7 years ago
|
Updated•7 years ago
|
Comment 14•7 years ago
|
||
Mass bug change to replace various 'parity' whiteboard flags with the new canonical keywords. (See bug 1443764 comment 13.)
Updated•6 years ago
|
Comment 15•6 years ago
|
||
The proposal, which is now at stage 4, does now require that implementations read backwards in the same way V8, the experimental implementation in this bug and many other RegExp implementations implement lookbehind assertions. This means that the hacky and error prone way to go back a fixed amount and do a lookahead assertion instead was rejected and fixed during the TC39 process.
Updated•6 years ago
|
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Updated•5 years ago
|
Updated•5 years ago
|
Comment 23•5 years ago
|
||
Let me change this. The issue for https://webcompat.com/issues/39461 was with Bug 1362154 (named groups).
Comment 25•5 years ago
|
||
:iain, does your work on named groups implement this?
Comment 26•5 years ago
|
||
Yes. My work is to re-sync our irregexp implementation with the one that is used in V8. When I'm done, our regexps should be at feature parity with Chrome.
Comment 29•5 years ago
|
||
:iain are you still working on this? Your post 4 months ago gave us some hope, but then it went quiet again...
Lookbehind assertions are one of the few ES2018 features that cannot be polyfilled.
It's been almost 3 years since this was added to Node and V8...
At least Firefox has a bug task for it that shows some signs of life; Safari does not even have that. And because of Apple's lock-down on browser engines on iOS, users cannot even switch to Chrome/Opera/Edge to get it.
Comment 30•5 years ago
|
||
I've been working away on this behind the scenes. There's still work to be done to clean up some of the code, but all the tests are passing. I'll be starting to put some of the initial patches up for review today.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 35•4 years ago
|
||
Fixed by the landing of bug 1634135.
Comment 36•4 years ago
|
||
Congratulations! This must have been super hard to accomplish. I really appreciate it.
Comment 37•4 years ago
|
||
Developer docs:
https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/78#JavaScript
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Assertions
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
Comment 38•4 years ago
|
||
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#browser_compatibility states that Firefox for Android doesn't support lookbehind.
Is that correct? There's a specific bug for Fenix (the asterisk on the page sent me here)?
Thanks!
Comment 39•4 years ago
|
||
(In reply to Giorgio Maone [:mao] from comment #38)
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#browser_compatibility states that Firefox for Android doesn't support lookbehind.
Is that correct? There's a specific bug for Fenix (the asterisk on the page sent me here)?
Thanks!
No, it passes the test: kangax.github.io/compat-table/es2016plus
Looking at BCD, it seems it was missed in https://github.com/mdn/browser-compat-data/pull/6201 because there was no version 78 on Android. I'll open a PR to fix xit.
Comment 40•4 years ago
|
||
It's actually fixed by https://github.com/mdn/browser-compat-data/pull/7658 but the corresponding MDN document is not updated yet. I guess it'll be fixed with the next BCD release.
Updated•4 years ago
|
Updated•2 years ago
|
Description
•