[meta] Sync irregexp with upstream
Categories
(Core :: JavaScript Engine, enhancement, P1)
Tracking
()
People
(Reporter: shu, Unassigned)
References
Details
(Keywords: meta)
Attachments
(1 file)
370.92 KB,
application/x-zip-compressed
|
Details |
There are several stage 3 TC39 proposals regarding regexps in flight: 1. lookbehind (https://github.com/tc39/proposal-regexp-lookbehind) 2. Unicode groups (https://github.com/tc39/proposal-regexp-named-groups) 3. Unicode properties (https://github.com/tc39/proposal-regexp-unicode-property-escapes) 4. dotAll (https://github.com/tc39/proposal-regexp-dotall-flag) These are already implemented in V8, so we should take them.
Updated•7 years ago
|
Comment 1•7 years ago
|
||
http://searchfox.org/mozilla-central/source/js/src/tests/ecma_6/RegExp/unicode-ignoreCase.js needs much longer after updating to upstream irregexp, filed https://bugs.chromium.org/p/v8/issues/detail?id=6727 for further investigation.
Comment 2•7 years ago
|
||
(In reply to André Bargull [:anba] from comment #1) > http://searchfox.org/mozilla-central/source/js/src/tests/ecma_6/RegExp/ > unicode-ignoreCase.js needs much longer after updating to upstream irregexp, > filed https://bugs.chromium.org/p/v8/issues/detail?id=6727 for further > investigation. Does this know you have a patch for updating irregexp? If so, can you post it here, even though it's not ready to land because of the perf regression? Or did you just test the performance in v8 and expect the same regression to happen in SM?
Comment 3•7 years ago
|
||
(In reply to Till Schneidereit [:till] from comment #2) > Does this know you have a patch for updating irregexp? If so, can you post > it here, even though it's not ready to land because of the perf regression? > Or did you just test the performance in v8 and expect the same regression to > happen in SM? Yes, I have a very raw patch (or rather queue of patches) to sync with upstream. Some of the patches still need to be reordered, or in some cases later patches contain changes which actually belong to an earlier patch. There are lots of changes, simply because 1. Upstream has some methods in other files, so I had to reorder many methods/classes to match upstream, which makes it easier to compare differences between our version and upstream. 2. Our irregexp version was partially reformatted using the SpiderMonkey coding styles, which made it quite difficult to spot changes when comparing to upstream irregexp, so I had to change it back to the upstream formatting. I pushed the changes to try: https://hg.mozilla.org/try/pushloghtml?changeset=958c5b935c048c2ee59d608ae304def15804a976 Things still missing in those patches: 1. Support for the stage 3 proposals (it's probably cleaner to implement those changes in separate bugs) 2. Updates for NativeRegExpMacroAssembler (sstangl already volunteered to port any necessary assembler changes :-)
Comment 4•6 years ago
|
||
(In reply to André Bargull [:anba] from comment #3) > Things still missing in those patches: > 1. Support for the stage 3 proposals (it's probably cleaner to implement > those changes in separate bugs) Chrome 64 already enables all Stage 3 proposals by default, so maybe they are ready to be imported now.
Comment 5•6 years ago
|
||
André, sorry for not following up sooner, but can you update the patches to fix the -werr issue? Most of the new spec changes blocked by this have reached stage 4 today, so it'd be great if we could ship them soon-ish.
Comment 6•6 years ago
|
||
Upstream landed some bug-fixes and performance improvements we definitely want to take to avoid regressions compared to our current irregexp version. On top of that, upstream landed a few clean-ups which "nicely" conflict with the current patches [1,2,3] and also other internal changes like [4] which we probably should take, too, so it's easier for us to implement the stage 4 proposals. I did start updating the patches this week, but it's a cumbersome process because I effectively need to compare every single irregexp file with a diff-tool to see what needs to be updated in our local copy. :-/ [1] https://github.com/v8/v8/commit/62f929ff4cf322bc05e3a384c4ad18005f50ebda [2] https://github.com/v8/v8/commit/447af335e044c92196de3c3ddf1059fad6901650 [3] https://github.com/v8/v8/commit/ab43c76dde97188e79a6a1c5bb994407606521d2 [4] https://github.com/v8/v8/commit/04f7d484db22b1526afa5414c06eda443c5b4fad
Comment 7•6 years ago
|
||
This issue blocks 4 RegExp features that ware already approved by TC39 and are now part of the ES2018 standard.
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Comment 8•6 years ago
|
||
André, we have to get this done. Ashley's interested in finishing the work. Are the patches linked in comment 3 the latest version? Are those patches still a good starting point?
Updated•6 years ago
|
Comment 9•6 years ago
|
||
I have local (git) patches from this year April, which are probably a better starting point, except I don't know offhand in which state these patches are in. I'll try to look into them in the next week. (Keeping the NI for now so I don't forgot about this.)
Updated•6 years ago
|
Updated•6 years ago
|
Updated•5 years ago
|
Comment 11•5 years ago
|
||
I assume the patches are outdated again. What blocked you from looking into them?
Sebastian
Comment 12•5 years ago
|
||
(In reply to Sebastian Zartner [:sebo] from comment #11)
I assume the patches are outdated again. What blocked you from looking into them?
Other tasks which are more pleasantly to work at compared to this one. :-)
Comment 13•5 years ago
•
|
||
Actually, the last code merge to irregexp was on nov/2017, so I believe this patch may still apply...
Comment 14•5 years ago
|
||
This is the only bug blocking the last missing 4 ECMAScript 2018 features. It would be nice to have it for Firefox 68, which is an ESR version.
Comment 15•5 years ago
|
||
Given it's a week til the branch-off for 68 and we're into a soft freeze and we don't have a full patch for this regardless, I'm pretty confident this isn't going to make 68, nice as it might be if it did.
Comment 16•5 years ago
|
||
How about 69?
We already have 70, so no.
Could we at least undo the formatting so that diffing can be easier?
Comment 19•5 years ago
|
||
Something needs to happen here, we can't just pretend this isn't important. Steven any ideas?
Comment 20•5 years ago
|
||
evilpie, thanks for bringing this up again, it is important. We are revisiting where this fits into our priorities and looking to how to possible resource this.
Comment 21•4 years ago
|
||
I am working on a fresh import of V8's regexp code into SM. I'll commandeer this bug as a meta bug for tracking that work.
To reduce the risk of getting caught in the same position in the future, I'm making an effort to keep the V8 source as close to upstream as possible. Ideally, we could avoid making any non-trivial changes at all. (Example of a trivial change: removing #includes that point to V8 header files.) To make this work, I've been creating new regexp-shim.h and regexp-masm-shim.h headers, which define all the V8-isms that are used inside the regexp implementation.
Many of the shims still need implementations, but I've reached the point where the entire regexp engine compiles in SM and then fails when linking in missing implementations. Specifically, everything in v8/src/regexp, with the exception of:
- regexp.cc and regexp-utils.cc, which implement the interface between the regexp implementation and the rest of the engine. They are more tightly coupled to the rest of V8, and it is more effective to just rewrite them from scratch.
- s390/* and ppc/* contain macroassembler code for platforms we don't support
- I have only finished stubbing out the masm code for x64. x86, arm, and (eventually) MIPS will come later.
- Additional work is necessary to enable V8_INTL_SUPPORT (V8's equivalent to ENABLE_INTL_API)
As partial validation of this approach, I recently pulled in a month of V8 changes, including the addition of an entire peephole optimization pass, and it took less than an hour of additional shim work. Obviously nothing is certain until we have it working, but I am hopeful that this will be a more sustainable approach than the status quo.
Comment 22•4 years ago
|
||
Assuming that SM will also get RegExp Match Indices (Bug 1519483) as part of this effort.
Comment 23•4 years ago
|
||
iain, did you consider the opposite approach of creating a stronger separation between irregexp and V8 upstream? Irregexp would ideally be V8-agnostic. The embedder (in this case SpiderMonkey) would of course need to implement all required macro-assembler calls.
mconca, match indices are regexp builtins work and do not require irregexp changes.
Comment 24•4 years ago
|
||
Jakob: When Luke pinged the V8 team about this, the answer he got was that V8 is understandably not very interested in moving their regexp code into a separate repo, but that refactoring APIs between irregexp and the JSVM / code generator would be possible.
In some cases it will be faster to make upstream changes than to implement things ourselves: for example, a few changes to NativeRegExpMacroAssembler::Match and NativeRegExpMacroAssembler::StringCharacterPosition could significantly reduce the string API that SpiderMonkey has to shim.
In other cases, the necessary refactoring is unlikely to be feasible in a reasonable timeframe, so we're better off with a shim. For example, it's faster for us to provide APIs that mimic V8's memory allocation than it would be to refactor all the memory allocation code inside irregexp.
I have a pile of notes on this stuff. At some point in the near future, once my ducks are better aligned, I'll reach out to the V8 team to talk it over. If V8 is interested in making irregexp V8-agnostic, then -- good news! I have a list of the places where it currently isn't! :)
Comment 25•4 years ago
|
||
I think this is something we should pursue. Ideally, updating irregexp in SM and other embedders should be as easy as plopping a current copy of irregexp source code into their project without extra effort.
I agree that e.g. replacing V8's Zone allocations with a normal malloc/free model (or something else) is not something we want, but it should be doable with reasonable effort to abstract the Zone model into an allocator interface (with just New, no Delete) plus a set of data structures based on this allocator.
This work would not be trivial by any means, but I assume it's preferable for you to do this once and then have a fairly small and well-defined set of functionality to implement within SM, rather than having to shim a large part of V8 that you perpetually need to update. I'll try to reach out to you next week, maybe we can figure something out :)
Comment 26•4 years ago
|
||
(In reply to Jakob Gruber from comment #23)
mconca, match indices are regexp builtins work and do not require irregexp changes.
Thanks, Jakob.
Comment 27•4 years ago
|
||
Jakob: yes, let's talk this week. I am on Toronto time.
Comment hidden (obsolete) |
Comment 29•4 years ago
|
||
I may have underestimate the changes required, sorry. There's no way to delete postings here (I've requested it in the Bugzilla forum).
Comment 30•4 years ago
|
||
Iain is actively landing code changes for this right now, see bug 1592302 and bug 1592307 for example. I know it can be frustrating to wait, but we're doing what we can to get this fixed as soon as possible.
Comment 31•4 years ago
|
||
All depending on/blocking bugs are now fixed.
Is it time to close?
Updated•4 years ago
|
Comment 32•3 years ago
|
||
Would I be correct in assuming that bugs such as https://bugs.chromium.org/p/v8/issues/detail?id=11616 will get automatically merged into Firefox once they are fixed upstream?
Comment 33•3 years ago
|
||
We merge upstream fixes manually, because they may require updates to our shim code, but the process is mostly automated. See import-irregexp.py.
I will keep an eye on that bug. Thanks for the pointer!
Description
•