Closed Bug 1367105 Opened 3 years ago Closed 1 month ago

[meta] Sync irregexp with upstream

Categories

(Core :: JavaScript Engine, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
Tracking Status
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- affected

People

(Reporter: shu, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: meta)

Attachments

(1 file)

370.92 KB, application/x-zip-compressed
Details
There are several stage 3 TC39 proposals regarding regexps in flight:

1. lookbehind (https://github.com/tc39/proposal-regexp-lookbehind)
2. Unicode groups (https://github.com/tc39/proposal-regexp-named-groups)
3. Unicode properties (https://github.com/tc39/proposal-regexp-unicode-property-escapes)
4. dotAll (https://github.com/tc39/proposal-regexp-dotall-flag)

These are already implemented in V8, so we should take them.
Blocks: 1361856
Blocks: 1361876
(In reply to André Bargull [:anba] from comment #1)
> http://searchfox.org/mozilla-central/source/js/src/tests/ecma_6/RegExp/
> unicode-ignoreCase.js needs much longer after updating to upstream irregexp,
> filed https://bugs.chromium.org/p/v8/issues/detail?id=6727 for further
> investigation.

Does this know you have a patch for updating irregexp? If so, can you post it here, even though it's not ready to land because of the perf regression? Or did you just test the performance in v8 and expect the same regression to happen in SM?
Flags: needinfo?(andrebargull)
(In reply to Till Schneidereit [:till] from comment #2)
> Does this know you have a patch for updating irregexp? If so, can you post
> it here, even though it's not ready to land because of the perf regression?
> Or did you just test the performance in v8 and expect the same regression to
> happen in SM?

Yes, I have a very raw patch (or rather queue of patches) to sync with upstream. Some of the patches still need to be reordered, or in some cases later patches contain changes which actually belong to an earlier patch. There are lots of changes, simply because
1. Upstream has some methods in other files, so I had to reorder many methods/classes to match upstream, which makes it easier to compare differences between our version and upstream.
2. Our irregexp version was partially reformatted using the SpiderMonkey coding styles, which made it quite difficult to spot changes when comparing to upstream irregexp, so I had to change it back to the upstream formatting.

I pushed the changes to try:
https://hg.mozilla.org/try/pushloghtml?changeset=958c5b935c048c2ee59d608ae304def15804a976

Things still missing in those patches:
1. Support for the stage 3 proposals (it's probably cleaner to implement those changes in separate bugs)
2. Updates for NativeRegExpMacroAssembler (sstangl already volunteered to port any necessary assembler changes :-)
Flags: needinfo?(andrebargull)
(In reply to André Bargull [:anba] from comment #3)
> Things still missing in those patches:
> 1. Support for the stage 3 proposals (it's probably cleaner to implement
> those changes in separate bugs)
Chrome 64 already enables all Stage 3 proposals by default, so maybe they are ready to be imported now.
André, sorry for not following up sooner, but can you update the patches to fix the -werr issue? Most of the new spec changes blocked by this have reached stage 4 today, so it'd be great if we could ship them soon-ish.
Flags: needinfo?(andrebargull)
Upstream landed some bug-fixes and performance improvements we definitely want to take to avoid regressions compared to our current irregexp version. On top of that, upstream landed a few clean-ups which "nicely" conflict with the current patches [1,2,3] and also other internal changes like [4] which we probably should take, too, so it's easier for us to implement the stage 4 proposals. I did start updating the patches this week, but it's a cumbersome process because I effectively need to compare every single irregexp file with a diff-tool to see what needs to be updated in our local copy. :-/

[1] https://github.com/v8/v8/commit/62f929ff4cf322bc05e3a384c4ad18005f50ebda
[2] https://github.com/v8/v8/commit/447af335e044c92196de3c3ddf1059fad6901650
[3] https://github.com/v8/v8/commit/ab43c76dde97188e79a6a1c5bb994407606521d2
[4] https://github.com/v8/v8/commit/04f7d484db22b1526afa5414c06eda443c5b4fad
Flags: needinfo?(andrebargull)
This issue blocks 4 RegExp features that ware already approved by TC39 and are now part of the ES2018 standard.
Priority: -- → P1
André, we have to get this done. Ashley's interested in finishing the work. Are the patches linked in comment 3 the latest version? Are those patches still a good starting point?
Flags: needinfo?(andrebargull)
I have local (git) patches from this year April, which are probably a better starting point, except I don't know offhand in which state these patches are in. I'll try to look into them in the next week. (Keeping the NI for now so I don't forgot about this.)
Attached file irregexp-patches.zip
My current patches, last updated in April.
Flags: needinfo?(andrebargull)
Flags: needinfo?(jorendorff)
Depends on: 1510128

I assume the patches are outdated again. What blocked you from looking into them?

Sebastian

Type: defect → enhancement
Flags: needinfo?(andrebargull)

(In reply to Sebastian Zartner [:sebo] from comment #11)

I assume the patches are outdated again. What blocked you from looking into them?

Other tasks which are more pleasantly to work at compared to this one. :-)

Flags: needinfo?(andrebargull)

Actually, the last code merge to irregexp was on nov/2017, so I believe this patch may still apply...

This is the only bug blocking the last missing 4 ECMAScript 2018 features. It would be nice to have it for Firefox 68, which is an ESR version.

Given it's a week til the branch-off for 68 and we're into a soft freeze and we don't have a full patch for this regardless, I'm pretty confident this isn't going to make 68, nice as it might be if it did.

How about 69?

We already have 70, so no.

Could we at least undo the formatting so that diffing can be easier?

Something needs to happen here, we can't just pretend this isn't important. Steven any ideas?

Flags: needinfo?(sdetar)

evilpie, thanks for bringing this up again, it is important. We are revisiting where this fits into our priorities and looking to how to possible resource this.

Flags: needinfo?(sdetar)

I am working on a fresh import of V8's regexp code into SM. I'll commandeer this bug as a meta bug for tracking that work.

To reduce the risk of getting caught in the same position in the future, I'm making an effort to keep the V8 source as close to upstream as possible. Ideally, we could avoid making any non-trivial changes at all. (Example of a trivial change: removing #includes that point to V8 header files.) To make this work, I've been creating new regexp-shim.h and regexp-masm-shim.h headers, which define all the V8-isms that are used inside the regexp implementation.

Many of the shims still need implementations, but I've reached the point where the entire regexp engine compiles in SM and then fails when linking in missing implementations. Specifically, everything in v8/src/regexp, with the exception of:

  • regexp.cc and regexp-utils.cc, which implement the interface between the regexp implementation and the rest of the engine. They are more tightly coupled to the rest of V8, and it is more effective to just rewrite them from scratch.
  • s390/* and ppc/* contain macroassembler code for platforms we don't support
  • I have only finished stubbing out the masm code for x64. x86, arm, and (eventually) MIPS will come later.
  • Additional work is necessary to enable V8_INTL_SUPPORT (V8's equivalent to ENABLE_INTL_API)

As partial validation of this approach, I recently pulled in a month of V8 changes, including the addition of an entire peephole optimization pass, and it took less than an hour of additional shim work. Obviously nothing is certain until we have it working, but I am hopeful that this will be a more sustainable approach than the status quo.

Keywords: meta
Priority: P2 → P1
Summary: Sync irregexp with upstream → [meta] Sync irregexp with upstream
Depends on: 1592302
Depends on: 1592307

Assuming that SM will also get RegExp Match Indices (Bug 1519483) as part of this effort.

Blocks: 1519483

iain, did you consider the opposite approach of creating a stronger separation between irregexp and V8 upstream? Irregexp would ideally be V8-agnostic. The embedder (in this case SpiderMonkey) would of course need to implement all required macro-assembler calls.

mconca, match indices are regexp builtins work and do not require irregexp changes.

Jakob: When Luke pinged the V8 team about this, the answer he got was that V8 is understandably not very interested in moving their regexp code into a separate repo, but that refactoring APIs between irregexp and the JSVM / code generator would be possible.

In some cases it will be faster to make upstream changes than to implement things ourselves: for example, a few changes to NativeRegExpMacroAssembler::Match and NativeRegExpMacroAssembler::StringCharacterPosition could significantly reduce the string API that SpiderMonkey has to shim.

In other cases, the necessary refactoring is unlikely to be feasible in a reasonable timeframe, so we're better off with a shim. For example, it's faster for us to provide APIs that mimic V8's memory allocation than it would be to refactor all the memory allocation code inside irregexp.

I have a pile of notes on this stuff. At some point in the near future, once my ducks are better aligned, I'll reach out to the V8 team to talk it over. If V8 is interested in making irregexp V8-agnostic, then -- good news! I have a list of the places where it currently isn't! :)

I think this is something we should pursue. Ideally, updating irregexp in SM and other embedders should be as easy as plopping a current copy of irregexp source code into their project without extra effort.

I agree that e.g. replacing V8's Zone allocations with a normal malloc/free model (or something else) is not something we want, but it should be doable with reasonable effort to abstract the Zone model into an allocator interface (with just New, no Delete) plus a set of data structures based on this allocator.

This work would not be trivial by any means, but I assume it's preferable for you to do this once and then have a fairly small and well-defined set of functionality to implement within SM, rather than having to shim a large part of V8 that you perpetually need to update. I'll try to reach out to you next week, maybe we can figure something out :)

(In reply to Jakob Gruber from comment #23)

mconca, match indices are regexp builtins work and do not require irregexp changes.

Thanks, Jakob.

No longer blocks: 1519483

Jakob: yes, let's talk this week. I am on Toronto time.

Depends on: 1594545

I may have underestimate the changes required, sorry. There's no way to delete postings here (I've requested it in the Bugzilla forum).

Iain is actively landing code changes for this right now, see bug 1592302 and bug 1592307 for example. I know it can be frustrating to wait, but we're doing what we can to get this fixed as soon as possible.

Depends on: 1620020
Depends on: 1624015
Depends on: 1626713
Depends on: 1627838
Depends on: 1628835
Depends on: 1629670
Depends on: 1630090
Depends on: 1630383
Depends on: 1631504
Depends on: 1634135
Depends on: 1634810
Depends on: 1635275

All depending on/blocking bugs are now fixed.
Is it time to close?

Status: NEW → RESOLVED
Closed: 1 month ago
Flags: needinfo?(jorendorff)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.