Closed Bug 1624015 Opened 5 months ago Closed 5 months ago

Simplify irregexp re-import process

Categories: Core :: JavaScript Engine, task, P1

Status: RESOLVED FIXED (mozilla76)
Tracking Status: firefox76 --- fixed

People: Reporter: iain, Assigned: iain

Attachments: 4 files

In the process of importing irregexp, I created a stack of upstream patches to contribute back to V8, to fix bugs and/or make it easier for us to use their code. The last of those patches have landed, so it's a good time to re-import the latest copy of irregexp and make sure we are in sync.

I took advantage of the opportunity to automate as much of the process as possible.

This script handles all the mechanical steps of importing irregexp from V8:

  1. Acquire the source: either from GitHub, or optionally from a local copy of V8.
  2. Copy the contents of v8/src/regexp into js/src/new-regexp
    • Exclude files that we have chosen not to import.
  3. While doing so, update #includes:
    • Change "src/regexp/" to "new-regexp/".
    • Remove other v8-specific headers completely.
      (This subsumes the previous "update-headers.py" script.)
  4. Add '#include "new-regexp/regexp-shim.h"' in the necessary places.
  5. Update the VERSION file to include the correct git hash.

The only remaining task is to try compiling the code and see whether any of the shim code needs to be updated.
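A minimal Python sketch of steps 2 to 4 (copying files and rewriting #includes) is shown below. It is not the actual import-irregexp.py. The exclusion list, the rule that any quoted include under "src/" or "include/" outside src/regexp counts as a V8-specific header, and the choice to put the shim include where the first such header was removed are all hypothetical. Steps 1 and 5 (fetching the source and writing VERSION) are omitted, and only top-level files are copied.

```python
import re
from pathlib import Path

# Hypothetical exclusion list; the real script keeps its own list of
# files that SpiderMonkey chooses not to import.
EXCLUDED_FILES = {"BUILD.gn", "OWNERS"}

SHIM_INCLUDE = '#include "new-regexp/regexp-shim.h"'

# Matches a quoted #include line and captures the header path.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"')


def should_import(filename: str) -> bool:
    """Step 2: skip files on the (hypothetical) exclusion list."""
    return filename not in EXCLUDED_FILES


def rewrite_includes(source: str) -> str:
    """Steps 3 and 4: rewrite regexp includes, drop other V8 headers.

    Assumption: the shim include replaces the first V8-specific header
    that is removed; files with no such header get no shim include.
    """
    out = []
    shim_added = False
    for line in source.splitlines():
        match = INCLUDE_RE.match(line)
        if match is None:
            # Not a quoted include (e.g. <cstring>, code, comments): keep.
            out.append(line)
            continue
        path = match.group(1)
        if path.startswith("src/regexp/"):
            # "src/regexp/foo.h" -> "new-regexp/foo.h", keeping the rest of the line.
            out.append(line.replace('"src/regexp/', '"new-regexp/', 1))
        elif path.startswith(("src/", "include/")):
            # V8-specific header: remove it, adding the shim include once.
            if not shim_added:
                out.append(SHIM_INCLUDE)
                shim_added = True
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def import_directory(src: Path, dst: Path) -> None:
    """Steps 2-4: copy v8/src/regexp into js/src/new-regexp (top level only)."""
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.iterdir()):
        if not path.is_file() or not should_import(path.name):
            continue
        text = path.read_text(encoding="utf-8")
        if path.suffix in (".h", ".cc"):
            text = rewrite_includes(text)
        (dst / path.name).write_text(text, encoding="utf-8")
```

For example, rewrite_includes applied to a file that includes "src/regexp/regexp-ast.h", "src/execution/isolate.h", "src/objects/objects-inl.h" and <cstring> yields "new-regexp/regexp-ast.h", then the shim include, then <cstring>. Keeping the rewriting as a pure string function makes it easy to test separately from the file copying.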

The contents of this patch were automatically generated using import-irregexp.py.

This is up to date with upstream V8 as of March 19, 2020.

Depends on D67716

In V8, gen-regexp-special-case.cc is compiled and run as a special build step to produce special-case.cc. That's a waste of time for us. special-case.cc can only change if one of the following occurs:

  1. The Unicode Consortium changes the case-folding behaviour of characters in the Basic Multilingual Plane. Given that there are only 16 undefined codepoints remaining in the BMP, this is not expected to happen often, if indeed it ever happens again.

  2. Changes are made to gen-regexp-special-case.cc.

Because of this, special-case.cc is checked in directly in SpiderMonkey.

As it happens, one of the patches that I contributed back upstream to V8 fixed a number of bugs with /i (ignoreCase, non-unicode) matches. Fixing those bugs involved rewriting gen-regexp-special-case.cc to match the JS spec.

This patch checks in the resulting changes to special-case.cc.
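For reference, the rule in the JS spec that non-unicode ignoreCase matching follows (the Canonicalize operation) can be sketched in Python. This only illustrates the rule; it is not how gen-regexp-special-case.cc is written. Python's str.upper() stands in for the Unicode toUppercase operation, and Python's Unicode database version may differ from the one V8 uses.

```python
def canonicalize_non_unicode(ch: str) -> str:
    """Canonicalize(ch) for a regexp with the i flag but without the u flag."""
    assert len(ch) == 1 and ord(ch) <= 0xFFFF, "expects one UTF-16 code unit"
    upper = ch.upper()
    # If uppercasing does not give a single code unit (e.g. U+00DF -> "SS"),
    # the character is left unchanged.
    if len(upper) != 1 or ord(upper) > 0xFFFF:
        return ch
    # A non-ASCII character is never mapped onto an ASCII one.
    if ord(ch) >= 128 and ord(upper) < 128:
        return ch
    return upper


def equivalent_under_i(a: str, b: str) -> bool:
    """Two characters match under /i iff their canonical forms are equal."""
    return canonicalize_non_unicode(a) == canonicalize_non_unicode(b)


if __name__ == "__main__":
    print(canonicalize_non_unicode("a"))       # A
    print(canonicalize_non_unicode("\u017f"))  # ſ (its uppercase "S" is ASCII)
    print(canonicalize_non_unicode("\u00df"))  # ß ("SS" is two code units)
```

The ASCII rule is why, for example, LATIN SMALL LETTER LONG S (U+017F) does not match "s" under a non-unicode /i regexp, while MICRO SIGN (U+00B5) and GREEK SMALL LETTER MU (U+03BC) do match, because both uppercase to U+039C.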

Depends on D67717

Because most of the recent changes to irregexp were patches I contributed myself, we barely need to change any of the shim code.

The only notable change is the addition of '#define COMPILING_IRREGEXP_FOR_EXTERNAL_EMBEDDER'. This is the solution that Jakob Gruber and I eventually came up with for the question of what to do with awkward V8 code that SM doesn't want. For example, NativeRegExpMacroAssembler::Match (in regexp-macro-assembler.cc) gets down in the muck with the internal details of V8's String implementation. It would be most convenient for SM if that function just didn't exist; we aren't going to use it, and we don't want to have to define a bunch of unused string API gunk in our shim. The answer is to wrap functions we don't need in "#ifndef COMPILING_IRREGEXP_FOR_EXTERNAL_EMBEDDER", which solves our problem and is minimally disruptive upstream.

Depends on D67718

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/33167ac2b14b
Create script to automate import from v8 r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/bd51cd54aecf
Re-import irregexp using import-irregexp.py r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/cc745ddc7d63
Update special-case.cc r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/5cf0cb9c2c73
Update shim code to support new import r=mgaudet