Closed Bug 1592307 Opened 1 year ago Closed 1 year ago

Create shim definitions for V8-specific code in new regexp implementation

Categories

(Core :: JavaScript Engine, task, P1)

task

Tracking

()

RESOLVED FIXED
mozilla73
Tracking Status
firefox73 --- fixed

People

(Reporter: iain, Assigned: iain)

References

Details

Attachments

(19 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review

To make it as easy as possible to keep our regexp code up to date with upstream, I've cobbled together a header file with shim definitions of the other parts of V8 needed by the regexp implementation.

I will break up my existing shim header into individual patches for review and attach them to this bug. Missing shim definitions (non-x64 masm + full Unicode support) will probably happen in separate bugs.

A lot of the implementations are fairly trivial interfaces to existing SM code, or short standalone functions. I'll open new bugs for the more complicated remaining implementation tasks as well.

Creating shim files, to be filled in over the course of many patches.

This patch adds '#include new-regexp/regexp-shim.h' in the necessary places.

The intent of this patch is to cover all of the necessary changes to
V8 source files that are not automated away by update-headers.py, so
that importing a new version of irregexp is as simple as running
update-headers.py and re-applying this patch.

Depends on D56484

This patch includes a variety of non-domain-specific definitions. Most of them are copy-pasted directly from V8 source. The main (partial) exception is saturated_cast: the full definition requires dozens of lines of template goo, but the only specialization we actually need is trivial to write by hand.

I included links to V8 source code for anything that seemed non-trivial, to make it easier to review.

Depends on D56485

Depends on D56486

This patch adds a shim for V8's Object class, which maps fairly well to SM's Value.

Remaining rough spots:

  1. V8 provides an interface for converting between an Object and an Address (typedef for uintptr_t). SM does its best to avoid exposing the internal Value representation to the outside world. We can either make this work by adding v8::internal::Object as a friend class of JSValue, or upstream an irregexp patch to remove the relatively few places where this is necessary. (A later patch goes with the former for now.)
  2. V8's HeapObject (halfway between SM's Object and GCThing) has a Size method, but it should be easy to upstream a patch to eliminate the need for it in irregexp. (This patch is written and ready to go.)
  3. V8's ByteArray is a fixed-length array of bytes. In the previous port of irregexp, we just used uint8_t[] (or a unique pointer to uint8_t[]). However, since our goal here is to avoid modifying the implementation of irregexp, we would prefer something that is compatible with the existing code, which means it needs to be a HeapObject. There are a variety of options (add a new class, use Uint8Array, rewrite HeapObject to enable us to store non-GCThings), but none of them is so obviously correct that I was willing to commit to it here. (In a later patch, I create a new class.)

Depends on D56487

This patch adds Vector, which is roughly equivalent to mozilla::Span.

Depends on D56488

This patch hooks up jsmalloc and adds Zones, which are a wrapper around LifoAlloc. Containers that allocate in Zones will be added in the next patch.

Depends on D56490

There are two different sets of zone containers used in irregexp.

  1. ZoneList: a growable vector that allocates its elements in a Zone. It appears to be the last remnant of an older set of data structures in V8; when the last kind of List other than ZoneList was removed, the base class implementations of List functionality were moved into ZoneList.

This code is mostly a straight copy from zone.h, except:
a) The DCHECKs have been converted to MOZ_ASSERT
b) ZoneAllocationPolicy appears to be a remnant of an abstraction layer across Lists. We don't need that generality for irregexp (and I am not sure that V8 needs it at all), so I removed it and simplified the resulting code.
c) Implementations have been inlined from zone-list-inl.h into zone-containers.h
d) I cleaned up a few bits of code to match SM-style (no unbraced ifs, etc...)

  1. ZoneVector/ZoneLinkedList/ZoneSet/ZoneMap/ZoneUnorderedMap are stdlib containers that are specialized (using ZoneAllocator) to allocate in Zones. This code is copied from zone-allocator.h and zone-containers.h.

Depends on D56492

SmallVector, like js::Vector, starts with elements allocated inline and expands when full. It is implemented as a wrapper around js::Vector.

Depends on D56493

V8's labels use different names than SpiderMonkey's, but the underlying concepts are the same.

Depends on D56494

V8 has Handles, but they have some important differences from SM's. This patch only includes the required Handle API for irregexp. The implementation comes in a later patch.

Depends on D56496

V8 has a message template system similar to SM's JSMSG. However, it is currently barely used in irregexp. Only stack overflow uses the message template system. Every other error uses a literal string.

This stub code implements the current required API. I am looking into the possibility of uplifting a patch to V8 to consistently use the message template system.

Depends on D56497

There are a variety of helper functions for unicode support. In cases where they map cleanly onto SM code, I hooked them up. The rest are left as stubs and implemented in a later patch. Similarly, the work necessary to support V8's equivalent to JS_USE_INTL_API comes in a later patch.

Depends on D56498

This patch adds a String wrapper and implements some of the easy bits.

For the most part, all irregexp needs from a String is access to a contiguous array of 1 or 2 byte characters, which is fairly easy to provide using SM Strings. Currently, there are a couple of places where the underlying representation of the String is inspected. My plan is to upstream a patch that hoists this code into a separate file, which will significantly reduce the overall surface area.

Depends on D56500

V8's equivalent to JitOptions is a set of FLAG_xxx bools. This patch adds the flags that are checked inside irregexp. Actually hooking them up to JitOptions is a future task.

Depends on D56501

This patch adds the JSRegExp object, which (as used within irregexp) is equivalent to our RegExpShared. This implementation of JSRegExp defines the members that are used from within irregexp. Implementing them and figuring out the external interface is future work.

One of the main uses of JSRegExp inside the engine is the Flag enum it contains. To keep things simple, I took the flags implementation directly from V8.

Depends on D56502

This patch adds the minimal shim for calling generated code and passing around references to code objects. In the future, depending on how gross this is to implement, we might add a cleaner abstraction layer to upstream V8.

Depends on D56503

Isolate is the V8 equivalent of JSContext. This patch adds a shim for Isolate.

In V8, the Factory is responsible for allocating memory. SM doesn't really have an equivalent, so instead of adding a separate Factory shim, I made Factory an alias for Isolate and moved the Factory methods into Isolate. The same is true for StackGuard.

Depends on D56504

This structure is used in a variety of places to check for stack overflow / interrupts. It seems to map pretty well to our equivalent code.

After this patch, everything needed by irregexp is declared in the shim, and the code compiles with --enable-new-regexp. (There are another dozen patches of implementations before it links successfully, though.)

Depends on D56505

Attachment #9114757 - Attachment description: Bug 1592307: Part 2: Add regexp-shim.h and -inl.h imports to V8 files r=mgaudet → Bug 1592307: Part 2: Add regexp-shim.h imports to V8 files r=mgaudet
Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/02dbaf4de6c9
Part 1: Define empty stub files r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/4c0276d45e88
Part 2: Add regexp-shim.h imports to V8 files r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/67003ef064c3
Part 3: Macros and #defines r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/7d5e2cda6fbb
Part 4: More simple definitions r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/2e11fc71e4cf
Part 5: Add V8::Object shim r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/985f326c5fd7
Part 6: Vector r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/5ad6e0ba170e
Part 7: Non-GC memory allocation r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/5adb89d54f12
Part 8: Zone containers r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/dfdef83bf5aa
Part 9: SmallVector r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/8d8524404286
Part 10: Label r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/189881fc7e8d
Part 11: Handles r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/7d5b3e320550
Part 12: Messages r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/bd295f254c0c
Part 13: Unicode helpers r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/38143e74fb7f
Part 14: Strings r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/a152892c24fc
Part 15: Option flags r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/796dd7dc1f20
Part 16: JSRegExp object r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/46b7909877fd
Part 16: GeneratedCode r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/b910dfaabe98
Part 17: Isolate r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/146b56bc48fb
Part 18: StackLimitCheck r=mgaudet
You need to log in before you can comment on or make changes to this bug.