Closed Bug 1620020 Opened 4 years ago Closed 4 years ago

Implement shim for irregexp

Categories

(Core :: JavaScript Engine, task, P1)


Tracking


RESOLVED FIXED
mozilla76
Tracking Status
firefox76 --- fixed

People

(Reporter: iain, Assigned: iain)

Attachments

(15 files)

In bug 1592307 we added shim definitions for all of the V8-specific code that is used in our new import of irregexp. Many of those were left unimplemented. These patches fill in those gaps.

This shim code was only used in irregexp code that we're factoring out into a separate file and not importing.

V8's Code maps nicely onto SM's JitCode.

Depends on D65530

We use char16_t as our two-byte character type; V8 uses uint16_t. Because these are distinct types, a reinterpret_cast between char16_t* and uint16_t* may or may not be defined behavior (it can run afoul of the strict-aliasing rules). Fortunately, we can just change "using uc16 = uint16_t" to "using uc16 = char16_t" and everything works out.
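
A minimal sketch of why the typedef change removes the need for a cast. `CountDigits` and `CountDigitsInSMString` are hypothetical helpers, standing in for any irregexp routine written against `uc16` and an SM-side caller. With `uc16 = uint16_t`, passing a `const char16_t*` to `CountDigits` would not compile without a `reinterpret_cast`; with `uc16 = char16_t` it is the same type.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// After the change: irregexp's two-byte type is SpiderMonkey's two-byte type.
using uc16 = char16_t;
static_assert(std::is_same<uc16, char16_t>::value,
              "uc16 must match SpiderMonkey's two-byte char type");

// Hypothetical stand-in for an irregexp routine that takes uc16 data.
size_t CountDigits(const uc16* chars, size_t length) {
  size_t count = 0;
  for (size_t i = 0; i < length; i++) {
    if (chars[i] >= u'0' && chars[i] <= u'9') count++;
  }
  return count;
}

// Hypothetical SM-side caller: char16_t data is passed straight through,
// with no reinterpret_cast between char16_t* and uint16_t*.
size_t CountDigitsInSMString(const char16_t* chars, size_t length) {
  return CountDigits(chars, length);
}
```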

Depends on D65531

This fills in various string methods.

Depends on D65532

SM's roots are a linked list that lives on the stack. V8's stack roots are stored in a side table, which allows them to allocate handles that outlive the current stack frame. The lifetimes of those handles are instead determined by HandleScope objects. When a HandleScope goes out of scope, all "roots" created in that scope are freed simultaneously. This patch implements the V8 API inside SM.
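
A toy sketch of the side-table model described above, not the shim's actual code. `FakeValue`, `RootTable`, `Handle`, `HandleScope`, and `MakeNumber` are hypothetical names for illustration: handles refer to slots in a side table by index, a function can return a handle that outlives its own frame, and destroying the enclosing `HandleScope` truncates the table, releasing every root created in that scope at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a GC value such as JS::Value.
using FakeValue = uint64_t;

// Side table of roots. A real GC would trace every live slot.
class RootTable {
 public:
  size_t push(FakeValue v) {
    slots_.push_back(v);
    return slots_.size() - 1;
  }
  FakeValue& at(size_t i) { return slots_[i]; }
  size_t size() const { return slots_.size(); }
  void truncate(size_t n) { slots_.resize(n); }

 private:
  std::vector<FakeValue> slots_;
};

// A handle names a slot by index, so it stays valid when the table grows.
// It dangles once the HandleScope that created it is destroyed.
class Handle {
 public:
  Handle(RootTable& t, FakeValue v) : table_(&t), index_(t.push(v)) {}
  FakeValue get() const { return table_->at(index_); }

 private:
  RootTable* table_;
  size_t index_;
};

// RAII scope: remembers the table size on entry and frees all roots
// created since then when it goes out of scope.
class HandleScope {
 public:
  explicit HandleScope(RootTable& t) : table_(t), mark_(t.size()) {}
  ~HandleScope() { table_.truncate(mark_); }
  HandleScope(const HandleScope&) = delete;
  HandleScope& operator=(const HandleScope&) = delete;

 private:
  RootTable& table_;
  size_t mark_;
};

// Returns a handle that outlives this function's stack frame; it stays
// valid until the caller's HandleScope is destroyed.
Handle MakeNumber(RootTable& t, FakeValue v) { return Handle(t, v); }
```

Contrast with SM's stack roots, whose lifetime is tied to the C++ scope of each individual `Rooted`.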

A ByteArray is a fixed-length array of bytes. V8 uses ByteArrays to store bytecode for the irregexp interpreter and to store lookup tables when compiling regexps. V8's ByteArrays are GC things (meaning that we have to be able to store them in a Value), but SM's version can't be a GC thing, because we have to be able to allocate a ByteArray while generating code with the MacroAssembler (masm). We therefore represent a ByteArray as a PrivateValue pointing to a ByteArrayData (a length-prefixed buffer).
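
A minimal sketch of a length-prefixed buffer in the spirit of ByteArrayData, assuming plain `malloc`/`free` for illustration (SpiderMonkey would use its own allocation functions). The field and method names here are hypothetical. The buffer is not a GC thing; in the shim, a pointer to it would then be wrapped in a PrivateValue, which this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Header followed inline by `length` bytes of data.
struct ByteArrayData {
  uint32_t length;

  // The bytes live immediately after the header.
  uint8_t* data() { return reinterpret_cast<uint8_t*>(this + 1); }

  uint8_t get(uint32_t i) {
    assert(i < length);
    return data()[i];
  }
  void set(uint32_t i, uint8_t v) {
    assert(i < length);
    data()[i] = v;
  }

  // Non-GC allocation of header + payload; returns nullptr on OOM.
  static ByteArrayData* Create(uint32_t length) {
    void* mem = std::malloc(sizeof(ByteArrayData) + length);
    if (!mem) return nullptr;
    auto* ba = new (mem) ByteArrayData{length};
    std::memset(ba->data(), 0, length);  // start zero-filled
    return ba;
  }
  static void Destroy(ByteArrayData* ba) { std::free(ba); }
};
```

Because allocation never touches the GC heap, it is safe to call while code generation is in progress.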

Depends on D65571

In V8, the Isolate (roughly SM's JSContext) owns a Factory, which is responsible for allocating objects. The SM shim merges Isolate and Factory into a single class.

This patch implements the Factory methods using the Handle infrastructure from previous patches.
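
One way to merge the two classes while leaving V8 call sites like `isolate->factory()->NewFoo(...)` untouched is to make `Factory` an alias for `Isolate` and have `factory()` return `this`. This is a sketch of that idea under that assumption, not necessarily how the shim spells it; `NewExampleString` is a hypothetical factory method, whereas the real Factory methods allocate GC things and return Handles.

```cpp
#include <cassert>
#include <string>

class Isolate;
// Assumption for illustration: Factory is just another name for Isolate.
using Factory = Isolate;

class Isolate {
 public:
  // V8-style code calls isolate->factory()->NewX(...); returning `this`
  // lets those calls compile unchanged against the merged class.
  Factory* factory() { return this; }

  // Hypothetical factory method, standing in for the real allocators.
  std::string NewExampleString(const char* s) { return std::string(s); }
};
```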

Depends on D65572

This patch fills in the ICU-less Unicode implementation by copying the relevant code from V8's implementation. There are a number of big tables here, but they are all only defined when we're not using ICU (aka only in local shell builds).

Depends on D65573

The actual definition of these methods depends on future changes to RegExpShared. For now, we just stub them out.

Depends on D66098

When ICU is available, case-insensitive non-unicode matches (/i, not /iu) are performed using precomputed sets of characters that need special handling to follow the JS spec's weird rules. These sets live in special-case.cc. In V8, special-case.cc is generated at compile time with a special build step. This is overkill. Barring changes to gen-regexp-special-case.cc, special-case.cc will only change when we import a new version of ICU, and even then only if Unicode defines new case-folding shenanigans. This patch checks in a copy of special-case.cc. I'll open another bug to hook this process up to make_unicode.py.

PS: This version of special-case.cc is actually wrong due to a bug in V8. My patch to fix it upstream is under review; I'll fix it here in a later patch.
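
To show the kind of rule that makes special handling necessary, here is a sketch of the ECMAScript Canonicalize operation for non-unicode /i matching: uppercase the character, keep the original if uppercasing yields more than one character, and never map a non-ASCII character onto ASCII. `ToyToUpper` is a hypothetical stand-in covering only a handful of characters; it is not the contents of special-case.cc, which holds precomputed sets derived from ICU.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy version of Unicode's full uppercase mapping.
std::u16string ToyToUpper(char16_t ch) {
  if (ch >= u'a' && ch <= u'z') {
    return std::u16string(1, char16_t(ch - u'a' + u'A'));
  }
  switch (ch) {
    case 0x00DF: return u"SS";                                // sharp s -> "SS"
    case 0x017F: return u"S";                                 // long s -> ASCII 'S'
    case 0x00E9: return std::u16string(1, char16_t(0x00C9));  // e-acute -> E-acute
    default: return std::u16string(1, ch);
  }
}

// Canonicalize for /i without /u, following the spec's steps.
char16_t Canonicalize(char16_t ch) {
  std::u16string upper = ToyToUpper(ch);
  if (upper.size() != 1) return ch;      // multi-character result: keep ch
  char16_t cu = upper[0];
  if (ch >= 128 && cu < 128) return ch;  // never map non-ASCII onto ASCII
  return cu;
}

// Two characters match case-insensitively if they canonicalize equally.
bool MatchesIgnoreCase(char16_t a, char16_t b) {
  return Canonicalize(a) == Canonicalize(b);
}
```

For example, under these rules the long s (U+017F) does not match 's' in non-unicode /i mode, even though its uppercase form is 'S'.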

Depends on D66099

StdoutStream is used for debug output when trace-regexp-parser is enabled. The existing code doesn't actually print anything. V8's implementation goes to great lengths to make output work, even on Android. Instead of pulling in dozens of lines of code just to get some debug output working, this implementation just tapes a piece of paper to its chest with "std::cout" written in crayon and pretends.
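
A sketch of the crayon approach, assuming a StdoutStream that is a `std::ostream` sharing `std::cout`'s stream buffer. This is an illustration of the idea, not a copy of the patch; V8's version does considerably more work to route output on platforms like Android.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>

// Writes go straight to std::cout's buffer; nothing platform-specific.
class StdoutStream : public std::ostream {
 public:
  StdoutStream() : std::ostream(std::cout.rdbuf()) {}
};
```

Debug-tracing code can then write `StdoutStream() << ...` as it does in V8.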

The snprintf changes are to satisfy a static analysis (SprintfLiteralChecker).

Depends on D66101

FixedArray must store v8 Objects (aka JS::Values), but because it is allocated during parsing, it can't be a GC thing itself. The current implementation doesn't work. Writing a correct implementation is a little delicate. Fortunately, we only need it to support named captures, which are future work. For now, I am stubbing out the implementation of FixedArray to get rid of some GC hazards.

Depends on D66102

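std::stable_sort allocates a buffer internally for temporary scratch space, but SpiderMonkey doesn't want anybody to allocate memory without going through us. This patch switches to MergeSort, which sorts using scratch space supplied by the caller, to appease our static analysis (check_vanilla_allocations.py).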
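
A generic sketch of a stable merge sort that never allocates internally, because the caller provides the scratch buffer. `MergeSortWithScratch` and `StableOrderByKey` are hypothetical names; this is not SpiderMonkey's MergeSort and does not reproduce its signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stable top-down merge sort; `scratch` must hold at least n elements.
template <typename T, typename Less>
void MergeSortWithScratch(T* a, size_t n, T* scratch, Less less) {
  if (n < 2) return;
  assert(a != nullptr && scratch != nullptr);
  size_t mid = n / 2;
  MergeSortWithScratch(a, mid, scratch, less);
  MergeSortWithScratch(a + mid, n - mid, scratch, less);
  // Merge a[0, mid) and a[mid, n) into scratch, then copy back.
  size_t i = 0, j = mid, k = 0;
  while (i < mid && j < n) {
    // Take from the right half only if strictly less: keeps the sort stable.
    if (less(a[j], a[i])) {
      scratch[k++] = a[j++];
    } else {
      scratch[k++] = a[i++];
    }
  }
  while (i < mid) scratch[k++] = a[i++];
  while (j < n) scratch[k++] = a[j++];
  for (size_t t = 0; t < n; t++) a[t] = scratch[t];
}

// Usage sketch: stably order indices by key. Here the scratch comes from a
// std::vector; in SpiderMonkey it would come from SM's own allocator.
std::vector<int> StableOrderByKey(const std::vector<int>& keys) {
  std::vector<int> order(keys.size());
  for (size_t i = 0; i < keys.size(); i++) order[i] = int(i);
  std::vector<int> scratch(order.size());
  MergeSortWithScratch(order.data(), order.size(), scratch.data(),
                       [&](int x, int y) { return keys[x] < keys[y]; });
  return order;
}
```

Since the only allocation is the caller's, an allocation checker sees nothing hidden inside the sort.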

Depends on D66103

This patch turns on various optimization options by default. A later patch will allow us to control these flags with JitOptions.

Depends on D66104

Because irregexp is imported V8 code, it does not contain any calls to EnsureBallast. We therefore need to make our LifoAlloc allocation fallible, so that it can allocate a new chunk when necessary.

Also, we want to use the current size of the LifoAlloc, not the peak size, to decide whether we've allocated too much memory. Nobody was using the old ComputedSizeOfExcludingThis, so I rewrote it to use the value we're already tracking.
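
A toy chunked bump allocator illustrating both points. Allocation is fallible and grabs a new chunk on demand instead of relying on ballast reserved in advance, and it tracks the current size separately from the peak size. `ChunkedBumpAlloc` and `OverBudget` are hypothetical names; this is not LifoAlloc's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

class ChunkedBumpAlloc {
 public:
  explicit ChunkedBumpAlloc(size_t chunkSize) : chunkSize_(chunkSize) {
    assert(chunkSize > 0);
  }
  ~ChunkedBumpAlloc() { releaseAll(); }
  ChunkedBumpAlloc(const ChunkedBumpAlloc&) = delete;
  ChunkedBumpAlloc& operator=(const ChunkedBumpAlloc&) = delete;

  // Fallible: starts a fresh chunk when the current one is full and
  // returns nullptr only if the underlying allocation fails.
  void* alloc(size_t n) {
    n = (n + 7) & ~size_t(7);  // keep 8-byte alignment
    if (chunks_.empty() || used_ + n > capacity_) {
      size_t cap = n > chunkSize_ ? n : chunkSize_;
      char* chunk = static_cast<char*>(std::malloc(cap));
      if (!chunk) return nullptr;
      chunks_.push_back(chunk);
      capacity_ = cap;
      used_ = 0;
      curSize_ += cap;
      if (curSize_ > peakSize_) peakSize_ = curSize_;
    }
    void* p = chunks_.back() + used_;
    used_ += n;
    return p;
  }

  // Frees every chunk: the current size drops, the peak size does not.
  void releaseAll() {
    for (char* c : chunks_) std::free(c);
    chunks_.clear();
    capacity_ = used_ = 0;
    curSize_ = 0;
  }

  size_t curSize() const { return curSize_; }
  size_t peakSize() const { return peakSize_; }

 private:
  size_t chunkSize_;
  std::vector<char*> chunks_;
  size_t capacity_ = 0;
  size_t used_ = 0;
  size_t curSize_ = 0;
  size_t peakSize_ = 0;
};

// Memory-limit check based on what is allocated now, not the high-water mark.
bool OverBudget(const ChunkedBumpAlloc& a, size_t limit) {
  return a.curSize() > limit;
}
```

Checking the peak instead would keep reporting an allocator as over budget long after its memory had been released.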

Depends on D66100

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a8b8b6871932
Handles and HandleScopes r=sfink
https://hg.mozilla.org/integration/autoland/rev/adeda9123386
Implement ByteArray r=sfink
https://hg.mozilla.org/integration/autoland/rev/b7907706b145
Implement Factory methods on Isolate r=sfink
https://hg.mozilla.org/integration/autoland/rev/78b1fec9092c
Import non-ICU Unicode implementation r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/e20389dc460e
Stub definitions for JSRegExp r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/164362495411
Add ICU support r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/ded485063dd6
Fix Zone implementation r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/d36c83619f1b
Implement StdoutStream r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/7af9b99f4af0
Remove invalid implementation of FixedArray r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/3ed69991b934
Use MergeSort instead of std::stable_sort to appease check_vanilla_allocations.py r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/0b8e548db9e9
Turn on optimization knobs r=mgaudet