Closed Bug 1690142 Opened 4 years ago Closed 4 years ago

Update irregexp (Feb 2021)

Categories: Core :: JavaScript Engine, task, P1
Tracking: RESOLVED FIXED, 88 Branch
Tracking Status: firefox88 --- fixed
People: Reporter: iain, Assigned: iain
References: Blocks 1 open bug
Attachments: 5 files

In the most recent TC39 meeting, the capture indices proposal for regular expressions was updated. The update addressed our concerns about performance by gating the creation of capture indices behind a flag.

Now that TC39 has approved a fix for the performance concerns that kept us from implementing capture indices, I've started implementing the proposal. Capture indices are implemented outside irregexp itself, so updating irregexp isn't strictly required, but this seems like as good a time as any to pull in the latest version.

The set of noteworthy changes is short. Most of the work in irregexp since our last update has gone into the experimental non-backtracking engine, which is not yet mature enough for us to import. Aside from some whole-engine refactoring, the main changes are a smattering of small patches that make integer conversions more explicit.

V8 added a new metadata file that we have no need to import.

This patch is the result of running import-irregexp.py.

Depends on D106963

Zone is the V8 equivalent of SpiderMonkey's LifoAlloc. Upstream changed the Zone API to support collecting data about allocations. We don't need that data collection, but we do have to update our Zone shim to match the new API.

Depends on D106964
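To make the shim idea concrete, here is a minimal sketch of a Zone-style wrapper over an arena. It assumes, for illustration only, that the new upstream API has callers name a type tag when allocating (written here as `Allocate<TypeTag>(size)`) and that the shim simply ignores the tag. `ToyLifoAlloc` is a hypothetical stand-in for LifoAlloc, not SpiderMonkey's real class, and none of this is the actual irregexp shim.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical stand-in for SpiderMonkey's LifoAlloc: a bump-pointer arena
// that releases all of its memory at once when it is destroyed.
class ToyLifoAlloc {
 public:
  explicit ToyLifoAlloc(size_t chunkSize = 4096) : chunkSize_(chunkSize) {
    // Keeping chunks a multiple of the max alignment keeps offsets aligned.
    assert(chunkSize > 0 && chunkSize % alignof(std::max_align_t) == 0);
  }
  ToyLifoAlloc(const ToyLifoAlloc&) = delete;
  ToyLifoAlloc& operator=(const ToyLifoAlloc&) = delete;

  void* alloc(size_t size) {
    // Round every request up so each returned pointer stays max-aligned.
    const size_t align = alignof(std::max_align_t);
    size = (size + align - 1) & ~(align - 1);
    if (chunks_.empty() || size > capacity_ - used_) {
      // Start a new chunk; oversized requests get a chunk of their own.
      const size_t newCapacity = size > chunkSize_ ? size : chunkSize_;
      chunks_.push_back(std::unique_ptr<char[]>(new char[newCapacity]));
      capacity_ = newCapacity;
      used_ = 0;
    }
    void* result = chunks_.back().get() + used_;
    used_ += size;
    return result;
  }

 private:
  size_t chunkSize_;
  size_t capacity_ = 0;
  size_t used_ = 0;
  std::vector<std::unique_ptr<char[]>> chunks_;
};

// Hypothetical Zone shim. Assumption: upstream callers now pass a type tag
// so V8 can attribute allocations; the shim accepts the tag and ignores it.
class Zone {
 public:
  template <typename TypeTag>
  void* Allocate(size_t size) {
    return lifo_.alloc(size);  // TypeTag is deliberately unused.
  }

  template <typename T, typename... Args>
  T* New(Args&&... args) {
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "the toy arena only guarantees max_align_t alignment");
    void* memory = Allocate<T>(sizeof(T));
    return new (memory) T(std::forward<Args>(args)...);
  }

 private:
  ToyLifoAlloc lifo_;
};
```

Like an arena, the sketch frees memory wholesale when the Zone goes away and never runs destructors for the objects placed in it, so it is only suitable for trivially destructible data or objects whose destructors don't matter.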

A variety of small updates. Most notable:

  1. V8 added support for a safepoint mechanism where concurrent threads can pause for GC. This means that garbage collection can be triggered without heap allocation, so DisallowHeapAllocation was replaced with DisallowGarbageCollection in most places (including all the ones we care about).

  2. CompareCharsEqual was added to make string comparison more efficient in code that only tests for equality. Such code doesn't have to worry about memcmp, which compares byte by byte, giving the wrong ordering for two-byte chars on little-endian systems. The implementation is copy-pasted directly from V8.

  3. Some code was rewritten upstream to tighten up integer conversions. As part of that change, uc32 (which represents a Unicode char) is now unsigned. (The maximum valid codepoint in Unicode is 0x10FFFF, so signed vs unsigned doesn't generally matter in practice.)

Depends on D106965
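The memcmp ordering problem in item 2 is easy to see with a small sketch. `CompareCharsEqualSketch`, `MemcmpSign`, and `IsLittleEndian` are hypothetical helpers written for illustration; this is not the code copied from V8, only the same idea: equality never depends on byte order, so same-width strings can be compared with one memcmp, while ordering can go wrong.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Equality-only comparison in the spirit of V8's CompareCharsEqual
// (illustrative sketch, not the upstream code). Accepts any mix of unsigned
// char types, e.g. one-byte (Latin-1) and two-byte strings.
template <typename LChar, typename RChar>
bool CompareCharsEqualSketch(const LChar* lhs, const RChar* rhs, size_t count) {
  static_assert(std::is_unsigned<LChar>::value && std::is_unsigned<RChar>::value,
                "char types must be unsigned");
  if (sizeof(LChar) == sizeof(RChar)) {
    // Same width: two chars are equal exactly when their bytes are equal,
    // whatever the byte order, so a single memcmp is enough.
    return std::memcmp(lhs, rhs, count * sizeof(LChar)) == 0;
  }
  // Mixed widths: compare char by char after integer promotion.
  for (size_t i = 0; i < count; ++i) {
    if (lhs[i] != rhs[i]) return false;
  }
  return true;
}

// Sign of memcmp applied to the in-memory bytes of two two-byte chars.
// On a little-endian machine the low byte is stored first, so this can
// disagree with the numeric order of the chars.
int MemcmpSign(uint16_t a, uint16_t b) {
  const int r = std::memcmp(&a, &b, sizeof a);
  return (r > 0) - (r < 0);
}

// True if the low byte of a uint16_t is stored first in memory.
bool IsLittleEndian() {
  const uint16_t one = 1;
  unsigned char first;
  std::memcpy(&first, &one, 1);
  return first == 1;
}
```

On a little-endian machine, `MemcmpSign(0x0100, 0x00FF)` is negative even though 0x0100 > 0x00FF, because memcmp sees the byte 0x00 before 0xFF. An equality-only helper never consults that ordering, which is why it can take the memcmp fast path.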

-Werror=type-limits emits an error if a comparison is vacuously true. Upstream irregexp isn't held to -Werror=type-limits, so changes occasionally slip in that, for example, assert that an unsigned variable is >= 0. It's not worth upstreaming a patch to remove the trivial assertion every time this happens; instead, we can just suppress the error in irregexp/moz.build.

Depends on D106966
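As a concrete case: once uc32 is unsigned (item 3 of the previous patch), any leftover `>= 0` check on a uc32 value is vacuous. The sketch assumes uc32 is a plain alias for uint32_t and uses a hypothetical helper, `IsInCodePointRange`; it is not irregexp code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical alias reflecting the upstream change: uc32 is now unsigned.
using uc32 = uint32_t;

// Largest Unicode code point.
constexpr uc32 kMaxCodePoint = 0x10FFFF;

// 0x10FFFF (1,114,111) is far below INT32_MAX (2,147,483,647), so every
// code point is representable whether uc32 is signed or unsigned.
static_assert(kMaxCodePoint <= static_cast<uc32>(std::numeric_limits<int32_t>::max()),
              "all code points fit in a signed 32-bit int");

// With an unsigned uc32, a lower-bound check such as `c >= 0` is always
// true; GCC's -Wtype-limits reports that comparison, and
// -Werror=type-limits turns the report into a build error. Only the upper
// bound carries information.
bool IsInCodePointRange(uc32 c) {
  return c <= kMaxCodePoint;
}
```

A side effect of the unsigned type: a value that would have been -1 under a signed uc32 wraps to 0xFFFFFFFF, so the single upper-bound check still rejects it.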

Blocks: Irregexp