Closed Bug 1779849 Opened 3 years ago Closed 3 years ago

Update irregexp (Jul 2022)

Categories

(Core :: JavaScript Engine, task, P1)

task

Tracking

()

RESOLVED FIXED
105 Branch
Tracking Status
firefox105 --- fixed

People

(Reporter: iain, Assigned: iain)

References

Details

Attachments

(15 files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

It has been a while since we've pulled in a new version of irregexp, and several useful patches have landed upstream, including one that should fix bug 1775005.

This patch was generated by running import-irregexp.py.

Depends on D152901

V8 added an enum class instead of using raw chars.

Drive-by: Mark CanReadUnaligned as const.

Depends on D152902

Preparing for the next patch

Depends on D152903

IsCharacterInRangeArray is translated from the version in imported/regexp-macro-assembler.h, with more comments and adjustments to make callWithABI work.

Depends on D152904

While adding support for syntax-parsing / early errors, V8 pushed the allocation of the named capture array one level outward, which lets us skip the creation of the FixedArray completely.

The awkward sorting requirement is also present in V8: compare https://github.com/v8/v8/blob/main/src/regexp/regexp.cc#L507-L508.

Depends on D152905

While refactoring this code, upstream V8 removed their own need to have Isolate as a friend class of RegExpStack. We need to be able to reach inside RegExpStack for a couple of reasons, so we use ExternalReference, which is still a friend class (to support ExternalReference::TopOfRegExpStack, which is used in V8's macroassembler code).

Since ExternalReference already exists, I'm also using it for the memory accounting code.

Depends on D152906

Depends on D152907

Another change made to support early errors in V8.

Depends on D152908

A bunch of code was moved upstream from the v8::internal namespace into the v8::base namespace. Adjust accordingly.

Depends on D152909

Depends on D152910

Zone is the V8 equivalent of LifoAlloc. A zone-allocated SmallVec is used in regexp-parser.cc.

Depends on D152911

V8 rewrote some stack overflow code to fix their equivalent of bug 1775005. These changes support that fix.

Depends on D152912

Depends on D152913

Backed out 14 changesets (Bug 1779849) for causing bustages on regexp-parser.cc.
Backout link
Push with failures <--> Bbc
Failure Log

Flags: needinfo?(iireland)

In some but not all clang builds, for reasons I haven't been able to fully pin down, the existing patch stack fails to build with an error about trying to access a private constructor. An over-simplified version of the code:

In regexp-parser.h (V8 code):

// A class with a templated static method.
class RegExpParser {
  template <class CharT>
  static bool VerifyRegExpSyntax(..., const CharT* input, ...);
};

In regexp-parser.cc (V8 code):

// A templated class
template <typename CharT>
class RegExpParserImpl {
 private:
   RegExpParserImpl(const CharT* ...);

 // A friend declaration for the templated static method
 friend bool RegExpParser::VerifyRegExpSyntax<CharT>(..., const CharT*, ...);
}

// Two instantiations of the templated class
template class RexExpParserImpl<uint8_t>
template class RexExpParserImpl<base::uc16>

// The implementation of the templated static method
template <class CharT>
bool RegExpParser::VerifyRegExpSyntax(..., const CharT*, ...) {
  // A call to the private constructor, which should be allowed
  // because of the friend declaration
  RexExpParserImpl( ...)
}

// Two instantiations of the templated static method
template bool RegExpParser::VerifyRegExpSyntax<uint8_t>(...);
template bool RegExpParser::VerifyRegExpSyntax<base::uc16>(...);

In RegExpAPI.cpp (our code), we call RegExpParser::VerifyRegExpSyntax. This triggers a compiler error in the build-linux64-base-toolchains-clang/debug job, but not in any other job. I can't figure out what the difference is. I've tried adding explicit casts to guarantee that RegExpAPI.cpp calls VerifyRegExpSyntax with an already instantiated template parameter (uint8_t or base::uc16), but that didn't fix anything.

After several failed iterations, I've fallen back on the approach of changing the friend declaration to friend class v8::internal::RegExpParser, which solves the problem. I haven't upstreamed this patch yet; it's small, hacky, and unsatisfying enough that I'm willing to patch our local copy and see if I can get to the bottom of this mystery next time I update irregexp.

Pushed by iireland@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/51cfbf07529b Update import-irregexp.py r=mgaudet https://hg.mozilla.org/integration/autoland/rev/0e09bbc94f04 Re-import irregexp r=mgaudet https://hg.mozilla.org/integration/autoland/rev/f390b51c4e67 Use enum class in CheckSpecialCharacterClass r=mgaudet https://hg.mozilla.org/integration/autoland/rev/72e453021744 Refactor ByteArray r=mgaudet https://hg.mozilla.org/integration/autoland/rev/a33a389e957a Implement CheckCharacterInRangeArray r=mgaudet https://hg.mozilla.org/integration/autoland/rev/3f5b77469ad0 Refactor InitializeNamedCaptures r=mgaudet https://hg.mozilla.org/integration/autoland/rev/58a6369ecf67 Update RegExpStack code r=mgaudet https://hg.mozilla.org/integration/autoland/rev/282f8562675c Update RegExpFlags r=mgaudet https://hg.mozilla.org/integration/autoland/rev/a1954be3f4f9 Remove FlatStringReader r=mgaudet https://hg.mozilla.org/integration/autoland/rev/3b1c86a28842 Move definitions into base namespace r=mgaudet https://hg.mozilla.org/integration/autoland/rev/77243ba8b574 Miscellaneous shim changes r=mgaudet https://hg.mozilla.org/integration/autoland/rev/13be26ada116 Support zone allocation for SmallVec r=mgaudet https://hg.mozilla.org/integration/autoland/rev/b622378d5a70 Update stack overflow detection r=mgaudet https://hg.mozilla.org/integration/autoland/rev/9812c5b56a78 Remove dead shim code r=mgaudet https://hg.mozilla.org/integration/autoland/rev/72cd1237f80d Fix build issue r=mgaudet
Flags: needinfo?(iireland)
Regressions: 1783555
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: