Enable new regexp engine
Categories
(Core :: JavaScript Engine, enhancement, P1)
Tracking
()
People
(Reporter: iain, Assigned: iain)
References
(Blocks 1 open bug)
Details
(Keywords: dev-doc-complete)
Attachments
(12 files, 1 obsolete file)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review |
It's a little late in the 77 cycle to want to flip the switch, but we want to turn this on and get some testing in Nightly as soon as 78 opens.
My try builds are all green, and the performance numbers look good. There is still a ~5% regression on the backbonejs speedometer tests that I'm looking into, but it's balanced out by some nice wins on jetstream2.
With these patches, we support lookbehind assertions, unicode escape sequences, and the dotAll property. Named captures and capture indices will be implemented separately.
Assignee | ||
Comment 1•5 years ago
|
||
This was broken by changes to token streams in bug 1592105.
Assignee | ||
Comment 2•5 years ago
|
||
If a look-behind back-reference succeeds, we have to subtract the length of the capture from the current position (so that the current position points to the beginning of the capture). We don't have the length in a register, so we have to read it from the capture registers, which are stored on the stack. However, we pushed the initial value of the current position, so the stack pointer is offset by one word from where we expect.
The fix is to pop the saved value before subtracting the length.
With this fix, we pass all the test262 tests for look-behind assertions, dotAll, and unicode property escapes. (I will turn them on in a separate bug.)
Depends on D73112
Assignee | ||
Comment 3•5 years ago
|
||
The engine supports parsing named captures, but we don't have the code in place to expose the captured groups. Until we do, this code will make sure that we get a syntax error when parsing them.
Depends on D73113
Assignee | ||
Comment 4•5 years ago
|
||
Interpreted bytecode is ~3-5 times slower than compiled code, but ~5-10 times smaller. In general it seems like this is a good trade-off for the first few iterations, but for particularly long input strings this can cause a significant slowdown before we tier up. V8 eagerly tiers up to compiled code when the input string is 1000+ characters long. Following their lead on this fixes a significant regression in sunspider-regexp-dna.
Depends on D73115
Assignee | ||
Comment 5•5 years ago
|
||
This is as good a time as any to pull in some recent upstream changes (many of which I wrote). This patch was auto-generated using import-irregexp.py. The necessary changes to the shim code are in the next patch.
Depends on D73116
Assignee | ||
Comment 6•5 years ago
|
||
There are a few changes here:
-
The code that used to be in wrapBody was moved to RegExpCompiler::PreprocessRegExp.
-
The
code
field in RegExpCompileData used to be a bare Object, and is now a Handle<Object>. (This makes named captures way easier to implement, because it lets us to allocate GC things while parsing a regexp without angering the hazard analysis.) I also took this opportunity to remove some dead code in the shim implementation of Handle. -
Upstream V8 made a change to simplify the interface with the interpreter. Previously, the caller of IrregexpInterpreter::MatchForCallFromRuntime was responsible for allocating enough memory to hold both the capture results and the scratch registers. Now, the interpreter has the same interface as the compiler: the caller passes in a buffer large enough to hold the capture results, and the memory for the scratch registers is allocated internally. This requires a few small additions to the shim (IsInRange plus new methods on JSRegExp and SmallVector).
Depends on D73117
Assignee | ||
Comment 7•5 years ago
|
||
These two tests need to be updated to be aware of the new dotAll flag.
If we ever have to turn off the new engine, this patch should also be temporarily reverted.
Depends on D73118
Assignee | ||
Comment 8•5 years ago
|
||
Pull the lever!
(After responsibly waiting for 78 to open.)
Depends on D73119
Comment 9•5 years ago
|
||
Once the patch for this landed, it should probably be documented at https://wiki.developer.mozilla.org/en-US/docs/Mozilla/Firefox/Experimental_features.
Though it's not clear to me yet whether enabling the new engine actually also automatically enables all the features blocked by bug 1367105. Iain, can you please clarify that?
Sebastian
Assignee | ||
Comment 10•5 years ago
|
||
With these patches, we support lookbehind assertions, unicode escape sequences, and the dotAll property. Named captures and capture indices will be implemented separately.
Thanks for the pointer to the wiki.
Assignee | ||
Comment 11•5 years ago
|
||
If a regular expression is too big, the assembler may fail with RegExpError::kTooLarge. When it does so, we want to throw an error: "regexp too big".
Until the most recent reimport of irregexp, we were actually reporting an OOM in these cases, because CompilationResult::code
was default-constructed as an UndefinedValue and we took the "OOM in GetCode" path. Now CompilationResult::code
is a Handle, so we crash if we try to access the value.
Making the situation slightly more complicated is the fact that we still have a macroassembler live, which means that we can't GC, which means that we can't report an error. The old code used an AutoSuppressGC for this (https://searchfox.org/mozilla-central/source/js/src/irregexp/RegExpEngine.cpp#1703), but that seems like an extremely blunt instrument.
Instead, I've refactored CompilePattern
to call a separate Assemble
function. This means that we clean up the macroassembler before we call JS_ReportErrorASCII
. The new function is a straight copy-paste of the old code, except for error handling and .
to ->
conversions for the values being passed by reference. Note that the order of checks has changed after calling compiler->Assemble(...)
: now we check result.Succeeded()
before examining result.code
.
We also change the shared labels in SMRegExpMacroAssembler to be NonAssertingLabels. This suppresses assertions in the Label destructor that they are not used without being bound. The assertion is already suppressed for OOM (https://searchfox.org/mozilla-central/source/js/src/jit/Label.h#82-86), which is why we did not trigger it previously.
Comment 12•5 years ago
|
||
Comment 13•5 years ago
•
|
||
Backed out for assertion failure on nsContentUtils.cpp
Backout link: https://hg.mozilla.org/integration/autoland/rev/e392e4a34033d5b594da33bdbf207119bdd5af54
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=301051447&repo=autoland&lineNumber=6354
Also failures on test_xrayToJS.xhtml -> https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=301050193&repo=autoland&lineNumber=9510
Also, please take a look at :
Assignee | ||
Comment 14•5 years ago
|
||
We are adding support for the dotAll (/s) RegExp flag, so the list of expected properties on the RegExp prototype has to be updated.
Assignee | ||
Comment 15•5 years ago
|
||
This assertion triggered in dom/base/crashtests/1505811.html, which exhausts the stack before calling IsPatternMatching with an empty string pattern. The first attempt at parsing doesn't involve any stack checks, but the second regexp is more complicated and calls CheckRecursionLimit, which fails.
The fix is to handle the failure instead of asserting.
(Note: I think returning Nothing()
here is correct, for the same reasons that we return Nothing()
if the subsequent call to ExecuteRegExpNoStatics fails. I'm not sure why the first attempt at compilation returns Some
, though.)
Depends on D74149
Assignee | ||
Comment 16•5 years ago
|
||
ASAN found a leak.
We destroy the isolate with a level of indirection through RegexpAPI so that JSContext doesn't have to be able to see the full definition of Isolate.
Updated•5 years ago
|
Assignee | ||
Comment 17•5 years ago
|
||
The regexp engine is adding support for the dotAll flag (/s). As a result, /abc/gimsuy
in structured-clone.any.js (https://searchfox.org/mozilla-central/source/testing/web-platform/tests/IndexedDB/structured-clone.any.js#159) is no longer a syntax error, which means that the structured cloning tests don't all immediately fail.
Some of them still fail, presumably for different reasons. This patch is the result of running the wpt tests and updating the expected results according to the instructions here: https://searchfox.org/mozilla-central/source/testing/web-platform/README.md#158-173
Try build: https://searchfox.org/mozilla-central/source/testing/web-platform/README.md#158-173
Updated•5 years ago
|
Comment 18•5 years ago
|
||
Thanks for all your hard work - good to see the new regexes are finally here!
Named captures and capture indices will be implemented separately.
Do you know when that will also be in? Will it be come before the lookbehind support moves from nightly into the beta release?
If a regular expression is too big, the assembler may fail with RegExpError::kTooLarge.
Are there any built-in limits, or is "too big" just based on available machine memory? And is the limit any different from the current Firefox release?
(Our largest, working on Chrome/Opera, is made from a string with over 1M unicode characters.)
Assignee | ||
Comment 19•5 years ago
|
||
Do you know when that will also be in? Will it be come before the lookbehind support moves from nightly into the beta release?
Named captures are mostly done, and should be in Firefox 78 alongside lookbehind and friends. I just have to finish up the integration with Regexp.prototype[@@replace].
I haven't started capture indices yet. I'm a little concerned that adding them might slow down the common case. I have some ideas for how to work around that, but it would be a larger project, and I'm not sure it will fit into 78.
Are there any built-in limits, or is "too big" just based on available machine memory? And is the limit any different from the current Firefox release?
Under the covers, we are using the same regexp engine as Chrome, so if it's not too big for Chrome, it shouldn't be too big for Firefox.
Comment 20•5 years ago
|
||
Comment 21•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a73e5ff68480
https://hg.mozilla.org/mozilla-central/rev/49b9d6a8a3a5
https://hg.mozilla.org/mozilla-central/rev/f53110858cb9
https://hg.mozilla.org/mozilla-central/rev/ba01ceb63cf3
https://hg.mozilla.org/mozilla-central/rev/9dddfd577a3d
https://hg.mozilla.org/mozilla-central/rev/b3b82e1cda7f
https://hg.mozilla.org/mozilla-central/rev/f3b9c956fa85
https://hg.mozilla.org/mozilla-central/rev/3f680457842f
https://hg.mozilla.org/mozilla-central/rev/d8f770d123f2
https://hg.mozilla.org/mozilla-central/rev/4eda5acc8e1f
https://hg.mozilla.org/mozilla-central/rev/0081b4c73633
https://hg.mozilla.org/mozilla-central/rev/ee1018a8611a
Comment 22•5 years ago
|
||
Iain, I suppose this should be documented in the developer release notes for 78? https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/78 (dev-doc-needed
keyword). Is that big enough of a change to be also in the developer section of our general release notes?
Comment 23•5 years ago
|
||
![]() |
||
Comment 24•5 years ago
|
||
Backed out together with bug 1636495 for causing crashes e.g. when urls get pasted in Slack (Bug 1637243) - had to be done to create new Nightlies.
https://hg.mozilla.org/mozilla-central/rev/2ad15139df29d03fbf6157bb37bf3592bd3348aa
![]() |
||
Updated•5 years ago
|
![]() |
||
Comment 25•5 years ago
|
||
Backout has been merged to autoland.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 26•5 years ago
|
||
Assignee | ||
Comment 27•5 years ago
|
||
Pascal: Yes, this needs a dev doc update. I think it is probably big enough to be in the general release notes, but I don't know what the criteria are for that.
Comment 28•5 years ago
•
|
||
Please note that only shipping features should be mentioned in the release notes. As far as I can see, this bug only enables it for Nightly, though, which rather qualifies it to be mentioned at https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Experimental_features, as I mentioned earlier.
Iain, I didn't see a bug yet to let the changes ride the trains to release. Is there already one? If not, could you please create one? Once that one is fixed, it should be mentioned in the release notes. For MDN setting the dev-doc-needed
keyword is enough, for the general Firefox release notes you should set the relnote-firefox
then.
And by the way, thank you very much for implementing this! I've been waiting for the lookbehinds and the other features for years!
Sebastian
Assignee | ||
Comment 29•5 years ago
|
||
Sorry, the name of this bug is misleading. To be clear: these changes are intended to ride the trains to release in 78.
(The first version of the final patch in this stack only turned it on for Nightly, but since we're targeting 78 anyway, we turned it on everywhere to avoid merge-to-beta issues on the new tests. I forgot to update the patch title.)
Comment 30•5 years ago
|
||
Ah, I see! So if the changes aren't backed out again, it should be fine to set relnote-firefox
.
Sebastian
Comment 31•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/80d21da86415
https://hg.mozilla.org/mozilla-central/rev/e0f929077cce
https://hg.mozilla.org/mozilla-central/rev/c474eb63f626
https://hg.mozilla.org/mozilla-central/rev/596dca6e3ed1
https://hg.mozilla.org/mozilla-central/rev/09b8bab47f6b
https://hg.mozilla.org/mozilla-central/rev/534658e99952
https://hg.mozilla.org/mozilla-central/rev/98d7859686e6
https://hg.mozilla.org/mozilla-central/rev/16a1d1bbdd3e
https://hg.mozilla.org/mozilla-central/rev/5c0565636d06
https://hg.mozilla.org/mozilla-central/rev/5871f4ecce87
https://hg.mozilla.org/mozilla-central/rev/3d92a3a541ff
https://hg.mozilla.org/mozilla-central/rev/e365cbeabc4f
Comment 32•5 years ago
|
||
(In reply to Sebastian Zartner [:sebo] from comment #28)
Please note that only shipping features should be mentioned in the release notes. As far as I can see, this bug only enables it for Nightly, though, which rather qualifies it to be mentioned at https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Experimental_features, as I mentioned earlier.
This is not correct. We have release notes for nightly and beta maintained by the release management team so any new feature in nightly and beta should be noted there following this process https://wiki.mozilla.org/Release_Management/Release_Notes#How_to_nominate_a_bug_for_release_notes_addition.3F
Release notes in nightly do not get into beta release notes automatically and beta release notes do not get into the final release notes automatically either.
So yes please, request an addition to the general release notes :)
Regards
Comment 33•5 years ago
|
||
Hi Iain, can you suggest some text for this release note?
Assignee | ||
Comment 34•5 years ago
|
||
Release Note Request (optional, but appreciated)
[Why is this notable]: This adds support for several new RegExp features, closing a significant gap in web-compat.
[Affects Firefox for Android]: Yes.
[Suggested wording]: New RegExp engine in SpiderMonkey, adding support for the dotAll flag, Unicode escape sequences, lookbehind references, and named captures.
[Links (documentation, blog post, etc)]: I am in the process of writing a Hacks blog post. I will add a comment linking to that post once it's up.
Comment 36•5 years ago
|
||
Developer docs:
https://developer.mozilla.org/en-US/docs/Mozilla/Firefox/Releases/78#JavaScript
https://hacks.mozilla.org/2020/06/a-new-regexp-engine-in-spidermonkey/
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Assertions
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/dotAll
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Unicode_Property_Escapes
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Groups_and_Ranges
Description
•