Closed Bug 808245 Opened 12 years ago Closed 12 years ago

Use YARR's new MatchOnly JIT mode

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla20

People

(Reporter: dvander, Assigned: sstangl)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 3 obsolete files)

On benchmarks, particularly v8-regexp (and on the web, too), it's common to have code like:

       /(pattern)/.exec(string);

And never consume the matched subexpressions. We currently optimize this by not creating the output array, but we can do better by using YARR's new MatchOnly mode. This mode compiles the regex code to not bother saving the subexpression positions in the first place.

Unfortunately there is one complication: the global RegExp object has unspecified, defacto behavior that the last match results are available via RegExp.$1, RegExp.$2, etc. In SpiderMonkey, these are implemented in RegExpStatics.*. To avoid breaking this, JavaScriptCore has an internal variable on the global RegExp constructor that caches the last match inputs (that is, the string and regex pattern). If the user requests RegExp.$n, and the results were optimized away, it will re-run the regex to compute the results.

We expect this to fix bug 806646, which would account for the largest single performance improvement we can currently make on our V8 score. It should be pretty straightforward.
If the result of exec() is unused, we can actually get away with executing almost nothing at all -- all that's necessary is to cache the last input for reasons given in the above comment.
(In reply to Sean Stangl from comment #1)
> If the result of exec() is unused, we can actually get away with executing
> almost nothing at all -- all that's necessary is to cache the last input for
> reasons given in the above comment.

Good point!
Assignee: general → sstangl
Status: NEW → ASSIGNED
How much of an improvement is this bug expected to make?  Currently, our v8-regex score is really bad.  ETA on fix?
(In reply to Paul Wright from comment #3)
> How much of an improvement is this bug expected to make?  Currently, our
> v8-regex score is really bad.  ETA on fix?

Adding MatchOnly mode moves v8-regexp's score from 1800 to 2000 locally; we require 3600. MatchOnly mode roughly matches our performance on .test() and unused .exec() with the v8 engine (8800 versus 9000 on a v8-regexp benchmark with only .exec() calls), but our benchmark score is held back by poor .replace() performance.

I have patches to address that, but I'm working on breaking them up to post in separate bugs for ease of review. The patch I originally wrote for this bug turned out to be way too complicated.
> but our benchmark score is held back by poor .replace() performance.
> 
> I have patches to address that, but I'm working on breaking them up to post
> in separate bugs for ease of review.

Can you point me in the direction of the applicable bugs / patches?
Blocks: 820118
Attached patch Implement MatchOnly support. (obsolete) — Splinter Review
This is the first patch in a three-part series, the latter two to be resolved by Bug 820118 and Bug 820124. I have broken it up in this way in the hope of making the patches easier to review.

This patch implements YARR MatchOnly support and uses it in regexp_test(). In order to support this new mode, it was necessary to change the behavior of ExecuteRegExp() and its callsites. Because ExecuteRegExp() behavior is baked into the JSAPI, a function ExecuteRegExpLegacy() now exists that provides the old behavior, including output array creation.

A number of data structure changes facilitate the MatchOnly support. Instead of requiring ExecuteRegExp() callers to hold a LifoAllocScope, re.execute() now accepts a |MatchPairs| outparam, which is a resizable array of |MatchPair| objects, with memory details fully handled by the caller via |ScopedMatchPairs| and |VectorMatchPairs| (for RegExpStatics). This allows the same code to store directly into the RegExpStatics matches buffer without copying in the lazy evaluation case. Most uses of raw integers in matches have also been changed into uses of MatchPair objects, improving readability.

In total, the patch makes the following changes:

> Removes the RegExpCode/RegExpShared distinction, keeping RegExpShared,
> Compiles RegExp code at execution time, not when getting the RegExpShared,
> Changes RegExp execution to always output MatchPairs or a MatchPair,
> Uses a MatchConduit (union) to switch between MatchOnly/IncludeSubpatterns,
> Moves most RegExpShared querying logic into MatchPairs,
> Lazifies RegExpShared.

Local improvement to v8-regexp is about 1280 -> 1380, but the score is primarily held back by str_replace() performance. Bug 820124 will fix that, but requires MatchOnly mode to exist. On a version of the v8-regexp benchmark with only calls to .exec() (which are executed as .test()), we roughly match v8, ~8800 versus ~9000.
Attachment #690572 - Flags: review?(dvander)
Attachment #690572 - Flags: feedback?(choller)
Attached patch Implement MatchOnly support, v2 (obsolete) — Splinter Review
Version 2, with some additional cleanup and bugfixes around lazy RegExpStatics::makeMatch().
Attachment #690572 - Attachment is obsolete: true
Attachment #690572 - Flags: review?(dvander)
Attachment #690572 - Flags: feedback?(gary)
Attachment #690572 - Flags: feedback?(choller)
Attachment #691128 - Flags: review?(dvander)
Attachment #691128 - Flags: feedback?(gary)
Attachment #691128 - Flags: feedback?(choller)
Comment on attachment 691128 [details] [diff] [review]
Implement MatchOnly support, v2

feedback+ given that it didn't blow up jsfunfuzz after about half an hour, but this may be a red herring given that the regex fuzzer in jsfunfuzz keeps hitting bug 808478 before anything else.
Attachment #691128 - Flags: feedback?(gary) → feedback+
To make the patch easier to review, I have retroactively broken it up into 5 parts. I didn't go too crazy with rewriting data structures, so occasionally a component from a later patch will creep into an earlier one. I verified that all of the patches compile, but did not bother running or testing them.

The series appears to be green on try, with some random orange: https://tbpl.mozilla.org/?tree=Try&rev=050160f7bc4e
Changes are mostly to vim modelines. At some point I will run through the repo and update all of them, but these were the ones I changed when my editor behaved obnoxiously.
Attachment #691128 - Attachment is obsolete: true
Attachment #691128 - Flags: review?(dvander)
Attachment #691128 - Flags: feedback?(choller)
Attachment #691640 - Flags: review?(dvander)
Precursor to supporting compilation at execution time. Unused, and OK'd by Luke.
Attachment #691641 - Flags: review?(dvander)
Data structure simplification.
Attachment #691642 - Flags: review?(dvander)
Attachment #691643 - Flags: review?(dvander)
This is the main data structure change patch, which rewrites re.execute() and all associated functions to pass the full matches list up to the caller. The interface to MatchPairs is also improved.

Most of the changes to jsstr.cpp are in preparation for lazifying RegExpStatics. They were too obnoxious to rip out.
Attachment #691646 - Flags: review?(dvander)
Adds executeMatchOnly(). In order to use MatchOnly mode in regexp_test(), lazy RegExpStatics need to be in place, so this patch does that also. The cleanup to RegExpStatics was done as part of the previous patch -- I just deleted anything to do with the lazy mode.
Attachment #691647 - Flags: review?(dvander)
Attachment #691640 - Flags: review?(dvander) → review+
Attachment #691641 - Flags: review?(dvander) → review+
Attachment #691642 - Flags: review?(dvander) → review+
Comment on attachment 691643 [details] [diff] [review]
Part 3/5 - Compile at execution time

Review of attachment 691643 [details] [diff] [review]:
-----------------------------------------------------------------

talked IRL, map code needs to be added back
Attachment #691643 - Flags: review?(dvander)
Oh my. This is the correct version of the patch. dvander pointed out that I got carried away with code removal, and removed waaaay too much.

1) Adding the RegExpShared into map_ is necessary for code reuse.
2) Adding the RegExpShared into inUse_ is necessary for object cleanup.
Attachment #691643 - Attachment is obsolete: true
Attachment #692092 - Flags: review?(dvander)
Attachment #691647 - Flags: review?(dvander) → review+
Attachment #691646 - Flags: review?(dvander) → review+
Comment on attachment 692092 [details] [diff] [review]
Part 3/5 - Compile at execution time [v2]

Review of attachment 692092 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, glad to see all this RegExp stuff get cleaned up.
Attachment #692092 - Flags: review?(dvander) → review+
Thank you for the quick reviews!

Try run submitted for final state, green with random orange:
https://tbpl.mozilla.org/?tree=Try&rev=d8a45a696e8b
Depends on: 822145
Comment on attachment 691646 [details] [diff] [review]
Part 4/5 - Use MatchPairs for RegExp output

Review of attachment 691646 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/RegExpObject.cpp
@@ +122,5 @@
>  
> +    /* Initialize all MatchPair objects to invalid locations. */
> +    for (size_t i = 0; i < pairCount; i++) {
> +        pairs_[i].start = size_t(-1);
> +        pairs_[i].limit = size_t(-1);

Sooo... start and limit are ints, which gives me

js/src/vm/RegExpObject.cpp:125:36 [-Woverflow] overflow in implicit constant conversion
js/src/vm/RegExpObject.cpp:126:36 [-Woverflow] overflow in implicit constant conversion
(In reply to :Ms2ger from comment #22)
> Comment on attachment 691646 [details] [diff] [review]
> Part 4/5 - Use MatchPairs for RegExp output
> 
> Review of attachment 691646 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/vm/RegExpObject.cpp
> @@ +122,5 @@
> >  
> > +    /* Initialize all MatchPair objects to invalid locations. */
> > +    for (size_t i = 0; i < pairCount; i++) {
> > +        pairs_[i].start = size_t(-1);
> > +        pairs_[i].limit = size_t(-1);
> 
> Sooo... start and limit are ints, which gives me
> 
> js/src/vm/RegExpObject.cpp:125:36 [-Woverflow] overflow in implicit constant
> conversion
> js/src/vm/RegExpObject.cpp:126:36 [-Woverflow] overflow in implicit constant
> conversion

That initialization isn't even necessary, since Yarr's JIT performs the same initialization as the first step of execution. I'll fix it as part of one of the other regexp bugs.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: