Closed Bug 581595 Opened 10 years ago Closed 10 years ago

Optimize creation of RegExp.prototype.exec's return value

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: alangpierce, Assigned: njn)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 1 obsolete file)

In the v8 regexp test (see http://arewefastyet.com/individual.php#graph-regexp ), even with yarr applied (the big dip on July 3), we still take over 3 times as long on the test as JSC.

I did some shark analysis with the yarr patch applied. According to shark, we spend 67% of our time in js::RegExp::execute. Here's a cleaned-up profile of where this time is being spent, as a percentage of the total time for execute.

+ 100.0%, js::RegExp::execute
| + 42.3%, js_DefineProperty
| | + 41.0%, js_DefineNativeProperty
| | | + 32.9%, JSScope::putProperty
| | | | + 30.2%, JSScope::addPropertyHelper
| | | | | + 21.9%, JSScope::getChildProperty
| | | | | | - 9.9%, js::PropertyTree::getChild
| | | | | | + 8.8%, js_AllocSlot
| | | | | | | - 7.4%, JSObject::growSlots
| + 18.3%, js_NewSlowArrayObject
| | + 15.4%, js_NewFinalizableGCThing
| | | + 15.1% js_GC
| - 8.3%, malloc
| - 6.0%, free
| - 4.0%, jsRegExpExecute
| - 1.4%, js_NewDependentString

Looks like most of our time is spent creating the return array to RegExp.prototype.exec. If I remove the call to js_DefineProperty, we take about half as much time on v8-regexp, so optimizing or special-casing the return value creation could gain us up to about 10% of the v8 suite as a whole.
Blocks: JaegerSpeed
We used to inspect call-site bytecode to decide that the r.v. was not needed, only a falsy or truthy value. I think that all got stripped out, but it is a great idea -- I see .exec and .match calls in the v8-regexp.js test discarding the return value (does anything actually get used? Ahem).

/be
Hah, yeah, I remember removing that optimization for String.match because it didn't seem to matter.  I recall saying "if such an optimization matters, we should just have a cx member that has semantic hints about how the return value is or isn't used.  That still does seem like the principled way to do this category of optimizations and it blends into the linearity analysis that we've discussed previously.  For now, peephole bytecode optimization seems like the way to go, especially since we never ended up successfully removing pc-stores in JM.
So bytecode-inspection doesn't work on trace, where it matters, since all these RegExp.execs happen in a loop.  However, for trace, we know exactly which native we're calling so we can just call a variant of regexp_exec that doesn't create a return value.
blocking2.0: --- → beta5+
blocking2.0: beta5+ → betaN+
Depends on: 586842
I tried doing a premature return from RegExp::execute(), just before the RegExpMatchBuilder is created, and got a 2.00x speedup.
Here's the dirty hack, as requested.

- I added a RegExp.exec() variant that just returns an empty array on success.  It can be used if the result isn't used, or is only tested against null.

- In the tracer I detect a call/trace/pop bytecode sequence where the call is to regexp_exec, and change it to regexp_exec_unfilled_array_retval.  This matching is a fragile;  Luke tells me the 'trace' will be disappearing soon.

- I see a 1.8x speedup on v8-regexp on my MacBook Pro when running with '-j -m'.  (But only on 32-bit... is '-j -m' not hooked up on 64-bit?)

The patch is against JM but I could port it to TM easily.
Assignee: general → nnethercote
Status: NEW → ASSIGNED
Attachment #465549 - Flags: review?(lw)
Attachment #465549 - Flags: review?(sayrer)
Just chatted with Luke on IRC, he noticed that a less hacky optimization would be to have a js_DefineProperties function.  Instead of adding one property at a time, repeating all the checks and stuff, batch them up.  This would be similar to how dense arrays are converted to sparse arrays.
I was just looking at js_DefineProperty() and the everything beneath it.  Goodness, what a rabbit-hole.  So much code gets run just to add a property;  there are so many tests for so many different cases.  Makes me wonder if there are a small number of hot cases that should be looked for first.
(In reply to comment #3)
> So bytecode-inspection doesn't work on trace, where it matters, since all these
> RegExp.execs happen in a loop.  However, for trace, we know exactly which
> native we're calling so we can just call a variant of regexp_exec that doesn't
> create a return value.

Isn't this worth pursuing before optimizing define-property? There are indeed faster less general paths -- but no define and even better, no array at all is much faster.

/be
That's what the attached patch does.  It's kind of gross, though!
This is a big time win for V8, if it's deemed acceptable.  Can I get a review, please?
Comment on attachment 465549 [details] [diff] [review]
patch (against JM 50586:f6dcaa717899)

Oops, sorry for delay.  Alan, look what you've unleashed!

Why does the tracer pass UNFILLED_ARRAY_RETVAL instead of some UNUSED flag which exits before creating the result object?

Also, a nit: the unqualified/unscoped names BOOL_RETVAL, UNFILLED_ARRAY_RETVAL, FILLED_ARRAY_RETVAL.  Any of the classic workarounds for lack of scoped enums would do.  How about:

  namespace RegExpResultUsage {
    enum Type { UNUSED, AS_BOOL, AS_OBJECT };
  }
Attachment #465549 - Flags: review?(lw) → review+
Review ping for sayrer.
Attachment #465549 - Flags: review?(sayrer) → review+
This patch is much better:

- It just converts RegExp.exec() calls to RegExp.test().  No need for any enums about the return value.  This greatly reduces the number of lines of code touched.

- It does the optimization in the case where the result is tested for nullness as well as the case where the result is unused.

- By returning bools instead of empty arrays, the speed-up on v8-regexp went from 1.7x to 1.8x on my Linux box.
Attachment #465549 - Attachment is obsolete: true
Attachment #468214 - Flags: review?(lw)
Comment on attachment 468214 [details] [diff] [review]
patch, v2 (51034:14a90a53ceeb)

Much nicer; almost doesn't feel dirty :)

If you are interested in more free speedup, you might consider factoring out the unused-result bytecode predicate and applying it to str_replace.  It just might hit a few times in v8-regexp...  (N.B. str_replace interprets regexp arguments as regexps (unlike str_indexOf) and non-regexp arguments as strings (unlike str_search).)
Attachment #468214 - Flags: review?(lw) → review+
(In reply to comment #14)
> (N.B. str_replace interprets regexp arguments as regexps (unlike str_indexOf)

str_replace came in with ES3 and regexps, str_indexOf was ES1 ("just like Java").

> and non-regexp arguments as strings (unlike str_search).)

String.prototype.search is only for regexp params -- you want indexOf if you want a string search :-P. Again, compatibility bites.

/be
(In reply to comment #14)
> 
> If you are interested in more free speedup, you might consider factoring out
> the unused-result bytecode predicate and applying it to str_replace.
> It just might hit a few times in v8-regexp...

AFAICT, if the 'replacement' argument isn't a function, then str_replace is pure, in which case the entire call can be removed if the result isn't used.

That optimization feels very dirty, though... whereas it's conceivable that someone would use RegExp.exec() instead of RegExp.test() and then test the result for nullness because they don't know any better, nobody other than a synthetic benchmark author would call String.replace() and throw away the result...  right?
I use the tracemonkey build which from http://hg.mozilla.org/tracemonkey/rev/e2e1ea2a39ce.
But  jQuery script is hanging at the following  chunker.exec

unction () {
        var chunker = /((?:\((?:\([^()]+\)|[^()]+)+\)|\[(?:\[[^[\]]*\]|['"][^'"]*['"]|[^[\]'"]+)+\]|\\.|[^ >+~,(\[\\]+)+|[>+~])(\s*,\s*)?/g,
            done = 0,
            toString = Object.prototype.toString;
        var Sizzle = function (selector, context, results, seed) {
            results = results || [];
            context = context || document;
            if (context.nodeType !== 1 && context.nodeType !== 9) return [];
            if (!selector || typeof selector !== "string") {
                return results;
            }
            var parts = [],
                m, set, checkSet, check, mode, extra, prune = true;
            chunker.lastIndex = 0;
            while ( (m=chunker.exec(selector)) != null) 
            {
                parts.push(m[1]);
                if (m[2]) {
                    extra = RegExp.rightContext;
                    break;
                }
            }
---
Is this patch is the culprit ?
(In reply to comment #17)
>
> Is this patch is the culprit ?

It's possible, but difficult to know without more information.  There are two things you could do to help:

- Try a build of the revision before this patch (66c8ad02543b) and compare it to the revision that added this patch (24749e6ae6e9), or

- Post the full code that you were running so someone else can try to reproduce the problem.

Also, it's generally better to file problems with patches as follow-up bugs.
Depends on: 594126
Depends on: 594108
http://hg.mozilla.org/tracemonkey/rev/24749e6ae6e9
http://hg.mozilla.org/mozilla-central/rev/24749e6ae6e9
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
No longer depends on: 586842
Resolution: --- → FIXED
Whiteboard: fixed-in-tracemonkey
You need to log in before you can comment on or make changes to this bug.