Closed Bug 511777 Opened 15 years ago Closed 15 years ago

Avoid RegExp engine for String regex ops taking flat strings

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- beta1-fixed

People

(Reporter: luke, Assigned: luke)

References

Details

Attachments

(1 file, 3 obsolete files)

For operations like:

  "abcd".search("c")
  "abcd".match("c")
  "abcd".replace("c","C")

we create a whole RegExp, even for 'replace', where the spec says that, when not a RegExp, the pattern argument is interpreted as a flat string.  It seems like there is a fair amount of boilerplate to wade through to get to the underlying, essentially-strstr(), algorithm.  I propose that we just implement said loop directly for flat strings.  In addition to flat Strings in String.replace, I propose that we try to do this for 'search' and 'match' strings by searching the [length < fixed upper bound] pattern string for regexp meta-characters.

To avoid fragmenting regexp logic, the functionality could be put in jsregexp.cpp and exported as functions:

  bool js_CharsRequireRegExpMatching(const char *, size_t);
  bool js_ExecuteFlatRegeExp(const char *, size_t, const char *, size_t);

Does this sound like a good idea?  It seems like it would show up in the benchmarks we care about.
(Depending on bug 460904 because it de-obfuscates String RegExp methods.)
For the most part this seems OK, but one thing to check is whether |strstr| is faster than a regexp compiled to native code. It may not be, because the compilation achieves loop unrolling (specialized on the flat string contents). You could certainly simplify the compilation path a lot in any case.
That is a valid point, but:
 - I recall trace-lookup being somewhat expensive.  Its possible that with that and all the other boilerplate surrounding the underlying loop, strstr could complete before the the regexp engine even starts.
 - We could do something like Duff's device to unroll for ourselves.  Since the effectiveness of unrolling dies off a certain number of iterations, viz., when the loop body is longer than an i-cache line or when memory loads are the bottleneck, this should be able to achieve equivalent performance.
(In reply to comment #3)
> That is a valid point, but:
>  - I recall trace-lookup being somewhat expensive.  Its possible that with that
> and all the other boilerplate surrounding the underlying loop, strstr could
> complete before the the regexp engine even starts.

That is definitely true for small text strings, which makes these things hard to tune.

>  - We could do something like Duff's device to unroll for ourselves.  Since the
> effectiveness of unrolling dies off a certain number of iterations, viz., when
> the loop body is longer than an i-cache line or when memory loads are the
> bottleneck, this should be able to achieve equivalent performance.

++ on the 'viz.' usage. ;-) But Duff's doesn't specialize for the pattern string. I'm pretty sure memory loads *are* the bottleneck, but without specializing you have more loads for the pattern text. And I assume you mean bigger (ish) than the i-cache, not just one cache line. Which reminds me, I wanted to dig out my architecture textbooks to refresh myself on that stuff. Thanks!
(In reply to comment #4)
> I'm pretty sure memory loads *are* the bottleneck, but without
> specializing you have more loads for the pattern text.

I expect the bottleneck would be with the memory-controller, and since the pattern text would be kept in the d-cache by constant usage, it shouldn't be sapping the memory-controller bandwidth.  But my argument is growing further dependent on the micro-architecture; it is definitely worth doing an experiment before going any further, though.
We use Boyer-Moore-Horspool from String.prototype.indexOf, with thresholding that badly needs tuning. The BMH helper is factored out.

/be
(In reply to comment #5)
> (In reply to comment #4)
> > I'm pretty sure memory loads *are* the bottleneck, but without
> > specializing you have more loads for the pattern text.
> 
> I expect the bottleneck would be with the memory-controller, and since the
> pattern text would be kept in the d-cache by constant usage, it shouldn't be
> sapping the memory-controller bandwidth.  But my argument is growing further
> dependent on the micro-architecture; it is definitely worth doing an experiment
> before going any further, though.

I just so happen to have a directory full of regexp design files. I tried it on a sunspider-y flat regexp and the perf was not detectably different between the ->native approach and the wcsstr approach.

BMH makes sense.
So I did a conservative first stab: only String.search, no loop unrolling, no BMH.  On the microbenchmark:

s = 0;
for (var i = 0; i < 1000000; ++i) {
    s += "asdfasdfasdfasdfasdfasdfasdfasdfasdfilyhb".search("ilyhb");
}

I get about 5x speedup without JIT and 10x with.

I can't find any uses of flat String.search in sunspider, v8, or peacekeeper, but this portends good speedup for match and replace, of which there are many with flat string arguments in peackeeper/string.
Dup indeed.
Attached patch optimize for flat patterns (obsolete) — Splinter Review
This patch does the flat-pattern optimization for String.replace, match, and search.  Synthetic benchmarks (like the one above) show ~2x speedup for match and replace.  On the stringChat, stringReplace portions of peacekeeper (calling run() on the raw .js file a few thousand times), the speedup is ~2x and for stringWeighted 20%.
Attached patch v.2 (obsolete) — Splinter Review
So this patch gives a 40% higher score on Peacekeeper Text Parsing.
(For reference, before: 1469, after: 2073)

A couple of notes:

- I did manual unrolling (a la Duff's device) and got a pretty good speedup (~40% doing indexOf in a loop, ~6% on peacekeeper Text Parsing).  Yeah, its gross-lookin', but so is js_BoyerMooreHorspool, and now they are both factored into StringMatch so that the "which algorithm" logic is in one place.  I bet if I root around jsstr.cpp I can find another place to use it...

- I didn't end up tweaking the BMH threshhold because it seems that BMH wins not for string lengths > k, but for strings with many partial matches.  Hence, even relatively short searches ("abcdabcdabcdabcdabcde".search("abcde")) show speedup, and conversely long inputs with no partial matches can show a tiny slowdown.  The 512 lower bound makes sense by saying "we'll only try when the winning input wins big and the losing input loses small", so it looks as good as any other big number to me.

- I noticed the types used in the BMH algo are too small for 64-bit systems.  Changing BMH's types involved changing str_indexOf, so I factored the flat-matching algorithm so that indexOf and match/search/replace could all share.  While I was in indexOf I added a path so that integer indices (e.g., as used in peacekeeper) don't involve a function calls and conversion to doubles and back.  That's part of the peacekeeper speedup.

This patch still doesn't look into regexps to see if they are flat or do anything special for global matches.
Attachment #397415 - Attachment is obsolete: true
Comment on attachment 397964 [details] [diff] [review]
v.2

>diff --git a/js/src/jsapi.h b/js/src/jsapi.h
>--- a/js/src/jsapi.h
>+++ b/js/src/jsapi.h
>@@ -258,16 +258,26 @@ UWORD_TO_ROOTED_JSVAL(JSContext *cx, siz
> {
>     extern JSBool js_NewNumberInRootedValue(JSContext *cx, jsdouble d, jsval *vp);
>     if (i >= (size_t)JSVAL_INT_MAX)
>         return js_NewNumberInRootedValue(cx, i, vp);
>     *vp = INT_TO_JSVAL(i);
>     return JS_TRUE;
> }
> 
>+static JS_ALWAYS_INLINE JSBool
>+WORD_TO_ROOTED_JSVAL(JSContext *cx, ptrdiff_t i, jsval *vp)
>+{
>+    extern JSBool js_NewNumberInRootedValue(JSContext *cx, jsdouble d, jsval *vp);
>+    if (!INT_FITS_IN_JSVAL(i))
>+        return js_NewNumberInRootedValue(cx, i, vp);
>+    *vp = INT_TO_JSVAL(i);
>+    return JS_TRUE;
>+}
>+

Not in jsapi.h! For one, js_NewNumberInRootedValue is not exported with JS_FRIEND_API let alone JS_PUBLIC_API. For another we do not worry about jsapi.h layering overhead. If the engine's internals need this inline, it should go in a non-public header.

>diff --git a/js/src/jsregexp.cpp b/js/src/jsregexp.cpp
>--- a/js/src/jsregexp.cpp
>+++ b/js/src/jsregexp.cpp
>@@ -5764,8 +5764,24 @@ JSBool
> js_SetLastIndex(JSContext *cx, JSObject *obj, jsdouble lastIndex)
> {
>     jsval v;
> 
>     return JS_NewNumberValue(cx, lastIndex, &v) &&
>            JS_SetReservedSlot(cx, obj, 0, v);
> }
> 
>+bool
>+js_FlatString(const jschar *chars, size_t length)
>+{
>+    for (size_t i = 0; i < length; ++i) {
>+        char c = chars[i];
>+        switch (c) {
>+            /* Taken from the PatternCharacter production in 15.10.1. */
>+            case '^': case '$': case '\\': case '.': case '*': case '+':
>+            case '?': case '(': case ')': case '[': case ']': 
>+            case '{': case '}': case '|':

Half-indent (two spaces) case and goto labels, style guide sez...

>+                return false;

... so you don't have to overindent this one c-basic-offset.

>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp
>--- a/js/src/jsstr.cpp
>+++ b/js/src/jsstr.cpp
>@@ -70,16 +70,18 @@
> #include "jsopcode.h"
> #include "jsregexp.h"
> #include "jsscope.h"
> #include "jsstaticcheck.h"
> #include "jsstr.h"
> #include "jsbit.h"
> #include "jsvector.h"
> 
>+#include <wchar.h>

Standard headers first, no need for blank line.

>+#if 0 /* If someone decide's Duff's device is too gnarly. */

Duff's device rocks, no need for this #if 0 hedge.

>+    switch ((textend - text) & 7) {
>+        do {
>+            case 0: if (*t++ == p0) { fixup = 8; goto match; }
>+            case 7: if (*t++ == p0) { fixup = 7; goto match; }
>+            case 6: if (*t++ == p0) { fixup = 6; goto match; }
>+            case 5: if (*t++ == p0) { fixup = 5; goto match; }
>+            case 4: if (*t++ == p0) { fixup = 4; goto match; }
>+            case 3: if (*t++ == p0) { fixup = 3; goto match; }
>+            case 2: if (*t++ == p0) { fixup = 2; goto match; }
>+            case 1: if (*t++ == p0) { fixup = 1; goto match; }

Half-indent case again, even when not directly within switch's body.

I couldn't spot the <wchar> dependencies at a glance -- what were they?

Nice speedup work! Gratifying to know my 1997-era SWAG of 512 survives.

/be
(In reply to comment #13)

> I couldn't spot the <wchar> dependencies at a glance -- what were they?

I guess technically they are in my vim buffer history, which is to say, I was playing with wcsstr() and forgot to cut out the header :)
Attachment #397964 - Attachment is obsolete: true
Attachment #398052 - Flags: review?(jwalden+bmo)
Attachment #398052 - Flags: review?(jwalden+bmo)
Attached patch v.4, nicerSplinter Review
I thought I'd extend the optimization to flat RegExps, and in doing so I refactored/cleaned up and found bugs.  Also, this patch assumes dvander's patch in bug 513348 which limits string lengths to 2^30-1 on x64 systems, so that I can take out the extra logic I added to handle too-big indices stored in jsvals.

Then I realized (i.e., js/tests informed me) there is a lot of extra junk to do with RegExps, like setting all the static fields, so I turned that off.  But the resulting patch is still better than v.3.
Attachment #398052 - Attachment is obsolete: true
Attachment #398218 - Flags: review?(jwalden+bmo)
Depends on: 513348
No longer depends on: 460904
Comment on attachment 398218 [details] [diff] [review]
v.4, nicer

>diff --git a/js/src/jsregexp.cpp b/js/src/jsregexp.cpp

>+bool
>+js_FlatString(const jschar *chars, size_t length)
>+{
>+    for (size_t i = 0; i < length; ++i) {
>+        char c = chars[i];
>+        switch (c) {
>+          /* Taken from the PatternCharacter production in 15.10.1. */
>+          case '^': case '$': case '\\': case '.': case '*': case '+':
>+          case '?': case '(': case ')': case '[': case ']': case '{':
>+          case '}': case '|':
>+            return false;
>+          default:;
>+        }
>+    }
>+    return true;
>+}

This naming overload of "flat" is horrible.  Invert its meaning and make that js_ContainsRegExpSpecialChars perhaps?  Also, given that this is being exposed multiple places, it seems to me that it would be better as a method on JSString, with the usual naming-style changes.


>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp

>+js_BoyerMooreHorspool(const jschar *text, jsuint textlen,
>+                      const jschar *pat, jsuint patlen)
>+{
>     uint8 skip[BMH_CHARSET_SIZE];
>-    jschar c;
>-
>-    JS_ASSERT(0 < patlen && patlen <= BMH_PATLEN_MAX);
>-    for (i = 0; i < BMH_CHARSET_SIZE; i++)
>+
>+    JS_ASSERT(patlen <= BMH_PATLEN_MAX);

Why no > 0 assertion here, indeed why specifically remove it?


>+    for (uint i = 0; i < BMH_CHARSET_SIZE; i++)
>         skip[i] = (uint8)patlen;

uint as far as I can tell is an obsolescent type used in only one other location in SpiderMonkey; make this jsuint or some other unsigned type than uint.


>+static JS_ALWAYS_INLINE jsint
>+StringMatch(const jschar *text, jsuint textlen,
>+            const jschar *pat, jsuint patlen)
>+{
>+    if (patlen == 0)
>+        return 0;
>+    if (textlen < patlen)
>+        return -1;
>+
>+    /* XXX tune the BMH threshold (512) */
>+    if (textlen >= 512 && patlen <= BMH_PATLEN_MAX) {

While you're here move 512 into a static const.


>+    JSString *str;
>+    NORMALIZE_THIS(cx, vp, str);
>+
>+    JSString *str2 = ArgToRootedString(cx, argc, vp, 0);
>+    if (!str2)
>+        return JS_FALSE;

patstr would be a better name for this variable.


>+static JS_ALWAYS_INLINE bool
>+DefProp(JSContext *cx, JSObject *obj, jsid id, jsval val)
>+{
>+    return js_DefineProperty(cx, obj, id, val, JS_PropertyStub, JS_PropertyStub,
>+                             JSPROP_ENUMERATE);
>+}

This mini-typing aid doesn't seem necessary; just use obj->defineProperty(...).
Attachment #398218 - Flags: review?(jwalden+bmo) → review+
(In reply to comment #17)

Thanks!

> This naming overload of "flat" is horrible.  Invert its meaning and make that
> js_ContainsRegExpSpecialChars perhaps?  Also, given that this is being exposed
> multiple places, it seems to me that it would be better as a method on
> JSString, with the usual naming-style changes.

Roger on the renaming, but I don't agree on making it a member of JSString; this is regexp-specific logic and I don't think belongs in JSString.  That's how one gets 2000-member classes.

> >+js_BoyerMooreHorspool(const jschar *text, jsuint textlen,
> >+                      const jschar *pat, jsuint patlen)
> >+{
> >     uint8 skip[BMH_CHARSET_SIZE];
> >-    jschar c;
> >-
> >-    JS_ASSERT(0 < patlen && patlen <= BMH_PATLEN_MAX);
> >-    for (i = 0; i < BMH_CHARSET_SIZE; i++)
> >+
> >+    JS_ASSERT(patlen <= BMH_PATLEN_MAX);
> 
> Why no > 0 assertion here, indeed why specifically remove it?

Oops, I got a bit overzealous when I switched to from jsint to jsuint.

> >+static JS_ALWAYS_INLINE bool
> >+DefProp(JSContext *cx, JSObject *obj, jsid id, jsval val)
> >+{
> >+    return js_DefineProperty(cx, obj, id, val, JS_PropertyStub, JS_PropertyStub,
> >+                             JSPROP_ENUMERATE);
> >+}
> 
> This mini-typing aid doesn't seem necessary; just use obj->defineProperty(...).

I added that so its easy to see what (id,value)s are being added, and not get lost in the river of text.  Its much grosser looking without.
We're quite far from JSString being a 2000-member class.

Two JS_PropertyStubs are pretty ignorable, I think.
(In reply to comment #19)
> We're quite far from JSString being a 2000-member class.

Even if JSString had two members, the argument for not putting regular-expression logic in JSString would remain.  Growing classes with giant interfaces happens 1 member at a time.

> Two JS_PropertyStubs are pretty ignorable, I think.

They might be if they were textually aligned, which, with reasonable column widths, they aren't.  This is just factoring out redundancy, and I see it done in other places with as little or littler benefit as here.
This seems like it would be preferable in JSObject, then, to avoid having everyone define their own helpers to do this (or not, as the case more often would be:

    inline definePropertyValue(JSContext* cx, jsid id, jsval v) {
        return defineProperty(cx, id, v, JS_PropertyStub, JS_PropertyStub,
               JSPROP_ENUMERATE);
    }
That's a nice idea.  Alternatively, we could give the last 3 parameters of  JSObject::defineProperty default values.
(In reply to comment #22)
> Alternatively, we could give the last 3 parameters of
> JSObject::defineProperty default values.

I'm in favor!
Dude, default parameters rule!

/be
Hmm, mochitest/everythingelse failures...
backed out yesterday, found unthinkably stupid mistake made while addressing comment 17,
http://hg.mozilla.org/tracemonkey/rev/f569d59461d4
http://hg.mozilla.org/mozilla-central/rev/f569d59461d4
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Depends on: 526173
Depends on: 565608
Depends on: 612838
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: