optimize string.replace for flat strings




11 years ago
9 years ago


(Reporter: shaver, Assigned: dmandelin)



Bug Flags:
wanted1.9.1 ?

Firefox Tracking Flags

(Not tracked)


(Whiteboard: [parity-IE][parity-Safari])


(2 attachments, 2 obsolete attachments)

Lots of cases on the web of using a flat string as the needle, and it's pretty expensive to spin up the regex engine for such a case.  We have such code in our own FE, for doing entity replacement and other such processing.

I think there are 3 phases here:

1) detect the simple str.replace("needle", "new") case and short-circuit it by reusing indexOf's internals
2) handle the global and case-insensitive versions, like str.replace("needLE", "new", "gi")
3) undo all of this in favour of optimizing the regex engine for the simple-string case

I have a patch for 1, which shows a decent gain on Celtic Kane's Test_String:

Before: Test_String :  308
After : Test_String :  221

(The next big chunk of that test is spent finding the String prototype to look up the methods, with my patch.)

As well as my patch, I'll attach my sharkified ck.js here, I typically run it via

js -f ck.js -e 'useshark = 1; testcount = 1; RunTests("Test_String")'

to profile.
Created attachment 319669 [details] [diff] [review]
phase 1: str.replace("needle", "flat")
Assignee: general → shaver
Created attachment 319670 [details]
Celtic Kane's test with some constants increased to ease profiling
Comment on attachment 319669 [details] [diff] [review]
phase 1: str.replace("needle", "flat")

>+#if 0
>     /* XXX tune the BMH threshold (512) */
>     if (textlen - i >= 512 && (jsuint)(patlen - 2) <= BMH_PATLEN_MAX - 2) {
>         index = js_BoyerMooreHorspool(text, textlen, pat, patlen, i);

>+    v = JS_ARGV(cx, vp)[0];
>+    if (!JSVAL_IS_REGEXP(cx, v) && repstr != NULL && argc < 3) {

Fastest path is for JSVAL_IS_STRING(v). Who cares about objects, etc.?

>+        global = JS_FALSE; /* for when we do flat-string regexes through this */

Premature logic control alert! :-P

>+        do {
>+            jsint found;
>+            found = patlen
>+                ? str_indexOf_inner(text, textlen, pat, patlen, i)
>+                : 0;

Nit: underhang right-hand side of assignment.

Prefer _helper to _inner (see jsxml.c).

>+            if (found < 0)
>+                break;

No more occurrences.

>+            length += found - i + replen;
>+            chars = (jschar *) JS_realloc(cx, chars, length * sizeof(jschar));
>+            if (!chars)
>+                return JS_FALSE;
>+            APPEND_CHARS(text, i, found - i);
>+            APPEND_CHARS(reptext, 0, replen);
>+            i = found + patlen;
>+            if (!global)
>+                break;

>+        chars = (jschar *)JS_realloc(cx, chars, length * sizeof(jschar));

Nit: space after cast.

>+        if (!chars)
>+            return JS_FALSE;

Non-nit: leak on error. Need newchars or similar.

>+        APPEND_CHARS(text, i, textlen - i);
>+        str = js_NewString(cx, chars, length);

Non-nit: need (length + 1) * sizeof(jschar) JS_realloc argument.

>+        if (!str) {
>+            JS_free(cx, chars);
>+            return JS_FALSE;
>+        }

Created attachment 320027 [details] [diff] [review]
brendan's comments addressed, performance improved

Ripping out the unused global-replace capabilities let me use a single allocation point, making for better speed and simpler code.

Test_String :  172
Attachment #319669 - Attachment is obsolete: true
Attachment #320027 - Flags: review?(brendan)
Comment on attachment 320027 [details] [diff] [review]
brendan's comments addressed, performance improved

>+    if (JSVAL_IS_STRING(v) && repstr != NULL && argc < 3) {

Eschew != NULL unless nesting assignment on the right in a loop condition.

>+        jschar *text, *pat, *reptext;
>+        size_t textlen, patlen, replen, charsindex;
>+        jsint found;

Order by order of initialization?

>+        str = JSVAL_TO_STRING(v);
>+        JSSTRING_CHARS_AND_LENGTH(str, pat, patlen);
>+        NORMALIZE_THIS(cx, vp, str);
>+        JSSTRING_CHARS_AND_LENGTH(str, text, textlen);
>+        JSSTRING_CHARS_AND_LENGTH(repstr, reptext, replen);
>+        chars = NULL;
>+        charsindex = 0;

chars is not used before set after this.

charsindex is unused after this.

>+        found = patlen ? str_indexOf_helper(text, textlen, pat, patlen, 0) : 0;
>+        if (found < 0) {
>+            *vp = vp[1]; /* Return original value if pattern wasn't found. */
>+            return JS_TRUE;
>+        }

Use an if-else to avoid testing found < 0 in the patlen == 0 case.

>+        length = textlen + replen - patlen;
>+        chars = (jschar *) JS_malloc(cx, (length + 1) * sizeof(jschar));
>+        if (!chars)
>+            return JS_FALSE;
>+        js_strncpy(chars, text, found);              /* Prefix. */
>+        js_strncpy(chars + found, reptext, replen);  /* Replacement text. */
>+        js_strncpy(chars + found + replen, text + found + patlen,
>+                   textlen - found - patlen);        /* Suffix. */

Nit: "inline" or "on the right" comments use sentence fragments -- no capitalization or full stop.

Created attachment 322955 [details] [diff] [review]
with brendan's comments addressed
Attachment #320027 - Attachment is obsolete: true
Attachment #322955 - Flags: review?(brendan)
Attachment #320027 - Flags: review?(brendan)
Comment on attachment 322955 [details] [diff] [review]
with brendan's comments addressed

>+    jsint i, index, textlen, patlen;

Given the following changes, should i be renamed to start? Probably -- r=me with that if you agree.

Attachment #322955 - Flags: review?(brendan) → review+
Committed with the i/start rename:

15341[tip]   1f599577eca2   2008-06-12 18:30 -0700   shaver
  Bug 432525: optimize string.replace for flat strings; r=brendan
Last Resolved: 11 years ago
Resolution: --- → FIXED
This caused a large number of mochitest failures on all platforms so I backed it out.
Resolution: FIXED → ---
Um, yeah.  I think I had my mq entirely qpopped when I mochitested, because wow.  Sorry, and thanks to roc for wiping my chin.


11 years ago
Flags: wanted1.9.1?

Comment 11

11 years ago
It's probably a moot point now - but historically I've found:
to be faster than
  string.replace("needle", "flat")

Would there be interest in some test cases to see if/how we've improved here?


10 years ago
OS: Mac OS X → All
I think Dave is going to pick up this patch and iron it out.

The first version of the patch has the infrastructure for global replace, though not wired up to the "g" arg.  (SunSpider, at least, needs it.)  You want to work from the most recent patch, I think, but the first one shows one way to get the global stuff connected.
Assignee: shaver → dmandelin

Comment 13

10 years ago
Thanks for the tip about global. I know I have to do it but I wasn't aware you had already coded it up in the first patch--I had looked only at the latest one.

Comment 14

10 years ago
So, do we still care about this now that SunSpider has been updated not to use the 'g' flag? As of now, this patch speeds up trunk SunSpider by about 2ms, by reducing the time for the replace operations from 15ms to 13ms.

I think it's worthwhile only if we care to speed up the old SunSpider, which will presumably be the one on the web for some time. In that case I'd need to get back the "g" machinery and also put in something for $&. Let me know.

Comment 15

10 years ago
It seems like both IE and Safari do this optimization (and both without any $& substitution) which can and does confuse web developers (see e.g. bug 459378).
Whiteboard: [parity-IE][parity-Safari]

Comment 16

9 years ago
There are still some things mentioned here that aren't being done by bug 511777 (e.g. picking out flat RegExp objects, and 'g' matching).
Last Resolved: 11 years ago9 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 511777
You need to log in before you can comment on or make changes to this bug.