Last Comment Bug 562553 - TM: stupidly cache double-to-string conversions to speed up stupid v8-splay.js
: TM: stupidly cache double-to-string conversions to speed up stupid v8-splay.js
Status: RESOLVED FIXED
fixed-in-tracemonkey
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: x86 Mac OS X
: -- normal (vote)
: ---
Assigned To: Nicholas Nethercote [:njn]
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-04-28 20:46 PDT by Nicholas Nethercote [:njn]
Modified: 2010-05-13 00:46 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
quick and dirty implementation of Nick's idea (2.30 KB, patch)
2010-04-28 22:39 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch v2 (2.50 KB, patch)
2010-04-28 23:18 PDT, Nicholas Nethercote [:njn]
no flags Details | Diff | Splinter Review
patch v3 (2.51 KB, patch)
2010-04-28 23:29 PDT, Nicholas Nethercote [:njn]
cdleary: review+
Details | Diff | Splinter Review

Description Nicholas Nethercote [:njn] 2010-04-28 20:46:38 PDT
I took a quick look at v8-splay.js with Cachegrind:

--------------------------------------------------------------------------------
          Ir
--------------------------------------------------------------------------------
3,926,401,052  PROGRAM TOTALS

--------------------------------------------------------------------------------
          Ir  file:function
--------------------------------------------------------------------------------
1,549,212,505  /home/njn/moz/ws0/js/src/optg32/../dtoa.c:js_dtostr
 291,775,264  /home/njn/moz/ws0/js/src/optg32/../dtoa.c:multadd
 200,155,572  /build/buildd/eglibc-2.10.1/malloc/malloc.c:_int_malloc
 173,398,709  /home/njn/moz/ws0/js/src/optg32/../jsgc.cpp:js_CallGCMarker
 172,717,716  ???:???
 118,097,974  /home/njn/moz/ws0/js/src/optg32/../jsops.cpp:js_Interpret
  81,618,330  /build/buildd/eglibc-2.10.1/malloc/malloc.c:_int_free
  77,324,952  /home/njn/moz/ws0/js/src/optg32/../dtoa.c:lshift

Wow, what a lot of double-to-string conversions.  They come from this function:


Digging some more, we have lots of calls to js_NumberToString().  They are due to this function:

function GeneratePayloadTree(depth, key) {
  if (depth == 0) {
    return {
      array  : [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ],
      string : 'String for key ' + key + ' in leaf node'
    };
  } else {
    return {
      left:  GeneratePayloadTree(depth - 1, key),
      right: GeneratePayloadTree(depth - 1, key)
    };
  }
}

This is called with 'depth'==5 and 'key'==a random generated double.  Because it's recursive, we end up converting 'key' 32 times.  This function is called a zillion times.

I tried changing the benchmark to do the conversion first, it ran about 2.3x faster:

Index: tests/v8-v4/v8-splay.js
===================================================================
--- tests/v8-v4/v8-splay.js     (revision 52427)
+++ tests/v8-v4/v8-splay.js     (working copy)
@@ -69,7 +69,7 @@
   do {
     key = GenerateKey();
   } while (splayTree.find(key) != null);
-  splayTree.insert(key, GeneratePayloadTree(kSplayTreePayloadDepth, key));
+  splayTree.insert(key, GeneratePayloadTree(kSplayTreePayloadDepth, "" + key));
   return key;
 }


If I instead generate the entire string first, the 2.3x improves to about 2.8x:

Index: tests/v8-v4/v8-splay.js
===================================================================
--- tests/v8-v4/v8-splay.js     (revision 52427)
+++ tests/v8-v4/v8-splay.js     (working copy)
@@ -45,7 +45,7 @@
   if (depth == 0) {
     return {
       array  : [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ],
-      string : 'String for key ' + key + ' in leaf node'
+      string : key
     };
   } else {
     return {
@@ -69,7 +69,7 @@
   do {
     key = GenerateKey();
   } while (splayTree.find(key) != null);
-  splayTree.insert(key, GeneratePayloadTree(kSplayTreePayloadDepth, key));
+  splayTree.insert(key, GeneratePayloadTree(kSplayTreePayloadDepth, 'String for key ' + key + ' in leaf node'));
   return key;
 }

The program has this rather amusing/inaccurate comment at the top:

// This benchmark is based on a JavaScript log processing module used
// by the V8 profiler to generate execution time profiles for runs of
// JavaScript applications, and it effectively measures how fast the
// JavaScript engine is at allocating nodes and reclaiming the memory
// used for old nodes. Because of the way splay trees work, the engine
// also has to deal with a lot of changes to the large tree object
// graph.

I see two options here.  If we assume we can't change the benchmark, we should just cache the last double-to-string conversion -- a stupid hack for a stupid benchmark.

Alternatively, we could file a bug with Google and try to get them to improve it.

Or we could do both.
Comment 1 Boris Zbarsky [:bz] 2010-04-28 21:09:10 PDT
Hmm.  Any idea how other engines handle this case?
Comment 2 Andreas Gal :gal 2010-04-28 22:39:11 PDT
Created attachment 442320 [details] [diff] [review]
quick and dirty implementation of Nick's idea
Comment 3 Andreas Gal :gal 2010-04-28 22:39:43 PDT
++njn for the profiling work. nice find.
Comment 4 Nicholas Nethercote [:njn] 2010-04-28 23:18:43 PDT
Created attachment 442329 [details] [diff] [review]
patch v2

This is just Andreas's patch with a couple of extra comments.  It looks good to me but I'm not expert on thread-local stuff so someone who knows about that should check it.

I see a 2.3x speedup on i386/mac and 1.7x on X64/mac;  not sure why there's such a discrepancy, it seems that js_dtostr is cheaper on 64-bit.  These improvements result in respective 10% and 6% improvements for V8 overall.
Comment 5 Andreas Gal :gal 2010-04-28 23:24:42 PDT
Comment on attachment 442329 [details] [diff] [review]
patch v2

>diff --git a/js/src/jscntxt.cpp b/js/src/jscntxt.cpp
>--- a/js/src/jscntxt.cpp
>+++ b/js/src/jscntxt.cpp
>@@ -175,16 +175,18 @@ JSThreadData::purge(JSContext *cx)
>     if (cx->runtime->gcRegenShapes)
>         traceMonitor.needFlush = JS_TRUE;
> #endif
> 
>     /* Destroy eval'ed scripts. */
>     js_DestroyScriptsToGC(cx, this);
> 
>     js_PurgeCachedNativeEnumerators(cx, this);
>+
>+    dtoaCache.s = NULL;
> }
> 
> void
> JSThreadData::purgeGCFreeLists()
> {
>     if (!localRootStack) {
>         gcFreeLists.purge();
>     } else {
>diff --git a/js/src/jscntxt.h b/js/src/jscntxt.h
>--- a/js/src/jscntxt.h
>+++ b/js/src/jscntxt.h
>@@ -551,16 +551,24 @@ struct JSThreadData {
> 
> #ifdef JS_EVAL_CACHE_METERING
>     JSEvalCacheMeter    evalCacheMeter;
> #endif
> 
>     /* State used by dtoa.c. */
>     DtoaState           *dtoaState;
> 
>+    /* State used to cache some double-to-string conversions.  Sometimes we
>+     * get lucky and are asked to convert the same double twice in a row. */

Comment style in the JS engine:

/*
 *
 */

>+    struct {
>+        jsdouble d;
>+        jsint    base;
>+        JSString *s;        // if s==NULL, d and base are not valid

Maybe stick to /* */ ? But // seems ok too

>+    } dtoaCache;
>+
>     /*
>      * Cache of reusable JSNativeEnumerators mapped by shape identifiers (as
>      * stored in scope->shape). This cache is nulled by the GC and protected
>      * by gcLock.
>      */
> #define NATIVE_ENUM_CACHE_LOG2  8
> #define NATIVE_ENUM_CACHE_MASK  JS_BITMASK(NATIVE_ENUM_CACHE_LOG2)
> #define NATIVE_ENUM_CACHE_SIZE  JS_BIT(NATIVE_ENUM_CACHE_LOG2)
>diff --git a/js/src/jsnum.cpp b/js/src/jsnum.cpp
>--- a/js/src/jsnum.cpp
>+++ b/js/src/jsnum.cpp
>@@ -880,22 +880,28 @@ js_NumberToStringWithBase(JSContext *cx,
>         if (base == 10 && jsuint(i) < INT_STRING_LIMIT)
>             return JSString::intString(i);
>         if (jsuint(i) < jsuint(base)) {
>             if (i < 10)
>                 return JSString::intString(i);
>             return JSString::unitString(jschar('a' + i - 10));
>         }
>     }
>+    JSThreadData *data = JS_THREAD_DATA(cx);
>+    if (data->dtoaCache.s && data->dtoaCache.base == base && data->dtoaCache.d == d)
>+        return data->dtoaCache.s;
>     numStr = NumberToCString(cx, d, base, buf, sizeof buf);
>     if (!numStr)
>         return NULL;
>     s = JS_NewStringCopyZ(cx, numStr);
>     if (!(numStr >= buf && numStr < buf + sizeof buf))
>         js_free(numStr);
>+    data->dtoaCache.base = base;
>+    data->dtoaCache.d = d;
>+    data->dtoaCache.s = s;
>     return s;
> }
> 
> JSString * JS_FASTCALL
> js_NumberToString(JSContext *cx, jsdouble d)
> {
>     return js_NumberToStringWithBase(cx, d, 10);
> }
Comment 6 Nicholas Nethercote [:njn] 2010-04-28 23:28:50 PDT
> >+    /* State used to cache some double-to-string conversions.  Sometimes we
> >+     * get lucky and are asked to convert the same double twice in a row. */
> 
> Comment style in the JS engine:
> 
> /*
>  *
>  */

I'll fix.

> 
> >+    struct {
> >+        jsdouble d;
> >+        jsint    base;
> >+        JSString *s;        // if s==NULL, d and base are not valid
> 
> Maybe stick to /* */ ? But // seems ok too

// is used elsewhere for one-liners.

Andreas on nit patrol, who would have thought? :P
Comment 7 Nicholas Nethercote [:njn] 2010-04-28 23:29:12 PDT
Created attachment 442333 [details] [diff] [review]
patch v3
Comment 8 Nicholas Nethercote [:njn] 2010-04-29 17:19:08 PDT
http://hg.mozilla.org/tracemonkey/rev/74ef6155d096
Comment 9 Nicholas Nethercote [:njn] 2010-05-02 22:33:45 PDT
Google has a bug tracking the problem with v8-splay.js:

http://code.google.com/p/v8/issues/detail?id=690
Comment 11 Alfred Kayser 2010-05-13 00:46:37 PDT
What about caching or optimizing: string : 'String for key ' + key + ' in leaf node'?
That is also called 32 times with the same key.

Note You need to log in before you can comment on or make changes to this bug.