Closed Bug 411722 Opened 14 years ago Closed 14 years ago

js_GetLocalNames is slow

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: igor, Assigned: igor)

References

Details

(Keywords: perf)

Attachments

(2 files, 11 obsolete files)

This is a spin-off of bug 411575 to focus on the slowness of js_GetLocalNames. As bug 411575 comment 0 indicates one of the cause of the slowness of js_PutCallObject is slow js_GetLocalNames. 

It would be nice to speedup that function.
Blocks: 411575
No longer depends on: 411575
Attached patch v1 (obsolete) — Splinter Review
I assume that part of js_GetLocalNames slowness comes from re-encoding the local names array as an array of atoms with extra checks to see if the bitmap array should be constructed as well.

The patch replaces that by exposing the array directly when its length is less than MAX_ARRAY_LOCALS (==8) or allocating a temporary otherwise. The temporary is allocated using JS_malloc to speedup the common case of fewer than 8 locals.

To simplify xdr logic the code changes the format to serialize the locals in chunks of 32 names. This way there is no need to manage a separated bitmap array.

The patch bumps the XDR version since the above changes the format when XDR-ed function has more than 32 locals.
Attachment #296383 - Flags: review?(brendan)
jst, can you re-profile with this patch applied?

/be
The patch did help, it made the time spent in js_PutCallObject drop from being about 20% of all the time spent in JS_Invoke() down to ~8%. The breakdown of what we do in js_PutCallObject() now shows that ~97% of the time is spent in js_LookupPropertyWithFlags(). It'd still be good to speed up js_PutCallObject() further, if possible, but this bug appears to be fixed with the attached patch.
Comment on attachment 296383 [details] [diff] [review]
v1

>+        constMask = (jsuword) 0;
>     } else {
>         JS_ASSERT(entry->localKind == JSLOCAL_VAR ||
>                   entry->localKind == JSLOCAL_CONST);
>         JS_ASSERT(entry->index < args->fun->u.i.nvars);
>         JS_ASSERT(args->nCopiedVars++ < args->fun->u.i.nvars);
>         i = args->fun->nargs + entry->index;
>+        constMask = (jsuword) ((entry->localKind == JSLOCAL_CONST) ? 1 : 0);

Those (jsuword) 0 and (jsuword) (... ? 1 : 0) casts should not be necessary. Also, constFlag name seems better than constMask.

>+    /*
>+     * No need to check for overflow of the allocation size as we making a

"we are making"

>+js_ReleaseLocalNameArray(JSContext *cx, JSFunction *fun, jsuword *names)

Suggest js_PutLocalNameArray for symmetry with Get (not necessarily allocating of freeing, as Get and Release imply but shorter and antonym of Get).

>+            if (xdr->mode == JSXDR_DECODE) {
>+                localKind = i < fun->nargs
>+                            ? JSLOCAL_ARG
>+                            : bitmap & JS_BIT(j)

Parenthesize (bitmap & JS_BIT(j)).

>+#define JS_LOCAL_NAME_WORD_TO_ATOM(nameWord)                                  \
>+    ((JSAtom *) ((nameWord) & ~(jsuword) 1))
>+
>+#define JS_LOCAL_NAME_WORD_TO_CONST_FLAG(nameWord)                            \
>+    ((((nameWord) & (jsuword) 1)) != 0)

Suggest slightly shorter names, dropping noise-word WORD (Word in macro param name: JS_LOCAL_NAME_TO_ATOM(name), etc. Also, how about JS_LOCAL_NAME_IS_CONST since the macro abstracts away the flag-bit implementation detail? Shorter and more abstract.

r=me with these nits picked, thanks.

/be
Attachment #296383 - Flags: review?(brendan) → review+
Attached patch v2 (obsolete) — Splinter Review
Here is a version ready for check in with the nits addressed.
Attachment #296383 - Attachment is obsolete: true
Attachment #298095 - Flags: review+
Attached patch v1 - v2 delta (obsolete) — Splinter Review
Comment on attachment 298095 [details] [diff] [review]
v2

Asking for 1.9 approval as the patch improves the performace with some benchmarks (see comment 3) and shrinks the code as well.
Attachment #298095 - Flags: approval1.9?
Flags: blocking1.9+
I checked in the patch from comment 5 to the trunk:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%252Fcvsroot&date=explicit&mindate=1200850800&maxdate=1200850942&who=igor%25mir2.org
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Keywords: perf
This looks like it's responsible for a large Ts jump on all boxes. For example:

bldlnx03 855ms-->1062ms
bldxp01 1656ms-->1937ms
(In reply to comment #9)
> This looks like it's responsible for a large Ts jump on all boxes. For example:
> 
> bldlnx03 855ms-->1062ms
> bldxp01 1656ms-->1937ms

I backed out the patch.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch v3 (obsolete) — Splinter Review
The new version uses the arena for allocating names arrays when the number of locals exceeds MAX_ARRAY_LOCALS. I do not see anything else that could caused the regression.

jst, could you profile one more time to check that an extra arena marking/release that is done with the patch even for functions with few locals does not contribute much to call_enumerate?
Attachment #298095 - Attachment is obsolete: true
Attachment #298096 - Attachment is obsolete: true
Attachment #298148 - Flags: review?(brendan)
Attachment #298095 - Flags: approval1.9?
Attached patch v2 - v3 delta (obsolete) — Splinter Review
Here is a patch to dump to /tmp/names.txt of the number of calls to js_GetLocalNames after each GC. The output for a typical browser run looks like:

big/all get calls: 61/676 (9.02%), big temp bytes: 3K
big/all get calls: 92/1018 (9.04%), big temp bytes: 4K
big/all get calls: 159/1851 (8.59%), big temp bytes: 8K

-- browser has started into empty window

big/all get calls: 165/2207 (7.48%), big temp bytes: 8K
big/all get calls: 165/2207 (7.48%), big temp bytes: 8K
big/all get calls: 362/2421 (14.95%), big temp bytes: 16K
big/all get calls: 382/2666 (14.33%), big temp bytes: 17K
big/all get calls: 389/2747 (14.16%), big temp bytes: 17K
big/all get calls: 400/2898 (13.80%), big temp bytes: 18K
big/all get calls: 400/2901 (13.79%), big temp bytes: 18K
big/all get calls: 411/3051 (13.47%), big temp bytes: 18K
big/all get calls: 416/3179 (13.09%), big temp bytes: 19K
big/all get calls: 417/3202 (13.02%), big temp bytes: 19K
big/all get calls: 625/3455 (18.09%), big temp bytes: 27K
big/all get calls: 1039/4011 (25.90%), big temp bytes: 46K
big/all get calls: 1468/4813 (30.50%), big temp bytes: 65K
big/all get calls: 1593/5016 (31.76%), big temp bytes: 70K
big/all get calls: 2004/5454 (36.74%), big temp bytes: 86K
big/all get calls: 2185/5674 (38.51%), big temp bytes: 93K
big/all get calls: 2199/5724 (38.42%), big temp bytes: 94K
big/all get calls: 2199/5724 (38.42%), big temp bytes: 94K
big/all get calls: 2199/5724 (38.42%), big temp bytes: 94K
big/all get calls: 2199/5728 (38.39%), big temp bytes: 94K
big/all get calls: 2199/5728 (38.39%), big temp bytes: 94K
big/all get calls: 2206/5886 (37.48%), big temp bytes: 94K
big/all get calls: 2226/6040 (36.85%), big temp bytes: 95K
big/all get calls: 2233/6070 (36.79%), big temp bytes: 96K
big/all get calls: 2234/6081 (36.74%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2236/6132 (36.46%), big temp bytes: 96K
big/all get calls: 2236/6132 (36.46%), big temp bytes: 96K


It is very unclear what exactly has caused the reported Ts regression given that there is just 1851 calls after the start-up with only 159 triggering the array allocation. My only theory is that not using the arena for the temporaries affected the arena allocation pattern.

In any case, as in a longer run the amount of times the array has to be allocated reaches 35% it makes sense to use the arena for the temporary array, not the malloc call.
Michal any chance you can do a differential Ts profile to see if there's anything obvious in this patch that affects Ts?
Comment on attachment 298148 [details] [diff] [review]
v3

r/a=me based on interdiff provided. Thanks,

/be
Attachment #298148 - Flags: review?(brendan)
Attachment #298148 - Flags: review+
Attachment #298148 - Flags: approval1.9+
I checked in the patch from comment 13 to the trunk:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%252Fcvsroot&date=explicit&mindate=1200989760&maxdate=1200990180&who=igor%25mir2.org
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
I backed out the patch again due to bad Ts regression.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #14)
> Michal any chance you can do a differential Ts profile to see if there's
> anything obvious in this patch that affects Ts?
> 

OK, I'll try to do it.
When patch #298148 is applied, script fastloading is broken for some reason. I'm debugging release build now, and this is really pain.

Here rv == 0x80470002 (NS_BASE_STREAM_CLOSED) :
http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/loader/mozJSComponentLoader.cpp#1123
I can't find more in release build :-/ I'll do debug build but maybe someone already know from this info, where is the problem?

It seems that this is the only reason why startup is much slower...
(In reply to comment #19)
> It seems that this is the only reason why startup is much slower...

Do you mean that fast loading does not work at all with the patch? Or it happens but slow?
http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/loader/mozJSComponentLoader.cpp#1199(In reply to comment #20)
> (In reply to comment #19)
> > It seems that this is the only reason why startup is much slower...
> 
> Do you mean that fast loading does not work at all with the patch? Or it
> happens but slow?
> 

http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/loader/mozJSComponentLoader.cpp#1199 is executed always, so IMO it doesn't work at all
Attached patch v4 (obsolete) — Splinter Review
The change in the xdr format introduced by the previous patch requires to modify the magic number. It is not sufficient to just bump the version since the code will continue to decode the script even if the version is different. 

So the new version keeps the same xdr format.
Attachment #298148 - Attachment is obsolete: true
Attachment #298149 - Attachment is obsolete: true
Attachment #298482 - Flags: review?(brendan)
Attached patch v3 - v4 delta (obsolete) — Splinter Review
Comment on attachment 298482 [details] [diff] [review]
v4

Sorry, should have spotted that. If this diffs well against the last version in CVS to XDR this way, great (no need to attach diff, just suggesting spot check).

/be
Attachment #298482 - Flags: review?(brendan)
Attachment #298482 - Flags: review+
Attachment #298482 - Flags: approval1.9+
Since the patch has moved the code to xdr local names from fun_xdrObject to a separated XdrLocalNames function, the diff against the trunk does not show the essence of changes. So here is a direct diff between 2 parts of the code.
(In reply to comment #25)
> Created an attachment (id=298681) [details]
> diff of changes to local names xdr code

This shows immediately the problem with patch: it does not release arena's mark when js_GetLocalNameArray fails.
Attached patch v5 (obsolete) — Splinter Review
In the new version I fixed the error recovery and removed redundant #include "jsxdrapi.h" as that is now done at the beginning of the file. Here is a delta from v4:

--- /home/igor/m/ff/mozilla/quilt.UQ3528/js/src/jsfun.c	2008-01-23 11:04:33.000000000 +0100
+++ js/src/jsfun.c	2008-01-23 11:00:03.000000000 +0100
@@ -1162,18 +1162,16 @@ fun_convert(JSContext *cx, JSObject *obj
         return JS_TRUE;
       default:
         return js_TryValueOf(cx, obj, type, vp);
     }
 }
 
 #if JS_HAS_XDR
 
-#include "jsxdrapi.h"
-
 static JSBool
 XDRLocalNames(JSXDRState *xdr, JSFunction *fun);
 
 /* XXX store parent and proto, if defined */
 static JSBool
 fun_xdrObject(JSXDRState *xdr, JSObject **objp)
 {
     JSContext *cx;
@@ -2580,18 +2578,20 @@ XDRLocalNames(JSXDRState *xdr, JSFunctio
      * as const, not as ordinary var.
      */
     bitmapLength = JS_HOWMANY(n, JS_BITS_PER_UINT32);
     JS_ARENA_ALLOCATE_CAST(bitmap, uint32 *, &xdr->cx->tempPool,
                            bitmapLength * sizeof *bitmap);
 
     if (xdr->mode == JSXDR_ENCODE) {
         names = js_GetLocalNameArray(xdr->cx, fun, &xdr->cx->tempPool);
-        if (!names)
-            return JS_FALSE;
+        if (!names) {
+            ok = JS_FALSE;
+            goto out;
+        }
         memset(bitmap, 0, bitmapLength * sizeof *bitmap);
         for (i = 0; i != n; ++i) {
             if (i < fun->nargs
                 ? JS_LOCAL_NAME_TO_ATOM(names[i]) != NULL
                 : JS_LOCAL_NAME_IS_CONST(names[i])) {
                 bitmap[i >> JS_BITS_PER_UINT32_LOG2] |=
                     JS_BIT(i & (JS_BITS_PER_UINT32 - 1));
             }
Attachment #298482 - Attachment is obsolete: true
Attachment #298483 - Attachment is obsolete: true
Attachment #298681 - Attachment is obsolete: true
Attachment #298682 - Flags: review?(brendan)
This is a new diff for the local name xdr code. I have used 
  i >> JS_BITS_PER_UINT32_LOG2
and not
  i / JS_BITS_PER_UINT32
to be explicit about the nature of operations and for consistency. Using the division requires to use the remainder when accessing the rest of "i" bits to keep the same level ob abstraction. Yet when adding the local name support I wrote:
  JS_BIT(i & (JS_BITS_PER_UINT32 - 1))
and not 
  JS_BIT(i % JS_BITS_PER_UINT32).

In addition the code calls js_FreezeLocalNames only after successful decoding of the local names.
Attached patch v6 (obsolete) — Splinter Review
In the previous patch I forgot to check if the result of JS_ARENA_ALLOCATE_CAST was NULL. So to minimize the risk of more regressions I made the patch as minimal as possible. In particular, it just touches the parts of fun_xdrObject that are necessary due to js_GetLocalNameArray changes.
Attachment #298682 - Attachment is obsolete: true
Attachment #298685 - Attachment is obsolete: true
Attachment #298708 - Flags: review?(brendan)
Attachment #298682 - Flags: review?(brendan)
Comment on attachment 298708 [details] [diff] [review]
v6

>+                    bitmap[i / JS_BITS_PER_UINT32] |=
>+                        JS_BIT(i & (JS_BITS_PER_UINT32 - 1));

So v5 used explicit reduction in strength consistently, but v6 goes back to / instead of >> -- is that intended? I prefer consistent, explicit strength reduction.

>+        constFlag = (entry->localKind == JSLOCAL_CONST) ? 1 : 0;

C guarantees 0 or 1 for == result, so no need for ?: here.

Looks good otherwise, hope it does not affect any perf tests. r=me with these changes.

/be
Attachment #298708 - Flags: review?(brendan) → review+
Attached patch v7Splinter Review
The new version addresses the nits. Here is the delta from v6:

--- /home/igor/m/ff/mozilla/quilt.s12092/js/src/jsfun.c	2008-01-28 18:15:32.000000000 -0800
+++ js/src/jsfun.c	2008-01-28 18:10:12.000000000 -0800
@@ -1256,34 +1256,34 @@ fun_xdrObject(JSXDRState *xdr, JSObject 
                 ok = JS_FALSE;
                 goto release_mark;
             }
             memset(bitmap, 0, bitmapLength * sizeof *bitmap);
             for (i = 0; i != n; ++i) {
                 if (i < fun->nargs
                     ? JS_LOCAL_NAME_TO_ATOM(names[i]) != NULL
                     : JS_LOCAL_NAME_IS_CONST(names[i])) {
-                    bitmap[i / JS_BITS_PER_UINT32] |=
+                    bitmap[i >> JS_BITS_PER_UINT32_LOG2] |=
                         JS_BIT(i & (JS_BITS_PER_UINT32 - 1));
                 }
             }
         }
 #ifdef __GNUC__
         else {
             names = NULL;   /* quell GCC uninitialized warning */
         }
 #endif
         for (i = 0; i != bitmapLength; ++i) {
             ok = JS_XDRUint32(xdr, &bitmap[i]);
             if (!ok)
                 goto release_mark;
         }
         for (i = 0; i != n; ++i) {
             if (i < nargs &&
-                !(bitmap[i / JS_BITS_PER_UINT32] &
+                !(bitmap[i >> JS_BITS_PER_UINT32_LOG2] &
                   JS_BIT(i & (JS_BITS_PER_UINT32 - 1)))) {
                 if (xdr->mode == JSXDR_DECODE) {
                     ok = js_AddLocal(xdr->cx, fun, NULL, JSLOCAL_ARG);
                     if (!ok)
                         goto release_mark;
                 } else {
                     JS_ASSERT(!JS_LOCAL_NAME_TO_ATOM(names[i]));
                 }
@@ -1292,17 +1292,17 @@ fun_xdrObject(JSXDRState *xdr, JSObject 
             if (xdr->mode == JSXDR_ENCODE)
                 name = JS_LOCAL_NAME_TO_ATOM(names[i]);
             ok = js_XDRStringAtom(xdr, &name);
             if (!ok)
                 goto release_mark;
             if (xdr->mode == JSXDR_DECODE) {
                 localKind = (i < nargs)
                             ? JSLOCAL_ARG
-                            : bitmap[i / JS_BITS_PER_UINT32] &
+                            : bitmap[i >> JS_BITS_PER_UINT32_LOG2] &
                               JS_BIT(i & (JS_BITS_PER_UINT32 - 1))
                             ? JSLOCAL_CONST
                             : JSLOCAL_VAR;
                 ok = js_AddLocal(xdr->cx, fun, name, localKind);
                 if (!ok)
                     goto release_mark;
             }
         }
@@ -2491,17 +2491,17 @@ get_local_names_enumerator(JSDHashTable 
         i = entry->index;
         constFlag = 0;
     } else {
         JS_ASSERT(entry->localKind == JSLOCAL_VAR ||
                   entry->localKind == JSLOCAL_CONST);
         JS_ASSERT(entry->index < args->fun->u.i.nvars);
         JS_ASSERT(args->nCopiedVars++ < args->fun->u.i.nvars);
         i = args->fun->nargs + entry->index;
-        constFlag = (entry->localKind == JSLOCAL_CONST) ? 1 : 0;
+        constFlag = (entry->localKind == JSLOCAL_CONST);
     }
     args->names[i] = (jsuword) entry->name | constFlag;
     return JS_DHASH_NEXT;
 }
 
 jsuword *
 js_GetLocalNameArray(JSContext *cx, JSFunction *fun, JSArenaPool *pool)
 {
Attachment #298708 - Attachment is obsolete: true
Attachment #299904 - Flags: review+
I checked in the patch from comment 31 to the trunk:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1202515224&maxdate=1202515320&who=igor%25mir2.org
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Flags: in-testsuite-
Flags: in-litmus-
You need to log in before you can comment on or make changes to this bug.