Status

()

Core
JavaScript Engine
--
enhancement
RESOLVED FIXED
10 years ago
10 years ago

People

(Reporter: Igor Bukanov, Assigned: Igor Bukanov)

Tracking

({perf})

Trunk
Points:
---
Bug Flags:
blocking1.9 +
in-testsuite -
in-litmus -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 11 obsolete attachments)

(Assignee)

Description

10 years ago
This is a spin-off of bug 411575 to focus on the slowness of js_GetLocalNames. As bug 411575 comment 0 indicates one of the cause of the slowness of js_PutCallObject is slow js_GetLocalNames. 

It would be nice to speedup that function.
(Assignee)

Updated

10 years ago
Blocks: 411575
No longer depends on: 411575
(Assignee)

Comment 1

10 years ago
Created attachment 296383 [details] [diff] [review]
v1

I assume that part of js_GetLocalNames slowness comes from re-encoding the local names array as an array of atoms with extra checks to see if the bitmap array should be constructed as well.

The patch replaces that by exposing the array directly when its length is less than MAX_ARRAY_LOCALS (==8) or allocating a temporary otherwise. The temporary is allocated using JS_malloc to speedup the common case of fewer than 8 locals.

To simplify xdr logic the code changes the format to serialize the locals in chunks of 32 names. This way there is no need to manage a separated bitmap array.

The patch bumps the XDR version since the above changes the format when XDR-ed function has more than 32 locals.
Attachment #296383 - Flags: review?(brendan)
jst, can you re-profile with this patch applied?

/be
The patch did help, it made the time spent in js_PutCallObject drop from being about 20% of all the time spent in JS_Invoke() down to ~8%. The breakdown of what we do in js_PutCallObject() now shows that ~97% of the time is spent in js_LookupPropertyWithFlags(). It'd still be good to speed up js_PutCallObject() further, if possible, but this bug appears to be fixed with the attached patch.
Comment on attachment 296383 [details] [diff] [review]
v1

>+        constMask = (jsuword) 0;
>     } else {
>         JS_ASSERT(entry->localKind == JSLOCAL_VAR ||
>                   entry->localKind == JSLOCAL_CONST);
>         JS_ASSERT(entry->index < args->fun->u.i.nvars);
>         JS_ASSERT(args->nCopiedVars++ < args->fun->u.i.nvars);
>         i = args->fun->nargs + entry->index;
>+        constMask = (jsuword) ((entry->localKind == JSLOCAL_CONST) ? 1 : 0);

Those (jsuword) 0 and (jsuword) (... ? 1 : 0) casts should not be necessary. Also, constFlag name seems better than constMask.

>+    /*
>+     * No need to check for overflow of the allocation size as we making a

"we are making"

>+js_ReleaseLocalNameArray(JSContext *cx, JSFunction *fun, jsuword *names)

Suggest js_PutLocalNameArray for symmetry with Get (not necessarily allocating of freeing, as Get and Release imply but shorter and antonym of Get).

>+            if (xdr->mode == JSXDR_DECODE) {
>+                localKind = i < fun->nargs
>+                            ? JSLOCAL_ARG
>+                            : bitmap & JS_BIT(j)

Parenthesize (bitmap & JS_BIT(j)).

>+#define JS_LOCAL_NAME_WORD_TO_ATOM(nameWord)                                  \
>+    ((JSAtom *) ((nameWord) & ~(jsuword) 1))
>+
>+#define JS_LOCAL_NAME_WORD_TO_CONST_FLAG(nameWord)                            \
>+    ((((nameWord) & (jsuword) 1)) != 0)

Suggest slightly shorter names, dropping noise-word WORD (Word in macro param name: JS_LOCAL_NAME_TO_ATOM(name), etc. Also, how about JS_LOCAL_NAME_IS_CONST since the macro abstracts away the flag-bit implementation detail? Shorter and more abstract.

r=me with these nits picked, thanks.

/be
Attachment #296383 - Flags: review?(brendan) → review+
(Assignee)

Comment 5

10 years ago
Created attachment 298095 [details] [diff] [review]
v2

Here is a version ready for check in with the nits addressed.
Attachment #296383 - Attachment is obsolete: true
Attachment #298095 - Flags: review+
(Assignee)

Comment 6

10 years ago
Created attachment 298096 [details] [diff] [review]
v1 - v2 delta
(Assignee)

Comment 7

10 years ago
Comment on attachment 298095 [details] [diff] [review]
v2

Asking for 1.9 approval as the patch improves the performace with some benchmarks (see comment 3) and shrinks the code as well.
Attachment #298095 - Flags: approval1.9?

Updated

10 years ago
Flags: blocking1.9+
(Assignee)

Comment 8

10 years ago
I checked in the patch from comment 5 to the trunk:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%252Fcvsroot&date=explicit&mindate=1200850800&maxdate=1200850942&who=igor%25mir2.org
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED

Updated

10 years ago
Keywords: perf
This looks like it's responsible for a large Ts jump on all boxes. For example:

bldlnx03 855ms-->1062ms
bldxp01 1656ms-->1937ms
(Assignee)

Comment 10

10 years ago
(In reply to comment #9)
> This looks like it's responsible for a large Ts jump on all boxes. For example:
> 
> bldlnx03 855ms-->1062ms
> bldxp01 1656ms-->1937ms

I backed out the patch.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 11

10 years ago
Created attachment 298148 [details] [diff] [review]
v3

The new version uses the arena for allocating names arrays when the number of locals exceeds MAX_ARRAY_LOCALS. I do not see anything else that could caused the regression.

jst, could you profile one more time to check that an extra arena marking/release that is done with the patch even for functions with few locals does not contribute much to call_enumerate?
Attachment #298095 - Attachment is obsolete: true
Attachment #298096 - Attachment is obsolete: true
Attachment #298148 - Flags: review?(brendan)
Attachment #298095 - Flags: approval1.9?
(Assignee)

Comment 12

10 years ago
Created attachment 298149 [details] [diff] [review]
v2 - v3 delta
(Assignee)

Comment 13

10 years ago
Created attachment 298271 [details] [diff] [review]
measuring the number of js_GetLocalNameArray

Here is a patch to dump to /tmp/names.txt of the number of calls to js_GetLocalNames after each GC. The output for a typical browser run looks like:

big/all get calls: 61/676 (9.02%), big temp bytes: 3K
big/all get calls: 92/1018 (9.04%), big temp bytes: 4K
big/all get calls: 159/1851 (8.59%), big temp bytes: 8K

-- browser has started into empty window

big/all get calls: 165/2207 (7.48%), big temp bytes: 8K
big/all get calls: 165/2207 (7.48%), big temp bytes: 8K
big/all get calls: 362/2421 (14.95%), big temp bytes: 16K
big/all get calls: 382/2666 (14.33%), big temp bytes: 17K
big/all get calls: 389/2747 (14.16%), big temp bytes: 17K
big/all get calls: 400/2898 (13.80%), big temp bytes: 18K
big/all get calls: 400/2901 (13.79%), big temp bytes: 18K
big/all get calls: 411/3051 (13.47%), big temp bytes: 18K
big/all get calls: 416/3179 (13.09%), big temp bytes: 19K
big/all get calls: 417/3202 (13.02%), big temp bytes: 19K
big/all get calls: 625/3455 (18.09%), big temp bytes: 27K
big/all get calls: 1039/4011 (25.90%), big temp bytes: 46K
big/all get calls: 1468/4813 (30.50%), big temp bytes: 65K
big/all get calls: 1593/5016 (31.76%), big temp bytes: 70K
big/all get calls: 2004/5454 (36.74%), big temp bytes: 86K
big/all get calls: 2185/5674 (38.51%), big temp bytes: 93K
big/all get calls: 2199/5724 (38.42%), big temp bytes: 94K
big/all get calls: 2199/5724 (38.42%), big temp bytes: 94K
big/all get calls: 2199/5724 (38.42%), big temp bytes: 94K
big/all get calls: 2199/5728 (38.39%), big temp bytes: 94K
big/all get calls: 2199/5728 (38.39%), big temp bytes: 94K
big/all get calls: 2206/5886 (37.48%), big temp bytes: 94K
big/all get calls: 2226/6040 (36.85%), big temp bytes: 95K
big/all get calls: 2233/6070 (36.79%), big temp bytes: 96K
big/all get calls: 2234/6081 (36.74%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2236/6132 (36.46%), big temp bytes: 96K
big/all get calls: 2236/6132 (36.46%), big temp bytes: 96K


It is very unclear what exactly has caused the reported Ts regression given that there is just 1851 calls after the start-up with only 159 triggering the array allocation. My only theory is that not using the arena for the temporaries affected the arena allocation pattern.

In any case, as in a longer run the amount of times the array has to be allocated reaches 35% it makes sense to use the arena for the temporary array, not the malloc call.

Comment 14

10 years ago
Michal any chance you can do a differential Ts profile to see if there's anything obvious in this patch that affects Ts?
Comment on attachment 298148 [details] [diff] [review]
v3

r/a=me based on interdiff provided. Thanks,

/be
Attachment #298148 - Flags: review?(brendan)
Attachment #298148 - Flags: review+
Attachment #298148 - Flags: approval1.9+
(Assignee)

Comment 16

10 years ago
I checked in the patch from comment 13 to the trunk:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%252Fcvsroot&date=explicit&mindate=1200989760&maxdate=1200990180&who=igor%25mir2.org
Status: REOPENED → RESOLVED
Last Resolved: 10 years ago10 years ago
Resolution: --- → FIXED
(Assignee)

Comment 17

10 years ago
I backed out the patch again due to bad Ts regression.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to comment #14)
> Michal any chance you can do a differential Ts profile to see if there's
> anything obvious in this patch that affects Ts?
> 

OK, I'll try to do it.
When patch #298148 is applied, script fastloading is broken for some reason. I'm debugging release build now, and this is really pain.

Here rv == 0x80470002 (NS_BASE_STREAM_CLOSED) :
http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/loader/mozJSComponentLoader.cpp#1123
I can't find more in release build :-/ I'll do debug build but maybe someone already know from this info, where is the problem?

It seems that this is the only reason why startup is much slower...
(Assignee)

Comment 20

10 years ago
(In reply to comment #19)
> It seems that this is the only reason why startup is much slower...

Do you mean that fast loading does not work at all with the patch? Or it happens but slow?
http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/loader/mozJSComponentLoader.cpp#1199(In reply to comment #20)
> (In reply to comment #19)
> > It seems that this is the only reason why startup is much slower...
> 
> Do you mean that fast loading does not work at all with the patch? Or it
> happens but slow?
> 

http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/loader/mozJSComponentLoader.cpp#1199 is executed always, so IMO it doesn't work at all
(Assignee)

Comment 22

10 years ago
Created attachment 298482 [details] [diff] [review]
v4

The change in the xdr format introduced by the previous patch requires to modify the magic number. It is not sufficient to just bump the version since the code will continue to decode the script even if the version is different. 

So the new version keeps the same xdr format.
Attachment #298148 - Attachment is obsolete: true
Attachment #298149 - Attachment is obsolete: true
Attachment #298482 - Flags: review?(brendan)
(Assignee)

Comment 23

10 years ago
Created attachment 298483 [details] [diff] [review]
v3 - v4 delta
Comment on attachment 298482 [details] [diff] [review]
v4

Sorry, should have spotted that. If this diffs well against the last version in CVS to XDR this way, great (no need to attach diff, just suggesting spot check).

/be
Attachment #298482 - Flags: review?(brendan)
Attachment #298482 - Flags: review+
Attachment #298482 - Flags: approval1.9+
(Assignee)

Comment 25

10 years ago
Created attachment 298681 [details] [diff] [review]
diff of changes to local names xdr code

Since the patch has moved the code to xdr local names from fun_xdrObject to a separated XdrLocalNames function, the diff against the trunk does not show the essence of changes. So here is a direct diff between 2 parts of the code.
(Assignee)

Comment 26

10 years ago
(In reply to comment #25)
> Created an attachment (id=298681) [details]
> diff of changes to local names xdr code

This shows immediately the problem with patch: it does not release arena's mark when js_GetLocalNameArray fails.
(Assignee)

Comment 27

10 years ago
Created attachment 298682 [details] [diff] [review]
v5

In the new version I fixed the error recovery and removed redundant #include "jsxdrapi.h" as that is now done at the beginning of the file. Here is a delta from v4:

--- /home/igor/m/ff/mozilla/quilt.UQ3528/js/src/jsfun.c	2008-01-23 11:04:33.000000000 +0100
+++ js/src/jsfun.c	2008-01-23 11:00:03.000000000 +0100
@@ -1162,18 +1162,16 @@ fun_convert(JSContext *cx, JSObject *obj
         return JS_TRUE;
       default:
         return js_TryValueOf(cx, obj, type, vp);
     }
 }
 
 #if JS_HAS_XDR
 
-#include "jsxdrapi.h"
-
 static JSBool
 XDRLocalNames(JSXDRState *xdr, JSFunction *fun);
 
 /* XXX store parent and proto, if defined */
 static JSBool
 fun_xdrObject(JSXDRState *xdr, JSObject **objp)
 {
     JSContext *cx;
@@ -2580,18 +2578,20 @@ XDRLocalNames(JSXDRState *xdr, JSFunctio
      * as const, not as ordinary var.
      */
     bitmapLength = JS_HOWMANY(n, JS_BITS_PER_UINT32);
     JS_ARENA_ALLOCATE_CAST(bitmap, uint32 *, &xdr->cx->tempPool,
                            bitmapLength * sizeof *bitmap);
 
     if (xdr->mode == JSXDR_ENCODE) {
         names = js_GetLocalNameArray(xdr->cx, fun, &xdr->cx->tempPool);
-        if (!names)
-            return JS_FALSE;
+        if (!names) {
+            ok = JS_FALSE;
+            goto out;
+        }
         memset(bitmap, 0, bitmapLength * sizeof *bitmap);
         for (i = 0; i != n; ++i) {
             if (i < fun->nargs
                 ? JS_LOCAL_NAME_TO_ATOM(names[i]) != NULL
                 : JS_LOCAL_NAME_IS_CONST(names[i])) {
                 bitmap[i >> JS_BITS_PER_UINT32_LOG2] |=
                     JS_BIT(i & (JS_BITS_PER_UINT32 - 1));
             }
Attachment #298482 - Attachment is obsolete: true
Attachment #298483 - Attachment is obsolete: true
Attachment #298681 - Attachment is obsolete: true
Attachment #298682 - Flags: review?(brendan)
(Assignee)

Comment 28

10 years ago
Created attachment 298685 [details] [diff] [review]
diff of changes to local names xdr code

This is a new diff for the local name xdr code. I have used 
  i >> JS_BITS_PER_UINT32_LOG2
and not
  i / JS_BITS_PER_UINT32
to be explicit about the nature of operations and for consistency. Using the division requires to use the remainder when accessing the rest of "i" bits to keep the same level ob abstraction. Yet when adding the local name support I wrote:
  JS_BIT(i & (JS_BITS_PER_UINT32 - 1))
and not 
  JS_BIT(i % JS_BITS_PER_UINT32).

In addition the code calls js_FreezeLocalNames only after successful decoding of the local names.
(Assignee)

Comment 29

10 years ago
Created attachment 298708 [details] [diff] [review]
v6

In the previous patch I forgot to check if the result of JS_ARENA_ALLOCATE_CAST was NULL. So to minimize the risk of more regressions I made the patch as minimal as possible. In particular, it just touches the parts of fun_xdrObject that are necessary due to js_GetLocalNameArray changes.
Attachment #298682 - Attachment is obsolete: true
Attachment #298685 - Attachment is obsolete: true
Attachment #298708 - Flags: review?(brendan)
Attachment #298682 - Flags: review?(brendan)
Comment on attachment 298708 [details] [diff] [review]
v6

>+                    bitmap[i / JS_BITS_PER_UINT32] |=
>+                        JS_BIT(i & (JS_BITS_PER_UINT32 - 1));

So v5 used explicit reduction in strength consistently, but v6 goes back to / instead of >> -- is that intended? I prefer consistent, explicit strength reduction.

>+        constFlag = (entry->localKind == JSLOCAL_CONST) ? 1 : 0;

C guarantees 0 or 1 for == result, so no need for ?: here.

Looks good otherwise, hope it does not affect any perf tests. r=me with these changes.

/be
Attachment #298708 - Flags: review?(brendan) → review+
(Assignee)

Comment 31

10 years ago
Created attachment 299904 [details] [diff] [review]
v7

The new version addresses the nits. Here is the delta from v6:

--- /home/igor/m/ff/mozilla/quilt.s12092/js/src/jsfun.c	2008-01-28 18:15:32.000000000 -0800
+++ js/src/jsfun.c	2008-01-28 18:10:12.000000000 -0800
@@ -1256,34 +1256,34 @@ fun_xdrObject(JSXDRState *xdr, JSObject 
                 ok = JS_FALSE;
                 goto release_mark;
             }
             memset(bitmap, 0, bitmapLength * sizeof *bitmap);
             for (i = 0; i != n; ++i) {
                 if (i < fun->nargs
                     ? JS_LOCAL_NAME_TO_ATOM(names[i]) != NULL
                     : JS_LOCAL_NAME_IS_CONST(names[i])) {
-                    bitmap[i / JS_BITS_PER_UINT32] |=
+                    bitmap[i >> JS_BITS_PER_UINT32_LOG2] |=
                         JS_BIT(i & (JS_BITS_PER_UINT32 - 1));
                 }
             }
         }
 #ifdef __GNUC__
         else {
             names = NULL;   /* quell GCC uninitialized warning */
         }
 #endif
         for (i = 0; i != bitmapLength; ++i) {
             ok = JS_XDRUint32(xdr, &bitmap[i]);
             if (!ok)
                 goto release_mark;
         }
         for (i = 0; i != n; ++i) {
             if (i < nargs &&
-                !(bitmap[i / JS_BITS_PER_UINT32] &
+                !(bitmap[i >> JS_BITS_PER_UINT32_LOG2] &
                   JS_BIT(i & (JS_BITS_PER_UINT32 - 1)))) {
                 if (xdr->mode == JSXDR_DECODE) {
                     ok = js_AddLocal(xdr->cx, fun, NULL, JSLOCAL_ARG);
                     if (!ok)
                         goto release_mark;
                 } else {
                     JS_ASSERT(!JS_LOCAL_NAME_TO_ATOM(names[i]));
                 }
@@ -1292,17 +1292,17 @@ fun_xdrObject(JSXDRState *xdr, JSObject 
             if (xdr->mode == JSXDR_ENCODE)
                 name = JS_LOCAL_NAME_TO_ATOM(names[i]);
             ok = js_XDRStringAtom(xdr, &name);
             if (!ok)
                 goto release_mark;
             if (xdr->mode == JSXDR_DECODE) {
                 localKind = (i < nargs)
                             ? JSLOCAL_ARG
-                            : bitmap[i / JS_BITS_PER_UINT32] &
+                            : bitmap[i >> JS_BITS_PER_UINT32_LOG2] &
                               JS_BIT(i & (JS_BITS_PER_UINT32 - 1))
                             ? JSLOCAL_CONST
                             : JSLOCAL_VAR;
                 ok = js_AddLocal(xdr->cx, fun, name, localKind);
                 if (!ok)
                     goto release_mark;
             }
         }
@@ -2491,17 +2491,17 @@ get_local_names_enumerator(JSDHashTable 
         i = entry->index;
         constFlag = 0;
     } else {
         JS_ASSERT(entry->localKind == JSLOCAL_VAR ||
                   entry->localKind == JSLOCAL_CONST);
         JS_ASSERT(entry->index < args->fun->u.i.nvars);
         JS_ASSERT(args->nCopiedVars++ < args->fun->u.i.nvars);
         i = args->fun->nargs + entry->index;
-        constFlag = (entry->localKind == JSLOCAL_CONST) ? 1 : 0;
+        constFlag = (entry->localKind == JSLOCAL_CONST);
     }
     args->names[i] = (jsuword) entry->name | constFlag;
     return JS_DHASH_NEXT;
 }
 
 jsuword *
 js_GetLocalNameArray(JSContext *cx, JSFunction *fun, JSArenaPool *pool)
 {
Attachment #298708 - Attachment is obsolete: true
Attachment #299904 - Flags: review+
(Assignee)

Comment 32

10 years ago
I checked in the patch from comment 31 to the trunk:

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1202515224&maxdate=1202515320&who=igor%25mir2.org
Status: REOPENED → RESOLVED
Last Resolved: 10 years ago10 years ago
Resolution: --- → FIXED

Updated

10 years ago
Flags: in-testsuite-
Flags: in-litmus-
You need to log in before you can comment on or make changes to this bug.