Closed
Bug 411722
Opened 17 years ago
Closed 17 years ago
js_GetLocalNames is slow
Categories
(Core :: JavaScript Engine, enhancement)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: igor, Assigned: igor)
References
Details
(Keywords: perf)
Attachments
(2 files, 11 obsolete files)
2.64 KB,
patch
|
Details | Diff | Splinter Review | |
21.74 KB,
patch
|
igor
:
review+
|
Details | Diff | Splinter Review |
This is a spin-off of bug 411575 to focus on the slowness of js_GetLocalNames. As bug 411575 comment 0 indicates one of the cause of the slowness of js_PutCallObject is slow js_GetLocalNames.
It would be nice to speedup that function.
Assignee | ||
Updated•17 years ago
|
Assignee | ||
Comment 1•17 years ago
|
||
I assume that part of js_GetLocalNames slowness comes from re-encoding the local names array as an array of atoms with extra checks to see if the bitmap array should be constructed as well.
The patch replaces that by exposing the array directly when its length is less than MAX_ARRAY_LOCALS (==8) or allocating a temporary otherwise. The temporary is allocated using JS_malloc to speedup the common case of fewer than 8 locals.
To simplify xdr logic the code changes the format to serialize the locals in chunks of 32 names. This way there is no need to manage a separated bitmap array.
The patch bumps the XDR version since the above changes the format when XDR-ed function has more than 32 locals.
Attachment #296383 -
Flags: review?(brendan)
Comment 2•17 years ago
|
||
jst, can you re-profile with this patch applied?
/be
Comment 3•17 years ago
|
||
The patch did help, it made the time spent in js_PutCallObject drop from being about 20% of all the time spent in JS_Invoke() down to ~8%. The breakdown of what we do in js_PutCallObject() now shows that ~97% of the time is spent in js_LookupPropertyWithFlags(). It'd still be good to speed up js_PutCallObject() further, if possible, but this bug appears to be fixed with the attached patch.
Comment 4•17 years ago
|
||
Comment on attachment 296383 [details] [diff] [review]
v1
>+ constMask = (jsuword) 0;
> } else {
> JS_ASSERT(entry->localKind == JSLOCAL_VAR ||
> entry->localKind == JSLOCAL_CONST);
> JS_ASSERT(entry->index < args->fun->u.i.nvars);
> JS_ASSERT(args->nCopiedVars++ < args->fun->u.i.nvars);
> i = args->fun->nargs + entry->index;
>+ constMask = (jsuword) ((entry->localKind == JSLOCAL_CONST) ? 1 : 0);
Those (jsuword) 0 and (jsuword) (... ? 1 : 0) casts should not be necessary. Also, constFlag name seems better than constMask.
>+ /*
>+ * No need to check for overflow of the allocation size as we making a
"we are making"
>+js_ReleaseLocalNameArray(JSContext *cx, JSFunction *fun, jsuword *names)
Suggest js_PutLocalNameArray for symmetry with Get (not necessarily allocating of freeing, as Get and Release imply but shorter and antonym of Get).
>+ if (xdr->mode == JSXDR_DECODE) {
>+ localKind = i < fun->nargs
>+ ? JSLOCAL_ARG
>+ : bitmap & JS_BIT(j)
Parenthesize (bitmap & JS_BIT(j)).
>+#define JS_LOCAL_NAME_WORD_TO_ATOM(nameWord) \
>+ ((JSAtom *) ((nameWord) & ~(jsuword) 1))
>+
>+#define JS_LOCAL_NAME_WORD_TO_CONST_FLAG(nameWord) \
>+ ((((nameWord) & (jsuword) 1)) != 0)
Suggest slightly shorter names, dropping noise-word WORD (Word in macro param name: JS_LOCAL_NAME_TO_ATOM(name), etc. Also, how about JS_LOCAL_NAME_IS_CONST since the macro abstracts away the flag-bit implementation detail? Shorter and more abstract.
r=me with these nits picked, thanks.
/be
Attachment #296383 -
Flags: review?(brendan) → review+
Assignee | ||
Comment 5•17 years ago
|
||
Here is a version ready for check in with the nits addressed.
Attachment #296383 -
Attachment is obsolete: true
Attachment #298095 -
Flags: review+
Assignee | ||
Comment 6•17 years ago
|
||
Assignee | ||
Comment 7•17 years ago
|
||
Comment on attachment 298095 [details] [diff] [review]
v2
Asking for 1.9 approval as the patch improves the performace with some benchmarks (see comment 3) and shrinks the code as well.
Attachment #298095 -
Flags: approval1.9?
Updated•17 years ago
|
Flags: blocking1.9+
Assignee | ||
Comment 8•17 years ago
|
||
I checked in the patch from comment 5 to the trunk:
http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%252Fcvsroot&date=explicit&mindate=1200850800&maxdate=1200850942&who=igor%25mir2.org
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 9•17 years ago
|
||
This looks like it's responsible for a large Ts jump on all boxes. For example:
bldlnx03 855ms-->1062ms
bldxp01 1656ms-->1937ms
Assignee | ||
Comment 10•17 years ago
|
||
(In reply to comment #9)
> This looks like it's responsible for a large Ts jump on all boxes. For example:
>
> bldlnx03 855ms-->1062ms
> bldxp01 1656ms-->1937ms
I backed out the patch.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 11•17 years ago
|
||
The new version uses the arena for allocating names arrays when the number of locals exceeds MAX_ARRAY_LOCALS. I do not see anything else that could caused the regression.
jst, could you profile one more time to check that an extra arena marking/release that is done with the patch even for functions with few locals does not contribute much to call_enumerate?
Attachment #298095 -
Attachment is obsolete: true
Attachment #298096 -
Attachment is obsolete: true
Attachment #298148 -
Flags: review?(brendan)
Attachment #298095 -
Flags: approval1.9?
Assignee | ||
Comment 12•17 years ago
|
||
Assignee | ||
Comment 13•17 years ago
|
||
Here is a patch to dump to /tmp/names.txt of the number of calls to js_GetLocalNames after each GC. The output for a typical browser run looks like:
big/all get calls: 61/676 (9.02%), big temp bytes: 3K
big/all get calls: 92/1018 (9.04%), big temp bytes: 4K
big/all get calls: 159/1851 (8.59%), big temp bytes: 8K
-- browser has started into empty window
big/all get calls: 165/2207 (7.48%), big temp bytes: 8K
big/all get calls: 165/2207 (7.48%), big temp bytes: 8K
big/all get calls: 362/2421 (14.95%), big temp bytes: 16K
big/all get calls: 382/2666 (14.33%), big temp bytes: 17K
big/all get calls: 389/2747 (14.16%), big temp bytes: 17K
big/all get calls: 400/2898 (13.80%), big temp bytes: 18K
big/all get calls: 400/2901 (13.79%), big temp bytes: 18K
big/all get calls: 411/3051 (13.47%), big temp bytes: 18K
big/all get calls: 416/3179 (13.09%), big temp bytes: 19K
big/all get calls: 417/3202 (13.02%), big temp bytes: 19K
big/all get calls: 625/3455 (18.09%), big temp bytes: 27K
big/all get calls: 1039/4011 (25.90%), big temp bytes: 46K
big/all get calls: 1468/4813 (30.50%), big temp bytes: 65K
big/all get calls: 1593/5016 (31.76%), big temp bytes: 70K
big/all get calls: 2004/5454 (36.74%), big temp bytes: 86K
big/all get calls: 2185/5674 (38.51%), big temp bytes: 93K
big/all get calls: 2199/5724 (38.42%), big temp bytes: 94K
big/all get calls: 2199/5724 (38.42%), big temp bytes: 94K
big/all get calls: 2199/5724 (38.42%), big temp bytes: 94K
big/all get calls: 2199/5728 (38.39%), big temp bytes: 94K
big/all get calls: 2199/5728 (38.39%), big temp bytes: 94K
big/all get calls: 2206/5886 (37.48%), big temp bytes: 94K
big/all get calls: 2226/6040 (36.85%), big temp bytes: 95K
big/all get calls: 2233/6070 (36.79%), big temp bytes: 96K
big/all get calls: 2234/6081 (36.74%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2235/6092 (36.69%), big temp bytes: 96K
big/all get calls: 2236/6132 (36.46%), big temp bytes: 96K
big/all get calls: 2236/6132 (36.46%), big temp bytes: 96K
It is very unclear what exactly has caused the reported Ts regression given that there is just 1851 calls after the start-up with only 159 triggering the array allocation. My only theory is that not using the arena for the temporaries affected the arena allocation pattern.
In any case, as in a longer run the amount of times the array has to be allocated reaches 35% it makes sense to use the arena for the temporary array, not the malloc call.
Comment 14•17 years ago
|
||
Michal any chance you can do a differential Ts profile to see if there's anything obvious in this patch that affects Ts?
Comment 15•17 years ago
|
||
Comment on attachment 298148 [details] [diff] [review]
v3
r/a=me based on interdiff provided. Thanks,
/be
Attachment #298148 -
Flags: review?(brendan)
Attachment #298148 -
Flags: review+
Attachment #298148 -
Flags: approval1.9+
Assignee | ||
Comment 16•17 years ago
|
||
I checked in the patch from comment 13 to the trunk:
http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%252Fcvsroot&date=explicit&mindate=1200989760&maxdate=1200990180&who=igor%25mir2.org
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 17•17 years ago
|
||
I backed out the patch again due to bad Ts regression.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 18•17 years ago
|
||
(In reply to comment #14)
> Michal any chance you can do a differential Ts profile to see if there's
> anything obvious in this patch that affects Ts?
>
OK, I'll try to do it.
Comment 19•17 years ago
|
||
When patch #298148 is applied, script fastloading is broken for some reason. I'm debugging release build now, and this is really pain.
Here rv == 0x80470002 (NS_BASE_STREAM_CLOSED) :
http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/loader/mozJSComponentLoader.cpp#1123
I can't find more in release build :-/ I'll do debug build but maybe someone already know from this info, where is the problem?
It seems that this is the only reason why startup is much slower...
Assignee | ||
Comment 20•17 years ago
|
||
(In reply to comment #19)
> It seems that this is the only reason why startup is much slower...
Do you mean that fast loading does not work at all with the patch? Or it happens but slow?
Comment 21•17 years ago
|
||
http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/loader/mozJSComponentLoader.cpp#1199(In reply to comment #20)
> (In reply to comment #19)
> > It seems that this is the only reason why startup is much slower...
>
> Do you mean that fast loading does not work at all with the patch? Or it
> happens but slow?
>
http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/loader/mozJSComponentLoader.cpp#1199 is executed always, so IMO it doesn't work at all
Assignee | ||
Comment 22•17 years ago
|
||
The change in the xdr format introduced by the previous patch requires to modify the magic number. It is not sufficient to just bump the version since the code will continue to decode the script even if the version is different.
So the new version keeps the same xdr format.
Attachment #298148 -
Attachment is obsolete: true
Attachment #298149 -
Attachment is obsolete: true
Attachment #298482 -
Flags: review?(brendan)
Assignee | ||
Comment 23•17 years ago
|
||
Comment 24•17 years ago
|
||
Comment on attachment 298482 [details] [diff] [review]
v4
Sorry, should have spotted that. If this diffs well against the last version in CVS to XDR this way, great (no need to attach diff, just suggesting spot check).
/be
Attachment #298482 -
Flags: review?(brendan)
Attachment #298482 -
Flags: review+
Attachment #298482 -
Flags: approval1.9+
Assignee | ||
Comment 25•17 years ago
|
||
Since the patch has moved the code to xdr local names from fun_xdrObject to a separated XdrLocalNames function, the diff against the trunk does not show the essence of changes. So here is a direct diff between 2 parts of the code.
Assignee | ||
Comment 26•17 years ago
|
||
(In reply to comment #25)
> Created an attachment (id=298681) [details]
> diff of changes to local names xdr code
This shows immediately the problem with patch: it does not release arena's mark when js_GetLocalNameArray fails.
Assignee | ||
Comment 27•17 years ago
|
||
In the new version I fixed the error recovery and removed redundant #include "jsxdrapi.h" as that is now done at the beginning of the file. Here is a delta from v4:
--- /home/igor/m/ff/mozilla/quilt.UQ3528/js/src/jsfun.c 2008-01-23 11:04:33.000000000 +0100
+++ js/src/jsfun.c 2008-01-23 11:00:03.000000000 +0100
@@ -1162,18 +1162,16 @@ fun_convert(JSContext *cx, JSObject *obj
return JS_TRUE;
default:
return js_TryValueOf(cx, obj, type, vp);
}
}
#if JS_HAS_XDR
-#include "jsxdrapi.h"
-
static JSBool
XDRLocalNames(JSXDRState *xdr, JSFunction *fun);
/* XXX store parent and proto, if defined */
static JSBool
fun_xdrObject(JSXDRState *xdr, JSObject **objp)
{
JSContext *cx;
@@ -2580,18 +2578,20 @@ XDRLocalNames(JSXDRState *xdr, JSFunctio
* as const, not as ordinary var.
*/
bitmapLength = JS_HOWMANY(n, JS_BITS_PER_UINT32);
JS_ARENA_ALLOCATE_CAST(bitmap, uint32 *, &xdr->cx->tempPool,
bitmapLength * sizeof *bitmap);
if (xdr->mode == JSXDR_ENCODE) {
names = js_GetLocalNameArray(xdr->cx, fun, &xdr->cx->tempPool);
- if (!names)
- return JS_FALSE;
+ if (!names) {
+ ok = JS_FALSE;
+ goto out;
+ }
memset(bitmap, 0, bitmapLength * sizeof *bitmap);
for (i = 0; i != n; ++i) {
if (i < fun->nargs
? JS_LOCAL_NAME_TO_ATOM(names[i]) != NULL
: JS_LOCAL_NAME_IS_CONST(names[i])) {
bitmap[i >> JS_BITS_PER_UINT32_LOG2] |=
JS_BIT(i & (JS_BITS_PER_UINT32 - 1));
}
Attachment #298482 -
Attachment is obsolete: true
Attachment #298483 -
Attachment is obsolete: true
Attachment #298681 -
Attachment is obsolete: true
Attachment #298682 -
Flags: review?(brendan)
Assignee | ||
Comment 28•17 years ago
|
||
This is a new diff for the local name xdr code. I have used
i >> JS_BITS_PER_UINT32_LOG2
and not
i / JS_BITS_PER_UINT32
to be explicit about the nature of operations and for consistency. Using the division requires to use the remainder when accessing the rest of "i" bits to keep the same level ob abstraction. Yet when adding the local name support I wrote:
JS_BIT(i & (JS_BITS_PER_UINT32 - 1))
and not
JS_BIT(i % JS_BITS_PER_UINT32).
In addition the code calls js_FreezeLocalNames only after successful decoding of the local names.
Assignee | ||
Comment 29•17 years ago
|
||
In the previous patch I forgot to check if the result of JS_ARENA_ALLOCATE_CAST was NULL. So to minimize the risk of more regressions I made the patch as minimal as possible. In particular, it just touches the parts of fun_xdrObject that are necessary due to js_GetLocalNameArray changes.
Attachment #298682 -
Attachment is obsolete: true
Attachment #298685 -
Attachment is obsolete: true
Attachment #298708 -
Flags: review?(brendan)
Attachment #298682 -
Flags: review?(brendan)
Comment 30•17 years ago
|
||
Comment on attachment 298708 [details] [diff] [review]
v6
>+ bitmap[i / JS_BITS_PER_UINT32] |=
>+ JS_BIT(i & (JS_BITS_PER_UINT32 - 1));
So v5 used explicit reduction in strength consistently, but v6 goes back to / instead of >> -- is that intended? I prefer consistent, explicit strength reduction.
>+ constFlag = (entry->localKind == JSLOCAL_CONST) ? 1 : 0;
C guarantees 0 or 1 for == result, so no need for ?: here.
Looks good otherwise, hope it does not affect any perf tests. r=me with these changes.
/be
Attachment #298708 -
Flags: review?(brendan) → review+
Assignee | ||
Comment 31•17 years ago
|
||
The new version addresses the nits. Here is the delta from v6:
--- /home/igor/m/ff/mozilla/quilt.s12092/js/src/jsfun.c 2008-01-28 18:15:32.000000000 -0800
+++ js/src/jsfun.c 2008-01-28 18:10:12.000000000 -0800
@@ -1256,34 +1256,34 @@ fun_xdrObject(JSXDRState *xdr, JSObject
ok = JS_FALSE;
goto release_mark;
}
memset(bitmap, 0, bitmapLength * sizeof *bitmap);
for (i = 0; i != n; ++i) {
if (i < fun->nargs
? JS_LOCAL_NAME_TO_ATOM(names[i]) != NULL
: JS_LOCAL_NAME_IS_CONST(names[i])) {
- bitmap[i / JS_BITS_PER_UINT32] |=
+ bitmap[i >> JS_BITS_PER_UINT32_LOG2] |=
JS_BIT(i & (JS_BITS_PER_UINT32 - 1));
}
}
}
#ifdef __GNUC__
else {
names = NULL; /* quell GCC uninitialized warning */
}
#endif
for (i = 0; i != bitmapLength; ++i) {
ok = JS_XDRUint32(xdr, &bitmap[i]);
if (!ok)
goto release_mark;
}
for (i = 0; i != n; ++i) {
if (i < nargs &&
- !(bitmap[i / JS_BITS_PER_UINT32] &
+ !(bitmap[i >> JS_BITS_PER_UINT32_LOG2] &
JS_BIT(i & (JS_BITS_PER_UINT32 - 1)))) {
if (xdr->mode == JSXDR_DECODE) {
ok = js_AddLocal(xdr->cx, fun, NULL, JSLOCAL_ARG);
if (!ok)
goto release_mark;
} else {
JS_ASSERT(!JS_LOCAL_NAME_TO_ATOM(names[i]));
}
@@ -1292,17 +1292,17 @@ fun_xdrObject(JSXDRState *xdr, JSObject
if (xdr->mode == JSXDR_ENCODE)
name = JS_LOCAL_NAME_TO_ATOM(names[i]);
ok = js_XDRStringAtom(xdr, &name);
if (!ok)
goto release_mark;
if (xdr->mode == JSXDR_DECODE) {
localKind = (i < nargs)
? JSLOCAL_ARG
- : bitmap[i / JS_BITS_PER_UINT32] &
+ : bitmap[i >> JS_BITS_PER_UINT32_LOG2] &
JS_BIT(i & (JS_BITS_PER_UINT32 - 1))
? JSLOCAL_CONST
: JSLOCAL_VAR;
ok = js_AddLocal(xdr->cx, fun, name, localKind);
if (!ok)
goto release_mark;
}
}
@@ -2491,17 +2491,17 @@ get_local_names_enumerator(JSDHashTable
i = entry->index;
constFlag = 0;
} else {
JS_ASSERT(entry->localKind == JSLOCAL_VAR ||
entry->localKind == JSLOCAL_CONST);
JS_ASSERT(entry->index < args->fun->u.i.nvars);
JS_ASSERT(args->nCopiedVars++ < args->fun->u.i.nvars);
i = args->fun->nargs + entry->index;
- constFlag = (entry->localKind == JSLOCAL_CONST) ? 1 : 0;
+ constFlag = (entry->localKind == JSLOCAL_CONST);
}
args->names[i] = (jsuword) entry->name | constFlag;
return JS_DHASH_NEXT;
}
jsuword *
js_GetLocalNameArray(JSContext *cx, JSFunction *fun, JSArenaPool *pool)
{
Attachment #298708 -
Attachment is obsolete: true
Attachment #299904 -
Flags: review+
Assignee | ||
Comment 32•17 years ago
|
||
I checked in the patch from comment 31 to the trunk:
http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1202515224&maxdate=1202515320&who=igor%25mir2.org
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Updated•17 years ago
|
Flags: in-testsuite-
Flags: in-litmus-
You need to log in
before you can comment on or make changes to this bug.
Description
•