Closed Bug 554626 Opened 15 years ago Closed 15 years ago

Fix for bug 497789 causes big increase in L2 misses

Categories

(Core :: JavaScript Engine, defect, P1)

Other Branch
defect

Tracking

()

RESOLVED FIXED
mozilla1.9.3a4

People

(Reporter: jseward, Assigned: brendan)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 3 obsolete files)

Bug 497789 contains a fix which majorly improves TM performance on v8-deltablue, but causes slowdowns of 17% on v8-raytrace and 6% on v8-earley-boyer. Initial investigations on the lossage for v8-raytrace indicate this is mostly IPC lossage due to a doubling of the L2 miss rate. For initial analysis and a bunch of background numbers see the following: https://bugzilla.mozilla.org/show_bug.cgi?id=497789#c105 https://bugzilla.mozilla.org/show_bug.cgi?id=497789#c106 https://bugzilla.mozilla.org/show_bug.cgi?id=497789#c107 https://bugzilla.mozilla.org/show_bug.cgi?id=497789#c108 https://bugzilla.mozilla.org/show_bug.cgi?id=497789#c109 From these it is apparent that the fix not only doubles the L2 miss rate, it also nearly doubles the heap space requirement for running v8-raytrace. The attached file contains before/after stats for L2 miss rates broken down on a per-function basis. From this it would appear that the increase is due to maintenance/use of a hash table that has something to do with the property tree.
(In reply to comment #0) > > From these it is apparent that the fix not only doubles the L2 miss > rate, it also nearly doubles the heap space requirement for running > v8-raytrace. That sounds worrisome. Do we understand why it happened?
hmm, I see the patch in bug 497789 uses JSDHashTable. js::HashTable outperforms JSDHashTable in microbenchmarks, so maybe it will help here. Probably won't do much about the heap space increase, though.
I'm looking into this now. It seems many of the misses arise from js::PropertyTree::getChild calling JS_DHashTableOperate calling ChangeTable.
If most of the overhead is in ChangeTable, then the problem would seem to be that the hash table is growing a lot then shrinking a lot over and over. A simple fix to reduce the frequency of resizes is to use JS_DHashTableSetAlphaBounds to set a higher maxAlpha and lower minAlpha.
Julian, could you post the stack trace that leads to getChild.
The stack won't tell us enough. I'm looking into this. Shape generation number is too high at end of benchmark, and combined with Julian's data this suggests too many empty scope shapes. More when I know more. Luke: the table is not shrinking except during GC and there's no GC run during v8-raytrace.js's execution. There's a followup bug to use jshashtable.h instead of jsdhash.h but it won't help here. /be
Assignee: general → brendan
Status: NEW → ASSIGNED
OS: Mac OS X → All
Priority: -- → P1
Hardware: x86 → All
Target Milestone: --- → mozilla1.9.3a4
Oh, right, I forgot, ChangeTable is also called when the table size stays the same but the number of tombstones grows too high.
(In reply to comment #5) Err, being very confused by Callgrind here. But anyway, what I think I have is this, the numbers being the sum (L2 rd misses + L2 wr misses): 1498232 js::PropertyTree::getChild The trail leading to here goes directly back like this: 1498232 JSScope::getChildProperty 1498488 JSScope::addPropertyHelper 1498787 JSScope::putProperty 1498636 js_DefineNativeProperty 1498587 js_DefineProperty 1498289 args_resolve 1499144 js_LookupPropertyWithFlags <cycle 1> Then splits, there are 5 callers to js_LookupPropertyWithFlags 78039 js_Interpret 134321 js_Arguments 218606 JS_GetElement 221164 js_GetArgsValue 835872 js_GetLengthProperty chasing caller of js_GetLengthProperty: 835872 js_IsArrayLike in turn called by 792170 js_fun_apply then we're back at js_Interpret. Does that make any sense?
Interesting. Looks like this is the arguments object resolve hook constantly defining properties.
From what I can see, we touch arguments objects a lot here. Every time we read an indexed property of the arguments object (0, 1, 2, ...) we run the resolve hook and create a property on it on the fly. That seems to be slow. I will try to find out whether this is properly shared in the prop tree, or not even that. In general we should consider a denser (dense array-like) representation of arguments objects (I think).
Comment 10 is right on the money about how we badly deoptimize arguments.length and arguments[i] accesses, the latter in particular from js_fun_apply. I am fixing that. But the shape proliferation is due to another bug: arguments objects each get their own scope on creation, with its own shape, instead of sharing an emptyScope owned by their proto. This is because all arguments objects within a given scope-context (scope chain ending in a given global object) have Object.prototype as their proto, and since the class of the newborn differs from the class of its proto, Object.prototype cannot provide a shared emptyScope for each new arguments object to share. This is important to fix too. Patch soon. /be
Attached patch proposed patch (obsolete) — Splinter Review
This looks like it'll be a huge perf win. /be
Attachment #435537 - Flags: review?(igor)
Julian, thanks again for all the help. /be
Comment on attachment 435537 [details] [diff] [review] proposed patch From a quick look so far: > static JSObject * > NewArguments(JSContext *cx, JSObject *parent, uint32 argc, JSObject *callee) > { >- JSObject *argsobj = js_NewObject(cx, &js_ArgumentsClass, NULL, parent, 0); >- if (!argsobj || !js_EnsureReservedSlots(cx, argsobj, argc)) >+ JSObject *argsobj = js_NewGCObject(cx); >+ if (!argsobj) >+ return NULL; >+ >+ argsobj->map = cx->runtime->emptyArgumentsScope; >+ cx->runtime->emptyArgumentsScope->hold(); Is this hold call truly necessary since the empty scope persists over runtime lifetime? >+ >+ JSObject *proto; >+ if (!js_GetClassPrototype(cx, parent, JSProto_Object, &proto)) >+ return NULL; >+ argsobj->init(&js_ArgumentsClass, proto, parent, JSVAL_NULL); The init call must always follow the js_NewGCObject call not to expose uninitialized object to the GC.
Attachment #435537 - Flags: review?(igor) → review-
Comment on attachment 435537 [details] [diff] [review] proposed patch More comments: >@@ -2094,21 +2095,39 @@ js_fun_apply(JSContext *cx, uintN argc, ... >+ if (aobj && aobj->isArguments()) { >+ JSStackFrame *fp = (JSStackFrame *) aobj->getPrivate(); >+ if (fp) { >+ memcpy(sp, fp->argv, argc * sizeof(jsval)); >+ for (i = 0; i < argc; i++) { >+ if (aobj->dslots[i] == JSVAL_HOLE) // suppress deleted element >+ sp[i] = JSVAL_VOID; >+ } >+ } else { >+ JS_STATIC_ASSERT(JSSLOT_ARGS_COPY_START == JS_INITIAL_NSLOTS); >+ memcpy(sp, aobj->dslots, argc * sizeof(jsval)); >+ for (i = 0; i < argc; i++) { >+ if (sp[i] == JSVAL_HOLE) >+ sp[i] = JSVAL_VOID; >+ } if/else here can be merged into JS_STATIC_ASSERT(JSSLOT_ARGS_COPY_START == JS_INITIAL_NSLOTS); memcpy(sp, fp ? fp->argv : aobj->dslots, argc * sizeof(jsval)); for (i = 0; i < argc; i++) { if (sp[i] == JSVAL_HOLE) sp[i] = JSVAL_VOID; } Also the code avoids locking - see below. >diff --git a/js/src/jsfun.h b/js/src/jsfun.h >--- a/js/src/jsfun.h >+++ b/js/src/jsfun.h > extern JSClass js_ArgumentsClass; >+ >+inline bool >+JSObject::isArguments() const >+{ >+ return getClass() == &js_ArgumentsClass; >+} Defining the inline here looks arbitrary. jsobjinlines.h is a better place IMO. Alternatively extern JSClass js_ArgumentsClass can e moved into jsobj.h > > BEGIN_CASE(JSOP_CALLPROP) >@@ -1885,29 +1892,46 @@ BEGIN_CASE(JSOP_GETELEM) > rval = STRING_TO_JSVAL(str); > goto end_getelem; > } > } > > VALUE_TO_OBJECT(cx, -2, lval, obj); > if (JSVAL_IS_INT(rval)) { > if (obj->isDenseArray()) { >- jsuint length; >- >- length = js_DenseArrayCapacity(obj); >- i = JSVAL_TO_INT(rval); >- if ((jsuint)i < length && >- i < obj->fslots[JSSLOT_ARRAY_LENGTH]) { >- rval = obj->dslots[i]; >+ jsuint idx = JSVAL_TO_INT(rval); Nit: add a cast from int to jsuint around JSVAL_TO_INT. >+ >+ if (idx < jsuint(obj->fslots[JSSLOT_ARRAY_LENGTH]) && >+ idx < js_DenseArrayCapacity(obj)) { >+ rval = obj->dslots[idx]; > if (rval != JSVAL_HOLE) > goto end_getelem; > > /* Reload rval from the stack in the rare hole case. */ > rval = FETCH_OPND(-1); > } >+ } else if (obj->isArguments() >+#ifdef JS_TRACER >+ && !GetArgsPrivateNative(obj) >+#endif >+ ) { >+ uint32 arg = uint32(JSVAL_TO_INT(rval)); >+ >+ if (arg < GetArgsLength(obj)) { >+ JSStackFrame *afp = (JSStackFrame *) obj->getPrivate(); >+ if (afp) { >+ rval = afp->argv[arg]; >+ goto end_getelem; >+ } >+ >+ rval = OBJ_GET_SLOT(cx, obj, JSSLOT_ARGS_COPY_START + arg); The patch assumes in other places that the argument object is single-threaded so accessing its dynamic slots does not require locking. Thus the code should either use locking everywhere or avoid it completly and use STOBJ_GET_LOT here and in PutArguments. Also, to follow other examples the code should statically assert that JSSLOT_ARGS_COPY_START == JS_INITIAL_NSLOTS and access the dslots directly. >diff --git a/js/src/jsprvtd.h b/js/src/jsprvtd.h > > /* >- * Maximum supported value of Arguments.length. It bounds the maximum number >- * of arguments that can be supplied to the function call using >- * Function.prototype.apply. This value also gives the maximum number of >- * elements in the array initializer. >+ * Maximum supported value of arguments.length. It bounds the maximum number of >+ * arguments that can be supplied via the second (so-called |argArray|) param >+ * to Function.prototype.apply. This value also bounds the number of elements >+ * parsed in an array initialiser. > */ > #define JS_ARGS_LENGTH_MAX (JS_BIT(24) - 1) Nit: add a static assert that JS_ARGS_LENGTH_MAX <= int jsval max.
(In reply to comment #14) > > static JSObject * > > NewArguments(JSContext *cx, JSObject *parent, uint32 argc, JSObject *callee) > > { > >- JSObject *argsobj = js_NewObject(cx, &js_ArgumentsClass, NULL, parent, 0); > >- if (!argsobj || !js_EnsureReservedSlots(cx, argsobj, argc)) > >+ JSObject *argsobj = js_NewGCObject(cx); > >+ if (!argsobj) > >+ return NULL; > >+ > >+ argsobj->map = cx->runtime->emptyArgumentsScope; > >+ cx->runtime->emptyArgumentsScope->hold(); > > Is this hold call truly necessary since the empty scope persists over runtime > lifetime? It is needed to balance either this drop at end of js_GetMutableScope: static_cast<JSEmptyScope *>(scope)->drop(cx); or this one: static_cast<JSEmptyScope *>(scope)->dropFromGC(cx); from FinalizeObject, in the case where the arguments object never was mutated in a way that gave it its own scope. > >+ JSObject *proto; > >+ if (!js_GetClassPrototype(cx, parent, JSProto_Object, &proto)) > >+ return NULL; > > >+ argsobj->init(&js_ArgumentsClass, proto, parent, JSVAL_NULL); > > The init call must always follow the js_NewGCObject call not to expose > uninitialized object to the GC. Ok, fixing -- thanks. I'm relying on a newborn root here, aren't I? Yuck. (In reply to comment #15) > >@@ -2094,21 +2095,39 @@ js_fun_apply(JSContext *cx, uintN argc, > ... > >+ if (aobj && aobj->isArguments()) { > >+ JSStackFrame *fp = (JSStackFrame *) aobj->getPrivate(); > >+ if (fp) { > >+ memcpy(sp, fp->argv, argc * sizeof(jsval)); > >+ for (i = 0; i < argc; i++) { > >+ if (aobj->dslots[i] == JSVAL_HOLE) // suppress deleted element > >+ sp[i] = JSVAL_VOID; > >+ } > >+ } else { > >+ JS_STATIC_ASSERT(JSSLOT_ARGS_COPY_START == JS_INITIAL_NSLOTS); > >+ memcpy(sp, aobj->dslots, argc * sizeof(jsval)); > >+ for (i = 0; i < argc; i++) { > >+ if (sp[i] == JSVAL_HOLE) > >+ sp[i] = JSVAL_VOID; > >+ } > > if/else here can be merged into > > JS_STATIC_ASSERT(JSSLOT_ARGS_COPY_START == JS_INITIAL_NSLOTS); > memcpy(sp, fp ? fp->argv : aobj->dslots, argc * sizeof(jsval)); > for (i = 0; i < argc; i++) { > if (sp[i] == JSVAL_HOLE) > sp[i] = JSVAL_VOID; > } This is not equivalent, since the first loop in the patch copies from fp->argv but tests aobj->dslots[i] against JSVAL_HOLE. > Also the code avoids locking - see below. MT arguments objects? Just say no ;-). > >+++ b/js/src/jsfun.h > > extern JSClass js_ArgumentsClass; > >+ > >+inline bool > >+JSObject::isArguments() const > >+{ > >+ return getClass() == &js_ArgumentsClass; > >+} > > Defining the inline here looks arbitrary. jsobjinlines.h is a better place IMO. > Alternatively extern JSClass js_ArgumentsClass can e moved into jsobj.h We already have the JSObject::is<Class> tests farmed out to the .h files where the extern JSClass (sometimes friend, sometimes not) declarations live. I was just following that pattern and prefer it for modularity reasons (ok, you can't have it both ways, but putting all the JSClass externs together seems slightly less modular, with precedent favoring keeping them subsidiary). > The patch assumes in other places that the argument object is single-threaded > so accessing its dynamic slots does not require locking. Thus the code should > either use locking everywhere or avoid it completly and use STOBJ_GET_LOT here > and in PutArguments. Also, to follow other examples the code should statically > assert that JSSLOT_ARGS_COPY_START == JS_INITIAL_NSLOTS and access the dslots > directly. Ok, good points. Thanks for the fast turnaround. I will update the patch today, in a few hours. /be
(In reply to comment #15) > > #define JS_ARGS_LENGTH_MAX (JS_BIT(24) - 1) > > Nit: add a static assert that JS_ARGS_LENGTH_MAX <= int jsval max. You already did, in jsfun.h: jsfun.h:JS_STATIC_ASSERT(JS_ARGS_LENGTH_MAX <= JS_BIT(30)); jsfun.h:JS_STATIC_ASSERT(jsval((JS_ARGS_LENGTH_MAX << 1) | 1) <= JSVAL_INT_MAX); But this reminds me: why is JS_ARGS_LENGTH_MAX #define'd in jsprvtd.h instead of in jsfun.h? $ grep -l JS_ARGS_LENGTH_MAX *.h *.cpp | sort > /tmp/1 $ grep -l jsfun.h *.cpp | sort > /tmp/2 $ comm -3 /tmp/[12] jsapi.cpp jscntxt.cpp jsdbgapi.cpp jsdtracef.cpp jsemit.cpp jsexn.cpp jsfun.h jsgc.cpp jsiter.cpp jsobj.cpp json.cpp jsopcode.cpp jsops.cpp jsprvtd.h jsregexp.cpp jsscope.cpp jsscript.cpp jsstr.cpp jstracer.cpp jsxml.cpp And: $ grep -l '#include "jsfun.h"' *.h jsinterp.h jsscopeinlines.h jsscriptinlines.h so it seems JS_ARGS_LENGTH_MAX could be moved to jsfun.h with only two direct #include "jsfun.h" being added (to jsscope.cpp, jsstr.cpp, which already have indirect dependencies on jsfun.h). What do you think? /be
(In reply to comment #16) > > Is this hold call truly necessary since the empty scope persists over runtime > > lifetime? > > It is needed to balance either this drop at end of js_GetMutableScope: > > static_cast<JSEmptyScope *>(scope)->drop(cx); Right, I will file a bug to avoid that drop for known shared empty scopes. > > >+ argsobj->init(&js_ArgumentsClass, proto, parent, JSVAL_NULL); > > > > The init call must always follow the js_NewGCObject call not to expose > > uninitialized object to the GC. > > Ok, fixing -- thanks. I'm relying on a newborn root here, aren't I? Yuck. This is not a problem. The real one is that js_NewGCObject returns something that must be initialized before the GC can ever reach it perhaps via newborn scanning. > This is not equivalent, since the first loop in the patch copies from fp->argv > but tests aobj->dslots[i] against JSVAL_HOLE. Yep, I was wrong. Still the code needs comments that at least points into the arguments code in jsfun.cpp. > > > Also the code avoids locking - see below. > > MT arguments objects? Just say no ;-). Whatever the preference, it should be consistent.
(In reply to comment #17) > so it seems JS_ARGS_LENGTH_MAX could be moved to jsfun.h with only two direct > #include "jsfun.h" being added (to jsscope.cpp, jsstr.cpp, which already have > indirect dependencies on jsfun.h). What do you think? Yes, lets do it.
Blocks: 555722
Attached patch proposed patch, v2 (obsolete) — Splinter Review
Attachment #435537 - Attachment is obsolete: true
Attachment #435669 - Flags: review?(igor)
Comment on attachment 435675 [details] [diff] [review] interdiff of two patches (not sure why bugzilla interdiff failed) > PutArguments(JSContext *cx, JSObject *argsobj, jsval *args) > { > uint32 argc = GetArgsLength(argsobj); >- JS_LOCK_OBJ(cx, argsobj); > for (uint32 i = 0; i != argc; ++i) { >- jsval v = STOBJ_GET_SLOT(argsobj, JSSLOT_ARGS_COPY_START + i); >+ jsval v = argsobj->dslots[i]; > if (v != JSVAL_HOLE) >- STOBJ_SET_SLOT(argsobj, JSSLOT_ARGS_COPY_START + i, args[i]); >+ argsobj->dslots[i] = args[i]; The final nit: if we ever change the number of fixed slots in the object it would be necessary to find all those places with hard-coded assumption that JSSLOT_ARGS_COPY_START == JS_INITIAL_NSLOTS. To avoid that and to make the slot access self-documenting I suggest to add a helper method like ArgumentCopySlots(obj) { return obj->dslots; } and always use it when accessing argument slots.
Attachment #435669 - Flags: review?(igor) → review+
(In reply to comment #22) > (From update of attachment 435675 [details] [diff] [review]) > > PutArguments(JSContext *cx, JSObject *argsobj, jsval *args) > > { > > uint32 argc = GetArgsLength(argsobj); > >- JS_LOCK_OBJ(cx, argsobj); > > for (uint32 i = 0; i != argc; ++i) { > >- jsval v = STOBJ_GET_SLOT(argsobj, JSSLOT_ARGS_COPY_START + i); > >+ jsval v = argsobj->dslots[i]; > > if (v != JSVAL_HOLE) > >- STOBJ_SET_SLOT(argsobj, JSSLOT_ARGS_COPY_START + i, args[i]); > >+ argsobj->dslots[i] = args[i]; > > The final nit: if we ever change the number of fixed slots in the object it > would be necessary to find all those places with hard-coded assumption that > JSSLOT_ARGS_COPY_START == JS_INITIAL_NSLOTS. To avoid that and to make the slot > access self-documenting I suggest to add a helper method like > ArgumentCopySlots(obj) { return obj->dslots; } and always use it when accessing > argument slots. Sure -- I was really pulling the wrong direction here, given njn's bug 555429. /be
Attached patch patch to commitSplinter Review
I took the liberty of shortening a few names and using const instead of #define. /be
Attachment #435669 - Attachment is obsolete: true
Attachment #435675 - Attachment is obsolete: true
Attachment #435726 - Flags: review+
Any word on the resulting perf win?
On my noisy MBP: raytrace: 1.96x as fast 2345.6ms +/- 3.6% 1198.9ms +/- 3.7% significant /be
> raytrace: 1.96x as fast 2345.6ms +/- 3.6% 1198.9ms +/- 3.7% Awesome. Is that the only one affected?
(In reply to comment #27) > > raytrace: 1.96x as fast 2345.6ms +/- 3.6% 1198.9ms +/- 3.7% > > Awesome. Is that the only one affected? $ grep -wl arguments v8/*.js t/*.js v8/earley-boyer.js v8/raytrace.js t/crypto-md5.js t/crypto-sha1.js t/string-unpack-code.js So earley-boyer might speed up, along with the SunSpider ones that use arguments, but of course it depends on frequency at runtime of arguments property gets and sets. Here's the full v8-suite comparison: $ ./sunspider-compare-results --shell=$HOME/nojs --v8-suite TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.073x as fast 24762.3ms +/- 1.0% 23074.4ms +/- 0.9% significant ============================================================================= v8: 1.073x as fast 24762.3ms +/- 1.0% 23074.4ms +/- 0.9% significant crypto: 1.033x as fast 3605.6ms +/- 1.9% 3490.0ms +/- 2.4% significant deltablue: - 5198.9ms +/- 1.6% 5124.9ms +/- 3.2% earley-boyer: 1.041x as fast 2654.7ms +/- 3.0% 2549.9ms +/- 2.9% significant raytrace: 1.96x as fast 2345.6ms +/- 3.6% 1198.9ms +/- 3.7% significant regexp: *1.063x as slow* 1591.5ms +/- 2.5% 1691.8ms +/- 3.1% significant richards: 1.038x as fast 7921.9ms +/- 1.0% 7628.8ms +/- 1.7% significant splay: - 1444.1ms +/- 4.9% 1390.1ms +/- 3.4% I haven't had time (moving households) to run SunSpider tests, and my MBP is as noisy as anyone's. Maybe someone else can try. I'll rebase the patch now that the great back-out/re-add seems to be over. /be
Whiteboard: fixed-in-tracemonkey
(In reply to comment #28) > t/crypto-md5.js > t/crypto-sha1.js These don't really use arguments, just have "arguments" in some comment text.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Depends on: 597816
This seems to have broken at least some uses of jquery; see bug 597816
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: