Closed
Bug 469044
Opened 16 years ago
Closed 16 years ago
TM: Support code with type-unstable globals
Categories
(Core :: JavaScript Engine, defect, P1)
VERIFIED
FIXED
People
(Reporter: gal, Assigned: gal)
References
Details
(Keywords: verified1.9.1, Whiteboard: fixed-in-tracemonkey)
Attachments
(1 file, 26 obsolete files)
53.38 KB, patch
gal: review+
No description provided.
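Since no description was provided, here is a minimal, hypothetical sketch (not from the bug report; names are illustrative) of the kind of pattern the summary refers to: a global whose type changes while a loop is on trace, so the tree's global type map stops matching the actual values.

// Illustrative only. "g" is a global slot that enters the loop as an
// integer but holds a double after the first iteration, making the
// global type-unstable across the loop edge.
var g = 0;                  // starts as an integer
for (var i = 0; i < 100; i++) {
  g += 0.5;                 // becomes a double on trace
}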
Assignee
Comment 1•16 years ago
Assignee: general → gal
runs trace.js now
Attachment #352454 -
Attachment is obsolete: true
fixed a bunch of allocation and typemap offset bugs and cleaned up some code.
we pass a lot more trace-tests now, but still assert about halfway through.
Attachment #352607 -
Attachment is obsolete: true
helps to attach the right file.
Attachment #352644 -
Attachment is obsolete: true
Updated•16 years ago
Assignee: gal → danderson
More tests pass now, but we hit an assert after the andTests:
> Assertion failure: JSVAL_TAG(v) == JSVAL_STRING, at ../jstracer.cpp:1417
This patch also cleans up TreeInfo to have wrappers around the type map members.
Attachment #352645 -
Attachment is obsolete: true
passes trace-tests after fixing a few remaining bugs.
I will do performance testing shortly; a cursory run of fannkuch (my arch nemesis) shows that the number of side exits grew, which might be explained by the change in demotion logic.
Attachment #354872 -
Attachment is obsolete: true
Assignee
Comment 7•16 years ago
Comment on attachment 354911 [details] [diff] [review]
v5, works
>diff -r d0e8862aa513 js/src/jscntxt.h
>--- a/js/src/jscntxt.h Mon Dec 29 23:22:23 2008 -0800
>+++ b/js/src/jscntxt.h Tue Dec 30 22:42:23 2008 -0500
>@@ -101,7 +101,6 @@
> class TraceRecorder;
> extern "C++" { template<typename T> class Queue; }
> typedef Queue<uint16> SlotList;
>-class TypeMap;
>
> # define CLS(T) T*
> #else
>@@ -125,7 +124,6 @@
> CLS(TraceRecorder) recorder;
> uint32 globalShape;
> CLS(SlotList) globalSlots;
>- CLS(TypeMap) globalTypeMap;
> jsval *recoveryDoublePool;
> jsval *recoveryDoublePoolPtr;
>
>diff -r d0e8862aa513 js/src/jstracer.cpp
>--- a/js/src/jstracer.cpp Mon Dec 29 23:22:23 2008 -0800
>+++ b/js/src/jstracer.cpp Tue Dec 30 22:42:23 2008 -0500
>@@ -113,12 +113,6 @@
>
> /* Max number of branches per tree. */
> #define MAX_BRANCHES 16
>-
>-/* Macros for demote slot lists */
>-#define ALLOCA_UNDEMOTE_SLOTLIST(num) (unsigned*)alloca(((num) + 1) * sizeof(unsigned))
>-#define ADD_UNDEMOTE_SLOT(list, slot) list[++list[0]] = slot
>-#define NUM_UNDEMOTE_SLOTS(list) list[0]
>-#define CLEAR_UNDEMOTE_SLOTLIST(list) list[0] = 0
>
> #ifdef JS_JIT_SPEW
> #define ABORT_TRACE(msg) do { debug_only_v(fprintf(stdout, "abort: %d: %s\n", __LINE__, msg);) return false; } while (0)
>@@ -906,8 +900,8 @@
>
> #define FORALL_SLOTS(cx, ngslots, gslots, callDepth, code) \
> JS_BEGIN_MACRO \
>+ FORALL_SLOTS_IN_PENDING_FRAMES(cx, callDepth, code); \
> FORALL_GLOBAL_SLOTS(cx, ngslots, gslots, code); \
>- FORALL_SLOTS_IN_PENDING_FRAMES(cx, callDepth, code); \
> JS_END_MACRO
>
> /* Calculate the total number of native frame slots we need from this frame
>@@ -944,14 +938,49 @@
> JS_NOT_REACHED("js_NativeStackSlots");
> }
>
>-/* Capture the type map for the selected slots of the global object. */
>-JS_REQUIRES_STACK void
>-TypeMap::captureGlobalTypes(JSContext* cx, SlotList& slots)
>+/*
>+ * Capture the type map for the selected slots of the global object and currently pending
>+ * stack frames.
>+ */
>+JS_REQUIRES_STACK void
>+TypeMap::captureTypes(JSContext* cx, SlotList& slots, unsigned callDepth)
> {
> unsigned ngslots = slots.length();
> uint16* gslots = slots.data();
>- setLength(ngslots);
>+ setLength(js_NativeStackSlots(cx, callDepth) + ngslots);
> uint8* map = data();
>+ uint8* m = map;
>+ FORALL_SLOTS_IN_PENDING_FRAMES(cx, callDepth,
>+ uint8 type = getCoercedType(*vp);
>+ if ((type == JSVAL_INT) && oracle.isStackSlotUndemotable(cx, unsigned(m - map)))
>+ type = JSVAL_DOUBLE;
>+ JS_ASSERT(type != JSVAL_BOXED);
>+ debug_only_v(printf("capture stack type %s%d: %d=%c\n", vpname, vpnum, type, typeChar[type]);)
>+ JS_ASSERT(uintptr_t(m - map) < length());
>+ *m++ = type;
>+ );
>+ FORALL_GLOBAL_SLOTS(cx, ngslots, gslots,
>+ uint8 type = getCoercedType(*vp);
>+ if ((type == JSVAL_INT) && oracle.isGlobalSlotUndemotable(cx, gslots[n]))
>+ type = JSVAL_DOUBLE;
>+ JS_ASSERT(type != JSVAL_BOXED);
>+ debug_only_v(printf("capture global type %s%d: %d=%c\n", vpname, vpnum, type, typeChar[type]);)
>+ JS_ASSERT(uintptr_t(m - map) < length());
>+ *m++ = type;
>+ );
>+ JS_ASSERT(uintptr_t(m - map) == length());
>+}
>+
>+JS_REQUIRES_STACK void
>+TypeMap::captureMissingGlobalTypes(JSContext* cx, SlotList& slots, unsigned stackSlots)
>+{
>+ unsigned oldSlots = length() - stackSlots;
>+ int diff = slots.length() - oldSlots;
>+ JS_ASSERT(diff >= 0);
>+ unsigned ngslots = slots.length();
>+ uint16* gslots = slots.data();
>+ setLength(length() + diff);
>+ uint8* map = data() + stackSlots;
> uint8* m = map;
> FORALL_GLOBAL_SLOTS(cx, ngslots, gslots,
> uint8 type = getCoercedType(*vp);
>@@ -959,25 +988,9 @@
> type = JSVAL_DOUBLE;
> JS_ASSERT(type != JSVAL_BOXED);
> debug_only_v(printf("capture global type %s%d: %d=%c\n", vpname, vpnum, type, typeChar[type]);)
>- *m++ = type;
>- );
>-}
>-
>-/* Capture the type map for the currently pending stack frames. */
>-JS_REQUIRES_STACK void
>-TypeMap::captureStackTypes(JSContext* cx, unsigned callDepth)
>-{
>- setLength(js_NativeStackSlots(cx, callDepth));
>- uint8* map = data();
>- uint8* m = map;
>- FORALL_SLOTS_IN_PENDING_FRAMES(cx, callDepth,
>- uint8 type = getCoercedType(*vp);
>- if ((type == JSVAL_INT) &&
>- oracle.isStackSlotUndemotable(cx, unsigned(m - map))) {
>- type = JSVAL_DOUBLE;
>- }
>- debug_only_v(printf("capture stack type %s%d: %d=%c\n", vpname, vpnum, type, typeChar[type]);)
>- *m++ = type;
>+ *m = type;
>+ JS_ASSERT((m > map + oldSlots) || (*m == type));
>+ m++;
> );
> }
>
>@@ -1008,7 +1021,7 @@
>
> JS_REQUIRES_STACK
> TraceRecorder::TraceRecorder(JSContext* cx, VMSideExit* _anchor, Fragment* _fragment,
>- TreeInfo* ti, unsigned ngslots, uint8* globalTypeMap, uint8* stackTypeMap,
>+ TreeInfo* ti, unsigned stackSlots, unsigned ngslots, uint8* typeMap,
> VMSideExit* innermostNestedGuard, Fragment* outerToBlacklist)
> {
> JS_ASSERT(!_fragment->vmprivate && ti);
>@@ -1061,7 +1074,7 @@
> eor_ins = addName(lir->insLoad(LIR_ldp, lirbuf->state, offsetof(InterpState, eor)), "eor");
>
> /* read into registers all values on the stack and all globals we know so far */
>- import(treeInfo, lirbuf->sp, ngslots, callDepth, globalTypeMap, stackTypeMap);
>+ import(treeInfo, lirbuf->sp, stackSlots, ngslots, callDepth, typeMap);
>
> #if defined(JS_HAS_OPERATION_COUNT) && !JS_HAS_OPERATION_COUNT
> if (fragment == fragment->root) {
>@@ -1588,8 +1601,8 @@
> }
>
> JS_REQUIRES_STACK void
>-TraceRecorder::import(TreeInfo* treeInfo, LIns* sp, unsigned ngslots, unsigned callDepth,
>- uint8* globalTypeMap, uint8* stackTypeMap)
>+TraceRecorder::import(TreeInfo* treeInfo, LIns* sp, unsigned stackSlots, unsigned ngslots,
>+ unsigned callDepth, uint8* typeMap)
> {
> /* If we get a partial list that doesn't have all the types (i.e. recording from a side
> exit that was recorded but we added more global slots later), merge the missing types
>@@ -1602,11 +1615,14 @@
> map. Since thats exactly what we used to fill in the types our current side exit
> didn't provide, this is always safe to do. */
> unsigned length;
>- if (ngslots < (length = traceMonitor->globalTypeMap->length()))
>- mergeTypeMaps(&globalTypeMap, &ngslots,
>- traceMonitor->globalTypeMap->data(), length,
>+ /* This is potentially the typemap of the side exit and thus shorter than the tree's
>+ global type map. */
>+ uint8* globalTypeMap = typeMap + stackSlots;
>+ if (ngslots < (length = treeInfo->globalSlots()))
>+ mergeTypeMaps(&globalTypeMap/*out param*/, &ngslots/*out param*/,
>+ treeInfo->globalTypeMap(), length,
> (uint8*)alloca(sizeof(uint8) * length));
>- JS_ASSERT(ngslots == traceMonitor->globalTypeMap->length());
>+ JS_ASSERT(ngslots == treeInfo->globalSlots());
>
> /* the first time we compile a tree this will be empty as we add entries lazily */
> uint16* gslots = traceMonitor->globalSlots->data();
>@@ -1616,7 +1632,7 @@
> m++;
> );
> ptrdiff_t offset = -treeInfo->nativeStackBase;
>- m = stackTypeMap;
>+ m = typeMap;
> FORALL_SLOTS_IN_PENDING_FRAMES(cx, callDepth,
> import(sp, offset, vp, *m, vpname, vpnum, fp);
> m++; offset += sizeof(double);
>@@ -1641,7 +1657,7 @@
> uint8 type = getCoercedType(*vp);
> if ((type == JSVAL_INT) && oracle.isGlobalSlotUndemotable(cx, slot))
> type = JSVAL_DOUBLE;
>- traceMonitor->globalTypeMap->add(type);
>+ treeInfo->typeMap.add(type);
> import(gp_ins, slot*sizeof(double), vp, type, "global", index, NULL);
> return true;
> }
>@@ -1769,13 +1785,14 @@
> /* Promote slots if necessary to match the called tree' type map and report error if thats
> impossible. */
> JS_REQUIRES_STACK bool
>-TraceRecorder::adjustCallerTypes(Fragment* f, unsigned* demote_slots, bool& trash)
>+TraceRecorder::adjustCallerTypes(Fragment* f, bool& trash)
> {
> JSTraceMonitor* tm = traceMonitor;
>- uint8* m = tm->globalTypeMap->data();
>- uint16* gslots = traceMonitor->globalSlots->data();
>- unsigned ngslots = traceMonitor->globalSlots->length();
>- uint8* map = ((TreeInfo*)f->vmprivate)->stackTypeMap.data();
>+ uint8* m = treeInfo->globalTypeMap();
>+ uint16* gslots = tm->globalSlots->data();
>+ unsigned ngslots = tm->globalSlots->length();
>+ JS_ASSERT(ngslots == treeInfo->globalSlots());
>+ uint8* map = ((TreeInfo*)f->vmprivate)->typeMap.data();
> bool ok = true;
> trash = false;
> FORALL_GLOBAL_SLOTS(cx, ngslots, gslots,
>@@ -1798,22 +1815,17 @@
> lir->insStorei(get(vp), lirbuf->sp,
> -treeInfo->nativeStackBase + nativeStackOffset(vp));
> /* Aggressively undo speculation so the inner tree will compile if this fails. */
>- ADD_UNDEMOTE_SLOT(demote_slots, unsigned(m - map));
>+ oracle.markStackSlotUndemotable(cx, unsigned(m - map));
> } else if (!isPromote && *m == JSVAL_INT) {
> debug_only_v(printf("adjusting will fail, %s%d, slot %d\n", vpname, vpnum, m - map);)
> ok = false;
>- ADD_UNDEMOTE_SLOT(demote_slots, unsigned(m - map));
>+ oracle.markStackSlotUndemotable(cx, unsigned(m - map));
> } else if (JSVAL_IS_INT(*vp) && *m == JSVAL_DOUBLE) {
> /* Aggressively undo speculation so the inner tree will compile if this fails. */
>- ADD_UNDEMOTE_SLOT(demote_slots, unsigned(m - map));
>+ oracle.markStackSlotUndemotable(cx, unsigned(m - map));
> }
> ++m;
> );
>- /* If this isn't okay, tell the oracle. */
>- if (!ok) {
>- for (unsigned i = 1; i <= NUM_UNDEMOTE_SLOTS(demote_slots); i++)
>- oracle.markStackSlotUndemotable(cx, demote_slots[i]);
>- }
> JS_ASSERT(f == f->root);
> return ok;
> }
>@@ -1925,7 +1937,7 @@
> the top of the stack is a boxed value. Either pc[-cs.length] is JSOP_NEXTITER and we
> want one below top of stack, or else it's JSOP_CALL and we want top of stack. */
> if (resumeAfter) {
>- m[(pc[-cs.length] == JSOP_NEXTITER) ? -2 : -1] = JSVAL_BOXED;
>+ typemap[stackSlots + ((pc[-cs.length] == JSOP_NEXTITER) ? -2 : -1)] = JSVAL_BOXED;
>
> /* Now restore the the original pc (after which early returns are ok). */
> MUST_FLOW_LABEL(restore_pc);
>@@ -1949,7 +1961,7 @@
> for (unsigned n = 0; n < nexits; ++n) {
> VMSideExit* e = exits[n];
> if (e->ip_adj == ip_adj &&
>- !memcmp(getTypeMap(exits[n]), typemap, typemap_size)) {
>+ !memcmp(getFullTypeMap(exits[n]), typemap, typemap_size)) {
> LIns* data = lir_buf_writer->skip(sizeof(GuardRecord));
> GuardRecord* rec = (GuardRecord*)data->payload();
> /* setup guard record structure with shared side exit */
>@@ -1987,7 +1999,7 @@
> exit->ip_adj = ip_adj;
> exit->sp_adj = (stackSlots * sizeof(double)) - treeInfo->nativeStackBase;
> exit->rp_adj = exit->calldepth * sizeof(FrameInfo);
>- memcpy(getTypeMap(exit), typemap, typemap_size);
>+ memcpy(getFullTypeMap(exit), typemap, typemap_size);
>
> /* BIG FAT WARNING: If compilation fails, we currently don't reset the lirbuf so its safe
> to keep references to the side exits here. If we ever start rewinding those lirbufs,
>@@ -2088,85 +2100,80 @@
> * @param root_peer First fragment in peer list.
> * @param stable_peer Outparam for first type stable peer.
> * @param trash Whether to trash the tree (demotion).
>- * @param demotes Array to store demotable stack slots.
>+ * @param demote Tree if stability was achieved through demotion.
Should be "True", not "Tree".
> * @return True if type stable, false otherwise.
> */
> JS_REQUIRES_STACK bool
>-TraceRecorder::deduceTypeStability(Fragment* root_peer, Fragment** stable_peer, unsigned* demotes)
>+TraceRecorder::deduceTypeStability(Fragment* root_peer, Fragment** stable_peer, bool& demote)
> {
> uint8* m;
> uint8* typemap;
> unsigned ngslots = traceMonitor->globalSlots->length();
> uint16* gslots = traceMonitor->globalSlots->data();
>- JS_ASSERT(traceMonitor->globalTypeMap->length() == ngslots);
>+ JS_ASSERT(ngslots == treeInfo->globalSlots());
>
> if (stable_peer)
> *stable_peer = NULL;
>
>- CLEAR_UNDEMOTE_SLOTLIST(demotes);
>-
>+ demote = false;
>+
> /*
> * Rather than calculate all of this stuff twice, it gets cached locally. The "stage" buffers
> * are for calls to set() that will change the exit types.
> */
> bool success;
>- bool unstable_from_undemotes;
> unsigned stage_count;
>- jsval** stage_vals = (jsval**)alloca(sizeof(jsval*) * (ngslots + treeInfo->stackTypeMap.length()));
>- LIns** stage_ins = (LIns**)alloca(sizeof(LIns*) * (ngslots + treeInfo->stackTypeMap.length()));
>+ jsval** stage_vals = (jsval**)alloca(sizeof(jsval*) * (treeInfo->typeMap.length()));
>+ LIns** stage_ins = (LIns**)alloca(sizeof(LIns*) * (treeInfo->typeMap.length()));
>
> /* First run through and see if we can close ourselves - best case! */
> stage_count = 0;
> success = false;
>- unstable_from_undemotes = false;
>
> debug_only_v(printf("Checking type stability against self=%p\n", fragment);)
>
>- m = typemap = traceMonitor->globalTypeMap->data();
>+ m = typemap = treeInfo->globalTypeMap();
> FORALL_GLOBAL_SLOTS(cx, ngslots, gslots,
> debug_only_v(printf("%s%d ", vpname, vpnum);)
> if (!checkType(*vp, *m, stage_vals[stage_count], stage_ins[stage_count], stage_count)) {
> /* If the failure was an int->double, tell the oracle. */
>- if (*m == JSVAL_INT && isNumber(*vp) && !isPromoteInt(get(vp)))
>+ if (*m == JSVAL_INT && isNumber(*vp) && !isPromoteInt(get(vp))) {
> oracle.markGlobalSlotUndemotable(cx, gslots[n]);
>- trashSelf = true;
>- goto checktype_fail_1;
>+ demote = true;
>+ } else {
>+ goto checktype_fail_1;
>+ }
> }
> ++m;
> );
>- m = typemap = treeInfo->stackTypeMap.data();
>+ m = typemap = treeInfo->stackTypeMap();
> FORALL_SLOTS_IN_PENDING_FRAMES(cx, 0,
> debug_only_v(printf("%s%d ", vpname, vpnum);)
> if (!checkType(*vp, *m, stage_vals[stage_count], stage_ins[stage_count], stage_count)) {
>- if (*m == JSVAL_INT && isNumber(*vp) && !isPromoteInt(get(vp)))
>- ADD_UNDEMOTE_SLOT(demotes, unsigned(m - typemap));
>- else
>+ if (*m == JSVAL_INT && isNumber(*vp) && !isPromoteInt(get(vp))) {
>+ oracle.markStackSlotUndemotable(cx, unsigned(m - typemap));
>+ demote = true;
>+ } else {
> goto checktype_fail_1;
>+ }
> }
> ++m;
> );
>
>- /*
>- * If there's an exit that's unstable because of undemotable slots, we want to search for
>- * peers just in case we can make a connection.
>- */
>- if (NUM_UNDEMOTE_SLOTS(demotes))
>- unstable_from_undemotes = true;
>- else
>- success = true;
>+ success = true;
>
> checktype_fail_1:
> /* If we got a success and we don't need to recompile, we should just close here. */
>- if (success) {
>+ if (success && !demote) {
> for (unsigned i = 0; i < stage_count; i++)
> set(stage_vals[i], stage_ins[i]);
> return true;
> /* If we need to trash, don't bother checking peers. */
> } else if (trashSelf) {
> return false;
>- } else {
>- CLEAR_UNDEMOTE_SLOTLIST(demotes);
>- }
>+ }
>+
>+ demote = false;
>
> /* At this point the tree is about to be incomplete, so let's see if we can connect to any
> * peer fragment that is type stable.
>@@ -2179,17 +2186,25 @@
> continue;
> ti = (TreeInfo*)f->vmprivate;
> /* Don't allow varying stack depths */
>- if (ti->stackTypeMap.length() != treeInfo->stackTypeMap.length())
>+ if ((ti->stackSlots != treeInfo->stackSlots) ||
>+ (ti->typeMap.length() != treeInfo->typeMap.length()))
> continue;
> stage_count = 0;
> success = false;
>- m = ti->stackTypeMap.data();
>-
>+
>+ m = ti->globalTypeMap();
>+ FORALL_GLOBAL_SLOTS(cx, traceMonitor->globalSlots->length(), traceMonitor->globalSlots->data(),
>+ if (!checkType(*vp, *m, stage_vals[stage_count], stage_ins[stage_count], stage_count))
>+ goto checktype_fail_2;
>+ ++m;
>+ );
>+
>+ m = ti->stackTypeMap();
> FORALL_SLOTS_IN_PENDING_FRAMES(cx, 0,
>- if (!checkType(*vp, *m, stage_vals[stage_count], stage_ins[stage_count], stage_count))
>- goto checktype_fail_2;
>- ++m;
>- );
>+ if (!checkType(*vp, *m, stage_vals[stage_count], stage_ins[stage_count], stage_count))
>+ goto checktype_fail_2;
>+ ++m;
>+ );
>
> success = true;
>
>@@ -2203,33 +2218,49 @@
> set(stage_vals[i], stage_ins[i]);
> if (stable_peer)
> *stable_peer = f;
>- return false;
>- }
>- }
>-
>- JS_ASSERT(NUM_UNDEMOTE_SLOTS(demotes) == 0);
>+ demote = false;
>+ return false;
>+ }
>+ }
>
> /*
> * If this is a loop trace and it would be stable with demotions, build an undemote list
> * and return true. Our caller should sniff this and trash the tree, recording a new one
> * that will assumedly stabilize.
> */
>- if (unstable_from_undemotes && fragment->kind == LoopTrace) {
>- typemap = m = treeInfo->stackTypeMap.data();
>+ if (demote && fragment->kind == LoopTrace) {
>+ typemap = m = treeInfo->globalTypeMap();
>+ FORALL_GLOBAL_SLOTS(cx, traceMonitor->globalSlots->length(), traceMonitor->globalSlots->data(),
>+ if (*m == JSVAL_INT) {
>+ JS_ASSERT(isNumber(*vp));
>+ if (!isPromoteInt(get(vp)))
>+ oracle.markGlobalSlotUndemotable(cx, gslots[n]);
>+ } else if (*m == JSVAL_DOUBLE) {
>+ JS_ASSERT(isNumber(*vp));
>+ oracle.markGlobalSlotUndemotable(cx, gslots[n]);
>+ } else {
>+ JS_ASSERT(*m == JSVAL_TAG(*vp));
>+ }
>+ m++;
>+ );
>+
>+ typemap = m = treeInfo->stackTypeMap();
> FORALL_SLOTS_IN_PENDING_FRAMES(cx, 0,
> if (*m == JSVAL_INT) {
> JS_ASSERT(isNumber(*vp));
> if (!isPromoteInt(get(vp)))
>- ADD_UNDEMOTE_SLOT(demotes, unsigned(m - typemap));
>+ oracle.markStackSlotUndemotable(cx, unsigned(m - typemap));
> } else if (*m == JSVAL_DOUBLE) {
> JS_ASSERT(isNumber(*vp));
>- ADD_UNDEMOTE_SLOT(demotes, unsigned(m - typemap));
>+ oracle.markStackSlotUndemotable(cx, unsigned(m - typemap));
> } else {
> JS_ASSERT((*m == JSVAL_TNULL) ? JSVAL_IS_NULL(*vp) : *m == JSVAL_TAG(*vp));
> }
> m++;
> );
> return true;
>+ } else {
>+ demote = false;
> }
>
> return false;
>@@ -2283,11 +2314,11 @@
> js_JoinPeersIfCompatible(Fragmento* frago, Fragment* stableFrag, TreeInfo* stableTree,
> VMSideExit* exit)
> {
>- JS_ASSERT(exit->numStackSlots == stableTree->stackTypeMap.length());
>+ JS_ASSERT(exit->numStackSlots == stableTree->stackSlots);
>+
> /* Must have a matching type unstable exit. */
>- if (memcmp(getTypeMap(exit) + exit->numGlobalSlots,
>- stableTree->stackTypeMap.data(),
>- stableTree->stackTypeMap.length()) != 0) {
>+ if ((exit->numGlobalSlots + exit->numStackSlots != stableTree->typeMap.length()) ||
>+ memcmp(getFullTypeMap(exit), stableTree->typeMap.data(), stableTree->typeMap.length())) {
> return false;
> }
>
>@@ -2301,7 +2332,7 @@
>
> /* Complete and compile a trace and link it to the existing tree if appropriate. */
> JS_REQUIRES_STACK bool
>-TraceRecorder::closeLoop(Fragmento* fragmento, bool& demote, unsigned *demotes)
>+TraceRecorder::closeLoop(Fragmento* fragmento, bool& demote)
> {
> bool stable;
> LIns* exitIns;
>@@ -2309,8 +2340,6 @@
> VMSideExit* exit;
> Fragment* peer_root;
>
>- demote = false;
>-
> exitIns = snapshot(UNSTABLE_LOOP_EXIT);
> exit = (VMSideExit*)((GuardRecord*)exitIns->payload())->exit;
>
>@@ -2321,14 +2350,14 @@
> return false;
> }
>
>- JS_ASSERT(exit->numStackSlots == treeInfo->stackTypeMap.length());
>+ JS_ASSERT(exit->numStackSlots == treeInfo->stackSlots);
>
> peer_root = fragmento->getLoop(fragment->root->ip);
> JS_ASSERT(peer_root != NULL);
>- stable = deduceTypeStability(peer_root, &peer, demotes);
>+ stable = deduceTypeStability(peer_root, &peer, demote);
>
> #if DEBUG
>- if (!stable || NUM_UNDEMOTE_SLOTS(demotes))
>+ if (!stable)
> AUDIT(unstableLoopVariable);
> #endif
>
>@@ -2337,9 +2366,8 @@
> return false;
> }
>
>- if (stable && NUM_UNDEMOTE_SLOTS(demotes)) {
>+ if (stable && demote) {
> JS_ASSERT(fragment->kind == LoopTrace);
>- demote = true;
> return false;
> }
>
>@@ -2370,10 +2398,6 @@
> */
> if (walkedOutOfLoop())
> exit->ip_adj = terminate_ip_adj;
>-
>- /* If we were trying to stabilize a promotable tree, trash it. */
>- if (promotedPeer)
>- js_TrashTree(cx, promotedPeer);
> } else {
> JS_ASSERT(peer->code());
> exit->target = peer;
>@@ -2405,12 +2429,14 @@
> TraceRecorder::joinEdgesToEntry(Fragmento* fragmento, Fragment* peer_root)
> {
> if (fragment->kind == LoopTrace) {
>+ JSTraceMonitor* tm = traceMonitor;
> TreeInfo* ti;
> Fragment* peer;
> uint8* t1, *t2;
> UnstableExit* uexit, **unext;
>-
>- unsigned* demotes = (unsigned*)alloca(treeInfo->stackTypeMap.length() * sizeof(unsigned));
>+ uint32* stackDemotes = (uint32*)alloca(sizeof(uint32) * treeInfo->stackSlots);
>+ uint32* globalDemotes = (uint32*)alloca(sizeof(uint32) * treeInfo->globalSlots());
>+
> for (peer = peer_root; peer != NULL; peer = peer->peer) {
> if (!peer->code())
> continue;
>@@ -2428,20 +2454,34 @@
> This is actually faster than trashing the original tree as soon as the
> instability is detected, since we could have compiled a fairly stable
> tree that ran faster with integers. */
>- unsigned count = 0;
>- t1 = treeInfo->stackTypeMap.data();
>- t2 = getTypeMap(uexit->exit) + uexit->exit->numGlobalSlots;
>+ unsigned stackCount = 0;
>+ unsigned globalCount = 0;
>+ t1 = treeInfo->stackTypeMap();
>+ t2 = getStackTypeMap(uexit->exit);
> for (unsigned i = 0; i < uexit->exit->numStackSlots; i++) {
> if (t2[i] == JSVAL_INT && t1[i] == JSVAL_DOUBLE) {
>- demotes[count++] = i;
>+ stackDemotes[stackCount++] = i;
> } else if (t2[i] != t1[i]) {
>- count = 0;
>+ stackCount = 0;
> break;
> }
> }
>- if (count) {
>- for (unsigned i = 0; i < count; i++)
>- oracle.markStackSlotUndemotable(cx, demotes[i]);
>+ t1 = treeInfo->globalTypeMap();
>+ t2 = getGlobalTypeMap(uexit->exit);
>+ for (unsigned i = 0; i < uexit->exit->numGlobalSlots; i++) {
>+ if (t2[i] == JSVAL_INT && t1[i] == JSVAL_DOUBLE) {
>+ globalDemotes[globalCount++] = i;
>+ } else if (t2[i] != t1[i]) {
>+ globalCount = 0;
>+ stackCount = 0;
>+ break;
>+ }
>+ }
>+ if (stackCount || globalCount) {
>+ for (unsigned i = 0; i < stackCount; i++)
>+ oracle.markStackSlotUndemotable(cx, stackDemotes[i]);
>+ for (unsigned i = 0; i < globalCount; i++)
>+ oracle.markGlobalSlotUndemotable(cx, tm->globalSlots->data()[globalDemotes[i]]);
> JS_ASSERT(peer == uexit->fragment->root);
> if (fragment == peer)
> trashSelf = true;
>@@ -2541,8 +2581,8 @@
> LIns* args[] = { INS_CONSTPTR(inner), lirbuf->state }; /* reverse order */
> LIns* ret = lir->insCall(&js_CallTree_ci, args);
> /* Read back all registers, in case the called tree changed any of them. */
>- import(ti, inner_sp_ins, exit->numGlobalSlots, exit->calldepth,
>- getTypeMap(exit), getTypeMap(exit) + exit->numGlobalSlots);
>+ import(ti, inner_sp_ins, exit->numStackSlots, exit->numGlobalSlots,
>+ exit->calldepth, getFullTypeMap(exit));
> /* Restore sp and rp to their original values (we still have them in a register). */
> if (callDepth > 0) {
> lir->insStorei(lirbuf->sp, lirbuf->state, offsetof(InterpState, sp));
>@@ -2736,7 +2776,7 @@
>
> static JS_REQUIRES_STACK bool
> js_StartRecorder(JSContext* cx, VMSideExit* anchor, Fragment* f, TreeInfo* ti,
>- unsigned ngslots, uint8* globalTypeMap, uint8* stackTypeMap,
>+ unsigned stackSlots, unsigned ngslots, uint8* typeMap,
> VMSideExit* expectedInnerExit, Fragment* outer)
> {
> JSTraceMonitor* tm = &JS_TRACE_MONITOR(cx);
>@@ -2752,7 +2792,7 @@
>
> /* start recording if no exception during construction */
> tm->recorder = new (&gc) TraceRecorder(cx, anchor, f, ti,
>- ngslots, globalTypeMap, stackTypeMap,
>+ stackSlots, ngslots, typeMap,
> expectedInnerExit, outer);
> if (cx->throwing) {
> js_AbortRecording(cx, "setting up recorder failed");
>@@ -2973,7 +3013,7 @@
> #endif
>
> JS_REQUIRES_STACK bool
>-js_RecordTree(JSContext* cx, JSTraceMonitor* tm, Fragment* f, Fragment* outer, unsigned* demotes)
>+js_RecordTree(JSContext* cx, JSTraceMonitor* tm, Fragment* f, Fragment* outer)
> {
> JS_ASSERT(f->root == f);
>
>@@ -2986,17 +3026,6 @@
> /* Make sure the global type map didn't change on us. */
> JSObject* globalObj = JS_GetGlobalForObject(cx, cx->fp->scopeChain);
> if (!js_CheckGlobalObjectShape(cx, tm, globalObj)) {
>- js_FlushJITCache(cx);
>- return false;
>- }
>- TypeMap current;
>- current.captureGlobalTypes(cx, *tm->globalSlots);
>- if (!current.matches(*tm->globalTypeMap)) {
>- debug_only_v(printf("Global type map mismatch in RecordTree, flushing cache.\n");)
>- debug_only_v(printf("Current global type map:\n");)
>- debug_only_v(js_dumpMap(current));
>- debug_only_v(printf("Cached global type map:\n");)
>- debug_only_v(js_dumpMap(*tm->globalTypeMap));
> js_FlushJITCache(cx);
> return false;
> }
>@@ -3030,18 +3059,10 @@
> /* setup the VM-private treeInfo structure for this fragment */
> TreeInfo* ti = new (&gc) TreeInfo(f);
>
>- /* capture the coerced type of each active slot in the stack type map */
>- ti->stackTypeMap.captureStackTypes(cx, 0/*callDepth*/);
>-
>- if (demotes) {
>- /* If we get a list of demotions, an outer tree is telling us our types are not callable. */
>- uint8* typeMap = ti->stackTypeMap.data();
>- for (unsigned i = 1; i <= NUM_UNDEMOTE_SLOTS(demotes); i++) {
>- JS_ASSERT(demotes[i] < ti->stackTypeMap.length());
>- if (typeMap[demotes[i]] == JSVAL_INT)
>- typeMap[demotes[i]] = JSVAL_DOUBLE;
>- }
>- }
>+ /* capture the coerced type of each active slot in the type map */
>+ SlotList& globalSlots = *tm->globalSlots;
>+ ti->typeMap.captureTypes(cx, globalSlots, 0/*callDepth*/);
>+ ti->stackSlots = ti->typeMap.length() - globalSlots.length();
>
> /* Check for duplicate entry type maps. This is always wrong and hints at trace explosion
> since we are trying to stabilize something without properly connecting peer edges. */
>@@ -3052,12 +3073,12 @@
> continue;
> ti_other = (TreeInfo*)peer->vmprivate;
> JS_ASSERT(ti_other);
>- JS_ASSERT(!ti->stackTypeMap.matches(ti_other->stackTypeMap));
>+ JS_ASSERT(!ti->typeMap.matches(ti_other->typeMap));
> }
> #endif
>
> /* determine the native frame layout at the entry point */
>- unsigned entryNativeStackSlots = ti->stackTypeMap.length();
>+ unsigned entryNativeStackSlots = ti->stackSlots;
> JS_ASSERT(entryNativeStackSlots == js_NativeStackSlots(cx, 0/*callDepth*/));
> ti->nativeStackBase = (entryNativeStackSlots -
> (cx->fp->regs->sp - StackBase(cx->fp))) * sizeof(double);
>@@ -3067,8 +3088,9 @@
>
> /* recording primary trace */
> if (!js_StartRecorder(cx, NULL, f, ti,
>- tm->globalSlots->length(), tm->globalTypeMap->data(),
>- ti->stackTypeMap.data(), NULL, outer)) {
>+ ti->stackSlots,
>+ tm->globalSlots->length(),
>+ ti->typeMap.data(), NULL, outer)) {
> return false;
> }
>
>@@ -3080,28 +3102,22 @@
> {
> JSTraceMonitor* tm = &JS_TRACE_MONITOR(cx);
> Fragment* from = exit->from->root;
>- unsigned* demotes;
>
> JS_ASSERT(exit->from->root->code());
>-
>- demotes = ALLOCA_UNDEMOTE_SLOTLIST(exit->numStackSlots);
>- CLEAR_UNDEMOTE_SLOTLIST(demotes);
>-
>- uint8* t2 = getTypeMap(exit) + exit->numGlobalSlots;
>+
>+ /* Make sure any doubles are not accidentally undemoted */
>+ uint8* m = getStackTypeMap(exit);
> for (unsigned i = 0; i < exit->numStackSlots; i++) {
>- if (t2[i] == JSVAL_DOUBLE)
>- ADD_UNDEMOTE_SLOT(demotes, i);
>- }
>-
>- if (!NUM_UNDEMOTE_SLOTS(demotes))
>- demotes = NULL;
>-
>- if (!js_RecordTree(cx, tm, from->first, outer, demotes))
>- return false;
>-
>- tm->recorder->setPromotedPeer(demotes ? from : NULL);
>-
>- return true;
>+ if (m[i] == JSVAL_DOUBLE)
>+ oracle.markStackSlotUndemotable(cx, i);
>+ }
>+ m = getGlobalTypeMap(exit);
>+ for (unsigned i = 0; i < exit->numGlobalSlots; i++) {
>+ if (m[i] == JSVAL_DOUBLE)
>+ oracle.markGlobalSlotUndemotable(cx, i);
>+ }
>+
>+ return js_RecordTree(cx, tm, from->first, outer);
> }
>
> static JS_REQUIRES_STACK bool
>@@ -3129,16 +3145,16 @@
> if (++c->hits() >= HOTEXIT) {
> /* start tracing secondary trace from this point */
> c->lirbuf = f->lirbuf;
>+ unsigned stackSlots;
> unsigned ngslots;
>- uint8* globalTypeMap;
>- uint8* stackTypeMap;
>+ uint8* typeMap;
> TypeMap fullMap;
> if (exitedFrom == NULL) {
> /* If we are coming straight from a simple side exit, just use that exit's type map
> as starting point. */
> ngslots = anchor->numGlobalSlots;
>- globalTypeMap = getTypeMap(anchor);
>- stackTypeMap = globalTypeMap + ngslots;
>+ stackSlots = anchor->numStackSlots;
>+ typeMap = getFullTypeMap(anchor);
> } else {
> /* If we side-exited on a loop exit and continue on a nesting guard, the nesting
> guard (anchor) has the type information for everything below the current scope,
>@@ -3146,14 +3162,15 @@
> scope (and whatever it inlined). We have to merge those maps here. */
> VMSideExit* e1 = anchor;
> VMSideExit* e2 = exitedFrom;
>- fullMap.add(getTypeMap(e1) + e1->numGlobalSlots, e1->numStackSlotsBelowCurrentFrame);
>- fullMap.add(getTypeMap(e2) + e2->numGlobalSlots, e2->numStackSlots);
>+ fullMap.add(getStackTypeMap(e1), e1->numStackSlotsBelowCurrentFrame);
>+ fullMap.add(getStackTypeMap(e2), e2->numStackSlots);
>+ stackSlots = fullMap.length();
>+ fullMap.add(getGlobalTypeMap(e2), e2->numGlobalSlots);
> ngslots = e2->numGlobalSlots;
>- globalTypeMap = getTypeMap(e2);
>- stackTypeMap = fullMap.data();
>+ typeMap = fullMap.data();
> }
>- return js_StartRecorder(cx, anchor, c, (TreeInfo*)f->vmprivate,
>- ngslots, globalTypeMap, stackTypeMap, exitedFrom, outer);
>+ return js_StartRecorder(cx, anchor, c, (TreeInfo*)f->vmprivate, stackSlots,
>+ ngslots, typeMap, exitedFrom, outer);
> }
> return false;
> }
>@@ -3185,10 +3202,7 @@
>
> bool demote;
> Fragment* f = r->getFragment();
>- TreeInfo* ti = r->getTreeInfo();
>- unsigned* demotes = ALLOCA_UNDEMOTE_SLOTLIST(ti->stackTypeMap.length());
>- r->closeLoop(fragmento, demote, demotes);
>- JS_ASSERT(!demote || NUM_UNDEMOTE_SLOTS(demotes));
>+ r->closeLoop(fragmento, demote);
> js_DeleteRecorder(cx);
>
> /*
>@@ -3196,7 +3210,7 @@
> * compiler again here since we didn't return to the loop header.
> */
> if (demote && !walkedOutOfLoop)
>- return js_RecordTree(cx, tm, f, NULL, demotes);
>+ return js_RecordTree(cx, tm, f, NULL);
> return false;
> }
>
>@@ -3248,16 +3262,10 @@
> Fragment* empty;
> bool trash = false;
> bool success = false;
>- unsigned* demotes = NULL;
>
> f = r->findNestedCompatiblePeer(f, &empty);
>- if (f && f->code()) {
>- TreeInfo* ti = (TreeInfo*)f->vmprivate;
>- /* alloca docs says it lasts out of scopes. */
>- demotes = ALLOCA_UNDEMOTE_SLOTLIST(ti->stackTypeMap.length());
>- CLEAR_UNDEMOTE_SLOTLIST(demotes);
>- success = r->adjustCallerTypes(f, demotes, trash);
>- }
>+ if (f && f->code())
>+ success = r->adjustCallerTypes(f, trash);
>
> if (!success) {
> AUDIT(noCompatInnerTrees);
>@@ -3277,7 +3285,7 @@
> if (old->recordAttempts < MAX_MISMATCH)
> old->resetHits();
> f = empty ? empty : tm->fragmento->getAnchor(cx->fp->regs->pc);
>- return js_RecordTree(cx, tm, f, old, demotes);
>+ return js_RecordTree(cx, tm, f, old);
> }
>
> r->prepareTreeCall(f);
>@@ -3383,16 +3391,6 @@
> tm = &JS_TRACE_MONITOR(cx);
> unsigned int ngslots = tm->globalSlots->length();
> uint16* gslots = tm->globalSlots->data();
>- uint8* m = tm->globalTypeMap->data();
>-
>- if (ngslots) {
>- FORALL_GLOBAL_SLOTS(cx, ngslots, gslots,
>- debug_only_v(printf("%s%d=", vpname, vpnum);)
>- if (!js_IsEntryTypeCompatible(vp, m))
>- return NULL;
>- m++;
>- );
>- }
>
> /* We keep a maximum tally - we want to select the peer most likely to work so we don't keep
> * recording.
>@@ -3409,11 +3407,15 @@
>
> unsigned demotes = 0;
> ti = (TreeInfo*)f->vmprivate;
>- m = ti->stackTypeMap.data();
>
> debug_only_v(printf("checking nested types %p: ", f);)
>
>- FORALL_SLOTS_IN_PENDING_FRAMES(cx, 0,
>+ if (ngslots > ti->globalSlots())
>+ ti->typeMap.captureMissingGlobalTypes(cx, *tm->globalSlots, ti->stackSlots);
>+
>+ uint8* m = ti->typeMap.data();
>+
>+ FORALL_SLOTS(cx, ngslots, gslots, 0,
> debug_only_v(printf("%s%d=", vpname, vpnum);)
> if (!js_IsEntryTypeCompatible(vp, m))
> goto check_fail;
>@@ -3423,7 +3425,7 @@
> demotes++;
> m++;
> );
>- JS_ASSERT(unsigned(m - ti->stackTypeMap.data()) == ti->stackTypeMap.length());
>+ JS_ASSERT(unsigned(m - ti->typeMap.data()) == ti->typeMap.length());
>
> debug_only_v(printf(" (demotes %d)\n", demotes);)
>
>@@ -3463,25 +3465,25 @@
> tm = &JS_TRACE_MONITOR(cx);
> unsigned int ngslots = tm->globalSlots->length();
> uint16* gslots = tm->globalSlots->data();
>- uint8* m = tm->globalTypeMap->data();
>-
>- if (ngslots) {
>- FORALL_GLOBAL_SLOTS(cx, ngslots, gslots,
>- debug_only_v(printf("%s%d=", vpname, vpnum);)
>- if (!js_IsEntryTypeCompatible(vp, m))
>- goto check_fail;
>- m++;
>- );
>- }
>-
>- m = ti->stackTypeMap.data();
>- FORALL_SLOTS_IN_PENDING_FRAMES(cx, 0,
>+
>+ JS_ASSERT(ti->stackSlots == js_NativeStackSlots(cx, 0));
>+
>+ if (ngslots > ti->globalSlots())
>+ ti->typeMap.captureMissingGlobalTypes(cx, *tm->globalSlots, ti->stackSlots);
>+
>+ uint8* m = ti->typeMap.data();
>+
>+ JS_ASSERT(ti->typeMap.length() == js_NativeStackSlots(cx, 0) + ngslots);
>+ JS_ASSERT(ti->typeMap.length() == ti->stackSlots + ngslots);
>+ JS_ASSERT(ti->globalSlots() == ngslots);
>+ FORALL_SLOTS(cx, ngslots, gslots, 0,
> debug_only_v(printf("%s%d=", vpname, vpnum);)
>+ JS_ASSERT(*m != 0xCD);
> if (!js_IsEntryTypeCompatible(vp, m))
> goto check_fail;
> m++;
> );
>- JS_ASSERT(unsigned(m - ti->stackTypeMap.data()) == ti->stackTypeMap.length());
>+ JS_ASSERT(unsigned(m - ti->typeMap.data()) == ti->typeMap.length());
>
> debug_only_v(printf("\n");)
> return true;
>@@ -3548,9 +3550,11 @@
> ti->maxNativeStackSlots,
> f->code());)
>
>- if (ngslots)
>- BuildNativeGlobalFrame(cx, ngslots, gslots, tm->globalTypeMap->data(), global);
>- BuildNativeStackFrame(cx, 0/*callDepth*/, ti->stackTypeMap.data(), stack);
>+ JS_ASSERT(ti->globalSlots() == ngslots);
>+
>+ if (ngslots)
>+ BuildNativeGlobalFrame(cx, ngslots, gslots, ti->globalTypeMap(), global);
>+ BuildNativeStackFrame(cx, 0/*callDepth*/, ti->typeMap.data(), stack);
>
> double* entry_sp = &stack[ti->nativeStackBase/sizeof(double)];
> FrameInfo callstack_buffer[MAX_CALL_STACK_ENTRIES];
>@@ -3717,17 +3721,17 @@
> cycles));
>
> /* If this trace is part of a tree, later branches might have added additional globals for
>- with we don't have any type information available in the side exit. We merge in this
>+ which we don't have any type information available in the side exit. We merge in this
> information from the entry type-map. See also comment in the constructor of TraceRecorder
> why this is always safe to do. */
> unsigned exit_gslots = innermost->numGlobalSlots;
>- JS_ASSERT(ngslots == tm->globalTypeMap->length());
>+ JS_ASSERT(ngslots == ti->globalSlots());
> JS_ASSERT(ngslots >= exit_gslots);
>- uint8* globalTypeMap = getTypeMap(innermost);
>+ uint8* globalTypeMap = getGlobalTypeMap(innermost);
> if (exit_gslots < ngslots)
>- mergeTypeMaps(&globalTypeMap, &exit_gslots, tm->globalTypeMap->data(), ngslots,
>+ mergeTypeMaps(&globalTypeMap, &exit_gslots, ti->globalTypeMap(), ngslots,
> (uint8*)alloca(sizeof(uint8) * ngslots));
>- JS_ASSERT(exit_gslots == tm->globalTypeMap->length());
>+ JS_ASSERT(exit_gslots == ti->globalSlots());
>
> /* write back interned globals */
> int slots = FlushNativeGlobalFrame(cx, exit_gslots, gslots, globalTypeMap, global);
>@@ -3738,7 +3742,7 @@
>
> /* write back native stack frame */
> slots = FlushNativeStackFrame(cx, innermost->calldepth,
>- getTypeMap(innermost) + innermost->numGlobalSlots,
>+ getStackTypeMap(innermost),
> stack, NULL);
> if (slots < 0)
> return NULL;
>@@ -3796,7 +3800,7 @@
> if (++f->hits() >= HOTLOOP) {
> /* We can give RecordTree the root peer. If that peer is already taken, it will
> walk the peer list and find us a free slot or allocate a new tree if needed. */
>- return js_RecordTree(cx, tm, f->first, NULL, NULL);
>+ return js_RecordTree(cx, tm, f->first, NULL);
> }
> /* Threshold not reached yet. */
> return false;
>@@ -4022,12 +4026,11 @@
> }
> #endif
> if (!tm->fragmento) {
>- JS_ASSERT(!tm->globalSlots && !tm->globalTypeMap && !tm->recoveryDoublePool);
>+ JS_ASSERT(!tm->globalSlots && !tm->recoveryDoublePool);
> Fragmento* fragmento = new (&gc) Fragmento(core, 24);
> verbose_only(fragmento->labels = new (&gc) LabelMap(core, NULL);)
> tm->fragmento = fragmento;
> tm->globalSlots = new (&gc) SlotList();
>- tm->globalTypeMap = new (&gc) TypeMap();
> tm->recoveryDoublePoolPtr = tm->recoveryDoublePool = new jsval[MAX_NATIVE_STACK_SLOTS];
> }
> if (!tm->reFragmento) {
>@@ -4059,14 +4062,12 @@
> }
> #endif
> if (tm->fragmento != NULL) {
>- JS_ASSERT(tm->globalSlots && tm->globalTypeMap && tm->recoveryDoublePool);
>+ JS_ASSERT(tm->globalSlots && tm->recoveryDoublePool);
> verbose_only(delete tm->fragmento->labels;)
> delete tm->fragmento;
> tm->fragmento = NULL;
> delete tm->globalSlots;
> tm->globalSlots = NULL;
>- delete tm->globalTypeMap;
>- tm->globalTypeMap = NULL;
> delete[] tm->recoveryDoublePool;
> tm->recoveryDoublePool = tm->recoveryDoublePoolPtr = NULL;
> }
>@@ -4133,7 +4134,6 @@
> if (cx->fp) {
> tm->globalShape = OBJ_SHAPE(JS_GetGlobalForObject(cx, cx->fp->scopeChain));
> tm->globalSlots->clear();
>- tm->globalTypeMap->clear();
> }
> }
>
>@@ -8612,20 +8612,24 @@
> printf("fragment %p:\nENTRY: ", f);
> ti = (TreeInfo*)f->vmprivate;
> if (looped)
>- JS_ASSERT(ti->stackTypeMap.length() == length);
>- for (unsigned i = 0; i < ti->stackTypeMap.length(); i++)
>- printf("%d ", ti->stackTypeMap.data()[i]);
>+ JS_ASSERT(ti->typeMap.length() == length);
>+ for (unsigned i = 0; i < ti->stackSlots; i++)
>+ printf("S%d ", ti->stackTypeMap()[i]);
>+ for (unsigned i = 0; i < ti->globalSlots(); i++)
>+ printf("G%d ", ti->globalTypeMap()[i]);
> printf("\n");
> UnstableExit* uexit = ti->unstableExits;
> while (uexit != NULL) {
> printf("EXIT: ");
>- uint8* m = getTypeMap(uexit->exit) + uexit->exit->numGlobalSlots;
>+ uint8* m = getFullTypeMap(uexit->exit);
> for (unsigned i = 0; i < uexit->exit->numStackSlots; i++)
>- printf("%d ", m[i]);
>+ printf("S%d ", m[i]);
>+ for (unsigned i = 0; i < uexit->exit->numGlobalSlots; i++)
>+ printf("G%d ", m[uexit->exit->numStackSlots + i]);
> printf("\n");
> uexit = uexit->next;
> }
>- length = ti->stackTypeMap.length();
>+ length = ti->typeMap.length();
> looped = true;
> }
> }
>diff -r d0e8862aa513 js/src/jstracer.h
>--- a/js/src/jstracer.h Mon Dec 29 23:22:23 2008 -0800
>+++ b/js/src/jstracer.h Tue Dec 30 22:42:23 2008 -0500
>@@ -66,6 +66,9 @@
> while (_max < size)
> _max <<= 1;
> _data = (T*)realloc(_data, _max * sizeof(T));
>+#if defined(DEBUG)
>+ memset(&_data[_len], 0xcd, _max - _len);
>+#endif
> }
> public:
> Queue(unsigned max = 16) {
>@@ -173,8 +176,10 @@
>
> class TypeMap : public Queue<uint8> {
> public:
>- JS_REQUIRES_STACK void captureGlobalTypes(JSContext* cx, SlotList& slots);
>- JS_REQUIRES_STACK void captureStackTypes(JSContext* cx, unsigned callDepth);
>+ JS_REQUIRES_STACK void captureTypes(JSContext* cx, SlotList& slots, unsigned callDepth);
>+ JS_REQUIRES_STACK void captureMissingGlobalTypes(JSContext* cx,
>+ SlotList& slots,
>+ unsigned stackSlots);
Could be fewer lines too.
> bool matches(TypeMap& other) const;
> };
>
>@@ -202,9 +207,19 @@
> ExitType exitType;
> };
>
>-static inline uint8* getTypeMap(nanojit::SideExit* exit)
>+static inline uint8* getStackTypeMap(nanojit::SideExit* exit)
> {
> return (uint8*)(((VMSideExit*)exit) + 1);
>+}
>+
>+static inline uint8* getGlobalTypeMap(nanojit::SideExit* exit)
>+{
>+ return getStackTypeMap(exit) + ((VMSideExit*)exit)->numStackSlots;
>+}
>+
>+static inline uint8* getFullTypeMap(nanojit::SideExit* exit)
>+{
>+ return getStackTypeMap(exit);
> }
>
> struct InterpState
>@@ -235,7 +250,8 @@
> unsigned maxNativeStackSlots;
> ptrdiff_t nativeStackBase;
> unsigned maxCallDepth;
>- TypeMap stackTypeMap;
>+ TypeMap typeMap;
>+ unsigned stackSlots;
> Queue<nanojit::Fragment*> dependentTrees;
> unsigned branchCount;
> Queue<VMSideExit*> sideExits;
>@@ -245,6 +261,16 @@
> fragment = _fragment;
> }
> ~TreeInfo();
>+
>+ inline unsigned globalSlots() {
>+ return typeMap.length() - stackSlots;
>+ }
>+ inline uint8* globalTypeMap() {
>+ return typeMap.data() + stackSlots;
>+ }
>+ inline uint8* stackTypeMap() {
>+ return typeMap.data();
>+ }
> };
>
> struct FrameInfo {
>@@ -304,7 +330,6 @@
> bool terminate;
> intptr_t terminate_ip_adj;
> nanojit::Fragment* outerToBlacklist;
>- nanojit::Fragment* promotedPeer;
> TraceRecorder* nextRecorderToAbort;
> bool wasRootFragment;
>
>@@ -313,8 +338,8 @@
> JS_REQUIRES_STACK ptrdiff_t nativeStackOffset(jsval* p) const;
> JS_REQUIRES_STACK void import(nanojit::LIns* base, ptrdiff_t offset, jsval* p, uint8& t,
> const char *prefix, uintN index, JSStackFrame *fp);
>- JS_REQUIRES_STACK void import(TreeInfo* treeInfo, nanojit::LIns* sp, unsigned ngslots,
>- unsigned callDepth, uint8* globalTypeMap, uint8* stackTypeMap);
>+ JS_REQUIRES_STACK void import(TreeInfo* treeInfo, nanojit::LIns* sp, unsigned stackSlots,
>+ unsigned callDepth, unsigned ngslots, uint8* typeMap);
> void trackNativeStackUse(unsigned slots);
>
> JS_REQUIRES_STACK bool lazilyImportGlobalSlot(unsigned slot);
>@@ -332,7 +357,8 @@
> JS_REQUIRES_STACK bool checkType(jsval& v, uint8 t, jsval*& stage_val,
> nanojit::LIns*& stage_ins, unsigned& stage_count);
> JS_REQUIRES_STACK bool deduceTypeStability(nanojit::Fragment* root_peer,
>- nanojit::Fragment** stable_peer, unsigned* demotes);
>+ nanojit::Fragment** stable_peer,
>+ bool& demote);
Fits on the same line.
>
> JS_REQUIRES_STACK jsval& argval(unsigned n) const;
> JS_REQUIRES_STACK jsval& varval(unsigned n) const;
>@@ -437,7 +463,7 @@
> public:
> JS_REQUIRES_STACK
> TraceRecorder(JSContext* cx, VMSideExit*, nanojit::Fragment*, TreeInfo*,
>- unsigned ngslots, uint8* globalTypeMap, uint8* stackTypeMap,
>+ unsigned stackSlots, unsigned ngslots, uint8* typeMap,
> VMSideExit* expectedInnerExit, nanojit::Fragment* outerToBlacklist);
> ~TraceRecorder();
>
>@@ -448,14 +474,12 @@
> nanojit::Fragment* getFragment() const { return fragment; }
> JS_REQUIRES_STACK bool isLoopHeader(JSContext* cx) const;
> JS_REQUIRES_STACK void compile(nanojit::Fragmento* fragmento);
>- JS_REQUIRES_STACK bool closeLoop(nanojit::Fragmento* fragmento, bool& demote,
>- unsigned *demotes);
>+ JS_REQUIRES_STACK bool closeLoop(nanojit::Fragmento* fragmento, bool& demote);
> JS_REQUIRES_STACK void endLoop(nanojit::Fragmento* fragmento);
> JS_REQUIRES_STACK void joinEdgesToEntry(nanojit::Fragmento* fragmento,
> nanojit::Fragment* peer_root);
> void blacklist() { fragment->blacklist(); }
>- JS_REQUIRES_STACK bool adjustCallerTypes(nanojit::Fragment* f, unsigned* demote_slots,
>- bool& trash);
>+ JS_REQUIRES_STACK bool adjustCallerTypes(nanojit::Fragment* f, bool& trash);
> JS_REQUIRES_STACK nanojit::Fragment* findNestedCompatiblePeer(nanojit::Fragment* f,
> nanojit::Fragment** empty);
> JS_REQUIRES_STACK void prepareTreeCall(nanojit::Fragment* inner);
>@@ -477,7 +501,6 @@
> void deepAbort() { deepAborted = true; }
> bool wasDeepAborted() { return deepAborted; }
> bool walkedOutOfLoop() { return terminate; }
>- void setPromotedPeer(nanojit::Fragment* peer) { promotedPeer = peer; }
> TreeInfo* getTreeInfo() { return treeInfo; }
>
> #define OPDEF(op,val,name,token,length,nuses,ndefs,prec,format) \
Attachment #354911 -
Flags: review+
fasta gets 10ms faster.
fannkuch gets 10ms slower.
bench gets 100ms slower on my machine, but time.sh doesn't show any differences that add up to such a big penalty.
Someone else might want to give SunSpider a run with this patch; I don't have any other computers to play with for a few days.
Attachment #354911 -
Attachment is obsolete: true
Assignee
Comment 9•16 years ago
When run consecutively in one test run, I get the following results without the patch:
207:src gal$ ./bench2.sh
t/3d-cube.js 44
t/3d-morph.js 31
t/3d-raytrace.js 83
t/access-binary-trees.js 40
t/access-fannkuch.js 57
t/access-nbody.js 30
t/access-nsieve.js 13
t/bitops-3bit-bits-in-byte.js 2
t/bitops-bits-in-byte.js 8
t/bitops-bitwise-and.js 2
t/bitops-nsieve-bits.js 26
t/controlflow-recursive.js 33
t/crypto-aes.js 39
t/crypto-md5.js 20
t/crypto-sha1.js 6
t/date-format-tofte.js 119
t/date-format-xparb.js 103
t/math-cordic.js 19
t/math-partial-sums.js 15
t/math-spectral-norm.js 8
t/regexp-dna.js 77
t/string-base64.js 17
t/string-fasta.js 77
t/string-tagcloud.js 119
t/string-unpack-code.js 135
t/string-validate-input.js 37
And this is with the patch:
207:src gal$ ./bench2.sh
t/3d-cube.js 48
t/3d-morph.js 31
t/3d-raytrace.js 84
t/access-binary-trees.js 40
t/access-fannkuch.js 56
t/access-nbody.js 24
t/access-nsieve.js 12
t/bitops-3bit-bits-in-byte.js 2
t/bitops-bits-in-byte.js 13
t/bitops-bitwise-and.js 3
t/bitops-nsieve-bits.js 25
t/controlflow-recursive.js 35
t/crypto-aes.js 40
t/crypto-md5.js 19
t/crypto-sha1.js 9
t/date-format-tofte.js 121
t/date-format-xparb.js 102
t/math-cordic.js 24
t/math-partial-sums.js 15
t/math-spectral-norm.js 7
t/regexp-dna.js 76
t/string-base64.js 16
t/string-fasta.js 141
t/string-tagcloud.js 118
t/string-unpack-code.js 129
t/string-validate-input.js 38
fasta gets 70ms slower. It doesn't happen when fasta is run separately.
Assignee
Comment 10•16 years ago
This is the modified bench script I used:
#!/bin/bash
X="var d = Date.now();";
for i in t/*.js; do X="$X load(\"$i\"); var d2 = Date.now(); print(\"$i \" + (d2 - d)); d = d2; "; done
# X="$X print(Date.now() - d);"
echo $X | (./Darwin_OPT.OBJ/js -j || ./Linux_All_OPT.OBJ/js -j)
Assignee
Comment 11•16 years ago
This is not a GC issue. Triggering a GC between each load does not fix the performance regression observed in fasta.
t/3d-cube.js 46
before 1167360, after 45056, break 01800000
t/3d-morph.js 39
before 1740800, after 49152, break 01800000
t/3d-raytrace.js 87
before 1376256, after 65536, break 01800000
t/access-binary-trees.js 43
before 1454080, after 77824, break 01800000
t/access-fannkuch.js 65
before 81920, after 77824, break 01800000
t/access-nbody.js 32
before 2830336, after 81920, break 01800000
t/access-nsieve.js 14
before 81920, after 81920, break 01800000
t/bitops-3bit-bits-in-byte.js 2
before 81920, after 81920, break 01800000
t/bitops-bits-in-byte.js 8
before 81920, after 81920, break 01800000
t/bitops-bitwise-and.js 3
before 81920, after 81920, break 01800000
t/bitops-nsieve-bits.js 25
before 1302528, after 86016, break 01800000
t/controlflow-recursive.js 42
before 86016, after 86016, break 01800000
t/crypto-aes.js 46
before 815104, after 90112, break 01800000
t/crypto-md5.js 29
before 118784, after 98304, break 01800000
t/crypto-sha1.js 7
before 196608, after 102400, break 01800000
t/date-format-tofte.js 117
before 1613824, after 106496, break 01800000
t/date-format-xparb.js 107
before 1970176, after 110592, break 01800000
t/math-cordic.js 32
before 110592, after 106496, break 01800000
t/math-partial-sums.js 17
before 106496, after 106496, break 01800000
t/math-spectral-norm.js 7
before 122880, after 110592, break 01800000
t/regexp-dna.js 80
before 110652, after 110592, break 01800000
t/string-base64.js 18
before 692224, after 114688, break 01800000
t/string-fasta.js 143
before 3666028, after 118784, break 01800000
t/string-tagcloud.js 126
before 1097792, after 126976, break 01800000
t/string-unpack-code.js 146
before 1159168, after 139264, break 01800000
t/string-validate-input.js 47
before 929792, after 143360, break 01800000
Assignee
Comment 13•16 years ago
Patch is out of date again. Will hunt down danderson and re-test this with the most recent ebp perf issue fixed.
Assignee
Comment 14•16 years ago
I tried to refresh, but the patch now also fails trace-tests.js.
Updated•16 years ago
Flags: blocking1.9.1?
Refreshed again. Also fixed a bug where, when extending a branch, the TreeInfo didn't get its type map extended for missing globals.
I still get about a 100ms slowdown on my laptop when running SS through bench.sh or in the browser, but not when running tests individually. This might suggest that globals are poisoning subsequent tests somehow.
Also, it might be a good idea to introduce a peer cap alongside (or after) this patch. Somewhere in trace-tests (mandelbrot, probably) I noticed a huge amount of spew from just trying to find compatible peers in lists containing 10+ fragments. Yuck.
Attachment #355141 -
Attachment is obsolete: true
I narrowed the regression down to string-fasta running after regexp-dna. The side exits increase from 79 to around 87,000.
fasta has a main work loop (actually, a nested loop) that bangs on a table each iteration. This main loop is run twice, each time with a different input table. The tables have different shapes.
Before this patch, the second run of the main loop would hit a global type map mismatch, flush the cache, and the loop would be recompiled with the new shape. This patch removes the cache flushing and the loop never gets recompiled. Every single iteration hits a shape mismatch guard, causing the perf regression.
Changing the shape guards to BRANCH_EXIT instead of MISMATCH_EXIT brings the perf difference to a ~20ms loss.
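A hypothetical reduction of the fasta pattern described above (names and structure are illustrative, not the actual benchmark source):

// The same loop body is driven by two tables with different shapes. A
// trace recorded against the first table's shape hits its shape-mismatch
// guard on every iteration of the second run, since the cache is no
// longer flushed and the loop is never recompiled.
function work(table) {
  var sum = 0;
  for (var key in table)
    sum += table[key];
  return sum;
}
work({ a: 1, b: 2, c: 3 });        // trace recorded against this shape
work({ x: 1, y: 2, z: 3, w: 4 });  // different shape: guard exits each iteration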
Updated•16 years ago
Attachment #356388 -
Flags: review?(gal)
Assignee
Updated•16 years ago
Attachment #356388 -
Flags: review?(gal) → review+
Assignee
Comment 18•16 years ago
This patch is almost ready. We hit a chain of new bugs along the way. Those are fixed. Currently being tested. As long as the other stuff goes green, this will go in next.
Running mochitests exposed a bug where unstable exits were not joined to peers if the exit had an incomplete global typemap. This patch extends js_AttemptToStabilizeTree to check for and resolve this condition.
Mochitests pass now. I'll re-run SunSpider in the browser tomorrow to see what the current damage is.
Attachment #356388 -
Attachment is obsolete: true
Attachment #356693 -
Flags: review?(gal)
Assignee
Comment 20•16 years ago
Comment on attachment 356693 [details] [diff] [review]
v8, fixes multitrees regression
I see a 20ms perf loss.
Attachment #356693 -
Flags: review?(gal) → review+
Comment 21•16 years ago
(function() {
for (var j = 0; j < 4; ++j) {
for each (e in ['A', 1, 'A']) {
}
}
})();
Assertion failure: !ti->typeMap.matches(ti_other->typeMap), at ../jstracer.cpp:3075
Comment 22•16 years ago
I think this interacts with bug 472528 in bad ways.
Comment 23•16 years ago
var g;
(function() {
for (let j = 0; j < 4; ++j) {
g = 64;
for each (let a in [.3, .3]) {
g /= 2;
print("" + (g));
}
}
})();
On the last iteration of the outer loop, outputs "0" twice instead of "32" and "16". I can't figure out how to create a testcase that doesn't use print() :(
Nice, looks like there's more work to do here.
This was another multitrees regression, this time from oracle misuse. Still working on the correctness bug in comment #23.
Attachment #356693 -
Attachment is obsolete: true
Assignee
Comment 26•16 years ago
David is working on the fuzzer bugs; I am taking a look at the perf loss.
Assignee
Comment 27•16 years ago
TEST COMPARISON FROM TO DETAILS
=============================================================================
** TOTAL **: *1.016x as slow* 1063.5ms +/- 0.3% 1080.0ms +/- 0.2% significant
=============================================================================
3d: *1.045x as slow* 150.4ms +/- 0.7% 157.2ms +/- 0.6% significant
cube: ?? 40.4ms +/- 2.8% 41.7ms +/- 2.3% not conclusive: might be *1.032x as slow*
morph: - 30.4ms +/- 1.2% 30.2ms +/- 1.0%
raytrace: *1.072x as slow* 79.6ms +/- 0.6% 85.3ms +/- 0.4% significant
access: ?? 132.1ms +/- 0.4% 132.6ms +/- 0.6% not conclusive: might be *1.004x as slow*
binary-trees: - 40.5ms +/- 0.9% 40.2ms +/- 0.7%
fannkuch: *1.009x as slow* 55.4ms +/- 0.7% 55.9ms +/- 0.4% significant
nbody: ?? 23.7ms +/- 1.5% 24.1ms +/- 0.9% not conclusive: might be *1.017x as slow*
nsieve: - 12.5ms +/- 3.0% 12.4ms +/- 4.0%
bitops: *1.033x as slow* 36.9ms +/- 1.4% 38.1ms +/- 1.9% significant
3bit-bits-in-byte: ?? 1.4ms +/- 26.4% 1.6ms +/- 23.1% not conclusive: might be *1.143x as slow*
bits-in-byte: *1.135x as slow* 9.6ms +/- 3.8% 10.9ms +/- 2.1% significant
bitwise-and: - 2.0ms +/- 0.0% 2.0ms +/- 0.0%
nsieve-bits: - 23.9ms +/- 1.7% 23.6ms +/- 1.6%
controlflow: ?? 32.6ms +/- 1.1% 32.9ms +/- 0.7% not conclusive: might be *1.009x as slow*
recursive: ?? 32.6ms +/- 1.1% 32.9ms +/- 0.7% not conclusive: might be *1.009x as slow*
crypto: *1.015x as slow* 60.4ms +/- 0.8% 61.3ms +/- 0.6% significant
aes: *1.020x as slow* 34.6ms +/- 1.1% 35.3ms +/- 1.0% significant
md5: ?? 19.1ms +/- 1.2% 19.4ms +/- 1.9% not conclusive: might be *1.016x as slow*
sha1: - 6.7ms +/- 5.2% 6.6ms +/- 5.6%
date: ?? 202.4ms +/- 0.9% 204.2ms +/- 0.3% not conclusive: might be *1.009x as slow*
format-tofte: *1.019x as slow* 99.5ms +/- 0.5% 101.4ms +/- 0.4% significant
format-xparb: - 102.9ms +/- 1.5% 102.8ms +/- 0.3%
math: - 39.8ms +/- 1.1% 39.5ms +/- 1.3%
cordic: *1.021x as slow* 19.1ms +/- 1.2% 19.5ms +/- 1.9% significant
partial-sums: - 13.8ms +/- 2.2% 13.8ms +/- 2.2%
spectral-norm: 1.113x as fast 6.9ms +/- 3.3% 6.2ms +/- 4.9% significant
regexp: ?? 55.2ms +/- 0.5% 55.4ms +/- 0.7% not conclusive: might be *1.004x as slow*
dna: ?? 55.2ms +/- 0.5% 55.4ms +/- 0.7% not conclusive: might be *1.004x as slow*
string: *1.014x as slow* 353.7ms +/- 0.3% 358.8ms +/- 0.3% significant
base64: ?? 15.5ms +/- 2.4% 15.7ms +/- 2.2% not conclusive: might be *1.013x as slow*
fasta: *1.023x as slow* 70.9ms +/- 0.3% 72.5ms +/- 0.5% significant
tagcloud: ?? 105.8ms +/- 1.1% 106.4ms +/- 0.3% not conclusive: might be *1.006x as slow*
unpack-code: *1.014x as slow* 130.4ms +/- 0.4% 132.2ms +/- 0.3% significant
validate-input: *1.029x as slow* 31.1ms +/- 0.7% 32.0ms +/- 0.0% significant
Assignee
Comment 28•16 years ago
Raytrace is the only concrete slowdown I can observe after repeated runs. The rest is pretty noisy. I will look into this one.
raytrace: *1.073x as slow* 79.6ms +/- 0.6% 85.4ms +/- 0.6% significant
Assignee
Comment 29•16 years ago
Well this probably explains it:
whale:src gal$ ./Darwin_DBG.OBJ/js -j t/3d-raytrace.js
Assertion failure: !ti->typeMap.matches(ti_other->typeMap), at ../jstracer.cpp:3075
Comment #23 exposed that adjustCallerTypes was reading from the wrong type map for globals.
Comment #29 was caused by an overly aggressive check in js_AttemptToStabilizeTree that turned out to be wrong.
Attachment #357028 -
Attachment is obsolete: true
The code to build nested trees was busted for cases where a global integer slot needed to be a double.
I don't get any regressions using time.sh now, but bench.sh is once again not liking this patch, so I will reduce the suite to see what's causing problems.
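An illustrative sketch of the nested-tree case described above (hypothetical, not a reduced test from this bug): the inner loop forms its own tree, and a global that enters the outer loop as an integer ends up holding a double, so the outer tree must call the inner tree with that slot treated as a double.

// "g" is a global slot that starts as an integer; the inner tree turns
// it into a double, which the nested-tree call machinery has to handle
// by undemoting the slot.
var g = 0;                 // global slot, starts as an integer
for (var i = 0; i < 10; i++) {
  for (var j = 0; j < 10; j++) {
    g += 0.25;             // inner tree makes the global a double
  }
}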
Attachment #357085 -
Attachment is obsolete: true
Updated•16 years ago
Priority: -- → P1
Assignee
Comment 32•16 years ago
fasta is the culprit.
fasta: *3.55x as slow* 70.9ms +/- 0.3% 251.6ms +/- 0.4% significant
Assignee
Comment 33•16 years ago
fasta went from:
recorder: started(12), aborted(2), completed(17), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(2), breaks(3), returns(0), unstableInnerCalls(1)
monitor: triggered(77), exits(77), type mismatch(0), global mismatch(3)
to:
recorder: started(16), aborted(7), completed(23), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(3), breaks(5), returns(0), unstableInnerCalls(5)
monitor: triggered(28), exits(28), type mismatch(0), global mismatch(3)
Looks like we abort a bunch of traces and spend less time in native code.
Assignee
Comment 34•16 years ago
All the other scores look good. It's just fasta that's broken at the moment, as far as I can tell.
Comment 35•16 years ago
(In reply to comment #34)
> All the other scores look good. Its just fasta thats broken atm as far as I can
> tell.
I'm confused. Attachment 357175 [details] [diff] (v11) claims to be fast on fasta again. Is this not the case?
Assignee
Comment 36•16 years ago
Re comment #35: we fixed a mild regression in fasta (20ms), exposing a hefty regression (300ms). One step ahead, one mile back. Working on it ...
Assignee
Comment 37•16 years ago
This is the minimum code to trigger the slowdown:
load("t/regexp-dna.js");
load("t/string-fasta.js");
Commenting out the regexp load makes fasta run in 70ms. With regexp it's 260ms.
Assignee | ||
Comment 38•16 years ago
|
||
var seqs = [1,2,3,4,5];
for (j in seqs)
;
This code in front of fasta triggers the slowdown.
Assignee | ||
Comment 39•16 years ago
|
||
This seems to be sufficient as well:
for (x in [])
;
Assignee | ||
Comment 40•16 years ago
|
||
The debug build doesn't seem to show the slowdown, but we compile more with the for-in loop than without it (despite the loop being empty).
whale:src gal$ ./Darwin_DBG.OBJ/js -j x.js
128
recorder: started(16), aborted(7), completed(23), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(3), breaks(5), returns(0), unstableInnerCalls(5)
monitor: triggered(28), exits(28), type mismatch(0), global mismatch(3)
whale:src gal$ ./Darwin_DBG.OBJ/js -j x.js
131
recorder: started(17), aborted(7), completed(24), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(3), breaks(6), returns(0), unstableInnerCalls(5)
monitor: triggered(31), exits(31), type mismatch(0), global mismatch(2)
Assignee | ||
Comment 41•16 years ago
|
||
The following line in front of fasta is sufficient:
StopIteration;
Assignee | ||
Comment 42•16 years ago
|
||
Attachment #357876 -
Attachment is obsolete: true
Assignee | ||
Comment 43•16 years ago
|
||
Scary stuff. In OPT we hit js_ExecuteTree 56000 times with the 'StopIteration' in front of fasta. In DBG this doesn't happen.
whale:Darwin_OPT.OBJ gal$ ./js -j -e 'StopIteration;' ../t/string-fasta.js
execute trees: 56000
whale:Darwin_OPT.OBJ gal$ ./js -j ../t/string-fasta.js
execute trees: 28
whale:Darwin_DBG.OBJ gal$ ./js -j -e 'StopIteration;' ../t/string-fasta.js
execute trees: 31
recorder: started(17), aborted(7), completed(24), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(3), breaks(6), returns(0), unstableInnerCalls(5)
monitor: triggered(31), exits(31), type mismatch(0), global mismatch(2)
whale:Darwin_DBG.OBJ gal$ ./js -j ../t/string-fasta.js
execute trees: 28
recorder: started(16), aborted(7), completed(23), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(3), breaks(5), returns(0), unstableInnerCalls(5)
monitor: triggered(28), exits(28), type mismatch(0), global mismatch(3)
Assignee | ||
Comment 44•16 years ago
|
||
./js -j -e 'StopIteration;' ../t/string-fasta.js && ./js -j ../t/string-fasta.js
Counting # of times deduceTypeStability sets demote to false (debug):
18
18
Same for opt:
36
18
Very sketchy.
Assignee | ||
Comment 45•16 years ago
|
||
This is the case where the run slows down on OPT vs. DBG:
--- /tmp/2 2009-01-20 19:24:38.000000000 -0800
+++ /tmp/1 2009-01-20 19:24:34.000000000 -0800
@@ -1,64 +1,178 @@
-recording starting from ../t/string-fasta.js:47@14, global shape=188
+recording starting from ../t/string-fasta.js:47@14, global shape=179
closeLoop demote=0
-recording starting from ../t/string-fasta.js:48@42, global shape=188
+recording starting from ../t/string-fasta.js:48@42, global shape=179
closeLoop demote=0
-recording starting from ../t/string-fasta.js:47@14, global shape=188
+recording starting from ../t/string-fasta.js:47@14, global shape=179
closeLoop demote=0
-recording starting from ../t/string-fasta.js:48@42, global shape=188
+recording starting from ../t/string-fasta.js:48@42, global shape=179
closeLoop demote=0
-recording starting from ../t/string-fasta.js:47@20, global shape=188
+recording starting from ../t/string-fasta.js:47@20, global shape=179
Abort.
-recording starting from ../t/string-fasta.js:38@13, global shape=188
+recording starting from ../t/string-fasta.js:38@13, global shape=179
closeLoop demote=0
-recording starting from ../t/string-fasta.js:67@76, global shape=188
+recording starting from ../t/string-fasta.js:67@76, global shape=179
EndLoop
-recording starting from ../t/string-fasta.js:66@57, global shape=188
+recording starting from ../t/string-fasta.js:66@57, global shape=179
Abort.
-recording starting from ../t/string-fasta.js:67@76, global shape=188
+recording starting from ../t/string-fasta.js:67@76, global shape=179
closeLoop demote=0
-recording starting from ../t/string-fasta.js:68@89, global shape=188
+recording starting from ../t/string-fasta.js:68@89, global shape=179
closeLoop demote=0
-recording starting from ../t/string-fasta.js:66@57, global shape=188
+recording starting from ../t/string-fasta.js:66@57, global shape=179
Abort.
Here the two go out of sync. One records line 68, the other line 67.
-recording starting from ../t/string-fasta.js:68@89, global shape=188
+recording starting from ../t/string-fasta.js:67@76, global shape=179
+closeLoop demote=0
+recording starting from ../t/string-fasta.js:66@57, global shape=179
+Abort.
+recording starting from ../t/string-fasta.js:67@76, global shape=179
EndLoop
67: for (var c in table) {
68: if (r < table[c]) {
line[i] = c;
break;
}
}
It seems the OPT build exits on a branch exit, whereas DEBUG compiles a loop.
Assignee | ||
Comment 46•16 years ago
|
||
entering trace at ../t/string-fasta.js:47@14, native stack slots: 14 code: 0x24aea0
leaving trace at ../t/string-fasta.js:48@42, op=, lr=0x24920c, exitType=0, sp=2, calldepth=0
entering trace at ../t/string-fasta.js:47@14, native stack slots: 14 code: 0x24aba0
leaving trace at ../t/string-fasta.js:48@42, op=, lr=0x249df0, exitType=0, sp=2, calldepth=0
entering trace at ../t/string-fasta.js:47@14, native stack slots: 14 code: 0x24aba0
leaving trace at ../t/string-fasta.js:47@20, op=, lr=0x249cdc, exitType=0, sp=2, calldepth=0
entering trace at ../t/string-fasta.js:38@13, native stack slots: 13 code: 0x24a7c0
leaving trace at ../t/string-fasta.js:40@48, op=, lr=0x47e664, exitType=3, sp=3, calldepth=0
entering trace at ../t/string-fasta.js:67@76, native stack slots: 14 code: 0x24a6b0
leaving trace at ../t/string-fasta.js:68@89, op=, lr=0x47ea24, exitType=0, sp=4, calldepth=0
entering trace at ../t/string-fasta.js:67@76, native stack slots: 14 code: 0x24a6b0
leaving trace at ../t/string-fasta.js:70@111, op=, lr=0x47eb5c, exitType=1, sp=2, calldepth=0
entering trace at ../t/string-fasta.js:67@76, native stack slots: 14 code: 0x24a6b0
leaving trace at ../t/string-fasta.js:70@111, op=, lr=0x47eb5c, exitType=1, sp=2, calldepth=0
entering trace at ../t/string-fasta.js:67@76, native stack slots: 14 code: 0x24a520
leaving trace at ../t/string-fasta.js:68@89, op=, lr=0x47d31c, exitType=0, sp=4, calldepth=0
The OPT build exits with this side exit instead:
leaving trace at ../t/string-fasta.js:70@111, op=, lr=0x1d8b5c, exitType=1, sp=2, calldepth=0
Assignee | ||
Comment 47•16 years ago
|
||
00000: defvar "last"
00003: defvar "A"
00006: defvar "C"
00009: defvar "M"
00012: deffun function rand(max) {last = (last * A + C) % M;return max * last / M;}
00015: defvar "ALU"
00018: defvar "IUB"
00021: defvar "HomoSap"
00024: deffun function makeCumulative(table) {var last = null;for (var c in table) {if (last) {table[c] += table[last];}last = c;}}
00027: deffun function fastaRepeat(n, seq) {var seqi = 0, lenOut = 60;while (n > 0) {if (n < lenOut) {lenOut = n;}if (seqi + lenOut < seq.length) {ret = seq.substring(seqi, seqi + lenOut);seqi += lenOut;} else {var s = seq.substring(seqi);seqi = lenOut - s.length;ret = s + seq.substring(0, seqi);}n -= lenOut;}}
00030: deffun function fastaRandom(n, table) {var line = new Array(60);makeCumulative(table);while (n > 0) {if (n < line.length) {line = new Array(n);}for (var i = 0; i < line.length; i++) {var r = rand(1);for (var c in table) {if (r < table[c]) {line[i] = c;break;}}}ret = line.join("");n -= line.length;}}
00033: defvar "ret"
00036: defvar "count"
main:
00039: int8 42
00041: setgvar "last"
00044: pop
00045: uint16 3877
00048: setgvar "A"
00051: pop
00052: uint16 29573
00055: setgvar "C"
00058: pop
00059: uint24 139968
00063: setgvar "M"
00066: pop
00067: nop
00068: string "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGGAGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"
00071: setgvar "ALU"
00074: pop
00075: newinit 1
00077: double 0.27
00080: initprop "a"
00083: double 0.12
00086: initprop "c"
00089: double 0.12
00092: initprop "g"
00095: double 0.27
00098: initprop "t"
00101: double 0.02
00104: initprop "B"
00107: double 0.02
00110: initprop "D"
00113: double 0.02
00116: initprop "H"
00119: double 0.02
00122: initprop "K"
00125: double 0.02
00128: initprop "M"
00131: double 0.02
00134: initprop "N"
00137: double 0.02
00140: initprop "R"
00143: double 0.02
00146: initprop "S"
00149: double 0.02
00152: initprop "V"
00155: double 0.02
00158: initprop "W"
00161: double 0.02
00164: initprop "Y"
00167: endinit
00168: setgvar "IUB"
00171: pop
00172: newinit 1
00174: double 0.302954942668
00177: initprop "a"
00180: double 0.1979883004921
00183: initprop "c"
00186: double 0.1975473066391
00189: initprop "g"
00192: double 0.3015094502008
00195: initprop "t"
00198: endinit
00199: setgvar "HomoSap"
00202: pop
00203: nop
00204: nop
00205: nop
00206: getgvar "ret"
00209: pop
00210: int8 7
00212: setgvar "count"
00215: pop
00216: callgvar "fastaRepeat"
00219: int8 2
00221: getgvar "count"
00224: mul
00225: uint24 100000
00229: mul
00230: getgvar "ALU"
00233: call 2
00236: setgvar "ret"
00239: pop
00240: callgvar "fastaRandom"
00243: int8 3
00245: getgvar "count"
00248: mul
00249: uint16 1000
00252: mul
00253: getgvar "IUB"
00256: call 2
00259: setgvar "ret"
00262: pop
00263: callgvar "fastaRandom"
00266: int8 5
00268: getgvar "count"
00271: mul
00272: uint16 1000
00275: mul
00276: getgvar "HomoSap"
00279: call 2
00282: setgvar "ret"
00285: pop
00286: stop
Source notes:
0: 0 [ 0] setline lineno 6
2: 15 [ 15] xdelta
3: 15 [ 0] setline lineno 13
5: 18 [ 3] setline lineno 22
7: 21 [ 3] setline lineno 29
9: 33 [ 12] xdelta
10: 33 [ 0] setline lineno 79
12: 36 [ 3] setline lineno 81
14: 39 [ 3] setline lineno 1
16: 39 [ 0] setline lineno 6
18: 41 [ 2] decl offset 0
20: 44 [ 3] pcdelta offset 7
22: 51 [ 7] pcdelta offset 7
24: 58 [ 7] pcdelta offset 8
26: 67 [ 9] xdelta
27: 67 [ 0] setline lineno 8
29: 67 [ 0] funcdef function 0 (function rand(max) {last = (last * A + C) % M;return max * last / M;})
31: 68 [ 1] setline lineno 13
33: 68 [ 0] newline
34: 71 [ 3] decl offset 0
36: 75 [ 4] setline lineno 22
38: 77 [ 2] newline
39: 101 [ 24] xdelta
40: 101 [ 0] newline
41: 125 [ 24] xdelta
42: 125 [ 0] newline
43: 149 [ 24] xdelta
44: 149 [ 0] newline
45: 168 [ 19] xdelta
46: 168 [ 0] decl offset 0
48: 172 [ 4] setline lineno 29
50: 174 [ 2] newline
51: 180 [ 6] newline
52: 186 [ 6] newline
53: 192 [ 6] newline
54: 199 [ 7] decl offset 0
56: 203 [ 4] setline lineno 36
58: 203 [ 0] funcdef function 1 (function makeCumulative(table) {var last = null;for (var c in table) {if (last) {table[c] += table[last];}last = c;}})
60: 204 [ 1] setline lineno 44
62: 204 [ 0] funcdef function 2 (function fastaRepeat(n, seq) {var seqi = 0, lenOut = 60;while (n > 0) {if (n < lenOut) {lenOut = n;}if (seqi + lenOut < seq.length) {ret = seq.substring(seqi, seqi + lenOut);seqi += lenOut;} else {var s = seq.substring(seqi);seqi = lenOut - s.length;ret = s + seq.substring(0, seqi);}n -= lenOut;}})
64: 205 [ 1] setline lineno 60
66: 205 [ 0] funcdef function 3 (function fastaRandom(n, table) {var line = new Array(60);makeCumulative(table);while (n > 0) {if (n < line.length) {line = new Array(n);}for (var i = 0; i < line.length; i++) {var r = rand(1);for (var c in table) {if (r < table[c]) {line[i] = c;break;}}}ret = line.join("");n -= line.length;}})
68: 206 [ 1] setline lineno 79
70: 206 [ 0] decl offset 0
72: 210 [ 4] setline lineno 81
74: 212 [ 2] decl offset 0
76: 216 [ 4] newline
77: 233 [ 17] xdelta
78: 233 [ 0] pcbase offset 17
80: 240 [ 7] newline
81: 256 [ 16] xdelta
82: 256 [ 0] pcbase offset 16
84: 263 [ 7] newline
85: 279 [ 16] xdelta
86: 279 [ 0] pcbase offset 16
Assignee | ||
Comment 48•16 years ago
|
||
Offsets are totally out of whack. I will try to catch both versions in gdb.
Assignee | ||
Comment 49•16 years ago
|
||
Attachment #357882 -
Attachment is obsolete: true
Assignee | ||
Comment 50•16 years ago
|
||
Attachment #358035 -
Attachment is obsolete: true
Assignee | ||
Comment 51•16 years ago
|
||
Attachment #358040 -
Attachment is obsolete: true
Assignee | ||
Comment 52•16 years ago
|
||
Attachment #358041 -
Attachment is obsolete: true
Assignee | ||
Comment 53•16 years ago
|
||
Assignee | ||
Comment 54•16 years ago
|
||
Attachment #358043 -
Attachment is obsolete: true
Assignee | ||
Comment 55•16 years ago
|
||
Attachment #358060 -
Attachment is obsolete: true
Assignee | ||
Comment 56•16 years ago
|
||
DBG:
recording completed at fasta.js:15@170 via closeLoop
O/O S/S S/S D/D O/O I/I entering trace at fasta.js:15@170, native stack slots: 6 code: 0x24af20
leaving trace at fasta.js:17@194, op=, lr=0x24b2d0, exitType=1, sp=2, calldepth=0
O/O S/S S/S D/D O/O I/I entering trace at fasta.js:15@170, native stack slots: 6 code: 0x24af20
leaving trace at fasta.js:17@194, op=, lr=0x24b2d0, exitType=1, sp=2, calldepth=0
capture global type global0: 4=S
capture global type global1: 2=D
capture global type global2: 0=O
capture global type global3: 2=D
recording starting from fasta.js:13@128
OPT:
recording completed at fasta.js:15@170 via closeLoop
O/O S/S S/S D/D O/O I/I entering trace at fasta.js:15@170, native stack slots: 6 code: 0x14ef20
leaving trace at fasta.js:17@194, op=, lr=0x14f2d0, exitType=1, sp=2, calldepth=0
O/O S/S S/S D/D O/O I/I entering trace at fasta.js:15@170, native stack slots: 6 code: 0x14ef20
leaving trace at fasta.js:17@194, op=, lr=0x14f2d0, exitType=1, sp=2, calldepth=0
capture global type 0: 4=S
capture global type 0: 2=D
capture global type 0: 0=O
capture global type 0: 1=I
recording starting from fasta.js:13@128
global3 is D for DBG and I for OPT. This is the first divergence I could make out.
Assignee | ||
Comment 57•16 years ago
|
||
Bug found. vpnum is used incorrectly in an OPT build instead of n. Patch to follow.
Assignee | ||
Comment 58•16 years ago
|
||
This patch fixes the last perf regression. We are up to parity with tip.
Attachment #357175 -
Attachment is obsolete: true
Attachment #358045 -
Attachment is obsolete: true
Attachment #358067 -
Attachment is obsolete: true
Assignee | ||
Comment 59•16 years ago
|
||
Assignee | ||
Comment 60•16 years ago
|
||
Attachment #358104 -
Attachment is obsolete: true
Assignee | ||
Comment 61•16 years ago
|
||
Comment on attachment 358105 [details] [diff] [review]
v12, fixed gslots[vpnum] => gslots[n]
David isn't around, so asking for review from Brendan for the vpnum->n change only (the rest is reviewed).
Attachment #358105 -
Flags: review?(brendan)
Comment 62•16 years ago
|
||
Comment on attachment 358105 [details] [diff] [review]
v12, fixed gslots[vpnum] => gslots[n]
>+TraceRecorder::import(TreeInfo* treeInfo, LIns* sp, unsigned stackSlots, unsigned ngslots,
>+ unsigned callDepth, uint8* typeMap)
> {
> /* If we get a partial list that doesn't have all the types (i.e. recording from a side
> exit that was recorded but we added more global slots later), merge the missing types
> from the entry type map. This is safe because at the loop edge we verify that we
> have compatible types for all globals (entry type and loop edge type match). While
> a different trace of the tree might have had a guard with a different type map for
> these slots we just filled in here (the guard we continue from didn't know about them),
> since we didn't take that particular guard the only way we could have ended up here
> is if that other trace had at its end a compatible type distribution with the entry
> map. Since thats exactly what we used to fill in the types our current side exit
> didn't provide, this is always safe to do. */
> unsigned length;
>- if (ngslots < (length = traceMonitor->globalTypeMap->length()))
>- mergeTypeMaps(&globalTypeMap, &ngslots,
>- traceMonitor->globalTypeMap->data(), length,
>+ /* This is potentially the typemap of the side exit and thus shorter than the tree's
>+ global type map. */
>+ uint8* globalTypeMap = typeMap + stackSlots;
>+ if (ngslots < (length = treeInfo->globalSlots()))
Nit: this is too crunched vertically; a blank line before any comment that consumes one or more lines is better.
Nit: length is declared early and then assigned in a nested assignment on the right of < in an if condition. Much more readable to pull that assignment out and make it the initializer for the length declaration.
>+ UnstableExit* uexit;
>+ UnstableExit** tail = &ti->unstableExits;
>+ for (uexit = ti->unstableExits; uexit != NULL; uexit = uexit->next) {
>+ if (uexit->exit == exit) {
>+ *tail = uexit->next;
>+ bound = true;
>+ break;
>+ }
>+ tail = &uexit->next;
>+ }
>+ JS_ASSERT(bound);
>+ }
No uexit usage after the for loop, so move its decl into the first part of the for-head.
r=me with these, and let's have that followup bug to catch future opt-build vpnum/vpname uses!
Have fun storming the TinderboxCastle!
/be
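A minimal standalone sketch of the unlink loop with those nits applied; the UnstableExit layout is assumed from the quoted hunk, not taken from the landed code:
struct VMSideExit;                       /* only compared by pointer here */
struct UnstableExit {
    VMSideExit*   exit;
    UnstableExit* next;
};

/* Unlink the entry matching `exit` from the singly-linked list at *head;
   returns true if an entry was found and removed. The iterator lives in
   the for-head, per the review nit. */
static bool
unlinkUnstableExit(UnstableExit** head, VMSideExit* exit)
{
    UnstableExit** tail = head;
    for (UnstableExit* uexit = *head; uexit; uexit = uexit->next) {
        if (uexit->exit == exit) {
            *tail = uexit->next;
            return true;
        }
        tail = &uexit->next;
    }
    return false;
}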
Attachment #358105 -
Flags: review?(brendan) → review+
Assignee | ||
Comment 63•16 years ago
|
||
Attachment #358105 -
Attachment is obsolete: true
Comment 64•16 years ago
|
||
Comment on attachment 358109 [details] [diff] [review]
v13, with brendan's nits
Wahhh. (Andreas got the message in person. ;-)
/be
Assignee | ||
Comment 65•16 years ago
|
||
Attachment #358109 -
Attachment is obsolete: true
Comment 66•16 years ago
|
||
for (var i = 0; i < 2; ++i) {
for (var e = 0; e < 2; ++e) {
}
var c = void 0;
print("---");
for (var a = 0; a < 3; ++a) {
c <<= c;
print("" + c);
}
}
In builds without this patch, this outputs only zeros. But in builds with v12, I get "8 2048 2048" on the second time through the outer loop.
Comment 67•16 years ago
|
||
Here's a version of comment 66 that is more likely to work as an automated test:
var A = [];
for (var i = 0; i < 2; ++i) {
var c = void 0;
print;
for (var a = 0; a < 3; ++a) {
c <<= c;
A.push(c);
}
}
print(uneval(A));
Assignee | ||
Comment 68•16 years ago
|
||
This bug involves lazy resolution. If you put print("") at the top of the test case, the bug goes away:
print("");
for (var i = 0; i < 2; ++i) {
for (var e = 0; e < 2; ++e) {
}
var c = void 0;
for (var a = 0; a < 3; ++a) {
c <<= c;
print("" + c);
}
print(c);
}
Adding Math; right before the print call triggers it again:
print("");
for (var i = 0; i < 2; ++i) {
for (var e = 0; e < 2; ++e) {
}
var c = void 0;
for (var a = 0; a < 3; ++a) {
c <<= c;
Math;
print("" + c);
}
print(c);
}
captureMissingGlobalTypes was overwriting all types, not just the missing ones. Review on the interdiff requested.
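A standalone sketch of the intended behaviour (the names and container choices are assumptions, not the actual patch): entries the side exit already recorded stay untouched, and only the global slots added to the tree afterwards get their types captured now.
#include <stdint.h>
#include <vector>

typedef uint8_t TraceType;

/* typeMap holds the tree's global types; its first `known` entries came from
   the side exit and must be preserved. currentTypes holds the type observed
   right now for every tracked global slot. */
static void
captureMissingGlobalTypesSketch(const std::vector<TraceType>& currentTypes,
                                std::vector<TraceType>& typeMap,
                                size_t known)
{
    typeMap.resize(currentTypes.size());
    for (size_t n = known; n < currentTypes.size(); ++n)
        typeMap[n] = currentTypes[n];    /* fill in only the missing slots */
}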
Attachment #358111 -
Attachment is obsolete: true
Attachment #358140 -
Flags: review?(gal)
Updated•16 years ago
|
Assignee: gal → danderson
Assignee | ||
Updated•16 years ago
|
Attachment #358140 -
Flags: review?(gal) → review+
Assignee | ||
Comment 70•16 years ago
|
||
Pushed as changeset: http://hg.mozilla.org/tracemonkey/rev/186e782c623d
Please back out at the slightest sign of tinderbox sadness - I'll be around tomorrow to debug.
Assignee: danderson → gal
Assignee | ||
Comment 72•16 years ago
|
||
Our tree is currently totally busted. Looks like massive tinderbox failure on a patch that previously cycled. Did I ever voice my opinion regarding our build infrastructure?
Assignee | ||
Comment 73•16 years ago
|
||
Looks like the tinderboxes liked the patch. Victory at last!
Whiteboard: fixed-in-tracemonkey
Assignee | ||
Comment 74•16 years ago
|
||
As suggested by shaver, making vpnum a large number to catch future accidental uses of it in OPT builds. Reverting gslots[n] to gslots[vpnum] yields the following result when running the latest testcase:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x8020a140
0x0009b344 in TraceRecorder::adjustCallerTypes (this=0x20c970, f=0x0) at ../jstracer.cpp:1812
1812 FORALL_GLOBAL_SLOTS(cx, ngslots, gslots,
(gdb) bt
#0 0x0009b344 in TraceRecorder::adjustCallerTypes (this=0x20c970, f=0x0) at ../jstracer.cpp:1812
#1 0x0009fcd7 in js_RecordLoopEdge (cx=0x20a830, r=0x20c970, inlineCallCount=@0xbffff65c) at ../jstracer.cpp:3341
#2 0x0009feda in js_MonitorLoopEdge (cx=0x20a830, inlineCallCount=@0xbffff65c) at ../jstracer.cpp:3848
#3 0x0003b029 in js_Interpret (cx=0x20a830) at ../jsinterp.cpp:3095
#4 0x0003ed03 in js_Execute (cx=0x20a830, chain=0x153000, script=0x20c3c0, down=0x0, flags=0, result=0x0) at jsinterp.cpp:1564
#5 0x0000c7b8 in JS_ExecuteScript (cx=0x20a830, obj=0x0, script=0x0, rval=0x0) at ../jsapi.cpp:5083
#6 0x00006cc9 in Process ()
#7 0x0000725a in main ()
Pushed to TM: http://hg.mozilla.org/tracemonkey/rev/5dbe936e4f76
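A standalone illustration of the poisoning idea (the value and the names are assumptions, not the landed change): in builds where vpnum is not maintained, start it far out of range so any leftover gslots[vpnum] access faults at the bug site, exactly like the backtrace above, instead of quietly aliasing a valid slot.
#include <stdio.h>
#include <vector>

int main()
{
    std::vector<unsigned> gslots(4, 0);      /* stand-in for the global slot list */
#ifdef DEBUG
    unsigned vpnum = 0;                      /* maintained precisely in debug builds */
#else
    unsigned vpnum = 0x40000000;             /* deliberate poison for OPT builds */
#endif
    /* Any stray gslots[vpnum] in an OPT build now indexes wildly out of bounds
       and crashes immediately rather than misbehaving much later. */
    printf("vpnum = 0x%x, gslots has %zu entries\n", vpnum, gslots.size());
    return 0;
}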
Comment 75•16 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Updated•16 years ago
|
Whiteboard: fixed-in-tracemonkey → fixed-in-tracemonkey [needs 191 landing]
Comment 76•16 years ago
|
||
Keywords: fixed1.9.1
Whiteboard: fixed-in-tracemonkey [needs 191 landing] → fixed-in-tracemonkey
Comment 77•16 years ago
|
||
Comment 78•16 years ago
|
||
Flags: in-testsuite+
Flags: in-litmus-
Comment 79•16 years ago
|
||
Comment 80•16 years ago
|
||
js1_5/Regress/regress-469044.js
js1_8_1/trace/trace-test.js
v 1.9.1, 1.9.2
Status: RESOLVED → VERIFIED
Keywords: fixed1.9.1 → verified1.9.1