Closed Bug 480132 Opened 16 years ago Closed 16 years ago

SpiderMonkey clones too many blocks into the heap

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla1.9.2a1

People

(Reporter: jimb, Assigned: jimb)

References

Details

(Keywords: verified1.9.1, Whiteboard: [tb3needs])

Attachments

(6 files, 4 obsolete files)

Clone lexical blocks only when needed. Interpreter-only version. 16 years ago Jim Blandy :jimb 17.19 KB, patch		Details \| Diff \| Splinter Review
Graph comparing profiles with and without patch 16 years ago Jim Blandy :jimb 251.48 KB, image/png		Details
Graph of level 1 instruction misses 16 years ago Jim Blandy :jimb 111.73 KB, image/png		Details
microbenchmark showing improvement 16 years ago Jim Blandy :jimb 182 bytes, application/javascript		Details
(corrected) microbench showing improvement 16 years ago Jim Blandy :jimb 338 bytes, application/javascript		Details
SunSpider benchmark results, unpatched vs. patched 16 years ago Jim Blandy :jimb 3.59 KB, text/plain		Details
Plot of runtime of SunSpider access-nsieve only 16 years ago Jim Blandy :jimb 128.42 KB, image/png		Details
Bug 480132: Clone lexical blocks only when needed. 16 years ago Jim Blandy :jimb 20.69 KB, patch	igor : review+	Details \| Diff \| Splinter Review
Bug 480132: Clone lexical blocks only when needed. 16 years ago Jim Blandy :jimb 20.79 KB, patch		Details \| Diff \| Splinter Review
Bug 480132: Clone lexical blocks only when needed. r=igor 16 years ago Jim Blandy :jimb 20.04 KB, patch		Details \| Diff \| Splinter Review

Jim Blandy :jimb

Assignee

Description

•

16 years ago

Attached patch Clone lexical blocks only when needed. Interpreter-only version. (obsolete) — Details — Splinter Review

In JavaScript, it's not possible to tell in advance which local variables may be allocated solely on the stack, and which must live in the heap because they have been closed over by a function object. When possible, it's cheaper to allocate them on the stack only. SpiderMonkey initially allocates all local variables on the stack, and points fp->blockChain at immutable, compiler-allocated block objects to track them. If we do close over those blocks, we create clones of those immutable block objects, put the clones on fp->scopeChain, and then copy the variables from the stack into the clones when we pop them. The first time SpiderMonkey clones any blocks, it sets JSFRAME_POP_BLOCKS in fp->flags. Once that flag is set, we clone all subsequent lexical blocks as soon as we enter them, whether they are closed over or not. Thus, once a frame creates a single closure (that actually captures a lexical block), all subsequent lexical block entry and exit in that frame becomes slower and increases GC load --- even if those blocks are never closed over. However, SpiderMonkey is already doing all the bookkeeping it needs to clone only those blocks that actually serve as parents of closures. The attached patch eliminates JSFRAME_POP_BLOCKS and clones only when we actually create a closure (or enter a 'with' block, *sigh*). Taking advantage of that would be independent of the work in bug 452498 ("upvar, part deux"), which pursues a different strategy for reducing block capture. I also suspect that a lot of the assertion failures like that in bug 477733 are a result of the JIT not being able to tell whether it needs to clone blocks, since JSFRAME_POP_BLOCKS is imprecise, and can be clear when we record, but set later. If SpiderMonkey were to adopt the technique implemented in this patch, it would be simpler for the tracer to recognize the situations it can't handle.

Jim Blandy :jimb

Assignee

Updated

•

16 years ago

Assignee: general → jim

Jim Blandy :jimb

Assignee

Comment 1

•

16 years ago

Comment on attachment 364108 [details] [diff] [review] Clone lexical blocks only when needed. Interpreter-only version. Jason, would you be willing to look this over? If you have suggestions for the JIT side, that'd be especially cool.

Attachment #364108 - Flags: review?(jorendorff)

Jim Blandy :jimb

Assignee

Comment 2

•

16 years ago

bench.sh didn't notice, but Sunspider says this change is a slowdown. This was pretty mysterious, as SunSpider doesn't use ENTERBLOCK at all, and so should be unaffected by the changes to js_Interp, and never make it into the changed portions of js_GetScopeChain. oprofile has a user interface only a mother could love, but I think what's going on is that several other call sites for js_GetScopeChain inline js_GSC's first test, to avoid calling it altogether --- and those tests are now wrong, so js_GetScopeChain is being called more often; we're incurring the function call overhead again. I'll put in some instrumentation to verify this, and see if I can update the call sites.

Jim Blandy :jimb

Assignee

Comment 3

•

16 years ago

I was wrong --- js_GetScopeChain is called exactly as many times with and without the patch. The heuristics at the call sites mean now something slightly different ("not in the static scope of any blocks" vs. "scope chain is already current"), but they work just as well to protect code that doesn't use let. The slowdown is due to the new code's reference to js_WithClass. Mentioning that causes GCC to emit code to find the global offset table: c11: 53 push %ebx c12: 83 ec 1c sub $0x1c,%esp c15: 8b 45 0c mov 0xc(%ebp),%eax c18: e8 fc ff ff ff call c19 <js_GetScopeChain+0xd> c19: R_386_PC32 __i686.get_pc_thunk.bx c1d: 81 c3 02 00 00 00 add $0x2,%ebx c1f: R_386_GOTPC _GLOBAL_OFFSET_TABLE_ All this code disappears if I replace '&js_WithClass' with 'NULL'. It's annoying that GCC insists on establishing the GOT pointer before it needs it.

Jim Blandy :jimb

Assignee

Comment 4

•

16 years ago

(The point being that the above code runs upon entry to js_GetScopeChain, before the first 'if', and thus lies on the fast path even though the GOT pointer is dead on the fast path.)

Jim Blandy :jimb

Assignee

Comment 5

•

16 years ago

I've changed the patches so that the js_GetScopeChain fast path no longer tries to compute the GOT address. It is still 2.3% slower. I ran SunSpider with these patches applied and not applied, 4800 runs on each, with oprofile collecting data. The attached graph shows the results of the comparison, looking at time spent in jsinterp.cpp: - Unpatched SM is on the left, patched SM is on the right. - Each horizontal row of pixels corresponds to one source line. - Labels down the sides show line numbers; the two files are aligned a la diff, so the line numbers don't line up. - Excess cycles on the left show up in green; on the right, in red. - The horizontal scale is linear, so area is proportional to time. Since control never reaches a modified line, there are no lines that show counts only on one side or the other. I've looked at the places where there is significant red. One is just before line 3900 in the patched SM. That is the code for JSOP_SUB, a use of BINARY_OP(-). I've gone through and looked at the disassembly for all six basic blocks that result from the expansion of BINARY_OP(-); they're identical. So I'm not really sure what to attribute the slowdown to. I'd love to blame it on code cache line conflicts; I'm going to re-profile with a different event counter and see.

Jim Blandy :jimb

Assignee

Comment 6

•

16 years ago

Attached image Graph comparing profiles with and without patch — Details

Jim Blandy :jimb

Assignee

Comment 7

•

16 years ago

Attached image Graph of level 1 instruction misses — Details

Here oprofile counted L1I_MISSES events (number of instruction fetch misses). You can see that the patched profile, on the right, is doing a lot worse in this regard.

Jim Blandy :jimb

Assignee

Comment 8

•

16 years ago

Attached file microbenchmark showing improvement (obsolete) — Details

For what it's worth, here's a microbenchmark showing the improvement this patch makes. SunSpider shows SpiderMonkey to be 2.63x as fast with the patch applied.

Jim Blandy :jimb

Assignee

Comment 9

•

16 years ago

Attached file (corrected) microbench showing improvement — Details

Use this one instead.

Attachment #365743 - Attachment is obsolete: true

Jim Blandy :jimb

Assignee

Comment 10

•

16 years ago

With the patch above: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 2.25x as fast 251.3ms +/- 2.0% 111.5ms +/- 4.6% significant ============================================================================= controlflow: 2.25x as fast 251.3ms +/- 2.0% 111.5ms +/- 4.6% significant blocks: 2.25x as fast 251.3ms +/- 2.0% 111.5ms +/- 4.6% significant

Robert Sayre

Comment 11

•

16 years ago

(In reply to comment #10) > With the patch above: > can you attach full sunspider results, patch vs. no patch?

Jim Blandy :jimb

Assignee

Comment 12

•

16 years ago

Attached file SunSpider benchmark results, unpatched vs. patched — Details

Jim Blandy :jimb

Assignee

Updated

•

16 years ago

Attachment #365994 - Attachment mime type: application/octet-stream → text/plain

Jim Blandy :jimb

Assignee

Comment 13

•

16 years ago

Attached image Plot of runtime of SunSpider access-nsieve only — Details

Jim Blandy :jimb

Assignee

Comment 14

•

16 years ago

In the nsieve-only results, I'm seeing jumps in runtime in JSOP_SETELEM, JSOP_GETLOCAL, and on the END_CASE for JSOP_LOOP.

Jim Blandy :jimb

Assignee

Comment 15

•

16 years ago

Here are the profile counts I get for the head of JSOP_SETELEM unpatched: : BEGIN_CASE(JSOP_SETELEM) : rval = FETCH_OPND(-1); 4793 0.1937 : 80e8fe3: mov -0x5c(%ebp),%edx 6 2.4e-04 : 80e8fe6: mov -0x4(%edx),%eax 152 0.0061 : 80e8fe9: mov %eax,-0x44(%ebp) : FETCH_OBJECT(cx, -3, lval, obj); 73 0.0030 : 80e8fec: mov -0xc(%edx),%eax 5020 0.2029 : 80e8fef: test $0x7,%al : 80e8ff1: je 80ed50d <js_Interpret+0x63d5> Here are the corresponding lines, patched: : BEGIN_CASE(JSOP_SETELEM) : rval = FETCH_OPND(-1); 5006 0.2023 : 80e8ed9: mov -0x5c(%ebp),%edx 44201 1.7864 : 80e8edc: mov -0x4(%edx),%eax 9404 0.3801 : 80e8edf: mov %eax,-0x44(%ebp) : FETCH_OBJECT(cx, -3, lval, obj); 3112 0.1258 : 80e8ee2: mov -0xc(%edx),%eax 1852 0.0748 : 80e8ee5: test $0x7,%al : 80e8ee7: je 80ed63d <js_Interpret+0x6859> The first instruction is fetching regs.sp; the second is fetching the value at the top of the stack. I don't see why that second instruction should be significantly slower.

Jim Blandy :jimb

Assignee

Comment 16

•

16 years ago

I'm going to do a longer profiling run, to have more confidence I'm not looking at noise.

Mike Shaver (:shaver emeritus)

Comment 17

•

16 years ago

Is there a pattern in the alignment of regs.sp?

David Mandelin [:dmandelin]

Comment 18

•

16 years ago

I'm going to try a little VTune on this. With the SETELEM thing, it's pretty unobvious what's going on, especially because if the first instruction is slow, it could be making the second one slow. It might even be a symptom of an L1 miss on the first instruction. You could look at indirect branch mispredict rates. You could also try inserting NOPs between the first and second instructions to see if it does or doesn't have an effect.

David Mandelin [:dmandelin]

Comment 19

•

16 years ago

As of now, I'm not seeing much of any perf difference. I got +8 ms the last time I tried it. I suspect this is just the usual noise in js_Interpret from GCC moving blocks around or frobulating its register allocator or whatever. Also, there is no good reason to believe this would cause a perf difference: it only touches js_GetScopeChain, and the total execution time of this function is negligible in all the SS benchmarks I looked at.

Jim Blandy :jimb

Assignee

Comment 20

•

16 years ago

Out of paranoia, I checked that the bytecodes being executed were not affected by the patch; they're not.

Brendan Eich [:brendan]

Comment 21

•

16 years ago

This should get in and bake, no need to worry about noise-level effects. Igor should prolly review (I owe him two reviews, so I'm ducking; he did the last major improvement to js_GetScopeChain during Firefox 3 era, with mrbkap and me as buddies). /be

Jim Blandy :jimb

Assignee

Updated

•

16 years ago

Blocks: 477733

Robert Sayre

Updated

•

16 years ago

Attachment #364108 - Flags: review?(igor)

Igor Bukanov

Comment 22

•

16 years ago

Comment on attachment 364108 [details] [diff] [review] Clone lexical blocks only when needed. Interpreter-only version. Does this passes the shell test suite? >diff --git a/js/src/jsinterp.cpp b/js/src/jsinterp.cpp >--- a/js/src/jsinterp.cpp >+++ b/js/src/jsinterp.cpp >@@ -663,11 +663,9 @@ js_FreeStack(JSContext *cx, void *mark) > JSObject * > js_GetScopeChain(JSContext *cx, JSStackFrame *fp) > { >- JSObject *obj, *cursor, *clonedChild, *parent; >- JSTempValueRooter tvr; >- >- obj = fp->blockChain; >- if (!obj) { >+ JSObject *sharedBlock = fp->blockChain; >+ >+ if (! sharedBlock) { Nit: no space after !. > if (!js_GetCallObject(cx, fp)) > return NULL; >+ /* We know we must clone everything on blockChain. */ >+ limitBlock = limitClone = NULL; Nit: blank line before the comments. >+ while (OBJ_GET_CLASS(cx, limitClone) == &js_WithClass) >+ limitClone = OBJ_GET_PARENT(cx, limitClone); >+ JS_ASSERT(limitClone); Nit: blank line before the comments. >+ if (! innermostNewChild) >+ return NULL; Nit: no space after !. >+ JSTempValueRooter tvr; >+ JS_PUSH_TEMP_ROOT_OBJECT(cx, innermostNewChild, &tvr); Nit: use JSAutoTempValueRooter here (with JSObject* constructor case added). >+ if (sharedBlock == limitBlock || ! sharedBlock) >+ break; Nit: no space after !. >+ if (! clone) { Nit: no space after !. >+ JS_ASSERT_IF(limitBlock && >+ OBJ_GET_CLASS(cx, limitBlock) == &js_BlockClass && >+ OBJ_GET_PRIVATE(cx, limitClone) == fp, >+ sharedBlock != NULL); Nit: Use just sharedBlock, not sharedBlock != NULL, as the condition. > >+#ifdef DEBUG >+ JS_ASSERT(fp->blockChain == OBJ_GET_PARENT(cx, obj)); > /* Nit: a blank line before the comment.

Jason Orendorff [:jorendorff]

Updated

•

16 years ago

Attachment #364108 - Flags: review?(jorendorff)

Jim Blandy :jimb

Assignee

Comment 23

•

16 years ago

Attached patch Bug 480132: Clone lexical blocks only when needed. (obsolete) — Details — Splinter Review

This version should cover Igor's comments. Passes js/tests and js/src/trace-test.js with JIT enabled and disabled. ---- Terminology: A "script block" is an object of class Block allocated by the byte compiler and associated with a script. Script blocks are never modified, and may be used as a prototype for a "closure block": A "closure block" is an object of class Block that holds variables that have been closed over (although we actually leave the variables on the stack until we leave their dynamic scope). A closure block is a clone of a script block (its prototype is a script block). Adjust the meanings of fp->blockChain and fp->scopeChain: fp->blockChain is always the innermost script block in whose static scope we're executing. fp->scopeChain is the current scope chain, including 'call' objects and closure blocks for those function calls and blocks in whose static scope we are currently executing, and 'with' objects for with statements; the chain is typically terminated by a global object. However, as an optimization, the young end of the chain omits block objects we have not yet needed to clone. Closures need fully reified scope chains, so have js_GetScopeChain reify any closure blocks missing from the young end of fp->scopeChain by cloning script blocks as needed from fp->blockChain. Thus, if we never actually close over a particular block, we never place a closure block for it on fp->scopeChain. Have JSOP_ENTERBLOCK and JSOP_LEAVEBLOCK always keep fp->blockChain current. When JSOP_LEAVEBLOCK pops a block from fp->blockChain that has been cloned on fp->scopeChain, pop fp->scopeChain as well. Remove the JSFRAME_POP_BLOCKS flag, as it is no longer needed. Ensure that the JIT won't have to create closure blocks or call js_PutBlockObject; it can't handle those things yet. Note our current script block when we begin recording. Abort recording if we leave that block; we can't tell in advance whether it will need to be "put" in future trace invocations. Abort recording if we call js_GetScopeChain while in the static scope of lexical blocks. Remove JIT tests based on JSFRAME_POP_BLOCKS. Verify that generators capture the correct value for blockChain. Add a constructor to JSAutoTempValueRooter for rooting JSObject pointers.

Attachment #364108 - Attachment is obsolete: true

Attachment #367130 - Flags: review?(igor)

Attachment #364108 - Flags: review?(igor)

Igor Bukanov

Comment 24

•

16 years ago

Comment on attachment 367130 [details] [diff] [review] Bug 480132: Clone lexical blocks only when needed. > JSBool >@@ -6741,51 +6800,57 @@ js_Interpret(JSContext *cx) ... >+ obj2 = fp->scopeChain; >+ while ((clasp = OBJ_GET_CLASS(cx, obj2)) == &js_WithClass) >+ obj2 = OBJ_GET_PARENT(cx, obj2); >+ if (clasp == &js_BlockClass && >+ OBJ_GET_PRIVATE(cx, obj2) == fp) { >+ obj2 = OBJ_GET_PROTO(cx, obj2); Nit: do not reuse here obj2 for something that have very different meaning. A new local like lastClonedBlock as the target of the last assignment together with an assert that it is a static lexical block would be very helpful to clarify what is going. And thanks for nice comments in the patch!

Attachment #367130 - Flags: review?(igor) → review+

Robert Sayre

Comment 25

•

16 years ago

I pushed this http://hg.mozilla.org/tracemonkey/rev/acd36c659a72 since the tree was quiet. I think I addressed igor's nit correctly, mea culpa if not. The tree was green and quiet, so worth getting the testing in, I think.

Robert Sayre

Comment 26

•

16 years ago

well, that didn't last long: http://hg.mozilla.org/tracemonkey/rev/55c8e88b7a84 Looks like the nit fix busted the static analysis compile. At least we'll get a run in on all the boxes.

Jim Blandy :jimb

Assignee

Comment 27

•

16 years ago

Attached patch Bug 480132: Clone lexical blocks only when needed. (obsolete) — Details — Splinter Review

Revised for Igor's last suggestion; he was concerned about the second use of obj2 in the section he quoted, not the first. I think this should pass the static analysis, because the new variable is local to a block being jumped over, so the 'error' label isn't within its scope. That log message... a lot of that should be source comments, not a log message. Pushed to TM as changeset: http://hg.mozilla.org/tracemonkey/rev/22e58c3984ba

Attachment #367130 - Attachment is obsolete: true

Jim Blandy :jimb

Assignee

Comment 28

•

16 years ago

Attached patch Bug 480132: Clone lexical blocks only when needed. r=igor — Details — Splinter Review

Previous patch failed static analysis checks: we should be leaving trace, not aborting recording. Simply because we reach js_GetScopeChain somehow while recording doesn't mean that we will or will not reach it from trace --- but we cannot stay on trace after cloning blocks, because JITted code does not call js_PutBlockObject.

Attachment #367355 - Attachment is obsolete: true

Jim Blandy :jimb

Assignee

Comment 29

•

16 years ago

Backed out; latest rev causes orange on Linux: http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1237017748.1237029269.2403.gz&fulltext=1

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

16 years ago

Blocks: 476655

Jim Blandy :jimb

Assignee

Comment 30

•

16 years ago

Landed in TM: http://hg.mozilla.org/tracemonkey/rev/74416c7004c5

Robert Sayre

Comment 31

•

16 years ago

http://hg.mozilla.org/mozilla-central/rev/74416c7004c5

Status: NEW → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

16 years ago

Blocks: 483749

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

16 years ago

Blocks: 472703

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

16 years ago

Blocks: 472528

Robert Sayre

Updated

•

16 years ago

Flags: blocking1.9.1+

Kent James (:rkent)

Comment 32

•

16 years ago

FYI I was getting "Assertion failure: !(fp->flags & JSFRAME_POP_BLOCKS)" from unit test http://mxr.mozilla.org/comm-central/source/mailnews/db/gloda/test/unit/test_intl.js in my Thunderbird 3.0 (Gecko 1.9.1) debug build on Windows 2003 server. I applied the patch from attachment 367356 [details] [diff] [review] on this bug, and the problem went away. This does not seem to appear in the tinderbox builds for some reason, only on my local build.

Mark Banner (:standard8)

Comment 33

•

16 years ago

(In reply to comment #32) > FYI I was getting "Assertion failure: !(fp->flags & JSFRAME_POP_BLOCKS)" from > unit test > http://mxr.mozilla.org/comm-central/source/mailnews/db/gloda/test/unit/test_intl.js FTR this is covered by bug 483857.

Blocks: 483857

Dan Mosedale (:dmosedale, :dmose)

Comment 34

•

16 years ago

Jim: what's the next step to getting this landed on the branch?

Whiteboard: [tb3needs]

Jim Blandy :jimb

Assignee

Comment 35

•

16 years ago

Landed in 1.9.1: http://hg.mozilla.org/releases/mozilla-1.9.1/rev/921352e9b2ce

Mark Banner (:standard8)

Comment 36

•

16 years ago

(In reply to comment #35) > Landed in 1.9.1: > http://hg.mozilla.org/releases/mozilla-1.9.1/rev/921352e9b2ce Adding fixed1.9.1 keyword...

Keywords: fixed1.9.1

Henrik Skupin [:whimboo][⌚️UTC+2] (away 02/17 - 02/21)

Comment 37

•

16 years ago

No backout within the last 3 weeks. I think we can safely mark this bug as verified now.

Status: RESOLVED → VERIFIED

Keywords: fixed1.9.1 → verified1.9.1

OS: Linux → All

Hardware: x86 → All

Target Milestone: --- → mozilla1.9.2a1

Version: unspecified → Trunk

You need to log in before you can comment on or make changes to this bug.