Closed Bug 610296 Opened 14 years ago Closed 11 years ago

Closures as a way to modularize JS code slows us down

Categories

(Core :: JavaScript Engine, defect)

Type: defect
Priority: Not set
Severity: normal

Status: RESOLVED FIXED

People

(Reporter: azakai, Unassigned)

Attachments

(2 files, 1 obsolete file)

982.61 KB, application/x-bzip
803.08 KB, application/octet-stream
Attached file Demo
Attached is a demo of the Bullet physics engine (compiled from C++ using emscripten) with WebGL (through the GLGE rendering engine). Instructions for how to run it are in the README.txt file.

In the file bullet.js there are two possibilities: use a closure for the Bullet library, or don't. At the very top and the very bottom of the file are a few lines marked with //=== FAST or //=== SLOW; uncomment them (the same way at the top and bottom) to test each way. The difference is that FAST does not use a closure, whereas SLOW does. Only a few lines of code differ, but they make a big difference in speed.
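
For reference, a minimal sketch of what the two variants amount to (the names here are hypothetical, not the actual bullet.js contents):

  //=== FAST: library code at top level, nothing wrapped
  var state = 0;                          // visible to all code in the page
  function simulate() { return ++state; }

  //=== SLOW: the same code wrapped in a closure
  var Bullet = (function () {
    var state = 0;                        // hidden, closure-local
    function simulate() { return ++state; }
    return { simulate: simulate };        // export the public API
  }());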

On minefield, compared to Chrome 7, I get the following (all numbers are frames per second, more is better):

Fast (without closure)
----------------------

Firefox: 15-40 during the 'action', stabilizing later at 55-60
Chrome:  45-70,                                           85

Slow (with closure)
-------------------

Firefox:  5-20 during the 'action', stabilizing later at 25
Chrome:  45-70,                                          85 (unchanged)


For Firefox the closure greatly affects performance; for Chrome it doesn't seem to matter.

Note that even without the closure Chrome is faster. This might be because the same closure pattern is also used in GLGE, the rendering engine used in the attached demo. This seems to be a common pattern in large JavaScript projects, since it's useful as a way to modularize things (|var|s in the closure are hidden and only available to that module, etc.).
Blocks: WebJSPerf
Alon, any chance you can extract a minimal-ish testcase here?  We put in a fair amount of work to make closures very fast in the tracer, but that might not have migrated to JM yet... so whether you're tracing or not might matter a good bit.  Or perhaps the access patterns involved here are slow on both.  Hard to tell with a huge chunk of JS...

See also bug 517164.
Attached file Shell test (obsolete)
I can't seem to make a minimal test case that has this problem. I did, however, manage to simplify the original testcase into one that runs in the shell, so all the browser stuff (WebGL, rendering, etc.) is irrelevant.

Run the shell test by running bullet_demo.js. Note that you need to apply the included patch, as otherwise the shell will complain about exceeding the script stack space (see bug 502836).

Results:

//=== FAST (no closure)

tm -m      0.926
v8         1.196
tm         1.398
tm -j -m   4.383
tm -j     11.258

//=== SLOW (with closure)

v8         0.635
tm -m      2.344
tm -j -m  17.124
tm        26.853
tm -j     44.706

So it looks like there are two issues here: first, closures slow us down, and this happens even with both JITs disabled. Second, the tracing JIT slows us down (by a factor of 10 or more).
Adding blocking on |JM: Tune Trace JIT Heuristics|, since, as mentioned in the comment above, in the shell test the tracing JIT slows us down by a factor of 10.
Blocks: 580468
You ran your shell tests with heuristics disabled.

Here are the numbers on my machine:

~/test% foreach args ("" "-m" "-j" "-m -j" "-m -j -p")
foreach? echo "args: $args"
foreach? ~/mozilla/tracemonkey/obj-firefox-opt/dist/bin/js $args bullet_demo.js 
foreach? end
args: 
simulation time: 19.571 seconds
args: -m
simulation time: 1.487 seconds
args: -j
simulation time: 41.726 seconds
args: -m -j
simulation time: 8.931 seconds
args: -m -j -p
simulation time: 3.492 seconds

That last run is with the heuristics enabled.  So the heuristics certainly help over raw -m -j, though they don't quite get us down to where -m is on its own.

So it really sounds like the issues here are:

1)  Closures are slow in _JM_ (known issue; see comment 1).
2)  This particular testcase doesn't trace well, for whatever reason, but
    unrelated to the speed of closures.
3)  Heuristics don't get us down to quite the JM speed.

I suggest filing a separate bug for #3, and possibly a separate bug for #2.  #1 presumably depends on bug 517164.
No longer blocks: 580468
Depends on: ClosurePerf
Oh, and you don't have to patch things to change the script space quota; in the JS shell, I just did:

  stackQuota(100000000);

at the top of bullet_demo.js and it worked fine.
I'm also not quite sure why you needed the two separate files, btw...

In any case, a profile of the -m run shows that we spend 82% of our time under js_FindPropertyHelper (split 50-50 between self time in js::PropertyCache::fill and self time in js_LookupPropertyWithFlags), so it's not the closure access per se that's being slow; it's finding where the heck to look for the values.

In both functions, the vast majority of the time is spent (according to Shark) on lea instructions in a loop like so:

0x100105270 cmp rsi, qword ptr [rdx+16]  Loop start[1]      jsscope.h:833
0x100105274 jz 0x1001051a4                                  jsscope.h:833
0x10010527a lea rax, qword ptr [rdx+48]                     jsscope.h:832
0x10010527e mov rdx, qword ptr [rdx+48]                     jsscope.h:832
0x100105282 test rdx, rdx                                   jsscope.h:832
0x100105285 jnz 0x100105270              Loop end[1]        jsscope.h:832

In fact, it's the _same_ loop in both functions; it's inlined into both of them.  In my tree, jsscope.h lines 832 and 833 are:

832        for (spp = startp; js::Shape *shape = *spp; spp = &shape->parent) {
833            if (shape->id == id) {

in Shape::search.

So the point is that we're likely just hitting L2 misses all over or something.

In fill(), we hit this loop because of this check, I think:

74     if (!pobj->nativeContains(*shape)) {

None of which is sounding like closure access per se being slow.

So why are we ending up with all the resolve/fill calls (are we missing propcache or pics somewhere?  Too many distinct shapes?)?

And how deep is the chain in search?

And can we get rid of this code in fill() somehow?
So I instrumented that loop to print the number of iterations it goes through (the number of times the loop body is entered).  Here are the statistics for just the part after the testcase prints "simulating" (the second column is the iteration count; the first column is the number of times we saw that iteration count):

   4  3
   4  4
   4  5
  11  2
  11  6
  19  100
  29  136
  63  184
  64  230
  73  202
  94  3744
  99  182
 144  83
 166  524
 213  147
 323  80
22649 1

Why are those things like 3744 and 524 and whatnot not being hashed?
I filed bug 610370 on the lack of hashing there.
Depends on: 610370
Numbers on my machine with my proposed patch for bug 610370:

args: 
simulation time: 0.584 seconds
args: -m
simulation time: 0.64 seconds
args: -j
simulation time: 7.861 seconds
args: -m -j
simulation time: 3.589 seconds
args: -m -j -p
simulation time: 1.125 seconds

So getting better, though -m being no faster than interp is sadmaking.

v8 can't run the attached testcase, so I have no idea how we compare to that.

A new profile of the -m run shows that we're now spending:

 40% of our time in mjit-generated code
 20% under stubs::GetElem; calling js::PropertyTable::search from
     js_GetProperty is half of this!  The rest is GetElem and js_GetProperty
     self time.
 16% under stubs::SetElem; again PropertyTable::search is about half of this,
     though this time we end up calling js_LookupPropertyWithFlags on the way
     there(?).
  9% is the mjit compiler itself, from ic::Call.  I wonder why.... do we keep
     generating new Function objects it has to compile?
  4% under stubs::CallName

The rest is minor stuff.  So yes, now things look like possible closure access, if the GetElem and SetElem are on closures...  dvander, does that seem possible?
Oh, and issues 2 and 3 from comment 4 remain.  Please file bugs?
This bug should be meta and track concrete bugs that get patches with tests, as bz said. Best not to make this bug about one of N concrete problems/solutions.

GetElem and SetElem are for indexed expressions, but you cannot access upvars except by unqualified names. Something else is causing these ops. What?

/be
One more thing.  With the patch for bug 610370, the "FAST" -m time is 0.485 seconds here.  The SLOW one is 0.566 seconds.  So yeah, at that point most of the cost is not closure-related at all, though they do give some slowdown (perhaps in that mjit-generated code?).

I'd assume the GetElem is mostly the HEAP access and whatnot, but I'm not sure why we end up in js::PropertyTable::search, then...
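
For context, emscripten-compiled code does essentially all of its loads and stores as indexed expressions on HEAP, which is presumably where the GetElem/SetElem traffic comes from; a sketch of the pattern (hypothetical names, not the actual generated code, and it assumes a HEAP array already exists):

  function load(ptr) {
    return HEAP[ptr];       // indexed read  => GetElem
  }
  function store(ptr, value) {
    HEAP[ptr] = value;      // indexed write => SetElem
  }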
Fwiw, there's also a performance issue with the load() call when using the closure, but I think that might be bug 501908, given that setFunctionKinds is where all the time is.  Worth remeasuring once that's fixed.
Depends on: 501908
Alon, can you make an all-in-one .js file for the shell testcase? Thanks.

It looks like HEAP is just an array, which seems not to be a dense array. Even indexed property names end up as atoms identifying shapes that map slots in such an object. If so, why is HEAP not a dense array?

/be
(In reply to comment #14)
> Alon, can you make an all-in-one .js file for the shell testcase? Thanks.
> 

I'm trying to simplify the test case now, but making it into one file won't work without the patch for the stack quota, since calling stackQuota() as bz suggested must happen before the file is loaded. So for now we need two files: a driver file that just calls stackQuota() and load()s the actual benchmark, plus the benchmark itself.
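
So the driver file is just two lines, something like this (a sketch; the driver filename is whatever you pass to the shell):

  stackQuota(100000000);    // must run before the big script is loaded
  load("bullet_demo.js");   // then load the actual benchmark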
It's not dense at least because INDEX_TOO_SPARSE (capacity is 8, index is 1.5 million or so).  But even if I make INDEX_TOO_SPARSE always return false, it's not-dense because array_trace makes sure that too-sparse arrays become not-dense during gc.  And if I disable _that_ too, then my runtime drops to 0.28 seconds under -m (about 2x faster than with the slowarray HEAP).

That's for the SLOW case (with the closure).  The closure-less case goes to 0.195 seconds under -m.

Note that -m -j -p is still a good bit slower than -m here, and we do still need a bug on that...

Of course the dense array issue is moot if the testcase uses typed arrays, once JM implements fast paths for them.  ;)
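
For illustration, the kind of store that trips INDEX_TOO_SPARSE (my example, not from the testcase):

  var HEAP = [];         // fresh array, tiny initial capacity (8 slots)
  HEAP[1500000] = 0;     // index vastly exceeds capacity, so the engine
                         // deems the array too sparse and switches it to
                         // the slow, non-dense representation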
Should be fine to just have one file and then run:

  js -e "stackQuota(100000000)" -f filename

But in any case, the two-file thing is fine by me.

In case you care, btw, the time for v8 shell on that testcase on my machine is 0.481 seconds on the FAST version and 0.21 seconds on the SLOW version.  So they're actually _faster_ on the SLOW version than the FAST one.

So if the array were dense, we'd be more or less on par with them under -m...
Depends on: 610438
Depends on: 610440
Attached file single-file shell test
(In reply to comment #17)
> Should be fine to just have one file and then run:
> 
>   js -e "stackQuota(100000000)" -f filename
> 

Ah, I didn't know about that, nice. Attached is a single-file shell testcase, runnable with

  ./js [-j -m -p etc.] -e "stackQuota(100000000)" -f bullet_test.js

Regarding typed arrays, the test can use them. It's disabled right now (to make the comparison to v8 easier), but you can change

  var TYPED_ARRAYS = false;

to true, to use typed arrays. Might be worth benchmarking that later on.
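
Presumably the toggle selects between two HEAP representations along these lines (a sketch; the size and exact setup in bullet_test.js will differ):

  var TYPED_ARRAYS = false;
  var HEAP = TYPED_ARRAYS
    ? new Float64Array(16 * 1024 * 1024)   // fixed-size, always dense
    : new Array(16 * 1024 * 1024);         // plain array, can go sparse/slow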

Filed bug 610438 and bug 610440 for issues #2 and #3 mentioned before, which we said should get their own bugs.
Attachment #488816 - Attachment is obsolete: true
Depends on: 663138
Checked on the latest tracemonkey; results with js -m are now:

no closure  : 2.503
with closure: 3.328

So the closure one is 32% slower.
Blocks: ClosurePerf
No longer depends on: ClosurePerf
Some current measurements:

no closure

m-c -m -j -p      2.6
jm  -m -j -p      2.9
jm  -m -j -p -n   4.0
v8                3.1

closure

m-c -m -j -p      3.3
jm  -m -j -p      3.6
jm  -m -j -p -n   4.5
v8                3.7

So closures make us about 25% slower. Interestingly, this now seems to happen in v8 as well.

(I filed the slowness in jm -n as bug 678687)
It seems like this simple test case is either reproducing this problem in isolation, or a similar one:

http://jsperf.com/const-vs-var-in-closures

The FF results show dramatically impaired access performance for vars that are local to the closure, compared with globals. V8 doesn't reproduce this particular glitch, though, so maybe there's more going on?
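
The comparison on that jsperf page presumably boils down to something like this (my reconstruction, not the exact page contents):

  var g = 1;                          // global var: fast to read in FF
  var M = (function () {
    var c = 1;                        // closure-local var: much slower in FF
    return {
      readGlobal:  function () { return g; },
      readClosure: function () { return c; }
    };
  }());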
Opera seems to have a similar problem, but not as pronounced as Fx (at least judging by the graphs).

This needs to be fixed. Everybody is using closures today.

Anybody have any idea why var and const are so much slower in a closure?
> Anybody having any idea why var and const is so much slower in a closure?

Sure. If you read the bug, you will too.  ;)
I'm also experiencing this slowdown in a jsperf because I'm using closures.
The jsperf is this one: http://jsperf.com/oop-benchmark/79

The dejavu case (after optimization) uses a closure, and it slows things down by a great margin (~45% in FF 17 and ~75% in FF 19 Aurora!)
A question.  Are the closures involved in the real-life use cases (as opposed to synthetic benchmarks) run once, or run multiple times?
Typically the code inside the closure is only run once. But closures typically have inner functions that are exported to outside the closure. Those are executed several times, and they are slower. E.g.:

var MyObject = (function () {
  var test = 1;

  var MyObject = function () {};
  MyObject.prototype.foo = function () { return test; };

  return MyObject;
}());
At this time, calling new MyObject() and calling foo will be ~30% slower!
Depends on: 821361
(In reply to André Cruz from comment #26)
> Typically the code inside the closures are only run once. But closures
> tipically have functions inside that are exported to outside the closure.
> Those are executed several times and they are slower. E.g.:
> 
> var MyObject = (function () {
>   var test = 1;
>
>   var MyObject = function () {};
>   MyObject.prototype.foo = function () { return test; };
>
>   return MyObject;
> }());

This run-once closure pattern is something I've wanted to optimize for a long time.  The problem here is that we treat the outermost closure as if it can run many times, which causes all sorts of deoptimizations to functions within the closure.  If we recognized that the outer closure only runs once and that these deoptimizations are not necessary, performance for those inner functions would be much better.  I just filed bug 821361 for this case.
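
To make the distinction concrete, a sketch (mine, not from the bug) contrasting a closure that genuinely runs many times with a run-once module IIFE:

  // Runs many times: each call creates a fresh scope, so the engine
  // really does need one environment per call.
  function makeCounter() {
    var n = 0;
    return function () { return ++n; };
  }
  var c1 = makeCounter(), c2 = makeCounter();   // two independent scopes

  // Runs exactly once: the outer function can never execute again, so its
  // scope could in principle be optimized like a singleton/global scope.
  var Mod = (function () {
    var hidden = 1;
    return { get: function () { return hidden; } };
  }());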
This may (or may not) provide some interesting insights into V8 closure optimizations, just FYI: http://mrale.ph/blog/2012/09/23/grokking-v8-closures-for-fun.html
Blocks: 885526
The "slow" version is now ever so slightly faster than the "fast" one.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED