Closed Bug 543149 Opened 15 years ago Closed 15 years ago

Survey closure usage

Categories

(Core :: JavaScript Engine, defect)


Tracking


RESOLVED FIXED

People

(Reporter: dmandelin, Assigned: dmandelin)


Now we need to do a survey of how closures are used, so that we can see which design tradeoffs are best in practice.

Workloads should include:
- Benchmarks
  - SunSpider
  - V8
  - Dromaeo
- Web apps
  - Gmail
  - Google Docs
  - Facebook, including chat
  - Some Chrome Experiments
- Existing perf test cases
  - Go benchmark
  - JS VMs
  - JS raytracer, fluid simulator, Box2D

Things to measure for each test:
- Total JSOPs executed (to give a baseline of how much work is being done)
- Count closures created
  - SM full closures
  - SM flat closures
  - SM null closures
  - closures where cloning was avoided
- Variables read, written, and incremented
  - count by type of closure
  - get distribution of static scope distance from usage point to definition
  - get distribution of count of call objects between usage point and definition
  - get distribution of count of dynamic scopes (using eval) between use and def
- Tracing considerations
  - count closures read/written through different code paths
    - reading
      - directly from tracker
      - through GetFromClosure
      - through GetUpvarOnTrace
      - in interpreter
    - writing
      - directly through tracker
      - direct access of call object
      - out of range to native stack
      - out of range to call object
      - in interpreter
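For reference, the three SpiderMonkey closure kinds being counted can be sketched in plain JS. This is a hedged illustration only: the real classification rules are more subtle, and all function names here are invented.

```javascript
// Illustrative sketch of the closure kinds the survey counts.
// None of these names come from SpiderMonkey itself.

// Null closure: uses no upvars, so the engine can (in principle)
// skip giving it a fresh scope chain, or even delay cloning it.
function makeNull() {
  return function () { return 42; };    // 0 upvars
}

// Flat closure: its upvar (x) is fully defined before the closure
// can run, so the value can be copied into the function object once.
function makeFlat(x) {
  return function () { return x + 1; }; // 1 upvar, def dominates use
}

// Full closure: the upvar is mutated after the closure escapes, so
// the closure must read it live through a Call object.
function makeFull() {
  var n = 0;
  var f = function () { return n; };
  n = 7;                                // mutation forces a full closure
  return f;
}

console.log(makeNull()(), makeFlat(1)(), makeFull()());  // 42 2 7
```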
Early results: closures created by V8

  classification            count  % of parent  % of all
  all:                       9699                 100
  null:                       957      9.8         9.8
    cloned on creation:       568     59.3         5.9
    cloned by read barrier:   175     18.3         1.8
    never cloned:             214     22.4         2.2
  flat:                         2      .02         .02
  full:                      8174     84.3        84.3

Note 1: All null closures created had 0 upvars.
Note 2: Both flat closures created had 1 upvar.
Early results: upvars read by V8 with JSOP_(CALL)?NAME

Definitions:
  distance = # of lexical scopes between use and def
  callobjs = # of callobjs traversed to get the answer

Total read = 98413. This gives an average of about 10 upvar reads per closure created.

Count by distance, then by callobjs within each distance:

  distance  callobjs      N
  1                   32593
  1         1         12689
  1         2         19904
  2                   60964
  2         2          5628
  2         3         55336
  3         4          4856

Count by callobjs:

  callobjs      N
  1         12689
  2         25532
  3         55336
  4          4856

  avg #callobjs = 2.4

Getting the eval-related stats will be a bit more work. But we can do a preliminary analysis here, assuming that all callobjs need to be checked for dynamicness, but none actually are dynamic. (If there really is a dynamic scope in play then we and all engines must be slow, so that case doesn't matter much.)

Given those assumptions, to get to an upvar using a v8-like scheme we need to:

- (1 read) read the pointer to the scope chain (maybe; it might be in a register in JM)
- (2.8 reads and 1.4 cmps) for the 1.4 intermediate callobjs:
  - (1 read) read the dynamic flag
  - (1 cmp) jump conditionally on the dynamic flag
  - (1 read) read the pointer to the parent
- (3 reads and 1 cmp) for the last callobj:
  - (1 read) read the dynamic flag
  - (1 cmp) jump conditionally on the dynamic flag
  - (1 read) read dslots
  - (1 read) read the slot

Total is 6.8 reads and 2.4 cmps, or more generally (2+2K reads and K cmps), where K is the number of callobjs touched. If there is no eval in play, then it is only (2+K reads).

For comparison, JSOP_GETUPVAR has to do this much at minimum (but keep in mind that JSOP_GETUPVAR is not actually used at all in the V8 suite, and that currently JSOP_GETUPVAR calls a function, so it actually does much more work than this):

- (1 read) read the cx pointer (although it may be in a reg in JM)
- (1 read) read the display member
- (1 read) read the fp at the correct level
- (1 read) read the slot (assuming stack slab organization)

This is about equal to the stack-walk technique when K=2 without eval.
So, JSOP_GETUPVAR can be faster or slower, but we don't know yet what a typical K is for JSOP_GETUPVAR. We also don't have an estimate of the cost per variable read of updating the display yet.
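The read/compare arithmetic above can be captured in a tiny cost model. This is only a sketch of the comment's own accounting (one dynamic-flag compare per callobj touched), not engine code:

```javascript
// Cost of reading an upvar by walking K call objects, per the
// itemized breakdown above. Assumption: every callobj's dynamic
// flag must be checked when eval is possible, none when it is not.
function scopeWalkCost(K, evalPossible) {
  // 1 read for the scope chain pointer; 2 reads per intermediate
  // callobj (flag + parent); 3 for the last (flag + dslots + slot).
  // Without eval, the flag reads and compares drop out.
  const reads = evalPossible ? 2 + 2 * K : 2 + K;
  const cmps = evalPossible ? K : 0;
  return { reads, cmps };
}

// K = 2.4 (the V8 suite's average) reproduces 6.8 reads, 2.4 cmps.
console.log(scopeWalkCost(2.4, true));
// K = 2, no eval: 4 reads, matching the JSOP_GETUPVAR minimum.
console.log(scopeWalkCost(2, false));
```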
The last bits on V8:

- Writes to closure vars. The suite performs 12 writes to closure vars, all distance one, and one callobj up, from the current point.
- Incrementing closure vars. The suite also does 12 increments, all distance one, and one callobj up.
- Reading from flat closures. The suite reads 3 times from flat closures. (For reference, it creates 4 of them.)
- Total ops. The suite executes approximately 117,395,644 JSOPs.

Thus, the frequency of closure-related ops is:

  any closure var access   1 / 1,200       (1 JSOP in 1200 is of this type)
  closure var write/incr   1 / 4,800,000
  closure creation         1 / 12,800

The worst case kind of closure access for us is one that causes us to fall off trace. The cost of those is difficult to quantify; I'll do some tests of closures with tracing later, which may yield some info. We know for sure that it costs us a lot on some workloads, making TM as slow as the interpreter or slower. Otherwise, going down the slow path makes us take about 4x longer than the average JSOP, and the really slow path makes us take about 26x longer (these are numbers for tracing mode). Making closure var access no more expensive than the average JSOP would thus give about a 0.3-2.2% improvement.

So, at least for the V8 suite, the primary goal should be designing something that can be practically trace- and jaeger-compiled for any closure usage. That will be the big win. Minimizing the cycle count for closure accesses then might buy us up to another 3% total, which is worthwhile but doesn't need to be a priority.

Next step is to do this on more workloads. When that's done I can try to synthesize again and see what it all means.
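The 0.3-2.2% figure can be sanity-checked from the quoted frequency and slowdown factors. This is only a sketch of the arithmetic; the bounds come out slightly lower than the comment's rounded range:

```javascript
// Rough bound on the speedup from making every closure var access
// cost the same as an average JSOP, using the figures quoted above.
const accessFreq = 1 / 1200;      // closure var accesses per JSOP
const slowPath = 4;               // slow path ~4x an average JSOP
const reallySlowPath = 26;        // really slow path ~26x

// If an access costs c times the average op, removing the overhead
// saves accessFreq * (c - 1) of total op-execution time.
const low = accessFreq * (slowPath - 1);         // ~0.25%
const high = accessFreq * (reallySlowPath - 1);  // ~2.08%
console.log((low * 100).toFixed(2) + '% - ' + (high * 100).toFixed(2) + '%');
```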
SunSpider results:

0. Ops

N = 163,571,565

1. Closure Creation

  kind    count      %
  full   28,023   98.8
  flat        9   0.03
  null      354    1.3
  all    28,386  100.0

For the flat closures, 8 had one upvar and one had two upvars. Cloning was delayed for 10 null closures (2.8% of null closures, 0.03% of closures). None of those functions were cloned later.

2. Closure Var Access

  kind                    count      %
  null                    5,500    3.2
    dist=1                5,500  100.0
  flat                   82,261   47.5
    dist=1               82,261  100.0
  full                   85,561   49.3
    steps=1, no eval     60,060   70.2
    steps=1, with eval   25,501   29.8
  all                   173,322  100.0

No upvars were written or incremented.

3. Discussion

These results are in many ways similar to the ones for the V8 suite. The only big differences are:

- This time, about 1/2 of accesses were through flat closures, while in V8 all were through full closures.
- This time, the distance to the upvar was always 1, while in V8 it ranged from 1-3, with a mean of 2.4.

Because the null closures all had distance 1, it should be slightly faster to read along the scope chain in SS (3 loads instead of 4); emphasize 'slightly', because if we assume the read is an L1 miss but an L2 hit (L2 misses are rare), the speedup is about 1 ms over the suite. This option does take away the opportunity to skip cloning, but we only save 10 clones in this case (saving maybe about 1 us or so).

The flat closures also all had distance 1. For that case, we gain 1 extra load by going to the scope chain scheme, which again costs just 1 ms.

Short story: switching all closure var accesses to the optimized compiled scope chain walk would be expected to have no effect on SS to within 1 ms or so.
Dave, could you rerun with the patch for bug 542002 applied? If it doesn't apply, I will rebase it.

Thanks,
/be
Google Spreadsheets.

Procedure. I started a browser, logged in to Google Docs, created a small spreadsheet with a few values and formulas, then changed some fonts and formatting.

0. Op count

N = 10,231,973

1. Closure Creation

  kind           count     %
  null          16,194  72.6
    0 upvars    16,141  99.7
    1 upvar         47   0.3
    2-7 upvars       6  0.04
  flat           2,215  10.0
    1 upvar      1,264  57.1
    2 upvars       767  34.6
    3 upvars       173   7.8
    4 upvars        11   0.5
  full           3,889  17.4
  all           22,298

All flat closures were reading from 1 step up. Cloning was delayed for 4,449 null closures (27% of null closures, 20% of all closures). The delay was permanent (i.e., they never needed to be cloned) for all but 8. The total speedup from delaying cloning on this test is estimated to be about 1 ms (4,449 * 200 ns / (1M ns/ms) = 0.9 ms).

2. Closure Var Reading

  kind       count
  null         292  (all distance 1)
  flat       4,584  (all distance 1)
  full      16,439
    1 step  13,193  (no dynamic scopes in any 1-step reads)
    2 steps  3,246
  all       21,315

2 of the 2-step reads crossed a block scope, and 4 crossed an eval scope. Those were the only non-Call objects encountered on the scope chain doing closure variable reads.

3. Closure Variable Writing and Incrementing

There were 873 one-step writes and 2 two-step writes. One of the latter crossed an eval scope. There were 4 increments, all of them at one step.

4. Discussion

This workload does more closure variable access than the others: closure variable reads were about 1 in 500 ops, compared to 1 in 1000 ops for the benchmarks. Most of the accesses were only 1 or 2 steps away, which bodes well for the scope-chain-walk implementation. It's also nice that truly dynamic scopes are rare.
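The delayed-cloning estimate above is just this arithmetic (the 200 ns per clone figure is the comment's own assumption):

```javascript
// Estimated saving from skipping 4,449 null-closure clones at an
// assumed ~200 ns per clone.
const skippedClones = 4449;
const nsPerClone = 200;                            // assumed cost
const savedMs = skippedClones * nsPerClone / 1e6;  // 1e6 ns per ms
console.log(savedMs.toFixed(1) + ' ms');           // 0.9 ms
```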
Dromaeo in browser. Presumably this is also fairly reflective of jQuery-based stuff.

0. Op count

N = 1,900,585,288

1. Closure Creation

  kind     count     %
  null    17,236  10.8
  flat       769   0.5
  full   141,740  88.7
  all    159,745        = 1 in 12,000 ops

All but 17 of the null closures had 0 upvars. Cloning was avoided for 316 (2% of null closures, 0.2% of all closures).

2. Closure Var Reads

  kind          count
  null         21,550
  flat        233,525
  full      5,850,759
    1 step  5,717,000
    2 steps   110,598
    3 steps    11,021
    4 steps    12,140
  all       6,105,834  = 1 in 300 ops

2.5% of 1-step reads crossed an eval scope. 0.01% of 2-step reads crossed an eval scope. No other kinds of non-Call scope were seen.

3. Closure Var Writes

  kind          count
  incr          3,729
    1 step      3,693
    2 steps        35
  write     2,101,827
    1 step  2,101,637
    2 steps        84
    3 steps       106
  all       2,105,556  = 1 in 900 ops

81 of these writes crossed an eval scope. There were also 160,160 writes and increments (half and half) at a distance of 0 (so they are not really closure vars) in an eval scope.
General Discussion.

1. Closure Creation

Depending on the workload, about 1 in 5k-13k ops creates a closure. Thus, making this faster is not going to be much of a win. It's hard to think of an application that would need to create closures at a high rate, as well.

It appears that delayed cloning of null closures isn't really buying us much, especially relative to the risk of bugs.

2. Closure Var Read Access

About 1 in 300-1000 ops reads a closure variable. Thus, they aren't the most popular of ops, but they are fairly common. In most workloads, almost all are at a 1-callobj distance. The exception is the V8 benchmark suite, where the average distance is 2.4. Most of these reads never cross any scope other than Call objects.

Very few flat closures are created, but they are responsible for up to 50% of closure variable reads. So, they are definitely helping us in the interpreter and on trace. With JM in place, they would turn into a small speedup. Given that they are fairly simple, they do seem worth keeping.

Null closures break down into two subcases. First, we create a lot of null closures that access no closure variables. Here, they aren't really hurting us, but they don't seem to help much, either. The second subcase is null closures that do access variables. Here, they help quite a bit in the interpreter, but they hurt quite a bit on trace, because the display access path turns into convoluted code on trace. Null closures with upvars are pretty rare, and all of them are at a one-step distance, so I think we could benefit in both perf and code simplicity from replacing them with a fast scope-chain lookup in all execution modes.

Closure variable access crosses eval scopes only rarely. We should optimize for this. No eval or declenv objects were ever seen while reading closure variables in these tests.

Most upvars have fewer steps than distance. That is, the scope chain lookup path usually has functions that don't need a call object.
This tells me that many function calls don't need a call object, so the savings from creating it only when needed is probably substantial.

3. Closure Variable Write Access

About 1 op in 1k-10k writes a closure variable. So, this is uncommon, but not truly rare, and should be optimized. In particular, massive slowdowns (like falling off trace) are not acceptable. As with reads, most of these are only one step away and dynamic scopes are rare.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
(In reply to comment #8)
> General Discussion.
>
> 1. Closure Creation
>
> Depending on the workload, about 1 in 5k-13k ops creates a closure. Thus,
> making this faster is not going to be much of a win. It's hard to think of an
> application that would need to create closures at a high rate, as well.

See bug 508716 and look out for similar code that nests methods inside the constructor function. The indirect effects can be bad:

1. Turning the constructor heavyweight, so you get not only a Call object per constructor invocation, but also a different parent guard for any method call on an instance, even if from the same callsite -- leading to "inner tree is trying to grow" and other trace pathologies.

2. Making the methods assigned to this.method1, etc., cloned function objects, so for N instances each with M methods, you get M*N function objects plus the Call for the ctor, plus |this|.

> It appears that delayed cloning of null closures isn't really buying us much,
> especially relative to the risk of bugs.

The pattern described above is more common than Ollie's fluid simulator. It has been popularized by Doug Crockford among others; see his "JavaScript: The Good Parts" book. I wouldn't argue the method barrier is free of bugs or costs (no significant code is), but I would not rush to rip it out just yet.

> Null closures break down into two subcases. First, we create a lot of null
> closures that access no closure variables. Here, they aren't really hurting us,
> but they don't seem to help much, either.

Again they help in that their parent is the global object on trace, so they do not require any parent guard (see patch in progress for bug 542002 -- the current guardCallee code unnecessarily guards even if parent == globalObj). If there are outer Call object scopes and you do not classify 0-upvar functions as null closures (or equivalently, special-case their 0-upvars-used status), you'll wind up with unwanted parent variation.
Of course if we can get rid of the parent guard without regressing perf (bz took advantage of it to avoid null-testing Call object privates and branching to get or set values from stack slots, see bug ), then maybe it's ok...

> The second subcase is null closures
> that do access variables. Here, they help quite a bit in the interpreter, but
> they hurt quite a bit on trace, because the display access path turns into
> convoluted code on trace.

However, we talked today about using upvar skip/slot pairs for all accesses, from null or full closures, to navigate the scope chain. If this is fast enough then great. The interpreter may still be faster if it uses the display -- or it could be close enough to not matter. Display maintenance hurts too, although it was not a perf regression at the time it went in. If we keep the display we could avoid maintaining it when calling escaped functions that do not nest Algol-like functions.

/be
> to get or set values from stack slots, see bug ), then maybe it's ok...

Er, see bug 530255.

/be
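The nested-methods constructor pattern described above can be sketched as follows. This is illustrative code, not taken from bug 508716:

```javascript
// Methods nested inside the constructor: each `new Thing()` closes
// over the ctor's variables, so without a method barrier the engine
// makes M cloned function objects per instance (M*N clones for N
// instances), plus a Call object per construction.
function Thing(x) {
  var secret = x;                  // upvar; makes the ctor heavyweight
  this.get = function () { return secret; };
  this.bump = function () { secret += 1; };
}

var a = new Thing(1), b = new Thing(10);
a.bump();
console.log(a.get(), b.get());    // 2 10 -- each instance has its own Call
console.log(a.get === b.get);     // false -- methods are per-instance clones
```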
Data for the fluid simulator (bug 508716, http://nerget.com/fluidSim/):

opcount = 131,368,863

1. Closure Creation

  kind          count   %
  null          5,073  80
    0 upvars    5,061
    1 upvar         9
    2-4 upvars      3
  flat            541   8
  full            740  12
  all           6,354      = 1 in 20,000 ops

Cloning was skipped for 47 (1%) null closures.

2. Closure Var Reads

  kind          count
  null             86
  flat            631
  full      4,346,255
    1 step  4,259,369
    2 steps    86,886
  all       4,346,972  = 1 in 30 ops

All null and flat reads were at distance 1. No dynamic scopes were observed.

3. Closure Var Writes and Increments

There weren't many: just 4 increments at 1 step, 47 writes at 1 step, and 1 write at 2 steps.
(In reply to comment #9)
> See bug 508716 and look out for similar code that nests methods inside the
> constructor function. The indirect effects can be bad:
>
> 1. Turning the constructor heavyweight, so you get not only a Call object per
> constructor invocation,

Avoiding Call object creation is definitely a good thing, but I think it is required for the 'constructor-closure pattern' (whatever it's called) anyway:

  function createThing(x, y) {
    return { foo: function() { ... x ... } ... }
  }

The closures created all escape, so they cannot be null closures unless they access no upvars (or can they not even be null closures then, due to not escaping?). But a function with no upvars will not force any enclosing function to have a Call object.

> but also a different parent guard for any method call on an instance, even
> if from the same callsite -- leading to inner tree is trying to grow and
> other trace pathologies.

The parent guard is a real problem, but we will not get very far unless we can eliminate it for full closures as well. I think we will be able to do that now that we have a real scope chain on trace. For upvar purposes anyway, the callee parent guard is required only for accesses to variables in full closures (i.e., via JSOP_*NAME). If we recode those to do a walk up the scope chain, the on-trace access will be faster, and we won't need the callee parent guard any more for that purpose.

> 2. Making the methods assigned to this.method1, etc. cloned function objects,
> so for N instances each with M methods, you get M*N function objects plus the
> Call for the ctor, plus |this|.
The reduction I am seeing in practice is pretty low, less than 1% in all my tests but one. It was 20% in the Google Docs test, which meant that we reduced the function object count by about 5,000. That's about 320K of memory savings, which seems like not a huge deal but non-negligible.

Almost all of those 5,000 have no upvars (only 53 null closures in that test had upvars). So not cloning can't hurt on-trace upvar access for those. So I revise my opinion: not cloning costs us time when the function does have upvars (with a negligible time win), but when it has no upvars, not cloning is a win on space with no time cost.

> > It appears that delayed cloning of null closures isn't really buying us much,
> > especially relative to the risk of bugs.
>
> The pattern described above is more common than Ollie's fluid simulator. It has
> been popularized by Doug Crockford among others, see his "JavaScript: The Good
> Parts" book. I wouldn't argue the method barrier is free of bugs or costs (no
> significant code is) but I would not rush to rip it out just yet.
>
> > Null closures break down into two subcases. First, we create a lot of null
> > closures that access no closure variables. Here, they aren't really hurting
> > us, but they don't seem to help much, either.
>
> Again they help in that their parent is the global object on trace, so they do
> not require any parent guard (see patch in progress for bug 542002 -- the
> current guardCallee code unnecessarily guards even if parent == globalObj).
> If there are outer Call object scopes and you do not classify 0-upvar functions
> as null closures (or equivalently, special case their 0-upvars-used status),
> you'll wind up with unwanted parent variation.

That's true today. Just repeating myself from above, I think if we walk scope chains on trace whenever there are 1+ upvars, we can do the access faster on trace and also not need the callee parent guard at all.
> Of course if we can get rid of the parent guard without regressing perf (bz
> took advantage of it to avoid null-testing Call object privates and branching
> to get or set values from stack slots, see bug ), then maybe it's ok...
>
> > The second subcase is null closures
> > that do access variables. Here, they help quite a bit in the interpreter, but
> > they hurt quite a bit on trace, because the display access path turns into
> > convoluted code on trace.
>
> However, we talked today about using upvar skip/slot pairs for all accesses,
> from null or full closures, to navigate the scope chain. If this is fast
> enough then great.

The key thing here is that to use the skip/slot pair to be fast on trace, we need to have a scope chain, so we would only use null closures when there are no upvars.

> The interpreter may still be faster if it uses the display -- or it
> could be close enough to not matter. Display maintenance hurts too, although
> not a perf regression at the time it went in.

For the interpreter, we'll just have to do some tests. I doubt it matters a lot either way, because my tests show <5% of closure var reads being done by null closures. At least I don't see any reason we *can't* use the display in the interpreter for that kind of closure and also do the scope chain walk in compiled code.

> If we keep the display we could avoid maintaining it when calling escaped
> functions that do not nest Algol-like functions.

I have wondered idly about display maintenance costs in the past, but if it was shown empirically not to affect perf in the past, I'm pretty sure it doesn't have a noticeable cost now.
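The skip/slot access being discussed can be modeled with a toy scope chain. These data structures are purely illustrative, not SpiderMonkey's:

```javascript
// Toy model of upvar access via (skip, slot) pairs: walk `skip`
// links up a chain of Call-object-like scopes, then index `slot`.
function Scope(parent, slots) {
  this.parent = parent;  // enclosing scope, or null
  this.slots = slots;    // variable values in this scope
}

function getUpvar(scope, skip, slot) {
  while (skip-- > 0) scope = scope.parent;  // the compiled walk
  return scope.slots[slot];
}

// outer(a=1, b=2) -> inner() reading b as (skip=1, slot=1).
var outer = new Scope(null, [1, 2]);
var inner = new Scope(outer, []);
console.log(getUpvar(inner, 1, 1));  // 2
```

No dynamic-scope (eval) check is modeled here; in the comment's terms, this is the "no eval in play" fast path.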
> > If we keep the display we could avoid maintaining it when calling escaped
> > functions that do not nest Algol-like functions.
>
> I have wondered idly about display maintenance costs in the past, but if it was
> shown empirically not to affect perf in the past, I'm pretty sure it doesn't
> have a noticeable cost now.

When we start JITing fast paths for JSOP_CALL, it seems like such costs would be magnified. Similarly, when our trace transition (Trace -> Jaeger/Interp) times improve with the dual-stack, display maintenance will be one of the few (only?) things that doesn't fall into the fast fixup scheme.
(In reply to comment #12)

BTW, did you "stir the fluid" in the simulator?

> > 1. Turning the constructor heavyweight, so you get not only a Call object per
> > constructor invocation,
>
> Avoiding Call object creation is definitely a good thing, but I think it is
> required for the 'constructor-closure pattern' (whatever it's called) anyway:
>
>   function createThing(x, y) {
>     return { foo: function() { ... x ... } ... }
>   }
>
> The closures created all escape, so they cannot be null closures unless they
> access no upvars (or can they not even be null closures then due to not
> escaping?).

The issue is not that such methods should be null closures.

The issue is that the methods should be flat closures (note x and y definitions dominate all uses in the closure). One problem addressed by the patches in progress for bug 542002 is that if any method uses a "class static" (or "module static" if you prefer) from a Call wrapping the scope of the ctor, then the tip flat closure code gives up -- it wants all upvars flat, or none (full closure).

> But a function with no upvars will not force any enclosing function
> to have a Call object.

Any partial flat closure reaching both one and more than one "skip" up will currently fail to be flat and force the function one up to be heavyweight.

> The parent guard is a real problem, but we will not get very far unless we can
> eliminate it for full closures as well.

Indeed, we don't want to work hard trying to sometimes avoid the parent guard if we can kill it in all cases.

> I think we will be able to do that now that we have a real scope chain on
> trace. For upvar purposes anyway, the callee parent guard is required only for
> accesses to variables in full closures (i.e., via JSOP_*NAME).

Really? TR::guardCallee does not avoid emitting it for other cases, including flat closures. I will try that to fix my static analysis conundrum in bug 542002 now!

> > 2. Making the methods assigned to this.method1, etc. cloned function objects,
> > so for N instances each with M methods, you get M*N function objects plus the
> > Call for the ctor, plus |this|.
>
> The reduction I am seeing in practice is pretty low, less than 1% in all my
> tests but one. It was 20% in the Google Docs test, which meant that we reduced
> the function object count by about 5,000. That's about 320K memory savings,
> which seems like not a huge deal but non-negligible.

The Crockford "functional vs. prototypal" meme has not spread to Google yet. But it's out there.

I'm sounding a note of caution because "in practice" is your helpful sample, which while helpful is kinda tiny ;-).

> Almost all of those 5,000 have no upvars (only 53 null closures in that test
> had upvars). So not cloning can't hurt on-trace upvar access for those. So I
> revise my opinion to say that not cloning costs us time when the function does
> have upvars (with negligible time win), but when it has no upvars, not cloning
> is a win on space with no time cost.

And the only way we avoid cloning is if the "method" is a null closure, and the only null closures with upvars are the non-escaping ones, which contradicts the hypothesis of a "method" -- something that escapes via this.m1 = function()... or {m1: function()..., ...} -- so I can see why you found so few null closures with upvars.

That says you found few Algol-like non-funarg closures. But it does not say much about the method barrier.

Hope I'm making sense. The FUN_KIND classification is too tight, in that it makes both non-escaping upvar-using closures *and* escaping no-upvar method candidates be "null closures". The "null" does not mean empty set of upvars. Could have a better name or (why not?) inline predicates for classifying these two distinct cases.

> That's true today. Just repeating myself from above, I think if we walk scope
> chains on trace whenever there are 1+ upvars, we can do the access faster on
> trace and also not need the callee parent guard at all.

Could well be -- competition does that and wins, so we should try it.

The separate issue of avoiding 1e6 * M method clones for 1e6 constructions of an object using a ctor with M null-closure methods assigned to |this| stands.

> The key thing here is that to use the skip/slot pair to be fast on trace, we
> need to have a scope chain, so we would only use null closures when there are
> no upvars.

Right. And based on your findings so far (want a bigger sample but I'll take what you can get ;-) the Algol-like case seems rare enough to drop. This saves us from needing to maintain the display.

> At least I don't see any reason we *can't* use the display in the
> interpreter for that kind of closure and also do the scope chain walk in
> compiled code.

We could, but who maintains the display going back to the interpreter from the JITs? Sounds like pain.

Hope you can try the (unsafe, but not in practice) patch I just put in 542002. And stir that fluid!

/be
(In reply to comment #13)
> When we start JITing fast paths for JSOP_CALL, it seems like such costs would
> be magnified. Similarly, when our trace transition (Trace -> Jaeger/Interp)
> times improve with the dual-stack, display maintenance will be one of the few
> (only?) things that doesn't fall into the fast fixup scheme.

Yes, I agree. Display maintenance not regressing perf says our interpreter call path is just really slow. Let's try fixing both those issues by removing the display and making the (most used) interpreter really fast.

/be
(In reply to comment #14)
> BTW, did you "stir the fluid" in the simulator?

Yes. When I first went to that page, I was kind of confused by the big black box, but I figured it out after randomly clicking on it.

> The issue is not that such methods should be null closures.
>
> The issue is that the methods should be flat closures (note x and y definitions
> dominate all uses in the closure). One problem addressed by the patches in
> progress for bug 542002 is that if any method uses a "class static" (or "module
> static" if you prefer) from a Call wrapping the scope of the ctor, then the tip
> flat closure code gives up -- it wants all upvars flat, or none (full closure).

OK. I have no problem with that.

> > But a function with no upvars will not force any enclosing function
> > to have a Call object.
>
> Any partial flat closure reaching both one and more than one "skip" up will
> currently fail to be flat and force the function one up to be heavyweight.

That's pretty much how it has to be, right?

> > The parent guard is a real problem, but we will not get very far unless we
> > can eliminate it for full closures as well.
>
> Indeed, we don't want to work hard trying to sometimes avoid the parent guard
> if we can kill it in all cases.

Yes, I have some new ideas for this. I need to check them over myself. I'll file some bugs if it looks like they will work.
> > I think we will be able to do that now that we have a real scope chain on
> > trace. For upvar purposes anyway, the callee parent guard is required only
> > for accesses to variables in full closures (i.e., via JSOP_*NAME).
>
> Really? TR::guardCallee does not avoid emitting it for other cases including
> flat closures. I will try that to fix my static analysis conundrum in bug
> 542002 now!

I don't know exactly what all the reasons for the parent guard are, so I was careful to say it is not required for flat closures for upvar access purposes. I think the other reason is to make sure the global object of the function is the same as on trace, both for security and for access to global slots. I now think those reasons cover it, but I am not entirely sure.

Based on that, some kind of guard is clearly necessary. But it can be relaxed to a guard that the global object is equal to the trace global. I want to try doing the relaxed guard the dumb way first, to see if that doesn't regress perf. Otherwise, I think we can still relax it by caching a function object's global closer to itself.

> The Crockford "functional vs. prototypal" meme has not spread to Google yet.
> But it's out there.
>
> I'm sounding a note of caution because "in practice" is your helpful sample,
> which while helpful is kinda tiny ;-).

Very true.

> And the only way we avoid cloning is if the "method" is a null closure, and the
> only null closures with upvars are the non-escaping ones which contradicts the
> hypothesis of a "method" -- something that escapes via this.m1 = function()...
> or {m1: function()..., ...} -- so I can see why you found so few null closures
> with upvars.
>
> That says you found few Algol-like non-funarg closures. But it does not say
> much about the method barrier.
>
> Hope I'm making sense. The FUN_KIND classification is too tight, in that it
> makes both non-escaping upvar-using closures *and* escaping no-upvar method
> candidates be "null closures". The "null" does not mean empty set of upvars.
> Could have a better name or (why not?) inline predicates for classifying these
> two distinct cases.

I think you're making sense. I also think a reclassification of closures is in order. First, individual free variables can be classified like this (all terminology for discussion purposes only; don't want to start terminology bikeshedding!):

  imported   read via flat closure mechanism
  chained    read/written via optimized scope chain walk

Then closures can be classified like this:

  unchained := 0 chained upvars
               skips cloning w/ read barrier, can use global as scope chain
  chained   := 1+ chained upvars
               has real scope chain, must be cloned

> > That's true today. Just repeating myself from above, I think if we walk scope
> > chains on trace whenever there are 1+ upvars, we can do the access faster on
> > trace and also not need the callee parent guard at all.
>
> Could well be -- competition does that and wins, so we should try it.
> > The separate issue of avoiding 1e6 * M method clones for 1e6 constructions
> > of an object using a ctor with M null-closure methods assigned to |this|
> > stands.

I assume instead of "null-closure methods" you mean "flat-closure methods and
0-upvar methods" (or "unchained-closure methods" in the terms above). And yes
to that.

> > The key thing here is that to use the skip/slot pair to be fast on trace, we
> > need to have a scope chain, so we would only use null closures when there are
> > no upvars.
>
> Right. And based on your findings so far (want bigger sample but I'll take what
> you can get ;-) the Algol-like case seems rare enough to drop. This saves us
> from needing to maintain the display.

The implementation of the Algol-like case is hard to deal with on trace. But I
think the real test is to try the scope-chain-based alternative and compare
perf on microbenchmarks. If it doesn't penalize us vs. the Algol case, then it
seems we're good to go. Otherwise we have to study more, since we can't easily
rule out the possibility of real code that does what the microbenchmark does.

> > At least I don't see any reason we *can't* use the display in the
> > interpreter for that kind of closure and also do the scope chain walk in
> > compiled code.
>
> We could but who maintains the display going back to the interpreter from the
> JITs? Sounds like pain.

I forgot about that part. But don't we already do that?

> Hope you can try the (unsafe, but not in practice) patch I just put in 542002.
> And stir that fluid!

I put it on my list for this week.
(In reply to comment #16)
> > > But a function with no upvars will not force any enclosing function
> > > to have a Call object.
> >
> > Any partial flat closure reaching both one and more than one "skip" up will
> > currently fail to be flat and force the function one up to be heavyweight.
>
> That's pretty much how it has to be, right?

The case in Ollie's fluid simulator has the Fluid constructor's methods using
some number of one-up upvars whose definitions dominate their uses, while the
two-up upvar to the "module static" (in FluidField) must reach along the
(sparse) scope chain. So with the patch in bug 542002, we do not need a Call
object for the Field ctor, and since the FluidField module function is called
only once, guarding on its Call object as callee-parent, while it costs a
guard, does not branch exit all the time -- the guard always succeeds.

> I don't know exactly what all the reasons for the parent guard are, so I was
> careful to say it is not required for flat closures for upvar access purposes.
> I think the other reason is to make sure the global object of the function is
> the same as on trace, both for security, and access to global slots.

TraceRecorder::interpretedFunctionCall checks and aborts on cross-global calls:

    if (JS_GetGlobalForObject(cx, JSVAL_TO_OBJECT(fval)) != globalObj)
        RETURN_STOP("JSOP_CALL or JSOP_NEW crosses global scopes");

But is this not enough, because a compiled trace could cross globals when
triggered where it did not when recorded? No, because we inline the function
call after guarding on its (JSFunction*) identity.

So it's really just the upvar purposes, specifically the optimized LIR that bz
hacked tracing code to emit. Check me here, please.
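The scope shape described above can be sketched in JavaScript. This is a hypothetical reconstruction, not the actual simulator code: the FluidField/Field names follow the comment, but the bodies and variables (`dt`, `size`) are made up to show the one-skip vs. two-skip upvar reach:

```javascript
// FluidField is the "module function", called only once.
function FluidField() {
  var dt = 0.5;           // "module static": two skips up from the methods
  function Field(n) {
    var size = n * n;     // one-up upvar; its definition dominates its uses
    this.cells = function () {
      return size;        // reaches only one skip up: a flat candidate
    };
    this.step = function () {
      return size * dt;   // reaches both one and two skips up, so it must
    };                    // walk the (sparse) scope chain
  }
  return new Field(4);
}

var f = FluidField();
console.log(f.cells());   // 16
console.log(f.step());    // 8
```

Since FluidField runs once, a callee-parent guard on its Call object always succeeds on later calls to the methods, which is why it costs a guard but not repeated branch exits.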
> First, individual free variables can be classified like this (all
> terminology for discussion purposes only, don't want to start terminology
> bikeshedding!):
>
>     imported    read via flat closure mechanism
>     chained     read/written via optimized scope chain walk
>
> Then closures can be classified like this:
>
>     unchained := 0 chained upvars     skips cloning w/ read barrier, can use
>                                       global as scope chain
>     chained   := 1+ chained upvars    has real scope chain, must be cloned

I dig it -- "Unchained Melody" is playing in my head (or was it the "Cheers"
Robin/Rebecca wedding episode, with Frasier and his karaoke machine upstaged
by one of the Righteous Brothers ;-)?).

> > The separate issue of avoiding 1e6 * M method clones for 1e6 constructions
> > of an object using a ctor with M null-closure methods assigned to |this|
> > stands.
>
> I assume instead of "null-closure methods" you mean "flat-closure methods and
> 0-upvar methods" (or "unchained-closure methods" in the terms above).

Yes, the new terminology is better. But a null-closure *method* (because it
heap-escapes by being assigned to this.foo, i.e., it's an ad-hoc "method") must
have 0 upvars -- it can't be the non-escaping Algol-like thing or it would not
have kind JSFUN_NULL_CLOSURE.

> And yes to that.

Yay.

> > We could but who maintains the display going back to the interpreter from the
> > JITs? Sounds like pain.
>
> I forgot about that part. But don't we already do that?

We do from tracer back to bytecode interp, but I was thinking about jaeger.

/be
(In reply to comment #17)
> > I don't know exactly what all the reasons for the parent guard are, so I was
> > careful to say it is not required for flat closures for upvar access purposes.
> > I think the other reason is to make sure the global object of the function is
> > the same as on trace, both for security, and access to global slots.
>
> TraceRecorder::interpretedFunctionCall checks and aborts on cross-global calls:
>
>     if (JS_GetGlobalForObject(cx, JSVAL_TO_OBJECT(fval)) != globalObj)
>         RETURN_STOP("JSOP_CALL or JSOP_NEW crosses global scopes");
>
> But is this not enough, because a compiled trace could cross globals when
> triggered where it did not when recorded? No, because we inline the function
> call after guarding on its (JSFunction*) identity.
>
> So it's really just the upvar purposes, specifically the optimized LIR that bz
> hacked tracing code to emit. Check me here, please.

So I think it's possible, due to cloning, to have two function objects with the
same JSFunction* but different global objects. (If not, skip the rest.) Let's
say you take two such function objects and stick 'em in alternating fashion in
an array 'a', and then do:

    for (i = 0; i < a.length; ++i)
        a[i]();

At record time, the recorder might see a call to a function object with the
same global, but, at runtime, it would encounter a different function object
with the same JSFunction* but a different global object. Am I missing
something?
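A plain-JS analogue of the hazard (it cannot reproduce the engine-level bug itself, since there is no evalcx/clone here; it only shows the aliasing shape): two function objects share one underlying script -- one JSFunction* in engine terms -- but close over different "global" objects. A trace that guards only on the shared script identity and bakes in the environment seen at record time would compute the wrong answer for the other clone:

```javascript
// One script, many function-object clones, each bound to a different scope.
function make(scope) {
  return function () { return scope.x; };
}

var g1 = { x: 42 };
var g2 = { x: 99 };

// Alternate the clones in an array, as in the scenario above.
var a = [make(g1), make(g1), make(g2), make(g1)];
var b = [];
for (var i = 0; i < a.length; ++i)
  b.push(a[i]());

console.log(b.join(','));  // 42,42,99,42
```

If a hypothetical trace recorded on `a[0]` assumed `scope === g1` for every call sharing that script, iteration 2 would wrongly yield 42 instead of 99.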
(In reply to comment #18)

Luke, good point -- we have a bug here already, AFAICT (please file it):

    var g1 = this;
    var g2 = evalcx('');
    function m() { return x; }
    g2.m = clone(m, g2);
    var x = 42;
    g2.x = 99;
    var a = [g1, g1, g1, g1, g2];
    var b = [];
    for (var i = 0; i < a.length; i++)
        b[b.length] = a[i].m();
    print(uneval(b));
    print(g2.m());
    checkStats({
        recorderStarted: 1,
        recorderAborted: 0,
        traceCompleted: 1
    });

produces this output:

    [42, 42, 42, 42, 42]
    99

I don't see any global-crossing checks in the JSOP_CALLELEM recording path.

/be
(In reply to comment #19)
> I don't see any global-crossing checks in the JSOP_CALLELEM recording path.

Er, JSOP_CALLPROP, for a[i].m(). We hit the property cache and emit a constant
obj for the callee, so TR::functionCall tests

    if (!get(&fval)->isconstp())
        CHECK_STATUS(guardCallee(fval));

and does not guardCallee. It should not, since we must be inlining the constant
function (JSFunction) -- but it can't assume the object and its intrinsic scope
are the same on-trace.

/be
Filed as bug 544161.
I redid the fluid simulator (comment 11) with the patch in bug 508716 applied:

opcount = 156,818,628

1. Closure Creation

    kind          count    %
    null          7,970   82    (vs. 80 pre-patch)
      0 upvars    7,949
      1 upvar        14
      2-8 upvars      7
    flat          1,236   13    (vs. 8 pre-patch)
    full            513    5    (vs. 12 pre-patch)
    all           9,719         = 1 in 16,000 ops

18 flat closures (1.5% of flat closures, 0.2% of all closures) had a flat upvar
at distance 2 (as well as one at distance 1). All other flat closures had 1-4
upvars, all at distance 1. Cloning was skipped for 48 closures total.

Here we see a noticeable improvement with the patch: 12% of closures required a
scope chain before, and 5% after, mainly due to getting more flat closures.

2. Closure Var Reads

    kind          count
    null                581
    flat            104,636
    full          5,064,718
      1 step      4,961,217
      2 steps       103,501
    all           5,169,935

All null reads were at distance 1. No dynamic scopes were observed.

Here again, as we expect, flat closures are the main thing that has changed.
Before the patch, 0.01% of upvar reads went through flat closures; after, it is
2%.
(In reply to comment #23)
> I redid the fluid simulator (comment 11) with the patch in bug 508716 applied

You mean the patch in bug 542002 applied?

/be
(In reply to comment #24)
> (In reply to comment #23)
> > I redid the fluid simulator (comment 11) with the patch in bug 508716 applied
>
> You mean the patch in bug 542002 applied?

Yes. I forgot the number and picked the wrong one when writing comment 23.
Depends on: 545051