Open Bug 894971 Opened 11 years ago Updated 2 years ago

Entrain fewer unnecessary variables in closures

Categories

(Core :: JavaScript Engine, defect)

defect

Tracking

()

People

(Reporter: khuey, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(1 file)

In a setup such as:

function outer() {
  var v_1;
  function f_1() {
    doSomething(v_1);
  }
  var v_2;
  function f_2() {
    doSomething(v_2);
  }
  ...
  var v_N;
  function f_N() {
    doSomething(v_N);
  }
}

Let C_n be the set of variables which f_n closes over.  Currently Spidermonkey creates a single call object which every f_n shares, and that call object references the union of each C_n for every n.  This means that keeping a single f_n alive will entrain v_1, v_2, ... v_N.

I'm led to believe that the gold standard in this area is something called "safe-for-space" closures which ensure that a function only entrains the set of variables it absolutely must, but it's possible to imagine less complex methods.  For instance, we could create separate call objects only for functions f_m where C_m is disjoint from the union of all C_n n != m.  That would fix the (admittedly contrived) example above as well as bug 894135.
Whiteboard: [MemShrink]
Ok, I'm curious. It seems like we know what variables are captured, so I don't understand why we can't do the optimal thing and only hang onto the stuff that's accessible.

I'm attaching a "patch" with pseudocode for the obvious thing: don't mark anything in a CallObject unless it's used by something live. You'll see from the patch that I don't understand the data structures involved.

Anyway, I'm sure there's a fatal flaw here where we're not keeping the necessary information for some optimizationy reason, so let this serve as me asking what that reason is.
Attachment #778693 - Flags: feedback?(luke)
Comment on attachment 778693 [details] [diff] [review]
Pseudocode for marking only the used slots of a CallObject

That's certainly an interesting idea.

It's a bit more complicated than iterating through bindingArray, though, since that is just the list of bindings for the containing JSScript.  Also, I think you'd have to do the markVarUsed starting from a JSFunction so that way you could find the right fun->environment() scope chain to mark.

On first consideration, it seems like what you'd need to need is for each JSScript to additionally record its free variables.  You wouldn't want to store the list of *names* of free variables, though, since that would involve hash-table lookups to find slots, but rather you'd want to store, for each free variable, the 32-bit ScopeCoordinate.  With this strategy, I worry about (1) the extra memory to store these free variables, (2) the time spent traversing the scope chain to mark each free variable.  In theory, you could sort the free variables so that you could do all the marking in a single linear pass over the scope chain, but it still seems like this could lead to much higher mark times for code with many live scopes/closures.  Going further, each CallObject could have a bitmask with a bit per nested function that has already marked this CallObject.  That way, each call object would get marked at most K times where K is the (static) number of nested scripts.

Also, the "what about dead variable slots" question you raised is a good one: if the Debugger jumps in and asks to see random variables, we could possibly observe these garbage values.

So, overall, this approach seems feasible (and much easier than safe-for-space) but we'd have to be careful to measure that we didn't regress ordinary closure mark speed or increase memory usage too much.  It'd also be interesting to have a dynamic measurement to see how often variables are unnecessarily entrained in the manner that is preventable in this manner (I was saying earlier that such an analysis would be useful for bug finding anyway).
Attachment #778693 - Flags: feedback?(luke)
Whiteboard: [MemShrink] → [MemShrink:P1]
Blocks: 896088
Whiteboard: [MemShrink:P1] → [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
I have just witnessed an interesting issue that was apparently caused by this bug.

function() {
  let foo = someComponent();
  let cb = function() { dump("Callback called\n"); };
  foo.init(cb);

  // No cycle in the code, but `foo` keeps `cb` alive and `cb` keeps `foo` alive.
}

In the sample, this caused process freezes because garbage-collection of `foo` should have stopped a thread.
The cycle collector should be able to collect such cycles.  Last time I saw something that looked like this, it ended up being a general bug in the platform (bug 731868 comment 35) that likely was causing other leaks (bug 757749).
Yeah, please file a bug on that and CC me, and include the actual component you are using.  The first step would be to look at a CC log.
(In reply to David Rajchenbach Teller [:Yoric] from comment #3)
> I have just witnessed an interesting issue that was apparently caused by
> this bug.
> 
> function() {
>   let foo = someComponent();
>   let cb = function() { dump("Callback called\n"); };
>   foo.init(cb);
> 
>   // No cycle in the code, but `foo` keeps `cb` alive and `cb` keeps `foo`
> alive.
> }
> 
> In the sample, this caused process freezes because garbage-collection of
> `foo` should have stopped a thread.

For my own edification, (1) why does cb keep foo alive? I didn't think we entrained variables that were never captured by any closure. (Unless the example is incomplete, and there's a dummy closure not shown that captures foo.) And (2) are we really using GC to decide when to stop threads? That seems dangerous, as with anything that uses GC to manage other types of resources.
Assignee: general → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: