Closed Bug 1479828 Opened Last year Closed Last year

[hazards] generalize "GC suppressions" to any RAII-like construct

Categories

(Core :: JavaScript: GC, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: sfink, Assigned: sfink)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

Currently, the hazard analysis takes GC suppression regions into account by looking at calls made within the scope of RAII GC suppressors, and determining whether a function is always called within a suppression zone.

Actually, let me back up. The hazard analysis is looking for the pattern

    void somefunc() {
      JSObject* obj = JS_NewObject();
      foo(cx);
      use(obj);
    }

(so 'obj' is live across the call to foo()) and it needs to know two things:

  (1) can foo() GC?
  (2) is somefunc() ever called when GC is not suppressed?

GC suppression must be taken into account for both of these questions. For (1), we could ignore all calls made within a GC suppression region when building the callgraph. For (2), this is not adequate; when processing the body of somefunc(), we don't know anything about the context in which it is called. (I suppose we could see if it's entirely missing from the callgraph because nothing calls it, but there have always been ways to get gaps in the callgraph so that would not be very reliable. Some of those aren't really fixable, eg if a function is only called via a function pointer.)

So instead, I propagate GC suppression information through the callgraph: if f1 calls f2 within a GC suppression region, the edge is annotated with "SUPPRESS_GC". If f1 calls f2 multiple times, and at least one of those calls is *not* within a suppression region, then the edge is not annotated. Then when traversing the callgraph, this suppression information is propagated through the edges and associated with each function.

Oh, I should mention that function bodies are analyzed independently to form call graph edges, then the call graph is traversed to propagate suppression info as well as "can reach GC()" info, and then the function bodies are analyzed independently again.

So (1) is achieved by making the local analysis of a function ignore calls in (function-local) suppression regions, and also ignore calls that cannot reach GC() generally.

(2) is achieved by pushing the suppression info through the graph. Then when somefunc() is analyzed, we'll know if it is *always* called with GC suppressed.

Having written this up, I realize that the code for (2) is rather indirect. I pass in whether or not somefunc() is suppressed, but instead of bailing out immediately I still scan through every potential hazard but in the end variableLiveAcrossGC() will always return false if somefunc() was suppressed. This makes sense for detecting unnecessary roots or unsafe references to unrooted values, but seems rather pointless for variableLiveAcrossGC(). /me adds to fix list.
Ok, that was all background. Given that the hazard analysis is our only whole-program analysis, it seems very useful to be able to look at other RAII scopes and track them through the callgraph. That would enable simple things like "do you ever have a ScratchRegister scope nested within a ScratchRegister scope?" or maybe "do you ever try to take a lock when it might be locked already?" So I thought I'd generalize the GC suppression a bit, into "limits", where you have a set represented as an integer bit vector, and you AND them together while traversing the callgraph. (The implementation is a bit more complicated, since the callgraph contains cycles.)

In fact, I'd like to generalize this further, to annotate each edge with two bit sets: one the same as the "limits" in this patch, where all contributors are ANDed together, and then a maximal bitset where the contributors are ORed together. Then you could simultaneously answer the questions "is this *ever* called within the RAII scope of class X?" and "is this *always* called within the RAII scope of class X?" The GC suppression is the latter; static deadlock detection would be the former.

But I haven't done that yet. And when using this stuff, you'd need to be mindful that the callgraph is not complete -- things like function pointers are not fully represented, and different analyses probably want to handle them differently.
Attachment #8996377 - Flags: review?(jcoppeard)
Oops, this is a bugfix for breakage resulting in changes from the other patch in this bug. But it's probably easier to review this way. I'll fold it into the other patch before landing.
Attachment #8996498 - Flags: review?(jcoppeard)
In a way, this is another bugfix to the limits code. And it changes a bunch of that freshly-written code. But the previous version is simpler to understand and mostly works without this change, so it may be easier to review this way. It will be missing limit computations for mutually-recursive root functions, which are not common but not that uncommon either.

The basic problem being addressed is that the previous patches need to do a full traversal of the graph, propagating limits everywhere. But where do you start? I naively started at all nodes with no callers, thinking that I would get all roots that way (even if some of them really shouldn't have been roots, it's just that due to some bug or other the connection between the caller and callee was not detected.)

The problem is that there are quite a few places where a function calls itself but is called by no other functions. Or maybe not that many places, but a fairly large chunk of the callgraph is only reachable from such roots. In itself, that is easy to fix: exclude self-calls from the test of whether a node has any callers.

That handles almost all of the cases, but I was still getting a handful of nodes not being visited. It turned out that there are a couple of places where 2 or a few functions have a cycle, and nothing else calls into that cycle. So I implemented this heavyweight fix to find all recursive roots in a separate pass.
Attachment #8996499 - Flags: review?(jcoppeard)
Comment on attachment 8996377 [details] [diff] [review]
Expand suppression to general limits

Review of attachment 8996377 [details] [diff] [review]:
-----------------------------------------------------------------

I was having a hard time understanding the meaning of the word 'limits' in all this.  This is about whether a function is always/ever called from a scope with a particular RAII class instance live, right?  And the current use is for AutoSuppressGC etc.  My best suggestion for a name would be something involving 'dynamic context' however that's is not exactly pithy.  Something about 'current scope' maybe...  What do you think?

I'm going to r+ this but TBH I don't fully understand all of this.

::: js/src/devtools/rootAnalysis/analyzeRoots.js
@@ +41,4 @@
>  assert(text.pop().length == 0);
>  for (var line of text) {
> +    const [_, limits, func] = line.match(/(.*?) (.*)/);
> +    limitedFunctions[func] = limits | 0;

This should check for no match and report and error.

::: js/src/devtools/rootAnalysis/annotations.js
@@ +362,5 @@
> +    let limit = 0;
> +    if (type in typeInfo.GCSuppressors)
> +        limit = limit | LIMIT_CANNOT_GC;
> +    // if (MainThreadRAIITypes.indexOf(type) != -1)
> +    //     limit = limit | LIMIT_MAINTHREAD;

Did you mean to remove this?

::: js/src/devtools/rootAnalysis/callgraph.js
@@ +20,5 @@
>  {
>      // Note: not dealing with overloading correctly.
>      var nargs = 0;
>      if (field.Type.Kind == "Function" && "TypeFunctionArguments" in field.Type)
> +        nargs = field.Type.TypeFunctionArguments.Type.length;

Not sure what's going on here, unrelated fix?

::: js/src/devtools/rootAnalysis/computeCallgraph.js
@@ +98,4 @@
>  
>          for (var callee of getCallees(edge)) {
> +            // Individual callees may have additional limits. The only such
> +            // limit currently is that nsISupports.{AddRef,Release} are assumed

Hmm, is it true that Release can never GC?

::: js/src/devtools/rootAnalysis/loadCallgraph.js
@@ +41,5 @@
> +// Map from identifier to full "mangled$readable" name. Or sometimes to a
> +// Class.Field name.
> +var functionNames = [""];
> +
> +var mangledToId = {};

Should we use a Map for this?

@@ +74,5 @@
> +// probably useful (it is the order that functions were invoked within
> +// caller.)
> +//
> +// During the same scan, build callersOf from calleesOf.
> +function merge_repeated_calls(calleesOf) {

Thanks for adding comments.  I can't understand what the code is doing though.

@@ +202,3 @@
>                  numGCCalls++;
>              }
>          }

Should we warn if we can't parse the line?

@@ +247,4 @@
>      while (top > 0) {
> +        // Consider caller where (graph) -> caller -> (0 or more callees)
> +        // 'callercaller' is for debugging.
> +        const [caller, edge_limits, callercaller] = worklist[--top];

It would be more conventional to use push()/pop() rather than top.
Attachment #8996377 - Flags: review?(jcoppeard) → review+
Attachment #8996498 - Flags: review?(jcoppeard) → review+
Attachment #8996499 - Flags: review?(jcoppeard) → review+
(In reply to Jon Coppeard (:jonco) from comment #4)
> Comment on attachment 8996377 [details] [diff] [review]
> Expand suppression to general limits
> 
> Review of attachment 8996377 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I was having a hard time understanding the meaning of the word 'limits' in
> all this.  This is about whether a function is always/ever called from a
> scope with a particular RAII class instance live, right?  And the current
> use is for AutoSuppressGC etc.  My best suggestion for a name would be
> something involving 'dynamic context' however that's is not exactly pithy. 
> Something about 'current scope' maybe...  What do you think?

Yeah, you're right, I've never been happy with the term "limits". I meant it as in "this point in the code is limited to not GC." And my next use was going to be "this code is limited to mainthread-only." But if it's generalized to any RAII class, then it doesn't fit. "This code is limited to run within the scope of RAII class X"... ugh.

I'm not crazy about either dynamic context or current scope, because neither suggests to me what short of context or scope is being used. guaranteedState is more like it. Then if I add the OR version, it would be possibleState and guaranteedState. That sounds better, but clumsy and long. I may as well spell it out with definiteRAIIScopes/possibleRAIIScopes, though there are uses (eg the mainthread one) that don't involve RAII objects at all, it's just that the point in the control flow graph is dominated by something that establishes some condition -- for the heap write analysis, MOZ_ASSERT(IsMainThread()) or something like it is used to identify mainthread-only portions of the callgraph. dominatingConditions/reachableFromConditions? dominatedBy/reachableFrom?

knownProperties/possibleProperties?

It really seems like there should be prior art for this, since it's so commonly used in static analyses.

So far, I think I'm liking knownProperties/possibleProperties best. Though dominatedBy/reachableFrom isn't too bad.

> I'm going to r+ this but TBH I don't fully understand all of this.

I'll take another stab at the comments before landing, at least.

> ::: js/src/devtools/rootAnalysis/analyzeRoots.js
> @@ +41,4 @@
> >  assert(text.pop().length == 0);
> >  for (var line of text) {
> > +    const [_, limits, func] = line.match(/(.*?) (.*)/);
> > +    limitedFunctions[func] = limits | 0;
> 
> This should check for no match and report and error.

Ok, added an assert. (They're always enabled.)

> ::: js/src/devtools/rootAnalysis/annotations.js
> @@ +362,5 @@
> > +    let limit = 0;
> > +    if (type in typeInfo.GCSuppressors)
> > +        limit = limit | LIMIT_CANNOT_GC;
> > +    // if (MainThreadRAIITypes.indexOf(type) != -1)
> > +    //     limit = limit | LIMIT_MAINTHREAD;
> 
> Did you mean to remove this?

Oops, yeah, that's from long ago. When I first launched into attempting this, for the heap write analysis. Removed.

> ::: js/src/devtools/rootAnalysis/callgraph.js
> @@ +20,5 @@
> >  {
> >      // Note: not dealing with overloading correctly.
> >      var nargs = 0;
> >      if (field.Type.Kind == "Function" && "TypeFunctionArguments" in field.Type)
> > +        nargs = field.Type.TypeFunctionArguments.Type.length;
> 
> Not sure what's going on here, unrelated fix?

Oops, yes. A significant bugfix in its own right; I'll split it out.

> 
> ::: js/src/devtools/rootAnalysis/computeCallgraph.js
> @@ +98,4 @@
> >  
> >          for (var callee of getCallees(edge)) {
> > +            // Individual callees may have additional limits. The only such
> > +            // limit currently is that nsISupports.{AddRef,Release} are assumed
> 
> Hmm, is it true that Release can never GC?

Not at all. I have a dim memory of mccr8 giving me a reason for why this is valid, but in looking at the callgraph I can't imagine why not. Uh... I think I'd better do some more pushes without this annotation to see what I find. Thanks for pointing it out.

> ::: js/src/devtools/rootAnalysis/loadCallgraph.js
> @@ +41,5 @@
> > +// Map from identifier to full "mangled$readable" name. Or sometimes to a
> > +// Class.Field name.
> > +var functionNames = [""];
> > +
> > +var mangledToId = {};
> 
> Should we use a Map for this?

I don't know. Probably. I started out with all plain objects, then at some point got the Map religion, then more recently have been lazily falling back to plain objects. I suppose I should at least be consistent.

I may make that change on top of my full stack, since I don't really want to manually push that change through the 30 or so patches sitting on top of this one.

> @@ +74,5 @@
> > +// probably useful (it is the order that functions were invoked within
> > +// caller.)
> > +//
> > +// During the same scan, build callersOf from calleesOf.
> > +function merge_repeated_calls(calleesOf) {
> 
> Thanks for adding comments.  I can't understand what the code is doing
> though.

I'll take another stab at it. I was fixated on not losing the call order, because it is very common for me to print out the callees of a function and go through them in order while looking at the source. But this code would probably be a lot more readable if generated an order table up front, used sets to eliminate duplicates, and then just put them back in the right order at the end.

> @@ +202,3 @@
> >                  numGCCalls++;
> >              }
> >          }
> 
> Should we warn if we can't parse the line?

Yes. I'll have it abort completely.

I have a tendency to update the format occasionally but then want to run both new code on old files as well as old code on new files, and I figured it wasn't too unsafe because I'm both generating and consuming the format internally. But it's well worth the extra robustness at this point.

> 
> @@ +247,4 @@
> >      while (top > 0) {
> > +        // Consider caller where (graph) -> caller -> (0 or more callees)
> > +        // 'callercaller' is for debugging.
> > +        const [caller, edge_limits, callercaller] = worklist[--top];
> 
> It would be more conventional to use push()/pop() rather than top.

I had some reason to do it this way that I can't remember... oh! I think it was for debugging. It might have been because with nondestructive pops like this, I could see what was on the stack before the pop. Or it might have been because I could break if top were a certain value? Some variant of that rings a few bells.

At any rate, you're right, I should switch this to plain push/pop.
(In reply to Steve Fink [:sfink] [:s:] from comment #5)

> So far, I think I'm liking knownProperties/possibleProperties best. Though
> dominatedBy/reachableFrom isn't too bad.

Oh, I like dominatedBy/reachableFrom.
Priority: -- → P3
https://hg.mozilla.org/mozilla-central/rev/77ada590e047
https://hg.mozilla.org/mozilla-central/rev/4ddc76989782
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
You need to log in before you can comment on or make changes to this bug.