Closed Bug 353231 Opened 14 years ago Closed 7 years ago

Static analyzer to enforce GC rooting

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr17 --- wontfix
firefox-esr24 --- wontfix
b2g18 --- wontfix

People

(Reporter: igor, Unassigned)

Details

(Keywords: sec-want, Whiteboard: [sg:want])

This is a spin-off of bug  352846 comment 14.

A static source analyzer tool together with source annotations about explicit GC roots  should help to find/avoid GC hazards in SpiderMonkey.
Marking as restricted to comment on typical code paths leading to vulnerabilities.
Group: security
Here is some details on the desired features of the static analyzer.

First some terminology:

1. GC thing - a thing that is allocated from a memory controlled by
the Garabge Collector. For the list of their types see
http://lxr.mozilla.org/seamonkey/source/js/src/jsgc.h#51

2. Close hook for a GC thing - native or a scripted code that is
executed after GC has found that the thing is no longer reachable.
This code can be arbitrary and roughly corresponds to objects with the
finalize method in Java or heap-allocated objects with destructors in
C#. Note that the term "finalizer" is used in SpiderMonkey to denote
native code that is executed right before the corresponding GC thing
is returned to the pool of free memory. Such finalizer typically
releases a system resource and can use very limited set of
SpiderMonkey API. Currently the close hooks is used in SpiderMonkey
only to implement the generators. In particular, when GC determines
that a generator that yielded inside the try statement with a finally
block is unreachable, SpiderMonkey follows Python 2.5 and executes the
finally block after the garbage collection stage.

3. Last-ditch GC - a garbage collection that runs when GC heap is
full. Compared with a garbage collection that application triggers
through an explicit call to JS_GC() or JS_MaybeGC(), the last ditch GC
does not collect atoms, never runs the close hooks and treats newborn
array (see bellow) as normal roots.

4. Strong root - a normal GC root. If GC thing is stored in the strong
root, then it is considered GC-reachable until the application
explicitly removes the thing from the root or removes the root itself
from the set of roots.

5. Weak root - a root that only protects against the last-ditch GC but
not normal GC. There are 3 categories of such roots in SpiderMonkey:

a. newborn array or lastInternalResult fields in JSContext structure,
http://lxr.mozilla.org/seamonkey/source/js/src/jscntxt.h#609. When GC
allocates a new GC thing of a particular type, it places it into
JSContext.newborn[type] array. The array is protected against the last
ditch GC so code can safely allocate several GC things of different
types in a row without the need to root them.
JSContext.lastInternalResult is used to hold the result of the last
scripted or native function invocation so code can allocate a new GC
thing and put the result there without an explicit root.

b. Any location that holds an atom (interned property). The last ditch
GC does not touch them so code can allocate many atoms in a row
without the explicit rooting before placing them into strongly-rooted
entity.

c. A location that can be manipulated by a script such as object
property, array element etc. Normally a GC thing stored in a such
location is GC-protected through its parent object. The problem is
that a close hook can remove the thing from the location and then
trigger GC one more time. Note that even before the introduction of
the close hook support such locations were weak roots as almost all
SpiderMonkey API that can trigger GC, triggers it through a script
execution. In such cases a script does not need to define a close hook
to cut the thing from the object graph and can do it directly. Still
introduction of the close hooks made this weak rooting property
explicit.

6. Weakly rooted GC thing - GC thing that is rooted only via a week
root. Such thing is protected against the last-ditch GC, but any
normal GC call will collect it. Note that a few if not most of
SpiderMonkey API can trigger a full GC as a side affect of calling
application supplied callbacks. Thus one must be very careful with
weakly rooted things.

With this terminology I can classify GC-related bugs into 3 categories.

1. Treating weak roots as strong ones. So far this was the major
source of bugs. A trivial example is:

str = ApiToAllocateNewThing(); // str is only protected by JSContext.newborn
CallAnotherAPIThatCanTriggerFullGC();
Use str.

In many cases such pattern happens when code correctly assumed that
CallAnotherAPIThatCanTriggerFullGC() could only trigger the last ditch
GC but later API changes broke that and allowed for a full GC run.

A more subtle example is:

JSObject *obj;
jsval val;
...
OBJ_GET_PROPERTY(.., obj, ..., &val); // val is rooted only as property of obj
CallAnotherAPIThatCanTriggerFullGC();
Use val

Here to exploit a bug a script can define a close hook that deletes
the property of obj and then trigger GC one more time.

Note that pattern often exists in a form of chained calls. That is,
one function call an API that returns a weakly rooted thing and passes
it to the second function as an argument. The second function first
calls CallAnotherAPIThatCanTriggerFullGC() before using its argument.

This treating weak-as-strong root type of bugs in most cases requires
a very specialized script to expose them and so far no testing tool is
available to check for them.

2. Displacing a value from a strong root. This significantly less
frequent bug but still poses a problem. A pattern to watch would be:

void Example(..., GCThingType *pointerToAStronglyRootedLocation)
{
   GCThingType gcthing = *pointerToAStronglyRootedLocation;
   Use gcthing
   Store another gcthing in PointerToARootedLocation;
   Call API that can trigger any kind of GC
   Use gcthing again.
}

Again the pattern may exist as a call chain like in:

void Good(GCThingType gcthing1, GCThingType *pointerToAStronglyRootedLocation)
{
   GCThingType gcthing2 = AllocateNewGCThing();
   *pointerToAStronglyRootedLocation = gcthing2;
   Call API that can trigger any kind of GC
   Use gcthing1
}

void Bad(GCThingType *pointerToAStronglyRootedLocation)
{
 GCThingType gcthing = *pointerToAStronglyRootedLocation;
 Bad(gcthing, pointerToAStronglyRootedLocation);
}

Here the function Good assumes that *pointerToAStronglyRootedLocation
is a place where to store the result and which it can use as a
temporary root as well and gcthing1 is rooted by some other means
while the function Bad violates that assumption.

3. Forgetting to mark things during GC-marking phase. This is a minor
source of the problems and compiling SpiderMonkey with WAY_TOO_MUCH_GC
(which forces to run GC almost on any allocation) and running various
stress-tests was able to reveal most of them.

Thus it would be nice if a static analyzer can detect the first kind of bugs:

GetWeaklyRootedValue
CallAPIThatCanTriggerFullGC
UseWeaklyRootedValue

where weakly rooted value is not copied to a strong root. Now the past
experience tells that many such bugs can be fixed by changing the API
to require to pass a pointer to a strong root so GetWeaklyRootedValue
would store the value in the strong root immediately. But then the
problem 2 can become very real as it is tempting to optimize away an
extra rooting and reuse the rooted pointer for other things.
Whiteboard: [sg:investigate]
Here is a typical bugs that shows the GC harzads, their exploits and rooting fixes:

Bug 311497 - using unrooted temporary storage for heap sort pivot. This is AFAICS is a hard problem for static analizies. But if some tool can detect it, it would be nice.

Bug 352846 - a recent example that demonstrates how fragile the weekly rooted model.

Bug 322045 - another good example of too much relience on week roots.

Bug 312278 - a good demo for various GC hazards and typical fixes. Note that the patches there use, among other thinks, AllocStack and js_EnterLocalRootScope internal API to root the temporaries. That code was switched to using temp roots, a much better API for strong temporray roots, see bug 357169.

Bug 325269 - the bug that lead to the introduction JS_PUSH_TEMP_ROOT/JS_POP_TEMP_ROOT API and shows that a GC thing that is rooted  only via an object graph is, in fact, a weakly-rooted GC thing and may not survive innocent-looking API call (old-born problem).

Bug 345967 and bug 341956 - the bugs that remind that atoms or interned properties are weakly rooted GC things.

Bug 313276 - another good example of old-born problem.

Bug 316885 - demonstrates how to root GC thing using interpreter stack.

Bug 313952 - bug that lead to the introduction of lastInternalResult.

Some thought on a conservative type checking. I believe if we can check the following 3 rules, then we would avoid GC hazards in future.

In the following I will use PGT or Pointer-to-Gc-Thing instead of jsval since the consideration applies equally well to jsval or explicit pointers to GC things like JSObject*, JSString*, JSXML*, JSQName* or JSNamespace*.

1. The first bad pattern to detect is:

Foo(...)
{
    PGT x;
    ...
    Bar(&x);
    ...
}

That should not be allowed. That is, a pointer to a local variable holding PGT must not be passed to any function.

Note that currently there are still few places that violates this rule especially in the interpreter but the past experience has shown that it is extremely easy to use "x" after "Bar(&x)" without introducing GC hazard.

A typical example of the bug with PGT == jsval and the way it is fixed is array_join_sub function from jsarray.c. Before the fix it contained:

static JSBool
array_join_sub(JSContext *cx, JSObject *obj, enum ArrayToStringOp op,
               JSString *sep, jsval *rval)
{               
    jsval v;      

    ....
    for (index = 0; index < length; index++) {
        ok = GetArrayElement(cx, obj, index, &hole, &v);
        ...
        ok = js_ValueToObject(cx, v, ...);
        ...
}        
    
This was a GC hazard since  js_ValueToObject requires v to be rooted. The fix changed the code into:

static JSBool
array_join_sub(JSContext *cx, JSObject *obj, enum ArrayToStringOp op,
               JSString *sep, jsval *rval)
{               

    /* Use rval to locally root each element value as we loop and convert. */
#define v (*rval)
    ....
    for (index = 0; index < length; index++) {
        ok = GetArrayElement(cx, obj, index, &hole, &v);
        ...
        ok = js_ValueToObject(cx, v, ...);
        ...
}        

The nice consequence of this approach is that the code can assume that a function argument of type PGT* can always be used for the local roots in the same as the above fix used jsval *rval. This rule removes specialty of the signature of the native function:

typedef JSBool
(* JS_DLL_CALLBACK JSNative)(JSContext *cx, JSObject *obj, uintN argc,
                             jsval *argv, jsval *rval);

Here jsval *argv and jsval *rval are pointers to rooted locations by the definition.

But how do we get rooted pointers then? There are several ways. One is to use 

extern JS_FRIEND_API(jsval *)
js_AllocStack(JSContext *cx, uintN nslots, void **markp);

which allocates a segment of rooted stack and all pointers into the segment are rooted.

Another way is to register the pointer to PGT with GC. The most widely used way is TEMP_ROOT API. For example, the above example can also be fixed via:

static JSBool
array_join_sub(JSContext *cx, JSObject *obj, enum ArrayToStringOp op,
               JSString *sep, jsval *rval)
{               
    jsval v;
    JSTempValueRooter tvr;

    v = JSVAL_NULL; /* initialize the value before registering it with GC */
    JS_PUSH_TEMP_ROOT(cx, 1, &v, &tvr); /* register 1 location */
    ....
    for (index = 0; index < length; index++) {
        ok = GetArrayElement(cx, obj, index, &hole, &v);
        ...
        ok = js_ValueToObject(cx, v, ...);
        ...
    }
    ...
    JS_POP_TEMP_ROOT(cx, &tvr);    
}        
 
That is, between JS_PUSH_TEMP_ROOT(cx, 1, &v, &tvr) and JS_POP_TEMP_ROOT(cx, &tvr) &v is a pointer to a rooted location and can be passed to other functions.

2. The second bad pattern can be demonstrated with the following assert:

Foo2(PGT x, PGT* pointer, ...)
{
    assert(x != *pointer);
    ...
}

This states that all arguments to a function of type PGT must not be rooted through a copy that exists in a location pointed by an argument of type PGT*.  

In a sense this is a consequence of the previous rule that states that PGT* arguments can be freely as local roots. Clearly if the caller uses them to root PGT passed to the callee as other argument this can not be the case.

For example, consider the following function from jsobj.c:

JSBool
js_ValueToObject(JSContext *cx, jsval v, JSObject **objp);

A bug can happen with a code path like that:

JSObject *obj;
jsval v;
...
obj = NULL;
RegisterObjectRoot(&obj); /* there is no such function but you can get 
                             the same effect with TEMP_ROOT API */

/* Now &obj is a pointer to a rooted location */
ConstructObject(&obj);

v = OBJECT_TO_JSVAL(obj);

js_ValueToObject(cx, v, &obj); /* BUG: after js_ValueToObject changes obj
                                  v is no longer rooted */

UnregisterObjectRoot(&obj);


3. The third pattern is about a function returning PGT. The idea is to assume that given

PGT Foo3(...);

the result of Foo3 must be immediately rooted. That is, if the call to Foo3
is not immediately followed by rooting of its result, assume a bug. For example, jsstring.c had many bugs like the following code fragment:

static JSBool
str_substring(JSContext *cx, JSObject *obj, uintN argc, jsval *argv,
              jsval *rval)
{
    JSString *str;
    jsdouble d;
    jsdouble length, begin, end;

    str = js_ValueToString(cx, OBJECT_TO_JSVAL(obj));
    if (!str)
        return JS_FALSE;
    ...
    use_str
}

Here PGT is JSString *str. After js_ValueToString its result was not rooted leading to GC hazard in use_str. The fix was to change that into:

static JSBool
str_substring(JSContext *cx, JSObject *obj, uintN argc, jsval *argv,
              jsval *rval)
{
    JSString *str;
    jsdouble d;
    jsdouble length, begin, end;

    str = js_ValueToString(cx, OBJECT_TO_JSVAL(obj));
    if (!str)
        return JS_FALSE;
    argv[-1] = STRING_TO_JSVAL(str);        
    ...
    use_str
}

That copied str into a rooted location fixing the bug.

It would be nice to convert js_ValueToString from

extern JSString *
js_ValueToString(JSContext *cx, jsval v);

to

extern JSBool
js_ValueToString(JSContext *cx, jsval v, JSString **pstr);


and require that pstr is a pointer to a rooted location by the previous 2 rules. But for compatibility such calls will continue to exist for some time. Thus it would be nice to check each and every caller of js_ValueToString.
> 
> 1. The first bad pattern to detect is:
> 
> Foo(...)
> {
>     PGT x;
>     ...
>     Bar(&x);
>     ...
> }
> 
> That should not be allowed. That is, a pointer to a local variable holding PGT
> must not be passed to any function.

CQual should be good at detecting #1. The other ones will require flow sensitivity from Oink. Not having flow sensitivity significantly reduces the types of static analysis we can do.
          Garbage Collection Safety Analysis of SpiderMonkey
             Version 2: Result of Mozilla Summit Meetings

              Daniel Wilkerson, Igor Bukanov, Taras Glek
                           22 November 2006

A "jsval instance" is a pointer from C++ into the JavaScript [JS]
heap.  Note that by "a jsval instance" we mean the copy of the jsval
in a particular memory location: that is, another copy of the same
jsval 32-bit value in another location is not "the same" jsval
instance, but a copy.

The JS garbage collector has a list of jsval*-s called the "root
list".  Actually, these jsval*-s are interpreted as arrays and are
accompanied by a length; the length 1 is used for the usual simple
pointer.  A jsval instance is "rooted" if it's address is on the root
list.

Assume that JS heap objects can only be pointed to by jsval-s (and
not, say ints).  At a given timestep, call "C++-live" the set of JS
heap objects reachable from C++ code and call "JS-live" those
reachable from jsvals the addresses of which are on the JS root list.

Goal invariant "safe garbage collection": When the garbage collector
runs, all C++-live objects are also JS-live.  Rules to guarantee Goal
invariant are as follows.

Rule 1a: allocation, stack: When a jsval v is created on the stack, it
must be initialized to JS_NULL.  The next line must be a call to
register(&v) that puts &v onto the root list.  The line after that
must set the value of v using say GET_OBJ_PROPERTY(..., &v) or some
other function; question: how do we guarantee that this line sets the
jsval again?

Rule 1b: allocation, heap: When a jsval* is created on the stack, we
initialize it from alloc_rooted_ptr(), which roots the pointer that it
returns.

Rules 1a and 1b guarantee that the act of creating and initializing
stack-allocated jsval and jsval* variables is atomic with respect to
the garbage collector.

Question: What about C++-stack allocated jsval**-s or higher levels
of indirection?  I think Igor says we do not allow them.

Question: What about pointers to objects that contain jsval*-s?  I
seem to recall Igor saying that the answer is something like the
following.  The only such objects are JS heap objects, not C/C++
objects.  A jsval instance is "transitively-rooted" if it is rooted or
its address is a field of an object that is transitively-rooted.
Inductively, it seems therefore that all transitively-rooted jsvals
must point to JS-live objects; not sure how this definition interacts
with cycle garbage collection.

The allocation and initialization of the jsval in question in rules 1a
and 1b must be atomic with respect to the garbage collector.  I think
that requires the arguments to alloc_rooted_ptr(), register() and
GET_OBJ_PROPERTY() to not call functions that could call the garbage
collector and to be safe we will require them to not call any
functions at all.

The only values left to ensure that they are rooted are
parameter-allocated jsvals and jsval*-s (assuming jsval**-s and higher
cannot happen), but parameter jsval*-s are already rooted because they
can only come from something which is rooted at its creation.  The
only remaining problem is the parameter jsvals.

Question: What about temporary values made during expression
computation, such as calling a function by first computing all the
arguments: one of the arguments may be a jsval which is saved while
another argument computation runs which calls a function which calls
the garbage collector or does other things.

Solution 1: Make parameter jsvals illegal.  This seems to be a fine
solution except that it requires changing the current API.  Taras
suggests that we do that and write wrappers for the public API such
that f(jsval v, foo y) becomes the following.

    int f0(jsval *vp, foo y) {
      // function body of f() with *vp replacing v
    }
    int f(jsval v, foo y) {
      register(&v);
      return f0(&v, y);
    }

I suggest we could automate this code change using Oink and the pretty
printer.

Solution 2:

A jsval instance is "copy-of-rooted" if there is another jsval
instance that has the same 32-bit value that is rooted or
transitively-rooted.

Invariant "rootedness": All jsval instances are 1) rooted (once
initialization is complete as in rules 1a and 1b above) 2)
transitively-rooted, or 3) copy-of-rooted.  This invariant implies the
goal invariant.

Invariant "stack/parameter jsval arguments are un-aliased": All jsval
arguments v to a function f() are stack or parameter allocated and
there is no alias to v available from any data-structures transitively
reachable by f().  This invariant implies invariant "rootedness" as
the jsval parameter to f() is copy-of-rooted and its copy cannot be
changed during the execution of f().

Rule 3: At the call site to f() above where a jsval argument is
passed, we require that the jsval argument be a simple variable
expression of a stack allocated jsval v; Igor says that this is the
common case.  If it is not we can create a temporary immediately
before the call site and assign to it; this requires rooting the
temporary.

We need the following distinction: A pointer is "ephemeral" if
whenever it is read by any expression other than the de-reference
operators ("*", "[]" and "->") it is being copied to another ephemeral
pointer.  We can do a jsval*-ephemeral-analysis by 1) allowing the
user to annotate some pointers as ephemeral and 2) verifying these as
ephemeral.  We take the names rval and argv as user annotations that
these jsval*-s are ephemeral.

Rule 4: For any stack or parameter allocated jsval v: address of v,
&v, 1) is ephemeral and 2) is not passed to f() as an argument.
Here is few comments in reply to comment #6:

>           Garbage Collection Safety Analysis of SpiderMonkey
>              Version 2: Result of Mozilla Summit Meetings
> 
> Rule 1a: allocation, stack: When a jsval v is created on the stack, it
> must be initialized to JS_NULL.  The next line must be a call to
> register(&v) that puts &v onto the root list.  The line after that
> must set the value of v using say GET_OBJ_PROPERTY(..., &v) or some
> other function; question: how do we guarantee that this line sets the
> jsval again?

I suggest not to worry about this problem of using v before the real initialization. As far as I can see it is orthogonal to GC-safty. It is nice to address it but this was not a roblem in practice. 

> 
> Rule 1b: allocation, heap: When a jsval* is created on the stack, we
> initialize it from alloc_rooted_ptr(), which roots the pointer that it
> returns.
> 
> Rules 1a and 1b guarantee that the act of creating and initializing
> stack-allocated jsval and jsval* variables is atomic with respect to
> the garbage collector.
> 
> Question: What about C++-stack allocated jsval**-s or higher levels
> of indirection?  I think Igor says we do not allow them.

A quick grep for jsval ** shows that it only present in few obscure formating API as arguments.

> 
> Question: What about pointers to objects that contain jsval*-s?  I
> seem to recall Igor saying that the answer is something like the
> following.  The only such objects are JS heap objects, not C/C++
> objects.

This is not the case. jsval can refer to a an JS object that refers to C++ object that contains jsvals. Such values are rooted because the garbage collector knows how to traverse the object graph. That is, as long as the original jsval rooted, the garbage collector during the marking phase would reach the extra jsvals stored in the intermediate C++ object. This is happens since the embedding must register appropriate callbacks to mark all the necessary jsvals.

In practice the code can "forget" to teach GC about all the stored jsvals leading to GC hazards. But the past experience has shown that this second type of problems happens less frequently then forgeting to register stack-allocated jsval with the root set.

I do not have any suggestions how to addres it.   

>  A jsval instance is "transitively-rooted" if it is rooted or
> its address is a field of an object that is transitively-rooted.
> Inductively, it seems therefore that all transitively-rooted jsvals
> must point to JS-live objects; not sure how this definition interacts
> with cycle garbage collection.

Does this refr to XPCOM cycle collector that graydon develops, see bug 333078?

> 
> The allocation and initialization of the jsval in question in rules 1a
> and 1b must be atomic with respect to the garbage collector.  I think
> that requires the arguments to alloc_rooted_ptr(), register() and
> GET_OBJ_PROPERTY() to not call functions that could call the garbage
> collector and to be safe we will require them to not call any
> functions at all.
> 
> The only values left to ensure that they are rooted are
> parameter-allocated jsvals and jsval*-s (assuming jsval**-s and higher
> cannot happen), but parameter jsval*-s are already rooted because they
> can only come from something which is rooted at its creation.  The
> only remaining problem is the parameter jsvals.
> 
> Question: What about temporary values made during expression
> computation, such as calling a function by first computing all the
> arguments: one of the arguments may be a jsval which is saved while
> another argument computation runs which calls a function which calls
> the garbage collector or does other things.
> 
> Solution 1: Make parameter jsvals illegal.  This seems to be a fine
> solution except that it requires changing the current API.  Taras
> suggests that we do that and write wrappers for the public API such
> that f(jsval v, foo y) becomes the following.
> 
>     int f0(jsval *vp, foo y) {
>       // function body of f() with *vp replacing v
>     }
>     int f(jsval v, foo y) {
>       register(&v);
>       return f0(&v, y);
>     }

To be precise, it will be rewritten like:

     int f(jsval v, foo y) {
       int tmp;
       register(&v);
       tmp = f0(&v, y);
       unregister(&v);
       return tmp; 
     }

Any chance this could be updated with recent developments, or blog/doc URLs could be provided?  I'm interested in at least following it.
(In reply to comment #8)
> Any chance this could be updated with recent developments, or blog/doc URLs
> could be provided?  I'm interested in at least following it.

The current proposal for a set of rules to enforce in SpiderMonkey is at http://wiki.mozilla.org/GC_SafetySpec .
Whiteboard: [sg:investigate] → [sg:want]
We're using static and dynamic analysis for the new exact rooting work for GGC, so closing this bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Jan, when did we fix this?
Group: core-security
You need to log in before you can comment on or make changes to this bug.