Closed Bug 602274 Opened 14 years ago Closed 6 years ago

preventing GC hazards with callers of JSString::chars()

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- -

People

(Reporter: igor, Unassigned)

Details

(Keywords: sec-want, Whiteboard: [sg:want (fragile)])

Despite the conservative stack scanner the usage of str->chars can lead to GC hazards. Consider, for example, the following fragment form JSStructuredCloneReader::startRead:

        JSString *str = readString(nchars);
        if (!str)
            return false;
        const jschar *chars;
        size_t length;
        str->getCharsAndLength(chars, length);
        JSObject *obj = RegExp::createObjectNoStatics(context(), chars, length, data);
        if (!obj)
            return false;
        vp->setObject(*obj);

Note that str after the call to getCharsAndLength is not accessed. Thus a compiler is free to use its location for other variable like the chars itself or length. If that happens, the str and, consequently, the char array would not be rooted. That would be a GC hazard if RegExp::createObjectNoStatics could trigger a last-ditch GC and access the chars after that.

Fortunately this is not the case as chars are copied into the buffer before any GC thing is allocated. But such reasoning is rather fragile and we should investigate how to make sure that such hazards would not happen.

I suppose we need a static analysis that would detect such usage. But we also should consider changing JSString internal API to avoid such bad pattern from happening.
Ooh, creepy.  At the very least, we need an official wiki page "Requirements for assuming the conservative GC will actually root your object" that we link to from the newsgroup, main SM wiki page, and everywhere else.
js_json_parse has thin bad pattern. There js_ConsumeJSONText is called with the chars for no longer referenced str. That function does allocate GC things leading to a potential hazard depending on the compiler-generated code.

One way to mitigate this is to remove chars and charsAndLength methods and allow to access only individual chars or do some memcopy operation on string chars perhaps using some intermediate class for maximal performance. The nice thing about this is that it would allow to implement a trivial bump allocator for string chars. Currently this is not possible as the GC cannot discover and change all the references to string chars during heap compactification.
Static analysis on top of conservative GC still seems worth it, or we have only one line of defense.

/be
Whiteboard: [sg:critical?]
blocking2.0: --- → final+
Assignee: general → lw
The 'blocking 2.0+' refers to the specific bug in comment 0, yes?
(In reply to comment #4)
> The 'blocking 2.0+' refers to the specific bug in comment 0, yes?

There is no bug there. It just demonstrates potentially bad pattern. My intention with this bug was really about preventing such bad patterns. That should not block 2.0.
So this sounds like a sg:fragile or something, not critical? at this point, meaning that it needn't autoblock?  Renominating to *remove* blocking, unusually.
blocking2.0: final+ → ?
blocking2.0: ? → -
Whiteboard: [sg:critical?] → [sg:critical?] <-- see comment 5 and 6
(In reply to comment #6)
> So this sounds like a sg:fragile 

sg:fragile is a very good description of what is going on.
Whiteboard: [sg:critical?] <-- see comment 5 and 6 → [sg:want (fragile)]
Guile/SCM had the same problem. I wish I had thought of this when conservative GC was going in.

Eventually, we put markers in the code that would force the compiler to keep the objects the conservative GC did know about (the JSString in the example) alive until the derived values (the jschar * in the example) were dead. When compiling with GCC, the markers were in-line volatile asm statements that the compiler wasn't allowed to remove, but anything that forces the compiler to keep the value around would be good enough.

A static analysis ensuring that the markers are correctly placed should be practical, if we put restrictions on how the derived values can be used (no passing them to non-vetted functions; no storing them in data structures; etc).

Tom Lord, who originally produced Guile from SCM, produced his own Guile-derived(?) system called Systas Scheme, and I'm pretty sure he had his own static analysis that checked this (based on his home-rolled C preprocessor and parser).
Assignee: luke → general
Assignee: general → nobody
Group: core-security → javascript-core-security
I'm told the conservative GC is long dead.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Group: javascript-core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.