Open Bug 965384 Opened 10 years ago Updated 4 months ago

Make using a String-owned jschar* after the string is dead impossible (or at least harder)

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

People

(Reporter: terrence, Unassigned)

References

(Blocks 1 open bug)

Details

There are four different ways that one might want to use string chars and two problems that can result when one does.

====

Problem #1: Forgetting to root the owning string for the full lifetime of the jschar* ownership. The analysis will not catch JSString *str; js::SomethingGCy(str->chars()) because there is no use of str itself after the GC. Beefing up the analysis here would be hard.

Problem #2: Inline characters might move. Even if the above is fixed, once we store small strings in the nursery, the str->chars() will become invalid on a minor GC.

====

Usage #1: Looking at a single character. For example, checking if the first character of a string is a '/'. Currently we have to flatten and getChars so we have all the problems above. A new API that could avoid flattening would be nice here.

Usage #2: Copying characters out of the string into a buffer. We currently have a couple of hard-to-use APIs that do this.

Usage #3: Borrowing the jschar* to implement a full-string algorithm that needs the performance of raw pointer access.

Usage #4: Stealing ownership of the string's characters.

====

#1 and #2 are clearly safe. We should add these APIs to JSString. #3 is the most likely to get wrong and currently the easiest to use. We should add an RAII AutoBorrowOwnedChars that forces rooting of the JSString and remove direct getChars access. This class can also solve problem #2 directly by copying short strings to the stack before giving out a jschar*. I don't think that #4 is ever done intentionally at the point.
Severity: normal → S3

Today this is mitigated by making the chars(const JS::AutoRequireNoGC& nogc) to be in a NoGC scope.
However the problem persists as nothing prevents us from leaking the obtained chars outside the NoGC scope.

Maybe the returned type should capture the dependency on JS::AutoRequireNoGC and prevent raw access to the characters.

Blocks: sm-runtime
Priority: -- → P3
Severity: S3 → S4

(In reply to Nicolas B. Pierron [:nbp] from comment #1)

Maybe the returned type should capture the dependency on JS::AutoRequireNoGC and prevent raw access to the characters.

In actual usage scenarios, this has been found to pretty much not work because when you have char* data, you are very very likely to want to pass it to something that takes a char*. Which means extracting it from whatever CharsHolderThatDisallowsGC struct you are using. You can push the special type a ways, but it's just too common to end up needing to use libc or other library functions.

Our main answer to this is AutoStableStringChars, which will copy the chars if needed.

In another domain, we have ProcessData() and ProcessFixedData(). They both take lambdas that will be handed the data to process. ProcessData also gives you an AutoCheckCannotGC&& token so that the analysis will stop you from GCing (though you can nogc.reset() once you're done using the data, and then you can GC all you want.) ProcessFixedData is more similar to AutoStableStringChars, in that it copies data if it is movable. We could consider providing a similar API for strings.

Having a lambda sounds like the nicest way to discourage aliasing the char*, even if this does not prevent it.

You need to log in before you can comment on or make changes to this bug.