Closed Bug 606065 Opened 14 years ago Closed 14 years ago

Move deflated string cache into compartments

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: gal, Assigned: gal)

References

(Blocks 1 open bug)

Details

(Whiteboard: [compartments])

Attachments

(6 obsolete files)

The string we deflate is already in a specific compartment, move the cache there, too. This makes it lock free!
No longer depends on: 606064
Blocks: 606068
No longer blocks: 606068
Assignee: general → gal
Attached patch patch (obsolete) — Splinter Review
Note that we don't keep a compartment alive if only strings are live in it. Only global objects keep the compartment alive. So the deflatedStringCache might go away. I don't think anything holds on to strings like that, but maybe we should document this.
Attachment #484953 - Flags: review?(jorendorff)
The deflated string cache is only for callers of JS_GetStringBytes and related APIs, and the char * returned is valid only so long as the JSString *str from which the deflated string was got still lives.

If the string dies even though there are stack references to it just because no global for its compartment lives, then there could be a problem, but it seems unlikely. I don't remember seeing any embedding over the years in which the cx's global object did not alive any JSString used via a stack temporary, where the JSString owned deflated string bytes.

/be
Definitely want some carefully written JS API doc update.

/be
Keywords: dev-doc-needed
Nice to see locking go -- need to get rid of the rest apart from MT wrappers!

/be
Comment on attachment 484953 [details] [diff] [review]
patch

>@@ -4262,31 +4236,17 @@ DeflatedStringCache::getBytes(JSContext 
> 
>     /*
>      * In the single-threaded case we use the add method as js_DeflateString
>      * cannot mutate the map. In particular, it cannot run the GC that may
>      * delete entries from the map. But the JS_THREADSAFE version requires to
>      * deal with other threads adding the entries to the map.
>      */

The last sentence needs to be deleted.

But the rest of this comment seems to apply still. map.add() will assert if js_DeflateString triggered a GC that removed some entries. We need the map.relookupOrAdd code, or some of it.

r=me with that.
Attachment #484953 - Flags: review?(jorendorff) → review+
Why do we need the relookupOrAdd code? I don't understand.
Attached patch patch (obsolete) — Splinter Review
Attached patch patch (obsolete) — Splinter Review
Attachment #485570 - Attachment is obsolete: true
This is the new comment. I am convinced your objection is wrong and I will land the patch (try is green, too), but please double check my math here. Maybe I am overlooking something stupidly.

/*
 * 1. js_DeflateString does not mutate the map.
 * 2. At most one thread is allowed to mutate a compartment at any
 *    given time.
 * 3. Each compartment has its own map.
 *
 * Hence, map was not changed since lookupForAdd() and we can use
 * a regular add() here.
 */
http://hg.mozilla.org/tracemonkey/rev/685accd622f0
Whiteboard: [compartments], fixed-in-tracemonkey
(In reply to comment #10)
> /*
>  * 1. js_DeflateString does not mutate the map.

The comment should add that "in particular, it does not run the GC even when hitting OOM".

As a side note it may be necessary to revert this change to implement the background sweep of strings, but that is post-4.0 and we may remove JS_GetStringBytes() at that point.

Also this patch assumes that, for example, DOM workers, do not share strings accross threads, right?
Hm, what about atomized strings? They live in the default compartment and surely are shared across threads even in DOM workers. So they need alock.
#12: only atoms are allowed to be shared, everything else is a bug

#13: good point! patch in a sec
Attached patch patch (obsolete) — Splinter Review
Attachment #485572 - Attachment is obsolete: true
Attachment #485614 - Flags: review?(igor)
Attached patch patch (obsolete) — Splinter Review
For clarification the remove() path is dead.
Attachment #485614 - Attachment is obsolete: true
Attachment #485615 - Flags: review?(igor)
Attachment #485614 - Flags: review?(igor)
I backed out the aptch from TM until this is fixed.
(In reply to comment #16)
> Created attachment 485615 [details] [diff] [review]
> patch
> 
> For clarification the remove() path is dead.

HM, this does not have the getBytes locking part.
I am looking a bit deeper here and I think we could get rid of the deflated string cache altogether. There is room in JSStrings now to store the deflated value there. The only downside is that I have to make strings immutable when I deflated them, for which I need a context. Is this a good time to drop the old JS_GetStringBytes(str)-like APIs that don't take a context? This would propagate outwards a bit (JS_GetFunctionName and such), but the changes are minimal. I hacked up a patch and changed xpconnect and dom and its not a big deal for our embedding.
I suggested just moving the cache for moz2/fx4/js1.8.5. Anything else is buying risk by changing mutability on JS_GetStringBytes (assuming we figure out a way to do that without a cx, which is possible), in order to claim the capacity field of mutable strings to use as a pointer to the deflated bytes. Plus the API changes, which have their effects, and also by signature imply fallibility (leading cx param, pointer return type), but in this case that rule does not apply.

OTOH moving the cache is straightforward if you can find the cache from the str as we have found the runtime (GetGCThingRuntime). No mutability change with subtle effects on string concat perf, etc.


/be
With enough time to bake the subtleties out of the cake, sure. We have little time for added bugs, we need to stabilize what we have, more or less.

/be
Attached patch patch (obsolete) — Splinter Review
Lets please get rid of cx == NULL in the code soon, it makes everything very ugly.
Attachment #484953 - Attachment is obsolete: true
Attachment #485615 - Attachment is obsolete: true
Attachment #485650 - Flags: review?(brendan)
Attachment #485615 - Flags: review?(igor)
(In reply to comment #22)
> Created attachment 485650 [details] [diff] [review]
> patch

The patch assumes that it is OK to report OOM under the compartment lock and that JS_GetStringBytes cannot be called there by one of the hooks. If this is OK, then the patch should comment on that.
(In reply to comment #20)
> by signature imply fallibility
> (leading cx param, pointer return type), but in this case that rule does not
> apply.

GetStringBytes is fallible now but the caller has no way to find about that. I completely agree with Gal that we should remove GetStringBytes(str). But I do not think that we should add GetStringBytes(cx, str) version if we are going to change a signature. Something like GetStringBytesCopy() that allocates the bytes or copy them into supplied buffer together with autoclass that hides the details would server better and would allow to get rid of the deflated cache.
#23: the patch unlocks before reporting OOM
(In reply to comment #25)
> #23: the patch unlocks before reporting OOM

I meant OOM and bad surrogate reporting (for embeddings that set js_CStringsAreUTF8) in js_DeflatString.

I think the patch should always pass NULL to js_DeflatString and do own error reporting.
I think we should just remove JS_GetStringBytes. In FF it strips high bytes from strings so its use may map different strings into the same property name. That may lead to XSS or other troubles.

Most of debug-related uses of the function could be replaces with js_PutEscapedString/js_FileEscapedString and in the rest of places that really depends on Latin-1 bytes we should have a utility that copies the chars into a buffer and report an error if it finds any non-latin1 character.
A new API should have a C++ flavor that uses auto-allocation and RAII to free if the on-stack inline-member char buffer is too small for the string. Let's not add more uses of manual-free-required C APIs. If possible, eliminate such APIs.

/be
I'm in favor of a fix that's cleaner and longer-term, provided it does not make subtle changes that up the risk ante in ways likely to require months of beta cycling to shake out.

One risk with deflated string cache elimination in favor of a byte buffer that is allocated and owned by the caller is the memory leak risk. An auto RAII helper has no such leak but might not be usable in subtle cases where the current code uses JS_GetStringBytes and counts on the deflated string cache's entry living as long as the string. But perhaps we can inspect and even automate static analysis for such abuses.

Putting a deflated string pointer in the JSString header, requiring immutability to reclaim the capacity member's space, seems riskier to me than just keeping a (or many) deflated string caches, but going for the gold (eliminating deflated string caching of any kind), pushing the sweeter RAII helper API, seems least risk of all if it is doable.

/be
This bug is marked fixed-in-tracemonkey, but there's an unreviewed patch. What's up?
blocking2.0: ? → beta8+
Patch bounced when I tried to land it.
Whiteboard: [compartments], fixed-in-tracemonkey → [compartments]
Depends on: 607292
No longer depends on: 607292
Igor, is it fair to say that you are about done with eliminating the deflated string cache? If so, I am going to close this bug and ask to block on those bugs.
(In reply to comment #32)
> Igor, is it fair to say that you are about done with eliminating the deflated
> string cache? If so, I am going to close this bug and ask to block on those
> bugs.

I am about to eliminate the need for deflated string cache for non-atoms. To kill that for atoms we need to decide what to do about JS_GetFunctionName usage in jsd.
Comment on attachment 485650 [details] [diff] [review]
patch

Recent comments suggest I don't need to review -- please mark this patch obsolete if so (that would have removed the r?me already).

/be
Attachment #485650 - Flags: review?(brendan)
Attachment #485650 - Attachment is obsolete: true
Igor, lets change the API of JS_GetFunctionName so we don't need the dsc. Its not a real public API, its just jsd. If you can make a patch for that, I will dup this bug against yours. Not having to worry about the dsc any more for the per-compartment GC stuff would rock.
(In reply to comment #35)
> Igor, lets change the API of JS_GetFunctionName so we don't need the dsc. Its
> not a real public API, its just jsd.

This is obviously a false statement!

$ grep JS_GetFunctionName jsapi.h
JS_GetFunctionName(JSFunction *fun);
 * Prefer JS_GetFunctionId over JS_GetFunctionName because it returns null for

It's ancient public API. See JS_GetFunctionId for the Unicode-preserving API.

We can change any public API, but the argument that "it's just jsd" is wrong.

I still think we should measure string lengths. Even if DSC usage was mainly on error paths, allocating in those cases is potentially going to OOM. Complexity in JSAutoByteString is not an issue if on those paths.

But yeah, let's get rid of the DSC.

/be
(In reply to comment #35)
> Igor, lets change the API of JS_GetFunctionName so we don't need the dsc. Its
> not a real public API, its just jsd.

The problem is that few functions in jsd returns the result of JS_GetFunctionName to the caller. I am not sure what to do there. A simple solution is just to call JS_EncodeString there and document that the caller must release the strings. But that would mean that all the callers would leak. Another solution is to break the interface and return JSString pointers there. So what should be done?

> Not having to worry about the dsc 

With dsc only for atoms the cache would just need to take to default compartment lock. But I agree that that it is better to remove the API as it caused bugs like 608799.
(In reply to comment #36)
> I still think we should measure string lengths. Even if DSC usage was mainly on
> error paths, allocating in those cases is potentially going to OOM.

A fixed size-buffer would not prevent OOM if the string does not fit into it. We may consider cutting them to make sure that the error messages are short, but this is for another bug.
(In reply to comment #37)
> The problem is that few functions in jsd returns the result of
> JS_GetFunctionName to the caller. I am not sure what to do there. A simple
> solution is just to call JS_EncodeString there and document that the caller
> must release the strings. But that would mean that all the callers would leak.
> Another solution is to break the interface and return JSString pointers there.
> So what should be done?

Break API, don't change to create an obvious leak hazard.

/be
blocking2.0: beta8+ → beta9+
Fixed by an alternative approach, no longer have to block on this.
Status: NEW → RESOLVED
blocking2.0: beta9+ → ---
Closed: 14 years ago
Resolution: --- → WONTFIX
Blocks: 616927
No longer blocks: compartmentGC
I see this is marked as docs needed. Anyone feel like contributing appropriate material here?
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: