Closed Bug 331456 Opened 18 years ago Closed 18 years ago

Per-runtime deflated string cache

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: igor, Assigned: igor)

Details

Attachments

(1 file, 4 obsolete files)

Currently the cache of deflated strings is global is shared between runtimes. For embeddings with multiple runtimes and threads this increases a lock congestion without benefits of sharing the deflated characters as strings are cached based on their address, no context. See comment 11 in bug 294499#11 for details.

Thus the idea is to use one cache per runtime to address this.
Attached patch Implementation (obsolete) — Splinter Review
Assignee: general → igor.bukanov
Status: NEW → ASSIGNED
Attachment #216008 - Flags: review?(brendan)
Comment on attachment 216008 [details] [diff] [review]
Implementation

>+++ src/jscntxt.h	2006-03-23 13:57:18.000000000 +0100
>@@ -134,6 +134,13 @@
>     jsdouble            *jsNegativeInfinity;
>     jsdouble            *jsPositiveInfinity;
> 
>+#ifdef JS_THREADSAFE
>+    JSLock              *deflatedStringCacheLock;
>+#endif
>+    JSHashTable         *deflatedStringCache;
>+#ifdef DEBUG
>+    uint32              deflatedStringCacheBytes;
>+#endif

Empty line here please (before comment, generally).

>     /* Empty string held for use by this runtime's contexts. */
>     JSString            *emptyString;
> 
>Index: src/jsgc.c
>===================================================================
>--- src.orig/jsgc.c	2006-03-23 13:57:03.000000000 +0100
>+++ src/jsgc.c	2006-03-23 13:57:18.000000000 +0100
>@@ -312,6 +312,22 @@
>     return (uint8 *)(pageAddress - offsetInArena + thingIndex);
> }
> 
>+JSRuntime*
>+js_GetGCStringRuntime(JSString *str)
>+{
>+    JSGCPageInfo *pi;
>+    JSGCArenaList *list;
>+
>+    pi = (JSGCPageInfo *) ((jsuword)str & ~GC_PAGE_MASK);

Time to macro-ize the right-hand side of = here, for cleaner code at all the places such an expression occurs in jsgc.c?

>+    list = PAGE_TO_ARENA(pi)->list;
>+
>+    JS_ASSERT(list->thingSize == sizeof(JSGCThing));
>+    JS_ASSERT(GC_FREELIST_INDEX(sizeof(JSGCThing)) == 0);
>+
>+    return (JSRuntime *)((unsigned char*)list

No need for unsigned, and space before * is more consistent style.

>+                         - offsetof(JSRuntime, gcArenaList));
>+}
>+
> JSBool
> js_IsAboutToBeFinalized(JSContext *cx, void *thing)
> {
>@@ -2471,21 +2434,40 @@
>     JSAtom *atom;
> 
>     rt = cx->runtime;
>-    JS_ASSERT(!rt->emptyString);
>+
>+    /* Initialize string cache */
>+#ifdef JS_THREADSAFE
>+    JS_ASSERT(!rt->deflatedStringCacheLock);
>+    rt->deflatedStringCacheLock = JS_NEW_LOCK();
>+    if (!rt->deflatedStringCacheLock)
>+        goto bad;

No need for goto bad here, save an if-test at that label by returning false.

/be
(In reply to comment #2)
> >+    return (JSRuntime *)((unsigned char*)list
> 
> No need for unsigned

I used "unsigned char *" and not "char *" since strictly speaking according to the C standard an arbitrray pointer could only be casted to "unsigned char *" while casting to "char*" was not defined. Is it OK if I replace that by uint8 *?
Good grief, does ISO C really say that?  What pedantry.  It does not square with the evolution of C on real architectures, which I witnessed a lot of, or the plain physical sense of the types and the pointer arithmetic.

Sure, uint8 * -- there are other such naughty char * casts, however, and if you aren't going to fix them I don't see the point in being different.

I've never seen a warning on this.

/be
(In reply to comment #4)
> does ISO C really say that?

Sorry for spreading FUD about ISO C. Casting to signed char* has the same restrictions as casting to unsigned char*. 

signed char* is only problematic when one dereference it: according to the standard an implementation is not required to use two's-compliment form for signed integers. So possible bit patterns for signed char may not cover the whole byte and accessing a pattern that is outside the defined range is an undefined behavior.

Still IMO using unsigned char or byte or uint8 is better as it emphasizes that raw memory is accessed and not a C string.
Attachment #216008 - Attachment is obsolete: true
Attachment #216217 - Flags: review?(brendan)
Attachment #216008 - Flags: review?(brendan)
Comment on attachment 216217 [details] [diff] [review]
Implementation v2 (addressing the nits)

> #define PAGE_TO_ARENA(pi)                                                     \
>     ((JSGCArena *)((jsuword)(pi) - (pi)->offsetInArena                        \
>                    - offsetof(JSGCArena, base)))
> 
> #define PAGE_INDEX(pi)                                                        \
>     ((size_t)((pi)->offsetInArena >> GC_PAGE_SHIFT))
> 
>+#define GC_THING_PAGE(thing)                                                  \
>+    ((JSGCPageInfo *)((jsuword)(thing) & ~GC_PAGE_MASK))

More clear and consistent name would be THING_TO_PAGE.  I'd probably have burdened all these PAGE names with _INFO after PAGE, but that's unnecessary.  Still, THING_TO_PAGE seems better than GC_THING_PAGE.

> JSBool
> js_InitRuntimeStringState(JSContext *cx)
> {
>     JSRuntime *rt;
>     JSString *empty;
>     JSAtom *atom;
> 
>     rt = cx->runtime;
>-    JS_ASSERT(!rt->emptyString);
>+
>+    /* Initialize string cache */
>+#ifdef JS_THREADSAFE
>+    JS_ASSERT(!rt->deflatedStringCacheLock);
>+    rt->deflatedStringCacheLock = JS_NEW_LOCK();
>+    if (!rt->deflatedStringCacheLock)
>+        return JS_FALSE;
>+#endif
> 
>     /* Make a permanently locked empty string. */
>+    JS_ASSERT(!rt->emptyString);
>     empty = js_NewStringCopyN(cx, js_empty_ucstr, 0, GCF_LOCK);
>     if (!empty)
>-        return JS_FALSE;
>+        goto bad;
> 
>     /* Atomize it for scripts that use '' + x to convert x to string. */
>     atom = js_AtomizeString(cx, empty, ATOM_PINNED);
>     if (!atom)
>-        return JS_FALSE;
>+        goto bad;
> 
>     rt->emptyString = empty;
>     rt->atomState.emptyAtom = atom;
>+
>     return JS_TRUE;
>+
>+  bad:
>+#ifdef JS_THREADSAFE
>+    if (rt->deflatedStringCacheLock) {

No need for if-test here.

>+        JS_DESTROY_LOCK(rt->deflatedStringCacheLock);
>+        rt->deflatedStringCacheLock = NULL;
>+    }
>+#endif
>+    return JS_FALSE;
>+
> }

About char * vs. uint8 * -- I'm fine either way in SpiderMonkey.  In C since the odd-ball DEC systems of the 70s died, char * has meant byte pointer.  And even on my beloved DECSystem 2060, where char was int was a 36-bit word (!), you couldn't address at a finer grain, so char * was fine for arbitrary pointer type.

Then came void *, but it doesn't work if you want address arithmetic.

C99 will save us, some year (maybe this year?).

/be
(In reply to comment #7)
> In C since the odd-ball DEC systems of the 70s died, char * has meant byte pointer.

Friend of mine who works in a company that provides ticket booking and reservation for few airlines had to support 36-bit C code at least in 2003. Next time we meet I ask him about the hardware.



> even on my beloved DECSystem 2060, where char was int was a 36-bit word (!),
> you couldn't address at a finer grain, so char * was fine for arbitrary pointer
> type.
> 
> Then came void *, but it doesn't work if you want address arithmetic.
> 
> C99 will save us, some year (maybe this year?).
> 
> /be
> 

Attached patch Impementation v3 (obsolete) — Splinter Review
The previous version of the patch had a bad bug: I forgot to call js_GetGCStringRuntime in js_SetStringBytes. GCC was screaming about uninitialized rt but I missed the warning in the make output :(.

So here is a patch version that changed js_SetStringBytes to take JSContext* as argument as the context is available in places where the function is called. In addition js_GetStringChars is changes to take JSRuntime* parameter so the only place where js_GetGCStringRuntime is called is JS_GetStringBytes.

It would be even cleaner to add a version of JS_GetStringBytes that takes JSContext* as argument and change JS_GetStringBytes to create a temporary context to pass as argument to js_GetStringChars and fix the following comment:

/*
 * May be called with null cx by js_GetStringBytes, see below.
 */
char *
js_DeflateString(JSContext *cx, const jschar *chars, size_t length)

But that is for another bug.
Attachment #216217 - Attachment is obsolete: true
Attachment #216255 - Flags: review?(brendan)
Attachment #216217 - Flags: review?(brendan)
(In reply to comment #9)
> It would be even cleaner to add a version of JS_GetStringBytes that takes
> JSContext* as argument and change JS_GetStringBytes to create a temporary
> context to pass as argument to js_GetStringChars and fix the following comment:
> 
> /*
>  * May be called with null cx by js_GetStringBytes, see below.
>  */
> char *
> js_DeflateString(JSContext *cx, const jschar *chars, size_t length)
> 
> But that is for another bug.
> 

Please don't -- JS_GetStringBytes is called a lot.  It should not have to create and destroy a context just to avoid a null-test on cx under js_DeflateString.  The branch penalty is small with a short then part.

/be
(In reply to comment #9)
> It would be even cleaner to add a version of JS_GetStringBytes that takes
> JSContext* as argument

No point in this API either, really -- lots of sunk costs to un-sink, and bigger fish to fry elsewhere.  Plus, infallible and cx-less is better for callers.  Most should not need to test and propagate OOM.  OOM checking is a fruitless distortion to much of our code, on modern VM-based OSes.  For handhelds it could matter, but should be handled in a more centralized way.

/be
Comment on attachment 216255 [details] [diff] [review]
Impementation v3

>@@ -4254,17 +4251,17 @@
>     /* Free chars (but not bytes, which caller frees on error) if we fail. */
>     str = js_NewString(cx, chars, length, 0);
>     if (!str) {
>         JS_free(cx, chars);
>         return NULL;
>     }
> 
>     /* Hand off bytes to the deflated string cache, if possible. */
>-    if (!js_SetStringBytes(str, bytes, nbytes))
>+    if (!js_SetStringBytes(cx, str, bytes, nbytes))

js_SetStringBytes should take an rt, not a cx that it uses only to get cx->runtime -- this restores symmetry with js_GetStringBytes, confirming the parameterization matching the data dependency.

> JS_GetStringBytes(JSString *str)
> {
>+    JSRuntime *rt;
>     char *bytes;
> 
>-    bytes = js_GetStringBytes(str);
>+    rt = js_GetGCStringRuntime(str);
>+    bytes = js_GetStringBytes(rt, str);

Single-use variable to avoid nested function call expression seems not worth its source-code weight here.

>+extern void
>+js_PurgeDeflatedStringCache(JSRuntime *rt, JSString *str);
>+

Since you're moving this, perhaps it belongs down with js_[GS]etStringBytes.

/be
Attached patch Impementation v4 (obsolete) — Splinter Review
Attachment #216255 - Attachment is obsolete: true
Attachment #216260 - Flags: review?(brendan)
Attachment #216255 - Flags: review?(brendan)
Comment on attachment 216260 [details] [diff] [review]
Impementation v4

Thanks, sorry for the patch iteration.  Glad you caught that warning -- I'm not test-building, just reading (too quickly, obviously).

/be
Attachment #216260 - Flags: review?(brendan) → review+
This is the version to commit.
Attachment #216260 - Attachment is obsolete: true
I committed the patch from attachment 216261 [details] [diff] [review].
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Flags: in-testsuite-
This bug introduced a leak (caught by valgrind while running the shell), because we finish the string runtime state before running the last GC, so we weren't purging remaining strings from the string cache and freeing their bytes.

This checkin fixed the leak:
Checking in jsstr.c;
/cvsroot/mozilla/js/src/jsstr.c,v  <--  jsstr.c
new revision: 3.128.2.4; previous revision: 3.128.2.3
done
Checking in jsstr.h;
/cvsroot/mozilla/js/src/jsstr.h,v  <--  jsstr.h
new revision: 3.30.4.5; previous revision: 3.30.4.4
done
Checking in jscntxt.c;
/cvsroot/mozilla/js/src/jscntxt.c,v  <--  jscntxt.c
new revision: 3.83.2.5; previous revision: 3.83.2.4
done
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: