Closed Bug 533450 Opened 15 years ago Closed 14 years ago

js_GetStringBytes ignore js_CStringsAreUTF8 when JSString::isUnitString

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: sog, Assigned: gwagner)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 6 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.15) Gecko/2009102704 Fedora/3.0.15-1.fc10 Firefox/3.0.15
Build Identifier: 

The code in js_GetStringBytes after the patches that add IntStrings and UnitStrings support (bug #513530) when handling a UnitString returns the associated codepoint, but when js_CStringsAreUTF8 is true it must returns the associated UTF8 encoding for codepoints >= 0x80.

With 1.8.0rc1 all works as documented.

Reproducible: Always
Attached file A test for src/jsapi-tests (obsolete) —
Thanks for the jsapi-tests framework, its wonderful.
Assignee: general → anygregor
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Yeah I missed the js_CStringsAreUTF8 option.
Thx a lot for the test case.

In order to run the testcase within the jsapi-tests in debug mode we have to set js_NewRuntimeWasCalled to false when we destroy the runtime.
Attached patch patch (obsolete) — Splinter Review
Yes, that fix my problem. 

But I was thinking in a fix that don't penalize the js_CStringsAreUTF8 case using another static table for the case "c >= 0x80" (by definition <= 0xFF) with the precomputed UTF8 encodings.

I think that the total size of 128 * sizeof(U32) including padding, is an affordable price.  What do you think?
Another approach, my first attempt to hack into spidermonkey.
The same, now with 'patch' checked, sorry.
Attachment #416937 - Attachment is obsolete: true
Fix a typo in previous attempt
Attachment #416956 - Attachment is obsolete: true
Thanks for the patch again! Looks good.
Does anybody know how often js_CStringsAreUTF8 is used?
Should we optimize it for speed?
Comment on attachment 416863 [details] [diff] [review]
patch

>     if (JSString::isUnitString(str)) {
>+        if (js_CStringsAreUTF8)
>+            return js_DeflateString(cx, str->chars(), str->length());

This leaks the result.
IIRC couchdb uses utf-8 char strings.

/be
Attached patch refresh (obsolete) — Splinter Review
Attachment #416568 - Attachment is obsolete: true
Attachment #416863 - Attachment is obsolete: true
Attachment #416984 - Attachment is obsolete: true
Attachment #417618 - Flags: review?(brendan)
Comment on attachment 417618 [details] [diff] [review]
refresh

>+++ b/js/src/jsapi.cpp	Mon Dec 14 20:44:32 2009 -0800
>@@ -894,16 +894,20 @@ JS_CommenceRuntimeShutDown(JSRuntime *rt
> }
> 
> JS_PUBLIC_API(void)
> JS_DestroyRuntime(JSRuntime *rt)
> {
>     rt->~JSRuntime();
> 
>     js_free(rt);
>+
>+#ifdef DEBUG
>+    js_NewRuntimeWasCalled = JS_FALSE;
>+#endif

Why is this needed? It's not what the code intends, as the comment before the definition of js_NewRuntimeWasCalled says. It is only trying to flag that JS_NewRuntime was *ever* called (one or more times, doesn't matter). Resetting it here is wrong because it means the assertion in JS_SetCStringsAreUTF8 won't catch failure to call that API before ever calling JS_NewRuntime.

Ah, but your jsapi-tests test requires calling JS_SetCStringsAreUTF8 after the jsapi-tests driver has created a new runtime, doesn't it? Argh. Paging jorendorff for advice.

>+++ b/js/src/jsstr.cpp	Mon Dec 14 20:44:32 2009 -0800
>@@ -2865,16 +2865,28 @@ const char *JSString::deflatedIntStringT
>     L3(0xf0), L3(0xf1), L3(0xf2), L3(0xf3), L3(0xf4), L3(0xf5), L3(0xf6), L3(0xf7),
>     L3(0xf8), L3(0xf9), L3(0xfa), L3(0xfb), L3(0xfc), L3(0xfd), L3(0xfe), L3(0xff)
> };
> 
> #undef L1
> #undef L2
> #undef L3
> 
>+/* Static table for common UTF8 encoding */
>+#define U8(c)   (char)(((c) >> 6) | 0xc0), (char)(((c) & 0x3f) | 0x80), 0
>+#define U(c)    U8(c), U8(c+1), U8(c+2), U8(c+3), U8(c+4), U8(c+5), U8(c+6), U8(c+7)
>+
>+const char JSString::deflatedUnitStringTable[] = {
>+    U(0x80), U(0x88), U(0x90), U(0x98), U(0xa0), U(0xa8), U(0xb0), U(0xb8),
>+    U(0xc0), U(0xc8), U(0xd0), U(0xd8), U(0xe0), U(0xe8), U(0xf0), U(0xf8)
>+};
>+
>+#undef U
>+#undef UTF8

That last should be U8, right?

>@@ -3806,21 +3818,26 @@ js_GetStringBytes(JSContext *cx, JSStrin
>     JSHashTable *cache;
>     char *bytes;
>     JSHashNumber hash;
>     JSHashEntry *he, **hep;
> 
>     if (JSString::isUnitString(str)) {
> #ifdef IS_LITTLE_ENDIAN
>         /* Unit string data is {c, 0, 0, 0} so we can just cast. */
>-        return (char *)str->chars();
>+        bytes = (char *)str->chars();
> #else
>         /* Unit string data is {0, c, 0, 0} so we can point into the middle. */
>-        return (char *)str->chars() + 1;
>-#endif            
>+        bytes = (char *)str->chars() + 1;
>+#endif
>+        if((*bytes & 0x80) && js_CStringsAreUTF8) {
>+            return JSString::deflatedUnitStringTable + ((*bytes & 0x7f) * 3);
>+        } else {
>+            return bytes;
>+        }

Don't write else after return, and don't overbrace if/then and f/then/else where all parts each fit on a line.

Looks good otherwise, but I'll let Jason finish the patch (I'm supposedly on vacation).

/be
Attachment #417618 - Flags: review?(brendan) → review?(jorendorff)
Now I'm supposedly on vacation too!

Offhand, if the API isn't testable, that's an API bug and we should fix it. Still pondering how to do that.
Attached patch patchSplinter Review
Changes from Brendans review.

Brendan, Jason, ... any idea how to fix the NewRuntimeWasCalled problem?
Attachment #417618 - Attachment is obsolete: true
Attachment #417618 - Flags: review?(jorendorff)
(In reply to comment #14)
> Created an attachment (id=420367) [details]
> patch
> 
> Changes from Brendans review.
> 
> Brendan, Jason, ... any idea how to fix the NewRuntimeWasCalled problem?

Need more jsapi-test framework support? Or a different API, but we wanted to avoid per-runtime C-strings-are-UTF8 parameterization. Jason should weigh in.

/be
Attached patch remove testSplinter Review
It seems an API change will take longer. Lets just fix the js_CStringsAreUTF8 bug first.
Attachment #421940 - Flags: review?(brendan)
Comment on attachment 421940 [details] [diff] [review]
remove test

>diff -r 5c06d8cc50b0 js/src/jsstr.cpp
>--- a/js/src/jsstr.cpp	Thu Jan 14 18:48:17 2010 -0800
>+++ b/js/src/jsstr.cpp	Fri Jan 15 15:10:36 2010 -0800
>@@ -2875,16 +2875,28 @@ const char *JSString::deflatedIntStringT
>     L3(0xf0), L3(0xf1), L3(0xf2), L3(0xf3), L3(0xf4), L3(0xf5), L3(0xf6), L3(0xf7),
>     L3(0xf8), L3(0xf9), L3(0xfa), L3(0xfb), L3(0xfc), L3(0xfd), L3(0xfe), L3(0xff)
> };
> 
> #undef L1
> #undef L2
> #undef L3
> 
>+/* Static table for common UTF8 encoding */
>+#define U8(c)   (char)(((c) >> 6) | 0xc0), (char)(((c) & 0x3f) | 0x80), 0

Nit: use char(...) C++ call-style casts.

Looks like no signed char issues here; just noting.

>+        return ((*bytes & 0x80) && js_CStringsAreUTF8)
>+              ? JSString::deflatedUnitStringTable + ((*bytes & 0x7f) * 3)
>+              : bytes;

Nit: ? and : underhang the first ( in the condition, so indent one more space on each overflow line.

Thanks, r=me with these picked.

/be
Attachment #421940 - Flags: review?(brendan) → review+
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/8b2ec4527398
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: