Closed Bug 1034627 Opened 10 years ago Closed 10 years ago

Make XPConnect work with Latin1 strings and nursery-allocated strings

Categories

(Core :: XPConnect, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(8 files, 4 obsolete files)

3.67 KB, patch
terrence
: review+
Details | Diff | Splinter Review
3.06 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
4.91 KB, patch
bholley
: review+
Details | Diff | Splinter Review
4.89 KB, patch
bholley
: review+
Details | Diff | Splinter Review
4.06 KB, patch
bholley
: review+
Details | Diff | Splinter Review
2.72 KB, patch
bholley
: review+
Details | Diff | Splinter Review
5.77 KB, patch
terrence
: review+
Details | Diff | Splinter Review
12.18 KB, patch
bholley
: review+
Details | Diff | Splinter Review
We have to fix various JS API functions that don't work with Latin1 strings or moving GC. Most of the callers are in XPConnect, I will post patches to fix them.

There are a few calls outside xpconnect; will fix those separately.
Attachment #8450994 - Flags: review?(terrence)
Some Dump/Print functions can just use nsAutoJSString.
Attachment #8450999 - Flags: review?(bobbyholley)
Depends on: 1034191
Also pretty straight-forward.
Attachment #8451005 - Flags: review?(bobbyholley)
bholley and bz, correct me if I'm wrong but none of this looks super hot so for simplicity I'm just using nsAutoJSString (which always null-terminates, btw).
Attachment #8451017 - Flags: review?(bobbyholley)
Just curious, will all this code go away eventually?
Attachment #8451026 - Flags: review?(bzbarsky)
Right now XPCVariant has this trick where it roots the string and then uses the string's chars directly. This is not really safe once we start moving strings and their inline chars, and also doesn't work with Latin1 strings because we have to inflate.

This patch makes us copy the string chars instead. I added AllocateWStringWithSize to nsVariant, so that we only have to copy once. The alternative is to use nsAutoJSString and then nsVariant::SetFromWStringWithSize but that copies twice. If you think that's acceptable I can also do that; I've no idea if it matters here.
Attachment #8451034 - Flags: review?(bobbyholley)
Attached patch Part 7 - XPCConvert (obsolete) — Splinter Review
Places where we used memcpy now use JS_CopyStringChars, which works with Latin1 strings etc. JS_GetStringCharAt is used when we only need the first char.

Note that JS_GetTwoByteExternalStringChars is safe because the chars won't be moved by the JS engine and are guaranteed to be TwoByte, at least for now.
Attachment #8451039 - Flags: review?(bobbyholley)
> Just curious, will all this code go away eventually?

Yes, as soon as we land bug 660237 and bug 979835 (and maybe bug 1018658).
Comment on attachment 8450994 [details] [diff] [review]
Part 1 - Add some new APIs

Review of attachment 8450994 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi.cpp
@@ +5422,5 @@
> +    return str->latin1OrTwoByteChar(index);
> +}
> +
> +JS_PUBLIC_API(bool)
> +JS_CopyStringChars(JSContext *cx, jschar *dest, JSString *str)

Could we make |dest| a Range so that we get bounds checking on this?
Attachment #8450994 - Flags: review?(terrence) → review+
Comment on attachment 8450999 [details] [diff] [review]
Part 2 - Dump and Print functions

Review of attachment 8450999 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/loader/mozJSComponentLoader.cpp
@@ +121,3 @@
>          return false;
>  
> +    NS_ConvertUTF16toUTF8 utf8str(autoStr);

Once JSStrings are latin1, this will be a lot of unnecessary work (latin1->UTF16->UTF8, when latin1 is probably just fine). Shouldn't we add an nsAutoJSCString or somesuch to use in cases like this?
(In reply to Bobby Holley (:bholley) from comment #10)
> Once JSStrings are latin1, this will be a lot of unnecessary work
> (latin1->UTF16->UTF8, when latin1 is probably just fine). Shouldn't we add
> an nsAutoJSCString or somesuch to use in cases like this?

These are just the print/dump functions as far as I know, performance shouldn't be a big concern there. But in most cases we can probably use JSAutoByteString though, it has an encodeUtf8 method. I can post a follow-up patch to do that.
Comment on attachment 8450999 [details] [diff] [review]
Part 2 - Dump and Print functions

Review of attachment 8450999 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Jan de Mooij [:jandem] from comment #11)
> (In reply to Bobby Holley (:bholley) from comment #10)
> > Once JSStrings are latin1, this will be a lot of unnecessary work
> > (latin1->UTF16->UTF8, when latin1 is probably just fine). Shouldn't we add
> > an nsAutoJSCString or somesuch to use in cases like this?
> 
> These are just the print/dump functions as far as I know, performance
> shouldn't be a big concern there. But in most cases we can probably use
> JSAutoByteString though, it has an encodeUtf8 method. I can post a follow-up
> patch to do that.

My point is that we should design whatever primitives we need first, instead of writing code that we know will eventually cause a bunch of unnecessary inflation/deflation.

This is based on the understanding that JSStrings will generally be latin1, and that nsAutoJSString will remain UTF-16. If that's true, then we should use a helper like JSAutoByteString that doesn't unnecessarily round-trip through UTF-16.
Attachment #8450999 - Flags: review?(bobbyholley) → review-
Comment on attachment 8451005 [details] [diff] [review]
Part 3 - SandboxDump, AccessCheck.cpp

Review of attachment 8451005 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/Sandbox.cpp
@@ +105,1 @@
>      char *cstr = ToNewUTF8String(wstr);

Same here.

::: js/xpconnect/wrappers/AccessCheck.cpp
@@ +297,4 @@
>          return false;
>  
> +    for (size_t i = 0; i < autoStr.Length(); ++i) {
> +        switch (autoStr[i]) {

and here.
Attachment #8451005 - Flags: review?(bobbyholley) → review-
Comment on attachment 8451017 [details] [diff] [review]
Part 4 - XPCComponents and XPCWrappedJSClass

Review of attachment 8451017 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/XPCComponents.cpp
@@ +562,2 @@
>          nsID iid;
> +        if (!iid.Parse(NS_ConvertUTF16toUTF8(autoStr).get()))

And here.
Attachment #8451017 - Flags: review?(bobbyholley) → review-
Comment on attachment 8451034 [details] [diff] [review]
Part 6 - XPCVariant

Review of attachment 8451034 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/XPCVariant.cpp
@@ -95,5 @@
>  
>  NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(XPCVariant)
>      JS::Value val = tmp->GetJSValPreserveColor();
>  
> -    // We're sharing val's buffer, clear the pointer to it so Cleanup() won't

It's a bit of a shame to be getting rid of this, but I guess it's unavoidable. I don't think we use variants enough these days for it to really be a problem.

@@ +297,3 @@
>          // Despite the fact that the variant holds the length, there are
>          // implicit assumptions that mWStringValue[mWStringLength] == 0
> +        mData.u.wstr.mWStringValue[length] = '\0';

Let's hoist this into AllocateWStringWithSize, and then just assert it after the call.

::: xpcom/ds/nsVariant.h
@@ +145,5 @@
>      static nsresult SetFromInterface(nsDiscriminatedUnion* data, const nsIID& iid, nsISupports *aValue);
>      static nsresult SetFromArray(nsDiscriminatedUnion* data, uint16_t type, const nsIID* iid, uint32_t count, void * aValue);
>      static nsresult SetFromStringWithSize(nsDiscriminatedUnion* data, uint32_t size, const char *aValue);
>      static nsresult SetFromWStringWithSize(nsDiscriminatedUnion* data, uint32_t size, const char16_t *aValue);
> +    static nsresult AllocateWStringWithSize(nsDiscriminatedUnion* data, uint32_t size);

Add a comment indicating that this the leaves garbage in the string (but null-terminates it - see above).
Attachment #8451034 - Flags: review?(bobbyholley) → review+
Comment on attachment 8451039 [details] [diff] [review]
Part 7 - XPCConvert

Review of attachment 8451039 [details] [diff] [review]:
-----------------------------------------------------------------

I haven't reviewed all of this yet, but I'll wait for the UTF16 thing to be addressed.

::: js/xpconnect/src/XPCConvert.cpp
@@ +369,5 @@
> +static void
> +CheckCharsInCharRange(const CharT *chars, size_t len)
> +{
> +    for (size_t i = 0; i < len; i++) {
> +        if (!CheckJSCharInCharRange(chars[i]))

Why are we not propagating this failure the way the old code used to? It seems like we should continue doing that unless we have a good reason to change it.

@@ +453,5 @@
> +        jschar ch;
> +        if (!JS_GetStringCharAt(cx, str, 0, &ch))
> +            return false;
> +
> +        *((uint16_t*)d) = uint16_t(ch);

So jschar is staying uint16, right?

@@ +657,5 @@
> +        nsAutoJSString autoStr;
> +        if (!autoStr.init(cx, str))
> +            return false;
> +
> +        CopyUTF16toUTF8(autoStr, *rs);

Same issue of unnecessarily passing through UTF16.
Attachment #8451039 - Flags: review?(bobbyholley) → review-
Thanks for the reviews, Bobby. I'll see if we can use JSAutoByteString more.

(In reply to Bobby Holley (:bholley) from comment #16)
> Why are we not propagating this failure the way the old code used to? It
> seems like we should continue doing that unless we have a good reason to
> change it.

No, the old code also didn't propagate this; it just broke out of the loop like the new function does. CheckJSCharInCharRange just has a NS_WARNING and a |return false|, it also doesn't throw an exception we can propagate or something.

> So jschar is staying uint16, right?

Yup.
(In reply to Jan de Mooij [:jandem] from comment #17)
> No, the old code also didn't propagate this; it just broke out of the loop
> like the new function does. CheckJSCharInCharRange just has a NS_WARNING and
> a |return false|, it also doesn't throw an exception we can propagate or
> something.

Ah! You're right - sorry for the noise.
Attachment #8450999 - Attachment is obsolete: true
Attachment #8453048 - Flags: review?(bobbyholley)
Attachment #8451005 - Attachment is obsolete: true
Attachment #8453049 - Flags: review?(bobbyholley)
Kind of unrelated, but nsXPCComponents_InterfacesByID::NewResolve has:

     RootedString str(cx, JSID_TO_STRING(id));
     if (38 != JS_GetStringLength(str))
         return NS_OK;

Do you have any idea what string this is supposed to match? It's pretty weird.
Attachment #8451017 - Attachment is obsolete: true
Attachment #8453054 - Flags: review?(bobbyholley)
Removes some cruft from DeflateStringToUTF8Buffer (the buffer we pass it is never too small so the code to deal with that is dead) and adds some APIs we can use in the next patch.
Attachment #8453060 - Flags: review?(terrence)
Attachment #8451039 - Attachment is obsolete: true
Attachment #8453061 - Flags: review?(bobbyholley)
> Do you have any idea what string this is supposed to match?

A uuid enclosed in curly braces.
(In reply to Boris Zbarsky [:bz] from comment #24)
> > Do you have any idea what string this is supposed to match?
> 
> A uuid enclosed in curly braces.

Ah, thanks! It makes sense now; I should have started with the iid.Parse call I guess.
Comment on attachment 8453060 [details] [diff] [review]
Part 7 - UTF8 conversion

Review of attachment 8453060 [details] [diff] [review]:
-----------------------------------------------------------------

Great! IIRC, there used to be a user that tried the stack first and fell back to malloc after a failure. Glad to see that's gone.
Attachment #8453060 - Flags: review?(terrence) → review+
Attachment #8453049 - Flags: review?(bobbyholley) → review+
Attachment #8453054 - Flags: review?(bobbyholley) → review+
Blocks: 1036777
Comment on attachment 8453061 [details] [diff] [review]
Part 8 - XPCConvert

Review of attachment 8453061 [details] [diff] [review]:
-----------------------------------------------------------------

Nice careful work! r=bholley

::: js/xpconnect/src/XPCConvert.cpp
@@ -533,5 @@
>              ws->AssignLiteral(chars, length);
> -        } else if (useAllocator && STRING_TO_JSVAL(str) == s) {
> -            // The JS string will exist over the function call.
> -            // We don't need to copy the characters in this case.
> -            ws->Rebind(chars, length);

This was a hard-fought optimization by Neil I think. I guess there's nothing we can do if these things are nursery-allocated, but could we keep it if the strings are tenured? Does that ever happen?

@@ +1835,5 @@
>              }
>              if (len < count)
>                  len = count;
>  
> +            MOZ_ASSERT(len == count);

Can we just replace these 3 lines with |len = count;|?
Attachment #8453061 - Flags: review?(bobbyholley) → review+
Neil, see comment 27.
Attachment #8453048 - Flags: review?(bobbyholley) → review+
(In reply to Bobby Holley from comment #27)
> > -        } else if (useAllocator && STRING_TO_JSVAL(str) == s) {
> > -            // The JS string will exist over the function call.
> > -            // We don't need to copy the characters in this case.
> > -            ws->Rebind(chars, length);
> 
> This was a hard-fought optimization by Neil I think. I guess there's nothing
> we can do if these things are nursery-allocated, but could we keep it if the
> strings are tenured? Does that ever happen?

Actually this existed prior to my literal string patches, although it was called NewStringWrapper (basically an nsDependentString) back then, and only worked if useAllocator was true.
Comment on attachment 8453061 [details] [diff] [review]
Part 8 - XPCConvert

>+            if (!JS_GetStringCharAt(cx, str, 0, &ch))
[This always returns the first char, so strictly speaking linearising the string is a waste of time.]

>+        if (JS_StringHasLatin1Chars(str)) {
>+            size_t len;
>+            AutoCheckCannotGC nogc;
>+            const Latin1Char *chars = JS_GetLatin1StringCharsAndLength(cx, nogc, str, &len);
>+            if (chars)
>+                CheckCharsInCharRange(chars, len);
>+        } else {
>+            size_t len;
>+            AutoCheckCannotGC nogc;
>+            const jschar *chars = JS_GetTwoByteStringCharsAndLength(cx, nogc, str, &len);
>+            if (chars)
>+                CheckCharsInCharRange(chars, len);
(Note that this only has an effect if you #define STRICT_CHECK_OF_UNICODE, I don't know whether it's worth wrapping it in an #ifdef block.)

>+        size_t utf8Length = JS::GetDeflatedUTF8StringLength(flat);
>+        rs->SetLength(utf8Length + 1);
Wrong length?

>+        JS::DeflateStringToUTF8Buffer(flat, mozilla::RangedPtr<char>(rs->BeginWriting(), utf8Length));
>+        rs->BeginWriting()[utf8Length] = '\0';
SetLength already writes the null-terminator.

>+            nsAutoJSString autoStr;
>+            if (!autoStr.init(cx, str)) {
>                 if (pErr)
>                     *pErr = NS_ERROR_XPC_BAD_CONVERT_JS_NULL_REF;
>                 return false;
>             }
>+            nsCOMPtr<nsIAtom> atom = NS_NewAtom(autoStr);
[Side note: technically we don't want the string to be an auto string because NS_NewAtom wants a string buffer anyway.]

>             if (len > count) {
>                 if (pErr)
>                     *pErr = NS_ERROR_XPC_NOT_ENOUGH_CHARS_IN_STRING;
>                 return false;
>             }
>             if (len < count)
>                 len = count;
> 
>-            if (!(chars = JS_GetStringCharsZ(cx, str))) {
>-                return false;
>-            }
>             uint32_t alloc_len = (len + 1) * sizeof(jschar);
>             if (!(*((void**)d) = nsMemory::Alloc(alloc_len))) {
>                 // XXX should report error
>                 return false;
>             }
>-            memcpy(*((jschar**)d), chars, alloc_len);
>             (*((jschar**)d))[count] = 0;
[I wish I could work out from reading the code what it was supposed to do, it makes no sense to me! Anyone have any documentation for this?]
Comment on attachment 8450994 [details] [diff] [review]
Part 1 - Add some new APIs

>+JS_GetTwoByteExternalStringChars(JSString *str)
>+{
>+    return str->asExternal().twoByteChars();
>+}
Is it worth my while looking into the possibility of adding latin1 external string support, so that ACString and NS_LITERAL_CSTRING could be passed through JS without copying?
(In reply to neil@parkwaycc.co.uk from comment #31)
> Is it worth my while looking into the possibility of adding latin1 external
> string support, so that ACString and NS_LITERAL_CSTRING could be passed
> through JS without copying?

How common is this? Right now external strings will still be TwoByte, it'd be nice if we could use Latin1 where possible, even for "static" strings, because for instance concatenating a TwoByte string and a Latin1 string will result in a TwoByte string...
Comment on attachment 8451026 [details] [diff] [review]
Part 5 - Quick stubs

r=me
Attachment #8451026 - Flags: review?(bzbarsky) → review+
sorry had to back this changes out for valgrind test failures like https://tbpl.mozilla.org/php/getParsedLog.php?id=43592123&tree=Mozilla-Inbound
Parts 5, 7 and 8:

https://hg.mozilla.org/integration/mozilla-inbound/rev/8f50bc05d337
https://hg.mozilla.org/integration/mozilla-inbound/rev/110fbc2ebc1a
https://hg.mozilla.org/integration/mozilla-inbound/rev/bcba40acc0ac

(In reply to neil@parkwaycc.co.uk from comment #30)
> >+            if (!JS_GetStringCharAt(cx, str, 0, &ch))
> [This always returns the first char, so strictly speaking linearising the
> string is a waste of time.]

If you have a deep rope tree, getting the first character is more expensive, so if we call this repeatedly it may be cheaper to flatten the first time. In the JS engine, character operations like this one almost always flatten and it's usually fine. There are some exceptions for hot paths where it really matters.

> >+        size_t utf8Length = JS::GetDeflatedUTF8StringLength(flat);
> >+        rs->SetLength(utf8Length + 1);
> Wrong length?

Oops, good catch!

> >+        rs->BeginWriting()[utf8Length] = '\0';
> SetLength already writes the null-terminator.

True. I turned it into an assert.

> [I wish I could work out from reading the code what it was supposed to do,
> it makes no sense to me! Anyone have any documentation for this?]

Yeah it's pretty weird... I just tried to preserve the existing behavior :)
Keywords: leave-open
Huh, sorry for that. I pushed it to Try multiple times.. Maybe a unified build issue, or something that landed yesterday or today broke it :(
You need to log in before you can comment on or make changes to this bug.