Closed Bug 761552 Opened 12 years ago Closed 11 years ago

Converting narrow strings to external js strings should count the corresponding buffers to the JS compartment memory

Categories

(Toolkit :: about:memory, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla21

People

(Reporter: nmaier, Assigned: nmaier)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(2 files, 4 obsolete files)

XPCConvert takes care of converting native data from/to jsvals, when an XPCOM call crosses the C++/JS boundary.

Some things, in particular cstrings and utf8strings (narrow strings), are converted to unicode using new buffers and passing these new buffers as external strings into js land. These unshared external strings are currently allocated using nsMemory and not counted towards the js compartment and are hence ending up as heap-unclassified.
(The buffers are allocated in XPCConvert.cpp and have a corresponding finalizer there)

Wide strings (domstring/astring) are either shared, which complicates proper memory counting, or are already allocated using JS_Alloc.
Single chars (char, wchar_t) do the right thing already and are converted to JSStrings using JS_New(UC)StringCopyN().

Purpose of this bug is to handle narrow string by either avoiding external strings and using JS strings directly (preferred), using JS_malloc instead of nsMemory or keeping nsMemory and using JS_updateMallocCounter (less precise?).
Attached patch Patch v0, fast prototype (obsolete) — Splinter Review
Hey Ms2Ger, could you give some feedback here?
I'm not usually a XPC hacker, but as far as I can tell, you are ;)

Does the approach look sane to you, or should it be done another way, or am I missing something altogether?

Also, I'd like to avoid the JS_updateMallocCounter() and use JS_malloc for the UTF8 stuff in some form, which would require patching or copying UTF8ToNewUnicode. Would it be ok to provide a UTF8ToNewUnicode(const nsACString&, PRUint32 *aUTF16Count, JSContext* cx) which then would use JS_Alloc(cx, ...) instead of nsMemory, or would that be too messy?
If that was OK, what would be more appropriate:
- an overload duplicating the code
- or just a new argument having a default value = nsnull, and then patching the existing function like:
if(cx) JS_malloc()
else nsMemory::Alloc
Attachment #630134 - Flags: feedback?(Ms2ger)
Comment on attachment 630134 [details] [diff] [review]
Patch v0, fast prototype

Review of attachment 630134 [details] [diff] [review]:
-----------------------------------------------------------------

This isn't really something I'm familiar with... Bobby?
Attachment #630134 - Flags: feedback?(Ms2ger) → feedback?(bobbyholley+bmo)
Whiteboard: [MemShrink] → [MemShrink:P2]
In the future, can you please post patches with -U8?

You can set this in your hgrc:

  [diff]
  git = 1
  showfunc = 1
  unified = 8
Comment on attachment 630134 [details] [diff] [review]
Patch v0, fast prototype

patch needs more context - there are multiple places where the first chunk could apply.

> - or just a new argument having a default value = nsnull, and then patching
> the existing function like:
> if(cx) JS_malloc()
> else nsMemory::Alloc

This seems reasonable to me, but I'm not an XPCOM string peer.




>@@ -281,28 +283,20 @@ XPCConvert::NativeData2JS(XPCLazyCallCon
>             {
>                 const nsACString* cString = *((const nsACString**)s);
> 
>-                if (!cString)
>+                if (!cString || cString->IsVoid())
>                     break;
> 
>-                if (!cString->IsVoid()) {
>-                    PRUnichar* unicodeString = ToNewUnicode(*cString);
>-                    if (!unicodeString)
>-                        return false;
>-
>-                    JSString* jsString = JS_NewExternalString(cx,
>-                                                              (jschar*)unicodeString,
>-                                                              cString->Length(),
>-                                                              &sXPCOMUCStringFinalizer);
>-
>-                    if (!jsString) {
>-                        nsMemory::Free(unicodeString);
>-                        return false;
>-                    }
>-
>-                    *d = STRING_TO_JSVAL(jsString);
>+                if (cString->IsEmpty()) {
>+                    *d = JS_GetEmptyStringValue(cx);
>+                    break;
>                 }
> 
>+                JSString* str;
>+                if (!(str = JS_NewStringCopyN(cx, cString->Data(), cString->Length())))
>+                    return false;
>+                *d = STRING_TO_JSVAL(str);
>                 break;

The old code called ToNewUnicode(nsACString), which converts utf8 to utf16: http://mxr.mozilla.org/mozilla-central/source/xpcom/string/src/nsReadableUtils.cpp#316

JS_NewStringCopyN only appears to do utf8-utf16 conversion if JS_SetCStringsAreUTF8 is called, which we don't appear to do.
Attachment #630134 - Flags: feedback?(bobbyholley+bmo)
(In reply to Bobby Holley (:bholley) from comment #4)
> The old code called ToNewUnicode(nsACString), which converts utf8 to utf16:
> http://mxr.mozilla.org/mozilla-central/source/xpcom/string/src/
> nsReadableUtils.cpp#316
> 
> JS_NewStringCopyN only appears to do utf8-utf16 conversion if
> JS_SetCStringsAreUTF8 is called, which we don't appear to do.

Actually it is my understanding that ToNewUnicode just stuffs bytes into wide chars without performing any charset conversations. UTF8ToNewUnicode would perform such a conversation.
It is actually required to not perform a conversation here, so that js may still work on binary data.
New patch, new approach to the utf-8 stuff:
JS already got JS_DecodeUTF8, so I used that instead of UTF8ToNewUnicode, thus avoiding the external string.

Justin, you're a String peer, right?
Assignee: nobody → MaierMan
Attachment #630134 - Attachment is obsolete: true
Attachment #633571 - Flags: review?(justin.lebar+bug)
Comment on attachment 633571 [details] [diff] [review]
Patch v0, Avoid external strings in XPCConvert

> Justin, you're a String peer, right?

Be that as it may, I have absolutely no idea what's going on here.
Attachment #633571 - Flags: review?(justin.lebar+bug) → review?(benjamin)
I am stuck with Flash issues now and don't have the time to really understand what's going on here. If you're willing to wait I can probably get to this in a week, or you could ask dbaron or jst to look at it.
Attachment #633571 - Flags: review?(benjamin) → review?(bobbyholley+bmo)
Comment on attachment 633571 [details] [diff] [review]
Patch v0, Avoid external strings in XPCConvert

I'm really sorry this review rotted for so long, even though I'm only
responsible for about 1-2 months of it. In general, stuff like this is really
hard to review, because nobody that I know of has the knowledge to review it
by inspection. I needed to go dig through the various string APIs, both JS and
XPCOM, to see what they do. And most people don't know what to do about the
XPConnect parts. I sort of do, but it's more "I can reason about it if I go look
at it, it just takes a while". FWIW, it's acceptable to bother the reviewers if
the patch sits for more than a week or two. It probably would have helped here,
because then the patch wouldn't have bitrotted so much.

@@ -251,63 +243,65 @@ XPCConvert::NativeData2JS(XPCLazyCallCon
         case nsXPTType::T_UTF8STRING:
             {
                 const nsACString* cString = *((const nsACString**)s);
 
                 if (!cString)
                     break;
 
                 if (!cString->IsVoid()) {
-                    PRUint32 len;
-                    jschar *p = (jschar *)UTF8ToNewUnicode(*cString, &len);
+                    size_t len;
+                    jschar *buffer;
 
-                    if (!p)
+                    if (!JS_DecodeUTF8(cx, cString->Data(), cString->Length(),
+                                       0, &len))
                         return false;
-
-                    JSString* jsString =
-                        JS_NewExternalString(cx, p, len,
-                                             &sXPCOMUCStringFinalizer);
-
-                    if (!jsString) {
-                        nsMemory::Free(p);
+                    buffer = (jschar*)JS_malloc(cx, len * sizeof(jschar));
+                    if (!buffer)
+                        return false;
+                    if (!JS_DecodeUTF8(cx, cString->Data(), cString->Length(),
+                                       buffer, &len)) {
+                        JS_free(cx, buffer);
                         return false;
                     }

Unfortunately, JS_DecodeUTF8 has gone away in the interim, and I'm guessing the JS folks don't really want to expose it It's also not ideal to call it twice. So I'd say we should split the implementation of UTF8ToNewUnicode into two parts, one to calculate the length and one to copy into a pre-allocated buffer, and expose both pieces. Let's run that by a string peer to make sure it's ok.

         case nsXPTType::T_CSTRING:

This part seems totally reasonable.
Attachment #633571 - Flags: review?(bobbyholley+bmo) → review-
Flagging jlebar as a string peer regarding comment 9.
Flags: needinfo?(justin.lebar+bug)
> FWIW, it's acceptable to bother the reviewers if the patch sits for more than a week or two.

Can you set up a bugzilla whine so that we don't need to use human cycles to ping you?

> So I'd say we should split the implementation of UTF8ToNewUnicode into two parts, one to calculate 
> the length and one to copy into a pre-allocated buffer, and expose both pieces.

That sounds fine to me.  I'd also be happy if you just copied the UTF8ToNewUnicode implementation into this function and modified it to suit your needs; it's not a lot of code we're talking about here.
Flags: needinfo?(justin.lebar+bug)
(In reply to Justin Lebar [:jlebar] from comment #11)
> > FWIW, it's acceptable to bother the reviewers if the patch sits for more than a week or two.
> 
> Can you set up a bugzilla whine so that we don't need to use human cycles to
> ping you?

A weekly whine is on by default I believe. I was definitely aware of the patch for the entire time it was in my review queue. It was just a hard thing to review and a very hard thing to make time for, because it had already sat for 5 months in someone else's queue and the lack of any pings made it hard to tell whether anyone still cared about it.

In theory reviews should be first priority. But so should security bugs, b2g blockers, tracked regressions, and answering questions on IRC. It's hard to get to everything that everyone wants you to do. Surely you of all people must understand that, right? :-)
> Surely you of all people must understand that, right? :-)

I do, but I don't see why, if you're explicitly not doing this review, a ping from a human should increase its priority.
> and the lack of any pings made it hard to tell whether anyone still cared about it.

Ah, I missed this bit.  I guess that's fair, although you could of course have asked...  :)
Split UTF8ToNewUnicode per suggestion from comment #9.
Still kept UTF8ToNewUnicode around (using the new APIs) to avoid having to change current consumers and as a convenience function.

I still kinda am unhappy with UTF8ToUnicodeBuffer not doing any bounds checking on the buffer size... This leaves a potential new foot-gun. OTOH the function is low-level enough that this might be OK.
Attachment #633571 - Attachment is obsolete: true
Attachment #705489 - Flags: review?(justin.lebar+bug)
Actually use the new APIs from Part 1 in XPCConvert.

Not sure if it makes sense to review this before Part 1 is r+

Try run of a combined and unpolished version of these patches:
https://tbpl.mozilla.org/?tree=Try&rev=b736c769e7b9
The code is essentially the same though.
Attachment #705491 - Flags: review?(bobbyholley+bmo)
Comment on attachment 705491 [details] [diff] [review]
Part 2: Avoid external JS strings in XPCConvert

Review of attachment 705491 [details] [diff] [review]:
-----------------------------------------------------------------

r=bholley

::: js/xpconnect/src/XPCConvert.cpp
@@ +236,5 @@
>                  break;
>              }
>          case nsXPTType::T_UTF8STRING:
>              {
>                  const nsACString* cString = *((const nsACString**)s);

can we rename this to utf8String?

@@ +253,5 @@
> +                if (!len)
> +                    return false;
> +
> +                const size_t buffer_size = len * sizeof(PRUnichar);
> +                PRUnichar* buffer = 

nit - trailing whitespace

@@ +259,5 @@
> +                if (!buffer)
> +                    return false;
> +
> +                uint32_t copied;
> +                if (!UTF8ToUnicodeBuffer(*cString, buffer, &copied) || 

And here.
Attachment #705491 - Flags: review?(bobbyholley+bmo) → review+
Just to be clear, JS_CreateUCString takes a non-null-terminated string?  A bunch of the callers null-terminate their strings, but perhaps that's paranoia.

This is an extremely scary API, but what the heck.  r=me if you're confident that null-termination isn't required.
Attachment #705489 - Flags: review?(justin.lebar+bug) → review+
(In reply to Justin Lebar [:jlebar] from comment #18)
> Just to be clear, JS_CreateUCString takes a non-null-terminated string?  A
> bunch of the callers null-terminate their strings, but perhaps that's
> paranoia.
> 
> This is an extremely scary API, but what the heck.  r=me if you're confident
> that null-termination isn't required.

I was confident at some point, but not so much anymore... In fact, while stuff still works, kinda, there is actually an assert for zero-termination:
http://mxr.mozilla.org/mozilla-central/source/js/src/vm/String-inl.h#290
Having the API not zero-terminate strings is a bad idea anyway.
Changed the API so that UTF8ToUnicodeBuffer now zero-terminates.
Attachment #705489 - Attachment is obsolete: true
Attachment #706770 - Flags: review?(justin.lebar+bug)
Same as before, but:
buffer_size = len * sizeof(PRUnichar);
-> buffer_size = (len + 1) * sizeof(PRUnichar);

The rest is addressing the nits, cString -> utf8-string and white space.
Attachment #705491 - Attachment is obsolete: true
Attachment #706771 - Flags: review?(bobbyholley+bmo)
Comment on attachment 706770 [details] [diff] [review]
Part 1: Provide UTF8ToUnicode functions accepting a buffer

Even better.
Attachment #706770 - Flags: review?(justin.lebar+bug) → review+
Attachment #706771 - Flags: review?(bobbyholley+bmo) → review+
Keywords: checkin-needed
Nils, do you have any data about the effect of this change?  Have you noticed if heap-unclassified goes down?
njn, I don't really know if there is much core code that really displays the change. The reason being that most of the time those strings are only rather short-lived.
Actually, about:memory should show some change because nsIMemory*Reporter use some narrow strings that are transfered C++-land to js-land. These strings would have been external js strings before. However, these strings are short-lived as well...

It became visible however, when I crafted some code (via an add-on) that keeps those strings alive longer. I can't seem to find that test extension from a couple of months back anymore :( Should have attached it here :p

I think the changes from this bug become interesting when looking for leaky or misbehaving (cache too much info) compartments, in core code or extensions.
https://hg.mozilla.org/mozilla-central/rev/a502aa076a94
https://hg.mozilla.org/mozilla-central/rev/42c786efb5d6
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
This *may* have caused a regression in a Talos RSS measure, on Android: see bug 836429.
You need to log in before you can comment on or make changes to this bug.