convert some consumers over to CopyUTF8toUTF16 / CopyUTF16toUTF8




16 years ago
4 years ago


(Reporter: alecf, Assigned: peterv)



Dependency tree / graph

Firefox Tracking Flags

(Not tracked)



(7 attachments, 1 obsolete attachment)

as a result of fixing bug 87677, we can now start converting over some consumers
where we think performance will matter.
here's a patch for xpcom/ds. reviews?
Comment on attachment 125856 [details] [diff] [review]
fix important parts of xpcom/ds

requesting review from doug  (xpcom owner)
Attachment #125856 - Flags: review?(dougt)
Comment on attachment 125856 [details] [diff] [review]
fix important parts of xpcom/ds

Attachment #125856 - Flags: superreview+
Attachment #125856 - Flags: review?(dougt) → review+
that patch has been checked in - lets use this as a dumping ground for other
consumers.. I found a few just by looking for "copyutf8toucs2" in LXR - people
put in comments saying that they wanted it.
I've got a patch for mozilla/content, will attach once my tree is updated so
that I can create a diff...
here are some perl regexp's that will do some of the substituion for you:

s/(\w+)\.Assign\(NS_ConvertUTF8toUCS2\((\w+)\)\)/CopyUTF8toUTF16\($2, $1\)/g;
s/(\w+)\.Assign\(NS_ConvertUCS2toUTF8\((\w+)\)\)/CopyUTF16toUTF8\($2, $1\)/g;

s/(\w+)\.Assign\(NS_ConvertASCIItoUCS2\((\w+)\)\)/CopyASCIItoUCS2\($2, $1\)/g;
s/(\w+)\.Assign\(NS_LossyConvertUCS2toASCII\((\w+)\)\)/CopyUCS2toASCII\($2, $1\)/g;

This works easily with "Assign" but not as easily with operator=, the problem
being that some stuff is done on variable creation

nsAutoString foo = NS_ConvertUTF8toUCS2(bar);

Also, even after applying the above regexp's you may run into this kind of problem:

char buf[100];
... fill in buff ...

the problem there is that CopyASCIItoUCS2() does not take a const char* as a

Anyway, I think if someone wanted, they could liberally apply these regexp's,
and then go build, fixing errors as they find them. I know callion and timeless
sometime do this kind of thing, so I'm cc'ing them in case. Also cc'ing Andrew
Taylor who has helped out on other footprint issues, figuring this is another
easy way to help.

the easiest way to make use of the above expressions, for anyone else listening,
is to save them to a text file, say, and then run them on a group
of files:

# first find all files that use NS_Convert*, and save it to a file
# so we can reuse it later without re-grepping (note I'm searching the
# layout and content directories in this example)
find layout content -type f | xargs grep -n -e NS_Convert -e NS_LossyConvert >

# now use that list with our substitutions
cut -d: -f1 copy-candidates.list | uniq | xargs perl -pi

# now build, fixing any build errors.

thanks to the use of grep -n, the .list file that is generated can also be
loaded in emacs.. then hit M-x compile-mode and you'll be able to C-` your way
through the candidate lines.
I ran into a couple of places where I had to add null checks, special code to
truncate, and stuff like that when dealing with char* and PRUnichar* data, would
it make sense to create methods of [Copy|Append]UTF[8|16]toUTF[16|8]() to avoid
having to spread code like that around? I'll hack something up...
Attachment #125944 - Flags: superreview?(dbaron)
Attachment #125944 - Flags: review?(alecf)
Comment on attachment 125944 [details] [diff] [review]
char* and PRUnichar* versions of Copy/AppendUTF*toUTF*

looks great - jag should also probably give you a nod on this of
Attachment #125944 - Flags: review?(alecf) → review+
Comment on attachment 125944 [details] [diff] [review]
char* and PRUnichar* versions of Copy/AppendUTF*toUTF*

Looks good to me (although why put the functions in such different orders in
the header and implementations -- and inconsistent ones in the
implementations?).  jag should review, though.
Attachment #125944 - Flags: superreview?(dbaron) → superreview?(jaggernaut)
I reorderd the methods in the implementation to match what's in the header file.
Comment on attachment 125944 [details] [diff] [review]
char* and PRUnichar* versions of Copy/AppendUTF*toUTF*

Attachment #125944 - Flags: superreview?(jaggernaut) → superreview+
Comment on attachment 125944 [details] [diff] [review]
char* and PRUnichar* versions of Copy/AppendUTF*toUTF*

Checked in.
Attachment #125990 - Flags: superreview?(alecf)
Attachment #125990 - Flags: review?(dwitte)
darin pointed out to me that we do an extra copy in XPConnect.
Ok, that's my take on fixing xpconnect to avoid the extra conversion. My only
questions are for dbaron and dbradley/brendan.. 

1) is there anything better than CBufDescriptor for what I'm trying to
accomplish? I'm trying to write into an nsAString& that has an already-allocated
2) am I doing the right thing by using JS_malloc? The docs for JS_NewString say
that it "takes ownership of the buffer" but I'm only assuming that it is
allocated via JS_malloc
Comment on attachment 125990 [details] [diff] [review]
mozilla/content changes

sr=alecf - nice!
Attachment #125990 - Flags: superreview?(alecf) → superreview+
Comment on attachment 126039 [details] [diff] [review]
convert xpcconvert.cpp

>+                    CalculateUTF8Length calculator;
>+                    nsACString::const_iterator start, end;
>+                    copy_string(cString->BeginReading(start),
>+                                cString->EndReading(end), calculator);
>-                    JSString* jsString = JS_NewUCStringCopyN(cx,
>+                    void* stringbuf =
>+                        JS_malloc(cx, (calculator.Length()+1) * sizeof (PRUnichar));

You want jschar, not PRUnichar, here -- just for style points, they're
equivalent (uint16) aliases.  Although stringbuf could be PRUnichar* if that
helps avoid ugly casts, below.

JS_malloc is a malloc wrapper, period, full stop (meaning you can hand off
malloc'd string character storage to the JS engine, whether obtained via malloc
or JS_malloc), but it does add this bit of value: if malloc fails (after your
modern OS has thrashed through swap and RAM), it will report an OOM error and
return null.  So you should check for null here, to be consistent (and to
support small device embeddings).

I am not up on the latest string fu, but I hope CBufDescriptor is going away,
not coming back!  Can't you use a dependent string or something like that?

Attachment #126068 - Flags: superreview?(alecf)
Attachment #126068 - Flags: review?(caillon)
Comment on attachment 125990 [details] [diff] [review]
mozilla/content changes

looks nice, just a few nits...

>@@ -105,33 +105,33 @@ nsSVGDocument::GetReferrer(nsAString& aR
> nsSVGDocument::GetDomain(nsAString& aDomain) {
>   if (!mDocumentURL) {
>     aDomain.Truncate();
>   } else {
>     nsCAutoString domain;
>     nsresult rv = mDocumentURL->GetHost(domain);
>     if (NS_FAILED(rv)) return rv;
>-    aDomain.Assign(NS_ConvertUTF8toUCS2(domain));
>+    CopyUTF8toUTF16(domain, aDomain);
>   }
>   return NS_OK;
> }

shift the nsCAutostring declaration and the Copy outside the if, and bag the

> nsSVGDocument::GetURL(nsAString& aURL) {
>   if (!mDocumentURL) {
>     aURL.Truncate();
>   } else {
>     nsCAutoString url;
>     nsresult rv = mDocumentURL->GetSpec(url);
>     if (NS_FAILED(rv)) return rv;    
>-    aURL.Assign(NS_ConvertUTF8toUCS2(url));
>+    CopyUTF8toUTF16(url, aURL);
>   }
>   return NS_OK;
> }

same here

>Index: content/xbl/src/nsXBLProtoImplProperty.cpp
>RCS file: /cvsroot/mozilla/content/xbl/src/nsXBLProtoImplProperty.cpp,v
>retrieving revision 1.5
>diff -u -9 -p -r1.5 nsXBLProtoImplProperty.cpp
>--- content/xbl/src/nsXBLProtoImplProperty.cpp	23 Mar 2002 22:46:10 -0000	1.5
>+++ content/xbl/src/nsXBLProtoImplProperty.cpp	19 Jun 2003 05:42:02 -0000
>@@ -214,21 +214,20 @@ nsXBLProtoImplProperty::CompileMember(ns
>   // We have a property.
>   nsresult rv = NS_OK;
>   // Do we have a getter?
>   nsAutoString getter(mGetterText);
>   nsMemory::Free(mGetterText);
>   mGetterText = nsnull;
>   nsCAutoString functionUri;
>   if (!getter.IsEmpty() && aClassObject) {
>-    functionUri = aClassStr;
>-    functionUri += NS_LITERAL_CSTRING(".");
>-    functionUri += NS_ConvertUCS2toUTF8(mName);
>+    functionUri = aClassStr + NS_LITERAL_CSTRING(".");
>+    AppendUTF16toUTF8(mName, functionUri);
>     functionUri += NS_LITERAL_CSTRING(" (getter)");
>     rv = aContext->CompileFunction(aClassObject,
>                                    nsCAutoString("onget"),
>                                    0,
>                                    nsnull,
>                                    getter, 
>                                    functionUri.get(),
>                                    0,
>                                    PR_FALSE,
>@@ -249,21 +248,20 @@ nsXBLProtoImplProperty::CompileMember(ns
>     }
>   } // if getter is not empty
>   nsresult rvG=rv;
>   // Do we have a setter?
>   nsAutoString setter(mSetterText);
>   nsMemory::Free(mSetterText);
>   mSetterText = nsnull;
>   if (!setter.IsEmpty() && aClassObject) {
>-    functionUri = aClassStr;
>-    functionUri += NS_LITERAL_CSTRING(".");
>-    functionUri += NS_ConvertUCS2toUTF8(mName);
>+    functionUri = aClassStr + NS_LITERAL_CSTRING(".");
>+    AppendUTF16toUTF8(mName, functionUri);
>     functionUri += NS_LITERAL_CSTRING(" (setter)");
>     rv = aContext->CompileFunction(aClassObject,
>                                    nsCAutoString("onset"),
>                                    1,
>                                    gPropertyArgs,
>                                    setter, 
>                                    functionUri.get(),
>                                    0,
>                                    PR_FALSE,

ugh, this method gives me the creeps. how about changing the pattern:

   nsAutoString getter(mGetterText);
   mGetterText = nsnull;

to use a DependentString instead, and put the Free at the end of the method?
there aren't any early returns here.

I despise the nsCAutoString("onget") too, but that can't be helped without
fixing the iface. for some weird reason it takes nsCString&...

>Index: content/xml/content/src/nsXMLElement.cpp
>RCS file: /cvsroot/mozilla/content/xml/content/src/nsXMLElement.cpp,v
>retrieving revision 1.97
>diff -u -9 -p -r1.97 nsXMLElement.cpp
>--- content/xml/content/src/nsXMLElement.cpp	16 Jun 2003 21:47:16 -0000	1.97
>+++ content/xml/content/src/nsXMLElement.cpp	19 Jun 2003 05:42:06 -0000
>@@ -172,24 +172,24 @@ nsXMLElement::GetXMLBaseURI(nsIURI **aUR
>       PRInt32 colon = value.FindChar(':');
>       PRInt32 slash = value.FindChar('/');
>       if (colon > 0 && !( slash >= 0 && slash < colon)) {
>         // Yay, we have absolute path!
>         // The complex looking if above is to make sure that we do not erroneously
>         // think a value of "./this:that" would have a scheme of "./that"
>         NS_ConvertUCS2toUTF8 str(value);
>-        rv = MakeURI(str,nsnull,aURI);
>+        rv = MakeURI(str, nsnull, aURI);
>         if (NS_FAILED(rv))
>           break;
>         if (!base.IsEmpty()) { // XXXdarin base is always empty
>-          str = NS_ConvertUCS2toUTF8(base);
>+          CopyUTF16toUTF8(base, str);
>           nsCAutoString resolvedStr;
>           rv = (*aURI)->Resolve(str, resolvedStr);
>           if (NS_FAILED(rv)) break;
>           rv = (*aURI)->SetSpec(resolvedStr);
>         }
>         break;
>       } else if ((value_len = value.Length()) > 0) {        
>         if (!base.IsEmpty()) {
>@@ -219,19 +219,19 @@ nsXMLElement::GetXMLBaseURI(nsIURI **aUR
>       mDocument->GetBaseURL(getter_AddRefs(docBase));
>       if (!docBase) {
>         mDocument->GetDocumentURL(getter_AddRefs(docBase));
>       }
>       if (base.IsEmpty()) {
>         *aURI = docBase.get();    
>         NS_IF_ADDREF(*aURI);  // nsCOMPtr releases this once
>       } else {
>         NS_ConvertUCS2toUTF8 str(base);
>-        rv = MakeURI(str,docBase,aURI);
>+        rv = MakeURI(str, docBase, aURI);
>       }
>     }

can you bag the |str| temporaries completely, and just do the conversion when
calling the functions?

   rv = MakeURI(NS_ConvertUCS2toUTF8(value), nsnull, aURI);

   rv = (*aURI)->Resolve(NS_ConvertUCS2toUTF8(base), resolvedStr);

> static nsresult CheckLoadURI(nsIURI *aBaseURI, const nsAString& aURI,
>                              nsIURI **aAbsURI)
> {
>   NS_ConvertUCS2toUTF8 str(aURI);
>   *aAbsURI = nsnull;
>   nsresult rv;
>-  rv = MakeURI(str,aBaseURI,aAbsURI);
>+  rv = MakeURI(str, aBaseURI, aAbsURI);

same here.

side note: from browsing nsGenericHTMLElement.cpp in lxr, I noticed a common
pattern - aOutParam.Truncate() calls at the beginning of methods, with
assignments at the end. you fixed a few here but there's a lot more... I'm
guessing the reason you didn't fix those is because there are |if
(NS_FAILED(rv)) return rv;| blocks in the middle of those functions. if you
convert them to NS_SUCCEEDED blocks, you could kill the rest of those
Truncates... up to you whether you want to bother. ;)

r=dwitte with nits addressed.
Attachment #125990 - Flags: review?(dwitte) → review+
Comment on attachment 126068 [details] [diff] [review]
mozilla/layout changes

Attachment #126068 - Flags: superreview?(alecf) → superreview+
Attachment #126068 - Flags: review?(caillon) → review+
Comment on attachment 126068 [details] [diff] [review]
mozilla/layout changes

Patch checked in.
Can we mark this fixed?
yeah, we might be able to mark this fixed now. 

I'm curious if there are any other major consumers.. I mean we've covered
layout, content, xpcom.. are there other high-impact areas anyone can think of?
I have a tree with a whole bunch of patches for this bug. I'll attach them tomorrow.
Jshin: the conversions in nsVariant are quite complex, I'm not sure we can
replace them all.
Perhaps not all, but most (i.e. except for CString). Actually, I'm wondering why
CString has to go through double conversions(Ascii->UTF16->UTF8) in the current
code. That's probably done to catch Cstrings that are not UTF-8 strings and to
make them nominally valid (for ISO-8859-1, that would be the right thing)
I bet the double conversion is there to attempt to deal with non-ASCII
characters in single-byte strings, i.e. if we'd just copy the bits into an UTF8
string, it wouldn't be UTF8 whereas with the double copy we might get it "more
right" :-)

Nonetheless it's still lame, and should ideally be fixed in a better way.
Posted patch Misc part 1Splinter Review
Comment on attachment 136833 [details] [diff] [review]
Misc part 1

Includes nsVariant. More coming later.
Attachment #136833 - Flags: superreview?(jst)
Attachment #136833 - Flags: review?(jshin)
Comment on attachment 136833 [details] [diff] [review]
Misc part 1

- In nsHTMLInputElement::SubmitNamesValues():

	 // Convert the values to strings for submission
	 char buf[20];
	 sprintf(&buf[0], "%d", clickedX);
-	 nsAutoString xVal = NS_ConvertASCIItoUCS2(buf);
+	 NS_ConvertASCIItoUTF16 xVal(buf);
	 sprintf(&buf[0], "%d", clickedY);
-	 nsAutoString yVal = NS_ConvertASCIItoUCS2(buf);
+	 NS_ConvertASCIItoUTF16 yVal(buf);

Wouldn't that be less code if you'd use nsString::AppendInt() in stead of the
conversion classes?

- In xpcom/string/public/nsReadableUtils.{cpp, h}

+NS_COM void LossyCopyUTF16toASCII( const PRUnichar* aSource, nsACString& aDest
+NS_COM void CopyASCIItoUTF16( const char* aSource, nsAString& aDest );

While you're at it, wanna add:

+NS_COM void CopyASCIItoUTF16( const char* aSource, nsAString& aDest );

+CopyASCIItoUTF16( const char* aSource, nsAString& aDest )
+  {
+    aDest.Truncate();
+    AppendASCIItoUTF16(aSource, aDest);
+  }


Attachment #136833 - Flags: superreview?(jst) → superreview+
Comment on attachment 136833 [details] [diff] [review]
Misc part 1

Whoa.. this must be a lot of work. Thanks. 
r=jshin with a few issues below resolved.

> Index: content/events/src/nsDOMEvent.cpp
> @@ -4395,12 +4395,14 @@ nsGenericHTMLElement::SetHostInHrefStrin
> -  aResult.Assign(NS_ConvertUTF8toUCS2(scheme) + NS_LITERAL_STRING("://") +
> -                 NS_ConvertUTF8toUCS2(userpass) + aHost +
> -                 NS_ConvertUTF8toUCS2(path));
> +  CopyUTF8toUTF16(scheme, aResult);

   scheme is always ASCII, isn't it? CopyASCIItoUTF16 would do in that case.
Well, someday  URI scheme might be 'i18nized'.... :-)

> Index: content/shared/src/nsHTMLValue.cpp 
> ===================================================================
> RCS file: /cvsroot/mozilla/content/shared/src/nsHTMLValue.cpp,v
> @@ -515,7 +515,7 @@ nsHTMLValue::ToString(nsAString& aResult
>        char buf[10];
>        PR_snprintf(buf, sizeof(buf), "#%02x%02x%02x",
>                    NS_GET_R(v), NS_GET_G(v), NS_GET_B(v));
> -      aResult.Assign(NS_ConvertASCIItoUCS2(buf));
> +      AppendASCIItoUTF16(buf, aResult); 

   Presumably,	aResult is truncated, right? OK. I went to the file (and a few
other files) and confirmed it :-)

> Index: dom/src/build/nsScriptNameSpaceManager.cpp
>  nsGlobalNameStruct *
> -nsScriptNameSpaceManager::AddToHash(const nsAString& aKey)
> +nsScriptNameSpaceManager::AddToHash(const char *aKey)
>  {
> +  NS_ConvertASCIItoUTF16 key(aKey);
>    GlobalNameMapEntry *entry =
>      NS_STATIC_CAST(GlobalNameMapEntry *,
> -                   PL_DHashTableOperate(&mGlobalNames, &aKey, PL_DHASH_ADD));
> +                   PL_DHashTableOperate(&mGlobalNames, &key, PL_DHASH_ADD));

 I'm wondering if namespaces in DOM is always ASCII-only(and will always be).
All the current callers appear to assume that, which is, I guess, why you
brought in the conversion inside the function. jst already sr'd it so that it's
probably the case.

> @@ -231,22 +232,22 @@ nsScriptNameSpaceManager::FillHash(nsICa
>            s->mType = nsGlobalNameStruct::eTypeExternalConstructorAlias;
>            s->mAlias->mCID = cid;
> -         s->mAlias->mProtoName.AssignWithConversion(constructorProto);
> +         AppendASCIItoUTF16(constructorProto,s->mAlias->mProtoName);

	s->mAlias->mProtoName is always empty here before append?

> @@ -872,20 +866,21 @@ nsVariant::ConvertToAString(const nsDisc
> -        rv = ToString(data, tempCString);
> +    {
> +        nsCAutoString tempCString;
> +        nsresult rv = ToString(data, tempCString);
>          if(NS_FAILED(rv))
>              return rv;
>          CopyASCIItoUCS2(tempCString, _retval);

  While you're at it, can you do s/UCS2/UTF16/ here?

>      case nsIDataType::VTYPE_CSTRING:
> +        // XXX Extra copy, can be removed if we're sure CSTRING can
> +        //     only contain ASCII.
> +     CopyUTF16toUTF8(NS_ConvertASCIItoUTF16(*data.u.mCStringValue),
> +                        _retval);
>          return NS_OK;

   I'm not sure what the 'correct' way is to handle this and similar cases.
Would it be better to check ASCII-ness before the conversion and trigger
an error if the 'input' is not ASCII? If ASCII, we can just assign
without going back and forth.  Currently, NS_ConvertASCIItoUTF16 and
CopyASCIItoUTF16 assert given a non-ASCII string, right?

>      case nsIDataType::VTYPE_WCHAR:
>      {
	// XXX Extra copies.  Jag will fix when he lands UTF8String
	nsAutoString tempString(data.u.mWCharValue);

 I'm afraid you missed the above.
Attachment #136833 - Flags: review?(jshin) → review+
Checked in attachment 136833 [details] [diff] [review], shaved off some Z/mZ.
This checkin might have caused the regression in bug 229668, "Hang after typing
keywords into address bar for "I'm feeling lucky" google search".
Posted patch More of the same (obsolete) — Splinter Review
Attachment #148747 - Flags: review?(jshin)
Comment on attachment 148747 [details] [diff] [review]
More of the same

Sorry for the delay and thanks for the work.

>Index: embedding/browser/powerplant/source/CHeaderSniffer.cpp
>RCS file: /cvsroot/mozilla/embedding/browser/powerplant/source/CHeaderSniffer.cpp,v

>-            nsCAutoString filename;
>-            mContentDisposition.Right(filename, mContentDisposition.Length() - index);
>-            defaultFileName = NS_ConvertUTF8toUCS2(filename);
>+            AppendUTF8toUTF16(Substring(mContentDisposition, index,
>+                                        mContentDisposition.Length()),
>+                              defaultFileName);

  The 3rd argument of Substring() should be either omitted or
mContentDisposition.Length() - index, shouldn't it?

>Index: extensions/datetime/nsDateTimeChannel.cpp

>-        nsAutoString title;
>-        title = NS_LITERAL_STRING("DateTime according to ")
>-              + NS_ConvertUTF8toUCS2(mHost);
>+        nsAutoString title("DateTime according to ");
>+        AppendUTF8toUTF16(mHost, title);

   nsAutoString title(NS_LITERAL_STRING("DateTime..."));

>Index: extensions/sql/pgsql/src/mozSqlConnectionPgsql.cpp

>@@ -1782,7 +1782,7 @@ FieldToValue(
>           /* a new value was found */
>           index -= 2;
>-          valueUCS2 = NS_ConvertUTF8toUCS2(concatenatedValueUTF8);
>+          CopyUTF8toUTF16(concatenatedValueUTF8, valueUCS2);
>           return NS_OK;
>         }

If it's not too much work, would you replace 'UCS2' in variable names with
'UTF16' in this file?
Attachment #148747 - Flags: review?(jshin) → review+
Attachment #148747 - Attachment is obsolete: true
Comment on attachment 157092 [details] [diff] [review]
More of the same v1.1

Carry over r=jshin.
Attachment #157092 - Flags: superreview?(jst)
Attachment #157092 - Flags: review+
Comment on attachment 157092 [details] [diff] [review]
More of the same v1.1

Attachment #157092 - Flags: superreview?(jst) → superreview+
+      CopyASCIItoUTF16(leafName, dirName);

Assignee: alecf → peterv
QA Contact: scc → xpcom
odd? -- this "linux" bug is marked dependent on a "windows" bug 87677

last patch was checked in 2004-09-16.
so is the consumer conversion completee?
Peterv (and anyone else who can advise)

regarding the last patch, I'd like your advice about this checkin having possibly caused a regression described in bug 310290.  The bug manifests as a thunderbird address book record containing diacritics not being recognized as a match to an exact duplicate of the record on a palm PDA (obviously with the same diacritics).   The code was working in Thunderbird 1.0 and then not working as of Thunderbird 1.5.   As best as I can determine, the last patch appears in TB 1.5.

My question is, under what circumstances might the checked in code cause this?  The possibility certainly exists that something in the palmsync code (or AB code) is lacking.   I'd offer more information or a more informed question but I come with a complete lack of understanding of the code involved, and the string conversion process in particular. 

The two checkins are

I arrive at this bug as the probable cause via these checkins and regression range of
- which is an absurd time range, but you can see there are very few checkins to palmsync in that time period. And only the checkins that involve string manipulation of an ab record are from this last patch.
Bits of this were checked in. I don't think there's much else to track here, so I'm going to mark this FIXED.
Closed: 7 years ago
Resolution: --- → FIXED
Blocks: 1214248
No longer blocks: 1214248
You need to log in before you can comment on or make changes to this bug.