Closed Bug 239369 Opened 20 years ago Closed 20 years ago

nsRDFXMLSerializer string do, utf8 <-> utf16

Categories

(Core Graveyard :: RDF, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: axel, Assigned: axel)

References

Details

Attachments

(2 files, 1 obsolete file)

nsRDFXMLSerializer has some unfortunate string do, esp there are quite a few
conversions between utf8 and utf16 which go forth and back.
I'll create a patch which converts all the internals to directly work on utf8,
which will cut down the conversions and string copies.

I will fix bug 236920 on the fly. Well, vice versa is closer to the truth, but
that is just historical evidence.
Attached patch do utf8 in nsRDFXMLSerializer (obsolete) — Splinter Review
This patch does a bunch of string copies less.
I folded a few rdf_BlockingWrite into appends, where I felt appropriate.
Patch tested with rdfcat and a localstore containing urls with umlauts, ie.
utf8
chars. Worked like charm.
Attachment #145457 - Flags: superreview?(darin)
Attachment #145457 - Flags: review?(bsmedberg)
Comment on attachment 145457 [details] [diff] [review]
do utf8 in nsRDFXMLSerializer

moving the r?, bsmedberg is swamped
Attachment #145457 - Flags: review?(bsmedberg) → review?(tingley)
Comment on attachment 145457 [details] [diff] [review]
do utf8 in nsRDFXMLSerializer

>Index: base/src/nsNameSpaceMap.cpp

> nsresult
> nsNameSpaceMap::Put(const nsAString& aURI, nsIAtom* aPrefix)
> {
>+    NS_ConvertUTF16toUTF8 uriUTF8(aURI);
>+    return Put(uriUTF8, aPrefix);
>+}

maybe define this function inline in the header?


>Index: base/src/nsRDFXMLSerializer.cpp

> PRBool
> nsRDFXMLSerializer::MakeQName(nsIRDFResource* aResource,
...
> {
>     const char* s;
>     aResource->GetValueConst(&s);
>+    nsDependentCString uri(s);

you could also use GetValueUTF8 here, which would normally give
you a shared reference to the resource's nsCString object.
Maybe that would be cheaper than computing the length of |s|
here.


>+rdf_EscapeAmpersandsAndAngleBrackets(nsCString& s)
...
>+    // escape the chars from the end back to the front.
>+    s.SetLength(newLength);
>+    --c; // last char in unescaped string
>+    char* w = s.EndWriting() - 1; // last char in grown buffer
>+    while (c > start) {
>+        switch (*c) {

there's a major problem here.  the call to SetLength might invalidate
the buffer that |c| was pointing at.  one solution is to compute the
length from |start| to |c|, resize the buffer, and then re-initalize
|start| and |c|.



>+rdf_EscapeQuotes(nsCString& s)
> {
>     PRInt32 i = 0;
>     while ((i = s.FindChar('"', i)) != -1) {
>         s.SetCharAt('&', i);
>+        s.Insert(NS_LITERAL_CSTRING("quot;"), i + 1);
>         i += 5;
>     }
> }

Can't this function just call nsCString::ReplaceSubstring?
Attachment #145457 - Flags: superreview?(darin) → superreview-
>>PRBool
>>nsRDFXMLSerializer::MakeQName(nsIRDFResource* aResource,
> 
> ...
> 
>>{
>>    const char* s;
>>    aResource->GetValueConst(&s);
>>+    nsDependentCString uri(s);
> 
> 
> you could also use GetValueUTF8 here, which would normally give
> you a shared reference to the resource's nsCString object.
> Maybe that would be cheaper than computing the length of |s|
> here.

I can't find out how nsCString gets the F_SHARED flag set. Esp not by doing
mURI = aCharPointer;

If I read nsTSubstring_CharT::Assign( const self_type& str ) right,
sharing the buffer will only happen, if the str buffer is already shared.

(I will use GetValueUTF8 on those occasions where I need to create a 
nsCAutoString, though, thanx for the hint)
Comment on attachment 145457 [details] [diff] [review]
do utf8 in nsRDFXMLSerializer

new patch coming up, but after easter, most likely
Attachment #145457 - Attachment is obsolete: true
Attachment #145457 - Flags: review?(tingley)
> I can't find out how nsCString gets the F_SHARED flag set. Esp not by doing
> mURI = aCharPointer;
> 
> If I read nsTSubstring_CharT::Assign( const self_type& str ) right,
> sharing the buffer will only happen, if the str buffer is already shared.

F_SHARED means the buffer is sharable, it does not mean that it is currently
shared.  maybe that was a bad choice of verb tense for that flag :-/

the point is that |nsCString x = y;| will avoid creating a copy of y's buffer if
y is marked with the F_SHARED flag.  nsCString by default allocates sharable
buffers, so if you have a nsCString as a member variable than you can share its
buffer by assigning it into another string object.
Note to self, nsTSubstring_CharT::MutatePrep does the flag setting initially.
Addressed the review comments. I didn't feel like putting that one function
into
the header, I increase the codesize as is. +8k on windows opt, btw.
Attachment #146303 - Flags: superreview?(darin)
Attachment #146303 - Flags: review?(tingley)
Comment on attachment 146303 [details] [diff] [review]
address darins comments


This is going to take me some time to go through -- these are just the first
thing I saw.  Also, thanks for taking this on; I've really wanted to clean up
this rat's nest for a while, and never had the time to do it justice.

Actually, while you're at it, you may want to take a look at my patch in bug
231548 and see if a) it makes sense, and b) it would simplify anything to fold
it in here.

>--- base/src/nsNameSpaceMap.h	23 Mar 2002 21:20:44 -0000	1.2
>+++ base/src/nsNameSpaceMap.h	15 Apr 2004 14:10:44 -0000
>@@ -31,13 +31,13 @@ class nsNameSpaceMap
> public:
>     class Entry {
>     public:
>-        Entry(const nsAString& aURI, nsIAtom* aPrefix)
>+        Entry(const nsCSubstring& aURI, nsIAtom* aPrefix)
>             : mURI(aURI), mPrefix(aPrefix), mNext(nsnull) {
>             MOZ_COUNT_CTOR(nsNameSpaceMap::Entry); }
> 
>         ~Entry() { MOZ_COUNT_DTOR(nsNameSpaceMap::Entry); }
>         
>-        nsString mURI;
>+        nsCString mURI;

Now that you've converted nsNameSpaceMap to use CStrings, any chance you could
use it to replace the usage of the nearly-identical NameSpaceEntry struct in
nsRDFContentSink.cpp? 

>Index: base/src/nsRDFXMLDataSource.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/rdf/base/src/nsRDFXMLDataSource.cpp,v
>retrieving revision 1.145
>diff -u -p -r1.145 nsRDFXMLDataSource.cpp
>--- base/src/nsRDFXMLDataSource.cpp	11 Sep 2003 03:18:35 -0000	1.145
>+++ base/src/nsRDFXMLDataSource.cpp	15 Apr 2004 14:10:47 -0000
>@@ -1164,8 +1164,11 @@ RDFXMLDataSourceImpl::Serialize(nsIOutpu
>     // Add any namespace information that we picked up when reading
>     // the RDF/XML
>     nsNameSpaceMap::const_iterator last = mNameSpaces.last();
>-    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first(); iter != last; ++iter)
>-        serializer->AddNameSpace(iter->mPrefix, iter->mURI);
>+    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first();
>+         iter != last; ++iter) {
>+        NS_ConvertUTF8toUTF16 uri(iter->mURI);
>+        serializer->AddNameSpace(iter->mPrefix, uri);
>+    }

This is ugly, since this will call |mNameSpaces.Put(aURI, aPrefix)|, which will
just convert the URI back to UTF8 again -- a bit of a pain for the net result
of copying one nsNameSpaceMap to another.  I'm not sure if there's a way around
it though, without changing the interface.
(In reply to comment #9)
> (From update of attachment 146303 [details] [diff] [review])
> 
> This is going to take me some time to go through -- these are just the first
> thing I saw.  Also, thanks for taking this on; I've really wanted to clean up
> this rat's nest for a while, and never had the time to do it justice.
> 
> Actually, while you're at it, you may want to take a look at my patch in bug
> 231548 and see if a) it makes sense, and b) it would simplify anything to fold
> it in here.

yiked over there.

> >--- base/src/nsNameSpaceMap.h	23 Mar 2002 21:20:44 -0000	1.2
> >+++ base/src/nsNameSpaceMap.h	15 Apr 2004 14:10:44 -0000
> >@@ -31,13 +31,13 @@ class nsNameSpaceMap
> > public:
> >     class Entry {
> >     public:
> >-        Entry(const nsAString& aURI, nsIAtom* aPrefix)
> >+        Entry(const nsCSubstring& aURI, nsIAtom* aPrefix)
> >             : mURI(aURI), mPrefix(aPrefix), mNext(nsnull) {
> >             MOZ_COUNT_CTOR(nsNameSpaceMap::Entry); }
> > 
> >         ~Entry() { MOZ_COUNT_DTOR(nsNameSpaceMap::Entry); }
> >         
> >-        nsString mURI;
> >+        nsCString mURI;
> 
> Now that you've converted nsNameSpaceMap to use CStrings, any chance you could
> use it to replace the usage of the nearly-identical NameSpaceEntry struct in
> nsRDFContentSink.cpp? 

I'd rather do that in a different bug.

> >Index: base/src/nsRDFXMLDataSource.cpp
> >===================================================================
> >RCS file: /cvsroot/mozilla/rdf/base/src/nsRDFXMLDataSource.cpp,v
> >retrieving revision 1.145
> >diff -u -p -r1.145 nsRDFXMLDataSource.cpp
> >--- base/src/nsRDFXMLDataSource.cpp	11 Sep 2003 03:18:35 -0000	1.145
> >+++ base/src/nsRDFXMLDataSource.cpp	15 Apr 2004 14:10:47 -0000
> >@@ -1164,8 +1164,11 @@ RDFXMLDataSourceImpl::Serialize(nsIOutpu
> >     // Add any namespace information that we picked up when reading
> >     // the RDF/XML
> >     nsNameSpaceMap::const_iterator last = mNameSpaces.last();
> >-    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first(); iter !=
last; ++iter)
> >-        serializer->AddNameSpace(iter->mPrefix, iter->mURI);
> >+    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first();
> >+         iter != last; ++iter) {
> >+        NS_ConvertUTF8toUTF16 uri(iter->mURI);
> >+        serializer->AddNameSpace(iter->mPrefix, uri);
> >+    }
> 
> This is ugly, since this will call |mNameSpaces.Put(aURI, aPrefix)|, which will
> just convert the URI back to UTF8 again -- a bit of a pain for the net result
> of copying one nsNameSpaceMap to another.  I'm not sure if there's a way around
> it though, without changing the interface.

Not without changing the interface, and I didn't wanna do that. I don't think
we should go for too many interface changes while we're not sure where we 
wanna have them go in the future.
Note that on the side of nsRDFXMLSerializer::AddNameSpace, we need to jump
the bridge from nsA(C?)String to nsCString anyway. Though I could probably
ease that by using nsACStrings in nsNameSpaceMap.
But this isn't really a big deal in the first place, it's just a few namespaces
that get pushed over in the first place.
BTW, any nsAString can be converted to nsSubstring without copying.  However,
the current string code doesn't really provide a convenient API for that.  The
only option right now is to use Substring to create a nsSubstring from a
nsAString.  This is why you should use nsSubstring instead of nsString for
in-params.  At some point I hope to make an API that makes it easier to
construct a nsSubstring from a nsAString.
Comment on attachment 146303 [details] [diff] [review]
address darins comments

>Index: base/src/nsRDFXMLDataSource.cpp

>+    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first();
>+         iter != last; ++iter) {
>+        NS_ConvertUTF8toUTF16 uri(iter->mURI);
>+        serializer->AddNameSpace(iter->mPrefix, uri);
>+    }

If you know that |serializer->AddNameSpace| is going to store |uri|
into a |nsString| object, then you might want to rewrite this loop
to heap-allocate |uri| so that the assignment is cheap (just an 
increment of a reference counter)... like this:

    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first();
	 iter != last; ++iter) {
	nsString uri;
	AppendUTF8toUTF16(iter->mURI, uri);
	serializer->AddNameSpace(iter->mPrefix, uri);
    }

NOTE: I use AppendUTF8toUTF16 since it is slightly faster than the Copy*
form (b/c it does not have to Truncate the dest string before writing to
it).


sr=darin
Attachment #146303 - Flags: superreview?(darin) → superreview+
Attachment #146303 - Flags: review?(tingley) → review?(bsmedberg)
Comment on attachment 146303 [details] [diff] [review]
address darins comments

>@@ -238,7 +243,7 @@ nsRDFXMLSerializer::MakeQName(nsIRDFReso
>     // Just generate a random prefix
>     static PRInt32 gPrefixID = 0;
>-    aNameSpacePrefix = NS_LITERAL_STRING("NS");
>+    aNameSpacePrefix = NS_LITERAL_CSTRING("NS");
>     aNameSpacePrefix.AppendInt(++gPrefixID, 10);
>     return PR_FALSE;

aNameSpacePrefix.AssignLiteral

>@@ -324,40 +357,45 @@ nsRDFXMLSerializer::SerializeInlineAsser

>+    attr.Append("=\"", 2);
>+    rv = rdf_BlockingWrite(aStream, attr);
>+    if (NS_FAILED(rv)) return rv;
>+    s.Append("\"", 1);

use AppendLiteral the first time, and Append('"') the second time.

>@@ -436,21 +486,25 @@ nsRDFXMLSerializer::SerializeChildAssert

>+    tag.Append(">\n", 2);

same

With that, let's make it happen! Sorry for the delays in reviewing.
Attachment #146303 - Flags: review?(bsmedberg) → review+
The patch that I checked in.
Most of bsmedbergs comments were already in after I fixed the merge conflicts.
I didn't do what darin suggested, as I know that the string *is not* stored.
It goes back into a utf8-utf16 conversion. This is sad, I'll give
nsNameSpaceMap
another whacking.
fix is in, no need for any branches for my part.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: