nsRDFXMLSerializer string do, utf8 <-> utf16

RESOLVED FIXED

Status

()

Core
RDF
RESOLVED FIXED
14 years ago
13 years ago

People

(Reporter: Axel Hecht, Assigned: Axel Hecht)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

14 years ago
nsRDFXMLSerializer has some unfortunate string do, esp there are quite a few
conversions between utf8 and utf16 which go forth and back.
I'll create a patch which converts all the internals to directly work on utf8,
which will cut down the conversions and string copies.

I will fix bug 236920 on the fly. Well, vice versa is closer to the truth, but
that is just historical evidence.
(Assignee)

Comment 1

14 years ago
Created attachment 145457 [details] [diff] [review]
do utf8 in nsRDFXMLSerializer

This patch does a bunch of string copies less.
I folded a few rdf_BlockingWrite into appends, where I felt appropriate.
Patch tested with rdfcat and a localstore containing urls with umlauts, ie.
utf8
chars. Worked like charm.
(Assignee)

Updated

14 years ago
Attachment #145457 - Flags: superreview?(darin)
Attachment #145457 - Flags: review?(bsmedberg)
(Assignee)

Comment 2

14 years ago
Comment on attachment 145457 [details] [diff] [review]
do utf8 in nsRDFXMLSerializer

moving the r?, bsmedberg is swamped
Attachment #145457 - Flags: review?(bsmedberg) → review?(tingley)

Comment 3

14 years ago
Comment on attachment 145457 [details] [diff] [review]
do utf8 in nsRDFXMLSerializer

>Index: base/src/nsNameSpaceMap.cpp

> nsresult
> nsNameSpaceMap::Put(const nsAString& aURI, nsIAtom* aPrefix)
> {
>+    NS_ConvertUTF16toUTF8 uriUTF8(aURI);
>+    return Put(uriUTF8, aPrefix);
>+}

maybe define this function inline in the header?


>Index: base/src/nsRDFXMLSerializer.cpp

> PRBool
> nsRDFXMLSerializer::MakeQName(nsIRDFResource* aResource,
...
> {
>     const char* s;
>     aResource->GetValueConst(&s);
>+    nsDependentCString uri(s);

you could also use GetValueUTF8 here, which would normally give
you a shared reference to the resource's nsCString object.
Maybe that would be cheaper than computing the length of |s|
here.


>+rdf_EscapeAmpersandsAndAngleBrackets(nsCString& s)
...
>+    // escape the chars from the end back to the front.
>+    s.SetLength(newLength);
>+    --c; // last char in unescaped string
>+    char* w = s.EndWriting() - 1; // last char in grown buffer
>+    while (c > start) {
>+        switch (*c) {

there's a major problem here.  the call to SetLength might invalidate
the buffer that |c| was pointing at.  one solution is to compute the
length from |start| to |c|, resize the buffer, and then re-initalize
|start| and |c|.



>+rdf_EscapeQuotes(nsCString& s)
> {
>     PRInt32 i = 0;
>     while ((i = s.FindChar('"', i)) != -1) {
>         s.SetCharAt('&', i);
>+        s.Insert(NS_LITERAL_CSTRING("quot;"), i + 1);
>         i += 5;
>     }
> }

Can't this function just call nsCString::ReplaceSubstring?
Attachment #145457 - Flags: superreview?(darin) → superreview-
(Assignee)

Comment 4

14 years ago
>>PRBool
>>nsRDFXMLSerializer::MakeQName(nsIRDFResource* aResource,
> 
> ...
> 
>>{
>>    const char* s;
>>    aResource->GetValueConst(&s);
>>+    nsDependentCString uri(s);
> 
> 
> you could also use GetValueUTF8 here, which would normally give
> you a shared reference to the resource's nsCString object.
> Maybe that would be cheaper than computing the length of |s|
> here.

I can't find out how nsCString gets the F_SHARED flag set. Esp not by doing
mURI = aCharPointer;

If I read nsTSubstring_CharT::Assign( const self_type& str ) right,
sharing the buffer will only happen, if the str buffer is already shared.

(I will use GetValueUTF8 on those occasions where I need to create a 
nsCAutoString, though, thanx for the hint)
(Assignee)

Comment 5

14 years ago
Comment on attachment 145457 [details] [diff] [review]
do utf8 in nsRDFXMLSerializer

new patch coming up, but after easter, most likely
Attachment #145457 - Attachment is obsolete: true
Attachment #145457 - Flags: review?(tingley)

Comment 6

14 years ago
> I can't find out how nsCString gets the F_SHARED flag set. Esp not by doing
> mURI = aCharPointer;
> 
> If I read nsTSubstring_CharT::Assign( const self_type& str ) right,
> sharing the buffer will only happen, if the str buffer is already shared.

F_SHARED means the buffer is sharable, it does not mean that it is currently
shared.  maybe that was a bad choice of verb tense for that flag :-/

the point is that |nsCString x = y;| will avoid creating a copy of y's buffer if
y is marked with the F_SHARED flag.  nsCString by default allocates sharable
buffers, so if you have a nsCString as a member variable than you can share its
buffer by assigning it into another string object.
(Assignee)

Comment 7

14 years ago
Note to self, nsTSubstring_CharT::MutatePrep does the flag setting initially.
(Assignee)

Comment 8

14 years ago
Created attachment 146303 [details] [diff] [review]
address darins comments

Addressed the review comments. I didn't feel like putting that one function
into
the header, I increase the codesize as is. +8k on windows opt, btw.
(Assignee)

Updated

14 years ago
Attachment #146303 - Flags: superreview?(darin)
Attachment #146303 - Flags: review?(tingley)

Comment 9

14 years ago
Comment on attachment 146303 [details] [diff] [review]
address darins comments


This is going to take me some time to go through -- these are just the first
thing I saw.  Also, thanks for taking this on; I've really wanted to clean up
this rat's nest for a while, and never had the time to do it justice.

Actually, while you're at it, you may want to take a look at my patch in bug
231548 and see if a) it makes sense, and b) it would simplify anything to fold
it in here.

>--- base/src/nsNameSpaceMap.h	23 Mar 2002 21:20:44 -0000	1.2
>+++ base/src/nsNameSpaceMap.h	15 Apr 2004 14:10:44 -0000
>@@ -31,13 +31,13 @@ class nsNameSpaceMap
> public:
>     class Entry {
>     public:
>-        Entry(const nsAString& aURI, nsIAtom* aPrefix)
>+        Entry(const nsCSubstring& aURI, nsIAtom* aPrefix)
>             : mURI(aURI), mPrefix(aPrefix), mNext(nsnull) {
>             MOZ_COUNT_CTOR(nsNameSpaceMap::Entry); }
> 
>         ~Entry() { MOZ_COUNT_DTOR(nsNameSpaceMap::Entry); }
>         
>-        nsString mURI;
>+        nsCString mURI;

Now that you've converted nsNameSpaceMap to use CStrings, any chance you could
use it to replace the usage of the nearly-identical NameSpaceEntry struct in
nsRDFContentSink.cpp? 

>Index: base/src/nsRDFXMLDataSource.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/rdf/base/src/nsRDFXMLDataSource.cpp,v
>retrieving revision 1.145
>diff -u -p -r1.145 nsRDFXMLDataSource.cpp
>--- base/src/nsRDFXMLDataSource.cpp	11 Sep 2003 03:18:35 -0000	1.145
>+++ base/src/nsRDFXMLDataSource.cpp	15 Apr 2004 14:10:47 -0000
>@@ -1164,8 +1164,11 @@ RDFXMLDataSourceImpl::Serialize(nsIOutpu
>     // Add any namespace information that we picked up when reading
>     // the RDF/XML
>     nsNameSpaceMap::const_iterator last = mNameSpaces.last();
>-    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first(); iter != last; ++iter)
>-        serializer->AddNameSpace(iter->mPrefix, iter->mURI);
>+    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first();
>+         iter != last; ++iter) {
>+        NS_ConvertUTF8toUTF16 uri(iter->mURI);
>+        serializer->AddNameSpace(iter->mPrefix, uri);
>+    }

This is ugly, since this will call |mNameSpaces.Put(aURI, aPrefix)|, which will
just convert the URI back to UTF8 again -- a bit of a pain for the net result
of copying one nsNameSpaceMap to another.  I'm not sure if there's a way around
it though, without changing the interface.
(Assignee)

Comment 10

14 years ago
(In reply to comment #9)
> (From update of attachment 146303 [details] [diff] [review])
> 
> This is going to take me some time to go through -- these are just the first
> thing I saw.  Also, thanks for taking this on; I've really wanted to clean up
> this rat's nest for a while, and never had the time to do it justice.
> 
> Actually, while you're at it, you may want to take a look at my patch in bug
> 231548 and see if a) it makes sense, and b) it would simplify anything to fold
> it in here.

yiked over there.

> >--- base/src/nsNameSpaceMap.h	23 Mar 2002 21:20:44 -0000	1.2
> >+++ base/src/nsNameSpaceMap.h	15 Apr 2004 14:10:44 -0000
> >@@ -31,13 +31,13 @@ class nsNameSpaceMap
> > public:
> >     class Entry {
> >     public:
> >-        Entry(const nsAString& aURI, nsIAtom* aPrefix)
> >+        Entry(const nsCSubstring& aURI, nsIAtom* aPrefix)
> >             : mURI(aURI), mPrefix(aPrefix), mNext(nsnull) {
> >             MOZ_COUNT_CTOR(nsNameSpaceMap::Entry); }
> > 
> >         ~Entry() { MOZ_COUNT_DTOR(nsNameSpaceMap::Entry); }
> >         
> >-        nsString mURI;
> >+        nsCString mURI;
> 
> Now that you've converted nsNameSpaceMap to use CStrings, any chance you could
> use it to replace the usage of the nearly-identical NameSpaceEntry struct in
> nsRDFContentSink.cpp? 

I'd rather do that in a different bug.

> >Index: base/src/nsRDFXMLDataSource.cpp
> >===================================================================
> >RCS file: /cvsroot/mozilla/rdf/base/src/nsRDFXMLDataSource.cpp,v
> >retrieving revision 1.145
> >diff -u -p -r1.145 nsRDFXMLDataSource.cpp
> >--- base/src/nsRDFXMLDataSource.cpp	11 Sep 2003 03:18:35 -0000	1.145
> >+++ base/src/nsRDFXMLDataSource.cpp	15 Apr 2004 14:10:47 -0000
> >@@ -1164,8 +1164,11 @@ RDFXMLDataSourceImpl::Serialize(nsIOutpu
> >     // Add any namespace information that we picked up when reading
> >     // the RDF/XML
> >     nsNameSpaceMap::const_iterator last = mNameSpaces.last();
> >-    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first(); iter !=
last; ++iter)
> >-        serializer->AddNameSpace(iter->mPrefix, iter->mURI);
> >+    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first();
> >+         iter != last; ++iter) {
> >+        NS_ConvertUTF8toUTF16 uri(iter->mURI);
> >+        serializer->AddNameSpace(iter->mPrefix, uri);
> >+    }
> 
> This is ugly, since this will call |mNameSpaces.Put(aURI, aPrefix)|, which will
> just convert the URI back to UTF8 again -- a bit of a pain for the net result
> of copying one nsNameSpaceMap to another.  I'm not sure if there's a way around
> it though, without changing the interface.

Not without changing the interface, and I didn't wanna do that. I don't think
we should go for too many interface changes while we're not sure where we 
wanna have them go in the future.
Note that on the side of nsRDFXMLSerializer::AddNameSpace, we need to jump
the bridge from nsA(C?)String to nsCString anyway. Though I could probably
ease that by using nsACStrings in nsNameSpaceMap.
But this isn't really a big deal in the first place, it's just a few namespaces
that get pushed over in the first place.

Comment 11

14 years ago
BTW, any nsAString can be converted to nsSubstring without copying.  However,
the current string code doesn't really provide a convenient API for that.  The
only option right now is to use Substring to create a nsSubstring from a
nsAString.  This is why you should use nsSubstring instead of nsString for
in-params.  At some point I hope to make an API that makes it easier to
construct a nsSubstring from a nsAString.

Comment 12

14 years ago
Comment on attachment 146303 [details] [diff] [review]
address darins comments

>Index: base/src/nsRDFXMLDataSource.cpp

>+    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first();
>+         iter != last; ++iter) {
>+        NS_ConvertUTF8toUTF16 uri(iter->mURI);
>+        serializer->AddNameSpace(iter->mPrefix, uri);
>+    }

If you know that |serializer->AddNameSpace| is going to store |uri|
into a |nsString| object, then you might want to rewrite this loop
to heap-allocate |uri| so that the assignment is cheap (just an 
increment of a reference counter)... like this:

    for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first();
	 iter != last; ++iter) {
	nsString uri;
	AppendUTF8toUTF16(iter->mURI, uri);
	serializer->AddNameSpace(iter->mPrefix, uri);
    }

NOTE: I use AppendUTF8toUTF16 since it is slightly faster than the Copy*
form (b/c it does not have to Truncate the dest string before writing to
it).


sr=darin
Attachment #146303 - Flags: superreview?(darin) → superreview+
(Assignee)

Updated

14 years ago
Attachment #146303 - Flags: review?(tingley) → review?(bsmedberg)

Comment 13

13 years ago
Comment on attachment 146303 [details] [diff] [review]
address darins comments

>@@ -238,7 +243,7 @@ nsRDFXMLSerializer::MakeQName(nsIRDFReso
>     // Just generate a random prefix
>     static PRInt32 gPrefixID = 0;
>-    aNameSpacePrefix = NS_LITERAL_STRING("NS");
>+    aNameSpacePrefix = NS_LITERAL_CSTRING("NS");
>     aNameSpacePrefix.AppendInt(++gPrefixID, 10);
>     return PR_FALSE;

aNameSpacePrefix.AssignLiteral

>@@ -324,40 +357,45 @@ nsRDFXMLSerializer::SerializeInlineAsser

>+    attr.Append("=\"", 2);
>+    rv = rdf_BlockingWrite(aStream, attr);
>+    if (NS_FAILED(rv)) return rv;
>+    s.Append("\"", 1);

use AppendLiteral the first time, and Append('"') the second time.

>@@ -436,21 +486,25 @@ nsRDFXMLSerializer::SerializeChildAssert

>+    tag.Append(">\n", 2);

same

With that, let's make it happen! Sorry for the delays in reviewing.
Attachment #146303 - Flags: review?(bsmedberg) → review+
(Assignee)

Comment 14

13 years ago
Created attachment 158369 [details] [diff] [review]
patch that landed

The patch that I checked in.
Most of bsmedbergs comments were already in after I fixed the merge conflicts.
I didn't do what darin suggested, as I know that the string *is not* stored.
It goes back into a utf8-utf16 conversion. This is sad, I'll give
nsNameSpaceMap
another whacking.
(Assignee)

Comment 15

13 years ago
fix is in, no need for any branches for my part.
Status: NEW → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.