Closed
Bug 239369
Opened 20 years ago
Closed 20 years ago
nsRDFXMLSerializer string do, utf8 <-> utf16
Categories
(Core Graveyard :: RDF, defect)
Core Graveyard
RDF
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: axel, Assigned: axel)
References
Details
Attachments
(2 files, 1 obsolete file)
32.76 KB,
patch
|
benjamin
:
review+
darin.moz
:
superreview+
|
Details | Diff | Splinter Review |
31.17 KB,
patch
|
Details | Diff | Splinter Review |
nsRDFXMLSerializer has some unfortunate string do, esp there are quite a few conversions between utf8 and utf16 which go forth and back. I'll create a patch which converts all the internals to directly work on utf8, which will cut down the conversions and string copies. I will fix bug 236920 on the fly. Well, vice versa is closer to the truth, but that is just historical evidence.
Assignee | ||
Comment 1•20 years ago
|
||
This patch does a bunch of string copies less. I folded a few rdf_BlockingWrite into appends, where I felt appropriate. Patch tested with rdfcat and a localstore containing urls with umlauts, ie. utf8 chars. Worked like charm.
Assignee | ||
Updated•20 years ago
|
Attachment #145457 -
Flags: superreview?(darin)
Attachment #145457 -
Flags: review?(bsmedberg)
Assignee | ||
Comment 2•20 years ago
|
||
Comment on attachment 145457 [details] [diff] [review] do utf8 in nsRDFXMLSerializer moving the r?, bsmedberg is swamped
Attachment #145457 -
Flags: review?(bsmedberg) → review?(tingley)
Comment 3•20 years ago
|
||
Comment on attachment 145457 [details] [diff] [review] do utf8 in nsRDFXMLSerializer >Index: base/src/nsNameSpaceMap.cpp > nsresult > nsNameSpaceMap::Put(const nsAString& aURI, nsIAtom* aPrefix) > { >+ NS_ConvertUTF16toUTF8 uriUTF8(aURI); >+ return Put(uriUTF8, aPrefix); >+} maybe define this function inline in the header? >Index: base/src/nsRDFXMLSerializer.cpp > PRBool > nsRDFXMLSerializer::MakeQName(nsIRDFResource* aResource, ... > { > const char* s; > aResource->GetValueConst(&s); >+ nsDependentCString uri(s); you could also use GetValueUTF8 here, which would normally give you a shared reference to the resource's nsCString object. Maybe that would be cheaper than computing the length of |s| here. >+rdf_EscapeAmpersandsAndAngleBrackets(nsCString& s) ... >+ // escape the chars from the end back to the front. >+ s.SetLength(newLength); >+ --c; // last char in unescaped string >+ char* w = s.EndWriting() - 1; // last char in grown buffer >+ while (c > start) { >+ switch (*c) { there's a major problem here. the call to SetLength might invalidate the buffer that |c| was pointing at. one solution is to compute the length from |start| to |c|, resize the buffer, and then re-initalize |start| and |c|. >+rdf_EscapeQuotes(nsCString& s) > { > PRInt32 i = 0; > while ((i = s.FindChar('"', i)) != -1) { > s.SetCharAt('&', i); >+ s.Insert(NS_LITERAL_CSTRING("quot;"), i + 1); > i += 5; > } > } Can't this function just call nsCString::ReplaceSubstring?
Attachment #145457 -
Flags: superreview?(darin) → superreview-
Assignee | ||
Comment 4•20 years ago
|
||
>>PRBool
>>nsRDFXMLSerializer::MakeQName(nsIRDFResource* aResource,
>
> ...
>
>>{
>> const char* s;
>> aResource->GetValueConst(&s);
>>+ nsDependentCString uri(s);
>
>
> you could also use GetValueUTF8 here, which would normally give
> you a shared reference to the resource's nsCString object.
> Maybe that would be cheaper than computing the length of |s|
> here.
I can't find out how nsCString gets the F_SHARED flag set. Esp not by doing
mURI = aCharPointer;
If I read nsTSubstring_CharT::Assign( const self_type& str ) right,
sharing the buffer will only happen, if the str buffer is already shared.
(I will use GetValueUTF8 on those occasions where I need to create a
nsCAutoString, though, thanx for the hint)
Assignee | ||
Comment 5•20 years ago
|
||
Comment on attachment 145457 [details] [diff] [review] do utf8 in nsRDFXMLSerializer new patch coming up, but after easter, most likely
Attachment #145457 -
Attachment is obsolete: true
Attachment #145457 -
Flags: review?(tingley)
Comment 6•20 years ago
|
||
> I can't find out how nsCString gets the F_SHARED flag set. Esp not by doing
> mURI = aCharPointer;
>
> If I read nsTSubstring_CharT::Assign( const self_type& str ) right,
> sharing the buffer will only happen, if the str buffer is already shared.
F_SHARED means the buffer is sharable, it does not mean that it is currently
shared. maybe that was a bad choice of verb tense for that flag :-/
the point is that |nsCString x = y;| will avoid creating a copy of y's buffer if
y is marked with the F_SHARED flag. nsCString by default allocates sharable
buffers, so if you have a nsCString as a member variable than you can share its
buffer by assigning it into another string object.
Assignee | ||
Comment 7•20 years ago
|
||
Note to self, nsTSubstring_CharT::MutatePrep does the flag setting initially.
Assignee | ||
Comment 8•20 years ago
|
||
Addressed the review comments. I didn't feel like putting that one function into the header, I increase the codesize as is. +8k on windows opt, btw.
Assignee | ||
Updated•20 years ago
|
Attachment #146303 -
Flags: superreview?(darin)
Attachment #146303 -
Flags: review?(tingley)
Comment 9•20 years ago
|
||
Comment on attachment 146303 [details] [diff] [review] address darins comments This is going to take me some time to go through -- these are just the first thing I saw. Also, thanks for taking this on; I've really wanted to clean up this rat's nest for a while, and never had the time to do it justice. Actually, while you're at it, you may want to take a look at my patch in bug 231548 and see if a) it makes sense, and b) it would simplify anything to fold it in here. >--- base/src/nsNameSpaceMap.h 23 Mar 2002 21:20:44 -0000 1.2 >+++ base/src/nsNameSpaceMap.h 15 Apr 2004 14:10:44 -0000 >@@ -31,13 +31,13 @@ class nsNameSpaceMap > public: > class Entry { > public: >- Entry(const nsAString& aURI, nsIAtom* aPrefix) >+ Entry(const nsCSubstring& aURI, nsIAtom* aPrefix) > : mURI(aURI), mPrefix(aPrefix), mNext(nsnull) { > MOZ_COUNT_CTOR(nsNameSpaceMap::Entry); } > > ~Entry() { MOZ_COUNT_DTOR(nsNameSpaceMap::Entry); } > >- nsString mURI; >+ nsCString mURI; Now that you've converted nsNameSpaceMap to use CStrings, any chance you could use it to replace the usage of the nearly-identical NameSpaceEntry struct in nsRDFContentSink.cpp? >Index: base/src/nsRDFXMLDataSource.cpp >=================================================================== >RCS file: /cvsroot/mozilla/rdf/base/src/nsRDFXMLDataSource.cpp,v >retrieving revision 1.145 >diff -u -p -r1.145 nsRDFXMLDataSource.cpp >--- base/src/nsRDFXMLDataSource.cpp 11 Sep 2003 03:18:35 -0000 1.145 >+++ base/src/nsRDFXMLDataSource.cpp 15 Apr 2004 14:10:47 -0000 >@@ -1164,8 +1164,11 @@ RDFXMLDataSourceImpl::Serialize(nsIOutpu > // Add any namespace information that we picked up when reading > // the RDF/XML > nsNameSpaceMap::const_iterator last = mNameSpaces.last(); >- for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first(); iter != last; ++iter) >- serializer->AddNameSpace(iter->mPrefix, iter->mURI); >+ for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first(); >+ iter != last; ++iter) { >+ NS_ConvertUTF8toUTF16 uri(iter->mURI); >+ serializer->AddNameSpace(iter->mPrefix, uri); >+ } This is ugly, since this will call |mNameSpaces.Put(aURI, aPrefix)|, which will just convert the URI back to UTF8 again -- a bit of a pain for the net result of copying one nsNameSpaceMap to another. I'm not sure if there's a way around it though, without changing the interface.
Assignee | ||
Comment 10•20 years ago
|
||
(In reply to comment #9) > (From update of attachment 146303 [details] [diff] [review]) > > This is going to take me some time to go through -- these are just the first > thing I saw. Also, thanks for taking this on; I've really wanted to clean up > this rat's nest for a while, and never had the time to do it justice. > > Actually, while you're at it, you may want to take a look at my patch in bug > 231548 and see if a) it makes sense, and b) it would simplify anything to fold > it in here. yiked over there. > >--- base/src/nsNameSpaceMap.h 23 Mar 2002 21:20:44 -0000 1.2 > >+++ base/src/nsNameSpaceMap.h 15 Apr 2004 14:10:44 -0000 > >@@ -31,13 +31,13 @@ class nsNameSpaceMap > > public: > > class Entry { > > public: > >- Entry(const nsAString& aURI, nsIAtom* aPrefix) > >+ Entry(const nsCSubstring& aURI, nsIAtom* aPrefix) > > : mURI(aURI), mPrefix(aPrefix), mNext(nsnull) { > > MOZ_COUNT_CTOR(nsNameSpaceMap::Entry); } > > > > ~Entry() { MOZ_COUNT_DTOR(nsNameSpaceMap::Entry); } > > > >- nsString mURI; > >+ nsCString mURI; > > Now that you've converted nsNameSpaceMap to use CStrings, any chance you could > use it to replace the usage of the nearly-identical NameSpaceEntry struct in > nsRDFContentSink.cpp? I'd rather do that in a different bug. > >Index: base/src/nsRDFXMLDataSource.cpp > >=================================================================== > >RCS file: /cvsroot/mozilla/rdf/base/src/nsRDFXMLDataSource.cpp,v > >retrieving revision 1.145 > >diff -u -p -r1.145 nsRDFXMLDataSource.cpp > >--- base/src/nsRDFXMLDataSource.cpp 11 Sep 2003 03:18:35 -0000 1.145 > >+++ base/src/nsRDFXMLDataSource.cpp 15 Apr 2004 14:10:47 -0000 > >@@ -1164,8 +1164,11 @@ RDFXMLDataSourceImpl::Serialize(nsIOutpu > > // Add any namespace information that we picked up when reading > > // the RDF/XML > > nsNameSpaceMap::const_iterator last = mNameSpaces.last(); > >- for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first(); iter != last; ++iter) > >- serializer->AddNameSpace(iter->mPrefix, iter->mURI); > >+ for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first(); > >+ iter != last; ++iter) { > >+ NS_ConvertUTF8toUTF16 uri(iter->mURI); > >+ serializer->AddNameSpace(iter->mPrefix, uri); > >+ } > > This is ugly, since this will call |mNameSpaces.Put(aURI, aPrefix)|, which will > just convert the URI back to UTF8 again -- a bit of a pain for the net result > of copying one nsNameSpaceMap to another. I'm not sure if there's a way around > it though, without changing the interface. Not without changing the interface, and I didn't wanna do that. I don't think we should go for too many interface changes while we're not sure where we wanna have them go in the future. Note that on the side of nsRDFXMLSerializer::AddNameSpace, we need to jump the bridge from nsA(C?)String to nsCString anyway. Though I could probably ease that by using nsACStrings in nsNameSpaceMap. But this isn't really a big deal in the first place, it's just a few namespaces that get pushed over in the first place.
Comment 11•20 years ago
|
||
BTW, any nsAString can be converted to nsSubstring without copying. However, the current string code doesn't really provide a convenient API for that. The only option right now is to use Substring to create a nsSubstring from a nsAString. This is why you should use nsSubstring instead of nsString for in-params. At some point I hope to make an API that makes it easier to construct a nsSubstring from a nsAString.
Comment 12•20 years ago
|
||
Comment on attachment 146303 [details] [diff] [review] address darins comments >Index: base/src/nsRDFXMLDataSource.cpp >+ for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first(); >+ iter != last; ++iter) { >+ NS_ConvertUTF8toUTF16 uri(iter->mURI); >+ serializer->AddNameSpace(iter->mPrefix, uri); >+ } If you know that |serializer->AddNameSpace| is going to store |uri| into a |nsString| object, then you might want to rewrite this loop to heap-allocate |uri| so that the assignment is cheap (just an increment of a reference counter)... like this: for (nsNameSpaceMap::const_iterator iter = mNameSpaces.first(); iter != last; ++iter) { nsString uri; AppendUTF8toUTF16(iter->mURI, uri); serializer->AddNameSpace(iter->mPrefix, uri); } NOTE: I use AppendUTF8toUTF16 since it is slightly faster than the Copy* form (b/c it does not have to Truncate the dest string before writing to it). sr=darin
Attachment #146303 -
Flags: superreview?(darin) → superreview+
Assignee | ||
Updated•20 years ago
|
Attachment #146303 -
Flags: review?(tingley) → review?(bsmedberg)
Comment 13•20 years ago
|
||
Comment on attachment 146303 [details] [diff] [review] address darins comments >@@ -238,7 +243,7 @@ nsRDFXMLSerializer::MakeQName(nsIRDFReso > // Just generate a random prefix > static PRInt32 gPrefixID = 0; >- aNameSpacePrefix = NS_LITERAL_STRING("NS"); >+ aNameSpacePrefix = NS_LITERAL_CSTRING("NS"); > aNameSpacePrefix.AppendInt(++gPrefixID, 10); > return PR_FALSE; aNameSpacePrefix.AssignLiteral >@@ -324,40 +357,45 @@ nsRDFXMLSerializer::SerializeInlineAsser >+ attr.Append("=\"", 2); >+ rv = rdf_BlockingWrite(aStream, attr); >+ if (NS_FAILED(rv)) return rv; >+ s.Append("\"", 1); use AppendLiteral the first time, and Append('"') the second time. >@@ -436,21 +486,25 @@ nsRDFXMLSerializer::SerializeChildAssert >+ tag.Append(">\n", 2); same With that, let's make it happen! Sorry for the delays in reviewing.
Attachment #146303 -
Flags: review?(bsmedberg) → review+
Assignee | ||
Comment 14•20 years ago
|
||
The patch that I checked in. Most of bsmedbergs comments were already in after I fixed the merge conflicts. I didn't do what darin suggested, as I know that the string *is not* stored. It goes back into a utf8-utf16 conversion. This is sad, I'll give nsNameSpaceMap another whacking.
Assignee | ||
Comment 15•20 years ago
|
||
fix is in, no need for any branches for my part.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•