Closed Bug 231548 Opened 21 years ago Closed 18 years ago

remove nsRDFParserUtils

Categories

(Core Graveyard :: RDF, defect)

x86
All
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9alpha1

People

(Reporter: tingley, Assigned: readams)

References

Details

(Keywords: perf)

Attachments

(5 files, 2 obsolete files)

nsRDFParserUtils, as far as I can tell, contains legacy left over from before
RDF was hooked up to expat.  Some of its code is never called, and the rest of
performs functions that are redundant and (as a result) incorrect.

nsRDFParserUtils contains 3 functions:

GetQuotedAttributeValue() is never used, and would be redundant with
nsParserUtils::GetQuotedAttributeValue() if it were.

EntityToUnicode() is only called from StripAndConvert().

StripAndConvert is applies two transformations to attribute values:
    - Strip matching single and double quotes from strings.
    - Convert some character entities.

The first behavior is not only unnecessary, it is incorrect.  Quotes have been
stripped, to the extent that they should be, by the parser before the content
sink ever sees them; removing additional pairs of quotes is altering opaque data
that we shouldn't touch.  When presented with something like this:

    <RDF:Description RDF:about="..." NC:Name="'In Quotes'" />

The parser will strip the outer "" and pass |'In Quotes'| to RDF. 
StripAndConvert() will incorrectly remove the '' as well.

The second part selectively unescapes certain character entities, notably &amp;,
&gt;/&lt;, etc.  Again, this is redundant -- entities in the attributes have
already been handled by the parser.  (Historical note -- I believe this happened
as part of bug 57248 -- see harishd's bug 57248 comment #62 and onward, where he
discusses moving attribute entity parsing from the sinks to the parser.) 
Similar to the quote-stripping, this can produce incorrect behavior in fringe
cases involving doublely-embedded entities.

[As a side note, both these behaviors make a round-trip through the RDF system
non-idempotent for some RDF literals, which is bad behavior.]

Killing this code will reduce footprint, speed up RDF processing, and pave the
way for a string cleanup of the RDF content sink that I've been meaning to do
for some time.  I had a patch for this at one point that sped up RDF parsing of
chrome.rdf by about 8% in an opt build; it needs to be de-rotted, though.
Attached patch current thoughts (obsolete) — Splinter Review
This is missing stuff to cut it out of the build on various other platforms, if
that's necessary.  It removes the StripAndConvert calls and cleans up some
string stuff nearby that I couldn't resist getting rid of.

I'm still testing this.
rjc: not looking for a review yet, but tell me if my reasoning sounds bogus.
Attached patch patchSplinter Review
Basically the same as before, though I've included the contents of the
parseutils files, essentially for reference.  I've been testing this for a
week, and everything seems ok.
Attachment #139530 - Attachment is obsolete: true
Attachment #139937 - Flags: review?(rjc)
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.7alpha
yikes. loss of data, bumping severity.
This is a good thing to do, will attach a testcase that show the loss of data
if you do an rdfcat on it.
I wonder if we have to use a nsACString, though.
rdf_RequiresAbsoluteURI should just need nsCSubstring, 
rdf_MakeAbsoluteURI(const nsCSubstring& aBaseURI, const nsCSubstring& aURI,
                    nsCString& aResult)
No nsA, as we don't wanna cross xpcom boundaries and there's no need to use
the binary compat virtual function call interfaces to strings.
Severity: normal → critical
QA Contact: core.rdf
Target Milestone: mozilla1.7alpha → ---
Attachment #139937 - Flags: review?(rjc)
*** Bug 241228 has been marked as a duplicate of this bug. ***
This patch fixes the double-unescaping that the RDF parser does, so that things
like &amp;&nbsp; are interpreted correctly.

Note that even with this patch, mozilla still will badly mangle wide characters
in RDF data.
Flags: blocking1.9a1?
Attachment #196368 - Flags: review?(axel)
*** Bug 300602 has been marked as a duplicate of this bug. ***
Comment on attachment 196368 [details] [diff] [review]
This updates the already-attached bitrotten patch

Let me go for a dive.

The content sink stores the nsIURI, then, for each ID or other relative URL,
get's the spec, creates a new URI object, calls into necko, which calls Resolve
on the new nsIURI object?

Sob. Please fix that by directly calling into mDocumentURL->Resolve, everything
else is just keeping mad code alive. And you were as far as changing the
signature anyway.
Attachment #196368 - Flags: review?(axel) → review-
Comment on attachment 196368 [details] [diff] [review]
This updates the already-attached bitrotten patch

Let me go for a dive.

The content sink stores the nsIURI, then, for each ID or other relative URL,
get's the spec, creates a new URI object, calls into necko, which calls Resolve
on the new nsIURI object?

Sob. Please fix that by directly calling into mDocumentURL->Resolve in the
three contentsink call sites, everything else is just keeping mad code alive.
And you were as far as changing the signature of that "helper" anyway.
Comment on attachment 196368 [details] [diff] [review]
This updates the already-attached bitrotten patch

Let me go for a dive.

The content sink stores the nsIURI, then, for each ID or other relative URL,
get's the spec, creates a new URI object, calls into necko, which calls Resolve
on the new nsIURI object?

Sob. Please fix that by directly calling into mDocumentURL->Resolve in the
three contentsink call sites, everything else is just keeping mad code alive.
And you were as far as changing the signature of that "helper" anyway.
This patch is the same as the other one, but removes also rdf_MakeAbsoluteURI
completely.
Attachment #196368 - Attachment is obsolete: true
Attachment #199176 - Flags: review?(axel)
Comment on attachment 199176 [details] [diff] [review]
not bitrotten; uses nsIURL::Resolve



>Index: nsRDFContentSink.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/rdf/base/src/nsRDFContentSink.cpp,v
>retrieving revision 1.121.8.1
>diff -p -u -r1.121.8.1 nsRDFContentSink.cpp
>--- nsRDFContentSink.cpp	30 Aug 2005 02:24:59 -0000	1.121.8.1
>+++ nsRDFContentSink.cpp	11 Oct 2005 16:50:04 -0000
>@@ -858,20 +853,15 @@ RDFContentSinkImpl::GetIdAboutAttribute(
>             if (aIsAnonymous)
>                 *aIsAnonymous = PR_FALSE;
> 
>-            nsAutoString uri(aAttributes[1]);
>-            nsRDFParserUtils::StripAndConvert(uri);
>-
>-            rdf_MakeAbsoluteURI(NS_ConvertUTF8toUCS2(docURI), uri);

Check rdf_RequiresAbsoluteURI here? The cheapest way is probably to just call
into gRDFService->GetResource inside the if and after it.

>+            nsCAutoString uri;
>+            rv = mDocumentURL->Resolve(NS_ConvertUCS2toUTF8(aAttributes[1]), uri);
>+            if (NS_FAILED(rv)) return rv;
> 
>-            return gRDFService->GetUnicodeResource(uri, aResource);
>+            return gRDFService->GetResource(uri, aResource);
>         }
>         else if (localName == kIdAtom) {
>             if (aIsAnonymous)
>                 *aIsAnonymous = PR_FALSE;
>-
>-            nsAutoString name(aAttributes[1]);
>-            nsRDFParserUtils::StripAndConvert(name);
>-
>             // In the spirit of leniency, we do not bother trying to
>             // enforce that this be a valid "XML Name" (see
>             // http://www.w3.org/TR/REC-xml#NT-Nmtoken), as per
>@@ -880,11 +870,12 @@ RDFContentSinkImpl::GetIdAboutAttribute(
>             // Construct an in-line resource whose URI is the
>             // document's URI plus the XML name specified in the ID
>             // attribute.
>-            name.Insert(PRUnichar('#'), 0);
>-          
>-            rdf_MakeAbsoluteURI(NS_ConvertUTF8toUCS2(docURI), name);
>+            nsCAutoString name;
>+            rv = mDocumentURL->Resolve(NS_LITERAL_CSTRING("#") + 
>+                                       NS_ConvertUCS2toUTF8(aAttributes[1]), name);
nsCAutoString ref('#');
AppendUTF16toUTF8(aAttributes[1], ref);

should be a tad faster. No need to call into RequiresAbsolu... here.

>+            if (NS_FAILED(rv)) return rv;
> 
>-            return gRDFService->GetUnicodeResource(name, aResource);
>+            return gRDFService->GetResource(name, aResource);
>         }
>         else if (localName == kAboutEachAtom) {
>             // XXX we don't deal with aboutEach...
>@@ -923,17 +914,15 @@ RDFContentSinkImpl::GetResourceAttribute
>       // first thing that was specified and ignore the other.
> 
>       if (localName == kResourceAtom) {
>-          nsAutoString uri(aAttributes[1]);
>-          nsRDFParserUtils::StripAndConvert(uri);
>-
>           // XXX Take the URI and make it fully qualified by
>           // sticking it into the document's URL. This may not be
>           // appropriate...
>-          nsCAutoString documentURL;
>-          mDocumentURL->GetSpec(documentURL);
>-          rdf_MakeAbsoluteURI(NS_ConvertUTF8toUCS2(documentURL), uri);
>+          nsresult rv;

check for rdf_RequiresAbsoluteURI here, too.

>+          nsCAutoString uri;
>+          rv = mDocumentURL->Resolve(NS_ConvertUCS2toUTF8(aAttributes[1]), uri);
>+          if (NS_FAILED(rv)) return rv;
> 
>-          return gRDFService->GetUnicodeResource(uri, aResource);
>+          return gRDFService->GetResource(uri, aResource);
>       }
>   }
>   return NS_ERROR_FAILURE;
<...>

>Index: rdfutil.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/rdf/base/src/rdfutil.cpp,v
>retrieving revision 1.60
>diff -p -u -r1.60 rdfutil.cpp
>--- rdfutil.cpp	9 Sep 2004 20:17:36 -0000	1.60
>+++ rdfutil.cpp	11 Oct 2005 16:50:05 -0000
>@@ -87,85 +87,13 @@ rdf_MakeRelativeRef(const nsCSubstring& 
> }
> 
> static PRBool
>-rdf_RequiresAbsoluteURI(const nsString& uri)
>+rdf_RequiresAbsoluteURI(const nsACString& uri)

nsCString is good enough these days. This isn't really for public consumption 
anyway.

> {

<...>

>-
>-    return NS_OK;
>+    return !(StringBeginsWith(uri, NS_LITERAL_CSTRING("urn:")) ||
>+             StringBeginsWith(uri, NS_LITERAL_CSTRING("chrome:")) ||
>+             StringBeginsWith(uri, NS_LITERAL_CSTRING("nc:"),

drop the nc: one, we shouldn't have those.

>+                              nsCaseInsensitiveCStringComparator()));
> }
> 
> void
>Index: rdfutil.h
>===================================================================
>RCS file: /cvsroot/mozilla/rdf/base/src/rdfutil.h,v

Expose rdf_RequiresAbsoluteURI here or move it over to the content sink
completely.
I guess I'd advocate to move it.

I'm a yota away from r+ now, but I guess seeing the final patch won't hurt.
Attachment #199176 - Flags: review?(axel) → review-
some shenanigans involving RequiresAbsoluteURI.  Man, converting various
strings is a pain in the butt.
Attachment #199191 - Flags: review?(axel)
Comment on attachment 199191 [details] [diff] [review]
the March of Patches continues...

ping?
Comment on attachment 199191 [details] [diff] [review]
the March of Patches continues...

r=me with the following nit fixed, 

>Index: nsRDFContentSink.cpp
>===================================================================

<...>
>+            if (rdf_RequiresAbsoluteURI(relURI)) {
>+                nsCAutoString uri;
>+                rv = mDocumentURL->Resolve(NS_ConvertUCS2toUTF8(aAttributes[1]), uri);
>+                if (NS_FAILED(rv)) return rv;
>+                
>+                return gRDFService->GetResource(uri, 
>+                                                aResource);
>+            } else {
>+                return gRDFService->GetResource(NS_ConvertUCS2toUTF8(aAttributes[1]), 
>+                                                aResource);
>+            }

and 

<...>
>+          if (rdf_RequiresAbsoluteURI(relURI)) {
>+              nsresult rv;
>+              nsCAutoString uri;
>+
>+              rv = mDocumentURL->Resolve(NS_ConvertUCS2toUTF8(aAttributes[1]), uri);
>+              if (NS_FAILED(rv)) return rv;
>+
>+              return gRDFService->GetResource(uri, aResource);
>+          } else {
>+              return gRDFService->GetResource(NS_ConvertUCS2toUTF8(aAttributes[1]), 
>+                                              aResource);
>+          }
>       }

No "else" after return, please.
Attachment #199191 - Flags: superreview?(shaver)
Attachment #199191 - Flags: review?(axel)
Attachment #199191 - Flags: review+
Comment on attachment 199191 [details] [diff] [review]
the March of Patches continues...

sr=shaver
Attachment #199191 - Flags: superreview?(shaver) → superreview+
This patch makes the requested change to the reviewed patch.  I don't have commit access so I need someone to commit for me.  Whoever does it should be sure to do a cvs remove on nsRDFParserUtils.cpp
gRDFService->GetResource(NS_ConvertUCS2toUTF8(aAttributes[1])

I'm not sure why this doesn't use GetUnicodeResource...

but anyway, I checked this in as-is (and removed nsRDFParserUtils.*)

fixed on trunk
Checking in rdf/base/src/Makefile.in;
/cvsroot/mozilla/rdf/base/src/Makefile.in,v  <--  Makefile.in
new revision: 1.36; previous revision: 1.35
done
Checking in rdf/base/src/nsRDFContentSink.cpp;
/cvsroot/mozilla/rdf/base/src/nsRDFContentSink.cpp,v  <--  nsRDFContentSink.cpp
new revision: 1.123; previous revision: 1.122
done
Removing rdf/base/src/nsRDFParserUtils.cpp;
/cvsroot/mozilla/rdf/base/src/nsRDFParserUtils.cpp,v  <--  nsRDFParserUtils.cpp
new revision: delete; previous revision: 1.17
done
Removing rdf/base/src/nsRDFParserUtils.h;
/cvsroot/mozilla/rdf/base/src/nsRDFParserUtils.h,v  <--  nsRDFParserUtils.h
new revision: delete; previous revision: 1.11
done
Checking in rdf/base/src/rdfutil.cpp;
/cvsroot/mozilla/rdf/base/src/rdfutil.cpp,v  <--  rdfutil.cpp
new revision: 1.61; previous revision: 1.60
done
Checking in rdf/base/src/rdfutil.h;
/cvsroot/mozilla/rdf/base/src/rdfutil.h,v  <--  rdfutil.h
new revision: 1.26; previous revision: 1.25
done
Rob, can you close this bug if it's fixed?
can someone close this if it should be closed?
Assignee: tingley → readams
Status: ASSIGNED → NEW
Flags: blocking1.9a1? → blocking1.9-
Closing as Fixed per Rob A's patch.  Thanks for de-rotting this.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Is there a reason to not apply this to 1.8?
Nobody requested to land it, and IMHO it's too late to land it now.
Target Milestone: --- → mozilla1.9alpha
Fair enough - I applied it to the Flock tree since we were having problems.
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: