If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

remove nsRDFParserUtils

RESOLVED FIXED in mozilla1.9alpha1

Status

()

Core
RDF
--
critical
RESOLVED FIXED
14 years ago
4 years ago

People

(Reporter: Chase Tingley, Assigned: Rob Adams)

Tracking

({perf})

Trunk
mozilla1.9alpha1
x86
All
Points:
---
Bug Flags:
blocking1.9 -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(5 attachments, 2 obsolete attachments)

(Reporter)

Description

14 years ago
nsRDFParserUtils, as far as I can tell, contains legacy left over from before
RDF was hooked up to expat.  Some of its code is never called, and the rest of
performs functions that are redundant and (as a result) incorrect.

nsRDFParserUtils contains 3 functions:

GetQuotedAttributeValue() is never used, and would be redundant with
nsParserUtils::GetQuotedAttributeValue() if it were.

EntityToUnicode() is only called from StripAndConvert().

StripAndConvert is applies two transformations to attribute values:
    - Strip matching single and double quotes from strings.
    - Convert some character entities.

The first behavior is not only unnecessary, it is incorrect.  Quotes have been
stripped, to the extent that they should be, by the parser before the content
sink ever sees them; removing additional pairs of quotes is altering opaque data
that we shouldn't touch.  When presented with something like this:

    <RDF:Description RDF:about="..." NC:Name="'In Quotes'" />

The parser will strip the outer "" and pass |'In Quotes'| to RDF. 
StripAndConvert() will incorrectly remove the '' as well.

The second part selectively unescapes certain character entities, notably &amp;,
&gt;/&lt;, etc.  Again, this is redundant -- entities in the attributes have
already been handled by the parser.  (Historical note -- I believe this happened
as part of bug 57248 -- see harishd's bug 57248 comment #62 and onward, where he
discusses moving attribute entity parsing from the sinks to the parser.) 
Similar to the quote-stripping, this can produce incorrect behavior in fringe
cases involving doublely-embedded entities.

[As a side note, both these behaviors make a round-trip through the RDF system
non-idempotent for some RDF literals, which is bad behavior.]

Killing this code will reduce footprint, speed up RDF processing, and pave the
way for a string cleanup of the RDF content sink that I've been meaning to do
for some time.  I had a patch for this at one point that sped up RDF parsing of
chrome.rdf by about 8% in an opt build; it needs to be de-rotted, though.
(Reporter)

Comment 1

14 years ago
Created attachment 139530 [details] [diff] [review]
current thoughts

This is missing stuff to cut it out of the build on various other platforms, if
that's necessary.  It removes the StripAndConvert calls and cleans up some
string stuff nearby that I couldn't resist getting rid of.

I'm still testing this.
(Reporter)

Comment 2

14 years ago
rjc: not looking for a review yet, but tell me if my reasoning sounds bogus.
(Reporter)

Comment 3

14 years ago
Created attachment 139937 [details] [diff] [review]
patch

Basically the same as before, though I've included the contents of the
parseutils files, essentially for reference.  I've been testing this for a
week, and everything seems ok.
Attachment #139530 - Attachment is obsolete: true
(Reporter)

Updated

14 years ago
Attachment #139937 - Flags: review?(rjc)
(Reporter)

Updated

14 years ago
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.7alpha

Comment 4

14 years ago
yikes. loss of data, bumping severity.
This is a good thing to do, will attach a testcase that show the loss of data
if you do an rdfcat on it.
I wonder if we have to use a nsACString, though.
rdf_RequiresAbsoluteURI should just need nsCSubstring, 
rdf_MakeAbsoluteURI(const nsCSubstring& aBaseURI, const nsCSubstring& aURI,
                    nsCString& aResult)
No nsA, as we don't wanna cross xpcom boundaries and there's no need to use
the binary compat virtual function call interfaces to strings.
Severity: normal → critical
QA Contact: core.rdf
Target Milestone: mozilla1.7alpha → ---

Comment 5

14 years ago
Created attachment 146425 [details]
roundtrip loosing all kind of data.
(Reporter)

Updated

14 years ago
Attachment #139937 - Flags: review?(rjc)
(Reporter)

Comment 6

14 years ago
*** Bug 241228 has been marked as a duplicate of this bug. ***
(Assignee)

Comment 7

12 years ago
Created attachment 196368 [details] [diff] [review]
This updates the already-attached bitrotten patch

This patch fixes the double-unescaping that the RDF parser does, so that things
like &amp;&nbsp; are interpreted correctly.

Note that even with this patch, mozilla still will badly mangle wide characters
in RDF data.
(Assignee)

Updated

12 years ago
Flags: blocking1.9a1?
(Assignee)

Updated

12 years ago
Attachment #196368 - Flags: review?(axel)

Comment 8

12 years ago
*** Bug 300602 has been marked as a duplicate of this bug. ***

Comment 9

12 years ago
Comment on attachment 196368 [details] [diff] [review]
This updates the already-attached bitrotten patch

Let me go for a dive.

The content sink stores the nsIURI, then, for each ID or other relative URL,
get's the spec, creates a new URI object, calls into necko, which calls Resolve
on the new nsIURI object?

Sob. Please fix that by directly calling into mDocumentURL->Resolve, everything
else is just keeping mad code alive. And you were as far as changing the
signature anyway.
Attachment #196368 - Flags: review?(axel) → review-

Comment 10

12 years ago
Comment on attachment 196368 [details] [diff] [review]
This updates the already-attached bitrotten patch

Let me go for a dive.

The content sink stores the nsIURI, then, for each ID or other relative URL,
get's the spec, creates a new URI object, calls into necko, which calls Resolve
on the new nsIURI object?

Sob. Please fix that by directly calling into mDocumentURL->Resolve in the
three contentsink call sites, everything else is just keeping mad code alive.
And you were as far as changing the signature of that "helper" anyway.

Comment 11

12 years ago
Comment on attachment 196368 [details] [diff] [review]
This updates the already-attached bitrotten patch

Let me go for a dive.

The content sink stores the nsIURI, then, for each ID or other relative URL,
get's the spec, creates a new URI object, calls into necko, which calls Resolve
on the new nsIURI object?

Sob. Please fix that by directly calling into mDocumentURL->Resolve in the
three contentsink call sites, everything else is just keeping mad code alive.
And you were as far as changing the signature of that "helper" anyway.
(Assignee)

Comment 12

12 years ago
Created attachment 199176 [details] [diff] [review]
not bitrotten; uses nsIURL::Resolve

This patch is the same as the other one, but removes also rdf_MakeAbsoluteURI
completely.
Attachment #196368 - Attachment is obsolete: true
Attachment #199176 - Flags: review?(axel)

Comment 13

12 years ago
Comment on attachment 199176 [details] [diff] [review]
not bitrotten; uses nsIURL::Resolve



>Index: nsRDFContentSink.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/rdf/base/src/nsRDFContentSink.cpp,v
>retrieving revision 1.121.8.1
>diff -p -u -r1.121.8.1 nsRDFContentSink.cpp
>--- nsRDFContentSink.cpp	30 Aug 2005 02:24:59 -0000	1.121.8.1
>+++ nsRDFContentSink.cpp	11 Oct 2005 16:50:04 -0000
>@@ -858,20 +853,15 @@ RDFContentSinkImpl::GetIdAboutAttribute(
>             if (aIsAnonymous)
>                 *aIsAnonymous = PR_FALSE;
> 
>-            nsAutoString uri(aAttributes[1]);
>-            nsRDFParserUtils::StripAndConvert(uri);
>-
>-            rdf_MakeAbsoluteURI(NS_ConvertUTF8toUCS2(docURI), uri);

Check rdf_RequiresAbsoluteURI here? The cheapest way is probably to just call
into gRDFService->GetResource inside the if and after it.

>+            nsCAutoString uri;
>+            rv = mDocumentURL->Resolve(NS_ConvertUCS2toUTF8(aAttributes[1]), uri);
>+            if (NS_FAILED(rv)) return rv;
> 
>-            return gRDFService->GetUnicodeResource(uri, aResource);
>+            return gRDFService->GetResource(uri, aResource);
>         }
>         else if (localName == kIdAtom) {
>             if (aIsAnonymous)
>                 *aIsAnonymous = PR_FALSE;
>-
>-            nsAutoString name(aAttributes[1]);
>-            nsRDFParserUtils::StripAndConvert(name);
>-
>             // In the spirit of leniency, we do not bother trying to
>             // enforce that this be a valid "XML Name" (see
>             // http://www.w3.org/TR/REC-xml#NT-Nmtoken), as per
>@@ -880,11 +870,12 @@ RDFContentSinkImpl::GetIdAboutAttribute(
>             // Construct an in-line resource whose URI is the
>             // document's URI plus the XML name specified in the ID
>             // attribute.
>-            name.Insert(PRUnichar('#'), 0);
>-          
>-            rdf_MakeAbsoluteURI(NS_ConvertUTF8toUCS2(docURI), name);
>+            nsCAutoString name;
>+            rv = mDocumentURL->Resolve(NS_LITERAL_CSTRING("#") + 
>+                                       NS_ConvertUCS2toUTF8(aAttributes[1]), name);
nsCAutoString ref('#');
AppendUTF16toUTF8(aAttributes[1], ref);

should be a tad faster. No need to call into RequiresAbsolu... here.

>+            if (NS_FAILED(rv)) return rv;
> 
>-            return gRDFService->GetUnicodeResource(name, aResource);
>+            return gRDFService->GetResource(name, aResource);
>         }
>         else if (localName == kAboutEachAtom) {
>             // XXX we don't deal with aboutEach...
>@@ -923,17 +914,15 @@ RDFContentSinkImpl::GetResourceAttribute
>       // first thing that was specified and ignore the other.
> 
>       if (localName == kResourceAtom) {
>-          nsAutoString uri(aAttributes[1]);
>-          nsRDFParserUtils::StripAndConvert(uri);
>-
>           // XXX Take the URI and make it fully qualified by
>           // sticking it into the document's URL. This may not be
>           // appropriate...
>-          nsCAutoString documentURL;
>-          mDocumentURL->GetSpec(documentURL);
>-          rdf_MakeAbsoluteURI(NS_ConvertUTF8toUCS2(documentURL), uri);
>+          nsresult rv;

check for rdf_RequiresAbsoluteURI here, too.

>+          nsCAutoString uri;
>+          rv = mDocumentURL->Resolve(NS_ConvertUCS2toUTF8(aAttributes[1]), uri);
>+          if (NS_FAILED(rv)) return rv;
> 
>-          return gRDFService->GetUnicodeResource(uri, aResource);
>+          return gRDFService->GetResource(uri, aResource);
>       }
>   }
>   return NS_ERROR_FAILURE;
<...>

>Index: rdfutil.cpp
>===================================================================
>RCS file: /cvsroot/mozilla/rdf/base/src/rdfutil.cpp,v
>retrieving revision 1.60
>diff -p -u -r1.60 rdfutil.cpp
>--- rdfutil.cpp	9 Sep 2004 20:17:36 -0000	1.60
>+++ rdfutil.cpp	11 Oct 2005 16:50:05 -0000
>@@ -87,85 +87,13 @@ rdf_MakeRelativeRef(const nsCSubstring& 
> }
> 
> static PRBool
>-rdf_RequiresAbsoluteURI(const nsString& uri)
>+rdf_RequiresAbsoluteURI(const nsACString& uri)

nsCString is good enough these days. This isn't really for public consumption 
anyway.

> {

<...>

>-
>-    return NS_OK;
>+    return !(StringBeginsWith(uri, NS_LITERAL_CSTRING("urn:")) ||
>+             StringBeginsWith(uri, NS_LITERAL_CSTRING("chrome:")) ||
>+             StringBeginsWith(uri, NS_LITERAL_CSTRING("nc:"),

drop the nc: one, we shouldn't have those.

>+                              nsCaseInsensitiveCStringComparator()));
> }
> 
> void
>Index: rdfutil.h
>===================================================================
>RCS file: /cvsroot/mozilla/rdf/base/src/rdfutil.h,v

Expose rdf_RequiresAbsoluteURI here or move it over to the content sink
completely.
I guess I'd advocate to move it.

I'm a yota away from r+ now, but I guess seeing the final patch won't hurt.
Attachment #199176 - Flags: review?(axel) → review-
(Assignee)

Comment 14

12 years ago
Created attachment 199191 [details] [diff] [review]
the March of Patches continues...

some shenanigans involving RequiresAbsoluteURI.  Man, converting various
strings is a pain in the butt.
Attachment #199191 - Flags: review?(axel)
(Assignee)

Comment 15

12 years ago
Comment on attachment 199191 [details] [diff] [review]
the March of Patches continues...

ping?

Comment 16

12 years ago
Comment on attachment 199191 [details] [diff] [review]
the March of Patches continues...

r=me with the following nit fixed, 

>Index: nsRDFContentSink.cpp
>===================================================================

<...>
>+            if (rdf_RequiresAbsoluteURI(relURI)) {
>+                nsCAutoString uri;
>+                rv = mDocumentURL->Resolve(NS_ConvertUCS2toUTF8(aAttributes[1]), uri);
>+                if (NS_FAILED(rv)) return rv;
>+                
>+                return gRDFService->GetResource(uri, 
>+                                                aResource);
>+            } else {
>+                return gRDFService->GetResource(NS_ConvertUCS2toUTF8(aAttributes[1]), 
>+                                                aResource);
>+            }

and 

<...>
>+          if (rdf_RequiresAbsoluteURI(relURI)) {
>+              nsresult rv;
>+              nsCAutoString uri;
>+
>+              rv = mDocumentURL->Resolve(NS_ConvertUCS2toUTF8(aAttributes[1]), uri);
>+              if (NS_FAILED(rv)) return rv;
>+
>+              return gRDFService->GetResource(uri, aResource);
>+          } else {
>+              return gRDFService->GetResource(NS_ConvertUCS2toUTF8(aAttributes[1]), 
>+                                              aResource);
>+          }
>       }

No "else" after return, please.
Attachment #199191 - Flags: superreview?(shaver)
Attachment #199191 - Flags: review?(axel)
Attachment #199191 - Flags: review+
Comment on attachment 199191 [details] [diff] [review]
the March of Patches continues...

sr=shaver
Attachment #199191 - Flags: superreview?(shaver) → superreview+
(Assignee)

Comment 18

12 years ago
Created attachment 202107 [details] [diff] [review]
Altered patch making the requested change

This patch makes the requested change to the reviewed patch.  I don't have commit access so I need someone to commit for me.  Whoever does it should be sure to do a cvs remove on nsRDFParserUtils.cpp
gRDFService->GetResource(NS_ConvertUCS2toUTF8(aAttributes[1])

I'm not sure why this doesn't use GetUnicodeResource...

but anyway, I checked this in as-is (and removed nsRDFParserUtils.*)

fixed on trunk
Checking in rdf/base/src/Makefile.in;
/cvsroot/mozilla/rdf/base/src/Makefile.in,v  <--  Makefile.in
new revision: 1.36; previous revision: 1.35
done
Checking in rdf/base/src/nsRDFContentSink.cpp;
/cvsroot/mozilla/rdf/base/src/nsRDFContentSink.cpp,v  <--  nsRDFContentSink.cpp
new revision: 1.123; previous revision: 1.122
done
Removing rdf/base/src/nsRDFParserUtils.cpp;
/cvsroot/mozilla/rdf/base/src/nsRDFParserUtils.cpp,v  <--  nsRDFParserUtils.cpp
new revision: delete; previous revision: 1.17
done
Removing rdf/base/src/nsRDFParserUtils.h;
/cvsroot/mozilla/rdf/base/src/nsRDFParserUtils.h,v  <--  nsRDFParserUtils.h
new revision: delete; previous revision: 1.11
done
Checking in rdf/base/src/rdfutil.cpp;
/cvsroot/mozilla/rdf/base/src/rdfutil.cpp,v  <--  rdfutil.cpp
new revision: 1.61; previous revision: 1.60
done
Checking in rdf/base/src/rdfutil.h;
/cvsroot/mozilla/rdf/base/src/rdfutil.h,v  <--  rdfutil.h
new revision: 1.26; previous revision: 1.25
done

Comment 20

12 years ago
Rob, can you close this bug if it's fixed?

Comment 21

11 years ago
can someone close this if it should be closed?
Assignee: tingley → readams
Status: ASSIGNED → NEW
Flags: blocking1.9a1? → blocking1.9-
(Reporter)

Comment 22

11 years ago
Closing as Fixed per Rob A's patch.  Thanks for de-rotting this.
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED

Comment 23

11 years ago
Is there a reason to not apply this to 1.8?

Comment 24

11 years ago
Nobody requested to land it, and IMHO it's too late to land it now.
Target Milestone: --- → mozilla1.9alpha

Comment 25

11 years ago
Fair enough - I applied it to the Flock tree since we were having problems.
You need to log in before you can comment on or make changes to this bug.