XMLParser should load DTDs on standalone, set referrer on module

VERIFIED FIXED in mozilla1.4alpha

Status

()

Core
XSLT
VERIFIED FIXED
16 years ago
15 years ago

People

(Reporter: Axel Hecht, Assigned: Axel Hecht)

Tracking

Trunk
mozilla1.4alpha
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

16 years ago
XMLParser for standalone needs some cleanup, and it should load DTDs. That way
we will finally be able to get the docbook xsl working on standalone.
(I wonder if it works on module with a patched tree, now that VERSION should
be xml.)

Now, fixing the parser would be much nicer, if we had the handler routines as
methods on the object, somehow like nsExpatDriver does. Doing so requires
expat information to be part of the XMLParser object, which in the current
architecture blows up our REQUIRES list once more.
I would prefer to go via 
txGetDocumentFromURI and #ifdef TX_EXE txParseFromIStream (moving to 
nsIInputStream some day not so far)

While fixing the signature of txGetDocumentFromURI, we should add the loading
node as additional argument, get the baseURI of it's document and set that
as referrer if we have an nsIHttpChannel.

Oh, my tree does the parsing stuff already, I fixed IDs on standalone as well.
Methods being the handler would keep XMLParser a friend of the DOM, and keep
me from adding public methods to set IDs on docs and elements.

Comment 1

16 years ago
Duplicate of bug 22942?
no, this is about the standalone XSLT processor. It has nothing to do with
mozilla-the-browser.
(Assignee)

Comment 3

15 years ago
Created attachment 112970 [details] [diff] [review]
patch

this patch changes the API to the parser, moves the standalone parser into
XMLParser.cpp, makes the C functions pure stubs and the parser object the
expat userdata. *Closely* followed what nsExpatDriver.cpp does.
ID is implemented on standalone, now, and I fixed a perf issue in id pattern
while there.
Oh, and referrer get's set on module
(Assignee)

Updated

15 years ago
Status: NEW → ASSIGNED
(Assignee)

Updated

15 years ago
Attachment #112970 - Flags: superreview?(peterv)
Attachment #112970 - Flags: review?(bugmail)
(Assignee)

Comment 4

15 years ago
Created attachment 114705 [details] [diff] [review]
txXMLParser is new files

peterv indicated that he wants txXMLParser in txXMLParser.*
So here it goes.
As patchsize doesn't change, I did a more brutal cleanup of stuff.
Even more brutal.
Attachment #112970 - Attachment is obsolete: true
(Assignee)

Updated

15 years ago
Attachment #112970 - Flags: superreview?(peterv)
Attachment #112970 - Flags: review?(bugmail)
(Assignee)

Updated

15 years ago
Attachment #114705 - Flags: superreview?(peterv)
Attachment #114705 - Flags: review?(bugmail)
(Assignee)

Updated

15 years ago
Target Milestone: --- → mozilla1.4alpha
Comment on attachment 114705 [details] [diff] [review]
txXMLParser is new files

>Index: source/xml/dom/standalone/Document.cpp
>===================================================================

>@@ -39,6 +39,7 @@
> //
> Document::Document() : NodeDefinition(Node::DOCUMENT_NODE, nsString(), NULL)
> {
>+  mIDMap.Init(0);

Do we really want 0?

>+PRBool
>+Document::setElementID(const nsAString& aID, Element* aElement)
>+{
>+  txIDEntry* id = mIDMap.AddEntry(aID);
>+  // make sure IDs are unique
>+  if (!id->mElement) {
>+    id->mElement = aElement;
>+    return PR_TRUE;
>+  }
>+  return PR_FALSE;

Turn the logic around.

>Index: source/xml/dom/standalone/dom.h
>===================================================================

>+class txIDEntry : public PLDHashStringEntry
>+{
>+public:
>+    txIDEntry(const void* aKey) : PLDHashStringEntry(aKey), mElement(nsnull) {}
>+    ~txIDEntry() { }

I prefer brackets on their own lines.

>+    Element* mElement;

You could share the string for the id value with the element. A little bit more
code.

>Index: source/xml/parser/txXMLParser.cpp
>===================================================================

>+class txXMLParser
>+{
>+  public:
>+    Document* getDocumentFromURI(const nsAString& aHref,
>+                                 const nsAString& aReferrer,
>+                                 Document* aLoader,
>+                                 nsAString& aErrMsg);
>+#ifdef TX_EXE
>+    Document* parse(istream& aInputStream, const nsAString& aUri);

This isn't really ready for stylesheet compilation.
1) We need to be able to specify our own handlers
2) We need to be able to return something else from parse or have a
beginDocument and endDocument callback or something.
Any suggestions?

>+nsresult
>+txParseDocumentFromURI(const nsAString& aHref, const nsAString& aReferrer,
>+		       Document* aLoader, nsAString& aErrMsg,
>+		       Document** aResult)

Line these up with the opening parens.

>+{
>+    NS_ENSURE_ARG_POINTER(aResult);
>+    nsresult rv = NS_OK;
>+    txXMLParser parser;
>+    *aResult = parser.getDocumentFromURI(aHref, aReferrer, aLoader, aErrMsg);
>+    if (!*aResult)
>+        rv = NS_ERROR_FAILURE;
>+    return rv;

You don't really need rv. Just return the values directly.

>+}
>+
>+#ifdef TX_EXE
>+nsresult
>+txParseFromStream(istream& aInputStream, const nsAString& aUri,
>+                  nsAString& aErrorString, Document** aResult)
>+{
>+    NS_ENSURE_ARG_POINTER(aResult);
>+    nsresult rv = NS_OK;
>+    txXMLParser parser;
>+    *aResult = parser.parse(aInputStream, aUri);
>+    aErrorString = parser.getErrorString();
>+    if (!*aResult)
>+        rv = NS_ERROR_FAILURE;
>+    return rv;

Same here.

>+}
>+
>+#endif
>+
>+
>+Document*
>+txXMLParser::getDocumentFromURI(const nsAString& aHref,
>+                                const nsAString& aReferrer,
>+                                Document* aLoader,
>+                                nsAString& aErrMsg)

We really should change this to return nsresult one day.

>+{
>+#ifndef TX_EXE
>+    nsCOMPtr<nsIURI> documentURI;
>+    nsresult rv = NS_NewURI(getter_AddRefs(documentURI), aHref);
>+    NS_ENSURE_SUCCESS(rv, 0);
>+
>+    nsCOMPtr<nsIDOMDocument> theDocument;
>+    nsCOMPtr<nsIDocument> loaderDocument =
>+        do_QueryInterface(aLoader->getNSObj());
>+    nsCOMPtr<nsILoadGroup> loadGroup;
>+    nsCOMPtr<nsIURI> loaderUri;
>+    loaderDocument->GetDocumentLoadGroup(getter_AddRefs(loadGroup));
>+    loaderDocument->GetDocumentURL(getter_AddRefs(loaderUri));
>+    NS_ENSURE_TRUE(loaderUri, 0);
>+
>+    nsCOMPtr<nsIChannel> channel;
>+    rv = NS_NewChannel(getter_AddRefs(channel), documentURI, nsnull,
>+                       loadGroup);
>+    NS_ENSURE_SUCCESS(rv, 0);
>+    nsCOMPtr<nsIHttpChannel> http(do_QueryInterface(channel));

http = do_QI because I like it better and for consistency.

>+    if (http) {
>+        nsCOMPtr<nsIURI> refUri;
>+        NS_NewURI(getter_AddRefs(refUri), aReferrer);
>+        if (refUri) {
>+            http->SetReferrer(refUri);
>+        }

You might want to add

       http->SetRequestHeader(NS_LITERAL_CSTRING("Accept"),
			     
NS_LITERAL_CSTRING("text/xml,application/xml,application/xhtml+xml,*/*;q=0.1"),
			      PR_FALSE);

>+PR_STATIC_CALLBACK(int)
>+externalEntityRefHandler(XML_Parser aParser,
>+			 const XML_Char *aContext,
>+			 const XML_Char *aBase,
>+			 const XML_Char *aSystemId,
>+			 const XML_Char *aPublicId)

Line these up with the parens.

>+{
>+    // aParser is aUserData is the txXMLParser,
>+    // we set that in txXMLParser::parse

Really? Ughhh.

>+Document* txXMLParser::parse(istream& aInputStream, const nsAString& aUri)
>+{
>+    mErrorString.Truncate();
>+    if (!aInputStream) {
>+        mErrorString.Append(NS_LITERAL_STRING("unable to parse xml: invalid or unopen stream encountered."));
>+        return nsnull;
>+    }
>+    mExpatParser = XML_ParserCreate(nsnull);
>+    mDocument = new Document();

Check for out of memory.

>+int
>+txXMLParser::StartElement(const XML_Char *aName, const XML_Char **aAtts)
>+{
>+    Element* newElement;
>+    const XML_Char** theAtts = aAtts;
>+
>+    newElement =

Element* newElement =

>+        mDocument->createElement(nsDependentString((const PRUnichar*)aName));

Check for out of memory.

>+
>+    while (*theAtts) {
>+        nsDependentString attName((const PRUnichar*)*theAtts++);
>+        nsDependentString attValue((const PRUnichar*)*theAtts++);
>+        newElement->setAttribute(attName, attValue);
>+    }
>+
>+    int idx;
>+    if ((idx = XML_GetIdAttributeIndex(mExpatParser)) > -1) {
>+        nsDependentString idName((const PRUnichar*)*(aAtts + idx));
>+        nsDependentString idValue((const PRUnichar*)*(aAtts + idx + 1));
>+        // make sure IDs are unique
>+        if (!idValue.IsEmpty() && !idName.IsEmpty() &&
>+            mDocument->setElementID(idValue, newElement)) {
>+            newElement->setIDValue(idValue);

These could be combined, especially if you share the value.

>+void
>+txXMLParser::CharacterData(const XML_Char* aChars, int aLength)
>+{
>+    Node* prevSib = mCurrentNode->getLastChild();
>+    const PRUnichar* pChars = (const PRUnichar*)aChars;

NS_STATIC_CAST

>+    if (prevSib && prevSib->getNodeType() == Node::TEXT_NODE){

Space between parens and brace.

>+        ((NodeDefinition*)prevSib)->appendData(pChars, aLength);

NS_STATIC_CAST

>+    } else {

I prefer

}
else {

>Index: source/xslt/txStandaloneXSLTProcessor.cpp
>===================================================================

>+    nsresult rv;
>+    rv = txParseFromStream(xmlInput, path, errors, &xmlDoc);

Combine those two lines.

>Index: source/xslt/txXSLTPatterns.cpp
>===================================================================

>@@ -408,7 +385,9 @@
>     aDest.Append(NS_LITERAL_STRING("txIdPattern{"));
> #endif
>     aDest.Append(NS_LITERAL_STRING("id('"));
>-    aDest.Append(mIds);
>+    for (PRUint32 k = 0; k < mIds.Count(); ++k) {

Declare k before the for.

>Index: source/lib/Makefile.in
>===================================================================

You'll have to change the mac project.
Attachment #114705 - Flags: superreview?(peterv) → superreview+
Comment on attachment 114705 [details] [diff] [review]
txXMLParser is new files

>+class txIDEntry : public PLDHashStringEntry
>+{
>+public:
>+    txIDEntry(const void* aKey) : PLDHashStringEntry(aKey), mElement(nsnull) {}
>+    ~txIDEntry() { }
>+    Element* mElement;
>+};

You could get the key out from the element if you create your own entry-class
rather then inheriting PLDHashStringEntry.

>+nsresult
>+txParseDocumentFromURI(const nsAString& aHref, const nsAString& aReferrer,
>+		       Document* aLoader, nsAString& aErrMsg,
>+		       Document** aResult)
>+{
>+    NS_ENSURE_ARG_POINTER(aResult);
>+    nsresult rv = NS_OK;
>+    txXMLParser parser;
>+    *aResult = parser.getDocumentFromURI(aHref, aReferrer, aLoader, aErrMsg);
>+    if (!*aResult)
>+        rv = NS_ERROR_FAILURE;
>+    return rv;
>+}

IMHO you could make txParseDocumentFromURI be the point where module and
standalone diverge. So txXMLParser would be only exist in standalone and the
module-code would live directly in txParseDocumentFromURI. After all,
standalone needs neither aLoader or aReferrer, so there's not really any point
in passing them further then needed.

>+Document*
>+txXMLParser::getDocumentFromURI(const nsAString& aHref,
>+                                const nsAString& aReferrer,
>+                                Document* aLoader,
>+                                nsAString& aErrMsg)
>+{
>+#ifndef TX_EXE
...
>+#else

I assume the module code was copied right over? Other then the referrer part

>+PR_STATIC_CALLBACK(int)
>+startElement(void *aUserData, const XML_Char *aName, const XML_Char **aAtts)
>+{
>+    NS_ENSURE_TRUE(aUserData, XML_ERROR_NONE);
>+    return TX_XMLPARSER(aUserData)->StartElement(aName, aAtts);
>+}

can aUserData ever be null? Wouldn't an assertion be enough?

>+/**
>+ *  Parses the given input stream and returns a DOM Document.
>+ *  A NULL pointer will be returned if errors occurred
>+**/

please use java-doc style comments

>+    const int bufferSize = 1024;
>+    static char buf[bufferSize];

hmm.. having this static looks dangerous, that'll make it
reentrant/thread-unsafe. Just place it on the stack, that should be just as
cheap.


>+    if ((idx = XML_GetIdAttributeIndex(mExpatParser)) > -1) {
>+        nsDependentString idName((const PRUnichar*)*(aAtts + idx));
>+        nsDependentString idValue((const PRUnichar*)*(aAtts + idx + 1));
>+        // make sure IDs are unique
>+        if (!idValue.IsEmpty() && !idName.IsEmpty() &&

Do you really need to check that the name is non-empty? expat should have
checked that the name is a valid xml-name.

>+    if (aPublicId) {
>+        // not supported, this is "http://some.site.net/foo.dtd" stuff
>+        return 1;
>+    }

Isn't there some XML_ERROR_X that you can return rather then '1'?

>+    nsAutoString absUrl;
>+    URIUtils::resolveHref(nsDependentString((PRUnichar*)aSystemId),
>+                          nsDependentString((PRUnichar*)aBase), absUrl);
>+    istream* extInput = URIUtils::getInputStream(absUrl, mErrorString);
>+    if (!extInput) {
>+        return 1;
>+    }
>+    XML_Parser entParser = 
>+        XML_ExternalEntityParserCreate(mExpatParser, aContext, nsnull);
>+    if (!entParser) {
>+        delete extInput;
>+        return 1;
>+    }
>+    XML_Parser parent = mExpatParser;
>+    mExpatParser = entParser;
>+    XML_SetBase(entParser, absUrl.get());

please "back up" mExpatParser before creating the new parser and do
|mExpatParser = XML_External...| directly instead.

>+    const int bufSize = 1024;
>+    static char buffer[bufSize];

Again, static seems dangerous. What happens if an external entity contains a
link to another external entity? Wouldn't that recurse into this function and
mess up the buffer?

>@@ -408,7 +385,9 @@
>     aDest.Append(NS_LITERAL_STRING("txIdPattern{"));
> #endif
>     aDest.Append(NS_LITERAL_STRING("id('"));
>-    aDest.Append(mIds);
>+    for (PRUint32 k = 0; k < mIds.Count(); ++k) {
>+        aDest.Append(*mIds[k]);
>+    }

You need to insert a ' ' between all id's


You need to OOM-check when creating the DOM-nodes.

I wonder if it's really worth passing the referrer as a separate argument when
loading documents. You might as well get that from the loader-document. That
should work even if we start figuring out the "actual" loading node.

with that, r=sicking
Attachment #114705 - Flags: review?(bugmail) → review+
The way i think we should do stylesheet loading is to let the
StartElement/Comment/CharacterData functions be purly virtual in txXMLParser and
then implement them in a txDOMParser subclass to create a DOM, and let a
txStandaloneStylesheetParser subclass and call the stylesheet-parser.

But that's for a separate patch IMHO
(Assignee)

Comment 8

15 years ago
Created attachment 116309 [details] [diff] [review]
fix comments

this patch fixes all comments by peter and jonas, except:
I didn't change the aUserData check in txXMLParser, better be overly safe.
That ain't perf critical anyway.
And I didn't share the ID value in the ID hash with the mElement, because 
mElement isn't set on generation of the entries, so there are times when
entries have no key value. As that ain't a huge deal, better be safe than
worry.
Attachment #114705 - Attachment is obsolete: true
(Assignee)

Comment 9

15 years ago
Comment on attachment 116309 [details] [diff] [review]
fix comments

carrying reviews
Attachment #116309 - Flags: superreview+
Attachment #116309 - Flags: review+
(Assignee)

Comment 10

15 years ago
checked in
Status: ASSIGNED → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED
(Assignee)

Comment 11

15 years ago
mass verifying
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.