Closed Bug 96647 Opened 23 years ago Closed 23 years ago

Change the way output is constructed in Transformiix

Categories

(Core :: XSLT, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla0.9.9

People

(Reporter: peterv, Assigned: peterv)

References

Details

(Keywords: perf)

Attachments

(1 file, 7 obsolete files)

I have rewritten the way the output is constructed in Transformiix. As part of
the rewrite I'm fixing bug 94921, bug 59912, bug 53030 and bug 88198. I'll try
to attach the patch tomorrow, so the reviews can begin. I'm trying to get this
into 0.9.4.
Status: NEW → ASSIGNED
Depends on: 53030, 59912, 88198, 94921
OS: Mac System 8.5 → All
Priority: -- → P1
Target Milestone: --- → mozilla0.9.4
No longer depends on: 53030, 59912, 88198, 94921
Blocks: 53030, 59912, 88198, 94921
Good work Peter, thanks very much :) -mike
Blocks: 53944
Moving out :(.
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Attached patch This is not a black hole (obsolete) — Splinter Review
Don't try or review this patch, it's just work-in-progress, to show that there
is (some) progress.
Blocks: 99032
All these missed the bus/train/plane/boat/whatever. Sad.
Target Milestone: mozilla0.9.5 → mozilla0.9.6
based on the experience in bug 103659, do we need the output directory?
I don't see the txAtomList.* in base, either. Those are XSLT special, so they
should go into source/xslt, imho. Maybe this should be txXSLTAtoms, even?
See bug 76070, the naming should be consistent.

Axel
Attached patch Incomplete patch (obsolete) — Splinter Review
Patch still misses a few files, but this part should be reviewable. I'm going to
move this and dependent bugs to 0.9.7. No way this will be ready and reviewed
for 0.9.6. It'll probably go in soon after 0.9.7 opens.
ProcessorState.cpp:
+            PRInt32 nsID = node->getNamespaceID();
+            if (!nsID == kNameSpaceID_XSLT)
nsID is typedef'd in nsID.h.
nsID != kNameSpaceID_XSLT.

- * Returns the namespace URI for the given namespace prefix, this method should
+ * Returns the namespace ID for the given namespace prefix, this method should
you don't.

+            if (XMLDOMUtils::getNamespaceURI(aPrefix, (Element*)node,
aNamespaceURI))
newline?

+        if (element->getAttr(txXSLTAtoms::defaultSpace, kNameSpaceID_XSLT,
value)
default-space isn't part of the spec (at least I can't find "default-space"),
and if, it wouldn't be in the XSLT namespace.
(and it could use a newline)
My vote is pruning it.

XSLTProcessor.cpp:
+                    if (element->getAttr(txXSLTAtoms::omitXmlDeclaration,
wrong namespace.

    if (!xslType)
        return XSLType::LITERAL;
that should error, unknown element in XSLT namespace.

XSLTProcessor::processAction
+        const String& nodeName = aXsltAction->getNodeName();
you only need that for LITERALs

more to come (I guess), I'm off to the movies.
The patch for 96647 is nearly done, but it's a bit big and we need time for
reviews, unless something serious comes up this should land early in 0.9.7.
Target Milestone: mozilla0.9.6 → mozilla0.9.7
First of all, this is a VERY nice architectural change. Thanks for doing it!


Since you're fixing it up anyway, could you make XMLDOMUtils::getNodeValue use 
the same code for attributes, textnodes, comments and PIs. Simply append the 
nodeValue.


+PRInt32 Document::namespaceURIToID(const String& aNamespaceURI)
+{
+    PRInt32 nsID = kNameSpaceID_Unknown;
+    if (nsNSManager)
+        nsNSManager->GetNameSpaceID(aNamespaceURI.getConstNSString(), nsID);
+    return nsID;
+}

should you use nsNSManager->RegisterNameSpace?


+    if (!aPrefix || aPrefix == txXMLAtoms::_empty)
+        aPrefix = txXMLAtoms::xmlns;
+

why do you need to do this?


-nsSyncLoader::LoadDocument(nsIURI* documentURI, nsIDocument *aLoader, 
nsIDOMDocument **_retval)
+nsSyncLoader::LoadDocument(nsIURI* documentURI, nsIDocument *aLoader, 
nsIDOMDocument **aResult)

could we keep _retval? result isn't really an argument and the rest of moz uses 
_retval.


+    if ((mMethod == eMethodNotSet) &&
+        (aOutputFormat.mMethod != eMethodNotSet))
+        mMethod = aOutputFormat.mMethod;

You don't actually have to check (aOutputFormat.mMethod != eMethodNotSet)


+    // Merges in the values of aOutputFormat, members that have no
+    // value in aOutputFormat will not be changed.
+    void merge(txOutputFormat& aOutputFormat);

Could you mention that values that are set in *this* format will not be changed


+    while ((qName = (txQualifiedName*)iter.next())) {
+        mCDATASectionElements.add(qName);
+        iter.remove();
+    }

mmmmm, I like that loop :). But IMHO you could let the txList destructor take 
care of the deletion, that should also save a few cycles.


+ enum txOutputMethod { eMethodNotSet = 0, eXMLOutput, eHTMLOutput, 
eTextOutput };

Pike is gonna get you for the length of that one ;-)
More comments:

+enum txThreeState { eNotSet = -1, eFalse = MB_FALSE, eTrue = MB_TRUE };

are you sure that PR_TRUE isn't defined as -1 on any platform?


+                        String namePart;
+                        XMLUtils::getPrefix(token, namePart);
+                        txAtom* nameAtom = TX_GET_ATOM(namePart);
+                        PRInt32 nsID = element->lookupNamespaceID(nameAtom);

unprefixed means nullnamespace, not default namespace.


+   format.mOmitXMLDeclaration = attValue.isEqual(YES_VALUE) ? eTrue : eFalse;

IMHO we should get rid of YES_VALUE (and a lot of other static strings), would 
using "yes" be ok?


-                    //-- unknown
+                {
+                    // unknown
                     break;
+               }

please fix indentation of the last '}'


+    // Create wrapper for the source document.
+    nsCOMPtr<nsIDOMDocument> sourceDOMDocument;
+    aSourceDOM->GetOwnerDocument(getter_AddRefs(sourceDOMDocument));
+    if (!sourceDOMDocument)
+        sourceDOMDocument = do_QueryInterface(aSourceDOM);

That shouldn't work if aSourceDOM is the document node. So I don't understand 
how it can work now??


In XSLTProcessor::initializeHandlers, shouldn't you merge the output formats 
before using |aPs->getOutputFormat()->mMethod|?


In XSLTProcessor::TransformDocument, are you sure that ps will be deleted 
before sourceDocument, xslDocument and resultDocument? If it's not, bad things 
will happen.


more to come :)
+ txAtom* nameAtom = TX_GET_ATOM(name);
+ if (nameAtom == txXMLAtoms::xmlns) {

Isn't |name.IsEqual("xmlns")| faster?


+ if ((prefixAtom != txXMLAtoms::_empty) && (prefixAtom != txXMLAtoms::xmlns))
+     resultNsID = actionElement->lookupNamespaceID(prefixAtom);

This'll create attributes with prefix "xmlns" in the null-namespace.


Could you clean up the handling of text/cdata nodes in the stylesheet, we don't 
really need that |String textValue| IMHO


Do we really need ::cdata and ::entityReference in the output handlers?
in XSLTProcessor::processVariable
+        return (NodeSet*)rtf;

Why cast to NodeSet? Shouldn't txResultTreeFragment also be an ExprResult once 
we have it?


in XSLType::ATTRIBUTE:
+        XMLUtils::normalizeAttributeValue(value);

should this really be done for module?


I don't quite like that the output formats are merged in 
XSLTProcessor::initializeHandlers. It would be nice if we had some
::initializeStylesheet (i thought of a better name yesterday, but i can't 
remember it now) function that is called after all ::processTopLevel are done, 
but before we start the actual transform is started.
Not sure if the function should live in ProcessorState or XSLTProcessor, but 
i'm ok with either way. In that funciton we could do some stuff like merge 
output formats, evaluate global variables and some other foo.


in XSLType::COMMENT:
+     XMLUtils::normalizePIValue(value);

cut'n'paste error?


(note to self: file a bug on PI-target-name check)


in XSLType::VALUE_OF:
+        if (!value.isEmpty()) {
+            // XXX Need to handle disable-output-escaping
+            NS_ASSERTION(mResultHandler, "mResultHandler must not be NULL!");
+            mResultHandler->characters(value);
+        }

IMHO there's no need to check for |!value.isEmpty()|


mmmmmm.. i like the new code for LREs


+        StackIterator attributeSets((txList*)&mAttributeSetStack);

is this really safe?


In XSLTProcessor::processDefaultTemplate
+            String value = node->getNodeValue();
+            NS_ASSERTION(mResultHandler, "mResultHandler must not be NULL!");
+            mResultHandler->characters(value);

+            NS_ASSERTION(mResultHandler, "mResultHandler must not be NULL!");
+            mResultHandler->characters(node->getNodeValue());
should save you a string-copy


in XSLTProcessor::copyNode:
CDATA_SECTION_NODEs should IMHO not call mResultHandler->cdata but rather 
mResultHandler->characters, so you could use the same code as for TEXT_NODE; 
just |mResultHandler->characters(aNode->getNodeValue());|


XSLTProcessor::xslCopyOf should sort nodesets in document order before copying 
them
to build this on unix, both source/xslt/Makefile.in and build/Makefile.in need
an additional REQUIRES on unicharutil, and build/Makefile.in needs
  ../source/xslt/txRtfHandler.$(OBJ_SUFFIX) \
  ../source/xslt/txTextHandler.$(OBJ_SUFFIX) \
  ../source/xslt/txMozillaTextOutput.$(OBJ_SUFFIX) \
  ../source/xslt/txMozillaXMLOutput.$(OBJ_SUFFIX) \
oh, txQualifiedName should really be called txExpandedName
Attached patch Removed files (obsolete) — Splinter Review
Attachment #48419 - Attachment is obsolete: true
Attachment #56708 - Attachment is obsolete: true
Attached patch v2 (obsolete) — Splinter Review
Still no standalone serializers. They should be flying in any moment now :(.
Attr::Attr (standalone) shows that we don't do  XMLUtils::getLocalPart vs.
finding : ourselves consistently. Fix this one? (The only one you add)

standalone/Element.cpp
Element::setAttributeNS
don't 
+    mAttributes.setNamedItem(temp);
as that iterates over the attributes again.

ProcessorState::ProcessorState constructors could get some :mFoo(aFoo), IMHO

XSLTProcessor::processStylesheet
catch the namespace early? Maybe add a 
// XXX Error report warning
for a warning about deprecated namespace and wrong top level element with
XSLT namespace?

Add a comment for the {} around the ProcessorStates, it looks kinda bogus, and
the destructor order is worth mentioning.

:1130 has
                    const String& mode =
                        actionElement->getAttribute(MODE_ATTR);
could you getAttr that one, too?

txXMLEventHandler.h includes iostream.h twice. I doubt that you need it for
module.

hitting commit now, I regenerated all but one of these comments allready. *rant*
Oh, did anybody say? This makes us twice as fast. (tested on veiligheid/2)
I didn't measure the bloat, but if I saw this right, similar numbers there too.
Keywords: perf
String::isEqualIgnoreCase:
please remove the
+  if (this == &data)
+    return MB_TRUE;
it'll probably never happen anyway

+  const UNICODE_CHAR* otherBuffer = data.toUnicode();
looks better as
+  const UNICODE_CHAR* otherBuffer = data.strBuffer;
IMHO


Node::lookupNamespaceID:
+    if (!aPrefix || aPrefix == txXMLAtoms::_empty)
+        aPrefix = txXMLAtoms::xmlns;
+

I don't quite like that |!aPrefix|, IMHO it should be an error and not need any 
specialcasing


standalone Attr::Attr:
+  if (aNamespaceURI.isEmpty()) {
+    Attr(aName, aOwner);
+  }

That will not behave according to the DOM-spec, since a prefix in the name will 
be looked up. IMHO it would be nicer to always do what's in the |else| and just 
specialcase the namespace setting for empty aNamespaceURI


standalone Element::Element:
+  if (!aNamespaceURI.isEmpty())
+      mNamespaceID = txNamespaceManager::getNamespaceID(aNamespaceURI);

IMHO if aNamespaceURI is empty then mNamespaceID should be set to 
kNameSpaceID_None


ProcessorState::addDecimalFormat:
I would've rather that you left cleanups like this for a separate patch, but 
since you do it...
-    attValue = element->getAttribute(DECIMAL_SEPARATOR_ATTR);
+    element->getAttr(txXSLTAtoms::decimalSeparator, kNameSpaceID_None,
+                     attValue);
     if (attValue.length() == 1)
         format->mDecimalSeparator = attValue.charAt(0);
     else if (!attValue.isEmpty())
         success = MB_FALSE;

should be

-    attValue = element->getAttribute(DECIMAL_SEPARATOR_ATTR);
-    if (attValue.length() == 1)
-        format->mDecimalSeparator = attValue.charAt(0);
-    else if (!attValue.isEmpty())
-        success = MB_FALSE;
+    if (element->getAttr(txXSLTAtoms::decimalSeparator, kNameSpaceID_None,
+                         attValue) {
+      if (attValue.length() == 1)
+          format->mDecimalSeparator = attValue.charAt(0);
+      else
+        success = MB_FALSE;
+    }

and

-    attValue = element->getAttribute(INFINITY_ATTR);
+    element->getAttr(txXSLTAtoms::infinity, kNameSpaceID_None,
+                     attValue);
     if (!attValue.isEmpty())
         format->mInfinity=attValue;

should be

-    attValue = element->getAttribute(INFINITY_ATTR);
-    if (!attValue.isEmpty())
+    if (element->getAttr(txXSLTAtoms::infinity, kNameSpaceID_None,
+                         attValue))
         format->mInfinity=attValue;

/me runs for cover


XSLTProcessor::process:
+        if (!importFrame.hasNext())
+            // XXX ErrorReport: out of memory
+            return;
+        importFrame.next();

would be nicer as

+        if (!importFrame.next())
+            // XXX ErrorReport: out of memory
+            return;

IMHO


XSLTProcessor::processAction case XSLType::APPLY_TEMPLATES:
+                if (!exprResult)
+                    // XXX ErrorReport: out of memory
+                    break;

This can be other errors then "out of memory". Just do the |break| IMHO. The 
errorhandling should be already taken care of once we get this far. Same for 
FOR_EACH and IF


XSLTProcessor::processAction case XSLType::ATTRIBUTE:
+                txAtom* nameAtom = TX_GET_ATOM(name);
+                if (nameAtom == txXMLAtoms::xmlns) {

I still think that |name.isEqual("xmlns")| is faster?

+      (prefixAtom != txXMLAtoms::xmlns))

I really don't see that doing <xsl:attribute name="xmlns:foo"/> should create 
an attribute in the null namespace. IMHO the specialcasing for xmlns should be 
removed and ::lookupNamespaceID should return kNameSpaceID_Unknown for "xmlns". 
You could leave the latter for another bug if you want.


XSLTProcessor::processAction case XSLType::COMMENT:
The '-' loop will stop too soon if there is a '-' with an non-'-' following. 
You'll have to always loop all the way to the end, and then check if the last 
char is '-'.


XSLTProcessor::copyNode case Node::ATTRIBUTE_NODE:
+            Attr* attr = (Attr*)aNode;

Why not use aNode directly?

That's about it except for the output handlers, but you gotta explain those to 
me
I vote against changing the prefixAtom. I see a good benefit in having both
the empty atom and a null atom referring to an empty prefix.
I want to keep the relation between xmlns prefix and the xmlns namespace, too.

> I really don't see that doing <xsl:attribute name="xmlns:foo"/> should create
> an attribute in the null namespace.
reread the spec: if namespace is given, that is used. Otherwise the prefix is
resolved, unless it's xmlns, that is dropped.
Which brings us to the point that the result attribute must not have the prefix
xmlns. Peter, you should cut the xmlns: from the name.
no, xmlns can be dropped for *output*, not when deciding what namespace the attr has.

So for module this isn't a problem at all, but standalone *should* remove "xmlns:" prefixes (probably in txStandaloneOutputHandler::attribute since other ways of creating attributes could also result in attributes with "xmlns:" prefixes). But IMHO that has "lets do that later" written all over it :-) (unless peterv wanna do it now of course)

The reason that IMHO Node::lookupNamespaceID("xmlns") is that neither xslt  (http://www.w3.org/TR/xslt#section-Expressions) or xml-infoset (http://www.w3.org/TR/xml-infoset/#infoitem.element) says that the xmlns prefix maps to the xmlns namespace.
*** Bug 112172 has been marked as a duplicate of this bug. ***
Blocks: 113962
Blocks: 114414
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Attached patch v3 (obsolete) — Splinter Review
New year, new patch.
Attachment #58527 - Attachment is obsolete: true
What is the change to the nsXMLContentSink?

in the new attr constructor:
+  if (aNamespaceURI.isEmpty()) {
+    Attr(aName, aOwner);
+  }
I don't think this will init the the namespace correctly if aName is prefixed. 
How about just:
{
  if (aNamespaceURI.isEmpty())
    mNamespaceID = kNameSpaceID_None;
  else
    mNamespaceID = txNamespaceManager::getNamespaceID(aNamespaceURI);

  specified = MB_TRUE;
  String localPart;
  XMLUtils::getLocalPart(nodeName, localPart);
  mLocalName = TX_GET_ATOM(localPart);
  mNamespaceID = txNamespaceManager::getNamespaceID(aNamespaceURI);
}
oh, or how you do it for elements?

Woohooo! Support for disable-output-escape! (Calm down people, it's only for 
standalone, not for mozilla).

In ElementAvailableFnCall and SystemPropertyFunctionCall you resolve prefix-
>namespace using wrong node, you use a node from the source document, but 
should use the element from the xsl-stylesheet. I'd say only 
change ::getNameSpace -> ::getPrefix in this patch and we'll do the right thing 
later.

That's all except for the output handlers and the new member-variables to the 
XSLTProcessor, but we'll take that on irc
hrm.. remove the last line from my code-snippet...
don't assert |NS_ASSERTION(element, "No element to add the attribute to.");|
in txMozillaXMLOutput::attribute. That's not our fault but the fault of the 
stylesheet author.

In txMozillaXMLOutput::characters, shouldn't you always mText.Append? So that 
the <title> get a real childnode? (or will it anyway)

still more to come...
The list of empty elements isn't the same as the one in the XSLT spec, 
intentional?

txXMLOutput::characters outputs too many CDATA start and end "tags". But that 
could wait for later IMHO.
ok, that's it except for the stuff i said on irc (just nsAutoString and <title> 
iirc)
Blocks: 119854
I miss the Element::setAttributeNS hunk from
http://bugzilla.mozilla.org/show_bug.cgi?id=96647#c19

For standalone, you lack to set mOut in XSLTProcessor.cpp:2205, which causes
a crasher on stylesheets defaulting to HTML output. You need to copy the mOut
from the old mOutputHandler to the new one.
Get rid of
     ostream* mOutputStream;
in txXMLOutput.h, AFAICT, it ain't used.

The rest is fine with me.
source/xslt/Makefile.in needs
@@ -38,15 +38,27 @@
                  content_xsl \
                  widget \
                  necko \
+                 unicharutil \
                  $(NULL)
 endif
Attached patch v4 (obsolete) — Splinter Review
Hopefully the last patch. I'll move this bug to the next milestone. Reviews and
superreview are done and will need to be marked after final control of the new
patch. I'll file follow-up bugs for a couple of things that are wrong/need
improvement.
Attachment #58526 - Attachment is obsolete: true
Attachment #63680 - Attachment is obsolete: true
Comment on attachment 65618 [details] [diff] [review]
v4

Module is fine, but a few changes in
standalone need some tweaks.

xml/dom/standalone/dom.h:
make Element a friend of Attr:
class Attr : public NodeDefinition
{
    // AttrMap and Element need to be friends to be able to update the
    // ownerElement
    friend class AttrMap;
    friend class Element;
  public:

xslt/XSLTProcessor.cpp:
::startElement
(setting is more :-))
#ifdef TX_EXE
	    ostream* out;
	    mOutputHandler->getOutputStream(&out);
	    delete mOutputHandler;
	    mOutputHandler = new txHTMLOutput();
	    NS_ASSERTION(mOutputHandler, "Setting mResultHandler to NULL!");
	    mOutputHandler->setOutputStream(out);
	    mResultHandler = mOutputHandler;
#endif

xslt/txHTMLOutput.cpp:
remove
    mHTMLEmptyTags.put(txHTMLAtoms::col, txHTMLAtoms::col);
or add col to the txHTMLAtoms.
the standalone output handlers all need
getOutputStream, I factored those into
txStreamXMLEventHandler.
I'll attach a patch for those, that's faster.

with those, r=axel@pike.org
Attachment #65618 - Flags: review+
Here is the promised patch. Sounds like this is the right solution to the issue

with setOutputStream. getOutputStream was duplicated code, too.
You get a warning (which i think should be a builderror) on
     StackIterator attributeSets((txList*)&mAttributeSetStack);
change it to
     StackIterator *attributeSets = mAttributeSetStack.iterator();
     if (!attributeSets)
         return;
(don't forget to delete)


typo in source/xslt/makefile.win, change |unicharutils| to |unicharutil|


You still have a few |if (mOutputFormat.mIndent)| in txXMLOutput.cpp


with that and Pikes stuff, r=sicking
oh, and you add quite a few tabs to 

Document* XSLTProcessor::process(Document& xmlDocument,
                                 Document& xslDocument)

:(
Actually, IMHO it would be ok to remove that function all together. I'm 
presonally not even convinced that we want it, and if/when we do it'll be easy 
to add. Your call...
Attached patch v5Splinter Review
Attachment #65618 - Attachment is obsolete: true
Attachment #65641 - Attachment is obsolete: true
Patch for bug 96647 should be final, so these will be fixed early in 0.9.9
(really!).
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Comment on attachment 66002 [details] [diff] [review]
v5

Marking jst's superreview.
Attachment #66002 - Flags: superreview+
Here's a few nits I found while looking through this patch:

- What's the deal with String::isEqualIgnoreCase()? It deals with unicode, it's
only case-insensitive for ASCII characters.

- In source/xml/dom/standalone/Document.cpp, inconsistent indentation.

- In txMozillaTextOutput::setOutputDocument(), more error checking needed.

- In txMozillaXMLOutput.cpp:

+nsresult txMozillaXMLOutput::getRootContent(nsIContent** aReturn)
+{
+    NS_ENSURE_ARG_POINTER(aReturn);

This is an internal method, right? If so, no need for the NS_ENSURE_ARG_POINTER,
just don't pass null to this method.

- In txOutputFormat.cpp:

+void txOutputFormat::reset()
+{
+    mMethod = eMethodNotSet;
+    mVersion.clear();
+    if (mEncoding.isEmpty())
+    mOmitXMLDeclaration = eNotSet;

no indentation after if?
Fix checked in. I'll be filing the follow-up bugs soon.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
VERIFIED.

Orange juice to peterv :-)
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: