Closed Bug 151002 (txwalker) Opened 22 years ago Closed 21 years ago

abstract the content by a treewalker

Categories

(Core :: XSLT, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.6beta

People

(Reporter: axel, Assigned: peterv)

References

(Blocks 1 open bug, )

Details

(Whiteboard: TX_WALKER_BRANCH)

Attachments

(8 files, 14 obsolete files)

2.78 KB, text/html
Details
1.51 KB, text/xml
Details
10.75 KB, text/plain
Details
2.50 KB, application/octet-stream
Details
7.88 KB, patch
Details | Diff | Splinter Review
238.70 KB, patch
sicking
: review+
Details | Diff | Splinter Review
239.80 KB, patch
Details | Diff | Splinter Review
656 bytes, patch
axel
: review+
peterv
: superreview+
Details | Diff | Splinter Review
We are on and off experiencing bad impacts from having to face two 
DOM implementations.
Moving towards a unique interface tailored exactly to what we need might
give us some performance benefits.
Having a treewalker also lets us implement the differences between the
DOM and the XPath datamodel, including XSLT whitespace stripping.
Depends on: 149021
IMHO the walker shouldn't work with Nodes at all. That would put us back to 
square one wrt needing wrappers and needing to hash-lookup them all the time. 
Instead we should operate on a single object, the walker, and use that to 
retrieve all the needed data.

However it might be a good idea to add a |Node* getNode()| function in the 
first stage so that we don't have to convert over all code at once. So we could 
use the walker in some code, and as soon as we interact with some code that 
isn't converted yet we just get the node and use that as argument. However the 
|getNode| function along with the wrappers should be removed as soon as all 
code is converted.
Sicking: this doesn't address how the engine would work though. How would we
handle nodesets? How would we handle extension functions (that can take
nodes/nodesets)? Don't get me wrong, I like the way you want to go but I'm not
sure if it's possible.
Also, any idea of a timeframe? Should I go ahead and finish my wrappers rewrite
(10% speedup), or just drop it and wait for this?
Yes, there are several specifics that needs to be sorted out, most related to 
nodesets. All we might need to do for nodesets is to change

nserror add(Node* aNode)
to
nserror add(txXPathTreeWalker& aWalker)

and

Node* get(int aIndex)
to
void get(int aIndex, txTreeWalker& aWalker)
or maybe
void txTreeWalker& get(aIndex)

(though it might be nice to add some iterating capabilities while we're at it, 
but that's orthogonal)

There is a question of how we would implement the nodesets though. They could 
either hold on to an array of txXPathTreeWalkers, which would mean that we'd 
have the same implementation regardless of how the txXPathTreeWalker is 
implemented. Or we could have separate implementations for module/standalone 
and let the nodeset hold on to the actual objects that


wrt extension-functions. I don't really see the problem with that. We would 
have to convert our current nodesets into something XPCOM-ish that we could 
hand back and forth to whatever the extension-functions will be, no?
the timeframe is pretty big i'd say (which is why I havn't been stressing this 
new way of working very much). So I think that both the new wrappers and the 
childIterator would be good things to have in the meantime.
As I proposed in my whitepaper, bug 149021 is the first step to this.
I don't think we should land this all at once.

Stuff we should find out before phase one are IMHO:

- (hopefully final) name for the class ;-)

- how do we iterate.
  Jonas' stuff is rather close to DOM function calls, mine was a bit apart from
  DOM, but IMHO closer to names common to iterators. (Not the DOM TreeWalker, of 
  course)
  Do we want a separate pattern for iterating attrs and for iterating children?
  (IMHO, Jonas' stuff misses reverse order for attrs and namespaces)

My whitepaper as of now completely lacks the node info stuff, but in the current
stage, that can wait.
Both attempts lack any bootstrapping of the treewalker, too.
forgot, Jonas doesn't have a filter functionality yet.
We kinda need this in the API, as for example DOM XPath doesn't have whitespace
stripping, and (I'm drunk and sick, so don't take my word for it) whitespace stuff
depends on whether it's the source or the style doc.
bootstraping the treewalker needs to be implementation specific (proably some 
#ifdef TX_EXE ctors) since the bootstrap needs to take a "native" node. 
Bootstraping is done in very few places; in xmlParser::getDocumentFromURI, in 
xmlParser::parse and in XSLTProcessor::TransformDocument.

Where do we need reversed iterating of attributes/namespaces? If we do then we 
could just add previousAttribute/previousNamespace, but I don't think we 
need'em.

The reason that I didn't do iterator-style functions is that we're not 
guaranteed to finish each iteration before starting the next. For example, 
would the following work:


iter.iterateChildren();
Node* tmp = 0;
while ((tmp = iter.getNode())) {
  doFoo(iter);
  iter.forward();
}

doFoo(iter) {
  iter.iterateChildren();
  Node* tmp = 0;
  while ((tmp = iter.getNode())) {
    // do stuff
    iter.forward();
  }
  iter.goParent();
}


and what would happen if you did

iter.iterateChildren();
iter.forward();
iter.goParent();
iter.forward();


Filtering needs to be specified at bootstraping. Basically the only parameter 
to whitespace-filtering is "the set of element-names for which whitespace 
should be preserved". After that whitespace-stripping should always be done 
according to the same rules.

It might be convinient to be able to only step through certain node-types nodes 
when walking the XSLT document, so a |nextElementSibling| and/or 
|nextTextnodeSibling| functions might be a good idea. Lets see what bug 149021 
gives us.
ok, I didn't grok the code snippets to full detail, but I don't see a problem
with the method names and not fully iterating thru the siblings before changing
the hierarchy.

We should be more clear about the term "filter".
WhiteStripFilter is whitespace stripping according to
http://www.w3.org/TR/xslt#strip
ElementFilter is just returning elements. I wonder if we'd need any other 
non-stripping filter.

stripping:
As I read it, there's no way that a node may or may not be stripped, depending
on the context in the transformation. That is specified by xsl:strip-space,
xsl:preserve-space and @xml:space.
Therefor stripping is really just a matter of bootstrapping (and the copy 
constructor).
We have two classes of stripping, one for the stylesheet plus friends 
(xsl:include and xsl:import), preserving only xsl:text, and one for source docs
(document(), too, AFAICT), preserving and stripping as given in the stylesheet.
We should probably share that data, via a ProcessorState. Sharing globally would
hork separate instances.

Filter could be moved to a iterateElems supplimentary method.
Looking at what we have here, it may be useful to have separate APIs for
- children
- attributes
- child elements
as a separate API removes us from having state management in the walker or
state arguments in the call. Plus, having one entry point is probably bad for
inlining the code.
Hmm.. their walker is a bit more then a walker. It is an entire XPath engine
since the walker has to implement:

XPathExpression Compile(String)
Object Evaluate(XPathExpression)
and
bool Matches(XPathExpression)
*** Bug 149021 has been marked as a duplicate of this bug. ***
I did a profile of module, and the topmost culprit was the nsCOMPtr/QI in
Node::getNodeType.
We should (maybe in a new bug) add a 
MBool ::IsOfType(txContentType aType)
to the walker, going into nsIContent::IsContentOfType for module elements and
text nodes.
txContentType would be an enum of eElement, eText and eAttribute.
(Peter's tree uses nsIContent as core interface, so we should be able to use
what that offers.)
Blocks: 197956
peterv handed out a patch to Jonas and me, I created a branch with his stuff.
Follow it here:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=TX_WALKER_BRANCH&branchtype=match&dir=mozilla%2Fextensions%2Ftransformiix&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=week&mindate=&maxdate=&cvsroot=%2Fcvsroot
The tag is TX_WALKER_BRANCH, the current base is TX_WALKER_20030711_BASE, which
is the state of MOZILLA_1_5a_BASE
Whiteboard: TX_WALKER_BRANCH
added missing files in xpath, fixed crashers in txNodeSet.
I have a compiling tree for standalone now, but I'm not happy with it yet.

As there are no comments in txXPathTreeWalker.h, what are
    PRBool moveToFirstPrecedingInDocOrder();
    PRBool moveToNextPrecedingInDocOrder();
supposed to do?
... so I got my tree down to broken numbering and an issue with select59, 
something with keys and unions.
Not sure why I'm loosing one xmlns right now, will look later.
Still need to give some serious review to that code before I check it in.
I have some non-standalone issues in my tree, too, btw.
... landed my standalone tree.
Fixed issues with txNodeSet::mDirection that crash.
Fixed a hickup in txStylesheet that prevented whitespace stripping from working.

On the topic of txXPathNodeFilter and txXPathNodeUtils::isNodeOfType, IMHO, the
values in txXPathNodeFilter should be those of nsIContent.
Two issues I'd like to see fixed on the nsIContent side of things, CDATA sections
don't claim to be eTEXT and there is no test for processing instructions. Gonna
bug jst.

All I did so far is building standalone/regression hunting, no kind of review so
far.
Attached patch changes to content (obsolete) — Splinter Review
These are the changes to content that we need.
nsITextContent should be able to append, and nsIContent::IsContentOfType fixups

for pis and CDATA sections.
Comment on attachment 128313 [details] [diff] [review]
changes to content

ready for review. In case you wonder about NS_ConvertASCIItoUCS2, there is no
AppendASCIItoUCS2 in nsReadableUtils.h :-(
Attachment #128313 - Flags: superreview?(jst)
Attachment #128313 - Flags: review?(caillon)
Jonas, we need some profiling here.
My build takes about twice the time as 1.4. Not sure where that's coming from,
I hope it's something simple, like, some hickup in the new nodeset.
The changes needed to build module are on the branch, I didn't touch the code
about the warnings in txXPathNode yet.
did some profiling, gotta be my windows build. no idea what happens.

firebird trunk:
chess-fo/chess.xsl: 70.016 secs
jenitennison/page.xsl: 63.907 secs
jenitennison/markme.xsl: 19.446 secs
schematron/schematron-basic.xsl: 58.431 secs

firebird branch:
chess-fo/chess.xsl: 68.13 secs
jenitennison/page.xsl: 59.184 secs
jenitennison/markme.xsl: 18.462 secs
schematron/schematron-basic.xsl: 37.954 secs

Most impact on the schematron, most significant difference was in 
PathExpr::evaluate. Though that's not the hottest girl in town in those profiles
Comment on attachment 128313 [details] [diff] [review]
changes to content

>Index: content/base/public/nsIContent.h
>===================================================================

>+    /** xml processing instructions */
>+    ePROCESSINGINSTRUCTION = 0x00000020

ePROCESSING_INSTRUCTION?

>Index: content/base/src/nsGenericDOMDataNode.cpp
>===================================================================

>+NS_IMETHODIMP
>+nsGenericDOMDataNode::AppendTextTo(nsAString& aResult)

>+    aResult.Append(NS_ConvertASCIItoUCS2(mText.Get1b(),
>+                                         mText.GetLength()).get(),
>+                   mText.GetLength());

Add a XXX comment that this needs AppendASCIItoUTF16.

>Index: content/html/content/src/nsAttributeContent.cpp
>===================================================================

>+NS_IMETHODIMP
>+nsAttributeContent::AppendTextTo(nsAString& aResult)

>+    aResult.Append(NS_ConvertASCIItoUCS2(mText.Get1b(),
>+                                         mText.GetLength()).get(),
>+                   mText.GetLength());

Ditto.
Please make sure that this doesn't confuse any code that uses eTEXT (it now
gets CDATA sections too).
Attachment #128313 - Flags: superreview?(jst) → superreview+
Comment on attachment 128313 [details] [diff] [review]
changes to content

checked in
Attachment #128313 - Attachment is obsolete: true
Attached file testcase for crasher
This testcase gives a crash. It's a standalone version of namespace44 of the
xalan test cases. If you try to reproduce the crash in buster, be patient.
It depends on the js GC kicking in, so it's hardly reproducable unless you
modify
your tree to reliable GC whenever possible.
The crash comes from an additional Release on the source document, AFAICT.
Suspects are either the union expr (which would indicate a problem with merging

nodesets) or the xsl:number special axis. I currently suspect the latter, but
I am not sure. I'll attach a refcount log for nsDocument in a second, the 
offending document is 0x08233F40, if I'm not completely mistaken.
bah, too large.
I uploaded the file to
http://www.axel.pike.org/mozilla/Bug151002-nsDocument-refcnt-log.gz
Here is the balance tree (well the relevant part of it). Much easier to handle
than the file I posted.
I blame
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/extensions/transformiix/source/xpath/txMozillaXPathTreeWalker.cpp&rev=1.1.2.3#741

not addrefing. Gonna check on that later, if you guys don't beat me at it.
That definitely needs an addref or

mPosition = nsnull;
document.swap(mPosition.mDocument);
This doesn't fix the testcase, basically because we hork the document order in
the content sink, bug filed on that.
Without this change, content may be null and IndexOf does the wrong thing.
We can handle this differently, though, not sure.
This should fix as well as speed up txXPathNodeUtils::comparePosition
Attached patch basic walker (obsolete) — Splinter Review
This patch is a first take on a basic walker. It can be a bit further optimized
but i wanted to wait with that so that we can compare the speed of this with
the speed of the "axis walker"
Attached patch basic walker alternative (obsolete) — Splinter Review
This is an alternative way of using the basic walker. The only difference is in
how the walker is used, the walker itself is exactly the same as in attachment
130431 [details] [diff] [review].

The difference is that in recursions through the tree (in copy-of, key(),
LocationStep::fromDescendants etc) this one does the same thing as the
axis-walker and creates a new walker for each level. The advantage of this is
that we don't end up using/creating mDescendants. The dissadvantage is that we
end up doing more refcounting.

I'm still not sure which way is faster, though i expect that this second one
is. (If/when we move to not refcounting as we move around in the tree this will
definitly be faster)
Comment on attachment 129806 [details] [diff] [review]
Fix txXPathNodeUtils::comparePosition

Checked in.

I also removed the |aNode.isDocument() && aOtherNode.isDocument()| test since I
realized that we'll deal with that case in the |document != otherDocument| test
further down anyway, and it's such an uncommon case that we'll save cycles by
removing it.

I also realized that we can remove the |aNode.isDocument()| tests too and just
rely on the last return to take care of that. I'll attach a patch that does
that.
Attachment #129806 - Attachment is obsolete: true
This one does make calls to comparePosition slower for the case when one of the
nodes is the root node since we'll then get the entire parent-chain of the
other node before bailing. However it should be an uncommon case since you'd
basically have to do expressions like "foo | /" or "//." which i've never seen
in a stylesheet.

Let me know what you think.
I don't think this really buys us much. However, you can always use |if
(!content)| and |if (!otherContent)| instead of the isDocument checks in your patch.
Comment on attachment 130940 [details] [diff] [review]
further tweaks to txXPathNodeUtils::comparePosition

naah, calling isDocument should be just as fast and think the current code is
clearer. Let's just drop this one
Attachment #130940 - Attachment is obsolete: true
txXPathTreeWalker::moveTo is broken. It should reset the local state, or at
least copy it from the other walker.
I doubt that txCopyBase::copyNode in txInstructions.cpp:334 really wants to create
a temporary walker and move the other walker there. Not that I'm convinced that
this will compile on all platforms in the first place ;-).
(That same code needs to put xmlns attrs in the result, too.)

txMozillaXPathTreeWalker could share the code to walk to the first next sibling
of a parent, used for the following axis methods.
> (That same code needs to put xmlns attrs in the result, too.)

Er, no.
txCopyBase::copyNode does need to add the local namespace definitions to the 
output. Not doing so would be a regression. Well, it is right now. output105.
Guess you don't see that regression on module as I hacked around ignoring the
xmlns stuff in DiffDom.
xmlns attributes nodes are not attribute nodes in the XPath datamodel. Unless
I'm misunderstanding the spec, I don't see why this would be a regression rather
than a fix. xmlns attributes need to be added in the standalone serializers.
Well, copy-of copies all attributes and namespace nodes to the result. Which calls
for having xmlns attrs in the result. That kinda worked on standalone with the old
code, and doesn't on the branch. I still think that module should have xmlns 
attrs, too.
The interesting question is, did the output105 test succeed on standalone just
by coincidence? Maybe. I may let this one slip and reinforce bug 76842.
> Well, copy-of copies all attributes and namespace nodes to the result. Which
> calls for having xmlns attrs in the result.

When you serialize. We can provide a hint to the serializer as to which prefix
to use. xmlns attributes are not attributes in XPath, we shouldn't have them in
the output in module.

> That kinda worked on standalone with the old code, and doesn't on the branch.
> I still think that module should have xmlns attrs, too.

It used to work as a side-effect of a bug. I don't see why we'd want xmlns
attributes in the output in module. This calls for namespace fixup, even if
you'd convince me that we want xmlns attributes in module. There's no guarantee
that the source tree has the necessary xmlns attributes anyway.
This basically shows that there is no difference between the different trees.
On the first run the axiswalker seemed slightly faster (.3%) but on a second
run it was equally much slower. So in the end I think we should go with
whatever is arcitecually nicer, which to me of course is the basic walker.

Weather we use a single walker to walk an entire subtree or if we create a new
at each level doesn't matter much to me. So either attachment 130431 [details] [diff] [review] or
attachment 130435 [details] [diff] [review] is fine. The version creating a new walker at each level
(130435) might gain somewhat more from the upcomming nsIContent
decomtamination, but looking at the numbers produced I doubt it matters much.
Attached patch basic walker sync-to-tip (obsolete) — Splinter Review
This is the same version as attachment 130431 [details] [diff] [review], i.e. we use a single walker to
walk subtrees. I've only synced to the tip as well as fix the standalone
version of txXPathNodeUtils::getNodeValue so now standalone runs.
This is my rationale why having an axes-oriented walker is cool. This patch 
get's rid of the switch statement (which was oddly ordered, in the first place)

in LocationStep::evaluate in favour of two maps of member function pointers.
(If the concept is as new to you as to me, and your C++ books suck as mine, I 
got my info from
http://new-brunswick.net/workshop/c++/faq/pointers-to-members.html).

The descendant-or-self is a tad out-of-line, but I think it's worth to have
moveToNextDescendant only deal with content nodes.

Oh, and yes, +58/-132, just for Jonas.
So why is this better? If +58/-132 is the argument then shouldn't you take into
account that the axiswalker is 690 lines longer?
And i'm not fully against having axiswalking functions. However if we do then
IMHO we should

a) Implement them on top of a basic walker to avoid duplicating the
   axisfunctionality.

b) Implement them when we need axes in more then one place. I don't care too much
   about this though, we could implement it right away if you want.
Attached patch basic walker latest (obsolete) — Splinter Review
This is the latest walker. The only difference is in the walker itself that is
more optimized, such as remove unneccesry calls to GetChildCount and make the
indexarray be an inline member rather then a nsAutoPtr. I also optimized
getNodeValue to use nsITextContent as much as possible.

This uses the 'alternative' way of iterating subtrees, as described in comment
32.

Attaching on request from peterv
Attachment #130431 - Attachment is obsolete: true
Attachment #130435 - Attachment is obsolete: true
Attachment #132177 - Attachment is obsolete: true
The branch should me more or less ready to land right? Modulo reviews of course.

Or does anyone have any major changes that needs to be adressed before reviewing
is meaningfull?
Step 1 seems to be get leaf to copy NodeSet.* to txNodeSet.*. I'm currently
comparing trees to make sure we're up to date with the trunk changes.
Tentatively putting this in 1.6b.
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.6beta
Maybe we should take the opportunity to move xml/parser into xml/. We also need
to remove at least one level from xml/dom/standalone (do we want to remove two?)
and prefix the files coming from xml/dom/standalone with tx.
IMHO those things could wait until later, at least moving/renaming the
standalone dom. The reason is that that should probably be rewritten from
scratch anyway (atomized and fewer classes).

Plus I generally dislike cleanup and functional changes in the same patch.
The point is that we need to have files copied and giving leaf a list once is
better that three lists at different times. I didn't say we needed to switch to
using the new files in this bug, we just need to make sure we do it in this
milestone which I would ensure.
Hmm, actually, we might not be able to copy NodeSet since it's already on the
branch :-/. Oh well.
I'll attach a patch to the trunk that ports the cosmetic changes on the branch
to the trunk to cut down the real diff.
Attached patch Cleanup (obsolete) — Splinter Review
Here's a patch to the trunk with cleanup to ease merging the branch. It doesn't
include the actual move of NodeSet.*. My plan is to copy the current trunk
versions to txNodeSet.*, apply the patch to the new copy and then check in this
patch.
Attachment #134736 - Flags: review?(bugmail)
Note to self: output26 and output86 has regressed but might be fixed by bug 224231
Cleanup checked in. NodeSet is now txNodeSet.
Started reviewing, this is what i've got so far:

General comments:
In module, do we want to be able to deal with nodes that are not in the main
tree? i.e. nodes with mDocument==nsnull? IMHO we should since people might
perform DOM-XPath and XSLT on such nodes. If so we need to make the walker able
to deal with GetDocument returning null, and the rest of the code needs to be
able to deal with txXPathNodeUtils::getOwnerDocument returning null.

You need to make all callers of txXPathNativeNode::createXPathNode be able to
deal with null-returns (OOM)

The module walker needs to be OOM-safe. (havn't checked yet if the standalone
one is)

I'm slightly nervous about txForwardContext holding on to a reference rather
then containing a txXPathNode. Have you checked that all callers use an argument
that outlives the context? For example if they use a node from a nodeset the
nodeset isn't modified before the context dies.

Why does the definition of txXPathNodeUtils live in txXPathTreeWalker.h rather
then txXPathNode.h?



In txParseDocumentFromURI:
I'd prefer if |nsIDOMDocument* theDocument| was an nsCOMPtr, an extra
addref/release should be insignificant compared to loading and parsing a doc.


In LocationStep:
Please prefix the arguments to fromDescendants and fromDescendantsRev with 'a'
since those functions are basically rewritten anyway.


In PathExpr::evaluate:
|tmpNodes->addAndTransfer(resNodes)| might not be safe for the first step since
for example keys can return a nodeset owned by somebody else. You could use
|recycler()->getNonSharedNodeSet|


In RelationalExpr::compareResults:
Why change the |switch| to |if|-statements? That is likly to be slower on some
compilers.


In UnionExpr::evaluate:
Please fix the tabs.


In txNodeSet.cpp:
The change of licenceplate in txNodeSet looks wrong. Same in txNodeSet.h. There
are some random tabs in this file.

In txNodeSet::txNodeSet(const txXPathNode& aNode...:
+    if (!ensureGrowSize(1)) {
+        NS_ERROR("out of memory");
Do we really want to assert on OOM?

In a few places in txNodeSet:
+    txXPathNode* temp = new(mStart) txXPathNode(aNode);
+    if (!temp) {
Can this really fail? I don't see how that could happen since the memory is
already allocated.

+txNodeSet::~txNodeSet()
+{
+    NS_ASSERTION(!mMarks, "should've swept before this is going away");
+    delete [] mMarks;
I think mMarks can be non-null if we bail before calling sweep on the nodeset
and then never end up recycling that nodeset so the assertion should probably go.

Why the change in the way nodesets are merged (txNodeSet::add). AFAICT the old
algorithm was faster.

In txNodeSet::append(const txXPathNode& aNode):
+    --mStart;
+    txXPathNode* temp = new(mStart) txXPathNode(aNode);
+    NS_ENSURE_TRUE(temp, NS_ERROR_FAILURE);
Need to increase mStart in case of failure (if we care to look for failure).

In txNodeSet::clear:
The |if (mStart)|-test shouldn't be needed when TX_DONT_RECYCLE_BUFFER isn't
defined.

In txNodeSet::indexOf:
Can't this function be called during predicates when the nodeset is inverted,
i.e. doesn't it have to deal with reversed nodesets? You could get rid of
|counter| and just return |pos - mStart|. The |mStart == mEnd| test isn't really
needed.

In txNodeSet::get:
+    ++aIndex;
+    return mEnd[-aIndex];
|return mEnd[-aIndex - 1]| ?

In txNodeSet::ensureGrowSize:
+    if (mDirection == kReversed &&
+        mStart && mStartBuffer && aSize <= mStart - mStartBuffer) {
+        return PR_TRUE;
+    }
The mStart and mStartBuffer tests shouldn't be needed. Same in the |mDirection
== kForward|-section and the |oldSize|, |oldLength| and |newLength| initializations.

+    if (oldSize > 0) {
+        txXPathNode* dest = newArr;
+        if (mDirection == kReversed) {
+            dest += newLength - oldSize;
+        }
If you move setting |dest| to outside the |if| you can use it when setting new
mStart/mEnd further down.

In txNodeSet.h
+     * the nodeset as required. Do only call append(aNode), get(), mark()
append(aNodeSet)
In NodeSetFunctionCall::evaluate:
                 case NAME:
                 {
-                    switch (node->getNodeType()) {
-                        case Node::ATTRIBUTE_NODE:
-                        case Node::ELEMENT_NODE:
-                        case Node::PROCESSING_INSTRUCTION_NODE:
+                    switch (txXPathNodeUtils::getNodeType(node)) {
+                        case txXPathNodeType::ATTRIBUTE_NODE:
+                        case txXPathNodeType::ELEMENT_NODE:
+                        case txXPathNodeType::PROCESSING_INSTRUCTION_NODE:
You cant remove this entire |switch| now since txXPathNodeUtils::getNodeName
should return the right thing for all nodetypes.

In txExecutionState.cpp:
-      mRTFDocument(nsnull),
+//      mRTFDocument(nsnull),
Feel free to remove this compleatly, it isn't used any more.

In class txLoadedDocumentEntry:
+    txLoadedDocumentEntry(const txLoadedDocumentEntry& aToCopy)
+        : nsStringHashKey(aToCopy)
     {
     }
Feel free to add an assertion in there since if this was called we'd be horked.

In txMozillaStylesheetCompiler.cpp:
Is the added #include needed?

In txMozillaXSLTProcessor::TransformDocument:
+    txExecutionState es(mStylesheet);
+
+    nsAutoPtr<txXPathNode>
sourceNode(txXPathNativeNode::createXPathNode(aSourceDOM));
This seems dangerous since |sourceNode| will be destroyed before |es| and es
might hold a reference to sourceNode. Switch the two around should be safe. Same
in a all other ::Transform* functions in this class.

-    // Create wrapper for the source document.
-    mSource = aSourceDOM;
Doesn't notifyError() still need this?

In txXSLTNumber::getValueList level=single:
+               
aValues.add(NS_INT32_TO_PTR(getSiblingCount(walker.getCurrentPosition(),
countPattern,
Shouldn't this be just |getSiblingCount(walker, ...|? Seems strange that this
compiles, do we need to add |explicit| to the walker ctor?

In txXSLTNumber::getValueList:
+        PRBool hasAncestor = PR_TRUE;
         MBool matchedFrom = MB_FALSE;
-        while (node) {
-            if (aFromPattern && node != currNode &&
-                aFromPattern->matches(node, aContext)) {
+        while (hasAncestor) {
Seems like what you want is |do {} while()|

In txXSLTNumber::getSiblingCount:
+    txXPathTreeWalker walker(aWalker.getCurrentPosition());
You could just use aWalker directly, the callers only rely on the walker staying
within the same parent. Though you should probably comment on this in the
functiondeclaration.

+    PRBool hasSibling = walker.moveToPreviousSibling();
+    while (hasSibling) {
+        if (aCountPattern->matches(walker.getCurrentPosition(), aContext)) {
             ++value;
         }
-        node = node->getPreviousSibling();
+        hasSibling = walker.moveToPreviousSibling();
     }
Just do |while(walker.moveToPreviousSibling()) {|.

In txIdPattern::matches:
+    nsresult rv = txXPathNativeNode::getElement(aNode, &elem);
+    NS_ENSURE_SUCCESS(rv, PR_FALSE);
An assertion should be enough since you've already verified that the node is an
element.

In txKeyPattern::matches:
+    txXPathNode* contextDoc = txXPathNodeUtils::getOwnerDocument(aNode);
Needs to be an nsAutoPtr

In txStepPattern::matches:
-    if (!mIsAttr && !aNode->getParentNode())
+    txXPathTreeWalker walker(aNode);
+    // XXX isn't this wrong when mIsAttr is set
+    if (!walker.moveToDOMParent() && !mIsAttr)
Isn't attribute-step pattern matching broken since we'll never walk to the
parent of the attribute. The easiest thing would probably be to do something like:
if ((!mIsAttr && walker->getNodeType() == txXPathNodeType::ATTRIBUTE_NODE) ||
    !walker.moveToParent()) {
    return PR_FALSE;
}
That should get rid of the need for the moveToDOMParent function.

In GenerateIdFunctionCall::evaluate:
+        nsRefPtr<StringResult> strRes;
If you make this a rawpointer you can save yourself an addref. Nothing between
getting the strRes and returning can fail anyway so there's no risk of leaks.
Same thing twice in that function.

Feel free to make the code use FunctionCall::evaluateToNodeSet while you're there.

In txKey.h and txKeyFunctionCall.cpp:
In general, it is not enough to key on the hash-value on the document, just
because two documents have the same hash-value doesn't mean they are the same
document (though that happens to be true in our implmenetations). Either put the
entire txXPathNode into the two key-hashes or rename
txXPathNodeUtils::getHashKey to something like
txXPathNodeUtils::getUniqueIdentifier (and change the name of the hash-members
to something like mDocumentIdentifier).


Just have the new files to go!
In txMozillaXPathTreeWalker.cpp:

txXPathTreeWalker::setTo probably belongs in txXPathNodeUtils?

IMHO txXPathTreeWalker::setTo doesn't need to test what kind of pointer is used.
Same in the txXPathNode copy-ctor and txXPathNode::operator==.

txXPathNodeUtils::isSameNode isn't used or implemented.

Please remove the empty txXPathNode dtor (or at least inline it).

txXPathNodeUtils::getAttr doesn't need the IsContentOfType-call.

What's the logic for atomized txXPathNodeUtils::getLocalName returning true or
false? Does it really need to return anything or is a nullname enough indication
of a missing name? What i'm mostly worried about is it returning null and
PR_TRUE at the same time.

txXPathNodeUtils::getNamespaceID will return the xhtml-namespace for
html-elements rather then null. Do we care?

No need for the temporary in txXPathNodeUtils::getNamespaceURI.

I just realized that all callers of txXPathNodeUtils::getOwnerDocument will have
to check for null regardless of what we decide on on the GetDocument-issue. For
OOM reasons. Same goes for all txXPathNativeNode::createXPathNode functions.

You should probably assert if txXPathNodeUtils::getElementById is called on a
node-type other then documents rather then just return null. (Though hopefully
we can get rid of this function entierly)

Please fix createXPathNode(nsIDOMNode* aNode) to deal with attributes not
implementing nsIAttribute to avoid regressing xpath on xul.

txXPathNodeUtils::getAttr should probably truncate the string at the top of the
function.


In txStandaloneXPathTreeWalker.cpp:
remove the XXX comments in txXPathTreeWalker::moveTo(First|Last)Child. The
answer to the question is "no".

remove the XXX comments and the attribute/document checks from
moveTo(Next|Previous)Sibling. The checks are not needed.

Please remove the empty txXPathNode dtor (or at least inline it).

txXPathNodeUtils::getAttr should probably truncate the string at the top of the
function.

txXPathNodeUtils::getNodeValue could use |mInner->nodeValue| directly rather
then calling |mInner->getNodeValue|.

Ugh, looks like the last patch i attached to this bug used the wrong
txStandaloneXPathTreeWalker.cpp. The correct one is in attachment 132177 [details] [diff] [review]. The
only difference is the implementation of txXPathNodeUtils::getNodeValue. Sorry
about that. (Make the helperfunction there use mInner->nodeValue directly too).

You should probably assert if txXPathNodeUtils::getElementById is called on a
node-type other then documents rather then just return null. (Though hopefully
we can get rid of this function entierly)

The |#define kFmtSizeAttr|s can be removed.


In txXPathNode.h:
Would it be better to make the txXPathNode copy-ctor public then to make
txNodeSet a friend?


In txXPathTreeWalker.h
txUint32Array::RemoveValuesAt isn't used.

+#ifdef TX_EXE
+#else
Change to |#ifndef|


That's it!!
Found a couple more things:

In txExecutionState::init:
+    // Set up loaded-documents-hash
+    rv = mLoadedDocuments.init(txXPathNodeUtils::getOwnerDocument(aNode));
The node created here is leaked.

txSingleNodeContext has the same issue of holding a reference as
txForwardContext. Might not actually be a problem but I wanted to check.
Hrm. Standalone doesn't even compile.

txStandaloneXPathTreeWalker.cpp
c:/Sources/standalone/walker-tree/bopt/extensions/transformiix/source/xpath/../.
./../../../mozilla/extensions/transformiix/source/xpath/txStandaloneXPathTreeWal
ker.cpp(347) : error C2039: 'moveToFirstDescendant' : is not a member of 'txXPat
hTreeWalker'
        c:/Sources/standalone/walker-tree/bopt/extensions/transformiix/source/xp
ath/../../../../../mozilla/extensions/transformiix/source/xpath/txXPathTreeWalke
r.h(78) : see declaration of 'txXPathTreeWalker'
c:/Sources/standalone/walker-tree/bopt/extensions/transformiix/source/xpath/../.
./../../../mozilla/extensions/transformiix/source/xpath/txStandaloneXPathTreeWal
ker.cpp(357) : error C2039: 'moveToNextDescendant' : is not a member of 'txXPath
TreeWalker'
        c:/Sources/standalone/walker-tree/bopt/extensions/transformiix/source/xp
ath/../../../../../mozilla/extensions/transformiix/source/xpath/txXPathTreeWalke
r.h(78) : see declaration of 'txXPathTreeWalker'
make: *** [txStandaloneXPathTreeWalker.obj] Error 2
Pike: see comment 62, the sentence starting with "Ugh". BTW, I'm not on the
branch anymore so it doesn't have my latest tree.
Attached patch Current patch (obsolete) — Splinter Review
> Why the change in the way nodesets are merged (txNodeSet::add). AFAICT the old
> algorithm was faster.
The previous code did some pretty strange "who's smaller" comparisons about 
position in the nodeset or something, which don't matter. The only thing that 
matters is who's next to add. And that takes turns between the two nodesets, 
independent of the position. I switched this over to take codes from one 
set, then the other. As we need to use different transferOps for this and 
aNodes, this code looks uglier that it appears to need.
I can see pretty easy ways to confuse the
 if (thisPos > otherPos) {
in the old code, so I don't see the "faster".

> Can't this function be called during predicates when the nodeset is inverted?
No, the txNodeSetContext in ::evaluatePredicates knows the proximity position.

> append(aNodeSet)
no, that's not allowed in reversed mode.
At every step we do the following:
Take the last node from one nodeset (call this nodeset A) and serach where to
insert this node into the other nodeset (B). Then copy all nodes after that
insertposition to the resultset and copy the last node in A into the resultset*.

There are two reasons why choosing A and B so that A.size() < B.size is a good
thing:
1. The best solution is the one where the chunk of nodes copied from B is as big
   as possible. Assuming we know nothing about the relative positions of the
   nodes the best way to get a big chunk is to choose a big B. 
2. In the worst case the size of the chunk copied from B is zero. In that case
   we will move a single node from A into the resultset. A will still stay the
   smallest nodeset so if the worst case continues we will empty out A as fast
   as possible if A is always the smaller nodeset.

[*] Right now the current branchcode doesn't do the copy from A into the
resultset, but we should fix that.
what we do with this patch:

Take the last node from one nodeset (call this nodeset A) and serach where to
insert this node into the other nodeset (B). Then copy all nodes after that
insertposition to the resultset. Period. Difference to old code.
Now take the last node of B and find it's insert position into A. Move all
nodes after that within A to the end.

See how that's faster? You don't move the single node and start from scratch but
you just move the nodes chunk by chunk, taking turns.
Attached patch Current patch (obsolete) — Splinter Review
Attachment #132822 - Attachment is obsolete: true
Attachment #134736 - Attachment is obsolete: true
Attachment #134989 - Attachment is obsolete: true
> In txParseDocumentFromURI:
> I'd prefer if |nsIDOMDocument* theDocument| was an nsCOMPtr, an extra
> addref/release should be insignificant compared to loading and parsing a doc.

I see no need.

> In LocationStep:
> Please prefix the arguments to fromDescendants and fromDescendantsRev with 'a'
> since those functions are basically rewritten anyway.

Done.

> In PathExpr::evaluate:
> |tmpNodes->addAndTransfer(resNodes)| might not be safe for the first step since
> for example keys can return a nodeset owned by somebody else. You could use
> |recycler()->getNonSharedNodeSet|

Could you revisit this? We call getNonSharedNodeSet just before the call to
addAndTransfer.

> In UnionExpr::evaluate:
> Please fix the tabs.

Done.

> In txNodeSet.cpp:
> The change of licenceplate in txNodeSet looks wrong. Same in txNodeSet.h.
> There are some random tabs in this file.

Removed the tabs. What's wrong with the license? (see
http://www.mozilla.org/MPL/boilerplate-1.1/mpl-tri-license-c)

> In txNodeSet::txNodeSet(const txXPathNode& aNode...:
> +    if (!ensureGrowSize(1)) {
> +        NS_ERROR("out of memory");
> Do we really want to assert on OOM?

Removed the assert. I thought you wanted to assert? :-)

> In a few places in txNodeSet:
> +    txXPathNode* temp = new(mStart) txXPathNode(aNode);
> +    if (!temp) {
> Can this really fail? I don't see how that could happen since the memory is
> already allocated.

Yeah, I don't think it can fail so I removed the error-checking.

> +txNodeSet::~txNodeSet()
> +{
> +    NS_ASSERTION(!mMarks, "should've swept before this is going away");
> +    delete [] mMarks;
> I think mMarks can be non-null if we bail before calling sweep on the nodeset
> and then never end up recycling that nodeset so the assertion should probably
> go.

Done.

> In txNodeSet::append(const txXPathNode& aNode):
> +    --mStart;
> +    txXPathNode* temp = new(mStart) txXPathNode(aNode);
> +    NS_ENSURE_TRUE(temp, NS_ERROR_FAILURE);
> Need to increase mStart in case of failure (if we care to look for failure).

Removed error-checking.

> In txNodeSet::clear:
> The |if (mStart)|-test shouldn't be needed when TX_DONT_RECYCLE_BUFFER isn't
> defined.

What if mStart == mEnd == nsnull?

> In txNodeSet::get:
> +    ++aIndex;
> +    return mEnd[-aIndex];
> |return mEnd[-aIndex - 1]| ?

Done

> In NodeSetFunctionCall::evaluate:
>                  case NAME:
>                  {
> -                    switch (node->getNodeType()) {
> -                        case Node::ATTRIBUTE_NODE:
> -                        case Node::ELEMENT_NODE:
> -                        case Node::PROCESSING_INSTRUCTION_NODE:
> +                    switch (txXPathNodeUtils::getNodeType(node)) {
> +                        case txXPathNodeType::ATTRIBUTE_NODE:
> +                        case txXPathNodeType::ELEMENT_NODE:
> +                        case txXPathNodeType::PROCESSING_INSTRUCTION_NODE:
> You cant remove this entire |switch| now since txXPathNodeUtils::getNodeName
> should return the right thing for all nodetypes.

I prefer calling getEmptyStringResult.

> 
> In txExecutionState.cpp:
> -      mRTFDocument(nsnull),
> +//      mRTFDocument(nsnull),
> Feel free to remove this compleatly, it isn't used any more.

Well, it's a reminder that we need a document for RTFs.

> In txMozillaStylesheetCompiler.cpp:
> Is the added #include needed?

Removed.

> In txMozillaXSLTProcessor::TransformDocument:
> +    txExecutionState es(mStylesheet);
> +
> +    nsAutoPtr<txXPathNode>
> sourceNode(txXPathNativeNode::createXPathNode(aSourceDOM));
> This seems dangerous since |sourceNode| will be destroyed before |es| and es
> might hold a reference to sourceNode. Switch the two around should be safe.
> Same in a all other ::Transform* functions in this class.

Done.

> -    // Create wrapper for the source document.
> -    mSource = aSourceDOM;
> Doesn't notifyError() still need this?

Added back.

> In txXSLTNumber::getValueList level=single:
> +               
> aValues.add(NS_INT32_TO_PTR(getSiblingCount(walker.getCurrentPosition(),
> countPattern,
> Shouldn't this be just |getSiblingCount(walker, ...|? Seems strange that this
> compiles, do we need to add |explicit| to the walker ctor?

Added explicit to the ctor's and the compiler choked. Done.

> In txXSLTNumber::getValueList:
> +        PRBool hasAncestor = PR_TRUE;
>          MBool matchedFrom = MB_FALSE;
> -        while (node) {
> -            if (aFromPattern && node != currNode &&
> -                aFromPattern->matches(node, aContext)) {
> +        while (hasAncestor) {
> Seems like what you want is |do {} while()|

Done.

> +    PRBool hasSibling = walker.moveToPreviousSibling();
> +    while (hasSibling) {
> +        if (aCountPattern->matches(walker.getCurrentPosition(), aContext)) {
>              ++value;
>          }
> -        node = node->getPreviousSibling();
> +        hasSibling = walker.moveToPreviousSibling();
>      }
> Just do |while(walker.moveToPreviousSibling()) {|.

Done.

> In txKeyPattern::matches:
> +    txXPathNode* contextDoc = txXPathNodeUtils::getOwnerDocument(aNode);
> Needs to be an nsAutoPtr

Done.

> In GenerateIdFunctionCall::evaluate:
> +        nsRefPtr<StringResult> strRes;
> If you make this a rawpointer you can save yourself an addref. Nothing between
> getting the strRes and returning can fail anyway so there's no risk of leaks.
> Same thing twice in that function.

Done.

> Feel free to make the code use FunctionCall::evaluateToNodeSet while you're
> there.

Done.

> Ugh, looks like the last patch i attached to this bug used the wrong
> txStandaloneXPathTreeWalker.cpp. The correct one is in attachment 132177 [details] [diff] [review]. The
> only difference is the implementation of txXPathNodeUtils::getNodeValue. Sorry
> about that. (Make the helperfunction there use mInner->nodeValue directly
> too).

Done.

> You should probably assert if txXPathNodeUtils::getElementById is called on a
> node-type other then documents rather then just return null. (Though hopefully
> we can get rid of this function entierly)

getElementById is dead

> The |#define kFmtSizeAttr|s can be removed.

Done.

> In txXPathTreeWalker.h
> txUint32Array::RemoveValuesAt isn't used.

Removed.

> +#ifdef TX_EXE
> +#else
> Change to |#ifndef|

Done.

> In txExecutionState::init:
> +    // Set up loaded-documents-hash
> +    rv = mLoadedDocuments.init(txXPathNodeUtils::getOwnerDocument(aNode));
> The node created here is leaked.

Done.

> In txStepPattern::matches:
> -    if (!mIsAttr && !aNode->getParentNode())
> +    txXPathTreeWalker walker(aNode);
> +    // XXX isn't this wrong when mIsAttr is set
> +    if (!walker.moveToDOMParent() && !mIsAttr)
> Isn't attribute-step pattern matching broken since we'll never walk to the
> parent of the attribute. The easiest thing would probably be to do something
> like:
> if ((!mIsAttr && walker->getNodeType() == txXPathNodeType::ATTRIBUTE_NODE) ||
>     !walker.moveToParent()) {
>     return PR_FALSE;
> }
> That should get rid of the need for the moveToDOMParent function.

Done.
> > In PathExpr::evaluate:
> > |tmpNodes->addAndTransfer(resNodes)| might not be safe for the first step
> > since for example keys can return a nodeset owned by somebody else. You
> > could use |recycler()->getNonSharedNodeSet|
> Could you revisit this? We call getNonSharedNodeSet just before the call to
> addAndTransfer.

Sorry, i should have been more specific. You need to protect |resNodes| too.
Right now only the nodeset that you move nodes into is protected. Not the
nodeset you move nodes out from.

> > In txNodeSet::clear:
> > The |if (mStart)|-test shouldn't be needed when TX_DONT_RECYCLE_BUFFER isn't
> > defined.
> What if mStart == mEnd == nsnull?
Then the |mStart < mEnd| test won't be true.

> What's wrong with the license
You changed the original developer and removed the stuff about "Portions created
by Keith Visco as a Non MITRE employee". I think that you also need to add the
other people listed in the old plate to the contributers list. In general i'd
prefer to do licencechanges separatly.
> In class txLoadedDocumentEntry:
> +    txLoadedDocumentEntry(const txLoadedDocumentEntry& aToCopy)
> +        : nsStringHashKey(aToCopy)
>      {
>      }
> Feel free to add an assertion in there since if this was called we'd be
> horked.

Done.

> In txIdPattern::matches:
> +    nsresult rv = txXPathNativeNode::getElement(aNode, &elem);
> +    NS_ENSURE_SUCCESS(rv, PR_FALSE);
> An assertion should be enough since you've already verified that the node is
> an element.

Done.

> IMHO txXPathTreeWalker::setTo doesn't need to test what kind of pointer is
> used. Same in the txXPathNode copy-ctor and txXPathNode::operator==.

Done.

> txXPathNodeUtils::isSameNode isn't used or implemented.

Removed.

> Please remove the empty txXPathNode dtor (or at least inline it).

Done.

> txXPathNodeUtils::getAttr doesn't need the IsContentOfType-call.

Removed.

> What's the logic for atomized txXPathNodeUtils::getLocalName returning true or
> false? Does it really need to return anything or is a nullname enough
> indication of a missing name? What i'm mostly worried about is it returning
> null and PR_TRUE at the same time.

Made it return already_AddRefed<nsIAtom> (also removed unused atomized
txXPathTreeWalker::getLocalName).

> txXPathNodeUtils::getNamespaceID will return the xhtml-namespace for
> html-elements rather then null. Do we care?
> 
> No need for the temporary in txXPathNodeUtils::getNamespaceURI.

Removed.

> Please remove the empty txXPathNode dtor (or at least inline it).

Done.

> txXPathNodeUtils::getNodeValue could use |mInner->nodeValue| directly rather
> then calling |mInner->getNodeValue|.

Done.

> You need to make all callers of txXPathNativeNode::createXPathNode be able to
> deal with null-returns (OOM)

Done.

> In txKey.h and txKeyFunctionCall.cpp:
> In general, it is not enough to key on the hash-value on the document, just
> because two documents have the same hash-value doesn't mean they are the same
> document (though that happens to be true in our implmenetations). Either put
> the entire txXPathNode into the two key-hashes or rename
> txXPathNodeUtils::getHashKey to something like
> txXPathNodeUtils::getUniqueIdentifier (and change the name of the hash-members
> to something like mDocumentIdentifier).

Done, though I don't understand your objection, getHashKey was supposed to
return a unique key per document.

> I just realized that all callers of txXPathNodeUtils::getOwnerDocument will
> have to check for null regardless of what we decide on on the
> GetDocument-issue. For OOM reasons. Same goes for all
> txXPathNativeNode::createXPathNode functions.

Done.

> You should probably assert if txXPathNodeUtils::getElementById is called on a
> node-type other then documents rather then just return null. (Though hopefully
> we can get rid of this function entierly)

getElementById is dead.

> txXPathNodeUtils::getAttr should probably truncate the string at the top of
> the function.

Hmm, not really. Callers just need to diregard the value in the string if the
call fails (which they do iirc).

> In txStandaloneXPathTreeWalker.cpp:
> remove the XXX comments in txXPathTreeWalker::moveTo(First|Last)Child. The
> answer to the question is "no".

Done.

> remove the XXX comments and the attribute/document checks from
> moveTo(Next|Previous)Sibling. The checks are not needed.

Done.

> txXPathNodeUtils::getAttr should probably truncate the string at the top of
> the function.

See above.

> I'm slightly nervous about txForwardContext holding on to a reference rather
> then containing a txXPathNode. Have you checked that all callers use an
> argument that outlives the context? For example if they use a node from a
> nodeset the nodeset isn't modified before the context dies.

There's only one caller, txStepPattern::matches. I think it's safe.

> txSingleNodeContext has the same issue of holding a reference as
> txForwardContext. Might not actually be a problem but I wanted to check.

Used in txMozillaXSLTProcessor and txStandaloneXSLTProcessor. Both hold on to
the initial context node, so this is safe too I think.

> Why the change in the way nodesets are merged (txNodeSet::add). AFAICT the old
> algorithm was faster.

Pike answered this.

> In txNodeSet::indexOf:
> Can't this function be called during predicates when the nodeset is inverted,
> i.e. doesn't it have to deal with reversed nodesets? You could get rid of
> |counter| and just return |pos - mStart|. The |mStart == mEnd| test isn't
> really needed.

Pike answered this.

> In txNodeSet::ensureGrowSize:
> +    if (mDirection == kReversed &&
> +        mStart && mStartBuffer && aSize <= mStart - mStartBuffer) {
> +        return PR_TRUE;
> +    }
> The mStart and mStartBuffer tests shouldn't be needed. Same in the |mDirection
> == kForward|-section and the |oldSize|, |oldLength| and |newLength|
> initializations.

Done.

> +    if (oldSize > 0) {
> +        txXPathNode* dest = newArr;
> +        if (mDirection == kReversed) {
> +            dest += newLength - oldSize;
> +        }
> If you move setting |dest| to outside the |if| you can use it when setting new
> mStart/mEnd further down.

Done.

> In txNodeSet.h
> +     * the nodeset as required. Do only call append(aNode), get(), mark()
> append(aNodeSet)

Pike answered this.

> The module walker needs to be OOM-safe. (havn't checked yet if the standalone
> one is)

Where is the module one not OOM-safe?

> Sorry, i should have been more specific. You need to protect |resNodes| too.
> Right now only the nodeset that you move nodes into is protected. Not the
> nodeset you move nodes out from.

Ok, I've done this but only for the first step, right? Please take a look at
these changes again.
Attached patch Current patch (obsolete) — Splinter Review
Attachment #135334 - Attachment is obsolete: true
txNameTest.cpp:
+    nsCOMPtr<nsIAtom> localName = txXPathNodeUtils::getLocalName(aNode);
is that ambiguous or not? I know it's for a plain pointer, not sure if it is
for an already_Addrefed.

txXPathTreeWalker::moveToParent (mozilla)
+    // XXX do we need this test? we assume in a bunch of other places that the node
+    // is in the doc so we should probably enforce that during bootstrap

Yes, I think we do, for orphaned subtrees. I wonder what "/" is supposed to mean
for an orphaned tree. We make that null right now, which seems ok to me.
We don't get to the owner-document for the IO cases though. So something like
"document('foo.xml')" might break when called from an transformation of an
orphaned tree due to security checks (falsly?) kicking in. But this is rather to
tight than too loose and an edge case, mark that on your list.

More XXX on that below.

txXPathNodeUtils::comparePosition
+        // The isContent tests can be removed if we rely on that mIndex < 0
+        // for content.
mIndex is PRUint32 now, so no.

txNodeSet::findPosition
should 
+    txXPathNode* midpos = aFirst + (aLast - aFirst) / 2;
be
+    txXPathNode* midpos = aFirst + (aLast - aFirst - 1) / 2;
? Take 3 inputs, 0 .. 3, ends up with 3/2, which gives 1 here, so it's fine, but
I can't find if that's speced or implementation dependent. If it is, ending up
with 2 would be sad.

txXSLTNumber::getValueList

+            PRBool hasParent = walker.moveToParent();
+            while (hasParent) {
+                if (aFromPattern->matches(walker.getCurrentPosition(), aContext)) {
                     break;
                 }
 
+                hasParent = walker.moveToParent();
             }

could be 

+            PRBool hasParent;
+            while (hasParent = walker.moveToParent()) {
+                if (aFromPattern->matches(walker.getCurrentPosition(), aContext)) {
                     break;
                 }
             }

The licensing of CurrentFunctionCall seems ok to me, but that could just happen.
Not part of this bug/patch. (Hrm, not sure about the copyright date.)

That's all I found while reading attachement 135611.
missed braces,
+            while (hasParent = walker.moveToParent()) {
should of course be
+            while ((hasParent = walker.moveToParent())) {
> txNameTest.cpp:
> +    nsCOMPtr<nsIAtom> localName = txXPathNodeUtils::getLocalName(aNode);
> is that ambiguous or not? I know it's for a plain pointer, not sure if it is
> for an already_Addrefed.

nsCOMPtr has operator=( const already_AddRefed<T>& rhs ).

> txXPathNodeUtils::comparePosition
> +        // The isContent tests can be removed if we rely on that mIndex < 0
> +        // for content.
> mIndex is PRUint32 now, so no.

Done.

> txNodeSet::findPosition
> should 
> +    txXPathNode* midpos = aFirst + (aLast - aFirst) / 2;
> be
> +    txXPathNode* midpos = aFirst + (aLast - aFirst - 1) / 2;
> ? Take 3 inputs, 0 .. 3, ends up with 3/2, which gives 1 here, so it's fine,
> but I can't find if that's speced or implementation dependent. If it is,
> ending up with 2 would be sad.

Integer division truncates the fractional part.

> txXSLTNumber::getValueList
> +            PRBool hasParent;
> +            while (hasParent = walker.moveToParent()) {
> +                if (aFromPattern->matches(walker.getCurrentPosition(),
aContext)) {
>                      break;
>                  }
>              }

Done.
> In RelationalExpr::compareResults:
> Why change the |switch| to |if|-statements? That is likly to be slower on some
> compilers.

Done.

> In txXSLTNumber::getSiblingCount:
> +    txXPathTreeWalker walker(aWalker.getCurrentPosition());
> You could just use aWalker directly, the callers only rely on the walker
> staying within the same parent. Though you should probably comment on this in
> the functiondeclaration.

Done.

> txXPathTreeWalker::setTo probably belongs in txXPathNodeUtils?

Removed setTo.

> Please fix createXPathNode(nsIDOMNode* aNode) to deal with attributes not
> implementing nsIAttribute to avoid regressing xpath on xul.

Done.

> The licensing of CurrentFunctionCall seems ok to me, but that could just
> happen. Not part of this bug/patch. (Hrm, not sure about the copyright date.)

Removed.
Attached patch Current patch (obsolete) — Splinter Review
Attachment #135611 - Attachment is obsolete: true
Attached patch Current patch (obsolete) — Splinter Review
Attachment #135751 - Attachment is obsolete: true
the latest PathExpr change is wrong. That part was correct before, what needs
change is the code right before the

tmpNodes->addAndTransfer(resNodes);

That statement will change both |tmpNodes| and |resNodes|. Right now there is
code in place that will protect |tmpNodes| from being shared. However |resNodes|
has no such protection which is what you need to add.

tmpNodes = resNodes;
won't modify either nodeset so no protection is needed.

In txXPathNodeUtils::getOwnerDocument:
+        aNode.mContent->GetNodeInfo()->GetDocument();
Need a nullcheck, you can't be sure that the node has a nodeInfo. You might even
want to try to go to the parent and get the document through there if aNode
doesn't have a nodeinfo.
In module txXPathNodeUtils::getNodeValue:
Getting the nodevalue of a comment will go through the PI path which will assert
(and work suboptimally). Comments implement nsITextContent so you could change
the |IsContentOfType(nsIContent::eTEXT)| test to test for PIs and use the
nsITextContent patch for everything else.
re #82 (wow, what a number), module txXPathNodeUtils::getNodeValue:
actually, nsTextNode, nsCommentNode and nsXMLProcessingInstruction all go
thru the same code path for GetNodeValue and each is a 
nsDOMGenericDataNode and thus a nsITextContent. So I guess you can just QI to
nsITextContent and use that.
Maybe wallpaper the code with a trailing handling of nsIDOMNode with a 
NS_WARNING("This is neither an attribute nor an element nor an nsITextContent,"
"using nsIDOMNode.")

or so.
nsDOMGenericDataNode doesn't QI to nsITextContent, and neither does
nsXMLProcessingInstruction.
Attached patch v1Splinter Review
Only changes with previous patch should be in
txXPathNodeUtils::getOwnerDocument, txXPathNodeUtils::getNodeValue,
txXPathTreeWalker::comparePosition and PathExpr::evaluate.
Attachment #135766 - Attachment is obsolete: true
Attachment #135810 - Flags: superreview?(jst)
Attachment #135810 - Flags: review?(bugmail)
Comment on attachment 135810 [details] [diff] [review]
v1

Module txXPathNodeUtils::getNamespaceID will return the wrong namespace for
html-elements which will break DOM-XPath on html. Get the nodeinfo and use the
namespace in there instead.

With that, r=sicking
Attachment #135810 - Flags: review?(bugmail) → review+
I've filed bug 226124 to take care of some remaining reviewer's comments and
some cleanup that I still want to do after landing this patch.
Comment on attachment 135810 [details] [diff] [review]
v1

- In txParseDocumentFromURI():

+    // Raw pointer, we want the resulting txXPathNode to hold a reference to
+    // the document.
+    nsIDOMDocument* theDocument = nsnull;
+    rv = loader->LoadDocumentAsXML(channel, loaderUri, &theDocument);
     if (NS_FAILED(rv) || !theDocument) {
	 aErrMsg.Append(NS_LITERAL_STRING("Document load of ") + 
			aHref + NS_LITERAL_STRING(" failed."));
	 return rv;
     }

-    *aResult = new Document(theDocument);
+    *aResult = txXPathNativeNode::createXPathNode(theDocument);
     if (!*aResult) {
+	 NS_RELEASE(theDocument);
	 return NS_ERROR_FAILURE;
     }

This refcounting code needs to be cleaned up (as we discussed). File a followup
bug on that...

- In txXPathTreeWalker::moveToSibling():

+    PRUint32 newIndex = mCurrentIndex + aDir;
+    nsIContent* newChild = parent ? parent->GetChildAt(newIndex) :
+				     document->GetChildAt(newIndex);
+    if (!newChild) {
+	 return PR_FALSE;
+    }

I don't see anything that guarantees that mCurrentIndex is > 0. If it's 0 and
aDir is -1, you'll end up asking fir the child at -1, which will return null,
so the code does the right thing, but in a roundabout kinda way :-) Add a
check, or a comment explaining this.

- In txXPathNodeUtils::getNodeName():

+	     // Check for html
+	     if (aNode.mContent->IsContentOfType(nsIContent::eHTML) &&
+		 nodeInfo->NamespaceEquals(kNameSpaceID_None)) {
+		 ToUpperCase(aName);
+	     }

The above check would probably be a bit faster over-all if you flip the
conditions around, the nodeinfo namespace check is inlined, the
IsContentOfType() call is virtual. Or is most data that comes through this in
no namespace anyway, so the order doesn't matter?

+    // Check for html
+    if (aNode.mContent->IsContentOfType(nsIContent::eHTML) &&
+	 aNode.mContent->GetNodeInfo()->NamespaceEquals(kNameSpaceID_None)) {
+	 ToUpperCase(aName);
+    }

... same thing.


- In txXPathNodeUtils::getNodeType():

+    if (aNode.isContent()) {
+	 PRUint16 type;
+	 nsCOMPtr<nsIDOMNode> node = do_QueryInterface(aNode.mContent);
+	 node->GetNodeType(&type);
+
+	 return type;
+    }

Won't that always be ELEMENT_NODE?


- In getTextContent()

Looks like this method appends the text content to the string, then maybe name
it appendTextContent()?

- In txXPathNodeUtils::getNodeValue():

Same thing here, appendNodeValue()?

- In txXPathNodeUtils::getDocument(), txXPathNodeUtils::getOwnerDocument(),

These functions actually create wrappers and the caller is supposed to deal
with the ownership of the created wrappers... Then maybe name these
createDocumentWrapper(), and createOwnerDocumentWrapper(), or something? Not a
big deal tho...

- In txXPathNodeUtils::getXSLTId():

+    CopyASCIItoUTF16(nsPrintfCString(kFmtSizeAttr, gPrintfFmtAttr,
aNode.mContent,
+				     aNode.mIndex), aResult);

Nit of the day, second line argument indentation off-by-one :-)

- In txXPathNodeUtils::getBaseURI():

+    if (aNode.isDocument()) {
+	 node = do_QueryInterface(aNode.mDocument);
+    }
+    else {
+	 node = do_QueryInterface(aNode.mContent);
+    }
+
+    if (node) {
+	 node->GetBaseURI(aURI);
+    }

Add else { aURI.Truncate(); }.

+// XXXX Inline?
+
+/* static */
+txXPathNode*
+txXPathNativeNode::createXPathNode(nsIDOMDocument* aDocument)
+{
+    nsCOMPtr<nsIDocument> document = do_QueryInterface(aDocument);
+    return new txXPathNode(document);
+}

I doubt this is used enough to make inlining it worth it, might as well hide
the nsCOMPtr code etc in a function...

- In txXSLTNumber::getValueList():

+	 } while(walker.moveToParent());

Nit two of the day, add a space after while :-)

Other than that, this looks very very nice! Well done.

sr=jst
Attachment #135810 - Flags: superreview?(jst) → superreview+
> - In txParseDocumentFromURI():
> 
> +    // Raw pointer, we want the resulting txXPathNode to hold a reference to
> +    // the document.
> +    nsIDOMDocument* theDocument = nsnull;
> +    rv = loader->LoadDocumentAsXML(channel, loaderUri, &theDocument);
>      if (NS_FAILED(rv) || !theDocument) {
> 	 aErrMsg.Append(NS_LITERAL_STRING("Document load of ") + 
> 			aHref + NS_LITERAL_STRING(" failed."));
> 	 return rv;
>      }
> 
> -    *aResult = new Document(theDocument);
> +    *aResult = txXPathNativeNode::createXPathNode(theDocument);
>      if (!*aResult) {
> +	 NS_RELEASE(theDocument);
> 	 return NS_ERROR_FAILURE;
>      }
> 
> This refcounting code needs to be cleaned up (as we discussed). File a followup
> bug on that...

I'll do it in bug 226124.

> - In txXPathTreeWalker::moveToSibling():
> 
> +    PRUint32 newIndex = mCurrentIndex + aDir;
> +    nsIContent* newChild = parent ? parent->GetChildAt(newIndex) :
> +				     document->GetChildAt(newIndex);
> +    if (!newChild) {
> +	 return PR_FALSE;
> +    }
> 
> I don't see anything that guarantees that mCurrentIndex is > 0. If it's 0 and
> aDir is -1, you'll end up asking fir the child at -1, which will return null,
> so the code does the right thing, but in a roundabout kinda way :-) Add a
> check, or a comment explaining this.

Added a comment.

> - In txXPathNodeUtils::getNodeName():
> 
> +	     // Check for html
> +	     if (aNode.mContent->IsContentOfType(nsIContent::eHTML) &&
> +		 nodeInfo->NamespaceEquals(kNameSpaceID_None)) {
> +		 ToUpperCase(aName);
> +	     }
> 
> The above check would probably be a bit faster over-all if you flip the
> conditions around, the nodeinfo namespace check is inlined, the
> IsContentOfType() call is virtual. Or is most data that comes through this in
> no namespace anyway, so the order doesn't matter?

Done.

> +    // Check for html
> +    if (aNode.mContent->IsContentOfType(nsIContent::eHTML) &&
> +	 aNode.mContent->GetNodeInfo()->NamespaceEquals(kNameSpaceID_None)) {
> +	 ToUpperCase(aName);
> +    }
> 
> ... same thing.

Done.

> - In txXPathNodeUtils::getNodeType():
> 
> +    if (aNode.isContent()) {
> +	 PRUint16 type;
> +	 nsCOMPtr<nsIDOMNode> node = do_QueryInterface(aNode.mContent);
> +	 node->GetNodeType(&type);
> +
> +	 return type;
> +    }
> 
> Won't that always be ELEMENT_NODE?

No (PIs, Comment nodes, text nodes).

> - In getTextContent()
> 
> Looks like this method appends the text content to the string, then maybe name
> it appendTextContent()?

Done.

> - In txXPathNodeUtils::getNodeValue():
> 
> Same thing here, appendNodeValue()?

Done.

> - In txXPathNodeUtils::getDocument(), txXPathNodeUtils::getOwnerDocument(),
> 
> These functions actually create wrappers and the caller is supposed to deal
> with the ownership of the created wrappers... Then maybe name these
> createDocumentWrapper(), and createOwnerDocumentWrapper(), or something? Not a
> big deal tho...

I'll do this in bug 226124 since I plan to redo the way we create these wrappers..

> - In txXPathNodeUtils::getXSLTId():
> 
> +    CopyASCIItoUTF16(nsPrintfCString(kFmtSizeAttr, gPrintfFmtAttr,
> aNode.mContent,
> +				     aNode.mIndex), aResult);
> 
> Nit of the day, second line argument indentation off-by-one :-)

Done.

> - In txXPathNodeUtils::getBaseURI():
> 
> +    if (aNode.isDocument()) {
> +	 node = do_QueryInterface(aNode.mDocument);
> +    }
> +    else {
> +	 node = do_QueryInterface(aNode.mContent);
> +    }
> +
> +    if (node) {
> +	 node->GetBaseURI(aURI);
> +    }
> 
> Add else { aURI.Truncate(); }.

Done.

> +// XXXX Inline?
> +
> +/* static */
> +txXPathNode*
> +txXPathNativeNode::createXPathNode(nsIDOMDocument* aDocument)
> +{
> +    nsCOMPtr<nsIDocument> document = do_QueryInterface(aDocument);
> +    return new txXPathNode(document);
> +}
> 
> I doubt this is used enough to make inlining it worth it, might as well hide
> the nsCOMPtr code etc in a function...

Removed the comment.

> - In txXSLTNumber::getValueList():
> 
> +	 } while(walker.moveToParent());
> 
> Nit two of the day, add a space after while :-)

Done.
Attached patch Final patchSplinter Review
Final patch is the one that got checked in. Thanks to the reviewers.
This have added a warning on brad TBox:

+extensions/transformiix/source/xpath/txMozillaXPathTreeWalker.cpp:302
+ `nsIDocument*document' might be used uninitialized in this function
On second thought, my suggesions for not checking isDocument() in operator== and
a few more might not be safe since gcc3 does some funky alias-optimizations, we
should probably fix that before 1.6b
Hi,
  I;m getting on Tru64Unxi 5.1A with current cvs checkout(yesterday) the
following(GNU libiconv 1.9.1 detected during configure):

make[4]: Entering directory
`/afs/gsf.de/sources/mozilla/extensions/transformiix/build'
txMozillaXPathTreeWalker.cpp
Building deps for ../source/xpath/txMozillaXPathTreeWalker.cpp
cxx -o ../source/xpath/txMozillaXPathTreeWalker.o -c -DOSTYPE=\"OSF1V5\"
-DOSARCH=\"OSF1\" -DHAVE_DEPENDENT_LIBS  -I../../../dist/include/xpcom
-I../../../dist/include/string -I../../../dist/include/dom
-I../../../dist/include/content -I../../../dist/include/widget
-I../../../dist/include/necko -I../../../dist/include/js
-I../../../dist/include/xpconnect -I../../../dist/include/caps
-I../../../dist/include/locale -I../../../dist/include/unicharutil
-I../../../dist/include/htmlparser -I../../../dist/include/webshell
-I../../../dist/include/docshell -I../../../dist/include/layout
-I../../../dist/include/uconv -I../../../dist/include/windowwatcher
-I../../../dist/include/mimetype -I../../../dist/include/intl
-I../../../dist/include/transformiix -I../../../dist/include
-I/afs/.gsf.de/sources/mozilla/dist/include/nspr     -I. -I./../source/xslt
-I./../source/base -I./../source/xml -I./../source/xml/parser
-I./../source/xpath -I./../source/xslt/util -I./../source/xslt/functions     
-I/software/@sys/usr/freetype-2.1.4rc1/include -I/software/@sys/usr/include
-I/usr/local2/include -I/usr/local/include -I/usr/local2/openssl/include
-I/software/@sys/usr/freetype-2.1.4rc1/include  -O0 -arch ev56 -g2 -noexceptions
-ieee -ptr ../../../dist/cxx_repository -pthread  -DDEBUG -D_DEBUG -DDEBUG_root
-DTRACING -g  -I/software/@sys/usr/freetype-2.1.4rc1/include
-I/software/@sys/usr/include -I/usr/local2/include -I/usr/local/include
-I/usr/local2/openssl/include -I/software/@sys/usr/freetype-2.1.4rc1/include 
-DNSCAP_DISABLE_TEST_DONTQUERY_CASES=1 -DNSCAP_DISABLE_DEBUG_PTR_TYPES=1
-DNEED_USLEEP_PROTOTYPE=1 -DD_INO=d_ino -DSTDC_HEADERS=1 -DHAVE_ST_BLKSIZE=1
-DHAVE_SIGINFO_T=1 -DHAVE_UINT=1 -DHAVE_UINT_T=1 -DHAVE_64BIT_OS=1
-DHAVE_DIRENT_H=1 -DHAVE_GETOPT_H=1 -DHAVE_SYS_BITYPES_H=1 -DHAVE_MEMORY_H=1
-DHAVE_UNISTD_H=1 -DHAVE_NL_TYPES_H=1 -DHAVE_MALLOC_H=1 -DHAVE_X11_XKBLIB_H=1
-DHAVE_SYS_STATVFS_H=1 -DHAVE_LIBC_R=1 -DHAVE_LIBM=1 -DFUNCPROTO=15
-DHAVE_XSHM=1 -D_REENTRANT=1 -DHAVE_RANDOM=1 -DHAVE_STRERROR=1 -DHAVE_LCHOWN=1
-DHAVE_FCHMOD=1 -DHAVE_SNPRINTF=1 -DHAVE_MEMMOVE=1 -DHAVE_RINT=1
-DHAVE_NL_LANGINFO=1 -DHAVE_FLOCKFILE=1 -DHAVE_LOCALTIME_R=1 -DHAVE_STRTOK_R=1
-DNEED_CPP_DERIVED_TEMPLATE_OPERATORS=1 -DNEED_CPP_TEMPLATE_CAST_TO_BASE=1
-DCANT_RESOLVE_CPP_CONST_AMBIGUITY=1 -DHAVE_I18N_LC_MESSAGES=1
-DMOZ_DEFAULT_TOOLKIT=\"gtk\" -DMOZ_WIDGET_GTK=1 -DMOZ_ENABLE_XREMOTE=1
-DMOZ_X11=1 -DMOZ_ENABLE_COREXFONTS=1 -DMOZ_EXTRA_X11CONVERTERS=1 -DOJI=1
-DIBMBIDI=1 -DMOZ_VIEW_SOURCE=1 -DACCESSIBILITY=1 -DMOZ_XPINSTALL=1
-DMOZ_JSLOADER=1 -DMOZ_MATHML=1 -DMOZ_LOGGING=1 -DDETECT_WEBSHELL_LEAKS=1
-DMOZ_USER_DIR=\".mozilla\" -DNSCAP_DONT_PROVIDE_NONCONST_OPEQ=1 -DMOZ_XUL=1
-DMOZ_PROFILESHARING=1 -DMOZ_PROFILELOCKING=1 -DMOZ_DLL_SUFFIX=\".so\"
-DXP_UNIX=1 -DUNIX_ASYNC_DNS=1 -DJS_THREADSAFE=1 -DNS_PRINT_PREVIEW=1
-DNS_PRINTING=1 -DMOZ_REFLOW_PERF=1 -DMOZ_REFLOW_PERF_DSP=1
-DMOZILLA_VERSION=\"1.6b\"  -D_MOZILLA_CONFIG_H_ -DMOZILLA_CLIENT
../source/xpath/txMozillaXPathTreeWalker.cpp
cxx: Error: ../source/xpath/txMozillaXPathTreeWalker.cpp, line 705: 
          identifier "gPrintfFmtAttr" is undefined
    CopyASCIItoUTF16(nsPrintfCString(kFmtSizeAttr, gPrintfFmtAttr, aNode.mContent,
---------------------------------------------------^
cxx: Info: 1 error detected in the compilation of
"../source/xpath/txMozillaXPathTreeWalker.cpp".
make[4]: *** [../source/xpath/txMozillaXPathTreeWalker.o] Error 1
make[4]: Leaving directory
`/afs/gsf.de/sources/mozilla/extensions/transformiix/build'
Urgh.
http://lxr.mozilla.org/seamonkey/source/extensions/transformiix/source/xpath/txMozillaXPathTreeWalker.cpp#683
has grintfFmtAtt (note the missing P). If someone could make a patch for that
and comment 92 I'll r/sr (had to give back my build machine to AOL and my new
one hasn't arrived yet :-/).
Comment on attachment 136104 [details] [diff] [review]
fix missing `P' in grintfFmtAtt

r=me, requesting sr from peter.

Drivers, this is a bustage fix for 64 bit OSes. Change is for HAVE_64BIT_OS
only. epsilon risk.
Attachment #136104 - Flags: superreview?(peterv)
Attachment #136104 - Flags: review+
Attachment #136104 - Flags: approval1.6b?
Attachment #136104 - Flags: superreview?(peterv) → superreview+
re #92, I wonder if we could see that in action somehow.
The alias optim is for members of a struct, I wouldn't automagically suspect 
badness for a union.
Attachment #136104 - Flags: approval1.6b? → approval1.6b+
Comment on attachment 136104 [details] [diff] [review]
fix missing `P' in grintfFmtAtt

attachment 136104 [details] [diff] [review] checked in, thanx to Martin for the patch
Actually we don't have to worry about aliasing here for two reasons. First of
all we don't dereference these members (i.e. we're only using the
pointer-values, not what they're pointing at). Second, Pike is right, unions are
excepted.
Comment 100 and fixed. Thanks for all the comments and reviews and fixes.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: