Open Bug 323547 Opened 19 years ago Updated 2 years ago

Non-deterministic rendering on Xalan test mdocs04 (uses document())

Categories

(Core :: XSLT, defect)

x86
Linux
defect

Tracking

()

People

(Reporter: bzbarsky, Unassigned)

References

Details

Attachments

(8 files)

BUILD: current trunk build

STEPS TO REPRODUCE:
1) Run the XSLT tests (see http://wiki.mozilla.org/XSLT_Tests)
2) See the mdocs04 test sometimes succeed and sometimes fail.

I suspect the problem is that the subdocument loads sometimes complete in a different order and we don't handle that right.  I'll try to post a self-contained testcase to Bugzilla.
Attached file Data file #1
Attached file Data file #2
Attached file XSLT stylesheet
I was experimenting with this and found that changing DocumentFunctionCall::retrieveNode() to use append() rather than add() when appending the imported document (note the two occurrences) to the node set fixes the problem.  However, I don't believe this is the proper solution and that the problem lies somewhere in txNodeSet::add().
Aah, I see the problem.  When using txNodeSet::add(), the nodes are inserted into the node set based on their document order.  However, the nodes we're adding are all document nodes, and txXPathNodeUtils::comparePosition() compares nodes from different owner documents by pointer, hence the random positioning we're seeing.

So maybe changing add() to append() is a reasonable fix since the generated node set will only include imported document nodes.  Alternatively, txXPathNodeUtils::comparePosition() could be changed to always order new imported document nodes after existing ones.
Attachment #208599 - Flags: superreview?(peterv)
Attachment #208599 - Flags: review?(bugmail)
This is a known issue.

The "problem" is in the way we sort document nodes. XSLT and XPath has a concept of ordering of nodes and nodesets are always ordered according to this order. This order is well defined for nodes in the same connected subtree, but how do you order the root nodes of two disconnected subtrees?

The XSLT spec only says that the order should be consistent, however it is not clear if it should be consistent during a single transformation, between multiple runs of the same stylesheet, or always independent of the stylesheet. See the last paragraph in

http://www.w3.org/TR/xslt#document

In any case the mdocs test is wrong to expect a certain order, "Hello GoodBye" is no more right then "GoodBye Hello".

What we do is that we order the documents according to the addres of the nsIDocument. What we could do is to order them alphabeticaly according the the document url. However there is a performance issue here and we'd need to figure out a fast way to do that.
Comment on attachment 208600 [details] [diff] [review]
Change order of two imported nodes in txXPathNodeUtils::comparePosition()

This makes ordering inconsistent even within the same transformation so this is wrong.
Attachment #208600 - Flags: superreview?(peterv)
Attachment #208600 - Flags: review?(bugmail)
Attachment #208600 - Flags: review-
Comment on attachment 208599 [details] [diff] [review]
Change add() to append() in retrieveNode()

Nodesets should always be ordered so this is wrong. We only use append when we know that that'll produce an ordered nodeset.
Attachment #208599 - Flags: superreview?(peterv)
Attachment #208599 - Flags: review?(bugmail)
Attachment #208599 - Flags: review-
Attached file Counter example
This testcase should produce the same string, be it "Hello GoodBye" or "GoodBye Hello" twice.

The first patch will be non-deterministic in the second string but not the first, and will therefor sometimes produce two different strings.

The second patch will always force the two stings to be different.
I had a patch to fix this at one point, I'll try to find it again. IIRC it used the order in which the loads were started (which isn't ideal either).
If we classify consistency at three levels:

1. Same sorting order within a single transformation
2. Same sorting order across transfomations for a single stylesheet+source doc
3. Always the same sorting order

We currently do 1. Sorting on loading order would be level 2. The only way I can think of to do level 3 is to sort by uri alphabetically.

Again, I'm not convinced the spec mandates anything beyond level 1. But, IMHO, the higher level we can be the better, as long as it doesn't hurt performance too much.
I now see that my patches were quite naive and were missing the big picture here :)  This bug is not just about using the document() function with a node-set argument, but really how a union involving document nodes is computed.  The xslt spec says you should treat this usage of document(<node-set>) as the union of the result of calling document() for each node in the node-set.

I think a better test case for this situation is the following stylesheet with the template:

<xsl:template match="/root">
<out>
<xsl:copy-of select="document('e.xml') | c | a | document('d.xml') | b"/>
</out>
</xsl:template>

This stylesheet is applied to an XML document with the root element <root> and child nodes <a>, <b>, <c>.  Documents d.xml and e.xml are single-node documents made up of the nodes <D> and <E>, respectively.

Mozilla will generate a variety of outputs such as EabcD, abcDE, abcED -- any combination that has all the document nodes "abc" sorted together.

An informal survey of other XSLT processors:
Xalan-J (v2.4.1):  EabcD
MSXML (WinXP SP2): abcED
libxml2 (v2.6.21): EabDc

None of them agree :)  However, they all have the document nodes "abc" sorted properly (as they should).  The interesting thing here is that they all consistently preserve the order of the document nodes D and E -- following the order that they appear in the union expression.  I think if we can get moz to preserve the order, then that would address the problem originally described in this bug.

Could txNodeSet::add() always send added document nodes to the end of the list?  This would be an efficient solution that should preserve the union order (assuming it is processed right to left).  The order will always be the same for the same set of documents.
Putting document nodes last doesn't really help when all nodes are documents. Also, what do you do when sorting something like:

document('a.xml')/a | document('b.xml')/b

There none of the nodes are document nodes, but the two nodes are still in disconnected subtrees.

Also, libxml's answer seems outright wrong, the three nodes 'abc' should always be together.

If we want to mimic the other processors we need to figure out what they do. A resonable guess is that they order the documents according to the order they were loaded.
(In reply to comment #17)
> Putting document nodes last doesn't really help when all nodes are documents.

Why is this bad when all nodes are documents?  Assuming the union is evaluated left to right, the order of the document nodes in the node set will match the order in the union if each document node is appended to the end of the node set.

> Also, what do you do when sorting something like:
> 
> document('a.xml')/a | document('b.xml')/b
> 
> There none of the nodes are document nodes, but the two nodes are still in
> disconnected subtrees.

We've been talking about document nodes, but in reality we're really talking about nodes that have different owner documents from the source xml -- this is how the current code considers the nodes when sorting.  But the sorting of nodes from any other document is not specified by the spec, so even in this case:

document('ab.xml')/root/b | document('ab.xml')/root/a

Where we could attempt to sort the nodes based on their position in 'ab.xml', we probably shouldn't.

> Also, libxml's answer seems outright wrong, the three nodes 'abc' should always
> be together.

It is certainly strange, but does it actually violate the spec?

> If we want to mimic the other processors we need to figure out what they do. A
> resonable guess is that they order the documents according to the order they
> were loaded.

I don't know if we want to mimic other processors outright, but we should try to produce the same result across transformations, and deal with this underspecified part of the specification in a way that users would expect.  I don't think I would expect the libxml result :)

A quick check of libxml and Xalan-J with having two different orderings of the document() function in the union show that the ordering follows the order of the union expression.  This shows that the ordering is probably not based on the load order (assuming they only load the documents once).  I'll try to look at the actual code and see what is being done.
> > Putting document nodes last doesn't really help when all nodes are
> > documents.
> 
> Why is this bad when all nodes are documents?  Assuming the union is evaluated
> left to right, the order of the document nodes in the node set will match the
> order in the union if each document node is appended to the end of the node
> set.

I'm not saying it's bad, i'm just saying that it doesn't answer the question "how should we order documents". Remember that all nodes (including documents) do have an order and that nodesets should always be ordered in this order. Always putting documents last when performing a union will not give consistent results. See comment 13.

> > Also, what do you do when sorting something like:
> > 
> > document('a.xml')/a | document('b.xml')/b
> > 
> > There none of the nodes are document nodes, but the two nodes are still in
> > disconnected subtrees.
> 
> We've been talking about document nodes, but in reality we're really talking
> about nodes that have different owner documents from the source xml -- this is
> how the current code considers the nodes when sorting.

Exactly. But it all breaks down to how you order the documents. Within a single document ordering is well defined. So in the above example the question comes down to "which is first, document a.xml or document b.xml".

> But the sorting of
> nodes from any other document is not specified by the spec, so even in this
> case:
> 
> document('ab.xml')/root/b | document('ab.xml')/root/a
> 
> Where we could attempt to sort the nodes based on their position in 'ab.xml',
> we probably shouldn't.

No, the spec is very clear here. Those nodes must be order in the order they appear in the ab.xml document.

See http://www.w3.org/TR/xslt#document

> > Also, libxml's answer seems outright wrong, the three nodes 'abc' should 
> > always be together.
> 
> It is certainly strange, but does it actually violate the spec?

Yes. The spec says that the order is determined by the order of the documents. Clearly this order is the same for all nodes in the document. Therefor, either all nodes in one doc is before all nodes in the other, or the other way around. There can be no mixing.

> A quick check of libxml and Xalan-J with having two different orderings of the
> document() function in the union show that the ordering follows the order of
> the union expression.  This shows that the ordering is probably not based on
> the load order (assuming they only load the documents once).  I'll try to look
> at the actual code and see what is being done.

Well, the first time you run the union the union order will be the same as the load order. So the question is, what does the following template produce:

<xsl:template match="/">
  <xsl:copy-of select="document('a.xml') | document('b.xml')" />
  <xsl:copy-of select="document('b.xml') | document('a.xml')" />
</xsl:template>

Do the two copy commands produce different result? Or do they produce the same? As stated above, I do not believe that using order is allowed by the spec, the two commands above should result in the same output.
(In reply to comment #19)
> Well, the first time you run the union the union order will be the same as the
> load order. So the question is, what does the following template produce:
> 
> <xsl:template match="/">
>   <xsl:copy-of select="document('a.xml') | document('b.xml')" />
>   <xsl:copy-of select="document('b.xml') | document('a.xml')" />
> </xsl:template>
> 
> Do the two copy commands produce different result? Or do they produce the same?
> As stated above, I do not believe that using order is allowed by the spec, the
> two commands above should result in the same output.

Yes, this is basically the test I ran, and I re-ran it including MSXML:

Xalan-J: AB BA
libxml:  AB BA
MSXML:   AB AB

So from this one could guess that Xalan-J and libxml use the union order and MSXML uses the load order.  Although this test:

<xsl:variable name="a" select="document('a.xml')"/>
<xsl:variable name="b" select="document('b.xml')"/>
<xsl:template match="/">
  <xsl:copy-of select="$a | $b" />
  <xsl:copy-of select="$b | $a" />
</xsl:template>

Has the same result as the above regardless of the order of the variable declarations (which should affect the load order of the documents):

Xalan-J: AB BA
libxml:  AB BA
MSXML:   AB AB

...so maybe MSXML does not use load order after all.

I think the line in the spec we're discussing here is:

  There are no constraints on how the implementation orders documents other
  than that it must do so consistently: an implementation must always use
  the same order for the same set of documents.

After reading this more carefully, I think I'm coming around to your line of thinking that documents should have some kind of consistent ordering, independent of the order that they are added to a node-set.  I believe that does indeed capture the spirit of the spec.  Load order seems to be the best thing we have to order the documents by.

So what would be the right way to implement this?  We're already keeping track of loaded documents (but not their load order) in txExecutionState.  Additionally, txNodeSet has no access to that information.  Maybe txXPathNode needs a new field to store the load order of its owner document?
> <xsl:variable name="a" select="document('a.xml')"/>
> <xsl:variable name="b" select="document('b.xml')"/>
> <xsl:template match="/">
>   <xsl:copy-of select="$a | $b" />
>   <xsl:copy-of select="$b | $a" />
> </xsl:template>
> 
> Has the same result as the above regardless of the order of the variable
> declarations (which should affect the load order of the documents):
> 
> Xalan-J: AB BA
> libxml:  AB BA
> MSXML:   AB AB

Wow, that xalan and libxml fails here is surprising. Interestring to see that MSXML comes out on top. I keep being impressed by their xslt implementation.

> ...so maybe MSXML does not use load order after all.

Yeah, looks like not. What if you clear the cache and restart IE inbetween? Maybe they are sorting on cache keys or something like that.

Actually, sorting on uri isn't as bad as I first thought. The problem, as you notice, is that although we could keep track of this in txExecutionState (where we could sort the documents as they come in) that information needs to get to txNodeSet.

One way to do this is to add a method on txIEvalContext that allows two documents to be compared. Another way is to set a property on each document we load using the nsIDocument::SetProperty method.

Adding a function on txIEvalContext would probably result in a faster implementation, but i'm not sure how many more places we would need to forward a pointer to that interface.
Just to drop my stream of thought, the document order doesn't really need to 
scale, right? So sorting documents by URL by just assigning a 32 or 64 bit int
and doing a binary assignement (the first gets max_int/2, and then insert as in
a binary tree) should work pretty well for the most part. IMHO. And only fall back
to this if the document types differ and the INT ain't set. Something like this.
Draft from the hotel line.
Status: NEW → ASSIGNED
*** Bug 341238 has been marked as a duplicate of this bug. ***
Assignee: xslt → nobody
QA Contact: keith → xslt
This is a mass change. Every comment has "assigned-to-new" in it.

I didn't look through the bugs, so I'm sorry if I change a bug which shouldn't be changed. But I guess these bugs are just bugs that were once assigned and people forgot to change the Status back when unassigning.
Status: ASSIGNED → NEW
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: