Closed
Bug 85893
Opened 24 years ago
Closed 23 years ago
sortByDocumentOrder is WAY SLOW
Categories
(Core :: XSLT, defect, P2)
Core
XSLT
Tracking
()
VERIFIED
FIXED
mozilla0.9.9
People
(Reporter: axel, Assigned: sicking)
References
Details
Attachments
(4 files, 4 obsolete files)
|
248 bytes,
application/x-perl
|
Details | |
|
239 bytes,
text/xml
|
Details | |
|
225 bytes,
text/xml
|
Details | |
|
64.12 KB,
patch
|
axel
:
review+
jag+mozilla
:
superreview+
|
Details | Diff | Splinter Review |
I profiled transformiix standalone with docbook stylesheets and a 1.5 meg xml
file. Almost all time is wasted in sortByDocumentOrder.
Looking at the code, that is not too suprising.
Rework the code to use two stacks for each node, and get the childnumber only
when really needed.
We have to evaluate if we wanna cache this. Someone needs to fix Map :-(.
(Who files that one?)
Axel
| Reporter | ||
Comment 1•24 years ago
|
||
| Assignee | ||
Comment 2•24 years ago
|
||
actually, isn't most of the time spent in sortByDocumentOrder spent searching
in Map. This time would be totally eliminated if we moved ::appearsFirst to the
Node class (as part of moving towards txXPathNode).
Module is also probobly hit somewhat by DOMHelper::getChildNumber (jst would
slap us soooo hard if he saw that :) )
| Reporter | ||
Comment 3•24 years ago
|
||
Note:
Things have completely changed in my head.
The current scheme is constructing the node set already sorted, based on a
compare function and a key generating function. More to come...
Axel
| Reporter | ||
Updated•24 years ago
|
Priority: -- → P5
| Assignee | ||
Comment 4•23 years ago
|
||
This is basically done and works fine. The implementation of
NodeSet::add(NodeSet*) still needs a bit more optimisation and the NodeSet::add
methods should of course not be named add2. I'm also a bit dubious about
removing all sortByDocumentOrder calls, since we might want to support unsorted
nodesets in the future.
| Assignee | ||
Comment 5•23 years ago
|
||
Ok, here goes. This one should be fully optimised both for ::add(Node*) and
::add(NodeSet*). To be honest i'm starting to think that i commented that stuff
too much, let me know if that is the case.
obsoleting the screenshot since that should have changed pretty much since the
death of DOMHelper.
I decided to keep an NodeSet::sortByDocumentOrder (it's inlined and no-op).
Mostly so that we don't loose track of where we should sort in case we in the
future don't always produce sorted nodesets.
Attachment #38407 -
Attachment is obsolete: true
Attachment #61957 -
Attachment is obsolete: true
| Assignee | ||
Comment 6•23 years ago
|
||
taking and setting some fields
Assignee: axel → sicking
Severity: major → normal
Priority: P5 → P2
Target Milestone: --- → mozilla0.9.8
| Assignee | ||
Comment 7•23 years ago
|
||
pushing
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.8 → mozilla0.9.9
| Assignee | ||
Comment 8•23 years ago
|
||
Attachment #62741 -
Attachment is obsolete: true
| Reporter | ||
Comment 9•23 years ago
|
||
the nits:
constructors, of course.
Modeline for NodeSet.h
"except under special conditions", that gives confidence to all of us. :-)
The contextDoc chunk is already on the trunk.
I yet have to look at NodeSet.* for real.
I'm not so sure if the sortByDocumentOrder being a noop is such a good idea.
Peter, what does your extension function tree think about this?
| Reporter | ||
Comment 10•23 years ago
|
||
NodeSet.h:
MBool add(Node* aNode);
replace "inserted-sorted" with something in english.
"The node is inserted into the NodeSet according to document order."
(according sounds wrong, too)
MBool add(const NodeSet* aNodes);
"The resulting NodeSet is sorted by document order and does not contain
duplicate nodes." or something like that.
MBool append();
add a comment that the caller of these functions takes care that the resulting
NodeSet is in document order.
Make these return nsresult rather than MBool.
"Revrese"
Add a comment on top of the set of functions inherited from ExprResult. I don't
like duplicate comments in inheritance trees too much, peterv?
findPosition:
return the position, and make the nodeExist an argument by reference
NodeSet.cpp:
constructor : foo. I wonder if you could even do
mElements({aNode});
if(!ensureSize(mElementCount + 1))
whitespace, and move it down. You don't know if you have to add the node yet.
I'm not fond of MBool NodeSet::add(const NodeSet* aNodes).
I'm not sure that it's an optimal algorithm, and I'm not sure about
using memmove on overlapping src and target.
Something like getting a marker field for both NodeSets, storing the
positions in the final nodeset in there, and then merge the two sounds
like it would be more efficient. Less memory operations, though we
need the temporary memory. But I'd imagine that to be faster.
Maybe one could merge the two nodesets in chunks, too.
'Revrese'
NodeSet::indexOf() should use findPosition.
Comment 11•23 years ago
|
||
Comment on attachment 66457 [details] [diff] [review]
sync with tip
>+NodeSet::NodeSet()
>+{
>+ mBufferSize = kTxNodeSetMinSize;
>+ mElements = new Node*[kTxNodeSetMinSize];
>+ if (!mElements) {
>+ NS_ASSERTION(0, "out of memory");
>+ mBufferSize = 0;
>+ }
>
>+ mElementCount = 0;
>+}
NodeSet::NodeSet() : mBufferSize(kTxNodeSetMinSize),
mElementCount(0)
>+NodeSet::NodeSet(const NodeSet& aSource)
>+{
>+ mElements = 0;
>+ mBufferSize = 0;
>+ mElementCount = 0;
>+ append(&aSource);
>+}
Ditto.
>+ if(!ensureSize(mElementCount + 1))
Ugh, space after if.
>+ return MB_FALSE;
>+
>+ int pos;
>+ if (findPosition(aNode, 0, mElementCount - 1, pos)) {
>+ memmove(mElements + pos + 1,
>+ mElements + pos,
>+ (mElementCount - pos) * sizeof(Node*));
>+ mElements[pos] = aNode;
>+ mElementCount++;
++mElementCount;
>+ // There were some elements still left in this nodeset that needs to
>+ // be moved
need to be moved.
>+ // There were some elements still left in the other nodeset that needs
>+ // to be copied
need
>+ * Revrese the order of the nodes.
Ew, revresing the order sounds horrible :).
>+ if (aLast - aFirst <= 1) {
>+ // If we search 2 nodes or less there is no point in further divides
>+ for (aPos = aFirst; aPos <= aLast; ++aPos) {
Do we really need to loop?
>+ void sortByDocumentOrder() {}
Newlines. Free from the gasstation last time I looked.
>+ if (!context || !expressions.getLength())
>+ return new StringResult("error");
Too many spaces. Free?
>+ NodeSet* nodeSet;
>
> if (context->getNodeType() != Node::DOCUMENT_NODE)
>- nodeSet->add(context->getOwnerDocument());
>+ nodeSet = new NodeSet(context->getOwnerDocument());
> else
>- nodeSet->add(context);
>+ nodeSet = new NodeSet(context);
>
> return nodeSet;
return new NodeSet(...); (twice) and drop the else then.
>+ if (!context || expressions.getLength() == 0 || !nodes)
>+ return nodes;
Too many spaces
>+ if (exprResult &&
>+ exprResult->getResultType() == ExprResult::NODESET)
>+ nodes->add((NodeSet*)exprResult);
>+
>+ delete exprResult;
One space too much.
Drop sortByDocumentOrder as discussed on irc.
Comment 12•23 years ago
|
||
Comment on attachment 66457 [details] [diff] [review]
sync with tip
>@@ -83,13 +89,13 @@
> for (int i=0; i<nodeSet->size(); i++) {
> String val;
> XMLDOMUtils::getNodeValue(nodeSet->get(i), val);
>- key->getNodes(val,contextDoc)->copyInto(*res);
>+ res->add(key->getNodes(val, contextDoc));
> }
> }
> else {
> String val;
> exprResult->stringValue(val);
>- key->getNodes(val,contextDoc)->copyInto(*res);
>+ res->append(key->getNodes(val, contextDoc));
> }
> delete exprResult;
> return res;
in the first case you add but in the second you append?
| Reporter | ||
Comment 13•23 years ago
|
||
re #12: In case of multiple keys, you have to merge the nodesets, if there is
just one, you just append that to the empty res.
| Reporter | ||
Comment 14•23 years ago
|
||
some performance data on this:
On the ever so funny "let's merge a union for nothing" test, I get a speedup
by a factor of 2. Not bad. We still have a time O(n^2), because nodeOrder
needs to iterate over the previous siblings. We are WAY faster for stylesheets
that are not brain dead stupid. I'll attach a generate.pl, union.xsl and
children.xsl. Here go a few numbers for 8000 child elements:
| trunk | new
union | 1:07.4| 38.9
children | 41.4| 3.0
whoops, where did all the love go. As you see, doing the nodeorder stuff is
the expensive part here.
(All numbers for standalone)
| Reporter | ||
Comment 15•23 years ago
|
||
perl script to generate a <doc> with a specified number of children.
edit $maxitem to scale it
| Reporter | ||
Comment 16•23 years ago
|
||
stylesheet to generate a union of all the separate children types.
uses a union of large NodeSets. Expected output are the numbers from 1 thru
$maxiter
| Reporter | ||
Comment 17•23 years ago
|
||
the easy way to achieve the same as union.xsl.
We see HUGE improvements on stylesheet/source combinations like this.
| Assignee | ||
Comment 18•23 years ago
|
||
Attachment #66457 -
Attachment is obsolete: true
| Reporter | ||
Comment 19•23 years ago
|
||
Comment on attachment 67094 [details] [diff] [review]
fix comments
More thinking,
NodeSet::indexOf(Node* aNode)
should have a
// XXX evaluate cost of this.
It has to be done for Attr aNodes, but I'm not sure when we should to it for
other node types. We prolly shouldn't do it at all if the nodes of this nodeset
don't have OrderInfo structs yet. We shouldn't do it for small nodesets.
We need some hard data on this. Make a comment for now.
I'd like to have stronger comments on the NodeSet::append API. Use "Appends"
instead of "Adds" in the two comments, plus a preface to the api like this:
/*
* append API
* These functions should be used with care.
* They are intended to be used when the caller assures that the resulting
* NodeSet remains in document order.
* Abuse will break document order, and cause errors in the result.
* These functions are significantly faster than the add API, as no
* Node::OrderInfo structs will be generated.
*/
the delete exprResult; in UnionExpr::evaluate has a TAB
Attachment #67094 -
Flags: review+
| Assignee | ||
Comment 20•23 years ago
|
||
Done locally, only comment stuff, so i won't attach a new patch
Comment 21•23 years ago
|
||
Comment on attachment 67094 [details] [diff] [review]
fix comments
rs=jag
Attachment #67094 -
Flags: superreview+
| Assignee | ||
Comment 22•23 years ago
|
||
checked in,
thanks for reviews!
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Description
•