Closed Bug 218770 Opened 21 years ago Closed 21 years ago

ideas for faster sorting

Categories

(Core :: XSLT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: sicking, Assigned: sicking)

Details

(Keywords: perf)

Attachments

(1 file, 2 obsolete files)

I've just filed this bug so that I don't forget a few of ideas I had for faster sorting in the nodesorter: First of all we shouldn't allocate each SortableNode by itself. It's a lot more efficient if we make one allocation at the beginning allocating all SortableNode and init them right away. Second we shouldn't allocate the mSortValues-arrays as separate arrays in each SortableNode. It's better to make a single allocation with aNodes.size()*mNKeys elements. We could even do what nsVoidArray does and put the mSortValues inlined into the SortableNode by when allocating space for the sortable node allocate enough space for the sort-values and changing |TxObject** mSortValues| to |TxObject* mSortValues[1]|. We might also want to change new+memset(0) to calloc (see bug 124302). Third we should use a more efficient algorithm for the actual sort. Right now it compares a node-to-be-inserted to every node until it finds one it should be inserted before. A much more optimal way to find the insert-position is to use binary search. The only issue with this is that currently we keep the sorted list in a linked list to avoid memoryshuffels when the nodes are inserted. Doing binary search in a linked list will be pretty slow so I think we're going to have to live with the memoryshuffels.
I suggest we do the following: Make one big chunk of sortValues. Sort a index array (that gets rid of the memory shuffle problem a bit, plus we can ensure stability) Use NS_QuickSort.
Attached patch patch v1 (obsolete) — Splinter Review
this implements all the above ideas. Additionally it makes us not shuffle the nodes to be sorted first into the internal structures and then back into the nodeset, rather it just creates a new nodeset and inserts the node in the appropriate order there. This one might be worth checking into the tree before we land the walkerpatch since sorting is the only client of clone and destroy and this patch removes the need for them.
Attachment #131943 - Flags: superreview?(peterv)
Attachment #131943 - Flags: review?(axel)
btw, we might want to land this before the walker since it will get rid of the need for txXPathNodeUtils::cloneNode/destroy since we no longer move nodes into a SortableNode structure but rather just keep them in the nodeset.
Comment on attachment 131943 [details] [diff] [review] patch v1 >Index: source/xslt/util/txNodeSorter.cpp >=================================================================== >RCS file: /cvsroot/mozilla/extensions/transformiix/source/xslt/util/txNodeSorter.cpp,v >retrieving revision 1.15 >diff -u -6 -p -r1.15 txNodeSorter.cpp >--- source/xslt/util/txNodeSorter.cpp 16 Jun 2003 22:31:19 -0000 1.15 >+++ source/xslt/util/txNodeSorter.cpp 23 Sep 2003 06:45:54 -0000 <...> >@@ -162,126 +164,141 @@ txNodeSorter::addSortElement(Expr* aSele > mSortKeys.add(key); > mNKeys++; > > return NS_OK; > } > >-nsresult txNodeSorter::sortNodeSet(NodeSet* aNodes, txExecutionState* aEs) >+nsresult >+txNodeSorter::sortNodeSet(NodeSet* aNodes, txExecutionState* aEs, >+ NodeSet** aResult) > { >- if (mNKeys == 0) >+ if (mNKeys == 0 || aNodes->isEmpty()) { >+ NS_ADDREF(*aResult = aNodes); no code in REF macros, please. Assignments are not fatal, but these macros are not atomic. >+ > return NS_OK; >+ } > >- txList sortedNodes; >- txListIterator iter(&sortedNodes); >+ *aResult = nsnull; > >- int len = aNodes->size(); >+ nsRefPtr<NodeSet> sortedNodes; >+ nsresult rv = aEs->recycler()->getNodeSet(getter_AddRefs(sortedNodes)); >+ NS_ENSURE_SUCCESS(rv, rv); > >- // Step through each node in NodeSet... >- int i; >- for (i = len - 1; i >= 0; i--) { >- SortableNode* currNode = new SortableNode(aNodes->get(i), mNKeys); >- if (!currNode) { >- // XXX ErrorReport: out of memory >- iter.reset(); >- while (iter.hasNext()) { >- SortableNode* sNode = (SortableNode*)iter.next(); >- sNode->clear(mNKeys); >- delete sNode; >- } >- return NS_ERROR_OUT_OF_MEMORY; >- } >- iter.reset(); >- SortableNode* compNode = (SortableNode*)iter.next(); >- while (compNode && (compareNodes(currNode, compNode, aNodes, aEs) > 0)) { >- compNode = (SortableNode*)iter.next(); >- } >- // ... and insert in sorted list >- iter.addBefore(currNode); >+ txNodeSetContext* evalContext = new txNodeSetContext(aNodes, aEs); >+ NS_ENSURE_TRUE(evalContext, NS_ERROR_OUT_OF_MEMORY); >+ >+ rv = aEs->pushEvalContext(evalContext); >+ NS_ENSURE_SUCCESS(rv, rv); >+ >+ // Create and set up memoryblock for sort-values and indexarray >+ PRUint32 len = NS_STATIC_CAST(PRUint32, aNodes->size()); >+ void* mem = PR_Malloc(len * (sizeof(PRInt32) + mNKeys * sizeof(TxObject*))); >+ NS_ENSURE_TRUE(mem, NS_ERROR_OUT_OF_MEMORY); PR_Calloc, here. That should be cheaper, taken that there is more memory nulled for the sortValues than non-nulled for the indexing. >+ >+ PRUint32* indexes = NS_STATIC_CAST(PRUint32*, mem); >+ TxObject** sortValues = NS_REINTERPRET_CAST(TxObject**, indexes + len); >+ >+ PRUint32 i; >+ for (i = 0; i < len; ++i) { >+ indexes[i] = i; > } >- >- // Clean up and set nodes in NodeSet >- // Note that the nodeset shouldn't be changed until the sort is done >- // since it's the current-nodeset used during xpath evaluation >- aNodes->clear(); >- iter.reset(); >- while (iter.hasNext()) { >- SortableNode* sNode = (SortableNode*)iter.next(); >- aNodes->append(sNode->mNode); >- sNode->clear(mNKeys); >- delete sNode; >+ memset(sortValues, 0, len * mNKeys * sizeof(TxObject*)); >+ >+ // Sort the indexarray >+ SortData sortData; >+ sortData.mNodeSorter = this; >+ sortData.mContext = evalContext; >+ sortData.mSortValues = sortValues; >+ sortData.mRv = NS_OK; Make SortData a all-public class local to this .cpp, put this into a constructor. >+ NS_QuickSort(indexes, len, sizeof(PRInt32), compareNodes, &sortData); >+ >+ PRUint32 numSortValues = len * mNKeys; >+ for (i = 0; i < numSortValues; ++i) { >+ delete sortValues[i]; > } >- >+ >+ if (NS_FAILED(sortData.mRv)) { >+ PR_Free(mem); >+ return sortData.mRv; >+ } >+ >+ delete aEs->popEvalContext(); delete the eval context way up. >+ >+ // Insert nodes in sorted order in new nodeset >+ for (i = 0; i < len; ++i) { >+ rv = sortedNodes->append(aNodes->get(indexes[i])); >+ if (NS_FAILED(rv)) { >+ PR_Free(mem); >+ return rv; >+ } >+ } >+ >+ free(mem); >+ of course, PR_Free(mem); >+ NS_ADDREF(*aResult = sortedNodes); >+ > return NS_OK; > } > >-int txNodeSorter::compareNodes(SortableNode* aSNode1, >- SortableNode* aSNode2, >- NodeSet* aNodes, >- txExecutionState* aEs) >+// static >+int >+txNodeSorter::compareNodes(const void* aIndexA, const void* aIndexB, >+ void* aSortData) > { >- txListIterator iter(&mSortKeys); >+ NS_ENSURE_SUCCESS(sortData->mRv, -1); >+ > nsresult rv = NS_OK; >- int i; >+ SortData* sortData = NS_STATIC_CAST(SortData*, aSortData); >+ txListIterator iter(&sortData->mNodeSorter->mSortKeys); >+ PRUint32 indexA = *NS_STATIC_CAST(const PRUint32*, aIndexA); >+ PRUint32 indexB = *NS_STATIC_CAST(const PRUint32*, aIndexB); >+ TxObject** sortValuesA = sortData->mSortValues + >+ indexA * sortData->mNodeSorter->mNKeys; >+ TxObject** sortValuesB = sortData->mSortValues + >+ indexB * sortData->mNodeSorter->mNKeys; > >+ int i; > // Step through each key until a difference is found >- for (i = 0; i < mNKeys; i++) { >+ for (i = 0; i < sortData->mNodeSorter->mNKeys; i++) { > SortKey* key = (SortKey*)iter.next(); > // Lazy create sort values >- if (!aSNode1->mSortValues[i]) { >- txForwardContext evalContext(aEs->getEvalContext(), aSNode1->mNode, aNodes); >- aEs->pushEvalContext(&evalContext); >+ if (!sortValuesA[i]) { >+ sortData->mContext->setPosition(indexA + 1); // position is 1-based > nsRefPtr<txAExprResult> res; >- rv = key->mExpr->evaluate(&evalContext, getter_AddRefs(res)); >- NS_ENSURE_SUCCESS(rv, -1); >+ rv = key->mExpr->evaluate(sortData->mContext, getter_AddRefs(res)); >+ if (NS_FAILED(rv)) { >+ sortData->mRv = rv; >+ return -1; >+ } > >- aEs->popEvalContext(); >- aSNode1->mSortValues[i] = key->mComparator->createSortableValue(res); >- if (!aSNode1->mSortValues[i]) { >- // XXX ErrorReport >+ sortValuesA[i] = key->mComparator->createSortableValue(res); >+ if (!sortValuesA[i]) { >+ sortData->mRv = NS_ERROR_OUT_OF_MEMORY; > return -1; > } > } >- if (!aSNode2->mSortValues[i]) { >- txForwardContext evalContext(aEs->getEvalContext(), aSNode2->mNode, aNodes); >- aEs->pushEvalContext(&evalContext); >+ if (!sortValuesB[i]) { >+ sortData->mContext->setPosition(indexB + 1); // position is 1-based > nsRefPtr<txAExprResult> res; >- rv = key->mExpr->evaluate(&evalContext, getter_AddRefs(res)); >- NS_ENSURE_SUCCESS(rv, -1); >+ rv = key->mExpr->evaluate(sortData->mContext, getter_AddRefs(res)); >+ if (NS_FAILED(rv)) { >+ sortData->mRv = rv; >+ return -1; >+ } > >- aEs->popEvalContext(); >- aSNode2->mSortValues[i] = key->mComparator->createSortableValue(res); >- if (!aSNode2->mSortValues[i]) { >- // XXX ErrorReport >+ sortValuesB[i] = key->mComparator->createSortableValue(res); >+ if (!sortValuesB[i]) { >+ sortData->mRv = NS_ERROR_OUT_OF_MEMORY; > return -1; > } > } > factor the eval code into a helper method, ensureSortData(sortValue*, position, SortData&) or so. That saves code and my lead to less branchprediction errors. > // Compare node values >- int compRes = key->mComparator->compareValues(aSNode1->mSortValues[i], >- aSNode2->mSortValues[i]); >+ int compRes = key->mComparator->compareValues(sortValuesA[i], >+ sortValuesB[i]); > if (compRes != 0) > return compRes; > } > // All keys have the same value for these nodes >- return 0; >-} >- >-txNodeSorter::SortableNode::SortableNode(Node* aNode, int aNValues) >-{ >- mNode = aNode; >- mSortValues = new TxObject*[aNValues]; >- if (!mSortValues) { >- // XXX ErrorReport: out of memory >- return; >- } >- memset(mSortValues, 0, aNValues * sizeof(void *)); >-} >- >-void txNodeSorter::SortableNode::clear(int aNValues) >-{ >- int i; >- for (i = 0; i < aNValues; i++) { >- delete mSortValues[i]; >- } >- >- delete [] mSortValues; >+ >+ return indexA < indexB ? -1 : 1; just return indexB - indexA > } >Index: source/xslt/util/txNodeSorter.h >=================================================================== >RCS file: /cvsroot/mozilla/extensions/transformiix/source/xslt/util/txNodeSorter.h,v >retrieving revision 1.9 >diff -u -6 -p -r1.9 txNodeSorter.h >--- source/xslt/util/txNodeSorter.h 26 Mar 2003 01:10:14 -0000 1.9 >+++ source/xslt/util/txNodeSorter.h 23 Sep 2003 06:45:54 -0000 >@@ -49,12 +49,13 @@ class Expr; > class Node; > class NodeSet; > class txExecutionState; > class TxObject; > class txXPathResultComparator; > class txIEvalContext; >+class txNodeSetContext; > > /* > * Sorts Nodes as specified by the W3C XSLT 1.0 Recommendation > */ > > class txNodeSorter >@@ -63,37 +64,31 @@ public: > txNodeSorter(); > ~txNodeSorter(); > > nsresult addSortElement(Expr* aSelectExpr, Expr* aLangExpr, > Expr* aDataTypeExpr, Expr* aOrderExpr, > Expr* aCaseOrderExpr, txIEvalContext* aContext); >- nsresult sortNodeSet(NodeSet* aNodes, txExecutionState* aEs); >+ nsresult sortNodeSet(NodeSet* aNodes, txExecutionState* aEs, >+ NodeSet** aResult); > > private: >- class SortableNode >+ struct SortData > { >- public: >- SortableNode(Node* aNode, int aNValues); >- void clear(int aNValues); >+ txNodeSorter* mNodeSorter; >+ txNodeSetContext* mContext; > TxObject** mSortValues; >- Node* mNode; >+ nsresult mRv; > }; Move this out of txNodeSorter and make it a class in the .cpp ... > struct SortKey > { > Expr* mExpr; > txXPathResultComparator* mComparator; > }; >- >- int compareNodes(SortableNode* sNode1, >- SortableNode* sNode2, >- NodeSet* aNodes, >- txExecutionState* aEs); > >- MBool getAttrAsAVT(Element* aSortElement, >- nsIAtom* aAttrName, >- nsAString& aResult); >+ static int compareNodes(const void* aIndexA, const void* aIndexB, >+ void* aSortData); ... as just this needs access to the nodesorter. > > txList mSortKeys; > int mNKeys; > }; > > #endif
Attachment #131943 - Flags: review?(axel) → review-
There's nothing wrong with NS_ADDREF(*foo = bar), and for NS_IF_ADDREF it's even the preferred form.
> PR_Calloc, here. That should be cheaper, taken that there is more memory nulled > for the sortValues than non-nulled for the indexing. I pondered that but *afaict* the only thing calloc buys you is that you save cycles if the page is paged out before you actually use it. That should not be the case here though since we're going to use the entire memoryblock pretty much immideatly. So I don't think the extra cycles for clearing more memory then needed outweighs the benefit of calloc. > Make SortData a all-public class local to this .cpp, put this into a > constructor. Why? For simple things like this where the object is only constructed once i prefer to keep things simple and just use a struct. > delete the eval context way up. Why? I prefer to delete things in the opposite way that they are created so this should actually be moved to after the final PR_Free. Did so. The exception to this is that I delete the sortvalues immideatly, but that is only to avoid having to do it at several places like the PR_Free. > factor the eval code into a helper method, > ensureSortData(sortValue*, position, SortData&) or so. > That saves code and my lead to less branchprediction errors. I did like this suggestion and implemented it, but it actually turned out to be more lines of code then the current way so I reverted back :( Fixed everything else, new patch comming up
Attached patch patch v2 (obsolete) — Splinter Review
fixes comments from pike
Attachment #131943 - Attachment is obsolete: true
Attachment #132071 - Flags: superreview?(peterv)
Attachment #132071 - Flags: review?(axel)
> I did like this suggestion and implemented it, but it actually turned out to > be more lines of code then the current way so I reverted back :( works for me, see below. Note, the !sortValuesB[i] check in the if () is to have the branchprediction before calling the function. static PRBool ensureSortValue(TxObject* aSortValue, SortKey* aKey, SortData* aSortData, PRUint32 aIndex) { aSortData->mContext->setPosition(aIndex + 1); // position is 1-based nsRefPtr<txAExprResult> res; nsresult rv = aKey->mExpr->evaluate(aSortData->mContext, getter_AddRefs(res)); if (NS_FAILED(rv)) { aSortData->mRv = rv; return PR_FALSE; } aSortValue = aKey->mComparator->createSortableValue(res); if (!aSortValue) { aSortData->mRv = NS_ERROR_OUT_OF_MEMORY; return PR_FALSE; } return PR_TRUE; } { NS_ENSURE_SUCCESS(sortData->mRv, -1); SortData* sortData = NS_STATIC_CAST(SortData*, aSortData); txListIterator iter(&sortData->mNodeSorter->mSortKeys); PRUint32 indexA = *NS_STATIC_CAST(const PRUint32*, aIndexA); PRUint32 indexB = *NS_STATIC_CAST(const PRUint32*, aIndexB); TxObject** sortValuesA = sortData->mSortValues + indexA * sortData->mNodeSorter->mNKeys; TxObject** sortValuesB = sortData->mSortValues + indexB * sortData->mNodeSorter->mNKeys; int i; // Step through each key until a difference is found for (i = 0; i < sortData->mNodeSorter->mNKeys; i++) { SortKey* key = (SortKey*)iter.next(); // Lazy create sort values if (!sortValuesA[i] && !ensureSortValue(sortValuesA[i], key, sortData, indexA) { return -1; } if (!sortValuesB[i] && !ensureSortValue(sortValuesB[i], key, sortData, indexA) { return -1; } } // Compare node values //... }
Attached patch patch v3Splinter Review
implements ensureSortValue (though i called it calcSortValue since it's only called to do the actual calculation) as Pike requested. I also changed the code to return 0 on failure since i figured that that might make the sortroutine exit a bit faster (not sure if that's the case, and IMHO it doesn't really matter anyway, but i figured why not)
Attachment #132071 - Attachment is obsolete: true
Attachment #132213 - Flags: superreview?(peterv)
Attachment #132213 - Flags: review?(axel)
Attachment #132071 - Flags: superreview?(peterv)
Attachment #132071 - Flags: review?(axel)
Comment on attachment 132213 [details] [diff] [review] patch v3 please return -1 on error, that should ensure stability and I guess nsQuickSort is faster then. Looking at the code, cmp == 0 triggers two loops. In all three cases of failure in compareNodes. I'm still not keen about the location of the popEvalContext, but r=axel@pike.org
Attachment #132213 - Flags: review?(axel) → review+
Comment on attachment 132213 [details] [diff] [review] patch v3 > Index: source/xslt/util/txNodeSorter.cpp > =================================================================== > + void* mem = PR_Malloc(len * (sizeof(PRInt32) + mNKeys * sizeof(TxObject*))); sizeof(PRUint32) > + NS_QuickSort(indexes, len, sizeof(PRInt32), compareNodes, &sortData); sizeof(PRUint32) > + delete aEs->popEvalContext(); Maybe comment in the early returns about how txExecutionState cleans this up if we fail to pop. > +// static > +int > +txNodeSorter::compareNodes(const void* aIndexA, const void* aIndexB, > + void* aSortData) > // Step through each key until a difference is found > - for (i = 0; i < mNKeys; i++) { > + for (i = 0; i < sortData->mNodeSorter->mNKeys; i++) { ++i
Attachment #132213 - Flags: superreview?(peterv) → superreview+
checked in, thanks for reviews
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: