Closed Bug 345099 Opened 18 years ago Closed 18 years ago

Sort ranges in the selection for faster comparison

Categories

(Core :: DOM: Selection, defect)

1.8 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.8.1beta2

People

(Reporter: brettw, Assigned: brettw)

References

Details

(Keywords: fixed1.8.1)

Attachments

(3 files, 6 obsolete files)

The selection contains a sequence of ranges. These ranges are currently stored in an undefined order. These ranges should instead be sorted in DOM order so they can be searched and manipulated faster. Binary searching should then be used for those cases where the list is iterated over.
Blocks: 345048
Blocks: 345101
Target Milestone: --- → mozilla1.8.1beta2
It should be noted that currently (since bug 73373 was fixed), you can select multiple ranges (using CTRL), and when copying, they will be copied into the clipboard in the order in which they were selected (not in the DOM order!). It seems that fixing the current bug will change that behavior. I'm not sure whether that's a bad thing or not - I personally don't see the value in multiple selection.
Changing copy behavior seems fine to me, possibly even more desirable, probably not much worse. If totally necessary, we can have store that order separately.
Blocks: 343610
Blocks: 234936
Here are some timings for the patch I'm working on for nsTypedSelection::LookUpSelection. Units are microseconds (1/1M seconds) per comparison. I computed this by timing the calls in a 10K loop, and dividing by 10K. Old New 0 ranges: 0.316 0.0154 0.312 0.0150 0.310 0.0151 avg: 0.312 0.0152 new/old = 0.048 1 range: 10.1995 10.9430 10.2452 10.7635 13.7798 25.9530 <-- slower 12.0907 11.4539 avg: 11.5788 14.7784 new/old = 1.276 9 ranges: 61.5726 36.1701 56.0549 45.2374 64.4519 36.0935 avg: 60.6931 39.1670 new/old = 0.645 250 ranges: 3131.5237 180.8531 3177.1578 183.6467 3060.1409 170.8664 avg: 3122.9408 178.4554 new / old = 0.0571 You can see that as the number of ranges in the selection increases, the better we are (@250 ranges, we're 17.5 times faster than the old version). However, we are sometimes slower in the case of one range, sometimes by almost a factor of two. In the one slower case, I did select all for the textarea. The old version of this function has a flag for whether the caller wants a slow check or if it has enough information and we can check less. One advantage of this new patch is that we always do a full check, eliminating the need for this flag and some extra complexity. I am inclined to say that the extra slowness in the case of one range is OK, especially since it reduces complexity. In a bad case where there are 100K nodes visible on the screen, they're all selected, and we're drawing all of them, this means 2.6 seconds for checking selections instead of 1.3 seconds. If this is unacceptable, we can add a second code path in the case of 1 node, and preserve the fast check flag.
Attached patch Selection patch with timing (obsolete) — Splinter Review
Here is a patch with the timing code in it so I don't lose it. I'm going to clean it up and post a final patch soon.
Attached patch Patch (obsolete) — Splinter Review
This patch uses an interval tree to store selection ranges. The timing numbers were given above. I checked pages with many nodes like this: http://www.craptastic.org/bah/test.html and it does not regress the time I am able to perceive it taking doing "Select All," which I believe exercises the slow case discussed above.
This patch is required for reasonable spellcheck performance. Requesting blocking.
Flags: blocking1.8.1?
Comment on attachment 230226 [details] [diff] [review] Patch Uri: will you be able to get to this soon? If not, do you know of another qualified reviewer?
Attachment #230226 - Flags: review?(uriber)
Oops, ignore the changes to nsTextFrame.cpp in the patch above.
Below is my attempt at reviewing this. All of the issues save one are just questions, suggestions or minor comments. The one place where I think I might have found a flaw in the logic is marked by "***". I did not review (due to lack of qualification) XPCOM-related issues. So whoever does the sr (I'd recommend roc) should take an extra look to make sure everything is OK. ----------------------------------------- >+ PRInt32 mBeginIndex; // index into mRanges of this item It seems to me like you don't really need this field, as you're always accessing a RangeData through its index in mRanges. I assume you have it for symmetry with mEndIndex (and perhaps to enable sanity checks), but perhaps it's not worth the trouble (and time) of maintaining it? >+ // This data structure is an interval tree Is it? It certainly holds intervals, but I can't see a tree anywhere in the implementation, just sorted arrays. >+// If there is item(s) in the array equal to the imput point, we will return >+// a random imdex between or adjacent to this item(s). "input", "index". >+ // a common case is that we have no ranges yet >+ if (mRanges.Length() == 0) { >+ PRInt32 index = (PRInt32)mRanges.Length(); Wouldn't "index" always be 0? So why do we need it? >+ // Performance: 99% of the time, the beginning array and the ending array >+ // will be the same because the ranges do not overlap. We could save a few >+ // compares (which can be expensive) in this common case by special casing >+ // this. This would mean we do 2 compares instead of log(n) in the common >+ // case, and 2+log(n) in the worse case. This describes an unimplemented enhancement, right? Perhaps precede the comment with "XXX" to make that clear. Also, I'm not sure I understand what those 2 compares actually are. >+ // insert the range, being careful to revert everything on error to keep >+ // consistency >+ if (! mRanges.InsertElementAt(beginInsertionPoint, >+ RangeData(aItem, beginInsertionPoint, endInsertionPoint))) { >+ mRanges.RemoveElementAt(beginInsertionPoint); >+ return NS_ERROR_OUT_OF_MEMORY; >+ } I don't think you want to remove the element if inserting it failed. >+ // find the range's index & remove it >+ PRInt32 idx = -1; >+ PRUint32 i; >+ for (i = 0; i < mRanges.Length(); i ++) { >+ if (mRanges[i].mRange == aItem) { >+ idx = (PRInt32)i; >+ break; >+ } >+ } You could have used FindInsertionPoint (followed by MoveIndexToFirstBeginMatch and a forward scan) to do a binary search here, right? Is it not worth the trouble? >+ if (mRangeEndings[i] >= idx) >+ mRangeEndings[i] --; Any special reason for ">=" here, instead of ">"? >+ // this node matches, go back one >+ (*aIndex) --; I think it would be clearer if you say "the previous node matches", as you're referring to the node at (*aIndex)-1. >+// nsTypedSelection::MoveIndexToNextEndMismatch >+// >+// If there is one or more range ending at the requested position, we >+// want to start checking the first one that ends inside the range. This >+// moves the given index to point to the first index inside the range. >+// It may be one past the last item in the array. Either I didn't understand the comment, or I didn't understand the code, or they don't match. It seems to me that this actually moves the given index to point to the first range which ends after the requested position. >+ // adjust the indices in case of exact matches, FindInsertionPoint will >+ // give us a random index that would still be sorted if there is a match >+ if (aAllowAdjacent) { >+ // include adjacent points >+ rv = MoveIndexToNextEndMismatch(&beginningIndex, aEndNode, aEndOffset, >+ nsnull); >+ NS_ENSURE_SUCCESS(rv, rv); >+ rv = MoveIndexToFirstBeginMatch(&endingIndex, aBeginNode, aBeginOffset, >+ &mRangeEndings); >+ NS_ENSURE_SUCCESS(rv, rv); >+ } else { *** I don't think this works. First, we want to move beginningIndex to point to the first range in mRanges whose *beginning* is after aEndNode/aEndOffset, right? But to do so, you're using MoveIndexToNextEndMismatch, which only looks at *ends* of ranges (and compares them to aEndNode/aEndOffset). Then, we want to find the last range in mRangeEndings whose *end* is before aBeginNode/aBeginOffset, but we're using MoveIndexToFirstBeginMatch, which looks at *beginnings* of ranges. I don't have any comments or questions about nsTypedSelection::LookUpSelection. >+ // The caller wants to know if the node is entirely within the given range, >+ // so we have to check all intersecting ranges. >+ for (PRInt32 i = 0; i < overlappingRanges.Count(); i ++) { >+ PRBool nodeStartsBeforeRange, nodeEndsAfterRange; >+ if (NS_SUCCEEDED(nsRange::CompareNodeToRange(content, overlappingRanges[i], >+ &nodeStartsBeforeRange, >+ &nodeEndsAfterRange))) >+ { >+ if (!nodeStartsBeforeRange && !nodeEndsAfterRange) { >+ *aYes = PR_TRUE; >+ return NS_OK; > } > } > } I wonder if it would be possible to implement this inside GetRangesForIntervalCOMArray - using a flag that would tell it to look only for ranges fully contained within the given range, rather than those intersecting it. At least in theory, that should be more efficient than finding all the intersecting ranges, and then filtering them this way. However, if this is not a performance problem in practice, it might not be worth the trouble.
Attached patch Patch V2 (obsolete) — Splinter Review
This patch addresses all or Uri's review except for moving the "all inside" test into GetRangesForInterval. This flag is not used very much, I see only 2 calls: nsContentAreaDragDrop and nsPlaintextDataTransfer.cpp. The range finding code is already kind of scary. I think we should push as much of this complexity out of the function.
Attachment #230226 - Attachment is obsolete: true
Attachment #230461 - Flags: review?(uriber)
Attachment #230226 - Flags: review?(uriber)
Flags: blocking1.8.1? → blocking1.8.1+
Blocks: 345751
Comment on attachment 230461 [details] [diff] [review] Patch V2 The examples in the comments really help understanding what's going on - thanks. You haven't addressed four of the points from my review (detailed below). For (b) I assume the reason is that it's not worth the trouble. For (a), (c), and (d), I assume you either have a reason to keep them the way it is, or it's an oversight. If it's the latter - please address them before checking in. Either way, r=me. I should also say that this is really great code, IMO. --------------- (a) > >+ // a common case is that we have no ranges yet > >+ if (mRanges.Length() == 0) { > >+ PRInt32 index = (PRInt32)mRanges.Length(); > > Wouldn't "index" always be 0? So why do we need it? (b) > >+ // find the range's index & remove it > >+ PRInt32 idx = -1; > >+ PRUint32 i; > >+ for (i = 0; i < mRanges.Length(); i ++) { > >+ if (mRanges[i].mRange == aItem) { > >+ idx = (PRInt32)i; > >+ break; > >+ } > >+ } > > You could have used FindInsertionPoint (followed by MoveIndexToFirstBeginMatch > and a forward scan) to do a binary search here, right? Is it not worth the > trouble? (c) > >+ if (mRangeEndings[i] >= idx) > >+ mRangeEndings[i] --; > > Any special reason for ">=" here, instead of ">"? (d) > >+ // this node matches, go back one > >+ (*aIndex) --; > > I think it would be clearer if you say "the previous node matches", as you're > referring to the node at (*aIndex)-1.
Attachment #230461 - Flags: review?(uriber) → review+
Why not use nsRange instead of nsIDOMRange here? Give yourself some inline getters to simplify this code. I think ultimately we'd like to have the ranges in a selection be discontiguous, then we'd only need one array here. In fact we could use some balanced tree function so even AddRange could be O(log N). Future work. + if (mRanges.Length() == 0) { + PRInt32 index = (PRInt32)mRanges.Length(); You just said it's zero...
(In reply to comment #11) > > >+ // find the range's index & remove it > > >+ PRInt32 idx = -1; > > >+ PRUint32 i; > > >+ for (i = 0; i < mRanges.Length(); i ++) { > > >+ if (mRanges[i].mRange == aItem) { > > >+ idx = (PRInt32)i; > > >+ break; > > >+ } > > >+ } > > > > You could have used FindInsertionPoint (followed by MoveIndexToFirstBeginMatch > > and a forward scan) to do a binary search here, right? Is it not worth the > > trouble? In this case, I think it's a bad idea. The DOM comparisons are very expensive because they walk the DOM tree and do lots of virtual function calls. Doing pointer comparisons in a linear array is very fast, even though we do more of them. For small numbers of ranges, linear is definitely faster. I'm unsure at what point the logarithmic method would be faster, but a wild guess would be several thousand ranges. I post a new patch with the rest, and put a comment in about that.
Attached patch Patch V3 (obsolete) — Splinter Review
This addresses the rest of Uri's comments.
Attachment #230461 - Attachment is obsolete: true
Attachment #230583 - Flags: superreview?(roc)
Attachment #230175 - Attachment is obsolete: true
(In reply to comment #15) > did you see my comments? Do you mean that I should convert nsIDOMRanges to nsRange and use that internally? I don't see what that gets me. Where do you suggest using inline getters?
Whiteboard: [has patch][needs sr]
Comment on attachment 230583 [details] [diff] [review] Patch V3 >Index: layout/generic/nsSelection.cpp >+ nsresult FindInsertionPoint( >+ nsTArray<PRInt32>* aRemappingArray, I think this would be better as a const nsTArray<PRInt32>&, what do you think? > nsresult > nsTypedSelection::RemoveItem(nsIDOMRange *aItem) > { >+ // meed to update the range ending list to reflect the removed item "need" // nsTypedSelection::MoveIndexToFirstMatch Could you add like one sentence to the beginning of your comment that says what the function does, before you start talking about the parameters and stuff? In particular, what do you get back? Also, same comment about the remapping array being a const reference. It might be more readable here if you used a temporary instead of (*aIndex) directly, but it's up to you. // nsTypedSelection::MoveIndexToNextMismatch I like the function description here. My other couple of comments for MoveIndexToFirstMatch apply here though. >+NS_IMETHODIMP >+nsTypedSelection::GetRangesForIntervalCOMArray(nsIDOMNode* aBeginNode, PRInt32 aBeginOffset, >+ nsIDOMNode* aEndNode, PRInt32 aEndOffset, >+ PRBool aAllowAdjacent, >+ nsCOMArray<nsIDOMRange>* aRanges) >+{ ... >+ // adjust the indices in case of exact matches, FindInsertionPoint will >+ // give us a random index that would still be sorted if there is a match >+ if (aAllowAdjacent) { >+ // include adjacent points >+ // 1 3 5 5 5 8 9 10 10 10 11 12 >+ // ^ ^ <-- we have this >+ // ^ ^ <-- compute this range >+ // endingIndex beginningIndex >+ rv = MoveIndexToFirstMatch(&endingIndex, aBeginNode, aBeginOffset, >+ &mRangeEndings, PR_FALSE); >+ NS_ENSURE_SUCCESS(rv, rv); >+ rv = MoveIndexToNextMismatch(&beginningIndex, aEndNode, aEndOffset, >+ nsnull, PR_TRUE); >+ NS_ENSURE_SUCCESS(rv, rv); In this case, it seems like endingIndex and beginningIndex will be correct but reversed. Am I missing something? >-nsTypedSelection::ContainsNode(nsIDOMNode* aNode, PRBool aRecursive, PRBool* aYes) >+nsTypedSelection::ContainsNode(nsIDOMNode* aNode, PRBool aAllowPartial, PRBool* aYes) > { > if (!aYes) > return NS_ERROR_NULL_POINTER; >+ NS_ASSERTION(ValidateRanges(), "Ranges out of sync"); > *aYes = PR_FALSE; > >- // Iterate over the ranges in the selection checking for the content: >- PRInt32 cnt = mRangeArray.Count(); >- for (PRInt32 i = 0; i < cnt; ++i) >- { >- nsIDOMRange* range = mRangeArray[i]; >- if (!range) >- return NS_ERROR_UNEXPECTED; >+ if (mRanges.Length() == 0) >+ return NS_OK; > >- nsCOMPtr<nsIContent> content (do_QueryInterface(aNode)); >- if (content) >- { >- if (nsRange::IsNodeIntersectsRange(content, range)) >- { >- // If recursive, then we're done -- IsNodeIntersectsRange does the right thing >- if (aRecursive) >- { >- *aYes = PR_TRUE; >- return NS_OK; >- } >+ nsresult rv; >+ nsCOMPtr<nsIContent> content (do_QueryInterface(aNode, &rv)); >+ NS_ENSURE_SUCCESS(rv, rv); I think you can move this down and avoid doing it at all in some of your early-return cases. >Index: content/base/public/nsISelection2.idl >+#include "nsISupports.idl" Strictly speaking you don't need this, you get it through nsISelection.idl Looks good otherwise.
Attachment #230583 - Flags: superreview?(roc) → superreview+
(In reply to comment #16) > (In reply to comment #15) > > did you see my comments? > > Do you mean that I should convert nsIDOMRanges to nsRange and use that > internally? I don't see what that gets me. > > Where do you suggest using inline getters? Add some to nsRange.
(In reply to comment #18) > > Where do you suggest using inline getters? > > Add some to nsRange. I see, so I should add a static IID to nsRange when people add ranges. What about if somebody gives us an nsIDOMRange that isn't an nsRange? I know this won't really happen in practice, but the APIs are designed for this kind of thing. This patch needs to get in last week for 2.0b2. Can we just get it in and fix the ranges on trunk if you still think it's important? For Brian's comments: I added more comments on the index selection to make it clearer, and changed the remapping arrays to be const pointers (we can't use const references since the parameter is optional).
Attachment #230583 - Attachment is obsolete: true
Fixed on trunk.
Whiteboard: [has patch][needs sr] → [has patch]
Whiteboard: [has patch] → [baking on trunk]
Attached patch Final branch patch (obsolete) — Splinter Review
Attachment #230921 - Flags: approval1.8.1?
Attachment #230940 - Flags: superreview?
Attachment #230940 - Flags: review?
Attachment #230940 - Flags: superreview?(bryner)
Attachment #230940 - Flags: superreview?
Attachment #230940 - Flags: review?(bryner)
Attachment #230940 - Flags: review?
Attachment #230940 - Flags: superreview?(bryner)
Attachment #230940 - Flags: superreview+
Attachment #230940 - Flags: review?(bryner)
Attachment #230940 - Flags: review+
Attachment #230921 - Attachment is obsolete: true
Attachment #230943 - Flags: approval1.8.1?
Attachment #230921 - Flags: approval1.8.1?
Attachment #230943 - Attachment is obsolete: true
Attachment #230944 - Flags: approval1.8.1?
Attachment #230943 - Flags: approval1.8.1?
Trivial update for header files is on trunk.
Comment on attachment 230944 [details] [diff] [review] Hopefully the last try :( a=dbaron on behalf of drivers. Please check in to MOZILLA_1_8_BRANCH and mark fixed1.8.1 once you have done so. But please keep an eye out for potential regressions.
Attachment #230944 - Flags: approval1.8.1? → approval1.8.1+
Status: NEW → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Whiteboard: [baking on trunk]
Depends on: 347036
Depends on: 392746
Depends on: 486072
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: