Closed
Bug 345099
Opened 18 years ago
Closed 18 years ago
Sort ranges in the selection for faster comparison
Categories
(Core :: DOM: Selection, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.8.1beta2
People
(Reporter: brettw, Assigned: brettw)
References
Details
(Keywords: fixed1.8.1)
Attachments
(3 files, 6 obsolete files)
52.54 KB,
patch
|
Details | Diff | Splinter Review | |
711 bytes,
patch
|
bryner
:
review+
bryner
:
superreview+
|
Details | Diff | Splinter Review |
53.00 KB,
patch
|
dbaron
:
approval1.8.1+
|
Details | Diff | Splinter Review |
The selection contains a sequence of ranges. These ranges are currently stored in an undefined order. These ranges should instead be sorted in DOM order so they can be searched and manipulated faster.
Binary searching should then be used for those cases where the list is iterated over.
Assignee | ||
Updated•18 years ago
|
Target Milestone: --- → mozilla1.8.1beta2
Comment 1•18 years ago
|
||
It should be noted that currently (since bug 73373 was fixed), you can select multiple ranges (using CTRL), and when copying, they will be copied into the clipboard in the order in which they were selected (not in the DOM order!).
It seems that fixing the current bug will change that behavior. I'm not sure whether that's a bad thing or not - I personally don't see the value in multiple selection.
Assignee | ||
Comment 2•18 years ago
|
||
Changing copy behavior seems fine to me, possibly even more desirable, probably not much worse. If totally necessary, we can have store that order separately.
Assignee | ||
Comment 3•18 years ago
|
||
Here are some timings for the patch I'm working on for nsTypedSelection::LookUpSelection. Units are microseconds (1/1M seconds) per comparison. I computed this by timing the calls in a 10K loop, and dividing by 10K.
Old New
0 ranges: 0.316 0.0154
0.312 0.0150
0.310 0.0151
avg: 0.312 0.0152 new/old = 0.048
1 range: 10.1995 10.9430
10.2452 10.7635
13.7798 25.9530 <-- slower
12.0907 11.4539
avg: 11.5788 14.7784 new/old = 1.276
9 ranges: 61.5726 36.1701
56.0549 45.2374
64.4519 36.0935
avg: 60.6931 39.1670 new/old = 0.645
250 ranges: 3131.5237 180.8531
3177.1578 183.6467
3060.1409 170.8664
avg: 3122.9408 178.4554 new / old = 0.0571
You can see that as the number of ranges in the selection increases, the better we are (@250 ranges, we're 17.5 times faster than the old version). However, we are sometimes slower in the case of one range, sometimes by almost a factor of two. In the one slower case, I did select all for the textarea.
The old version of this function has a flag for whether the caller wants a slow check or if it has enough information and we can check less. One advantage of this new patch is that we always do a full check, eliminating the need for this flag and some extra complexity.
I am inclined to say that the extra slowness in the case of one range is OK, especially since it reduces complexity. In a bad case where there are 100K nodes visible on the screen, they're all selected, and we're drawing all of them, this means 2.6 seconds for checking selections instead of 1.3 seconds. If this is unacceptable, we can add a second code path in the case of 1 node, and preserve the fast check flag.
Assignee | ||
Comment 4•18 years ago
|
||
Here is a patch with the timing code in it so I don't lose it. I'm going to clean it up and post a final patch soon.
Assignee | ||
Comment 5•18 years ago
|
||
This patch uses an interval tree to store selection ranges. The timing numbers were given above. I checked pages with many nodes like this: http://www.craptastic.org/bah/test.html and it does not regress the time I am able to perceive it taking doing "Select All," which I believe exercises the slow case discussed above.
Assignee | ||
Comment 6•18 years ago
|
||
This patch is required for reasonable spellcheck performance. Requesting blocking.
Flags: blocking1.8.1?
Assignee | ||
Comment 7•18 years ago
|
||
Comment on attachment 230226 [details] [diff] [review]
Patch
Uri: will you be able to get to this soon? If not, do you know of another qualified reviewer?
Attachment #230226 -
Flags: review?(uriber)
Assignee | ||
Comment 8•18 years ago
|
||
Oops, ignore the changes to nsTextFrame.cpp in the patch above.
Comment 9•18 years ago
|
||
Below is my attempt at reviewing this. All of the issues save one are just questions, suggestions or minor comments. The one place where I think I might have found a flaw in the logic is marked by "***".
I did not review (due to lack of qualification) XPCOM-related issues. So whoever does the sr (I'd recommend roc) should take an extra look to make sure everything is OK.
-----------------------------------------
>+ PRInt32 mBeginIndex; // index into mRanges of this item
It seems to me like you don't really need this field, as you're always accessing a RangeData through its index in mRanges. I assume you have it for symmetry with mEndIndex (and perhaps to enable sanity checks), but perhaps it's not worth the trouble (and time) of maintaining it?
>+ // This data structure is an interval tree
Is it? It certainly holds intervals, but I can't see a tree anywhere in the implementation, just sorted arrays.
>+// If there is item(s) in the array equal to the imput point, we will return
>+// a random imdex between or adjacent to this item(s).
"input", "index".
>+ // a common case is that we have no ranges yet
>+ if (mRanges.Length() == 0) {
>+ PRInt32 index = (PRInt32)mRanges.Length();
Wouldn't "index" always be 0? So why do we need it?
>+ // Performance: 99% of the time, the beginning array and the ending array
>+ // will be the same because the ranges do not overlap. We could save a few
>+ // compares (which can be expensive) in this common case by special casing
>+ // this. This would mean we do 2 compares instead of log(n) in the common
>+ // case, and 2+log(n) in the worse case.
This describes an unimplemented enhancement, right? Perhaps precede the comment with "XXX" to make that clear. Also, I'm not sure I understand what those 2 compares actually are.
>+ // insert the range, being careful to revert everything on error to keep
>+ // consistency
>+ if (! mRanges.InsertElementAt(beginInsertionPoint,
>+ RangeData(aItem, beginInsertionPoint, endInsertionPoint))) {
>+ mRanges.RemoveElementAt(beginInsertionPoint);
>+ return NS_ERROR_OUT_OF_MEMORY;
>+ }
I don't think you want to remove the element if inserting it failed.
>+ // find the range's index & remove it
>+ PRInt32 idx = -1;
>+ PRUint32 i;
>+ for (i = 0; i < mRanges.Length(); i ++) {
>+ if (mRanges[i].mRange == aItem) {
>+ idx = (PRInt32)i;
>+ break;
>+ }
>+ }
You could have used FindInsertionPoint (followed by MoveIndexToFirstBeginMatch and a forward scan) to do a binary search here, right? Is it not worth the trouble?
>+ if (mRangeEndings[i] >= idx)
>+ mRangeEndings[i] --;
Any special reason for ">=" here, instead of ">"?
>+ // this node matches, go back one
>+ (*aIndex) --;
I think it would be clearer if you say "the previous node matches", as you're referring to the node at (*aIndex)-1.
>+// nsTypedSelection::MoveIndexToNextEndMismatch
>+//
>+// If there is one or more range ending at the requested position, we
>+// want to start checking the first one that ends inside the range. This
>+// moves the given index to point to the first index inside the range.
>+// It may be one past the last item in the array.
Either I didn't understand the comment, or I didn't understand the code, or they don't match. It seems to me that this actually moves the given index to point to the first range which ends after the requested position.
>+ // adjust the indices in case of exact matches, FindInsertionPoint will
>+ // give us a random index that would still be sorted if there is a match
>+ if (aAllowAdjacent) {
>+ // include adjacent points
>+ rv = MoveIndexToNextEndMismatch(&beginningIndex, aEndNode, aEndOffset,
>+ nsnull);
>+ NS_ENSURE_SUCCESS(rv, rv);
>+ rv = MoveIndexToFirstBeginMatch(&endingIndex, aBeginNode, aBeginOffset,
>+ &mRangeEndings);
>+ NS_ENSURE_SUCCESS(rv, rv);
>+ } else {
*** I don't think this works. First, we want to move beginningIndex to point to the first range in mRanges whose *beginning* is after aEndNode/aEndOffset, right? But to do so, you're using MoveIndexToNextEndMismatch, which only looks at *ends* of ranges (and compares them to aEndNode/aEndOffset).
Then, we want to find the last range in mRangeEndings whose *end* is before aBeginNode/aBeginOffset, but we're using MoveIndexToFirstBeginMatch, which looks at *beginnings* of ranges.
I don't have any comments or questions about nsTypedSelection::LookUpSelection.
>+ // The caller wants to know if the node is entirely within the given range,
>+ // so we have to check all intersecting ranges.
>+ for (PRInt32 i = 0; i < overlappingRanges.Count(); i ++) {
>+ PRBool nodeStartsBeforeRange, nodeEndsAfterRange;
>+ if (NS_SUCCEEDED(nsRange::CompareNodeToRange(content, overlappingRanges[i],
>+ &nodeStartsBeforeRange,
>+ &nodeEndsAfterRange)))
>+ {
>+ if (!nodeStartsBeforeRange && !nodeEndsAfterRange) {
>+ *aYes = PR_TRUE;
>+ return NS_OK;
> }
> }
> }
I wonder if it would be possible to implement this inside GetRangesForIntervalCOMArray - using a flag that would tell it to look only for ranges fully contained within the given range, rather than those intersecting it. At least in theory, that should be more efficient than finding all the intersecting ranges, and then filtering them this way. However, if this is not a performance problem in practice, it might not be worth the trouble.
Assignee | ||
Comment 10•18 years ago
|
||
This patch addresses all or Uri's review except for moving the "all inside" test into GetRangesForInterval. This flag is not used very much, I see only 2 calls: nsContentAreaDragDrop and nsPlaintextDataTransfer.cpp.
The range finding code is already kind of scary. I think we should push as much of this complexity out of the function.
Attachment #230226 -
Attachment is obsolete: true
Attachment #230461 -
Flags: review?(uriber)
Attachment #230226 -
Flags: review?(uriber)
Updated•18 years ago
|
Flags: blocking1.8.1? → blocking1.8.1+
Comment 11•18 years ago
|
||
Comment on attachment 230461 [details] [diff] [review]
Patch V2
The examples
in the comments really help understanding what's going on - thanks.
You haven't addressed four of the points from my review (detailed below).
For (b) I assume the reason is that it's not worth the trouble.
For (a), (c), and (d), I assume you either have a reason to keep them the way it is, or it's an oversight. If it's the latter - please address them before checking in.
Either way, r=me.
I should also say that this is really great code, IMO.
---------------
(a)
> >+ // a common case is that we have no ranges yet
> >+ if (mRanges.Length() == 0) {
> >+ PRInt32 index = (PRInt32)mRanges.Length();
>
> Wouldn't "index" always be 0? So why do we need it?
(b)
> >+ // find the range's index & remove it
> >+ PRInt32 idx = -1;
> >+ PRUint32 i;
> >+ for (i = 0; i < mRanges.Length(); i ++) {
> >+ if (mRanges[i].mRange == aItem) {
> >+ idx = (PRInt32)i;
> >+ break;
> >+ }
> >+ }
>
> You could have used FindInsertionPoint (followed by MoveIndexToFirstBeginMatch
> and a forward scan) to do a binary search here, right? Is it not worth the
> trouble?
(c)
> >+ if (mRangeEndings[i] >= idx)
> >+ mRangeEndings[i] --;
>
> Any special reason for ">=" here, instead of ">"?
(d)
> >+ // this node matches, go back one
> >+ (*aIndex) --;
>
> I think it would be clearer if you say "the previous node matches", as you're
> referring to the node at (*aIndex)-1.
Attachment #230461 -
Flags: review?(uriber) → review+
Why not use nsRange instead of nsIDOMRange here? Give yourself some inline getters to simplify this code.
I think ultimately we'd like to have the ranges in a selection be discontiguous, then we'd only need one array here. In fact we could use some balanced tree function so even AddRange could be O(log N). Future work.
+ if (mRanges.Length() == 0) {
+ PRInt32 index = (PRInt32)mRanges.Length();
You just said it's zero...
Assignee | ||
Comment 13•18 years ago
|
||
(In reply to comment #11)
> > >+ // find the range's index & remove it
> > >+ PRInt32 idx = -1;
> > >+ PRUint32 i;
> > >+ for (i = 0; i < mRanges.Length(); i ++) {
> > >+ if (mRanges[i].mRange == aItem) {
> > >+ idx = (PRInt32)i;
> > >+ break;
> > >+ }
> > >+ }
> >
> > You could have used FindInsertionPoint (followed by MoveIndexToFirstBeginMatch
> > and a forward scan) to do a binary search here, right? Is it not worth the
> > trouble?
In this case, I think it's a bad idea. The DOM comparisons are very expensive because they walk the DOM tree and do lots of virtual function calls. Doing pointer comparisons in a linear array is very fast, even though we do more of them. For small numbers of ranges, linear is definitely faster. I'm unsure at what point the logarithmic method would be faster, but a wild guess would be several thousand ranges.
I post a new patch with the rest, and put a comment in about that.
Assignee | ||
Comment 14•18 years ago
|
||
This addresses the rest of Uri's comments.
Attachment #230461 -
Attachment is obsolete: true
Attachment #230583 -
Flags: superreview?(roc)
did you see my comments?
Assignee | ||
Updated•18 years ago
|
Attachment #230175 -
Attachment is obsolete: true
Assignee | ||
Comment 16•18 years ago
|
||
(In reply to comment #15)
> did you see my comments?
Do you mean that I should convert nsIDOMRanges to nsRange and use that internally? I don't see what that gets me.
Where do you suggest using inline getters?
Assignee | ||
Updated•18 years ago
|
Whiteboard: [has patch][needs sr]
Comment 17•18 years ago
|
||
Comment on attachment 230583 [details] [diff] [review]
Patch V3
>Index: layout/generic/nsSelection.cpp
>+ nsresult FindInsertionPoint(
>+ nsTArray<PRInt32>* aRemappingArray,
I think this would be better as a const nsTArray<PRInt32>&, what do you think?
> nsresult
> nsTypedSelection::RemoveItem(nsIDOMRange *aItem)
> {
>+ // meed to update the range ending list to reflect the removed item
"need"
// nsTypedSelection::MoveIndexToFirstMatch
Could you add like one sentence to the beginning of your comment that says what the function does, before you start talking about the parameters and stuff? In particular, what do you get back?
Also, same comment about the remapping array being a const reference.
It might be more readable here if you used a temporary instead of (*aIndex) directly, but it's up to you.
// nsTypedSelection::MoveIndexToNextMismatch
I like the function description here. My other couple of comments for MoveIndexToFirstMatch apply here though.
>+NS_IMETHODIMP
>+nsTypedSelection::GetRangesForIntervalCOMArray(nsIDOMNode* aBeginNode, PRInt32 aBeginOffset,
>+ nsIDOMNode* aEndNode, PRInt32 aEndOffset,
>+ PRBool aAllowAdjacent,
>+ nsCOMArray<nsIDOMRange>* aRanges)
>+{
...
>+ // adjust the indices in case of exact matches, FindInsertionPoint will
>+ // give us a random index that would still be sorted if there is a match
>+ if (aAllowAdjacent) {
>+ // include adjacent points
>+ // 1 3 5 5 5 8 9 10 10 10 11 12
>+ // ^ ^ <-- we have this
>+ // ^ ^ <-- compute this range
>+ // endingIndex beginningIndex
>+ rv = MoveIndexToFirstMatch(&endingIndex, aBeginNode, aBeginOffset,
>+ &mRangeEndings, PR_FALSE);
>+ NS_ENSURE_SUCCESS(rv, rv);
>+ rv = MoveIndexToNextMismatch(&beginningIndex, aEndNode, aEndOffset,
>+ nsnull, PR_TRUE);
>+ NS_ENSURE_SUCCESS(rv, rv);
In this case, it seems like endingIndex and beginningIndex will be correct but reversed. Am I missing something?
>-nsTypedSelection::ContainsNode(nsIDOMNode* aNode, PRBool aRecursive, PRBool* aYes)
>+nsTypedSelection::ContainsNode(nsIDOMNode* aNode, PRBool aAllowPartial, PRBool* aYes)
> {
> if (!aYes)
> return NS_ERROR_NULL_POINTER;
>+ NS_ASSERTION(ValidateRanges(), "Ranges out of sync");
> *aYes = PR_FALSE;
>
>- // Iterate over the ranges in the selection checking for the content:
>- PRInt32 cnt = mRangeArray.Count();
>- for (PRInt32 i = 0; i < cnt; ++i)
>- {
>- nsIDOMRange* range = mRangeArray[i];
>- if (!range)
>- return NS_ERROR_UNEXPECTED;
>+ if (mRanges.Length() == 0)
>+ return NS_OK;
>
>- nsCOMPtr<nsIContent> content (do_QueryInterface(aNode));
>- if (content)
>- {
>- if (nsRange::IsNodeIntersectsRange(content, range))
>- {
>- // If recursive, then we're done -- IsNodeIntersectsRange does the right thing
>- if (aRecursive)
>- {
>- *aYes = PR_TRUE;
>- return NS_OK;
>- }
>+ nsresult rv;
>+ nsCOMPtr<nsIContent> content (do_QueryInterface(aNode, &rv));
>+ NS_ENSURE_SUCCESS(rv, rv);
I think you can move this down and avoid doing it at all in some of your early-return cases.
>Index: content/base/public/nsISelection2.idl
>+#include "nsISupports.idl"
Strictly speaking you don't need this, you get it through nsISelection.idl
Looks good otherwise.
Attachment #230583 -
Flags: superreview?(roc) → superreview+
(In reply to comment #16)
> (In reply to comment #15)
> > did you see my comments?
>
> Do you mean that I should convert nsIDOMRanges to nsRange and use that
> internally? I don't see what that gets me.
>
> Where do you suggest using inline getters?
Add some to nsRange.
Assignee | ||
Comment 19•18 years ago
|
||
(In reply to comment #18)
> > Where do you suggest using inline getters?
>
> Add some to nsRange.
I see, so I should add a static IID to nsRange when people add ranges. What about if somebody gives us an nsIDOMRange that isn't an nsRange? I know this won't really happen in practice, but the APIs are designed for this kind of thing.
This patch needs to get in last week for 2.0b2. Can we just get it in and fix the ranges on trunk if you still think it's important?
For Brian's comments: I added more comments on the index selection to make it clearer, and changed the remapping arrays to be const pointers (we can't use const references since the parameter is optional).
Assignee | ||
Comment 20•18 years ago
|
||
Attachment #230583 -
Attachment is obsolete: true
Assignee | ||
Updated•18 years ago
|
Whiteboard: [has patch] → [baking on trunk]
Assignee | ||
Comment 22•18 years ago
|
||
Attachment #230921 -
Flags: approval1.8.1?
Assignee | ||
Comment 23•18 years ago
|
||
Attachment #230940 -
Flags: superreview?
Attachment #230940 -
Flags: review?
Assignee | ||
Updated•18 years ago
|
Attachment #230940 -
Flags: superreview?(bryner)
Attachment #230940 -
Flags: superreview?
Attachment #230940 -
Flags: review?(bryner)
Attachment #230940 -
Flags: review?
Updated•18 years ago
|
Attachment #230940 -
Flags: superreview?(bryner)
Attachment #230940 -
Flags: superreview+
Attachment #230940 -
Flags: review?(bryner)
Attachment #230940 -
Flags: review+
Assignee | ||
Comment 24•18 years ago
|
||
Attachment #230921 -
Attachment is obsolete: true
Attachment #230943 -
Flags: approval1.8.1?
Attachment #230921 -
Flags: approval1.8.1?
Assignee | ||
Comment 25•18 years ago
|
||
Attachment #230943 -
Attachment is obsolete: true
Attachment #230944 -
Flags: approval1.8.1?
Attachment #230943 -
Flags: approval1.8.1?
Assignee | ||
Comment 26•18 years ago
|
||
Trivial update for header files is on trunk.
Comment on attachment 230944 [details] [diff] [review]
Hopefully the last try :(
a=dbaron on behalf of drivers. Please check in to MOZILLA_1_8_BRANCH and mark fixed1.8.1 once you have done so.
But please keep an eye out for potential regressions.
Attachment #230944 -
Flags: approval1.8.1? → approval1.8.1+
Assignee | ||
Updated•18 years ago
|
Status: NEW → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Whiteboard: [baking on trunk]
You need to log in
before you can comment on or make changes to this bug.
Description
•