If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Sort ranges in the selection for faster comparison

RESOLVED FIXED in mozilla1.8.1beta2

Status

()

Core
Selection
RESOLVED FIXED
11 years ago
9 years ago

People

(Reporter: Brett Wilson, Assigned: Brett Wilson)

Tracking

({fixed1.8.1})

1.8 Branch
mozilla1.8.1beta2
fixed1.8.1
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.8.1 +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 6 obsolete attachments)

(Assignee)

Description

11 years ago
The selection contains a sequence of ranges. These ranges are currently stored in an undefined order. These ranges should instead be sorted in DOM order so they can be searched and manipulated faster.

Binary searching should then be used for those cases where the list is iterated over.
(Assignee)

Updated

11 years ago
Blocks: 345048
(Assignee)

Updated

11 years ago
Blocks: 345101
(Assignee)

Updated

11 years ago
Target Milestone: --- → mozilla1.8.1beta2
It should be noted that currently (since bug 73373 was fixed), you can select multiple ranges (using CTRL), and when copying, they will be copied into the clipboard in the order in which they were selected (not in the DOM order!).

It seems that fixing the current bug will change that behavior. I'm not sure whether that's a bad thing or not - I personally don't see the value in multiple selection.
(Assignee)

Comment 2

11 years ago
Changing copy behavior seems fine to me, possibly even more desirable, probably not much worse. If totally necessary, we can have store that order separately.
(Assignee)

Updated

11 years ago
Blocks: 343610
(Assignee)

Updated

11 years ago
Blocks: 234936
(Assignee)

Comment 3

11 years ago
Here are some timings for the patch I'm working on for nsTypedSelection::LookUpSelection. Units are microseconds (1/1M seconds) per comparison. I computed this by timing the calls in a 10K loop, and dividing by 10K.

             Old        New
0 ranges:    0.316      0.0154
             0.312      0.0150
             0.310      0.0151
       avg:  0.312      0.0152  new/old = 0.048

1 range:     10.1995    10.9430
             10.2452    10.7635
             13.7798    25.9530 <-- slower
             12.0907    11.4539
       avg:  11.5788    14.7784  new/old = 1.276

9 ranges:    61.5726    36.1701
             56.0549    45.2374
             64.4519    36.0935
       avg:  60.6931    39.1670  new/old = 0.645

250 ranges:  3131.5237   180.8531
             3177.1578   183.6467
             3060.1409   170.8664
       avg:  3122.9408   178.4554  new / old = 0.0571

You can see that as the number of ranges in the selection increases, the better we are (@250 ranges, we're 17.5 times faster than the old version). However, we are sometimes slower in the case of one range, sometimes by almost a factor of two. In the one slower case, I did select all for the textarea.

The old version of this function has a flag for whether the caller wants a slow check or if it has enough information and we can check less. One advantage of this new patch is that we always do a full check, eliminating the need for this flag and some extra complexity.

I am inclined to say that the extra slowness in the case of one range is OK, especially since it reduces complexity. In a bad case where there are 100K nodes visible on the screen, they're all selected, and we're drawing all of them, this means 2.6 seconds for checking selections instead of 1.3 seconds. If this is unacceptable, we can add a second code path in the case of 1 node, and preserve the fast check flag.
(Assignee)

Comment 4

11 years ago
Created attachment 230175 [details] [diff] [review]
Selection patch with timing

Here is a patch with the timing code in it so I don't lose it. I'm going to clean it up and post a final patch soon.
(Assignee)

Comment 5

11 years ago
Created attachment 230226 [details] [diff] [review]
Patch

This patch uses an interval tree to store selection ranges. The timing numbers were given above. I checked pages with many nodes like this: http://www.craptastic.org/bah/test.html and it does not regress the time I am able to perceive it taking doing "Select All," which I believe exercises the slow case discussed above.
(Assignee)

Comment 6

11 years ago
This patch is required for reasonable spellcheck performance. Requesting blocking.
Flags: blocking1.8.1?
(Assignee)

Comment 7

11 years ago
Comment on attachment 230226 [details] [diff] [review]
Patch

Uri: will you be able to get to this soon? If not, do you know of another qualified reviewer?
Attachment #230226 - Flags: review?(uriber)
(Assignee)

Comment 8

11 years ago
Oops, ignore the changes to nsTextFrame.cpp in the patch above.
Below is my attempt at reviewing this. All of the issues save one are just questions, suggestions or minor comments. The one place where I think I might have found a flaw in the logic is marked by "***".

I did not review (due to lack of qualification) XPCOM-related issues. So whoever does the sr (I'd recommend roc) should take an extra look to make sure everything is OK.

-----------------------------------------

>+  PRInt32 mBeginIndex; // index into mRanges of this item

It seems to me like you don't really need this field, as you're always accessing a RangeData through its index in mRanges. I assume you have it for symmetry with mEndIndex (and perhaps to enable sanity checks), but perhaps it's not worth the trouble (and time) of maintaining it?

>+  // This data structure is an interval tree

Is it? It certainly holds intervals, but I can't see a tree anywhere in the implementation, just sorted arrays.

>+//    If there is item(s) in the array equal to the imput point, we will return
>+//    a random imdex between or adjacent to this item(s).

"input", "index".

>+  // a common case is that we have no ranges yet
>+  if (mRanges.Length() == 0) {
>+    PRInt32 index = (PRInt32)mRanges.Length();

Wouldn't "index" always be 0? So why do we need it?

>+  // Performance: 99% of the time, the beginning array and the ending array
>+  // will be the same because the ranges do not overlap. We could save a few
>+  // compares (which can be expensive) in this common case by special casing
>+  // this. This would mean we do 2 compares instead of log(n) in the common
>+  // case, and 2+log(n) in the worse case.

This describes an unimplemented enhancement, right? Perhaps precede the comment with "XXX" to make that clear. Also, I'm not sure I understand what those 2 compares actually are.

>+  // insert the range, being careful to revert everything on error to keep
>+  // consistency
>+  if (! mRanges.InsertElementAt(beginInsertionPoint,
>+        RangeData(aItem, beginInsertionPoint, endInsertionPoint))) {
>+    mRanges.RemoveElementAt(beginInsertionPoint);
>+    return NS_ERROR_OUT_OF_MEMORY;
>+  }

I don't think you want to remove the element if inserting it failed.

>+  // find the range's index & remove it
>+  PRInt32 idx = -1;
>+  PRUint32 i;
>+  for (i = 0; i < mRanges.Length(); i ++) {
>+    if (mRanges[i].mRange == aItem) {
>+      idx = (PRInt32)i;
>+      break;
>+    }
>+  }

You could have used FindInsertionPoint (followed by MoveIndexToFirstBeginMatch and a forward scan) to do a binary search here, right? Is it not worth the trouble?

>+    if (mRangeEndings[i] >= idx)
>+      mRangeEndings[i] --;

Any special reason for ">=" here, instead of ">"?

>+    // this node matches, go back one
>+    (*aIndex) --;


I think it would be clearer if you say "the previous node matches", as you're referring to the node at (*aIndex)-1.

>+// nsTypedSelection::MoveIndexToNextEndMismatch
>+//
>+//    If there is one or more range ending at the requested position, we
>+//    want to start checking the first one that ends inside the range. This
>+//    moves the given index to point to the first index inside the range.
>+//    It may be one past the last item in the array.

Either I didn't understand the comment, or I didn't understand the code, or they don't match. It seems to me that this actually moves the given index to point to the first range which ends after the requested position.

>+  // adjust the indices in case of exact matches, FindInsertionPoint will
>+  // give us a random index that would still be sorted if there is a match
>+  if (aAllowAdjacent) {
>+    // include adjacent points
>+    rv = MoveIndexToNextEndMismatch(&beginningIndex, aEndNode, aEndOffset,
>+                                    nsnull);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+    rv = MoveIndexToFirstBeginMatch(&endingIndex, aBeginNode, aBeginOffset,
>+                                    &mRangeEndings);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+  } else {

*** I don't think this works. First, we want to move beginningIndex to point to the first range in mRanges whose *beginning* is after aEndNode/aEndOffset, right? But to do so, you're using MoveIndexToNextEndMismatch, which only looks at *ends* of ranges (and compares them to aEndNode/aEndOffset).
Then, we want to find the last range in mRangeEndings whose *end* is before aBeginNode/aBeginOffset, but we're using MoveIndexToFirstBeginMatch, which looks at *beginnings* of ranges.

I don't have any comments or questions about nsTypedSelection::LookUpSelection.

>+  // The caller wants to know if the node is entirely within the given range,
>+  // so we have to check all intersecting ranges.
>+  for (PRInt32 i = 0; i < overlappingRanges.Count(); i ++) {
>+    PRBool nodeStartsBeforeRange, nodeEndsAfterRange;
>+    if (NS_SUCCEEDED(nsRange::CompareNodeToRange(content, overlappingRanges[i],
>+                                                 &nodeStartsBeforeRange,
>+                                                 &nodeEndsAfterRange)))
>+    {
>+      if (!nodeStartsBeforeRange && !nodeEndsAfterRange) {
>+        *aYes = PR_TRUE;
>+        return NS_OK;
>       }
>     }
>   }

I wonder if it would be possible to implement this inside GetRangesForIntervalCOMArray - using a flag that would tell it to look only for ranges fully contained within the given range, rather than those intersecting it. At least in theory, that should be more efficient than finding all the intersecting ranges, and then filtering them this way. However, if this is not a performance problem in practice, it might not be worth the trouble.
(Assignee)

Comment 10

11 years ago
Created attachment 230461 [details] [diff] [review]
Patch V2

This patch addresses all or Uri's review except for moving the "all inside" test into GetRangesForInterval. This flag is not used very much, I see only 2 calls: nsContentAreaDragDrop and nsPlaintextDataTransfer.cpp.

The range finding code is already kind of scary. I think we should push as much of this complexity out of the function.
Attachment #230226 - Attachment is obsolete: true
Attachment #230461 - Flags: review?(uriber)
Attachment #230226 - Flags: review?(uriber)

Updated

11 years ago
Flags: blocking1.8.1? → blocking1.8.1+
(Assignee)

Updated

11 years ago
Blocks: 345751
Comment on attachment 230461 [details] [diff] [review]
Patch V2

The examples 
in the comments really help understanding what's going on - thanks.

You haven't addressed four of the points from my review (detailed below).
For (b) I assume the reason is that it's not worth the trouble.

For (a), (c), and (d), I assume you either have a reason to keep them the way it is, or it's an oversight. If it's the latter - please address them before checking in.

Either way, r=me.
I should also say that this is really great code, IMO.

---------------
(a)
> >+  // a common case is that we have no ranges yet
> >+  if (mRanges.Length() == 0) {
> >+    PRInt32 index = (PRInt32)mRanges.Length();
> 
> Wouldn't "index" always be 0? So why do we need it?

(b)
> >+  // find the range's index & remove it
> >+  PRInt32 idx = -1;
> >+  PRUint32 i;
> >+  for (i = 0; i < mRanges.Length(); i ++) {
> >+    if (mRanges[i].mRange == aItem) {
> >+      idx = (PRInt32)i;
> >+      break;
> >+    }
> >+  }
> 
> You could have used FindInsertionPoint (followed by MoveIndexToFirstBeginMatch
> and a forward scan) to do a binary search here, right? Is it not worth the
> trouble?
 
(c)
> >+    if (mRangeEndings[i] >= idx)
> >+      mRangeEndings[i] --;
> 
> Any special reason for ">=" here, instead of ">"?

(d)
> >+    // this node matches, go back one
> >+    (*aIndex) --;
> 
> I think it would be clearer if you say "the previous node matches", as you're
> referring to the node at (*aIndex)-1.
Attachment #230461 - Flags: review?(uriber) → review+
Why not use nsRange instead of nsIDOMRange here? Give yourself some inline getters to simplify this code.

I think ultimately we'd like to have the ranges in a selection be discontiguous, then we'd only need one array here. In fact we could use some balanced tree function so even AddRange could be O(log N). Future work.

+  if (mRanges.Length() == 0) {
+    PRInt32 index = (PRInt32)mRanges.Length();

You just said it's zero...

(Assignee)

Comment 13

11 years ago
(In reply to comment #11)
> > >+  // find the range's index & remove it
> > >+  PRInt32 idx = -1;
> > >+  PRUint32 i;
> > >+  for (i = 0; i < mRanges.Length(); i ++) {
> > >+    if (mRanges[i].mRange == aItem) {
> > >+      idx = (PRInt32)i;
> > >+      break;
> > >+    }
> > >+  }
> > 
> > You could have used FindInsertionPoint (followed by MoveIndexToFirstBeginMatch
> > and a forward scan) to do a binary search here, right? Is it not worth the
> > trouble?

In this case, I think it's a bad idea. The DOM comparisons are very expensive because they walk the DOM tree and do lots of virtual function calls. Doing pointer comparisons in a linear array is very fast, even though we do more of them. For small numbers of ranges, linear is definitely faster. I'm unsure at what point the logarithmic method would be faster, but a wild guess would be several thousand ranges.

I post a new patch with the rest, and put a comment in about that.
(Assignee)

Comment 14

11 years ago
Created attachment 230583 [details] [diff] [review]
Patch V3

This addresses the rest of Uri's comments.
Attachment #230461 - Attachment is obsolete: true
Attachment #230583 - Flags: superreview?(roc)
did you see my comments?
(Assignee)

Updated

11 years ago
Attachment #230175 - Attachment is obsolete: true
(Assignee)

Comment 16

11 years ago
(In reply to comment #15)
> did you see my comments?

Do you mean that I should convert nsIDOMRanges to nsRange and use that internally? I don't see what that gets me.

Where do you suggest using inline getters?
(Assignee)

Updated

11 years ago
Whiteboard: [has patch][needs sr]
Comment on attachment 230583 [details] [diff] [review]
Patch V3

>Index: layout/generic/nsSelection.cpp
>+  nsresult FindInsertionPoint(
>+      nsTArray<PRInt32>* aRemappingArray,

I think this would be better as a const nsTArray<PRInt32>&, what do you think?

> nsresult
> nsTypedSelection::RemoveItem(nsIDOMRange *aItem)
> {
>+  // meed to update the range ending list to reflect the removed item
"need"

 	
// nsTypedSelection::MoveIndexToFirstMatch

Could you add like one sentence to the beginning of your comment that says what the function does, before you start talking about the parameters and stuff?  In particular, what do you get back?

Also, same comment about the remapping array being a const reference.

It might be more readable here if you used a temporary instead of (*aIndex) directly, but it's up to you.

// nsTypedSelection::MoveIndexToNextMismatch

I like the function description here.  My other couple of comments for MoveIndexToFirstMatch apply here though.

>+NS_IMETHODIMP
>+nsTypedSelection::GetRangesForIntervalCOMArray(nsIDOMNode* aBeginNode, PRInt32 aBeginOffset,
>+                                               nsIDOMNode* aEndNode, PRInt32 aEndOffset,
>+                                               PRBool aAllowAdjacent,
>+                                               nsCOMArray<nsIDOMRange>* aRanges)
>+{
...
>+  // adjust the indices in case of exact matches, FindInsertionPoint will
>+  // give us a random index that would still be sorted if there is a match
>+  if (aAllowAdjacent) {
>+    // include adjacent points
>+    // 1 3 5 5 5 8 9 10 10 10 11 12
>+    //       ^          ^            <-- we have this
>+    //     ^                  ^      <-- compute this range
>+    //  endingIndex   beginningIndex
>+    rv = MoveIndexToFirstMatch(&endingIndex, aBeginNode, aBeginOffset,
>+                               &mRangeEndings, PR_FALSE);
>+    NS_ENSURE_SUCCESS(rv, rv);
>+    rv = MoveIndexToNextMismatch(&beginningIndex, aEndNode, aEndOffset,
>+                                 nsnull, PR_TRUE);
>+    NS_ENSURE_SUCCESS(rv, rv);

In this case, it seems like endingIndex and beginningIndex will be correct but reversed.  Am I missing something?

>-nsTypedSelection::ContainsNode(nsIDOMNode* aNode, PRBool aRecursive, PRBool* aYes)
>+nsTypedSelection::ContainsNode(nsIDOMNode* aNode, PRBool aAllowPartial, PRBool* aYes)
> {
>   if (!aYes)
>     return NS_ERROR_NULL_POINTER;
>+  NS_ASSERTION(ValidateRanges(), "Ranges out of sync");
>   *aYes = PR_FALSE;
> 
>-  // Iterate over the ranges in the selection checking for the content:
>-  PRInt32 cnt = mRangeArray.Count();
>-  for (PRInt32 i = 0; i < cnt; ++i)
>-  {
>-    nsIDOMRange* range = mRangeArray[i];
>-    if (!range)
>-      return NS_ERROR_UNEXPECTED;
>+  if (mRanges.Length() == 0)
>+    return NS_OK;
> 
>-    nsCOMPtr<nsIContent> content (do_QueryInterface(aNode));
>-    if (content)
>-    {
>-      if (nsRange::IsNodeIntersectsRange(content, range))
>-      {
>-        // If recursive, then we're done -- IsNodeIntersectsRange does the right thing
>-        if (aRecursive)
>-        {
>-          *aYes = PR_TRUE;
>-          return NS_OK;
>-        }
>+  nsresult rv;
>+  nsCOMPtr<nsIContent> content (do_QueryInterface(aNode, &rv));
>+  NS_ENSURE_SUCCESS(rv, rv);

I think you can move this down and avoid doing it at all in some of your early-return cases.

>Index: content/base/public/nsISelection2.idl
>+#include "nsISupports.idl"

Strictly speaking you don't need this, you get it through nsISelection.idl

Looks good otherwise.
Attachment #230583 - Flags: superreview?(roc) → superreview+
(In reply to comment #16)
> (In reply to comment #15)
> > did you see my comments?
> 
> Do you mean that I should convert nsIDOMRanges to nsRange and use that
> internally? I don't see what that gets me.
> 
> Where do you suggest using inline getters?

Add some to nsRange.

(Assignee)

Comment 19

11 years ago
(In reply to comment #18)
> > Where do you suggest using inline getters?
> 
> Add some to nsRange.

I see, so I should add a static IID to nsRange when people add ranges. What about if somebody gives us an nsIDOMRange that isn't an nsRange? I know this won't really happen in practice, but the APIs are designed for this kind of thing.

This patch needs to get in last week for 2.0b2. Can we just get it in and fix the ranges on trunk if you still think it's important?

For Brian's comments: I added more comments on the index selection to make it clearer, and changed the remapping arrays to be const pointers (we can't use const references since the parameter is optional).
(Assignee)

Comment 20

11 years ago
Created attachment 230919 [details] [diff] [review]
Final trunk patch
Attachment #230583 - Attachment is obsolete: true
(Assignee)

Comment 21

11 years ago
Fixed on trunk.
Whiteboard: [has patch][needs sr] → [has patch]
(Assignee)

Updated

11 years ago
Whiteboard: [has patch] → [baking on trunk]
(Assignee)

Comment 22

11 years ago
Created attachment 230921 [details] [diff] [review]
Final branch patch
Attachment #230921 - Flags: approval1.8.1?
(Assignee)

Comment 23

11 years ago
Created attachment 230940 [details] [diff] [review]
Trivial update for IDL file
Attachment #230940 - Flags: superreview?
Attachment #230940 - Flags: review?
(Assignee)

Updated

11 years ago
Attachment #230940 - Flags: superreview?(bryner)
Attachment #230940 - Flags: superreview?
Attachment #230940 - Flags: review?(bryner)
Attachment #230940 - Flags: review?
Attachment #230940 - Flags: superreview?(bryner)
Attachment #230940 - Flags: superreview+
Attachment #230940 - Flags: review?(bryner)
Attachment #230940 - Flags: review+
(Assignee)

Comment 24

11 years ago
Created attachment 230943 [details] [diff] [review]
New branch patch including bustage fix
Attachment #230921 - Attachment is obsolete: true
Attachment #230943 - Flags: approval1.8.1?
Attachment #230921 - Flags: approval1.8.1?
(Assignee)

Comment 25

11 years ago
Created attachment 230944 [details] [diff] [review]
Hopefully the last try :(
Attachment #230943 - Attachment is obsolete: true
Attachment #230944 - Flags: approval1.8.1?
Attachment #230943 - Flags: approval1.8.1?
(Assignee)

Comment 26

11 years ago
Trivial update for header files is on trunk.
Comment on attachment 230944 [details] [diff] [review]
Hopefully the last try :(

a=dbaron on behalf of drivers.  Please check in to MOZILLA_1_8_BRANCH and mark fixed1.8.1 once you have done so.

But please keep an eye out for potential regressions.
Attachment #230944 - Flags: approval1.8.1? → approval1.8.1+
(Assignee)

Updated

11 years ago
Status: NEW → RESOLVED
Last Resolved: 11 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Whiteboard: [baking on trunk]
Depends on: 346729

Updated

11 years ago
Depends on: 347036

Updated

10 years ago
Depends on: 392746
Depends on: 486072
You need to log in before you can comment on or make changes to this bug.