Closed Bug 998507 Opened 10 years ago Closed 10 years ago

add BinarySearch

Categories

(Core :: MFBT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: luke, Assigned: luke)

References

Details

Attachments

(1 file, 1 obsolete file)

Attached patch hoist-binary-lookup (obsolete) — Splinter Review
Dan already found a ton of duplicate open-coded binary searches.  I need to add yet another in bug 998490, so might as well factor out a nice, reasonably generic algorithm.

In theory, all these binary searches could instead use std::lower_bound, but it's a pain to use in practice b/c of iterator template goo and because I always have to lookup how to interpret the meaning of the returned iterator.  This patch adds something a bit less generic but good enough for a lot of use cases, I think.
Attachment #8409147 - Flags: review?(sunfish)
There's similar work also in bug 917918.  CCing the author of that work for any particular commentary he has, and/or for when those patches might (or at this point, might not) land.  Clearly we need to do *something* here, in any event.
Flags: needinfo?(georg.fritzsche)
Yes, ideally, and soon :)
Comment on attachment 8409147 [details] [diff] [review]
hoist-binary-lookup

Review of attachment 8409147 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/AsmJSSignalHandlers.cpp
@@ +211,5 @@
>  
> +struct GetHeapAccess
> +{
> +    const AsmJSModule &module;
> +    GetHeapAccess(const AsmJSModule &module) : module(module) {}

Add an explicit keyword here.

::: mfbt/BinarySearch.h
@@ +30,5 @@
> + */
> +
> +template <typename T, typename ValueAtOp>
> +bool
> +BinarySearch(T target, ValueAtOp valueAt, size_t begin, size_t end, size_t *match)

Alternatively, this could be:

template <typename Iterator, typename ValueAtOp>
Iterator
BinarySearch(T target, ValueAtOp valueAt, Iterator begin, Iterator end)

returning end when the element is not found. The warped C++ part of my brain finds this more C++-ish and more generic. And I think it's still better than std::lower_bound because it either returns a found element or end, not a confusing "bound" value. But, it is a little more confusing when your actual use case is searching through a range of size_t values. So I'm ok with this either way.

@@ +37,5 @@
> +
> +  size_t low = begin;
> +  size_t high = end;
> +  while (low != high) {
> +    size_t middle = low + (high - low) / 2;

... although note that if you do switch to iterators, this subtract would return a signed value, making this divide signed, while it'd be nice to keep this as an unsigned divide. An explicit cast would suffice.
Attachment #8409147 - Flags: review?(sunfish) → review+
Comment on attachment 8409147 [details] [diff] [review]
hoist-binary-lookup

Review of attachment 8409147 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/BinarySearch.h
@@ +38,5 @@
> +  size_t low = begin;
> +  size_t high = end;
> +  while (low != high) {
> +    size_t middle = low + (high - low) / 2;
> +    T middleValue = valueAt(middle);

In bug 917918 comment 9, Waldo suggested adding (non-complexity-changing) assertions to verify that BinarySearch's input data is actually sorted.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #1)
> There's similar work also in bug 917918.  CCing the author of that work for
> any particular commentary he has, and/or for when those patches might (or at
> this point, might not) land.  Clearly we need to do *something* here, in any
> event.

Sadly i didn't manage to finish that up yet as other work kept coming up that had to be done first.
I'm not sure yet how soon that will change, so it's best not to wait on me (and maybe take over that bug?).
Flags: needinfo?(georg.fritzsche)
Comment on attachment 8409147 [details] [diff] [review]
hoist-binary-lookup

Review of attachment 8409147 [details] [diff] [review]:
-----------------------------------------------------------------

If we want to unify the custom binary search implementations we have across the tree we also need:
* a variant that takes a comparison functor for when operator< doesn't directly work on the items
* if there was no match, return the index where the value would be inserted
I tried to address those over in bug 917918.
(In reply to Dan Gohman [:sunfish] from comment #3)
> Alternatively, this could be:
> 
> template <typename Iterator, typename ValueAtOp>
> Iterator
> BinarySearch(T target, ValueAtOp valueAt, Iterator begin, Iterator end)

Over time, I've really grown to appreciate the style of functions returning a boolean indicating success/failure, rather than return values that must be interpreted in a callee-specific way to determine success failure.  I've also grown fond of de-genericizing algorithms as much as possible since concrete-ness improves intuition and understandability (even after just reading Stepanov's new book a few months ago).  If we find some use cases which want this level of genericity, then I'd be happy to either generalize this (or add another, more generic version); but I'd prefer to wait until that use case arises.

(In reply to Chris Peterson (:cpeterson) from comment #4)
Sounds good, I'll add that.

(In reply to Georg Fritzsche [:gfritzsche] from comment #6)
> If we want to unify the custom binary search implementations we have across
> the tree we also need:
> * a variant that takes a comparison functor for when operator< doesn't
> directly work on the items

This seems like it'd be a separate overload (the BinarySearchIf).  I'd prefer to hold off on adding that until someone writes a patch to convert a specific use case.

> * if there was no match, return the index where the value would be inserted
> I tried to address those over in bug 917918.

I can modify the current impl to return this index via 'match' (which would be renamed to matchOrInsertionPoint).
This patch changes the outparam to return the insertion point on miss and adds tests for this.

Also, on second thought, it's nicer to template over the container type as bug 917918 didmak instead of having some abstract "ValueAt" projection.  For one thing, it's more intuitive without sacrificing generality.  For another, it avoids the need to creating a trivial ValueAtOp when you simply want c[i].
Attachment #8409147 - Attachment is obsolete: true
Attachment #8410292 - Flags: review?(sunfish)
Comment on attachment 8410292 [details] [diff] [review]
hoist-binary-lookup

Review of attachment 8410292 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/BinarySearch.h
@@ +13,5 @@
> + * The algorithm searches the given container 'c' over the sorted index range
> + * [begin, end) for an index 'i' where 'c[i] == target'. If such an index 'i' is
> + * found, BinarySearch returns 'true' and the index is returned via the outparam
> + * 'matchOrInsertionPoint'. If no index is found, BinarySearch returns 'false'
> + * and the outparam returns the first index in [begin, end) where 'target' can

Since it can return end in the case that target is greater than any element in the container, the return is actally in [begin, end], which is perhaps better described as [begin, end+1).

@@ +35,5 @@
> +  size_t low = begin;
> +  size_t high = end;
> +  while (low != high) {
> +    size_t middle = low + (high - low) / 2;
> +    T middleValue = c[middle];

This should be const T& instead of plain T so that it doesn't copy the element, in case of a container with expensive-to-copy elements.
Attachment #8410292 - Flags: review?(sunfish) → review+
Luke: your patch does not include the loop assertions that Waldo recommended in bug 917918 comment 9 (that I linked in this bug's comment 4):

  MOZ_ASSERT(c[low] <= c[mid]);
  MOZ_ASSERT(c[middle] <= c[high]);
  MOZ_ASSERT(c[middle] <= c[high]);
https://hg.mozilla.org/mozilla-central/rev/0de0cedf6a7d
https://hg.mozilla.org/mozilla-central/rev/139c2e8c264f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
You need to log in before you can comment on or make changes to this bug.