Closed Bug 715419 Opened 12 years ago Closed 11 years ago

consider specializing Array.prototype.sort when given the comparator is "return arg1 - arg2"

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla21

People

(Reporter: luke, Assigned: laszio.bugzilla)

References

Details

(Whiteboard: [good first bug][mentor=luke@mozila.com][lang=c++])

Attachments

(4 files, 11 obsolete files)

2.68 KB, text/plain
Details
1.06 KB, application/gzip
Details
1.01 KB, patch
luke
: review+
Details | Diff | Splinter Review
12.96 KB, patch
luke
: review+
Details | Diff | Splinter Review
We'd want to measure first, but I bet a huge percent of the uses of a comparator arg to Array.prototype.sort pass exactly this function since that is the canonical way to get an arithmetic sort.  Other common patterns we may consider measuring:

 return arg1 > arg2
 return arg2 - arg1
 return arg1.prop1 $ arg2.prop2 (where $ is either <, >, or -)

A quick'n'dirty experiment shows a 20x speedup when sorting an array of 200,000 random integers with "return arg1 - arg2".

This came up in the context of bug 715181.
Whiteboard: [good first bug][mentor=luke@mozila.com] → [good first bug][mentor=luke@mozila.com][lang=c++]
I'm seeing a speedup of ~40x (!) with this test script:

  var arr = [];
  for (var i = 0; i < 200000; ++i)
    arr.push(Math.floor(Math.random() * 200000));

  var bef = new Date();
  arr.sort(function(a,b){return a-b;});
  var aft = new Date();
  print(aft - bef);

(In reply to Luke Wagner [:luke] from comment #0)
> We'd want to measure first

How? :)
(In reply to Reuben Morais [:reuben] from comment #1)
> I'm seeing a speedup of ~40x (!) with this test script:

Yeah... calling JS from C++ is teh slow.  I should point out there is a bug to the whole of 'sort()' in JS (JS-calling-JS is fast) like v8 does: bug 462300.  Hacking trunk to work directly on ints (probably the same way you did in comment 1) shows us still 4x faster:

 trunk:                      290ms
 v8:                         66ms
 a.toInt32() <= b.toInt32(): 15ms

> (In reply to Luke Wagner [:luke] from comment #0)
> > We'd want to measure first
> 
> How? :)

First, you would write the bytecode-pattern-matching portion of the patch (an example where we already do this is js::str_replace; see first big comment).  Alternatively, we could match the pattern based on the SSA produced by script->analysis(), but let's start simple.  Then you would instrument the browser (add a pair of counters to JSRuntime (sortWithComparator and sortWithPatternMatchedComparator) which you can printf() from JS_DestroyRuntime) to see what percent of sort()-with-comparator was matching the pattern.  This data (as well as logging missed patterns) could suggest adding/dropping patterns.

If all that is too much trouble, I'm sure we'd take a patch for the pattern getarg0|getarg1|sub; it seems common enough on jsfiddle ;-)
s/a bug to the whole of/a bug to implement the whole of/
(In reply to Luke Wagner [:luke] from comment #2)
> (In reply to Reuben Morais [:reuben] from comment #1)
> > I'm seeing a speedup of ~40x (!) with this test script:
> 
> Yeah... calling JS from C++ is teh slow.  I should point out there is a bug
> to the whole of 'sort()' in JS (JS-calling-JS is fast) like v8 does: bug
> 462300.  Hacking trunk to work directly on ints (probably the same way you
> did in comment 1) shows us still 4x faster:
> 
>  trunk:                      290ms
>  v8:                         66ms
>  a.toInt32() <= b.toInt32(): 15ms
> 
> > (In reply to Luke Wagner [:luke] from comment #0)
> > > We'd want to measure first
> > 
> > How? :)
> 
> First, you would write the bytecode-pattern-matching portion of the patch
> (an example where we already do this is js::str_replace; see first big
> comment).  Alternatively, we could match the pattern based on the SSA
> produced by script->analysis(), but let's start simple.  Then you would
> instrument the browser (add a pair of counters to JSRuntime
> (sortWithComparator and sortWithPatternMatchedComparator) which you can
> printf() from JS_DestroyRuntime) to see what percent of
> sort()-with-comparator was matching the pattern.  This data (as well as
> logging missed patterns) could suggest adding/dropping patterns.
> 
> If all that is too much trouble, I'm sure we'd take a patch for the pattern
> getarg0|getarg1|sub; it seems common enough on jsfiddle ;-)

To do my initial tests I used a simple memcmp against getarg getarg sub ret. I'll use the proper opcode sizes like str_replace does and introduce the counter.

But my question in comment 1 was rather "How are we going to get measurements from real world usage?". Or are the results from benchmarks and Firefox, for example, enough for this?
(In reply to Reuben Morais [:reuben] from comment #4)
> But my question in comment 1 was rather "How are we going to get
> measurements from real world usage?". Or are the results from benchmarks and
> Firefox, for example, enough for this?

I can tell you now that there is only one use of 'sort' in SunSpider, V8, and Kraken.  But we all know they aren't the "real web" ;).  One idea is to try running membuster (http://gregor-wagner.com/tmp/mem) which opens the top 150 web sites.  Another is to browser around for a day.
(In reply to Luke Wagner [:luke] from comment #5)
> (In reply to Reuben Morais [:reuben] from comment #4)
> > But my question in comment 1 was rather "How are we going to get
> > measurements from real world usage?". Or are the results from benchmarks and
> > Firefox, for example, enough for this?
> 
> I can tell you now that there is only one use of 'sort' in SunSpider, V8,
> and Kraken.  But we all know they aren't the "real web" ;).  One idea is to
> try running membuster (http://gregor-wagner.com/tmp/mem) which opens the top
> 150 web sites.  Another is to browser around for a day.

Real like has kept me away from this bug for the last two weeks, sorry.
So, I just tested printing the stats on JS_DestroyRuntime and ~JSRuntime, and it works like expected on the JS shell, but not in the browser - it always says 0/0.
Inspecting with gdb I can see the counters being incremented, but JS_DestroyRuntime is only ever called with different JSRuntime, with no stats registered. Do you know what could cause this?
I don't.  So you have the stats in JSRuntime and you are printing in JS_DestroyRuntime?  That should work...
@luke, I would like to work on this.

I'm new to SpiderMonkey and would love if someone could explain in a summarized comment what needs to be done after all the discussions above.

Thanks.
comment 0 is still a valid summary.  The discussion above was on the subject of measuring how often this pattern really shows up.  The first step would be to write a function to match the bytecode pattern for the patterns listed in comment 0.  For a similar example, see LambdaIsGetElem in jsstr.cpp.
I'm also new to Mozilla and interested in this problem. I'm wondering if this problem was already solved or not? I checked bug #715422 and found that the solution was already integrated, although not exactly same to the original patch, into mozilla-central. There seems nothing committed for this one, though.

If there is nobody currently working on it, may I pick it up? My plan is to recognize and specialize the simple comparators early and then merge the logics with the default comparator. In this way it would also be benefited by the optimizations on the default comparator. The biggest challenge should be on recognizing. Currently I've very little knowledge about the internal JS code representation, could anybody please give me some hints?
Hi Ting-Yuan!  As comment 9 points out, a good place to get started on pattern-matching is LambdaIsGetElem in jsstr.cpp.  If you build a js shell (https://developer.mozilla.org/en-US/docs/SpiderMonkey/Build_Documentation), you can use the "dis()" function to dump the bytecode so you can play around with different functions to get a feel for what bytecode comes out and what patterns you want to match.  As comment 9 points out, it would be great to work on the pattern-matcher first and try it out on real sites to see what percentage of sorting comparators it matches (and try to improve this by looking at the functions that don't match).
Attached patch WIP patch (obsolete) — Splinter Review
Hi Luke, I'm posting a WIP path and need some feedback. Am I in the right way?

I have 4 problems/notes:

1. When writing the replacing comparator in native/C++, I realized that I was actually doing a translation. Is it possible to generalize this idea (specialization) to JIT compiling the comparator, subject to some conditions like the size of array and comparator?

2. In SubOperation() which performs the the real subtraction in the OP_SUB handler in interpreter, there is a check:

   if (!res->setNumber(d) && !(lhs.isDouble() || rhs.isDouble()))
       types::TypeScript::MonitorOverflow(cx, script, pc);

   It seems to have something to do with type inference. I have no idea how to deal with it. If it has side-effects, how could I simulate the behaviour while eliminating the comparator?

3. The "-" operator requires both of its operands to be numerics. The behaviour is thus different to the default comparator which compares alphabetically. So I separated the cases, instead of that mentioned earlier.

4. I tried to sort/compare the elements directly while they are all INT32s rather than casting to doubles in advance, which is the general case. It started to be slower than the general case with array sizes at about 100000. Therefore I decided not to add the All-INT32 specialization with the comparator specialization. The All-Strings case should also hold with much smaller array size since the string-to-numeric conversions cost much more.

Thanks!
Attachment #670681 - Flags: feedback?(luke)
(In reply to Ting-Yuan Huang from comment #12)
First of all: great job and thanks!  This is definitely in the right direction and I think all we need is to think about all the corner cases and measure measure measure.  In particular, to measure this type of thing, it'd be great to instrument a browser that keeps a count:
 - number of sort() calls
 - number of sort() calls with a comparator
 - % matched by matchNumericComparator

The second measurement is to see micro-benchmarks that show the speedup for sorting various arrays.

> 1. When writing the replacing comparator in native/C++, I realized that I
> was actually doing a translation. Is it possible to generalize this idea
> (specialization) to JIT compiling the comparator, subject to some conditions
> like the size of array and comparator?

Indeed, we will JIT the comparator when it gets hot.  The problem is that calling from C++ into JIT code isn't very fast.  In bug 797131, we just made it faster to call into JIT code (you'll notice the new FastInvokeGuard in the SortComparatorFunction branch now).  However, I still think for trivial comparators (like x - y) we can get a good speedup over the general JIT case.

> 2. In SubOperation() which performs the the real subtraction in the OP_SUB
> handler in interpreter, there is a check:
> 
>    if (!res->setNumber(d) && !(lhs.isDouble() || rhs.isDouble()))
>        types::TypeScript::MonitorOverflow(cx, script, pc);
> 
>    It seems to have something to do with type inference. I have no idea how
> to deal with it. If it has side-effects, how could I simulate the behaviour
> while eliminating the comparator?

Great question and diligence!  Type Inference monitors the result of +,-,etc so that it can tell whether the results always fit in an int32; if we running the script, we would need to check for overflow.  However, since aren't, I think we can safely ignore this.

> 3. The "-" operator requires both of its operands to be numerics. The
> behaviour is thus different to the default comparator which compares
> alphabetically. So I separated the cases, instead of that mentioned earlier.

Yes, I think you did it the right way.  I hope I didn't mislead you above.

> 4. I tried to sort/compare the elements directly while they are all INT32s
> rather than casting to doubles in advance, which is the general case. It
> started to be slower than the general case with array sizes at about 100000.
> Therefore I decided not to add the All-INT32 specialization with the
> comparator specialization. The All-Strings case should also hold with much
> smaller array size since the string-to-numeric conversions cost much more.

I'm surprised that doing an in-place All-INT32 specialization isn't faster.  Was the code essentially equivalent to the current "allInts" case (except with a numeric comparator instead of SortComparatorLexicographicInt32)?
Thanks! I'll do more measurements to get more confidence, in the way discussed earlier.

As for JIT, is it possible to give some hints, since the "hotness" of the comparator is highly predictable? Another characteristic of comparators is that they are usually pure. Could this be useful for JIT?

I'll double confirm the ALL-INT32 case and post the patch and result. By the way, the benchmarks ran on a newest 64-bits cpu(3rd-gen core i7). It may not be the case on other architectures, especially 32-bits ones without hard-fp.
(In reply to Ting-Yuan Huang from comment #14)
> Thanks! I'll do more measurements to get more confidence, in the way discussed earlier.

Great!

> As for JIT, is it possible to give some hints, since the "hotness" of the
> comparator is highly predictable?

We could force more eager compilation.  We wouldn't want to jit-compile the very first time, though, since it is useful to run a few times to let TI infer the types being used in the comparator.

> Another characteristic of comparators is
> that they are usually pure. Could this be useful for JIT?

In theory, it could.  However, calling out to C++ already causes most of the deoptimizations (spilling registers to memory, preventing hoisting, etc).
Attached patch WIP patch (obsolete) — Splinter Review
Attached is the updated patch that includes specialization on the "-" comparator over All-Int32s. A check was also made to ensure the bytecode is valid (JSFunction::isInterpreted()).

Now the All-Int32s case always runs faster when sorting on-the-fly. It is my
poor implementation that leads to poor performance and misleading results previously.

Here is the micro-benchmark results:

array size: 200000
Contents                        old(ms) new(ms) Speedup(x)
random int in 200000                347      21      16.52
random double in 200000             369      27      13.67
random string of int in [0,200000)  630      33      19.09
random string of double in Q.15    2203      76      28.99
{}                                  121      54       2.24
{} with a customized valueOf()     1408      52      27.08
{} with a customized toString()    2217      70      31.67
Number(random int in [0, 200000))   732      33      22.18
String(random int in [0, 200000))  1032      39      26.46

The speedup comes from comparators them-self, and maybe also the ability to convert the array elements into single and simple types in advance and at once.

Seems not bad. But, the "real world" statistics from http://gregor-wagner.com/tmp/mem shows that:

Total sort() called:   16468
Default comparator:    13013
Unmatched comparator:  3453
Matched comparator:    2

And this specialization introduces a little bit of overheads of check to every sort() with a customized comparator.

I'm trying to dump the unmatched comparator and will update the results later.
Attachment #670681 - Attachment is obsolete: true
Attachment #670681 - Flags: feedback?(luke)
Attached file recorded comparators
Attached are recorded comparators, where:
logs/a.txt contains the full, unmodified records.
logs/b.txt removed two dominants.

Also recorded are sizes of arrays. Most of the arrays are shorter than 10. The longest one is only 55.
Attached patch WIP patch (obsolete) — Splinter Review
When matched, it is faster even when the array size is only 2. That is, the overhead of a specialization is no more than a ordinary JS comparison. It is slower when the array size is 1 (the comparator is not called) that can be avoided completely.

From the statistics, about half of the size of array in question is 1. Am I breaking something by modifying the check from "if (i == 0) to "if (i < 2)"?
Attachment #671704 - Attachment is obsolete: true
Attachment #672142 - Flags: feedback?(luke)
Wow, those are excellent speedups on the test-cases!

I think the results from the "real world" page set just indicates that the front page of major sites don't do much sorting (and when they do, it is for tiny arrays).  That kinda makes sense; they are trying to optimize page-load speed.  Ideally, we'd try to find webapps that *did* use sort heavily (I'm guessing GMail, GDocs, facebook, twitter, ...) and see what the statistics looked like on them.  Would you be willing to try some of these popular JS-heavy sites or others that you visit?

Honestly, this is a pretty clean patch and "return x - y" is such a canonical pattern (with such a good speedup) that I'm inclined to land the change regardless.  I'll start reviewing in the meantime.
Comment on attachment 672142 [details] [diff] [review]
WIP patch

Review of attachment 672142 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, this is a great patch!  With the template factoring suggested below, I definitely think we should go forward with this.

SpiderMonkey tries to have a very consistent style (although you can always find places where we are inconsistent), so I several purely stylistic nits below; please don't be discouraged, that is normal for new contributors :)

::: js/src/jsarray.cpp
@@ +2048,5 @@
> +    return true;
> +}
> +
> +bool
> +ComparatorInt32_nsub(const Value &a, const Value &b, bool *lessOrEqualp)

Instread of "sub" and "nsub", could we have "LeftMinusRight" and "RightMinusLeft" ?

@@ +2059,5 @@
> +typedef enum {
> +    M_NONE = 0,
> +    M_NUMERIC_SUB,
> +    M_NUMERIC_NSUB
> +} MATCHED_COMPARATOR;

For greater style consistency, could you write:

enum MatchedComparator {
  Match_None,
  Match_LeftMinusRight,
  Match_RightMinusLeft
};

@@ +2064,5 @@
> +
> +/*
> + * Specialize when comparators are known.
> + *
> + * Currently it only knows the following and the counterpart:

The wording here is a bit ambiguous.  Instead of listing the bytecode, you could just write:
  Currently we only match "return x - y" and "return y - x".

@@ +2075,5 @@
> + * @param   v   The comparator in JavaScript, should be a function.
> + * @return      The corresponding native comparator on success.
> + *              NULL when failed.
> + */
> +MATCHED_COMPARATOR __attribute__((optimize("O0")))

Why is this __attribute__ necessary?  If there is a compiler bug, then it should be documented in a comment and inside an #ifdef.  If it is to affect inlining, we have portable macros JS_ALWAYS_INLINE and JS_NEVER_INLINE, depending on what you intended.  Honestly, this function doesn't seem too hot so I wouldn't think we'd want to do either.

@@ +2166,5 @@
>  
>      uint32_t len;
>      if (!GetLengthProperty(cx, obj, &len))
>          return false;
> +    if (len < 2) {

I think you are correct.  Perhaps we could add a simple comment:
  [].sort() = [] and [a].sort() = [a]

@@ +2231,5 @@
>              allInts = allInts && v.isInt32();
>          }
>  
>          n = vec.length();
> +        if (n < 2) {

Here I am not so sure.  is [,2] == [,2] or is it [2]?  I'd have to read the spec carefully.  Since this is, iiuc, a pure optimization, and not necessary for correctness, why don't we leave it unmodified for simplicity?

@@ +2296,5 @@
>              }
>          } else {
> +            MATCHED_COMPARATOR comp = MatchNumericComparator(fval);
> +
> +            if (M_NONE != comp) {

I think the standard spidermonkey style would be to write "if (comp != MatchNone)"

@@ +2331,5 @@
> +                    }
> +
> +                    /* Order vec[n:2n-1] using numElements.index */
> +                    for (size_t i = 0; i < n; i ++)
> +                        vec[n + i] = vec[numElements[i].elementIndex];

This algorithm has the same overall structure as the above StringifiedElement sort, if you hoist out the element type and the preparation function.  Do you suppose we could factor it out into a function template?  That would decrease the complexity added by this patch even more, further justifying its addition.
Attachment #672142 - Flags: feedback?(luke) → feedback+
Attached patch WIP patch (obsolete) — Splinter Review
Thanks, Luke. I refined the patch according to your comments.

Here are something worth to mention:

1. You are right, the "if (n < 2)" case is a bug that
   [,a].sort() should be [a,] rather than [,a].

2. __attribute__((optimize("O0"))__ is my fault that I used it to help debug
   and forgot to clean it up.

3. As for refactoring, instead of making the whole "Convert-Sort-Reorder" a template, I tried another approach. Because the string comparator relies on the string buffer, the construction of the comparator can't be relaxed from preparation code easily.
Attachment #672701 - Flags: review?(luke)
Attached are the comparator statistics by surfing gmail and facebook for a while.

Now the probability of successfully matching a comparator increased to 4/201. The majority of size of arrays with customized comparators is still "1" (108 out of 201).
Interesting results, thanks.  I see a lot of "return a.x - b.x" which makes sense as a common pattern (as witnessed by the 'keys' function to Python's sort).  Perhaps we should match that one day, but it would be a lot more work.

One pattern that we could pick up pretty easily is:
  return a>b?1:a<b?-1:0
Since this is, iiuc, equivalent to - as a comparator.
Comment on attachment 672701 [details] [diff] [review]
WIP patch

Review of attachment 672701 [details] [diff] [review]:
-----------------------------------------------------------------

This is looking great!  Thank you for addressing my previous comments.  The two hoisted Sort functions look good.  With this last set of style nits I think we'll be ready to land.

::: js/src/jsarray.cpp
@@ +2024,5 @@
> +
> +bool
> +ComparatorNumericLeftMinusRight(const NumericElement &a,
> +                                const NumericElement &b,
> +                                bool *lessOrEqualp)

Recently we switched to 99-charactor columns, so this should all (barely) fit on one line.  (Same goes for the 3 other comparators.)

@@ +2059,5 @@
> +
> +    return true;
> +}
> +
> +enum {

Can you name the enum ComparatorMatchResult and use it below as the return value of MatchNumericComparator and the type of 'comp'?

@@ +2127,5 @@
> +typedef bool (*ComparatorNumeric)(const NumericElement &a,
> +                                  const NumericElement &b,
> +                                  bool *lessOrEqualp);
> +
> +ComparatorNumeric SortComparatorNumerics[] = {

Can you move these two tables above so each table is right after its function definitions?

@@ +2134,5 @@
> +    ComparatorNumericRightMinusLeft
> +};
> +
> +template<typename K, typename C>
> +bool MergeSortByKey(AutoValueVector &vec, K &keys, C comparator, size_t len)

Can you JS_ASSERT(vec.length() == len)?  Also, the SM style is to put the function name on the beginning of a new line:

bool
MergeSortByKey

Can you also do this for the two functions below?

@@ +2177,5 @@
> +        cursor = sb.length();
> +    }
> +
> +    /* Resize strElements so we can perform MergeSort. */
> +    JS_ALWAYS_TRUE(strElements.resize(2 * len));

Can you also hoist this statement into MergeSortByKey and add JS_ASSERT(vec.length() == vec.reserved() / 2)?

I think your reason for not doing this is that this line is logically coupled to the above 'reserve' call.  How about you create a function: ReserveSpaceToMergeSortByKey that contains the 'reserve' call and call it from both sort functions?  That way the double trickery is abstracted from the two callers.

@@ +2192,5 @@
> +/*
> + * Sort Values as numerics.
> + *
> + * To minimize #conversions, SortNumerically first converts all Values to
> + * numerics at once, then sort the elements by these cached numerics.

"Numerically" is a good adjective, but in the comment, could you replace "numeric" with "number"?

@@ +2204,5 @@
> +        return false;
> +
> +    /* Convert Values to numerics. */
> +    for (size_t i = 0; i < len; i++) {
> +        double dv;

Since we are now a C++ codebase, the style for new code (and whenever we touch old code) is to push declarations down as close to their initialization as possible (so this 'dv' would go before 'ToNumber').

@@ +2221,5 @@
> +    JS_ALWAYS_TRUE(numElements.resize(2 * len));
> +
> +    /* Sort Values in vec numerically. */
> +    if (!MergeSortByKey(vec, numElements,
> +                SortComparatorNumerics[comp], len)) {

When a function call spans multiple lines, the SM style is to line up the arguments with the opening (https://wiki.mozilla.org/JavaScript:SpiderMonkey:C_Coding_Style#Other_whitespace).

@@ +2344,5 @@
>                                 SortComparatorLexicographicInt32(cx))) {
>                      return false;
>                  }
>              } else {
> +                if (!SortAlphabetically(cx, vec, n))

For consistency, could this be SortLexicographically instead?

@@ +2356,5 @@
> +            if (comp != Match_None) {
> +                if (allInts) {
> +                    if (!MergeSort(vec.begin(), n, vec.begin() + n,
> +                                   SortComparatorInt32s[comp]))
> +                        return false;

One SpiderMonkey style guideline is that, if the condition is multi-line, you have to brace the then-branch, even if it is only a single line.
Attachment #672701 - Flags: review?(luke)
Attached patch WIP patch (obsolete) — Splinter Review
Thanks! and sorry for so many places I missed...

I fixed most of them according to your comments but did not quite understand what you meant about reserving spaces for MergeSortByKey(). So I tried to make SortNumerically/SortLexicographically responsible for reserving space and split out the a parameter for scratch space. In this way MergeSortByKey() is used almost the same to MergeSort() and don't have to check the trick of double length -- the scratch space is explicit now.

Additionally, I changed the implementation of reordering in MergeSortByKey() a bit. It is now performed in place and avoids the trick that the sorted result is placed right after the original vector -- another double-length trick.
Attachment #672142 - Attachment is obsolete: true
Attachment #672701 - Attachment is obsolete: true
Comment on attachment 673167 [details] [diff] [review]
WIP patch

Review of attachment 673167 [details] [diff] [review]:
-----------------------------------------------------------------

Nice refactoring and nice in-place sort algorithm!  The patch looks great, only two nits and a suggested optimization.

In case you haven't, I'd run jit-tests (js/src/jit-test/jit_test.py $(PATH_TO_SHELL)) and jstests (js/src/tests/jstests.py $(PATH_TO_SHELL)) (or just push to try server).

Let me know if you need any help landing.

::: js/src/jsarray.cpp
@@ +2025,5 @@
> +bool
> +ComparatorNumericLeftMinusRight(const NumericElement &a, const NumericElement &b,
> +                                bool *lessOrEqualp)
> +{
> +    *lessOrEqualp = (a.dv <= b.dv);

No \n needed before 'return true' here or below.

@@ +2128,5 @@
> +}
> +
> +template<typename K, typename C>
> +inline bool
> +MergeSortByKey(AutoValueVector &vec, K keys, size_t len, K scratch, C comparator)

Another SM style is to put any outparams last and pass by pointer (this makes it easier to read, at call sites, which parameter is mutated).  So, in this case, that would mean putting 'vec' last.

@@ +2345,5 @@
>              args.rval().setObject(*obj);
>              return true; /* The array has only holes and undefs. */
>          }
>  
>          JS_ALWAYS_TRUE(vec.resize(n * 2));

With the in-place sorting, one slight optimization we could make is to push this 'resize' down into the paths that actually need it (the allStrings, allInts and unmatched comparator cases).  AutoVectorRooter::resize has to call makeRangeGCSafe which means this will save an O(n) pass.
Attachment #673167 - Flags: review+
Thank you very much, Luke! although I still have two questions :-)

I tested with tests/jstest.py and jit-test/jit_test.py. jit-test/jit_test.py failed (only) on jit-test/tests/basic/testOverRecursed3.js:

----------------
// |jit-test| error:InternalError

x = [];
x.push(x);
x.toString = x.sort;
x.toString();
----------------

because the case that array.length() == 1 is optimized out. The spec<1> seems to have no words on the corner case. Should we revert the optimization or adapt the test case?

As for the position of parameters, vec is used as both input and output. Should it be regarded as output?


<1> http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
    15.4.4.11 Array.prototype.sort (comparefn)
Hah!  I think we should adapt the test-case (IIUC, just add a second "x.push(x)").
Attached patch WIP patch (obsolete) — Splinter Review
Hi Luke, the patch is refined accordingly (remove newlines, adjust parameters and push vec.resize()).

With the other patch to testOverRecursed3.js, both tests/jstests.py and jit-test/jit_test.py passed. I'm waiting for level 1 commit access admission and will drop it to try server as soon as possible.

BTW, this patch is now based on 1c3e4cb1f754.
Attachment #673167 - Attachment is obsolete: true
Comment on attachment 673781 [details] [diff] [review]
Avoid the trivial/corner testcase not addressed in specification

You are most fastidious with your review requests ;)
Attachment #673781 - Flags: review?(luke) → review+
Assignee: general → thuang
Attached patch WIP patch (obsolete) — Splinter Review
Hi Luke, I missed an "AutoAssertNoGC nogc;" before getting the raw script (fun->script()) and it failed in some tests in debug builds.

Since
https://bugzilla.mozilla.org/attachment.cgi?id=673167
was already r+, I guess it's OK to mark the new patch r+?
Attachment #673783 - Attachment is obsolete: true
Attachment #676054 - Flags: review+
Found another testcase that expects undefined behaviour.
Attachment #673781 - Attachment is obsolete: true
Added header in the patch.
Attachment #676054 - Attachment is obsolete: true
Hi Luke, would you please review my modification to one more testcase? This is again caused by eliminating the need to sort arrays of size 1. Instead of lengthening the array, in this case it might be better to make the calls to "toString()" explicit.
Attachment #676074 - Attachment is obsolete: true
Attachment #676080 - Flags: review?(luke)
Comment on attachment 676080 [details] [diff] [review]
Part 1: Fix testcases that expect undefined behaviour

Hah, silliness.
Attachment #676080 - Flags: review?(luke) → review+
Oops, I forgot about this bug.  Ting-Yuan, I saw you have an @mozilla.com email so I assumed you already had hg commit privileges; do you need help?
Luke mentioned he was interested in me giving this a once-over just to be safe, as well -- flagging this so I won't forget even longer than I have already...
Flags: needinfo?(jwalden+bmo)
Hi Luke, sorry, I got some problem on the android and b2g tests on try server and I'm still trying to figure out why. Other tests seemed to be fine:
https://tbpl.mozilla.org/?tree=Try&rev=9dc9d7127309
The patch is now merged to:
Parent e57bd488af4c351016062e655b29415b993d7b13

and an earlier version reviewed by Luke:
https://bug715419.bugzilla.mozilla.org/attachment.cgi?id=673167
Attachment #676075 - Attachment is obsolete: true
Attachment #679091 - Flags: review+
OK, I believe the failed tests on try server are unrelated:
https://tbpl.mozilla.org/?tree=Try&rev=5cf58e79d4b1
https://tbpl.mozilla.org/?tree=Try&rev=9dc9d7127309

Luke, would you please help me check them in? That requires commit access level 3 and I'm currently at 1.
Keywords: checkin-needed
You bet.  I'll wait for waldo's thumbs up (comment 38).
OK, a week on my checkin-needed bug queries is long enough. Please re-request it once Waldo has responded.
Keywords: checkin-needed
Just posting here to get this back in front of folks.
Sorry for being a stickler here, but is this ever going to land?
No need to apologize, my review queue has been a disaster for awhile now.  And I can't do this now because I'm trying to focus entirely on getting through bug 769872, which has certain deadlines associated with it that require it basically be prioritized over everything else.  I can maybe stretch that slightly for super-easy reviews, but this is not such a one.  :-(
Flags: needinfo?(jwalden+bmo)
Flags: needinfo?(jwalden+bmo)
I cleared out enough of my super-important review backlog to plow through this today.  Sorry for the delay here.  :-(  As penance I'm updating this patch as needed to make it applicable against trunk again -- not that there's too much to do, but it should be my problem to do it, for delaying so long.

It took me awhile to work through all this and convince myself it was all good.  I found the best way to do that was to edit the patch and expand comments and such along the way.  So here's a rebased patch, with a few extra changes I'll call out here.  The changes are numerous enough, if all pretty simple, that I think I want someone to look at this one last time before saying "good enough".

Also, given the delay, this'll want tryservering.  But it passes jstests/jit-tests, so it should be one-last-time review-ready.

js/src/jit-test/tests/basic/testOverRecursed3.js
================================================

This test fails with the patch (at least in shell), because sorting a length-1 array no longer calls the toString method on the array (which in this case is sort, ergo recursion).  I made the array length-2 to trigger the toString-y recursion case again, per the test name.

js/src/jsarray.cpp
==================

I made ComparatorMatchResult an enum, not a typedef to an anonymous enum.

I changed the comment on MatchNumericComparator slightly, to call out x-y/y-x as particularly common bytecode patterns for a comparator function.

I had to update MatchNumericComparator to work with lazy scripts.  I could create the script to handle even lazy scripts here, but it seems not worth the trouble, so if the function is lazy-interpreted, I just don't bother matching.

I added a big comment to MergeSortByKey to explain what the code loops there were doing.  I got the vague gist from the previous comment, but it took a bit more spelling it out before I was convinced it was doing the right thing.  A double-check of this comment's interpretation against the code would be especially appreciated.  I also noted an invariant at the end of the loop that can't be asserted, and why.

I was going to change the JS_ALWAYS_TRUE(Vector::resize()) calls to use a new infallibleResize method, but implementing that's not as trivial as I'd hope.  So I'm deferring that (and changing places here) to bug 840808.

I changed some |if (!MergeSortByKey(...)) return false; return true;| into just |return MergeSortByKey(...)|.

I tweaked the comment on the length < 2 case to be more precise: (a = []).sort() == a, but it's not the case that [].sort() == [].  Ditto for [a].

I tweaked the !InParallelSection() assert to explain why in its output.

js/src/tests/js1_5/Regress/regress-312588.js
============================================

This test also expected over-recursion for sorting a deeply-nested single-element array -- I made it loop to a large number rather than infinitely.
Attachment #679091 - Attachment is obsolete: true
Attachment #713223 - Flags: review?(luke)
Flags: needinfo?(jwalden+bmo)
...oh, and now I see you found those two tests and had changed/fixed them in a separate patch.  Sigh.  :-)  I'd assumed the test fixup patch would have landed ahead of the other patch here, what with nothing holding it back.
Comment on attachment 713223 [details] [diff] [review]
Unbitrotted, with various adjustments/improvements

Review of attachment 713223 [details] [diff] [review]:
-----------------------------------------------------------------

Ah, nice to see this about to land.  Time for sorting numbers the obvious way to be blazingly fast.

::: js/src/jsarray.cpp
@@ +1392,5 @@
> +     * Otherwise we keep moving values into vacated spaces in succession until
> +     * the vacated space is at index |i|.  Once there we fill it with the
> +     * copied value, and |(*vec)[i]| has its proper value.  This loop moves
> +     * only mispositioned elements greater than |i|, and only to their proper
> +     * positions, so it visits each element at most once.

The third para is good because it states the loop invariant and complexity, but I feel that the first two paragraphs basically restate the code without adding insight.  Perhaps they could either be removed or replaced with a single summarizing sentence?
Attachment #713223 - Flags: review?(luke) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/53de36ab95d1

I tried to summarize some, maybe I got something reasonable.  Although, really, more clarity about what was done was what I wanted most there -- I spent a long time convincing myself that loop was correct.

Thanks again for the patch, and sorry I was so slow about getting to it!  :-\
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla21
https://hg.mozilla.org/mozilla-central/rev/53de36ab95d1
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: