Last Comment Bug 403977 - Huge Speed Drop in Array.prototype.sort
: Huge Speed Drop in Array.prototype.sort
Status: RESOLVED FIXED
: perf, regression, testcase
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: x86 Mac OS X
: -- normal with 2 votes (vote)
: ---
Assigned To: Brian Crowder
:
:
Mentors:
http://celtickane.com/projects/jsspee...
Depends on: 371636
Blocks: 408368
  Show dependency treegraph
 
Reported: 2007-11-15 15:47 PST by John Resig
Modified: 2007-12-14 09:33 PST (History)
12 users (show)
brendan: wanted1.9+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Shark Profile of Trunk Code in JS Shell (2.46 KB, text/plain)
2007-11-15 15:49 PST, John Resig
no flags Details
Shark Profile of SpiderMonkey 1.7.0 Code in JS Shell (920 bytes, text/plain)
2007-11-15 15:50 PST, John Resig
no flags Details
Shark Profile of Trunk vs 1.7.0 Code in JS Shell (2.63 KB, text/plain)
2007-11-15 15:51 PST, John Resig
no flags Details
pre-calculate strings for sorting (14.61 KB, patch)
2007-12-03 22:13 PST, Brian Crowder
no flags Details | Diff | Splinter Review
less wasteful (15.51 KB, patch)
2007-12-04 14:12 PST, Brian Crowder
no flags Details | Diff | Splinter Review
least wasteful? (16.61 KB, patch)
2007-12-04 20:53 PST, Brian Crowder
no flags Details | Diff | Splinter Review
Using realloc (14.21 KB, patch)
2007-12-05 05:44 PST, Igor Bukanov
crowderbt: review+
mbeltzner: approvalM10-
mbeltzner: approval1.9+
Details | Diff | Splinter Review
Using realloc v2 (15.55 KB, patch)
2007-12-11 15:04 PST, Igor Bukanov
crowderbt: review+
Details | Diff | Splinter Review
diff between v1 and v2 patches (2.52 KB, text/plain)
2007-12-11 15:06 PST, Igor Bukanov
no flags Details
Using realloc v3 (15.59 KB, patch)
2007-12-12 11:26 PST, Igor Bukanov
no flags Details | Diff | Splinter Review
Using realloc v4 (14.82 KB, patch)
2007-12-12 13:38 PST, Igor Bukanov
crowderbt: review+
Details | Diff | Splinter Review
v3-v4 delta (3.85 KB, patch)
2007-12-12 13:42 PST, Igor Bukanov
no flags Details | Diff | Splinter Review
Using realloc v4b (14.93 KB, patch)
2007-12-12 14:23 PST, Igor Bukanov
no flags Details | Diff | Splinter Review
Using realloc v5 (15.07 KB, patch)
2007-12-13 02:01 PST, Igor Bukanov
crowderbt: review+
Details | Diff | Splinter Review
using realloc v5b (15.85 KB, patch)
2007-12-13 11:47 PST, Brian Crowder
crowderbt: review+
Details | Diff | Splinter Review
using realloc v5b (for real) (15.01 KB, patch)
2007-12-13 11:51 PST, Brian Crowder
crowderbt: review+
Details | Diff | Splinter Review

Description John Resig 2007-11-15 15:47:28 PST
The test_Array test on the Celtic Kane site runs significantly slower in the latest trunk code - here's the full test:

	var arr = new Array();
	for(var i=0; i<=94; i++)
		{	arr.push(i);}
	for(var i=0; i<=arr.length; i++)
	{
		arr.push( arr.pop());
		arr.reverse(arr.sort());
		arr.push(arr.splice(0,1));
	}
	arr = arr.join();

It takes about:
66ms in SpiderMonkey 1.7
950ms in Trunk

I've attached the results of Shark profiling data (let me know if some additional reports are needed).
Comment 1 John Resig 2007-11-15 15:49:13 PST
Created attachment 288908 [details]
Shark Profile of Trunk Code in JS Shell
Comment 2 John Resig 2007-11-15 15:50:13 PST
Created attachment 288909 [details]
Shark Profile of SpiderMonkey 1.7.0 Code in JS Shell
Comment 3 John Resig 2007-11-15 15:51:01 PST
Created attachment 288910 [details]
Shark Profile of Trunk vs 1.7.0 Code in JS Shell
Comment 4 Brendan Eich [:brendan] 2007-11-15 20:48:22 PST
Igor, can you field this one?

/be
Comment 5 Igor Bukanov 2007-11-16 00:32:18 PST
push/pop and join calls in the test case does not affect the results. The same slowdown happens with the following test case: 

var arr = new Array();
for(var i=0; i<=94; i++)
    arr.push(i);

for(var i=0; i<=arr.length; i++)
{
    arr.reverse(arr.sort());
    arr.push(arr.splice(0,1));
}

On my Pentium-M 1.1 gHz laptop under Fedora-8 I got for an optimized build of js shell:

1.8.1: 0.20 s
trunk: 1.84 s
Comment 6 Igor Bukanov 2007-11-16 01:26:42 PST
The reason for the slowdown is that the test case constructs an array where it is very expensive to convert to a string just one particular element. Before calling the sort the element is placed at the end of the array. This element converted to string is lexicographically smaller then the rest of elements and the sort will bring it to the beginning of the array.

Since 1.8.1 branch uses a heapsort, such arrangement will convert the element to the string only once due to the nature of heapsort. The trunk uses mergesort and that will call the string conversion for that element array.length number of times.  

A solution for the problem would be to convert elements to string only once. That is, when array_sort detects that not all eleemnts are strings with the string comparator, it can convert elements to strings only once. Since this would also provide an alternative solution to bug 371636, I suggest to address this in that bug.
Comment 7 Brian Crowder 2007-11-28 15:00:31 PST
It is possible to imagine a case where an object's toString method could yield different results on different iterations through the sort algorithm.  This seems likely to yield unpredictable results, since the sort algorithm itself isn't specified, so we should safely be able to cache the -first- toString() result and never worry about it again...  riiiiight?  Or do we somehow need to figure out if the object has a toString override?
Comment 8 Jeff Walden [:Waldo] (remove +bmo to email) 2007-11-28 15:43:30 PST
Nothing in the spec explicitly precludes this.  There might be some way to twist this to preclude it given a sufficiently rigid ordering of calls to SortCompare and [[Get]], but if there is, I very much doubt anyone will complain or fault us for ignoring such a hypothetical preclusion.  If we break them, any other engine is just as likely to break them with some trivial algorithm tweak.

Radix sort for the win!
Comment 9 Igor Bukanov 2007-11-28 15:44:09 PST
(In reply to comment #7)
> so we should safely be able to cache the -first- toString()
> result and never worry about it again...  riiiiight?

That is the idea, to cache toString for Object cases, since ECMA does not define the sorting algorithm and AFAIK nobody complained that FF uses different compare sequence than MSIE.

What MSIE (or Safary) would show for a test case like: 

var N = 20, i;
var a = [], o;
for (i = 0; i != N; ++i) {
    o = { n: 0 };
    o.toString = function() { return "" + ++this.n; }
    a.push(o);
}

a.sort();

for (i = 0; i != a.length; ++i)
    a[i] = a[i].n;

alert(a.join(" "));

For FF it shows:

3 3 3 4 4 3 3 4 4 3 3 4 4 3 3 4 4 3 3 3

If the sequence would contain only 1, then for sure the caching is used.
Comment 10 Igor Bukanov 2007-11-28 16:01:45 PST
For references here is the data for the test from comment 9 for Konqueror:

4 4 5 4 3 8 10 4 4 3 2 7 4 3 2 7 2 3 2 1

So Konqueror does not cache the results of toString for objects.

Comment 11 Brian Crowder 2007-11-28 16:08:18 PST
What sort algorithm does Konq use?  Are they beating us on this benchmark?
Comment 12 Jeff Walden [:Waldo] (remove +bmo to email) 2007-11-28 16:20:08 PST
In reply to comment #8)
> Nothing in the spec explicitly precludes this.  There might be some way to
> twist this to preclude it given a sufficiently rigid ordering of calls to
> SortCompare and [[Get]]

I'm mistaken, and there's a trivial twisting -- given a three-element array,
sorting it using SortCompare requires converting one property value to string
twice, at which point behavior can't be defined any more given
"implementation-dependent sequence" and an inconsistent toString.  This is
inarguably a bug in the spec, because with this you can't satisfy the required
properties of the returned object.  Given this is a bug and nobody in their
right minds would have cared before anyway, we should absolutely not worry
about this.

(In reply to comment #11)
> What sort algorithm does Konq use?  Are they beating us on this benchmark?

Last I looked JavaScriptCore didn't use anything radix sort-y (there was a FIXME/XXX comment about it noting the increased memory to store all the strings), so I highly doubt KJS does either.  At a second glance, JSC uses merge sort up to a certain size (1000 currently) and then switches to qsort, so I'd bet we beat them on any mostly-sorted arrays of size larger than 1000, at least.  I don't know whether we win on smaller arrays.
Comment 13 Brian Crowder 2007-11-28 16:35:16 PST
brendan: worth bringing up as a spec-fix for ES4?
Comment 14 Brian Crowder 2007-12-03 22:13:51 PST
Created attachment 291381 [details] [diff] [review]
pre-calculate strings for sorting

We now generate string values for pairs of val/string sort items to pass to the mergesort routine up front.  Also, the mergesort no longer tries to make guesses between *p = *q for swapping, but instead uses memcpy always (the majority of mergesort uses in the spidermonkey code are not for jsvals anymore).  Lastly, this patch removes the compare_as_strings routine, which it effectively obsoletes.  There may be a small savings to be had by restoring the "toCString" concept of the old patch for integer values, or perhaps by sorting number-only arrays using a specialized sort-routine.
Comment 15 Igor Bukanov 2007-12-04 01:03:01 PST
(In reply to comment #14)
> Created an attachment (id=291381) [details]
> pre-calculate strings for sorting

For the cases when we have non-default comparator or a default comparator with strings the patch deoptimize performance and waste extra memory imposing an overhead on 2 sane cases of sort usage. This is not good.
Comment 16 Brian Crowder 2007-12-04 01:13:50 PST
Comment on attachment 291381 [details] [diff] [review]
pre-calculate strings for sorting

Yeah, both easy fixes.  I'll have another go at this in the morning.
Comment 17 Brian Crowder 2007-12-04 14:12:25 PST
Created attachment 291522 [details] [diff] [review]
less wasteful
Comment 18 Igor Bukanov 2007-12-04 14:23:55 PST
Comment on attachment 291522 [details] [diff] [review]
less wasteful


> static JSBool
> array_sort(JSContext *cx, uintN argc, jsval *vp)
> {
>+        elemsize = sizeof(jsval);
>     } else {
>         fval = JSVAL_NULL;
>-        all_strings = JS_TRUE;  /* check for all string values */
>+        elemsize = sizeof(MSortItem);
>     }

This still wastes memory for the common case of sorting strings with the default comparator. I would suggest to reallocate the array when dealing with the default comparator after it is detected that not all elements are strings.
Comment 19 Brian Crowder 2007-12-04 14:31:24 PST
If I've already managed to allocate the temp array, then reallocating afterwords is even more wasteful.  The temp array itself will go away very quickly after the sorting is done, so why bother reallocating?  The only real issue here is of potentially not being able to allocate the len * 4 sized contiguous array.

The alternative would be to loop over the array elements an extra time at startup to check for all_strings, and count holes -- this would yield a much smaller initial allocation for the all-strings case, and also for arrays with holes.

Another easy optimization for strings would be to just make the toString member be a copy of the original jsval (the js_ValueToString call is obviously redundant).

I'm not sure the extra linear cost of calculating all_strings is worth it, though.
Comment 20 Igor Bukanov 2007-12-04 15:02:29 PST
(In reply to comment #19)
> I'm not sure the extra linear cost of calculating all_strings is worth it,
> though.

we have 2 typical cases, sorting arbitrary values using non-default allocator, and sorting strings with the default allocator. Both cases are already optimized in the current code and should not be de-optimized with the patch. The problem is the third case when one wants to sort non-string elements with the default comparator. 

So far I have not seen a single practical case for that, but it is used in various benchmarks :(. Thus the patch should target this particular case with an extra code if necessary. Otherwise we would get an example of optimizing for benchmarks for the price of slowing down the practical cases. 

For that I suggest in the existing code after array_sort collects the elements into a temporary vector to realloc and re-arrange it into the vector of 2-word cells. During such rearrangement the code can also convert the elements into strings. Now, if MSortItem would be declared as 

typedef struct MSortItem {
    jsval  toString;
    jsval  value;
} MSortItem;

then one would be able to use the existing sort_compare_strings as-is saving the need to add an extra sorting function.
 
Comment 21 Igor Bukanov 2007-12-04 15:07:49 PST
(In reply to comment #20)
> re-arrange it into the vector of 2-word cells.

In fact MSortItem is not necessary. One can just move the elements to the even positions in the reallocated array, put their toString results as odd element, call 

ok = js_MergeSort(vec, (size_t) newlen, 2 * sizeof(jsval),
                  sort_compare_strings, cx, mergesort_tmp);
 
Then it just move back the even elements.

 During such rearrangement the code can also convert the elements into
> strings. Now, if MSortItem would be declared as 
> 
> typedef struct MSortItem {
>     jsval  toString;
>     jsval  value;
> } MSortItem;
> 
> then one would be able to use the existing sort_compare_strings as-is saving
> the need to add an extra sorting function.
> 
> 

Comment 22 Brian Crowder 2007-12-04 20:53:53 PST
Created attachment 291604 [details] [diff] [review]
least wasteful?

In terms of allocated space, this should be the least "wasteful" implementation so far, but I'm not convinced it is the optimal one.  I think we may indeed be better off -not- doing O(n) GetArrayElement() calls up front, simply to calculate whether or not the array contains strings (the more-precise calculation of the allocated temporary vector for arrays with holes may be more like icing, depending on how many really huge sparse arrays happen in the wild).  Perhaps what we -really- ought to be doing is something more optimal still for homogenous arrays (ie., all ints? skip the string conversion entirely)
Comment 23 Igor Bukanov 2007-12-05 04:01:35 PST
(In reply to comment #22)
> Created an attachment (id=291604) [details]
> least wasteful?

The patch enumerates the elements of the JS array twice. That does not work since a getter can return different elements each time.
Comment 24 Igor Bukanov 2007-12-05 05:44:13 PST
Created attachment 291654 [details] [diff] [review]
Using realloc

This an alternative version that realloc the temporary vector to make for room for strings when it is known that the default allocator is used with with non-string array.

The patch also shrinks the temporary array when it is known that it comes with undefs/holes to relive the memory pressure ASAP in the case of huge spare arrays.
Comment 25 Igor Bukanov 2007-12-05 05:56:51 PST
(In reply to comment #22)
> Perhaps what we -really- ought to be doing is something more optimal
> still for homogenous arrays (ie., all ints? skip the string conversion
> entirely)

That would be really optimizing for the sake of benchmarks. Sorting an array with numbers using the default comparator orders the numbers lexicographically. This is not very useful and most likely means a bug in the code or an unpractical benchmark. 

We should not spend too much efforts on that unless there is a need like the problem exposed by this bug demonstrating performance regression in certain cases against 1.8 branch. If fixing the regression also helps the benchmarks, then fine. At least this is not programming just to make FF to *look* faster.
Comment 26 Brian Crowder 2007-12-05 08:31:42 PST
Comment on attachment 291654 [details] [diff] [review]
Using realloc

+            tvr.count = newlen * 4;

Shouldn't this happen earlier, in case a GC happens during js_ValueToString in the loop above? (if I'm mistaken, please explain why)  Otherwise, looks good to me (is this technically patchcest?  do we need another reviewer?).
Comment 27 Igor Bukanov 2007-12-05 09:05:26 PST
(In reply to comment #26)
> (From update of attachment 291654 [details] [diff] [review])
> +            tvr.count = newlen * 4;
> 
> Shouldn't this happen earlier, in case a GC happens during js_ValueToString in
> the loop above?

Note the code before that:

mergesort_tmp = vec + newlen;
memset(mergesort_tmp, 0, newlen * sizeof(jsval));
tvr.count = newlen * 2;

This roots vec[0..2*newlen - 1]. The string conversion loop needs precisely this space. When that loop is done and the code got strings for all elements without errors, then code tries to reallocate the array to grab memory for merge sort scratch space. 
Comment 28 Brian Crowder 2007-12-05 09:08:53 PST
Right, of course.  Does the mergesort scratch space actually need to be contiguous with the storage for the actual elements?  Would it be easier/faster to alloc a new vector for that, rather than allocating a double-sized vector and copying?
Comment 29 Igor Bukanov 2007-12-05 09:09:16 PST
Comment on attachment 291654 [details] [diff] [review]
Using realloc

The patch address a performance regression with the sort implementation.
Comment 30 Igor Bukanov 2007-12-05 09:15:57 PST
(In reply to comment #28)
> Right, of course.  Does the mergesort scratch space actually need to be
> contiguous with the storage for the actual elements?

No, it does not. But replacing realloc with malloc wouls lead to more code to write with an extra tvr and JS_free.
Comment 31 Mike Beltzner [:beltzner, not reading bugmail] 2007-12-06 11:18:46 PST
Comment on attachment 291654 [details] [diff] [review]
Using realloc

a=drivers for when trunk opens after Fx3 Beta 2 freeze
Comment 32 Mike Beltzner [:beltzner, not reading bugmail] 2007-12-06 11:19:50 PST
(Brendan/Igor: if you think this is rock-solid safe and really needed for beta2, please renominate for M10; we're on a tight timeline to make the december ship date, which is why we're being wary)
Comment 33 Mike Beltzner [:beltzner, not reading bugmail] 2007-12-06 11:31:05 PST
Comment on attachment 291654 [details] [diff] [review]
Using realloc

Shaver and other poked me about this and expressed risk vs. reward.

It's now a=drivers for M10, please land ASAP.
Comment 34 Igor Bukanov 2007-12-06 11:33:25 PST
(In reply to comment #32)

This can wait beta3.

<;)>
Plus landing this now is not good from a marketing point of view. The trunk already has the patch from bug 371636 that covers most advanced sorting benchmarks so they would run faster in beta2. Landing this after beta2 would allow to claim yet another performance improvement in the area of sorting in beta 3. 
</;)>
Comment 35 Mike Beltzner [:beltzner, not reading bugmail] 2007-12-06 11:36:27 PST
Comment on attachment 291654 [details] [diff] [review]
Using realloc

Works for drivers - this will wait until after the Beta 2 freeze.
Comment 36 Brian Crowder 2007-12-11 08:58:05 PST
Igor, do you want to land this?
Comment 37 Igor Bukanov 2007-12-11 10:34:58 PST
(In reply to comment #36)
> Igor, do you want to land this?

I wanted to do it today, but tinderbox was mostly orange today. And the current state does not look promising for a check in window before I go to sleep:(. 
Comment 38 Brian Crowder 2007-12-11 10:46:58 PST
I'll land it as soon as the trees look green, then, if you don't mind?
Comment 39 Igor Bukanov 2007-12-11 10:59:54 PST
(In reply to comment #38)
> I'll land it as soon as the trees look green, then, if you don't mind?

Thanks, that would be nice.
Comment 40 Brian Crowder 2007-12-11 12:17:56 PST
jsarray.c:3.133
Comment 41 Bob Clary [:bc:] 2007-12-11 12:47:46 PST
(In reply to comment #40)
> jsarray.c:3.133
> 

This may have cause the regressions in <http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTest/1197404518.1197405575.18881.gz>
Comment 42 Brian Crowder 2007-12-11 13:08:13 PST
This caused mochitest failures and has been backed out
Comment 44 Igor Bukanov 2007-12-11 15:04:47 PST
Created attachment 292687 [details] [diff] [review]
Using realloc v2

The new version makes sure that the fastcopy optimization in js_MergeSort is only used when the size of elements is sizeof(jsval). The previous version applied the optimization based on the sorting routine. Reuse of sort_compare_strings to sort (cached_toString value) pairs broke that.
Comment 45 Igor Bukanov 2007-12-11 15:06:19 PST
Created attachment 292689 [details]
diff between v1 and v2 patches
Comment 46 Brian Crowder 2007-12-11 15:22:04 PST
Comment on attachment 292687 [details] [diff] [review]
Using realloc v2

Have you checked this against the regressions reported by the two test-suites?  Sorry I missed this in review...  also, how do you feel about abandoning the fastcopy stuff entirely?  I'm not sure it's a performance improvement, since the fastcopy checks seem like they must be as costly or more costly than simply always using memcpy.
Comment 47 Igor Bukanov 2007-12-11 15:51:08 PST
(In reply to comment #46)
> Have you checked this against the regressions reported by the two test-suites? 

The result of the bug is that it would leave an array with non-string elements unsorted. This precisely matches the failures reported by the mochitests where they complained that [1, 2, -1].sort() gives [1, 2, -1] and not [-1, 1, 2].

> how do you feel about abandoning the fastcopy stuff entirely?

Theoretically the check for the fastcopy flag and a word copy should be faster than a generic purpose memcpy. On the other hand the sort is not used frequently and, when it used, it is often used for small arrays. So extra fastcopy checks => more code to load to the CPU cache. With small arrays that are already in the cache the extra bandwidth may in fact slowdown the rare sort calls.

Thus I am in favor of removing the checks.
Comment 48 Brian Crowder 2007-12-11 17:19:26 PST
Brendan says let's test the fastcopy perf win/loss before we do it.  I agree.  If you want to leave it in for now, I will hit it in a follow-up bug?
Comment 49 Igor Bukanov 2007-12-12 11:26:33 PST
Created attachment 292803 [details] [diff] [review]
Using realloc v3

The new version avoids an extra && when detecting the fastcopy mode.
Comment 50 Brian Crowder 2007-12-12 11:44:37 PST
Comment on attachment 292803 [details] [diff] [review]
Using realloc v3

+        /* Use realloc to ignore failures when shrinking. */
+        vec = realloc(vec, (size_t) newlen * 2 * sizeof(jsval));

Shrinking here seems pointless, especially if we might have to grow again later depending on the sort criteria.  Keep worst-case to only two reallocs for this sort?  We aren't regressing, since we used to simply alloc (2 * len + 1) for the new vector and scratch space.

+  * sort_compare_strings we move the original values to the odd
+  * indexes in vec, put the string conversion results to the even
Nit: "put ... to" should be "put ... in"

+ * This requires to double the temporary storage including the
Nit: "This requires doubling..."

+ * Re-orange and string-convert the elements of the vector from
"Re-orange" should be "rearrange"?

+ * the tail here and, after the sorting, move the results back
Nit: "after sorting" instead of "after the sorting"

Sorry I missed the comment-nits the first time around.
Comment 51 Igor Bukanov 2007-12-12 12:02:24 PST
(In reply to comment #50)
> (From update of attachment 292803 [details] [diff] [review])
> +        /* Use realloc to ignore failures when shrinking. */
> +        vec = realloc(vec, (size_t) newlen * 2 * sizeof(jsval));
> 
> Shrinking here seems pointless, especially if we might have to grow again later
> depending on the sort criteria.

When sorting case like 
a = Array(1 << 20);
a[0] = 0;
a.sort();

that realloc will shrink 8MB array down to 8 bytes before the sorting. 

If the second growing realloc happens, it would mean a default allocator against an array with holes and non-string elements. This is not the case I care to waste an extra code.
Comment 52 Brian Crowder 2007-12-12 12:16:30 PST
> When sorting case like 
> a = Array(1 << 20);
> a[0] = 0;
> a.sort();
> 
> that realloc will shrink 8MB array down to 8 bytes before the sorting. 

This is seemingly a very rare case (as are very, very sparse arrays, in general, I'm guessing), is it really worth optimizing for?  The 8MB array is going to be released very quickly, and if it goes largely unused, then it is unlikely the OS will commit the pages for it anyway.  Why go to the effort of running yet another alloc for the common case of an array that's 80-90% full and for which shrinking provides little benefit at potentially significant cost?

> If the second growing realloc happens, it would mean a default allocator
> against an array with holes and non-string elements. This is not the case I
> care to waste an extra code.

Certainly, in that case the 2nd realloc -is- necessary, and there's not a lot you can do about it anyway.  -This- realloc still seems pointless to me, though.  Even in your worst-case example, the allocation only hangs around for to duration of the sort, and you're not avoiding an OOM condition here, since you've already got an array large enough to support the sort and the required scratch space.  True; you're saving some space for the pending toString conversions, but I think for the common case it is not enough to justify the extra alloc/memcpy of this realloc.
Comment 53 Brendan Eich [:brendan] 2007-12-12 12:20:51 PST
A shrinking realloc does not necessarily copy. On some systems it can, and so can fail (BSD-style naive size classifying mallocs do this).

/be
Comment 54 Brian Crowder 2007-12-12 12:24:22 PST
(In reply to comment #53)
> A shrinking realloc does not necessarily copy. On some systems it can, and so
> can fail (BSD-style naive size classifying mallocs do this).
> 
> /be

Yeah, that's my concern.  I realize on some systems the shrinking-realloc is really a NOOP.  But why pay for it on systems that it isn't?
Comment 55 Igor Bukanov 2007-12-12 13:38:14 PST
Created attachment 292835 [details] [diff] [review]
Using realloc v4

The new version fixes the nits and removes the shrinking realloc.
Comment 56 Igor Bukanov 2007-12-12 13:42:19 PST
Created attachment 292839 [details] [diff] [review]
v3-v4 delta
Comment 57 Brian Crowder 2007-12-12 13:46:29 PST
Comment on attachment 292835 [details] [diff] [review]
Using realloc v4

Sorry I missed this one last time:

+ * (above). The remaining newlen positions used as GC rooted temp space
Nit: a/used as/are used as/ and maybe call "temp" space "scratch" space instead?

+ * can use it to re-arrange and convert to strings first and try

"rearrange"

+            do {
+                ok = JS_CHECK_OPERATION_LIMIT(cx, JSOW_JUMP);
+                if (!ok)
+                    goto out;
+                vec[i] = vec[2 * i + 1];
+            } while (++i != newlen);

Don't want to check operation limit here.  Bad cut and paste?  Once again, sorry I'm catching this on review #3.
Comment 58 Igor Bukanov 2007-12-12 14:20:51 PST
(In reply to comment #57)
> +            do {
> +                ok = JS_CHECK_OPERATION_LIMIT(cx, JSOW_JUMP);
> +                if (!ok)
> +                    goto out;
> +                vec[i] = vec[2 * i + 1];
> +            } while (++i != newlen);
> 
> Don't want to check operation limit here.

I would prefer to give a branch callback a chance to stop even such seemingly fast loop. 
Comment 59 Igor Bukanov 2007-12-12 14:23:26 PST
Created attachment 292855 [details] [diff] [review]
Using realloc v4b

This v4 with comment changes to fix bad English. Here is the delta:

     /*
      * The first newlen elements of vec are copied from the array object
-     * (above). The remaining newlen positions used as GC rooted temp space
-     * for mergesort. Here we clear the tmp-values before GC-rooting them.
-     * We assume JSVAL_NULL==0 to optimize initialization using memset.
+     * (above). The remaining newlen positions are used as GC-rooted scratch
+     * space for mergesort. We must clear the space before including it to
+     * the root set covered by tvr.count. We assume JSVAL_NULL==0 to optimize
+     * initialization using memset.
      */
     mergesort_tmp = vec + newlen;
     memset(mergesort_tmp, 0, newlen * sizeof(jsval));
     tvr.count = newlen * 2;
 
     /* Here len == 2 * (newlen + undefs + number_of_holes). */
     if (fval == JSVAL_NULL) {
         /*
@@ -1196,17 +1197,17 @@ array_sort(JSContext *cx, uintN argc, js
              * indexes and pass 2 * sizeof(jsval) as an element size to the
              * sorting function. In this way sort_compare_strings will only
              * see the string values when it casts the compare arguments as
              * pointers to jsval.
              *
              * This requires doubling the temporary storage including the
              * scratch space for the merge sort. Since vec already contains
              * the rooted scratch space for newlen elements at the tail, we
-             * can use it to re-arrange and convert to strings first and try
+             * can use it to rearrange and convert to strings first and try
              * realloc only when we know that we successfully converted all
              * the elements.
              */
Comment 60 Brian Crowder 2007-12-12 14:25:49 PST
Don't the potential branch-callback(s) here -dramatically- slow this loop down?  I thought we used CHECK_OPERATION_LIMIT essentially to restrict JSOPs, not C ops.  Without that check and its ok-check, this loop should be astoundingly fast.  And I realize this is a fallacious argument, but do we really intend to give tight control like this to all of the C loops in spidermonkey?
Comment 61 Igor Bukanov 2007-12-13 02:01:46 PST
Created attachment 292932 [details] [diff] [review]
Using realloc v5

I removed the branch callback from that loop and added comments with reasons for not calling it. Here is a delta against v4b:

         if (!all_strings) {
+            /*
+             * Do not call the branch callback in the following moving loop
+             * to make it fast and unroot the cached results of toString
+             * invocations before the callback has a chance to run the GC.
+             */
             i = 0;
             do {
-                ok = JS_CHECK_OPERATION_LIMIT(cx, JSOW_JUMP);
-                if (!ok)
-                    goto out;
                 vec[i] = vec[2 * i + 1];
             } while (++i != newlen);
Comment 62 Brian Crowder 2007-12-13 10:33:40 PST
Comment on attachment 292932 [details] [diff] [review]
Using realloc v5

w00t (<--- Merriam Webster's word-of-the-year, I can formally use it in bugzilla now)!  I will land this shortly, if you don't.
Comment 63 Jason Orendorff [:jorendorff] 2007-12-13 11:28:32 PST
(In reply to comment #61)
>              i = 0;
>              do {
>                  vec[i] = vec[2 * i + 1];
>              } while (++i != newlen);
> 

Is this the same as...

  for (i = 0; i < newlen; i++)
      vec[i] = vec[2 * i + 1];

?
Comment 64 Brian Crowder 2007-12-13 11:40:50 PST
Yeah, I wouldn't mind seeing that rephrased...  (now that the op-count is gone)
Comment 65 Brian Crowder 2007-12-13 11:47:50 PST
Created attachment 292984 [details] [diff] [review]
using realloc v5b

With jorendorff's nit picked.  I'm landing this now.
Comment 66 Brian Crowder 2007-12-13 11:51:07 PST
Created attachment 292985 [details] [diff] [review]
using realloc v5b (for real)

Last patch had a bogus config.mk change in it.
Comment 67 Brian Crowder 2007-12-13 12:29:21 PST
jsarray.c: 3.135
Comment 68 Igor Bukanov 2007-12-13 14:25:23 PST
(In reply to comment #63)
> (In reply to comment #61)
> >              i = 0;
> >              do {
> >                  vec[i] = vec[2 * i + 1];
> >              } while (++i != newlen);
> > 
> 
> Is this the same as...
> 
>   for (i = 0; i < newlen; i++)
>       vec[i] = vec[2 * i + 1];
> 

TThis is not the same as the later will have to check for i < newlen even when i == 0 and will have an extra jump in the generated code. 
Comment 69 Brian Crowder 2007-12-13 15:09:05 PST
> > Is this the same as...
> > 
> >   for (i = 0; i < newlen; i++)
> >       vec[i] = vec[2 * i + 1];
> > 
> 
> TThis is not the same as the later will have to check for i < newlen even when
> i == 0 and will have an extra jump in the generated code. 

Worth noting, maybe worth fixing?  I'm not too worried about it, but will gladly r+ and land a follow-up patch, if you're interested, Igor.
Comment 70 Jeff Walden [:Waldo] (remove +bmo to email) 2007-12-13 23:07:14 PST
Any reasonably smart compiler should be able to see that newlen isn't reset between the |newlen == 0| check and the loop, so the generated code should ideally be the same.  Even if they don't, I wouldn't be surprised to see them generate special code for the 0 case to avoid the branch inside the loop.  I don't think this is worth worrying about, but feel free to check what the compilers do if you're feeling paranoid -- I wouldn't bother, personally.
Comment 71 Jason Orendorff [:jorendorff] 2007-12-14 06:57:46 PST
(In reply to comment #68)
> This is not the same as the later will have to check for i < newlen even when
> i == 0 and will have an extra jump in the generated code. 

Yeah, but computers are fast.  The former causes extra branches in my brain!
Comment 72 Igor Bukanov 2007-12-14 08:26:24 PST
(In reply to comment #70)
> Any reasonably smart compiler should be able to see that newlen isn't reset
> between the |newlen == 0| check and the loop, so the generated code should
> ideally be the same. 

I filed the bug 408368 that shows the difference with GCC at -Os. For references, for the browser the option is just -O. But I am not going to spend more time on this and if somebody wants, then feel free to grab that bug.

> Even if they don't, I wouldn't be surprised to see them
> generate special code for the 0 case to avoid the branch inside the loop.  I
> don't think this is worth worrying about, but feel free to check what the
> compilers do if you're feeling paranoid -- I wouldn't bother, personally.

The issue here is that a comment/nit for the patch from comment 61 to use a different loop was picked and landed without giving a chance for the author of the patch to reply.

I do not mind and, in fact, would appreciate if somebody fixes a bad English or a code style violation in my patch before checking it in. But changing the code before landing without giving a chance to explain the original reasons behind it or just to acknowledge a suggestion does not sound right.
Comment 73 Brian Crowder 2007-12-14 09:23:27 PST
> I do not mind and, in fact, would appreciate if somebody fixes a bad English or
> a code style violation in my patch before checking it in. But changing the code
> before landing without giving a chance to explain the original reasons behind
> it or just to acknowledge a suggestion does not sound right.

Igor:  Very sorry, this was bad form on my part, caused by some urgency to land the patch.  In the future, I'll follow up with the rewrite.
Comment 74 Brian Crowder 2007-12-14 09:33:37 PST
This is reverted in CVS to Igor's original code (and I hope Igor will forgive my faux pas).  If the war between efficiency and clarity is won, do let me know.

Note You need to log in before you can comment on or make changes to this bug.