Closed Bug 403977 Opened 17 years ago Closed 17 years ago

Huge Speed Drop in Array.prototype.sort

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jeresig, Assigned: crowderbt)

References

()

Details

(Keywords: perf, regression, testcase)

Attachments

(4 files, 12 obsolete files)

2.46 KB, text/plain
Details
920 bytes, text/plain
Details
2.63 KB, text/plain
Details
15.01 KB, patch
crowderbt
: review+
Details | Diff | Splinter Review
The test_Array test on the Celtic Kane site runs significantly slower in the latest trunk code - here's the full test:

	var arr = new Array();
	for(var i=0; i<=94; i++)
		arr.push(i);
	for(var i=0; i<=arr.length; i++)
	{
		arr.push( arr.pop());
		arr.reverse(arr.sort());
		arr.push(arr.splice(0,1));
	}
	arr = arr.join();

It takes about:
66ms in SpiderMonkey 1.7
950ms in Trunk

I've attached the results of Shark profiling data (let me know if some additional reports are needed).
Igor, can you field this one?

/be
Assignee: general → igor
The push/pop and join calls in the test case do not affect the results. The same slowdown happens with the following test case:

var arr = new Array();
for(var i=0; i<=94; i++)
    arr.push(i);

for(var i=0; i<=arr.length; i++)
{
    arr.reverse(arr.sort());
    arr.push(arr.splice(0,1));
}

On my Pentium-M 1.1 GHz laptop under Fedora 8, I got the following for an optimized build of the js shell:

1.8.1: 0.20 s
trunk: 1.84 s
The reason for the slowdown is that the test case constructs an array where it is very expensive to convert just one particular element to a string. Before the sort is called, that element is placed at the end of the array. Converted to a string, this element is lexicographically smaller than the rest of the elements, and the sort will bring it to the beginning of the array.

Since the 1.8.1 branch uses heapsort, such an arrangement converts the element to a string only once, due to the nature of heapsort. The trunk uses mergesort, which calls the string conversion for that element array.length times.

A solution to the problem would be to convert elements to strings only once. That is, when array_sort with the string comparator detects that not all elements are strings, it can convert the elements to strings just once. Since this would also provide an alternative solution to bug 371636, I suggest addressing this in that bug.
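The convert-once idea is easy to model in plain JavaScript (an illustrative sketch, not the SpiderMonkey code; makeCountingItems is an invented helper): an object whose toString counts its invocations shows the cost difference between converting inside the comparator on every comparison and converting each element once up front.

```javascript
// Invented helper: n items whose toString() records how often it runs.
function makeCountingItems(n) {
  let calls = 0;
  const items = Array.from({ length: n }, (_, k) => ({
    k,
    toString() { calls++; return String(this.k); }
  }));
  return { items, calls: () => calls };
}

// Naive: convert inside the comparator, once per operand per comparison.
const naive = makeCountingItems(16);
naive.items.slice().sort((a, b) => (String(a) < String(b) ? -1 : 1));
const naiveCalls = naive.calls();    // grows with the number of comparisons

// Cached: convert each element exactly once, then sort by the cached key.
const cached = makeCountingItems(16);
const pairs = cached.items.map(o => [String(o), o]);
pairs.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
const sorted = pairs.map(p => p[1]);
const cachedCalls = cached.calls();  // exactly 16
```

Any comparison sort needs at least n-1 comparisons, so the naive variant always performs more conversions than the cached one.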
Depends on: 371636
Flags: wanted1.9+
Summary: Huge Speed Drop in Array Manipulation → Huge Speed Drop in Array.prototype.sort
It is possible to imagine a case where an object's toString method could yield different results on different iterations through the sort algorithm.  This seems likely to yield unpredictable results, since the sort algorithm itself isn't specified, so we should safely be able to cache the -first- toString() result and never worry about it again...  riiiiight?  Or do we somehow need to figure out if the object has a toString override?
Nothing in the spec explicitly precludes this.  There might be some way to twist this to preclude it given a sufficiently rigid ordering of calls to SortCompare and [[Get]], but if there is, I very much doubt anyone will complain or fault us for ignoring such a hypothetical preclusion.  If we break them, any other engine is just as likely to break them with some trivial algorithm tweak.

Radix sort for the win!
(In reply to comment #7)
> so we should safely be able to cache the -first- toString()
> result and never worry about it again...  riiiiight?

That is the idea: to cache toString for the Object cases, since ECMA does not define the sorting algorithm and AFAIK nobody has complained that FF uses a different compare sequence than MSIE.

What would MSIE (or Safari) show for a test case like:

var N = 20, i;
var a = [], o;
for (i = 0; i != N; ++i) {
    o = { n: 0 };
    o.toString = function() { return "" + ++this.n; }
    a.push(o);
}

a.sort();

for (i = 0; i != a.length; ++i)
    a[i] = a[i].n;

alert(a.join(" "));

For FF it shows:

3 3 3 4 4 3 3 4 4 3 3 4 4 3 3 4 4 3 3 3

If the sequence contained only 1s, then the caching is definitely in use.
For reference, here is the data for the test from comment 9 for Konqueror:

4 4 5 4 3 8 10 4 4 3 2 7 4 3 2 7 2 3 2 1

So Konqueror does not cache the results of toString for objects.

What sort algorithm does Konq use?  Are they beating us on this benchmark?
(In reply to comment #8)
> Nothing in the spec explicitly precludes this.  There might be some way to
> twist this to preclude it given a sufficiently rigid ordering of calls to
> SortCompare and [[Get]]

I'm mistaken, and there's a trivial twisting -- given a three-element array,
sorting it using SortCompare requires converting one property value to string
twice, at which point behavior can't be defined any more given
"implementation-dependent sequence" and an inconsistent toString.  This is
inarguably a bug in the spec, because with this you can't satisfy the required
properties of the returned object.  Given that this is a spec bug and nobody in
their right mind would have cared before anyway, we should absolutely not worry
about this.

(In reply to comment #11)
> What sort algorithm does Konq use?  Are they beating us on this benchmark?

Last I looked JavaScriptCore didn't use anything radix sort-y (there was a FIXME/XXX comment about it noting the increased memory to store all the strings), so I highly doubt KJS does either.  At a second glance, JSC uses merge sort up to a certain size (1000 currently) and then switches to qsort, so I'd bet we beat them on any mostly-sorted arrays of size larger than 1000, at least.  I don't know whether we win on smaller arrays.
brendan: worth bringing up as a spec-fix for ES4?
We now generate string values for val/string pairs of sort items to pass to the mergesort routine up front.  Also, the mergesort no longer tries to choose between *p = *q assignment and memcpy for swapping, but always uses memcpy (the majority of mergesort uses in the SpiderMonkey code are not for jsvals anymore).  Lastly, this patch removes the compare_as_strings routine, which it effectively obsoletes.  There may be a small savings to be had by restoring the "toCString" concept of the old patch for integer values, or perhaps by sorting number-only arrays with a specialized sort routine.
Assignee: igor → crowder
Status: NEW → ASSIGNED
Attachment #291381 - Flags: review?(igor)
(In reply to comment #14)
> Created an attachment (id=291381) [details]
> pre-calculate strings for sorting

For the cases where we have a non-default comparator, or the default comparator with strings, the patch deoptimizes performance and wastes extra memory, imposing an overhead on the two sane cases of sort usage. This is not good.
Comment on attachment 291381 [details] [diff] [review]
pre-calculate strings for sorting

Yeah, both easy fixes.  I'll have another go at this in the morning.
Attachment #291381 - Flags: review?(igor)
Attached patch less wasteful (obsolete) — Splinter Review
Attachment #291381 - Attachment is obsolete: true
Attachment #291522 - Flags: review?(igor)
Comment on attachment 291522 [details] [diff] [review]
less wasteful


> static JSBool
> array_sort(JSContext *cx, uintN argc, jsval *vp)
> {
>+        elemsize = sizeof(jsval);
>     } else {
>         fval = JSVAL_NULL;
>-        all_strings = JS_TRUE;  /* check for all string values */
>+        elemsize = sizeof(MSortItem);
>     }

This still wastes memory for the common case of sorting strings with the default comparator. I would suggest reallocating the array, when dealing with the default comparator, after it is detected that not all elements are strings.
If I've already managed to allocate the temp array, then reallocating afterwards is even more wasteful.  The temp array itself will go away very quickly after the sorting is done, so why bother reallocating?  The only real issue here is potentially not being able to allocate the len * 4 sized contiguous array.

The alternative would be to loop over the array elements an extra time at startup to check for all_strings, and count holes -- this would yield a much smaller initial allocation for the all-strings case, and also for arrays with holes.

Another easy optimization for strings would be to just make the toString member a copy of the original jsval (the js_ValueToString call is obviously redundant).

I'm not sure the extra linear cost of calculating all_strings is worth it, though.
(In reply to comment #19)
> I'm not sure the extra linear cost of calculating all_strings is worth it,
> though.

We have two typical cases: sorting arbitrary values using a non-default comparator, and sorting strings with the default comparator. Both cases are already optimized in the current code and should not be de-optimized by the patch. The problem is the third case, where one wants to sort non-string elements with the default comparator.

So far I have not seen a single practical case for that, but it is used in various benchmarks :(. Thus the patch should target this particular case with extra code if necessary. Otherwise we would get an example of optimizing for benchmarks at the price of slowing down the practical cases.

For that I suggest that, after the existing code in array_sort collects the elements into a temporary vector, it realloc and re-arrange the vector into 2-word cells. During such rearrangement the code can also convert the elements into strings. Now, if MSortItem were declared as

typedef struct MSortItem {
    jsval  toString;
    jsval  value;
} MSortItem;

then one would be able to use the existing sort_compare_strings as-is, saving the need to add an extra sorting function.
 
(In reply to comment #20)
> re-arrange it into the vector of 2-word cells.

In fact MSortItem is not necessary. One can just move the elements to the even positions in the reallocated array, put their toString results in the odd positions, and call

ok = js_MergeSort(vec, (size_t) newlen, 2 * sizeof(jsval),
                  sort_compare_strings, cx, mergesort_tmp);

Then it just moves the even elements back.

Attached patch least wasteful? (obsolete) — Splinter Review
In terms of allocated space, this should be the least "wasteful" implementation so far, but I'm not convinced it is the optimal one.  I think we may indeed be better off -not- doing O(n) GetArrayElement() calls up front simply to calculate whether or not the array contains strings (the more precise sizing of the temporary vector for arrays with holes may be more like icing, depending on how many really huge sparse arrays happen in the wild).  Perhaps what we -really- ought to be doing is something more optimal still for homogeneous arrays (i.e., all ints? skip the string conversion entirely).
Attachment #291522 - Attachment is obsolete: true
Attachment #291604 - Flags: review?(igor)
Attachment #291522 - Flags: review?(igor)
(In reply to comment #22)
> Created an attachment (id=291604) [details]
> least wasteful?

The patch enumerates the elements of the JS array twice. That does not work, since a getter can return different elements each time.
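The hazard of enumerating twice is easy to demonstrate with a getter (a contrived JavaScript illustration, not code from the patch): an index property that returns a fresh value on every read makes two enumeration passes disagree.

```javascript
// An array whose element 0 is a getter returning a new value on each read.
const arr = [];
let reads = 0;
Object.defineProperty(arr, 0, {
  get() { return ++reads; },  // first read yields 1, second yields 2, ...
  enumerable: true,
  configurable: true
});

const firstPass = arr[0];   // what a first enumeration would see
const secondPass = arr[0];  // a second enumeration sees different data
```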
Attached patch Using realloc (obsolete) — Splinter Review
This is an alternative version that reallocs the temporary vector to make room for strings when it is known that the default comparator is used with a non-string array.

The patch also shrinks the temporary array when it is known that it contains undefs/holes, to relieve the memory pressure ASAP in the case of huge sparse arrays.
Attachment #291654 - Flags: review?(crowder)
(In reply to comment #22)
> Perhaps what we -really- ought to be doing is something more optimal
> still for homogenous arrays (ie., all ints? skip the string conversion
> entirely)

That would really be optimizing for the sake of benchmarks. Sorting an array of numbers using the default comparator orders the numbers lexicographically. This is not very useful and most likely means a bug in the code or an impractical benchmark.

We should not spend too much effort on that unless there is a need, like the problem exposed by this bug: a performance regression in certain cases against the 1.8 branch. If fixing the regression also helps the benchmarks, then fine. At least this is not programming just to make FF *look* faster.
Comment on attachment 291654 [details] [diff] [review]
Using realloc

+            tvr.count = newlen * 4;

Shouldn't this happen earlier, in case a GC happens during js_ValueToString in the loop above? (if I'm mistaken, please explain why)  Otherwise, looks good to me (is this technically patchcest?  do we need another reviewer?).
Attachment #291654 - Flags: review?(crowder) → review+
Attachment #291604 - Attachment is obsolete: true
Attachment #291604 - Flags: review?(igor)
(In reply to comment #26)
> (From update of attachment 291654 [details] [diff] [review])
> +            tvr.count = newlen * 4;
> 
> Shouldn't this happen earlier, in case a GC happens during js_ValueToString in
> the loop above?

Note the code before that:

mergesort_tmp = vec + newlen;
memset(mergesort_tmp, 0, newlen * sizeof(jsval));
tvr.count = newlen * 2;

This roots vec[0..2*newlen - 1]. The string conversion loop needs precisely this space. When that loop is done and the code has obtained strings for all elements without errors, the code tries to reallocate the array to grab memory for the merge-sort scratch space.
Right, of course.  Does the mergesort scratch space actually need to be contiguous with the storage for the actual elements?  Would it be easier/faster to alloc a new vector for that, rather than allocating a double-sized vector and copying?
Comment on attachment 291654 [details] [diff] [review]
Using realloc

The patch addresses a performance regression in the sort implementation.
Attachment #291654 - Flags: approval1.9?
(In reply to comment #28)
> Right, of course.  Does the mergesort scratch space actually need to be
> contiguous with the storage for the actual elements?

No, it does not. But replacing realloc with malloc would lead to more code, with an extra tvr and JS_free.
Comment on attachment 291654 [details] [diff] [review]
Using realloc

a=drivers for when trunk opens after Fx3 Beta 2 freeze
Attachment #291654 - Flags: approvalM10-
Attachment #291654 - Flags: approval1.9?
Attachment #291654 - Flags: approval1.9+
(Brendan/Igor: if you think this is rock-solid safe and really needed for beta2, please renominate for M10; we're on a tight timeline to make the december ship date, which is why we're being wary)
Comment on attachment 291654 [details] [diff] [review]
Using realloc

Shaver and others poked me about this and made the risk-vs.-reward case.

It's now a=drivers for M10, please land ASAP.
Attachment #291654 - Flags: approvalM10- → approvalM10+
(In reply to comment #32)

This can wait for beta 3.

<;)>
Plus, landing this now is not good from a marketing point of view. The trunk already has the patch from bug 371636 that covers most advanced sorting benchmarks, so they would run faster in beta 2. Landing this after beta 2 would allow us to claim yet another performance improvement in the area of sorting in beta 3.
</;)>
Comment on attachment 291654 [details] [diff] [review]
Using realloc

Works for drivers - this will wait until after the Beta 2 freeze.
Attachment #291654 - Flags: approvalM10+ → approvalM10-
Igor, do you want to land this?
(In reply to comment #36)
> Igor, do you want to land this?

I wanted to do it today, but tinderbox was mostly orange today. And the current state does not look promising for a check-in window before I go to sleep :(.
I'll land it as soon as the trees look green, then, if you don't mind?
(In reply to comment #38)
> I'll land it as soon as the trees look green, then, if you don't mind?

Thanks, that would be nice.
jsarray.c:3.133
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
(In reply to comment #40)
> jsarray.c:3.133
> 

This may have caused the regressions in <http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTest/1197404518.1197405575.18881.gz>
This caused mochitest failures and has been backed out
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Using realloc v2 (obsolete) — Splinter Review
The new version makes sure that the fastcopy optimization in js_MergeSort is only used when the size of elements is sizeof(jsval). The previous version applied the optimization based on the sorting routine; reusing sort_compare_strings to sort (cached toString, value) pairs broke that.
Attachment #291654 - Attachment is obsolete: true
Attachment #292687 - Flags: review?(crowder)
Attached file diff between v1 and v2 patches (obsolete) —
Comment on attachment 292687 [details] [diff] [review]
Using realloc v2

Have you checked this against the regressions reported by the two test suites?  Sorry I missed this in review...  Also, how do you feel about abandoning the fastcopy stuff entirely?  I'm not sure it's a performance improvement, since the fastcopy checks seem like they must be as costly as or more costly than simply always using memcpy.
Attachment #292687 - Flags: review?(crowder) → review+
(In reply to comment #46)
> Have you checked this against the regressions reported by the two test-suites? 

The result of the bug is that it would leave an array with non-string elements unsorted. This precisely matches the failures reported by the mochitests, which complained that [1, 2, -1].sort() gives [1, 2, -1] and not [-1, 1, 2].

> how do you feel about abandoning the fastcopy stuff entirely?

Theoretically, the check for the fastcopy flag plus a word copy should be faster than a general-purpose memcpy. On the other hand, sort is not used frequently and, when it is used, it is often used for small arrays. So extra fastcopy checks mean more code to load into the CPU cache. With small arrays that are already in the cache, the extra bandwidth may in fact slow down the rare sort calls.

Thus I am in favor of removing the checks.
Brendan says let's test the fastcopy perf win/loss before we do it.  I agree.  If you want to leave it in for now, I will hit it in a follow-up bug?
Attached patch Using realloc v3 (obsolete) — Splinter Review
The new version avoids an extra && when detecting the fastcopy mode.
Attachment #292687 - Attachment is obsolete: true
Attachment #292803 - Flags: review?(crowder)
Comment on attachment 292803 [details] [diff] [review]
Using realloc v3

+        /* Use realloc to ignore failures when shrinking. */
+        vec = realloc(vec, (size_t) newlen * 2 * sizeof(jsval));

Shrinking here seems pointless, especially if we might have to grow again later depending on the sort criteria.  Keep worst-case to only two reallocs for this sort?  We aren't regressing, since we used to simply alloc (2 * len + 1) for the new vector and scratch space.

+  * sort_compare_strings we move the original values to the odd
+  * indexes in vec, put the string conversion results to the even
Nit: "put ... to" should be "put ... in"

+ * This requires to double the temporary storage including the
Nit: "This requires doubling..."

+ * Re-orange and string-convert the elements of the vector from
"Re-orange" should be "rearrange"?

+ * the tail here and, after the sorting, move the results back
Nit: "after sorting" instead of "after the sorting"

Sorry I missed the comment-nits the first time around.
(In reply to comment #50)
> (From update of attachment 292803 [details] [diff] [review])
> +        /* Use realloc to ignore failures when shrinking. */
> +        vec = realloc(vec, (size_t) newlen * 2 * sizeof(jsval));
> 
> Shrinking here seems pointless, especially if we might have to grow again later
> depending on the sort criteria.

When sorting a case like
a = Array(1 << 20);
a[0] = 0;
a.sort();

that realloc will shrink the 8MB array down to 8 bytes before the sorting.

If the second growing realloc happens, it would mean the default comparator running against an array with holes and non-string elements. This is not a case I care to spend extra code on.
> When sorting case like 
> a = Array(1 << 20);
> a[0] = 0;
> a.sort();
> 
> that realloc will shrink 8MB array down to 8 bytes before the sorting. 

This is seemingly a very rare case (as are very, very sparse arrays in general, I'm guessing); is it really worth optimizing for?  The 8MB array is going to be released very quickly, and if it goes largely unused, it is unlikely the OS will commit the pages for it anyway.  Why go to the effort of running yet another alloc for the common case of an array that's 80-90% full, for which shrinking provides little benefit at potentially significant cost?

> If the second growing realloc happens, it would mean a default allocator
> against an array with holes and non-string elements. This is not the case I
> care to waste an extra code.

Certainly, in that case the 2nd realloc -is- necessary, and there's not a lot you can do about it anyway.  -This- realloc still seems pointless to me, though.  Even in your worst-case example, the allocation only hangs around for the duration of the sort, and you're not avoiding an OOM condition here, since you've already got an array large enough to support the sort and the required scratch space.  True, you're saving some space for the pending toString conversions, but I think for the common case it is not enough to justify the extra alloc/memcpy of this realloc.
A shrinking realloc does not necessarily copy. On some systems it can, and so can fail (BSD-style naive size classifying mallocs do this).

/be
(In reply to comment #53)
> A shrinking realloc does not necessarily copy. On some systems it can, and so
> can fail (BSD-style naive size classifying mallocs do this).
> 
> /be

Yeah, that's my concern.  I realize on some systems the shrinking realloc is really a NOOP.  But why pay for it on systems where it isn't?
Attached patch Using realloc v4 (obsolete) — Splinter Review
The new version fixes the nits and removes the shrinking realloc.
Attachment #292803 - Attachment is obsolete: true
Attachment #292835 - Flags: review?(crowder)
Attachment #292803 - Flags: review?(crowder)
Attached patch v3-v4 delta (obsolete) — Splinter Review
Comment on attachment 292835 [details] [diff] [review]
Using realloc v4

Sorry I missed this one last time:

+ * (above). The remaining newlen positions used as GC rooted temp space
Nit: s/used as/are used as/ and maybe call "temp" space "scratch" space instead?

+ * can use it to re-arrange and convert to strings first and try

"rearrange"

+            do {
+                ok = JS_CHECK_OPERATION_LIMIT(cx, JSOW_JUMP);
+                if (!ok)
+                    goto out;
+                vec[i] = vec[2 * i + 1];
+            } while (++i != newlen);

Don't want to check operation limit here.  Bad cut and paste?  Once again, sorry I'm catching this on review #3.
Attachment #292835 - Flags: review?(crowder) → review+
(In reply to comment #57)
> +            do {
> +                ok = JS_CHECK_OPERATION_LIMIT(cx, JSOW_JUMP);
> +                if (!ok)
> +                    goto out;
> +                vec[i] = vec[2 * i + 1];
> +            } while (++i != newlen);
> 
> Don't want to check operation limit here.

I would prefer to give the branch callback a chance to stop even such a seemingly fast loop.
Attached patch Using realloc v4b (obsolete) — Splinter Review
This is v4 with comment changes to fix bad English. Here is the delta:

     /*
      * The first newlen elements of vec are copied from the array object
-     * (above). The remaining newlen positions used as GC rooted temp space
-     * for mergesort. Here we clear the tmp-values before GC-rooting them.
-     * We assume JSVAL_NULL==0 to optimize initialization using memset.
+     * (above). The remaining newlen positions are used as GC-rooted scratch
+     * space for mergesort. We must clear the space before including it to
+     * the root set covered by tvr.count. We assume JSVAL_NULL==0 to optimize
+     * initialization using memset.
      */
     mergesort_tmp = vec + newlen;
     memset(mergesort_tmp, 0, newlen * sizeof(jsval));
     tvr.count = newlen * 2;
 
     /* Here len == 2 * (newlen + undefs + number_of_holes). */
     if (fval == JSVAL_NULL) {
         /*
@@ -1196,17 +1197,17 @@ array_sort(JSContext *cx, uintN argc, js
              * indexes and pass 2 * sizeof(jsval) as an element size to the
              * sorting function. In this way sort_compare_strings will only
              * see the string values when it casts the compare arguments as
              * pointers to jsval.
              *
              * This requires doubling the temporary storage including the
              * scratch space for the merge sort. Since vec already contains
              * the rooted scratch space for newlen elements at the tail, we
-             * can use it to re-arrange and convert to strings first and try
+             * can use it to rearrange and convert to strings first and try
              * realloc only when we know that we successfully converted all
              * the elements.
              */
Attachment #292689 - Attachment is obsolete: true
Attachment #292835 - Attachment is obsolete: true
Attachment #292839 - Attachment is obsolete: true
Attachment #292855 - Flags: review?(crowder)
Don't the potential branch callback(s) here -dramatically- slow this loop down?  I thought we used CHECK_OPERATION_LIMIT essentially to restrict JSOPs, not C ops.  Without that check and its ok-check, this loop should be astoundingly fast.  And I realize this is a fallacious argument, but do we really intend to give tight control like this to all of the C loops in SpiderMonkey?
Attached patch Using realloc v5 (obsolete) — Splinter Review
I removed the branch callback from that loop and added a comment with the reasons for not calling it. Here is a delta against v4b:

         if (!all_strings) {
+            /*
+             * Do not call the branch callback in the following moving loop
+             * to make it fast and unroot the cached results of toString
+             * invocations before the callback has a chance to run the GC.
+             */
             i = 0;
             do {
-                ok = JS_CHECK_OPERATION_LIMIT(cx, JSOW_JUMP);
-                if (!ok)
-                    goto out;
                 vec[i] = vec[2 * i + 1];
             } while (++i != newlen);
Attachment #292855 - Attachment is obsolete: true
Attachment #292932 - Flags: review?(crowder)
Attachment #292855 - Flags: review?(crowder)
Comment on attachment 292932 [details] [diff] [review]
Using realloc v5

w00t (<--- Merriam Webster's word-of-the-year, I can formally use it in bugzilla now)!  I will land this shortly, if you don't.
Attachment #292932 - Flags: review?(crowder) → review+
(In reply to comment #61)
>              i = 0;
>              do {
>                  vec[i] = vec[2 * i + 1];
>              } while (++i != newlen);
> 

Is this the same as...

  for (i = 0; i < newlen; i++)
      vec[i] = vec[2 * i + 1];

?
Yeah, I wouldn't mind seeing that rephrased...  (now that the op-count is gone)
Attached patch using realloc v5b (obsolete) — Splinter Review
With jorendorff's nit picked.  I'm landing this now.
Attachment #292932 - Attachment is obsolete: true
Attachment #292984 - Flags: review+
Last patch had a bogus config.mk change in it.
Attachment #292984 - Attachment is obsolete: true
Attachment #292985 - Flags: review+
jsarray.c: 3.135
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
(In reply to comment #63)
> (In reply to comment #61)
> >              i = 0;
> >              do {
> >                  vec[i] = vec[2 * i + 1];
> >              } while (++i != newlen);
> > 
> 
> Is this the same as...
> 
>   for (i = 0; i < newlen; i++)
>       vec[i] = vec[2 * i + 1];
> 

This is not the same, as the latter will have to check i < newlen even when i == 0 and will have an extra jump in the generated code.
> > Is this the same as...
> > 
> >   for (i = 0; i < newlen; i++)
> >       vec[i] = vec[2 * i + 1];
> > 
> 
> This is not the same, as the latter will have to check i < newlen even when
> i == 0 and will have an extra jump in the generated code.

Worth noting, maybe worth fixing?  I'm not too worried about it, but will gladly r+ and land a follow-up patch, if you're interested, Igor.
Any reasonably smart compiler should be able to see that newlen isn't reset between the |newlen == 0| check and the loop, so the generated code should ideally be the same.  Even if they don't, I wouldn't be surprised to see them generate special code for the 0 case to avoid the branch inside the loop.  I don't think this is worth worrying about, but feel free to check what the compilers do if you're feeling paranoid -- I wouldn't bother, personally.
(In reply to comment #68)
> This is not the same as the later will have to check for i < newlen even when
> i == 0 and will have an extra jump in the generated code. 

Yeah, but computers are fast.  The former causes extra branches in my brain!
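For reference, the two loop shapes are behaviorally identical whenever newlen > 0, which the surrounding code has already guaranteed by bailing out on empty arrays. A small JavaScript model of the vec[i] = vec[2*i+1] move (function names are invented for illustration):

```javascript
// do-while form, as landed: valid only when newlen > 0.
function compactDoWhile(vec, newlen) {
  let i = 0;
  do {
    vec[i] = vec[2 * i + 1];
  } while (++i !== newlen);
  return vec.slice(0, newlen);
}

// for-loop form, as suggested in the nit: also handles newlen == 0.
function compactFor(vec, newlen) {
  for (let i = 0; i < newlen; i++)
    vec[i] = vec[2 * i + 1];
  return vec.slice(0, newlen);
}
```

Given the same interleaved input, both pull the odd-indexed values into the first newlen slots; the disagreement was only about the generated machine code, not the result.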
Blocks: 408368
(In reply to comment #70)
> Any reasonably smart compiler should be able to see that newlen isn't reset
> between the |newlen == 0| check and the loop, so the generated code should
> ideally be the same. 

I filed bug 408368, which shows the difference with GCC at -Os. For reference, for the browser the option is just -O. But I am not going to spend more time on this; if somebody wants to, feel free to grab that bug.

> Even if they don't, I wouldn't be surprised to see them
> generate special code for the 0 case to avoid the branch inside the loop.  I
> don't think this is worth worrying about, but feel free to check what the
> compilers do if you're feeling paranoid -- I wouldn't bother, personally.

The issue here is that a nit on the patch from comment 61 (to use a different loop) was picked up and landed without giving the author of the patch a chance to reply.

I do not mind and, in fact, would appreciate it if somebody fixes bad English or a code-style violation in my patch before checking it in. But changing the code before landing, without giving a chance to explain the original reasons behind it or even to acknowledge the suggestion, does not sound right.
> I do not mind and, in fact, would appreciate it if somebody fixes bad English
> or a code-style violation in my patch before checking it in. But changing the
> code before landing, without giving a chance to explain the original reasons
> behind it or even to acknowledge the suggestion, does not sound right.

Igor:  Very sorry, this was bad form on my part, caused by some urgency to land the patch.  In the future, I'll follow up with the rewrite.
This is reverted in CVS to Igor's original code (and I hope Igor will forgive my faux pas).  If the war between efficiency and clarity is won, do let me know.