Closed
Bug 701560
Opened 12 years ago
Closed 12 years ago
template version of merge sort
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla11
People
(Reporter: igor, Assigned: igor)
References
(Blocks 1 open bug)
Details
Attachments
(1 file, 4 obsolete files)
33.21 KB,
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
For the bug 669245 and other GC-related patches I need a sort function. Currently jsarray.cpp provides an implementation of the merge sort. However, that uses a function pointer for the compare operation. That is be expensive especially for simple compare operations. It would be nice to have a template version of the sort implementation to avoid the overhead of indirect function calls.
Assignee | ||
Comment 1•12 years ago
|
||
The patch provides the templated version of the merge sort and switches jsarray.cpp to use it. To measure its effects I use the following benchmark that sorts an array with 2e6 strings: var a = []; for (var i = 2e6; i != 0; --i) a.push(""+i); var t = Date.now(); a.sort(); t = Date.now() - t; print("Time: "+t); Without the patch the running time is 724 ms, with the patch that drops to 451 ms.
Assignee: general → igor
Attachment #573685 -
Flags: review?(luke)
Comment 2•12 years ago
|
||
Iterative merge sort, cool :) 1) Since the scratch space is callee-allocated it doesn't matter for performance, but you could drop the bool * parameter if you adjusted the insertion sort limit (and the starting run size) based on whether ceil(log2(nelem)) (or its integer equivalent) is even or odd. That might introduce a little bit of slowdown by making INS_SORT_LIMIT dynamic though. 2) Could C++11 move semantics (e.g. *dst++ = move(*a++)) be used to avoid deep copying for sorting strings? I don't know if Mozilla's build environments all support those. 3) If comparisons can be expensive (e.g. almost-equal strings), have you considered using binary merging for cases where e.g. run1 > 8 * run2? If the majority of overhead is in the sorting function itself (e.g. sorting POD types) this won't help.
Comment 3•12 years ago
|
||
Ah, I see you do copy the result back over to src a bit further down. So yes, you can avoid that copy by doing what I proposed in 1). (sorry about the bugspam)
Assignee | ||
Comment 4•12 years ago
|
||
(In reply to Emanuel Hoogeveen from comment #2) > 1) Since the scratch space is callee-allocated it doesn't matter for > performance, but you could drop the bool * parameter if you adjusted the > insertion sort limit (and the starting run size) based on whether > ceil(log2(nelem)) (or its integer equivalent) is even or odd. That might > introduce a little bit of slowdown by making INS_SORT_LIMIT dynamic though. The goal of this bug is to replace in the existing implementation of the merge sort the function pointer comparator with a template compare operation to make things faster especially with a trivial compare operation. Optimizations like the above suggestions is orthogonal to that especially since they may slowdown things. For example, I have remember from benchmarks that the increasing the insertion sort size can make things slower, see bug 224128 comment 25 and later. So I will file a bug to consider tuning the sort including investigating decreasing the extra storage requirement to n/2. > 2) Could C++11 move semantics (e.g. *dst++ = move(*a++)) be used to avoid > deep copying for sorting strings? I don't know if Mozilla's build > environments all support those. AFAIC we cannot so far use C++11 freely. > 3) If comparisons can be expensive (e.g. almost-equal strings), have you > considered using binary merging for cases where e.g. run1 > 8 * run2? If the > majority of overhead is in the sorting function itself (e.g. sorting POD > types) this won't help. A compare operation is rather expensive if it is done using user-supplied JS function. So anything that decrease their number should be helpful. But that should not harm other cases.
Assignee | ||
Comment 5•12 years ago
|
||
The new version replaces in the comparator class operator()(a, b) with lessOrEqual(a, b) to make clearer the meaning of the compare operation.
Attachment #573685 -
Attachment is obsolete: true
Attachment #573685 -
Flags: review?(luke)
Attachment #574244 -
Flags: review?(luke)
Assignee | ||
Comment 6•12 years ago
|
||
The previous attachment had a wrong patch.
Attachment #574244 -
Attachment is obsolete: true
Attachment #574244 -
Flags: review?(luke)
Attachment #574245 -
Flags: review?(luke)
Comment 7•12 years ago
|
||
(In reply to Igor Bukanov from comment #4) > The goal of this bug is to replace in the existing implementation of the > merge sort the function pointer comparator with a template compare operation > to make things faster especially with a trivial compare operation. > Optimizations like the above suggestions is orthogonal to that You're right, I don't mean to hold up this bug for it (which is clearly an improvement over the status quo). I'll do some testing at home, see if I can setup a build environment, and file a follow-up if I can demonstrate improvements after this lands :)
![]() |
||
Comment 8•12 years ago
|
||
Comment on attachment 574245 [details] [diff] [review] v2 for real Great, the patch will make things much nicer. Can we avoid this whole "error()" business and have the return value of MergeSort and the comparator indicate success/fail? Thus lessOrEqual would return its success-value through an outparam.
Attachment #574245 -
Flags: review?(luke)
Assignee | ||
Comment 9•12 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #8) > Comment on attachment 574245 [details] [diff] [review] [diff] [details] [review] > v2 for real > > Can we avoid this whole > "error()" business and have the return value of MergeSort and the comparator > indicate success/fail? In that case if the compare function is not available at the point of template instantiation or it cannot be inlined, then the compiler cannot see that the compare function is infallible and would generate useless error checking code. With separated error function that can be inlined this is avoided. An alternative to the error() is an extra template parameter that specifies the return type of the compare operation. It can be either bool to specify an infallible operation or some 3-state type. I tried that, but it quickly became rather awkward with the need to map that extra type to void in case of infallible compare or to bool in case of fallible operation.
![]() |
||
Comment 10•12 years ago
|
||
(In reply to Igor Bukanov from comment #9) > In that case if the compare function is not available at the point of > template instantiation or it cannot be inlined, then the compiler cannot see > that the compare function is infallible and would generate useless error > checking code. For all the cases where this actually matters it should be easy (and already done) to ensure that the definition of the comparator is visible and inlined; that's the whole point, right? It would be crazy to gunk up the whole code with this or the other proposed non-standard error-handling strategy for such a miniscule and improbable situation.
Assignee | ||
Comment 11•12 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #10) > For all the cases where this actually matters it should be easy (and already > done) to ensure that the definition of the comparator is visible and > inlined; that's the whole point, right? It would be crazy to gunk up the > whole code with this or the other proposed non-standard error-handling > strategy for such a miniscule and improbable situation. I tried to implement a patch that would use a stl style comparator: bool operator()(const T &a, const T &b, bool *lessOrEqualp); However, that made a benchmark that similar to one in the comment 1 to run in about 569 ms, not 548 ms. as it does with the last patch. I also tried a version that uses 3-state result, like error, false, true: int operator()(const T &a, const T &b); That improved the running time slightly to 560 ms, but still this is slower than 548. So for some reasons the GCC 4.6 on 32 bit can optimize the error() method better. So is 10% reduction in speed is acceptable price for the following a standard error-handling convention?
![]() |
||
Comment 12•12 years ago
|
||
(In reply to Igor Bukanov from comment #11) > So is 10% reduction in speed is acceptable price for the following a > standard error-handling convention? I don't see why there would be a speed difference if everything was being inlined. Did you try JS_ALWAYS_INLINE'ing the comparator? Failing that, it would be interesting to see the asm diff.
Assignee | ||
Comment 13•12 years ago
|
||
This is the patch that I use to compare with v2. It requires for the comparator to implement: bool operator()(const T &a, const T &b, bool *lessOrEqualp); If I add JS_ALWAYS_INLINE on top of it, then the test timing decreases from 569 to 558. That matches the version that uses 3-state return value. But that does not match 548 that I got from that error function. Assembly diff shows AFAICS that with that error() function GCC uses less stack variables and keeps more stuff in registers.
Attachment #574668 -
Flags: review?
Assignee | ||
Comment 14•12 years ago
|
||
The new version uses the standard style of the error handling in the comparator signature but tunes the merge function implementation to minimize the amount of generated code. With that change the sorting time for the test case drops from 569 to 522 ms, which is significantly better than for the previous patch with non-standard error propagation (548 ms). The assembly shows that with GCC 4.6 on 32-bit x86 code everything is inlined even without JS_ALWAYS_INLINE on the compare functions (just inline is enough). For references, without the patch the sorting time is 839 ms.
Attachment #574245 -
Attachment is obsolete: true
Attachment #574668 -
Attachment is obsolete: true
Attachment #574668 -
Flags: review?
Attachment #574887 -
Flags: review?(luke)
![]() |
||
Comment 15•12 years ago
|
||
Comment on attachment 574887 [details] [diff] [review] v4 Review of attachment 574887 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/ds/Sort.h @@ +135,5 @@ > + if (lessOrEqual) > + break; > + T tmp = array[j - 1]; > + array[j - 1] = array[j]; > + array[j] = tmp; Incidentally, gcc seems really bad manipulating uint64 by value on 32-bit. I just looked with -O3 and got something awful. When I broke it down into two manual uint32 swaps, the resulting code was 6 memory movs shorter. This suggests replacing all these inlined swaps with a call to js::Swap(a,b) (in jsutil.h) which has an overload for js::Value that does the uint32 swaps. If you're tired of tweaking this, feel free to ignore; just a suggestion. Adding a reusable Swap(a,b) does seem nice, though. @@ +157,5 @@ > + return false; > + } > + T *swap = vec1; > + vec1 = vec2; > + vec2 = swap; ... and this could use Swap() as well. ::: js/src/jsarray.cpp @@ +1988,5 @@ > + *lessOrEqualp = (result <= 0); > + return true; > +} > + > +struct SortComparatorStrings { Can all the {'s go on a newline? It matches function definitions. @@ +2129,2 @@ > ~AutoFreeVector() { > + Foreground::free_(vec); Since you are already touching this, can you just replace this with a js::Vector? @@ +2216,5 @@ > vec[2 * i].setString(str); > } while (i != 0); > > + Value *extraScratch = static_cast<Value *>(cx->malloc_(newlen * 2 * sizeof(Value))); > + if (!extraScratch) Also could you use a Vector here?
Attachment #574887 -
Flags: review?(luke) → review+
Assignee | ||
Comment 16•12 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #15) > This > suggests replacing all these inlined swaps with a call to js::Swap(a,b) (in > jsutil.h) which has an overload for js::Value that does the uint32 swaps. > If you're tired of tweaking this, feel free to ignore; just a suggestion. > Adding a reusable Swap(a,b) does seem nice, though. I file a separated bug about it and other sorting improvements/fixes. > > +struct SortComparatorStrings { > > Can all the {'s go on a newline? It matches function definitions. > > @@ +2129,2 @@ > > ~AutoFreeVector() { > > + Foreground::free_(vec); > > Since you are already touching this, can you just replace this with a > js::Vector? I will replace it with AutoVectorRooter.
Assignee | ||
Comment 17•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/e1587f23d2f0
Comment 18•12 years ago
|
||
(In reply to Igor Bukanov from comment #17) > https://hg.mozilla.org/integration/mozilla-inbound/rev/e1587f23d2f0 Burning.
Assignee | ||
Comment 19•12 years ago
|
||
(In reply to Ms2ger from comment #18) > (In reply to Igor Bukanov from comment #17) > > https://hg.mozilla.org/integration/mozilla-inbound/rev/e1587f23d2f0 > > Burning. I backed out that and landed a version with trivial fixes for 64 bits: https://hg.mozilla.org/integration/mozilla-inbound/rev/9f758d9c051e
Comment 20•12 years ago
|
||
Talos Regression :( Dromaeo (CSS) decrease 1.46% on XP Mozilla-Inbound-Non-PGO https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/EM2bCJ0Rk9M
Comment 21•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/9f758d9c051e (Although is the regression in comment 20 expected?)
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla11
Assignee | ||
Comment 22•12 years ago
|
||
(In reply to Ed Morley [:edmorley] from comment #20) > Talos Regression :( Dromaeo (CSS) decrease 1.46% on XP > Mozilla-Inbound-Non-PGO > > https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/ > EM2bCJ0Rk9M Dromaeo (CSS) does not call Array.sort. I suppose the extra code bloat coming from the template expansion could made things slower, via, for example, moving the array methods that the benchmark calls into different pages.
Comment 23•12 years ago
|
||
Comment on attachment 574887 [details] [diff] [review] v4 >--- /dev/null >+++ b/js/src/ds/Sort.h >+ * The Initial Developer of the Original Code is >+ * Mozilla Corporation. "the Mozilla Foundation."
Updated•12 years ago
|
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 24•12 years ago
|
||
How about you just hop on #jsapi, find someone to rubber-stamp the change, and push it without passive-aggressively reopening the bug seemingly just for that? I mean, really. :-)
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•