Closed Bug 1082580 Opened 10 years ago Closed 11 months ago

nsTArray::Sort uses non-templated NS_QuickSort function where each comparison is a non-inlined function call

Tracking

()

Status:

RESOLVED WORKSFORME

People

(Reporter: bjacob, Unassigned)

References

Details

Attachments

(1 file)

benchmarking qsort vs std::sort 10 years ago Benoit Jacob [:bjacob] (mostly away) 1.22 KB, patch		Details \| Diff \| Splinter Review

Benoit Jacob [:bjacob] (mostly away)

Reporter

Description

•

10 years ago

This is sort of the school example of where C++ can be 2x faster than C - basically the reason why templates were added to C++. NS_QuickSort is a plain C function; every time it needs to compare two array entries, it calls a comparison function that takes 3 parameters and returns a return value. Want to bet that NS_QuickSort spends more time accomoding this, than doing useful work? Anyway, I'll just use the C++ STL for now.

Nathan Froyd [:froydnj]

Comment 1

•

10 years ago

(In reply to Benoit Jacob [:bjacob] from comment #0) > Want to bet that NS_QuickSort spends more time accomoding this, than doing > useful work? Not without a profile, I don't. > Anyway, I'll just use the C++ STL for now. Of course, NS_QuickSort has the notable codesize benefit that there's not N sort implementations for N different datatypes...

Benoit Jacob [:bjacob] (mostly away)

Reporter

Comment 2

•

10 years ago

$ nm -C obj-firefox-debug/dist/bin/libxul.so | grep 'nsTArray_Impl' | grep '[>]::Sort' | wc -l 41 So we have 41 instantiations of nsTArray_Impl::Sort. With a standalone program, one can check that libstdc++ generates roughly 1.7k of code per std::sort instantiation, with -O3 -DNDEBUG, with both clang and gcc. So we can expect that switching nsTArray_Impl::Sort to std::sort would increase libxul's code size by about 41 * 1.7k ~= 70k.

Benoit Jacob [:bjacob] (mostly away)

Reporter

Comment 3

•

10 years ago

Attached patch benchmarking qsort vs std::sort — Details — Splinter Review

Results are just as expected... sometimes, the wisdom of "always ask for a profile before talking about performance" breaks down. $ clang++ -O3 -DNDEBUG a.cpp -o a $ ./a sorting an array of 1000000 elements (templated) std::sort took 0.0582 s (non-templated) std::qsort took 0.111 s (templated) std::sort took 0.0584 s (non-templated) std::qsort took 0.112 s (templated) std::sort took 0.0589 s (non-templated) std::qsort took 0.109 s (templated) std::sort took 0.0593 s (non-templated) std::qsort took 0.111 s

Nathan Froyd [:froydnj]

Comment 4

•

10 years ago

Thank you for providing the numbers. The next questions would be: 1) Is this a realistic simulation of Gecko's sorting workloads? (My guess is that virtually nothing in Gecko sorts a million elements in one go.) 2) Are we seeing profiles where NS_QuickSort is the bottleneck? I'm all for faster code, but I'm not keen on introducing ~70K of code just so a few 10-element sorts can go a microsecond faster.

Benoit Jacob [:bjacob] (mostly away)

Reporter

Comment 5

•

10 years ago

(In reply to Nathan Froyd (:froydnj) from comment #4) > Thank you for providing the numbers. > > The next questions would be: > > 1) Is this a realistic simulation of Gecko's sorting workloads? (My guess > is that virtually nothing in Gecko sorts a million elements in one go.) Actually, changing the size in this benchmark, the 2x performance difference stays roughly constant across _all_ array sizes all the way down to size 2. (Yes, clock_gettime(CLOCK_THREAD_CPUTIME_ID) is that precise). > > 2) Are we seeing profiles where NS_QuickSort is the bottleneck? That's a good question indeed :) I pushed to try a patch switching to std::sort, and Talos didn't see any significant performance difference. https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9d816742dea2 http://perf.snarkfest.net/compare-talos/?oldRevs=54217864bae9,155b84a1d18a,c7f5a7b46fcd&newRev=9d816742dea2&submit=true > > I'm all for faster code, but I'm not keen on introducing ~70K of code just > so a few 10-element sorts can go a microsecond faster. Of course. All sorts are 2x faster, even 2-element sorts, when using a templated sort, but that doesn't mean that sorting is important for gecko performance.

Simon Giesecke [:sg] [he/him]

Comment 6

•

5 years ago

I'd like to give that a heads up, given that the modern libstdc++ implementation uses introsort, which has optimal worst-case performance as opposed to quicksort?

(I have seen arguments that don't want to use Sort but rather a loop doing InsertElementSorted, because that might be faster for small N, which doesn't seem like a good situation.)

Can we just see how changing from NS_QuickSort to std::sort affects the code size, and then think again if the effect is so bad that is warrants to keep using the non-templated sort? It's not even necessarily the case that every instance of the template generates distinct code, even for different template parameters.

Simon Giesecke [:sg] [he/him]

Comment 7

•

5 years ago

It turns out that replacing NS_QuickSort by std::sort in nsTArray_Impl::Sort is not that trivial, since apparently that's used with types that are not (move-)assignable, which is a requirement for std::sort. OTOH, that lead me to the insight that nsTArray_Impl::Sort uses memutils even for types that use MOZ_DECLARE_RELOCATE_USING_MOVE_CONSTRUCTOR, which is a defect. (But I am not yet sure if sorting is done for any such element type)

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Jens Stutte [:jstutte]

Updated

•

2 years ago

Updated

•

11 months ago

Status: NEW → RESOLVED

Closed: 11 months ago

Depends on: 1839051

Resolution: --- → WORKSFORME

Bugzilla

nsTArray::Sort uses non-templated NS_QuickSort function where each comparison is a non-inlined function call

Categories

(Core :: XPCOM, defect)

Tracking

()

People

(Reporter: bjacob, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Updated

Updated

Updated

Attachment

General

Description

File Name

Content Type