Write a fast GCD algorithm
Categories
(Core :: MFBT, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox115 | --- | fixed |
People
(Reporter: padenot, Assigned: padenot)
References
(Blocks 1 open bug)
Details
Attachments
(4 files)
I'm going to reduce a bunch of fractions in another patch, and the gcd algorithm we have is very naive, lets get something faster.
Assignee | ||
Comment 1•2 years ago
|
||
Compile with:
clang++ -O3 bench-gcd.cpp -std=c++17
sample run on x86_64:
~::$ ./a.out
reducing 1000000 fractions
binary -- took: 99391810ns (99.3918ns per fraction) 307709210 cycles, 307.709 per fractions
euclid -- took: 1003870058ns (1003.87ns per fraction) 3107962156 cycles, 3107.96 per fractions
That's about a 10x speedup.
On the g++ I have locally, the speedup is less pronounced: the new version is slower than clang (~170ns per fraction), the old version is faster than clang (~656ns per fraction). Weird but also not that problematic.
Comment 2•2 years ago
|
||
Could we actually just switch to std::gcd
https://en.cppreference.com/w/cpp/numeric/gcd?
Assignee | ||
Comment 3•2 years ago
|
||
It's ~10% slower than my code on the clang we use at the optimization level we use.
Comment 4•2 years ago
|
||
The implementation available in https://en.algorithmica.org/hpc/algorithms/gcd/ avoids a few branching and is, in my setup, more than two times faster than the version linked to this patch. It's worth giving it a try :-)
Comment 5•2 years ago
|
||
I asked ChatGPT for some answers, and it gave these 4 versions.
Assignee | ||
Comment 6•2 years ago
|
||
Depends on D173313
Assignee | ||
Comment 7•2 years ago
|
||
Perf notes:
https://lemire.me/blog/2013/12/26/fastest-way-to-compute-the-greatest-common-divisor/
Depends on D173314
Comment 8•2 years ago
|
||
(In reply to Paul Adenot (:padenot) from comment #3)
It's ~10% slower than my code on the clang we use at the optimization level we use.
Looking at the generated assembly code, your version has a tighter loop code when compared to std::gcd
. Only after switching to -march=x86-64-v3
both versions generate (essentially) the same code.
Assignee | ||
Comment 9•2 years ago
|
||
(In reply to André Bargull [:anba] from comment #8)
(In reply to Paul Adenot (:padenot) from comment #3)
It's ~10% slower than my code on the clang we use at the optimization level we use.
Looking at the generated assembly code, your version has a tighter loop code when compared to
std::gcd
. Only after switching to-march=x86-64-v3
both versions generate (essentially) the same code.
The current code has a version that's 30% faster than the previous one I had, so even faster.
Comment 10•2 years ago
|
||
I think the current version has a bug, because it doesn't handle overflows in T diff = aA - aB;
. For example GCD<uint32_t>(3, 7)
returns 3
, whereas std::gcd<uint32_t>(3, 7)
returns the correct result 1
.
Comment 11•2 years ago
|
||
Comment 12•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/6dd80575e551
https://hg.mozilla.org/mozilla-central/rev/04c60cd83c5f
Comment 13•2 years ago
|
||
Backed out 80 changesets (bug 1821362, bug 1703812, bug 1817997) for causing media crashes as in Bug 1833890.
Backout link: https://hg.mozilla.org/mozilla-central/rev/225c5ab0d999e743db5298d125893ae0702884af
Comment 14•2 years ago
|
||
Comment 15•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/159e2eee7549
https://hg.mozilla.org/mozilla-central/rev/dc506e3ad29d
Description
•