Closed
Bug 1477622
Opened 6 years ago
Closed 6 years ago
Add microbenchmarks measuring hash table performance
Categories
(Core :: XPCOM, enhancement)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox63 | --- | fixed |
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
Attachments
(1 file)
Hash tables are used a lot, and we have a bunch of different implementations. Microbenchmarks will give us a sense of how they compare, performance-wise.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 2•6 years ago
|
||
Results from an opt build on my fast Linux box: > [ RUN ] BenchCollections.PLDHash > > succ_lookups fail_lookups insert_remove iterate > 39.0 ms 35.3 ms 16.9 ms 23.9 ms > 38.2 ms 35.2 ms 15.7 ms 24.4 ms > 38.4 ms 36.1 ms 16.5 ms 25.4 ms > 39.3 ms 36.2 ms 16.1 ms 24.6 ms > 39.9 ms 36.3 ms 16.0 ms 25.6 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "PLDHash", "value": 116225, "replicates": [115107,113632,116428,116225,117817], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.PLDHash (579 ms) > [ RUN ] BenchCollections.nsDataHash > > succ_lookups fail_lookups insert_remove iterate > 40.6 ms 40.1 ms 19.6 ms 24.9 ms > 39.2 ms 38.0 ms 18.7 ms 25.0 ms > 38.8 ms 38.0 ms 19.4 ms 25.3 ms > 38.6 ms 38.8 ms 18.7 ms 24.9 ms > 38.6 ms 37.8 ms 18.6 ms 25.0 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "nsDataHash", "value": 121068, "replicates": [125224,120960,121527,121068,119899], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.nsDataHash (609 ms) > [ RUN ] BenchCollections.JSHash > > succ_lookups fail_lookups insert_remove iterate > 17.9 ms 22.0 ms 12.8 ms 6.8 ms > 18.2 ms 21.1 ms 13.0 ms 7.2 ms > 18.1 ms 21.5 ms 12.8 ms 7.0 ms > 17.3 ms 22.0 ms 13.0 ms 7.4 ms > 17.4 ms 21.8 ms 12.4 ms 7.2 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "JSHash", "value": 59439, "replicates": [59474,59439,59405,59711,58833], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.JSHash (297 ms) > [ RUN ] BenchCollections.RustHash > > succ_lookups fail_lookups insert_remove iterate > 847.6 ms 830.6 ms 127.1 ms 107.7 ms > 865.8 ms 837.4 ms 128.4 ms 108.3 ms > 870.3 ms 819.4 ms 127.1 ms 107.8 ms > 1006.6 ms 853.4 ms 128.0 ms 115.3 ms > 884.0 ms 831.1 ms 126.5 ms 106.8 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustHash", "value": 1939955, "replicates": [1912975,1939955,1924661,2103302,1948463], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustHash (9830 ms) > [ RUN ] BenchCollections.RustFnvHash > > succ_lookups fail_lookups insert_remove iterate > 701.5 ms 679.4 ms 118.9 ms 106.0 ms > 737.3 ms 784.3 ms 121.3 ms 105.8 ms > 703.2 ms 673.9 ms 119.4 ms 105.6 ms > 703.7 ms 676.4 ms 119.4 ms 108.1 ms > 708.8 ms 684.0 ms 121.9 ms 110.7 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFnvHash", "value": 1607643, "replicates": [1605724,1748703,1602168,1607643,1625358], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFnvHash (8189 ms) > [ RUN ] BenchCollections.RustFxHash > > succ_lookups fail_lookups insert_remove iterate > 468.8 ms 334.1 ms 55.7 ms 118.1 ms > 384.4 ms 292.0 ms 54.6 ms 116.4 ms > 447.6 ms 348.5 ms 60.1 ms 125.5 ms > 459.9 ms 292.1 ms 54.9 ms 115.9 ms > 387.6 ms 296.2 ms 54.9 ms 117.1 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFxHash", "value": 922830, "replicates": [976660,847416,981651,922830,855663], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFxHash (4584 ms) Results from a release (i.e. --enable-release) build on my fast Linux box: > [ RUN ] BenchCollections.PLDHash > > succ_lookups fail_lookups insert_remove iterate > 43.2 ms 52.4 ms 21.0 ms 33.5 ms > 41.6 ms 52.2 ms 19.9 ms 33.7 ms > 41.3 ms 51.4 ms 19.3 ms 33.5 ms > 41.8 ms 51.1 ms 19.8 ms 34.2 ms > 41.3 ms 51.3 ms 19.3 ms 34.0 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "PLDHash", "value": 146968, "replicates": [150070,147343,145430,146968,145905], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.PLDHash (736 ms) > [ RUN ] BenchCollections.nsDataHash > > succ_lookups fail_lookups insert_remove iterate > 43.2 ms 55.6 ms 19.7 ms 34.0 ms > 43.6 ms 56.9 ms 19.6 ms 34.6 ms > 45.5 ms 55.6 ms 19.6 ms 33.8 ms > 43.1 ms 54.9 ms 19.1 ms 33.6 ms > 43.0 ms 55.2 ms 18.8 ms 34.4 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "nsDataHash", "value": 152587, "replicates": [152587,154664,154490,150807,151499], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.nsDataHash (764 ms) > [ RUN ] BenchCollections.JSHash > > succ_lookups fail_lookups insert_remove iterate > 15.2 ms 17.8 ms 12.2 ms 5.5 ms > 15.5 ms 18.5 ms 11.7 ms 5.7 ms > 14.9 ms 18.5 ms 11.9 ms 5.4 ms > 15.3 ms 17.7 ms 11.9 ms 5.3 ms > 14.7 ms 18.3 ms 11.6 ms 5.5 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "JSHash", "value": 50608, "replicates": [50650,51419,50608,50147,50102], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.JSHash (253 ms) > [ RUN ] BenchCollections.RustHash > > succ_lookups fail_lookups insert_remove iterate > 71.1 ms 46.7 ms 14.5 ms 8.2 ms > 66.0 ms 47.1 ms 15.0 ms 8.1 ms > 62.3 ms 46.4 ms 15.2 ms 8.1 ms > 66.4 ms 46.4 ms 14.6 ms 8.2 ms > 66.8 ms 46.9 ms 14.7 ms 8.1 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustHash", "value": 136219, "replicates": [140511,136219,132036,135527,136441], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustHash (681 ms) > [ RUN ] BenchCollections.RustFnvHash > > succ_lookups fail_lookups insert_remove iterate > 55.7 ms 32.8 ms 12.8 ms 8.2 ms > 55.2 ms 33.5 ms 12.8 ms 8.2 ms > 54.0 ms 33.1 ms 12.8 ms 8.2 ms > 54.3 ms 32.4 ms 12.9 ms 8.1 ms > 55.0 ms 33.4 ms 12.9 ms 8.6 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFnvHash", "value": 109523, "replicates": [109523,109718,108079,107759,109966], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFnvHash (545 ms) > [ RUN ] BenchCollections.RustFxHash > > succ_lookups fail_lookups insert_remove iterate > 9.2 ms 7.4 ms 5.2 ms 7.8 ms > 8.9 ms 7.4 ms 5.0 ms 7.9 ms > 8.8 ms 7.5 ms 5.0 ms 8.4 ms > 8.6 ms 7.6 ms 5.1 ms 7.9 ms > 8.6 ms 7.8 ms 5.0 ms 7.9 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFxHash", "value": 29352, "replicates": [29609,29286,29733,29298,29352], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFxHash (148 ms) Things of note when comparing the non-release build with the release build: - PLDHash and nsDataHash are noticeably slower with --enable-release. - JSHash is noticeably faster with --enable-release. - The Rust ones are massively (10--20x) faster with --enable-release. (The non-release Rust results are quite similar to a debug build.) I'm not sure what to make of that. Things of note within the release build results: - nsDataHash is the same or slightly slower than PLDHash. This makes sense, because it's layered on top of PLDHash. - JSHash is much faster than PLDHash/nsDataHash: about 2--6x faster, depending on the benchmark. They use almost identical double hashing algorithms, so the most likely reason for this is that PLDHash has "virtual ops" that require function calls, while JSHash uses templates and so can inline those calls. - Rust hash tables with default hashing are slower than PLDHash for succ_lookups but faster for the other benchmarks. Fnv hashing speeds things up a bit. Fx hashing speeds things up a *lot*. - FxHash is the fastest overall, and JSHash is second fastest. The others are significantly slower. Conclusions: - We should use JSHash instead of PLDHash/nsDataHash for hot tables such as CCGraph::mPtrToNodeMap (bug 1477627). This will require moving JSHash into MFBT (bug 1477626), as was done with js::Vector some time ago. Then hot PLDHash/nsDataHash instances can be converted to JSHash. (As well as being faster, JSHash also has a much nicer API.) - We should use FxHash instead of FnvHash within Rust code (bug 1477628). Unlike PLDHash-to-JSHash conversions, which must be done one at a time and only when it makes sense, FnvHash-to-FxHash conversions are trivial and can be done en masse. (BTW, this very change was a win for rustc back in 2016: https://github.com/rust-lang/rust/pull/37229). - Robin hood hashing may explain why FxHash is faster than JSHash. But it looks like there are some constant factors (e.g. avoiding virtual op calls) that can be fixed more easily and give bigger wins than changing the algorithm.
Comment 3•6 years ago
|
||
mozreview-review |
Comment on attachment 8994104 [details] Bug 1477622 - Add microbenchmarks measuring hash table performance. https://reviewboard.mozilla.org/r/258730/#review265684 Thanks. ::: xpcom/rust/gtest/bench-collections/Bench.cpp:261 (Diff revision 1) > +MOZ_GTEST_BENCH_F(BenchCollections, PLDHash, [this] { > + BenchImpl(Bench_Cpp_PLDHashTable); > +}); Please add a benchmark for `std::unordered_map`, so we have some data to point to if somebody comes along and asks why we're not using the `std::` data structures. (Or we have data indicating we should switch, but I expect that's unlikely.) Bonus points if you test it with the default hash and also whatever `mozilla::Hash` does. ::: xpcom/rust/gtest/bench-collections/Bench.cpp:265 (Diff revision 1) > +MOZ_GTEST_BENCH_F(BenchCollections, nsDataHash, [this] { > + BenchImpl(Bench_Cpp_nsDataHashtable); > +}); Why is this benchmark valuable compared to the `PLDHash` benchmark? ::: xpcom/rust/gtest/bench-collections/bench.rs:51 (Diff revision 1) > +fn Bench_Rust<H: std::hash::BuildHasher>(mut hs: HashSet<*const c_void, H>, params: *const Params, > + vals: *const *const c_void, len: usize) { I'm assuming that we're relying on the `assert!()` bits to prevent `rustc` from optimizing away the body of this function after inlining? Should we note that? Or should we use `black_box` (or whatever it's called) to achieve the same thing without the extra conditional branches?
Attachment #8994104 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 4•6 years ago
|
||
> Please add a benchmark for `std::unordered_map` Good suggestion. Here are the results with the default hash function: > succ_lookups fail_lookups insert_remove iterate > 44.5 ms 51.1 ms 21.9 ms 5.0 ms > 43.7 ms 50.9 ms 21.5 ms 5.0 ms > 43.8 ms 51.3 ms 21.4 ms 4.9 ms > 43.7 ms 51.3 ms 21.6 ms 5.0 ms > 43.2 ms 50.9 ms 21.6 ms 5.0 ms Very similar to PLDHash, except that iteration over unordered_set is much faster: indeed, faster than any other hash table. > > + BenchImpl(Bench_Cpp_nsDataHashtable); > > Why is this benchmark valuable compared to the `PLDHash` benchmark? I was curious to see if the cost of the layering was significant, and the answer is "slightly". I'll remove it. > I'm assuming that we're relying on the `assert!()` bits to prevent `rustc` > from optimizing away the body of this function after inlining? Should we > note that? Or should we use `black_box` (or whatever it's called) to > achieve the same thing without the extra conditional branches? I used `assert!` because it seemed the most equivalent thing to `MOZ_RELEASE_ASSERT`. I'll add a note that the cross-language comparisons are less reliable than the intra-language comparisons.
Assignee | ||
Comment 5•6 years ago
|
||
Here are Linux opt 64 measurements from a try push:
> TEST-START | BenchCollections.unordered_set
> succ_lookups fail_lookups insert_remove iterate total
> 105.0 ms 118.6 ms 35.6 ms 7.3 ms 266.5 ms
> 100.5 ms 119.3 ms 37.3 ms 7.3 ms 264.4 ms
> 100.7 ms 119.0 ms 38.3 ms 7.3 ms 265.3 ms
> 100.7 ms 118.7 ms 38.0 ms 7.3 ms 264.7 ms
> 100.7 ms 119.4 ms 37.7 ms 7.3 ms 265.1 ms
> TEST-START | BenchCollections.PLDHash
> succ_lookups fail_lookups insert_remove iterate total
> 75.1 ms 63.6 ms 29.6 ms 58.2 ms 226.5 ms
> 74.6 ms 63.9 ms 29.6 ms 58.2 ms 226.3 ms
> 74.5 ms 64.5 ms 29.3 ms 59.9 ms 228.2 ms
> 74.7 ms 65.2 ms 29.4 ms 58.6 ms 228.0 ms
> 74.8 ms 64.1 ms 29.3 ms 58.1 ms 226.2 ms
> TEST-START | BenchCollections.JSHash
> succ_lookups fail_lookups insert_remove iterate total
> 29.0 ms 29.8 ms 17.7 ms 14.4 ms 91.0 ms
> 28.5 ms 30.0 ms 17.6 ms 14.3 ms 90.4 ms
> 28.6 ms 30.1 ms 17.6 ms 14.3 ms 90.6 ms
> 28.3 ms 29.7 ms 17.7 ms 13.9 ms 89.6 ms
> 28.1 ms 29.9 ms 17.6 ms 14.0 ms 89.6 ms
> TEST-START | BenchCollections.RustHash
> succ_lookups fail_lookups insert_remove iterate total
> 93.4 ms 79.4 ms 21.7 ms 13.2 ms 207.6 ms
> 93.2 ms 80.9 ms 21.3 ms 13.1 ms 208.5 ms
> 94.4 ms 78.8 ms 21.4 ms 12.8 ms 207.5 ms
> 95.6 ms 80.1 ms 21.5 ms 12.8 ms 210.0 ms
> 94.1 ms 80.4 ms 22.3 ms 13.3 ms 210.0 ms
> TEST-START | BenchCollections.RustFnvHash
> succ_lookups fail_lookups insert_remove iterate total
> 88.1 ms 57.1 ms 20.9 ms 13.1 ms 179.2 ms
> 87.9 ms 56.6 ms 20.3 ms 13.1 ms 177.8 ms
> 87.7 ms 56.9 ms 20.1 ms 13.3 ms 177.9 ms
> 87.8 ms 56.6 ms 20.1 ms 13.2 ms 177.6 ms
> 87.6 ms 56.8 ms 20.2 ms 13.2 ms 177.7 ms
> TEST-START | BenchCollections.RustFxHash
> succ_lookups fail_lookups insert_remove iterate total
> 14.5 ms 10.8 ms 8.7 ms 13.0 ms 47.1 ms
> 14.0 ms 10.8 ms 8.8 ms 13.0 ms 46.6 ms
> 13.9 ms 10.8 ms 8.7 ms 12.9 ms 46.3 ms
> 14.0 ms 10.8 ms 8.6 ms 13.0 ms 46.5 ms
> 13.9 ms 10.8 ms 8.6 ms 12.9 ms 46.3 ms
Roughly similar to what I get locally (albeit uniformly slower) except that unordered_set is clearly worse than PLDHash.
Pushed by nnethercote@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/1ea12424829b Add microbenchmarks measuring hash table performance. r=froydnj
Comment 7•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/1ea12424829b
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox63:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Comment 8•6 years ago
|
||
It would probably be good to include the size of the hash table, if possible, in these benchmarks.
Comment 9•6 years ago
|
||
Relatedly, I found this in my RSS feed this morning: https://hpjansson.org/blag/2018/07/24/a-hash-table-re-hash/
You need to log in
before you can comment on or make changes to this bug.
Description
•