Closed Bug 1477622 Opened 6 years ago Closed 6 years ago

Add microbenchmarks measuring hash table performance

Categories

(Core :: XPCOM, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(1 file)

Hash tables are used a lot, and we have a bunch of different implementations. Microbenchmarks will give us a sense of how they compare, performance-wise.
Blocks: 1477626
Blocks: 1477628
Results from an opt build on my fast Linux box:

> [ RUN      ] BenchCollections.PLDHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>            39.0 ms           35.3 ms           16.9 ms           23.9 ms
>            38.2 ms           35.2 ms           15.7 ms           24.4 ms
>            38.4 ms           36.1 ms           16.5 ms           25.4 ms
>            39.3 ms           36.2 ms           16.1 ms           24.6 ms
>            39.9 ms           36.3 ms           16.0 ms           25.6 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "PLDHash", "value": 116225, "replicates": [115107,113632,116428,116225,117817], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.PLDHash (579 ms)
> [ RUN      ] BenchCollections.nsDataHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>            40.6 ms           40.1 ms           19.6 ms           24.9 ms
>            39.2 ms           38.0 ms           18.7 ms           25.0 ms
>            38.8 ms           38.0 ms           19.4 ms           25.3 ms
>            38.6 ms           38.8 ms           18.7 ms           24.9 ms
>            38.6 ms           37.8 ms           18.6 ms           25.0 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "nsDataHash", "value": 121068, "replicates": [125224,120960,121527,121068,119899], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.nsDataHash (609 ms)
> [ RUN      ] BenchCollections.JSHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>            17.9 ms           22.0 ms           12.8 ms            6.8 ms
>            18.2 ms           21.1 ms           13.0 ms            7.2 ms
>            18.1 ms           21.5 ms           12.8 ms            7.0 ms
>            17.3 ms           22.0 ms           13.0 ms            7.4 ms
>            17.4 ms           21.8 ms           12.4 ms            7.2 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "JSHash", "value": 59439, "replicates": [59474,59439,59405,59711,58833], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.JSHash (297 ms)
> [ RUN      ] BenchCollections.RustHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>           847.6 ms          830.6 ms          127.1 ms          107.7 ms
>           865.8 ms          837.4 ms          128.4 ms          108.3 ms
>           870.3 ms          819.4 ms          127.1 ms          107.8 ms
>          1006.6 ms          853.4 ms          128.0 ms          115.3 ms
>           884.0 ms          831.1 ms          126.5 ms          106.8 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustHash", "value": 1939955, "replicates": [1912975,1939955,1924661,2103302,1948463], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.RustHash (9830 ms)
> [ RUN      ] BenchCollections.RustFnvHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>           701.5 ms          679.4 ms          118.9 ms          106.0 ms
>           737.3 ms          784.3 ms          121.3 ms          105.8 ms
>           703.2 ms          673.9 ms          119.4 ms          105.6 ms
>           703.7 ms          676.4 ms          119.4 ms          108.1 ms
>           708.8 ms          684.0 ms          121.9 ms          110.7 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFnvHash", "value": 1607643, "replicates": [1605724,1748703,1602168,1607643,1625358], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.RustFnvHash (8189 ms)
> [ RUN      ] BenchCollections.RustFxHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>           468.8 ms          334.1 ms           55.7 ms          118.1 ms
>           384.4 ms          292.0 ms           54.6 ms          116.4 ms
>           447.6 ms          348.5 ms           60.1 ms          125.5 ms
>           459.9 ms          292.1 ms           54.9 ms          115.9 ms
>           387.6 ms          296.2 ms           54.9 ms          117.1 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFxHash", "value": 922830, "replicates": [976660,847416,981651,922830,855663], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.RustFxHash (4584 ms)

Results from a release (i.e. --enable-release) build on my fast Linux box:

> [ RUN      ] BenchCollections.PLDHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>            43.2 ms           52.4 ms           21.0 ms           33.5 ms
>            41.6 ms           52.2 ms           19.9 ms           33.7 ms
>            41.3 ms           51.4 ms           19.3 ms           33.5 ms
>            41.8 ms           51.1 ms           19.8 ms           34.2 ms
>            41.3 ms           51.3 ms           19.3 ms           34.0 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "PLDHash", "value": 146968, "replicates": [150070,147343,145430,146968,145905], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.PLDHash (736 ms)
> [ RUN      ] BenchCollections.nsDataHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>            43.2 ms           55.6 ms           19.7 ms           34.0 ms
>            43.6 ms           56.9 ms           19.6 ms           34.6 ms
>            45.5 ms           55.6 ms           19.6 ms           33.8 ms
>            43.1 ms           54.9 ms           19.1 ms           33.6 ms
>            43.0 ms           55.2 ms           18.8 ms           34.4 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "nsDataHash", "value": 152587, "replicates": [152587,154664,154490,150807,151499], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.nsDataHash (764 ms)
> [ RUN      ] BenchCollections.JSHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>            15.2 ms           17.8 ms           12.2 ms            5.5 ms
>            15.5 ms           18.5 ms           11.7 ms            5.7 ms
>            14.9 ms           18.5 ms           11.9 ms            5.4 ms
>            15.3 ms           17.7 ms           11.9 ms            5.3 ms
>            14.7 ms           18.3 ms           11.6 ms            5.5 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "JSHash", "value": 50608, "replicates": [50650,51419,50608,50147,50102], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.JSHash (253 ms)
> [ RUN      ] BenchCollections.RustHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>            71.1 ms           46.7 ms           14.5 ms            8.2 ms
>            66.0 ms           47.1 ms           15.0 ms            8.1 ms
>            62.3 ms           46.4 ms           15.2 ms            8.1 ms
>            66.4 ms           46.4 ms           14.6 ms            8.2 ms
>            66.8 ms           46.9 ms           14.7 ms            8.1 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustHash", "value": 136219, "replicates": [140511,136219,132036,135527,136441], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.RustHash (681 ms)
> [ RUN      ] BenchCollections.RustFnvHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>            55.7 ms           32.8 ms           12.8 ms            8.2 ms
>            55.2 ms           33.5 ms           12.8 ms            8.2 ms
>            54.0 ms           33.1 ms           12.8 ms            8.2 ms
>            54.3 ms           32.4 ms           12.9 ms            8.1 ms
>            55.0 ms           33.4 ms           12.9 ms            8.6 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFnvHash", "value": 109523, "replicates": [109523,109718,108079,107759,109966], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.RustFnvHash (545 ms)
> [ RUN      ] BenchCollections.RustFxHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>             9.2 ms            7.4 ms            5.2 ms            7.8 ms
>             8.9 ms            7.4 ms            5.0 ms            7.9 ms
>             8.8 ms            7.5 ms            5.0 ms            8.4 ms
>             8.6 ms            7.6 ms            5.1 ms            7.9 ms
>             8.6 ms            7.8 ms            5.0 ms            7.9 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFxHash", "value": 29352, "replicates": [29609,29286,29733,29298,29352], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.RustFxHash (148 ms)

Things of note when comparing the non-release build with the release build:

- PLDHash and nsDataHash are noticeably slower with --enable-release.

- JSHash is noticeably faster with --enable-release.

- The Rust ones are massively (10--20x) faster with --enable-release.
  (The non-release Rust results are quite similar to a debug build.)

I'm not sure what to make of that.

Things of note within the release build results:

- nsDataHash is the same or slightly slower than PLDHash. This makes sense,
  because it's layered on top of PLDHash.

- JSHash is much faster than PLDHash/nsDataHash: about 2--6x faster, depending
  on the benchmark. They use almost identical double hashing algorithms, so the
  most likely reason for this is that PLDHash has "virtual ops" that require
  function calls, while JSHash uses templates and so can inline those calls.

- Rust hash tables with default hashing are slower than PLDHash for
  succ_lookups but faster for the other benchmarks. Fnv hashing speeds things
  up a bit. Fx hashing speeds things up a *lot*.

- FxHash is the fastest overall, and JSHash is second fastest. The others are
  significantly slower.

Conclusions:

- We should use JSHash instead of PLDHash/nsDataHash for hot tables such as
  CCGraph::mPtrToNodeMap (bug 1477627). This will require moving JSHash into
  MFBT (bug 1477626), as was done with js::Vector some time ago. Then hot
  PLDHash/nsDataHash instances can be converted to JSHash. (As well as being
  faster, JSHash also has a much nicer API.)

- We should use FxHash instead of FnvHash within Rust code (bug 1477628). Unlike
  PLDHash-to-JSHash conversions, which must be done one at a time and only when
  it makes sense, FnvHash-to-FxHash conversions are trivial and can be done en
  masse. (BTW, this very change was a win for rustc back in 2016:
  https://github.com/rust-lang/rust/pull/37229).

- Robin hood hashing may explain why FxHash is faster than JSHash. But it looks
  like there are some constant factors (e.g. avoiding virtual op calls) that
  can be fixed more easily and give bigger wins than changing the algorithm.
Blocks: 1477632
Comment on attachment 8994104 [details]
Bug 1477622 - Add microbenchmarks measuring hash table performance.

https://reviewboard.mozilla.org/r/258730/#review265684

Thanks.

::: xpcom/rust/gtest/bench-collections/Bench.cpp:261
(Diff revision 1)
> +MOZ_GTEST_BENCH_F(BenchCollections, PLDHash, [this] {
> +  BenchImpl(Bench_Cpp_PLDHashTable);
> +});

Please add a benchmark for `std::unordered_map`, so we have some data to point to if somebody comes along and asks why we're not using the `std::` data structures.  (Or we have data indicating we should switch, but I expect that's unlikely.)  Bonus points if you test it with the default hash and also whatever `mozilla::Hash` does.

::: xpcom/rust/gtest/bench-collections/Bench.cpp:265
(Diff revision 1)
> +MOZ_GTEST_BENCH_F(BenchCollections, nsDataHash, [this] {
> +  BenchImpl(Bench_Cpp_nsDataHashtable);
> +});

Why is this benchmark valuable compared to the `PLDHash` benchmark?

::: xpcom/rust/gtest/bench-collections/bench.rs:51
(Diff revision 1)
> +fn Bench_Rust<H: std::hash::BuildHasher>(mut hs: HashSet<*const c_void, H>, params: *const Params,
> +                                         vals: *const *const c_void, len: usize) {

I'm assuming that we're relying on the `assert!()` bits to prevent `rustc` from optimizing away the body of this function after inlining?  Should we note that?  Or should we use `black_box` (or whatever it's called) to achieve the same thing without the extra conditional branches?
Attachment #8994104 - Flags: review?(nfroyd) → review+
> Please add a benchmark for `std::unordered_map`

Good suggestion. Here are the results with the default hash function:

>      succ_lookups      fail_lookups     insert_remove           iterate
>           44.5 ms           51.1 ms           21.9 ms            5.0 ms
>           43.7 ms           50.9 ms           21.5 ms            5.0 ms
>           43.8 ms           51.3 ms           21.4 ms            4.9 ms
>           43.7 ms           51.3 ms           21.6 ms            5.0 ms
>           43.2 ms           50.9 ms           21.6 ms            5.0 ms

Very similar to PLDHash, except that iteration over unordered_set is much faster: indeed, faster than any other hash table.


> > +  BenchImpl(Bench_Cpp_nsDataHashtable);
> 
> Why is this benchmark valuable compared to the `PLDHash` benchmark?

I was curious to see if the cost of the layering was significant, and the answer is "slightly". I'll remove it.

> I'm assuming that we're relying on the `assert!()` bits to prevent `rustc`
> from optimizing away the body of this function after inlining?  Should we
> note that?  Or should we use `black_box` (or whatever it's called) to
> achieve the same thing without the extra conditional branches?

I used `assert!` because it seemed the most equivalent thing to `MOZ_RELEASE_ASSERT`. I'll add a note that the cross-language comparisons are less reliable than the intra-language comparisons.
Here are Linux opt 64 measurements from a try push:

> TEST-START | BenchCollections.unordered_set
>   succ_lookups  fail_lookups insert_remove       iterate         total
>       105.0 ms      118.6 ms       35.6 ms        7.3 ms      266.5 ms
>       100.5 ms      119.3 ms       37.3 ms        7.3 ms      264.4 ms
>       100.7 ms      119.0 ms       38.3 ms        7.3 ms      265.3 ms
>       100.7 ms      118.7 ms       38.0 ms        7.3 ms      264.7 ms
>       100.7 ms      119.4 ms       37.7 ms        7.3 ms      265.1 ms
> TEST-START | BenchCollections.PLDHash
>   succ_lookups  fail_lookups insert_remove       iterate         total
>        75.1 ms       63.6 ms       29.6 ms       58.2 ms      226.5 ms
>        74.6 ms       63.9 ms       29.6 ms       58.2 ms      226.3 ms
>        74.5 ms       64.5 ms       29.3 ms       59.9 ms      228.2 ms
>        74.7 ms       65.2 ms       29.4 ms       58.6 ms      228.0 ms
>        74.8 ms       64.1 ms       29.3 ms       58.1 ms      226.2 ms
> TEST-START | BenchCollections.JSHash
>   succ_lookups  fail_lookups insert_remove       iterate         total
>        29.0 ms       29.8 ms       17.7 ms       14.4 ms       91.0 ms
>        28.5 ms       30.0 ms       17.6 ms       14.3 ms       90.4 ms
>        28.6 ms       30.1 ms       17.6 ms       14.3 ms       90.6 ms
>        28.3 ms       29.7 ms       17.7 ms       13.9 ms       89.6 ms
>        28.1 ms       29.9 ms       17.6 ms       14.0 ms       89.6 ms
> TEST-START | BenchCollections.RustHash
>   succ_lookups  fail_lookups insert_remove       iterate         total
>        93.4 ms       79.4 ms       21.7 ms       13.2 ms      207.6 ms
>        93.2 ms       80.9 ms       21.3 ms       13.1 ms      208.5 ms
>        94.4 ms       78.8 ms       21.4 ms       12.8 ms      207.5 ms
>        95.6 ms       80.1 ms       21.5 ms       12.8 ms      210.0 ms
>        94.1 ms       80.4 ms       22.3 ms       13.3 ms      210.0 ms
> TEST-START | BenchCollections.RustFnvHash
>   succ_lookups  fail_lookups insert_remove       iterate         total
>        88.1 ms       57.1 ms       20.9 ms       13.1 ms      179.2 ms
>        87.9 ms       56.6 ms       20.3 ms       13.1 ms      177.8 ms
>        87.7 ms       56.9 ms       20.1 ms       13.3 ms      177.9 ms
>        87.8 ms       56.6 ms       20.1 ms       13.2 ms      177.6 ms
>        87.6 ms       56.8 ms       20.2 ms       13.2 ms      177.7 ms
> TEST-START | BenchCollections.RustFxHash
>   succ_lookups  fail_lookups insert_remove       iterate         total
>        14.5 ms       10.8 ms        8.7 ms       13.0 ms       47.1 ms
>        14.0 ms       10.8 ms        8.8 ms       13.0 ms       46.6 ms
>        13.9 ms       10.8 ms        8.7 ms       12.9 ms       46.3 ms
>        14.0 ms       10.8 ms        8.6 ms       13.0 ms       46.5 ms
>        13.9 ms       10.8 ms        8.6 ms       12.9 ms       46.3 ms

Roughly similar to what I get locally (albeit uniformly slower) except that unordered_set is clearly worse than PLDHash.
Pushed by nnethercote@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1ea12424829b
Add microbenchmarks measuring hash table performance. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/1ea12424829b
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
It would probably be good to include the size of the hash table, if possible, in these benchmarks.
Relatedly, I found this in my RSS feed this morning: https://hpjansson.org/blag/2018/07/24/a-hash-table-re-hash/
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: