Closed Bug 1477627 Opened 6 years ago Closed 6 years ago

Convert CCGraph::mPtrToNodeMap to a mozilla::HashSet

Categories

(Core :: DOM: Core & HTML, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: n.nethercote, Assigned: n.nethercote)

References

Details

Attachments

(1 file, 1 obsolete file)

Bug 1425770 shows that js::HashSet (soon to be mozilla::HashSet, via bug 1477626) is much faster than PLDHashSet. CCGraph::mPtrToNodeMap is one of the hottest hash tables in Firefox, so we should convert it to mozilla::HashSet.
Does mozilla::HashSet perform well regardless of the size of the set?
Changing the hash table is pretty easy, so we can just try it. A good benchmark is to close 5 TechCrunch tabs at once.
The internals of js::HashSet are very similar to those of PLDHashSet. The former was based on the latter, and if you look closely at the code you can see this clearly. So in theory it should give similar results.

I did some testing to confirm. The benchmark from bug 1477622 uses tables of size 1,024 and 131,072. I changed it to use tables of size 1,000,000 and 10,000,000. The results are basically the same, and js::HashSet is still clearly the best of the C++ hash tables, but slower than FxHash.

> [ RUN      ] BenchCollections.PLDHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>           425.3 ms          477.4 ms         4068.6 ms          214.3 ms
>           424.4 ms          474.9 ms         4066.3 ms          217.5 ms
>           418.8 ms          471.4 ms         4064.4 ms          217.2 ms
>           421.2 ms          476.0 ms         4127.9 ms          212.4 ms
>           420.6 ms          479.5 ms         4096.5 ms          215.7 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "PLDHash", "value": 5185663, "replicates": [5185663,5183100,5171857,5237574,5212372], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.PLDHash (26020 ms)
> [ RUN      ] BenchCollections.nsDataHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>           437.4 ms          518.2 ms         3619.0 ms          223.4 ms
>           440.1 ms          518.6 ms         3632.5 ms          225.9 ms
>           440.8 ms          519.7 ms         3623.1 ms          224.2 ms
>           441.4 ms          521.7 ms         3647.0 ms          224.3 ms
>           441.8 ms          525.9 ms         3624.1 ms          228.9 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "nsDataHash", "value": 4817189, "replicates": [4797998,4817189,4807882,4834488,4820812], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.nsDataHash (24078 ms)
> [ RUN      ] BenchCollections.JSHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>           237.1 ms          295.5 ms         2299.2 ms          128.3 ms
>           235.7 ms          296.2 ms         2292.2 ms          126.6 ms
>           237.1 ms          296.2 ms         2296.3 ms          126.8 ms
>           237.1 ms          292.8 ms         2295.9 ms          126.4 ms
>           238.3 ms          299.2 ms         2304.0 ms          126.7 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "JSHash", "value": 2956536, "replicates": [2960196,2950856,2956536,2952259,2968175], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.JSHash (14788 ms)
> [ RUN      ] BenchCollections.RustHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>           419.6 ms          320.1 ms         2761.9 ms          161.9 ms
>           404.3 ms          310.7 ms         2758.0 ms          164.8 ms
>           409.8 ms          309.9 ms         2755.5 ms          158.2 ms
>           408.5 ms          308.4 ms         2769.8 ms          162.5 ms
>           410.6 ms          311.8 ms         2757.7 ms          161.4 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustHash", "value": 3641621, "replicates": [3663555,3637823,3633606,3649296,3641621], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.RustHash (18225 ms)
> [ RUN      ] BenchCollections.RustFnvHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>           356.5 ms          264.3 ms         2127.4 ms          143.1 ms
>           355.1 ms          262.4 ms         2127.7 ms          142.1 ms
>           355.0 ms          258.3 ms         2130.0 ms          143.6 ms
>           355.7 ms          259.8 ms         2134.1 ms          146.0 ms
>           359.5 ms          255.9 ms         2126.9 ms          147.0 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFnvHash", "value": 2889375, "replicates": [2891238,2887346,2887054,2895747,2889375], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.RustFnvHash (14451 ms)
> [ RUN      ] BenchCollections.RustFxHash
> 
>       succ_lookups      fail_lookups     insert_remove           iterate
>           148.9 ms           93.2 ms          978.5 ms           86.4 ms
>           148.5 ms           95.6 ms          976.0 ms           88.2 ms
>           150.5 ms           92.2 ms          980.2 ms           85.4 ms
>           150.3 ms           93.8 ms          978.0 ms           85.9 ms
>           148.3 ms           94.2 ms          976.5 ms           86.4 ms
> PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFxHash", "value": 1308087, "replicates": [1307096,1308347,1308374,1308087,1305461], "lowerIsBetter": true, "shouldAlert": false}]}]}
> [       OK ] BenchCollections.RustFxHash (6538 ms)
> The internals of js::HashSet are very similar to those of PLDHashSet. The former was based on the latter, and if you look closely at the code you can see this clearly. So in theory it should give similar results.

To clarify: by "should give similar results", I mean js::HashSet's performance should scale with table size in a similar fashion to PLDHashTable.
See Also: → 1004285
mccr8: this is my current patch. I haven't measured anything yet.
Attachment #8996239 - Flags: feedback?(continuation)
Comment on attachment 8996239 [details] [diff] [review]
Convert CCGraph::mPtrToNodeMap to a mozilla::HashSet

Review of attachment 8996239 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/base/nsCycleCollector.cpp
@@ +918,5 @@
>      mEdges.Clear();
>      mWeakMaps.Clear();
>      mRootCount = 0;
> +    mPtrInfoMap.finish();
> +    if (!mPtrInfoMap.init(kInitialMapLength)) {

Huh, kind of strange that we keep this map around in between CCs.

@@ +2247,5 @@
>  
>  PtrInfo*
>  CCGraphBuilder::AddNode(void* aPtr, nsCycleCollectionParticipant* aParticipant)
>  {
> +  JS::AutoSuppressGCAnalysis suppress;

You may not need this any more. The GC analysis might understand the new hash table better, because it does not involve function pointers.

@@ +2264,5 @@
>  
> +    if (!mGraph.mPtrInfoMap.add(p, result)) {
> +      // `result` leaks here, but we can't free it because it's
> +      // pool-allocated within NodePool.
> +      mGraph.mOutOfMemory = true;

Please leave the assert here.
Attachment #8996239 - Flags: feedback?(continuation) → feedback+
> Huh, kind of strange that we keep this map around in between CCs.
Oh right, at some point you made the allocation of the actual table lazy, so we're not using memory.

It looks like maybe that's not true for Hashtable.h, which would be bad.
> Oh right, at some point you made the allocation of the actual table lazy, so
> we're not using memory.
> 
> It looks like maybe that's not true for Hashtable.h, which would be bad.

That's correct. I initially tried just doing finish() on the table in that Clear() function, and then init() in the Init() function, but then some IsEmpty() does `count() == 0` when the table is uninitialized and mozilla::HashTable asserts.

I see three possibilities:
- Easy: make IsEmpty() do `!initialized() || count() == 0`.
- Medium: make count() callable when the table is uninitialized.
- Hard: make the entry storage lazy, like PLDHashTable.
> - Hard: make the entry storage lazy, like PLDHashTable.

The nice thing about this is that it would let us remove the init() function, because the constructor would no longer allocate.
Those all sound reasonable to me. IsEmpty() is just used to check that we're not letting memory sit around when idle.
> Please leave the assert here.

I removed it because it's effectively dead -- there was/is an early return if `result` is null just above in the function. Is that ok?
(In reply to Nicholas Nethercote [:njn] from comment #11)
> > Please leave the assert here.
> 
> I removed it because it's effectively dead -- there was/is an early return
> if `result` is null just above in the function. Is that ok?

I meant the MOZ_ASSERT(false, "Ran out of memory while building cycle collector graph"); assert.
Oh, sure thing.
Priority: -- → P3
> A good benchmark is to close 5 TechCrunch tabs at once.

And what should I measure/observe while doing that?
Flags: needinfo?(continuation)
(In reply to Nicholas Nethercote [:njn] from comment #14)
> And what should I measure/observe while doing that?

Set javascript.options.mem.log to true in about:config, open the browser console, filter on cc( so you only see CC times, then look at the total time for the CC.
Flags: needinfo?(continuation)
mccr8, I'm having trouble getting meaningful CC measurements, so I haven't
managed to quantify the effect of this.
Attachment #8998396 - Flags: review?(continuation)
Attachment #8996239 - Attachment is obsolete: true
I was going to see if I could tell if there's any impact on the CC times, but it looks like bug 1481998 bitrotted this patch. Does the finish() call just get renamed to clearAndCompact() and the init(kInitialMapLength) just get deleted? And kInitialMapLength gets passed to the ctor for mPtrInfoMap?
Flags: needinfo?(n.nethercote)
I did some CC measurements with and without the patch in the bug.

Preparation:
- Uncommented the #define of COLLECT_TIME_DEBUG in nsCycleCollector.cpp
- Made TimeLog::Checkpoint do an early return if (XRE_GetProcessType() != GeckoProcessType_Content), so it would only print things out in the content process.
- Set dom.ipc.processCount to 1 in about:config to avoid worrying about multiple processes.
- Start the browser, and load example.com in one tab (to keep the content process alive).

For the actual testing, I then loaded https://html.spec.whatwg.org/ in a new tab, and let it settle down for a bit. Then I closed the tab and waited for the cleanup CC, which produced a bunch of output including something like:
cc: total cycle collector time was 684ms in 8 slices
cc: visited 480903 ref counted and 7747 GCed objects, freed 480776 ref counted and 7564 GCed objects (99%).

(The number of visited things is nice because it gives you reassurance that the different cleanup CCs are doing about the same amount of work.)

I repeated this process 4 or so times with and without the patch. The numbers are fairly noisy, but without the patch the CC times seemed to be around 750ms. With the patch, the times seems to be a little under 700ms. So it looks like there's a definite improvement. It'll be interesting to see if anything shows up on telemetry, though the high end is a little harder to see.
The time saved was in MarkRoots(), as expected.
Comment on attachment 8998396 [details] [diff] [review]
Convert CCGraph::mPtrToNodeMap to a mozilla::HashSet

Review of attachment 8998396 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for doing this. I've been poking at something similar for years but I never got around to actually finishing it.

(I also confirmed that this shows up as not using any memory when the CC isn't running, thanks to the lazy initialization.)
Attachment #8998396 - Flags: review?(continuation) → review+
Thank you for doing the measurements.

> Does the finish() call
> just get renamed to clearAndCompact() and the init(kInitialMapLength) just
> get deleted? And kInitialMapLength gets passed to the ctor for mPtrInfoMap?

Yes, all that, plus the `!mPtrInfoMap.initialized()` in IsEmpty() is converted to `mPtrInfoMap.empty()`. Presumably you figured that out, given that you got measurements working :)
Flags: needinfo?(n.nethercote)
Pushed by nnethercote@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6b1f74eb6d3d
Convert CCGraph::mPtrToNodeMap to a mozilla::HashSet. r=mccr8
https://hg.mozilla.org/mozilla-central/rev/6b1f74eb6d3d
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Also, if the numbers I found above are right, then this is one of the largest improvements to CC throughput (ie rather than pause time) since Olli's skippable work many years ago.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: