Closed
Bug 1477627
Opened 6 years ago
Closed 6 years ago
Convert CCGraph::mPtrToNodeMap to a mozilla::HashSet
Categories
(Core :: DOM: Core & HTML, enhancement, P3)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox63 | --- | fixed |
People
(Reporter: n.nethercote, Assigned: n.nethercote)
References
Details
Attachments
(1 file, 1 obsolete file)
6.53 KB,
patch
|
mccr8
:
review+
|
Details | Diff | Splinter Review |
Bug 1425770 shows that js::HashSet (soon to be mozilla::HashSet, via bug 1477626) is much faster than PLDHashSet. CCGraph::mPtrToNodeMap is one of the hottest hash tables in Firefox, so we should convert it to mozilla::HashSet.
Comment 1•6 years ago
|
||
Does mozilla::HashSet perform well regardless of the size of the set?
Comment 2•6 years ago
|
||
Changing the hash table is pretty easy, so we can just try it. A good benchmark is to close 5 TechCrunch tabs at once.
Assignee | ||
Comment 3•6 years ago
|
||
The internals of js::HashSet are very similar to those of PLDHashSet. The former was based on the latter, and if you look closely at the code you can see this clearly. So in theory it should give similar results. I did some testing to confirm. The benchmark from bug 1477622 uses tables of size 1,024 and 131,072. I changed it to use tables of size 1,000,000 and 10,000,000. The results are basically the same, and js::HashSet is still clearly the best of the C++ hash tables, but slower than FxHash. > [ RUN ] BenchCollections.PLDHash > > succ_lookups fail_lookups insert_remove iterate > 425.3 ms 477.4 ms 4068.6 ms 214.3 ms > 424.4 ms 474.9 ms 4066.3 ms 217.5 ms > 418.8 ms 471.4 ms 4064.4 ms 217.2 ms > 421.2 ms 476.0 ms 4127.9 ms 212.4 ms > 420.6 ms 479.5 ms 4096.5 ms 215.7 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "PLDHash", "value": 5185663, "replicates": [5185663,5183100,5171857,5237574,5212372], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.PLDHash (26020 ms) > [ RUN ] BenchCollections.nsDataHash > > succ_lookups fail_lookups insert_remove iterate > 437.4 ms 518.2 ms 3619.0 ms 223.4 ms > 440.1 ms 518.6 ms 3632.5 ms 225.9 ms > 440.8 ms 519.7 ms 3623.1 ms 224.2 ms > 441.4 ms 521.7 ms 3647.0 ms 224.3 ms > 441.8 ms 525.9 ms 3624.1 ms 228.9 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "nsDataHash", "value": 4817189, "replicates": [4797998,4817189,4807882,4834488,4820812], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.nsDataHash (24078 ms) > [ RUN ] BenchCollections.JSHash > > succ_lookups fail_lookups insert_remove iterate > 237.1 ms 295.5 ms 2299.2 ms 128.3 ms > 235.7 ms 296.2 ms 2292.2 ms 126.6 ms > 237.1 ms 296.2 ms 2296.3 ms 126.8 ms > 237.1 ms 292.8 ms 2295.9 ms 126.4 ms > 238.3 ms 299.2 ms 2304.0 ms 126.7 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "JSHash", "value": 2956536, "replicates": [2960196,2950856,2956536,2952259,2968175], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.JSHash (14788 ms) > [ RUN ] BenchCollections.RustHash > > succ_lookups fail_lookups insert_remove iterate > 419.6 ms 320.1 ms 2761.9 ms 161.9 ms > 404.3 ms 310.7 ms 2758.0 ms 164.8 ms > 409.8 ms 309.9 ms 2755.5 ms 158.2 ms > 408.5 ms 308.4 ms 2769.8 ms 162.5 ms > 410.6 ms 311.8 ms 2757.7 ms 161.4 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustHash", "value": 3641621, "replicates": [3663555,3637823,3633606,3649296,3641621], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustHash (18225 ms) > [ RUN ] BenchCollections.RustFnvHash > > succ_lookups fail_lookups insert_remove iterate > 356.5 ms 264.3 ms 2127.4 ms 143.1 ms > 355.1 ms 262.4 ms 2127.7 ms 142.1 ms > 355.0 ms 258.3 ms 2130.0 ms 143.6 ms > 355.7 ms 259.8 ms 2134.1 ms 146.0 ms > 359.5 ms 255.9 ms 2126.9 ms 147.0 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFnvHash", "value": 2889375, "replicates": [2891238,2887346,2887054,2895747,2889375], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFnvHash (14451 ms) > [ RUN ] BenchCollections.RustFxHash > > succ_lookups fail_lookups insert_remove iterate > 148.9 ms 93.2 ms 978.5 ms 86.4 ms > 148.5 ms 95.6 ms 976.0 ms 88.2 ms > 150.5 ms 92.2 ms 980.2 ms 85.4 ms > 150.3 ms 93.8 ms 978.0 ms 85.9 ms > 148.3 ms 94.2 ms 976.5 ms 86.4 ms > PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "BenchCollections", "subtests": [{"name": "RustFxHash", "value": 1308087, "replicates": [1307096,1308347,1308374,1308087,1305461], "lowerIsBetter": true, "shouldAlert": false}]}]} > [ OK ] BenchCollections.RustFxHash (6538 ms)
Assignee | ||
Comment 4•6 years ago
|
||
> The internals of js::HashSet are very similar to those of PLDHashSet. The former was based on the latter, and if you look closely at the code you can see this clearly. So in theory it should give similar results.
To clarify: by "should give similar results", I mean js::HashSet's performance should scale with table size in a similar fashion to PLDHashTable.
Assignee | ||
Comment 5•6 years ago
|
||
mccr8: this is my current patch. I haven't measured anything yet.
Attachment #8996239 -
Flags: feedback?(continuation)
Comment 6•6 years ago
|
||
Comment on attachment 8996239 [details] [diff] [review] Convert CCGraph::mPtrToNodeMap to a mozilla::HashSet Review of attachment 8996239 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/base/nsCycleCollector.cpp @@ +918,5 @@ > mEdges.Clear(); > mWeakMaps.Clear(); > mRootCount = 0; > + mPtrInfoMap.finish(); > + if (!mPtrInfoMap.init(kInitialMapLength)) { Huh, kind of strange that we keep this map around in between CCs. @@ +2247,5 @@ > > PtrInfo* > CCGraphBuilder::AddNode(void* aPtr, nsCycleCollectionParticipant* aParticipant) > { > + JS::AutoSuppressGCAnalysis suppress; You may not need this any more. The GC analysis might understand the new hash table better, because it does not involve function pointers. @@ +2264,5 @@ > > + if (!mGraph.mPtrInfoMap.add(p, result)) { > + // `result` leaks here, but we can't free it because it's > + // pool-allocated within NodePool. > + mGraph.mOutOfMemory = true; Please leave the assert here.
Attachment #8996239 -
Flags: feedback?(continuation) → feedback+
Comment 7•6 years ago
|
||
> Huh, kind of strange that we keep this map around in between CCs.
Oh right, at some point you made the allocation of the actual table lazy, so we're not using memory.
It looks like maybe that's not true for Hashtable.h, which would be bad.
Assignee | ||
Comment 8•6 years ago
|
||
> Oh right, at some point you made the allocation of the actual table lazy, so
> we're not using memory.
>
> It looks like maybe that's not true for Hashtable.h, which would be bad.
That's correct. I initially tried just doing finish() on the table in that Clear() function, and then init() in the Init() function, but then some IsEmpty() does `count() == 0` when the table is uninitialized and mozilla::HashTable asserts.
I see three possibilities:
- Easy: make IsEmpty() do `!initialized() || count() == 0`.
- Medium: make count() callable when the table is uninitialized.
- Hard: make the entry storage lazy, like PLDHashTable.
Assignee | ||
Comment 9•6 years ago
|
||
> - Hard: make the entry storage lazy, like PLDHashTable.
The nice thing about this is that it would let us remove the init() function, because the constructor would no longer allocate.
Comment 10•6 years ago
|
||
Those all sound reasonable to me. IsEmpty() is just used to check that we're not letting memory sit around when idle.
Assignee | ||
Comment 11•6 years ago
|
||
> Please leave the assert here.
I removed it because it's effectively dead -- there was/is an early return if `result` is null just above in the function. Is that ok?
Comment 12•6 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #11) > > Please leave the assert here. > > I removed it because it's effectively dead -- there was/is an early return > if `result` is null just above in the function. Is that ok? I meant the MOZ_ASSERT(false, "Ran out of memory while building cycle collector graph"); assert.
Assignee | ||
Comment 13•6 years ago
|
||
Oh, sure thing.
Updated•6 years ago
|
Priority: -- → P3
Assignee | ||
Comment 14•6 years ago
|
||
> A good benchmark is to close 5 TechCrunch tabs at once.
And what should I measure/observe while doing that?
Flags: needinfo?(continuation)
Comment 15•6 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #14) > And what should I measure/observe while doing that? Set javascript.options.mem.log to true in about:config, open the browser console, filter on cc( so you only see CC times, then look at the total time for the CC.
Flags: needinfo?(continuation)
Assignee | ||
Comment 16•6 years ago
|
||
mccr8, I'm having trouble getting meaningful CC measurements, so I haven't managed to quantify the effect of this.
Attachment #8998396 -
Flags: review?(continuation)
Assignee | ||
Updated•6 years ago
|
Attachment #8996239 -
Attachment is obsolete: true
Comment 17•6 years ago
|
||
I was going to see if I could tell if there's any impact on the CC times, but it looks like bug 1481998 bitrotted this patch. Does the finish() call just get renamed to clearAndCompact() and the init(kInitialMapLength) just get deleted? And kInitialMapLength gets passed to the ctor for mPtrInfoMap?
Flags: needinfo?(n.nethercote)
Comment 18•6 years ago
|
||
I did some CC measurements with and without the patch in the bug. Preparation: - Uncommented the #define of COLLECT_TIME_DEBUG in nsCycleCollector.cpp - Made TimeLog::Checkpoint do an early return if (XRE_GetProcessType() != GeckoProcessType_Content), so it would only print things out in the content process. - Set dom.ipc.processCount to 1 in about:config to avoid worrying about multiple processes. - Start the browser, and load example.com in one tab (to keep the content process alive). For the actual testing, I then loaded https://html.spec.whatwg.org/ in a new tab, and let it settle down for a bit. Then I closed the tab and waited for the cleanup CC, which produced a bunch of output including something like: cc: total cycle collector time was 684ms in 8 slices cc: visited 480903 ref counted and 7747 GCed objects, freed 480776 ref counted and 7564 GCed objects (99%). (The number of visited things is nice because it gives you reassurance that the different cleanup CCs are doing about the same amount of work.) I repeated this process 4 or so times with and without the patch. The numbers are fairly noisy, but without the patch the CC times seemed to be around 750ms. With the patch, the times seems to be a little under 700ms. So it looks like there's a definite improvement. It'll be interesting to see if anything shows up on telemetry, though the high end is a little harder to see.
Comment 19•6 years ago
|
||
The time saved was in MarkRoots(), as expected.
Comment 20•6 years ago
|
||
Comment on attachment 8998396 [details] [diff] [review] Convert CCGraph::mPtrToNodeMap to a mozilla::HashSet Review of attachment 8998396 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for doing this. I've been poking at something similar for years but I never got around to actually finishing it. (I also confirmed that this shows up as not using any memory when the CC isn't running, thanks to the lazy initialization.)
Attachment #8998396 -
Flags: review?(continuation) → review+
Assignee | ||
Comment 21•6 years ago
|
||
Thank you for doing the measurements.
> Does the finish() call
> just get renamed to clearAndCompact() and the init(kInitialMapLength) just
> get deleted? And kInitialMapLength gets passed to the ctor for mPtrInfoMap?
Yes, all that, plus the `!mPtrInfoMap.initialized()` in IsEmpty() is converted to `mPtrInfoMap.empty()`. Presumably you figured that out, given that you got measurements working :)
Flags: needinfo?(n.nethercote)
Comment 22•6 years ago
|
||
Pushed by nnethercote@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/6b1f74eb6d3d Convert CCGraph::mPtrToNodeMap to a mozilla::HashSet. r=mccr8
Comment 23•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/6b1f74eb6d3d
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox63:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Comment 24•6 years ago
|
||
Also, if the numbers I found above are right, then this is one of the largest improvements to CC throughput (ie rather than pause time) since Olli's skippable work many years ago.
Updated•5 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•