Closed
Bug 944810
Opened 11 years ago
Closed 11 years ago
Consider using a std::map instead of a pldhash in the cycle collector
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
INVALID
People
(Reporter: bjacob, Unassigned)
Details
Attachments
(1 file)
4.01 KB,
patch
|
Details | Diff | Splinter Review |
We've had some trouble recently with the cycle collector running into pldhash's size limit (see bug 902922 and friends).
Maybe the CC graph doesn't need to use a hash table in the first place? The key type is pointers, so it's ordered.
I have no idea how a std::map is going to compare speed-wise, but the only way to find out is to benchmark it ;-)
So here is a small patch making the CC use a std::map. It also has the benefit of being a net decrease of code size!
Do you guys have CC benchmarks that we could try this on? (my hope is that speed isn't significantly affected either way...)
The other thing to measure would be memory usage, of couse. And that would have to be measured separately for each implementation of the STL.
I also have a question. FindNode doesn't seem to ever be called. So how is this map every queried? I only see the graphbuilder adding nodes, but I don't see where it does anything with that data structure.
Reporter | ||
Updated•11 years ago
|
Flags: needinfo?(bugs)
Comment 1•11 years ago
|
||
When I last measured it a few years ago, hash table look ups were about 15% of CC time. So not a huge percentage, but not nothing. I hacked up a trie at one point, but it was really slow. My crude trie implementation may be to blame for that.
As for "benchmarks", you can look at how long an idle CC takes, and how long it takes to clean up after a reasonable big page like TechCrunch.
The data structure is queried as part of AddNodeToMap: if an object is already in the CC graph, we use its existing entry, otherwise we add a new one. So, it is just used to avoid duplication in the graph.
FindNode isn't used right now. I added it recently to prepare for incremental cycle collection. It is needed when an object dies during cycle collection to remove the object from the graph. Maybe it is also used for some other misc. logging.
Comment 2•11 years ago
|
||
> We've had some trouble recently with the cycle collector running into
> pldhash's size limit (see bug 902922 and friends).
And we fixed that. You keep worrying about this, but I don't think you need to. If the limit becomes a problem in the future, it can be increased with moderate effort. (The increase itself is trivial, it's the testing side that requires more effort if we don't want the test times to blow up.)
> The other thing to measure would be memory usage, of couse. And that would
> have to be measured separately for each implementation of the STL.
This is a show-stopper, IMO. We currently don't have any way to measure memory usage of STL containers. I investigated it in bug 806514 and concluded for that case that converting a std::vector to nsTArray was by far the easiest solution. Let's not convert in the other direction, please.
Reporter | ||
Comment 3•11 years ago
|
||
Here's a try push just to see if my patch is stupid to the point of not working - https://tbpl.mozilla.org/?tree=Try&rev=0fcdad6d54c4
Reporter | ||
Comment 4•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #2)
> > We've had some trouble recently with the cycle collector running into
> > pldhash's size limit (see bug 902922 and friends).
>
> And we fixed that. You keep worrying about this, but I don't think you need
> to. If the limit becomes a problem in the future, it can be increased with
> moderate effort. (The increase itself is trivial, it's the testing side
> that requires more effort if we don't want the test times to blow up.)
The reason why I worry about that is that I thought, but I could have misunderstood, that we only shifted the threshold above which bad things happen, from 8M entries to 64M entries; am I getting this wrong?
>
> > The other thing to measure would be memory usage, of couse. And that would
> > have to be measured separately for each implementation of the STL.
>
> This is a show-stopper, IMO. We currently don't have any way to measure
> memory usage of STL containers.
Fair point, so that would prevent us from landing as-is a patch that just uses a std::map, but if the approach were considered promising, I suppose that we could consider getting a std::map-like container implementation in MFBT, no? So I mean, yes this is a show stopper wrt landing, but not wrt experimenting, right?
Reporter | ||
Comment 5•11 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #1)
> As for "benchmarks", you can look at how long an idle CC takes, and how long
> it takes to clean up after a reasonable big page like TechCrunch.
Could you give me pointers to existing logging/debugging helpers? How do I know how long CC took?
Comment 6•11 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #5)
> Could you give me pointers to existing logging/debugging helpers? How do I
> know how long CC took?
Ah, right. Set javascript.options.mem.log to true in about:config, then hit something like command-shift-j to bring up the Browser Console and you'll see entries like:
CC(T+271327.0) duration: 13ms, suspected: 163, visited: 657 RCed and 1193 GCed, collected: 0 RCed and 0 GCed (0|177 waiting for GC)
ForgetSkippable 11 times before CC, min: 1 ms, max: 9 ms, avg: 2 ms, total: 29 ms, max sync: 9 ms, removed: 3297
(There will also be larger GC entries.)
Reporter | ||
Comment 7•11 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #4)
> (In reply to Nicholas Nethercote [:njn] from comment #2)
> > > We've had some trouble recently with the cycle collector running into
> > > pldhash's size limit (see bug 902922 and friends).
> >
> > And we fixed that. You keep worrying about this, but I don't think you need
> > to. If the limit becomes a problem in the future, it can be increased with
> > moderate effort. (The increase itself is trivial, it's the testing side
> > that requires more effort if we don't want the test times to blow up.)
>
> The reason why I worry about that is that I thought, but I could have
> misunderstood, that we only shifted the threshold above which bad things
> happen, from 8M entries to 64M entries; am I getting this wrong?
More to the point: I'm happy to hear that we could push the limit farther if we needed to, but wouldn't be nicer not to have to worry about any limit aside from actually running out of memory?
Reporter | ||
Comment 8•11 years ago
|
||
Out of curiosity, I would be curious as to what it is about STL containers that can't be measured by passing a suitably instrumented allocator? Do STL containers (in some implementations) allocate things on the heap without using the allocator, or are you concerned about the sizeof(*this) part?
Reporter | ||
Comment 9•11 years ago
|
||
OK, here is a performance comparison, it kills down the idea of replacing pldhash by std::map here.
I'm using GCC 4.6.2 on linux, x86-64.
I repeatedly loaded http://p3d.in/3axnM and recorded CC times, both spontaneous and triggered in about:memory.
See how the worst CC durations are about 9.5 seconds with pldhash and 14 seconds with std::map.
Without patch (i.e. using pldhash)
CC(T+0.0) duration: 71ms, suspected: 15792, visited: 7169 RCed and 5417 GCed, collected: 3641 RCed and 37 GCed (3678|0 waiting for GC)
ForgetSkippable 0 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 0 ms, max sync: 0 ms, removed: 0
CC(T+6.1) duration: 2ms, suspected: 4615, visited: 1505 RCed and 150 GCed, collected: 60 RCed and 2 GCed (62|815 waiting for GC)
ForgetSkippable 11 times before CC, min: 0 ms, max: 3 ms, avg: 1 ms, total: 12 ms, max sync: 0 ms, removed: 3503
CC(T+15.7) duration: 7ms, suspected: 1644, visited: 2814 RCed and 1595 GCed, collected: 1889 RCed and 1448 GCed (3337|5 waiting for GC)
ForgetSkippable 15 times before CC, min: 0 ms, max: 5 ms, avg: 1 ms, total: 17 ms, max sync: 2 ms, removed: 4144
CC(T+22.6) duration: 6ms, suspected: 275, visited: 570 RCed and 169 GCed, collected: 36 RCed and 18 GCed (54|217 waiting for GC)
ForgetSkippable 9 times before CC, min: 0 ms, max: 1 ms, avg: 1 ms, total: 9 ms, max sync: 0 ms, removed: 2860
CC(T+22.9) duration: 3ms, suspected: 188, visited: 278 RCed and 151 GCed, collected: 0 RCed and 0 GCed (54|247 waiting for GC)
ForgetSkippable 0 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 0 ms, max sync: 0 ms, removed: 0
CC(T+35.9) duration: 7ms, suspected: 473, visited: 1030 RCed and 159 GCed, collected: 228 RCed and 18 GCed (246|202 waiting for GC)
ForgetSkippable 10 times before CC, min: 0 ms, max: 3 ms, avg: 1 ms, total: 12 ms, max sync: 0 ms, removed: 8009
CC(T+57.8) duration: 9373ms, suspected: 1900, visited: 5380 RCed and 9115769 GCed, collected: 4493 RCed and 9115622 GCed (9120115|18 waiting for GC)
ForgetSkippable 6 times before CC, min: 0 ms, max: 3 ms, avg: 2 ms, total: 12 ms, max sync: 1 ms, removed: 7441
CC(T+74.6) duration: 9298ms, suspected: 1288, visited: 4680 RCed and 9115824 GCed, collected: 4297 RCed and 9115677 GCed (9119974|0 waiting for GC)
ForgetSkippable 3 times before CC, min: 0 ms, max: 6 ms, avg: 3 ms, total: 11 ms, max sync: 0 ms, removed: 1215
CC(T+83.9) duration: 9265ms, suspected: 4309, visited: 1358 RCed and 9113859 GCed, collected: 780 RCed and 9113712 GCed (18234466|5 waiting for GC)
ForgetSkippable 0 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 0 ms, max sync: 0 ms, removed: 0
CC(T+88.0) duration: 4ms, suspected: 147, visited: 363 RCed and 153 GCed, collected: 0 RCed and 0 GCed (0|61 waiting for GC)
ForgetSkippable 2 times before CC, min: 1 ms, max: 2 ms, avg: 2 ms, total: 4 ms, max sync: 0 ms, removed: 1009
CC(T+88.7) duration: 3ms, suspected: 144, visited: 255 RCed and 153 GCed, collected: 0 RCed and 0 GCed (0|73 waiting for GC)
ForgetSkippable 0 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 0 ms, max sync: 0 ms, removed: 0
CC(T+111.7) duration: 9372ms, suspected: 2009, visited: 5222 RCed and 9115705 GCed, collected: 4479 RCed and 9115560 GCed (9120039|112 waiting for GC)
ForgetSkippable 8 times before CC, min: 0 ms, max: 5 ms, avg: 1 ms, total: 15 ms, max sync: 0 ms, removed: 7887
CC(T+121.1) duration: 9416ms, suspected: 4472, visited: 1316 RCed and 9113771 GCed, collected: 779 RCed and 9113626 GCed (18234444|116 waiting for GC)
ForgetSkippable 0 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 0 ms, max sync: 0 ms, removed: 0
CC(T+126.8) duration: 4ms, suspected: 69, visited: 262 RCed and 181 GCed, collected: 27 RCed and 37 GCed (64|65 waiting for GC)
ForgetSkippable 2 times before CC, min: 1 ms, max: 2 ms, avg: 1 ms, total: 3 ms, max sync: 0 ms, removed: 1241
CC(T+133.8) duration: 4ms, suspected: 193, visited: 331 RCed and 144 GCed, collected: 0 RCed and 0 GCed (64|208 waiting for GC)
ForgetSkippable 4 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 1 ms, max sync: 0 ms, removed: 336
CC(T+154.3) duration: 9511ms, suspected: 2023, visited: 5128 RCed and 9115705 GCed, collected: 4489 RCed and 9115561 GCed (9120050|60 waiting for GC)
ForgetSkippable 7 times before CC, min: 0 ms, max: 4 ms, avg: 1 ms, total: 11 ms, max sync: 0 ms, removed: 8072
CC(T+159.7) duration: 745ms, suspected: 1930, visited: 1321 RCed and 193 GCed, collected: 43 RCed and 40 GCed (83|0 waiting for GC)
ForgetSkippable 11 times before CC, min: 0 ms, max: 2 ms, avg: 1 ms, total: 11 ms, max sync: 2 ms, removed: 2034
CC(T+165.7) duration: 5ms, suspected: 146, visited: 279 RCed and 153 GCed, collected: 0 RCed and 0 GCed (83|122 waiting for GC)
ForgetSkippable 2 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 1 ms, max sync: 0 ms, removed: 335
CC(T+167.4) duration: 4ms, suspected: 146, visited: 250 RCed and 153 GCed, collected: 0 RCed and 0 GCed (83|130 waiting for GC)
ForgetSkippable 0 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 0 ms, max sync: 0 ms, removed: 0
With patch (ie. using std::map)
CC(T+9.3) duration: 18ms, suspected: 9393, visited: 8881 RCed and 8 GCed, collected: 4037 RCed and 5 GCed (4042|307 waiting for GC)
ForgetSkippable 10 times before CC, min: 1 ms, max: 3 ms, avg: 2 ms, total: 26 ms, max sync: 0 ms, removed: 9195
CC(T+21.6) duration: 15ms, suspected: 2418, visited: 5595 RCed and 6254 GCed, collected: 4922 RCed and 6108 GCed (11030|78 waiting for GC)
ForgetSkippable 2 times before CC, min: 1 ms, max: 6 ms, avg: 3 ms, total: 7 ms, max sync: 1 ms, removed: 4180
CC(T+22.7) duration: 6ms, suspected: 990, visited: 1144 RCed and 146 GCed, collected: 0 RCed and 0 GCed (11030|183 waiting for GC)
ForgetSkippable 2 times before CC, min: 1 ms, max: 1 ms, avg: 1 ms, total: 3 ms, max sync: 0 ms, removed: 314
CC(T+47.5) duration: 12595ms, suspected: 479, visited: 2614 RCed and 9113637 GCed, collected: 1902 RCed and 9113586 GCed (9115488|53 waiting for GC)
ForgetSkippable 4 times before CC, min: 0 ms, max: 4 ms, avg: 2 ms, total: 8 ms, max sync: 0 ms, removed: 6509
CC(T+61.6) duration: 12610ms, suspected: 1005, visited: 1239 RCed and 9113618 GCed, collected: 785 RCed and 9113567 GCed (18229840|203 waiting for GC)
ForgetSkippable 3 times before CC, min: 0 ms, max: 1 ms, avg: 0 ms, total: 2 ms, max sync: 0 ms, removed: 407
CC(T+66.9) duration: 16ms, suspected: 1246, visited: 2940 RCed and 2113 GCed, collected: 2616 RCed and 1959 GCed (4575|0 waiting for GC)
ForgetSkippable 2 times before CC, min: 2 ms, max: 3 ms, avg: 3 ms, total: 6 ms, max sync: 0 ms, removed: 1782
CC(T+86.0) duration: 11634ms, suspected: 130, visited: 1937 RCed and 9113882 GCed, collected: 1679 RCed and 9113727 GCed (9115406|52 waiting for GC)
ForgetSkippable 3 times before CC, min: 0 ms, max: 4 ms, avg: 2 ms, total: 7 ms, max sync: 2 ms, removed: 274
CC(T+96.8) duration: 11ms, suspected: 1838, visited: 3472 RCed and 2077 GCed, collected: 2798 RCed and 1926 GCed (4724|16 waiting for GC)
ForgetSkippable 10 times before CC, min: 0 ms, max: 3 ms, avg: 1 ms, total: 14 ms, max sync: 0 ms, removed: 8551
CC(T+104.2) duration: 154ms, suspected: 737, visited: 549 RCed and 194 GCed, collected: 35 RCed and 33 GCed (68|0 waiting for GC)
ForgetSkippable 4 times before CC, min: 0 ms, max: 1 ms, avg: 1 ms, total: 4 ms, max sync: 2 ms, removed: 463
CC(T+116.0) duration: 3ms, suspected: 188, visited: 296 RCed and 159 GCed, collected: 0 RCed and 0 GCed (68|17 waiting for GC)
ForgetSkippable 0 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 0 ms, max sync: 0 ms, removed: 0
CC(T+143.9) duration: 14794ms, suspected: 1903, visited: 5254 RCed and 9115632 GCed, collected: 4481 RCed and 9115486 GCed (9119967|143 waiting for GC)
ForgetSkippable 9 times before CC, min: 0 ms, max: 7 ms, avg: 1 ms, total: 16 ms, max sync: 0 ms, removed: 8918
CC(T+157.3) duration: 13396ms, suspected: 4471, visited: 1328 RCed and 9113696 GCed, collected: 780 RCed and 9113550 GCed (18234297|146 waiting for GC)
ForgetSkippable 0 times before CC, min: 0 ms, max: 0 ms, avg: 0 ms, total: 0 ms, max sync: 0 ms, removed: 0
CC(T+162.4) duration: 5ms, suspected: 897, visited: 1082 RCed and 194 GCed, collected: 27 RCed and 37 GCed (64|65 waiting for GC)
ForgetSkippable 2 times before CC, min: 1 ms, max: 2 ms, avg: 2 ms, total: 4 ms, max sync: 0 ms, removed: 843
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID
Reporter | ||
Updated•11 years ago
|
Flags: needinfo?(bugs)
Comment 10•11 years ago
|
||
> Out of curiosity, I would be curious as to what it is about STL containers
> that can't be measured by passing a suitably instrumented allocator? Do STL
> containers (in some implementations) allocate things on the heap without
> using the allocator, or are you concerned about the sizeof(*this) part?
From what I remember, using a custom allocator was either impossible or extremely awkward. See bug 806514 comments 0 through 7. If there are other C++ tricks that would heard, I'd be happy to hear about them.
Reporter | ||
Comment 11•11 years ago
|
||
OK. If I understand correctly, what is hard in this case is that we would need to store member data on the allocator object, and that's not supported.
Btw for completeness I also benchmarked the CC using unordered_map (still GCC 4.6). It's faster than using map, but still 5% to 10% slower than using pldhash.
Comment 12•11 years ago
|
||
My major concern with using a hashtable instead of some tree-like structure is that the allocation for a hashtable has to be one block of memory, which seems like it could be bad for users who are experiencing VM issues. I'm interested in the performance of other structures that can break their allocations up into smaller segments.
Comment 13•11 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #1)
> When I last measured it a few years ago, hash table look ups were about 15%
> of CC time. So not a huge percentage, but not nothing. I hacked up a trie
> at one point, but it was really slow. My crude trie implementation may be
> to blame for that.
OOC, was this a bit-by-bit radix tree, or was the fan-out larger?
Flags: needinfo?(continuation)
Comment 14•11 years ago
|
||
I think it was byte-by-byte. I didn't look up the right way to implement or anything, I just slapped something together.
Flags: needinfo?(continuation)
Comment 15•11 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #14)
> I think it was byte-by-byte. I didn't look up the right way to implement or
> anything, I just slapped something together.
Ah, OK. byte-by-byte is probably too big; bit-by-bit or nibble-by-nibble sounds more suitable.
Adding a radix/patricia tree to MFBT, converting the cycle collector to use that, and measuring performance differences would be a worthy exercise.
You need to log in
before you can comment on or make changes to this bug.
Description
•