And probably make it a linked list in the process.
OK, so I definitely want to make this a linked list. As far as arena allocation goes, Stuart says that fragmentation is not a worry: in the size I'm talking about here (56 bytes for a FrameConstructionItem at the moment, but likely to grow by a few words) jemalloc effectively has a built-in arena. I did some measurements to see how many FrameConstructionItems we're allocating and what the high water marks are for various tasks. Task | Allocations | High water mark -------------------------------------------------------------- Browser startup | 200 | 35 Type "cnn.com" in url bar | 358 | 35 Load cnn.com | ~3000 | 66 Load html5 single-page | ~175,000 | 1019 Pretty-print 700KB XML file | ~150,000 | 2421 So we would be recycling objects off that list all along, basically. So the only reason to use an arena-like thing would be performance, right? jemalloc does apparently do some locking, so it might be nice to avoid that. If we do want to do an arena, then it seems like an nsFixedSizeAllocator is the way to go. That's what we use elsewhere (for nodeinfos, parser stuff, etc). Given the above high-water marks, it seems like using a static fixed-size allocator that we don't bother to clear (as we do for other cases) should not be a problem: 2500 * 56 bytes == 75KB memory usage.
Created attachment 366324 [details] [diff] [review] Fix
In comment #1 you pretty much convinced me we don't need the custom arena :-). jemalloc or any other good allocator should have a thread-local freelist so some number of objects can be allocated and freed without locking. Feel like creating a microbenchmark in nsCSSFrameConstructor to compare a million allocations and releases of a hundred objects each way? Otherwise, if you want to go with what you've got, it looks fine, except it seems to me there is no reason not to release the pool at the end of a frame construction phase --- that should be super-easy and simple right?
OK, did the microbenchmark on Mac (so no jemalloc; building on Linux now to test jemalloc). The upshot is that the fixed-size allocator handles that test in about 40ns per alloc/free pair, while the default OS allocator needs more like 240ns. Given the numbers above, that comes out to 6ms vs 36ms allocating all the FrameConstructionItems on that 700KB XML file, or 7ms vs 42ms on the HTML5 load. Not worth worrying about. So if fragmentation is not a worry (including on mac, where we don't use jemalloc), code simplicity says no custom allocator stuff here.
On Linux, I get similar numbers: 20s for jemalloc, 5s for the fixed size allocator (the Mac numbers above correspond to 24s for system malloc, 4s for the fixed size allocator). Still not worth worrying about.
Hijacking my own bug.
Summary: Switch FrameConstructionItemList to arena allocation → [FIX]Switch FrameConstructionItemList to using PRCList
Created attachment 366481 [details] [diff] [review] Without the fixed-size allocator
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.