Try out an arena based allocator for DOM nodes again

Status: NEW
Product: Core
Component: DOM
Priority: P2
Severity: normal
Reported: 10 months ago
Modified: 7 months ago
People

(Reporter: Ehsan, Assigned: Ehsan)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(4 attachments)

(Assignee)

Description

10 months ago
While profiling some slow DOM code that comes up in benchmarks like Speedometer, we end up bottlenecking on pure pointer chasing for DOM traversal because of bad memory locality of DOM nodes.  I think we can do better, by attempting arena allocation of DOM nodes again.

I have some ideas on how to do this again without the downsides of the previous approach (see bug 403830).  Firstly, we can try to share code with nsPresArena.  That in itself will give us the advantage of freelists that aren't fixed-size, contrary to the previous approach.  But I'd also like to try next-power-of-two bucket rounding like jemalloc does, in order to avoid fragmentation when reusing memory of freed nodes to allocate new nodes; nodes will be sorted into separate arenas based on their bucket size.  Furthermore, I would like to put attributes and content nodes in separate arenas, since they are typically accessed separately during traversal, so keeping them apart improves the locality of traversal algorithms going over each one.
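To make the bucketing idea concrete, here is a minimal sketch of next-power-of-two rounding; the names (`RoundUpToBucket`, `kMinBucket`) are made up for illustration and are not part of the actual patches:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of next-power-of-two bucket rounding, as jemalloc
// does: an allocation request is rounded up to the nearest power-of-two
// bucket so freed slots can be reused for any node of a similar size.
constexpr size_t kMinBucket = 64;  // illustrative minimum bucket size

size_t RoundUpToBucket(size_t aSize) {
  size_t bucket = kMinBucket;
  while (bucket < aSize) {
    bucket <<= 1;  // next power of two
  }
  return bucket;
}
```

For example, a 120-byte node would share the 128-byte bucket with 128-byte nodes, while a 200-byte node would go into the 256-byte bucket.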

One complication is how to handle node adoption.  Olli's suggestion was to allocate adopted nodes using the default allocator and store the allocator in a property on the node.  This way we don't have to worry about finding the right arena when the node is about to be destroyed.  We do of course need a node flag to know whether to check the property, and node flags are scarce these days.
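A rough sketch of how that adoption fallback could look, under the assumption of a single node flag plus a stored-allocator property; every name here (`Node`, `DOMArena`, `DeallocateNode`) is hypothetical and not real Gecko code:

```cpp
#include <cassert>

// Illustrative-only sketch: a scarce node flag records whether the node
// was allocated by the default allocator during adoption; destruction
// checks the flag to decide whether to consult the stored allocator
// instead of locating an owning arena.
struct DOMArena {
  int mFreed = 0;
  void Free(void*) { ++mFreed; }  // stand-in for pushing onto a freelist
};

struct Node {
  bool mHeapAllocated;  // the hypothetical node flag
  DOMArena* mArena;     // the hypothetical property on the node
};

// Returns 1 for the default-allocator path, 0 for the arena path, so
// the dispatch policy is easy to verify.
int DeallocateNode(Node* aNode) {
  if (aNode->mHeapAllocated) {
    delete aNode;  // adopted node: free via the default allocator
    return 1;
  }
  aNode->mArena->Free(aNode);  // arena node: return the slot to its arena
  return 0;
}
```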

This setup will obviously use more memory than we currently use, the question is how much, and I'd like to find that out.  We're hoping bug 1377993 improves DOM memory usage to some extent in order to buy us some breathing room here.  I'd also like to get some performance measurements once the changes are implemented in order to know what the exact memory-speed trade-off is.

After we have this data we can have a conversation about whether we should land the patches, tune them or give up!  CCing erahm to keep him in the loop from the beginning but no reason to be alarmed just yet.  :-)
Priority: -- → P1
(Assignee)

Comment 1

10 months ago
So I did a study of the current sizes of our DOM nodes on 32-bit and 64-bit.  Here is the raw data: https://docs.google.com/a/mozilla.com/spreadsheets/d/1i6yWXsDz_LFvvG6uvy57HupkHwSfdEjlIZq25Jsh7os/edit?usp=sharing

Here are the take-aways that are going to shape the design here:

  * HTMLAudioElement and HTMLVideoElement are huge in size and total outliers.  Due to the next-power-of-two bucketing, we unfortunately can't include them in this optimization.  I filed bug 1378506 as a follow-up to reduce the size of HTMLMediaElement.
  * On 32-bit, the only node that would fit in a 64-byte bucket is the Comment node.  On 64-bit, this node's size is 120 bytes, and there are a lot of other nodes that are 128 bytes, so we would need a 128-byte bucket there.  So we could arena-allocate Comment nodes on 64-bit platforms and leave them heap-allocated on 32-bit platforms, but for now, for simplicity, I'd rather leave them alone on all platforms.
  * All other nodes on both platforms fit nicely in one of these three buckets: 128 bytes, 256 bytes, or 512 bytes.
(Assignee)

Comment 2

10 months ago
I forgot to include Text and CDATASection; I've added those to the spreadsheet now.  Their presence, together with Comment, might make it worthwhile to have a 64-byte bucket on 32-bit platforms, since Text especially is very common.
(Assignee)

Updated

10 months ago
Depends on: 1378983
(Assignee)

Comment 3

10 months ago
Eric, what types of tests should I run to get a sense of whether the memory usage of the patches here is acceptable or not?
Flags: needinfo?(erahm)

Comment 4

10 months ago
> Firstly we can try to share code with nsPresArena.

I'd argue against using nsPresArena for DOM nodes.
nsPresArena is intentionally slower and uses more memory than needed
in order to support poisoning and type confusion mitigation measures.
Listing a few things off the top of my head:
1. writing the poison value costs CPU cycles and possibly cache misses
2. memory for one type of instance can only be re-used for that same
   type, which leads to a slight over-allocation / cache misses
3. the "free lists" are not lists, but nsTArrays, which means malloc /
   free calls just to manage these pointers.  (without poisoning you
   could instead use the free instances themselves for a linked list,
   which would be faster and use less memory)
4. memory is never free()'d; we always spend the high-water mark for
   each type separately

I don't think these measures are necessary for ref-counted DOM nodes.
I think it would be better to write a new non-poisoning arena class
that can be optimized for performance.
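For contrast, a minimal sketch of the non-poisoning intrusive freelist described in point 3, where the free slots themselves store the links (assumes slots are at least pointer-sized and pointer-aligned; not Gecko code):

```cpp
#include <cassert>

// Intrusive freelist sketch: the first word of each freed slot links to
// the next free slot, so no side arrays or extra mallocs are needed.
class FreeList {
 public:
  void Push(void* aSlot) {
    *static_cast<void**>(aSlot) = mHead;  // link slot to the old head
    mHead = aSlot;
  }
  void* Pop() {
    void* slot = mHead;
    if (slot) {
      mHead = *static_cast<void**>(slot);  // unlink the head slot
    }
    return slot;
  }

 private:
  void* mHead = nullptr;
};
```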
(Assignee)

Comment 5

10 months ago
(In reply to Mats Palmgren (:mats) from comment #4)
> > Firstly we can try to share code with nsPresArena.
> 
> I'd argue against using nsPresArena for DOM nodes.
> nsPresArena is intentionally slower and uses more memory than needed
> in order to support poisoning and type confusion mitigation measures.
> Listing a few things off the top of my head:
> 1. writing the poison value costs CPU cycles and possibly cache misses

I have disabled that part for DOM arenas.

> 2. memory for one type of instance can only be re-used for that same
>    type, which leads to a slight over-allocation / cache misses

For DOM arenas, our bucketing strategy will be very different.  I have a set of WIP patches which I will post later today which should demonstrate where I'm heading.  For now, we use 4 arenas for DOM node allocations on 64-bit platforms and 5 on 32-bit platforms, based on the description in comment 0: bucketing depends on the size of the nodes, not their type.

> 3. the "free lists" are not lists, but nsTArrays, which means malloc /
>    free calls just to manage these pointers.  (without poisoning you
>    could instead use the free instances themselves for a linked list,
>    which would be faster and use less memory)

Yeah, this is one aspect that isn't ideal.  Right now the mozilla::dom::Arena class that I have is fairly simple.  I may end up switching its backend implementation to not share anything with nsPresArena for that reason, if profiling shows that the cost of maintaining the "free lists" matters, but right now there are other, more important performance issues with the patches, which I'll write about in a separate comment...  (For now I have decided to reuse the nsPresArena code, since the arena manipulation code is kind of the least interesting aspect of this experiment.  None of the decisions here are final yet.)

> 4. memory is never free()'d; we always spend the high-water mark for
>    each type separately

Not sure if I follow this part, but the memory management of DOM nodes will not change; they will stay refcounted and will be freed.  We do eventually need to make sure that if an entire arena gets freed up, we reclaim its memory back to the OS, which nsPresArena currently doesn't know how to do.

> I don't think these measures are necessary for ref-counted DOM nodes.
> I think it would be better to write a new non-poisoning arena class
> that can be optimized for performance.

I expect to end up doing that in the end, but I'd still like to try to share code if we can.  So far we aren't sharing all that much code at the "nsPresArena" level at any rate, as most of the sharing happens at the ArenaAllocator level...
(Assignee)

Comment 6

10 months ago
So here is a summary of where I am right now with this.  I have the basic patches that implement the core of the idea here.  So far I have spent most of the time on the correctness side of things; the patches are good enough for local browsing, and they manage to run Speedometer successfully, but the try server was fairly unhappy with me last night when I pushed what I had.  I haven't looked into the details of the failures too closely yet.

I have been looking at the perf side of things, and things have not been looking all that great!  Initially when I ran Speedometer on a build with my patches, it was around 10 points *slower* than a build without them!  As far as I can tell, none of the code that I have added really shows up in the profile all that much; here are some details on the profiles I have looked at:

  * nsContentList::PopulateSelf() seems to show up in profiles after the patches at around half the time it used to take before the patches.  The pointer chasing loops there were one of the motivating factors behind the work here.
  * The cost of allocating new HTML elements seems to have gone down in the profiles after the patches compared to profiles before.  Again, this is what I would expect.
  * The cost of CC graph building (CCGraphBuilder::BuildGraph and stuff under it) has skyrocketed in the profiles after the patches!!!  In a profile of about 100 Speedometer subtest runs before the patches, this function took 0.45% of the total time.  After the patches, it takes 26.74% of the total time!
  ** As far as I can tell, at least a huge part of this cost is due to a massive increase in the cost of hashtable lookups under this code!  PLDHashTable::Add() has gone up from 0.38% of total time before to 9.32% of total time after!!!  The profile shows that after the patches, when we are trying to add new entries to the CC hashtable, we're getting collisions all the time, and we're getting destroyed by the hashtable performance. :-(  I think this could be due to the fact that the pointers stored into the hashtable now follow a pattern that hashes quite awfully for some reason, but I have yet to investigate this very deeply.

It's still unclear if there are other things at play here, but the hashtable collisions issue here makes any kind of perf comparison pointless for now...

Comment 7

10 months ago
> Not sure if I follow this part, but the memory management of DOM nodes
> will not change, they will stay refcounted and will be freed.

You mean "freed" in the sense "given back to the arena", but that only
means it'll end up on a free list.  What I mean is, even if you "free"
all your DOM nodes, so that the arena chunks could be *really* freed
in the jemalloc free() sense, it won't.  Memory that's been malloc'ed
for the chunks is only free()'d when the arena is destroyed.

Worse, nsPresArena only re-uses memory for the same type as it was
originally allocated for, so for C++ types A and B that are the same
size you can have thousands of free A's in the arena, but if there
are no free B's, a B allocation will still claim new arena space
rather than re-using a free A.

You can work around this latter feature, though, by using "synthetic"
type IDs so that multiple C++ classes use the same ID, at the cost
of over-allocating for the smaller types.
(I'm guessing this is what you're effectively doing for the buckets
you're talking about)

Comment 8

10 months ago
> PLDHashTable::Add() has gone up from 0.38% of total time before to 9.32% of total time after

You're using RegisterArenaRefPtr/DeregisterArenaRefPtr?
Those are really expensive and should be avoided if possible -
they keep a hashtable of all objects.

Comment 9

10 months ago
From reading the past few comments, I take it these arenas share no code with the GC's arenas? GC arenas just hold a single type of object (really all they care about is the size), using bump allocation where they can and spans to deal with fragmentation.

It shouldn't be hard to remove the GC-specific bits and make something a little more generic, but I suppose that's another rabbit hole (especially since GC arenas come from GC chunks, and I don't know as much about how the GC chunk pool works).

Comment 10

10 months ago
Right, nsPresArena is mostly used for layout/style system (non-refcounted) objects.
(Assignee)

Comment 11

10 months ago
(In reply to Mats Palmgren (:mats) from comment #8)
> > PLDHashTable::Add() has gone up from 0.38% of total time before to 9.32% of total time after
> 
> You're using RegisterArenaRefPtr/DeregisterArenaRefPtr?

No.  This is CCGraph::mPtrNodeToMap <https://searchfox.org/mozilla-central/rev/f1472e5b57fea8d4a7bbdd81c44e46958fe3c1ce/xpcom/base/nsCycleCollector.cpp#842>, nothing that I added.

The cause of the massive increase in hashtable collisions is of course quite obvious.  It is this extremely crappy hash function used for this hash table: <https://searchfox.org/mozilla-central/rev/f1472e5b57fea8d4a7bbdd81c44e46958fe3c1ce/xpcom/ds/PLDHashTable.cpp#73>.  Sigh.  :-(  What my patch changes is that a lot of the nodes that get added to this table as keys now have addresses that are numerically very close, and this hash function just shifts them down and otherwise leaves them unchanged, so the entries don't end up distributed across the hashtable *at all*, leading to the really horrible performance I was observing.  I'll file a separate bug to fix that.
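A simplified illustration of the failure mode (this is not the actual PLDHashTable code): open-addressed tables that derive the slot index from the hash's top bits need a hash that mixes the pointer's low bits upward, which a plain shift does not do, while a Fibonacci-style multiplicative scramble does:

```cpp
#include <cassert>
#include <cstdint>

// Simplified, illustrative hashes.  Nearby arena addresses share their
// high bits, so a shift-only hash leaves the top bits identical and
// everything piles into one slot; a multiplicative scramble spreads
// low-bit differences across the whole hash.
constexpr uint32_t kGoldenRatio = 0x9E3779B9u;  // ~2^32 / phi

uint32_t ShiftOnlyHash(std::uintptr_t aPtr) {
  return static_cast<uint32_t>(aPtr >> 2);  // old stub-style pointer hash
}

uint32_t ScrambledHash(std::uintptr_t aPtr) {
  return static_cast<uint32_t>(aPtr >> 2) * kGoldenRatio;
}

// Slot index taken from the hash's top aBits bits, as a table with
// 2^aBits slots might do.
uint32_t TopBitsIndex(uint32_t aHash, uint32_t aBits) {
  return aHash >> (32 - aBits);
}
```

With the shift-only hash, addresses 64 bytes apart all map to the same top-bits index; after the multiplicative scramble they spread across the table.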
(Assignee)

Comment 12

10 months ago
(In reply to Mats Palmgren (:mats) from comment #7)
> > Not sure if I follow this part, but the memory management of DOM nodes
> > will not change, they will stay refcounted and will be freed.
> 
> You mean "freed" in the sense "given back to the arena", but that only
> means it'll end up on a free list.

Yes.

> What I mean is, even if you "free"
> all your DOM nodes, so that the arena chunks could be *really* freed
> in the jemalloc free() sense, it won't.  Memory that's been malloc'ed
> for the chunks are only free()'d when the arena is destroyed.

Yes, I understand.  That is what I was talking about in the second part of the paragraph you quoted.  :-)  We should fix that for DOM nodes.  (I'm not really sure why the current situation is OK for frames really either, my idea was to fix it centrally for both if possible, but I care about the DOM arena side here.)

> Worse, nsPresArena only re-uses memory for the same type as it was
> originally allocated for, so for C++ types A and B that are the same
> size you can have thousands of free A's in the arena, but if there
> are no free B's, a B allocation will still claim new arena space
> rather than re-using a free A.
> 
> You can work around this latter feature, though, by using "synthetic"
> type IDs so that multiple C++ classes use the same ID, at the cost
> of over-allocating for the smaller types.
> (I'm guessing this is what you're effectively doing for the buckets
> you're talking about)

Yes that is exactly what I'm doing.  :-)

(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #9)
> From reading the past few comments, I take it these arenas share no code
> with the GC's arenas?

Correct.

> GC arenas just hold a single type of object (really
> all they care about is the size), using bump allocation where they can and
> spans to deal with fragmentation.
> 
> It shouldn't be hard to remove the GC-specific bits and make something a
> little more generic, but I suppose that's another rabbit hole (especially
> since GC arenas come from GC chunks, and I don't know as much about how the
> GC chunk pool works).

That sounds like overkill to me to be honest.  :-)
(Assignee)

Comment 13

10 months ago
With the patch in bug 1379282, the cost of hashtable lookups drops back down to the pre-patch levels!
Depends on: 1379282

Comment 14

10 months ago
> I'm not really sure why the current situation is OK for frames really either

It's both a mitigation against type confusion - it guarantees a memory
address is never re-used for a different type; and a mitigation against
UAF - it's always filled with poison while not used (both for the lifetime
of the shell).  It's a price we're willing to pay, since it has mitigated
hundreds of otherwise exploitable crashes.
(Assignee)

Comment 15

10 months ago
(In reply to Mats Palmgren (:mats) from comment #14)
> > I'm not really sure why the current situation is OK for frames really either
> 
> It's both a mitigation against type confusion - it guarantees a memory
> address is never re-used for a different type; and a mitigation against
> UAF - it's always filled with poison while not used (both for the lifetime
> of the shell).  It's a price we're willing to pay, since it has mitigated
> hundreds of otherwise exploitable crashes.

OK, makes sense.  I think your comments have made me not want to reuse much code from nsPresArena after all: if I end up special-casing the arena page deallocation issue only for DOM arenas, there is not much sharing in practice anyway.  I'll post my WIP patches now before I go on vacation, but the first 2 parts will probably just completely change, and the third part will get rewritten on top of them when I get back...
(Assignee)

Comment 16

10 months ago
Created attachment 8884429 [details] [diff] [review]
Part 1: Refactor nsPresArena into a reusable base class so that it can be shared between DOM and layout
(Assignee)

Comment 17

10 months ago
Created attachment 8884430 [details] [diff] [review]
Part 2: Disambiguate ArenaObjectID in nsPresArena.cpp
(Assignee)

Comment 18

10 months ago
Created attachment 8884431 [details] [diff] [review]
Part 3: Add support for allocating DOM nodes into arenas
(Assignee)

Comment 19

10 months ago
Created attachment 8884432 [details] [diff] [review]
Part 4: Port the most common nodes appearing in HTML documents to be allocated in arenas

Comment 20

10 months ago
Just to remind you when you come back: it's suboptimal to use nsTArray
for the free lists in a non-poisoning arena.  Using a singly-linked
list on the objects themselves is much faster.  Maybe the FreeList type
could be a template param?  Or maybe it could have FreeListArray and
FreeListLinkedList and then select which one to use based on
EnablePoisoning through some template magic?
Yeah, nsTArray is bad for performance; a linked list or SegmentedVector should work better.
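One possible shape for the FreeList-as-template-parameter suggestion above, sketched with hypothetical names (`FreeListArray`, `FreeListLinkedList`, and `Arena` do not exist in the tree):

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical sketch: select the freelist representation at compile
// time from the poisoning policy.
struct FreeListArray {  // side-table of pointers; safe with poisoning
  std::vector<void*> mSlots;
  void Push(void* aSlot) { mSlots.push_back(aSlot); }
  void* Pop() {
    if (mSlots.empty()) {
      return nullptr;
    }
    void* slot = mSlots.back();
    mSlots.pop_back();
    return slot;
  }
};

struct FreeListLinkedList {  // links threaded through the free slots
  void* mHead = nullptr;
  void Push(void* aSlot) {
    *static_cast<void**>(aSlot) = mHead;
    mHead = aSlot;
  }
  void* Pop() {
    void* slot = mHead;
    if (slot) {
      mHead = *static_cast<void**>(slot);
    }
    return slot;
  }
};

template <bool EnablePoisoning>
class Arena {
  // Poisoning arenas keep free-slot pointers in a side array so the
  // slots themselves can stay filled with the poison pattern.
  using FreeList = typename std::conditional<EnablePoisoning,
                                             FreeListArray,
                                             FreeListLinkedList>::type;
  FreeList mFreeList;

 public:
  void Recycle(void* aSlot) { mFreeList.Push(aSlot); }
  void* Reuse() { return mFreeList.Pop(); }
};
```

Both freelist types expose the same Push/Pop shape, so the arena code above compiles unchanged for either policy.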
(In reply to (Away 7/10-7/17) from comment #3)
> Eric, what types of tests should I run to get a sense of whether the memory
> usage of the patches here is acceptable or not?

The "tabs open" measurements from awsy should be a good indicator. Do a try run without your patches, do a bunch of retriggers, then repeat with the patches. You can then diff them in perfherder. I think the following should work:

> ./mach try -b o -p linux64 -u awsy-e10s -t none --rebuild 5
Flags: needinfo?(erahm)
I sort of wonder whether it's worth doing some sort of moving allocator, with handles, etc., for nodes.  It's a big project, but it would allow us to do arenas without weird high-water-mark or fragmentation behavior: when an arena chunk gets too sparse, just move things out of it and kill it off.
P2 for now, but we definitely want to have a good feel for the memory usage changes before landing this.
Whiteboard: [MemShrink] → [MemShrink:P2]
(Assignee)

Comment 25

8 months ago
I have decided to punt on this for 57, I think...  :/  I have kept the patches in my queue, but people keep bitrotting them, and it's no longer worth my time trying to keep them applied.  So far what I had causes slowdowns as opposed to making anything faster, and I haven't had the time to fix the freelist issues yet, but those issues don't really show up in profiles anyhow, so I doubt that would be where to look for the sources of the slowdown.  Perhaps we can look at this again when there is not so much time pressure...
(Assignee)

Comment 26

8 months ago
Note to self: current non-building tree here: https://github.com/ehsan/gecko-dev/tree/arena_allocated_dom_nodes

I think the most recent bitrot was due to bug 1387956, possibly among others...
AFAIK these aren't destined to be fixed in 56 or 57, so I'm setting the priority to P2.
Priority: P1 → P2