Try out an arena-based allocator for DOM nodes again
Categories
(Core :: DOM: Core & HTML, enhancement, P2)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox76 | --- | fixed |
People
(Reporter: ehsan.akhgari, Assigned: sefeng211)
References
(Depends on 1 open bug, Blocks 2 open bugs)
Details
(Keywords: parity-safari, Whiteboard: [MemShrink:P2])
Attachments
(23 files, 5 obsolete files)
|
13.86 KB,
patch
|
Details | Diff | Splinter Review | |
|
1.86 KB,
patch
|
Details | Diff | Splinter Review | |
|
21.30 KB,
patch
|
Details | Diff | Splinter Review | |
|
187.95 KB,
patch
|
Details | Diff | Splinter Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review | |
|
47 bytes,
text/x-phabricator-request
|
Details | Review |
Updated•8 years ago
|
| Reporter | ||
Comment 1•8 years ago
|
||
| Reporter | ||
Comment 2•8 years ago
|
||
| Reporter | ||
Comment 3•8 years ago
|
||
Comment 4•8 years ago
|
||
| Reporter | ||
Comment 5•8 years ago
|
||
| Reporter | ||
Comment 6•8 years ago
|
||
Comment 7•8 years ago
|
||
Comment 8•8 years ago
|
||
Comment 9•8 years ago
|
||
Comment 10•8 years ago
|
||
| Reporter | ||
Comment 11•8 years ago
|
||
| Reporter | ||
Comment 12•8 years ago
|
||
| Reporter | ||
Comment 13•8 years ago
|
||
Comment 14•8 years ago
|
||
| Reporter | ||
Comment 15•8 years ago
|
||
| Reporter | ||
Comment 16•8 years ago
|
||
| Reporter | ||
Comment 17•8 years ago
|
||
| Reporter | ||
Comment 18•8 years ago
|
||
| Reporter | ||
Comment 19•8 years ago
|
||
Comment 20•8 years ago
|
||
Comment 21•8 years ago
|
||
Comment 22•8 years ago
|
||
Comment 23•8 years ago
|
||
Comment 24•8 years ago
|
||
| Reporter | ||
Comment 25•8 years ago
|
||
| Reporter | ||
Comment 26•8 years ago
|
||
Comment 27•8 years ago
|
||
| Reporter | ||
Updated•7 years ago
|
| Assignee | ||
Comment 28•7 years ago
|
||
Comment 29•7 years ago
|
||
Updated•6 years ago
|
| Assignee | ||
Comment 30•6 years ago
|
||
I am really sorry for not updating the patch for a while and I'll make sure I update the patch in future!
I am going to give a summary of what I have tried so far and what my plan is, please feel free to provide any suggestions!
The idea is each docGroup will have its own arena, and all DOM objects belong to this docGroup will be allocated into this arena. If the dom objects is going to be allocated before the docGroup is created, it will use the normal malloc function.
The first memory allocator approach I did (Thanks Ehsan for the based patch) used the common memory allocator approach. Like having dedicated memory chunks for different sizes, and place the dom objects accordingly into these memory chunks. I reused the xpcom/ds/ArenaAllocator.h to achieve it.
-
4 sizes (128, 256, 512, Attributes): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=8ad795d272596f8c87a3f7e2ad95d3a5976fbe33&newProject=try&newRevision=3858ed4436941cffc88a359d40454a80df47932c&framework=1
-
Same as 1) but also increased the chunk size from 8KB to 1MB: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=8ad795d272596f8c87a3f7e2ad95d3a5976fbe33&newProject=try&newRevision=baeafc14b7b4d97d82ff28801402272a323e226e&framework=1
-
No more sizes differentiation, place everything together based on their allocation order, 10MB chunk size : https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=8ad795d272596f8c87a3f7e2ad95d3a5976fbe33&newProject=try&newRevision=9380f80e0c586996ba30e80563a1c2ee54123ec5&framework=1
-
No more sizes differentiation, each type of node has its own chunk (chunk per node): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=8ad795d272596f8c87a3f7e2ad95d3a5976fbe33&newProject=try&newRevision=24bf0ecc08c1b423ac54e90e8fcdc58a3dc07732&framework=1
-
Chunk per node and also 1MB Chunk size: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=8ad795d272596f8c87a3f7e2ad95d3a5976fbe33&newProject=try&newRevision=986bc580f281cec14b445b6185f48103f356e400&framework=1
-
6 sizes (128, 144, 176, 256, 512, Attribute): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=8ad795d272596f8c87a3f7e2ad95d3a5976fbe33&newProject=try&newRevision=a8676a1647ff7e77f3b511a39e4425de378f38c5&framework=1
None of them really improve anything except 3), so I started to focus to make 3) better.
If we want to be able to place dom objects that have different sizes next to each other, A proper implementation should be able to merge adjacent free objects with different sizes. So I started with a bitmap implementation. However this implementation doesn't work well because the larger the memory chunk, the longer time to flip the bits, which increased the time to do memory allocation. And it also didn't work very well https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d19fa4ef993b0330b23abcc26370672426453ac8&newProject=try&newRevision=a50004bf9769d6b0b4605c952237ed43ba413d17.
These are the 2 approaches that I am trying to make it better.
https://phabricator.services.mozilla.com/D36114, back to 3), and accept the fact that we can't merge adjacent memory blocks`. https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d19fa4ef993b0330b23abcc26370672426453ac8&newProject=try&newRevision=ded51b9113b2cab20c4ec9b81a4f15e914bbda55
https://phabricator.services.mozilla.com/D36105, my own implementation to handle free blocks by using a lot of extra data structures https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d19fa4ef993b0330b23abcc26370672426453ac8&newProject=try&newRevision=9071ea8e420b608e71787d3632af155138e55e91.
Again I haven't found a good implementation yet and I also don't have a lot of confidences for my patches because they are a lot of stuff in there and I am worrying I did something totally unnecessary. Any help would be much appreciated!
Randell pointed out jemalloc arena might help, so I will give it a try.
Comment 31•6 years ago
|
||
It sounds a lot like you're effectively re-implementing mozjemalloc (with a pre-allocated backing buffer) through trial and error, it might be worth familiarizing yourself with that logic to get an idea of how we balance various requirements (fragmentation, free list overhead, cache line sharing, perf, etc).
A chunk size of 10MB is going to be a hard sell from a memshrink perspective; I'm also a bit concerned that any chance of wins from cache locality is going to be diminished at that size.
As jesup has suggested it might make sense to just dedicate a mozjemalloc arena to dom allocations (particularly with fission enabled where fewer pages will be sharing a dom allocator)
| Assignee | ||
Comment 32•6 years ago
•
|
||
Yeah, 10MB was just a test to see if it makes sense to place them based on their allocation order, despite their sizes.
I just got the test results for jemalloc arena, the result looks good.
Talso: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=621f3eafaf3a8c5b179071440532ddc0d11984f7&newProject=try&newRevision=3f4e0d61a36fe2732d809c73106ce73feb301e8d, quite a few high confidence improvements, also some med improvements, and no high confidence regressions. However, some subtests regressed, for example https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=3f4e0d61a36fe2732d809c73106ce73feb301e8d&originalSignature=1657197&newSignature=1657197&framework=1&originalRevision=621f3eafaf3a8c5b179071440532ddc0d11984f7
raptor: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=621f3eafaf3a8c5b179071440532ddc0d11984f7&newProject=try&newRevision=3f4e0d61a36fe2732d809c73106ce73feb301e8d&framework=10, no high confidence changes, a few med confidences improvements/regressions.
| Assignee | ||
Comment 33•6 years ago
|
||
seems fine for most of them, Resident Memory opt stylo regressed
also some improvements, for example https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&newRevision=bd41da1b6e4a9c7136b8be0ab3f08364834a3c33&originalSignature=1887533&newSignature=1887533&framework=4&originalRevision=b00ef1a5f7d39403275e8d8ba3fa9d1c6f9b9bc9
| Assignee | ||
Comment 34•6 years ago
|
||
| Assignee | ||
Comment 35•6 years ago
|
||
Depends on D41054
| Reporter | ||
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Comment 37•6 years ago
|
||
When adding this new arena, can we please ensure that it has memory checking support (MOZ_MAKE_MEM_NOACCESS and friends)? Otherwise, use-after-free inside the new arena will be much harder to detect.
Comment 38•6 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #37)
When adding this new arena, can we please ensure that it has memory checking support (MOZ_MAKE_MEM_NOACCESS and friends)? Otherwise, use-after-free inside the new arena will be much harder to detect.
Nevermind, just found it in the patch now :)
Updated•6 years ago
|
Updated•6 years ago
|
| Assignee | ||
Comment 39•6 years ago
|
||
| Assignee | ||
Comment 40•6 years ago
|
||
Depends on D43135
Updated•6 years ago
|
| Assignee | ||
Comment 41•6 years ago
|
||
Some updates to the latest perf numbers.
Tests I did was all talos tests, all raptor tp6 tests, all AWSY test and speedometer tests.
With Pref disabled (uses the current allocation methodology): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=cdccfb33532721b5e2ef69b157b4bad74db2eb1a&newProject=try&newRevision=ac8cad1eab833dfe2bcc11ffb6aa5bcf47eb546c&framework=1
WIth Pref enabled (uses arena allocation): https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=cdccfb33532721b5e2ef69b157b4bad74db2eb1a&newProject=try&newRevision=2e884e10d09503a1036e99a23da22b2f2958f4e5&framework=1
There's only one regression which exists in both comparisons, which is time_to_session_store_window_restored_ms for windows 7. I am not too worrying about it as it exists in both comparisons so it shouldn't be related to the arena allocation itself, but I am looking into it.
Comment 42•6 years ago
|
||
Comment on attachment 9087538 [details]
Bug 1377999 - Unsupport cross docGroup node adoption
Revision D43135 was moved to bug 1467970. Setting attachment 9087538 [details] to obsolete.
| Assignee | ||
Comment 43•6 years ago
|
||
Just an update to this bug.
At this point, it is unlikely that we will land the patch, because we couldn't find a good reason to land it. (Event if the DOM arena is disabled by default).
We tried to find a real use case where it improved the DOM object iteration performance. We pick a large document from html standard, then used NodeIterator to iterator all DOM nodes to eliminate the overheads of javascript, and we expected using DOM Arena should improve the performance of NodeIterator (because it should.....), however, it wasn't the case, and we found a 2% regression. Since we are using jemalloc arena, there aren't too much things we can tune.
I could do more testing to find a special case where the performance is improved, maybe a large document with only Div elements, but again, we would need a real use case improvement for us to land it.
Comment 44•6 years ago
|
||
I could do more testing to find a special case where the performance is improved, maybe a large document with only Div elements, but again, we would need a real use case improvement for us to land it.
There are security benefits to isolating DOM Nodes. Most of the security benefit exists if each type (really: size!) of DOM Node would have a separate arena, but there might be some benefit by having them all somewhere else compared to, say, the media codec allocations. So we might not need a performance win to take this. On the other hand, the patch looks huge, so I could understand there are complexity concerns.
Do I understand your comments correctly that you had a version earlier that was able to group DOM Nodes per object size?
| Assignee | ||
Comment 45•6 years ago
|
||
The current patch which gives us a seems-okayish performance groups DOM Nodes per DocGroup, so it isn't really per object size, as objects with different sizes get grouped into to the same arena.
I have never tried to do it per object size. In the very beginning, I tried to group DOM objects per node type, but it didn't work well, lots of regressions.
Comment 46•6 years ago
|
||
If you're using moz_arena_malloc, you do end up with nodes of the same size being grouped together.
Comment 47•6 years ago
|
||
If you're using moz_arena_malloc, you do end up with nodes of the same size being grouped together.
In this case this patch could be a huge security gain wrt. robustness against UAF exploits. A pointer to an incorrectly freed DOM node can't be manipulated (by heap feng shui) to point to somewhere else (like the vtable of another class). If it is performance neutral, we should very seriously consider still taking it.
Sean, does the above change your assessment on whether it is worthwhile to mold this into land-able state?
Comment 48•6 years ago
|
||
How is using this mozjemalloc arena so different to use normal mozjemalloc from security point of view.
mozjemalloc has bucket sizes, so nodes within certain size limits end up using same bucket.
Comment 49•6 years ago
|
||
How is using this mozjemalloc arena so different to use normal mozjemalloc from security point of view.
If the DOM nodes are mixed with regular allocations, there's more opportunity to get a different object that has a size that fits in the same bucket, into the memory area previously occupied by an object that has a dangling pointer (i.e. UAF).
Comment 50•6 years ago
•
|
||
Say I manage to convince Firefox to free a DOM Node X that I have partial control over (by changing the attributes). I can't do anything interesting just changing the attributes of this DOM node. It is freed (but still used, the UAF) by the allocator, which now allocates a More Interesting Internal Data Structure on the same spot. Now, the memory spot that I control and that was a harmless attribute actually contains a vtable ptr = profit.
This kind of attack is harder if the arena only allows other Less Interesting Objects to be allocated into the freed up memory. It's even harder if it's similarly sized objects of a similar type (i.e. other DOM Nodes), since there's less chance the vtable is in a useful position.
It is even harder if every DOM Node has its own arena. WebKit has this, but it's quite a step further from this still.
Comment 51•6 years ago
|
||
It is even harder if every DOM Node has its own arena.
Fwiw, this is basically what we have for frame classes in the PresShell arena. An arena slot allocated to say a nsBlockFrame is only ever reallocated to a new nsBlockFrame, thus eliminating type confusion.
| Assignee | ||
Comment 52•6 years ago
|
||
gcp: Thanks for all of these info. Yeah, I mean that makes sense to me.
I think for us to make a decision, we need to understand how much security gain does this patch provide, and also whether the security gain worth the performance regressions.
To help us understand more about the performance regressions, based on Ehsan's suggestion, I am going to run the nodeIterator benchmark again on some phones, and we also hope it may shine on some low-end phones because low-end phones usually have a very small amount of cache.
I also feel this thing might need a broader discussion.
Comment 53•6 years ago
•
|
||
(In reply to Mats Palmgren (:mats) from comment #51)
It is even harder if every DOM Node has its own arena.
Fwiw, this is basically what we have for frame classes in the PresShell arena. An arena slot allocated to say a nsBlockFrame is only ever reallocated to a new nsBlockFrame, thus eliminating type confusion.
Yeah, that I know and understand the benefits there. But I'm really wondering if the benefits of using a mozjemalloc arena for all the DOM nodes (well, more precisely, all HTML elements and Text nodes) are worth the complexity and possible performance and memory usage regressions.
(I was expecting performance benefits from the patch, but that doesn't seem to be the case.)
(nsIFrames had the disadvantage that they weren't refcounted, so ensuring sane lifetime management was harder than with refcounted and CCed DOM nodes. And WeakFrames were added because of somewhat unsafe crashes, and that was even before the current arena.)
| Reporter | ||
Comment 54•6 years ago
|
||
(In reply to Gian-Carlo Pascutto [:gcp] from comment #50)
Say I manage to convince Firefox to free a DOM Node X that I have partial control over (by changing the attributes). I can't do anything interesting just changing the attributes of this DOM node. It is freed (but still used, the UAF) by the allocator, which now allocates a More Interesting Internal Data Structure on the same spot. Now, the memory spot that I control and that was a harmless attribute actually contains a vtable ptr = profit.
I don't actually think that we can achieve this benefit, unless if we have per-node-type arenas. Consider the following two nodes, A and B, A with one base class and B with two, so B would have two vtable pointers. So in the scenario above if the UAF'ed ptr is an A pointer and you manipulate its first word-size data member, and then get Gecko to allocate a B node in its place, you're controlling a vtable pointer again.
For frames, we actually use a much different allocation strategy, we allocate giving the type of the frame object, which is used to reuse frame allocation spots of the same type. This guarantees that the scenario that I described above can never happen, but it is not something that this patch currently implements, and the complexity of implementing such an approach is quite high...
| Assignee | ||
Comment 55•6 years ago
|
||
While I was doing the testing on low-end phones, we found a bug of NodeIterator (bug 1596238), and once we fixed it, things started to changed.
I did more testing 2 low-end phones (Moto G5, Galaxy S4) and on my high-end XPS 15. I used the same testing technique which was using NodeIterator API to iterate all DOM elements, and as the result, DOM Arena performed very well, it was better than the baseline(which was a commit where my changes based of).
Full results: https://docs.google.com/spreadsheets/d/1AlsPqeEiaed83Nm8OnZNbFxrnAuYdX-ima6djvfR18c/edit?usp=sharing
So yeah, we are reconsidering our decision!
| Assignee | ||
Comment 56•6 years ago
|
||
| Assignee | ||
Comment 57•6 years ago
|
||
Depends on D57698
| Assignee | ||
Comment 58•6 years ago
|
||
Depends on D57699
| Assignee | ||
Comment 59•6 years ago
|
||
Depends on D57700
| Assignee | ||
Comment 60•6 years ago
|
||
Depends on D57701
| Assignee | ||
Comment 61•6 years ago
|
||
Depends on D57702
| Assignee | ||
Comment 62•6 years ago
|
||
Depends on D57703
| Assignee | ||
Comment 63•6 years ago
|
||
Depends on D57704
| Assignee | ||
Comment 64•6 years ago
|
||
Depends on D57705
| Assignee | ||
Comment 65•6 years ago
|
||
Depends on D57706
Updated•6 years ago
|
| Assignee | ||
Comment 66•6 years ago
|
||
We customized the new operator to allow Dom Arena to be hooked
in.
Depends on D57698
| Assignee | ||
Comment 67•6 years ago
|
||
Document is a special case. Since Documents could own NodeInfoManager,
use the global new operator for Documents
Depends on D62350
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
| Assignee | ||
Comment 68•6 years ago
|
||
Depends on D57707
| Assignee | ||
Comment 69•6 years ago
|
||
Depends on D62352
| Assignee | ||
Comment 70•6 years ago
|
||
Depends on D62353
| Assignee | ||
Comment 71•6 years ago
|
||
Depends on D62354
| Assignee | ||
Comment 72•6 years ago
|
||
Depends on D62355
Updated•6 years ago
|
| Assignee | ||
Comment 73•5 years ago
|
||
There's no existing macro to allow use declare a final
DeleteCycleCollectable, this patch adds one
Depends on D62351
| Assignee | ||
Comment 74•5 years ago
|
||
Depends on D63262
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
| Assignee | ||
Comment 75•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 76•5 years ago
|
||
Comment 77•5 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/4152a66c9cab
https://hg.mozilla.org/mozilla-central/rev/9b10d4a18d8f
https://hg.mozilla.org/mozilla-central/rev/479b7648e14f
https://hg.mozilla.org/mozilla-central/rev/be4727306b56
https://hg.mozilla.org/mozilla-central/rev/cd8a79470a89
https://hg.mozilla.org/mozilla-central/rev/c8f21f2b4452
https://hg.mozilla.org/mozilla-central/rev/52013e3b7c7e
https://hg.mozilla.org/mozilla-central/rev/ad20ed22a644
https://hg.mozilla.org/mozilla-central/rev/fc5b51e31c10
https://hg.mozilla.org/mozilla-central/rev/d377d84d8500
https://hg.mozilla.org/mozilla-central/rev/83e21f5ac307
https://hg.mozilla.org/mozilla-central/rev/1d82b510aae0
https://hg.mozilla.org/mozilla-central/rev/538c6cf14b44
https://hg.mozilla.org/mozilla-central/rev/e9f0c79a628a
https://hg.mozilla.org/mozilla-central/rev/ac3e0036d412
https://hg.mozilla.org/mozilla-central/rev/cdc0080abe33
https://hg.mozilla.org/mozilla-central/rev/6d39a375d5e2
https://hg.mozilla.org/mozilla-central/rev/5ed5a5b322db
https://hg.mozilla.org/mozilla-central/rev/32010372f437
Description
•