Reduce allocation poisoning in mozjemalloc in release
Categories
(Core :: Memory Allocator, enhancement)
Tracking
()
Tracking | Status | |
---|---|---|
firefox120 | --- | fixed |
People
(Reporter: jesup, Assigned: pbone)
References
Details
Attachments
(3 files)
Google and I believe safari aren't poisoning all allocations on free (chrome may be poisoning 1 cacheline's worth on x86). This is costing us significant performance, perhaps as much as 2%, maybe more. We should consider disabling this in release and late beta (or perhaps all betas).
We'll want try perf runs to disable poisoning if it's left on in nightly/local builds.
Upsides: performance(!)
Downsides:
- Less obvious UAF crashes (where the memory hasn't been reallocated/cleared)
** UAFs are likely to show up as wildptr crashes - Information leakage in freshly reallocated memory
** only from the same site due to fission, except maybe on android - Uninitialized data will be non-deterministic, which may expose existing undefined-behavior bugs. Note however that most of these are caught by the compiler or static analysis test runs.
- UAF-caused crashes will be less deterministic
Comment 1•1 year ago
|
||
I have filed at least a few crashes on poison values that got fixed that were only showing up on release or beta, for what it is worth.
Comment 2•1 year ago
|
||
I don't think we should do this. The poison value has enabled fixing a lot of crashes. Maybe somebody could take another look at performance improvements for poisoning?
Reporter | ||
Comment 3•1 year ago
|
||
(chatted a bunch with mccr8)
Performance runs show 0.75% on SP3, and ~1.5% on pageload (*speedindex), up to 4% on amazon.
We can probably retain most (80%? 90%?) of the e5e5 crashes by poisoning a cacheline or two. That might have little or no impact on the perf win.
One risk is that we may not crash if the UAF is quick (before reallocation). For example, calling through vptrs. This makes it hard to spot UAFs that happen quickly.
An additional possible mitigation to these effects is to leverage the PHC work to do probabilistic poisoning of memory.
Assignee | ||
Comment 4•1 year ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #3)
One risk is that we may not crash if the UAF is quick (before reallocation). For example, calling through vptrs. This makes it hard to spot UAFs that happen quickly.
How does that relate to poisoning a cache line or two? That sounds more like delaying the poisoning for later?
Assignee | ||
Comment 5•1 year ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #3)
(chatted a bunch with mccr8)
Performance runs show 0.75% on SP3, and ~1.5% on pageload (*speedindex), up to 4% on amazon.We can probably retain most (80%? 90%?) of the e5e5 crashes by poisoning a cacheline or two. That might have little or no impact on the perf win.
Thanks.
For poisoning only one cache line I have a 0.53% sp3 improvement on Windows, but with low confidence:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=ceeefc072d1e8e62ea52de536b6e8ff4374f36d5&newProject=try&newRevision=721d93ba9f4f510ec956370b76f815444dd9648c&framework=13&page=1&filter=speedometer
And a 2.17% improvment on amazon's speedindex (low confidence).
My patch goes to some effort to write to only one cache line, except when the memory cell is less than or equal to 64 bytes long, it just writes the whole thing in that case. For example an 96 byte cell won't necessarily be aligned at the beginning of a 64 byte cache line, in that case we write only to the first cache line even if that means only writing 32 bytes.
I'm testing other options too.
Assignee | ||
Comment 6•1 year ago
|
||
Here's my tests with disabling poisoning altogether compared with the same baseline:
Comparing the no poisoning (Base) vs "poison first word only" (New):
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=8a20423c80ee7b97ff45c4d9ccadc5ccf09e485c&newProject=try&newRevision=721d93ba9f4f510ec956370b76f815444dd9648c&framework=13&page=1&filter=speedometer
Something weird is happening for MacOS. Everything else is at most half a percent different but with low confidence.
Assignee | ||
Comment 7•1 year ago
|
||
This patch alters mozjemalloc's poisoning to poison only the first 64 bytes
(aligned to a single cache line) of (most) memory cells. It has this
behaviour in late beta and release builds and keeps the original "poison
all" behaviour for nightly and early beta builds.
Updated•1 year ago
|
Comment 8•1 year ago
|
||
64 bytes sounds a bit smaller than I'd like. For instance, sizeof(Element) is 136.
Comment 9•1 year ago
|
||
sizeof Element should be 128 on opt builds (It should fit into 2 cache lines).
https://searchfox.org/mozilla-central/rev/a7e33b7f61e7729e2b1051d2a7a27799f11a5de6/dom/base/Element.cpp#200-234
Comment 10•1 year ago
|
||
pbone, could you test a bit larger values, like 128, and also 256, just to see what numbers we get on try?
Comment 11•1 year ago
|
||
FYI since we have a very narrow use-case here (poisoning 1-N bytes at most with N being a fixed value) I wonder if we could reuse the techniques from this paper to make the actual poisoning very fast by largely avoiding branches in the write path. I had mentioned that paper in bug 1821077 comment 0 but I feel it's even more relevant now given the direction we're taking here.
Assignee | ||
Comment 12•1 year ago
|
||
= Synthetic tests
Based on logalloc recording of speedometer 2.
logalloc replay | Cells completely poisoned | Cells partly poisoned | Bytes poisoned | |
---|---|---|---|---|
No poisoning | 6.04s +- 0.5 | 0 | 0 | 0 |
Poison 64 | 6.88s +- 0.3 | 8,716,742 (39.2%) | 13,507,373 (60.8%) | 1,110,050,631 (7.9%) |
Poison 128 | 6.95s +- 0.5 | 16,606,293 (74.7%) | 5,617,797 (25.3%) | 1,919,564,807 (13.7%) |
Poison 256 | 7.00s +- 0.4 | 19,182,550 (86.3%) | 3,041,515 (13.7%) | 2,539,338,927 (18.1%) |
Poison full | 8.73s +- 0.4 | 22,223,977 (100.0%) | 0 | 14,020,182,663 (100%) |
I wanted to see how the object size distribution effects how much poisoning these different options have. I'm doing full speedometer tests on try now.
Assignee | ||
Comment 13•1 year ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #11)
FYI since we have a very narrow use-case here (poisoning 1-N bytes at most with N being a fixed value) I wonder if we could reuse the techniques from this paper to make the actual poisoning very fast by largely avoiding branches in the write path. I had mentioned that paper in bug 1821077 comment 0 but I feel it's even more relevant now given the direction we're taking here.
it's something I started reading about but decided (at the time when I read it) it wasn't low hanging fruit. However I've recently been wrong about what is/isn't low hanging fruit so maybe this is worth it. Plus if we are doing a maximum of say 128 bytes then that gives us more known information which we can use to eliminate code paths & branches (which I think is what you're saying).
Assignee | ||
Comment 14•1 year ago
|
||
Speedometer and pageload
WIndows shippable
sp2 | sp3 | amazon SpeedIndex warm | |
---|---|---|---|
No poisoning | 179.42 ± 0.48 | 13.39 ± 0.31 | 237.55 ± 2.6 |
Poison 64 | 179.09 ± 0.87 | 13.33 ± 0.83 | 242.13 ± 2.56 |
Poison 128 | 179.19 ± 0.43 | 13.35 ± 0.59 | 241.62 ± 3.05 |
Poison 256 | 179.5 ± 0.55 | 13.35 ± 0.33 | 240.47 ± 2.82 |
Poison full | 178.84 ± 0.84 | 13.32 ± 1.03 | 243.2 ± 1.96 |
WIndows nightly as release
sp2 | sp3 | |
---|---|---|
No poisoning | 173.58 ± 0.61 | 12.89 ± 0.57 |
Poison 64 | 173.46 ± 0.3 | 12.86 ± 0.71 |
Poison 128 | 173.28 ± 0.66 | 12.89 ± 0.35 |
Poison 256 | 173.79 ± 0.4 | 12.89 ± 0.5 |
Poison full | 172.65 ± 0.65 | 12.87 ± 0.42 |
Remarks
It looks as if optimisations in either the memset routine (like Gabriele mentions in comment 11) or the CPU's handling of these write patterns are helping with 128 and 256 byte writes. This is the opposite trend from the synthetic tests above. Which could be because:
- Firefox causes this memory to be cached anyway and logalloc-replay doesn't. <- probably this.
- the synethic tests are run on my desktop, while I ran speedometer&pageload through try. These are different CPUs with different microarchitectures.
- The Firefox tests are built with PGO.
No poisoning is much faster than any poisoning, with say 128 or 256 byte poisoning being only slightly faster than full poisoning (The confidence on Speedometer results was "low" while it was "med" on page load tests).
Some specific sp2/3 subtests and other whole tests saw larger improvments. It could be different on something like ARM64 or different x86 microarchitectures.
I'd be happy to land the change for 128 or 256 bytes if others are happy from a, debugging, crash reporting and defensive programming perspective. (NI: mccr8 and gsvelto). I'd probably prefer to keep full poisoning on Nightly builds of Firefox. I considered and spoke to some people about probabilistic poisoning, It might not have a big advantage if we poison as much as 256 bytes. But I'd be happy to test/implement it later on (after Bug 1801255).
Thanks.
Links
My beseline for these links is "full poisoning" aka Mozilla Central:
Compare with poison 64:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3330b2acd782f29b9d0aacc44a0b62fa8791923e+&newProject=try&newRevision=aa923d46004b8f5f63ab6e6fbc3f87e5311d77c7&framework=13&page=1
Compare with poison 128:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3330b2acd782f29b9d0aacc44a0b62fa8791923e+&newProject=try&newRevision=b6feb137ff5ec0919b790a34fcc5e14c6f3d4a64&framework=13&page=1
Compare with no poisoning
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3330b2acd782f29b9d0aacc44a0b62fa8791923e&newProject=try&newRevision=eefb139d6eac8d1f82b19cdb9b2f90c7f88683b0&framework=13&page=1
Comment 15•1 year ago
|
||
256 bytes worth of poisoning is fine by me. I just analyzed a potential UAF that involved an object roughly 220 bytes in size and the poisoning appeared towards the end. I'm pretty confident we won't be dealing with objects much larger than that, at least not often enough to matter.
Comment 16•1 year ago
|
||
256 bytes of poisoning for release and beta sounds fine to me from a bug finding perspective. I'm not sure how to interpret the performance results.
Assignee | ||
Comment 17•1 year ago
|
||
Updated•1 year ago
|
Assignee | ||
Comment 18•1 year ago
|
||
Also change the amount of bytes to poison to 256.
Depends on D187359
Updated•1 year ago
|
Comment 19•1 year ago
|
||
Comment 20•1 year ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/738232f0258c
https://hg.mozilla.org/mozilla-central/rev/a25316f82e51
https://hg.mozilla.org/mozilla-central/rev/c477e7c90ec5
https://hg.mozilla.org/mozilla-central/rev/4c4f85a8042e
Comment 21•1 year ago
|
||
Hope y'all don't mind the summary edit. "Disable" is not what ended up being implemented
Updated•1 year ago
|
Description
•