Closed Bug 1850008 Opened 8 months ago Closed 6 months ago

Reduce allocation poisoning in mozjemalloc in release

Categories

(Core :: Memory Allocator, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
120 Branch
Tracking Status
firefox120 --- fixed

People

(Reporter: jesup, Assigned: pbone)

References

Details

Attachments

(3 files)

Google and I believe safari aren't poisoning all allocations on free (chrome may be poisoning 1 cacheline's worth on x86). This is costing us significant performance, perhaps as much as 2%, maybe more. We should consider disabling this in release and late beta (or perhaps all betas).

We'll want try perf runs to disable poisoning if it's left on in nightly/local builds.

Upsides: performance(!)
Downsides:

  • Less obvious UAF crashes (where the memory hasn't been reallocated/cleared)
    ** UAFs are likely to show up as wildptr crashes
  • Information leakage in freshly reallocated memory
    ** only from the same site due to fission, except maybe on android
  • Uninitialized data will be non-deterministic, which may expose existing undefined-behavior bugs. Note however that most of these are caught by the compiler or static analysis test runs.
  • UAF-caused crashes will be less deterministic

I have filed at least a few crashes on poison values that got fixed that were only showing up on release or beta, for what it is worth.

I don't think we should do this. The poison value has enabled fixing a lot of crashes. Maybe somebody could take another look at performance improvements for poisoning?

(chatted a bunch with mccr8)
Performance runs show 0.75% on SP3, and ~1.5% on pageload (*speedindex), up to 4% on amazon.

We can probably retain most (80%? 90%?) of the e5e5 crashes by poisoning a cacheline or two. That might have little or no impact on the perf win.

One risk is that we may not crash if the UAF is quick (before reallocation). For example, calling through vptrs. This makes it hard to spot UAFs that happen quickly.

An additional possible mitigation to these effects is to leverage the PHC work to do probabilistic poisoning of memory.

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #3)

One risk is that we may not crash if the UAF is quick (before reallocation). For example, calling through vptrs. This makes it hard to spot UAFs that happen quickly.

How does that relate to poisoning a cache line or two? That sounds more like delaying the poisoning for later?

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #3)

(chatted a bunch with mccr8)
Performance runs show 0.75% on SP3, and ~1.5% on pageload (*speedindex), up to 4% on amazon.

We can probably retain most (80%? 90%?) of the e5e5 crashes by poisoning a cacheline or two. That might have little or no impact on the perf win.

Thanks.

For poisoning only one cache line I have a 0.53% sp3 improvement on Windows, but with low confidence:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=ceeefc072d1e8e62ea52de536b6e8ff4374f36d5&newProject=try&newRevision=721d93ba9f4f510ec956370b76f815444dd9648c&framework=13&page=1&filter=speedometer

And a 2.17% improvment on amazon's speedindex (low confidence).

My patch goes to some effort to write to only one cache line, except when the memory cell is less than or equal to 64 bytes long, it just writes the whole thing in that case. For example an 96 byte cell won't necessarily be aligned at the beginning of a 64 byte cache line, in that case we write only to the first cache line even if that means only writing 32 bytes.

I'm testing other options too.

This patch alters mozjemalloc's poisoning to poison only the first 64 bytes
(aligned to a single cache line) of (most) memory cells. It has this
behaviour in late beta and release builds and keeps the original "poison
all" behaviour for nightly and early beta builds.

Attachment #9351402 - Attachment description: Bug 1850008 - Poison at most the first 64 bytes of memory cells r=jesup → Bug 1850008 - Poison at most the first 64 bytes of memory cells r=jesup,glandium

64 bytes sounds a bit smaller than I'd like. For instance, sizeof(Element) is 136.

pbone, could you test a bit larger values, like 128, and also 256, just to see what numbers we get on try?

Flags: needinfo?(pbone)

FYI since we have a very narrow use-case here (poisoning 1-N bytes at most with N being a fixed value) I wonder if we could reuse the techniques from this paper to make the actual poisoning very fast by largely avoiding branches in the write path. I had mentioned that paper in bug 1821077 comment 0 but I feel it's even more relevant now given the direction we're taking here.

= Synthetic tests

Based on logalloc recording of speedometer 2.

logalloc replay Cells completely poisoned Cells partly poisoned Bytes poisoned
No poisoning 6.04s +- 0.5 0 0 0
Poison 64 6.88s +- 0.3 8,716,742 (39.2%) 13,507,373 (60.8%) 1,110,050,631 (7.9%)
Poison 128 6.95s +- 0.5 16,606,293 (74.7%) 5,617,797 (25.3%) 1,919,564,807 (13.7%)
Poison 256 7.00s +- 0.4 19,182,550 (86.3%) 3,041,515 (13.7%) 2,539,338,927 (18.1%)
Poison full 8.73s +- 0.4 22,223,977 (100.0%) 0 14,020,182,663 (100%)

I wanted to see how the object size distribution effects how much poisoning these different options have. I'm doing full speedometer tests on try now.

(In reply to Gabriele Svelto [:gsvelto] from comment #11)

FYI since we have a very narrow use-case here (poisoning 1-N bytes at most with N being a fixed value) I wonder if we could reuse the techniques from this paper to make the actual poisoning very fast by largely avoiding branches in the write path. I had mentioned that paper in bug 1821077 comment 0 but I feel it's even more relevant now given the direction we're taking here.

it's something I started reading about but decided (at the time when I read it) it wasn't low hanging fruit. However I've recently been wrong about what is/isn't low hanging fruit so maybe this is worth it. Plus if we are doing a maximum of say 128 bytes then that gives us more known information which we can use to eliminate code paths & branches (which I think is what you're saying).

Speedometer and pageload

WIndows shippable

sp2 sp3 amazon SpeedIndex warm
No poisoning 179.42 ± 0.48 13.39 ± 0.31 237.55 ± 2.6
Poison 64 179.09 ± 0.87 13.33 ± 0.83 242.13 ± 2.56
Poison 128 179.19 ± 0.43 13.35 ± 0.59 241.62 ± 3.05
Poison 256 179.5 ± 0.55 13.35 ± 0.33 240.47 ± 2.82
Poison full 178.84 ± 0.84 13.32 ± 1.03 243.2 ± 1.96

WIndows nightly as release

sp2 sp3
No poisoning 173.58 ± 0.61 12.89 ± 0.57
Poison 64 173.46 ± 0.3 12.86 ± 0.71
Poison 128 173.28 ± 0.66 12.89 ± 0.35
Poison 256 173.79 ± 0.4 12.89 ± 0.5
Poison full 172.65 ± 0.65 12.87 ± 0.42

Remarks

It looks as if optimisations in either the memset routine (like Gabriele mentions in comment 11) or the CPU's handling of these write patterns are helping with 128 and 256 byte writes. This is the opposite trend from the synthetic tests above. Which could be because:

  • Firefox causes this memory to be cached anyway and logalloc-replay doesn't. <- probably this.
  • the synethic tests are run on my desktop, while I ran speedometer&pageload through try. These are different CPUs with different microarchitectures.
  • The Firefox tests are built with PGO.

No poisoning is much faster than any poisoning, with say 128 or 256 byte poisoning being only slightly faster than full poisoning (The confidence on Speedometer results was "low" while it was "med" on page load tests).

Some specific sp2/3 subtests and other whole tests saw larger improvments. It could be different on something like ARM64 or different x86 microarchitectures.

I'd be happy to land the change for 128 or 256 bytes if others are happy from a, debugging, crash reporting and defensive programming perspective. (NI: mccr8 and gsvelto). I'd probably prefer to keep full poisoning on Nightly builds of Firefox. I considered and spoke to some people about probabilistic poisoning, It might not have a big advantage if we poison as much as 256 bytes. But I'd be happy to test/implement it later on (after Bug 1801255).

Thanks.

Links

My beseline for these links is "full poisoning" aka Mozilla Central:

Compare with poison 64:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3330b2acd782f29b9d0aacc44a0b62fa8791923e+&newProject=try&newRevision=aa923d46004b8f5f63ab6e6fbc3f87e5311d77c7&framework=13&page=1

Compare with poison 128:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3330b2acd782f29b9d0aacc44a0b62fa8791923e+&newProject=try&newRevision=b6feb137ff5ec0919b790a34fcc5e14c6f3d4a64&framework=13&page=1

Compare with poison 256:
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3330b2acd782f29b9d0aacc44a0b62fa8791923e+&newProject=try&newRevision=cc15d80fce7b0970714f5975799fa077318b2880&framework=13&page=1&filter=amazon+SpeedIndex

Compare with no poisoning
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=3330b2acd782f29b9d0aacc44a0b62fa8791923e&newProject=try&newRevision=eefb139d6eac8d1f82b19cdb9b2f90c7f88683b0&framework=13&page=1

Flags: needinfo?(pbone)
Flags: needinfo?(gsvelto)
Flags: needinfo?(continuation)

256 bytes worth of poisoning is fine by me. I just analyzed a potential UAF that involved an object roughly 220 bytes in size and the poisoning appeared towards the end. I'm pretty confident we won't be dealing with objects much larger than that, at least not often enough to matter.

Flags: needinfo?(gsvelto)

256 bytes of poisoning for release and beta sounds fine to me from a bug finding perspective. I'm not sure how to interpret the performance results.

Flags: needinfo?(continuation)
Attachment #9351402 - Attachment description: Bug 1850008 - Poison at most the first 64 bytes of memory cells r=jesup,glandium → Bug 1850008 - pt 2. Poison at most the first 64 bytes of a memory cell r=jesup,glandium

Also change the amount of bytes to poison to 256.

Depends on D187359

Assignee: rjesup → pbone
Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/738232f0258c
pt 1. Refactor parsing of MALLOC_OPTIONS r=glandium
https://hg.mozilla.org/integration/autoland/rev/a25316f82e51
pt 2. Poison at most the first 64 bytes of a memory cell r=jesup,glandium
https://hg.mozilla.org/integration/autoland/rev/c477e7c90ec5
pt 3. Add a parameter for how many bytes to poison r=glandium,smaug
https://hg.mozilla.org/integration/autoland/rev/4c4f85a8042e
apply code formatting via Lando
Regressions: 1858137

Hope y'all don't mind the summary edit. "Disable" is not what ended up being implemented

Summary: Disable allocation poisoning in mozjemalloc in release → Reduce allocation poisoning in mozjemalloc in release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: