[meta] Reduce lock contention in jemalloc
Categories
(Core :: Memory Allocator, enhancement, P1)
People
(Reporter: pbone, Assigned: pbone)
References
(Depends on 9 open bugs, Blocks 3 open bugs)
Details
(Keywords: meta, perf:responsiveness)
Attachments
(4 obsolete files)
Jon and Jan noticed a lot of lock contention in jemalloc (Bug 1809058), specifically in the free routine from off-thread sweeping. Likewise, I noticed the JS arena is often contended (Bug 1805644). We can move several things outside the arena lock's critical section on the free path (see the sketch after the list). The candidates are:
- VirtualFree/munmap
- madvise
- Poisoning
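To make the pattern concrete, here is a minimal C++ sketch. The `Arena`/`FreeChunk` names and the use of `std::mutex` are invented for illustration and are not mozjemalloc's actual types; the point is only that the bookkeeping stays inside the critical section while the system call moves out of it.

```cpp
#include <cstddef>
#include <mutex>
#include <sys/mman.h>

struct Arena {
  std::mutex lock;

  void FreeChunk(void* chunk, size_t size) {
    {
      std::lock_guard<std::mutex> guard(lock);
      // ... unlink the chunk from the arena's data structures ...
    }
    // The expensive system call now runs outside the critical
    // section, so threads blocked on `lock` can proceed meanwhile.
    munmap(chunk, size);  // VirtualFree(chunk, 0, MEM_RELEASE) on Windows
  }
};
```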
Comment 1•2 years ago
In the Speedometer 3 meeting yesterday, Randal mentioned that he's already been looking at this and that there are existing bugs.
Comment 2•2 years ago
There are existing bugs, and known problems from doing so (at least for poisoning, IIRC).
Comment 3•2 years ago
Found the known problems poisoning outside a lock: https://phabricator.services.mozilla.com/D60036#1843167
Comment 4•2 years ago
IME the best way to reduce lock contention is to have more fine-grained locking (at the run level), which itself requires not using pthread locks because they are too large. I did that in the Rust prototype, with good results.
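For a sense of scale, here is an illustrative one-byte spinlock in C++ (a sketch only, not the lock used in the Rust prototype); glibc's pthread_mutex_t is around 40 bytes on 64-bit Linux, far too large to embed in every run header:

```cpp
#include <atomic>

// A lock that occupies a single byte, small enough to put in each run.
class TinyLock {
  std::atomic_flag mHeld = ATOMIC_FLAG_INIT;

 public:
  void Lock() {
    while (mHeld.test_and_set(std::memory_order_acquire)) {
      // Busy-wait. A production lock would pause/yield here, or fall
      // back to a futex after a bounded number of spins.
    }
  }
  void Unlock() { mHeld.clear(std::memory_order_release); }
};
```

Many small locks like this let two threads free into different runs of the same arena without contending, at the cost of more lock/unlock operations overall.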
Comment 5•2 years ago (Assignee)
(In reply to Mike Hommey [:glandium] from comment #4)
> IME the best way to reduce lock contention is to have more fine-grained locking (at the run level), which itself requires not using pthread locks because they are too large. I did that in the Rust prototype, with good results.
I'll take a look. I'm proposing "just moving" this code rather than doing fine-grained locking because I didn't want to invest too much time. But I'll consider making that a longer-term plan, as part of looking at other allocators.
Comment 6•2 years ago (Assignee)
The system calls that release a chunk of memory can be costly and should be done outside the arena lock's critical section, so that other threads aren't blocked waiting for the lock.
Comment 7•2 years ago (Assignee)
Depends on D166775
Comment 8•2 years ago (Assignee)
This should reduce contention on the jemalloc locks, but it also makes it easier to manage jemalloc state in the next patch.
Depends on D166776
Comment 9•2 years ago (Assignee)
We must hold the arena lock while calling madvise so that our record of what is madvised and what isn't stays consistent with the OS. But we can batch these operations until after the JS GC has completed sweeping, which combines more work into fewer system calls and holds the arena lock for less time. (A sketch of the idea follows.)
Depends on D166777
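A sketch of the batching idea in C++ (the Range/QueueDirty/FlushDirty names and data structures are invented for illustration; this is not the patch itself):

```cpp
#include <cstddef>
#include <mutex>
#include <vector>
#include <sys/mman.h>

struct Range {
  char* addr;
  size_t len;
};

struct Arena {
  std::mutex lock;
  std::vector<Range> pendingDirty;  // protected by `lock`

  // During sweeping: record the range instead of madvising immediately,
  // merging it with the previous range when they are contiguous.
  void QueueDirty(char* addr, size_t len) {
    std::lock_guard<std::mutex> guard(lock);
    if (!pendingDirty.empty() &&
        pendingDirty.back().addr + pendingDirty.back().len == addr) {
      pendingDirty.back().len += len;
      return;
    }
    pendingDirty.push_back({addr, len});
  }

  // After the GC finishes sweeping: one madvise per merged range,
  // still under the arena lock so our records stay consistent.
  void FlushDirty() {
    std::lock_guard<std::mutex> guard(lock);
    for (const Range& r : pendingDirty) {
      madvise(r.addr, r.len, MADV_DONTNEED);
    }
    pendingDirty.clear();
  }
};
```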
Comment 10•2 years ago (Assignee)
In a review, jonco asked whether coalescing the calls to madvise leads to fewer-but-longer pauses. I want to share the data here on the bug where it's more visible:
| | mozilla-central | with patches |
| --- | --- | --- |
| Number of times Purge() called | 49,706.00 | 36,110.00 |
| Total duration of Purge() | 2.44s | 1.95s |
| Average pause | 49.04us | 53.98us |
| Median pause | 43.96us | 40.77us |
| 90th percentile | 85.03us | 81.77us |
| 99th percentile | 181.08us | 337.83us |
The arena lock is held for the entire duration of Purge(), so all these numbers represent the length of that critical section, and likely pauses on the main thread if the JS engine tries to allocate memory (but not Gecko; that's a different arena). I obtained the figures above by opening a tab with the Speedometer 2 benchmark, running the benchmark, then closing the browser. They are for the content process only.
While the total time spent in the lock, and therefore the benchmark result, is a little better, I don't like the idea of adding latency in those 99th-percentile results. So I'm now experimenting with having Purge() exit early even if it hasn't de-committed "enough" dirty pages, then running it again each time a thread obtains the lock, until enough dirtied pages are purged. The best results so far are for purging one chunk at a time, which has better average-case and worst-case behaviour than my patches so far, and only slightly worse overall throughput - but I'll keep experimenting.
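A sketch of that experiment in C++ (all names and the dirty-byte limit are invented for illustration):

```cpp
#include <cstddef>
#include <mutex>
#include <vector>
#include <sys/mman.h>

struct DirtyChunk {
  char* addr;
  size_t len;
};

struct Arena {
  std::mutex lock;
  std::vector<DirtyChunk> dirtyChunks;  // protected by `lock`
  size_t dirtyBytes = 0;
  static constexpr size_t kMaxDirtyBytes = 1 << 20;  // arbitrary limit

  // Purge at most one chunk per call, so no single lock acquisition
  // holds the lock for long. Returns true while more purging remains;
  // each thread that takes the lock can make a little progress.
  bool PurgeSome() {
    std::lock_guard<std::mutex> guard(lock);
    if (dirtyBytes <= kMaxDirtyBytes || dirtyChunks.empty()) {
      return false;
    }
    DirtyChunk c = dirtyChunks.back();
    dirtyChunks.pop_back();
    madvise(c.addr, c.len, MADV_DONTNEED);
    dirtyBytes -= c.len;
    return dirtyBytes > kMaxDirtyBytes;
  }
};
```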
Comment 11•2 years ago
(In reply to Paul Bone [:pbone] from comment #10)
Thanks for sharing these results. This seems to reduce the number of purges by about 27% while keeping the median and average pause roughly the same, so I'd say that's a good win.
Comment 12•2 years ago (Assignee)
Here are some results from try:
I also ran tests in the "browsertime", "awsy" and "js-bench" groups.
Lots of things are better, particularly responsiveness but also some throughput. Some page-load benchmarks are worse though. I'm wondering whether, now that a bunch of madvise calls happen in one lock transaction, that's hitting some benchmarks in an unlucky way. I'm going to investigate a bit further.
Comment 13•2 years ago (Assignee)
(In reply to Paul Bone [:pbone] from comment #12)
> Here are some results from try:
> I also ran tests in the "browsertime", "awsy" and "js-bench" groups.
> Lots of things are better, particularly responsiveness but also some throughput. Some page-load benchmarks are worse though.
I ran the "fandom" tests (some of the most extreme in try) on my PC and extracted the fcp values. I ran it against the baseline and then each of the 5 patches in my workspace applied in series. I'm not an expert statistician, so grain of salt. But the data appears to be too noisy to say that there's a statistical difference. I was hoping to find "Ah, this patch is the one that makes page loads slower" and have clues about why. But I can't spot a difference that "feels" significant. I'll try them on "try" in case the environment makes a difference.
In the meantime if someone else can spot something in this data or has other insights about the tests I'd appreciate the clues.
Comment 14•2 years ago (Assignee)
I'm going to break this bug up into sub-bugs so that landing each one is lower risk and gives us a better chance at understanding performance changes (including unforeseen regressions).
Comment 15•2 years ago
Comment on attachment 9312148 [details]
Bug 1809610 - Move chunk releasing out of the arena lock's critical section r=glandium
Revision D166775 was moved to bug 1814808. Setting attachment 9312148 [details] to obsolete.
Comment 16•2 years ago
Comment on attachment 9312149 [details]
Bug 1809610 - Poison memory we want to free before we take the arena lock r=glandium
Revision D166776 was moved to bug 1609478. Setting attachment 9312149 [details] to obsolete.
Comment 17•2 years ago
Comment on attachment 9312151 [details]
Bug 1809610 - Coalesce madvise calls until after sweeping r=glandium,jonco
Revision D166778 was moved to bug 1814815. Setting attachment 9312151 [details] to obsolete.
Comment 18•2 years ago (Assignee)
I have some ideas to try to figure out the regression.
Comment 19•2 years ago
Comment on attachment 9312150 [details]
Bug 1809610 - Don't run background sweep and background free concurrently r=jonco
Revision D166777 was moved to bug 1814813. Setting attachment 9312150 [details] to obsolete.
Comment 20•6 months ago (Assignee)
Unassigned because this is a meta bug.